Probability and Statistics: Sampling and Descriptive Statistics, Exercises of Probability and Statistics

Harvard University Probability and Statistics

The document covers topics related to probability and statistics, including the capture-recapture method of sampling, clinical studies, and descriptive statistics. It explains the logic of capture-recapture computations and the controlled clinical study methodology. The document also covers descriptive statistics, including frequency tables and relative frequency, and includes examples and questions to test understanding of the topics presented.

Typology: Exercises

2022/2023

Uploaded on 03/14/2023

tomseller 🇺🇸


MA 105 Guided Notes – Module 1—Probability and Statistics—p.17

Capture – Recapture Method of Sampling

Capture – Recapture is a common method used to estimate the size of a population by sampling. Biologists and ecologists use this method extensively to estimate wild animal populations.

• Step 1: Capture a sample of the animal you want to count in the area you want to know about. (Your book calls the number captured $n_1$.)
    o Tag all the captured animals (give them each an identifying mark)
    o Release them back into the wild
• Step 2: Recapture. After enough time for the released individuals to re-mix with the whole population, capture a new sample of individuals and count the number of tagged and the number of untagged individuals in this second sample.

The logic of capture-recapture computations: The computation is the proportion formed by two correctly stacked ratios.

IF we can assume that the recaptured sample is representative of the whole population, then

$$\frac{\#\ \text{tagged in total population}}{\text{Total population}} = \frac{\#\ \text{tagged in the recapture sample}}{\text{Total in the recapture sample}}$$

I suggest you memorize this rather than the formula on page 530 of your text.

Notice that we KNOW the # tagged in the total population – that is the number we tagged in the first sample.

Example 13.4 (reworded – This is the way such questions will appear on the exam)

A large pond is stocked with catfish. You capture 200 catfish, tag and release them. You wait enough time for the tagged fish to spread out more with the general population. Then you capture another sample. This sample has 250 catfish. Of the 250 catfish in this second sample, 35 have tags.

If the second sample is representative of the catfish population in the pond, estimate the number of catfish in the pond.
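As a rough check on Example 13.4, the proportion can be solved for the total population in a few lines of Python. This is only a sketch: the function name `estimate_population` and its parameter names are made up for illustration, and the result is an estimate that holds only if the second sample is representative.

```python
def estimate_population(tagged_first, second_sample_size, tagged_in_second):
    """Solve tagged_first / N = tagged_in_second / second_sample_size for N."""
    if tagged_in_second <= 0:
        raise ValueError("the second sample must contain at least one tagged animal")
    # Cross-multiply the proportion and divide by the tagged count in the recapture.
    return tagged_first * second_sample_size / tagged_in_second


# Example 13.4: 200 catfish tagged, 250 recaptured, 35 of those carry tags.
estimate = estimate_population(200, 250, 35)
print(f"Estimated catfish in the pond: about {round(estimate)}")
```

Because the answer is an estimate, it is reasonable to round it to a whole number of fish.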


Section 13.5 Clinical Studies (Clinical Trials)

Clinical Studies do not collect data for the same purposes as surveys and censuses.

Instead, Clinical Studies attempt to determine whether a single variable can cause a certain effect.

New vaccines and drug treatments are put through clinical studies before being officially approved for public use.

Things that are “unhealthy” like cigarettes and caffeine are officially identified as “unhealthy” after clinical studies show that people who include significant amounts of them in their lifestyle have more health problems than people who do not include them.

Controlled Clinical Study Methodology

A controlled clinical study uses two groups:

• treatment group (receives the actual treatment)
• control group (sometimes called the comparison group), which should differ from the treatment group only in that its members do not receive the treatment.

Confounding Variable: a characteristic (not the one being studied) in which the control and treatment groups differ. When the groups differ in this way, you can’t tell whether the effect was due to the characteristic being studied, due to this other characteristic, or due to a combination of both.

Randomized Controlled Study : subjects are randomly assigned to either the treatment or control group

We can only deduce that the treatment CAUSES the effect if the treatment group experiences the effect and the control group does not experience the effect.

Placebo Effect : just the idea that one is getting treatment can produce positive results. People receiving a placebo (a harmless, inactive substance like a “sugar pill”) often report experiencing improvement.

Blind Study: The placebo effect cannot be eliminated, but it can be controlled by giving a placebo to the control group and conducting a blind study, in which neither the treatment group nor the control group knows whether it is getting the real treatment or the placebo.

Double Blind Study: The scientists conducting the study are also not aware of whether the participant is getting the real treatment or the placebo. (Participants and researchers are both “blind”.)

Even clinical studies that are properly designed can lead to conflicting conclusions. But when clinical studies of the same variable, done in different labs by different groups, consistently reach the same conclusion, the clinical study method is persuasive.
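Random assignment, as used in a randomized controlled study, can be illustrated with a short Python sketch. The function name `randomly_assign`, the participant ID numbers, and the seed value are all assumptions made for illustration; the group sizes (150 participants split 75 and 75) mirror Study #1 in the class practice.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)      # a seed only makes the illustration repeatable
    shuffled = list(participants)  # copy so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical ID numbers 1..150 standing in for the screened volunteers.
participant_ids = list(range(1, 151))
group_a, group_b = randomly_assign(participant_ids, seed=105)
print(len(group_a), len(group_b))
```

Chance alone decides who lands in each group, which makes it less likely that the two groups differ systematically in some confounding variable.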

Experimental Variable: Flu Vaccine → (Causes???) → Effect: Prevents getting the flu

Class Practice – Clinical Studies

Study #1: In order to determine the effectiveness of a new drug for HIV treatment, the researchers conducted a study at the Park HIV Clinic in Philadelphia. The clinic first asked all 8,000 of their HIV patients who were between 20 and 40 years of age if they would be willing to participate.

Only 2,000 volunteered to participate in the study. All 2,000 of those volunteers were given a battery of medical assessments to determine the severity of symptoms they were experiencing and their prognosis.

The researchers looked at the results of these medical assessments and found there were 150 of these volunteers who were in the beginning stages of HIV infection and were showing only minimal symptoms. These 150 patients became the participants in the study.

By random assignment, 75 were assigned to “Group A” and the other 75 were assigned to “Group B”. Group A received injections from “Drug A” vials while Group B received injections from “Drug B” vials.

One vial was the experimental drug and the other vial was a placebo treatment. Neither the patients nor the researchers knew whether “Drug A” or “Drug B” was the actual treatment drug.

Participants received the injections once a week for 6 months. At the end of the 6 months of treatment, the patients were again given the same battery of medical assessments to determine the severity of symptoms they were experiencing and their prognosis. The average level of health was found to be significantly better for Group B.

Group B turned out to be the group that had received the real drug treatment.

1. What is the sampling frame in this study?
2. What is the target population of this study?
3. Does it matter to the results of the study that the participants were volunteers? Why or why not?
4. What purpose did the initial medical screening to select the 150 actual participants serve in the methodology of this study?
5. What makes this study a controlled study?
6. What makes this study a randomized controlled study?
7. Is this study blind or double blind? How can you tell?

Study #2: In order to determine the effectiveness of a new vaccine that is alleged to cure “math anxiety”, a clinical study was conducted. One thousand college students enrolled in math courses across the U.S. were chosen to participate in the study. The 1,000 students were broken up into two groups. Those enrolled in calculus courses or higher were given the real vaccine. The students in remedial and basic math courses were given a fake vaccine consisting of sugared water. None of the students knew whether they were being given the real or the fake vaccine, but the researcher conducting the experiment knew. At the end of the semester the students were given a test that measured their level of math anxiety. The students in the treatment group showed significantly lower levels of math anxiety than those in the control group. On the basis of this experiment the vaccine was advertised as being highly effective in fighting math anxiety.

1. The sampling frame in this study consists of
   A. the treatment and control groups
   B. all U.S. college students
   C. all U.S. college students enrolled in math classes
   D. all students that suffer from “math anxiety”
   E. None of the above

2. The target population in this study consists of
   A. treatment and control groups
   B. all U.S. college students
   C. all U.S. college students enrolled in math classes
   D. all students that suffer from “math anxiety”
   E. None of the above

3. The control group in this experiment consists of
   A. the 1,000 volunteer college students used for the study
   B. the students given the real vaccine
   C. the students given the fake vaccine
   D. This experiment has no control group because it used volunteers.
   E. None of the above

4. This experiment can best be described as a
   A. double blind randomized controlled experiment
   B. double blind controlled placebo experiment
   C. blind randomized controlled experiment
   D. blind controlled placebo experiment
   E. All of the above

5. The results of this experiment should be considered unreliable because
   A. only college students were used
   B. the treatment and control groups were not the same size
   C. the sample was too small
   D. the treatment and control groups represented two very different segments of the population
   E. None of the above

6. Which of the following is the most likely confounding variable for this experiment?
   A. the student’s background in mathematics
   B. the student’s grade level (freshman, sophomore, junior, senior)
   C. the type of college attended (two year, four year, university)
   D. the student’s sex (male, female)
   E. None of the above

Relative Frequency

Relative Frequency : the percent of the total population that had that value rather than the actual number that had that value.

Relative frequency is used most commonly when the actual frequencies are very large numbers. This makes them easier to compare.

Important: If you are graphing relative frequencies, be careful to:

• Include the N-value in the title of the graph so that the actual data values can be reconstituted, if desired
• Label the frequency scale as “relative frequency” or “percent of total”.

Example: Find the relative frequencies for these raw data

What do we need to find first????

Score    Frequency    Relative Frequency
5        3,
4        2,
3        1,
2        1,
1        250

Example : In a high stakes exam used for academic scholarship awards, N = 200,000 and the relative frequency of a perfect score is 0.04%. How many students made a perfect score on the exam?
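Going back and forth between a relative frequency and an actual count is a single multiplication or division. The sketch below shows both directions; the two function names are hypothetical helpers, not anything from the text.

```python
def count_from_relative_frequency(n_total, percent):
    """Actual frequency = N times the relative frequency (given as a percent)."""
    return n_total * percent / 100


def relative_frequency_percent(count, n_total):
    """Relative frequency (as a percent) = actual frequency divided by N."""
    return 100 * count / n_total


# The scholarship exam above: N = 200,000 and 0.04% made a perfect score.
perfect_scores = count_from_relative_frequency(200_000, 0.04)
print(round(perfect_scores))
```

This is why the N-value belongs in the title of a relative frequency graph: without N, the actual counts cannot be recovered.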

Bar Graphs of Frequencies

Bar graphs are often used to show frequencies. The higher the bar, the more frequent that data value.

Pictograph: uses pictures or icons to create the length of the bars

Characteristics of Bar Graphs:

• Bars are separated from each other, not right up against each other
• Height of bar indicates the frequencies of each score (or length of bar in horizontal)
• Bar graphs are usually limited to 12 or fewer bars. More than that is difficult to read.

Steps for creating a proper bar graph:

Step 1 : Organize the data values

Step 2: Make the vertical scale (usually) the frequencies. Use equal intervals and be sure to label the scale “frequency”

Step 3 : Make the horizontal scale the possible values. Make it an equal interval scale, including values that did not occur. Include a word-label that tells what those values represent.

Step 4 : Draw a bar above each value that did occur, making the bar as long as the frequency for that value. Keep the bars narrower than the space between values so that the bars do not touch one another.

Example:

Create a bar graph to display this data set as relative frequencies. The letters represent the letters of the correct answers on a multiple choice test.

A B A C B C C B C A A A C C B B B C B B C A B A B C B B B C B A A B A B C C A A A C C B B A A A B C

Step 1: Organize the data values

Steps 2-4: Graph the values
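Step 1 for this data set (organizing the values) amounts to tallying each letter and converting the tallies to percents. A minimal Python sketch follows, using the standard library's `collections.Counter`; the variable names are illustrative and the row of `#` characters is only a crude text stand-in for a real bar.

```python
from collections import Counter

# The 50 answer letters from the example, in the order given.
answers = (
    "A B A C B C C B C A A A C C B B B C B B C A B A B "
    "C B B B C B A A B A B C C A A A C C B B A A A B C"
).split()

counts = Counter(answers)  # frequency of each letter
n = len(answers)           # N, the total number of data values

for letter in sorted(counts):
    percent = 100 * counts[letter] / n
    bar = "#" * counts[letter]
    print(f"{letter}: frequency {counts[letter]:2d}, relative frequency {percent:.0f}%  {bar}")
```

On a real graph the vertical scale would be labeled “relative frequency” and the title would include the N-value.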

Creating Circle Graphs (Pie Charts)

Circle Graphs (Pie Charts) are good for showing the respective sizes of categories within a whole population. The circle represents the whole and the size of the sectors (the “slices”) are proportional to the relative frequency of each category.

Remember 25% may be thought of as the ratio $\frac{25}{100}$.

Also remember that a circle contains 360˚.

It is often helpful to work with the reduced fraction for the percent rather than the number of degrees.

So the basic proportion relationship when figuring out the size of each sector is:

$$\frac{\text{size of sector in degrees}}{360\ \text{degrees}} = \frac{\text{percent}}{100}$$

Make a Circle Graph to Represent the following data:

N = 100,000 marbles that are either red, blue, green, yellow, or purple.

Type      Percent    Degrees
Red       25%
Blue      50%
Green     10%
Yellow    10%
Purple    5%
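The Degrees column can be checked by applying the sector proportion to each percent. The sketch below does this in Python; the dictionary names are illustrative only.

```python
# Percent of the marbles in each color, from the table above.
percents = {"Red": 25, "Blue": 50, "Green": 10, "Yellow": 10, "Purple": 5}

# sector degrees / 360 = percent / 100, so sector degrees = 360 * percent / 100
degrees = {color: 360 * pct / 100 for color, pct in percents.items()}

for color, size in degrees.items():
    print(f"{color}: {percents[color]}% -> {size:g} degrees")

# The sectors should fill the whole circle.
print("Total:", sum(degrees.values()), "degrees")
```

A quick sanity check is that the percents add up to 100 and the sectors add up to 360 degrees.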

Section 14.2 Variables

Variable: in statistics a variable is any characteristic that varies within the population. Examples: what color, what size, what kind, how many of...

Categorical Variable : (qualitative variable) represents a quality that is not normally measured numerically. For instance, gender.

Categorical variables can be counted and quantified but we need to be very careful how we use those values and how we interpret them. We should not be giving a numerical average for variables like gender or hair color that are not actually numerical values to start with.

Numerical Variable : (Quantitative variable) represents a measurable quality.

• Discrete numerical variables are characteristics that cannot be measured in “infinitely” small fractions of a value (counting actual people, test scores, shoe sizes)
• Continuous numerical variables are characteristics that can be measured in “infinitely” small fractions of a value (time, distance traveled, volume of liquid)
• In the real world this distinction is blurred by
    o Rounding off values that are actually continuous so they seem discrete (like measurements of length to the nearest quarter inch)
    o Calculations or subdividing that create as many decimal places as necessary (often with counts of people; for example, 2.5 children per family is an average)

Practice: For each situation below, indicate whether the variable should be considered Categorical or Numerical. If Numerical, is it discrete or continuous?

(a) Hair color: blond, brown, black, red, grey

(b) Gender: male, female

(c) Shoe size: 4, 4 2

(d) Ethnicity: Black, Native American, Hispanic, ...

(e) Height in inches

(f) Gender when asked to record it as a “1” if female and a “2” if male

Section 14.3 Numerical Summaries of Data (text p. 558)

Another way to summarize data and make large data sets more comprehensible is to summarize them numerically. There are two main ingredients in such a summary:

• Measures of Location (how the data “line-up” in an ordered list of the values)
    o Mean (Tells the center of the data)
    o Median (Tells the center of the data)
    o Percentile (Tells the percentile-rank of the data value)
    o Quartile (Tells the center AND the quarter-marks)
    o “The Five-Number Summary” (Minimum, Quartile 1, Median, Quartile 3, Max)
    o Box-Plots (Graphic representation of the five-number summary)
• Measures of Spread (how the data “bunch-up”)
    o Range (Max − Min)
    o Interquartile Range (Q3 − Q1)
    o Variance (an intermediate step to get to the standard deviation)
    o Standard Deviation (average distance from the mean)

MEAN

Mean: The mean is the arithmetic average.

• It tells you where the center of the data is in terms of “weight” (balance point) or “volume” (equally full point). If you think of the bars in the bar graph as being tubes filled with liquid, the average would be how full each tube would be if you used all the liquid but completely evened it out so that each tube had the same amount in it.
• To find the mean : Add up the scores and divide by the number of scores in the list.
• There are other types of averages, but the mean is the most commonly used. It has one drawback as a measure of center – it can be strongly influenced by extreme outliers.
• Be careful when you compute the mean for a frequency table of values. The mean computed from a frequency table is a weighted average.

(see the next example)

Computing a Mean from a Frequency Table

Example: For the frequency chart below, you cannot get the average by adding just the numbers in the first row to get the total of the scores BECAUSE, for instance, the score of 9 did not happen just once, it happened 10 times.

If you need to compute the mean from a frequency table, it is easiest to add another row to the table that gives the “weighted” score. To get the weighted score, you multiply the exam score by the number of times it happened (the frequency). For instance, the score of 9 happened 10 times, so its weighted score is 9×10 = 90 because that is what you would get if you wrote out all 10 of them and added them up.

Then the mean of ALL OF THE SCORES = $\frac{\text{Total of all of the Weighted Scores}}{\text{Total of all of the Frequencies}}$

Now finish computing the mean for this Exam.

Point Scores on a 24-Point Exam

Exam Score        1   6   7   8   9  10  11  12  13  14  15  16  24
Frequency         1   1   2   6  10  16  13   9   8   5   2   1   1
Weighted Scores
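The weighted-score row and the resulting mean can be checked with a short Python sketch. The variable names are illustrative; the scores and frequencies are the ones in the table above.

```python
exam_scores = [1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 24]
frequencies = [1, 1, 2, 6, 10, 16, 13, 9, 8, 5, 2, 1, 1]

# Weighted score = exam score times the number of times it happened.
weighted_scores = [score * freq for score, freq in zip(exam_scores, frequencies)]

total_weighted = sum(weighted_scores)  # same as adding up every individual score
total_frequency = sum(frequencies)     # how many scores there are in all

mean = total_weighted / total_frequency
print(weighted_scores)
print(f"mean = {total_weighted} / {total_frequency} = {mean:.2f}")
```

Dividing by the total of the frequencies, not by the number of columns in the table, is the step that is easiest to get wrong by hand.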

Percentiles

Percentile: Percent and Percentile are NOT the same thing.

On a test, a score of 90 percent means that you got, proportionally speaking, 90 out of 100 correct. It compares the amount you scored to the total possible score.
A percentile-rank compares how you did with how everyone else did. A “90th percentile” score means that, proportionally speaking, you did as well as or better than 90% of the people who took the test. It compares your score to all the other scores.

Compute a Percentile:

Step 1 : List the data in numerical order, from least to greatest.

Remember Percentile tells the relative position in the ordered list of data. So think of the data as being an ordered list of values like

$d_1, d_2, d_3, d_4, \ldots, d_N$, where each of these has a numerical value, but also a position value (the subscript).

Step 2 : Compute the locator for the p-th percentile using the formula:

$$L = \left(\frac{p}{100}\right) N$$

Step 3 :

• When L is a whole number: the p-th percentile = $\frac{d_L + d_{L+1}}{2}$
• When L is not a whole number: the p-th percentile = $d_{L^+}$, where $L^+$ is L rounded up

Try It Yourself: Find the 80th percentile value for the following GPA’s:

3.4, 3.9, 3.3, 3.6, 3.5, 3.4, 4.0, 3.7, 3.3, 3.8, 3.6, 3.9, 3.7, 3.4, 3.
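The locator steps can be sketched in Python as follows. This assumes the locator is L = (p/100)·N and that p is a whole number strictly between 0 and 100; the function name `percentile` is illustrative, and the sample data are the 14 test scores used in the standard deviation example of these notes.

```python
def percentile(data, p):
    """p-th percentile by the locator method, for whole-number p with 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)                   # Step 1: order the data
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)    # Step 2: L = (p/100) * N
    if remainder == 0:
        # L is a whole number: average d_L and d_(L+1).
        # Positions are 1-based, list indexes are 0-based.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is not a whole number: round L up and take the value in that position.
    return ordered[whole]


scores = [1, 6, 7, 8, 8, 9, 10, 11, 12, 13, 14, 15, 16, 24]
print(percentile(scores, 80))  # L = 11.2, rounded up to position 12
print(percentile(scores, 50))  # L = 7, so average positions 7 and 8
```

Working with `p * n` as an integer avoids floating-point surprises when testing whether L is a whole number.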

Quartiles

The Quartiles divide the data set into four quarters. The data are first ordered.

Then find the middle of the data (which is the median) and that is Quartile 2 (abbreviated Q2). 50% of the data are below Q2 and 50% of the data are above Q2.

Q1 (the first quartile) is the point below which 25% of the data occur. (If there are an even number of scores, average the two middle scores)

Q3 (the third quartile) is the point below which 75% of the data occur.

Q4 (the fourth quartile mark) of course, would be the highest value in the list so that 100% of the data falls below that point.

Generally we only talk about Q1 and Q3. Instead of talking about Q2 we call it the Median and instead of Q4 we call it the Maximum.

Example 14.14 (p. 563 of text)

During the last year, 11 homes sold in the Green Hills subdivision. The selling prices, in chronological order, were:

$167,000 152, 128, 134, 192, 163, 121, 145, 170, 138, 155,

Find the Median and Quartiles for this situation.

Q2 is 30

Range (text p. 567)

Range : tells how spread out the data are. Take the difference (subtract) between the highest value and the lowest value of the data set. Notice that the range depends only on the most extreme values of the data: the highest (maximum) and the lowest (minimum) values.

Example: Find the range of this data set:

72 85.5 93.5 68 73.5 82.5 80 79.5 56.5 87.

Example: Draw the box plot for the data in the previous example:

InterQuartile Range (p. 568)

Interquartile Range : the difference (subtraction) between the third quartile and the first quartile, Q3 −Q1.

Question: What percent of all the data points must lie within the interquartile range? Why?
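The range and interquartile range can be computed together once the quartiles are known. The sketch below treats Q1, the median, and Q3 as the 25th, 50th, and 75th percentiles found with the locator method, which is one common convention and an assumption here; the helper name `locate` is hypothetical, and the data are the 14 test scores from the standard deviation example.

```python
def locate(ordered, p):
    """p-th percentile of an already sorted list (whole-number p, 0 < p < 100)."""
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2  # average d_L and d_(L+1)
    return ordered[whole]                                 # L rounded up


scores = sorted([1, 6, 7, 8, 8, 9, 10, 11, 12, 13, 14, 15, 16, 24])

q1, median, q3 = (locate(scores, p) for p in (25, 50, 75))
data_range = scores[-1] - scores[0]  # Max - Min
iqr = q3 - q1                        # Q3 - Q1

print("Five-number summary:", scores[0], q1, median, q3, scores[-1])
print("Range:", data_range)
print("Interquartile range:", iqr)
```

Notice how much smaller the interquartile range is than the range for these scores: the range is driven entirely by the two most extreme values.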

Standard Deviation (text p. 568)

Standard Deviation : the most important and most commonly used measure of spread for a data set. It represents the average amount that the data points differ (deviate) from the Mean.

Step 1 : Find the average (Mean) of the data set.

Step 2 : Make a table with three columns and as many rows as there are data values.

Step 3. In the first column, put the data values. It makes things easier if you order them from least to greatest.

Step 4 : In the second column put the answer to this computation (Data Value −Mean) for each row. Do all the subtractions in that order. Some of these values will be negative.

Step 5 : In the third column, square the value in the second column.

Step 6 : Total the third column. This is the sum of the squares of the deviations.

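Step 7 : The standard deviation, formally represented in statistics by the small Greek letter “sigma” (σ), is equal to the square root of the average of the squares of the deviations from the mean:

$$\sigma = \sqrt{\text{the average of (the squares of the deviations from the mean)}} = \sqrt{\frac{\text{total of column 3}}{n}}$$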

Example : Find the Standard Deviation for this data set of test scores: 1, 6, 7, 8, 8, 9, 10, 11, 12, 13, 14, 15, 16, 24
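The seven steps can be traced in Python as a check on hand computation. This is a sketch with illustrative names; it divides by n, as the steps above do, rather than by n − 1.

```python
import math


def standard_deviation(data):
    """Standard deviation following the table method: divide by n, then take the root."""
    n = len(data)
    mean = sum(data) / n                               # Step 1
    deviations = [x - mean for x in data]              # Step 4: data value - mean
    squared_deviations = [d ** 2 for d in deviations]  # Step 5
    total = sum(squared_deviations)                    # Step 6
    return math.sqrt(total / n)                        # Step 7


test_scores = [1, 6, 7, 8, 8, 9, 10, 11, 12, 13, 14, 15, 16, 24]
sigma = standard_deviation(test_scores)
print(f"mean = {sum(test_scores) / len(test_scores):g}, sigma = {sigma:.2f}")
```

Python's standard library function `statistics.pstdev` uses the same divide-by-n definition, so it can serve as an independent check.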
