Probability and Statistics: Sampling and Descriptive Statistics, Exercises of Probability and Statistics

The document covers topics related to probability and statistics, including the capture-recapture method of sampling, clinical studies, and descriptive statistics. It explains the logic of capture-recapture computations and the controlled clinical study methodology. The document also covers descriptive statistics, including frequency tables and relative frequency. It includes examples and questions to test understanding of the topics presented.

Typology: Exercises

2022/2023

Uploaded on 03/14/2023

tomseller
MA 105 Guided Notes – Module 1 – Probability and Statistics – p.17

Capture – Recapture Method of Sampling

Capture – Recapture is a common method used to estimate the size of a population by sampling. Biologists and ecologists use this method extensively to estimate wild animal populations.

• Step 1: Capture a sample of the animal you want to count in the area you want to know about. (Your book calls the number captured $n_1$.)
    o Tag all the captured animals (give them each an identifying mark)
    o Release them back into the wild
• Step 2: Recapture. After enough time for the released individuals to re-mix with the whole population, capture a new sample of individuals and count the number of tagged and the number of untagged individuals in this second sample.

The logic of capture-recapture computations: The computation is the proportion formed by two correctly stacked ratios. IF we can assume that the recaptured sample is representative of the whole population, then

$$\frac{\#\ \text{tagged in total population}}{\text{Total population}} = \frac{\#\ \text{tagged in the recapture sample}}{\text{Total in the recapture sample}}$$

Notice that we KNOW the # tagged in the total population: that is the number we tagged in the first sample.

Example 13.4 (reworded – this is the way such questions will appear on the exam)

A large pond is stocked with catfish. You capture 200 catfish, tag them, and release them. You wait long enough for the tagged fish to mix back in with the general population. Then you capture another sample. This sample has 250 catfish. Of the 250 catfish in this second sample, 35 have tags. If the second sample is representative of the catfish population in the pond, estimate the number of catfish in the pond.
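As a check on the proportion above, here is a minimal Python sketch applied to the catfish example. The function and parameter names are illustrative choices, not notation from the text, and the sketch assumes the second sample is representative of the pond.

```python
def estimate_population(tagged_first, second_sample_size, tagged_in_second):
    """Solve tagged_first / N = tagged_in_second / second_sample_size for N."""
    if tagged_in_second <= 0:
        raise ValueError("need at least one tagged animal in the recapture sample")
    return tagged_first * second_sample_size / tagged_in_second


# Catfish example: 200 tagged in the first capture,
# second sample of 250 fish, 35 of which carry tags.
estimate = estimate_population(200, 250, 35)
print(round(estimate))  # about 1429 catfish
```

The result is only an estimate: a different second sample would give a somewhat different number.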
I suggest you memorize this proportion rather than the formula on page 530 of your text.

Section 13.5 Clinical Studies (Clinical Trials)

Clinical studies do not collect data for the same purposes as surveys and censuses. Instead, clinical studies attempt to determine whether a single variable can cause a certain effect. New vaccines and drug treatments are put through clinical studies before being officially approved for public use. Things that are “unhealthy”, like cigarettes and caffeine, are officially identified as “unhealthy” after clinical studies show that people who include significant amounts of them in their lifestyle have more health problems than people who do not include them.

Controlled Clinical Study Methodology

A controlled clinical study uses two groups:
• treatment group (receives the actual treatment)
• control group (sometimes called the comparison group), which should differ from the treatment group only in that it does not receive the treatment.

Confounding Variable: a characteristic (not the one being studied) in which the control and treatment groups differ. When one is present, you can’t tell whether the effect was due to the characteristic being studied, to this other characteristic, or to a combination of both.

Randomized Controlled Study: subjects are randomly assigned to either the treatment or control group.

We can only deduce that the treatment CAUSES the effect if the treatment group experiences the effect and the control group does not experience the effect.

Placebo Effect: just the idea that one is getting treatment can produce positive results. People receiving a placebo (a harmless, inactive substance like a “sugar pill”) often report experiencing improvement.
Blind Study: The placebo effect cannot be eliminated, but it can be controlled by giving a placebo to the control group and conducting a blind study, in which neither the treatment group nor the control group knows whether it is getting the real treatment or the placebo.

Double Blind Study: The scientists conducting the study are also not aware of whether the participant is getting the real treatment or the placebo (participants and researchers are both “blind”).

Even clinical studies that are properly designed can lead to conflicting conclusions. But when clinical studies of the same variable, done in different labs by different groups, consistently reach the same conclusion, the clinical study method is persuasive.

Experimental Variable: Flu Vaccine
Effect: Prevents getting the flu
Causes???

Study #2: In order to determine the effectiveness of a new vaccine that is alleged to cure “math anxiety”, a clinical study was conducted. One thousand college students enrolled in math courses across the U.S. were chosen to participate in the study. The 1,000 students were broken up into two groups. Those enrolled in calculus courses or higher were given the real vaccine. The students in remedial and basic math courses were given a fake vaccine consisting of sugared water. None of the students knew whether they were being given the real or the fake vaccine, but the researcher conducting the experiment knew. At the end of the semester the students were given a test that measured their level of math anxiety. The students in the treatment group showed significantly lower levels of math anxiety than those in the control group. On the basis of this experiment the vaccine was advertised as being highly effective in fighting math anxiety.

1. The sampling frame in this study consists of
    A. the treatment and control groups
    B. all U.S. college students
    C. all U.S. college students enrolled in math classes
    D. all students that suffer from “math anxiety”
    E. None of the above

2. The target population in this study consists of
    A. treatment and control groups
    B. all U.S. college students
    C. all U.S. college students enrolled in math classes
    D. all students that suffer from “math anxiety”
    E. None of the above

3. The control group in this experiment consists of
    A. the 1,000 volunteer college students used for the study
    B. the students given the real vaccine
    C. the students given the fake vaccine
    D. This experiment has no control group because it used volunteers.
    E. None of the above

4. This experiment can best be described as a
    A. double blind randomized controlled experiment
    B. double blind controlled placebo experiment
    C. blind randomized controlled experiment
    D. blind controlled placebo experiment
    E. All of the above

5. The results of this experiment should be considered unreliable because
    F. only college students were used
    G. the treatment and control groups were not the same size
    H. the sample was too small
    I. the treatment and control groups represented two very different segments of the population
    J. None of the above

6. Which of the following is the most likely confounding variable for this experiment?
    K. the student’s background in mathematics
    L. the student’s grade level (freshman, sophomore, junior, senior)
    M. the type of college attended (two year, four year, university)
    N. the student’s sex (male, female)
    O. None of the above

Chapter 14: Descriptive Statistics

Descriptive Statistics: statistics that summarize or otherwise describe large amounts of numerical data.
• Present data visually as pictures or graphs
• Numerical summaries like measures of center and measures of spread

Data set: a collection of data values
Data points: the individual data values in the data set
Raw data: data as it was first gathered, before any summarizing or computational manipulation
N is the size of the data set (the population of data)
Frequency: how often a particular data value occurs
Outliers: extreme values in the data that do not fit the overall pattern of the data

Create a Frequency Tally

Make a frequency tally for this data set of exam scores. This is a first step in organizing data.

95, 90, 85, 90, 70, 15, 70, 50, 55, 80, 70, 80, 60, 45, 70, 75, 75, 75, 60, 65

Tally sheet (possible scores): 100, 95, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 10, 5, 0

Create a Frequency Table

For the frequency tally above, make a frequency table. Important: In a Frequency Table, you only include the scores that actually happened.

Score    Frequency

Relative Frequency

Relative Frequency: the percent of the total population that had that value, rather than the actual number that had that value. Relative frequency is used most commonly when the actual frequencies are very large numbers. This makes them easier to compare.

Important: If you are graphing relative frequencies, be careful to:
• Include the N-value in the title of the graph so that the actual data values can be reconstituted, if desired
• Label the frequency scale as “relative frequency” or “percent of total”

Example: Find the relative frequencies for these raw data. What do we need to find first?

Score    Frequency    Relative Frequency
5        3,500
4        2,000
3        1,250
2        1,000
1        250

Example: In a high stakes exam used for academic scholarship awards, N = 200,000 and the relative frequency of a perfect score is 0.04%. How many students made a perfect score on the exam?
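The frequency table and relative-frequency computations above can be sketched in Python. This is a minimal illustration using the data from these notes; the helper names `relative_frequencies` and `count_from_relative` are made up for this sketch.

```python
from collections import Counter

exam_scores = [95, 90, 85, 90, 70, 15, 70, 50, 55, 80,
               70, 80, 60, 45, 70, 75, 75, 75, 60, 65]

# Frequency table: only scores that actually occurred appear as keys.
frequency = Counter(exam_scores)
for score in sorted(frequency, reverse=True):
    print(score, frequency[score])


def relative_frequencies(freq):
    """Return each value's share of the total, as a percent."""
    total = sum(freq.values())  # N, the size of the data set
    return {value: 100 * count / total for value, count in freq.items()}


# Frequencies from the relative-frequency example (scores 5 down to 1).
table = {5: 3500, 4: 2000, 3: 1250, 2: 1000, 1: 250}
for score, percent in relative_frequencies(table).items():
    print(score, f"{percent}%")


def count_from_relative(n, percent):
    """Recover an actual frequency from N and a relative frequency in percent."""
    return n * percent / 100


print(round(count_from_relative(200_000, 0.04)))  # 80 students
```

The first thing to find in the relative-frequency example is N, the sum of the frequencies; every relative frequency is then a frequency divided by N.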
Creating Circle Graphs (Pie Charts)

Circle Graphs (Pie Charts) are good for showing the respective sizes of categories within a whole population. The circle represents the whole, and the sizes of the sectors (the “slices”) are proportional to the relative frequency of each category.

Remember 25% may be thought of as the ratio $\frac{25}{100}$. Also remember that a circle contains 360˚. It is often helpful to work with the reduced fraction for the percent rather than the number of degrees. So the basic proportion relationship when figuring out the size of each sector is:

$$\frac{\%}{100} = \frac{\text{size of sector in degrees}}{360\ \text{degrees}}$$

For example, a category that makes up 25% of the whole gets a sector of $\frac{25}{100} \times 360 = 90$ degrees.

Make a Circle Graph to represent the following data: N = 100,000 marbles that are either red, blue, green, yellow, or purple.

Type      Percent    Degrees
Red       25%
Blue      50%
Green     10%
Yellow    10%
Purple    5%

Section 14.2 Variables

Variable: in statistics a variable is any characteristic that varies within the population. Examples: what color, what size, what kind, how many of . . .

Categorical Variable: (qualitative variable) represents a quality that is not normally measured numerically. For instance, gender.
• Categorical variables can be counted and quantified, but we need to be very careful how we use those values and how we interpret them. We should not be giving a numerical average for variables like gender or hair color that are not actually numerical values to start with.

Numerical Variable: (quantitative variable) represents a measurable quality.
• Discrete numerical variables are characteristics that cannot be measured in “infinitely” small fractions of a value (counting actual people, test scores, shoe sizes)
• Continuous numerical variables are characteristics that can be measured in “infinitely” small fractions of a value (time, distance traveled, volume of liquid)
• In the real world this distinction is blurred by
    o Rounding off values that are actually continuous so they seem discrete (like measurements of length to the nearest quarter inch)
    o Calculations or subdividing that create as many decimal places as necessary (often with numbers of people; for example, 2.5 children per family is an average)

Practice: For each situation below, indicate whether the variable should be considered Categorical or Numerical. If Numerical, is it discrete or continuous?
(a) Hair color: blond, brown, black, red, grey
(b) Gender: male, female
(c) Shoe size: 4, 4½, 5, 5½, 6, 6½, 7, 7½, 8, …
(d) Ethnicity: Black, Native American, Hispanic, . . .
(e) Height in inches
(f) Gender when asked to record it as a “1” if female and a “2” if male

Histograms and Class Intervals (text pp. 555-558)

Histogram: a variation of a bar graph showing relative frequencies. Remember, relative frequencies are the percent of the total population that had that value.

In really large data sets where the raw values are very close together, we frequently group the raw data values into equal-sized classes and count how many raw data values fall in each of these class intervals.

Important: When creating histograms, it is mathematically correct to draw bars for adjacent categories touching one another (different from regular bar graphs). This is because the class intervals are continuous, whereas the original categories are discrete (letter grades like A, B, C, D, F are discrete, but class intervals like 89-100 are continuous).
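Grouping raw values into class intervals and counting each class can be sketched in Python. This minimal example uses the quiz averages and the letter-grade scale from the class practice in these notes. One assumption is labeled in the code: the scale lists whole numbers only, so an average that falls between two classes (say 80.5) is placed in the lower class here.

```python
from collections import Counter

quiz_averages = [72, 85.5, 93.5, 68, 73.5, 82.5, 80, 79.5, 56.5, 87.5,
                 89.5, 71, 79.5, 86, 75, 76.5, 83, 86.5, 78, 67]


def letter_grade(average):
    """Map a quiz average to its letter-grade class.

    Assumption for this sketch: averages between two printed classes
    (for example 80.5) go to the lower class, and anything below 61 is an F.
    """
    if average >= 91:
        return "A"
    if average >= 81:
        return "B"
    if average >= 71:
        return "C"
    if average >= 61:
        return "D"
    return "F"


# Frequency of each class, then the relative frequency as a percent.
class_counts = Counter(letter_grade(a) for a in quiz_averages)
n = len(quiz_averages)
for grade in "ABCDF":
    count = class_counts[grade]
    print(grade, count, f"{100 * count / n:.0f}%")
```

The heights of the histogram bars are these counts (or percents), not the quiz averages themselves.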
Example: GPAs for all students at MSUM would be an example of a large data set where the values are not well-separated. The values only run from 0 to 4 and are computed to 3 decimal places, so you get values like 3.725, 3.724, 3.726, 3.725, all of which are very close together and are pretty much the same GPA.

Teachers frequently make histograms of grades grouped by A, B, C, D, and F to look at the type of grade distribution in their classes. This means that A+, A and A- are all recorded in the same bar. This is another example of data where the values run together and can be categorized together.

Class Practice: Make a histogram of the following quiz averages. Use the final grade chart below to group the quiz averages into letter grades.

72, 85.5, 93.5, 68, 73.5, 82.5, 80, 79.5, 56.5, 87.5, 89.5, 71, 79.5, 86, 75, 76.5, 83, 86.5, 78, 67

Grading Scale:
A    91-100
B    81-90
C    71-80
D    61-70
F    60 and below

Remember that the frequencies are not the scores, but _______________________________.

Median

Median: The median is the other measure of center that you are responsible for.
• It is the value which occurs in the middle of the data when the data are put in numerical order. So it is the center of the data, physically. Half of the data values are above the median and half are below it in this ordered list.
• Computing the Median:
    o If there are an odd number of values in the ordered list, it is the center value.
    o If there are an even number of values in the ordered list, it is the average of the center two values.
• The Median is useful because it is a measure of center that is not influenced by extreme outlier values.

Class Practice:
1. Data Set: 5, 10, 8, 7, 4, 10, 9, 5, 7, 8, 2, 1, 7, 6, 10
    (a) What is the mean of this data set?
    (b) What is the median of this data set?
2. Data Set: 100, 90, 70, 90, 20, 80
    (a) What is the mean of this data set?
    (b) What is the median of this data set?
3. Data Set:

    Score        0    4    6    7    8    9    10
    Frequency    2    1    2    1    5    8    6

    (a) What is the mean of this data set?
    (b) What is the median of this data set?

Percentiles

Percentile: Percent and Percentile are NOT the same thing.
• On a test, a score of 90 percent means that you got, proportionally speaking, 90 out of 100 correct. It compares the amount you scored to the total possible score.
• A percentile rank compares how you did to how everyone else did. A “90th percentile” score means that, proportionally speaking, you did as well as or better than 90% of the people who took the test. It compares your score to all the other scores.

Compute a Percentile:

Step 1: List the data in numerical order, from least to greatest. Remember, Percentile tells the relative position in the ordered list of data. So think of the data as being an ordered list of values like $d_1, d_2, d_3, d_4, \ldots, d_N$, where each of these has a numerical value, but also a position value (the subscript).

Step 2: Compute the locator for the pth percentile using the formula $L = \left(\frac{p}{100}\right) N$.

Step 3:
• When L is a whole number: the pth percentile $= \frac{d_L + d_{L+1}}{2}$
• When L is not a whole number: the pth percentile $= d_{\lceil L \rceil}$ (the data value at position L rounded up)

Try It Yourself: Find the 80th percentile value for the following GPAs:

3.4, 3.9, 3.3, 3.6, 3.5, 3.4, 4.0, 3.7, 3.3, 3.8, 3.6, 3.9, 3.7, 3.4, 3.6

Quartiles

The Quartiles divide the data set into four quarters. The data are first ordered. Then find the middle of the data (which is the median); that is Quartile 2 (abbreviated Q2). 50% of the data are below Q2 and 50% of the data are above Q2.

10, 15, 20, 25, 25, 25, 30, 35, 40, 40, 40, 45, 50

Q1 (the first quartile) is the point below which 25% of the data occur. (If there are an even number of scores, average the two middle scores.)

Q3 (the third quartile) is the point below which 75% of the data occur.
Q4 (the fourth quartile mark), of course, would be the highest value in the list, so that 100% of the data falls below that point.

Generally we only talk about Q1 and Q3. Instead of talking about Q2 we call it the Median, and instead of Q4 we call it the Maximum.

For the ordered list above, Q2 is 30.

Example 14.14 (p. 563 of text)

During the last year, 11 homes sold in the Green Hills subdivision. The selling prices, in chronological order, were:

$167,000, $152,000, $128,000, $134,000, $192,000, $163,000, $121,000, $145,000, $170,000, $138,000, $155,000

Find the Median and Quartiles for this situation.
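The locator rule from the Percentiles section can be sketched in Python and applied to the home prices, treating Q1, the median, and Q3 as the 25th, 50th, and 75th percentiles. This is a minimal sketch rather than a definitive implementation: the function name is illustrative, and it assumes p is a whole-number percent strictly between 0 and 100. Other textbooks and software define quartiles slightly differently, so results may not match a calculator or spreadsheet.

```python
def percentile(data, p):
    """p-th percentile using the locator L = (p / 100) * N.

    Assumes p is a whole number with 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)  # Step 1: order the data
    # Step 2: L = p * N / 100, split into whole part and remainder
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # L is a whole number: average d_L and d_(L+1) (1-based positions)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is not a whole number: round L up and take the value at that position
    return ordered[whole]


prices = [167_000, 152_000, 128_000, 134_000, 192_000, 163_000,
          121_000, 145_000, 170_000, 138_000, 155_000]

print("Q1:", percentile(prices, 25))      # L = 2.75, so the 3rd value
print("Median:", percentile(prices, 50))  # L = 5.5, so the 6th value
print("Q3:", percentile(prices, 75))      # L = 8.25, so the 9th value

gpas = [3.4, 3.9, 3.3, 3.6, 3.5, 3.4, 4.0, 3.7,
        3.3, 3.8, 3.6, 3.9, 3.7, 3.4, 3.6]
print("80th percentile GPA:", round(percentile(gpas, 80), 2))  # L = 12
```

Because 50% of an odd-sized list gives a locator that is not a whole number, the rule picks the center value, which matches the definition of the median given earlier.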