An introduction to statistics 101, focusing on descriptive and inferential statistics and data collection. Students will learn about the importance of statistics, the distinction between descriptive and inferential statistics, and the process of collecting and analyzing data. The document also covers the concept of variables, their types, and data collection methods such as simple random sampling and stratified sampling.

Typology: Study notes

Pre 2010

Stat 104 Section B
Anna Peterson

Stat 104
• Instructor: Anna Peterson
• email: [email protected]
• Office Hours: Snedecor 2211 MWF 10:00-11:00
• Lecture: MWF 8:40-9:40 MoleBio 1420
• Laboratory: TF 1:20-3:20 Carver 0305
• Required Text: Just the Essentials of Elementary Statistics, 10th Edition, Robert Johnson & Patricia Kuby, Thomson: Brooks/Cole

Prerequisites
• Make sure you can do basic algebra.
– There will be a pre-test passed out.
– Understand summation notation.
– Order of operations.
• Make sure you can use a calculator.
– Bring your calculator to class and lab.

How can you do well in this class?
• Attend all lectures and pay attention.
• Attend all labs and participate.
• Complete all assignments.
• Go over answers to assignments.
• READ and STUDY the textbook.
• Come to office hours with questions.
• Form study groups with fellow students.

Course Information and Policies
• Exams: In-class exams will be given during the lab period. Bring a calculator, pencil/pen, formula paper, and the tables handed out in class.
– Exams will be given during the 2 hour lab period on Fridays.
– The exam is closed notes and book. One 8x11 sheet of paper, typed/written on one side, may be used for the first exam. Two 8x11 sheets of paper, typed/written on one side each, may be used for the second exam.
• Final Exam: August 7th
– Three 8x11 sheets of paper, typed/written on one side each, may be used on the final exam: one from each of the two previous exams and one for the new material.

Course Information and Policies
• Lab: Two two-hour laboratory sessions are scheduled each week. Bring your book, class notes, tables and a calculator to the lab.
• Class participation points are given for presenting a homework problem solution during lab. Each student is only required to present one solution.
You will receive 10 points for presenting.
• There are 11 labs this summer. I will take your top 8 lab scores towards your lab grade. Each lab is worth 5 points.
• Extra presentations and better lab attendance will influence boundary grades at the end of the semester.

What is Statistics?
In Business and Industry, statistics can be used to quantify unknowns in order to optimize resources, e.g.
– Predict the demand for products and services.
– Check the quality of items manufactured in a facility.
In Agriculture, statistics can be used to:
– Predict the crop yields.
– Estimate minimum fertilizer needed.

What Is Statistics?
• Statistics is about … variation.
– The world is full of data.
– Data exhibit variation.
– Recognizing, displaying and quantifying variation in data can help us make sense of the world.
– Try to explain variation.

We distinguish between descriptive and inferential Statistics:
Descriptive Statistics: The collection, presentation and description of data in the form of graphs, tables and numerical summaries such as averages, variances, etc.
• look for patterns
• summarize and present data
• quick information
• compare several groups, i.e. one can easily look for differences and similarities

We distinguish between descriptive and inferential Statistics:
Inferential Statistics: Deals with the interpretation of data as well as drawing conclusions and making generalizations based on data for a larger group of subjects.
• making data-based decisions
• generalizing information obtained from descriptive analysis to a larger group of individuals

Example: Before movies are released they are previewed by a selected audience. Assume 200 people are asked to provide an overall rating for a movie.
Results:
• 24% very satisfied
• 26% satisfied
• 33% in between
• 12% dissatisfied
• 5% very dissatisfied
"24% of the 200 previewers were very satisfied with the movie." (This is a descriptive statement based on a sample of 200 previewers.)
"24% of all people going to see the movie will be very satisfied." (This is an inferential statement for the entire population of individuals.)

Statistics is the science of collecting, describing (displaying), and interpreting data.

We collect data to answer a specific question of interest.
• Does nitrogen improve corn yield?
• What seed is best?
• What is the relationship between rainfall and yield?
• Does this new drug cure the disease? Is it safe?
• What do voters think about a candidate or an issue?
[Figure: axes labeled yield and nitrogen]

Does nitrogen improve corn yield?
We have a question that we would like to answer. Are we interested in all corn, just one brand of corn, or only corn grown in Iowa?
The group that is to be studied is called the population and each element of the population is called an individual.
We now decide that we are specifically interested in all corn types grown in Iowa. Is it feasible to collect data from every single corn field in the state of Iowa? No. Not enough time or money.
We look for a reasonable subset of the population called a sample. Perhaps one farm from each county in Iowa.
Population: All farms in Iowa.
Individual: A single farm.

Population – all items of interest. Example: All farms in Iowa.
Parameter – numerical value summarizing all the data of the entire population. Example: population mean yield of corn.
Sample – a few items from the population. Example: 10 farms in Iowa.
Statistic – numerical value summarizing the sample. Example: sample mean yield of corn.
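The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the yields below are simulated, the population size of 99 farms is made up, and none of the numbers come from the course or from real Iowa data.

```python
import random
import statistics

# Hypothetical population: corn yields (bushels per acre) for 99 farms.
# The values are simulated for illustration only; they are not real data.
rng = random.Random(104)  # fixed seed so the run is repeatable
population = [round(rng.gauss(180, 15), 1) for _ in range(99)]

# Parameter: a numerical value summarizing the entire population.
parameter = statistics.mean(population)

# Sample: a few items from the population (10 farms, as in the notes).
sample = rng.sample(population, 10)

# Statistic: a numerical value summarizing the sample.
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.1f}")
print(f"sample mean (statistic):     {statistic:.1f}")
```

In practice the parameter is unknown because the whole population cannot be measured; the sketch can compute it only because the population here is simulated.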
Qualitative (Categorical)
• Nominal: names an element
Examples: Gender, hair color, type of vehicle you own, favorite color
• Ordinal: incorporates an ordered position, or ranking
Examples: level of satisfaction with a product, heat setting on a microwave (low, med, high)

Quantitative (Numerical)
• Continuous: assumes an uncountable number of values
Examples: Height, weight, distance (measurements)
• Discrete: assumes a countable number of values
Examples: Age (in whole years), number of siblings, dozens of eggs (things you can count)

Variable
• Qualitative (Categorical): describes or categorizes an element of a population
– Nominal: names an element
– Ordinal: incorporates an ordered position, or ranking
• Quantitative (Numerical): quantifies an element of a population
– Discrete: assumes a countable number of values
– Continuous: assumes an uncountable number of values

Gender   Age   Job          Happiness (1-5, 5=very happy)   Number of Children   Average Salary/hour
F        25    Accountant   3                               0                    25.25
M        30    Sales        4                               2                    10.21
F        19    Student      2                               0                    7.25
F        24    Marketing    3                               1                    20.41
F        56    Teacher      5                               4                    15.3
M        32    Librarian    5                               0                    20.25
M        34    Accountant   3                               0                    25
M        45    Realtor      2                               2                    15.61
M        18    Student      5                               1                    7.5
F        62    Bus driver   4                               2                    12.4
F        34    Cashier      2                               3                    6.5

Variable
• Qualitative (Categorical): describes or categorizes an element of a population
– Nominal: Gender, Job
– Ordinal: Happiness (ordered by amount of happiness)
• Quantitative (Numerical): quantifies an element of a population
– Discrete: Age, Number of Children
– Continuous: Average Salary/hour

Example: Gallup News Service conducted a survey of 1012 adults aged 18 years or older, August 29-September 5, 2000. The respondents were asked, “Has anyone in your household been the victim of a crime in the past 12 months?” Of the 1012 adults surveyed, 24% said they or someone in the household had experienced some type of crime during the preceding year. Gallup News Service concluded that 24% of all households had been victimized by crime during the past year.
a) Identify the research objective.
To determine the proportion of households in the US that have been a victim of a crime in the past 12 months.
b) Identify the sample.
1012 adults aged 18 yrs or older.
c) List the descriptive statistics.
p̂: 24% of respondents stated that they or someone in the household experienced some type of crime.
d) What is the corresponding parameter?
p: The proportion of households that experienced some type of crime in the past 12 months.
e) State the conclusions made in the study.
24% of all households in the US have been victimized by crime in the past 12 months.
Notice that the conclusions are made (inferred) toward the entire population. We hope the statistic (p̂) is a good point estimate of the parameter (p).

Data Collection
• Sampling studies (surveys) – Ch 1.4
• Experiments – Notes only

Sample Surveys
• Idea 1: Examine a part of the whole.
– Easier to obtain
– Easier to work with
Population – all items of interest.
Sample – a few items from the population.

Properties of a Sample (part of a whole)
• Would like the sample to be representative of the population.
– Should look like a smaller version of the population
• This may not be possible, but at least we would like a sample that is not biased.
– A biased sample is one that over (or under) represents a certain portion of the population
– Telephone Surveys (how biased?)

Simple Random Sample (SRS)
• We want a representative sample but will settle for one that is not biased.
– Representative: sample resembles a smaller version of the population
– Unbiased: no group is under (or over) represented
• SRS, for example:
– Each combination of 400 ISU students has the same chance of being the sample selected.
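As a rough sketch of the SRS idea above, Python's standard `random.sample` draws without replacement so that every set of the requested size is equally likely. The frame below is hypothetical: the ID numbers and the frame size of 25,000 are assumptions for illustration, not ISU figures.

```python
import random

# Hypothetical list of student ID numbers to draw from.
# The size (25,000) is an assumption made for this illustration.
frame = list(range(25000))

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Simple random sample: each combination of 400 IDs has the same
# chance of being the sample selected (drawn without replacement).
srs = rng.sample(frame, 400)

# Drawing again gives a different sample of 400 students.
another = rng.sample(frame, 400)

print(sorted(srs)[:10])
print(sorted(another)[:10])
```

The seed is fixed only so that the output can be reproduced; dropping the seed gives a fresh sample on every run.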
SRS
• Sampling Frame: a list, or set, of the elements belonging to the population from which the sample will be drawn
– From example… Frame: A list of all students at ISU (the Registrar has such a list)
– Use random numbers to select 400 students at random from this list using unique ID numbers

A similar random number table can be found in Appendix B, Table 1, p. 658.

Problem: From a population of 400 individuals, we wish to select 10 individuals for our sample. Assign students a number from 0 to 399.
So we would sample individuals 162, 091, 170, 196, 130, 216, 336, 235, 027, 011.

Simple Random Sample
• If one were to do this more than once
– Different random numbers will give different samples of 400 students.
– We have introduced variability by sampling! (Remember statistics is about variability!!)

We can obtain a random sample by sampling with or without replacement.
Sampling without replacement
Once an individual is selected to be in the sample, it cannot be selected again. For instance, if we are using a deck of cards as the population and I draw a card and set it aside before selecting the next card, this is sampling without replacement.
Sampling with replacement
Once an individual is selected to be in the sample, the appropriate measurements are taken and then the individual is placed back into the population before selecting the next individual. Here it is possible for an individual to be selected more than one time. For example, if we are using a deck of cards as the population and I draw a card, record its suit, and then place it back in the deck before the next card is selected, this is sampling with replacement.

Other Sampling Plans
• Systematic: Select in a systematic way from the sampling frame, i.e., choose every kth individual.
– From example: Select every 60th student on the list from the Registrar.
– Caution: the list should be in random order and the starting point should be selected at random.
– Single-stage sampling plan

Other Sampling Plans
• Stratified Sample
– Multi-stage sampling plan
  • Separate into strata
  • SRS of each stratum
– Can do comparisons across strata
– Reduce error by grouping into strata
– From example: Divide ISU students into colleges and select an SRS from each college.

Stratified Sample
A stratified sample is obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous in some way.
Stage 1: Divide into alike strata.
Stage 2: SRS from each stratum.
[Figure: population divided into six strata (strata: Geographical Regions)]

Observational Studies
• Observational studies are those in which the researcher is a passive observer
• Simply observing what happens
– A sample survey is an observational study.
– There are other observational studies that are not surveys.
• Cannot make cause-and-effect inferences based on observations

Tanning and Skin Cancer
• 1,500 people.
• Some had skin cancer and some did not have skin cancer.
• Asked all participants whether they used tanning beds.

Diet and Blood Pressure
• Enroll 100 individuals in the study.
• Give each a diet diary. Everything eaten each day is recorded. From the diary entries the amount of sodium in the diet is calculated.
• Measure blood pressure.

Differences
• Retrospective – look at past records and historical data.
– Tanning and Skin Cancer
• Prospective – identify subjects and collect data as events unfold.
– Diet and Blood Pressure

Experiments
• Intentionally apply a treatment to individuals (referred to as experimental units)
• Attempts to isolate the effects of the treatment on a response variable
• Terminology…
– Explanatory variable (factor)
– Response variable
– Subjects (participants, experimental units)
– Treatments
• (Designed) Experiment: a controlled study in which one or more treatments are applied to experimental units. The experimenter then observes the effect of varying these treatments on a response variable.
• Experimental unit: a person, object, or some other well-defined item upon which a treatment is applied.
• Predictor (explanatory) variables are the factors that affect the response variable. Also referred to as independent variables.
• Treatment: a condition applied to the experimental unit (levels of the factors).

• Response variable is a quantitative or qualitative variable that represents the variable of interest. Also referred to as dependent variables.
• Extraneous variables are neither response nor predictor variables. These are variables that may affect the outcome of the experiment, but are not controlled by the experimenter.

Experiments
• The experimenter must actively and deliberately manipulate the factor(s) to establish the method of treatment.
• Interested in “What might happen if I change this factor?”
• Experimental units are assigned at random to the treatments.

Controlling Cholesterol
• Does a higher dose of a new drug lower cholesterol more?
– 30 participants.
– Factor – drug dose.
– Treatments: 10 mg or 20 mg.
– 15 subjects randomly assigned to each treatment.
– Response – change in cholesterol.

Diagram of an experiment
Subjects
→ random assignment
→ Group 1 (several subjects) receives Treatment 1; Group 2 (several subjects) receives Treatment 2
→ Compare
Notice that we have both random assignment AND replication within the experiment (several subjects within each group).

Example: A school psychologist wants to test the effectiveness of a new method for teaching reading. She selects five hundred first grade students in District 203 and randomly divides them into two groups. Group 1 is taught by means of the new method, while Group 2 is taught via traditional methods. The same teacher is assigned to teach both groups.
At the end of the year, an achievement test is administered and the results of the two groups compared.
1. What is the response variable in this experiment?
Achievement test score
2. What is the treatment? How many levels does the treatment have?
Method of teaching, 2 levels
3. Are any of the predictor variables controlled?
Grade, teacher, school district
4. How does the researcher’s design handle students from different socioeconomic levels?
Hold constant
5. Identify the experimental units.
500 students

Example: An experiment is run to study the effect of cooking time on the number of unpopped kernels in bags of microwave popcorn. A package containing eight bags of the same brand of microwave popcorn is used. The first bag, selected at random, is popped for 2 minutes on high. The bag is opened and the number of unpopped kernels counted. The next bag, again selected at random, is popped for 2 minutes and 15 seconds, the bag is opened and the number of unpopped kernels counted. Subsequent bags are selected at random and popped for 15 seconds longer than the previous bag and the number of unpopped kernels is counted. The same microwave, set on high, is used for all bags.
• Is this an observational study or a designed experiment? How do you know?
• What are the experimental units?
• What is the explanatory variable?
• What is the treatment and how many levels does the treatment have?
• What is the response variable?
• Is the response variable qualitative or quantitative, and if it’s quantitative, is it discrete or continuous?
• Are the three key “ingredients” taken care of? (Control, Random Assignment, Replication)

Multiple Factors
• Use of exam aids on scores on a Statistics exam.
• Experimental units: students in the Statistics class
• Factors
– Calculator: Yes or No (2 levels)
– Formula sheet: Yes or No (2 levels)
• Treatments – combinations of using or not using each exam aid.
• The students will be randomly assigned to the treatment groups so that there are several students in each group. All students will be given the same statistics exam, and the score on the exam will be the response variable.
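The two-factor design above can be sketched as follows: the treatments are all combinations of the factor levels, and students are shuffled and dealt out to the treatment groups. The class size of 20 and the student labels are hypothetical, chosen only so that each of the four groups gets several students.

```python
import itertools
import random

# Factors and their levels, as listed in the notes.
factors = {
    "calculator": ["yes", "no"],
    "formula_sheet": ["yes", "no"],
}

# Treatments: every combination of factor levels (2 x 2 = 4 treatments).
treatments = list(itertools.product(*factors.values()))

# Hypothetical class of 20 students (labels are made up for illustration).
students = [f"student_{i:02d}" for i in range(1, 21)]

# Random assignment: shuffle the students, then deal them out in turn so
# that each treatment group receives several students (replication).
rng = random.Random(78)  # fixed seed so the run is repeatable
rng.shuffle(students)
groups = {t: students[i::len(treatments)] for i, t in enumerate(treatments)}

for (calculator, formula_sheet), members in groups.items():
    print(f"calculator={calculator}, formula sheet={formula_sheet}: {members}")
```

Shuffling before dealing is one simple way to randomize; it also keeps the group sizes equal, which the notes do not require but which makes the comparison of exam scores across groups easier.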