



Ch1-3 review Material Type: Notes; Class: Statistical Methods; Subject: MATHEMATICS; University: Texas Tech University; Term: Fall 2011;
Typology: Study notes




Data is the information we gather with experiments and with surveys.
Ex: Say we want to know how well a group of students did in a statistics course. The data could be each student's final exam grade.

3 Statistical Methods
Design: planning how to obtain data. Ex: conduct an experiment/survey
Description: summarizing the raw data and presenting it in a useful format. Ex: charts, graphs, average (mean), median, etc.
Inference: making decisions or predictions based on data

Subjects are the entities that we measure in a study. Ex: people, schools, rats, etc.

Population vs. Sample
Population: all subjects of interest
Sample: subset of the population for whom we have data
Ex: Let's say we want to know how many Texas Tech students like coffee. To figure this out we surveyed 50 random students in the S.U.B.
Population: every Texas Tech student
Sample: the 50 random students we surveyed

Parameter vs. Statistic
Parameter: numerical summary of the population, usually unknown
Statistic: numerical summary of a sample taken from the population
Ex:
The average number of cigarettes smoked by all teenagers last year ------ parameter
The average number of cigarettes smoked by a sample of teenagers last year ------ statistic
The proportion of all teenagers who smoked in the last month ------ parameter
The proportion of teenagers who smoked last month out of 50 teenagers ------ statistic
After looking at all the cars in the North Commuter parking lot we conclude that 67% of the people who park in North Commuter drive trucks ------ 67% is a parameter
A survey of 50 car lots in America found that 35% of the cars on those lots are BMWs ------ 35% is a statistic

Randomness: each subject in the population has the same chance of being included in the sample. (Random sampling enables the sample to be a good reflection of the population.)
Ex: Let's say I want to know if everyone in the class understands a question, and I ask:
The top 10 scorers on the first exam ------ not random
Everyone sitting in the last row ------ not random
10 names picked at random off the attendance sheet ------ random

Variability
Note that measurements may vary from subject to subject and from sample to sample.
Ex: Say I want to know how the class did on an exam. If I take the average of everyone whose name starts with an 's', it will differ from the average of everyone whose name starts with an 'm'. That said, we get a more accurate idea of the population if we take larger samples.

Computer and Statistics
Data file: large sets of data are typically organized in a spreadsheet format
Database: an existing archived collection of data files
Applet: short application program for performing a specific task. Ex: random number generator
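The ideas of parameter, statistic, random sampling, and variability can be tried out with a short simulation. This is only a sketch: the population of exam scores, the seed, and the helper name sample_mean are made up for illustration and are not part of the notes.

```python
import random
import statistics

# Hypothetical population: exam scores for 500 students (made-up data).
rng = random.Random(2011)  # fixed seed so the run is repeatable
population = [rng.randint(40, 100) for _ in range(500)]

# Parameter: a numerical summary of the whole population.
parameter = statistics.mean(population)


def sample_mean(n):
    """Draw a simple random sample of size n and return its mean (a statistic)."""
    sample = rng.sample(population, n)  # every subject has the same chance
    return statistics.mean(sample)


# Variability: repeated samples give different values of the statistic.
small = [sample_mean(10) for _ in range(200)]
large = [sample_mean(100) for _ in range(200)]

print("population mean (parameter):", round(parameter, 2))
print("spread of sample means, n=10: ", round(statistics.stdev(small), 2))
print("spread of sample means, n=100:", round(statistics.stdev(large), 2))
```

The sample means from the larger samples cluster more tightly around the population mean, which is the sense in which larger samples give a more accurate idea of the population.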
Variable: any characteristic that is recorded for the subjects in a study
Categorical Variable: described by words. Ex: gender, marital status
Quantitative Variable: described by numbers. Ex: number of pets in a household, height
o Discrete Variable: the possible values are separate numbers that can be counted (often a finite list). Ex: number of pets in a household
o Continuous Variable: the possible values form an interval. Ex: height (All forms of measurement are continuous variables. Ex: time, height, volume)

Proportion
Frequency: number of times an observation has occurred.
Proportion/Relative frequency: frequency divided by the total number of observations. (The proportion will always be between 0 and 1.)
Percentage: proportion multiplied by 100
Ex: 4 students received an A out of 40 students
The frequency of getting an A is 4.
The proportion of students who got an A is 4/40 = 0.1.
The percentage of students who got an A is 0.1 x 100 = 10%.

Frequency Table
Columns: possible values of the variable, and the frequency/relative frequency (proportion) of each
Ex: The president of student council wanted to know how many hours Tech students party. Here were his results:

Number of Party Hours   0-1    2-3     3-4     4 or more
Count                   4      10      22      44

Variable of interest: Number of hours Tech students party
Type of variable: Quantitative
Discrete or Continuous: Continuous

Add proportions to the frequency table (each count divided by the total of 80):

Number of Party Hours   0-1    2-3     3-4     4 or more
Relative Frequency      0.05   0.125   0.275   0.55

Distribution
o A distribution tells us the possible values a variable takes as well as the occurrence of those values (frequency or relative frequency).
o A graph or frequency table describes a distribution.

Graphs for Categorical Variables
Pie Charts
Bar Graphs

Graphs for Quantitative Data
Dot Plot (small data set, discrete variable)
Stem-and-leaf plots (small data set, discrete variable)
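As a quick check on the party-hours table, the relative frequencies and percentages can be computed from the counts. This is a minimal sketch; the variable names are chosen here for illustration.

```python
# Counts from the party-hours frequency table above.
hours = ["0-1", "2-3", "3-4", "4 or more"]
counts = [4, 10, 22, 44]

total = sum(counts)  # 80 students in all

# Relative frequency (proportion) = frequency / total number of observations.
proportions = [round(c / total, 3) for c in counts]
# Percentage = proportion multiplied by 100.
percentages = [round(100 * c / total, 1) for c in counts]

for label, c, p, pct in zip(hours, counts, proportions, percentages):
    print(f"{label:>10}  count={c:>2}  proportion={p:<5}  percent={pct}%")
```

The proportions add up to 1, which is a handy way to catch arithmetic slips when filling in a frequency table by hand.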
Interquartile Range: IQR = Q3 - Q1
Z-Score: z = (observation - mean) / standard deviation
Outliers:
o An outlier falls far from the rest of the data.
o Outliers are represented in the tails of a distribution.
Detecting Potential Outliers:
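The detection rule itself is missing from this copy of the notes. One common convention, assumed here rather than taken from the notes, is the 1.5 x IQR criterion: flag an observation if it falls more than 1.5 x IQR below Q1 or above Q3. The sketch below applies it to made-up data and also computes z-scores. Quartile conventions differ between textbooks and software, so the "inclusive" method used here is just one choice.

```python
import statistics

# Made-up data set with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Quartiles; the "inclusive" method is one of several conventions,
# so other textbooks or software may give slightly different values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # IQR = Q3 - Q1

# Assumed rule (1.5 x IQR criterion): flag values more than
# 1.5 x IQR below Q1 or above Q3.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
potential_outliers = [x for x in data if x < low_fence or x > high_fence]

# z-score: how many standard deviations an observation is from the mean.
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation
z_scores = [(x - mean) / sd for x in data]

print("Q1, Q3, IQR:", q1, q3, iqr)
print("fences:", low_fence, high_fence)
print("potential outliers:", potential_outliers)
print("z-score of largest value:", round(z_scores[-1], 2))
```

Note that the outlier inflates both the mean and the standard deviation, so its own z-score can look smaller than expected. The IQR-based rule is less affected because quartiles are resistant to extreme values.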
Properties of the correlation r:
o r measures the strength of a linear relationship only.
o r is always between -1 and 1.
o r > 0 => positive association, r < 0 => negative association.
o r close to -1 or 1 => strong relationship, r close to 0 => weak relationship.
o r is unitless (does not depend on the variables' units).
o Two variables have the same correlation no matter which is treated as the response variable.
Squared correlation r^2: r^2 x 100% of the variation in y can be explained by x.
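The properties of r listed above can be checked numerically. The sketch below computes the Pearson correlation from its definition; the data (hours studied and exam score) and the function name correlation are made up for illustration.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 60, 61, 70, 74, 85]

r = correlation(hours, score)
r_swapped = correlation(score, hours)                    # swap x and y
r_minutes = correlation([60 * h for h in hours], score)  # change units of x

print("r =", round(r, 3))
print("r with x and y swapped =", round(r_swapped, 3))
print("r with hours converted to minutes =", round(r_minutes, 3))
print("r^2 =", round(r ** 2, 3))
```

Swapping the variables or changing the units of x leaves r unchanged, matching the last two properties in the list.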
positive association => +, negative association => -
ŷ (y-hat) is the predicted value of y when x is given.
(A residual is the vertical distance between the point and the regression line.)
(slope, b > 0 => positive association, b < 0 => negative association)
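The regression formulas did not survive in this copy of the notes. The sketch below assumes the standard least-squares line y-hat = a + b*x, with slope b = r * (s_y / s_x) and intercept a = mean(y) - b * mean(x); these formulas and the data are assumptions added for illustration, not recovered from the notes.

```python
import math

# Made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

s_x = math.sqrt(sxx / (n - 1))  # sample standard deviation of x
s_y = math.sqrt(syy / (n - 1))  # sample standard deviation of y
r = sxy / math.sqrt(sxx * syy)  # correlation

b = r * s_y / s_x        # slope (assumed standard formula)
a = mean_y - b * mean_x  # y-intercept (assumed standard formula)


def predict(x_value):
    """Predicted value y-hat for a given x."""
    return a + b * x_value


# Residual = observed y minus predicted y (vertical distance to the line).
residuals = [yi - predict(xi) for xi, yi in zip(x, y)]

print("slope b =", round(b, 3), " intercept a =", round(a, 3))
print("residuals:", [round(e, 3) for e in residuals])
```

The slope has the same sign as r, and the residuals from a least-squares line sum to zero (up to rounding).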
Regression Outlier: an outlier that lies far away from the trend that the rest of the data follows
An observation is influential if:
o Its x value is relatively low or high compared to the remainder of the data
o The observation is a regression outlier
Lurking Variable: usually unobserved, influences the association between the variables of primary interest
Simpson's Paradox: when the direction of an association between two variables changes after we include a third variable and analyze the data at separate levels of that variable.
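Simpson's paradox is easiest to see with numbers. The counts below are illustrative only (not from the notes): two hypothetical treatments, A and B, with success counts split by a third variable, case difficulty.

```python
# Illustrative counts (not from the notes): (successes, total cases).
results = {
    "A": {"easy": (81, 87), "hard": (192, 263)},
    "B": {"easy": (234, 270), "hard": (55, 80)},
}


def rate(successes, total):
    """Success proportion."""
    return successes / total


def overall(treatment):
    """Success proportion ignoring the third variable (case difficulty)."""
    groups = results[treatment].values()
    return rate(sum(s for s, _ in groups), sum(t for _, t in groups))


# Compare the treatments at each level of the third variable.
for level in ("easy", "hard"):
    rate_a = rate(*results["A"][level])
    rate_b = rate(*results["B"][level])
    print(f"{level}: A = {rate_a:.3f}, B = {rate_b:.3f}")

# Compare again with the levels combined.
print(f"overall: A = {overall('A'):.3f}, B = {overall('B'):.3f}")
```

Treatment A has the higher success rate within both the easy and the hard cases, yet B looks better overall, because A was used mostly on hard cases. Case difficulty acts as the third variable that reverses the direction of the association.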