Statistical Methods - Study Material | MATH 2300, Study notes of Data Analysis & Statistical Methods

Ch1-3 review Material Type: Notes; Class: Statistical Methods; Subject: MATHEMATICS; University: Texas Tech University; Term: Fall 2011;

Typology: Study notes

2010/2011

Uploaded on 10/12/2011

zhen-zhang

Chapter 1-3 review
Data is the information we gather with experiments and with surveys.
Ex: Say we want to know how well a group of students did in a statistics course. The data could
be each student’s final exam grade.
3 Statistical Methods
Design: planning how to obtain data. Ex: conduct an experiment/survey
Description: summarizing the raw data and presenting it in a useful format. Ex: charts,
graphs, average (mean), median, etc.
Inference: making decisions or predictions based on data
Subjects are the entities that we measure in a study.
Ex: people, schools, rats, etc.
Population vs. Sample
Population: all subjects of interest
Sample: subset of the population for whom we have data
Ex: Let’s say we want to know how many Texas Tech students like coffee. To figure this out we
surveyed 50 random students in the S.U.B.
Population: every Texas Tech Student
Sample: the 50 random students we surveyed
Parameter vs. Statistic
Parameter: numerical summary of the population, usually unknown
Statistic: numerical summary of a sample taken from the population
Ex: The average number of cigarettes smoked by all teenagers last year -------- parameter
The average number of cigarettes smoked last year by a sample of teenagers ------ statistic
The proportion of all teenagers who smoked in the last month ------ parameter
The proportion of teenagers who smoked last month out of 50 teenagers ------ statistic
After looking at every car in the North Commuter parking lot, we conclude that 67% of the
people who park in North Commuter drive trucks -------- 67% is a parameter
A survey of 50 car lots in America found that 35% of the cars in those lots are
BMWs --------- 35% is a statistic
Randomness: each subject in the population has the same chance of being included in the
sample (Random sampling enables the sample to be a good reflection of the population)
Ex: Let’s say I want to know if everyone in the class understands a question
The top 10 scorers on the first exam ------ not random
Everyone sitting in the last row ------- not random
Picking 10 names at random off the attendance sheet ------ random
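The "10 names off the attendance sheet" idea can be sketched in a few lines of Python. This is only an illustration: the roster of 40 names and the helper `simple_random_sample` are made up for the example, and the seed is fixed only so the draw can be repeated.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct subjects so that every subject has the same chance of being picked."""
    rng = random.Random(seed)          # a private generator, so the seed does not affect other code
    return rng.sample(population, k)   # sampling without replacement

# Hypothetical attendance sheet with 40 names (not from the notes)
roster = [f"student_{i:02d}" for i in range(1, 41)]

chosen = simple_random_sample(roster, 10, seed=2011)
print(chosen)
```

Compare this with "the top 10 scorers" or "the last row": there the choice depends on a characteristic of the students, so not everyone has the same chance of being included.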
Variability
Note that measurements may vary from subject to subject and from sample to sample.
Ex: Say I want to know how the class did on an exam. If I take the average of everyone whose
name starts with an ‘s’, it will likely differ from the average of everyone whose
name starts with an ‘m’.
That said, we can get a more accurate idea of the population if we take larger samples.
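A small simulation can make the point about sample size concrete. It is a sketch under made-up assumptions: the population is 1,000 simulated exam scores (not real data), and `spread_of_sample_means` is a hypothetical helper that measures how much the sample average changes from sample to sample.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 1,000 simulated exam scores (made up for illustration)
population = [rng.gauss(75, 10) for _ in range(1000)]
parameter = statistics.mean(population)   # population mean, normally unknown

def spread_of_sample_means(population, n, repetitions, rng):
    """Draw many random samples of size n and return the standard deviation of their means."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(population, 5, 500, rng)    # samples of 5 students
spread_large = spread_of_sample_means(population, 50, 500, rng)   # samples of 50 students

print(f"population mean (parameter): {parameter:.2f}")
print(f"spread of sample means, n=5:  {spread_small:.2f}")
print(f"spread of sample means, n=50: {spread_large:.2f}")
```

The sample means (statistics) vary from sample to sample, but they vary less around the population mean (parameter) when each sample is larger.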
Computers and Statistics
Data file: large sets of data are typically organized in a spreadsheet format
Database: an existing archived collection of data files
Applet: short application program for performing a specific task. Ex: random number generator

Variable: any characteristic that is recorded for the subjects in a study
Categorical Variable: described by words. Ex: gender, marital status
Quantitative Variable: described by numbers. Ex: number of pets in a household, height
o Discrete Variable: the possible values are separate, countable numbers. Ex: number of pets in a household
o Continuous Variable: the possible values form an interval. Ex: height
(All forms of measurement are continuous variables. Ex: time, height, volume)
Proportion
Frequency: number of times an observation has occurred.
Proportion/Relative frequency: frequency divided by the total number of observations (The proportion will always be between 0 and 1.)
Percentage: proportion multiplied by 100
Ex: 4 students received an A out of 40 students
The frequency of getting an A is 4.
The proportion of students who got an A is 4/40 = 0.1.
The percentage of students who got an A is 0.1 × 100 = 10%.
Frequency Table
Lists the possible values of the variable with the frequency/relative frequency (proportion) of each
Ex: The president of student council wanted to know how many hours Tech students party. Here were his results:
Number of Party Hours    0-1     2-3     3-4     4 or more
Count                    4       10      22      44
Variable of interest: Number of hours Tech students party
Type of variable: Quantitative
Discrete or Continuous: Continuous
Add proportions to the frequency table (each count divided by the total, 4 + 10 + 22 + 44 = 80):
Number of Party Hours    0-1     2-3     3-4     4 or more
Relative Frequency       0.05    0.125   0.275   0.55
Distribution
o A distribution tells us the possible values a variable takes as well as the occurrence of those values (frequency or relative frequency).
o A graph or frequency table describes a distribution.
Graphs for Categorical Variables
Pie Charts
Bar Graphs
Graphs for Quantitative Data
Dot Plot (small data set, discrete variable)
Stem-and-leaf plots (small data set, discrete variable)
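The step from frequency to proportion to percentage for the party-hours table can be checked with a short Python sketch. The counts are copied from the table in these notes; the helper name `relative_frequencies` is made up for the example.

```python
# Counts copied from the party-hours frequency table in these notes
counts = {"0-1": 4, "2-3": 10, "3-4": 22, "4 or more": 44}

def relative_frequencies(freqs):
    """Divide each frequency by the total number of observations."""
    total = sum(freqs.values())
    return {category: count / total for category, count in freqs.items()}

proportions = relative_frequencies(counts)
percentages = {category: 100 * p for category, p in proportions.items()}

for category in counts:
    print(f"{category:>10}: count={counts[category]:>2}  "
          f"proportion={proportions[category]:.3f}  "
          f"percentage={percentages[category]:.1f}%")
```

Each proportion is between 0 and 1, and the proportions across all categories add up to 1.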

Interquartile Range: IQR = Q3 – Q1
Z-Score: $z = \frac{x - \bar{x}}{s}$
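As a rough illustration, the IQR and the z-scores, z = (x - x̄)/s, can be computed with Python's standard `statistics` module. The nine exam scores are made up. One caution: software packages and textbooks use slightly different conventions for locating quartiles, so `statistics.quantiles` (default method) may not match a hand calculation exactly.

```python
import statistics

# Hypothetical exam scores (not from the notes)
data = [61, 68, 70, 72, 75, 77, 80, 84, 95]

# Quartiles: the three cut points that split the sorted data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# z-score: how many sample standard deviations each value lies from the mean
x_bar = statistics.mean(data)
s = statistics.stdev(data)            # sample standard deviation (divides by n - 1)
z_scores = [(x - x_bar) / s for x in data]

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
for x, z in zip(data, z_scores):
    print(f"x={x:>3}  z={z:+.2f}")
```

Values above the mean get positive z-scores and values below it get negative ones.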

Outliers:
o An outlier falls far from the rest of the data.
o Outliers are represented in the tails of a distribution.
Detecting Potential Outliers:

  • 1.5 × IQR rule: x < Q1 – 1.5 × IQR or x > Q3 + 1.5 × IQR
  • z-score rule: z < -3 or z > 3
Five-number summary: min (not including potential outliers), Q1, Q2 (the median), Q3, max (not including potential outliers)
Box Plot: drawn using the five-number summary; each horizontal line extending from the box corresponds to a tail.
Resistant Measures: A numerical summary measure is resistant if extreme observations (outliers) have little, if any, influence on its value.
Ex: resistant to outliers: median
not resistant to outliers: mean, range, standard deviation, linear correlation
Two variables:
Response variable
Explanatory variable
Association
An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.
Association between Two Categorical Variables
Contingency Table
Conditional proportion
Association between Two Quantitative Variables
Scatter Plot:
Horizontal axis: explanatory variable, x
Vertical axis: response variable, y
Trend: linear, curved, clusters, no pattern
Direction:
Positively associated: y increases as x increases
Negatively associated: y decreases as x increases
Strength: linear correlation

$r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)$

r only measures the strength of a linear relationship.
r is always between -1 and 1.
r > 0 => positive association, r < 0 => negative association.
r close to -1 or 1 => strong relationship, r close to 0 => weak relationship.
r is unitless (does not depend on the variables’ units).
Two variables have the same correlation no matter which is treated as the response variable.
Squared correlation r^2: r^2 × 100% of the variation in y can be explained by x.
$r = \pm\sqrt{r^2}$ (positive association => +, negative association => -)
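Several of these properties of r can be checked numerically. The sketch below computes r as the sum of products of z-scores divided by n - 1, using made-up data (hours studied and exam scores); the function name `correlation` is just a label for this example.

```python
import statistics

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data (not from the notes)
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 61, 70, 74, 85]

r = correlation(hours, scores)
r_swapped = correlation(scores, hours)     # swap explanatory and response
minutes = [60 * h for h in hours]          # same variable in different units
r_minutes = correlation(minutes, scores)

print(f"r = {r:.4f}, r^2 = {r**2:.4f}")
print(f"swapped roles: {r_swapped:.4f}")
print(f"hours converted to minutes: {r_minutes:.4f}")
```

Swapping the two variables or changing units leaves r the same, which matches the statements that r is unitless and does not depend on which variable is treated as the response.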

Regression Line: $\hat{y} = a + bx$

$\hat{y}$ is the predicted value of y when x is given.

Residual: $y - \hat{y}$ (measures the size of the prediction error, the vertical distance between the point and the regression line)

$b = r \frac{s_y}{s_x}$ (slope, b > 0 => positive association, b < 0 => negative association)

$a = \bar{y} - b\bar{x}$ (y-intercept, predicted value for y when x = 0)
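The slope and intercept formulas, b = r(s_y/s_x) and a = ȳ - b x̄, can be tried out on a small made-up data set. This is a minimal sketch; `least_squares_line` and the data are hypothetical and only meant to show the formulas at work.

```python
import statistics

def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # y-intercept
    return a, b

# Hypothetical data (not from the notes)
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 61, 70, 74, 85]

a, b = least_squares_line(hours, scores)
predicted = [a + b * x for x in hours]                          # y-hat for each x
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]  # y - y-hat

print(f"y-hat = {a:.2f} + {b:.2f}x")
for x, y, e in zip(hours, scores, residuals):
    print(f"x={x}  y={y}  residual={e:+.2f}")
```

A positive residual means the point lies above the line and a negative one means it lies below. For a line fitted this way, the residuals add up to zero apart from rounding.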

Regression Outlier: an observation that lies far away from the trend that the rest of the data follows
An observation is influential if:
o Its x value is relatively low or high compared to the remainder of the data
o The observation is a regression outlier
Lurking Variable: a variable, usually unobserved, that influences the association between the variables of primary interest
Simpson’s Paradox: occurs when the direction of an association between two variables changes after we include a third variable and analyze the data at separate levels of that variable.
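Simpson’s Paradox is easier to believe after seeing numbers. The counts below are invented purely for illustration: two treatments (A and B) are compared within two groups, and the group plays the role of the third variable.

```python
# Made-up counts: (successes, trials) for treatments A and B within each group
data = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, trials):
    """Proportion of successes."""
    return successes / trials

def pooled_rate(data, treatment):
    """Success proportion for one treatment when the groups are combined."""
    successes = sum(group[treatment][0] for group in data.values())
    trials = sum(group[treatment][1] for group in data.values())
    return successes / trials

# Within each level of the third variable, A has the higher success rate
for name, group in data.items():
    print(f"{name}: A={rate(*group['A']):.2f}  B={rate(*group['B']):.2f}")

# Ignoring the third variable, the direction reverses
print(f"combined: A={pooled_rate(data, 'A'):.2f}  B={pooled_rate(data, 'B'):.2f}")
```

A does better than B inside each group, yet B looks better when the groups are combined, because most of A’s trials fall in the group where success is rare for both treatments.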