Poli 8130 class notes, Lecture notes of Quantitative Techniques

Class notes to Quantitative Stats

Typology: Lecture notes

2015/2016

Uploaded on 04/04/2016

Kimberly.Payne
~~~~~~~~~~ Class #2: 1/19/2011 ~~~~~~~~~~
Frequency Distributions (e.g. polling distributions)
- Small “n” is the sample size
- Large N = total population
- Remember that SPSS will always give us a large N
- Never cut and paste data from SPSS into our articles; always recreate it
Data Charts
- In our affiliation we tend NOT to use pie charts, but rather bar charts.
Cumulative Frequency Column = you add up the relative frequencies as you go, reaching 100% at the end.
*The discipline, as a rule, prefers fancy statistics, but you can sometimes use simple statistics. E.g. of simple statistics: “After Sept 11, 2001, American Civic Attitudes…”
Measures of Central Tendency (also referred to as “the average”)
- For nominal data use the mode (the value that appears most often)
- For ordinal data (or interval/ratio data with extreme outliers), use the median (the value at the 50th percentile). E.g. Bill Gates’s income.
- For interval/ratio data, use the mean: x-bar for sample data, mu (μ) for population data. Regular X = some variable. Little x means an observation (e.g. x1 = observation 1).
- Summation sign means add up all your observations; then divide the sum by the number of observations. *Use decimals, fractions are for children*
Practice: Sample of people walking around Auburn. Asked gross income, gender, and years of school.
What is the most appropriate measure of central tendency? Calculate it.
Gross Income: 12, 20, 35, 45, 65, 104 = 40 as median
Gender: F as mode
Years in School: 14, 16, 19, 9, 20, 18 = 16 as mean
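The two numeric practice answers can be rechecked with a few lines of code. This is only a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and gender is left out because the individual responses are not recorded in these notes.

```python
import statistics

# Values copied from the practice problem in the notes.
gross_income = [12, 20, 35, 45, 65, 104]
years_in_school = [14, 16, 19, 9, 20, 18]

# Income is treated with the median, as in the notes. With an even number
# of observations the median is the average of the two middle values
# after sorting: (35 + 45) / 2.
income_median = statistics.median(gross_income)

# Years of school is interval/ratio data, so the mean is used:
# sum of the observations divided by the number of observations.
school_mean = statistics.mean(years_in_school)

print("Median gross income:", income_median)
print("Mean years in school:", school_mean)
```

`statistics.mode` would handle the nominal variable in the same way if the raw gender responses were available.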
POLI 8130 – Quantitative Stats II

Means with Grouped Data
- f = the frequency in each category. Divide by the sum of the frequencies (the number of individuals) rather than by the number of categories, because this is grouped data.
Measures of Dispersion: Range
- Distance in absolute value between the smallest and largest values. We always figure out the range when we first look at a table.
- Example: {4,7,8,2,1,1,6,5,9}, so R = |1-9| = 8
- You can’t use the range when you have nominal data (counting doesn’t mean anything).
- The range tells us little when we have extreme outliers.
- It depends on the scale.
Measures of Dispersion: Variation Ratio (not in book; we tend not to use this in our discipline)
- Use with nominal data
- V = 1 – (n of modal category / n)
- Example:
- Rarely reported, but may occasionally come up, which is why we learned it.
Measures of Dispersion: Mean Absolute Deviation (not really used, but always in textbooks as a nice way to talk about means and standard deviations)
- You add up the absolute distances between the mean and each value in your sample and then divide by your sample size. You are just looking at the distance of each observation from the mean.
- Use with interval data(ish)… rarely used.
- Example: {0,1,2,3} has a mean of 1.5 and gives a mean absolute deviation of 1.
Measures of Dispersion: Variance and Standard Deviation
σ^2 = population variance
σ (sigma) = population standard deviation
s^2 = sample variance
s = sample standard deviation
- Use with interval/ratio data
- This tells us if the data are widely spread or tight.
The Empirical Rule
SPSS: Analyze → Descriptive Statistics → Frequency → Select Region → Click on everything for fun
To remove DC from the data set for persons per sq. mile because it is an outlier, do the following:
Transform → Compute Variable → Target Variable (new name for the list you are changing) → Find the new variable column and delete the value you want to remove → Run Analyze again on the new variable
**Next class**:
- hand in homework 1
- prepare for quiz 1
- reading
Note for Homework: SPSS will code “no” as negative numbers; if you do not remove them, it will give you the wrong answers.
ANES_EGSS2_Prelimshort.sav [DataSet2]
Under “What Public Officials Care of What You Think”, get rid of all these negatives (-1, -2, -5, -6, -7) by doing this: either Recode into Same Variables OR Recode into Different Variables. Recode into Same Variables is easier.
Incomplete: Transform → Recode into Different Variables → C2_P1 → Old and New Values (make -7 become “System Missing”) → Rename C2_P1_Revised → Click Change → Click OK.
Can use Logit, SAS, or other software that you are comfortable using. SPSS Grad Pack limits the number of observations and the data. OR create 6 new e-mail addresses and download the software. SPSS is available on the computer lab computers, and you can get it on office computers.
----------- Class 3: 1/24/12 ----------
- Handed in HW 1
- Took Quiz 1 (calculated measures of central tendency and variability by hand)
- Rec’d handout of p-values
Lecture:
- Tonight we are discussing probability
- We’ll start the first half deductively: assume the world to be true, get to rules, and figure out which situations we’d apply those rules to in the real world.
- In the second half of the class we’ll study random variables.
Set Theory
- Basics of Notation
- E is the set, and e1, e2, and e3 are events in the set, so: E = {e1,e2,e3}
- Union (OR) (∪): If A={a1,a2,a3} and B={a3,b2,b3} then A∪B = {a1,a2,a3,b2,b3}
■ Pr(A∪B) = Pr(A) + Pr(B) – Pr(A∩B)
- Intersection (AND) (∩): A∩B = {a3}
- Complement
- Symbolic Example
- S={e1,e2,e3,e4,e5,e6,e7,e8}. e7 and e8 are simple events because they only happen once, but the following are complex events:
■ A={e1,e2,e3}
■ B={e3,e4,e5}
■ C={e6}
- (A∪B) = (A) + (B) – (A∩B)
■ = {e1,e2,e3} + {e3,e4,e5} – {e3} = {e1,e2,e3,e4,e5}
- Using the vote table under Conditional Probability below: let A = the set of all voters who voted for McCain, and let B = the set of all voters who were female and voted for McCain or Obama
■ “What is the probability of A OR B?”
- Pr(A) = .186 + .191 = 0.377 or 37.7%
- Pr(B) = .191 + .282 = 0.473 or 47.3%
- Pr(A∪B) = 37.7 + 47.3 – 19.1 = 65.9%
■ “What is the complement of A?”
- 1 – (.186 + .191) = 0.623 or 62.3%
- Conditional Probability: Pr(A|B) = Pr(A∩B)/Pr(B)
- The probability of event A given that event B has occurred.

         McCain   Obama   Green   Other
Male     .186     .173    .087    .
Female   .191     .282    .074    .005

- Let A = the set of all voters who voted for McCain
■ Pr(A) = .186 + .191 = 37.7%
- Let B = the set of all voters who were female
■ Pr(B) = .191 + .282 + .074 + .005 = 55.2%
- Pr(A|B) = .191/.552 = 34.6%
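The union, complement, and conditional-probability calculations from the vote table can be rechecked in code. The sketch below is illustrative only: the dictionary and variable names are assumptions, and it uses just the cells that the notes list explicitly (the male McCain cell and the four female-row addends).

```python
# Joint probabilities copied from the notes. Only the cells that the
# calculations need are included.
female = {"McCain": 0.191, "Obama": 0.282, "Green": 0.074, "Other": 0.005}
male_mccain = 0.186

# A = voted for McCain (male or female).
pr_a = male_mccain + female["McCain"]

# Complement of A: everyone who did not vote for McCain.
pr_not_a = 1 - pr_a

# Union example: B = female AND voted for McCain or Obama.
# Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B); the overlap is female McCain voters.
pr_b_union_example = female["McCain"] + female["Obama"]
pr_a_or_b = pr_a + pr_b_union_example - female["McCain"]

# Conditional example: B = voter is female (the whole female row).
pr_female = sum(female.values())
pr_a_given_female = female["McCain"] / pr_female

print(f"Pr(A) = {pr_a:.3f}")
print(f"Pr(not A) = {pr_not_a:.3f}")
print(f"Pr(A or B) = {pr_a_or_b:.3f}")
print(f"Pr(female) = {pr_female:.3f}")
print(f"Pr(A | female) = {pr_a_given_female:.3f}")
```

Subtracting the overlap once in the union step is what keeps female McCain voters from being counted twice.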
- Statistical Independence
- A is statistically independent of B if Pr(A|B)=Pr(A)
- Random Variables
- Y=f(x)=x^2: “y” and “f of x” are interchangeable terms
- X is a random variable. Little x is a particular observation of X.
- All random variables have characteristics called parameters (fixed constants that describe things like the form and distribution of the variable) and moments (the mean, the variance, etc. which describe how the variables are distributed).
- Two types of variables we will be concerned about in this class: discrete random variables and continuous random variables.
- Discrete Random Variables
- Discrete random variables take on discrete values, most simply 0 and 1
- Discrete random variables have:
■ A mass function: assigns probabilities to the numbers that correspond to these events (e.g. heads = 0 and tails = 1)
■ A cumulative distribution: the running sum of those probabilities, which is also bounded by 0 and 1
- Binomial Distribution
- A binomial distribution is a type of random variable which describes the number of times an outcome occurs in a finite number of events of exactly the same type, each with an equal probability of occurring.
- Binomial distributions have 2 parameters:
■ N
■ Pr or π (π is not 3.14 in this case; it is the probability that an event will occur)
- f(x) = Pr(X=x) = [N!/(x!(N-x)!)] * π^x * (1-π)^(N-x)
- Binomial Distribution Example
- We want to model the number of times you will vote for president in a given election. There are two opportunities, the primary and general elections, which is a finite number. In addition, there’s an 80% chance that you will come out to vote. N represents the number of times you’ll look for a particular outcome (here it is 2, the number of…)
- **Note: Remember anything raised to the 0 power = 1.
- Working out a table for yourself you can do this:

X   N!   X!   (N-x)!   π^x   (1-π)^(N-x)   Mass function f(x)=Pr(X=x)   Distribution F(x)=Pr(X<=x)
0   2    1    2        1     .2^2 = .04    .04                          .04
1   2    1    1        .8    .2^1 = .20    .32                          .36
2   2    2    1        .64   .2^0 = 1      .64                          1.00 (100%)

- What is the probability of voting 1 times OR 0 times? Pr(0∪1) = .04 + .32 = .36
- What is the probability of voting 2 times OR 0 times? Pr(0∪2) = .04 + .64 = .68
- **The binomial distribution table replaces this**
■ On the table (page 668 of math book copy), find n = 2, then go to the probability of .80 and see that the values for 0, 1, 2 are the same: .040, .320, and .640
- Continuous Random Variable
- A continuous random variable is another type of random variable and this one can take an infinite number of values.
- Where a discrete random variable has a mass function, a continuous random variable has a density.
■ It is bounded by 0 and 1, and
■ Pr(X=x, Y=y) = Pr(X=x)Pr(Y=y)
■ Pr(X=1, Y=1): Pr(X=1)Pr(Y=1) = .625*.250 = .156
■ Pr(X=1, Y=0): Pr(X=1)Pr(Y=0) = .625*.750 = .469
- If all 4 boxes match, then this means that the variables are statistically independent.

X/Y   1       0
0     0.156   0.
1     0.094   0.

End-of-class notes:
- The homework is on Canvas; it does not require SPSS this week.
----------- Class 4: 1/31/12 ----------
- No class on the 14th; Dr. Brown has to interview POLI candidates
- Forget the schedule: 1 less HW, 1 less quiz, so things will be weighted slightly differently
- Doing Confidence Intervals (CI) and t-tests instead of Hypothesis Testing and ANOVA (which will be next week)
----- CENTRAL LIMIT THEOREM -----
- Note on Terminology
- Estimation Error/Standard Error: SE(x̅) = σ/√n
- We also used this for Beta coefficients.
- We will see this for the rest of the semester
- Central Limit Theorem I
- Very Simple Random Sample (VSRS)
■ Pr(x)=1/N
■ Example: population of 5,000 M and 5,000 F
- n = 1/10,

n    X~B(N,π)   Pr(=M&F)
4    (4,.5)     .
6    (6,.5)     .
10   (10,.5)    .
20   (20,.5)    .

■ If we want to know the probability of finding exactly 2 men and 2 women in the sample, this is a binomial distribution.
■ We use the sample to make inferences about the population, although it may not actually represent the population.
■ The larger the sample you draw, the smaller the chance that the sample exactly matches the population.
■ The larger our sample grows, the more accurate our inferences about the population become.
- Central Limit Theorem II
- But… as n increases, inferences become more precise.
- Expected value, E,
■ E(x̅) = μ
■ SE(x̅) = σ/√n
■ Central Limit Theorem: for any VSRS of size n, the sample mean fluctuates around μ with a standard error (SE) of σ/√n
- Convergence and distribution = mathematical
- Central Limit Theorem III
- This means that no matter what the shape of the parent distribution, the mean of a large enough sample will approximately follow a normal distribution with mean μ and standard deviation σ/√n (variance σ^2/n)
- All of which ties back to standardizing with the Z-statistic
- The math for this only works if you sample with replacement (put the sample back in the population before drawing again).
- The Normal Approximation Rule I (a piece of the Central Limit Theorem)
- With proportions, we’ll use π and P
■ SE(P) = √[π(1-π)/n]
- The Normal Approximation Rule II
- An Example: 60% of some population votes Republican (π=.60). How likely is it that when we take a poll of n=100 voters, our sample proportion P will turn out to be within 10% (.1) of the population proportion?
- So we’re asking:
■ Pr(π-.10 < P < π+.10) = (since π=.6)
■ Pr(.5 < P < .7)
- We use π=.6, n=100
■ P~N(.6, √[.6(1-.6)/100]) = N(.6, .05)
■ Pr(-2 < Z < 2) ≈ .95
- The math is a little different converting to a z-score for a proportion than a mean.
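The polling example under the Normal Approximation Rule can be worked numerically. This is a rough sketch using `NormalDist` from Python's standard library (available in Python 3.8 and later); the variable names are illustrative, and it uses the unrounded standard error rather than the rounded .05.

```python
from math import sqrt
from statistics import NormalDist

pi_pop = 0.60   # population proportion voting Republican (from the notes)
n = 100         # poll size (from the notes)

# Standard error of a sample proportion: SE(P) = sqrt(pi * (1 - pi) / n).
se = sqrt(pi_pop * (1 - pi_pop) / n)

# Normal approximation: P is treated as N(pi, SE).
p_dist = NormalDist(mu=pi_pop, sigma=se)

# Probability that the sample proportion lands between .5 and .7.
prob_within = p_dist.cdf(0.70) - p_dist.cdf(0.50)

# The same bounds expressed as z-scores.
z_low = (0.50 - pi_pop) / se
z_high = (0.70 - pi_pop) / se

print(f"SE(P) = {se:.4f}")
print(f"z-scores: {z_low:.2f} to {z_high:.2f}")
print(f"Pr(.5 < P < .7) = {prob_within:.4f}")
```

Rounding the standard error to .05 puts the bounds at exactly two standard errors, which is why the hand calculation works with z-scores of -2 and 2.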
- [missed bottom of slide]
CI Slide 4: Population Standard Deviation is Known
- Example: x̅ = $36,000, n=16, σ = 10,000
- μ = 36,000 ± z.049(10,000/√16)
- = 36,000 ± 1.65(10,000/√16)
- = 36,000 ± 4,125
CI Slide 5: When the Population Standard Deviation Is Not Known
Second formula: μ = x̅ ± t-crit,df (s/√n)
CI Slide 6: When the Population Standard Deviation Is Not Known
- Same example as before: x̅ = 36,000, n=16, and s = 10,000
- μ = 36,000 ± t.049, df=15 (10,000/√16)
- = 36,000 ± 1.75(10,000/√16)
- = 36,000 ± 4,375
- Critical values of 1.65 (z) and 1.75 (t, df=15) leave about 5% in each tail, which gives a 90% two-sided interval. A 95% interval uses 1.96 (z) or 2.131 (t, df=15).
- With a 95% interval we have 5% for error: “If you took repeated samples, 95% of the similarly constructed samples would contain the population average. Therefore about 5% of our samples will not contain the average.”
- “The bigger the confidence interval the more likely it is that the population statistic will be captured in that interval.”
- To do this well, you need 100 observations, although on the homework we may be given fewer.
CI Slide 7: Sample Proportions

Race/Sex     Male (1)   Female (2)   Total
White (1)    660        794          1454
Black (2)    79         128          207
Latino (3)   12         7            19
Asian (4)    12         12           24
Total        763        941          1704

- For white females, p = 794/1454 = 0.55; n = 1454
- For black females, p = 128/207 = 0.62; n = 207
- White Females: π = .55 ± 1.96√(.55(1-.55)/1454) = .55 ± .026
- The population proportion for white women in a random draw would be in (0.524, 0.576)
- Black Females: π = .62 ± 1.96√(.62(1-.62)/207) = .62 ± .066
- The population proportion for black women in a random draw would be in (0.554, 0.686)
■ Note: Black women have a wider interval because the sample is smaller.
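The interval arithmetic on the CI slides can be rechecked with two small helper functions. This is a sketch under stated assumptions: `mean_ci` and `proportion_ci` are hypothetical names made up for illustration, and the critical value is passed in by hand rather than looked up, so the same function serves the z and t versions.

```python
from math import sqrt

def mean_ci(xbar, sd, n, crit):
    """Interval for a mean: xbar +/- crit * sd / sqrt(n).

    `crit` is the z or t critical value chosen by the user; `sd` is sigma
    when it is known, or the sample s otherwise.
    """
    margin = crit * sd / sqrt(n)
    return xbar - margin, xbar + margin

def proportion_ci(p, n, crit=1.96):
    """Interval for a proportion: p +/- crit * sqrt(p * (1 - p) / n)."""
    margin = crit * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Income example from the slides, using the critical values written in the notes.
z_low, z_high = mean_ci(36_000, 10_000, 16, crit=1.65)
t_low, t_high = mean_ci(36_000, 10_000, 16, crit=1.75)

# Sample proportions from the race/sex table.
white_low, white_high = proportion_ci(0.55, 1454)
black_low, black_high = proportion_ci(0.62, 207)

print(f"z interval: ({z_low:,.0f}, {z_high:,.0f})")
print(f"t interval: ({t_low:,.0f}, {t_high:,.0f})")
print(f"White females: ({white_low:.3f}, {white_high:.3f})")
print(f"Black females: ({black_low:.3f}, {black_high:.3f})")
```

Note that the square root covers the whole of p(1-p)/n; dividing p(1-p) by √n instead gives a margin that is far too small.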
CI Slide 8: Difference of Proportions (aka “T-tests”)
In real life we will do this on the computer.
Returning to Women Example:
π1 - π2 = (p1 – p2) ± z-crit √(p1(1-p1)/n1 + p2(1-p2)/n2)
Comparing White and Black:
π1 - π2 = (.55 – .62) ± 1.96 √(.55(1-.55)/1454 + .62(1-.62)/207)
CI Slide 9: Difference between Means
- Any sample size, when σ is known:
- μ1 - μ2 = (x̅1 – x̅2) ± zα/2 √(σ1^2/n1 + σ2^2/n2)
- Small sample size, s is known:
- μ1 - μ2 = (x̅1 – x̅2) ± tα/2 (sp) √(1/n1 + 1/n2), where
End-of-class notes:
- Khan Academy: http://www.khanacademy.org/video/t-statistic-confidence-interval?topic=statistics-
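The white/black comparison from CI Slide 8 can be carried through to an interval in code. This is a minimal sketch: `diff_proportions_ci` is a hypothetical helper name, n1 = 1454 is taken from the sample-proportions table, and z = 1.96 is an assumption borrowed from the single-proportion intervals on CI Slide 7.

```python
from math import sqrt

def diff_proportions_ci(p1, n1, p2, n2, z=1.96):
    """Interval for a difference of two independent proportions.

    (p1 - p2) +/- z * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)
    The default z = 1.96 is an assumption (a 95% interval).
    """
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# White females (p = .55, n = 1454) versus black females (p = .62, n = 207).
low, high = diff_proportions_ci(0.55, 1454, 0.62, 207)

print(f"Difference: {0.55 - 0.62:.2f}")
print(f"Interval: ({low:.3f}, {high:.3f})")
```

If the interval contains 0, the two sample proportions cannot be distinguished at that confidence level.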