Introduction to Probability: Covariance and Correlation in Statistics, Study notes of Probability and Statistics

These lecture notes for a statistics course cover covariance and correlation: the concept of covariance, its properties, and the relationship between covariance and correlation, with examples and formulas for calculating both. The lecture is based on Jim Pitman's book 'Probability' and focuses on Section 6.4.

Typology: Study notes

Pre 2010

Uploaded on 10/01/2009

koofers-user-sor

Lectures prepared by: Elchanan Mossel, Yelena Shvets
Berkeley Stat 134, Fall 2005
Introduction to Probability
Follows Jim Pitman's book Probability, Section 6.4

Do taller people make more money?

Question: How can this be measured?

[Figure: scatter plot of wage at 19 against height at 16, from the National Longitudinal Survey of Youth 1997 (NLSY97), with Ave(wage) and Ave(height) marked.]

Meaning of the value of Covariance

Back to the NLSY97 study: the actual covariance was 3028, where height is in inches and wages are in dollars.

Question: Suppose we measured all the heights in centimeters instead (there are 2.54 cm per inch). What will happen to the covariance?

Solution: Let $H_I$ be the height in inches, $H_C$ the height in centimeters, and $W$ the wages. Then

$$\mathrm{Cov}(H_C, W) = \mathrm{Cov}(2.54\,H_I, W) = 2.54\,\mathrm{Cov}(H_I, W).$$

With the figure above, the covariance becomes $2.54 \times 3028 \approx 7691$. So the value depends on the units and is not very informative!

Covariance and Correlation

Define the correlation coefficient:

$$\rho = \mathrm{Corr}(X, Y) = E\left[\frac{X - E(X)}{SD(X)} \cdot \frac{Y - E(Y)}{SD(Y)}\right].$$

Using the linearity of expectation we get:

$$\rho = \frac{\mathrm{Cov}(X, Y)}{SD(X)\,SD(Y)}.$$

Notice that $\rho(aX + b, cY + d) = \rho(X, Y)$ whenever $a$ and $c$ have the same sign; if their signs differ, only the sign of $\rho$ flips. This new quantity does not depend on the choice of scale, and its value is quite informative.

Covariance and Correlation

Properties of correlation: let

$$X^* = \frac{X - \mu_X}{SD(X)} \quad \text{and} \quad Y^* = \frac{Y - \mu_Y}{SD(Y)}.$$

Then $E(X^*) = E(Y^*) = 0$ and $SD(X^*) = SD(Y^*) = 1$, and

$$\mathrm{Corr}(X, Y) = \mathrm{Cov}(X^*, Y^*) = E(X^* Y^*).$$

Roll a die $N$ times. Let $X$ be the number of 1's and $Y$ the number of 2's, and let $p_1$ and $p_2$ be the probabilities of rolling a 1 and a 2.

Question: What is the correlation between $X$ and $Y$?

Solution: Computing the correlation directly from the multinomial distribution would be difficult. Let's use a trick:

$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y).$$

Since $X + Y$ is just the number of rolls showing a 1 or a 2, $X + Y \sim \mathrm{Binom}(p_1 + p_2, N)$, so

$$\mathrm{Var}(X + Y) = (p_1 + p_2)(1 - p_1 - p_2)N.$$

Likewise $X \sim \mathrm{Binom}(p_1, N)$ and $Y \sim \mathrm{Binom}(p_2, N)$, so $\mathrm{Var}(X) = p_1(1 - p_1)N$ and $\mathrm{Var}(Y) = p_2(1 - p_2)N$.
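The scaling claims made earlier for the height and wage example can be checked numerically. The sketch below is only an illustration: the five height and wage values are made up (they are not NLSY97 data), and the helper names `cov`, `sd` and `corr` are introduced here, using the population (divide by n) convention to match the expectation-based definitions in the notes.

```python
from math import sqrt, isclose

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Population covariance: the average of (x - mean_x) * (y - mean_y).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def sd(xs):
    # Standard deviation as the square root of Cov(X, X).
    return sqrt(cov(xs, xs))

def corr(xs, ys):
    return cov(xs, ys) / (sd(xs) * sd(ys))

# Hypothetical data, for illustration only (not taken from NLSY97).
heights_in = [62.0, 65.0, 67.0, 70.0, 72.0]               # inches
wages = [9000.0, 15000.0, 12000.0, 18000.0, 16000.0]      # dollars
heights_cm = [2.54 * h for h in heights_in]               # centimeters

cov_in = cov(heights_in, wages)
cov_cm = cov(heights_cm, wages)
rho_in = corr(heights_in, wages)
rho_cm = corr(heights_cm, wages)

# An affine change with positive slopes leaves the correlation unchanged;
# a negative slope on one variable flips its sign.
rho_affine = corr([2.54 * h + 10 for h in heights_in],
                  [0.5 * w - 100 for w in wages])
rho_flipped = corr([-h for h in heights_in], wages)

print("Cov in inches:", cov_in, " Cov in cm:", cov_cm)
print("Corr in inches:", rho_in, " Corr in cm:", rho_cm)
print("Corr after sign flip:", rho_flipped)
```

The covariance picks up the factor 2.54 while the correlation stays the same up to rounding, which is the reason the notes move from covariance to correlation.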
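Correlations in the Multinomial Distribution

Solving $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$ for the covariance gives

$$\mathrm{Cov}(X, Y) = \frac{\mathrm{Var}(X + Y) - \mathrm{Var}(X) - \mathrm{Var}(Y)}{2},$$

so

$$\mathrm{Cov}(X, Y) = \frac{N\left[(p_1 + p_2)(1 - p_1 - p_2) - p_1(1 - p_1) - p_2(1 - p_2)\right]}{2} = -N p_1 p_2.$$

Dividing by the standard deviations,

$$\rho = \frac{-N p_1 p_2}{\sqrt{N p_1 (1 - p_1)}\,\sqrt{N p_2 (1 - p_2)}} = -\sqrt{\frac{p_1 p_2}{(1 - p_1)(1 - p_2)}}.$$

In our case $p_1 = p_2 = 1/6$, so $\rho = -1/5$. The formula holds for a general multinomial distribution.

Variance of the Sum of N Variables

$$\mathrm{Var}\Big(\sum_i X_i\Big) = \sum_i \mathrm{Var}(X_i) + 2 \sum_{j < i} \mathrm{Cov}(X_i, X_j)$$

Proof: Write $\mu_i = E(X_i)$. By definition,

$$\mathrm{Var}\Big(\sum_i X_i\Big) = E\Big[\Big(\sum_i X_i - E\Big(\sum_i X_i\Big)\Big)^2\Big],$$

and

$$\Big(\sum_i X_i - E\Big(\sum_i X_i\Big)\Big)^2 = \Big(\sum_i (X_i - \mu_i)\Big)^2 = \sum_i (X_i - \mu_i)^2 + 2 \sum_{j < i} (X_i - \mu_i)(X_j - \mu_j).$$

Now take expectations and we have the result.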
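The die computation and the variance-of-a-sum formula can be cross-checked with exact arithmetic. The sketch below assumes a fair six-sided die and picks N = 10 arbitrarily; the function names are introduced here for illustration. It recovers the covariance through the Var(X+Y) trick, compares it with -N p1 p2, and then applies the sum formula to all six face counts, whose total is always N and therefore has variance 0.

```python
from fractions import Fraction
from itertools import combinations
from math import sqrt

def binomial_var(n, p):
    # Variance of a count that is Binom(p, n) in the notes' notation.
    return n * p * (1 - p)

def multinomial_cov(n, p_i, p_j):
    # Covariance of two distinct face counts, as derived in the notes.
    return -n * p_i * p_j

def multinomial_corr(p_i, p_j):
    # Correlation of two distinct face counts; the factor N cancels.
    return -sqrt(float(p_i * p_j / ((1 - p_i) * (1 - p_j))))

n = 10                          # number of rolls (arbitrary choice)
probs = [Fraction(1, 6)] * 6    # fair die (assumption)
p1, p2 = probs[0], probs[1]

# The trick: Cov(X, Y) = (Var(X + Y) - Var(X) - Var(Y)) / 2.
cov_from_trick = (binomial_var(n, p1 + p2)
                  - binomial_var(n, p1)
                  - binomial_var(n, p2)) / 2

rho = multinomial_corr(p1, p2)

# Variance of the sum of all six counts via
# Var(sum X_i) = sum Var(X_i) + 2 * sum_{j<i} Cov(X_i, X_j).
total_var = (sum(binomial_var(n, p) for p in probs)
             + 2 * sum(multinomial_cov(n, pi, pj)
                       for pi, pj in combinations(probs, 2)))

print("Cov via trick:", cov_from_trick)
print("Cov via formula:", multinomial_cov(n, p1, p2))
print("Correlation:", rho)
print("Variance of the total count:", total_var)
```

Using `Fraction` keeps the covariance comparison exact, so any mismatch would point to an algebra error and not to rounding.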