An in-depth exploration of covariance and correlation between random variables. It covers the definition, properties, and applications of covariance, including the variance of sums of random variables. Additionally, it introduces Pearson correlation and its significance in measuring linear relationships.

Typology: Study notes

2021/2022


Probability & Statistics with Applications to Computing
Chapter 5. Multiple Random Variables
5.4: Covariance and Correlation
Alex Tsun. Slides (Google Drive). Video (YouTube).

In this section, we'll learn about covariance, which, as you might guess, is related to variance. It is a function of two random variables, and it tells us whether they have a positive or negative linear relationship. It also finally lets us compute the variance of a sum of dependent random variables, which we have not been able to do until now.

5.4.1 Covariance and Properties

We will start with the definition of covariance:

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

By LOTUS, this equals (where µ_X = E[X] and µ_Y = E[Y]):

∑_x ∑_y (x − µ_X)(y − µ_Y) p_{X,Y}(x, y)

Intuitively, each term falls into one of four cases:

• x > µ_X, y > µ_Y ⇒ (x − µ_X)(y − µ_Y) > 0 (X, Y both above their means)
• x < µ_X, y < µ_Y ⇒ (x − µ_X)(y − µ_Y) > 0 (X, Y both below their means)
• x < µ_X, y > µ_Y ⇒ (x − µ_X)(y − µ_Y) < 0 (X below its mean, Y above its mean)
• x > µ_X, y < µ_Y ⇒ (x − µ_X)(y − µ_Y) < 0 (X above its mean, Y below its mean)

So we get a weighted average (by p_{X,Y}) of these positive and negative quantities. From this intuition alone, we can say that covariance is positive when X and Y are usually both above or both below their means, and negative when they are usually on opposite sides. That is, covariance is generally positive when increasing one variable tends to increase the other, and negative when increasing one variable tends to decrease the other.

Definition 5.4.1 (Covariance): Let X, Y be random variables. The covariance of X and Y is:

Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]

This should remind you of the definition of variance: replace Y with X and you'll see it! Note that covariance, unlike variance, can be negative.
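The two formulas in the definition are easy to check against each other numerically. Here is a minimal Python sketch (the sampling setup and names are my own, not from the notes) that estimates Cov(X, Y) both ways for a pair with a positive linear relationship:

```python
import random

random.seed(0)
n = 100_000

# Sample a dependent pair: Y = X + noise, so Cov(X, Y) = Var(X) = 1.
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Definition: E[(X - E[X])(Y - E[Y])]
cov_def = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

# Alternate formula: E[XY] - E[X]E[Y]
cov_alt = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y

print(cov_def, cov_alt)  # equal up to floating-point error, both near 1
```

Because Y is X plus independent noise, both estimates land near Cov(X, X) = Var(X) = 1; a pair built as Y = −X + noise would instead come out near −1.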
If X ⊥ Y , then Cov (X,Y ) = 0 (but not necessarily vice versa, because the covariance could be zero but X and Y could not be independent). 2. Cov (X,X) = Var (X). (Just plug in Y = X). 1 2 Probability & Statistics with Applications to Computing 5.4 3. Cov (X,Y ) = Cov (Y,X). (Multiplication is commutative). 4. Cov (X + c, Y ) = Cov (X,Y ). (Shifting doesn’t and shouldn’t affect the covariance). 5. Cov (aX + bY, Z) = a · Cov (X,Z) + b · Cov (Y,Z). This can be easily remembered like the distributive property of scalars (aX + bY )Z = a(XZ) + b(Y Z). 6. Var (X + Y ) = Var (X) + Var (Y ) + 2Cov (X,Y ), and hence if X ⊥ Y , then Var (X + Y ) = Var (X) + Var (Y ) (as we discussed earlier). 7. Cov (∑n i=1Xi, ∑m j=1 Yi ) = ∑n i=1 ∑m j=1 Cov (Xi, Yj). That is covariance works like FOIL (first, outer, inner, last) for multiplication of sums ((a+ b+ c)(d+ e) = ad+ ae+ bd+ be+ cd+ ce). Proof of Covariance Alternate Formula. We will prove that Cov (X,Y ) = E [XY ]− E [X]E [Y ]. Cov (X,Y ) = E [(X − E [X])(Y − E [Y ])] [def of covariance] = E [XY − E [X]Y −XE [Y ] + E [X]E [Y ]] [algebra] = E [XY ]− E [X]E [Y ]− E [X]E [Y ] + E [X]E [Y ] [Linearity of Expectation] = E [XY ]− E [X]E [Y ] [algebra] Proof of Property 1: Covariance of Independent RVs is 0. We actually proved in 5.1 already that E [XY ] = E [X]E [Y ] when X,Y are independent. Hence, Cov (X,Y ) = E [XY ]− E [X]E [Y ] = 0 Proof of Property 6: Variance of Sum of RVs. We will show that in general, for any RVs X and Y , that Var (X + Y ) = Var (X) + Var (Y ) + 2Cov (X,Y ) Var (X + Y ) = Cov (X + Y,X + Y ) [covariance with self = variance] = Cov (X,X) + Cov (X,Y ) + Cov (Y,X) + Cov (Y, Y ) [covariance like FOIL] = Var (X) + 2Cov (X,Y ) + Var (Y ) [covariance with self, and symmetry] Example(s) Let X and Y be two independent N (0, 1) random variables and: Z = 1 +X +XY 2 W = 1 +X Find Cov(Z,W ). 5.4 Probability & Statistics with Applications to Computing 5 Theorem 5.4.1: Variance of Sums of RVs If X1, X2, . . . 
Theorem 5.4.1 (Variance of Sums of RVs): If X_1, X_2, . . . , X_n are random variables, then

Var(∑_{i=1}^n X_i) = ∑_{i=1}^n Var(X_i) + 2 ∑_{i<j} Cov(X_i, X_j)

Proof of Variance of Sums of RVs. We'll first do something unintuitive: make our expression more complicated. The variance of the sum X_1 + X_2 + · · · + X_n is its covariance with itself! We'll use i to index one copy of the sum, ∑_{i=1}^n X_i, and j for the other, ∑_{j=1}^n X_j. Keep in mind that these both represent the same quantity; you'll see why we used different dummy variables soon!

Var(∑_{i=1}^n X_i) = Cov(∑_{i=1}^n X_i, ∑_{j=1}^n X_j)             [covariance with self = variance]
                   = ∑_{i=1}^n ∑_{j=1}^n Cov(X_i, X_j)             [by FOIL]
                   = ∑_{i=1}^n Var(X_i) + 2 ∑_{i<j} Cov(X_i, X_j)  [by symmetry]

The final step comes from the covariance of a variable with itself and the symmetry of covariance. Picture the terms Cov(X_i, X_j) arranged in an n × n grid: the diagonal holds each variable's covariance with itself (which is its variance), and the off-diagonal entries come in symmetric pairs. Since Cov(X_i, X_j) = Cov(X_j, X_i), we only need to sum the lower triangle (where i < j) and multiply by 2 to account for the upper triangle.

It is important to remember that if all the RVs were independent, all the Cov(X_i, X_j) terms (for i ≠ j) would be zero, and we would just be left with the sum of the variances, as we showed earlier!

Example(s): Recall the hat check problem from 3.3: n people go to a party and leave their hats with a hat check person. At the end of the party, though, the hats are returned randomly. We let X be the number of people who get their original hat back. We solved for E[X] with indicator random variables X_1, . . . , X_n for whether the i-th person got their hat back. We showed that:

E[X_i] = P(X_i = 1) = P(i-th person gets their hat back) = 1/n

So,

E[X] = E[∑_{i=1}^n X_i] = ∑_{i=1}^n E[X_i] = ∑_{i=1}^n 1/n = n · (1/n) = 1

Above was all review: now compute Var(X).
Solution: Recall that each X_i ∼ Ber(1/n) (1 with probability 1/n, and 0 otherwise). (Remember these were NOT independent RVs, but we could still apply linearity of expectation.) In our previous proof, we showed that

Var(X) = Var(∑_{i=1}^n X_i) = ∑_{i=1}^n Var(X_i) + 2 ∑_{i<j} Cov(X_i, X_j)

Recall that X_i, X_j are indicator random variables taking values in {0, 1}, so their product X_i X_j ∈ {0, 1} as well. This allows us to calculate:

E[X_i X_j] = P(X_i X_j = 1)                    [since indicator, expectation is just probability of being 1]
           = P(X_i = 1, X_j = 1)               [product is 1 if and only if both are 1]
           = P(X_i = 1) P(X_j = 1 | X_i = 1)   [chain rule]
           = (1/n) · (1/(n − 1))

This is because we need both person i and person j to get their hats back: person i gets theirs back with probability 1/n, and given this, person j gets theirs back with probability 1/(n − 1).

So, by definition of covariance (recall each E[X_i] = 1/n):

Cov(X_i, X_j) = E[X_i X_j] − E[X_i]E[X_j]
              = (1/n)(1/(n − 1)) − (1/n)(1/n)              [plug in]
              = n/(n²(n − 1)) − (n − 1)/(n²(n − 1))        [algebra]
              = 1/(n²(n − 1))                              [algebra]

Further, since X_i is a Bernoulli (indicator) random variable:

Var(X_i) = p(1 − p) = (1/n)(1 − 1/n)

Finally, we have

Var(X) = ∑_{i=1}^n Var(X_i) + 2 ∑_{i<j} Cov(X_i, X_j)          [formula for variance of sum]
       = ∑_{i=1}^n (1/n)(1 − 1/n) + 2 ∑_{i<j} 1/(n²(n − 1))    [plug in]
       = n · (1/n)(1 − 1/n) + 2 (n choose 2) · 1/(n²(n − 1))   [there are (n choose 2) pairs with i < j]
       = (1 − 1/n) + 2 · (n(n − 1)/2) · 1/(n²(n − 1))
       = (1 − 1/n) + 1/n
       = 1

How many pairs are there with i < j? This is just (n choose 2) = n(n − 1)/2, since we choose two different indices. Another way to see it: the n × n grid has n² entries; removing the n diagonal entries leaves n² − n = n(n − 1), and dividing by two keeps just the lower half.

This is very surprising and interesting! When returning n hats randomly and uniformly, the expected number of people who get their own hat back is 1, and so is the variance! Neither depends on n at all!
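The surprising mean = variance = 1 result is easy to probe by simulation. The sketch below (helper name is mine, not from the notes) shuffles the hats many times and reports the sample mean and variance of the number of matches; change n and both stay near 1:

```python
import random

random.seed(2)

def matches(n):
    """Return hats uniformly at random and count people who get their own back."""
    hats = list(range(n))
    random.shuffle(hats)
    return sum(1 for person, hat in enumerate(hats) if person == hat)

n, trials = 10, 100_000
counts = [matches(n) for _ in range(trials)]
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(mean, var)  # both land near 1, for any n >= 2
```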
It takes practice to get used to these formulas, so let's do one more problem.

Example(s): Suppose we throw 12 balls independently and uniformly into 7 bins. What are the mean and variance of the number of empty bins after this process? (Hint: indicators.)

Solution: Let X be the total number of empty bins, and let X_1, . . . , X_7 be indicators of whether or not bin i is empty, so that X = ∑_{i=1}^7 X_i. Then,

P(X_i = 1) = (6/7)^12
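Each of the 12 balls independently lands outside bin i with probability 6/7, which is where P(X_i = 1) = (6/7)^12 ≈ 0.157 comes from; by linearity, E[X] = 7(6/7)^12 ≈ 1.10. A simulation sketch of the same setup (my own, not part of the notes):

```python
import random

random.seed(3)
balls, bins, trials = 12, 7, 100_000

empties = []
for _ in range(trials):
    occupied = {random.randrange(bins) for _ in range(balls)}  # bins hit at least once
    empties.append(bins - len(occupied))

mean = sum(empties) / trials
var = sum((e - mean) ** 2 for e in empties) / trials
print(mean)  # near 7 * (6/7)**12, about 1.10
print(var)
```

Following the same recipe as the hat check problem, the exact variance uses E[X_i X_j] = (5/7)^12 (both bins i and j must be missed by all 12 balls); the simulated variance lands near that exact value, about 0.63.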