Probability and Statistics - Correlation and Regression, Study notes of Economics

Topics covered in this document: bi-variate data, the correlation coefficient, the method of least squares and regression lines, and the correlation ratio, with examples such as the potential sale of a new product and its price.

Typology: Study notes

2010/2011

Uploaded on 09/03/2011

manish

Bi-variate data: Let $(x_1, y_1), \dots, (x_n, y_n)$ be n paired observations on two variables X and Y. Such data are called bi-variate data. Examples:
(i) Potential sale of a new product and its price.
(ii) Family expenditure on travel and family income.
(iii) A patient's weight and the number of weeks he or she has been on a diet.
(iv) Yield of a variety of wheat and rainfall.

The objective of correlation is to measure the extent of the relationship between two variables.

The objective of regression is to establish the relationship between two (or more) variables.

Alternatively, $r_{xy}$ can be written as

$$r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{X}\bar{Y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{X}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{Y}^2\right)}}$$

Notice that $r_{xy} = r_{yx}$.

Result: Prove that $-1 \le r_{xy} \le 1$.

Proof: For proving the result, make use of the facts $\sum (u_i - v_i)^2 \ge 0$ and $\sum (u_i + v_i)^2 \ge 0$, with

$$u_i = \frac{x_i - \bar{X}}{\sigma_x}, \qquad v_i = \frac{y_i - \bar{Y}}{\sigma_y}$$

When $r_{xy} = \pm 1$, we say that X and Y are perfectly (positively or negatively) correlated. When $r_{xy} = 0$, we say that X and Y are uncorrelated.
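The hint in the proof can be expanded as follows. This is a sketch that assumes the standard deviations are defined with divisor n, so that $\sum u_i^2 = \sum v_i^2 = n$, and that $r_{xy} = \frac{1}{n}\sum u_i v_i$; neither definition appears in the text shown here.

```latex
\sum_{i=1}^{n} (u_i \pm v_i)^2
  = \sum_{i=1}^{n} u_i^2 + \sum_{i=1}^{n} v_i^2 \pm 2\sum_{i=1}^{n} u_i v_i
  = n + n \pm 2n\,r_{xy}
  = 2n\,(1 \pm r_{xy}) \;\ge\; 0
\quad\Longrightarrow\quad -1 \le r_{xy} \le 1 .
```

The plus sign gives $r_{xy} \ge -1$ and the minus sign gives $r_{xy} \le 1$.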

Result: If we define $u_i = (x_i - a)/h$ and $v_i = (y_i - b)/k$, then

$$r_{xy} = \frac{hk}{|hk|}\, r_{uv}$$

If h and k are of the same sign then $r_{xy} = r_{uv}$, and if h and k are of opposite sign then $r_{xy} = -r_{uv}$.
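The effect of changing origin and scale can be checked numerically. The sketch below uses the hours and scores from the French-test example in these notes; the origins (10 and 50) and scales (2, -2 and 5) are arbitrary values chosen only for illustration, and the helper names `corr` and `rescale` are not from the notes.

```python
from math import sqrt

def corr(xs, ys):
    """Correlation coefficient from the sums-of-products form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * mx * my
    sxx = sum(x * x for x in xs) - n * mx * mx
    syy = sum(y * y for y in ys) - n * my * my
    return sxy / sqrt(sxx * syy)

def rescale(values, origin, scale):
    """Apply the change of origin and scale (value - origin) / scale."""
    return [(v - origin) / scale for v in values]

hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
scores = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]

r_xy = corr(hours, scores)
# h and k of the same sign: r should be unchanged
r_same = corr(rescale(hours, 10, 2), rescale(scores, 50, 5))
# h and k of opposite sign: r should change sign
r_opp = corr(rescale(hours, 10, -2), rescale(scores, 50, 5))

print(r_xy, r_same, r_opp)
```

The three printed values should agree in magnitude, with only the last one carrying the opposite sign.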

Let $(x_1, y_1), \dots, (x_n, y_n)$ be n paired observations on two variables X and Y. For estimating the regression coefficients we use the method of least squares. Let us write

$$y_i = \alpha + \beta x_i + e_i, \qquad i = 1, 2, \dots, n$$

where $e_i$ is the error term corresponding to the i-th observation. In the method of least squares we estimate $\alpha$ and $\beta$ by minimizing the error sum of squares

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$

with respect to $\alpha$ and $\beta$.

For minimizing the error sum of squares, we differentiate it with respect to $\alpha$ and $\beta$ and set each derivative equal to zero. This yields the following set of equations:

$$\sum_{i=1}^{n} y_i = n\alpha + \beta \sum_{i=1}^{n} x_i$$

$$\sum_{i=1}^{n} x_i y_i = \alpha \sum_{i=1}^{n} x_i + \beta \sum_{i=1}^{n} x_i^2$$

The above equations are called the normal equations. The estimators obtained by solving them are denoted by $a$ and $b_{yx}$, and are called the least squares estimators of $\alpha$ and $\beta$ respectively.
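Solving the two normal equations explicitly may help. The following is a sketch of the standard elimination, using only the symbols above with $\bar{X}$ and $\bar{Y}$ for the sample means: dividing the first equation by n gives the intercept, and substituting it into the second gives the slope.

```latex
a = \bar{Y} - b_{yx}\,\bar{X},
\qquad
b_{yx} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{X}\bar{Y}}
              {\sum_{i=1}^{n} x_i^2 - n\bar{X}^2} .
```

The first relation also shows that the fitted line passes through the point $(\bar{X}, \bar{Y})$.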

(i) In general $b_{yx}$ is not equal to $b_{xy}$.
(ii) $b_{xy}$ and $b_{yx}$ have the same sign as $r$.
(iii) $b_{xy} b_{yx} = r^2$, i.e., $r$ is the geometric mean of $b_{xy}$ and $b_{yx}$ (taken with their common sign).
(iv) The regression coefficient $b_{yx}$ is not affected by a change of origin but is affected by a change of scale.
(v) Some non-linear relationships can be transformed into linear form, and the method of least squares can then be used to estimate the coefficients. Some examples are:
(a) $Y = a + b/x$
(b) $Y = AB^X$
(c) $Y = AX^B$
(d) $Y = Ae^{bX}$
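To illustrate the last point, the exponential form Y = A e^{bX} becomes linear after taking logarithms: ln Y = ln A + bX. The sketch below fits that line by least squares. The data are synthetic, generated from A = 2 and b = 0.5 purely for illustration, and `fit_line` is a hypothetical helper name.

```python
from math import exp, log

def fit_line(xs, ys):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * mx * my
    sxx = sum(x * x for x in xs) - n * mx * mx
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Synthetic, noise-free data from Y = 2 * exp(0.5 * X) (illustrative only)
xs = [0, 1, 2, 3, 4]
ys = [2.0 * exp(0.5 * x) for x in xs]

# Regress ln(Y) on X, then transform the intercept back to A
log_a, b_hat = fit_line(xs, [log(y) for y in ys])
A_hat = exp(log_a)

print(A_hat, b_hat)
```

With real, noisy data the recovered A and b minimize the squared error on the log scale, which is not the same as minimizing it on the original scale.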

Ex: The following data give the number of hours which ten persons studied for a French test and their scores on the test:

Hrs:     4   9  10  14   4   7  12  22   1  17
Scores: 31  58  65  73  37  44  60  91  21  84

Fit the regression of test scores on number of hours of study. Predict the score of a person who studies for 20 hours.

Answer: a = 21.69; b = 3.471 (the correlation coefficient is r = 0.976).
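The fit in this example can be reproduced directly from the normal equations. The sketch below uses only the data given in the example; the variable names are illustrative.

```python
from math import sqrt

hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
scores = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]

n = len(hours)
sum_x, sum_y = sum(hours), sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_xx = sum(x * x for x in hours)
sum_yy = sum(y * y for y in scores)

# Least squares slope and intercept of scores on hours
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b_yx * sum_x) / n

# Correlation coefficient of the same data
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
)

# Predicted score for 20 hours of study
predicted = a + b_yx * 20

print(round(a, 2), round(b_yx, 3), round(r, 3), round(predicted, 1))
```

Run as written, this gives an intercept near 21.69, a slope near 3.47, a correlation coefficient near 0.976, and a predicted score near 91.1 for 20 hours of study.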

Result: Prove that if $\theta$ is the angle between the two lines of regression then

$$\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$

Notice that for $r = 1$ or $r = -1$ the two lines coincide ($\theta = 0$), and for $r = 0$ the two lines are perpendicular to each other.

Result: The point of intersection of the two lines of regression is $(\bar{X}, \bar{Y})$.
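The angle formula can be checked numerically against the slopes of the two regression lines. The sketch below uses the hours and scores from the French-test example and assumes the usual expressions $b_{yx} = r\sigma_y/\sigma_x$ and $b_{xy} = r\sigma_x/\sigma_y$, with standard deviations computed using divisor n; these expressions are not stated in the text shown here.

```python
from math import sqrt, isclose

hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
scores = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]

n = len(hours)
mx, my = sum(hours) / n, sum(scores) / n
sx = sqrt(sum((x - mx) ** 2 for x in hours) / n)
sy = sqrt(sum((y - my) ** 2 for y in scores) / n)
r = sum((x - mx) * (y - my) for x, y in zip(hours, scores)) / (n * sx * sy)

b_yx = r * sy / sx  # slope of the line of Y on X
b_xy = r * sx / sy  # coefficient of the line of X on Y

# Both lines pass through (mx, my). In the (x, y) plane the line of
# X on Y, x - mx = b_xy * (y - my), has slope 1 / b_xy.
m1, m2 = b_yx, 1 / b_xy
tan_from_slopes = (m2 - m1) / (1 + m1 * m2)
tan_from_formula = (1 - r * r) / r * sx * sy / (sx * sx + sy * sy)

print(tan_from_slopes, tan_from_formula)
```

The two printed values should agree, and the product `b_xy * b_yx` should equal `r * r`.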

Multiple regression: Suppose we have more than one independent variable, say $X_1, X_2, \dots, X_p$. The multiple regression equation of Y on $X_1, \dots, X_p$ is given by

$$Y = \alpha + \beta_1 X_1 + \dots + \beta_p X_p + e$$

For a given set of observations, we can fit the multiple regression equation by using the method of least squares.
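As a minimal sketch of least squares with two independent variables, the code below forms the normal equations and solves them by Gaussian elimination. The data are synthetic, generated exactly from Y = 1 + 2 X1 + 3 X2 for illustration, and the helper names `solve` and `fit_multiple` are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - tail) / aug[i][i]
    return sol

def fit_multiple(rows, ys):
    """Return [alpha, beta_1, ..., beta_p] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic observations (x1, x2) and responses, illustrative only
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

alpha, beta1, beta2 = fit_multiple(rows, ys)
print(alpha, beta1, beta2)
```

Because the data are noise-free, the fit should recover the generating coefficients up to rounding error.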

where $d_i$ ($i = 1, \dots, n$) is the difference between the two ranks of the i-th observation. In case of ties, each tied observation is assigned the mean of the ranks which the tied observations jointly occupy. The rank correlation coefficient also lies between $-1$ and $1$.

Ex: Calculate the rank correlation coefficient for the above example.
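One way to work this exercise is sketched below. It assumes the coefficient meant here is Spearman's, $r_s = 1 - 6\sum d_i^2 / (n(n^2 - 1))$; the defining formula falls outside the text shown, so treat that as an assumption. Ties receive mean ranks as described above, and `mean_ranks` is a hypothetical helper name.

```python
def mean_ranks(values):
    """Rank values from smallest (rank 1), giving ties the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

hours = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]
scores = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]

rank_h = mean_ranks(hours)
rank_s = mean_ranks(scores)
n = len(hours)

# Sum of squared rank differences, then the assumed Spearman formula
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_h, rank_s))
r_s = 1 - 6 * sum_d2 / (n * (n * n - 1))

print(sum_d2, round(r_s, 4))
```

The two persons who studied 4 hours share ranks 2 and 3, so each gets rank 2.5. With ties present the formula is only approximate; an exact alternative is to compute the ordinary correlation coefficient of the two rank lists.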

Correlation Ratio: If the relationship between two variables is curvilinear, the correlation coefficient, which measures the extent of linear relationship, may be very low or even zero even though the relationship between the two variables is very strong. The correlation coefficient is then misleading. The correlation ratio is used to measure the extent of curvilinear relationship between two variables. Suppose that, corresponding to the value $x_i$ ($i = 1, \dots, m$) of the variable X, the variable Y takes $n_i$ values $y_{ij}$ ($j = 1, \dots, n_i$). Then the correlation ratio of Y on X is defined as

$$\eta_{yx}^2 = 1 - \frac{\sum_{i=1}^{m} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}{\sum_{i=1}^{m} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2}$$

where $\bar{y}_i$ is the mean of the $n_i$ values of Y observed at $x_i$ and $\bar{y}$ is the mean of all the values of Y.
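The contrast between the correlation coefficient and the correlation ratio can be seen on a small curvilinear data set. The data below are hypothetical, chosen to lie close to Y = X² with X symmetric about zero; they are not from the notes.

```python
from math import sqrt

# Hypothetical data: for each x value, the list of observed y values
groups = {
    -2: [4.1, 3.9],
    -1: [1.2, 0.8],
    0: [0.1, -0.1],
    1: [0.9, 1.1],
    2: [4.0, 4.0],
}

all_y = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_y) / len(all_y)

# Within-group and total sums of squares
within = sum((y - sum(ys) / len(ys)) ** 2 for ys in groups.values() for y in ys)
total = sum((y - grand_mean) ** 2 for y in all_y)
eta_sq = 1 - within / total

# Ordinary correlation coefficient on the same (x, y) pairs
pairs = [(x, y) for x, ys in groups.items() for y in ys]
mx = sum(x for x, _ in pairs) / len(pairs)
sxy = sum((x - mx) * (y - grand_mean) for x, y in pairs)
sxx = sum((x - mx) ** 2 for x, _ in pairs)
r = sxy / sqrt(sxx * total)

print(round(eta_sq, 3), round(r, 3))
```

Here the correlation coefficient is essentially zero while the squared correlation ratio is close to 1, which is the situation the paragraph above describes.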