Regression and Correlation , Study notes of Business Statistics

Regression and Correlation Study notes with mcqs

Typology: Study notes

2015/2016

Uploaded on 11/20/2016

Mohammad.Dawood
CHAPTER 12
CORRELATION
AND
REGRESSION
Copyright -The Institute of Chartered Accountants of India


COMMON PROFICIENCY TEST

LEARNING OBJECTIVES

After reading this chapter, a student will be able to understand:
• the meaning of bivariate data and the technique of preparing a bivariate distribution;
• the concept of correlation between two variables and the quantitative measurement of correlation, including the interpretation of positive, negative and zero correlation;
• the concept of regression and its application in estimating a variable from a known set of data.

12.1 INTRODUCTION

In the previous chapter, we discussed many statistical measures relating to univariate distributions, i.e. distributions of one variable such as height, weight, marks, profit, wages and so on. However, there are situations that demand the study of more than one variable simultaneously. A businessman may be keen to know what amount of investment would yield a desired level of profit, or a student may want to know whether performing better in the selection test would enhance his or her chance of doing well in the final examination. To answer questions of this kind, we need to study more than one variable at the same time. Correlation analysis and regression analysis are the two analyses that are made from a multivariate distribution, i.e. a distribution of more than one variable. In particular, when there are two variables, say x and y, we study a bivariate distribution. We restrict our discussion to bivariate distributions only.

Correlation analysis helps us to find an association, or the lack of it, between the two variables x and y. Thus if x and y stand for profit and investment of a firm, or the marks in Statistics and Mathematics for a group of students, then we may be interested to know whether x and y are associated or independent of each other. The extent or amount of correlation between x and y is provided by different measures of correlation, namely the Product Moment Correlation Coefficient, the Rank Correlation Coefficient and the Coefficient of Concurrent Deviations. In correlation analysis, we must be careful about inferring a cause and effect relation between the variables under consideration, because there may be situations where x and y are related due to the influence of a third variable although no causal relationship exists between the two variables.

Regression analysis, on the other hand, is concerned with predicting the value of the dependent variable corresponding to a known value of the independent variable, on the assumption of a mathematical relationship between the two variables and also an average relationship between them.

12.2 BIVARIATE DATA

When data are collected on two variables simultaneously, they are known as bivariate data, and the corresponding frequency distribution derived from them is known as a Bivariate Frequency Distribution. If x and y denote marks in Maths and Stats for a group of 30 students, then the corresponding bivariate data would be $(x_i, y_i)$ for i = 1, 2, …, 30, where $(x_1, y_1)$ denotes the marks in Maths and Stats for the student with serial number or Roll Number 1, $(x_2, y_2)$ that for the student with Roll Number 2, and so on, and lastly $(x_{30}, y_{30})$ denotes the pair of marks for the student bearing Roll Number 30.


Table 12.1
Bivariate Frequency Distribution of Marks of Statistics and Mathematics

| Marks in Stats (X) \ Marks in Maths (Y) | 0-4 | 4-8 | 8-12 | 12-16 | 16-20 | Total |
|---|---|---|---|---|---|---|
| 0-4 | 1 | 1 | 2 | – | – | 4 |
| 4-8 | 1 | 4 | 5 | 1 | 1 | 12 |
| 8-12 | 1 | 2 | 4 | 6 | 1 | 14 |
| 12-16 | – | 1 | 3 | 2 | 5 | 11 |
| 16-20 | – | – | 1 | 5 | 3 | 9 |
| Total | 3 | 8 | 15 | 14 | 10 | 50 |

We note, from the above table, that some of the cell frequencies ($f_{ij}$) are zero. Starting from the above Bivariate Frequency Distribution, we can obtain two types of univariate distributions, which are known as:
(a) Marginal distribution.
(b) Conditional distribution.

If we consider the distribution of Stats marks along with the marginal totals presented in the last column of Table 12.1, we get the marginal distribution of marks of Statistics. Similarly, we can obtain one more marginal distribution, that of Mathematics marks. The following table shows the marginal distribution of marks of Statistics.

Table 12.2
Marginal Distribution of Marks of Statistics

| Marks | No. of Students |
|---|---|
| 0-4 | 4 |
| 4-8 | 12 |
| 8-12 | 14 |
| 12-16 | 11 |
| 16-20 | 9 |
| Total | 50 |

We can find the mean and standard deviation of marks of Statistics from Table 12.2. They would be known as marginal mean and marginal SD of stats marks. Similarly, we can obtain the marginal mean and marginal SD of Maths marks. Any other statistical measure in respect of x or y can be computed in a similar manner.
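As a rough illustration of the computation described above, the following Python sketch derives the marginal mean and marginal SD of Stats marks from the frequencies of Table 12.2. It assumes, as is usual for grouped data, that every class is represented by its mid-value; the function name `grouped_mean_sd` is introduced here for illustration only.

```python
from math import sqrt

def grouped_mean_sd(mid_values, frequencies):
    """Mean and SD (divisor N) of grouped data, using class mid-values."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(mid_values, frequencies)) / n
    mean_sq = sum(f * x * x for x, f in zip(mid_values, frequencies)) / n
    return mean, sqrt(mean_sq - mean ** 2)

# Table 12.2: classes 0-4, 4-8, 8-12, 12-16, 16-20 (mid-values are an assumption)
mid_values = [2, 6, 10, 14, 18]
stats_freq = [4, 12, 14, 11, 9]

marginal_mean, marginal_sd = grouped_mean_sd(mid_values, stats_freq)
print(round(marginal_mean, 2), round(marginal_sd, 2))
```

The same function applied to the column totals of the bivariate table would give the marginal mean and SD of Maths marks.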


STATISTICS

If we want to study the distribution of Stats marks for a particular group of students, say for those students who got marks between 8 and 12 in Maths, we come across another univariate distribution known as a conditional distribution.

Table 12.3
Conditional Distribution of Marks in Statistics for Students having Mathematics Marks between 8 and 12

| Marks | No. of Students |
|---|---|
| 0-4 | 2 |
| 4-8 | 5 |
| 8-12 | 4 |
| 12-16 | 3 |
| 16-20 | 1 |
| Total | 15 |

We may obtain the mean and SD from the above table. They would be known as the conditional mean and conditional SD of marks of Statistics. The same holds for marks of Mathematics. In particular, if there are m classifications for x and n classifications for y, then there would be altogether (m + n) conditional distributions.
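The idea of reading a conditional distribution off the bivariate table can be sketched in Python as follows. The cell frequencies are those of the bivariate frequency distribution of Stats and Maths marks, with blank cells taken as zero; the mid-values and the helper name `conditional_on_column` are assumptions made for this sketch.

```python
# Cell frequencies: rows = Stats classes, columns = Maths classes
classes = ["0-4", "4-8", "8-12", "12-16", "16-20"]
mid_values = [2, 6, 10, 14, 18]
joint = [
    [1, 1, 2, 0, 0],
    [1, 4, 5, 1, 1],
    [1, 2, 4, 6, 1],
    [0, 1, 3, 2, 5],
    [0, 0, 1, 5, 3],
]

def conditional_on_column(table, col):
    """Distribution of the row variable for one fixed class of the column variable."""
    return [row[col] for row in table]

# Stats marks for students whose Maths marks lie in the class 8-12
cond_freq = conditional_on_column(joint, classes.index("8-12"))
cond_total = sum(cond_freq)
cond_mean = sum(f * x for x, f in zip(mid_values, cond_freq)) / cond_total

# m row classes and n column classes give m + n conditional distributions
m, n = len(joint), len(joint[0])
print(cond_freq, cond_total, round(cond_mean, 2), m + n)
```

Each column of the table yields one conditional distribution of Stats marks and each row one conditional distribution of Maths marks, which is where the count m + n comes from.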

12.3 CORRELATION ANALYSIS

While studying two variables at the same time, if it is found that a change in one variable is accompanied by a corresponding change in the other variable, either directly or inversely, then the two variables are said to be associated or correlated. Otherwise, the two variables are said to be dissociated or uncorrelated or independent. There are two types of correlation:
(i) Positive correlation
(ii) Negative correlation

If two variables move in the same direction, i.e. an increase (or decrease) in one variable is accompanied by an increase (or decrease) in the other, then the two variables are said to be positively correlated. For example, height and weight, yield and rainfall, and profit and investment are positively correlated.

On the other hand, if the two variables move in opposite directions, i.e. an increase (or decrease) in one variable is accompanied by a decrease (or increase) in the other, then the two variables are said to have a negative correlation. The price and demand of an item, and the profits of an insurance company and the number of claims it has to meet, are examples of variables having a negative correlation.

The two variables are said to be uncorrelated if a movement in one variable does not produce any movement of the other variable in a particular direction. For example, shoe size and intelligence are uncorrelated.


FIGURE 12.3: Showing Negative Correlation (–1 < r < 0)
FIGURE 12.4: Showing Perfect Negative Correlation (r = –1)
FIGURE 12.5: Showing No Correlation (r = 0)
FIGURE 12.6: Showing Curvilinear Correlation (r = 0)

(b) KARL PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT

This is by far the best method for finding the correlation between two variables, provided the relationship between the two variables is linear. Pearson’s correlation coefficient may be defined as the ratio of the covariance between the two variables to the product of the standard deviations of the two variables. If the two variables are denoted by x and y and if the corresponding bivariate data are $(x_i, y_i)$ for i = 1, 2, 3, …, n, then the coefficient of correlation between x and y, due to Karl Pearson, is given by:


$$r = r_{xy} = \frac{\operatorname{Cov}(x, y)}{S_x \times S_y} \qquad (12.1)$$

where

$$\operatorname{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n} = \frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y} \qquad (12.2)$$

$$S_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2} \qquad (12.3)$$

and

$$S_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n}} = \sqrt{\frac{\sum y_i^2}{n} - \bar{y}^2} \qquad (12.4)$$

A single formula for computing the correlation coefficient is given by

$$r = \frac{n\sum x_i y_i - \sum x_i \times \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\;\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}} \qquad (12.5)$$

In case of a bivariate frequency distribution, we have

$$\operatorname{Cov}(x, y) = \frac{\sum_{i,j} x_i y_j f_{ij}}{N} - \bar{x} \times \bar{y}$$

$$S_x = \sqrt{\frac{\sum_i f_{io}\, x_i^2}{N} - \bar{x}^2}$$

and

$$S_y = \sqrt{\frac{\sum_j f_{oj}\, y_j^2}{N} - \bar{y}^2}$$

where $x_i$ = mid-value of the i-th class interval of x
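To see why the single formula (12.5) agrees with the definition (12.1), one can substitute $\bar{x} = \sum x_i / n$ and $\bar{y} = \sum y_i / n$ into (12.1) and multiply the numerator and the denominator by $n^2$. The following is only a sketch of that step, added for clarity:

```latex
\begin{aligned}
r &= \frac{\dfrac{\sum x_i y_i}{n} - \bar{x}\,\bar{y}}
          {\sqrt{\dfrac{\sum x_i^2}{n} - \bar{x}^2}\;
           \sqrt{\dfrac{\sum y_i^2}{n} - \bar{y}^2}} \\[6pt]
  &= \frac{n^2\left(\dfrac{\sum x_i y_i}{n} - \dfrac{\sum x_i \sum y_i}{n^2}\right)}
          {n\sqrt{\dfrac{\sum x_i^2}{n} - \dfrac{\left(\sum x_i\right)^2}{n^2}}\;\cdot\;
           n\sqrt{\dfrac{\sum y_i^2}{n} - \dfrac{\left(\sum y_i\right)^2}{n^2}}} \\[6pt]
  &= \frac{n\sum x_i y_i - \sum x_i \sum y_i}
          {\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\;
           \sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}
\end{aligned}
```

The last step uses $n\sqrt{A/n - B/n^2} = \sqrt{nA - B}$ for each factor of the denominator.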


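Solution
From the given data, applying (12.5), we have

$$
\begin{aligned}
r &= \frac{n\sum xy - \sum x \times \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \\
  &= \frac{10 \times 220 - 40 \times 50}{\sqrt{10 \times 200 - (40)^2}\;\sqrt{10 \times 262 - (50)^2}} \\
  &= \frac{2200 - 2000}{\sqrt{2000 - 1600}\;\sqrt{2620 - 2500}} \\
  &= \frac{200}{20 \times 10.9545} \\
  &= 0.91
\end{aligned}
$$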

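Thus there is a good amount of positive correlation between the two variables x and y.

Alternately

As given,

$$\bar{x} = \frac{\sum x}{n} = \frac{40}{10} = 4, \qquad \bar{y} = \frac{\sum y}{n} = \frac{50}{10} = 5$$

$$\operatorname{Cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{220}{10} - 4 \times 5 = 2$$

$$S_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{200}{10} - 4^2} = 2$$

$$S_y = \sqrt{\frac{\sum y_i^2}{n} - \bar{y}^2} = \sqrt{\frac{262}{10} - 5^2} = \sqrt{26.2 - 25} = 1.0954$$

Thus, applying formula (12.1), we get

$$r = \frac{\operatorname{cov}(x, y)}{S_x \cdot S_y} = \frac{2}{2 \times 1.0954} = 0.91$$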
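The two routes taken in this solution can be checked against each other numerically. The short Python sketch below uses only the summary figures quoted in the solution (n = 10, Σx = 40, Σy = 50, Σx² = 200, Σy² = 262, Σxy = 220); the variable names are illustrative.

```python
from math import sqrt, isclose

# Summary figures used in the solution above
n, sum_x, sum_y = 10, 40, 50
sum_x2, sum_y2, sum_xy = 200, 262, 220

# Formula (12.5): single formula in terms of raw sums
r_single = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
)

# Formula (12.1): covariance divided by the product of the two SDs
mean_x, mean_y = sum_x / n, sum_y / n
cov_xy = sum_xy / n - mean_x * mean_y
sd_x = sqrt(sum_x2 / n - mean_x ** 2)
sd_y = sqrt(sum_y2 / n - mean_y ** 2)
r_ratio = cov_xy / (sd_x * sd_y)

print(round(r_single, 4), round(r_ratio, 4))
```

Both expressions should agree up to floating-point rounding, since (12.5) is an algebraic rearrangement of (12.1).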

As before, we draw the same conclusion.

Example 12.3
Find the product moment correlation coefficient from the following information:

X : 2 3 5 5 6 8
Y : 9 8 8 6 5 3

Solution
In order to find the covariance and the two standard deviations, we prepare the following table:

Table 12.3
Computation of Correlation Coefficient

| $x_i$ (1) | $y_i$ (2) | $x_i y_i$ (3) = (1) × (2) | $x_i^2$ (4) = (1)² | $y_i^2$ (5) = (2)² |
|---|---|---|---|---|
| 2 | 9 | 18 | 4 | 81 |
| 3 | 8 | 24 | 9 | 64 |
| 5 | 8 | 40 | 25 | 64 |
| 5 | 6 | 30 | 25 | 36 |
| 6 | 5 | 30 | 36 | 25 |
| 8 | 3 | 24 | 64 | 9 |
| 29 | 39 | 166 | 163 | 279 |
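The computation table for this example can be reproduced with a few lines of Python. This is a minimal sketch: it rebuilds the product and square columns from the X and Y values given in the example, and then applies the raw-sum formula (12.5) to the column totals.

```python
from math import sqrt

x = [2, 3, 5, 5, 6, 8]
y = [9, 8, 8, 6, 5, 3]

# Columns (3), (4) and (5) of the computation table
xy = [a * b for a, b in zip(x, y)]
x2 = [a * a for a in x]
y2 = [b * b for b in y]

n = len(x)
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = sum(x), sum(y), sum(xy), sum(x2), sum(y2)

# Formula (12.5) applied to the column totals
r = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
)
print((sum_x, sum_y, sum_xy, sum_x2, sum_y2), round(r, 4))
```

Since the numerator nΣxy – ΣxΣy is negative for these totals, the coefficient comes out negative, in line with Y falling as X rises.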


Example 12.4
The following data relate to the test scores obtained by eight salesmen in an aptitude test and their daily sales in thousands of rupees:

Salesman : 1 2 3 4 5 6 7 8
Scores : 60 55 62 56 62 64 70 54
Sales : 31 28 26 24 30 35 28 24

Solution
Let the scores and sales be denoted by x and y respectively. We take a, the origin of x, as the average of the two extreme values, i.e. 54 and 70. Hence a = 62. Similarly, the origin of y is taken as

$$\frac{24 + 35}{2} \cong 30$$

Table 12.4
Computation of Correlation Coefficient Between Test Scores and Sales

| Scores $x_i$ (1) | Sales in Rs. 1000 $y_i$ (2) | $u_i = x_i - 62$ (3) | $v_i = y_i - 30$ (4) | $u_i v_i$ (5) = (3) × (4) | $u_i^2$ (6) = (3)² | $v_i^2$ (7) = (4)² |
|---|---|---|---|---|---|---|
| 60 | 31 | –2 | 1 | –2 | 4 | 1 |
| 55 | 28 | –7 | –2 | 14 | 49 | 4 |
| 62 | 26 | 0 | –4 | 0 | 0 | 16 |
| 56 | 24 | –6 | –6 | 36 | 36 | 36 |
| 62 | 30 | 0 | 0 | 0 | 0 | 0 |
| 64 | 35 | 2 | 5 | 10 | 4 | 25 |
| 70 | 28 | 8 | –2 | –16 | 64 | 4 |
| 54 | 24 | –8 | –6 | 48 | 64 | 36 |
| Total | | –13 | –14 | 90 | 221 | 122 |

Since the correlation coefficient remains unchanged due to a change of origin, we have

$$
\begin{aligned}
r = r_{xy} = r_{uv} &= \frac{n\sum u_i v_i - \sum u_i \times \sum v_i}{\sqrt{n\sum u_i^2 - \left(\sum u_i\right)^2}\;\sqrt{n\sum v_i^2 - \left(\sum v_i\right)^2}} \\
&= \frac{8 \times 90 - (-13) \times (-14)}{\sqrt{8 \times 221 - (-13)^2}\;\sqrt{8 \times 122 - (-14)^2}} \\
&= \frac{538}{\sqrt{1768 - 169}\;\sqrt{976 - 196}} \\
&= 0.48
\end{aligned}
$$
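The claim that a change of origin leaves the correlation coefficient unchanged can be checked directly on the data of this example. In the sketch below, `pearson_r` is a hypothetical helper implementing the raw-sum formula (12.5); the origins 62 and 30 are the ones chosen in the solution.

```python
from math import sqrt, isclose

def pearson_r(x, y):
    """Product moment correlation coefficient via the raw-sum formula (12.5)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

scores = [60, 55, 62, 56, 62, 64, 70, 54]
sales = [31, 28, 26, 24, 30, 35, 28, 24]

# Shift both series to the origins used in the solution (62 and 30)
u = [x - 62 for x in scores]
v = [y - 30 for y in sales]

r_xy = pearson_r(scores, sales)   # on the original values
r_uv = pearson_r(u, v)            # on the shifted values
print(round(r_xy, 4), round(r_uv, 4))
```

Working with u and v keeps the intermediate sums small, which is the practical reason for shifting the origin when computing by hand.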


In some cases, there may be some confusion about selecting the pair of variables for which correlation is wanted. This is explained in the following problem.

Example 12.5
Examine whether there is any correlation between age and blindness on the basis of the following data:

| Age in years | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 | 50-60 | 60-70 | 70-80 |
|---|---|---|---|---|---|---|---|---|
| No. of persons (in thousands) | 90 | 120 | 140 | 100 | 80 | 60 | 40 | 20 |
| No. of blind persons | 10 | 15 | 18 | 20 | 15 | 12 | 10 | 6 |

Solution
Let us denote the mid-value of age in years as x and the number of blind persons per lakh as y. Then, as before, we compute the correlation coefficient between x and y.

Table 12.5
Computation of correlation between age and blindness

| Age in years (1) | Mid-value x (2) | No. of persons ('000) P (3) | No. of blind B (4) | No. of blind per lakh y = B/P × 1 lakh (5) | xy (6) = (2) × (5) | x² (7) = (2)² | y² (8) = (5)² |
|---|---|---|---|---|---|---|---|
| 0-10 | 5 | 90 | 10 | 11 | 55 | 25 | 121 |
| 10-20 | 15 | 120 | 15 | 12 | 180 | 225 | 144 |
| 20-30 | 25 | 140 | 18 | 13 | 325 | 625 | 169 |
| 30-40 | 35 | 100 | 20 | 20 | 700 | 1225 | 400 |
| 40-50 | 45 | 80 | 15 | 19 | 855 | 2025 | 361 |
| 50-60 | 55 | 60 | 12 | 20 | 1100 | 3025 | 400 |
| 60-70 | 65 | 40 | 10 | 25 | 1625 | 4225 | 625 |
| 70-80 | 75 | 20 | 6 | 30 | 2250 | 5625 | 900 |
| Total | 320 | | | 150 | 7090 | 17000 | 3120 |
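The point of this example, that age should be paired with the rate of blindness rather than with the raw number of blind persons, can be illustrated with a short Python sketch. It uses unrounded rates per lakh (100,000), so its result may differ slightly from a hand computation based on the rounded y column; `pearson_r` is a hypothetical helper for formula (12.5).

```python
from math import sqrt

def pearson_r(x, y):
    """Product moment correlation coefficient via the raw-sum formula (12.5)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

mid_age = [5, 15, 25, 35, 45, 55, 65, 75]            # x: class mid-values
persons_thousands = [90, 120, 140, 100, 80, 60, 40, 20]
blind = [10, 15, 18, 20, 15, 12, 10, 6]

# Blind persons per lakh of population in each age group (unrounded)
per_lakh = [b / (p * 1000) * 100_000 for b, p in zip(blind, persons_thousands)]

r_rate = pearson_r(mid_age, per_lakh)   # age against the blindness rate
r_count = pearson_r(mid_age, blind)     # age against raw counts, for contrast
print(round(r_rate, 2), round(r_count, 2))
```

The raw counts fall in the older groups simply because those groups contain fewer people, so correlating age with counts gives a misleading sign; the rate per lakh removes that effect.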


$$\frac{\sum y^2}{20} - 15^2 = 16 \quad\Rightarrow\quad \sum y^2 = 4820$$

Thus corrected ∑x = n x̄ – wrong x value + correct x value = 20 × 12 – 15 + 20 = 245

Similarly corrected ∑y = 20 × 15 – 20 + 15 = 295

Corrected ∑x² = 3060 – 15² + 20² = 3235

Corrected ∑y² = 4820 – 20² + 15² = 4645

Thus the corrected value of the correlation coefficient, by applying formula (12.5), is

$$r = \frac{20\sum xy - 245 \times 295}{\sqrt{20 \times 3235 - (245)^2}\;\sqrt{20 \times 4645 - (295)^2}} = \frac{20\sum xy - 245 \times 295}{68.3740 \times 76.6485}$$

where ∑xy denotes the corrected sum of products.
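The bookkeeping for correcting the sums can be sketched as a small Python helper. Only the figures that appear in the working above are used (n = 20, means 12 and 15, uncorrected ∑x² = 3060 and ∑y² = 4820, wrong pair (15, 20) replaced by (20, 15)); the function name `corrected_sums` is an assumption made for illustration.

```python
def corrected_sums(n, mean, sum_sq, wrong, correct):
    """Replace one wrongly recorded value in a sum and in a sum of squares."""
    total = n * mean - wrong + correct
    total_sq = sum_sq - wrong ** 2 + correct ** 2
    return total, total_sq

# x was recorded as 15 instead of 20; y was recorded as 20 instead of 15
sum_x, sum_x2 = corrected_sums(20, 12, 3060, wrong=15, correct=20)
sum_y, sum_y2 = corrected_sums(20, 15, 4820, wrong=20, correct=15)
print(sum_x, sum_y, sum_x2, sum_y2)
```

The sum of products would be corrected in the same spirit, by subtracting the wrong product and adding the correct one.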

Example 12.7
Compute the coefficient of correlation between marks in Stats and Maths for the bivariate frequency distribution shown in Table 12.1.

Solution
For the sake of computational advantage, we effect a change of origin and scale for both the variables x and y.

Define

$$u_i = \frac{x_i - a}{b} = \frac{x_i - 10}{4}$$

and

$$v_j = \frac{y_j - c}{d} = \frac{y_j - 10}{4}$$

where $x_i$ and $y_j$ denote the mid-values of the x-class interval and the y-class interval respectively. The following table shows the necessary calculation. Next to each cell frequency, the product of the cell frequency, the corresponding u value and the respective v value is shown in brackets. These products add up in a particular row or column to provide the value of $\sum f_{ij} u_i v_j$ for that particular row or column.

Table 12.6
Computation of Correlation Coefficient Between Marks of Maths and Stats

| Stats class interval | Mid-value $x_i$ | $u_i$ | Maths 0-4 (mid-value 2, $v_j$ = –2) | Maths 4-8 (6, –1) | Maths 8-12 (10, 0) | Maths 12-16 (14, 1) | Maths 16-20 (18, 2) | $f_{io}$ | $f_{io} u_i$ | $f_{io} u_i^2$ | $\sum_j f_{ij} u_i v_j$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0-4 | 2 | –2 | 1 (4) | 1 (2) | 2 (0) | | | 4 | –8 | 16 | 6 |
| 4-8 | 6 | –1 | 2 (4) | 4 (4) | 5 (0) | 1 (–1) | 1 (–2) | 13 | –13 | 13 | 5 |
| 8-12 | 10 | 0 | | 2 (0) | 4 (0) | 6 (0) | 1 (0) | 13 | 0 | 0 | 0 |
| 12-16 | 14 | 1 | | 1 (–1) | 3 (0) | 2 (2) | 5 (10) | 11 | 11 | 11 | 11 |
| 16-20 | 18 | 2 | | | 1 (0) | 5 (10) | 3 (12) | 9 | 18 | 36 | 22 |
| $f_{oj}$ | | | 3 | 8 | 15 | 14 | 10 | 50 | 8 | 76 | 44 |
| $f_{oj} v_j$ | | | –6 | –8 | 0 | 14 | 20 | 20 | | | |
| $f_{oj} v_j^2$ | | | 12 | 8 | 0 | 14 | 40 | 74 | | | |
| $\sum_i f_{ij} u_i v_j$ | | | 8 | 5 | 0 | 11 | 20 | 44 (check) | | | |

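A single formula for computing the correlation coefficient from a bivariate frequency distribution is given by

$$r = \frac{N\sum_{i,j} f_{ij} u_i v_j - \sum_i f_{io} u_i \times \sum_j f_{oj} v_j}{\sqrt{N\sum_i f_{io} u_i^2 - \left(\sum_i f_{io} u_i\right)^2}\;\sqrt{N\sum_j f_{oj} v_j^2 - \left(\sum_j f_{oj} v_j\right)^2}} \qquad (12.10)$$

$$
\begin{aligned}
&= \frac{50 \times 44 - 8 \times 20}{\sqrt{50 \times 76 - 8^2}\;\sqrt{50 \times 74 - 20^2}} \\
&= \frac{2040}{61.1228 \times 57.4456} \\
&= 0.58
\end{aligned}
$$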
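Formula (12.10) lends itself to a mechanical check. The Python sketch below takes the cell frequencies as they are laid out in the computation table for this example (blank cells as zero), codes the mid-values with u = (x – 10)/4 and v = (y – 10)/4, and accumulates the five sums the formula needs. Variable names are illustrative.

```python
from math import sqrt

# Cell frequencies from the computation table: rows = Stats classes, columns = Maths classes
freq = [
    [1, 1, 2, 0, 0],
    [2, 4, 5, 1, 1],
    [0, 2, 4, 6, 1],
    [0, 1, 3, 2, 5],
    [0, 0, 1, 5, 3],
]
mid = [2, 6, 10, 14, 18]
u = [(x - 10) / 4 for x in mid]   # coded Stats mid-values
v = [(y - 10) / 4 for y in mid]   # coded Maths mid-values

N = sum(sum(row) for row in freq)
f_row = [sum(row) for row in freq]          # f_io, row totals
f_col = [sum(col) for col in zip(*freq)]    # f_oj, column totals

sum_fu = sum(f * a for f, a in zip(f_row, u))
sum_fu2 = sum(f * a * a for f, a in zip(f_row, u))
sum_fv = sum(f * b for f, b in zip(f_col, v))
sum_fv2 = sum(f * b * b for f, b in zip(f_col, v))
sum_fuv = sum(freq[i][j] * u[i] * v[j] for i in range(5) for j in range(5))

# Formula (12.10)
r = (N * sum_fuv - sum_fu * sum_fv) / (
    sqrt(N * sum_fu2 - sum_fu ** 2) * sqrt(N * sum_fv2 - sum_fv ** 2)
)
print(N, sum_fu, sum_fv, sum_fu2, sum_fv2, sum_fuv, round(r, 2))
```

Because the correlation coefficient is unaffected by a change of origin and by a change of scale with positive multipliers, the value obtained for u and v is also the value for the original marks.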

The value of r shows a good amount of positive correlation between the marks in Statistics and Mathematics on the basis of the given data.

Example 12.8
Given that the correlation coefficient between x and y is 0.8, write down the correlation coefficient between u and v where
(i) 2u + 3x + 4 = 0 and 4v + 16y + 11 = 0
(ii) 2u – 3x + 4 = 0 and 4v + 16y + 11 = 0
(iii) 2u – 3x + 4 = 0 and 4v – 16y + 11 = 0
(iv) 2u + 3x + 4 = 0 and 4v – 16y + 11 = 0
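One way to approach this example is through the standard property that, for u = a + bx and v = c + dy, the correlation between u and v equals the correlation between x and y in magnitude and takes the sign of the product bd. That property is not stated in the portion of the text above, and the sketch below additionally assumes that the second relation in each pair links v with y. The helper names are illustrative.

```python
def sign(value):
    """+1 for a positive number, -1 for a negative one."""
    return 1 if value > 0 else -1

def r_after_linear_change(r_xy, slope_u, slope_v):
    """Correlation of u = a + b*x and v = c + d*y, given r_xy: sign(b*d) * r_xy."""
    return sign(slope_u * slope_v) * r_xy

r_xy = 0.8

# Slopes found by solving each relation for u and v,
# e.g. 2u + 3x + 4 = 0 gives u = -2 - 1.5x, so the slope is -3/2
cases = {
    "(i)": (-3 / 2, -16 / 4),
    "(ii)": (3 / 2, -16 / 4),
    "(iii)": (3 / 2, 16 / 4),
    "(iv)": (-3 / 2, 16 / 4),
}
results = {label: r_after_linear_change(r_xy, b, d) for label, (b, d) in cases.items()}
print(results)
```

Only the signs of the two slopes matter here; the constants and the sizes of the multipliers drop out.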


$$r_R = 1 - \frac{6\left[\sum_i d_i^2 + \sum_j \dfrac{t_j^3 - t_j}{12}\right]}{n\left(n^2 - 1\right)}$$

In this formula, $t_j$ represents the j-th tie length and the summation $\sum_j (t_j^3 - t_j)$ extends over the lengths of all the ties for both the series.

Example 12.9
Compute the coefficient of rank correlation between sales and advertisement, expressed in thousands of rupees, from the following data:

Sales : 90 85 68 75 82 80 95 70
Advertisement : 7 6 2 3 4 5 8 1

Solution
Let the rank given to sales be denoted by x and the rank of advertisement be denoted by y. We note that since the highest sales figure in the data is 95, it is to be given rank 1, the second highest sales figure, 90, is to be given rank 2, and finally rank 8 goes to the lowest sales figure, namely 68. We have ranked the other variable, advertisement, in a similar manner. Since there are no ties, we apply formula (12.11).

Table 12.7
Computation of Rank Correlation between Sales and Advertisement

| Sales | Advertisement | Rank for Sales ($x_i$) | Rank for Advertisement ($y_i$) | $d_i = x_i - y_i$ | $d_i^2$ |
|---|---|---|---|---|---|
| 90 | 7 | 2 | 2 | 0 | 0 |
| 85 | 6 | 3 | 3 | 0 | 0 |
| 68 | 2 | 8 | 7 | 1 | 1 |
| 75 | 3 | 6 | 6 | 0 | 0 |
| 82 | 4 | 4 | 5 | –1 | 1 |
| 80 | 5 | 5 | 4 | 1 | 1 |
| 95 | 8 | 1 | 1 | 0 | 0 |
| 70 | 1 | 7 | 8 | –1 | 1 |
| Total | | | | 0 | 4 |


Since n = 8 and $\sum d_i^2 = 4$, applying formula (12.11), we get

$$r_R = 1 - \frac{6\sum d_i^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 4}{8\left(8^2 - 1\right)} = 0.95$$
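The ranking step and formula (12.11) can be sketched together in Python. The helper `ranks_descending` is a hypothetical function that gives rank 1 to the largest value and assumes there are no ties, as is the case for these data.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; valid only when there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_no_ties(rank_x, rank_y):
    """Rank correlation by formula (12.11): 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

sales = [90, 85, 68, 75, 82, 80, 95, 70]
advertisement = [7, 6, 2, 3, 4, 5, 8, 1]

rank_sales = ranks_descending(sales)
rank_adv = ranks_descending(advertisement)
r_rank = spearman_no_ties(rank_sales, rank_adv)
print(rank_sales, rank_adv, round(r_rank, 2))
```

With tied values, this ranking helper would no longer be appropriate, and the tie-adjusted formula would be needed instead.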

The high positive value of the rank correlation coefficient indicates that there is a very good amount of agreement between sales and advertisement.

Example 12.10
Compute rank correlation from the following data relating to ranks given by two judges in a contest:

Serial No. of Candidate : 1 2 3 4 5 6 7 8 9 10
Rank by Judge A : 10 5 6 1 2 3 4 7 8 9
Rank by Judge B : 5 6 9 2 8 7 3 4 10 1

Solution
We directly apply formula (12.11) as ranks are already given.

Table 12.8
Computation of Rank Correlation Coefficient between the ranks given by 2 Judges

| Serial No. | Rank by A ($x_i$) | Rank by B ($y_i$) | $d_i = x_i - y_i$ | $d_i^2$ |
|---|---|---|---|---|
| 1 | 10 | 5 | 5 | 25 |
| 2 | 5 | 6 | –1 | 1 |
| 3 | 6 | 9 | –3 | 9 |
| 4 | 1 | 2 | –1 | 1 |
| 5 | 2 | 8 | –6 | 36 |
| 6 | 3 | 7 | –4 | 16 |
| 7 | 4 | 3 | 1 | 1 |
| 8 | 7 | 4 | 3 | 9 |
| 9 | 8 | 10 | –2 | 4 |
| 10 | 9 | 1 | 8 | 64 |
| Total | | | 0 | 166 |
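When the ranks are already given, the whole computation reduces to the difference columns and formula (12.11). The sketch below uses the ranks as they are tabulated in Table 12.8; variable names are illustrative.

```python
rank_a = [10, 5, 6, 1, 2, 3, 4, 7, 8, 9]   # Judge A, as tabulated
rank_b = [5, 6, 9, 2, 8, 7, 3, 4, 10, 1]   # Judge B, as tabulated

d = [a - b for a, b in zip(rank_a, rank_b)]
sum_d = sum(d)                       # zero whenever both series are complete rankings
sum_d2 = sum(di ** 2 for di in d)

n = len(rank_a)
r_rank = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))   # formula (12.11)
print(sum_d, sum_d2, round(r_rank, 4))
```

The zero total of the differences is a useful arithmetic check on the table before the squares are summed.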