Bivariate Data Summary: Analyzing the Relationship between House Price and House Size, Lecture notes of Linear Algebra

A chapter from 'Analysis of Economics Data' by A. Colin Cameron of the University of California, Davis. It discusses methods for summarizing the relationship between two variables, using the example of house price and house size. Topics covered include descriptive statistics, two-way tabulation, scatterplots, correlation, and regression analysis.

Typology: Lecture notes

2021/2022

Uploaded on 07/04/2022

Analysis of Economics Data
Chapter 5: Bivariate Data Summary
© A. Colin Cameron
Univ. of Calif. Davis
March 2021


CHAPTER 5: Bivariate Data Summary

Relationship between two variables:
  - e.g. earnings and education
  - e.g. house price and house size.

In principle the two variables should be treated equally.

In practice one variable is often viewed as being caused by another variable
  - additional information is needed to determine causation.

Notation is that of mathematics
  - variable y is a function of variable x.

5.1 Example: House Price and Size

House price and size for a sample of 29 houses
  - control for location by considering a homogeneous housing market
  - central Davis in 1999
  - eyeballing the data, price appears to be higher when size is larger.

Sale Price   Sq. Feet     Sale Price   Sq. Feet     Sale Price   Sq. Feet
375,000      3,300        255,000      1,500        235,000      1,
340,000      2,400        253,000      2,100        233,000      1,
310,000      2,300        249,000      1,900        230,000      2,
279,900      2,000        245,000      1,400        229,000      1,
278,500      2,600        244,000      2,000        224,500      2,
273,000      1,900        241,000      1,600        220,000      1,
272,000      1,800        239,500      1,600        213,000      1,
270,000      2,000        238,000      1,900        212,000      1,
270,000      1,800        236,500      1,600        204,000      1,
258,500      1,600        235,000      1,


Summary Statistics

House sale price ranges from $204,000 to $375,000
  - mean $253,910 and standard deviation $37,391.

House size ranges from 1,400 to 3,300 square feet
  - mean 1,883 square feet and standard deviation 398 square feet.

Statistic                   Sale Price   Square Feet
Mean                        253,910      1,883
Standard deviation          37,391       398
Standard error              6,943        74
Maximum                     375,000      3,300
Median (50th percentile)    244,000      1,
Minimum                     204,000      1,400
Skewness                    1.56         1.
Kurtosis                    5.61         6.
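The standard errors in the table are consistent with the usual formula for the standard error of a sample mean, s/√n, with n = 29. The following is a minimal sketch of that check; it assumes this is how the table entries were obtained, which the notes do not state explicitly, and the function name is illustrative.

```python
import math

n = 29  # sample size stated in the notes

# Standard deviations as reported in the summary table
sd_price = 37_391
sd_size = 398


def standard_error(sd, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


se_price = standard_error(sd_price, n)
se_size = standard_error(sd_size, n)

print(round(se_price))  # compare with 6,943 in the table
print(round(se_size))   # compare with 74 in the table
```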

5.2 Two-way Tabulation

A two-way tabulation or cross tabulation of variables x and y lists the number (or fraction) of observations equal to each of the distinct values taken by the pair (x, y).

Useful if the variables x and y take relatively few values
  - categorical data with few categories
  - discrete numerical data taking a few values
  - for continuous numerical data, convert to a few ranges.

For the house price and size data create
  - pricerange: low (price < $250,000) or high (price ≥ $250,000)
  - sizerange: small (size < 1,800), medium (1,800 ≤ size < 2,400) or large (size ≥ 2,400).
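As a rough illustration of converting continuous data to ranges and counting pairs, the sketch below tabulates only the 19 houses whose size is fully legible in the data listing, so its low-price row covers part of the sample only. It assumes the low/high cutoff is $250,000; the function and variable names are illustrative.

```python
from collections import Counter

# (sale price, square feet) for the 19 houses with a fully legible size
# in the data listing; the 10 houses with truncated sizes are left out.
houses = [
    (375_000, 3_300), (340_000, 2_400), (310_000, 2_300), (279_900, 2_000),
    (278_500, 2_600), (273_000, 1_900), (272_000, 1_800), (270_000, 2_000),
    (270_000, 1_800), (258_500, 1_600), (255_000, 1_500), (253_000, 2_100),
    (249_000, 1_900), (245_000, 1_400), (244_000, 2_000), (241_000, 1_600),
    (239_500, 1_600), (238_000, 1_900), (236_500, 1_600),
]


def price_range(price):
    # Assumed cutoff: low is below $250,000, high is $250,000 or more
    return "low" if price < 250_000 else "high"


def size_range(size):
    if size < 1_800:
        return "small"
    if size < 2_400:
        return "medium"
    return "large"


# Count observations for each (pricerange, sizerange) pair
counts = Counter((price_range(p), size_range(s)) for p, s in houses)

for pr in ("low", "high"):
    row = [counts[(pr, sr)] for sr in ("small", "medium", "large")]
    print(pr, row, sum(row))
```

All 12 high-priced houses have legible sizes, so the high row here should agree with the full-sample tabulation; the low row cannot, because most of the omitted houses are low-priced.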


Two-way Tabulation (continued)

Main entry: number of observations with a given price-size combination
  - e.g. there were 11 houses of low price and small size.

                      Size range
Price range    Small   Medium   Large   Total
Low            11      6        0       17
High           2       7        3       12
Total          13      13       3       29

Table also includes row sums and column sums
  - e.g. total in row for low price range is 11 + 6 + 0 = 17 observations.

Table includes a second optional entry, a row percentage
  - for each value of pricerange gives % of observations in each of the size ranges
  - e.g. low-priced: small = 11 out of 17 = 100 × 11/17 = 64.71%.

Can also include similarly constructed column percentages.
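The row percentages and column sums can be reproduced directly from the counts in the table. This is a small sketch using only those counts; the dictionary layout and function name are illustrative choices.

```python
# Cell counts from the two-way table: (small, medium, large) per price range
counts = {"low": (11, 6, 0), "high": (2, 7, 3)}


def row_percentages(row):
    """Percentage of a row's observations falling in each size range."""
    total = sum(row)
    return [round(100 * c / total, 2) for c in row]


row_pct = {price: row_percentages(row) for price, row in counts.items()}

# Column sums: add the low and high counts within each size range
col_totals = [sum(col) for col in zip(*counts.values())]

print(row_pct["low"])   # first entry is 64.71, as in the notes
print(col_totals)       # [13, 13, 3]
```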

5.3 Two-way Scatterplot

Standard visual method is a two-way scatter plot
  - first panel shows house price increases with house size.

[Figure: two scatterplot panels of house price (in dollars) against house size (in square feet); one panel also carries the quadrant labels (-), (+), (+), (-).]

5.4 Correlation

Sample Correlation

Correlation coefficient is a standard way to measure association between x and y
  - it is unit-free, ranging from -1 to 1.

The sample correlation coefficient is defined by

\[
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \times \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]

Then \( -1 \le r_{xy} \le 1 \) with

  \( r_{xy} = 1 \)         perfect positive correlation
  \( 0 < r_{xy} < 1 \)     positive correlation
  \( r_{xy} = 0 \)         no correlation
  \( -1 < r_{xy} < 0 \)    negative correlation
  \( r_{xy} = -1 \)        perfect negative correlation

For the house price and size data: \( r_{xy} = 0.786 \).
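The defining formula translates directly into code. The sketch below applies it to the 19 houses whose size is fully legible in the data listing, so its result is not expected to equal the full-sample value of 0.786; the function name is illustrative.

```python
import math

# (sale price, square feet) for the 19 houses with a fully legible size
# in the data listing; the 10 houses with truncated sizes are left out.
houses = [
    (375_000, 3_300), (340_000, 2_400), (310_000, 2_300), (279_900, 2_000),
    (278_500, 2_600), (273_000, 1_900), (272_000, 1_800), (270_000, 2_000),
    (270_000, 1_800), (258_500, 1_600), (255_000, 1_500), (253_000, 2_100),
    (249_000, 1_900), (245_000, 1_400), (244_000, 2_000), (241_000, 1_600),
    (239_500, 1_600), (238_000, 1_900), (236_500, 1_600),
]
price = [p for p, _ in houses]
size = [s for _, s in houses]


def sample_correlation(x, y):
    """Sum of cross-deviations over the square root of the product
    of the two sums of squared deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


r = sample_correlation(size, price)
print(round(r, 3))  # subsample value, lies between -1 and 1
```

Because the measure is unit-free, rescaling either variable (for example, price in thousands of dollars) leaves the result unchanged.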

The sample correlation coefficient can equivalently be written as

\[
r_{xy} = \frac{s_{xy}}{s_x s_y}
       = \frac{\text{Covariance of } x \text{ and } y}{(\text{Standard deviation of } x) \times (\text{Standard deviation of } y)}
       = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \times \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]

The correlation coefficient is the covariance between the standardized versions of x and y, \( (x - \bar{x})/s_x \) and \( (y - \bar{y})/s_y \).
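The step from the covariance form to the sums form can be sketched as follows, assuming the sample covariance and standard deviations use the usual n - 1 divisor (the notes do not state the divisor here; any common divisor cancels in the same way).

```latex
\begin{align*}
s_{xy} &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), \qquad
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2} \\[4pt]
\frac{s_{xy}}{s_x s_y}
&= \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
        {\frac{1}{n-1}\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \times \sum_{i=1}^{n}(y_i-\bar{y})^2}}
 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
        {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \times \sum_{i=1}^{n}(y_i-\bar{y})^2}} \\[4pt]
\frac{1}{n-1}\sum_{i=1}^{n}\frac{x_i-\bar{x}}{s_x}\cdot\frac{y_i-\bar{y}}{s_y}
&= \frac{s_{xy}}{s_x s_y} = r_{xy}
\end{align*}
```

The last line is the covariance of the standardized variables, each of which has mean zero.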


Four Examples of Strength of Correlation

(1) strong positive correlation; (2) moderate positive correlation; (3) almost zero correlation; and (4) moderate negative correlation. There are, though, no clear cutoffs for "weak", "moderate" and "strong" correlation.

[Figure: four scatterplots of y against x, one for each example, each labeled with its sample correlation r.]

5.5 Regression Line

This is the key method in the analysis of economics data.

The regression line from regression of y on x is denoted

\[ \hat{y} = b_1 + b_2 x, \]

where
  - y is called the dependent variable
  - \( \hat{y} \) is the predicted value or fitted value of the dependent variable
  - x is the independent variable or explanatory variable or regressor variable or covariate
  - \( b_1 \) is the estimated y-axis intercept
  - \( b_2 \) is the estimated slope coefficient.


The Residual

Residual e is the difference between the actual value of y and the predicted value \( \hat{y} \):

\[ e = y - \hat{y}. \]

  - also denoted \( \hat{u} = y - \hat{y} \).

Least Squares Regression

Choosing \( b_1 \) and \( b_2 \) to minimize the sum of squared residuals is a calculus problem
  - differentiate with respect to \( b_1 \) and \( b_2 \)
  - set the two derivatives equal to zero
  - solve two equations in two unknowns for \( b_1 \) and \( b_2 \)
  - algebra is skipped.
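For readers who want the skipped algebra, one way to carry out these steps is sketched below, assuming the least squares objective is the sum of squared residuals \( S(b_1, b_2) \); the symbol S is introduced here only for illustration.

```latex
\begin{align*}
S(b_1, b_2) &= \sum_{i=1}^{n} (y_i - b_1 - b_2 x_i)^2 \\[4pt]
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} (y_i - b_1 - b_2 x_i) = 0
  \quad\Longrightarrow\quad b_1 = \bar{y} - b_2 \bar{x} \\[4pt]
\frac{\partial S}{\partial b_2} &= -2 \sum_{i=1}^{n} x_i (y_i - b_1 - b_2 x_i) = 0 \\[4pt]
\text{Substituting } b_1: \quad
0 &= \sum_{i=1}^{n} x_i \left[ (y_i - \bar{y}) - b_2 (x_i - \bar{x}) \right] \\
  &= \sum_{i=1}^{n} (x_i - \bar{x}) \left[ (y_i - \bar{y}) - b_2 (x_i - \bar{x}) \right]
  \quad \text{since } \sum_{i=1}^{n} (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \\[4pt]
\Longrightarrow \quad
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) &= b_2 \sum_{i=1}^{n} (x_i - \bar{x})^2
\end{align*}
```

Dividing the last line by the sum of squared deviations of x gives the slope, and the first-order condition for \( b_1 \) gives the intercept.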

The resulting formula for the least squares slope coefficient is

\[
b_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]

The least squares intercept is

\[ b_1 = \bar{y} - b_2 \bar{x}. \]
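The two formulas can be applied directly, as in the sketch below. It uses only the 19 houses whose size is fully legible in the data listing, so the coefficients it prints are subsample values rather than the full-sample estimates; the function name is illustrative.

```python
# (sale price, square feet) for the 19 houses with a fully legible size
# in the data listing; the 10 houses with truncated sizes are left out.
houses = [
    (375_000, 3_300), (340_000, 2_400), (310_000, 2_300), (279_900, 2_000),
    (278_500, 2_600), (273_000, 1_900), (272_000, 1_800), (270_000, 2_000),
    (270_000, 1_800), (258_500, 1_600), (255_000, 1_500), (253_000, 2_100),
    (249_000, 1_900), (245_000, 1_400), (244_000, 2_000), (241_000, 1_600),
    (239_500, 1_600), (238_000, 1_900), (236_500, 1_600),
]
y = [p for p, _ in houses]  # dependent variable: sale price
x = [s for _, s in houses]  # regressor: size in square feet


def least_squares(x, y):
    """Return (b1, b2) from the least squares slope and intercept formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    return b1, b2


b1, b2 = least_squares(x, y)

# Fitted values and residuals e = y - y_hat
fitted = [b1 + b2 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(round(b1), round(b2, 2))
print(abs(sum(residuals)) < 1e-6)  # residuals sum to zero up to rounding
```

The zero residual sum is a consequence of the intercept formula: the fitted line passes through the point of sample means.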

Interpretation of the Slope Coefficient

The slope coefficient \( b_2 \) gives the slope:

\[ \frac{\Delta \hat{y}}{\Delta x} = b_2. \]

Reason: if the regressor changes by \( \Delta x \) from x to \( x + \Delta x \), then the fitted value \( \hat{y} \) changes from \( b_1 + b_2 x \) to

\[ b_1 + b_2 (x + \Delta x) = b_1 + b_2 x + b_2 \Delta x, \]

a change of \( b_2 \Delta x \).
  - It follows that \( \Delta \hat{y} = b_2 \Delta x \).

The slope coefficient \( b_2 \) is therefore easily interpreted as the change in the predicted value of y when x increases by one unit.

The same result can be obtained using calculus methods
  - since \( \hat{y} = b_1 + b_2 x \) has derivative \( d\hat{y}/dx = b_2 \).
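To make the interpretation concrete for the house data, the sketch below recovers an approximate slope from the rounded summary statistics reported earlier. It assumes the identity \( b_2 = r_{xy} s_y / s_x \), which follows from combining \( r_{xy} = s_{xy}/(s_x s_y) \) with the slope formula; because the inputs are rounded, the coefficients are approximations and not the full-sample estimates.

```python
# Rounded summary statistics reported earlier in the notes
r_xy = 0.786
mean_price, sd_price = 253_910, 37_391   # y: sale price in dollars
mean_size, sd_size = 1_883, 398          # x: size in square feet

# Assumed identity: b2 = r_xy * s_y / s_x, then b1 = y_bar - b2 * x_bar
b2 = r_xy * sd_price / sd_size
b1 = mean_price - b2 * mean_size


def predict(size):
    """Fitted house price for a given size."""
    return b1 + b2 * size


# The change in the fitted value depends only on delta_x, not on the
# starting size: it equals b2 * delta_x.
delta_x = 100
change = predict(2_000 + delta_x) - predict(2_000)

print(round(b2, 2))    # about 73.84 dollars per extra square foot
print(round(change))   # about 7384 dollars for 100 extra square feet
```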