Data Analysis The Truth About Linear Regression, Exercises - Engineering - Prof. Cosma Shalizi, Advanced Data Analysis, The Advantages of Backwardness

Homework Assignment 2: The Advantages of
Backwardness
36-402, Data Analysis, Spring 2011
Due 25 January 2011
Many theories of economic growth say that it’s easier for poor countries to
grow faster than rich countries; this is called “catching up”, or the “advantages
of backwardness”. One argument for this is that poor countries can grow by copying
existing, successful technologies and ways of doing business from rich ones. But
rich countries are already using those technologies, so they can only grow by
finding new ones, and copying is faster than innovation. So, all else being equal,
poor countries should grow faster than rich ones. One way to check this is to
look at how growth rates are related to other economic variables.
We will use the np package on CRAN to do kernel regression.[1] Install it,
and load its oecdpanel data set. This contains growth data for many countries
for 1960–1995, collected by the Organization for Economic Cooperation and
Development (the OECD). We won’t use all the variables this time.
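
A minimal sketch of the setup step, assuming a working R installation with access to CRAN. The column names are the ones this assignment describes; the install line is commented out because it only needs to run once.

```r
# install.packages("np")   # one-time install from CRAN
library(np)                # provides npreg() and the oecdpanel data set
data(oecdpanel)            # load the data set into the workspace

# Look only at the variables used in this assignment
vars <- c("growth", "initgdp", "popgro", "inv")
summary(oecdpanel[, vars])
```
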
GDP is “gross domestic product”, the total value of all economic production.
It’s usually reported per capita and per year. Call it $Y_{i,t}$, since it depends on
the country $i$ and the year $t$. GDP isn’t perfect[2], but it is standard.
In oecdpanel, the variable growth is the logarithmic growth rate of GDP,
$\log(Y_{i,t+1}/Y_{i,t})$. We look at logarithms because economic models suggest that
the factors which affect growth should multiply together, rather than adding.
What’s actually recorded here is the average growth rate over a five-year period,
reducing year-to-year accidents.
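
To make the definition concrete, here is a small sketch with made-up GDP figures (hypothetical numbers, not taken from oecdpanel). It shows that logarithmic growth rates are close to ordinary percentage growth when changes are small, and that they add up over time, which is why averaging them over several years is natural.

```r
# Hypothetical per-capita GDP levels for one country in consecutive years
gdp <- c(1000, 1030, 1050, 1100)

# Logarithmic growth rate: log(Y[t+1] / Y[t]), i.e. diff(log(Y))
log_growth <- diff(log(gdp))

# Ordinary proportional growth, for comparison
pct_growth <- gdp[-1] / gdp[-length(gdp)] - 1

# The two are close when year-to-year changes are small
round(cbind(log_growth, pct_growth), 4)

# Log growth rates add: their sum is the log of the total growth factor,
# so their mean is the average log growth rate over the whole span
all.equal(sum(log_growth), log(gdp[4] / gdp[1]))   # TRUE
mean(log_growth)
```
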
initgdp is $\log Y_{i,t}$, the logarithm of per-capita GDP at the start of each
five-year period.
A country’s investment rate is the fraction of its GDP that goes into building
or repairing productive assets (roads, harbors, power plants, factory machines,
buildings, etc.). inv is the logarithm of the investment rate, so inv=-2.26
means 10.4% of output was invested.
popgro, similarly, is the logarithm of the population growth rate.
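
Because inv and popgro are stored as logarithms, reading them as rates means exponentiating. A quick check of the figure quoted above (the variable names inv_log and rate are just for illustration):

```r
inv_log <- -2.26          # value of inv quoted in the text
rate <- exp(inv_log)      # back-transform to a fraction of GDP
round(100 * rate, 1)      # 10.4 (percent)
log(0.104)                # about -2.26, going the other way
```
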
[1] The package has good help files, if you want to know more. Or see
http://www.jstatsoft.org/v27/i05.
[2] If everyone gets worried about being robbed, GDP goes up by the amount we spend on
extra locks, alarms, guards, etc., none of which would be needed if we just didn’t have so
many burglars.

  1. (5 points) Fit a linear model of growth on initgdp. What is the coefficient? What does it suggest about catching-up?
  2. (20 points) The npreg function in the np package does kernel regression. By default, it uses a combination of cross-validation and sophisticated but very slow optimization to pick the best bandwidth. In this problem, though, we will force it to use fixed bandwidths, and do the cross-validation ourselves.

oecd.0.1 <- npreg(growth~initgdp,bws=0.1,data=oecdpanel)

does a kernel regression of growth on initgdp, using the default kernel (which is Gaussian) and bandwidth 0.1. You can run fitted, predict, etc., on the output of npreg just as you can on the output of lm. The code at the end of this assignment (also online) uses five-fold cross-validation to estimate the mean-squared error for the five bandwidths 0.1, 0.2, 0.3, 0.4, 0.5. Use it to create a plot of MSE versus bandwidth. Add to the same plot the MSEs of the five bandwidths on the whole data. What bandwidth predicts best?

  3. (10 points) Make a scatterplot of initgdp versus growth. Add the line for the linear model. Add the fitted values for the kernel curve with the best bandwidth (according to the previous problem). Does the kernel regression curve suggest that poorer countries tend to grow faster? (There are at least two ways to get the fitted values for the kernel regression, using fitted or predict.)

  4. (5 points) If we want to check whether poorer countries tend to grow faster, all else being equal, it seems reasonable to try to keep all else equal. Do a linear regression of growth on initgdp, along with popgro and inv. What are the new regression coefficients? Does the coefficient of initgdp have the same sign as before? What does it suggest about catching-up?

  5. (10 points) npreg will also do kernel regressions with multiple input variables. This time, use the built-in bandwidth selector:

oecd.npr <- npreg(growth ~ initgdp + popgro + inv, data=oecdpanel, tol=0.1, ftol=0.1)

(The last two arguments tell the bandwidth selector not to try very hard to optimize, which in this case saves a lot of time, and works out well.) What are the selected bandwidths? (Use summary.)

  6. (15 points) What are the median values of popgro and inv? For countries with those median values, plot the predicted growth rate versus initial GDP, under both the linear model from problem 4 and the kernel regression from problem 5. (One way to do this is to use predict, but there are probably others.) Describe what each curve suggests about catching-up.
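
For the fixed-bandwidth exercise above, the idea of five-fold cross-validation over the bandwidths 0.1 to 0.5 can be sketched as follows. This is not the course's code, and it makes two loudly stated substitutions: it uses synthetic data rather than oecdpanel, and a hand-written Gaussian-kernel smoother (the hypothetical helper nw_predict) in place of npreg, so that it runs in base R with no extra packages. Adapting it to the assignment would mean swapping in npreg fits on the training folds.

```r
set.seed(402)  # arbitrary seed, only so the sketch is reproducible

# Synthetic stand-in data (NOT oecdpanel): a smooth curve plus noise
n <- 200
x <- runif(n, 0, 3)
y <- sin(2 * x) + rnorm(n, sd = 0.3)

# Hand-written Gaussian-kernel (Nadaraya-Watson) smoother, standing in
# for np::npreg; nw_predict is a hypothetical helper, not part of np
nw_predict <- function(x_train, y_train, x_new, bw) {
  sapply(x_new, function(x0) {
    w <- dnorm((x_train - x0) / bw)   # kernel weights around x0
    sum(w * y_train) / sum(w)         # weighted average of responses
  })
}

bandwidths <- c(0.1, 0.2, 0.3, 0.4, 0.5)
n_folds <- 5
fold <- sample(rep(1:n_folds, length.out = n))  # random fold labels

# cv_mse[k, j]: held-out MSE on fold k using bandwidth j
cv_mse <- matrix(NA, n_folds, length(bandwidths))
for (k in 1:n_folds) {
  test <- fold == k
  for (j in seq_along(bandwidths)) {
    pred <- nw_predict(x[!test], y[!test], x[test], bandwidths[j])
    cv_mse[k, j] <- mean((y[test] - pred)^2)
  }
}
cv_est <- colMeans(cv_mse)   # average over folds

# In-sample MSE: fit on and evaluate on the whole data
insample <- sapply(bandwidths, function(bw)
  mean((y - nw_predict(x, y, x, bw))^2))

# Both error curves on one plot
plot(bandwidths, cv_est, type = "b",
     ylim = range(c(cv_est, insample)),
     xlab = "bandwidth", ylab = "MSE")
lines(bandwidths, insample, type = "b", lty = 2)
legend("topleft", legend = c("5-fold CV", "in-sample"), lty = 1:2)

bandwidths[which.min(cv_est)]   # bandwidth with the lowest CV error
```

The in-sample curve is drawn for contrast: it rewards small bandwidths that chase the noise, whereas the cross-validated curve estimates error on data the fit has not seen.
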