

Data Analysis The Truth About Linear Regression, Exercises - Engineering - Prof. Cosma Shalizi, Advanced Data Analysis, The Advantages of Backwardness


Many theories of economic growth say that it’s easier for poor countries to grow faster than rich countries: “catching up”, or the “advantages of backwardness”. One argument for this is that poor countries can grow by copying existing, successful technologies and ways of doing business from rich ones. But rich countries are already using those technologies, so they can only grow by finding new ones, and copying is faster than innovation. So, all else being equal, poor countries should grow faster than rich ones. One way to check this is to look at how growth rates are related to other economic variables.

We will use the np package on CRAN to do kernel regression.^1 Install it, and load its oecdpanel data set. This contains growth data for many countries for 1960–1995, collected by the Organization for Economic Cooperation and Development (the OECD). We won’t use all the variables this time.

GDP is “gross domestic product”, the total value of all economic production. It’s usually reported per capita and per year. Call it Y_{i,t}, since it depends on the country i and the year t. GDP isn’t perfect^2, but it is standard.

In oecdpanel, the variable growth is the logarithmic growth rate of GDP, log(Y_{i,t+1}/Y_{i,t}). We look at logarithms because economic models suggest that the factors which affect growth should multiply together, rather than adding. What’s actually recorded here is the average growth rate over a five-year period, which reduces year-to-year accidents.

initgdp is log Y_{i,t}, the logarithm of per-capita GDP at the start of each five-year period.

A country’s investment rate is the fraction of its GDP that goes into building or repairing productive assets (roads, harbors, power plants, factory machines, buildings, etc.). inv is the logarithm of the investment rate, so inv = -2.26 means about 10.4% of output was invested (e^{-2.26} ≈ 0.104).

popgro, similarly, is the logarithm of the population growth rate.

(^1) The package has good help files, if you want to know more. Or see http://www.jstatsoft.org/v27/i05.
(^2) If everyone gets worried about being robbed, GDP goes up by the amount we spend on extra locks, alarms, guards, etc., none of which would be needed if we just didn’t have so many burglars.
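To make the logarithmic variables concrete, here is a small sketch in R. The GDP figures are made-up illustrative numbers, not values from oecdpanel, and -2.26 is simply a value whose exponential is close to the 10.4% investment rate mentioned above.

```r
# Hypothetical per-capita GDP in two consecutive years (made-up numbers)
Y.t  <- 10000
Y.t1 <- 10300

# Logarithmic growth rate: log(Y_{t+1} / Y_t)
log.growth <- log(Y.t1 / Y.t)

# Ordinary proportional growth rate, for comparison;
# the two are close when growth is small
pct.growth <- (Y.t1 - Y.t) / Y.t

# inv is the log of the investment rate, so exp() recovers the rate itself
inv <- -2.26
inv.rate <- exp(inv)   # roughly 0.104, i.e. about 10.4% of output

print(c(log.growth = log.growth, pct.growth = pct.growth, inv.rate = inv.rate))
```

The same exp() transformation applies to popgro and initgdp if you want them back on their original scales.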
oecd.0.1 <- npreg(growth~initgdp,bws=0.1,data=oecdpanel)
does a kernel regression of growth on initgdp, using the default kernel (which is Gaussian) and bandwidth 0.1. You can run fitted, predict, etc., on the output of npreg just as you can on the output of lm. The code at the end of this assignment (also online) uses five-fold cross-validation to estimate the mean-squared error for the five bandwidths 0.1, 0.2, 0.3, 0.4, 0.5. Use it to create a plot of MSE versus bandwidth. Add to the same plot the MSEs of the five bandwidths on the whole data. What bandwidth predicts best?
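The idea behind five-fold cross-validation over bandwidths can be sketched in base R as follows. This is not the assignment's code and does not use np: it assumes a hand-written Nadaraya-Watson smoother with a Gaussian kernel (the hypothetical helper nw.smooth) and simulated data, purely to show the mechanics of comparing cross-validated and in-sample MSE.

```r
set.seed(1)                       # simulated data, NOT the oecdpanel data
n <- 200
x <- runif(n, 0, 3)
y <- sin(2 * x) + rnorm(n, sd = 0.3)

# Hypothetical helper: Nadaraya-Watson estimate at the points x0,
# using a Gaussian kernel with bandwidth h
nw.smooth <- function(x0, x, y, h) {
  sapply(x0, function(pt) {
    w <- dnorm((x - pt) / h)      # kernel weight of each training point
    sum(w * y) / sum(w)           # weighted average of the responses
  })
}

bandwidths <- c(0.1, 0.2, 0.3, 0.4, 0.5)
n.folds <- 5
fold <- sample(rep(1:n.folds, length.out = n))   # random fold labels

# cv.mse[f, b]: MSE on held-out fold f after fitting with bandwidth b on the rest
cv.mse <- matrix(NA, nrow = n.folds, ncol = length(bandwidths))
for (f in 1:n.folds) {
  test <- fold == f
  for (b in seq_along(bandwidths)) {
    pred <- nw.smooth(x[test], x[!test], y[!test], bandwidths[b])
    cv.mse[f, b] <- mean((y[test] - pred)^2)
  }
}
cv.avg <- colMeans(cv.mse)        # cross-validated MSE for each bandwidth

# In-sample MSE: fit and evaluate on the whole data
insample <- sapply(bandwidths, function(h) mean((y - nw.smooth(x, x, y, h))^2))

plot(bandwidths, cv.avg, type = "b", ylim = range(c(cv.avg, insample)),
     xlab = "bandwidth", ylab = "MSE")
lines(bandwidths, insample, type = "b", lty = 2)
legend("topleft", legend = c("5-fold CV", "in-sample"), lty = 1:2)
```

The in-sample curve tends to favor small bandwidths, because each point helps predict itself. The cross-validated curve is the one to use when choosing a bandwidth for prediction.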
(10 points) Make a scatterplot of initgdp versus growth. Add the line for the linear model. Add the fitted values for the kernel curve with the best bandwidth (according to the previous problem). Does the kernel regression curve suggest that poorer countries tend to grow faster? (There are at least two ways to get the fitted values for the kernel regression, using fitted or predict.)
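One possible way to lay out such a plot is sketched below. It assumes the np package is installed. The bandwidth h.best is a placeholder, not the answer to the previous problem, and the object names fit.lm and fit.np are illustrative. The fitted values come back in the order of the data, so they are sorted by initgdp before drawing the curve.

```r
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  data(oecdpanel)

  h.best <- 0.1   # placeholder only; substitute the bandwidth chosen by cross-validation

  fit.lm <- lm(growth ~ initgdp, data = oecdpanel)
  fit.np <- npreg(growth ~ initgdp, bws = h.best, data = oecdpanel)

  plot(growth ~ initgdp, data = oecdpanel, xlab = "initgdp", ylab = "growth")
  abline(fit.lm, lty = 2)                          # linear model

  # Sort by initgdp so that lines() draws a curve rather than a scribble
  ord <- order(oecdpanel$initgdp)
  lines(oecdpanel$initgdp[ord], fitted(fit.np)[ord], lwd = 2)

  legend("topright", legend = c("linear", "kernel"), lty = c(2, 1), lwd = c(1, 2))
} else {
  message("Package 'np' is not installed; install.packages('np') first.")
}
```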
(5 points) If we want to check whether poorer countries tend to grow faster, all else being equal, it seems reasonable to try to keep all else equal. Do a linear regression of growth on initgdp, along with popgro and inv. What are the new regression coefficients? Does the coefficient of initgdp have the same sign as before? What does it suggest about catching-up?
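A minimal sketch of the two linear fits, assuming the np package is installed so that oecdpanel is available. The names fit.simple and fit.multi are illustrative. Interpreting the coefficients is left to the reader.

```r
if (requireNamespace("np", quietly = TRUE)) {
  data(oecdpanel, package = "np")

  # growth on initgdp alone, then with popgro and inv added as controls
  fit.simple <- lm(growth ~ initgdp, data = oecdpanel)
  fit.multi  <- lm(growth ~ initgdp + popgro + inv, data = oecdpanel)

  print(coefficients(fit.simple))
  print(coefficients(fit.multi))

  # Compare the sign of the initgdp coefficient across the two models
  print(sign(c(simple = unname(coefficients(fit.simple)["initgdp"]),
               multi  = unname(coefficients(fit.multi)["initgdp"]))))
} else {
  message("Package 'np' is not installed; install.packages('np') first.")
}
```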
(10 points) npreg will also do kernel regressions with multiple input variables. This time, use the built-in bandwidth selector:
oecd.npr <- npreg(growth ~ initgdp + popgro + inv, data=oecdpanel, tol=0.1, ftol=0.1)
(The last two arguments tell the bandwidth selector not to try very hard to optimize, which in this case saves a lot of time and works out well.) What are the selected bandwidths? (Use summary.)
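A sketch of inspecting the result, assuming the np package is installed. summary prints the selected bandwidths. Pulling them out as a numeric vector through oecd.npr$bws$bw relies on my assumption about how np stores the bandwidth object, so check str(oecd.npr) if it does not work in your version.

```r
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  data(oecdpanel)

  # Same call as in the assignment; bandwidth selection may take a little while
  oecd.npr <- npreg(growth ~ initgdp + popgro + inv, data = oecdpanel,
                    tol = 0.1, ftol = 0.1)

  # Printed summary, including one bandwidth per input variable
  summary(oecd.npr)

  # Assumption: the bandwidth object is stored in $bws, with the numeric
  # bandwidths in $bws$bw, in the order initgdp, popgro, inv
  print(oecd.npr$bws$bw)
} else {
  message("Package 'np' is not installed; install.packages('np') first.")
}
```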