

























































































































































































Chapter 1 discusses the scope of econometrics and raises general issues that result from the application of econometric methods. Section 1.3 examines the kinds of data sets that are used in business, economics, and other social sciences. Section 1.4 provides an intuitive discussion of the difficulties associated with the inference of causality in the social sciences.
Imagine that you are hired by your state government to evaluate the effectiveness of a publicly funded job training program. Suppose this program teaches workers various ways to use computers in the manufacturing process. The twenty-week program offers courses during nonworking hours. Any hourly manufacturing worker may participate, and enrollment in all or part of the program is voluntary. You are to determine what, if any, effect the training program has on each worker’s subsequent hourly wage.
Now suppose you work for an investment bank. You are to study the returns on different investment strategies involving short-term U.S. treasury bills to decide whether they comply with implied economic theories.
The task of answering such questions may seem daunting at first. At this point, you may only have a vague idea of the kind of data you would need to collect. By the end of this introductory econometrics course, you should know how to use econometric methods to formally evaluate a job training program or to test a simple economic theory.
Econometrics is based upon the development of statistical methods for estimating economic relationships, testing economic theories, and evaluating and implementing government and business policy. The most common application of econometrics is the forecasting of such important macroeconomic variables as interest rates, inflation rates, and gross domestic product. While forecasts of economic indicators are highly visible and are often widely published, econometric methods can be used in economic areas that have nothing to do with macroeconomic forecasting. For example, we will study the effects of political campaign expenditures on voting outcomes. We will consider the effect of school spending on student performance in the field of education. In addition, we will learn how to use econometric methods for forecasting economic time series.
Chapter One
The Nature of Econometrics and Economic Data
EXAMPLE 1.1 (Economic Model of Crime)
In a seminal article, Nobel prize winner Gary Becker postulated a utility maximization framework to describe an individual’s participation in crime. Certain crimes have clear economic rewards, but most criminal behaviors have costs. The opportunity costs of crime prevent the criminal from participating in other activities such as legal employment. In addition, there are costs associated with the possibility of being caught and then, if convicted, the costs associated with incarceration. From Becker’s perspective, the decision to undertake illegal activity is one of resource allocation, with the benefits and costs of competing activities taken into account. Under general assumptions, we can derive an equation describing the amount of time spent in criminal activity as a function of various factors. We might represent such a function as
$$y = f(x_1, x_2, x_3, x_4, x_5, x_6, x_7), \tag{1.1}$$
where
$y$ = hours spent in criminal activities
$x_1$ = “wage” for an hour spent in criminal activity
$x_2$ = hourly wage in legal employment
$x_3$ = income other than from crime or employment
$x_4$ = probability of getting caught
$x_5$ = probability of being convicted if caught
$x_6$ = expected sentence if convicted
$x_7$ = age
Other factors generally affect a person’s decision to participate in crime, but the list above is representative of what might result from a formal economic analysis. As is common in economic theory, we have not been specific about the function $f(\cdot)$ in (1.1). This function depends on an underlying utility function, which is rarely known. Nevertheless, we can use economic theory, or introspection, to predict the effect that each variable would have on criminal activity. This is the basis for an econometric analysis of individual criminal activity.
Formal economic modeling is sometimes the starting point for empirical analysis, but it is more common to use economic theory less formally, or even to rely entirely on intuition. You may agree that the determinants of criminal behavior appearing in equation (1.1) are reasonable based on common sense; we might arrive at such an equation directly, without starting from utility maximization. This view has some merit, although there are cases where formal derivations provide insights that intuition can overlook.
Here is an example of an equation that was derived through somewhat informal reasoning.
EXAMPLE 1.2 (Job Training and Worker Productivity)
Consider the problem posed at the beginning of Section 1.1. A labor economist would like to examine the effects of job training on worker productivity. In this case, there is little need for formal economic theory. Basic economic understanding is sufficient for realizing that factors such as education, experience, and training affect worker productivity. Also, economists are well aware that workers are paid commensurate with their productivity. This simple reasoning leads to a model such as
$$\mathit{wage} = f(\mathit{educ}, \mathit{exper}, \mathit{training}), \tag{1.2}$$
where wage is hourly wage, educ is years of formal education, exper is years of workforce experience, and training is weeks spent in job training. Again, other factors generally affect the wage rate, but (1.2) captures the essence of the problem.
After we specify an economic model, we need to turn it into what we call an econometric model. Since we will deal with econometric models throughout this text, it is important to know how an econometric model relates to an economic model. Take equation (1.1) as an example. The form of the function $f(\cdot)$ must be specified before we can undertake an econometric analysis. A second issue concerning (1.1) is how to deal with variables that cannot reasonably be observed. For example, consider the wage that a person can earn in criminal activity. In principle, such a quantity is well-defined, but it would be difficult if not impossible to observe this wage for a given individual. Even variables such as the probability of being arrested cannot realistically be obtained for a given individual, but at least we can observe relevant arrest statistics and derive a variable that approximates the probability of arrest. Many other factors affect criminal behavior that we cannot even list, let alone observe, but we must somehow account for them.
The ambiguities inherent in the economic model of crime are resolved by specifying a particular econometric model:
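$$\mathit{crime} = \beta_0 + \beta_1 \mathit{wagem} + \beta_2 \mathit{othinc} + \beta_3 \mathit{freqarr} + \beta_4 \mathit{freqconv} + \beta_5 \mathit{avgsen} + \beta_6 \mathit{age} + u, \tag{1.3}$$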
where crime is some measure of the frequency of criminal activity, wagem is the wage that can be earned in legal employment, othinc is the income from other sources (assets, inheritance, etc.), freqarr is the frequency of arrests for prior infractions (to approximate the probability of arrest), freqconv is the frequency of conviction, and avgsen is the average sentence length after conviction. The choice of these variables is determined by the economic theory as well as data considerations. The term u contains unob-
A cross-sectional data set consists of a sample of individuals, households, firms, cities, states, countries, or a variety of other units, taken at a given point in time. Sometimes the data on all units do not correspond to precisely the same time period. For example, several families may be surveyed during different weeks within a year. In a pure cross section analysis we would ignore any minor timing differences in collecting the data. If a set of families was surveyed during different weeks of the same year, we would still view this as a cross-sectional data set.
An important feature of cross-sectional data is that we can often assume that they have been obtained by random sampling from the underlying population. For example, if we obtain information on wages, education, experience, and other characteristics by randomly drawing 500 people from the working population, then we have a random sample from the population of all working people. Random sampling is the sampling scheme covered in introductory statistics courses, and it simplifies the analysis of cross-sectional data. A review of random sampling is contained in Appendix C.
Sometimes random sampling is not appropriate as an assumption for analyzing cross-sectional data. For example, suppose we are interested in studying factors that influence the accumulation of family wealth. We could survey a random sample of families, but some families might refuse to report their wealth. If, for example, wealthier families are less likely to disclose their wealth, then the resulting sample on wealth is not a random sample from the population of all families. This is an illustration of a sample selection problem, an advanced topic that we will discuss in Chapter 17.
Another violation of random sampling occurs when we sample from units that are large relative to the population, particularly geographical units.
The potential problem in such cases is that the population is not large enough to reasonably assume the observations are independent draws. For example, if we want to explain new business activity across states as a function of wage rates, energy prices, corporate and property tax rates, services provided, quality of the workforce, and other state characteristics, it is unlikely that business activities in states near one another are independent. It turns out that the econometric methods that we discuss do work in such situations, but they sometimes need to be refined. For the most part, we will ignore the intricacies that arise in analyzing such situations and treat these problems in a random sampling framework, even when it is not technically correct to do so.
Cross-sectional data are widely used in economics and other social sciences. In economics, the analysis of cross-sectional data is closely aligned with the applied microeconomics fields, such as labor economics, state and local public finance, industrial organization, urban economics, demography, and health economics. Data on individuals, households, firms, and cities at a given point in time are important for testing microeconomic hypotheses and evaluating economic policies.
The cross-sectional data used for econometric analysis can be represented and stored in computers. Table 1.1 contains, in abbreviated form, a cross-sectional data set on 526 working individuals for the year 1976. (This is a subset of the data in the file WAGE1.RAW.) The variables include wage (in dollars per hour), educ (years of education), exper (years of potential labor force experience), female (an indicator for gender), and married (marital status). These last two variables are binary (zero-one) in nature and serve to indicate qualitative features of the individual. (The person is female or not; the person is married or not.) We will have much to say about binary variables in Chapter 7 and beyond.
The variable obsno in Table 1.1 is the observation number assigned to each person in the sample. Unlike the other variables, it is not a characteristic of the individual. All econometrics and statistics software packages assign an observation number to each data unit. Intuition should tell you that, for data such as that in Table 1.1, it does not matter which person is labeled as observation one, which person is called observation two, and so on. The fact that the ordering of the data does not matter for econometric analysis is a key feature of cross-sectional data sets obtained from random sampling.
Different variables sometimes correspond to different time periods in cross-sectional data sets. For example, in order to determine the effects of government policies on long-term economic growth, economists have studied the relationship between growth in real per capita gross domestic product (GDP) over a certain period (say 1960 to 1985) and variables determined in part by government policy in 1960 (government consumption as a percentage of GDP and adult secondary education rates). Such a data set might be represented as in Table 1.2, which constitutes part of the data set used in the study of cross-country growth rates by De Long and Summers (1991).
Table 1.1
A Cross-Sectional Data Set on Wages and Other Individual Characteristics
obsno   wage   educ   exper   female   married
    1   3.10     11       2        1         0
    2   3.24     12      22        1         1
    3   3.00     11       2        0         0
    4   6.00      8      44        0         1
    5   5.30     12       7        0         1
  ...    ...    ...     ...      ...       ...
  525  11.56     16       5        0         1
  526   3.50     14       5        1         0
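As a small illustration of why the ordering of observations is irrelevant for cross-sectional data, the following Python sketch stores only the seven rows printed in Table 1.1 (not the full 526 observations) and checks that reordering them leaves a summary statistic unchanged. The function name, the fixed seed, and the choice of the mean wage as the statistic are assumptions made here for illustration.

```python
import random
from statistics import mean

# The seven rows printed in Table 1.1 (the full data set has 526 observations).
columns = ("obsno", "wage", "educ", "exper", "female", "married")
rows = [
    (1, 3.10, 11, 2, 1, 0),
    (2, 3.24, 12, 22, 1, 1),
    (3, 3.00, 11, 2, 0, 0),
    (4, 6.00, 8, 44, 0, 1),
    (5, 5.30, 12, 7, 0, 1),
    (525, 11.56, 16, 5, 0, 1),
    (526, 3.50, 14, 5, 1, 0),
]

def average_wage(data):
    """Mean hourly wage across the observations in `data`."""
    wage_index = columns.index("wage")
    return mean(row[wage_index] for row in data)

# Reorder the observations at random (seed fixed only for reproducibility).
shuffled = rows[:]
random.Random(0).shuffle(shuffled)

original_mean = average_wage(rows)
shuffled_mean = average_wage(shuffled)
```

Both means agree, since nothing in the calculation depends on which person happens to be labeled observation one.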
the next. While most econometric procedures can be used with both cross-sectional and time series data, more needs to be done in specifying econometric models for time series data before standard econometric methods can be justified. In addition, modifications and embellishments to standard econometric techniques have been developed to account for and exploit the dependent nature of economic time series and to address other issues, such as the fact that some economic variables tend to display clear trends over time.
Another feature of time series data that can require special attention is the data frequency at which the data are collected. In economics, the most common frequencies are daily, weekly, monthly, quarterly, and annually. Stock prices are recorded at daily intervals (excluding Saturday and Sunday). The money supply in the U.S. economy is reported weekly. Many macroeconomic series are tabulated monthly, including inflation and employment rates. Other macro series are recorded less frequently, such as every three months (every quarter). Gross domestic product is an important example of a quarterly series. Other time series, such as infant mortality rates for states in the United States, are available only on an annual basis.
Many weekly, monthly, and quarterly economic time series display a strong seasonal pattern, which can be an important factor in a time series analysis. For example, monthly data on housing starts differ across the months simply due to changing weather conditions. We will learn how to deal with seasonal time series in Chapter 10.
Table 1.3 contains a time series data set obtained from an article by Castillo-Freeman and Freeman (1992) on minimum wage effects in Puerto Rico. The earliest year in the data set is the first observation, and the most recent year available is the last observation. When econometric methods are used to analyze time series data, the data should be stored in chronological order. The variable avgmin refers to the average minimum wage for the year, avgcov is the average coverage rate (the percentage of workers covered by the minimum wage law), unemp is the unemployment rate, and gnp is the gross national product. We will use these data later in a time series analysis of the effect of the minimum wage on employment.
Table 1.3
Minimum Wage, Unemployment, and Related Data for Puerto Rico
obsno   year   avgmin   avgcov   unemp    gnp
    1   1950     0.20     20.1    15.4    878.
    2   1951     0.21     20.7    16.0    925.
    3   1952     0.23     22.6    14.8   1015.
  ...    ...      ...      ...     ...     ...
   37   1986     3.35     58.1    18.9   4281.
   38   1987     3.35     58.2    16.8   4496.
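To make the point about chronological ordering concrete, here is a minimal Python sketch using only the five rows printed in Table 1.3. The rows are deliberately entered out of order, sorted by year, and then used to compute year-over-year changes in the unemployment rate. The helper name `one_year_changes` is hypothetical, and gnp is left out because its printed values are cut off.

```python
# The five rows printed in Table 1.3 (years 1953-1985 are not shown there).
# gnp is omitted because its printed values are cut off in the table.
records = [
    {"year": 1987, "avgmin": 3.35, "unemp": 16.8},
    {"year": 1950, "avgmin": 0.20, "unemp": 15.4},
    {"year": 1952, "avgmin": 0.23, "unemp": 14.8},
    {"year": 1951, "avgmin": 0.21, "unemp": 16.0},
    {"year": 1986, "avgmin": 3.35, "unemp": 18.9},
]

# Time series data must be in chronological order before changes are formed.
series = sorted(records, key=lambda r: r["year"])

def one_year_changes(data, name):
    """Change in `name` between consecutive rows exactly one year apart."""
    changes = {}
    for prev, curr in zip(data, data[1:]):
        if curr["year"] - prev["year"] == 1:
            changes[curr["year"]] = round(curr[name] - prev[name], 2)
    return changes

unemp_changes = one_year_changes(series, "unemp")
```

Unlike the cross-sectional case, the result here depends on the ordering: differencing the unsorted list would pair 1987 with 1950. The check on the gap between years guards against treating 1952 and 1986 as adjacent periods.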
Some data sets have both cross-sectional and time series features. For example, suppose that two cross-sectional household surveys are taken in the United States, one in 1985 and one in 1990. In 1985, a random sample of households is surveyed for variables such as income, savings, family size, and so on. In 1990, a new random sample of households is taken using the same survey questions. In order to increase our sample size, we can form a pooled cross section by combining the two years. Because random samples are taken in each year, it would be a fluke if the same household appeared in the sample during both years. (The size of the sample is usually very small compared with the number of households in the United States.) This important factor distinguishes a pooled cross section from a panel data set.
Pooling cross sections from different years is often an effective way of analyzing the effects of a new government policy. The idea is to collect data from the years before and after a key policy change. As an example, consider the following data set on housing prices taken in 1993 and 1995, when there was a reduction in property taxes in
A panel data (or longitudinal data) set consists of a time series for each cross-sectional member in the data set. As an example, suppose we have wage, education, and employment history for a set of individuals followed over a ten-year period. Or we might collect information, such as investment and financial data, about the same set of firms over a five-year time period. Panel data can also be collected on geographical units. For example, we can collect data for the same set of counties in the United States on immigration flows, tax rates, wage rates, government expenditures, etc., for the years 1980, 1985, and 1990. The key feature of panel data that distinguishes it from a pooled cross section is the fact that the same cross-sectional units (individuals, firms, or counties in the above
A second useful point is that the two years of data for city 1 fill the first two rows or observations. Observations 3 and 4 correspond to city 2, and so on. Since each of the 150 cities has two rows of data, any econometrics package will view this as 300 observations. This data set can be treated as two pooled cross sections, where the same cities happen to show up in each year. But, as we will see in Chapters 13 and 14, we can also use the panel structure to respond to questions that cannot be answered by simply viewing this as a pooled cross section.
In organizing the observations in Table 1.5, we place the two years of data for each city adjacent to one another, with the first year coming before the second in all cases. For just about every practical purpose, this is the preferred way for ordering panel data sets. Contrast this organization with the way the pooled cross sections are stored in Table 1.4. In short, the reason for ordering panel data as in Table 1.5 is that we will need to perform data transformations for each city across the two years.
Table 1.5
A Two-Year Panel Data Set on City Crime Statistics
obsno   city   year   murders   population   unem   police
    1      1   1986         5       350000    8.7      440
    2      1   1990         8       359200    7.2      471
    3      2   1986         2        64300    5.4       75
    4      2   1990         1        65100    5.5       75
  ...    ...    ...       ...          ...    ...      ...
  297    149   1986        10       260700    9.6      286
  298    149   1990         6       245000    9.8      334
  299    150   1986        25       543000    4.3      520
  300    150   1990        32       546200    5.2      493
Because panel data require replication of the same units over time, panel data sets, especially those on individuals, households, and firms, are more difficult to obtain than pooled cross sections. Not surprisingly, observing the same units over time leads to several advantages over cross-sectional data or even pooled cross-sectional data. The benefit that we will focus on in this text is that having multiple observations on the same units allows us to control for certain unobserved characteristics of individuals, firms, and so on. As we will see, the use of more than one observation can facilitate causal inference in situations where inferring causality would be very difficult if only a single cross section were available. A second advantage of panel data is that it often allows us to study the importance of lags in behavior or the result of decision making. This information can be significant since many economic policies can be expected to have an impact only after some time has passed.
Most books at the undergraduate level do not contain a discussion of econometric methods for panel data. However, economists now recognize that some questions are difficult, if not impossible, to answer satisfactorily without panel data. As you will see, we can make considerable progress with simple panel data analysis, a method which is not much more difficult than dealing with a standard cross-sectional data set.
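One example of a data transformation performed "for each city across the two years" is the change in a variable between 1986 and 1990. The Python sketch below uses only the eight rows printed in Table 1.5 and relies on each city's rows being adjacent. The function name `changes_by_city` and the choice of murders and police as the variables to difference are assumptions made for illustration; the text itself does not specify which transformations it has in mind at this point.

```python
from itertools import groupby

# The eight rows printed in Table 1.5, ordered by city and then by year.
panel = [
    # (city, year, murders, population, unem, police)
    (1, 1986, 5, 350000, 8.7, 440),
    (1, 1990, 8, 359200, 7.2, 471),
    (2, 1986, 2, 64300, 5.4, 75),
    (2, 1990, 1, 65100, 5.5, 75),
    (149, 1986, 10, 260700, 9.6, 286),
    (149, 1990, 6, 245000, 9.8, 334),
    (150, 1986, 25, 543000, 4.3, 520),
    (150, 1990, 32, 546200, 5.2, 493),
]

def changes_by_city(rows):
    """Return {city: (change in murders, change in police)} from 1986 to 1990."""
    out = {}
    # groupby relies on each city's rows being adjacent, as in Table 1.5.
    for city, group in groupby(rows, key=lambda r: r[0]):
        first, second = sorted(group, key=lambda r: r[1])
        out[city] = (second[2] - first[2], second[5] - first[5])
    return out

city_changes = changes_by_city(panel)
```

If the rows were instead stored year by year, as with pooled cross sections, `groupby` would see each city twice in separate groups and the unpacking would fail, which is one practical reason to keep each unit's observations together.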
Part 1 of this text is concerned with the analysis of cross-sectional data, as this poses the fewest conceptual and technical difficulties. At the same time, it illustrates most of the key themes of econometric analysis. We will use the methods and insights from cross-sectional analysis in the remainder of the text.
While the econometric analysis of time series uses many of the same tools as cross-sectional analysis, it is more complicated due to the trending, highly persistent nature of many economic time series. Examples that have been traditionally used to illustrate the manner in which econometric methods can be applied to time series data are now widely believed to be flawed. It makes little sense to use such examples initially, since this practice will only reinforce poor econometric practice. Therefore, we will postpone the treatment of time series econometrics until Part 2, when the important issues concerning trends, persistence, dynamics, and seasonality will be introduced.
In Part 3, we treat pooled cross sections and panel data explicitly. The analysis of independently pooled cross sections and simple panel data analysis are fairly straightforward extensions of pure cross-sectional analysis. Nevertheless, we will wait until Chapter 13 to deal with these topics.
In most tests of economic theory, and certainly for evaluating public policy, the economist’s goal is to infer that one variable has a causal effect on another variable (such as crime rate or worker productivity). Simply finding an association between two or more variables might be suggestive, but unless causality can be established, it is rarely compelling. The notion of ceteris paribus, which means “other (relevant) factors being equal,” plays an important role in causal analysis. This idea has been implicit in some of our earlier discussion, particularly Examples 1.1 and 1.2, but thus far we have not explicitly mentioned it.
As described earlier, this may not seem like a very good experiment, because we have said nothing about choosing plots of land that are identical in all respects except for the amount of fertilizer. In fact, choosing plots of land with this feature is not feasible: some of the factors, such as land quality, cannot even be fully observed. How do we know the results of this experiment can be used to measure the ceteris paribus effect of fertilizer? The answer depends on the specifics of how fertilizer amounts are chosen. If the levels of fertilizer are assigned to plots independently of other plot features that affect yield (that is, other characteristics of plots are completely ignored when deciding on fertilizer amounts), then we are in business. We will justify this statement in Chapter 2.
The next example is more representative of the difficulties that arise when inferring causality in applied economics.
EXAMPLE 1.4 (Measuring the Return to Education)
Labor economists and policy makers have long been interested in the “return to education.” Somewhat informally, the question is posed as follows: If a person is chosen from the population and given another year of education, by how much will his or her wage increase? As with the previous examples, this is a ceteris paribus question, which implies that all other factors are held fixed while another year of education is given to the person.
We can imagine a social planner designing an experiment to get at this issue, much as the agricultural researcher can design an experiment to estimate fertilizer effects. One approach is to emulate the fertilizer experiment in Example 1.3: Choose a group of people, randomly give each person an amount of education (some people have an eighth grade education, some are given a high school education, etc.), and then measure their wages (assuming that each then works in a job). The people here are like the plots in the fertilizer example, where education plays the role of fertilizer and wage rate plays the role of soybean yield. As with Example 1.3, if levels of education are assigned independently of other characteristics that affect productivity (such as experience and innate ability), then an analysis that ignores these other factors will yield useful results. Again, it will take some effort in Chapter 2 to justify this claim; for now we state it without support.
Unlike the fertilizer-yield example, the experiment described in Example 1.4 is infeasible. The moral issues, not to mention the economic costs, associated with randomly determining education levels for a group of individuals are obvious. As a logistical matter, we could not give someone only an eighth grade education if he or she already has a college degree.
Even though experimental data cannot be obtained for measuring the return to education, we can certainly collect nonexperimental data on education levels and wages for a large group by sampling randomly from the population of working people. Such data are available from a variety of surveys used in labor economics, but these data sets have a feature that makes it difficult to estimate the ceteris paribus return to education.
People choose their own levels of education, and therefore education levels are probably not determined independently of all other factors affecting wage. This problem is a feature shared by most nonexperimental data sets.
One factor that affects wage is experience in the work force. Since pursuing more education generally requires postponing entering the work force, those with more education usually have less experience. Thus, in a nonexperimental data set on wages and education, education is likely to be negatively associated with a key variable that also affects wage. It is also believed that people with more innate ability often choose higher levels of education. Since higher ability leads to higher wages, we again have a correlation between education and a critical factor that affects wage.
The omitted factors of experience and ability in the wage example have analogs in the fertilizer example. Experience is generally easy to measure and therefore is similar to a variable such as rainfall. Ability, on the other hand, is nebulous and difficult to quantify; it is similar to land quality in the fertilizer example. As we will see throughout this text, accounting for other observed factors, such as experience, when estimating the ceteris paribus effect of another variable, such as education, is relatively straightforward. We will also find that accounting for inherently unobservable factors, such as ability, is much more problematic. It is fair to say that many of the advances in econometric methods have tried to deal with unobserved factors in econometric models.
One final parallel can be drawn between Examples 1.3 and 1.4. Suppose that in the fertilizer example, the fertilizer amounts were not entirely determined at random. Instead, the assistant who chose the fertilizer levels thought it would be better to put more fertilizer on the higher quality plots of land.
(Agricultural researchers should have a rough idea about which plots of land are better quality, even though they may not be able to fully quantify the differences.) This situation is completely analogous to the level of schooling being related to unobserved ability in Example 1.4. Because better land leads to higher yields, and more fertilizer was used on the better plots, any observed relationship between yield and fertilizer might be spurious.
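The contrast between random assignment and assignment related to an unobserved factor can be sketched with a small simulation. Everything numerical below is hypothetical and chosen only for illustration: a true fertilizer effect of 2, an effect of land quality of 3, and normally distributed quality and noise. The function names `slope` and `simulate` are likewise assumptions, and the slope is computed as the sample covariance over the sample variance, which is the simple regression estimator.

```python
import random
from statistics import mean

def slope(x, y):
    """Slope from a simple regression of y on x (covariance over variance)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def simulate(assign_on_quality, n=5000, true_effect=2.0, seed=1):
    """Simulate plot yields; all numbers are hypothetical, not from the text."""
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n)]  # unobserved land quality
    if assign_on_quality:
        # The assistant puts more fertilizer on the better plots.
        fert = [5 + q + rng.gauss(0, 1) for q in quality]
    else:
        # Fertilizer is chosen independently of every plot characteristic.
        fert = [5 + rng.gauss(0, 1.4) for _ in range(n)]
    yields = [true_effect * f + 3 * q + rng.gauss(0, 1)
              for f, q in zip(fert, quality)]
    return slope(fert, yields)

slope_random = simulate(assign_on_quality=False)
slope_selected = simulate(assign_on_quality=True)
```

Under these assumptions, the slope from the randomly assigned data should be close to the true effect of 2, while the slope from the second design picks up part of the effect of land quality and settles near 2 + 3/2 = 3.5 in large samples. The same logic applies to schooling and unobserved ability in Example 1.4.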
EXAMPLE 1.5 (The Effect of Law Enforcement on City Crime Levels)
The issue of how best to prevent crime has been, and will probably continue to be, with us for some time. One especially important question in this regard is: Does the presence of more police officers on the street deter crime? The ceteris paribus question is easy to state: If a city is randomly chosen and given 10 additional police officers, by how much would its crime rates fall? Another way to state the question is: If two cities are the same in all respects, except that city A has 10 more police officers than city B, by how much would the two cities’ crime rates differ?
It would be virtually impossible to find pairs of communities identical in all respects except for the size of their police force. Fortunately, econometric analysis does not require this. What we do need to know is whether the data we can collect on community crime levels and the size of the police force can be viewed as experimental. We can certainly imagine a true experiment involving a large collection of cities where we dictate how many police officers each city will use for the upcoming year.
EXAMPLE 1.7 (The Expectations Hypothesis)
The expectations hypothesis from financial economics states that, given all information available to investors at the time of investing, the expected return on any two investments is the same. For example, consider two possible investments with a three-month investment horizon, purchased at the same time: (1) Buy a three-month T-bill with a face value of $10,000, for a price below $10,000; in three months, you receive $10,000. (2) Buy a six-month T-bill (at a price below $10,000) and, in three months, sell it as a three-month T-bill. Each investment requires roughly the same amount of initial capital, but there is an important difference. For the first investment, you know exactly what the return is at the time of purchase because you know the initial price of the three-month T-bill, along with its face value. This is not true for the second investment: while you know the price of a six-month T-bill when you purchase it, you do not know the price you can sell it for in three months. Therefore, there is uncertainty in this investment for someone who has a three-month investment horizon.
The actual returns on these two investments will usually be different. According to the expectations hypothesis, the expected return from the second investment, given all information at the time of investment, should equal the return from purchasing a three-month T-bill. This theory turns out to be fairly easy to test, as we will see in Chapter 11.
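The difference between the two investments can be made concrete with a short calculation. In the sketch below, only the $10,000 face value comes from the example; the purchase prices and the possible resale prices are hypothetical numbers chosen for illustration, and `holding_return` is a helper name introduced here.

```python
FACE_VALUE = 10_000.0  # face value used in Example 1.7

def holding_return(buy_price, sell_price):
    """Three-month holding-period return, as a fraction of the purchase price."""
    return (sell_price - buy_price) / buy_price

# Hypothetical prices, chosen only for illustration (not from the text).
price_3m_bill = 9_900.0  # three-month T-bill bought today
price_6m_bill = 9_800.0  # six-month T-bill bought today
possible_resale_prices = [9_880.0, 9_900.0, 9_920.0]  # unknown at purchase

# Investment (1): the return is known when the bill is bought.
known_return = holding_return(price_3m_bill, FACE_VALUE)

# Investment (2): the return depends on the resale price in three months.
uncertain_returns = [holding_return(price_6m_bill, p)
                     for p in possible_resale_prices]
```

With these made-up prices, the first investment has a single known return, while the second has a different return for each possible resale price. The expectations hypothesis is a statement about the expected value of the second return given information at the time of purchase, not about any one realized value.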
In this introductory chapter, we have discussed the purpose and scope of econometric analysis. Econometrics is used in all applied economic fields to test economic theories, to inform government and private policy makers, and to predict economic time series. Sometimes an econometric model is derived from a formal economic model, but in other cases econometric models are based on informal economic reasoning and intuition. The goal of any econometric analysis is to estimate the parameters in the model and to test hypotheses about these parameters; the values and signs of the parameters determine the validity of an economic theory and the effects of certain policies.
Cross-sectional, time series, pooled cross-sectional, and panel data are the most common types of data structures that are used in applied econometrics. Data sets involving a time dimension, such as time series and panel data, require special treatment because of the correlation across time of most economic time series. Other issues, such as trends and seasonality, arise in the analysis of time series data but not cross-sectional data.
In Section 1.4, we discussed the notions of ceteris paribus and causal inference. In most cases, hypotheses in the social sciences are ceteris paribus in nature: all other relevant factors must be fixed when studying the relationship between two variables. Because of the nonexperimental nature of most data collected in the social sciences, uncovering causal relationships is very challenging.
Causal Effect
Ceteris Paribus
Cross-Sectional Data Set
Data Frequency
Econometric Model
Economic Model
Empirical Analysis
Experimental Data
Nonexperimental Data
Observational Data
Panel Data
Pooled Cross Section
Random Sampling
Time Series Data