Policy evaluation notes, Econometrics lecture notes

Lecture notes, 2023, Prof. Giupponi (first half)

Type: Lecture notes

2022/2023

Uploaded on 12 June 2023

giulia-gianetti
Policy evaluation – 1st part
Lecture 1 – Introduction
Policy evaluation = evaluating consequences of given policy interventions.
Objectives of the course
Introduce the main econometric techniques used in data analysis for the identification of causal relationships.
The methods covered will enable us to:
- investigate important social and economic problems facing modern governments and societies
- assess quantitatively the effects of policies implemented to tackle them
The methods will be illustrated with examples at the frontier of economics and social science research. Topics
include education, racial disparities, tax policy, social insurance, minimum wages and gender inequality.
Structure of the course
Part I: Review of hypothesis testing and regression model
Part II: Correlation and causality (the core of the course): the difference between correlation and causality between
two variables
Part III: Empirical methods + STATA applications (the actual methods)
- Randomized controlled trials: the gold standard, but not always implementable
- Natural experiments: situations occurring in nature which mimic what you would try to create in an
experiment
- Instrumental variables
- Regression discontinuity design
- Panel data, diff-in-diff and synthetic control
Each of these topics will have a matching STATA class showing how to apply it in practice.
Part IV: Topics in economic history and policy analysis
Thinking in causal terms
As human beings, we tend to portray the world in causal terms, even when it does not make sense.
“If-then” thinking = we hypothesize that something (e.g. crossing our fingers) causes an outcome (e.g. winning the
lottery). It is inescapable, but it can also be problematic.
Instances in which the public debate (consciously or not) portrays correlation as causality:
- Association between immigrant status and involvement in crime: does it mean that immigrants are criminals? If there
were causation, the policy implication would be to close borders.
- Association between minority status and test scores (minorities tend to be associated with poorer performance
in school or university): does it mean that being a woman, or belonging to a religious group or a racial minority,
causes lower test scores? If it did, would it be efficient to discriminate (e.g. “if men are the best, let’s only
hire men”, or only white people)?
These are extreme examples, but this sort of thinking is really present in everyday life. As social scientists and
policymakers of the future, we need to go a step further to really understand these links and communicate them better.
Policy debate slide
Disney example: “being in the office 4 days a week will increase creativity and benefit the company’s culture”.
This is a causal statement.
TUC (one of the UK’s largest unions): claims a causal effect of a higher minimum wage on lifting people out of
poverty.
“Universal basic income is not good for fighting poverty”: an implicit causal relationship.
Is there indeed a causal relationship?
Research questions
The causal questions we are going to ask look like this:
What is the causal effect
- of flexible work arrangements on productivity?
- of minimum wages on employment and incomes?
- of a universal basic income on poverty?
- … in general, what is the causal effect of a treatment on an outcome?


Example - Relationship between education and labour market outcomes
Data for the US in 2021. There is a strong positive gradient: better-educated people tend to earn higher wages. There is also a negative gradient: higher levels of education are correlated with lower unemployment rates.
We see a strong association between schooling and earnings. Does it mean that schooling increases earnings? There is a correlation; is there causality?
Possible explanations of the relationship between education and earnings:

  1. A causal effect of schooling on earnings. BUT there could be confounding factors, such as earnings ability: some individuals have underlying predispositions or characteristics that are better rewarded in the labour market, so higher earnings ability leads to higher earnings. How about the relationship between earnings ability and schooling? They are positively associated: the same characteristics that are highly rewarded in the labour market may also make it less costly for individuals to sit through a class. In the extreme, the relationship we observe might be either a true causal effect of schooling on earnings or entirely explained by the fact that individuals with more years of schooling are those whose underlying characteristics are also better rewarded in the labour market.
  2. Alternative explanation: other factors could influence both educational attainment and earnings (e.g. family background, intelligence, …).

Without additional analyses, it is impossible to distinguish between the two mechanisms. This is an identification issue: we cannot separate the causal effect from the confounding factors. The implications are huge, because the answer is essential to inform the actions of policymakers, voters and citizens.

Lecture 2 – Hypothesis testing
Precisely because we are going to ask causal questions for the rest of the course, we will often want to compare mean outcomes in two different groups. For example:
  • Mean business profits for those involved in business training programs vs those who aren’t
  • Mean reemployment wages for unemployed people who engage in an active labour market program vs those who don’t
  • Vaccination rates for children whose parents were offered food for bringing children to the clinic vs those whose parents weren’t
  • Test scores for children in smaller classes vs those in larger classes
Example: data on whether people have health insurance or not. For instance, the average age of those who have health insurance is 44, while that of those who do not is 41.
How should we interpret these differences in means? We often compare means when estimating treatment effects. Is the difference “real”? That is, how likely is it that the particular difference in means has arisen by chance?

The variance of the sample mean is:
$$\mathrm{Var}(\bar{Y}) = \frac{\sigma^2}{n}$$
The more dispersed the population (larger $\sigma^2$), the more dispersed the distribution of the sample mean; the more observations we draw (larger $n$), the lower the variance of the sample mean. With larger samples we would, on average, get closer to the true population mean.
We don’t observe $\sigma$ or $\mu$ because they are population parameters. We therefore need an estimator for $\sigma$ to find the sampling standard deviation: a sample counterpart of this formula, the standard error of the sample mean, which we can actually calculate from the data:
$$SE(\bar{Y}) = \frac{s}{\sqrt{n}}$$
where $s$ is the sample standard deviation.

Back to assessing differences in means
How will we approach the question of whether a difference is real or due to sampling variation? We will construct a test statistic and then form a belief (= define a hypothesis) about how the world really works based upon it. This often means asking “How surprised should I be by what I observe if a particular model of the world were true?”
For example, suppose we want to understand whether students allocated to smaller classes achieve better results than those in larger classes. We start from a given hypothesis, “there is no difference in test scores”, go to the data, observe the sample, use our estimator, and construct the mean for those in large classes and for those in small classes to compare the two. The question we ask ourselves is: how surprising would it be to observe that test scores for students randomly allocated to smaller classes were five points higher if in fact class size does not matter?
Data on people who have free health insurance (= they can consume as much health care as they want) and those who have minimal health coverage (= it covers only catastrophic events). The average health expenses of those under minimal coverage are 636, and the difference with the others is 285. In the table, the standard errors of the means are reported in brackets.
Is this 285 real (i.e. statistically significant) or just the outcome of sampling variation?
Statisticians have developed formal methods to quantify how likely or unlikely a difference is. The starting point is to compare our estimate to some fixed benchmark, a procedure called hypothesis testing.
Classical hypothesis testing begins with formulating a null hypothesis, for example: providing free care does not affect total health expenditures (we typically test the null hypothesis that the treatment had no effect).
We then test the null against an alternative hypothesis, taking a falsification approach: the null is maintained unless the data provide strong evidence against it. We reject the null if we have evidence beyond a reasonable doubt against it.

Let’s test our null hypothesis
Is it plausible that we would observe a difference of 285 if free care did not in fact affect usage? The first building block is defining the null; the second is the t-statistic, a standardized difference between the observed difference in averages and its hypothesized value:
$$t = \frac{(\bar{Y}_1 - \bar{Y}_0) - \mu_0}{SE(\bar{Y}_1 - \bar{Y}_0)} = \frac{285 - 0}{72} \approx 3.96$$
where 285 is the difference in averages, 72 is its standard error and $\mu_0 = 0$.

We need another building block of hypothesis testing: the central limit theorem. By the central limit theorem, if the data come from a distribution with a difference in means of $\mu_0$, then in large samples the t-statistic has approximately a standard normal distribution. We use this result to:
  • Perform hypothesis tests
  • Compute p-values
  • Construct confidence intervals

Classical hypothesis testing
Starting from a null hypothesis, we have to decide whether to reject it or not. First, we compute the t-statistic. Is that t large enough or not? As researchers we fix the critical value (= the benchmark) in advance, so it is up to us to decide what counts as “large” enough to reject the null.
Suppose we adopt the rule that we reject the null when the absolute value of the t-statistic is greater than 1.96. This number corresponds to a 5% significance level and comes from the central limit theorem: if the null is true, we observe a t-statistic in the tails beyond ±1.96 only 5% of the time. The areas under the tails are called the “rejection region”, the area in the middle is the “acceptance region”, and the number used as benchmark is the “critical value”.
The intuition is that at a 5% significance level we accept the risk of committing a type I error (= rejecting the null when it is actually true) 5% of the time. We can be more risk averse and choose a 1% significance level: the critical value increases to 2.58, which shrinks the rejection region and enlarges the acceptance region.
Picking a significance level (or size) means deciding how often we are willing to reject the null when it is actually true; this is our decision rule. The standard in economics is 5% (sometimes 10% or 1%).
In our example, 3.96 is greater than 1.96 and also greater than 2.58, so we reject the null at both the 5% and the 1% significance level.

P-values
P-values are usually computed from the t-statistic; STATA reports them.
The p-value is the probability, under the null hypothesis, of observing an estimate at least as adverse to the null as the one actually observed in the sample.
  • The smaller the p-value, the less likely the observed estimate is under the null, so a lower p-value means more reason to reject the null.
  • p-values are a more informative reparameterization of the test result than a simple reject / don’t reject.
  • The p-value measures the area under the tails for a given value of t: if t is 1.96 the area is 0.05 (5%); if t is 2.58 it is 0.01 (1%).
  • For example, if the p-value is 0.03, we would reject the null at the 5% level but not at the 1% level.
  • The p-value for our t-statistic of 3.96 is 0.000075 (very small).
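The test above can be reproduced with a few lines of Python. This is only a sketch using the numbers quoted in the notes (285, 72 and a null value of 0) and the normal approximation from the central limit theorem; the function names are illustrative and are not part of the course material, which uses STATA.

```python
from statistics import NormalDist

# Numbers quoted in the notes: difference in means, its standard error, null value.
diff_in_means = 285.0
std_error = 72.0
mu_0 = 0.0

std_normal = NormalDist()  # mean 0, standard deviation 1


def t_statistic(estimate: float, null_value: float, se: float) -> float:
    """Standardized distance between the estimate and the null value."""
    return (estimate - null_value) / se


def two_sided_p_value(t: float) -> float:
    """Area in both tails of the standard normal beyond |t|."""
    return 2.0 * (1.0 - std_normal.cdf(abs(t)))


def critical_value(significance_level: float) -> float:
    """Two-sided critical value, e.g. about 1.96 for a 5% significance level."""
    return std_normal.inv_cdf(1.0 - significance_level / 2.0)


t = t_statistic(diff_in_means, mu_0, std_error)
p = two_sided_p_value(t)

print(f"t-statistic: {t:.2f}")
print(f"p-value: {p:.6f}")
for level in (0.05, 0.01):
    cv = critical_value(level)
    decision = "reject" if abs(t) > cv else "do not reject"
    print(f"{level:.0%} level: critical value {cv:.2f} -> {decision} the null")
```

Comparing |t| with the critical value and comparing the p-value with the significance level are the same decision rule expressed in two ways, which is why both lead to rejecting the null here.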

The p-value leads to the same conclusion: we reject the null. If the p-value is greater than 5% we cannot reject the null; if it is lower, we reject it. We don’t use standard deviations but actual values.

Economic and statistical significance are not the same!
The 95% confidence interval for the difference in health expenses between the groups with free and catastrophic coverage is [$144, $426]. We easily (p-value < 0.01) reject the hypothesis that expenditures are the same in the two groups. But is the mean difference of $285 large or small? Economic significance has to do with how big a point estimate is.

Summary
We use the sampling distribution to construct standard errors and, with them, formal t-tests, p-values and confidence intervals:
  • Standard errors and confidence intervals tell us about the precision of our estimates
  • Hypothesis tests and p-values tell us whether our data are consistent with a priori specified values of our parameters of interest
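The confidence interval quoted above can be recovered from the point estimate and the standard error. The sketch below assumes the usual normal-approximation interval (estimate plus or minus critical value times standard error); `confidence_interval` is a hypothetical helper name, not something from the notes.

```python
from statistics import NormalDist
from typing import Tuple


def confidence_interval(estimate: float, se: float, level: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation interval: estimate +/- critical value * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


low, high = confidence_interval(285.0, 72.0)
print(f"95% CI: [{low:.0f}, {high:.0f}]")

# The null value 0 lies outside the interval, matching the t-test and the p-value.
print("0 inside interval:", low <= 0.0 <= high)
```

The interval says nothing about whether $285 is economically large; it only describes how precisely the difference is estimated.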

Lecture 3 – The linear regression model
Suppose we are interested in the relationship between school resources and student achievement in California school districts. Increasing school resources means hiring more teachers so as to decrease the student-teacher ratio (= number of students / number of teachers).
Consider a school authority considering a change in its class size policy: hiring additional teachers to reduce class sizes. To evaluate this policy, it would like to know whether doing so will improve student performance.
Sample: California school districts (n = 420) for 1999
Variables:
  • District mean 5th grade test scores
  • Student-teacher ratio = number of students in the district divided by the number of full-time equivalent teachers, a measure of how many resources are invested in the district (more teachers = more resources)
We start by thinking about regression as a tool to describe data, exploring its mechanical features. We will then come back to the causal question: “do smaller classes result in better outcomes for students?”
Whenever using any type of data, the best way to start is to plot it: look at the raw correlation between the two variables in a graph. Here test scores are our y and the student-teacher ratio is our x. Usually y is the dependent variable, which is a function of many x’s, including the number of students per teacher.
What do we learn from this graph?
  • A lot of dispersion: huge variation in test scores even for districts with the same class size, so test scores are probably related to many other factors besides class size.
  • A visible relationship between class size and test scores. Test scores are determined by many factors, but here we only look at two variables, so we cannot learn much more.
How can we better summarize this relationship (and check whether there is one)?
A simple way to summarize the relationship between the two variables is to fit a line, the regression line. We want to draw a line through the points that fits “as closely as possible”, i.e. one that minimizes the (squared) distance between the line and each of the dots.
Think of the relationship between the dependent variable y and x as a linear function, which can be written as:
$$y_i = \alpha + \beta x_i + e_i$$
alpha is the intercept (the value of y predicted by the line when x = 0) and beta is the slope of the line, which multiplies x: it tells us the difference we would expect to see in test scores for each unit difference in the student-teacher ratio.
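To make the mechanics of fitting a line concrete, here is a minimal sketch that computes the intercept and slope by hand. The six data points are made up for illustration (they are not the California sample), and `ols_fit` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical toy data (not the California sample): student-teacher ratio and test score.
str_ratio = [15.0, 17.0, 19.0, 21.0, 23.0, 25.0]
test_score = [668.0, 661.0, 655.0, 652.0, 644.0, 640.0]


def ols_fit(x: list, y: list) -> tuple:
    """Return (alpha, beta) that minimize the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    beta = cov_xy / var_x          # slope: co-movement of x and y over the variation in x
    alpha = y_bar - beta * x_bar   # intercept: the line passes through the point of means
    return alpha, beta


alpha, beta = ols_fit(str_ratio, test_score)
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(str_ratio, test_score)]

print(f"intercept alpha: {alpha:.2f}")
print(f"slope beta: {beta:.2f}")
print(f"predicted score at a ratio of 20: {alpha + beta * 20:.2f}")
print(f"sum of residuals (zero up to rounding): {sum(residuals):.6f}")
```

The residuals sum to zero by construction, which is the property that lets the fitted line be used to predict y for any given x.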

e.g. population density or selection of students according to house prices.
How the results look in STATA: “Constant” is the intercept alpha and the coefficient on “str” is the slope beta.
When we run a regression we estimate alpha and beta. Because OLS residuals average to zero by construction, we can use the estimates to construct a predicted value of y for each value of x (here the outcome y is test scores).
Suppose we want the predicted value of y for a district with a student-teacher ratio of 20. We substitute x = 20 and compute: y = alpha – 2.28 x 20 = 653. We can use alpha and beta to predict values of the outcome y.

Summary
Scatterplots are a good way to display bivariate relationships:
  • you should almost always start by looking at them
  • they provide useful but sometimes overwhelming information
Linear regression is the best linear approximation to the data:
  • it minimizes the sum of squared residuals, and
  • it neatly summarizes the relationship between two variables in just two numbers

Inference
Now we move to inference, in particular hypothesis testing, and apply it to regressions. We have estimates of a bivariate relationship, now what?
Once we have estimated the parameters of a regression model, we will typically want to test hypotheses, construct confidence intervals or calculate p-values (is this -2.28 real? Is it statistically significant?). We need to extend the tools we used when conducting inference on sample averages.
The estimated slopes in the two pictures are different: the first is -0.77, the second is -2.85. Neither matches the population regression slope of -2. We can repeatedly draw samples and estimate the regression:
  • The mean of the first ten samples I drew was -1.
The estimator of the regression slope has sampling variation. STATA will calculate the standard deviation of the slope (the standard error) for you.
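The idea of repeatedly drawing samples and re-estimating the slope can be simulated. In this sketch only the population slope of -2 comes from the notes; the intercept (700), the noise level, the range of x, the sample size and the number of replications are all assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_ALPHA, TRUE_BETA = 700.0, -2.0  # assumed population line; -2 is the slope quoted in the notes


def draw_sample(n: int) -> tuple:
    """Draw n hypothetical districts from the assumed population."""
    x = [random.uniform(14.0, 26.0) for _ in range(n)]
    y = [TRUE_ALPHA + TRUE_BETA * xi + random.gauss(0.0, 10.0) for xi in x]
    return x, y


def ols_slope(x: list, y: list) -> float:
    """OLS slope: co-movement of x and y divided by the variation in x."""
    x_bar, y_bar = mean(x), mean(y)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


slopes = [ols_slope(*draw_sample(50)) for _ in range(500)]

print(f"first three estimated slopes: {[round(b, 2) for b in slopes[:3]]}")
print(f"mean of the 500 slopes: {mean(slopes):.2f}")
print(f"standard deviation of the slopes: {stdev(slopes):.2f}")
```

Individual estimates differ from -2, but they are centred on it; the spread of the estimates across samples is what the standard error reported by regression software is meant to approximate from a single sample.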

In STATA, our results look like this:
The estimated standard error is a measure of the precision of our estimates. Remember that the point estimate and the standard error are the building blocks of the t-statistic, and hence of hypothesis testing. With the standard error of 0.480 in mind, let’s test the significance of the slope.
We are usually interested in testing whether the slope is different from 0. Hypothesis testing starts with the null hypothesis, here beta = 0; the alternative hypothesis is beta ≠ 0.
There are 3 ways to perform hypothesis testing:

  1. Classical hypothesis testing (t-statistic). Compute t and compare it with the critical value, which we set at the beginning of the test. A 5% significance level is usually fine (we reject the null when it is true, i.e. make a type I error, 5% of the time). Since t is approximately standard normal, the tails of that distribution are delimited by -1.96 and 1.96, and these values define the rejection and acceptance regions. So we compute t and check whether its absolute value is greater than 1.96: t = (-2.28 – 0) / 0.48 = -4.75, where 0 is the hypothesized value mu_0 and 0.48 is the standard error. Since 4.75 is greater than the critical value 1.96, t falls in the rejection region and we reject the null.
  2. P-value approach. Compare the p-value with the significance level, here 5%. The p-value is 0.000 < 0.05, so we reject the null hypothesis.
  3. Confidence interval approach. Find a lower and an upper bound: lower bound = point estimate – (critical value x std error); upper bound = point estimate + (critical value x std error).
  • -2.28 – (1.96 x 0.48) = -3.22
  • -2.28 + (1.96 x 0.48) = -1.34
The confidence interval of our estimate does not include 0, so we can reject the null.

Homoskedasticity and heteroskedasticity
Should you use conventional or robust standard errors?

What if we have a relationship like the one between test scores and district income? The linear regression does not fit too well. There is some curvature that the line does not capture: most points are below the OLS line when income is very low or very high, but above it when income is between 15 and 30K. The relationship between district income and test scores is therefore not a straight line: it is non-linear.
A non-linear function is a function whose slope is not constant: f(x) is linear if the slope is the same for all values of x; if the slope depends on the value of x, f(x) is non-linear.
If a line is not an accurate description of this relationship, what is? One way to capture the concavity is to model test scores as a function of income and the square of income, i.e. to augment the linear regression with a quadratic term:
$$TestScore_i = \alpha + \beta_1 Income_i + \beta_2 Income_i^2 + e_i$$
This is the quadratic regression model, and it fits the data better.

So far we have talked about regression as linear. What is linear about linear regression? The linearity of the OLS regression function is in the parameters, not necessarily in the data.
Consider the Cobb-Douglas production function, which describes the output of firm i as a function of total factor productivity (alpha), capital input K and labour input L:
$$Y_i = \alpha K_i^{\beta} L_i^{\gamma}$$
Say we are interested in estimating alpha, beta and gamma. We can’t use OLS on this formulation, but we can take logs and then run a regression:
$$\ln Y_i = \delta + \beta \ln K_i + \gamma \ln L_i, \qquad \delta = \ln \alpha$$
The function is not linear in the original data, but the transformed version is linear in its parameters (delta, beta and gamma; not alpha, but we can estimate delta and recover alpha from it).

How should we interpret specifications that are non-linear in the original data, like the one above where income enters both linearly and squared?
In the linear specification, the slope is the change in test scores associated with a $1,000 increase in income (1,000 because income is reported in thousands of dollars, so a one-unit increase in income is actually a $1,000 increase).
In the quadratic specification it is trickier. How do we interpret the association between test scores and average income? With derivatives: treat the regression function as a function of income and take the derivative of test scores with respect to income, which gives the association between test scores and income:
$$\frac{\partial TestScore}{\partial Income} = \beta_1 + 2 \beta_2 Income$$
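The derivative of a quadratic specification can be evaluated at any income level. The coefficients in the sketch below are illustrative assumptions, not values taken from the notes; the point is only that the slope changes with income and that it closely tracks the actual change in the fitted curve for a one-unit ($1,000) increase.

```python
# Illustrative coefficients only (assumed, not taken from the notes):
# test_score = ALPHA + BETA_1 * income + BETA_2 * income**2, income in $1,000s.
ALPHA, BETA_1, BETA_2 = 607.3, 3.85, -0.0423


def predicted_score(income: float) -> float:
    """Fitted value of the quadratic regression at a given income."""
    return ALPHA + BETA_1 * income + BETA_2 * income ** 2


def marginal_association(income: float) -> float:
    """Derivative of the fitted curve with respect to income."""
    return BETA_1 + 2.0 * BETA_2 * income


for income in (10.0, 20.0, 40.0):
    slope = marginal_association(income)
    # Compare with the actual change in the fitted value from a $1,000 increase.
    step = predicted_score(income + 1.0) - predicted_score(income)
    print(f"income {income:.0f}: derivative {slope:.3f}, one-unit change {step:.3f}")
```

With a negative coefficient on the squared term, the association gets weaker as income grows, which is the concavity the straight line could not capture.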

This association is no longer just a number but itself a function of income. We can plug in different values: if we are interested in the association at the median of income, for example, we just plug that number in.
More generally, we use the log transformation more often than the quadratic specification. Why? The log specification (logs on both sides of the regression) allows us to retrieve proportional effects, and proportional changes in income are often more reasonable than additive ones. In terms of economic impact, a $1,000 change is pretty big for a district with an average income of $15,000, while a $1,000 change may be less meaningful for a district with an average income of $40,000. Comparing percentage changes is a more like-for-like exercise: e.g., what should we expect when income is 10% larger? 1% larger? The log specification allows us to answer this.
There are 3 different cases in which logarithms might be used:

  • x is transformed by taking its log but y is not: lin(ear)-log model
  • y is transformed by taking its log but x is not: log-lin(ear) model
  • Both y and x are transformed to their logarithms: log-log model

Case 1 (lin-log): $y_i = \alpha + \beta \ln x_i + e_i$. A 1% change in x is associated with a change in y of 0.01 beta (or beta/100).
Case 2 (log-lin): $\ln y_i = \alpha + \beta x_i + e_i$. A one-unit change in x is associated with a change in y of approximately 100 beta percent.
Case 3 (log-log): $\ln y_i = \alpha + \beta \ln x_i + e_i$. Beta is the elasticity of y with respect to x: the percentage change in y associated with a 1% change in x.
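The three interpretations of beta in the logarithmic models are approximations that hold for small changes, and they can be checked numerically. The coefficients and the evaluation point in this sketch are arbitrary assumptions, used only to compare each exact change with its rule-of-thumb value.

```python
import math

ALPHA, BETA = 1.0, 0.5  # hypothetical coefficients, used in all three models


def lin_log(x: float) -> float:   # y = alpha + beta * ln(x)
    return ALPHA + BETA * math.log(x)


def log_lin(x: float) -> float:   # ln(y) = alpha + beta * x
    return math.exp(ALPHA + BETA * x)


def log_log(x: float) -> float:   # ln(y) = alpha + beta * ln(x)
    return math.exp(ALPHA + BETA * math.log(x))


def pct_change(new: float, old: float) -> float:
    return 100.0 * (new - old) / old


x = 20.0
# Lin-log: a 1% increase in x changes y by about 0.01 * beta units.
print(lin_log(1.01 * x) - lin_log(x), 0.01 * BETA)
# Log-lin: a small increase dx in x changes y by about 100 * beta * dx percent.
dx = 0.01
print(pct_change(log_lin(x + dx), log_lin(x)), 100.0 * BETA * dx)
# Log-log: a 1% increase in x changes y by about beta percent.
print(pct_change(log_log(1.01 * x), log_log(x)), BETA)
```

Each printed pair shows the exact change next to the approximation; the two drift apart as the change in x gets larger, so the percentage readings are best used for small changes.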

We will often want to include more than one dummy variable. Each coefficient is interpreted in the same way: the incremental difference in the outcome between the group with the dummy switched on and the group with it switched off. We are allowing the regression to capture a different average value of y for each group.

Interaction between two or more variables
Many of the descriptions we would like to see involve the interaction between two or more variables. Is the relationship between class size and test scores the same for:

  • High and low ability students?
  • Poor and rich students?

Many questions of interest are about heterogeneous causal effects. E.g., does job placement assistance (a form of active labour market policy) have a different impact on the recently unemployed and the long-term unemployed? Knowing whether the impact differs between groups allows you to design the policy better, spend less and make it more efficient.
Previously we had two dummy variables entering linearly. Now we include an interaction term in our regression: one extra variable that is the product of the other two:
$$Y_i = \alpha + \beta_1 D_i + \beta_2 G_i + \beta_3 (D_i \times G_i) + e_i$$
Here D_i = 1 for small classes and G_i = 1 for districts with many English learners. beta 1 and beta 2 are the main effects and beta 3 is the interaction effect. Whenever you include an interaction, remember to keep all the main effects in the regression.
The average value of the outcome conditional on each combination of dummies is:
  • D_i = 0, G_i = 0: alpha
  • D_i = 1, G_i = 0: alpha + beta 1 (“average value of the outcome conditional on D_i switched on and G_i switched off”, i.e. the predicted average test score in small classes with few English learners)
  • D_i = 0, G_i = 1: alpha + beta 2
  • D_i = 1, G_i = 1: alpha + beta 1 + beta 2 + beta 3
We are interested in the difference in test scores between small and large classes. The association between being in a small class and test scores is a combination of beta 1 and beta 3, because beta 3 also carries the effect of G_i:
  • For districts with low numbers of English learners (G_i = 0), the difference between small and large classes is beta 1, the incremental effect of being in a small class.
  • For districts with high numbers of English learners (G_i = 1), the same difference is beta 1 + beta 3.
With this richer specification, the relationship depends on whether the district has few or many English learners. Does this necessarily imply that the association between test scores and class size differs depending on whether G_i is switched on or off? Our specification allows us to capture a potential difference between the two.
In the results, smaller classes have on average scores 7 points higher than larger classes. The association between being in a small class and test scores is 3.6 (test scores are 3.6 points higher in small classes, conditional on having a high number of English learners).
The 3rd column gives beta 1, beta 2 and beta 3:
  • beta 1 is the change in test scores between small and large classes conditional on G_i = 0 → 1.
  • beta 3 gives the difference in that change between the two groups of districts.
  • beta 1 + beta 3 gives the difference in test scores between small and large classes conditional on the dummy being switched on.
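The way the interaction coefficient enters the group comparisons can be illustrated with a small sketch. The coefficient values below are hypothetical (they are not the estimates from the lecture's table); D stands for the small-class dummy and G for the dummy for districts with many English learners.

```python
# Hypothetical coefficients for
# score = ALPHA + BETA_1 * D + BETA_2 * G + BETA_3 * D * G
# D = 1 for small classes, G = 1 for districts with many English learners.
ALPHA, BETA_1, BETA_2, BETA_3 = 650.0, 5.0, -18.0, -2.0


def predicted_score(d: int, g: int) -> float:
    """Predicted average score for a given combination of the two dummies."""
    return ALPHA + BETA_1 * d + BETA_2 * g + BETA_3 * d * g


def small_class_gap(g: int) -> float:
    """Difference in predicted scores between small and large classes, given G."""
    return predicted_score(1, g) - predicted_score(0, g)


for g in (0, 1):
    for d in (0, 1):
        print(f"D={d}, G={g}: predicted score {predicted_score(d, g):.1f}")

print(f"small-class gap when G=0: {small_class_gap(0):.1f}")  # equals BETA_1
print(f"small-class gap when G=1: {small_class_gap(1):.1f}")  # equals BETA_1 + BETA_3
```

The two gaps differ by exactly the interaction coefficient, so testing whether the association with class size is the same in both groups amounts to testing whether that coefficient is zero.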

When we take this model to the data, we can apply standard OLS (so minimize the sum of squared residuals) and obtain the OLS estimates of our coefficients alfa and beta. We’ll be interested more in beta = the slope of our regression line  it can be rewritten as ratio between covariance between y and x / variance of x. Up to now this is statistics, we are given two variables and we try to draw a line between the cloud of points. Next step is understanding what info the beta coefficient is giving us, we might be tempted to give a causal interpretation to this beta coefficient: if I increase x by a given quantity, that will on avg generate a change in y; or a change in x is associated with a change in y (simply say that there is an association between the two variables which is described by beta, this is more accurate than saying “causes”). Question is “does the OLS-estimated coeff beta hat capture the true causal effect of x on y?” … in most cases, no! Example: education and earnings Imagine we have (fictitious) data for a set of people on number of years of education and on daily wages in euros. Causal effect of education on earnings is an important question for economics, social sciences and policy: schooling is one of the largest areas in which policy makers can intervene and invest  trying to understand whether increasing access or quality of education can actually increase the earnings abilities of individuals, hence their earnings and the amount they are able to contribute, is generally important. We want to understand whether an increase in education causes an increase in earnings (I want to estimate the effect of additional years of education on earnings  interested in understanding if an increase in years of education leads causally to an increase in earnings). Start by trying to look at the correlation between the two variables. 
We see a scatterplot showing a positive association between the two: when years of education increase, we tend to observe higher daily wages. Can we conclude anything about causality between education and earnings? No. But why? There are various reasons why we need to be careful in interpreting beta. What we see could be in part a causal effect and in part a mere association. There are at least two possible explanations of this relationship:

  1. causal effect of schooling on labor earnings (implying that schooling increases the earning ability of individuals)
  2. omitted factors and/or selection = other characteristics, omitted from the equation, may influence both education and earnings (e.g. family background, motivation, intelligence). There could be a 3rd factor driving both the increase in education and the increase in wages.

We are interested in the effect of a given treatment on an outcome. A confounder is a factor correlated with both outcome and treatment, which might drive all or part of the relationship we observe in the data.

Regression output in STATA: regress wage (Y) on education (X). How to interpret the results?
  • The beta coeff on “education” is the slope of the fitted line: it tells us the avg increase in daily wages associated with a 1-unit increase in education. Here, 1 additional year of education is associated with a 4.4 euro avg increase in daily wages.
  • The constant is the intercept of the fitted line: it tells us the avg daily wage for an indiv with 0 years of education. Here, an indiv with 0 years of education would earn 61 euros a day, but in practice this has no meaning in this context because no one in the data has 0 years of education (so it is not particularly interesting to us).

One very plausible omitted factor here is the ability of individuals: in particular, a specific type of ability which leads individuals to be successful both in schooling and in the labour market. When comparing people with high vs low education in the graph above, we see that on avg this corresponds to an earnings differential, but we are not considering the fact that indiv who have high education might also have high earnings ability irrespective of their education level (they are smarter and their skills are priced at a higher level in the labour market). So, when we compare people with high vs low education, we are also implicitly comparing people with high earnings ability vs people with low earnings ability: it could be that part or all of this association is explained by the fact that they are smarter rather than more educated!

Imagine that in the data we observe a distribution of education that is a combination of the blue and red distributions in the figure. Imagine then that we have an indicator for whether an indiv is smart or not (although this would be difficult to obtain, because intelligence is multidimensional and difficult to measure): if you split the sample between indiv who are smart and not smart, you see a strong correlation between being smart and having a high education level. What this implies in practice is that in the linear fit shown previously we are ignoring these two groups of people: we fit a single line through education, ignoring the fact that we actually have two different groups (smart and not smart).
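The effect of ignoring the two groups can be illustrated with a small constructed example. Everything here is hypothetical: the numbers, the group labels and the helper `ols_slope` are assumptions chosen so that, within each group, one extra year of education is worth exactly 2 euros a day, while the high-ability group has both more education and a higher wage level.

```python
from statistics import mean


def ols_slope(x, y):
    """OLS slope of y on x, computed as cov(x, y) / var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    return cov_xy / var_x


# Hypothetical data: same return to education in both groups (2 euros per year),
# but the high-ability group starts from a higher wage level and studies longer.
WITHIN_GROUP_SLOPE = 2.0
low_ability_edu = [8, 9, 10, 11, 12]
high_ability_edu = [12, 13, 14, 15, 16]
low_ability_wage = [60 + WITHIN_GROUP_SLOPE * e for e in low_ability_edu]
high_ability_wage = [80 + WITHIN_GROUP_SLOPE * e for e in high_ability_edu]

# One line through everyone, ignoring ability
pooled_slope = ols_slope(low_ability_edu + high_ability_edu,
                         low_ability_wage + high_ability_wage)
# Separate lines within each ability group
slope_low = ols_slope(low_ability_edu, low_ability_wage)
slope_high = ols_slope(high_ability_edu, high_ability_wage)

print(f"pooled slope:       {pooled_slope:.2f}")
print(f"low-ability slope:  {slope_low:.2f}")
print(f"high-ability slope: {slope_high:.2f}")
```

In this constructed sample the pooled slope (about 5.3) is more than double the within-group slope of 2: the single line attributes to education a wage gap that is really a difference in levels between the two groups.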
When we modify our regression to take into account the fact that there are two groups, the fitted line is very different: the beta coeff (slope) is much lower, because we are now accounting for the fact that part of that positive association was not due to education at all but was explained by earnings ability.

How do we do this in practice? How can we separately capture the two dimensions in our regression? This is where dummy variables come into play. Assume that one extra year of education is associated with the same increase in earnings for both low and high ability indiv (a simplifying assumption), i.e. the relationship between education and earnings is the same irrespective of whether you are smart or not. The two lines are then parallel: we impose the same slope for smart and not smart. What we allow for is a different intercept, because we are now running a regression which includes our indicator for being smart. There are two clouds of points, each with a positive association between education and earnings, but at different levels of earnings depending on being smart or not smart. So we include one extra variable, a dummy for being smart or not, which allows our model to have two different intercepts (one for the smart and one for the not smart). Results are to be interpreted differently: