









An introduction to the environmental Kuznets curve (EKC), a concept in environmental economics that posits an inverted U-shaped relationship between economic growth and environmental degradation. It explores the origins of the EKC, its conceptual framework involving different phases of economic development, and the conditions under which the EKC hypothesis is verified using regression models. The document also includes practical exercises that import and explore datasets to test the EKC hypothesis empirically, using data from the World Bank's World Development Indicators DataBank. It is useful for students learning environmental economics and econometrics, providing both theoretical background and hands-on data analysis experience. It is structured as a lab assignment that guides students through data import, exploration, and regression modeling to understand the EKC.










import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
Reference: The evolution of the environmental Kuznets curve hypothesis assessment: A literature review under a critical analysis perspective
"The results of the research of Kuznets disclosed an inverted U-shaped relationship between income per capita and income inequality. According to Kuznets, the inverted U-shaped relationship revealed an unequal income distribution in the early stages of income growth that moves towards equal income distribution with increasing economic productivity in the later stages of economic growth. Therefore, Kuznets specified that the transition from a pre-industrial to an industrial development firstly led to income inequality. This is followed by a rising income per capita together with superior income equality.
The EKC attracted a lot of attention from policymakers, theorists and empirical researchers and started to be widely used in environmental studies through the seminal research of Grossman and Krueger, carried out in 1991. They revealed that the relationship between income per capita and environmental degradation, like the income per capita and income inequality of Kuznets, also follows an inverted U-shaped curve.
In the early 1990s, the main idea in economics was “too poor to be green” [15]. According to Beckerman's [15] point of view regarding the effect of economic growth on environmental degradation, the author argues that there is: "clear evidence that, although economic growth usually leads to environmental deterioration in the early stages of the process, in the end, the best and probably the only way to attain a decent environment in most countries is to become rich". This view reflects the basic philosophy of the EKC theory. The World Development Report in 1992 argues that some environmental problems are aggravated by the growth of economic activity, and it suggests that accelerated equitable income growth will make it possible to achieve higher world output and improved environmental conditions [16, 17]. This proposal lays the foundation of the EKC literature."
The EKC is commonly interpreted in two ways:
In addition, these stages are characterized by two effects, i.e., the policy effect and the income effect:
In the pre-industrial economy, economic activity is limited, which results in an abundance of natural resources and little waste formation. In this phase, the use of polluting technology, the lack of environmental awareness, and the prioritisation of economic growth result in rising environmental degradation.
The industrial economy is characterised by natural resources that are starting to run out and by increasing waste accumulation due to industrialisation. In this phase, a positive relationship between economic growth and environmental deterioration is observed, and it holds until the turning point is reached.
The third phase of economic development is characterised by a structural change in the economy towards information- and technology-intensive industries and a services-directed economy. This change is linked with the reinforcement of environmental regulations, the use of cleaner and more efficient technology, and a strengthening of environmental awareness, resulting in a mitigation of environmental degradation. In this
The main data we will use are extracted from the World Bank's "World Development Indicators" DataBank, for 2019 only.
Therefore, we will explore the EKC hypothesis only from a cross-sectional perspective.
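A cross-sectional EKC test is commonly written as a quadratic regression of an environmental indicator on income. The block below gives that standard specification from the EKC literature. The symbols are notation introduced here, not taken from the lab: E_i is an environmental indicator for country i (e.g., PM2.5 exposure or CO2 per capita), and Y_i is GDP per capita.

```latex
E_i = \beta_0 + \beta_1 Y_i + \beta_2 Y_i^2 + \varepsilon_i,
\qquad \text{inverted U if } \beta_1 > 0 \text{ and } \beta_2 < 0

\frac{\partial E_i}{\partial Y_i} = \beta_1 + 2\beta_2 Y_i = 0
\;\Longrightarrow\;
Y^{*} = -\frac{\beta_1}{2\beta_2}
```

Y* is the turning point. Below it, degradation rises with income; above it, degradation falls. Y* lies at a positive income level only when β1 and β2 have opposite signs.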
Learning Objectives:
Importing and exporting dataframes
Recognizing and handling missing values and NaNs
Pivoting data
Regression model behind the EKC
Part 1: Importing dataset
Question 1.1: Import the dataset https://raw.githubusercontent.com/Mxywp/EnvEcon105-2025/refs/heads/data/wdi_gdp_pollution_2019.csv
gdp_ekc = pd.read_csv("https://raw.githubusercontent.com/Mxywp/EnvEcon105-2025/refs/heads/data/wdi_gdp_pollution_2019.csv")
gdp_ekc
Country Name Series Name 2019values
0 Afghanistan CO2_emissionstons per capita) 0.
1 Afghanistan PM2.5 air pollution, mean annual exposure (mic... 52.
2 Afghanistan Population density (people per sq. km of land ... 57.
3 Afghanistan GDP per capita (constant 2015 US$) 584.
4 Albania CO2 emissions (metric tons per capita) 1.
... ... ... ...
863 Zambia GDP per capita (constant 2015 US$) 1310.
864 Zimbabwe CO2 emissions (metric tons per capita) 0.
865 Zimbabwe PM2.5 air pollution, mean annual exposure (mic... 20.
866 Zimbabwe Population density (people per sq. km of land ... 39.
867 Zimbabwe GDP per capita (constant 2015 US$) 1342.
868 rows × 3 columns
Part 2: Exploring the dataset
One of the first things that we will do with our dataset is to learn about its structure: how many rows and columns are there in the dataset? What values does each column store? What is the data type for each column (int, string, etc.)? For categorical variables, what are unique values? For numerical variables, what is the mean, median, min, and max?
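The checks listed above map onto a few pandas attributes and methods. The sketch below applies them to a tiny made-up long-form table with the same three-column layout as gdp_ekc. The values column is named "2019" here purely for illustration, and none of the numbers are WDI data.

```python
import pandas as pd

# Made-up long-form table (illustrative values only, not WDI data).
toy = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B"],
    "Series Name": ["GDP per capita", "CO2 per capita"] * 2,
    "2019": [1000.0, 0.5, 20000.0, 8.0],
})

n_rows, n_cols = toy.shape            # (number of rows, number of columns)
print(n_rows, n_cols)                 # 4 3
print(toy.dtypes)                     # data type of each column
print(toy["Series Name"].unique())    # distinct values of a categorical column
print(toy["2019"].describe())         # count, mean, std, min, quartiles (50% = median), max
print(toy["2019"].median())           # median of the numeric column
```

In the real gdp_ekc, the values column is likely stored as text (object dtype) because of the ".." placeholders discussed in Part 4. In that case, describe() reports counts and unique values instead of numeric statistics until the column is converted.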
Question 2.1: How many rows and columns are there in this dataframe gdp_ekc? Assign the number of rows to N_rows and the number of columns to N_cols.
N_rows = gdp_ekc.shape[0]  # SOLUTION
N_cols = gdp_ekc.shape[1]  # SOLUTION
N_rows
N_cols
3
def test_q_2_2_2(np, N_unique_countries):
    assert np.isclose(N_unique_countries, 217, rtol=0.001)
test_q_2_2_2(np, N_unique_countries) # IGNORE
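In the long-form data, each country appears once per indicator, so counting rows overstates the number of countries. A minimal sketch on made-up data shows how Series.nunique() counts each distinct country name once:

```python
import pandas as pd

# Made-up long-form data: every country appears once per indicator.
toy = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B", "C", "C"],
    "Series Name": ["GDP", "CO2"] * 3,
})

# nunique() counts distinct values, so repeated country names are counted once.
N_unique_countries = toy["Country Name"].nunique()
print(N_unique_countries)  # 3
```

As a consistency check on the real data: gdp_ekc has 868 rows and four indicators per country, and 868 / 4 = 217, which matches the value the test expects.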
Part 3: Pivot
You should know a bit about pivot tables from our lecture on tidy data; see the pandas documentation for DataFrame.pivot() for details. For this lab, we will use .pivot() to convert a long-form dataframe into a wide one.
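A minimal sketch of the long-to-wide reshape on made-up data. The column name "value" is a hypothetical stand-in for the dataset's values column. The index argument becomes the row labels, columns supplies the new column names, and values fills the cells:

```python
import pandas as pd

# Made-up long-form data: one row per (country, indicator) pair.
long_df = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B"],
    "Series Name": ["GDP_pc", "CO2_tonpc", "GDP_pc", "CO2_tonpc"],
    "value": [1000.0, 0.5, 20000.0, 8.0],
})

# One row per country, one column per indicator.
wide_df = long_df.pivot(index="Country Name", columns="Series Name", values="value")
print(wide_df)
```

pivot() keeps "Series Name" as the name of the new column axis. That is why "Series Name" appears in the top-left corner of the pivoted table and as name='Series Name' in the column Index later on. pivot() raises a ValueError if a (country, indicator) pair occurs more than once.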
Question 3.1: Convert the dataframe using pandas.pivot() and assign the pivot table to ekc_wide so that it contains new columns corresponding to the unique values of the column 'Series Name'.
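# index: one row per country; columns: one column per indicator in 'Series Name';
# values: the third column of gdp_ekc, which holds the 2019 values
ekc_wide = gdp_ekc.pivot(index='Country Name', columns='Series Name', values=gdp_ekc.columns[2])
ekc_wide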
Series Name
CO2 emissions (metric tons per capita)
CO2_emissionstons per capita)
GDP per capita (constant 2015 US$)
PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
Population density (people per sq. km of land area)
Country Name
Afghanistan NaN 0.297563651 584.3865153 52.41704109 57.
Albania 1.749462457 NaN 4543.387723 18.63878032 104.
Algeria 3.994401828 NaN 4153.003441 32.83308539 17.
American Samoa .. NaN 13288.35656 6.300155035 236.
Andorra 6.287203804 NaN 39413.79088 9.066401367 162.
... ... ... ... ... ...
Virgin Islands (U.S.) .. NaN 36273.0951 8.996021018 304.
West Bank and Gaza .. NaN 3378.434621 31.30254529 778.
Yemen, Rep. 0.354864477 NaN 1182.507094 44.46696713 59.
Zambia 0.414336364 NaN 1310.622224 25.92546019 24.
Zimbabwe 0.663338328 NaN 1342.989586 20.83469969 39.
217 rows × 5 columns
def test_q_3_1(ekc_wide):
    assert 'CO2 emissions (metric tons per capita)' in ekc_wide.columns
    assert 'CO2_emissionstons per capita)' in ekc_wide.columns
    assert 'GDP per capita (constant 2015 US$)' in ekc_wide.columns
    assert 'PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)' in ekc_wide.columns
    assert 'Population density (people per sq. km of land area)' in ekc_wide.columns
test_q_3_1(ekc_wide) # IGNORE
Question 3.2: Drop the column that we won't use in this lab: 'CO2_emissionstons per capita)'
Don't create a new dataframe after dropping. Check DataFrame.drop() and its 'inplace' argument to make changes directly to the existing dataframe.
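A minimal sketch of an in-place column drop on a made-up table. For this lab the analogous call would presumably be ekc_wide.drop(columns=['CO2_emissionstons per capita)'], inplace=True), but the cell below only demonstrates the mechanics:

```python
import pandas as pd

# Made-up wide table with one column we do not need.
df = pd.DataFrame({"keep_me": [1, 2], "drop_me": [3, 4]})

# With inplace=True, drop() modifies df itself and returns None,
# so assigning the result back (df = df.drop(..., inplace=True)) would lose the data.
result = df.drop(columns=["drop_me"], inplace=True)
print(result)             # None
print(list(df.columns))   # ['keep_me']
```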
Don't create a new dataframe after renaming. Check DataFrame.rename() and its 'inplace' argument to make changes directly to the existing dataframe.
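A minimal sketch of an in-place rename using one of the long WDI indicator names. The one-row table is made up. By default, rename() silently ignores dictionary keys that do not match any column, so a typo in a long indicator name would go unnoticed. Passing errors="raise" is an optional safeguard and is not required by the lab:

```python
import pandas as pd

# Made-up one-column table using a long WDI indicator name.
df = pd.DataFrame({"GDP per capita (constant 2015 US$)": [1000.0]})

# The dict maps old names to new names; inplace=True edits df directly.
# errors="raise" makes a misspelled old name raise a KeyError instead of being ignored.
df.rename(columns={"GDP per capita (constant 2015 US$)": "GDP_pc"}, inplace=True, errors="raise")
print(list(df.columns))  # ['GDP_pc']
```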
ekc_wide.rename(columns={'CO2 emissions (metric tons per capita)': 'CO2_tonpc',  # SOLUTION
                         'GDP per capita (constant 2015 US$)': 'GDP_pc',  # SOLUTION
                         'PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)': 'PM25_mcgpcm',  # SOLUTION
                         'Population density (people per sq. km of land area)': 'pop_den'},  # SOLUTION
                inplace=True)
ekc_wide.columns
Index(['CO2_tonpc', 'GDP_pc', 'PM25_mcgpcm', 'pop_den'], dtype='object', name='Series Name')
def test_q_3_3(ekc_wide):
    assert 'CO2_tonpc' in ekc_wide.columns
    assert 'GDP_pc' in ekc_wide.columns
    assert 'PM25_mcgpcm' in ekc_wide.columns
    assert 'pop_den' in ekc_wide.columns
test_q_3_3(ekc_wide) # IGNORE
Part 4: Missing Values and NaNs
As said in class, real-world data is rarely clean. In particular, many datasets have a significant amount of missing data. In pandas, missing data is primarily represented by two special values:
None: a Python object used to represent missing values, particularly in object-type (e.g., string) arrays.
NaN (Not a Number): a special floating-point value from NumPy that is widely recognized as a missing-value indicator, especially in numerical arrays.
However, different data sources may record and/or report missing data in different ways.
In our dataset, there are two types of missing values: "NaN" and "..". Let's see what they look like.
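A minimal sketch on a made-up column shows why the two kinds must be handled differently. isna() recognizes None and NaN, but the World Bank's ".." placeholder is an ordinary string, so it has to be matched explicitly:

```python
import numpy as np
import pandas as pd

# Made-up values: a number, None, NaN, and the ".." placeholder.
s = pd.Series([1.5, None, np.nan, ".."], dtype=object)

is_missing = s.isna()          # True for None and NaN only
is_placeholder = s == ".."     # ".." is just a string, so compare explicitly
print(is_missing.tolist())                     # [False, True, True, False]
print((is_missing | is_placeholder).tolist())  # [False, True, True, True]
```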
ekc_wide[ekc_wide["CO2_tonpc"].isna()]
Series Name CO2_tonpc GDP_pc PM25_mcgpcm pop_den
Country Name
Afghanistan NaN 584.3865153 52.41704109 57.
ekc_wide[ekc_wide["CO2_tonpc"] == ".."][:5]
Series Name CO2_tonpc GDP_pc PM25_mcgpcm pop_den
Country Name
American Samoa .. 13288.35656 6.300155035 236.
Aruba .. 31762.73396 .. 591.
Bermuda .. 107036.2393 7.069562328 1183.
British Virgin Islands .. .. .. 204.
Cayman Islands .. 82170.59303 .. 275.
Question 4.1: For simplicity, drop all rows that contain missing values (either NaN or ..) for this lab. Hint:
Note: As said in class, this is not a good way to deal with missing values. So, do not do this in the real world.
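ekc_no_missing = ekc_wide.copy()
# Turn the '..' placeholders into real missing values (np.nan, not the string 'NaN')
ekc_no_missing = ekc_no_missing.replace('..', np.nan)  # SOLUTION
# The pivoted values are text, so convert them to floats for the regression
cols = ['GDP_pc', 'CO2_tonpc', 'PM25_mcgpcm', 'pop_den']
ekc_no_missing[cols] = ekc_no_missing[cols].astype(float)  # SOLUTION
ekc_no_missing.dropna(inplace=True)  # SOLUTION
n_rows = ekc_no_missing.shape[0]  # SOLUTION
n_rows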
185
def test_q_4_1(ekc_no_missing):
    assert 160 < ekc_no_missing.shape[0] < 200
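A compact alternative to the replace-then-convert steps is pandas.to_numeric with errors="coerce". It turns any non-numeric string, including "..", into NaN in one pass. This is a swapped-in technique, not the lab's solution, and the table below is made up:

```python
import numpy as np
import pandas as pd

# Made-up wide table: the ".." placeholder forces the columns to hold text.
df = pd.DataFrame(
    {"GDP_pc": ["1000.5", "..", "20000"], "CO2_tonpc": ["0.5", "8.1", np.nan]},
    index=["A", "B", "C"],
)

# Apply to_numeric column by column; errors="coerce" maps unparseable strings to NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")
complete = numeric.dropna()   # keep only rows with no missing values
print(complete)
print(complete.shape[0])      # 1
```

The trade-off is that coerce also hides genuine typos, such as "1,000", by silently turning them into NaN. Replacing only ".." is more explicit about which values are treated as missing.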
# GDP_pc2 (squared GDP per capita) must be a column of the dataframe passed as `data`.
# ekc_wide has no such column and still contains '..' strings, so fit on ekc_no_missing.
ekc_no_missing['GDP_pc2'] = ekc_no_missing['GDP_pc'] ** 2
ekc_reg = smf.ols(formula="PM25_mcgpcm ~ GDP_pc + GDP_pc2", data=ekc_no_missing).fit()  # SOLUTION
print(ekc_reg.summary())  # SOLUTION
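The quadratic EKC regression can be checked end to end on synthetic data. The sketch below fits the same two-regressor formula as the lab, with GDP_pc2 built as the square of GDP_pc. It then checks the inverted-U sign condition and computes the turning point -b1 / (2 * b2). The data-generating coefficients (2e-3 and -3e-8), sample size, and noise level are made up for illustration and say nothing about the real WDI cross-section.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic cross-section drawn from an inverted U (illustrative only, not WDI data).
rng = np.random.default_rng(0)
gdp = rng.uniform(500, 60000, size=200)
pm25 = 10 + 2e-3 * gdp - 3e-8 * gdp**2 + rng.normal(0, 1, size=200)
df = pd.DataFrame({"GDP_pc": gdp, "PM25_mcgpcm": pm25})
df["GDP_pc2"] = df["GDP_pc"] ** 2   # squared income term, as in the lab's formula

res = smf.ols(formula="PM25_mcgpcm ~ GDP_pc + GDP_pc2", data=df).fit()
b1 = res.params["GDP_pc"]
b2 = res.params["GDP_pc2"]

# Inverted U requires b1 > 0 and b2 < 0; the peak is where b1 + 2*b2*GDP = 0.
turning_point = -b1 / (2 * b2)
print(b1 > 0, b2 < 0, round(turning_point))
```

On the real ekc_no_missing data, the same checks apply, but the coefficients should also be statistically significant (see the p-values in ekc_reg.summary()). A turning point far outside the observed GDP range would not support an inverted U within the sample.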
# Save your notebook first, then run this cell to export your submission.
# `grader` must already be defined; in otter-grader notebooks this is typically done
# in the first cell with `import otter` and `grader = otter.Notebook()`.
grader.export(run_tests=True)