Environmental Kuznets Curve (EKC) Analysis: Theory and Empirical Assessment, Exercises of Economics

An introduction to the Environmental Kuznets Curve (EKC), a concept in environmental economics that posits an inverted U-shaped relationship between economic growth and environmental degradation. It explores the origins of the EKC, its conceptual framework involving different phases of economic development, and the conditions under which the EKC hypothesis is verified using regression models. The document also includes practical exercises that import and explore data from the World Bank's World Development Indicators DataBank to test the EKC hypothesis empirically. It is useful for students learning environmental economics and econometrics, providing both theoretical background and hands-on data analysis experience. It is structured as a lab assignment that guides students through data import, exploration, and regression modeling.

Typology: Exercises

2024/2025

Uploaded on 08/07/2025

abhijith-varma-mudunuri-1


# ASSIGNMENT CONFIG
tests:
    ok_format: false
    files: true
generate: true
export_cell: false

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

Lab 03: Environmental Kuznets Curve (EKC)

Reference: The evolution of the environmental Kuznets curve hypothesis assessment: A literature review under a critical analysis perspective 02809-2?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS240584402202809

Background of the EKC

Origin of the EKC

"The results of the research of Kuznets disclosed an inverted U-shaped relationship between income per capita and income inequality. According to Kuznets, the inverted U-shaped relationship revealed an unequal income distribution in the early stages of income growth that moves towards equal income distribution with increasing economic productivity in the later stages of economic growth. Therefore, Kuznets specified that the transition from a pre-industrial to an industrial development firstly led to income inequality. This is followed by a rising income per capita together with superior income equality.

The EKC attracted a lot of attention from policymakers, theorists and empirical researchers and started to be widely used in environmental studies through the seminal research of Grossman and Krueger, carried out in 1991. They revealed that the relationship between income per capita and environmental degradation, like the income per capita and income inequality of Kuznets, also follows an inverted U-shaped curve.

In the early 1990s, the main idea in economics was “too poor to be green” [15]. According to Beckerman's [15] point of view regarding the effect of economic growth on environmental degradation, the author argues that there is: "clear evidence that, although economic growth usually leads to environmental deterioration in the early stages of the process, in the end, the best and probably the only way to attain a decent environment in most countries is to become rich". This view reflects the basic philosophy of the EKC theory. The World Development Report in 1992 argues that some environmental problems are aggravated by the growth of economic activity, and it suggests that accelerated equitable income growth will make it possible to achieve higher world output and improved environmental conditions [16, 17]. This proposal lays the foundation of the EKC literature."

Conceptual framework of the EKC


The EKC is commonly interpreted in two ways:

Two Phases, namely the early and later stages of economic development:

  1. The early stages are defined by a decreasing capacity of ecosystem regeneration as a consequence of the intensive use of resources, which leads to a rising ecological footprint and pollution. The early stages are linked with lax environmental regulations associated with a low capacity to pay for environmental conservation.
  2. The later stages are characterized by a mitigation of environmental degradation resulting from the dissemination of clean technology and innovation, societal environmental awareness, and institutional effectiveness and quality, all associated with an increase in the level of income.

In addition, these stages are characterized by two effects, the policy effect and the income effect:

  1. The policy effect consists of greater public concern about the environment, which leads to rigorous regulatory requirements.
  2. The income effect consists of the increase in income that leads to an increase in the willingness to pay for environmentally-friendly features.

Three phases of economic development:

  1. the pre-industrial economy, mainly characterised by the primary sector and low levels of income;
  2. the industrial economy, constituted by the secondary sector and associated with middle-income levels; and
  3. the post-industrial economy, formed by the tertiary sector and services, and associated with higher levels of income.

In the pre-industrial economy, economic activity is limited, which results in natural resource abundance and little waste formation. In this phase, the use of polluting technology, the lack of environmental awareness, and the prioritisation of economic growth result in rising environmental degradation.

The industrial economy is characterised by natural resources that are starting to run out and by increasing waste accumulation because of industrialisation. In this phase, a positive relationship between economic growth and environmental deterioration holds, and it persists until the turning point is reached.

The third phase of economic development is characterised by a structural change in the economy toward information- and technology-intensive industries and a services-directed economy. This change is linked with the reinforcement of environmental regulations, the use of cleaner and more efficient technology, and a strengthening of environmental awareness, resulting in a mitigation of environmental degradation. In this [...]

Data sources:

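The main data we will use are extracted from the World Bank's "World Development Indicators DataBank", for 2019 only.

Because we use a single year, we will explore the EKC hypothesis only from a cross-sectional perspective (comparing countries at one point in time rather than following each country over time).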

Learning Objectives:

  1. Importing and exporting dataframes
  2. Recognizing and handling missing values and NaNs
  3. Pivoting data
  4. The regression model behind the EKC
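For reference, the EKC hypothesis is commonly tested with a regression that is quadratic in income. The sketch below assumes that specification, which matches the formula PM25_mcgpcm ~ GDP_pc + GDP_pc2 used in the regression cell later in this lab; here y_i stands for an environmental degradation indicator of country i (for example PM2.5 exposure or CO2 per capita) and x_i for its GDP per capita.

```latex
y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i

\frac{\partial y_i}{\partial x_i} = \beta_1 + 2\beta_2 x_i = 0
\quad\Longrightarrow\quad
x^{*} = -\frac{\beta_1}{2\beta_2}
```

An inverted U requires \beta_1 > 0 and \beta_2 < 0: degradation rises with income at low income levels and falls once income passes the turning point x^{*}. In applied work both coefficients are usually also required to be statistically significant, and x^{*} should lie within the observed income range.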

Part 1: Importing the dataset

BEGIN QUESTION name: q_1_

Question 1.1: Import the dataset https://raw.githubusercontent.com/Mxywp/EnvEcon105-2025/refs/heads/data/wdi_gdp_pollution_2019.csv

# BEGIN SOLUTION

gdp_ekc = pd.read_csv("https://raw.githubusercontent.com/Mxywp/EnvEcon105-2025/refs/heads/data/wdi_gdp_pollution_2019.csv")
gdp_ekc

    | Country Name | Series Name                                         | 2019values
0   | Afghanistan  | CO2_emissionstons per capita)                       | 0.
1   | Afghanistan  | PM2.5 air pollution, mean annual exposure (mic...   | 52.
2   | Afghanistan  | Population density (people per sq. km of land ...  | 57.
3   | Afghanistan  | GDP per capita (constant 2015 US$)                  | 584.
4   | Albania      | CO2 emissions (metric tons per capita)              | 1.
... | ...          | ...                                                 | ...
863 | Zambia       | GDP per capita (constant 2015 US$)                  | 1310.
864 | Zimbabwe     | CO2 emissions (metric tons per capita)              | 0.
865 | Zimbabwe     | PM2.5 air pollution, mean annual exposure (mic...   | 20.
866 | Zimbabwe     | Population density (people per sq. km of land ...  | 39.
867 | Zimbabwe     | GDP per capita (constant 2015 US$)                  | 1342.

868 rows × 3 columns

# END SOLUTION
# END QUESTION
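Importing and exporting are two halves of one round trip. A minimal sketch, assuming a tiny hypothetical CSV held in memory instead of the lab's URL (the country labels and the column name value are invented for illustration):

```python
import io

import pandas as pd

# Hypothetical long-format rows in the same layout as gdp_ekc
# (country, series, value); the numbers are made up for illustration.
raw_csv = io.StringIO(
    "Country Name,Series Name,value\n"
    "Country A,GDP per capita (constant 2015 US$),1000.0\n"
    "Country A,CO2 emissions (metric tons per capita),0.5\n"
)

# Importing: pd.read_csv accepts a file path, a URL, or any file-like object.
df = pd.read_csv(raw_csv)

# Exporting: with no path, to_csv returns the CSV text as a string;
# index=False stops pandas from writing the row index as an extra column.
csv_text = df.to_csv(index=False)

# Reading the exported text back gives an equal dataframe.
df_again = pd.read_csv(io.StringIO(csv_text))
print(df.equals(df_again))  # True
```

Passing a file name such as "ekc_clean.csv" (hypothetical) to to_csv would write to disk instead of returning a string.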

Part 2: Exploring the dataset

One of the first things that we will do with our dataset is to learn about its structure: how many rows and columns are there in the dataset? What values does each column store? What is the data type of each column (int, string, etc.)? For categorical variables, what are the unique values? For numerical variables, what are the mean, median, min, and max?
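Each of these questions maps onto a short pandas call. A minimal sketch on a hypothetical long-format sample (the country labels A to C, the short series names GDP and CO2, the column name value, and the numbers are all invented; the real gdp_ekc uses the long World Bank series names):

```python
import pandas as pd

# Hypothetical long-format sample: one row per (country, indicator) pair.
sample = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B", "C"],
    "Series Name": ["GDP", "CO2", "GDP", "CO2", "GDP"],
    "value": [1000.0, 0.5, 5000.0, 2.0, 20000.0],
})

n_rows, n_cols = sample.shape                   # number of rows and columns
dtypes = sample.dtypes                          # data type of each column
n_countries = sample["Country Name"].nunique()  # number of distinct countries
series = sample["Series Name"].unique()         # distinct categories, in order of appearance
stats = sample["value"].describe()              # count, mean, std, min, quartiles, max
median = sample["value"].median()

print(n_rows, n_cols, n_countries)  # 5 3 3
print(dtypes)
print(list(series))                 # ['GDP', 'CO2']
print(stats[["mean", "min", "max"]])
print(median)                       # 1000.0
```

On gdp_ekc itself, the hidden test test_q_2_2_2 in this lab expects N_unique_countries to be 217. Because the value column there presumably stores numbers as strings alongside '..' (see Part 4), describe() would report counts and most frequent values rather than a mean until the column is converted to float.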

BEGIN QUESTION name: q_2_

Question 2.1: How many rows and columns are there in this dataframe gdp_ekc? Assign the number of rows to N_rows and the number of columns to N_cols.

BEGIN SOLUTION

N_rows = gdp_ekc.shape[0] # SOLUTION
N_cols = gdp_ekc.shape[1] # SOLUTION
N_rows
N_cols

3

# END SOLUTION
# BEGIN TESTS


""" # BEGIN TEST CONFIG

points: 0. hidden: true """ # END TEST CONFIG def test_q_2_2_2(np, N_unique_countries): assert np. isclose(N_unique_countries, 217 , rtol = 0.001)

test_q_2_2_2(np, N_unique_countries) # IGNORE

# END TESTS
# END QUESTION

Part 3: Pivot

You should know a bit about pivot tables from our lecture on tidy data; see the pandas documentation for .pivot(). For this lab analysis, we will use .pivot() to convert a long-form dataframe to a wide one.
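A minimal sketch of what .pivot() does, on hypothetical long-format data (labels, short series names, the column name value, and numbers are invented). It also shows two behaviours that matter in this lab: a missing (country, series) combination becomes NaN in the wide table, and duplicate combinations make .pivot() fail (pivot_table, which aggregates, is the usual alternative in that case).

```python
import pandas as pd

# Hypothetical long-format data: one row per (country, indicator).
# Country C has no CO2 row on purpose.
long_df = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B", "C"],
    "Series Name": ["GDP", "CO2", "GDP", "CO2", "GDP"],
    "value": [1000.0, 0.5, 5000.0, 2.0, 20000.0],
})

# index   -> row labels (one row per country)
# columns -> its unique values become the new column names
# values  -> the column that fills the cells
wide_df = long_df.pivot(index="Country Name", columns="Series Name", values="value")
print(wide_df)  # C gets NaN in the CO2 column

# A repeated (country, series) pair cannot be reshaped without aggregation.
duplicates_rejected = False
dup = pd.concat([long_df, long_df.iloc[[0]]])
try:
    dup.pivot(index="Country Name", columns="Series Name", values="value")
except ValueError:
    duplicates_rejected = True
print(duplicates_rejected)  # True
```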

BEGIN QUESTION name: q_3_

Question 3.1: Convert the dataframe using pandas.pivot() and assign the result to ekc_wide so that it contains new columns that correspond to the unique values of the column 'Series Name'.

BEGIN SOLUTION

ekc_wide = gdp_ekc.pivot(index='Country Name', columns='Series Name', values=
ekc_wide


Series Name | CO2 emissions (metric tons per capita) | CO2_emissionstons per capita) | GDP per capita (constant 2015 US$) | PM2.5 air pollution, mean annual exposure (micrograms per cubic meter) | Population density (people per sq. km of land area)
Country Name | | | | |
Afghanistan | NaN | 0.297563651 | 584.3865153 | 52.41704109 | 57.
Albania | 1.749462457 | NaN | 4543.387723 | 18.63878032 | 104.
Algeria | 3.994401828 | NaN | 4153.003441 | 32.83308539 | 17.
American Samoa | .. | NaN | 13288.35656 | 6.300155035 | 236.
Andorra | 6.287203804 | NaN | 39413.79088 | 9.066401367 | 162.
... | ... | ... | ... | ... | ...
Virgin Islands (U.S.) | .. | NaN | 36273.0951 | 8.996021018 | 304.
West Bank and Gaza | .. | NaN | 3378.434621 | 31.30254529 | 778.
Yemen, Rep. | 0.354864477 | NaN | 1182.507094 | 44.46696713 | 59.
Zambia | 0.414336364 | NaN | 1310.622224 | 25.92546019 | 24.
Zimbabwe | 0.663338328 | NaN | 1342.989586 | 20.83469969 | 39.

217 rows × 5 columns

# END SOLUTION
# BEGIN TESTS

def test_q_3_1(ekc_wide):
    assert 'CO2 emissions (metric tons per capita)' in ekc_wide.columns
    assert 'CO2_emissionstons per capita)' in ekc_wide.columns
    assert 'GDP per capita (constant 2015 US$)' in ekc_wide.columns
    assert 'PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)' in ekc_wide.columns
    assert 'Population density (people per sq. km of land area)' in ekc_wide.columns

test_q_3_1(ekc_wide) # IGNORE

END TESTS
# END QUESTION
# BEGIN QUESTION name: q_3_

Question 3.2: Drop the column that we won't use in this lab: 'CO2_emissionstons per capita)'

Don't create a new dataframe after dropping. Check DataFrame.drop() and its 'inplace' argument to make changes directly to the existing dataframe.
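A minimal sketch of in-place dropping on a hypothetical dataframe (the column names keep_me and drop_me are invented). Applied to this lab, the analogous call would presumably be ekc_wide.drop(columns=['CO2_emissionstons per capita)'], inplace=True), but treat that as a sketch, not the official solution.

```python
import pandas as pd

# Hypothetical wide dataframe with one unwanted column.
wide_df = pd.DataFrame(
    {"keep_me": [1.0, 2.0], "drop_me": [3.0, 4.0]},
    index=["A", "B"],
)

# With inplace=True, drop() changes wide_df itself and returns None,
# so writing wide_df = wide_df.drop(..., inplace=True) would replace
# the dataframe with None.
result = wide_df.drop(columns=["drop_me"], inplace=True)

print(result)                 # None
print(list(wide_df.columns))  # ['keep_me']
```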

BEGIN SOLUTION


  1. 'CO2 emissions (metric tons per capita)' to 'CO2_tonpc'
  2. 'GDP per capita (constant 2015 US$)' to 'GDP_pc'
  3. 'PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)' to 'PM25_mcgpcm'
  4. 'Population density (people per sq. km of land area)' to 'pop_den'

Don't create a new dataframe after renaming. Check DataFrame.rename() and its 'inplace' argument to make changes directly to the existing dataframe.
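A minimal sketch of in-place renaming with an old-to-new mapping, on a hypothetical one-row dataframe (the numbers are invented). By default rename() silently ignores keys that match no column, so a typo in a long World Bank name would go unnoticed; passing errors="raise" turns that into a KeyError.

```python
import pandas as pd

# Hypothetical wide dataframe using two of the long series names.
wide_df = pd.DataFrame(
    {
        "GDP per capita (constant 2015 US$)": [1000.0],
        "Population density (people per sq. km of land area)": [50.0],
    },
    index=["A"],
)

# Only the columns named in the mapping change; the others keep their names.
wide_df.rename(columns={"GDP per capita (constant 2015 US$)": "GDP_pc"}, inplace=True)
print(list(wide_df.columns))

# A misspelled key is reported instead of being ignored.
typo_caught = False
try:
    wide_df.rename(columns={"GDP per capita": "GDP_pc"}, errors="raise", inplace=True)
except KeyError:
    typo_caught = True
print(typo_caught)  # True
```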

BEGIN SOLUTION

ekc_wide.rename(columns={'CO2 emissions (metric tons per capita)': 'CO2_tonpc', # SOLUTION
                         'GDP per capita (constant 2015 US$)': 'GDP_pc', # SOLUTION
                         'PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)': 'PM25_mcgpcm', # SOLUTION
                         'Population density (people per sq. km of land area)': 'pop_den'}, # SOLUTION
                inplace=True) # SOLUTION
ekc_wide.columns

Index(['CO2_tonpc', 'GDP_pc', 'PM25_mcgpcm', 'pop_den'], dtype='object', name='Series Name')

END SOLUTION
# BEGIN TESTS

def test_q_3_3(ekc_wide):
    assert 'CO2_tonpc' in ekc_wide.columns
    assert 'GDP_pc' in ekc_wide.columns
    assert 'PM25_mcgpcm' in ekc_wide.columns
    assert 'pop_den' in ekc_wide.columns

test_q_3_3(ekc_wide) # IGNORE

END TESTS
# END QUESTION

Part 4: Missing Values and NaNs

As said in class, real-world data is rarely clean. In particular, many datasets have a significant amount of missing data. In pandas, missing data is primarily represented by two special values:

  1. None: a Python object used to represent missing values, particularly in object-type (e.g., string) arrays.
  2. NaN (Not a Number): a special floating-point value from NumPy that is widely recognized as a missing-value indicator, especially in numerical arrays.
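A minimal sketch of how pandas treats the two markers, using a few country names as illustrative labels: isna() detects both None and NaN, NaN never equals itself (which is why isna() is preferred over ==), and None placed in a numeric Series is stored as NaN.

```python
import numpy as np
import pandas as pd

# A text column can hold None; a numeric column holds NaN.
names = pd.Series(["Albania", None, "Zambia"])
values = pd.Series([1.5, np.nan, 3.0])

print(names.isna().tolist())   # [False, True, False]
print(values.isna().tolist())  # [False, True, False]

# NaN is not equal to itself, so == cannot be used to find it.
print(np.nan == np.nan)        # False

# None inside a numeric Series is converted to NaN (float64 dtype).
mixed = pd.Series([1.0, None])
print(mixed.dtype)             # float64
print(mixed.isna().tolist())   # [False, True]
```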

However, different data sources may record and/or report missing data in different ways.

In our dataset, there are two kinds of missing values: NaN and "..". Let's see what they look like.

ekc_wide[ekc_wide["CO2_tonpc"]. isna()]


Series Name | CO2_tonpc | GDP_pc | PM25_mcgpcm | pop_den
Country Name | | | |
Afghanistan | NaN | 584.3865153 | 52.41704109 | 57.

ekc_wide[ekc_wide["CO2_tonpc"] == ".."][:5]

Series Name | CO2_tonpc | GDP_pc | PM25_mcgpcm | pop_den
Country Name | | | |
American Samoa | .. | 13288.35656 | 6.300155035 | 236.
Aruba | .. | 31762.73396 | .. | 591.
Bermuda | .. | 107036.2393 | 7.069562328 | 1183.
British Virgin Islands | .. | .. | .. | 204.
Cayman Islands | .. | 82170.59303 | .. | 275.
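The two queries above need different filters because pandas only treats NaN as missing; ".." is an ordinary string to it. A minimal sketch using three CO2 values shown in the tables above (Albania, Aruba, Afghanistan):

```python
import numpy as np
import pandas as pd

# CO2 values as they appear above: numbers stored as text, ".." and NaN as gaps.
co2 = pd.Series(["1.749462457", "..", np.nan], index=["Albania", "Aruba", "Afghanistan"])

print(co2.isna().tolist())     # [False, False, True]  -> ".." is not seen as missing
print((co2 == "..").tolist())  # [False, True, False]  -> needs an explicit comparison

# Catching both kinds of gap at once.
missing_any = co2.isna() | (co2 == "..")
print(missing_any.tolist())    # [False, True, True]
```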

BEGIN QUESTION name: q_4_

Question 4.1: For simplicity, drop all rows that contain missing values (either NaN or '..') in this lab. Hints:

  1. Check the data types before using .dropna(). It only drops values that pandas recognizes as missing (NaN or None), so the '..' strings in these object-type columns would be kept; the columns also need to be converted to float type.
  2. However, the missing value '..' can't be converted from string to float, so you need to replace it with something that can be. There are a few ways to do this. Here is my suggestion: 1) use .replace() to change '..' to 'NaN'; 2) use .astype(float) to change the columns with numerical values (currently stored as object type) to float; 3) use .dropna().
  3. Finally, assign the number of rows to n_rows.

Note: As said in class, this is not a good way to deal with missing values. So, do not do this in the real world.
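The three hinted steps can be sketched on a small frame built from values shown in the tables above (Albania, Aruba, Zimbabwe). This sketch replaces '..' with np.nan rather than the string 'NaN'; both work, since float("NaN") also parses to a missing value. pd.to_numeric(errors="coerce") would do steps 1 and 2 in one go, but it also silently turns any unexpected text into NaN.

```python
import numpy as np
import pandas as pd

# Values stored as strings, with ".." marking a missing value (as in ekc_wide).
df = pd.DataFrame(
    {
        "CO2_tonpc": ["1.749462457", "..", "0.663338328"],
        "GDP_pc": ["4543.387723", "31762.73396", "1342.989586"],
    },
    index=["Albania", "Aruba", "Zimbabwe"],
)

clean = df.replace("..", np.nan)  # 1) mark ".." as a real missing value
clean = clean.astype(float)       # 2) numeric strings -> float64
clean = clean.dropna()            # 3) drop rows with any missing value

print(clean.dtypes.tolist())
print(clean.index.tolist())  # ['Albania', 'Zimbabwe']
```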

BEGIN SOLUTION

ekc_no_missing = ekc_wide.copy()
ekc_no_missing.replace(['..'], ['NaN'], inplace=True) # SOLUTION
ekc_no_missing[['GDP_pc', 'CO2_tonpc', 'PM25_mcgpcm', 'pop_den']] = ekc_no_missing[['GDP_pc', 'CO2_tonpc', 'PM25_mcgpcm', 'pop_den']].astype(float) # SOLUTION
ekc_no_missing.dropna(inplace=True) # SOLUTION
ekc_no_missing.head()
n_rows = ekc_no_missing.shape[0] # SOLUTION
n_rows

185

END SOLUTION
# BEGIN TESTS

def test_q_4_1(ekc_no_missing):
    assert 160 < ekc_no_missing.shape[0] < 200


NameError Traceback (most recent call last)
  File ...\site-packages\patsy\compat.py:40, in call_and_wrap_exc
  File ...\site-packages\patsy\eval.py:179, in EvalEnvironment.eval
NameError: name 'GDP_pc2' is not defined

The above exception was the direct cause of the following exception:

PatsyError Traceback (most recent call last)
Cell In[20], line 1
----> 1 ekc_reg = smf.ols(formula="PM25_mcgpcm ~ GDP_pc + GDP_pc2", data=ekc_wide).fit() # SOLUTION
      2 print(ekc_reg.summary()) # SOLUTION
  File ...\site-packages\statsmodels\base\model.py:203, in Model.from_formula
  File ...\site-packages\statsmodels\formula\formulatools.py:63, in handle_formula_data
  File ...\site-packages\patsy\highlevel.py:319, in dmatrices
  File ...\site-packages\patsy\highlevel.py:164, in _do_highlevel_design
  File ...\site-packages\patsy\highlevel.py:56, in _try_incr_builders
  File ...\site-packages\patsy\build.py:746, in design_matrix_builders
  File ...\site-packages\patsy\build.py:491, in _examine_factor_types
  ...
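The error above arises because the formula refers to a column GDP_pc2 that patsy cannot find in the dataframe passed as data (or in the calling namespace). The call also passes ekc_wide, whose columns still hold strings such as '..', whereas ekc_no_missing from Part 4 holds complete float columns. A minimal sketch of two common fixes, using statsmodels as the lab does, on synthetic noise-free data invented for illustration (not WDI values): create the squared column before fitting, or square inside the formula with patsy's I(...). It then checks the inverted-U sign conditions and computes the turning point -b1 / (2 * b2).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data following an exact inverted U:
# y = 10 + 2*x - 0.1*x**2, whose true turning point is x = 10.
x = np.arange(0.0, 21.0)
df = pd.DataFrame({"GDP_pc": x, "PM25_mcgpcm": 10 + 2 * x - 0.1 * x**2})

# Fix 1: create the squared regressor as a column, then use it in the formula.
df["GDP_pc2"] = df["GDP_pc"] ** 2
res = smf.ols(formula="PM25_mcgpcm ~ GDP_pc + GDP_pc2", data=df).fit()

# Fix 2: let the formula compute the square itself with I(...).
res_alt = smf.ols(formula="PM25_mcgpcm ~ GDP_pc + I(GDP_pc**2)", data=df).fit()

b1 = res.params["GDP_pc"]
b2 = res.params["GDP_pc2"]
is_inverted_u = bool(b1 > 0 and b2 < 0)  # EKC sign conditions
turning_point = -b1 / (2 * b2)           # income level where y peaks

print(is_inverted_u)  # True
print(turning_point)  # approximately 10
```

In the lab, the analogous (untested) steps would presumably be ekc_no_missing['GDP_pc2'] = ekc_no_missing['GDP_pc'] ** 2 followed by smf.ols(formula="PM25_mcgpcm ~ GDP_pc + GDP_pc2", data=ekc_no_missing).fit(); whether the EKC holds for PM2.5 then depends on the estimated signs and their significance in ekc_reg.summary().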

NameError Traceback (most recent call last)
Cell In[21], line 2
      1 # Save your notebook first, then run this cell to export your submission.
----> 2 grader.export(run_tests=True)

NameError: name 'grader' is not defined