jsonstat.py Documentation
Release 0.1.14
26fe
Aug 06, 2017

jsonstat.py is a library for reading the JSON-stat data format maintained and promoted by Xavier Badosa. The JSON-stat format is a JSON format for publishing datasets. JSON-stat is used by several institutions to publish statistical data.

Contents:


CHAPTER 1

Notebooks

Notebook: using jsonstat.py python library with jsonstat format version 1.

This Jupyter notebook shows the Python library jsonstat.py in action. JSON-stat is a simple, lightweight JSON dissemination format. For more information about the format, see the official site. This example shows how to explore the sample data file oecd-canada from the json-stat.org site. This file complies with version 1 of jsonstat.

# all imports here
from __future__ import print_function
import os
import pandas as pd              # pandas is used to convert a jsonstat dataset to a DataFrame
import jsonstat                  # import the jsonstat.py package
import matplotlib.pyplot as plt  # for plotting
%matplotlib inline

Download the file oecd-canada.json, or use the cached copy if it exists. Caching the file on disk lets you work offline and speeds up exploration of the data.
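The download-or-reuse pattern described above can be sketched in a library-independent way. This is only an illustration: `cached_or_download` and `fake_download` are hypothetical names, not part of jsonstat.py, and the fake downloader stands in for a real HTTP request so the sketch runs offline.

```python
import os
import tempfile


def cached_or_download(url, file_name, cache_dir, download):
    """Return (path, fetched) for file_name, downloading only on a cache miss.

    `download` is a caller-supplied callable (url, dest_path) -> None; it is a
    hypothetical stand-in for whatever performs the real HTTP request.
    """
    path = os.path.join(cache_dir, file_name)
    if os.path.exists(path):
        return path, False           # cache hit: nothing is fetched
    os.makedirs(cache_dir, exist_ok=True)
    download(url, path)
    return path, True                # cache miss: the file was fetched


def fake_download(url, dest_path):
    # Stand-in downloader so that the sketch does not need network access.
    with open(dest_path, "w") as fh:
        fh.write('{"source": "%s"}' % url)


cache_dir = tempfile.mkdtemp()
url = "http://json-stat.org/samples/oecd-canada.json"
first = cached_or_download(url, "oecd-canada.json", cache_dir, fake_download)
second = cached_or_download(url, "oecd-canada.json", cache_dir, fake_download)
print(first[1], second[1])  # True False
```

The second call finds the file on disk and skips the download, which is what makes repeated notebook runs fast and reproducible.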

url = 'http://json-stat.org/samples/oecd-canada.json'
file_name = "oecd-canada.json"

file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name)
    file_path = file_name

using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.json-stat.org/oecd-canada.json

Initialize JsonStatCollection from the file and print the list of datasets contained in the collection.

collection = jsonstat.from_file(file_path)
collection

Select the dataset named oecd. The oecd dataset has three dimensions (concept, area, year) and contains 432 values.

oecd = collection.dataset('oecd')
oecd

Show some detailed info about the dimensions.

oecd.dimension('concept')

oecd.dimension('area')

oecd.dimension('year')

Accessing values in the dataset

Print the value in oecd dataset for area = IT and year = 2012

oecd.data(area='IT', year='2012')

JsonStatValue(idx=201, value=10.55546863, status=None)
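The idx field can be understood through the way JSON-stat stores values: as a flat list in row-major order over the dimensions. The sketch below is not the library's code. It assumes dimension sizes of 1, 36 and 12 for concept, area and year (consistent with the 432 values and the 36 areas reported in this notebook) and assumes that IT sits at position 16 of the area dimension, which is the position implied by idx=201.

```python
def flat_index(sizes, positions):
    """Row-major offset of a cell, given each dimension's size and position."""
    index = 0
    for size, pos in zip(sizes, positions):
        if not 0 <= pos < size:
            raise IndexError("position out of range")
        index = index * size + pos
    return index


# Assumed sizes in dimension order (concept, area, year): 1 * 36 * 12 = 432.
sizes = [1, 36, 12]
years = [str(y) for y in range(2003, 2015)]   # '2003' .. '2014'

# Assumption: 'IT' is at position 16 of the area dimension.
it_position = 16

idx = flat_index(sizes, [0, it_position, years.index("2012")])
print(idx)  # 201
```

Because the concept dimension has a single category under these assumptions, leaving it out of the query still identifies exactly one cell.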

oecd.value(area='IT', year='2012')

oecd.value(concept='unemployment rate',area='Australia',year='2004') # 5.

oecd.value(concept='UNR',area='AU',year='2004')

Transforming the dataset into a pandas DataFrame

df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()

df_oecd['area'].describe() # area contains 36 values


[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Australia', '2004', 5.39663128],
 ['unemployment rate', 'Australia', '2005', 5.044790587],
 ['unemployment rate', 'Australia', '2006', 4.789362794]]

It is possible to transform jsonstat data into a table whose rows follow a different dimension order.

order = [i.did() for i in oecd.dimensions()]
order = order[::-1]  # reverse list
table = oecd.to_table(order=order)
table[:5]

[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Austria', '2003', 4.278559338],
 ['unemployment rate', 'Belgium', '2003', 8.158333333],
 ['unemployment rate', 'Canada', '2003', 7.594616751]]
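The effect of the order argument, as seen in the output above, is that the columns stay where they are while the rows are visited in a different sequence. A minimal pure-Python sketch of that behaviour follows. It uses toy numbers rather than the OECD figures, and `to_rows` is a hypothetical helper, not the implementation of `to_table`.

```python
from itertools import product


def to_rows(dims, values, order=None):
    """Flatten a row-major value list into rows, visiting dims in `order`.

    dims:  list of (name, [categories]) in storage order.
    order: dimension names from slowest- to fastest-varying
           (defaults to the storage order).
    """
    names = [name for name, _ in dims]
    cats = dict(dims)
    order = order or names
    # Stride of each dimension inside the flat, row-major value list.
    strides, step = {}, 1
    for name in reversed(names):
        strides[name] = step
        step *= len(cats[name])
    rows = []
    for combo in product(*(range(len(cats[n])) for n in order)):
        pos = dict(zip(order, combo))
        offset = sum(pos[n] * strides[n] for n in names)
        # Columns are always emitted in storage order.
        rows.append([cats[n][pos[n]] for n in names] + [values[offset]])
    return rows


# Toy data: two areas, two years.
dims = [("area", ["AU", "AT"]), ("year", ["2003", "2004"])]
values = [1.0, 2.0, 3.0, 4.0]   # AU-2003, AU-2004, AT-2003, AT-2004

default_rows = to_rows(dims, values)
reversed_rows = to_rows(dims, values, order=["year", "area"])
print(default_rows)
print(reversed_rows)
```

With the default order the year varies fastest; with the reversed order the area varies fastest, mirroring the Australia, Austria, Belgium sequence shown above.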

Notebook: using jsonstat.py python library with jsonstat format version 2.

This Jupyter notebook shows the Python library jsonstat.py in action. JSON-stat is a simple, lightweight JSON dissemination format. For more information about the format, see the official site.

This notebook uses the data file oecd-canada-col.json from the json-stat.org site. This file complies with version 2 of jsonstat. The notebook is otherwise identical to the version 1 notebook; the only difference is the data source.
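To make the difference between the two data sources concrete, the sketch below classifies a parsed response. It rests on two assumptions about the format: that a version 2 response carries top-level "version" and "class" properties, and that a version 1 bundle is an object keyed by dataset id. The two documents are tiny hand-written stand-ins, not the real sample files, and `describe_response` is a hypothetical helper rather than a jsonstat.py function.

```python
import json


def describe_response(doc):
    """Classify a parsed JSON-stat response as (kind, dataset_ids)."""
    version = doc.get("version")
    if version is None:
        # No 'version' property: treat the document as a version 1.x
        # bundle whose top-level keys are dataset ids.
        return "1.x bundle", sorted(doc)
    return "{} {}".format(version, doc.get("class")), []


# Hand-written stand-ins for the two kinds of file.
v1 = json.loads('{"oecd": {"dimension": {}, "value": []},'
                ' "canada": {"dimension": {}, "value": []}}')
v2 = json.loads('{"version": "2.0", "class": "collection",'
                ' "link": {"item": []}}')

print(describe_response(v1))  # ('1.x bundle', ['canada', 'oecd'])
print(describe_response(v2))  # ('2.0 collection', [])
```

Under these assumptions a version 1 bundle exposes its datasets by name, whereas a version 2 collection lists them as items, which is consistent with this notebook selecting the dataset by position.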

# all imports here
from __future__ import print_function
import os
import pandas as pd              # pandas is used to convert a jsonstat dataset to a DataFrame
import jsonstat                  # import the jsonstat.py package
import matplotlib.pyplot as plt  # for plotting
%matplotlib inline

Download the file oecd-canada-col.json, or use the cached copy if it exists. Caching the file on disk lets you work offline and speeds up exploration of the data.

url = 'http://json-stat.org/samples/oecd-canada-col.json'
file_name = "oecd-canada-col.json"

file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name)
    file_path = file_name

using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.json-stat.org/oecd-canada-col.json


Initialize JsonStatCollection from the file and print the list of datasets contained in the collection.

collection = jsonstat.from_file(file_path)
collection

Select the first dataset. The oecd dataset has three dimensions (concept, area, year) and contains 432 values.

oecd = collection.dataset(0)
oecd

oecd.dimension('concept')

oecd.dimension('area')

oecd.dimension('year')

The calls above show some detailed info about the dimensions.

Accessing values in the dataset

Print the value in oecd dataset for area = IT and year = 2012

oecd.data(area='IT', year='2012')

JsonStatValue(idx=201, value=10.55546863, status=None)

oecd.value(area='IT', year='2012')

oecd.value(concept='unemployment rate',area='Australia',year='2004') # 5.

oecd.value(concept='UNR',area='AU',year='2004')

Transforming the dataset into a pandas DataFrame

df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()

df_oecd['area'].describe() # area contains 36 values

count     432
unique     36
top        ES
freq       12
Name: area, dtype: object


order = [i.did() for i in oecd.dimensions()]
order = order[::-1]  # reverse list
table = oecd.to_table(order=order)
table[:5]

[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Austria', '2003', 4.278559338],
 ['unemployment rate', 'Belgium', '2003', 8.158333333],
 ['unemployment rate', 'Canada', '2003', 7.594616751]]

Notebook: using jsonstat.py with eurostat api

This Jupyter notebook shows the Python library jsonstat.py in action. It shows how to explore datasets downloaded from a data provider. This notebook uses some datasets from Eurostat. Eurostat provides a REST API to download its datasets. You can find details about the API here. It is possible to use a query builder to discover the REST API parameters. The following image shows the query builder:

# all imports here
from __future__ import print_function
import os
import pandas as pd
import jsonstat
import matplotlib.pyplot as plt
%matplotlib inline

1 - Exploring data with one dimension (time) with size > 1

The following cell downloads a dataset from Eurostat. If the file has already been downloaded, the copy present on disk is used. Caching the file avoids downloading the dataset every time the notebook runs; it speeds up development and provides consistent results. You can see the raw data here.

url_1 = 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/nama_gdp_c?precision=1&geo=IT&unit=EUR_HAB&indic_na=B1GM'
file_name_1 = "eurostat-name_gpd_c-geo_IT.json"

file_path_1 = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.ec.europa.eu_eurostat", file_name_1))
if os.path.exists(file_path_1):
    print("using already downloaded file {}".format(file_path_1))
else:
    print("download file")
    jsonstat.download(url_1, file_name_1)
    file_path_1 = file_name_1

using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.ec.europa.eu_eurostat/eurostat-name_gpd_c-geo_IT.json
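The query string used above is an ordinary list of key=value filters, with a dimension repeated when several categories are requested (geo=IT&geo=FR in the second part of this notebook). As a sketch, such a URL can be assembled with the standard library; `eurostat_url` is a hypothetical helper, and the base address is simply the one that appears in this notebook.

```python
from urllib.parse import urlencode

# Base address copied from the URL used in this notebook.
BASE = "http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/"


def eurostat_url(dataset, **filters):
    """Build a query URL; list values become repeated parameters."""
    return BASE + dataset + "?" + urlencode(filters, doseq=True)


one_country = eurostat_url("nama_gdp_c", precision=1, geo="IT",
                           unit="EUR_HAB", indic_na="B1GM")
two_countries = eurostat_url("nama_gdp_c", precision=1, geo=["IT", "FR"],
                             unit="EUR_HAB", indic_na="B1GM")
print(one_country)
print(two_countries)
```

Passing a list for geo is what produces the repeated parameter, so the same helper covers both the one-country and the two-country request.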

Initialize JsonStatCollection with Eurostat data and print some info about the collection.

collection_1 = jsonstat.from_file(file_path_1)
collection_1

The previous collection contains only one dataset, named 'nama_gdp_c'.

nama_gdp_c_1 = collection_1.dataset('nama_gdp_c')
nama_gdp_c_1

All dimensions of the dataset 'nama_gdp_c' have size 1, with the exception of the time dimension. Let's explore the time dimension.

nama_gdp_c_1.dimension('time')

Get value for year 2012.

nama_gdp_c_1.value(time='2012')

Convert the jsonstat data into a pandas dataframe.

df_1 = nama_gdp_c_1.to_data_frame('time', content='id')
df_1.tail()

Adding a simple plot

df_1 = df_1.dropna()  # remove rows with NaN values
df_1.plot(grid=True, figsize=(20,5))

2 - Exploring data with two dimensions (geo, time) with size > 1

Download the jsonstat file, or use the copy cached on disk. The cache avoids an internet download during development and makes things a bit faster. You can see the raw data here.

url_2 = 'http://ec.europa.eu/eurostat/wdds/rest/data/v1.1/json/en/nama_gdp_c?precision=1&geo=IT&geo=FR&unit=EUR_HAB&indic_na=B1GM'
file_name_2 = "eurostat-name_gpd_c-geo_IT_FR.json"

file_path_2 = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.ec.europa.eu_eurostat", file_name_2))
if os.path.exists(file_path_2):


df_4 = nama_gdp_c_2.to_data_frame('time', content='id', blocked_dims={'geo':'IT'})
df_4 = df_4.dropna()
df_4.plot(grid=True, figsize=(20,5))
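The blocked_dims argument in the call above fixes one dimension (geo) to a single category so that the remaining dimension (time) can serve as the index. The idea can be sketched without the library as a filter over records. The numbers below are invented, and `records` and `block` are hypothetical helpers, not jsonstat.py functions.

```python
from itertools import product


def records(dims, values):
    """Pair each combination of categories (row-major) with its value."""
    names = [name for name, _ in dims]
    combos = product(*(cats for _, cats in dims))
    return [dict(zip(names, combo), value=v) for combo, v in zip(combos, values)]


def block(recs, free, blocked):
    """Keep records matching `blocked`; return (free-dimension category, value)."""
    return [(r[free], r["value"]) for r in recs
            if all(r[k] == v for k, v in blocked.items())]


# Toy data: two countries, three years (not Eurostat figures).
dims = [("geo", ["FR", "IT"]), ("time", ["2010", "2011", "2012"])]
values = [10, 11, 12, 20, 21, 22]

italy = block(records(dims, values), free="time", blocked={"geo": "IT"})
print(italy)  # [('2010', 20), ('2011', 21), ('2012', 22)]
```

The result is a single time series for Italy, which is the shape a line plot over time needs.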

Notebook: using jsonstat.py to explore ISTAT data (house price index)

This Jupyter notebook shows how to use the jsonstat.py Python library to explore Istat data. Istat is the Italian National Institute of Statistics. It publishes a REST API for querying Italian statistics.

We start by importing some modules.

from __future__ import print_function
import os
import istat
from IPython.core.display import HTML

Step 1: using istat module to get a jsonstat collection

The following code sets a cache directory in which to store the JSON files downloaded through the Istat API. Storing files on disk speeds up development and ensures consistent results over time. You can always delete the files to download a fresh copy.

cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached"))
istat.cache_dir(cache_dir)
print("cache_dir is '{}'".format(istat.cache_dir()))

cache_dir is '/Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tmp/istat_cached'

Using the istat API, we can show the Istat areas used to categorize the datasets.

istat.areas()

The following code lists all datasets contained in the area Prices.

istat_area_prices = istat.area('Prices')
istat_area_prices.datasets()

List all dimensions for dataset DCSP_IPAB (House price index).

istat_dataset_dcsp_ipab = istat_area_prices.dataset('DCSP_IPAB')
istat_dataset_dcsp_ipab

Finally, from the Istat dataset we extract data in jsonstat format by specifying the dimensions we are interested in.

spec = {
    "Territory": 1,
    "Index type": 18,
    "Measure": 0,
    # "Purchases of dwelling": 0,
    # "Time and frequency": 0
}
# convert istat dataset into jsonstat collection and print some info
collection = istat_dataset_dcsp_ipab.getvalues(spec)
collection

The previous call is equivalent to calling the Istat API with the string "1,18,0,0,0". The mapping between the numbers and the dimensions is shown below:

dimension               value   meaning
Territory               1       Italy
Type                    18      house price index (base 2010=100) - quarterly data
Measure                 0       ALL
Purchase of dwelling    0       ALL
Time and frequency      0       ALL

json_stat_data = istat_dataset_dcsp_ipab.getvalues("1,18,0,0,0")
json_stat_data
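The equivalence between the spec dictionary and the "1,18,0,0,0" string can be sketched as follows. This is an illustration only: `spec_to_string` is a hypothetical helper, the dimension names and their order are taken from the spec and the table above, and the rule that an unspecified dimension defaults to 0 (ALL) is an assumption inferred from that example, not documented behaviour of the istat module.

```python
def spec_to_string(dimension_names, spec):
    """Join one number per dimension, using 0 for dimensions missing from spec."""
    unknown = set(spec) - set(dimension_names)
    if unknown:
        raise KeyError("unknown dimensions: {}".format(sorted(unknown)))
    return ",".join(str(spec.get(name, 0)) for name in dimension_names)


# Dimension order assumed from the table above.
names = ["Territory", "Index type", "Measure",
         "Purchases of dwelling", "Time and frequency"]
spec = {"Territory": 1, "Index type": 18, "Measure": 0}

query = spec_to_string(names, spec)
print(query)  # 1,18,0,0,0
```

Rejecting unknown names early is a design choice of this sketch; it turns a misspelled dimension into an immediate error instead of a silently ignored filter.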

Step 2: using jsonstat.py api

Now that we have a jsonstat collection, let's explore it with the jsonstat.py API.

Print some info about one dataset contained in the above jsonstat collection.

jsonstat_dataset = collection.dataset('IDMISURA1*IDTYPPURCH*IDTIME')
jsonstat_dataset

Print info about the dimensions to get an idea about the data

jsonstat_dataset.dimension('IDMISURA1')

jsonstat_dataset.dimension('IDTYPPURCH')

Notebook: using jsonstat.py to explore ISTAT data (unemployment)

from __future__ import print_function
import os
import pandas as pd
from IPython.core.display import HTML
import matplotlib.pyplot as plt
%matplotlib inline

import istat

Using istat api

The next step is to set a cache directory in which to store the JSON files downloaded from Istat. Storing files on disk speeds up development and ensures consistent results over time. If needed, you can delete the downloaded files to get a fresh copy.

cache_dir = os.path.abspath(os.path.join("..", "tmp", "istat_cached"))  # you could choose /tmp
istat.cache_dir(cache_dir)
print("cache_dir is '{}'".format(istat.cache_dir()))

cache_dir is '/Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tmp/istat_cached'

List all istat areas

istat.areas()

List all datasets contained in the area LAB (Labour).

istat_area_lab = istat.area('LAB')
istat_area_lab

List all dimensions for dataset DCCV_TAXDISOCCU (Unemployment rate).

istat_dataset_taxdisoccu = istat_area_lab.dataset('DCCV_TAXDISOCCU')
istat_dataset_taxdisoccu

Extract data from dataset DCCV_TAXDISOCCU

spec = {
    "Territory": 0,                               # 1 Italy
    "Data type": 6,                               # 6: 'unemployment rate'
    'Measure': 1,                                 # 1: 'percentage values'
    'Gender': 3,                                  # 3: total
    'Age class': 31,                              # 31: '15-74 years'
    'Highest level of education attained': 12,    # 12: 'total'
    'Citizenship': 3,                             # 3: 'total'
    'Duration of unemployment': 3,                # 3: 'total'
    'Time and frequency': 0                       # All
}
# convert istat dataset into jsonstat collection and print some info
collection = istat_dataset_taxdisoccu.getvalues(spec)
collection

Print some info about the only dataset contained in the above jsonstat collection.


jsonstat_dataset = collection.dataset(0)
jsonstat_dataset

df_all = jsonstat_dataset.to_table(rtype=pd.DataFrame)
df_all.head()

df_all.pivot(index='Territory', columns='Time and frequency', values='Value').head()

spec = {
    "Territory": 1,                               # 1: Italy
    "Data type": 6,                               # 6: 'unemployment rate'
    'Measure': 1,
    'Gender': 3,
    'Age class': 0,                               # all classes
    'Highest level of education attained': 12,    # 12: 'total'
    'Citizenship': 3,                             # 3: 'total'
    'Duration of unemployment': 3,                # 3: 'total'
    'Time and frequency': 0                       # All
}
# convert istat dataset into jsonstat collection and print some info
collection_2 = istat_dataset_taxdisoccu.getvalues(spec)
collection_2

df = collection_2.dataset(0).to_table(rtype=pd.DataFrame, blocked_dims={'IDCLASETA28': '31'})
df.head(6)

df = df.dropna()
df = df[df['Time and frequency'].str.contains(r'^Q.*')]

df = df.set_index('Time and frequency')

df.head(6)

df.plot(x='Time and frequency',y='Value', figsize=(18,4))

fig = plt.figure(figsize=(18,6))
ax = fig.add_subplot(111)
plt.grid(True)
df.plot(x='Time and frequency', y='Value', ax=ax, grid=True)

kind='barh', , alpha=a, legend=False, color=customcmap,

edgecolor='w', xlim=(0,max(df['population'])), title=ttl)
