Machine Learning for Data Analysis, Lecture notes of Machine Learning

University of Pennsylvania (UPenn), Machine Learning

These lecture notes cover the role of data in economics and the importance of causality, and give examples of unsupervised learning and clustering. They were written by Jesús Fernández-Villaverde and Galo Nuño and are dated September 1, 2022.

Typology: Lecture notes

2022/2023

Uploaded on 05/11/2023

Machine Learning for Data Analysis

Jesús Fernández-Villaverde^1 and Galo Nuño^2

September 1, 2022

(^1) University of Pennsylvania

(^2) Banco de España


New data

Most important lesson for economists from data science: Everything is data.
Unstructured data: newspaper articles, business reports, congressional speeches, FOMC meeting transcripts, satellite data, photographs, audio, mobility data, ...

Parish and probate data

Satellite imagery

Cell use

TABLE I. Most partisan phrases from the 2005 Congressional Record
Panel A: Phrases used more often by Democrats

Two-word phrases: private accounts; Rosa Parks; workers rights; trade agreement; President budget; poor people; American people; Republican party; Republican leader; tax breaks; change the rules; Arctic refuge; trade deficit; minimum wage; cut funding; oil companies; budget deficit; American workers; credit card; Republican senators; living in poverty; nuclear option; privatization plan; Senate Republicans; war in Iraq; wildlife refuge; fuel efficiency; middle class; card companies; national wildlife.

Three-word phrases: veterans health care; corporation for public broadcasting; cut health care; congressional black caucus; civil rights movement; VA health care; additional tax cuts; cuts to child support; billion in tax cuts; pay for tax cuts; drilling in the Arctic National; credit card companies; tax cuts for people; victims of gun violence; security trust fund; oil and gas companies; solvency of social security; social security trust; prescription drug bill; Voting Rights Act; privatize social security; caliber sniper rifles; war in Iraq and Afghanistan; American free trade; increase in the minimum wage; civil rights protections; central American free; system of checks and balances; credit card debt; middle class families.

(Continues)

FIGURE 1. Language-based and reader-submitted ratings of slant. The slant index (y axis) is shown against the average Mondo Times user rating of newspaper conservativeness (x axis), which ranges from 1 (liberal) to 5 (conservative). Included are all papers rated by at least two users on Mondo Times, with at least 25,000 mentions of our 1000 phrases in 2005. The line is predicted slant from an OLS regression of slant on Mondo Times rating. The correlation coefficient is 0.40 (p = 0.0114).

(^8) We wish to thank Eric Kallgren of Mondo Code for graciously providing these data.

Economics and machine learning II

A more general point ⇒ role of causality in economics:
1. Counterfactuals.
2. Welfare.
3. General equilibrium effects.
4. New changes.
5. Less data.
Another example by Athey (2017): hotel prices and occupancy rates. In the data, prices and occupancy rates are strongly positively correlated, but what is the expected impact of a hotel raising its prices on a given day?
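
The hotel example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the demand shock, all coefficients, and the function names `ols_slope` and `simulate_hotel` are hypothetical and do not come from Athey (2017). An unobserved demand shock raises both prices and occupancy, so the observational slope is positive even though the assumed causal effect of price on occupancy is negative.

```python
import random

def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def simulate_hotel(n_days=5000, causal_effect=-0.2, randomized=False, seed=0):
    """Simulate daily prices and occupancy rates (all numbers are invented)."""
    rng = random.Random(seed)
    prices, occupancy = [], []
    for _ in range(n_days):
        demand = rng.gauss(0.0, 1.0)  # demand shock, unobserved by the analyst
        if randomized:
            # Experiment: the price is set independently of demand.
            price = 100 + rng.gauss(0.0, 20.0)
        else:
            # Observational data: the hotel charges more on busy days.
            price = 100 + 20 * demand + rng.gauss(0.0, 5.0)
        occ = 70 + 8 * demand + causal_effect * (price - 100) + rng.gauss(0.0, 2.0)
        prices.append(price)
        occupancy.append(occ)
    return prices, occupancy

observed = ols_slope(*simulate_hotel(randomized=False))
experiment = ols_slope(*simulate_hotel(randomized=True))
print(f"slope in observational data:  {observed:.2f}")
print(f"slope with randomized prices: {experiment:.2f}")
```

In this stylized setup the observational slope is positive, while the slope under randomized prices is close to the assumed causal effect of -0.2. A predictive model fit to the observational data would answer the counterfactual question with the wrong sign.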

Unsupervised learning

Use a sample $D = \{x_i\}_{i=1}^{N}$ to:

Group observations in interesting patterns.
Describe most important sources of variation in the data.
Dimensionality reduction.

Example: what can we learn about the loan book of a bank without imposing too much a priori structure?
More concretely, we search for: $p(x_i \mid \theta)$
Clustering and association rules.

Cluster discovering

Select the number of clusters $K$:

$$K^* = \arg\max_K \; p(K \mid D)$$

Assign each observation to a cluster:

$$z_i^* = \arg\max_k \; p(z_i = k \mid x_i, D)$$

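A common method to pin down $K$ is the silhouette. For each observation $i$, we compute:

$$s_i = \frac{b(i) - a(i)}{\max(a(i), b(i))}$$

where $a(i)$ is the average distance between $i$ and all other members of its own cluster, while $b(i)$ is the smallest average distance between $i$ and the members of any other cluster. Values of $s_i$ close to 1 indicate a well-assigned observation, and the $K$ with the highest average silhouette is preferred.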
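
As a rough sketch, the silhouette can be computed directly from its definition. The code below assumes Euclidean distance and the usual convention that b(i) is the smallest mean distance from i to the members of another cluster; the function names `silhouette_scores` and `mean_silhouette`, and the choice to score singleton clusters as 0, are illustrative assumptions rather than part of the notes.

```python
import math

def silhouette_scores(points, labels):
    """Return s_i = (b(i) - a(i)) / max(a(i), b(i)) for every observation."""
    scores = []
    for i, (p, label_i) in enumerate(zip(points, labels)):
        # Collect distances from i to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, label_j) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(label_j, []).append(math.dist(p, q))
        own = by_cluster.pop(label_i, [])
        if not own or not by_cluster:
            scores.append(0.0)  # assumed convention: singleton or single cluster
            continue
        a = sum(own) / len(own)                                 # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def mean_silhouette(points, labels):
    """Average silhouette, the quantity compared across candidate values of K."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(mean_silhouette(pts, [0, 0, 1, 1]))  # sensible grouping: close to 1
print(mean_silhouette(pts, [0, 1, 0, 1]))  # poor grouping: negative
```

To choose K in practice, one would cluster the data for several candidate values of K and keep the one with the highest mean silhouette.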

K-means

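K-means clustering, due to Steinhaus (1957), partitions the data into sets $S = \{S_1, \dots, S_K\}$, where $\mu_i$ is the mean of the points in $S_i$, by solving:

$$\arg\min_S \sum_{i=1}^{K} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

Implementation requires an iterative algorithm (Lloyd, 1957).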
Related variations:
1. k-medians ⇒ uses medians, computed under taxicab ($L_1$) geometry.
2. k-medoids ⇒ minimizes a sum of pairwise dissimilarities.
3. k-SVD.
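
A minimal sketch of Lloyd's iteration for K-means follows. It alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its members, which never increases the within-cluster sum of squared distances. The function names `lloyd_kmeans` and `within_cluster_ss`, the explicit `init_centroids` argument, and the rule of keeping an empty cluster's old centroid are assumptions made for this illustration.

```python
import math

def lloyd_kmeans(points, init_centroids, n_iter=100):
    """Lloyd's algorithm from user-supplied initial centroids (tuples)."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: a local optimum
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumed rule: an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

def within_cluster_ss(points, centroids, labels):
    """Objective: sum of squared distances from each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = lloyd_kmeans(data, init_centroids=[data[0], data[3]])
print(centroids, labels, within_cluster_ss(data, centroids, labels))
```

The result depends on the starting centroids, since the iteration only reaches a local optimum. Common practice is to run it from several initializations and keep the solution with the lowest objective.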

Other algorithms

Other clustering methods:
1. Agglomerative clustering.
2. DBSCAN.
3. BIRCH.
Principal component analysis.
Density estimation.
Gaussian mixture models.
Association rules and the Apriori algorithm (Agrawal and Srikant, 1994).
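
For the last item, a compact sketch of the frequent-itemset stage of the Apriori algorithm is given below. It relies on the property that every subset of a frequent itemset must itself be frequent, so candidates of size k are built only from frequent itemsets of size k-1. The function name `apriori`, the use of absolute support counts, and the toy baskets are illustrative assumptions, and the rule-generation stage is omitted.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent_sets = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Association rules such as "bread implies milk" would then be read off the frequent itemsets by comparing the support of an itemset with the support of its subsets.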
