Homework Assignment 6: Nice Demo City, But Will It Scale?

36-402, Advanced Data Analysis, Spring 2011

SOLUTIONS

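1. Answer: Taking the log of both sides of $y = Y/N$, and using the power-law model $Y \approx cN^b$, gives

$$
\begin{aligned}
\log y &= \log \frac{Y}{N} = \log Y - \log N \\
&\approx \log(cN^b) - \log N \\
&= \log c + \log N^b - \log N \\
&= \log c + b \log N - \log N \\
&= \beta_0 + \beta_1 \log N
\end{aligned}
$$

where we have taken $\beta_0 = \log c$ and $\beta_1 = b - 1$.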
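As a quick numerical sanity check of this algebra, the sketch below picks hypothetical values for c, b, and a few populations (they are not estimates from the data) and confirms that log(Y/N), with Y = cN^b exactly, equals β0 + β1 log N with β0 = log c and β1 = b − 1.

```r
# Hypothetical constants chosen only for illustration (not estimates from the data)
c.demo = 6000
b.demo = 1.1
N.demo = c(5e4, 1e6, 2e7)            # hypothetical city populations

Y.demo = c.demo * N.demo^b.demo      # total output under an exact power law
log.y.demo = log(Y.demo / N.demo)    # log per-capita output

beta0 = log(c.demo)                  # intercept implied by part (1)
beta1 = b.demo - 1                   # slope implied by part (1)

# Both sides should agree up to floating-point error
all.equal(log.y.demo, beta0 + beta1 * log(N.demo))
```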

2. Answer:

```r
#### Problem 2
# Get the data:
gmp.data = read.csv(file = "http://www.stat.cmu.edu/~cshalizi/402/hw/06/gmp_2006.csv")
pcgmp.data = read.csv(file = "http://www.stat.cmu.edu/~cshalizi/402/hw/06/pcgmp_2006.csv")
# Using summary(gmp.data) shows that X2006 is the column of interest
# Check that both data sets are ordered the same way (this should print 0):
sum(gmp.data$Metropolitan != pcgmp.data$Metropolitan)
# Population N, in millions: divide GMP by GMP per capita
N = gmp.data$X2006/pcgmp.data$X2006
# Convert N to persons (GMP is recorded in millions of dollars)
N = N*1000000
# Observe that the result matches what Cosma says it should:
summary(N)
# Compute variance of log y
y = pcgmp.data$X2006 # same as Y/N
```

```r
log.y = log(y)
var(log.y)
## [1] 0.
```
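The course data URLs may not remain available, so here is a self-contained sketch of the same population computation on a tiny hypothetical table (the city labels and all numbers are invented). Since GMP is in millions of dollars, GMP divided by per-capita GMP gives population in millions, and multiplying by 10^6 gives persons.

```r
# Hypothetical miniature versions of the two tables (invented values)
gmp.demo = data.frame(Metropolitan = c("A", "B", "C"),
                      X2006 = c(1500, 30000, 450000),    # GMP, millions of dollars
                      stringsAsFactors = FALSE)
pcgmp.demo = data.frame(Metropolitan = c("A", "B", "C"),
                        X2006 = c(30000, 40000, 50000),  # GMP per capita, dollars
                        stringsAsFactors = FALSE)

# Rows must refer to the same cities in the same order
stopifnot(all(gmp.demo$Metropolitan == pcgmp.demo$Metropolitan))

# Millions of dollars / (dollars per person) = millions of people
N.demo = gmp.demo$X2006 / pcgmp.demo$X2006
N.demo = N.demo * 1e6                                    # convert to persons

# Variance of log per-capita GMP, as in the solution
log.y.demo = log(pcgmp.demo$X2006)
var(log.y.demo)
```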

3. Answer:

```r
# Compute log population:
log.N = log(N)
# Fit a linear model
lm.3 = lm(log.y ~ log.N)
# The output of summary(lm.3) includes the following:
## Coefficients:
##             Estimate
## (Intercept) 8.
## log.N       0.
```

The first thing to notice is that the adjusted R-squared value is only about 0.24 (from the model summary). This means that the model explains only about a quarter of the variability in log per-capita GMP. That alone does not show the model is far from optimal, however: unless we have reason to think that the single predictor, log population, carries enough information to explain much more than 24% of the variability, the model may still be close to the best achievable with it.
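The adjusted R-squared reported by summary() can be reproduced by hand. The sketch below uses simulated data (the sample size, coefficients, and noise level are assumptions for illustration only), computes R² = 1 − RSS/TSS and the adjustment 1 − (1 − R²)(n − 1)/(n − p − 1) with p = 1 predictor, and compares both with what summary() reports.

```r
set.seed(402)                                   # arbitrary seed for reproducibility
n = 100
log.N.sim = rnorm(n, mean = 13, sd = 1)         # simulated log populations
log.y.sim = 8.8 + 0.12 * log.N.sim + rnorm(n, sd = 0.2)

fit = lm(log.y.sim ~ log.N.sim)

rss = sum(residuals(fit)^2)                     # residual sum of squares
tss = sum((log.y.sim - mean(log.y.sim))^2)      # total sum of squares
r2 = 1 - rss / tss
p = 1                                           # predictors, excluding the intercept
adj.r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(by.hand = adj.r2, from.summary = summary(fit)$adj.r.squared)
```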

The model summary gives an estimate of the intercept $\hat\beta_0 = 8.80$ and the log.N coefficient $\hat\beta_1 = 0.12$. Using the expressions at the end of part (1), we can generate the corresponding estimates of $c$ and $b$ for the equation $Y \approx cN^b$, which motivates the log-linear model. We get $\hat c = \exp(\hat\beta_0) \approx 6610$ and $\hat b = \hat\beta_1 + 1 = 1.12$. There is nothing obvious here to suggest that the model $Y \approx 6610\,N^{1.12}$ is not a reasonable approximation.
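The back-transformation from part (1) takes two lines on the fitted coefficients. The sketch below demonstrates it on noise-free data generated from a hypothetical power law with c = 6610 and b = 1.12, so the fit should recover those constants up to floating-point error; with the real fit, the same two lines would be applied to coef(lm.3).

```r
# Noise-free data from a hypothetical power law Y = c * N^b
N.demo = c(1e5, 5e5, 2e6, 1e7)
Y.demo = 6610 * N.demo^1.12
y.demo = Y.demo / N.demo                        # per-capita output

fit.demo = lm(log(y.demo) ~ log(N.demo))
beta.hat = coef(fit.demo)

c.hat = exp(unname(beta.hat[1]))                # c = exp(beta0)
b.hat = unname(beta.hat[2]) + 1                 # b = beta1 + 1
c(c.hat = c.hat, b.hat = b.hat)
```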

Finally, we compute the MSE for the model in the log scale:

```r
mean(lm.3$residuals^2)
## [1] 0.
```
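The in-sample MSE on the log scale is simply the average squared residual. As a sketch on simulated data (all values assumed), the same number can be obtained from the fitted values directly, or from the residual standard error in summary() after rescaling by (n − 2)/n. Note that this is an in-sample figure and will typically understate out-of-sample error.

```r
set.seed(36402)                                 # arbitrary seed
n = 50
log.N.sim = rnorm(n, mean = 13, sd = 1)         # simulated log populations
log.y.sim = 8.8 + 0.12 * log.N.sim + rnorm(n, sd = 0.2)
fit = lm(log.y.sim ~ log.N.sim)

mse.resid = mean(fit$residuals^2)               # as in the solution
mse.fitted = mean((log.y.sim - fitted(fit))^2)  # same quantity, computed explicitly
# summary()'s sigma^2 is RSS/(n - 2); rescale to RSS/n
mse.sigma = summary(fit)$sigma^2 * (n - 2) / n
c(mse.resid, mse.fitted, mse.sigma)
```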

4. Answer:

```r
plot(N, y, main = "The fitted power-law curve",
     cex = 0.4, lwd = 2, pch = 16)
# The fitted values are the exponentials of the log fit:
y.fit = exp(fitted(lm.3))
```
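One way to overlay the fitted curve on this scatterplot is with lines(). The cities are not sorted by population, so the fitted values need ordering by N first, or the line will zig-zag across the plot. The sketch below builds stand-ins for N, y, and lm.3 from simulated data (all values assumed, names suffixed .sim to mark them as hypothetical) so that it runs on its own.

```r
set.seed(2011)                                              # arbitrary seed
N.sim = exp(rnorm(80, mean = 13, sd = 1))                   # simulated populations
y.sim = 6610 * N.sim^0.12 * exp(rnorm(80, sd = 0.2))        # simulated per-capita output
fit.sim = lm(log(y.sim) ~ log(N.sim))

plot(N.sim, y.sim, main = "The fitted power-law curve",
     cex = 0.4, lwd = 2, pch = 16)
y.fit.sim = exp(fitted(fit.sim))                            # back-transform the log fit
ord = order(N.sim)                                          # sort so lines() draws a curve
lines(N.sim[ord], y.fit.sim[ord], col = "red", lwd = 2)
```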