

Material Type: Project; Class: STATISTICAL INFERENCE; Subject: Statistics; University: University of Pennsylvania; Term: Fall 2003;
Typology: Study Guides, Projects, Research


Stat 431, Fall 2003, Due Nov 4 (in class) First Project (ANOVA)
Use this page as the cover page for your project. Staple it to additional pages with answers and JMPIN output as necessary.
The data for this project is located on the class website: www-stat.wharton.upenn.edu/~lzhao.
For Problem 1 use the data entitled “Philadelphia (Suburban)”. (This is a large portion of the data set that is being used on the handout in class to illustrate linear regression. The complete dataset is also on our website, labeled as “Philadelphia (all)”.)
For Problem 2 use the data set “Call Center Arrivals”.
For all questions include the relevant part of the JMP output – if any. If you include additional JMP output be sure to describe, circle, or otherwise indicate the part of the output that is relevant to your answer. (You may answer the questions directly on the JMP printout, if that is most convenient, but be sure your answers are clearly indicated and easy to read.)
[P.S. A convenient way to print out JMP tables is as follows: When the window with the table or plot is open on the computer screen, click on “Edit → Journal” on the menu bar. This creates a “Journal” in JMP that contains this table or plot. When the Journal window is on the screen, you may then go to “File → Save” to save it. Be sure to save it as an RTF or HTML file. You can then open and edit that file using a word processor such as WORD.]
a. Perform the usual overall F-test. What is the P-value for this test? What null hypothesis is it testing? What is the conclusion from this test concerning the mean community-average house price among the Philadelphia counties?
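As a rough cross-check on the JMP output for question a, the overall F statistic can be computed by hand. The sketch below uses only the Python standard library. The group labels and values are made-up placeholders, not the course data, and the function name `one_way_anova` is hypothetical. The statistic tests the null hypothesis that all group means are equal; the P-value would come from the F distribution with the returned degrees of freedom, which JMP reports directly.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict mapping label -> list of values."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)

    # Between-group sum of squares: group size times squared deviation of each group mean.
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical toy data (NOT the Philadelphia data set).
toy = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 6, 7]}
f_stat, df1, df2 = one_way_anova(toy)
print(f"F = {f_stat:.2f} on ({df1}, {df2}) degrees of freedom")
```

A large F relative to the F distribution on those degrees of freedom is evidence against equal means; it does not by itself say which groups differ.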
b. A friend of mine was considering buying a house in one of the counties outside of Philadelphia. Consequently he looked at this data and observed that among those counties Montgomery had the highest mean price and Delaware had the lowest. For this reason he looked at the usual t-test of the difference in mean price between Montgomery and Delaware, and concluded that this difference was statistically significant at the 0.10 level. Show the analysis he performed, and comment on whether his conclusion was justified.
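One point worth weighing in question b is that the two counties were picked after looking at the data. The simulation below is a hedged illustration of why that can matter, not the analysis the question asks for. It draws hypothetical groups with identical true means, picks the groups with the highest and lowest sample means, and applies a pairwise t-type comparison at a nominal 0.10 level. The number of groups (4), the group size (30), the normal data, and the use of a normal critical value in place of the exact t quantile are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean


def max_vs_min_rejection_rate(k=4, n=30, n_sims=2000, alpha=0.10, seed=0):
    """Estimate how often 'highest vs lowest' looks significant when all true means are equal."""
    rng = random.Random(seed)
    # Two-sided critical value; the normal quantile approximates the t quantile for large df.
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        means = [mean(g) for g in groups]
        # Pooled within-group variance (the ANOVA mean squared error).
        ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
        mse = ss_within / (k * n - k)
        # Compare only the two groups selected AFTER seeing the data.
        t_stat = (max(means) - min(means)) / (mse * 2 / n) ** 0.5
        if t_stat > crit:
            rejections += 1
    return rejections / n_sims


rate = max_vs_min_rejection_rate()
print(f"Nominal level 0.10, simulated rejection rate: {rate:.3f}")
```

If the simulated rate sits well above the nominal level, a multiple-comparison procedure such as Tukey's HSD is one standard alternative to the ordinary pairwise t-test.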
c. Investigate whether the standard assumptions for an ANOVA are justified. There are fairly clear indications that suggest using a transformation of the data. What are they? [Provide the usual diagnostic plots, and comment on them. Note: To get a normal quantile plot of residuals in JMP you need to first save the residuals from the “Fit Y by X” or “Fit Model” platforms. The residuals will appear as a column in your data table, and you can work from there using the “Analyze → Distribution” command.]
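To make the diagnostics in question c concrete, the sketch below computes ANOVA residuals (each observation minus its own group mean), the per-group standard deviations, and the coordinates of a normal quantile plot. It is an illustration under assumptions: the toy data are hypothetical, the function names are invented here, and the plotting positions (i - 0.5)/n are one common convention that may differ from what JMP uses.

```python
from statistics import NormalDist, mean, stdev


def anova_residuals(groups):
    """Residual = observation minus its own group mean."""
    return [y - mean(ys) for ys in groups.values() for y in ys]


def normal_quantile_points(residuals):
    """Pair each sorted residual with a theoretical standard-normal quantile."""
    n = len(residuals)
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(NormalDist().inv_cdf(p), r) for p, r in zip(probs, sorted(residuals))]


# Hypothetical toy data (NOT the course data): spread grows with the group mean.
toy = {"low": [90, 100, 110], "high": [800, 1000, 1200]}
resid = anova_residuals(toy)
points = normal_quantile_points(resid)
spreads = {name: stdev(ys) for name, ys in toy.items()}

print(spreads)  # unequal spreads across groups hint at non-constant variance
for q, r in points:
    print(f"{q:7.3f} {r:8.1f}")
```

Points that bend away from a straight line in the quantile plot, or group spreads that grow with the group mean, are typical signs that a transformation may help.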
d. Justified or not, the statistician decided to transform to Log(house Price) and redo the analysis. Perform the F-test for the transformed data (as in question a above). Also perform additional tests to identify significant differences between community house prices (as in question b above). Do your conclusions qualitatively agree with those you found in questions a and b? [To do this you need to create a new column variable with the formula (property) “log”. There are both menu methods and double-click methods to create new columns.]
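The effect of the log transformation in question d can be seen on a small made-up example. The sketch below assumes the natural logarithm (the base only rescales the values and leaves the F statistic unchanged) and uses hypothetical prices whose spread is proportional to their level; none of the numbers come from the course data.

```python
import math
from statistics import stdev

# Hypothetical prices (NOT the course data) whose spread is proportional to their level.
prices = {"low": [100.0, 200.0, 400.0], "high": [1000.0, 2000.0, 4000.0]}

# Natural log of every price, group by group.
log_prices = {name: [math.log(p) for p in ps] for name, ps in prices.items()}

# Compare within-group standard deviations before and after the transformation.
sd_raw = {name: stdev(ps) for name, ps in prices.items()}
sd_log = {name: stdev(ps) for name, ps in log_prices.items()}

print("raw SDs:", sd_raw)
print("log SDs:", sd_log)
```

On the raw scale the two groups have very different standard deviations; on the log scale they match, which is the equal-variance condition the ANOVA relies on.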
e. Do the standard assumptions for validity appear to be reasonably well satisfied in the transformed model? [Provide the usual diagnostic plots, and comment on them.]
f. My friend would like to predict the community house price in a randomly chosen Delaware County community. He has (randomly) chosen a community in Delaware County and would like an interval of values that has a 0.95 probability of containing that community’s community-average house price. Find such an interval. [Note: To most conveniently answer this question use the “Save Columns” option in the “Fit Model” platform instead of the “Fit Y by X” platform. Also the “Fit Model” platform uses the word “individual” where we would use the word “prediction”.]
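For question f, one standard form of a 95% prediction interval for a new observation from group i in a one-way ANOVA is sketched below. The notation is assumed rather than taken from the course: \bar{y}_i is the sample mean of group i, n_i its sample size, N the total number of observations, k the number of groups, and s^2 the mean squared error.

```latex
\bar{y}_i \;\pm\; t_{0.975,\,N-k}\; s\,\sqrt{1 + \frac{1}{n_i}},
\qquad
s^2 = \frac{1}{N-k}\sum_{j=1}^{k}\sum_{m=1}^{n_j}\left(y_{jm} - \bar{y}_j\right)^2
```

The 1 under the square root accounts for the variability of the new community itself; dropping it would give a confidence interval for the group mean instead. If the interval is computed on the log scale, exponentiating both endpoints returns it to the price scale.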