Purdue University Spring 2009
Statistics 512: Applied Regression Analysis
Overview
We will cover
• simple linear regression (SLR)
• multiple linear regression (MLR)
• analysis of variance (ANOVA)
January 12, 2009
Emphasis will be placed on using selected practical tools (such as SAS) rather than on mathematical manipulations. We want to understand the theory so that we can apply it appropriately. Some of the material on SLR will be review, but our goal with SLR is to be able to generalize the methods to MLR.
Course Information
Class: Section 3, MWF 2:30-3:20pm at REC 121
Text: Applied Linear Statistical Models, 5th edition, by Kutner, Neter, Nachtsheim, and Li.
Recommended: Applied Statistics and the SAS Programming Language, 5th edition, by Cody and Smith.
Professor: Dabao Zhang, MATH 534.
Office Hours: MW 3:30pm-4:30pm or by appointment, or phone (46046) or e-mail [email protected]
Evaluation: Problem sets will be assigned (more or less) weekly. They will typically be due on Friday. Refer to the handout about specific evaluation policies.
Lecture Notes
• Available as MS-Word or PDF
• Usually (hopefully) prepared a week in advance
• Not comprehensive (Be prepared to take notes.)
• One/two chapters per week
• Ask questions if you’re confused
Webpage
http://www.stat.purdue.edu/~zhangdb/stat512/
• Announcements
• Lecture Notes
• Homework Assignments
• Data Sets and SAS files
• General handouts (please see immediately)
  – Course Information
Mailing List
• I will very occasionally send reminders or announcements through e-mail.
Blackboard Vista
• Holds solutions documents
• Monitor grades
• Information restricted to enrolled students
• Discussion groups
• One midterm exam has been scheduled on March 5, 2009 (8-10pm). Please check your schedule and make sure that it works for you. Please notify me one week in advance of any conflict.
• If the lecture viewing schedule is not realistic for homework deadlines, please let me know as soon as possible.
• In class, please try to make sure I hear your question.
• Chatting with your neighbors may disturb others; please be courteous to your classmates.
SAS is the program we will use to perform data analysis for this class. Learning to use SAS will be a large part of the course.
Getting Help with SAS
Several sources for help:
• SAS Help Files (not always best)
• World Wide Web (look up the syntax in your favorite search engine)
• SAS Getting Started (in SAS Files section of class website) and Tutorials
• Statistical Consulting Service
• Wednesday Evening Help Sessions
• Applied Statistics and the SAS Programming Language, 5th edition, by Cody and Smith; most relevant material in Chapters 1, 2, 5, 7, and 9.
• Your instructor
Statistical Consulting Service
Math B5
Hours 10-4 M through F
http://www.stat.purdue.edu/scs/
Off-campus students: If DACS doesn’t work for you, fill out a license agreement online (in SAS folder), mail or fax it to Pro Ed. Disks will be sent to you. I need the license agreements (or notification that you’re sending a license agreement) by the end of the first week of classes.
Evening Computer Labs
• SC 283
• help with SAS for multiple Stat courses
• Hours 7pm-9pm Wednesdays
• starting second week of classes
• staffed with graduate student TA
I will often give examples from SAS in class. The programs used in lecture (and any other programs you should need) will be available for you to download from the website.
I will usually have to edit the output somewhat to get it to fit on the page of notes. You should run the SAS programs yourself to see the real output and experiment with changing the commands to learn how they work. Let me know if you get confused about what is input, output, or my comments. I will tell you the names of all SAS files I use in these notes. If the notes differ from the SAS file, take the SAS file to be correct, since there may be cut-and-paste errors.
There is a tutorial in SAS to help you get started:
Help → Getting Started with SAS Software
You should spend some time before next week getting comfortable with SAS. For today, don’t worry about the detailed syntax of the commands. Just try to get a sense of what is going on.
Example (Price Analysis for Diamond Rings in Singapore)
• Variables
  – response variable: price in Singapore dollars (Y)
  – explanatory variable: weight of diamond in carats (X)
• Goals
  – Create a scatterplot
  – Fit a regression line
  – Predict the price of a sale for a 0.43 carat diamond ring
SAS Data Step
File diamond.sas on website.
One way to input data in SAS is to type or paste it in. In this case, we have a sequence of ordered pairs (weight, price).
data diamonds;
input weight price @@;
cards;
.17 355 .16 328 .17 350 .18 325 .25 642 .16 342 .15 322 .19 485
.21 483 .15 323 .18 462 .28 823 .16 336 .20 498 .23 595 .29 860
.12 223 .26 663 .25 750 .27 720 .18 468 .16 345 .17 352 .16 332
.17 353 .18 438 .17 318 .18 419 .17 346 .15 315 .17 350 .32 918
.32 919 .15 298 .16 339 .16 338 .23 595 .23 553 .17 345 .33 945
.25 655 .35 1086 .18 443 .25 678 .25 675 .15 287 .26 693 .15 316
.43 .
;
data diamonds1;
set diamonds;
if price ne .;
Syntax Notes
• Each line must end with a semi-colon.
• There is no output from this statement, but information does appear in the log window.
Often you will obtain data from an existing SAS file or import it from another file, such as a spreadsheet. Examples showing how to do this will come later.
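For readers who want to see the logic of the two data steps outside SAS, here is a minimal sketch in Python (not SAS, and not part of diamond.sas). It assumes the data arrive as whitespace-separated (weight, price) pairs with a lone period marking a missing value, as in the cards block; the function name read_pairs and the shortened input string are hypothetical, chosen only for illustration.

```python
# Illustration only: Python, not SAS. Names here are hypothetical.
# A few (weight, price) pairs from the data step, plus the missing-price row.
raw = ".17 355 .16 328 .17 350 .18 325 .43 ."

def read_pairs(text):
    """Read consecutive (weight, price) pairs from free-format text,
    roughly what `input weight price @@;` does in the data step."""
    tokens = text.split()
    pairs = []
    for i in range(0, len(tokens) - 1, 2):
        weight = float(tokens[i])
        # SAS writes a missing numeric value as a lone period.
        price = None if tokens[i + 1] == "." else float(tokens[i + 1])
        pairs.append((weight, price))
    return pairs

diamonds = read_pairs(raw)

# Counterpart of `if price ne .;`: keep only rows with an observed price.
diamonds1 = [(w, p) for (w, p) in diamonds if p is not None]

print(len(diamonds), "rows read;", len(diamonds1), "rows with a price")
```

The row with weight 0.43 is read but then dropped, which matches the role of diamonds1 in the notes: the ring whose price we want to predict stays in diamonds and is left out of diamonds1.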
SAS proc print
Now we want to see what the data look like.
proc print data=diamonds; run;
Obs    weight    price
  1      0.17      355
  2      0.16      328
  3      0.17      350
...
 47      0.26      693
 48      0.15      316
 49      0.43        .
We want to plot the data as a scatterplot, using circles to represent data points and adding a curve to see if it looks linear. The symbol statement “v = circle” (v stands for “value”) lets us do this. The symbol statement “i = sm” will add a smooth line using splines (interpolation = smooth). These are options which stay on until you turn them off. In order for the smoothing to work properly, we need to sort the data by the X variable.
proc sort data=diamonds1; by weight;
symbol1 v=circle i=sm70;
title1 'Diamond Ring Price Study';
title2 'Scatter plot of Price vs. Weight with Smoothing Curve';
axis1 label=('Weight (Carats)');
axis2 label=(angle=90 'Price (Singapore $$)');
proc gplot data=diamonds1;
plot price*weight / haxis=axis1 vaxis=axis2;
run;
Now we want to use simple linear regression to fit a line through the data. We use the symbol option “i = rl”, meaning “interpolation = regression line” (that’s an “L”, not a one).
symbol1 v=circle i=rl;
title2 'Scatter plot of Price vs. Weight with Regression Line';
proc gplot data=diamonds1;
plot price*weight / haxis=axis1 vaxis=axis2;
run;
We use proc reg (regression) to estimate a regression line and calculate predicted values and residuals from the straight line. We tell it what the data are, what the model is, and what options we want.
proc reg data=diamonds;
model price=weight/clb p r;
output out=diag p=pred r=resid;
id weight; run;
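As a rough picture of what is being estimated, the least-squares line can be computed by hand. The sketch below is in Python (not SAS) and is only an illustration under stated assumptions: it uses just the first eight (weight, price) pairs from the data step, so its estimates will not match the proc reg output for the full data set, and the function name fit_line is hypothetical. The slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means.

```python
# Illustration only: Python, not SAS. Names here are hypothetical.
# First eight (weight, price) pairs from the data step, not the full data set.
data = [(0.17, 355), (0.16, 328), (0.17, 350), (0.18, 325),
        (0.25, 642), (0.16, 342), (0.15, 322), (0.19, 485)]

def fit_line(pairs):
    """Return (intercept, slope) of the least-squares line through pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    # Sum of cross-products and sum of squares of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

b0, b1 = fit_line(data)

# Predicted values and residuals, the quantities requested by p= and r=.
pred = [b0 + b1 * x for x, _ in data]
resid = [y - p for (_, y), p in zip(data, pred)]

print("intercept:", b0, "slope:", b1)
print("sum of residuals (zero up to rounding):", sum(resid))
```

Using the same fitted coefficients, a prediction for a ring of weight 0.43 would be b0 + b1 * 0.43; with only eight points this is far outside the observed weights, so it should be read as a demonstration of the arithmetic and nothing more.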
Analysis of Variance
                           Sum of      Mean
Source             DF     Squares    Square    F Value
Model
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
                  Parameter    Standard
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept                                               <.0001
weight                                                  <.0001