Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

S-Plus/R Quantitative Programming: Objects, Data Manipulation, and Graphics - Prof. A. Joh, Study notes of Statistics

Miami University - Oxford Statistics

Prof. A. John Bailer

An introduction to quantitative programming in s-plus/r, focusing on objects such as vectors, lists, matrices, and dataframes, as well as reading and summarizing data, and creating high-resolution graphs. It also covers using sas procedures, transforming sas datasets, and combining and managing sas datasets. Examples and references to relevant sas and r documentation.

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009

koofers-user-8oy 🇺🇸

10 documents

1 / 24

This page cannot be seen from the preview

Don't miss anything!

Week 12+ Class Activities

File: week-12PLUS-05dec05.doc

Directory (hp/compaq):

C:\baileraj\Classes\Fall 2005\sta402\handouts

TOPICS IN STATISTICAL PROGRAMMING (varies)

* Introduction to quantitative programming in S-Plus/R

* objects – vectors, lists, matrices, dataframes

* reading data [scan, read.table, sas.get]

* summarizing data sets [mean, var, summary, table]

* graphical displays [plot, pairs, trellis displays]

* writing functions

References:

Venables WW and Ripley BD. 1997. Modern Applied Statistics with S-

Plus. Springer-Verlag.

Venables WW and Ripley BD. 2000. S Programming. Springer-Verlag.

Spector P. 1994. An introduction to S and S-Plus. Duxbury Press

Krause A and Olson M. 1997. The Basics and S and S-Plus.

Springer.

S-Plus for Windows Documentation

http://216.211.131.2/support/doc_splus_win.asp

Getting Started with S-Plus for Windows (GUI stuff)

http://216.211.131.2/support/splus60win/tutorial.pdf

http://www.insightful.com/support/doc_splus_win.asp

R Project downloads, FAQs and manuals http://www.r-project.org/

Past SAS activities and how these might relate to S-Plus/R ideas

week SAS ideas S-Plus/R ideas

Discover Study notes of Statistics Miami University - Oxford

Partial preview of the text

Download S-Plus/R Quantitative Programming: Objects, Data Manipulation, and Graphics - Prof. A. Joh and more Study notes Statistics in PDF only on Docsity!

Week 12+ Class Activities

File: week-12PLUS-05dec05.doc

Directory (hp/compaq): C:\baileraj\Classes\Fall 2005\sta402\handouts TOPICS IN STATISTICAL PROGRAMMING (varies)

Introduction to quantitative programming in S-Plus/R
objects – vectors, lists, matrices, dataframes
reading data [ scan , read.table, sas.get ]
summarizing data sets [ mean, var, summary, table ]
graphical displays [ plot, pairs , trellis displays]
writing functions References: Venables WW and Ripley BD. 1997. Modern Applied Statistics with S- Plus. Springer-Verlag. Venables WW and Ripley BD. 2000. S Programming. Springer-Verlag. Spector P. 1994. An introduction to S and S-Plus. Duxbury Press Krause A and Olson M. 1997. The Basics and S and S-Plus. Springer. S-Plus for Windows Documentation http://216.211.131.2/support/doc_splus_win.asp Getting Started with S-Plus for Windows (GUI stuff) http://216.211.131.2/support/splus60win/tutorial.pdf http://www.insightful.com/support/doc_splus_win.asp R Project downloads, FAQs and manuals http://www.r-project.org/ Past SAS activities and how these might relate to S-Plus/R ideas week SAS ideas S-Plus/R ideas

1 BASIC CONCEPTS

*Review basic concepts of statistical computing and research data management

Introduce SAS data sets
Review the form of SAS Statements and SAS names
Introduce SAS procedures
Review the structure of SAS programs
Describe SAS data libraries and what they can contain
Show documenting SAS programs using comments
Illustrate running SAS programs and basic debugging Introduce language elements Objects in S-Plus/R (scalars, vectors, matrices, dataframes, lists) Object-oriented functionality 2 USING SAS PROCEDURES
Introduce the idea of SAS system options
Briefly review statements that can be used with most procedures (BY, WHERE, TITLE, FOOTNOTE, LABEL, FORMAT)
PROC CONTENTS for describing a data set
PROC PRINT for listing the observations in a data set
PROC CHART and PROC PLOT for producing low resolution graphs
PROC FREQ for one-way frequency tables and n-way cross- tabulations
PROC UNIVARIATE for descriptive statistics and distributional information
PROC MEANS for descriptive statistics
PROC SORT for sorting a data set
SAS documentation and the online help system
options()
titles build into most displays
names(dataframe)
low resolution graphs via printer()
table() – frequency tables
summary()
means(), var(), quantile(), etc.
sort(), index()
?command 3 REPORT WRITING
Introduce the Output Delivery System (ODS) for customizing procedure output
PROC TABULATE for producing nicely-formatted tables No ODS analog although can save graphics in a host of different formats 4 AN INTRODUCTION TO STATISTICAL MODELING
PROC REG for linear modeling (a very basic introduction)
PROC GLM for anova models lm (linear models), glm (generalized linear models) 5 HIGH-RESOLUTION GRAPHICS AND FORMATS
Introduce concepts related to high-resolution graphs
PROC GCHART and PROC GPLOT for producing high- resolution graphs
SAS-supplied formats and PROC FORMAT for user-defined formats plot, pairs, boxplot,

11 SAS/IML Programming

Basic matrix concepts: rows, columns, scalars
matrix operators
subscripting
matrix functions
creating matrices from data sets and vice versa
sample applications matrix manip. part of the language Overview and abbreviated history (from Krause/Olson and R project page) S is a quantitative programming environment developed at AT&T Bell Labs
1976 start
1981 rewrite in C and port to unix
1985 function concept introduced
1987 S-Plus, a supported and extended version of S, started
1989 S-Plus for DOS
1993 S-Plus for Windows
1997 GUI for S-Plus for Windows
ca 1997 R development team working (after R started by Robert Gentleman and Ross Ihaka of the Statistics Department of the University of Auckland R is a language and environment for statistical computing and graphics.
GNU project which is similar to the S language, different implementation of S
provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs out of the box on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux). It also compiles and runs on Windows 9x/NT/2000 and MacOS.

GUI vs. Command Line S-Plus has a Graphical User Interface available (GUI) -- see http://216.211.131.2/support/splus60win/tutorial.pdf for more information We focus on command line interactions with S-Plus/R Getting help

menu
> help() [where “>” is the default command prompt in S/R]
> ?quantile > help(quantile) # for help on specific command > ?”^” > help(“^”) # for help on operator Syntax Assignment (via “<-“ or “_”) TOPICS IN STATISTICAL PROGRAMMING (varies)

writing functions Functions/Operators

HELP .. ?fun.name help(fun.name) args(fun.name) Object-oriented: classes and Methods (into S in 1992) Classes = type of object under consideration Methods = what to do with object Most commonly used Methods: print, plot, summary (others: predict, fitted, resid, coef) Methods implementation Methodname.classname

"model.matrix.lm" "multicomp.lm" "plot.lm" "predict.lm" "print.lm" splus splus nlme3 splus splus "print.summary.lm" "proj.lm" "qqnorm.lm" "residuals.lm" "ssType3.lm" stat splus menu menu menu "step.lm" "summary.lm" "tabPlot.lm" "tabPredict.lm" "tabSummary.lm" Operators

/ ^

(unary minus) %/% integer divide (7 %/% 2 = 3) %% modulo (7 %% 2 = 1) %*% matrix multiplication %o% outer product : sequence Logical operators == != < <= > >= & = elementwise AND vec1 <- c(2,4,8) vec2 <- c(3,9, 27) vec1 <=4 & vec2 < (T T F) AND (T F F) = (T F F) | = elementwise OR vec1 <- c(2,4,8) vec2 <- c(3,9, 27) vec1 <=4 | vec2 < (T T F) OR (T F F) = (T T F) && = Control AND (mode(vec1)==”numeric”) && (min(vec1)>0)) T

|| = Control OR ! = unary not !(vec1 <=4 | vec2 <9) (F F T) Precedence

{} > () > $ > [ [[ > ^ > - > : > %*% %/% %% >

*/ > +- > <. <= >= == != >! > [& && | ||] >

<<- > <- _ ->

Example with functions x1 <- c(5:1, 1:5) x [1] 5 4 3 2 1 1 2 3 4 5 x1== [1] T F F F F F F F F T y1<- c(letters[5:1],LETTERS[1:5]) y [1] "e" "d" "c" "b" "a" "A" "B" "C" "D" "E" y1[x1==5] (1:length(x1))[x1==5] [1] 1 10 seq(along=x1)[x1==5] [1] 1 10 x2<-c(x1,NA) x [1] 5 4 3 2 1 1 2 3 4 5 NA x2[!is.na(x2)] [1] 5 4 3 2 1 1 2 3 4 5 sort(x1) [1] 1 1 2 2 3 3 4 4 5 5 order(x1) [1] 5 6 4 7 3 8 2 9 1 10

[7] 40.679650 44.390966 49.811534 52.

plot(xx,yy) myfit <- lm(yy ~ xx) myfit Call: lm(formula = yy ~ xx) Coefficients: (Intercept) xx 2.658308 5. Degrees of freedom: 10 total; 8 residual Residual standard error: 1. > names(myfit) [1] "coefficients" "residuals" "fitted.values" [4] "effects" "R" "rank" [7] "assign" "df.residual" "contrasts" [10] "terms" "call" > pmin pmax # returns vector of max/min of each position of 2 vecs APPLY Functions apply(obj, dim, function) lapply(list, function) sapply(list,function) tapply(object,indices,function) state.tmp <- as.data.frame(state.x77) attach(state.tmp) inc <- cut(Income,breaks=quantile(Income),include.lowest=T) inc [1] 1 4 3 1 4 4 4 3 4 2 4 2 4 2 3 3 1 1 1 4 3 3 3 1 2 2 2 4 2 4 1 4 1 4 3 1 3 [38] 2 3 1 2 1 2 2 1 3 4 1 2 3 attr(, "levels"): [1] "3098.00+ thru 3992.75" "3992.75+ thru 4519.00" "4519.00+ thru 4813.50" [4] "4813.50+ thru 6315.00" attr(, "class"): [1] "category" tapply(Murder,inc,mean) 3098.00+ thru 3992.75 3992.75+ thru 4519.00 4519.00+ thru 4813. 9.707692 6.191667 5. 4813.50+ thru 6315.

unlist(lapply(split(Murder,inc),mean))

3098.00+ thru 3992.75 3992.75+ thru 4519.00 4519.00+ thru 4813. 9.707692 6.191667 5. 4813.50+ thru 6315.

By functions by(nitrofen$total,nitrofen$conc,function(x) {sqrt(var(x))}) INDICES: x x 3.

INDICES: x x 3.

INDICES: x x 2.

INDICES: x x 5.

INDICES: x x 3. by(nitrofen$total,nitrofen$conc,function(x) {sqrt(var(x))}) INDICES: x x 3.

INDICES: x x 3.

INDICES: x x 2.

INDICES: x x 5.

INDICES: x x 3. lapply(split(nitrofen$total,nitrofen$conc),summary) $"0": Min. 1st Qu. Median Mean 3rd Qu. Max. "24.00" "30.25" "32.50" "31.40" "33.75" "36.00" $"80": Min. 1st Qu. Median Mean 3rd Qu. Max. "26.0" "29.5" "32.5" "31.5" "33.0" "36.0"

*** writing output** cat(object,”\n”) cat(“Another plot?”) ans<- readline() if (ans==”Y”) | (ans==”y”) { expression } *** couple of tricks**

eval(parse(text= character-with-command )) cmd="objects()" cmd [1] "objects()" eval(parse(text=cmd)) [1] ".Last.fixed" ".Last.value" ".Random.seed" "Temp" [5] "ans" "cmd" "imat" "inc" [9] "intensity" "last.dump" "my.df" "my.lst" [13] "myfunc" "new.mat" "nitro.linreg" "nitrofen" [17] "ord.int" "par.old" "pi.est" "pi.est.fcn" [21] "pi.est.fcn2" "plot.any.func" "plot.any.func2" "state.tmp" [25] "tmp" "trt" "x" "x1" [29] "x2" "x3" "x4" "x5" [33] "xmat1" "xmat1.df" "xmat2" "xmat3" [37] "xmat4" "y" "y1" "y2"
deparse(substitute(object-name))

constructs character variable with value “object-name”

paste to build commands xmat C1 C2 C R1 1 4 7 R2 2 5 8 R3 3 6 9 dimnames(xmat1)<- list(paste("Row",1:3,sep=""),paste("Col",1:3,sep="")) xmat Col1 Col2 Col Row1 1 4 7 Row2 2 5 8 Row3 3 6 9

example: MC estimate of PI pi.est.fcn2 <- function(nsims=1400,do.plot=T) { xsim <- runif(nsims) ysim <- runif(nsims) icheck <- as.numeric( ysim <= sqrt(1-xsim^2) ) points.under <- sum(icheck)/nsims if (do.plot==T) { plot(xsim,ysim,type=”n”,pty=”s”) text(xsim,ysim,icheck,col=ifelse(icheck==1,1,2)) xp <- sort(xsim) lines(xp,sqrt(1-xp^2),lwd=3) } piest <- 4points.under se.piest <- 4sqrt(points.under(1-points.under)/nsims) CL <- piest + c(-1,1)qnorm(.975)se.piest list(est.pi = piest, se.est.pi = se.piest, conf.int = CL) } pi.est.fcn2(nsims=4000) example: MC estimate of PI plot.any.func <- function(func,x.low,x.high, ltype=”l”,mtitle=””) { xp <- seq(from=x.low,to=x.high,length=1000) yp <- func(xp) plot(xp,yp,type=ltype, ylab=deparse(substitute(func)), main=mtitle) } plot.any.func(cos,-2pi,2pi,mtitle=”Cosine over -2pi to 2*pi”)

t.based.ci<- function(xxx) { mean(xxx) + c(-1,1)qt(.975,length(xxx)-1) sqrt(var(xxx)/length(xxx)) } one.sample.ci.sim <- function(Nsims=1400,Nobs=15,mu=0) { xmat<- matrix(rnorm(Nsims*Nobs),nrow=Nsims,ncol=Nobs) xres<- t(apply(xmat,1,t.based.ci)) check<- (xres[,1]<=mu)&(mu<=xres[,2]) cover <- sum(check)/Nsims cat(“Normal population:\n Number of observations = “,Nobs, ”\n Number of simulated experiments = “,Nsims, ”\n Coverage Probability = “,cover,”\n\n”) } for (iNobs in c(15,30,45)) { one.sample.ci.sim(Nsims=1400,Nobs=iNobs) } Normal population: Number of observations = 15 Number of simulated experiments = 1400 Coverage Probability = 0. Normal population: Number of observations = 30 Number of simulated experiments = 1400 Coverage Probability = 0. Normal population: Number of observations = 45 Number of simulated experiments = 1400 Coverage Probability = 0.

LAB EXERCISES

Reading data into R Manatee.lst <- scan(what=list(year=0,nboats=0,ndead=0)) 77 447 13. 78 460 21. 79 481 24. 80 498 16. 81 513 24. 82 512 20. 83 526 15.

make a data frame with the Manatee data

Manatee.df <- as.data.frame(Manatee.lst) Manatee.df year nboats ndead 1 77 447 13 2 78 460 21 3 79 481 24 4 80 498 16 5 81 513 24 6 82 512 20 7 83 526 15 8 84 559 34 9 85 585 33 10 86 614 33 11 87 645 39 12 88 675 43 13 89 711 50 14 90 719 47

make a matrix with the Manatee data

Manatee.mat <- as.matrix(Manatee.df) pairs(Manatee.mat)

make a plot or two

plot(Manatee.df$nboats,Manatee.df$ndead)

how about multiple plots on a page

par(mfrow=c(2,2))

fit linear regression

man.fit <- lm(ndead~nboats, data=Manatee.df)

plot 1

plot(ndead~nboats, data=Manatee.df, xlab=”Number of boats registered (1000s)”, ylab=”Number of Manatees killed”, main=”Plot of Manatee Deaths vs. boats registered\n(years 1977- 1990)” )

pairs(state.x91[,c(1,2,3,6,7,8)]) # what does this do? faces(state.x91,labels=state.abb) stars(state.x91,labels=state.abb)

Generate a data set with x = rep(c(0,80,160,240), mu.x = exp(log(30) + .008x - .00015x*x) and Y~Poisson(mu.x). a) Plot randomly generated Y (use rpois) vs. x. b) Generate the linear regression fit of sqrt(y) = b0 + b1 X + b2 X^2 (sqrt transformation is the variance stabilizing transformation for Poisson data). c) Superimpose this fit on a scatterplot of the data.
Modify the plotting function to pass the number of x values you wish to plot. Note that you can also add the ability to pass other non-specified arguments via an ellipsis (...) in the argument list.

plot.any.func2 <- function(func,x.low,x.high, ltype=”l”,mtitle=””,...) { xp <- seq(from=x.low,to=x.high,length=100) yp <- func(xp) plot(xp,yp,type=ltype, ylab=deparse(substitute(func)), main=mtitle,...) }

check out how other parameters are passed in the function

calls below

plot.any.func2(cos,0,3pi,ltype="l",lwd=13) plot.any.func2(cos,0,3pi,ltype=”p”,cex=3) plot.any.func2(cos,0,3*pi,ltype=”p”,pch=6)

Modify the t-based CI function (one.sample.ci.sim) to study samples from uniform and exponential samples.
Import “Roberts Data.xls” (an Excel spreadsheet) into a data frame. Examine variables in this data frame. Generate summary statistics for each numeric variable in this data set. Plot X=Treatment..Hours. Y=Liver.SOD
Import “Manatee” SAS data set into a data frame. Plot number of manatees deaths versus number of boats registered. Generate a linear regression fit relating number of deaths to number of boats registered. Superimpose this fit on the manatee plot. You may want to look at “lm”, “plot”, “lines”, “abline” and “coef” functions.

Q: How do I delete a bunch of similarly named files?

Suppose you have a number of files with “tmp” as part of the name... xtmp<- ytmp <- 111 ztmp <- “more junk” objects(pattern=”tmp”) rm(list=objects(pattern="tmp"))

S-Plus/R Quantitative Programming: Objects, Data Manipulation, and Graphics - Prof. A. Joh, Study notes of Statistics

Related documents

Partial preview of the text

Download S-Plus/R Quantitative Programming: Objects, Data Manipulation, and Graphics - Prof. A. Joh and more Study notes Statistics in PDF only on Docsity!

Week 12+ Class Activities

File: week-12PLUS-05dec05.doc

1 BASIC CONCEPTS

{} > () > $ > [ [[ > ^ > - > : > %*% %/% %% >

*/ > +- > <. <= >= == != >! > [& && | ||] >

<<- > <- _ ->

[7] 40.679650 44.390966 49.811534 52.

LAB EXERCISES

make a data frame with the Manatee data

make a matrix with the Manatee data

make a plot or two

how about multiple plots on a page

fit linear regression

plot 1

check out how other parameters are passed in the function

calls below

Q: How do I delete a bunch of similarly named files?

Q: How do I manipulate multiple graphs?