Statistical Programming - Examination | STA 402, Exams of Statistics

Material Type: Exam; Professor: Bailer; Class: Statistical Programming; Subject: Statistics; University: Miami University-Oxford; Term: Fall (First Sem) 2004;

Roberts Excel spreadsheet imported: CONTENTS
The CONTENTS Procedure
Week 07/08 [13+ Oct.] Class Activities
C:\Documents and Settings\John Bailer\My Documents\baileraj\
Classes\Fall 2004\sta402\handouts\week-07-08-13oct04.doc
based on:
C:\Documents and Settings\John Bailer\My Documents\baileraj\
Classes\Fall 2003\sta402\handouts\week7-08oct03.doc
&
C:\Documents and Settings\John Bailer\My Documents\baileraj\
Classes\Fall 2003\sta402\handouts\week8-15oct03.doc
SAS PROGRAMMING
* Arrays
* DO groups
* Statements: RETAIN, RENAME, LABEL, FORMAT, SUM
* Using formats in DATA steps
* Conditional execution
* More on missing values
Additional Ref: Cody, R. and Pass, R. (1995) SAS® Programming by Example. SAS
Institute Inc., Cary, NC. – Chapters 7 (“arrays”), 8 (“retain”), 5 (“SAS functions”)
ARRAYS
* look to use if writing the same set of code multiple times
* “arrays” can contain lists of variables
* “arrays” also good for restructuring data sets
Common example 1: Recoding a set of variables
/*
Suppose you have a data set “old_data” containing
Variables: a_var, b_var, var3, var4, var5

(all numeric, with missing values coded as -999)

Recode -999 as missing (.)

*/

data old_data;

input a_var b_var var3 var4 var5 @@;

datalines;

run;

data recode_ex; set old_data;

array all[5] a_var b_var var3 var4 var5;

do ii=1 to 5;

if all[ii] = -999 then all[ii]=.;

end;

drop ii;

/* can use either [], {}, () to reference array elements */

options nocenter nodate;

proc print;

run;

Obs    a_var    b_var    var3    var4    var5
  1        1        2       3       4       5
  2        6        7       .       8       9
  3       10       11      12       .      14

/* alternative: let SAS count the array size with {*} and

use dim() to get the dimension of the array */

data recode_ex2; set old_data;

array all{*} a_var b_var var3 var4 var5;

do ii=1 to dim(all);

if all{ii} = -999 then all{ii}=.;

end;

drop ii;

run;

Recode 3: Using _NUMERIC_ to select elements

Obs    char_var    a_var    b_var    var3    var4    var5
  1       a            1        2       3       4       5
  2       b            6        7       .       8       9
  3       c           10       11      12       .      14
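The code for Recode 3 is not reproduced with this output. A minimal sketch of the idea follows, assuming old_data also carries a character variable char_var; the names recode_ex3 and nums are hypothetical. The special list _NUMERIC_ puts every numeric variable defined so far into the array, so char_var is skipped automatically.

```sas
data recode_ex3; set old_data;
  /* _numeric_ = all numeric variables known at this point;  */
  /* ii is created later, so it is not an array element      */
  array nums{*} _numeric_;
  do ii = 1 to dim(nums);
    if nums{ii} = -999 then nums{ii} = .;
  end;
  drop ii;
run;
```

Because the array is built from whatever numeric variables exist, the step does not need to be edited when numeric columns are added to old_data.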

Common example 2: Creating multiple observations from a single observation

data one;

input x1 x2 x3 x4;

datalines;

data two; set one;

array xx[4] x1-x4;

do time=1 to 4;

x=xx[time];

output;

end;

drop x1-x4;

run;

proc print;

title 'Expand one record to multiple records';

run;

Expand one record to multiple records

Obs    time     x
  1      1     60
  2      2     62
  3      3     64
  4      4     68
  5      1     80
  6      2     84
  7      3     90
  8      4     98

Common example 3: Creating one observation from multiple observations


data multi;

input id time heart_rate;

datalines;

proc sort data=multi; by id time;

data sorted by the variable “id”

FIRST.id = 1 if first occurrence of new by group variable

LAST.id = 1 if last occurrence of a by group variable

data one; set multi;

by id;

array xx[4] x1-x4;

retain x1-x4; * values kept from previous observation;

if FIRST.id=1 then do ii=1 to 4;

xx[ii]=.; * elements initialized to missing;

end;

xx[time]=heart_rate;

if LAST.id=1 then output;

keep id x1-x4;


title NITROFEN: t-test of ( 0 , 160 ) concentrations;

class conc;

var total;

run ;

NITROFEN: t-test of (0, 160) concentrations

The TTEST Procedure

Statistics

                                Lower CL            Upper CL    Lower CL               Upper CL
Variable    conc          N     Mean       Mean     Mean        Std Dev     Std Dev    Std Dev     Std Err
total       0            10     28.827     31.4     33.973      2.4737      3.5963     6.5654      1.
total       160          10     26.612     28.3     29.988      1.6229      2.3594     4.3073      0.
total       Diff (1-2)          0.2424     3.1      5.9576      2.2981      3.0414     4.4977      1.

T-Tests

Variable    Method           Variances    DF      t Value    Pr > |t|
total       Pooled           Equal        18      2.28       0.
total       Satterthwaite    Unequal      15.5    2.28       0.

Equality of Variances

Variable    Method      Num DF    Den DF    F Value    Pr > F
total       Folded F    9         9         2.32       0.

*/

proc print ;

title NITROFEN: print of ( 0 , 160 ) concentrations;

var conc total;

run ;

NITROFEN: print of (0, 160) concentrations

Obs    conc    total
  1       0       27
  2       0       32
  3       0       34
  4       0       33
  5       0       36
  6       0       34
  7       0       33
  8       0       30
  9       0       24
 10       0       31
 11     160       29
 12     160       29
 13     160       23
 14     160       27
 15     160       30
 16     160       31
 17     160       30
 18     160       26
 19     160       29
 20     160       29

proc transpose data=test prefix=xx out=tran_out;

var total;

run;

data obs_test; set tran_out;

type = 'O';

run;


proc print data=obs_test;

title 'Randomization test: observed data';

run;

Randomization test: observed data

Obs  _NAME_  xx1 xx2 xx3 xx4 xx5 xx6 xx7 xx8 xx9 xx10 xx11 xx12 xx13 xx14 xx15 xx16 xx17 xx18 xx19 xx20  type
  1  total    27  32  34  33  36  34  33  30  24   31   29   29   23   27   30   31   30   26   29   29   O

*/

proc plan ;

factors test= 4000 ordered in= 20 ;

output out=d_permut;

run ;

proc transpose data=d_permut prefix=in out=out_permut(keep=in1-in20); by test;

run ;

proc print data=out_permut;

run ;

data _null_; set obs_test;

file 'D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week7-perm.data';

put type xx1-xx20;

run ;

data _null_; set out_permut;

type = 'P'; * permutation data;

file 'D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week7-perm.data'

mod; /* mod option adds lines to existing file */

put type in1-in20;

run ;

/* week7-perm.data ...

O 27 32 34 33 36 34 33 30 24 31 29 29 23 27 30 31 30 26 29 29

P 8 14 4 11 3 2 12 1 6 13 17 9 15 16 5 19 20 7 10 18

P 12 2 8 10 13 7 9 16 4 19 15 3 5 14 17 1 20 11 6 18

P 18 17 13 14 5 8 19 16 3 12 11 9 10 7 2 20 4 6 1 15

P 6 12 4 20 19 16 11 5 15 18 1 8 3 13 17 14 10 9 7 2

P 8 17 4 19 2 11 1 7 6 3 9 13 20 14 12 18 15 10 5 16

P 11 7 17 6 18 13 3 12 8 10 19 16 2 20 4 5 15 1 9 14

P 17 11 4 7 20 6 9 16 1 2 14 12 5 18 10 8 15 13 3 19

*/

data perm_data;

array both{ 20 } x1-x10 y1-y10; /* array for observed values */

array ins{ 20 } in1-in20; /* index array */

array perms{ 20 } xp1-xp10 yp1-yp10; /* array for permuted values */
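The remainder of this DATA step is not reproduced here. One way the index array could be used is sketched below; this is a minimal sketch, not the original code. It assumes the file layout shown above (a type code followed by 20 values, observed row first), and the data set name perm_sketch, the relative file path, and the statistic diff are all hypothetical.

```sas
data perm_sketch;
  array both{20}  x1-x10 y1-y10;      /* observed values            */
  array ins{20}   in1-in20;           /* index array                */
  array perms{20} xp1-xp10 yp1-yp10;  /* permuted values            */
  retain x1-x10 y1-y10;               /* keep the observed row      */
  infile 'week7-perm.data';           /* hypothetical relative path */
  input type $ @;                     /* read type, hold the line   */
  if type = 'O' then do;
    input x1-x10 y1-y10;              /* observed data row          */
    delete;                           /* no output for this row     */
  end;
  else do;
    input in1-in20;                   /* one permutation of 1..20   */
    do ii = 1 to 20;
      perms{ii} = both{ins{ii}};      /* reorder observed values    */
    end;
    /* difference in group means for this permutation */
    diff = mean(of xp1-xp10) - mean(of yp1-yp10);
  end;
  drop ii;
run;
```

Each 'P' row then yields one permuted data set and one value of diff, which can be compared with the observed difference in means.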


clustered patterns of response.

Problem:

  • 4 trees were observed in a hypothetical square plot
  • are these trees clustered in this plot? regularly spaced?
  • how can you check?

Strategy:

1. Determine nearest-neighbor distances

2. Calculate the average NN distance

3. Generate a sample of observations that are randomly

distributed in the region of interest

4. Calculate the average NN distance for this set

5. Repeat steps 3 and 4 a large number of times

6. The P-value is the proportion of generated samples

that were more extreme than the observed one
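The six steps can be sketched compactly in one DATA step. This is a minimal sketch under stated assumptions: the plot is taken to be the unit square, the four tree coordinates are hypothetical values chosen for illustration, and the names mc_nn, p_le and p_ge are not from the handout.

```sas
data mc_nn;
  /* hypothetical coordinates of 4 trees in the unit square */
  array xo{4} _temporary_ (0.10 0.20 0.80 0.90);
  array yo{4} _temporary_ (0.10 0.15 0.85 0.90);
  array xs{4} _temporary_;
  array ys{4} _temporary_;
  nsim = 1000;

  /* Steps 1-2: observed average nearest-neighbor distance */
  avgobs = 0;
  do i = 1 to 4;
    nn = .;                           /* min() ignores missing */
    do j = 1 to 4;
      if j ne i then
        nn = min(nn, sqrt((xo{i}-xo{j})**2 + (yo{i}-yo{j})**2));
    end;
    avgobs = avgobs + nn/4;
  end;

  /* Steps 3-5: repeat for completely random plots */
  numle = 0; numge = 0;
  do isim = 1 to nsim;
    do i = 1 to 4;
      xs{i} = ranuni(0);
      ys{i} = ranuni(0);
    end;
    avgsim = 0;
    do i = 1 to 4;
      nn = .;
      do j = 1 to 4;
        if j ne i then
          nn = min(nn, sqrt((xs{i}-xs{j})**2 + (ys{i}-ys{j})**2));
      end;
      avgsim = avgsim + nn/4;
    end;
    numle = numle + (avgsim le avgobs);   /* as small or smaller */
    numge = numge + (avgsim ge avgobs);   /* as large or larger  */
  end;

  /* Step 6: one-sided P-values */
  p_le = numle / nsim;   /* small value suggests clustering      */
  p_ge = numge / nsim;   /* small value suggests regular spacing */
  keep nsim avgobs p_le p_ge;
run;
```

Since ranuni(0) seeds from the clock, the two proportions change slightly from run to run.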

History:

First examined this problem (12apr95) in

[-.classes.ies612]monte_spatial.sas (old VAX file)

options ls=74;

data plot1;

title plot1 assessment of pattern;

array xobs{4} xobs1-xobs4;

array yobs{4} yobs1-yobs4;

array nnobs{4} nnobs1-nnobs4;

input xobs1-xobs4 yobs1-yobs4 @@;

/* Determine the observed NN distance and average */

sumnnobs = 0;

do i=1 to 4; * find NN distance for each point ;

nnobs(i) = 100; * initialize distances to be large;

do j=1 to 4; * compare the ith point to all others;

d=sqrt( (xobs(i)-xobs(j))**2 + (yobs(i)-yobs(j))**2 );

if (d<nnobs(i)) and (d>0) then nnobs(i)=d;

* output; * output if debugging desired;

end;


sumnnobs=sumnnobs+nnobs(i);

end;

avgnnobs = sumnnobs/4; * observed average NN distance;

datalines;

proc print;

data mccsr1; set plot1;

array xobs{4} xobs1-xobs4;

array yobs{4} yobs1-yobs4;

array xsim{4} xsim1-xsim4;

array ysim{4} ysim1-ysim4;

array nnobs{4} nnobs1-nnobs4;

array nncsr{4} nncsr1-nncsr4;

/* Generate a large number of CSR plots with 4 trees */

/* CSR = completely spatially random */

* initialize counters of nn avg dist le or ge than observed;

numle = 0; numge = 0;

do isim = 1 to 1000;

do ii = 1 to 4;

xsim(ii) = ranuni(0);

ysim(ii) = ranuni(0);

end;

/* Find NN distance for the simulated trees */

sumnncsr = 0;

do i=1 to 4;

nncsr(i) = 100; * initialize;

do j=1 to 4;

d=sqrt( (xsim(i)-xsim(j))**2 + (ysim(i)-ysim(j))**2 );

if (d<nncsr(i)) and (d>0) then nncsr(i)=d;

* output; * debugging;

end;


data retain_demo1;

input dobs time x;

retain subject 0 ;

if time= 1 then subject=subject+ 1 ;

datalines;

proc print ;

id dobs;

run ;

dobs    time     x    subject
   1       1    60          1
   2       2    62          1
   3       3    64          1
   4       4    68          1
   5       1    80          2
   6       2    84          2
   7       3    90          2
   8       4    98          2

data retain_demo2;

input dobs time x;

if time=1 then subject+ 1 ; * implicitly retains values for calculations;

datalines;

options nocenter;

proc print ;

title2 'implicitly retain with subject+1 statement';

id dobs;

run ;

implicitly retain with subject+1 statement

dobs    time     x    subject
   1       1    60          1
   2       2    62          1
   3       3    64          1
   4       4    68          1
   5       1    80          2
   6       2    84          2
   7       3    90          2
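The two demos differ only in how the counter is carried from one observation to the next. As a minimal sketch (the data set name, the variables subj_a and subj_b, and the four data lines are hypothetical), both forms can be put side by side: the sum statement `subj_b + 1` behaves like an explicit RETAIN with initial value 0 followed by an addition with SUM().

```sas
data sum_vs_retain;
  input time x;

  /* explicit form: retain, then add */
  retain subj_a 0;
  if time = 1 then subj_a = sum(subj_a, 1);

  /* sum statement: retained and initialized to 0 automatically */
  if time = 1 then subj_b + 1;

  datalines;
1 60
2 62
1 80
2 84
;
run;

proc print data=sum_vs_retain;
run;
```

The two counters should agree on every row. The sum statement also treats a missing increment as zero, which is why the explicit version uses SUM() instead of a plain `+`.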


example: find the average weight by subject using

DATA step programming

/* STEP 1: read in the data file */

data diet;

input id @3 date mmddyy8. weight;

format date mmddyy8.;

datalines;

proc print;

title 'diet data';

run;

diet data

Obs    id        date    weight
  1     1    10/01/92       155
  2     1    10/08/92       158
  3     1    10/15/92       158
  4     1    10/22/92       158
  5     2    09/02/92       200
  6     2    09/09/92       198
  7     2    09/16/92       196
  8     2    09/23/92       202

data diet2; set diet;


STEPS 2 and 3 ALTERNATIVE:

Accumulate cumulative weight and average of measurements

And then extract the last measurement for each ID

data diet5; set diet;

retain total 0 count 0 ;

if id = lag(id) then do;

total=total+weight;

count+ 1 ;

wt_avg = total/count;

end;

else do; * new id: restart the running total;

total = weight;

count= 1 ;

wt_avg = total/count;

end;

proc print ;

run ;

Obs    id        date    weight    total    count    wt_avg
  1     1    10/01/92       155      155        1      155.
  2     1    10/08/92       158      313        2      156.
  3     1    10/15/92       158      471        3      157.
  4     1    10/22/92       158      629        4      157.
  5     2    09/02/92       200      200        1      200.
  6     2    09/09/92       198      398        2      199.
  7     2    09/16/92       196      594        3      198.
  8     2    09/23/92       202      796        4      199.

data diet6; set diet5; by id;

if LAST.id;

keep id wt_avg;

proc print;

run;

Obs    id    wt_avg
  1     1      157.
  2     2      199.

example: find the total time enrolled for each participant

[motivated by an example where people may enroll/

disenroll in a program during different quarters]


options formdlim="-";

data test;

input id xstart xstop;

datalines;

proc print;

run;

data test2; set test; by id;

array start{9} start1-start9;

array stop{9} stop1-stop9;

array times{9} times1-times9;

retain count 0;

retain start1-start9 stop1-stop9 times1-times9;

if FIRST.id=1 then do; * initialize count and arrays with new ID;

count = 0;

do ii=1 to 9;

start{ii} = .;

stop{ii} = .;

times{ii} = .;

end;

end;

count = count + 1;

start{count} = xstart;

stop{count} = xstop;

times{count} = xstop - xstart;

if LAST.id=1 then output; * output results if last obs for ID;

drop xstart xstop ii;

run;

data test3; set test2;

total_time = sum(of times1-times9);

run;

proc print;

run;
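If only the total is needed, and not the individual start/stop values, the same result can be accumulated without arrays. The following is a minimal sketch, not the handout's method: the data lines and the data set names ex_enroll and ex_total are hypothetical, and the input is assumed to be sorted by id.

```sas
data ex_enroll;
  input id xstart xstop;
  datalines;
1 1 3
1 5 8
2 2 6
;
run;

data ex_total;
  set ex_enroll;
  by id;
  if first.id then total_time = 0;   /* restart for each new id   */
  total_time + (xstop - xstart);     /* sum statement: retained   */
  if last.id then output;            /* one record per id         */
  keep id total_time;
run;

proc print data=ex_total;
run;
```

The array version above is still the better choice when the separate enrollment periods must be kept on the output record.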

material from

C:\Documents and Settings\John Bailer\My Documents\baileraj\

Classes\Fall 2003\sta402\handouts\week8-15oct03.doc


* in the example above, two data sets are defined/referenced -

LIBREF = “class” and MEMBER = “nitrofen”

(where the “class” LIBREF refers to directory D:\baileraj\Classes\Fall 2003\sta402\data)

LIBREF = “WORK” (default – reserved for temporary SAS data sets) and

MEMBER = “nitrofen_A” [to make this permanent]

data class.nitrofen_A; set class.nitrofen;

brood=1; count=brood1; conc=conc; output;

brood=2; count=brood2; conc=conc; output;

brood=3; count=brood3; conc=conc; output;

keep brood count conc;

Create data sets for illustration by

Creating multiple data sets in one DATA step

libname class 'D:\baileraj\Classes\Fall 2003\sta402\data';

data dbrood1 dbrood2 dbrood3; set class.nitrofen;

animal = _n_; * create animal ID number;

brood= 1 ; count=brood1; conc=conc; output dbrood1;

brood= 2 ; count=brood2; conc=conc; output dbrood2;

brood= 3 ; count=brood3; conc=conc; output dbrood3;

keep animal brood count conc;

ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week6-tab1.rtf';

proc print data=dbrood1;

title 'Nitrofen Brood 1 data';

id animal;

run ;

proc print data=dbrood2;

title 'Nitrofen Brood 2 data';

id animal;

run ;

proc print data=dbrood3;

title 'Nitrofen Brood 3 data';

id animal;

run ;

ODS RTF close;


Obs animal conc brood count

Obs animal conc brood count

Obs animal conc brood count

SET statement for concatenation and interleaving – essentially adds observations to a data set

data all3_stack; set dbrood1 dbrood2 dbrood3;

title 'Concatenate/stack all 3 brood-specific files into one file';

ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week8-prt2.rtf';

proc print;

id animal;

run;
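The heading above mentions interleaving as well as concatenation. The difference can be sketched with two small data sets; the names ex_brood1, ex_brood2, ex_stack and ex_interleave and all data values are hypothetical. A SET statement alone stacks the inputs one after the other, while adding a BY statement interleaves them in BY-variable order (each input must already be sorted by that variable).

```sas
data ex_brood1;
  input animal brood count;
  datalines;
1 1 3
2 1 5
;
run;

data ex_brood2;
  input animal brood count;
  datalines;
1 2 7
2 2 6
;
run;

/* concatenation: all of ex_brood1, then all of ex_brood2 */
data ex_stack;
  set ex_brood1 ex_brood2;
run;

/* interleaving: records merged in order of animal */
data ex_interleave;
  set ex_brood1 ex_brood2;
  by animal;
run;

proc print data=ex_interleave;
  id animal;
run;
```

In ex_interleave the two records for animal 1 come first (brood 1, then brood 2), followed by the two records for animal 2.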