









Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
A part of a statistical programming in sas book by bailer. It covers the use of arrays to define and input values, changing missing value codes, and recoding missing values in numeric and character variables. It also includes examples of expanding a single record into multiple records and simulating plots under csr.
Typology: Exams
1 / 16
This page cannot be seen from the preview
Don't miss anything!










Directory: \Muserver2\USERS\B\baileraj\Classes\sta402\handouts
8.1. Storage bins for collections of values - ARRAYS 8.1.1 Defining values in ARRAY variable list directly 8.1.2 Inputting values in ARRAY variable list 8.1.3 Changing missing value codes for numeric variables to “.” 8.1.4 Recoding missing values for all numeric and character variables 8.1.5 Creating multiple observations from a single record 8.2 Case Study 1: Monte Carlo P-value for test of spatial randomness [omit] 8.3 Remembering variable values across observations – RETAIN 8.3.1 Processing multiple observations for an individual 8.4 Case Study 2: Randomization test for the equality of two populations [omit] Summary References Exercises
data temps; array tempF(4) tempF1-tempF4 (32,50,68,86); array tempC(4) tempC1-tempC4;
do itemp = 1 to 4; tempC(itemp) = 5/9*(tempF(itemp)-32); end; drop itemp;
proc print;
run;
temp temp temp temp temp temp temp temp Obs F1 F2 F3 F4 C1 C2 C3 C 1 32 50 68 86 0 10 20 30
Data D1; ARRAY ADL{7} ADL1-ADL7; input ADL1-ADL7; datalines; 6 6 5 5 5 4 3 ;
Data D2; ARRAY ADL{*} ADL1-ADL7; input ADL1-ADL7; datalines; 6 6 5 5 5 4 3 ;
Data D3; ARRAY ADL{*} t1 t2 t3 t4 t5 time6 time_7; input t1 t2 t3 t4 t5 time6 time_7;
Data D5; input name $ sex $ t1 t2 t3 t4 t5 time6 time_7;
ARRAY num_array{} NUMERIC; ARRAY char_array{} CHARACTER;
/* recode the numeric variables */ DO inum = 1 to dim(num_array); if num_array{inum}=-999 then num_array{inum}=.; END;
/* recode the character variables */ Do ichar = 1 to dim(char_array); if char_array{ichar}="-999" then char_array{ichar}=" "; END;
drop inum ichar; datalines; MrSmith -999 6 6 5 5 5 4 3 -999 F 7 -999 4 4 3 -999 2 ; proc print; title "Recoding missing values using Arrays using DO loop"; run;
Recoding missing values using Arrays using DO loop Obs name sex t1 t2 t3 t4 t5 time6 time_ 1 MrSmith 6 6 5 5 5 4 3 2 F 7. 4 4 3. 2
Name Sex t1 t2 t3 t4 t5 time6 time_ Smith M 6 6 5 5 5 4 3 Jones F 7 5 4 4 3 2 1 Fisher M 5 5 5 3 2 2 1 that we wish to transform into
Name Sex Time ADL Smith M 1 6 Smith M 2 6 Smith M 3 5 Smith M 4 5 Smith M 5 5 Smith M 6 4 Smith M 7 3 Jones F 1 7 Jones F 2 5 Jones F 3 4 Jones F 4 4 Jones F 5 3 Jones F 6 2 Jones F 7 1 Fisher M 1 5 Fisher M 2 5 Fisher M 3 5 Fisher M 4 3 Fisher M 5 2 Fisher M 6 2 Fisher M 7 1
Display 8.5: Calculating a Monte Carlo P-value for testing average nearest neighbor distances /* enter the observed data into data arrays */ data plot_obs; array xobs(4) xobs1-xobs4; array yobs(4) yobs1-yobs4; do idat = 1 to 4; input xobs(idat) yobs(idat) @@; end; datalines; .25 .65 .25 .75 .35 .65 .35. ;
/* Calculate the observed average nearest neighbor distance */ data plot1; set plot_obs; array xobs(4) xobs1-xobs4; array yobs(4) yobs1-yobs4; array nnobs(4) nnobs1-nnobs4;
/* Determine the observed NN distance and average */ sumnnobs = 0; do i=1 to 4; * find NN distance for each point ; nnobs(i) = 100; * initialize distances to be large;
do j=1 to 4; * compare the ith point to all others; d=sqrt( (xobs(i)-xobs(j))2 + (yobs(i)-yobs(j))2 ); if (d<nnobs(i)) and (d>0) then nnobs(i)=d; end; /* Note: Each point is also compared to itself - thus, d= omitted from updated of nearest neighbor distance. */ sumnnobs=sumnnobs+nnobs(i); end;
avgnnobs = sumnnobs/4; * observed average NN distance;
/* Simulate plots under CSR */ data mccsr1; set plot1; array xobs xobs1-xobs4; * observed data; array yobs yobs1-yobs4; array xsim xsim1-xsim4; * CSR simulated data;
array ysim ysim1-ysim4;
array nnobs nnobs1-nnobs4; * observed NN distances; array nncsr nncsr1-nncsr4; * simulated NN distances;
/* Generate a large number of CSR plots with 4 trees / / CSR = completely spatially random */
nsim = 4000; * user-specified option; seed1 = 12345; * if =0, then uses time since midnight;
do isim = 1 to nsim;
do ii = 1 to 4; xsim(ii) = ranuni(seed1); ysim(ii) = ranuni(seed1); end;
/* Find NN distance for the simulated trees */ sumnncsr = 0; do i=1 to 4; nncsr(i) = 100; * initialize;
do j=1 to 4; d=sqrt( (xsim(i)-xsim(j))2 + (ysim(i)-ysim(j))2 ); if (d<nncsr(i)) and (d>0) then nncsr(i)=d; end;
sumnncsr=sumnncsr+nncsr(i); end; avgnncsr = sumnncsr/4;
/* Accumulate counts of patterns consistent with clustering */ ile = (avgnncsr <= avgnnobs); numle = numle + ile;
drop i j ii xobs1-xobs4 yobs1-yobs4 nnobs1-nnobs sumnnobs sumnncsr; output; end; * of the isim - simulation loop;
data test; input id xstart xstop; datalines; 1 15 25 2 10 12 2 18 22 3 6 12 3 14 15 3 17 23 ;
proc print; run;
data test2; set test; by id; **** Comment 1; array start{9} start1-start9; array stop{9} stop1-stop9; array times{9} times1-times9;
retain count 0; **** Comment 2; retain start1-start9 stop1-stop9 times1-times9;
count = count + 1; **** Comment 4; start{count} = xstart; stop{count} = xstop; times{count} = xstop - xstart;
if LAST.id=1 then do; **** Comment 5; first_time = times(1); total_time = sum(of times1-times9); keep id count first_time total_time; output; * output results if last obs for ID; end; run;
proc print; run;
The processing of these observations produces the following
first_ total_ Obs id count time time
1 1 1 10 10 2 2 2 2 6 3 3 3 6 13
Randomization tests (Edgington 1995, Good 2001) can also be used to obtain P-values.
We next examine the results from a t-test comparing the mean responses in the control and 160 μg/l conditions. From this section of code and the accompanying results, we see that the two group t-test resulted in a P-value of 0.035 (assuming equal variances).
proc ttest; title NITROFEN: t-test of (0, 160) concentrations; class conc; var total; run;
Display 8.7: Output from PROC TTEST
NITROFEN: t-test of (0, 160) concentrations The TTEST Procedure Statistics Lower CL Upper CL Lower CL Upper CL conc N Mean Mean Mean SD SD SD Std Err
0 10 28.827 31.4 33.973 2.4737 3.5963 6.5654 1. 160 10 26.612 28.3 29.988 1.6229 2.3594 4.3073 0. Diff (1-2) 0.2424 3.1 5.9576 2.2981 3.0414 4.4977 1.
T-Tests Variable Method Variances DF t Value Pr > |t| total Pooled Equal 18 2.28 0. total Satterthwaite Unequal 15.5 2.28 0.
Equality of Variances Variable Method Num DF Den DF F Value Pr > F total Folded F 9 9 2.32 0.
PROC PLAN to generate a collection of permutations of the indices 1 through 20. The 10 observations corresponding to the first 10 indices in the permuted set are assigned for group 1 for a particular permutation, while the remaining observations are assigned to group 2. These permutations are read into a SAS data set for additional processing.
TRANSPOSE procedure is used to place each permuted set of data onto a single line.
Display 8.8: Constructing a randomization test of two population mean equality proc transpose data=test prefix=xx out=tran_out; var total; run; data obs_test; set tran_out; type = ‘O’; * identifies original data; run; proc print data=obs_test; title ‘Randomization test: observed data’;
run;
proc plan; factors test=4000 ordered in=20; output out=d_permut; run;
/*
PLAN used to generate a set of indices for the randomization test and then use TRANSPOSE to package the output
*/
proc transpose data=d_permut prefix=in out=out_permut(keep=in1-in20); by test; run; proc print data=out_permut; run;
data null; set obs_test; file 'D:\baileraj\Classes\Fall 2008\sta402\SAS- programs\week7-perm.data'; put type xx1-xx20; run;
data null; set out_permut; type = 'P'; * permutation data; file 'D:\baileraj\Classes\Fall 2008\sta402\SAS- programs\week7-perm.data' mod; /* mod option adds lines to existing file */ put type in1-in20; run;
/* week7-perm.data ... O 27 32 34 33 36 34 33 30 24 31 29 29 23 27 30 31 30 26 29 29 P 8 14 4 11 3 2 12 1 6 13 17 9 15 16 5 19 20 7 10 18 P 12 2 8 10 13 7 9 16 4 19 15 3 5 14 17 1 20 11 6 18 P 18 17 13 14 5 8 19 16 3 12 11 9 10 7 2 20 4 6 1 15
... */
data perm_data; array both{ 20 } x1-x10 y1-y10; /* array for observed values / array ins{ 20 } in1-in20; / index array */ array perms{ 20 } xp1-xp10 yp1-yp10;
< - - - P(upper) = 0.
Cumulative Cumulative perm_2tail Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
0 3847 96.18 3847 96. 1 153 3.83 4000 100. < - - - P(2 tail) = 0.
Thus we see that the two-tailed P-value from the randomization procedure, 0.0383, is quite similar to the P-value obtained from the t-test.
Cody, R. and Pass, R. (1995) SAS®^ Programming by Example. SAS Institute Inc., Cary, NC. – Chapters 7 (“arrays”), 8 (“retain”), 5 (“SAS functions”)
Edgington, E. (1995) Randomization tests. M. Dekker, New York.
Good, P.I. (2001) Resampling Methods: a practical guide to data analysis. Birkhauser, Boston.
Ripley (1981) Spatial Statistics. John Wiley & Sons. NY.
Waller, L.A. and Gotway-Crawford, C.(2004) Applied spatial statistics for public health data. John Wiley & Sons: NY.