Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

SAS Programming: Recoding Missing Values, Creating Observations, and Merging Data - Prof. , Exams of Statistics

Miami University - Oxford Statistics

Prof. A. John Bailer

Sas code examples demonstrating various data manipulation techniques, including recoding missing values using arrays, creating multiple observations from a single observation, and merging multiple data sets. These examples can be useful for students and researchers working with sas data analysis.

Typology: Exams

Pre 2010

Uploaded on 08/18/2009

koofers-user-c70 🇺🇸

9 documents

1 / 37

This page cannot be seen from the preview

Don't miss anything!

Week 07/08 Class Activities 1

File: week-07-08-05oct05.doc

Directory (hp/compaq):

C:\baileraj\Classes\Fall 2005\sta402\handouts

See also: Chapter-07-26may05.doc (*.pdf) on Blackboard

based on:

C:\Documents and Settings\John Bailer\My

Documents\baileraj\Classes\Fall 2004\sta402\handouts\week-07-08-

13oct04.doc

C:\Documents and Settings\John Bailer\My

Documents\baileraj\Classes\Fall 2003\sta402\handouts\week7-

08oct03.doc

&

C:\Documents and Settings\John Bailer\My

Documents\baileraj\Classes\Fall 2003\sta402\handouts\week8-

15oct03.doc

SAS PROGRAMMING

* Arrays

* DO groups

* Statements: RETAIN, RENAME, LABEL, FORMAT, SUM

* Using formats in DATA steps

* Conditional execution

* More on missing values

Additional Ref: Cody, R. and Pass, R. (1995) SAS® Programming by Example. SAS

Institute Inc., Cary, NC. – Chapters 7 (“arrays”), 8 (“retain”), 5 (“SAS functions”)

ARRAYS

* look to use if writing the same set of code multiple times

* “arrays” can contain lists of variables

* “arrays” also good for restructuring data sets

Discover Exams of Statistics Miami University - Oxford

Partial preview of the text

Download SAS Programming: Recoding Missing Values, Creating Observations, and Merging Data - Prof. and more Exams Statistics in PDF only on Docsity!

Week 07/08 Class Activities 1

File: week-07-08-05oct05.doc

Directory (hp/compaq):

C:\baileraj\Classes\Fall 2005\sta402\handouts

based on:

C:\Documents and Settings\John Bailer\My

Documents\baileraj\Classes\Fall 2004\sta402\handouts\week-07-08-

13oct04.doc

C:\Documents and Settings\John Bailer\My

Documents\baileraj\Classes\Fall 2003\sta402\handouts\week7-

08oct03.doc

C:\Documents and Settings\John Bailer\My

Documents\baileraj\Classes\Fall 2003\sta402\handouts\week8-

15oct03.doc

SAS PROGRAMMING

* Arrays

* DO groups

* Statements: RETAIN, RENAME, LABEL, FORMAT, SUM

* Using formats in DATA steps

* Conditional execution

* More on missing values

Additional Ref: Cody, R. and Pass, R. (1995) SAS

Programming by Example. SAS

Institute Inc., Cary, NC. – Chapters 7 (“arrays”), 8 (“retain”), 5 (“SAS functions”)

ARRAYS

* look to use if writing the same set of code multiple times

* “arrays” can contain lists of variables

* “arrays” also good for restructuring data sets

Common example 1a: Recoding a set of variables^2

Suppose you have a data set “old_data” containing

Variables: a_var, b_var, var3, var4, var

(all numeric with missing values coded as -999)

Recode -999 as missing=.

data old_data;

input a_var b_var var3 var4 var5 @@;

datalines;

run;

data recode_ex; set old_data;

array all[5] a_var b_var var3 var4 var5;

do ii=1 to 5;

if all[ii] = -999 then all[ii]=.;

end;

drop ii;

/* can use either [], {}, () to reference array elements */

options nocenter nodate;

proc print;

run;

Obs a_var b_var var3 var4 var 1 1 2 3 4 5 2 6 7. 8 9 3 10 11 12. 14

/* alternative to get SAS to count array size &

drop ii; 4

proc print;

title ‘Recode 3: Using NUMERIC to select elements’;

run;

Recode 3: Using NUMERIC to select elements Obs char_var a_var b_var var3 var4 var 1 a 1 2 3 4 5 2 b 6 7. 8 9

3 c 10 11 12. 14

Recoding both numeric and character values using arrays

Data D5;

input name $ sex $ t1 t2 t3 t4 t5 time6 time_7;

ARRAY num_array{} NUMERIC*;

ARRAY char_array{} CHARACTER*;

/* recode the numeric variables */

DO inum = 1 to dim(num_array);

if num_array{inum}=-999 then num_array{inum}=.;

END;

/* recode the character variables */

Do ichar = 1 to dim(char_array);

if char_array{ichar}="-999" then char_array{ichar}=" ";

END;

drop inum ichar;

datalines;

MrSmith -999 6 6 5 5 5 4 3

-999 F 7 -999 4 4 3 -999 2

proc print;

title "Recoding missing values using Arrays using DO loop";

run;

Recoding missing values using Arrays using DO loop

Obs name sex t1 t2 t3 t4 t5 time6 time_

1 MrSmith 6 6 5 5 5 4 3

2 F 7. 4 4 3. 2 5

C ommon example 1b: Recoding a set of variables (with array initialized)

data temps;

array tempF(4) tempF1-tempF4 (32,50,68,86);

array tempC(4) tempC1-tempC4;

do itemp = 1 to 4;

tempC(itemp) = 5/9*(tempF(itemp)-32);

end;

drop itemp;

proc print;

run;

temp temp temp temp temp temp temp temp

Obs F1 F2 F3 F4 C1 C2 C3 C

C ommon example 2: Creating multiple observations from a single observation

data one;

1-x4;

input x1 x2 x3 x4;

datalines;

data two; set one;

array xx[4] x1-x4;

do time=1 to 4;

x=xx[time];

output;

end;

drop x

run;

ta one; set multi; 7

[4] x1-x4;

lues kept from previous observation;

if FIRST.id=1 then do ii=1 to 4;

ized to missing;

=heart_rate;

t;

oc print;

se multiple records to one record’;

ltiple records to one record

da

by id;

array xx

retain x1-x4; * va

xx[ii]=.; * elements initial

end;

xx[time]

if LAST.id=1 then outpu

keep id x1-x4;

run;

pr

title ‘Conden

run;

Condense mu Obs id x1 x2 x3 x 1 1 60 62 64 68 2 2 80 84 90 9

Alternative using PROC TRANSPOSE

proc transpose data=multi out=one_tran(keep=id hr1-hr4)

prefix=HR; by id;

var heart_rate;

proc print data=one_tran;

title alternative using PROC TRANSPOSE;

run;

alternative using PROC TRANSPOSE HR

ommon example 4: Inputting values in ARRAY variable list

Obs id HR1 HR2 HR 1 1 60 62 64 68 2 2 80 84 90 98

C

7} ADL1-ADL7;

or

Data D2;

*} ADL1-ADL7;

or

Data D3;

*} t1 t2 t3 t4 t5 time6 time_7;

more complicated example: Randomization test for testing equality of 2 populations

Data D1;

ARRAY ADL{

input ADL1-ADL7;

datalines;

ARRAY ADL{

input ADL1-ADL7;

datalines;

ARRAY ADL{

input t1 t2 t3 t4 t5 time6 time_7;

datalines;

A

use PLAN to generate a set of indices for the randomization test

and then use TRANSPOSE to package the output

/* nitrofen data

concentrations 0 and 160 will be used to illustrate a randomization test

libname class 'D:\baileraj\Classes\Fall 2003\sta402\data';

data test; set class.nitrofen;

if conc= 0 | conc= 160 ;

proc ttest ;

title NITROFEN: t-test of ( 0 , 160 ) concentrations;

class conc;

var total;

run ;

proc print data=obs_test; 10

title ‘Randomization test: observed data’;

run;

Randomiza tion test: observed data _

x x x x x x x x x x x t x x x x x x x x x x x x x x x x x x x x y

N

A

O M

b E x x x x x x x x x 1 1 1 1 1 1 1 1 1 1 2 p s _ 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 e

1 total 27 32 34 33 36 34 33 30 24 31 29 29 23 27 30 31 30 26 29 29 O */

proc plan ;

fa torsc test= 4000 ordered in= 20 ;

output out=d_permut;

run ;

proc transpose data=d_permut prefix=in out=out_permut(keep=in1-in20); by

test;

run ;

proc print data=out_permut;

run ;

data null; set obs_test;

file 'D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week7-perm.data';

put type xx1-xx20;

run ;

data null; set out_permut;

type = 'P'; * permutation data;

file 'D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week7-perm.data'

mod; /* mod option adds lines to existing file */

put type in1-in20;

run ;

/* week7-perm.data ...

O 27 32 34 33 36 34 33 30 24 31 29 29 23 27 30 31 30 26 29 29

P 8 14 4 11 3 2 12 1 6 13 17 9 15 16 5 19 20 7 10 18

P 12 2 8 10 13 7 9 16 4 19 15 3 5 14 17 1 20 11 6 18

P 18 17 13 14 5 8 19 16 3 12 11 9 10 7 2 20 4 6 1 15

P 6 12 4 20 19 16 11 5 15 18 1 8 3 13 17 14 10 9 7 2

P 8 17 4 19 2 11 1 7 6 3 9 13 20 14 12 18 15 10 5 16

P 11 7 17 6 18 13 3 12 8 10 19 16 2 20 4 5 15 1 9 14

P 17 11 4 7 20 6 9 16 1 2 14 12 5 18 10 8 15 13 3 19

data perm_data;

array both{ 20 } x1-x10 y1-y10; /* array for observed values */

array ins{ 20 } in1-in20; /* index array */

array perms{ 20 } xp1-xp10 yp1-yp10; /* array for permuted values */

infile 'D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week7- 11

perm.data';

input type $ @;

if type='O' then do;

input x1-x10 y1-y10;

obs_diff = mean(of x1-x10) - mean(of y1-y10);

retain obs_diff x1-x10 y1-y10;

end;

else do;

input in1-in20;

do ii = 1 to 20 ;

perms{ii} = both{ ins[ii] };

end;

perm_diff = mean(of xp1-xp10) - mean(of yp1-yp10);

perm_ge = (perm_diff >= obs_diff); * 1-tailed;

perm_2tail = (abs(perm_diff) >= abs(obs_diff)); * 2-tailed;

keep obs_diff perm_diff perm_ge perm_2tail;

* keep in1-in20 xp1-xp10 yp1-yp10 obs_diff perm_diff perm_ge;

output;

end;

proc print ;

run ;

proc freq data=perm_data;

title ‘NITROFEN: randomization test -> upper tail P-value’;

table perm_ge perm_2tail;

run;

NI TROFEN: randomization test -> upper tail P-value

e FREQ Procedure Cumulative Cumulative

< - - - P(upper) = 0.

Cumulative Cumulative

< - - - P(2 tail) = 0.

nd more complicated example: Randomization test 2 for spatial randomness

Th

perm_ge Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 0 3927 98.18 3927 98. 1 73 1.83 4000 100.

perm_2tail Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 0 3847 96.18 3847 96. 1 153 3.83 4000 100.

A 2

gnnobs = sumnnobs/4; * observed average NN distance;

talines;

oc print;

ta mccsr1; set plot1;

array nnobs nnobs1-nnobs4;

Generate a large number of CSR plots with 4 trees */

initialize counters of nn avg dist le or ge than observed;

isim = 1 to 1000;

do ii = 1 to 4;

ni(0);

Find NN distance for the simulated trees */

sumnncsr = 0;

00; * initialize;

do j=1 to 4;

m(i)-xsim(j))2 + (ysim(i)-ysim(j))2 );

sumnncsr=sumnncsr+nncsr(i);

end;

av

da

pr

da

array xobs xobs1-xobs4;

array yobs yobs1-yobs4;

array xsim xsim1-xsim4;

array ysim ysim1-ysim4;

array nncsr nncsr1-nncsr4;

/* CSR = completely spatially random */

numle = 0; numge = 0;

do

xsim(ii) = ranu

ysim(ii) = ranuni(0);

end;

do i=1 to 4;

nncsr(i) = 1

d=sqrt( (xsi

if (d<nncsr(i)) and (d>0) then nncsr(i)=d;

* output; * debugging;

end;

avgnncsr = sumnncsr/4; 14

late counts of patterns consistent with regularity/aggreg.

ile = (avgnncsr <= avgnnobs);

numle = numle + ile;

drop i j ii xobs1-xobs4 yobs1-yobs4 nnobs1-nnobs

d; * if the isim - simulation loop;

proc print;

oc freq;

ige;

The FREQ Procedure

Cumulative Cumulative

Accumu

ige = (avgnncsr >= avgnnobs);

numge = numge + ige;

sumnnobs sumnncsr;

output;

en

pr

table ile

run;

ile Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 0 46 4.60 46 4. 1 954 95.40 1000 100.

ige Frequency Percent Frequency Percent ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 0 954 95.40 954 95. 1 46 4.60 1000 100.

RETAIN

* tough to perform calculations across observations

alue to a variable but to remember a

* SAS normally initializes each variable to missing

* RETAIN instructs system not to assign a missing v

different value

data retain_demo1;

input dobs time x;

example: find the average weight by subject using

DATA step programming

/* STEP 1: read in the data file */

data diet;

input id @3 date mmddyy8. weight;

format date mmddyy8.;

datalines;

proc print;

title ‘diet data’;

run;

diet data Obs id date weight 1 1 10/01/92 155 2 1 10/08/92 158 3 1 10/15/92 158 4 1 10/22/92 158 5 2 09/02/92 200 6 2 09/09/92 198 7 2 09/16/92 196

8 2 09/23/92 202

data diet2; set diet;

if n= 1 then total=weight;

else if id = lag(id) then total=total+weight;

else if id NE lag(id) then total = weight; 17

proc print;

run;

Obs id date weight total 1 1 10/01/92 155 155 2 1 10/08/92 158 158 3 1 10/15/92 158. 4 1 10/22/92 158. 5 2 09/02/92 200 200 6 2 09/09/92 198. 7 2 09/16/92 196. 8 2 09/23/92 202.

/* STEP 2:

accumulate the total weight measurements for an individual ID

data diet3; set diet;

retain total 0 ;

if id = lag(id) then total=total+weight;

else if id NE lag(id) then total = weight;

proc print ;

run ;

Obs id date weight total 1 1 10/01/92 155 155 2 1 10/08/92 158 313 3 1 10/15/92 158 471 4 1 10/22/92 158 629 5 2 09/02/92 200 200 6 2 09/09/92 198 398 7 2 09/16/92 196 594 8 2 09/23/92 202 796

STEP 3:

Calculate the average weight and output the desired data set

proc sort data=diet3; by id;

data diet4; set diet3; by id;

if LAST.id; * special variable LAST.id=1 if last value in BY;

wt_avg = total/4;

keep id wt_avg;

proc print ;

run ;

Obs id wt_avg 1 1 157.

data test;

input id xstart xstop;

datalines;

proc print;

run;

data test2; set test; by id;

array start{9} start1-start9;

array stop{9} stop1-stop9;

array times{9} times1-times9;

retain count 0;

retain start1-start9 stop1-stop9 times1-times9;

if FIRST.id=1 then do; * initialize count and arrays with new ID;

count = 0;

do ii=1 to 9;

start{ii} = .;

stop{ii} = .;

times{ii} = .;

end;

count = count + 1;

start{count} = xstart;

stop{count} = xstop;

times{count} = xstop - xstart;

if LAST.id=1 then output; * output results if last obs for ID;

drop xstart xstop ii;

run;

data test3; set test2;

total_time = sum(of times1-times9);

run;

proc print;

run;

material from

SAS Programming: Recoding Missing Values, Creating Observations, and Merging Data - Prof. , Exams of Statistics

Related documents

Partial preview of the text

Download SAS Programming: Recoding Missing Values, Creating Observations, and Merging Data - Prof. and more Exams Statistics in PDF only on Docsity!

Week 07/08 Class Activities 1

File: week-07-08-05oct05.doc

Directory (hp/compaq):

C:\baileraj\Classes\Fall 2005\sta402\handouts

See also: Chapter-07-26may05.doc (*.pdf) on Blackboard

based on:

C:\Documents and Settings\John Bailer\My

Documents\baileraj\Classes\Fall 2004\sta402\handouts\week-07-08-

13oct04.doc

C:\Documents and Settings\John Bailer\My

Documents\baileraj\Classes\Fall 2003\sta402\handouts\week7-

08oct03.doc

C:\Documents and Settings\John Bailer\My

Documents\baileraj\Classes\Fall 2003\sta402\handouts\week8-

15oct03.doc

SAS PROGRAMMING

* Arrays

* DO groups

* Statements: RETAIN, RENAME, LABEL, FORMAT, SUM

* Using formats in DATA steps

* Conditional execution

* More on missing values

Additional Ref: Cody, R. and Pass, R. (1995) SAS

Programming by Example. SAS

Institute Inc., Cary, NC. – Chapters 7 (“arrays”), 8 (“retain”), 5 (“SAS functions”)

ARRAYS

* look to use if writing the same set of code multiple times

* “arrays” can contain lists of variables

* “arrays” also good for restructuring data sets

Common example 1a: Recoding a set of variables^2

Suppose you have a data set “old_data” containing

Variables: a_var, b_var, var3, var4, var

(all numeric with missing values coded as -999)

Recode -999 as missing=.

data old_data;

input a_var b_var var3 var4 var5 @@;

datalines;

run;

data recode_ex; set old_data;

array all[5] a_var b_var var3 var4 var5;

do ii=1 to 5;

if all[ii] = -999 then all[ii]=.;

end;

drop ii;

/* can use either [], {}, () to reference array elements */

options nocenter nodate;

proc print;

run;

/* alternative to get SAS to count array size &

drop ii; 4

proc print;

title ‘Recode 3: Using NUMERIC to select elements’;

run;

Recoding both numeric and character values using arrays

Data D5;

input name $ sex $ t1 t2 t3 t4 t5 time6 time_7;

ARRAY num_array{*} NUMERIC;

ARRAY char_array{*} CHARACTER;

/* recode the numeric variables */

DO inum = 1 to dim(num_array);

if num_array{inum}=-999 then num_array{inum}=.;

END;

/* recode the character variables */

Do ichar = 1 to dim(char_array);

if char_array{ichar}="-999" then char_array{ichar}=" ";

END;

drop inum ichar;

datalines;

MrSmith -999 6 6 5 5 5 4 3

-999 F 7 -999 4 4 3 -999 2

proc print;

title "Recoding missing values using Arrays using DO loop";

run;

Recoding missing values using Arrays using DO loop

Obs name sex t1 t2 t3 t4 t5 time6 time_

1 MrSmith 6 6 5 5 5 4 3

ARRAY num_array{} NUMERIC*;

ARRAY char_array{} CHARACTER*;