





































Material Type: Assignment; Professor: Bailer; Class: Statistical Programming; Subject: Statistics; University: Miami University-Oxford; Term: Fall (First Sem) 2008;






































Directory: \Muserver2\USERS\B\baileraj\Classes\sta402\handouts
Formatting, basic DATA step manipulations and programming
Variable types: character, numeric, date and time.
(One might argue that only two types are necessary, since date and time data are essentially numeric, but it is convenient to make this distinction when discussing formats.)
Display 7.1: Reading character and numeric variables with implicit and explicit formats
data mrexample;
  input species $ bodywt brainwt @@;
  datalines;
rabbit 2.50 12.1   sheep 55.50 175.0   jaguar 100.00 157.0   chimp 52.16 440.0
brachiosaurus 87000.00 154.5   Rat 0.28 1.9   mole 0.12 3.0   Pig 192.00 180.0
;
run;

data mrexample2;
  length species $ 15;      /* character variable long enough for "brachiosaurus" */
  input species bodywt brainwt @@;
  datalines;
rabbit 2.50 12.1   sheep 55.50 175.0   jaguar 100.00 157.0   chimp 52.16 440.0
brachiosaurus 87000.00 154.5   rat 0.28 1.9   mole 0.12 3.0   pig 192.00 180.0
;
run;

Printed data sets (mrexample on the left, mrexample2 on the right):

species     bodywt  brainwt     species          bodywt  brainwt
rabbit        2.50     12.1     rabbit             2.50     12.1
sheep        55.50    175.0     sheep             55.50    175.0
jaguar      100.00    157.0     jaguar           100.00    157.0
chimp        52.16    440.0     chimp             52.16    440.0
brachios  87000.00    154.5     brachiosaurus  87000.00    154.5
Rat           0.28      1.9     rat                0.28      1.9
mole          0.12      3.0     mole               0.12      3.0
Pig         192.00    180.0     pig              192.00    180.0

Common numeric formats: w. (or w.d), BESTw., COMMAw.d, Ew., DOLLARw.d

Display 7.3: DATA step displays of numeric formats
data numeric_format_show;   /* numeric formats illustrated */
  test_num = 1277695.384;
  put 'BEST6. / BEST9. / BEST12.';
  put test_num BEST6.;
  put test_num BEST9.;
  put test_num BEST12.;
  put '-------------------------------';
  put 'COMMA9. / COMMA12.1 / COMMA13.3';
  put test_num COMMA9.;
  put test_num COMMA12.1;
  put test_num COMMA13.3;
  put '-------------------------------';
  put 'E7.';
  put test_num E7.;
  put '-------------------------------';
  put '8. / 12.1 / 13.3';
  put test_num 8.;
  put test_num 12.1;
  put test_num 13.3;
  put '-------------------------------';
  put 'DOLLAR9. / DOLLAR12.2';
  put test_num DOLLAR9.;
  put test_num DOLLAR12.2;
run;

Display 7.4: Output from the SAS LOG with examples using numeric formats
BEST6. / BEST9. / BEST12.
1.28E6
1277695.4
1277695.384
-------------------------------
COMMA9. / COMMA12.1 / COMMA13.3
1,277,695
1,277,695.4
1,277,695.384
-------------------------------
E7.
1.3E+06
-------------------------------
Display 7.5: Constructing a user-defined format for a numeric variable
data toyexample;
  input literacy @@;
  literacy_too = literacy;
  datalines;
-99 25.55 53 53.5 73.7 83 99.9 107 .
;
proc format;
  value literacyfmt
     0-53    = 'First quartile'
     53<-76  = 'Second quartile'
     76<-90  = 'Third quartile'
     90<-100 = 'Fourth quartile'
     .       = 'Missing'
     OTHER   = 'Invalid';
run;
data toyexample2;
  set toyexample;
  format literacy literacyfmt.;
run;
ods rtf;
proc print;
run;
proc means;
  var literacy literacy_too;
run;
ods rtf close;
Display 7.6: Output from printing the data set with a user-defined format
Obs  literacy          literacy_too
 1   Invalid                 -99.00
 2   First quartile           25.55
 3   First quartile           53.00
 4   Second quartile          53.50
 5   Second quartile          73.70
 6   Third quartile           83.00
 7   Fourth quartile          99.90
 8   Invalid                 107.00
 9   Missing                    .
Display 7.7: Output from PROC MEANS
Variable       N   Mean   Std Dev   Minimum   Maximum
literacy
literacy_too
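A user-defined format changes only how values are displayed; the stored numbers are untouched, which is why PROC MEANS can still summarize literacy in Display 7.7. The minimal sketch below illustrates this with a hypothetical two-level format (quartfmt) and hypothetical data (fmt_demo), and also uses the PUT function to save the formatted label as a separate character variable.

```sas
proc format;
  value quartfmt                 /* hypothetical two-level format */
    0-53    = 'Lower half'
    53<-100 = 'Upper half'
    .       = 'Missing'
    other   = 'Invalid';
run;

data fmt_demo;                   /* hypothetical data */
  input x @@;
  x_label = put(x, quartfmt.);   /* character copy of the formatted value */
  datalines;
25 53 60 . 107
;
run;

proc print data=fmt_demo;
  format x quartfmt.;            /* affects the display only */
run;

proc means data=fmt_demo n mean min max;
  var x;                         /* statistics use the stored numbers */
run;
```

PROC PRINT shows labels for x, while PROC MEANS reports statistics for the stored values 25, 53, 60 and 107.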
Display 7.8: Informats for reading common date values into SAS variables
data;
  input @1 indate1 date7. @9 indate2 date9. @19 indate3 mmddyy. @26 indate4 ddmmyy8.;
  datalines;
30jun10 30jun2010 063010 30.06.10
;
ods rtf;
proc print;
run;
ods rtf close;

Why are 30jun10, 30jun2010, 063010 and 30.06.10 all displayed as 18443?

Display 7.9: Result of reading dates into SAS variables
Obs  indate1  indate2  indate3  indate4
 1    18443    18443    18443    18443
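Dates are stored in SAS as the number of days since a fixed reference date, January 1, 1960 (day 0); later dates are positive and earlier dates are negative. A date format only controls how that day count is displayed.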
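A minimal sketch of this storage scheme: date literals ('ddmmmyyyy'd) and the MDY function both return the day count, and applying a date format only changes how the count is shown. The data set and variable names are hypothetical.

```sas
data date_values;
  d_origin = '01jan1960'd;      /* day 0 of the SAS date scale */
  d_before = '31dec1959'd;      /* dates before the origin are negative */
  d_june   = mdy(6, 30, 2010);  /* month, day, year -> SAS date value */
  put d_origin= d_before= d_june=;
  put d_june date9.;            /* same number displayed as a date */
run;
```

The first PUT should write d_origin=0 d_before=-1 d_june=18443, the same value shown in Display 7.9, and the second should write 30JUN2010.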
Display 7.12: Various date formats illustrated
data date_format_show;
  start = 0;
  put start date9.;
  today = 17700;             * days since Jan 1, 1960;
  put '-------------------------------';
  put 'DATE7. / DATE9.';
  put today date7.;
  put today date9.;
  put '-------------------------------';
  put 'DAY2. / DAY7.';
  put today day2.;
  put today day7.;
  put '-------------------------------';
  put 'EURDFDD8.';
  put today eurdfdd8.;
  put '-------------------------------';
  put 'MMDDYY8. / MMDDYY6.';
  put today mmddyy8.;
  put today mmddyy6.;
  put '-------------------------------';
  put 'WEEKDATE15. / WEEKDATE29.';
  put today weekdate15.;
  put today weekdate29.;
  put '-------------------------------';
  put 'WORDDATE12. / WORDDATE18.';
  put today worddate12.;
  put today worddate18.;
run;

Display 7.13: Results from the SAS LOG of displaying a date using different formats
01JAN1960
-------------------------------
DATE7. / DATE9.
17JUN08
17JUN2008
-------------------------------
DAY2. / DAY7.
17
17
-------------------------------
EURDFDD8.
17.06.08
-------------------------------
MMDDYY8. / MMDDYY6.
06/17/08
061708
-------------------------------
WEEKDATE15. / WEEKDATE29.
Tue, Jun 17, 08
Tuesday, June 17, 2008
-------------------------------
WORDDATE12. / WORDDATE18.
Jun 17, 2008
June 17, 2008
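Because a date is just a day count, date arithmetic is ordinary arithmetic, and functions such as WEEKDAY, YEAR and MONTH extract calendar pieces from the count. A small sketch using the same value, 17700, as Display 7.12; the data set and variable names are hypothetical.

```sas
data date_arithmetic;
  today = 17700;                   /* the value used in Display 7.12 */
  dow   = weekday(today);          /* 1 = Sunday, ..., 7 = Saturday */
  yr    = year(today);
  mo    = month(today);
  later = today + 30;              /* adding days is ordinary addition */
  ndays = '31dec2008'd - today;    /* days between two dates */
  put dow= yr= mo= later= date9. ndays=;
run;
```

For 17700 (17JUN2008, a Tuesday) the log should show dow=3 yr=2008 mo=6 later=17JUL2008 ndays=197.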
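Time values are likewise stored as elapsed numbers relative to a reference: a SAS time value is the number of seconds since midnight, and a SAS datetime value is the number of seconds since midnight, January 1, 1960.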
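A minimal sketch of the time and datetime conventions, using time ('hh:mm:ss't) and datetime ('ddmmmyyyy:hh:mm:ss'dt) literals; the data set and variable names are hypothetical. DATEPART converts a datetime back to an ordinary SAS date (a day count).

```sas
data time_values;
  t  = '13:30:15't;                      /* time: seconds since midnight */
  dt = '01jan1960:00:01:00'dt;           /* datetime: seconds since 01JAN1960:00:00:00 */
  put t= dt=;
  put t time8.;                          /* display the seconds as hh:mm:ss */
  d = datepart('30jun2010:08:00:00'dt);  /* date part of a datetime, in days */
  put d= d date9.;
run;
```

The first PUT should write t=48615 dt=60, since 13*3600 + 30*60 + 15 = 48615 seconds, and d is the same day count (18443) obtained for 30JUN2010 above.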
Display 7.18: Data transformations before fitting a polynomial regression model
data nitrofen;
  infile '\Muserver2\USERS\B\BAILERAJ\public.www\classes\sta402\data\ch2-dat.txt'
         firstobs=16 expandtabs missover pad;
  input @17 conc 3. @49 total 2.;
  sqrt_total = sqrt(total);    * transformed response variable;
  cconc = conc - 157;          * construct mean-centered concentration;
  cconc2 = cconc*cconc;        * quadratic term;
run;
ods rtf bodytitle;
ods graphics on;
proc reg data=nitrofen;
  model sqrt_total = cconc cconc2;   * fit the polynomial reg. model;
run;
ods graphics off;
ods rtf close;
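Centering conc before squaring it typically reduces the correlation between the linear and quadratic terms, which makes the coefficient estimates more stable and easier to interpret. The sketch below checks this with PROC CORR on five hypothetical concentration levels (0, 80, 160, 235, 310), chosen so that their mean is 157, the centering constant in Display 7.18; they are not read from the nitrofen file.

```sas
data center_demo;
  do conc = 0, 80, 160, 235, 310;   /* hypothetical levels; mean = 157 */
    conc2  = conc*conc;             /* raw quadratic term */
    cconc  = conc - 157;            /* mean-centered, as in Display 7.18 */
    cconc2 = cconc*cconc;           /* centered quadratic term */
    output;
  end;
run;

proc corr data=center_demo;
  var conc conc2 cconc cconc2;
run;
```

Compare the correlation of conc with conc2 to the correlation of cconc with cconc2 in the PROC CORR output.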
Display 7.19: ODS statgraphics output from a polynomial regression
Boolean tricks to recode …

Option 1: IF-THEN
  idino = 0;                    * define the indicator of dinosaurs;
  if (species="diplodoc" or species="tricerat" or species="brachios") then idino = 1;

Option 2: Boolean evaluation
  idino = (species="diplodoc" or species="tricerat" or species="brachios");

Option 2 works because a comparison in SAS evaluates to 1 when it is true and 0 when it is false.

Display 7.20: Defining indicator variables for different treatments to fit an ANOVA model
options nodate formdlim="-";
data meat;
  input condition $ logcount @@;
  iPlastic = (condition = "Plastic");
  iVacuum  = (condition = "Vacuum");
  iMixed   = (condition = "Mixed");
  iCO2     = (condition = "Co2");
  datalines;
Plastic 7.66  Plastic 6.98  Plastic 7.
Vacuum 5.26   Vacuum 5.44   Vacuum 5.80
Mixed 7.41    Mixed 7.33    Mixed 7.
Co2 3.51      Co2 2.91      Co2 3.
;
title "bacteria growth under 4 packaging conditions";
ods rtf bodytitle;
proc print;
  title "Print to check indicator variable construction";
run;
proc reg data=meat;
  title "Regression with indicator variables: alt. to one-way anova model";
  model logcount = iPlastic iVacuum iMixed;
run;
proc glm data=meat;
  title "One-way anova model";
  class condition;
  model logcount = condition;
run;
ods rtf close;

Display 7.21: Output from PROC PRINT to confirm indicator variable coding
condition  logcount  iPlastic  iVacuum  iMixed  iCO2
Plastic      7.66       1         0        0      0
Plastic      6.98       1         0        0      0
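Option 2 of the Boolean trick can be written more compactly with the IN operator, which also evaluates to 1 or 0. A minimal sketch; the data set dino_demo and its species values are hypothetical, while the three dinosaur codes are the ones used in Option 1.

```sas
data dino_demo;
  length species $ 8;
  input species $ @@;
  /* same indicator as Option 2, written with the IN operator */
  idino = (species in ("diplodoc", "tricerat", "brachios"));
  datalines;
diplodoc rabbit brachios mole tricerat
;
run;

proc print data=dino_demo;
run;
```

The printed idino column should read 1, 0, 1, 0, 1.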
Example: Comparing coding when constructing categories (or why you test with missing values)

Display 7.22: Constructing indicator variables for different categories of a numeric variable
data toyexample;
  input literacy @@;
  cat_literacy1 = 1*(0<literacy<=53) + 2*(53<literacy<=76)
                + 3*(76<literacy<=90) + 4*(90<literacy<=100);
  /* … definitions of cat_literacy2 - cat_literacy4 not shown */
  datalines;
-99 25.55 73.7 83 99.9 107 .
;
ods rtf;
proc print;
run;
ods rtf close;
Display 7.23: Output from PROC PRINT for the different indicators
Obs  literacy  cat_literacy1  cat_literacy2  cat_literacy3  cat_literacy4
 1    -99.00         0              1              .              .
 2     25.55         1              1              1              1
 3     73.70         2              2              2              2
 4     83.00         3              3              3              3
 5     99.90         4              4              4              4
 6    107.00         0              0              .              .
 7       .           0              1              .              .
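The rows for -99, 107 and the missing value show why testing with missing values matters: SAS treats a numeric missing value as smaller than any number, so a comparison such as literacy <= 53 is true (1) when literacy is missing. A minimal sketch of guarding against this with the MISSING function; the data set missing_check and the variables naive and guarded are hypothetical.

```sas
data missing_check;
  input literacy @@;
  naive = (literacy <= 53);          /* 1 when literacy is missing */
  if missing(literacy) then guarded = .;
  else guarded = (literacy <= 53);   /* coded only for nonmissing values */
  datalines;
25.55 73.7 .
;
run;

proc print data=missing_check;
run;
```

For the missing observation naive is 1 while guarded stays missing; the two codings agree for 25.55 and 73.7.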
Display 7.24: DATA step illustration for order-of-operations comparisons
data preced_test;
  x1a = 3*2**2;
  x1b = (3*2)**2;
  x2a = 3-2/2;
  x2b = (3-2)/2;
  x3a = -2**2;
  x3b = (-2)**2;
  put '-------------------------';
  put '| Order of operations   |';
  put '| illustrated           |';
  put '-------------------------';
  put '  3*2**2 = ' x1a;
  put '(3*2)**2 = ' x1b;
  put '   3-2/2 = ' x2a;
  put ' (3-2)/2 = ' x2b;
  put '   -2**2 = ' x3a;
  put ' (-2)**2 = ' x3b;
run;

Display 7.25: Output from the SAS LOG for the order of operations illustration
-------------------------
| Order of operations   |
| illustrated           |
-------------------------
  3*2**2 = 12
(3*2)**2 = 36
   3-2/2 = 2
 (3-2)/2 = 0.5
   -2**2 = -4
 (-2)**2 = 4
$H_0: \mu_1 = \mu_2$

T.S.:
$$ t_{stat} = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$
This pooled two-sample t-test assumes that the observations from the two populations are independent and that the two populations are normally distributed with a common variance.
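The pooled statistic can be computed directly from summary statistics in a DATA step, which is a useful hand check on PROC TTEST output. A sketch with hypothetical sample sizes, means and standard deviations; PROBT is the t distribution function, and the reference distribution has n1 + n2 - 2 degrees of freedom.

```sas
data pooled_t;
  n1 = 10;  ybar1 = 5.2;  s1 = 1.1;    /* hypothetical sample 1 summaries */
  n2 = 12;  ybar2 = 4.3;  s2 = 1.4;    /* hypothetical sample 2 summaries */
  sp2   = ((n1-1)*s1**2 + (n2-1)*s2**2) / (n1 + n2 - 2);  /* pooled variance */
  tstat = (ybar1 - ybar2) / sqrt(sp2*(1/n1 + 1/n2));
  df    = n1 + n2 - 2;
  pval  = 2*(1 - probt(abs(tstat), df));                  /* two-sided p-value */
  put sp2= tstat= df= pval=;
run;
```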
STEP 1: start with a program that includes only comments.
Display 7.30: Pseudo-code/comments for the t-test simulation
/* Problem:  Explore whether the t-test really is robust to violations
             of the equal variance assumption
   Strategy: See if the t-test operates at the nominal Type I error rate
             when the equal variance assumption is violated */

/* specify the conditions to be generated */
/* generate data sets reflecting these conditions */
/* calculate the test statistic */
/* accumulate results over numerous simulated data sets */
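One way the commented outline might eventually be filled in is sketched below. All settings are hypothetical choices, not the handout's: 1000 simulated data sets, sample sizes 10 and 40, standard deviations 4 and 1 with equal means (so H0 is true), seed 402, and a 0.05 level. Pairing the smaller sample with the larger variance is a configuration in which the pooled test is generally expected to be liberal. PROC TTEST results are captured with ODS OUTPUT, and the Boolean trick from earlier turns each p-value into a 0/1 rejection indicator whose mean estimates the Type I error rate.

```sas
%let nsim = 1000;                    /* hypothetical number of data sets */

data simdata;
  call streaminit(402);              /* hypothetical seed */
  do sim = 1 to &nsim;
    do group = 1 to 2;
      if group = 1 then do; n = 10; sd = 4; end;   /* small n, large SD */
      else do; n = 40; sd = 1; end;                /* large n, small SD */
      do i = 1 to n;
        y = sd * rand('normal');     /* both means are 0, so H0 is true */
        output;
      end;
    end;
  end;
  keep sim group y;
run;

ods exclude all;                     /* suppress printed output for each BY group */
proc ttest data=simdata;
  by sim;
  class group;
  var y;
  ods output TTests=ttests;          /* pooled and Satterthwaite rows per data set */
run;
ods exclude none;

data ttests;
  set ttests;
  where method = 'Pooled';           /* the equal-variance t-test */
  reject = (probt < 0.05);           /* Boolean trick: 1 if H0 rejected */
run;

proc means data=ttests mean;
  var reject;                        /* mean = estimated Type I error rate */
run;
```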
STEP 2: identify variables that define simulation conditions
need to specify …