Statistical Programming in SAS: Basic Matrix Manipulations and Case Studies using IML - Pr, Study notes of Statistics

Examples of basic matrix manipulations and case studies using the iml procedure in sas. It includes selecting rows and columns, summing over rows and columns, creating matrices, and reading and writing sas data sets. Two case studies are presented: estimating pi using monte carlo integration and implementing a randomization test.

Typology: Study notes

Pre 2010

Uploaded on 08/18/2009

koofers-user-0za
koofers-user-0za 🇺🇸

10 documents

1 / 15

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Statistical Programming in SAS Bailer
Week 12 [30+ Nov.] Class Activities
File: week-12-IML-prog-16nov08.doc
Directory: \\Muserver2\USERS\B\\baileraj\Classes\sta402\handouts
From Chapter 10 - Programming with matrices and vectors - IML
10.1: Basic matrix definition + subscripting
10.2: Diagonal matrices and stacking matrices
10.3: Repeating, Element-wise operations and Matrix Multiplication
10.4 Importing SAS data sets into IML and exporting matrices from IML to data set
10.4.1: Creating matrices from SAS data sets and vice versa
10.5: CASE STUDY 1: Using IML to implement Monte Carlo integration to estimate π
10.6: CASE STUDY 2: IML to implement a bisection root finder
10.7: CASE STUDY 3: Randomization test using matrices imported from PLAN
10.8: CASE STUDY 4: IML module to implement Monte Carlo integration to estimate π
Summary
References
Exercises
SAS data sets are rectangular objects that can be manipulated
using various PROCs.
IML is short for Interactive Matrix Language and represents a
tool for manipulating matrices.
What is a matrix? A matrix can be thought of as a special case
of a data set.
A matrix is a rectangular object where all elements are of the
same data type (e.g. all numeric, all character). Thus,
is a matrix while is not.
=
174.418
157.315
111.217
A
=
1718
1515
1117
Jane
Jungle
George
B
Matrices, such as A, have properties including dimension corresponding to the number of rows
and the number of columns. In this simple example, A has 3 rows and 3 columns or has
dimension 3x3. Elements of a matrix are referenced by their row and column position, for
example, “11” is the (1,3) element of A, or A[1,3]=11. Matrices with one dimension equal to 1
1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff

Partial preview of the text

Download Statistical Programming in SAS: Basic Matrix Manipulations and Case Studies using IML - Pr and more Study notes Statistics in PDF only on Docsity!

Week 12 [30+ Nov.] Class Activities

File: week-12-IML-prog-16nov08.doc

Directory: \Muserver2\USERS\B\baileraj\Classes\sta402\handouts

From Chapter 10 - Programming with matrices and vectors - IML

10.1: Basic matrix definition + subscripting

10.2: Diagonal matrices and stacking matrices

10.3: Repeating, Element-wise operations and Matrix Multiplication

10.4 Importing SAS data sets into IML and exporting matrices from IML to data set

10.4.1: Creating matrices from SAS data sets and vice versa

10.5: CASE STUDY 1: Using IML to implement Monte Carlo integration to estimate π

10.6: CASE STUDY 2: IML to implement a bisection root finder

10.7: CASE STUDY 3: Randomization test using matrices imported from PLAN

10.8: CASE STUDY 4: IML module to implement Monte Carlo integration to estimate π

Summary

References

Exercises

SAS data sets are rectangular objects that can be manipulated

using various PROCs.

IML is short for Interactive Matrix Language and represents a

tool for manipulating matrices.

What is a matrix? A matrix can be thought of as a special case

of a data set.

A matrix is a rectangular object where all elements are of the

same data type (e.g. all numeric, all character). Thus,

is a matrix while is not.

A

Jane

Jungle

George

B

Matrices, such as A , have properties including dimension corresponding to the number of rows

and the number of columns. In this simple example, A has 3 rows and 3 columns or has

dimension 3x3. Elements of a matrix are referenced by their row and column position, for

example, “11” is the (1,3) element of A , or A [1,3]=11. Matrices with one dimension equal to 1

are called vectors. For example, , a 3x1 matrix, is called a column vector while

, a 1x3 matrix is called a row vector. Finally a 1x1 matrix Z=[12] is called a

scalar. Names of matrices in SAS IML are valid SAS names, 1-32 characters in length and can

begin of a letter or underscore and can continue with letters, numbers or underscores. In the next

example, we see how matrices can be defined in PROC IML and perform some basic

manipulations (e.g. extracting a row or column, summing over rows or columns.

X

Y = [ 17 15 18 ]

10.1: Basic matrix definition + subscripting

Display 10.1: Basic matrix manipulations in IML

PROC IML ;

*makes a 2x3 matrix;

C = { 1 2 3 , 4 5 6 };

print '2x3 example matrix C = {1 2 3,4 5 6} =' C;

*select 2nd row;

C_r2 = C[2,];

print '2nd row of C = C[2,] =' C_r2;

*select 3rd column;

C_c3 = C[,3];

print '3rd column of C = C[,3] =' C_c3;

*select last two columns;

Col23 = C[,2:3];

Print ‘Columns 2 and 3 of C =’ Col23;

*select the (2,3) element of C;

C23 = C[2,3];

print '(2,3) element of C = C[2,3] =' C23;

*makes a 1x3 matrix by summing over rows in each column;

C_csum=C[+,];

print '1x3 column sums of C = C[+,] =' C_csum;

*makes a 2x1 matrix by summing over columns in each rows;

C_rsum=C[,+];

print '2x1 row sums of C = C[,+] =' C_rsum;

run;

Display 10.2: Printed output from basic matrix manipulations in

IML

C

2x3 example matrix C = {1 2 3,4 5 6} = 1 2 3 4 5 6 C_R

and a 2x1 matrix by summing over columns in each row;

C1=C[+,];

C2=C[,+];

*makes a matrix (col. vector) out of second column of C;

F = C[,2];

print 'extract 2nd column of C into new vector (F) = C[,2] =' F;

*puts second column of c on diagonal;

D = DIAG( C[,2] );

print 'put 2nd column of C into a diagonal matrix (D) = DIAG(C[,2])

=' D;

*makes a vector out of the diagonal;

CC= VECDIAG(D);

print 'convert diagonal (of D) into vector (CC) = VECDIAG(D) =' CC;

*puts C next to itself - column binds C with itself;

E = C || C;

print 'Column bind C with itself yielding E = C||C =' E;

*puts a row of 2's below C - row bind ;

F = C // SHAPE(2,1,3);

print "Row bind C with vector of 2's (F) = C // SHAPE(2,1,3) =" F;

*creates a 6x6 matrix [C // C // C] || [C // C // C];

K = REPEAT(C,3,2);

print '6x6 matrix = ' K;

quit;

Display 10.4: Output from summarizing, extracting, shaping

and manipulation vectors and matrices in IML

C

2x3 example matrix C = {1 2 3,4 5 6} = 1 2 3 4 5 6 C 1x3 column sums of C = C[+,] = 5 7 9 C 2x1 row sums of C = C[,+] = 6 15 F extract 2nd column of C into new vector (F) = C[,2] = 2 5 D put 2nd column of C into a diagonal matrix (D) = DIAG(C[,2]) = 2 0 0 5 CC convert diagonal (of D) into vector (CC) = VECDIAG(D) = 2 5 Column bind C with itself yielding E = C||C = E

F

Row bind C with vector of 2's (F) = C // SHAPE(2,1,3) = 1 2 3 4 5 6 2 2 2 K 6x6 matrix = 1 2 3 1 2 3 4 5 6 4 5 6 1 2 3 1 2 3 4 5 6 4 5 6 1 2 3 1 2 3 4 5 6 4 5 6

10.3: Repeating, Element-wise operations and Matrix

Multiplication

Display 10.5: Matrix multiplication and element-wise

operations in IML

PROC IML;

C = {1 2 3,4 5 6}; * a 2x3 matrix;

D = {1, 1, 1}; * a 3x1 column vector;

* matrix multiplication – C post-multiplied by D;

row_sum = C*D;

print “row_sum = “ row_sum;

*raises each entry of columns 2 & 3 of C to the third power then

multiples by 3 and adds 3;

G = 3+3*(C[,2:3]##3);

print '3 + 3*(col2&3)^3 (G) = ' G;

*raises each entry of C to itself;

H = C ## C;

print 'raise C elements to itself (H) = C##C =' H;

*multiplies each entry of C by itself;

J = C # C;

print 'element-wise multiplication of C with itself (J) = C#C =' J;

quit;

Display 10.6: Output from IML Matrix multiplication and

element-wise operations

ROW_SUM

row_sum = 6 15 G 3 + 3*(col2&3)^3 (G) = 27 84 378 651

quit;

proc print data=n2;

title 'print of data constructed in IML';

run;

Display 10.8: IML plot of the total young vs. nitrofen concentration data Plot of #young vs. conc 40 ˆ ‚ ‚ ‚ ‚ * * 35 ˆ * ‚ * ‚ * * ‚ * * ‚ * * * 30 ˆ * * ‚ * * N ‚ u ‚ * * * * m ‚ * * b 25 ˆ e ‚ * r ‚ * * ‚ o ‚ * f 20 ˆ ‚ y ‚ o ‚ * u ‚ * n 15 ˆ * * g ‚ ‚ * ‚ * ‚ 10 ˆ ‚ ‚ ‚ * * ‚ * 5 ˆ * ‚ * ‚ ‚ ‚ 0 ˆ * ‚ ‚ ‚ Šƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒ 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 Nitrofen concentration print of data constructed in IML

print ‘Estimating PI using MC simulation methods with ‘ nsim

‘data points’;

print ‘PI-estimate = ‘ pi_est se_est pi_LCL pi_UCL;

quit;

Executing the program above generates the following output.

PI_EST SE_EST PI_LCL PI_UCL

PI-estimate = 3.19 0.025416 3.1391679 3.

10.6 CASE STUDY 2: IML to implement a bisection root

finder [OMIT]

The code executes until the interval is sufficiently narrow, here a convergence criterion of “ hi -

lo ” < 10

is set. The IML code in Display 10.11 generates a data set “ process ” containing the

( lo , hi ) pairs, i.e. the iteration history, which is then printed and plotted We often encounter

examples when we need to solve for roots for an equation, e.g. x such that f(x) = 0 when x is in

some interval, say [a,b], where f(a) < 0 < f(b) or f(a) > 0 > f(b). The following example solves

for the √3 by finding the root of f(x) = x-√3. (This example was used by my colleague Bob

Noble to illustrate IML coding in his class, and I have included it here.) The lower limit “ lo ” and

upper limit “ hi ” are specified to be 0 and 3, respectively. The bisection method finds the

midpoint of the current interval, mid =( lo + hi )/2, and checks to see if mid - √3>0, or equivalently,

“ mid^2 ” > 3. If it is, then our “guess” for the solution, mid , is too high and thus, we set “ hi ” to

“ mid ” and the process begins again. If not, then “ lo ” is set to “ mid ” and the process begins

again. (I know this is a lot of work to find a value that is one stroke on a calculator keypad;

however, this gives you the flavor of the example and demonstrates the flexibility of IML

coding.) This code also saves a matrix with the history of ( lo , hi ) intervals checked..

Display 10.11: Using the method of bisection to estimate √ 3

options ls=78 formdlim='-' nodate pageno= 1 ;

/* find sqrt(x = 3) using bisection

Author: Robert Noble, Miami University

proc iml;

x = 3;

hi = x;

lo = 0;

history = 0||lo||hi;

iteration = 1;

delta = hi - lo;

do while(delta > 1e-7);

mid = (hi + lo)/2;

check = mid**2 > x;

if check

then hi = mid;

else lo = mid;

delta = hi - lo;

history = history//(iteration||lo||hi);

iteration = iteration + 1;

end;

print mid;

create process var {iteration low high};

append from history;

The solution to x-√3=0 is x= 1.732… , i.e. √3=1.732, as we see in Display 10.12. This display also includes the iteration history of this bisection method. We see that the convergence criterion of 10

is achieved in iteration number 26. Display 10.12: Iteration history from using the method of bisection to estimate √ 3


MID

proc print data=process;

run;

/* output from PROC PRINT

Obs ITERATION LOW HIGH 1 0 0.00000 3. 2 1 1.50000 3. 3 2 1.50000 2. 4 3 1.50000 1. 5 4 1.68750 1. 6 5 1.68750 1. 7 6 1.68750 1. 8 7 1.71094 1. 9 8 1.72266 1. 10 9 1.72852 1. 11 10 1.73145 1. 12 11 1.73145 1. 13 12 1.73145 1. 14 13 1.73181 1. 15 14 1.73199 1. 16 15 1.73199 1. 17 16 1.73204 1. 18 17 1.73204 1. 19 18 1.73204 1. 20 19 1.73205 1. 21 20 1.73205 1. 22 21 1.73205 1. 23 22 1.73205 1. 24 23 1.73205 1. 25 24 1.73205 1. 26 25 1.73205 1.


symbol interpol=join value=#;

axis1 label=('Limits');

proc gplot data=process;

plot (low high)*iteration / overlay vaxis=axis1;

run;

quit;

libname class ' folder-containing-nitrofen-data ';

title “Randomization test in IML – Nitrofen conc 0 vs. 160 compared”;

data test; set class.nitrofen;

if conc=0 | conc=160;

proc plan;

factors test=4000 ordered in=20;

output out=d_permut;

run;

proc transpose data=d_permut prefix=in out=out_permut(keep=in1-in20); by

test;

run;

proc iml;

/* read SAS data in IML */

use class.nitrofen;

read all var { total conc } where (conc=0|conc=160) into nitro;

/* read the indices for generating the permutations into IML */

use out_permut;

read all into perm_index;

obs_vec = nitro[,1];

obs_diff = sum(obs_vec[1:10]) - sum(obs_vec[11:20]); * test statistic;

PERM_RESULTS = J(nrow(perm_index),2,0); * initialize results matrix;

do iperm = 1 to nrow(perm_index);

ind = perm_index[iperm,]; * extract permutation index;

perm_resp = obs_vec[ind]; * select corresponding obs;

perm_diff = sum(perm_resp[1:10]) - sum(perm_resp[11:20]);

PERM_RESULTS[iperm,1] = perm_diff; * store perm TS value/indicator;

PERM_RESULTS[iperm,2] = abs(perm_diff) >= abs(obs_diff);

end;

perm_Pvalue = PERM_RESULTS[+,2]/nrow(PERM_RESULTS);

print ‘Permutation P-value = ‘ perm_Pvalue;

Executing the code in Display 10.14 produces a Permutation P-value of 0.03575.

10.8 CASE STUDY 4: IML module to implement Monte Carlo

integration to estimate π

Display 10.15: IML module to estimate π using Monte Carlo

integration

options nocenter nodate;

proc iml;

/* MODULE TO ESTIMATE PI

  • Monte Carlo simulation used
  • Strategy:

Generate X~Unif(0,1) and Y~Unif(0,1)

Determine if Y <= sqrt(1-X*X)

PI/4 estimated by proportion of times condition above is true

  • INPUT

nsim

  • OUTPUT

estimate of PI along with SE and CI

start MC_PI(nsim);

temp_mat = J(nsim,2,0);

call randgen(temp_mat,'uniform');

temp_mat = temp_mat ||

(temp_mat[,2]<=

sqrt(J(nsim,1,1)-temp_mat[,1]#temp_mat[,1]));

pi_over4 = temp_mat[+,3]/nsim;

pi_est = 4*pi_over4;

se_est = 4sqrt(pi_over4(1-pi_over4)/nsim);

pi_LCL = pi_est - 2*se_est;

pi_UCL = pi_est + 2*se_est;

print 'Estimating PI using MC simulation methods with' nsim 'data

points';

print pi_est se_est pi_LCL pi_UCL;

finish MC_PI;

run MC_PI(400);

run MC_PI(1600);

run MC_PI(4000);

quit;

Display 10.16: Results from running with IML module to

estimate π using Monte Carlo integration

Estimating PI using MC simulation methods with 400 data points PI_EST SE_EST PI_LCL PI_UCL 3.29 0.0764183 3.1371635 3. Estimating PI using MC simulation methods with 1600 data points PI_EST SE_EST PI_LCL PI_UCL 3.16 0.0407308 3.0785384 3. Estimating PI using MC simulation methods with 4000 data points PI_EST SE_EST PI_LCL PI_UCL 3.15 0.0258723 3.0982554 3.