Download SAS IML Matrix Operations and Statistical Analysis - Prof. A. John Bailer and more Study notes Statistics in PDF only on Docsity!
Week 11+ “IML” Class Activities
File: week-11-19oct05.doc
Directory (hp/compaq):
C:\baileraj\Classes\Fall 2005\sta402\handouts
C:\Documents and Settings\John Bailer\My
Documents\baileraj\Classes\Fall 2004\sta402\handouts\week-11-
11nov04.doc
SAS/IML Programming
* Basic matrix concepts: rows, columns, scalars
* matrix operators
* subscripting
* matrix functions
* creating matrices from data sets and vice versa
* sample applications
Acknowledgments:
Thanks to Jim Deddens (UC) for sharing handouts that served
as the starting place for a number of the illustrations
contained herein (Deddens Handouts #17-22)
References:
SAS/IML User's Guide on SAS OnlineDoc, Version 8
(http://www.units.muohio.edu/doc/sassystem/sasonlinedocv8/sasdoc/sasht
ml/iml/index.htm)
Introductory remarks
* IML = Interactive Matrix Language – it is a programming
language for matrix manipulations (can control flow of
operations with commands (e.g. DO/END, START/FINISH, IF-THEN/
ELSE, etc.). Lots of built-in matrix manipulation functions
and subroutines are available.
* basic object = data matrix (so can perform operations on
entire matrices)
* can run interactively or store statements in a module
* can build functions/subroutines using these modules
Matrix Definitions and vocabulary
* Matrices have entries of the same type (either all numeric or character – note the difference
from a SAS data set which can include variables of either type).
* Matrix names are 1 to 8 characters (begin with letter or underscore and continue with letters,
numbers or underscores)
* Matrix possess property of DIMENSION (# rows and columns)
* ROW VECTOR = 1xn matrix
COLUMN vector = mx1 matrix
SCALAR = 1x1 matrix
Example 1: Basic matrix definition + subscripting
options nodate nocenter formdlim="-";
PROC IML ;
*makes a 2x3 matrix;
C = { 1 2 3 , 4 5 6 };
print '2x3 example matrix C = {1 2 3,4 5 6} =' C;
*select 2nd row;
C_r2 = C[2,];
print '2nd row of C = C[2,] =' C_r2;
*select 3rd column;
C_c3 = C[,3];
print '3rd column of C = C[,3] =' C_c3;
*select the (2,3) element of C;
C23 = C[2,3];
print '(2,3) element of C = C[2,3] =' C23;
*makes a 1x3 matrix by summing the columns;
C_csum=C[+,];
print '1x3 column sums of C = C[+,] =' C_csum;
*makes a 2x1 matrix by summing the rows;
C_rsum=C[,+];
print '2x1 row sums of C = C[,+] =' C_rsum;
run;
C
2x3 example matrix C = {1 2 3,4 5 6} = 1 2 3 4 5 6
CC= VECDIAG(D);
print 'convert diagonal (of D) into vector (CC) = VECDIAG(D) =' CC;
*exponentiates each entry;
DD = EXP(D);
print 'exponentiate D yielding DD = EXP(D) =' DD;
*puts c next to itself - column binds C with itself;
E = C || C;
print 'Column bind C with itself yielding E = C||C =' E;
*puts a row of 2's below C - row bind ;
F = C // SHAPE( 2 , 1 , 3 );
print "Row bind C with vector of 2's (F) = C // SHAPE(2,1,3) =" F;
*creates a 3x2 matrix with matrix entry C;
K = REPEAT(C, 3 , 2 );
print '6x6 matrix = ' K;
*raises each entry of columns 2 & 3 of C to the third power then multiples by
3 and adds 3;
G = 3 + 3 *(C[, 2 : 3 ]## 3 );
print '3 + 3*(col2&3)^3 (G) = ' G;
*raises each entry of C to itself;
H = C ## C;
print 'raise C elements to itself (H) = C##C =' H;
*multiplies each entry of C by itself;
J = C # C;
print 'elementwise multiplication of C with itself (J) = C#C =' J;
quit;
SAS OUTPUT...
C 2x3 example matrix C = {1 2 3,4 5 6} = 1 2 3 4 5 6
C 1x3 column sums of C = C[+,] = 5 7 9
C 2x1 row sums of C = C[,+] = 6 15
F extract 2nd column of C into new vector (F) = C[,2] = 2 5
D put 2nd column of C into a diagonal matrix (D) = DIAG(C[,2]) = 2 0 0 5
CC convert diagonal (of D) into vector (CC) = VECDIAG(D) = 2
5
DD exponentiate D yielding DD = EXP(D) = 7.3890561 1 1 148.
Column bind C with itself yielding E = C||C = E 1 2 3 1 2 3 4 5 6 4 5 6
F Row bind C with vector of 2's (F) = C // SHAPE(2,1,3) = 1 2 3 4 5 6 2 2 2 K 6x6 matrix = 1 2 3 1 2 3 4 5 6 4 5 6 1 2 3 1 2 3 4 5 6 4 5 6 1 2 3 1 2 3 4 5 6 4 5 6
G 3 + 3*(col2&3)^3 (G) = 27 84 378 651
H raise C elements to itself (H) = C##C = 1 4 27 256 3125 46656
J elementwise multiplication of C with itself (J) = C#C = 1 4 9 16 25 36
Example 3 More IML Matrix definitions
proc iml ;
c1 = shape( 1 , 5 , 1 ); * using shape function;
c2 = { 1 , 1 , 1 , 1 , 1 };
c3 = T({ 1 1 1 1 1 }); * transpose matrix;
c4 = T({[ 5 ] 1 }) ; * using repetition factors;
print 'shape(1,5,1) = ' c1;
print '{1,1,1,1,1} = ' c2;
print 'T({1 1 1 1 1}) = ' c3;
print 'T({[5] 1}) = ' c4;
quit;
C shape(1,5,1) = 1 1 1 1 1
C {1,1,1,1,1} = 1
Plot of #young vs. conc 40 ˆ ‚ ‚ ‚ ‚ * * 35 ˆ * ‚ * ‚ * * ‚ * * ‚ * * * 30 ˆ * * ‚ * * N ‚ u ‚ * * * * m ‚ * * b 25 ˆ e ‚ * r ‚ * * ‚ o ‚ * f 20 ˆ ‚ y ‚ o ‚ * u ‚ * n 15 ˆ * * g ‚ ‚ * ‚ * ‚ 10 ˆ ‚ ‚ ‚ * * ‚ * 5 ˆ * ‚ * ‚ ‚ ‚ 0 ˆ * ‚ ‚ ‚ Šƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒˆƒƒƒƒ 0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320
Nitrofen concentration
print of data constructed in IML
print of data constructed in IML Obs COL1 COL2 COL3 COL 1 27 0 -157 24649 2 32 0 -157 24649
3 34 0 -157 24649 4 33 0 -157 24649 5 36 0 -157 24649 6 34 0 -157 24649 7 33 0 -157 24649 8 30 0 -157 24649 9 24 0 -157 24649 10 31 0 -157 24649 11 33 80 -77 5929 12 33 80 -77 5929 13 35 80 -77 5929 14 33 80 -77 5929 15 36 80 -77 5929 16 26 80 -77 5929 17 27 80 -77 5929 18 31 80 -77 5929 19 32 80 -77 5929 20 29 80 -77 5929 21 29 160 3 9 22 29 160 3 9 23 23 160 3 9 24 27 160 3 9 25 30 160 3 9 26 31 160 3 9 27 30 160 3 9 28 26 160 3 9 29 29 160 3 9 30 29 160 3 9 31 23 235 78 6084 32 21 235 78 6084 33 7 235 78 6084 34 12 235 78 6084 35 27 235 78 6084 36 16 235 78 6084 37 13 235 78 6084 38 15 235 78 6084 39 21 235 78 6084 40 17 235 78 6084 41 6 310 153 23409 42 6 310 153 23409 43 7 310 153 23409 44 0 310 153 23409 45 15 310 153 23409 46 5 310 153 23409 47 6 310 153 23409 48 4 310 153 23409 49 6 310 153 23409 50 5 310 153 23409
Example 5: sample applications I: estimating PI
proc iml ;
nsim = 4000;
temp_mat = J(nsim,3,0);
- Monte Carlo simulation used
- Strategy:
Generate X~Unif(0,1) and Y~Unif(0,1)
Determine if Y <= sqrt(1-X*X)
PI/4 estimated by proportion of times condition above is true
nsim
estimate of PI along with SE and CI
start MC_PI(nsim);
temp_mat = J(nsim, 3 , 0 );
do isim = 1 to nsim;
temp_mat[isim, ] = uniform( ); 1 0 * generate X ~ unif(0,1);
temp_mat[isim, 2 ] = uniform( 0 ); * generate Y ~ unif(0,1);
end;
temp_mat[, 3 ] = (temp_mat[, 2 ]<=
sqrt(J(nsim, 1 1 , )-temp_mat[, 1 ]#temp_mat[, 1 ]));
pi_over4 = temp_mat[+, 3 ]/nsim;
pi_est = 4 *pi_over4;
se_est = 4 sqrt(pi_over4( 1 -pi_over4)/nsim);
pi_LCL = pi_est - 2 *se_est;
pi_UCL = pi_est + 2 *se_est;
print '------------------------------------------------------------------';
print 'Estimating PI using MC simulation methods with ' nsim 'data points';
print 'PI-estimate = ' pi_est;
print 'SE(PI-estimate) = ' se_est;
print 'CI: [' pi_LCL ' , ' pi_UCL ']';
print '------------------------------------------------------------------';
finish MC_PI;
run MC_PI( 400 );
run MC_PI( 1600 );
run MC_PI( 4000 );
quit;
SAS OUTPUT... (edited)
NSIM timatin
PI_EST
SE_EST
Es g PI using MC simulation methods with 400 data points
PI-estimate = 3.
SE(PI-estimate) = 0.
PI_LCL PI_UCL CI: [ 2.911667 , 3.248333 ]
NSIM Estimating PI using MC simulation methods with 1600 data points
_EST
NSIM h 4000 data points
ation test for testing equality
PI_EST PI-estimate = 3.
SE SE(PI-estimate) = 0.
PI_LCL PI_UCL CI: [ 3.0423203 , 3.2076797 ]
Estimating PI using MC simulation methods wit
PI_EST PI-estimate = 3.
SE_EST SE(PI-estimate) = 0.
PI_LCL PI_UCL CI: [ 3.1473562 , 3.2486438 ]
Example 7: Using IML to construct a randomiz
of 2 populations (revisiting WEEK7 example)
/* RECALL – t-test previously conducted
proc ttest ;
title NITROFEN: t-test of ( 0 , 160 ) concentrations;
class conc;
var total;
run ;
NITROFEN: t-test of (0, 160) concentrations The TTEST Procedure Statistics Lower CL Upper CL Lower CL Upper CL d Dev Std Dev Std Err
10 28.827 31.4 33.973 2.4737 3.5963 6.5654 1. 10 26.612 28.3 29.988 1.6229 2.3594 4.3073 0. tal Diff (1-2) 0.2424 3.1 5.9576 2.2981 3.0414 4.4977 1.
Tests
Variable conc N Mean Mean Mean Std Dev St
total 0 total 160 to
T-
perm_Pvalue = PERM_RESULTS[+,2]/nrow(PERM_RESULTS);
print ‘Permutation P-value = ‘ perm_Pvalue;
from SAS OUTPUT...
PERM_PVALUE Permutation P-value = 0.
/* code for testing components */
print nitro;
print perm_index;
obs_vec = nitro[, 1 ];
print obs_vec;
ind = perm_index[ 1 ,];
print ind;
permdat = obs_vec[ind];
print permdat;
tranind = T(ind);
print obs_vec tranind permdat;
/* alternative coding */
obs_vec = shape(nitro[, 1 ], 1 , 20 );
obs_diff = sum(obs_vec[ 1 , 1 : 10 ]) - sum(obs_vec[ 1 , 11 : 20 ]);
print obs_vec obs_diff;
PERM_RESULTS = J(nrow(perm_index), 2 , 0 ); * initialize results
matrix;
do iperm = 1 to nrow(perm_index);
ind = perm_index[iperm,];
perm_resp = shape(obs_vec[ 1 ,ind], 1 , 20 );
perm_diff = sum(perm_resp[ 1 , 1 : 10 ]) - sum(perm_resp[ 1 , 11 : 20 ]);
PERM_RESULTS[iperm, 1 ] = perm_diff;
PERM_RESULTS[iperm, 2 ] = abs(perm_diff) >= abs(obs_diff);
end;
perm_Pvalue = PERM_RESULTS[+, 2 ]/nrow(PERM_RESULTS);
print 'Permutation P-value = ' perm_Pvalue;
Noble IML example 1: bisection method
options ls= 78 formdlim='-' nodate pageno= 1 ;
/* find sqrt(x = 3) using bisection */
proc iml ;
x = 3 ;
hi = x;
lo = 0 ;
history = 0 ||lo||hi;
iteration = 1 ;
delta = hi - lo;
do while(delta > 1e-7 );
mid = (hi + lo)/ 2 ;
check = mid** 2 > x;
if check
then hi = mid;
else lo = mid;
delta = hi - lo;
history = history//(iteration||lo||hi);
iteration = iteration + 1 ;
end;
print mid;
create process var {iteration low high};
append from history;
MID 20509
proc print data=process;
run ;
HIGH 0
5 4 1.68750 1.
7 1.71094 1.
15 1.73199 1.
21 20 1.73205 1.
/* output from PROC PRINT
Obs ITERATION LOW 1 0 0.00000 3. 2 1 1.50000 3. 3 2 1.500 00 2. 4 3 1.
6 5 1.68750 1. 7 6 1.68750 1. 8 9 8 1.72266 1. 10 9 1.72852 1. 11 10 1.73145 1. 12 11 1.73145 1. 13 12 1.73145 1. 14 13 1.73181 1. 15 14 1.73199 1. 16 17 16 1.73204 1. 18 17 1.73204 1. 19 18 1.73204 1. 20 19 1.73205 1.
22 21 1.73205 1. 23 22 1.73205 1.
1. Center and scale all variables
− ⎛^ −
n 1
Y x x x x
k
s n− 1 s n 1 s
1 L k
Yi
Y 1
R
X
y~
Note:
k
Y k
k
1
Y 1
1 0 Y
s
s x
s
s x
Ys n 1
β = − − L
s j n 1
j j
β = for j = 1 ,L,k
2. Let Xy~
X cI)
X
b (^1
R ′
R 1 R
X
X
X cI)
X
E[ b ]= ( ′ + −^ ′ β (i.e. biased estimation)
var (bR^ )= ( X^ ~′X^ ~+ cI)−^1 X^ ~′X^ ~ ( X^ ~′X^ ~+cI)−^1
OLS solution when c = 0
X
X
1 k 2 k 3 k
13 23 3 k
12 23 2 k
12 13 1 k
L
M M M O M
L
L
L
r r r
r r r
r r r
r r r
Example:
A. Suppose k = 3 and the indendepent variables are orthogonal to each other
then
0 1 0 var(b )
~ R
X X
B. Suppose k = 3 and the indendepent variables and two of the variables are
highly correlated with each other then
X)
X
X
X
Let c = 0.
X cI)
X
X(
X
X cI)
X
( -^1 -^1
R
var(b )
Illustration:
Measurements were taken on 17 U.S. Navy hospitals
Inde pendent va riables =
Average daily patient load
Monthly x-ray exposures
Monthly occupied bed days
Eligible population in area (divided by 1000)
Average length of patients’ stay in days
Response = Monthly labor hours
options ls= 78 formdlim='-' nodate pageno= 1 ;
data hospital;
input patient_load x_ray bed_days population stay_length labor_hours;
cards;
%macro find_c(datain=,y=,x=,dataout=);
| parameters |
| datain = dataset to be analyzed |
| y = response variable |
| x = list of independent variables |
| dataout = dataset containing ridge estimates |
proc iml;
/* read in data */
use &datain;
read all var {&x} into data;
/* center */
n = nrow(data);
k = ncol(data);
centered = data -j(n,n, 1 )*data/n;
/* scale */
r = j(k,k,. );
do i = 1 to k;
do j = 1 to k;
r[i,j] = (centered[,i]`*centered[,j])
/sqrt(centered[,i]*centered[,i]*centered[,j]*centered[,j]);
end;
end;
/* find c via bisection */
hi = 1 ;
lo = 0 ;
delta = hi - lo;
do while(delta > 1e-8 );
c = (hi + lo)/ 2 ;
maxvif = max(diag(inv(r+ci(k))rinv(r+ci(k))));
if maxvif > 5
then lo = c;
else hi = c;
delta = hi - lo;
end;
mattrib c label='Biasing constant';
print c;
create temp var {c};
append from c;
data null;
set temp;
call symput('c',c);
run;
proc reg data=hospital outest=&dataout noprint;
model &y = &x / ridge=&c;
run;
%mend ;
% find_c (datain=hospital,
y=labor_hours,
x=patient_load x_ray bed_days population stay_length,
dataout=ridge);
proc print data = ridge;
run ;
quit;
EPVAR_ RIDGE PCOMIT RMSE Intercept
.. 640.826 1957. MODEL1 RIDGE labor_hours 0.031808. 724.538 798.
stay_ labor_ opulation length hours
0.055738 1.69311 -4.07037 -392.649 - 063608 0.43146 2.24981 -171.973 -
Biasing constant
Obs MODEL TYPE _D
1 MODEL1 PARMS labor_hours 2
patient_ Obs load x_ray bed_days p
1 -19. 2 12.5864 0.
Example: Diagonalize a symmetric matrix:
(based on Deddens h.o.)
PROC IML;
A={1 2 3, 4 5 6,5 7 9}; * this inputs a matrix;
* this is the generalized inverse;
; * this computes the eigenvalues of B;
* M is a vector of eigenvalues;
* E is a matrix whose columns are;
* orthogonal eigenvectors;
print A;
G=GINV(A);
print G;
B=A`*A;
print B;
CALL EIGEN(M,E,B)
pr int M;
print E;
M=FUZZ(M);
print M;
BB=EDIAG(M)E`;
print BB;
RUN;
A 1 2 3 4 5 6 5 7 9
G -0.777778 0.6111111 -0. -0.111111 0.1111111 7.027E- 0.5555556 -0.388889 0.
B 42 57 72 57 78 99 72 99 126