Macro Programming in SAS: Trapezoidal Rule and Data Set Concatenation - Prof. A. John Bail, Study notes of Statistics

Examples of macro programming in SAS, including an implementation of the trapezoidal rule for numerical integration and the concatenation of data sets. The trapezoidal rule estimates the area under a curve, while concatenation combines multiple data sets into one. The document also demonstrates macro variables, arrays, loops, and conditional logic.

Uploaded on 08/18/2009

Statistical Programming in SAS Bailer
Week 10-11 [17+ Nov.] Class Activities
File: week-10-11-MACRO-prog-16nov08.doc
Directory: \\Muserver2\USERS\B\\baileraj\Classes\sta402\handouts
From: Chapter 9 - MACRO programming
9. MACRO programming
0. What is a macro and why would you use it?
1. Motivation for Macros: numerical integration to determine P(0<Z<1.645)
2. Macro processing
3. Macro variables
4. Conditional execution, looping and macro programs
5. Debugging macro coding and programming
6. Saving macros - %include +autocall+stored compiled macros
7. Functions/routines of potential interest to macro programmers - %index, %length,
%eval, symput, symget
Exercises
9.0 What is a macro and why would you use it?
* The macro processor is a text processor that is built into SAS.
Q: Why learn more about this text processor?
A: You are executing the same programs with minor
modification with some regularity.
9.1 Motivation for Macros: numerical integration to determine
P(0<Z<1.645)
The trapezoidal rule approximates the area under a curve, f(x), between
specified limits of integration, say "low" and "high", by
summing the areas of adjacent trapezoids
constructed from a collection of points [x1,f(x1)], [x2,f(x2)], … ,
[xk,f(xk)], where x1 = "low" < x2 < … < xk-1 < xk = "high" (see Burden
and Faires (1989) for more description of numerical integration
methods).

If $x_{i+1} - x_i = h$ for $i = 1, \ldots, k-1$, then the estimated area can be written as

$$A = \left\{ \tfrac{1}{2} f(x_1) + f(x_2) + \cdots + f(x_{k-1}) + \tfrac{1}{2} f(x_k) \right\} \times h .$$

This area estimator is implemented in the program listed in Display 9.1, where $k = 25$, $x_1 = 0$, $x_k = 1.645$, $h = (x_k - x_1)/24$ and $f(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$.

Display 9.1: Program for estimating P(0<Z<1.645) using the

trapezoidal rule

/* Calculate P(0 < Z < 1.645) using the trapezoidal rule */

data trapper;

retain trapsum 0;

array x_value(25) x1-x25;

array f_value(25) y1-y25;

low = 0;

high = 1.645;

incr = (high-low)/24;

pi = arcos(-1);

do i= 1 to 25;

x_value[i] = low + incr*(i-1);

f_value[i] = (1/sqrt(2*pi))*exp(-x_value[i]*x_value[i]/2);

if i=1 or i=25 then trapsum = trapsum + f_value[i]/2;

else trapsum = trapsum + f_value[i];

end;

area_est = trapsum*incr;

output;

ods rtf bodytitle;

proc print data=trapper;

title "Trapezoidal Rule area estimate for P(0 < Z < 1.645)";

var low high incr area_est;

run;

data trapper2;

set trapper;

array x_value(25) x1-x25;

array f_value(25) y1-y25;

do ii=1 to 25;

xout = x_value[ii];

yout = f_value[ii];

output;

end;

proc print data=trapper2;

title "Interpolation points for Trapezoidal Rule";

var ii low high incr area_est xout yout;

run;

[PROC PRINT output; columns: Obs, ii, low, high, incr, area_est, xout, yout]
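As a rough check on the trapezoidal estimate, the same probability can be computed from the standard normal CDF. The short sketch below is not part of the original program; it assumes the SAS function PROBNORM and writes a value to the LOG that can be compared with area_est. The result should be close to 0.45.

```sas
/* Cross-check (not in the original handout):            */
/* P(0 < Z < 1.645) computed from the standard normal CDF */
data _null_;
   exact = probnorm(1.645) - probnorm(0);   /* Phi(1.645) - Phi(0) */
   put "P(0 < Z < 1.645) from PROBNORM = " exact 8.5;
run;
```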

[Q] Could you modify this program to draw vertical lines connecting the density values to the x-axis, and to connect the density values to each other, so that the trapezoids being summed to obtain the estimated area are displayed? It would be even cooler to superimpose the true density.

Display 9.4: Output containing the [x,f(x)] values used in the trapezoidal rule estimate of P(0<Z<1.645)

[Figure: plot of function values vs. x-values; vertical axis yout, horizontal axis xout]

A number of questions arise at this point. We have a very

specific implementation for a particular problem. What might

you want to do now?

Q1: How hard would it be to modify this code if we wanted k=50 points vs. k=25 points?

Q2: Could we generalize this code to specify different limits of integration?

Q3: Could we set up a “switch” to produce graphics only when requested?

Q4: Is it possible to package this estimator into a callable “subroutine” or “function?”

Q5: Could we generalize this code for any arbitrary (correctly coded) function?

(For the impatient, the answers to these questions are: not too hard, easily, easily, yes and yes, respectively. The common denominator in the responses to all of these questions is that macros in SAS will be employed.)

9.2 Macro processing

and this interchange will continue until the input stack has been

completely processed. This will be clearer when we look at how

code with macro statements and variables is resolved.

9.3 Macro variables, parameters and functions

A macro variable …

  • can be referenced anywhere in a SAS program (other than in

data lines that are being read into SAS).

  • can be assigned text values, and the value of such a variable is

stored in either local or global symbol tables.

  • is named using a valid SAS name

  • is preceded by an & when it is referenced in a SAS program.

  • if it follows text, then it is referenced as text&macro-variable.

  • if it precedes text, then it is referenced with a period delimiting the value from the text, i.e. &macro-variable.text.

  • can be defined by SAS (an “automatic macro variable”) or defined in a program (a “user-defined macro variable”).
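The two delimiter rules in the list above can be tried directly in open code. The following is a small sketch; the macro variables yr and root are made up for this illustration and do not appear elsewhere in these notes.

```sas
%let yr   = 2008;     /* hypothetical macro variable */
%let root = sales;    /* hypothetical macro variable */

/* macro variable follows text: no delimiter is needed */
%put data&yr;          /* writes data2008 to the LOG */

/* macro variable precedes text: a period ends the name */
%put &root._final;     /* writes sales_final to the LOG */

/* a literal period after the value needs two periods */
%put &root..sas;       /* writes sales.sas to the LOG */
```

Without the period, &root_final would be read as a reference to a macro variable named root_final, which is not defined here.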

How will you usually start using macros in SAS?

A: Most likely by using macro variables to define simple substitutions in a program. In Display 9.5, we modify the trapezoidal numerical integration routine to address Q1 and Q2 from above:

Q1: How hard would it be to modify this code if we wanted k=50 points vs. k=25 points?

Q2: Could we generalize this code to specify different limits of integration?

Use %LET to assign values to the macro variables NPTS (the macro variable for k), LOW and HIGH (the macro variables that correspond to the limits of integration).

Display 9.5: Replacing the limits of integration and the number of points evaluated in a trapezoidal rule estimate of P(low < Z < high).

/* Calculate P(low < Z < high) using the trapezoidal rule */

%let npts = 50;

%let LOW = -1.645;

%let HIGH = 1.645;

data trapper;

file "C:\Users\baileraj\BAILERAJ\BOOK-stat-prog-may05\ch-09\est.out" MOD;

retain trapsum 0;

array x_value(&npts) x1-x&npts;

array f_value(&npts) y1-y&npts;

low = &LOW;

high = &HIGH;

incr = (high-low)/( &npts -1);

pi = arcos(-1);

do i= 1 to &npts;

x_value[i] = low + incr*(i-1);

f_value[i] = (1/sqrt(2*pi))*exp(-x_value[i]*x_value[i]/2);

if i=1 or i=&npts then trapsum = trapsum + f_value[i]/2;

else trapsum = trapsum + f_value[i];

end;

area_est = trapsum*incr;

output;

put;

put "est. P(&LOW < Z < &HIGH) =" area_est "(based on &NPTS points)";

put;

ods rtf bodytitle;

proc print data=trapper;

title "Trapezoidal Rule area estimate for P(&LOW < Z < &HIGH)";

title2 "(based on &NPTS equally spaced points)";

var low high incr area_est;

run;

equivalent statements of the DO-END data step statements, here

%DO and %END.

9.4 Conditional execution, looping and macros

  • The %LET macro statement above can be used anywhere in a program, in so-called “open code.”

  • Other macro statements, such as %do-%end and %if-%then, can only be used in the context of a defined macro.

  • A macro is a collection of commands that are evaluated by the macro processor when invoked. The general construction of a macro involves the declaration of the macro along with any variables that might be passed to it, the statements that are to be invoked in the macro, and a statement declaring the end of the macro. In particular, a macro program looks like the following

%macro prog-name;

%mend prog-name;

which is invoked by a reference to %prog-name. The semi-colon is not needed after the invocation, because %prog-name is talking to the macro processor, not to other SAS components.

Parameters can be passed to macro programs. (Parameter has technical

meaning to the statistical community and a different meaning to the programming community.

Here, a parameter is simply an argument to a macro whose value may influence what a macro

does.)
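The general layout described above, with parameters added, can be sketched as follows. The macro name printfirst and its parameters dsn and nobs are hypothetical, and the sketch assumes the SASHELP.CLASS sample data set that is normally installed with SAS.

```sas
%macro printfirst(dsn=sashelp.class, nobs=5);
   %* dsn:  data set to print (keyword parameter with a default);
   %* nobs: number of observations to print;
   proc print data=&dsn (obs=&nobs);
      title "First &nobs observations of &dsn";
   run;
%mend printfirst;

%printfirst()                           /* uses both default values */
%printfirst(dsn=sashelp.class, nobs=3)  /* overrides nobs */
```

Note that neither invocation ends with a semi-colon.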

First macro programming example: we explore complicated

macro variable constructions. Consider the example in Display 9.8 in which two

macro variables need to be resolved to define a third macro variable. Here, we have variables

that are hypothetical values of weights (weight1, weight2) measured on two separate

occasions (week1, week2). In this simple macro, we use two macro parameters

(variable, obs) to identify the variable and occasion of interest. Our macro is designed to

echo the input to the LOG and to print the value of the variable of interest.

Display 9.8: A simple macro to demonstrate resolution of a more complex macro variable name

%let var1=week;

%let var2=weight;

%let time1=1;

%let time2=2;

%let var1time1 = week1;

%let var1time2 = week2;

%let var2time1 = weight1;

%let var2time2 = weight2;

data tester;

input week1 weight1 week2 weight2;

datalines;

%macro showvalue(variable=, obs=);

%put Value of '&variable' = &variable;

%put Value of '&obs' = &obs;

%put Value of '&&&variable&obs' = &&&variable&obs;

proc print;

var &&&variable&obs;

run;

%mend showvalue;

%showvalue(variable=var1, obs=time1)

%showvalue(variable=var2, obs=time2)

One interesting feature of this program is the construct &&&variable&obs which allows us to

make a few additional observations about macro variables. First, macro variable names are

resolved from left to right. Second, && is resolved to & by the macro processor. In this example,

when &variable=var1(=week) and &obs=time1(=1), &&&variable&obs has value

var1time1(=week1). The SAS LOG from the first invocation of this macro is given in

Display 9.9.

Display 9.9: Output from the LOG associated with invoking the macro showvalue

%showvalue(variable=var1, obs=time1)
Value of '&variable' = var1
Value of '&obs' = time1
Value of '&&&variable&obs' = week1
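The two-pass resolution of &&&variable&obs can be traced in open code. This is only a sketch: the %LET statements for variable and obs stand in for the macro parameters, and the remaining values are copied from Display 9.8.

```sas
%let var1      = week;     /* as in Display 9.8 */
%let time1     = 1;
%let var1time1 = week1;

%let variable = var1;      /* stands in for the macro parameter */
%let obs      = time1;     /* stands in for the macro parameter */

/* Pass 1: && -> &, &variable -> var1, &obs -> time1, giving &var1time1 */
/* Pass 2: &var1time1 -> week1                                          */
%put &&&variable&obs;      /* writes week1 to the LOG */

/* With only two ampersands, pass 1 would give &variabletime1, */
/* which is not a defined macro variable in this example.      */
```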

Macro defined below (trap_area_Z):

As part of the trap_area_Z macro construction, we explicitly address two questions:

Q3: Could we set up a “switch” to produce graphics only when requested?

Q4: Is it possible to package this estimator into a callable “subroutine” or “function?”

We start with the naming of this macro along with its input parameters. These input parameters are assigned default values in the macro declaration.

  • Macro comments were added to describe the input parameters to this macro. These comments begin with %* and end with a semi-colon. (Aside: If you used standard comments, i.e. comments that start with an asterisk and end with a semi-colon, these would be displayed on the LOG. The macro comments won’t be displayed until particular options are set.)

  • We have already converted hard-coded values into macro variables in Display 9.5, where they were assigned values using %LET statements. In the macro below, control of printing, graph generation and the use of ODS RTF are all based on parameters of the macro. The %IF-%THEN conditional statements are used to check whether these displays are requested. For example, in the macro defined below, the block of code

%if &display_graph=TRUE %then %do;

proc gplot data=trapper2;

title "Plot of function values vs. x-values";

plot yout*xout;

run;

%end;

causes the lines

proc gplot data=trapper2;

title "Plot of function values vs. x-values";

plot yout*xout;

run;

to be included in the input stack for processing WHEN the macro parameter display_graph = TRUE. In other words, the %IF-%THEN condition is evaluated by the macro processor. If TRUE, then the code between the %DO and %END statements is processed as part of the input stack. Finally, the macro has the option to evaluate the integral for a range of values of npts. This is accomplished via the %do-%to-%by command paired with a %end statement, here

%do npts = &npts_lo %to &npts_hi %by &npts_by; *loop over npts values;
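The interplay of the %do-%to-%by loop and the %if-%then switch can be seen in isolation in the toy macro below. The macro looper and its parameters are hypothetical and only write to the LOG; no data step is run.

```sas
%macro looper(lo=5, hi=25, by=5, verbose=TRUE);
   %local k;                            /* keep the loop index local to the macro */
   %do k = &lo %to &hi %by &by;         /* macro-level loop over k */
      %if &verbose=TRUE %then %do;      /* switch evaluated by the macro processor */
         %put Iteration with k=&k;
      %end;
   %end;
%mend looper;

%looper()                 /* writes one line each for k=5, 10, 15, 20, 25 */
%looper(verbose=FALSE)    /* loops, but writes nothing */
```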

Putting these ideas together yields the macro that is given in Display 9.10.

Display 9.10: Macro for constructing a trapezoidal rule estimate

of the area under a standard normal curve

%macro trap_area_Z(LOW=-1.645, HIGH=1.645, npts_lo=10, npts_hi=10, npts_by=2,

fout=C:\Users\baileraj\BAILERAJ\BOOK-stat-prog-may05\ch-09\est3.out,

print_est=FALSE, print_pts=FALSE, display_graph=FALSE, ODS_on=FALSE);

%* Purpose: estimate P(LOW < Z < HIGH) using the trapezoidal rule;

%* Macro variables: ;

%* LOW, HIGH: interval of interest;

%* NPTS_LO, NPTS_HI, NPTS_BY: # function values evaluated in area calc.;

%* FOUT: output data file containing area estimate for each NPTS value;

%* PRINT_EST: display PROC PRINT with area estimate;

%* PRINT_PTS: display PROC PRINT with

%* DISPLAY_GRAPH: generate PROC GPLOT with function values;

%* ODS_ON: generate ODS RTF output;

%do npts = &npts_lo %to &npts_hi %by &npts_by; *loop over npts values;

data trapper;

file "&fout" MOD;

retain trapsum 0;

array x_value(&npts) x1-x&npts;

array f_value(&npts) y1-y&npts;

low = &LOW;

high = &HIGH;

incr = (high-low)/( &npts -1);

pi = arcos(-1);

do i= 1 to &npts;

x_value[i] = low + incr*(i-1);

f_value[i] = (1/sqrt(2*pi))*exp(-x_value[i]*x_value[i]/2);

if i=1 or i=&npts then trapsum = trapsum + f_value[i]/2;

else trapsum = trapsum + f_value[i];

end;

area_est = trapsum*incr;

output;

put "est. P(&LOW < Z < &HIGH) =" area_est "(based on &NPTS points)";

%if &ODS_ON=TRUE %then %do; ods rtf bodytitle; %end;

%if &print_est=TRUE %then %do;

proc print data=trapper;

title "Trapezoidal Rule area estimate for P(&LOW < Z < &HIGH)";

title2 "(based on &NPTS equally spaced points)";

var low high incr area_est;

run;

%end;

%ncheck(0.5)

The results from applying the range-checking macro are given in Display 9.12.

Display 9.12: LOG displaying the ncheck macro applied to 3 different arguments

%ncheck(1)
Value of '&npts' = 1
%ncheck(-2)
ERROR: '&npts' must exceed 1
ERROR: value of '&npts' = -2
%ncheck(0.5)
ERROR: '&npts' must exceed 1
ERROR: value of '&npts' = 0.5

We started with 5 questions. We have not yet addressed the 5th question, Q5: Could we generalize this code for any arbitrary (correctly coded) function? We will not work through it here; however, here is a hint to try at home. Where in the code is the standard normal density function defined? Could you define a macro variable that would contain this arbitrary function and pass it as a parameter to some general trap_area macro? Would you need to change more than one line in the macro above?

As a final remark, if you were producing this as a macro that

would be used in a production environment, then you would

need to add checks for valid parameter values. How would you

do this in the trap_area_Z macro?
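One possible starting point is a small range check on the number of points. The sketch below is hypothetical and is not the ncheck macro from the handout; the name check_npts and its messages are made up for illustration. It uses %SYSEVALF because the implicit %EVAL in %IF handles integer arithmetic only and may compare a value such as 0.5 as text.

```sas
%macro check_npts(npts);
   %* hypothetical range check, not the ncheck macro from the handout;
   %if %sysevalf(&npts > 1) %then %do;
      %put Value of npts = &npts is acceptable;
   %end;
   %else %do;
      %put ERROR: npts must exceed 1;
      %put ERROR: value of npts = &npts;
   %end;
%mend check_npts;

%check_npts(10)     /* acceptable */
%check_npts(-2)     /* writes the two ERROR lines */
%check_npts(0.5)    /* writes the two ERROR lines */
```

A check of this kind could be placed at the top of a production macro, before any data step is generated.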

Now, I could claim that this code matches my first attempt at constructing this macro; however, that would be a lie. Alas, the truth is that debugging and error correction were required. The next section touches on some basic strategies for exploring and debugging errors in macro coding.

9.5. Debugging macro coding and programming

The first place to start exploring non-working macro code is to

write out the contents or values of macro variables.

  • You can write the value of a macro variable that you have defined to the LOG using the statement %PUT &macro-var-name;.

  • You can display the SAS automatic macro variables (%PUT _automatic_;), the user-defined macro variables (%PUT _user_;) or both types (%PUT _all_;). For debugging, either specifying particular macro variables or all user-defined macro variables would be more useful.

  • As an aside, it is interesting to see what macro variables are defined by SAS (you might be interested in using some of these variables in your programs). SYSDATE, SYSDAY and SYSTIME are useful for extracting date and time information, while SYSLAST is useful for defining a default data set to use in a macro.

Display 9.13 includes a request to display the values of macro

variables that are automatically defined in SAS.

Display 9.13: Subset of SAS automatic macro variables (edited from %put _automatic_;)

656 %put _automatic_;
AUTOMATIC SYSDATE 06JUL
AUTOMATIC SYSDATE9 06JUL
AUTOMATIC SYSDAY Sunday
AUTOMATIC SYSDSN WORK TRAPPER
AUTOMATIC SYSERR 0
AUTOMATIC SYSLAST WORK.TRAPPER
AUTOMATIC SYSPROCNAME GPLOT
AUTOMATIC SYSTIME 09:
AUTOMATIC SYSVER 9.
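The %PUT requests described in this section can be collected in a few lines of open code. This is a minimal sketch; npts is simply a user-defined macro variable created for the demonstration, and the keywords are written with their underscores (_user_, _automatic_).

```sas
%let npts = 50;                 /* a user-defined macro variable */

%put npts resolves to &npts;    /* one particular macro variable */
%put _user_;                    /* all user-defined macro variables */
%put _automatic_;               /* all automatic macro variables */

/* automatic macro variables can be referenced like any others */
%put Run on &sysday &sysdate9 at &systime with SAS &sysver;
%put Most recently created data set: &syslast;
```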

There are three main “options” that are useful for constructing and debugging macros, namely, symbolgen, mprint and mlogic.
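These options are set with an OPTIONS statement before the macro is invoked and can be switched off afterwards. The sketch below uses a made-up macro, sq, purely as something to trace, and it assumes the option names SYMBOLGEN, MPRINT and MLOGIC, since the LOG excerpt that follows shows SYMBOLGEN output.

```sas
%macro sq(n=3);                 /* hypothetical macro used only for this demo */
   data _null_;
      y = &n * &n;
      put "n=&n squared is " y;
   run;
%mend sq;

options symbolgen mprint mlogic;        /* turn the debugging options on */
%sq(n=4)
options nosymbolgen nomprint nomlogic;  /* and off again */
```

SYMBOLGEN reports how each macro variable resolves, MPRINT echoes the SAS code generated by the macro, and MLOGIC traces the macro's execution.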

SYMBOLGEN: Macro variable NPTS resolves to 5
SYMBOLGEN: Macro variable NPTS resolves to 5
SYMBOLGEN: Macro variable NPTS resolves to 5
SYMBOLGEN: Macro variable LOW resolves to 0
SYMBOLGEN: Macro variable HIGH resolves to 1.
SYMBOLGEN: Macro variable NPTS resolves to 5
SYMBOLGEN: Macro variable ODS_ON resolves to FALSE
SYMBOLGEN: Macro variable PRINT_EST resolves to FALSE

NOTE: The file "C:\Users\baileraj\BAILERAJ\BOOK-stat-prog-may05\ch-09\est3.out" is:
      Filename=C:\Users\baileraj\BAILERAJ\BOOK-stat-prog-may05\ch-09\est3.out,
      RECFM=V,LRECL=256,File Size (bytes)=757,
      Last Modified=06Jul2008:10:39:57,
      Create Time=02Jul2008:15:41:

NOTE: 1 record was written to the file "C:\Users\baileraj\BAILERAJ\BOOK-stat-prog-may05\ch-09\est3.out".
      The minimum record length was 54.
      The maximum record length was 54.
NOTE: The data set WORK.TRAPPER has 1 observations and 17 variables.
NOTE: DATA statement used (Total process time):
      real time 0.03 seconds
      cpu time 0.03 seconds

SYMBOLGEN: Macro variable NPTS resolves to 5
SYMBOLGEN: Macro variable NPTS resolves to 5
SYMBOLGEN: Macro variable NPTS resolves to 5
SYMBOLGEN: Macro variable NPTS resolves to 5
SYMBOLGEN: Macro variable NPTS resolves to 5
SYMBOLGEN: Macro variable PRINT_PTS resolves to FALSE
SYMBOLGEN: Macro variable DISPLAY_GRAPH resolves to TRUE

NOTE: There were 1 observations read from the data set WORK.TRAPPER.
NOTE: The data set WORK.TRAPPER2 has 5 observations and 20 variables.
NOTE: DATA statement used (Total process time):
      real time 0.02 seconds
      cpu time 0.01 seconds

SYMBOLGEN: Macro variable ODS_ON resolves to FALSE

NOTE: There were 5 observations read from the data set WORK.TRAPPER2.
NOTE: PROCEDURE GPLOT used (Total process time):
      real time 0.47 seconds
      cpu time 0.32 seconds

Display 9.15 presents a portion of the SAS LOG output when the mprint option is set

prior to the invocation of the trap_area_Z macro. This option writes the SAS code resulting

from the execution of a macro. After the looping over npts values, the first lines of this macro

include

data trapper;

file "&fout" MOD;

retain trapsum 0;

array x_value(&npts) x1-x&npts;

array f_value(&npts) y1-y&npts;

low = &LOW;

high = &HIGH;

incr = (high-low)/( &npts -1);

pi = arcos(-1);

do i= 1 to &npts;

x_value[i] = low + incr*(i-1);

f_value[i] = (1/sqrt(2*pi))*exp(-x_value[i]*x_value[i]/2);

if i=1 or i=&npts then trapsum = trapsum + f_value[i]/2;

else trapsum = trapsum + f_value[i];

end;

which results in the SAS LOG output

MPRINT(TRAP_AREA_Z): data trapper;
MPRINT(TRAP_AREA_Z): file "C:\Users\baileraj\BAILERAJ\BOOK-stat-prog-may05\ch-09\est3.out" MOD;
MPRINT(TRAP_AREA_Z): retain trapsum 0;
MPRINT(TRAP_AREA_Z): array x_value(5) x1-x5;
MPRINT(TRAP_AREA_Z): array f_value(5) y1-y5;
MPRINT(TRAP_AREA_Z): low = 0;
MPRINT(TRAP_AREA_Z): high = 1.96;
MPRINT(TRAP_AREA_Z): incr = (high-low)/( 5 -1);
MPRINT(TRAP_AREA_Z): pi = arcos(-1);
MPRINT(TRAP_AREA_Z): do i= 1 to 5;
MPRINT(TRAP_AREA_Z): x_value[i] = low + incr*(i-1);
MPRINT(TRAP_AREA_Z): f_value[i] = (1/sqrt(2*pi))*exp(-x_value[i]*x_value[i]/2);
MPRINT(TRAP_AREA_Z): if i=1 or i=5 then trapsum = trapsum + f_value[i]/2;
MPRINT(TRAP_AREA_Z): else trapsum = trapsum + f_value[i];
MPRINT(TRAP_AREA_Z): end;

The values of the macro variables are substituted here (e.g. 5 for &npts, 0 for &low, 1.96 for

&high, etc.), and this is the SAS code that is now processed and executed. While it doesn’t

have the nice indenting for ease of reading, it does show the code that will be executed.

Display 9.15: LOG output produced by the mprint option when used with the macro invocation %trap_area_Z(LOW=0,HIGH=1.96,npts_lo=5,npts_hi=25,npts_by=5,display_graph=TRUE)

MPRINT(TRAP_AREA_Z): data trapper;
MPRINT(TRAP_AREA_Z): file "C:\Users\baileraj\BAILERAJ\BOOK-stat-prog-may05\ch-09\est3.out" MOD;
MPRINT(TRAP_AREA_Z): retain trapsum 0;
MPRINT(TRAP_AREA_Z): array x_value(5) x1-x5;
MPRINT(TRAP_AREA_Z): array f_value(5) y1-y5;
MPRINT(TRAP_AREA_Z): low = 0;
MPRINT(TRAP_AREA_Z): high = 1.96;
MPRINT(TRAP_AREA_Z): incr = (high-low)/( 5 -1);
MPRINT(TRAP_AREA_Z): pi = arcos(-1);
MPRINT(TRAP_AREA_Z): do i= 1 to 5;
MPRINT(TRAP_AREA_Z): x_value[i] = low + incr*(i-1);
MPRINT(TRAP_AREA_Z): f_value[i] = (1/sqrt(2*pi))*exp(-x_value[i]*x_value[i]/2);
MPRINT(TRAP_AREA_Z): if i=1 or i=5 then trapsum = trapsum + f_value[i]/2;
MPRINT(TRAP_AREA_Z): else trapsum = trapsum + f_value[i];
MPRINT(TRAP_AREA_Z): end;
MPRINT(TRAP_AREA_Z): area_est = trapsum*incr;
MPRINT(TRAP_AREA_Z): output;