SAS Data Manipulation: Creating and Modifying Variables - Prof. James Davenport, Study notes of Statistics

These notes show how to create and modify variables in SAS using DATA steps, with a focus on adding information to observations and using IF-THEN statements. They also cover efficient use of variable space and changing the action of a DATA step. Examples use the 'travel' data set.

Uploaded on 02/12/2009

When creating more than one new data set in a single DATA step, using the DROP= or KEEP= data set options in the DATA statement allows you to drop or keep variables separately in each of the new data sets.
A DROP or KEEP statement, on the other hand, applies to all of the new data sets being created.
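
The difference can be sketched as follows. This is a minimal illustration, assuming the permanent data set save.travel used later in these notes and its variables country, vendor, and aircost; the output data set names are hypothetical.

```sas
/* KEEP= data set options: each output data set gets its own variable list */
data work.costs   (keep=country aircost)
     work.vendors (keep=country vendor);
   set save.travel;
run;

/* DROP statement: vendor is dropped from BOTH output data sets */
data work.copy1 work.copy2;
   set save.travel;
   drop vendor;
run;
```

In the first step work.costs and work.vendors end up with different variables; in the second step the two outputs are identical because the DROP statement is not tied to any one data set.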

When you create permanent SAS data sets, they should have:
1) Data set labels
2) Variable labels
3) Variable formats (either SAS-supplied or customized)
4) Variable value labels (if needed)
Additional labels and formats may be modified locally by procedures.
These can be done in:
a) PROC DATASETS
b) during a DATA step
   can use the LABEL statement
   can use the FORMAT statement
c) PROC FORMAT
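
A rough sketch of the three approaches is given here. It assumes save.travel contains a numeric variable aircost along with country and vendor; the format name costfmt, the label text, and the data set name travel_labeled are made up for illustration.

```sas
/* c) PROC FORMAT: define a custom format (value labels) */
proc format;
   value costfmt low -< 500 = 'Under 500'
                 500 - high = '500 or more';
run;

/* b) DATA step: data set label, variable labels, and a SAS-supplied format */
data work.travel_labeled (label='Travel data with labels');
   set save.travel;
   label aircost = 'Airfare cost'
         vendor  = 'Tour vendor';
   format aircost dollar8.;
run;

/* a) PROC DATASETS: change labels and formats without rereading the data */
proc datasets library=work nolist;
   modify travel_labeled (label='Tour costs by country');
   label country = 'Destination country';
   format aircost costfmt.;
quit;
```

PROC DATASETS changes only the descriptive portion of the data set, so it is usually the cheaper choice when the data values themselves do not change.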

Quick ways to find out what's in a SAS data set:
1) PROC CONTENTS;
2) PROC PRINT LABEL;
3) PROC FREQ;
   TABLES;
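
Spelled out as complete steps, these might look like the sketch below. It assumes the save.travel data set and the variables vendor and country used elsewhere in these notes; the OBS=10 limit is only there to keep the listing short.

```sas
/* 1) Descriptive portion: variable names, types, lengths, labels, formats */
proc contents data=save.travel;
run;

/* 2) First 10 observations, with variable labels as column headings */
proc print data=save.travel (obs=10) label;
run;

/* 3) Frequency counts of the distinct values of selected variables */
proc freq data=save.travel;
   tables vendor country;
run;
```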

Assume a permanent SAS data set called "travel" exists in the SAS data library with the libref name "save".

***** How the SAS System Processes a DATA Step *****

• During compilation, the SAS system reads the entire step through once from beginning to end
  o Checks syntax, organization, and keyword spelling
  o Builds the program data vector, the input buffer (if needed), and the descriptive portion of the SAS data set (work.name)
• Execution consists of a loop: by default, the SAS system goes through the DATA step loop once for each observation it processes.
• Most DATA step statements tell the SAS system how to change or add to the input data, i.e. update a value that is currently entered
• Some statements tell the SAS system more about the data set (such as storage space)
• Some statements delete observations that are not needed, output additional observations, or change the order in which program statements are carried out.
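
One way to watch the execution loop is to write a line to the log on every pass. The sketch below assumes save.travel and its variables country and vendor; DATA _NULL_ is used so that no output data set is created.

```sas
data _null_;
   set save.travel;
   /* _N_ is the automatic variable that counts DATA step iterations */
   put _n_= country= vendor=;
run;
```

The log then shows one line per observation read, with _N_ increasing by one each time.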

***** Adding Information to Some Observations but Not Others *****

Basic use of the IF-THEN and ELSE statements

data bonusinfo;
   set save.travel;
   if vendor = 'Hispania' then remarks = 'For 10+ people';
   else if vendor = 'Mundial' then remarks = 'Yes';
run;

proc print data=bonusinfo;
   var country vendor remarks;
   title 'The SAS System Creates Variables for All Observations';
run;

Note: remarks is a character variable, and because the first value that SAS encounters for remarks contains 14 characters, the SAS system sets aside 14 bytes of storage in each observation for remarks, whether the actual value is a 14-character phrase or simply a blank.
(See Program SAS_BonusInfo_travel.sas)
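
Because the length is fixed by the first assignment in the program, the order of the assignments matters. The sketch below is a hypothetical variation on the step above (the data set name truncdemo is made up) in which the shorter value comes first.

```sas
data work.truncdemo;
   set save.travel;
   /* 'Yes' appears first, so remarks is created with a length of 3 */
   if vendor = 'Mundial' then remarks = 'Yes';
   /* This longer value is truncated to 3 characters and stored as 'For' */
   else if vendor = 'Hispania' then remarks = 'For 10+ people';
run;

proc print data=work.truncdemo;
   var country vendor remarks;
run;
```

Putting the longest value first, or declaring the length explicitly with a LENGTH statement before the first assignment, avoids the truncation.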

***** Changing Information without Adding Variables *****

Instead of creating a new variable, you can modify an existing variable.

data newair2;
   set save.travel;
   aircost = aircost + 10;
run;

proc print data=newair2;
   var country aircost;
   title 'Changing the Information in a Variable';
run;

(See Program SAS_NewAirCost2_travel.sas)

***** Using Variables Efficiently *****

Avoid creating variables that have many empty cells or missing values (an inefficient use of storage space).

data tourinfo; /* inefficient use of variables */
   set save.travel;
   if vendor = 'Hispania' then bonuspts = 'For 10+ people';
   else if vendor = 'Mundial' then bonuspts = 'Yes';
   else if vendor = 'Major' then discount = 'For 30+ people';
run;

proc print data=tourinfo;
   var country vendor bonuspts discount;
   title 'Using Variables Inefficiently';
run;

(See Program SAS_Tourinfo_travel.sas)
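
A quick way to see how sparsely the two variables are filled is to count their blank values. This sketch assumes the tourinfo data set created above; the MISSING option makes PROC FREQ report missing (blank) values as a level of their own.

```sas
proc freq data=tourinfo;
   tables bonuspts discount / missing;
run;
```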

data newinfo; /* efficient use of variable space */
   set save.travel;
   length remarks $ 30;
   if vendor = 'Hispania' then remarks = 'For 10+ people';
   else if vendor = 'Mundial' then remarks = 'Yes';
   else if vendor = 'Major' then remarks = '30+ people - contact for info';
run;

proc print data=newinfo;
   var country vendor remarks;
   title 'Using Variables Efficiently';
run;

(See Program SAS_Tourinfo_New1_travel.sas)

***** Changing the Action of a DATA Step *****

Don't forget, we can change the number of observations and variables (see "DROP" and "KEEP", and "OBS" and "FIRSTOBS").

data subset;
   set save.travel;
   if country = 'Peru' then delete;
run;

(See Program SAS_Subset_travel.sas)
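
The options mentioned above can be combined in a single step, as in the sketch below. The data set name subset2 and the particular observation numbers are arbitrary choices for illustration; the subsetting IF shown here is an alternative to IF-THEN DELETE, not the method used in the program referenced above.

```sas
data work.subset2 (keep=country vendor);      /* keep only two variables   */
   set save.travel (firstobs=2 obs=5);        /* read observations 2 to 5  */
   /* Subsetting IF: processing continues only when the condition is true */
   if country ne 'Peru';
run;
```

Note that OBS= gives the number of the last observation to read, not a count, so FIRSTOBS=2 with OBS=5 reads four observations.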