SAS Data Sets: Subsetting, Variables, and Multiple Sets - Prof. James Davenport, Study notes of Statistics

How to access and modify existing SAS data sets by creating subsets of observations and variables, keeping or dropping specific variables, and creating multiple data sets within a single Data Step. It also discusses the differences between using the DROP= and KEEP= options in the Set and Data statements.

Typology: Study notes

Pre 2010

Uploaded on 02/12/2009

koofers-user-n8a
Let’s assume a permanent SAS data set called "one" exists in the SAS data
library with the libref name "asdl".
We already know how to create such a data set and examine its
contents, but how do you access and modify an existing data set?
    data two;
      set asdl.one;   /* This produces sequential processing of the observations in one. */
There are no INPUT or CARDS statements!
While creating data two we can:
. create new variables via transformations
. choose selected observations
. choose selected variables
And we can create additional permanent SAS data sets within a
single Data Step.
(See Program SAS_ModifyVariables_census_data5.sas)
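The three bullets above can be sketched in one short Data Step. This is only an illustration, not the course program: the variable names (name, height, weight, ratio) are invented, and the libref asdl is pointed at the WORK directory so the sketch runs without a permanent library.

```sas
/* Assumption: map asdl to the WORK directory so this runs anywhere.
   In the course, asdl refers to a permanent library. */
libname asdl "%sysfunc(pathname(work))";

/* Hypothetical stand-in for asdl.one (invented variables). */
data asdl.one;
  input name $ height weight;
  datalines;
Ann 60 110
Bob 70 180
Cal 65 150
;
run;

data two;
  set asdl.one;               /* read asdl.one one observation at a time */
  ratio = weight / height;    /* create a new variable via a transformation */
  if weight > 140;            /* choose selected observations (subsetting IF) */
  keep name ratio;            /* choose selected variables */
run;

proc print data=two;
run;
```

With this toy input, two should contain only the observations for Bob and Cal, each with the variables name and ratio.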

The following examples use a data set called “origin”.
(See Program SAS_Create_origin_.sas)
(See Program SAS_ProcContents_origin.sas)
Let’s now focus on accessing a permanent SAS data set and forming new SAS data sets that are subsets of the original data set.
(Go over the “sub-setting” diagram)

***** Selecting subsets of observations: *****
firstobs=n & obs=m
e.g.
    data asdl.origin_subsets1;
      set asdl.origin (firstobs=7);          /* first obs kept */
or
    data asdl.origin_subsets3;
      set asdl.origin (obs=10);              /* last obs kept */
or
    data asdl.origin_subsets2;
      set asdl.origin (firstobs=4 obs=10);   /* obs= is the last observation read, not the number of observations */
In the last example, observations 4 through 10 are read, so the new data set has 7 observations.
(See program: SAS_Subset_obs_origin.sas)
We could also use "IF statements" (see example).
(See program: SAS_Subset_if_origin.sas)
We will discuss this in more detail later in the semester.
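To check the firstobs= and obs= behavior on something small, here is a minimal sketch. The data set below is a made-up 12-observation stand-in for asdl.origin (the variables id and x are invented), asdl is again assumed to point at the WORK directory, and the output data set names are hypothetical.

```sas
/* Assumption: map asdl to the WORK directory so this runs anywhere. */
libname asdl "%sysfunc(pathname(work))";

/* Hypothetical stand-in for asdl.origin: 12 observations, id = 1..12. */
data asdl.origin;
  do id = 1 to 12;
    x = id * 10;
    output;
  end;
run;

data sub1;                               /* reads observations 7 through 12 */
  set asdl.origin (firstobs=7);
run;

data sub3;                               /* reads observations 1 through 10 */
  set asdl.origin (obs=10);
run;

data sub2;                               /* reads observations 4 through 10 */
  set asdl.origin (firstobs=4 obs=10);
run;

data sub_if;                             /* subsetting IF: selects by value, not position */
  set asdl.origin;
  if x > 60;
run;
```

The log should report 6 observations for sub1, 10 for sub3, 7 for sub2, and 6 for sub_if. The contrast between sub2 and sub_if is the point: firstobs= and obs= select by position in the input data set, while an IF statement selects by the values of the variables.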

■ Creating two Data Sets in a single Data Step
    data asdl.origin_first  (keep=name a b c d .... )     /* no semicolon here */
         asdl.origin_second (keep=name a d e f .... );
      set asdl.origin;
In this case, you must use the KEEP= data set option. If you use the KEEP Statement, then all data sets created in this Data Step would contain the same variables.
(See Program SAS_Two_Subsets_origin.sas)

Differences in DROP= & KEEP= as options in the Data Statement vs the Set Statement
■ Using these options in the Set statement determines which variables are read from the permanent SAS data set being used as input; hence they determine how the program data vector is built. (Excluded variables are never read into the program data vector at all.)
■ Using these options in the Data statement determines which variables are written from the program data vector to the resulting SAS data set.
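A small runnable sketch of the two-data-set pattern follows. The input is an invented stand-in for asdl.origin with only the variables name, a, b, and c, and asdl is assumed to point at the WORK directory. It is meant only to show that each KEEP= option applies to its own output data set.

```sas
/* Assumption: map asdl to the WORK directory so this runs anywhere. */
libname asdl "%sysfunc(pathname(work))";

/* Hypothetical stand-in for asdl.origin (invented variables). */
data asdl.origin;
  input name $ a b c;
  datalines;
Ann 1 2 3
Bob 4 5 6
;
run;

/* One Data statement, two output data sets, one semicolon at the end. */
data asdl.origin_first  (keep=name a b)
     asdl.origin_second (keep=name c);
  set asdl.origin;
run;

proc contents data=asdl.origin_first;    /* should list name, a, b */
run;
proc contents data=asdl.origin_second;   /* should list name, c */
run;
```

Replacing the two KEEP= options with a single KEEP statement inside the step would give both data sets the same variable list, which is why the data set option is needed here.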

■ You can use a variable from an input data set (the one read with the Set statement) to perform a calculation, and it must be in the program data vector in order for you to use it. But if you do not want the variable to appear in the resulting data set, then you can use the DROP= option in the Data statement to exclude it when the program data vector is written to the new data set.
    data asdl.first  (keep=name a b c d)
         asdl.second (keep=name a d e f);
      set asdl.origin (drop=id);
This is an example of these options used in both statements. The variable “id” is NEVER read into the program data vector.
■ In the Set Statement, this controls which variables are read into the program data vector.
■ In the Data Statement, this controls which variables are written from the program data vector to the data set.
NOTE: Using a DROP or KEEP Statement within a Data Step is comparable to using DROP= or KEEP= options in the Data Statement. All variables are included in the program data vector; they are excluded when the observation is written
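The difference between the two placements can be seen in a sketch like the one below. The input data set, its variables (id, name, a, b), the computed variable total, and the output name asdl.result are all invented for illustration, and asdl is assumed to point at the WORK directory.

```sas
/* Assumption: map asdl to the WORK directory so this runs anywhere. */
libname asdl "%sysfunc(pathname(work))";

/* Hypothetical stand-in for asdl.origin (invented variables). */
data asdl.origin;
  input id name $ a b;
  datalines;
1 Ann 2 3
2 Bob 4 5
;
run;

data asdl.result (drop=b);        /* b is in the PDV but is not written out */
  set asdl.origin (drop=id);      /* id is never read into the PDV */
  total = a + b;                  /* b can still be used in the calculation */
run;

proc print data=asdl.result;      /* should show name, a, total */
run;
```

Had b been dropped in the Set statement instead, it would not be available for the calculation: SAS would treat b as a new, uninitialized variable and total would be missing.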