



How to access and modify existing SAS data sets by creating subsets of observations and variables, keeping or dropping specific variables, and creating multiple data sets within a single Data Step. It also discusses the differences between using the DROP= and KEEP= options in the Set and Data statements.
Typology: Study notes




Let’s assume a permanent SAS data set called "one" exists in the SAS data library with the libref "asdl". We already know how to create and examine the contents of such a data set, but how do you access and modify an existing data set?

    data two;
      set asdl.one;    /* This produces sequential processing of the observations in one. */
    run;

There are no INPUT or CARDS statements! While creating data set two we can:
. create new variables via transformations
. choose selected observations
. choose selected variables
And we can create additional permanent SAS data sets within a single Data Step.
(See Program SAS_ModifyVariables_census_data5.sas)
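The three kinds of modification listed above can be sketched in one short Data Step. This is only an illustration under assumptions: the data set one is built here in the WORK library so the code runs without a libname path, and the variables name, height and weight are hypothetical, not taken from asdl.one.

```sas
/* Hypothetical input data set standing in for asdl.one */
data one;
  input name $ height weight;
  cards;
Ann 60 110
Bob 70 180
Cal 65 150
Dee 62 120
;
run;

/* Read one sequentially and modify it while building two */
data two;
  set one;
  ratio = weight / height;   /* new variable via a transformation */
  if height >= 62;           /* subsetting IF: choose selected observations */
  keep name ratio;           /* choose selected variables */
run;

proc print data=two;
run;
```

Note that height and weight are still available for the calculation and the IF test even though the KEEP statement leaves them out of two.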
The following examples use a data set called “origin”.
(See Program SAS_Create_origin_.sas)
(See Program SAS_ProcContents_origin.sas)

Let’s now focus on accessing a permanent SAS data set and forming new SAS data sets that are subsets of the original data set. (Go over the “sub-setting” diagram)

***** Selecting subsets of observations: *****

FIRSTOBS=n and OBS=m, e.g.

    data asdl.origin_subsets1;
      set asdl.origin (firstobs=7);          /* 7 is the first obs kept */
    run;

or

    data asdl.origin_subsets3;
      set asdl.origin (obs=10);              /* 10 is the last obs kept */
    run;

or

    data asdl.origin_subsets2;
      set asdl.origin (firstobs=4 obs=10);   /* obs=10 is the last observation read, not the number of observations */
    run;

(See program: SAS_Subset_obs_origin.sas)

We could also use IF statements to select observations.
(See program: SAS_Subset_if_origin.sas)
We will discuss this in more detail later in the semester.
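To see what FIRSTOBS= and OBS= actually read, here is a minimal sketch on a small stand-in data set. The 12-observation data set origin, its single variable id, and the output names are assumptions made for illustration; everything is kept in the WORK library.

```sas
/* Hypothetical 12-observation data set standing in for asdl.origin */
data origin;
  do id = 1 to 12;
    output;
  end;
run;

data subset1;
  set origin (firstobs=7);          /* reads observations 7 through 12 */
run;

data subset3;
  set origin (obs=10);              /* reads observations 1 through 10 */
run;

data subset2;
  set origin (firstobs=4 obs=10);   /* reads observations 4 through 10: 7 observations, not 10 */
run;

/* A subsetting IF selects by value rather than by position */
data subset_if;
  set origin;
  if id > 8;                        /* keeps the observations with id 9 through 12 */
run;
```

The third step is the one to watch: OBS=10 marks where reading stops, so combined with FIRSTOBS=4 it yields seven observations.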
■ Creating two Data Sets in a single Data Step

    data asdl.origin_first (keep=name a b c d .... )      /* no semicolon here */
         asdl.origin_second (keep=name a d e f .... );
      set asdl.origin;
    run;

In this case, you must use the KEEP= data set option. If you use the KEEP Statement, then all data sets created in this Data Step would contain the same variables.
(See Program SAS_Two_Subsets_origin.sas)

Differences in DROP= & KEEP= as options in the Data Statement vs the Set Statement
■ Using these options in the Set statement determines which variables are read from the permanent SAS data set being used as input; hence they determine how the program data vector is built. (Excluded variables are never read into the program data vector at all.)
■ Using these options in the Data statement determines which variables are written from the program data vector to the resulting SAS data set.
■ You can use a variable from an input data set (the one read with the Set statement) to perform a calculation, and it must be in the program data vector in order for you to use it. But if you do not want the variable to appear in the resulting data set, you can use the DROP= option in the Data statement to exclude it when the program data vector is written to the new data set.

Example of these options used in both statements:

    data asdl.first (keep=name a b c d)
         asdl.second (keep=name a d e f);
      set asdl.origin (drop=id);    /* the variable “id” is NEVER read into the program data vector */
    run;

■ In the Set Statement, these options control which variables are read into the program data vector.
■ In the Data Statement, these options control which variables are written from the program data vector to the data set.

NOTE: Using a DROP or KEEP Statement within a Data Step is comparable to using DROP= or KEEP= options in the Data Statement. All variables are included in the program data vector; they are excluded when the observation is written to the output data set.
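The difference between the two placements can be tried out with a small sketch. The input data set origin, its variables (id, name, a, b, c), the computed variable total, and the output data set names are all hypothetical, and the WORK library is used so no libname is needed.

```sas
/* Hypothetical input standing in for asdl.origin */
data origin;
  input id name $ a b c;
  cards;
1 Ann 10 20 30
2 Bob 40 50 60
;
run;

/* Two output data sets, each with its own KEEP= list */
data origin_first  (keep=name a total)   /* no semicolon: the DATA statement continues */
     origin_second (keep=name b c);
  set origin (drop=id);                  /* id is never read into the program data vector */
  total = a + b + c;                     /* b and c are in the PDV, so they can be used here */
run;

/* DROP statement: behaves like DROP= in the DATA statement */
data origin_third;
  set origin;
  total = a + b + c;                     /* b and c are still available for the calculation */
  drop b c;                              /* excluded only when the observation is written */
run;

proc contents data=origin_first;
run;
```

If b were instead listed in DROP= on the Set statement, it would not be read from origin, and total would be computed from a missing value. That is the practical reason to exclude calculation inputs in the Data statement rather than in the Set statement.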