SAS Programming: Extracting & Manipulating Character Values - Prof. James Davenport, Study notes of Statistics

These notes explain how to extract and manipulate character values in SAS using the SCAN, LEFT, and TRIM functions and the concatenation operator. Topics include isolating a portion of a character value with SCAN, aligning values with LEFT, setting the lengths of variables created by SCAN, combining character values by concatenation, removing interior blanks with TRIM, and handling variables that appear truncated. The notes also discuss treating numbers as characters and the importance of choosing appropriate lengths for variables.

Typology: Study notes

Pre 2010

Uploaded on 02/12/2009

koofers-user-ne6

*** Creating New Character Values ***
You can divide long values into pieces, combine existing values to make a
longer value, read pieces, and so on.
Extracting a Portion of a Character Value
How do you isolate one piece of a character variable? For example, the
value of OTHRGATE contains two cities: the city of arrival and the city of
departure. How do you divide that value so that you can create separate
variables for the two cities? The SCAN function gives you this capability.
The SCAN function selects a term from a character value; the term can be
any character string and the divider for terms (called the delimiter) can be
any character or any list of characters:
The general syntax: SCAN(source,n,<list-of-delimiters>)
source: can be any kind of character expression, including character
variables, character constants, and so on.
n: the number of the term in the list to be selected from the source.
list-of-delimiters: gives one or more delimiters.
If you specify more than one delimiter, then the SAS system uses any of
them; if you omit the delimiter, the SAS system divides the words
according to a default list of delimiters (including the blank and some
special characters).
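
The syntax above can be tried with a minimal sketch like the following. The data set name scan_demo, the new variable names, the lengths, and the sample value are assumptions made for illustration; only SCAN, its arguments, and the variable name OTHRGATE come from the notes.

```sas
/* Hypothetical example data; only the name OTHRGATE follows the notes. */
data scan_demo;
   length othrgate first second word2 $ 30;
   othrgate = 'Rio de Janeiro, Sao Paulo';

   /* Explicit delimiter list: split only at commas. */
   first  = scan(othrgate, 1, ',');
   second = scan(othrgate, 2, ',');

   /* No delimiter list: SAS falls back on its default delimiters,
      which include the blank. */
   word2  = scan(othrgate, 2);
run;

proc print data=scan_demo;
run;
```

With the comma as the only delimiter, SECOND begins with the blank that follows the comma. Without a delimiter list, the second term is "de", because the blank is one of the default delimiters.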

arvgate=scan(othrgate,1,',');
deptgate=scan(othrgate,2,',');
It is better to use a non-blank delimiter such as the comma, because a city name can itself contain blanks (e.g. Rio de Janeiro).
(See program SAS_Scan-Left1_depart2.sas)
Aligning New Values
Recall that when you create new character variables using assignment statements, the SAS system maintains the existing alignment. It does not do any truncation or padding. To left-align the values, use the LEFT function.
deptgate=scan(othrgate,2,',');
deptgate=left(deptgate);
or
deptgate=left(scan(othrgate,2,','));
SAS performs the innermost nested operation first. It uses that result as the argument of the next function.
(See program SAS_Scan-Left2_depart2.sas)
Assigning Lengths to Variables Created by the SCAN Function
The SCAN function causes the SAS system to assign a length of 200 bytes to the result variable in an assignment statement. Most of the other character functions cause the target variable to have the same length as the original value.
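
As a sketch of how alignment and length can be handled together, the step below declares the new variables in a LENGTH statement before SCAN creates them, then left-aligns the second term. The data set name, the sample value, and the lengths of 30 and 15 bytes are assumptions chosen for illustration, not values taken from the course programs.

```sas
/* Hypothetical data; variable names follow the notes, lengths are assumed. */
data gates_demo;
   /* Declaring lengths first keeps ARVGATE and DEPTGATE from
      receiving a default length from SCAN. */
   length othrgate $ 30 arvgate deptgate $ 15;
   othrgate = 'Rio de Janeiro, Sao Paulo';

   arvgate  = scan(othrgate, 1, ',');
   /* SCAN runs first; LEFT then moves the leading blank to the end. */
   deptgate = left(scan(othrgate, 2, ','));
run;

/* PROC CONTENTS lists the length of each variable. */
proc contents data=gates_demo;
run;
```

Running PROC CONTENTS with and without the LENGTH statement is one way to see which length the SAS release in use assigns by default.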

Removing Interior Blanks
ALLGATES contains many interior blanks. Why? When a character value is shorter than the length of the variable to which it belongs, the SAS system pads the value with trailing blanks. The length of USGATE is 13 bytes, but only San Francisco uses all of them. Therefore, the other values contain blanks at the end, and the value for Brazil is entirely blank. The SAS system concatenates USGATE and OTHRGATE without change; therefore the middle of ALLGATES contains blanks for most observations. Most of the values of OTHRGATE contain trailing blanks as well; if you concatenate COUNTRY after OTHRGATE, you will see them.
To remove these interior blanks, use the TRIM function.
General syntax: TRIM(source)
The TRIM function produces a value without the trailing blanks in the source. However, other rules about trailing blanks in the SAS system still apply. If the trimmed result is shorter than the length of the variable to which the result is assigned, the SAS system pads the result with new blanks as it makes the assignment.
(See program SAS_TrimAllGates_depart.sas)
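
The effect of TRIM on a concatenation can be seen in a small sketch such as this one. The data set name trim_demo, the variables PADDED and TRIMMED, the sample values, and the lengths other than the 13 bytes of USGATE are assumptions for illustration.

```sas
/* Hypothetical data; USGATE is 13 bytes as in the notes. */
data trim_demo;
   length usgate $ 13 othrgate $ 30 padded trimmed $ 43;
   usgate   = 'Miami';
   othrgate = 'Rio de Janeiro';

   /* 'Miami' uses 5 of 13 bytes, so 8 trailing blanks end up
      in the middle of the result. */
   padded  = usgate || othrgate;

   /* TRIM removes the trailing blanks before the values are joined. */
   trimmed = trim(usgate) || othrgate;
run;

proc print data=trim_demo;
run;
```

Both results are still padded on the right to 43 bytes when they are assigned, which is the rule about trailing blanks described above.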

Adding Additional Characters
Notice that the values in ALLGATES run together with nothing between them. In the observation for Brazil, the value of OTHRGATE comes at the beginning of the value. To make the result easier to read, concatenate a comma and a blank between the trimmed value of USGATE and the value of OTHRGATE. Use an IF-THEN statement to set ALLGATES equal to OTHRGATE in the case of Brazil.
(See program SAS_CommaTrimAllGates_depart.sas)
Troubleshooting: When New Variables Appear Truncated
What do you do when you have concatenated values and the result appears to have lost part of a value? Earlier we used the SCAN function to divide OTHRGATE into two new variables, ARVGATE and DEPTGATE, with default lengths of 200 bytes.
(See program SAS_Scan-Left2_depart2.sas)
Now suppose we wish to “reverse” the division and put ARVGATE and DEPTGATE back together into a new variable called OTHRGATE2 using the concatenation operator. However, we forgot to use the TRIM function to remove the trailing blanks that pad ARVGATE.
(See program SAS_Truncation_depart.sas)
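
Both points can be sketched in two short steps. The data set names, the sample values, the variables LOST and KEPT, and all lengths except the 13 bytes of USGATE and the 200 bytes of ARVGATE and DEPTGATE are assumptions for illustration; the course programs named above may be written differently.

```sas
/* Hypothetical input: one observation with a U.S. gateway and one
   without, standing in for the Brazil case. */
data gates_in;
   length usgate $ 13 othrgate $ 30;
   usgate = 'Miami'; othrgate = 'Rio de Janeiro'; output;
   usgate = ' ';     othrgate = 'Sao Paulo';      output;
run;

data allgates_demo;
   set gates_in;
   length allgates $ 45;
   /* No U.S. gateway: use OTHRGATE alone, with no leading comma. */
   if usgate = ' ' then allgates = othrgate;
   else allgates = trim(usgate) || ', ' || othrgate;
run;

/* Truncation: ARVGATE and DEPTGATE are 200 bytes each, and the
   target variables are assumed to be 30 bytes. */
data truncation_demo;
   length arvgate deptgate $ 200 lost kept $ 30;
   arvgate  = 'Rio de Janeiro';
   deptgate = 'Sao Paulo';

   /* The 186 trailing blanks of ARVGATE push DEPTGATE past byte 30,
      so only the first city survives the assignment. */
   lost = arvgate || deptgate;

   /* Trimming first lets both cities fit in 30 bytes. */
   kept = trim(arvgate) || ', ' || deptgate;
run;

proc print data=truncation_demo;
   var lost kept;
run;
```

In this sketch LOST shows only the first city, while KEPT holds "Rio de Janeiro, Sao Paulo".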

In this data set, HOTELRNK is a character variable with a length of 1 byte. If you are using list input, place a LENGTH statement before the INPUT statement.
The SAS system can treat a number stored as a character value as a numeric value, without any changes to your program. When the character variable appears in an arithmetic expression, the SAS system automatically produces a numeric value from the character value for use in the expression; it also issues a note that the conversion occurred. The original variable remains unchanged.
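
A minimal sketch of both points follows. The data set name hotel_demo, the variables NIGHTS and RNKPLUS, and the data lines are assumptions for illustration; only HOTELRNK and its 1-byte length come from the notes.

```sas
/* Hypothetical data; HOTELRNK is a 1-byte character variable. */
data hotel_demo;
   /* LENGTH comes before INPUT so that list input does not give
      HOTELRNK the default character length of 8 bytes. */
   length hotelrnk $ 1;
   input hotelrnk $ nights;

   /* HOTELRNK is character, but SAS converts its value to a number
      for the arithmetic and writes a note to the log. */
   rnkplus = hotelrnk + 1;
   datalines;
4 3
2 5
;
run;

proc print data=hotel_demo;
run;
```

RNKPLUS is created as a numeric variable, while HOTELRNK stays a character variable in the output data set.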