SAS Data Analysis: WHERE Statement and PROC SORT in PROC STEP, Summaries of Statistics

How to use the WHERE statement in PROC steps in SAS to read a subset of data and the importance of sorting data using PROC SORT. It includes examples and instructions on how to sort data in ascending and descending orders, print data with PROC PRINT, and write custom reports using DATA steps.

Typology: Summaries

2021/2022

Uploaded on 07/05/2022

gavin_99
gavin_99 🇦🇺

4.3

(73)

998 documents

1 / 49

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Summary Statistics in SAS
Math 3210
Dr. Zeng
Department of Mathematics
California State University, Bakersfield
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b
pf2c
pf2d
pf2e
pf2f
pf30
pf31

Partial preview of the text

Download SAS Data Analysis: WHERE Statement and PROC SORT in PROC STEP and more Summaries Statistics in PDF only on Docsity!

Summary Statistics in SAS

Math 3210 Dr. Zeng Department of Mathematics California State University, Bakersfield

Outline

  • PROC SORT
  • PROC PRINT
  • PROC MEANS
  • PROC FREQ
  • PROC SGPLOT

Example

You have a database containing information about well-known

painters. A subset of the data appears below. For each artist,

the data include the painter’s name, primary style, and nation

of origin. Suppose a day later you wanted to print a list of just

the impressionist painters.

Mary Cassatt Impressionism U

Paul Cezanne Post-impressionism F

Edgar Degas Impressionism F

Paul Gauguin Post-impressionism F

Claude Monet Impressionism F

Pierre Auguste Renoir Impressionism F

Vincent van Gogh Post-impressionism N

DATA Artists;

INFILE '/home/bzeng/my_content/Artists.dat'; INPUT Name $ 1-21 Genre $ 23-40 Origin $ 42;

RUN;

PROC PRINT DATA = Artists;

WHERE Genre = 'Impressionism'; TITLE 'Major Impressionist Painters'; FOOTNOTE 'F = France N = Netherlands U = US';

RUN;

Controlling the output data set

PROC SORT DATA= messy OUT= neat NODUPKEY DUPOUT=extraobs;

  • The DATA= and OUT= options specify the input and output data sets.
  • Without the DATA= option, SAS will use the most recently created data set. If you don’t specify the OUT= option, SAS will replace the original data set with the newly sorted version.
  • The NODUPKEY option tells SAS to eliminate any duplicate observations that have the same values for the BY variables.
  • When use the DUPOUT=option, SAS will put the deleted observations in that data set.

Ascending versus descending sorts

  • By default, SAS sorts data in ascending order,

from lowest to highest.

  • If you prefer the opposite order, add the

keyword DESCENDING to the BY statement before each variable that should be sorted in reverse order.

  • For example, the following statement tells SAS

to sort first by State (from A-Z) and then City (from Z to A) within State: BY State DESCENDING city;

DATA marine;

INFILE '/home/bzeng/my_content/Lengths.dat'; INPUT Name $ Family $ Length @@;

RUN;

  • Sort the data;

PROC SORT DATA = marine OUT = seasort

NODUPKEY;

BY Family DESCENDING Length;

PROC PRINT DATA = seasort;

TITLE 'Whales and Sharks';

RUN;

Note

  • The missing values are always low for both

numeric and character variables.

  • The NODUPKEY option eliminated a duplicate

observation for the whale shark.

  • Messages from the log:
  • Linguistic sorting: by default, upper and lowercase

letters will be sorted separately. To ignore case, you

can use the SORTSEQ=LINGUISTIC option with the

STRENGTH=PRIMARY suboption.

  • For example,

PROC SORT SORTSEQ=LINGUISTIC (STRENGTH=PRIMARY);

unsorted order ASCII

Linguistic Sort (strength=primaray) ella ANNA amanda amanda Zoe ANNA Zoe amanda ella ANNA ella Zoe

  • When numerals are sorted as character data, the

value “10” comes before “2”. To treat numerals as

their numeric equivalent, use the

NUMERIC_COLLATION=ON suboption.

  • For example, PROC SORT SORTSEQ=LINGUISTIC (NUMERIC_COLLATION=ON)

unsorted order ASCII

Linguistic Sort (NUMERIC_COLLATION=ON) 1500m freestyle 100m backstroke 50m freestyle

200m breaststroke 1500m freestyle 100m backstroke

100m backstroke 200m breaststroke 200m breaststroke 50mm freestyle 50m freestyle 1500m freestyle

DATA addresses; INFILE '/home/bzeng/my_content/Mail.dat'; INPUT Name $6. Street $18. City $9. State $6.; RUN;

PROC SORT DATA = addresses OUT = sortone SORTSEQ = LINGUISTIC (NUMERIC_COLLATION = ON); BY Street; PROC PRINT DATA = sortone; TITLE 'Addresses Sorted by Street'; RUN;

PROC SORT DATA = addresses OUT = sorttwo SORTSEQ = LINGUISTIC (STRENGTH = PRIMARY); BY State; PROC PRINT DATA = sorttwo; TITLE 'Addresses Sorted by State'; RUN;

Practice: what will the result be without using these options?

Printing Your Data with PROC PRINT

By default, SAS prints the observation numbers along with

the variables’ values. If you don’t want observation

numbers, use the NOOBS option in the PROC PRINT

statement.

For example,

PROC PRINT DATA= data-set NOOBS;

The following are optional statements that sometimes

come in handy:

Example

Students from two fourth-grade classes are selling candy to

earn money for a special field trip. The class earning more

money gets a free box of candy. The following are the data

for the results of the candy sale. The students’ names are

followed by their classroom number, the data they turned

in their money, the type of candy: mint patties or chocolate

dinosaurs, and the number of boxes sold. The teachers

want a report giving the money earned for each classroom,

the money earned by each student, the type of candy sold,

and the date the students returned their money.

DATA sales;

INFILE '/home/bzeng/my_content/CandySales.dat'; INPUT Name $ 1-11 Class @15 DateReturned MMDDYY10. CandyType $ Quantity;

Profit = Quantity * 1.25;

PROC SORT DATA = sales;

BY Class;

PROC PRINT DATA = sales;

BY Class; SUM Profit; VAR Name DateReturned CandyType Profit; TITLE 'Candy Sales for Field Trip by Class';

RUN;