









An example of using the SAS procedures UNIVARIATE and FREQ for univariate and bivariate analysis of data. It covers options such as CHISQ, EXPECTED, DEVIATION, and MEASURES for testing independence and association between variables. The example includes creating contingency tables and calculating measures of association such as tau-c.
Typology: Exams










PROC UNIVARIATE < options > ;
    BY variables ;
    CLASS variable-1 <(v-options)> < variable-2 <(v-options)> > ... < / KEYLEVEL= value1 | ( value1 value2 ) > ;
    VAR variables ;
    ID variables ;
    OUTPUT < OUT=SAS-data-set > < keyword1=names ... keywordk=names > < percentile-options > ;
Proc Options: DATA=, NOPRINT, PLOT, FREQ, NORMAL, PCTLDEF=, VARDEF=, ALPHA=, CIBASIC (TYPE= ALPHA=), MU0=, TRIM= (TYPE= ALPHA=)
    PCTLDEF= 1, 2, 3, 4, or 5 (five methods of computing percentiles)
    VARDEF= DF, N, WEIGHT, WDF (divisor for computing the variance)
    NORMAL computes the Shapiro-Wilk statistic W if n ≤ 2000 or the Kolmogorov-Smirnov statistic D if n > 2000
    TYPE= LOWER, UPPER, TWOSIDED (type of confidence interval)
    TRIM= list of integers or fractions specifying the amount of trimming
    ALPHA=.05 (for both CIBASIC and TRIM)
Class Options: The v-options are MISSING or ORDER=
ORDER = FREQ, DATA, INTERNAL, FORMATTED specifies how levels of the variable are ordered in the output
Keywords for OUTPUT:
N, NMISS, NOBS, MEAN, SUM, STD, VAR, SKEWNESS, KURTOSIS, SUMWGT, MAX, MIN, RANGE, Q3, MEDIAN, Q1, QRANGE, P1, P5, P10, P90, P95, P99, MODE, SIGNRANK, NORMAL, PCTLNAME=, PCTLPTS=, PCTLPRE=
Example:
proc univariate data=survey;
   class county;
   var acreage rainfall;
   output out=new mean=ave1 ave2 var=v1 v2;
run;
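A variant of the example above that also exercises NORMAL, CIBASIC, and the percentile options might look like the following sketch. The data set `survey` and the variable `acreage` come from the example; the output data set name `pct`, the prefix `acre_`, the statistic names `ave` and `med`, and the percentile points are assumptions chosen only for illustration.

```sas
/* Sketch only: assumes a data set SURVEY with a numeric variable ACREAGE. */
proc univariate data=survey normal cibasic(type=twosided alpha=.05);
   var acreage;
   /* PCTLPTS= lists the percentiles wanted;                 */
   /* PCTLPRE= supplies the prefix for the new variable names. */
   output out=pct mean=ave median=med pctlpts=2.5 97.5 pctlpre=acre_;
run;

/* Inspect the one-row summary data set written by OUTPUT. */
proc print data=pct;
run;
```

Because no CLASS or BY statement is used here, the OUTPUT data set holds a single observation with one variable per requested statistic.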
PROC FREQ < options > ;
    BY variables ;
    TABLES requests < / options > ;
    EXACT statistic-options < / computation-options > ;
    TEST options ;
    OUTPUT < OUT=SAS-data-set > options ;
    WEIGHT variable < / options > ;
Proc Options: DATA=, ORDER=, FORMCHAR(1,2,7)=, NOPRINT
    ORDER= FREQ, DATA, INTERNAL, FORMATTED specifies the order in which variable levels appear in the output
    FORMCHAR(1,2,7)='|-+'
TABLES Options: MISSING, LIST, OUT=
    CHISQ, EXPECTED, DEVIATION, CELLCHI2, CUMCOL, MISSPRINT, SPARSE, ALL
    NOFREQ, NOPERCENT, NOROW, NOCOL, NOCUM, NOPRINT
    BINOMIAL, TESTF=, TESTP=
    MEASURES, CL, ALPHA=, AGREE
TABLES Requests:
    tables A
    tables A*B
    tables A*(B C)
    tables (A B)*(C D)
    tables (A B C)*D
    tables A--C
    tables (A--C)*D
data Color;
   input Eyes $ Hair $ Count @@;
   label Eyes ='Eye Color'
         Hair ='Hair Color';
   datalines;
blue  fair   23  blue  red     7  blue  medium 24
blue  dark   11  green fair   19  green red     7
green medium 18  green dark   14  brown fair   34
brown red     5  brown medium 41  brown dark   40
brown black   3
;
proc freq data=Color order=data;
   weight Count;
   tables Hair / nocum testp=(30 12 30 25 3);
   title 'Hair Color of European Children';
run;
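The test requested by TESTP= can be checked by hand. The sketch below assumes the usual Pearson goodness-of-fit statistic; the symbols $O_k$, $E_k$, and $n$ are notation introduced here, and the observed counts are the Hair totals summed from the data lines above (fair 76, red 19, medium 83, dark 65, black 3).

```latex
n = 76 + 19 + 83 + 65 + 3 = 246, \qquad E_k = n \, p_k

E = (0.30,\ 0.12,\ 0.30,\ 0.25,\ 0.03) \times 246
  = (73.80,\ 29.52,\ 73.80,\ 61.50,\ 7.38)

\chi^2 = \sum_{k=1}^{5} \frac{(O_k - E_k)^2}{E_k}
       = \frac{(2.20)^2}{73.80} + \frac{(-10.52)^2}{29.52}
       + \frac{(9.20)^2}{73.80} + \frac{(3.50)^2}{61.50}
       + \frac{(-4.38)^2}{7.38}
       \approx 7.76, \qquad \text{df} = 5 - 1 = 4
```

With ORDER=DATA, the five test percentages are matched to the Hair levels in the order they first appear in the data (fair, red, medium, dark, black), which is why that option matters in this example.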
Application 2: Two-way Frequency Table with I populations and J categories
$H_0: p_{1j} = p_{2j} = \cdots = p_{Ij}, \quad j = 1, 2, \ldots, J$
Example:
                         Groups
Populations    $p_{11}$   $p_{12}$   $p_{13}$
               $p_{21}$   $p_{22}$   $p_{23}$

$H_0: p_{11} = p_{21}, \; p_{12} = p_{22}, \; p_{13} = p_{23}$
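The homogeneity hypothesis is commonly tested with the Pearson chi-square statistic, which is what the CHISQ option reports for a two-way table. In the sketch below, $n_{ij}$, $n_{i\cdot}$, $n_{\cdot j}$, and $n$ (cell counts, row totals, column totals, and grand total) are notation introduced here rather than taken from the notes.

```latex
E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}, \qquad
\chi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{df} = (I - 1)(J - 1)
```

For the two-population, three-group example this gives $(2-1)(3-1) = 2$ degrees of freedom.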
Application 3: Contingency tables constructed by multiple classification of a single random sample.
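Measures of Association: The phi coefficient, the contingency coefficient, and Cramér's V measure the strength of the dependency between nominal variables, but they are also applicable to ordinal variables. All three are derived from the Pearson chi-square statistic $\chi^2$ of an $r \times c$ table with $n$ observations:
    phi coefficient: $\phi = \sqrt{\chi^2 / n}$
    contingency coefficient: $C = \sqrt{\chi^2 / (\chi^2 + n)}$, with maximum value $\sqrt{(r-1)/r}$ for an $r \times r$ table
    Cramér's V: $V = \sqrt{\chi^2 / (n \min\{r-1,\, c-1\})}$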
Other Measures of Association: Many of these statistical measures also require the assignment of a dependent variable and an independent variable, as the goal is to predict the rank (category) of an individual on the dependent variable given that the individual belongs to a certain category of the independent variable.
Calculating the following measures requires classifying each pair of observations as concordant or discordant.
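The standard definitions can be written out as follows. The symbols $C$ and $D$ (numbers of concordant and discordant pairs), $n$ (sample size), and $m$ (the smaller of the number of rows and columns) are notation introduced here; both variables are assumed to have ordered categories.

```latex
% A pair of observations (a_i, b_i), (a_k, b_k) is
%   concordant  if (a_i - a_k)(b_i - b_k) > 0  (same ordering on both variables)
%   discordant  if (a_i - a_k)(b_i - b_k) < 0  (opposite orderings)
% Pairs tied on either variable are neither.

\gamma = \frac{C - D}{C + D}
\qquad \text{(Goodman-Kruskal gamma)}

\tau_c = \frac{2m\,(C - D)}{n^2 (m - 1)}, \quad m = \min\{r, c\}
\qquad \text{(Stuart's tau-c)}
```

Both measures are positive when concordant pairs dominate and negative when discordant pairs dominate.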
Example:
In the following example, factor B could be a factor such as Marital Status, and factor A could be a response to a question with three possible answers. Suppose the table of frequencies is as follows:

          b1     b2     b3    Total
a1         8     16     31       55
a2         9     18     74      101
a3        34     23     17       74
Total     51     57    122      230
In this case, the column variable is called the independent variable with categories being classes, groups, or strata.
data ex8;
   input A $ B $ count @@;
   datalines;
a1 b1 8   a1 b2 16  a1 b3 31
a2 b1 9   a2 b2 18  a2 b3 74
a3 b1 34  a3 b2 23  a3 b3 17
;
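A PROC FREQ step for this data set could be sketched as follows. It assumes the data set `ex8` exactly as created above and uses only TABLES options listed earlier in these notes; which options the original analysis requested is not shown in the source, so the combination here is an assumption.

```sas
/* Sketch only: assumes the data set EX8 from the DATA step above. */
proc freq data=ex8;
   weight count;   /* each record is a cell count, not a single observation */
   /* CHISQ: tests of independence; EXPECTED, DEVIATION: per-cell detail;   */
   /* MEASURES: gamma, tau-b, tau-c, and other measures of association.     */
   tables A*B / chisq expected deviation measures;
run;
```

As a hand check, treating the levels as ordered a1 < a2 < a3 and b1 < b2 < b3, the table has 3178 concordant and 8122 discordant pairs, so gamma = (3178 - 8122)/(3178 + 8122) ≈ -0.4375 and tau-c = 2·3·(3178 - 8122)/(230²·2) ≈ -0.2804. The negative sign reflects the concentration of counts in the a3/b1 and a2/b3 corners.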
PLOT Statement Options: HAXIS=, VAXIS=
    HAXIS=5 10 15 20 25 30 35
    HAXIS=5 to 35 by 5
    HAXIS=by 5
    HAXIS='Kansas' 'Missouri' 'Iowa' 'Illinois' 'Nebraska'
    HAXIS='01MAY07'd to '01DEC07'd by month
HPOS=, VPOS=, HREF=, VREF=, HREFCHAR=, VREFCHAR=, BOX, OVERLAY, CONTOUR<=number-of-levels>
PLACEMENT=(expression(s))
    PLACEMENT=(H=0, V=0, S=CENTER, L=1)
    PLACEMENT=(H=0 1)
    PLACEMENT=(H=0 1 -1 V=1 -1)
    PLACEMENT=((s=right left : h=1 -1) (v=1 -1 * h=1 -1))
SPLIT='split-character'
PENALTIES<(index-list)>=penalty-list
    PENALTIES(1)=
    PENALTIES(15 to 19)=2 3 4 10 15 25
Example:
proc plot data=mylib.fueldat;
   plot fuel*roads;
   plot fuel*roads='+';
   plot fuel*roads='*' $ st;
   plot fuel*roads='*' $ st / haxis=0 to 20 by 2
        placement=((s=right left : h=1 -1) (v=1 -1 * h=1 -1));
   title 'Output from Sample Plot Statements';
run;
PROC CHART < option(s) > ;
    BLOCK variable(s) < / option(s) > ;
    BY variables ;
    HBAR variable(s) < / option(s) > ;
    PIE variable(s) < / option(s) > ;
    STAR variable(s) < / option(s) > ;
    VBAR variable(s) < / option(s) > ;
Proc Options: DATA=, FORMCHAR<(position(s))>='formatting-character(s)'
Options for VBAR, HBAR, BLOCK, PIE, STAR:
    MISSING, DISCRETE
    TYPE= FREQ, PCT, CFREQ, CPCT, SUM, MEAN
    SUMVAR= variable
    MIDPOINTS= values
    FREQ= variable
    AXIS= value
Options for VBAR, HBAR, BLOCK only:
    GROUP= variable
    SUBGROUP= variable
    LEVELS= n
Defaults for TYPE=: If TYPE= is omitted, the default is TYPE=FREQ, except when the SUMVAR= option is specified, in which case the default is TYPE=SUM.
Example:
proc chart data=mylib.fueldat;
   vbar fuel;
   vbar fuel / midpoints=300 to 1000 by 100 type=percent;
   vbar incomgrp / discrete sumvar=fuel;
   vbar incomgrp / discrete sumvar=fuel type=mean subgroup=taxgrp;
   vbar incomgrp / discrete sumvar=fuel type=mean group=taxgrp;
   format incomgrp ing.;
   title 'Illustrating HBAR statement in PROC CHART';
run;
Proc TABULATE Examples:
In the following examples, tables are presented that contain demographic information extracted from a data set containing the following variables:

Variable    Description
REGION      code for region of the country
CITYSIZE    code for relative population size (S=small, M=medium, L=large)
POP         urban population
Each observation contains data for one city.
Most applications also use CLASS and VAR statements in addition to the PROC TABULATE statement.
In the examples below, the required table format is specified with a TABLE statement, and the output produced by each TABLE statement is sketched:
Example 1: TABLE REGION,POP;

                 POP
                 SUM
REGION
NC        4650000.00
NE        6666000.00
SO        6864000.00
WE        8376000.00
Example 2: TABLE REGION,CITYSIZE*POP*SUM;

                          CITYSIZE
                   L             M             S
                 POP           POP           POP
                 SUM           SUM           SUM
REGION
NC        3750000.00     750000.00     150000.00
NE        5022000.00    1422000.00     222000.00
SO        4488000.00    2088000.00     288000.00
WE        5592000.00    2592000.00     192000.00
Example 3: TABLE REGION*CITYSIZE,POP*SUM;

                           POP
                           SUM
REGION   CITYSIZE
NC       L          3750000.00
         M           750000.00
         S           150000.00
NE       L          5022000.00
         M          1422000.00
         S           222000.00
SO       L          4488000.00
         M          2088000.00
         S           288000.00
WE       L          5592000.00
         M          2592000.00
         S           192000.00
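A complete step for Examples 1 to 3, including the CLASS and VAR statements mentioned earlier, might look like the sketch below. The data set name `cities` and the three data lines are made up for illustration (they mimic the NC cells only), and the TABLE statements assume the crossing operator `*` between CITYSIZE, POP, and SUM.

```sas
/* Illustrative data only: three made-up cities in region NC. */
data cities;
   input region $ citysize $ pop;
   datalines;
NC L 3750000
NC M 750000
NC S 150000
;

proc tabulate data=cities;
   class region citysize;   /* classification variables define rows and columns */
   var pop;                 /* analysis variable to be summarized */
   table region, pop;                  /* Example 1: SUM is the default statistic */
   table region, citysize*pop*sum;     /* Example 2: one column per city size */
   table region*citysize, pop*sum;     /* Example 3: city size nested within region */
run;
```

In a TABLE statement a comma separates the row dimension from the column dimension, an asterisk crosses elements, and a blank concatenates them.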
Example 5: TABLE PRODUCT, REGION CITYSIZE, SALETYPE*(QUANTITY AMOUNT) ;
PRODUCT A
                              SALETYPE
                     R                         W
             QUANTITY      AMOUNT      QUANTITY      AMOUNT
                  SUM         SUM           SUM         SUM
REGION
NC            1250.00    31250.00       1250.00    25000.00
NE            1600.00    40000.00       1600.00    32000.00
SO            1880.00    47000.00       1880.00    37600.00
WE            1840.00    46000.00       1840.00    36800.00
CITYSIZE
L             3190.00    79750.00       3190.00    63800.00
M             2440.00    61000.00       2440.00    48800.00
S              940.00    23500.00        940.00    18800.00