SAS Procedure for Univariate and Bivariate Analysis: An Example, Exams of Statistics

Iowa State University (ISU), Statistics

An example of using the SAS procedures UNIVARIATE and FREQ for univariate and bivariate analysis of data. It covers TABLES statement options such as CHISQ, EXPECTED, DEVIATION, and MEASURES for testing independence and association between variables. The example includes creating contingency tables and calculating measures of association such as Stuart's tau-c.

Typology: Exams

Pre 2010

Uploaded on 09/02/2009

koofers-user-m96 🇺🇸


PROC UNIVARIATE < options > ;
   BY variables ;
   CLASS variable-1 <(v-options)> < variable-2 <(v-options)> >
         < / KEYLEVEL= value1 | ( value1 value2 ) > ;
   VAR variables ;
   ID variables ;
   OUTPUT < OUT=SAS-data-set >
          < keyword1=names ... keywordk=names > < percentile-options > ;

Proc Options: DATA=, NOPRINT, PLOT, FREQ, NORMAL, PCTLDEF=, VARDEF=,
              ALPHA=, CIBASIC (TYPE= ALPHA=), MU0=, TRIM= (TYPE= ALPHA=)

   PCTLDEF= 1, 2, 3, 4, or 5 (five methods of computing percentiles)
   VARDEF= DF, N, WEIGHT, WDF (divisor for computing the variance)
   NORMAL computes the Shapiro-Wilk statistic W if n ≤ 2000
          or the Kolmogorov-Smirnov statistic D if n > 2000
   TYPE= LOWER, UPPER, TWOSIDED (specifies the type of confidence interval)
   TRIM= list of integers or fractions specifying the amount of trimming
   ALPHA=.05 (for both CIBASIC and TRIM)
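The five PCTLDEF= methods differ in how they turn a position n·p into a value. As an illustration only, here is a minimal Python sketch of the rule usually documented for PCTLDEF=5 (the empirical distribution function with averaging). The function name `percentile_def5` is hypothetical, and the handling of t = 0 and t = 100 is an assumption of this sketch rather than something the handout states.

```python
from fractions import Fraction

def percentile_def5(values, t):
    """t-th percentile (0 <= t <= 100), EDF-with-averaging rule.

    Write n*p = j + g with p = t/100, j an integer and 0 <= g < 1.
    If g == 0 the result is the average of x_j and x_(j+1);
    otherwise it is x_(j+1), where x_1 <= ... <= x_n are the sorted values.
    """
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one value")
    pos = Fraction(t) * n / 100            # exact n*p, avoids float round-off
    j = pos.numerator // pos.denominator   # integer part
    g = pos - j                            # fractional part
    if j == 0:                             # assumption: clamp to the minimum
        return x[0]
    if j >= n:                             # assumption: clamp to the maximum
        return x[-1]
    if g == 0:
        return (x[j - 1] + x[j]) / 2       # average of x_j and x_(j+1)
    return x[j]                            # x_(j+1) in 1-based notation

data = list(range(1, 11))                  # 1, 2, ..., 10
print(percentile_def5(data, 25), percentile_def5(data, 50), percentile_def5(data, 75))
```

For the ten values 1 to 10 this gives Q1 = 3, median = 5.5, and Q3 = 8, which is the kind of check worth running against PROC UNIVARIATE output before relying on the sketch.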

Class Options: The v-options are MISSING or ORDER=

ORDER = FREQ, DATA, INTERNAL, FORMATTED

specifies how levels of the variable are ordered in the output

Keywords for OUTPUT:

N, NMISS, NOBS, MEAN, SUM, SD, VAR, SKEWNESS, KURTOSIS, SUMWGT,
MAX, MIN, RANGE, Q3, MEDIAN, Q1, QRANGE, P1, P5, P10, P90, P95, P99,
MODE, SIGNRANK, NORMAL, PCTLNAME=, PCTLPTS=, PCTLPRE=

Example:

proc univariate data=survey;
   class county;
   var acreage rainfall;
   output out=new mean=ave1 ave2 var=v1 v2;
run;


PROC FREQ < options > ;
   BY variables ;
   TABLES requests < / options > ;
   EXACT statistic-options < / computation-options > ;
   TEST options ;
   OUTPUT < OUT=SAS-data-set > options ;
   WEIGHT variable < / options > ;

Proc Options: DATA=, ORDER=, FORMCHAR(1,2,7)=, NOPRINT

   ORDER= FREQ, DATA, INTERNAL, FORMATTED
          specifies the order in which variable levels appear in the output
   FORMCHAR(1,2,7)='|-+'

TABLES Options: MISSING, LIST, OUT=

   CHISQ, EXPECTED, DEVIATION, CELLCHI2, CUMCOL, MISSPRINT, SPARSE, ALL
   NOFREQ, NOPERCENT, NOROW, NOCOL, NOCUM, NOPRINT
   BINOMIAL, TESTF=, TESTP=
   MEASURES, CL, ALPHA=, AGREE

TABLES Requests:

   tables A
   tables A*B
   tables A*(B C)
   tables (A B)*(C D)
   tables (A B C)*D
   tables A--C
   tables (A--C)*D

data Color;
   input Eyes $ Hair $ Count @@;
   label Eyes='Eye Color'
         Hair='Hair Color';
   datalines;
blue fair 23 blue red 7 blue medium 24
blue dark 11 green fair 19 green red 7
green medium 18 green dark 14 brown fair 34
brown red 5 brown medium 41 brown dark 40
brown black 3
;
proc freq data=Color order=data;
   weight Count;
   tables Hair / nocum testp=(30 12 30 25 3);
   title 'Hair Color of European Children';
run;
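To make the arithmetic behind TESTP= concrete, the following Python sketch (not SAS) recomputes the one-way chi-square goodness-of-fit statistic for the hair-color counts above. The hair totals are obtained by summing Count over eye colors, and the hypothesized percentages are taken in the ORDER=DATA order fair, red, medium, dark, black. The function name `chi_square_gof` is hypothetical.

```python
# Hair-color totals: Count summed over the three eye colors in the data step.
observed = {
    "fair": 23 + 19 + 34,
    "red": 7 + 7 + 5,
    "medium": 24 + 18 + 41,
    "dark": 11 + 14 + 40,
    "black": 3,
}
# Hypothesized percentages from testp=(30 12 30 25 3), in data order.
test_percents = {"fair": 30, "red": 12, "medium": 30, "dark": 25, "black": 3}

def chi_square_gof(observed, percents):
    """Return (chi-square statistic, degrees of freedom) for a one-way table."""
    n = sum(observed.values())
    stat = 0.0
    for level, count in observed.items():
        expected = n * percents[level] / 100   # expected count under H0
        stat += (count - expected) ** 2 / expected
    return stat, len(observed) - 1

stat, df = chi_square_gof(observed, test_percents)
print(f"chi-square = {stat:.4f}, df = {df}")   # chi-square = 7.7602, df = 4
```

The p-value that PROC FREQ prints alongside this statistic comes from the chi-square distribution with 4 degrees of freedom; it is left out here to keep the sketch free of third-party libraries.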

Application 2: Two-way Frequency Table with I populations and J categories

Testing homogeneity of several multinomial populations.
Random samples are taken from each population and then classified by a categorical variable.
The populations are usually defined by levels of variables such as gender, age group, state, etc.
Levels of the categorical variable form the J categories of the multinomial populations.
The null hypothesis of homogeneity of populations is

$H_0: p_{1j} = p_{2j} = \cdots = p_{Ij}, \quad j = 1, 2, \ldots, J$

Example:

Suppose samples are drawn from two populations (say, males and females, or persons below and above the age of 40) and are grouped into three categories (say, according to three levels of support for a certain local bond issue).
Suppose that the multinomial probabilities for each population are as given in the following table:

                     Groups
Populations    p_{11}   p_{12}   p_{13}
               p_{21}   p_{22}   p_{23}

The null hypothesis of homogeneity of populations is

$H_0: p_{11} = p_{21},\ p_{12} = p_{22},\ p_{13} = p_{23}$
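The handout does not write out the test statistic at this point. For reference, the usual Pearson chi-square statistic for this hypothesis, which is what the CHISQ option of the TABLES statement reports, can be written as follows. The symbols $n_{ij}$ (observed count for population $i$ in category $j$), $n_{i+}$, $n_{+j}$ (row and column totals), and $n$ (grand total) are notation assumed here, not taken from the handout.

```latex
\[
  E_{ij} = \frac{n_{i+}\, n_{+j}}{n},
  \qquad
  X^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - E_{ij})^2}{E_{ij}},
  \qquad
  \text{df} = (I-1)(J-1).
\]
```

Under $H_0$, $X^2$ is approximately chi-square distributed with $(I-1)(J-1)$ degrees of freedom when the expected counts are not too small, so large values of $X^2$ are evidence against homogeneity.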

Application 3: Contingency tables constructed by multiple classification of a single random sample.

Observations in a sample may be cross-classified by categorical variables.
- ordinal values: age, income, temperature
- nominal values: gender, marital status, region
These variables form natural subsets or strata of the data.
The categories may also be generated from quantitative variables such as population or income.
- 3 income groups (say "low," "middle," or "high")
- according as income is < $30,000, between $30,000 and $70,000, or > $70,000

Measures of Association: The following three measures are suitable for measuring the strength of the dependency between nominal variables but are also applicable to ordinal variables. All three are derived from the Pearson chi-square statistic.

phi coefficient: the range is $0 \le \phi \le \min\{\sqrt{r-1}, \sqrt{c-1}\}$, where r and c are the numbers of rows and columns of the table.

contingency coefficient C
1. its value is zero if there is no association
2. its value is less than 1 even with perfect dependence
3. its value depends on the size of the table
4. it has a maximum value of $\sqrt{(r-1)/r}$ for an r × r table

Cramer's V is a normed measure, so its value is between 0 and 1.
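For reference, the standard definitions of these three measures in terms of the Pearson chi-square statistic $X^2$ for an $r \times c$ table with $n$ observations are given below. The handout lists only their ranges, so the formulas are supplied here from the usual textbook definitions rather than from the source.

```latex
\[
  \phi = \sqrt{\frac{X^2}{n}},
  \qquad
  C = \sqrt{\frac{X^2}{X^2 + n}},
  \qquad
  V = \sqrt{\frac{X^2}{n \,\min(r-1,\; c-1)}}.
\]
```

Since $X^2$ can be at most $n \min(r-1, c-1)$, these definitions give the ranges stated above: $\phi$ is bounded by $\sqrt{\min(r-1, c-1)}$, $C$ by $\sqrt{(r-1)/r}$ in an $r \times r$ table, and $V$ by 1.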

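Other Measures of Association: Many of these statistical measures also require the assignment of a dependent variable and an independent variable, as the goal is to predict the rank (category) of an individual on the dependent variable given that the individual belongs to a certain category of the independent variable.

To calculate the following measures, each pair of observations must be classified as concordant or discordant. A pair is concordant if the observation with the larger value of the first variable also has the larger value of the second variable, and discordant if it has the smaller value of the second variable. For example: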

the pair (12, 2.7) and (15, 3.1) are concordant
the pair (12, 2.7) and (10, 3.1) are discordant
Gamma: a normed measure based on the numbers of concordant and discordant pairs.

if there are no discordant pairs, Gamma is +1: perfect positive association
if there are no concordant pairs, Gamma is −1: perfect negative association
values between −1 and +1 measure the strength of negative or positive association
if the numbers of discordant and concordant pairs are equal, Gamma is zero: the rank of the independent variable cannot be used to predict the rank of the dependent variable

Kendall's tau-b: the ratio of the difference between the numbers of concordant and discordant pairs to the total number of pairs. It is scaled to be between −1 and +1 when there are no ties, but not otherwise.
Somers' D: an ordinal measure; adjusts for ties so that the value of the statistic lies between −1 and +1. Two values of this statistic are computed:
1. when the row variable is considered the independent variable (Somers' D C|R)
2. when the column variable is considered the independent variable (Somers' D R|C)
Asymmetric lambda λ(R|C): a nominal measure; interpreted as the proportional reduction in error (PRE) in predicting the dependent (row) variable given the independent (column) variable. The range is 0 ≤ λ(R|C) ≤ 1; values around 0.3 are considered high.
Asymmetric lambda λ(C|R): interpreted similarly, with the column variable as the dependent variable.
Stuart's tau-c: for two variables on an ordinal scale; adjusts for table size and corrects for ties. Its range is −1 ≤ τc ≤ 1.
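The pair-counting behind Gamma and Stuart's tau-c can be sketched in a few lines of Python (not SAS). The function names are hypothetical, the 3 × 3 counts are those of the A-by-B example in this handout, and the formulas used are the standard ones: Gamma = (P − Q)/(P + Q) and tau-c = 2m(P − Q)/(n²(m − 1)), where P and Q are the numbers of concordant and discordant pairs (each pair counted once) and m = min(r, c).

```python
def concordant_discordant(table):
    """Count concordant (P) and discordant (Q) pairs in an ordered r x c table.

    Each unordered pair of observations is counted once. Pairs tied on the
    row or the column variable contribute to neither count.
    """
    r, c = len(table), len(table[0])
    P = Q = 0
    for i in range(r):
        for j in range(c):
            for k in range(i + 1, r):        # strictly lower rows
                for l in range(c):
                    if l > j:                # lower row, column to the right
                        P += table[i][j] * table[k][l]
                    elif l < j:              # lower row, column to the left
                        Q += table[i][j] * table[k][l]
    return P, Q

def gamma(table):
    P, Q = concordant_discordant(table)
    return (P - Q) / (P + Q)

def stuart_tau_c(table):
    P, Q = concordant_discordant(table)
    n = sum(map(sum, table))
    m = min(len(table), len(table[0]))
    return 2 * m * (P - Q) / (n ** 2 * (m - 1))

# Rows a1..a3, columns b1..b3.
table = [[8, 16, 31],
         [9, 18, 74],
         [34, 23, 17]]

P, Q = concordant_discordant(table)
print(P, Q)                                  # 3178 8122
print(f"gamma = {gamma(table):.4f}, tau-c = {stuart_tau_c(table):.4f}")
# gamma = -0.4375, tau-c = -0.2804
```

Both values are negative because discordant pairs outnumber concordant ones: high levels of A tend to go with low levels of B in this table.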

Example:

In the following example, factor B could be a factor such as Marital Status and factor A could be a response to a question with three possibilities. Suppose the table of frequencies is as follows:

          b1    b2    b3   Total
a1         8    16    31     55
a2         9    18    74    101
a3        34    23    17     74
Total     51    57   122    230

In this case, the column variable is called the independent variable, with its categories being classes, groups, or strata.

data ex8;
   input A $ B $ count @@;
   datalines;
a1 b1 8 a1 b2 16 a1 b3 31
a2 b1 9 a2 b2 18 a2 b3 74
a3 b1 34 a3 b2 23 a3 b3 17
;
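The preview ends before the PROC FREQ step for this data set. As a stand-in, the Python sketch below (not SAS) recomputes the quantities that the EXPECTED, DEVIATION, and CHISQ options describe for the A-by-B table: expected counts under independence, observed minus expected deviations, the Pearson chi-square, and the three chi-square-based measures. The function name `chi_square_summary` and the dictionary layout are assumptions made for illustration.

```python
from math import sqrt

table = [[8, 16, 31],     # a1
         [9, 18, 74],     # a2
         [34, 23, 17]]    # a3

def chi_square_summary(table):
    """Expected counts, deviations, Pearson chi-square, and phi, C, V."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    # Deviation: observed minus expected.
    deviation = [[o - e for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(table, expected)]
    chi2 = sum(d * d / e
               for dev_row, exp_row in zip(deviation, expected)
               for d, e in zip(dev_row, exp_row))
    k = min(len(table), len(table[0])) - 1
    return {
        "expected": expected,
        "deviation": deviation,
        "chi2": chi2,
        "df": (len(table) - 1) * (len(table[0]) - 1),
        "phi": sqrt(chi2 / n),
        "C": sqrt(chi2 / (chi2 + n)),
        "V": sqrt(chi2 / (n * k)),
    }

summary = chi_square_summary(table)
for name in ("chi2", "df", "phi", "C", "V"):
    print(name, round(summary[name], 4))
```

For these counts the chi-square statistic comes out at about 52.4 on 4 degrees of freedom, with the largest deviations in the a3 row, which matches the strong negative ordinal association visible in the table.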

PLOT Statement Options: HAXIS=, VAXIS=
   HAXIS=5 10 15 20 25 30 35
   HAXIS=5 to 35 by 5
   HAXIS=by 5
   HAXIS='Kansas' 'Missouri' 'Iowa' 'Illinois' 'Nebraska'
   HAXIS='01MAY07'd to '01DEC07'd by month

HPOS=, VPOS=, HREF=, VREF=, HREFCHAR=, VREFCHAR=, BOX, OVERLAY
CONTOUR <=number-of-levels>

PLACEMENT=(expression(s))
   PLACEMENT=(H=0, V=0, S=CENTER, L=1)
   PLACEMENT=(H=0 1)
   PLACEMENT=(H=0 1 -1 * V=1 -1)
   PLACEMENT=((s=right left : h=1 -1)(v=1 -1 * h=1 -1))
SPLIT='split-character'

PENALTIES <(index-list)>=penalty-list
   PENALTIES(1)=
   PENALTIES(15 to 19)=2 3 4 10 15 25

Example:

proc plot data=mylib.fueldat;
   plot fuel*roads;
   plot fuel*roads='+';
   plot fuel*roads='*' $ st;
   plot fuel*roads='*' $ st / haxis=0 to 20 by 2
        placement=((s=right left : h=1 -1)(v=1 -1 * h=1 -1));
   title 'Output from Sample Plot Statements';
run;

PROC CHART < option(s) > ;
   BLOCK variable(s) < / option(s) > ;
   BY variables ;
   HBAR variable(s) < / option(s) > ;
   PIE variable(s) < / option(s) > ;
   STAR variable(s) < / option(s) > ;
   VBAR variable(s) < / option(s) > ;

Proc Options: DATA=, FORMCHAR <(position(s))>='formatting-character(s)'

Options for VBAR, HBAR, BLOCK, PIE, STAR:
   MISSING, DISCRETE
   TYPE= FREQ, PCT, CFREQ, CPCT, SUM, MEAN
   SUMVAR= variable
   MIDPOINTS= values
   FREQ= variable
   AXIS= value

Options for VBAR, HBAR, BLOCK only:
   GROUP= variable
   SUBGROUP= variable
   LEVELS= n

Defaults for TYPE=: If TYPE= is omitted, the default is TYPE=FREQ, except when the SUMVAR= option is specified, in which case the default is TYPE=SUM.

Example:

proc chart data=mylib.fueldat;
   vbar fuel;
   vbar fuel / midpoints=300 to 1000 by 100 type=percent;
   vbar incomgrp / discrete sumvar=fuel;
   vbar incomgrp / discrete sumvar=fuel type=mean subgroup=taxgrp;
   vbar incomgrp / discrete sumvar=fuel type=mean group=taxgrp;
   format incomgrp ing.;
   title 'Illustrating HBAR statement in PROC CHART';
run;

Proc TABULATE Examples:

In the following examples, tables are presented that contain demographic information extracted from a data set containing the following variables:

Variable    Description
REGION      code for region of the country
CITYSIZE    code for relative population size (S=small, M=medium, L=large)
POP         urban population

Each observation contains data for one city.

Most applications also use CLASS and VAR statements in addition to the PROC TABULATE statement.

PROC TABULATE;
   TITLE 'REGION, POP';
   CLASS REGION;
   VAR POP;

In the examples below, the required table format is specified with a TABLE statement, and the output produced by each TABLE statement is sketched:

Example 1: TABLE REGION,POP;

                 POP
                 SUM
REGION
NC           4650000.00
NE           6666000.00
SO           6864000.00
WE           8376000.00

Example 2: TABLE REGION, CITYSIZE*POP*SUM ;

                           CITYSIZE
                  L              M              S
                 POP            POP            POP
                 SUM            SUM            SUM
REGION
NC           3750000.00      750000.00      150000.00
NE           5022000.00     1422000.00      222000.00
SO           4488000.00     2088000.00      288000.00
WE           5592000.00     2592000.00      192000.00

Example 3: TABLE REGION*CITYSIZE, POP*SUM ;

                          POP
                          SUM
REGION   CITYSIZE
NC       L            3750000.00
         M             750000.00
         S             150000.00
NE       L            5022000.00
         M            1422000.00
         S             222000.00
SO       L            4488000.00
         M            2088000.00
         S             288000.00
WE       L            5592000.00
         M            2592000.00
         S             192000.00

Example 5: TABLE PRODUCT, REGION CITYSIZE, SALETYPE*(QUANTITY AMOUNT) ;

PRODUCT A
                              SALETYPE
                      R                       W
              QUANTITY     AMOUNT     QUANTITY     AMOUNT
                SUM         SUM         SUM         SUM
REGION
NC             1250.00    31250.00     1250.00    25000.00
NE             1600.00    40000.00     1600.00    32000.00
SO             1880.00    47000.00     1880.00    37600.00
WE             1840.00    46000.00     1840.00    36800.00
CITYSIZE
L              3190.00    79750.00     3190.00    63800.00
M              2440.00    61000.00     2440.00    48800.00
S               940.00    23500.00      940.00    18800.00


PRODUCT A
                              SALETYPE
                      R                       W
              QUANTITY     AMOUNT     QUANTITY     AMOUNT
                SUM         SUM         SUM         SUM
REGION
NC             1295.00    32375.00     1295.00    25900.00