Notes on Multivariate Data in Statistical Analysis | STAT 200, Study notes of Statistics

Material Type: Notes; Class: Statistical Analysis; Subject: Statistics; University: University of Illinois - Urbana-Champaign; Term: Unknown 1989;

Uploaded on 03/10/2009

MULTIVARIATE DATA
Looking at multivariate data is not much different from examining bivariate data. This
section explores how some familiar R functions extend to multivariate data.
n-way Contingency Tables
These are for categorical data and are no different from the contingency tables in Chapter 3,
except that the table has more dimensions (variables). The text covers this well on its
own. The function ftable() is particularly useful because it flattens a multi-way table
into a single two-dimensional display.
> library(MASS); data(Cars93); attach(Cars93)
> data.frame(Make, MPG.highway, Price)
Using the cut() function, we can divide numeric data into intervals.
> price = cut(Price,c(0,12,20,max(Price)))
> levels(price)=c("cheap","okay","expensive")
> mpg = cut(MPG.highway,c(0,20,30,max(MPG.highway)))
> levels(mpg) = c("poor","decent","excellent")
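
As a hedged aside, cut() can also assign the labels in the same call through its labels argument, which avoids the separate levels() step. The sketch below assumes the same break points as above and refers to the column as Cars93$Price rather than relying on attach(); the name price2 is only illustrative.

```r
library(MASS)  # Cars93 ships with the MASS package

# Same break points as above; intervals are right-closed by default,
# so the bins are (0,12], (12,20], and (20,max].
price2 <- cut(Cars93$Price,
              breaks = c(0, 12, 20, max(Cars93$Price)),
              labels = c("cheap", "okay", "expensive"))

table(price2)  # counts of cars in each price band
```

Supplying labels inside cut() keeps the break points and their names side by side, which makes a mismatch between the two easier to spot.
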
Let’s look at some tables. Are there any apparent patterns in the 3-way contingency table?
> table(Type)
> table(price,Type)
> table(price,Type,mpg)
> ftable(price,Type,mpg)
> ftable(price,Type,mpg, col.vars=c("price","mpg"))
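
A minimal sketch of how row.vars and col.vars control the layout of ftable() follows. It rebuilds the two factors without attach() so that it runs on its own; the object names tab and ft are illustrative, not part of the original notes.

```r
library(MASS)

price <- cut(Cars93$Price, c(0, 12, 20, max(Cars93$Price)),
             labels = c("cheap", "okay", "expensive"))
mpg <- cut(Cars93$MPG.highway, c(0, 20, 30, max(Cars93$MPG.highway)),
           labels = c("poor", "decent", "excellent"))

# Name the dimensions explicitly so they can be referred to by name.
tab <- table(price = price, Type = Cars93$Type, mpg = mpg)

# Type runs down the rows; each price/mpg combination gets a column.
ft <- ftable(tab, row.vars = "Type", col.vars = c("price", "mpg"))
ft
```

Moving a variable between row.vars and col.vars changes only the layout, not the counts, so it is worth trying a few arrangements when looking for patterns.
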
We can even use our new variable price to make barplots.
> barplot(table(price,Type),beside=TRUE) # the price by different types
> barplot(table(Type,price),beside=TRUE) # type by different prices
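
Grouped barplots can be hard to read without a legend. As a sketch, barplot() accepts legend.text = TRUE, which labels the bars using the row names of the table. The axis labels and the names counts and mids are illustrative choices.

```r
library(MASS)

price <- cut(Cars93$Price, c(0, 12, 20, max(Cars93$Price)),
             labels = c("cheap", "okay", "expensive"))
counts <- table(price, Cars93$Type)  # rows: price band, columns: car type

# beside = TRUE draws one bar per cell, grouped by column (car type).
# barplot() returns the bar midpoints, which can be used to add annotations.
mids <- barplot(counts, beside = TRUE, legend.text = TRUE,
                xlab = "Type", ylab = "Number of cars")
```
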
Independent Samples
As discussed earlier, sometimes we are interested in the same variable measured on
several independent samples. The text gives an excellent example in which each sample
has its own variable name. But what about the case where a second variable records which
sample each observation belongs to? Look at the PlantGrowth dataset.
> data(PlantGrowth)
> PlantGrowth[1:5, ]
There are three groups: a control and two treatments. For each group, weights are recorded. The data
are stored this way, with a weight and a group recorded for each plant. However, you may want
to draw boxplots of the data broken down by group. How do we do this?

A brute force way is to do as follows for each value of the variable group:

> attach(PlantGrowth)
> weight.ctrl = weight[group == "ctrl"] # similar for trt1 and trt2
> detach(PlantGrowth)
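
Written out in full and without attach(), the brute-force approach might look like the sketch below. The level names ctrl, trt1 and trt2 are those of the group factor in PlantGrowth; the three weight.* names simply follow the pattern used above.

```r
# Pull out the weights for each level of group by logical indexing.
weight.ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
weight.trt1 <- PlantGrowth$weight[PlantGrowth$group == "trt1"]
weight.trt2 <- PlantGrowth$weight[PlantGrowth$group == "trt2"]

# Side-by-side boxplots, one per group.
boxplot(weight.ctrl, weight.trt1, weight.trt2,
        names = c("ctrl", "trt1", "trt2"))
```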

However, the unstack() function will do this all at once for us. If the data frame is laid out as
values followed by a grouping factor, as PlantGrowth is, it will create a data frame with one
variable for each level of the factor.

> pg = unstack(PlantGrowth)
> boxplot(pg)
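
To see what unstack() produced, it can help to inspect the result before plotting. The sketch below also shows the formula interface of boxplot(), a different route to the same picture that skips the reshaping step altogether.

```r
pg <- unstack(PlantGrowth)   # with no formula given, uses weight ~ group
str(pg)                      # a data frame with columns ctrl, trt1, trt2
sapply(pg, median)           # one median per group

# Alternative: let boxplot() do the grouping via a formula.
boxplot(weight ~ group, data = PlantGrowth)
```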

Which group had the highest yield? Which group seems to have had the lowest yield?

Let’s try this again with a dataset that has more than two variables and does not have an equal
number of subjects at each level. The twins dataset (from the UsingR package) has three distinct Social groups.

> library(UsingR) # provides the twins dataset
> bio = unstack(twins, Biological~Social)
> attach(bio)
> boxplot(bio)
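
When the groups differ in size, unstack() cannot build a rectangular data frame and returns a list instead. The sketch below illustrates this with a small made-up data frame; toy, score and grp are hypothetical names and values, not taken from twins.

```r
# Hypothetical scores: group "a" has 3 values, "b" has 2, "c" has 4.
toy <- data.frame(
  score = c(10, 12, 11, 20, 22, 30, 31, 29, 33),
  grp   = factor(c("a", "a", "a", "b", "b", "c", "c", "c", "c"))
)

res <- unstack(toy, score ~ grp)
class(res)     # a list, because the groups differ in size
lengths(res)   # number of observations in each group
boxplot(res)   # boxplot() accepts a list of numeric vectors
```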

Which social class seems to have the highest IQ scores? The lowest IQ scores? Which class has
the lowest median?

Now if we want to put these all back together, just use stack.

> stack(bio)
> stack(pg)
> stack(pg, select = -ctrl)
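
As a quick check of the round trip, the sketch below unstacks PlantGrowth and stacks it again. The result has two columns, values and ind, rather than the original names weight and group; the object names long and treated are illustrative.

```r
pg <- unstack(PlantGrowth)
long <- stack(pg)            # columns: values (numeric) and ind (factor)
head(long)

# Drop the control group while stacking.
treated <- stack(pg, select = -ctrl)
table(treated$ind)           # only trt1 and trt2 remain
```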

Multivariate Comparisons

Sometimes we may wish to look at many variables at once to find which ones have stronger
relationships than others. Both pairs() and plot(), when given a data frame, will draw a scatterplot matrix.

> EW = read.table("http://www.stat.uiuc.edu/~dunger/EW2000.txt", header=TRUE)
> pairs(EW) # We don’t really need to see all of these.
> attach(EW)
> plot(data.frame(weight,squat,BP,deadlift,total))

Which variables appear to be linearly related?

Which pair of variables has the highest correlation coefficient? The lowest?

> cor(data.frame(weight,squat,BP,deadlift,total))
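
In case the EW2000 file is not reachable, here is a sketch of the same workflow on the built-in mtcars data, used purely as a stand-in. It draws the scatterplot matrix, computes the correlation matrix, and then picks out the pair of variables with the largest absolute correlation. The column selection and the names num, r, off, pos and pair are illustrative assumptions.

```r
num <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

pairs(num)                # scatterplot matrix of the five variables
r <- cor(num)             # 5 x 5 correlation matrix
round(r, 2)

# Ignore the diagonal (each variable correlates perfectly with itself).
off <- r
diag(off) <- NA

# Row and column of the largest absolute off-diagonal correlation.
pos  <- which(abs(off) == max(abs(off), na.rm = TRUE), arr.ind = TRUE)[1, ]
pair <- rownames(r)[pos]
pair
```

Replacing max with min in the same expression gives the most weakly related pair.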