Statistical Analysis for Categorical and Quantitative Variables: Lecture 2, Study notes of Statistics

An introduction to the statistical analysis of categorical and quantitative variables. It covers graphical displays, such as bar graphs, pie charts, stem-and-leaf plots, and histograms, that show the distribution of data, with examples analyzing categorical data on student majors and quantitative data on shark lengths.

Typology: Study notes (pre-2010). Uploaded on 03/10/2009 by koofers-user-4j2.

STAT 200, S1 Lecture 2
P. 6-21
Categorical Variables

Graphical displays let us see the distribution of a variable:

• Bar graphs

• Pie charts

To find the distribution of the variable:

1. List the categories.

2. Give the count or percent of individuals in each category.

Example:

You are interested in studying the distribution of majors among the 400
students enrolled in an undergraduate program at a small university.
The following data are provided.

Major              Number of Students   Percent of Students
Math               65                   16.25%
Stat               20                   5%
Engineering        250                  62.5%
Health Sciences    65                   16.25%

[R commands: pages 32-37 of the R book]

numstudent <- c(65, 20, 250, 65)                         # number of students in each major
names(numstudent) <- c("math", "stat", "eng", "health")  # category labels
barplot(numstudent)                                      # bar graph of the counts
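
Step 2 of the recipe (count or percent in each category) can also be reproduced in R. A minimal sketch that redefines the `numstudent` counts so it runs on its own; the data frame name `dist` is only an illustrative choice:

```r
# Counts of majors from the example (400 students in total)
numstudent <- c(65, 20, 250, 65)
names(numstudent) <- c("math", "stat", "eng", "health")

# Distribution: each category with its count and its percent of the total
dist <- data.frame(
  major   = names(numstudent),
  count   = unname(numstudent),
  percent = unname(100 * numstudent / sum(numstudent))
)
print(dist)
```

The percent column reproduces the table: 65/400 = 16.25%, 20/400 = 5% and 250/400 = 62.5%.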


barplot(numstudent, ylab="frequency")  # same bar graph, with the y-axis labeled

pie(numstudent)  # pie chart of the same counts
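
A pie chart is easier to read when each slice shows its percent. A sketch assuming the same `numstudent` counts; the label format `major (percent%)` is an illustrative choice, not something from the notes:

```r
numstudent <- c(65, 20, 250, 65)
names(numstudent) <- c("math", "stat", "eng", "health")

# Percent of the 400 students in each major, rounded to 2 decimals
pct <- round(100 * prop.table(numstudent), 2)

# Slice labels such as "eng (62.5%)" (label format is an illustrative choice)
lbls <- paste0(names(numstudent), " (", pct, "%)")
pie(numstudent, labels = lbls)

# Bars of percents instead of counts: the heights change,
# but the relative sizes of the bars stay the same
barplot(pct, ylab = "percent")
```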

Quantitative Variables

Distribution:

• What values a variable takes

• How often the variable takes those values (frequency)
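
In R, `table()` lists the values a variable takes together with how often each one occurs. A sketch using the 44 shark lengths printed in these notes, stored under the hypothetical name `shark_length` (rather than `length`, to avoid confusion with R's built-in `length()` function):

```r
# The 44 shark lengths from these notes
# (shark_length is a hypothetical name for the `length` column of Shark)
shark_length <- c(18.7, 12.3, 18.6, 16.4, 15.7, 18.3, 14.6, 15.8, 14.9, 17.6,
                  12.1, 16.4, 16.7, 17.8, 16.2, 12.6, 17.8, 13.8, 12.2, 15.2,
                  14.7, 12.4, 13.2, 15.8, 14.3, 16.6,  9.4, 18.2, 13.2, 13.6,
                  15.3, 16.1, 13.5, 19.1, 16.2, 22.8, 16.8, 13.6, 13.2, 15.7,
                  19.7, 18.7, 13.2, 16.8)

# What values the variable takes (the names) and how often (the counts)
freq <- table(shark_length)
print(freq)
```

With so many distinct values most counts are 1, which is why displays that group values, such as stem-and-leaf plots and histograms, give a clearer picture of the distribution.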

Graphical Displays:

• Stem-and-leaf plots

• Histograms

> attach(Shark)
> length
 [1] 18.7 12.3 18.6 16.4 15.7 18.3 14.6 15.8 14.9 17.6 12.1 16.4 16.7 17.8 16.2 12.6 17.8 13.8 12.2 15.2 14.7 12.4 13.2 15.8 14.3 16.6  9.4 18.2 13.2 13.6
[31] 15.3 16.1 13.5 19.1 16.2 22.8 16.8 13.6 13.2 15.7 19.7 18.7 13.2 16.8

> sort(length)
 [1]  9.4 12.1 12.2 12.3 12.4 12.6 13.2 13.2 13.2 13.2 13.5 13.6 13.6 13.8 14.3
[16] 14.6 14.7 14.9 15.2 15.3 15.7 15.7 15.8 15.8 16.1 16.2 16.2 16.4 16.4 16.6
[31] 16.7 16.8 16.8 17.6 17.8 17.8 18.2 18.3 18.6 18.7 18.7 19.1 19.7 22.8

Stem-and-leaf plot

> stem(length, scale=2)

  The decimal point is at the |

   9 | 4
  10 |
  11 |
  12 | 12346
  13 | 22225668
  14 | 3679
  15 | 237788
  16 | 122446788
  17 | 688
  18 | 23677
  19 | 17
  20 |
  21 |
  22 | 8
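
The stem-and-leaf plot can be rebuilt by hand: each value is split into a stem (the whole-number part) and a leaf (the tenths digit), and the leaves are listed in order beside their stem. A minimal sketch of that idea, not R's internal `stem()` algorithm; `shark_length` is again a hypothetical name for the `length` column:

```r
# The 44 shark lengths from these notes (hypothetical name shark_length)
shark_length <- c(18.7, 12.3, 18.6, 16.4, 15.7, 18.3, 14.6, 15.8, 14.9, 17.6,
                  12.1, 16.4, 16.7, 17.8, 16.2, 12.6, 17.8, 13.8, 12.2, 15.2,
                  14.7, 12.4, 13.2, 15.8, 14.3, 16.6,  9.4, 18.2, 13.2, 13.6,
                  15.3, 16.1, 13.5, 19.1, 16.2, 22.8, 16.8, 13.6, 13.2, 15.7,
                  19.7, 18.7, 13.2, 16.8)

# Work in whole tenths so floating-point error cannot change a digit
tenths <- round(sort(shark_length) * 10)   # e.g. 13.2 -> 132
stems  <- tenths %/% 10                     # whole-number part, e.g. 13
leaves <- tenths %% 10                      # tenths digit, e.g. 2

# One row per stem from smallest to largest, keeping empty stems
all_stems <- seq(min(stems), max(stems))
rows <- vapply(all_stems,
               function(s) paste(leaves[stems == s], collapse = ""),
               character(1))
cat(sprintf("%2d | %s", as.integer(all_stems), rows), sep = "\n")
```

Its rows match the `stem()` output, for example `13 | 22225668`, including the empty stems 10, 11, 20 and 21.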

Histograms

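Steps to constructing a histogram:

1. Order the data.

2. Divide the data into intervals of equal width. To choose the interval width:

   interval width ≈ range / number of intervals (use 5 or 6 intervals),
   where range = largest value - smallest value

3. Count the number of observations in each interval.

4. Graph: draw one bar over each interval, with height equal to its count.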

> hist(length)
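
The four steps can also be carried out explicitly before calling `hist()`. A sketch under stated assumptions: 5 intervals, with the width 13.4 / 5 = 2.68 rounded up to 3 and the first interval starting at 9; both roundings are a choice made here to get convenient breakpoints, not a rule from the notes:

```r
# The 44 shark lengths from these notes (hypothetical name shark_length)
shark_length <- c(18.7, 12.3, 18.6, 16.4, 15.7, 18.3, 14.6, 15.8, 14.9, 17.6,
                  12.1, 16.4, 16.7, 17.8, 16.2, 12.6, 17.8, 13.8, 12.2, 15.2,
                  14.7, 12.4, 13.2, 15.8, 14.3, 16.6,  9.4, 18.2, 13.2, 13.6,
                  15.3, 16.1, 13.5, 19.1, 16.2, 22.8, 16.8, 13.6, 13.2, 15.7,
                  19.7, 18.7, 13.2, 16.8)

# Step 1: order the data
x <- sort(shark_length)

# Step 2: equal-width intervals, width about range / number of intervals
k      <- 5                                   # number of intervals (5 or 6)
rng    <- max(x) - min(x)                     # 22.8 - 9.4 = 13.4
width  <- ceiling(rng / k)                    # 2.68 rounded up to 3 (assumed convenience)
breaks <- seq(floor(min(x)), by = width, length.out = k + 1)  # 9 12 15 18 21 24
stopifnot(max(x) <= max(breaks))              # the intervals must cover all the data

# Step 3: count observations per interval (right-closed, as hist() uses by default)
counts <- table(cut(x, breaks = breaks))
print(counts)

# Step 4: graph
h <- hist(x, breaks = breaks, main = "Shark lengths", xlab = "length")
```

With these breaks the counts are 1, 17, 18, 7 and 1. No length falls exactly on a breakpoint, so whether intervals are closed on the left or the right does not change the counts here.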

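Interpreting histograms:

Overall pattern

  • Shape: how many peaks (unimodal = single peak, bimodal, multimodal) and whether it is symmetric
  • Center: the midpoint, with roughly half the observations on each side
  • Spread: the range, from the smallest to the largest value
  • Outliers: individual values that fall outside the overall pattern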
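
For the shark data, center and spread can be computed directly. A sketch that takes the median as the "midpoint" (an assumption about what the notes intend at this stage) and reports the range as the smallest and largest values:

```r
# The 44 shark lengths from these notes (hypothetical name shark_length)
shark_length <- c(18.7, 12.3, 18.6, 16.4, 15.7, 18.3, 14.6, 15.8, 14.9, 17.6,
                  12.1, 16.4, 16.7, 17.8, 16.2, 12.6, 17.8, 13.8, 12.2, 15.2,
                  14.7, 12.4, 13.2, 15.8, 14.3, 16.6,  9.4, 18.2, 13.2, 13.6,
                  15.3, 16.1, 13.5, 19.1, 16.2, 22.8, 16.8, 13.6, 13.2, 15.7,
                  19.7, 18.7, 13.2, 16.8)

center <- median(shark_length)   # middle of the 44 sorted values (assumed meaning of "center")
spread <- range(shark_length)    # smallest and largest values
width  <- diff(spread)           # the range as a single number

cat("center:", center, "\n")
cat("range:", spread, "(width", width, ")\n")
```

With 44 values the median is the average of the 22nd and 23rd sorted values, (15.7 + 15.8) / 2 = 15.75, and the range runs from 9.4 to 22.8. In the stem-and-leaf plot, 22.8 sits apart from the rest of the data (stems 20 and 21 are empty), which makes it a candidate outlier.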

Symmetric and skewed distributions:

  • Symmetric: the left and right sides of the histogram are approximately mirror images of each other
  • Skewed to the right: the right side of the histogram extends much farther out than the left side
  • Skewed to the left: the left side of the histogram extends much farther out than the right side
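
The three shapes are easiest to recognize side by side. A sketch using simulated data: `rnorm()` and `rexp()` are only convenient ways to generate roughly symmetric and skewed samples (they are not part of this lecture), the seed is arbitrary, and `tail_lengths` is a hypothetical helper that measures how far each side extends from the median:

```r
set.seed(1)                      # arbitrary seed, for reproducibility
symmetric    <- rnorm(1000)      # roughly mirror-image halves
right_skewed <- rexp(1000)       # long right tail
left_skewed  <- -rexp(1000)      # mirror image of a right-skewed sample: long left tail

par(mfrow = c(1, 3))             # three histograms in one row
hist(symmetric,    main = "Symmetric")
hist(right_skewed, main = "Skewed to right")
hist(left_skewed,  main = "Skewed to left")

# Hypothetical helper: distance from the median to each end of the data
tail_lengths <- function(x) {
  c(left = median(x) - min(x), right = max(x) - median(x))
}
print(tail_lengths(right_skewed))  # right side extends farther
print(tail_lengths(left_skewed))   # left side extends farther
```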