Lecture Slides on Exploratory Data Analysis | STAT 371, Study notes of Statistics

University of Wisconsin (UW) - Madison Statistics

Material Type: Notes; Class: Introductory Applied Statistics for the Life Sciences; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Spring 2006;

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009


Exploratory data analysis (Chapter 2)

Cécile Ané

Stat 371

Spring 2006


Outline

1. Categorical data
2. Numerical data
   - Displays
   - Numerical summaries

R demo and Bar plots

[Figure: bar plot of blood type (categories A, AB, B, O, NA's), 2005 survey]

Displays: Stem-leaf display

Milk data: milk yields (lbs/day) were collected from a particular herd on a given day. Data: 44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44.

stem(milk)

  The decimal point is 1 digit(s) to the right of the |

  1 | 9
  2 | 36
  3 | 024779
  4 | 1446
  5 | 5

- Look for the minimum and maximum, then decide on a precision and round all data to that same precision.
- Last digit: leaf; any digit before it: stem.
- 1 observation = 1 leaf.
- Leaves may then be ordered.
- Provides complete information.
- Pretty easy to do.
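The stem and leaf rule above can be sketched by hand in R. This is only an illustration, assuming the milk data are kept as whole numbers so that the tens digit is the stem and the units digit is the leaf; the variable names (`stems`, `leaves`, `display`) are made up for this example.

```r
# Milk yields (lbs/day) from the slide
milk <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

# Stem = tens digit, leaf = units digit (the data are already whole numbers)
stems  <- milk %/% 10
leaves <- milk %% 10

# Group the leaves by stem and order them within each stem
display <- tapply(leaves, stems, function(l) paste(sort(l), collapse = ""))

# Print one line per stem, in the form "stem | leaves"
cat(paste(names(display), "|", display), sep = "\n")

# Built-in display of the same data
stem(milk)
```

Each observation contributes exactly one leaf, so the number of leaves equals the number of observations.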

Displays: Histograms

- Rotate the histogram: it looks like a stem-leaf display.
- No single histogram. Lots of them! Rules are somewhat arbitrary.
- Histograms are useful with larger data sets, stem-leaf displays with smaller data sets.
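To see why there is no single histogram, one can count the same data with two different sets of class boundaries. The sketch below is only an illustration: the two sets of breaks are arbitrary choices, not the ones used in the lecture, and `plot = FALSE` is used so that only the counts are computed.

```r
# Milk yields (lbs/day) from the slide
milk <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

# Classes of width 10, closed on the left: [10,20), [20,30), ...
# These match the stems of the stem-leaf display.
wide <- hist(milk, breaks = seq(10, 60, by = 10), right = FALSE, plot = FALSE)

# Classes of width 5: same data, different picture
narrow <- hist(milk, breaks = seq(15, 60, by = 5), right = FALSE, plot = FALSE)

wide$counts    # one count per class of width 10
narrow$counts  # one count per class of width 5
```

Dropping `plot = FALSE` draws the two histograms, which makes the effect of the class width visible.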

Measures of location: Sample mean

Milk yield data: 44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44.
$y_1 = 44$, $y_2 = 55$, ..., $y_{14} = 44$.

Sample mean

$$\bar{y} = \frac{44 + 55 + \cdots + 44}{14} = \frac{y_1 + y_2 + \cdots + y_{14}}{14}$$

$$\bar{y} = \frac{1}{n}\,(y_1 + y_2 + \cdots + y_n) = \frac{1}{n}\sum_{i=1}^{n} y_i$$

mean(milk)
[1] 36.21429

Here $\bar{y} = 36.2$ lbs/day.
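The sample mean formula can be checked directly against the built-in function. This is a minimal sketch; `n` and `ybar` are names chosen here to mirror the symbols in the formula.

```r
# Milk yields (lbs/day) from the slide
milk <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

n    <- length(milk)   # number of observations
ybar <- sum(milk) / n  # (y_1 + ... + y_n) / n

ybar
all.equal(ybar, mean(milk))  # the built-in mean agrees with the formula
```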

Measures of location: Mode

Mode: most common value. More interesting for discrete data, with a small number of possible values and a large number of observations.

Example: number of brothers.

  0 | 0000000000000000000000000000000
  1 | 00000000000000000000000000000000000000000000000000
  2 | 000000000000000
  3 | 00000
  4 |
  5 |
  6 |
  7 | 0

Mode = 1 (brother).
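Base R has no function that returns the statistical mode, so a common approach is to tabulate the values and pick the most frequent one. The sketch below rebuilds the brothers data from the display, assuming each "0" leaf stands for one student; the leaf strings are copied from the display and `leaf_strings`, `brothers` and `mode_value` are names made up for this example.

```r
# Leaves copied from the display; the name of each entry is the stem
# (number of brothers) and each "0" is assumed to be one student.
leaf_strings <- c(
  "0" = "0000000000000000000000000000000",
  "1" = "00000000000000000000000000000000000000000000000000",
  "2" = "000000000000000",
  "3" = "00000",
  "4" = "", "5" = "", "6" = "",
  "7" = "0"
)

# Number of students for each number of brothers
counts <- nchar(leaf_strings)

# Expand back to one value per student
brothers <- rep(as.integer(names(counts)), times = counts)

# Tabulate and take the most frequent value
freq <- table(brothers)
mode_value <- as.integer(names(freq)[which.max(freq)])
mode_value
```

Note that `which.max` returns only the first maximum, so ties between two equally common values would need separate handling.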

Measures of location: Quantiles and percentiles

25th percentile = 0.25 quantile = value such that 1/4 of the observations are below and 3/4 are above.
$p$ quantile: value such that (about) a proportion $p$ of the observations are below and (about) $1 - p$ are above.
Median is a special case (why?)
Example: 6 9 11 17 19 23 26 26

First quartile $Q_1$: median of those values below the median.
Third quartile $Q_3$: median of those values above the median.

Milk yield (sorted):

19 23 26 30 32 34 37 37 39 41 44 44 46 55
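The quartile definition above (median of the values below and above the median) can be sketched as a small function. The function name `quartiles` is hypothetical, and the sketch assumes at least two observations. R's own `quantile()` interpolates between observations by default, so its quartiles can differ slightly from this definition.

```r
# Quartiles as defined on the slide: medians of the lower and upper halves.
# Assumes length(y) >= 2.
quartiles <- function(y) {
  y <- sort(y)
  n <- length(y)
  lower <- y[seq_len(floor(n / 2))]    # values below the median position
  upper <- y[(ceiling(n / 2) + 1):n]   # values above the median position
  c(Q1 = median(lower), Median = median(y), Q3 = median(upper))
}

slide_example <- c(6, 9, 11, 17, 19, 23, 26, 26)
milk <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

quartiles(slide_example)
quartiles(milk)

# Default interpolated quantiles, shown for comparison
quantile(milk, c(0.25, 0.5, 0.75))
```

When the number of observations is odd, the middle value is left out of both halves in this sketch.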

Display: Boxplot

fivenum(milk)
summary(milk)
boxplot(milk)

[Figure: boxplot of the milk yield data (axis from 20 to 55), with Min, Q1, Median, Q3 and Max labelled]

Range: 36

IQR: 14
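The range and IQR quoted above can be recovered from the five-number summary. This is a sketch using `fivenum()`, whose quartiles (hinges) match the slide's values for these data; the built-in `IQR()` function is based on interpolated quantiles and may give a different number.

```r
# Milk yields (lbs/day) from the slide
milk <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(milk)
names(five) <- c("Min", "Q1", "Median", "Q3", "Max")
five

data_range <- unname(five["Max"] - five["Min"])  # maximum minus minimum
iqr        <- unname(five["Q3"] - five["Q1"])    # interquartile range from the hinges

data_range
iqr
```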

boxplot(Height ~ Sex)

Display: Boxplot

No fence: whiskers extend to the minimum and maximum.

With fences (modified boxplot):
- Observations outside the fences are drawn as points.
- Whiskers cannot go beyond the fences.
- Fence = 1.5 IQR.

Milk example: IQR = 14. Fences are 1.5 × 14 = 21 below $Q_1$ and above $Q_3$, i.e. 30 − 21 = 9 and 44 + 21 = 65. Here the smallest data point (min) = 19 > 9 and the largest (max) = 55 < 65: no outlier.
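The fence calculation for the milk example can be reproduced as follows. This is a minimal sketch that takes the quartiles from `fivenum()`; `lower_fence`, `upper_fence` and `outliers` are names chosen for this illustration.

```r
# Milk yields (lbs/day) from the slide
milk <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

# Quartiles (hinges) are the 2nd and 4th entries of the five-number summary
hinges <- fivenum(milk)[c(2, 4)]
iqr    <- diff(hinges)

# Fences sit 1.5 IQR below Q1 and 1.5 IQR above Q3
lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr

# Observations outside the fences would be drawn as separate points
outliers <- milk[milk < lower_fence | milk > upper_fence]

c(lower = lower_fence, upper = upper_fence)
outliers

# boxplot.stats() applies the same 1.5 IQR rule by default
boxplot.stats(milk)$out
```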

Measures of spread: Standard deviation

Recall $y_1$ = first observation, ..., $y_n$ = last observation.
Deviation from the mean: $y_i - \bar{y}$.
Ex: the first cow has deviation 44 − 36.2 = +7.8, the cow with data 19 has deviation 19 − 36.2 = −17.2.
Variance: $s^2 \geq 0$ always!

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{(y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n-1}$$

Preferred formula for hand calculation:

$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\,\bar{y}^2\right)$$

Here we get $s^2 = 95.26$ lb$^2$.
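Both variance formulas can be checked numerically against R's `var()`. This is a sketch for illustration; `s2_def` and `s2_hand` are names made up here for the definition and the hand-calculation formula.

```r
# Milk yields (lbs/day) from the slide
milk <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

n    <- length(milk)
ybar <- mean(milk)

# Definition: sum of squared deviations from the mean, divided by n - 1
s2_def <- sum((milk - ybar)^2) / (n - 1)

# Hand-calculation formula: (sum of squares - n * mean^2) / (n - 1)
s2_hand <- (sum(milk^2) - n * ybar^2) / (n - 1)

c(definition = s2_def, hand = s2_hand, builtin = var(milk))
```

The hand formula needs the unrounded mean: using a rounded value such as 36.2 for $\bar{y}$ changes the result noticeably, because two large numbers are being subtracted.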

Measures of spread: Standard deviation

Standard deviation: $s = \sqrt{\text{variance}} = \sqrt{s^2}$ is now in original units. $s$ is the typical deviation. Here, $s = 9.8$ lbs.

var(milk)
[1] 95.25824
sd(milk)
[1] 9.760033

[Figure: dot plot of the milk yield data (axis from 20 to 50), with the mean and the points one and two standard deviations (sd = 9.8) away from it marked]

The empirical rule: for most "mound-shaped" distributions,
- about 68% of observations lie within 1 standard deviation of the mean (here: 9/14 = 64%),
- about 95% lie within 2 s.d. of the mean (here: 100%),
- about 99% lie within 3 s.d. of the mean (here: 100%).
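The "here" percentages in the empirical rule can be reproduced by counting how many observations fall within 1, 2 and 3 standard deviations of the mean. This is a minimal sketch; `within` is a name chosen for this illustration.

```r
# Milk yields (lbs/day) from the slide
milk <- c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

ybar <- mean(milk)
s    <- sd(milk)

# Proportion of observations within k standard deviations of the mean, k = 1, 2, 3
within <- sapply(1:3, function(k) mean(abs(milk - ybar) <= k * s))
names(within) <- paste0("within_", 1:3, "_sd")

round(100 * within)  # percentages to compare with 68%, 95%, 99%
```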
