Statistics: Unlocking the Power of Data - Descriptive Statistics of One Variable, Slides of Descriptive statistics

University of Notre Dame Australia (UNDA), Descriptive statistics

Slides from the 'Statistics: Unlocking the Power of Data' course by Dr. Kari Lock Morgan. They cover the basics of descriptive statistics for one variable: frequency tables, proportions, and relative frequency tables for a categorical variable, and summary statistics and visualization methods for a quantitative variable.

Typology: Slides

2021/2022

Uploaded on 07/05/2022

gavin_99 🇦🇺


12/23/2012


Statistics: Unlocking the Power of Data Lock5

STAT 101

Dr. Kari Lock Morgan

9/6/12

Describing Data: One Variable

SECTIONS 2.1, 2.2, 2.3, 2.4

• One categorical variable (2.1)

• One quantitative variable (2.2, 2.3, 2.4)

The Big Picture

[Diagram with the labels: Population, Sample, Sampling, Statistical Inference, Descriptive Statistics]

Descriptive Statistics

• In order to make sense of data, we need ways to summarize and visualize it

• Summarizing and visualizing variables and relationships between two variables is often known as descriptive statistics (also known as exploratory data analysis)

• The type of summary statistics and visualization methods depends on the type of variable(s) being analyzed (categorical or quantitative)

One Categorical Variable

• A random sample of US adults in 2012 was surveyed regarding the type of cell phone owned

• Android? iPhone? Blackberry? Non-smartphone? No cell phone?

Frequency Table

R: table(x)

• A frequency table shows the number of cases that fall in each category:

Android            458
iPhone             437
Blackberry         141
Non Smartphone     924
No cell phone      293
Total             2253
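As a rough illustration of what a frequency table does, the sketch below counts categories in Python (the slides themselves use R's table(x)). The list `responses` is a hypothetical stand-in for the raw survey answers, built here only so that its counts match the table above; the actual raw data are not part of these slides.

```python
from collections import Counter

# Counts taken from the slide's frequency table.
counts_on_slide = {
    "Android": 458,
    "iPhone": 437,
    "Blackberry": 141,
    "Non Smartphone": 924,
    "No cell phone": 293,
}

# Hypothetical raw data: one recorded answer per surveyed adult,
# reconstructed from the counts purely for illustration.
responses = [phone for phone, n in counts_on_slide.items() for _ in range(n)]

# A frequency table is the number of cases in each category.
frequency = Counter(responses)
total = sum(frequency.values())

for phone, n in frequency.items():
    print(f"{phone:<15}{n:>5}")
print(f"{'Total':<15}{total:>5}")
```

Counting from raw answers rather than typing in the totals mirrors how the table would normally be produced from a data file.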

Proportion

• The proportion in a category is found by

$\text{proportion} = \dfrac{\text{number in category}}{\text{total sample size}}$

• Proportion for a sample: $\hat{p}$ (“p-hat”)

• Proportion for a population: $p$


Proportion

• What proportion of adults sampled do not own a cell phone?

Android            458
iPhone             437
Blackberry         141
Non Smartphone     924
No cell phone      293
Total             2253

$\hat{p} = \dfrac{293}{2253} \approx 0.13$, or 13%

• Proportions and percentages can be used interchangeably
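The proportion calculation can be applied to every category at once by dividing each count by the total sample size. The following Python sketch does this for the counts in the cell phone table; the variable names are illustrative choices, not something taken from the slides.

```python
# Counts taken from the slide's frequency table.
counts = {
    "Android": 458,
    "iPhone": 437,
    "Blackberry": 141,
    "Non Smartphone": 924,
    "No cell phone": 293,
}

n = sum(counts.values())  # total sample size

# Proportion in each category = number in category / total sample size.
proportions = {phone: count / n for phone, count in counts.items()}

# Sample proportion ("p-hat") of adults with no cell phone.
p_hat_no_phone = proportions["No cell phone"]

for phone, p in proportions.items():
    print(f"{phone:<15}{p:.3f}  ({100 * p:.1f}%)")
```

Because every case falls in exactly one category, the proportions add up to 1 (up to floating-point rounding).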

Relative Frequency Table

• A relative frequency table shows the proportion of cases that fall in each category

R: table(x)/length(x)

Android           0.203
iPhone            0.194
Blackberry        0.063
Non Smartphone    0.410
No cell phone     0.130

• All the numbers in a relative frequency table sum to 1

Bar Chart/Plot/Graph

• In a barplot, the height of the bar corresponds to the number of cases falling in each category

R: barchart(x)

Pie Chart

• In a pie chart, the relative area of each slice of the pie corresponds to the proportion in each category

R: pie(table(x))

StatKey

www.lock5stat.com/statkey

Summary: One Categorical Variable

• Summary Statistics
    • Proportion
    • Frequency table
    • Relative frequency table

• Visualization
    • Bar chart
    • Pie chart

Histogram vs Bar Chart

• A bar chart is for categorical data, and the x-axis has no numeric scale

• A histogram is for quantitative data, and the x-axis is numeric

• For a categorical variable, the number of bars equals the number of categories, and the number in each category is fixed

• For a quantitative variable, the number of bars in a histogram is up to you (or your software), and the appearance can differ with different numbers of bars
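To see why the number of bars matters for a histogram, the sketch below bins the same values into equal-width bins twice, once with 3 bins and once with 6. The data and the helper `histogram_counts` are hypothetical and written only for this illustration; plotting software typically chooses bin edges with its own rules.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall in each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value is placed in the last bin instead of a bin of its own.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical quantitative data with a long right tail.
values = [2, 3, 3, 4, 5, 5, 5, 6, 8, 12, 20]

three_bars = histogram_counts(values, 3)
six_bars = histogram_counts(values, 6)

print(three_bars)  # [8, 2, 1]
print(six_bars)    # [4, 4, 1, 1, 0, 1]
```

With 3 bars most of the data sit in a single tall bar, while 6 bars reveal a gap before the largest value, so the same data can look quite different.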

Shape

[Figure: example distributions labeled Symmetric, Right-Skewed (long right tail), and Left-Skewed]

Notation

• The sample size, the number of cases in the sample, is denoted by $n$

• We often let $x$ or $y$ stand for any variable, and $x_1, x_2, \ldots, x_n$ represent the $n$ values of the variable $x$

• $x_1 = 97.009$, $x_2 = 201.897$, $x_3 = 216.196$, …

Mean

• The mean or average of the data values is

$\text{mean} = \dfrac{\text{sum of all data values}}{\text{number of data values}}$

• Sample mean: $\bar{x}$

• Population mean: $\mu$ (“mu”)

R: mean(x)

Median

• The median, $m$, is the middle value when the data are ordered. If there is an even number of values, the median is the average of the two middle values.

• The median splits the data in half.

R: median(x)

[Figure: Measures of Center. Plot of World Gross (in millions) with the median ($m \approx 76.7$) and the mean ($\mu \approx 150.7$) marked; the mean is “pulled” in the direction of skewness]
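The definitions of the mean and the median translate directly into a few lines of code. This Python sketch follows the wording of the definitions (sum divided by count; middle value, or average of the two middle values); the data values are hypothetical and chosen only to exercise both the even and the odd case.

```python
def mean(values):
    """Sum of all data values divided by the number of data values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


even_data = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical, n = 8
odd_data = [7, 1, 3]                  # hypothetical, n = 3

print(mean(even_data))    # 3.875
print(median(even_data))  # 3.5
print(median(odd_data))   # 3
```

Python's standard `statistics` module provides `mean` and `median` functions that behave the same way on these inputs.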

Skewness and Center

• A distribution is left-skewed. Which measure of center would you expect to be higher?

a) Mean

b) Median

The mean will be pulled down towards the skewness (towards the long tail), so the median is expected to be higher.

Outlier

• An outlier is an observed value that is notably distinct from the other values in a dataset.

[Figure: Outliers. Plot of World Gross (in millions) with Harry Potter, Pirates of the Caribbean, and Transformers labeled]

Resistance

• A statistic is resistant if it is relatively unaffected by extreme values.

• The median is resistant while the mean is not.

                        Mean            Median
With Harry Potter       $150,742,300    $76,658,
Without Harry Potter    $141,889,900    $75,009,
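The same effect can be reproduced on a small made-up dataset. In the Python sketch below, `gross` is a hypothetical list of values (in millions) whose last entry is deliberately extreme; these are not the actual movie figures behind the slide.

```python
from statistics import mean, median

# Hypothetical values in millions; the last one is an extreme value.
gross = [40, 55, 60, 75, 80, 95, 110, 1300]
without_extreme = gross[:-1]

mean_shift = mean(gross) - mean(without_extreme)
median_shift = median(gross) - median(without_extreme)

print(f"mean:   {mean(gross):.1f} with, {mean(without_extreme):.1f} without")
print(f"median: {median(gross):.1f} with, {median(without_extreme):.1f} without")
```

Dropping the single extreme value moves the mean by more than 150 but the median by only 2.5, which is what resistance means in practice.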

Outliers

• When using statistics that are not resistant to outliers, stop and think about whether the outlier is a mistake

• If not, you have to decide whether the outlier is part of your population of interest or not

• Usually, for outliers that are not a mistake, it’s best to run the analysis twice, once with the outlier(s) and once without, to see how much the outlier(s) are affecting the results

Standard Deviation

• The standard deviation for a quantitative variable measures the spread of the data

• Sample standard deviation: $s$

• Population standard deviation: $\sigma$ (“sigma”)

R: sd(x)
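The slides do not spell out a formula for the standard deviation. The Python sketch below uses the usual sample formula, the square root of the sum of squared deviations from the mean divided by n − 1, which is also what `statistics.stdev` computes. The two datasets are hypothetical and share the same mean so that only their spread differs.

```python
import math
from statistics import stdev


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor (assumed formula, not from the slides)."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Two hypothetical datasets with the same mean (10) but different spread.
tight = [9, 10, 10, 11]
spread_out = [0, 5, 15, 20]

print(round(sample_sd(tight), 3))
print(round(sample_sd(spread_out), 3))
```

The more spread-out dataset has the larger standard deviation even though both datasets are centered at the same value.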

z-score

• A z-score puts values on a common scale

• A z-score is the number of standard deviations a value falls from the mean

• 95% of all z-scores fall between what two values? −2 and 2

• z-scores beyond −2 or 2 can be considered extreme

z-score

• Which is better, an ACT score of 28 or a combined SAT score of 2100?

• ACT: $\mu = 21$, $\sigma = 5$

• SAT: $\mu = 1500$, $\sigma = 325$

• Assume ACT and SAT scores have approximately bell-shaped distributions

a) ACT score of 28

b) SAT score of 2100

c) I don’t know

ACT: $z = \dfrac{28 - 21}{5} = \dfrac{7}{5} = 1.4$

SAT: $z = \dfrac{2100 - 1500}{325} = \dfrac{600}{325} \approx 1.85$
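The comparison can be checked with a short calculation. The Python sketch below computes each z-score as (value − mean) / standard deviation, using the means and standard deviations given for the ACT and SAT; the function name `z_score` is just an illustrative choice.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value falls from the mean."""
    return (value - mean) / sd


z_act = z_score(28, mean=21, sd=5)
z_sat = z_score(2100, mean=1500, sd=325)

print(f"ACT z-score: {z_act:.2f}")
print(f"SAT z-score: {z_sat:.2f}")
print("Higher relative to its own scale:", "SAT" if z_sat > z_act else "ACT")
```

Since both scores are now on a common scale, the larger z-score identifies the score that is further above its own mean, here the SAT score of 2100.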

Other Measures of Location

• Maximum = largest data value

• Minimum = smallest data value

• Quartiles:

$Q_1$ = median of the values below $m$

$Q_3$ = median of the values above $m$
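The quartile definitions above, together with the minimum, median, and maximum, can be sketched in Python as follows. One assumption is made explicit: "values below m" is read as the lower half of the ordered data by position (excluding the middle value when n is odd), and likewise for the upper half. Software such as R's summary(x) uses an interpolation rule instead, so its quartiles can differ slightly from this hand method.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2 :]  # values above the median position
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )


odd_n = [2, 4, 6, 8, 10, 12, 14]    # hypothetical data, n = 7
even_n = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical data, n = 8

print(five_number_summary(odd_n))   # (2, 4, 8, 12, 14)
print(five_number_summary(even_n))  # (1, 2.5, 4.5, 6.5, 8)
```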

Five Number Summary

• Five Number Summary: Min, $Q_1$, $m$, $Q_3$, Max

• These five values divide the ordered data into four parts, each containing 25% of the data

R: summary(x)

Five Number Summary

• The distribution of the number of hours spent studying each week is

a) Symmetric

b) Right-skewed

c) Left-skewed

d) Impossible to tell

summary(study_hours)
   Min. 1st Qu.  Median 3rd Qu.    Max.
   2.00   10.00   15.00   20.00     69.

Percentile

• The P-th percentile is the value which is greater than P% of the data

• We already used z-scores to determine whether an SAT score of 2100 or an ACT score of 28 is better

• We could also have used percentiles:
    • ACT score of 28: 91st percentile
    • SAT score of 2100: 97th percentile
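One way to arrive at percentiles like these is to treat each score distribution as exactly normal with the stated mean and standard deviation. That is an assumption made for this sketch only; the slides say the distributions are approximately bell-shaped and do not say how the quoted percentiles were obtained. The Python code uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Assumed normal models with the means and standard deviations from the slides.
act = NormalDist(mu=21, sigma=5)
sat = NormalDist(mu=1500, sigma=325)

# Percentage of scores falling below each value under the assumed model.
act_percentile = 100 * act.cdf(28)
sat_percentile = 100 * sat.cdf(2100)

print(f"ACT 28:   about {act_percentile:.1f}% of scores are lower")
print(f"SAT 2100: about {sat_percentile:.1f}% of scores are lower")
```

Under this model the results come out near 92% and 97%, in line with the percentiles quoted above and with the z-score comparison: the SAT score sits higher in its distribution.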

Five Number Summary

• Five Number Summary: Min, $Q_1$, $m$, $Q_3$, Max, with 25% of the data between each consecutive pair

• As percentiles: Min = 0th, $Q_1$ = 25th, $m$ = 50th, $Q_3$ = 75th, Max = 100th

Measures of Spread

• Range = Max – Min

• Interquartile Range (IQR) = $Q_3 - Q_1$

• Is the range resistant to outliers? a) Yes b) No

The range depends entirely on the most extreme values, so it is not resistant.

• Is the IQR resistant to outliers? a) Yes b) No

The IQR is based on the middle 50% of the data, which will not contain outliers, so it is resistant.
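A small experiment makes the difference concrete. In this Python sketch, `hours` is a hypothetical dataset and `hours_with_extreme` replaces its largest value with a far more extreme one; the quartiles are taken as medians of the lower and upper halves of the ordered data, which is one reading of the slides' definition rather than the rule R uses.

```python
from statistics import median


def value_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)


def iqr(values):
    """IQR = Q3 - Q1, with quartiles as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    return q3 - q1


# Hypothetical weekly study hours.
hours = [2, 8, 10, 12, 15, 17, 20, 22, 25]
# Same data with the largest value replaced by an extreme one.
hours_with_extreme = hours[:-1] + [69]

print(value_range(hours), value_range(hours_with_extreme))  # 23 67
print(iqr(hours), iqr(hours_with_extreme))                  # 12.0 12.0
```

The range nearly triples when one value becomes extreme, while the IQR does not change at all.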

Comparing Statistics

• Measures of Center:
    • Mean (not resistant)
    • Median (resistant)

• Measures of Spread:
    • Standard deviation (not resistant)
    • IQR (resistant)
    • Range (not resistant)

• Most often, we use the mean and the standard deviation, because they are calculated based on all the data values, so use all the available information

Outliers

• Outliers can be informally identified by looking at a plot, but one rule of thumb for identifying outliers is data values more than 1.5 IQRs beyond the quartiles

• A data value is an outlier if it is smaller than $Q_1 - 1.5(\text{IQR})$ or larger than $Q_3 + 1.5(\text{IQR})$
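The rule of thumb can be applied to the study-hours summary shown earlier, where the first quartile is 10 and the third quartile is 20. In the Python sketch below the function names are illustrative, and 69 is used as a stand-in for the maximum because the exact value is cut off in the summary output.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper cutoffs Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """True if value lies more than 1.5 IQRs beyond either quartile."""
    low, high = outlier_fences(q1, q3)
    return value < low or value > high


low, high = outlier_fences(q1=10, q3=20)
print(low, high)                      # -5.0 35.0

print(is_outlier(69, q1=10, q3=20))   # True: well above the upper cutoff
print(is_outlier(2, q1=10, q3=20))    # False: the minimum is above the lower cutoff
```

So under this rule the largest study-hours value counts as an outlier, while the smallest does not.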

Boxplot

[Figure: boxplot with $Q_1$, the median, and $Q_3$ marked; the box covers the middle 50% of the data, and outliers are labeled]

• Lines (“whiskers”) extend from each quartile to the most extreme value that is not an outlier

R: boxplot(x)

Boxplot

• Which boxplot goes with the histogram of waiting times for the bus?

[Figure: Histogram of Bus (x-axis: Bus, y-axis: Frequency) with three candidate boxplots (a), (b), (c)]

The data do not show any low outliers.
