Descriptive Statistics: Measures of Location and Variability, Slides of Probability and Statistics

University of North Carolina (UNC) - Charlotte Probability and Statistics

A chapter from a statistics textbook focusing on descriptive statistics. It covers the concepts of populations, samples, and processes, pictorial and tabular methods, measures of location (mean and median), and measures of variability (range and variance). The document also includes examples and exercises.

Typology: Slides

2023/2024

Uploaded on 01/25/2024

john-dow-2 🇺🇸

1 document

Chapter 1: Overview and Descriptive Statistics

Instructor: Shengwen Guo

Department of Mathematics and Statistics, UNC Charlotte

January 11, 2024

1. Populations, Samples, and Processes
2. Pictorial and Tabular Methods in Descriptive Statistics
3. Measures of Location
4. Measures of Variability

Sample: Constraints on time, money, and other scarce resources usually make a census impractical or infeasible. Instead, a subset of the population—a sample—is selected in some prescribed manner.

Example

Approximately 5.3 million passenger cars were sold in the U.S. in 2018. Suppose we randomly select a sample of these cars (say 200) and find the average price of those 200 cars.

Characteristics: We are usually interested only in certain characteristics of the objects in a population:

- salary of a software engineer
- gender of an engineering graduate
- price of a car

A characteristic may be categorical, e.g., gender, race, or hair color. A characteristic may be numerical, e.g., age = 23, height = 6 feet, or weight = 160 lb.

Variable: A variable is any characteristic whose value may change from one object to another in the population. We shall initially denote variables by lowercase letters from the end of our alphabet. Examples include

x = brand of calculator owned by a student
y = number of visits to a particular website during a specified period
z = braking distance of an automobile under specified conditions

Examples

- We might determine the type of transmission, automatic (A) or manual (M), on each of ten automobiles recently purchased at a certain dealership, resulting in the categorical data set:

  M A A A M A A M A A

- Our data set might consist of a (height, weight) pair for each basketball player on a team, with the first observation as (72, 168), the second as (75, 212), and so on.
- A research physician might determine the systolic blood pressure, diastolic blood pressure, and serum cholesterol level for each patient participating in a study.
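
As a small illustration of working with a categorical data set, the transmission data above can be tallied by category. This is only a sketch; the variable names are chosen here for illustration and are not part of the slides.

```python
from collections import Counter

# Transmission types for the ten automobiles, copied from the example above.
transmissions = "M A A A M A A M A A".split()

# Tally how many times each category occurs.
counts = Counter(transmissions)

for category, count in sorted(counts.items()):
    print(f"{category}: {count}")
```

The tally shows seven automatic and three manual transmissions among the ten automobiles.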

Branches of Statistics

Statistics has two branches: descriptive statistics and inferential statistics.

Descriptive statistics: An investigator who has collected data may wish simply to summarize and describe important features of the data. Some of these methods are graphical in nature, such as histograms, boxplots, and scatter plots. Other descriptive methods involve the calculation of numerical summary measures, such as means, standard deviations, and correlation coefficients.

Pictorial and Tabular Methods in Descriptive Statistics

Descriptive statistics can be divided into two general subject areas:

- Visual displays: stem-and-leaf display, histogram, dotplot.
- Numerical summary measures: mean, median, standard deviation.

Notations

- The number of observations in a single sample, that is, the sample size, will often be denoted by n.
- If two samples are simultaneously under consideration, either m and n or n_1 and n_2 can be used to denote the numbers of observations.
- Given a data set consisting of n observations on some variable x, the individual observations will be denoted by x_1, x_2, ..., x_n. The subscript bears no relation to the magnitude of a particular observation.

Stem-and-Leaf Displays

Consider a numerical data set x_1, x_2, x_3, ..., x_n. A quick way to obtain an informative visual representation of the data set is to construct a stem-and-leaf display.

Interpretation

The first observation in the top row of the display is 5.0, corresponding to a stem of 5 and leaf of 0, and the last observation at the bottom of the display is 10.6. Note that in the absence of a context, without the identification of stem and leaf digits in the display, we wouldn’t know whether the observation with stem 7 and leaf 9 was .79, 7.9, or 79. The leaves in each row are ordered from smallest to largest; this is commonly done by software packages but is not necessary if a display is created by hand. The display suggests that a typical or representative sleep time is in the stem 8L row, perhaps 8.1 or 8.2.
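
The construction of a stem-and-leaf display can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical, the values are assumed to be nonnegative, and the sleep times are invented for illustration (they are not the data behind the display discussed above). Unlike that display, this version does not split stems into low (L) and high (H) rows.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=0.1):
    """Return {stem: sorted list of leaf digits} for nonnegative values.

    Each value is scaled so that the leaf is its digit at the leaf_unit
    place; the remaining leading digits form the stem.
    """
    rows = defaultdict(list)
    for value in values:
        scaled = round(value / leaf_unit)   # e.g. 7.9 -> 79 when leaf_unit = 0.1
        stem, leaf = divmod(scaled, 10)
        rows[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(rows.items())}

# Hypothetical sleep times in hours (NOT the data from the slides).
sleep_hours = [5.0, 7.9, 8.1, 8.2, 8.1, 7.4, 9.3, 10.6, 6.5, 8.8]

display = stem_and_leaf(sleep_hours)
for stem, leaves in display.items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}")
print("Stem: ones digit(s)   Leaf: tenths digit")
```

Printing the stem and leaf units alongside the display addresses the ambiguity noted above: without them, a stem of 7 and a leaf of 9 could stand for .79, 7.9, or 79.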

Dotplots

- A dotplot is an attractive summary of numerical data when the data set is reasonably small or there are relatively few distinct data values.
- Each observation is represented by a dot above the corresponding location on a horizontal measurement scale.
- When a value occurs more than once, there is a dot for each occurrence, and these dots are stacked vertically. As with a stem-and-leaf display, a dotplot gives information about location, spread, extremes, and gaps.
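
The stacking rule described above can be illustrated with a rough text-based dotplot. This is a sketch only: the function name `text_dotplot` is hypothetical, it assumes small integer values so that each value fits in a three-character column, and the data are invented for illustration.

```python
from collections import Counter

def text_dotplot(values):
    """Return the lines of a text dotplot with dots stacked above each value."""
    counts = Counter(values)
    low, high = min(values), max(values)
    scale = range(low, high + 1)          # integer measurement scale
    lines = []
    # Build rows from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(" . " if counts[x] >= level else "   " for x in scale)
        lines.append(row.rstrip())
    # Final line: the horizontal measurement scale itself.
    lines.append("".join(f"{x:^3}" for x in scale).rstrip())
    return lines

# Hypothetical small integer data set (not taken from the slides).
data = [2, 3, 3, 5, 5, 5, 6, 9]
print("\n".join(text_dotplot(data)))
```

In the printed result, the stack of three dots above 5 marks the most frequent value, and the empty columns above 7 and 8 show a gap before the largest observation.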

Figure 1.6 shows a dotplot of the data. There is clearly a great deal of state-to-state variability.

The largest value, for D.C., is obviously an extreme outlier, and four other values on the upper end of the data are candidates for mild outliers (MA, MN, NY, and ND). There is also a cluster of states at the low end, primarily located in the South and Southwest.

Histograms

Definition (Discrete and Continuous Variable)

A numerical variable is discrete if its set of possible values either is finite or else can be listed in an infinite sequence (one in which there is a first number, a second number, and so on). A numerical variable is continuous if its possible values consist of an entire interval on the number line.

A discrete variable x almost always results from counting, in which case possible values are 0, 1, 2, 3, ... or some subset of these integers. Continuous variables arise from making measurements. For example, if x is the pH of a chemical substance, then in theory x could be any number between 0 and 14: 7.0, 7.03, 7.032, and so on.

The relative frequency of a value is the fraction or proportion of times the value occurs:

relative frequency of a value = (number of times the value occurs) / (number of observations in the data set)

Example

Suppose that our data set consists of 200 observations on x = the number of courses a college student is taking this term. If 70 of these x values are 3, then the relative frequency of the x value 3 is 70/200 = .35.

Multiplying a relative frequency by 100 gives a percentage; in the college-course example, 35% of the students in the sample are taking three courses.
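
The relative frequency calculation can be checked with a short Python sketch. The function name `relative_frequencies` is hypothetical, and the data set is partly invented: only the 70 observations equal to 3 come from the example, while the remaining counts are assumptions chosen so that the total is 200.

```python
from collections import Counter
from fractions import Fraction

def relative_frequencies(data):
    """Map each distinct value to (times it occurs) / (number of observations)."""
    n = len(data)
    return {value: Fraction(count, n) for value, count in sorted(Counter(data).items())}

# Hypothetical data set of 200 course loads. Only the 70 threes come from the
# example; the other counts are invented so that the total is 200.
courses = [3] * 70 + [4] * 60 + [5] * 40 + [2] * 20 + [6] * 10

rel = relative_frequencies(courses)
print(float(rel[3]))            # relative frequency of x = 3
print(f"{float(rel[3]):.0%}")   # the same value as a percentage
assert sum(rel.values()) == 1   # relative frequencies always sum to 1
```

Using `Fraction` keeps the arithmetic exact, so the check that the relative frequencies sum to 1 does not depend on floating-point rounding.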

A frequency distribution is a tabulation of the frequencies and/or relative frequencies.

Constructing a Histogram for Discrete Data

1. Determine the frequency and relative frequency of each x value.
2. Mark possible x values on a horizontal scale.
3. Above each value, draw a rectangle whose height is the relative frequency of that value.

This construction ensures that the area of each rectangle is proportional to the relative frequency of the value. Thus if the relative frequencies of x = 1 and x = 5 are .35 and .07, respectively, then the area of the rectangle above 1 is five times the area of the rectangle above 5.
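
The three construction steps can be sketched as a simple text histogram. This is an illustration under stated assumptions, not the slides' own method: the function name `text_histogram` is hypothetical, bars are drawn horizontally rather than as vertical rectangles, and only the relative frequencies for x = 1 and x = 5 (.35 and .07) come from the text; the others are invented so that the total is 1.

```python
def text_histogram(rel_freq, scale=100):
    """Return one bar per value; bar length is proportional to relative frequency."""
    lines = []
    for value in sorted(rel_freq):
        bar = "#" * round(rel_freq[value] * scale)
        lines.append(f"x = {value} | {bar} {rel_freq[value]:.2f}")
    return lines

# Hypothetical frequency distribution. Only the values for x = 1 and x = 5
# (.35 and .07) come from the text; the rest are invented to sum to 1.
rel_freq = {1: 0.35, 2: 0.28, 3: 0.20, 4: 0.10, 5: 0.07}

for line in text_histogram(rel_freq):
    print(line)
```

Because every bar has the same thickness, the bar for x = 1 (35 marks) is five times as long as the bar for x = 5 (7 marks), matching the ratio of the two areas described above.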
