

















































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
A chapter from a statistics textbook focusing on descriptive statistics. It covers the concepts of populations, samples, and processes, pictorial and tabular methods, measures of location (mean and median), and measures of variability (range and variance). The document also includes examples and exercises.
Typology: Slides
1 / 57
This page cannot be seen from the preview
Don't miss anything!


















































Instructor: Shengwen Guo
Department of Mathematics and Statistics, UNC Charlotte
January 11, 2024
(^1) Populations, Samples, and processes
(^2) Pictorial and Tabular Methods in Descriptive Statistics
(^3) Measures of Location
4 Measures of Variability
Sample: Constraints on time, money, and other scarce resources usually make a census impractical or infeasible. Instead, a subset of the population—a sample—is selected in some prescribed manner.
approximately 5.3 million passenger cars were sold in the U.S. in 2018. We randomly select some cars (say 200), and find the average price of these 200 cars.
Characteristics: We are usually interested only in certain characteristics of the objects in a population:
salary of a software engineer. gender of an engineer graduate price of a car
A characteristic may be categorical e.g. gender, race, hair color. A characteristic may be numerical e.g. age = 23, height = 6 feet, weight = 160 lb.
Variable: A variable is any characteristic whose value may change from one object to another in the population. We shall initially denote variables by lowercase letters from the end of our alphabet. Examples include
x = brand of calculator owned by a student y = number of visits to a particular Website during a specified period z = braking distance of an automobile under specified conditions
M A A A M A A M A A
Two branches: descriptive statistics and inferential statistics. Descriptive statistics: An investigator who has collected data may wish simply to summarize and describe important features of the data. Some of these methods are graphical in nature; histograms, boxplots, and scatter plots. Other descriptive methods involve calculation of numerical summary measures; means, standard deviations, and correlation coefficients.
Descriptive statistics can be divided into two general subject areas. Visual displays: stem-and-leaf display, histogram, dotplot. Numerical summary measures: mean, median, standard deviation.
Consider a numerical data set x 1 , x 2 , x 3 , · · · , xn. A quick way to obtain an informative visual representation of the data set is to construct a stem-and-leaf display.
The first observation in the top row of the display is 5.0, corresponding to a stem of 5 and leaf of 0, and the last observation at the bottom of the display is 10.6. Note that in the absence of a context, without the identification of stem and leaf digits in the display, we wouldn’t know whether the observation with stem 7 and leaf 9 was .79, 7.9, or 79. The leaves in each row are ordered from smallest to largest; this is commonly done by software packages but is not necessary if a display is created by hand. The display suggests that a typical or representative sleep time is in the stem 8L row, perhaps 8.1 or 8.2.
Figure 1.6 shows a dotplot of the data. There is clearly a great deal of state-to-state variability.
The largest value, for D.C., is obviously an extreme outlier, and four other values on the upper end of the data are candidates for mild outliers (MA, MN, NY, and ND). There is also a cluster of states at the low end, primarily located in the South and Southwest.
A numerical variable is discrete if its set of possible values either is finite or else can be listed in an infinite sequence (one in which there is a first number, a second number, and so on). A numerical variable is continuous if its possible values consist of an entire interval on the number line.
A discrete variable x almost always results from counting, in which case possible values are 0, 1, 2, 3,... or some subset of these integers. Continuous variables arise from making measurements. For example, if x is the pH of a chemical substance, then in theory x could be any number between 0 and 14: 7.0, 7.03, 7.032, and so on.
The relative frequency of a value is the fraction or proportion of times the value occurs:
relative frequency of a value = number of times the value occurs number of observations in the data set
Suppose that our data set consists of 200 observations on x = the number of courses a college student is taking this term. If 70 of these x values are 3, then relative frequency of the x value 3 is 20070 = .35.
Multiplying a relative frequency by 100 gives a percentage; in the college-course example, 35% of the students in the sample are taking three courses.
A frequency distribution is a tabulation of the frequencies and/or relative frequencies.
(^1) determine the frequency and relative frequency of each x value. (^2) mark possible x values on a horizontal scale. (^3) above each value, draw a rectangle whose height is the relative frequency of that value.
This construction ensures that the area of each rectangle is proportional to the relative frequency of the value. Thus if the relative frequencies of x = 1 and x = 5 are .35 and .07, respectively, then the area of the rectangle above 1 is five times the area of the rectangle above 5.