














Material Type: Notes; Class: BASIC PROBABILITY AND STATISTICS; Subject: Mathematics; University: Kent State University; Term: Spring 2000;















We can describe distributions using 3 characteristics: shape, center and spread.
Definition 1. A parameter is a descriptive measure of a population. A statistic is a descriptive measure of a sample.
Definition 2. The arithmetic mean of a variable is the sum of all the values of the variable in the data set divided by the number of observations. In other words, if $x_1, x_2, \ldots, x_n$ are $n$ observations of a variable from a population (sample), then the population (sample) arithmetic mean is denoted by $\mu$ (respectively, $\bar{x}$) and is given by:

$$\mu \;(\text{or } \bar{x}) = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}.$$
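As a minimal Python sketch of the formula (the function name `arithmetic_mean` and the sample values are hypothetical, chosen only for illustration), the same code gives $\mu$ or $\bar{x}$ depending on whether the list holds a whole population or a sample:

```python
def arithmetic_mean(values):
    """Return (x_1 + x_2 + ... + x_n) / n for a non-empty list of numbers."""
    if len(values) == 0:
        raise ValueError("the mean needs at least one observation")
    return sum(values) / len(values)


print(arithmetic_mean([2, 4, 9]))  # (2 + 4 + 9) / 3 = 5.0
```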
Definition 3. The median of a variable is the value that lies in the middle of the data when arranged in ascending order. Usually we denote the median by M.
Remark: Half of the data lie below the median and half lie above it.
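How do we compute the median?
a) Arrange the data in ascending order and find the number of observations, $n$.
b) • If $n$ is odd, there is a single middle value, and that is the median. It is in position $\frac{n+1}{2}$.
   • If $n$ is even, there are two middle values, in positions $\frac{n}{2}$ and $\frac{n}{2}+1$; the median is their mean.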
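The procedure can be sketched in Python as follows. It covers the odd case described in the steps and, as an assumption stated here, the usual even-$n$ rule of averaging the two middle values. The function name `median` and the data are hypothetical.

```python
def median(values):
    """Median by the sort-and-find-the-middle procedure."""
    data = sorted(values)              # step a): ascending order
    n = len(data)
    if n == 0:
        raise ValueError("the median needs at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle value, at (1-based) position (n + 1) / 2
        return data[mid]
    # n even (assumed rule): mean of the two middle values, at positions n/2 and n/2 + 1
    return (data[mid - 1] + data[mid]) / 2


print(median([7, 1, 3]))        # hypothetical data, n odd
print(median([7, 1, 3, 10]))    # hypothetical data, n even
```

Python indexes from 0, so the 1-based position $\frac{n+1}{2}$ corresponds to index `n // 2` when $n$ is odd.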
Example: Compute the mean, median, and mode of the number of children in the households in my building: 6, 2, 1, 0, 3, 1, 4, 1, 2,
In general, answer the following questions:
Question 1) If we take a value to the right of the median and move it farther to the right, do we change the mode? Do we change the mean?
Question 2) What if we take a value from the left of the median and we move it to the right of the median?
Question 3) What is the position of the mean relative to the median in the different types of distributions? Where is the mode?
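One way to explore Questions 1 and 2 numerically is to compute the mean, median, and mode before and after moving a single value. The sketch below uses Python's standard `statistics` module on small hypothetical data sets (not the household data above); the helper name `summarize` is illustrative only.

```python
import statistics


def summarize(data):
    """Return (mean, median, mode) of a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)


original = [1, 2, 2, 3, 5]      # hypothetical data; the median is 2
q1_case = [1, 2, 2, 3, 50]      # Question 1: the value 5 (right of the median) moved farther right
q2_case = [2, 2, 3, 5, 6]       # Question 2: the value 1 (left of the median) moved to 6, right of the median

for label, data in [("original", original), ("Question 1", q1_case), ("Question 2", q2_case)]:
    mean, med, mode = summarize(data)
    print(f"{label}: mean={mean}, median={med}, mode={mode}")
```

With these particular numbers, moving a value within the right side leaves the median (and here the mode) unchanged but raises the mean, while moving a value across the median shifts the median as well.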
The range: $R = \text{largest data value} - \text{smallest data value}$.
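The range formula translates directly into Python; the function name `data_range` and the values are hypothetical.

```python
def data_range(values):
    """Range R = largest data value - smallest data value."""
    return max(values) - min(values)


print(data_range([64, 70, 58, 73]))  # hypothetical values: 73 - 58 = 15
```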
$\text{Percentile of } x = \dfrac{\text{number of values less than } x}{n} \times 100$, where $n$ is the number of observations.
This section has 18 students. Suppose only 3 students get better scores than you in the final exam. At what percentile is your final-exam score?
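Applying the formula to the class question, and assuming no ties and that your own score is one of the 18, there are $18 - 3 - 1 = 14$ scores below yours, so the percentile is $\frac{14}{18} \times 100 \approx 77.8$. The Python sketch below checks this with hypothetical scores 1 through 18 (the function name `percentile_of` is also hypothetical).

```python
def percentile_of(x, values):
    """(number of values less than x) / n * 100, with n = len(values)."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


# Hypothetical scores for the 18-student section, assuming no ties:
# the scores are 1..18 and yours is 15, so exactly 3 students (16, 17, 18) score higher.
scores = list(range(1, 19))
print(percentile_of(15, scores))   # 14 / 18 * 100
```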
The quartiles are three specific quantiles that are used often:
First quartile: $Q_1 = P_{25}$.
Second quartile: $Q_2 = P_{50}$ (this is the median).
Third quartile: $Q_3 = P_{75}$.
The quartiles divide the data in 4 pieces with (approximately) the same number of observations.
To compute a quartile, either compute the corresponding percentile, or use medians: $Q_2$ is the median of the whole data set, $Q_1$ is the median of the lower half of the data, and $Q_3$ is the median of the upper half.
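The median-of-halves method can be sketched in Python with `statistics.median`. The notes do not say what to do with the middle value when $n$ is odd; this sketch assumes it is left out of both halves, which is one common convention. The function name `quartiles` is hypothetical.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: when n is odd, the overall median is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]                 # values below the median position
    upper = data[half + n % 2:]         # values above it (skips the middle value if n is odd)
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # hypothetical data, n even
print(quartiles([1, 2, 3, 4, 5, 6, 7]))      # hypothetical data, n odd
```

Other conventions (keeping the median in both halves, or interpolating percentiles as some software does) can give slightly different quartiles on the same data.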
The interquartile range, or IQR, is computed by:
$\text{IQR} = Q_3 - Q_1$.
The IQR is an alternative to the standard deviation as a measure of spread.
Lower Fence $= Q_1 - 1.5(\text{IQR})$
Upper Fence $= Q_3 + 1.5(\text{IQR})$
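The IQR and the fences can be computed together. In common practice, observations below the lower fence or above the upper fence are flagged as potential outliers, and this Python sketch assumes that use. It computes $Q_1$ and $Q_3$ by the median-of-halves method, leaving the middle value out when $n$ is odd (an assumed convention). The names `fences` and `potential_outliers` and the data are hypothetical.

```python
import statistics


def fences(values):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = statistics.median(data[:half])            # median of the lower half
    q3 = statistics.median(data[half + n % 2:])    # median of the upper half (middle value skipped if n is odd)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def potential_outliers(values):
    """Observations that fall below the lower fence or above the upper fence."""
    lower, upper = fences(values)
    return [v for v in values if v < lower or v > upper]


data = [2, 4, 5, 6, 7, 8, 30]   # hypothetical data: Q1 = 4, Q3 = 8, IQR = 4
print(fences(data), potential_outliers(data))
```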
Boxplots are the graphical counterpart of the five-number summary, and are equally useful.
Drawing a boxplot (horizontally)
Lower Fence $= Q_1 - 1.5(\text{IQR})$
Upper Fence $= Q_3 + 1.5(\text{IQR})$
Example: Construct the boxplot of the heights 64, 67, 68, 69, 70, 70, 71, 71,
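Before drawing, it helps to compute every number the boxplot needs. This Python sketch assumes a common convention: the box runs from $Q_1$ to $Q_3$ with a line at the median, the whiskers extend to the most extreme observations inside the fences, and points beyond the fences are plotted individually. The function name `boxplot_parts` and the data are hypothetical (they are not the heights in the example), and quartiles use median-of-halves with the middle value skipped when $n$ is odd.

```python
import statistics


def boxplot_parts(values):
    """Numbers needed to draw a horizontal boxplot with 1.5*IQR fences."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = statistics.median(data[:half])
    m = statistics.median(data)
    q3 = statistics.median(data[half + n % 2:])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "box": (q1, m, q3),                     # box edges and the line at the median
        "whiskers": (inside[0], inside[-1]),     # most extreme values inside the fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


print(boxplot_parts([1, 10, 11, 12, 13, 14, 15, 16]))   # hypothetical data
```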
When the population is large, we approximate the population mean $\mu$ with the sample mean $\bar{x}$. Similarly, we approximate the population variance $\sigma^2$ by the sample variance, denoted $s^2$:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}.$$
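The defining formula can be sketched directly in Python: compute $\bar{x}$, sum the squared deviations, and divide by $n - 1$. The function name `sample_variance` and the data are hypothetical.

```python
def sample_variance(values):
    """s^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Hypothetical data: mean 5, squared deviations sum to 32, so s^2 = 32 / 7.
print(sample_variance([2, 4, 4, 4, 5, 5, 7, 9]))
```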
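The alternative form is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2}{n - 1} - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}.$$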
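The alternative form needs only the running sums $\sum x_i$ and $\sum x_i^2$. The Python sketch below compares it with the standard library's `statistics.variance`, which uses the $n - 1$ divisor. The function name `sample_variance_shortcut` and the data are hypothetical.

```python
import statistics


def sample_variance_shortcut(values):
    """s^2 = sum(x_i^2)/(n - 1) - (sum x_i)^2 / (n(n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return sum_x2 / (n - 1) - sum_x ** 2 / (n * (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
print(sample_variance_shortcut(data), statistics.variance(data))
```

The two forms agree algebraically, but in floating-point arithmetic the shortcut subtracts two large, nearly equal numbers, so it can lose precision when the values are large compared with their spread.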
REMARK: Notice that we divide by the sample size minus one (this is different from the formula for the population variance).
Informally, we say: a sample of size n has n degrees of freedom; one degree of freedom is “used up” in computing ¯x, so there are only n − 1 degrees of freedom available for the sample variance.
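To see the effect of dividing by $n - 1$ instead of $n$, this sketch compares Python's `statistics.pvariance` (divides by $n$, treating the data as a whole population) with `statistics.variance` (divides by $n - 1$, treating the data as a sample). The data are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: n = 8, mean 5, squared deviations sum to 32
n = len(data)

divide_by_n = statistics.pvariance(data)          # population formula: 32 / 8
divide_by_n_minus_1 = statistics.variance(data)   # sample formula: 32 / 7

print(divide_by_n, divide_by_n_minus_1)
print(divide_by_n * n / (n - 1))                  # rescaling by n / (n - 1) recovers s^2
```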
For both cases (the population or the sample), the standard deviation is the square root of the corresponding variance.
The population standard deviation is denoted by $\sigma$:

$$\sigma = \sqrt{\sigma^2}.$$

The sample standard deviation is denoted by $s$:

$$s = \sqrt{s^2}.$$
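A short Python check that taking the square root of each variance matches the standard library's `statistics.stdev` (sample) and `statistics.pstdev` (population). If the hypothetical data below were measured in centimetres, the variances would be in cm² and the standard deviations in cm.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements, say in centimetres

s = math.sqrt(statistics.variance(data))        # sample: s = sqrt(s^2)
sigma = math.sqrt(statistics.pvariance(data))   # population: sigma = sqrt(sigma^2)

print(s, statistics.stdev(data))
print(sigma, statistics.pstdev(data))
```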
Advantage of the (population or sample) standard deviation: it is given in the same units as the observations.
Advantage of the (population or sample) variance: it is easier to manipulate algebraically, in some cases.