Arithmetic Mean - Basic Probability - Lecture Notes | MATH 30011, Study notes of Mathematical Statistics

Material Type: Notes; Class: BASIC PROBABILITY AND STATISTICS; Subject: Mathematics; University: Kent State University; Term: Spring 2000;

Uploaded on 02/25/2010

September 10, 2009
Lecture 2 and Lecture 3

We can describe distributions using 3 characteristics: shape, center and spread.

  • These characteristics have been discussed since the foundation of statistics.
  • Shape can be determined from pictures of data (like histograms). We discuss center and spread next.

Definition 1. A parameter is a descriptive measure of a population. A statistic is a descriptive measure of a sample.

  • The choice of a measure of spread drives the choice of a measure of center.

Definition 2. The arithmetic mean of a variable is the sum of all the values of the variable in the data set divided by the number of observations. In other words, if $x_1, x_2, \dots, x_n$ are the $n$ observations of a variable from a population (respectively, a sample), then the population (respectively, sample) arithmetic mean is denoted by $\mu$ (respectively, $\bar{x}$) and is given by:

$$\mu \ (\text{or } \bar{x}) = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

The median

Definition 3. The median of a variable is the value that lies in the middle of the data when arranged in ascending order. Usually we denote the median by M.

Remark: Half the data are below the median and half are above the median.

How do we compute the median?

  a) Arrange the data in ascending order and determine the number of observations $n$.
  b) Locate the middle of the ordered data:

  • If $n$ is odd, there is a middle value and that is the median. It is in position $\frac{n+1}{2}$.
  • If $n$ is even, there is no middle value, and the median is the arithmetic mean of the values in positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.

  • Advantages: robustness/resistance; a better measure of center for nonsymmetric distributions.
  • Disadvantages: more difficult to compute, does not combine (the median of pooled groups cannot be obtained from the group medians alone), and not always unique (if $n$ is even, anything between the two middle observations could be a median).
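The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the two small data sets are hypothetical and are not taken from the notes.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)          # step a): ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Position (n + 1) / 2 in 1-based counting is index n // 2 in 0-based counting.
        return ordered[mid]
    # Positions n/2 and n/2 + 1 (1-based) are indices mid - 1 and mid (0-based).
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data, chosen only to show the odd and the even case.
odd_sample = [7, 1, 4, 9, 3]
even_sample = [7, 1, 4, 9, 3, 10]

print(arithmetic_mean(odd_sample), median(odd_sample))
print(arithmetic_mean(even_sample), median(even_sample))
```

Note that the mean needs a single pass over the data, while the median needs the data sorted first, which is one sense in which it is "more difficult to compute".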

Example: Compute the mean, median, and mode for the number of children in the households in my building: 6,2,1,0,3,1,4,1,2,

In general, answer the questions:

Question 1) If we take a value to the right of the median and move it further to the right, do we change the mode? Do we change the mean?

Question 2) What if we take a value from the left of the median and we move it to the right of the median?

Question 3) What is the position of the mean relative to the median in the different types of distributions? Where is the mode?
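One way to explore Questions 1 and 2 is to experiment numerically. The sketch below uses Python's standard `statistics` module on a small hypothetical data set (not the one from the example) and compares the mean and the median before and after moving a single value.

```python
from statistics import mean, median

# Hypothetical data, already in ascending order; the median is 3.
data = [1, 2, 3, 4, 5]

# Question 1: a value to the right of the median is moved further right.
moved_right = [1, 2, 3, 4, 50]

# Question 2: a value to the left of the median is moved to its right
# (the value 1 becomes 6).
moved_across = [6, 2, 3, 4, 5]

for label, values in [("original", data),
                      ("moved right", moved_right),
                      ("moved across", moved_across)]:
    print(label, "mean =", mean(values), "median =", median(values))
```

In this particular data set the first move changes the mean but leaves the median where it was, while the second move changes both.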

Measures of Spread

The range

Range = R = Largest Data Value − Smallest Data Value

  • Advantages: easy to compute, useful in absolute terms.
  • Disadvantages: uses only 2 points, hence there is no intermediate information.

How to find the percentile that corresponds to a data value

  1. Sort: Arrange the data in ascending order.
  2. Compute:

$$\text{percentile of } x = \frac{\text{number of values less than } x}{n} \times 100.$$

  3. Round off the percentile to the nearest integer.

Example

This section has 18 students. Suppose only 3 students get better scores than you in the final exam. At what percentile is your final-exam score?
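The three steps can be sketched as follows. The function name and the list of 18 scores are hypothetical; the scores are assumed to be all distinct, so that "only 3 students get better scores" leaves exactly 14 scores below yours.

```python
def percentile_of(value, data):
    """Percentile of `value`: share of observations strictly below it, times 100."""
    ordered = sorted(data)                          # step 1: sort
    below = sum(1 for v in ordered if v < value)    # count values less than x
    # Steps 2 and 3. Python's round() sends exact .5 ties to the nearest even
    # integer, which may differ from the rounding rule used in class.
    return round(below / len(ordered) * 100)


# Hypothetical final-exam scores for 18 students: 60, 62, ..., 94.
scores = list(range(60, 96, 2))
my_score = 88   # only 90, 92 and 94 are better

print(percentile_of(my_score, scores))   # 14 / 18 * 100 = 77.8, rounded to 78
```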

Quartiles

The Quartiles are three specific quantiles that are used often:

First Quartile: $Q_1 = P_{25}$.

Second Quartile: $Q_2 = P_{50}$ (this is the median).

Third Quartile: $Q_3 = P_{75}$.

The quartiles divide the data in 4 pieces with (approximately) the same number of observations.

To compute a quartile, compute the corresponding percentile, or compute the median of the whole data and then the median of the first half of the data and the median of the second half of the data.

The interquartile range or IQR is computed by:

$$\text{IQR} = Q_3 - Q_1.$$

The IQR is an alternative to the standard deviation as a measure of spread.

  • If the data is approx. normal (symmetric, no outliers): the standard deviation is better (more information from less data).
  • If the data is not normal (asymmetric, outliers), the IQR is better (more robust).
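The "median of each half" recipe for the quartiles can be sketched as below. The notes do not say what to do with the middle observation when $n$ is odd; this sketch assumes it is left out of both halves, which is one common convention among several. The function names and the data are hypothetical.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (needs n >= 2)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: when n is odd, skip the middle observation.
    upper = ordered[half + (n % 2):]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15]   # hypothetical data
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Software packages often use interpolation rules instead, so their quartiles can differ slightly from the ones computed this way.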

Checking for Outliers

  1. Determine $Q_1$ and $Q_3$.
  2. Compute $\text{IQR} = Q_3 - Q_1$.
  3. Find the fences:

$$\text{Lower Fence} = Q_1 - 1.5\,(\text{IQR})$$

$$\text{Upper Fence} = Q_3 + 1.5\,(\text{IQR})$$

  4. A value below the lower fence or above the upper fence is considered an outlier.

Boxplots

These are the graphical counterpart of the five-number summary, and are equally useful.

Drawing a boxplot (horizontally)

  1. Find the five-number summary.
  2. Compute the lower and upper fences:

$$\text{Lower Fence} = Q_1 - 1.5\,(\text{IQR})$$

$$\text{Upper Fence} = Q_3 + 1.5\,(\text{IQR})$$

  3. Draw short vertical lines at $Q_1$, $M$, and $Q_3$; enclose them in a box.
  4. Draw shorter vertical lines (or brackets) at the fences.
  5. Draw a line from $Q_1$ to the smallest data value inside the fences, and a line from $Q_3$ to the largest data value inside the fences.
  6. Mark with an asterisk (*) any values outside the fences.

Example: Construct the boxplot of the heights 64, 67, 68,69,70,70,71,71,
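Before drawing, it can help to collect the numbers that the boxplot steps call for. The sketch below does not draw anything; it only gathers the box, the fences, the whisker ends, and the values to mark with an asterisk. The function name and the data are hypothetical (they are not the heights from the example), and the quartiles are supplied by hand.

```python
def boxplot_summary(values, q1, m, q3):
    """Numbers needed to draw a boxplot, given the quartiles and the median."""
    ordered = sorted(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    return {
        "box": (q1, m, q3),                     # vertical lines of the box
        "fences": (lower_fence, upper_fence),   # brackets
        "whiskers": (inside[0], inside[-1]),    # extreme values inside the fences
        "outliers": [v for v in ordered if v < lower_fence or v > upper_fence],
    }


# Hypothetical data with Q1 = 63, M = 66.5, Q3 = 70 (median-of-halves method).
data = [50, 61, 63, 64, 66, 67, 69, 70, 72, 90]
summary = boxplot_summary(data, q1=63, m=66.5, q3=70)
for name, value in summary.items():
    print(name, value)
```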

The Sample Variance

When the population is large, we approximate the population mean $\mu$ with the sample mean, $\bar{x}$. Similarly, we approximate the population variance $\sigma^2$ by the sample variance, denoted $s^2$:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}.$$

The alternative form is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}.$$

REMARK: Notice that we divide by the sample size minus one (this is different from the formula for the population variance).

Informally, we say: a sample of size $n$ has $n$ degrees of freedom; one degree of freedom is “used up” in computing $\bar{x}$, so there are only $n - 1$ degrees of freedom available for the sample variance.
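As a quick check that the definition and the alternative form agree, here is a small sketch that computes the sample variance both ways. The function names and the data are hypothetical.

```python
def sample_variance(values):
    """Definition: sum of squared deviations from the sample mean, over n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Alternative form: (n * sum of x_i^2 - (sum of x_i)^2) / (n * (n - 1))."""
    n = len(values)
    total = sum(values)
    total_of_squares = sum(x * x for x in values)
    return (n * total_of_squares - total ** 2) / (n * (n - 1))


data = [2, 4, 4, 5, 7]   # hypothetical sample: sum = 22, sum of squares = 110

print(sample_variance(data))            # 13.2 / 4, about 3.3
print(sample_variance_shortcut(data))   # (550 - 484) / 20, about 3.3
```

The alternative form avoids computing $\bar{x}$ first, which is convenient by hand; in floating-point arithmetic it can lose accuracy when the values are large relative to their spread.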

The Standard Deviation

For both cases (the population or the sample), the standard deviation is the square root of the corresponding variance:

The population standard deviation is denoted by $\sigma$:

$$\sigma = \sqrt{\sigma^2}.$$

The sample standard deviation is denoted by $s$:

$$s = \sqrt{s^2}.$$

Advantage of the (population or sample) standard deviation: it is given in the same units as the observations.

Advantage of the (population or sample) variance: it is easier to manipulate algebraically, in some cases.
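Python's standard `statistics` module provides both versions: `variance` and `stdev` divide by $n - 1$ (sample), while `pvariance` and `pstdev` divide by $n$ (population). The sketch below uses hypothetical heights in centimetres to show the square-root relation and the difference in units.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

heights_cm = [160, 165, 170, 175, 180]   # hypothetical data, in cm

s2 = variance(heights_cm)        # sample variance, in cm^2
s = stdev(heights_cm)            # sample standard deviation, in cm
sigma2 = pvariance(heights_cm)   # population variance, in cm^2
sigma = pstdev(heights_cm)       # population standard deviation, in cm

print(s2, s)
print(sigma2, sigma)

# Each standard deviation is the square root of the corresponding variance.
print(math.isclose(s, math.sqrt(s2)), math.isclose(sigma, math.sqrt(sigma2)))
```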