Arithmetic Mean - Basic Probability - Lecture Notes | MATH 30011, Study notes of Mathematical Statistics

Kent State University (KSU) - Ashtabula Campus Mathematical Statistics

Material Type: Notes; Class: BASIC PROBABILITY AND STATISTICS; Subject: Mathematics; University: Kent State University; Term: Spring 2000;

Typology: Study notes

Pre 2010

Uploaded on 02/25/2010

koofers-user-7cl 🇺🇸

10 documents

September 10, 2009

Lecture 2 and Lecture 3

We can describe distributions using 3 characteristics: shape, center and spread.

• These characteristics have been discussed since the foundation of statistics.
• Shape can be determined from pictures of data (like histograms).

We discuss center and spread next.

Definition 1. A parameter is a descriptive measure of a population. A statistic is a descriptive measure of a sample.

Choice for the spread drives choice for the center.

Definition 2. The arithmetic mean of a variable is computed by determining the sum of all the values of the variable in the data set, divided by the number of observations. In other words, if $x_1, x_2, \ldots, x_n$ are $n$ observations of a variable from a population (sample), then the population (sample) arithmetic mean is denoted by $\mu$ (respectively $\bar{x}$ for a sample) and is given by:

$$\mu \ (\text{or } \bar{x}) = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}.$$
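
As a minimal sketch of Definition 2, the mean can be computed directly from the formula. The function name `arithmetic_mean` and the data are illustrative assumptions, not part of the notes.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Illustrative data, not taken from the notes.
sample = [2, 4, 4, 5, 10]
print(arithmetic_mean(sample))  # 25 / 5 = 5.0
```

The same function serves for a population or a sample; only the symbol used for the result differs.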

The median

Definition 3. The median of a variable is the value that lies in the middle of the data when arranged in ascending order. Usually we denote the median by M.

Remark. Half the data are below the median and half are above the median.

How do we compute the median?
a) Arrange the data in ascending order and find the number of observations $n$.
b) • If $n$ is odd, there is a middle value, and that is the median. It is in position $\frac{n+1}{2}$.
   • If $n$ is even, there is no middle value, and the median is the arithmetic mean of the values in positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
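
The two steps above can be sketched as follows. This is only an illustration: the function name and data are assumptions, and the 1-based positions from the notes are shifted by one because Python lists are 0-based.

```python
def median(values):
    """Median via the sort-and-pick-the-middle procedure."""
    data = sorted(values)              # step a): ascending order
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # odd n: position (n + 1) / 2, i.e. index (n + 1) // 2 - 1
        return data[(n + 1) // 2 - 1]
    # even n: mean of positions n/2 and n/2 + 1, i.e. indices n//2 - 1 and n//2
    return (data[n // 2 - 1] + data[n // 2]) / 2


# Illustrative data, not taken from the notes.
print(median([7, 1, 3]))      # sorted: 1, 3, 7 -> 3
print(median([7, 1, 3, 5]))   # sorted: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```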

Advantages: robustness/resistance, better measure of center for nonsymmetric distributions.
Disadvantages: more difficult to compute, does not combine (the median of pooled data cannot be obtained from the medians of its parts), not always unique (if n is even, any value between the two middle observations could be a median).

Example: Compute the mean, median, and mode for the number of children in the households in my building: 6,2,1,0,3,1,4,1,2,

In general, answer the questions:

Question 1) If we take a value from the right of the median and move it further to the right, do we change the mode? Do we change the mean?

Question 2) What if we take a value from the left of the median and we move it to the right of the median?

Question 3) What is the position of the mean relative to the median in the different types of distributions? Where is the mode?

Measures of Spread

The range

Range = R = Largest Data Value − Smallest Data Value

Advantages: easy to compute, useful in absolute terms.
Disadvantages: uses only 2 points, hence there is no intermediate information.

How to find the percentile that corresponds to a data value

Sort: Arrange the data in ascending order.
Compute:

$$\text{percentile of } x = \frac{\text{number of values less than } x}{n} \times 100.$$

Round off the percentile to the nearest integer.

Example

This section has 18 students. Suppose only 3 students get better scores than you in the final exam. At what percentile is your final-exam score?
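
One way to work this example is to apply the percentile procedure literally. The sketch below assumes that no other student ties your score, so 18 − 3 − 1 = 14 scores lie below yours; the function name and the stand-in scores 1 to 18 are hypothetical.

```python
def percentile_of(x, data):
    """Percentage of values strictly below x, rounded to the nearest integer."""
    below = sum(1 for v in data if v < x)
    # Note: Python's round() sends exact .5 cases to the nearest even integer.
    return round(below / len(data) * 100)


# Hypothetical distinct scores for 18 students; yours is 15,
# so exactly three scores (16, 17, 18) are better.
scores = list(range(1, 19))
print(percentile_of(15, scores))  # 14 / 18 * 100 = 77.8 -> 78
```

Other textbooks count values at or below x instead, which would give a different answer; the sketch follows the formula as stated here.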

Quartiles

The Quartiles are three specific quantiles that are used often:

First Quartile: $Q_1 = P_{25}$.

Second Quartile: $Q_2 = P_{50}$ (this is the median).

Third Quartile: $Q_3 = P_{75}$.

The quartiles divide the data into 4 pieces with (approximately) the same number of observations.

To compute a quartile, compute the corresponding percentile, or compute the median of the whole data and then the median of the first half of the data and the median of the second half of the data.

The interquartile range or IQR is computed by:

$$\mathrm{IQR} = Q_3 - Q_1.$$
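
The median-of-halves method and the IQR can be sketched as below. The notes do not say whether the middle value belongs to either half when n is odd; this sketch assumes it is excluded from both, which is one common convention. The function name and data are illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]
    # Assumption: for odd n the middle observation is left out of both halves.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


# Illustrative data, not taken from the notes.
q1, q2, q3 = quartiles([1, 3, 4, 6, 7, 9, 12, 15])
print(q1, q2, q3)   # 3.5 6.5 10.5
print(q3 - q1)      # IQR = 7.0
```

Library routines that compute quartiles by interpolating percentiles may return slightly different values on small data sets.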

The IQR is an alternative to the standard deviation as a measure of spread.

If the data are approximately normal (symmetric, no outliers), the standard deviation is better (more information from less data).
If the data are not normal (asymmetric, outliers), the IQR is better (more robust).

Checking for Outliers

Determine $Q_1$ and $Q_3$.
Compute $\mathrm{IQR} = Q_3 - Q_1$.
Find the fences:

Lower Fence = $Q_1 - 1.5\,(\mathrm{IQR})$

Upper Fence = $Q_3 + 1.5\,(\mathrm{IQR})$

A value below the lower fence or above the upper fence is considered an outlier.
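
A short sketch of the fence rule, taking the quartiles as given inputs. The function names, the data, and the quartile values (3.5 and 13.5, obtained with a median-of-halves convention) are assumptions for illustration only.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(data, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    lower, upper = fences(q1, q3)
    return [v for v in data if v < lower or v > upper]


# Illustrative data, not taken from the notes.
data = [1, 3, 4, 6, 7, 9, 12, 15, 40]
print(fences(3.5, 13.5))          # IQR = 10 -> (-11.5, 28.5)
print(outliers(data, 3.5, 13.5))  # [40]
```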

Boxplots

These are the graphical counterpart of the five-number summary, and are equally useful.

Drawing a boxplot (horizontally)

Find the five-number summary.
Compute the lower and upper fences:

Lower Fence = $Q_1 - 1.5\,(\mathrm{IQR})$

Upper Fence = $Q_3 + 1.5\,(\mathrm{IQR})$

Draw short vertical lines at $Q_1$, $M$, and $Q_3$; enclose them in a box.
Draw shorter vertical lines (or brackets) at the fences.
Draw a line from $Q_1$ to the smallest data value inside the fences, and a line from $Q_3$ to the largest data value inside the fences.
Mark with an asterisk (*) any values outside the fences.
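
Before drawing, it can help to compute the numbers each step needs. The sketch below does no plotting; it only collects the box edges, the fences, the whisker ends, and the points to mark with an asterisk. The function name, the dictionary keys, and the data are hypothetical, and the quartiles are passed in rather than computed.

```python
def boxplot_elements(data, q1, m, q3):
    """Collect the values needed to draw a boxplot by hand."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "box": (q1, m, q3),                        # vertical lines enclosed in the box
        "fences": (lower_fence, upper_fence),      # shorter lines or brackets
        "whiskers": (min(inside), max(inside)),    # ends of the lines from Q1 and Q3
        "asterisks": sorted(v for v in data
                            if v < lower_fence or v > upper_fence),
    }


# Illustrative data and quartiles, not taken from the notes.
data = [1, 3, 4, 6, 7, 9, 12, 15, 40]
elements = boxplot_elements(data, q1=3.5, m=7, q3=13.5)
print(elements["whiskers"])   # (1, 15)
print(elements["asterisks"])  # [40]
```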

Example: Construct the boxplot of the heights 64, 67, 68,69,70,70,71,71,

The Sample Variance

When the population is large, we approximate the population mean $\mu$ with the sample mean $\bar{x}$. Similarly, we approximate the population variance $\sigma^2$ by the sample variance, denoted $s^2$:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
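
A minimal sketch of the defining formula for the sample variance, with an illustrative function name and data set that are not from the notes:

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Illustrative data: the mean is 5, the squared deviations are 9, 1, 1, 0, 25.
sample = [2, 4, 4, 5, 10]
print(sample_variance(sample))  # 36 / 4 = 9.0
```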

The alternative form is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}{n(n - 1)}$$
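
The alternative form needs only the sum of the values and the sum of their squares. The sketch below checks it against the standard library's `statistics.variance` on illustrative data; the function name is an assumption.

```python
import math
import statistics


def sample_variance_shortcut(values):
    """Sample variance from the sum of x and the sum of x squared."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (n * total_sq - total ** 2) / (n * (n - 1))


# Illustrative data: sum = 25, sum of squares = 161.
sample = [2, 4, 4, 5, 10]
print(sample_variance_shortcut(sample))  # (5 * 161 - 625) / 20 = 9.0
print(math.isclose(sample_variance_shortcut(sample),
                   statistics.variance(sample)))  # True
```

With floating-point data whose mean is large relative to its spread, this form can lose precision through cancellation, so the defining formula is often the safer choice in code.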

REMARK: Notice that we divide by the sample size minus one (this is different from the formula for the population variance).

Informally, we say: a sample of size $n$ has $n$ degrees of freedom; one degree of freedom is "used up" in computing $\bar{x}$, so there are only $n - 1$ degrees of freedom available for the sample variance.

The Standard Deviation

For both cases (the population or the sample), the standard deviation is the square root of the corresponding variance.

The population standard deviation is denoted by $\sigma$:

$$\sigma = \sqrt{\sigma^2}.$$

The sample standard deviation is denoted by $s$:

$$s = \sqrt{s^2}.$$
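
As a quick illustration, Python's standard `statistics` module provides both variances: `variance` divides by n − 1 and `pvariance` divides by n. Taking square roots gives the two standard deviations. The data are illustrative, and treating the same list as both a sample and a population is only for comparison.

```python
import math
import statistics

# Illustrative data, not taken from the notes.
data = [2, 4, 4, 5, 10]

s = math.sqrt(statistics.variance(data))       # sample: divides by n - 1
sigma = math.sqrt(statistics.pvariance(data))  # population: divides by n

print(s)                                             # sqrt(9) = 3.0
print(math.isclose(s, statistics.stdev(data)))       # True
print(math.isclose(sigma, statistics.pstdev(data)))  # True
```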

Advantage of the (population or sample) standard deviation: it is given in the same units as the observations.

Advantage of the (population or sample) variance: it is easier to manipulate algebraically, in some cases.
