Data Analysis - Introduction to Statistics - Lecture notes, Study notes of Statistics

Institute of Mathematics and Applications Statistics

Learning points covered in these lecture notes: data analysis, data description, parameter, statistic, measures of central tendency, arithmetic average, mean for grouped data, median, mode, modal class, and revisiting distribution shapes.

Typology: Study notes

2011/2012

Uploaded on 11/14/2012

dharm 🇮🇳


Section III

Data Analysis

Data Description

When measuring data it is important to note the difference between studies on samples

and studies on populations.

A parameter is a measure or characteristic obtained by studying all data values from a

population while a statistic is derived from a sample.

For many attributes we will have separate symbols for a statistic and a parameter even

though the method for computing them is the same.

The number of data values in a sample will be n as before, but for a population it will be

denoted N.

For writing equations, a generic sample or population will be denoted with X values for

each datum:

Example Data:

Sample: $\{X_1, X_2, X_3, \ldots, X_n\}$

Population: $\{X_1, X_2, X_3, \ldots, X_N\}$

The ambiguous term “average” is actually a category known in statistics as measures of

central tendency that includes the mean, median, mode and midrange. Another often-

used average is the weighted mean.

How data varies compared to these averages is a very useful characteristic to study.

Measures of Central Tendency

“A person has on average 1460 dreams in 1 year”

The mean:

Using a population or sample (the classical arithmetic average).

Sample (of size n)

The Mean $\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$

Population (of size N)

The Mean $\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$


In shortened form: $\bar{X} = \frac{\sum X}{n}$ and $\mu = \frac{\sum X}{N}$

Keep track of and memorize symbols like $\bar{X}$ and $\mu$, as other equations will sometimes include them without review.

Find the mean for the following population and label appropriately:

22 19 8 2 4 13 16 7

Math tips:

If you are doing a calculation with intermediate steps do not round off until the very end. Frequent rounding can affect the computation significantly.
The mean should be rounded to one more decimal place than the raw data.

The preceding formulas allow us to calculate the mean given a set of data. If we are given data that has already been organized into a frequency distribution we can also find the mean.

Finding the Mean for grouped Data:

Because data in a class can fall anywhere in the class range this is not the exact mean but a good approximation. We will use the class midpoint value $X_m$ to represent every datum in the class.

Looking at our previous example, how would we calculate the mean record high for all 50 states? (Where would your guess be?)

Class       Tally                 Frequency   Cumulative
100 - 104   //                    2           2
105 - 109   ////////              8           10
110 - 114   //////////////////    18          28
115 - 119   /////////////         13          41
120 - 124   ///////               7           48
125 - 129   /                     1           49
130 - 134   /                     1           50
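The grouped-mean calculation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: it assumes the standard grouped-data formula $\bar{X} = \frac{\sum f \cdot X_m}{n}$, takes each midpoint as the average of the class limits, and uses the hypothetical names `classes` and `grouped_mean`.

```python
# Frequency distribution of record high temperatures, copied from the table above.
# Each entry is (lower class limit, upper class limit, frequency).
classes = [
    (100, 104, 2),
    (105, 109, 8),
    (110, 114, 18),
    (115, 119, 13),
    (120, 124, 7),
    (125, 129, 1),
    (130, 134, 1),
]


def grouped_mean(classes):
    """Approximate the mean by letting each class midpoint stand in for its data."""
    n = sum(f for _, _, f in classes)                        # sum of f = n
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f * Xm
    return total / n


mean_high = grouped_mean(classes)
# Round only at the very end, to one more decimal place than the raw data.
print(round(mean_high, 1))  # 114.2
```

The result falls inside the 110 - 114 class, which is also the class with the highest frequency.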

The Median:

The median is the halfway point of the data set.

Finding the Median MD:

Arrange the data in order (data array)
Select the middle point

Ex Data: 292, 300, 311, 401, 595, 618, 713
292, 300, 311, 401, 595, 618, 713
MD = 401

Ex Data: 1, 7, 4, 2, 3, 4
1, 2, 3, 4, 4, 7
With an even number of values, the median is halfway between the two middle values:
MD = (3 + 4)/2 = 3.5

Notes on the median:

A marker for which values fall into the upper and lower half of a distribution
Not as affected by outliers, e.g. 2, 3, 4, 5, 7, 6, 5, 4, 36
Can be used for open-ended distributions
The median will either be a specific data value or will fall between two data values.
There is no median type measurement for frequency distributions.

The Mode:

The value that occurs most often in a data set is called the mode. There can be bimodal or multimodal data sets depending on the number of modes. On the other hand, if no datum appears more than once then the data set has no mode.

Ex Data: 100, 101, 105, 110, 100, 105, 103, 105
Since 105 occurs most often, it is the mode.
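The median and mode procedures above can be sketched in Python using the example data from these notes. The function names `median` and `modes` are illustrative choices, and the sketch assumes a non-empty list of values.

```python
from collections import Counter


def median(values):
    """Sort the data into an array, then pick the middle point."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]                        # odd count: the middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: halfway between the two middle values


def modes(values):
    """Return every value tied for the highest count; an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no datum appears more than once
    return sorted(v for v, c in counts.items() if c == top)


print(median([292, 300, 311, 401, 595, 618, 713]))      # 401
print(median([1, 7, 4, 2, 3, 4]))                       # 3.5
print(modes([100, 101, 105, 110, 100, 105, 103, 105]))  # [105]
```

Returning a list from `modes` reflects the point that a data set can have one mode, several modes, or none.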

For frequency distributions we speak of the modal class. The modal class is the class with the highest frequency.

Class       Tally                 Frequency
100 - 104   //                    2
105 - 109   ////////              8
110 - 114   //////////////////    18
115 - 119   /////////////         13
120 - 124   ///////               7
125 - 129   /                     1
130 - 134   /                     1

Clearly in our record high temperature example, the modal class is 110˚ – 114˚.

Notes on the mode:

Gives us the most typical case
Easy to compute
Can be used with nominal data (most frequent eye color, political party, etc.)
The mode is not unique
There is no symbol for the mode

Re-visiting distribution shapes

Measures of Variation

Averages are useful concepts, but they become even more useful when you combine them with the concept of variation. One measure of variation is the distance between the highest and lowest values, or the range. Perhaps the most important measure has to do with the average distance of a datum from the mean.

Since the mean is usually toward the middle of the distribution, however, some data lie in the negative direction from it and some in the positive direction.
As with most distances we only care how far, so to get around this problem we square the differences and later take a square root. This gives us the useful concepts of variance and standard deviation.

Paint Example

Test: Brand A vs. Brand B
Variable: Months before fading

Two small populations of 6 cans of each brand are tested with the following results:

Brand A   Brand B
10        35
60        45
50        30
30        35
40        40
20        25

We can calculate the means:

Brand A: $\mu = \frac{\sum X}{N} = 210/6 = 35$ months
Brand B: $\mu = \frac{\sum X}{N} = 210/6 = 35$ months

Difference (Brand A):
60 - 35 = 25
50 - 35 = 15
40 - 35 = 5
30 - 35 = -5
20 - 35 = -15
10 - 35 = -25

For variance and standard deviation we only want to know how far off on average, not in which direction. Square the differences, add them up and divide by N; this gives the average of the squared distances from the mean, called the variance.

Variance = (625 + 225 + 25 + 25 + 225 + 625)/6 = 1750/6 ≈ 291.7

To get the standard deviation we simply return to our original scale by taking the square root.

Standard deviation = $\sqrt{291.7}$ ≈ 17.1

Standard deviation for a population: σ
Standard deviation for a sample: s

III. Variance and standard deviation for populations

The algorithm we just used gives us the formula for variance when using a population and in turn the standard deviation for populations:

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ and $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Method: These are the same steps we took from the Brand A population's raw data set:

Calculate $\mu = \frac{\sum X}{N}$
Subtract the mean from each value
Square each difference
Sum them up
Divide by N to get the variance $\sigma^2$
Take the square root to get the standard deviation $\sigma$

Sometimes it helps to set up the following table:

X    X - μ    (X - μ)^2

Now do the same calculations for Brand B:

$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

Go For It:

Conclusion:

Brand A   Brand B
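One way to check hand calculations for the paint example is a short Python sketch that follows the listed steps one by one. The function name `population_variance` and the variable names are illustrative assumptions, not part of the original notes.

```python
import math


def population_variance(values):
    """Population variance: the average squared distance from the mean."""
    n = len(values)
    mu = sum(values) / n                        # calculate the mean
    squared = [(x - mu) ** 2 for x in values]   # subtract the mean, then square each difference
    return sum(squared) / n                     # sum them up and divide by N


# Months before fading for the two brands, from the paint example.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    var = population_variance(data)
    sd = math.sqrt(var)  # the square root returns us to the original scale (months)
    print(f"{name}: variance = {var:.1f}, standard deviation = {sd:.1f}")
```

Both brands have the same mean of 35 months, so any difference in the printed standard deviations comes entirely from how widely each brand's results are spread around that mean.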

V. Variance and standard deviation - grouped data (frequency distributions):

So far we have only computed variance for samples and populations from raw data.

Finding Sample Variance and Standard Deviation for Grouped Data:

We will again use the midpoints of each class as an average value to get an approximate answer. The adjusted formula for variance is:

$s^2 = \frac{n \sum \left( f \cdot X_m^2 \right) - \left( \sum f \cdot X_m \right)^2}{n(n - 1)}$

Example: Compute $s^2$ and $s$ for our earlier data for record temperatures.

Make a table:

Class       f
100 - 104   2
105 - 109   8
110 - 114   18
115 - 119   13
120 - 124   7
125 - 129   1
130 - 134   1

Find $\sum f = n$, $\sum f \cdot X_m$, and $\sum f \cdot X_m^2$
Use the above formula for $s^2$
Find the standard deviation
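The steps above can be sketched in Python for the record temperature data. This is a minimal illustration that assumes each midpoint $X_m$ is the average of the class limits; the variable names (`classes`, `sum_fx`, `sum_fx2`) are hypothetical.

```python
import math

# (lower class limit, upper class limit, frequency) for the record high temperatures.
classes = [
    (100, 104, 2),
    (105, 109, 8),
    (110, 114, 18),
    (115, 119, 13),
    (120, 124, 7),
    (125, 129, 1),
    (130, 134, 1),
]

# Build the table columns: f, Xm, f * Xm, and f * Xm^2.
rows = [(f, (lo + hi) / 2) for lo, hi, f in classes]

n = sum(f for f, _ in rows)                  # sum of f = n
sum_fx = sum(f * xm for f, xm in rows)       # sum of f * Xm
sum_fx2 = sum(f * xm ** 2 for f, xm in rows) # sum of f * Xm^2

# Sample variance for grouped data, using the formula given above.
s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(s2)

print(f"n = {n}, sum f*Xm = {sum_fx}, sum f*Xm^2 = {sum_fx2}")
print(f"s^2 = {s2:.1f}, s = {s:.1f}")
```

Because every datum in a class is replaced by the class midpoint, the result approximates the variance that would be obtained from the raw temperatures.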