Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Descriptive Statistics, Correlation, and Regression: Introduction to Biostatistics, Lecture notes of Biostatistics

COMSATS Institute of Information Technology (CIIT)Biostatistics

Correlation and regression are related in the sense that both deal with relationships among variables.

Typology: Lecture notes

2018/2019

Uploaded on 02/15/2019

saad-mughal 🇵🇰

(1)

2 documents

1 / 59

This page cannot be seen from the preview

Don't miss anything!

Descriptive statistics

Correlation

Regression

Descriptive statistics; Correlation and regression

Patrick Breheny

September 16

Patrick Breheny STA 580: Biostatistics I 1/59

Discover Lecture notes of Biostatistics COMSATS Institute of Information Technology (CIIT)

Partial preview of the text

Download Descriptive Statistics, Correlation, and Regression: Introduction to Biostatistics and more Lecture notes Biostatistics in PDF only on Docsity!

Descriptive statistics CorrelationRegression

Descriptive statistics; Correlation and regression

Patrick Breheny

September 16

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Tables and figures

Human beings are not good at sifting through large streams of data; we understand data much better when it is summarized for us We often display summary statistics in one of two ways: tables and figures Tables of summary statistics are very common (we have already seen several in this course) – nearly all published studies in medicine and public health contain a table of basic summary statistics describing their sample However, figures are usually better than tables in terms of distilling clear trends from large amounts of information

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Categorical data

Summarizing categorical data is pretty straightforward – you just count how many times each category occurs Instead of counts, we are often interested in percents A percent is a special type of rate, a rate per hundred Counts (also called frequencies), percents, and rates are the three basic summary statistics for categorical data, and are often displayed in tables or bar charts, as we saw in lab

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Continuous data

For continuous data, instead of a finite number of categories, observations can take on a potentially infinite number of values Summarizing continuous data is therefore much less straightforward To introduce concepts for describing and summarizing continuous data, we will look at data on infant mortality rates for 111 nations on three continents: Africa, Asia, and Europe

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Histogram of European infant mortality rates

Deaths per 1,000 births

Count

0 2 4

8 10

0 50 100 150 200

Africa

0 2

4 6

8 10

Asia

0 5 10 15 20 25

Europe

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Summarizing continuous data

As we can see, continuous data comes in a variety of shapes Nothing can replace seeing the picture, but if we had to summarize our data using just one or two numbers, how should we go about doing it? The aspect of the histogram we are usually most interested in is, “Where is its center?” This is typically represented by the average

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Spread

The second most important bit of information from the histogram to summarize is, “How spread out are the observations around the center”? This is most typically represented by the standard deviation To understand how standard deviation works, let’s return to our small example with the numbers { 4 , 5 , 1 , 9 } Each of these numbers deviates from the mean by some amount:

4 − 4 .75 = − 0. 75 5 − 4 .75 = 0. 25 1 − 4 .75 = − 3. 75 9 − 4 .75 = 4. 25

How should we measure the overall size of these deviations?

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Root-mean-square

Taking their mean isn’t going to tell us anything (why not?) We could take the average of their absolute values:

|− 0. 75 | + | 0. 25 | + |− 3. 75 | + | 4. 25 | 4

But it turns out that for a variety of reasons, the root-mean-square works better as a measure of overall size: √ (− 0 .75)^2 + (0.25)^2 + (− 3 .75)^2 + (4.25)^2 4

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Meaning of the standard deviation

The standard deviation (SD) describes how far away numbers in a list are from their average The SD is often used as a “plus or minus” number, as in “adult women tend to be about 5’4, plus or minus 3 inches” Most numbers (roughly 68%) will be within 1 SD away from the average Very few entries (roughly 5%) will be more than 2 SD away from the average This rule of thumb works very well for a wide variety of data; we’ll discuss where these numbers come from in a few weeks

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Standard deviation and the histogram

Background areas within 1 SD of the mean are shaded:

Deaths per 1,000 births

Count

0 2 4 6 8 10 50 100 150 200

Africa

0 50 100 150

Asia

10 20 30 40

Europe

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Summaries can be misleading!

All of the following have the same mean and standard deviation:

−4 −2 0 2 4

Frequency

−4 −2 0 2 4

Frequency

−4 −2 0 2 4

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Percentiles

The average and standard deviation are not the only ways to summarize continuous data Another type of summary is the percentile A number is the 25th percentile of a list of numbers if it is bigger than 25% of the numbers in the list The 50th percentile is given a special name: the median The median, like the mean, can be used to answer the question, “Where is the center of the histogram?”

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Skew

Note that the histogram for Europe is not symmetric: the tail of the distribution extends further to the right than it does to the left Such distributions are called skewed The distribution of infant mortality rates in Europe is said to be right skewed or skewed to the right For asymmetric/skewed data, the mean and the median will be different

Descriptive statistics CorrelationRegression Numerical summariesPercentiles

Hypothetical example

Azerbaijan had the highest infant mortality rate in Europe at 37 What if, instead of 37, it was 200? Mean Median Real 14.1 11 Hypothetical 19.2 11 The mean is now higher than 72% of the countries Note that the average is sensitive to extreme values, while the median is not; statisticians say that the median is robust to the presence of outlying observations

Descriptive Statistics, Correlation, and Regression: Introduction to Biostatistics, Lecture notes of Biostatistics

Related documents

Partial preview of the text

Download Descriptive Statistics, Correlation, and Regression: Introduction to Biostatistics and more Lecture notes Biostatistics in PDF only on Docsity!

Descriptive statistics; Correlation and regression

Tables and figures

Categorical data

Continuous data

Histogram of European infant mortality rates

Summarizing continuous data

Spread

Root-mean-square

Meaning of the standard deviation

Standard deviation and the histogram

Summaries can be misleading!

Percentiles

Skew

Hypothetical example