Descriptive Statistics: Understanding Categorical and Numerical Data, Summaries of Descriptive statistics

An overview of descriptive statistics, the difference between categorical and numerical data, and methods for analyzing each type. It includes examples of barplots, histograms, boxplots, and scatterplots, as well as measures of location and spread such as mean, median, standard deviation, variance, minimum, maximum, range, and inter-quartile range.

Typology: Summaries

2021/2022

Uploaded on 07/05/2022

barbara_gr
Statistical Data Analysis:
Descriptive Statistics
Jane Meza, PhD
Fang Yu, PhD
Lynette Smith, PhD

Outline

  • Statistical Analysis: Descriptive vs. Inferential
  • Data Type
    • Numeric Data
    • Categorical Data
  • Descriptive Statistics and Plots

Categorical Data

  • Provides a qualitative description
  • Observations can take only one of a limited set of possible values
  • Three types of categorical data:
    • Binary/Dichotomous: only two categories
    • Nominal: more than 2 categories, with no obvious ordering of the categories
    • Ordinal: more than 2 categories, with a natural ordering of the categories
  • Examples:
    • Gender (female/male): Binary
    • Blood Group (A/B/AB/O): Nominal
    • Disease Stage (I/II/III/IV): Ordinal

Numerical Data

  • Provides a quantitative description
  • Discrete data
    1. Observations can only take certain numeric values, usually counts of events
    2. Example: number of doctor visits in a year
    3. What is the difference between discrete data and ordinal data?
      • Stage of breast cancer (ordinal): 1, 2, 3, 4
      • Number of doctor visits (discrete): 0, 1, 2, 3, ...

Methods for Descriptive Statistics

  • Categorical data
    1. Descriptive plots: barplot / bargraph
    2. Descriptive statistics: contingency tables
  • Numerical data
    1. Descriptive plots: histogram, boxplot, scatter plot
    2. Descriptive statistics: location, spread

Barplot for Categorical Data

Useful to describe the distribution of values for a categorical variable

[Figure: bar plot of the proportion of cases versus stage of cancer]
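As a minimal sketch of what a bar plot summarizes, the Python snippet below tallies a categorical variable and prints one text bar per category, scaled by its proportion. The stage values are hypothetical (the data behind the slide's figure are not reproduced here), and `proportions` is an illustrative helper name rather than anything from the slides.

```python
from collections import Counter

# Hypothetical cancer stages for 20 patients (not the data behind the slide).
stages = ["I"] * 4 + ["II"] * 8 + ["III"] * 6 + ["IV"] * 2


def proportions(values):
    """Return {category: proportion of observations} for a categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


stage_props = proportions(stages)

# One text bar per stage; bar length is proportional to the share of cases.
for stage in ["I", "II", "III", "IV"]:
    bar = "#" * round(stage_props[stage] * 40)
    print(f"Stage {stage:<3} {bar} {stage_props[stage]:.0%}")
```

Plotting proportions rather than raw counts keeps bars comparable between groups of different sizes.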

Contingency Tables: Two Variables

                          White      Black     Other     Total
Enalapril+Felodipine ER   52 (48%*)  36 (33%)  21 (19%)  109
Enalapril                 52 (48%)   33 (31%)  23 (21%)  108
Total                     104        69        44        217

*: the number inside parentheses is the percentage within the row (the corresponding treatment group)

What do we conclude about comparability of the groups at baseline in terms of race?
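The row percentages in the treatment-by-race table can be checked with a few lines of Python. This is only a sketch: the counts come from the table, while the names `table`, `columns`, and `row_percentages` are illustrative choices.

```python
# Counts from the contingency table: rows are treatment groups, columns are race.
columns = ["White", "Black", "Other"]
table = {
    "Enalapril+Felodipine ER": [52, 36, 21],
    "Enalapril": [52, 33, 23],
}


def row_percentages(counts):
    """Percentage of each cell within its row, rounded to whole percents."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


for group, counts in table.items():
    cells = ", ".join(
        f"{col} {n} ({pct}%)"
        for col, n, pct in zip(columns, counts, row_percentages(counts))
    )
    print(f"{group}: {cells}; total {sum(counts)}")

# Column totals across both treatment groups.
column_totals = [sum(col) for col in zip(*table.values())]
print("Total:", dict(zip(columns, column_totals)), sum(column_totals))
```

Because the percentages are taken within each row, the two treatment groups can be compared directly even though their totals differ slightly (109 versus 108).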

Histogram for Continuous Data

  • Bin width = 5 yrs
  • Data that is not symmetric is skewed.
  • The mean may not be a good measure of central tendency. Why?
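To make the binning concrete, here is a small sketch that groups values into 5-year bins and prints a text histogram. The ages are hypothetical whole years (not the slide's data), and `histogram_counts` is an illustrative helper; bins with no observations are simply not listed.

```python
from collections import Counter

# Hypothetical ages in whole years (illustrative only, not the slide's data).
ages = [21, 23, 24, 26, 27, 27, 28, 31, 32, 33, 34, 36, 38, 41, 47]
BIN_WIDTH = 5  # years, as on the slide


def histogram_counts(values, bin_width):
    """Count observations per bin; each bin covers [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# One row per bin; the number of '#' characters is the bin count.
for start, count in histogram_counts(ages, BIN_WIDTH).items():
    print(f"{start:>2}-{start + BIN_WIDTH - 1:<2} {'#' * count}")
```

In this made-up sample the counts tail off toward the older ages, which is the kind of asymmetry the slide calls skew.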

[Figure: two histograms of a variable "var", one spanning 0 to 20 and one spanning -20 to 0, with counts from 0 to 30]

Skewed Distributions

Negative skew, or skewed to the left, mean < median

Positive skew, or skewed to the right, mean > median
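A quick numerical check of these two relationships, using Python's standard `statistics` module. The sample is hypothetical: a right-skewed list of values, and its mirror image as a left-skewed one.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: most values are small, a few are large.
right_skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 25]
# Mirroring the values around zero gives a left-skewed sample.
left_skewed = [-x for x in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")
```

The few extreme values in the long tail pull the mean toward them, while the median depends only on the middle of the ordered data. That is why the mean can be a poor measure of central tendency for skewed data.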

Measures of Spread

Measure the variability:

  • Variance: average squared deviation from the mean
  • Standard deviation S (or SD): square root of the variance, in the same units as the original data

Example: find the standard deviation of the disease duration
(1) Mean=
(2) Variance: 4.5
(3) Standard deviation: sqrt(4.5) = 2.12
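The calculation can be sketched in Python as follows. The disease durations behind the slide's example are not shown, so the list below is hypothetical, chosen only so that the sample variance equals 4.5. Whether the slide divides by n - 1 (sample variance) or by n is also an assumption here; both are computed for comparison.

```python
from math import sqrt
from statistics import mean, pvariance, variance

# Hypothetical disease durations in years, chosen so the sample variance is 4.5.
durations = [1, 4, 4, 4, 7]

m = mean(durations)                    # arithmetic mean
sample_var = variance(durations)       # sum of squared deviations / (n - 1)
sample_sd = sqrt(sample_var)           # same units as the original data
population_var = pvariance(durations)  # sum of squared deviations / n

print(f"mean = {m}")
print(f"sample variance = {sample_var}")
print(f"sample SD = {sample_sd:.2f}")
print(f"population variance = {population_var}")
```

With these values the squared deviations from the mean of 4 sum to 18, so dividing by n - 1 = 4 gives 4.5 and dividing by n = 5 gives 3.6.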

Choosing Measures of Location & Spread

How to choose statistics to describe the location and spread of data?

  • Symmetric distribution:
    • Location: mean
    • Spread: standard deviation
  • Skewed distribution:
    • Location: median
    • Spread: range and/or IQR
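The rule above can be written as a small helper. This is a sketch under assumptions: `summarize` and its `skewed` flag are hypothetical names, the samples are made up, and the quartiles use the default "exclusive" method of `statistics.quantiles` (other software may compute quartiles slightly differently).

```python
from statistics import mean, median, quantiles, stdev


def summarize(values, skewed):
    """Return location and spread statistics suited to the distribution shape."""
    if not skewed:
        return {"mean": mean(values), "sd": stdev(values)}
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default "exclusive" method
    return {
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# Hypothetical samples, for illustration only.
symmetric_sample = [2, 4, 5, 6, 8]
skewed_sample = [1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 25]

print(summarize(symmetric_sample, skewed=False))
print(summarize(skewed_sample, skewed=True))
```

The caller decides whether the data are skewed, for example by inspecting a histogram first; the helper does not try to detect skew on its own.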

Boxplot

[Figure: vertical boxplot of age (yrs), without extreme/outlying values, labeled from top to bottom:]
  • 75th percentile, Q3
  • Median, Q2
  • 25th percentile, Q1
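The values a basic boxplot displays can be computed directly. The sketch below assumes there are no outlying values, so the whiskers run to the minimum and maximum. The ages are hypothetical and `five_number_summary` is an illustrative name.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a basic boxplot displays."""
    q1, q2, q3 = quantiles(values, n=4)  # default "exclusive" method
    return min(values), q1, q2, q3, max(values)


# Hypothetical ages in years (not the slide's data).
ages = [23, 25, 28, 31, 34, 36, 39, 42, 45, 51, 58]

labels = ["Minimum", "25th percentile, Q1", "Median, Q2",
          "75th percentile, Q3", "Maximum"]
for label, value in zip(labels, five_number_summary(ages)):
    print(f"{label}: {value}")
```

The box spans Q1 to Q3, so its height is the IQR, and the line inside the box marks the median.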

Scatter Plot: Describe joint distribution of values from two continuous variables

[Figure: Plot of Body Mass Index versus Waist-hip Ratio; horizontal axis: Waist-hip Ratio, vertical axis: BMI (20 to 40)]

(Data Source: Fundamentals of Biostatistics by Rosner)

Summary of Descriptive Statistics

  • Categorical data
    1. Bar plot
    2. Contingency tables
  • Numerical data
    1. Histogram, box-plot, scatter plot
    2. Symmetric distribution: mean, SD
    3. Skewed distribution: median, IQR