Understanding Measures of Central Tendency, Correlation, and Effect Sizes in Statistics

An overview of statistics, focusing on measures of central tendency, correlation analysis, and effect sizes. It covers descriptive and inferential statistics, measures of central tendency (mean, median, mode), and measures of dispersion. It also explains correlation analysis, its purpose, and the correlation coefficient, and discusses effect sizes and their importance, including standardized differences between means such as Cohen's d and Hedges' g.

Uploaded on 08/18/2009

5/6/2009
1
STATISTICAL DATA
ANALYSIS
Dr. Yan Liu
Department of Biomedical, Industrial and Human Factors Engineering
Wright State University
2
Statistics
Types of Statistics
Descriptive statistics
Comprises the statistical methods dealing with the collection, tabulation and
summarization of data, so as to present meaningful information of the data
Inferential statistics
Consists of the methods involved with the analysis and interpretation of data that will
enable the researcher to develop meaningful inferences about the data
These two areas are interrelated
While descriptive statistics organizes the collected data in a systematic manner,
inferential statistics analyzes the data and enables one to produce significant
inferences about it

3

Measures of Central Tendency

 Indicate the central point or the greatest frequency concerning a set of

data

 Mean

 The statistical mean of a set of data is its average

 Population mean vs. sample mean

 Population mean, μ, is the expected value E(x): if an infinite number of measurements are made, their average is μ; this represents the true value of a measurement

 The sample mean, $\bar{x}$, is the average value of a sample, which is a finite series of measurements, and is an estimate of the population mean

 Median

 The median of a set of data, $\tilde{x}$, is the value which, when the data are arranged in ascending or descending order, satisfies the following conditions: 1) if the number of observations is odd, the median is the middle value; and 2) if the number of observations is even, the median is the average of the two middle values

 The same as the 50th percentile of a set of data

4

Measures of Central Tendency (Cont’d)

 Mode

 The mode of a set of data is the specific value that occurs with the greatest

frequency

 May be more than one or none
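The three measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the eight values in `sample` are illustrative and the variable names are assumptions, not part of the slides.

```python
from statistics import mean, median, multimode

# Illustrative sample of eight measurements (assumed values)
sample = [6, 5, 5, 7, 4, 3, 5, 4]

sample_mean = mean(sample)        # arithmetic average: sum of values / count
sample_median = median(sample)    # even count, so the two middle values are averaged
sample_modes = multimode(sample)  # list of every value tied for the highest frequency

print(sample_mean, sample_median, sample_modes)
```

`multimode` returns a list, which reflects that a data set may have more than one mode. When every value is distinct it returns all of them, which corresponds to the "none" case above.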

7

Correlation Analysis

 Purpose

 Measures how strongly two attributes correlate with each other

 Correlation Coefficient

 Correlation analysis for numerical variables

 Indicates the strength and direction of a linear relationship between two numeric

random variables

 Pearson’s product moment coefficient

 Curvilinear Relationship

 If the relationship between two variables is curvilinear, the Pearson product-moment correlation coefficient may fail to indicate the existence of a relationship

 To check whether the relationship is linear, the easiest way is to construct a

scatterplot which gives a visual indication of the shape of the relationship

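Pearson's product-moment coefficient:

$$ r_{AB} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n \sigma_A \sigma_B} = \frac{\sum_{i=1}^{n} (a_i b_i) - n \bar{A} \bar{B}}{n \sigma_A \sigma_B} $$

where n is the number of observations, $a_i$ and $b_i$ are the values of A and B in observation i, $\bar{A}$ and $\bar{B}$ are their means, and $\sigma_A$ and $\sigma_B$ are their standard deviations (computed with divisor n)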
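A short sketch of Pearson's coefficient may help show why a curvilinear relationship can go undetected. The function name `pearson_r` and the toy data are assumptions chosen for illustration; the standard deviations use divisor n.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson's r with population standard deviations (divisor n)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sd_a = sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_b = sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cross / (n * sd_a * sd_b)

x = [-2, -1, 0, 1, 2]
r_linear = pearson_r(x, [2 * v + 1 for v in x])  # points on a rising straight line
r_quadratic = pearson_r(x, [v ** 2 for v in x])  # symmetric parabola with a minimum at 0

print(r_linear, r_quadratic)
```

The straight line gives r of 1 (up to rounding), while the parabola gives r of 0 even though y is fully determined by x. A scatterplot would reveal the second relationship immediately.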

8

[Figure: four scatterplots showing a perfect positive linear relationship, a perfect negative linear relationship, a curvilinear relationship (quadratic with an intermediate minimum), and no relationship]

9

Correlation Analysis (Cont’d)

 Chi-Square (χ^2 ) Test

 Correlation analysis for categorical variables

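Contingency table for A (columns) and B (rows):

        A_1       A_2       …    A_p
B_1     A_1B_1    A_2B_1    …    A_pB_1
B_2     A_1B_2    A_2B_2    …    A_pB_2
…       …         …         …    …
B_q     A_1B_q    A_2B_q    …    A_pB_q

Cell (A_i, B_j) represents the joint event that A = A_i and B = B_j

$$ \chi^2 = \sum_{i=1}^{p} \sum_{j=1}^{q} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} $$

χ^2: test statistic for the hypothesis that A and B are independent, with (p-1)(q-1) degrees of freedom

$$ e_{ij} = \frac{\mathrm{count}(A = A_i) \times \mathrm{count}(B = B_j)}{n} $$

o_ij: the observed frequency of the joint event (A_i, B_j); e_ij: its expected frequency under independence

n: total number of cases (observations)

count(A = A_i): the observed frequency of the event A = A_i

count(B = B_j): the observed frequency of the event B = B_j

10

Gender and Preferred Reading Example

Preferred reading    male    female    Row margin
fiction              250     200       450
non-fiction          50      1000      1050
Column margin        300     1200      n = 1500

$$ e_{11} = \frac{\mathrm{count}(male) \times \mathrm{count}(fiction)}{n} = \frac{300 \times 450}{1500} = 90 $$

$$ e_{12} = \frac{\mathrm{count}(male) \times \mathrm{count}(non\text{-}fiction)}{n} = \frac{300 \times 1050}{1500} = 210 $$

$$ e_{21} = \frac{\mathrm{count}(female) \times \mathrm{count}(fiction)}{n} = \frac{1200 \times 450}{1500} = 360 $$

$$ e_{22} = \frac{\mathrm{count}(female) \times \mathrm{count}(non\text{-}fiction)}{n} = \frac{1200 \times 1050}{1500} = 840 $$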
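The expected counts and the χ^2 statistic for the gender and preferred-reading table can be checked with a few lines of plain Python. This is a sketch of the arithmetic only; the variable names are assumptions, and the observed counts are the four cell values from the table.

```python
# Observed counts: rows are reading preferences, columns are genders
observed = [[250, 200],    # fiction:     male, female
            [50, 1000]]    # non-fiction: male, female

row_totals = [sum(row) for row in observed]        # row margins
col_totals = [sum(col) for col in zip(*observed)]  # column margins
n = sum(row_totals)                                # total number of cases

# Expected count under independence: row margin * column margin / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected, round(chi2, 2), dof)
```

With 1 degree of freedom the 0.05 critical value of the chi-square distribution is about 3.84, so a statistic this large would lead to rejecting the hypothesis that gender and preferred reading are independent.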

13

Standardized Difference Between Means

 Cohen’s d

 Defined as mean difference divided by the pooled standard deviation

 Interpretation of Cohen’s d

 Small effect size: ~[0.0, 0.5)

 Medium effect size: ~[0.5, 0.8)

 Large effect size: ~[0.8, +∞)

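$$ d = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_p} $$

$$ \sigma_p = \sqrt{\frac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1 + n_2}} $$

σ_p is the pooled population standard deviation of X_1 and X_2

when n_1 = n_2,

$$ \sigma_p = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}} $$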

14

Standardized Difference Between Means (Cont’d)

 Hedges' g

 Virtually the same as Cohen's d in large sample sizes

$$ g = \frac{\bar{X}_1 - \bar{X}_2}{S_p}, \qquad S_p = \sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}} $$

S_p is the pooled sample standard deviation of X_1 and X_2

$$ S_1 = \sigma_1 \sqrt{\frac{n_1}{n_1 - 1}}, \qquad S_2 = \sigma_2 \sqrt{\frac{n_2}{n_2 - 1}} $$

when n_1 = n_2,

$$ S_p = \sqrt{\frac{S_1^2 + S_2^2}{2}} $$

 Some software (e.g. Effect Size Generator) calculates g by adjusting the overall effect size based on the sample sizes, as follows

$$ g = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}}} \left(1 - \frac{3}{4(n_1 + n_2) - 9}\right) = \frac{\bar{X}_1 - \bar{X}_2}{S_p} \left(1 - \frac{3}{4(n_1 + n_2) - 9}\right) $$

15

Standardized Difference Between Means (Cont’d)

 Glass’s Delta ∆

 Defined as the mean difference between the experimental and control group

divided by the standard deviation of the control group

$$ \Delta = \frac{\bar{X}_1 - \bar{X}_2}{S_{control}} $$

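16

Visual Interface Example (I)

Interface (A)
A1:  6  5  5  7  4  3  5  4
A2:  8  6  9  6  6  5  5  7

$\bar{X}_1 = 4.875$, $\bar{X}_2 = 6.5$

$$ \sigma_1 = \sqrt{\frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2}{n}} = 1.166, \qquad \sigma_2 = \sqrt{\frac{\sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}{n}} = 1.323 $$

$$ S_1 = \sqrt{\frac{\sum_{i=1}^{n} (X_{1i} - \bar{X}_1)^2}{n - 1}} = 1.246, \qquad S_2 = \sqrt{\frac{\sum_{i=1}^{n} (X_{2i} - \bar{X}_2)^2}{n - 1}} = 1.414 $$

Cohen's d

$$ d = \frac{4.875 - 6.5}{\sqrt{(1.166^2 + 1.323^2)/2}} = -1.303 $$

Hedges' g

$$ g = \frac{4.875 - 6.5}{\sqrt{(1.246^2 + 1.414^2)/2}} = -1.219 $$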
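The standardized differences for the two interface groups can be reproduced with the standard library. The sketch below treats A1 and A2 as independent samples; the function names are assumptions, and using A2 as the control group for Glass's Δ is an arbitrary choice for illustration since the slides do not name a control.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

a1 = [6, 5, 5, 7, 4, 3, 5, 4]
a2 = [8, 6, 9, 6, 6, 5, 5, 7]

def cohens_d(x1, x2):
    """Mean difference over the pooled population SD (divisor n)."""
    n1, n2 = len(x1), len(x2)
    pooled = sqrt((n1 * pstdev(x1) ** 2 + n2 * pstdev(x2) ** 2) / (n1 + n2))
    return (mean(x1) - mean(x2)) / pooled

def hedges_g(x1, x2, small_sample_correction=False):
    """Mean difference over the pooled sample SD (divisor n - 1)."""
    n1, n2 = len(x1), len(x2)
    pooled = sqrt(((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2)
                  / (n1 + n2 - 2))
    g = (mean(x1) - mean(x2)) / pooled
    if small_sample_correction:
        g *= 1 - 3 / (4 * (n1 + n2) - 9)  # adjustment based on the sample sizes
    return g

def glass_delta(experimental, control):
    """Mean difference over the sample SD of the control group."""
    return (mean(experimental) - mean(control)) / stdev(control)

print(round(cohens_d(a1, a2), 3), round(hedges_g(a1, a2), 3))
print(round(hedges_g(a1, a2, small_sample_correction=True), 3))
print(round(glass_delta(a1, a2), 3))  # A2 assumed to be the control group
```

With only eight observations per group, g is noticeably smaller in magnitude than d, and the sample-size adjustment shrinks it further.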

19

Visual Interface Example

Participant (S):   1   2   3   4   5   6   7   8
Interface (A)
A1:                6   5   5   7   4   3   5   4
A2:                8   6   9   6   6   5   5   7
D12 = A1 - A2:    -2  -1  -4   1  -2  -2   0  -3

$|\bar{D}_{12}| = 1.625$

$$ \sigma_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n}} = 1.495 $$

Cohen's d

$$ d = \frac{1.625}{1.495} = 1.087 $$

$$ S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} = 1.598 $$

Hedges' g

$$ g = \frac{1.625}{1.598} = 1.017 $$
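For the paired version of the example, the effect size is based on the per-participant differences rather than on the two group standard deviations. A minimal sketch, with variable names that are assumptions:

```python
from statistics import mean, pstdev, stdev

a1 = [6, 5, 5, 7, 4, 3, 5, 4]
a2 = [8, 6, 9, 6, 6, 5, 5, 7]

# Difference A1 - A2 for each of the eight participants
diffs = [x - y for x, y in zip(a1, a2)]
mean_diff = mean(diffs)

# Magnitude of the mean difference over the SD of the differences
d_paired = abs(mean_diff) / pstdev(diffs)  # population SD, divisor n
g_paired = abs(mean_diff) / stdev(diffs)   # sample SD, divisor n - 1

print(diffs, mean_diff, round(d_paired, 3), round(g_paired, 3))
```

The mean difference itself is negative (A2 scores are higher); the magnitudes match the 1.087 and 1.017 reported in the example.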

20

Effect Sizes of Correlation

 Pearson Product-Moment Correlation Coefficient

 Correlation between numeric variables

 Point Biserial Correlation Coefficient (rpb)

 Used when one variable, say X 1 , is continuous but the other variable, say X 2 , is

dichotomous

 Assuming that X 2 has two values, 0 and 1, the data set can be divided into two

groups, group 1 which receives the value "1" on X 2 and group 2 which receives

the value "0" on X 2. Then rpb is calculated as follows

$$ r_{pb} = \frac{M_1 - M_0}{S_X} \cdot \sqrt{\frac{n_1 n_0}{(n_1 + n_0)(n_1 + n_0 - 1)}} $$

where M_1 is the mean of X_1 for all data points in group 1 of X_2, M_0 is the mean of X_1 for all data points in group 2 of X_2, n_1 is the number of data points in group 1, n_0 is the number of data points in group 2, and S_X is the sample standard deviation of X_1 over all data points
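One way to sanity-check the point biserial formula is to confirm that it agrees with Pearson's r computed directly on the 0/1 coding. The sketch below does that on a small data set; the scores, the grouping, and both function names are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def point_biserial(x1, x2):
    """r_pb for continuous x1 and dichotomous x2 coded as 0/1."""
    group1 = [v for v, flag in zip(x1, x2) if flag == 1]
    group0 = [v for v, flag in zip(x1, x2) if flag == 0]
    n1, n0 = len(group1), len(group0)
    n = n1 + n0
    # stdev uses divisor n - 1, matching the n(n - 1) term under the root
    return (mean(group1) - mean(group0)) / stdev(x1) * sqrt(n1 * n0 / (n * (n - 1)))

def pearson_r(a, b):
    """Plain Pearson correlation, used only as a cross-check."""
    mean_a, mean_b = mean(a), mean(b)
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / sqrt(ss_a * ss_b)

# Hypothetical data: 16 continuous scores, first 8 in group 0, last 8 in group 1
scores = [6, 5, 5, 7, 4, 3, 5, 4, 8, 6, 9, 6, 6, 5, 5, 7]
group = [0] * 8 + [1] * 8

r_pb = point_biserial(scores, group)
print(round(r_pb, 3), round(pearson_r(scores, group), 3))
```

Both routes give the same value, which illustrates that the point biserial coefficient is Pearson's r applied to a dichotomous variable.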

21

Effect Sizes for ANOVA

 Effect Sizes for ANOVA

 Measure the degree of association between an effect (i.e., a main effect, an

interaction) and the dependent variable

 Can be thought of as the correlation between an effect and the dependent

variable

 If the value of the measure of association is squared, it can be interpreted as the

proportion of variance in the dependent variable that is attributable to each effect

 Commonly used measures of effect size in ANOVA

 Eta squared, η^2

 Partial Eta squared, ηp^2

 Omega squared, ω^2

 Intraclass correlation, ρI

 η^2 and ηp^2 are estimates of degree of association for the sample, while ω^2 and ρI are

estimates of the degree of association in the population

22

Eta Squared

 Eta Squared, η^2

 The proportion of the total variance that is attributed to an effect

 Statistical issue

 The effect size of an effect is dependent upon the number and magnitude of other

effects

$$ \eta^2 = \frac{SS_{Effect}}{SS_T} $$

Effect            Sum of Squares    η^2
Drive             24                3.93%
Reward            112               18.36%
Reward * Drive    144               23.61%
Error             330               54.10%
SST               610
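The η^2 column is each sum of squares divided by the total, which a short script can confirm. The dictionary layout and names below are assumptions; the sums of squares are the ones in the table.

```python
# Sums of squares from the Drive / Reward table
sums_of_squares = {
    "Drive": 24,
    "Reward": 112,
    "Reward * Drive": 144,
    "Error": 330,
}
ss_total = sum(sums_of_squares.values())

# Eta squared: share of the total sum of squares attributed to each row
eta_squared = {effect: ss / ss_total for effect, ss in sums_of_squares.items()}

for effect, value in eta_squared.items():
    print(f"{effect}: {value:.2%}")
```

Because every row is divided by the same total, adding or enlarging one effect lowers the η^2 of all the others, which is the statistical issue noted above.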

25

Within-Subject Design

Effect                                  Sum of Squares    ηp^2
Interface                               10.56             = 10.56/(10.56+8.94) = 54.2%
Participant                             15.94
Participant * Interface (error term)    8.94
Total                                   35.43
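The worked cell in the table divides the effect's sum of squares by the sum of the effect and its error term. A tiny sketch of that calculation follows; the function name is an assumption, and the formula is inferred from the worked cell rather than stated separately in this excerpt.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Effect sum of squares over effect plus error sum of squares."""
    return ss_effect / (ss_effect + ss_error)

# Interface effect with the Participant * Interface error term
eta_p2_interface = partial_eta_squared(ss_effect=10.56, ss_error=8.94)
print(f"{eta_p2_interface:.1%}")
```

Unlike η^2, the denominator excludes the other effects (here, Participant), so the value does not shrink when additional effects are added to the design.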

26

Omega Squared

 Omega Squared, ω^2

 An estimate of the dependent variance accounted for by the independent

variable in the population for a fixed effects model

 ω^2 for between-subjects, fixed effects is

$$ \omega^2 = \frac{SS_{Effect} - df_{Effect} \cdot MS_{Err}}{SS_T + MS_{Err}} $$

 ω^2 is always smaller than either η^2 or ηp^2

Effect            Sum of Squares    df    Mean Squares    ω^2
Drive             24                1     24              0.9%
Reward            112               2     56              12.0%
Reward * Drive    144               2     72              17.1%
Error             330               18    18.33
SST               610
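The ω^2 values can be recomputed from the sums of squares and degrees of freedom. In this sketch the names are assumptions, and the interaction's 2 degrees of freedom are assumed to be the product of the main-effect degrees of freedom (1 × 2).

```python
ss = {"Drive": 24, "Reward": 112, "Reward * Drive": 144}
df = {"Drive": 1, "Reward": 2, "Reward * Drive": 2}  # interaction df assumed to be 1 * 2

ss_error, df_error = 330, 18
ss_total = 610
ms_error = ss_error / df_error  # mean square error

def omega_squared(ss_effect, df_effect):
    """Between-subjects, fixed-effects omega squared."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

omega = {effect: omega_squared(ss[effect], df[effect]) for effect in ss}

for effect, value in omega.items():
    print(f"{effect}: omega^2 = {value:.1%}, eta^2 = {ss[effect] / ss_total:.1%}")
```

Printing η^2 alongside makes the ordering visible: each ω^2 is smaller than the corresponding η^2 because the numerator is reduced and the denominator is increased by the error mean square.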

27

Within-Subject Design

Effect                                  Sum of Squares    df    Mean Squares    ω^2
Interface                               10.56             1     10.56
Participant                             15.94             7     2.28
Participant * Interface (error term)    8.94              7     1.28
Total                                   35.43             15

28

Intraclass Correlation

 Intraclass Correlation, ρI

 An estimate of the dependent variance accounted for by the independent

variable in the population for a random effects model

$$ \rho_I = \frac{MS_{Effect} - MS_{Err}}{MS_{Effect} + df_{Effect} \cdot MS_{Err}} $$
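A small sketch of the intraclass correlation calculation is shown below. The function name is an assumption, and the inputs (mean squares of 56 and 330/18 with 2 effect degrees of freedom) are reused from the Reward row purely as illustrative numbers; they are not presented in the slides as a random-effects analysis.

```python
def intraclass_correlation(ms_effect, ms_error, df_effect):
    """Random-effects estimate: (MS_effect - MS_err) / (MS_effect + df_effect * MS_err)."""
    return (ms_effect - ms_error) / (ms_effect + df_effect * ms_error)

# Illustrative inputs only (assumed for the arithmetic, not a real random-effects design)
rho_i = intraclass_correlation(ms_effect=56.0, ms_error=330 / 18, df_effect=2)
print(round(rho_i, 3))
```

When the effect mean square equals the error mean square the estimate is 0, and it approaches 1 as the error mean square becomes negligible.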

31

Compare Two Population Means: Independent Samples (II)

[Figure: sampling distributions with the critical region marked; σ^2 = 60, n = 48; power π = Pr(reject H0 | H0 is false)]

32

Compare Two Population Means with

Independent Samples

 For large samples (n > 30), the sample size per group n needs to satisfy

$$ n \ge \frac{2 (z_{1-\alpha/2} + z_{\pi})^2 \sigma_{error}^2}{\Delta^2} $$

σ^2_error: the within-group variance

∆: the smallest difference between the two groups you wish to detect

z_{1-α/2}: the percentile of the normal distribution used as the critical value in a two-tailed test of size α (1.96 for α = 0.05)

z_π: the π×100-th percentile of the normal distribution (0.84 for the 80th percentile)

 For small samples (n < 30), the sample size per group n needs to satisfy

$$ n \ge \frac{2 (t_{1-\alpha/2,\, n-1} + t_{\pi,\, n-1})^2 \sigma_{error}^2}{\Delta^2} $$

 Since the particular t distribution depends on the sample size, the equation must be solved iteratively (trial-and-error)

 The sample size increases with σ_error and decreases with ∆
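The large-sample formula is easy to evaluate with the standard library's `NormalDist`. The sketch below is an illustration under stated assumptions: the function name and defaults are hypothetical, and setting σ_error to 1 means that ∆ is expressed in standard deviation units, i.e. as Cohen's d.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma_error, alpha=0.05, power=0.80):
    """Large-sample size per group for comparing two independent means."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = standard_normal.inv_cdf(power)          # about 0.84 for 80% power
    # Round up to the next whole observation
    return ceil(2 * (z_alpha + z_power) ** 2 * sigma_error ** 2 / delta ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(delta=d, sigma_error=1.0))
```

This uses the normal approximation only. For the smaller results the t-based version would need to be iterated and would be expected to give slightly larger sample sizes.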

33

Compare Two Population Means with

Independent Samples (Cont’d)

 Estimate the Within-Group Standard Deviation

 Often comes from previous similar studies

 Sometimes it is necessary to run a pilot study to get some idea of the inherent variability

 Conservative estimates (estimates that lead to a slightly larger sample size) are

preferable to underestimates

 Rules of Thumb

 For 80% power, need 393 samples for each group when Cohen's d = 0.2, 64 samples when d = 0.5, and 26 samples when d = 0.8

34

Compare Two Population Means with Paired Samples

 The formula for the total number of pairs is the same as for the number of independent samples except that the factor of 2 is dropped, i.e.

$$ n \ge \frac{(z_{1-\alpha/2} + z_{\pi})^2 \sigma_{error}^2}{\Delta^2} \quad (\text{when } n > 30) $$

$$ n \ge \frac{(t_{1-\alpha/2,\, n-1} + t_{\pi,\, n-1})^2 \sigma_{error}^2}{\Delta^2} \quad (\text{when } n < 30) $$

 Rules of Thumb

 For 80% power, need 196 pairs when Cohen's d = 0.2, 32 pairs when d = 0.5, and 13 pairs when d = 0.8