Chemometric ANOVA, Lecture Notes- Physics, Study notes of Advanced Physics

Variance, sources of variance, analysis of variance, sum of squares analysis, Dunnett test, Tukey test, confidence interval of means, ANOVA table, t values, F test

Typology: Study notes

2010/2011

Uploaded on 09/10/2011

gerrard_11
Variance
Simple analysis of variance
Confidence Interval of the Mean
t values
Do two means differ
F test

Simple analysis of variance
So far, we’ve assumed that all observed variance comes from a single, random source. That is not likely: there can be many sources of variance.
We’ll now introduce a way to analyze the variance in sample sets.

Analysis of variance
In general, when sources of variance combine linearly and are independent (uncorrelated), the variances are additive.
We often need to do experiments to evaluate the magnitude and sources of variance.
$s_{total}^2 = s_1^2 + s_2^2 + \dots + s_k^2$
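
As a small illustration of the additivity rule, the sketch below combines independent standard deviations by summing their variances. The function name `combined_std` and the numbers are made up for this example and are not taken from the lecture.

```python
import math

def combined_std(*std_devs):
    """Combine independent standard deviations by adding their variances."""
    total_variance = sum(s ** 2 for s in std_devs)
    return math.sqrt(total_variance)

# Hypothetical values: sampling s = 0.30, method s = 0.40 (same units).
s_total = combined_std(0.30, 0.40)
print(f"s_total = {s_total:.2f}")
```

Note that the standard deviations themselves do not add: 0.30 and 0.40 combine to 0.50, not 0.70.
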

Determining sources of variance
Let’s start with a simple example where there should only be two potential sources of variance. In this example, a series of four samples is obtained and each is analyzed in triplicate.
Since there is error in any measurement, it’s not surprising that the means are different. We want to know if the difference is due to variance in the method or to real sample differences.

Simple two level model
$S_{between}^2$ = variance of sample material
$S_{within}^2$ = variance of analytical method
$S_{total}^2 = S_{between}^2 + S_{within}^2$
We have two potential sources of variance:
  • Samples may actually be different.
  • Run to run errors.
A simple set of calculations can be used to sort out the sources of variance. The F test can then be used to determine if the variance values are significant.

Simple two level model

[Diagram: Level 1 holds X1 to X4, one per sample; Level 2 holds the three replicate values (x1 to x4) beneath each sample.]

Level 1 gives us an idea as to sample variability.
Level 2 tells us about the method variability.

Simple analysis of variance

First, calculate the sum of squares for all values in your sample:
$SS_T = \sum_i (x_i - \bar{x}_T)^2$

The variation about the total mean can then be calculated as:
$MS_T = \frac{\sum_i (x_i - \bar{x}_T)^2}{df_T}$
$df_T$ = total - 1
$\bar{x}_T$ = grand mean (mean of all the points)

Next, calculate the between sample variance:
$SS_{between} = n_r \sum_s (\bar{x}_s - \bar{x}_T)^2$

Then the mean square for the samples:
$MS_{between} = \frac{SS_{between}}{df_s}$
$\bar{x}_s$ = mean of each sample
$df_s$ = # samples - 1
$n_r$ = # replicates per sample

Since you know $SS_T$ and $SS_{between}$, you can find the within sample variance from:
$SS_T = SS_{between} + SS_{within}$

The mean sum of squares for our replicates is then:
$MS_{within} = \frac{SS_T - SS_{between}}{df_T - df_s}$
where $df_T - df_s$ = total - # samples.
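
The calculation steps above can be sketched in a few lines of Python. This is only an illustration: the function name `one_way_anova` and the data are hypothetical (the replicate values in the lecture example are truncated in these notes), and the within-sample degrees of freedom are taken as the total number of measurements minus the number of samples.

```python
def one_way_anova(samples):
    """Sums of squares, degrees of freedom and mean squares for a
    balanced two level design (a list of equal-length replicate lists)."""
    n_samples = len(samples)
    n_r = len(samples[0])                      # replicates per sample
    all_values = [x for group in samples for x in group]
    grand_mean = sum(all_values) / len(all_values)

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    sample_means = [sum(group) / n_r for group in samples]
    ss_between = n_r * sum((m - grand_mean) ** 2 for m in sample_means)
    ss_within = ss_total - ss_between          # from SS_T = SS_between + SS_within

    df_total = len(all_values) - 1
    df_between = n_samples - 1
    df_within = df_total - df_between          # = total - # samples

    return {
        "ss_total": ss_total, "ss_between": ss_between, "ss_within": ss_within,
        "df_total": df_total, "df_between": df_between, "df_within": df_within,
        "ms_between": ss_between / df_between,
        "ms_within": ss_within / df_within,
    }

# HYPOTHETICAL data: 4 samples, each analyzed in triplicate.
data = [
    [10.1, 10.3, 10.2],
    [9.8, 9.9, 10.0],
    [10.6, 10.4, 10.5],
    [10.0, 10.2, 10.1],
]
table = one_way_anova(data)
f_value = table["ms_between"] / table["ms_within"]
for key, value in table.items():
    print(f"{key:>10}: {value:.4f}")
print(f"F = {f_value:.2f}")
```

The ratio of the two mean squares is the F value that gets compared with the critical value in the F test.
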

Simple analysis of variance

Back to the example.

Sample   Replicate 1   Replicate 2   Replicate 3
1        15.9          16.1          16.
2        14.9          15.2          15.
3        14.8          15.8          15.
4        16.2          16.0          15.

Source      df   SS               MS
Total            $SS_T$
Sample           $SS_{between}$
Replicate        $SS_{within}$

F test

OK. We’ve done several calculations. Now what?
We can now use the F test to determine if there is a significant difference between the two sources of variance.
$F = \frac{s_{big}^2}{s_{small}^2}$
F is then compared to $F_c$ to see if the difference is significant. This will be covered in a bit.

Using XLStat

Note: XLStat does not report $F_c$ values, just the P value
  • the probability of getting an F this large if your values were NOT actually different.
Data must be ordered in a single column.
Two methods are available for calculating the sum of squares for your groups: Type I and Type III. These are only useful for more complex multivariable ANOVA.
Might as well review them at this point.

Sum of squares analysis.

Type I (Sequential)
The sums of squares obtained by fitting effects in the order specified in the model. The Type I SS for each effect will change if the order of the effects in the model is changed.

Type III (Marginal)
The sums of squares obtained by fitting each effect after all the other terms in the model. The Type III SS do not depend upon the order in which effects are specified in the model.

Type I SS are useful for exploring unbalanced experimental data, where some effects are measured more than others. They can also show flaws in an experimental design (next chapter).

Type III SS are preferable in most cases since they correspond to the variation attributable to an effect after correcting for any other effects in the model. They are unaffected by the frequency of observations.

With a balanced experiment (all combinations measured with equal frequency), Type I and III give the same results.

Analysis of variance:

Source            DF   Sum of squares   Mean squares   F        Pr > F
Model             11   438.943          39.904         40.264   < 0.
Error             36   35.678           0.
Corrected Total

Fcrit =

There is a difference. Can we tell what it is?
In this example, there is a <0.01% chance of seeing a result like this if there were NO difference.
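
The entries in the ANOVA table are tied together by MS = SS / df and F = MS(model) / MS(error). A quick check using only the sums of squares and degrees of freedom reported in the table (the variable names are illustrative):

```python
# Values reported in the ANOVA table.
df_model, ss_model = 11, 438.943
df_error, ss_error = 36, 35.678

ms_model = ss_model / df_model      # mean square for the model
ms_error = ss_error / df_error      # mean square for the error
f_value = ms_model / ms_error

print(f"MS model = {ms_model:.3f}")
print(f"MS error = {ms_error:.3f}")
print(f"F        = {f_value:.3f}")
```

The computed model mean square and F agree with the reported 39.904 and 40.264.
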

XLStat results.

[Plot: Lead / Standardized coefficients. Horizontal axis: Variable (the twelve chemists, labelled Chemist-A1 onward). Vertical axis: Standardized coefficients.]

This plot shows how each chemist performed. While results have been normalized, you’d get the same basic plot with the raw data.

Using XLStat

The Dunnett test is used to compare samples (your chemists) to a control.
There actually is no control here, but the test provides a useful way of comparing results.
In this case, choose Chemist A1 because his/her results were the lowest, so the differences for all the others have the same sign.

Dunnett test

  • Compares group means.
  • Each is pitted against one control or reference group.
  • Calculates a t test value for each group comparison.
  • Typically can only be used when all groups are of equal size.

Dunnett test

Category    Difference   Standardized difference   Critical value   Critical difference   Pr > Diff   Significant
A1 vs A12   -10.798      -15.339                   2.890            2.034                 0.000       Yes
A1 vs A9    -8.078       -11.475                   2.890            2.034                 0.000       Yes
A1 vs A5    -6.345       -9.014                    2.890            2.034                 0.000       Yes
A1 vs A4    -6.328       -8.989                    2.890            2.034                 0.000       Yes
A1 vs A7    -5.838       -8.293                    2.890            2.034                 0.000       Yes
A1 vs A10   -5.227       -7.426                    2.890            2.034                 0.000       Yes
A1 vs A8    -4.903       -6.964                    2.890            2.034                 0.000       Yes
A1 vs A6    -4.475       -6.357                    2.890            2.034                 0.000       Yes
A1 vs A3    -2.575       -3.658                    2.890            2.034                 0.007       Yes
A1 vs A11   -2.165       -3.076                    2.890            2.034                 0.032       Yes
A1 vs A2    -0.105       -0.149                    2.890            2.034                 1.000       No
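
The columns of the Dunnett table can be related to each other with a short calculation. This is a sketch under two assumptions that the notes do not state explicitly: each chemist contributed 4 results (inferred from 11 + 36 = 47 total degrees of freedom, i.e. 48 results over 12 chemists), and the standard error of a difference is sqrt(2 × MSE / n).

```python
import math

ms_error = 35.678 / 36      # error mean square, from the ANOVA table (SS / df)
n_per_group = 4             # ASSUMPTION: 48 results / 12 chemists
critical_value = 2.890      # from the Dunnett table

# ASSUMED standard error of a difference between two group means.
se_diff = math.sqrt(2 * ms_error / n_per_group)

difference = -10.798        # A1 vs A12, from the Dunnett table
standardized = difference / se_diff
critical_difference = critical_value * se_diff
significant = abs(standardized) > critical_value

print(f"standardized difference = {standardized:.3f}")
print(f"critical difference     = {critical_difference:.3f}")
print(f"significant: {significant}")
```

Under these assumptions the calculation reproduces the -15.339 and 2.034 shown in the first row of the table.
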

Tukey Test

“Honestly Significant Difference” (HSD) test.
Based on pairwise comparison among means.
$t_s = \frac{M_i - M_j}{\sqrt{MSE / n_h}}$
$M_i - M_j$ = difference between pair means
MSE = mean square error
$n_h$ = the harmonic mean

The harmonic mean is the weighted arithmetic mean, with each value's weight being the reciprocal of the value:
$n_h = \frac{n}{\sum_{i=1}^{n} 1/x_i}$

Tukey test

Compares each chemist’s results to see if there is a significant difference.

Contrast     Difference   Standardized difference   Critical value   Pr > Diff   Significant
A12 vs A1    10.798       15.339                    3.490            < 0.0001    Yes
A12 vs A2    10.693       15.190                    3.490            < 0.0001    Yes
A12 vs A11   8.633        12.263                    3.490            < 0.0001    Yes
A12 vs A3    8.223        11.681                    3.490            < 0.0001    Yes
A12 vs A6    6.323        8.982                     3.490            < 0.0001    Yes
A12 vs A8    5.895        8.374                     3.490            < 0.0001    Yes
A12 vs A10   5.570        7.913                     3.490            < 0.0001    Yes
A12 vs A7    4.960        7.046                     3.490            < 0.0001    Yes
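
A minimal sketch of the t_s expression given for the Tukey test, using the harmonic mean from Python's `statistics` module. The group sizes are an assumption (12 chemists with 4 results each, inferred from the degrees of freedom in the ANOVA table), and `tukey_ts` is just an illustrative name.

```python
import math
from statistics import harmonic_mean

ms_error = 35.678 / 36          # MSE from the ANOVA table (SS / df)
group_sizes = [4] * 12          # ASSUMPTION: 12 chemists, 4 results each

n_h = harmonic_mean(group_sizes)

def tukey_ts(difference):
    """t_s = (M_i - M_j) / sqrt(MSE / n_h)."""
    return difference / math.sqrt(ms_error / n_h)

ts = tukey_ts(10.798)           # A12 vs A1, difference from the Tukey table
print(f"n_h = {n_h:.1f}")
print(f"t_s = {ts:.2f}")
print(f"t_s / sqrt(2) = {ts / math.sqrt(2):.3f}")
```

Dividing t_s by sqrt(2) gives 15.339, the standardized difference listed for A12 vs A1. That suggests, although the notes do not say so, that the software reports the statistic on that scale.
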

Tukey test

Provides grouping of chemists with statistically similar results (95% confidence).

Chemist   Means    Groups
A12       45.170   A
A9        42.450     B
A5        40.718     B C
A4        40.700     B C
A7        40.210     B C
A10       39.600       C
A8        39.275       C D
A6        38.848       C D E
A3        36.948         D E
A11       36.538           E F
A2        34.478             F
A1        34.373             F
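
One way to see where the groups come from is to check which pairs of means differ by less than a critical difference. The sketch below assumes 4 results per chemist and assumes the critical difference is the Tukey critical value times sqrt(2 × MSE / n), the same relation that links the columns of the Dunnett table; neither assumption is stated in the notes.

```python
import math

means = {
    "A12": 45.170, "A9": 42.450, "A5": 40.718, "A4": 40.700,
    "A7": 40.210, "A10": 39.600, "A8": 39.275, "A6": 38.848,
    "A3": 36.948, "A11": 36.538, "A2": 34.478, "A1": 34.373,
}
ms_error = 35.678 / 36      # from the ANOVA table (SS / df)
n_per_group = 4             # ASSUMPTION: 4 results per chemist
critical_value = 3.490      # from the Tukey table

# ASSUMED relation between critical value and critical difference.
critical_difference = critical_value * math.sqrt(2 * ms_error / n_per_group)

def similar(a, b):
    """True when two chemists' means differ by less than the critical difference."""
    return abs(means[a] - means[b]) < critical_difference

for a, b in [("A12", "A9"), ("A9", "A5"), ("A9", "A10"), ("A11", "A1")]:
    print(f"{a} vs {b}: similar = {similar(a, b)}")
```

Pairs reported as similar share a group letter in the table (A9 and A5 share B, A11 and A1 share F), while A12 stands alone in group A.
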

t values

t values account for error introduced based on sample size, degrees of freedom and potential sample skew. They actually use the χ² (chi squared) distribution:
$\chi_{n-1}^2 = \frac{(n-1) s^2}{\sigma^2}$
This allows us to estimate population variance from sample variance. All of this is tied together into the t values.

t values

                     Confidence level
Degrees of freedom   90%    95%    99%

Example

Data: 1.01, 1.02, 1.10, 0.95, 1.
mean = 1.

t values for 4 degrees of freedom:
90% confidence = 2.13
95% confidence = 2.78

Example

$CI = \bar{x} \pm \frac{t \, s_x}{\sqrt{n}}$
90% confidence: CI = ± (2.13 × 0.) / 5^{1/2}
95% confidence: CI = ± (2.78 × 0.) / 5^{1/2}
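
The confidence interval calculation, mean ± t × s / n^{1/2}, can be sketched as follows. The fifth data value is truncated in these notes, so the 1.00 used here is a hypothetical stand-in and the printed numbers should not be read as the lecture's results; `confidence_interval` is an illustrative name.

```python
import math
from statistics import mean, stdev

def confidence_interval(values, t_value):
    """Return (mean, half_width) for mean +/- t * s / sqrt(n)."""
    n = len(values)
    half_width = t_value * stdev(values) / math.sqrt(n)
    return mean(values), half_width

# The first four values are from the example; the fifth is truncated in
# the notes, so 1.00 is a HYPOTHETICAL stand-in.
data = [1.01, 1.02, 1.10, 0.95, 1.00]

for level, t_value in [("90%", 2.13), ("95%", 2.78)]:   # t for 4 df
    centre, half_width = confidence_interval(data, t_value)
    print(f"{level}: {centre:.3f} +/- {half_width:.3f}")
```

The 95% interval is wider than the 90% interval because a larger t value is needed to cover more of the distribution.
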

t test example

Beyond the mean

You can have samples that are considered significantly different and still have the same mean.
In both examples, the populations would be considered to be different, even though the means, medians and modes are identical in the example on the right.

The F test

This test can be used to tell if two populations are different based on changes in variance.
Examples
  • Has the measurement precision changed?
  • Has the method been altered?
  • Were there any significant changes due to the lab or analyst?

The F test

Calculation of F
$F = \frac{S_{larger}^2}{S_{smaller}^2}$
F is always 1 or greater. $F_c$ depends on the confidence level and the degrees of freedom for both data sets.
You can look up the $F_c$ value for the desired levels or use a spreadsheet.

Example

A - mean = 50 mg/l, s = 2.0 mg/l, n = 5, df = 4
B - mean = 45 mg/l, s = 1.5 mg/l, n = 6, df = 5
$F = 2.0^2 / 1.5^2 = 1.78$
$F_c$ is 5.19 at 95% confidence
F is well below $F_c$, so the variance values are essentially the same and the means must really differ.
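
Working the ratio out from the two standard deviations in this example gives an F of about 1.78. A minimal sketch, with `f_ratio` as an illustrative helper name and the critical value simply copied from the example:

```python
def f_ratio(s_a, s_b):
    """Return F = larger variance / smaller variance for two standard deviations."""
    var_a, var_b = s_a ** 2, s_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)

f_value = f_ratio(2.0, 1.5)     # s for A and B, from the example
f_critical = 5.19               # 95% confidence, quoted in the example

print(f"F = {f_value:.2f}")
if f_value > f_critical:
    print("variances differ significantly")
else:
    print("no significant difference in variance")
```

Putting the larger variance on top guarantees that F is never below 1, whichever data set is passed first.
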

You need to be concerned with differences in both the mean and sample variance.
For this example, F would not exceed $F_c$ but the means are significantly different.

Comparison of the methods

The difference in the means is smaller than the sample variance.

Comparison of the methods

Here, the means are identical but the distributions look different. However, the lower curve is for a much smaller data set.
The F test would show them to be the same. It accounts for the variations in sample size by using df.