4.1 Two Factor Factorial Designs, Slides of Design

A two-factor factorial design is an experimental design in which data is collected for all possible combinations of the levels of the two factors of ...

Typology: Slides

2022/2023

Uploaded on 03/01/2023

stefan18
stefan18 🇺🇸



4 FACTORIAL DESIGNS

4.1 Two Factor Factorial Designs

  • A two-factor factorial design is an experimental design in which data are collected for all possible combinations of the levels of the two factors of interest.

  • If equal sample sizes are taken for each of the possible factor combinations, then the design is a balanced two-factor factorial design.

  • A balanced a × b factorial design is a factorial design with a levels of factor A, b levels of factor B, and n independent replications taken at each of the a × b treatment combinations. The design size is N = abn.

  • The effect of a factor is defined to be the average change in the response associated with a change in the level of the factor. This is usually called a main effect.

  • If the average change in response across the levels of one factor is not the same at all levels of the other factor, then we say there is an interaction between the factors.
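The definition of a balanced a × b design can be made concrete with a short sketch. The function name `balanced_run_order`, the level labels and the `seed` argument are hypothetical, introduced only for illustration; the sketch lists every treatment combination n times and shuffles the runs, which is one simple way to obtain a completely randomized order.

```python
import itertools
import random

def balanced_run_order(levels_a, levels_b, n, seed=None):
    """Return a randomized run order for a balanced a x b factorial design.

    Each of the a*b treatment combinations appears exactly n times,
    so the returned list has N = a*b*n entries.
    """
    combos = list(itertools.product(levels_a, levels_b))  # all a*b combinations
    runs = [combo for combo in combos for _ in range(n)]  # n replicates of each
    rng = random.Random(seed)  # seed is a hypothetical convenience for reproducibility
    rng.shuffle(runs)
    return runs

# Hypothetical level labels for a 2 x 2 design with n = 6 replicates.
runs = balanced_run_order(["A1", "A2"], ["B1", "B2"], n=6, seed=1)
print(len(runs))   # design size N = abn
print(runs[:4])    # first four runs of the randomized order
```

Shuffling the full list of N runs, rather than randomizing within each combination, is what makes the order completely randomized.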

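| TYPE | TOTALS | MEANS (last equality if $n_{ij} = n$) |
|---|---|---|
| Cell $(i, j)$ | $y_{ij\cdot} = \sum_{k=1}^{n_{ij}} y_{ijk}$ | $\bar{y}_{ij\cdot} = y_{ij\cdot}/n_{ij} = y_{ij\cdot}/n$ |
| $i$th level of A | $y_{i\cdot\cdot} = \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} y_{ijk}$ | $\bar{y}_{i\cdot\cdot} = y_{i\cdot\cdot}/\sum_{j=1}^{b} n_{ij} = y_{i\cdot\cdot}/bn$ |
| $j$th level of B | $y_{\cdot j\cdot} = \sum_{i=1}^{a} \sum_{k=1}^{n_{ij}} y_{ijk}$ | $\bar{y}_{\cdot j\cdot} = y_{\cdot j\cdot}/\sum_{i=1}^{a} n_{ij} = y_{\cdot j\cdot}/an$ |
| Overall | $y_{\cdot\cdot\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} y_{ijk}$ | $\bar{y}_{\cdot\cdot\cdot} = y_{\cdot\cdot\cdot}/\sum_{i=1}^{a} \sum_{j=1}^{b} n_{ij} = y_{\cdot\cdot\cdot}/abn$ |

where $n_{ij}$ is the number of observations in cell $(i, j)$.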

EXAMPLE (A 2 × 2 balanced design): A virologist is interested in studying the effects of a = 2 different culture media (M) and b = 2 different times (T) on the growth of a particular virus. She performs a balanced design with n = 6 replicates for each of the 4 M ∗ T treatment combinations. The N = 24 measurements were taken in a completely randomized order. The results:

THE DATA

| T (hours) | Medium 1 | Medium 2 |
|---|---|---|
| 12 | 21 23 20 22 28 26 | 25 24 29 26 25 27 |
| 18 | 37 38 35 39 38 36 | 31 29 30 34 33 35 |

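TOTALS

| | M = 1 | M = 2 | Row total |
|---|---|---|---|
| T = 12 | $y_{11\cdot} = 140$ | $y_{12\cdot} = 156$ | $y_{1\cdot\cdot} = 296$ |
| T = 18 | $y_{21\cdot} = 223$ | $y_{22\cdot} = 192$ | $y_{2\cdot\cdot} = 415$ |
| Column total | $y_{\cdot 1\cdot} = 363$ | $y_{\cdot 2\cdot} = 348$ | $y_{\cdot\cdot\cdot} = 711$ |

Here i = level of T, j = level of M, k = observation number, and $y_{ijk}$ = the kth observation from the ith level of T and the jth level of M.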

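MEANS

| | M = 1 | M = 2 | Row mean |
|---|---|---|---|
| T = 12 | $\bar{y}_{11\cdot} = 23.\overline{3}$ | $\bar{y}_{12\cdot} = 26$ | $\bar{y}_{1\cdot\cdot} = 24.\overline{6}$ |
| T = 18 | $\bar{y}_{21\cdot} = 37.1\overline{6}$ | $\bar{y}_{22\cdot} = 32$ | $\bar{y}_{2\cdot\cdot} = 34.58\overline{3}$ |
| Column mean | $\bar{y}_{\cdot 1\cdot} = 30.25$ | $\bar{y}_{\cdot 2\cdot} = 29.00$ | $\bar{y}_{\cdot\cdot\cdot} = 29.625$ |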
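The totals and means for this example can be recomputed directly from the data. The following is a minimal sketch; the dictionary layout and the function name `totals_and_means` are illustrative choices, not part of the original notes.

```python
# Growth measurements from THE DATA table, keyed by (T in hours, medium).
data = {
    (12, 1): [21, 23, 20, 22, 28, 26],
    (12, 2): [25, 24, 29, 26, 25, 27],
    (18, 1): [37, 38, 35, 39, 38, 36],
    (18, 2): [31, 29, 30, 34, 33, 35],
}

def totals_and_means(data):
    """Return (total, mean) pairs for each cell, row level, column level and overall."""
    cell_total = {key: sum(ys) for key, ys in data.items()}
    cell_n = {key: len(ys) for key, ys in data.items()}
    row_total, row_n, col_total, col_n = {}, {}, {}, {}
    for (i, j), total in cell_total.items():
        row_total[i] = row_total.get(i, 0) + total
        row_n[i] = row_n.get(i, 0) + cell_n[(i, j)]
        col_total[j] = col_total.get(j, 0) + total
        col_n[j] = col_n.get(j, 0) + cell_n[(i, j)]
    grand_total = sum(cell_total.values())
    grand_n = sum(cell_n.values())
    return {
        "cell": {k: (t, t / cell_n[k]) for k, t in cell_total.items()},
        "row": {i: (t, t / row_n[i]) for i, t in row_total.items()},
        "col": {j: (t, t / col_n[j]) for j, t in col_total.items()},
        "overall": (grand_total, grand_total / grand_n),
    }

summary = totals_and_means(data)
print(summary["cell"][(18, 1)])  # total 223, mean about 37.17
print(summary["overall"])        # total 711, mean 29.625
```

Because the divisors are the actual cell counts, the same function also covers the unbalanced case, where the $n_{ij}$ differ.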

  • The effect of changing T from 12 to 18 hours on the response depends on the level of M.
    • For medium 1, the T effect = $37.1\overline{6} - 23.\overline{3} = 13.8\overline{3}$
    • For medium 2, the T effect = $32 - 26 = 6$
  • The effect on the response of changing M from medium 1 to 2 depends on the level of T.
    • For T = 12 hours, the M effect = $26 - 23.\overline{3} = 2.\overline{6}$
    • For T = 18 hours, the M effect = $32 - 37.1\overline{6} = -5.1\overline{6}$
  • If either of these pairs of estimated effects is significantly different, then we say there exists a significant interaction between factors M and T. For the 2 × 2 design example:
    • If $13.8\overline{3}$ is significantly different from 6 (the T effects at the two levels of M), then we have a significant M ∗ T interaction.

    Or,

    • If $2.\overline{6}$ is significantly different from $-5.1\overline{6}$ (the M effects at the two levels of T), then we have a significant M ∗ T interaction.
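The two comparisons above are the same contrast viewed from two directions. A minimal sketch, using the cell totals from the example (variable names are illustrative), shows that the difference between the two T effects equals the difference between the two M effects:

```python
# Cell means from the example, keyed by (T in hours, medium); totals divided by n = 6.
cell_mean = {(12, 1): 140 / 6, (12, 2): 156 / 6, (18, 1): 223 / 6, (18, 2): 192 / 6}

# Effect of changing T from 12 to 18 hours, separately for each medium.
t_effect = {m: cell_mean[(18, m)] - cell_mean[(12, m)] for m in (1, 2)}

# Effect of changing M from medium 1 to medium 2, separately for each time.
m_effect = {t: cell_mean[(t, 2)] - cell_mean[(t, 1)] for t in (12, 18)}

# Either pair of effects leads to the same interaction contrast.
contrast_from_t = t_effect[1] - t_effect[2]
contrast_from_m = m_effect[12] - m_effect[18]

print(t_effect)          # T effects: about 13.83 (medium 1) and 6 (medium 2)
print(m_effect)          # M effects: about 2.67 (12 hours) and -5.17 (18 hours)
print(contrast_from_t, contrast_from_m)
```

Both differences reduce to $\bar{y}_{21\cdot} - \bar{y}_{11\cdot} - \bar{y}_{22\cdot} + \bar{y}_{12\cdot}$, so a test on one pair is a test on the other.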

  • There are two ways of defining an interaction between two factors A and B:
    • If the average change in response between the levels of factor A is not the same at all levels of factor B, then an interaction exists between factors A and B.
    • The lack of additivity of factors A and B, or the nonparallelism of the mean profiles of A and B, is called the interaction of A and B.
  • When we assume there is no interaction between A and B, we say the effects are additive.
  • An interaction plot or treatment means plot is a graphical tool for checking for potential interactions between two factors. To make an interaction plot,
  1. Calculate the cell means for all a · b combinations of the levels of A and B.
  2. Plot the cell means against the levels of factor A.
  3. Connect and label the means that share the same level of factor B.
  • The roles of A and B can be reversed to make a second interaction plot.
  • Interpretation of the interaction plot:
    • Parallel lines usually indicate no significant interaction.
    • Severe lack of parallelism usually indicates a significant interaction.
    • Moderate lack of parallelism suggests that a significant interaction may exist.
    • Statistical significance of an interaction effect depends on the magnitude of the MSE: for small values of the MSE, even small interaction effects (less nonparallelism) may be significant.
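The three plotting steps can be sketched without a graphics library by building the mean profiles that would be drawn. The helper names `interaction_profiles` and `max_nonparallelism` are hypothetical, and the nonparallelism measure is only a descriptive summary, not a significance test.

```python
def interaction_profiles(cell_means, levels_a, levels_b):
    """Steps 1-3: one profile of cell means across the levels of A per level of B."""
    return {lb: [cell_means[(la, lb)] for la in levels_a] for lb in levels_b}

def max_nonparallelism(profiles):
    """Largest spread, over adjacent levels of A, in the change along each profile.

    A value of 0 means every profile rises and falls by the same amounts,
    i.e. the lines in the interaction plot are exactly parallel.
    """
    changes = [[later - earlier for earlier, later in zip(p, p[1:])]
               for p in profiles.values()]
    return max(max(step) - min(step) for step in zip(*changes))

# Cell means of the 2 x 2 example, keyed by (T in hours, medium); here A = T and B = M.
cell_means = {(12, 1): 140 / 6, (12, 2): 156 / 6, (18, 1): 223 / 6, (18, 2): 192 / 6}
profiles = interaction_profiles(cell_means, levels_a=[12, 18], levels_b=[1, 2])

for level_b, profile in profiles.items():
    print(f"medium {level_b}: " + " -> ".join(f"{m:.2f}" for m in profile))

gap = max_nonparallelism(profiles)
print(f"largest difference in slope between profiles: {gap:.2f}")
```

Swapping the roles of `levels_a` and `levels_b` (and the key order) gives the profiles for the second interaction plot.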

  • When an A ∗ B interaction is large, the corresponding main effects A and B may have little practical meaning. Knowledge of the A ∗ B interaction is often more useful than knowledge of the main effect.

  • We usually say that a significant interaction can mask the interpretation of significant main effects. That is, the experimenter must examine the levels of one factor, say A, at fixed levels of the other factor to draw conclusions about the main effect of A.

  • It is possible to have a significant interaction between two factors, while the main effects for both factors are not significant. This would happen when the interaction plot shows interactions in different directions that balance out over one or both factors (such as an X pattern). This type of interaction, however, is uncommon.
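A small numerical sketch of the X pattern may help. The cell means below are hypothetical, chosen only so that the two profiles cross; the estimated main effects are then exactly zero while the interaction effects are not.

```python
# Hypothetical cell means forming an X pattern (rows = levels of A, columns = levels of B).
cell_means = [[10.0, 20.0],
              [20.0, 10.0]]
a, b = len(cell_means), len(cell_means[0])

grand = sum(sum(row) for row in cell_means) / (a * b)
row_means = [sum(row) / b for row in cell_means]
col_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]

# Estimated effects: deviations of the marginal means from the grand mean, and
# what is left in each cell after removing both main effects.
alpha = [m - grand for m in row_means]
beta = [m - grand for m in col_means]
alpha_beta = [[cell_means[i][j] - row_means[i] - col_means[j] + grand
               for j in range(b)] for i in range(a)]

print(alpha)       # [0.0, 0.0]
print(beta)        # [0.0, 0.0]
print(alpha_beta)  # [[-5.0, 5.0], [5.0, -5.0]]
```

Averaging over either factor hides the crossing completely, which is why the main effects alone say nothing useful here.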

4.3 Matrix Forms for the Two-Way ANOVA

Example: Consider a completely randomized 2 × 3 factorial design with n = 2 replications for each of the six combinations of the two factors (A and B). The following table summarizes the results:

| Factor A level | Factor B level 1 | Factor B level 2 | Factor B level 3 |
|---|---|---|---|
| 1 | 1, 2 | 4, 6 | 5, 6 |
| 2 | 3, 5 | 5, 7 | 4, 6 |

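  • Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$ for $i = 1, 2$, $j = 1, 2, 3$, $k = 1, 2$, and $\epsilon_{ijk} \sim N(0, \sigma^2)$.

  • Assume (i) $\sum_{i=1}^{2} \alpha_i = 0$, (ii) $\sum_{j=1}^{3} \beta_j = 0$, (iii) $\sum_{j=1}^{3} (\alpha\beta)_{ij} = 0$ for $i = 1, 2$, and (iv) $\sum_{i=1}^{2} (\alpha\beta)_{ij} = 0$ for $j = 1, 2, 3$.

  • Thus, for the main effect constraints, we have $\alpha_2 = -\alpha_1$ and $\beta_3 = -\beta_1 - \beta_2$.
  • The interaction effect constraints can be written in terms of just $\alpha\beta_{11}$ and $\alpha\beta_{12}$:

$\alpha\beta_{21} = -\alpha\beta_{11}$, $\alpha\beta_{22} = -\alpha\beta_{12}$, $\alpha\beta_{13} = -\alpha\beta_{11} - \alpha\beta_{12}$, $\alpha\beta_{23} = \alpha\beta_{11} + \alpha\beta_{12}$.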

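  • Thus, the reduced form of model matrix X requires only 6 columns: $\mu$, $\alpha_1$, $\beta_1$, $\beta_2$, $\alpha\beta_{11}$ and $\alpha\beta_{12}$. With the observations ordered by cell (1,1), (1,2), (1,3), (2,1), (2,2), (2,3) and two replicates per cell, the constraints give

\[
X =
\begin{bmatrix}
1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 \\
1 & -1 & 1 & 0 & -1 & 0 \\
1 & -1 & 1 & 0 & -1 & 0 \\
1 & -1 & 0 & 1 & 0 & -1 \\
1 & -1 & 0 & 1 & 0 & -1 \\
1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1
\end{bmatrix},
\qquad
y = (1, 2, 4, 6, 5, 6, 3, 5, 5, 7, 4, 6)'
\]

  • The least squares estimates are obtained from $X'X$ and $X'y$ as

\[
(X'X)^{-1} X'y = (\hat{\mu}, \hat{\alpha}_1, \hat{\beta}_1, \hat{\beta}_2, \widehat{\alpha\beta}_{11}, \widehat{\alpha\beta}_{12})'
\]

Thus, $\hat{\alpha}_2 = -\hat{\alpha}_1 = 0.5$, $\hat{\beta}_3 = -\hat{\beta}_1 - \hat{\beta}_2 = 0.75$, $\widehat{\alpha\beta}_{21} = -\widehat{\alpha\beta}_{11} = 0.75$,

$\widehat{\alpha\beta}_{22} = -\widehat{\alpha\beta}_{12} = 0$, $\widehat{\alpha\beta}_{13} = -\widehat{\alpha\beta}_{11} - \widehat{\alpha\beta}_{12} = 0.75$, $\widehat{\alpha\beta}_{23} = \widehat{\alpha\beta}_{11} + \widehat{\alpha\beta}_{12} = -0.75$.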
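The matrix computation can be checked with a short sketch. It assumes the sum-to-zero coding implied by constraints (i) to (iv), so each row of X is written out by hand from those constraints, and it solves the normal equations $X'X\hat{\theta} = X'y$ by Gauss-Jordan elimination in exact rational arithmetic. The helper name `solve_normal_equations` is illustrative.

```python
from fractions import Fraction

# One row of the reduced model matrix per cell (i, j); columns are
# mu, alpha_1, beta_1, beta_2, (alpha beta)_11, (alpha beta)_12.
cell_rows = {
    (1, 1): [1, 1, 1, 0, 1, 0],
    (1, 2): [1, 1, 0, 1, 0, 1],
    (1, 3): [1, 1, -1, -1, -1, -1],
    (2, 1): [1, -1, 1, 0, -1, 0],
    (2, 2): [1, -1, 0, 1, 0, -1],
    (2, 3): [1, -1, -1, -1, 1, 1],
}
observations = {
    (1, 1): [1, 2], (1, 2): [4, 6], (1, 3): [5, 6],
    (2, 1): [3, 5], (2, 2): [5, 7], (2, 3): [4, 6],
}

# Repeat each cell's row once per replicate to obtain the 12 x 6 matrix X.
X = [cell_rows[cell] for cell, ys in observations.items() for _ in ys]
y = [value for ys in observations.values() for value in ys]

def solve_normal_equations(X, y):
    """Solve (X'X) theta = X'y by Gauss-Jordan elimination on [X'X | X'y]."""
    p = len(X[0])
    aug = [[Fraction(sum(r[i] * r[j] for r in X)) for j in range(p)]
           + [Fraction(sum(r[i] * yi for r, yi in zip(X, y)))]
           for i in range(p)]
    for col in range(p):
        pivot = next(r for r in range(col, p) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

estimates = solve_normal_equations(X, y)
names = ["mu", "alpha_1", "beta_1", "beta_2", "alphabeta_11", "alphabeta_12"]
for name, value in zip(names, estimates):
    print(f"{name:>13} = {float(value):.2f}")
```

The remaining effects follow from the constraints, for example $\hat{\alpha}_2 = -\hat{\alpha}_1$ and $\hat{\beta}_3 = -\hat{\beta}_1 - \hat{\beta}_2$, matching the values quoted above.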

4.4 Notation for an ANOVA

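  • $SS_A = nb \sum_{i=1}^{a} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$ = the sum of squares for factor A (df = $a - 1$)

    $MS_A = SS_A/(a - 1)$ = the mean square for factor A

  • $SS_B = na \sum_{j=1}^{b} (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$ = the sum of squares for factor B (df = $b - 1$)

    $MS_B = SS_B/(b - 1)$ = the mean square for factor B

  • $SS_{AB} = n \sum_{i=1}^{a} \sum_{j=1}^{b} [(\bar{y}_{ij\cdot} - \bar{y}_{\cdot\cdot\cdot}) - (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}) - (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})]^2 = n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot})^2$

    = the A ∗ B interaction sum of squares (df = $(a - 1)(b - 1)$)

    $MS_{AB} = SS_{AB}/[(a - 1)(b - 1)]$ = the mean square for the A ∗ B interaction

  • $SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{ij\cdot})^2$ = the error sum of squares (df = $ab(n - 1)$)

    $MS_E = SS_E/[ab(n - 1)]$ = the mean square error

  • $SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2$ = the total sum of squares (df = $abn - 1$)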

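  • The total sum of squares is partitioned into components corresponding to the terms in the model:

\[
\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2
= nb \sum_{i=1}^{a} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2
+ na \sum_{j=1}^{b} (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2
+ n \sum_{i=1}^{a} \sum_{j=1}^{b} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot})^2
+ \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (y_{ijk} - \bar{y}_{ij\cdot})^2
\]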
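The partition can be verified numerically. The sketch below applies the definitional (deviation) formulas to the 2 × 3 data of Section 4.3; the array layout `y[i][j][k]` and the variable names are illustrative choices.

```python
# 2 x 3 example data, indexed as y[i][j][k] (level of A, level of B, replicate).
y = [[[1, 2], [4, 6], [5, 6]],
     [[3, 5], [5, 7], [4, 6]]]
a, b, n = len(y), len(y[0]), len(y[0][0])

def mean(values):
    values = list(values)
    return sum(values) / len(values)

cell = [[mean(y[i][j]) for j in range(b)] for i in range(a)]
# Averaging cell means gives the marginal means only because the design is balanced.
row = [mean(cell[i]) for i in range(a)]
col = [mean(cell[i][j] for i in range(a)) for j in range(b)]
grand = mean(row)

cells = [(i, j) for i in range(a) for j in range(b)]
ss_a = n * b * sum((row[i] - grand) ** 2 for i in range(a))
ss_b = n * a * sum((col[j] - grand) ** 2 for j in range(b))
ss_ab = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2 for i, j in cells)
ss_e = sum((obs - cell[i][j]) ** 2 for i, j in cells for obs in y[i][j])
ss_t = sum((obs - grand) ** 2 for i, j in cells for obs in y[i][j])

print(ss_a, ss_b, ss_ab, ss_e)             # the four components
print(ss_t, ss_a + ss_b + ss_ab + ss_e)    # both equal the total sum of squares
```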

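OR, in terms of the sums of squares, $SS_T = SS_A + SS_B + SS_{AB} + SS_E$.

  • The alternate SS formulas for the balanced two-factor factorial design are:

\[
SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{\cdot\cdot\cdot}^2}{abn}
\]

\[
SS_A = \sum_{i=1}^{a} \frac{y_{i\cdot\cdot}^2}{bn} - \frac{y_{\cdot\cdot\cdot}^2}{abn}
\qquad
SS_B = \sum_{j=1}^{b} \frac{y_{\cdot j\cdot}^2}{an} - \frac{y_{\cdot\cdot\cdot}^2}{abn}
\]

\[
SS_{AB} = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{y_{ij\cdot}^2}{n} - SS_A - SS_B - \frac{y_{\cdot\cdot\cdot}^2}{abn}
\]

\[
SS_E = SS_T - SS_A - SS_B - SS_{AB}
\]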
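As an illustration of the totals-based formulas, the sketch below applies them to the 2 × 2 virus growth data from Section 4.1, with T playing the role of factor A and M the role of factor B. The variable names are illustrative.

```python
# Virus growth data keyed by (T in hours, medium); a = b = 2 levels, n = 6 replicates.
data = {
    (12, 1): [21, 23, 20, 22, 28, 26],
    (12, 2): [25, 24, 29, 26, 25, 27],
    (18, 1): [37, 38, 35, 39, 38, 36],
    (18, 2): [31, 29, 30, 34, 33, 35],
}
a, b, n = 2, 2, 6

cell_total = {key: sum(ys) for key, ys in data.items()}
row_total, col_total = {}, {}
for (i, j), total in cell_total.items():
    row_total[i] = row_total.get(i, 0) + total   # totals for each level of T
    col_total[j] = col_total.get(j, 0) + total   # totals for each level of M
grand_total = sum(cell_total.values())

correction = grand_total ** 2 / (a * b * n)      # the term y...^2 / abn
ss_t = sum(obs ** 2 for ys in data.values() for obs in ys) - correction
ss_a = sum(t ** 2 for t in row_total.values()) / (b * n) - correction
ss_b = sum(t ** 2 for t in col_total.values()) / (a * n) - correction
ss_ab = sum(t ** 2 for t in cell_total.values()) / n - ss_a - ss_b - correction
ss_e = ss_t - ss_a - ss_b - ss_ab

for name, value in [("SS_T", ss_a), ("SS_M", ss_b), ("SS_T*M", ss_ab),
                    ("SS_E", ss_e), ("SS_total", ss_t)]:
    print(f"{name:>8} = {value:.4f}")
```

These formulas need only the totals and the raw sum of squares, which is why they were preferred for hand calculation.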

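  • The alternate SS formulas for the unbalanced two-factor factorial design are:

\[
SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} y_{ijk}^2 - \frac{y_{\cdot\cdot\cdot}^2}{N}
\]

\[
SS_A = \sum_{i=1}^{a} \frac{y_{i\cdot\cdot}^2}{n_{i\cdot}} - \frac{y_{\cdot\cdot\cdot}^2}{N}
\qquad
SS_B = \sum_{j=1}^{b} \frac{y_{\cdot j\cdot}^2}{n_{\cdot j}} - \frac{y_{\cdot\cdot\cdot}^2}{N}
\]

\[
SS_{AB} = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{y_{ij\cdot}^2}{n_{ij}} - SS_A - SS_B - \frac{y_{\cdot\cdot\cdot}^2}{N}
\]

\[
SS_E = SS_T - SS_A - SS_B - SS_{AB}
\]

where $N = \sum_{i=1}^{a} \sum_{j=1}^{b} n_{ij}$, $n_{i\cdot} = \sum_{j=1}^{b} n_{ij}$, and $n_{\cdot j} = \sum_{i=1}^{a} n_{ij}$.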

Balanced Two-Factor Factorial ANOVA Table

| Source of Variation | Sum of Squares | d.f. | Mean Square | F Ratio |
|---|---|---|---|---|
| A | $SS_A$ | $a - 1$ | $MS_A = SS_A/(a - 1)$ | $F_A = MS_A/MS_E$ |
| B | $SS_B$ | $b - 1$ | $MS_B = SS_B/(b - 1)$ | $F_B = MS_B/MS_E$ |
| A ∗ B | $SS_{AB}$ | $(a - 1)(b - 1)$ | $MS_{AB} = SS_{AB}/[(a - 1)(b - 1)]$ | $F_{A*B} = MS_{AB}/MS_E$ |
| Error | $SS_E$ | $ab(n - 1)$ | $MS_E = SS_E/[ab(n - 1)]$ | |
| Total | $SS_{total}$ | $abn - 1$ | | |

For the unbalanced case, replace $ab(n - 1)$ with $N - ab$ for the d.f. for $SS_E$ and replace $abn - 1$ with $N - 1$ for the d.f. for $SS_{total}$, where $N = \sum_{i=1}^{a} \sum_{j=1}^{b} n_{ij}$.
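The layout of the balanced ANOVA table translates into a few lines of code. In this sketch the function name `anova_table` is hypothetical, and the sums of squares passed in (3, 18.5, 4.5 and 9) are illustrative inputs worked out by hand from the 2 × 3 data of Section 4.3; p-values are omitted because they require the F distribution.

```python
def anova_table(ss_a, ss_b, ss_ab, ss_e, a, b, n):
    """Rows (source, SS, df, MS, F) of the balanced two-factor ANOVA table."""
    df = {"A": a - 1, "B": b - 1, "A*B": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    ss = {"A": ss_a, "B": ss_b, "A*B": ss_ab, "Error": ss_e}
    ms_e = ss_e / df["Error"]
    rows = []
    for source in ("A", "B", "A*B"):
        ms = ss[source] / df[source]
        rows.append((source, ss[source], df[source], ms, ms / ms_e))
    rows.append(("Error", ss_e, df["Error"], ms_e, None))
    rows.append(("Total", ss_a + ss_b + ss_ab + ss_e, a * b * n - 1, None, None))
    return rows

# Illustrative inputs: hand-computed sums of squares for the 2 x 3 example (a=2, b=3, n=2).
rows = anova_table(3.0, 18.5, 4.5, 9.0, a=2, b=3, n=2)
for source, ss, df, ms, f_ratio in rows:
    ms_text = "" if ms is None else f"{ms:.3f}"
    f_text = "" if f_ratio is None else f"{f_ratio:.3f}"
    print(f"{source:<6}{ss:>8.3f}{df:>4}{ms_text:>9}{f_text:>8}")
```

For an unbalanced design the error and total degrees of freedom would be $N - ab$ and $N - 1$ instead, so this sketch applies only to the balanced case.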

4.5 Comments on Interpreting the ANOVA

  • Test $H_0: (\alpha\beta)_{11} = (\alpha\beta)_{12} = \cdots = (\alpha\beta)_{ab}$ vs. $H_1$: at least one $(\alpha\beta)_{ij} \neq (\alpha\beta)_{i'j'}$ first.
    • If this test indicates that there is not a significant interaction, then continue testing the hypotheses for the two main effects:

      $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a$ vs. $H_1$: at least one $\alpha_i \neq \alpha_{i'}$

      $H_0: \beta_1 = \beta_2 = \cdots = \beta_b$ vs. $H_1$: at least one $\beta_j \neq \beta_{j'}$

    • If this test indicates that there is a significant interaction, then the interpretation of significant main effects hypotheses can be masked. To draw conclusions about a main effect, we will fix the levels of one factor and vary the levels of the other. Using this approach (combined with interaction plots) we may be able to provide an interpretation of main effects.

  • If we assume the constraints in (24), then the hypotheses can be rewritten as:

    $H_0: (\alpha\beta)_{11} = (\alpha\beta)_{12} = \cdots = (\alpha\beta)_{ab} = 0$ vs. $H_1$: at least one $(\alpha\beta)_{ij} \neq 0$

    $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ vs. $H_1$: at least one $\alpha_i \neq 0$

    $H_0: \beta_1 = \beta_2 = \cdots = \beta_b = 0$ vs. $H_1$: at least one $\beta_j \neq 0$

4.6 ANOVA for a 2 × 2 Factorial Design Example

  • We will now use SAS to analyze the 2 × 2 factorial design data discussed earlier.

| T (hours) | Medium 1 | Medium 2 |
|---|---|---|
| 12 | 21 23 20 22 28 26 | 25 24 29 26 25 27 |
| 18 | 37 38 35 39 38 36 | 31 29 30 34 33 35 |

Dependent Variable: growth

[Figure: Fit Diagnostics for growth. Panels: Residual vs. Predicted Value; RStudent vs. Predicted Value; RStudent vs. Leverage; Residual vs. Quantile; growth vs. Predicted Value; Cook's D vs. Observation; Percent vs. Residual; Fit-Mean and Residual vs. Proportion Less. Inset statistics include Observations 24, Parameters 4 and Error DF 20, along with the MSE, R-Square and Adj R-Square.]

ANOVA and Estimation of Effects for a 2x2 Design

The GLM Procedure

[Figure: Distribution of growth, with growth on the vertical axis and time (12, 18) on the horizontal axis.]

| Level of time | N | Mean | Std Dev |
|---|---|---|---|
| 12 | 12 | 24.6666667 | 2. |
| 18 | 12 | 34.5833333 | 3. |

[Figure: Distribution of growth, with growth on the vertical axis and medium (1, 2) on the horizontal axis.]

| Level of medium | N | Mean | Std Dev |
|---|---|---|---|
| 1 | 12 | 30.2500000 | 7. |
| 2 | 12 | 29.0000000 | 3. |


[Figure: Interaction Plot for growth. Horizontal axis: time (12, 18); vertical axis: growth; one line for each medium (1, 2).]

[Figure: Distribution of growth, with growth on the vertical axis and the time*medium combinations (12 1, 12 2, 18 1, 18 2) on the horizontal axis.]

| Level of time | Level of medium | N | Mean | Std Dev |
|---|---|---|---|---|
| 12 | 1 | 6 | 23.3333333 | 3. |
| 12 | 2 | 6 | 26.0000000 | 1. |
| 18 | 1 | 6 | 37.1666667 | 1. |
| 18 | 2 | 6 | 32.0000000 | 2. |
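The level summaries reported by the procedure (N, mean and standard deviation by time, by medium and by time*medium) can be recomputed from the raw data. This is a minimal sketch using Python's `statistics` module rather than SAS; `summarize` is an illustrative helper name.

```python
from statistics import mean, stdev

# Virus growth data keyed by (time in hours, medium).
data = {
    (12, 1): [21, 23, 20, 22, 28, 26],
    (12, 2): [25, 24, 29, 26, 25, 27],
    (18, 1): [37, 38, 35, 39, 38, 36],
    (18, 2): [31, 29, 30, 34, 33, 35],
}

def summarize(groups):
    """Return {level: (N, mean, sample standard deviation)} for each group."""
    return {level: (len(ys), mean(ys), stdev(ys)) for level, ys in groups.items()}

# Pool the cells over the other factor to get the marginal groups.
by_time, by_medium = {}, {}
for (time, medium), ys in data.items():
    by_time.setdefault(time, []).extend(ys)
    by_medium.setdefault(medium, []).extend(ys)

for label, groups in (("time", by_time), ("medium", by_medium), ("time*medium", data)):
    print(f"Level of {label}")
    for level, (count, avg, sd) in summarize(groups).items():
        print(f"  {str(level):<8} N={count:<3} Mean={avg:.7f}  Std Dev={sd:.7f}")
```

`stdev` uses the n − 1 divisor, which is the usual sample standard deviation.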

4.7 Tests of Normality (Supplemental)

  • For an ANOVA, we assume the errors are normally distributed with mean 0 and constant variance $\sigma^2$. That is, we assume the random error $\epsilon \sim N(0, \sigma^2)$.

  • The Kolmogorov-Smirnov Goodness-of-Fit Test, the Cramer-von Mises Goodness-of-Fit Test, and the Anderson-Darling Goodness-of-Fit Test can be applied to any distribution $F(x)$.

  • Although the following notes use the general form $F(x)$, we will be assuming $F(x)$ represents a normal distribution with mean 0 and constant variance.

  • We are also assuming that the random sample referred to in each test is the set of residuals from the ANOVA.

  • Thus, in each test we are checking the normality assumption in the ANOVA. In this case, we want to see a large p-value because we do not want to reject the null hypothesis that the errors are normally distributed.
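The residuals that feed these tests are the deviations of each observation from its cell mean, $e_{ijk} = y_{ijk} - \bar{y}_{ij\cdot}$. A minimal sketch for the 2 × 2 virus growth data (variable names are illustrative):

```python
# Virus growth data keyed by (time in hours, medium).
data = {
    (12, 1): [21, 23, 20, 22, 28, 26],
    (12, 2): [25, 24, 29, 26, 25, 27],
    (18, 1): [37, 38, 35, 39, 38, 36],
    (18, 2): [31, 29, 30, 34, 33, 35],
}

# Residual = observation minus the mean of its own cell.
residuals = {cell: [obs - sum(ys) / len(ys) for obs in ys] for cell, ys in data.items()}

# The residuals also give the error sum of squares and the mean square error.
ss_e = sum(e ** 2 for es in residuals.values() for e in es)
error_df = sum(len(ys) - 1 for ys in data.values())   # N - ab
mse = ss_e / error_df

print(error_df)        # 20
print(round(mse, 4))   # about 5.1083
```

The square root of the MSE is a natural estimate of $\sigma$ when the hypothesized distribution is taken to be normal with mean 0.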

4.7.1 Kolmogorov-Smirnov Goodness-of-Fit Test

Assumptions: Given a random sample of n independent observations,

  • The measurement scale is at least ordinal.
  • The observations are sampled from a continuous distribution $F(x)$.

Hypotheses: For a hypothesized distribution $F^*(x)$,

(i) Two-sided: $H_0: F(x) = F^*(x)$ for all x vs. $H_1: F(x) \neq F^*(x)$ for some x

(ii) One-sided: $H_0: F(x) \geq F^*(x)$ for all x vs. $H_1: F(x) < F^*(x)$ for some x

(iii) One-sided: $H_0: F(x) \leq F^*(x)$ for all x vs. $H_1: F(x) > F^*(x)$ for some x

Method: For a given $\alpha$,

  • Define the empirical distribution function $S_n(x) = \dfrac{\text{number of observations} \leq x}{n}$.

(i) Two-sided test statistic: $T = \sup_x |F^*(x) - S_n(x)|$

  • When plotted, T is the greatest vertical difference between the empirical and the hypothesized distribution.

(ii) One-sided test statistic: $T^+ = \sup_x (F^*(x) - S_n(x))$

(iii) One-sided test statistic: $T^- = \sup_x (S_n(x) - F^*(x))$

Decision Rule

  • Critical values for $T$, $T^+$ and $T^-$ are found in nonparametrics textbooks. For larger sample sizes, an asymptotic critical value can be used.

  • We will just rely on p-values to make a decision.
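The three statistics can be computed from the ordered sample, because $S_n(x)$ only changes at the observations. The sketch below assumes a fully specified hypothesized distribution (a standard normal via `statistics.NormalDist`); the sample values and the function name `ks_statistics` are hypothetical, and no p-value is computed.

```python
from statistics import NormalDist

def ks_statistics(sample, cdf):
    """Return (T, T_plus, T_minus) for the hypothesized distribution function cdf."""
    xs = sorted(sample)
    n = len(xs)
    # S_n jumps from (i-1)/n to i/n at the i-th ordered value, so each supremum is
    # reached at, or just to the left of, one of the ordered observations.
    t_plus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))   # sup(F* - S_n)
    t_minus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))        # sup(S_n - F*)
    return max(t_plus, t_minus), t_plus, t_minus

# Hypothetical residuals, compared with a normal distribution with mean 0 and sigma 1.
sample = [-1.0, 0.0, 1.0]
t, t_plus, t_minus = ks_statistics(sample, NormalDist(mu=0.0, sigma=1.0).cdf)
print(round(t, 4), round(t_plus, 4), round(t_minus, 4))
```

When the variance is estimated from the same residuals, the standard critical values no longer apply exactly, which is one reason to rely on software-reported p-values.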

4.7.2 Cramer-von Mises Goodness-of-Fit Test

Assumptions: Same as the Kolmogorov-Smirnov test

Hypotheses: For a hypothesized distribution $F^*(x)$,

$H_0: F(x) = F^*(x)$ for all x vs. $H_1: F(x) \neq F^*(x)$ for some x

Method: For a given $\alpha$,

  • Define the empirical distribution function $S_n(x) = \dfrac{\text{number of observations} \leq x}{n}$.

  • The Cramer-von Mises test statistic $W^2$ is defined to be

\[
W^2 = n \int_{-\infty}^{\infty} [F^*(x) - S_n(x)]^2 \, dF^*(x).
\]

  • This form reduces to

\[
W^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left[ F^*(x_{(i)}) - \frac{2i - 1}{2n} \right]^2
\]

where $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ represents the ordered sample in ascending order.

Decision Rule

  • Tables of critical values exist for the exact distribution of $W^2$ when $H_0$ is true. Computers generate critical values for the asymptotic ($n \to \infty$) distribution of $W^2$.

  • If $W^2$ becomes too large (or p-value < $\alpha$), then we will reject $H_0$.
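The computational form of $W^2$ is a single sum over the ordered sample. A minimal sketch, assuming a fully specified standard normal hypothesized distribution; the sample values and the function name `cramer_von_mises` are hypothetical.

```python
from statistics import NormalDist

def cramer_von_mises(sample, cdf):
    """W^2 = 1/(12n) + sum over i of [F*(x_(i)) - (2i - 1)/(2n)]^2."""
    xs = sorted(sample)
    n = len(xs)
    return 1 / (12 * n) + sum(
        (cdf(x) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )

# Hypothetical residuals, compared with a normal distribution with mean 0 and sigma 1.
sample = [-1.0, 0.0, 1.0]
w2 = cramer_von_mises(sample, NormalDist(mu=0.0, sigma=1.0).cdf)
print(round(w2, 6))
```

The value $1/(12n)$ is the smallest $W^2$ can be, reached when every $F^*(x_{(i)})$ sits exactly at $(2i - 1)/(2n)$.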

4.7.3 Anderson-Darling Goodness-of-Fit Test

Assumptions: Same as the Kolmogorov-Smirnov and Cramer-von Mises tests

Hypotheses: Same as the Cramer-von Mises test.

Method: For a given $\alpha$,

  • Define the empirical distribution function $S_n(x) = \dfrac{\text{number of observations} \leq x}{n}$.

  • The Anderson-Darling test statistic $A^2$ is defined to be

\[
A^2 = n \int_{-\infty}^{\infty} \frac{[F^*(x) - S_n(x)]^2}{F^*(x)(1 - F^*(x))} \, dF^*(x).
\]

  • This form reduces to

\[
A^2 = -\frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \ln F^*(x_{(i)}) + \ln\left(1 - F^*(x_{(n+1-i)})\right) \right] - n
\]

where $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ represents the ordered sample in ascending order.

Decision Rule

  • Computers generate critical values for the asymptotic ($n \to \infty$) distribution of $A^2$.

  • If $A^2$ becomes too large (or p-value < $\alpha$), then we will reject $H_0$.
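The computational form of $A^2$ pairs each ordered value with its mirror image from the other end of the sample. A minimal sketch, assuming a fully specified standard normal hypothesized distribution; the sample values and the function name `anderson_darling` are hypothetical.

```python
from math import log
from statistics import NormalDist

def anderson_darling(sample, cdf):
    """A^2 = -(1/n) * sum of (2i - 1)[ln F*(x_(i)) + ln(1 - F*(x_(n+1-i)))] - n."""
    xs = sorted(sample)
    n = len(xs)
    total = sum(
        (2 * i - 1) * (log(cdf(xs[i - 1])) + log(1 - cdf(xs[n - i])))
        for i in range(1, n + 1)
    )
    return -total / n - n

# Hypothetical residuals, compared with a normal distribution with mean 0 and sigma 1.
sample = [-1.0, 0.0, 1.0]
a2 = anderson_darling(sample, NormalDist(mu=0.0, sigma=1.0).cdf)
print(round(a2, 4))
```

The logarithms weight the tails heavily, so an observation whose hypothesized probability is numerically 0 or 1 would make `log` fail; extreme values need special handling in practice.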