Analysis of Variance | Chapter 4 | Experimental Designs & Their Analysis | Shalabh, IIT Kanpur

Chapter 4

Experimental Designs and Their Analysis

Design of experiments concerns how the observations or measurements should be obtained so that a question can be answered in a valid, efficient and economical way. The design of an experiment and the analysis of the resulting data are inseparable. If the experiment is designed properly with the question in mind, then the data generated are valid and a proper analysis of the data provides valid statistical inferences. If the experiment is not well designed, the validity of the statistical inferences is questionable. It is important first to understand the basic terminology used in experimental design.

Experimental unit:

For conducting an experiment, the experimental material is divided into smaller parts and each part is referred to as an experimental unit. The unit that is randomly assigned to a treatment is the experimental unit. The phrase “randomly assigned” is very important in this definition.

Experiment:

A way of getting an answer to a question which the experimenter wants to know.

Treatment:

Different objects or procedures which are to be compared in an experiment are called treatments.

Sampling unit:

The object that is measured in an experiment is called the sampling unit. This may be different from the experimental unit.

Factor:

A factor is a variable defining a categorization. A factor can be fixed or random in nature. A factor is termed a fixed factor if all the levels of interest are included in the experiment. A factor is termed a random factor if not all the levels of interest are included in the experiment and those that are included can be considered to be randomly chosen from all the levels of interest.

Replication:

It is the repetition of the experimental situation by replicating the experimental unit.


Experimental error:

The unexplained random part of the variation in any experiment is termed as experimental error. An estimate of experimental error can be obtained by replication.

Treatment design:

A treatment design is the manner in which the levels of treatments are arranged in an experiment.

Example (Ref.: Statistical Design, G. Casella, Chapman and Hall, 2008): Suppose some varieties of fish food are to be investigated on some species of fish. The food is placed in the water tanks containing the fish. The response is the increase in the weight of the fish. The experimental unit is the tank, as the treatment is applied to the tank, not to the individual fish. Note that if the experimenter had taken each fish in hand and placed the food in its mouth, then the fish would have been the experimental unit, as long as each fish got an independent scoop of food.

Design of experiment:

One of the main objectives of designing an experiment is to verify the hypothesis in an efficient and economical way. In the context of the null hypothesis of equality of several means of normal populations having the same variance, the analysis of variance technique can be used. Note that such techniques are based on certain statistical assumptions. If these assumptions are violated, the outcome of the test of hypothesis may be faulty and the analysis of the data may be meaningless. So the main question is how to obtain the data such that the assumptions are met and the data are readily available for the application of tools like analysis of variance. The mechanism for obtaining such data is provided by the design of the experiment. After obtaining a sufficient number of experimental units, the treatments are allocated to the experimental units in a random fashion. Design of experiment provides a method by which the treatments are placed at random on the experimental units in such a way that the responses are estimated with the utmost precision possible.

Principles of experimental design:

There are three basic principles of design, which were developed by Sir Ronald A. Fisher:

(i) Randomization
(ii) Replication
(iii) Local control


Complete and incomplete block designs:

In most experiments, the available experimental units are grouped into blocks having more or less identical characteristics, so that the block effect can be removed from the experimental error. Such designs are termed block designs.

The number of experimental units in a block is called the block size. If the block size equals the number of treatments and the treatments are randomly allocated within each block, then each block is a full replication and the design is called a complete block design.

If the number of treatments is so large that a full replication in each block makes the block too heterogeneous with respect to the characteristic under study, then smaller but homogeneous blocks can be used. In such a case, the blocks do not contain a full replicate of the treatments. Experimental designs with blocks containing an incomplete replication of the treatments are called incomplete block designs.

Completely randomized design (CRD)

The CRD is the simplest design. Suppose there are v treatments to be compared.

  • All experimental units are considered the same and no division or grouping among them exists.
  • In CRD, the v treatments are allocated randomly to the whole set of experimental units, without making any effort to group the experimental units in any way for more homogeneity.
  • The design is entirely flexible in the sense that any number of treatments or replications may be used.
  • The number of replications for different treatments need not be equal and may vary from treatment to treatment, depending on the knowledge (if any) of the variability of the observations on individual treatments as well as on the accuracy required for the estimate of each individual treatment effect.

Example: Suppose there are 4 treatments and 20 experimental units, then

  • the treatment 1 is replicated, say 3 times and is given to 3 experimental units,
  • the treatment 2 is replicated, say 5 times and is given to 5 experimental units,
  • the treatment 3 is replicated, say 6 times and is given to 6 experimental units and
  • finally, the treatment 4 is replicated [20-(6+5+3)=]6 times and is given to the remaining 6 experimental units.
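
The allocation in this example can be sketched in Python. This is only an illustration: the function name `crd_allocation`, the unit numbering 1 to 20 and the seed are assumptions made here and are not part of the notes. Each treatment in turn receives a random sample of the units that are still unallocated.

```python
import random


def crd_allocation(replications, seed=None):
    """Randomly allocate treatments to experimental units in a CRD.

    replications: dict mapping treatment label -> number of units n_i.
    Returns a dict mapping unit number (1..n) -> treatment label.
    """
    rng = random.Random(seed)
    n = sum(replications.values())
    remaining = list(range(1, n + 1))      # units not yet allocated
    plan = {}
    for treatment, n_i in replications.items():
        chosen = rng.sample(remaining, n_i)  # n_i units drawn at random
        for unit in chosen:
            plan[unit] = treatment
        chosen_set = set(chosen)
        remaining = [u for u in remaining if u not in chosen_set]
    return plan


# 4 treatments replicated 3, 5, 6 and 6 times on 20 units (hypothetical seed)
plan = crd_allocation({1: 3, 2: 5, 3: 6, 4: 6}, seed=2024)
for unit in sorted(plan):
    print(f"unit {unit:2d} -> treatment {plan[unit]}")
```

Changing the seed gives a different, equally valid randomization; the replication numbers stay fixed.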

  • All the variability among the experimental units goes into the experimental error.
  • CRD is used when the experimental material is homogeneous.
  • CRD is often inefficient.
  • CRD is more useful when the experiments are conducted inside the lab.
  • CRD is well suited for a small number of treatments and for homogeneous experimental material.

Layout of CRD

The following steps are needed to design a CRD:

  • Divide the entire experimental material or area into a number of experimental units, say n.
  • Fix the number of replications for the different treatments in advance (for the given total number of available experimental units).
  • No local control measure is provided as such, except that the error variance can be reduced by choosing a homogeneous set of experimental units.

Procedure

Let the v treatments be numbered 1, 2, ..., v and let $n_i$ be the number of replications required for the $i$-th treatment, such that $\sum_{i=1}^{v} n_i = n$.

  • Select $n_1$ units out of the $n$ units randomly and apply treatment 1 to these $n_1$ units. (Note: This is how the randomization principle is utilized in CRD.)
  • Select $n_2$ units out of the remaining $(n - n_1)$ units randomly and apply treatment 2 to these $n_2$ units.
  • Continue with this procedure until all the treatments have been utilized.
  • Generally, an equal number of experimental units is allocated to each treatment, unless practical limitations dictate otherwise or some treatments are more variable and/or of more interest.

Analysis

There is only one factor which is affecting the outcome – treatment effect. So the set-up of one-way analysis of variance is to be used.

$y_{ij}$: individual measurement on the $j$-th experimental unit for the $i$-th treatment, $i = 1, 2, \ldots, v$, $j = 1, 2, \ldots, n_i$.

The $y_{ij}$ are independently distributed following $N(\mu + \alpha_i, \sigma^2)$ with $\sum_{i=1}^{v} n_i \alpha_i = 0$, where

$\mu$: overall mean

$\alpha_i$: $i$-th treatment effect


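Solving the equations that determine the estimates of $\mu$ and $\alpha_i$, using the constraint $\sum_{i=1}^{v} n_i \alpha_i = 0$, we get

$$\hat{\mu} = \bar{y}_{oo}, \qquad \hat{\alpha}_i = \bar{y}_{io} - \bar{y}_{oo},$$

where $\bar{y}_{io} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$ is the mean of the observations receiving the $i$-th treatment and $\bar{y}_{oo} = \frac{1}{n} \sum_{i=1}^{v} \sum_{j=1}^{n_i} y_{ij}$ is the mean of all the observations.

The fitted model is obtained after substituting the estimates $\hat{\mu}$ and $\hat{\alpha}_i$ in the linear model. Using the fitted model, we can write

$$y_{ij} = \bar{y}_{oo} + (\bar{y}_{io} - \bar{y}_{oo}) + (y_{ij} - \bar{y}_{io})$$

or

$$(y_{ij} - \bar{y}_{oo}) = (\bar{y}_{io} - \bar{y}_{oo}) + (y_{ij} - \bar{y}_{io}).$$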

Squaring both sides and summing over all the observations, we have

$$\sum_{i=1}^{v} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{oo})^2 = \sum_{i=1}^{v} n_i (\bar{y}_{io} - \bar{y}_{oo})^2 + \sum_{i=1}^{v} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{io})^2$$

or

Total sum of squares = Sum of squares due to treatment effects + Sum of squares due to error

or

$$TSS = SSTr + SSE.$$

  • Since $\sum_{i=1}^{v} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{oo}) = 0$, the $n$ deviations in $TSS$ satisfy one linear constraint, so $TSS$ is effectively based on the sum of $(n - 1)$ squared quantities. The $TSS$ carries only $(n - 1)$ degrees of freedom.
  • Since $\sum_{i=1}^{v} n_i (\bar{y}_{io} - \bar{y}_{oo}) = 0$, $SSTr$ is based only on the sum of $(v - 1)$ squared quantities. The $SSTr$ carries only $(v - 1)$ degrees of freedom.
  • Since $\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{io}) = 0$ for all $i = 1, 2, \ldots, v$, $SSE$ is based on the sum of squares of $n$ quantities like $(y_{ij} - \bar{y}_{io})$ with $v$ constraints $\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{io}) = 0$. So $SSE$ carries $(n - v)$ degrees of freedom.
  • Using the Fisher-Cochran theorem, $TSS = SSTr + SSE$ with degrees of freedom partitioned as $(n - 1) = (v - 1) + (n - v)$.


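Moreover, the equality in $TSS = SSTr + SSE$ has to hold exactly. To ensure that the equality holds exactly, we find one of the sums of squares through subtraction. Generally, it is recommended to find $SSE$ by subtraction as $SSE = TSS - SSTr$, where

$$TSS = \sum_{i=1}^{v} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{oo})^2 = \sum_{i=1}^{v} \sum_{j=1}^{n_i} y_{ij}^2 - \frac{G^2}{n},$$

where $G = \sum_{i=1}^{v} \sum_{j=1}^{n_i} y_{ij}$ is the grand total of all the observations, and

$$SSTr = \sum_{i=1}^{v} n_i (\bar{y}_{io} - \bar{y}_{oo})^2 = \sum_{i=1}^{v} \frac{T_i^2}{n_i} - \frac{G^2}{n},$$

where $T_i = \sum_{j=1}^{n_i} y_{ij}$ is the total of the observations on the $i$-th treatment.

$\frac{G^2}{n}$: correction factor.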
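
The computational formulas for the CRD can be sketched in Python as follows. The function name `one_way_anova` and the data set are hypothetical and serve only to illustrate the arithmetic for unequal replications; `SSE` is obtained by subtraction as recommended.

```python
def one_way_anova(groups):
    """Sums of squares for a CRD.

    groups[i] is the list of the n_i observations on treatment i.
    """
    v = len(groups)                          # number of treatments
    n = sum(len(g) for g in groups)          # total number of observations
    G = sum(sum(g) for g in groups)          # grand total
    cf = G ** 2 / n                          # correction factor G^2 / n
    tss = sum(y ** 2 for g in groups for y in g) - cf
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # sum T_i^2/n_i - CF
    sse = tss - sstr                         # found by subtraction
    return {
        "TSS": tss, "SSTr": sstr, "SSE": sse,
        "df_TSS": n - 1, "df_SSTr": v - 1, "df_SSE": n - v,
    }


# Hypothetical data: three treatments with 3, 4 and 5 replications
data = [[12, 14, 13], [15, 17, 16, 18], [11, 10, 12, 9, 13]]
result = one_way_anova(data)
for name, value in result.items():
    print(name, value)
```

Dividing `SSTr` and `SSE` by their degrees of freedom gives the corresponding mean squares.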

Now under $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_v = 0$, the model becomes

$$Y_{ij} = \mu + \varepsilon_{ij},$$

and minimizing

$$S = \sum_{i=1}^{v} \sum_{j=1}^{n_i} \varepsilon_{ij}^2$$

with respect to $\mu$ gives

$$\frac{\partial S}{\partial \mu} = 0 \;\Rightarrow\; \hat{\mu} = \frac{G}{n} = \bar{y}_{oo}.$$

The $SSE$ under $H_0$ becomes

$$SSE = \sum_{i=1}^{v} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{oo})^2$$

and thus $TSS = SSE$. This $TSS$ under $H_0$ contains the variation only due to the random error, whereas the earlier $TSS = SSTr + SSE$ contains the variation due to both treatments and errors. The difference between the two provides the effect of treatments in terms of the sum of squares as

$$SSTr = \sum_{i=1}^{v} n_i (\bar{y}_{io} - \bar{y}_{oo})^2.$$


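Writing $\bar{\varepsilon}_{io} = \frac{1}{n_i} \sum_{j=1}^{n_i} \varepsilon_{ij}$ and $\bar{\varepsilon}_{oo} = \frac{1}{n} \sum_{i=1}^{v} \sum_{j=1}^{n_i} \varepsilon_{ij}$ for the means of the random errors, and using $\sum_{i=1}^{v} n_i \alpha_i = 0$,

$$
\begin{aligned}
E(SSTr) &= \sum_{i=1}^{v} n_i E(\bar{y}_{io} - \bar{y}_{oo})^2 \\
&= \sum_{i=1}^{v} n_i E\left[\alpha_i + (\bar{\varepsilon}_{io} - \bar{\varepsilon}_{oo})\right]^2 \\
&= \sum_{i=1}^{v} n_i \alpha_i^2 + \sum_{i=1}^{v} n_i E(\bar{\varepsilon}_{io} - \bar{\varepsilon}_{oo})^2 \\
&= \sum_{i=1}^{v} n_i \alpha_i^2 + \sum_{i=1}^{v} n_i \left(\frac{\sigma^2}{n_i} - \frac{\sigma^2}{n}\right) \\
&= \sum_{i=1}^{v} n_i \alpha_i^2 + (v - 1)\sigma^2 .
\end{aligned}
$$

$$E(MSTr) = E\left(\frac{SSTr}{v - 1}\right) = \sigma^2 + \frac{1}{v - 1} \sum_{i=1}^{v} n_i \alpha_i^2 .$$

In general $E(MSTr) \neq \sigma^2$, but under $H_0$ all $\alpha_i = 0$ and so $E(MSTr) = \sigma^2$.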

Randomized Block Design

If a large number of treatments are to be compared, then a large number of experimental units are required. This increases the variation among the responses, and CRD may not be appropriate. When the experimental material is not homogeneous and there are v treatments to be compared, it may be possible to

  • group the experimental material into blocks of v units each;
  • construct the blocks such that the experimental units within a block are relatively homogeneous and resemble each other more closely than the units in different blocks.

If there are b such blocks, we say that the blocks are at b levels. Similarly, if there are v treatments, we say that the treatments are at v levels. The responses from the b levels of blocks and v levels of treatments can be arranged in a two-way layout. The observed data set is arranged as follows:


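Rows are blocks (Factor A) and columns are treatments (Factor B).

| Block | 1 | 2 | … | j | … | v | Block totals |
|---|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{12}$ | … | $y_{1j}$ | … | $y_{1v}$ | $B_1$ |
| 2 | $y_{21}$ | $y_{22}$ | … | $y_{2j}$ | … | $y_{2v}$ | $B_2$ |
| ⋮ | ⋮ | ⋮ | | ⋮ | | ⋮ | ⋮ |
| i | $y_{i1}$ | $y_{i2}$ | … | $y_{ij}$ | … | $y_{iv}$ | $B_i$ |
| ⋮ | ⋮ | ⋮ | | ⋮ | | ⋮ | ⋮ |
| b | $y_{b1}$ | $y_{b2}$ | … | $y_{bj}$ | … | $y_{bv}$ | $B_b$ |
| Treatment totals | $T_1$ | $T_2$ | … | $T_j$ | … | $T_v$ | Grand total ($G$) |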

Layout:

A two-way layout is called a randomized block design (RBD) or a randomized complete block design (RCB) if, within each block, the v treatments are randomly assigned to the v experimental units such that each of the v! ways of assigning the treatments to the units has the same probability of being adopted in the experiment, and the assignments in different blocks are statistically independent.

The RBD utilizes the principles of design - randomization, replication and local control - in the following way:

1. Randomization:

  • Number the v treatments 1,2,…, v.
  • Number the units in each block as 1, 2,..., v.
  • Randomly allocate the v treatments to v experimental units in each block.

2. Replication

Since each treatment appears in every block, each treatment is replicated as many times as there are blocks. Thus in RBD, the number of blocks and the number of replications are the same.
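
The randomization within blocks can be sketched in Python as follows. The function name `rbd_layout`, the treatment labels and the seed are assumptions for illustration. Each block receives its own independent random ordering of all v treatments, so every treatment appears exactly once per block.

```python
import random


def rbd_layout(treatments, n_blocks, seed=None):
    """Return one random ordering of all treatments for each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)        # each of the v! orderings is equally likely
        layout.append(order)      # blocks are randomized independently
    return layout


# Hypothetical example: v = 4 treatments in b = 3 blocks
layout = rbd_layout(["T1", "T2", "T3", "T4"], n_blocks=3, seed=7)
for block, order in enumerate(layout, start=1):
    print(f"block {block}: units 1..4 receive {order}")
```

The position of a treatment in a block's list is the number of the unit that receives it.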


There are two null hypotheses to be tested.

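  • related to the block effects

$$H_{0B}: \beta_1 = \beta_2 = \cdots = \beta_b = 0.$$

  • related to the treatment effects

$$H_{0T}: \tau_1 = \tau_2 = \cdots = \tau_v = 0.$$

The linear model, in this case, is a two-way model as

$$y_{ij} = \mu + \beta_i + \tau_j + \varepsilon_{ij}, \quad i = 1, 2, \ldots, b; \; j = 1, 2, \ldots, v,$$

where $\beta_i$ is the effect of the $i$-th block, $\tau_j$ is the effect of the $j$-th treatment, and the $\varepsilon_{ij}$ are identically and independently distributed random errors following a normal distribution with mean 0 and variance $\sigma^2$.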

The tests of hypothesis can be derived using the likelihood ratio test or the principle of least squares. The use of likelihood ratio test has already been demonstrated earlier, so we now use the principle of least squares.

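Minimizing

$$S = \sum_{i=1}^{b} \sum_{j=1}^{v} \varepsilon_{ij}^2 = \sum_{i=1}^{b} \sum_{j=1}^{v} (y_{ij} - \mu - \beta_i - \tau_j)^2$$

and solving the normal equations

$$\frac{\partial S}{\partial \mu} = 0, \quad \frac{\partial S}{\partial \beta_i} = 0, \quad \frac{\partial S}{\partial \tau_j} = 0 \quad \text{for all } i = 1, 2, \ldots, b, \; j = 1, 2, \ldots, v,$$

the least squares estimators are obtained as

$$\hat{\mu} = \bar{y}_{oo}, \qquad \hat{\beta}_i = \bar{y}_{io} - \bar{y}_{oo}, \qquad \hat{\tau}_j = \bar{y}_{oj} - \bar{y}_{oo}.$$

Using the fitted model (obtained after substituting the estimated values of the parameters in the model), we can write

$$y_{ij} = \bar{y}_{oo} + (\bar{y}_{io} - \bar{y}_{oo}) + (\bar{y}_{oj} - \bar{y}_{oo}) + (y_{ij} - \bar{y}_{io} - \bar{y}_{oj} + \bar{y}_{oo}).$$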

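Squaring both sides and summing over $i$ and $j$ gives

$$\sum_{i=1}^{b} \sum_{j=1}^{v} (y_{ij} - \bar{y}_{oo})^2 = v \sum_{i=1}^{b} (\bar{y}_{io} - \bar{y}_{oo})^2 + b \sum_{j=1}^{v} (\bar{y}_{oj} - \bar{y}_{oo})^2 + \sum_{i=1}^{b} \sum_{j=1}^{v} (y_{ij} - \bar{y}_{io} - \bar{y}_{oj} + \bar{y}_{oo})^2$$

or

$$TSS = SSBl + SSTr + SSE$$

with degrees of freedom partitioned as $bv - 1 = (b - 1) + (v - 1) + (b - 1)(v - 1)$.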

The reason for the number of degrees of freedom for different sums of squares is the same as in the case of CRD.


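Here

$$TSS = \sum_{i=1}^{b} \sum_{j=1}^{v} (y_{ij} - \bar{y}_{oo})^2 = \sum_{i=1}^{b} \sum_{j=1}^{v} y_{ij}^2 - \frac{G^2}{bv},$$

$\frac{G^2}{bv}$: correction factor.

$G = \sum_{i=1}^{b} \sum_{j=1}^{v} y_{ij}$: grand total of all the observations.

$$SSBl = v \sum_{i=1}^{b} (\bar{y}_{io} - \bar{y}_{oo})^2 = \sum_{i=1}^{b} \frac{B_i^2}{v} - \frac{G^2}{bv},$$

$B_i = \sum_{j=1}^{v} y_{ij}$: $i$-th block total.

$$SSTr = b \sum_{j=1}^{v} (\bar{y}_{oj} - \bar{y}_{oo})^2 = \sum_{j=1}^{v} \frac{T_j^2}{b} - \frac{G^2}{bv},$$

$T_j = \sum_{i=1}^{b} y_{ij}$: $j$-th treatment total.

$$SSE = \sum_{i=1}^{b} \sum_{j=1}^{v} (y_{ij} - \bar{y}_{io} - \bar{y}_{oj} + \bar{y}_{oo})^2 .$$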

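The expectations of mean squares are

$$E(MSBl) = E\left(\frac{SSBl}{b - 1}\right) = \sigma^2 + \frac{v}{b - 1} \sum_{i=1}^{b} \beta_i^2,$$

$$E(MSTr) = E\left(\frac{SSTr}{v - 1}\right) = \sigma^2 + \frac{b}{v - 1} \sum_{j=1}^{v} \tau_j^2,$$

$$E(MSE) = E\left(\frac{SSE}{(b - 1)(v - 1)}\right) = \sigma^2 .$$

Moreover,

$$\frac{SSBl}{\sigma^2} = \frac{(b - 1) MSBl}{\sigma^2} \sim \chi^2(b - 1) \quad \text{under } H_{0B},$$

$$\frac{SSTr}{\sigma^2} = \frac{(v - 1) MSTr}{\sigma^2} \sim \chi^2(v - 1) \quad \text{under } H_{0T},$$

$$\frac{SSE}{\sigma^2} = \frac{(b - 1)(v - 1) MSE}{\sigma^2} \sim \chi^2\big((b - 1)(v - 1)\big).$$

Under $H_{0B}: \beta_1 = \beta_2 = \cdots = \beta_b = 0$, $E(MSBl) = E(MSE)$.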
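
The sums of squares of the RBD can be computed from block and treatment totals, as in the following Python sketch. The function name `rbd_anova` and the 4 × 3 data set (4 blocks, 3 treatments) are hypothetical; `SSE` is found by subtraction so that the decomposition holds exactly.

```python
def rbd_anova(y):
    """Sums of squares and mean squares for an RBD.

    y[i][j] is the response in block i (row) under treatment j (column).
    """
    b, v = len(y), len(y[0])
    G = sum(sum(row) for row in y)                   # grand total
    cf = G ** 2 / (b * v)                            # correction factor
    tss = sum(val ** 2 for row in y for val in row) - cf
    block_totals = [sum(row) for row in y]                                   # B_i
    treatment_totals = [sum(y[i][j] for i in range(b)) for j in range(v)]    # T_j
    ssbl = sum(B ** 2 for B in block_totals) / v - cf
    sstr = sum(T ** 2 for T in treatment_totals) / b - cf
    sse = tss - ssbl - sstr                          # by subtraction
    return {
        "TSS": tss, "SSBl": ssbl, "SSTr": sstr, "SSE": sse,
        "MSBl": ssbl / (b - 1),
        "MSTr": sstr / (v - 1),
        "MSE": sse / ((b - 1) * (v - 1)),
    }


# Hypothetical responses: rows are blocks, columns are treatments
y = [
    [10, 12, 14],
    [11, 14, 15],
    [9, 11, 13],
    [12, 15, 18],
]
table = rbd_anova(y)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

Comparing `MSBl` and `MSTr` with `MSE` indicates whether the block and treatment effects are large relative to the error variation.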


Latin Square Design

The treatments in the RBD are randomly assigned within b blocks such that each treatment must occur in each block, rather than being assigned at random over the entire set of experimental units as in the CRD. Only two factors – block and treatment effects – are taken into account, and the total number of experimental units needed for a complete replication is bv, where b and v are the numbers of blocks and treatments respectively.

If there are three factors with b, v and k levels respectively, then the total number of experimental units needed for a complete replication is bvk. This increases the cost of experimentation and the required number of experimental units over RBD. In a Latin square design (LSD), the experimental material is divided into rows and columns, each having the same number of experimental units, which is equal to the number of treatments. The treatments are allocated to the rows and the columns such that each treatment occurs once and only once in each row and in each column.

In order to allocate the treatment to the experimental units in rows and columns, we take help from Latin squares.

Latin Square:

A Latin square of order p is an arrangement of p symbols in $p^2$ cells arranged in p rows and p columns such that each symbol occurs once and only once in each row and in each column. For example, to write a Latin square of order 4, choose four symbols – A, B, C and D. These letters are Latin letters which are used as symbols. Write them in such a way that each of the letters A, B, C and D occurs once and only once in each row and each column. For example,

A B C D
B C D A
C D A B
D A B C

This is a Latin square. We first consider the following example to illustrate how a Latin square is used to allocate the treatments and obtain the responses.


Example: Suppose different brands of petrol are to be compared with respect to the mileage per litre achieved in motor cars. Important factors responsible for the variation in mileage are

  • the difference between individual cars.
  • the difference in the driving habits of drivers.

We have three factors – cars, drivers and petrol brands. Suppose we have

  • 4 types of cars denoted as 1, 2, 3, 4.
  • 4 drivers that are represented by a, b, c, d.
  • 4 brands of petrol that are indicated as A, B, C, D.

Now a complete replication will require $4 \times 4 \times 4 = 64$ experiments. We choose only 16 experiments. To choose these 16 experiments, we take the help of a Latin square. Suppose we choose the following Latin square:

A B C D
B C D A
C D A B
D A B C

Write the letters in rows and columns and choose rows for drivers, columns for cars and letters for petrol brands. Thus 16 observations are recorded as per this plan of treatment combinations (as shown in the next figure) and further analysis is carried out. Since such a design is based on a Latin square, it is called a Latin square design.
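
The plan of 16 experiments can be generated in Python as sketched below. The helper names `cyclic_latin_square` and `is_latin_square` are assumptions for illustration; the cyclic construction reproduces the square chosen in the example, with rows taken as drivers and columns as cars.

```python
def cyclic_latin_square(symbols):
    """Build a Latin square by shifting the symbols one place per row."""
    p = len(symbols)
    return [[symbols[(r + c) % p] for c in range(p)] for r in range(p)]


def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and column."""
    p = len(square)
    symbols = set(square[0])
    if len(symbols) != p:
        return False
    rows_ok = all(len(row) == p and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(p)} == symbols for c in range(p))
    return rows_ok and cols_ok


square = cyclic_latin_square("ABCD")
drivers = "abcd"          # rows
cars = [1, 2, 3, 4]       # columns

# One run per cell: (driver, car, petrol brand)
plan = [(drivers[r], cars[c], square[r][c]) for r in range(4) for c in range(4)]

print(is_latin_square(square))
for run in plan:
    print(run)
```

Each driver and each car meets every petrol brand exactly once, which is what allows 16 runs to stand in for the 64 of a complete replication.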


Standard form of Latin square:

A Latin square is in the standard form if the symbols in the first row and the first column are in the natural order (natural order means the alphabetical order A, B, C, D, …).

Given a Latin square, it is possible to rearrange its rows and columns so that the first row and the first column are in the natural order.

Example: The four standard forms of a 4 × 4 Latin square are as follows.

A B C D
B A D C
C D B A
D C A B

A B C D
B C D A
C D A B
D A B C

A B C D
B D A C
C A D B
D C B A

A B C D
B A D C
C D A B
D C B A

For each standard Latin square of order p, the p rows can be permuted in p! ways. Keeping the first column fixed, the remaining (p - 1) columns can be permuted in (p - 1)! ways. So p!(p - 1)! different Latin squares are obtained from each standard square.

For illustration:

| Size of square | Number of standard squares | Value of p!(p - 1)! | Total number of different squares |
|---|---|---|---|
| 3 × 3 | 1 | 12 | 12 |
| 4 × 4 | 4 | 144 | 576 |
| 5 × 5 | 56 | 2880 | 161280 |
| 6 × 6 | 9408 | 86400 | 812851200 |
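
The counts for small orders can be verified by brute force. The following Python sketch (the function name `count_standard_squares` is an assumption for illustration) enumerates the squares whose first row and first column are in natural order and multiplies the count by p!(p - 1)!. It is run only up to order 5 here, since the search grows quickly with p.

```python
from itertools import permutations
from math import factorial


def count_standard_squares(p):
    """Count Latin squares of order p with first row and column in natural order."""
    first_row = tuple(range(p))
    count = 0

    def extend(rows):
        nonlocal count
        r = len(rows)
        if r == p:
            count += 1
            return
        for perm in permutations(range(p)):
            if perm[0] != r:          # first column must read 0, 1, ..., p-1
                continue
            # no symbol may repeat within a column
            if all(perm[c] != row[c] for row in rows for c in range(p)):
                extend(rows + [perm])

    extend([first_row])
    return count


for p in (3, 4, 5):
    standard = count_standard_squares(p)
    multiplier = factorial(p) * factorial(p - 1)
    print(p, standard, multiplier, standard * multiplier)
```

The symbols are coded as 0, 1, ..., p - 1 in place of A, B, C, ...; the coding does not affect the counts.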

Conjugate: Two standard Latin squares are called conjugate if the rows of one are the columns of the other. For example,

A B C D          A B C D
B C D A   and    B C D A
C D A B          C D A B
D A B C          D A B C

are conjugate. In fact, each of them is self conjugate.

A Latin square is called self conjugate if its arrangement in rows is the same as its arrangement in columns.


Transformation set: A set of all Latin squares obtained from a single Latin square by permuting its rows, columns and symbols is called a transformation set.

From a Latin square of order p, p!(p - 1)! different Latin squares can be obtained by making p! permutations of the columns and (p - 1)! permutations of the rows which leave the first row in place. Thus

Number of different Latin squares of order p in a transformation set = p!(p - 1)! × number of standard Latin squares in the set.

Orthogonal Latin squares: If two Latin squares of the same order but with different symbols are such that, when they are superimposed on each other, every ordered pair of symbols (one from each square) occurs exactly once, then they are called orthogonal.

Graeco-Latin square: A pair of orthogonal Latin squares, one with Latin symbols and the other with Greek symbols, forms a Graeco-Latin square. For example,

Aα Bβ Cγ Dδ
Bδ Aγ Dβ Cα
Cβ Dα Aδ Bγ
Dγ Cδ Bα Aβ

is a Graeco-Latin square of order 4.
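
Orthogonality can be checked mechanically by superimposing the two squares and counting the distinct ordered pairs. In the Python sketch below, the function name `are_orthogonal` is an assumption, and the Greek square is one orthogonal mate of the Latin square chosen here for illustration; it is not claimed to be the only possible choice.

```python
def are_orthogonal(square1, square2):
    """True if superimposing the squares yields each ordered pair exactly once."""
    p = len(square1)
    pairs = {(square1[r][c], square2[r][c]) for r in range(p) for c in range(p)}
    return len(pairs) == p * p


latin = [list("ABCD"), list("BADC"), list("CDAB"), list("DCBA")]

# Assumed Greek square, written with spelled-out symbol names
a, b, g, d = "alpha", "beta", "gamma", "delta"
greek = [
    [a, b, g, d],
    [d, g, b, a],
    [b, a, d, g],
    [g, d, a, b],
]

print(are_orthogonal(latin, greek))    # all 16 pairs distinct
print(are_orthogonal(latin, latin))    # a square is not orthogonal to itself

# Superimposed cells of the Graeco-Latin square
for r in range(4):
    print([f"{latin[r][c]}-{greek[r][c]}" for c in range(4)])
```

A square superimposed on itself produces only p distinct pairs, which is why the second check fails.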

A Graeco-Latin square design makes it possible to consider one more factor than a Latin square design. For example, in the earlier example, if there are four drivers, four cars and four brands of petrol, and each petrol has four varieties, say α, β, γ and δ, then a Graeco-Latin square helps in deciding the treatment combinations as follows: