Hypergeometric Distribution and its Limiting Form: The Binomial Distribution, Slides of Law

An in-depth analysis of the Hypergeometric Distribution, its moments, and its limiting form as the population size approaches infinity. The document also covers the relationship between the Hypergeometric Distribution and the Binomial Distribution.

Typology: Slides

2021/2022

Uploaded on 09/27/2022

marcyn

CHAPTER 10

Statistical Distributions

In this chapter, we shall present some probability distributions that play a central role in econometric theory. First, we shall present the distributions of some discrete random variables that have either a finite set of values or that take values that can be indexed by the entire set of positive integers. We shall also present the multivariate generalisations of one of these distributions.

Next, we shall present the distributions of some continuous random variables that take values in intervals of the real line or over the entirety of the real line. Amongst these is the normal distribution, which is of prime importance and for which we shall consider, in detail, the multivariate extensions.

Associated with the multivariate normal distribution are the so-called sampling distributions that are important in the theory of statistical inference. We shall consider these distributions in the final section of the chapter, where it will transpire that they are special cases of the univariate distributions described in the preceding section.

Discrete Distributions

Suppose that there is a population of $N$ elements, $Np$ of which belong to class $A$ and $N(1-p)$ to class $A^c$. When we select $n$ elements at random from the population in $n$ successive trials, we wish to know the probability of the event that $x$ of them will be in $A$ and that $n-x$ of them will be in $A^c$.

The probability will be affected by the way in which the $n$ elements are selected; and there are two ways of doing this. Either they can be put aside after they have been sampled, or else they can be restored to the population. Therefore we talk of sampling without replacement and of sampling with replacement.

If we sample with replacement, then the probabilities of selecting an element from either class will be the same in every trial, and the size $N$ of the population will have no relevance. In that case, the probabilities are governed by the binomial law. If we sample without replacement, then, in each trial, the probabilities of selecting elements from either class will depend on the outcomes of the previous trials and upon the size of the population; and the probabilities of the outcomes from $n$ successive trials will be governed by the hypergeometric law.

Binomial Distribution

When there is sampling with replacement, the probability is $p$ that an element selected at random will be in class $A$, and the probability is $1-p$ that it will be in class $A^c$. Moreover, the outcomes of successive trials will be statistically independent. Therefore, if a particular sequence has $x$ elements in $A$ and $n-x$ elements in $A^c$, then, as a statistical outcome, its probability will be $p^x(1-p)^{n-x}$.

D.S.G. POLLOCK: ECONOMETRICS

There are altogether ${}^nC_x = n!/\{(n-x)!\,x!\}$ such sequences, with $x$ elements in $A$ and $n-x$ elements in $A^c$. These sequences represent a set of mutually exclusive ways in which the event in question can occur; and their probabilities can be added to give the probability of the event of which they are the particular instances. Therefore, the probability of finding $x$ elements in class $A$ after $n$ trials is given by the binomial probability function:

$$b(x; n, p) = \frac{n!}{(n-x)!\,x!}\,p^x(1-p)^{n-x}. \tag{1}$$

The number $x$ of the elements in class $A$ is commonly described as the number of successes, in which case $n-x$ is the number of failures. The archetypal example of a binomial random variable is the number of heads in $n$ tosses of a coin.

With $q = 1-p$, the moment generating function of the binomial distribution is given by

$$
\begin{aligned}
M(x,t) = E(e^{xt}) &= \sum_{x=0}^{n} e^{xt}\,\frac{n!}{x!\,(n-x)!}\,p^x q^{n-x}\\
&= \sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\,(pe^t)^x q^{n-x}\\
&= (q + pe^t)^n.
\end{aligned}
$$

By differentiating the function with respect to $t$ and then setting $t = 0$, we can find the following moments:

$$
\begin{aligned}
E(x) &= np,\\
E(x^2) &= np(q + np),\\
V(x) &= E(x^2) - \{E(x)\}^2 = npq.
\end{aligned}
$$
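The three moments can be checked by summing directly over the binomial probability function. The following is a minimal sketch; the function names and the values of `n` and `p` are illustrative assumptions and are not taken from the text.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Binomial probability b(x; n, p) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_moments(n, p):
    """Return (E(x), E(x^2), V(x)) by summing over x = 0, ..., n."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    second = sum(x * x * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, second, second - mean**2


n, p = 12, 0.3  # illustrative values, not taken from the text
q = 1 - p
mean, second, variance = binomial_moments(n, p)

print(f"E(x)   = {mean:.6f}   np         = {n * p:.6f}")
print(f"E(x^2) = {second:.6f}   np(q + np) = {n * p * (q + n * p):.6f}")
print(f"V(x)   = {variance:.6f}   npq        = {n * p * q:.6f}")
```

Each printed pair should agree up to floating-point rounding.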

Example. A standardised version of the binomial outcome is obtained by subtracting the mean $np$ and by dividing by the standard deviation $\sqrt{npq}$. The result is

$$z = \frac{x - np}{\sqrt{npq}} = \frac{(x/n) - p}{\sqrt{pq/n}}. \tag{4}$$

It will be demonstrated later, in the context of our treatment of the normal distribution, that, as the number $n$ of the trials increases, the distribution of the standardised binomial variable tends to the standard normal distribution $N(0, 1)$, which has a mean of zero and a variance of unity. This theorem justifies the use of the normal distribution in approximating the binomial probabilities when the sample size is large.

One application of this result is in testing an hypothesis concerning the probability of the event that is designated a success. If $n$ is sufficiently large, then the proportion of successes $x/n$ will be approximately normally distributed with mean $p$ and variance $pq/n$. The RHS of equation (4) shows how the standardised binomial can be expressed in terms of proportions. If the value z,


hypergeometric distribution to converge to the binomial distribution as N → ∞. From (7), we have

$$
\begin{aligned}
h(x) &= \frac{(Np)!}{x!\,(Np-x)!} \times \frac{\{N(1-p)\}!}{(n-x)!\,\{N(1-p)-(n-x)\}!} \times \frac{(N-n)!\,n!}{N!}\\
&= \frac{n!}{x!\,(n-x)!} \times \frac{\{Np(Np-1)\cdots(Np-x+1)\}\{Nq(Nq-1)\cdots(Nq-(n-x)+1)\}}{N(N-1)(N-2)\cdots(N-n+1)},
\end{aligned}
$$

where $q = 1-p$. In the term on the RHS of the final expression, there are $n$ factors in both the numerator and the denominator. Therefore, dividing numerator and denominator by $N^n$ gives

$$h(x) = {}^nC_x\,\frac{\bigl\{p\bigl(p-\tfrac{1}{N}\bigr)\cdots\bigl(p-\tfrac{x-1}{N}\bigr)\bigr\}\bigl\{q\bigl(q-\tfrac{1}{N}\bigr)\cdots\bigl(q-\tfrac{n-x-1}{N}\bigr)\bigr\}}{\bigl\{\bigl(1-\tfrac{1}{N}\bigr)\bigl(1-\tfrac{2}{N}\bigr)\cdots\bigl(1-\tfrac{n-1}{N}\bigr)\bigr\}}. \tag{9}$$

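For any fixed $x$ and $n$, there are

$$\lim_{N\to\infty}\, p\Bigl(p-\frac{1}{N}\Bigr)\cdots\Bigl(p-\frac{x-1}{N}\Bigr) = p^x, \tag{10}$$

$$\lim_{N\to\infty}\, q\Bigl(q-\frac{1}{N}\Bigr)\cdots\Bigl(q-\frac{n-x-1}{N}\Bigr) = q^{n-x} = (1-p)^{n-x}, \tag{11}$$

$$\lim_{N\to\infty}\,\Bigl(1-\frac{1}{N}\Bigr)\Bigl(1-\frac{2}{N}\Bigr)\cdots\Bigl(1-\frac{n-1}{N}\Bigr) = 1.$$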

Therefore,

$$h(x; N, n, p) \to b(x; n, p) = {}^nC_x\,p^x(1-p)^{n-x} \quad \text{as } N \to \infty. \tag{13}$$

That is to say:

(14) If $x \sim h(p, N, n)$, then, when $N$ is large, it is distributed approximately as $b(p, n)$.
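The convergence in (13) can be observed numerically by holding $n$ and $p$ fixed and letting $N$ grow. The sketch below assumes the hypergeometric function has the product-of-combinations form displayed above; the function names and the chosen values of `n`, `p` and `N` are illustrative assumptions.

```python
from math import comb


def hypergeometric_pmf(x, N, n, K):
    """Probability of x class-A elements in n draws without replacement,
    when K = Np of the N population elements belong to class A."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def binomial_pmf(x, n, p):
    """Binomial probability b(x; n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_gap(N, n, p):
    """Largest absolute difference between h(x) and b(x) over x = 0, ..., n."""
    K = round(N * p)  # number of class-A elements; an integer by construction
    return max(
        abs(hypergeometric_pmf(x, N, n, K) - binomial_pmf(x, n, K / N))
        for x in range(n + 1)
    )


n, p = 10, 0.3  # illustrative values
population_sizes = (50, 500, 5000, 50000)
gaps = [max_gap(N, n, p) for N in population_sizes]

for N, gap in zip(population_sizes, gaps):
    print(f"N = {N:6d}   max |h(x) - b(x)| = {gap:.2e}")
```

The printed gap should shrink as $N$ increases, which is what (14) asserts.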

The Poisson Distribution

The Poisson distribution may be derived directly as the probability of a rare event in a large number of trials, or else it may be derived as a limiting case of the binomial distribution. We shall begin by taking the latter approach.

Therefore, consider $x \sim b(p; n)$, where $np = \mu$ is constant; and let $n \to \infty$, so that $p = \mu/n \to 0$. We can set $(1-p)^{n-x} = (1-p)^n(1-p)^{-x}$ and $p = \mu/n$ in equation (1) to give

$$b(x) = \frac{n!}{(n-x)!\,x!}\Bigl(\frac{\mu}{n}\Bigr)^x\Bigl(1-\frac{\mu}{n}\Bigr)^n\Bigl(1-\frac{\mu}{n}\Bigr)^{-x}. \tag{15}$$

The RHS of this equation may be re-arranged as follows:

$$b(x) = \frac{\mu^x}{x!}\,\frac{n!}{(n-x)!\,n^x}\Bigl(1-\frac{\mu}{n}\Bigr)^n\Bigl(1-\frac{\mu}{n}\Bigr)^{-x}. \tag{16}$$

15: STATISTICAL DISTRIBUTIONS

The expression may be disassembled for the purpose of taking limits in its component parts. The limits in question are

$$\lim_{n\to\infty}\,\frac{n!}{(n-x)!\,n^x} = 1, \tag{17}$$

$$\lim_{n\to\infty}\,\Bigl(1-\frac{\mu}{n}\Bigr)^n = e^{-\mu}, \tag{18}$$

$$\lim_{n\to\infty}\,\Bigl(1-\frac{\mu}{n}\Bigr)^{-x} = 1. \tag{19}$$

The first of these becomes intelligible when the term in question is written as

$$\frac{n!}{(n-x)!\,n^x} = \frac{n(n-1)\cdots(n-x+1)}{n^x} = \frac{n}{n}\cdot\frac{n-1}{n}\cdots\frac{n-x+1}{n};$$

for each of the factors tends to unity as $n$ increases. On reassembling the parts, it is found that the binomial function has a limiting form of

$$P(x; \mu) = \frac{\mu^x e^{-\mu}}{x!}. \tag{21}$$

This is the Poisson probability function.

The Poisson function can be derived by considering a specification for a so-called emission process or an arrival process. One can imagine a Geiger counter which registers the impacts of successive radioactive alpha particles upon a thin metallic film. Let $f(x, t)$ denote the probability of $x$ impacts or arrivals in the time interval $(0, t]$. The following conditions are imposed:

(a) The probability of a single arrival in a very short time interval $(t, t+\Delta t]$ is $f(1, \Delta t) = a\Delta t$,

(b) The probability of more than one arrival during that time interval is negligible,

(c) The probability of an arrival during the time interval is independent of any occurrences in previous periods.
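Conditions (a) to (c) can be illustrated by dividing $(0, t]$ into $m$ short intervals of length $\Delta t = t/m$, each of which independently contains one arrival with probability $a\Delta t$ and otherwise none. The number of arrivals is then binomial, and it can be compared with a Poisson function of mean $at$. The sketch below rests on two assumptions that are not stated in the passage above: that the Poisson mean for the interval $(0, t]$ is $\mu = at$ (the standard result for this process), and the illustrative values chosen for `a`, `t` and `m`.

```python
from math import comb, exp, factorial


def poisson_pmf(x, mu):
    """Poisson probability P(x; mu) = mu^x * exp(-mu) / x!."""
    return mu**x * exp(-mu) / factorial(x)


def discretised_arrival_pmf(x, a, t, m):
    """Probability of x arrivals in (0, t] when time is cut into m slots,
    each holding one arrival with probability a * dt, independently."""
    dt = t / m
    p = a * dt
    return comb(m, x) * p**x * (1 - p) ** (m - x)


a, t = 2.0, 1.5  # illustrative arrival rate and horizon (assumed values)
slot_counts = (10, 100, 1000, 10000)
gaps = [
    max(
        abs(discretised_arrival_pmf(x, a, t, m) - poisson_pmf(x, a * t))
        for x in range(11)
    )
    for m in slot_counts
]

for m, gap in zip(slot_counts, gaps):
    print(f"m = {m:6d}   max |binomial - Poisson| = {gap:.2e}")
```

As the slots become shorter, condition (b) is satisfied ever more closely and the discrepancy should diminish.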

Certain consequences follow from these assumptions; and it can be shown that the Poisson distribution is the only distribution that fits the specification. As a first consequence, it follows from the assumptions that the probability of there being x arrivals in the interval (0, t + Δt] is

$$
\begin{aligned}
f(x, t+\Delta t) &= f(x, t)f(0, \Delta t) + f(x-1, t)f(1, \Delta t)\\
&= f(x, t)(1 - a\Delta t) + f(x-1, t)\,a\Delta t.
\end{aligned}
$$

This expression follows from the fact that there are two mutually exclusive ways in which the circumstance can arise. The first way is when all of the x arrivals


As n → ∞, there is p → 0, because μ is fixed, and the RHS tends to

$$-\mu(1 - e^t) = \mu(e^t - 1) = \mu\Bigl(t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots\Bigr),$$

and each cumulant $\kappa_1, \kappa_2, \ldots = \mu$ has the value of the mean. Thus

$$\lim_{n\to\infty}\kappa(x; t) = \mu(e^t - 1). \tag{30}$$

Now consider x ∼ P (μ). Then

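$$\kappa(x, t) = \log E(e^{tx}) = \log\Bigl\{\sum_{x=0}^{\infty} e^{tx}\,\frac{e^{-\mu}\mu^x}{x!}\Bigr\} = \log\Bigl\{e^{-\mu}\sum_{x=0}^{\infty}\frac{e^{tx}\mu^x}{x!}\Bigr\}.$$

But, if $\lambda = e^t\mu$, then

$$\sum_{x=0}^{\infty}\frac{e^{tx}\mu^x}{x!} = \sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots = e^{\lambda},$$

and therefore

$$\kappa(x; t) = \log(e^{-\mu+\lambda}) = \mu(e^t - 1). \tag{33}$$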

Since (30) and (33) are identical, we can deduce the following:

(34) If $x \sim b(p, n)$, then $\mu = E(x) = np$ and, for fixed $\mu$, there is $\lim_{n\to\infty}\kappa(x; t) = \mu(e^t - 1)$, which is the cumulant generating function of $x \sim P(\mu)$. Therefore, $x$ is asymptotically distributed as the Poisson $P(\mu)$ for large $n$.

Finally, we have the following:

(35) If $x_1 \sim P(\mu_1)$ is independent of $x_2 \sim P(\mu_2)$, then $y = (x_1 + x_2) \sim P(\mu_1 + \mu_2)$.

Proof. For $x_1 \sim P(\mu_1)$, there is $\kappa(x_1; t) = \mu_1(e^t - 1)$. For $x_2 \sim P(\mu_2)$, there is $\kappa(x_2; t) = \mu_2(e^t - 1)$. Now, if $x_1$ and $x_2$ are independent, then

$$\kappa(x_1 + x_2; t) = \kappa(x_1; t) + \kappa(x_2; t) = (\mu_1 + \mu_2)(e^t - 1).$$

This is the cumulant generating function of the $P(\mu_1 + \mu_2)$ distribution.
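The additivity of independent Poisson variates can also be confirmed without generating functions, by convolving the two probability functions and comparing the result with $P(\mu_1 + \mu_2)$. This is a minimal sketch; the function names and the values of `mu1` and `mu2` are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """Poisson probability P(x; mu) = mu^x * exp(-mu) / x!."""
    return mu**x * exp(-mu) / factorial(x)


def convolved_pmf(y, mu1, mu2):
    """P(x1 + x2 = y) for independent x1 ~ P(mu1) and x2 ~ P(mu2),
    obtained by summing over every split y = k + (y - k)."""
    return sum(poisson_pmf(k, mu1) * poisson_pmf(y - k, mu2) for k in range(y + 1))


mu1, mu2 = 1.2, 2.5  # illustrative means
max_gap = max(
    abs(convolved_pmf(y, mu1, mu2) - poisson_pmf(y, mu1 + mu2)) for y in range(20)
)
print(f"max |convolution - P(mu1 + mu2)| = {max_gap:.2e}")
```

The discrepancy should be at the level of floating-point rounding error.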


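Corollary. If $x_i \sim P(\mu_i)$; $i = 1, \ldots, n$ is a sequence of $n$ mutually independent Poisson variates, then $\sum x_i \sim P(\sum \mu_i)$. Moreover, if $\mu_i = \mu$ for all $i$, then $\sum x_i = n\bar{x} \sim P(n\mu)$, so that the sample mean $\bar{x} = n^{-1}\sum x_i$ has $E(\bar{x}) = \mu$ and $V(\bar{x}) = \mu/n$.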

The Multinomial Distribution

In the binomial distribution, there are two mutually exclusive outcomes, $A$ and $A^c$. In the multinomial distribution, there are $k$ mutually exclusive outcomes $A_1, A_2, \ldots, A_k$, one of which must occur in any trial. The probabilities of these outcomes are denoted by $p_1, p_2, \ldots, p_k$. In a sequence of $n$ trials, there will be $x_1$ instances of $A_1$, $x_2$ of $A_2$ and so on, including $x_k$ instances of $A_k$; and the sum of these instances is $x_1 + x_2 + \cdots + x_k = n$.

Since the outcomes of the trials are statistically independent, the probability that a particular sequence will arise which has these numbers of instances is given by the product $p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$. However, the number of such sequences is $n!/\{x_1!\,x_2!\cdots x_k!\}$; and, together, they represent the set of mutually exclusive ways in which the numbers can occur. It follows that the multinomial probability function is given by

$$M(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}. \tag{37}$$

The following results concerning the moments may be derived in a straightforward manner:

$$E(x_i) = \mu_i = np_i, \tag{38}$$

$$V(x_i) = \sigma_{ii} = np_i(1 - p_i), \tag{39}$$

$$C(x_i, x_j) = \sigma_{ij} = -np_ip_j, \quad i \neq j. \tag{40}$$
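For a small number of trials, the moments in (38) to (40) can be verified by enumerating every possible vector of counts and weighting it by the multinomial probability function (37). The sketch below does this for three outcomes; the function names, the value of `n` and the probabilities in `ps` are illustrative assumptions.

```python
from itertools import product
from math import factorial


def multinomial_pmf(xs, ps):
    """Multinomial probability n!/(x1! ... xk!) * p1^x1 * ... * pk^xk."""
    coefficient = factorial(sum(xs))
    for x in xs:
        coefficient //= factorial(x)  # every intermediate quotient is an integer
    probability = float(coefficient)
    for x, p in zip(xs, ps):
        probability *= p**x
    return probability


def count_vectors(n, k):
    """Yield every (x1, ..., xk) of non-negative integers summing to n."""
    for head in product(range(n + 1), repeat=k - 1):
        if sum(head) <= n:
            yield head + (n - sum(head),)


n, ps = 6, (0.2, 0.3, 0.5)  # illustrative values
k = len(ps)
pmf = {xs: multinomial_pmf(xs, ps) for xs in count_vectors(n, k)}

mean = [sum(xs[i] * pr for xs, pr in pmf.items()) for i in range(k)]
cov = [
    [
        sum((xs[i] - mean[i]) * (xs[j] - mean[j]) * pr for xs, pr in pmf.items())
        for j in range(k)
    ]
    for i in range(k)
]

print("means      :", [round(m, 6) for m in mean])
print("variances  :", [round(cov[i][i], 6) for i in range(k)])
print("C(x1, x2)  :", round(cov[0][1], 6), " expected", -n * ps[0] * ps[1])
```

The negative covariance reflects the constraint that the counts must sum to $n$: more instances of one outcome leave fewer trials for the others.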

Example. The multinomial distribution provided the basis of one of the earliest statistical tests, which is Pearson's goodness-of-fit test. If the number $n$ of the trials is large enough, then the distribution of the random variable

$$z^2 = \sum_{i=1}^{k}\frac{(x_i - np_i)^2}{np_i} \tag{41}$$

will be well approximated by a chi-square distribution of $k-1$ degrees of freedom. This statistic is an evident generalisation of the binomial statistic of (5). If $p_1, \ldots, p_k$ are given a set of hypothesised probability values, then the validity of the hypothesis can be gauged via the resulting value of $z^2$. If $z^2$ exceeds a critical value, the hypothesis is liable to be rejected. By this means, we can determine, for example, whether a sample comes from a multinomial distribution of known parameters. We are also able to test whether two multinomial distributions are the same without any prior knowledge of their parameters.

To understand how such comparisons may be conducted, let us begin by considering two independent multinomial distributions, the first of which has


for this integral, and so we must work, instead, with the square of the integral. Consider, therefore, the product

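$$
\begin{aligned}
I^2 &= \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \times \int_{-\infty}^{\infty} e^{-y^2/2}\,dy\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy.
\end{aligned}
$$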

The variables in this double integral can be changed to polar coordinates by the substitution of x = ρ sin θ and y = ρ cos θ, which allows it to be written as

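$$
\begin{aligned}
I^2 &= \int_{0}^{\infty}\int_{0}^{2\pi} \rho e^{-\rho^2/2}\,d\theta\,d\rho = 2\pi\int_{0}^{\infty}\rho e^{-\rho^2/2}\,d\rho\\
&= 2\pi\int_{0}^{\infty} e^{-\omega}\,d\omega = 2\pi.
\end{aligned}
$$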

The change-of-variable technique entails the Jacobian factor, which is the determinant of the matrix of the transformation from $(\theta, \rho)$ to $(x, y)$. This takes the value of $\rho$. The final integral is obtained via a change of variables that sets $\rho^2/2 = \omega$; and it takes the value of unity (see equation (64)). Since $I = (2\pi)^{1/2}$ is the integral of the function $\exp\{-x^2/2\}$, the upshot is that the standard normal density function $N(x; 0, 1) = (2\pi)^{-1/2}\exp\{-x^2/2\}$ integrates to unity over the range of $x$.

The moment generating function of the standard normal $N(x; 0, 1)$ distribution is defined by

$$
\begin{aligned}
M(x, t) = E(e^{xt}) &= \int_{-\infty}^{\infty} e^{xt}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx\\
&= \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-(x^2-2xt)/2}\,dx.
\end{aligned}
$$

By completing the square, it is found that

$$x^2 - 2xt = x^2 - 2xt + t^2 - t^2 = (x - t)^2 - t^2.$$

Therefore, the moment generating function of the standard normal distribution is given by

$$
\begin{aligned}
M(x, t) &= e^{t^2/2}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-(x-t)^2/2}\,dx\\
&= e^{t^2/2}.
\end{aligned}
\tag{50}
$$
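The closed form $e^{t^2/2}$ can be checked against a direct numerical evaluation of the defining integral. The sketch below uses a simple midpoint rule over an interval centred on $t$, where the integrand is concentrated; the function name, the width of the interval and the number of steps are illustrative assumptions rather than anything prescribed by the text.

```python
from math import exp, pi, sqrt


def normal_mgf_numeric(t, half_width=12.0, steps=24_000):
    """Midpoint-rule approximation of the integral of
    exp(x t) * (2 pi)^(-1/2) * exp(-x^2 / 2) over [t - half_width, t + half_width]."""
    h = 2 * half_width / steps
    lower = t - half_width
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += exp(x * t - x * x / 2)
    return total * h / sqrt(2 * pi)


t_values = (0.0, 0.5, 1.0, 2.0)
results = [(t, normal_mgf_numeric(t), exp(t * t / 2)) for t in t_values]

for t, numeric, closed_form in results:
    print(f"t = {t:3.1f}   numeric = {numeric:.10f}   exp(t^2/2) = {closed_form:.10f}")
```

At $t = 0$ the same routine confirms that the standard normal density integrates to unity.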

If $y \sim N(\mu, \sigma^2)$ is a normal variate with parameters $\mu$ and $\sigma^2$, then it can be expressed as $y = \sigma x + \mu$, where $x \sim N(0, 1)$ is a standard normal variate. Then the moment generating function is

$$M(y, t) = e^{\mu t}M(x, \sigma t) = e^{\mu t + (\sigma^2t^2)/2}. \tag{51}$$


By differentiating the function with respect to $t$ and then setting $t = 0$, we can find the following moments:

$$
\begin{aligned}
E(y) &= \mu,\\
E(y^2) &= \sigma^2 + \mu^2,\\
V(y) &= E(y^2) - \{E(y)\}^2 = \sigma^2.
\end{aligned}
$$

Example. The normal distribution with $\mu = np$ and $\sigma^2 = npq$, where $q = 1 - p$, provides an approximation for the binomial $b(x; n, p)$ distribution when $n$ is large. This result can be demonstrated using the moment generating functions. Let

$$z = \frac{x - \mu}{\sigma} = \frac{x - np}{\sqrt{npq}}. \tag{53}$$

The moment generating function for $z$ is

$$M(z, t) = e^{-\mu t/\sigma}M(x, t/\sigma) = e^{-\mu t/\sigma}(q + pe^{t/\sigma})^n.$$

Taking logs and expanding the exponential term $pe^{t/\sigma}$ gives

$$\log M(z, t) = -\frac{\mu t}{\sigma} + n\log\Bigl[1 + p\Bigl\{\frac{t}{\sigma} + \frac{1}{2!}\Bigl(\frac{t}{\sigma}\Bigr)^2 + \frac{1}{3!}\Bigl(\frac{t}{\sigma}\Bigr)^3 + \cdots\Bigr\}\Bigr], \tag{55}$$

where we have used $p + q = 1$. The logarithm on the RHS is in the form of $\log(1 + z)$, where $z$ stands for the sum within the braces $\{\;\}$ times $p$. This is amenable to the Maclaurin expansion of (27) on the condition that $|z| < 1$. Since $\sigma = \sqrt{npq}$ increases indefinitely with $n$, the condition is indeed fulfilled when $n$ is sufficiently large. The Maclaurin expansion gives rise to the following expression:

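$$\log M(z, t) = -\frac{\mu t}{\sigma} + n\Bigl[p\Bigl\{\frac{t}{\sigma} + \frac{1}{2!}\Bigl(\frac{t}{\sigma}\Bigr)^2 + \cdots\Bigr\} - \frac{p^2}{2}\Bigl\{\frac{t}{\sigma} + \frac{1}{2!}\Bigl(\frac{t}{\sigma}\Bigr)^2 + \cdots\Bigr\}^2 + \cdots\Bigr].$$

Collecting terms in powers of $t$ gives

$$
\begin{aligned}
\log M(z, t) &= \Bigl(-\frac{\mu}{\sigma} + \frac{np}{\sigma}\Bigr)t + n\Bigl(\frac{p}{\sigma^2} - \frac{p^2}{\sigma^2}\Bigr)\frac{t^2}{2!} + \cdots\\
&= \frac{t^2}{2} + \cdots.
\end{aligned}
$$

Here, the second equality follows because the coefficient of $t$ is zero and that of $t^2/2!$ is unity. Moreover, the coefficients associated with $\{t^3, t^4, \ldots\}$ all tend to zero as $n$ increases. Thus $\lim_{n\to\infty}\log M(z, t) = t^2/2$; from which it follows that

$$\lim_{n\to\infty}M(z, t) = e^{t^2/2}. \tag{58}$$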

This is the moment generating function of the standard normal distribution, given already by (50); and thus the convergence of z in distribution to a standard normal is demonstrated.
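The convergence can be observed numerically by evaluating $\log M(z, t) = -\mu t/\sigma + n\log(q + pe^{t/\sigma})$ for increasing $n$ and comparing it with $t^2/2$. This is a minimal sketch; the function name and the values of `t`, `p` and `n` are illustrative assumptions.

```python
from math import exp, log, sqrt


def log_mgf_standardised_binomial(t, n, p):
    """log M(z, t) for z = (x - np) / sqrt(npq) with x ~ b(n, p)."""
    q = 1 - p
    mu = n * p
    sigma = sqrt(n * p * q)
    return -mu * t / sigma + n * log(q + p * exp(t / sigma))


t, p = 1.0, 0.3  # illustrative values
trial_counts = (10, 100, 1000, 10000, 100000)
gaps = [abs(log_mgf_standardised_binomial(t, n, p) - t * t / 2) for n in trial_counts]

for n, gap in zip(trial_counts, gaps):
    print(f"n = {n:6d}   |log M(z, t) - t^2/2| = {gap:.2e}")
```

The gap is dominated by the $t^3$ term, whose coefficient is of order $1/\sqrt{n}$, so it should fall by a factor of roughly $\sqrt{10}$ with each tenfold increase in $n$.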


For an integer value of $n$, the gamma type 1 gives the probability distribution of the waiting time to the $n$th event in a Poisson arrival process of unit mean. When $n = 1$, it becomes the exponential distribution, which relates to the waiting time for the first event.

To define the type 2 gamma function, we consider the transformation $z = \beta x$. Then, by the change-of-variable technique, we have

$$\gamma_2(z) = \gamma_1\{x(z)\}\,\Bigl|\frac{dx}{dz}\Bigr| = \frac{e^{-z/\beta}(z/\beta)^{\alpha-1}}{\Gamma(\alpha)}\,\frac{1}{\beta}.$$

Here we have changed the notation by setting $\alpha = n$. The probability function of the type 2 gamma distribution is written more conveniently as

$$\gamma_2(z; \alpha, \beta) = \frac{e^{-z/\beta}z^{\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}}. \tag{69}$$

An important special case of the $\gamma_2$ distribution is when $\alpha = r/2$ with $r \in \{0, 1, 2, \ldots\}$ and $\beta = 2$. This is the so-called chi-square distribution of $r$ degrees of freedom:

$$\chi^2(x; r) = \frac{e^{-x/2}x^{(r/2)-1}}{\Gamma(r/2)\,2^{r/2}}. \tag{70}$$

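Now let us endeavour to find the moment generating function of the $\gamma_1$ distribution. We have

$$M_x(t) = \int_{0}^{\infty} e^{xt}\,\frac{e^{-x}x^{n-1}}{\Gamma(n)}\,dx = \int_{0}^{\infty}\frac{e^{-x(1-t)}x^{n-1}}{\Gamma(n)}\,dx.$$

Now let $w = x(1 - t)$. Then, by the change-of-variable technique,

$$
\begin{aligned}
M_x(t) &= \int_{0}^{\infty}\frac{e^{-w}w^{n-1}}{(1-t)^{n-1}\Gamma(n)}\,\frac{1}{(1-t)}\,dw\\
&= \frac{1}{(1-t)^n}\int_{0}^{\infty}\frac{e^{-w}w^{n-1}}{\Gamma(n)}\,dw = \frac{1}{(1-t)^n}.
\end{aligned}
$$

Also, the cumulant generating function is

$$\kappa(x; t) = -n\log(1 - t) = n\Bigl(t + \frac{t^2}{2} + \frac{t^3}{3} + \cdots\Bigr).$$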


We find, in particular, that

$$E(x) = V(x) = n. \tag{74}$$

We have defined the $\gamma_2$ distribution by

$$\gamma_2 = \frac{e^{-x/\beta}x^{\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}}; \quad 0 \le x < \infty. \tag{75}$$

Hence the moment generating function is defined by

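$$M_x(t) = \int_{0}^{\infty}\frac{e^{tx}e^{-x/\beta}x^{\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}}\,dx = \int_{0}^{\infty}\frac{e^{-x(1-\beta t)/\beta}x^{\alpha-1}}{\Gamma(\alpha)\beta^{\alpha}}\,dx.$$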

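Let $y = x(1 - \beta t)/\beta$, which gives $dy/dx = (1 - \beta t)/\beta$. Then, by the change-of-variable technique we get

$$
\begin{aligned}
M_x(t) &= \int_{0}^{\infty}\frac{e^{-y}}{\Gamma(\alpha)\beta^{\alpha}}\Bigl(\frac{\beta y}{1-\beta t}\Bigr)^{\alpha-1}\frac{\beta}{(1-\beta t)}\,dy\\
&= \frac{1}{(1-\beta t)^{\alpha}}\int_{0}^{\infty}\frac{y^{\alpha-1}e^{-y}}{\Gamma(\alpha)}\,dy = \frac{1}{(1-\beta t)^{\alpha}}.
\end{aligned}
$$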

It follows that the cumulant generating function is

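$$
\begin{aligned}
\kappa(x; t) = -\alpha\log(1 - \beta t) &= -\alpha\Bigl(-\beta t - \frac{\beta^2t^2}{2} - \frac{\beta^3t^3}{3} - \cdots\Bigr)\\
&= \alpha\Bigl(\beta t + \frac{\beta^2t^2}{2} + \frac{\beta^3t^3}{3} + \cdots\Bigr).
\end{aligned}
$$

We find, in particular, that

$$
\begin{aligned}
E(x) &= \alpha\beta,\\
V(x) &= \alpha\beta^2.
\end{aligned}
$$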
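The mean and variance of the type 2 gamma distribution can be checked by integrating the density (75) numerically. The sketch below uses a midpoint rule on a truncated range; the function names, the truncation point, the number of steps and the parameter values are illustrative assumptions.

```python
from math import exp, gamma


def gamma2_pdf(x, alpha, beta):
    """Type 2 gamma density exp(-x/beta) * x^(alpha-1) / (Gamma(alpha) * beta^alpha)."""
    return exp(-x / beta) * x ** (alpha - 1) / (gamma(alpha) * beta**alpha)


def raw_moment(k, alpha, beta, upper=100.0, steps=100_000):
    """Midpoint-rule approximation of E(x^k), truncating the range at `upper`."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x**k * gamma2_pdf(x, alpha, beta)
    return total * h


alpha, beta = 2.5, 1.5  # illustrative parameter values
m1 = raw_moment(1, alpha, beta)
m2 = raw_moment(2, alpha, beta)
variance = m2 - m1**2

print(f"E(x) = {m1:.6f}   alpha * beta   = {alpha * beta:.6f}")
print(f"V(x) = {variance:.6f}   alpha * beta^2 = {alpha * beta**2:.6f}")
```

Setting `beta = 1` and `alpha = n` recovers the type 1 result $E(x) = V(x) = n$ of (74).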

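Now consider two independent gamma variates of type 2: $x_1 \sim \gamma_2(\alpha_1, \beta_1)$ and $x_2 \sim \gamma_2(\alpha_2, \beta_2)$. Since $x_1$ and $x_2$ are independent, the cumulant generating function of their sum is the sum of their separate generating functions:

$$
\begin{aligned}
\kappa(x_1 + x_2; t) &= \kappa(x_1; t) + \kappa(x_2; t)\\
&= \alpha_1\Bigl(\beta_1t + \frac{\beta_1^2t^2}{2} + \frac{\beta_1^3t^3}{3} + \cdots\Bigr) + \alpha_2\Bigl(\beta_2t + \frac{\beta_2^2t^2}{2} + \frac{\beta_2^3t^3}{3} + \cdots\Bigr)
\end{aligned}
$$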


Next, we shall prove an important theorem.

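(88) If $x \sim \gamma_2(\alpha, \lambda)$ and $y \sim \gamma_2(\theta, \lambda)$ are independent random variables which have the gamma type 2 distribution, then $x/y = z \sim \beta_2(\alpha, \theta)$ is distributed as a beta type 2 variable.

Proof. When $x \sim \gamma_2(\alpha, \lambda)$ and $y \sim \gamma_2(\theta, \lambda)$ are independently distributed, there is

$$f(x, y) = e^{-(x+y)/\lambda}\,\frac{x^{\alpha-1}y^{\theta-1}}{\lambda^{\alpha+\theta}\,\Gamma(\alpha)\Gamma(\theta)}. \tag{89}$$

Let $v = x/y$ and $w = x + y$, whence

$$y = \frac{w}{v+1}, \quad x = \frac{vw}{v+1} \tag{90}$$

and $w, v > 0$. Also, let

$$J = \begin{bmatrix}\dfrac{\partial x}{\partial w} & \dfrac{\partial x}{\partial v}\\[2ex] \dfrac{\partial y}{\partial w} & \dfrac{\partial y}{\partial v}\end{bmatrix} = \begin{bmatrix}\dfrac{v}{v+1} & \dfrac{w}{(v+1)^2}\\[2ex] \dfrac{1}{v+1} & \dfrac{-w}{(v+1)^2}\end{bmatrix} \tag{91}$$

be the matrix of partial derivatives of the mapping from $(v, w)$ to $(x, y)$. The Jacobian of this transformation, which is the absolute value of the determinant of the matrix, is

$$\|J\| = \frac{w}{(1+v)^2}. \tag{92}$$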

It follows that the joint distribution of (v, w) is

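$$
\begin{aligned}
g(w, v) &= \exp\Bigl\{\frac{-w}{\lambda}\Bigr\}\Bigl\{\frac{vw}{v+1}\Bigr\}^{\alpha-1}\Bigl\{\frac{w}{v+1}\Bigr\}^{\theta-1}\frac{w}{(v+1)^2}\,\frac{1}{\lambda^{\alpha+\theta}\,\Gamma(\alpha)\Gamma(\theta)}\\
&= \frac{\exp(-w/\lambda)\,w^{\alpha+\theta-1}}{\lambda^{\alpha+\theta}\,\Gamma(\alpha+\theta)} \times \frac{v^{\alpha-1}}{(1+v)^{\alpha+\theta}B(\alpha, \theta)} = \gamma_2(w) \times \beta_2(v).
\end{aligned}
$$

Here $w \sim \gamma_2(\alpha+\theta, \lambda)$ and $v \sim \beta_2(\alpha, \theta)$ are independent random variables; and, moreover, $v$ has the required beta type 2 distribution.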
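The factorisation of the joint density can be checked pointwise: the product of the two gamma densities evaluated at $x(v, w)$ and $y(v, w)$, multiplied by the Jacobian $w/(1+v)^2$, should equal $\gamma_2(w) \times \beta_2(v)$. The sketch below assumes the beta type 2 density $v^{\alpha-1}/\{(1+v)^{\alpha+\theta}B(\alpha, \theta)\}$ that appears in the factorisation; the function names, parameter values and test points are illustrative.

```python
from math import exp, gamma


def gamma2_pdf(x, alpha, lam):
    """Type 2 gamma density with shape alpha and scale lam."""
    return exp(-x / lam) * x ** (alpha - 1) / (gamma(alpha) * lam**alpha)


def beta2_pdf(v, alpha, theta):
    """Beta type 2 density v^(alpha-1) / ((1+v)^(alpha+theta) * B(alpha, theta))."""
    beta_function = gamma(alpha) * gamma(theta) / gamma(alpha + theta)
    return v ** (alpha - 1) / ((1 + v) ** (alpha + theta) * beta_function)


def joint_vw(v, w, alpha, theta, lam):
    """Joint density of (v, w) via the change of variables x = vw/(v+1), y = w/(v+1)."""
    x = v * w / (v + 1)
    y = w / (v + 1)
    jacobian = w / (1 + v) ** 2
    return gamma2_pdf(x, alpha, lam) * gamma2_pdf(y, theta, lam) * jacobian


alpha, theta, lam = 2.0, 3.5, 1.7  # illustrative parameter values
points = [(0.3, 0.5), (1.0, 2.0), (2.5, 4.0), (0.1, 7.0)]  # (v, w) pairs
relative_errors = []
for v, w in points:
    lhs = joint_vw(v, w, alpha, theta, lam)
    rhs = gamma2_pdf(w, alpha + theta, lam) * beta2_pdf(v, alpha, theta)
    relative_errors.append(abs(lhs - rhs) / rhs)
    print(f"v = {v:3.1f}, w = {w:3.1f}   g(w, v) = {lhs:.8e}   product = {rhs:.8e}")
```

Agreement at arbitrary points is consistent with the independence of $v$ and $w$, since the joint density splits into a function of $w$ alone times a function of $v$ alone.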

The Multivariate Normal Distribution

Let $\{x_1, x_2, \ldots, x_n\}$ be a sequence of independent random variables each distributed as $N(0, 1)$. Then the joint density function $f(x_1, x_2, \ldots, x_n)$ is given by the product of the individual density functions:

$$f(x_1, x_2, \ldots, x_n) = (2\pi)^{-n/2}\exp\bigl\{-\tfrac{1}{2}(x_1^2 + x_2^2 + \cdots + x_n^2)\bigr\}. \tag{94}$$


The sequence of $n$ independent $N(0, 1)$ variates constitutes a vector $x$ of order $n$, and the sum of squares of the elements is the quadratic $x'x$. The zero-valued expectations of the elements of the sequence can be gathered in a zero vector $E(x) = 0$ of order $n$, and their unit variances can be represented by the diagonal elements of an identity matrix of order $n$ which constitutes the variance–covariance or dispersion matrix of $x$:

D(x) = E[{x − E(x)}{x − E(x)}′] = E(xx′) = I.

In this notation, the probability density function of the vector x is the n-variate standard normal function

$$N(x; 0, I) = (2\pi)^{-n/2}\exp\bigl\{-\tfrac{1}{2}x'x\bigr\}. \tag{96}$$

Next, we consider a more general normal density function which arises from a linear transformation of x followed by a translation of the resulting vector. The combined transformations give rise to the vector y = Ax + μ. It is reasonable to require that A is a matrix of full row rank, which implies that the dimension of y can be no greater than the dimension of x and that none of the variables within Ax is expressible as a linear combination of the others. In that case, there is

$$E(y) = AE(x) + \mu = \mu$$

and

$$
\begin{aligned}
D(y) &= E[\{y - E(y)\}\{y - E(y)\}']\\
&= E[\{Ax - E(Ax)\}\{Ax - E(Ax)\}']\\
&= AD(x)A' = AA' = \Sigma.
\end{aligned}
$$

The density function of $y$ is found via the change-of-variable technique. This involves expressing $x$ in terms of the inverse function $x(y) = A^{-1}(y - \mu)$ of which the Jacobian is

$$\Bigl\|\frac{\partial x}{\partial y}\Bigr\| = \|A^{-1}\| = |\Sigma|^{-1/2}.$$

The resulting probability density function of the multivariate normal distribution is

$$N(y; \mu, \Sigma) = (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\bigl\{-\tfrac{1}{2}(y - \mu)'\Sigma^{-1}(y - \mu)\bigr\}. \tag{99}$$

We shall now consider the relationships which may subsist between groups of elements within the vector x. Let the vectors x and E(x) = μ and the dispersion matrix Σ be partitioned conformably to yield

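$$x = \begin{bmatrix}x_1\\ x_2\end{bmatrix}, \quad \mu = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix} \quad\text{and}\quad \Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}, \tag{100}$$

where $x_1$ is of order $p$, $x_2$ is of order $q$ and $p + q = n$.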


When it is written in terms of x 1 and x 2 , the condition becomes

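$$0 = E[\{[x_1 - E(x_1)] - B'[x_2 - E(x_2)]\}\{x_2 - E(x_2)\}'] = \Sigma_{12} - B'\Sigma_{22}.$$

The solution is $B' = \Sigma_{12}\Sigma_{22}^{-1}$; and thus the transformation is given by

$$y = \begin{bmatrix}y_1\\ y_2\end{bmatrix} = \begin{bmatrix}I_p & -\Sigma_{12}\Sigma_{22}^{-1}\\ 0 & I_q\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} = Qx. \tag{110}$$

Now, if $x \sim N(\mu, \Sigma)$, then it follows that $y \sim N(Q\mu, Q\Sigma Q')$, where

$$
\begin{aligned}
Q\Sigma Q' &= \begin{bmatrix}I_p & -\Sigma_{12}\Sigma_{22}^{-1}\\ 0 & I_q\end{bmatrix}\begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}\begin{bmatrix}I_p & 0\\ -\Sigma_{22}^{-1}\Sigma_{21} & I_q\end{bmatrix}\\
&= \begin{bmatrix}\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0\\ 0 & \Sigma_{22}\end{bmatrix}.
\end{aligned}
$$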

The condition of non-correlation implies that $y_1$ and $y_2$ are statistically independent. Therefore, the joint distribution of $y_1$ and $y_2$ is the product of their marginal distributions: $N(y) = N(y_1) \times N(y_2)$.

Next, we use the change-of-variable technique to recover the distribution of $x$ from that of $y$. We note that the Jacobian of the transformation from $x$ to $y$ is unity (since its matrix is triangular with units on the principal diagonal). Thus, by using the inverse transformation $x = x(y)$, we can write the distribution of $x$ as

$$
\begin{aligned}
N(x; \mu, \Sigma) &= N\{y(x); E(y), Q\Sigma Q'\}\\
&= N\{y_1(x); E(y_1), \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\} \times N\{y_2(x); E(y_2), \Sigma_{22}\},
\end{aligned}
\tag{112}
$$

wherein there are

$$
\begin{aligned}
y_1(x) &= x_1 - B'x_2, & E(y_1) &= \mu_1 - B'\mu_2,\\
y_2(x) &= x_2 & \text{and}\quad E(y_2) &= \mu_2.
\end{aligned}
$$

The second of the factors on the RHS of (112) is the marginal distribution of $x_2 = y_2(x)$. Since the product of the two factors is the joint distribution of $x_1$ and $x_2$, the first of the factors on the RHS must be the conditional distribution of $x_1$ given $x_2$. A summary of these results is as follows:

(114) If $x \sim N(\mu, \Sigma)$ is partitioned as $x = [x_1', x_2']'$ with $\mu = [\mu_1', \mu_2']'$ partitioned conformably, then the marginal distribution of $x_2$ is $N(x_2; \mu_2, \Sigma_{22})$ and the conditional distribution of $x_1$ given $x_2$ is $N(y_1; E\{y_1\}, \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})$, where $y_1 = x_1 - B'x_2$ with $B' = \Sigma_{12}\Sigma_{22}^{-1}$, and where $E\{y_1\} = \mu_1 - B'\mu_2$.


Within the conditional distribution, there is the quadratic form

$$[y_1 - E(y_1)]'[\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}]^{-1}[y_1 - E(y_1)]. \tag{115}$$

This contains the term

$$\varepsilon = y_1 - E(y_1) = x_1 - \mu_1 - B'(x_2 - \mu_2) = x_1 - E(x_1|x_2).$$

The conditional expectation, which is

$$E(x_1|x_2) = E(x_1) + B'(x_2 - \mu_2), \tag{117}$$

is commonly described as the equation of the regression of $x_1$ on $x_2$, whilst $B'$ is the matrix of the regression parameters. The quantity denoted by $\varepsilon$ is described as the vector of prediction errors or regression disturbances.
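In the bivariate case, where $x_1$ and $x_2$ are scalars, the results reduce to $B' = \sigma_{12}/\sigma_{22}$ and a conditional variance of $\sigma_{11} - \sigma_{12}^2/\sigma_{22}$, and the factorisation of the joint density into conditional and marginal parts can be checked directly. The sketch below is restricted to that scalar case; the function names and all numerical values are illustrative assumptions.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Univariate normal density N(x; mean, var)."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def bivariate_normal_pdf(x1, x2, mu1, mu2, s11, s12, s22):
    """Bivariate normal density with dispersion matrix [[s11, s12], [s12, s22]]."""
    det = s11 * s22 - s12**2
    d1, d2 = x1 - mu1, x2 - mu2
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return exp(-quad / 2) / (2 * pi * sqrt(det))


def conditional_params(x2, mu1, mu2, s11, s12, s22):
    """Return (E(x1 | x2), V(x1 | x2)) using the regression coefficient b = s12 / s22."""
    b = s12 / s22
    return mu1 + b * (x2 - mu2), s11 - s12 * s12 / s22


mu1, mu2 = 1.0, -0.5            # illustrative means
s11, s12, s22 = 2.0, 0.8, 1.5   # illustrative variances and covariance
points = [(0.0, 0.0), (1.5, -1.0), (-0.7, 2.0)]  # (x1, x2) pairs
relative_errors = []
for x1, x2 in points:
    cond_mean, cond_var = conditional_params(x2, mu1, mu2, s11, s12, s22)
    joint = bivariate_normal_pdf(x1, x2, mu1, mu2, s11, s12, s22)
    product = normal_pdf(x1, cond_mean, cond_var) * normal_pdf(x2, mu2, s22)
    relative_errors.append(abs(joint - product) / joint)
    print(f"x = ({x1:4.1f}, {x2:4.1f})   E(x1|x2) = {cond_mean:.4f}   joint = {joint:.8f}")
```

The conditional variance does not depend on the value of $x_2$, whereas the conditional mean is linear in it, which is the sense in which (117) is a regression equation.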

The Chi-square Distribution

The chi-square distribution is a special case of a type 2 gamma distribution. The type 2 gamma has been denoted by $\gamma_2(\alpha, \beta)$, and its functional form is given by equation (69). When $\alpha = r/2$ and $\beta = 2$, the $\gamma_2$ density function becomes the probability density function of a chi-square distribution of $r$ degrees of freedom:

$$\chi^2(x; r) = \frac{e^{-x/2}x^{(r/2)-1}}{\Gamma(r/2)\,2^{r/2}}. \tag{118}$$

The importance of the chi-square is in its relationship with the normal distribution: the chi-square of one degree of freedom represents the distribution of the quadratic exponent $(x - \mu)^2/\sigma^2$ of the univariate normal $N(x; \mu, \sigma^2)$ function.

To demonstrate the relationship, let us consider the integral of a univariate standard normal $N(z; 0, 1)$ function over the interval $[-\theta, \theta]$, together with the integral of the density function of $v = z^2$ over the interval $[0, \theta^2]$. We can use the change-of-variable technique to find the density function of $v$. The following relationship must hold:

$$\int_{-\theta}^{\theta}N(z)\,dz = 2\int_{0}^{\theta^2}N\{z(v)\}\,\Bigl|\frac{dz}{dv}\Bigr|\,dv.$$

To be more explicit, we can use $z^2 = v$ and $dz/dv = v^{-1/2}/2$ in writing the following version of the equation:

$$
\begin{aligned}
\int_{-\theta}^{\theta}\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz &= \int_{0}^{\theta^2}\frac{1}{\sqrt{2\pi}}\,e^{-v/2}v^{-1/2}\,dv\\
&= \int_{0}^{\theta^2}\frac{e^{-v/2}v^{-1/2}}{\sqrt{2}\,\Gamma(1/2)}\,dv.
\end{aligned}
$$
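The equality of the two integrals can be checked numerically. The left side is $\operatorname{erf}(\theta/\sqrt{2})$, whilst the right side is the chi-square integral with one degree of freedom, which the sketch below evaluates through the standard power series for the lower incomplete gamma function. That choice of series, the function name and the values of $\theta$ are assumptions made for illustration.

```python
from math import erf, exp, gamma, sqrt


def chi_square_one_df_integral(c, terms=200):
    """Integral of exp(-v/2) * v^(-1/2) / (sqrt(2) * Gamma(1/2)) over [0, c].

    With u = v/2 this is the regularised lower incomplete gamma function
    P(1/2, c/2), computed from the series
    gamma(s, x) = x^s * exp(-x) * sum_k x^k / (s (s+1) ... (s+k)).
    """
    s, x = 0.5, c / 2
    term = 1.0 / s
    total = term
    for k in range(1, terms):
        term *= x / (s + k)
        total += term
    return x**s * exp(-x) * total / gamma(s)


theta_values = (0.5, 1.0, 1.96, 3.0)
results = [
    (theta, erf(theta / sqrt(2)), chi_square_one_df_integral(theta**2))
    for theta in theta_values
]

for theta, normal_side, chi_square_side in results:
    print(f"theta = {theta:4.2f}   normal = {normal_side:.12f}   chi-square = {chi_square_side:.12f}")
```

For $\theta = 1.96$ both sides should be close to 0.95, the familiar two-sided normal coverage probability.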