Probit for Estimating Dichotomous Regression: Notation, Assumptions, and ML Estimation, Study notes of Mathematical Statistics

An overview of the probit procedure, a statistical method used to estimate the effects of independent variables on a dichotomous dependent variable. It covers the notation, assumptions, and maximum-likelihood estimation process of the probit model. Particularly useful for students and researchers in statistics, econometrics, and related fields.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar
sangawar 🇮🇳

PROBIT¹

The Probit procedure is used to estimate the effects of one or more independent variables on a dichotomous dependent variable. The program is designed for dose-response analyses and related models, but Probit can also estimate logistic regression models.

Notation

The following notation is used throughout this chapter unless otherwise stated:

$m$: Number of covariate patterns
$n_i$: Number of subjects for the $i$th covariate pattern
$r_i$: Number of responses for the $i$th covariate pattern
$p$: Number of independent variables
$q$: Number of levels of the grouping variable; $q = 0$ when there is no grouping variable
$c$: Natural response rate
$X$: $n \times (p + q)$ matrix with element $x_{ij}$, which represents the $j$th covariate for the $i$th covariate pattern
$\gamma$: $p \times 1$ vector with element $\gamma_j$, which represents the slope parameter of the $j$th independent variable
$\alpha$: $q \times 1$ vector with element $\alpha_j$, which represents the parameter for the $j$th level of the grouping variable
$\beta$: $(p + q) \times 1$ vector which is a composite of $\gamma$ and $\alpha$
$s$: Total number of parameters in the model, equal to $p + q$ if the natural response rate is set to a constant, $p + q + 1$ if the natural response rate is to be estimated by the model

¹ This algorithm applies to SPSS 5.0 and later releases. To learn about algorithms for previous releases, call SPSS Technical Support.

Model

The model assumes a dichotomous dependent variable with probability $P$ for the event of interest. Since the procedure assumes aggregated data for every covariate pattern, the random variable $y_i$ follows a binomial distribution:

$$P(y_i = r_i) = \binom{n_i}{r_i} P_i^{r_i} (1 - P_i)^{n_i - r_i}, \qquad i = 1, \ldots, m$$

Hence, the log likelihood, L , for m observations after ignoring the constant factor can be written as

$$L = \sum_{i=1}^{m} \left[ r_i \ln(P_i) + (n_i - r_i) \ln(1 - P_i) \right]$$
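The log likelihood can be evaluated directly once the probabilities $P_i$ are known. The following is a minimal sketch, not the procedure's own code; the function name `log_likelihood` and the data values are hypothetical, and every $P_i$ is assumed to lie strictly between 0 and 1.

```python
import math

def log_likelihood(r, n, P):
    """Binomial log likelihood L, with the constant ln C(n_i, r_i) terms dropped."""
    total = 0.0
    for r_i, n_i, P_i in zip(r, n, P):
        total += r_i * math.log(P_i) + (n_i - r_i) * math.log(1.0 - P_i)
    return total

# Hypothetical aggregated data: three covariate patterns
r = [2, 5, 9]        # responses per pattern
n = [10, 10, 10]     # subjects per pattern
P = [0.2, 0.5, 0.9]  # response probabilities per pattern
print(log_likelihood(r, n, P))
```

Because each term is maximized at $P_i = r_i / n_i$, any other choice of probabilities for the same data gives a smaller value of $L$.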

For dose-response models, it is further assumed that

$$P_i = c + (1 - c)\, F(X_i'\beta) \qquad (1)$$

where $X_i$ is the vector of covariates for the $i$th covariate pattern and $F(X_i'\beta)$ has two forms:

$$F(X_i'\beta) = \begin{cases} \dfrac{e^{X_i'\beta}}{1 + e^{X_i'\beta}} & \text{if logit model} \\[2ex] \displaystyle\int_{-\infty}^{X_i'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz & \text{if probit model} \end{cases}$$

When there is no grouping variable, $x_{ij}$ is simply the observed value of the $j$th independent variable for the $i$th covariate pattern, and $\beta = \gamma$. When there is a grouping variable, a set of indicator variables is constructed. There will be $q$ indicator variables $l_{i1}, \ldots, l_{iq}$ added to the $X$ matrix and $q$ parameters $\alpha_1, \ldots, \alpha_q$ added to the $\beta$ vector.
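The model for $P_i$ and the construction of indicator variables can be sketched as follows. This is an illustration under assumptions rather than the procedure's implementation: the function names are hypothetical, group levels are assumed to be coded 0 to $q - 1$, and the probit form is computed through the error function, which is equivalent to the integral of the standard normal density.

```python
import math

def link_cdf(eta, model="probit"):
    """F(eta): logistic CDF for 'logit', standard normal CDF for 'probit'."""
    if model == "logit":
        return 1.0 / (1.0 + math.exp(-eta))   # same as e^eta / (1 + e^eta)
    if model == "probit":
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    raise ValueError("model must be 'logit' or 'probit'")

def response_probability(x_row, beta, c=0.0, model="probit"):
    """P_i = c + (1 - c) * F(X_i' beta) for one covariate pattern."""
    eta = sum(x * b for x, b in zip(x_row, beta))
    return c + (1.0 - c) * link_cdf(eta, model)

def add_group_indicators(covariates, group, q):
    """Append q indicator columns (assumed coding: group in 0..q-1)."""
    indicators = [1.0 if group == level else 0.0 for level in range(q)]
    return list(covariates) + indicators

# Hypothetical pattern: one covariate, second of two groups
x_row = add_group_indicators([1.5], group=1, q=2)
beta = [0.8, -1.0, -0.5]   # (gamma_1, alpha_1, alpha_2), made-up values
print(response_probability(x_row, beta, c=0.1, model="probit"))
```

With the indicators appended, the linear predictor for a subject in group $j$ reduces to $\gamma' x + \alpha_j$, which is why the $\alpha$ parameters act as group-specific intercepts.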

Natural Response Rate

When the user specifies a fixed number for the natural response rate, $L_c^{*}$ is set to 0 for iterations and the bound for $c$ is set equal to the fixed number.

Initial Values

The initial value for each $\beta$ is set to 0. If there is a control group, the initial value of $c$, designated by $c_0$, is set to the ratio of the response to the number of subjects for the control group. If there is no control group, then $c_0$ is set to the minimum ratio of the response to the number of subjects, over all covariate patterns.
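The starting-value rule can be written in a few lines. The sketch below assumes, for illustration only, that the control group is stored as one of the covariate patterns and is identified by a hypothetical `control_index` argument; how the procedure actually stores the control group is not stated here.

```python
def initial_values(r, n, num_params, control_index=None):
    """Return (beta0, c0) following the starting-value rule described above.

    control_index is a hypothetical argument: the position of the control
    group among the covariate patterns, or None when there is no control group.
    """
    beta0 = [0.0] * num_params            # every beta starts at 0
    if control_index is not None:
        c0 = r[control_index] / n[control_index]
    else:
        c0 = min(r_i / n_i for r_i, n_i in zip(r, n))
    return beta0, c0

# Hypothetical data: responses and subjects for three covariate patterns
print(initial_values([1, 4, 8], [10, 10, 10], num_params=2))
print(initial_values([1, 4, 8], [10, 10, 10], num_params=2, control_index=1))
```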

Criteria

Users can control two criteria, ITER and CONV. ITER is the maximum number of iterations allowed. The default value is $\max(50,\, 3(s + 1))$. CONV (criterion of convergence) is the same as the OPTOLERANCE criterion in CNLR.
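As a small worked example of the default for ITER, the following sketch computes $s$ from $p$ and $q$ and then the default iteration limit. The function names are hypothetical.

```python
def total_parameters(p, q, estimate_natural_rate):
    """s = p + q, plus 1 when the natural response rate is estimated."""
    return p + q + (1 if estimate_natural_rate else 0)

def default_iter(s):
    """Default maximum number of iterations: max(50, 3(s + 1))."""
    return max(50, 3 * (s + 1))

s = total_parameters(p=2, q=3, estimate_natural_rate=True)
print(s, default_iter(s))
```

The limit of 50 applies until $3(s + 1)$ exceeds it, which only happens for models with 16 or more parameters.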

Asymptotic Covariance Matrix

The asymptotic covariance matrix for the MLE $(\hat{c}, \hat{\beta}_1, \ldots, \hat{\beta}_{p+q})$ is estimated by $I^{-1}$, where $I$ is the information matrix containing the negatives of the second partial derivatives of $L$.
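The second partial derivatives are obtained by differentiating the first derivatives of $L$ once more. As a reasoning aid, the first derivatives follow from the chain rule applied to the log likelihood and to the dose-response model for $P_i$; the shorthand $F_i = F(X_i'\beta)$ is introduced here only for brevity.

```latex
\frac{\partial L}{\partial P_i} = \frac{r_i}{P_i} - \frac{n_i - r_i}{1 - P_i}
  = \frac{r_i - n_i P_i}{P_i (1 - P_i)}, \qquad
\frac{\partial P_i}{\partial c} = 1 - F_i, \qquad
\frac{\partial P_i}{\partial \beta_j} = (1 - c)\, x_{ij}\, \frac{dF_i}{d(X_i'\beta)}

\frac{\partial L}{\partial c}
  = \sum_{i=1}^{m} \frac{r_i - n_i P_i}{P_i (1 - P_i)}\, (1 - F_i), \qquad
\frac{\partial L}{\partial \beta_j}
  = (1 - c) \sum_{i=1}^{m} \frac{r_i - n_i P_i}{P_i (1 - P_i)}\, x_{ij}\, \frac{dF_i}{d(X_i'\beta)}
```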

$$\frac{\partial^2 L}{\partial c^2} = -\sum_{i=1}^{m} \frac{r_i - 2 r_i P_i + n_i P_i^2}{P_i^2 (1 - P_i)^2} \left(1 - F(X_i'\beta)\right)^2$$
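A closed form for $\partial^2 L / \partial c^2$ can be checked against a finite-difference approximation of the log likelihood. The sketch below holds $F_i = F(X_i'\beta)$ fixed and varies only $c$; the data values and function names are hypothetical, and the analytic expression used is the one implied by $L$ and $P_i = c + (1 - c) F_i$.

```python
import math

def log_lik_c(c, r, n, F):
    """L as a function of c, with F_i = F(X_i' beta) held fixed."""
    total = 0.0
    for r_i, n_i, F_i in zip(r, n, F):
        P_i = c + (1.0 - c) * F_i
        total += r_i * math.log(P_i) + (n_i - r_i) * math.log(1.0 - P_i)
    return total

def d2L_dc2(c, r, n, F):
    """Analytic second derivative of L with respect to c."""
    total = 0.0
    for r_i, n_i, F_i in zip(r, n, F):
        P_i = c + (1.0 - c) * F_i
        num = r_i - 2.0 * r_i * P_i + n_i * P_i ** 2
        total -= num / (P_i ** 2 * (1.0 - P_i) ** 2) * (1.0 - F_i) ** 2
    return total

# Hypothetical values for three covariate patterns
r, n, F = [3, 6, 9], [10, 10, 10], [0.2, 0.5, 0.8]
c, h = 0.1, 1e-4

# Central second difference of L in c
numeric = (log_lik_c(c + h, r, n, F) - 2.0 * log_lik_c(c, r, n, F)
           + log_lik_c(c - h, r, n, F)) / h ** 2
print(d2L_dc2(c, r, n, F), numeric)
```

The two printed values should agree to several digits; the negative of this quantity is the $(c, c)$ entry of the information matrix.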

$$\frac{\partial^2 L}{\partial c\, \partial \beta_j} = -\sum_{i=1}^{m} x_{ij} \left[ (1 - c)\left(1 - F(X_i'\beta)\right) \frac{r_i - 2 r_i P_i + n_i P_i^2}{P_i^2 (1 - P_i)^2} + \frac{r_i - n_i P_i}{P_i (1 - P_i)} \right] \frac{dF(X_i'\beta)}{d(X_i'\beta)}$$

where

$$\frac{dF(X_i'\beta)}{d(X_i'\beta)} = \begin{cases} F(X_i'\beta)\left(1 - F(X_i'\beta)\right) & \text{if logit model} \\[1.5ex] \dfrac{1}{\sqrt{2\pi}} \exp\left\{ -\dfrac{(X_i'\beta)^2}{2} \right\} & \text{if probit model} \end{cases}$$

$$\frac{\partial^2 L}{\partial \beta_j\, \partial \beta_h} = -(1 - c)^2 \sum_{i=1}^{m} \frac{r_i - 2 r_i P_i + n_i P_i^2}{P_i^2 (1 - P_i)^2}\, x_{ij} x_{ih} \left( \frac{dF(X_i'\beta)}{d(X_i'\beta)} \right)^2 + (1 - c) \sum_{i=1}^{m} \frac{r_i - n_i P_i}{P_i (1 - P_i)}\, x_{ij} x_{ih}\, \frac{d^2 F(X_i'\beta)}{d(X_i'\beta)^2}$$

where

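$$\frac{d^2 F(X_i'\beta)}{d(X_i'\beta)^2} = \begin{cases} F(X_i'\beta)\left(1 - F(X_i'\beta)\right)\left(1 - 2F(X_i'\beta)\right) & \text{if logit model} \\[1.5ex] -\dfrac{X_i'\beta}{\sqrt{2\pi}} \exp\left\{ -\dfrac{(X_i'\beta)^2}{2} \right\} & \text{if probit model} \end{cases}$$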
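The first and second derivatives of $F$ for the two models can be verified numerically. The following sketch, with hypothetical function names, implements the logistic and standard normal forms and compares each derivative with a central finite difference of the function one level below it.

```python
import math

def F(eta, model):
    """Logistic CDF for 'logit', standard normal CDF otherwise (probit)."""
    if model == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def dF(eta, model):
    """First derivative of F with respect to eta = X_i' beta."""
    if model == "logit":
        f = F(eta, model)
        return f * (1.0 - f)
    return math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)

def d2F(eta, model):
    """Second derivative of F with respect to eta."""
    if model == "logit":
        f = F(eta, model)
        return f * (1.0 - f) * (1.0 - 2.0 * f)
    return -eta * math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)

def central_diff(g, eta, model, h=1e-5):
    """Central finite-difference approximation to dg/d(eta)."""
    return (g(eta + h, model) - g(eta - h, model)) / (2.0 * h)

for model in ("logit", "probit"):
    eta = 0.7   # arbitrary test point
    print(model,
          dF(eta, model), central_diff(F, eta, model),
          d2F(eta, model), central_diff(dF, eta, model))
```

In each printed row the analytic and finite-difference values should match closely, which is a quick way to catch sign errors when coding the information matrix.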

Frequency Table and Goodness of Fit

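For every covariate pattern $i$, $i = 1, \ldots, m$, compute

$$\hat{F}_i = \begin{cases} \dfrac{e^{X_i'\hat{\beta}}}{1 + e^{X_i'\hat{\beta}}} & \text{if logit model} \\[2ex] \displaystyle\int_{-\infty}^{X_i'\hat{\beta}} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz & \text{if probit model} \end{cases}$$

$$\hat{P}_i = \hat{c} + (1 - \hat{c})\, \hat{F}_i$$

Then the expected frequency is equal to

$$\hat{E}_i = n_i \hat{P}_i$$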
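The frequency table can be assembled by applying these steps to each covariate pattern. The sketch below is illustrative: the function names and data are hypothetical, the fitted linear predictors $X_i'\hat{\beta}$ are passed in directly as `eta_hat`, and the residual column (observed minus expected) is an added convenience rather than something defined above.

```python
import math

def fitted_probability(eta_hat, c_hat, model="probit"):
    """P_hat = c_hat + (1 - c_hat) * F_hat for one fitted linear predictor."""
    if model == "logit":
        F_hat = 1.0 / (1.0 + math.exp(-eta_hat))
    else:  # probit: standard normal CDF via the error function
        F_hat = 0.5 * (1.0 + math.erf(eta_hat / math.sqrt(2.0)))
    return c_hat + (1.0 - c_hat) * F_hat

def frequency_table(n, r, eta_hat, c_hat, model="probit"):
    """One row per covariate pattern with observed and expected frequencies."""
    rows = []
    for n_i, r_i, eta_i in zip(n, r, eta_hat):
        P_i = fitted_probability(eta_i, c_hat, model)
        E_i = n_i * P_i                       # expected frequency
        rows.append({"n": n_i, "observed": r_i, "prob": P_i,
                     "expected": E_i, "residual": r_i - E_i})
    return rows

# Hypothetical fit: two covariate patterns, natural response rate 0.1
for row in frequency_table([20, 25], [12, 21], [0.0, 1.0], c_hat=0.1):
    print(row)
```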

The Pearson chi-square statistic is defined by

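The lower and upper limits for the estimate $\hat{x}$ are

$$\hat{x} + \frac{g}{1 - g}\left( \hat{x} + \frac{\operatorname{cov}(\hat{\alpha}_j, \hat{\gamma})}{v(\hat{\gamma})} \right) \pm \frac{t}{\hat{\gamma}(1 - g)} \left\{ h^{*} \left[ v(\hat{\alpha}_j) + 2\hat{x}\, \operatorname{cov}(\hat{\alpha}_j, \hat{\gamma}) + \hat{x}^2\, v(\hat{\gamma}) - g \left( v(\hat{\alpha}_j) - \frac{\operatorname{cov}(\hat{\alpha}_j, \hat{\gamma})^2}{v(\hat{\gamma})} \right) \right] \right\}^{1/2}$$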

where

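$$g = \begin{cases} \dfrac{t^2\, v(\hat{\gamma})}{\hat{\gamma}^2} & \text{without heterogeneity factor} \\[2ex] \dfrac{t^2\, h\, v(\hat{\gamma})}{\hat{\gamma}^2} & \text{with heterogeneity factor} \end{cases}$$

$$t = \begin{cases} 1.96 & \text{without heterogeneity factor} \\ t(0.025,\, df) & \text{with heterogeneity factor} \end{cases}$$

$$h = \chi^2 / df$$

$$h^{*} = \begin{cases} 1 & \text{without heterogeneity factor} \\ h & \text{with heterogeneity factor} \end{cases}$$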

The heterogeneity factor is used if the Pearson chi-square statistic is significant.

Note: If the covariate (predictor variable) $x$ is transformed, transform it back to the original metric for the estimate and its two limits. For example, if $\log_{10}$ is applied to the predictor for the analysis and $\hat{x}_L$, $\hat{x}$, and $\hat{x}_U$ are the lower limit, the estimate, and the upper limit on the $\log_{10}$ scale, then $10^{\hat{x}_L}$ and $10^{\hat{x}_U}$ are the lower and upper limits on the original scale.
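The limit calculation and the back-transformation can be sketched together. The code below is an illustration under stated assumptions, not the procedure's code: it takes the limits to have the Fieller-type form $\hat{x} + \frac{g}{1-g}(\hat{x} + \operatorname{cov}/v(\hat{\gamma})) \pm \frac{t}{\hat{\gamma}(1-g)}\{h^{*}[\cdot]\}^{1/2}$, expects the caller to supply $t$, treats `h=None` as "no heterogeneity factor", and refuses to return limits when $g \ge 1$, where this form does not give a bounded interval. All names and numbers are hypothetical.

```python
import math

def fiducial_limits(x_hat, gamma_hat, v_alpha, cov_alpha_gamma, v_gamma, t, h=None):
    """Lower and upper limits for x_hat; h is the heterogeneity factor or None."""
    h_star = 1.0 if h is None else h
    g = t * t * h_star * v_gamma / gamma_hat ** 2
    if g >= 1.0:
        raise ValueError("g >= 1: this form does not give finite limits")
    centre = x_hat + g / (1.0 - g) * (x_hat + cov_alpha_gamma / v_gamma)
    inner = (v_alpha + 2.0 * x_hat * cov_alpha_gamma + x_hat ** 2 * v_gamma
             - g * (v_alpha - cov_alpha_gamma ** 2 / v_gamma))
    # abs() only fixes which limit is reported as lower and which as upper
    half = t / (abs(gamma_hat) * (1.0 - g)) * math.sqrt(h_star * inner)
    return centre - half, centre + half

def back_transform_log10(value):
    """Undo a log10 transformation of the predictor."""
    return 10.0 ** value

# Hypothetical estimates on the log10 scale
lower, upper = fiducial_limits(x_hat=1.2, gamma_hat=2.0, v_alpha=0.04,
                               cov_alpha_gamma=-0.03, v_gamma=0.05, t=1.96)
print(lower, upper)
print(back_transform_log10(lower), back_transform_log10(upper))
```

Passing a heterogeneity factor greater than 1 inflates both $g$ and the term under the square root, so the interval widens for the same value of $t$.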

Relative Median Potency

The relative median potency is available when there is a factor variable and the covariate is transformed. It is not available if there is no factor variable or if there is more than one covariate. The estimate of relative median potency for group j versus group k is

$$M_{jk} = (\hat{\alpha}_k - \hat{\alpha}_j) / \hat{\gamma}$$

and its 95% confidence limit is

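$$M_{jk} + \frac{g}{1 - g}\left( M_{jk} + \frac{v_{12}}{v_{22}} \right) \pm \frac{t}{\hat{\gamma}(1 - g)} \left\{ h^{*} \left[ v_{11} + 2 M_{jk} v_{12} + M_{jk}^2 v_{22} - g \left( v_{11} - \frac{v_{12}^2}{v_{22}} \right) \right] \right\}^{1/2}$$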

where

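$$v_{11} = v(\hat{\alpha}_j) + v(\hat{\alpha}_k) - 2\operatorname{cov}(\hat{\alpha}_j, \hat{\alpha}_k)$$

$$v_{12} = \operatorname{cov}(\hat{\alpha}_j, \hat{\gamma}) - \operatorname{cov}(\hat{\alpha}_k, \hat{\gamma})$$

$$v_{22} = v(\hat{\gamma})$$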

Note: If the covariate (predictor variable) x is transformed, transform it back to the original metrics for the relative median potency.
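The relative median potency and the variance terms it needs can be computed from the parameter estimates and their covariance entries. The sketch below is a hedged illustration: the function names and numbers are hypothetical, and the definitions assumed are $v_{11} = v(\hat{\alpha}_j) + v(\hat{\alpha}_k) - 2\operatorname{cov}(\hat{\alpha}_j, \hat{\alpha}_k)$, $v_{12} = \operatorname{cov}(\hat{\alpha}_j, \hat{\gamma}) - \operatorname{cov}(\hat{\alpha}_k, \hat{\gamma})$, and $v_{22} = v(\hat{\gamma})$. The back-transformation shown assumes a $\log_{10}$ transformation of the covariate.

```python
def relative_median_potency(alpha_j, alpha_k, gamma_hat):
    """M_jk = (alpha_k - alpha_j) / gamma on the transformed scale."""
    return (alpha_k - alpha_j) / gamma_hat

def rmp_variance_terms(v_aj, v_ak, cov_aj_ak, cov_aj_g, cov_ak_g, v_g):
    """Return (v11, v12, v22) under the definitions assumed in the lead-in."""
    v11 = v_aj + v_ak - 2.0 * cov_aj_ak
    v12 = cov_aj_g - cov_ak_g
    v22 = v_g
    return v11, v12, v22

# Hypothetical estimates for groups j and k
M_jk = relative_median_potency(alpha_j=0.5, alpha_k=1.5, gamma_hat=2.0)
v11, v12, v22 = rmp_variance_terms(v_aj=0.04, v_ak=0.05, cov_aj_ak=0.01,
                                   cov_aj_g=-0.02, cov_ak_g=-0.03, v_g=0.06)
print(M_jk, v11, v12, v22)

# If log10 was applied to the covariate, report the potency on the original scale
print(10.0 ** M_jk)
```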

Parallelism Test Chi-Square Statistic

The parallelism test is available only if there is a factor variable.

$$\chi^2_{\text{parallelism}} = \chi^2 - \sum_{j=1}^{q} \chi^2_j$$