





An overview of the probit procedure, a statistical method used to estimate the effects of independent variables on a dichotomous dependent variable. It covers the notation, assumptions, and maximum-likelihood estimation process of the probit model. Particularly useful for students and researchers in statistics, econometrics, and related fields.
Typology: Study notes






The Probit procedure is used to estimate the effects of one or more independent variables on a dichotomous dependent variable. The program is designed for dose-response analyses and related models, but Probit can also estimate logistic regression models.
Notation
The following notation is used throughout this chapter unless otherwise stated:
$m$: Number of covariate patterns
$n_i$: Number of subjects for the $i$th covariate pattern
$r_i$: Number of responses for the $i$th covariate pattern
$p$: Number of independent variables
$q$: Number of levels of the grouping variable; $q = 0$ when there is no grouping variable
$c$: Natural response rate
$X$: $n \times (p + q)$ matrix with element $x_{ij}$, which represents the $j$th covariate for the $i$th covariate pattern
$\gamma$: $p \times 1$ vector with element $\gamma_j$, the parameter for the $j$th independent variable
$\alpha$: $q \times 1$ vector with element $\alpha_j$, the parameter for the $j$th level of the grouping variable
$\beta$: $(p + q) \times 1$ vector which is a composite of $\gamma$ and $\alpha$
$s$: Total number of parameters in the model, equal to $p + q$ if the natural response rate is set to a constant and $p + q + 1$ if the natural response rate is estimated by the model
1 This algorithm applies to SPSS 5.0 and later releases. To learn about algorithms for previous releases, call SPSS Technical Support.
The model assumes a dichotomous dependent variable with probability $P$ for the event of interest. Because the procedure assumes that the data are aggregated by covariate pattern, the number of responses $y_i$ for the $i$th covariate pattern follows a binomial distribution:
$$P(y_i = r_i) = \binom{n_i}{r_i} P_i^{\,r_i} (1 - P_i)^{\,n_i - r_i}, \qquad i = 1, \ldots, m$$
Hence, the log likelihood, L , for m observations after ignoring the constant factor can be written as
$$L = \sum_{i=1}^{m} \left[ r_i \ln(P_i) + (n_i - r_i) \ln(1 - P_i) \right]$$
For dose-response models, it is further assumed that
$$P_i = c + (1 - c)\, F(X_i'\beta) \qquad (1)$$
where $X_i$ is the vector of covariates for the $i$th covariate pattern and $F(X_i'\beta)$ has two forms:
$$F(X_i'\beta) = \begin{cases} \dfrac{1}{1 + e^{-X_i'\beta}} & \text{if logit model} \\[2ex] \displaystyle\int_{-\infty}^{X_i'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz & \text{if probit model} \end{cases}$$
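As a minimal sketch of how equation (1) and the log likelihood fit together, the following Python functions evaluate $F$, $P_i$, and $L$ for aggregated data. The function and argument names (`link_cdf`, `response_prob`, `log_likelihood`, `X`, `r`, `n`) are illustrative assumptions, not part of the Probit procedure, and the probit integral is evaluated through the error function rather than by numerical integration.

```python
import math

def link_cdf(eta, model="probit"):
    """Return F(eta) for the logit or probit form."""
    if model == "logit":
        # Two algebraically equal branches that avoid overflow in exp().
        if eta >= 0:
            return 1.0 / (1.0 + math.exp(-eta))
        z = math.exp(eta)
        return z / (1.0 + z)
    if model == "probit":
        # Standard normal CDF expressed with the error function.
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    raise ValueError("model must be 'logit' or 'probit'")

def response_prob(x_row, beta, c=0.0, model="probit"):
    """P_i = c + (1 - c) F(X_i' beta), with c the natural response rate."""
    eta = sum(x * b for x, b in zip(x_row, beta))
    return c + (1.0 - c) * link_cdf(eta, model)

def log_likelihood(X, r, n, beta, c=0.0, model="probit"):
    """Sum of r_i ln(P_i) + (n_i - r_i) ln(1 - P_i), constant term dropped.

    Assumes every P_i lies strictly between 0 and 1.
    """
    total = 0.0
    for x_row, r_i, n_i in zip(X, r, n):
        p = response_prob(x_row, beta, c, model)
        total += r_i * math.log(p) + (n_i - r_i) * math.log(1.0 - p)
    return total
```

Each row of `X` is one covariate pattern, so `r[i]` responses out of `n[i]` subjects contribute a single term to the sum.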
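When there is no grouping variable, $x_{ij}$ is simply the observed value of the $j$th independent variable for the $i$th covariate pattern, and $\beta = \gamma$. When there is a grouping variable, a set of indicator variables is constructed. There will be $q$ indicator variables, one for each level of the grouping variable, and $q$ corresponding parameters (the elements of $\alpha$) added to the $\beta$ vector.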
When the user specifies a fixed number for the natural response rate, $L_c^{*}$ is set to 0 for iterations and the bound for $c$ is set equal to the fixed number.
The initial value of $c$, designated by $c_0$, is set to the ratio of the number of responses to the number of subjects for the control group. If there is no control group, then $c_0$ is set to the minimum ratio of the number of responses to the number of subjects, over all covariate patterns.
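The choice of $c_0$ described above can be sketched in a few lines. The function name `initial_natural_response` and the `control_index` argument are hypothetical; in particular, identifying the control group by its position in the list of covariate patterns is an assumption made only for illustration.

```python
def initial_natural_response(r, n, control_index=None):
    """Return c_0 from response counts r and subject counts n.

    control_index is the position of the control group among the
    covariate patterns, or None when there is no control group
    (an assumed convention, not the procedure's own interface).
    """
    ratios = [r_i / n_i for r_i, n_i in zip(r, n)]
    if control_index is not None:
        # Ratio of responses to subjects in the control group.
        return ratios[control_index]
    # No control group: minimum ratio over all covariate patterns.
    return min(ratios)
```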
Users can control two criteria, ITER and CONV. ITER is the maximum number of iterations. CONV (the criterion for convergence) is the same as the OPTOLERANCE criterion in CNLR.
The asymptotic covariance matrix for the MLE $(\hat{c}, \hat{\beta}_1, \ldots, \hat{\beta}_{p+q})$ is estimated by $I^{-1}$, where $I$ is the information matrix containing the negatives of the second partial derivatives of $L$.
$$-\frac{\partial^2 L}{\partial c^2} = \sum_{i=1}^{m} \left[ \frac{r_i - 2 r_i P_i + n_i P_i^2}{P_i^2 (1 - P_i)^2} \right] \left( 1 - F(X_i'\beta) \right)^2$$
$$-\frac{\partial^2 L}{\partial c\, \partial \beta_j} = \sum_{i=1}^{m} x_{ij} \left[ (1 - c) \left( 1 - F(X_i'\beta) \right) \frac{r_i - 2 r_i P_i + n_i P_i^2}{P_i^2 (1 - P_i)^2} + \frac{r_i - n_i P_i}{P_i (1 - P_i)} \right] \frac{dF(X_i'\beta)}{d(X_i'\beta)}$$
where
$$\frac{dF(X_i'\beta)}{d(X_i'\beta)} = \begin{cases} F(X_i'\beta) \left[ 1 - F(X_i'\beta) \right] & \text{if logit model} \\[1ex] \dfrac{1}{\sqrt{2\pi}} \exp\left\{ -\dfrac{(X_i'\beta)^2}{2} \right\} & \text{if probit model} \end{cases}$$
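$$-\frac{\partial^2 L}{\partial \beta_j\, \partial \beta_h} = (1 - c)^2 \sum_{i=1}^{m} \left[ \frac{r_i - 2 r_i P_i + n_i P_i^2}{P_i^2 (1 - P_i)^2} \right] x_{ij} x_{ih} \left( \frac{dF(X_i'\beta)}{d(X_i'\beta)} \right)^2 - (1 - c) \sum_{i=1}^{m} \left[ \frac{r_i - n_i P_i}{P_i (1 - P_i)} \right] x_{ij} x_{ih} \frac{d^2 F(X_i'\beta)}{d(X_i'\beta)^2}$$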
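where
$$\frac{d^2 F(X_i'\beta)}{d(X_i'\beta)^2} = \begin{cases} F(X_i'\beta) \left[ 1 - F(X_i'\beta) \right] \left[ 1 - 2 F(X_i'\beta) \right] & \text{if logit model} \\[1ex] -\dfrac{X_i'\beta}{\sqrt{2\pi}} \exp\left\{ -\dfrac{(X_i'\beta)^2}{2} \right\} & \text{if probit model} \end{cases}$$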
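To make the structure of the information matrix concrete, here is a hedged Python sketch that accumulates the negatives of the second partial derivatives of $L$ with respect to $(c, \beta_1, \ldots)$. It assumes $P_i = c + (1 - c) F(X_i'\beta)$ as in equation (1) and obtains each entry by differentiating the log likelihood directly. All names (`information_matrix`, `dF`, `d2F`, and the argument names) are illustrative assumptions; this is not the procedure's own code.

```python
import math

def F(eta, model):
    """Logit or probit response function."""
    if model == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def dF(eta, model):
    """First derivative of F with respect to eta = X_i' beta."""
    if model == "logit":
        f = F(eta, model)
        return f * (1.0 - f)
    return math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)

def d2F(eta, model):
    """Second derivative of F with respect to eta."""
    if model == "logit":
        f = F(eta, model)
        return f * (1.0 - f) * (1.0 - 2.0 * f)
    return -eta * math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)

def information_matrix(X, r, n, beta, c, model="probit"):
    """Negative Hessian of L, rows and columns ordered (c, beta_1, ..., beta_k)."""
    k = len(beta)
    info = [[0.0] * (k + 1) for _ in range(k + 1)]
    for x, r_i, n_i in zip(X, r, n):
        eta = sum(a * b for a, b in zip(x, beta))
        f, f1, f2 = F(eta, model), dF(eta, model), d2F(eta, model)
        p = c + (1.0 - c) * f
        # a is the negative second derivative of one term of L w.r.t. P_i;
        # s is the first derivative of that term w.r.t. P_i.
        a = (r_i - 2.0 * r_i * p + n_i * p * p) / (p * p * (1.0 - p) ** 2)
        s = (r_i - n_i * p) / (p * (1.0 - p))
        info[0][0] += a * (1.0 - f) ** 2
        for j in range(k):
            cross = x[j] * ((1.0 - c) * (1.0 - f) * a + s) * f1
            info[0][j + 1] += cross
            info[j + 1][0] += cross
            for h in range(k):
                info[j + 1][h + 1] += x[j] * x[h] * (
                    (1.0 - c) ** 2 * a * f1 * f1 - (1.0 - c) * s * f2
                )
    return info
```

Inverting the returned matrix (for example with a linear algebra library) would give the estimated asymptotic covariance matrix. When the natural response rate is fixed rather than estimated, the first row and column would be dropped before inversion, which is an inference from the parameter count $s$ rather than something the source states.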
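For every covariate pattern $i$, $i = 1, \ldots, m$, compute
$$F(X_i'\hat{\beta}) = \begin{cases} \dfrac{1}{1 + e^{-X_i'\hat{\beta}}} & \text{if logit model} \\[2ex] \displaystyle\int_{-\infty}^{X_i'\hat{\beta}} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz & \text{if probit model} \end{cases}$$
so that, by equation (1), the fitted probability is $\hat{P}_i = \hat{c} + (1 - \hat{c})\, F(X_i'\hat{\beta})$.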
Then the expected frequency is equal to
$$\hat{E}_i = n_i \hat{P}_i$$
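The Pearson chi-square statistic is defined by
$$\chi^2 = \sum_{i=1}^{m} \frac{(r_i - n_i \hat{P}_i)^2}{n_i \hat{P}_i (1 - \hat{P}_i)}$$
The 95% confidence limits for an estimated covariate value $\hat{x}$ in group $j$ are
$$\hat{x} + \frac{g}{1 - g} \left( \hat{x} + \frac{\operatorname{cov}(\hat{\alpha}_j, \hat{\gamma})}{v(\hat{\gamma})} \right) \pm \frac{t}{\hat{\gamma} (1 - g)} \left\{ h \left[ v(\hat{\alpha}_j) + 2 \hat{x} \operatorname{cov}(\hat{\alpha}_j, \hat{\gamma}) + \hat{x}^2 v(\hat{\gamma}) - g \left( v(\hat{\alpha}_j) - \frac{\operatorname{cov}(\hat{\alpha}_j, \hat{\gamma})^2}{v(\hat{\gamma})} \right) \right] \right\}^{1/2}$$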
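where
$$g = \begin{cases} t^2\, v(\hat{\gamma}) / \hat{\gamma}^2 & \text{without heterogeneity factor} \\ t^2 h\, v(\hat{\gamma}) / \hat{\gamma}^2 & \text{with heterogeneity factor} \end{cases}$$
$$t = \begin{cases} 1.96 & \text{without heterogeneity factor} \\ t_{0.025,\, df} & \text{with heterogeneity factor} \end{cases}$$
$$h = \begin{cases} 1 & \text{without heterogeneity factor} \\ \chi^2 / df & \text{with heterogeneity factor} \end{cases}$$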
The heterogeneity factor is used if the Pearson chi-square statistic is significant.
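The goodness-of-fit quantities can be sketched as follows. This assumes the usual Pearson form for grouped binomial data, $\sum_i (r_i - n_i \hat{P}_i)^2 / [n_i \hat{P}_i (1 - \hat{P}_i)]$, and a heterogeneity factor equal to the chi-square divided by its degrees of freedom. The function names and the `use_factor` flag are hypothetical, and deciding whether the statistic is significant is left to the caller, since that requires a chi-square distribution function that this sketch does not implement.

```python
def pearson_chi_square(r, n, p_hat):
    """Pearson chi-square from observed counts r, group sizes n, fitted probabilities p_hat."""
    total = 0.0
    for r_i, n_i, p_i in zip(r, n, p_hat):
        expected = n_i * p_i  # expected frequency for this covariate pattern
        total += (r_i - expected) ** 2 / (expected * (1.0 - p_i))
    return total

def heterogeneity_factor(chi_square, df, use_factor):
    """Return chi_square / df when the factor is used, otherwise 1."""
    return chi_square / df if use_factor else 1.0
```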
Note: If the covariate (predictor variable) $x$ is transformed, transform it back to the original metric for the estimate and its two limits. For example, if $\log_{10}$ is applied to the predictor for the analysis and $\hat{x}_L$, $\hat{x}$, and $\hat{x}_U$ are the lower limit, the estimate, and the upper limit on the $\log_{10}$ scale, then $10^{\hat{x}_L}$ and $10^{\hat{x}_U}$ are the lower and upper limits on the original scale.
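The sketch below shows one way to compute limits of this kind and then back-transform them, assuming the limits take the Fieller-type form for a ratio of estimates with $g = t^2 h\, v(\hat{\gamma}) / \hat{\gamma}^2$. The function names, argument names, and the default $t = 1.96$ are assumptions for illustration; with a heterogeneity factor the caller would pass the appropriate $t$ quantile and $h$.

```python
import math

def fieller_limits(x_hat, gamma_hat, var_alpha, cov_alpha_gamma, var_gamma,
                   t=1.96, h=1.0):
    """Lower and upper limits for x_hat, assuming a Fieller-type interval."""
    g = t * t * h * var_gamma / gamma_hat ** 2
    if g >= 1.0:
        # The slope is not clearly different from zero; no finite interval.
        raise ValueError("g >= 1: finite limits do not exist")
    centre = x_hat + g / (1.0 - g) * (x_hat + cov_alpha_gamma / var_gamma)
    inside = (var_alpha + 2.0 * x_hat * cov_alpha_gamma + x_hat ** 2 * var_gamma
              - g * (var_alpha - cov_alpha_gamma ** 2 / var_gamma))
    half_width = t / (abs(gamma_hat) * (1.0 - g)) * math.sqrt(h * inside)
    return centre - half_width, centre + half_width

def back_transform_log10(values):
    """Map values from the log10 scale back to the original scale."""
    return tuple(10.0 ** v for v in values)
```

For a covariate analysed on the $\log_{10}$ scale, `back_transform_log10((lower, x_hat, upper))` returns the limits and the estimate in the original units.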
The relative median potency is available when there is a factor variable and the covariate is transformed. It is not available if there is no factor variable or if there is more than one covariate. The estimate of relative median potency for group j versus group k is
$$M_{jk} = (\hat{\alpha}_k - \hat{\alpha}_j) / \hat{\gamma}$$
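and its 95% confidence limits are
$$M_{jk} + \frac{g}{1 - g} \left( M_{jk} + \frac{v_{12}}{v_{22}} \right) \pm \frac{t}{\hat{\gamma} (1 - g)} \left\{ h \left[ v_{11} + 2 M_{jk} v_{12} + M_{jk}^2 v_{22} - g \left( v_{11} - \frac{v_{12}^2}{v_{22}} \right) \right] \right\}^{1/2}$$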
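where
$$v_{11} = v(\hat{\alpha}_j) + v(\hat{\alpha}_k) - 2 \operatorname{cov}(\hat{\alpha}_j, \hat{\alpha}_k)$$
$$v_{12} = \operatorname{cov}(\hat{\alpha}_j, \hat{\gamma}) - \operatorname{cov}(\hat{\alpha}_k, \hat{\gamma})$$
$$v_{22} = v(\hat{\gamma})$$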
Note: If the covariate (predictor variable) $x$ is transformed, transform the relative median potency back to the original metric.
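A hedged sketch of the relative median potency calculation follows. It assumes $M_{jk} = (\hat{\alpha}_k - \hat{\alpha}_j)/\hat{\gamma}$ and Fieller-type limits built from $v_{11} = v(\hat{\alpha}_j) + v(\hat{\alpha}_k) - 2\operatorname{cov}(\hat{\alpha}_j, \hat{\alpha}_k)$, $v_{12} = \operatorname{cov}(\hat{\alpha}_j, \hat{\gamma}) - \operatorname{cov}(\hat{\alpha}_k, \hat{\gamma})$, and $v_{22} = v(\hat{\gamma})$. The function name and the layout of the covariance matrix argument are hypothetical.

```python
import math

def rmp_with_limits(alpha_j, alpha_k, gamma, cov, t=1.96, h=1.0):
    """Return (M_jk, lower, upper).

    cov is a 3x3 covariance matrix ordered (alpha_j, alpha_k, gamma);
    this ordering is an assumption of the sketch.
    """
    m = (alpha_k - alpha_j) / gamma
    v11 = cov[0][0] + cov[1][1] - 2.0 * cov[0][1]
    v12 = cov[0][2] - cov[1][2]
    v22 = cov[2][2]
    g = t * t * h * v22 / gamma ** 2
    if g >= 1.0:
        raise ValueError("g >= 1: finite limits do not exist")
    centre = m + g / (1.0 - g) * (m + v12 / v22)
    inside = v11 + 2.0 * m * v12 + m * m * v22 - g * (v11 - v12 ** 2 / v22)
    half_width = t / (abs(gamma) * (1.0 - g)) * math.sqrt(h * inside)
    return m, centre - half_width, centre + half_width
```

If the covariate was analysed on the $\log_{10}$ scale, raising 10 to each returned value gives the potency and its limits in the original metric.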
The parallelism test is available only if there is a factor variable.
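(The test statistic involves a sum over the $q$ levels of the factor, $\sum_{j=1}^{q}$; the remaining terms of the formula are missing from the source.)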