Homework Assignment: Statistical Inference and Model Comparison - Prof. John Monahan, Assignments of Statistics

A series of simulation study problems on statistical inference and model comparison. Topics include: estimators and their asymptotics, standard errors under heteroskedasticity, error structure in PD/PK models, testing variance components, and analysis of rate statistics. Students are expected to compare different estimators, standard errors under heteroskedasticity, and model performance.

Typology: Assignments

Pre 2010

Uploaded on 03/18/2009

Homework #7 – Simulation Study Problems

ST790R

01 November 2008

1 Least Three-halves Estimator

Recall from Exercise 8.13 the estimator $\tilde\mu$ that minimizes $\sum_i |X_i - \mu|^{3/2}$. The asymptotics for this estimator should follow

$$\sqrt{n}\,(\tilde\mu - \mu) \approx \mathrm{Normal}(0,\ a/b)$$

where $a = \int |x|\, f(x)\,dx$ is estimated by $\sum_i |X_i - \tilde\mu|$ and $b = \left[\int |x|^{1/2} f'(x)\,dx\right]^2$ is estimated by $\left[\sum_i |X_i - \tilde\mu|^{-1/2}/2\right]^2$.

Compare this estimator to other location estimators; see where asymptotics apply.
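One way to set up this comparison is sketched below in Python with NumPy. The function names, the golden-section search, and the standard normal data are illustrative choices, not part of the assignment. The plug-in standard error assumes the sums in the estimates of $a$ and $b$ are meant as sample averages (divided by $n$).

```python
import numpy as np

def least_three_halves(x, tol=1e-10, max_iter=200):
    """Location estimate minimizing sum |x_i - m|^(3/2) by golden-section search."""
    x = np.asarray(x, dtype=float)

    def objective(m):
        return np.sum(np.abs(x - m) ** 1.5)

    lo, hi = x.min(), x.max()  # the convex objective attains its minimum inside the data range
    ratio = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if objective(c) < objective(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return 0.5 * (lo + hi)

def asymptotic_se(x, m):
    """Plug-in standard error sqrt(a_hat / (b_hat * n)); ASSUMES averages, not raw sums."""
    dev = np.abs(np.asarray(x, dtype=float) - m)
    a_hat = dev.mean()
    b_hat = (np.mean(dev ** -0.5) / 2.0) ** 2
    return np.sqrt(a_hat / (b_hat * len(dev)))

def simulate(n=50, reps=500, seed=0):
    """n times the Monte Carlo variance of the mean, median, and least three-halves estimates."""
    rng = np.random.default_rng(seed)
    est = np.empty((reps, 3))
    for r in range(reps):
        x = rng.standard_normal(n)  # illustrative truth: standard normal, center 0
        est[r] = [x.mean(), np.median(x), least_three_halves(x)]
    return n * est.var(axis=0)
```

Calling `simulate()` for several values of `n` and other error distributions (heavier tails, for example) indicates where the asymptotic approximation starts to hold. `asymptotic_se` can be compared against the Monte Carlo standard deviation in the same runs.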

2 Standard Errors under Heteroskedasticity

The sandwich covariance estimate in Chapter 9 is a generalization of some work by Halbert White (among others) on the effect of heteroskedasticity (different variances) in multiple regression. Under standard (homoskedastic) assumptions with $\mathrm{Var}(e_i) = \sigma^2$, the covariance matrix of the parameter estimates is the usual $\sigma^2 (X^T X)^{-1}$; sought is a consistent estimator under heteroskedasticity. One proposed estimator is

$$H_1 = \frac{n}{n-p}\,(X^T X)^{-1} (X^T \Omega_1 X) (X^T X)^{-1}$$

where $\Omega_1 = \mathrm{diag}\{\hat e_i\}$ and $\hat e_i$, $i = 1, \ldots, N$, are residuals. A second estimator does a different correction

$$H_2 = (X^T X)^{-1} (X^T \Omega_2 X) (X^T X)^{-1}$$

where $\Omega_2 = \mathrm{diag}\{\hat e_i/(1 - (P_X)_{ii})\}$, with $P_X = X(X^T X)^{-1}X^T$ the projection matrix. Compare these estimators with the usual.
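The three covariance estimates can be computed side by side as in the following NumPy sketch. It rests on an assumption that should be checked against Chapter 9: the diagonal entries of $\Omega_1$ and $\Omega_2$ are taken to be squared residuals, as in White's estimator. That gives $\hat e_i^2$ for $H_1$ and $\hat e_i^2/(1 - (P_X)_{ii})$ for $H_2$. The function name `covariance_estimates` is hypothetical.

```python
import numpy as np

def covariance_estimates(X, y):
    """Return (beta_hat, usual, H1, H2) for the linear model y = X beta + e."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Leverages (P_X)_ii = x_i^T (X^T X)^{-1} x_i
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

    # Usual homoskedastic estimate: s^2 (X^T X)^{-1}
    usual = resid @ resid / (n - p) * XtX_inv

    def sandwich(w):
        # (X^T X)^{-1} X^T diag(w) X (X^T X)^{-1}
        return XtX_inv @ (X.T * w) @ X @ XtX_inv

    # ASSUMPTION: squared residuals on the diagonal, as in White's estimator
    H1 = n / (n - p) * sandwich(resid ** 2)
    H2 = sandwich(resid ** 2 / (1.0 - h))
    return beta, usual, H1, H2
```

In a simulation, the diagonals of `usual`, `H1`, and `H2` can be averaged over replicates and compared with the Monte Carlo variance of `beta`, under both constant and non-constant error variances.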

3 Error Structure in PD/PK models

In pharmacodynamic/pharmacokinetic models, the response – often a chemical concentration – must be nonnegative. Two routes are commonly used for fitting these nonlinear regression models:

  • Using generalized least squares with error variance proportional to a power of the mean (constant, the mean, or its square): $Y_j \sim \mathrm{Normal}(g_j, \sigma^2 g_j^{2\theta})$, where $\theta$ may be 0, 1/2, or 1.
  • Fitting a log-normal model: $\log(Y_j) \sim \mathrm{Normal}(\log(g_j), \sigma^2)$

Choose one of these four (that is, three values of θ and log-normal) as the truth and compare the performance of these models. Include as another (fifth) competitor a model with no heteroskedasticity.
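A simulation needs a data generator for each candidate truth; a minimal NumPy sketch follows. The one-compartment mean curve `mean_curve` and its parameter values are hypothetical stand-ins for whatever mean function $g_j$ is being studied. `gls_weights` gives the weights a generalized least squares fit would use for a given $\theta$.

```python
import numpy as np

def mean_curve(t, dose=100.0, volume=10.0, k_elim=0.3):
    """HYPOTHETICAL mean function: g(t) = (dose / volume) * exp(-k_elim * t)."""
    return dose / volume * np.exp(-k_elim * np.asarray(t, dtype=float))

def simulate_response(g, sigma, truth, rng):
    """Draw Y given means g; truth is theta (0, 0.5, or 1) or the string 'lognormal'."""
    if truth == "lognormal":
        # log(Y) ~ Normal(log g, sigma^2), so Y is always positive
        return np.exp(rng.normal(np.log(g), sigma))
    theta = float(truth)
    # Y ~ Normal(g, sigma^2 g^(2 theta)): the standard deviation is sigma * g^theta
    return rng.normal(g, sigma * g ** theta)

def gls_weights(g, theta):
    """Weights proportional to 1 / Var(Y) under the power-of-the-mean variance model."""
    return np.asarray(g, dtype=float) ** (-2.0 * theta)
```

For example, `simulate_response(mean_curve(np.linspace(0.5, 12, 20)), 0.1, 0.5, np.random.default_rng(0))` draws one data set with variance proportional to the mean. Note that the normal models can produce negative responses when `sigma` is large relative to the mean.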

4 Testing Variance Components – Balanced

Consider the balanced ($n_i = n$) one-way ANOVA case of $Y_{ij} = \mu + \alpha_i + e_{ij}$, $i = 1, \ldots, a$, $j = 1, \ldots, n$, where $e_{ij} \sim \mathrm{Normal}(0, \sigma^2)$ and, independently, $\alpha_i \sim \mathrm{Normal}(0, \sigma_a^2)$. We want to test the hypothesis $H: \sigma_a^2 = 0$. Two approaches are considered.

a) The usual F-test:

$$F = \frac{SSA/[a-1]}{SSE/[a(n-1)]}$$

where $SSA = n\sum_i (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2$, $SSE = \sum_{ij} (y_{ij} - \bar y_{i\cdot})^2$, and reject $H$ if $F$ is too big.

b) Likelihood Ratio Test: the log-likelihood under the alternative can be written as

$$\ell(\mu, \sigma_a^2, \sigma^2) = -\frac{an}{2}\log(2\pi) - \frac{1}{2}\log\left[(\sigma^2)^{a(n-1)}(\sigma^2 + n\sigma_a^2)^{a}\right] - \frac{1}{2}\left[\frac{SSE}{\sigma^2} + \frac{SSA}{\sigma^2 + n\sigma_a^2} + \frac{an(\bar y_{\cdot\cdot} - \mu)^2}{\sigma^2 + n\sigma_a^2}\right]$$

and you should be able to write and maximize the likelihood under the hypothesis. Reject $H$ if the difference in log-likelihoods is too large.
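Both test statistics can be computed from the same sums of squares, as in this NumPy sketch. The function name is hypothetical. The maximizers used here ($\hat\mu = \bar y_{\cdot\cdot}$, $\hat\sigma^2 = SSE/[a(n-1)]$, $\hat\sigma^2 + n\hat\sigma_a^2 = SSA/a$, falling back to the boundary $\sigma_a^2 = 0$ when that would make $\hat\sigma_a^2$ negative) come from working through the balanced-case log-likelihood and should be verified as part of the exercise.

```python
import numpy as np

def variance_component_tests(y):
    """y has shape (a, n): a groups, n observations each. Returns (F, loglik difference)."""
    a, n = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    ssa = n * np.sum((group_means - grand_mean) ** 2)
    sse = np.sum((y - group_means[:, None]) ** 2)
    f_stat = (ssa / (a - 1)) / (sse / (a * (n - 1)))

    def loglik(sigma2, tau):
        # Log-likelihood at mu = grand mean, where tau = sigma^2 + n * sigma_a^2
        return (-0.5 * a * n * np.log(2.0 * np.pi)
                - 0.5 * (a * (n - 1) * np.log(sigma2) + a * np.log(tau))
                - 0.5 * (sse / sigma2 + ssa / tau))

    sigma2_null = (sse + ssa) / (a * n)  # MLE of sigma^2 under H: sigma_a^2 = 0
    sigma2_alt, tau_alt = sse / (a * (n - 1)), ssa / a
    if tau_alt < sigma2_alt:
        # Unconstrained maximizer implies sigma_a^2 < 0, so the MLE sits on the boundary
        sigma2_alt = tau_alt = sigma2_null
    diff = loglik(sigma2_alt, tau_alt) - loglik(sigma2_null, sigma2_null)
    return f_stat, diff
```

Twice `diff` is the conventional likelihood ratio statistic. Because the hypothesis places $\sigma_a^2$ on the boundary of the parameter space, simulating `diff` under $H$ is a simple way to find where "too large" begins, and the same simulated data sets give the rejection rate of the F-test for comparison.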

5 Analysis of Rate Statistics

In evaluating the performance of public health programs, the statistics are often cited as rates of incidence of disease per unit, say, deaths per 10,000. Where only aggregate data are available, the aggregation units are sometimes very different in size. For example, in North Carolina, disease prevalence data are available for each county either as counts or as rates per 1,000 or some similar unit. (There are some very large counties in NC, e.g. Mecklenburg and Wake, as well as many very small ones.) So the true model may be that the rate in county $i$ follows $\lambda_i = \beta_0 + \beta_1\,\mathrm{income}_i$ with a single covariate of income, and the data are only available by county, so the number observed $Y_i$ in county $i$ may be Poisson with rate $\mathrm{pop}_i\lambda_i$. Compare some different methods of analysis:

  1. Simple linear regression of the rate $Y_i/\mathrm{pop}_i$ on income.
  2. Simple linear regression of the square root of the incidence, $\sqrt{Y_i}$, on income.
  3. Generalized least squares of the rate $Y_i/\mathrm{pop}_i$ with variance proportional to the reciprocal of $\mathrm{pop}_i$.
  4. Poisson regression of the incidence $Y_i$ with rate $\mathrm{pop}_i\exp\{\beta_0 + \beta_1\,\mathrm{income}_i\}$.
  5. Finally, you might try to fit the true model, a Poisson regression.

The file ncc03q4.dat in the ’rfiles’ directory holds NC County data as of the end of 2003. The columns are:

  1. county name (character)
  2. population
  3. median household income
  4. per capita personal income

Choose one of the two income measures.
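As a starting point, the sketch below compares two of the methods (ordinary and population-weighted least squares of the rate) on simulated counties. It does not read ncc03q4.dat, since the file's layout beyond the column list is not given here. The population and income distributions, the coefficient values, and the function names are all made up for illustration and would be replaced by the real county columns.

```python
import numpy as np

def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y on (1, x); returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w  # X^T diag(w)
    return np.linalg.solve(XtW @ X, XtW @ y)

def simulate_slopes(reps=500, counties=100, beta0=0.001, beta1=0.0001, seed=0):
    """Slope estimates from OLS (column 0) and population-weighted LS (column 1)."""
    rng = np.random.default_rng(seed)
    # HYPOTHETICAL county populations and incomes (income in thousands of dollars)
    pop = np.round(np.exp(rng.normal(10.5, 1.0, counties)))
    income = rng.uniform(25.0, 60.0, counties)
    lam = beta0 + beta1 * income  # true rate model: lambda_i = beta0 + beta1 * income_i
    out = np.empty((reps, 2))
    for r in range(reps):
        y = rng.poisson(pop * lam)  # Y_i ~ Poisson(pop_i * lambda_i)
        rate = y / pop
        out[r, 0] = weighted_line_fit(income, rate, np.ones(counties))[1]
        # Var(rate_i) is roughly proportional to 1 / pop_i, so weight by pop_i
        out[r, 1] = weighted_line_fit(income, rate, pop)[1]
    return out
```

Comparing `simulate_slopes().mean(axis=0)` and `.std(axis=0)` against `beta1` gives the bias and spread of each method. The square-root regression and the two Poisson regressions can be added as further columns in the same loop.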