Homework #7 – Simulation Study Problems
ST758
01 November 2011
1  Least Maximum (L∞) Regression
The location problem in Exercise 8.13 describes the estimator $\tilde{\mu}$ that minimizes $\sum_i |X_i - \mu|^{3/2}$ as
intermediate between $L_2$ (least squares) and $L_1$ (least absolute value). For simple linear regression,
pushing in the other direction leads to $L_\infty$, minimizing
$$\max_i |Y_i - \beta_0 - \beta_1 x_i|.$$
Compare estimators based on this criterion to the usual least squares estimator in cases where the
error distribution has heavier (e.g. exponential or logistic) or lighter (e.g. uniform) tails than the
normal. (See the code ’slrinf.r’ in the rfiles directory for hints on computation.)
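One possible way to compute the L∞ fit is sketched below in Python. It is not the contents of 'slrinf.r', which is not reproduced here. The sketch relies on two facts: for a fixed slope the best intercept is the midrange of the partial residuals, and the resulting criterion (half the range of those residuals) is convex in the slope, so a ternary search over the slope finds the minimum. The function names, the design, the true coefficients, and the uniform error distribution are all illustrative assumptions.

```python
import random

def half_range(b1, x, y):
    """Smallest achievable max |y - b0 - b1*x| when the slope is fixed at b1."""
    r = [yi - b1 * xi for xi, yi in zip(x, y)]
    return (max(r) - min(r)) / 2.0

def linf_fit(x, y, iters=100):
    """Minimax (L-infinity) fit of y = b0 + b1*x by ternary search on the slope."""
    # The optimal slope lies between the smallest and largest pairwise slopes.
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x))
              if x[i] != x[j]]
    lo, hi = min(slopes), max(slopes)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if half_range(m1, x, y) <= half_range(m2, x, y):
            hi = m2
        else:
            lo = m1
    b1 = (lo + hi) / 2.0
    r = [yi - b1 * xi for xi, yi in zip(x, y)]
    b0 = (max(r) + min(r)) / 2.0  # midrange of the partial residuals
    return b0, b1

def ls_fit(x, y):
    """Ordinary least squares fit of y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def simulate(n=20, reps=200, seed=1):
    """Mean squared error of the slope for both fits (assumed setup: uniform errors)."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]
    mse_inf = mse_ls = 0.0
    for _ in range(reps):
        y = [1.0 + 2.0 * xi + rng.uniform(-1.0, 1.0) for xi in x]
        mse_inf += (linf_fit(x, y)[1] - 2.0) ** 2
        mse_ls += (ls_fit(x, y)[1] - 2.0) ** 2
    return mse_inf / reps, mse_ls / reps

if __name__ == "__main__":
    print(simulate())
```

Swapping `rng.uniform` for a heavier-tailed generator (for example `rng.expovariate(1.0) - 1.0`, a centered exponential) gives the other side of the comparison.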
2  Tobit Regression
This kind of censored regression model arises in econometrics, where the response variable Y is
censored at some value c (here c = 0); an example is the number of overtime hours, which is recorded as zero when a worker is unemployed.
Statistically, if the censoring is ignored, the usual least squares regression coefficients will be
biased. The usual approach is to take account of the censoring with maximum likelihood, where the
log-likelihood is
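$$\ell(\beta, \sigma^2) = \sum_{\text{uncensored}} \left\{ -\tfrac{1}{2}\log(\sigma^2) - \tfrac{1}{2}(Y_i - \beta^T x_i)^2/\sigma^2 \right\} + \sum_{\text{censored}} \log \Phi\big((c - \beta^T x_i)/\sigma\big)$$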
One point of interest may be the effect of design characteristics (spread of X, fraction of censoring,
noise level) on the bias of least squares estimates. Another may be the performance of ML estimates.
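As a starting point for such a study, the log-likelihood above can be written down directly. The Python sketch below is a minimal version under stated assumptions: an observation with response equal to c is treated as censored, design rows are plain lists that include the intercept term, and the names `tobit_loglik` and `simulate_tobit` are illustrative. Maximizing the function (for instance with a general-purpose optimizer) is left out.

```python
import math
import random

def log_norm_cdf(z):
    """Log of the standard normal CDF, computed through erfc."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))

def tobit_loglik(beta, sigma2, X, y, c=0.0):
    """Tobit log-likelihood for left-censoring at c.

    Assumption: a response equal to c marks a censored observation.
    """
    sigma = math.sqrt(sigma2)
    total = 0.0
    for xi, yi in zip(X, y):
        mu = sum(b * v for b, v in zip(beta, xi))  # beta^T x_i
        if yi > c:
            # uncensored term: -(1/2) log(sigma^2) - (1/2) (y - mu)^2 / sigma^2
            total += -0.5 * math.log(sigma2) - 0.5 * (yi - mu) ** 2 / sigma2
        else:
            # censored term: log Phi((c - mu) / sigma)
            total += log_norm_cdf((c - mu) / sigma)
    return total

def simulate_tobit(n, beta, sigma, c=0.0, seed=0):
    """Generate censored data: y = max(c, beta^T x + error), with an assumed uniform design."""
    rng = random.Random(seed)
    X = [[1.0, rng.uniform(-1.0, 1.0)] for _ in range(n)]
    y = [max(c, sum(b * v for b, v in zip(beta, xi)) + rng.gauss(0.0, sigma))
         for xi in X]
    return X, y

if __name__ == "__main__":
    X, y = simulate_tobit(100, [0.5, 1.0], 1.0)
    print(sum(1 for v in y if v <= 0.0) / len(y))  # fraction censored
    print(tobit_loglik([0.5, 1.0], 1.0, X, y))
```

The fraction of censoring can be moved by changing the intercept in `beta`, and the spread of X by changing the range of the uniform design, which covers two of the design characteristics mentioned above.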
3  Error Structure in PD/PK models
In pharmacodynamic/pharmacokinetic models, the response – often a chemical concentration – must
be nonnegative. Two routes are commonly used for fitting these nonlinear regression models:
• Using generalized least squares with error variance related to the mean: $Y_j \sim \mathrm{Normal}(g_j, \sigma^2 g_j^{2\theta})$,
where θ may be 0, 1/2, or 1.
• Fitting a log-normal model: $\log(Y_j) \sim \mathrm{Normal}(\log(g_j), \sigma^2)$
Choose one of these four (that is, three values of θ and log-normal) as the truth and compare
the performance of some of these models. Always include as a competitor the model with no
heteroskedasticity (θ = 0).
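The data-generating step for the four candidate truths might look like the Python sketch below. The mean function (a one-compartment exponential decay with made-up dose and rate constants) is purely a placeholder assumption, as are the function names; any nonlinear mean $g_j$ of interest could be substituted. Fitting the competing models is not shown.

```python
import math
import random

def mean_curve(t, dose=10.0, k=0.5):
    """Hypothetical mean function g(t): exponential decay, for illustration only."""
    return dose * math.exp(-k * t)

def simulate_response(times, model, sigma=0.1, seed=0):
    """Draw one response per time point.

    model is the power theta (0, 0.5, or 1) for the normal models,
    or the string "lognormal" for the log-normal model.
    """
    rng = random.Random(seed)
    out = []
    for t in times:
        g = mean_curve(t)
        if model == "lognormal":
            # log(Y) ~ Normal(log g, sigma^2)
            out.append(math.exp(rng.gauss(math.log(g), sigma)))
        else:
            # Y ~ Normal(g, sigma^2 * g^(2*theta)), so the sd is sigma * g^theta
            out.append(rng.gauss(g, sigma * g ** model))
    return out

if __name__ == "__main__":
    times = [0.5 * j for j in range(1, 13)]
    for model in (0, 0.5, 1, "lognormal"):
        print(model, [round(v, 3) for v in simulate_response(times, model)])
```

Note that only the log-normal draws are guaranteed to be positive; the normal models can produce negative responses when σ is large relative to the mean, which may be worth tracking in the study.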
4  Standard Errors under Heteroskedasticity
The sandwich covariance estimate in Chapter 9 is a generalization of some work by Halbert White
(among others) on the effect of heteroskedasticity (different variances) in multiple regression. Under
standard (homoskedastic) assumptions with $\mathrm{Var}(e_i) = \sigma^2$, the covariance matrix of the parameter
estimates is the usual $\sigma^2 (X^T X)^{-1}$; a consistent estimator under heteroskedasticity is sought. One
proposed estimator is
$$H_1 = \frac{n}{n-p} (X^T X)^{-1} (X^T \Omega_1 X) (X^T X)^{-1}$$
where $\Omega_1 = \mathrm{diag}\{\hat{e}_i^2\}$ and $\hat{e}_i$, $i = 1, \ldots, n$, are the residuals. A second estimator applies a different
correction,
$$H_2 = (X^T X)^{-1} (X^T \Omega_2 X) (X^T X)^{-1}$$
where $\Omega_2 = \mathrm{diag}\{\hat{e}_i^2 / (1 - (P_X)_{ii})\}$ and $P_X = X (X^T X)^{-1} X^T$ is the projection (hat) matrix. Compare these estimators with the usual $\hat{\sigma}^2 (X^T X)^{-1}$ under
both homoskedastic and heteroskedastic error.
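To make the three covariance estimates concrete, here is a small Python sketch restricted to simple linear regression (p = 2, design rows (1, x_i)), so that the matrix algebra can be done with explicit 2×2 helpers and no external libraries. The restriction to p = 2 and all function names are assumptions made for illustration; a real study in multiple regression would use a matrix library instead.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def weighted_xtx(x, w):
    """X^T diag(w) X for design rows (1, x_i)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return [[s0, s1], [s1, s2]]

def covariance_estimates(x, y):
    """Return (usual, H1, H2) covariance estimates for the least squares fit."""
    n, p = len(x), 2
    A = inv2(weighted_xtx(x, [1.0] * n))  # (X^T X)^{-1}
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b0 = A[0][0] * sy + A[0][1] * sxy  # least squares coefficients
    b1 = A[1][0] * sy + A[1][1] * sxy
    e2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]  # squared residuals
    # leverages (P_X)_ii = (1, x_i) (X^T X)^{-1} (1, x_i)^T
    h = [A[0][0] + 2.0 * A[0][1] * xi + A[1][1] * xi * xi for xi in x]

    def sandwich(w):
        return matmul2(matmul2(A, weighted_xtx(x, w)), A)

    s2 = sum(e2) / (n - p)
    usual = [[s2 * v for v in row] for row in A]
    H1 = [[n / (n - p) * v for v in row] for row in sandwich(e2)]
    H2 = sandwich([e / (1.0 - hi) for e, hi in zip(e2, h)])
    return usual, H1, H2

if __name__ == "__main__":
    x = [-1.0, -1.0, 1.0, 1.0]
    y = [1.0, -1.0, 1.0, -1.0]
    for name, m in zip(("usual", "H1", "H2"), covariance_estimates(x, y)):
        print(name, m)
```

In the balanced toy data above, every squared residual and every leverage is the same, so all three estimates coincide; they separate once the residual variance changes with x, which is the situation the simulation is meant to probe.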
5  Reduced Major Axis Regression
An estimate of slope in simple linear regression
$$E(y_i) = \alpha + \beta x_i$$
that has been proposed in measurement error problems is known as 'reduced major axis regression'
(RMA). I'll skip the geometric motivation and just give the formulas. The RMA estimate of the
slope is
$$\beta_{RMA} = \mathrm{sign}(r) \sqrt{S_{yy}/S_{xx}}, \quad \text{where } S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y}),$$
with $S_{xx}$ and $S_{yy}$ defined analogously,
and the intercept estimate follows the usual $\alpha_{RMA} = \bar{y} - \beta_{RMA} \bar{x}$. Note that the sign of the centered
crossproduct $S_{xy}$ could be used in place of the usual Pearson correlation r. Analogues to standard
formulas have been suggested for obtaining standard errors, as
$$\mathrm{se}(\beta_{RMA}) = \hat{\sigma}_{RMA} / \sqrt{S_{xx}}, \quad \text{and} \quad \mathrm{se}(\alpha_{RMA}) = \hat{\sigma}_{RMA} \sqrt{1/n + \bar{x}^2 / S_{xx}},$$
where $\hat{\sigma}^2_{RMA} = \sum_i (y_i - \alpha_{RMA} - \beta_{RMA} x_i)^2$. The effect of design characteristics on the performance
of the estimator (and its competitors) would be interesting.
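The point estimates are simple enough to compute by hand, as in the Python sketch below. It takes the slope magnitude to be the ratio of the spread of y to the spread of x, $\sqrt{S_{yy}/S_{xx}}$, so that it has the units of a slope of y on x, and it uses the sign of the centered crossproduct $S_{xy}$ in place of sign(r). The function name is an assumption, and the standard-error formulas are deliberately left out of the sketch.

```python
import math

def rma_fit(x, y):
    """Reduced major axis estimates (intercept, slope) for y = alpha + beta*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    # sign of S_xy agrees with the sign of the Pearson correlation r
    sign = 1.0 if sxy >= 0.0 else -1.0
    slope = sign * math.sqrt(syy / sxx)
    intercept = ybar - slope * xbar
    return intercept, slope

if __name__ == "__main__":
    print(rma_fit([0.0, 1.0, 2.0, 3.0], [0.2, 0.9, 2.3, 2.8]))
```

Because the least squares slope equals r times $\sqrt{S_{yy}/S_{xx}}$, the RMA slope is never smaller in magnitude than the least squares slope, and the two agree only when the points fall exactly on a line.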