Homework #7 – Simulation Study Problems
ST758
01 November 2011

1  Least Maximum (L∞) Regression

The location problem in Exercise 8.13 describes the estimator µ̃ that minimizes Σᵢ |Xi − µ|^(3/2) as intermediate between L2 (least squares) and L1 (least absolute value). For simple linear regression, pushing in the other direction leads to L∞, minimizing maxᵢ |Yi − β0 − β1 xi|. Compare estimators based on this criterion to the usual least squares estimator in cases where the error distribution has heavier (e.g. exponential or logistic) or lighter (e.g. uniform) tails than the normal. (See the code 'slrinf.r' in the rfiles directory for hints on computation.)

2  Tobit Regression

This kind of censored regression model arises in econometrics when the response variable Y is censored at some value c, here c = 0, e.g. the number of overtime hours when a worker is unemployed. Statistically, if the censoring is ignored, the usual least squares regression coefficients will be biased. The usual approach is to account for the censoring with maximum likelihood, where the log-likelihood is

    ℓ(β, σ²) = Σ_uncensored { −(1/2) log(σ²) − (1/2)(Yi − βᵀxi)²/σ² } + Σ_censored log Φ((c − βᵀxi)/σ).

One point of interest may be the effect of design characteristics (spread of X, fraction of censoring, noise level) on the bias of least squares estimates. Another may be the performance of ML estimates.

3  Error Structure in PD/PK Models

In pharmacodynamic/pharmacokinetic models, the response – often a chemical concentration – must be nonnegative. Two routes are commonly used for fitting these nonlinear regression models:

• Using generalized least squares with error variance related to the mean: Yj ~ Normal(gj, σ² gj^(2θ)), where θ may be 0, 1/2, or 1.
• Fitting a log-normal model: log(Yj) ~ Normal(log(gj), σ²).

Choose one of these four (that is, three values of θ and log-normal) as the truth and compare the performance of some of these models.
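The two error structures above can be simulated directly. A minimal Python sketch (the course's rfiles are in R, so the function name and interface here are illustrative, not the assigned code):

```python
import numpy as np

def simulate_pdpk(g, sigma, model, theta=None, rng=None):
    """Draw responses around a mean curve g under the two candidate error models.

    model='power'     : Y_j ~ Normal(g_j, sigma^2 * g_j^(2*theta)), theta in {0, 1/2, 1}
    model='lognormal' : log(Y_j) ~ Normal(log(g_j), sigma^2)
    """
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(g, dtype=float)
    if model == 'power':
        # standard deviation scales as g^theta; theta=0 is homoskedastic
        return rng.normal(loc=g, scale=sigma * g**theta)
    if model == 'lognormal':
        # multiplicative error: responses are automatically nonnegative
        return np.exp(rng.normal(loc=np.log(g), scale=sigma))
    raise ValueError(f"unknown model: {model!r}")
```

Note that the power-variance draws can go negative when σ·g^θ is large relative to g, which is itself one practical argument for the log-normal route.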
Always include as a competitor the model with no heteroskedasticity (θ = 0).

4  Standard Errors under Heteroskedasticity

The sandwich covariance estimate in Chapter 9 is a generalization of some work by Halbert White (among others) on the effect of heteroskedasticity (different variances) in multiple regression. Under standard (homoskedastic) assumptions with Var(ei) = σ², the covariance matrix of the parameter estimates is the usual σ²(XᵀX)⁻¹; a consistent estimator under heteroskedasticity is sought. One proposed estimator is

    H1 = n/(n − p) · (XᵀX)⁻¹(XᵀΩ1X)(XᵀX)⁻¹,

where Ω1 = diag{êi²} and êi, i = 1, …, n, are the residuals. A second estimator applies a different correction,

    H2 = (XᵀX)⁻¹(XᵀΩ2X)(XᵀX)⁻¹,

where Ω2 = diag{êi²/(1 − (PX)ii)}. Compare these estimators with the usual σ̂²(XᵀX)⁻¹ under both homoskedastic and heteroskedastic errors.

5  Reduced Major Axis Regression

An estimate of the slope in simple linear regression E(yi) = α + βxi that has been proposed in measurement error problems is known as 'reduced major axis regression' (RMA). I'll skip the geometric motivation and just give the formulas. With Sxx = Σ(xi − x̄)², Syy = Σ(yi − ȳ)², and Sxy = Σ(xi − x̄)(yi − ȳ), the RMA estimate of the slope is

    βRMA = sign(r) √(Syy/Sxx),

and the intercept estimate follows the usual αRMA = ȳ − βRMA x̄. Note that the sign of the centered crossproduct Sxy could be used in place of the usual Pearson correlation r. Analogues to standard formulas have been suggested for obtaining standard errors,

    se(βRMA) = σ̂RMA/√Sxx   and   se(αRMA) = σ̂RMA √(1/n + x̄²/Sxx),

where σ̂²RMA = Σ(yi − αRMA − βRMA xi)²/(n − 2). The effect of design characteristics on the performance of the estimator (and its competitors) would be interesting.
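For Problem 1, the L∞ fit can be computed exactly as a linear program: minimize t subject to |Yi − β0 − β1 xi| ≤ t for every observation. The 'slrinf.r' hints file is not reproduced here, so the sketch below is an illustrative Python stand-in:

```python
import numpy as np
from scipy.optimize import linprog

def linf_slr(x, y):
    """L-infinity (Chebyshev) simple linear regression via a linear program.

    Decision vector is (b0, b1, t); minimize t subject to
    |y_i - b0 - b1*x_i| <= t for every i."""
    n = len(x)
    ones = np.ones(n)
    #  y_i - b0 - b1*x_i <= t   ->  -b0 - b1*x_i - t <= -y_i
    # -(y_i - b0 - b1*x_i) <= t ->   b0 + b1*x_i - t <=  y_i
    A_ub = np.vstack([np.column_stack([-ones, -x, -ones]),
                      np.column_stack([ ones,  x, -ones])])
    b_ub = np.concatenate([-y, y])
    res = linprog(c=[0.0, 0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None), (None, None), (0.0, None)])
    return res.x[0], res.x[1]
```

The LP view also explains the estimator's behavior in the simulation: the fit is determined entirely by the most extreme residuals, which is why tail weight of the error distribution matters so much.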
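The Tobit log-likelihood of Problem 2 is short to code and can be maximized with a general-purpose optimizer. A hedged Python sketch (parameterizing log σ for stability; function names are illustrative):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def tobit_negloglik(params, X, y, c=0.0):
    """Negative Tobit log-likelihood; params = (beta, log_sigma)."""
    beta, sigma = params[:-1], np.exp(params[-1])
    mu = X @ beta
    cens = y <= c
    # uncensored part: ordinary normal density terms
    ll = np.sum(norm.logpdf(y[~cens], loc=mu[~cens], scale=sigma))
    # censored part: log Phi((c - x'beta)/sigma)
    ll += np.sum(norm.logcdf((c - mu[cens]) / sigma))
    return -ll

def tobit_fit(X, y, c=0.0):
    """Maximize the Tobit likelihood, starting from the (biased) OLS solution."""
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
    start = np.append(beta0, 0.0)  # log_sigma = 0, i.e. sigma = 1
    res = minimize(tobit_negloglik, start, args=(X, y, c), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])
```

Comparing the ML coefficients with the OLS starting values on the same simulated data exhibits directly the censoring bias the problem describes.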
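The H1 and H2 estimators of Problem 4 are a few lines of matrix algebra. A Python sketch (the function name is illustrative; the formulas are the ones given above):

```python
import numpy as np

def sandwich_estimates(X, y):
    """HC-type covariance estimates H1 and H2 for OLS under heteroskedasticity."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # leverages (P_X)_ii without forming the full n x n projection matrix
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)

    def sandwich(w):
        # (X'X)^{-1} (X' diag(w) X) (X'X)^{-1}
        return XtX_inv @ (X.T * w) @ X @ XtX_inv

    H1 = n / (n - p) * sandwich(resid**2)
    H2 = sandwich(resid**2 / (1.0 - h))
    return H1, H2
```

In the homoskedastic arm of the study, the diagonal of each estimate can be compared against σ̂²(XᵀX)⁻¹ to measure how much efficiency the robust corrections give up when they are not needed.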
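The RMA formulas of Problem 5 translate directly into code; a short Python sketch (names illustrative, using sign(Sxy) in place of sign(r) as the text permits):

```python
import numpy as np

def rma_fit(x, y):
    """Reduced major axis slope and intercept for simple linear regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar)**2)
    Syy = np.sum((y - ybar)**2)
    Sxy = np.sum((x - xbar) * (y - ybar))
    beta = np.sign(Sxy) * np.sqrt(Syy / Sxx)   # slope = sign(r) * sd(y)/sd(x)
    alpha = ybar - beta * xbar
    return alpha, beta

def rma_se(x, y):
    """Suggested standard-error analogues for the RMA estimates."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    alpha, beta = rma_fit(x, y)
    n = len(x)
    Sxx = np.sum((x - x.mean())**2)
    sigma2 = np.sum((y - alpha - beta * x)**2) / (n - 2)
    se_beta = np.sqrt(sigma2 / Sxx)
    se_alpha = np.sqrt(sigma2 * (1.0 / n + x.mean()**2 / Sxx))
    return se_beta, se_alpha
```

A natural simulation check is whether these plug-in standard errors track the Monte Carlo standard deviation of the RMA estimates across the design configurations being varied.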