1 Introduction
What is measurement error? It occurs when we cannot observe exactly some of the variables that define our model of interest. Usually this is instrument error or sampling error.
A few examples of error-prone variables (true value → observed):
• presence or absence of disease → result of diagnostic test
• dietary intake or physical activity → measure from questionnaire, diary, replication, etc.
• air pollution exposure → indoor measures or local monitors
• food expenditures → self report
• employment status → self report
• education level → self report
• land area classification → value from satellite image
• animal abundance → estimate from sampling
• delivered "dose" (e.g., concentration, speed on treadmill) → target dose
The main ingredients in a M.E. problem.
• MODEL FOR THE TRUE VALUES. e.g.,
1. Estimation of a single proportion, mean, variance, etc.
2. Contingency tables
3. Linear regression
4. Nonlinear Regression (including Binary regression)
• MODEL FOR MEASUREMENT ERROR
Specification of relationship between error-prone measurements and true
values (more later)
• EXTRA INFORMATION/DATA (allowing us to correct for ME)
1. Knowledge about some of the ME parameters or functions of them.
2. Replicate values
3. Estimated standard errors attached to the error-prone variable
4. Validation data (internal or external)
5. Instrumental variables.
General Questions
• What are the consequences of analyses which ignore the measurement error
(so-called naive analyses) on estimation, testing, confidence intervals,
etc.?
• How do we correct for measurement error?
This usually requires information or extra data.
- Direct bias corrections.
- Moment approaches.
- Likelihood based techniques.
- Regression calibration.
- SIMEX (simulation-extrapolation).
- Modified estimating equations.
Differentiality, surrogacy and conditional independence.
X = true value, W = error-prone version of X
Y = another variable (usually a response)
• Differential and non-differential measurement error.
Nondifferential error: the measurement error in X is nondifferential with respect to Y if the distribution of W | x, y = the distribution of W | x.
Differential error: the distribution of W | x, y depends on y.
• Surrogacy: the distribution of Y | x, w = that of Y | x;
w contains no information about y beyond that contained in x.
• Conditional independence: Y and W are independent given X = x.

Surrogacy ⇔ conditional independence ⇔ nondifferential error.
2 Misclassification in Estimating a Proportion

2.1 Examples
Example 1: Prevalence of HIV.
Mali et al. (1995): estimate that 4.9% of 4404 women attending a family planning clinic in Nairobi have HIV.
What if the assessment is subject to misclassification?
                   ELISA (w)
                  0           1
Truth (x)  0    293 (275)   4 (22)    297
           1      2          86        88

External validation data for the ELISA test for presence of HIV. x is the true value, with x = 1 indicating being HIV positive, and w is the error-prone measure from the assay. Based on Weiss et al. (1985).
If the results that Weiss et al. (1985) considered borderline are classified as negative then there are only 4 false positives, while if they are considered positive there are 22 false positives.
Example 2. From Fleiss (1981, p. 202). Goal is to estimate the proportion of heavy smokers in a population.
• x = "truth" from blood test (1 = heavy smoker)
• w = self report
Internal validation sample: overall random sample of 200, with 50 subsampled for validation.

               w
            0      1
x = 0      24      2      26
    1       6     18      24
           30     20      50
?          82     68     150
          112     88     200
Example 3. Lara et al. (2004) used the randomized response technique (RRT) to estimate the proportion of induced abortions in Mexico. With probability 1/2 the woman answered the question "Did you ever interrupt a pregnancy?" and with probability 1/2 she answered the question "Were you born in April?". Only the respondent knew which question she answered. Of 370 women interviewed, 56 answered yes.
The RRT intentionally introduces misclassification but with known misclassification probabilities (sensitivity = 13/24 = 1/2 + (1/2)(1/12); specificity = 23/24 = 1/2 + (1/2)(11/12)).
Example 4. Use satellite imagery to classify habitat types. Validation data taken from the Canada Centre for Remote Sensing for five classes of land-use/land-cover maps produced from a Landsat Thematic Mapper image of a rural area in southern Canada. The unit is a pixel.

                              LANDSAT
TRUTH         Water   BareGround   Deciduous   Coniferous   Urban
Water           367        2            4           3          6
BareGround        2      418            8           9         17
Deciduous         3       14          329          24         25
Coniferous       12        5           26         294         23
Urban            16       26           29          43        422
Other Examples.
• Theau et al. (2005): 10 categories involved in mapping lichen in caribou habitat.
• Chandramohan et al. (2001): misclassification based on "verbal autopsy" in determination of the cause of death.
• Ekelund et al. (2006): measure whether someone is meeting PA guidelines using the 7-day International Physical Activity Questionnaire (IPAQ). This is w; 1 = meets standard. x = assessment by accelerometer (treated as truth).

             w
           0      1
x = 0     25     30      55
    1     30    100     130
2.2 Models
X = true value: = 1 if "success", = 0 if "failure".
X1, . . . , Xn a random sample (i.i.d.).
OBJECTIVE: Inferences for π = P(Xi = 1).
With no measurement error: T = number of successes is Binomial(n, π).
p = T/n = proportion of successes in sample; E(p) = π, V(p) = π(1 − π)/n.
- Exact techniques are based on the Binomial.
- Large sample confidence interval: p ± zα/2 (p(1 − p)/n)^{1/2}.
- Large sample tests based on the normal approximation.
With misclassification
Observe W, a fallible/error-prone measurement, instead of X.

P(W = 1|X = 1) = θ1|1 (sensitivity),  P(W = 0|X = 1) = 1 − θ1|1
P(W = 0|X = 0) = θ0|0 (specificity),  P(W = 1|X = 0) = 1 − θ0|0

Since X is random, we can also derive reclassification/predictive probabilities λx|w = P(X = x|W = w).
Observe W1, . . . , Wn; pW = the sample proportion with W = 1. The naive analysis uses pW.

πW = P(Wi = 1) = π(θ1|1 + θ0|0 − 1) + 1 − θ0|0.   (1)

We can also reverse the roles of X and W, leading to

π = πW(λ1|1 + λ0|0 − 1) + 1 − λ0|0.   (2)

Since E(pW) = πW rather than π, the naive estimator has a bias of

Bias(pW) = πW − π = π(θ1|1 + θ0|0 − 2) + (1 − θ0|0).

For a simple problem the nature of the bias can be surprisingly complex. The absolute bias can actually increase as the sensitivity or specificity increases with the other held fixed.
[Figure: bias in the naive proportion plotted versus sensitivity for different values of specificity (dotted line = .85; solid line = .90; dashed line = .95); top panel p = .01, bottom panel p = .5.]
2.3 Correcting for misclassification
Using known or estimated misclassification rates,

π̂ = (pW − (1 − θ̂0|0)) / (θ̂0|0 + θ̂1|1 − 1).   (3)

Ignoring uncertainty in the misclassification rates,

SE(π̂) = [ pW(1 − pW) / (n(θ1|1 + θ0|0 − 1)²) ]^{1/2}

and an approximate large sample Wald confidence interval for π is

π̂ ± zα/2 SE(π̂).   (4)
Exact CI. Get an "exact" CI (LW, UW) for πW (based on the Binomial) and transform it. Assuming θ̂1|1 + θ̂0|0 − 1 > 0 (it usually is),

[L, U] = [ (LW − (1 − θ̂0|0)) / (θ̂1|1 + θ̂0|0 − 1),  (UW − (1 − θ̂0|0)) / (θ̂1|1 + θ̂0|0 − 1) ].   (5)
Example 3 (abortion example, n = 370). Sensitivity θ1|1 = 13/24 and specificity θ0|0 = 23/24.

Method      Estimate   SE      Wald Interval     Exact Interval
Naive       .1514      .0186   (.1148, .1878)    (.1161, .1920)
Corrected   .2194      .0373   (.1463, .2924)    (.1488, .3007)
2.3.1 Correction using external validation data
- nV independent observations (not involving units from the main study), on which W and X are both observed.
- An important assumption is that the misclassification model is the same for this data as for the main data; that is, the measurement error model is exportable.
Representation of external validation data:

             W
           0       1
X   0    nV00    nV01    nV0.
    1    nV10    nV11    nV1.
         nV.0    nV.1    nV
Estimated specificity: θ̂0|0 = nV00/nV0. .
Estimated sensitivity: θ̂1|1 = nV11/nV1. .

π̂ = (pW − (1 − θ̂0|0)) / (θ̂0|0 + θ̂1|1 − 1).

This is the maximum likelihood estimate (MLE) of π if 0 ≤ π̂ ≤ 1.
An "exact" approach: get confidence intervals for each of πW, θ0|0 and θ1|1, and form a confidence set for π by using the minimum and maximum value of π computed as πW, θ0|0 and θ1|1 range over their intervals. If each of the intervals has confidence coefficient 1 − α*, the confidence coefficient for the interval for π is ≥ (1 − α*)³.
Delta method/approximate normal based interval: π̂ = Z1/Z2, where Z1 = pW − (1 − θ̂0|0) and Z2 = θ̂0|0 + θ̂1|1 − 1.
Accounting for uncertainty in the estimates of sensitivity and specificity gives the approximate variance

V(π̂) ≈ [ V(Z1) − 2π cov(Z1, Z2) + π² V(Z2) ] / (θ0|0 + θ1|1 − 1)²

with

V(Z1) = V(pW) + V(θ̂0|0) = πW(1 − πW)/n + θ0|0(1 − θ0|0)/nV0.
V(Z2) = V(θ̂0|0 + θ̂1|1) = θ0|0(1 − θ0|0)/nV0. + θ1|1(1 − θ1|1)/nV1.
Cov(Z1, Z2) = V(θ̂0|0) = θ0|0(1 − θ0|0)/nV0.

Get the estimated variance σ̂²π̂ by replacing parameters in σ²π̂ by estimates and use the approximate confidence interval π̂ ± zα/2 σ̂π̂.
Fieller Method: Use Fieller’s approach for getting confidence set (usually
an interval) for a ratio.
Bootstrapping.
For the bth bootstrap sample, b = 1, . . . , B, where B is large:
1. Generate pWb = Tb/n, θ̂0|0b = nV00b/nV0. and θ̂1|1b = nV11b/nV1., where Tb, nV00b and nV11b are generated independently as Bin(n, pW), Bin(nV0., θ̂0|0) and Bin(nV1., θ̂1|1), respectively.
2. Use the generated quantities to obtain π̂b = (pWb − (1 − θ̂0|0b)) / (θ̂0|0b + θ̂1|1b − 1) (truncated to 0 or 1 if necessary).
3. Use the B bootstrap values π̂1, . . . , π̂B to obtain bootstrap estimates of bias and standard error for π̂, and to compute a bootstrap confidence interval. Here, simple bootstrap percentile intervals will be used.
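As a sketch of steps 1-3, the following Python fragment (numpy only; the seed and B are arbitrary choices, not from the notes) carries out this parametric bootstrap; plugging in the HIV validation counts gives results close to the 4-false-positive bootstrap rows of the table below.

import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

def boot_external(p_w, n, nV00, nV0dot, nV11, nV1dot, B=5000, alpha=0.05):
    spec_hat, sens_hat = nV00 / nV0dot, nV11 / nV1dot
    # Step 1: regenerate the main-study count and validation counts independently.
    p_wb = rng.binomial(n, p_w, size=B) / n
    spec_b = rng.binomial(nV0dot, spec_hat, size=B) / nV0dot
    sens_b = rng.binomial(nV1dot, sens_hat, size=B) / nV1dot
    # Step 2: corrected estimate for each sample, truncated to [0, 1].
    pi_b = np.clip((p_wb - (1 - spec_b)) / (spec_b + sens_b - 1), 0.0, 1.0)
    # Step 3: bootstrap mean, SE and percentile interval.
    lo, hi = np.quantile(pi_b, [alpha / 2, 1 - alpha / 2])
    return pi_b.mean(), pi_b.std(ddof=1), (lo, hi)

# HIV example, 4-false-positive version of the validation data:
print(boot_external(p_w=0.049, n=4404, nV00=293, nV0dot=297, nV11=86, nV1dot=88))
# approx. (0.037, 0.0075, (0.021, 0.050)), cf. the Boot row in the table below.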
Estimation of prevalence in the HIV example. Cor-K refers to corrected estimates and intervals treating the misclassification rates as known, while Cor-U accounts for uncertainty in the estimated misclassification rates. Boot indicates the bootstrap analysis. Bootstrap means are .0369 and -.0274 for 4 and 22 false positives, respectively. Confidence intervals are 95% Wald intervals, except for the bootstrap, where they are percentile intervals.

Method   Est.     SE      CI                Fieller          Exact
Naive    .049     .0033   (.043, .055)                       (.043, .056)

4 false positives: θ̂1|1 = .977, θ̂0|0 = .986
Cor-K    .0369    .0033   (.030, .043)                       (.030, .044)
Cor-U    .0369    .0075   (.022, .052)      (.022, .052)     (.002, .047)
Boot              .0075   (.021, .050)

22 false positives: θ̂1|1 = .977, θ̂0|0 = .926
Cor-K    -.0278   .0036   (-.035, -.021)                     (-.035, -.020)
Cor-U    -.0278   .0177   (-.062, .007)     (-.064, .006)    (-.098, .004)
Boot              .0179   (-.065, .006)
2.3.2 Correcting using internal validation data
Observe Wi on all n units. Also observe Xi on a random sample of size nV.
Representation of main study and internal validation data:

               W
             0       1
X = 0      nV00    nV01    nV0.
    1      nV10    nV11    nV1.
           nV.0    nV.1    nV
?           nI0     nI1    nI
            n.0     n.1    n
Could mimic what was done with external validation data, using estimated misclassification rates, but this is inefficient. A more efficient approach is to use the estimated reclassification rates

λ̂1|1 = nV11/nV.1 and λ̂0|0 = nV00/nV.0

and

π̂r = pW(λ̂1|1 + λ̂0|0 − 1) + 1 − λ̂0|0.

This is the maximum likelihood estimator (MLE) as long as 0 < π̂r < 1.

SE(π̂r) = [ p²W V̂(λ̂1|1) + (pW − 1)² V̂(λ̂0|0) + (λ̂0|0 + λ̂1|1 − 1)² V̂(pW) ]^{1/2}   (6)

where V̂(λ̂1|1) = λ̂1|1(1 − λ̂1|1)/nV.1, V̂(λ̂0|0) = λ̂0|0(1 − λ̂0|0)/nV.0, and V̂(pW) = pW(1 − pW)/n. A Wald confidence interval for π is given by π̂r ± zα/2 SE(π̂r).
Bootstrapping. With the validation sample chosen as a random sample from the main study units, the bootstrap can be carried out as follows. For the bth bootstrap sample, b = 1, . . . , B:
1. For i = 1 to n generate Wbi distributed Bernoulli (0 or 1) with P(Wbi = 1) = pW.
2. For i = 1 to nV generate Xbi distributed Bernoulli, with P(Xbi = 1) equal to 1 − λ̂0|0 = λ̂1|0 if Wbi = 0, and equal to λ̂1|1 if Wbi = 1.
3. Calculate π̂rb = pWb(λ̂1|1b + λ̂0|0b − 1) + 1 − λ̂0|0b, which is the estimate of π based on the reclassification rates for the bth bootstrap sample.
4. Use the B bootstrap values to obtain bootstrap inferences.

Exact confidence interval. Using Bonferroni's inequality, form 100(1 − α/3)% confidence intervals for πW, λ1|1 and λ0|0 based on the binomial (or modifications of it). A confidence set for π is formed by the values of πW(λ1|1 + λ0|0 − 1) + 1 − λ0|0 as the three parameters range over their respective intervals.
Smoking example: estimation of the proportion of heavy smokers.
The estimated reclassification rates are λ̂0|0 = 24/30 = .8 and λ̂1|1 = 18/20 = .9.

Method      Estimate       SE      95% Wald Interval    95% Exact Interval
Naive       .44            .0351   (.3712, .5088)       (.3680, .5118)
Corrected   .508           .0561   (.3980, .6180)       (.0982, .9502)
Bootstrap   .509 (mean)    .0571   95% bootstrap CI: (.3973, .6213)

- The bootstrap estimate of bias (.509 − .508) is small.
- The "exact" corrected interval, which is conservative, is wide, and is not recommended given the relatively large sample sizes involved.
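A quick numerical check of π̂r and the SE in (6), in Python (numpy only; not part of the original notes), using the smoking example counts:

import numpy as np

def corrected_prop_internal(p_w, lam11, lam00, n, nVdot1, nVdot0):
    # pi_r and its SE from eq. (6), using estimated reclassification rates.
    pi_r = p_w * (lam11 + lam00 - 1) + 1 - lam00
    v_lam11 = lam11 * (1 - lam11) / nVdot1   # V-hat(lambda-hat_{1|1})
    v_lam00 = lam00 * (1 - lam00) / nVdot0   # V-hat(lambda-hat_{0|0})
    v_pw = p_w * (1 - p_w) / n               # V-hat(p_W)
    se = np.sqrt(p_w ** 2 * v_lam11 + (p_w - 1) ** 2 * v_lam00
                 + (lam00 + lam11 - 1) ** 2 * v_pw)
    return pi_r, se

# Smoking example: p_W = 88/200, lambda_{1|1} = 18/20, lambda_{0|0} = 24/30.
print(corrected_prop_internal(88 / 200, 18 / 20, 24 / 30, n=200, nVdot1=20, nVdot0=30))
# approx. (0.508, 0.0561), matching the corrected row of the table above.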
Extensions
• Handling two-phase/designed double sampling with internal validation (the methods above are still good, except the bootstrap needs modification).
• Finite population adjustments.
• Multiple measures (rather than validation).
More than two categories.
M ≥ 2 categories with πx = P(X is in category x), π′ = (π1, . . . , πM) and πM = 1 − π1 − . . . − πM−1.
The observed W could have a different number of categories, but here assume it also has M categories, with

P(W = w|X = x) = θw|x,  γw = P(Wi = w) = Σ_{x=1}^M θw|x πx.

With γ′ = (γ1, . . . , γM),

γ = Θπ and π = Λγ,

where Θ has θw|1, . . . , θw|M in the wth row. Similarly, Λ is a matrix of reclassification probabilities, using λx|w = P(X = x|W = w).
The vector of naive proportions, pW, estimates Θπ rather than π.

Bias = (Θ − I)π, where I is an identity matrix.
If we have estimated misclassification rates from external validation data,

π̂ = Θ̂⁻¹ pW.

With internal validation data the maximum likelihood estimator (assuming all components of π̂r are between 0 and 1) is π̂r = Λ̂p, where p is the vector of proportions formed using all of the observations.
Analytical expressions for the covariance of either π̂ or π̂r can be obtained using the theory at the end of Ch. 3. See also Greenland (1988) and Kuha et al. (1998).
Mapping example. To illustrate, consider the mapping validation data given in Example 4 and suppose there is a separate random sample of an area with 1000 pixels yielding proportions p′W = (.25, .10, .15, .20, .30), where .25 is the proportion of pixels that are sensed as water, etc. The combined proportions for W over the main and validation data are p′ = (.208, .181, .175, .183, .245). From the validation data, the estimated misclassification and reclassification rates are

Θ̂ =
  0.961  0.004  0.008  0.033  0.030
  0.005  0.921  0.035  0.014  0.049
  0.010  0.018  0.833  0.072  0.054
  0.008  0.020  0.061  0.817  0.080
  0.016  0.037  0.063  0.064  0.787

and

Λ̂ =
  0.918  0.005  0.008  0.030  0.040
  0.004  0.899  0.030  0.011  0.056
  0.010  0.020  0.831  0.066  0.073
  0.008  0.024  0.064  0.788  0.115
  0.012  0.034  0.051  0.047  0.856

Treating the validation data as external and internal, respectively, leads to

π̂ ′ = (0.251, 0.087, 0.134, 0.195, 0.337)  and  π̂ ′r = (0.209, 0.185, 0.181, 0.191, 0.243).
Treating the validation data as external, with 1000 bootstrap samples:

                  Estimate                 Bootstrap
TYPE          Naive    Corrected      SE       90% CI
Water         0.25     0.251         0.015    (0.227, 0.275)
BareGround    0.10     0.087         0.011    (0.069, 0.105)
Deciduous     0.15     0.134         0.015    (0.111, 0.158)
Coniferous    0.20     0.195         0.017    (0.167, 0.224)
Urban         0.30     0.337         0.021    (0.302, 0.371)
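A sketch of both matrix corrections in Python (numpy; not part of the original notes), using the Θ̂ and Λ̂ displayed above. Because the displayed rates and proportions are rounded, and orientation conventions for the printed matrices can vary, the results only approximately reproduce the corrected estimates above.

import numpy as np

Theta = np.array([  # estimated misclassification rates, as displayed above
    [0.961, 0.004, 0.008, 0.033, 0.030],
    [0.005, 0.921, 0.035, 0.014, 0.049],
    [0.010, 0.018, 0.833, 0.072, 0.054],
    [0.008, 0.020, 0.061, 0.817, 0.080],
    [0.016, 0.037, 0.063, 0.064, 0.787]])
Lam = np.array([    # estimated reclassification rates, as displayed above
    [0.918, 0.005, 0.008, 0.030, 0.040],
    [0.004, 0.899, 0.030, 0.011, 0.056],
    [0.010, 0.020, 0.831, 0.066, 0.073],
    [0.008, 0.024, 0.064, 0.788, 0.115],
    [0.012, 0.034, 0.051, 0.047, 0.856]])

p_w = np.array([0.25, 0.10, 0.15, 0.20, 0.30])     # main study proportions
p = np.array([0.208, 0.181, 0.175, 0.183, 0.245])  # combined proportions

pi_ext = np.linalg.solve(Theta, p_w)  # external: solves Theta @ pi = p_w;
                                      # transpose Theta if the display's rows index x
pi_int = Lam @ p                      # internal: pi_r = Lambda @ p
print(pi_ext.round(3), pi_int.round(3))
# compare with (.251, .087, .134, .195, .337) and (.209, .185, .181, .191, .243)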
3 Misclassification in two-way tables
Example 1: Antibiotics/SIDS. From Greenland (1988). Case-control study
of the association of antibiotic use by the mother during pregnancy, X, and the
occurrence of sudden infant death syndrome (SIDS), Y . W = antibiotic use
based on a self report from the mother.
Validation study of 428 women. X = from medical records.
Antibiotic use and SIDS example.

MAIN STUDY
                       Y
                 Controls (0)   Cases (1)
W  No Use (0)        479           442
   Use (1)           101           122
                     580           564

VALIDATION DATA
                  Controls (Y = 0)           Cases (Y = 1)
                        X                          X
                 0 (no use)   1 (use)     0 (no use)   1 (use)
W = 0 (no use)      168         16           143         17
    1 (use)          12         21            22         29
                    180         37           165         46
For controls (Y = 0) the estimated specificity and sensitivity are .933 and .568; for cases (Y = 1), .867 and .630.
This suggests the misclassification may be differential.
Accident example. Partial data from Hochberg (1977). Look at seat belt use and injury for accidents in North Carolina in 1974-1975 with high damage to the car and involving male drivers.
Error-prone measures (from police report): D (1 = injured, 0 = not) and W (0 = no seat belt use, 1 = use).
True values (based on more intensive inquiry): Y (injury) and X (seat-belt use).

MAIN STUDY (error-prone values)
                            D
                  No injury (0)   Injured (1)             Prop. inj.
W  No Seat Belt (0)   17476          6746       24222       .278
   Seat Belt (1)       2155           583        2738       .213

VALIDATION DATA
             D = 0             D = 1
Y   X    W = 0   W = 1    W = 0   W = 1
0   0     299      4        11      1
0   1      20     30         2      2
1   0      59      1       118      0
1   1       9      6         5      9
Common ecological settings.
X = habitat type or category (may be different types, or categories formed by categorizing a quantitative measure, e.g., low, medium or high level of vegetation cover) or different geographic regions.
Y = outcome: presence or absence of one or many "species", nesting success, categorized population abundance, etc.
3.1 Models for True values
There are three formulations:
• A joint model for X and Y:  γxy = P(X = x, Y = y).

              Y
            0      1
  X   0    γ00    γ01    γ0.
      1    γ10    γ11    γ1.
           γ.0    γ.1    1

• The conditional model for Y given X, specified by πx = P(Y = 1|X = x):

              Y
            0          1
  x   0   1 − π0      π0
      1   1 − π1      π1

• The conditional model for X given Y, specified by αy = P(X = 1|Y = y):

              y
            0          1
  X   0   1 − α0    1 − α1
      1     α0        α1
Three types of study.
• Random sample.
• "Prospective" or "cohort" study, pre-stratified on X. Can only estimate the π's and functions of them.
• "Case-control" study, pre-stratified on Y. Can only estimate the α's and functions of them.

Relative Risk = π1/π0.

Odds Ratio = Ψ = [π1/(1 − π1)] / [π0/(1 − π0)] = π1(1 − π0) / (π0(1 − π1)).

OR ≈ RR if π0 and π1 are small. (These can be defined in terms of the γ's if we have a random sample.)

π1(1 − π0)/(π0(1 − π1)) = α1(1 − α0)/(α0(1 − α1)) = γ11γ00/(γ01γ10).

We can estimate the odds ratio from a case-control study.
X is independent of Y ⇔ π0 = π1 ⇔ α0 = α1 ⇔ OR = 1.
Tested via the Pearson chi-square, likelihood ratio or Fisher's exact test.
Observed main study data:

             D
           0      1
  W   0   n00    n01    n0.
      1   n10    n11    n1.
          n.0    n.1    n

Observed proportions from the naive 2 × 2 table:

Proportion    Quantity           Naive estimate of:
Overall       pwd = nwd/n        γwd
Row           pw = nw1/nw.       πw
Column        ad = n1d/n.d       αd

The naive estimator of the odds ratio is

Ψ̂naive = a1(1 − a0)/(a0(1 − a1)) = n11n00/(n10n01) = p1(1 − p0)/(p0(1 − p1)).

Other naive estimators are similarly defined. There are a huge number of cases based on combining:
• Sampling design
• Whether one or both of the variables are misclassified
• Whether validation data is internal or external
• If internal, whether a random sample or a two-phase design is used.
3.2 Models and biases
Error in one variable (X) with the other perfectly measured. With a change in notation this handles error in Y only (e.g., replace αj with πj). This will illustrate the main points.
Observe W instead of X. General misclassification probabilities are given by

θw|xy = P(W = w|X = x, Y = y).

Differential M. error: θw|xy depends on y.
Nondifferential M. error: θw|xy = θw|x (doesn't depend on y).
θ1|1y = sensitivity at y; θ0|0y = specificity at y.

Biases of naive estimators: RS or stratified on y.
ay (the proportion of data with Y = y having W = 1) is the naive estimator of αy = P(X = 1|Y = y). E(ay) = µy, where

µ0 = θ1|00 + α0(θ1|10 + θ0|00 − 1) and µ1 = θ1|01 + α1(θ1|11 + θ0|01 − 1).

With nondifferential misclassification,

µ0 = θ1|0 + α0(θ1|1 + θ0|0 − 1),  µ1 = θ1|0 + α1(θ1|1 + θ0|0 − 1),
µ1 − µ0 = (α1 − α0)(θ1|1 + θ0|0 − 1).

µ1 = µ0 is equivalent to α0 = α1 (independence) under nondifferential misclassification.
• The naive estimator of α1 − α0 estimates µ1 − µ0.
• With nondifferential misclassification in one variable and no misclassification in the other, naive tests of independence are valid in that they have the correct size (approximately). With differential misclassification these tests are not valid.

Bias in the naive estimator of the odds ratio.
The naive estimator of the odds ratio, Ψ̂naive = a1(1 − a0)/(a0(1 − a1)), is a consistent estimator of

Ψnaive = µ1(1 − µ0) / (µ0(1 − µ1)).

• The bias is difficult to characterize. It may go in either direction, either over- or under-estimating the odds ratio (on average), when the misclassification is differential.
• With nondifferential misclassification, as long as θ1|1 + θ0|0 − 1 > 0 (which will almost always hold), the naive estimator is biased towards the value of 1, in that either 1 ≤ Ψnaive ≤ Ψ or Ψ ≤ Ψnaive ≤ 1 (Gustafson (2004, Section 3.3)).
• Bias results are asymptotic/approximate. They may not work for small samples, and there is no guarantee of direction in a specific sample (see Jurek et al. (2005)).
[Figure: plots of the limiting value of the naive odds ratio versus the true odds ratio under nondifferential and differential misclassification.]
• odds ratio Ψ = .5, 1, 1.5 or 2.
• α1 = .05 to .95 by .3 (α0 chosen to yield the desired Ψ).
• Sensitivity and specificity ranged from .8 to 1 by .05; they were taken equal for the nondifferential cases.
Error in both variables.
The general misclassification model is

θwd|xy = P(W = w, D = d|X = x, Y = y).

If the misclassification is both nondifferential and independent, P(W = w, D = d|x, y) = θwd|xy = θw|x θ*d|y, with

θw|x = P(W = w|X = x) and θ*d|y = P(D = d|Y = y).

For a random sample, it is relatively easy to address bias in the naive estimators. The naive estimate of γwd is pwd, with

E(pwd) = µwd = Σx Σy θwd|xy γxy.   (7)

This yields the bias in pwd as an estimator of γwd and can be used to determine the biases (sometimes exact, often approximate) of other quantities in the same manner as in the previous sections.
For example, the approximate bias in the naive estimate of π1 − π0 is

µ11/µ1. − µ01/µ0. − (π1 − π0)

and for the odds ratio it is

µ11µ00/(µ01µ10) − γ11γ00/(γ01γ10).

The bias expressions are complicated; there are no general characterizations of the bias. The expressions and derivations are more complicated if we stratify on one of the misclassified variables.
3.3 Correcting using external validation data
The general approach using external data:
• p = vector of proportions from the main study (either cell proportions or conditional proportions).
• φ = parameters of interest (what p naively estimates). E.g., p′ = (p00, p01, p10, p11) and φ′ = (γ00, γ01, γ10, γ11).
• Find E(p) = b + Bφ, where b and B are functions of the misclassification rates. In some cases b is 0.
• Estimate the misclassification rates, and hence b and B, using the external validation data.
• Get the corrected estimate φ̂ = B̂⁻¹(p − b̂). This feeds into other estimates (e.g., odds ratios).
• Getting an expression for Cov(φ̂) is "straightforward" in principle, but can be tedious. Exact expressions are available for some special cases.
• Bootstrapping: resample the main study data and the external validation data (similar to the one-proportion case).
Back to misclassification in X only.
External validation data at Y = y:

               W
             0            1
X = 0     nV00(y)      nV01(y)     nV0.(y)
    1     nV10(y)      nV11(y)     nV1.(y)
          nV.0(y)      nV.1(y)     nV(y)

θ̂1|1y = nV11(y)/nV1.(y) (estimated sensitivity at Y = y),
θ̂0|0y = nV00(y)/nV0.(y) (estimated specificity at Y = y),
θ̂0|1y = 1 − θ̂1|1y and θ̂1|0y = 1 − θ̂0|0y.

With nondifferential misclassification, we have θ̂1|1y = θ̂1|1, etc.
Suppose the design is either a random sample or case-control with the α's or the odds ratio of primary interest.
Recall: with a change of notation this covers the case of misclassification of a response over perfectly measured categories (for an RS or prospective study).

α̂0 = (a0 − θ̂1|00)/(θ̂1|10 − θ̂1|00)  and  α̂1 = (a1 − θ̂1|01)/(θ̂1|11 − θ̂1|01).

For nondifferential misclassification, replace θ̂1|10 with θ̂1|1, etc.
• Differential misclassification: cov(α̂0, α̂1) = 0,

V(α̂0) ≈ (1/Δ0²) [ V(a0) + (1 − α0)² V(θ̂1|00) + α0² V(θ̂1|10) ]
V(α̂1) ≈ (1/Δ1²) [ V(a1) + (1 − α1)² V(θ̂1|01) + α1² V(θ̂1|11) ]

where V(ay) = µy(1 − µy)/n.y, V(θ̂1|xy) = θ1|xy(1 − θ1|xy)/nVx.(y), Δ0 = θ1|10 − θ1|00, and Δ1 = θ1|11 − θ1|01.
• Nondifferential: Δ0 = Δ1 = Δ = θ1|1 − θ1|0, while nVx.(y) is replaced by nVx., and

cov(α̂0, α̂1) ≈ (1/Δ²) [ (1 − α0)(1 − α1)V(θ̂1|0) + α0α1V(θ̂1|1) ].

V(α̂0 − α̂1) = V(α̂0) + V(α̂1) − 2cov(α̂0, α̂1).

The estimate of L = log(OR) is

L̂ = log(α̂1) + log(1 − α̂0) − log(α̂0) − log(1 − α̂1).

(It is common to first get a confidence interval for L and then exponentiate to get a CI for the odds ratio.)

V(L̂) ≈ V(α̂0)/(α0²(1 − α0)²) + V(α̂1)/(α1²(1 − α1)²) − 2cov(α̂0, α̂1)/(α0(1 − α0)α1(1 − α1)).   (8)
SIDS/antibiotic example. Case-control study.

θ̂0|00 = .933 = estimated specificity at y = 0,
θ̂1|10 = .568 = estimated sensitivity at y = 0,
θ̂0|01 = .867 = estimated specificity at y = 1, and
θ̂1|11 = .630 = estimated sensitivity at y = 1.

Differential misclassification
Quantity     Naive    Corrected   SE     Wald CI       Boot SE   Boot CI
α0           .174     .215
α1           .216     .167
Odds ratio   1.309    .7335       .403   (.25, 2.15)   .532      (.167, 2.17)

Nondifferential misclassification
Quantity     Naive    Corrected   SE     Wald CI       Boot SE   Boot CI
α0           .174     .150
α1           .216     .234
Odds ratio   1.309    1.73        .555   (.92, 3.24)   .222      (.963, 4.00)
• The assumption of nondifferential misclassification is questionable.
• As Greenland (1988) illustrated and discussed, allowing differential misclassification produces very different results.
• Under differential misclassification the bootstrap mean is .8260, leading to a bootstrap estimate of bias in the corrected estimator of .8260 − .7335 = .0925.
• The calculations for the example were done using SAS-IML. Certain aspects of the analysis can also be carried out using the GLLAMM procedure in STATA; see Skrondal and Rabe-Hesketh (2004).
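The differential correction above is easy to verify numerically. A Python check (not part of the original notes) using the main-study proportions and the exact validation counts:

def corrected_alpha(a, fp_rate, sens):
    # alpha-hat = (a - theta_{1|0y}) / (theta_{1|1y} - theta_{1|0y})
    return (a - fp_rate) / (sens - fp_rate)

a0, a1 = 101 / 580, 122 / 564                      # naive P(W = 1 | Y = y)
alpha0 = corrected_alpha(a0, 12 / 180, 21 / 37)    # y = 0 validation rates
alpha1 = corrected_alpha(a1, 22 / 165, 29 / 46)    # y = 1 validation rates
odds_ratio = alpha1 * (1 - alpha0) / (alpha0 * (1 - alpha1))
print(round(alpha0, 3), round(alpha1, 3), round(odds_ratio, 4))
# 0.215 0.167 0.7335, matching the corrected column above.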
3.4 Error in X and Y both

External data with both misclassified.



     p00        θ00|00   θ00|01   θ00|10   θ00|11      γ00
E (  p01  )  =  θ01|00   θ01|01   θ01|10   θ01|11      γ01
     p10        θ10|00   θ10|01   θ10|10   θ10|11      γ10
     p11        θ11|00   θ11|01   θ11|10   θ11|11      γ11

In the most general case there are 12 distinct misclassification rates involved, since each column of B sums to 1.

γ̂ = B̂⁻¹p.   (9)
Cov(θ̂) and Cov(p) can be obtained using multinomial results. These can be used along with the multivariate delta method to obtain an approximate expression for Cov(γ̂) and approximate variances for the estimated π's, their difference, the odds ratio, etc. The example below only utilizes the bootstrap for obtaining standard errors and confidence intervals.
Accident example. Treat the validation data as external, with both variables measured with error.
For bootstrapping, first a vector of main sample counts is generated using a multinomial with sample size nI = 26960 and a vector of probabilities p = (0.6482, .2502, .0799, .0216) based on the observed cell proportions. The validation data is generated by generating a set of counts for each (x, y) combination using a multinomial with sample size equal to nVxy, the number of observations in the validation data with X = x and Y = y, and probabilities given by the estimated misclassification rates associated with that (x, y) combination.
π0 and π1 are the probabilities of injury without and with seat belts, respectively.

                 Estimates               Bootstrap Analysis
Parameter    Naive    Cor.      Mean      Median     SE       90% CI
γ00          .648     .508      0.505     0.507      0.0278   (0.460, 0.547)
γ01          .250     .331      0.329     0.329      0.027    (0.289, 0.369)
γ10          .080     .110      0.108     0.108      0.025    (0.071, 0.148)
γ11          .022     .051      0.060     0.053      0.063    (0.026, 0.104)
π0           .279     .395      0.394     0.392      0.031    (0.350, 0.445)
π1           .213     .319      0.344     0.333      0.135    (0.160, 0.571)
π1 − π0      -.066    -.076     -0.050    -0.070     0.147    (-0.252, 0.198)

NAIVE: π̂0 = .2785, π̂1 = .2129, estimated difference = -.0656, SE = .0083, 90% confidence interval of (-0.0793, -0.0518).
3.5 Correcting using internal validation data
General strategies:
• Use the validation data to estimate the misclassification rates, then correct the naive estimators as above for external data. Sometimes referred to as the matrix method.
• Weighted average approach (assuming a random subsample). Use the true values from the validation data to get estimates in the usual fashion and to estimate the misclassification rates. Then correct the estimates from the part of the main study that is not validated (as above using external validation) and use a weighted average.
• Correct using estimated reclassification rates. This usually produces the MLEs and is more efficient. It can be used with designed two-phase studies. Sometimes called the inverse matrix method (ironically).

We illustrate the last approach here with an overall random sample. (The same developments work for other cases with redefinition of γ, Λ, µ and p.)

Error in X only: λx|wy = P(X = x|W = w, Y = y),

γxy = P(X = x, Y = y) = Σw λx|wy µwy,  where µwy = P(W = w, Y = y).
      γ00        λ0|00   λ0|10     0       0         µ00
      γ10        λ1|00   λ1|10     0       0         µ10
γ  =  γ01   =      0       0     λ0|01   λ0|11       µ01    =  Λµ.
      γ11          0       0     λ1|01   λ1|11       µ11
Error in X and Y both: λxy|wd = P(X = x, Y = y|W = w, D = d),

      γ00        λ00|00   λ00|01   λ00|10   λ00|11      µ00
      γ01        λ01|00   λ01|01   λ01|10   λ01|11      µ01
γ  =  γ10   =    λ10|00   λ10|01   λ10|10   λ10|11      µ10    =  Λµ,
      γ11        λ11|00   λ11|01   λ11|10   λ11|11      µ11

where µwd = P(W = w, D = d) and λxy|wd = P(X = x, Y = y|W = w, D = d).
E(p) = µ (p contains the cell proportions), and

γ̂ = Λ̂p.

• Cov(γ̂) is obtained using the delta method in combination with Cov(p) and the variance-covariance structure of the λ̂'s (from binomial and multinomial results):

Cov(γ̂) = Σγ̂ ≈ A + Λ Cov(p) Λ′,

where A is a 4 × 4 matrix with (j, k)th element equal to µ′Cjkµ, with Cjk = cov(Zj, Zk) and Z′j = the jth row of Λ̂.
Accident example with internal validation data. Treated as an overall random sample with both variables misclassified.
nI = 26960 cases with W and D only; nV = 576 validated cases.

       0.7726098   0.0808824   0.0975610   0.0833333
Λ̂  =   0.1524548   0.8676471   0.0243902   0
       0.0516796   0.0147059   0.7317073   0.1666667
       0.0232558   0.0367647   0.1463415   0.75

Naive analysis: p = (0.6487, .249, .0796, .0216), π̂0 = .2781, π̂1 = .2132, estimated difference of -.0649 (SE = .0083).
Corrected estimate and bootstrap analysis of the accident data with internal validation and both variables misclassified:

                                      Bootstrap
Variable    Estimate    Mean       SE        90% CI
γ00         .5310       .5310      0.5819    (0.5049, 0.5573)
γ01         .3177       .3175      0.3682    (0.2949, 0.3411)
γ10         .0994       .0994      0.1303    (0.0836, 0.1148)
γ11         .0520       .0520      0.0799    (0.0395, 0.0651)
π0          .3743       .3742      0.4267    (0.3481, 0.4001)
π1          .3447       .3433      0.4883    (0.2712, 0.4181)
π1 − π0     -.0297      -.0309     0.1121    (-0.1070, 0.0489)
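As a check on the inverse matrix method, γ̂ = Λ̂p and the implied π's can be computed directly in Python (numpy; not part of the original notes). Small differences from the table come from rounding in the displayed p and Λ̂.

import numpy as np

Lam = np.array([  # rows: (x, y) = 00, 01, 10, 11; columns: (w, d) = 00, 01, 10, 11
    [0.7726098, 0.0808824, 0.0975610, 0.0833333],
    [0.1524548, 0.8676471, 0.0243902, 0.0],
    [0.0516796, 0.0147059, 0.7317073, 0.1666667],
    [0.0232558, 0.0367647, 0.1463415, 0.75]])
p = np.array([0.6487, 0.249, 0.0796, 0.0216])

gamma = Lam @ p                            # corrected cell probabilities
pi0 = gamma[1] / (gamma[0] + gamma[1])     # P(injury | no seat belt)
pi1 = gamma[3] / (gamma[2] + gamma[3])     # P(injury | seat belt)
print(gamma.round(4), round(pi0, 4), round(pi1, 4))
# approx. [0.5309, 0.3169, 0.0990, 0.0521], 0.3738, 0.3447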
4 Simple Linear Regression with additive error
We look at the simple linear case first to get the main ideas across in a simple fashion, then move to multiple regression.

Yi|xi = β0 + β1xi + εi,  E(εi) = 0, V(εi) = σ².

The εi (referred to as error in the equation) are uncorrelated.
With random Xi there may be interest in the correlation ρ = σXY/(σXσY) = β1(σX/σY).

D = observed value of Y (this is equal to Y if there is no error in the response).
W = measured value of X.

Examples
• Soil nitrogen/corn yield example from Fuller (1987). Nitrogen content (X) is estimated through sampling; the measurement error variance is treated as known. Y is treated as measured without error.
• Gypsy moth egg mass concentrations and defoliation.
The model is defined in terms of values on sixty hectare units. It is very costly to get exact values, so m subplots that are .01 hectare in size are subsampled.
wi = estimated gypsy moth egg mass
di = estimated defoliation (percent)

FOREST   mi     di       σ̂qi    wi        σ̂ui     σ̂uqi
GW       20     45.50    6.13    69.50    12.47    13.20
GW       20    100.00    0.00   116.30    28.28     0.00
GW       20     25.00    2.67    13.85     3.64     1.64
GW       20     97.50    1.23   104.95    19.54     0.76
GW       20     48.00    2.77    39.45     7.95    -2.74
GW       20     83.50    4.66    29.60     6.47    12.23
MD       15     42.00    8.12    29.47     7.16    -5.07
MD       15     80.67    7.27    41.07     9.57    18.66
MD       15     35.33    7.92    18.20     3.88    18.07
MD       15     14.67    1.65    15.80     4.32    -2.07
MD       15     16.00    1.90    54.99    26.28    25.67
MD       15     18.67    1.92    54.20    12.98    -1.31
MD       15     13.33    2.32    21.13     5.40    -3.17
MM       10     30.50    5.65    72.00    26.53    18.22
MM       10      6.00    0.67    24.00     8.84     1.78
MM       10      7.00    0.82    56.00    14.85    -5.78
MM       10     14.00    4.46    12.00     6.11    12.44
MM       10     11.50    2.48    44.00    28.25     8.22
• Functional case: the xi's are treated as fixed values.
• Structural case: the Xi's are random and xi = the realized value. Usually we assume X1, . . . , Xn independent with E(Xi) = µX and V(Xi) = σX², but this can be relaxed.

Naive estimators:

β̂1naive = SWD/SWW,  β̂0naive = D̄ − β̂1naive W̄,

σ̂²naive = Σi (Di − (β̂0naive + β̂1naive Wi))²/(n − 2),

W̄ = Σi Wi/n,  D̄ = Σi Di/n,

SWD = Σi (Wi − W̄)(Di − D̄)/(n − 1)  and  SWW = Σi (Wi − W̄)²/(n − 1).

4.1 The additive Berkson model and consequences

The "error-prone value", wi, is a fixed value while the true value, Xi, is random. This occurs when wi is a target "dose" (dose, temperature, speed). There are numerous nitrogen intake/nitrogen balance studies used to determine nutritional requirements; the target intake is w.
Xi = wi + ei,  E(ei) = 0, V(ei) = σe².

If there is error in Y then Di = yi + qi with E(qi) = 0 and V(qi) = σq². It is reasonable to assume qi is independent of ei.

Di = β0 + β1wi + ηi,

where ηi = β1ei + εi + qi with E(ηi) = 0 and V(ηi) = β1²σe² + σ² + σq².

Key point: with wi fixed, ηi is uncorrelated with wi (since any random variable is uncorrelated with a constant).
For the additive Berkson model, naive inferences for the coefficients, based on regressing D on w, are correct. If there is no error in Y, naive predictions of Y from w, including prediction intervals, are correct. If there is measurement error in Y then the naive prediction intervals for Y from w are not correct.
- This result depends on both the additivity of the error and the linear model. If Xi = λ0 + λ1wi + δi, the naive estimate of the slope estimates β1λ1 rather than β1.
4.1.1 Additive error and consequences
Given xi and yi,

Di = yi + qi  and  Wi = xi + ui,

E(ui|xi) = 0, E(qi|yi) = 0, V(ui|xi) = σ²ui, V(qi|yi) = σ²qi and Cov(ui, qi|xi, yi) = σuqi.

qi = the measurement error in Di as an estimate of yi;
ui = the measurement error in Wi as an estimate of xi.

The (ui, qi) are independent over i and uncorrelated with the εi.

• Either of the variables may be observed exactly, in which case the appropriate measurement error is set to 0, as is the corresponding variance and the covariance.
• The most general form here allows a heteroscedastic measurement error model in which the (conditional) measurement error variances, or covariance if applicable, may change with i. This can arise for a number of reasons, including unequal sampling effort on a unit or the fact that the variance may be related to the true value.

If X is random then it may be necessary to carefully distinguish the conditional behavior of u given X = x from the behavior of the "unconditional measurement error" W − X. This is a subtle point that we'll put aside here.

• Uncorrelatedness of (ui, qi) with εi is weaker than independence. It is implied just by the assumption that the conditional means are 0! The variances could depend on the true values (a type of dependence).
Constant ME variances/covariance: σ²ui = σu², σ²qi = σq², σuqi = σuq.
The behavior of naive analyses under additive measurement error.
Why should there be any trouble? With xi fixed,

Di = β0 + β1Wi + ε*i,  ε*i = −β1ui + εi + qi,   (10)

and ε*i is correlated with Wi, since Cov(Wi, ε*i) = σuqi − β1σ²ui. This violates one of the standard regression assumptions. One can also show E(ε*i|Wi = wi) is not 0 (unless β1 = 0).
The naive estimates of the regression coefficients and the error variance are typically biased. The direction and magnitude of the bias depend on a variety of things. If there is error in both variables, the correlation in the measurement errors plays a role.
Approximate/asymptotic biases:

E(β̂0naive) ≈ γ0,  E(β̂1naive) ≈ γ1  and  E(σ̂²naive) ≈ σδ², where

γ0 = β0 + µX(β1σu² − σuq)/(σX² + σu²),

γ1 = [σX²/(σX² + σu²)] β1 + σuq/(σX² + σu²) = β1 − [σu²/(σX² + σu²)] β1 + σuq/(σX² + σu²),

σδ² = σ² + σq² + β1²σX² − (β1σX² + σuq)²/(σX² + σu²),

with

µX = Σi E(Xi)/n,  σX² = E(SXX),  SXX = Σi (Xi − X̄)²/(n − 1),

σu² = Σ_{i=1}^n σ²ui/n,  σq² = Σ_{i=1}^n σ²qi/n  and  σuq = Σ_{i=1}^n σuqi/n.

This handles either the structural case (without the Xi necessarily being i.i.d.) or the functional case. In the functional case, Xi = xi is fixed, so σX² = Σ_{i=1}^n (xi − x̄)²/n and µX = Σ_{i=1}^n xi/n.

• If the X's are i.i.d. normal and the measurement errors are normal with constant variance and covariances, then the result above is exact in the sense that Di|wi = γ0 + γ1wi + δi, where δi has mean 0 and variance σδ².
Special case: no error in the response, or uncorrelated measurement errors.
This is an important and widely studied special case, with σuq = 0.
• The naive estimate of the slope estimates

γ1 = [σX²/(σX² + σu²)] β1 = κβ1,

where

κ = σX²/(σX² + σu²)

is the reliability ratio. When β1 ≠ 0 and σu² > 0 then |γ1| < |β1|, leading to attenuation (the slope is biased towards 0).
• γ1 = 0 if and only if β1 = 0, so the naive test of H0 : β1 = 0 is essentially correct.

The general statements that "the effect of measurement error is to underestimate the slope but the tests for 0 slope are still okay" are not always true. They do hold when there is additive error and either there is error in X only or, if there is measurement error in both variables, the measurement errors are uncorrelated.
Numerical illustrations of bias with error in X only: β0 = 0, β1 = 1, µX = 0, σX² = 1 and σ = 1.

[Figure: γ0 + γ1x plotted against x for κ = 1 (truth, solid line) and κ = .9, .7 and .5.]

The figure shows the attenuation in the slope and that a point on the regression line, β0 + β1x0, is underestimated if x0 > µX = 0 and overestimated if x0 < µX = 0.
4.2 Correcting For Measurement Error
There is a long history and a plethora of techniques available, depending on what is assumed known; see Fuller (1987) and Cheng and Van Ness (1996). We focus on a simple moment corrected estimator.
Let σ̂²ui, σ̂²qi and σ̂uqi denote estimates of the measurement error variances and covariance.
• If the measurement error variances are constant across i, drop the i subscript.
• In some cases (and in much of the traditional literature) it is assumed that these are known, in which case the "hats" would be dropped.
• There are a variety of ways these can be estimated, depending on the context.

Replication.
With additive error the measurement error variances and covariances are often estimated through replicate measurements:

Wi1, . . . , Wimi (replicate values of the error-prone measure of x),

Wij = xi + uij,  E(uij) = 0, V(uij) = σ²ui1 (the per-replicate variance),

W̄i = Σ_{j=1}^{mi} Wij/mi.
σ²ui = σ²ui1/mi,  σ̂²ui = S²Wi/mi,

where S²Wi = the sample variance of Wi1, . . . , Wimi.
Similarly, if there is error in Y,

σ̂²qi = s²Di/ki  and (if paired)  σ̂uqi = sWDi/mi,

where ki = the number of replicates of D on unit i (= mi when paired) and sWDi = the sample covariance of the replicate values.

REMARK: If the per-replicate measurement error variance is constant, we can pool across units without requiring replicates on each individual. More generally, the manner in which the estimated variances and covariance are obtained depends on the manner in which Wi (and Di) are generated on a unit.
Moment based correction:

β̂1 = (SWD − Σi σ̂uqi/n) / (SWW − Σi σ̂²ui/n),  β̂0 = D̄ − β̂1W̄,

ρ̂ = (SWD − Σi σ̂uqi/n) / [ (SWW − Σi σ̂²ui/n)^{1/2} (SDD − Σi σ̂²qi/n)^{1/2} ],

σ̂² = Σi ri²/(n − 2) − Σi (σ̂²qi + β̂1²σ̂²ui − 2β̂1σ̂uqi)/n,   (11)

where ri = Di − (β̂0 + β̂1Wi).
NOTE: If either SWW − Σi σ̂²ui/n, which estimates σX², or σ̂² is negative, some adjustments need to be made.

Motivation: conditional on the x's and y's,

E(SWD − Σi σ̂uqi/n) = Sxy  and  E(SWW − Σi σ̂²ui/n) = Sxx.

Or one can view these as bias corrections to the naive estimators.
• The corrected estimators are not unbiased, but they are consistent under general conditions.
• Under the normal structural model with normal measurement errors and constant known ME variances/covariances, these are close to the maximum likelihood estimators (as long as σ̂X² = SWW − σu² and σ̂² are nonnegative); they differ by the divisors being used.
• The sampling properties and how to get standard errors, etc., are discussed later under multiple linear regression.
• SPECIAL CASE: uncorrelated measurement errors and constant variance.

β̂1 = β̂1naive κ̂⁻¹,  κ̂ = σ̂X²/(σ̂X² + σ̂u²),   (12)

where κ̂ = the estimated reliability ratio and σ̂X² = SWW − σ̂u².
• Regression calibration approach: no error in y.
The motivation for RC will come later. With random X and some assumptions, the best linear predictor of X given W is µX + κ(w − µX). (Under normality and constant measurement error variance this is exactly E(X|W = w).) If we regress Yi on X̂i = W̄ + κ̂(wi − W̄), the estimated coefficients are identical to the moment corrected estimators.
Bootstrapping.
Bootstrapping needs to be approached carefully (a more general discussion comes later).
The data on the ith unit are (Di, Wi, θ̂i), where θ̂i contains the quantities associated with estimating the measurement error parameters. With error in X only, θ̂i = σ̂²ui, while with correlated errors in both variables, θ̂′i = (σ̂²ui, σ̂²qi, σ̂uqi).

One-stage bootstrap. Resample units; i.e., resample from {(Di, Wi, θ̂i)}. This is only justified with an overall random sample of units and a common manner of estimating the measurement error parameters on each unit (e.g., equal sampling effort). It will not work for non-random samples or if the sampling effort changes and is attached to the position in the sample (rather than being attached to the unit occurring in that position).
Two-stage bootstrap. Mimic the whole process (e.g., the occurrence of measurement error and of error in the equation). Consider error in x only.
x̂i = estimate of xi (could be Wi or the predicted value from E(X|w)).
On the bth bootstrap sample the ith observation is (Ybi, Wbi, θ̂bi), where

Ybi = β̂0 + β̂1x̂i + ebi

and ebi is a generated value having mean 0 and variance σ̂². Wbi and θ̂bi are created by mimicking how Wi and θ̂i arose. For example, with replication we would generate replicates Wbij = x̂i + ubij, where ubij has mean 0 and per-replicate variance mi σ̂²ui (so the mean W̄bi has variance σ̂²ui).
How to generate ebi? This is easy if we assume normality, but difficult to do nonparametrically. With measurement error we can't just use the residuals from the corrected fit since, even with known coefficients,

ri = Di − (β̂0 + β̂1Wi) = εi + qi − β1ui.

We need to unmix/deconvolve to remove the influence of the measurement error.

4.3 Examples

Defoliation example with error in both variables.
Assume a common linear regression model holds over all three forests (more on this later).
A bootstrap analysis is not provided here: since the 18 stands were not an overall random sample and the sampling effort is unequal, an analysis based on resampling observations could be misleading.

Method   β̂0       β̂1      σ̂²       SE(β̂0)   SE(β̂1)   v01      CI for β1
NAIVE    10.504    .6125    696.63    11.459    .2122    -2.042   (.163, 1.063)
COR-R    .8545     .8252    567.25    12.133    .182     -1.918   (.469, 1.181)
COR-N    .8545     .8252    567.25    15.680    .338     -4.848   (.163, 1.488)

[Figure: defoliation versus egg density with fitted lines. Naive: solid; corrected: dots; dots = robust CIs; dashed = normal CIs.]
Nitrogen-yield example. Y = corn yield (no error). X = nitrogen level, measured with error with σu² = 57. v01 = estimated covariance of β̂0 and β̂1.

Method    β̂0       β̂1     σ̂²       SE(β̂0)   SE(β̂1)   v01       CI (β1)
NAIVE     73.153    .344    57.32     9.951     .137     -1.328    (.034, .654)
COR-N     67.56     .423    49.235    12.56     .176     -2.157    (.081, .766)
COR-R     67.56     .423    49.234    10.787    .146     -1.547    (.137, .710)
COR-RC    67.56     .423    57.32     12.130    .169

Using the rcal function in STATA (the non-bootstrap SEs for the intercept are incorrect):
. mat input suu=(57)
. rcal (yield=) (w: nitrogen), suuinit(suu)
----------------------------------------------------------------
 yield |   Coef.   Std. Err.     t     P>|t|   [95% Conf. Interval]
-------+--------------------------------------------------------
     w |   .4232     .1712     2.47    0.035     .0358     .8105
 _cons |   67.56     9.951     6.79    0.000    45.053    90.075
----------------------------------------------------------------

. rcal (yield=) (w: nitrogen), suuinit(suu) robust
----------------------------------------------------------------
       |          Semi-Robust
 yield |   Coef.   Std. Err.     t     P>|t|   [95% Conf. Interval]
-------+--------------------------------------------------------
     w |   .4232     .1491     2.84    0.019     .0860     .7604
 _cons |   67.56     8.559     7.89    0.000    48.202    86.926
----------------------------------------------------------------

. rcal (yield = ) (w: nitrogen), suuinit(suu) bstrap brep(5000)
----------------------------------------------------------------
       |           Bootstrap
 yield |   Coef.   Std. Err.     t     P>|t|   [95% Conf. Interval]
-------+--------------------------------------------------------
     w |   .4232     .1927     2.20    0.056   -.01276     .8591
 _cons |   67.56     14.16     4.77    0.001    35.530    99.599
----------------------------------------------------------------
5 Multiple Linear Regression With Additive Error

5.1 Model and the performance of naive estimators
Yi|xi = β0 + Σ_{j=1}^{p−1} βjxij + εi = β′xi* + εi = β0 + β′1xi + εi,

xi = (x1, . . . , xp−1)′ and x′i* = (1, x1, . . . , xp−1),
β′ = (β0, β1, . . . , βp−1) = (β0, β′1),  β′1 = (β1, . . . , βp−1).

The εi are assumed uncorrelated with mean 0 and variance σ².
Without an intercept, xi = xi* and β = β1.
In matrix form: Y = Xβ + ε, with E(ε) = 0 and Cov(ε) = σ²I, where Y = (Y1, . . . , Yn)′, ε = (ε1, . . . , εn)′ and X has rows x′1*, . . . , x′n*.

Define

X̄ = Σi Xi/n,  Ȳ = Σi Yi/n,  SY² = Σi (Yi − Ȳ)²/(n − 1),

SXX = Σi (Xi − X̄)(Xi − X̄)′/(n − 1)  and  SXY = Σi (Xi − X̄)(Yi − Ȳ)/(n − 1).

SXX is a (p − 1) × (p − 1) matrix and SXY is a (p − 1) × 1 vector.
Mimicking what was done in the simple linear case,

µX = Σ_{i=1}^n E(Xi)/n  and  ΣXX = E(SXX).   (13)

This accommodates any mix of random and fixed predictors.
- In the structural case with the Xi i.i.d., µX = E(Xi) and ΣXX = Cov(Xi).
- In the functional case with all fixed predictors, µX = Σi xi/n and ΣXX = Sxx = Σi (xi − x̄)(xi − x̄)′/(n − 1).
- With a combination of fixed and random predictors, the expected value and covariance of Xi are conditional on any fixed components.

Additive measurement error model:
Given yi and xi,

Di = yi + qi,  Wi = xi + ui,

with E(qi|yi, xi) = 0, E(ui|yi, xi) = 0, V(qi|yi, xi) = σ²qi, Cov(ui|yi, xi) = Σui, and Cov(ui, qi|yi, xi) = Σuqi.

• This allows error in the response as well as the predictors. If there is no error in y then σ²qi = 0 and Σuqi = 0.
• If parts of xi are measured without error, the appropriate components of ui equal 0 and all of the components in Σui and Σuqi involving that variable also equal 0.
• The measurement error variances and covariances are allowed to change with i, with

Σu = Σ_{i=1}^n Σui/n,  Σuq = Σ_{i=1}^n Σuqi/n  and  σq² = Σ_{i=1}^n σ²qi/n.
Naive estimators:

β̂naive = (W′W)⁻¹W′D,

where W is the same as X but with W′i in place of x′i.

β̂1naive = S⁻¹WW SWD,  β̂0naive = D̄ − β̂′1naive W̄,

σ̂²naive = Σi (Di − D̂i)²/(n − p) = [(n − 1)/(n − p)] [SD² − β̂′1naive SWW β̂1naive],

where D̂i is the fitted value using the naive estimates.
Approximate/asymptotic properties of the naive estimators:

E(β̂1naive) ≈ γ1 = (ΣXX + Σu)⁻¹ΣXXβ1 + (ΣXX + Σu)⁻¹Σuq,
E(β̂0naive) ≈ γ0 = β0 + (β1 − γ1)′µX,
E(σ̂²naive) ≈ σ² + σq² + β′1ΣXXβ1 − γ′1(ΣXX + Σu)γ1.
Alternate expression:

E(β̂naive) ≈ γ = (MXX + Σu*)⁻¹(MXXβ + Σuq*),

MXX = Σi E(Xi*X′i*)/n = ΣXX* + µX*µ′X*,

µX* = (1, µ′X)′,  ΣXX* = [[0, 0′], [0, ΣXX]],  Σu* = [[0, 0′], [0, Σu]]  and  Σuq* = (0, Σ′uq)′.

In the functional case MXX = X′X/n.

• Similar to simple linear regression, the expectations are exact under the normal structural model with normal measurement error and a constant measurement error covariance matrix. Otherwise they are only approximate/asymptotic.
• If Σuq = 0 (i.e., no error in the response, or the error in the response is uncorrelated with any errors in the predictors) then

E(β̂1naive) = γ1 = (ΣXX + Σu)⁻¹ΣXXβ1 = κβ1,   (14)

where

κ = (ΣXX + Σu)⁻¹ΣXX   (15)

is referred to as the reliability matrix.
• These same approximations work for a number of generalized linear models (including logistic regression) with additive error in the predictors.
• Measurement error in one of the variables often induces bias in the estimates of all of the coefficients, including those for variables that are not measured with error.

Example: two predictors x1 and x2 with regression function β0 + β1x1 + β2x2, where x1 is subject to measurement error with variance σu² and there is no measurement error in either x2 or y. Then

Xi = (Xi1, Xi2)′  and  ΣXX = [[σ1², σ12], [σ12, σ2²]],

with the components of ΣXX interpreted accordingly, and

Σui = [[σu², 0], [0, 0]].

E(β̂1naive) ≈ [ (σ2²σ1² − σ12²) / (σ2²(σ1² + σu²) − σ12²) ] β1,

Bias in β̂1naive ≈ [ −σ2²σu² / (σ2²(σ1² + σu²) − σ12²) ] β1,

Bias in β̂2naive ≈ [ σ12σu² / (σ2²(σ1² + σu²) − σ12²) ] β1.

REMARKS:
• If σ12 = 0, then there is no bias in the naive estimator of β2. If the two predictors are "correlated" (σ12 ≠ 0) then the measurement error in x1 induces bias in the estimated coefficient for x2.
• The bias in the naive estimator of β2 (the coefficient associated with the variable measured without error) can be either positive or negative. It does not depend on β2, but does depend on β1.
• If σ12 = 0, then β̂1naive estimates κ1β1, where κ1 = σ1²/(σ1² + σu²). This means the attenuation is the same as in simple linear regression. It is critical to note that this requires X1 and X2 to be uncorrelated.

Illustration based on values in Armstrong et al. (1989): x2 = caloric intake (in 100's of grams), x1 = fiber intake (in grams). V(X2) = 33, V(X1) = 72.0, cov(X1, X2) = 16.4 (correlation of .336). Measurement error in fiber with measurement error variance σu² = 70.1.
• In a simple linear regression model with only x1 in it, the naive estimator of the coefficient for x1 is estimating κ × (true coefficient for x1), where κ = 72/(72 + 70.1) = .507. In a model with both x1 and x2, the naive estimate of the coefficient of x1 is estimating κ1β1, where κ1 = ((33)(72) − 16.4²)/(33(72 + 70.1) − 16.4²) = .4767. In this case, if we interpret the bias (or correct the naive estimator) by assuming the attenuation factor of .507 applies in the multiple setting, then we don't go too wrong. This is because the correlation between X1 and X2 is relatively low.
• If we change ρ to .8, then κ = .507 but κ1 = .270, while with ρ = .8 and σu² reduced to 10, κ = .878 but κ1 = .722. This illustrates how it can be misleading to try to characterize the bias in β1 using κ.

[Figure: plot of the attenuation factor (κ1) in the naive estimator of the coefficient of the mismeasured predictor, in a model with two variables, versus ρ = the correlation between the mismeasured variable and the second, perfectly measured, predictor. The reliability of the predictor by itself is κ = σ1²/(σ1² + σu²) = .9 (solid line), .7 (dotted line) and .5 (dashed line).]

The above can be generalized easily to multivariate X1 and X2 with error in X1 but not X2.
Analysis of covariance: we are interested in comparing group means where individuals (plants, people, etc.) are randomized to groups (e.g., treatments). The analysis of covariance adjusts for the effect of an additional variable x (the covariate), which is associated with the unit.
Example 1: Randomize varieties of tomato plants over an area and observe Y = yield. Use x = a measure of soil nitrogen at the location as a covariate.
Example 2: Randomize individuals with high cholesterol to different treatment regimes and look at the change in cholesterol after a certain period of time, with x = the person's level of physical activity (e.g., METS/day).
For the kth individual assigned to group j, the model is Yjk = τj + βx xjk + εjk, where xjk = the covariate.
µj = the population mean for group j = the expected average response if all units in the population get treatment j.
With randomization, µj = τj + βxµX, where µX is the population mean of the covariate. The τj are the adjusted group means; i.e., τj = µj − βxµX.
One can show that the dummy variables indicating group membership and the covariate are uncorrelated over the randomization. This is because the expected value of the covariate given the group assignment is always µX.

In the standard analysis of covariance model, which involves randomization to treatment groups and a constant coefficient for the covariate in each group, measurement error in the covariate does not influence inferences for the τ's and functions of them. In particular, inferences for contrasts in the group means (e.g., testing equality of group means and estimating differences in group means) are correct.

It is important to notice, however, that the naive estimators of the group means are not unbiased, since τ̂j,naive + β̂x,naive x̄ is a biased estimator of µj.
5.2 Correcting for Measurement Error
Moment correction: Generalizes the result for the simple linear regression case:

β̂1 = Σ̂_XX⁻¹ Σ̂_XY and β̂0 = D̄ − β̂1′W̄,

where Σ̂_XX = S_WW − Σ̂_u and Σ̂_XY = S_WD − Σ̂_uq (with no error in the response,
Σ̂_XY = S_WD). This is motivated by E(S_WW) = Σ_XX + Σ_u and E(S_WD) = Σ_XY + Σ_uq.
Alternate form:

β̂ = M̂_XX⁻¹ M̂_XY,

where

M̂_XX = W∗′W∗/n − c Σ̂_u∗ and M̂_XY = W∗′D/n − c Σ̂_uq∗,

with

Σ̂_u∗ = [ 0   0′  ]
        [ 0   Σ̂_u ]

(Σ̂_u padded with a zero row and column for the intercept; Σ̂_uq∗ similarly prepends a zero to Σ̂_uq).
The constant c can be taken to be either c = 1 or c = (n − 1)/n; the choice c = (n − 1)/n
yields the estimators above.
The error variance is estimated by

σ̂² = Σ_i (D_i − (β̂0 + β̂1′W_i))²/(n − p) − (σ̂_q² − 2β̂1′Σ̂_uq + β̂1′Σ̂_u β̂1).

If a divisor of n − 1 is used in the first piece above, then

σ̂² = S_D² − σ̂_q² − β̂1′Σ̂_XX β̂1.
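A sketch of the moment correction in Python (a hypothetical helper, not the notes' code; it assumes a known, common measurement error covariance Σ_u and takes σ_q² = 0 in the variance estimate):

import numpy as np

def moment_corrected_mlr(W, D, Sigma_u, Sigma_uq=None):
    # W: n x p error-prone predictors; D: length-n response;
    # Sigma_u: p x p measurement error covariance (known or estimated);
    # Sigma_uq: covariance of u_i with error in the response (zero if D = Y).
    n, p = W.shape
    if Sigma_uq is None:
        Sigma_uq = np.zeros(p)
    S = np.cov(W, D, rowvar=False)          # (p+1) x (p+1), divisor n - 1
    S_WW, S_WD = S[:p, :p], S[:p, p]
    Sxx_hat = S_WW - Sigma_u                # estimates Sigma_XX
    Sxy_hat = S_WD - Sigma_uq               # estimates Sigma_XY
    beta1 = np.linalg.solve(Sxx_hat, Sxy_hat)
    beta0 = D.mean() - W.mean(axis=0) @ beta1
    # sigma^2 from the S_D^2 form above, taking sigma_q^2 = 0:
    sigma2 = np.var(D, ddof=1) - beta1 @ Sxx_hat @ beta1
    return beta0, beta1, sigma2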
• These are essentially the maximum likelihood estimates under normality.
• With no error in Y, this is equivalent to a regression calibration estimator
obtained by regressing Yi on x̂i = w̄ + κ̂′(wi − w̄) (sketched below).
• As in simple linear regression, we need a modification if certain matrix estimates
(e.g., M̂_XX) are not positive definite or if σ̂² < 0.
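A sketch of the regression calibration equivalence for a single mismeasured predictor (a known measurement error variance σ_u² is assumed); the fitted slope equals s_wy/(s_ww − σ_u²), the moment-corrected slope:

import numpy as np

def regression_calibration_slr(w, y, sigma_u2):
    # impute x-hat = w-bar + kappa-hat (w - w-bar), then ordinary least squares
    s_ww = np.var(w, ddof=1)
    kappa_hat = (s_ww - sigma_u2) / s_ww               # estimated reliability
    x_hat = w.mean() + kappa_hat * (w - w.mean())      # calibrated predictor
    X = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0]        # (beta0-hat, beta1-hat)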
5.3 Sampling Properties and Approximate Inferences
Small sample properties of β̂ are difficult to ascertain. Asymptotic properties are
treated in detail in Fuller (1987) and summarized in Buonaccorsi (2010).
Under some general conditions,

β̂ ∼ AN(β, Σ_β), with Σ_β = M_XX⁻¹ H M_XX⁻¹

and

H = Cov(Σ_{i=1}^n Δ_i)/n², where
Δ_i = W_i∗(D_i − W_i∗′β) − (Σ̂_uqi∗ − Σ̂_ui∗β) = T_i − Z_i,
e_i = ǫ_i + q_i − u_i′β1, T_i = W_i∗e_i and Z_i = Σ̂_uqi∗ − Σ̂_ui∗β,

so that E(Δ_i) = 0.
With an estimate Σ̂_β, one can use large sample normal based Wald methods.
Considerations regarding the form of Σ_β and how to estimate it:
• Do we have separately estimated measurement error parameters for each i?
• Are the measurement error parameters constant over i?
• Can the estimated measurement error parameters be treated as known?
• Is ǫi assumed normal?
• Are the measurement errors assumed normal?
• If the measurement error variances and covariances are estimated what can
we say about their sampling properties?
• If replication is involved are the replicates assumed normal?
• Robust estimate:

Σ̂_β,Rob = M̂_XX⁻¹ Ĥ_R M̂_XX⁻¹.

• Normal based, with known M.E. parameters ((3.1.23) in Fuller):

Σ̂_β,N = M̂_XX⁻¹ Ĥ_N M̂_XX⁻¹.

Here

Ĥ_R = Σ_i Δ̂_i Δ̂_i′/(n(n − p)) and Ĥ_N = Σ_{i=1}^n (W_i∗W_i∗′ σ̂_ei² + Ẑ_i Ẑ_i′)/n²,

with residuals r_i = D_i − W_i∗′β̂,

Δ̂_i = W_i∗r_i − (Σ̂_uqi∗ − Σ̂_ui∗β̂), Ẑ_i = Σ̂_uqi∗ − Σ̂_ui∗β̂,

and

σ̂_ei² = σ̂² + σ̂_qi² − 2β̂1′Σ̂_uqi + β̂1′Σ̂_ui β̂1.
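A sketch of the robust estimate in the simplest setting (constant, known measurement error parameters, no error in the response, c = 1); hypothetical code following the formulas above, not the notes' own:

import numpy as np

def robust_cov(W, D, beta_hat, Sigma_u):
    # sandwich estimate Sigma_beta_Rob; W: n x p, beta_hat: length p + 1
    n, p = W.shape
    Wstar = np.column_stack([np.ones(n), W])       # design including intercept
    Su_star = np.zeros((p + 1, p + 1))
    Su_star[1:, 1:] = Sigma_u                      # zero row/column for intercept
    M_XX = Wstar.T @ Wstar / n - Su_star           # c = 1 version
    r = D - Wstar @ beta_hat                       # residuals D_i - W_i*' beta-hat
    Z = -Su_star @ beta_hat                        # Sigma_uq* = 0 assumed here
    Delta = Wstar * r[:, None] - Z                 # row i is Delta_i-hat
    H_R = Delta.T @ Delta / (n * (n - (p + 1)))    # robust H estimate
    Minv = np.linalg.inv(M_XX)
    return Minv @ H_R @ Minv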
- Without measurement error, Σ̂_β,Rob is White's robust estimate and Σ̂_β,N =
σ̂²(X′X)⁻¹, the usual covariance estimate for linear regression with constant
variance.
- There are other possibilities under various assumptions.
Bootstrapping. The same comments made for simple linear regression apply
here.
- Be careful about just resampling the units; this is only applicable under certain assumptions.
- Two stage resampling (mimicking the randomness in the original model) is
difficult to do nonparametrically, but a parametric bootstrap is fairly straightforward.
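A minimal sketch of the one-stage (case resampling) bootstrap, reusing the hypothetical moment_corrected_mlr helper sketched earlier; as noted, its validity rests on the units being an i.i.d. sample, and in practice some resamples may require the modifications for negative matrices mentioned above:

import numpy as np

def boot_corrected(W, D, Sigma_u, B=2000, seed=0):
    # resample (W_i, D_i) pairs with replacement and re-correct each time
    rng = np.random.default_rng(seed)
    n = len(D)
    est = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        b0, b1, _ = moment_corrected_mlr(W[idx], D[idx], Sigma_u)
        est.append(np.concatenate(([b0], b1)))
    est = np.array(est)
    return est.std(axis=0), np.percentile(est, [2.5, 97.5], axis=0)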
There are a variety of modifications and alternate estimators that can be
considered. These include (but are not limited to):
• Maximum likelihood estimators under normality and particular restrictions
on some parameters (includes orthogonal least squares).
• Small sample modifications.
• Weighted corrected estimators.
• Estimation using instrumental variables.
Instead of replication or estimated M.E. variances and covariances, we have an
r × 1 vector of instruments Ri (r ≥ p = number of elements of xi∗).
Variables measured without error are included in Ri, so they are instruments
for themselves. For Ri to be instrumental for xi∗ requires:
1. Ri is uncorrelated with ui, qi and ǫi.
2. Ri is correlated with xi∗.
SAS (PROC SYSLIN), Stata (ivreg) and other programs will fit this.
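A generic two-stage least squares sketch (one standard way instrumental variable estimates are computed; not the notes' or any package's own code):

import numpy as np

def iv_2sls(Wstar, D, R):
    # Wstar: n x p design (intercept included); R: n x r instruments, r >= p
    first = np.linalg.lstsq(R, Wstar, rcond=None)[0]   # first-stage coefficients
    W_hat = R @ first                                  # fitted (instrumented) design
    beta = np.linalg.lstsq(W_hat, D, rcond=None)[0]    # second-stage regression
    return beta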
Some other issues.
• Residual analysis for model assessment
• Prediction
5.4 Examples
Defoliation example revisited
Allow a different intercept for each forest. Forest j model: τj + βx x. So

Yi = β′xi + ǫi, with xi′ = (xi1, xi2, xi3, xi) and β′ = (τ1, τ2, τ3, βx),

where xi1, xi2 and xi3 are dummy variables indicating forest membership.

 0


 0
Σui = 
 0

0
τ1
τ2
τ3
βx
σ2
Naive (SE)
39.21 (14.65)
16.78 (10.26)
-4.49 (12.37)
0.44 (0.19)
455.24
0
0
0
0
Cor
26.86
10.12
-12.73
0.638
381.87
0
0
0
0
0
0
0
2
σui
SE-R
21.26
11.31
12.75
0.239











and Σuqi

 0 




 0 

.
= 


 0 


σuqi
CI-R
(-14.82,68.54)
(-12.05, 32.30)
(-37.72, 12.25)
( 0.17, 1.11)
SE-N
20.66
13.26
16.31
0.321
CI-N
(-13.63, 67.35)
(-15.86, 36.11)
(-44.70, 19.24)
( 0.01, 1.27)
R: robust standard error. N: “normal” based.
- Substantial correction for attenuation in the estimate of βx, from a naive .44 to a
corrected estimate of .64.
- Some relatively large changes in the estimated intercepts. Recall that naive
estimates of coefficients of perfectly measured predictors can also be biased. Here
this is due to the different estimated egg mass densities (62.3, 33.6 and 41.6) over the
three forests, indicating "correlation" between the dummy variables
indicating forest stand and the egg mass density.
Hypothesis testing:

Null Hypothesis   Naive (P-value)   Robust (P-value)   Normal (P-value)
βx = 0            2.32 (.036)       2.67 (.008)        1.98 (.047)
τ1 = τ2 = τ3      5.24 (.020)       12.58 (.002)       7.29 (.026)

Σ̂_β,naive =
  [ 214.75   74.82   92.77  −2.23
     74.82  105.34   49.98  −1.20
     92.77   49.98  153.02  −1.49
     −2.23   −1.20   −1.49   0.04 ]

Σ̂_β,rob =
  [ 452.17  122.02  220.91  −4.77
    122.02  128.03   66.47  −1.31
    220.91   66.47  162.52  −2.74
     −4.77   −1.31   −2.74   0.06 ]

Σ̂_β,norm =
  [ 426.75  194.84  234.53  −5.99
    194.84  175.77  132.19  −3.37
    234.53  132.19  266.06  −4.06
     −5.99   −3.37   −4.06   0.10 ].
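As a check, the robust results in the testing table can be reproduced, up to rounding (the matrix entries above are rounded to two decimals), from the corrected estimates and Σ̂_β,rob:

import numpy as np
from math import erfc, exp, sqrt

beta = np.array([26.86, 10.12, -12.73, 0.638])    # corrected (tau1, tau2, tau3, beta_x)
S_rob = np.array([[452.17, 122.02, 220.91, -4.77],
                  [122.02, 128.03,  66.47, -1.31],
                  [220.91,  66.47, 162.52, -2.74],
                  [ -4.77,  -1.31,  -2.74,  0.06]])

# beta_x = 0, using the reported robust SE (.239; the matrix entry .06 is rounded):
z = 0.638 / 0.239
print(round(z, 2), erfc(abs(z) / sqrt(2)))        # 2.67 and two-sided p of about .008

# tau1 = tau2 = tau3 via the contrasts (tau1 - tau2, tau2 - tau3):
C = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0,  1.0, -1.0, 0.0]])
v = C @ beta
Wstat = v @ np.linalg.solve(C @ S_rob @ C.T, v)
print(round(Wstat, 2), exp(-Wstat / 2))           # about 12.6 and chi-square(2) p of about .002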

House price example
Data from DASL (Data and Story Library) in StatLib ( “a random sample
of records of resales of homes from Feb 15 to Apr 30, 1993 from the files
maintained by the Albuquerque Board of Realtors. This type of data is
collected by multiple listing agencies in many cities and is used by realtors
as an information base.”)
Variable   Mean     SD       Min   Max
PRICE      1077     383.99   540   2150
SQFT       1667     531.34   837   3750
TAX        793.49   308.18   223   1765
House price is in hundreds of dollars, so the mean price is $107,700 (the good
old days). The correlation between recorded square footage and tax level is
.8586.
Regress sale price on square footage and the yearly tax using the 107 (out
of 117) cases that have all three variables, assuming square footage is measured
with error having a constant, known measurement error variance σ_u² (a hypothetical call using the earlier sketch appears after the bullets below).
• The bootstrap is one-stage (resampling cases). 52 out of 5000 bootstrap
samples required modification because some matrices were negative (not positive definite).
• There is a fairly big difference between the robust and the "normal" based estimates of
Σ_β, probably due to heteroscedasticity in the error in the equation. Use
robust or bootstrap methods.
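A hypothetical application of the earlier moment_corrected_mlr sketch to these data (file name and column layout are assumed, not from the notes); tax is treated as error-free and σ_u = 175 for square footage:

import numpy as np

data = np.genfromtxt("albuquerque.csv", delimiter=",", names=True)  # assumed file
W = np.column_stack([data["TAX"], data["SQFT"]])
Sigma_u = np.diag([0.0, 175.0 ** 2])                 # error only in SQFT
b0, b1, s2 = moment_corrected_mlr(W, data["PRICE"], Sigma_u)
print(b0, b1, s2)   # should roughly reproduce the corrected estimates in the table below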
σu = 175; Cor-R: corrected estimates with robust standard errors; Cor-N: corrected with
normal based standard errors. Standard errors in parentheses.

Method          Intercept          Tax           SqFt
Naive:          88.34 (55.81)      .721 (.107)   .250 (.062)
Naive: RobSE    88.34 (87.70)      .721 (.178)   .250 (.131)
Cor-R:          1.71 (171.05)      .46 (.41)     .43 (.30)
Cor-N:          1.71 (74)          .46 (.18)     .43 (.11)
Bootstrap
  Mean (SE)     -104.15 (241.84)   .21 (.59)     .61 (.43)

Confidence intervals

Naive: CI       (-22.34, 199.01)   (.508, .933)    (.127, .373)
Cor-R: CI       (-333.55, 336.98)  (-0.34, 1.26)   (-0.16, 1.01)
Cor-N: CI       (-143.3, 146.75)   (0.11, 0.81)    (0.20, 0.65)
Boot: CI        (-628.66, 195.33)  (-1.08, 0.96)   (0.09, 1.56)
STATA analyses using the rcal function.

------------------------------------------------------------------------
 price |   Coef.   Std. Err.     t     P>|t|    [95% Conf. Interval]
-------+----------------------------------------------------------------
   tax |    .46    .3218453    1.43    0.156    -.1777726    1.09869
     w |    .43    .2236257    1.90    0.060    -.0174522    .8694642
 _cons |   1.71    55.80885    0.03    0.976    -108.9563    112.3858
------------------------------------------------------------------------
       |  Semi-Robust
 price |   Coef.   Std. Err.     t     P>|t|    [95% Conf. Interval]
-------+----------------------------------------------------------------
   tax |    .46    .4149749    1.11    0.270    -.362452     1.283369
     w |    .43    .2979442    1.43    0.156    -.1648285    1.016841
 _cons |   1.71    87.70255    0.02    0.984    -172.2027    175.6322
------------------------------------------------------------------------
       |  Bootstrap
 price |   Coef.   Std. Err.     t     P>|t|    [95% Conf. Interval]
-------+----------------------------------------------------------------
   tax |    .46    .5837385    0.79    0.432    -.6971165    1.618034
     w |    .43    .4241468    1.00    0.318    -.415093     1.267105
 _cons |   1.71    239.2451    0.01    0.994    -472.7173    476.1468
------------------------------------------------------------------------

The corresponding σ̂² estimates are 30314.0 (naive) and 27094.1 (corrected;
the same point estimate for the semi-robust and bootstrap analyses).
[Figure: illustration of the impact of measurement error on the estimated coefficients for tax and square footage, their correlation, and the variance of the error in the equation, as σu ranges from 0 to 200 in steps of 10 (the reliability ratio for square footage alone goes from 1 down to .876). Four panels plot each quantity against σu.]