Download Notation summary

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Linear regression wikipedia , lookup

Regression analysis wikipedia , lookup

Expectation–maximization algorithm wikipedia , lookup

Bias of an estimator wikipedia , lookup

German tank problem wikipedia , lookup

Choice modelling wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Coefficient of determination wikipedia , lookup

Resampling (statistics) wikipedia , lookup

Transcript
Notation.1
STAT 950
Notation summary
Fall 2012
The notation associated with the bootstrap can be difficult to get used to at first. The purpose of this
document is to give you a summary of the notation and particular equivalences in it. I recommend
printing this notation handout with two pages per sheet of paper.
Chapter 2
1) F is a CDF and is used notationally to represent the distribution of Y (the random variable of
interest)
2) General parameter of interest: 
a) Written as a statistical function: t(F)
b) This parameter could be E(Y), a parameter in a probability distribution, a function of other
parameters, or simply just a particular population quantity
3) General statistic of interest: T
a) Written as a statistical function: t( F̂ )
b) T is a random variable; it is an estimator
c) t is the observed value of the random variable; it is an estimate
4) Y is a random variable with distribution of F
a) Shorthand notation for specifying the distribution of Y: Y ~ F
b) F is estimated by F̂
c) When the parametric form for F is known, we can estimate the parameters and substitute them
back into F to produce F̂ ; sometimes, this estimate is denoted by Fˆ where ̂ is a vector of
estimated parameters (possibly containing )
d) When F is unknown, we can estimate it using the EDF and denote it by F̂
e) Expected value of the random variable Y: E(Y) = E(Y|F)
5) Sampling distribution of T: G
a) Other notational representations of G: G(t)  G(t | F)
b) Frequently, we will look at the sampling distribution of T –  and this will also be denoted by the
same way as here: G = G(t)  G(t | F)
ˆ  G(t)
ˆ  G(t | F)
ˆ – This is what the bootstrap is trying to find
c) Estimate of G: G
d) th quantile of Ĝ(t) : Ĝ1 ( )
6) Y is a random variable with distribution of F̂
a) Shorthand notation for specifying the distribution of Y: Y ~ F̂
b) Expected value of the random variable Y*: E(Y) = E(Y| F̂ ), notice there is no  on the last E( )
c) Resample of size n from F̂ : Y1,Y2, ,Yn
d) Resamples taken are r = 1, …, R
e) The rth actual resample from F̂ : yr1,yr2 , ,yrn
7) The statistic for the resample: T
a) T is a statistic calculated on Y1,Y2, ,Yn
b) t is an observed value of the statistic calculated on y1,y2 , ,yn
 2012 Christopher R. Bilder
Notation.2
c) For the rth resample from F̂ , tr is the observed value of the statistic calculated from

yr1,yr2 , ,yrn
d) Expected value of the random variable T: E(T) = E(T| F̂ ), notice there is no  on the last E( )
ˆ  G(u)
ˆ
ˆ  P(T  u | F)
ˆ
 G(u | F)
e) The estimated G: G
i) Frequently, we will want the sampling distribution of T – , and we will also call this G
ii) The estimated sampling distribution of T –  is the distribution of T– t:
ˆ  G(u)
ˆ
ˆ  P(T  t  u | F)
ˆ
G
 G(u | F)
ˆ , notice there is no  on the
f) Equivalent probability representations P (T  t  u)  P(T  t  u | F)
last P( ).
8) Estimated values through using R resamples
1 R
a) The estimated value of E(T):  Tr
R r 1
b) The estimated value of Ĝ(t) using R resamples: ĜR (t)
c) The estimated value of the th quantile of Ĝ(t) using R resamples: t((R1) )) , which is the (R+1)

ordered value from t(1) ,...,t(R)
.
9) Bias of T: 
a) True bias:  = b(F) = E(T|F) – t(F) = E(T) – 
b) Estimate of bias: B = b( F̂ ) = E(T| F̂ ) – t( F̂ ) = E(T) – t
1 R 
c) The estimated value of the bias using R resamples: BR    Tr   t
 R r 1 
10) Double bootstrap
a) F̂ represents the distribution of the resample
b) F̂r represents the distribution of the rth resample involving yr1 , …, yrn
c) tr is the statistic of interest calculated on a resample from F̂r
11) Influence function of T: Lt(y; F)
a) Empirical influence function: Lt(y; F̂ ) = l(y)
b) Empirical influence function values: lj(yj) = lj
c) Jackknife estimate of lj: ljack,j
d) Regression estimate of lj: lˆj
12) Estimates of variance
a) Nonparametric -method variance: vL = vL( F̂ )
b) Jackknife estimate: vjack
c) Regression estimate: vreg
d) Bootstrap estimate: vboot
e) All of these can be calculated on a resample. For example, vL is the nonparametric -method
variance calculated on a resample.
 2012 Christopher R. Bilder
Notation.3
Chapter 3
1) Notation in Section 3.3
a) ij is one of the resampled eij ; the i index on ij does not mean it originally came from sample i.
2) Notation in Section 3.9
a) E(T| F̂ ) is equivalent to E(T) and E(T| F̂ ) from Chapter 2; BMA does have some
inconsistency in their notation
b) Statistic of interest calculated on the resample: t(Fˆ  )  T
c) Expected values with respect to the resampling distribution: E(T) = E(T F̂ ) = E(T F̂ )
d) Estimated distribution of the re-resample: F̂
e) Statistic of interest calculated on the re-resample: t(Fˆ  )  T
3) Notation in Section 3.10
a) Observed statistic value without the jth observation: t  j
b) Mean value of statistic calculated on the resamples that do not have the jth observation:
1
tj 
tr

R j r:y j not in resample


Chapter 4
1) Probability distribution under Ho: F̂0
2) P-value: p
a) When resampling is used to estimate the p-value, this estimate is often denoted by p*
b) Unfortunately, BMA is inconsistent with their notation so sometimes it may not be denoted by
p*
3) Studentized statistic assuming Ho is true: z0
a) Studentized statistic with resamples taken from F̂0 : z0   t  0 
b) Studentized statistic with resamples taken from F̂ : z   t  t 
Chapter 5
1) Confidence interval for 
a) Lower limit: ˆ 
b) Upper limit: ˆ1
c) ˆ  and ˆ1 are the limits of a (1 – 2)100% confidence interval
2) Transformations
a) u = h(t)
b)  = h()
3) BCa interval
a) Acceleration constant: a
b) Bias correction: w
c) Adjusted -level: 
 2012 Christopher R. Bilder
v0
v
Notation.4
Chapter 6
1) Raw residuals: ej  y j  ˆ j where ˆ j  ˆ 0  ˆ1x j ; note that eij was a little different in Section 3.3.
y j  ˆ j
2) Modified residuals: rj 
1/2
1  hj 
3) Standardized residuals:
y j  ˆ j
s 1  hj 
1/2
4) Null hypothesis value of 1 in a regression model: 1,0
5) Predicted values of E(Y) under the null hypothesis model: ̂0
6) Adjusted response variable value under the null hypothesis: Yj = Yj – 1,0xj
7) Observed value of the mean square error using regression modeling: s2
8) Testing Ho:  = 0 vs. Ha:  0 in the full model of Y = X +  = X0 + X1 + 
a) Reduced model: Y = X0 + 0
b) Residuals for reduced model: e0
c) Residuals for model with X1 as the response variable(s) and X0 as the explanatory variables:
X1.0  (I  X0 (X0 X0 )1 X0 )X1
9) Unknown individual response: Y+; vector of corresponding explanatory variable values: x+
ˆ   Y  xˆ   x   
10) Accuracy of point predictor:   Y
ˆ    Y
ˆ      xˆ    xˆ   
a) Bootstrap version of the accuracy:   Y
1
ˆ  )  sy  s 1  x  XX  x
11) Var(Y  Y
Chapter 7
1) Standardized Pearson residual rPj  (y j  ˆ j )
(1 hj )V(ˆ j )
2) Standardized deviance residual rDj  dj (1  hj )1/2 where dj  sign(y j  ˆ j ) 2 j (y j )  j (ˆ j )
1/2
 2012 Christopher R. Bilder