Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Linear regression wikipedia , lookup
Regression analysis wikipedia , lookup
Expectation–maximization algorithm wikipedia , lookup
Bias of an estimator wikipedia , lookup
German tank problem wikipedia , lookup
Choice modelling wikipedia , lookup
Bootstrapping (statistics) wikipedia , lookup
Notation.1 STAT 950 Notation summary Fall 2012 The notation associated with the bootstrap can be difficult to get used to at first. The purpose of this document is to give you a summary of the notation and particular equivalences in it. I recommend printing this notation handout with two pages per sheet of paper. Chapter 2 1) F is a CDF and is used notationally to represent the distribution of Y (the random variable of interest) 2) General parameter of interest: a) Written as a statistical function: t(F) b) This parameter could be E(Y), a parameter in a probability distribution, a function of other parameters, or simply just a particular population quantity 3) General statistic of interest: T a) Written as a statistical function: t( F̂ ) b) T is a random variable; it is an estimator c) t is the observed value of the random variable; it is an estimate 4) Y is a random variable with distribution of F a) Shorthand notation for specifying the distribution of Y: Y ~ F b) F is estimated by F̂ c) When the parametric form for F is known, we can estimate the parameters and substitute them back into F to produce F̂ ; sometimes, this estimate is denoted by Fˆ where ̂ is a vector of estimated parameters (possibly containing ) d) When F is unknown, we can estimate it using the EDF and denote it by F̂ e) Expected value of the random variable Y: E(Y) = E(Y|F) 5) Sampling distribution of T: G a) Other notational representations of G: G(t) G(t | F) b) Frequently, we will look at the sampling distribution of T – and this will also be denoted by the same way as here: G = G(t) G(t | F) ˆ G(t) ˆ G(t | F) ˆ – This is what the bootstrap is trying to find c) Estimate of G: G d) th quantile of Ĝ(t) : Ĝ1 ( ) 6) Y is a random variable with distribution of F̂ a) Shorthand notation for specifying the distribution of Y: Y ~ F̂ b) Expected value of the random variable Y*: E(Y) = E(Y| F̂ ), notice there is no on the last E( ) c) Resample of size n from F̂ : Y1,Y2, ,Yn d) Resamples taken are r = 1, …, R e) The rth actual resample from F̂ : yr1,yr2 , ,yrn 7) The statistic for the resample: T a) T is a statistic calculated on Y1,Y2, ,Yn b) t is an observed value of the statistic calculated on y1,y2 , ,yn 2012 Christopher R. Bilder Notation.2 c) For the rth resample from F̂ , tr is the observed value of the statistic calculated from yr1,yr2 , ,yrn d) Expected value of the random variable T: E(T) = E(T| F̂ ), notice there is no on the last E( ) ˆ G(u) ˆ ˆ P(T u | F) ˆ G(u | F) e) The estimated G: G i) Frequently, we will want the sampling distribution of T – , and we will also call this G ii) The estimated sampling distribution of T – is the distribution of T– t: ˆ G(u) ˆ ˆ P(T t u | F) ˆ G G(u | F) ˆ , notice there is no on the f) Equivalent probability representations P (T t u) P(T t u | F) last P( ). 8) Estimated values through using R resamples 1 R a) The estimated value of E(T): Tr R r 1 b) The estimated value of Ĝ(t) using R resamples: ĜR (t) c) The estimated value of the th quantile of Ĝ(t) using R resamples: t((R1) )) , which is the (R+1) ordered value from t(1) ,...,t(R) . 9) Bias of T: a) True bias: = b(F) = E(T|F) – t(F) = E(T) – b) Estimate of bias: B = b( F̂ ) = E(T| F̂ ) – t( F̂ ) = E(T) – t 1 R c) The estimated value of the bias using R resamples: BR Tr t R r 1 10) Double bootstrap a) F̂ represents the distribution of the resample b) F̂r represents the distribution of the rth resample involving yr1 , …, yrn c) tr is the statistic of interest calculated on a resample from F̂r 11) Influence function of T: Lt(y; F) a) Empirical influence function: Lt(y; F̂ ) = l(y) b) Empirical influence function values: lj(yj) = lj c) Jackknife estimate of lj: ljack,j d) Regression estimate of lj: lˆj 12) Estimates of variance a) Nonparametric -method variance: vL = vL( F̂ ) b) Jackknife estimate: vjack c) Regression estimate: vreg d) Bootstrap estimate: vboot e) All of these can be calculated on a resample. For example, vL is the nonparametric -method variance calculated on a resample. 2012 Christopher R. Bilder Notation.3 Chapter 3 1) Notation in Section 3.3 a) ij is one of the resampled eij ; the i index on ij does not mean it originally came from sample i. 2) Notation in Section 3.9 a) E(T| F̂ ) is equivalent to E(T) and E(T| F̂ ) from Chapter 2; BMA does have some inconsistency in their notation b) Statistic of interest calculated on the resample: t(Fˆ ) T c) Expected values with respect to the resampling distribution: E(T) = E(T F̂ ) = E(T F̂ ) d) Estimated distribution of the re-resample: F̂ e) Statistic of interest calculated on the re-resample: t(Fˆ ) T 3) Notation in Section 3.10 a) Observed statistic value without the jth observation: t j b) Mean value of statistic calculated on the resamples that do not have the jth observation: 1 tj tr R j r:y j not in resample Chapter 4 1) Probability distribution under Ho: F̂0 2) P-value: p a) When resampling is used to estimate the p-value, this estimate is often denoted by p* b) Unfortunately, BMA is inconsistent with their notation so sometimes it may not be denoted by p* 3) Studentized statistic assuming Ho is true: z0 a) Studentized statistic with resamples taken from F̂0 : z0 t 0 b) Studentized statistic with resamples taken from F̂ : z t t Chapter 5 1) Confidence interval for a) Lower limit: ˆ b) Upper limit: ˆ1 c) ˆ and ˆ1 are the limits of a (1 – 2)100% confidence interval 2) Transformations a) u = h(t) b) = h() 3) BCa interval a) Acceleration constant: a b) Bias correction: w c) Adjusted -level: 2012 Christopher R. Bilder v0 v Notation.4 Chapter 6 1) Raw residuals: ej y j ˆ j where ˆ j ˆ 0 ˆ1x j ; note that eij was a little different in Section 3.3. y j ˆ j 2) Modified residuals: rj 1/2 1 hj 3) Standardized residuals: y j ˆ j s 1 hj 1/2 4) Null hypothesis value of 1 in a regression model: 1,0 5) Predicted values of E(Y) under the null hypothesis model: ̂0 6) Adjusted response variable value under the null hypothesis: Yj = Yj – 1,0xj 7) Observed value of the mean square error using regression modeling: s2 8) Testing Ho: = 0 vs. Ha: 0 in the full model of Y = X + = X0 + X1 + a) Reduced model: Y = X0 + 0 b) Residuals for reduced model: e0 c) Residuals for model with X1 as the response variable(s) and X0 as the explanatory variables: X1.0 (I X0 (X0 X0 )1 X0 )X1 9) Unknown individual response: Y+; vector of corresponding explanatory variable values: x+ ˆ Y xˆ x 10) Accuracy of point predictor: Y ˆ Y ˆ xˆ xˆ a) Bootstrap version of the accuracy: Y 1 ˆ ) sy s 1 x XX x 11) Var(Y Y Chapter 7 1) Standardized Pearson residual rPj (y j ˆ j ) (1 hj )V(ˆ j ) 2) Standardized deviance residual rDj dj (1 hj )1/2 where dj sign(y j ˆ j ) 2 j (y j ) j (ˆ j ) 1/2 2012 Christopher R. Bilder