Revisiting an Old Topic: Probability of Replication
D. Lizotte, E. Laber & S. Murphy
Johns Hopkins Biostatistics
September 23, 2009

Outline
• Scientific Background
• Our Estimand: Probability of Selection
• Estimators
• STAR*D
• Where to go from here?

Scientific Background
• First experiment results in [...] or [...]: what is the chance that we will replicate this result in a subsequent experiment?
• Prob. of Concurrence or Prob. of Replication
• Killeen (2005), followed by great controversy in psychology (Cumming (2005, 2006, 2008); MacDonald (2005); Doros & Geier (2005); Iverson (2008); Iverson, Wagenmakers & Lee (2008); Ashby & O’Brien (2008); Iverson, Lee & Wagenmakers (2009); ...)

Scientific Background
Similar problem but discredited:
• Post-hoc power / observed power: assuming the observed standardized effect size is the truth, calculate the probability of rejecting the null hypothesis. Hoenig & Heisey (2001)

Scientific Background
• First experiment results in [...] or [...]: what is the chance that we will replicate this result in a subsequent experiment?
• Why is this question so attractive?
• Scientists (including statisticians!) often want to answer this question with 1 – p-value

Scientific Background
• First experiment results in [...] or [...]: what is the chance that we will replicate this result in a subsequent experiment?
• 1 – p-value does not address this question.
  – Goodman (1992), Cumming (2008)
  – 1 – p-value is not an estimator.

Scientific Background
• Much confusion about the estimand: what is the chance that we will replicate this result in a subsequent experiment?
• Do we want to “estimate” 1) [...] or 2) [...] or 3) [...] or 4) [...]?
• Good frequentist properties are desired.

Our Estimand
• Probabilities of Selection: 2) [...]
• The probability of selection is a composite measure of signal, noise, and sample size

Our Estimand
• Advantages (The Hope) over the concept of p-value
  – Close to what many scientists want.
  – The intuitive interpretation is correct.
  – Does not rely on the correctness of a data generating model for meaning.
  – Less ambitious than 3)
• Disadvantages
  – We changed the question.
  – Some may think that there is no need for a confidence interval: wrong.
  – Non-regular

Estimators
• Why is this a hard problem?
  – The desire for good frequentist properties
  – The fact that effect sizes tend to be small relative to the noise.
  – This is a non-regular problem: bias is of the same order as variance.
• Back of the envelope calculations: [...]

Estimators
• Use plug-in estimator [...]
• Plug-in estimator is 1 – p-value (Goodman, 1992)!
  – Nonregular
    • Near a uniform distribution if [...]
    • If n is large, close to 0 or 1 otherwise
  – We can expect [...] to be small.

Estimators
• Try a Bayesian approach.
  – Random sample from a [...]
  – Flat prior on [...], [...] known
  – Use [...] as an estimator of [...]
• Bayesian methods do not eliminate non-regularity.

Estimators
Focus on MSE in formulating estimators for [...]
1) Assume [...] is approximately normal with mean [...] and variance [...].
   1) Flat prior (e.g. Killeen’s p_rep)
   2) Normal prior: [...]
   3) Prior is a mixture between N(0,1) with probability w and a point mass on [...] with probability 1 – w

Estimators
Focus on MSE in formulating estimators for [...]
2) Single bootstrap (Efron & Tibshirani, 1989)
   • This is 1 – p-value. No assumption of approximate normality. If [...] is approximately normal then this is approximately the plug-in estimator: [...]
3) Double bootstrap
   • This is a bagged plug-in estimator. This bags the 1 – bootstrap p-value. No assumption of approximate normality.

Why a double bootstrap?
Double bootstrap estimator for [...].
• Bagging is used to trade variance for bias when estimators are unstable (Bühlmann & Yu, 2002).
• The bootstrap estimator of [...] is unstable; if [...] it does not converge as the sample size increases.
• Under local alternatives such as [...] the bootstrap estimator is inconsistent as well.

Double Bootstrap
Double bootstrap estimator for [...].
If [...] has an approximate normal distribution then the double bootstrap estimator is [...]. That is, the double bootstrap reduces to p_rep in this case.

MSE Plots
• Two groups, each of size 25
• Two distributions (normal, bimodal)
• Two definitions of [...]
• Compare
  – p_rep, p_norm, p_mix, single bootstrap, double bootstrap

Estimators
Instead of a point estimator, consider a confidence interval for [...].
Assume [...]; then [...] has an approximate normal distribution. In this case a confidence interval for [...] can be found from a confidence interval for the standardized effect size: [...]

STAR*D
• Sequenced Treatment Alternatives to Relieve Depression
• Large multi-site study focused on individuals whose depression did not remit with citalopram
• In this trial each individual can proceed through up to 4 stages of treatment. An individual moves to the next stage if not responding to the present treatment.
• Each stage involves a randomization.

STAR*D
• These are data from 683 individuals who did not respond to citalopram and preferred a switch in treatment.
• These individuals were randomized between Venlafaxine, Bupropion, and Sertraline.
• Outcome: time until remission.
• We model the area under the survival curve from entry into this stage of treatment until 30 months (i.e. min(T, 30)).

STAR*D
Regression formula at level 2: [...]

STAR*D
• For each s, [...]
• Double Bootstrap
  – Inner-most bootstrap counts the proportion of “votes” in which [...]
  – Outer-most bootstrap averages that proportion across the bootstrap samples

Discussion
• Definition of the probability of selection when there are more than two treatments.
• Confidence intervals for comparisons between more than two treatments.
• Is there a minimax estimator of the selection probability?
• Is there hope for the replication probability?

Truth in Advertising: STAR*D Missing Data + Study Drop-Out
• 1200 subjects begin level 2 (i.e. stage 1)
• 42% study dropout during level 2
• 62% study dropout by 30 weeks.
• Approximately 13% item missingness for important variables observed after the start of the study but prior to dropout.

This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/HopkinsBiostat09.23.09.ppt
Email me with questions or if you would like a copy!

Our Estimand
• The probability of selection is a composite measure of signal, noise and sample size
• The p-value is a composite measure of estimated signal, estimated noise and sample size.
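
Sketch: plug-in, single bootstrap and double bootstrap
The estimators compared on the Estimators slides can be sketched in a few lines of Python. This is only an illustration under assumptions that the slides do not confirm: the selection event is taken to be "group A's sample mean exceeds group B's" in a two-group comparison, the plug-in uses a simple z-test approximation, and all function names are hypothetical. The inner loop counts "votes" and the outer loop averages them, mirroring the description on the STAR*D double bootstrap slide. For reference, p_rep is computed as Phi(z / sqrt(2)), the form usually attributed to Killeen (2005); under approximate normality the double bootstrap should land near it.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def z_stat(a, b):
    """Two-sample z statistic (assumed form; not taken from the slides)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (fmean(a) - fmean(b)) / se


def plug_in(a, b):
    """Normal plug-in estimate Phi(z), i.e. 1 - one-sided p-value."""
    return NormalDist().cdf(z_stat(a, b))


def p_rep(a, b):
    """Flat-prior estimate Phi(z / sqrt(2))."""
    return NormalDist().cdf(z_stat(a, b) / math.sqrt(2))


def resample(x, rng):
    """Draw len(x) values from x with replacement."""
    return [rng.choice(x) for _ in x]


def single_bootstrap(a, b, rng, n_boot=500):
    """Proportion of bootstrap replicates ("votes") in which group a wins."""
    votes = sum(
        fmean(resample(a, rng)) > fmean(resample(b, rng)) for _ in range(n_boot)
    )
    return votes / n_boot


def double_bootstrap(a, b, rng, n_outer=100, n_inner=100):
    """Bagged single bootstrap: average the inner vote proportion
    over outer bootstrap resamples of the data."""
    total = 0.0
    for _ in range(n_outer):
        a_star, b_star = resample(a, rng), resample(b, rng)
        total += single_bootstrap(a_star, b_star, rng, n_inner)
    return total / n_outer


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative data only: 25 normal scores per group, group A shifted by 0.5.
    scores = [NormalDist().inv_cdf((i + 0.5) / 25) for i in range(25)]
    group_a = [x + 0.5 for x in scores]
    group_b = scores
    print("plug-in (1 - p-value):", plug_in(group_a, group_b))
    print("p_rep:                ", p_rep(group_a, group_b))
    print("single bootstrap:     ", single_bootstrap(group_a, group_b, rng))
    print("double bootstrap:     ", double_bootstrap(group_a, group_b, rng))
```

With data like these, the single bootstrap should track the plug-in value, while the double bootstrap is pulled toward 1/2, which is the shrinkage that bagging provides.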