```
Revisiting an Old Topic:
Probability of Replication
D. Lizotte, E. Laber & S. Murphy
Johns Hopkins Biostatistics
September 23, 2009
Outline
• Scientific Background
• Our Estimand: Probability of Selection
• Estimators
• STAR*D
• Where to go from here?

Scientific Background
• First experiment results in [...] or [...],
– what is the chance that we will replicate this result in a subsequent experiment?
– Prob. of Concurrence or Prob. of Replication
– Killeen (2005), followed by great controversy in psychology (Cumming (2005, 2006, 2008); MacDonald (2005); Doros & Geier (2005); Iverson (2008); Iverson, Wagenmakers & Lee (2008); Ashby & O’Brien (2008); Iverson, Lee & Wagenmakers (2009); ...)

Scientific Background
Similar problem but discredited:
• Post-hoc power / observed power: assuming the observed standardized effect size is the truth, calculate the probability of rejecting the null hypothesis. Hoenig & Heisey (2001)

Scientific Background
• First experiment results in [...] or [...],
– what is the chance that we will replicate this result in a subsequent experiment?
• Why is this question so attractive?
• Scientists (including statisticians!) often want to answer this question with 1 – p-value

Scientific Background
• First experiment results in [...] or [...],
– what is the chance that we will replicate this result in a subsequent experiment?
• 1 – p-value does not address this question.
– Goodman (1992), Cumming (2008)
– 1 – p-value is not an estimator.

Scientific Background
• [...], what is the chance that we will replicate this result in a subsequent experiment?
• Do we want to “estimate” 1) [...] or 2) [...] or 3) [...] or 4) [...]?
• Good frequentist properties are desired.

Our Estimand
• Probabilities of Selection: 2) [...]
• The probability of selection is a composite measure of signal, noise, and sample size

Our Estimand
• Advantages (The Hope) over the concept of p-value
– Close to what many scientists want.
– The intuitive interpretation is correct.
– Does not rely on the correctness of a data generating model for meaning.
– Less ambitious than 3)
• [...]
– We changed the question.
– Some may think that there is no need for a confidence interval: wrong.
– Non-regular

Estimators
• Why is this a hard problem?
– The desire for good frequentist properties
– The fact that effect sizes tend to be small relative to the noise.
– This is a non-regular problem: bias is of the same order as variance.
• Back of the envelope calculations: [...]

Estimators
• [...]
• Use plug-in estimator [...]
• Plug-in estimator is 1 – p-value (Goodman, 1992)!
– Nonregular
  • Near a uniform distribution if [...]
  • If n is large, close to 0 or 1 otherwise
– We can expect [...] to be small.

Estimators
• Try a Bayesian approach.
– Random sample [...] from a [...],
– Flat prior on [...], known [...]
– Use [...] as an estimator of [...]
– [...]
• Bayesian methods do not eliminate non-regularity.

Estimators
Focus on MSE in formulating estimators for [...]
1) Assume [...] is approximately normal with mean [...] and variance [...].
  1) Flat prior (e.g. Killeen’s prep)
  2) Normal prior: [...]
  3) Prior is a mixture: N(0,1) with probability w, and a point mass on [...] with probability 1 – w

Estimators
Focus on MSE in formulating estimators for [...]
2) Single bootstrap (Efron & Tibshirani, 1989).
• This is 1 – p-value. No assumption of approximate normality. If [...] is approximately normal then this is approximately the plug-in estimator: [...]
3) Double bootstrap
• This is a bagged plug-in estimator. This bags the 1 – bootstrap p-value. No assumption of approximate normality.

Why a double bootstrap?
Double bootstrap estimator for [...].
• Bagging is used to trade variance for bias when estimators are unstable (Bühlmann & Yu, 2002).
• The bootstrap estimator of [...] is unstable; if [...] it does not converge as the sample size increases.
• Under local alternatives such as [...] the bootstrap estimator is inconsistent as well.

Double Bootstrap
Double bootstrap estimator for [...].
If [...] has an approximate normal distribution then the double bootstrap estimator is [...]
That is, the double bootstrap reduces to prep in this case.

MSE Plots
• Two groups, each of size 25
• Two distributions (normal, bimodal)
• Two definitions of [...]
– [...]
– [...]
• Compare
– prep, pnorm, pmix, single bootstrap, double bootstrap

Estimators
Instead of a point estimator, consider a confidence interval for [...].
Assume [...]; then [...] has an approximate normal distribution.
In this case a confidence interval for [...] can be found from a confidence interval for the standardized effect size: [...]

STAR*D
• Sequenced Treatment Alternatives to Relieve Depression
• Large multi-site study focused on individuals whose depression did not remit with citalopram
• In this trial each individual can proceed through up to 4 stages of treatment. The individual moves to the next stage if the individual is not responding to the present treatment.
• Each stage involves a randomization.

STAR*D
• These are data from 683 individuals who did not respond to citalopram and preferred a switch in treatment.
• These individuals were randomized between venlafaxine, bupropion, and sertraline.
• Outcome: Time until remission.
• We model the area under the survival curve from entry into this stage of treatment until 30 months (e.g. min(T, 30)).

STAR*D
Regression formula at level 2: [...]

STAR*D
• For each s, [...]
• Double Bootstrap
– Inner-most bootstrap counts the proportion of “votes” in which [...]
– Outer-most bootstrap averages this proportion across the bootstrap samples

Discussion
• Definition of the probability of selection when there are more than two treatments.
• Confidence intervals for comparisons between more than two treatments.
• Is there a minimax estimator of the selection probability?
• Is there hope for the replication probability?

STAR*D
Missing Data + Study Drop-Out
• 1200 subjects begin level 2 (e.g. stage 1)
• 42% study dropout during level 2
• 62% study dropout by 30 weeks.
• Approximately 13% item missingness for important variables observed after the start of the study but prior to dropout.

This seminar can be found at:
http://www.stat.lsa.umich.edu/~samurphy/seminars/HopkinsBiostat09.23.09.ppt
Email me with questions or if you would like a copy!
[email protected]

Our Estimand
• The probability of selection is a composite measure of signal, noise and sample size
• The p-value is a composite measure of estimated signal, estimated noise and sample size.
```
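
The estimators compared in the slides can be sketched in a few lines. The formulas themselves did not survive in the transcript, so what follows is a minimal illustration under stated assumptions, not the authors' implementation: it assumes two groups, takes the selection probability to be the chance that the first group's sample mean exceeds the second's in a new experiment of the same size, and uses a normal approximation for the plug-in, prep, and interval versions. All function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()


def z_statistic(x, y):
    """Standardized difference in group means (unpooled standard error)."""
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return float((x.mean() - y.mean()) / se)


def plug_in(x, y):
    """Normal-theory plug-in estimate, which equals 1 - one-sided p-value."""
    return STD_NORMAL.cdf(z_statistic(x, y))


def p_rep(x, y):
    """Flat-prior (Killeen-style) estimate: the z statistic shrunk by sqrt(2)."""
    return STD_NORMAL.cdf(z_statistic(x, y) / 2 ** 0.5)


def selection_ci(x, y, level=0.95):
    """Assumed construction: push a normal-theory interval for the
    standardized effect through the (monotone) normal CDF."""
    z = z_statistic(x, y)
    half_width = STD_NORMAL.inv_cdf(0.5 + level / 2)
    return STD_NORMAL.cdf(z - half_width), STD_NORMAL.cdf(z + half_width)


def resample(a, rng):
    """Draw one nonparametric bootstrap resample of a 1-D array."""
    return a[rng.integers(0, len(a), size=len(a))]


def single_bootstrap(x, y, rng, n_boot=200):
    """Proportion of bootstrap 'votes' in which group x has the larger mean."""
    votes = 0
    for _ in range(n_boot):
        xb, yb = resample(x, rng), resample(y, rng)
        votes += int(xb.mean() > yb.mean())
    return votes / n_boot


def double_bootstrap(x, y, rng, n_outer=200, n_inner=200):
    """Bagged version: average the inner vote proportion over outer resamples."""
    props = [
        single_bootstrap(resample(x, rng), resample(y, rng), rng, n_inner)
        for _ in range(n_outer)
    ]
    return float(np.mean(props))
```

With two NumPy arrays `x` and `y` of size 25 and `rng = np.random.default_rng(0)`, calling `double_bootstrap(x, y, rng)` should land near `p_rep(x, y)` when the group means are roughly normal, while `single_bootstrap(x, y, rng)` should land near `plug_in(x, y)`. The nested loops cost `n_outer * n_inner` resamples, so the defaults are kept small here.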