Running Randomized Evaluations
PRE-ANALYSIS PLANS
Module 8.3
Recap on statistics
• If we find a result is significant at the 5% level, what does
this mean?
• if there were no true effect, there is a 5% or lower probability we would see a result this large by chance
• If we test 10 independent hypotheses (with no true effects), there is
a roughly 40% chance we will reject at least one null at the 5% level
• i.e., a 40% chance we find at least one hypothesis significant at
the 5% level purely by chance (see the calculation sketched below)
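A quick way to see where the roughly 40% figure comes from: with k independent tests of true nulls, each at level alpha, the chance of at least one false positive is 1 − (1 − alpha)^k. A minimal sketch in Python (the choice of k values is just for illustration):

```python
# Probability of at least one false positive among k independent tests
# of true null hypotheses, each run at significance level alpha.
alpha = 0.05
for k in (1, 5, 10, 20):
    p_any_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:2d} tests: P(at least one significant by chance) = {p_any_false_positive:.2f}")
# 10 tests -> about 0.40, matching the figure quoted above.
```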
Publication bias: what is it?
• Evaluations with significant (positive or negative) impacts are more
likely to be published
• If enough evaluations are run on a given type of
program, some will show a positive result by chance
• Those with a vested interest may even deliberately
run many studies and publicize only some of the
results
• Published results will suggest the program is
effective even if it isn’t
Study registers as a solution
• All studies are registered before the results are
known
• Shows how many studies examined a type of
program
• If 20 studies are registered on a given question and
only one (positive) study is published, we know to
worry that the others found no effect
• Incentives for registration
• Some medical journals require registration before the study starts as a
condition of publication
• New registries for RCTs in the social sciences
• e.g., the registry run by the American Economic Association
Coverage of registries
• Q: What are the advantages of having a registry that only
covers RCTs?
• Q: Is publication bias only a problem for RCTs?
• If not, why not have a registry for non-RCTs?
• Q: Should registration only be allowed before the start of a
study?
Data mining: what is it?
• Looking at the data many different ways, trying to
find the result you want
• If we test the impact of the program on many different
outcomes, some will show a positive (or negative)
impact by chance
• If we test the impact of the program on many different
subgroups, some will show a positive (or negative)
impact by chance (see the simulation sketched below)
• We may be falsely accused of data mining
• e.g., we test one subgroup and report the results, but
readers may think we tested many and only reported the
one that was significant
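To make the subgroup problem concrete, here is a small simulation that is not from the lecture (all numbers are made up): an RCT with a true effect of zero, tested separately in 20 arbitrary subgroups, will typically turn up at least one "significant" subgroup by chance.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 2000                                  # hypothetical sample size
treat = rng.integers(0, 2, size=n)        # random assignment
outcome = rng.normal(size=n)              # zero true treatment effect
subgroup = rng.integers(0, 20, size=n)    # 20 arbitrary subgroups

significant = []
for g in range(20):
    mask = subgroup == g
    res = stats.ttest_ind(outcome[mask & (treat == 1)],
                          outcome[mask & (treat == 0)])
    if res.pvalue < 0.05:
        significant.append((g, round(res.pvalue, 3)))

print("Subgroups 'significant' at 5% despite zero true effect:", significant)
# With 20 subgroups we expect roughly one spurious finding per run.
```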
Preanalysis plans as a solution
• Write down in advance how the data will be analyzed (an illustrative sketch follows this list)
• What outcomes are of primary interest
• What subgroups will be examined
• Register the plan with some external organization
• When presenting results, show all those covered in the PAP
• highlight any deviations from or additions to the PAP
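As a rough illustration of what "writing it down in advance" can look like, the registered plan can be summarized in a short structured document. The entries below are hypothetical placeholders, not taken from any actual registered plan; only the AEA registry is named in the lecture.

```python
# Hypothetical skeleton of a pre-analysis plan entry (illustrative only).
pre_analysis_plan = {
    "primary_outcomes": ["outcome_index"],
    "secondary_outcomes": ["indicator_1", "indicator_2"],
    "subgroups": ["women", "youth", "region_A_vs_region_B"],
    "specification": "OLS of outcome on treatment with strata fixed effects",
    "one_sided_tests": False,
    "registry": "AEA RCT Registry",
    "registered": "before endline data collection",
}
print(pre_analysis_plan)
```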
When is a PAP most useful?
• When a study has a large number of outcomes with no
obvious hierarchy of which are the most important
• When researchers know they are interested in differential
impact on different subgroups (heterogeneous treatment
effects)
• When researchers are concerned others will push them to
find positive impacts
• When we want to adjust statistical tests for multiple hypothesis
testing (see the sketch below)
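One common adjustment a PAP can pre-specify is a family-wise error correction such as Bonferroni or Holm. A minimal sketch using statsmodels; the p-values are invented for illustration:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from a pre-specified family of outcome tests.
pvals = [0.001, 0.012, 0.030, 0.048, 0.200, 0.650]

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for p, pa, r in zip(pvals, p_adj, reject):
    print(f"raw p = {p:.3f}  adjusted p = {pa:.3f}  reject at 5%: {r}")
# Outcomes that look significant one at a time survive the correction
# only if their adjusted p-values stay below 0.05.
```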
Incentives for immunization (no PAP)
• Program to increase immunization in rural India with two arms
• Predictable and reliable immunization camps
• Camps plus 1 kg of lentils for each shot and a set of plates on
completion of full immunization
• One main outcome (immunization), more than one indicator
• Ever received any immunization
• % with full immunization
• Showing two indicators illustrated why the program worked
• Regular camps increased children with at least one shot
• Incentive increased % with full immunization
• Suggests unwillingness to immunize was not the main barrier to full
immunization; instead, the incentive helped families persist through the full course
Results: number of immunizations (figure)
Source: Banerjee et al. 2010
Results: fully immunized (figure)
Source: Banerjee et al. 2010
Community driven development (with PAP)
• Evaluation of the impact of a CDD program including:
• Quality and quantity of public goods
• Social capital
• Trust
• Participation
• Each outcome area had many indicators, e.g., many public
goods
• >300 outcome indicators in total
• Several important subgroups to examine:
• Women and youth were the primary targets
• The program was implemented in two very different regions
When should a PAP be written?
• Before the baseline?
• Prevents us from learning which questions have low response
rates or inconsistent answers
• Before the program starts?
• Prevents researchers from taking advantage of random shocks that
happen to mainly affect the treatment or comparison group
• Also prevents researchers from adding new hypotheses, e.g., unanticipated negative consequences
• Before looking at any data?
• Can be useful to look at comparison-group data to determine appropriate
control variables
• and to drop outcomes with little room for improvement (e.g., where 95% of the
comparison group already does the behavior)
What should be covered in a PAP?
• Main outcome measures
• Which outcome measures are primary and which secondary
• The precise composition of each measure (e.g., how an index is constructed)
• Subgroups to be analyzed
• Direction of expected impact if we want to use a one-sided
test
• Primary specification to be used
• What control variables to include
• How the outcome variable is specified (in levels, logs, changes, etc.); an illustrative specification is sketched below
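For concreteness, a pre-specified primary specification can be written down as explicitly as a regression call. The sketch below is purely illustrative: the data are simulated, and the variable names, controls, and log transform are assumptions for this example, not part of the lecture.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1000

# Simulated stand-in for the endline dataset described in a PAP.
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, size=n),
    "baseline_outcome": rng.normal(10, 2, size=n),
    "stratum": rng.integers(0, 5, size=n),
})
df["outcome"] = df["baseline_outcome"] + 0.5 * df["treatment"] + rng.normal(size=n)

# Primary specification fixed in advance: outcome in logs, baseline value
# and randomization strata as controls, heteroskedasticity-robust errors.
model = smf.ols("np.log(outcome) ~ treatment + baseline_outcome + C(stratum)",
                data=df).fit(cov_type="HC1")
print(model.summary().tables[1])
```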
Disadvantages of a PAP
• Any analysis that is not included in the PAP has less credibility
• Only do a PAP if you have the time to think it through carefully
• Sometimes patterns in the data tell consistent stories we never
thought of
• We might want the flexibility to pursue these
• With complex evaluations it can be hard to think through all
outcome combinations and how analysis should proceed with
each
• We may want to do different secondary analyses depending on whether the impact is
positive or negative
• One option is to write the PAP in stages: look at some data, then write another
PAP for the next stage of analysis