ISPOR Good Research Practices for Retrospective Database Analysis
Task Force
Comments received from Reviewer/Leadership/ISPOR membership:
I. GOOD RESEARCH PRACTICES FOR COMPARATIVE EFFECTIVENESS RESEARCH: DEFINING, REPORTING
AND INTERPRETING NON-RANDOMIZED STUDIES OF TREATMENT EFFECTS USING SECONDARY DATA
SOURCES - Report of the ISPOR Retrospective Database Analysis Task Force – Part I
Respondent #1
Regarding #1 (line 96, Defining the Question): it might be helpful to add a citation regarding the
evolution toward evidence-based medicine.
Respondent #2
In Part I, it would be better to have transition phrases at line 325 and line 331 to connect with the previous
paragraph; otherwise, I don't understand why these two paragraphs are here or what idea the authors
want to present.
Respondent #3
Line 53: I feel the sentence is a bit too long.
Line 58: The sentence could be revised and its length reduced.
Line 170: The grammar of the sentence needs to be revised.
Line 181: Should it be "adequate number of patients" rather than "numbers"?
Line 479: Should be "an expensive enterprise."
Respondent #4
(page 5) “The alternative – waiting for perfect evidence – is usually not acceptable, because we never
have perfect evidence and we are incapable (either due to cost or feasibility) to perform “gold standard”
randomized clinical trials to answer the myriad questions posed for a forever-growing armamentarium
of health care technologies.”
Page 5: Not only incapable; RCTs may also not be able to answer the research question of interest.
More explanation is needed for the figure on page 17. Consider a box highlighting the overall recommendations.
Line 6: add “RCT or prospective observational studies”
Line 19: consider replacing “prospective” with “a-priori”
Line 41: add “and safe”
Line 42: add “or prospective observational studies”
Line 47: add “relevance of study question, appropriateness of population and timeframe”
Line 82: add “or prospective observational studies”
Lines 84-93: arguments in favor of retrospective DB studies also apply to prospective observational
studies
Respondent #5
1. Throughout document: suggest limiting the recommendation to justify changes in the analytical
plan to only the primary objective and some key secondary objectives.
2. Throughout document: add "on secondary databases" to references to observational studies.
3. It would benefit the audience if the authors could first define their key terms, such as "effectiveness
research," "comparative effectiveness research," "secondary data sources," "observational data," and
"epidemiologic studies," and use these terms and definitions consistently across the three
documents. Providing common definitions would also be extremely helpful to the field. It is not
clear whether this series is limited to "comparative effectiveness" or "effectiveness." The authors
seem to use "observational research" and "epidemiologic research" interchangeably.
4. In discussing the relative merits of observational studies versus RCTs, the authors should
acknowledge that most evidence hierarchies consider RCTs the gold standard of evidence and rank
case-control and cohort studies further down the evidence hierarchy. The main reason for this is that
unmeasured or unobservable confounding brings into question the validity of observational studies.
This does not mean that observational studies are not useful; it just acknowledges how they are
currently viewed and why. The role of unobservable confounding and the limited clinical
information available to control for confounding should be acknowledged early on in the documents.
Respondent #6
1. Line 126. The decision of whether to use imperfect information is often framed as a question of
weighing the potential harms and benefits given the level of the evidence. For example, the
recommendation to drink four glasses of water a day may need less rigorous evidence than the
recommendation to take a medication that influences multiple systems in the body. This context
could be useful to add to the discussion of whether one should wait for "perfect evidence." This is
touched upon later in the document (page 17), but not in the context of "first do no harm." In
general, the tone seems to advocate for observational data; a more balanced presentation
may be preferable to making this case.
2. Line 135. It is important to acknowledge that the main reason health policy decision makers
are reluctant to use observational databases is concern about confounding and
selection bias, which are widely viewed as being more effectively addressed through randomization.
3. Line 179. The discussion of feasibility seems very important. Could this discussion be expanded to
suggest that researchers consider the major threats to validity prior to conducting their study? For
example, if the researchers are trying to compare two different medications that are typically
prescribed to populations with different unmeasured severity characteristics, they should
acknowledge upfront the threat of selection bias. Similarly, if the outcome measures available are
only proximally related to the true outcome of interest, this limitation should also be acknowledged.
4. Line 249. The interrupted time series design is typically referred to in economics as a "difference-in-differences"
approach, in which changes in the outcomes of interest following an event in the case
group are compared to changes in the outcomes of interest in the control group. Alternatively, these
have been described as pre/post analyses with a contemporaneous control group.
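As a minimal illustration of the difference-in-differences logic just described (an editorial sketch on simulated data; the variable names outcome, treated, and post are assumptions, not taken from the draft), the effect of interest can be estimated as the coefficient on the group-by-period interaction in an ordinary least squares regression:

    # Difference-in-differences sketch on simulated data (illustrative only).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 2000
    treated = rng.integers(0, 2, n)      # 1 = case group, 0 = control group
    post = rng.integers(0, 2, n)         # 1 = after the event, 0 = before
    outcome = (2.0 + 0.8 * treated + 0.5 * post   # baseline group and period differences
               + 1.5 * treated * post             # the treatment effect of interest
               + rng.normal(0, 1, n))             # noise
    df = pd.DataFrame({"outcome": outcome, "treated": treated, "post": post})

    # The coefficient on treated:post is the difference-in-differences estimate,
    # i.e., (change in the case group) minus (change in the control group).
    model = smf.ols("outcome ~ treated * post", data=df).fit()
    print(model.params["treated:post"])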
Respondent #7
Throughout document: suggest limiting the recommendation to justify changes in the analytical plan to
only the primary objective and some key secondary objectives.
Respondent #8
1. Throughout document: add "on secondary databases" to references to observational studies.
2. Line 53: I feel the sentence is a bit too long.
3. Line 58: The sentence could be revised and its length reduced.
4. Line 170: The grammar of the sentence needs to be revised.
5. Line 181: Should it be "adequate number of patients" rather than "numbers"?
6. Line 479 (pg 16): Should be "an expensive enterprise."
Respondent #9
1. The section entitled "Prospective Specification of Research and Protocol Development" should be
immediately followed by a subsection to present and explain the rationale for the hypothesis
testing. That is common practice in hypothesis testing, especially in theses and dissertations, and it
makes sense.
2. Line 166, section on Specificity. The authors indicate that the Objectives section is the appropriate
place for justifying the use of the specific database. It seems that the Methods section would be an
acceptable alternative. In fact, on page 12 the authors do exactly that for their proposed structured
abstract. Therefore, I suggest inserting a statement to that effect.
3. Line 172, section on Novelty. Another reason for using a database would be that there have been
conflicting or inconclusive results presented in the literature and the availability of large patient
numbers could be useful to provide clarity on the research question. I suggest adding this
information.
4. Line 179, under Feasibility. I suggest that there be a statement that authors of protocols and
manuscripts provide a justification of the feasibility of answering the research question.
5. Line 196: It seems that it should read “at least one pre-specified primary endpoint”.
6. Line 228, Cross-Sectional Designs. Perhaps there needs to be alternate terminology here, that is,
"prevalence-based studies." Similarly, cohort studies may be characterized as "incidence-based
studies." These studies are often used to estimate burden of illness or cost of treatment. These
terms need to be included.
7. The structured abstract does not mention the population under study. Perhaps it could be done in
describing the cohorts, but it is also an integral part of the research question. I suggest that it be
mentioned there.
8. Under Methods, there is mention of treatment cohorts. That would be appropriate in a cohort study,
but not necessarily in other models. I suggest changing the term to "treatment groups" or "study
groups."
9. Line 390. According to the Merriam-Webster dictionary (http://www.merriam-webster.com/dictionary/PROBATIVE), the word "PROBATIVE" has two meanings:
1 : serving to test or try : exploratory
2 : serving to prove : substantiating
10. The authors need to state which one they mean, or (better) avoid big words with double
meanings. I suggest another word such as "conclusive" or even "substantiating."
11. Lines 416-420: The authors need to address clinical importance/relevance as opposed to statistical
significance. They start to mention it, but I think it deserves more emphasis, particularly when there
are huge numbers of patients being studied. I suggest that they need to define a priori what a
clinically relevant difference would be, along with justification for the value(s) selected. For
example, ...”based on the work of Frankenstein and Monster (1933), we a priori considered that
10,000 volts would represent a good shock value...”
Respondent #10
1. Line 158. Since "data" is a plural noun, replace "data has" with "data have"
2. Line 159: replace “information of long-term outcomes” with “information on long-term outcomes”
3. Line 216 seems to be a stray statement
4. Line 217: Perhaps we should be consistent (and more concise) with words. I suggest that
“epidemiologic and econometric” should be preferred over “epidemiological and econometric”. If we
say “econometric”, why should the parallel in our field not be “epidemiologic”? Some people say that
epidemiology is not necessarily logical but does contain some logic...therein lies the justification!
Respondent #11
Part I: Pre-specification of analysis.
I don't think it is a realistic possibility, from either a theoretical or a pragmatic perspective.
Theoretical
- Data mining: I am sorry that data mining is presented as "dark force" science in this recommendation
(line 154). The specificity of large databases is that they can generate hypotheses that cannot be seen
through the traditional lens of clinical or outcomes research.
Pragmatic
In the absence of a process similar to the ICH guidelines and a regulatory framework, it will still be easy to do
some data mining and then develop a post hoc "a priori" analysis plan. Even in Europe, where the GPRD
database requires a protocol and a scientific committee needs to give its approval, there is no
tracking of the results of the analyses; so, using a US commercial database, it would be easy to use the database to
find some results, then define the specific research question, and finally outsource the protocol and the
analysis. I truly cannot see a pathway by which a real a priori protocol could be secured.
So, in a nutshell, pre-specification is impossible to control in real life, casts a shadow on valid techniques
like data mining, and at the end of the day will undermine the use of databases. It is better to ask for
more transparency on how the sample was finally selected from the original dataset.
Finally, one can imagine some kind of FDA-like audit. Authors could commit to following the
ISPOR guidelines and, in exchange for this ISPOR label in the published paper, ISPOR could randomly
select some published database studies, or be commissioned by decision makers, to re-run the analysis plan to QC the
methods and results. Of course, some funding would be needed, but I heard about $1.1 billion in funding for
comparative effectiveness as part of the stimulus bill...
At the end of the day, the strength of the database is that the data are observed, which prevents some of
the risk of post hoc analysis. Thus, some systematic presentation of raw data (e.g., before adjustment by
propensity score) would prevent the risk of spurious post hoc analyses.
So, more than pre-specification, it is replication that is needed. Indeed, all the requirements
highlighted in the paper are sufficient (framing of the research, reporting, interpretation). Pre-specification
would only give a false sense of security by mimicking the RCT process; but pre-specification
is only possible in the artificial world of the RCT. After all, if you find some benefit by data mining and
the benefit truly exists, then it does not matter whether it was found by a priori thinking or post hoc.
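To make concrete the suggestion above about presenting raw data alongside propensity-score-adjusted results, here is a minimal editorial sketch on simulated data; the variable names and the use of inverse-probability-of-treatment weighting are assumptions for illustration, not a method prescribed by the report:

    # Raw vs. propensity-score-weighted comparison on simulated data (illustrative only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 5000
    severity = rng.normal(0, 1, n)                        # a measured confounder
    treat = rng.binomial(1, 1 / (1 + np.exp(-severity)))  # sicker patients are treated more often
    outcome = 0.5 * treat + 1.0 * severity + rng.normal(0, 1, n)

    # Raw (unadjusted) difference in mean outcome between treated and untreated
    raw_diff = outcome[treat == 1].mean() - outcome[treat == 0].mean()

    # Propensity scores from a logistic regression on the measured confounder
    X = severity.reshape(-1, 1)
    ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]

    # Inverse-probability-of-treatment weights and the weighted difference
    w = np.where(treat == 1, 1 / ps, 1 / (1 - ps))
    adj_diff = (np.average(outcome[treat == 1], weights=w[treat == 1])
                - np.average(outcome[treat == 0], weights=w[treat == 0]))

    print(f"raw difference: {raw_diff:.2f}; weighted difference: {adj_diff:.2f}")

Reporting both numbers makes transparent how much of the apparent effect is removed by adjustment.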
Respondent #12
In my opinion, the following points could be useful for the document:
1. Conflicts of interest of the data providers should be considered.
2. Inclusion of the PICO format in the abstract could provide a clear picture of the study.
Respondent #13
(1) line 26, "report out the results of the results of their pre-specified plan". Should that be "report out
the results of their pre-specified plan"?
(2) the table below line 358, the descriptor for item 6, "Discuss the potential for confounding, both
measured and unmeasured, and how this was assessed and addressed". Should that be "Discuss the
potential for confounding, both measured and unmeasured, and how this was assessed"?
Respondent #14
How do we know the weight to give such a study in a given policy context?
If there are multiple approaches and data sources to answer effectiveness questions, how do we decide
which approaches are better or have more utility?
The draft reports offer many thoughts with respect to these questions. We do not and will never
precisely know the complicated interactions of medical interventions and medical conditions. We use
scientific procedures and judgment to form the best picture we can from the information we have.
Building a base of trust, best practices and appropriate expectations from such comparative
effectiveness research will take time.
I wanted to focus on the first report styled Good Research Practices for Defining, Reporting and
Interpreting Non-randomized Studies of Treatment Effects Using Secondary Databases. This is a very
good draft and I have a few suggestions below. Overall, I think there is more work to do with respect to
the questions I asked above. In particular, I would like to work with ISPOR regarding measures and
criteria to evaluate more successful research approaches.
Respondent #15
On page 3 lines 40-42 the draft states:
“Large computerized databases with millions of observations of drug treatment and health outcomes
can be used to assess which drugs are most effective in routine care without long delays and the
prohibitive costs of most RCTs.”
I understand this is to emphasize certain points about utility but, at this juncture, is there enough
evidence to show that outcomes research can assess which drugs are most effective in routine care?
Perhaps "may be helpful in assessing…"
On page 5, lines 138-140, the draft states:
“This distrust is derived, at least in part, from the lack of generally accepted good research practices and
lack of standardized reporting; it is also due to discordance of the results in examination of clinical
effectiveness between some observational studies and randomized control trials.”
You could add that there is not a body of studies showing reproducibility of database observational studies
across data environments, and that there is a view among providers that such studies oversimplify the elements of
the care process and the impact of other factors.
On page 6 lines 172-177 the draft states:
“3) Novelty: Ideally, there should be an absence of literature that directly relates to the proposed study
question thereby making the proposed research question novel. Alternatively, the proposed study
design for the given research question should be superior to previously used design where previous
research has been conducted and whose findings are conflicting or questioned because of poor study
design. As the number of well-designed studies addressing a specific question whose findings are
consistent with each other increases, the value of an additional study addressing this question
diminishes.”
Because I believe we need to build a body of work that establishes the validity, reproducibility, and
reliability of database research of this type, I am not sure novelty, or an absence of literature that relates to
the issue, is a good factor right now. It may be helpful to have a body of work that applies these
research tools in the context of existing studies so the tools themselves can be evaluated.
On several pages the draft makes a point that ex ante judgments and modifications may compromise
the value of a study for hypothesis testing.
This may be so, but (a) I don't see particular support for this point in this context, and (b) there is a lot to
be gained from midcourse corrections. There are many issues that arise in database research that are not
the same as, and not under the same level of control or planning as, those in a clinical trial. One of the good things
about observational database research is that it is easy to do multiple runs with different approaches. These
additional runs can add to the body of knowledge as long as the steps are transparent.
On page 10 lines 320-323 the draft states:
“Medical records data may provide more extensive data for comorbidity adjustment for research studies
that may be particularly susceptible to selection bias whereas administrative claims data, if considerably
larger in numbers of patients captured, may be better suited for research questions that involve rare
outcomes.”
Respondent #16
I have been worried about expectations regarding rare outcomes in the context of this research. It is
true that larger databases are helpful, but there are so many confounding variables and inaccurate
measures that the focus needs to be on more robust signals. Reference to rare outcomes suggests
outcomes that may be two or three orders of magnitude below baseline numbers. Given the state of
the studies, we do not really know whether it is feasible to get to rare outcomes.
I would be very interested in hearing the group's thoughts on these points and would be happy to talk about the
draft. Again, great work, and I look forward to working with you and the group on this and other issues.
Respondent #17
Page 6, line 173: yes, it is true that "replication" studies are of diminished value as they increase in
number, but that is not in itself a reason not to replicate methods in multiple data sources, particularly if
the body of evidence is not yet of sufficient volume.
Page 13, line 378: all of these possible explanations indicate flaws in the observational study that are not
present in the RCT. Even the most well-designed, robust RCTs still raise concerns regarding generalizability,
long-term treatment effects, etc., which often represent the impetus for conducting retrospective
observational studies in the first place. So there may be a flaw in trial design or breadth that the
observational study is intended to correct, and on which it will by definition conflict with the RCT.
Page 14, line 416: an illustrative example would be the case of statistically-significant differences that
are a product of the large sample sizes available in many databases, but are not of any clinical import.
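The point about large samples producing statistically significant but clinically unimportant differences can be shown with a toy calculation (an editorial sketch; the numbers are invented and not taken from the report):

    # With very large samples, a clinically trivial difference is highly "significant."
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n = 2_000_000
    group_a = rng.normal(loc=100.0, scale=15.0, size=n)   # e.g., some lab value
    group_b = rng.normal(loc=100.1, scale=15.0, size=n)   # mean differs by 0.1: clinically trivial

    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    print(f"observed difference: {group_b.mean() - group_a.mean():.3f}, p-value: {p_value:.1e}")

The p-value is minute even though the effect would rarely be considered clinically meaningful, which is why a pre-specified clinically relevant difference matters.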
Page 17, line 514 and beyond: the biggest problem with the document. The figure presented is one
developed by my employer; it is outdated and has since been replaced. In addition, a conversation
between the document's primary author and our president and founder (Steve Pearson) concerning
the interpretation of the figure and its attribution needs to occur before this appears in hard copy.
Respondent #18
In lines 60 and 61, after the sentence "We believe that more discipline and transparency are
required," it is not clear whether the following sentence is a question or an affirmation that the researchers are
making. Please clarify.
In the section that discusses the establishment of the research questions: for me, it is not well
specified (or maybe it is a little bit confusing) what is important to consider when designing a research
question. It may be worth using bullets to detail the steps in the construction of a study
research question. This may clarify the context, and the information may flow more easily.
Line 145: The report might be enhanced if some guidance were given on whether the research question should be
consistent with economic theory.
Line 162: I believe it should be meaningful but need not be topical. Some interesting research
questions are those that have been answered previously using crude and perhaps inappropriate
methodologies or datasets.
Line 169: This part seems more appropriately discussed within the methodology and data sections, since
the rationale for the study or research question may not stem from limitations attributable to current or
previous secondary databases used in answering the research question.
Line 172: Again, it should be meaningful but need not be novel. For instance, the majority of research
questions using survival analysis rely almost exclusively on the Cox proportional hazards (PH) model; however,
evidence abounds to suggest that this approach and its inherent assumptions by and large do not hold. A
recommendation favoring novel research questions rests on the assumption that previous work examining
similar research questions is without errors.
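Since the comment above invokes the proportional hazards (PH) assumption, the sketch below shows one common way it can be checked in practice, using the lifelines library and its bundled Rossi recidivism dataset; this is an editorial illustration, not a procedure taken from the draft:

    # Fit a Cox PH model and test the proportional hazards assumption (illustrative).
    from lifelines import CoxPHFitter
    from lifelines.datasets import load_rossi
    from lifelines.statistics import proportional_hazard_test

    rossi = load_rossi()
    cph = CoxPHFitter()
    cph.fit(rossi, duration_col="week", event_col="arrest")

    # Schoenfeld-residual-based test: small p-values flag covariates whose effect
    # appears to change over time, i.e., violations of the PH assumption.
    results = proportional_hazard_test(cph, rossi, time_transform="rank")
    results.print_summary()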
Line 175: At first glance, an accumulation of "well-designed studies" may seem to confirm what we
already know. However, given the likelihood that journals reject studies which buck a
predetermined trend (publication bias), and the fact that previous research is consulted by analysts in
our industry, such a recommendation may curtail researchers' zeal to push the frontiers of research.
Line 216: It would be helpful to provide or suggest a few econometric methods appropriate to
each selected study design, and the appropriate type of dataset given particular types of dependent
variables.
Lines 331-336: A recommendation on how to handle missing data would be helpful.
Line 377: Before proceeding to the interpretation of the results, the Task Force Report and
recommendations might be enhanced by touching on model specification, identification of the
equation to be estimated, assumptions of the statistical model, the distribution of variables and the error term,
the effects of past realizations of certain variables (dependent or independent), fit of the model, and the types of
tests that assure the reader that some of these issues have been addressed in the analysis.
Line 416: This may not necessarily be accurate. If point estimates of the effects of two treatments are not
clinically compelling, it does not necessarily imply that the estimates are suspect. A clinically compelling effect
might depend on the direction of change and the baseline value.
Line 371: The expected magnitude of potential confounding variables may not be feasible to estimate ex
ante without performing a regression analysis or deriving an approximation through the use of
simulation techniques (stochastic uncertainty propagation). The magnitude and the directionality may
depend on how the right-hand-side variables relate to the left-hand-side variable. I believe one may only be able to
describe what these confounders might be and their expected signs.
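The simulation approach mentioned in this comment can be as simple as the editorial sketch below, which uses invented variable names and a made-up data-generating process to show how the size and sign of confounding bias can be explored when they cannot be derived ex ante:

    # Bias from an omitted confounder, explored by simulation (illustrative only).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 20_000
    confounder = rng.normal(0, 1, n)                       # e.g., unmeasured severity
    treat = (confounder + rng.normal(0, 1, n) > 0).astype(float)
    outcome = 1.0 * treat + 2.0 * confounder + rng.normal(0, 1, n)   # true effect = 1.0

    # Omitting the confounder biases the treatment-effect estimate upward here,
    # because the confounder raises both treatment probability and the outcome.
    naive = sm.OLS(outcome, sm.add_constant(treat)).fit()
    adjusted = sm.OLS(outcome, sm.add_constant(np.column_stack([treat, confounder]))).fit()

    print(f"naive estimate: {naive.params[1]:.2f}, adjusted estimate: {adjusted.params[1]:.2f}")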
Respondent #19
While several of the references are not familiar to me (perhaps because I am new to this group), the
document is clear, concise, and accessible to the reader. The guidelines' content on reporting
observational studies gives explanations of these data sources; however, guidance could also include the
relative strengths and weaknesses of using such data, from both a qualitative and a quantitative
standpoint. For example, from my experience with case-control designs (line 242) in assessing diagnostic
test accuracy, flexibility exists as to which comparative measures are taken (though I accept this may not
be within the scope of this paper). In the section on interpretation (377), the line of thought becomes
confusing and may raise questions of potential bias by allowing judgement. Perhaps some
elaboration on how a "conflict" (379) is determined would help (is this a judgement call, or should there be some
statistical basis?). Standards of good practice could be framed as ideal, standard, or minimum.
The reader may also benefit from some indication of how the process could be standardised in the
future.
Respondent #20
Comments for report 1:
The report's authenticity is supported by the fact that it was prepared by expert members of the task force and that
the study was supervised by them.
I recommend that some more points be added and discussed in the conclusion of this study. More focus
should be given to data collection and analysis from primary data collection methods to support
the study. A supportive case study would have added more value to this report. The new
approaches discussed here for interpreting secondary data are impressive.
Respondent #21
Please find below my remarks on the well-written text on secondary databases; I consider the
benefits of the principles laid down in the text to be tremendous!
It is stated (e.g., sentences 152-154 in text 1) that the use of protocols and a priori hypotheses will assure
end users that the results were not the product of data mining. To me, this is one essential step, but
assurance can only be given if QC/QA policies are implemented, just as for clinical research. Stating
that standards were used, or are to be used, might not be enough in the "real world."
It may be worthwhile to mention that authorities may also be reluctant to provide access to the best
data available to them, which in turn makes it difficult or even impossible for an applicant to provide the
best analysis.
Limits beyond which results become hypothesis-generating (sentence 206): unless I missed important
information elsewhere in the text, isn't this de facto the case for observational studies, no matter how
complex the statistical adjustments made for known prognostic factors (but not for unknown factors)?