ISPOR Good Research Practices for Retrospective Database Analysis Task Force
Comments received from Reviewer/Leadership/ISPOR membership:

I. GOOD RESEARCH PRACTICES FOR COMPARATIVE EFFECTIVENESS RESEARCH: DEFINING, REPORTING AND INTERPRETING NON-RANDOMIZED STUDIES OF TREATMENT EFFECTS USING SECONDARY DATA SOURCES - Report of the ISPOR Retrospective Database Analysis Task Force - Part I

Respondent #1
Regarding #1, line 96, Defining the question: it might be helpful to add a citation regarding the evolution to evidence-based medicine.

Respondent #2
In Part I, it would be better to have transition phrases at line 325 and line 331 to connect to the previous paragraph; otherwise it is not clear why these two paragraphs are here or what idea the authors want to present.

Respondent #3
Line 53: The sentence is a bit too long.
Line 58: The sentence could be revised and its length reduced.
Line 170: The grammar of the sentence needs to be revised.
Line 181: It should be "adequate number of patients," not "numbers."
Line 479: Should be "an expensive enterprise."

Respondent #4
(Page 5) "The alternative – waiting for perfect evidence – is usually not acceptable, because we never have perfect evidence and we are incapable (either due to cost or feasibility) to perform 'gold standard' randomized clinical trials to answer the myriad questions posed for a forever-growing armamentarium of health care technologies." Not only incapable: we may also be unable to answer the research question of interest.
More explanation is needed for the figure on page 17.
Box highlighting the overall recommendations:
Line 6: add "RCT or prospective observational studies."
Line 19: consider replacing "prospective" with "a priori."
Line 41: add "and safe."
Line 42: add "or prospective observational studies."
Line 47: add "relevance of study question, appropriateness of population and timeframe."
Line 82: add "or prospective observational studies."
Lines 84-93: the arguments in favor of retrospective database studies also apply to prospective observational studies.

Respondent #5
1. Throughout the document: suggest limiting the recommendation to justify changes in the analytical plan to only the primary objective and some key secondary objectives.
2. Throughout the document: add "on secondary databases" to references to observational studies.
3. It would benefit the audience if the authors first defined their key terms, such as "effectiveness research," "comparative effectiveness research," "secondary data sources," "observational data," and "epidemiologic studies," and used these terms and definitions consistently across the three documents. Providing common definitions would also be extremely helpful to the field. It is not clear whether this series is limited to "comparative effectiveness" or "effectiveness." The authors seem to use "observational research" and "epidemiologic research" interchangeably.
4. In discussing the relative merits of observational studies versus RCTs, the authors should acknowledge that most evidence hierarchies consider RCTs the gold standard of evidence and rank case-control and cohort studies further down the evidence ladder. The main reason for this is that unmeasured or unobservable confounding brings into question the validity of observational studies. This does not mean that observational studies are not useful; it simply acknowledges how they are currently viewed and why. The role of unobservable confounding and the limited clinical information available to control for confounding should be acknowledged early on in the documents.
Respondent #6
1. Line 126. The decision of whether to use imperfect information is often framed as a question of weighing the potential harms and benefits given the level of the evidence. For example, the recommendation to drink four glasses of water a day may need less rigorous evidence than the recommendation to take a medication that influences multiple systems in the body. This context could be useful to add to the discussion of whether one should wait for "perfect evidence." It is touched upon later in the document (page 17), but not in the context of "first do no harm." In general, the tone seems to advocate for observational data; a more balanced presentation may be preferable to making this case.
2. Line 135. It is important to acknowledge that the main reason health policy decision makers are reluctant to use observational databases is concern about confounding and selection bias, which are widely viewed as more effectively addressed through randomization.
3. Line 179. The discussion of feasibility seems very important. Could this discussion be expanded to suggest that researchers consider the major threats to validity prior to conducting their study? For example, if the researchers are trying to compare two different medications that are typically prescribed to populations with different unmeasured severity characteristics, they should acknowledge the threat of selection bias up front. Similarly, if the available outcome measures are only proximally related to the true outcome of interest, this limitation should also be acknowledged.
4. Line 249. The interrupted time series design is typically referred to in economics as a "difference-in-differences" approach, in which changes in the outcomes of interest following an event in the case group are compared to changes in the outcomes of interest in the control group. Alternatively, these have been described as pre/post analyses with a contemporaneous control group.
Respondent #7
Throughout the document: suggest limiting the recommendation to justify changes in the analytical plan to only the primary objective and some key secondary objectives.

Respondent #8
1. Throughout the document: add "on secondary databases" to references to observational studies.
2. Line 53: The sentence is a bit too long.
3. Line 58: The sentence could be revised and its length reduced.
4. Line 170: The grammar of the sentence needs to be revised.
5. Line 181: It should be "adequate number of patients," not "numbers."
6. Line 479 (pg 16): Should be "an expensive enterprise."

Respondent #9
1. The section entitled "Prospective Specification of Research and Protocol Development" should be immediately followed by a subsection presenting and explaining the rationale for the hypothesis testing. That is common practice in hypothesis testing, especially in theses and dissertations, and it makes sense.
2. Line 166, section on Specificity. The authors indicate that the Objectives section is the appropriate place for justifying the use of the specific database. It seems that the Methods section would be an acceptable alternative; in fact, on page 12 the authors do exactly that in their proposed structured abstract. Therefore, I suggest inserting a statement to that effect.
3. Line 172, section on Novelty. Another reason for using a database would be that conflicting or inconclusive results have been presented in the literature, and the availability of large patient numbers could help provide clarity on the research question. I suggest adding this information.
4. Line 179, under Feasibility. I suggest that there be a statement that authors of protocols and manuscripts provide a justification of the feasibility of answering the research question.
5. Line 196: It seems that it should read "at least one pre-specified primary endpoint."
6. Line 228, Cross-Sectional Designs. Perhaps there needs to be an alternate terminology here, that is, "prevalence-based studies."
Similarly, cohort studies may be characterized as "incidence-based studies." These studies are often used to estimate burden of illness or cost of treatment. These terms need to be included.
7. The structured abstract does not mention the population under study. Perhaps this could be done in describing the cohorts, but the population is also an integral part of the research question. I suggest that it be mentioned there.
8. Under Methods, there is mention of treatment cohorts. That would be appropriate in a cohort study, but not necessarily in other models. I suggest changing the term to "treatment groups" or "study groups."
9. Line 390. According to the Merriam-Webster dictionary (http://www.merriamwebster.com/dictionary/PROBATIVE), the word "probative" has two meanings: 1: serving to test or try, exploratory; 2: serving to prove, substantiating.
10. The authors need to state which one they mean, or (better) avoid big words with double meanings. I suggest another word such as "conclusive" or even "substantiating."
11. Lines 416-420: The authors need to address clinical importance/relevance as opposed to statistical significance. They start to mention it, but I think it deserves more emphasis, particularly when huge numbers of patients are being studied. I suggest that they define a priori what a clinically relevant difference would be, along with justification for the value(s) selected. For example: "...based on the work of Frankenstein and Monster (1933), we a priori considered that 10,000 volts would represent a good shock value..."

Respondent #10
1. Line 158. Since "data" is a plural noun, replace "data has" with "data have."
2. Line 159: Replace "information of long-term outcomes" with "information on long-term outcomes."
3. Line 216 seems to be a stray statement.
4. Line 217: Perhaps we should be consistent (and more concise) with words. I suggest that "epidemiologic and econometric" be preferred over "epidemiological and econometric."
If we say "econometric," why should the parallel in our field not be "epidemiologic"? Some people say that epidemiology is not necessarily logical but does contain some logic... therein lies the justification!

Respondent #11
Part I: Pre-specification of analysis. I do not think it is a realistic possibility, from either a theoretical or a pragmatic perspective.
Theoretical: Data mining. I feel sorry that data mining is presented as "dark force" science in this recommendation (line 154). The specificity of large databases is that they can generate hypotheses that cannot be seen through the traditional glasses of clinical or outcomes research.
Pragmatic: In the absence of a process similar to the ICH guidelines and a regulatory framework, it will be easy to still do some data mining and then develop a post hoc "a priori" analysis plan. Even in Europe, where the GPRD database requires a protocol and a scientific committee needs to give its approval, there is no tracking of the results of the analyses; so, using a US commercial database, it would be easy to use the database to find some results, then define the specific research question, and finally outsource the protocol and the analysis. I truly cannot see a pathway by which a real a priori protocol could be secured.
So, in a nutshell, pre-specification is impossible to control in real life, casts a shadow on valid techniques like data mining, and at the end of the day will undermine the use of databases. It is better to ask for more transparency on how the sample was finally selected from the original dataset.
Finally, one can imagine some kind of audit-like process, as from the FDA. Authors could commit to follow the ISPOR guidelines, and in exchange for this ISPOR label in the published paper, ISPOR could randomly select some published database studies, or be commissioned by a decision maker, to re-run the analysis plan to QC methods and results. Of course, some funding would be needed, but I heard about $1.1 billion in funding as part of the stimulus bill for comparative effectiveness...
At the end of the day, the strength of a database is that the data are observed, which prevents some of the risk of post hoc analysis. Thus, some systematic presentation of raw data (e.g., before adjustment by propensity score) would prevent the risk of spurious post hoc analyses. So, more than pre-specification, it is replication that is needed. Indeed, all the requirements highlighted in the paper are sufficient (framing of research, reporting, interpretation). Pre-specification would only give a sense of false security by mimicking the RCT process; pre-specification is only possible in the artificial world of the RCT. After all, you can find some benefit by data mining; if the benefit truly exists, then it does not matter whether it was found by a priori thinking or post hoc.

Respondent #12
In my opinion, the following points could be useful for the document:
1. Conflicts of interest of the data providers should be considered.
2. Inclusion of the PICO format in the abstract can provide a clear picture of the study.

Respondent #13
(1) Line 26, "report out the results of the results of their pre-specified plan": should that be "report out the results of their pre-specified plan"?
(2) The table below line 358, descriptor for item 6, "Discuss the potential for confounding, both measured and unmeasured, and how this was assessed and addressed": should that be "Discuss the potential for confounding, both measured and unmeasured, and how this was assessed"?

Respondent #14
How do we know the weight to give such a study in a given policy context? If there are multiple approaches and data sources to answer effectiveness questions, how do we decide which approaches are better or have more utility? The draft reports offer many thoughts with respect to these questions. We do not and will never precisely know the complicated interactions of medical interventions and medical conditions.
We use scientific procedures and judgment to form the best picture we can from the information we have. Building a base of trust, best practices, and appropriate expectations for such comparative effectiveness research will take time. I wanted to focus on the first report, styled Good Research Practices for Defining, Reporting and Interpreting Non-randomized Studies of Treatment Effects Using Secondary Databases. This is a very good draft, and I have a few suggestions below. Overall, I think there is more work to do with respect to the questions I asked above. In particular, I would like to work with ISPOR regarding measures and criteria to evaluate more successful research approaches.

Respondent #15
On page 3, lines 40-42, the draft states: "Large computerized databases with millions of observations of drug treatment and health outcomes can be used to assess which drugs are most effective in routine care without long delays and the prohibitive costs of most RCTs." I understand this is to emphasize certain points about utility, but at this juncture is there enough evidence to show that outcomes research can assess which drugs are most effective in routine care? Perhaps "may be helpful in assessing..."
On page 5, lines 138-140, the draft states: "This distrust is derived, at least in part, from the lack of generally accepted good research practices and lack of standardized reporting; it is also due to discordance of the results in examination of clinical effectiveness between some observational studies and randomized control trials." You could add that there is not a body of studies showing reproducibility of database observational studies across data environments, and that providers view such studies as oversimplifying the elements of the care process and the impact of other factors.
On page 6, lines 172-177, the draft states: "3) Novelty: Ideally, there should be an absence of literature that directly relates to the proposed study question thereby making the proposed research question novel. Alternatively, the proposed study design for the given research question should be superior to previously used design where previous research has been conducted and whose findings are conflicting or questioned because of poor study design. As the number of well-designed studies addressing a specific question whose findings are consistent with each other increases, the value of an additional study addressing this question diminishes." Because I believe we need to establish a body of work demonstrating the validity, reproducibility, and reliability of database research of this type, I am not sure novelty, or the absence of literature relating to the issue, is a good factor right now. It may be helpful to have a body of work that evaluates these research tools in the context of existing studies, so that the research tools themselves can be evaluated.
On several pages the draft makes the point that ex ante judgments and modifications may compromise the value of a study for hypothesis testing. This may be so, but (a) I do not see particular support for this point in this context, and (b) there is a lot to be gained from midcourse corrections. Many issues arise in database research that are not the same, and not under the same level of control or planning, as in a clinical trial. One of the good things about observational database research is the ability to easily do multiple runs with different approaches. These additional runs can add to the body of knowledge as long as the steps are transparent.
On page 10, lines 320-323, the draft states: "Medical records data may provide more extensive data for comorbidity adjustment for research studies that may be particularly susceptible to selection bias whereas administrative claims data, if considerably larger in numbers of patients captured, may be better suited for research questions that involve rare outcomes."

Respondent #16
I have been worried about expectations regarding rare outcomes in the context of this research. It is true that larger databases are helpful, but there are so many confounding variables and inaccurate measures that the focus needs to be on more robust signals. Reference to rare outcomes suggests outcomes that may be two or three orders of magnitude below baseline numbers. Given the state of the studies, we do not really know whether it is feasible to get to rare outcomes. I would be very interested in hearing the group's thoughts on these points and am happy to talk about the draft. Again, great work, and I look forward to working with you and the group on this and other issues.

Respondent #17
Page 6, line 173: Yes, it is true that "replication" studies are of diminished value as they increase in number, but that is not in itself a reason not to replicate methods in multiple data sources, particularly if the burden of evidence is not yet of sufficient volume.
Page 13, line 378: All of these possible explanations indicate flaws in the observational study not present in the RCT. Yet even the most well-designed, robust RCTs still cause concern regarding generalizability, long-term treatment effects, etc., which often represent the impetus for conducting retrospective observational studies in the first place. So there may be a flaw in trial design or breadth that the observational study is intended to correct, and on which it will by definition conflict with the RCT.
Page 14, line 416: An illustrative example would be the case of statistically significant differences that are a product of the large sample sizes available in many databases but are not of any clinical import.
Page 17, line 514 and beyond: This is the biggest problem with the document. The figure presented is one developed by my employer; it is outdated and has since been replaced. In addition, a conversation between the document's primary author and our president and founder (Steve Pearson) concerning the interpretation of the figure and its attribution needs to occur before this appears in hard copy.

Respondent #18
In lines 60 and 61, after the sentence "We believe that more discipline and transparency are required," it is not clear whether the following sentence is a question or an affirmation that the researchers are making. Please clarify.
In the section on establishing the research questions, it is not well specified (or perhaps it is a little confusing) what is important to consider when designing a research question. It may be worth using bullets to detail the steps in the construction of a study research question. This would clarify the context, and the information may flow more easily.
Line 145: The report might be enhanced by some guidance on whether the research question should be consistent with economic theory.
Line 162: I believe the question should be meaningful but need not be topical. Some interesting research questions are those that have been answered previously using crude and perhaps inappropriate methodologies or datasets.
Line 169: This part seems more appropriately discussed within the methodology and data sections, since the rationale for the study or research question may not stem from limitations attributable to current or previous secondary databases used in answering the research question.
Line 172: Again, the question should be meaningful but need not be novel.
For instance, the majority of research questions using survival analysis rely almost exclusively on the Cox proportional hazards model; however, evidence abounds to suggest that this approach and its inherent assumptions by and large do not hold. A recommendation favoring novel research questions rests on the assumption that previous work examining similar research questions is without error.
Line 175: At first glance, an accumulation of "well-designed studies" may seem to confirm what we already know. However, given the likelihood that journals reject studies that buck a predetermined trend (publication bias), and the fact that previous research is consulted by analysts in our industry, such a recommendation may curtail researchers' zeal to push the frontiers of research.
Line 216: It would have been helpful to provide or suggest a few econometric methods appropriate to each selected study design, and the appropriate type of dataset given particular types of dependent variables.
Lines 331-336: A recommendation on how to handle missing data would be helpful.
Line 377: Before proceeding to the interpretation of the results, the Task Force Report and recommendations might be enhanced by touching on model specification, identification of the equation to be estimated, assumptions of the statistical model, distribution of the variables and error term, the effects of past realizations of certain variables (DV or IV), model fit, and the types of tests that assure the reader these issues have been addressed in the analysis.
Line 416: This may not necessarily be accurate. If the point estimates of the effects of two treatments are not clinically compelling, that does not necessarily imply the estimates are suspect. A clinically compelling effect might depend on the direction of change and the baseline value.
Line 371: The expected magnitude of potential confounding variables may not be feasible to estimate ex ante without performing a regression analysis or deriving an approximation through simulation techniques (stochastic uncertainty propagation). The magnitude and directionality may depend on how the RHS variables relate to the LHS variable. I believe one may only be able to describe what these confounders might be and their expected signs.

Respondent #19
While several of the references are not familiar to me (perhaps because I am new to this group), the document is clear, concise, and accessible to the reader. The guidelines' content on reporting observational studies explains these data sources; however, guidance could also include the relative strengths and weaknesses of using such data, from both a qualitative and a quantitative standpoint. For example, from my experience with case-control designs (line 242) in assessing diagnostic test accuracy, flexibility exists in which comparative measures are taken (though I accept this may not be within the scope of this paper). In the section on interpretation (line 377), the line of thought becomes confusing and may raise questions of potential bias through allowing judgement. Perhaps some elaboration on how a "conflict" (line 379) is determined would help: is this a judgement call, or should there be some statistical basis? Standards of good practice could be framed as ideal, standard, or minimum. The reader may also benefit from some indication of how the process could be standardised in the future.

Respondent #20
Comments for report 1: The report's authenticity is supported by its preparation and supervision by expert members of the task force. I recommend that some more points be added and discussed in the conclusion of this study. More focus should be given to data collection and analysis from primary data collection methods to support the study.
A supportive case study would have added more value to this report. The new approaches discussed here for interpreting secondary data are impressive.

Respondent #21
Please find below my remarks on the well-written text on secondary databases; I consider the benefits of the principles laid down in the text to be tremendous!
It is stated (e.g., sentences 152-154 in text 1) that the use of protocols and a priori hypotheses will assure end users that the results were not the product of data mining. To me, this is one essential step, but assurance can only be given if QC/QA policies are implemented, just as for clinical research. Stating that standards were used, or are to be used, might not be enough in the "real world."
It may be worthwhile to mention that authorities may also be reluctant to provide access to the best data available to them, which in turn makes it difficult or even impossible for an applicant to provide the best analysis.
Limits beyond which results become hypothesis-generating (sentence 206): unless I missed important information elsewhere in the text, isn't this de facto the case for observational studies, no matter the complex statistical adjustments made for known prognostic factors but not for unknown factors?