Measurement and Causality in Medical Science
Alex Broadbent, Olaf Dammann, Leah McClimans, Zinhle Mncube, Benjamin Smart
Summary
Medical science seeks to quantify various phenomena that are hard to quantify.
Among these, the quantitative measurement of “causal effects” of exposures or
treatments on health outcomes is particularly interesting from a philosophical
perspective, since very little philosophical work seeks to understand how
causation or related phenomena could be quantifiable. Dismissing such
measures out of hand as meaningless is irresponsible given their centrality in
medical research. This symposium makes a start on identifying and resolving
some of the conceptual difficulties the medical sciences face in devising
meaningful causal measures and understanding them.
These questions are pursued with particular reference to diagnostic tests,
measures of attributability, heritability, and the role of measurement in evidence
based medicine. The participants in the symposium include both philosophers of
science and working scientists.
Topic
Measurement in the medical sciences faces several interesting challenges. Many
of the phenomena that medical science needs to measure are hard to quantify. In
particular, quantitative measures of causal phenomena pose conceptual
difficulties. Many philosophical treatments of causation consider only
propositions concerning the presence or absence of causation (and typically
singular causation at that), propositions characterised by the form “C causes E”.
But in the medical sciences, it is common to find quantitative causal claims.
These have roughly the form “C has n effect on E”. Correspondingly, much
medical research is directed, not merely at finding out whether C has some effect
on E, but what the value of n is.
Thus, for example, in drug trials, the aim is not merely to discover whether the
drug has a positive effect on the outcome, but how large that effect is. In
assessing the evidence for a causal link between Zika virus and microcephaly, the
question is not merely whether there is a causal link (whether Zika features in
the causal history of some cases) but how “strong” or “large” the effect is. In
considering the long term effects of breast-feeding on IQ, the question is not
merely how strong the evidence is for a causal effect but how large the effect is.
Stronger evidence for a smaller effect may be of considerably less interest for
medical or public health purposes than weaker evidence for a larger effect.
The idea of measuring causality is not one that fits readily into standard
philosophical frameworks for thinking about causation; indeed from a
philosophical perspective it is tempting to dismiss the idea as senseless. But a
more constructive response would be to seek to make sense of what medical
research scientists are seeking to achieve.
Perhaps because of the lack of philosophical work on the topic, the medical
science literature, notably in epidemiology, has proceeded to devise its own
ways of thinking about and expressing quantitative measures of causality. These
developments encounter difficulties that are ripe for philosophical attention.
This symposium explores a collection of these difficulties concerning
measurement and causation, especially as they appear in epidemiology, clinical
medicine, and genetics.
Measurement
In clinical epidemiology, the validity of a diagnostic test is established in
comparison to a gold standard, usually a reference method with known validity.
Olaf Dammann has recently addressed the qualitative consequences of
“Fletcher’s Paradox”, the case where a new test looks better than the gold
standard although it is less accurate. He suggests a set of formulas to calculate
the degree of congruence (co-positivity and co-negativity) of two tests (i.e., new
and gold standard) in a simulated scenario where the true disease status is
known. At this symposium, he will provide and discuss examples based on these
congruence formulas and explore consequences for the theory of measurement
in medical test validation.
In addition to methodological issues (such as those addressed by Dammann),
‘measurement’ is of particular importance to a number of topics in the
philosophy of medicine. In recent years, for example, many philosophers have
discussed in detail the virtues and vices of evidence based medicine (EBM).
These debates have primarily focused on whether EBM is, in practice, good for
clinical medicine; that is, whether the move towards evidence based practice has
in any way improved the wellbeing of patients. But few, if any, have discussed
the connection between measurement and evidence that EBM brings to light.
Smart’s contribution to this symposium is intended to go some way towards
correcting this lack.
Causation
Causation and causal inference are of fundamental importance both to epidemiology and to research in genetic heritability: to the former because epidemiology is, in essence, the study of the distribution and determinants of disease; to the latter because the primary goal of work on heritability is to establish the strength of the causal relationship between genotypic and phenotypic differences.
One popular approach to causal inference in epidemiology is the counterfactual, or 'potential outcomes', approach. This strategy of effect measurement requires comparing the outcome of an 'actual' study to that of some hypothetical scenario. The difference between the two is the inferred 'strength' of the causal connection between the intervention and the outcome. Some have argued,
however, that the intervention must be ‘well specified’ in order for the causal
inference to be justified. This requirement is in stark contrast to the way
epidemiology has been practiced in the last few decades, and does not represent
the methodology employed in some of its defining discoveries, including the
identification of smoking as a cause of lung cancer. These recent developments
have thus given rise to a current debate, with which Broadbent’s contribution to
this symposium engages.
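To give a concrete sense of the effect measures at issue, here is a minimal sketch in Python of two standard contrasts used in this framework, the risk difference and the risk ratio. The numbers are purely hypothetical and are not drawn from any study discussed here.

```python
# Minimal sketch (hypothetical numbers): standard effect measures in the
# potential outcomes framework, where the 'effect' is a contrast between
# the outcome risk under an intervention and under a comparison scenario.

def risk(cases: int, total: int) -> float:
    """Proportion of individuals experiencing the outcome."""
    return cases / total

# Purely illustrative two-arm comparison:
risk_treated = risk(cases=30, total=1000)  # outcome risk under the intervention
risk_control = risk(cases=50, total=1000)  # risk under the comparison scenario

risk_difference = risk_treated - risk_control  # absolute effect measure
risk_ratio = risk_treated / risk_control       # relative effect measure

print(f"Risk difference: {risk_difference:+.3f}")  # -0.020
print(f"Risk ratio: {risk_ratio:.2f}")             # 0.60
```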
For many years, given the relationship between heritability and environmental factors, philosophers of science have agreed that heritability estimates do not indicate the causal strength of genes on phenotypic variance. However, in
this symposium, Mncube will argue that heritability estimates can bear a causal
interpretation when: (a) there is no statistical gene-environment interaction, (b)
there is small to no gene-environment correlation, and (c) only within the
domain of populations that have similar causally salient features.
Papers
Fletcher's Paradox: Effects of a Not So Golden Gold Standard on Measures of
Diagnostic Test Validity
Olaf Dammann
In clinical epidemiology, the validity of a diagnostic test is established in
comparison to a gold standard, usually a reference method with known validity.
The comparison is performed by measuring the new test’s sensitivity, specificity,
positive and negative predictive values, as well as its accuracy (Figure).
I have previously outlined the qualitative consequences of "Fletcher's Paradox", the case where a new test looks better than the gold standard although it is less accurate. I have also suggested a set of formulas to calculate the degree of congruence (co-positivity and co-negativity) of two tests (i.e., new and gold standard) in a simulated scenario where the true disease status is known. In this presentation, I will provide and discuss examples based on these congruence formulas and explore consequences for the theory of measurement in medical test validation.
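By way of illustration, the following sketch computes the validity measures named above from a 2x2 table, together with co-positivity and co-negativity in their conventional sense (agreement with the gold standard among its positives and negatives). All counts are hypothetical, and the congruence functions follow standard textbook definitions, which may differ in detail from the formulas discussed in the presentation.

```python
# Minimal sketch (hypothetical counts) of standard 2x2 diagnostic test
# validity measures, plus co-positivity/co-negativity of two tests as
# conventionally defined. Not necessarily Dammann's own formulas.

def validity_measures(tp, fp, fn, tn):
    """Validity of a new test against a reference standard,
    from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # P(test+ | reference+)
        "specificity": tn / (tn + fp),  # P(test- | reference-)
        "ppv": tp / (tp + fp),          # P(reference+ | test+)
        "npv": tn / (tn + fn),          # P(reference- | test-)
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

def congruence(both_pos, gold_only_pos, new_only_pos, both_neg):
    """Congruence of a new test with a gold standard: agreement among
    the gold standard's positives (co-positivity) and its negatives
    (co-negativity)."""
    return {
        "co_positivity": both_pos / (both_pos + gold_only_pos),
        "co_negativity": both_neg / (both_neg + new_only_pos),
    }

print(validity_measures(tp=90, fp=10, fn=5, tn=95))
print(congruence(both_pos=85, gold_only_pos=10, new_only_pos=15, both_neg=90))
```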
How Evidence Based Medicine brings the Connection between Evidence and
Measurement into Focus
Benjamin Smart
Proponents of evidence based medicine (EBM) take the evidence provided by epidemiological studies to give better justification for clinical decisions than expert opinion, intuition, or 'mechanistic reasoning'. EBM thus dictates that whether or not an individual should be treated in a particular way depends upon whether or not clinical trials have shown the treatment in question to be beneficial, effective, and cost-efficient, relative to the alternatives. This strategy, it is argued, limits the risk of prescribing inefficient and/or harmful treatments, thereby improving overall mortality and morbidity rates, as well as quality of life.
EBM is often associated with a hierarchy of evidence. At the top of the hierarchy
sit randomised controlled trials (RCTs), followed by observational studies such
as case-control and cohort studies. At the bottom of the evidence hierarchy are
mechanistic reasoning, clinical judgement and expert opinion (Howick 2011;
Cartwright 2007). Epidemiological studies such as RCTs and observational studies are grounded in statistics and quantitative data, which, of course, one can only obtain through 'measurement'; it is unsurprising, then, that McClimans has suggested that the '"paradigm shift" to EBM is not so much a shift toward the reliance on evidence as it is a shift toward reliance on measurement' (2013, 521).
Given the role of measurement in EBM, it is clear that the methods used when collecting data for epidemiological studies, and the nature of the data itself, are of fundamental importance: only when the appropriate measures are employed, and the appropriate data are collected, will the evidence used in EBM be good
evidence. In this paper I examine the types of evidence EBM takes to be of
primary importance, and highlight the relationship between evidence and
measurement that EBM brings into focus.
How Much Mortality Does Obesity Cause? Measuring Causality in
Populations
Alex Broadbent
In 2008 Miguel Hernán and Lisa Taubman argued that seeking to estimate
the effect of obesity on mortality in a population was meaningless because
different ways of intervening to reduce obesity would result in different changes
in mortality. They argued this in a journal largely devoted to publishing
estimates of this kind. Their point is part of a larger methodological movement in
some parts of epidemiology, aiming to tighten up talk of causality by insisting
that causal effects always be specified relative to some contemplated
intervention, even where the study is observational and no actual intervention
has occurred. The methodological questions raised by the Potential Outcomes
Approach have sparked considerable discussion concerning causal inference
(VanderWeele and Hernán 2012; VanderWeele and Robinson 2014; Glymour
and Glymour 2014; Broadbent 2015; Vandenbroucke, Broadbent and Pearce,
2016). In this paper, however, I focus on the measurement question raised so
effectively by Hernán and Taubman. The question is whether measures of the
proportion of an outcome that is attributable to a certain causal factor are
meaningful, and if so, what they mean.
A popular but inaccurate way to understand such measures is as telling us what
proportion of an outcome would disappear if the exposure were absent. This is
inaccurate partly because it is vague, as Hernán and Taubman point out:
different replacements for the exposure would result in different outcomes. It is
also inaccurate because some replacements might cause the outcome as well,
yielding an underestimate of the proportion of the outcome in which the
exposure is causally involved (Greenland and Robins 1988; Greenland 2005).
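For concreteness, here is a minimal sketch of the standard excess-fraction calculation whose interpretation is in dispute, using Levin's formula and purely illustrative numbers.

```python
# Minimal sketch (hypothetical numbers) of the population attributable
# fraction as an excess fraction. Per Greenland and Robins (1988), this
# can underestimate the etiologic fraction, i.e. the proportion of cases
# in whose causal history the exposure actually features.

def attributable_fraction(p_exposed: float, relative_risk: float) -> float:
    """Levin's formula: PAF = p(RR - 1) / (1 + p(RR - 1)),
    where p is the prevalence of the exposure in the population."""
    excess = p_exposed * (relative_risk - 1)
    return excess / (1 + excess)

# E.g., if 30% of a population is obese and obesity carries a relative
# risk of 1.5 for death over some period (figures purely illustrative):
print(round(attributable_fraction(p_exposed=0.3, relative_risk=1.5), 3))  # 0.13
```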
Elsewhere I have provided an account of measures of attributable fraction
making essential reference to the notion of explanation, which is sensitive to
redundancy-type difficulties of this kind (Broadbent 2013). I do not, however,
deal with Hernán and Taubman’s attack there, nor address the specific claim that
unless a specific intervention is intended, estimates of—for example—the effect
of obesity on mortality are meaningless. In this paper I seek to assess the impact
of this objection, separate what is right about it from what is overstated, and
identify a theoretical basis for practical guidelines governing the use of such measures in real-life epidemiological work.
Heritability
Zinhle Mncube
Heritability estimates “have been regarded as important primarily on the
expectation that they would furnish valuable information about the causal
strength of genetic influence on phenotypic differences” (Sesardic, 1993:399;
slightly rephrased in Sesardic, 2005:22). But for over 30 years, a consensus has
existed in the philosophy of science that heritability estimates do not indicate the
causal strength of genes on phenotypic variance (Oftedal, 2005; Downes, 2016).
Theorists provide conceptual and methodological arguments to the effect that: (i) because of gene-environment interaction, genotypes and environment continuously interact during the individual development of phenotypes, such that we cannot partition the causes of variation; (ii) the existence of gene-environment correlation – cases in which two different and separate sources of phenotypic variance (genetic and environmental variance) are correlated – again highlights the idea that it is difficult to disentangle genotypic and environmental effects on phenotypic variance; and (iii) heritability estimates are environment-, time- and population-dependent, and therefore cannot be generalized to other populations. As such, most theorists are against defining heritability as a measure of the causal strength of genetic variance on total phenotypic variance.
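For reference, the textbook definition treats heritability as a ratio of variance components. The sketch below uses hypothetical values; the interaction and covariance terms default to zero, corresponding to conditions of the kind at issue in this paper.

```python
# Minimal sketch (hypothetical variance components): broad-sense
# heritability as the share of phenotypic variance attributable to
# genetic variance, H^2 = V_G / V_P. Interaction and gene-environment
# covariance terms default to zero, mirroring the no-interaction and
# low-correlation conditions discussed here.

def heritability(var_genetic: float, var_environment: float,
                 var_interaction: float = 0.0, cov_ge: float = 0.0) -> float:
    """H^2 = V_G / V_P, with V_P = V_G + V_E + V_GxE + 2*Cov(G, E)."""
    var_phenotypic = (var_genetic + var_environment
                      + var_interaction + 2 * cov_ge)
    return var_genetic / var_phenotypic

# Purely illustrative numbers:
print(heritability(var_genetic=0.6, var_environment=0.4))  # 0.6
```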
Against this consensus, I propose that heritability estimates can bear a causal
interpretation when: (a) there is no statistical gene-environment interaction, (b)
there is small to no gene-environment correlation, and (c) only within the
domain of populations that have similar causally salient features (Sesardic, 2005;
Tal, 2009, 2011). When these conditions are met, it makes sense to causally
interpret that heritability estimate as a measure of the causal strength of
differences in genes on total phenotypic variance. Viewed in this different light, a
question arises about the power of heritability to predict diseases in populations,
and perhaps even in individuals.
Clinical Outcomes Assessments and Epistemic Risk
Leah McClimans
Clinical Outcomes Assessments (COAs) are measures commonly used to quantify a patient's symptoms, overall mental state, or the effects of a disease or condition on how patients function. They can be used in the contexts of clinical trials, drug labeling claims, quality of care assessments, and priority setting. But the
adequacy of the psychometric and econometric methods used to develop these
instruments has been a topic of much debate for over a century. Recent
discussion in leading psychometric journals has focused on the ontology of
psychological attributes and what measurement theories and methods befit
them. While there is some consensus amongst psychometricians that a realist
ontology is necessary for valid and interpretable psychological measurement,
including COAs (Michell 2005; Borsboom 2006; Maul 2013), many others, e.g.
psychologists, epidemiologists, etc. who continue to design and employ
measuring instruments built out of theories and methods that cannot sustain a
realist ontology (Michell 1999; Borsboom 2006).
In this paper my aim is to reframe this debate from one about the appropriate ontology of psychological attributes needed to achieve measurement (e.g. realist or operationalist) to a debate about epistemic risk. Drawing on Justin Biddle and Rebecca Kukla's recent work on phronetic risk, I argue that the debate over whether a COA is qualified, i.e. whether, within the context of use, the results of the COA can be relied upon to measure a specific concept and to bear a specific interpretation, can be understood as a debate over the epistemic risk and values applied in different contexts.
To illustrate the debate over qualifying COAs I compare instruments designed
using two different measurement theories: Classical Test Theory (CTT) and
Rasch Measurement Theory (RMT). CTT is usually associated with an operationalist ontology and RMT with a realist ontology. Although neither theory is without criticism, it is generally claimed that RMT is more scientific than CTT. Yet CTT continues to be the dominant basis for COAs and indeed other
psychological measures. Some have tried to explain this discrepancy in terms of
the dominance of CTT in the education of psychologists and others and the level
of difficulty of RMT (Borsboom 2006). Although these explanations may hold
some truth, I argue that they do not provide the entire story. When we view the
choice between types of measures as a choice of values and epistemic risk we see
a larger, more comprehensive picture. For instance, RMT is more likely to
produce false negatives and CTT is more likely to produce false positives; RMT is
arguably a more precise measure, but what does precision mean in the context of
the attributes and concepts COAs are often employed to measure, attributes such
as depression and concepts such as physical functioning? I argue that what is at stake in different measuring contexts, and how the attribute or concept is understood, will affect whether precision is of overwhelming importance, and will thus bear on which measurement methodology one ought to use.
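To make the contrast concrete, here is a minimal sketch of the dichotomous Rasch model alongside a CTT-style raw sum score. The item difficulties, ability value, and response pattern are hypothetical, chosen only to illustrate the difference between the two approaches.

```python
import math

# Minimal sketch contrasting the two measurement theories named above.
# CTT typically takes a respondent's measure to be a raw sum of item
# scores; the dichotomous Rasch model instead posits an ability theta
# and item difficulties b_i, with endorsement probabilities given by a
# logistic function. All numbers are hypothetical.

def rasch_probability(theta: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(X=1) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

item_difficulties = [-1.0, 0.0, 1.5]  # hypothetical item parameters
responses = [1, 1, 0]                 # a hypothetical response pattern

ctt_score = sum(responses)            # CTT: raw sum score
rasch_probs = [rasch_probability(theta=0.5, difficulty=b)
               for b in item_difficulties]

print("CTT sum score:", ctt_score)
print("Rasch endorsement probabilities:", [round(p, 2) for p in rasch_probs])
```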
Participants
Alex Broadbent
Professor of Philosophy and Executive Dean of Humanities, University of
Johannesburg. [email protected]
Alex Broadbent (PhD Cambridge 2007) is a philosopher of science with
particular interests in philosophy of epidemiology, philosophy of medicine, and
philosophy of law, connected by the philosophical themes of causation,
explanation, and prediction. He is committed to finding philosophical problems
in practical contexts, and to contributing something useful concerning them. He
holds a P-rating from the National Research Foundation of South Africa (2013-2018) and is a member of the South African Young Academy of Sciences. He has
published a number of articles in top international journals across three
disciplines (philosophy, epidemiology, law). His first book, Philosophy of
Epidemiology, was published by Palgrave Macmillan in 2013, and has been
translated into Korean. His second book, Philosophy for Graduate Students:
Metaphysics and Epistemology, was published by Routledge in 2016. He is
currently working on his third book, Philosophy of Medicine, under contract with
Oxford University Press.
Recent representative publications
Vandenbroucke, J., Broadbent, A., and Pearce, N. Online first. Causality and Causal Inference in Epidemiology: the need for a pluralistic approach. International Journal of Epidemiology. doi: 10.1093/ije/dyv341 [open access]
Broadbent, A. and Hwang, S.-S. 2016. Tobacco and Epidemiology in Korea: old
tricks, new answers? Journal of Epidemiology and Community Health 70:
527-528.
Broadbent, A. 2015. Causation and Prediction in Epidemiology: A Guide to the
Methodological Revolution. Studies in History and Philosophy of Biological
and Biomedical Sciences 54: 72-80.
Broadbent, A. 2015. Risk Relativism and physical law. Journal of Epidemiology
and Community Health 69: 92-94.
Broadbent, A. 2014. Disease as a Theoretical Concept: the Case of HPV-itis.
Studies in History and Philosophy of Biological and Biomedical Sciences 48:
250-257.
Broadbent, A. 2013. Philosophy of Epidemiology. London: Palgrave Macmillan.
Olaf Dammann
Professor of Public Health and Community Medicine, Pediatrics, and
Ophthalmology at Tufts University School of Medicine. [email protected]
Olaf Dammann, M.D. (U Hamburg, ’90), SM Epidemiology (Harvard, ’97) is
Professor of Public Health and Community Medicine, Pediatrics, and
Ophthalmology at Tufts University School of Medicine in Boston, USA. His
research interests include the elucidation of risk factors for brain damage and
retinopathy in preterm newborns, the theory of risk and causation in public
health research, and the development of computational population models of
disease occurrence. His current grant support is from the National Eye Institute
and Tufts University School of Medicine Chairs’ Initiative. His bibliography lists
180 publications.
Recent representative publications
Dammann O. Causality, mosaics, and the health sciences. Theor Med Bioeth. 2016
Apr;37(2):161-8.
Escobar E, Durgham R, Dammann O, Stopka TJ. Agent-based computational
model of the prevalence of gonococcal infections after the implementation of
HIV pre-exposure prophylaxis guidelines. Online J Public Health Inform.
2015 Dec 30;7(3):e224.
Fiorentino AR, Dammann O. Evidence, illness, and causation: an epidemiological
perspective on the Russo-Williamson Thesis. Stud Hist Philos Biol Biomed
Sci. 2015 Dec;54:1-9.
Leviton A, Gressens P, Wolkenhauer O, Dammann O. Systems approach to the
study of brain damage in the very preterm newborn. Front Syst Neurosci.
2015 Apr 14;9:58.
Dammann O, Gray P, Gressens P, Wolkenhauer O, Leviton A. Systems
Epidemiology: What's in a Name? Online J Public Health Inform. 2014 Dec
15;6(3):e198.
Leah McClimans (Symposium Chair)
Associate Professor, Department of Philosophy, University of South Carolina.
[email protected]
Leah McClimans (PhD LSE, 2007) is Associate Professor of Philosophy at the
University of South Carolina. Her research interests focus on measurement in
medical contexts, including the measurement of quality of life and the role of
measurement in evidence based medicine. She is also interested in methodology
of health-related quality of life measures, the art of questioning and the use of
empirical outcomes in bioethical decisions. She is currently editing a collection
on measurement in medicine.
Recent representative publications
Leah McClimans & Anne Slowther (2016). ‘Moral Expertise in the Clinic: Lessons
Learned From Medicine and Science.’ Journal of Medicine and
Philosophy 41 (4):401-415.
L. McClimans (2013). The Role of Measurement in Establishing Evidence. Journal
of Medicine and Philosophy 38 (5):520-538.
Leah McClimans & John P. Browne (2012). Quality of Life is a Process Not an
Outcome. Theoretical Medicine and Bioethics 33 (4):279-292.
Zinhle Mncube
Lecturer, Department of Philosophy, University of Johannesburg.
[email protected]
Zinhle’s research interests lie broadly in the philosophy of science, the
philosophy of biology, and the philosophy of race. Her Honours research was on
the biological basis of race and her Master's dissertation was on heritability and
genetic causation. Zinhle lectures undergraduate courses in metaphysics and
epistemology. She is also an Iris Marion Young Scholar and a Cornelius Golightly
fellow.
Recent representative publications
Mncube, Z. (2015). Are human races cladistic subspecies? South African Journal
of Philosophy 34 (2): 163-174.
Benjamin Smart
Senior Lecturer, Department of Philosophy, University of Johannesburg.
[email protected]
Benjamin Smart (PhD University of Nottingham, 2012) is a philosopher of
science and metaphysician, specialising in the philosophy of disease,
epidemiology, causation, and laws of nature. He has recently published a
monograph with Palgrave Macmillan entitled Concepts and Causes in the
Philosophy of Disease, in which he demonstrates that a variety of analyses of
causation, and numerous concepts of disease, are employed in the medical
sciences.
Recent representative publications
Smart, B. 2016. Concepts and Causes in the Philosophy of Disease. Basingstoke: Palgrave Macmillan
Smart, B. and Thebault, K. 2015. ‘The Principle of Least Action Revisited’ Analysis
75(2): 386-395
Smart, B. 2014. ‘On the Classification of Diseases’. Theoretical Medicine and
Bioethics 35(4): 251-269
Smart, B. 2013. ‘Is the Humean Defeated by Induction?’ Philosophical Studies
162(2): 319-332
Barker, S. and Smart, B. 2012. ‘The Ultimate Argument Against the Dispositional
Monist Accounts of Laws’ Analysis 72(4): 714-723