Doing Data Science – Chapter 12: Epidemiology
• Vast amounts of individual patient medical data are available
– Detailed: visits, prescriptions, outcomes, etc.
– Records cover lifetimes
– Largest databases have records on 80 million people
• However, many medical studies are observational
– Not founded on data
– Results affect the actions of doctors and insurance regulators
Confounder Problem and Stratification
• Confounding problem: an extraneous variable that correlates with both the dependent and the independent variable, giving an incorrect perception of cause and effect
• Stratification: partitioning a case into subcases and evaluating just the subcases to reach conclusions about the top-level case
– A weighted average is one way of combining the subcase results
• Example [p.294-295]:
– In a study, equal numbers of women (50) and men (50) received the treatment, but different numbers (80 women, 20 men) were in the control group
– Original causal effect is 10%
– Stratified causal effect is 5% for men and 11.25% for women
– This does NOT prove that the treatment side effects are twice as strong for women
• Problem: errors in causality if the numbers in the groups after stratification are too different to give meaningful statistics
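The weighted average mentioned under Stratification can be sketched with the figures from the example. The per-stratum effects (5% and 11.25%) and the group sizes are taken from the slide. Weighting each stratum by its total head count (treated plus control) is an assumption made here for illustration, and the function name `weighted_average` is hypothetical.

```python
# Sketch: combine per-stratum causal effects into one weighted average.
# Effects and group sizes come from the slide's example; weighting each
# stratum by its total head count is an assumption, not the book's method.

def weighted_average(effects, weights):
    """Return sum(effect * weight) / sum(weights)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(e * w for e, w in zip(effects, weights)) / total

# Stratum sizes: treated + control participants per sex.
stratum_sizes = {"men": 50 + 20, "women": 50 + 80}
stratum_effects = {"men": 5.0, "women": 11.25}  # percent

strata = list(stratum_sizes)
combined = weighted_average(
    [stratum_effects[s] for s in strata],
    [stratum_sizes[s] for s in strata],
)
print(f"Size-weighted stratified effect: {combined:.2f}%")
```

Under this particular weighting the combined figure is about 9.06%, which differs from the unstratified 10%. A different choice of weights would give a different number, which is part of why stratification alone does not settle the causal question.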
Data Driven Studies
• Analysis of 50 studies of drug/outcome pairs
– 5000 analyses for each pair on nine databases
– Example:
• ACE inhibitors (treatment for hypertension)/swelling of the heart
• Results varied between databases from 3X risk to 6X risk
– For 20 of 50 pairs, whether the analysis showed risk or no risk depended on the database
– By adjusting the choice of database, confounders, and time windows, all studies can be made to show risk or no risk
Data Driven Studies
• Observational Medical Outcomes Partnership (OMOP)
– See how well current methods predict things we already know
– 10 large medical databases containing records for 200 million people
– $25M
– Determined an ROC curve. The Area Under the Curve (AUC) was 0.65, not much better than the 0.5 of random guessing
– Databases are self-consistent: using a single database gave better accuracy (0.92 in one case)
– Graphs [p.302] show ~80% sensitivity at a ~10% false-positive rate
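To make the AUC figures concrete, here is a minimal sketch of how an area under the ROC curve can be computed from labels and scores. It uses the pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one), not OMOP's actual pipeline. The function name `auc` and the six data points are hypothetical.

```python
# Sketch: AUC via pairwise comparison of positive and negative scores.
# The data below is made up for illustration; it is not OMOP data.

def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]               # 1 = known drug/outcome association
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]   # hypothetical method output

print(f"AUC = {auc(labels, scores):.3f}")

# A method that gives every case the same score cannot rank anything,
# so it lands on the random baseline of 0.5.
print(auc([1, 0], [0.5, 0.5]))
```

On this toy data 7 of the 9 positive/negative pairs are ordered correctly, so the AUC is about 0.778. A value of 0.65 means roughly two in three pairs are ordered correctly, compared with one in two for random guessing.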
“The epidemiologists in general don’t believe the results of this study.”
In other words, they prefer to rely on observational rather than data-driven conclusions
References
• http://en.wikipedia.org/wiki/Confounding
• https://en.wikipedia.org/wiki/ACE_inhibitor