Answering Hard Healthcare Questions with Data
Fred Rahmanian, Chief Technology Officer, Geneia
“I know that half of my advertising doesn’t work. The problem is that I don’t know which half.”
John Wanamaker, department store magnate
Google’s Answer
• Cost per 1,000 impressions (CPM) vs. cost per click (CPC)
• Using data, Google understood each user’s behavior
• Google was able to place advertisements that an individual was likely to click
• They knew “which half” of their advertising was more likely to be effective
• And didn’t bother with the rest
Healthcare is expensive
• The U.S. spends over $2.6 trillion on health care every year
• These costs include over $600 billion of unexplained variations in treatments
• Misuse of drugs and treatments results in avoidable adverse effects; eliminating them could save $52.2 billion
• Reducing overuse of non-urgent emergency department (ED) care could save (conservatively) $21.4 billion
• Underuse of generic anti-hypertensives: potential savings of $3 billion
• Underuse of controller medicines in pediatric asthma, particularly inhaled corticosteroids: projected savings of $2.5 billion
• Overuse of antibiotics for respiratory infections: potential savings of $1.1 billion
Source: http://media.washingtonpost.com/wp-srv/nation/pdf/healthreport_092909.pdf
Average Treatment
• For the past 60 years we’ve treated patients as some sort of average
• Diagnose a condition and recommend a treatment based on what worked for most people, as reflected in large clinical studies
• A treatment was deemed effective or ineffective
• Safe or unsafe
• Based on gold-standard, double-blind studies that rarely took into account the differences between patients
Remember Tamoxifen?
• Roughly 80% effective for breast cancer patients
• But now we know much more
• We know that it’s 100% effective in 85% to 90% of patients, and ineffective in the rest
• It would be nice to know for which patients it’s effective 100% of the time
Explosion of Data
• In recent years, there has been an explosion of data in healthcare:
• Clinical and health outcomes data contained in ever more prevalent electronic health records (EHRs)
• Longitudinal drug and medical claims
• Genomic data
• Proteomic data
• Metabolomic data (the systematic study of the unique chemical fingerprints that specific cellular processes leave behind)
• Social network data
• Mobile devices
• Exogenous data
And with this
• Our ability to process this data has improved drastically
• We can now ask important questions:
• The Wanamaker questions, about what treatments work and for whom
• How to improve the health of a population
• How to improve the experience of care
• And, perhaps most importantly, how to do all of this while reducing the cost of care
Data science may be the answer
• We know much of our medicine doesn’t work for half the patients
• We just don’t know which half, like Wanamaker
• The promise of data science is that if we can collect enough treatment data and use it effectively,
• we’ll be able to develop predictive models that tell us which treatment will be more effective for which patient (see the sketch below)
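
The slides contain no code, but a minimal sketch of such a predictive model might look like the following. It uses scikit-learn (assumed tooling, not named in the talk) on synthetic data; every feature name and number here is an illustrative assumption.

# A minimal sketch (not from the talk): predicting per-patient
# treatment response. Features, coefficients, and data are all
# synthetic, illustrative assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
# Hypothetical patient features: age, biomarker level, comorbidity count
X = np.column_stack([
    rng.normal(60, 10, n),    # age
    rng.normal(1.0, 0.3, n),  # biomarker level
    rng.poisson(2, n),        # comorbidity count
])
# Simulated outcome: response driven mostly by the biomarker
logits = 3.0 * (X[:, 1] - 1.0) - 0.02 * (X[:, 0] - 60)
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_tr, y_tr)

# Per-patient probability that the treatment works
p = model.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, p))

The point of the sketch is the last step: instead of one effectiveness number for everyone, the model returns a probability per patient.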
Healthcare Analytics
• Data availability and new ways of analyzing it are the two factors behind this new approach to medicine
• It is not enough to say that a drug is effective on most patients
• Using machine learning techniques, we can group patients and then determine the differences between these groups (see the sketch after this list)
• We can now ask for which patients a drug is effective, instead of just asking whether a drug is effective
• This is possible because we are now using data that was not available before
• So is more data the answer?
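
To make the grouping idea concrete, here is a hedged sketch: cluster patients with k-means, then compare response rates across the clusters. The data, cluster count, and response rates are synthetic assumptions, not anything from the talk.

# Sketch: group patients with k-means, then compare drug response
# rates between the groups. Everything here is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 600
# Two latent patient subtypes with shifted feature profiles
subtype = rng.integers(0, 2, n)
X = rng.normal(loc=subtype[:, None] * 2.0, scale=1.0, size=(n, 4))
# Simulated response: the drug works far better for subtype 1
responded = rng.random(n) < np.where(subtype == 1, 0.9, 0.3)

labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(
    StandardScaler().fit_transform(X))

# The clusters recover the subtypes, exposing whom the drug helps
for k in (0, 1):
    print(f"cluster {k}: response rate {responded[labels == k].mean():.2f}")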
Knowledge Discovery for Survival Analysis in NSCLC
Does incorporating more data help?
[Figure: Leave-one-out ROC plot for 2-year survival (S2y; 82 patients, P/N: 24/58), sensitivity vs. 1 - specificity. AUC: 0.65 (clinical), 0.76 (clinical + imaging), 0.85 (clinical + imaging + biomarker). Combining clinical data from disparate sources improves prediction accuracy.]
S. Yu, C. Dehing-Oberije, D. De Ruysscher, K. van Beek, Y. Lievens, J. Van Meerbeeck, W. De Neve, G. Fung, B. Rao, P. Lambin, “Development, External Validation and Further Improvement of a Prediction Model for Survival of Non-Small Cell Lung Cancer Patients Treated with (Chemo)Radiotherapy,” ASTRO 2008.
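
A sketch that mirrors the figure’s pattern (but not the study’s actual model or data): train the same classifier on growing feature sets and compare cross-validated AUCs. The three feature blocks are synthetic stand-ins for clinical, imaging, and biomarker data.

# Sketch: AUC gain from adding feature sources, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 300
clinic = rng.normal(size=(n, 3))   # stand-in clinical features
image = rng.normal(size=(n, 3))    # stand-in imaging features
marker = rng.normal(size=(n, 2))   # stand-in biomarker features
# Outcome draws independent signal from each source
logits = clinic[:, 0] + image[:, 0] + 1.5 * marker[:, 0]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

feature_sets = {
    "clinic": clinic,
    "clinic + image": np.hstack([clinic, image]),
    "clinic + image + marker": np.hstack([clinic, image, marker]),
}
for name, X in feature_sets.items():
    auc = cross_val_score(LogisticRegression(), X, y,
                          scoring="roc_auc", cv=5).mean()
    print(f"{name:24s} AUC {auc:.2f}")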
So is data the answer?
• Maybe
• Peter Norvig is credited with saying, “Our algorithms haven’t gotten that much better. We just have more data.”
• To understand what he means, we need to understand predictive modeling first.
Goal of supervised learning algorithms (predictive models)
• Find the best estimate of the mapping function (f) from the input data (X) to the output variable (Y):
• Y = f(X) + ϵ
• The mapping function is also known as the ‘target function’
• The prediction error of any machine learning algorithm can be decomposed into three types of error:
• Irreducible error
• Variance error
• Bias error
• We can’t do much about the irreducible error
• So the goal of any model is to reduce the bias and variance errors
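
For squared error, this three-way split is the standard bias-variance decomposition (a textbook identity added here for reference, not stated on the slide; σ²ϵ is the variance of the noise term ϵ):

\mathbb{E}\big[(Y - \hat{f}(X))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(X)] - f(X)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(X) - \mathbb{E}[\hat{f}(X)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma_\epsilon^2}_{\text{irreducible error}}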
Why do some models not perform well?
• Typically there are two reasons why a model is not performing well (can you guess what they are?)
1. The model is too complicated for the size of the data
• This is generally caused by high variance and leads to overfitting
• You can spot high variance when the training error is much lower than the validation error (see the sketch below)
• High variance can be addressed by reducing the number of features or by adding more observations
2. The model is too simple to explain the data
• This is due to high bias
• Adding more data doesn’t help with bias
• But adding more features does
Source: http://statweb.stanford.edu/%7Etibs/ElemStatLearn/, Figure 2.1
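
A hedged sketch of that diagnostic: compare training and validation error as model complexity grows. Polynomial degree is an arbitrary stand-in for complexity, and the data is synthetic.

# Sketch: spotting high variance (overfitting) and high bias
# (underfitting) by comparing training vs. validation error.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)  # noisy target
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=3)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    err_tr = mean_squared_error(y_tr, model.predict(X_tr))
    err_va = mean_squared_error(y_va, model.predict(X_va))
    # Training error far below validation error signals high variance;
    # both errors high (degree 1 here) signals high bias.
    print(f"degree {degree:2d}: train MSE {err_tr:.3f}  valid MSE {err_va:.3f}")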
Why some models don’t perform well
• To address high variance or high bias, we need to add more data or more features
• Features are still data
• So does this mean: more data = better signal (insight)?
• Is more data better? No.
• More data + a sound approach = better signal (insight)
Explosion of Healthcare Data Means Opportunity
• Identify high-risk patients
• Opioid dependency
• COPD patients
• Formulary optimization
• Identify variation in treatment
• Provider teaming
• Identify gaps in care
• Computer-aided diagnostics
Source: http://media.washingtonpost.com/wp-srv/nation/pdf/healthreport_092909.pdf