Answering Hard Healthcare Questions with Data
Fred Rahmanian, Chief Technology Officer, Geneia
“I know that half of my advertising doesn’t work. The problem is that I don’t know which half.”
John Wanamaker, department store magnate
Google’s Answer
• Cost per 1,000 impressions (CPM) vs. cost per click (CPC)
• Using data, Google understood each user’s behavior
• Google was able to place advertisements that an individual was likely to click
• They knew “which half” of their advertising was more likely to be effective
• And didn’t bother with the rest
Healthcare is expensive
• The U.S. spends over $2.6 trillion on health care every year
• These costs include over $600 billion of unexplained variations in treatments
• Misuse of drugs and treatments results in avoidable adverse effects; eliminating them could save $52.2 billion
• Reducing overuse of non-urgent emergency department (ED) care could save (conservatively) $21.4 billion
• Underuse of generic anti-hypertensives: potential savings of $3 billion
• Underuse of controller medicines in pediatric asthma, particularly inhaled corticosteroids: projected savings of $2.5 billion
• Overuse of antibiotics for respiratory infections: potential savings of $1.1 billion
Source: http://media.washingtonpost.com/wp-srv/nation/pdf/healthreport_092909.pdf
Average Treatment
• For the past 60 years we’ve treated patients as some sort of average
• Diagnose a condition and recommend a treatment based on what worked for most people, as reflected in large clinical studies
• A treatment was deemed effective or ineffective
• Safe or unsafe
• Based on gold-standard, double-blind studies that rarely took into account the differences between patients
Remember Tamoxifen?
• Roughly 80% effective for breast cancer patients
• But now we know much more
• We know that it’s 100% effective in 85% to 90% of patients, and ineffective in the rest
• It would be nice to know for which patients it’s effective 100% of the time
Explosion of Data
• In recent years, there has been an explosion of data in healthcare:
• Clinical and health outcomes data contained in ever more prevalent electronic health records (EHRs)
• Longitudinal drug and medical claims
• Genomic data
• Proteomic data
• Metabolomic data (the systematic study of the unique chemical fingerprints that specific cellular processes leave behind)
• Social network data
• Mobile devices
• Exogenous data
And with this
• Our ability to process this data has improved drastically
• We can now ask important questions:
• The Wanamaker questions, about what treatments work and for whom
• How to improve the health of a population
• How to improve the experience of care
• And, perhaps most importantly, how to do all of this while reducing the cost of care
Data science may be the answer
• We know much of our medicine doesn’t work for half the patients
• We just don’t know which half, like Wanamaker
• The promise of data science is that if we can collect enough treatment data and use it effectively,
• we’ll be able to develop predictive models that tell us which treatment will be more effective for which patient (see the sketch below)
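
The slides contain no code, but a minimal sketch of such a predictive model might look like the following. It uses scikit-learn (assumed tooling, not named in the talk) on synthetic data; every feature name and number here is an illustrative assumption.

# A minimal sketch (not from the talk): predicting per-patient
# treatment response. Features, coefficients, and data are all
# synthetic, illustrative assumptions.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
# Hypothetical patient features: age, biomarker level, comorbidity count
X = np.column_stack([
    rng.normal(60, 10, n),    # age
    rng.normal(1.0, 0.3, n),  # biomarker level
    rng.poisson(2, n),        # comorbidity count
])
# Simulated outcome: response driven mostly by the biomarker
logits = 3.0 * (X[:, 1] - 1.0) - 0.02 * (X[:, 0] - 60)
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_tr, y_tr)

# Per-patient probability that the treatment works
p = model.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, p))

The point of the sketch is the last step: instead of one effectiveness number for everyone, the model returns a probability per patient.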
Healthcare Analytics
• Data availability and new ways of analyzing it are the two factors behind this new approach to medicine
• It is not enough to say that a drug is effective on most patients
• Using machine learning techniques, we can group patients and then determine the differences between these groups (see the sketch after this list)
• We can now ask for which patients a drug is effective, instead of just asking whether a drug is effective
• This is possible because we are now using data that was not available before
• So is more data the answer?
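
To make the grouping idea concrete, here is a hedged sketch: cluster patients with k-means, then compare response rates across the clusters. The data, cluster count, and response rates are synthetic assumptions, not anything from the talk.

# Sketch: group patients with k-means, then compare drug response
# rates between the groups. Everything here is synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 600
# Two latent patient subtypes with shifted feature profiles
subtype = rng.integers(0, 2, n)
X = rng.normal(loc=subtype[:, None] * 2.0, scale=1.0, size=(n, 4))
# Simulated response: the drug works far better for subtype 1
responded = rng.random(n) < np.where(subtype == 1, 0.9, 0.3)

labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(
    StandardScaler().fit_transform(X))

# The clusters recover the subtypes, exposing whom the drug helps
for k in (0, 1):
    print(f"cluster {k}: response rate {responded[labels == k].mean():.2f}")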
Knowledge Discovery for Survival Analysis in NSCLC
Does incorporating more data help?
[Figure: Leave-one-out ROC plot for 2-year survival (S2y; 82 patients, P/N: 24/58), sensitivity vs. 1 - specificity. AUC: 0.65 (clinical), 0.76 (clinical + imaging), 0.85 (clinical + imaging + biomarker). Combining clinical data from disparate sources improves prediction accuracy.]
S. Yu, C. Dehing-Oberije, D. De Ruysscher, K. van Beek, Y. Lievens, J. Van Meerbeeck, W. De Neve, G. Fung, B. Rao, P. Lambin, “Development, External Validation and Further Improvement of a Prediction Model for Survival of Non-Small Cell Lung Cancer Patients Treated with (Chemo)Radiotherapy,” ASTRO 2008.
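
A sketch that mirrors the figure’s pattern (but not the study’s actual model or data): train the same classifier on growing feature sets and compare cross-validated AUCs. The three feature blocks are synthetic stand-ins for clinical, imaging, and biomarker data.

# Sketch: AUC gain from adding feature sources, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 300
clinic = rng.normal(size=(n, 3))   # stand-in clinical features
image = rng.normal(size=(n, 3))    # stand-in imaging features
marker = rng.normal(size=(n, 2))   # stand-in biomarker features
# Outcome draws independent signal from each source
logits = clinic[:, 0] + image[:, 0] + 1.5 * marker[:, 0]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

feature_sets = {
    "clinic": clinic,
    "clinic + image": np.hstack([clinic, image]),
    "clinic + image + marker": np.hstack([clinic, image, marker]),
}
for name, X in feature_sets.items():
    auc = cross_val_score(LogisticRegression(), X, y,
                          scoring="roc_auc", cv=5).mean()
    print(f"{name:24s} AUC {auc:.2f}")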
So is data the answer?
• Maybe
• Peter Norvig is credited with saying, “Our algorithms haven’t gotten that much better. We just have more data.”
• To understand what he means, we need to understand predictive modeling first.
Goal of supervised learning algorithms (predictive models)
• Find the best estimate of the mapping function (f) from the input data (X) to the output variable (Y):
• Y = f(X) + ϵ
• The mapping function is also known as the ‘target function’
• The prediction error of any machine learning algorithm can be decomposed into three types of error:
• Irreducible error
• Variance error
• Bias error
• We can’t do much about the irreducible error
• So the goal of any model is to reduce the bias and variance errors
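
For squared error, this three-way split is the standard bias-variance decomposition (a textbook identity added here for reference, not stated on the slide; σ²ϵ is the variance of the noise term ϵ):

\mathbb{E}\big[(Y - \hat{f}(X))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(X)] - f(X)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(X) - \mathbb{E}[\hat{f}(X)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma_\epsilon^2}_{\text{irreducible error}}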
Why do some models not perform well?
• Typically there are two reasons why a model is not performing well (can you guess what they are?)
1. The model is too complicated for the size of the data
• This is generally caused by high variance and leads to overfitting
• You can spot high variance when the training error is much lower than the validation error (see the sketch below)
• High variance can be addressed by reducing the number of features or by adding more observations
2. The model is too simple to explain the data
• This is due to high bias
• Adding more data doesn’t help with bias
• But adding more features does
Source: http://statweb.stanford.edu/%7Etibs/ElemStatLearn/, Figure 2.1
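
A hedged sketch of that diagnostic: compare training and validation error as model complexity grows. Polynomial degree is an arbitrary stand-in for complexity, and the data is synthetic.

# Sketch: spotting high variance (overfitting) and high bias
# (underfitting) by comparing training vs. validation error.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 200)  # noisy target
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=3)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    err_tr = mean_squared_error(y_tr, model.predict(X_tr))
    err_va = mean_squared_error(y_va, model.predict(X_va))
    # Training error far below validation error signals high variance;
    # both errors high (degree 1 here) signals high bias.
    print(f"degree {degree:2d}: train MSE {err_tr:.3f}  valid MSE {err_va:.3f}")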
Why some models don’t perform well
• To address high variance or high bias, we need to add more data or more features
• Features are still data
• So does this mean: more data = better signal (insight)?
• Is more data better? No.
• More data + a sound approach = better signal (insight)
Explosion of Healthcare Data Means Opportunity
• Identify high-risk patients
• Opioid dependency
• COPD patients
• Formulary optimization
• Identify variation in treatment
• Provider teaming
• Identify gaps in care
• Computer-aided diagnostics
Source: http://media.washingtonpost.com/wp-srv/nation/pdf/healthreport_092909.pdf