Machine Learning Misconceptions
May 3rd, 2017
© 2016 Health Catalyst Proprietary and Confidential

Data Science Team
• Levi Thatcher, PhD, Director of Data Science
• Mike Mastanduno, PhD, Data Scientist
• Taylor Miller, PharmD, Data Scientist
• Taylor Larsen, MS, Data Science Engineer

Purpose of Today’s Chat
• Compare and contrast machine learning and artificial intelligence.
• Discuss techniques that offer feedback into the system and when it’s necessary to retrain a model.
• Give advice on how to avoid common pitfalls in machine learning implementation.
• Talk about potential applications of the different classes of machine learning techniques.
• Q&A

Machine Learning Definition
Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed. Such algorithms overcome following strictly static program instructions by making data-driven predictions or decisions through building a model from sample inputs.
- Wikipedia

Machine Learning Typical Use
• Movie recommendations on Netflix
• People you may know on Facebook
• Advertising
• Patient likelihood of contracting sepsis, being readmitted…
• Using any tabular data source to predict a Y/N or continuous outcome

Artificial Intelligence Definition
Artificial intelligence (AI) is intelligence exhibited by machines. In computer science, the field of AI research defines itself as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of success at some goal.
- Wikipedia
These models are limited in their ability to “reason”, i.e. to carry out long chains of inferences, or optimization procedure to arrive at an answer.
The number of steps in a computation is limited by the number of layers in feed-forward nets, and by the length of time a recurrent net will remember things.
- Yann LeCun, Director of Facebook AI Research

Artificial Intelligence Typical Use
• Speech translation
• Complex game playing
• Self-driving cars
• Content delivery
• Radiology?

Difference Between ML and AI
• It’s fuzzy
• Learning from data? No, not really.
• Continuous learning from data? No, not really.
• AI feels more complicated.
• AI should be able to learn a skill and generalize it to another entirely different thing.
• Many AI ideas get rebranded as ML as time goes on and we understand them.

Poll #1: Have you ever used machine learning or AI?
148 respondents
• Yes, in my daily work – 21%
• Yes, as a hobby – 17%
• No, but I plan to – 52%
• No, not applicable – 9%

How is machine learning used?

Poll #2: Where is your organization in terms of using machine learning in regular operations?
138 respondents
• Using machine learning tools daily across many departments and use cases – 13%
• Daily across a couple of use cases – 17%
• Confined to a research study or two – 49%
• What is machine learning? – 21%

When does a model learn?
• Different algorithms learn at different times
• Only during training
  • Logistic regression
  • Random forest
  • Clustering
• Periodically after new data comes in
  • Any of the above (but more complex implementation)
  • Naïve Bayes
  • Neural networks
  • Deep learning
• Continuously as new data comes in
  • Any of the above (but still more complex implementation)

When should a model be retrained?
• After significant data turnover
• If performance in production drops over time
  • Seasonality
  • Changing treatment methods
• If new features or techniques are identified
• If the use case changes

Pitfall 1: Poorly Defined Use Case
• Leads to:
  • Incorrect usage of data fields
  • Unavailable data
  • No adoption
• Use case is always the first priority
  • What is the question?
  • Who are the users?
  • When are they using it?
  • How are they using it?

Pitfall 2: Production Environment is Different
• Data might not be available
• Timing of data might lead to target leakage
• Predictions are made multiple times per patient

• Learn how your data is populated over time
• Only train with what’s available at the time of prediction
• Know your use case!

Pitfall 3: Bad Performance Metrics
• 99% accurate, but didn’t find any sick people
• Imbalanced classes
• Performance changing over time

• AUC or Precision-Recall
• Sampling methods during model training
• Monitor correct performance metric over time
• Know your use case!

Pitfall 4: Poor Adoption
• Do people know about it?
• Is it answering a relevant question?
• Is visualization done well?
• Do people trust the model?

• Tell people about it
• Know the use case
• Simple is better, shouldn’t affect workflow
• Improve trust with prediction explanations or transparent models

Poll #3: What’s impeding you from moving forward with machine learning in your organization?
116 respondents
• Available tools are overwhelming OR don’t know what exists – 16%
• Use cases are overwhelming OR don’t know what’s possible – 28%
• Don’t have or can’t afford the technical staff to implement – 23%
• Adoption: clinical team isn’t interested – 9%
• Other – 25%

Potential Applications: ML and EMR
• Clinical
  • Risk scores – readmissions, mortality
  • Risk adjusted comparisons
  • Replacing clinical rulesets
  • Correct coding
• Operational
  • Staff need forecasting
  • Length of stay prediction
• Financial
  • Propensity to pay
  • Predicted procedure cost

Potential Applications: NLP or Smarter Analytics
• Parsing clinical notes
  • Fill in discrete text fields automatically
  • Find new features that only come up in conversation
• Smart retrospective analysis
  • Trend analysis
  • Exploration across the whole EMR
  • Serve up insights automatically

Potential Applications: Image Processing
• Diagnostics of pre-segmented suspicious regions
• Automatic segmentation of tissue types
• Diagnosis or staging of screening images
• Diagnosis or staging of pathology slides

Poll #4: What’s the most valuable use for ML/AI/Big Data to your organization?
95 respondents
• Parsing free-form clinical notes – 14%
• Image interpretation – 5%
• Clinical risk scores – 47%
• Operational efficiency – 29%
• These are buzz words and not worth the time. – 4%

Poll #5: If there was an algorithm that was FDA approved and read mammographic images on par with a radiologist, would you use it?
90 respondents
• Yes, I’d trust it completely – 16%
• Yes, but only as an aid to the radiologist – 81%
• No, I wouldn’t trust it – 3%

Before we end…

Questions?
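
As a concrete illustration of Pitfall 3 ("99% accurate, but didn't find any sick people"), here is a minimal Python sketch. It is not the presenters' code. It assumes a hypothetical cohort of 1,000 patients, 10 of whom are sick, and a hypothetical "model" that always predicts "not sick". The function names are illustrative only.

```python
# Hypothetical cohort: 1,000 patients, 10 of whom are truly sick (label 1).
y_true = [1] * 10 + [0] * 990

# Hypothetical "model" that always predicts "not sick" (label 0).
y_pred = [0] * 1000


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means sick."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Share of truly sick patients the model found (0.0 if there are none)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
rec = recall(tp, fn)
print(f"accuracy = {acc:.2%}, recall = {rec:.2%}")
# prints: accuracy = 99.00%, recall = 0.00%
```

Accuracy rewards the majority class, so with imbalanced classes a model can score 99% while missing every sick patient. Recall, or the precision-recall and AUC metrics named on the slide, is the number to monitor in that case.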