The Impact of Big Data on Health Science Research
Vipin Kumar
University of Minnesota
www.cs.umn.edu/~kumar
Delivery Science Summit, Mayo Clinic, 2015

Big Data Era

Big Data in Health Science
• EHR Data
• Genomics Data: SNP, Gene Expression, Mass Spectrometry, Protein Network
• Brain Imaging Data: fMRI Data, Diffusion Tensor Imaging, MEG Data, EEG Data, PET Data
• Mobile Health Data: Activity, Heart rate, Glucose monitoring, Sleep monitoring, Blood pressure

Electronic Health Records
• Big Data holds the potential for improving clinical quality and reducing healthcare costs
  - Understanding the Natural History of Disease
  - Risk Prediction / Biomarker
  - Quantifying the Effect of Intervention
  - Constructing Evidence Based Guidelines
  - Adverse Event Detection
• Many Challenges
  - Data is High-Dimensional, Sparse, Fragmented, often Missing/Censored
  - Questions of interest are complex
  - Need to integrate Expert Knowledge

Case study: Why does colonoscopy sometimes fail to prevent colon cancer?
  - Joint work with Piet de Groen (Mayo Clinic)
  - Focuses on Endoscopist and Withdrawal Time
  - Makes use of Mayo Clinic Rochester Dataset
Gupta et al. 2009
  - 2008 ACG/Olympus and ACG Presidential Award
  - Plenary talk at Digestive Disease Week (DDW), 2009

Miss Rate for each Endoscopist
• 9.9% cancers are missed
• Varies among endoscopists
• Not related to experience or withdrawal time
[Figure: miss rate for each endoscopist; legend: Truly missed, Probably missed, Seen & Removed]

Case study: Predicting Improvement of Mobility from Home Health Care Data
• Outcome and Assessment Information Set Data (OASIS)
  - Sample of 270,634 patient records (10/1/2008 to 12/31/2009) from 581 Medicare-certified home healthcare agencies
  - Data sources: Diagnosis Codes (ICD-9), Admission Survey (OASIS), Home Healthcare, Discharge Survey (OASIS)
  - Demographic, behavioral, pathological, psychosocial factors, outcome variables
• Objectives
  - Identify patterns of factors associated with improvement and no improvement in mobility within each group

Patterns Associated with Improvement and No-Improvement in Mobility Outcome [Dey et al. Nur. res. 2014]
  - The size of the circles represents the magnitude of individual ORs of each variable present in the patterns.
  - The larger the circle/node in the pattern, the more likely the variable is associated with the outcome

Genomics Data
SNP, Gene Expression, Mass Spectrometry, Protein Network
• Driven by advances in high-throughput technologies
• Holds great promise for revolutionizing practice of medicine
  - Determining predisposition to a disease
  - Personalized medicine

Published Genome-Wide Associations through 12/2013
Published GWA at p ≤ 5 × 10^-8 for 17 trait categories
NHGRI GWA Catalog
www.genome.gov/GWAStudies
www.ebi.ac.uk/fgpt/gwas/

Missing heritability
McCarthy et al. 2008; Manolio et al. 2009
• Most have modest effect sizes
• Limited overall impact even when combined
Marked disparity with
• Extent of overall familial aggregation
Eichler et al. 2010; Manolio et al. 2009; Brendan Maher, 2008
Missing heritability
• Rare variations
• Combinatorial biomarkers
• Novel types of (epi)genetic variations

Association Analysis for Discovering Interesting Combinations
[Figure: itemset lattice over items A to E; labels: Anti-monotonic upper bound, Disqualified, Prune all the supersets]
  null
  A B C D E
  AB AC AD AE BC BD BE CD CE DE
  ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
  ABCD ABCE ABDE ACDE BCDE
  ABCDE
[Agrawal et al. 1994]

Case study: Combinatorial markers for Lung Cancer
[Figure annotations: ≈ 60%, ≈ 10%]
[Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
Selected for highlight talk, RECOMB SB 2010
Best Network Model award, Sage Congress, 2010

Big Data in Human Neuroscience
Diffusion Tensor Imaging, fMRI Data, EEG Data, MEG Data, PET Data
• Increasing amount of neuroimaging data is available
  • E.g., Single fMRI 20 min. scan is 6GB in size
  • 8,000 MRI datasets available in public domain (Poldrack et al. 2014)
  • Human Connectome Project (HCP)
  • Alzheimer's Disease Neuroimaging Initiative (ADNI)
• Healthcare questions
  • How can we automate diagnosis using imaging data?
  • How can we predict the disease in advance?
  • How can we study effectiveness of treatment methodologies?

Brain Connectivity: Healthy vs. Disease
[Figure: fMRI scan; atlas regions (90 brain regions); regions × time matrix; regions × regions correlation matrix; brain network]
Lynall et al. 2010
What are the key differences in networks between two groups?

Healthy vs. Disease - univariate testing
[Figure: regions × regions edges compared between the Healthy and schizophrenia groups]
● Several studies have reported connections associated with schizophrenia in the last decade
● Are these reported connections consistent across studies?
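
As a minimal sketch of the pipeline these two slides describe (region time series, then a regions × regions correlation matrix, then one test per edge), the following Python runs on synthetic data only. The helper names (`pearson`, `edge_vector`, `welch_t`, `simulate_subject`), the six-region toy size, the `coupling` parameter, and the choice of Welch's t statistic are all assumptions made for illustration; the slides say only "univariate testing" and do not name a specific test.

```python
import math
import random
from itertools import combinations
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def edge_vector(region_series):
    """Upper triangle of the correlation matrix as {(i, j): r}, one entry per edge."""
    n = len(region_series)
    return {(i, j): pearson(region_series[i], region_series[j])
            for i, j in combinations(range(n), 2)}


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed test choice)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def simulate_subject(rng, n_regions, n_timepoints, coupling):
    """Hypothetical subject: regions 0 and 1 share a signal scaled by `coupling`."""
    shared = [rng.gauss(0, 1) for _ in range(n_timepoints)]
    series = [[rng.gauss(0, 1) for _ in range(n_timepoints)] for _ in range(n_regions)]
    for r in (0, 1):
        series[r] = [s + coupling * c for s, c in zip(series[r], shared)]
    return series


rng = random.Random(0)
n_regions, n_timepoints, n_subjects = 6, 120, 15

# In this toy setup the "healthy" group has a strong 0-1 connection and the "patient" group has none.
healthy = [edge_vector(simulate_subject(rng, n_regions, n_timepoints, 2.0))
           for _ in range(n_subjects)]
patients = [edge_vector(simulate_subject(rng, n_regions, n_timepoints, 0.0))
            for _ in range(n_subjects)]

# Univariate testing: one independent group comparison per edge.
edges = list(combinations(range(n_regions), 2))
t_stats = {e: welch_t([s[e] for s in healthy], [s[e] for s in patients]) for e in edges}
top_edge = max(t_stats, key=lambda e: abs(t_stats[e]))
print(len(edges), "edges tested; largest |t| at edge", top_edge)
```

With the 90-region atlas mentioned on the slide, the same loop would run 90 × 89 / 2 = 4,005 separate tests, one per edge.
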
Inconsistent findings reported in Independent studies

A Big Data Approach (Case study)
[Figure: subjects × edges matrix; Cluster edges; subjects × clusters matrix (cluster1, cluster2, cluster3, ...); Test significance]
Strengths:
• Handles redundancy, small number of tests, statistical power
• Handles noise by averaging connectivity across multiple edges
• Could potentially increase reliability

A Big Data Approach (Case study)
[Figure: subjects × edges matrix; Cluster edges; subjects × clusters matrix (cluster1, cluster2, cluster3, ...); Test significance]
Clusters of connections
• Thalamus and primarily striate visual regions
• Thalamus and lateral visual regions
• Thalamus and lateral temporal regions

Studying Dynamics in Brain Networks
• Resting state connectivity (Atluri et al. SDM 2014)
• Resting state vs. Watching cartoons (Atluri et al. 2014)

Studying Dynamics
• Dynamic Brain Connectivity (Atluri et al. SDM 2014)
• Resting state vs. Watching cartoons (Atluri et al. 2014)
Opportunities for cross-fertilization among diverse domains with spacetime data such as
• Climate, Bio-images, Taxi data, Astronomy, Precision agriculture
Expeditions in Computing: Understanding Climate Change - A Data Driven Approach
• 5-year, $10 Million project
• Leverages the wealth of climate and ecosystem data

mHealth
• Rapid growth in wearable devices market
• Collects a variety of variables: Respiration, Heartrate, Activity, Sleep, Glucose, Blood-pressure, Audio-visual samples
Healthcare questions:
  - Can we predict psychological state (e.g., stress, anger)?
  - Can we assess lifestyle choices (e.g., smoking and diet)?
  - Can we study the relapse patterns in treating addiction?
  - Can we deliver telemedicine at the right time?

Advancing Biomedical Discovery and Improving Health through Mobile Sensor Big Data
• MD2K is one of 11 national NIH Big Data Centers of Excellence
• Part of Big-Data-to-Knowledge (BD2K) Initiative
• Collaborative effort between diverse areas
  • Computer Science, Engineering, Medicine, Behavioral Science, and Statistics
Autosense Project
• mConverse: Inferring conversations (Rahman et al. WirelessHealth'11)
• mPuff: Detecting smoking episodes (Ali et al. IPSN'12)
• mStress: Identifying stress (Raij et al. WirelessHealth'11)

Conclusion
• Huge opportunities for application of Big Data to Healthcare Research
• Many Challenges
  - Data not in the form to be used by readily available big data technologies
  - Translating healthcare questions to data science questions
  - Many health specific data science questions require advances in big data analytics

References
• Atluri, G., Steinbach, M., Lim, K. O., Kumar, V., & MacDonald, A. (2015). Connectivity cluster analysis for discovering discriminative subnetworks in schizophrenia. Human brain mapping, 36(2), 756-767.
• Atluri, G., Steinbach, M., Lim, K. O., MacDonald III, A., & Kumar, V. Discovering Groups of Time Series with Similar Behavior in Multiple Small Intervals of Time. SDM 2014.
• Atluri, G., Steinbach, M., Lim, K., MacDonald, A., & Kumar, V. (2014). Discovering the Longest Set of Distinct Maximal Correlated Intervals in Time Series Data. University of Minnesota, Tech-report 14-025, 2014.
• Rahman, M. M., Ali, A. A., Plarre, K., al'Absi, M., Ertin, E., & Kumar, S. (2011, October). mConverse: inferring conversation episodes from respiratory measurements collected in the field. In Proceedings of the 2nd Conference on Wireless Health (p. 10). ACM.
• Ali, A. A., Hossain, S. M., Hovsepian, K., Rahman, M. M., Plarre, K., & Kumar, S. (2012, April). mPuff: automated detection of cigarette smoking puffs from respiration measurements. In Proceedings of the 11th international conference on Information Processing in Sensor Networks (pp. 269-280). ACM.
• Raij, A., Blitz, P., Ali, A. A., Fisk, S., French, B., Mitra, S., ... & Smailagic, A. (2010). mStress: Supporting continuous collection of objective and subjective measures of psychosocial stress on mobile devices. ACM Wireless Health 2010, San Diego, California, USA.
• Dey, Sanjoy, et al. "Mining Patterns Associated With Mobility Outcomes in Home Healthcare." Nursing research 64.4 (2015): 235-245.
• Gupta, Rohit, et al. "284 Colorectal Cancer Despite Colonoscopy: Critical Is the Endoscopist, Not the Withdrawal Time." Gastroenterology 136.5 (2009): A-55.
• Simon, Gyorgy J., et al. "Survival association rule mining towards type 2 diabetes risk assessment." AMIA Annual Symposium Proceedings. Vol. 2013. American Medical Informatics Association, 2013.
• Li, Dingcheng, et al. "Using Association Rule Mining for Phenotype Extraction from Electronic Health Records." AMIA Summits on Translational Science Proceedings 2013 (2013): 142.
• Schrom, John R., et al. "Quantifying the effect of statin use in pre-diabetic phenotypes discovered through association rule mining." AMIA Annual Symposium Proceedings. Vol. 2013. American Medical Informatics Association, 2013.
• Pandey, Gaurav, et al. "An association analysis approach to biclustering." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
• Kim, Hye Soon, et al. "Comorbidity study on type 2 diabetes mellitus using data mining." The Korean journal of internal medicine 27.2 (2012): 197-202.
• Shin, A. Mi, et al. "Diagnostic analysis of patients with essential hypertension using association rule mining." Healthcare informatics research 16.2 (2010): 77-81.
• Fang, Gang, et al. "Subspace differential coexpression analysis: problem definition and a general approach." Pacific symposium on biocomputing. Vol. 15. 2010.