The Impact of Big Data on Health Science Research
Vipin Kumar
University of Minnesota
www.cs.umn.edu/~kumar
Delivery Science Summit, Mayo Clinic, 2015
Big Data Era
Big Data in Health Science
• EHR Data
• Genomics Data: SNP, Gene Expression, Mass Spectrometry, Protein Network
• Brain Imaging Data: fMRI Data, Diffusion Tensor Imaging, MEG Data, EEG Data, PET Data
• Mobile Health Data: Activity, Heart rate, Glucose monitoring, Sleep monitoring, Blood pressure
Electronic Health Records
- Big Data holds the potential for improving clinical quality and reducing healthcare costs
  - Understanding the Natural History of Disease
  - Risk Prediction / Biomarker
  - Quantifying the Effect of Intervention
  - Constructing Evidence Based Guidelines
  - Adverse Event Detection
- Many Challenges
  - Data is High-Dimensional, Sparse, Fragmented, and often Missing/Censored
  - Questions of interest are complex
  - Need to integrate Expert Knowledge
Case study
Why does colonoscopy sometimes fail to prevent colon cancer?
- Joint work with Piet de Groen (Mayo Clinic)
- Focuses on Endoscopist and Withdrawal Time
- Makes use of Mayo Clinic Rochester Dataset
Gupta et al. 2009
- 2008 ACG/Olympus and ACG Presidential Award
- Plenary talk at Digestive Disease Week (DDW), 2009
Miss Rate for each Endoscopist
• 9.9% of cancers are missed
• Varies among endoscopists
• Not related to experience or withdrawal time
[Figure: bar chart per endoscopist (horizontal axis ticks 2 to 46, vertical axis ticks 0 to 25) with categories Truly missed, Probably missed, Seen & Removed]
Case study
Predicting Improvement of Mobility from Home Health Care Data
• Outcome and Assessment Information Set Data (OASIS)
- Sample of 270,634 patient records (10/1/2008 – 12/31/2009) from 581 Medicare-certified home healthcare agencies
[Figure: Diagnosis Codes (ICD-9), Admission Survey (OASIS), Home Healthcare, Discharge Survey (OASIS)]
Demographic, behavioral, pathological, psychosocial factors, outcome variables.
• Objectives
- Identify patterns of factors associated with improvement and no improvement in mobility within each group
Patterns Associated with Improvement and No-Improvement in Mobility Outcome
[Dey et al., Nursing Research 2015]
- The size of each circle represents the magnitude of the individual odds ratio (OR) of each variable present in the patterns.
- The larger the circle/node in the pattern, the more strongly the variable is associated with the outcome.
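As a reminder of what the circle sizes encode, the odds ratio for a single factor can be computed from a 2×2 table of counts. The sketch below is illustrative only: the function name `odds_ratio` and the counts are hypothetical and are not taken from the OASIS data or from the cited study.

```python
def odds_ratio(with_improved, with_not, without_improved, without_not):
    """Odds ratio of improvement for patients with a factor vs. without it."""
    if with_not == 0 or without_improved == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (with_improved * without_not) / (with_not * without_improved)


# Hypothetical counts, for illustration only (not OASIS data).
example_or = odds_ratio(with_improved=30, with_not=10,
                        without_improved=20, without_not=40)
print(example_or)  # 6.0
```

An OR above 1 means the factor is associated with improvement, below 1 with no improvement; a larger departure from 1 would correspond to a larger circle.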
Genomics Data
SNP, Gene Expression, Mass Spectrometry, Protein Network
• Driven by advances in high-throughput technologies
• Holds great promise for revolutionizing the practice of medicine
  – Determining predisposition to a disease
  – Personalized medicine
Published Genome-Wide Associations through 12/2013
Published GWA at p ≤ 5 × 10^-8 for 17 trait categories
NHGRI GWA Catalog
www.genome.gov/GWAStudies
www.ebi.ac.uk/fgpt/gwas/
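The p ≤ 5 × 10^-8 cutoff is conventionally motivated as a Bonferroni correction of a 0.05 family-wise error rate over roughly one million independent tests. The slide does not state this rationale, so the figure of one million tests below is an assumption, and `bonferroni_threshold` is a hypothetical helper name.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: about one million independent tests across the genome.
genome_wide = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print(f"{genome_wide:.0e}")  # 5e-08
```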
Missing heritability
• Most published associations have modest effect sizes
• Limited overall impact even when combined
Marked disparity with
• Extent of overall familial aggregation
(McCarthy et al. 2008; Manolio et al. 2009)
Possible sources of missing heritability
• Rare variations
• Combinatorial biomarkers
• Novel types of (epi)genetic variations
(Eichler et al. 2010; Manolio et al. 2009; Brendan Maher, 2008)
Association Analysis for Discovering Interesting Combinations
[Figure: itemset lattice over items A to E, from the null set up to ABCDE, with one itemset marked as disqualified by an anti-monotonic upper bound]
Prune all the supersets
[Agrawal et al. 1994]
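The pruning idea on this slide can be sketched with plain support counting, which is anti-monotonic: a superset can never occur in more transactions than any of its subsets. The code below is a minimal level-wise search in the spirit of the cited Apriori approach, not the speaker's implementation; the function name, the toy transactions, and the threshold of 3 are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into candidates of size k + 1.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: a candidate survives only if all of its (k - 1)-subsets are
        # frequent, so supersets of a disqualified itemset are never counted.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


# Toy transactions, for illustration only.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"},
                {"A", "B", "C", "D"}]
frequent = apriori(transactions, min_support=3)
print(len(frequent))  # 6
```

In this toy run D is disqualified at the first level, so AD, BD, CD and every larger itemset containing D are pruned without being counted; ABC is counted once and rejected.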
Case study
Combinatorial markers for Lung Cancer
[Figure: annotated values ≈ 60% and ≈ 10%]
[Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
Selected for highlight talk, RECOMB SB 2010
Best Network Model award, Sage Congress, 2010
Big Data in Human Neuroscience
Diffusion Tensor Imaging, fMRI Data, EEG Data, MEG Data, PET Data
• Increasing amount of neuroimaging data is available
  • E.g., a single 20 min. fMRI scan is 6 GB in size
  • 8,000 MRI datasets available in public domain (Poldrack et al. 2014)
    • Human Connectome Project (HCP)
    • Alzheimer’s Disease Neuroimaging Initiative (ADNI)
• Healthcare questions
• How can we automate diagnosis using imaging data?
• How can we predict the disease in advance?
• How can we study effectiveness of treatment methodologies?
Brain Connectivity: Healthy vs. Disease
[Figure: an fMRI scan is divided by an atlas into 90 brain regions, giving a regions × time matrix from which a regions × regions brain network is derived (Lynall et al. 2010)]
What are the key differences in networks between two groups?
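One common way to turn a regions × time matrix into a regions × regions network is to correlate every pair of regional time series. The sketch below assumes Pearson correlation; the slide does not say which connectivity measure was used, and the function names and toy series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series with nonzero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def connectivity_matrix(series):
    """regions x regions correlation matrix from a list of regional time series."""
    n = len(series)
    return [[1.0 if i == j else pearson(series[i], series[j]) for j in range(n)]
            for i in range(n)]


# Three toy "regions" with four time points each, for illustration only.
toy = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
matrix = connectivity_matrix(toy)

# With 90 regions, the network has 90 * 89 / 2 distinct edges.
n_edges = 90 * 89 // 2
print(n_edges)  # 4005
```

Each off-diagonal entry is one edge, so comparing two groups edge by edge means several thousand separate tests.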
Healthy vs. Disease – univariate testing
[Figure: regions × regions correlation matrices for the healthy and schizophrenia groups, compared edge by edge]
● Several studies have reported connections associated with
schizophrenia in the last decade
● Are these reported connections consistent across studies?
Inconsistent findings reported in independent studies
Case study
A Big Data Approach
[Figure: subjects × edges matrix; edges are clustered (cluster1, cluster2, cluster3, ...) into a subjects × clusters matrix, which is then tested for significance]
Strengths:
• Handles redundancy, small number of tests, statistical power
• Handles noise by averaging connectivity across multiple edges
• Could potentially increase reliability
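The strengths above can be illustrated with a small sketch: average each subject's connectivity over the edges in a cluster, then run one group comparison per cluster instead of one per edge. This is not the published method. The clusters are taken as given, Welch's t statistic is swapped in as a generic two-group test, and all names and numbers are hypothetical.

```python
import math
from statistics import mean, variance


def cluster_features(subject_edges, clusters):
    """Average each subject's edge values within each cluster of edge indices."""
    return [[mean([edges[i] for i in cluster]) for cluster in clusters]
            for edges in subject_edges]


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def cluster_t_statistics(healthy, patients, clusters):
    """One t statistic per cluster, computed on cluster-averaged connectivity."""
    h = cluster_features(healthy, clusters)
    p = cluster_features(patients, clusters)
    return [welch_t([row[k] for row in h], [row[k] for row in p])
            for k in range(len(clusters))]


# Toy data: 3 subjects per group, 4 edges, 2 assumed clusters of edges.
healthy = [[0.5, 0.7, 0.1, 0.0], [0.6, 0.6, 0.0, 0.1], [0.7, 0.7, 0.1, 0.1]]
patients = [[0.1, 0.3, 0.1, 0.0], [0.2, 0.2, 0.0, 0.1], [0.3, 0.3, 0.1, 0.1]]
clusters = [[0, 1], [2, 3]]
t_stats = cluster_t_statistics(healthy, patients, clusters)
```

Here four edge-level tests collapse into two cluster-level tests, and only the first cluster separates the groups. Converting the statistics to p-values would need a t distribution, which the Python standard library does not provide.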
Case study
A Big Data Approach
[Figure: subjects × edges matrix; edges are clustered (cluster1, cluster2, cluster3, ...) into a subjects × clusters matrix, which is then tested for significance]
Clusters of connections
• Thalamus and primarily striate visual regions
• Thalamus and lateral visual regions
• Thalamus and lateral temporal regions
Delivery Science Summit, Mayo Clinic, 2015
Studying Dynamics in Brain Networks
Resting state connectivity (Atluri et al. SDM 2014)
Resting state vs. Watching cartoons (Atluri et al. 2014)
Dynamic Brain Connectivity
Atluri et al. SDM 2014
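A generic way to look at connectivity that changes over time is to correlate two regional time series inside a sliding window. This is a common baseline and is not claimed to be the method of the cited papers; the function names, window length, and toy series below are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a flat window as uncorrelated
    return cov / math.sqrt(vx * vy)


def sliding_window_correlation(x, y, window, step=1):
    """Correlation of x and y within each window of the given length."""
    return [pearson(x[s:s + window], y[s:s + window])
            for s in range(0, len(x) - window + 1, step)]


# Toy series: the two regions move together at first, then in opposition.
x = [1, 2, 3, 4, 4, 3, 2, 1]
y = [1, 2, 3, 4, 1, 2, 3, 4]
dynamic = sliding_window_correlation(x, y, window=4, step=4)
```

Over the whole recording the two toy series are uncorrelated, while the windowed view shows a strongly positive interval followed by a strongly negative one, which is the kind of structure a single static correlation hides.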
Opportunities for cross-fertilization among diverse domains with spacetime data such as
• Climate, Bio-images, Taxi data, Astronomy, Precision agriculture
Expeditions in Computing: Understanding Climate Change - A Data Driven Approach
• 5-year, $10 Million project
• Leverages the wealth of climate and ecosystem data
mHealth
• Rapid growth in wearable devices market
• Collects a variety of variables: Respiration, Heart rate, Activity, Sleep, Glucose, Blood pressure, Audio-visual samples
Healthcare questions:
– Can we predict psychological state (e.g., stress, anger)?
– Can we assess lifestyle choices (e.g., smoking and diet)?
– Can we study the relapse patterns in treating addiction?
– Can we deliver telemedicine at the right time?
Advancing Biomedical Discovery and Improving Health through Mobile Sensor Big Data
• MD2K is one of 11 national NIH Big Data Centers of Excellence
  • Part of Big-Data-to-Knowledge (BD2K) Initiative
• Collaborative effort between diverse areas
  • Computer Science, Engineering, Medicine, Behavioral Science, and Statistics
Autosense Project
• mConverse: Inferring conversations (Rahman et al. WirelessHealth’11)
• mPuff: Detecting smoking episodes (Ali et al. IPSN’12)
• mStress: Identifying stress (Raij et al. WirelessHealth’10)
Conclusion
• Huge opportunities for application of Big Data to Healthcare Research
• Many Challenges
  – Data not in the form to be used by readily available big data technologies
  – Translating healthcare questions to data science questions
  – Many health specific data science questions require advances in big data analytics
References
• Atluri, G., Steinbach, M., Lim, K. O., Kumar, V., & MacDonald, A. (2015). Connectivity cluster analysis for discovering discriminative subnetworks in schizophrenia. Human brain mapping, 36(2), 756-767.
• Atluri, G., Steinbach, M., Lim, K. O., MacDonald III, A., & Kumar, V. Discovering Groups of Time Series with Similar Behavior in Multiple Small Intervals of Time. SDM 2014.
• Atluri, G., Steinbach, M., Lim, K., MacDonald, A., & Kumar, V. (2014). Discovering the Longest Set of Distinct Maximal Correlated Intervals in Time Series Data. University of Minnesota, Tech-report 14-025, 2014.
• Rahman, M. M., Ali, A. A., Plarre, K., al'Absi, M., Ertin, E., & Kumar, S. (2011, October). mConverse: inferring conversation episodes from respiratory measurements collected in the field. In Proceedings of the 2nd Conference on Wireless Health (p. 10). ACM.
• Ali, A. A., Hossain, S. M., Hovsepian, K., Rahman, M. M., Plarre, K., & Kumar, S. (2012, April). mPuff: automated detection of cigarette smoking puffs from respiration measurements. In Proceedings of the 11th international conference on Information Processing in Sensor Networks (pp. 269-280). ACM.
• Raij, A., Blitz, P., Ali, A. A., Fisk, S., French, B., Mitra, S., ... & Smailagic, A. (2010). mStress: Supporting continuous collection of objective and subjective measures of psychosocial stress on mobile devices. ACM Wireless Health 2010, San Diego, California, USA.
• Dey, Sanjoy, et al. "Mining Patterns Associated With Mobility Outcomes in Home Healthcare." Nursing research 64.4 (2015): 235-245.
• Gupta, Rohit, et al. "284 Colorectal Cancer Despite Colonoscopy: Critical Is the Endoscopist, Not the Withdrawal Time." Gastroenterology 136.5 (2009): A-55.
• Simon, Gyorgy J., et al. "Survival association rule mining towards type 2 diabetes risk assessment." AMIA Annual Symposium Proceedings. Vol. 2013. American Medical Informatics Association, 2013.
• Li, Dingcheng, et al. "Using Association Rule Mining for Phenotype Extraction from Electronic Health Records." AMIA Summits on Translational Science Proceedings 2013 (2013): 142.
• Schrom, John R., et al. "Quantifying the effect of statin use in pre-diabetic phenotypes discovered through association rule mining." AMIA Annual Symposium Proceedings. Vol. 2013. American Medical Informatics Association, 2013.
• Pandey, Gaurav, et al. "An association analysis approach to biclustering." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
• Kim, Hye Soon, et al. "Comorbidity study on type 2 diabetes mellitus using data mining." The Korean journal of internal medicine 27.2 (2012): 197-202.
• Shin, A. Mi, et al. "Diagnostic analysis of patients with essential hypertension using association rule mining." Healthcare informatics research 16.2 (2010): 77-81.
• Fang, Gang, et al. "Subspace differential coexpression analysis: problem definition and a general approach." Pacific symposium on biocomputing. Vol. 15. 2010.