Download Rubik: Knowledge Guided Tensor Factorization and Completion for

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Principal component analysis wikipedia , lookup

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Rubik: Knowledge Guided Tensor Factorization and
Completion for Health Data Analytics
Yichen Wang1, Robert Chen1, Joydeep Ghosh2, Joshua C. Denny3,
Abel Kho4, You Chen3, Bradley A. Malin3, Jimeng Sun1
Georgia Institute of Technology1
University of Texas, Austin2
Vanderbilt University3
Northwestern University4
1
Phenotyping from Electronic Health
Records
Demographic
Procedure
Diagnosis
EHR
Lab Tests
Medication
Phenotyping
Clinical notes
Medical Concepts (phenotypes)
• Limitations of existing phenotyping methods
– Unable to leverage existing knowledge
– High overlapping between discovered phenotypes
– Not robust to missing and noisy data
– Not scalable
2
Ideal Phenotyping Algorithms
• Guidance: incorporate medical knowledge
• Non-overlap: discover distinct and meaningful
phenotypes
• Robust: handle noisy and missing data
• Scalable
3
Algorithms Comparison
[1] Ho, Joyce C., Joydeep Ghosh, and Jimeng Sun. "Marble: high-throughput phenotyping from electronic health records via sparse nonnegative
tensor factorization." Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014.
[2] Liu, Ji, et al. "Tensor completion for estimating missing values in visual data." Pattern Analysis and Machine Intelligence, IEEE Transactions on 35.1
(2013): 208-220.
[3] Acar, Evrim, Tamara G. Kolda, and Daniel M. Dunlavy. "All-at-once optimization for coupled matrix and tensor factorizations." arXiv preprint
arXiv:1105.3422 (2011).
[4] Kim, Jingu, and Haesun Park. "Fast nonnegative tensor factorization with an active-set-like method." High-Performance Scientific Computing. Springer
London, 2012. 311-326.
[5] Acar, Evrim, et al. "Scalable tensor factorizations for incomplete data." Chemometrics and Intelligent Laboratory Systems 106.1 (2011): 41-56.
[6] Davidson, Ian, et al. "Network discovery via constrained tensor analysis of fmri data." Proceedings of the 19th ACM SIGKDD international conference
on Knowledge discovery and data mining. ACM, 2013.
4
Rubik
Distinct phenotypes
Interaction tensor
Complete
Missing data
Align with medical knowledge
Full Tensor
Scalable
Examples of Phenotype
Medication factor
Diagnosis factor
Patients
factor
Phenotype with Guidance info
on heart failure
6
Problem Overview
≈
X: Tensor
+
…
T: Interaction tensor
Factorization error
+
C: Bias tensor
Guidance
Nonoverlap constraint
O: Observed tensor
Sparsity
Nonnegativity
Guidance Information
phenotypes
Phenotype 1
Phenotype 2
hypertension
Diabetes 1
diagnosis
Secondary hypertension
Diabetes 2
Factor matrix
Guidance is limited
Guidance knowledge
8
Pairwise Constraints
• We can penalize the cases where phenotypes have
overlapping dimensions.
Phenotype 1 Phenotype 2
Phenotype 1
phenotypes
Phenotype 2
hypertension
phenotypes
Diabetes 1
Q: similarity matrix
Overlapping case
Non-overlap case
9
Formulation
Objective
Lagrange Function
(ADMM)
p and Y are Lagrange multipliers
Block coordinate
scheme
10
Block coordinate scheme
Update interaction tensor (T)
Update bias tensor (C)
Update Lagrange multipliers (p, Y)
Update full tensor (X)
11
EXPERIMENTS
12
Constructing Tensor
++
++
7 days
7 days
Sliding window Time series data
Binary Tensor
Figure 1: Co-­occurrences of events within a patient’s history are captured in the tensor as binary values.
Datasets
• Vanderbilt: 3rd order tensor with patient,
diagnosis and medication modes of size 7,744
by 1,059 by 501, respectively.
• CMS: 472,645 patients by 11,424 diagnoses by
262,312 medication events.
14
Experiments
• Phenotype discovery: How Rubik discovers meaningful phenotypes?
• Phenotype discovery: How Rubik discovers fine-grained subphenotypes?
• Phenotype discovery: How Rubik discovers distinct phenotypes?
• Noise analysis: Is Rubik robust to noisy and missing data?
• Scalability: Is Rubik Scalable?
• Constraints analysis: Are all constraints important?
15
Phenotype Discovery:Meaningful Interaction Tensor
• Conduct surveys with medical experts
to evaluate 30 phenotypes from
Marble and Rubik
• YES means clinically meaningful
• POSS means possibly meaningful
• NOT means not meaningful
Rubik generates more meaningful phenotypes
16
Phenotype Discovery:Meaningful Bias Tensor
• Meaningful: accurately reflects the stereotypical type of patients
• Supports the medical report:
• 80% of older adults suffer from at least one chronic condition
and 50% have two or more chronic conditions
17
Phenotype Discovery:Meaningful Subphenotypes
Marble
Rubik
18
Phenotype Discovery:Meaningful Subphenotypes
Experiment setup:
• Four guidance: hypertension, diabetes1, diabetes2, heart disease
• 2 sub-phenotypes for each guidance
• 8 sub-phenotypes in total
Experts evaluation:
• 62.5% are meaningful
• 37.5% are possibly meaningful
19
More Distinct Phenotypes
• Pairwise constraint leads to more distinct phenotypes
• Average similarity tend to stabilize when
is larger than 10
20
Missing Data Analysis
Generate missing data:
randomly set the observed values to be 0
21
Noise Analysis
Generate noise:
randomly set the unobserved entries to be 1
22
Incorrect Guidance
Generate incorrect guidance:
randomly set entries in guidance matrix to be 1
23
Scalability
Rubik is around six times faster than competitors
24
Constraints Analysis
• All constraints are important!
• Pairwise constraint provides the largest boost
25
Conclusions
• The resulting phenotypes are concise, distinct,
and interpretable
• Rubik can also incorporate guidance from
medications and patients
• Rubik is robust to noisy and missing data
• Rubik is scalable
26
Rubik
+
≈
Phenotype 1
Observed Tensor
Complete
Missing data
…
+
Phenotype R
Distinct phenotypes
Bias Tensor
Align with medical knowledge
Full Tensor
Scalable
Thanks!
Yichen Wang1, Robert Chen1, Joydeep Ghosh2, Joshua C. Denny3, Abel
Kho4, You Chen3, Bradley A. Malin3, Jimeng Sun1
Georgia Institute of Technology1
University of Texas, Austin2
Vanderbilt University3
Northwestern University4
{yichen.wang, rchen87}@gatech.edu, [email protected],
[email protected] {josh.denny, you.chen, b.malin}@vanderbilt.edu, [email protected]
28