Vital Sign Quality Assessment
using Ordinal Regression of
Time Series Data
Risa B. Myers
Comp 600
September 30, 2013
Christopher M. Jermaine PhD
Rice University
Department of Computer Science
John C. Frenzel MD
University of Texas
MD Anderson Cancer Center
Patient Monitoring
http://ak4.picdn.net/shutterstock/videos/1240198/preview/stock-footage-looping-animation-of-a-medicalhospital-monitor-of-normal-vital-signs-hd.jpg
What Vital Signs Are
• Physiological Measures
– Temperature
– Blood Pressure
– Heart Rate
– Respiration Rate
Vital signs vs. EKG
[Figure: Systolic BP, Heart Rate, and Diastolic BP traces; time scale: Minutes ✔, Seconds ✗]
Volatility
• New term with respect to vital signs
• Changes
• Not just variance
Anesthesia Vital Signs
Motivation
• Computer Science
– Learn to interpret pattern-less signals
• Biomedical
– Assess quality of care
– Clinical Decision Support
• Interpret patient data
• Discover underlying causes
• Predict outcomes and events
Goals
• Interpret vital sign data in a patient chart
• Assign a volatility label
• Mimic an expert’s assessment
• Predict outcome
Contributions
• Novel approach to ordinal regression for time series data lacking characteristic patterns
• Ability to identify outlier time series
• Model that can mimic expert assessment
Terms
Vital Sign Quality Assessment using
Ordinal Regression of Time Series Data
Time Series
• Ordered series of data
• Some relationship exists
Average monthly high temperatures in Houston
63, 66, 72, 79, 85, 90, 92, 93, 88, 81, 72, 65
www.weather.com
Ordinal Regression
[Figure: Ordinal Temperature Labels. Temperature axis from 0 to 120 °Fahrenheit, with the ordered labels Cool, Nice, Hot, Really Hot]
Ordinal Regression
Classification vs. Ordinal Regression
Classes have order
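A minimal sketch of what "classes have order" means in practice, using the temperature labels from the figure. The cut points (60, 75, 90 °F) are assumptions chosen for illustration; the slide does not give exact boundaries. The helper names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut points in °F; the slide's figure does not state exact values.
CUTPOINTS = [60, 75, 90]
LABELS = ["Cool", "Nice", "Hot", "Really Hot"]  # ordered from low to high


def ordinal_label(temp_f):
    """Return (rank, name) for a temperature; the rank preserves label order."""
    rank = bisect_right(CUTPOINTS, temp_f)
    return rank, LABELS[rank]


def ordinal_error(true_rank, predicted_rank):
    """Distance between ranks: a notion of 'how wrong' that plain classification lacks."""
    return abs(true_rank - predicted_rank)


print(ordinal_label(63))   # (1, 'Nice')
print(ordinal_label(93))   # (3, 'Really Hot')
print(ordinal_error(0, 3)) # 3: calling a Cool day Really Hot is a large error
```

In plain classification, mistaking Cool for Nice and Cool for Really Hot count the same; with ordered labels the second mistake is three ranks away and can be penalized more.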
Labeled Vital Signs
State of the Art
• Bayesian modeling of time series
– Sykacek & Roberts – Hierarchical Bayesian model
to perform feature extraction and classify time
segments using a latent feature space
• Small # of real examples
• Time Series
– kNN – DTW
– Complexity-invariant classification
– Shapelets
– …
kNN-DTW
C. Cassisi, P. Montalto, M. Aliotta, A. Cannata, and A. Pulvirenti,
“Similarity Measures and Dimensionality Reduction Techniques for Time
Series Data Mining,” no. 3, InTech, 2012.
1NN-DTW
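For readers unfamiliar with the baseline, here is a minimal sketch of one-nearest-neighbor classification under dynamic time warping (DTW). It uses the textbook dynamic-programming recurrence with an absolute-difference cost and no warping window; those choices, and the function names, are assumptions and not necessarily the configuration used in the comparison.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def predict_1nn(train, query, distance=dtw_distance):
    """train is a list of (series, label); return the label of the closest series."""
    _, label = min(train, key=lambda pair: distance(pair[0], query))
    return label


train = [([1, 2, 3, 4], 1), ([1, 9, 1, 9], 5)]
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: warping absorbs the repeated 2
print(predict_1nn(train, [1, 2, 2, 3, 4]))    # 1
```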
Complexity Invariance
G. Batista, X. Wang, and E. J. Keogh, “A complexity-invariant distance
measure for time series,” SIAM Conf Data Mining, 2011.
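A minimal sketch of the complexity-invariant distance (CID) from the paper cited above: the Euclidean distance is scaled by the ratio of the two series' complexity estimates, where complexity is the length of the series when "stretched out" (root of summed squared successive differences). The handling of a perfectly flat series is an assumption made here for illustration.

```python
import math


def complexity_estimate(series):
    """Root of the summed squared differences between consecutive points."""
    return math.sqrt(sum((series[i + 1] - series[i]) ** 2
                         for i in range(len(series) - 1)))


def euclidean(a, b):
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cid_distance(a, b):
    """Euclidean distance times max(CE)/min(CE) of the two series."""
    ce_a, ce_b = complexity_estimate(a), complexity_estimate(b)
    low, high = min(ce_a, ce_b), max(ce_a, ce_b)
    if high == 0.0:
        factor = 1.0       # both series flat: no correction (assumption)
    elif low == 0.0:
        factor = math.inf  # flat vs. non-flat: maximally different (assumption)
    else:
        factor = high / low
    return euclidean(a, b) * factor


# The second series is twice as "complex", so the Euclidean distance is doubled.
print(cid_distance([0, 1, 2, 3], [0, 2, 4, 6]))
```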
Shapelets
L. Ye and E. Keogh, “Time series shapelets,” presented at the 15th
ACM SIGKDD international conference, New York, New York, USA, 2009,
p. 947.
Biomedical Labeling
• Vital sign analysis
– Yang et al. – Classification of anesthesia time series segments
• Patterns, durations, frequencies and sequences of patterns defined by an anesthesiologist
• (Ordinal) regression
– Meyfroidt et al. – Length of stay prediction after cardiac surgery
• Vital signs derived values + additional patient and case data
• Off-the-shelf classifiers
• Regression problem, but use RMSE for evaluation
• Best result: better than nurses, better than standard risk model, comparable to physicians’ predictions
The AR-OR Model
• Autoregressive – Ordinal Regression Model
• Generates ordinal labels using statistical properties of time series
• Assumes patients with the same volatility label have similar state profiles
AR-OR Model Components
1. Autoregression – Time series representation
2. Segmenting – State assignment
3. Ordinal Regression – Integer valued output
1. Autoregression
Linear combination of previous values + noise
Autoregression in AR-OR
• Order = 1
• Coefficients = 1
Average monthly high temperatures
63, 66, 72, 79, 85, 90, 92, 93, 88, 81, 72, 65
Change in average monthly high temperatures
3, 6, 7, 6, 5, 2, 1, -5, -7, -9, -7
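With order 1 and coefficient 1, each value is modeled as the previous value plus noise, so the quantity of interest is the change between consecutive points. A minimal sketch reproducing the differences above (the function name is hypothetical):

```python
monthly_highs = [63, 66, 72, 79, 85, 90, 92, 93, 88, 81, 72, 65]


def first_differences(series):
    """Change between each point and the one before it."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


changes = first_differences(monthly_highs)
print(changes)  # [3, 6, 7, 6, 5, 2, 1, -5, -7, -9, -7]
```

Twelve observations yield eleven differences, because the first point has no predecessor.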
2. States via HMM
• Hidden Markov Model
– States (hidden)
– Emissions (visible)
– Transition Matrix
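To make the three ingredients concrete, here is a minimal sketch that samples from a hidden Markov model. Gaussian emissions with one mean and standard deviation per state are an assumption (the slides list state means and covariances but not the emission family), and all parameter values are invented for illustration.

```python
import random


def sample_hmm(initial, transition, means, stdevs, length, seed=0):
    """Draw a hidden state path and the visible emissions it produces."""
    rng = random.Random(seed)
    num_states = len(initial)
    states, emissions = [], []
    state = rng.choices(range(num_states), weights=initial)[0]
    for _ in range(length):
        states.append(state)                                       # hidden
        emissions.append(rng.gauss(means[state], stdevs[state]))   # visible
        state = rng.choices(range(num_states), weights=transition[state])[0]
    return states, emissions


# Two hypothetical states: "calm" (small changes) and "volatile" (large changes).
initial = [0.8, 0.2]
transition = [[0.9, 0.1],
              [0.3, 0.7]]
states, emissions = sample_hmm(initial, transition,
                               means=[0.0, 0.0], stdevs=[1.0, 10.0], length=20)
print(states)
```

Only the emissions would be observed in practice; the state path is what inference has to recover.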
2. State Assignment
Inference
2. Segmenting
State 1: 41%
State 2: 25%
State 3: 7%
State 4: 19%
State 5: 8%
3. Regression
Generative Process
Symbol   Description
K        Number of states
L        Number of labels
D        Number of time series
Φ        Autoregression coefficients
R        Autoregressive order
p        State transition probabilities
μ        State means
Σ        State covariance matrices
r        Regression coefficients
p_0      Initial state probabilities
ω        Goalpost
σ²_ω     Goalpost variance
σ²_r     Regression variance
M_i      Time series length
s        State
f        Fraction of time in each state
x        Observations
v’       Real valued label
v        Ordinal label
Bayesian Approach
• Probability Density Function of the form
• X - training data set
– Observed values
• Y - hidden variables
– States, hidden label, …
• Θ - model parameters
– State means, covariances, transition matrix, …
Data
• MD Anderson Cancer Center
• Surgical vital sign
– Systolic Blood Pressure
• 3 anesthetists
• 200 time series
• Labels: 1 (stable) to 5 (highly volatile)
Implementation
• Markov chain Monte Carlo
– Iterative process
– Sampling from probability distributions
• Gibbs Sampling
– Conjugate priors
– Rejection Sampling
• Two phases
– Learning model parameters
– Labeling unknown series
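As a reminder of how Gibbs sampling iterates, here is a minimal sketch on a toy target: a standard bivariate normal with correlation rho, where each coordinate is redrawn from its exact conditional given the other. This is not the AR-OR posterior and uses neither conjugate priors nor rejection sampling; it only illustrates the "sample each variable in turn, repeat" loop. All names are hypothetical.

```python
import random


def gibbs_bivariate_normal(rho, num_samples, burn_in=100, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5  # std. dev. of x given y (and y given x)
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + num_samples):
        x = rng.gauss(rho * y, cond_sd)  # draw x | y
        y = rng.gauss(rho * x, cond_sd)  # draw y | x
        if step >= burn_in:              # discard early iterations
            samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(rho=0.5, num_samples=5000)
mean_x = sum(x for x, _ in samples) / len(samples)
print(round(mean_x, 2))  # close to 0 for a long enough chain
```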
Final Label
• Assign label based on the mode of the last n iterations
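A minimal sketch of this rule, assuming the sampler has recorded one candidate label per iteration. The tie-breaking behavior (the label seen first within the window wins) is an assumption; the slides do not say how ties are resolved.

```python
from collections import Counter


def final_label(sampled_labels, n):
    """Most frequent label among the last n sampled labels."""
    if n <= 0 or not sampled_labels:
        raise ValueError("need at least one sampled label and n > 0")
    window = sampled_labels[-n:]
    return Counter(window).most_common(1)[0][0]


trace = [1, 1, 1, 3, 4, 4, 3, 4]   # hypothetical per-iteration labels
print(final_label(trace, n=5))     # 4: the window is [3, 4, 4, 3, 4]
```

Restricting the vote to the last n iterations ignores early samples drawn before the chain has settled.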
Comparison
1. Upper Bound – 2 experts predicting 1
2. AR-OR Model*
3. 1NN-DTW
4. 1NN-Complexity-Invariant Distance
5. Linear Regression on variance
6. Guess the most common label
*My model
Results
[Figure: results chart; vertical axis from 0 to 3; groups: All, Outliers]
Current Work
• Other time series without patterns
– ICU
• Expanded model
– Demographics
– Time series features
– Multiple time series
• Direct comparisons
– Demographic data only
– Demographics + 1st and 2nd order features
– Demographics + time series features + time series
• More objective labels
– Length of stay
– Expiration
Next Steps
• Focus on feature selection
• Solving a clinical problem
• Expand model
– History
• Medications
• Lab results
References and Acknowledgements
1. P. Sykacek and S. Roberts, “Bayesian time series classification,” presented at the Advances in Neural Information Processing 14, Boston, MA, 2002, pp. 937–944.
2. P. Yang, G. Dumont, and J. M. Ansermino, “Online pattern recognition based on a generalized hidden Markov model for intraoperative vital sign monitoring,” Int. J. Adapt. Control Signal Process., vol. 24, 2010.
3. G. Meyfroidt, F. Güiza, D. Cottem, W. De Becker, K. Van Loon, J.-M. Aerts, D. Berckmans, J. Ramon, M. Bruynooghe, and G. Van Den Berghe, “Computerized prediction of intensive care unit discharge after cardiac surgery: development and validation of a Gaussian processes model,” BMC Med Inform Decis Mak, vol. 11, p. 64, 2011.
Supported in part by the NSF under grant number 0964526 and by a
training fellowship from the Keck Center of the Gulf Coast Consortia, on
Rice University’s NLM Training Program in Biomedical Informatics (NLM
Grant No. T15LM007093).
Take-aways
• Time series data are difficult to analyze
• Using time series data produces better
results than approaches like Linear
Regression
• Machine learning approaches can
approximate expert assessments
• Opportunity & need for clinical decision
support
Provider Labels
Apply Bayes’ Theorem
• To learn the model parameters
• To learn the label for the test time series
Autoregression
in the AR-OR Model
• Time series values used to determine
the state means and variances
• Each state has a set of AR coefficients
• Simplified
– AR(1)
– Coefficients = 1
• Values are the differences between
points
MSE – All Test Cases
Provider   Gold Stnd   AR-OR   1NN-DTW   1NN-CID   LR     Guess 3
1          0.52        0.50    1.38      0.65      0.63   0.61
2          0.81        0.94    1.25      1.39      1.20   1.01
3          0.58        0.58    1.10      0.89      0.80   0.76
TPR – All Test Cases
Provider   Gold Stnd   AR-OR   1NN-DTW   1NN-CID   LR     Guess 3
1          0.57        0.58    0.35      0.50      0.55   0.55
2          0.44        0.35    0.42      0.30      0.35   0.39
3          0.52        0.53    0.39      0.46      0.41   0.41
MSE – Outliers
Provider   Gold Stnd   AR-OR   1NN-DTW   1NN-CID   LR     Guess 3
1          2.71        2.08    4.81      2.51      4.00   4.00
2          2.32        1.99    2.20      1.80      3.49   4.00
3          1.68        1.16    3.54      2.15      3.55   4.00
TPR – Outliers
Provider   Gold Stnd   AR-OR   1NN-DTW   1NN-CID   LR     Guess 3
1          0.01        0.11    0.05      0.05      0.00   0.00
2          0.00        0.06    0.28      0.41      0.00   0.00
3          0.04        0.06    0.01      0.06      0.00   0.00
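A minimal sketch of how the two kinds of scores in these tables could be computed from ordinal labels. The mean squared error is standard. The slides do not define TPR, so the second function, which treats it as the fraction of labels matched exactly, is an assumption. The example labels are invented.

```python
def mean_squared_error(true_labels, predicted_labels):
    """Average squared difference between ordinal labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum((t - p) ** 2
               for t, p in zip(true_labels, predicted_labels)) / len(true_labels)


def exact_match_rate(true_labels, predicted_labels):
    """Fraction of labels predicted exactly (assumed reading of TPR)."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


expert = [1, 2, 3, 5]      # hypothetical expert labels
predicted = [1, 3, 3, 3]   # hypothetical model output
print(mean_squared_error(expert, predicted))  # 1.25
print(exact_match_rate(expert, predicted))    # 0.5
```

Under this reading, always guessing 3 for a series labeled 1 or 5 costs a squared error of 4 and never matches, which would be consistent with the 4.00 and 0.00 entries in the outlier tables if the outliers are the series at the extremes of the scale.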
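State Fraction Equation
• Time spent in state S
$f_{i,S} = \frac{1}{M_i} \sum_{t=1}^{M_i} I(s_{i,t} = S)$
– $s_{i,t}$: states for time series i
– $S$: state S
– $I(\cdot)$: indicator function
– $M_i$: length of time series i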
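A minimal sketch of the state fraction computation described by these terms: count how often each state appears in a series' state sequence and divide by the series length. States are numbered from 0 here for convenience, whereas the slides number them from 1; the function name is hypothetical.

```python
def state_fractions(states, num_states):
    """Fraction of time steps spent in each state, indexed 0..num_states-1."""
    length = len(states)
    if length == 0:
        raise ValueError("state sequence must not be empty")
    counts = [0] * num_states
    for s in states:
        counts[s] += 1  # indicator function summed over time
    return [count / length for count in counts]


print(state_fractions([0, 0, 1, 2, 0], num_states=3))  # [0.6, 0.2, 0.2]
```

The fractions sum to 1, so each series is summarized by a point on the simplex regardless of its length.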
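Ordinal Regression in the AR-OR Model
$v'_i = \sum_{k=1}^{K} r_k \, f_{i,k} + \epsilon_i$
– $v'_i$: real valued outcome
– $\epsilon_i$: ordinal regression noise
– $K$: number of states
– $f_{i,k}$: state fraction function
– $r_k$: regression coefficient for state k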
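A minimal sketch of the regression step under stated assumptions: the real valued outcome is a weighted sum of the state fractions plus Gaussian noise, and it is turned into an ordinal label by treating the goalposts as ordered cut points. The Gaussian noise, the cut-point reading of "goalpost", and every numeric value for the coefficients and goalposts are assumptions; only the state fractions are taken from the Segmenting slide.

```python
import random
from bisect import bisect_right


def real_valued_label(fractions, coefficients, noise_sd=0.0, rng=None):
    """Weighted sum of state fractions, optionally with Gaussian noise."""
    if len(fractions) != len(coefficients):
        raise ValueError("need one coefficient per state")
    mean = sum(r * f for r, f in zip(coefficients, fractions))
    if noise_sd == 0.0:
        return mean
    rng = rng or random.Random(0)
    return rng.gauss(mean, noise_sd)


def ordinal_from_real(value, goalposts):
    """Map a real value to a label in 1..len(goalposts)+1 (cut-point assumption)."""
    return bisect_right(goalposts, value) + 1


fractions = [0.41, 0.25, 0.07, 0.19, 0.08]   # from the Segmenting slide
coefficients = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical
goalposts = [1.5, 2.5, 3.5, 4.5]             # hypothetical cut points for 5 labels

v_real = real_valued_label(fractions, coefficients)
print(round(v_real, 2))                      # 2.28
print(ordinal_from_real(v_real, goalposts))  # 2
```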
Autoregression
[Equation not recovered; its annotated terms are: observed data, constant, order of the regression, regression coefficient, noise]
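The annotated terms match the standard autoregressive form, so a hedged reconstruction is given here. The symbols R (order), φ (coefficients), and x (observations) follow the Generative Process slide; c for the constant and ε for the noise are assumed names, since the original equation was not recovered.

```latex
x_t = c + \sum_{j=1}^{R} \phi_j \, x_{t-j} + \epsilon_t
```

With R = 1 and φ_1 = 1, as on the "Autoregression in AR-OR" slide, and taking c = 0 (an assumption), this reduces to x_t − x_{t−1} = ε_t, which is why the model works with the differences between consecutive points.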
States and mmHg
[Figure: State Assignments; vertical axis from 0 to 150; horizontal axis: Duration of Surgery (Minutes), from 0 to 250]
Bootstrapping
• Randomly sample test set with
replacement
– 30% of records
• Remaining records are training set
• Repeat
• Alternative to k-fold cross-validation
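A minimal sketch of one round of this split, assuming that "30% of records" means the number of draws equals 30% of the data set and that any record drawn at least once is excluded from training. Function and parameter names are hypothetical.

```python
import random


def bootstrap_split(num_records, test_fraction=0.3, seed=None):
    """Return (train_indices, test_indices) for one bootstrap round."""
    rng = random.Random(seed)
    test_size = round(num_records * test_fraction)
    # Sampling with replacement: the same record may be drawn more than once.
    test = [rng.randrange(num_records) for _ in range(test_size)]
    held_out = set(test)
    train = [i for i in range(num_records) if i not in held_out]
    return train, test


# Repeat the split several times, as the slide suggests.
for round_id in range(3):
    train, test = bootstrap_split(200, seed=round_id)
    print(round_id, len(train), len(test))
```

Because draws can repeat, the test set usually contains fewer than 60 distinct records out of 200, so the training set is somewhat larger than 70% of the data.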