Measuring Glycemic Variability and Predicting Blood Glucose Levels Using Machine
Learning Regression Models.
A thesis presented to
the faculty of
the Russ College of Engineering and Technology of Ohio University
In partial fulfillment
of the requirements for the degree
Master of Science
Nigel W. Struble
December 2013
© 2013 Nigel W. Struble. All Rights Reserved.
This thesis titled
Measuring Glycemic Variability and Predicting Blood Glucose Levels Using Machine
Learning Regression Models.
by
NIGEL W. STRUBLE
has been approved for
the School of Electrical Engineering and Computer Science
and the Russ College of Engineering and Technology by
Cynthia R. Marling
Associate Professor of Electrical Engineering and Computer Science
Dennis Irwin
Dean, Russ College of Engineering and Technology
Abstract
STRUBLE, NIGEL W., M.S., December 2013, Computer Science
Measuring Glycemic Variability and Predicting Blood Glucose Levels Using Machine
Learning Regression Models. (108 pp.)
Director of Thesis: Cynthia R. Marling
This thesis presents research in machine learning for diabetes management. There are
two major contributions: (1) development of a metric for measuring glycemic variability, a
serious problem for patients with diabetes; and (2) predicting patient blood glucose levels,
in order to preemptively detect and avoid potential health problems. The glycemic
variability metric uses machine learning trained on multiple statistical and domain specific
features to match physician consensus of glycemic variability. The metric performs
similarly to an individual physician’s ability to match the consensus. When used as a
screen for detecting excessive glycemic variability, the metric outperforms the baseline
metrics. The blood glucose prediction model uses machine learning to integrate a general
physiological model and life-events to make patient-specific predictions 30 and 60
minutes in the future. The blood glucose prediction model was evaluated in several
situations such as near a meal or during exercise. The prediction model outperformed the
baseline prediction models, and performed similarly to, and in some cases outperformed,
expert physicians who were given the same prediction problems.
Acknowledgments
I would like to humbly thank my academic advisor and committee chair, Dr. Cynthia
Marling. Without her boundless inspiration and help this thesis would not have been
undertaken. I would also like to thank my committee members Dr. Razvan Bunescu for
his machine learning advice, Dr. Frank Schwartz for his willingness to work with
computer scientists, and Dr. Jundong Liu. I am thankful for everyone who has contributed
to the research behind this thesis, including the medical doctors Dr. Schwartz, Dr.
Shubrook, and Dr. Guo, the patients, and students on the project past and present. I would
also like to thank my family for supporting me through my education.
Table of Contents

Abstract
Acknowledgments
List of Tables
List of Figures
1 Introduction
2 Background
  2.1 Diabetes
  2.2 SmartHealth Lab Research
    2.2.1 The 4 Diabetes Support System™ Project
    2.2.2 A Consensus Perceived Glycemic Variability Metric
    2.2.3 Blood Glucose Prediction
  2.3 Machine Learning Algorithms
    2.3.1 Linear Regression
    2.3.2 Support Vector Regression
    2.3.3 Kernel Functions
    2.3.4 Multilayer Perceptron
    2.3.5 Auto Regressive Integrated Moving Average
3 Consensus Perceived Glycemic Variability Metric
  3.1 Background
    3.1.1 Distinction from HbA1C
    3.1.2 Difficulty of Quantifying Glycemic Variability
    3.1.3 Previous Measurement Methods and Studies
    3.1.4 The New Metric
  3.2 Methods
    3.2.1 Data Collection and Format
    3.2.2 Feature Engineering
    3.2.3 Smoothing the Data
    3.2.4 Machine Learning Algorithms
    3.2.5 Algorithm Configurations
    3.2.6 Feature Selection and Tuning
    3.2.7 10-Fold Cross Validation
    3.2.8 Datasets
    3.2.9 Defining the CPGV Metric
    3.2.10 Screen for Excessive Glycemic Variability
  3.3 Results
    3.3.1 CPGV Metric Performance
    3.3.2 Performance of the Excessive Glycemic Variability Screen
  3.4 Discussion
4 Blood Glucose Prediction
  4.1 Background
    4.1.1 Previous Work
  4.2 Methods
    4.2.1 Data
    4.2.2 Physiological Model
    4.2.3 Feature Vector
    4.2.4 Walk Forward Testing
    4.2.5 Baselines for Evaluation
    4.2.6 Evaluation
  4.3 Results
  4.4 Discussion
5 Related Research
  5.1 Diabetes Related Research
    5.1.1 Physiological Models
    5.1.2 Predictive Blood Glucose Control
    5.1.3 Glycemic Variability
  5.2 Machine Learning Related Research
    5.2.1 Machine Learning For Problem Detection
    5.2.2 Time Series Prediction
6 Future Work
  6.1 Variability Metric
  6.2 Blood Glucose Prediction
7 Summary and Conclusion
References
Appendix A: CPGV Full Results
Appendix B: Full Blood Glucose Level Prediction Results
List of Tables

3.1 Features
3.2 Metric performance
3.3 Classification performance
4.1 Previous study prediction results
4.2 RMSE baselines
4.3 Ternary baselines
4.4 New prediction model results
4.5 Statistical significance
4.6 CEGA Regions 30
4.7 CEGA Regions 60
A.1 Full glycemic variability results
B.1 Full prediction results
List of Figures

2.1 Tuning λ
2.2 ε-tube
2.3 Perceptron
2.4 MLP
2.5 Sigmoid function
3.1 Glycemic Variability
3.2 Slope
3.3 ROC curve
3.4 Rated plots
4.1 Physiological Model
4.2 Walk Forward Testing
4.3 Physician prediction GUI
4.4 CEGA
4.5 t0 CEGA
4.6 ARIMA CEGA
4.7 SVR CEGA
4.8 Physician 1 CEGA
4.9 Physician 2 CEGA
4.10 Physician 3 CEGA
4.11 New prediction model CEGA
1 Introduction
This thesis presents research in machine learning for diabetes management. There are
two major contributions:
1. development of a metric for measuring glycemic variability, a serious problem for
patients with diabetes; and
2. predicting patient blood glucose levels, in order to preemptively detect and avoid
potential health problems.
This work contributes to two of the three major projects of the SmartHealth Lab at
Ohio University. The SmartHealth Lab projects include the 4 Diabetes Support System
(4DSS), glycemic variability measurement, and blood glucose prediction. The 4DSS
project provides problem detection and decision support for patients with type 1 diabetes
mellitus (T1DM). The glycemic variability metric is a tool that physicians can use to help
gauge overall glycemic control. The blood glucose prediction project aims to anticipate
impending blood glucose control problems, thereby enabling preventative action.
Patients with T1DM use insulin to control their blood glucose levels to a range
prescribed by their doctor. Poor blood glucose control fails to maintain this range, and
over time leads to several adverse effects including blindness, kidney failure, and
premature death (DCCT Research Group and others, 1987). The glycemic variability
metric provides an easy way to measure glycemic control (Rodbard et al., 2009). Blood
glucose prediction can enable patients to preemptively correct blood glucose levels before
a dangerous excursion occurs. A detailed description of diabetes, and the machine
learning techniques used in this work is provided in Chapter 2.
The first contribution of this thesis is a Consensus Perceived Glycemic Variability
metric (CPGV). The metric was built using machine learning algorithms to capture the
consensus of a group of expert physicians’ impression of glycemic variability. The metric
combines several calculations and metrics performed on the blood glucose signal. The
glycemic variability metric developed in this work is the third iteration of the project,
(Vernier, 2009) being the first, and (Wiley, 2011) the second. The first two iterations
focused on classifying excessive vs. acceptable glycemic variability based on the
impressions of two local physicians. This work extends that idea to provide a continuous
metric based on the impressions of 12 physicians from across the country and around the
world. A complete report and evaluation of the metric is presented in Chapter 3. A full list
of the machine learning algorithms and results for the metric are provided in Appendix A.
The second contribution of this work provides a framework for a blood glucose
prediction system. The prediction system incorporates a physiological model of blood
glucose as well as other factors. The blood glucose prediction system uses machine
learning to combine the components of the physiological model with the other factors to
predict blood glucose levels 30 and 60 minutes in the future. The full description of the
blood glucose prediction work is shown in Chapter 4. A comprehensive list of prediction
results is given in Appendix B.
Chapter 5 describes related research. This includes research on physiological models,
predictive blood glucose control, glycemic variability, problem detection, and time series
prediction. Chapter 6 describes future work possibilities. Chapter 7 gives a summary and
conclusion to this work.
2 Background
This chapter provides background relevant to this work. First, diabetes is defined and
the challenges of managing diabetes are presented. Next, the work is positioned within
SmartHealth Lab research at Ohio University on intelligent diabetes management. Finally,
the machine learning approaches used in this work are described.
2.1 Diabetes
Diabetes mellitus, or simply diabetes, is a chronic disease which disrupts the body’s
natural ability to manage blood glucose levels. There are two types of diabetes: Type 1 Diabetes Mellitus (T1DM) and Type 2 Diabetes Mellitus (T2DM). Patients with T1DM do not produce insulin on their own to control their blood glucose (American Diabetes Association, 2012c). Patients with T2DM do produce insulin, but in amounts insufficient to control their blood glucose.
Worldwide, there are about 350 million people living with diabetes (Danaei et al.,
2011). Of those, about 5-10% have T1DM for which there is no known cure or prevention
(World Health Organization, 2011). In 2007, the total annual cost of diabetes in the United
States was $174 billion. In 5 years this figure increased by 41% to $245 billion in 2012
(American Diabetes Association, 2013).
The primary goal of diabetes management is for the patient to maintain a glucose
level in a range prescribed by the patient’s physician, typically between 70 and 160 mg/dl
(American Diabetes Association, 2012a). A patient is hypoglycemic when their glucose
level drops below this range; when the glucose level rises above this range, they are
hyperglycemic. When a patient experiences hypoglycemia, they typically feel short-term
side effects, including dizziness and confusion, and they are at risk for more serious
problems, including coma and seizure (American Diabetes Association, 2012b).
Prolonged hyperglycemia is known to increase the risk of chronic complications such as
heart disease, kidney failure, and blindness (DCCT Research Group and others, 1987).
The blood glucose level is affected directly by insulin and also by carbohydrate
intake. The blood glucose level is also affected indirectly by several life events such as
stress, exercise, and sleep. Patients continuously monitor their blood glucose levels and
make corrections in an attempt to keep the level within the range prescribed by their
doctor.
There are patients who, usually over time, become insensitive to the symptoms of
hypoglycemia. Undetected hypoglycemia during sleep can be particularly dangerous,
since the patient may not wake up in time to take action. These patients need to take extra
care to control their disease to avoid and correct hypoglycemia.
To monitor diabetes control on a day-to-day basis, patients with T1DM take a
fingerstick blood sample 4-8 times a day to measure their blood glucose. An insulin pump
gives patients more control over when and how much insulin they take than traditional
injections. The pump delivers a basal amount of insulin continuously throughout the day.
It delivers additional boluses of insulin as needed for meals or to correct hypoglycemia. A
Continuous Glucose Monitor (CGM) can also be used, which reads a measurement of the
blood glucose every 5 minutes. A CGM does not replace fingersticks since the CGM is
not as accurate as fingersticks, and needs to be calibrated several times a day.
Patients with a Medtronic insulin pump have access to the Bolus Wizard®. The Bolus
Wizard uses information from the current blood glucose level and carbohydrate intake to
calculate how much bolus insulin a patient needs to take to correct for hyperglycemia or to
compensate for a meal. Medtronic also provides CareLink® software, which allows
patients to upload and review their pump and CGM data.
When diabetes patients visit their physicians, they take a blood test to determine their
HbA1c (glycosylated hemoglobin), which reflects their average glucose level over a
six-week period. It is recommended that the HbA1c is below 7% for most patients
(American Diabetes Association, 2012a). Physicians use information from HbA1c,
fingersticks, insulin, and CGM data to make recommendations to improve patients’
diabetes control.
2.2 SmartHealth Lab Research
The SmartHealth Lab is currently working on three major projects (Marling et al.,
2012): the 4 Diabetes Support System (4DSS), a glycemic variability metric,
and blood glucose prediction. As of 2013, the SmartHealth lab has collected data from
three clinical research studies, and a fourth running study, of patients with T1DM to
develop its projects.
2.2.1 The 4 Diabetes Support System™ Project
The 4DSS project is a case-based reasoning (CBR) system that identifies problems
and offers possible treatments. The CBR system grows the case base when a physician
identifies a new problem with a patient and decides on a clinical treatment. The outcome
of the case is evaluated based on whether the treatment was followed by the patient, and if
the treatment successfully fixed the problem.
When evaluating a new patient, the 4DSS system uses life-events and blood glucose
levels to find problems. The system then finds the closest match to other occurrences of
the problems that it finds. The suggested treatment is an adaptation of the treatment of the
most similar cases. If the treatment is followed and successful, then the new case would
be added to the case-base for future diagnoses.
Random samples of identified problems and treatments were shown to a panel of
physicians (Marling et al., 2012). The physicians agreed 90% of the time that the problem
identification system would be useful for physicians. They agreed 80% of the time that the
cases used for suggesting treatment were similar to the identified problems. They agreed
70% of the time that the suggested treatment would be beneficial to the patient.
The goal of this project is to provide automated problem detection and treatments to
physicians and nurses, who then can decide to relay the suggested treatments to patients if
they are deemed appropriate. This would allow physicians to give treatments and advice
to patients more frequently than their routine clinical check-ups.
2.2.2 A Consensus Perceived Glycemic Variability Metric
Glycemic variability is an important part of diabetes management. Excessive
glycemic variability has been linked to hypoglycemia unawareness (Rodbard et al., 2009),
which can lead to dangerously prolonged hypoglycemia. Automated detection of
glycemic variability would identify potentially at-risk patients.
There is no current metric for glycemic variability which has been agreed upon by
physicians, so it is not routinely assessed in clinical practice. However, physicians are able to
recognize excessive glycemic variability when they see it in blood glucose plots. The goal
of this project is to capture that physician perception in order to measure glycemic
variability for clinical use.
Chapter 3 presents the Consensus Perceived Glycemic Variability (CPGV) metric
that has been developed to supplement HbA1c as a measure of overall glycemic control in
clinical practice. To develop this metric, 12 physicians managing patients with type 1
diabetes rated 250 24-hour continuous glucose monitoring (CGM) plots as exhibiting
(1) low, (2) borderline, (3) high, or (4) extremely high glycemic variability. When physician
ratings were not unanimous, they were averaged to obtain a consensus. Descriptive
features derived from the CGM plots were used to train machine learning algorithms to
match consensus ratings.
2.2.3 Blood Glucose Prediction
When managing blood glucose, there is a time delay between an action and the
outcome. Food needs to be digested, insulin needs to be absorbed, and the CGM sensor
measures the glucose in the interstitial tissue which lags the glucose in the blood plasma
by about 15 minutes. Therefore, future uncertainty is a limiting factor in detecting
problems in real time, or before they ever occur.
Blood glucose prediction is a time series forecasting problem. Blood glucose is
predicted based on past blood glucose levels, insulin data, meal data, exercise,
medications, stress, sleep patterns, work schedules, etc. Some patients respond differently
to certain life events such as stress.
Chapter 4 presents a glucose prediction model which uses a physiological model to
combine certain features in an informed way to improve prediction accuracy for 30 and 60
minutes in the future. This model is an incremental prediction model that is trained on
each patient individually.
The goal of this project is to incorporate a real time prediction system for patients.
The model could provide feedback to the patient who could use the predictions to
preemptively correct a problem. The model could also be used in a closed-loop artificial
pancreas to administer the appropriate amount of insulin without patient intervention.
2.3 Machine Learning Algorithms
This section describes the machine learning and statistical approaches used in this
work. These are Linear Regression (LR), Support Vector Regression (SVR), Multilayer
Perceptrons (MLP), and Auto Regressive Integrated Moving Average (ARIMA).
2.3.1 Linear Regression
Linear Regression (LR), like all of the regression models in this work, is a means of
creating a mathematical function or model which takes input values and outputs a close
approximation to a desired value. The input values are a collection of features, called the
input vector, which is computed from a dataset. The purpose of regression is to use data
which is already known to approximate data which is difficult to obtain. In this work, LR
is one of the machine learning approaches used for measuring glycemic variability, as
described in Chapter 3, where features are computed from data which is automatically
collected by a sensor to approximate a manual evaluation of physicians.
The simplest form of LR, as described in (Bishop, 2007), takes the following form:
y(x, w) = w^T x + w_0    (2.1)
where x is a vector of features, and w is a weight vector. Equation 2.1 is a linear
combination of the input variables x with the weights w. The term w_0 allows for any fixed offset in the data and is usually called the bias (the bias of the data, not to be confused with statistical bias).
Since it is a linear combination of the input variables, the simple form of LR is
limited as to what it can fit. For this reason, the input variable x is usually replaced by the
basis function φ(x), as in Equation 2.2.
y(x, w) = w^T φ(x) + w_0    (2.2)
Using the form in Equation 2.2, the same behavior of Equation 2.1 can be achieved by
using the identity φ(x) = x. However, the function y(x, w) can be made nonlinear in the input vector x by using a nonlinear basis function, such as the polynomial basis function of the form

φ_i(x) = x^i    (2.3)
or the Gaussian basis function of the form

φ_i(x) = \exp\left( −\frac{(x − µ_i)^2}{2s^2} \right)    (2.4)
where µi controls the location of the Gaussian curve, and s controls the scale. Even if the
basis function is nonlinear, Function 2.2 is still considered a LR model since it is linear in
w. Many nonlinear basis functions exist, but the choice of basis function does not affect
the method of computing the vector w. The implementation of LR in this work is the
WEKA implementation (Hall et al., 2009), which uses the identity φ(x) = x for the basis
function.
The vector w is computed on a set of training vectors x_1, x_2, ..., x_N, which have the target values t, by minimizing the data-dependent sum-of-squares error function given by

E_D(w) = \frac{1}{2} \sum_{n=1}^{N} (t_n − w^T φ(x_n))^2    (2.5)
The minimum error is where Equation 2.5 is minimum. Since the function is convex, the
minimum is where the gradient is zero. The gradient of Equation 2.5 is given by
∇E_D(w) = \sum_{n=1}^{N} t_n φ(x_n)^T − w^T \left( \sum_{n=1}^{N} φ(x_n) φ(x_n)^T \right)    (2.6)
Setting ∇E_D(w) = 0 and solving for w gives

w = (Φ^T Φ)^{−1} Φ^T t    (2.7)
where Φ is an N × M matrix, called the design matrix, given by

Φ = \begin{pmatrix} φ_0(x_1) & φ_1(x_1) & \cdots & φ_{M−1}(x_1) \\ φ_0(x_2) & φ_1(x_2) & \cdots & φ_{M−1}(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ φ_0(x_N) & φ_1(x_N) & \cdots & φ_{M−1}(x_N) \end{pmatrix}    (2.8)
In order to reduce over-fitting the training data, a regularization parameter is
introduced to the error function, which takes the form
E_D(w) + λ E_W(w)    (2.9)
where λ is the regularization coefficient, E_D(w) is as defined in Equation 2.5, and E_W(w) is given by

E_W(w) = \frac{1}{2} w^T w    (2.10)
By introducing λ to the error function, the parameters in the vector w can be included in
the minimization function. This helps reduce over-fitting by learning smaller parameters
at the cost of increasing the error on the training set. The justification for this compromise
is based on Occam’s razor, which states that the simplest solution is usually correct
(Domingos, 1999). The regularization parameter can take any value, so the parameter
needs to be tuned to find a good balance between data error and complexity. Figure 2.1
shows an example of tuning λ on a dataset where ln(λ) ranges in steps of 5 from -50 to 0,
i.e., λ = e^{−50} to e^{0}, using Root Mean Square Error (RMSE) as the measure of error.
Solving Equation 2.7 with the changes from Equation 2.9 gives
w = (λI + Φ^T Φ)^{−1} Φ^T t    (2.11)
where I is the identity matrix.
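As an illustration only, the closed-form solution in Equation 2.11 can be written in a few lines of NumPy. This is a minimal sketch, not the WEKA implementation used in this work; the design matrix, targets, and λ value below are hypothetical.

    import numpy as np

    def ridge_fit(Phi, t, lam):
        # w = (lambda*I + Phi^T Phi)^(-1) Phi^T t, as in Equation 2.11.
        M = Phi.shape[1]
        return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

    # Hypothetical data with the identity basis phi(x) = x; a constant column of
    # ones is appended so that w_0 (the bias) is learned with the other weights.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    Phi = np.hstack([np.ones((100, 1)), X])
    t = Phi @ np.array([0.5, 1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

    w = ridge_fit(Phi, t, lam=np.exp(-10))          # lambda would be tuned on a validation set
    rmse = np.sqrt(np.mean((Phi @ w - t) ** 2))     # RMSE, the error measure used for tuning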
2.3.2 Support Vector Regression
Support Vector Regression (SVR) was first described by (Vapnik, 1995) and has been
described in many works including (Vapnik, 1998; Smola and Schölkopf, 2004; Bishop,
2007). SVRs are based on the equation
y(xn ) = wT φ(xn ) + b
where xn is an input vector, w is the vector of learned weights, φ is a transformation
function, and b is the bias.
(2.12)
19
Figure 2.1: Tuning the regularization coefficient, λ, on a separate validation dataset using
RMSE to measure the error.
SVRs minimize an error function based on the training vectors and their target values. In this work, the type of SVR used is an ε-SVR, in which the error function is

E_ε(y(x_n) − t_n) = \begin{cases} 0, & \text{if } |y(x_n) − t_n| < ε; \\ |y(x_n) − t_n| − ε, & \text{otherwise} \end{cases}    (2.13)

where ε > 0. If the difference between y(x_n) and the target t_n is less than ε, i.e., |y(x_n) − t_n| < ε, then the error is 0; otherwise the error is reduced by ε. The region where |y(x_n) − t_n| < ε is referred to as the ε-tube.
A regularization parameter, C, is introduced to give the error function:

C \sum_{n=1}^{N} E_ε(y(x_n) − t_n) + \frac{1}{2} \|w\|^2    (2.14)
For target points that lie outside of the ε-tube, slack variables are introduced. There are two slack variables, ξ_n and ξ̂_n, for each vector x_n such that ξ_n, ξ̂_n ≥ 0 gives

t_n ≤ y(x_n) + ε + ξ_n    (2.15)
t_n ≥ y(x_n) − ε − ξ̂_n    (2.16)

where ξ_n represents the error for y(x_n) that lies above or on the top edge of the ε-tube, and ξ̂_n represents the error for y(x_n) that lies below or on the bottom edge of the ε-tube. Figure 2.2 shows an example of an ε-tube.

Figure 2.2: y(x_n) curve showing the ε-tube. Points above the tube have ξ_n > 0, points below the tube have ξ̂_n > 0. Filled-in points represent support vectors.

Using the slack variables, Equation 2.14 can be rewritten as

C \sum_{n=1}^{N} (ξ_n + ξ̂_n) + \frac{1}{2} \|w\|^2    (2.17)
which can be minimized subject to the constraints in Equations 2.15 and 2.16 and ξ_n, ξ̂_n ≥ 0 by using Lagrange multipliers and optimizing the following Lagrangian (Bishop, 2007):

L = C \sum_{n=1}^{N} (ξ_n + ξ̂_n) + \frac{1}{2} \|w\|^2 − \sum_{n=1}^{N} (µ_n ξ_n + µ̂_n ξ̂_n) − \sum_{n=1}^{N} a_n (ε + ξ_n + y(x_n) − t_n) − \sum_{n=1}^{N} â_n (ε + ξ̂_n − y(x_n) + t_n)    (2.18)
By substituting Equation 2.12 for y(x_n) and setting the partial derivatives with respect to w, b, ξ_n, and ξ̂_n to zero gives

∂L/∂w = 0 ⇒ w = \sum_{n=1}^{N} (a_n − â_n) φ(x_n)    (2.19)

∂L/∂b = 0 ⇒ \sum_{n=1}^{N} (a_n − â_n) = 0    (2.20)

∂L/∂ξ_n = 0 ⇒ a_n + µ_n = C    (2.21)

∂L/∂ξ̂_n = 0 ⇒ â_n + µ̂_n = C    (2.22)
Using these results with Equation 2.18 gives the optimization

\min_{a, â} \; \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} (a_n − â_n)(a_m − â_m) k(x_n, x_m) + ε \sum_{n=1}^{N} (a_n + â_n) − \sum_{n=1}^{N} (a_n − â_n) t_n    (2.23)

subject to

\sum_{n=1}^{N} (a_n − â_n) = 0,    0 ≤ a_n, â_n ≤ C
Equation 2.23 is called the dual representation equation. The function k(xn , xm ) is called
the kernel function (Section 2.3.3).
The dual representation equation can be solved by Karush-Kuhn-Tucker (KKT) conditions by providing stopping criteria. The KKT conditions imply that the products of the dual variables and constraints vanish at the solution. The KKT conditions are:

a_n (ε + ξ_n + y(x_n) − t_n) = 0    (2.24)
â_n (ε + ξ̂_n − y(x_n) + t_n) = 0    (2.25)
(C − a_n) ξ_n = 0    (2.26)
(C − â_n) ξ̂_n = 0    (2.27)
By adding Equations 2.24 and 2.25 together, it can be seen that either a_n or â_n (or both) must be zero, since ε > 0 and a_n, â_n, ξ_n, ξ̂_n ≥ 0. In the case where either a_n or â_n is non-zero, x_n is called a support vector. If a_n is non-zero, then Equation 2.24 implies that t_n either lies on the ε-tube (ξ_n = 0) or lies above it (ξ_n > 0). Similarly, if â_n is non-zero, then Equation 2.25 implies that t_n either lies on the ε-tube (ξ̂_n = 0) or lies below it (ξ̂_n > 0). If both a_n and â_n are zero, then t_n is within the ε-tube and x_n is not a support vector, as shown in Figure 2.2.
Substituting Equation 2.19 for w in Equation 2.12 yields

y(x) = \sum_{n=1}^{N} (a_n − â_n) k(x, x_n) + b    (2.28)

which allows for predictions on a test vector x, where b is the bias, which can be calculated on a support vector x_n for which 0 < a_n < C and ξ_n = 0, i.e., the support vector lies on the edge of the ε-tube. Equation 2.24 with ξ_n = 0 implies that

ε + y(x_n) − t_n = 0    (2.29)

Substituting Equations 2.12 and 2.19 gives

b = t_n − ε − \sum_{m=1}^{N} (a_m − â_m) k(x_m, x_n)    (2.30)
A similar equation can be constructed for support vectors having 0 < â_n < C, and in practice, it is best to average the bias over all support vectors that lie on the ε-tube.
The implementation of SVR in this work is from the LIBSVM package (Chang and
Lin, 2011).
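For illustration, the sketch below fits an ε-SVR with a Gaussian kernel using scikit-learn, whose SVR class wraps LIBSVM; the thesis itself uses LIBSVM directly. The data and the values of C, ε, and γ are hypothetical stand-ins for the tuned parameters described in Chapter 3.

    import numpy as np
    from sklearn.svm import SVR

    # Hypothetical training data: rows of X are feature vectors x_n, t holds the targets t_n.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    t = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

    # epsilon-SVR: C is the regularization parameter of Equation 2.14, epsilon is the
    # half-width of the epsilon-tube, and gamma is the Gaussian kernel width of Equation 2.33.
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.125)
    model.fit(X, t)

    # Predictions follow Equation 2.28: a kernel expansion over the support vectors plus a bias.
    y_pred = model.predict(X[:5])
    n_support = len(model.support_)   # training vectors with non-zero a_n or a_hat_n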
2.3.3 Kernel Functions
There is a class of functions that are valid kernels, all of which take the form of Equation 2.31 for some function φ (Herbrich, 2002):

k(x, x′) = φ(x)^T φ(x′)    (2.31)

This work uses two kernel functions: the linear kernel

k(x, x′) = x^T x′    (2.32)

and the Gaussian kernel

k(x, x′) = \exp(−γ \|x − x′\|^2)    (2.33)

For the linear kernel, the φ function is simply

φ(x) = x    (2.34)
Kernels allow the input space to be mapped in higher dimensions to find relations in
the data. The Gaussian kernel creates an infinite dimensional feature space.
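A minimal sketch of the two kernels, written directly from Equations 2.32 and 2.33; the vectors and γ value below are arbitrary examples.

    import numpy as np

    def linear_kernel(x, z):
        # k(x, x') = x^T x'   (Equation 2.32)
        return x @ z

    def gaussian_kernel(x, z, gamma):
        # k(x, x') = exp(-gamma * ||x - x'||^2)   (Equation 2.33)
        return np.exp(-gamma * np.sum((x - z) ** 2))

    x = np.array([1.0, 2.0, 3.0])
    z = np.array([0.5, 1.5, 2.5])
    k_lin = linear_kernel(x, z)            # scalar linear kernel value
    k_rbf = gaussian_kernel(x, z, 0.5)     # scalar Gaussian kernel value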
2.3.4 Multilayer Perceptron
Multilayer perceptrons are built from perceptrons (Figure 2.3) in a directed graph as
shown in Figure 2.4. An individual perceptron takes a vector of inputs a, and performs the
dot product of a with a weight vector w. The perceptron uses an additional fixed input of 1
for a0 . The weight value for w0 represents the bias term. The dot product is then used as
the input of an activation function g(x), to produce a single numeric output.

Figure 2.3: The Perceptron. Boxes represent input values. Each edge stores an associated weight. An implicit input of a_0 = 1 allows w_0 to represent the bias term. The output of the perceptron is the sum of the inputs multiplied by their weights, passed to the activation function.

In this work, there are two activation functions in the perceptrons of the MLP: the identity function
and the Sigmoid function. The identity function is used only by the perceptron that gives
the final output of the network. This function is given by
g(a_i) = a_i    (2.35)
The Sigmoid function is used by the perceptrons in the hidden layer of the network. This
function takes the form
g(a_i) = \frac{1}{1 + \exp(−a_i)}    (2.36)
The Sigmoid function curve is shown in Figure 2.5.
The MLP is called a feed-forward network because the inputs are fed into the MLP at
the input layer to compute the values for all of the perceptrons in the hidden layer. Then
the values of the perceptrons in the hidden layer are fed into the output perceptron to
produce the final output.
Figure 2.4: The Multilayer Perceptron. Boxes represent input values. Circles represent perceptrons. Each edge stores an associated weight. Perceptrons in the hidden layer use the Sigmoid activation function. The output perceptron uses the identity activation function.

Figure 2.5: The Sigmoid function.
Like the other ML algorithms, the MLP is minimized in terms of its error. For the
MLP the error is expressed as
E = \frac{1}{2} \sum_{n=1}^{N} (y(x_n) − t_n)^2    (2.37)
where E is the error, y(xn ) is the output of the MLP for the nth training vector, and tn is the
target value for the nth training vector.
The error is minimized using a gradient descent approach on the following
differential
\frac{∂E}{∂w_{i,j}}    (2.38)
where wi, j is the weight of the input i for perceptron j. Equation 2.38 represents the
change of error with respect to the change of wi, j for the current state of the network. The
error decreases the fastest if the components of wi, j are changed in the opposite direction
as the gradient, so the weight change function for the output perceptron becomes (Russell
et al., 1995)
Δw_i = −\frac{∂E}{∂w_i} = a_i \sum_{n=1}^{N} (t_n − y(x_n))    (2.39)

where \sum_{n=1}^{N} (t_n − y(x_n)) is the negative of the derivative of Equation 2.37, and a_i is the input i for the output perceptron. The change is combined over all training vectors.
Equation 2.39 computes the direction that the weight vector w should be changed by,
but to follow the gradient mathematically would involve taking a continuous descent over
the surface. To overcome the impractical mathematical limitations of gradient descent, the
weights are changed by taking steps along the gradient. The weight update function for
the output perceptron becomes
w_i = w_i + γ Δw_i    (2.40)
where γ is the learning rate. The learning rate determines the step size, which is
proportional to the magnitude of the gradient. The step size allows the network to learn
faster, but a step size too large can cause oscillations around the optimum value.
For the hidden layer perceptrons, the error is propagated backwards from the output
perceptron. Each perceptron j of the hidden layer shares a portion of the error from
Equation 2.39. The weight change function for the hidden layer perceptron j is given by
Δw_{i,j} = g′(in_j) a_{i,j} \sum_{k=0}^{K} w_{k,j} Δw_j    (2.41)

with

in_j = \sum_{k=0}^{K} a_k w_{k,j},    g′(a) = g(a)(1 − g(a))
where wi, j is the weight for input i to perceptron j, g(a) is the Sigmoid function shown in
Equation 2.36, and ai, j is the input i to perceptron j. ∆w j is defined in Equation 2.39. The
weight update function for the hidden layer perceptrons is similar to the weight update
function for the output perceptron. The update function is given by
w_{i,j} = w_{i,j} + γ Δw_{i,j}    (2.42)
In addition to the learning rate, a parameter called momentum is introduced. The
momentum parameter alters the weight change functions to include momentum as a
portion of the weight change from the previous learning iteration. The purpose of the
momentum parameter is to help overcome an intrinsic issue with the gradient descent
approach for MLPs. The gradient in Equation 2.38 is not guaranteed to have a local
minimum which is the global minimum. This means that if the gradient descent approach
finds a point at which no small change to the weights will decrease the error, it might not
be the best solution, which would decrease performance. The momentum helps the weight
change function to escape these non-optimal local minimum points, but this will only
work if the momentum is large enough to push the weight change function past the local
minimum point. However, if the momentum is too large, the weights may oscillate or fall
in a non-optimal local minimum point.
The updated weight change functions with the momentum parameter become
Δw_i(T + 1) = β Δw_i(T) + a_i \sum_{n=1}^{N} (t_n − y(x_n))    (2.43)

Δw_{i,j}(T + 1) = β Δw_{i,j}(T) + g′(in_j) a_{i,j} \sum_{k=0}^{K} w_{k,j} Δw_j(T + 1)    (2.44)
where T is the learning iteration, and β is the momentum.
The MLP implementation in this work is from the WEKA package (Hall et al.,
2009). In this work, there are 500 learning iterations for the MLP, and the hidden layer has |x_n|/2 + 1 perceptrons.
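The sketch below reproduces an analogous configuration with scikit-learn's MLPRegressor rather than WEKA: one hidden layer of |x_n|/2 + 1 Sigmoid units, an identity output, stochastic gradient descent with a learning rate and momentum, and 500 iterations. The data and the particular learning rate and momentum values are hypothetical stand-ins for the tuned values.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 10))                       # hypothetical feature vectors
    t = X @ rng.normal(size=10) + 0.1 * rng.normal(size=300)

    n_hidden = X.shape[1] // 2 + 1                       # |x_n|/2 + 1 hidden perceptrons
    model = MLPRegressor(
        hidden_layer_sizes=(n_hidden,),
        activation="logistic",                           # Sigmoid hidden units (Equation 2.36)
        solver="sgd",                                    # gradient descent weight updates
        learning_rate_init=0.2,                          # gamma, the learning rate
        momentum=0.8,                                    # beta, the momentum term
        max_iter=500,                                    # 500 learning iterations
    )
    model.fit(X, t)
    y_pred = model.predict(X[:5])                        # identity activation at the output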
2.3.5 Auto Regressive Integrated Moving Average
Auto Regressive Integrated Moving Average (ARIMA) is a statistical model of a time
series which creates a function for the value of the time series at time t based on previous
values. This allows an ARIMA model to extrapolate future values of the time series, since
the model assumes homogeneity, i.e., that a portion of the time series behaves much like
the rest of the time series. The ARIMA model and its equations are described in (Box
et al., 2008). The ARIMA model is generally referred to as ARIMA(p, d, q), in which the
parameters p, d, and, q correspond to the order of the individual processes of the ARIMA
model – Auto Regressive(AR), Integrated(I), and Moving Average(MA) respectively.
When one of the parameters is 0, that process drops out of the model, i.e., ARIMA(1,0,1)
would be equivalent to an ARMA(1,1) model.
Given a time series of the form y = {yt , yt−1 , ..., y1 }, the ARIMA(p, d, q) model can be
described by
y_t = φ_1 y_{t−1} + ... + φ_{d+p} y_{t−d−p} − β_1 ε_{t−1} − ... − β_q ε_{t−q} + ε_t    (2.45)
where φ are the components of the AR model, β are the components of the MA model, and ε_t represents the error of the prediction for time t.
Equation 2.45 can be used to predict future values of the time series. If y_t is the last known value of the time series, then the prediction for the next value of the series would be given by

y_{t+1} = φ_1 y_t + ... + φ_{d+p} y_{t−d−p+1} − β_1 ε_t − ... − β_q ε_{t−q+1}    (2.46)

ε_{t+1} drops out since the error is unknown, and the expected value of ε_i is 0. The predicted
value for yt+1 can then be used to predict yt+2 , and so on; so the value for yt+l can
eventually be predicted. Since the predicted value for yt+l is based on l − 1 predictions, the
uncertainty of the predictions increases as l increases.
The ARIMA model in this work is implemented in the R statistical package (R Core
Team, 2013) using the auto.arima function. The auto.arima function determines the
values of p, d, and q using the Hyndman and Khandakar algorithm (Hyndman and
Khandakar, 2008).
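As an illustration of how such a model is fit and iterated forward, the sketch below uses Python's statsmodels with a fixed, arbitrary order rather than the automatic order selection of R's auto.arima; the glucose series is synthetic.

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # Synthetic stand-in for one day of CGM readings taken every 5 minutes.
    rng = np.random.default_rng(0)
    glucose = 120 + np.cumsum(rng.normal(scale=2.0, size=288))

    # Fit an ARIMA(p, d, q) model; here (2, 1, 1) is chosen arbitrarily, whereas the
    # thesis selects the order with the Hyndman and Khandakar algorithm.
    result = ARIMA(glucose, order=(2, 1, 1)).fit()

    # Iterated forecasts as in Equation 2.46: 6 and 12 steps ahead correspond to the
    # 30- and 60-minute horizons used for blood glucose prediction in Chapter 4.
    forecast = result.forecast(steps=12)
    pred_30, pred_60 = forecast[5], forecast[11]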
3 Consensus Perceived Glycemic Variability Metric
This chapter presents the new Consensus Perceived Glycemic Variability Metric
(CPGV). First, glycemic variability is described and the purpose of measuring it is
explained. Next, the development and evaluation of the new CPGV metric is presented.
Finally, there is a discussion which includes how the metric can be further used as a
feature for predicting blood glucose. A paper based on the work presented in this chapter
has been published in the Journal of Diabetes Science and Technology (Marling et al.,
2013).
3.1 Background
Both physicians and patients are now beginning to consider glycemic variability in
addition to maintaining a glycemic range (Siegelaar et al., 2010). Increased glycemic
variability is associated with poor glycemic control (Rodbard et al., 2009) and is a strong
predictor of hypoglycemia (Monnier et al., 2011; Qu et al., 2012), which has been linked
to excessive morbidity and mortality (Zoungas et al., 2010; Seaquist et al., 2012). While
physicians acknowledge the need for a glycemic variability measurement, there is no
current consensus on the best glycemic variability metric to use, or on the criteria for
acceptable or excessive glycemic variability (Bergenstal et al., 2013).
Stated simply, glycemic variability is the fluctuation in blood glucose level over time.
However, many factors are considered by a physician to determine the variability
displayed on a patient’s glucose chart, and no single mathematical feature has been able to
closely measure variability. Figure 3.1 shows four examples of glucose variability. The
variability displayed by each subfigure was evaluated by physicians (described in
Section 3.2.1) and unanimously rated as low (3.1a), borderline (3.1b), high (3.1c), or
extremely high (3.1d).
Figure 3.1: Blood glucose plots over 24 hours showing glycemic variability: (a) low, (b) borderline, (c) high, (d) extremely high.
3.1.1 Distinction from HbA1C
Patients know that a lower HbA1c is better and know that they are improving their
diabetes management if they are able to reduce their HbA1c closer to their target, which
is 7% for most patients (American Diabetes Association, 2012a). However, there is no
method for patients to determine if their glycemic variability is improving. Since HbA1c
is an average, it does not capture variability, and a patient may have a low HbA1c and still
be at risk for complications if their glycemic variability is high.
Some patients experience many hypoglycemic events, which puts them at high risk
and reduces their HbA1c. Their low HbA1c, if considered on its own, would show good
diabetes management; glycemic variability, however, would reflect this high risk.
3.1.2 Difficulty of Quantifying Glycemic Variability
Quantifying glycemic variability is difficult to automate since there are several
factors that are considered by physicians when reviewing a patient’s data. Some of these
factors include the number of excursions from a normal glucose level, the magnitude of
the excursions, how rapidly the glucose level is changing, and in what direction.
Physicians also know that the blood glucose readings from the sensors can be off by as
much as 20% (Klonoff, 2005). This means that physicians know that many small changes,
or a very rapid change, are likely the result of signal noise. For these reasons, glycemic
variability is not routinely assessed in clinical practice.
3.1.3 Previous Measurement Methods and Studies
One of the earliest methods designed to measure glycemic variability is mean
amplitude of glycemic excursions (MAGE) (Service et al., 1970). Since then, other
metrics have been developed. Previous studies have been conducted at Ohio University to
classify blood glucose plots as excessively variable or not using these metrics. The first of
these studies (Vernier, 2009) used MAGE, 75 point excursions (EF), and distance traveled
(DT) to classify plots as excessively variable or not. A naive Bayes classifier was trained
during this study that agreed with physicians’ classifications 85% of the time.
The second study (Wiley, 2011) added to the work from the first study by engineering
additional features and smoothing sensor data. These features, described in Section 3.2.2,
were used to build a multilayer perceptron model that agreed with the physicians 93.8% of
the time.
Both of the previous studies, however, only classified plots as exhibiting excessive
glycemic variability or not. This makes the previous models useful for detecting excessive
glycemic variability, but not as useful for measuring overall diabetes control. In this study,
a metric is developed to represent a patient’s glucose variability as a single number that
can be used to assess control and evaluate progress.
3.1.4 The New Metric
To address the limitations of the previous classifiers, a new metric was developed for
potential clinical use as a measure of glycemic variability. This new metric could be used
in a similar way as HbA1c is used now. The patient would submit Continuous Glucose
Monitoring (CGM) data over a few days before each doctor’s appointment. The physician
would then be able to determine whether the patient is improving his or her glycemic
control by comparing metric values.
The metric is trained on a consensus of the perceived glycemic variability exhibited
in patient blood glucose charts. Glycemic variability is perceived because it is evaluated
and quantified by physicians reviewing blood glucose charts. There is a consensus,
because the evaluation of multiple physicians are combined.
The new glycemic variability metric could also be used as a screen to classify blood
glucose variability as excessive or not, like the previous models. This would be
accomplished by determining a threshold value that separates excessive and acceptable
glycemic variability. This threshold could provide a target for patients to achieve, much
like the 7% target for HbA1C.
A potential third use for the metric is to quantify glycemic variability as an input
feature for a blood glucose prediction model (Chapter 4). Knowledge of glycemic
variability could improve a prediction model’s ability to predict the magnitude of change
in blood glucose.
3.2 Methods
This section describes the methods used to develop the glycemic variability metric.
3.2.1 Data Collection and Format
To obtain a consensus of perceived glycemic variability, expert physicians were
asked to rate blood glucose plots for their variability. Each expert was asked to give a
rating of 1 - 4, where 1 was low variability, 2 was borderline variability, 3 was high
variability, and 4 was extremely high variability.
In total, twelve physicians rated 250 glucose plots, each representing 24 hours of
CGM data. Each doctor was asked to rate 55 of these plots, but some of them chose to rate
a second batch of plots, for a combined total of 820 ratings. The first five plots each doctor
rated were the same for all doctors. These five plots were intended to discover if any of the
doctor’s ratings were significantly different from the others, and to calibrate the doctors by
providing them with examples over the spectrum of variability that they were to see.
Each doctor received plots randomly, but in such a way that all plots would have
approximately the same number of ratings, and no doctor would rate the same plot twice.
Thus, each plot was rated by either three or four different doctors. The consensus rating
for each plot is the average of the ratings for the plot. The doctors rated the plots using a
web interface away from the researchers, which eliminated any accidental influence on the
physicians from the researchers.
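A minimal sketch of the consensus computation described above; the plot identifiers, physician labels, and ratings are hypothetical.

    from collections import defaultdict

    # (plot_id, physician, rating on the 1-4 scale); each plot gets 3 or 4 raters.
    ratings = [
        (17, "A", 2), (17, "B", 3), (17, "C", 2),
        (42, "A", 4), (42, "D", 4), (42, "E", 3), (42, "F", 4),
    ]

    by_plot = defaultdict(list)
    for plot_id, _physician, rating in ratings:
        by_plot[plot_id].append(rating)

    # The consensus rating for a plot is the average of its physician ratings.
    consensus = {plot_id: sum(r) / len(r) for plot_id, r in by_plot.items()}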
3.2.2 Feature Engineering
In the first study (Vernier, 2009) three features were used: Mean Amplitude of
Glycemic Excursions (MAGE), Excursion Frequency (EF), and Distance Traveled (DT).
MAGE (Service et al., 1970) was developed to measure the mean of the glycemic
excursions that have an amplitude greater than the standard deviation over the 24 hours.
The amplitude of an excursion is the difference between the peak (high) and nadir (low).
Similar to MAGE, EF measures the frequency of excursions of 75 mg/dl or greater. The
third measurement, DT, simply sums the difference between each pair of consecutive
glucose readings.
In the second study conducted (Wiley, 2011), eight additional features were added to
the original three. The first of these is standard deviation.
The next feature added in the second study was Area Under the Curve (AUC). This
feature computes the total area under the blood glucose curve relative to the minimum
observed blood glucose value. Equation 3.1 shows the formula for calculating area under
the curve:
AUC = \sum_{i=1}^{N} (x_i − x_{min})    (3.1)
The next set of features involves central image moments. To compute the central
image moments, a binary intensity function f (x, y) is used to denote whether the pixel
(x, y) lies within the image. The image is represented by the region C between the glucose
plot curve and the minimum glucose value. The binary intensity function is defined in
Equation 3.2:

f(x, y) = \begin{cases} 1, & (x, y) ∈ C \\ 0, & \text{otherwise} \end{cases}    (3.2)
With this function, the image moments can be computed with Equation 3.3:
m_{pq} = \sum_{x} \sum_{y} x^p y^q f(x, y)    (3.3)
Coincidentally, m00 is equivalent to Area under the curve. The center of mass for the x and
y axis can be calculated with m00 , m01 and m10 giving the centroid of the image ( x̄, ȳ)
defined by:
x̄ = \frac{m_{10}}{m_{00}},    ȳ = \frac{m_{01}}{m_{00}}
The central image moments are calculated using the center of mass for the x and y axis as
shown in Equation 3.4:
µ_{pq} = \sum_{x} \sum_{y} (x − x̄)^p (y − ȳ)^q f(x, y)    (3.4)
The central image moments used as features are µ11 , µ20 , µ02 , µ21 , µ12 , µ30 and µ03 .
Eccentricity is a measure of how close an object is to a circle. In this case, the object
is the outside edge of the glucose plot in which the horizontal line passing through the
minimum glucose value is curved back onto itself into a circle, with the rest of the glucose
plot being wrapped around this circle. Eccentricity can be calculated from central image
moments, shown in Equation 3.5 as the ratio of the maximum and minimum distance of
the center of mass ( x̄, ȳ) and the edge (Theodoridis and Koutroumbas, 2009):
ε = \frac{(µ_{20} − µ_{02})^2 + 4µ_{11}}{µ_{00}}    (3.5)
Discrete Fourier Transform (DFT) converts an analog signal into sinusoidal
components of different frequencies. Each component represents a frequency, along with
a value, as a complex number encoding both the amplitude and phase of the sinusoidal
wave. A summation of each component’s sinusoidal frequency multiplied with its
amplitude, and accounting for phase shift, would result in the original signal. Large
amplitudes in the lower frequencies would correspond to fluctuations that take a long
time, while large amplitudes in higher frequencies correspond to high frequency
fluctuations such as sensor noise. The amplitudes of the first 24 DFT frequencies (FF) are
taken as features.
Roundness Ratio (RR) is a ratio between the square of the perimeter, P, and the area
of the CGM plot, as represented in Equation 3.6. P is different from DT because DT only
measures the change in glucose level between each pair of consecutive glucose readings,
but P is a measure of the Euclidean distance between the points.
RR = \frac{P^2}{4πµ_{00}}    (3.6)
Bending energy (BE) is a representation of the amount of energy a particle would require to traverse the glucose plot. BE can be calculated by computing the average curvature shown by Equation 3.7:

BE = \frac{1}{P} \sum_{i=1}^{n−2} (θ_{i+1} − θ_i)^2,  where  θ_i = \arctan\left( \frac{y_{i+1} − y_i}{x_{i+1} − x_i} \right)    (3.7)
Direction codes (DCs) are the absolute difference between two consecutive glucose
levels; thus there are n − 1 DCs, where n is the number of blood glucose readings. These
DCs are placed into three bins of size three starting at zero, i.e.,
b1 = [0, 2], b2 = [3, 5] and b3 = [6, 8]. Any DC greater than 8 is not placed into a bin.
There are three features derived from DCs corresponding to the ratio of the size of bin bi
and the total number of direction codes n − 1; thus, DCi = ci /(n − 1), where ci is the total
number of DCs that fall into bin bi .
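As a rough illustration of how some of the simpler features above can be computed from a smoothed 24-hour CGM trace, the sketch below implements DT, AUC, and the direction-code ratios; the glucose series is synthetic, and the real feature extraction may differ in detail.

    import numpy as np

    def distance_traveled(g):
        # DT: sum of absolute differences between consecutive glucose readings.
        return np.sum(np.abs(np.diff(g)))

    def area_under_curve(g):
        # AUC relative to the minimum observed glucose value (Equation 3.1).
        return np.sum(g - g.min())

    def direction_code_ratios(g):
        # DC_i: fraction of consecutive changes falling in bins [0,2], [3,5], [6,8].
        dc = np.rint(np.abs(np.diff(g)))
        n = len(dc)
        return [np.sum((dc >= lo) & (dc <= hi)) / n for lo, hi in [(0, 2), (3, 5), (6, 8)]]

    # Synthetic stand-in for 24 hours of smoothed CGM data at 5-minute intervals.
    rng = np.random.default_rng(0)
    glucose = 130 + 40 * np.sin(np.linspace(0, 6 * np.pi, 288)) + rng.normal(scale=3, size=288)
    features = [distance_traveled(glucose), area_under_curve(glucose), *direction_code_ratios(glucose)]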
In this third study, two additional features were added to the set of features. Both of
these new features quantify the maximum slope during an excursion, one for an increasing
slope, and one for a decreasing slope. Two separate features were chosen, because an
increasing blood glucose level is caused by different factors than a decreasing one, and a
decreasing blood glucose level is considered more dangerous, because that could indicate
that the patient is heading toward hypoglycemia.
The slope is calculated on an excursion from the time the excursion began to the time
when the blood glucose traveled 75 mg/dl or more. Excursions begin at either a peak
(local maxima) or nadir (local minima). A distance greater than 75 mg/dl is included only
when necessary for one end of the excursion or the other to be outside of the normal
range. This calculation follows the prior implementation of the 75 point excursion
frequency feature. Figure 3.2 shows decreasing slopes in red and increasing slopes in
yellow. The intuition is that the slope associated with an excursion is more important than
a slope that is not. Calculating the slope only on excursions also reduces the likelihood of
39
the feature simply representing random signal noise. The maximum daily increasing
slope, and the maximum daily decreasing slope are used as features. The maximum slope
is intended to be more sensitive to large changes, whereas the average slope would not
capture the importance of excursion slopes.
Table 3.1 shows a summary of the features used, as well as the study in which each
feature was first introduced. The third study is the focus of this thesis.
Figure 3.2: Slope of blood glucose levels calculated on excursions. Red lines show where
decreasing slopes are calculated. Yellow lines show increasing slope. Only the maximum
of each are used as features.
3.2.3 Smoothing the Data
The sensor used to collect the CGM data is not perfect, and introduces some signal
noise to the data. The sensor can give readings ±20% of the actual level (Klonoff, 2005).
Table 3.1: Summary of the features used in this study

Feature    Description                                            Study first introduced
MAGE       Mean Amplitude of Glycemic Excursions                  First
EF         Excursion Frequency                                    First
DT         Distance Traveled                                      First
σ          Standard Deviation                                     Second
AUC        Area Under the Curve                                   Second
µ_pq       2-dimensional central moments of order 2 ≤ p+q ≤ 3     Second
ε          Eccentricity                                           Second
FF_i       Amplitudes of low DFT frequencies for 1 ≤ i ≤ 24       Second
RR         Roundness Ratio                                        Second
BE         Bending Energy                                         Second
DC_i       Direction Codes, for 1 ≤ i ≤ 3                         Second
Slope↑     Maximum increasing slope                               Third
Slope↓     Maximum decreasing slope                               Third

The noise caused by this inaccuracy would cause several of the features, such as DT and RR, to be inaccurate, since they are based on the jaggedness of the glucose plot. For
example, the RR would generally be much higher on raw CGM data than on smooth CGM
data. The intuition is that smoothing the data will better represent glucose levels.
The second study (Wiley, 2011) used a cubic spline smoothing filter (Pollock, 1993),
which was identified by physicians as the best of several available smoothing methods. A
spline connects adjacent points with a polynomial function while maintaining continuity.
A cubic spline is a spline that uses a cubic function as the polynomial function that
connects the points. A smoothing function allows the polynomial end points to differ from
the data points that are being modeled. Using equations described by Pollock, it is
possible to give higher weights to certain datapoints such as fingerstick readings over
sensor readings, which adds more flexibility and power to the smoothing algorithm.
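The sketch below is only a stand-in for that approach: it uses SciPy's weighted cubic smoothing spline instead of Pollock's equations, with invented weights that favor a few pretend fingerstick readings over ordinary sensor readings.

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Synthetic noisy CGM readings at 5-minute intervals over 24 hours.
    rng = np.random.default_rng(0)
    minutes = np.arange(0, 24 * 60, 5).astype(float)
    noisy = 130 + 40 * np.sin(minutes / 200.0) + rng.normal(scale=8, size=minutes.size)

    # Weight every 48th reading more heavily, pretending it coincides with a fingerstick.
    weights = np.ones_like(noisy)
    weights[::48] = 5.0

    # k=3 gives a cubic spline; s controls how far the fit may stray from the data.
    spline = UnivariateSpline(minutes, noisy, w=weights, k=3, s=minutes.size * 50.0)
    smoothed = spline(minutes)   # smoothed curve on which the features are computed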
3.2.4 Machine Learning Algorithms
Three machine learning (ML) algorithms were trained on the features in Table 3.1.
These algorithms are Support Vector Regression (SVR), Multilayer Perceptron (MLP),
and Linear Regression (LR). These algorithms are described in detail in Chapter 2.
3.2.5 Algorithm Configurations
Each algorithm was trained and evaluated in several different configurations. For
MLP and LR, there were three binary configuration choices, producing eight combinations: using
forward or backward feature selection (described in Section 3.2.6), using smoothed or raw
data (described in Section 3.2.3), and using or not using the development set as training
data after feature selection and tuning. The SVR used these three configurations with two
different kernel types: Linear and Gaussian (also known as radial basis function, or RBF),
producing 16 combinations.
3.2.6 Feature Selection and Tuning
Not all of the features described in Section 3.2.2 are equally useful. Also, the ML
algorithms can be confused by too many features and uncover patterns in the data that are
only a coincidence of the training data. This phenomenon is called over-fitting, which can
be reduced by choosing a subset of features on which to train the algorithms. Since there
are over 40 features in total, it would not be feasible to try each subset of features since
there are 2^n possible subsets. For this reason, greedy algorithms are used to select features. There
are two types of greedy algorithms used for feature selection in this experiment.
The first feature selection algorithm is a forward selection wrapper. This algorithm initializes with the set S = ∅; then, for each feature f ∈ S̄, where S̄ is the complement of S, train on S ∪ {f} and evaluate on the development set. Select the feature f′ such that the Root Mean Square Error (RMSE) is minimized on the development set and set S = S ∪ {f′}. If the RMSE is the minimum so far, set S′ = S and repeat until S̄ = ∅; otherwise, stop and return S′ as the feature set.
The other feature selection algorithm is a backward elimination wrapper. Backward elimination differs from forward selection by starting with the set S = all features. For each feature f ∈ S, train on S − {f} and select the feature f′ such that the RMSE is minimized on the development set, and set S = S − {f′}. If the RMSE is the minimum so far, set S′ = S. Repeat until S = ∅ and select S′ as the feature set.
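A minimal sketch of the forward selection wrapper described above, assuming a helper evaluate(feature_subset) that trains the model on the training set with only those features and returns its RMSE on the development set:

# Greedy forward selection: add one feature at a time, keeping the best
# development-set RMSE seen so far, and stop when adding no longer helps.
def forward_selection(all_features, evaluate):
    selected, best_subset = [], []
    best_rmse = float("inf")
    while len(selected) < len(all_features):
        remaining = [f for f in all_features if f not in selected]
        # Add the single feature whose inclusion gives the lowest dev-set RMSE.
        candidate, rmse = min(((f, evaluate(selected + [f])) for f in remaining),
                              key=lambda pair: pair[1])
        selected = selected + [candidate]
        if rmse < best_rmse:          # improvement: remember this subset and continue
            best_rmse, best_subset = rmse, list(selected)
        else:                         # no improvement: stop and return the best so far
            break
    return best_subset, best_rmse

The backward elimination wrapper is analogous, starting from the full feature set and removing one feature at a time.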
When the subset of features has been chosen by either the forward selection or
backward elimination wrapper, the tuning process can begin. Each algorithm has its own
set of parameters to tune. The SVM with the Gaussian kernel tunes the cost (C) and the
kernel width (γ) parameters. A grid search of these two parameters is performed, with each ranging from 0.001/N to 10000/N, where N is the number of training instances. The SVM with the linear kernel has only the cost parameter. The LR has only the regularization coefficient parameter (λ), which takes values ranging from 1.0×10^−10/N to 1000/N. The
parameters are doubled each iteration because exponential growth of the parameters is
considered a practical method of identifying good parameters (Hsu et al., 2003). The MLP
has learning rate and momentum parameters which take the values {0.2, 0.5, 0.8}. The
parameters for each of these algorithms are described in detail in Chapter 2.
Similar to feature selection, the tuning process trains on the training set and evaluates
on the development set to choose the best tuning parameters based on the RMSE on the
development set. The process of feature selection and parameter tuning is performed for
each of the folds in the 10-fold cross validation.
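A sketch of the doubling grid search for the Gaussian-kernel SVR, assuming a helper dev_rmse(C, gamma) that trains on the training set with the given parameters and returns the development-set RMSE:

# Build the grid by doubling from 0.001/N up to 10000/N, then pick the
# (C, gamma) pair with the lowest RMSE on the development set.
def doubled_grid(low, high):
    values, v = [], low
    while v <= high:
        values.append(v)
        v *= 2.0
    return values

def tune_svr(n_train, dev_rmse):
    grid = doubled_grid(0.001 / n_train, 10000.0 / n_train)
    best = None
    for c in grid:
        for gamma in grid:
            rmse = dev_rmse(c, gamma)
            if best is None or rmse < best[0]:
                best = (rmse, c, gamma)
    return best   # (best RMSE, best C, best gamma)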
3.2.7 10-Fold Cross Validation
For training, tuning and testing a machine learning model, there are three sets of
instances – a training set, a development set, and a testing set. The training and
development sets are used to prepare the model for the testing set, which contains
instances that were not used to create the model. The evaluation of the model on the
testing set is a representation of how the model would perform on new, unseen data. The
model is trained on instances from the training set. However, to tune the parameters and
do feature selection (Section 3.2.6) to improve performance, the model is evaluated on the
development set. This way, the model is configured with a feature set and algorithmic
parameters to perform the best it can on the development set. If the development and
training sets are a good representation of all of the instances in the domain, and there are
enough instances in the sets, the model should also perform well on the testing set.
If there are too few instances to adequately create distinct training, development, and
testing sets, a method called 10-fold cross validation can be used. In this method, some of
the instances are held out for development, while the rest of the instances are segmented
into 10 folds. Each fold of the 10-fold cross validation contains a training set (90% of the
instances) and a testing set (10% of the instances). Each instance is bound to exactly one
testing set, so each instance is used for testing once and for training nine times over the 10
folds. The idea behind 10-fold cross validation is to mimic a large testing set and a large
training set without having a large number of instances.
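A sketch of this evaluation scheme using scikit-learn, with stand-in data in place of the real features and consensus ratings; the per-fold feature selection and tuning on the development set is omitted for brevity:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVR

# Stand-in data: 250 instances with 5 features each, mirroring the 50/200
# development/cross-validation split used in this study.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(250, 5)), rng.normal(size=250)
X_dev, y_dev = X[:50], y[:50]          # development set (feature selection, tuning)
X_cv, y_cv = X[50:], y[50:]            # 200 instances for the 10-fold cross validation

squared_errors = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X_cv):
    model = SVR(kernel="rbf").fit(X_cv[train_idx], y_cv[train_idx])   # 180 training instances
    preds = model.predict(X_cv[test_idx])                             # 20 test instances
    squared_errors.extend((preds - y_cv[test_idx]) ** 2)

print("RMSE over all 200 test instances:", np.sqrt(np.mean(squared_errors)))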
3.2.8 Datasets
For the experiments in this study, 250 CGM plots were rated by the physicians. Since
there are relatively few instances, 10-fold cross validation is used, as described in
Section 3.2.7. For this experiment, 50 of the instances were set aside as the development
set, leaving the other 200 to be used in the 10-fold cross validation. Therefore, 20
instances were tested on for each fold, and 180 were trained on.
3.2.9 Defining the CPGV Metric
A metric was sought that matched physician consensus ratings as closely as possible.
The performance of each configuration of each algorithm model was measured by the
RMSE over the 10 folds. Any difference between the output of the model and the
consensus rating of the physicians represents an error. The square of the error on the
testing sets was averaged over the 10 folds to calculate the RMSE over all 200 CGM plots.
The configuration/algorithm combination with the lowest RMSE was used to build the
model for the CPGV metric. This model was trained on all 200 CGM plots that were used
in the 10-fold cross validation, and tuned on the same 50 development plots. Results of
the evaluation of the CPGV metric are reported in Section 3.3.1.
3.2.10 Screen for Excessive Glycemic Variability
To obtain a screen for excessive glycemic variability, the CPGV metric was used to
classify CGM plots as exhibiting excessive or acceptable glycemic variability, based on
the value of the metric. A threshold value of the metric was chosen to separate excessive
from acceptable glycemic variability. A value less than the threshold would be acceptable
glycemic variability, and a value greater than or equal to the threshold would be excessive
glycemic variability. Two hundred sixty-two CGM plots were collected for a previous
study (Wiley, 2011) and reused for this purpose in this experiment. In that study, two
physicians, Frank Schwartz and Jay Shubrook, classified CGM plots manually, providing
a gold standard for correct classification. However, 64 of the 262 CGM plots were
included in the dataset used to develop the CPGV metric. In evaluating the performance of
the CPGV metric as a screen, it was important to avoid any statistical bias due to this
overlap. Ideally, there would have been no overlap, but excluding the 64 plots would have
resulted in a smaller evaluation set. To avoid testing the metric on plots used for training
the metric, values were taken from folds of the cross validation in which these plots were
used only for testing. Results of the evaluation of the screen for excessive glycemic
variability are reported in Section 3.3.2.
3.3 Results
3.3.1 CPGV Metric Performance
Table 3.2 shows a comparison of the ability of the best machine learning algorithms
and the individual physicians to match the consensus ratings. The individual physicians’
ability to match the consensus is shown for two different ways of calculating the
consensus. Normally, the consensus for a plot is the average of all of the physicians’
ratings of that plot. This consensus (physicians inclusive) represents the inter-rater
agreement and a gold standard for the models to achieve. However, because it is easier for
an individual physician to match a consensus that includes his or her own rating, a second
consensus (physicians exclusive) was computed to exclude the individual’s own rating.
Physicians exclusive tests the ability of an individual physician to agree with the
consensus of other doctors who rated the same CGM plot. The Mean Absolute Error
(MAE) column represents the average difference from the consensus. Since the physicians
only give ratings in whole numbers, the model outputs were rounded to the nearest integer
for comparison.
The best model was the SVR using smoothed CGM data on a Gaussian kernel with
features chosen by the greedy forward selection method. This model performed slightly
better when using the development set as training examples after features and tuning
parameters were computed. The CPGV metric was created using an SVR with this
configuration. It was trained on all 200 CGM plots that were used in the 10-fold cross
validation, with feature selection and parameter tuning on the 50 development plots.
Table 3.2: Performance of the best MLP, SVR and LR models compared to the performance of individual doctors at matching the consensus
                        RMSE (floating pt)   MAE (floating pt)   RMSE (integer)   MAE (integer)
Best SVR                0.417                0.316               0.511            0.376
Best MLP                0.496                0.392               0.575            0.445
Best LR                 0.471                0.354               0.541            0.397
Physicians inclusive                                             0.489            0.355
Physicians exclusive                                             0.699            0.509
Table 3.2 shows that the CPGV metric performs comparably to the physicians when
constrained to integer evaluations. When the metric is not constrained to integer
evaluations, it performs much better at matching the consensus than individual physicians.
3.3.2 Performance of the Excessive Glycemic Variability Screen
A glycemic variability metric can be used as a screen by choosing a cutoff value to
separate acceptable from excessive glycemic variability. The CPGV metric was compared
to several other glycemic variability metrics as screens. Excursion frequency, distance
traveled, standard deviation, and MAGE, which were described in Section 3.2.2, were
evaluated for comparison. One additional metric, Interquartile Range (IQR) (Rodbard,
2009b), was was also evaluated. The IQR is a metric which measures glycemic variability
by finding the difference of blood glucose readings at the 75th and 25th percentiles. This
method effectively measures the range of blood glucose while treating the highest and
lowest 25% of readings as outliers, giving the size of the spread of the readings.
The screens were evaluated on 262 CGM plots that were previously annotated by two
physicians, Frank Schwartz and Jay Shubrook (Wiley, 2011). The threshold value for each
metric was experimentally chosen as the one that gives the best accuracy when trained on
a randomly selected development set. This process was repeated 10 times with the
accuracy, sensitivity, and specificity averaged over the 10 tests. Table 3.3 compares the
performance of the screens. Statistical significance was calculated between CPGV and the
other metrics using a one-tailed paired sample t-test. P-values are reported where
statistical significance was found. Table 3.3 shows that the CPGV metric classifies
statistically significantly better than all of the other metrics based on accuracy and
sensitivity and better than STD, MAGE, and IQR based on specificity.
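A sketch of how a metric can be turned into a screen in this way, assuming scores is a NumPy array of metric values and labels an array of physician classifications (1 for excessive, 0 for acceptable); the development fraction and variable names are illustrative:

import numpy as np

# Pick the cutoff that maximizes accuracy on a randomly selected development
# subset, then report accuracy, sensitivity, and specificity on the rest.
def screen_performance(scores, labels, dev_fraction=0.3, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(scores))
    n_dev = int(len(scores) * dev_fraction)
    dev, test = idx[:n_dev], idx[n_dev:]

    candidates = np.unique(scores[dev])
    best_t = max(candidates,
                 key=lambda t: np.mean((scores[dev] >= t) == labels[dev]))

    pred = scores[test] >= best_t
    actual = labels[test].astype(bool)
    accuracy = np.mean(pred == actual)
    sensitivity = np.mean(pred[actual])       # true positive rate
    specificity = np.mean(~pred[~actual])     # true negative rate
    return best_t, accuracy, sensitivity, specificity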
The Receiver Operator Characteristic (ROC) curve in Figure 3.3 shows the
comparison of the classification performance of the metrics. The threshold value is varied to compute the false positive and true positive rates. The closer the
line is to the top left corner, the better. As can be seen from Figure 3.3, the CPGV metric
outperforms all of the other metrics as a screen.
Table 3.3: Classification performance of CPGV, EF, DT, STD, MAGE, and IQR. p values show a one-tailed paired t-test comparison to the CPGV metric
Metric                                            Accuracy     Sensitivity   Specificity
Consensus Perceived Glycemic Variability (CPGV)   90.1%        97.0%         74.1%
Excursion Frequency (EF)                          84.3%        92.0%         66.1%
                                                  p < 0.005    p < 0.05
Distance Traveled (DT)                            83.9%        89.1%         72.3%
                                                  p < 0.005    p < 0.005
Standard Deviation (STD)                          83.6%        91.6%         64.8%
                                                  p < 0.001    p < 0.005     p < 0.05
Mean Amplitude of Glycemic Excursions (MAGE)      80.8%        90.9%         56.7%
                                                  p < 0.001    p < 0.05      p < 0.005
Interquartile Range (IQR)                         78.7%        91.8%         47.2%
                                                  p < 0.001    p < 0.05      p < 0.005
Figure 3.3: ROC curve showing CPGV, EF, DT, STD, MAGE, and IQR
3.4 Discussion
The CPGV metric gives a single numeric value that can be used to quantify the
glycemic variability of a 24-hour CGM plot, as represented in Figure 3.4.

Figure 3.4: CGM plots automatically rated by the CPGV metric (example panel ratings: CPGV = 1.4, 2.0, 3.1, and 3.9).

The CPGV metric can be run on as many days of data as a patient has, as long as there is enough data to compute the metric.
CPGV, or percentage of days with excessive variability. Correlations can also be made
with predictable events such as weekends, holidays, exams, menstrual cycles, etc.
A recent recommendation to standardize reporting of glycemic control for use in
clinical decision making (Bergenstal et al., 2013) calls for a glycemic variability metric.
The metrics that were considered by Bergenstal et al. were STD and IQR. The CPGV
metric has been shown to match the consensus of multiple expert physicians significantly
better than STD (p < 0.001) and IQR (p < 0.001).
High glucose variability has been used to predict hypoglycemia (Monnier et al.,
2011; Qu et al., 2012). Both of these studies used STD and MAGE, finding a significant
correlation between high variability and frequency of future hypoglycemic events. Using
the CPGV metric instead of STD or MAGE could lead to more accurate predictions.
4 Blood Glucose Prediction
This chapter describes blood glucose level prediction for patients with Type 1
diabetes (T1D). The background and previous work are described, and then the methods
of the physiological model and machine learning are explained. Finally, the results and
discussion are presented. A paper based on the work presented in this chapter has been
submitted to the International Conference on Machine Learning and Applications
(ICMLA).
4.1 Background
The problem of predicting blood glucose falls within the domain of a time series
prediction. A time series prediction uses all of the information up until a cut-off time t, to
predict the value of a variable at time t + l. This work uses the data described in
Section 4.2.1 to predict the value of blood glucose (BG) at two future times: t + 30
minutes, and t + 60 minutes.
4.1.1 Previous Work
In a preliminary study (Marling et al., 2011; Marling et al., 2012), simple Support Vector Regression (SVR) models were shown to outperform a simple baseline of BG(t0), i.e., unchanging blood glucose, based on Root Mean Square Error (RMSE). These models
use an arbitrary pivot point about 1 month into the patient’s study. The seven days before
the pivot point were used as training data, while the 3 days after and including the pivot
date were used as testing data. Two separate SVR models were trained, one for a 30
minute prediction, the other for a 60 minute prediction. These models used the following
features:
1. Blood glucose level at present time, t0 .
2. Moving average over the previous four blood glucose levels before, and including,
the prediction time.
3. Exponentially smoothed rate of change over the previous four blood glucose levels
before, and including, the prediction time.
4. Insulin bolus dosage totals over 30 minutes before the prediction time computed
over intervals of 10 or 30 minutes.
5. Insulin basal rate averages over 5 or 15 minutes before the prediction time.
6. Carbohydrate consumption over 30 minutes before the prediction time computed
over intervals of 15 or 30 minutes
7. Exercise duration and intensity averages over 5, 30, or 60 minutes before the
prediction time.
In later work (Wiley, 2011), an Auto Regressive Integrated Moving Average
(ARIMA) model was investigated as a new feature for input to the SVR model. An
ARIMA model is a statistical model of a time series that uses past data points to make
forecasts about the time series in the future. The ARIMA model itself can be used as a
predictor, as well as to generate a feature to inform the SVR models. The ARIMA model
was trained on four days of CGM data immediately before the test point. The SVR model
used a pivot date on a Sunday about 1 month into the patient’s study. The SVR model was
trained on 14 days before the pivot date and tested on the 14 days after the pivot date.
Each of the features was designed as a parameterized template that was tuned using a grid
search on two weeks of training and one week of development data before the pivot date.
The investigation used two sets of features for the SVR. SVR1 used features 1 through 6
from the preliminary study, as well as the following features:
8. The ARIMA forecast for the prediction time (either 30 or 60 minutes in the future).
9. The amount of time spent exercising over the previous hour.
10. The amount of time spent sleeping over the previous hour.
11. The amount of time spent working over the previous hour.

Table 4.1: Previous study prediction results.
Horizon   t0     ARIMA   SVR1   SVR2
30 min    19.5   4.5     4.7    4.5
60 min    35.5   17.9    17.7   17.4
SVR2 used features 1, 2, 3, and 8 above. The CGM data used in the investigation was
smoothed to reduce sensor noise. The smoothing algorithm was the same cubic spline
smoothing algorithm described in Section 3.2.3. This algorithm uses both past and future
information to move each point to form a smooth curve. Since the prediction point was
also smoothed, this results in an easier prediction problem. Table 4.1 shows the results
from this study. As can be seen from the table, the simple ARIMA model performed
similarly to the SVR models.
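A sketch of producing such ARIMA forecasts with the statsmodels library; the (p, d, q) order and the simulated CGM history are illustrative assumptions, not the configuration used in these studies:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulated stand-in for roughly four days of 5-minute CGM readings.
history = 120 + 25 * np.sin(np.arange(1152) / 40.0)

model = ARIMA(history, order=(2, 1, 1)).fit()
forecast = model.forecast(steps=12)        # next 60 minutes in 5-minute steps

bg_30 = forecast[5]    # forecast 30 minutes ahead (6th step)
bg_60 = forecast[11]   # forecast 60 minutes ahead (12th step)
print(bg_30, bg_60)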
4.2 Methods
This section describes the methods used in this study for blood glucose prediction.
4.2.1 Data
The blood glucose prediction model takes advantage of a comprehensive database of
patient data accumulated from trials involving patients with Type 1 diabetes. This work
uses the data from the third 4DSS™ trial, in which patients reported or submitted the following information:
• CareLink data which includes:
– CGM readings in five minute intervals.
– Finger stick readings.
– Insulin bolus dosages.
– Bolus type.
– Basal levels.
– Temporary basals.
• Time they went to bed.
• Time they woke up.
• Time they went to work.
• Time they returned from work and work intensity on a revised Borg scale of
perceived physical exertion (Borg, 1982) of 1 to 10.
• Time, duration, and type of exercise and intensity on the same Borg scale of 1 to 10.
• Time of hypoglycemia events with the corrective action (carbohydrates and food
composition), and symptoms felt.
• Time of meal, type of meal and composition, and estimated carbohydrates.
• Time and type of stress.
• Time of illness or injury.
• Time of menses.
• Time of a pump problem.
The CareLink data is recorded by the patient’s pump, which automatically records the
time of the events. The rest of the data was reported by the patient through a web
browser-based interface on a personal computer at the end of each day, as described in
(Maimone, 2006).
Two hundred test points were chosen from 5 patients - 40 from each. The test points
were chosen to represent a wide range of situations: different times of day and night; soon
or long after a variety of life events; on rising, falling, and static points of the blood
glucose curve; or before, after, or far away from local optima and inflection points in the
blood glucose curve.
4.2.2 Physiological Model
To better capture the effects of carbohydrates and insulin in the body, a physiological
model is used. The physiological model attempts to characterize meal absorption
dynamics, insulin dynamics, and glucose dynamics. The physiological model is based on
equations presented in (Duke, 2009), with a few adaptations to better match published
data and feedback from physicians. The physiological model used in this work was
developed by Razvan Bunescu at Ohio University.
The physiological model is represented by its state variables X, input variables U,
and a state transition function that computes the next state variables given the current state
variables and the input variables, i.e., Xt+1 = f (Xt , Ut ). The state variables X are organized
as follows:
1. Meal Absorption Dynamics:
• Cg1 (t) = carbohydrate consumption (g).
• Cg2 (t) = carbohydrate digestion (g).
2. Insulin Dynamics:
• Is(t) = subcutaneous insulin (µU).
• Im(t) = insulin mass (µU).
• I(t) = level of active plasma insulin (µU/ml).
3. Glucose Dynamics:
• Gm(t) = blood glucose mass (mg).
• G(t) = blood glucose concentration (mg/dl).

Figure 4.1: Variable dependency for calculating the next state for the physiological model. Figure by Razvan Bunescu.
The vector of input variables, U, contains the carbohydrate intake UC(t), measured in grams (g), and the insulin intake UI(t), measured in units of insulin (U). The
insulin intake is computed from the bolus and basal rate data from the patient’s pump. The
carbohydrate data is from the patient’s report for meals, snacks, and hypoglycemia
corrections. Figure 4.1 shows the interaction between the state variables at time t to
compute the state at time t + 1. The boxes represent a single state variable. The set of state
variables, X, is on the horizontal, and the passage of time is on the vertical, where moving
down is later. Each state variable is computed by an equation that is dependent on the
variables pointing to it.
The equations for the state variables are as follows:

Meal Absorption Dynamics:
• Cg1(t + 1) = Cg1(t) − α1c * Cg1(t) + UC(t)
• Cg2(t + 1) = Cg2(t) + α1c * Cg1(t) − α2c / (1 + 25/Cg2(t))

Insulin Dynamics:
• Is(t + 1) = Is(t) − αfi * Is(t) + UI(t)
• Im(t + 1) = Im(t) + αfi * Is(t) − αci * Im(t)

The equation for glucose dynamics is Gm(t + 1) = Gm(t) + ∆abs − ∆ind − ∆dep − ∆clr + ∆egp, where:
• ∆abs (absorption) = α3c * α2c / (1 + 25/Cg2(t))
• ∆ind (insulin independent utilization) = α1ind * sqrt(G(t))
• ∆dep (insulin dependent utilization) = α1dep * I(t) * (G(t) + α2dep)
• ∆clr (renal clearance, when G(t) > 115) = α1clr * (G(t) − 115)
• ∆egp (endogenous liver production) = α2egp * exp(−I(t)/α2egp) − α1egp * G(t)

Finally, the glucose and insulin concentrations are scalar multiples of their respective masses, given by the following equations, where bm refers to body mass and IS refers to insulin sensitivity:
• G(t) = Gm(t) / (2.2 * bm)
• I(t) = Im(t) * IS / (142 * bm)
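A sketch of the state transition X(t+1) = f(X(t), U(t)) implied by these equations; all of the α coefficients, the body mass, and the insulin sensitivity below are placeholder values for illustration, not the tuned constants of the actual model:

import math

# Placeholder parameters (illustrative only, not the thesis values).
P = dict(a1c=0.05, a2c=0.2, a3c=1.0, afi=0.04, aci=0.02,
         a1ind=0.5, a1dep=0.0005, a2dep=10.0, a1clr=0.05,
         a1egp=0.005, a2egp=1.5, bm=75.0, IS=1.0)

def step(state, carbs_in, insulin_in, p=P):
    Cg1, Cg2, Is, Im, Gm = (state[k] for k in ("Cg1", "Cg2", "Is", "Im", "Gm"))
    G = Gm / (2.2 * p["bm"])                       # glucose concentration (mg/dl)
    I = Im * p["IS"] / (142 * p["bm"])             # active plasma insulin

    # Guard against division by zero when no carbohydrates are digesting.
    digestion = p["a2c"] / (1 + 25 / Cg2) if Cg2 > 0 else 0.0
    d_abs = p["a3c"] * digestion                   # meal absorption
    d_ind = p["a1ind"] * math.sqrt(G)              # insulin-independent utilization
    d_dep = p["a1dep"] * I * (G + p["a2dep"])      # insulin-dependent utilization
    d_clr = p["a1clr"] * (G - 115) if G > 115 else 0.0            # renal clearance
    d_egp = p["a2egp"] * math.exp(-I / p["a2egp"]) - p["a1egp"] * G  # liver production

    return {"Cg1": Cg1 - p["a1c"] * Cg1 + carbs_in,
            "Cg2": Cg2 + p["a1c"] * Cg1 - digestion,
            "Is":  Is - p["afi"] * Is + insulin_in,
            "Im":  Im + p["afi"] * Is - p["aci"] * Im,
            "Gm":  Gm + d_abs - d_ind - d_dep - d_clr + d_egp}

state = {"Cg1": 0.0, "Cg2": 0.0, "Is": 0.0, "Im": 0.0, "Gm": 120 * 2.2 * P["bm"]}
state = step(state, carbs_in=45.0, insulin_in=4.0)    # a meal and a bolus
print(state["Gm"] / (2.2 * P["bm"]))                  # new glucose concentration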
4.2.3 Feature Vector
For each blood glucose reading at time t, a feature vector is computed. Each feature vector can have one of two target values: the blood glucose level at time t+30 or at time t+60. A
separate SVR model is trained for each target value. The feature vector consists of
features computed from:
• Blood glucose level at present time, t0 .
• Blood glucose deltas. The values of BG(t0 ) − BG(t−5i ) for 1 ≤ i ≤ 12.
• ARIMA forecasts. The values of the ARIMA forecast for time t+5i for 1 ≤ i ≤ 12.
• Physiological model. The values of the components of the physiological model for
Cg1 , Cg2 , IS , Im , and Gm at time t+30 and t+60 as well as the change of the components
between time t0 and t+30 , and between t0 and t+60 .
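A sketch of assembling this feature vector for one prediction time t, assuming bg holds the most recent 5-minute readings ending at t, and that arima_forecast(i) and phys_state(minutes_ahead) are helpers returning the ARIMA forecast for t+5i and the physiological state variables at a future time; these helper names are illustrative:

# Build the feature vector from the components listed above.
def build_features(bg, arima_forecast, phys_state):
    features = [bg[-1]]                                       # blood glucose at t0
    features += [bg[-1] - bg[-1 - i] for i in range(1, 13)]   # deltas BG(t0) - BG(t-5i)
    features += [arima_forecast(i) for i in range(1, 13)]     # ARIMA forecasts for t+5i

    now = phys_state(0)
    for minutes in (30, 60):
        future = phys_state(minutes)
        for name in ("Cg1", "Cg2", "Is", "Im", "Gm"):
            features.append(future[name])                 # predicted state component
            features.append(future[name] - now[name])     # change from t0
    return features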
4.2.4 Walk Forward Testing
Figure 4.2 shows how the machine learning model was designed to train and tune on
past data and test on future data. For each test point, the model uses the week prior to the
point as training data, and all of the data before the training data as development data for
tuning. A new prediction model is trained and tuned for each test point. It is important to
note that the model does not use training vectors for which the target time is after the time
for the prediction test point. For example, if the prediction point is BGt+30 , then the
training vector for the preceding blood glucose reading cannot be used since the
prediction time for it is equal to BGt+25 , which is in the future.
To effectively tune the model, there needs to be at least a full week of CGM data for
training and a full week of CGM data for tuning. This experiment includes some test
points where there is not enough data to effectively tune the parameters for the model.
Generic parameters were used for those test points. To obtain the generic parameters, 100
validation points were chosen at random from a separate dataset from two patients. These
separate 100 points start far enough into the patient’s study to ensure that there is enough
data for a full week of training. A grid search was performed over the 100 validation
points to determine the set of parameters with the lowest RMSE. This set of parameters
became the generic parameters used for test points without at least two weeks of prior CGM data.

Figure 4.2: Illustration of the walk-forward testing scheme used for the machine learning model. The darkest section, which is tested on, represents the feature vector for a single time value.
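A sketch of the walk-forward loop, assuming examples is a chronologically ordered list of (time, features, target_time, target) tuples and train_tune_predict is a helper that builds, tunes, and applies a model; both names are illustrative:

from datetime import timedelta

# For each test point: train on the week before it, tune on all earlier data,
# and drop any training example whose target time falls at or after the test time.
def walk_forward(test_points, examples, train_tune_predict):
    predictions = []
    week = timedelta(days=7)
    for test_time, test_features in test_points:
        train = [e for e in examples
                 if test_time - week <= e[0] < test_time and e[2] < test_time]
        dev = [e for e in examples if e[0] < test_time - week]
        predictions.append(train_tune_predict(train, dev, test_features))
    return predictions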
4.2.5 Baselines for Evaluation
To gauge the effectiveness of the prediction model, this experiment uses four
baselines for comparison:
• A simple BG(t0 ) value. This baseline predicts that the blood glucose level will stay
the same.
• ARIMA forecast for time t+30 for the 30 minute prediction model or for time t+60 for
the 60 minute prediction model.
• An SVR model trained on the features used for the SVR1 model described in
Section 4.1.1. This model was trained and tuned using the walk forward method
described in Section 4.2.4.
• The predictions of three expert physicians. These physicians were given the same
test points as the model and were asked to predict the blood glucose levels using a
GUI.
Figure 4.3 shows the GUI that the physicians used. The GUI was developed by Melih
Altun, based on a visualization program originally developed by Wesley Miller (Miller,
2009). This figure shows the blood glucose curve in blue with a prediction point around
8:45pm. The dashed vertical line around 9:45pm represents the time for the 60 minute
prediction. The physician is able to use the mouse pointer to place a prediction value on
this vertical line. The top of the image shows many dots representing the various life
events which the patient experienced. The physician was blocked from seeing future
blood glucose levels, but was able to view any day prior to the day containing the test
point. The two black dots at around 10:00am and 10:30am are visual feedback from
predictions for a test point earlier in the same day.
4.2.6 Evaluation
The standard evaluation metric for the model and the baselines is RMSE. During the
physician prediction sessions, it was noticed that one of the physicians was good at
identifying the direction of the blood glucose change, but overestimated the magnitude of
the change. To capture this ability, an additional ternary evaluation metric was developed.
Figure 4.3: A screen capture of the GUI used by the physicians to predict future blood
glucose levels.
The ternary metric uses three categories for the change in the blood glucose level:
decrease, stay the same, or increase. If the blood glucose deviates by less than or equal to
5 mg/dl for the 30 minute prediction, or less than or equal to 10 mg/dl for the 60 minute
prediction, it is classified as staying the same. If the blood glucose increases or decreases
by more than that, then it is classified as increasing or decreasing, accordingly.
Table 4.2: RMSE baselines for the prediction dataset.
                        t0     ARIMA   SVR    Physician 1   Physician 2   Physician 3
30 minute prediction    27.6   22.9    23.7   19.8          21.2          34.1
60 minute prediction    43.8   42.2    41.8   38.4          40.0          47.0
The ternary metric uses a cost evaluation based on the following matrix, where the rows are the predicted class and the columns are the actual class:

                      Actual: Decrease   Same   Increase
Predicted Decrease               0         1        2
Predicted Same                   1         0        1
Predicted Increase               2         1        0        (4.1)
A prediction in the same class as the actual change results in a cost of zero. A prediction
of an adjacent class results in a cost of one. A prediction of the wrong direction results in
a cost of two. The value for the ternary metric is the total cost for all of the test points.
When using RMSE or ternary cost, a lower score is considered better.
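A sketch of the ternary cost computation using the bands and cost matrix described above; the function names are illustrative:

# Classify each change as decrease, same, or increase using the 5 mg/dl
# (30 minute) or 10 mg/dl (60 minute) band, then total the costs from (4.1).
COST = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]   # rows: predicted class, columns: actual class

def ternary_class(start_bg, end_bg, band):
    change = end_bg - start_bg
    if abs(change) <= band:
        return 1                        # stayed the same
    return 2 if change > 0 else 0       # increase / decrease

def ternary_cost(start_bgs, predicted_bgs, actual_bgs, horizon_minutes):
    band = 5.0 if horizon_minutes == 30 else 10.0
    total = 0
    for start, pred, actual in zip(start_bgs, predicted_bgs, actual_bgs):
        total += COST[ternary_class(start, pred, band)][ternary_class(start, actual, band)]
    return total

# Example: two 30-minute test points.
print(ternary_cost([100, 150], [112, 148], [95, 160], horizon_minutes=30))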
4.3 Results
Table 4.2 shows the RMSE scores for the baselines. As seen in the table, Physician 1
performed best for both 30 minute and 60 minute predictions, followed by Physician 2.
Physician 3 had the highest scores when using RMSE as the evaluation metric, but did
better when using the ternary cost as the evaluation metric, as shown in Table 4.3.
The simplest baseline, t0 , is comparable to the other baselines at predicting blood
glucose for 60 minutes, but worse at predicting for 30 minutes when using RMSE as the
evaluation metric. For the 30 minute prediction, the RMSE scores for all of the other
baselines except Physician 3 are significantly lower than the score for t0 (p < 0.01). For
the 60 minute prediction, only the RMSE scores for Physician 1 (p < 0.01) and Physician
2 (p < 0.05) are significantly lower than the score for t0.

Table 4.3: Ternary cost baselines for the prediction dataset.
                        t0    ARIMA   SVR   Physician 1   Physician 2   Physician 3
30 minute prediction    159   107     95    81            86            115
60 minute prediction    151   131     118   106           119           113

However, when using ternary
cost as the evaluation metric, all other baselines have significantly lower scores than t0 for
both the 30 and 60 minute predictions (p < 0.01).
Table 4.4 shows the results for the new prediction model. The table shows that the
new prediction model performs better than all baselines when using RMSE as the
evaluation metric, and better than all of the baselines for 60 minute predictions when
using the ternary cost as the evaluation metric. The new prediction model performs better
than all but Physician 1 and Physician 2 for 30 minute predictions when using the ternary
cost as the evaluation metric.
Table 4.5 shows the p-values from a one tailed paired t-test comparison of the
baselines to the new prediction model. As seen in the table, the new prediction model is
significantly better than t0 and ARIMA for both 30 minute and 60 minute predictions when
using either RMSE or ternary cost as the evaluation metric. The new prediction model is
also significantly better than the SVR baseline for both 30 and 60 minute predictions when
using RMSE as the evaluation metric and comparable when using ternary cost as the
evaluation metric. The new prediction model performs comparably to the physicians
overall; it is significantly better than the physicians in four cases, as shown in Table 4.5.
Table 4.4: Results for the new prediction model.
                        RMSE   Ternary cost
30 minute prediction    19.5   88
60 minute prediction    35.7   105

Table 4.5: Statistical significance of the improvement of the new prediction model over the baselines. NS denotes not statistically significant.
                     t0           ARIMA        SVR          Physician 1   Physician 2   Physician 3
30 minute RMSE       p < 0.0001   p < 0.001    p < 0.0005   NS            NS            p < 0.0001
60 minute RMSE       p < 0.0001   p < 0.0005   p < 0.0005   NS            p < 0.01      p < 0.001
30 minute Ternary    p < 0.0001   p < 0.05     NS           NS            NS            p < 0.01
60 minute Ternary    p < 0.0001   p < 0.05     NS           NS            NS            NS

The Clarke Error Grid Analysis (CEGA) is an analysis method first developed by Clarke et al. (1987) for evaluating blood glucose sensor accuracy, but it can also be useful for evaluating the goodness of blood glucose predictions. Figure 4.4 shows a Clarke Error Grid. The regions of the grid represent the clinical consequences of prediction accuracy.
Region A represents clinical accuracy. Region B represents benign clinical error. Region
C represents erroneous predictions suggesting treatment that is unnecessary, but not
harmful. Region D represents erroneous predictions that fail to detect problems requiring
treatment. Region E represents erroneous predictions suggesting the wrong clinical
treatment, which could endanger patients. Predictions within regions A and B are
considered clinically acceptable, but predictions within regions C, D, and E represent
errors with potential clinical consequences. Tables 4.6 and 4.7 show the percentages of
predictions falling within each CEGA region, for 30 and 60 minute predictions,
respectively. For 30 minute predictions, the new prediction model tied the SVR baseline
with predictions in region A, and only had fewer predictions in region A than Physician 1.
For 60 minute predictions, the new prediction model tied Physician 3 with predictions in
region A, and had more predictions in region A than all other baselines.

Figure 4.4: The Clarke Error Grid.
Figures 4.5, 4.6, 4.7, 4.8, 4.9, and 4.10 show the CEGA for the baselines. Figure 4.11
shows the CEGA for the new prediction model.
Figure 4.5: t0 baseline CEGA (30 and 60 minute predictions).
Figure 4.6: ARIMA baseline CEGA (30 and 60 minute predictions).
Figure 4.7: SVR baseline CEGA (30 and 60 minute predictions).
Figure 4.8: Physician 1 baseline CEGA (30 and 60 minute predictions).
Figure 4.9: Physician 2 baseline CEGA (30 and 60 minute predictions).
Figure 4.10: Physician 3 baseline CEGA (30 and 60 minute predictions).
Figure 4.11: New prediction model CEGA (30 and 60 minute predictions).
Table 4.6: CEGA Region percentages for 30 minute predictions.
Region   t0       ARIMA    SVR      Physician 1   Physician 2   Physician 3   New Prediction Model
A        78.57%   81.12%   84.70%   86.73%        83.67%        77.04%        84.70%
B        19.39%   17.35%   14.29%   11.73%        14.80%        19.90%        13.78%
C        0%       0%       0%       0%            0%            0%            0%
D        2.04%    1.53%    1.02%    1.53%         1.53%         3.06%         1.53%
E        0%       0%       0%       0%            0%            0%            0%
Table 4.7: CEGA Region percentages for 60 minute predictions.
Region   t0       ARIMA    SVR      Physician 1   Physician 2   Physician 3   New Prediction Model
A        57.14%   58.16%   64.29%   65.31%        64.29%        67.35%        67.35%
B        35.71%   35.71%   30.10%   29.08%        30.61%        25.51%        26.53%
C        0%       0.51%    0%       0%            0%            0%            0%
D        7.14%    5.61%    5.10%    5.61%         4.59%         6.63%         6.12%
E        0%       0.51%    0%       0.51%         0.51%         0%            0%
4.4 Discussion
The new prediction model provides a prediction for 30 minutes and 60 minutes in the
future based on features from past blood glucose levels, insulin dosages, and meals. The
model uses features from ARIMA and blood glucose delta values, which are based on past
blood glucose levels, and features from a physiological model, which uses past blood
glucose levels, insulin dosages, and meals. This model has been shown to be a better
predictor than the t0 , ARIMA, and the old SVR baselines when using RMSE as the
evaluation metric and to outperform t0 and ARIMA when using ternary cost as the
evaluation metric. The new prediction model also outperforms Physician 2 on 60 minute
predictions and Physician 3 on 30 and 60 minute predictions when using RMSE as the
evaluation metric, and Physician 3 on 30 minute predictions when using ternary cost as
the evaluation metric.
Since the new prediction model outperforms the old SVR, this shows that it is
important to have a good representation of the data as features. Both the new prediction
model and the old SVR model use the same machine learning algorithm and experimental
platform, as well as features based on the same types of data (blood glucose, insulin, and
carbohydrates), but the new prediction model used more informed features from this data
by better representing the interactions of the data through a physiological model.
The physiological model uses generalizations about body mass index, insulin
sensitivity, and carbohydrate ratio, rather than parameters tuned to each individual, which
limits its accuracy. However, the SVR has the advantage of being able to learn these
parameters automatically. The SVR has the additional advantage of being able to include
features based on data that would be difficult to incorporate in a physiological model, such
as work or stress.
The ability to predict blood glucose levels 30 minutes or 60 minutes in the future
would provide an invaluable tool for patients with diabetes to preemptively correct for
dangerous blood glucose levels before they occur. This tool could also bring peace of
mind to patients with diabetes and their carers, who have to live with the stress of not
knowing if life threatening blood glucose levels are imminent.
5 Related Research
5.1 Diabetes Related Research
This section discusses research related to this work pertaining to diabetes including
physiological models, predictive blood glucose control, and glycemic variability.
5.1.1 Physiological Models
There are several physiological models that are used for modeling blood glucose.
Currently, there are no physiological models being used for providing clinical advice for
real patients. They are used for demonstration and learning purposes. Some studies use
in silico trials, which use a physiological model/simulator for evaluation.
The Lehmann and Deutsch simulator (Lehmann and Deutsch, 1992) uses four
differential equations and 12 auxiliary equations to model blood glucose. The model
implements the idea that physiology responds in transitory phases. Before the
carbohydrates in a meal are absorbed in the blood, they first transition through the
digestive system in phases. Subcutaneous insulin injection transits in similar phases. The
equations in the model attempt to reflect these transitions and their eventual effect on
glucose levels. The physiological model used in this work for predicting blood glucose is
very similar to this simulator.
AIDA (Lehmann et al., 2011; Lehmann, 2013) was developed by the same authors as
the Lehmann and Deutsch simulator. The primary goal of AIDA is to provide a teaching
tool for diabetes self-management. According to the Diabetes Control and Complications
Trial (DCCT) (DCCT Research Group and others, 1987), intensive glycemic control
reduces and delays serious complications. The aim of AIDA is to transfer clinical
knowledge from physicians to patients through the computer tool, thereby reducing the
time physicians need to teach patients how to control diabetes. The authors acknowledge
that AIDA is not complete enough to be used for modeling individual patients, but is a
valuable tool for demonstrating the general responses to carbohydrates and insulin.
Other models include the Glucose Insulin Model (GIM) (Dalla Man et al., 2007a)
which implements the physiological model described in (Dalla Man et al., 2007b). This
model is similar to the Lehmann and Deutsch simulator and AIDA. The GIM software can
be used for simulating physiological responses to glucose and insulin. The H∞ robust
model (Parker et al., 2000) is a 19th order linear model and one of the most complicated
models in the literature.
This work extends a physiological model by using machine learning to learn the
effects of the variables on an individual patient. The physiological models described here
attempt to find parameter weights based on values such as body weight, insulin sensitivity,
carbohydrate ratio, etc. For example, the GIM model uses over 30 parameter weights.
Most of the weights are tuned to generalized patients so that the models do not need to be
completely reworked for each patient. Machine learning allows for weights to be learned
automatically, allowing a machine learning algorithm to learn the effect of each equation
of the model independently. Machine learning also allows for additional parameters to be
included that are not part of the physiological model.
5.1.2 Predictive Blood Glucose Control
Currently, diabetes management is conducted in an open-loop style, where glucose
readings are presented to the patient, who then manually administers treatment using a
simple algorithm. A closed-loop system continuously and automatically monitors glucose
readings and administers treatment without patient intervention. This type of system is
referred to as an artificial pancreas. According to (Klonoff, 2007), the benefits of a
closed-loop system over an open-loop system are “1) less glycemic variability; 2) less
hypoglycemia; 3) less pain from pricking the skin to check the blood glucose and deliver
insulin boluses; and 4) less overall patient effort.”
An artificial pancreas consists of three components: (1) An insulin pump; (2) a CGM
sensor; and (3) a closed-loop control algorithm to determine the rate of insulin delivery.
Insulin pumps and CGM sensors exist on the market, but the control algorithm is still in
development. Some control algorithms are based on the traditional
Proportional-Integral-Derivative (PID) controller and do not model blood glucose
(Bequette, 2005). This type of controller is widely used in industrial processes to maintain
or adjust process variables in the presence of uncontrollable disturbances (Petruzella,
2005). These controllers need to have parameters tuned to each specific process. Since
each patient has their own insulin sensitivity, the controller would need to be tuned for
each patient. Other control algorithms use Model Predictive Control (MPC), which uses
mathematical or AI models to forecast blood glucose (Bequette, 2005).
Modeling blood glucose for the purpose of providing informed insulin dosages dates
back to the 1960s (Boutayeb and Chetouani, 2006). Early efforts had to make do with
fingerstick readings, since CGM sensors were not available until 1999. The current
approach to control algorithms for the artificial pancreas is focused on eliminating all
patient intervention. Therefore, these algorithms consider automatically recorded events such as blood glucose and insulin, but they do not consider any patient-recorded events.
(Kovács et al., 2012) evaluated a linear parameter varying (LPV) methodology for
systems of nonlinear equations for modeling blood glucose levels for MPC. The LPV is an
extension of linear systems where the inputs are considered linear, but their relations are a
function of the signal, allowing nonlinear behavior. Choosing the variables using LPV
allows the nonlinearity of the original model to be hidden, since the measured parameters
describe the controller. The LPV methodology was used in the model developed by
(Parker et al., 2000). This model takes into consideration several factors including glucose
and insulin kinetics, and renal excretion. The LPV model was evaluated using the
Lehmann and Deutsch simulator (Lehmann and Deutsch, 1992). The model was shown to
be capable of avoiding hypoglycemia.
(Magni et al., 2009) is a simulated trial using the GIM software package (Dalla Man
et al., 2007a) to evaluate the model. The trial used MPC to deliver insulin for 100
simulated patients over a 46 day period. Parameters to the physiological model were
adjusted throughout the experiment; for example, the time of breakfast was shifted by a
random amount. The model was robust to meal variations and was able to tune
parameters. The authors conclude that their model achieves satisfactory results, especially
during sleep, but the model should not replace traditional open-loop basal + bolus therapy
for meals.
The insulin infusion advisory system (IIAS) (Zarkogianni et al., 2011) is an
intelligent insulin delivery system for patients with T1DM on CGM monitors and insulin
pumps. The IIAS uses the information from the CGM sensor and patient reports of meals
to estimate the optimum insulin dosage. It uses a personalized glucose-insulin metabolism
model built on a neural network. The IIAS was evaluated in silico with the University of
Virginia (UVa) T1DM simulator based on (Dalla Man et al., 2007b). Patients with
variations were used to better simulate real life situations like sensor errors and
carbohydrate estimation errors. The simulation showed that the IIAS controller, compared
to a more traditional controller, reduced average blood glucose levels from 152 ± 28 to
118 ± 7 mg/dl, and reduced the percentage of time spent hyperglycemic from 28 ± 21% to
1 ± 2%.
The closest research to this work is the intelligent diabetes assistant (IDA) (Duke,
2009), a system which collects information about patients through a phone interface. The
IDA collects meal, insulin, medication, and exercise information. The information is used
for predicting blood glucose after a meal, generating therapy advice and continuous
glucose modeling. Duke postulated that when patients correct for their meal, they are
essentially attempting to predict their after-meal glucose levels when choosing a dosage.
Duke developed Gaussian process regression with a Gaussian kernel to predict post-meal
glucose levels. The model outperformed patients and other published results. The
Gaussian Process regression approach focused on 2 hour predictions of blood glucose
after a meal. The predictions were modeled using auto regressive and physiological
models. Auto regressive models and physiological models were used for 15 and 45 minute
predictions. The auto regressive models outperformed the physiological models for the 15
minute predictions, but the physiological models performed better for the 45 minute
predictions. This work uses a broader set of life events than (Duke, 2009), and more
historical data from patients (3 months compared to 2 weeks).
5.1.3 Glycemic Variability
Increased glycemic variability is associated with poor glycemic control (Rodbard
et al., 2009) and is a strong predictor of hypoglycemia (Monnier et al., 2011; Qu et al.,
2012), which has been linked to excessive morbidity and mortality (Zoungas et al., 2010;
Seaquist et al., 2012). While physicians acknowledge the need for a glycemic variability
measurement, there is no current consensus on the best glycemic variability metric to use,
or on the criteria for acceptable or excessive glycemic variability (Bergenstal et al., 2013).
In (Bergenstal et al., 2013), a panel of 34 expert diabetes specialists met on March
28-29, 2012 to discuss the current state of diabetes care and which direction to move
forward. Part of their discussion was focused around glycemic variability. There are a
number of metrics for measuring glycemic variability (Rodbard, 2009a) including
standard deviation (SD), coefficient of variation (CV), interquartile range (IQR) (Rodbard,
2009b), mean amplitude of glycemic excursion (MAGE) (Service et al., 1970), M-value
(Schlichtkrull et al., 1965), mean of daily difference (MODD) (Molnar et al., 1972),
continuous overall net glycemic action (CONGA) (McDonnell et al., 2005), and others.
SD, IQR, and MAGE are described in detail in Chapter 3. SD and MAGE are both
used as features for the CPGV metric. CV is a normalized standard deviation, the ratio of
standard deviation and the mean. M-value is a measure of the stability of excursions
compared to a normal blood glucose level. MODD is the mean difference between blood
glucose levels on two consecutive days at the same time of day. CONGA is the SD of the
differences of consecutive time windows of arbitrary size. CONGA is similar to MODD,
but the window is not restricted to 24 hours, and the points are not averaged, but used to
compute the SD. CONGA was the first glycemic variability metric designed specifically
for CGM data.
The panel focused on SD, IQR, and CV for their simplicity and familiarity. IQR was
brought forward because SD, and other metrics based on SD, including CV, assume a normal distribution in the calculation, even though blood glucose is not normally distributed; IQR does not have this restriction.
The panel acknowledged that the metrics are capable of measuring one aspect of
glycemic variability or another, but none of them are capable of reflecting all aspects of
glycemic variability. The panel concluded that glycemic variability should be part of a
patient’s evaluation during doctor visits, but they did not conclude that any currently
accepted metric should be the metric to use. The glycemic variability metric developed in
this work is the only one that combines machine learning with many metrics to fit the
consensus gestalt of multiple physicians.
5.2 Machine Learning Related Research
The Machine Learning algorithms in this work have also been applied to a wide
range of applications. There are pattern recognition applications, such as medical
diagnosis, which use the machine learning approaches used in this work. There are also
time series forecasting applications such as Financial Market, Utility Load, and Control
Systems prediction.
5.2.1 Machine Learning For Problem Detection
In medicine, it is necessary to distinguish between normal and abnormal cases. Detecting medical problems is something machine learning is capable of doing, and is a common use of machine learning. Many applications in medicine collect large amounts of data, which makes machine learning useful in these applications. Doctors and nurses spend a large portion of their time interpreting the large amounts of data generated by modern tests. Machine learning is capable of screening many samples quickly and automatically, so that the important results can be brought to the attention of human experts.
In (Kiyan, 2011), artificial neural networks like multilayer perceptrons are used to
classify breast cancer data as benign or malignant. The data contained about 700 samples
of 9 features such as lump thickness and cell size. Using the neural networks, the authors
were able to correctly classify over 95% of unseen samples. The authors compared 4
different neural networks: radial basis function networks; probabilistic neural networks;
general regression neural networks; and multilayer perceptrons. The authors concluded
that on their data samples, the general regression neural network was the best model with
98.8% accuracy.
In (Chen et al., 2012), an expert system for diagnosing thyroid disease was
developed. The system was based on a SVM with the features being selected using the
Fisher score, a supervised feature selection method which determines relevant features for
classification. The tuning parameters of the SVM were chosen using particle swarm
optimization. As was done for the CPGV metric in this work, (Chen et al., 2012)
compared multiple methods. The authors cite 7 articles which used the same thyroid
database to compare their system to. Using their methods, the authors were able to
achieve 97.5% accuracy, better than the previously published methods for their dataset.
Features extracted from brain scans, i.e. magnetic resonance imaging (MRI), have
been used for diagnosing Alzheimer’s disease (Klöppel et al., 2008), and for identifying
early risk of Alzheimer’s (Polat et al., 2012). Alzheimer’s disease is difficult to distinguish
from natural aging and frontotemporal lobar degeneration (FTLD). In (Klöppel et al.,
2008), an SVM was used to correctly classify Alzheimer’s vs. normal 95% of the time,
Alzheimer’s disease vs. dementia 93% of the time, and Alzheimer’s disease vs. FTLD
81% of the time. In (Polat et al., 2012), an SVM was trained on a dataset of patients who
were diagnosed with early onset Alzheimer’s disease. The SVM was able to correctly
classify 79% of Alzheimer’s disease patients vs normal patients.
5.2.2 Time Series Prediction
A survey of time series prediction applications (Sapankevych and Sankar, 2009)
found the most published papers related to time series prediction were focused on
financial markets. The next most common topic was electrical utility load. The rest of the
topics were control system forecasting, general business forecasting, and other
miscellaneous applications.
Many time series forecasting applications are very similar to blood glucose
prediction. In blood glucose prediction, as well as in other applications like financial
market prediction, several factors influence the time series. In some applications, the
factors need to be evaluated to determine if the influence is positive or negative. Blood
glucose is nonlinear and non-stationary. Nonlinearity implies that no linear equation can
closely approximate the data. Being non-stationary implies that it has dynamic trends
which change over time. Many time series problems share this nonlinearity and
non-stationarity aspect.
Five financial time series sources, including S&P 500 and several foreign bond
indices, were studied in (Tay and Cao, 2001a). They found that an SVR outperformed an
MLP, with the SVR being able to better fit the data, and the MLP overfitting. The authors
published several follow-up studies. In the first follow-up publication (Tay and Cao,
2001b), the SVR was combined with a Self Organizing Feature Map (SOM) which
clusters the entire input space into disjoint sections. The SOM is used as an intermediate
layer to feed more informed features to the SVR. This method significantly improved
prediction performance. In later work, the authors integrated ascending-C SVR (Tay and Cao, 2002a) and ε-descending SVR (Tay and Cao, 2002b) into a single SVR (Cao and Tay, 2003). Ascending the C parameter of the SVR gives more weight to recent examples. Descending the ε parameter results in support vectors being made from more recent examples.
Electrical load forecasting allows the power company to provide for more efficient
power transmission, which reduces the price of electricity. (Bo-Juen et al., 2004) uses an
SVR with features including day of week, time of day, weather, and holidays to forecast
the maximum electrical load for the next 30 days. Including temperature reduced their
performance, which they attributed to the inaccuracies in forecasting temperature. An
SVR was compared to an Auto Regressive model for electrical load forecasting in
(Mohandes, 2002). The SVR significantly outperformed the Auto Regressive model,
especially when the number of training points was increased.
6 Future Work
6.1 Variability Metric
The development of the CPGV metric described in Chapter 3 is complete. However,
validation of the metric needs to be done for the metric to become accepted in clinical
practice. Biomarkers have been shown to correlate with the risk of long term
complications of diabetes. If the CPGV metric can be correlated with biomarkers, then the
metric would be validated as a method of determining risk of complications. Currently the
SmartHealth Lab is conducting a patient trial in which the patients submit blood and urine
samples, which are used to extract biomarkers. The team is also seeking out other
databases which contain biomarkers and CGM data to expedite the validation process.
6.2 Blood Glucose Prediction
The blood glucose prediction model described in Chapter 4 can be improved in many
ways. The current model only considers some of the features that are tracked and are
known to affect blood glucose levels. Some of the features that are not yet considered are:
• Sleep events. Sleep is known to change the levels of the hormone cortisol. Cortisol
changes a patient’s insulin sensitivity causing the blood glucose to change. A well
known example of sleep affecting blood glucose levels is the dawn phenomenon.
• Exercise events. Physical activity tends to pull glucose out of the blood into muscles
causing blood glucose to drop.
• Stress. Stress can induce cortisol, resulting in increasing blood glucose levels.
However, the effects of stress are highly individual. In some patients it causes blood
glucose levels to increase, and in others to decrease.
• Medication. Some medications, especially ones based on steroids, can have large
influences on blood glucose levels.
• Data from other sensors such as heart rate and galvanic skin response.
The prediction model also does not consider time based events, such as the change in
blood glucose 24 hours prior to the prediction point, or 7 days prior to the prediction
point. The behavior of blood glucose on weekdays vs weekends can be considerably
different for some patients.
Another refinement would involve preprocessing the input data to the prediction
model. In the CPGV metric, the blood glucose values were smoothed using a cubic spline
smoothing algorithm. A modified method could be incorporated for the prediction model.
However, the algorithm would need to be modified for smoothing the points near the end
of the time series since to smooth point BGn , the cubic splines algorithm uses points
BGn−1 and BGn+1. When smoothing points at the end of the time series, the point BGn+1 is not yet available.
Transfer learning could be used to improve the predictions when there is not enough
data to adequately train and tune a model for a new patient. Currently, one week of data
is required to fully train a model, and an additional week of data to fully tune it.
Transfer learning is a machine learning technique that stores knowledge gained from one
problem and applies it to another. At present, a new patient's model simply reuses
parameters tuned for other patients' models when there is not enough tuning data;
transfer learning has not yet been explored for situations where there is less than one
week of data to train the model. Although patients are highly individual in their reactions
to certain life events, some effects are expected to be shared across patients. For example,
carbohydrates and insulin raise and lower blood glucose, respectively, for everybody;
only the amount of change depends on the individual.
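The simplest version of this reuse is to borrow hyperparameters from patients whose models have already been tuned. The sketch below is hypothetical throughout: the stored parameter sets, the choice of the median, and the SVR settings are placeholders, not the procedure currently used.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical hyperparameters already tuned for other patients' models.
tuned = [
    {"C": 8.0,  "gamma": 0.05, "epsilon": 4.0},
    {"C": 16.0, "gamma": 0.02, "epsilon": 6.0},
    {"C": 4.0,  "gamma": 0.08, "epsilon": 5.0},
]

def borrowed_svr(param_sets):
    """Build an SVR for a new patient from the median of other patients' settings."""
    med = {k: float(np.median([p[k] for p in param_sets]))
           for k in ("C", "gamma", "epsilon")}
    return SVR(kernel="rbf", **med)

model = borrowed_svr(tuned)
print(model.get_params()["C"], model.get_params()["gamma"], model.get_params()["epsilon"])
```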
Optimizing the set of features is not part of the current work; all of the computed
features are used to train the SVR. A greedy wrapper like the ones used for the CPGV
metric could be used to select features for the prediction model. Time series
optimizations such as ascending-C and descending-ε (Cao and Tay, 2003) have improved
other time series prediction problems and could be beneficial to this work.
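A greedy forward wrapper of the kind mentioned above can be sketched in a few lines. The cross-validation setup, the SVR hyperparameters, and the stopping rule are illustrative assumptions rather than the configuration used for the CPGV metric.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def greedy_forward_selection(X, y, max_features=None):
    """Greedily add the feature that most improves cross-validated RMSE."""
    n_total = X.shape[1]
    selected, best_rmse = [], np.inf
    limit = max_features or n_total
    while len(selected) < limit:
        scored = []
        for j in (c for c in range(n_total) if c not in selected):
            cols = selected + [j]
            cv = cross_val_score(SVR(kernel="rbf", C=10.0), X[:, cols], y,
                                 scoring="neg_root_mean_squared_error", cv=5)
            scored.append((-cv.mean(), j))
        rmse, best_j = min(scored)
        if rmse >= best_rmse:       # stop when no remaining feature helps
            break
        best_rmse, selected = rmse, selected + [best_j]
    return selected, best_rmse
```

Called on a feature matrix X and targets y, the function returns the selected column indices together with the best cross-validated RMSE; a backward variant would start from the full feature set and greedily remove features instead.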
7 Summary and Conclusion
This thesis presents research in machine learning for diabetes management. This
work contributes to two of the three major projects of the SmartHealth Lab at Ohio
University: glycemic variability measurement and blood glucose prediction. There are two
major contributions:
1. the development of a metric for measuring glycemic variability, a serious problem for
patients with diabetes; and
2. the prediction of patient blood glucose levels, in order to preemptively detect and avoid
potential health problems.
The first contribution is a novel solution to the glycemic variability measurement
problem. The CPGV metric developed in this work is a machine learning regression model
that learns the gestalt of 12 expert physicians' impressions of glycemic variability control.
The value of the metric has been shown to closely reflect the physicians' consensus: the
root mean square error (RMSE) of the metric compared to the consensus was 0.417, and
0.511 when the metric was rounded to the nearest integer. The RMSE of individual
physicians, who gave integer ratings, compared to the consensus was 0.489. When used as
a screen for excessive glycemic variability, the CPGV metric outperformed all other
metrics it was evaluated against, with an accuracy of 90.1%. The accuracies of the most
commonly used metrics, MAGE, standard deviation, and IQR, were 80.8%, 84.3%, and
78.7%, respectively. These results have been published in the
Journal of Diabetes Science and Technology (Marling et al., 2013).
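For reference, the two kinds of figures quoted above can be computed as follows. The consensus ratings and model outputs in this sketch are hypothetical placeholders, and the cutoff separating acceptable from excessive variability is an assumed value, not the one used in the published evaluation.

```python
import numpy as np

# Hypothetical consensus ratings and CPGV model outputs for a few CGM plots.
consensus = np.array([1.0, 2.5, 4.0, 3.0, 5.0, 2.0])
predicted = np.array([1.3, 2.2, 3.6, 3.4, 4.6, 2.1])

rmse = np.sqrt(np.mean((predicted - consensus) ** 2))
rounded_rmse = np.sqrt(np.mean((np.round(predicted) - consensus) ** 2))

# Screening: flag excessive variability whenever a rating exceeds the cutoff.
cutoff = 2.5
accuracy = np.mean((predicted > cutoff) == (consensus > cutoff))
print(f"RMSE={rmse:.3f}  rounded RMSE={rounded_rmse:.3f}  screening accuracy={accuracy:.1%}")
```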
The second contribution is a blood glucose prediction model that uses machine
learning to make patient-specific predictions. The machine learning approach allows the
model to learn the individuality of a patient, which a pure physiological model cannot
capture; the downfall of physiological models is that they are not flexible enough to be
used in practice. The prediction model in this work combines the power of physiological
models with the flexibility of machine learning to provide individualized models for
blood glucose prediction. The prediction model has been shown to outperform the
baselines it was compared against: a simple baseline, which assumes that blood glucose
levels do not change, and the statistical ARIMA model, which bases predictions on past
blood glucose levels. The model also performed very similarly to, and in some cases
outperformed, physicians who were given the same prediction problems. For 30 minute
predictions, the RMSE compared to the actual blood glucose levels was 19.5, versus 22.9
for the baseline ARIMA model and 19.8 for the best physician predictions. For 60 minute
predictions, the RMSE was 35.7, versus 42.2 for the ARIMA model and 38.4 for the best
physician predictions. These results will be presented at the International Conference on
Machine Learning and Applications in December 2013.
Future work is planned to validate the CPGV metric by correlating the metric with
biomarkers that are linked to patient outcomes. The blood glucose prediction model could
potentially be improved in the future through the incorporation of additional life-event
features and experimentation with additional machine learning approaches.
References
American Diabetes Association (2012a). Checking your blood glucose.
http://www.diabetes.org/living-with-diabetes/treatment-and-care/blood-glucose-control/checking-your-blood-glucose.html, accessed November, 2012.
American Diabetes Association (2012b). Hypoglycemia (low blood glucose).
http://www.diabetes.org/living-with-diabetes/treatment-and-care/blood-glucose-control/hypoglycemia-low-blood.html, accessed November, 2012.
American Diabetes Association (2012c). Type-1.
http://www.diabetes.org/diabetes-basics/type-1, accessed November, 2012.
American Diabetes Association (2013). Economic costs of diabetes in the US in 2012.
Diabetes Care, 36(4):1033–1046.
Bequette, B. W. (2005). A critical assessment of algorithms and challenges in the
development of a closed-loop artificial pancreas. Diabetes Technology &
Therapeutics, 7(1):28–47.
Bergenstal, R. M., Ahmann, A. J., Bailey, T., Beck, R. W., Bissen, J., Buckingham, B.,
Deeb, L., Dolin, R. H., Garg, S. K., Goland, R., et al. (2013). Recommendations for
standardizing glucose reporting and analysis to optimize clinical decision making in
diabetes: the ambulatory glucose profile (AGP). Diabetes Technology &
Therapeutics, 15(3):198–211.
Bishop, C. M. (2007). Pattern recognition and machine learning (information science and
statistics). Springer, New York.
Bo-Juen, C., Ming-Wei, C., and Chih-Jen, L. (2004). Load forecasting using support
vector machines: A study on EUNITE competition 2001. IEEE Transactions on
Power Systems, 19(4):1821 – 1830.
Borg, G. A. V. (1982). Psychophysical bases of perceived exertion. Medicine and Science
in Sports and Exercise, 14(5):377–381.
Boutayeb, A. and Chetouani, A. (2006). A critical review of mathematical models and
data used in diabetology. BioMedical Engineering OnLine, 5(1):43.
Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (2008). Time series analysis: forecasting
and control. Wiley series in probability and statistics. Hoboken, N.J. : J. Wiley &
Sons, c2008.
Cao, L.-J. and Tay, F. E. H. (2003). Support vector machine with adaptive parameters in
financial time series forecasting. IEEE Transactions on Neural Networks,
14(6):1506–1518.
Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines.
ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software
available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chen, H.-L., Yang, B., Wang, G., Liu, J., Chen, Y.-D., and Liu, D.-Y. (2012). A
three-stage expert system based on support vector machines for thyroid disease
diagnosis. Journal of medical systems, 36(3):1953–1963.
Clarke, W. L., Cox, D., Gonder-Frederick, L. A., Carter, W., and Pohl, S. L. (1987).
Evaluating clinical accuracy of systems for self-monitoring of blood glucose.
Diabetes Care, 10(5):622–628.
Dalla Man, C., Raimondo, D. M., Rizza, R. A., and Cobelli, C. (2007a). Mathematical
models of the metabolic system in health and in diabetes: GIM, simulation software
of meal glucose–insulin model. Journal of Diabetes Science and Technology,
1(3):323–330.
Dalla Man, C., Rizza, R. A., and Cobelli, C. (2007b). Meal simulation model of the
glucose-insulin system. IEEE Transactions on Biomedical Engineering,
54(10):1740–1749.
Danaei, G., Finucane, M., Lu, Y., Singh, G., Cowan, M., Paciorek, C., Lin, J., Farzadfar,
F., Khang, Y., Stevens, G., et al. (2011). Global burden of metabolic risk factors of
chronic diseases collaborating group (blood glucose). National, regional, and global
trends in fasting plasma glucose and diabetes prevalence since 1980: systematic
analysis of health examination surveys and epidemiological studies with 370
country-years and 2.7 million participants. Lancet, 378(9785):31–40.
DCCT Research Group and others (1987). Diabetes control and complications trial
(DCCT): results of feasibility study. Diabetes Care, 10(1):1–19.
Domingos, P. (1999). The role of Occam’s razor in knowledge discovery. Data mining
and Knowledge Discovery, 3(4):409–425.
Duke, D. L. (2009). Intelligent Diabetes Assistant: A Telemedicine System for Modeling
and Managing Blood Glucose. PhD thesis, Carnegie Mellon University, Pittsburgh,
Pennsylvania.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009).
The WEKA data mining software: an update. ACM SIGKDD Explorations
Newsletter, 11(1):10–18.
Herbrich, R. (2002). Learning kernel classifiers: theory and algorithms. The MIT Press,
Cambridge, MA.
Hsu, C.-W., Chang, C.-C., and Lin, C.-J. (2003). A practical guide to support vector
classification. Technical Report, Department of Computer Science and Information
Engineering, National Taiwan University, http://www.csie.ntu.edu.tw/~cjlin.
Hyndman, R. J. and Khandakar, Y. (2008). Automatic time series forecasting: The
forecast package for R. Journal of Statistical Software, 27(3):1–22.
Kiyan, T. (2011). Breast cancer diagnosis using statistical neural networks. IU-Journal of
Electrical & Electronics Engineering, 4(2):1149–1153.
Klonoff, D. (2005). Continuous glucose monitoring roadmap for 21st century diabetes
therapy. Diabetes Care, 28(5):1231–1239.
Klonoff, D. C. (2007). The artificial pancreas: how sweet engineering will solve bitter
problems. Journal of Diabetes Science and Technology, 1(1):72–81.
Klöppel, S., Stonnington, C. M., Chu, C., Draganski, B., Scahill, R. I., Rohrer, J. D., Fox,
N. C., Jack, C. R., Ashburner, J., and Frackowiak, R. S. (2008). Automatic
classification of MR scans in Alzheimer's disease. Brain, 131(3):681–689.
Kovács, L., Szalay, P., Almássy, Z., and Barkai, L. (2012). Applicability results of a
nonlinear model-based robust blood glucose control algorithm. Journal of Diabetes
Science and Technology, 7(3):708–716.
Lehmann, E. and Deutsch, T. (1992). A physiological model of glucose-insulin interaction
in type 1 diabetes mellitus. Journal of Biomedical Engineering, 14(3):235–242.
Lehmann, E. D. (2013). AIDA. http://www.2aida.net/welcome/, accessed September,
2013.
Lehmann, E. D., Tarín, C., Bondia, J., Teufel, E., and Deutsch, T. (2011). Development of
AIDA V4.3b diabetes simulator: Technical upgrade to support incorporation of
lispro, aspart, and glargine insulin analogues. Journal of Electrical and Computer
Engineering, 2011.
Magni, L., Forgione, M., Toffanin, C., Dalla Man, C., Kovatchev, B., De Nicolao, G., and
Cobelli, C. (2009). Artificial pancreas systems: Run-to-run tuning of model
predictive control for type 1 diabetes subjects: In silico trial. Journal of Diabetes
Science and Technology, 3(5):1091–1098.
Maimone, A. (2006). Data and knowledge acquisition in case-based reasoning for diabetes
management. Master’s thesis, Ohio University.
Marling, C., Wiley, M., Bunescu, R., Shubrook, J., and Schwartz, F. (2011). Emerging
applications for intelligent diabetes management. In Proceedings of the Twenty-Third
Innovative Applications of Artificial Intelligence Conference (IAAI-11), San
Francisco, CA, USA. AAAI Press.
Marling, C., Wiley, M., Bunescu, R. C., Shubrook, J., and Schwartz, F. (2012). Emerging
applications for intelligent diabetes management. AI Magazine, 33(2):67–78.
Marling, C. R., Struble, N. W., Bunescu, R. C., Shubrook, J. H., and Schwartz, F. L.
(2013). A consensus perceived glycemic variability metric. Journal of Diabetes
Science and Technology, 7(4):871–879.
McDonnell, C., Donath, S., Vidmar, S., Werther, G., and Cameron, F. (2005). A novel
approach to continuous glucose analysis utilizing glycemic variation. Diabetes
Technology & Therapeutics, 7(2):253–263.
Miller, W. A. (2009). Problem detection for situation assessment in case-based reasoning
for diabetes management. Master's thesis, Ohio University.
Mohandes, M. (2002). Support vector machines for short-term electrical load forecasting.
International Journal of Energy Research, 26(4):335–345.
Molnar, G., Taylor, W., and Ho, M. (1972). Day-to-day variation of continuously
monitored glycaemia: a further measure of diabetic instability. Diabetologia,
8(5):342–348.
Monnier, L., Wojtusciszyn, A., Colette, C., and Owens, D. (2011). The contribution of
glucose variability to asymptomatic hypoglycemia in persons with type 2 diabetes.
Diabetes Technology & Therapeutics, 13(8):813–818.
Parker, R. S., Doyle, F. J., Ward, J. H., and Peppas, N. A. (2000). Robust H∞ glucose
control in diabetes using a physiological model. AIChE Journal, 46(12):2537–2549.
Petruzella, F. D. (2005). Programmable logic controllers. Tata McGraw-Hill Education,
New York.
Polat, F., Orhan Demirel, S., Kitis, O., Simsek, F., Isman Haznedaroglu, D., Coburn, K.,
Kumral, E., and Saffet Gonul, A. (2012). Computer based classification of MR scans
in first time applicant Alzheimer patients. Current Alzheimer Research, 9(7):789–794.
Pollock, S. (1993). Smoothing with cubic splines. Department of Economics, Queen Mary
and Westfield College.
Qu, Y., Jacober, S. J., Zhang, Q., Wolka, L. L., and DeVries, J. H. (2012). Rate of
hypoglycemia in insulin-treated patients with type 2 diabetes can be predicted from
glycemic variability data. Diabetes Technology & Therapeutics, 14(11):1008–1012.
R Core Team (2013). R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria.
Rodbard, D. (2009a). Interpretation of continuous glucose monitoring data: glycemic
variability and quality of glycemic control. Diabetes Technology & Therapeutics,
11(S1):S–55.
Rodbard, D. (2009b). New and improved methods to characterize glycemic variability
using continuous glucose monitoring. Diabetes Technology & Therapeutics,
11(9):551–565.
Rodbard, D., Bailey, T., Jovanovic, L., Zisser, H., Kaplan, R., and Garg, S. K. (2009).
Improved quality of glycemic control and reduced glycemic variability with use of
continuous glucose monitoring. Diabetes Technology & Therapeutics,
11(11):717–723.
Russell, S. J., Norvig, P., Canny, J. F., Malik, J. M., and Edwards, D. D. (1995). Artificial
intelligence: a modern approach. Prentice Hall Englewood Cliffs.
Sapankevych, N. and Sankar, R. (2009). Time series prediction using support vector
machines: a survey. IEEE Computational Intelligence Magazine, 4(2):24–38.
Schlichtkrull, J., Munck, O., and Jersild, M. (1965). The m-value, an index of blood-sugar
control in diabetics. Acta Medica Scandinavica, 177(1):95–102.
Seaquist, E. R., Miller, M. E., Bonds, D. E., Feinglos, M., Goff, D. C., Peterson, K.,
Senior, P., et al. (2012). The impact of frequent and unrecognized hypoglycemia on
mortality in the ACCORD study. Diabetes Care, 35(2):409–414.
Service, F., Molnar, G., Rosevear, J., Ackerman, E., Gatewood, L., Taylor, W., et al.
(1970). Mean amplitude of glycemic excursions, a measure of diabetic instability.
Diabetes, 19(9):644–655.
Siegelaar, S. E., Holleman, F., Hoekstra, J. B., and DeVries, J. H. (2010). Glucose
variability; does it matter? Endocrine Reviews, 31(2):171–182.
Smola, A. J. and Schölkopf, B. (2004). A tutorial on support vector regression. Statistics
and Computing, 14(3):199–222.
Tay, F. E. and Cao, L. (2001a). Application of support vector machines in financial time
series forecasting. Omega, 29(4):309–317.
Tay, F. E. and Cao, L. (2002a). Modified support vector machines in financial time series
forecasting. Neurocomputing, 48(1):847–861.
Tay, F. E. and Cao, L. (2002b). ε-descending support vector machines for financial time
series forecasting. Neural Processing Letters, 15(2):179–195.
Tay, F. E. H. and Cao, L. J. (2001b). Improved financial time series forecasting by
combining support vector machines with self-organizing feature map. Intelligent
Data Analysis, 5(4):339–354.
Theodoridis, S. and Koutroumbas, K. (2009). Pattern recognition. Burlington, MA :
Academic Press, c2009.
Vapnik, V. (1995). The nature of statistical learning theory. Springer, New York.
Vapnik, V. N. (1998). Statistical learning theory. Wiley-Interscience, New York.
Vernier, S. J. (2009). Clinical evaluation and enhancement of a case-based medical
decision support system. Master’s thesis, Ohio University.
Wiley, M. T. (2011). Machine learning for diabetes decision support. Master’s thesis,
Ohio University.
World Health Organization (2011). Fact sheet no. 312.
http://www.who.int/mediacentre/factsheets/fs312/en/.
Zarkogianni, K., Vazeou, A., Mougiakakou, S. G., Prountzou, A., and Nikita, K. S. (2011).
An insulin infusion advisory system based on autotuning nonlinear model-predictive
control. IEEE Transactions on Biomedical Engineering, 58(9):2467–2477.
Zoungas, S., Patel, A., Chalmers, J., de Galan, B. E., Li, Q., Billot, L., Woodward, M.,
Ninomiya, T., Neal, B., MacMahon, S., et al. (2010). Severe hypoglycemia and risks
of vascular events and death. New England Journal of Medicine, 363(15):1410–1418.
Appendix A: CPGV Full Results

Table A.1: Full results for each of the glycemic variability platforms.

Algorithm              Data    Selection method  Training data  RMSE      MAE       Rounded Output RMSE  Rounded Output MAE
Linear Regression      Raw     Backward          Without Dev    0.457279  0.333595  0.568039             0.40954
Linear Regression      Raw     Forward           With Dev       0.470063  0.35457   0.562893             0.40705
Linear Regression      Raw     Forward           Without Dev    0.469478  0.352595  0.558434             0.41205
Linear Regression      Smooth  Backward          With Dev       0.461272  0.33892   0.570235             0.41954
Linear Regression      Smooth  Backward          Without Dev    0.458412  0.3383    0.561398             0.40954
Linear Regression      Smooth  Forward           With Dev       0.455203  0.349145  0.567308             0.41371
Linear Regression      Smooth  Forward           Without Dev    0.454964  0.349435  0.550916             0.40288
Multilayer Perceptron  Raw     Backward          With Dev       0.561344  0.4372    0.631932             0.47789
Multilayer Perceptron  Raw     Backward          Without Dev    0.504739  0.380255  0.57098              0.42039
Multilayer Perceptron  Raw     Forward           With Dev       0.579702  0.45092   0.637195             0.48122
Multilayer Perceptron  Raw     Forward           Without Dev    0.529509  0.38323   0.597359             0.41704
Multilayer Perceptron  Smooth  Backward          With Dev       0.559819  0.432585  0.652701             0.48956
Multilayer Perceptron  Smooth  Backward          Without Dev    0.538918  0.399145  0.624642             0.45538
Multilayer Perceptron  Smooth  Forward           With Dev       0.604697  0.40911   0.680807             0.46954
Multilayer Perceptron  Smooth  Forward           Without Dev    0.519543  0.40556   0.610474             0.44788
Appendix B: Full Blood Glucose Level Prediction Results

Table B.1: Predictions for each baseline and the new prediction model. For each of the 196 evaluation points, the table lists the actual blood glucose values at t+30 and t+60, the value at t0 (the simple baseline, which assumes no change over the prediction horizon), and the 30 and 60 minute predictions of the ARIMA model, the SVR model, three physicians (Phys 1, Phys 2, and Phys 3), and the new prediction model.