Measuring Glycemic Variability and Predicting Blood Glucose Levels Using Machine Learning Regression Models

A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University in partial fulfillment of the requirements for the degree Master of Science.

Nigel W. Struble, December 2013. © 2013 Nigel W. Struble. All Rights Reserved.

This thesis has been approved for the School of Electrical Engineering and Computer Science and the Russ College of Engineering and Technology by Cynthia R. Marling, Associate Professor of Electrical Engineering and Computer Science, and Dennis Irwin, Dean, Russ College of Engineering and Technology.

Abstract

STRUBLE, NIGEL W., M.S., December 2013, Computer Science
Measuring Glycemic Variability and Predicting Blood Glucose Levels Using Machine Learning Regression Models (108 pp.)
Director of Thesis: Cynthia R. Marling

This thesis presents research in machine learning for diabetes management. There are two major contributions: (1) development of a metric for measuring glycemic variability, a serious problem for patients with diabetes; and (2) prediction of patient blood glucose levels, in order to preemptively detect and avoid potential health problems. The glycemic variability metric uses machine learning, trained on multiple statistical and domain-specific features, to match physician consensus on glycemic variability. The metric performs similarly to an individual physician's ability to match the consensus. When used as a screen for detecting excessive glycemic variability, the metric outperforms the baseline metrics. The blood glucose prediction model uses machine learning to integrate a general physiological model and life events to make patient-specific predictions 30 and 60 minutes in the future.
The blood glucose prediction model was evaluated in several situations, such as near a meal or during exercise. The prediction model outperformed the baseline prediction models, and performed similarly to, and in some cases outperformed, expert physicians who were given the same prediction problems.

Acknowledgments

I would like to humbly thank my academic advisor and committee chair, Dr. Cynthia Marling. Without her boundless inspiration and help this thesis would not have been undertaken. I would also like to thank my committee members: Dr. Razvan Bunescu for his machine learning advice, Dr. Frank Schwartz for his willingness to work with computer scientists, and Dr. Jundong Liu. I am thankful for everyone who has contributed to the research behind this thesis, including the medical doctors Dr. Schwartz, Dr. Shubrook, and Dr. Guo, the patients, and students on the project past and present. I would also like to thank my family for supporting me through my education.

Table of Contents

Abstract
Acknowledgments
List of Tables
List of Figures
1 Introduction
2 Background
  2.1 Diabetes
  2.2 SmartHealth Lab Research
    2.2.1 The 4 Diabetes Support System™ Project
    2.2.2 A Consensus Perceived Glycemic Variability Metric
    2.2.3 Blood Glucose Prediction
  2.3 Machine Learning Algorithms
    2.3.1 Linear Regression
    2.3.2 Support Vector Regression
    2.3.3 Kernel Functions
    2.3.4 Multilayer Perceptron
    2.3.5 Auto Regressive Integrated Moving Average
3 Consensus Perceived Glycemic Variability Metric
  3.1 Background
    3.1.1 Distinction from HbA1c
    3.1.2 Difficulty of Quantifying Glycemic Variability
    3.1.3 Previous Measurement Methods and Studies
    3.1.4 The New Metric
  3.2 Methods
    3.2.1 Data Collection and Format
    3.2.2 Feature Engineering
    3.2.3 Smoothing the Data
    3.2.4 Machine Learning Algorithms
    3.2.5 Algorithm Configurations
    3.2.6 Feature Selection and Tuning
    3.2.7 10-Fold Cross Validation
    3.2.8 Datasets
    3.2.9 Defining the CPGV Metric
    3.2.10 Screen for Excessive Glycemic Variability
  3.3 Results
    3.3.1 CPGV Metric Performance
    3.3.2 Performance of the Excessive Glycemic Variability Screen
  3.4 Discussion
4 Blood Glucose Prediction
  4.1 Background
    4.1.1 Previous Work
  4.2 Methods
    4.2.1 Data
    4.2.2 Physiological Model
    4.2.3 Feature Vector
    4.2.4 Walk Forward Testing
    4.2.5 Baselines for Evaluation
    4.2.6 Evaluation
  4.3 Results
  4.4 Discussion
5 Related Research
  5.1 Diabetes Related Research
    5.1.1 Physiological Models
    5.1.2 Predictive Blood Glucose Control
    5.1.3 Glycemic Variability
  5.2 Machine Learning Related Research
    5.2.1 Machine Learning For Problem Detection
    5.2.2 Time Series Prediction
6 Future Work
  6.1 Variability Metric
  6.2 Blood Glucose Prediction
7 Summary and Conclusion
References
Appendix A: CPGV Full Results
Appendix B: Full Blood Glucose Level Prediction Results

List of Tables

3.1 Features
3.2 Metric performance
3.3 Classification performance
4.1 Previous study prediction results
4.2 RMSE baselines
4.3 Ternary baselines
4.4 New prediction model results
4.5 Statistical significance
4.6 CEGA Regions 30
4.7 CEGA Regions 60
A.1 Full glycemic variability results
B.1 Full prediction results

List of Figures

2.1 Tuning λ
2.2 ε-tube
2.3 Perceptron
2.4 MLP
2.5 Sigmoid function
3.1 Glycemic Variability
3.2 Slope
3.3 ROC curve
3.4 Rated plots
4.1 Physiological Model
4.2 Walk Forward Testing
4.3 Physician prediction GUI
4.4 CEGA
4.5 t0 CEGA
4.6 ARIMA CEGA
4.7 SVR CEGA
4.8 Physician 1 CEGA
4.9 Physician 2 CEGA
4.10 Physician 3 CEGA
4.11 New prediction model CEGA

1 Introduction

This thesis presents research in machine learning for diabetes management. There are two major contributions:

1. development of a metric for measuring glycemic variability, a serious problem for patients with diabetes; and
2. prediction of patient blood glucose levels, in order to preemptively detect and avoid potential health problems.

This work contributes to two of the three major projects of the SmartHealth Lab at Ohio University.
The SmartHealth Lab projects include the 4 Diabetes Support System (4DSS), glycemic variability measurement, and blood glucose prediction. The 4DSS project provides problem detection and decision support for patients with type 1 diabetes mellitus (T1DM). The glycemic variability metric is a tool that physicians can use to help gauge overall glycemic control. The blood glucose prediction project aims to anticipate impending blood glucose control problems, thereby enabling preventative action.

Patients with T1DM use insulin to control their blood glucose levels to a range prescribed by their doctor. Poor blood glucose control fails to maintain this range, and over time leads to several adverse effects including blindness, kidney failure, and premature death (DCCT Research Group and others, 1987). The glycemic variability metric provides an easy way to measure glycemic control (Rodbard et al., 2009). Blood glucose prediction can enable patients to preemptively correct blood glucose levels before a dangerous excursion occurs. A detailed description of diabetes and of the machine learning techniques used in this work is provided in Chapter 2.

The first contribution of this thesis is a Consensus Perceived Glycemic Variability (CPGV) metric. The metric was built using machine learning algorithms to capture the consensus impression of glycemic variability of a group of expert physicians. The metric combines several calculations and metrics performed on the blood glucose signal. The glycemic variability metric developed in this work is the third iteration of the project, with (Vernier, 2009) being the first and (Wiley, 2011) the second. The first two iterations focused on classifying excessive vs. acceptable glycemic variability based on the impressions of two local physicians. This work extends that idea to provide a continuous metric based on the impressions of 12 physicians from across the country and around the world.
A complete report and evaluation of the metric is presented in Chapter 3. A full list of the machine learning algorithms and results for the metric is provided in Appendix A.

The second contribution of this work is a framework for a blood glucose prediction system. The prediction system incorporates a physiological model of blood glucose as well as other factors. The blood glucose prediction system uses machine learning to combine the components of the physiological model with the other factors to predict blood glucose levels 30 and 60 minutes in the future. The full description of the blood glucose prediction work is given in Chapter 4. A comprehensive list of prediction results is given in Appendix B.

Chapter 5 describes related research. This includes research on physiological models, predictive blood glucose control, glycemic variability, problem detection, and time series prediction. Chapter 6 describes possibilities for future work. Chapter 7 gives a summary and conclusion to this work.

2 Background

This chapter provides background relevant to this work. First, diabetes is defined and the challenges of managing it are presented. Next, the work is positioned within SmartHealth Lab research at Ohio University on intelligent diabetes management. Finally, the machine learning approaches used in this work are described.

2.1 Diabetes

Diabetes mellitus, or simply diabetes, is a chronic disease which disrupts the body's natural ability to manage blood glucose levels. There are two types of diabetes: Type 1 Diabetes Mellitus (T1DM) and Type 2 Diabetes Mellitus (T2DM). Patients with T1DM do not produce insulin on their own to control their blood glucose (American Diabetes Association, 2012c). Patients with T2DM do produce insulin, but in amounts insufficient to control their blood glucose. Worldwide, there are about 350 million people living with diabetes (Danaei et al., 2011).
Of those, about 5-10% have T1DM, for which there is no known cure or prevention (World Health Organization, 2011). In 2007, the total annual cost of diabetes in the United States was $174 billion. In five years this figure increased by 41%, to $245 billion in 2012 (American Diabetes Association, 2013).

The primary goal of diabetes management is for the patient to maintain a glucose level in a range prescribed by the patient's physician, typically between 70 and 160 mg/dl (American Diabetes Association, 2012a). A patient is hypoglycemic when their glucose level drops below this range; when the glucose level rises above this range, they are hyperglycemic. When a patient experiences hypoglycemia, they typically feel short-term side effects including dizziness and confusion, and they are at risk for more serious problems, including coma and seizure (American Diabetes Association, 2012b). Prolonged hyperglycemia is known to increase the risk of chronic complications such as heart disease, kidney failure, and blindness (DCCT Research Group and others, 1987).

The blood glucose level is affected directly by insulin and by carbohydrate intake. It is also affected indirectly by several life events such as stress, exercise, and sleep. Patients continuously monitor their blood glucose levels and make corrections in an attempt to keep the level within the range prescribed by their doctor. Some patients, usually over time, become insensitive to the symptoms of hypoglycemia. Undetected hypoglycemia during sleep can be particularly dangerous, since the patient may not wake up in time to take action. These patients need to take extra care to control their disease to avoid and correct hypoglycemia. To monitor diabetes control on a day-to-day basis, patients with T1DM take a fingerstick blood sample 4-8 times a day to measure their blood glucose.
An insulin pump gives patients more control over when and how much insulin they take than traditional injections. The pump delivers a basal amount of insulin continuously throughout the day. It delivers additional boluses of insulin as needed for meals or to correct hyperglycemia. A Continuous Glucose Monitor (CGM) can also be used, which takes a measurement of the blood glucose every 5 minutes. A CGM does not replace fingersticks, since the CGM is not as accurate as fingersticks and needs to be calibrated several times a day. Patients with a Medtronic insulin pump have access to the Bolus Wizard®. The Bolus Wizard uses information from the current blood glucose level and carbohydrate intake to calculate how much bolus insulin a patient needs to take to correct for hyperglycemia or to compensate for a meal. Medtronic also provides CareLink® software, which allows patients to upload and review their pump and CGM data.

When diabetes patients visit their physicians, they take a blood test to determine their HbA1c (glycosylated hemoglobin), which reflects their average glucose level over a six-week period. It is recommended that the HbA1c be below 7% for most patients (American Diabetes Association, 2012a). Physicians use information from HbA1c, fingersticks, insulin, and CGM data to make recommendations to improve patients' diabetes control.

2.2 SmartHealth Lab Research

The SmartHealth Lab is currently working on three major projects (Marling et al., 2012): the 4 Diabetes Support System (4DSS), a glycemic variability metric, and blood glucose prediction. As of 2013, the SmartHealth Lab has collected data from three clinical research studies, and a fourth ongoing study, of patients with T1DM to develop its projects.

2.2.1 The 4 Diabetes Support System™ Project

The 4DSS project is a case-based reasoning (CBR) system that identifies problems and offers possible treatments.
The CBR system grows its case base when a physician identifies a new problem with a patient and decides on a clinical treatment. The outcome of the case is evaluated based on whether the treatment was followed by the patient, and whether the treatment successfully fixed the problem. When evaluating a new patient, the 4DSS system uses life events and blood glucose levels to find problems. The system then finds the closest matches among other occurrences of the problems that it finds. The suggested treatment is an adaptation of the treatment of the most similar cases. If the treatment is followed and successful, then the new case is added to the case base for future diagnoses.

Random samples of identified problems and treatments were shown to a panel of physicians (Marling et al., 2012). The physicians agreed 90% of the time that the problem identification system would be useful for physicians. They agreed 80% of the time that the cases used for suggesting treatment were similar to the identified problems. They agreed 70% of the time that the suggested treatment would be beneficial to the patient.

The goal of this project is to provide automated problem detection and treatments to physicians and nurses, who can then decide to relay the suggested treatments to patients if they are deemed appropriate. This would allow physicians to give treatments and advice to patients more frequently than at their routine clinical check-ups.

2.2.2 A Consensus Perceived Glycemic Variability Metric

Glycemic variability is an important part of diabetes management. Excessive glycemic variability has been linked to hypoglycemia unawareness (Rodbard et al., 2009), which can lead to dangerously prolonged hypoglycemia. Automated detection of glycemic variability would identify potentially at-risk patients. There is no current metric for glycemic variability which has been agreed upon by physicians, so it is not routinely assessed in clinical practice.
However, physicians are able to recognize excessive glycemic variability when they see it in blood glucose plots. The goal of this project is to capture that physician perception in order to measure glycemic variability for clinical use. Chapter 3 presents the Consensus Perceived Glycemic Variability (CPGV) metric that has been developed to supplement HbA1c as a measure of overall glycemic control in clinical practice. To develop this metric, 12 physicians managing patients with type 1 diabetes rated 250 24-hour continuous glucose monitoring (CGM) plots as exhibiting (1) low, (2) borderline, (3) high, or (4) extremely high glycemic variability. When physician ratings were not unanimous, they were averaged to obtain a consensus. Descriptive features derived from the CGM plots were used to train machine learning algorithms to match consensus ratings.

2.2.3 Blood Glucose Prediction

When managing blood glucose, there is a time delay between an action and its outcome. Food needs to be digested, insulin needs to be absorbed, and the CGM sensor measures the glucose in the interstitial tissue, which lags the glucose in the blood plasma by about 15 minutes. Therefore, future uncertainty is a limiting factor in detecting problems in real time, or before they ever occur. Blood glucose prediction is a time series forecasting problem. Blood glucose is predicted based on past blood glucose levels, insulin data, meal data, exercise, medications, stress, sleep patterns, work schedules, etc. Some patients respond differently to certain life events such as stress. Chapter 4 presents a glucose prediction model which uses a physiological model to combine certain features in an informed way to improve prediction accuracy for 30 and 60 minutes in the future. This model is an incremental prediction model that is trained on each patient individually. The goal of this project is to incorporate a real-time prediction system for patients.
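The time series formulation can be made concrete with a short sketch. This is illustrative only, not the thesis implementation: the window and horizon sizes are assumptions chosen to match the 5-minute CGM sampling rate and the 30-minute prediction horizon, and the actual system would append insulin and life-event features to each feature vector.

```python
def make_training_pairs(cgm, window=6, horizon=6):
    """Build (features, target) pairs from a CGM series.

    With 5-minute sampling, window=6 uses the last 30 minutes of
    readings as features; horizon=6 targets the reading 30 minutes
    after the end of the window.
    """
    pairs = []
    for t in range(window - 1, len(cgm) - horizon):
        features = cgm[t - window + 1:t + 1]  # readings up to time t
        target = cgm[t + horizon]             # reading 30 minutes later
        pairs.append((features, target))
    return pairs
```

Any of the regression models described in Section 2.3 could then be trained on such pairs, one model per patient.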
The model could provide feedback to the patient, who could use the predictions to preemptively correct a problem. The model could also be used in a closed-loop artificial pancreas to administer the appropriate amount of insulin without patient intervention.

2.3 Machine Learning Algorithms

This section describes the machine learning and statistical approaches used in this work. These are Linear Regression (LR), Support Vector Regression (SVR), Multilayer Perceptrons (MLP), and Auto Regressive Integrated Moving Average (ARIMA).

2.3.1 Linear Regression

Linear Regression (LR), like all of the regression models in this work, is a means of creating a mathematical function, or model, which takes input values and outputs a close approximation to a desired value. The input values are a collection of features, called the input vector, which is computed from a dataset. The purpose of regression is to use data which is already known to approximate data which is difficult to obtain. In this work, LR is one of the machine learning approaches used for measuring glycemic variability, as described in Chapter 3, where features computed from data automatically collected by a sensor approximate a manual evaluation by physicians. The simplest form of LR, as described in (Bishop, 2007), takes the following form:

    y(x, w) = w^T x + w_0    (2.1)

where x is a vector of features and w is a weight vector. Equation 2.1 is a linear combination of the input variables x weighted by w. The term w_0 allows for any fixed offset in the data and is usually called the bias, not to be confused with statistical bias. Since it is a linear combination of the input variables, the simple form of LR is limited in what it can fit. For this reason, the input variable x is usually replaced by a basis function φ(x), as in Equation 2.2.
    y(x, w) = w^T \phi(x) + w_0    (2.2)

Using the form in Equation 2.2, the behavior of Equation 2.1 can be recovered by using the identity φ(x) = x. However, the function y(x, w) can be made nonlinear in the input vector x by using a nonlinear basis function, such as the polynomial basis function

    \phi_i(x) = x^i    (2.3)

or the Gaussian basis function

    \phi_i(x) = \exp\left( -\frac{(x - \mu_i)^2}{2s^2} \right)    (2.4)

where μ_i controls the location of the Gaussian curve and s controls its scale. Even if the basis function is nonlinear, Equation 2.2 is still considered an LR model, since it is linear in w. Many nonlinear basis functions exist, but the choice of basis function does not affect the method of computing the vector w. The implementation of LR in this work is the WEKA implementation (Hall et al., 2009), which uses the identity φ(x) = x for the basis function.

The vector w is computed on a set of training vectors x_1, x_2, ..., x_N with target values t by minimizing the data-dependent sum-of-squares error function

    E_D(w) = \frac{1}{2} \sum_{n=1}^{N} (t_n - w^T \phi(x_n))^2    (2.5)

Since this function is convex, its minimum is where the gradient is zero. The gradient of Equation 2.5 is

    \nabla E_D(w) = \sum_{n=1}^{N} t_n \phi(x_n)^T - w^T \sum_{n=1}^{N} \phi(x_n) \phi(x_n)^T    (2.6)

Setting ∇E_D(w) = 0 and solving for w gives

    w = (\Phi^T \Phi)^{-1} \Phi^T t    (2.7)

where Φ is the N × M design matrix

    \Phi = \begin{pmatrix}
      \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{M-1}(x_1) \\
      \phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_{M-1}(x_2) \\
      \vdots      & \vdots      & \ddots & \vdots          \\
      \phi_0(x_N) & \phi_1(x_N) & \cdots & \phi_{M-1}(x_N)
    \end{pmatrix}    (2.8)
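As a minimal sketch (not the WEKA implementation used in this work), the closed-form solution of Equation 2.7 can be computed directly for a single feature with the identity basis φ(x) = x. Here Φ has rows [1, x_n], so the normal equations reduce to a 2×2 linear system; setting lam > 0 gives the regularized solution of Equation 2.11.

```python
def fit_linear(xs, ts, lam=0.0):
    """Closed-form fit of y = w0 + w1*x (Equation 2.7).

    With lam > 0 this becomes the regularized solution
    w = (lam*I + Phi^T Phi)^(-1) Phi^T t of Equation 2.11.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    st = sum(ts)
    sxt = sum(x * t for x, t in zip(xs, ts))
    # Normal equations (lam*I + Phi^T Phi) w = Phi^T t for Phi rows [1, x_n]:
    #   [n + lam   sx       ] [w0]   [st ]
    #   [sx        sxx + lam] [w1] = [sxt]
    a, b, d = n + lam, sx, sxx + lam
    det = a * d - b * b
    w0 = (d * st - b * sxt) / det
    w1 = (a * sxt - b * st) / det
    return w0, w1


# Noise-free data on the line t = 1 + 2x: the unregularized fit recovers it.
w0, w1 = fit_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w0, 6), round(w1, 6))  # 1.0 2.0
```

With a large lam, the learned weights shrink toward zero, which is the over-fitting/complexity trade-off described next.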
In order to reduce over-fitting of the training data, a regularization term is introduced into the error function, which takes the form

    E_D(w) + \lambda E_W(w)    (2.9)

where λ is the regularization coefficient, E_D(w) is as defined in Equation 2.5, and E_W(w) is given by

    E_W(w) = \frac{1}{2} w^T w    (2.10)

By introducing λ into the error function, the magnitudes of the parameters in the vector w are included in the minimization. This helps reduce over-fitting by learning smaller parameters, at the cost of increasing the error on the training set. The justification for this compromise is based on Occam's razor, which states that the simplest solution is usually correct (Domingos, 1999). The regularization parameter can take any value, so it needs to be tuned to find a good balance between data error and model complexity. Figure 2.1 shows an example of tuning λ on a dataset where ln(λ) ranges in steps of 5 from -50 to 0, i.e., λ = e^{-50} to e^0, using Root Mean Square Error (RMSE) as the measure of error. Solving Equation 2.7 with the changes from Equation 2.9 gives

    w = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T t    (2.11)

where I is the identity matrix.

Figure 2.1: Tuning the regularization coefficient, λ, on a separate validation dataset using RMSE to measure the error.

2.3.2 Support Vector Regression

Support Vector Regression (SVR) was first described by (Vapnik, 1995) and has been described in many works, including (Vapnik, 1998; Smola and Schölkopf, 2004; Bishop, 2007). SVRs are based on the equation

    y(x_n) = w^T \phi(x_n) + b    (2.12)

where x_n is an input vector, w is the vector of learned weights, φ is a transformation function, and b is the bias. SVRs minimize an error function based on the training vectors and their target values. In this work, the type of SVR used is an ε-SVR, in which the error function is

    E_\varepsilon(y(x_n) - t_n) =
    \begin{cases}
      0 & \text{if } |y(x_n) - t_n| < \varepsilon \\
      |y(x_n) - t_n| - \varepsilon & \text{otherwise}
    \end{cases}    (2.13)

where ε > 0.
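A direct transcription of the ε-insensitive error of Equation 2.13, with illustrative values:

```python
def eps_insensitive_error(y_pred, target, eps):
    """Equation 2.13: zero inside the eps-tube, absolute error minus eps outside."""
    diff = abs(y_pred - target)
    return 0.0 if diff < eps else diff - eps


# A residual of 0.2 inside an eps = 0.25 tube costs nothing;
# a residual of 1.0 costs 1.0 - 0.25 = 0.75.
print(eps_insensitive_error(5.0, 5.2, 0.25))   # 0.0
print(eps_insensitive_error(5.0, 6.0, 0.25))   # 0.75
```

Because small residuals cost nothing, the fitted curve is pulled only by points at or outside the tube, which is what makes the solution sparse in support vectors.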
If the difference between y(x_n) and the target t_n is less than ε, i.e., |y(x_n) - t_n| < ε, then the error is 0; otherwise the error is the absolute difference reduced by ε. The region where |y(x_n) - t_n| < ε is referred to as the ε-tube.

A regularization parameter, C, is introduced to give the error function

    C \sum_{n=1}^{N} E_\varepsilon(y(x_n) - t_n) + \frac{1}{2} \|w\|^2    (2.14)

For target points that lie outside of the ε-tube, slack variables are introduced. There are two slack variables, ξ_n and ξ̂_n, for each vector x_n, with ξ_n, ξ̂_n ≥ 0, such that

    t_n \leq y(x_n) + \varepsilon + \xi_n    (2.15)

    t_n \geq y(x_n) - \varepsilon - \hat{\xi}_n    (2.16)

where ξ_n represents the error for a target that lies above or on the top edge of the ε-tube, and ξ̂_n represents the error for a target that lies below or on the bottom edge of the ε-tube. Figure 2.2 shows an example of an ε-tube.

Figure 2.2: y(x_n) curve showing the ε-tube. Points above the tube have ξ_n > 0, points below the tube have ξ̂_n > 0. Filled-in points represent support vectors.

Using the slack variables, Equation 2.14 can be rewritten as

    C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2} \|w\|^2    (2.17)

which can be minimized subject to the constraints in Equations 2.15 and 2.16 and ξ_n, ξ̂_n ≥ 0 by using Lagrange multipliers and optimizing the following Lagrangian (Bishop, 2007):

    L = C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n) + \frac{1}{2} \|w\|^2
        - \sum_{n=1}^{N} (\mu_n \xi_n + \hat{\mu}_n \hat{\xi}_n)
        - \sum_{n=1}^{N} a_n (\varepsilon + \xi_n + y(x_n) - t_n)
        - \sum_{n=1}^{N} \hat{a}_n (\varepsilon + \hat{\xi}_n - y(x_n) + t_n)    (2.18)

Substituting Equation 2.12 for y(x_n) and setting the partial derivatives with respect to w, b, ξ_n, and ξ̂_n to zero gives

    \frac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{n=1}^{N} (a_n - \hat{a}_n) \phi(x_n)    (2.19)

    \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{n=1}^{N} (a_n - \hat{a}_n) = 0    (2.20)

    \frac{\partial L}{\partial \xi_n} = 0 \Rightarrow a_n + \mu_n = C    (2.21)

    \frac{\partial L}{\partial \hat{\xi}_n} = 0 \Rightarrow \hat{a}_n + \hat{\mu}_n = C    (2.22)

Using these results with Equation 2.18 gives the optimization

    \min_{a, \hat{a}} \; \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} (a_n - \hat{a}_n)(a_m - \hat{a}_m) k(x_n, x_m)
        + \varepsilon \sum_{n=1}^{N} (a_n + \hat{a}_n) - \sum_{n=1}^{N} (a_n - \hat{a}_n) t_n    (2.23)

subject to

    \sum_{n=1}^{N} (a_n - \hat{a}_n) = 0, \quad 0 \leq a_n, \hat{a}_n \leq C

Equation 2.23 is called the dual representation equation. The function k(x_n, x_m) is called the kernel function (Section 2.3.3).
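Once the dual coefficients a_n − â_n are found, both training and prediction touch the data only through kernel evaluations: the prediction form of Equation 2.28 is a kernel-weighted sum over support vectors plus the bias. A small sketch, using hypothetical coefficients (not a trained model) and the Gaussian kernel of Section 2.3.3 on scalar inputs:

```python
import math

def gaussian_kernel(x, xp, gamma=0.5):
    """Gaussian kernel k(x, x') = exp(-gamma * ||x - x'||^2), scalar inputs."""
    return math.exp(-gamma * (x - xp) ** 2)

def svr_predict(x, support, kernel, b):
    """Equation 2.28: y(x) = sum_n (a_n - a^_n) k(x, x_n) + b.

    `support` is a list of (coef, x_n) pairs, where coef = a_n - a^_n.
    """
    return sum(coef * kernel(x, xn) for coef, xn in support) + b

# Hypothetical support vectors and dual coefficients, for illustration only.
support = [(1.5, 0.0), (-0.8, 2.0)]
y = svr_predict(1.0, support, gaussian_kernel, b=0.2)
```

Only support vectors appear in the sum, so prediction cost scales with the number of support vectors, not with the full training set.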
The dual representation equation can be solved subject to the Karush-Kuhn-Tucker (KKT) conditions, which provide stopping criteria. The KKT conditions imply that the products of the dual variables and the constraints vanish at the solution. The KKT conditions are:

    a_n (\varepsilon + \xi_n + y(x_n) - t_n) = 0    (2.24)

    \hat{a}_n (\varepsilon + \hat{\xi}_n - y(x_n) + t_n) = 0    (2.25)

    (C - a_n) \xi_n = 0    (2.26)

    (C - \hat{a}_n) \hat{\xi}_n = 0    (2.27)

By adding Equations 2.24 and 2.25 together, it can be seen that either a_n or â_n (or both) must be zero, since ε > 0 and a_n, â_n, ξ_n, ξ̂_n ≥ 0. In the case where either a_n or â_n is non-zero, x_n is called a support vector. If a_n is non-zero, then Equation 2.24 implies that t_n either lies on the ε-tube (ξ_n = 0) or lies above it (ξ_n > 0). Similarly, if â_n is non-zero, then Equation 2.25 implies that t_n either lies on the ε-tube (ξ̂_n = 0) or lies below it (ξ̂_n > 0). If both a_n and â_n are zero, then t_n is within the ε-tube and x_n is not a support vector, as shown in Figure 2.2. Substituting Equation 2.19 for w in Equation 2.12 yields

    y(x) = \sum_{n=1}^{N} (a_n - \hat{a}_n) k(x, x_n) + b    (2.28)

which allows for predictions on a test vector x. The bias b can be calculated on a support vector x_n for which 0 < a_n < C and ξ_n = 0, i.e., a support vector that lies on the edge of the ε-tube. Equation 2.24 with ξ_n = 0 implies that

    \varepsilon + y(x_n) - t_n = 0    (2.29)

Substituting Equations 2.12 and 2.19 gives

    b = t_n - \varepsilon - \sum_{m=1}^{N} (a_m - \hat{a}_m) k(x_m, x_n)    (2.30)

A similar equation can be constructed for support vectors having 0 < â_n < C, and in practice it is best to average the bias over all support vectors that lie on the ε-tube. The implementation of SVR in this work is from the LIBSVM package (Chang and Lin, 2011).

2.3.3 Kernel Functions

There is a class of functions that are valid kernels, all of which take the form of Equation 2.31 for some function φ (Herbrich, 2002).
    k(x, x') = \phi(x)^T \phi(x')    (2.31)

This work uses two kernel functions: the linear kernel

    k(x, x') = x^T x'    (2.32)

and the Gaussian kernel

    k(x, x') = \exp(-\gamma \|x - x'\|^2)    (2.33)

For the linear kernel, the φ function is simply

    \phi(x) = x    (2.34)

Kernels allow the input space to be mapped into higher dimensions to find relations in the data. The Gaussian kernel creates an infinite-dimensional feature space.

2.3.4 Multilayer Perceptron

Multilayer perceptrons are built from perceptrons (Figure 2.3) arranged in a directed graph, as shown in Figure 2.4. An individual perceptron takes a vector of inputs a and computes the dot product of a with a weight vector w. The perceptron uses an additional fixed input of 1 for a_0, so the weight w_0 represents the bias term. The dot product is then used as the input to an activation function g(x), producing a single numeric output.

Figure 2.3: The Perceptron. Boxes represent input values. Each edge stores an associated weight. An implicit input of a_0 = 1 allows w_0 to represent the bias term. The output of the perceptron is the sum of the inputs multiplied by their weights, passed to the activation function.

In this work, the perceptrons in the MLP use two activation functions: the identity function and the sigmoid function. The identity function is used only by the perceptron that gives the final output of the network. This function is given by

    g(a_i) = a_i    (2.35)

The sigmoid function is used by the perceptrons in the hidden layer of the network. This function takes the form

    g(a_i) = \frac{1}{1 + \exp(-a_i)}    (2.36)

The sigmoid function curve is shown in Figure 2.5. The MLP is called a feed-forward network because the inputs are fed into the MLP at the input layer to compute the values of all of the perceptrons in the hidden layer; then the values of the hidden-layer perceptrons are fed into the output perceptron to produce the final output.

Figure 2.4: The Multilayer Perceptron. Boxes represent input values.
Circles represent perceptrons. Each edge stores an associated weight. Perceptrons in the hidden layer use the sigmoid activation function. The output perceptron uses the identity activation function.

Figure 2.5: The Sigmoid function.

Like the other ML algorithms, the MLP is trained by minimizing its error. For the MLP, the error is expressed as

    E = \frac{1}{2} \sum_{n=1}^{N} (y(x_n) - t_n)^2    (2.37)

where E is the error, y(x_n) is the output of the MLP for the nth training vector, and t_n is the target value for the nth training vector. The error is minimized using a gradient descent approach on the following differential:

    \frac{\partial E}{\partial w_{i,j}}    (2.38)

where w_{i,j} is the weight of input i for perceptron j. Equation 2.38 represents the change of error with respect to the change of w_{i,j} for the current state of the network. The error decreases the fastest if the weights are changed in the direction opposite the gradient, so the weight change function for the output perceptron becomes (Russell et al., 1995)

    \Delta w_i = -\frac{\partial E}{\partial w_i} = a_i \sum_{n=1}^{N} (t_n - y(x_n))    (2.39)

where \sum_{n=1}^{N} (t_n - y(x_n)) is the negative of the derivative of Equation 2.37, and a_i is input i for the output perceptron. The change is combined over all training vectors. Equation 2.39 computes the direction in which the weight vector w should be changed, but following the gradient exactly would involve a continuous descent over the error surface. To overcome this impractical limitation of gradient descent, the weights are changed by taking discrete steps along the gradient. The weight update function for the output perceptron becomes

    w_i = w_i + \gamma \Delta w_i    (2.40)

where γ is the learning rate. The learning rate determines the step size, which is proportional to the magnitude of the gradient. A larger step size allows the network to learn faster, but a step size that is too large can cause oscillations around the optimum value. For the hidden layer perceptrons, the error is propagated backwards from the output perceptron.
Each perceptron j of the hidden layer shares a portion of the error from Equation 2.39. The weight change function for hidden layer perceptron j is given by

    \Delta w_{i,j} = g'(in_j) \, a_{i,j} \sum_{k=0}^{K} w_{k,j} \Delta w_j    (2.41)

where

    in_j = \sum_{k=0}^{K} a_k w_{k,j}, \quad g'(a) = g(a)(1 - g(a))

Here w_{i,j} is the weight for input i to perceptron j, g(a) is the sigmoid function shown in Equation 2.36, a_{i,j} is input i to perceptron j, and Δw_j is defined in Equation 2.39. The weight update function for the hidden layer perceptrons is similar to that of the output perceptron:

    w_{i,j} = w_{i,j} + \gamma \Delta w_{i,j}    (2.42)

In addition to the learning rate, a parameter called momentum is introduced. The momentum parameter alters the weight change functions to include a portion of the weight change from the previous learning iteration. Its purpose is to help overcome an intrinsic issue with the gradient descent approach for MLPs: the error surface in Equation 2.38 is not guaranteed to have a local minimum that is also the global minimum. This means that if gradient descent finds a point at which no small change to the weights decreases the error, it might not be the best solution, which would decrease performance. The momentum helps the weight change function escape these non-optimal local minima, but this only works if the momentum is large enough to push the weight change function past the local minimum point. However, if the momentum is too large, the weights may oscillate or fall into a different non-optimal local minimum. The updated weight change functions with the momentum parameter become

    \Delta w_i(T + 1) = \beta \Delta w_i(T) + a_i \sum_{n=1}^{N} (t_n - y(x_n))    (2.43)
    \Delta w_{i,j}(T + 1) = \beta \Delta w_{i,j}(T) + g'(in_j) \, a_{i,j} \sum_{k=0}^{K} w_{k,j} \Delta w_j(T + 1)    (2.44)

where T is the learning iteration and β is the momentum. The MLP implementation in this work is from the WEKA package (Hall et al., 2009).
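The feed-forward computation and the momentum-style update can be sketched as follows (an illustrative sketch, not the WEKA implementation; the function names and the single-update signature are assumptions for the example):

```python
import math

def sigmoid(a):
    """Sigmoid activation (Equation 2.36)."""
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(x, hidden_weights, output_weights):
    """Feed-forward pass: sigmoid hidden layer, identity output perceptron.

    Each weight vector carries its bias at index 0, matching the implicit
    fixed input a_0 = 1.
    """
    inputs = [1.0] + list(x)
    hidden = [sigmoid(sum(w * a for w, a in zip(wv, inputs)))
              for wv in hidden_weights]
    out_inputs = [1.0] + hidden
    return sum(w * a for w, a in zip(output_weights, out_inputs))

def momentum_update(w, prev_delta, grad_step, beta, gamma):
    """One weight update in the style of Equations 2.40 and 2.43:
    delta(T+1) = beta * delta(T) + grad_step;  w = w + gamma * delta(T+1)."""
    delta = beta * prev_delta + grad_step
    return w + gamma * delta, delta
```

With all weights zero, every hidden unit outputs sigmoid(0) = 0.5, so the network output is simply the weighted sum of those constants.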
In this work, there are 500 learning iterations for the MLP, and the hidden layer has |x_n|/2 + 1 perceptrons.

2.3.5 Auto Regressive Integrated Moving Average

Auto Regressive Integrated Moving Average (ARIMA) is a statistical model of a time series which creates a function for the value of the time series at time t based on previous values. This allows an ARIMA model to extrapolate future values of the time series, since the model assumes homogeneity, i.e., that a portion of the time series behaves much like the rest of the time series. The ARIMA model and its equations are described in (Box et al., 2008). The ARIMA model is generally referred to as ARIMA(p, d, q), in which the parameters p, d, and q correspond to the orders of the individual processes of the ARIMA model: Auto Regressive (AR), Integrated (I), and Moving Average (MA), respectively. When one of the parameters is 0, that process drops out of the model; e.g., ARIMA(1,0,1) is equivalent to an ARMA(1,1) model. Given a time series of the form y = {y_t, y_{t−1}, ..., y_1}, the ARIMA(p, d, q) model can be described by

    y_t = \phi_1 y_{t-1} + ... + \phi_{d+p} y_{t-d-p} - \beta_1 \epsilon_{t-1} - ... - \beta_q \epsilon_{t-q} + \epsilon_t    (2.45)

where the φ are the components of the AR model, the β are the components of the MA model, and ε_t represents the error of the prediction for time t. Equation 2.45 can be used to predict future values of the time series. If y_t is the last known value of the time series, then the prediction for the next value of the series is given by

    y_{t+1} = \phi_1 y_t + ... + \phi_{d+p} y_{t-d-p+1} - \beta_1 \epsilon_t - ... - \beta_q \epsilon_{t-q+1}    (2.46)

ε_{t+1} drops out, since the error is unknown and the expected value of ε_i is 0. The predicted value for y_{t+1} can then be used to predict y_{t+2}, and so on, so the value for y_{t+l} can eventually be predicted. Since the predicted value for y_{t+l} is based on l − 1 predictions, the uncertainty of the predictions increases as l increases.
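The recursive use of Equation 2.46 can be sketched for the pure autoregressive case (a simplification for illustration: the MA error terms are replaced by their expected value of 0, and the function below is not the implementation used in this work):

```python
def ar_forecast(history, phi, steps):
    """Extrapolate an AR(p) series several steps ahead by feeding each
    prediction back into the model.

    Future error terms are taken as their expected value, 0, which is why
    uncertainty grows with the number of steps ahead.
    """
    series = list(history)
    p = len(phi)
    for _ in range(steps):
        recent = series[-p:][::-1]          # y_t, y_{t-1}, ..., y_{t-p+1}
        series.append(sum(c * v for c, v in zip(phi, recent)))
    return series[len(series) - steps:]
```

For example, an AR(1) series with φ_1 = 0.5 starting from y_t = 4.0 decays geometrically: the two-step forecast is [2.0, 1.0].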
The ARIMA model in this work is implemented in the R statistical package (R Core Team, 2013) using the auto.ARIMA function. The auto.ARIMA function determines the values of p, d, and q using the Hyndman and Khandakar algorithm (Hyndman and Khandakar, 2008).

3 Consensus Perceived Glycemic Variability Metric

This chapter presents the new Consensus Perceived Glycemic Variability (CPGV) metric. First, glycemic variability is described and the purpose of measuring it is explained. Next, the development and evaluation of the new CPGV metric is presented. Finally, there is a discussion which includes how the metric can be further used as a feature for predicting blood glucose. A paper based on the work presented in this chapter has been published in the Journal of Diabetes Science and Technology (Marling et al., 2013).

3.1 Background

Both physicians and patients are now beginning to consider glycemic variability in addition to maintaining a glycemic range (Siegelaar et al., 2010). Increased glycemic variability is associated with poor glycemic control (Rodbard et al., 2009) and is a strong predictor of hypoglycemia (Monnier et al., 2011; Qu et al., 2012), which has been linked to excessive morbidity and mortality (Zoungas et al., 2010; Seaquist et al., 2012). While physicians acknowledge the need for a glycemic variability measurement, there is no current consensus on the best glycemic variability metric to use, or on the criteria for acceptable or excessive glycemic variability (Bergenstal et al., 2013). Stated simply, glycemic variability is the fluctuation in blood glucose level over time. However, many factors are considered by a physician to determine the variability displayed on a patient's glucose chart, and no single mathematical feature has been able to closely measure variability. Figure 3.1 shows four examples of glucose variability.
The variability displayed by each subfigure was evaluated by physicians (described in Section 3.2.1) and unanimously rated as low (3.1a), borderline (3.1b), high (3.1c), or extremely high (3.1d).

Figure 3.1: Blood glucose plots over 24 hours showing glycemic variability: (a) low, (b) borderline, (c) high, (d) extremely high

3.1.1 Distinction from HbA1c

Patients know that a lower HbA1c is better, and they know that they are improving their diabetes management if they are able to reduce their HbA1c toward their target, which is 7% for most patients (American Diabetes Association, 2012a). However, there is no method for patients to determine whether their glycemic variability is improving. Since HbA1c is an average, it does not capture variability, and a patient may have a low HbA1c and still be at risk for complications if his or her glycemic variability is high. Some patients experience many hypoglycemic events, which puts them at high risk while also reducing their HbA1c. Their low HbA1c, considered on its own, would suggest good diabetes management; glycemic variability, however, would reflect this high risk.

3.1.2 Difficulty of Quantifying Glycemic Variability

Quantifying glycemic variability is difficult to automate, since physicians consider several factors when reviewing a patient's data. Some of these factors include the number of excursions from a normal glucose level, the magnitude of the excursions, how rapidly the glucose level is changing, and in what direction. Physicians also know that blood glucose readings from the sensors can be off by as much as 20% (Klonoff, 2005), so many small changes, or a very rapid change, are likely the result of signal noise. For these reasons, glycemic variability is not routinely assessed in clinical practice.
3.1.3 Previous Measurement Methods and Studies

One of the earliest methods designed to measure glycemic variability is the mean amplitude of glycemic excursions (MAGE) (Service et al., 1970). Since then, other metrics have been developed. Previous studies have been conducted at Ohio University to classify blood glucose plots as excessively variable or not using these metrics. The first of these studies (Vernier, 2009) used MAGE, 75-point excursion frequency (EF), and distance traveled (DT) to classify plots as excessively variable or not. A naive Bayes classifier trained during this study agreed with physicians' classifications 85% of the time. The second study (Wiley, 2011) added to the work of the first study by engineering additional features and smoothing the sensor data. These features, described in Section 3.2.2, were used to build a multilayer perceptron model that agreed with the physicians 93.8% of the time. Both of the previous studies, however, only classified plots as exhibiting excessive glycemic variability or not. This makes the previous models useful for detecting excessive glycemic variability, but not as useful for measuring overall diabetes control. In this study, a metric is developed to represent a patient's glucose variability as a single number that can be used to assess control and evaluate progress.

3.1.4 The New Metric

To address the limitations of the previous classifiers, a new metric was developed for potential clinical use as a measure of glycemic variability. This new metric could be used in a similar way as HbA1c is used now. The patient would submit Continuous Glucose Monitoring (CGM) data covering a few days before each doctor's appointment. The physician would then be able to determine whether the patient is improving his or her glycemic control by comparing metric values. The metric is trained on a consensus of the perceived glycemic variability exhibited in patient blood glucose charts.
Glycemic variability is perceived because it is evaluated and quantified by physicians reviewing blood glucose charts. There is a consensus because the evaluations of multiple physicians are combined. The new glycemic variability metric could also be used as a screen to classify blood glucose variability as excessive or not, like the previous models. This would be accomplished by determining a threshold value that separates excessive from acceptable glycemic variability. This threshold could provide a target for patients to achieve, much like the 7% target for HbA1c. A potential third use for the metric is to quantify glycemic variability as an input feature for a blood glucose prediction model (Chapter 4). Knowledge of glycemic variability could improve a prediction model's ability to predict the magnitude of change in blood glucose.

3.2 Methods

This section describes the methods used to develop the glycemic variability metric.

3.2.1 Data Collection and Format

To obtain a consensus of perceived glycemic variability, expert physicians were asked to rate blood glucose plots for their variability. Each expert was asked to give a rating of 1 to 4, where 1 was low variability, 2 was borderline variability, 3 was high variability, and 4 was extremely high variability. In total, twelve physicians rated 250 glucose plots, each representing 24 hours of CGM data. Each doctor was asked to rate 55 of these plots, but some of them chose to rate a second batch of plots, for a combined total of 820 ratings. The first five plots each doctor rated were the same for all doctors. These five plots were intended to discover whether any doctor's ratings were significantly different from the others', and to calibrate the doctors by providing them with examples spanning the spectrum of variability that they were to see. Each doctor received plots randomly, but in such a way that all plots would have approximately the same number of ratings, and no doctor would rate the same plot twice.
Thus, each plot was rated by either three or four different doctors. The consensus rating for each plot is the average of the ratings for that plot. The doctors rated the plots using a web interface away from the researchers, which eliminated any accidental influence of the researchers on the physicians.

3.2.2 Feature Engineering

In the first study (Vernier, 2009), three features were used: Mean Amplitude of Glycemic Excursions (MAGE), Excursion Frequency (EF), and Distance Traveled (DT). MAGE (Service et al., 1970) was developed to measure the mean of the glycemic excursions whose amplitude is greater than the standard deviation over the 24 hours. The amplitude of an excursion is the difference between the peak (high) and nadir (low). Similar to MAGE, EF measures the frequency of excursions of 75 mg/dl or greater. The third measurement, DT, simply sums the differences between each pair of consecutive glucose readings. In the second study (Wiley, 2011), eight additional features were added to the original three. The first of these is the standard deviation. The next feature added in the second study was Area Under the Curve (AUC). This feature computes the total area under the blood glucose curve relative to the minimum observed blood glucose value. Equation 3.1 shows the formula for calculating the area under the curve:

    AUC = \sum_{i=1}^{N} (x_i - x_{min})    (3.1)

The next set of features involves central image moments. To compute the central image moments, a binary intensity function f(x, y) is used to denote whether the pixel (x, y) lies within the image. The image is represented by the region C between the glucose plot curve and the minimum glucose value. The binary intensity function is defined in Equation 3.2:

    f(x, y) = \begin{cases} 1, & (x, y) \in C \\ 0, & \text{otherwise} \end{cases}    (3.2)

With this function, the image moments can be computed with Equation 3.3:

    m_{pq} = \sum_{x} \sum_{y} x^p y^q f(x, y)    (3.3)

Coincidentally, m_{00} is equivalent to the area under the curve.
The center of mass for the x and y axes can be calculated from m_{00}, m_{01}, and m_{10}, giving the centroid of the image (x̄, ȳ) defined by

    \bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}

The central image moments are calculated using the center of mass for the x and y axes, as shown in Equation 3.4:

    \mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y)    (3.4)

The central image moments used as features are µ_{11}, µ_{20}, µ_{02}, µ_{21}, µ_{12}, µ_{30}, and µ_{03}. Eccentricity is a measure of how close an object is to a circle. In this case, the object is the outside edge of the glucose plot in which the horizontal line passing through the minimum glucose value is curved back onto itself into a circle, with the rest of the glucose plot wrapped around this circle. Eccentricity can be calculated from the central image moments, shown in Equation 3.5 as the ratio of the maximum and minimum distances between the center of mass (x̄, ȳ) and the edge (Theodoridis and Koutroumbas, 2009):

    \epsilon = \frac{(\mu_{20} - \mu_{02})^2 + 4\mu_{11}}{\mu_{00}}    (3.5)

The Discrete Fourier Transform (DFT) converts a signal into sinusoidal components of different frequencies. Each component represents a frequency, along with a value, as a complex number encoding both the amplitude and phase of the sinusoidal wave. A summation of each component's sinusoidal frequency multiplied by its amplitude, accounting for phase shift, would reproduce the original signal. Large amplitudes in the lower frequencies correspond to fluctuations that take a long time, while large amplitudes in the higher frequencies correspond to high-frequency fluctuations such as sensor noise. The amplitudes of the first 24 DFT frequencies (FF) are taken as features. Roundness Ratio (RR) is the ratio between the square of the perimeter, P, and the area of the CGM plot, as represented in Equation 3.6. P differs from DT in that DT only measures the change in glucose level between each pair of consecutive glucose readings, whereas P is a measure of the Euclidean distance between the points.
    RR = \frac{P^2}{4\pi\mu_{00}}    (3.6)

Bending energy (BE) is a representation of the amount of energy a particle would require to traverse the glucose plot. BE can be calculated by computing the average curvature, as shown in Equation 3.7:

    BE = \frac{1}{P} \sum_{i=1}^{n-2} (\theta_{i+1} - \theta_i)^2, \quad \text{where } \theta_i = \arctan\left(\frac{y_{i+1} - y_i}{x_{i+1} - x_i}\right)    (3.7)

Direction codes (DCs) are the absolute differences between consecutive glucose levels; thus there are n − 1 DCs, where n is the number of blood glucose readings. These DCs are placed into three bins of size three starting at zero, i.e., b_1 = [0, 2], b_2 = [3, 5], and b_3 = [6, 8]. Any DC greater than 8 is not placed into a bin. Three features are derived from the DCs, corresponding to the ratio of the size of bin b_i to the total number of direction codes n − 1; thus, DC_i = c_i / (n − 1), where c_i is the number of DCs that fall into bin b_i. In this third study, two additional features were added to the feature set. Both of these new features quantify the maximum slope during an excursion: one for an increasing slope, and one for a decreasing slope. Two separate features were chosen because an increasing blood glucose level is caused by different factors than a decreasing one, and a decreasing blood glucose level is considered more dangerous, because it could indicate that the patient is heading toward hypoglycemia. The slope is calculated on an excursion from the time the excursion began to the time when the blood glucose traveled 75 mg/dl or more. Excursions begin at either a peak (local maximum) or nadir (local minimum). A distance greater than 75 mg/dl is included only when necessary for one end of the excursion or the other to be outside of the normal range. This calculation follows the prior implementation of the 75-point excursion frequency feature. Figure 3.2 shows decreasing slopes in red and increasing slopes in yellow. The intuition is that the slope associated with an excursion is more important than a slope that is not.
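A few of the simpler features above follow directly from their definitions; the sketch below is illustrative Python (not the implementation used in these studies) and assumes glucose readings arrive as an evenly spaced list of mg/dl values:

```python
def distance_traveled(g):
    """DT: sum of absolute differences between consecutive glucose readings."""
    return sum(abs(b - a) for a, b in zip(g, g[1:]))

def area_under_curve(g):
    """AUC relative to the minimum observed value (Equation 3.1)."""
    g_min = min(g)
    return sum(x - g_min for x in g)

def central_moment(pixels, p, q):
    """Central image moment mu_pq (Equation 3.4) over a binary region C,
    given as a collection of (x, y) pixel coordinates."""
    m00 = len(pixels)                       # m00 is the area of the region
    x_bar = sum(x for x, _ in pixels) / m00
    y_bar = sum(y for _, y in pixels) / m00
    return sum((x - x_bar) ** p * (y - y_bar) ** q for x, y in pixels)

def direction_code_features(g):
    """DC_i: fraction of consecutive absolute changes in bins [0,2], [3,5], [6,8].
    Changes greater than 8 fall into no bin."""
    dcs = [abs(b - a) for a, b in zip(g, g[1:])]
    return [sum(1 for d in dcs if lo <= d <= hi) / len(dcs)
            for lo, hi in ((0, 2), (3, 5), (6, 8))]
```

MAGE, eccentricity, the DFT amplitudes, and the excursion slopes require more bookkeeping (peak/nadir detection, region extraction) and are omitted here.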
Calculating the slope only on excursions also reduces the likelihood of the feature simply representing random signal noise. The maximum daily increasing slope and the maximum daily decreasing slope are used as features. The maximum slope is intended to be more sensitive to large changes, whereas the average slope would not capture the importance of excursion slopes. Table 3.1 shows a summary of the features used, as well as the study in which each feature was first introduced. The third study is the focus of this thesis.

Figure 3.2: Slope of blood glucose levels calculated on excursions. Red lines show where decreasing slopes are calculated. Yellow lines show increasing slopes. Only the maximum of each is used as a feature.

Table 3.1: Summary of the features used in this study

    Feature   Description                                             Study first introduced
    MAGE      Mean Amplitude of Glycemic Excursions                   First
    EF        Excursion Frequency                                     First
    DT        Distance Traveled                                       First
    σ         Standard Deviation                                      Second
    AUC       Area Under the Curve                                    Second
    µ_pq      2-dimensional central moments of order 2 ≤ p + q ≤ 3    Second
    ε         Eccentricity                                            Second
    FF_i      Amplitudes of low DFT frequencies, for 1 ≤ i ≤ 24       Second
    RR        Roundness Ratio                                         Second
    BE        Bending Energy                                          Second
    DC_i      Direction Codes, for 1 ≤ i ≤ 3                          Second
    Slope↑    Maximum increasing slope                                Third
    Slope↓    Maximum decreasing slope                                Third

3.2.3 Smoothing the Data

The sensor used to collect the CGM data is not perfect and introduces some signal noise into the data. The sensor can give readings ±20% off the actual level (Klonoff, 2005). The noise caused by this inaccuracy makes several of the features, such as DT and RR, inaccurate, since they are based on the jaggedness of the glucose plot. For example, the RR would generally be much higher on raw CGM data than on smoothed CGM data. The intuition is that smoothing the data will better represent the true glucose levels. The second study (Wiley, 2011) used a cubic spline smoothing filter (Pollock, 1993), which was identified by physicians as the best of several available smoothing methods.
A spline connects adjacent points with a polynomial function while maintaining continuity. A cubic spline is a spline that uses a cubic function as the polynomial connecting the points. A smoothing function allows the polynomial end points to differ from the data points being modeled. Using equations described by Pollock, it is possible to give higher weights to certain data points, such as fingerstick readings over sensor readings, which adds more flexibility and power to the smoothing algorithm.

3.2.4 Machine Learning Algorithms

Three machine learning (ML) algorithms were trained on the features in Table 3.1: Support Vector Regression (SVR), Multilayer Perceptron (MLP), and Linear Regression (LR). These algorithms are described in detail in Chapter 2.

3.2.5 Algorithm Configurations

Each algorithm was trained and evaluated in several different configurations. For MLP and LR, there were three binary configuration choices, producing eight combinations: using forward or backward feature selection (described in Section 3.2.6), using smoothed or raw data (described in Section 3.2.3), and using or not using the development set as training data after feature selection and tuning. The SVR used these three choices with two different kernel types, linear and Gaussian (also known as radial basis function, or RBF), producing 16 combinations.

3.2.6 Feature Selection and Tuning

Not all of the features described in Section 3.2.2 are equally useful. Moreover, the ML algorithms can be confused by too many features and uncover patterns in the data that are only a coincidence of the training data. This phenomenon is called over-fitting, and it can be reduced by choosing a subset of features on which to train the algorithms. Since there are over 40 features in total, it would not be feasible to try every subset of features, as there are 2^n subsets. For this reason, greedy algorithms are used to select features.
There are two types of greedy algorithms used for feature selection in this experiment. The first feature selection algorithm is a forward selection wrapper. This algorithm initializes the set S = ∅; then, for each feature f ∈ S̄, where S̄ is the complement of S, it trains on S ∪ {f} and evaluates on the development set. It selects the feature f′ that minimizes the Root Mean Square Error (RMSE) on the development set and sets S = S ∪ {f′}. If the RMSE is the minimum so far, it sets S′ = S and repeats until S̄ = ∅; otherwise, it stops and returns S′ as the feature set. The other feature selection algorithm is a backward elimination wrapper. Backward elimination differs from forward selection by starting with S containing all of the features; for each feature f ∈ S, it trains on S − {f} and selects the feature f′ that minimizes the RMSE on the development set, setting S = S − {f′}. If the RMSE is the minimum so far, it sets S′ = S. It repeats until S = ∅ and selects S′ as the feature set. When the subset of features has been chosen by either the forward selection or backward elimination wrapper, the tuning process can begin. Each algorithm has its own set of parameters to tune. The SVR with the Gaussian kernel tunes the cost (C) and kernel width (γ) parameters. A grid search over these two parameters is performed, with each ranging from 0.001/N to 10000/N, where N is the number of training instances. The SVR with the linear kernel has only the cost parameter. The LR has only the regularization coefficient parameter (λ), which takes values in the range 1.0×10^−10/N to 1000/N. The parameters are doubled each iteration, because exponential growth of the parameters is considered a practical method of identifying good parameters (Hsu et al., 2003). The MLP has learning rate and momentum parameters, which take the values {0.2, 0.5, 0.8}. The parameters for each of these algorithms are described in detail in Chapter 2.
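The forward selection wrapper can be sketched as follows (an illustrative sketch; `rmse_on_dev` is a hypothetical callback standing in for training on the candidate feature set and evaluating RMSE on the development set):

```python
def forward_select(features, rmse_on_dev):
    """Greedy forward selection: repeatedly add the feature that minimizes
    the development-set RMSE; stop once adding a feature stops improving it."""
    selected = []
    remaining = list(features)
    best_set, best_rmse = [], float("inf")
    while remaining:
        # Try each remaining feature and keep the one with the lowest RMSE.
        f_best = min(remaining, key=lambda f: rmse_on_dev(selected + [f]))
        selected.append(f_best)
        remaining.remove(f_best)
        rmse = rmse_on_dev(selected)
        if rmse < best_rmse:
            best_rmse, best_set = rmse, list(selected)
        else:
            break                           # no improvement: return best set so far
    return best_set
```

Backward elimination follows the same pattern, starting from the full feature set and removing one feature per step instead of adding one.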
Similar to feature selection, the tuning process trains on the training set and evaluates on the development set to choose the tuning parameters with the best RMSE on the development set. The process of feature selection and parameter tuning is performed for each of the folds in the 10-fold cross validation.

3.2.7 10-Fold Cross Validation

For training, tuning, and testing a machine learning model, there are three sets of instances: a training set, a development set, and a testing set. The training and development sets are used to prepare the model for the testing set, which contains instances that were not used to create the model. The evaluation of the model on the testing set represents how the model would perform on new, unseen data. The model is trained on instances from the training set. However, to tune the parameters and perform feature selection (Section 3.2.6), the model is evaluated on the development set. In this way, the model is configured with a feature set and algorithmic parameters that perform as well as possible on the development set. If the development and training sets are a good representation of all of the instances in the domain, and there are enough instances in the sets, the model should also perform well on the testing set. If there are too few instances to create adequately distinct training, development, and testing sets, a method called 10-fold cross validation can be used. In this method, some of the instances are held out for development, while the rest of the instances are segmented into 10 folds. Each fold of the 10-fold cross validation contains a training set (90% of the instances) and a testing set (10% of the instances). Each instance is bound to exactly one testing set, so each instance is used for testing once and for training nine times over the 10 folds. The idea behind 10-fold cross validation is to mimic a large testing set and a large training set without having a large number of instances.
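The fold construction can be sketched as follows (illustrative only; the real splits may be randomized, whereas this sketch deals folds round-robin):

```python
def ten_fold_splits(instances):
    """Split instances into 10 folds and yield (train, test) pairs so that
    each instance appears in exactly one testing set and nine training sets."""
    folds = [instances[i::10] for i in range(10)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

With the 200 instances used in this study, each fold tests on 20 instances and trains on the other 180.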
3.2.8 Datasets

For the experiments in this study, 250 CGM plots were rated by the physicians. Since there are relatively few instances, 10-fold cross validation is used, as described in Section 3.2.7. For this experiment, 50 of the instances were set aside as the development set, leaving the other 200 for the 10-fold cross validation. Therefore, in each fold, 20 instances were used for testing and 180 for training.

3.2.9 Defining the CPGV Metric

A metric was sought that matched the physician consensus ratings as closely as possible. The performance of each configuration of each algorithm model was measured by the RMSE over the 10 folds. Any difference between the output of the model and the consensus rating of the physicians represents an error. The squared error on the testing sets was averaged over the 10 folds to calculate the RMSE over all 200 CGM plots. The configuration/algorithm combination with the lowest RMSE was used to build the model for the CPGV metric. This model was trained on all 200 CGM plots used in the 10-fold cross validation and tuned on the same 50 development plots. Results of the evaluation of the CPGV metric are reported in Section 3.3.1.

3.2.10 Screen for Excessive Glycemic Variability

To obtain a screen for excessive glycemic variability, the CPGV metric was used to classify CGM plots as exhibiting excessive or acceptable glycemic variability, based on the value of the metric. A threshold value of the metric was chosen to separate excessive from acceptable glycemic variability. A value less than the threshold indicates acceptable glycemic variability, and a value greater than or equal to the threshold indicates excessive glycemic variability. Two hundred sixty-two CGM plots were collected for a previous study (Wiley, 2011) and reused for this purpose in this experiment. In that study, two physicians, Frank Schwartz and Jay Shubrook, classified CGM plots manually, providing a gold standard for correct classification.
However, 64 of the 262 CGM plots were included in the dataset used to develop the CPGV metric. In evaluating the performance of the CPGV metric as a screen, it was important to avoid any statistical bias due to this overlap. Ideally, there would have been no overlap, but excluding the 64 plots would have resulted in a smaller evaluation set. To avoid testing the metric on plots used for training it, values for these plots were taken from the folds of the cross validation in which they were used only for testing. Results of the evaluation of the screen for excessive glycemic variability are reported in Section 3.3.2.

3.3 Results

3.3.1 CPGV Metric Performance

Table 3.2 shows a comparison of the ability of the best machine learning algorithms and the individual physicians to match the consensus ratings. The individual physicians' ability to match the consensus is shown for two different ways of calculating the consensus. Normally, the consensus for a plot is the average of all of the physicians' ratings of that plot. This consensus (physicians inclusive) represents the inter-rater agreement and a gold standard for the models to achieve. However, because it is easier for an individual physician to match a consensus that includes his or her own rating, a second consensus (physicians exclusive) was computed to exclude the individual's own rating. Physicians exclusive tests the ability of an individual physician to agree with the consensus of the other doctors who rated the same CGM plot. The Mean Absolute Error (MAE) column represents the average difference from the consensus. Since the physicians only gave ratings in whole numbers, the model outputs were rounded to the nearest integer for comparison. The best model was the SVR using smoothed CGM data with a Gaussian kernel and features chosen by the greedy forward selection method. This model performed slightly better when the development set was used as training examples after features and tuning parameters were computed.
The CPGV metric was created using an SVR with this configuration. It was trained on all 200 CGM plots that were used in the 10-fold cross validation, with feature selection and parameter tuning on the 50 development plots.

Table 3.2: Performance of the best MLP, SVR and LR models compared to the performance of individual doctors at matching the consensus

                          RMSE            MAE             RMSE        MAE
                          (floating pt)   (floating pt)   (integer)   (integer)
  Best SVR                0.417           0.316           0.511       0.376
  Best MLP                0.496           0.392           0.575       0.445
  Best LR                 0.471           0.354           0.541       0.397
  Physicians inclusive                                    0.489       0.355
  Physicians exclusive                                    0.699       0.509

Table 3.2 shows that the CPGV metric performs comparably to the physicians when constrained to integer evaluations. When the metric is not constrained to integer evaluations, it performs much better at matching the consensus than individual physicians.

3.3.2 Performance of the Excessive Glycemic Variability Screen

A glycemic variability metric can be used as a screen by choosing a cutoff value to separate acceptable from excessive glycemic variability. The CPGV metric was compared to several other glycemic variability metrics as screens. Excursion frequency, distance traveled, standard deviation, and MAGE, which were described in Section 3.2.2, were evaluated for comparison. One additional metric, Interquartile Range (IQR) (Rodbard, 2009b), was also evaluated. The IQR measures glycemic variability as the difference between the blood glucose readings at the 75th and 25th percentiles. This method effectively measures the range of blood glucose while treating the highest and lowest 25% of readings as outliers, giving the size of the spread of the readings. The screens were evaluated on 262 CGM plots that were previously annotated by two physicians, Frank Schwartz and Jay Shubrook (Wiley, 2011). The threshold value for each metric was experimentally chosen as the one that gives the best accuracy when trained on a randomly selected development set.
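The IQR metric and the screen evaluation can be sketched as follows (a minimal, pure-Python illustration; the linear-interpolation percentile convention is an assumption, and names are not from the original code):

```python
def percentile(values, q):
    """Percentile by linear interpolation between sorted readings."""
    vals = sorted(values)
    idx = (len(vals) - 1) * q / 100.0
    lo = int(idx)
    hi = min(lo + 1, len(vals) - 1)
    return vals[lo] + (vals[hi] - vals[lo]) * (idx - lo)

def iqr(bg_readings):
    """Interquartile Range: spread between the 75th and 25th percentiles,
    treating the highest and lowest 25% of readings as outliers."""
    return percentile(bg_readings, 75) - percentile(bg_readings, 25)

def screen_scores(metric_values, labels, threshold):
    """Accuracy, sensitivity, and specificity of a metric used as a screen.
    labels: True = excessive variability per the physician gold standard."""
    tp = tn = fp = fn = 0
    for value, excessive in zip(metric_values, labels):
        predicted = value >= threshold
        if predicted and excessive:
            tp += 1
        elif predicted:
            fp += 1
        elif excessive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(metric_values)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity
```

Sweeping the threshold and keeping the value with the best accuracy on a development set corresponds to the threshold-selection step described above.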
This process was repeated 10 times with the accuracy, sensitivity, and specificity averaged over the 10 tests. Table 3.3 compares the performance of the screens. Statistical significance was calculated between CPGV and the other metrics using a one-tailed paired sample t-test. P-values are reported where statistical significance was found. Table 3.3 shows that the CPGV metric classifies statistically significantly better than all of the other metrics based on accuracy and sensitivity, and better than STD, MAGE, and IQR based on specificity. The Receiver Operator Characteristic (ROC) curve in Figure 3.3 shows the comparison of the classification performance of the metrics. The threshold value is the dependent variable for computing the false positive and true positive rates. The closer the line is to the top left corner, the better. As can be seen from Figure 3.3, the CPGV metric outperforms all of the other metrics as a screen.

Table 3.3: Classification performance of CPGV, EF, DT, STD, MAGE, and IQR. p values show a one tailed paired t-test comparison to the CPGV metric

                                       Accuracy     Sensitivity   Specificity
  Consensus Perceived Glycemic
  Variability (CPGV)                   90.1%        97.0%         74.1%
  Excursion Frequency (EF)             84.3%        92.0%         66.1%
                                       p < 0.005    p < 0.05
  Distance Traveled (DT)               83.9%        89.1%         72.3%
                                       p < 0.005    p < 0.005
  Standard Deviation (STD)             83.6%        91.6%         64.8%
                                       p < 0.001    p < 0.005     p < 0.05
  Mean Amplitude of Glycemic
  Excursions (MAGE)                    80.8%        90.9%         56.7%
                                       p < 0.001    p < 0.05      p < 0.005
  Interquartile Range (IQR)            78.7%        91.8%         47.2%
                                       p < 0.001    p < 0.05      p < 0.005

Figure 3.3: ROC curve showing CPGV, EF, DT, STD, MAGE, and IQR

3.4 Discussion

The CPGV metric gives a single numeric value that can be used to quantify the glycemic variability of a 24-hour CGM plot, as represented in Figure 3.4.

Figure 3.4: CGM plots automatically rated by the CPGV metric, with ratings of 1.4, 2.0, 3.1, and 3.9.

The CPGV
metric could be run on as many days of data as a patient has, provided each day contains enough CGM readings to compute it. Analytics can be derived from the daily values, such as average CPGV or the percentage of days with excessive variability. Correlations can also be made with predictable events such as weekends, holidays, exams, and menstrual cycles. A recent recommendation to standardize reporting of glycemic control for use in clinical decision making (Bergenstal et al., 2013) calls for a glycemic variability metric. The metrics that were considered by Bergenstal et al. were STD and IQR. The CPGV metric has been shown to match the consensus of multiple expert physicians significantly better than STD (p < 0.001) and IQR (p < 0.001). High glucose variability has been used to predict hypoglycemia (Monnier et al., 2011; Qu et al., 2012). Both of these studies used STD and MAGE, finding a significant correlation between high variability and the frequency of future hypoglycemic events. Using the CPGV metric instead of STD or MAGE could lead to more accurate predictions.

4 Blood Glucose Prediction

This chapter describes blood glucose level prediction for patients with Type 1 diabetes (T1D). The background and previous work are described, and then the methods of the physiological model and machine learning are explained. Finally, the results and discussion are presented. A paper based on the work presented in this chapter has been submitted to the International Conference on Machine Learning and Applications (ICMLA).

4.1 Background

The problem of predicting blood glucose falls within the domain of time series prediction. A time series prediction uses all of the information up until a cut-off time t to predict the value of a variable at time t + l. This work uses the data described in Section 4.2.1 to predict the value of blood glucose (BG) at two future times: t + 30 minutes and t + 60 minutes.
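This setup can be sketched as building (history, target) training pairs from the CGM series (an illustrative sketch; with readings every 5 minutes, a 30 minute horizon is 6 steps and a 60 minute horizon is 12):

```python
def make_training_pairs(bg_series, horizon_steps):
    """Pair the history up to each cut-off time t with the BG value
    horizon_steps readings later (the value at time t + l)."""
    pairs = []
    for t in range(len(bg_series) - horizon_steps):
        history = bg_series[: t + 1]           # all information up to time t
        target = bg_series[t + horizon_steps]  # BG at the prediction time
        pairs.append((history, target))
    return pairs
```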
4.1.1 Previous Work

In a preliminary study (Marling et al., 2011; Marling et al., 2012), simple Support Vector Regression (SVR) models were shown to outperform a simple baseline of the current blood glucose value, t0 (i.e., unchanging blood glucose), based on Root Mean Square Error (RMSE). These models used an arbitrary pivot point about 1 month into the patient's study. The seven days before the pivot point were used as training data, while the 3 days after and including the pivot date were used as testing data. Two separate SVR models were trained, one for a 30 minute prediction and the other for a 60 minute prediction. These models used the following features:

1. Blood glucose level at the present time, t0.
2. Moving average over the previous four blood glucose levels before, and including, the prediction time.
3. Exponentially smoothed rate of change over the previous four blood glucose levels before, and including, the prediction time.
4. Insulin bolus dosage totals over 30 minutes before the prediction time, computed over intervals of 10 or 30 minutes.
5. Insulin basal rate averages over 5 or 15 minutes before the prediction time.
6. Carbohydrate consumption over 30 minutes before the prediction time, computed over intervals of 15 or 30 minutes.
7. Exercise duration and intensity averages over 5, 30, or 60 minutes before the prediction time.

In later work (Wiley, 2011), an Auto Regressive Integrated Moving Average (ARIMA) model was investigated as a new feature for input to the SVR model. An ARIMA model is a statistical model of a time series that uses past data points to make forecasts about the future of the series. The ARIMA model itself can be used as a predictor, as well as to generate a feature to inform the SVR models. The ARIMA model was trained on four days of CGM data immediately before the test point. The SVR model used a pivot date on a Sunday about 1 month into the patient's study.
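Features 2 and 3 above can be sketched as follows (a simplified illustration; the smoothing factor is an assumed value, and CGM readings are assumed to arrive every 5 minutes):

```python
def moving_average(bg, n=4):
    """Feature 2: mean of the last n BG readings, including the current one."""
    window = bg[-n:]
    return sum(window) / len(window)

def smoothed_rate_of_change(bg, alpha=0.5, interval_min=5.0):
    """Feature 3: exponentially smoothed rate of change (mg/dl per minute)
    over the last four BG readings. alpha is an illustrative smoothing
    factor, not the tuned value from the study."""
    last = bg[-4:]
    rates = [(last[i + 1] - last[i]) / interval_min for i in range(len(last) - 1)]
    smoothed = rates[0]
    for rate in rates[1:]:
        smoothed = alpha * rate + (1 - alpha) * smoothed
    return smoothed
```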
The SVR model was trained on the 14 days before the pivot date and tested on the 14 days after the pivot date. Each of the features was designed as a parameterized template that was tuned using a grid search on two weeks of training and one week of development data before the pivot date. The investigation used two sets of features for the SVR. SVR1 used features 1 through 6 from the preliminary study, as well as the following features:

8. The ARIMA forecast for the prediction time (either 30 or 60 minutes in the future).
9. The amount of time spent exercising over the previous hour.
10. The amount of time spent sleeping over the previous hour.
11. The amount of time spent working over the previous hour.

SVR2 used features 1, 2, 3, and 8 above. The CGM data used in the investigation was smoothed to reduce sensor noise. The smoothing algorithm was the same cubic spline smoothing algorithm described in Section 3.2.3. This algorithm uses both past and future information to move each point to form a smooth curve. Since the prediction point was also smoothed, this results in an easier prediction problem. Table 4.1 shows the results from this study. As can be seen from the table, the simple ARIMA model performed similarly to the SVR models.

Table 4.1: Previous study prediction results.

  Horizon    t0     ARIMA   SVR1   SVR2
  30 min     19.5    4.5     4.7    4.5
  60 min     35.5   17.9    17.7   17.4

4.2 Methods

This section describes the methods used in this study for blood glucose prediction.

4.2.1 Data

The blood glucose prediction model takes advantage of a comprehensive database of patient data accumulated from trials involving patients with Type 1 diabetes. This work uses the data from the third 4DSS trial, in which patients reported or submitted the following information:

• CareLink data, which includes:
  – CGM readings in five minute intervals.
  – Finger stick readings.
  – Insulin bolus dosages.
  – Bolus type.
  – Basal levels.
  – Temporary basals.
• Time they went to bed.
• Time they woke up.
• Time they went to work.
• Time they returned from work, and work intensity on a revised Borg scale of perceived physical exertion (Borg, 1982) of 1 to 10.
• Time, duration, and type of exercise, and intensity on the same Borg scale of 1 to 10.
• Time of hypoglycemia events, with the corrective action (carbohydrates and food composition) and symptoms felt.
• Time of meal, type of meal and composition, and estimated carbohydrates.
• Time and type of stress.
• Time of illness or injury.
• Time of menses.
• Time of a pump problem.

The CareLink data is recorded by the patient's pump, which automatically records the time of the events. The rest of the data was reported by the patient through a web browser-based interface on a personal computer at the end of each day, as described in (Maimone, 2006). Two hundred test points were chosen from 5 patients, 40 from each. The test points were chosen to represent a wide range of situations: different times of day and night; soon or long after a variety of life events; on rising, falling, and static points of the blood glucose curve; and before, after, or far away from local optima and inflection points in the blood glucose curve.

4.2.2 Physiological Model

To better capture the effects of carbohydrates and insulin in the body, a physiological model is used. The physiological model attempts to characterize meal absorption dynamics, insulin dynamics, and glucose dynamics. It is based on equations presented in (Duke, 2009), with a few adaptations to better match published data and feedback from physicians. The physiological model used in this work was developed by Razvan Bunescu at Ohio University. The physiological model is represented by its state variables X, input variables U, and a state transition function that computes the next state variables given the current state variables and the input variables, i.e., X(t+1) = f(X(t), U(t)). The state variables X are organized as follows:

1. Meal Absorption Dynamics:
   • Cg1(t) = carbohydrate consumption (g).
   • Cg2(t) = carbohydrate digestion (g).
2. Insulin Dynamics:
   • Is(t) = subcutaneous insulin (µU).
   • Im(t) = insulin mass (µU).
   • I(t) = level of active plasma insulin (µU/ml).
3. Glucose Dynamics:
   • Gm(t) = blood glucose mass (mg).
   • G(t) = blood glucose concentration (mg/dl).

Figure 4.1: Variable dependency for calculating the next state for the physiological model. Figure by Razvan Bunescu.

The vector of input variables, U, contains the carbohydrate intake UC(t), measured in grams (g), and the amount of insulin intake UI(t), measured in units of insulin (U). The insulin intake is computed from the bolus and basal rate data from the patient's pump. The carbohydrate data is from the patient's report for meals, snacks, and hypoglycemia corrections. Figure 4.1 shows the interaction between the state variables at time t to compute the state at time t + 1. The boxes represent a single state variable. The set of state variables, X, runs along the horizontal axis, and the passage of time runs along the vertical axis, where moving down is later. Each state variable is computed by an equation that is dependent on the variables pointing to it.
The equations for the state variables are as follows:

Meal Absorption Dynamics:
• Cg1(t+1) = Cg1(t) − α1c · Cg1(t) + UC(t)
• Cg2(t+1) = Cg2(t) + α1c · Cg1(t) − α2c / (1 + 25/Cg2(t))

Insulin Dynamics:
• Is(t+1) = Is(t) − αfi · Is(t) + UI(t)
• Im(t+1) = Im(t) + αfi · Is(t) − αci · Im(t)

The equation for glucose dynamics is Gm(t+1) = Gm(t) + ∆abs − ∆ind − ∆dep − ∆clr + ∆egp, where:

• ∆abs (absorption) = α3c · α2c / (1 + 25/Cg2(t))
• ∆ind (insulin independent utilization) = α1ind · √G(t)
• ∆dep (insulin dependent utilization) = α1dep · I(t) · (G(t) + α2dep)
• ∆clr (renal clearance, when G(t) > 115) = α1clr · (G(t) − 115)
• ∆egp (endogenous liver production) = α2egp · exp(−I(t)/α2egp) − α1egp · G(t)

Finally, the glucose and insulin concentrations are simply scalar multiples of their respective masses, given by the following equations, where bm refers to body mass and IS refers to insulin sensitivity:

• G(t) = Gm(t) / (2.2 · bm)
• I(t) = Im(t) · IS / (142 · bm)

4.2.3 Feature Vector

For each blood glucose reading at time t, a feature vector is computed. Each feature vector can have one of two target values: the blood glucose level at time t+30 or at time t+60. A separate SVR model is trained for each target value. The feature vector consists of features computed from:

• Blood glucose level at the present time, t0.
• Blood glucose deltas: the values of BG(t0) − BG(t−5i) for 1 ≤ i ≤ 12.
• ARIMA forecasts: the values of the ARIMA forecast for time t+5i for 1 ≤ i ≤ 12.
• Physiological model: the values of the components of the physiological model Cg1, Cg2, Is, Im, and Gm at times t+30 and t+60, as well as the change in each component between time t0 and t+30, and between t0 and t+60.

4.2.4 Walk Forward Testing

Figure 4.2 shows how the machine learning model was designed to train and tune on past data and test on future data.
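The state transition X(t+1) = f(X(t), U(t)) defined by the equations in Section 4.2.2 can be sketched in code as follows. This is a simplified illustration: the α parameter values would come from tuning, and the guards against division by zero when Cg2(t) = 0 are an implementation assumption.

```python
import math

def physio_step(state, carbs_in, insulin_in, params, bm=80.0, IS=1.0):
    """One step of the physiological model from Section 4.2.2.

    state: maps variable names (Cg1, Cg2, Is, Im, Gm) to current values.
    params: the alpha coefficients (placeholder names, not tuned values).
    """
    Cg1, Cg2 = state["Cg1"], state["Cg2"]
    Is, Im, Gm = state["Is"], state["Im"], state["Gm"]
    p = params

    # Concentrations are scalar multiples of the masses.
    G = Gm / (2.2 * bm)
    I = Im * IS / (142 * bm)

    # Glucose mass deltas (renal clearance applies only when G > 115).
    d_abs = p["a3c"] * p["a2c"] / (1 + 25 / Cg2) if Cg2 > 0 else 0.0
    d_ind = p["a1ind"] * math.sqrt(G)
    d_dep = p["a1dep"] * I * (G + p["a2dep"])
    d_clr = p["a1clr"] * (G - 115) if G > 115 else 0.0
    d_egp = p["a2egp"] * math.exp(-I / p["a2egp"]) - p["a1egp"] * G

    return {
        "Cg1": Cg1 - p["a1c"] * Cg1 + carbs_in,
        "Cg2": Cg2 + p["a1c"] * Cg1
               - (p["a2c"] / (1 + 25 / Cg2) if Cg2 > 0 else 0.0),
        "Is": Is - p["afi"] * Is + insulin_in,
        "Im": Im + p["afi"] * Is - p["aci"] * Im,
        "Gm": Gm + d_abs - d_ind - d_dep - d_clr + d_egp,
    }
```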
For each test point, the model uses the week prior to the point as training data, and all of the data before the training data as development data for tuning. A new prediction model is trained and tuned for each test point. It is important to note that the model does not use training vectors for which the target time is after the time of the prediction test point. For example, if the prediction point is BG(t+30), then the training vector for the preceding blood glucose reading cannot be used, since its prediction time is BG(t+25), which is in the future. To effectively tune the model, there needs to be at least a full week of CGM data for training and a full week of CGM data for tuning. This experiment includes some test points where there is not enough data to effectively tune the parameters for the model. Generic parameters were used for those test points. To obtain the generic parameters, 100 validation points were chosen at random from a separate dataset from two patients. These separate 100 points start far enough into the patients' studies to ensure that there is enough data for a full week of training. A grid search was performed over the 100 validation points to determine the set of parameters with the lowest RMSE. This set of parameters became the generic parameters used for test points without at least two weeks of prior CGM data.

Figure 4.2: Illustration of the walk-forward testing scheme used for the machine learning model. The darkest section, which is tested on, represents the feature vector for a single time value.

4.2.5 Baselines for Evaluation

To gauge the effectiveness of the prediction model, this experiment uses four baselines for comparison:

• A simple BG(t0) value. This baseline predicts that the blood glucose level will stay the same.
• The ARIMA forecast for time t+30 for the 30 minute prediction model, or for time t+60 for the 60 minute prediction model.
• An SVR model trained on the features used for the SVR1 model described in Section 4.1.1. This model was trained and tuned using the walk forward method described in Section 4.2.4.
• The predictions of three expert physicians. These physicians were given the same test points as the model and were asked to predict the blood glucose levels using a GUI.

Figure 4.3 shows the GUI that the physicians used. The GUI was developed by Melih Altun, based on a visualization program originally developed by Wesley Miller (Miller, 2009). This figure shows the blood glucose curve in blue with a prediction point around 8:45pm. The dashed vertical line around 9:45pm represents the time for the 60 minute prediction. The physician is able to use the mouse pointer to place a prediction value on this vertical line. The top of the image shows many dots representing the various life events which the patient experienced. The physician was blocked from seeing future blood glucose levels, but was able to view any day prior to the day containing the test point. The two black dots at around 10:00am and 10:30am are visual feedback from predictions for a test point earlier in the same day.

Figure 4.3: A screen capture of the GUI used by the physicians to predict future blood glucose levels.

4.2.6 Evaluation

The standard evaluation metric for the model and the baselines is RMSE. During the physician prediction sessions, it was noticed that one of the physicians was good at identifying the direction of the blood glucose change, but overestimated the magnitude of the change. To capture this ability, an additional ternary evaluation metric was developed. The ternary metric uses three categories for the change in the blood glucose level: decrease, stay the same, or increase. If the blood glucose deviates by less than or equal to 5 mg/dl for the 30 minute prediction, or by less than or equal to 10 mg/dl for the 60 minute prediction, it is classified as staying the same.
If the blood glucose increases or decreases by more than that, then it is classified as increasing or decreasing, accordingly. The ternary metric uses a cost evaluation based on the following matrix:

                        Actual
               Decrease    Same    Increase
  Predicted
   Decrease       0          1         2
   Same           1          0         1
   Increase       2          1         0          (4.1)

A prediction in the same class as the actual change results in a cost of zero. A prediction of an adjacent class results in a cost of one. A prediction in the wrong direction results in a cost of two. The value for the ternary metric is the total cost over all of the test points. When using either RMSE or ternary cost, a lower score is better.

4.3 Results

Table 4.2 shows the RMSE scores for the baselines.

Table 4.2: RMSE baselines for the prediction dataset.

                          t0     ARIMA   SVR    Physician 1   Physician 2   Physician 3
  30 minute prediction   27.6    22.9    23.7      19.8          21.2          34.1
  60 minute prediction   43.8    42.2    41.8      38.4          40.0          47.0

As seen in the table, Physician 1 performed best for both 30 minute and 60 minute predictions, followed by Physician 2. Physician 3 had the highest scores when using RMSE as the evaluation metric, but did better when using the ternary cost as the evaluation metric, as shown in Table 4.3. The simplest baseline, t0, is comparable to the other baselines at predicting blood glucose for 60 minutes, but worse at predicting for 30 minutes when using RMSE as the evaluation metric. For the 30 minute prediction, the RMSE scores for all of the other baselines except Physician 3 are significantly lower than the score for t0 (p < 0.01). For the 60 minute prediction, only the RMSE scores for Physician 1 (p < 0.01) and Physician 2 (p < 0.05) are significantly lower than the score for t0.

Table 4.3: Ternary cost baselines for the prediction dataset.

                          t0     ARIMA   SVR    Physician 1   Physician 2   Physician 3
  30 minute prediction   159     107      95        81            86           115
  60 minute prediction   151     131     118       106           119           113
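Encoding the classes as −1 (decrease), 0 (same), and +1 (increase) makes the cost matrix (4.1) simply the absolute difference between the predicted and actual class, as this illustrative sketch shows:

```python
def ternary_class(delta, horizon_min):
    """Classify a BG change as -1 (decrease), 0 (same), or +1 (increase).
    The 'same' band is +/-5 mg/dl at 30 minutes and +/-10 mg/dl at 60."""
    band = 5 if horizon_min == 30 else 10
    if abs(delta) <= band:
        return 0
    return 1 if delta > 0 else -1

def ternary_cost(predicted_deltas, actual_deltas, horizon_min):
    """Total cost over all test points: 0 for the same class, 1 for an
    adjacent class, 2 for the wrong direction, as in matrix (4.1)."""
    return sum(
        abs(ternary_class(p, horizon_min) - ternary_class(a, horizon_min))
        for p, a in zip(predicted_deltas, actual_deltas)
    )
```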
However, when using ternary cost as the evaluation metric, all other baselines have significantly lower scores than t0 for both the 30 and 60 minute predictions (p < 0.01). Table 4.4 shows the results for the new prediction model. The table shows that the new prediction model performs better than all baselines when using RMSE as the evaluation metric, and better than all of the baselines for 60 minute predictions when using the ternary cost as the evaluation metric. The new prediction model performs better than all but Physician 1 and Physician 2 for 30 minute predictions when using the ternary cost as the evaluation metric.

Table 4.4: Results for the new prediction model.

                          RMSE   Ternary cost
  30 minute prediction    19.5        88
  60 minute prediction    35.7       105

Table 4.5 shows the p-values from a one tailed paired t-test comparison of the baselines to the new prediction model. As seen in the table, the new prediction model is significantly better than t0 and ARIMA for both 30 minute and 60 minute predictions when using either RMSE or ternary cost as the evaluation metric. The new prediction model is also significantly better than the SVR baseline for both 30 and 60 minute predictions when using RMSE as the evaluation metric, and comparable when using ternary cost as the evaluation metric. The new prediction model performs comparably to the physicians overall; it is significantly better than the physicians in four cases, as shown in Table 4.5.

Table 4.5: Statistical significance of the improvement of the new prediction model over the baselines. NS denotes not statistically significant.

                      t0           ARIMA        SVR          Physician 1   Physician 2   Physician 3
  30 minute RMSE      p < 0.0001   p < 0.001    p < 0.0005   NS            NS            p < 0.0001
  60 minute RMSE      p < 0.0001   p < 0.0005   p < 0.0005   NS            p < 0.01      p < 0.001
  30 minute Ternary   p < 0.0001   p < 0.05     NS           NS            NS            p < 0.01
  60 minute Ternary   p < 0.0001   p < 0.05     NS           NS            NS            NS

The Clarke Error Grid Analysis (CEGA) is an analysis method first developed by (Clarke et al., 1987) for evaluating blood glucose sensor accuracy, but it can also be useful for evaluating the goodness of blood glucose predictions. Figure 4.4 shows a Clarke Error Grid. The regions of the grid represent the clinical consequences of prediction accuracy. Region A represents clinical accuracy. Region B represents benign clinical error. Region C represents erroneous predictions suggesting treatment that is unnecessary, but not harmful. Region D represents erroneous predictions that fail to detect problems requiring treatment. Region E represents erroneous predictions suggesting the wrong clinical treatment, which could endanger patients. Predictions within regions A and B are considered clinically acceptable, but predictions within regions C, D, and E represent errors with potential clinical consequences.

Figure 4.4: The Clarke Error Grid.

Tables 4.6 and 4.7 show the percentages of predictions falling within each CEGA region, for 30 and 60 minute predictions, respectively. For 30 minute predictions, the new prediction model tied the SVR baseline for predictions in region A, and only had fewer predictions in region A than Physician 1. For 60 minute predictions, the new prediction model tied Physician 3 for predictions in region A, and had more predictions in region A than all other baselines. Figures 4.5 through 4.10 show the CEGA for the baselines, and Figure 4.11 shows the CEGA for the new prediction model.

Figure 4.5: t0 baseline CEGA (30 and 60 minute predictions).
Figure 4.6: ARIMA baseline CEGA (30 and 60 minute predictions).
Figure 4.7: SVR baseline CEGA (30 and 60 minute predictions).
Figure 4.8: Physician 1 baseline CEGA (30 and 60 minute predictions).
Figure 4.9: Physician 2 baseline CEGA (30 and 60 minute predictions).
Figure 4.10: Physician 3 baseline CEGA (30 and 60 minute predictions).
Figure 4.11: New prediction model CEGA (30 and 60 minute predictions).

Table 4.6: CEGA Region percentages for 30 minute predictions.

                          A         B         C     D       E
  t0                      78.57%    19.39%    0%    2.04%   0%
  ARIMA                   81.12%    17.35%    0%    1.53%   0%
  SVR                     84.70%    14.29%    0%    1.02%   0%
  Physician 1             86.73%    11.73%    0%    1.53%   0%
  Physician 2             83.67%    14.80%    0%    1.53%   0%
  Physician 3             77.04%    19.90%    0%    3.06%   0%
  New Prediction Model    84.70%    13.78%    0%    1.53%   0%

Table 4.7: CEGA Region percentages for 60 minute predictions.

                          A         B         C       D       E
  t0                      57.14%    35.71%    0%      7.14%   0%
  ARIMA                   58.16%    35.71%    0.51%   5.61%   0.51%
  SVR                     64.29%    30.10%    0%      5.10%   0%
  Physician 1             65.31%    29.08%    0%      5.61%   0.51%
  Physician 2             64.29%    30.61%    0%      4.59%   0.51%
  Physician 3             67.35%    25.51%    0%      6.63%   0%
  New Prediction Model    67.35%    26.53%    0%      6.12%   0%

4.4 Discussion

The new prediction model provides a prediction for 30 minutes and 60 minutes in the future based on features from past blood glucose levels, insulin dosages, and meals. The model uses features from ARIMA and blood glucose delta values, which are based on past blood glucose levels, and features from a physiological model, which uses past blood glucose levels, insulin dosages, and meals. This model has been shown to be a better predictor than the t0, ARIMA, and old SVR baselines when using RMSE as the evaluation metric, and to outperform t0 and ARIMA when using ternary cost as the evaluation metric. The new prediction model also outperforms Physician 2 on 60 minute predictions and Physician 3 on 30 and 60 minute predictions when using RMSE as the evaluation metric, and Physician 3 on 30 minute predictions when using ternary cost as the evaluation metric. Since the new prediction model outperforms the old SVR, this shows that it is important to have a good representation of the data as features.
Both the new prediction model and the old SVR model use the same machine learning algorithm and experimental platform, as well as features based on the same types of data (blood glucose, insulin, and carbohydrates), but the new prediction model uses more informative features derived from this data by better representing the interactions of the data through a physiological model. The physiological model uses generalizations about body mass index, insulin sensitivity, and carbohydrate ratio, rather than parameters tuned to each individual, which limits its accuracy. However, the SVR has the advantage of being able to learn these parameters automatically. The SVR has the additional advantage of being able to include features based on data that would be difficult to incorporate in a physiological model, such as work or stress. The ability to predict blood glucose levels 30 minutes or 60 minutes in the future would provide an invaluable tool for patients with diabetes to preemptively correct for dangerous blood glucose levels before they occur. This tool could also bring peace of mind to patients with diabetes and their carers, who have to live with the stress of not knowing whether life threatening blood glucose levels are imminent.

5 Related Research

5.1 Diabetes Related Research

This section discusses research related to this work pertaining to diabetes, including physiological models, predictive blood glucose control, and glycemic variability.

5.1.1 Physiological Models

There are several physiological models that are used for modeling blood glucose. Currently, no physiological models are being used to provide clinical advice for real patients; they are used for demonstration and learning purposes. Some studies use in silico trials, which use a physiological model/simulator for evaluation. The Lehmann and Deutsch simulator (Lehmann and Deutsch, 1992) uses four differential equations and 12 auxiliary equations to model blood glucose.
The model implements the idea that physiology responds in transitory phases. Before the carbohydrates in a meal are absorbed into the blood, they first transition through the digestive system in phases. Subcutaneously injected insulin transits in similar phases. The equations in the model attempt to reflect these transitions and their eventual effect on glucose levels. The physiological model used in this work for predicting blood glucose is very similar to this simulator. AIDA (Lehmann et al., 2011; Lehmann, 2013) was developed by the same authors as the Lehmann and Deutsch simulator. The primary goal of AIDA is to provide a teaching tool for diabetes self-management. According to the Diabetes Control and Complications Trial (DCCT) (DCCT Research Group and others, 1987), intensive glycemic control reduces and delays serious complications. The aim of AIDA is to transfer clinical knowledge from physicians to patients through the computer tool, thereby reducing the time physicians need to teach patients how to control diabetes. The authors acknowledge that AIDA is not complete enough to be used for modeling individual patients, but it is a valuable tool for demonstrating general responses to carbohydrates and insulin. Other models include the Glucose Insulin Model (GIM) (Dalla Man et al., 2007a), which implements the physiological model described in (Dalla Man et al., 2007b). This model is similar to the Lehmann and Deutsch simulator and AIDA. The GIM software can be used for simulating physiological responses to glucose and insulin. The H∞ robust model (Parker et al., 2000) is a 19th order linear model and one of the most complicated models in the literature. This work extends a physiological model by using machine learning to learn the effects of the variables on an individual patient. The physiological models described here attempt to find parameter weights based on values such as body weight, insulin sensitivity, and carbohydrate ratio.
For example, the GIM model uses over 30 parameter weights. Most of the weights are tuned to generalized patients so that the models do not need to be completely reworked for each patient. Machine learning allows weights to be learned automatically, allowing a machine learning algorithm to learn the effect of each equation of the model independently. Machine learning also allows additional parameters to be included that are not part of the physiological model.

5.1.2 Predictive Blood Glucose Control

Currently, diabetes management is conducted in an open-loop style, where glucose readings are presented to the patient, who then manually administers treatment using a simple algorithm. A closed-loop system continuously and automatically monitors glucose readings and administers treatment without patient intervention. This type of system is referred to as an artificial pancreas. According to (Klonoff, 2007), the benefits of a closed-loop system over an open-loop system are "1) less glycemic variability; 2) less hypoglycemia; 3) less pain from pricking the skin to check the blood glucose and deliver insulin boluses; and 4) less overall patient effort." An artificial pancreas consists of three components: (1) an insulin pump; (2) a CGM sensor; and (3) a closed-loop control algorithm to determine the rate of insulin delivery. Insulin pumps and CGM sensors exist on the market, but the control algorithm is still in development. Some control algorithms are based on the traditional Proportional-Integral-Derivative (PID) controller and do not model blood glucose (Bequette, 2005). This type of controller is widely used in industrial processes to maintain or adjust process variables in the presence of uncontrollable disturbances (Petruzella, 2005). These controllers need to have their parameters tuned to each specific process. Since each patient has his or her own insulin sensitivity, the controller would need to be tuned for each patient.
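For background, a generic discrete PID update has the following form (an illustrative textbook sketch, not a clinically validated insulin controller; the gains kp, ki, and kd must be tuned to the process, here per patient):

```python
class PID:
    """Generic discrete PID controller: the output is proportional to the
    current error, its accumulated sum, and its rate of change."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt=1.0):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```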
Other control algorithms use Model Predictive Control (MPC), which uses mathematical or AI models to forecast blood glucose (Bequette, 2005). Modeling blood glucose for the purpose of providing informed insulin dosages dates back to the 1960s (Boutayeb and Chetouani, 2006). Early efforts had to make do with fingerstick readings, since CGM sensors were not available until 1999. The current approach to control algorithms for the artificial pancreas is focused on eliminating all patient intervention. Therefore, they consider automatically recorded events such as blood glucose and insulin, but they do not consider any patient-recorded events. (Kovács et al., 2012) evaluated a linear parameter varying (LPV) methodology for systems of nonlinear equations for modeling blood glucose levels for MPC. LPV extends linear systems: the dynamics remain linear in the inputs, but the coefficients vary as a function of a measured signal, allowing nonlinear behavior. Choosing the variables using LPV allows the nonlinearity of the original model to be hidden, since the measured parameters describe the controller. The LPV methodology was applied to the model developed by (Parker et al., 2000), which takes several factors into consideration, including glucose and insulin kinetics and renal excretion. The LPV model was evaluated using the Lehmann and Deutsch simulator (Lehmann and Deutsch, 1992) and was shown to be capable of avoiding hypoglycemia. (Magni et al., 2009) describes a simulated trial using the GIM software package (Dalla Man et al., 2007a) to evaluate their model. The trial used MPC to deliver insulin for 100 simulated patients over a 46 day period. Parameters of the physiological model were adjusted throughout the experiment; for example, the time of breakfast was shifted by a random amount. The model was robust to meal variations and was able to tune its parameters.
The authors conclude that their model achieves satisfactory results, especially during sleep, but that it should not replace traditional open-loop basal + bolus therapy for meals. The insulin infusion advisory system (IIAS) (Zarkogianni et al., 2011) is an intelligent insulin delivery system for patients with T1DM on CGM monitors and insulin pumps. The IIAS uses the information from the CGM sensor and patient reports of meals to estimate the optimum insulin dosage. It uses a personalized glucose-insulin metabolism model built on a neural network. The IIAS was evaluated in silico with the University of Virginia (UVa) T1DM simulator based on (Dalla Man et al., 2007b). Simulated patients with variations, such as sensor errors and carbohydrate estimation errors, were used to better reflect real life situations. The simulation showed that the IIAS controller, compared to a more traditional controller, reduced average blood glucose levels from 152 ± 28 to 118 ± 7 mg/dl, and reduced the percentage of time spent hyperglycemic from 28 ± 21% to 1 ± 2%. The closest research to this work is the intelligent diabetes assistant (IDA) (Duke, 2009), a system which collects information about patients through a phone interface. The IDA collects meal, insulin, medication, and exercise information. The information is used for predicting blood glucose after a meal, generating therapy advice, and continuous glucose modeling. Duke postulated that when patients correct for a meal, they are essentially attempting to predict their after-meal glucose levels when choosing a dosage. Duke developed Gaussian process regression with a Gaussian kernel to predict post-meal glucose levels. The model outperformed patients and other published results. The Gaussian process regression approach focused on 2 hour predictions of blood glucose after a meal. The predictions were modeled using autoregressive and physiological models.
Autoregressive models and physiological models were used for 15 and 45 minute predictions. The autoregressive models outperformed the physiological models for the 15 minute predictions, but the physiological models performed better for the 45 minute predictions. This work uses a broader set of life events than (Duke, 2009), and more historical data from patients (3 months compared to 2 weeks).

5.1.3 Glycemic Variability

Increased glycemic variability is associated with poor glycemic control (Rodbard et al., 2009) and is a strong predictor of hypoglycemia (Monnier et al., 2011; Qu et al., 2012), which has been linked to excessive morbidity and mortality (Zoungas et al., 2010; Seaquist et al., 2012). While physicians acknowledge the need for a glycemic variability measurement, there is no current consensus on the best glycemic variability metric to use, or on the criteria for acceptable or excessive glycemic variability (Bergenstal et al., 2013). In (Bergenstal et al., 2013), a panel of 34 expert diabetes specialists met on March 28-29, 2012 to discuss the current state of diabetes care and the direction in which to move forward. Part of their discussion focused on glycemic variability. There are a number of metrics for measuring glycemic variability (Rodbard, 2009a), including standard deviation (SD), coefficient of variation (CV), interquartile range (IQR) (Rodbard, 2009b), mean amplitude of glycemic excursion (MAGE) (Service et al., 1970), M-value (Schlichtkrull et al., 1965), mean of daily difference (MODD) (Molnar et al., 1972), continuous overall net glycemic action (CONGA) (McDonnell et al., 2005), and others. SD, IQR, and MAGE are described in detail in Chapter 3. SD and MAGE are both used as features for the CPGV metric. CV is a normalized standard deviation: the ratio of the standard deviation to the mean. M-value is a measure of the stability of excursions compared to a normal blood glucose level.
MODD is the mean difference between blood glucose levels measured at the same time of day on two consecutive days. CONGA is the SD of the differences between readings separated by a time window of arbitrary size. CONGA is similar to MODD, but the window is not restricted to 24 hours, and the differences are not averaged but used to compute an SD. CONGA was the first glycemic variability metric designed specifically for CGM data. The panel focused on SD, IQR, and CV for their simplicity and familiarity. IQR was brought forward because SD, and other metrics based on SD including CV, assume a normal distribution even though blood glucose is not normally distributed; IQR does not have this restriction. The panel acknowledged that each metric is capable of measuring one aspect of glycemic variability or another, but that none of them reflects all aspects of glycemic variability. The panel concluded that glycemic variability should be part of a patient’s evaluation during doctor visits, but it did not conclude that any currently accepted metric should be the metric to use. The glycemic variability metric developed in this work is the only one that combines machine learning with many metrics to fit the consensus gestalt of multiple physicians.

5.2 Machine Learning Related Research

The machine learning algorithms used in this work have been applied to a wide range of other applications. There are pattern recognition applications, such as medical diagnosis, which use the same machine learning approaches. There are also time series forecasting applications, such as financial market, utility load, and control system prediction.

5.2.1 Machine Learning for Problem Detection

In medicine, it is necessary to distinguish between normal and abnormal cases. Detecting such medical problems automatically is a common and well-suited use of machine learning.
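To make the dispersion metrics surveyed in Section 5.1.3 concrete, the sketch below computes SD, CV, MODD, and CONGA from a list of CGM readings. It assumes a uniformly sampled series with the MODD and CONGA windows given as sample counts; the function names are my own, not the thesis's code.

```python
import statistics

def sd(bg):
    """Standard deviation of the blood glucose series."""
    return statistics.stdev(bg)

def cv(bg):
    """Coefficient of variation: SD normalized by the mean."""
    return statistics.stdev(bg) / statistics.mean(bg)

def modd(bg, samples_per_day):
    """Mean of daily differences: mean absolute difference between
    readings taken at the same time of day on consecutive days."""
    return statistics.mean(abs(b - a) for a, b in zip(bg, bg[samples_per_day:]))

def conga(bg, lag):
    """Continuous overall net glycemic action: SD of the differences
    between readings `lag` samples apart (the window size is arbitrary)."""
    return statistics.stdev(b - a for a, b in zip(bg, bg[lag:]))
```

The contrast noted in the text is visible here: MODD averages the differences, while CONGA takes their standard deviation over a configurable window.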
Many applications in medicine collect large amounts of data, which makes machine learning useful in these applications. Doctors and nurses spend a large portion of their time interpreting the large amounts of data generated by modern tests. Machine learning is capable of screening many samples quickly and automatically so that the important results can be brought to the attention of humans. In (Kiyan, 2011), artificial neural networks such as multilayer perceptrons are used to classify breast cancer data as benign or malignant. The data contained about 700 samples of 9 features, such as lump thickness and cell size. Using the neural networks, the authors were able to correctly classify over 95% of unseen samples. The authors compared 4 different neural networks: radial basis function networks; probabilistic neural networks; general regression neural networks; and multilayer perceptrons. The authors concluded that on their data samples, the general regression neural network was the best model, with 98.8% accuracy. In (Chen et al., 2012), an expert system for diagnosing thyroid disease was developed. The system was based on an SVM with the features selected using the Fisher score, a supervised feature selection method which determines relevant features for classification. The tuning parameters of the SVM were chosen using particle swarm optimization. As was done for the CPGV metric in this work, (Chen et al., 2012) compared multiple methods, citing 7 articles that used the same thyroid database as points of comparison. Using their methods, the authors were able to achieve 97.5% accuracy, better than the previously published methods for their dataset. Features extracted from brain scans, i.e., magnetic resonance imaging (MRI), have been used for diagnosing Alzheimer’s disease (Klöppel et al., 2008), and for identifying early risk of Alzheimer’s (Polat et al., 2012).
Alzheimer’s disease is difficult to distinguish from natural aging and frontotemporal lobar degeneration (FTLD). In (Klöppel et al., 2008), an SVM was used to correctly classify Alzheimer’s disease vs. normal 95% of the time, Alzheimer’s disease vs. dementia 93% of the time, and Alzheimer’s disease vs. FTLD 81% of the time. In (Polat et al., 2012), an SVM was trained on a dataset of patients who were diagnosed with early onset Alzheimer’s disease. The SVM was able to correctly classify 79% of Alzheimer’s disease patients vs. normal patients.

5.2.2 Time Series Prediction

A survey of time series prediction applications (Sapankevych and Sankar, 2009) found that the most published papers related to time series prediction were focused on financial markets. The next most common topic was electrical utility load. The rest of the topics were control system forecasting, general business forecasting, and other miscellaneous applications. Many time series forecasting applications are very similar to blood glucose prediction. In blood glucose prediction, as in other applications like financial market prediction, several factors influence the time series. In some applications, the factors need to be evaluated to determine whether their influence is positive or negative. Blood glucose is nonlinear and non-stationary. Nonlinearity implies that no linear equation can closely approximate the data. Non-stationarity implies that the data have dynamic trends which change over time. Many time series problems share this nonlinearity and non-stationarity. Five financial time series sources, including the S&P 500 and several foreign bond indices, were studied in (Tay and Cao, 2001a). They found that an SVR outperformed an MLP, with the SVR better fitting the data and the MLP overfitting. The authors published several follow-up studies.
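Whether the learner is an SVR or an MLP, these forecasting studies all frame the time series as supervised examples built from lagged values. A minimal sketch of that framing (the function name and parameters are my own, for illustration):

```python
def make_supervised(series, n_lags, horizon):
    """Convert a univariate time series into (features, target) pairs:
    each feature vector holds the n_lags most recent values, and the
    target is the value `horizon` steps ahead of the last lag."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])     # window of past values
        y.append(series[t + horizon - 1])  # future value to predict
    return X, y
```

With CGM samples every 5 minutes, a 30 minute prediction corresponds to horizon=6; the resulting pairs can be fed to any regressor.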
In the first follow-up publication (Tay and Cao, 2001b), the SVR was combined with a Self Organizing Feature Map (SOM), which clusters the entire input space into disjoint sections. The SOM is used as an intermediate layer to feed more informed features to the SVR. This method significantly improved prediction performance. In later work, the authors integrated an ascending-C SVR (Tay and Cao, 2002a) and an ε-descending SVR (Tay and Cao, 2002b) into a single SVR (Cao and Tay, 2003). Ascending the C parameter of the SVR gives more weight to recent examples. Descending the ε parameter results in support vectors being made from more recent examples. Electrical load forecasting allows the power company to provide for more efficient power transmission, which reduces the price of electricity. (Bo-Juen et al., 2004) used an SVR with features including day of week, time of day, weather, and holidays to forecast the maximum electrical load for the next 30 days. Including temperature reduced their performance, which they attributed to inaccuracies in forecasting temperature. An SVR was compared to an autoregressive model for electrical load forecasting in (Mohandes, 2002). The SVR significantly outperformed the autoregressive model, especially when the number of training points was increased.

6 Future Work

6.1 Variability Metric

The development of the CPGV metric described in Chapter 3 is complete. However, the metric must be validated before it can be accepted in clinical practice. Biomarkers have been shown to correlate with the risk of long term complications of diabetes. If the CPGV metric can be correlated with biomarkers, then the metric would be validated as a method of determining risk of complications. Currently the SmartHealth Lab is conducting a patient trial in which the patients submit blood and urine samples, which are used to extract biomarkers.
The team is also seeking out other databases which contain biomarkers and CGM data to expedite the validation process.

6.2 Blood Glucose Prediction

The blood glucose prediction model described in Chapter 4 can be improved in many ways. The current model only considers some of the features that are tracked and are known to affect blood glucose levels. Some of the features not yet considered are:

• Sleep events. Sleep is known to change the levels of the hormone cortisol. Cortisol changes a patient’s insulin sensitivity, causing blood glucose to change. A well known example of sleep affecting blood glucose levels is the dawn phenomenon.

• Exercise events. Physical activity tends to pull glucose out of the blood into muscles, causing blood glucose to drop.

• Stress. Stress can induce cortisol, resulting in increased blood glucose levels. However, the effects of stress are highly individual: in some patients it causes blood glucose levels to increase, and in others to decrease.

• Medication. Some medications, especially ones based on steroids, can have large influences on blood glucose levels.

• Data from other sensors, such as heart rate and galvanic skin response.

The prediction model also does not consider time based events, such as the change in blood glucose 24 hours prior to the prediction point, or 7 days prior to the prediction point. The behavior of blood glucose on weekdays vs. weekends can be considerably different for some patients. Another refinement would involve preprocessing the input data to the prediction model. In the CPGV metric, the blood glucose values were smoothed using a cubic spline smoothing algorithm. A modified method could be incorporated for the prediction model. However, the algorithm would need to be modified for smoothing the points near the end of the time series, since to smooth point BG_n, the cubic splines algorithm uses points BG_(n-1) and BG_(n+1).
When smoothing points at the end of the time series, the point BG_(n+1) is not yet available. Transfer learning could be used to improve the predictions when there is not enough data to adequately train and tune a model for a new patient. Currently, it requires 1 week of data to fully train a model, and an additional week of data to fully tune the model. Transfer learning is a machine learning technique which stores knowledge from one problem for use in another. Currently, a model uses parameters tuned on other models if there is not enough tuning data. Transfer learning has not yet been explored to aid in situations where there is less than 1 week of data to train the model. Although patients are highly individual in their reactions to certain life events, some effects are expected to be shared across patients. For example, carbohydrates and insulin raise and lower blood glucose, respectively, for everybody; only the amount of change depends on the individual. Optimizing the set of features is not part of the current work. All of the computed features are used to train the SVR. A greedy wrapper like the ones used for the CPGV metric could be used to select the features for the prediction model. Time series optimizations like the ascending-C and descending-ε techniques (Cao and Tay, 2003) have improved other time series prediction problems, and could be beneficial to this work.

7 Summary and Conclusion

This thesis presents research in machine learning for diabetes management. This work contributes to two of the three major projects of the SmartHealth Lab at Ohio University: glycemic variability measurement and blood glucose prediction. There are two major contributions: 1. development of a metric for measuring glycemic variability, a serious problem for patients with diabetes; and 2. predicting patient blood glucose levels, in order to preemptively detect and avoid potential health problems.
The first contribution is a novel solution to the glycemic variability measurement problem. The CPGV metric in this work is a machine learning regression model that learns the gestalt of 12 expert physicians’ impressions of glycemic variability control. The value of the metric has been shown to closely reflect the physicians’ consensus. The root mean square error (RMSE) of the metric compared to the consensus was 0.417. When the metric is rounded to the nearest integer, the RMSE was 0.511. The RMSE of individual physicians, who gave integer ratings, compared to the consensus was 0.489. When used as a screen for excessive glycemic variability, the CPGV metric outperformed all other metrics that it was evaluated against. The accuracy of the CPGV metric as a screen was 90.1%. The accuracy of the most commonly used metrics, MAGE, standard deviation, and IQR, was 80.8%, 84.3%, and 78.7%, respectively. These results have been published in the Journal of Diabetes Science and Technology (Marling et al., 2013). The second contribution is a blood glucose prediction model that uses machine learning to make patient-specific predictions. The machine learning approach allows the model to learn the individuality of a patient that a pure physiological model cannot. The downfall of physiological models is that they are not flexible enough to be used in practice. The prediction model in this work combines the power of physiological models with the flexibility of machine learning models to provide individualized blood glucose models for prediction. The prediction model has been shown to outperform the baselines that it was compared against: a simple baseline, which assumes that the blood glucose levels do not change; and the statistical ARIMA model, which bases predictions on past blood glucose levels. The model also performed very similarly to, and in some cases outperformed, physicians who were given the same prediction problems.
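The simple “no change” baseline mentioned above, and the RMSE used throughout to compare models, can be sketched as follows. The readings are invented for illustration; only the idea (predict the current reading, score by RMSE) comes from the text.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def t0_baseline(current_bg, n_steps):
    """Simple baseline: assume blood glucose does not change from t0."""
    return [current_bg] * n_steps

actual = [132, 128, 141]  # hypothetical future CGM readings (mg/dl)
baseline_error = rmse(t0_baseline(132, len(actual)), actual)
```

Any candidate model must at least beat this baseline to demonstrate that it captures real dynamics rather than persistence.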
The RMSE of the 30 minute predictions compared to the actual blood glucose level was 19.5. The RMSE of the baseline ARIMA model was 22.9. The RMSE of the best physician predictions was 19.8. The RMSE of the 60 minute predictions compared to the actual blood glucose level was 35.7. The RMSE of the baseline ARIMA model was 42.2. The RMSE of the best physician predictions was 38.4. These results will be presented at the International Conference on Machine Learning and Applications in December, 2013. Future work is planned to validate the CPGV metric by correlating it with biomarkers that are linked to patient outcomes. The blood glucose prediction model could potentially be improved in the future through the incorporation of additional life-event features and experimentation with additional machine learning approaches.

References

American Diabetes Association (2012a). Checking your blood glucose. http://www.diabetes.org/living-with-diabetes/treatment-and-care/blood-glucosecontrol/checking-your-blood-glucose.html, accessed November, 2012.

American Diabetes Association (2012b). Hypoglycemia (low blood glucose). http://www.diabetes.org/living-with-diabetes/treatment-and-care/blood-glucosecontrol/hypoglycemia-low-blood.html, accessed November, 2012.

American Diabetes Association (2012c). Type-1. http://www.diabetes.org/diabetes-basics/type-1, accessed November, 2012.

American Diabetes Association (2013). Economic costs of diabetes in the US in 2012. Diabetes Care, 36(4):1033–1046.

Bequette, B. W. (2005). A critical assessment of algorithms and challenges in the development of a closed-loop artificial pancreas. Diabetes Technology & Therapeutics, 7(1):28–47.

Bergenstal, R. M., Ahmann, A. J., Bailey, T., Beck, R. W., Bissen, J., Buckingham, B., Deeb, L., Dolin, R. H., Garg, S. K., Goland, R., et al. (2013). Recommendations for standardizing glucose reporting and analysis to optimize clinical decision making in diabetes: the ambulatory glucose profile (AGP).
Diabetes Technology & Therapeutics, 15(3):198–211.

Bishop, C. M. (2007). Pattern recognition and machine learning (information science and statistics). Springer, New York.

Bo-Juen, C., Ming-Wei, C., and Chih-Jen, L. (2004). Load forecasting using support vector machines: A study on EUNITE competition 2001. IEEE Transactions on Power Systems, 19(4):1821–1830.

Borg, G. A. V. (1982). Psychophysical bases of perceived exertion. Medicine and Science in Sports and Exercise, 14(5):377–381.

Boutayeb, A. and Chetouani, A. (2006). A critical review of mathematical models and data used in diabetology. BioMedical Engineering OnLine, 5(1):43.

Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (2008). Time series analysis: forecasting and control. Wiley series in probability and statistics. J. Wiley & Sons, Hoboken, NJ.

Cao, L.-J. and Tay, F. E. H. (2003). Support vector machine with adaptive parameters in financial time series forecasting. IEEE Transactions on Neural Networks, 14(6):1506–1518.

Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm.

Chen, H.-L., Yang, B., Wang, G., Liu, J., Chen, Y.-D., and Liu, D.-Y. (2012). A three-stage expert system based on support vector machines for thyroid disease diagnosis. Journal of Medical Systems, 36(3):1953–1963.

Clarke, W. L., Cox, D., Gonder-Frederick, L. A., Carter, W., and Pohl, S. L. (1987). Evaluating clinical accuracy of systems for self-monitoring of blood glucose. Diabetes Care, 10(5):622–628.

Dalla Man, C., Raimondo, D. M., Rizza, R. A., and Cobelli, C. (2007a). Mathematical models of the metabolic system in health and in diabetes: GIM, simulation software of meal glucose–insulin model. Journal of Diabetes Science and Technology, 1(3):323–330.

Dalla Man, C., Rizza, R. A., and Cobelli, C. (2007b).
Meal simulation model of the glucose-insulin system. IEEE Transactions on Biomedical Engineering, 54(10):1740–1749.

Danaei, G., Finucane, M., Lu, Y., Singh, G., Cowan, M., Paciorek, C., Lin, J., Farzadfar, F., Khang, Y., Stevens, G., et al.; Global Burden of Metabolic Risk Factors of Chronic Diseases Collaborating Group (Blood Glucose) (2011). National, regional, and global trends in fasting plasma glucose and diabetes prevalence since 1980: systematic analysis of health examination surveys and epidemiological studies with 370 country-years and 2.7 million participants. Lancet, 378(9785):31–40.

DCCT Research Group and others (1987). Diabetes control and complications trial (DCCT): results of feasibility study. Diabetes Care, 10(1):1–19.

Domingos, P. (1999). The role of Occam’s razor in knowledge discovery. Data Mining and Knowledge Discovery, 3(4):409–425.

Duke, D. L. (2009). Intelligent Diabetes Assistant: A Telemedicine System for Modeling and Managing Blood Glucose. PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18.

Herbrich, R. (2002). Learning kernel classifiers: theory and algorithms. The MIT Press, Cambridge, MA.

Hsu, C.-W., Chang, C.-C., and Lin, C.-J. (2003). A practical guide to support vector classification. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University, http://www.csie.ntu.edu.tw/∼cjlin.

Hyndman, R. J. and Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. Journal of Statistical Software, 27(3):1–22.

Kiyan, T. (2011). Breast cancer diagnosis using statistical neural networks. IU-Journal of Electrical & Electronics Engineering, 4(2):1149–1153.

Klonoff, D. (2005). Continuous glucose monitoring roadmap for 21st century diabetes therapy. Diabetes Care, 28(5):1231–1239.
Klonoff, D. C. (2007). The artificial pancreas: how sweet engineering will solve bitter problems. Journal of Diabetes Science and Technology, 1(1):72–81.

Klöppel, S., Stonnington, C. M., Chu, C., Draganski, B., Scahill, R. I., Rohrer, J. D., Fox, N. C., Jack, C. R., Ashburner, J., and Frackowiak, R. S. (2008). Automatic classification of MR scans in Alzheimer’s disease. Brain, 131(3):681–689.

Kovács, L., Szalay, P., Almássy, Z., and Barkai, L. (2012). Applicability results of a nonlinear model-based robust blood glucose control algorithm. Journal of Diabetes Science and Technology, 7(3):708–716.

Lehmann, E. and Deutsch, T. (1992). A physiological model of glucose-insulin interaction in type 1 diabetes mellitus. Journal of Biomedical Engineering, 14(3):235–242.

Lehmann, E. D. (2013). AIDA. http://www.2aida.net/welcome/, accessed September, 2013.

Lehmann, E. D., Tarín, C., Bondia, J., Teufel, E., and Deutsch, T. (2011). Development of AIDA V4.3b diabetes simulator: Technical upgrade to support incorporation of lispro, aspart, and glargine insulin analogues. Journal of Electrical and Computer Engineering, 2011.

Magni, L., Forgione, M., Toffanin, C., Dalla Man, C., Kovatchev, B., De Nicolao, G., and Cobelli, C. (2009). Artificial pancreas systems: Run-to-run tuning of model predictive control for type 1 diabetes subjects: In silico trial. Journal of Diabetes Science and Technology, 3(5):1091–1098.

Maimone, A. (2006). Data and knowledge acquisition in case-based reasoning for diabetes management. Master’s thesis, Ohio University.

Marling, C., Wiley, M., Bunescu, R., Shubrook, J., and Schwartz, F. (2011). Emerging applications for intelligent diabetes management. In Proceedings of the Twenty-Third Innovative Applications of Artificial Intelligence Conference (IAAI-11), San Francisco, CA, USA. AAAI Press.

Marling, C., Wiley, M., Bunescu, R. C., Shubrook, J., and Schwartz, F. (2012). Emerging applications for intelligent diabetes management. AI Magazine, 33(2):67–78.
Marling, C. R., Struble, N. W., Bunescu, R. C., Shubrook, J. H., and Schwartz, F. L. (2013). A consensus perceived glycemic variability metric. Journal of Diabetes Science and Technology, 7(4):871–879.

McDonnell, C., Donath, S., Vidmar, S., Werther, G., and Cameron, F. (2005). A novel approach to continuous glucose analysis utilizing glycemic variation. Diabetes Technology & Therapeutics, 7(2):253–263.

Miller, W. A. (2009). Problem detection for situation assessment in case-based reasoning for diabetes management. Master’s thesis, Ohio University.

Mohandes, M. (2002). Support vector machines for short-term electrical load forecasting. International Journal of Energy Research, 26(4):335–345.

Molnar, G., Taylor, W., and Ho, M. (1972). Day-to-day variation of continuously monitored glycaemia: a further measure of diabetic instability. Diabetologia, 8(5):342–348.

Monnier, L., Wojtusciszyn, A., Colette, C., and Owens, D. (2011). The contribution of glucose variability to asymptomatic hypoglycemia in persons with type 2 diabetes. Diabetes Technology & Therapeutics, 13(8):813–818.

Parker, R. S., Doyle, F. J., Ward, J. H., and Peppas, N. A. (2000). Robust H∞ glucose control in diabetes using a physiological model. AIChE Journal, 46(12):2537–2549.

Petruzella, F. D. (2005). Programmable logic controllers. Tata McGraw-Hill Education, New York.

Polat, F., Orhan Demirel, S., Kitis, O., Simsek, F., Isman Haznedaroglu, D., Coburn, K., Kumral, E., and Saffet Gonul, A. (2012). Computer based classification of MR scans in first time applicant Alzheimer patients. Current Alzheimer Research, 9(7):789–794.

Pollock, S. (1993). Smoothing with cubic splines. Department of Economics, Queen Mary and Westfield College.

Qu, Y., Jacober, S. J., Zhang, Q., Wolka, L. L., and DeVries, J. H. (2012). Rate of hypoglycemia in insulin-treated patients with type 2 diabetes can be predicted from glycemic variability data. Diabetes Technology & Therapeutics, 14(11):1008–1012.

R Core Team (2013).
R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Rodbard, D. (2009a). Interpretation of continuous glucose monitoring data: glycemic variability and quality of glycemic control. Diabetes Technology & Therapeutics, 11(S1):S–55.

Rodbard, D. (2009b). New and improved methods to characterize glycemic variability using continuous glucose monitoring. Diabetes Technology & Therapeutics, 11(9):551–565.

Rodbard, D., Bailey, T., Jovanovic, L., Zisser, H., Kaplan, R., and Garg, S. K. (2009). Improved quality of glycemic control and reduced glycemic variability with use of continuous glucose monitoring. Diabetes Technology & Therapeutics, 11(11):717–723.

Russell, S. J., Norvig, P., Canny, J. F., Malik, J. M., and Edwards, D. D. (1995). Artificial intelligence: a modern approach. Prentice Hall, Englewood Cliffs, NJ.

Sapankevych, N. and Sankar, R. (2009). Time series prediction using support vector machines: a survey. IEEE Computational Intelligence Magazine, 4(2):24–38.

Schlichtkrull, J., Munck, O., and Jersild, M. (1965). The M-value, an index of blood-sugar control in diabetics. Acta Medica Scandinavica, 177(1):95–102.

Seaquist, E. R., Miller, M. E., Bonds, D. E., Feinglos, M., Goff, D. C., Peterson, K., Senior, P., et al. (2012). The impact of frequent and unrecognized hypoglycemia on mortality in the ACCORD study. Diabetes Care, 35(2):409–414.

Service, F., Molnar, G., Rosevear, J., Ackerman, E., Gatewood, L., Taylor, W., et al. (1970). Mean amplitude of glycemic excursions, a measure of diabetic instability. Diabetes, 19(9):644–655.

Siegelaar, S. E., Holleman, F., Hoekstra, J. B., and DeVries, J. H. (2010). Glucose variability; does it matter? Endocrine Reviews, 31(2):171–182.

Smola, A. J. and Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14(3):199–222.

Tay, F. E. and Cao, L. (2001a). Application of support vector machines in financial time series forecasting.
Omega, 29(4):309–317.

Tay, F. E. and Cao, L. (2002a). Modified support vector machines in financial time series forecasting. Neurocomputing, 48(1):847–861.

Tay, F. E. and Cao, L. (2002b). ε-descending support vector machines for financial time series forecasting. Neural Processing Letters, 15(2):179–195.

Tay, F. E. H. and Cao, L. J. (2001b). Improved financial time series forecasting by combining support vector machines with self-organizing feature map. Intelligent Data Analysis, 5(4):339–354.

Theodoridis, S. and Koutroumbas, K. (2009). Pattern recognition. Academic Press, Burlington, MA.

Vapnik, V. (1995). The nature of statistical learning theory. Springer, New York.

Vapnik, V. N. (1998). Statistical learning theory. Wiley-Interscience, New York.

Vernier, S. J. (2009). Clinical evaluation and enhancement of a case-based medical decision support system. Master’s thesis, Ohio University.

Wiley, M. T. (2011). Machine learning for diabetes decision support. Master’s thesis, Ohio University.

World Health Organization (2011). Fact sheet no. 312. http://www.who.int/mediacentre/factsheets/fs312/en/.

Zarkogianni, K., Vazeou, A., Mougiakakou, S. G., Prountzou, A., and Nikita, K. S. (2011). An insulin infusion advisory system based on autotuning nonlinear model-predictive control. IEEE Transactions on Biomedical Engineering, 58(9):2467–2477.

Zoungas, S., Patel, A., Chalmers, J., de Galan, B. E., Li, Q., Billot, L., Woodward, M., Ninomiya, T., Neal, B., MacMahon, S., et al. (2010). Severe hypoglycemia and risks of vascular events and death. New England Journal of Medicine, 363(15):1410–1418.
Appendix A: CPGV Full Results

Table A.1: Full results for each of the glycemic variability platforms.

[Note: the first page of Table A.1 was corrupted during text extraction and could not be reliably reconstructed. It covered the SVR Gaussian, SVR Linear, and first Linear Regression configurations, each combining raw or smoothed data, forward or backward feature selection, and training data with or without the development set, and reported RMSE, MAE, Rounded Output RMSE, and Rounded Output MAE. The recoverable rows follow.]

Algorithm              Data    Selection  Training data  RMSE      MAE       Rounded      Rounded
                                                                            Output RMSE  Output MAE
Linear Regression      Raw     Backward   Without Dev    0.457279  0.333595  0.568039     0.40954
Linear Regression      Raw     Forward    With Dev       0.470063  0.35457   0.562893     0.40705
Linear Regression      Raw     Forward    Without Dev    0.469478  0.352595  0.558434     0.41205
Linear Regression      Smooth  Backward   With Dev       0.461272  0.33892   0.570235     0.41954
Linear Regression      Smooth  Backward   Without Dev    0.458412  0.3383    0.561398     0.40954
Linear Regression      Smooth  Forward    With Dev       0.455203  0.349145  0.567308     0.41371
Linear Regression      Smooth  Forward    Without Dev    0.454964  0.349435  0.550916     0.40288
Multilayer Perceptron  Raw     Backward   With Dev       0.561344  0.4372    0.631932     0.47789
Multilayer Perceptron  Raw     Backward   Without Dev    0.504739  0.380255  0.57098      0.42039
Multilayer Perceptron  Raw     Forward    With Dev       0.579702  0.45092   0.637195     0.48122
Multilayer Perceptron  Raw     Forward    Without Dev    0.529509  0.38323   0.597359     0.41704
Multilayer Perceptron  Smooth  Backward   With Dev       0.559819  0.432585  0.652701     0.48956
Multilayer Perceptron  Smooth  Backward   Without Dev    0.538918  0.399145  0.624642     0.45538
Multilayer Perceptron  Smooth  Forward    With Dev       0.604697  0.40911   0.680807     0.46954
Multilayer Perceptron  Smooth  Forward    Without Dev    0.519543  0.40556   0.610474     0.44788

Appendix B: Full Blood Glucose Level Prediction Results
Table B.1: Predictions for each baseline and the new prediction model.

[Note: the pages of Table B.1 covering points 1–176 were corrupted during text extraction: their columns were spilled in column-major order and could not be reliably realigned. Each row gave the actual blood glucose values at t+30 and t+60, the current value t0, and the 30- and 60-minute predictions from the ARIMA and SVR baselines, three physicians, and the new model. The final recoverable page (points 177–196) follows; t0 is shown once here, although the original table repeated it under both the 30- and 60-minute column groups.]

Point  t+30  t+60  t0   ARIMA 30  ARIMA 60  SVR 30  SVR 60  Phys1 30  Phys1 60  Phys2 30  Phys2 60  Phys3 30  Phys3 60  New 30  New 60
177    148   146   148  149.92    150.35    148.59  146.78  154       162       154       149       152       158       148.98  148.55
178    148   144   148  148.23    148.4     147.76  147.2   148       145       146       139       145       153       146.53  145.95
179    110   122   96   116.59    119.75    117.45  122.27  121       135       120       141       118       108       107.51  112.41
180    216   218   214  215.51    214.89    212.33  204.63  223       227       226       229       227       233       211.38  211.43
181    132   142   134  127.74    126.58    128.1   125.97  140       168       134       151       129       114       130.77  131.52
182    76    66    78   87.19     88.05     101.11  135.98  98        126       111       156       56        101       94.51   113.72
183    216   244   212  216.13    216.53    214.03  212.18  238       239       229       219       213       240       219.77  220.94
184    260   270   244  262.49    253.7     258.97  259.06  272       282       273       248       275       240       257.84  241.92
185    378   376   328  331.01    321.31    334.56  305.38  351       355       354       342       340       301       317.82  305.58
186    148   146   152  158.63    160.94    157.21  158.06  156       167       171       184       151       169       153.74  157.33
187    182   146   138  184.19    201.71    182.76  193.27  167       196       173       196       166       127       172.67  177.86
188    104   90    148  134.33    133.32    127.31  123.88  123       104       127       102       131       110       131.66  134.24
189    284   242   206  231.37    232.22    258.08  243.53  249       254       260       247       247       196       260.36  250.47
190    166   170   172  165.45    165.15    160.49  158.07  162       162       161       191       158       184       169.3   199.96
191    124   100   162  156.21    155.95    149.44  146.65  134       119       135       119       140       116       146.63  136.8
192    88    114   82   74.79     73.91     76.2    81.54   68        66        88        113       76        106       65.32   83.16
193    292   314   176  201.34    206.45    203.65  209.8   227       239       220       211       229       194       202.83  208.37
194    234   222   198  245.15    256.49    236.95  223.53  231       254       234       219       238       201       239.47  238.06
195    180   182   184  182.47    182.15    184.46  188.75  175       169       176       172       175       175       180.15  177.13
196    116   124   124  121.51    120.94    125.65  130.96  116       107       117       110       114       108       123.51  122.2
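The appendix tables report RMSE and MAE between model outputs and reference values (physician consensus ratings in Appendix A, actual blood glucose levels in Appendix B). As a reference for interpreting those columns, the following is a minimal sketch of the standard RMSE and MAE definitions; the example numbers are invented and are not taken from the thesis data.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between predicted and reference values."""
    n = len(predictions)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)

def mae(predictions, targets):
    """Mean absolute error between predicted and reference values."""
    n = len(predictions)
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / n

# Hypothetical example: four model outputs vs. four reference ratings.
preds = [1.4, 2.1, 3.0, 2.6]
truth = [1.0, 2.0, 3.0, 3.0]
print(rmse(preds, truth))  # ≈ 0.2872
print(mae(preds, truth))   # ≈ 0.225
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two columns can rank configurations differently.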