An approach to predict the survival time of childhood Acute Lymphoblastic Leukemia (ALL) patients

Name: Rangana Jayashanka
Supervisor: Mrs. Rupika Wijesinghe
Co-Supervisor: Dr. A.R. Weerasinghe

Overview
• Introduction
• Objectives
• Related Work
• Research Design
• Results and Discussion

Childhood Acute Lymphoblastic Leukemia (ALL)
• Leukemia is a cancer of the blood cells.
• Acute lymphoblastic leukemia is a cancer of the lymphoblasts.
• It is the most common cancer among children.
• It is the most dangerous cancer among children.

Survival Prediction
• Survival prediction is the task of predicting the length of time that a patient will survive.
• Doctors cannot accurately predict the survival time (prognosis), since the relationship between health status and survivability is still unknown.
• The task is difficult because of the complex relationships among biological, genetic and environmental factors in the human body.

Importance of Predicting Survival Time
• An accurate survival model can help in the treatment and care of patients.
• A correct prognosis can guide doctors in planning ahead and selecting treatments.
• A model can show which features are more dominant in predicting survival time.

Main Objectives
1. Identify the factors that affect the survival of patients diagnosed with ALL.
2. Design a model to predict the survival time of newly diagnosed ALL patients.

TNM Staging System (Burke and Henson, 1993)
• Introduced in the 1960s.
• Attributes used to measure cancer outcome:
  1. Tumor size
  2. Number of lymph nodes with metastatic disease
  3. Distant metastases
• New prognostic factors have been identified since this system was introduced.
• Several problems with the model:
  1. Accuracy (breast cancer: 44%)
  2. Predictive variables cannot be added to the model
  3. It cannot be applied to all cancers

Research Design
[Figure: Overview of the proposed methodology. Boxes shown: Data Collection, Data Pre-processing, Feature Selection, Statistical Models, Build Prediction Model, Evaluation, Machine Learning Models]

Machine Learning Model Evaluation
• Use 5-fold cross-validation to obtain the best estimate of the predictor's performance.
• Primary criteria:
  - Relative Absolute Error (RAE)
  - Concordance Index (CI)
• Secondary criterion:
  - Average L1 error
• Measure the performance of classifiers using:
  - Accuracy
  - Sensitivity
  - Specificity
• Visualization:
  - Plot the actual survival time and the predicted survival time on the same graph

Results and Discussion: Machine Learning Models

Feature Selection
• The exhaustive sequential backward method was used.
• Features selected:
  - Age group (age > 10, age < 10)
  - Cranial RT
  - First lumbar puncture test results
  - Prednisolone response
  - Regimen
  - WBC count
  - ALL subtype
  - Relapse

Baseline
Take the average or median value of the entire population as the predicted survival time for each individual patient.

Regression Method   RAE     CI      L1 < 12     RAE > 12
Mean Value          0.457   0.500   undefined   0.457
Average Value       0.531   0.500   undefined   0.531

[Figure: Baseline predictions. Panels: Median, Average]

Machine Learning Algorithms for the Whole Data Set
• We applied two main machine learning approaches:
  1. Multiple Linear Regression (MLR)
  2. Support Vector Regression (SVR)

Regression Method   RAE     CI      L1 < 12     RAE > 12
MLR                 0.298   0.537   0.832       0.380
SVR                 0.280   0.543   undefined   0.280

[Figure: Machine learning algorithms for the whole data set. Panels: MLR, SVR]

Grouping
• We used two different grouping approaches in this work:
  1. Grouping according to the risk categorization
  2. Classification and Regression Trees (CART)

Grouping According to the Risk Categorization
Medical doctors categorize patients into three main risk groups:
1. Standard Risk Group (SRG)
2. High Risk Group (HRG)
3. Very High Risk Group (VHRG)

[Figure: Multiple Linear Regression. Panels: SRG, HRG, VHRG]
[Figure: Support Vector Regression. Panels: SRG, HRG, VHRG]

Classification and Regression Trees (CART)
Place a patient into a subgroup, then predict that the patient will live for the average or median survival time of that subgroup.

Regression Method   RAE     CI      L1 < 12     RAE > 12
Average             0.276   0.535   undefined   0.276
Median              0.274   0.525   undefined   0.274

Combination of CART and Regressions
• Segregate the patients using the classification and regression tree.
• Apply learning algorithms at each node of the resulting tree.

Regression Method   RAE     CI      L1 < 12   RAE > 12
MLR                 0.248   0.654   4.407     0.253
SVR                 0.212   0.661   4.234     0.223

Outlier Detection
• Outliers are patients who are extremely different from the rest of the population.
• Eliminate the 5% of instances that have a relatively high Mahalanobis distance.

[Figure: Multiple Linear Regression after removing outliers. Panels: HR, SR, VHR]
[Figure: Support Vector Regression after removing outliers. Panels: HR, SR, VHR]

Classification
Classify patients as “long survivor” versus “short survivor”, where the classification boundary is the average survival time of the entire population.

Regression Model             Group            Accuracy   Sensitivity   Specificity
Baseline - Average           Whole Data Set   0.5234     0.0000        1.0000
Baseline - Median            Whole Data Set   0.5723     0.1267        0.8565
Multiple Linear Regression   SRG              0.7189     0.8246        0.6245
Multiple Linear Regression   HRG              0.7267     0.7637        0.7253
Multiple Linear Regression   VHRG             0.7392     0.7862        0.7154
Support Vector Regression    SRG              0.7115     0.8084        0.6025
Support Vector Regression    HRG              0.7337     0.7634        0.7097
Support Vector Regression    VHRG             0.6955     0.8084        0.6025
Combined Model               Whole Data Set   0.7512     0.7612        0.7174

After handling censored data, eliminating outliers and grouping, most predictors were able to achieve at least 70% accuracy.
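The long/short survivor classification above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the survival times are made up, the function names are hypothetical, "long survivor" is treated as the positive class, and a patient counts as a long survivor only when the time is strictly greater than the boundary. None of these choices is stated on the slides.

```python
from statistics import mean


def classify_long_survivors(times, boundary):
    """Label each time True ("long survivor") if it is strictly above the boundary."""
    return [t > boundary for t in times]


def accuracy_sensitivity_specificity(actual, predicted):
    """Compute the three classifier measures from two lists of boolean labels.

    Assumption: the positive class is "long survivor".
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


# Made-up survival times in months, for illustration only.
actual_times = [6.0, 10.0, 14.0, 30.0, 40.0]
boundary = mean(actual_times)  # population average, here 20.0

# Average baseline: every patient is predicted to survive the population average.
baseline_predictions = [boundary] * len(actual_times)

actual_labels = classify_long_survivors(actual_times, boundary)
predicted_labels = classify_long_survivors(baseline_predictions, boundary)
acc, sens, spec = accuracy_sensitivity_specificity(actual_labels, predicted_labels)
print(acc, sens, spec)  # 0.6 0.0 1.0
```

Under these assumptions the average baseline never predicts a long survivor, so its sensitivity is 0 and its specificity is 1. This is consistent with the 0.0000 and 1.0000 reported for the average baseline in the classification table, although the slides do not confirm that this is how those values arose.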
Conclusion
• Multiple Linear Regression and Support Vector Regression are effective.
• The performance of survival prediction models improves after segregating the patients into groups.
• For the classification task, most of the classifiers were able to achieve at least 70% accuracy.
• We can effectively predict the survival times of childhood acute lymphoblastic leukemia patients by taking the following steps:
  1. Processing data
  2. Segregating patients
  3. Handling censoring
  4. Eliminating outliers
  5. Learning predictors

Future Work
• Explore better techniques to handle censored data.
• Collect more data for analysis.
• Study kernel functions for support vector regression, such as the Gaussian radial basis function and a polynomial of degree 2.
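The Concordance Index (CI) is named as a primary evaluation criterion in this presentation but is not defined in it. The sketch below shows one common pairwise definition, offered as an assumption rather than as the formula actually used in this work. Censored patients are ignored for simplicity, ties in the predictions count as half a concordant pair, and the function name and sample values are hypothetical.

```python
def concordance_index(actual, predicted):
    """Fraction of comparable patient pairs whose predicted order matches the actual order.

    A pair is comparable when the two actual survival times differ.
    A tie in the predictions counts as half a concordant pair.
    Simplification: censoring is not handled here.
    """
    concordant = 0.0
    comparable = 0
    n = len(actual)
    for i in range(n):
        for j in range(i + 1, n):
            if actual[i] == actual[j]:
                continue  # pair cannot be ordered, skip it
            comparable += 1
            actual_diff = actual[i] - actual[j]
            predicted_diff = predicted[i] - predicted[j]
            if predicted_diff == 0:
                concordant += 0.5
            elif (actual_diff > 0) == (predicted_diff > 0):
                concordant += 1.0
    return concordant / comparable if comparable else float("nan")


# Made-up survival times in months, for illustration only.
actual_times = [6.0, 10.0, 14.0, 30.0, 40.0]

# A constant predictor (same value for every patient) ties on every pair.
ci_constant = concordance_index(actual_times, [20.0] * len(actual_times))

# A predictor that ranks every patient correctly, even with inexact values.
ci_perfect = concordance_index(actual_times, [5.0, 9.0, 15.0, 28.0, 45.0])

print(ci_constant, ci_perfect)  # 0.5 1.0
```

With this definition a constant predictor scores 0.5, which is consistent with the CI of 0.500 reported for both baseline rows. A value of 1.0 needs only the correct ranking, not exact survival times, which is why CI is reported alongside an error measure such as RAE.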