UNIVERSITI TEKNOLOGI MALAYSIA

DECLARATION OF THESIS / UNDERGRADUATE PROJECT PAPER AND COPYRIGHT

Author's full name: CHOON YEE WEN
Date of birth: 11.04.1986
Title: INFORMATIVE GENE SELECTION USING BAYESIAN MODEL AVERAGING FOR PATIENTS SURVIVAL ANALYSIS
Academic Session: 2009/2010

I declare that this thesis is classified as:

CONFIDENTIAL (Contains confidential information under the Official Secret Act 1972)*
RESTRICTED (Contains restricted information as specified by the organisation where research was done)*
OPEN ACCESS (I agree that my thesis to be published as online open access (full text))

I acknowledged that Universiti Teknologi Malaysia reserves the right as follows:

1. The thesis is the property of Universiti Teknologi Malaysia.
2. The Library of Universiti Teknologi Malaysia has the right to make copies for the purpose of research only.
3. The Library has the right to make copies of the thesis for academic exchange.

Certified by:

SIGNATURE (NEW IC NO. / PASSPORT NO.): 860411 23 6684
Date: 11 APRIL 2010

SIGNATURE OF SUPERVISOR
NAME OF SUPERVISOR: Mr. Afnizanfaizal Bin Abdullah
Date: 11 APRIL 2010

NOTES: * If the thesis is CONFIDENTIAL or RESTRICTED, please attach with the letter from the organisation with period and reasons for confidentiality or restriction.

INFORMATIVE GENE SELECTION USING BAYESIAN MODEL AVERAGING FOR PATIENTS SURVIVAL ANALYSIS

CHOON YEE WEN

This thesis is submitted in partial fulfillment of the requirements for the award of the Bachelor of Computer Science Degree in Bioinformatics

Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia

April 2010

ABSTRACT

Microarray technology is now widely used to identify potential biomarkers for cancer prognostics and diagnostics. Given the vast amount of gene information produced by microarrays, informative gene selection is needed both to decrease clinical costs and to mitigate the possibility of overfitting due to high inter-variable correlations.
A problem with feature selection algorithms that produce continuous predictors of patient survival is that they fail to account for model uncertainty, where a model is a set of selected genes whose regression coefficients have been calculated for use in predicting survival prognosis. With thousands of genes and only tens to hundreds of samples, it often happens that several different models describe the data about equally well. In this research, the Bayesian Model Averaging (BMA) method is applied to select a subset of genes for survival analysis on microarray data. BMA combines the effectiveness of multiple models by taking the weighted average of their posterior distributions instead of choosing a single model and proceeding as if the data were generated from it. In the experiments, the BMA method successfully selected the relevant genes and produced significant results in a shorter time. These results show that BMA is an appropriate method for gene selection.
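The weighted averaging over models described in the abstract can be illustrated with a small sketch. It is not the implementation used in this thesis. It assumes equal prior probabilities for all candidate models and uses the common BIC approximation to the posterior model probability, P(M_k | D) ≈ exp(-BIC_k / 2) / Σ_l exp(-BIC_l / 2). The gene names, BIC values, and per-model risk scores are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, FrozenSet, List


def posterior_model_probs(bics: List[float]) -> List[float]:
    """Approximate P(M_k | D) from BIC values, assuming equal model priors."""
    best = min(bics)
    # Subtract the smallest BIC before exponentiating for numerical stability.
    weights = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


def inclusion_probs(models: List[FrozenSet[str]], probs: List[float]) -> Dict[str, float]:
    """Posterior probability that each gene is relevant:
    the sum of the probabilities of all models that contain the gene."""
    out: Dict[str, float] = {}
    for genes, p in zip(models, probs):
        for gene in genes:
            out[gene] = out.get(gene, 0.0) + p
    return out


def bma_prediction(per_model_predictions: List[float], probs: List[float]) -> float:
    """Average the per-model predictions, weighted by posterior model probability."""
    return sum(p * y for p, y in zip(probs, per_model_predictions))


# Hypothetical candidate models (gene subsets) with made-up BIC values.
models = [
    frozenset({"geneA", "geneB"}),
    frozenset({"geneA", "geneC"}),
    frozenset({"geneB"}),
]
bics = [100.0, 100.0, 110.0]

probs = posterior_model_probs(bics)
gene_probs = inclusion_probs(models, probs)

# Hypothetical risk scores that each model assigns to one patient.
risk = bma_prediction([0.8, 0.6, 0.1], probs)

for gene in sorted(gene_probs):
    print(gene, round(gene_probs[gene], 3))
print("BMA risk score:", round(risk, 3))
```

In this toy setting the two models with equal BIC share almost all of the posterior weight, so geneA, which appears in both, receives an inclusion probability close to 1. Genes can then be ranked or filtered by inclusion probability instead of being taken from a single best model.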