UNIVERSITI TEKNOLOGI MALAYSIA
DECLARATION OF THESIS / UNDERGRADUATE PROJECT PAPER AND COPYRIGHT
Author's full name : CHOON YEE WEN
Date of birth : 11.04.1986
Title : INFORMATIVE GENE SELECTION USING BAYESIAN MODEL
AVERAGING FOR PATIENTS SURVIVAL ANALYSIS
Academic Session : 2009/2010
I declare that this thesis is classified as :
CONFIDENTIAL (Contains confidential information under the Official Secret Act 1972)*
RESTRICTED (Contains restricted information as specified by the organisation where research was done)*
OPEN ACCESS I agree that my thesis to be published as online open access (full text)
I acknowledged that Universiti Teknologi Malaysia reserves the right as follows :
1. The thesis is the property of Universiti Teknologi Malaysia.
2. The Library of Universiti Teknologi Malaysia has the right to make copies for the purpose of research only.
3. The Library has the right to make copies of the thesis for academic exchange.
Certified by :
SIGNATURE
860411-23-6684
(NEW IC NO. /PASSPORT NO.)
Date : 11 APRIL 2010
SIGNATURE OF SUPERVISOR
Mr. Afnizanfaizal Bin Abdullah
NAME OF SUPERVISOR
Date : 11 APRIL 2010
NOTES : * If the thesis is CONFIDENTIAL or RESTRICTED, please attach with the letter from
the organisation with period and reasons for confidentiality or restriction.
INFORMATIVE GENE SELECTION USING BAYESIAN MODEL AVERAGING
FOR PATIENTS SURVIVAL ANALYSIS
CHOON YEE WEN
This thesis is submitted in partial fulfillment of the requirements for the award of the
Bachelor of Computer Science Degree in Bioinformatics
Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia
April 2010
ABSTRACT
Microarray technology is now widely used to identify potential biomarkers for
cancer prognostics and diagnostics. Given the vast amount of gene information
produced by microarrays, informative gene selection is needed both to decrease
clinical costs and to mitigate the possibility of overfitting due to high
inter-variable correlations. A problem with feature selection algorithms used to
produce continuous predictors of patient survival is that they fail to account
for model uncertainty, where a model is a set of selected genes whose
regression coefficients have been calculated for use in predicting survival
prognosis. With thousands of genes and only tens to hundreds of samples, it
often happens that a number of different models describe the data about
equally well. In this research, the Bayesian Model Averaging (BMA) method is
applied to select a subset of genes for survival analysis on microarray data.
BMA combines the effectiveness of multiple models by taking the weighted
average of their posterior distributions instead of choosing a single model
and proceeding as if the data were generated from it. In the experiments, BMA
successfully selected relevant genes and produced significant results in
shorter time. These results indicate that BMA is an appropriate method for
gene selection.
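The averaging step described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis. As assumptions made purely for illustration, it uses ordinary least-squares models as a stand-in for the survival models, enumerates every gene subset up to a small size, gives each model equal prior probability, and approximates the posterior model probabilities with the standard BIC approximation (weight proportional to exp(-BIC/2)). The function names bic_linear and bma_gene_weights are hypothetical.

```python
import itertools

import numpy as np


def bic_linear(X, y):
    """BIC of a least-squares fit with intercept (Gaussian errors assumed)."""
    n = len(y)
    design = np.hstack([np.ones((n, 1)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    n_params = design.shape[1]
    return n * np.log(rss / n) + n_params * np.log(n)


def bma_gene_weights(X, y, max_size=2):
    """Return candidate models, their approximate posterior weights,
    and the posterior inclusion probability of each gene (column of X)."""
    n_genes = X.shape[1]
    models, bics = [], []
    # Enumerate every gene subset up to max_size, including the empty model.
    for size in range(max_size + 1):
        for subset in itertools.combinations(range(n_genes), size):
            cols = np.array(subset, dtype=int)
            models.append(subset)
            bics.append(bic_linear(X[:, cols], y))
    bics = np.array(bics)
    # Equal model priors: posterior weight is proportional to exp(-BIC / 2).
    # Subtracting the minimum BIC keeps the exponentials numerically stable.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    # A gene's inclusion probability sums the weights of models containing it.
    inclusion = np.zeros(n_genes)
    for subset, w in zip(models, weights):
        for gene in subset:
            inclusion[gene] += w
    return models, weights, inclusion


if __name__ == "__main__":
    # Synthetic data (assumption): only gene 0 drives the response.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 5))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=60)
    _, _, inclusion = bma_gene_weights(X, y)
    print(np.round(inclusion, 3))
```

Genes with a high posterior inclusion probability would be the candidates for the selected subset. Exhaustive enumeration as written is only feasible for a handful of genes, so microarray-scale data would need some way of restricting the set of models considered.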