1-D DISCRETE WAVELET TRANSFORM FOR CLASSIFICATION OF CANCER SAMPLES IN DNA MICROARRAY DATA

Adarsh Jose (1), Dale Mugler, PhD (1), Zhong-Hui Duan, PhD (2,3)
1. Department of Biomedical Engineering, The University of Akron
2. Department of Computer Science, The University of Akron
3. Integrated Biosciences Program, The University of Akron

Abstract
The main obstacle to applying supervised learning methods to the classification of cancer samples from gene expression profiles is the limited number of available samples. Selecting the relevant features is therefore imperative for optimizing the classification algorithms. A feature (gene) selection method using the 1-D discrete wavelet transform is proposed for addressing two-class problems in DNA microarray data.

Gene Expression
The process by which encoded information from DNA is converted into actual structures in cells. The subset of expressed genes and their expression levels characterize the state of the cell.

DNA Microarrays
• Allow measurement of the expression levels of thousands of genes simultaneously.
• The entire genome can be probed at a single point in time.
• The technology is based on base-pair attraction between complementary pairs in the DNA and RNA strands.
• Microarray technology quantifies the notion of gene expression.

Classification Problem of Microarray Data
1. Training sets with class labels
2. Feature selection
3. Training the classifier
4. Validation using a testing set
Problem: The number of features (genes) is very large compared to the number of samples.
Solution: Reduce the feature size by selecting or extracting the most relevant features.

What Is the Wavelet Transform?
1-D Discrete Wavelet Transform
• Breaks down the signal into different frequency bands.
• Implemented by sending the signal through a series of high-pass and low-pass half-band filters.
[Figure: Wavelet decomposition examples]

Procedure
• The samples are grouped into the 2 classes.
• The 1-D discrete wavelet transform of each gene's expression profile was taken to level 3.
• The gene expression profile was reconstructed using the level-3 approximation only.
• Score = abs(mean(class1) - mean(class2))
• Genes were ranked by their scores.

Datasets
• Leukemia dataset: 48 ALL and 25 AML samples
• B-cell lymphoma dataset: 58 DLBCL and 10 FCC samples

Results & Observations
• The algorithm was tested for classification accuracy on the oligonucleotide datasets using a KNN classifier and 3 different validation methods, for different numbers of selected variables.
• The 'Haar' and 'Bior1.5' wavelets gave accuracy of up to 97%.
• The average classification error is less than 11% in both of the oligonucleotide datasets studied.
• Shuffling the samples within each class does not have any effect on the accuracy.

Conclusions
• 1-D discrete wavelets can capture patterns in gene expression data, which makes them a potential tool for feature selection.
• A complete error estimation study has to be carried out with microarray data obtained from different platforms.

References
1. T. R. Golub et al. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, vol. 286 (1999). www.sciencemag.org
2. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, Gaasenbeek M, Angelo M, Reich M, Pinkus GS, Ray TS, Koval MA, Last KW, Norton A, Lister TA, Mesirov J, Neuberg DS, Lander ES, Aster JC, Golub TR. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 2002 Jan;8(1):68-74.
3. Matlab manual: Matlab Wavelet Toolbox, Matlab Bioinformatics Toolbox. MathWorks.
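Illustrative sketch of the gene-scoring procedure

The scoring procedure described in this poster (level-3 decomposition, reconstruction from the approximation only, absolute difference of class means, ranking) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation, which used the Matlab Wavelet Toolbox. It assumes a hand-written Haar wavelet, a profile in which all class-1 samples precede all class-2 samples, and edge padding to a multiple of 2^level. All function names and the toy data are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt_step(signal):
    """One analysis level: low-pass and high-pass half-band filters, then downsample by 2."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt_step(approx, detail):
    """One synthesis level: inverse of haar_dwt_step."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def approx_reconstruction(profile, level=3):
    """Rebuild a profile from its level-`level` approximation coefficients only."""
    n = len(profile)
    block = 2 ** level
    padded_len = -(-n // block) * block  # round up to a multiple of 2**level
    # Assumption: pad by repeating the last value (the poster does not specify this).
    padded = list(profile) + [profile[-1]] * (padded_len - n)
    approx = padded
    for _ in range(level):
        approx, _detail = haar_dwt_step(approx)  # detail coefficients are discarded
    rec = approx
    for _ in range(level):
        rec = haar_idwt_step(rec, [0.0] * len(rec))  # details replaced by zeros
    return rec[:n]

def gene_score(profile, n_class1, level=3):
    """Score = abs(mean(class1) - mean(class2)) on the reconstructed profile."""
    rec = approx_reconstruction(profile, level)
    class1, class2 = rec[:n_class1], rec[n_class1:]
    return abs(sum(class1) / len(class1) - sum(class2) / len(class2))

def rank_genes(expression, n_class1, level=3):
    """Return gene names sorted from highest to lowest score."""
    scores = {gene: gene_score(p, n_class1, level) for gene, p in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): 8 class-1 samples followed by 8 class-2 samples per gene.
toy = {
    "gene_a": [1.0] * 8 + [5.0] * 8,   # clear separation between the classes
    "gene_b": [2.0, 4.0] * 8,          # same pattern in both classes
    "gene_c": [1.0, 3.0] * 4 + [2.0, 4.0] * 4,  # small shift between the classes
}
ranking = rank_genes(toy, n_class1=8)
print(ranking)  # ['gene_a', 'gene_c', 'gene_b']
```

With the Haar wavelet, keeping only the level-3 approximation replaces each block of 8 consecutive samples with its mean, so sample-to-sample fluctuations are smoothed out before the class means are compared. Other wavelets such as 'Bior1.5' use longer filters and are not covered by this sketch.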