A gene expression analysis system for medical diagnosis

D. Maroulis, D. Iakovidis, S. Karkanis, I. Flaounas
University of Athens, Dept. of Informatics and Telecommunications

Objectives
– A system to support medical diagnosis using molecular-level information
– Efficient classification of pathological conditions into multiple classes
– A user-friendly interface for physicians and biologists

DNA Microarrays
– Microscope glass slides
– Thousands of spots
– Spot: cDNA part
– Gene expression level (feature)
– Gene expression vector (feature vector)
– Gene expression matrix (data set)

Gene expression analysis tools
– Image processing & analysis for microarray spot detection
– Visualization & clustering for discovery of unknown classes of pathological conditions
– Gene ranking for identification of differentially expressed marker genes
– Supervised classification of gene expression vectors into known classes

Gene expression analysis tools
Tool              Reference
GeneClust         Do et al., 2000
dChip             Li & Wong, 2001
Clusfavor         Peterson, 2002
Genesis           Sturn et al., 2002
Snomad            Colantuoni et al., 2002
Base              Saal et al., 2002
TM4 Suite         Saeed et al., 2003
RankGene          Yang et al., 2003
Excavator         Xu et al., 2003
KnowledgeEditor   Toyoda & Konagaya, 2003
ArrayNorm         Pieler et al., 2004

Today's challenge
– None of the existing tools takes into account the usability profile of a physician or a biologist
– Such tools could hardly be used in everyday medical practice

Supervised approaches
Most known supervised approaches have been applied to the classification of gene expression vectors
– Linear discriminant analysis
– k-nearest neighbors
– Parzen windows
– Decision trees
– Neural networks, etc.
Support Vector Machines (Brown et al., 2000; Furey et al., 2000; Ryu & Cho, 2000; Dudoit et al., 2002; Lu & Han, 2003; Aliferis et al., 2003)

Support Vector Machines
– Robust binary classifiers
– Not easily affected by the dimensionality of the feature vectors
– SVM methods for classification into multiple classes
  – One vs one
  – One vs all
  – Directed Acyclic Graph (DAG)
  – Weston & Watkins
  – Crammer & Singer
(Weston & Watkins, 1999; Platt, 2000; Yeang et al., 2001; Crammer & Singer, 2001; Hsu & Lin, 2002)

About multiclass SVM classifiers
– They all lead to comparable results
– They use a common, constant set of genes as input to each SVM node
– They assume that the various pathological conditions correspond to separable clusters in the same gene space
(Hsu et al., 2002; Lee et al., 2003; Statnikov et al., 2004)

The proposed approach
We consider the fact that
– Only a small subset of genes is differentially expressed for each type or subtype of a pathological condition
We propose
– The combination of SVMs in a cascading architecture that embodies gene selection in its structure

Cascading architecture (Pre-processing Unit, Diagnostic Unit)
– Classifies input vector x into ω1, ω2, …, ωN
– Poor-quality cDNA targets generate missing values (Troyanskaya et al., 2001)
– Normalization facilitates comparability of samples (Zhang & Shmulevich, 2002)
– A subset of genes is selected by ranking for each block
– Three ranking criteria are available

Gene ranking criteria

Cascading architecture
– The classification module Cj is trained autonomously using a subset Xj of the available training samples
  [Equation defining Xj as a set of training vectors x_j^h; its index bounds, which involve h, N, p and j, are not recoverable from the slide text]
– A standard binary SVM classifier implements each classification module

Model selection
– The best architecture is determined by leave-one-out cross-validation
– Selection bias is minimized: gene selection and parameter tuning take place on the training samples only, during each iteration of the leave-one-out procedure (Ambroise & McLachlan, 2002; Varma & Simon, 2006)

Graphical User Interface

Results
Prostate cancer data (Lapointe et al., 2004)
– 112 samples (patients)
– Classes
  – 62 primary prostate tumors
  – 41 normal prostate specimens
  – 9 pelvic lymph node metastases
– 44016 gene expressions per sample

Results
– Minimum error 6.3% using 1 input gene

Results
– Colon cancer dataset (Alon et al., 1999): minimum classification error 9.7%
– Lung cancer dataset (Bhattacharjee et al., 2001): minimum classification error 1.5%

Conclusions
– We presented a user-friendly system that implements a cascading SVM architecture
– It aims to classify gene expression data into known classes
– The cascading architecture automatically tunes its parameters and determines its optimal configuration
– In most cases it leads to a diagnostic accuracy that exceeds 90%

Conclusions
– Its performance is usually better than that of the one-vs-one SVM combination method
– It uses N-1 binary SVM classifiers, whereas one-vs-one uses N(N-1)/2
– It could be used in everyday clinical practice
– Future work includes the adoption of incremental learning approaches

Thank you
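
Illustrative sketch (not from the slides)
The cascading idea summarized above, N-1 binary modules that each pick their own genes, can be illustrated with a small standard-library Python sketch. Several parts are assumptions made only for illustration. The slides do not name the three ranking criteria, so a signal-to-noise score is assumed. The class order is supplied by hand, whereas the slides select the architecture by leave-one-out cross-validation. A plain subgradient-descent linear SVM stands in for a standard SVM solver. The toy expression values are invented; only the class names echo the prostate data set.

```python
from statistics import mean, pstdev

def snr_rank(X, y):
    """Rank genes by |mu+ - mu-| / (sigma+ + sigma-), best first (assumed criterion)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    scores = []
    for g in range(len(X[0])):
        p = [x[g] for x in pos]
        n = [x[g] for x in neg]
        scores.append(abs(mean(p) - mean(n)) / (pstdev(p) + pstdev(n) + 1e-12))
    return sorted(range(len(scores)), key=lambda g: -scores[g])

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM fitted by full-batch subgradient descent on the hinge loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for x, t in zip(X, y):
            if t * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:  # margin violated
                gw = [gi - t * xi / n for gi, xi in zip(gw, x)]
                gb -= t / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b

def train_cascade(X, labels, order, n_genes=1):
    """Module j separates class order[j] from the classes that come after it."""
    modules = []
    for j, cls in enumerate(order[:-1]):
        remaining = set(order[j:])  # classes not yet peeled off by earlier modules
        Xj = [x for x, c in zip(X, labels) if c in remaining]
        yj = [1 if c == cls else -1 for c in labels if c in remaining]
        genes = snr_rank(Xj, yj)[:n_genes]  # gene selection is local to this module
        w, b = train_linear_svm([[x[g] for g in genes] for x in Xj], yj)
        modules.append((cls, genes, w, b))
    return modules

def predict(modules, last_class, x):
    """Walk the cascade; the first module that fires decides the class."""
    for cls, genes, w, b in modules:
        if sum(wi * x[g] for wi, g in zip(w, genes)) + b > 0:
            return cls
    return last_class

# Invented toy data: 3 genes, 2 samples per class (hypothetical values).
X = [[5.0, 0.1, 0.2], [4.8, 0.0, 0.1],   # tumor
     [0.2, 0.1, 0.0], [0.0, 0.2, 0.1],   # normal
     [5.1, 4.9, 0.1], [4.9, 5.2, 0.0]]   # metastasis
labels = ["tumor", "tumor", "normal", "normal", "metastasis", "metastasis"]
order = ["metastasis", "tumor", "normal"]  # hand-picked order (assumption)

modules = train_cascade(X, labels, order)
n_one_vs_one = len(order) * (len(order) - 1) // 2  # classifiers one-vs-one would need
```

With three classes the cascade trains two modules, while one-vs-one would need three. Each module also keeps its own gene list, which is the point of contrast with multiclass schemes that feed one constant gene set to every SVM node. For a fair error estimate, the whole of train_cascade (ranking included) would have to be rerun inside every leave-one-out fold, as the Model selection slide describes.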