A gene expression analysis
system for medical diagnosis
D. Maroulis, D. Iakovidis, S. Karkanis, I. Flaounas
University of Athens
Dept. of Informatics and Telecommunications
Objectives
A system to support medical diagnosis using molecular-level information
Efficient classification of pathological conditions into multiple classes
A user-friendly interface for physicians and biologists
DNA Microarrays
Microscope glass slides
Thousands of spots
Spot: cDNA part
Gene expression level (feature)
Gene expression vector (feature vector)
Gene expression matrix (data set)
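The relation between expression levels, vectors and the matrix can be sketched as follows. This is only an illustration: the gene names, sample names and values are hypothetical, and the samples-by-genes orientation is an assumption.

```python
# Hypothetical toy data: one expression level (feature) per gene, per sample.
gene_names = ["gene_a", "gene_b", "gene_c"]  # illustrative names only
expression_vectors = {
    "sample_1": [0.8, 2.1, 0.3],
    "sample_2": [1.1, 1.9, 0.2],
}


def build_expression_matrix(vectors):
    """Stack per-sample feature vectors into a samples-by-genes matrix."""
    sample_ids = sorted(vectors)
    matrix = [list(vectors[s]) for s in sample_ids]
    n_genes = len(matrix[0])
    if any(len(row) != n_genes for row in matrix):
        raise ValueError("all expression vectors must cover the same genes")
    return sample_ids, matrix


sample_ids, matrix = build_expression_matrix(expression_vectors)
```

Each row of `matrix` is one gene expression vector; each column holds the levels of one gene across samples.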
Gene expression analysis tools
Image processing & analysis for microarray spot detection
Visualization & clustering for discovery of unknown classes of pathological conditions
Gene ranking for identification of differentially expressed marker genes
Supervised classification of gene expression vectors into known classes
Gene expression analysis tools
GeneClust (Do et al, 2000)
dChip (Li & Wong, 2001)
Clusfavor (Peterson, 2002)
Genesis (Sturn et al, 2002)
Snomad (Colantuoni et al, 2002)
Base (Saal et al, 2002)
TM4 Suite (Saeed et al, 2003)
RankGene (Yang et al, 2003)
Excavator (Xu et al, 2003)
KnowledgeEditor (Toyoda & Konagaya, 2003)
ArrayNorm (Pieler et al, 2004)
Today’s challenge


None of the existing tools takes into account the usability profile of a physician or a biologist
Such tools could hardly be used in everyday medical practice
Supervised approaches

Most known supervised approaches have been applied to classification of gene expression vectors
– Linear discriminant analysis
– k-nearest neighbors
– Parzen windows
– Decision trees
– Neural networks, etc.

Support Vector Machines
(Brown et al, 2000; Furey et al, 2000; Ryu & Cho, 2000;
Dudoit et al, 2002; Lu & Han, 2003; Aliferis et al, 2003)
Support Vector Machines



Robust binary classifiers
Not easily affected by the dimensionality of the feature vectors
SVM methods for classification into multiple classes
– One vs one
– One vs all
– Directed Acyclic Graph (DAG)
– Weston & Watkins
– Crammer & Singer
(Weston & Watkins, 1999; Platt, 2000; Yeang et al, 2001; Crammer & Singer, 2001; Hsu & Lin, 2002)
About multiclass SVM classifiers



They all lead to comparable results
They utilize a common, constant set of genes as input in each SVM node
They assume that the various pathological conditions correspond to separable clusters in the same gene space
(Hsu et al, 2002; Lee et al, 2003; Statnikov et al, 2004)
The proposed approach

We consider the fact that
– Only a small subset of genes is differentially expressed for each type or subtype of a pathological condition

We propose
– The combination of SVMs in a cascading architecture that embodies gene selection in its structure
Cascading architecture
Pre-processing Unit
Diagnostic Unit
Classifies input vector x into ω1, ω2, …, ωN
Poor-quality cDNA targets generate missing values
(Troyanskaya et al, 2001)
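The slides do not say how the pre-processing unit fills in missing values. As a minimal sketch, assuming a plain per-gene mean (a simple baseline, not necessarily the system's method), with `None` marking a missing spot:

```python
def impute_gene_means(matrix):
    """Replace None entries with the mean of the observed values of that gene.

    matrix is a list of samples; each sample is a list of expression levels.
    This per-gene mean is an assumed baseline, chosen only for illustration.
    """
    n_genes = len(matrix[0])
    imputed = [list(row) for row in matrix]  # leave the input untouched
    for g in range(n_genes):
        observed = [row[g] for row in matrix if row[g] is not None]
        if not observed:
            raise ValueError(f"gene {g} has no observed values")
        gene_mean = sum(observed) / len(observed)
        for row in imputed:
            if row[g] is None:
                row[g] = gene_mean
    return imputed
```

For example, `impute_gene_means([[1.0, None], [3.0, 4.0]])` fills the missing entry with 4.0, the only observed value of that gene.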
Normalization facilitates comparability of samples
(Zhang & Shmulevich, 2002)
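The normalization scheme is not specified on the slide. One common choice, assumed here purely for illustration, is to standardize each sample to zero mean and unit standard deviation so that samples measured at different overall intensities become comparable:

```python
from statistics import mean, pstdev


def standardize_samples(matrix):
    """Scale each sample (row) to zero mean and unit standard deviation.

    Per-sample z-scoring is an assumption; the system may normalize differently.
    """
    normalized = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)
        if sigma == 0:
            # A constant row carries no contrast; map it to zeros.
            normalized.append([0.0 for _ in row])
        else:
            normalized.append([(value - mu) / sigma for value in row])
    return normalized
```

Two samples with the same profile at different scales, such as `[1.0, 3.0]` and `[10.0, 30.0]`, map to the same normalized vector.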
A subset of genes is selected by ranking for each block
Three ranking criteria are available
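The three criteria are not listed in this transcript. As a hedged sketch of what ranking-based gene selection for one binary block can look like, the function below uses a signal-to-noise style score; that criterion is an assumption and may not be one of the three the system offers.

```python
from statistics import mean, pstdev


def rank_genes(matrix, labels, top_k):
    """Return the indices of the top_k genes for a two-class problem.

    labels holds 0 or 1 per sample. The score is an assumed criterion:
    |mean of class 1 - mean of class 0| / (std of class 1 + std of class 0).
    """
    n_genes = len(matrix[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(matrix, labels) if y == 1]
        neg = [row[g] for row, y in zip(matrix, labels) if y == 0]
        gap = abs(mean(pos) - mean(neg))
        spread = pstdev(pos) + pstdev(neg)
        # A small constant avoids division by zero for constant genes.
        scores.append(gap / (spread + 1e-12))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:top_k]
```

Because each block ranks genes on its own two-class problem, different blocks can end up with different gene subsets.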
Gene ranking criteria
Cascading architecture
The classification module Cj is autonomously trained using a subset Xj of the available training samples
$$X_j = \{\, x \in \omega_j \cup h \,\}, \qquad h = \bigcup_{p=j+1}^{N} \omega_p$$
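Reading the subset definition as "samples of class ωj together with samples of the remaining classes ωj+1 … ωN" (an interpretation of the slide's formula), the training set of module Cj could be assembled as in this sketch. The +1/-1 target encoding and the function name are assumptions.

```python
def training_subset(samples, labels, j, n_classes):
    """Build X_j and binary targets for cascade module C_j.

    Classes are numbered 1..N. Samples of class j get target +1, samples of
    classes j+1..N get target -1, and samples of earlier classes are left out.
    """
    if not 1 <= j < n_classes:
        raise ValueError("j must satisfy 1 <= j <= N - 1")
    subset, targets = [], []
    for x, y in zip(samples, labels):
        if y == j:
            subset.append(x)
            targets.append(+1)
        elif j < y <= n_classes:
            subset.append(x)
            targets.append(-1)
    return subset, targets
```

With three classes, module C2 would see only the samples of classes 2 and 3, since class 1 is handled by C1.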
A standard binary SVM classifier implements each classification module
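The control flow of such a cascade can be sketched as below. The modules here are stand-in callables rather than trained SVMs, and the early-exit order (module j decides "class j" versus "any later class") is an assumption about how the cascade is traversed.

```python
def cascade_predict(x, modules, n_classes):
    """Classify x with a cascade of N-1 binary modules.

    modules[j - 1] is a callable that returns True when x is judged to belong
    to class j rather than to one of the later classes j+1..N.
    """
    if len(modules) != n_classes - 1:
        raise ValueError("a cascade over N classes needs N - 1 modules")
    for j, module in enumerate(modules, start=1):
        if module(x):
            return j
    # No module claimed the sample, so it falls through to the last class.
    return n_classes


# Hypothetical stand-ins: each module looks at its own selected gene.
toy_modules = [
    lambda x: x[0] > 1.0,   # module 1 uses gene 0
    lambda x: x[2] < 0.0,   # module 2 uses gene 2
]
```

Letting each module read a different gene mirrors the idea that gene selection is embodied in the structure of the cascade.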
Model selection


The best architecture is determined by leave-one-out cross-validation
Selection bias is minimized
– Gene selection and parameter tuning take place on the training samples during each iteration of the leave-one-out procedure
(Ambroise & McLachlan, 2002; Varma & Simon, 2006)
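A minimal sketch of this protocol follows: gene selection and training are repeated inside every leave-one-out iteration, so the held-out sample never influences which genes are chosen. The toy selector and the nearest-mean classifier are hypothetical stand-ins for the system's ranking criteria and SVM.

```python
def leave_one_out_error(matrix, labels, select_genes, train):
    """Estimate the error rate with selection done inside each iteration."""
    errors = 0
    for i in range(len(matrix)):
        train_x = matrix[:i] + matrix[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        # Selection sees only the training fold, never the held-out sample.
        genes = select_genes(train_x, train_y)
        reduced = [[row[g] for g in genes] for row in train_x]
        predict = train(reduced, train_y)
        if predict([matrix[i][g] for g in genes]) != labels[i]:
            errors += 1
    return errors / len(matrix)


def select_largest_gap_gene(train_x, train_y):
    """Toy selector: the single gene with the largest gap between class means."""
    def gap(g):
        pos = [r[g] for r, y in zip(train_x, train_y) if y == 1]
        neg = [r[g] for r, y in zip(train_x, train_y) if y == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return [max(range(len(train_x[0])), key=gap)]


def train_nearest_mean(train_x, train_y):
    """Toy classifier standing in for the SVM: predict the nearest class mean."""
    means = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            means,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])),
        )
    return predict
```

Selecting genes once on the full data set and only then cross-validating would let the held-out sample leak into the selection, which is the bias the cited papers warn about.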
Graphical User Interface
Results

Prostate cancer data

112 samples (patients)

Classes
– 62 primary prostate tumors
– 41 normal prostate specimens
– 9 pelvic lymph node metastases

44,016 gene expression values per sample
(Lapointe et al, 2004)
Results
Minimum error 6.3% using 1 input gene
Results

Colon cancer dataset
(Alon et al, 1999)
– Minimum classification error 9.7%

Lung cancer dataset
(Bhattacharjee et al, 2001)
– Minimum classification error 1.5%
Conclusions




We presented a user-friendly system that implements a cascading SVM architecture
It aims at classifying gene expression data into known classes
The cascading architecture automatically tunes its parameters and determines its optimal configuration
In most cases it leads to a diagnostic accuracy that exceeds 90%
Conclusions




Its performance is usually better than that of the one-vs-one SVM combination method
It utilizes N-1 binary SVM classifiers, whereas one-vs-one utilizes N(N-1)/2
It could be used in everyday clinical practice
Our future plans include the adoption of incremental learning approaches
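The classifier counts quoted above can be checked with a few lines of arithmetic. The function names are illustrative only.

```python
def cascade_classifier_count(n_classes):
    """Binary classifiers in the cascade: N - 1."""
    return n_classes - 1


def one_vs_one_classifier_count(n_classes):
    """Binary classifiers in one-vs-one: one per pair of classes, N(N-1)/2."""
    return n_classes * (n_classes - 1) // 2


for n in (3, 5, 10):
    print(n, cascade_classifier_count(n), one_vs_one_classifier_count(n))
```

For a three-class problem such as the prostate data the gap is small (2 versus 3 classifiers), but it grows quadratically with the number of classes (9 versus 45 for ten classes).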
Thank you