Decomposing Complex
Clinical Phenotypes by
Biologically Structured
Microarray Analysis
Claudio Lottaz and Rainer Spang
Berlin Center for Genome Based
Bioinformatics, Berlin (Germany)
Computational Diagnostics, Max Planck Institute
for Molecular Genetics, Berlin (Germany)
Overview
• Introduction
• Using functional annotation for
semi-supervised classification
• Heterogeneity vs. performance
• Evaluation on cancer related data
• Conclusions
C. Lottaz & R. Spang : Structured Analysis of Microarrays
23-May-17
2 / 21
Introduction
Tumor Classification
• Setting:
• Data: gene expression profiles (genes × patients)
• Goal: prediction/classification
of outcome/sub-type, e.g. disease (D) vs. control (C)
• More formally:
• Many expression levels measured
• Samples labelled as disease and control
• Train classifier
State-of-the-Art
• Various powerful methods:
• Support vector machines
• Shrunken centroids...
• Regularization to fight overfitting:
• Feature selection
• Large margins...
• Common hypothesis:
Generate a single molecular signature
Complex Phenotypes
• A single clinical phenotype may be caused
by different molecular mechanisms
• Our approach: discover several sub-classes
in disease group
• Each sub-class has a homogeneous
molecular signature
Molecular Symptoms
• Classical signatures are globally optimal
• They have no biological focus
• Genes are co-regulated and thus correlated
→ in a global signature, genes can be
replaced with little loss
• Molecular Symptom:
• A functionally focused signature
to identify a disease sub-class
• High specificity – sub-optimal sensitivity
Molecular Patient Stratification
• Patterns of molecular symptoms define
a molecular patient stratification
Figure: a diagnostic signature and two molecular symptoms, each contrasted with control; each molecular symptom marks one subclass
Using Functional Annotations
Using Functional Annotations:
A Priori vs. A Posteriori
• Common procedure (a posteriori): Data → Statistical Analysis → Functional Annotations
• Our suggestion (a priori): Data + Functional Annotations → Statistical Analysis
Gene Ontology
• Biological terms in
a directed graph
• Genes annotated
to terms
• Levels
represent
specificity
of terms
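The structure on this slide can be illustrated with a minimal sketch. The term names, gene names, and helper functions below are hypothetical and chosen only for illustration; a real analysis would load the Gene Ontology and the chip annotations from their published files.

```python
from collections import defaultdict, deque

# Hypothetical toy ontology: parent term -> child terms (a directed acyclic graph).
children = {
    "biological_process": ["metabolism", "signaling"],
    "metabolism": ["lipid_metabolism"],
    "signaling": ["lipid_metabolism"],  # a term may have more than one parent
    "lipid_metabolism": [],
}

# Hypothetical gene-to-term annotations.
annotations = {
    "geneA": ["lipid_metabolism"],
    "geneB": ["signaling"],
    "geneC": ["lipid_metabolism", "signaling"],
}


def genes_per_term(annotations):
    """Invert the gene -> terms mapping into term -> set of genes."""
    result = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            result[term].add(gene)
    return dict(result)


def term_levels(children, root):
    """Breadth-first search: level = shortest distance from the root.

    Deeper levels correspond to more specific terms.
    """
    levels = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in levels:
                levels[child] = levels[term] + 1
                queue.append(child)
    return levels


def leaf_terms(children):
    """Terms without children, i.e. the most specific ones."""
    return [term for term, kids in children.items() if not kids]
```

In this toy graph the only leaf is `lipid_metabolism`, which sits two levels below the root.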
Structured Analysis
of Microarrays
• Classification in leaf nodes
• Regularized multivariate classifier
• Local signatures
• Diagnosis propagation
• Combine child diagnoses in inner nodes
• Generate more general diagnoses
• Regularization
• Shrink the classifier graph
• Remove uninformative branches
Leaf Node Classification
• Shrunken centroid classification
(Tibshirani et al. 2002)
• Classification according to distance to
centroids
• Regularization via gene shrinkage
• Determine probability-like values
as classification results
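The leaf-node classifier described above can be sketched in NumPy as follows. This is a simplified reading of the shrunken centroid method of Tibshirani et al. (2002), not the authors' implementation; the function names and the choice of the median as offset `s0` are assumptions made for this sketch.

```python
import numpy as np


def fit_shrunken_centroids(X, y, delta):
    """X: samples x genes, y: class labels, delta: shrinkage threshold."""
    classes = np.unique(y)
    n = X.shape[0]
    overall = X.mean(axis=0)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class standard deviation per gene.
    resid = X - centroids[np.searchsorted(classes, y)]
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - len(classes)))
    scale = s + np.median(s)  # s_i + s0, with s0 assumed to be the median
    n_k = np.array([(y == k).sum() for k in classes])
    m_k = np.sqrt(1.0 / n_k - 1.0 / n)
    # Standardized centroid differences, soft-thresholded by delta.
    d = (centroids - overall) / (m_k[:, None] * scale)
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)
    shrunk = overall + m_k[:, None] * scale * d_shrunk
    return classes, shrunk, scale, n_k / n


def predict_proba(model, X):
    """Probability-like class scores from distances to shrunken centroids."""
    classes, shrunk, scale, priors = model
    dist = (((X[:, None, :] - shrunk[None, :, :]) / scale) ** 2).sum(axis=2)
    score = -0.5 * dist + np.log(priors)
    score -= score.max(axis=1, keepdims=True)  # numerical stability
    prob = np.exp(score)
    return prob / prob.sum(axis=1, keepdims=True)
```

Genes whose standardized difference falls below `delta` in every class no longer contribute to the distances, which is the gene shrinkage that regularizes the classifier.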
Propagation of Classification
• Weighted averages
• Weight according to child performance
• Weights are normalized per inner node
Figure: parent node Pa with children C1, C2, C3, connected by weights w1, w2, w3
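The weighted average on this slide can be sketched as below. The function name `propagate` and the use of a single performance score per child are assumptions for illustration; the slide does not state how child performance is measured.

```python
import numpy as np


def propagate(child_probs, child_performance):
    """Combine child diagnoses into the parent's diagnosis.

    child_probs: array (children x samples) of probability-like values.
    child_performance: one non-negative score per child (assumed given).
    """
    weights = np.asarray(child_performance, dtype=float)
    if weights.sum() <= 0:
        raise ValueError("at least one child needs a positive weight")
    weights = weights / weights.sum()  # normalize per inner node
    return weights @ np.asarray(child_probs, dtype=float)


# Three children (C1, C2, C3) scored on two samples.
parent = propagate(
    [[0.9, 0.2], [0.6, 0.4], [0.5, 0.5]],
    [0.5, 0.3, 0.2],
)
```

With these illustrative numbers the parent values are approximately 0.73 and 0.32.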
Graph Shrinkage
• Weights of nodes are shrunken
by a constant
• Negative weights are set to zero
→ uninformative branches vanish
• Best shrinkage level chosen by cross-validation
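A minimal sketch of this shrinkage step follows. Renormalizing the surviving weights and the `cv_error` callback are assumptions made for this sketch; the slide only states that weights are reduced by a constant, clipped at zero, and that the level is chosen by cross-validation.

```python
import numpy as np


def shrink_weights(weights, delta):
    """Subtract a constant from each weight and clip negatives to zero."""
    shrunk = np.maximum(np.asarray(weights, dtype=float) - delta, 0.0)
    total = shrunk.sum()
    # Assumption: surviving weights are renormalized to sum to one.
    return shrunk / total if total > 0 else shrunk


def choose_delta(candidates, cv_error):
    """Pick the shrinkage level with the lowest cross-validated error.

    cv_error is a hypothetical callable mapping a shrinkage level to an
    error estimate obtained by cross-validation.
    """
    return min(candidates, key=cv_error)


# The third child drops out once its weight falls below the constant.
example = shrink_weights([0.5, 0.3, 0.2], 0.25)
```

A child whose weight reaches zero contributes nothing to its parent, so the branch below it can be removed from the classifier graph.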
Heterogeneity vs. Performance
Biased Classifier Evaluation
• Calibration of sensitivity and specificity
• Shrinkage parameter
• Worst performance in leaf node
Cj = DCi ( j Dj )-1
Classifier Heterogeneity
• Difference between two classifiers:
measures inconsistency of classifications
• Node's redundancy:
• Graph's redundancy
(K nodes of the shrunken graph)
Calibration
• Sensitivity vs. specificity parameter:
• Best classifiers: set it to the control prevalence
• More molecular symptoms: set it higher than the
control prevalence
• Heterogeneity vs. performance parameter:
• Molecular symptoms are heterogeneous
• Thus a high value eliminates them
Evaluation on Cancer Related Data
Leukemia Data Set
• Data set by Yeoh et al. 2002
• Acute lymphocytic leukemia
• 327 patients of 7 clinical sub-types
• Expression profiles by HG-U95Av2
• Task for illustration
• Detect MLL sub-type
• 20 MLL samples
• 109 test set / 218 training set
Functional Annotations
• Focus on GO's Biological Process branch
(8,173 terms)
• 12,625 probesets on the chip
• 8,679 genes (68.7% of probesets)
• In 1,359 leaf nodes
• 845 inner nodes (total 2,204 nodes)
MLL Classifier
• 2,796 genes accessible through 32 nodes
MLL Stratification
Conclusions
• Semi-supervised classification
• Detect sub-classes
• In labelled disease groups
• Functional annotation
• Use in an a priori fashion
• To find biologically focused signatures
→ molecular symptoms
• Resolve complex clinical phenotypes
(stratification through molecular symptoms)