Classification and diagnostic prediction of
cancers using gene expression profiling
and artificial neural networks
From Nature Medicine 7(6) 2001
By Javed Khan et al.
(Summarized by Kyu-Baek Hwang)
Abstract

Small, round blue-cell tumors (SRBCTs)
- Four distinct categories that are hard to discriminate

cDNA microarrays and artificial neural networks (ANNs)
- Tumor diagnosis and the identification of candidate targets for therapy
The Problem

SRBCTs of childhood
- Neuroblastoma (NB)
- Rhabdomyosarcoma (RMS)
- Non-Hodgkin lymphoma (NHL)
- The Ewing family of tumors (EWS)

All four categories look similar in routine histology.
Accurate diagnosis is essential.
In clinical practice, diagnosis relies on
- Immunohistochemistry: detection of protein expression
- Reverse transcription-PCR (RT-PCR): detection of tumor-specific translocations
  • EWS-FLI1 in EWS and PAX3-FKHR in alveolar rhabdomyosarcoma (ARMS)
The Approach

Gene-expression profiling using cDNA microarrays
- Simultaneous analysis of multiple markers
- Multiple categorical distinctions
Artificial neural networks (ANNs), previously applied to
- Diagnosing myocardial infarcts
- Diagnosing arrhythmias from electrocardiograms
- Interpreting radiographs
- Interpreting magnetic resonance images
The Experiment


cDNA microarrays with 6,567 genes
63 training samples
- Tumor biopsy material
- Cell lines

Filtering for a minimal level of expression
- 2,308 genes remained
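The filtering step can be sketched as a threshold that every sample must pass. This is a minimal sketch under stated assumptions: the rule (a gene is kept only if it reaches `min_level` in all samples), the function name `filter_expressed` and the cutoff used in the demo are hypothetical, since the summary does not say which criterion reduced 6,567 genes to 2,308.

```python
from typing import Dict, List


def filter_expressed(
    expression: Dict[str, List[float]], min_level: float
) -> Dict[str, List[float]]:
    """Keep genes whose expression is at least min_level in every sample.

    Hypothetical rule: the actual cutoff and criterion are not given
    in the summary.
    """
    return {
        gene: values
        for gene, values in expression.items()
        if all(v >= min_level for v in values)
    }


if __name__ == "__main__":
    # Toy data: three hypothetical genes measured in four samples.
    toy = {
        "gene_a": [120.0, 95.0, 80.0, 150.0],
        "gene_b": [5.0, 300.0, 12.0, 40.0],  # below the cutoff in two samples
        "gene_c": [60.0, 55.0, 70.0, 65.0],
    }
    print(sorted(filter_expressed(toy, min_level=20.0)))  # ['gene_a', 'gene_c']
```

Requiring the level in every sample is a conservative choice; a looser variant would require it in only a fraction of samples.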

Principal component analysis (PCA) further reduced the dimensionality.
- The 10 dominant principal components were used (63% of the variance in the data matrix).
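The "63% of the variance" figure is the sum of the top 10 eigenvalues of the gene covariance matrix divided by its trace. The stdlib-only sketch below shows that computation; it uses power iteration with deflation to avoid external dependencies, which is an illustrative choice and not necessarily how the authors computed their PCA (an SVD routine would be the usual tool). The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(x: Matrix) -> Matrix:
    """Subtract each gene's (column's) mean across samples."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[v - m for v, m in zip(row, means)] for row in x]


def covariance(xc: Matrix) -> Matrix:
    """Sample covariance (genes x genes) of already-centered data."""
    n, d = len(xc), len(xc[0])
    return [[sum(xc[k][i] * xc[k][j] for k in range(n)) / (n - 1)
             for j in range(d)] for i in range(d)]


def top_eigenpairs(cov: Matrix, k: int, iters: int = 500,
                   seed: int = 0) -> Tuple[List[float], Matrix]:
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]  # working copy, deflated after each component
    values: List[float] = []
    vectors: Matrix = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:  # nothing left to extract
                break
            v = [t / norm for t in w]
        # Rayleigh quotient v^T A v gives the eigenvalue of the converged vector.
        av = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(vi * avi for vi, avi in zip(v, av))
        values.append(lam)
        vectors.append(v)
        # Deflate so the next pass finds the next-largest component.
        for i in range(d):
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return values, vectors


def pca_project(x: Matrix, k: int) -> Tuple[Matrix, float]:
    """Project samples onto the top-k components; also return the
    fraction of total variance those components explain."""
    xc = center_columns(x)
    cov = covariance(xc)
    values, vectors = top_eigenpairs(cov, k)
    total = sum(cov[i][i] for i in range(len(cov)))
    scores = [[sum(r * c for r, c in zip(row, vec)) for vec in vectors]
              for row in xc]
    return scores, sum(values) / total
```

Called as `pca_project(samples_by_genes, k=10)`, the returned fraction is the quantity reported as 63% for the study's data; the 10 scores per sample are what the ANNs take as input.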



Three-fold cross-validation
- 3,750 ANNs were trained (1,250 random partitions × 3 folds), and their outputs were averaged into a committee vote.
- No overfitting was observed, and the training samples were classified with zero error.
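Repeated three-fold cross-validation with an averaged committee can be sketched as follows. Assumptions, all hypothetical: each ANN is a single-layer linear network trained by stochastic gradient descent on squared error with one-hot targets, and the learning rate, epoch count and helper names are illustrative. With `n_partitions=1250` the loop would produce the 3,750 models mentioned above.

```python
import random
from typing import List, Sequence, Set, Tuple

Vector = List[float]


class LinearNet:
    """Single-layer linear network: one output per class, outputs = W x + b."""

    def __init__(self, n_in: int, n_out: int) -> None:
        self.w = [[0.0] * n_in for _ in range(n_out)]
        self.b = [0.0] * n_out

    def forward(self, x: Sequence[float]) -> Vector:
        return [sum(wj * xj for wj, xj in zip(row, x)) + bk
                for row, bk in zip(self.w, self.b)]

    def fit(self, xs: List[Vector], ys: List[Vector],
            epochs: int = 200, lr: float = 0.05) -> None:
        # Stochastic gradient descent on the summed squared error.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                out = self.forward(x)
                for k, (o, t) in enumerate(zip(out, y)):
                    err = o - t
                    self.b[k] -= lr * err
                    for j, xj in enumerate(x):
                        self.w[k][j] -= lr * err * xj


def one_hot(label: int, n_classes: int) -> Vector:
    return [1.0 if c == label else 0.0 for c in range(n_classes)]


def train_committee(xs: List[Vector], labels: List[int], n_classes: int,
                    n_partitions: int, seed: int = 0
                    ) -> List[Tuple[LinearNet, Set[int]]]:
    """Repeat a random three-fold split; train one model per fold.
    Returns (model, indices held out from that model) pairs."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    committee = []
    for _ in range(n_partitions):
        rng.shuffle(idx)
        for held_out in (set(idx[f::3]) for f in range(3)):
            train = [i for i in idx if i not in held_out]
            net = LinearNet(len(xs[0]), n_classes)
            net.fit([xs[i] for i in train],
                    [one_hot(labels[i], n_classes) for i in train])
            committee.append((net, held_out))
    return committee


def average_output(nets: List[LinearNet], x: Sequence[float]) -> Vector:
    outs = [net.forward(x) for net in nets]
    return [sum(col) / len(outs) for col in zip(*outs)]


def argmax(v: Sequence[float]) -> int:
    return max(range(len(v)), key=lambda c: v[c])


def committee_predict(committee, x: Sequence[float]) -> int:
    """Average every model's outputs (the committee vote), pick the top class."""
    return argmax(average_output([net for net, _ in committee], x))


def validation_predict(committee, xs: List[Vector]) -> List[int]:
    """Predict each sample using only models that never trained on it."""
    return [argmax(average_output([n for n, held in committee if i in held], x))
            for i, x in enumerate(xs)]
```

Keeping the held-out indices alongside each model lets one measure the cross-validated error, which is how overfitting would show up, separately from the committee's prediction on new samples.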
The Schematic View of the Analysis Process
Summed Square Error Graph
Optimization of the Genes Used for Classification

Using the 3,750 trained models, rank all genes according to their significance for the classification.
Determine the classification error rate using an increasing number of these top-ranked genes.
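The ranking-and-growing procedure can be sketched as below. The significance measure used here is an assumption: the mean absolute partial derivative of each output with respect to each gene, averaged over all models and outputs. For a linear network fed PCA scores, that derivative is the product of the model's weights and the PCA loadings, so it does not depend on the sample. `error_rate` is a hypothetical caller-supplied function (for example, one that retrains the committee on the given genes); all names are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def gene_sensitivities(model_weights: Sequence[Matrix],
                       loadings: Matrix) -> List[float]:
    """Mean |d output_k / d gene_j| over all models and outputs.

    model_weights: one (outputs x components) weight matrix per model.
    loadings: (components x genes) PCA loading matrix.
    For o = W (P (g - mean)) + b, d o_k / d g_j = sum_c W[k][c] * P[c][j].
    """
    n_genes = len(loadings[0])
    totals = [0.0] * n_genes
    n_terms = 0
    for w in model_weights:
        for row in w:  # one row per output class
            for j in range(n_genes):
                grad = sum(row[c] * loadings[c][j] for c in range(len(row)))
                totals[j] += abs(grad)
            n_terms += 1
    return [t / n_terms for t in totals]


def rank_genes(sensitivities: Sequence[float]) -> List[int]:
    """Gene indices ordered from most to least influential."""
    return sorted(range(len(sensitivities)), key=lambda j: -sensitivities[j])


def smallest_zero_error_set(
    ranked: Sequence[int],
    error_rate: Callable[[Sequence[int]], float],
) -> Optional[Tuple[int, List[int]]]:
    """Add ranked genes one at a time; return the first gene set whose
    error rate is zero, or None if no prefix reaches zero error."""
    for n in range(1, len(ranked) + 1):
        genes = list(ranked[:n])
        if error_rate(genes) == 0.0:
            return n, genes
    return None
```

In the study this search stopped at 96 genes; the toy callback in a unit test can stand in for the expensive retraining step.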
Recalibrating the ANNs

Using only the 96 top-ranked genes, the analysis process was repeated.
- Zero classification error
Multi-Dimensional Scaling (MDS)

Using 96 genes
Hierarchical Clustering of 96 Genes
- 93 unique genes (IGF2 is represented by three clones and MYC by two)
- 13 ESTs
- 41 genes had not previously been reported as associated with these diseases.
- The samples clustered perfectly into the four categories.
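Clustering samples by their profiles over the selected genes can be sketched with agglomerative clustering. The summary does not name the linkage or distance, so this sketch assumes average linkage with a 1 - r distance (r = Pearson correlation between two samples' profiles); the function names are hypothetical, and the loop simply stops once `k` clusters remain.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for a constant profile."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def average_linkage(profiles: List[Sequence[float]], k: int) -> List[List[int]]:
    """Merge the two closest clusters (mean pairwise 1 - r) until k remain."""
    n = len(profiles)
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [dist[min(i, j), max(i, j)]
                     for i in clusters[a] for j in clusters[b]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)
```

A correlation distance groups samples by the shape of their expression profiles rather than overall intensity, which is a common choice for microarray data.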
Diagnostic Classification

25 test samples (including 5 non-SRBCTs)
If a sample falls outside the 95th percentile of the probability distribution of distances between samples and their ideal output (for an EWS sample, EWS = 1 and RMS = NB = NHL = 0), its diagnosis is rejected.
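The rejection rule can be sketched as follows. Assumptions, labeled as such: distances are Euclidean, the 95th percentile is taken per category with the nearest-rank method from the training samples' committee outputs, and the helper names are hypothetical. A sample whose winning category is too far from that category's ideal output gets no diagnosis, which is how a non-SRBCT sample would be flagged.

```python
import math
from typing import Dict, List, Optional, Sequence


def ideal_output(label: int, n_classes: int) -> List[float]:
    """Ideal committee output: 1 for the category, 0 for all others."""
    return [1.0 if c == label else 0.0 for c in range(n_classes)]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, q in 0..100."""
    s = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(s)))
    return s[rank - 1]


def class_thresholds(outputs: Sequence[Sequence[float]], labels: Sequence[int],
                     n_classes: int, q: float = 95.0) -> Dict[int, float]:
    """q-th percentile of distance-to-ideal, computed per category."""
    by_class: Dict[int, List[float]] = {}
    for out, lab in zip(outputs, labels):
        by_class.setdefault(lab, []).append(
            distance(out, ideal_output(lab, n_classes)))
    return {lab: percentile(d, q) for lab, d in by_class.items()}


def diagnose(output: Sequence[float],
             thresholds: Dict[int, float]) -> Optional[int]:
    """Winning category, or None if the output lies beyond that
    category's threshold (diagnosis rejected)."""
    label = max(range(len(output)), key=lambda c: output[c])
    if distance(output, ideal_output(label, len(output))) > thresholds[label]:
        return None
    return label
```

Per-category thresholds let a category whose training outputs are naturally noisier tolerate larger distances; a single pooled threshold is the simpler alternative.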
Expression of FGFR4 on SRBCT Tissue Array

At the protein level, expression of fibroblast growth factor receptor 4 (FGFR4) was examined by immunohistochemistry on SRBCT tissue arrays.
FGFR4
- Expressed during myogenesis (not in adult muscle)
- Potential role in tumor growth
- Potential role in preventing terminal differentiation in muscle

Strong cytoplasmic immunostaining for FGFR4 was seen in all 26 RMSs tested.
Discussion

Current tumor diagnosis relies on histology (morphology) and immunohistochemistry (protein expression).
Using cDNA microarrays
- Multiple markers make the diagnosis more robust.
- Expression profiles may reveal the underlying genetic aberrations or biological processes.

Tumors and cell lines
- Cell lines for ANN calibration