Classification Using Top Scoring Pair Based Methods
Tina Gui
Outline
• Introduction
• Top Scoring Pair
• Experiments Design
• Future Work
• Conclusion
Introduction
• Current methods for classifying DNA microarray data have two main limitations [1]:
1. Small sample sizes
2. Lack of interpretability
• Objective:
Differentiate between two classes by finding pairs of genes whose expression levels typically invert from one class to the other [1].
[1] D. Geman, C. d'Avignon, D. Naiman and R. Winslow (2004). "Classifying gene expression profiles from pairwise mRNA comparisons".
Approaches
• Rank-based approach
  • Drawback: information is lost when expression values are replaced by their ranks
• Comparison-based approach
  • In some cases, accurate prediction can be achieved by comparing the expression levels of a single pair of genes
• A simple example of classifying gene expression profiles this way: the Top Scoring Pair (TSP) classifier
Top Scoring Pair
• Each profile consists of G genes with expression levels X = {X_1, X_2, …, X_G}
• Each profile X has a true class label in {1, 2, …, C}
  • Here, C = 2
• Marker gene pairs (i, j)
  • Pairs for which the probability of X_i < X_j differs significantly between class 1 and class 2
  • Profile classification is then based on the collection of distinguished pairs
Top Scoring Pair
• The quantities of interest
p_ij(c) = P(X_i < X_j | c), c = 1, 2
(p_ij(c) is the probability of observing X_i < X_j in class c)
• The score of a gene pair
Δ_ij = |p_ij(1) − p_ij(2)|
(Δ_ij is the "score" of the pair (i, j))
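The score definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `tsp_scores`, the toy data, and the choice to estimate each probability as a simple class-wise frequency are all assumptions made for this example.

```python
import numpy as np

def tsp_scores(X, y):
    """Return the G x G matrix of scores |p_ij(1) - p_ij(2)|.

    X: (n_profiles, G) array of expression values.
    y: (n_profiles,) array of class labels, each 1 or 2.
    Probabilities are estimated as class-wise frequencies (an assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # less[n, i, j] is True when X_i < X_j in profile n.
    less = X[:, :, None] < X[:, None, :]
    # p_ij(c): fraction of class-c profiles in which X_i < X_j.
    p1 = less[y == 1].mean(axis=0)
    p2 = less[y == 2].mean(axis=0)
    return np.abs(p1 - p2)

# Toy data, made up for illustration: 4 profiles, 3 genes.
X = np.array([
    [1.0, 5.0, 3.0],   # class 1
    [2.0, 6.0, 1.0],   # class 1
    [7.0, 2.0, 4.0],   # class 2
    [9.0, 3.0, 5.0],   # class 2
])
y = np.array([1, 1, 2, 2])

delta = tsp_scores(X, y)
print(delta[0, 1])  # 1.0: gene 0 < gene 1 in every class-1 profile, never in class 2
```

The broadcasted comparison builds an array of size n_profiles × G × G, so for realistic microarray sizes a loop over genes or a chunked computation would likely be needed.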
Top Scoring Pair
• Rank the expression values
• Rank the scores Δ_ij from largest to smallest
• Select all pairs achieving the top score
• Example of scoring a gene pair:
  • 52 profiles in class 1, of which 50 have X_i < X_j: p_ij(1) = 50/52
  • 50 profiles in class 2, of which 3 have X_i < X_j: p_ij(2) = 3/50
Top Scoring Pair Classifier
• Computing the score: Δ_ij = |50/52 − 3/50| ≈ 0.90
Note: since p_ij(1) > p_ij(2), the classifier based on this gene pair votes for class 1 for a profile with X_i < X_j and for class 2 otherwise.
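As a rough sketch, the worked example and the voting rule can be written out as follows. The helper names `pair_score` and `tsp_predict` are hypothetical, introduced only for this illustration; the counts are the ones from the example (50 of 52 and 3 of 50).

```python
def pair_score(n_less_1, n_1, n_less_2, n_2):
    """Estimate p_ij(1), p_ij(2) and their absolute difference from counts."""
    p1 = n_less_1 / n_1   # share of class-1 profiles with X_i < X_j
    p2 = n_less_2 / n_2   # share of class-2 profiles with X_i < X_j
    return p1, p2, abs(p1 - p2)

def tsp_predict(x_i, x_j, p1, p2):
    """Vote for the class in which the observed ordering is more likely.

    The case p1 == p2 is not meaningful for a top scoring pair and is
    not treated specially here.
    """
    if x_i < x_j:
        return 1 if p1 > p2 else 2
    return 2 if p1 > p2 else 1

p1, p2, score = pair_score(50, 52, 3, 50)
print(round(score, 4))                 # 0.9015
print(tsp_predict(1.2, 3.4, p1, p2))   # 1, because X_i < X_j and p1 > p2
print(tsp_predict(3.4, 1.2, p1, p2))   # 2
```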
K-TSP Classifier
• In some instances, the top scoring pairs may change when the training data are perturbed by adding or deleting a few examples
• The K-TSP classifier uses the k top scoring disjoint gene pairs from the ranked list
• Goal: increase the accuracy of the TSP classifier
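One way to pick k disjoint pairs is a greedy pass over the pairs in order of decreasing score, skipping any pair that reuses a gene. The sketch below assumes this greedy strategy and a precomputed symmetric score matrix; the function name `top_k_disjoint_pairs` and the example matrix are made up for illustration. How the selected pairs' votes are combined into one prediction is not covered here.

```python
import numpy as np

def top_k_disjoint_pairs(delta, k):
    """Greedily pick up to k highest-scoring pairs that share no gene.

    delta: (G, G) symmetric matrix of pair scores.
    Returns a list of (i, j) index tuples with i < j.
    """
    delta = np.asarray(delta, dtype=float)
    G = delta.shape[0]
    i_idx, j_idx = np.triu_indices(G, k=1)   # each unordered pair once
    # Stable sort keeps the original pair order among tied scores.
    order = np.argsort(-delta[i_idx, j_idx], kind="stable")
    used, pairs = set(), []
    for idx in order:
        i, j = int(i_idx[idx]), int(j_idx[idx])
        if i in used or j in used:
            continue                          # gene already in a chosen pair
        pairs.append((i, j))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs

# Hypothetical score matrix for 4 genes.
delta = np.array([
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.6],
    [0.1, 0.2, 0.6, 0.0],
])
print(top_k_disjoint_pairs(delta, 2))  # [(0, 1), (2, 3)]
```

In this example the pairs (0, 2) and (1, 2) score higher than (2, 3) but are skipped because they share a gene with the already selected pair (0, 1).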
Experiments Design
• Baseline
• Augmented Space
• Alternate Space
Baseline
• Raw data
[Figure: data matrix with N rows and M attribute columns A1, …, AM. The TSP classifier is applied to it and yields a list of pairs: (A13 : A45), (A7 : A21), (A1 : A72), (A1 : A25), …, (Ax : Ay).]
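Augment
• Adding top ranked pairs
[Figure: data matrix with N rows and M+K columns: the original attributes A1, …, A72, … plus K pair columns A7_45, A13_21, A1_72, …, Aa_b. K-TSP classifier pairs shown: (A13 : A45), (A7 : A21), (A1 : A72), (A1 : A25), …, (Aa : Ab).]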
Alteration
• Deal with the K-TSP columns only
[Figure: data matrix with N rows and K pair columns A7_45, A13_21, A1_72, …, Ax_y.]
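The three feature spaces in the experiment design can be sketched as array operations. The slides do not say how a pair column such as A7_45 is encoded, so the 0/1 indicator of "first attribute < second attribute" used below is an assumption, as are the function name `pair_features`, the toy data, and the chosen pairs.

```python
import numpy as np

def pair_features(X, pairs):
    """Build one column per pair (i, j): 1.0 where X_i < X_j, else 0.0.

    The 0/1 encoding is an assumption made for this sketch.
    """
    X = np.asarray(X, dtype=float)
    return np.column_stack(
        [(X[:, i] < X[:, j]).astype(float) for i, j in pairs]
    )

# Toy data, made up for illustration: N = 4 rows, M = 3 attributes.
X = np.array([
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 1.0],
    [7.0, 2.0, 4.0],
    [9.0, 3.0, 5.0],
])
pairs = [(0, 1), (1, 2)]                 # hypothetical top ranked pairs, K = 2

baseline = X                             # N x M: raw data only
alternate = pair_features(X, pairs)      # N x K: pair columns only
augmented = np.hstack([X, alternate])    # N x (M + K): raw data plus pair columns

print(baseline.shape, augmented.shape, alternate.shape)  # (4, 3) (4, 5) (4, 2)
```

Any downstream classifier can then be trained on each of the three matrices in turn to compare the spaces.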
Future Work
• Combination of decision trees and top scoring pairs [1]
[1] M. Czajkowski and M. Krętowski (2011). "Top Scoring Pair Decision Tree for Gene Expression Data Analysis".
Conclusion
• TSP classifier predictions are based entirely on the top scoring pairs.
• Beauty of Top Scoring Pair: simplicity
• Main goal: improve the classification accuracy