Download 2007-09-19

An improved metric for the
comparison of RNAi knockout
phenotypes
XX
Background
• RNAi can effectively ‘knock out’ a gene
• Large-scale studies systematically perform
RNAi on many genes, identify phenotypes
• Embryonic Lethal, Uncoordinated, Thin…
Background
• Phenotypes can be thought of as gene
descriptors
• Each gene has a binary vector, with each
entry corresponding to a single phenotype
• Classic information theory setup
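
A minimal sketch of the setup above, assuming a hypothetical phenotype vocabulary and invented gene annotations (none of the names or assignments below come from the slides):

```python
# Hypothetical phenotype vocabulary; the abbreviations are illustrative only.
PHENOTYPES = ["Emb", "Unc", "Thin", "Slu"]

# Hypothetical observations: gene -> set of phenotypes seen after RNAi.
observed = {
    "geneA": {"Emb", "Unc"},
    "geneB": {"Emb"},
    "geneC": set(),
}

def to_vector(phenos, vocabulary=PHENOTYPES):
    """Return a binary vector with one entry per phenotype in the vocabulary."""
    return [1 if p in phenos else 0 for p in vocabulary]

# One binary descriptor per gene, all in the same phenotype order.
vectors = {gene: to_vector(phenos) for gene, phenos in observed.items()}
```

Keeping a fixed phenotype order means any two genes can be compared position by position.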
Previous methods
• Classic approach: given a collection of genes,
“eye them up” for common phenotypes
• Piano 2002. “Gene Clustering Based on RNAi
Phenotypes of Ovary-Enriched Genes in C.
elegans”
• Gunsalus 2004. “RNAiD and PhenoBlast: web
tools for genome-wide phenotypic mapping
projects.”
• Gunsalus 2005. “Predictive models of molecular
machines involved in Caenorhabditis elegans
early embryogenesis”
Tested metrics
PREVIOUS METRICS
• Pearson Correlation
• Uncentered Pearson Correlation
• Simple Match (1s)
• Simple Match (1s and 0s)
NOVEL METRICS
• “Scaled Match”
• “Loss of function agreement score”
IDF AND RELATED
• Inverse Document Frequency (IDF)
• Frequency Dot Product (FDP)
• Residual IDF
• Scaled IDF
OTHER
• CanB
• Euclidean Distance
• Hamming Distance
• Jaccard Distance
• Mutual Information
• Rand Index
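
Several of the listed metrics have standard textbook definitions on binary vectors; the sketch below shows those. The slides do not define the IDF variants, so the IDF-weighted match here is only one plausible reading (weight each shared phenotype by log(N/df)), and the function names and example vectors are hypothetical.

```python
import math

def simple_match_ones(a, b):
    """Count positions where both vectors are 1."""
    return sum(1 for x, y in zip(a, b) if x == 1 and y == 1)

def simple_match_all(a, b):
    """Count positions where the vectors agree, on 1s or on 0s."""
    return sum(1 for x, y in zip(a, b) if x == y)

def hamming_distance(a, b):
    """Count positions where the vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jaccard_distance(a, b):
    """1 - |intersection| / |union| of the phenotype sets."""
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        return 0.0  # assumed convention for two all-zero vectors
    return 1.0 - simple_match_ones(a, b) / union

def uncentered_pearson(a, b):
    """Cosine similarity: correlation without subtracting the means."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for a zero vector
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson correlation: cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return uncentered_pearson([x - mean_a for x in a],
                              [y - mean_b for y in b])

def idf_weights(all_vectors):
    """ASSUMPTION: standard IDF, log(N / df), per phenotype column."""
    n = len(all_vectors)
    dfs = [sum(col) for col in zip(*all_vectors)]
    return [math.log(n / df) if df > 0 else 0.0 for df in dfs]

def idf_match(a, b, weights):
    """ASSUMPTION: sum the IDF weights of phenotypes shared by both genes."""
    return sum(w for x, y, w in zip(a, b, weights) if x == 1 and y == 1)

# Hypothetical example vectors over four phenotypes.
a = [1, 1, 0, 0]
b = [1, 0, 0, 0]
c = [0, 0, 0, 0]
weights = idf_weights([a, b, c])
```

Under this reading, a shared rare phenotype contributes more to the score than a shared common one, whereas the simple match counts every shared phenotype equally.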
Precision/Recall
Network Degree Distributions
Shared Phenotypes per linked
gene pair
Overview of
subnetwork
phenotypes
Number of enriched phenotypes
per subnetwork
Subnetwork coverage of best GO
category
Circularity Issues
• GO is basically built from knockout phenotypes
• Makes it very hard to evaluate predictions on a
large scale
• 19/35 phenotypes overlap a GO category by at
least 50% (several overlap more than one category)
• For example, 71 genes have the ‘Sluggish
Movement’ (SLU) phenotype. Of these, 70 are in
the ‘positive regulation of locomotion’ category,
which itself comprises only 82 genes.
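
The SLU example can be worked through with the counts given above. The slides do not say which denominator the 50% overlap threshold uses, so this sketch simply reports both fractions; the function name is hypothetical.

```python
def overlap_fractions(n_phenotype, n_category, n_shared):
    """Return (share of phenotype genes in the category,
    share of category genes with the phenotype)."""
    return n_shared / n_phenotype, n_shared / n_category

# Counts from the slide: 71 SLU genes, 82 genes in the GO category, 70 shared.
frac_of_phenotype, frac_of_category = overlap_fractions(71, 82, 70)

print(f"{frac_of_phenotype:.3f}")  # 70/71
print(f"{frac_of_category:.3f}")   # 70/82
```

Both fractions are well above 50% (roughly 0.986 and 0.854), which is why a GO-based evaluation of phenotype-derived predictions is close to circular for this category.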
Future Work
• Smaller subnetworks (or clustering)
• How well does the new phenotype data integrate
with other functional data (co-expression, p2p,
genetic, combination)?
– Metric level
– Network level
– Triangle level
– Subnetwork level
• Look for interesting biology in 9 novel subnetworks