An improved metric for the comparison of RNAi knockout phenotypes
XX

Background
• RNAi can effectively ‘knock out’ a gene
• Large-scale studies systematically perform RNAi on many genes and identify the resulting phenotypes
• Embryonic Lethal, Uncoordinated, Thin…

Background
• Phenotypes can be thought of as gene descriptors
• Each gene has a binary vector, with each entry corresponding to a single phenotype
• Classic information theory setup

Previous methods
• Classic approach: given a collection of genes, “eye them up” for common phenotypes
• Piano 2002. “Gene Clustering Based on RNAi Phenotypes of Ovary-Enriched Genes in C. elegans”
• Gunsalus 2004. “RNAiDB and PhenoBlast: web tools for genome-wide phenotypic mapping projects.”
• Gunsalus 2005. “Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis”

Tested metrics

PREVIOUS METRICS
• Pearson Correlation
• Uncentered Pearson Correlation
• Simple Match (1s)
• Simple Match (1s and 0s)

NOVEL METRICS
• “Scaled Match”
• “Loss of function agreement score”

IDF AND RELATED
• Inverse Document Frequency (IDF)
• Frequency Dot Product (FDP)
• Residual IDF
• Scaled IDF

OTHER
• CanB
• Euclidean Distance
• Hamming Distance
• Jaccard Distance
• Mutual Information
• Rand Index

Precision/Recall

Network Degree Distributions

Shared Phenotypes per linked gene pair

Overview of subnetwork phenotypes

Number of enriched phenotypes per subnetwork

Subnetwork coverage of best GO category

Circularity Issues
• GO is basically built from knockout phenotypes
• This makes it very hard to evaluate predictions on a large scale
• 19/35 phenotypes overlap a GO category by at least 50% (several overlap a few)
• For example, 71 genes have the ‘Sluggish Movement’ (SLU) phenotype. Of these, 70 are in the ‘positive regulation of locomotion’ category, which itself comprises only 82 genes.
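The binary phenotype vectors, a few of the tested metrics, and the Sluggish Movement example above can be illustrated with a minimal sketch. The two gene vectors are made up for illustration, the readings of "Simple Match (1s)" and "Simple Match (1s and 0s)" are assumptions (the slides do not define them), and the function names are hypothetical. The slides also do not say which denominator the "at least 50%" overlap uses, so both fractions are computed.

```python
# Phenotype order for the vectors below (names taken from the slides).
PHENOTYPES = ["Embryonic Lethal", "Uncoordinated", "Thin", "Sluggish Movement"]

# Hypothetical genes: 1 = phenotype observed after RNAi, 0 = not observed.
gene_a = [1, 1, 0, 0]
gene_b = [1, 0, 0, 1]


def shared_ones(u, v):
    """Assumed reading of 'Simple Match (1s)': phenotypes both genes show."""
    return sum(1 for x, y in zip(u, v) if x == 1 and y == 1)


def matching_positions(u, v):
    """Assumed reading of 'Simple Match (1s and 0s)': entries that agree."""
    return sum(1 for x, y in zip(u, v) if x == y)


def hamming_distance(u, v):
    """Number of phenotypes on which the two genes disagree."""
    return sum(1 for x, y in zip(u, v) if x != y)


def jaccard_distance(u, v):
    """1 - |shared phenotypes| / |phenotypes shown by either gene|."""
    union = sum(1 for x, y in zip(u, v) if x == 1 or y == 1)
    if union == 0:
        return 0.0  # convention chosen here: two all-zero vectors are identical
    return 1.0 - shared_ones(u, v) / union


def overlap_fractions(n_shared, n_phenotype, n_category):
    """Overlap as a fraction of the phenotype set and of the GO category."""
    return n_shared / n_phenotype, n_shared / n_category


# Counts from the slide: 71 SLU genes, 70 of them in an 82-gene GO category.
slu_fraction, go_fraction = overlap_fractions(70, 71, 82)

print(shared_ones(gene_a, gene_b), matching_positions(gene_a, gene_b))
print(hamming_distance(gene_a, gene_b), round(jaccard_distance(gene_a, gene_b), 3))
print(round(slu_fraction, 3), round(go_fraction, 3))
```

With the slide's counts, about 98.6% of the SLU genes fall in the GO category and about 85.4% of the GO category consists of SLU genes, so the example clears the 50% threshold under either reading.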
Future Work
• Smaller subnetworks (or clustering)
• How well does the new phenotype data integrate with other functional data (co-expression, p2p, genetic, combination)?
  – Metric level
  – Network level
  – Triangle level
  – Subnetwork level
• Look for interesting biology in 9 novel subnetworks