Comparative Expression
Moran Yassour

Goal
- Build a multi-species gene-coexpression network
  - Find functions of unknown genes
  - Discover how the genes interact
  - Distinguish accidentally regulated genes from those that are physiologically important

Construction of a gene-coexpression network
- Evolutionarily diverse organisms with extensive microarray data:
  - Homo sapiens
  - Drosophila melanogaster
  - Caenorhabditis elegans
  - Saccharomyces cerevisiae
- We first associated genes from one organism with their orthologous counterparts in other organisms.

Evolution 101
- Paralogs vs. orthologs

Construct a metagene
- Identify connected components
- Ignore nonreciprocal hits
- [Figure: a human gene, a fly gene, a worm gene, and a yeast gene linked by best BLAST hits into one metagene (MEG)]
- Using this method, we assigned each gene to at most a single metagene.

Some numbers
- In total we have 6307 metagenes (6591 human genes, 5180 worm genes, 5802 fly genes, and 2434 yeast genes).
- We sought to identify pairs of metagenes that not only were coexpressed in one experiment and in one organism but that also showed correlation in diverse experiments in multiple organisms.

Edges in the graph
- [Figure: ranked gene lists in human, fly, and worm, used to score the pair MEG1, MEG2; example values {2,4,2}]
- Is {2,4,2} significant? If so (P-value < 0.05), draw an edge between MEG1 and MEG2.

Statistical tests (1) – permuted metagenes
- Construction of a network from a set of permuted metagenes (a random collection of genes from each organism)
- At P < 0.05, the real networks contained 3.5 ± 0.03 times as many interactions as the random networks.

Statistical tests (2) – half the data
- Split the microarray data into halves, giving two networks
- We then counted the fraction of interactions that were significant in one network (P < 0.05), given that they were significant in the other network at P < p, for various values of p.
- P = 0.05: 41% significant expression interactions

Statistical tests (3) – noise stability
- We added increasing levels of Gaussian noise to the entire data set for each of the organisms.
- [Plot: negative log P-value in the noise network vs. negative log P-value in the real network]

Visualization
- x-y plane: negative logarithm of P value
- K-means clustering
- z axis: density of genes in the region
- function, region
- function, network

Example – Component 5
- A total of 241 metagenes
  - 110 of which were previously known to be involved in the cell cycle
  - 202 cell cycle metagenes in the network
  - P-value < 10^-85
- Of the 241 metagenes:
  - 30: regulating the cell cycle
  - 80: terminal cell cycle functions
  - 131: unknown

Experimental validation (1) – expression data
- Five metagenes with a significant number of links to known cell proliferation genes
- Measuring expression levels in dividing pancreatic cancer cells and in nondividing normal cells

Experimental validation (2) – loss-of-function mutant
- Loss-of-function mutant phenotype for one of these genes (C. elegans gene ZK652.1)
- RNA interference (RNAi) of ZK652.1 resulted in excess nuclei in the germ line, suggesting that the wild-type function of this gene is to suppress germline proliferation.

Multi-species vs. single species (1)
- For each gene (of the five metagenes), we constructed an organism-specific neighborhood.
- On average, the neighborhoods of these five genes were over four times more enriched for cell proliferation and cell cycle genes in the multiple-species network than they were in the best single-species neighborhood.

Multi-species vs. single species (2)
- Trying to link together genes that were previously known to be involved in a single function (coverage)
- Excluding genes not known to participate in that function (accuracy)

Huge data
- The multiple-species network was built from more DNA microarray data (3182 microarrays).
- Construction of the network out of only 979 DNA microarrays (as in the worm data set) gave similar results.

Summary - Multi is good
- We map only genes that have orthologs in other species and thus focus strongly on core, conserved biological processes.
- Interactions in the multiple-species network imply a functional relationship based on evolutionary conservation.
- Nice to have: analysis of other components.

Goal
- Comparative study of large datasets of expression profiles from six evolutionarily distant organisms

Goal
- Coexpression is often conserved.
- Comparing the regulatory relationships between particular functional groups in the different organisms.
- Comparing global topological properties of the transcription networks derived from the expression data, using a graph theoretical approach.

Coexpression conservation
- [Figure label: homologous gene with preserved function]
- Coexpressed groups: yeast transcription modules
- For each yeast module we constructed five “homologue modules”.

Refining homologue modules
- The signature algorithm identifies those homologues that are coexpressed under a subset of the experimental conditions.
- Furthermore, it reveals additional genes that are not homologous with any of the original genes, but display a similar expression pattern under those conditions.

Correlation distribution
- The distribution of the Z-scores for the average gene–gene correlation of all the “homologue modules”

Higher-order regulatory structures

Cell Cycle Experiments

Subsets of the data
- Correlations between the sets of conditions for randomly selected subsets of the data.
- Although the data are sparse, the findings reflect real properties of the expression network.
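
The Z-score of the average gene–gene correlation mentioned on the correlation distribution slide can be illustrated with a small sketch. This is not the authors' code: it assumes the Z-score compares a module's mean pairwise Pearson correlation against random gene sets of the same size, and the function names (`mean_pairwise_correlation`, `module_z_score`), the toy data, and all parameters are hypothetical.

```python
import numpy as np


def mean_pairwise_correlation(expr):
    """Average Pearson correlation over all distinct pairs of rows (genes)."""
    corr = np.corrcoef(expr)                    # rows are genes, columns are conditions
    upper = np.triu_indices_from(corr, k=1)     # each gene pair counted once
    return float(corr[upper].mean())


def module_z_score(expr, module_idx, n_random=1000, seed=0):
    """Z-score of a module's mean correlation against random same-size gene sets.

    Assumption: the null distribution is built by drawing gene sets uniformly
    at random from the same expression matrix.
    """
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_correlation(expr[module_idx])
    size = len(module_idx)
    null = np.array([
        mean_pairwise_correlation(
            expr[rng.choice(expr.shape[0], size=size, replace=False)]
        )
        for _ in range(n_random)
    ])
    return (observed - null.mean()) / null.std(ddof=1)


# Toy data (hypothetical): 200 genes x 40 conditions of noise,
# with the first 10 genes sharing a common expression pattern.
rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 40))
shared_pattern = rng.normal(size=40)
module_idx = np.arange(10)
expr[module_idx] += 2.0 * shared_pattern

z = module_z_score(expr, module_idx, n_random=200)
print(f"Z-score of the toy module: {z:.1f}")
```

A strongly coexpressed module yields a large positive Z-score, while a random gene set yields a value near zero; plotting these scores for all homologue modules would give a distribution of the kind the slide refers to.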
Decomposition of the expression data
- Decomposition of the expression data into a set of transcription modules using the iterative signature algorithm (ISA)
- Modules are colored according to the fraction of homologues they possess in the other organism
- [Figure label: protein synthesis]

Power-law connectivity distribution
- n(k) ~ k^(-γ), with γ between 1.1 and 1.8

Connections & Connectivity
- Connections between genes of similar connectivity are enhanced (red regions)
- Connections between highly and weakly connected genes are suppressed (blue)

Essentiality & Connectivity
- The likelihood of a gene to be essential increases with its connectivity.

Homology & Connectivity
- The highly connected genes are more likely to have homologues in the other organisms.

Summary
- Similarity in lower resolution, differences in higher resolution:
  - All expression networks share common topological properties (scale-free connectivity distribution, high degree of modularity).
  - The modular components of each transcription program as well as their higher-order organization appear to vary significantly between organisms and are likely to reflect organism-specific requirements.

Future
- Gene expression studies
- Evolution studies

Thank you …
