Origins and impact of constraints in evolution of gene families
Boris E. Shakhnovich and Eugene V. Koonin
Genome Research 2006, October 19

Stella Veretnik
Journal Club, November 14, 2006

Essential genes and their families:
diverge more slowly than nonessential genes
diverge to a greater extent than nonessential genes
Why does this happen? What parameters are responsible? Unanswered.

Paralogous families with essential genes: E-families
Paralogous families without essential genes: N-families
Evolution through paralogy; tolerance to mutations -> extent of evolution within the family

Essential genes, definition: genes that, when mutated, can result in a lethal phenotype.
Type of selection acting on evolving genes: purifying selection.
What is purifying selection? The ratio Ka/Ks < 1.
Ka is the number of nonsynonymous mutations per nonsynonymous site.
Ks is the number of synonymous mutations per synonymous site.

[Table] Fraction of essential genes that are not singletons; ratio of non-essential to essential genes in E-families. Values: 3.5%, 1.9, 1.3, 13.7, 9.2%, 18.4%.

Most essential genes do not have paralogs. Why? Is there something special about those that do have paralogs? No answer in this paper…
How can a gene have paralogs and still be essential? All the paralogs together cannot replace all the functions of the essential gene. Once they can, the gene becomes non-essential.

[Figure] Divergence and diffusion graph. Edges represent homology relationships. There are significantly fewer edges between paralogs in E-families.

How were the families assembled?
Construction of paralogous families. Each ORF is a node on a graph.
1. Do an all-vs.-all BLAST comparison of the sequences of all translated ORFs within an organism
2. Measure the amino acid identity level between nodes
3. Translate amino acids to nucleotides and calculate Ks (synonymous substitutions per site) and Ka (nonsynonymous substitutions per site)
The result is 3 weighted graphs (as defined by 1, 2, and 3). Each paralogous family is a strongly connected component of the graph.
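The family-construction steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the input format (tuples of ORF pair, BLAST E-value, Ks), the function names, and the ORF identifiers are all hypothetical, and the rule for applying the cutoffs (keep an edge when its E-value and its Ks are both at or below the threshold) is an assumption. The default thresholds are the values the presentation quotes for this work (E-value 1e-15, Ks = 5). Because homology edges are undirected, "strongly connected components" reduce to ordinary connected components here.

```python
from collections import defaultdict


def paralogous_families(hits, max_evalue=1e-15, max_ks=5.0):
    """Group ORFs into families = connected components of the homology graph.

    hits: iterable of (orf_a, orf_b, evalue, ks) tuples (hypothetical format).
    ORFs with no edge passing the cutoffs (singletons) are not returned.
    """
    # Undirected adjacency list; an edge is kept only if it passes both cutoffs
    # (assumed rule: E-value <= max_evalue and Ks <= max_ks).
    neighbors = defaultdict(set)
    for a, b, evalue, ks in hits:
        if a != b and evalue <= max_evalue and ks <= max_ks:
            neighbors[a].add(b)
            neighbors[b].add(a)

    families, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        families.append(component)
    return families


def label_families(families, essential):
    """Label a family 'E' if it contains at least one essential gene, else 'N'."""
    return [("E" if family & essential else "N", family) for family in families]


# Toy input with made-up ORF names and values, for illustration only.
toy_hits = [
    ("orfA", "orfB", 1e-40, 0.8),
    ("orfB", "orfC", 1e-20, 2.1),
    ("orfD", "orfE", 1e-30, 1.2),
    ("orfE", "orfF", 1e-5, 0.9),   # fails the E-value cutoff
    ("orfG", "orfH", 1e-50, 7.5),  # fails the Ks cutoff
]
families = paralogous_families(toy_hits)
labeled = label_families(families, essential={"orfB"})
```

With this toy input, orfA, orfB and orfC fall into one family labeled E (it contains the essential orfB), while orfD and orfE form an N-family. A real analysis would build one weighted graph per measure (BLAST score, amino acid identity, Ks/Ka), as the slide describes.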
Cutoffs of Ks = 5 and E-value 1e-15 are used in this work. In general there is a near-linear dependence of the cutoff on Ks.
Do non-essential members always evolve from essential members of the family?

Largest families
What is the typical size of an E-family and of an N-family? Are N-families typically larger? Are there more N-families than E-families? Both?
Can a duplicate of a nonessential paralog become essential?

How paralogous families evolve
After duplication and divergence the following may happen:
a. Nonfunctionalization: a duplicate turns into a pseudogene (a more typical scenario for N-families)
b. Subfunctionalization: multiple functions of the ancestral gene are divided between the paralogs
c. Neofunctionalization: one of the paralogs evolves a new function, the other keeps the old function(s)
Scenarios b and c are more common for E-families.

Purifying selection is stronger in E-families (about 2 times): the Ka/Ks ratio is lower in E-families.
Implication: N-families diverge faster…
How this is done:
1. For single feature polymorphism (SFP): check within Saccharomyces cerevisiae
2. For the Ka/Ks ratio: compare orthologs between closely related species (yeast: S. cerevisiae/S. paradoxus; E. coli: K12/CFT073)

The rate of conversion to pseudogene is substantially higher in N-families (6.8-fold difference).
Paralogs become fixed more often in N-families (does this explain the larger size of N-families?)
An equal rate of duplication in E-families and in N-families is assumed.
What happens to the paralogs that do not go to fixation? Do they become pseudogenes, or something else?

Ks is higher in E-families than in N-families.
Implication: paralogs in E-families stick around longer than in N-families (3 times longer).

Sequence divergence is higher in E-families.
[Figure panels] Nonsynonymous substitutions among paralogs within the family; sequence identity among paralogs within the family.

It is possible to identify E- and N-families using only sequence divergence information.
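To make the last claim concrete, here is a minimal Python sketch of how a divergence-based classifier could be evaluated with an ROC curve. It is not the authors' procedure. It assumes each family has already been reduced to a single divergence score (for example, mean Ka among its paralogs; the choice of score is hypothetical) and that a family is called an E-family when its score is at or above a threshold. The function names are illustrative.

```python
def roc_points(scores, is_e_family):
    """Return ROC points as (false_positive_rate, true_positive_rate) pairs.

    scores: one divergence score per family (hypothetical summary statistic).
    is_e_family: matching booleans, True for known E-families.
    A family is predicted 'E' when its score >= the current threshold.
    """
    pairs = sorted(zip(scores, is_e_family), key=lambda p: -p[0])
    n_pos = sum(1 for _, label in pairs if label)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one E-family and one N-family")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every family tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

An area of 1.0 means the score separates the two kinds of family perfectly, and 0.5 means it does no better than chance. Any real evaluation would use the measured divergence values rather than toy inputs.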
[Figure] ROC plot (axis labels: true positives, true negatives).
The clustering coefficient measures how well connected the neighbors of a given node in a graph are.

Transcriptional regulation of paralogs changes more in E-families: paralogs rarely share transcription factors (ChIP-chip experiments).

Summary:
Two types of paralogous families exist: E-families and N-families.
The two types of families have dramatically different dynamics of molecular evolution:
E-families diverge slowly but persist for long periods of time, and thus diverge further than the paralogs in N-families.
N-families undergo a more dynamic evolution: many duplicates become fixed, many others become pseudogenes. The level of sequence divergence is significantly lower.
Duplicates in E-families typically assume part of the functions of the original gene and/or evolve a new function. This is less so with duplicates in N-families (no data shown for this…)

My musings:
In a minimalistic organism every gene would be an essential gene.
A gene becomes non-essential when its functions are assumed by another gene or split between several genes.
Every non-essential gene will go through the stage of being in an E-family in which there is one essential gene.
N-families gradually evolve from E-families, when the essential gene(s) in the family are no longer essential. This happens when a sufficient number of duplicates exist to ensure that all functions of the original essential gene are covered.
In this scenario, the E-families are the transitional link for essential genes on their way to becoming non-essential.
(You could argue that a more robust organism has fewer essential genes…)

Essential genes (singletons): careful evolution
Transition to non-essentiality (E-families): very careful creeping forward
Non-essential genes (N-families): careless evolution

Different selection pressures in each category? Yes. But… how does the behavior of the family change once it crosses from E-family to N-family?
