Evolution of paralogous proteins
Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Patthy Sections 7.1 - 7.3

Advantageous duplications
- Duplication of a complete protein-coding gene results in two identical duplicons:
  - Encode the same protein and express it in the same way as the original
  - Duplication will be advantageous if an increased supply of gene product is advantageous (e.g. histones or ribosomal proteins)
- Duplication by retroposition results in a processed duplicon:
  - Same protein encoded but expression likely to be different
  - Duplication will be advantageous if the change in expression pattern (tissue specificity, developmental stage) has a biological advantage
- Fully redundant functions (neutral duplications) may be lost through drift but may persist long enough to acquire an advantageous function
- New proteins generally arise through modulation of existing ones

Advantageous duplication of unprocessed genes
- Duplications are advantageous and maintained by positive selection if having multiple genes performing the same function ensures enhanced efficiency, e.g. histones
- Positive selection for “advantageous” gene duplications can be observed in:
  - Insects exposed to insecticides
  - Tumours and protozoa exposed to drugs
  - Various stomach lysozymes in ruminants:
    - Seven closely related mRNAs encoding lysozymes
    - Multiple lysozyme genes arose through duplication
    - Serial duplication of the lysozyme gene was an adaptive response to increase the expression of lysozyme

Advantageous duplication of processed genes
- Due to changes in chromosomal environment and/or 5’ regulatory regions, a processed gene may have altered regulatory features and may not be competing with its progenitor
- Lower fidelity of reverse transcription means that the duplicon may have deleterious mutations and be lost
- Two examples of positive selection increasing the chance of survival of testis-specific processed genes:
  - Phosphoglycerate kinase retrogene
  - Pyruvate dehydrogenase E1a retrogene

The phosphoglycerate kinase retrogene
- Two functional PGK loci in the mammalian genome:
  - PGK-1 is X-linked and expressed constitutively in all somatic cells
  - PGK-2 is a functional autosomal gene expressed in a tissue-specific manner exclusively in the late stages of spermatogenesis
- PGK-2 lacks introns and has a poly(A) tail: it is a processed gene
- Evolution of PGK-2 was a compensatory response to inactivation of the X-linked gene before meiosis:
  - Mature spermatozoa require PGK to metabolise fructose in semen
  - X-inactivation called for a functional autosomal PGK locus
  - Unequal crossing-over would not have solved this problem
- Processed gene had an initial advantage since it permitted the expression of PGK in a tissue where the X-linked gene was inactivated
- Subsequently evolved a testis-specific promoter

The pyruvate dehydrogenase E1a retrogene
- The gene for the E1a subunit of the PDH complex is located on the X chromosome and is expressed in somatic tissues
- Another PDH E1a locus is found on chromosome 4:
  - Testis-specific and expressed in postmeiotic spermatogenic cells
  - Lacks introns, has a downstream poly(A) tract and is flanked by a pair of 10 bp direct repeats
- After the last meiotic division, spermatids rely on energy from pyruvate for the maturation process
- PDH E1a is essential:
  - X-chromosome is inactivated in postmeiotic spermatogenic cells
  - Only half the cells contain an X-chromosome
- Evolution of an alternative, non-X-linked gene

Neutral duplications
- If there is no selective advantage from a duplication event, functional constraints protecting the new gene from deleterious mutations may be relaxed:
  - May ultimately be converted to a pseudogene
  - Many clusters of duplicated genes contain pseudogenes
- Advantageous mutations, either in the coding sequence or the regulatory region, may lead to positive selection
- Where there is a major change of function, several critical sites may be involved:
  - New function might not be fully manifested until several sites have adapted
  - Early mutational steps may be selectively neutral

Visual pigment proteins
- Old World primates have three colour-sensitive proteins:
  - Green- and red-absorbing photoreceptors are encoded by a pair of closely related (96% identity), closely linked genes on the X chromosome, which suggests very recent gene duplication
  - Blue photoreceptor is encoded by an autosomal gene
- New World monkeys have only one X-linked pigment:
  - Duplication must have occurred in the ancestor of Old World monkeys after divergence from New World monkeys
  - Humans, apes and Old World monkeys can discriminate three colours whereas New World monkeys can only distinguish two
- Prior to emergence of three pigments, the rate of nonsynonymous substitution exceeded the synonymous rate, which suggests positive selection for three-colour vision

Serine proteinases and their inhibitors
- Pancreatic proteinases (trypsin, chymotrypsin and elastase) illustrate paralogues with minor modifications of function:
  - Strikingly similar three-dimensional structures
  - Very different substrate specificities:
    - Elastase cleaves at residues with small, non-polar side chains (Ala, Val etc.)
    - Chymotrypsin cleaves at bulky, hydrophobic residues (Phe-X, Tyr-X etc.)
    - Trypsin cleaves only Arg-X or Lys-X
- Advantage of having multiple digestive proteinases is clear: their combined activities ensure more efficient utilisation of proteins in foodstuffs
- Original duplicons survived since they acquired advantageous mutations that diversified their function

Serine proteinases and their inhibitors (continued)
- Molecular basis of differences in substrate specificity can be rationalised from three-dimensional structure and understanding of the catalytic mechanism:
  - Specificity of trypsins for Arg or Lys residues is due to:
    - Deep substrate binding site which can accommodate these side chains
    - Asp-189 residue at the bottom of the pocket which neutralises the positively charged Arg / Lys side chain of the substrate
  - Specificity of chymotrypsin for bulky aromatic residues is due to:
    - Large, hydrophobic substrate binding pocket
    - Small, neutral (usually Ser) residue at position 189
  - Elastase has a shallower binding site:
    - Residues 216 and 226 have small side chains in trypsin/chymotrypsin
    - Elastase has bulkier residues (Val, Thr) at these positions

Serine proteinases and their inhibitors (continued)
- Substitution of a few key residues can alter sequence specificity without eliminating enzyme activity:
  - Replacement of Asp-189 of trypsin with a Ser residue (to mimic chymotrypsin) greatly diminishes activity towards Lys or Arg and increases specificity for hydrophobic substrates 10- to 50-fold
  - Lack of complete change of substrate suggests that other readjustments had to occur during divergence of trypsin and chymotrypsin from their common ancestor
  - Supports notion that new function may emerge by continual “improvement” of function
- Correlated with functional adaptation of serine proteinase inhibitors:
  - Porcine elafin genes have 93-98% conservation in introns but only 60-77% similarity in exon 2, which encodes the inhibitor domain
  - Due to accelerated mutation rate: KA >> KS (nonsynonymous substitution rate far exceeds synonymous rate)

UDP-glucuronosyltransferases (UDPGTs)
- UDPGTs detoxify hundreds of compounds by conjugation, which increases water solubility and facilitates excretion:
  - In mammals, bilirubin (by-product of haem turnover) must undergo detoxification by conjugation to glucuronic acid
  - Glucuronidation is carried out by a large family of UDPGTs with different, but overlapping, substrate specificities:
    - Means that UDPGTs also affect levels of several hormones
    - Overproduction (through whole-gene duplication) is deleterious
- UDPGTs have distinct domains serving different functions:
  - N-terminal globular domain which binds toxic substrate
  - C-terminal globular domain involved in UDP-glucuronic acid binding
- Substitutions in restricted regions of the N-terminal domain led to diversification of substrate specificities

UDP-glucuronosyltransferases (UDPGTs) (continued)
- Two families of UDPGTs in mammals which differ markedly in the evolutionary strategy used for functional diversification:
  - UDPGT2B subfamily has evolved through the “classic” process of whole gene duplication resulting in several isoforms:
    - Clustered gene family on chromosome 4
    - Primary substrates include 4-hydroxyestrone and hyodeoxycholic acid
  - Single UDPGT1 gene complex on chromosome 2 has diversified by duplication only of exon 1, which encodes the substrate-binding domain:
    - Human UDPGT1 gene complex has six closely related exon 1 variants
    - Single set of four exons that encode the C-terminal parts of UDPGTs
    - mRNAs of different isoforms produced by differential splicing of one of the exon 1 variants onto the “constant” C-terminal exons

Major change of function in paralogous genes
- Some members of the serine protease family (e.g. haptoglobin, hepatocyte growth factor, azurocidins) have lost their capacity to act as proteinases:
  - Have lost one or more of the residues in the catalytic triad
  - Have other important biological functions:
    - Haptoglobin binds globin released from lysed erythrocytes
    - Hepatocyte growth factor acts through specific receptor tyrosine kinases to stimulate cell growth
    - Azurocidin has bactericidal activity
- Careful analysis sometimes indicates a plausible pathway for transition from one function to another

Evolution of azurocidins
- Azurophil granules of neutrophils contain several proteins implicated in the killing of microorganisms:
  - Serine proteases that cause degradation of connective tissues (cathepsin G, neutrophil elastase, proteinase 3)
  - Azurocidin is similar to these but lacks His-57 and Ser-195 and thus has no proteolytic activity
- Bactericidal activity of azurocidin is mediated by tight binding to anionic lipopolysaccharide, a component of the Gram-negative bacterial envelope:
  - Serine proteinase fold used as a scaffold for endotoxin binding
- Fact that azurocidins share their most recent common ancestor with proteinases that have antibacterial activity suggests that this was a common function of the ancestor
- New function probably emerged before original function was lost

Major change of function by domain acquisition
- In proteinases involved in blood coagulation, very large segments are joined to the trypsin-homologue region:
  - These “nonproteinase” parts of plasma proteinases consist of multiple structural-functional domains that were introduced by exon shuffling
  - Function modified not only by point mutations but also by domain insertions and duplications
  - Proteinase domains retained proteolytic activity but point mutations led to an altered (usually narrower) sequence specificity
- Value of domain-acquisition mutations is that they can endow novel binding specificities and lead to dramatic changes in regulation and targeting

Modular structure of blood coagulation and fibrinolytic proteinases
[Figure: domain organisation of plasminogen, prothrombin, protein C, factor IX, factor X, tissue-type plasminogen activator and urokinase. Legend: kringle module, growth factor module, vitamin K-dependent calcium-binding module, finger module.]

Domain acquisition in the evolution of plasma proteinases
- Selective value of domains joined to the proteinase domain is illustrated by the fact that they are usually involved in interactions with cofactors, substrates or inhibitors:
  - Vitamin K-dependent calcium-binding domains of prothrombin, coagulation factors VII, IX, X and protein C anchor proteinases to phospholipid membranes, ensuring proper regulation of the cascade
  - Kringle domains of plasmin and plasminogen are critical for binding of the proteinase to its primary substrate, fibrin:
    - Serine proteinase domain has proteinase specificity very similar to that of trypsin (Lys-X and Arg-X)
    - Fibrin specificity is due to the fact that kringle domains have specific fibrin-binding sites that target the enzyme to fibrin

Similarities and differences in the evolution of paralogous and orthologous proteins
- Common protein folds are conserved in both paralogues and orthologues and structural elements generally accept mutations at similar rates between the two
- One difference is that orthologous proteins are likely to fulfil very similar functions in different species whereas paralogous proteins are more likely to have diversified in function:
  - When comparing orthologous proteins, residues that are critical for structure, function and specificity are equally likely to be conserved
  - When comparing paralogous proteins that fulfil different functions, only residues essential for structure are likely to be conserved
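
The contrast between orthologues and paralogues in this last section can be illustrated with a small sketch. It is not taken from the lecture: the function name `conserved_columns` and both toy alignments are hypothetical, and the eight-residue sequences are invented for illustration rather than drawn from real proteinases. The idea is only that positions conserved among orthologues but variable among paralogues are candidates for specificity-determining residues, in the way position 189 differs between trypsin and chymotrypsin.

```python
def conserved_columns(aligned):
    """Return the set of alignment columns where every sequence has the same residue.

    `aligned` is a list of equal-length strings (a gap-padded alignment).
    Columns whose first residue is a gap ('-') are never counted as conserved.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    first = aligned[0]
    return {
        i
        for i, residue in enumerate(first)
        if residue != "-" and all(seq[i] == residue for seq in aligned)
    }


# Hypothetical toy data (invented, NOT real sequences).
# "Orthologues": the same enzyme in three species.
ORTHOLOGUES = ["GDSGGPLD", "GDSGGPLD", "GDSGGPVD"]
# "Paralogues": three functionally diverged enzymes in one species.
PARALOGUES = ["GDSGGPLD", "GDSGGPLS", "GDSGGAVT"]

orthologue_conserved = conserved_columns(ORTHOLOGUES)
paralogue_conserved = conserved_columns(PARALOGUES)

# Columns conserved only among orthologues: candidates for residues that
# determine function or specificity rather than the shared fold.
specificity_candidates = sorted(orthologue_conserved - paralogue_conserved)

print("conserved in orthologues:", sorted(orthologue_conserved))
print("conserved in paralogues: ", sorted(paralogue_conserved))
print("specificity candidates:  ", specificity_candidates)
```

In this toy case the columns conserved among the paralogues are a subset of those conserved among the orthologues, which mirrors the slide's point that only structurally essential residues tend to survive functional divergence. Real analyses would use proper multiple alignments and tolerate conservative substitutions rather than requiring strict identity.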