Evolution of paralogous proteins
Level 3 Molecular Evolution and Bioinformatics
Jim Provan
Patthy, Sections 7.1-7.3
Advantageous duplications
Duplication of a complete protein-coding gene results in two
identical duplicons:
Both copies encode the same protein and are expressed in the same way as the original
Duplication will be advantageous if an increased supply of the gene product
is advantageous (e.g. histones or ribosomal proteins)
Duplication by retroposition results in a processed duplicon:
Same protein encoded but expression likely to be different
Duplication will be advantageous if change in expression pattern
(tissue specificity, developmental stage) has biological advantage
Fully redundant functions (neutral duplications) may be lost
through drift but may persist long enough to acquire an
advantageous function
New proteins generally arise through modulation of existing
ones
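Whether a neutral duplicate is lost or persists is a matter of genetic drift, which can be illustrated with a small Wright-Fisher simulation. This is a minimal sketch, not taken from Patthy, assuming a diploid population of constant size N, no selection on the duplicate, and a duplicate that arises as a single copy; the function names are hypothetical. Under these assumptions a new neutral allele fixes with probability 1/(2N), so most new duplicates are lost quickly while a few drift to fixation.

```python
import random


def simulate_neutral_duplicate(pop_size: int, generations: int, rng: random.Random) -> str:
    """Follow one neutral duplicate allele in a diploid Wright-Fisher population.

    The duplicate starts as one copy among 2N chromosomes. Each generation the
    new copy number is a binomial draw at the current frequency (pure drift,
    no selection). Returns "lost", "fixed" or "segregating".
    """
    n_chrom = 2 * pop_size
    copies = 1
    for _ in range(generations):
        p = copies / n_chrom
        # Binomial sampling: each of the 2N chromosomes of the next
        # generation carries the duplicate with probability p.
        copies = sum(1 for _ in range(n_chrom) if rng.random() < p)
        if copies == 0:
            return "lost"
        if copies == n_chrom:
            return "fixed"
    return "segregating"


def fixation_fraction(pop_size: int, replicates: int, seed: int = 1) -> float:
    """Fraction of independent replicate duplicates that drift to fixation."""
    rng = random.Random(seed)
    outcomes = [simulate_neutral_duplicate(pop_size, 10_000, rng) for _ in range(replicates)]
    return outcomes.count("fixed") / replicates


if __name__ == "__main__":
    n = 10
    print(f"simulated: {fixation_fraction(n, 2000):.3f}  expected 1/(2N): {1 / (2 * n):.3f}")
```

Larger populations make fixation of any single new duplicate rarer, which is one reason a fully redundant copy is usually lost unless it acquires a function of its own.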
Advantageous duplication of unprocessed genes
Duplications are advantageous and maintained by positive
selection if having multiple genes performing the same
function ensures enhanced efficiency e.g. histones
Positive selection for “advantageous” gene duplications
can be observed in:
Insects exposed to insecticides
Tumours and protozoa exposed to drugs
Various stomach lysozymes in ruminants:
Seven closely related mRNAs encoding lysozymes
Multiple lysozyme genes arose through duplication
Serial duplication of lysozyme gene was an adaptive response to
increase the expression of lysozyme
Advantageous duplication of processed genes
Due to changes in chromosomal environment and/or 5’
regulatory regions, processed gene may have altered
regulatory features and may not be competing with its
progenitor
Lower fidelity of reverse transcription means that duplicon
may have deleterious mutations and be lost
Two examples of positive selection increasing the chance
of survival of testis-specific processed genes:
Phosphoglycerate kinase retrogene
Pyruvate dehydrogenase E1α retrogene
The phosphoglycerate kinase retrogene
Two functional PGK loci in the mammalian genome:
PGK-1 is X-linked and expressed constitutively in all somatic cells
PGK-2 is a functional autosomal gene expressed in a tissue-specific
manner exclusively in the late stages of spermatogenesis
PGK-2 lacks introns and has a poly(A) tail – processed gene
Evolution of PGK-2 was a compensatory response to
inactivation of the X-linked gene before meiosis:
Mature spermatozoa require PGK to metabolise fructose in semen
X-inactivation called for a functional autosomal PGK locus
Unequal crossing-over would not have solved this problem: a tandem copy would also be X-linked and therefore inactivated
Processed gene had an initial advantage since it permitted
the expression of PGK in a tissue where the X-linked gene
was inactivated
Subsequently evolved a testis-specific promoter
The pyruvate dehydrogenase E1α retrogene
The gene for the E1α subunit of the PDH complex is located on the X
chromosome and is expressed in somatic tissues
Another PDH E1α locus is found on chromosome 4:
Testis-specific and expressed in postmeiotic spermatogenic cells
Lacks introns, has a downstream poly(A) tract and is flanked by a
pair of 10 bp direct repeats
After the last meiotic division, spermatids rely on energy
from pyruvate for the maturation process, so PDH E1α is
essential:
X-chromosome is inactivated in postmeiotic spermatogenic cells
Only half the cells contain an X-chromosome
Evolution of an alternative, non-X-linked gene
Neutral duplications
If there is no selective advantage from a duplication
event, functional constraints protecting the new gene
from deleterious mutations may be relaxed:
May ultimately be converted to a pseudogene
Many clusters of duplicated genes contain pseudogenes
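Relaxed constraint can be turned into a back-of-the-envelope estimate of how quickly a redundant copy decays into a pseudogene. This sketch assumes (not stated in the notes) that the redundant copy is under no purifying selection and that inactivating ("null") mutations hit it at a constant rate u per generation; u and the function names are illustrative. The probability that the copy is still intact after t generations is then (1 - u)^t, and its half-life solves (1 - u)^t = 0.5.

```python
import math


def prob_copy_intact(null_rate: float, generations: float) -> float:
    """Probability that a redundant copy has escaped every inactivating
    mutation, assuming a constant per-generation null-mutation rate and
    no purifying selection on the copy."""
    return (1.0 - null_rate) ** generations


def half_life(null_rate: float) -> float:
    """Generations until half of such redundant copies are expected to have
    become pseudogenes: solves (1 - u)^t = 0.5 for t."""
    if not 0.0 < null_rate < 1.0:
        raise ValueError("null_rate must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(1.0 - null_rate)


if __name__ == "__main__":
    u = 1e-5  # hypothetical null-mutation rate per copy per generation
    print(f"half-life: {half_life(u):.0f} generations")
```

For small u the half-life is close to ln 2 / u, so a lower rate of inactivating mutations gives a redundant copy proportionally more time in which to acquire an advantageous mutation.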
Advantageous mutations, either in the coding sequence or
the regulatory region, may lead to positive selection
Where there is a major change of function, several critical
sites may be involved:
New function might not be fully manifested until several sites
have adapted
Early mutational steps may be selectively neutral
Visual pigment proteins
Old World primates have three colour-sensitive proteins:
Green- and red-absorbing photoreceptors are encoded by a pair
of closely related (96% identity), closely linked genes on the X
chromosome - suggests very recent gene duplication
Blue photoreceptor is encoded by an autosomal gene
New World monkeys have only one X-linked pigment:
Duplication must have occurred in the ancestor of Old World
monkeys after divergence from New World monkeys
Humans, apes and Old World monkeys can discriminate three
colours whereas New World monkeys can only distinguish two
Prior to emergence of three pigments, rate of nonsynonymous substitution exceeded synonymous rate –
suggests positive selection for three-colour vision
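The comparison of nonsynonymous and synonymous rates (written KA and KS elsewhere in these notes, also dN and dS) is usually made by counting sites and differences codon by codon. This is a minimal sketch of the Nei-Gojobori (1986) counting method with a Jukes-Cantor correction, not the specific analysis behind the visual pigment result. It assumes two gap-free, codon-aligned sequences and the standard genetic code; changes to stop codons are counted as nonsynonymous and stop codons in the input are skipped, which are simplifying choices (implementations differ).

```python
import itertools
import math

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}


def codon_sites(codon: str) -> tuple[float, float]:
    """Synonymous and nonsynonymous sites of one codon: each position
    contributes the fraction of its three possible changes that are silent."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn, 3.0 - syn


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous and nonsynonymous differences, averaged over all orders in
    which the differing positions could have changed. Pathways through an
    intermediate stop codon are ignored if any other pathway exists."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    all_paths, valid_paths = [], []
    for order in itertools.permutations(positions):
        syn = nonsyn = 0
        through_stop = False
        current = c1
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if step != c2 and CODE[step] == "*":
                through_stop = True
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
        all_paths.append((syn, nonsyn))
        if not through_stop:
            valid_paths.append((syn, nonsyn))
    paths = valid_paths or all_paths
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits."""
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (dN, dS) for two aligned, gap-free protein-coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned codon by codon")
    sites_s = sites_n = diff_s = diff_n = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            continue  # skip stop codons
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        sites_s += (s1 + s2) / 2
        sites_n += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        diff_s += sd
        diff_n += nd
    return jukes_cantor(diff_n / sites_n), jukes_cantor(diff_s / sites_s)


if __name__ == "__main__":
    dn, ds = dn_ds("GCTGCTGCTGCT", "GCTGCTGCTACT")  # toy sequences
    print(f"dN = {dn:.3f}, dS = {ds:.3f}")
```

A ratio dN/dS (KA/KS) well above 1 over a lineage is the usual signature of positive selection; with real data the sequences must be long enough that neither proportion approaches the 0.75 limit of the correction.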
Serine proteinases and their inhibitors
Pancreatic proteinases (trypsin, chymotrypsin and elastase)
illustrate paralogues with minor modifications of function:
Strikingly similar three-dimensional structures
Very different substrate specificities:
– Elastase cleaves residues with small, non-polar side chains (Ala, Val etc.)
– Chymotrypsin cleaves at bulky, hydrophobic residues (Phe-X, Tyr-X etc.)
– Trypsin cleaves only Arg-X or Lys-X
Advantage of having multiple digestive proteinases is clear:
their combined activities ensure more efficient utilisation of
proteins in foodstuffs.
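The benefit of combined activities can be made concrete with a toy in silico digest that applies the cleavage rules listed above. This is a sketch under stated assumptions: the enzyme cuts on the C-terminal side of the listed P1 residues only; Trp is added to the chymotrypsin set as an assumption, and the unnamed "etc." residues for elastase and chymotrypsin are left out. Real preferences (for example, trypsin's poor cleavage before Pro) are ignored, and the function names and test peptide are hypothetical.

```python
# P1 residues after which each enzyme cuts (simplified from the slide).
P1_RULES = {
    "trypsin": frozenset("KR"),
    "chymotrypsin": frozenset("FYW"),
    "elastase": frozenset("AV"),
}


def digest(peptide: str, enzyme: str) -> list[str]:
    """Cut `peptide` (one-letter code) after every P1 residue recognised
    by `enzyme`."""
    sites = P1_RULES[enzyme]
    fragments, start = [], 0
    for i, residue in enumerate(peptide[:-1]):  # no cut after the last residue
        if residue in sites:
            fragments.append(peptide[start:i + 1])
            start = i + 1
    fragments.append(peptide[start:])
    return fragments


def combined_cut_sites(peptide: str, enzymes: list[str]) -> set[int]:
    """0-based positions cut after by at least one of the enzymes."""
    return {i for i, r in enumerate(peptide[:-1]) for e in enzymes if r in P1_RULES[e]}


if __name__ == "__main__":
    toy = "AKRGFYW"  # hypothetical peptide
    for enzyme in P1_RULES:
        print(enzyme, digest(toy, enzyme))
    print("all three:", sorted(combined_cut_sites(toy, list(P1_RULES))))
```

Each enzyme alone leaves long fragments, while the union of their cut sites covers far more peptide bonds, which is the point made about more efficient utilisation of dietary protein.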
Original duplicons survived since they acquired
advantageous mutations that diversified their function
Molecular basis of differences in substrate specificity can
be rationalised from three-dimensional structure and
understanding of the catalytic mechanism:
Specificity of trypsins for Arg or Lys residues is due to:
– Deep substrate-binding pocket that can accommodate the long Arg/Lys side chains
– Asp-189 at the bottom of the pocket, whose negative charge neutralises the
positive charge of the substrate's Arg/Lys side chain
Specificity of chymotrypsin for bulky aromatic residues due to:
– Large, hydrophobic substrate binding pocket
– Small, neutral (usually Ser) residue at position 189
Elastase has a shallower binding site:
– Residues 216 and 226 have small side chains (Gly) in trypsin/chymotrypsin
– Elastase has bulkier residues (Val, Thr) at these positions, which occlude the pocket
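The structural reasoning above can be summarised as a toy decision rule over three S1-pocket positions (chymotrypsinogen numbering). This is a deliberately crude sketch, not a predictive tool: it assumes glycine at 216 and 226 in trypsin and chymotrypsin, treats any other residue there as occluding the pocket, and lets residue 189 decide between the trypsin-like and chymotrypsin-like outcomes. The function name is hypothetical, and, as the Asp-189 to Ser experiment shows, real specificity depends on further residues.

```python
def predict_s1_preference(res189: str, res216: str, res226: str) -> str:
    """Toy rule for the P1 preference of a chymotrypsin-family protease from
    three S1-pocket residues given as one-letter codes."""
    if res216 != "G" or res226 != "G":
        # Bulkier side chains at 216/226 make the pocket shallow (elastase-like).
        return "small non-polar (Ala, Val)"
    if res189 == "D":
        # Negatively charged Asp-189 pairs with Arg/Lys side chains (trypsin-like).
        return "Arg/Lys"
    # Deep, uncharged pocket suits bulky hydrophobic side chains (chymotrypsin-like).
    return "bulky hydrophobic (Phe, Tyr)"


if __name__ == "__main__":
    print(predict_s1_preference("D", "G", "G"))  # trypsin-like pocket
    print(predict_s1_preference("S", "G", "G"))  # chymotrypsin-like pocket
    print(predict_s1_preference("S", "V", "T"))  # elastase-like pocket
```

The rule's all-or-nothing output is exactly what the real D189S trypsin mutant fails to show, which is why the notes infer additional readjustments during divergence.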
Substitution of a few, key residues can alter sequence
specificity without eliminating enzyme activity:
Replacement of Asp-189 of trypsin with a Ser residue (to mimic
chymotrypsin) greatly diminishes activity towards Lys or Arg and
increases specificity for hydrophobic substrates 10- to 50-fold
The lack of a complete switch in substrate specificity suggests that other
readjustments had to occur during the divergence of trypsin and
chymotrypsin from their common ancestor
Supports notion that new function may emerge by continual
“improvement” of function
Correlated with functional adaptation of serine proteinase
inhibitors:
Porcine elafin genes have 93-98% conservation in introns but only
60-77% similarity in exon 2, which encodes the inhibitor domain
Due to an accelerated rate of nonsynonymous substitution in exon 2: KA >> KS
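Figures such as 93-98% conservation in introns versus 60-77% in exon 2 come from comparing aligned sequences region by region. This is a minimal sketch of one common measure, percent identity over ungapped alignment columns; it assumes the sequences are already aligned and that region coordinates are supplied by the user. The function names and toy coordinates are hypothetical, and the slide's "similarity" for exon 2 may have been measured differently.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical residues over aligned columns, ignoring any
    column where either sequence has a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
                if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(a == b for a, b in compared) / len(compared)


def region_identities(seq_a: str, seq_b: str,
                      regions: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Percent identity for each named alignment region [start, end)."""
    return {name: percent_identity(seq_a[s:e], seq_b[s:e])
            for name, (s, e) in regions.items()}


if __name__ == "__main__":
    # Toy alignment: first half stands in for an intron, second for an exon.
    print(region_identities("AAAACCCC", "AAAACCGG", {"intron": (0, 4), "exon": (4, 8)}))
```

Higher identity in the introns than in the exon is the telling pattern: neutral intronic sites show the background mutation rate, so faster change in the inhibitor-encoding exon points to selection rather than to a locally elevated mutation rate.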
UDP-glucuronosyltransferases (UDPGTs)
UDPGTs detoxify hundreds of compounds by conjugation
and increasing water solubility to facilitate excretion:
In mammals, bilirubin (by-product of haem turnover) must undergo
detoxification by conjugation to glucuronic acid
Glucuronidation is carried out by a large family of UDPGTs with
different, but overlapping, substrate specificities
– Means that UDPGTs also affect the levels of several hormones
– Overproduction (through whole-gene duplication) can therefore be deleterious
UDPGTs have distinct domains serving different functions:
N-terminal globular domain which binds toxic substrate
C-terminal globular domain involved in UDP-glucuronic acid binding
Substitutions in restricted regions of N-terminal domain led to
diversification of substrate specificities
Two families of UDPGTs in mammals which differ markedly
in evolutionary strategy used for functional diversification:
UDPGT2B subfamily has evolved through “classic” process of whole
gene duplication resulting in several isoforms:
– Clustered gene family on chromosome 4
– Primary substrates include 4-hydroxyestrone and hyodeoxycholic acid
Single UDPGT1 gene complex on chromosome 2 has diversified by
duplicating only exon 1, which encodes the substrate-binding
domain:
– Human UDPGT1 gene complex has six closely related exon 1 variants
– Single set of four exons that encode the C-terminal parts of UDPGTs
– mRNAs of different isoforms produced by differential splicing of one of
the exon 1 variants onto the “constant” C-terminal exons
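The UDPGT1 arrangement behaves like a small combinatorial data structure: one choice from a set of first exons, always followed by the same downstream exons. This sketch models only that splicing logic; the exon labels "1A" to "1F" are hypothetical placeholders for the six exon 1 variants, "2" to "5" stand for the four shared exons, and no real sequences are involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UGT1Locus:
    """Toy model of a UDPGT1-style gene complex: alternative first exons
    (each encoding a different substrate-binding N-terminal domain) and one
    shared set of downstream exons (encoding the common C-terminal domain)."""
    first_exons: tuple[str, ...]
    shared_exons: tuple[str, ...]

    def isoforms(self) -> dict[str, tuple[str, ...]]:
        # Each mRNA joins exactly one first exon to the same constant exons.
        return {f"isoform_{e1}": (e1, *self.shared_exons) for e1 in self.first_exons}


locus = UGT1Locus(
    first_exons=("1A", "1B", "1C", "1D", "1E", "1F"),  # six exon 1 variants
    shared_exons=("2", "3", "4", "5"),                 # single set of four exons
)

if __name__ == "__main__":
    for name, exons in locus.isoforms().items():
        print(name, "-".join(exons))
```

Adding a substrate specificity in this model means adding one entry to `first_exons`; by contrast, the UDPGT2B route copies an entire gene, including the UDP-glucuronic acid-binding part.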
Major change of function in paralogous genes
Some members of the serine protease family (e.g.
haptoglobin, hepatocyte growth factor, azurocidins) have
lost their capacity to act as proteinases:
Have lost one or more of the residues in the catalytic triad
Have other important biological functions:
– Haptoglobin binds haemoglobin released from lysed erythrocytes
– Hepatocyte growth factor acts through specific receptor tyrosine
kinases to stimulate cell growth
– Azurocidin has bactericidal activity
Careful analysis sometimes indicates plausible pathway
for transition from one function to another
Evolution of azurocidins
Azurophil granules of neutrophils contain several proteins
implicated in the killing of microorganisms:
Serine proteases that cause degradation of connective tissues
(cathepsin G, neutrophil elastase, proteinase 3)
Azurocidin is similar to these but lacks His-57 and Ser-195 and thus
has no proteolytic activity
Bactericidal activity of azurocidin is mediated by tight binding
to anionic lipopolysaccharide, a component of the Gram-negative bacterial envelope:
Serine proteinase fold used as a scaffold for endotoxin binding
Fact that azurocidins share most recent common ancestor with
proteinases that have antibacterial activity suggests that this was a
common function of the ancestor
New function probably emerged before original function was lost
Major change of function by domain acquisition
In proteinases involved in blood coagulation, very large
segments are joined to the trypsin-homologue region:
These “nonproteinase” parts of plasma proteinases consist of
multiple structural-functional domains that were introduced by exon
shuffling
Function modified not only by point mutations but also by domain
insertions and duplications
Proteinase domains retained proteolytic activity but point mutations
led to an altered (usually narrower) sequence specificity
Value of domain-acquisition mutations is that they can
endow novel binding specificities and lead to dramatic
changes in regulation and targeting
Modular structure of blood coagulation and fibrinolytic proteinases
[Figure: module organisation of plasminogen, protein C, factor IX, factor X, prothrombin, tissue-type plasminogen activator and urokinase; module types shown: kringle, growth factor, vitamin K-dependent calcium-binding and finger modules]
Domain acquisition in the evolution of plasma proteinases
Selective value of domains joined to proteinase domain
illustrated by fact that they are usually involved in
interactions with cofactors, substrates or inhibitors:
Vitamin K-dependent calcium-binding domains of prothrombin,
coagulation factors VII, IX, X and protein C anchor proteinases to
phospholipid membranes ensuring proper regulation of cascade
Kringle domains of plasmin and plasminogen are critical for binding
of proteinase to its primary substrate, fibrin:
– Serine proteinase domain has proteinase specificity very similar to that
of trypsin (Lys-X and Arg-X)
– Fibrin specificity is due to the kringle domains, which have specific fibrin-binding sites that target the enzyme to fibrin
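The modular picture of the plasma proteinases named above can be encoded as simple domain architectures and queried, for example to list which enzymes carry the membrane-anchoring vitamin K-dependent calcium-binding (Gla) module. This is a simplified sketch: the compositions are written N- to C-terminus and omit signal peptides, propeptides, activation peptides and the N-terminal domain of plasminogen; "EGF" stands for the growth factor module and "protease" for the trypsin-homologous domain. The dictionary and function names are hypothetical.

```python
# Simplified module compositions, N- to C-terminus.
ARCHITECTURES = {
    "prothrombin": ["Gla", "kringle", "kringle", "protease"],
    "factor IX": ["Gla", "EGF", "EGF", "protease"],
    "factor X": ["Gla", "EGF", "EGF", "protease"],
    "protein C": ["Gla", "EGF", "EGF", "protease"],
    "plasminogen": ["kringle"] * 5 + ["protease"],
    "tissue-type plasminogen activator": ["finger", "EGF", "kringle", "kringle", "protease"],
    "urokinase": ["EGF", "kringle", "protease"],
}


def carriers(module: str) -> list[str]:
    """Proteinases (alphabetical) that contain at least one copy of `module`."""
    return sorted(name for name, modules in ARCHITECTURES.items() if module in modules)


if __name__ == "__main__":
    for module in ("Gla", "kringle", "EGF", "finger"):
        print(f"{module}: {', '.join(carriers(module))}")
```

Every entry ends in the same protease domain while the upstream modules vary, mirroring the point that domain acquisition changed targeting and regulation rather than the catalytic machinery itself.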
Similarities and differences in the evolution of paralogous and orthologous proteins
Common protein folds are conserved in both paralogues
and orthologues and structural elements generally accept
mutations at similar rates between the two
One difference is that orthologous proteins are likely to
fulfil very similar functions in different species whereas
paralogous proteins are more likely to have diversified in
function:
When comparing orthologous proteins, residues that are critical for
structure, function and specificity are equally likely to be conserved
When comparing paralogous proteins that fulfil different functions,
only residues essential for structure are likely to be conserved
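The contrast between orthologue and paralogue comparisons can be checked by scoring conservation column by column in a multiple alignment and comparing the sets of invariant positions. This is a minimal sketch assuming gap-free alignments of equal length; the function names and the toy alignment are hypothetical, and real analyses usually also weight similar amino acids rather than counting identity only.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences carrying the most common residue in each
    column of a gap-free multiple alignment."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [Counter(column).most_common(1)[0][1] / len(alignment)
            for column in zip(*alignment)]


def invariant_positions(alignment: list[str]) -> list[int]:
    """0-based columns that are identical in every sequence."""
    return [i for i, score in enumerate(column_conservation(alignment)) if score == 1.0]


if __name__ == "__main__":
    toy = ["ACDE", "ACDF", "GCDE"]  # hypothetical aligned fragments
    print(column_conservation(toy))
    print(invariant_positions(toy))
```

Run on a set of orthologues and then on a set of functionally diverged paralogues of the same fold, the expectation from these notes is a smaller invariant set for the paralogues, dominated by structurally essential residues.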