Phylogeny of the Human Protein Tyrosine Kinases
School of Medical Education
Liverpool, L69 3GE, UK
Dr John Smith
Abstract
The tyrosine kinases form a well-conserved family of enzymes with a high degree of sequence homology and well-established relationships. This makes it possible to reconstruct the pathway of their evolution. A possible sequence for the prototype tyrosine kinase has been constructed by working backwards from the sequences of existing enzymes. The sequences inferred for the intermediate ancestors will aid study of their functional and developmental relationships.
Introduction
The tyrosine kinase family of enzymes is largely, though not entirely, restricted to the metazoa¹ and is involved in intercellular signalling pathways. It is of interest because its catalytic domain is highly conserved and the family has a relatively large number of members². Together, these properties make it possible to reconstruct the evolutionary pathway of the family. The stringent positional requirements for catalytic activity have produced highly conserved regions (http://ca.expasy.org/cgi-bin/nicedoc.pl?PDOC00100 accessed 10/5/05) in which change is so slow that much of the evolutionary pathway may be inferred. The question addressed here is: how much information about the evolution of the protein tyrosine kinases is preserved in existing sequences?
Fig3: Frequency of Mutation at Each Amino Acid Site. For each
location, the number of mutations observed during the evolution of
the kinases is recorded. The maximum number of observable
mutations that could take place at a site is 176.
Methods
Protein tyrosine kinase domains were selected from the Swiss-Prot database (http://www.ncbi.nlm.nih.gov/entrez/) and grouped into families on the basis of sequence homology. These groupings corresponded to the families defined by extracellular domain structure³. For convenience, a family tree available on a commercial website was used (http://www.cellsignal.com/reference/kinase/tk.asp accessed 6/03).
Each branch point was assumed to represent a gene duplication event. The immediate ancestral gene, as it was at the time of duplication, was given a name (fig 1). Its sequence was determined as the consensus of its progeny. Where the progeny differed, its nearest neighbour was used as an outgroup to decide which amino acid was the original (‘x’ was used where this could not be determined). This required the amino acid sequences of the gene products to be aligned. Sequences were ‘piled up’ to locate conserved stretches and variable inserts. Initially, the Clustal alignment from the NCBI Conserved Domain Database for kinases (http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi) was used to give each amino acid a position number in the longest aggregate sequence. Manual adjustments were made where necessary to improve the fit.
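The per-site rule above can be sketched in Python. This is a minimal illustration only; the original reconstruction was not described as automated. The sketch assumes the three sequences are already aligned to equal length. The function name `infer_ancestor` and the toy sequences are hypothetical.

```python
def infer_ancestor(child_a: str, child_b: str, outgroup: str) -> str:
    """Infer a parent sequence from two aligned progeny and an outgroup.

    At each aligned site:
      * if both progeny agree, their shared residue is taken as ancestral;
      * if they differ, the residue that matches the outgroup is taken;
      * otherwise the site is marked 'x' (undetermined).
    """
    if not (len(child_a) == len(child_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    parent = []
    for a, b, o in zip(child_a, child_b, outgroup):
        if a == b:
            parent.append(a)
        elif a == o:
            parent.append(a)
        elif b == o:
            parent.append(b)
        else:
            parent.append("x")
    return "".join(parent)


# Made-up fragments: the L/V site is resolved by the outgroup, the A/R site is not.
print(infer_ancestor("HRDLAARN", "HRDVRARN", "HRDLKARN"))  # HRDLxARN
```

In this toy case the fourth site (L in one child, V in the other) takes L because the outgroup has L. The fifth site (A/R, outgroup K) cannot be resolved and becomes ‘x’.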
For each amino acid position, an evolutionary tree was constructed using successive neighbours, or sequences derived from them, as outgroups. The final stem sequence (S1) was first rooted with M3K9 as outgroup. It was then refined using a TKL stem sequence derived in the same way, with S1 of the TKL family as the final outgroup.
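Applying the per-site rule recursively over a rooted binary tree might look like the sketch below. It is a hedged illustration under several assumptions:

- The tree is given as nested pairs of leaf names.
- Each neighbouring lineage used as an outgroup is approximated by a strict consensus of that subtree (agreeing residues kept, others ‘x’). This approximation is hypothetical and only keeps the example short.
- The root is resolved against an external outgroup, playing the role that M3K9 or the TKL stem plays in the text.

All names and sequences are hypothetical.

```python
from typing import Dict, Tuple, Union

# A rooted binary tree: either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def resolve_site(a: str, b: str, o: str) -> str:
    """Shared residue if the progeny agree, else the one matching the outgroup, else 'x'."""
    if a == b:
        return a
    if a == o:
        return a
    if b == o:
        return b
    return "x"


def strict_consensus(tree: Tree, seqs: Dict[str, str]) -> str:
    """Outgroup-free estimate of a subtree's ancestor: agreeing residues kept, others 'x'."""
    if isinstance(tree, str):
        return seqs[tree]
    left, right = (strict_consensus(t, seqs) for t in tree)
    return "".join(a if a == b else "x" for a, b in zip(left, right))


def reconstruct(tree: Tree, seqs: Dict[str, str], outgroup: str) -> str:
    """Ancestral sequence at the root of `tree`, resolved against `outgroup`.

    Each child is first reconstructed using its sibling (its nearest neighbour)
    as outgroup; the two results are then combined site by site.
    """
    if isinstance(tree, str):
        return seqs[tree]
    left, right = tree
    left_seq = reconstruct(left, seqs, strict_consensus(right, seqs))
    right_seq = reconstruct(right, seqs, strict_consensus(left, seqs))
    return "".join(resolve_site(a, b, o) for a, b, o in zip(left_seq, right_seq, outgroup))


# Made-up aligned fragments for four extant kinases and one external outgroup.
seqs = {"A": "HRDLA", "B": "HRDLR", "C": "HRDVA", "D": "HKDVA"}
tree = (("A", "B"), ("C", "D"))
print(reconstruct(tree, seqs, outgroup="HRDLK"))  # HRDLA
```

Recomputing the sibling consensus at every node keeps the code simple. For a tree of roughly a hundred kinases that cost is negligible, although the results could be cached.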
Where ‘x’s accumulated, a tentative assignment was made by looking for amino acids that appeared in progeny on both sides of a divide. Finally, the tree requiring the fewest mutations overall was selected. Where alternatives were equally parsimonious, it was assumed that the same mutation had occurred twice during the development of the family, rather than a forward mutation that was subsequently reversed. Only amino acid positions present in essentially all the sequences were used. Insertions and deletions were treated by the same rules used to decide parent amino acids.
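Counting the minimum number of mutations a given tree requires can be illustrated with the standard Fitch small-parsimony algorithm. It is swapped in here as an illustration; the source does not say which procedure was used to compare trees. The sketch scores each aligned site independently and sums the results. It has two limitations:

- It does not implement the tie-breaking rule above (preferring a repeated mutation to a reversal).
- It treats ‘x’ as an ordinary residue.

The names and toy sequences are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, residues: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small parsimony for one aligned site.

    Returns the candidate residue set at the root of `tree` and the minimum
    number of substitutions needed to explain the residues at its leaves.
    """
    if isinstance(tree, str):
        return {residues[tree]}, 0
    (left_set, left_cost), (right_set, right_cost) = (fitch_site(t, residues) for t in tree)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No residue in common: one additional substitution is counted below this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, seqs: Dict[str, str]) -> int:
    """Minimum total substitutions over all sites of equal-length aligned sequences."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )


# Made-up aligned fragments; compare two alternative topologies.
seqs = {"A": "HRDLA", "B": "HRDLR", "C": "HRDVA", "D": "HKDVA"}
print(tree_length((("A", "B"), ("C", "D")), seqs))  # 3
print(tree_length((("A", "C"), ("B", "D")), seqs))  # 4
```

For these four made-up sequences, pairing A with B and C with D needs three substitutions. Pairing A with C and B with D needs four, so the first topology would be preferred.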
Results
Fig4: Effect of Number of Gene Duplication Events on Final
Evolutionary Distance. The total number of amino acid
differences of each kinase domain from the deduced stem
sequence is plotted against the number of gene duplication
events in its ancestry. A positive correlation is observed, with a
slope of approximately 5 extra mutations per duplication event.
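The figure of approximately 5 mutations per event is the slope of the fitted line. The source does not state how the line was fitted. Assuming ordinary least squares, the slope could be computed as below, with x the number of gene duplication events in a kinase's ancestry and y its number of differences from the stem sequence. The function name is hypothetical. The values in the example are synthetic, chosen to give a slope of exactly 5, and are not the study's data.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data only: duplication events vs. differences from the stem.
print(ols_fit([2, 4, 6, 8], [110, 120, 130, 140]))  # (5.0, 100.0)
```

Python 3.10 and later also provides `statistics.linear_regression`, which gives the same fit.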
Discussion
When sequences of extant proteins are aligned, some alignments are tentative; manual ‘tidying’ is often necessary.
Deriving a stem sequence.
A publicly available family tree showing the sequence similarity between protein kinase domains was used as the basis for the reconstruction described in Methods. The tree was derived from public sequences and gene prediction methods detailed elsewhere⁴. Each branch point, representing the terminal state of a gene product before gene duplication, was given a name. The sequences of the immediate precursors of existing gene products were deduced as described in Methods. These sequences were in turn used to deduce the sequences of their ancestors. In this way a putative stem sequence for the protein tyrosine kinases was derived (fig 1).
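The colour coding in the Fig1 caption grades how well each stem residue is supported by the sequences derived from it. As a hedged sketch, the classification for a single site could be expressed as follows. It assumes a residue counts as "present" when it is identical to the stem residue at that site. The function name, its argument layout and the label "unmarked" (for residues meeting none of the caption's three criteria) are hypothetical.

```python
from typing import Sequence


def fig1_category(stem: str, child_a: str, child_b: str,
                  below_a: Sequence[str], below_b: Sequence[str]) -> str:
    """Classify one stem residue using the colour rules of the Fig1 caption.

    stem:    residue assigned to the stem sequence at this site
    child_a: residue in the first immediately derived sequence
    child_b: residue in the second immediately derived sequence
    below_a: residues at this site in all sequences derived from child_a
    below_b: residues at this site in all sequences derived from child_b
    """
    derived = [child_a, child_b, *below_a, *below_b]
    if all(r == stem for r in derived):
        return "bold black"  # invariant in every derived sequence
    if child_a == stem and child_b == stem:
        return "red"  # present in both immediately derived sequences
    if (child_a == stem and stem in below_b) or (child_b == stem and stem in below_a):
        return "blue"  # in one immediate child and somewhere below the other
    return "unmarked"  # none of the caption's criteria met


print(fig1_category("K", "K", "R", ["K"], ["R", "K"]))  # blue
```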
Refining the tree.
The number of amino acid changes between each sequence and its progeny allowed lengths to be assigned to each branch of the tree, putatively giving a relative time scale to the phylogenetic chart (fig 2). Some amino acid positions were clearly more variable than others (fig 3). In particular, the inserted stretch between amino acids 94 and 95 (in part of protein tyrosine kinase sub-family D; not shown) was so variable that the parent sequences could not be derived with any certainty. This region was therefore not used to calculate branch lengths. The overall distance between the stem origin and each final sequence increased with the number of notional gene duplications involved in its derivation (fig 4). The slope of the correlation corresponded to approximately 5 amino acids per additional gene duplication.
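A branch length in this sense is a count of amino acid differences between a parent sequence and one of its progeny. The sketch below makes three assumptions:

- The sequences are aligned to equal length.
- Sites where either residue is ‘x’ are skipped.
- A set of 0-based column indices can be excluded, standing in for the variable insert between positions 94 and 95.

Gap characters are counted like any other residue. Names and inputs are hypothetical.

```python
from typing import Iterable


def branch_length(parent: str, child: str, excluded: Iterable[int] = ()) -> int:
    """Count amino acid changes between a parent sequence and one of its progeny.

    Sites where either residue is undetermined ('x') are skipped, as are the
    0-based alignment columns listed in `excluded`.
    """
    if len(parent) != len(child):
        raise ValueError("sequences must be aligned to equal length")
    skip = set(excluded)
    return sum(
        1
        for i, (p, c) in enumerate(zip(parent, child))
        if i not in skip and p != c and "x" not in (p, c)
    )


# Made-up fragments: L/V and A/R differ; the final N/x site is not counted.
print(branch_length("HRDLAARN", "HRDVRARx"))                # 2
print(branch_length("HRDLAARN", "HRDVRARx", excluded={4}))  # 1
```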
There is evidence of multiple changes at some loci (fig 4). For example, TrC shows 240 mutations in the course of its evolution from S1 but differs from it at only 124 amino acids. However, if the constancy of certain amino acid sequences indicates their functional consistency⁵,⁶, then the least certain amino acid assignments are also the least important.
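The gap between 240 mutations and 124 net differences arises because mutations are counted branch by branch along the lineage. Repeated changes at the same site each add to that count, while a direct comparison of the endpoints registers the site at most once. A minimal sketch follows. It assumes the lineage is supplied as a list of aligned sequences running from S1 down to the extant kinase. The names and sequences are hypothetical.

```python
from typing import List, Tuple


def differences(a: str, b: str) -> int:
    """Sites at which two aligned sequences differ, ignoring undetermined ('x') sites."""
    return sum(1 for p, c in zip(a, b) if p != c and "x" not in (p, c))


def path_vs_net(lineage: List[str]) -> Tuple[int, int]:
    """Return (mutations summed over successive branches, net differences end to end)."""
    steps = sum(differences(a, b) for a, b in zip(lineage, lineage[1:]))
    return steps, differences(lineage[0], lineage[-1])


# Site 4 changes twice (L -> V -> I): it adds 2 to the path count but only 1 to the net count.
print(path_vs_net(["HRDLA", "HRDVA", "HRDIA", "HKDIA"]))  # (3, 2)
```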
Sequences could be refined by using multiple species, to avoid the effect of modern ‘noise’ (recent mutation). The common DNA sequence at the branch point of the mammals is claimed to be discernible with about 98% accuracy at the nucleotide level, even in non-coding regions⁷. The further back the reconstruction reaches, the more helpful other ‘primitive’ species would be.
However, ‘primitive’ species, being smaller, tend to have shorter generation times and hence faster evolutionary change. C. elegans, for example, is primitive in generally having only one member of each tyrosine kinase subfamily, but its sequences themselves are more derived.
The putative sequences of the intermediate and stem tyrosine kinases will allow those molecules to be constructed and, where appropriate, attached to modern external domains. This should provide insight into their former functions. It could also confirm functional inferences that would otherwise have to rely on statistical prediction⁶. This is of interest for interpreting the role of tyrosine kinases in the evolution of multicellular and tissue interactions and in embryonic development. It also bears on their effects when they are inappropriately expressed in cancer.
References
1 King, N. & Carroll, S.B. A receptor tyrosine kinase from choanoflagellates: Molecular insights into early animal evolution. PNAS 98, 15032-15037 (2001).
2 Manning, G., Plowman, G.D., Hunter, T. & Sudarsanam, S. Evolution of protein kinase signaling from yeast to man. Trends in Biochemical Sciences 27, 514-520 (2002).
3 Fantl, W.J., Johnson, D.E. & Williams, L.T. Signalling by receptor tyrosine kinases. Ann. Rev. Biochem. 62, 453-481 (1993).
4 Manning, G., Whyte, D.B., Martinez, R., Hunter, T. & Sudarsanam, S. The Protein Kinase Complement of the Human Genome. Science 298, 1912-1934 (2002).
5 Gu, X. Statistical Methods for Testing Functional Divergence after Gene Duplication. Mol. Biol. Evol. 16, 1664-1674 (1999).
6 Gu, J. & Gu, X. Natural History and Functional Divergence of Protein Tyrosine Kinases. Gene 317, 49-57 (2003).
7 Blanchette, M., Green, E.D., Miller, W. & Haussler, D. Reconstructing large regions of an ancestral mammalian genome in silico. Genome Research 14, 2412-2423 (2004).
Fig1: The Putative Stem Sequence for the Protein Tyrosine
Kinases. Amino acids shown in bold black face are invariant
in all derived sequences. Amino acids shown in red are
present in both immediately derived sequences. Those in
blue are present in one immediately derived sequence and in
a sequence derived from the other immediately derived
sequence.
Additional Information
The full set of derived sequences for each ancestral protein may be found at:
http://pcwww.liv.ac.uk/~jasmith/kinases.htm
Fig2: Graded Evolutionary Tree of the Human Tyrosine Kinases. The kinase
domains, as labelled in Fig1, are plotted according to their evolutionary
distance from their respective ancestral forms, as measured by the number of
mutations observed.