Download locating domains

Document related concepts

Epitranscriptome wikipedia , lookup

SR protein wikipedia , lookup

Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Expression vector wikipedia , lookup

Genetic code wikipedia , lookup

Biochemistry wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Magnesium transporter wikipedia , lookup

Gene expression wikipedia , lookup

Point mutation wikipedia , lookup

G protein–coupled receptor wikipedia , lookup

Interactome wikipedia , lookup

Metalloprotein wikipedia , lookup

Protein purification wikipedia , lookup

Protein wikipedia , lookup

Western blot wikipedia , lookup

Proteolysis wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Protein–protein interaction wikipedia , lookup

Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Transcript
Protein Structure Prediction
• Sequence database searching
• Domain assignment
• Multiple sequence alignment
• Comparative or homology modeling
• Secondary structure prediction
Homologous Proteins
• The term of homology as used in a biological context is
defined as similarity of structure, physiology, development
and evolution of organisms based upon common genetic
factors.
• The statement that two proteins are homologous implies that
their genes have evolved from a common ancestral gene.
Usually they might have similar functions.
• Two proteins are considered to be homologous when they
have identical amino acid residues in a significant number of
sequential positions along the polypeptide chains (> 30 %).
• Homologous proteins have conserved structural cores and
variable loop regions.
• Known structures of 32 pairs
of homologous proteins such
as globins, serine proteinases,
and immunoglobulin domains
have been compared. The
root mean square deviation of
the main-chain atoms of the
core regions is plotted as a
function of amino acid
homology. The curve
represents the best fit of the
dots to an exponential
function. Pairs with high
sequence homology are
almost identical in threedimensional structure,
whereas deviations in atomic
positions for pairs of low
homology are on the order of
2 Å.
The Divergence of Amino-acid
Sequence and 3D Structure for
the Core Region of Homologous
Proteins
A Generalized Approach to Predicting Protein Structure
•
•
•
•
•
•
•
•
•
•
Relevant experimental data
Sequence data/preliminary analysis
Sequence Database searching
Domain assignment
Multiple sequence alignment
Comparative or homology modeling
Secondary structure prediction
Fold Recognition
Analysis of folds and alignment of
secondary structures
Sequence to structure alignment
Flow
Chart
• This flowchart
assumes that the
protein is
soluble, likely
comprises a
single domain,
and does not
contain nonglobular regions.
Experimental Data
Much experimental data can aid the structure prediction process.
Some of these are listed below:
• Disulphide bonds, which provide tight restraints on the
location of cysteines in space
• Spectroscopic data, which can give ideas as to the secondary
structure content of the protein
• Site-directed mutagenesis studies, which can give insights as
to residues involved in active or binding sites
• Knowledge of proteolytic cleavage sites, post-translational
modifications, such as phosphorylation or glycosylation can
suggest residues that must be accessible, etc.
• Remember to keep all of the available data in mind when
doing predictive work. Always ask whether a prediction
agrees with the results of experiments. If not, then it may be
necessary to modify what has been completed.
Protein Sequence Data
• There is some value in doing some initial analysis on the
protein sequence. If a protein has come (for example)
directly from a gene prediction, it may consist of multiple
domains. More seriously, it may contain regions that are
unlikely to be globular, or soluble.
• Is the protein a transmembrane protein, or does it contain
transmembrane segments? There are many methods for
predicting these segments, including:
•
TMAP (EMBL)
http://www.mbb.ki.se/tmap/index.html
•
PredictProtein (EMBL/Columbia)
http://dodo.cpmc.columbia.edu/predictprotein/
•
TMHMM (CBS, Denmark)
•
TMpred (Baylor College)
•
DAS (Stockholm)
http://www.mbb.ki.se/tmap/index.html
COILS - Prediction of Coiled Coil
Regions in Proteins
• Does the protein contain coiled-coils? Prediction of coiled
coils can be completed at the COILS server or by
downloading the COILS program.
http://www.ch.embnet.org/software/COILS_form.html
• COILS is a program that compares a sequence to a
database of known parallel two-stranded coiled-coils and
derives a similarity score. By comparing this score to the
distribution of scores in globular and coiled-coil proteins,
the program then calculates the probability that the
sequence will adopt a coiled-coil conformation.
COILS was described in:
Lupas, A., Van Dyke, M., and Stock, J. (1991) Predicting Coiled
Coils from Protein Sequences, Science 252:1162-1164.
Does the Protein Contain
Regions of Low Complexity?
• Proteins frequently contain runs of poly-glutamine or polyserine, which do not predict well. To check for this the
program SEG (a version of SEG is also contained within
the GCG suite of programs) can be employed.
ftp://ftp.ncbi.nlm.nih.gov/pub/seg/seg/
• If the answer to any of the above questions is yes,
then it is worthwhile trying to break the sequence
into pieces or ignore particular sections of the
sequence, etc. This is related to the problem of
locating domains.
Multiple Sequence Alignment
•
•
•
•
Alignments can provide:
Information to protein domain structure
The location of residues likely to be involved in
protein function
Information of residues likely to be buried in
the protein core or exposed to solvent
More information on a single sequence for
applications like homology modeling and
secondary structure prediction.
Sequence Database Searching
• The most obvious first stage in the analysis of any
new sequence is to perform comparisons with
sequence databases to find homologues. These
searches can now be performed just about
anywhere and on just about any computer. In
addition, there are numerous web servers for doing
searches, where one can post or paste a sequence
into the server and receive the results interactively.
Sequence Database Searching
• There are many methods for sequence searching. By far
the most well known are the BLAST suite of programs.
One can easily obtain versions to run locally (either at
NCBI or Washington University), and there are many web
pages that permit one to compare a protein or DNA
sequence against a multitude of gene and protein sequence
databases. To name just a few:
• National Center for Biotechnology Information (USA)
Searches
– http://www.ncbi.nlm.nih.gov/BLAST/
• European Bioinformatics Institute (UK) Searches
– http://www2.ebi.ac.uk/
• BLAST search through SBASE (domain database; ICGEB,
Trieste)
BLAST
• One of the most important advances in sequence
comparison recently has been the development of both
gapped BLAST and PSI-BLAST (position specific
interated BLAST).
• Both of these have made BLAST much more sensitive, and
the latter is able to detect very remote homologues by
taking the results of one search, constructing a profile and
then using this to search the database again to find other
homologues (the process can be repeated until no new
sequences are found).
• It is essential that one compares any new protein sequence
to the database with PSI-BLAST to see if known structures
can be found prior to doing any of the other methods
discussed in the next sections.
Sequence Database Searching
Other methods for comparing a single sequence to a
database include:
• The FASTA suite (William Pearson, University of
Virginia, USA)
– http://alpha10.bioch.virginia.edu/fasta/
• SCANPS (Geoff Barton, European Bioinformatics
Institute, UK)
– http://barton.ebi.ac.uk/new/software.html
• BLITZ (Compugen's fast Smith Waterman search)
– http://www2.ebi.ac.uk/bic_sw/
Multiple Sequence Database Searching
• It is also possible to use multiple sequence information to
perform more sensitive searches. Essentially this involves
building a profile from some kind of multiple sequence
alignment. A profile essentially gives a score for each type
of amino acid at each position in the sequence, and
generally makes searches more sensitive.
Tools for doing this include:
• PSI-BLAST (NCBI, Washington)
• ProfileScan Server (ISREC, Geneva)
– http://www.isrec.isb-sib.ch/software/PFSCAN_form.html
• HMMER Hidden Markov Model searching (Sean Eddy,
Washington University)
– http://hmmer.wustl.edu/
• Wise package (Ewan Birney, Sanger Centre; this is for
protein versus DNA comparisons) and several others.
– http://www.sanger.ac.uk/Software/Wise2/
Multiple Sequence Searching Using a Motif
• A different approach for incorporating multiple sequence
information into a database search is to use a MOTIF. Instead
of giving every amino acid some kind of score at every
position in an alignment, a motif ignores all but the most
invariant positions in an alignment, and just describes the key
residues that are conserved and define the family. Sometimes
this is called a "signature".
• For example, "H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)[DE]" describes a family of DNA binding proteins. It can be
translated as "histidine, followed by either phenylalanine or
tryptophan, followed by any amino acid (x), followed by
leucine, isoleucine, valine or methionine, followed by any
amino acid (x), followed by glycine, . . . [etc.]".
Multiple Sequence Searching Using a Motif
• PROSITE (ExPASy Geneva) contains a huge number of such
patterns, and several sites allow you to search these data:
ExPASy
http://www.expasy.ch/tools/scnpsite.html
EBI
http://www2.ebi.ac.uk/ppsearch/
• It is best to search a few different databases in order to find as
many homologues as possible. A very important thing to do,
and one which is sometimes overlooked, is to compare any
new sequence to a database of sequences for which 3D
structure information is available. Whether or not the
sequence is homologous to a protein of known 3D structure is
not obvious in the output from many searches of large
sequence databases. Moreover, if the homology is weak, the
similarity may not be apparent at all during the search
through a larger database.
• One can save a lot of time by making use of pre-prepared
protein alignment.
Web sites for Performing Multiple Alignment
• EBI (UK) Clustalw Server
– http://www2.ebi.ac.uk/clustalw/
• IBCP (France) Multalin Server
– http://www.ibcp.fr/multalin.html
• IBCP (France) Clustalw Server
• IBCP (France) Combined Multalin/Clustalw
• MSA (USA) Server
– http://www.ibc.wustl.edu/ibc/msa.html
• BCM Multiple Sequence Alignment ClustalW Sever
– http://dot.imgen.bcm.tmc.edu:9331/multi-align/Options/clustalw.html
Some Tips for Sequence Alignment
• Don't just take everything found in the searches and feed them
directly into the alignment program. Searches will almost
always return matches that do not indicate a significant
sequence similarity. Look through the output carefully and
throw things out if they don't appear to be a member of the
sequence family. Inclusion of non-members in the alignment
will confuse things and likely lead to errors later.
• Remember that the programs for aligning sequences aren't
perfect, and do not always provide the best alignment. This is
particularly so for large families of proteins with low sequence
identities. If a better way of aligning the sequences is
discovered, then by all means edit the alignment manually.
Locating Domains
• If the sequence has more than about 500 amino acids, it is
almost certain that it will be divided into discrete
functional domains. If possible, it is preferable to split
such large proteins up and consider each domain
separately. One can predict the location of domains in a
few different ways. The methods below are given
(approximately) from the most to the least confident.
• If homology to other sequences occurs only over a portion
of the probe sequence and the other sequences are whole
(i.e. not partial sequences), then this provides the strongest
evidence for domain structure. Either complete database
searches or make use of pre-defined databases of protein
domains. Searches of these databases (see links below)
will often assign domains easily.
Locating domains
• Regions of low-complexity often separate domains in
multi-domain proteins. Long stretches of repeated
residues, particularly Proline, Glutamine, Serine or
Threonine often indicate linker sequences and are usually a
good place to split proteins into domains.
• Low complexity regions can be defined using the program
SEG which is generally available in most BLAST
distributions or web servers.
• Transmembrane segments are also very good dividing
points, since they can easily separate extracellular from
intracellular domains.
Locating Domains
• Something else to consider are the presence of coiled-coils.
These unusual structural features sometimes (but not always)
indicate where proteins can be divided into domains.
• Secondary structure prediction methods will often predict regions
of proteins to have different protein structural classes. For
example, one region of a sequence may be predicted to contain
only a helices and another to contain only b sheets. These can
often, though not always, suggest likely domain structure.
• If a sequence has been separated into domains, then it is very
important to repeat all the database searches and alignments
using the domains separately. Searches with sequences
containing several domains may not find all sub-homologies,
particularly if the domains are abundant in the database (e.g.
kinases, SH2 domains, etc.).
Domain Assignment
Locating Domains by Web Sites
• SMART (Oxford/EMBL)
– http://smart.embl-heidelberg.de/
• PFAM (Sanger Center/Wash-U/Karolinska Intitutet)
– http://www.sanger.ac.uk/Software/Pfam/search.shtml
• COGS (NCBI)
• PRINTS (UCL/Manchester)
• BLOCKS (Fred Hutchinson Cancer Research Center,
Seattle)
– http://blocks.fhcrc.org/blocks/blocks_search.html
• SBASE (ICGEB, Trieste)
• Domain descriptions can also be located in the
annotations in SWISSPROT.
•
P68 RNA Helicase
ssyssdrdr grdrgfgapr fggsrtgpls
lvkkkwnlde lpkfeknfyq ehpdlarrta
eitvrghncp kpvlnfyean fpanvmdvia
qaqgwpvals gldmvgvaqt gsgktlsyll
flergdgpic lvlaptrela qqvqqvaaey
iyggapkgpq irdlergvei ciatpgrlid
rttylvldea drmldmgfep qirkivdqir
twpkevrqla edflkdyihi nigalelsan
hdvekdekli rlmeeimsek enktivfvet
mrrdgwpamg ihgdksqqer dwvlnefkhg
asrgldvedv kfvinydypn ssedyihrig
aytfftpnni kqvsdlisvl reanqainpk
grsrgrggmk ddrrdrysag krggfntfrd
llkrdfgakt qngvysaany tngsfgsnfv
gnptgtyqng ydstqqygsn vanmhngmnq
apmigypmpt gysq 614 aa
f015812 (Genebank)
gkkfgnpgek
qevdtyrrsk
rhnfteptai
paivhinhhp
cracrlkstc
flecgktnlr
pdrqtlmwsa
hnilqivdvc
krrcdeltrk
kapiliatdv
rtarstktgt
llqlvedrgs
renydrgysn
sagiqtsfrt
qayaypvpqp
Sequence Alignment of p68 to
DEAD Proteins
Walker A
AXTGSGKT
DEAD
SAT
Walker A motif for ATP binding
ATP binding, ATP hydrolysis
Transmission energy from ATP to unwind RNA
P68 RNA Helicase
RGG
repeats
Putative
HTH
30
80
435
130
477
504
IQ
motif
554
560 614
p68
N-terminal
C-terminal
Helicase-core
ATPase
DEAD AXXGXGKT
I
GXGKT
RNA binding
RGXD HRIGRXXR
PTRELA
GG
TPGR
DEAD
SAT
Ia
Ib
II
III
ATPase B
IV
ATPase A
DEXH
Helicase
PXRXXA
TXGX
DEXH
S/TAT
V
VI
XRXGRXXR
Figure 1. Top panel is the domain structure of p68 RNA helicase. The numbers represent amino acid
residue number. Bottom panels are the RNA helicase-core region of DEAD-box and DEXH-box
proteins. The conserved sequence motifs and their putative functions are indicated.
Comparative or Homology Modeling
• If the protein sequence shows significant
homology to another protein of known threedimensional structure, then a fairly accurate model
of the protein 3D structure can be obtained via
homology modeling.
• It is also possible to build models if one has found
a suitable fold via fold recognition and is satisfied
with the alignment of sequence to structure (Note
that the accuracy of models constructed in this
manner has not been assessed properly, so treat
with caution).
Comparative or Homology Modeling
• It is possible now to generate models automatically using the
very useful SWISSMODEL server. It is possible to send in a
protein sequence only when the degree of sequence
homology is high (50% or greater). It is best, particularly if
one has edited an alignment, to send an alignment directly to
the server.
– http://www.expasy.ch/swissmod/SWISS-MODEL.html
Some other sites useful for homology modeling include:
• WHAT IF (G. Vriend, EMBL, Heidelberg)
– http://www.cmbi.kun.nl/whatif/
• MODELLER (A. Sali, Rockefeller University)
– http://guitar.rockefeller.edu/modeller/modeller.html
• MODELLER Mirror FTP site
Swiss-Model of P68 Based on EIF-4A
DEAD
SAT
Walker A
AQSGTGKT
• EIF-4A is the initiation factor (1QAV) with 1.8 Å resolution.
Methods for Single Sequences
• Secondary structure prediction has been around for almost a
quarter of a century. The early methods suffered from a lack
of data. Predictions were performed on single sequences
rather than families of homologous sequences, and there were
relatively few known 3D structures from which to derive
parameters. Probably the most famous early methods are
those of Chou & Fasman, Garnier, Osguthorbe & Robson
(GOR) and Lim.
• Although the authors originally claimed quite high accuracies
(70 - 80 %), under careful examination, the methods were
shown to be only between 56 and 60 % accurate (Kabsch &
Sander, 1984). An early problem in secondary structure
prediction had been the inclusion of structures used to derive
parameters in the set of structures used to assess the accuracy
of the method.
Methods for Single Sequences
Early methods on single sequences
• Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
• Lim, V.I. (1974). Journal of Molecular Biology, 88, 857-872.
• Garnier, J., Osguthorpe, D.~J. \& Robson, B. (1978).Journal of
Molecular Biology, 120, 97-120.
• Kabsch, W. & Sander, C. (1983). FEBS Letters, 155, 179-182. (An
assessment of the above methods)
Later methods on single sequences
• Deleage, G. & Roux, B. (1987). Protein Engineering , 1, 289-294
(DPM)
• Presnell, S.R., Cohen, B.I. & Cohen, F.E. (1992). Biochemistry, 31,
983-993.
• Holley, H.L. & Karplus, M. (1989). Proceedings of the National
Academy of Science, 86, 152-156.
• King, R. & Sternberg, M. J.E. (1990). Journal of Molecular Biology,
216, 441-457.
• D. G. Kneller, F. E. Cohen & R. Langridge (1990) Improvements in
Protein Secondary Structure Prediction by an
• Enhanced Neural Network, Journal of Molecular Biology, 214, 171182. (NNPRED)
Assignment of Amino Acids
Frequency of Occurrence of Amino Acids in the b Turns
Secondary Structure Prediction Methods & Links
There are now many web servers for structure prediction, here is a
quick summary:
• PSI-pred (PSI-BLAST profiles used for prediction; David Jones,
Warwick)
• JPRED Consensus prediction (Cuff & Barton, EBI)
– http://barton.ebi.ac.uk/servers/jpred.html
• PREDATORFrischman & Argos (EMBL)
– http://www.embl-heidelberg.de/cgi/predator_serv.pl
• PHD home page Rost & Sander, EMBL, Germany
– http://www.embl-heidelberg.de/predictprotein/predictprotein.html
• ZPRED server Zvelebil et al., Ludwig, U.K.
• http://kestrel.ludwig.ucl.ac.uk/zpred.html (GOR)
• nnPredict Cohen et al., UCSF, USA.
– http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html
• BMERC PSA Server Boston University, USA
– http://bmerc-www.bu.edu/psa/
• SSP (Nearest-neighbor) Solovyev and Salamov, Baylor College,
USA.
– http://dot.imgen.bcm.tmc.edu:9331/pssprediction/pssp.html
Recent Improvements
• The availability of large families of homologous sequences
revolutionized secondary structure prediction.
• Traditional methods, when applied to a family of proteins
rather than a single sequence, proved much more accurate at
identifying core secondary structure elements. The
combination of sequence data with sophisticated computing
techniques such as neural networks has lead to accuracies well
in excess of 70 %. Though this seems a small percentage
increase, these predictions are actually much more useful than
those for single sequence, since they tend to predict the core
accurately.
• Moreover, the limit of 70 – 80 % may be a function of
secondary structure variation within homologous proteins.
Automated Methods
There are numerous automated methods for predicting secondary structure
from multiply aligned protein sequences. Some good references are:
•
•
•
•
•
•
•
•
Zvelebil, M.J.J.M., Barton, G.J., Taylor, W.R. & Sternberg, M.J.E. (1987). Prediction of
Protein Secondary Structure and Active Sites Using the Alignment of Homologous
Sequences Journal of Molecular Biology, 195, 957-961. (ZPRED)
Rost, B. & Sander, C. (1993), Prediction of protein secondary structure at better than 70 %
Accuracy, Journal of Molecular Biology, 232, 584-599. PHD)
Salamov A.A. & Solovyev V.V. (1995), Prediction of protein secondary sturcture by
combining nearest-neighbor algorithms and multiply sequence alignments. Journal of
Molecular Biology, 247,1 (NNSSP)
Geourjon, C. & Deleage, G. (1994), SOPM : a self optimised prediction method for
protein secondary structure prediction. Protein Engineering, 7, 157-16. (SOPMA)
Solovyev V.V. & Salamov A.A. (1994) Predicting alpha-helix and beta-strand segments of
globular proteins. (1994) Computer Applications in the Biosciences,10,661-669. (SSP)
Wako, H. & Blundell, T. L. (1994), Use of amino-acid environment-depdendent
substitution tables and conformational propensities in structure prediction from aligned
sequences of homologous proteins. 2. Secondary Structures, Journal of Molecular
Biology, 238, 693-708.
Mehta, P., Heringa, J. & Argos, P. (1995), A simple and fast approach to prediction of
protein secondary structure from multiple aligned sequences with accuracy above 70 %.
Protein Science, 4, 2517-2525. (SSPRED)
King, R.D. & Sternberg, M.J.E. (1996) Identification and application of the concepts
important for accurate and reliable protein secondary structure prediction. Protein Sci,5,
2298-2310. (DSC).
PHD Prediction of rCD2
Comparison
Between
Prediction &
X-ray
Manual Intervention
• It has long been recognized that patterns of residue
conservation are indicative of particular secondary structure
types.
• Alpha helices have a periodicity of 3.6, which means that for
helices with one face buried in the protein core, and the other
exposed to solvent, the residues at positions i, i+3, i+4 & i+7
(where i is a residue in an a helix) will lie on one face of the
helix. Many alpha helices in proteins are amphipathic,
meaning that one face is pointing towards the hydrophobic
core and the other towards the solvent. Thus patterns of
hydrophobic residue conservation showing the i, i+3, i+4, i+7
pattern are highly indicative of an alpha helix.
Pattern in Amphipathic Helix
• For example, this
helix in myoglobin
has a classic pattern
of hydrophobic and
polar residue
conservation (i = 1).
Pattern in Amphipathic Beta Strand
• The geometry of beta strands means that adjacent residues have
their side chains pointing in opposite directions.
• Beta strands that are half buried in the protein core will tend to
have hydrophobic residues at positions i, i+2, i+4, i+8, etc, and
polar residues at positions i+1, i+3, i+5, etc.
Pattern in Buried Beta Strand
• Beta strands that are completely buried (as is often the case in proteins
containing both alpha helices and beta strands) usually contain a run of
hydrophobic residues, since both faces are buried in the protein core.
Secondary Structure Prediction of CD2
x-structure
A
1
B
10
20
PHD
GOR
SOPMA
x-structure
D
E
60
PHD
GOR
SOPMA
30
C'
40
C"
50
RDSGTVWGALGHGINLNIPNFQMTDDIDEVRWERGSTLVAEFKRKMKPFLK
CCCCSSSSCCCCCSSSCCCCCCCCCCHHHHHHHHCCHHHHHHHHHCCCCSS
CCCCSSSSSSSCCCSCCCCCCCCCCCHCHSSHHHCCHHHHHHHHHHHHHHH
CCCCSSHCCCCCCSSSCCCCCCCCCCCCHSSHHCCCSHHHHHHHHHHHHHC
Rat CD2
Rat CD2
C
F
70
G
80
90
SGAFEILANGDLKIKNLTRDDSGTYNVTVYSTNGTRILNKALDLRILE
CCCSSSSSCCCSSSCCCCCCCCCCSSSSSSCCCHHHHHHHHCCCCCCC
HHHHHHHHHHHHHHHSSSSCCCCSSSSSSSSCCCCSSHHHHHHHHHHH
CCCSSSSCCCCSSSSSSCCCCCCCSSSSSSSCCCCSSSSHHHHHSSHC
H = a-helix
b-sheet
S = b-sheet
C = coil
-helix
CD2 vs. Helical Propensity
F42
C"
C
C'
F
D
V78
E
G
W3 2
B
L 16
V39
Y76
A
C
N
• Residues on
strands C, C’, C”
and G have
strong helical
propensity
Three automated secondary structure predictions (PHD, SOPMA and
SSPRED) appear below the alignment of 12 glutamyl tRNA reductase
sequences. Positions within the alignment showing a conservation of
hydrophobic side-chain character are shown in yellow, and those
showing near total conservation of non-hydrophobic residues (often
indicative of active sites) are colored green.
• Predictions of accessibility performed by PHD (PHD Acc.
Pred.) are also shown (b = buried, e = exposed).
• For example, positions (within the alignment) 38 - 45 exhibit
the classical amphipathic helix pattern of hydrophobic residue
conservation, with positions i, i+3, i+4 and i+7 showing a
conservation of hydrophobicity, with intervening positions
being mostly polar.
• Positions 13 - 16 comprise a short stretch of conserved
hydrophobic residues, indicative of a buried beta-strand.
Alignment of Sequence to
Tertiary Structure
• Remember that the alignments of sequence for tertiary
structure that one gets from fold recognition methods may
be inaccurate. In instances where one has identified a
remote homologue, then the fold recognition methods can
sometimes give a very accurate alignment, though it is still
sometimes fruitful to edit the alignment around variable
regions.
• In other cases, it may be wise to create an alignment by
starting with the alignment from the fold recognition
method, and considering the alignment of secondary
structures.
Alignment of Sequence to
Tertiary Structure
There is one suggested method by Dr. Robert B. Russell:
• Ensure that residues predicted to be buried/exposed align
to those known to be buried or exposed in the template
structure. Note that conserved hydrophobic/polar residues
are more likely to be buried/exposed than non-conserved
residues, which could simply be anomalies. One can
predict residue accessibility manually, or by use of an
automated server like PHD.
• Ensure that critical hydrogen bonding patterns are not
disrupted in beta-sheet structures.
• Attempt to conserve residue properties (i.e. size, polarity,
hydrophobicity) as best as possible across known and
unknown structure.
Things Need to be Considered
In the construction of an alignment, several things need be
considered:
•
The observed residue burial or exposure
•
The predicted residue burial or exposure
•
The conservation of residue properties in known and
unknown structures
•
Whether or not the side chains on the core beta-strands
pointed in towards the barrel or out towards the helices
•
The hydrogen bonding pattern of the beta-strands
comprising the core beta-barrel.
Alignment of the Prediction of the Glutamyl tRNA Reductases
(hemA) with an Alpha/beta Barrel Structure (2acs)
Alignment of the Prediction of the Glutamyl tRNA Reductases
(hemA) with an Alpha/beta Barrel Structure (2acs)
• Sec. = known secondary structure from PDB code 2ACS (E =
extended, H = alpha helix, G = 310 helix, B = beta-bridge);
• Bur. = known residue exposure for 2ACS (b = buried, h = halfburied, e = exposed); in/out = positioning of residues in the betabarrel (i = pointing inwards, o = pointing outwards);
• Res. cons = conservation of residues (totally conserved =
UPPER CASE, h = hydrophobic, p = polar, c = charged, a =
aromatic, s = small, - = negative, + = positive) Pred denotes
predicted burial and secondary structure for the glutamyl tRNA
reductase family;
• Boxed positions are those with the same known/predicted burial.
Shaded positions show a conservation of hydrophobic character
in BOTH families of proteins, and positions in inverse text show
a conservation of polar character in BOTH families.