Structural alignment
Protein structure
Every protein is defined by a unique sequence (primary structure) that folds into a unique shape (tertiary, or three-dimensional, structure).
Moreover, proteins with similar sequences adopt very similar structures.
[Figure: cyclophilin from B. malayi and cyclophilin A from H. sapiens]
Why structural alignment?
We already have sequence alignment (e.g. Clustal):
KTHLCV
KSHA-V
It tells us which amino acids of two (or more) proteins correspond to each other.
That lets us infer information about the function and evolution of the proteins,
but only if the sequences are similar enough!
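Sequence alignment programs such as Clustal are built on dynamic programming. A minimal sketch of the underlying idea is a global (Needleman-Wunsch) alignment score with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) and the function name are illustrative assumptions; Clustal itself uses substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of a and b (toy scoring, linear gaps)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap              # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap              # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,        # gap in b
                           dp[i][j - 1] + gap)        # gap in a
    return dp[n][m]
```

For the pair on the slide, `needleman_wunsch_score("KTHLCV", "KSHAV")` is -1 under these toy parameters, the score of the alignment KTHLCV / KSHA-V; note that KTHLCV / KSH-AV scores the same, so even a tiny example can have several equally good alignments.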
What is the twilight zone?
Sequence alignment can unambiguously distinguish between protein pairs of similar structure and pairs of dissimilar structure only when the pairwise sequence identity is high.
High sequence identity roughly means over 40%.
The signal gets blurred in the twilight zone of 20-35% sequence identity.
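Percent identity can be defined in several ways; a minimal sketch, assuming identity is counted over the alignment columns where both sequences have a residue (gap columns are ignored), is shown below. Other common choices divide by the length of the shorter sequence or by the full alignment length, which gives different numbers for gapped alignments. The function name is hypothetical.

```python
def percent_identity(row1: str, row2: str, gap: str = "-") -> float:
    """Percent identity of two aligned rows, counted over gap-free columns only."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    # keep only the columns where both sequences contribute a residue
    pairs = [(x, y) for x, y in zip(row1, row2) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)
```

For the earlier example, `percent_identity("KTHLCV", "KSHA-V")` returns 60.0 (3 identical residues out of 5 aligned pairs), comfortably above the twilight zone.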
More on the twilight zone
More than 90% of sequence pairs with sequence identity below 25% have different structures.
The significance of a sequence alignment is length-dependent: the longer the alignment, the lower the identity required for it to be called significant.
Nevertheless, the required identity converges to about 25% for alignments longer than 80 amino acids.
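One classic way to express this length dependence is the HSSP threshold curve of Sander and Schneider (1991). The sketch below uses the constants as we recall them (290.15 and exponent -0.562, flat beyond 80 residues); treat them as an assumption to be checked against the original paper. The function name and the refusal to handle alignments shorter than 10 residues are our own choices.

```python
def hssp_identity_threshold(length: int) -> float:
    """Length-dependent identity threshold (%) in the form of the HSSP curve.

    ASSUMPTION: constants quoted from memory of Sander & Schneider (1991).
    """
    if length < 10:
        # our own choice: the curve is not meant for very short alignments
        raise ValueError("curve only used for alignments of >= 10 residues")
    # beyond 80 residues the threshold stays at its value at L = 80 (about 25 %)
    effective = min(length, 80)
    return 290.15 * effective ** -0.562
```

With these constants the threshold is about 54% for a 20-residue alignment and levels off at roughly 24.7% from 80 residues on, matching the 25% plateau mentioned above.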
The 'more similar than identical' rule can reduce the number of false positives.
Using intermediate sequences to find links between more distant families can also reduce the number of false positives.
How far can the sequence identity drop?
Average sequence identity of random alignments: 5.6%
Average sequence identity of remote homologues: 8.5%
How does it work?
[Figure source: http://www.biochem.unizh.ch/antibody/Introduction/Institutsseminar97/source/slide2.htm]
Numbers
Given an average protein length of 300 amino acids, there are 20^300 possible sequences of that length, more than the number of atoms in the universe.
In reality, just a few hundred thousand sequences are known.
The number of basic protein folds is believed to be between 1500 and 5000.
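The size of 20^300 is easy to check with exact integer arithmetic. The sketch below counts its decimal digits and its base-10 logarithm; for comparison, the commonly quoted estimate for the number of atoms in the observable universe is about 10^80.

```python
import math

LENGTH = 300      # average protein length used on the slide
ALPHABET = 20     # standard amino acids

n_sequences = ALPHABET ** LENGTH                  # exact integer (Python ints are unbounded)
print(len(str(n_sequences)))                      # decimal digits: 391
print(round(LENGTH * math.log10(ALPHABET), 1))    # log10 of the count: 390.3
```

So 20^300 is roughly 2 x 10^390, about 310 orders of magnitude beyond 10^80.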
Structural alignment, because:
Structures are better conserved than sequences.
Structural alignment can imply a functional similarity that is not detectable from a sequence alignment.
It can help to improve sequence alignments when structures are available (phylogenetic studies, homology modeling).
It can improve sequence alignment methods (substitution matrices and gap penalties derived from structural alignments).
It can improve structure prediction methods.
Sequence versus structural alignment
Position               1   2   3   4   5   6   7   8   9   10  11  12  13  14
Protein 1              PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
Protein 2 (sequence)   PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
Protein 2 (structural) PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
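The two alignments of the same protein pair can be compared residue by residue. A minimal sketch, assuming (from the slide title) that the second row is the sequence alignment and the third the structural alignment, counts identities for each and lists the residue pairings both alignments share. All names are hypothetical.

```python
# Rows from the slide: protein 1 and two alignments of protein 2 ('---' = gap).
PROT1 = "PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS".split()
ALN_SEQ = "PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS".split()
ALN_STR = "PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS".split()
GAP = "---"


def identity(ref, row):
    """(identical pairs, aligned pairs) over columns without a gap."""
    pairs = [(a, b) for a, b in zip(ref, row) if GAP not in (a, b)]
    return sum(a == b for a, b in pairs), len(pairs)


def residue_pairs(ref, row):
    """Set of (i, j): residue i of ref is aligned with residue j of row."""
    pairs, i, j = set(), 0, 0
    for a, b in zip(ref, row):
        if a != GAP and b != GAP:
            pairs.add((i, j))
        if a != GAP:
            i += 1
        if b != GAP:
            j += 1
    return pairs


shared = residue_pairs(PROT1, ALN_SEQ) & residue_pairs(PROT1, ALN_STR)
```

The sequence alignment has 7 identities in 11 aligned pairs, the structural one only 6, yet the structural alignment is presumably the one that matches the 3D superposition. The two alignments share 9 of their 11 residue pairings; they disagree only on where THR and PRO go.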
Is it difficult to make a structural alignment?
Yes, it is.
Structural alignment is an NP-hard problem (NP: nondeterministic polynomial time).
In other words, no algorithm is known that solves it exactly in reasonable (polynomial) time.
Even if one were, the result would be correct from a technical point of view, but not necessarily from a biological point of view.
General solution
Use a heuristic approach:
1. Represent proteins A and B in some coordinate-independent space.
2. Compare A and B.
3. Optimize the alignment between A and B (e.g. minimize the RMSD).
4. Measure the statistical significance of the alignment against a set of random structure comparisons.
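Step 3 usually needs the RMSD of two structures after optimal superposition. For a fixed residue correspondence (the output of step 2), the best rotation can be found with the Kabsch algorithm. A minimal sketch, assuming NumPy and paired coordinate arrays of shape (N, 3); the function name is hypothetical, and real servers alternate this step with re-assigning the correspondence.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between paired coordinates after optimal superposition of P onto Q.

    P, Q: arrays of shape (N, 3); row i of P corresponds to row i of Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # 1. Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # 2. Cross-covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # 3. Optimal rotation; d = -1 flips one axis to avoid a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # 4. Rotate P and measure the remaining deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of a structure gives an RMSD of (numerically) zero, which is why RMSD after superposition, not raw coordinate differences, is the quantity to minimize.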
“...in some coordinate-independent space...”
Make the problem easier by:
- comparing only distance matrices of atoms
- comparing secondary structure elements (SSEs)
- comparing cartoons
- comparing vectors of SSEs
- combining the methods above
- ...
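A distance matrix is coordinate-independent: rotating or translating a structure leaves every pairwise distance unchanged. The sketch below shows this and compares two paired structures by the mean absolute difference of their distance matrices. This is only an illustration of the representation, not the scoring function of DALI or any other server; all names are hypothetical.

```python
import math


def distance_matrix(coords):
    """All pairwise Euclidean distances between points (e.g. C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mean_abs_difference(dm_a, dm_b):
    """Mean |dA(i,j) - dB(i,j)| over i < j for two paired, equally long structures."""
    n = len(dm_a)
    if n != len(dm_b) or n < 2:
        raise ValueError("need two paired matrices with at least 2 residues")
    diffs = [abs(dm_a[i][j] - dm_b[i][j])
             for i in range(n) for j in range(i + 1, n)]
    return sum(diffs) / len(diffs)


def rotate_z(coords, angle):
    """Rotate points about the z axis (used here only to demonstrate invariance)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]
```

For any set of coordinates, rotating them with `rotate_z` and then shifting them yields a distance matrix whose mean absolute difference from the original is zero up to rounding, so no superposition is needed before comparing.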
None of the methods guarantees finding the closest structure, and two methods can disagree at every amino acid position.
Nevertheless, they can still provide valuable insight into the history of a protein and give hints about its function.
Methods for fold comparison
Server | Location | Method
CE | http://cl.sdsc.edu | Extension of optimal path [1]
DALI | http://www2.ebi.ac.uk/dali | Distance-matrix alignment [2]
DEJAVU | http://portray.bmc.uu.se/cgi-bin/dennis/dejavu.pl | SSE alignment with Cα-atom optimisation [3]
LOCK | http://gene.stanford.edu/LOCK/ | Absolute orientation of corresponding points [4]
MATRAS | http://bongo.lab.nig.ac.jp/~takawaba/Matras.html | Markov transition model of evolution [5]
PRIDE | http://hydra.icgeb.trieste.it/pride/ | Cα-Cα atom distances [6]
SSM | http://www.ebi.ac.uk/msd-srv/ssm/ssmstart.html | Graph matching algorithm
TOP | http://bioinfo1.mbfys.lu.se/TOP | SSE alignment [7]
TOPS | http://tops.ebi.ac.uk/tops/compare1.html | TOPS-diagram alignment [8]
TOPSCAN | http://www.rubic.rdg.ac.uk/~andrew/bioinf.org/topscan | Secondary-structure topology-string alignment [9]
VAST | http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html | Vector alignment [10]
Protein structure classification
If you want to know which structures are
similar to a known structure, these
systems might help:
A) Manual - SCOP
B) Semi-automatic - CATH
C) Automatic - FSSP
CATH
C (class) - secondary structure composition
A (architecture) - overall shape, orientation of secondary structure elements
T (topology) - overall shape, orientation of secondary structure elements + connectivity
H (homologous superfamily) - domains are grouped if they meet one of these criteria:
- sequence identity >= 35%, with 60% of the larger structure equivalent to the smaller
- SSAP score >= 80.0 and sequence identity >= 20%, with 60% of the larger structure equivalent to the smaller
- SSAP score >= 80.0, with 60% of the larger structure equivalent to the smaller, and the domains have related functions
S (sequence families) - clustering based on the sequence identity level
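The H-level criteria listed above can be written as a single predicate. This hypothetical helper encodes them exactly as stated on the slide and is only a sketch; CATH's actual classification pipeline involves more steps and its thresholds may have changed. `overlap` is assumed to be the fraction of the larger structure that is equivalent to the smaller one.

```python
from typing import Optional


def same_h_level(seq_identity: float, overlap: float,
                 ssap_score: Optional[float] = None,
                 related_function: bool = False) -> bool:
    """Would two domains share a CATH H level under the slide's criteria?

    seq_identity: percent sequence identity
    overlap: fraction of the larger structure equivalent to the smaller (0..1)
    """
    if overlap < 0.60:                     # every criterion requires 60 % overlap
        return False
    if seq_identity >= 35.0:               # criterion 1: high sequence identity
        return True
    if ssap_score is not None and ssap_score >= 80.0:
        # criterion 2: good SSAP score plus >= 20 % identity
        # criterion 3: good SSAP score plus related function
        return seq_identity >= 20.0 or related_function
    return False
```

For example, two domains with 10% identity and an SSAP score of 85 are grouped only if their functions are related.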
Summary
Structural alignment can help with protein annotation even when the sequence similarity is not significant.
The sequence identity of two proteins with similar structures can be lower than 10%, because the number of folds is limited.
Recent progress in protein structure determination increases the usefulness of structural alignment.
Structural alignment is a difficult problem that is solved by heuristic methods.
These methods simplify the problem by moving from 3D space to 2D space, sacrificing the optimal result for speed.
Summary II
Different methods can produce completely different alignments.
In our results, CE, DALI, MATRAS and VAST were the best servers for finding structural relatives.
Several structural classification systems have been developed (CATH, FSSP, SCOP); they provide a hierarchical classification of protein structures and make it possible to infer functional and evolutionary relationships between proteins.