IBGP/BMI 705 Lab 4:
Protein structure and alignment
TA: L. Cooper
Why Align Structures
1. For homologous proteins (shared ancestry), this
provides the “gold standard” for sequence alignment
– elucidates the common ancestry of the proteins.
2. For nonhomologous proteins, allows us to identify
common substructures of interest.
3. Allows us to classify proteins into clusters, based on
structural similarity.
Example of Structural Homologs
4DFR: Dihydrofolate reductase
1YAC: Octameric Hydrolase of Unknown Specificity
5.9% sequence identity (best alignment)
1YAC structure solved without knowing function.
Alignment to 4DFR and others implies it is a hydrolase of some sort.
[Figure: DHFR in yellow and orange, YAC in green and purple; panels show sheets only and helices only]
Example of Structural Homologs
Sequence alignment
SLSAAEADLAGKSWAPVFANKNANGLDFLVALFEKFPDSANFFADFK-GKSVADIKA-S
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
PKLRDVSSRIFTRLNEFVNNAANAGKMSAMLSQFAKEHVGFGVGSAQFENVRSMFPGFVA
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
Structural alignment
XSLSAAEADLAGKSW-APVFANKN-ANGLDFLVALFEKFPDSANFF-ADFKGKSVA---DIK
V-LSPADKTNVKAAWGK-VGAHA-GEYGAEALERMFLSFPTTKTYFPHF-------DLS-H
ASPKLRDVSSRIFTRLNEFVNNAANAGKMSA-MLSQ-FAKEHV-GFGVGSAQFENVRSM-F
GSAQVKGHGKKVADALTNAVAHV-D---DMPNAL---SALSDLHAHKLRVDPVNFKLLS-HCL
PGFVA
LVTLAAHLPAEFTP
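Gapped alignments like the ones above are often summarized by percent sequence identity. The sketch below is one possible way to compute it, assuming "-" marks a gap and that identity is taken over columns where both sequences have a residue (other denominators are also in use). The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment, not taken from the slides.
    print(percent_identity("ACD-F", "AQDEF"))
```

Because the denominator convention changes the number, a value such as the 5.9% quoted earlier is only comparable to values computed the same way.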
How to Align Structures
1. Visual inspection (by eye)
2. Computational approach
• Point-based methods use point distances and other properties to
establish correspondences.
• Secondary structure-based methods use vectors representing
secondary structures to establish correspondences.
[Figure: global, motif, and local alignment]
Structural Alignment Algorithms
Alignment algorithms create a one-to-one mapping of
subset(s) of one sequence to subset(s) of another sequence.
Structure-based alignment algorithms do this by minimizing
the structure difference score or root-mean-square difference
(rmsd) in alpha-carbon positions.
• Find correspondence set
• Find alignment transform
(protein superposition problem)
• Chicken-and-egg: each step needs the result of the other
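Once a correspondence set is fixed, the rmsd over the paired alpha-carbon positions is straightforward to evaluate. The following is a minimal sketch, assuming coordinates are given as (x, y, z) tuples in matching order; the helper names `centroid`, `centered`, and `rmsd` are hypothetical. It removes only the translational part of the difference (by centering); finding the best rotation is a separate step not shown here.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))


def centered(coords):
    """Copy of coords translated so that the centroid is at the origin."""
    c = centroid(coords)
    return [tuple(p[k] - c[k] for k in range(3)) for p in coords]


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between paired points (no fitting)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates: b is a translated by (3, 4, 0).
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    b = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0), (3.0, 5.0, 0.0)]
    print(rmsd(a, b))                      # before removing translation
    print(rmsd(centered(a), centered(b)))  # after centering both sets
```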
Parameter Space
Problem: find the rotation matrix, R, and a translation
vector, v, that minimize the following quantity:
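A common way to write this superposition objective is the sum of squared distances between corresponding alpha-carbons after one structure has been rotated and translated. The notation x_i, y_i (positions of the i-th corresponding residue in each structure) and N (number of corresponding residues) is assumed here, not taken from the slide.

```latex
E(R, v) = \sum_{i=1}^{N} \left\lVert R\,x_i + v - y_i \right\rVert^{2},
\qquad
\mathrm{rmsd} = \sqrt{\frac{E(R, v)}{N}}
```

For any fixed R, the minimizing translation maps the centroid of the x_i onto the centroid of the y_i, so in practice both structures are centered first and only the rotation remains to be found.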
Torsion angles (φ, ψ) are:
- local by nature (error propagation)
- invariant upon rotation and translation of the molecule
- compact (O(n) angles for a protein of n residues)
[Figure: effect of adding 1 degree to all φ, ψ]
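The invariance property of torsion angles can be checked numerically. The sketch below computes a signed torsion (dihedral) angle from four points using a standard vector formula; the function names are hypothetical and the four points are toy coordinates, not backbone atoms from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the bond
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    pts = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    moved = [(p[0] + 5.0, p[1] - 3.0, p[2] + 2.0) for p in pts]  # translation
    turned = [(-p[1], p[0], p[2]) for p in pts]  # 90-degree rotation about z
    print(dihedral(*pts), dihedral(*moved), dihedral(*turned))
```

The three printed angles agree, which is the invariance stated above: only the internal geometry of the four points matters, not where the molecule sits in space.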
Structural Alignment Methods
• STRUCTAL [Levitt, Subbiah, Gerstein]
  • Using dynamic programming with a distance metric
• DALI [Holm, Sander]
  • Analysis of distance maps
• LOCK [Singh, Brutlag]
  • Analysis of secondary structure vectors, followed by refinement with distances
• SSAP [Orengo and Taylor, 1989]
• VAST [Gibrat et al., 1996]
• CE [Shindyalov and Bourne, 1998]
• SSM [Krissinel and Henrick, 2004]
• …
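To make "dynamic programming with a distance metric" concrete, here is a minimal sketch of a global dynamic-programming alignment over two coordinate lists that are assumed to be already superposed. It is not the STRUCTAL implementation: the function name `dp_structure_align`, the similarity function m / (1 + (d / d0)^2), and the default values of `m`, `d0`, and `gap` are illustrative assumptions, and end gaps are penalized like any other gap.

```python
import math


def dp_structure_align(coords_a, coords_b, m=20.0, d0=2.24, gap=10.0):
    """Return (score, pairs) for a global alignment of two point lists.

    pairs is a list of (index_in_a, index_in_b) for matched residues.
    """
    n, k = len(coords_a), len(coords_b)

    def sim(i, j):
        # Similarity is high when the two points are close in space.
        d = math.dist(coords_a[i], coords_b[j])
        return m / (1.0 + (d / d0) ** 2)

    # score[i][j]: best score aligning the first i points of A
    # with the first j points of B.
    score = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap
    for j in range(1, k + 1):
        score[0][j] = score[0][j - 1] - gap
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            score[i][j] = max(score[i - 1][j - 1] + sim(i - 1, j - 1),
                              score[i - 1][j] - gap,
                              score[i][j - 1] - gap)

    # Trace back from the corner to recover the matched pairs.
    pairs = []
    i, j = n, k
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sim(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][k], pairs


if __name__ == "__main__":
    # Toy chains: B coincides with the last two points of A.
    a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    b = [(3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    print(dp_structure_align(a, b))
```

In a full method this step would alternate with superposition: the matched pairs give a new correspondence set, which gives a new transform, which changes the distances used in the next round of dynamic programming.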
VAST (Vector Alignment Search Tool)
• It places great emphasis on the definition of the threshold of significant
structural similarity. By focusing on similarities that are surprising in the
statistical sense, one does not waste time examining many similarities of small
substructures that occur by chance in protein structure comparison. Very many
of the remaining similarities are examples of remote homology, often
undetectable by sequence comparison. As such they may provide a broader
view of the structure, function and evolution of a protein family.
• At the heart of VAST's significance calculation is definition of the "unit" of
tertiary structure similarity as pairs of secondary structure elements (SSE's)
that have similar type, relative orientation, and connectivity. In comparing two
protein domains the most surprising substructure similarity is that where the
sum of superposition scores across these "units" is greatest. The likelihood
that this similarity would be seen by chance is then given as a simple product:
the probability that one would obtain this score in drawing so many "units" at
random, times the number of alternative SSE-pair combinations possible in the
domain comparison, from which one has chosen the best.
• http://www.ncbi.nlm.nih.gov/Structure/RESEARCH/iucrabs.html#Ref_6
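The "simple product" described above can be written as a one-line calculation. This is only a sketch of the arithmetic as worded in the text, not VAST's actual code: the function and parameter names are hypothetical, both inputs are assumed to be computed elsewhere, and capping the result at 1 is an addition made here so that the value stays a valid probability.

```python
def chance_likelihood(p_score, n_alternatives):
    """Likelihood of seeing the best substructure similarity by chance.

    p_score: probability of obtaining the observed score when drawing
        the same number of SSE-pair "units" at random (assumed given).
    n_alternatives: number of alternative SSE-pair combinations possible
        in the domain comparison (assumed given).
    """
    if not 0.0 <= p_score <= 1.0:
        raise ValueError("p_score must be a probability")
    if n_alternatives < 1:
        raise ValueError("n_alternatives must be at least 1")
    # Product of the two terms, capped at 1 (the cap is an assumption).
    return min(1.0, p_score * n_alternatives)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(chance_likelihood(1e-6, 500))
```

The multiplication by the number of alternatives accounts for having picked the best of many possible substructure matches.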
Today’s lab
• Answer questions bolded on handout
(There are five)
PDB: Protein structure viewing
SCOP: Protein Classification
VAST: Alignment