Identification of Domains using
Structural Data
Niranjan Nagarajan
Department of Computer Science
Cornell University
Assorted Definitions of Domains
• Subsequences that can fold independently
into a stable structure.
• Structurally compact substructures.
• Functionally well-defined building blocks.
• Evolutionarily conserved and reused
fragments.
Protein Structural Domain
Identification
William R. Taylor
Basic Algorithm
• Initial Assignment of Labels
– Sequential residue numbering
• Update of Labels
• Termination Condition
– Mean squared deviation of average between
successive cycles < 10^-6 or number of
iterations > (length of protein)/2
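The initial labeling and termination test can be sketched in Python as below. This is a minimal sketch under one reading of the slide: "mean squared deviation ... between successive cycles" is taken to mean the mean of the squared per-residue label changes, and residues are numbered from 1. The function names are hypothetical.

```python
def initial_labels(length):
    """Sequential residue numbering: residue i starts with label i (1-based)."""
    return [float(i) for i in range(1, length + 1)]


def mean_squared_change(prev, curr):
    """Mean of the squared per-residue label change between two cycles."""
    return sum((a - b) ** 2 for a, b in zip(prev, curr)) / len(curr)


def should_stop(prev, curr, iteration, length, tol=1e-6):
    """Stop when labels have settled or the iteration cap is exceeded.

    Assumption: the slide's 'mean squared deviation ... between successive
    cycles' is read as the mean squared change of each residue's label.
    """
    return mean_squared_change(prev, curr) < tol or iteration > length / 2
```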
Update Formula
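• $S_i^{t+1} = S_i^t + \mathrm{step}(t+1)\cdot\mathrm{sign}\left(\sum_j f(S_i^t, S_j^t)\right)$ for every residue $i$.
• $\mathrm{sign}(x) = 1$ if $x > 0$, $-1$ if $x < 0$, $0$ if $x = 0$.
• $f(S_i^t, S_j^t) =$
– $r/d_{ij}$ if $S_j^t > S_i^t$ and $d_{ij} < r$.
– $-r/d_{ij}$ if $S_j^t < S_i^t$ and $d_{ij} < r$.
– $0$ otherwise.
• $d_{ij}$ is the distance between residues $i$ and $j$; $r$ is the neighborhood radius.
• $\mathrm{step}(x) =$
– $1$ if $x < N/2$.
– $2(N-x)/N$ if $N/2 \le x < N$.
– $0$ otherwise.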
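A direct transcription of the update formula into Python follows, as a sketch rather than Taylor's implementation. Assumptions: each residue is given as one (x, y, z) point, d_ij is their Euclidean distance, all labels are updated synchronously from cycle t, and N (which the slides do not define) is passed in as a parameter `n`. The guard against d_ij = 0 is an added safeguard, not part of the formula.

```python
import math


def step(x, n):
    """Step schedule: 1 while x < n/2, then a linear ramp down to 0 at x = n."""
    if x < n / 2:
        return 1.0
    if x < n:
        return 2.0 * (n - x) / n
    return 0.0


def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def f(s_i, s_j, d_ij, r):
    """Inverse-distance pull of neighbor j on the label of residue i."""
    if d_ij >= r or d_ij == 0:  # outside the radius (or coincident atoms, added guard)
        return 0.0
    if s_j > s_i:
        return r / d_ij
    if s_j < s_i:
        return -r / d_ij
    return 0.0


def update_labels(labels, coords, r, t, n):
    """Compute S^{t+1} from S^t for every residue (synchronous update).

    coords: one (x, y, z) point per residue (assumption).
    n: the N of step(), treated here as a caller-supplied parameter (assumption).
    """
    delta = step(t + 1, n)
    new_labels = []
    for i, s_i in enumerate(labels):
        total = sum(
            f(s_i, s_j, math.dist(coords[i], coords[j]), r)
            for j, s_j in enumerate(labels)
            if j != i
        )
        new_labels.append(s_i + delta * sign(total))
    return new_labels
```

On three evenly spaced residues with r = 1.5, the end labels move one unit toward the middle one while the middle label, pulled equally both ways, stays put: `update_labels([1.0, 2.0, 3.0], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)], 1.5, 0, 10)` gives `[2.0, 2.0, 2.0]`.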
Example
• Solid lines indicate the protein backbone.
• Neighboring residues within radius r are connected by
dashed lines.
• Connections between i and i + 2 have been omitted for
clarity.
• Label evolution is done without inverse distance
weighting.
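The difference between the inverse-distance rule and the unweighted one used in the figure can be seen on a hypothetical four-residue layout. In this sketch, `net_pull` (a hypothetical helper) returns the sum inside sign(·) for one residue: with weighting, a single close neighbor can outweigh two farther ones, while the unweighted count lets the two farther neighbors win.

```python
import math


def net_pull(i, labels, coords, r, weighted=True):
    """Sum of neighbor contributions on residue i from residues within radius r.

    weighted=True uses the inverse-distance weight r/d_ij from f;
    weighted=False counts each neighbor as +1 or -1, as in the example figure.
    """
    total = 0.0
    for j, s_j in enumerate(labels):
        if j == i or s_j == labels[i]:
            continue
        d = math.dist(coords[i], coords[j])
        if d == 0 or d >= r:
            continue
        w = r / d if weighted else 1.0
        total += w if s_j > labels[i] else -w
    return total


# Hypothetical toy layout: residue 0 has one close neighbor with a higher
# label and two farther neighbors with lower labels.
labels = [5.0, 6.0, 4.0, 4.0]
coords = [(0.0, 0.0), (1.0, 0.0), (-3.0, 0.0), (0.0, 3.0)]
print(net_pull(0, labels, coords, r=4.0) > 0)                  # True: label moves up
print(net_pull(0, labels, coords, r=4.0, weighted=False) < 0)  # True: label moves down
```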
Refinements
• Median-based smoothing with a window size of 21 reclaims short loops of 10 or fewer residues.
• Small domains are reassigned using the weighted mean of their neighbors' labels (weights are given by f).
• Domain recalculation is repeated at most five times.
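The median smoothing step might look like this sketch, which assumes domains are given as one integer label per residue. With a window of 21, a run of at most 10 residues whose flanks belong to the same, sufficiently long domain is always outvoted (at least 11 of the 21 labels in its window come from the flanking domain), which matches the loop length on the slide. Truncating the window at the chain ends and using `median_low` are assumptions; small-domain reassignment is not shown.

```python
from statistics import median_low


def median_smooth(domains, window=21):
    """Running median of per-residue domain labels.

    The window is truncated at the chain ends, and median_low is used so the
    result is always one of the existing labels (an assumption; the slides do
    not say how ties or ends are handled).
    """
    half = window // 2
    n = len(domains)
    return [median_low(domains[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


# A 10-residue excursion inside one domain is absorbed; an 11-residue one survives.
print(set(median_smooth([1] * 20 + [2] * 10 + [1] * 20)))  # {1}
print(median_smooth([1] * 20 + [2] * 11 + [1] * 20)[25])    # 2
```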
Preserving -sheets
• Matrix B of possible -sheet interactions
between residues generated based on
distance data and heuristics.
• Weighted mean heuristic used to generate
initial assignment of labels with the
averaging being iterated to convergence.
• Post-processing also done to badly broken
-sheets.
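The slides do not give the form of B or of the weighted mean, so the following is only one plausible reading: each residue's label is repeatedly replaced by the mean of its own label and its sheet partners' labels, weighted by B, until the labels stop changing. The function name and the self-weight of 1 are assumptions; the self-weight keeps two partnered residues from simply swapping labels every cycle.

```python
def sheet_average(b, labels, tol=1e-6, max_iter=1000):
    """Iterate weighted-mean averaging of labels over a beta-sheet matrix b.

    b[i][j] >= 0 is a hypothetical weight for a possible sheet interaction
    between residues i and j. Each residue keeps weight 1 on its own label so
    the iteration settles instead of oscillating between partners.
    """
    current = [float(s) for s in labels]
    for _ in range(max_iter):
        nxt = [
            (current[i] + sum(w * current[j] for j, w in enumerate(row)))
            / (1.0 + sum(row))
            for i, row in enumerate(b)
        ]
        # Converged: no label moved by more than tol in this cycle.
        if max(abs(a - c) for a, c in zip(nxt, current)) < tol:
            return nxt
        current = nxt
    return current
```

For two residues paired in a sheet and one unpaired residue, `sheet_average([[0, 1, 0], [1, 0, 0], [0, 0, 0]], [1.0, 3.0, 10.0])` gives the pair a common label of 2.0 and leaves the third at 10.0.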
Self-testing with fake homologs
• Fake homologs are generated by smoothing:
– Replacing the central atom of each consecutive triple by the triple's average.
– Repeating the process five times.
• Domain assignments of the original and fake structures are compared, and their similarity is evaluated with an overlap score.
• r is optimized for the best overlap score.
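The fake-homolog smoothing can be sketched as follows, assuming the "triple" is three consecutive backbone atoms, the average is the triple's centroid, each pass reads the coordinates produced by the previous pass, and the chain ends are left unchanged. The overlap score is not sketched because the slides do not define it.

```python
def smooth_once(coords):
    """One smoothing pass: each interior atom becomes the centroid of its triple."""
    out = [tuple(c) for c in coords]
    for i in range(1, len(coords) - 1):
        triple = zip(coords[i - 1], coords[i], coords[i + 1])
        out[i] = tuple((a + b + c) / 3.0 for a, b, c in triple)
    return out


def fake_homolog(coords, rounds=5):
    """Apply the triple-averaging pass `rounds` times (five on the slide)."""
    for _ in range(rounds):
        coords = smooth_once(coords)
    return coords
```

A straight chain is a fixed point of this smoothing, while kinks are progressively flattened, which is what makes the result a plausible "homolog" with the same overall fold.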
Extension to Multiple Structures
• The algorithm is run simultaneously on the structures corresponding to a multiple sequence alignment.
• After each iteration, the labels at each alignment position are synchronized to their average.
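Label synchronization across aligned structures might be sketched as below. The alignment encoding is hypothetical: `alignment[k][c]` gives the residue index of structure k in column c, or None for a gap, and gaps simply do not contribute to the average.

```python
def synchronize(labels, alignment):
    """Set every aligned residue's label to the mean label of its alignment column.

    labels[k][i]: label of residue i in structure k.
    alignment[k][c]: residue index of structure k at column c, or None for a gap
    (a hypothetical encoding of the multiple sequence alignment).
    """
    synced = [list(row) for row in labels]
    for c in range(len(alignment[0])):
        members = [(k, row[c]) for k, row in enumerate(alignment) if row[c] is not None]
        if not members:
            continue
        mean = sum(labels[k][i] for k, i in members) / len(members)
        for k, i in members:
            synced[k][i] = mean
    return synced
```

For example, with `labels = [[1.0, 2.0], [3.0, 6.0, 9.0]]` and `alignment = [[0, None, 1], [0, 1, 2]]`, the result is `[[2.0, 5.5], [2.0, 6.0, 5.5]]`: the gapped column leaves the second structure's label unchanged.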