
Identification of Domains using Structural Data
Niranjan Nagarajan
Department of Computer Science, Cornell University

Assorted Definitions of Domains
• Subsequences that can fold independently into a stable structure.
• Structurally compact substructures.
• Functionally well-defined building blocks.
• Evolutionarily conserved and reused fragments.

Protein Structural Domain Identification
William R. Taylor

Basic Algorithm
• Initial Assignment of Labels
  – Sequential residue numbering
• Update of Labels
• Termination Condition
  – Mean squared deviation of average between successive cycles < $10^{-6}$, or number of iterations > (length of protein)/2

Update Formula
• $S_i^{t+1} = S_i^t + \mathrm{step}(t+1) \cdot \mathrm{sign}\left(\sum_j f(S_i^t, S_j^t)\right)$ for all $i$.
• $\mathrm{sign}(x) = 1$ if $x > 0$, $-1$ if $x < 0$, $0$ if $x = 0$.
• $f(S_i^t, S_j^t) =$
  – $r/d_{ij}$ if $S_j^t > S_i^t$ and $d_{ij} < r$.
  – $-r/d_{ij}$ if $S_j^t < S_i^t$ and $d_{ij} < r$.
  – $0$ otherwise.
• $\mathrm{step}(x) =$
  – $1$ if $x < N/2$.
  – $2(N-x)/N$ if $N/2 \le x < N$.
  – $0$ otherwise.

Example
• Full lines indicate the protein backbone.
• Neighboring residues within radius r are connected by dashed lines.
• Connections between i and i + 2 have been omitted for clarity.
• Label evolution is done without inverse distance weighting.

Refinements
• Median-based smoothing with a window size of 21 to reclaim short loops of 10 or fewer residues.
• Small domains are reassigned using the weighted mean values of their neighbors (weights are given by f).
• Domain recalculation is repeated at most five times.

Preserving β-sheets
• Matrix B of possible β-sheet interactions between residues is generated based on distance data and heuristics.
• A weighted mean heuristic is used to generate the initial assignment of labels, with the averaging iterated to convergence.
• Post-processing is also applied to badly broken β-sheets.

Self-testing with fake homologs
• Fake homologs are generated by smoothing
  – Replacing the central atom of a triple by the average.
  – Process repeated five times.
• Domain assignments are compared and similarity is evaluated based on an overlap score.
• r is optimized for the best overlap score.

Extension to Multiple Structures
• The algorithm is run simultaneously on structures corresponding to a multiple sequence alignment.
• Labels are synchronized to the average of the labels at a position after each iteration.
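
The label update described under Basic Algorithm and Update Formula can be sketched in a few lines of Python. This is only an illustrative reading of the slides, not the original implementation. The function and parameter names (`evolve_labels`, `n_schedule`, `tol`) are hypothetical. The sketch makes three assumptions: the slides do not define N, so it defaults to the iteration cap; the termination test is read as the mean squared change in labels between successive cycles; and coincident points are skipped to avoid division by zero.

```python
import math


def sign(x):
    """Return 1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def step(x, n):
    """Step size: 1 for the first half of the schedule, then linear decay to 0."""
    if x < n / 2:
        return 1.0
    if x < n:
        return 2.0 * (n - x) / n
    return 0.0


def f(s_i, s_j, d_ij, r):
    """Inverse-distance-weighted pull of neighbour j on residue i inside radius r."""
    if d_ij < r and s_j > s_i:
        return r / d_ij
    if d_ij < r and s_j < s_i:
        return -r / d_ij
    return 0.0


def evolve_labels(coords, r, max_iter=None, n_schedule=None, tol=1e-6):
    """Iterate the label update on a list of 3D coordinates (one per residue)."""
    n_res = len(coords)
    if n_res == 0:
        return []
    if max_iter is None:
        max_iter = max(1, n_res // 2)  # cap from the slides: (length of protein)/2
    if n_schedule is None:
        n_schedule = max_iter  # assumption: N in step() is the iteration cap

    # Precompute neighbours within radius r (coincident points are skipped).
    neighbours = []
    for i in range(n_res):
        row = []
        for j in range(n_res):
            if i != j:
                d = math.dist(coords[i], coords[j])
                if 0.0 < d < r:
                    row.append((j, d))
        neighbours.append(row)

    # Initial assignment: sequential residue numbering.
    labels = [float(i) for i in range(n_res)]

    for t_next in range(1, max_iter + 1):  # t_next plays the role of t+1
        size = step(t_next, n_schedule)
        new_labels = []
        for i in range(n_res):
            total = sum(f(labels[i], labels[j], d, r) for j, d in neighbours[i])
            new_labels.append(labels[i] + size * sign(total))
        # Assumed termination test: mean squared change between cycles.
        msd = sum((a - b) ** 2 for a, b in zip(new_labels, labels)) / n_res
        labels = new_labels
        if msd < tol:
            break
    return labels
```

For six points forming two well-separated collinear triplets, for example `[(0, 0, 0), (1, 0, 0), (2, 0, 0), (100, 0, 0), (101, 0, 0), (102, 0, 0)]` with `r=5.0`, the labels in each triplet collapse to a common value (1.0 and 4.0), which is the behavior the domain assignment relies on. All labels are updated simultaneously from the previous cycle's values, and other update orders would give different trajectories.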
