ProteinShop: A Tool for Protein Structure Prediction and Modeling

Silvia Crivelli
Computational Research Division
Lawrence Berkeley National Laboratory

The Protein Structure Prediction Problem

To determine how proteins, the building blocks of living cells, fold themselves into three-dimensional shapes that define the role they play in life.

Importance of Protein Structure Prediction

• The shape of a protein determines its function.
• Knowledge of structure is used in many ways:
  – Drug design
  – Design of synthetic proteins
  – Re-engineering defective proteins
• Genome projects are providing sequences for many proteins whose structure will need to be determined.

Protein Structures

[Figure: a chain of amino acids (Gly, Leu, Ser, Pro) with labels for amino acid, side chain (R), backbone, and H-bond; a second panel labels an α-helix and a β-sheet.]

• Proteins consist of a long chain of amino acids, the primary structure.
• The constituent amino acids may encourage hydrogen bonding that forms regular structures, called secondary structures.
• The secondary structures fold together to form a compact 3-dimensional shape, called the tertiary structure.

Ab Initio Approach

Our Goal: To provide an approach that relies more on physical principles than on information from known proteins.

The problem can be formulated as a global minimization problem, as it is assumed that the tertiary structure occurs at the global minimum of the free energy function of the primary sequence.

Ab Initio Method

Tertiary structure is believed to minimize potential energy:

min V_MM(x), where x = atom coordinates

Difficulties:
• Proposed energy function may not match nature
• O(e^(n^2)) local minima
• Very large parameter space, e.g., a modestly sized protein: 100 amino acids ~ 1,600 atoms ~ 4,800 variables

The Search Algorithm

Given the amino acid sequence of a protein, find the global minimum of the free energy function.
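A minimal sketch of why the search needs more than a local minimizer, under loudly stated assumptions: `toy_energy` is a hypothetical one-dimensional stand-in for the free energy function (it is not the V_MM of the slides), and the multistart pattern search is a generic textbook technique, not the group's actual algorithm. A run started in the wrong basin stays there, while trying several starting configurations gives a chance of reaching the global minimum. The helper `variable_count` only reproduces the slide's arithmetic of three coordinates per atom.

```python
import math

COORDS_PER_ATOM = 3  # x, y, z


def variable_count(n_atoms):
    """Number of optimization variables for n_atoms free atoms."""
    return COORDS_PER_ATOM * n_atoms


def toy_energy(x):
    """Hypothetical 1-D energy with several local minima (assumption)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(f, x, step=0.5, tol=1e-8):
    """Derivative-free pattern search: step to a lower neighbor,
    halve the step whenever neither neighbor is lower."""
    fx = f(x)
    while step > tol:
        for candidate in (x - step, x + step):
            f_candidate = f(candidate)
            if f_candidate < fx:
                x, fx = candidate, f_candidate
                break
        else:
            step *= 0.5  # no improvement at this scale, refine
    return x, fx


def multistart(f, starts):
    """Run the local search from every start and keep the lowest result."""
    return min((local_minimize(f, s) for s in starts), key=lambda r: r[1])


if __name__ == "__main__":
    print("variables for 1,600 atoms:", variable_count(1600))
    print("single start at 1.5:", local_minimize(toy_energy, 1.5))
    print("multistart:", multistart(toy_energy, [-3.0, -0.5, 1.5, 3.5]))
```

In this toy setting the single run from 1.5 settles in a higher-energy basin than the multistart result. With thousands of coordinates instead of one, the number of basins grows far too quickly for blind restarts, which is the motivation for choosing starting configurations carefully.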
Phase 1: Generate Starting Configurations
Phase 2: Global Optimization

Secondary Structure Predictions in Phase 1

Servers predict secondary structure likely to be in a target protein based on a large database of known proteins.

Sequence: SKIGIDGFGRIGRLVLRAALSCGAQ
Type:     CBBBBBCCCAAAAAAACCCBBBBBC
Weight:   1135522356789992888566733

Matching the predicted strands is a combinatorial problem

• Which strands are paired?
• Which orientation? Anti-parallel or parallel
• Which residues are paired? Odd or even

There are n!·2^(n-2) possible n-stranded motifs:
• 96 motifs for n = 4
• 960 motifs for n = 5

It takes weeks to create some of these configurations using constrained local minimizations!

Distribution of Beta Sheets in Proteins with Applications to Structure Prediction. Ruczinski, Kooperberg, Bonneau, and Baker, Proteins 48, 2002.

CASP4 Competition

• Fourth community-wide experiment on the Critical Assessment of Techniques for Protein Structure Prediction (2000)
• Our group predicted 8 proteins
• Largest protein had 240 aa
• Most complex fold had 2 β-strands

ProteinShop

• Interactive tool for protein manipulation
• Designed to quickly create initial configurations
• It takes weeks to create a number of configurations using constrained minimizations
• It takes a few hours to create the same configurations with ProteinShop

Phase 1 with ProteinShop

[Flowchart labels: Amino Acid Sequence; Secondary Structure Prediction; Phase 1 (Structure, Sequence, Geometry Generation, Pre-configuration, Direct Manipulation); Initial Configurations; Phase 2; Final Configuration.]

ProteinShop takes minutes to produce the initial configurations.

CASP4 Competition (before ProteinShop)
• Our group predicted 8 proteins
• Largest protein had 240 aa
• Most complex fold had 2 β-strands

CASP5 Competition (with ProteinShop)
• Our group predicted 20 proteins
• Largest protein had 417 aa
• Most complex fold had 13 β-strands

Phase 2

[Flowchart labels: Amino Acid Sequence; Phase 1; Initial Configurations; Phase 2: Global Optimization (Initial Configurations, Subspace Selection, Subspace Optimization, Candidate Selection, Final Configuration); Final Configuration.]

Takes months to converge using hundreds of processors on Seaborg!

Phase 2 with ProteinShop

[Flowchart labels: the Phase 2 flowchart above, with Monitoring System, Steering System, and Direct Manipulation added.]

Will reduce computation time.

Monitoring System

• Monitor progress of overall optimization/each optimization process
• Alert user to important events during optimization
  – A sudden drop in internal energy
  – A group of processes getting stuck
• Test new heuristics for expanding nodes of the tree

Steering System

• Change configurations during optimization to account for developments not anticipated during Phase 1
• Manipulate proteins that don't seem to be realistic or that are stuck in a local minimum
• Allow pruning of the optimization tree
• Assign multiple processes to a configuration that just had a drop in internal energy
• Assign stuck processes to other configurations

Plans for the Future

• Use of the monitoring and steering features to develop and test a new method for protein structure prediction
• Compete in CASP6 (Critical Assessment of Techniques for Protein Structure Prediction)
• Expand and enhance ProteinShop

ProteinShop

O. Kreylos, N. Max, B. Hamann, S. Crivelli, and W. Bethel. Interactive Protein Manipulation. Winner of the Best Application Award, IEEE Visualization 2003, Seattle.

Available to academic and non-profit organizations: proteinshop.lbl.gov
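The strand-matching count quoted for Phase 1, n!·2^(n-2) possible n-stranded motifs, is easy to check numerically. The sketch below only evaluates that formula and does not enumerate motifs. The function name `motif_count` is a hypothetical helper introduced here, and the restriction to n >= 2 is an assumption added so that the exponent stays non-negative.

```python
from math import factorial


def motif_count(n_strands):
    """Evaluate n! * 2**(n - 2), the motif count quoted on the slide.

    Assumption: at least two strands are needed to form a sheet.
    """
    if n_strands < 2:
        raise ValueError("need at least two strands")
    return factorial(n_strands) * 2 ** (n_strands - 2)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, motif_count(n))
```

For n = 4 and n = 5 this reproduces the 96 and 960 motifs given on the slide. The factorial growth is consistent with the deck's point that building each candidate configuration by constrained minimization quickly becomes impractical.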