Forces and Prediction of Protein Structure
Ming-Jing Hwang (黃明經)
Institute of Biomedical Sciences, Academia Sinica
http://gln.ibms.sinica.edu.tw/

Science 2005

Sequence - Structure - Function
MADWVTGKVTKVQ NWTDALFSLTVHAP VLPFTAGQFTKLGLE IDGERVQRAYSYVN SPDNPDLEFYLVTVP DGKLSPRLAALKPG DEVQVVSEAAGFFV LDEVPHCETLWMLA TGTAIGPYLSILR

Sequence/Structure Gap
Current (May 15, 2007) entries in protein sequence and structure databases:
- SWISS-PROT/TrEMBL: 267,354/4,361,897
- PDB: 43,459
[Figure: number of entries per year, sequence vs. structure]

Structural Bioinformatics: Sequence/Structure Relationship
[Figure: percent identity (0-100) relating all possible sequences of amino acids, protein sequences observed in nature, and protein structures observed in nature; the low-identity regions are labeled "Twilight zone" and "Midnight zone"]

Structure Prediction Methods
- Homology modeling
- Fold recognition
- Ab initio
[Figure: the three methods placed along a 0-100 % sequence identity axis]

Levinthal's paradox (1969)
If we assume three possible states for every flexible dihedral angle in the backbone of a 100-residue protein, the number of possible backbone configurations is 3^200. Even with an incredibly fast computational or physical sampling rate of one configuration per 10^-15 s, a complete sampling would take 10^80 s, which exceeds the age of the universe by more than 60 orders of magnitude. Yet proteins fold in seconds or less! (Berendsen)

Energy landscapes of protein folding
Borman, C&E News, 1998

Levitt's lecture for S*
[Figures: Levitt]

Other factors
- Formation of secondary-structure elements
- Packing of secondary-structure elements
- Topologies of fold
- Metal/co-factor binding
- Disulfide bond
- …

Ab initio/new fold prediction
- Physics-based (laws of physics)
- Knowledge-based (rules of evolution)
[Figures: Levitt]

Molecular Mechanics (Force Field)
[Figure: Levitt]

1-microsecond MD simulation
[Figure label: 980 ns]
- villin headpiece
- 36 a.a.
- 3000 H2O
- 12,000 atoms
- 256 CPUs (CRAY)
- ~4 months
- single trajectory
Duan & Kollman, 1998

Protein folding by MD
PROTEIN FOLDING: A Glimpse of the Holy Grail? Herman J. C. Berendsen
"The Grail had many different manifestations throughout its long history, and many have claimed to possess it or its like". We might have seen a glimpse of it, but the brave knights must prepare for a long pursuit.

Massively distributed computing
- SETI@home
- Folding@home
- Distributed folding
- Sengent's drug design
- FightAIDS@home
- …

Massively distributed computing
Letters to Nature (2002)
- engineered protein (BBA5)
- zinc finger fold (w/o metal)
- 23 a.a.
- solvation model
- thousands of trajectories, each of 5-20 ns, totaling 700 µs
- Folding@home
- 30,000 internet volunteers
- several months, or ~a million CPU days of simulation

Energy landscapes of protein folding
Borman, C&E News, 1998

Protein-folding prediction technique
CGU: Convex Global Underestimation - K. Dill's group

Challenges of physics-based methods
- Simulation time scale
- Computing power
- Sampling
- Accuracy of energy functions

Structure Prediction Methods
- Homology modeling
- Fold recognition
- Ab initio
[Figure: the three methods placed along a 0-100 % sequence identity axis]

Flowchart of homology (comparative) modeling
From Marti-Renom et al.

Fold recognition
- Find, from a library of folds, the 3D template that accommodates the target sequence best.
- Also known as "threading" or "inverse folding"
- Useful for twilight-zone sequences

Fold recognition (aligning sequence to structure)
(David Shortle, 2000)

3D->1D score

On X-ray, NMR, and computed models
(Rost, 1996)

Reliability and uses of comparative models
Marti-Renom et al. (2000)

Pitfalls of comparative modeling
- Cannot correct alignment errors
- More similar to template than to true structure
- Cannot predict novel folds

Ab initio/new fold prediction
- Physics-based (laws of physics)
- Knowledge-based (rules of evolution)

From 1D -> 2D -> 3D
- Primary: LGINCRGSSQCGLSGGNLMVRIRDQACGNQGQTWCPGERRAKVCGTGNSISAY VQSTNNCISGTEACRHLTNLVNHGCRVCGSDPLYAGNDVSRGQLTVNYVNSC
- Secondary (fragment): sequence-to-structure mapping
- Tertiary: fragment assembly

CASP Experiments

One group dominates the ab initio (knowledge-based) prediction
- One lab dominated in CASP4
- Some CASP4 successes
- Baker's group
- Ab initio structure prediction server

Toward High-Resolution de Novo Structure Prediction for Small Proteins
Philip Bradley, Kira M. S. Misura, David Baker (Science 2005)
The prediction of protein structure from amino acid sequence is a grand challenge of computational molecular biology. By using a combination of improved low- and high-resolution conformational sampling methods, improved atomically detailed potential functions that capture the jigsaw puzzle–like packing of protein cores, and high-performance computing, high-resolution structure prediction (<1.5 angstroms) can be achieved for small protein domains (<85 residues). The primary bottleneck to consistent high-resolution prediction appears to be conformational sampling.

3D to 1D?
Science 2003
A computer-designed protein (93 aa) with 1.2 Å resolution

Structure prediction servers
http://bioinfo.pl/cafasp/list.html

Hybrid approach for solving macromolecular complex structures

Thank You!
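
Worked check: Levinthal's paradox
The estimate on the Levinthal's paradox slide can be reproduced with a few lines of arithmetic. The sketch below is only an illustration, not part of the original talk. It assumes two flexible backbone dihedrals per residue (which is what 200 angles for 100 residues implies) and an age of the universe of roughly 4.35 x 10^17 s (about 13.8 billion years); both values and all names in the code are assumptions introduced here. Working in log10 keeps the numbers readable.

```python
import math

# Values taken from the slide
STATES_PER_ANGLE = 3        # possible states per flexible dihedral angle
RESIDUES = 100              # length of the protein
TIME_PER_SAMPLE_S = 1e-15   # one configuration sampled every 10^-15 s

# Assumptions introduced for this sketch (not stated in the talk)
ANGLES_PER_RESIDUE = 2          # two flexible backbone dihedrals per residue
AGE_OF_UNIVERSE_S = 4.35e17     # roughly 13.8 billion years, in seconds


def levinthal_estimate():
    """Return log10 of (configurations, sampling time in s, excess over the age of the universe)."""
    n_angles = RESIDUES * ANGLES_PER_RESIDUE
    log10_configs = n_angles * math.log10(STATES_PER_ANGLE)
    log10_time_s = log10_configs + math.log10(TIME_PER_SAMPLE_S)
    log10_excess = log10_time_s - math.log10(AGE_OF_UNIVERSE_S)
    return log10_configs, log10_time_s, log10_excess


if __name__ == "__main__":
    configs, time_s, excess = levinthal_estimate()
    print(f"configurations          ~ 10^{configs:.1f}")
    print(f"exhaustive sampling     ~ 10^{time_s:.1f} s")
    print(f"excess over universe age ~ 10^{excess:.1f}")
```

Under these assumptions the exhaustive search comes out near 10^80 s, which agrees with the slide's claim of more than 60 orders of magnitude beyond the age of the universe.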