Modelling proteins and proteomes using Linux clusters
Ram Samudrala
University of Washington

Examples of biological problems
- Protein structure prediction/docking simulations: need to run different trajectories that sometimes talk with each other
- Molecular dynamics simulations: need more cohesive parallelisation
- Polarisable force fields: need true parallelisation
- Bioinformatics searches/exploration: trivially parallelisable

Computational issues
- Need efficient methods to start/stop jobs
- Need a load-balancing queuing system
- Need fast communications at times
- Need stability (uptimes of months/years)
- Need low maintenance/management overhead
- Need low installation overhead
- Needs to be cheap!

Hardware and operating system
- 256 AMD and Intel CPUs (1-2.5 GHz)
- 0.5-1 GB RAM, 100-200 GB HD, dual-processor motherboards
- 100 Mbps ethernet connectivity for 64-processor sets
- White boxes are good but use up space – 1U racks ideal
- Minimal Linux installation – create clone “CD” – copy on all machines

Our solution
- No single solution – users implement their own
- Completely decentralised
- Analyse problem and determine parallelisable parts
- Implementation specific to problem
- Use local scratch space for computation
- Redundant storage of data for faster access
- Limit problem space to specific problems

Problem specific implementation
- MCSA/GA: socket-based communication of trajectories; multiple trajectories on different CPUs
- Docking: sample different ligands/regions of the protein on different CPUs
- MD: pairwise force-fields are additive
- PFF: ?
- Bioinformatics: trivial parallelisation; communication by disk

Modelling proteomes
Ram Samudrala
University of Washington

What is a “proteome”?
- All proteins of a particular system (organelle, cell, organism)

What does it mean to “model a proteome”?
For any protein, we wish to:
ANNOTATION
- figure out what it looks like (structure or form)
- understand what it does (function)
Repeat for all proteins in a system
Understand the relationships between all of them
EXPRESSION + INTERACTION

Protein folding
- DNA: …-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GUU-…
- protein sequence: …-L-K-E-G-V-S-K-D-… (each letter is one amino acid)
- unfolded protein: not unique, mobile, inactive, expanded, irregular
- spontaneous self-organisation (~1 second)
- native state: unique shape, precisely ordered, stable/functional, globular/compact, helices and sheets

De novo prediction of protein structure
- sample: sample conformational space such that native-like conformations are found
- select: hard to design functions that are not fooled by non-native conformations (“decoys”)
- astronomically large number of conformations: 5 states per residue, 100 residues = 5^100 ≈ 10^70

Semi-exhaustive segment-based folding
- sequence: EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK
- generate: fragments from database; 14-state φ,ψ model
- minimise: Monte Carlo with simulated annealing; conformational space annealing, GA
- filter: all-atom pairwise interactions, bad contacts; compactness, secondary structure

CASP5 prediction for T138: 4.6 Å Cα RMSD for 84 residues
CASP5 prediction for T146: 5.6 Å Cα RMSD for 67 residues
CASP5 prediction for T170: 4.8 Å Cα RMSD for all 69 residues
CASP5 prediction for T129: 5.8 Å Cα RMSD for 68 residues
CASP5 prediction for T172: 5.9 Å Cα RMSD for 74 residues
CASP5 prediction for T187: 5.1 Å Cα RMSD for 66 residues

Comparative modelling of protein structure
- scan
- align (* marks identical positions):
    KDHPFGFAVPTKNPDGTMNLMNWECAIP
    KDPPAGIGAPQDN----QNIMLWNAVIP
    ** * *   *  *     * * *   **
- de novo simulation
- build initial model: minimum perturbation
- refine: physical functions
- construct non-conserved side chains and main chains: graph theory, semfold

CASP5 prediction for T129: 1.0 Å Cα RMSD for 133 residues (57% id)
CASP5 prediction for T182: 1.0 Å Cα RMSD for 249 residues (41% id)
CASP5 prediction for T150: 2.7 Å Cα RMSD for 99 residues (32% id)
CASP5 prediction for T185: 6.0 Å Cα RMSD for 428 residues (24% id)
CASP5 prediction for T160: 2.5 Å Cα RMSD for 125 residues (22% id)
CASP5 prediction for T133: 6.0 Å Cα RMSD for 260 residues (14% id)

Prediction of SARS CoV proteinase inhibitors
Ekachai Jenwitheesuk

Computational aspects of structural genomics
- A. sequence space
- B. comparative modelling
- C. fold recognition
- D. ab initio prediction
- E. target selection
- F. analysis
[Figure omitted; legend label: targets. Figure idea by Steve Brenner.]

Computational aspects of functional genomics
- structure based methods: microenvironment analysis, structure comparison, homology
- + sequence based methods: sequence comparison, motif searches, phylogenetic profiles, domain fusion analyses
- + experimental data: single molecule + genomic/proteomic
- Bioverse: assign function to entire protein space
[Figure omitted; labels: “zinc binding site?”, “function?”]

Bioverse – explore relationships among molecules and systems
http://bioverse.compbio.washington.edu
Jason McDermott

Bioverse – prediction of protein interaction networks (Jason McDermott)
- Target proteome: protein A, protein B (predicted interaction)
- Interacting protein database: protein α, protein β (experimentally determined interaction)
- Similarity labels on the figure: 85%, 90%
- Assign confidence based on similarity and strength of interaction

Bioverse – E. coli predicted protein interaction network (Jason McDermott)
Bioverse – M. tuberculosis predicted protein interaction network (Jason McDermott)
Bioverse – C. elegans predicted protein interaction network (Jason McDermott)
Bioverse – H. sapiens predicted protein interaction network (Jason McDermott)

Bioverse – organisation of the interaction networks (Jason McDermott)
- C_i = 2n / (k_i (k_i - 1))

Bioverse – mapping pathways on the rice predicted network: defense-related proteins (Jason McDermott)
Bioverse – mapping pathways on the rice predicted network: tryptophan biosynthesis (Jason McDermott)
Bioverse – network-based annotation for C. elegans (Jason McDermott)
Bioverse – H. sapiens protein-protein similarity network (Jason McDermott)
Bioverse – viewer (Aaron Chang)

Future directions
- Network connection with multiple ethernet cards based on traffic analysis
- Gigabit ethernet (switches are still expensive)
- Better network filesystems

Take home message
- Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution

Acknowledgements
- Aaron Chang
- Ashley Lam
- Ekachai Jenwitheesuk
- Gong Cheng
- Jason McDermott
- Kai Wang
- Ling-Hong Hung
- Lynne Townsend
- Marissa LaMadrid
- Mike Inouye
- Stewart Moughon
- Shing-Chung Ngan
- Yi-Ling Cheng
- Zach Frazier

- National Institutes of Health
- National Science Foundation
- Searle Scholars Program (Kinship Foundation)
- UW Advanced Technology Initiative in Infectious Diseases
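Worked example: clustering coefficient

The formula on the "organisation of the interaction networks" slide, Ci = 2n/ki(ki-1), matches the standard clustering coefficient of a node in an undirected network, where ki is the number of neighbours of node i and n is the number of links that directly connect those neighbours to each other. The sketch below computes it for a toy network. The protein names, the `toy_network` data, and the choice to report 0 for nodes with fewer than two neighbours are assumptions made for illustration; none of this is taken from Bioverse itself.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """Return C_i = 2n / (k_i (k_i - 1)) for one node of an undirected graph.

    adjacency maps each node to the set of its neighbours.
    """
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        # The formula is undefined for k < 2; reporting 0 is an assumed convention.
        return 0.0
    # n = number of links that directly connect two neighbours of the node.
    n = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return 2.0 * n / (k * (k - 1))


# Hypothetical toy network: A interacts with B, C and D; B also interacts with C.
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
```

For node A, only one of the three possible neighbour pairs (B and C) is linked, so the coefficient is 2·1/(3·2) = 1/3. For node B, its two neighbours are linked to each other, so the coefficient is 1.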
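Worked example: Cα RMSD

The CASP5 results are reported as Cα RMSD in Å over a number of residues. As a minimal sketch of what that number measures, the function below takes two equal-length lists of Cα coordinates and returns the root-mean-square distance between corresponding atoms. It assumes the two structures have already been superimposed; the optimal superposition step that normally precedes the calculation is deliberately left out, and the function name and input layout are illustrative choices rather than the tools used for the predictions.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD in the input units between paired C-alpha coordinates.

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples.
    Assumption: the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between corresponding atoms.
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

If every atom in one structure is displaced by 3 Å in x and 4 Å in y relative to the other, each pair is 5 Å apart and the RMSD is 5 Å.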
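Worked example: independent simulated annealing trajectories

The "Problem specific implementation" slide lists MCSA/GA as running multiple trajectories on different CPUs. The sketch below illustrates only the simplest version of that idea: each trajectory is a Monte Carlo search with simulated annealing (Metropolis acceptance, geometric cooling) on a made-up one-dimensional energy function, and a process pool runs one trajectory per seed. The energy function, step size, cooling schedule, and all names are hypothetical, and the trajectories here do not communicate, whereas the slide describes socket-based communication between them.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def toy_energy(x):
    """Hypothetical 1-D energy landscape with several local minima."""
    return x * x + 10.0 * math.sin(3.0 * x)


def run_trajectory(seed, steps=2000, t_start=10.0, t_end=0.01):
    """One simulated annealing trajectory; returns (best_energy, best_x)."""
    rng = random.Random(seed)  # private generator, so each seed is reproducible
    x = rng.uniform(-5.0, 5.0)
    energy = toy_energy(x)
    best_energy, best_x = energy, x
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        candidate = x + rng.gauss(0.0, 0.5)
        candidate_energy = toy_energy(candidate)
        delta = candidate_energy - energy
        # Metropolis criterion: always accept downhill moves, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, energy = candidate, candidate_energy
            if energy < best_energy:
                best_energy, best_x = energy, x
    return best_energy, best_x


def run_all(seeds, workers=4):
    """Run one independent trajectory per seed on separate processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_trajectory, seeds))
```

Calling `min(run_all(range(8)))` from a script's main block would return the lowest-energy result across eight trajectories. Keeping the trajectories independent means no network traffic is needed until the results are collected.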