Computation in Biology
Nagasuma Chandra
Bioinformatics Centre & SERC, IISc

Next-generation biologists must straddle computation and biology

Hierarchical structures in living systems
Macromolecule, supramolecular assembly, organelle, cell, tissue, organ, organism

Genome sequence: a book of life (DOE-Genomes.org)

Examples from English text
genomicbiologytakesaholisticapproachtomolecularbiologyandev
olutionbystudyingthecompletegenomeitsgenesanditsproteinexpre
ssionpatternsncbiprovidesseveralgenomicbiologytoolsandresourc
esincludingorganismspecificpagesthatincludelinkstomanywebsite
sanddatabasesrelevanttothatspeciesweinviteyoutoexplorethelinks
providedonthispage.

The same text with word boundaries and punctuation restored:
Genomic biology takes a holistic approach to molecular biology and evolution by studying the complete genome, its genes, and its protein expression patterns. NCBI provides several genomic biology tools and resources, including organism-specific pages that include links to many web sites and databases relevant to that species. We invite you to explore the links provided on this page.

Molecular circuitry in the cell
Biochemical networks (www.expasy.ch)

Cellular networks
Characteristics of the yeast proteome: map of protein-protein interactions. H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N.
Oltvai, Nature, 411, 40-41 (2001)

Role of computation
• Data management
• Data analysis and interpretation
• Prediction
• Application

What you need
• A model
• A computational tool

Models
• Levels of modelling
• Abstraction level
• Hierarchy in living organisms
• Abstraction level of the model

Molecular models
• Sequences
• Structures

Genome sequences
The 'omics' era

Software tools
• Accelrys
• Tripos
• MOE
• BioSuite
• Schrodinger
• plus hundreds of academic software packages

What you can do …
Sequence space
• Determine the identity of the molecule
• Predict physicochemical properties
• Predict three-dimensional structure
• Predict function
• Apply in pharmaceutical and other industries

Examples
• Accelrys GCG
• MOE
• BioSuite

Example usage
Examples of GCG capabilities
• Sequence comparison
• Database searching and retrieval
• DNA/RNA secondary structure prediction
• Editing and publication
• Evolution
• Fragment assembly
• Gene finding and pattern recognition
• Sequence importing and exporting
• Mapping
• Primer selection
• Protein analysis

Single gene/protein sequence analysis: MOE
The colored bars over the sequences reflect the secondary structure of those sequences that have associated atomic coordinates. Chains with sequence-only data have no such bars. In this instance, seven of the chains in the family have structural data and can therefore be used as structural templates.
This image illustrates the Residue Identity matrix in MOE, which shows that chains 13 and 14 have the highest percent identity to the query sequence.

Whole genome sequence analysis: BioSuite

Structures
• Advantages of structural-level studies
• The protein folding problem
• Sequence-structure gap
• Need to predict structure using computational methods
• Applications

Four levels of protein structure

What you can do …
Structure space
• Visualize structures
• Build molecular models
• Manipulate
• Analyse
• Simulate molecular behaviour
• Apply in drug discovery

Visualization: Viewer module of InsightII
Interface elements: pulldowns, module icon, icon palette, command prompt, information area

Visualizations
• Ligand-protein interaction
• Aiding NMR structure determination
• Aiding crystal structure determination (X-ray crystallography)

Building molecular models
• Small molecules
• Protein / nucleic acid / carbohydrates
• Predicting protein structure: homology modelling, threading
• Modifications: site-directed mutants
• Protein-ligand complexes

BIOPOLYMER
The Biopolymer module provides tools for building and modifying a wide range of biological macromolecules, including proteins, peptides, nucleic acids, and carbohydrates. It is useful in:
• Building proteins and peptides
• Structural domain analysis
• Building carbohydrates
• Building nucleic acids
• Structural database searching
Structures built with this module can later be used by other programs for structure refinement and analysis of small and large molecules.
Backbone structure of the C-terminal fragment of E. coli 50S ribosomal protein (in yellow), predicted from the carbon trace using the Protein/Backbone command of the Biopolymer module. The crystallographic backbone structure is shown superimposed in blue. The RMS deviation between corresponding backbone atoms of the two structures is 0.52 Å.

Manipulations
E.g., conformation tweaking (figure labels: HIS_229, ASP_187)
The following images are examples of this method of predicting conformations of a few long sidechains of PDB protein 1IC6.A. In each of the following figures, the native conformation is shown colored by element. In the left image, the predicted rotamer (the rotamer with the lowest deltaG) is shown in white. In the right image, all other rotamers generated by the conformational search are shown.

MODELER
MODELER uses a comparative modeling methodology to rapidly build structural models for protein sequences without a known structure.
It derives 3D protein models without the time-consuming separate stages of core region identification and loop region building or searching that are inherent to manual homology modeling schemes. MODELER can create a model even with only one source protein.
In this case, the structure of dihydrofolate reductase from Lactobacillus casei is used to generate a model for the E. coli protein. The model has an RMS deviation of 2.2 Å from the crystal structure of the E. coli protein.

PROFILES-3D
Profiles-3D offers a unique approach to structure prediction by measuring the compatibility between protein sequences and known protein structures, and then using this information to address the inverse protein folding problem. Profiles-3D enables you to investigate which particular fold an amino acid sequence is likely to adopt.
Benefits:
• Profiles-3D can test the validity of a model or preliminary structures derived from experimental data or modeling studies.
• Profiles-3D can suggest which 3D structure an amino acid sequence is likely to adopt by relating structural properties to amino acid sequence information.
• Reference template proteins identified by Profiles-3D can be used as input to the InsightII Homology and MODELER modules.
This image shows the result of a "Profiles-3D Verify" run: a ribbon drawing of a model of myoglobin in which a single alpha-helix has been purposely misfolded. Profiles-3D has detected the misfolded region, and Insight II has automatically created the subset that was used to color the structure and ribbon.

MATCHMAKER
MatchMaker uses an inverse-folding method to predict the 3D structure of a protein from its amino acid sequence. By comparing a new protein sequence to its topology fingerprint database, MatchMaker assesses the ability of a sequence to adopt characteristic topologies. Even in the absence of strong sequence similarity, MatchMaker generates high quality structural models.
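The RMS deviation figures quoted above for model quality (for example, 2.2 Å for the MODELER model) can be illustrated with a minimal Python sketch. It assumes the two structures are already superimposed and that their atoms correspond one-to-one; the function name `rmsd` and the toy coordinates are hypothetical, and modelling packages typically perform an optimal superposition before reporting this value.

```python
from math import sqrt

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and the atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))

# Made-up coordinates (in Å) for three paired backbone atoms, not real data.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
crystal = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]

print(f"RMSD = {rmsd(model, crystal):.2f} Å")
```

A lower value means the model backbone lies closer to the reference structure; identical coordinate sets give 0.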
Examples of MatchMaker output, including a histogram of sequence-structural compatibility (upper right), a sub-optimal alignment plot (upper left), an energy profile (middle left), and a prediction of structural elements (helix/beta strand, buried/exposed) for the input sequence.

Simulations: 'Discover'

Analysis
• Protein characterization
• Protein comparison
• Sequence-structure-function relationships
• Active site detection
• Ligand binding mode analysis
• Electrostatic analysis

Structure analysis: quality check

PROTABLE
ProTable is used to analyze and evaluate protein structures. ProTable creates Ramachandran plots, assesses deviation of local geometries and side chain rotameric states from standard protein values, and determines the energetics of each residue.
These images show the results of a ProTable evaluation of a theoretical model of prostate-specific antigen (2PSA). MatchMaker energies reveal a loop (highlighted in green) that may require further refinement. Structures are colored by probability (purple and blue are low probability; orange and red are high probability). An automated Ramachandran analysis (right) identifies backbone torsions in borderline or disallowed regions.

DELPHI
DelPhi is a powerful and versatile Poisson-Boltzmann electrostatics simulation engine. DelPhi gives you the ability to determine the specificity of ligand-receptor interactions, which aids in accelerating drug discovery.
DelPhi calculates electrostatic properties, including the effects of bulk solvent and ionic strength, for nucleic acids, polysaccharides, and complexes such as glycoproteins and protein/DNA.
HIV protease, rendered with an electrostatic contour surface with a stick rendering of the drug inside the surface. Blue is positive charge, red is negative charge, and gray is neutral.

Applications: Drug Discovery

SITEID
SiteID provides analysis and visualization tools leading to the identification of potential binding sites within or at the surface of biological targets.
Applications:
• Locate ligand binding pockets on a macromolecule.
• Identify protein-protein interaction surfaces.
• Identify constraints in a novel protein structure for 3D database searching to find or optimize lead compounds.
The binding pocket of dihydrofolate reductase located by SiteID and shown as a MOLCAD surface. The red areas of the surface indicate contact atoms in the pocket, while the yellow areas show the residues in which those atoms are contained. The inhibitor (methotrexate) is shown in green.

STRUCTURE BASED DESIGN TOOLS
Active Site Detection: MOE uses a fast geometric algorithm, based on Edelsbrunner's alpha shapes, to detect candidate protein-ligand and protein-protein binding sites. Individual sites can be visualized or populated with "dummy atoms" for docking calculations or as starting points for de novo ligand design efforts.
Left: PDB 1AAQ (HIV-1 Protease) and the first site located by the MOE Site Finder. Middle: 1AAQ with the complexed ligand (hydroxyethylene isostere). Right: hydroxyethylene isostere overlaid with calculated alpha spheres of the first site.

FLEXX
FlexX rapidly docks a conformationally flexible ligand into a binding site, using an incremental construction algorithm that builds the ligand in the active site. FlexX is composed of four basic components:
• Conformational flexibility.
• Set of possible protein-ligand interactions.
• Scoring function for the interactions.
• Algorithm for placement and incremental growth of the ligand from a defined core.
A set of inhibitors docked into the active site of Carboxypeptidase A by FlexX. The protein backbone and the active site surface were rendered using MOLCAD. The active site surface is color-coded by electrostatic potential.

RACHEL
RACHEL performs automated combinatorial optimization of lead compounds by systematically derivatizing user-defined sites on the ligand.
Applications:
• Combinatorially enumerate user-defined sites on a lead scaffold to optimize binding within a receptor.
• Bridge high-affinity ligand fragments positioned within the active site.
The X-ray structure of N9 influenza virus neuraminidase (2QWK) shown with five ligands generated using RACHEL that are predicted to be active. Hydrogen bonds between the ligands and residues are indicated by dashed yellow lines. The surface was rendered using MOLCAD. Dark purple regions contain a greater acceptor/donor density, and light purple regions indicate areas where hydrogen bonding is less likely to occur.

HIGH THROUGHPUT DISCOVERY TOOLS
HTS-QSAR: CCG's unique Binary QSAR methodology is ideal for building pass/fail models from high error content data and standard molecular descriptors. The resulting probabilistic models (based on Bayesian statistical inference) are used as a biasing agent in the design of focused combinatorial libraries.

CHEMINFORMATICS TOOLS
Molecular Databases: The MOE Molecular Database is a disk-based spreadsheet central to the manipulation and visualization of large collections of compounds. Data can be imported and exported in various standard file formats and merged with structural or biological activity data.

MOLECULAR DATABASE VIEWER

MOLECULAR DATABASE CALCULATOR

SEARCH COMPARE
Search Compare provides systematic conformational search and analysis as well as superimposition and molecular similarity.
Using Search Compare, two angiotensin II antagonists are flexibly superimposed based on the field similarity (combined steric and electrostatic potentials).

UNITY
Unity locates compounds in databases that match a pharmacophore or fit a receptor site.
Applications:
• Exploration of databases for compounds consistent with a pharmacophore hypothesis
• Lead explosion by retrieving similar compounds
• Virtual screening of compound databases to discover lead compounds
• Determining reagents in commercial databases that support combinatorial chemistry synthesis
A UNITY query constructed at the active site of the streptavidin/biotin complex (1STP). Yellow lines originate at hydrogen bonding sites of the protein (shown as spheres) and terminate within the spatial constraint for complementary ligand sites. A surface constraint at the protein/ligand interface is shown in green. The spatial cap in red accounts for a bifurcated interaction with an Asp carboxyl. Partial match groups are shown in different colors: red, yellow, or green.

CATALYST/SHAPE
Catalyst/SHAPE identifies compounds that possess similar 3D shapes to a specified 3D conformation.
FEATURES:
• Performs flexible shape-based database searches.
• Performs statistical analysis of shape indices of a particular database.
• Simultaneously performs shape and pharmacophore searches via a merged query.
Methotrexate, a dihydrofolate reductase inhibitor, is displayed (left: hydrogens removed) in its enzyme-bound conformation. On the right are 3D compounds retrieved from Derwent's World Drug Index that best fit the shape of the bound conformation of methotrexate. This shape-based 3D search was performed with Accelrys' Catalyst/SHAPE.

HYPOGEN
Given only available experimental information such as 2D structures and biological activities of a set of molecules, Catalyst can be used to generate general interaction hypotheses that explain variations in activity across a set of molecules.
Two 5HT3 antagonists (green and yellow) mapped on to a six-feature hypothesis.
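The idea of searching a database for compounds that match a pharmacophore, as described for UNITY and Catalyst above, can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the algorithm or query language of either package: the query is a hypothetical three-point pharmacophore with made-up target distances, each compound is represented by a single rigid conformer, and all names (`QUERY_TYPES`, `matches_query`, `compound_A`) are invented for the example.

```python
from itertools import permutations
from math import dist

# Hypothetical three-point query: feature types plus target distances in Å.
QUERY_TYPES = ("donor", "acceptor", "hydrophobe")
QUERY_DISTANCES = {(0, 1): 5.0, (0, 2): 6.5, (1, 2): 4.0}
TOLERANCE = 0.5  # allowed deviation in Å (assumed value)

def matches_query(features):
    """Return True if some assignment of features satisfies the query.

    features: list of (feature_type, (x, y, z)) tuples for one rigid conformer.
    """
    for combo in permutations(features, len(QUERY_TYPES)):
        # The feature types must line up with the query, in order.
        if any(f[0] != t for f, t in zip(combo, QUERY_TYPES)):
            continue
        # Every pairwise distance must fall within the tolerance.
        if all(abs(dist(combo[i][1], combo[j][1]) - d) <= TOLERANCE
               for (i, j), d in QUERY_DISTANCES.items()):
            return True
    return False

# Made-up feature coordinates for two imaginary compounds.
hit = [("donor", (0.0, 0.0, 0.0)),
       ("acceptor", (5.0, 0.0, 0.0)),
       ("hydrophobe", (5.125, 4.0, 0.0))]
miss = [("donor", (0.0, 0.0, 0.0)),
        ("acceptor", (5.0, 0.0, 0.0)),
        ("hydrophobe", (20.0, 0.0, 0.0))]

database = {"compound_A": hit, "compound_B": miss}
hits = [name for name, feats in database.items() if matches_query(feats)]
print(hits)
```

Production tools additionally handle ligand flexibility, partial matches, and excluded-volume or surface constraints, none of which this sketch attempts.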
C2•LIGANDFIT
C2•LigandFit provides active site finding, flexible docking, and scoring capabilities, allowing evaluation of compounds against a receptor site.
Features:
• Active site search by flood filling method
• Fast conformational search for ligand in protein cavity
• Fast grid method for evaluation of protein-ligand interactions
• Clustering of docked conformers
• Multiple scoring functions
Active site identification for HIV Protease using the C2•LigandFit flood filling technique.

C2•ADME TOOL
C2•ADME provides computational models for the prediction of absorption, distribution, metabolism, and excretion (ADME) properties derived from chemical structures.
Features:
• C2•ADME provides computational ADME/Tox prediction tools with the ability to predict problematic New Chemical Entities at an early stage of the development process.
• C2•ADME currently includes models for passive intestinal absorption, blood-brain barrier (BBB) penetration, and aqueous solubility at 25°C.
Plot of Polar Surface Area (PSA) vs. LogP for a sample of the World Drug Index (WDI) database showing the 95% and 99% confidence limit ellipses corresponding to the Absorption Model. The points are color-coded by absorption level (Good, Moderate, Poor, and Very Poor).

In-built utilities
• Scripting: automation
• Session folders
• Log files

What you should remember …
Good computational practices
• Other users are as important as yourself
• Do not use up licenses unduly
Preparation
• Evaluate protocol, choice of package, follow job submission rules

Access details
• Insight / Catalyst / Cerius: SGI machines, base modules, several licenses
• Tripos: SGI machines
• MOE: Linux platform / Windows / SGI
• BioSuite: Linux