http://creativecommons.org/licenses/by-sa/2.0/

Integrating the Data
Prof: Rui Alves
[email protected]
973702406
Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08
Website of the Course: http://web.udl.es/usuaris/pg193845/Courses/Bioinformatics_2007/
Course: http://10.100.14.36/Student_Server/

Outline
• Methods for reconstruction of functional protein networks
– Why is it important?
• Methods for reconstruction of physical protein interactions

Proteins do not work alone!

Finding the social environment of a protein
• Finding out what a protein does is not enough
– Reductase, ok, but of what? (super-mouse)
• There is an incredible amount of information available regarding the biology of many organisms
– Sequences, omics, pathways, etc…

Integrating the information is important for network reconstruction
• If we can integrate all the information available for a given protein/gene, then we are likely to be able to predict its social network
• From there to reconstructing the causal set of interactions in the network, there is only a step
– Who does what to whom

Methods for network reconstruction
• Mapping genes onto known pathways
– If a gene is orthologous to genes in other organisms for which we know the pathways and circuits, then we can assume that it works in the same circuit in the new organism

Find a gene in a new genome
[Diagram: sequence of ste20 → orthologue gene in a sequenced genome → Ste20 in the new organism → reconstruct the same pathway in the new organism]

Methods for network reconstruction
• Mapping genes onto known pathways
• Using text analysis
– Scientific literature has accumulated over centuries now
– No one can know everything and read everything
– However, information is buried in there
– Mining that information can assist in network reconstruction

Publication databases are a source of information

Meta text databases create network models from publication analysis

iHOP is a sophisticated context analysis engine

How does meta-text analysis create networks?
[Diagram: your genes (e.g. Ste20) are entered into a server running scripts/a program. The program searches a literature database and returns a list of entries mentioning your gene. A gene names database supplies the gene list, and a language rules database (e.g. activate, inhibit, rescue) supplies the rule list.]

Problems with this set up
• Delay with respect to available information
• Disregards a lot of information available over the web

Text Miner will address this

Text Miner

Things to do
• Statistical significance
– Internal controls
– Overall controls
• Sentence mining
– Definition of an action-word ontology to help automated function mining
• Graphical drawing
– Allowing for mouse drag and drop
• Selector for interactions that are to be trusted and included in the model

Problems with this set up
• Slow: analysis and document retrieval are done live
– In the future there will be an option so that, if a search has been done by someone before, the user will be able to reuse it instead of doing a live search
• There is more “junk info”
– However, you can control that by selecting the sources of information you want to use

Methods for network reconstruction
• Mapping genes onto known pathways
• Meta text analysis
• Evolutionary based protein interaction prediction
– Proteins that work together (i.e. belong to the same close social network) evolve together
– Ergo, proteins that show co-evolution are likely to work together

Proteins that have coevolved share a function
• If protein A has co-evolved with protein B, they are likely to be involved in the same process
• Looking for proteins that coevolved will help predict the social networks of proteins
• There are many methods to look for co-evolution of proteins
– Phylogenetic profiling, gene neighbourhoods, gene fusion events, phylogenetic trees…

Using phylogenetic profiles to predict protein interactions
[Diagram: your sequence (A) is submitted to a server/program, which searches a database of proteins in fully sequenced genomes and asks, for the target genome, whether there is a homologue in genome 1, in genome 2, … The presence (Y) or absence (N) of each protein (A, B, C, …) in each genome is kept in a database of profiles for each protein in each organism, and a coincidence index is calculated (terms shown: i/number of genomes, j/number of genomes). Example scores: A 1, C 0.9, B 0.11.]
• Proteins (A and C) that are present and absent in the same set of genomes are likely to be involved in the same process and therefore interact
• Similarly, if protein A is absent in all genomes in which protein B is present, there is a likelihood that they perform the same function!

Phylogenetic coincidence server
• We have one that will be up in a few months for yeast, coli, man, chimp, candida and xanthus.

Synteny/Conservation of gene neighborhoods
[Diagram: relative positions of the genes for proteins A, B, C and D in genome 1, genome 2, genome 3, …]
Which of these proteins interact?
• Proteins A and B are in a conserved relative position in most genomes, which is an indication that they are likely to interact

Gene fusion events
[Diagram: genes for proteins A, B, C and D in genome 1, genome 2, …]
Which of these proteins interact?
• Proteins A and B have suffered gene fusion events in at least some genomes, which is an indication that they are likely to interact
[Diagram, continued: genome 3 with proteins B, A, C and D, …]

Building phylogenetic trees of proteins
[Diagram: proteins A, B, C and D in genome 1 and their homologues in genome 2, genome 3, …]
• Phylogenetic trees represent the evolutionary history of homologue genes/proteins based on their sequence
• Get the sequences of all homologues, align them and build a phylogenetic tree

Similarity of phylogenetic trees indicates interaction between proteins
[Diagram: trees for protein A (A1, A2, A3, …), protein B (B1, B2, B3, …), protein C (C1, C2, C3, …) and protein D (D1, D2, D3, …)]
• Proteins A and B have similar evolutionary trees and thus are likely to interact

Protein/Gene interactions
• Often, people use these methods to say that genes or proteins interact.
• The methods previously described cannot be used accurately to describe PHYSICAL interaction
• When people say "interact" in this context, one is forced to assume FUNCTIONAL (not necessarily physical) interaction, unless more info is available

Methods for network reconstruction
• Mapping genes onto known pathways
• Using meta text analysis
• Using phylogenetic profiling
• Using omics data
– If two proteins/genes have evolved to perform a function in the same process, it is likely that their activity and gene expression are co-regulated
– Conversely, if proteins/genes are co-regulated, then they are likely to participate in the same process

Predicting gene functional interactions using micro array data
[Diagram: cells with and without the stimulus → purify cDNA from each population → compare cDNA levels of corresponding genes in the different populations → genes overexpressed as a result of the stimulus, genes underexpressed as a result of the stimulus, genes with expression independent of the stimulus → group of genes/proteins involved in the response to the stimulus]

Gene network reconstruction
• Reconstruction of gene networks based on micro-array data is a very difficult endeavor
• It
is an inverse problem, meaning that there is usually more than one solution that fits the data
• Pioneer groups used either Petri nets (e.g. Somogyi, Finland) or mathematical models (Okamoto, Japan)

Predicting protein functional interactions using mass spec data
[Diagram: cells with and without the stimulus → purify proteins from each population → identify proteins and compare protein profiles/levels in the different populations → proteins present as a result of the stimulus, proteins absent as a result of the stimulus, proteins present in both conditions → group of proteins involved in the response to the stimulus]

Protein network reconstruction
• Reconstruction of protein networks based on mass spec proteomics data is still very immature.
• To my knowledge, no paradigmatic, large scale example of it has yet been done

Regulation of gene expression
• Predicting which TFs regulate gene expression is an important part of reconstructing biological circuits of interest
• Omics data and bioinformatics can also be used to do this

Predicting regulatory modules with ChIP-chip experiments
[Diagram: cells → crosslink protein/DNA → break DNA → affinity purification of the transcription factor → reverse crosslink and purify the DNA pieces bound to the TF; in parallel, break DNA → reverse crosslink and purify DNA pieces; compare in microarray → derive consensus sequences for TF binding sites → scan new genomes for TF regulatory modules]

Predicting protein activity modulation with NMR/IR/MS metabolomics
[Diagram: cells with and without the stimulus → measure metabolites → compare changes in metabolic levels to infer changes in protein activity]

Incorporating metabolomics information
• These changes can be incorporated into mathematical models and these models can then be used predictively

Methods for network reconstruction
• Mapping genes onto known pathways
• Using meta text analysis
• Using phylogenetic profiling
• Using omics data
• Using protein interaction data
– Large scale protein interaction data sets are available
– If proteins physically interact, it is likely that they work together
in the same network

Predicting protein networks using protein interaction data
[Diagram: your sequence (A) is submitted to a server/program, which queries a database of protein interactions and builds up a network of proteins A, B, C, D, E, F. Continue until you are satisfied or have completed the network.]

Outline
• Methods for reconstruction of functional protein networks
• Methods for reconstruction of physical protein interactions

How do proteins work within the network?
• Assume we now have the network our protein is involved in.
• How do we further analyze the role of the protein?

Proteins work by binding
[Diagram labels: DNA, Effect]

Proteins work by binding! So what?
So, if we can predict how proteins DOCK to their ligands, then we will be able to understand how the binding allows them to work systemically
Design drugs to overcome mutations in binding sites
Design proteins to prevent/enhance other interactions

What is in silico protein docking?
• Given two molecules, find their correct association using a computer:
[Diagram: the two molecules and their association; labels: T, =, +]

What types of in silico docking exist?
• Sequence Based Docking: in silico two hybrid docking
[Diagram: aligned sequences of protein A and protein B across species]
Protein A              Protein B
E. coli    AGGMEYW….   E. coli    VCHPRIIE….
S. typhi   AA – CDWY…  S. typhi   VCH -KIIE…
…          …           …          …
Y. pestis  AGG –DYW    Y. pestis  VCH –KIIE…
Pearson correlation between columns of the protein A alignment (A G G … D …) and columns of the protein B alignment (V C H P K I I E…)
D/K or E/R may be involved in a salt bridge

What types of in silico docking exist?
• Sequence Based Docking
• In silico structural protein docking

Structure based docking
• Protein-Protein docking
– Rigid (usually)
• Protein-Ligand docking
– Rigid protein, flexible ligand
• Very demanding on computational resources

Structural docking in a nutshell
• Scan molecular surfaces of protein for best surface fit
– First steric, then energetics
– Can (and should) include biologically relevant information (e.g.
residue X is known from mutation experiments to be involved in the docking → discard any docking not involving this residue)

Atom based docking
• First, a surface representation is needed
[Diagram: Van der Waals surface, solvent accessible surface, Accessible (Connolly) surface]

Calculating the best docking
• Scan molecular surfaces of protein for best surface fit
– Calculate the position where the largest number of atoms fit together, factor in energy + biology and rank solutions according to that

Grid-based techniques
• Grid-based techniques
– Alternative to calculating protein atom / ligand atom interactions
– More efficient (number of grid points < number of atoms)

Grid based docking
[Diagram: place grid over protein; calculate intermolecular forces for each grid point; Score 1, Score 2, Score 3, Score 4]

The docking function
• There are many and none is the best for all cases
• Scores will depend on the exact docking function you use

A docking function for surface matching
• Molecules a, b placed on an l × m × n grid
$$a_{l,m,n},\ b_{l,m,n} = \begin{cases} 1 & \text{on the surface of the molecule} \\ \ldots & \text{inside the molecule} \\ 0 & \text{outside the molecule} \end{cases}$$
• Match surfaces
$$C_{\text{step1},\,\text{step2},\,\text{step3}} = \sum_{l'=1}^{N} \sum_{m'=1}^{N} \sum_{n'=1}^{N} a_{l',m',n'} \; b_{l'+\text{step1},\; m'+\text{step2},\; n'+\text{step3}}$$
• Fourier transform makes calculation faster
• Tabulate and rank all possible conformations

A docking function for electrostatics
• There are many
• They use different force field approximations to calculate the energy of electrostatic interactions
• The basics:
$$E_{\text{electrostatic}} = \int \left[\rho_a(r_a) + \rho_b(r_b)\right] \left[\phi_a(r_a) + \phi_b(r_b)\right] dV$$
– ρ_a, ρ_b: charge distributions for the proteins
– φ_a, φ_b: potentials for the proteins

The full docking function
• Calculates a relative binding energy that integrates electrostatic and shape matching factors.
For example:
$$E_{\text{tot}} = c_{\text{electrostatic}}\, E_{\text{electrostatic}} + c_{\text{shape matching}}\, E_{\text{shape matching}}$$

Overall process of docking
[Diagram: Mol 1, Mol 2 → rigid body energy calculation, Energy(form matching, electrostatics) over positions p_{1,i}, p_{2,j} for all i, j → list of complexes → re-rank using statistics of residue contact, H-bonds, biological information, etc. → re-rank using rotamers, flexibility in protein backbone angles, molecular dynamics, etc. → final list of solutions]

Summary
• Methods for reconstruction of functional protein networks
– Bibliomics
– Genomics
– Phenomics, etc.
• Methods for reconstruction of protein interactions
– Sequence based
– Structure based

The overall picture

Grid-based techniques
• Grid-based techniques
– Notes:
• Grids spaced <1 Å
– Results show very little change in error for grid spacings between 0.25 and 1 Å

Problem Importance
• Computer aided drug design: a new drug should fit the active site of a specific receptor.
• Many reactions in the cell occur through interactions between the molecules.
• No efficient techniques for crystallizing large complexes and finding their structure.
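
Worked sketch: grid-based surface matching with a Fourier transform
The surface-matching docking function scores every relative shift of molecule b against molecule a on a grid, and the slides note that a Fourier transform makes this faster. Below is a minimal Python/NumPy sketch of that idea, not the course's implementation. The function names (surface_match_scores, brute_force_scores), the 8 × 8 × 8 grid and the plain 0/1 occupancy values are illustrative assumptions. Only translations are scanned, and shifts wrap around the grid edges.

```python
import numpy as np


def surface_match_scores(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return C[s1, s2, s3] = sum over grid points x of a[x] * b[x + s].

    Shifts wrap around the grid edges (circular correlation), so a real
    application would first pad both grids with empty cells.
    """
    fa = np.fft.fftn(a)
    fb = np.fft.fftn(b)
    # Correlation theorem: correlation = inverse FFT of conj(FFT(a)) * FFT(b).
    return np.fft.ifftn(np.conj(fa) * fb).real


def brute_force_scores(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Same scores computed shift by shift, used to check the FFT version."""
    scores = np.zeros(a.shape)
    for shift in np.ndindex(*a.shape):
        # np.roll with negative shifts gives shifted[x] = b[x + shift].
        shifted = np.roll(b, tuple(-s for s in shift), axis=(0, 1, 2))
        scores[shift] = np.sum(a * shifted)
    return scores


# Toy example (assumed data): molecule a is a small box of occupied cells,
# molecule b is the same box moved by (2, 1, 3) grid steps.
grid_a = np.zeros((8, 8, 8))
grid_a[1:4, 2:5, 1:3] = 1.0
grid_b = np.roll(grid_a, (2, 1, 3), axis=(0, 1, 2))

scores = surface_match_scores(grid_a, grid_b)
best_shift = np.unravel_index(np.argmax(scores), scores.shape)
print("best shift:", tuple(int(s) for s in best_shift))  # best shift: (2, 1, 3)
```

The FFT version computes all shifts at once, while the brute-force loop recomputes the sum for each shift; on this toy grid both give the same table of scores, which can then be ranked. A fuller search would also have to handle rotations and distinct surface/interior values, which this sketch leaves out.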