Computational Exploration of Metabolic Networks with Pathway Tools
Part 2: APIs & Examples
Randy Gobbel, Ph.D.
Bioinformatics Research Group, SRI International
[email protected]
http://BioCyc.org/

Computing with Pathway Tools: APIs
• Generic functions with a consistent naming scheme
• Basic frame access functions
• Built-in functions for analysis and global statistics
• Simultaneous access to multiple KBs
• Cross-species comparisons
• Specialized KBs
  • MetaCyc
  • SchemaBase

Computing with Pathway Tools: APIs
• PerlCyc interface
  • Library of Perl functions for querying PGDBs via socket connection
  • Database access functions
    • Select_Organism, All_Pathways
  • Functions for performing inference / hardwired queries
    • Genes_Of_Reaction, Genes_Of_Pathway
    • Transcription_Unit_Transcription_Factors
    • Enzyme_P
• JavaCyc interface also in progress
• http://aracyc.stanford.edu/~mueller/perlcyc/
• Lisp API
• http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Perlcyc and Javacyc
• Interface to running Pathway Tools image through TCP
• Names are translated to Perl and Java conventions
• Object references are supported by means of unique frame names

Pathway Tools API Functions
• get_class_all_instances(Class)
  • Returns the instances of Class
• Key Pathway Tools classes:
  • Genetic-Elements
  • Genes
  • Proteins
  • Polypeptides
  • Protein-Complexes
  • Pathways
  • Reactions
  • Compounds-And-Elements
  • Enzymatic-Reactions
  • Transcription-Units
  • Promoters
  • DNA-Binding-Sites

Pathway Tools API Functions
• Notation Frame.Slot means a specified slot of a specified frame
• get_slot_value(Frame Slot)
  • Returns first value of Frame.Slot
• get_slot_values(Frame Slot)
  • Returns all values of Frame.Slot
• slot_has_value_p(Frame Slot)
  • Returns true if Frame.Slot has at least one value
• member_slot_value_p(Frame Slot Value)
  • Returns true if Value is one of the values of Frame.Slot

Additional Pathway Tools Functions – Semantic Inference Layer
• Built-in functions encode commonly used queries that compute indirect DB relationships
  • genes_of_pathway, substrates_of_pathway
  • all_transcription_factors, regulon_of_protein
• See http://bioinformatics.ai.sri.com/ptools/ptoolsfns.html for more information

Computing with Pathway Tools: Flat Files
• Two file formats: tab-delimited, attribute-value
• One file for each format, each datatype
• Specification: http://bioinformatics.ai.sri.com/ptools/flatfile-format.html
• Examples:
  • Pathways.col – Pathways and genes encoding enzymes
  • Enzymes.col – Enzymes and reactions they catalyze
  • Pathways.dat – Full data on each pathway
  • Reactions.dat – Full data on each reaction

Example Flat File
    UNIQUE-ID - P107-PWY
    TYPES - Energy-Metabolism
    COMMON-NAME - RuMP cycle and formaldehyde assimilation
    REACTION-LIST - FORMATEDEHYDROG-RXN
    REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
    REACTION-LIST - 6PGLUCONDEHYDROG-RXN
    REACTION-LIST - R84-RXN
    REACTION-LIST - PGLUCISOM-RXN
    REACTION-LIST - R12-RXN
    REACTION-LIST - R10-RXN
    SYNONYMS - ribulose-monophosphate cycle
    SYNONYMS - formaldehyde oxidation
    //

Example Flat File – Reactions.dat
    UNIQUE-ID - R84-RXN
    TYPES - EC-1.1.1
    EC-NUMBER - 1.1.1.
    IN-PATHWAY - P122-PWY
    IN-PATHWAY - P107-PWY
    LEFT - GLC-6-P
    LEFT - NAD
    OFFICIAL-EC? - NO
    RIGHT - 6-P-GLUCONATE
    RIGHT - NADH
    RIGHT - PROTON
    //

Example Flat File – Compounds.dat
    UNIQUE-ID - GLC-6-P
    TYPES - Carbohydrate-Derivatives
    COMMON-NAME - glucose-6-phosphate
    CAS-REGISTRY-NUMBERS - 56-73-5
    CHEMICAL-FORMULA - (C 6)
    CHEMICAL-FORMULA - (H 13)
    CHEMICAL-FORMULA - (O 9)
    CHEMICAL-FORMULA - (P 1)
    MOLECULAR-WEIGHT - 260.137
    SYNONYMS - D-glucose-6-P
    SYNONYMS - glucose-6-P
    SYNONYMS - α-D-glucose-6-phosphate
    SYNONYMS - α-D-glucose-6-P
    SYNONYMS - D-glucose-6-phosphate
    //

Bioinformatics Results: Algorithms
• Query and visualization environment for genome and pathway information
• PathoLogic algorithm predicts the metabolic network of an organism from its genome
• Algorithm for global characterization of a metabolic network
• Algorithms under development for qualitative modeling of the cell

The Pathway Tools KB as a "virtual cell"
• Detailed representation of proteins, including subunits
• Protein complexes and modifications
• Links from genome, through proteins, to pathways and superpathways

Computing with the Metabolic Network
• Comparative analysis of metabolic networks
• Visualization of expression data
• Correlation of metabolism and transport
• Connectivity analysis of metabolic network
• Forward propagation of metabolites
• Verification of known growth media with metabolic network

Computational Exploration of PGDBs
• Infer metabolic network from genome
  • Bioinformatics 18:705 2002
• Global properties of the metabolic network
  • Genome Research 10:568 2000
• Global properties of the genetic network
• Comparison
• Consistency of whole metabolic networks of a PGDB with respect to known growth-media requirements
• Search for gaps in metabolic network
  • Pacific Symp Biocomputing 2001:471

Example Studies
• Relationship of protein subunits to gene positions
• Global properties of the E. coli metabolic network
  • Reactions catalyzed by more than one enzyme
  • Enzymes that catalyze more than one reaction
  • Reactions participating in more than one pathway
• Automatic detection of intersection points in the metabolic network
• Nutrient analyses
  • Forward propagation: Given a set of nutrients, what compounds will be produced by the metabolic network?
  • Backtracking: Given a forward propagation result, and a set of essential compounds that are not included in that result, what precursors must be supplied to produce those compounds?
• Operon prediction

Protein subunits and linked genes
• Question: are protein subunits coded by neighboring genes?
• Proteins are linked to genes, gene positions are recorded in the KB
• Procedure
  • Fetch all protein complexes
  • Subunits are stored in the ‘components’ slot
  • Each component has a ‘gene’ slot
  • Genes have ‘left-end-position’ and ‘right-end-position’ slots
• Results
  • Protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes

Global properties: How many reactions are catalyzed by more than one enzyme?
• Procedure
  • get_class_all_instances(‘Reactions’)
  • We are interested only in reactions with at least one value in their ‘enzymatic-reaction’ slot
  • result = reactions with more than one value for their ‘enzymatic-reaction’ slot
• Results
  • About 10% of reactions are catalyzed by more than one enzyme
  • Two classes of multi-enzyme reactions
    • Homologous enzymes
    • “Easy” reactions

Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?)
• Procedure
  • get_class_all_instances(‘Proteins’)
  • result = proteins with more than one value in the ‘catalyzes’ slot
• Results
  • 100 out of 607 enzymes catalyze multiple reactions
  • This is significantly more than predicted by genome sequencing projects

Global properties: Reactions in multiple pathways
• Procedure
  • get_class_all_instances(‘Reactions’)
  • result = reactions with more than one value in the ‘inpathway’ slot
• Significance
  • Reactions that appear in multiple pathways correspond to intersection points in the metabolic network
  • Could be used to identify candidate reactions for drug targets

Metabolic Overview Queries
• Species comparison
  • Highlight reactions that are
    • Shared/not-shared with
    • Any-one/All-of
    • A specified set of species
• Overlay expression data
  • Absolute or relative expression levels
  • Reaction colors reflect expression level

C. crescentus Cell Cycle Gene Expression

Global Consistency Checking of Biochemical Network
• Given:
  • A PGDB for an organism
  • A set of initial metabolites
• Infer:
  • What set of products can be synthesized by the small-molecule metabolism of the organism
  • Can known growth medium yield known essential compounds?
• Pacific Symposium on Biocomputing p471 2001

Algorithm: Forward Propagation
Diagram labels: Nutrient set, Products, Metabolite set, PGDB reaction pool, Reactants, “Fire” reactions

Results
• Phase I: Forward propagation
  • 21 initial compounds yielded only half of 38 essential compounds for E. coli
• Phase II: Manually identify
  • Bugs in EcoCyc (e.g., two objects for tryptophan)
  • Missing initial protein substrates (e.g., ACP)
  • Missing pathways in EcoCyc
• Phase III: Forward propagation with 11 more initial metabolites
  • Yielded all 38 essential compounds

Initial Metabolites (Total: 21 compounds)
• Nutrients (8) (M61 minimal growth medium): H+, Fe2+, Mg2+, K+, NH3, SO4 2-, PO4 2-, Glucose
• Nutrients (10) (Growth conditions): Water, Oxygen, Trace elements (Mn2+, Co2+, Mo2+, Ca2+, Zn2+, Cd2+, Ni2+, Cu2+)
• Bootstrap Compounds (3): ATP, NADP, CoA

Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc: Phase I:
• Essential compounds
  • produced: 19
  • not produced: 19
• Total compounds
  • produced: (28%)
• Reactions
  • Fired: (31%)

Missing Essential Compounds Due To
• Bugs in EcoCyc
• Narrow conceptualization of the problem
• Protein substrates
• Incomplete biochemical knowledge

Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc: Phase II (After adding 11 extra metabolites):
• Essential compounds
  • produced: 38
  • not produced: 0
• Total compounds
  • produced: (49%)
  • not produced: (51%)
• Reactions
  • Fired: (58%)
  • Not fired: (42%)

Operon Prediction
• Based on the method of Moreno-Hagelsieb et al. Bioinformatics 18 Suppl. 1 (2002)
  • Distance between genes
  • Functional classification
  • Correctly predicts 75% of transcription units, 65% of operons
• Additional information available in PGDB
  • Pathways
  • Protein complexes
  • Transporters
  • Improved prediction performance: 80% of transcription units, 69% of operons
• Detailed paper in preparation

Visualization of Genetic Network
• Operon display window
• Transcription factor display window
• Highlight regulon on Overview diagram
• Paint expression data onto Overview diagram
  • Database adapter mechanism: MAGE-ML intermediate form
  • Adapter defined for SMD
  • Animation
  • User specified mapping of color ranges
  • Import of SAM files (next release)
    • List of significantly +/- genes
• Display full genetic network (later release)

Acknowledgements
• SRI: Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud
• EcoCyc Project: Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier
• MetaCyc Project: Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville
• Stanford: Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh
• Funding sources:
  • NIH National Center for Research Resources
  • NIH National Institute of General Medical Sciences
  • NIH National Human Genome Research Institute
  • Department of Energy Microbial Cell Project
  • DARPA BioSpice, UPC

BioCyc.org
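
Supplementary sketch: forward propagation

The forward-propagation idea from the Global Consistency Checking slides (start from a nutrient set, "fire" every reaction whose reactants are all available, add its products to the metabolite set, and repeat until nothing new appears) can be illustrated with a small Python sketch. This is not the Pathway Tools implementation. The function name forward_propagate, the dictionary layout, and the two TOY-RXN entries are hypothetical, and every reaction is assumed to run left to right only. R84-RXN uses the LEFT and RIGHT values from the Reactions.dat example in the slides.

```python
def forward_propagate(nutrients, reactions):
    """Return (metabolites, fired) reachable from an initial nutrient set.

    reactions maps a reaction ID to a (reactants, products) pair of sets.
    Assumption: each reaction is treated as irreversible, left to right.
    """
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        for rxn_id, (reactants, products) in reactions.items():
            # A reaction "fires" once all of its reactants are available.
            if rxn_id not in fired and reactants <= metabolites:
                fired.add(rxn_id)
                metabolites |= products
                changed = True
    return metabolites, fired


# Toy reaction pool. R84-RXN follows the flat-file example;
# the TOY-RXN entries are hypothetical and exist only for illustration.
reactions = {
    "R84-RXN": ({"GLC-6-P", "NAD"}, {"6-P-GLUCONATE", "NADH", "PROTON"}),
    "TOY-RXN-1": ({"GLUCOSE", "ATP"}, {"GLC-6-P", "ADP"}),
    "TOY-RXN-2": ({"X"}, {"Y"}),        # never fires: X is not supplied
}
nutrients = {"GLUCOSE", "ATP", "NAD"}

produced, fired = forward_propagate(nutrients, reactions)

# Essential compounds that the network cannot reach from these nutrients.
essential = {"6-P-GLUCONATE", "Y"}
missing = essential - produced
print(sorted(fired), sorted(missing))
```

Compounds left in missing correspond to the "not produced" essential compounds in the Phase I results; in the study described above, such gaps were traced by hand to database bugs, missing protein substrates, or missing pathways.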