The Pathway Tools Schema
SRI International Bioinformatics

Motivations for Understanding Schema
- Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
- When writing complex queries to PGDBs, those queries must name classes and slots within the schema
- A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity

Reference
- Pathway Tools User’s Guide, Volume I, Appendix A: Guide to the Pathway Tools Schema

Web of Relationships for One Enzyme
(diagram; node labels in order)
- TCA Cycle
- Succinate + FAD = fumarate + FADH2
- Enzymatic-reaction
- Succinate dehydrogenase
- Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2
- sdhA, sdhB, sdhC, sdhD

Frame Data Model
- Frame Data Model -- organizational structure for a PGDB
- Knowledge base (KB, Database, DB)
- Frames
- Slots
- Facets
- Annotations

Knowledge Base
- Collection of frames and their associated slots, values, facets, and annotations
- AKA: Database, PGDB
- Can be stored within
  - An Oracle DB
  - A disk file
  - A Pathway Tools binary program

Frames
- Entities with which facts are associated
- Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- Classes:
  - Superclass(es)
  - Subclass(es)
  - Instance(s)
- A symbolic frame name (id, key) uniquely identifies each frame

Frame IDs
- Naming conventions for frame IDs
- Uniqueness of frame IDs
  - Frame IDs must be unique within a PGDB
  - Goal: Same frame ID within different PGDBs should refer to the same biological entity
  - Because many frames are imported from MetaCyc, this helps ensure consistency of frame names
- Frame IDs for newly created frames (not imported) are generated by Pathway Tools
  - Those frame IDs contain a PGDB-specific identifier
  - Example: CPLXzz-nnnn, CPLXB3-0035

Slots
- Encode attributes/properties of a frame
  - Integer, real number, string, symbols
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
- Every slot is described by a “slot frame” in a KB that defines meta information about that slot

Slot Links
(diagram; nodes in order, with the label of the link to the next node)
- TCA Cycle (link: in-pathway)
- Succinate + FAD = fumarate + FADH2 (link: reaction)
- Enzymatic-reaction (link: catalyzes)
- Succinate dehydrogenase (link: component-of)
- Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2 (link: product)
- sdhA, sdhB, sdhC, sdhD

Slots
- Number of values
  - Single valued
  - Multivalued: sets, bags
- Slot values
  - Any LISP object: Integer, real, string, symbol (frame name)
- Slotunits define properties of slots: datatypes, classes, constraints
- Two slots are inverses if they encode opposite relationships
  - Slot Product in class Genes

Representation of Function
(diagram; node labels in order)
- TCA Cycle
- Succinate + FAD = fumarate + FADH2
- Enzymatic-reaction
- Succinate dehydrogenase
- Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2
- sdhA, sdhB, sdhC, sdhD
(attribute labels shown on the diagram: EC#, Keq, Cofactors, Inhibitors, Molecular wt, pI, Left-end-position)

Monofunctional Monomer
(diagram) Pathway, Reaction, Enzymatic-reaction, Monomer, Gene

Bifunctional Monomer
(diagram) Pathway, Reaction (x2), Enzymatic-reaction (x2), Monomer, Gene

Monofunctional Multimer
(diagram) Pathway, Reaction, Enzymatic-reaction, Multimer, Monomer (x4), Gene (x4)

Pathway and Substrates
(diagram labels) Pathway; Reaction (x4); Reactant-1, Reactant-2; Product-1, Product-2; links: left, right, in-pathway

Transcriptional Regulation
(diagram labels) trp, apoTrpR, TrpR*trp, RpoSig70, trpLEDCBA, Int001, Int003, Int005, site001, pro001, trpL, trpE, trpD, trpC, trpB, trpA

Principal Classes
- Class names are capitalized, plural, separated by dashes
- Genetic-Elements,
  with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
  - rRNAs, snRNAs, tRNAs, Charged-tRNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes

Principal Classes
- Reactions, with subclasses:
  - Transport-Reactions
- Enzymatic-Reactions
- Pathways
- Compounds-And-Elements

Slots in Multiple Classes
- Common-Name
- Synonyms
- Comment
- Citations
- DB-Links

Genes Slots
- Component-Of (links to replicon, transcription unit)
- Left-End-Position
- Right-End-Position
- Centisome-Position
- Transcription-Direction
- Product

Proteins Slots
- Molecular-Weight-Seq
- Molecular-Weight-Exp
- pI
- Locations
- Modified-Form
- Unmodified-Form
- Component-Of

Polypeptides Slots
- Gene

Protein-Complexes Slots
- Components

Reactions Slots
- EC-Number
- Left, Right
- DeltaG0
- Keq
- Spontaneous?

Enzymatic-Reactions Slots
- Enzyme
- Reaction
- Activators
- Inhibitors
- Physiologically-Relevant
- Cofactors
- Prosthetic-Groups
- Alternative-Substrates
- Alternative-Cofactors

Pathways Slots
- Reaction-List
- Predecessors
- Primaries

GKB Editor
- Browse class hierarchy and slot definitions
  - Tools -> Ontology Browser
- GKB Editor described at http://www.ai.sri.com/~gkb/user-man.html

Pathway Tools Data Access Mechanisms

Introduction
- Many ways to access and update PGDBs
- APIs in Java, Perl, and Lisp
- Import/export of files in many formats
- Registry of Pathway/Genome Databases
- Import of PGDB data into BioWarehouse
- Updating a PGDB from an external genome DB

Pathway Tools APIs
- Support programmatic queries and updates to PGDBs
- APIs in Java, Perl, and Lisp all provide access to a common set of procedures:
  - Generic Frame Protocol -- Ocelot object database API
  - Additional Pathway Tools
    functions
- For more information see http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Generic Frame Protocol (GFP)
- A library of procedures for accessing Ocelot DBs
- GFP specification: http://www.ai.sri.com/~gfp/spec/paper/paper.html
- A small number of GFP functions are sufficient for most complex queries
- Knowledge of Pathway Tools schema is critical for using the APIs: Appendix I of Pathway Tools User’s Guide, Vol I

Generic Frame Protocol
- get-class-all-instances (Class): Returns the instances of Class
- Key Pathway Tools classes:
  - Genetic-Elements
  - Genes
  - Proteins
  - Polypeptides (a subclass of Proteins)
  - Protein-Complexes (a subclass of Proteins)
  - Pathways
  - Reactions
  - Compounds-And-Elements
  - Enzymatic-Reactions
  - Transcription-Units
  - Promoters
  - DNA-Binding-Sites

Generic Frame Protocol
- Notation: Frame.Slot means a specified slot of a specified frame
- get-slot-value(Frame Slot): Returns first value of Frame.Slot
- get-slot-values(Frame Slot): Returns all values of Frame.Slot as a list
- slot-has-value-p(Frame Slot): Returns T if Frame.Slot has at least one value
- member-slot-value-p(Frame Slot Value): Returns T if Value is one of the values of Frame.Slot
- print-frame(Frame): Prints the contents of Frame

Generic Frame Protocol
- coercible-to-frame-p (Thing): Returns T if Thing is the name of a frame, or a frame object
- save-kb: Saves the current KB

Generic Frame Protocol -- Update Operations
- put-slot-value(Frame Slot Value): Replace the current value(s) of Frame.Slot with Value
- put-slot-values(Frame Slot Value-List): Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
- add-slot-value(Frame Slot Value): Add Value to the current value(s) of Frame.Slot, if any
- remove-slot-value(Frame Slot Value): Remove Value from the current value(s) of Frame.Slot
- replace-slot-value(Frame Slot Old-Value New-Value): In
  Frame.Slot, replace Old-Value with New-Value
- remove-local-slot-values(Frame Slot): Remove all of the values of Frame.Slot

Additional Pathway Tools Functions -- Semantic Inference Layer
- Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
- http://bioinformatics.ai.sri.com/ptools/ptoolsfns.html

Internal Note
- Note: Refer to local copy of ptools-fns.html to go through the semantic inference layer fns

File Import/Export Capabilities
- PGDBs can be exported in whole or part to:
- SBML -- Systems Biology Markup Language -- sbml.org
  - Import supported by many simulation packages
  - File -> Export -> Selected Reactions to SBML File
- Pathway Tools Attribute-Value format and column-delimited format files
  - http://brg.ai.sri.com/ptools/flatfile-format.shtml
  - Dump entire PGDB to a suite of files: File -> Export -> Entire DB to Flat Files
  - Dump selected frames to a single file: File -> Export -> Selected Frames to File

Import/Export
- Import from attribute-value or column-delimited files
  - File -> Import -> Frames From File
- Import/Export to/from internal Pathway Tools format that allows pathways, reactions, enzymes, and compounds to be easily moved between Pathway Tools installations
  - Edit -> Add Pathway to File Export List
  - File -> Export -> Selected Pathways to File
  - File -> Import -> Pathways from File
- Import/Export to/from MDL molfile format
  - Edit -> Import compound structure from molfile
  - Edit -> Export compound structure to molfile

Miscellaneous Exports
- Overview -> Highlight -> Save to File
- Overview -> Highlight -> Load from File
- Gene / Protein Sequence / Save to file
- Chromosome -> Show Sequence of a Segment of Replicon

Napster Comes to Bioinformatics
- Public sharing of Pathway/Genome Databases
- PGDB registry maintained by SRI at URL
  http://biocyc.org/registry.html
- Registry operations
  - List contents of registry
  - Download PGDBs listed in the registry
  - Register PGDBs you have created

Registry Details
- Why register your PGDB?
  - Declare existence of your PGDB in a central location
  - Facilitate download by other scientists
- Why download a PGDB?
  - Desktop Navigator provides more functionality than Web
  - Comparative operations
  - Programmatic querying and processing of PGDB
- Registration process
  - Registered PGDBs have open availability by default
  - Authors can provide their own license agreements
  - Registered PGDBs reside on authors’ FTP site

New Import/Export Tools
- BioWarehouse
- Biospice.org
- Suggestions?
- Volunteers?

Updating a PGDB From an External Genome DB
- Example: AraCyc forms a pathway module to the TAIR DB
- TAIR is the authoritative source for gene and gene-product information
- Update AraCyc to reflect updates in TAIR

Proposed Approach
- Export TAIR to PathoLogic files
- Build AraCyc2 from those PathoLogic files (automated PathoLogic only)
- Compare AraCyc1 (A1) to AraCyc2 (A2)
  - A. Import new genes/proteins from A2 to A1
  - B. Delete from A1 genes/proteins not found in A2
  - C. Rename genes/proteins whose names changed from A2 to A1
- Run name matcher on A1’
- Check for pathways with no enzymes and report them, so the user can keep any that PathoLogic would otherwise delete
- What about enzymes that were assigned to a pathway by the hole filler?
- Re-run pathway predictor
- Remember which pathways the user deletes so they are not re-predicted by PathoLogic
- Consider movement of genes from contig to chromosome
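
Sketch of the A1/A2 comparison

The comparison in steps A to C of the proposed approach amounts to a set difference plus a name check. The Python below is only a minimal sketch of that idea, not Pathway Tools code. It assumes each PGDB has already been reduced to a mapping from a stable gene identifier to a common name, and that the same identifier is used in A1 and A2; the slides do not say how genes are matched. The names UpdatePlan and compare_pgdbs and the sample IDs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UpdatePlan:
    """Edits to apply to the curated PGDB (A1) so it reflects the rebuilt one (A2)."""
    to_import: list = field(default_factory=list)  # step A: IDs found only in A2
    to_delete: list = field(default_factory=list)  # step B: IDs found only in A1
    to_rename: dict = field(default_factory=dict)  # step C: ID -> (A1 name, A2 name)


def compare_pgdbs(a1_names, a2_names):
    """Compare two {gene ID: common name} mappings and return an UpdatePlan.

    Assumption: a gene keeps the same ID in both mappings.
    """
    plan = UpdatePlan()
    plan.to_import = sorted(a2_names.keys() - a1_names.keys())
    plan.to_delete = sorted(a1_names.keys() - a2_names.keys())
    for gene_id in sorted(a1_names.keys() & a2_names.keys()):
        old, new = a1_names[gene_id], a2_names[gene_id]
        if old != new:
            plan.to_rename[gene_id] = (old, new)
    return plan


# Made-up gene IDs and names, for illustration only.
a1 = {"G-1": "abcA", "G-2": "abcB", "G-3": "abcC"}
a2 = {"G-1": "abcA", "G-2": "xyzB", "G-4": "abcD"}
plan = compare_pgdbs(a1, a2)
print(plan.to_import)  # ['G-4']
print(plan.to_delete)  # ['G-3']
print(plan.to_rename)  # {'G-2': ('abcB', 'xyzB')}
```

The function only computes a plan and changes nothing, which leaves room for the later review steps on the slide, such as reporting pathways left with no enzymes before anything is deleted.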