Introduction - Bioinformatics Research Group at SRI International

The Pathway Tools Schema

Motivations for Understanding the Schema
SRI International
Bioinformatics
- Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
- When writing complex queries to PGDBs, those queries must name classes and slots within the schema
- A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity

Reference
- Pathway Tools User’s Guide, Volume I
  - Appendix A: Guide to the Pathway Tools Schema

Web of Relationships for One Enzyme
[Figure: TCA Cycle (pathway); Succinate + FAD = fumarate + FADH2 (reaction); Enzymatic-reaction; Succinate dehydrogenase (enzyme); subunits Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; genes sdhA, sdhB, sdhC, sdhD]

Frame Data Model
- Frame Data Model: organizational structure for a PGDB
- Knowledge base (KB, Database, DB)
- Frames
- Slots
- Facets
- Annotations

Knowledge Base
- Collection of frames and their associated slots, values, facets, and annotations
- AKA: Database, PGDB
- Can be stored within:
  - An Oracle DB
  - A disk file
  - A Pathway Tools binary program

Frames
- Entities with which facts are associated
- Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- Classes have:
  - Superclass(es)
  - Subclass(es)
  - Instance(s)
- A symbolic frame name (ID, key) uniquely identifies each frame

Frame IDs
- Naming conventions for frame IDs
- Uniqueness of frame IDs
  - Frame IDs must be unique within a PGDB
  - Goal: the same frame ID within different PGDBs should refer to the same biological entity
    - Because many frames are imported from MetaCyc, this helps ensure consistency of frame names
  - Frame IDs for newly created (not imported) frames are generated by Pathway Tools
    - Those frame IDs contain a PGDB-specific identifier
    - Example: CPLXzz-nnnn, such as CPLXB3-0035
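To make the naming pattern concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a generated ID is a class prefix (CPLX), the PGDB-specific identifier (B3 in the example), a dash, and a four-digit zero-padded counter; `make_frame_id` and `next_free_id` are hypothetical helpers, not Pathway Tools functions.

```python
def make_frame_id(class_prefix: str, pgdb_code: str, counter: int) -> str:
    """Build an ID shaped like CPLXB3-0035 (hypothetical scheme, for illustration)."""
    # prefix + PGDB-specific code, a dash, then a zero-padded 4-digit counter
    return f"{class_prefix}{pgdb_code}-{counter:04d}"


def next_free_id(existing: set, class_prefix: str, pgdb_code: str) -> str:
    """Return the first generated ID not already used in this PGDB.

    Frame IDs must be unique within a PGDB, so counters whose ID is
    already taken are skipped.
    """
    counter = 1
    while True:
        candidate = make_frame_id(class_prefix, pgdb_code, counter)
        if candidate not in existing:
            return candidate
        counter += 1


print(make_frame_id("CPLX", "B3", 35))  # CPLXB3-0035
print(next_free_id({"CPLXB3-0001", "CPLXB3-0002"}, "CPLX", "B3"))  # CPLXB3-0003
```

Skipping counters that are already taken is one simple way to honor the rule that frame IDs be unique within a PGDB; how Pathway Tools actually allocates counters is not described here.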

Slots
- Encode attributes/properties of a frame
  - Integer, real number, string, symbols
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
- Every slot is described by a “slot frame” in a KB that defines meta information about that slot
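
Slot Links
[Figure: the succinate dehydrogenase example with slot names on the links: the reaction Succinate + FAD = fumarate + FADH2 is linked to TCA Cycle by in-pathway; Enzymatic-reaction is linked to the reaction by reaction; Succinate dehydrogenase is linked to Enzymatic-reaction by catalyzes; subunits Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2 are linked to Succinate dehydrogenase by component-of; genes sdhA, sdhB, sdhC, sdhD are linked to the subunits by product]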
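Following slot links amounts to repeatedly looking up a slot and moving to the frames it names. The Python sketch below walks from a gene up to its pathway through the slots in the figure. The frame store is a plain dictionary standing in for a PGDB, IDs such as `Enzrxn-sdh` and `Succinate-FAD-rxn` are hypothetical, and the pairing of sdhC and sdhD with the two membrane subunits is assumed for illustration.

```python
# Hypothetical in-memory frames: frame ID -> {slot name -> list of values}.
# Slot values that name other frames are what link the web together.
# Gene-to-subunit pairings below are assumed for illustration.
KB = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "sdhB": {"product": ["Sdh-Fe-S"]},
    "sdhC": {"product": ["Sdh-membrane-1"]},
    "sdhD": {"product": ["Sdh-membrane-2"]},
    "Sdh-flavo": {"component-of": ["Succinate-dehydrogenase"]},
    "Sdh-Fe-S": {"component-of": ["Succinate-dehydrogenase"]},
    "Sdh-membrane-1": {"component-of": ["Succinate-dehydrogenase"]},
    "Sdh-membrane-2": {"component-of": ["Succinate-dehydrogenase"]},
    "Succinate-dehydrogenase": {"catalyzes": ["Enzrxn-sdh"]},
    "Enzrxn-sdh": {"reaction": ["Succinate-FAD-rxn"]},
    "Succinate-FAD-rxn": {"in-pathway": ["TCA-cycle"]},
    "TCA-cycle": {},
}


def follow(frames, slot):
    """Collect the values of `slot` across a list of frames (no duplicates)."""
    result = []
    for frame in frames:
        for value in KB.get(frame, {}).get(slot, []):
            if value not in result:
                result.append(value)
    return result


def pathways_of_gene(gene):
    """Walk gene -> product -> component-of -> catalyzes -> reaction -> in-pathway."""
    frames = [gene]
    for slot in ("product", "component-of", "catalyzes", "reaction", "in-pathway"):
        frames = follow(frames, slot)
    return frames


print(pathways_of_gene("sdhC"))  # ['TCA-cycle']
```

The fixed chain of slots assumes the gene product is a subunit of a complex; for a monomeric enzyme, as in the Monofunctional Monomer figure, the component-of step would have to be skipped.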

Slots
- Number of values
  - Single valued
  - Multivalued: sets, bags
- Slot values
  - Any LISP object: integer, real, string, symbol (frame name)
- Slotunits define properties of slots: datatypes, classes, constraints
- Two slots are inverses if they encode opposite relationships
  - Slot Product in class Genes
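Inverse slots let a link be stored in both directions so that either end can be queried. Below is a minimal Python sketch of keeping them in sync on update. It assumes, for illustration, that Genes.Product pairs with Polypeptides.Gene and Protein-Complexes.Components with Proteins.Component-Of. `TinyKB` is a hypothetical class, not part of any Pathway Tools API, and its slots behave as sets (duplicates are ignored).

```python
from collections import defaultdict

# Assumed inverse pairs, for illustration: Genes.Product <-> Polypeptides.Gene
# and Protein-Complexes.Components <-> Proteins.Component-Of.
INVERSES = {
    "product": "gene",
    "gene": "product",
    "components": "component-of",
    "component-of": "components",
}


class TinyKB:
    """Hypothetical frame store that keeps inverse slots in sync."""

    def __init__(self):
        # frame -> slot -> list of values, treated as a set (no duplicates)
        self.slots = defaultdict(lambda: defaultdict(list))

    def _add_one_side(self, frame, slot, value):
        values = self.slots[frame][slot]
        if value not in values:
            values.append(value)

    def add_slot_value(self, frame, slot, value):
        """Add value to frame.slot and, if the slot has an inverse, the reverse link."""
        self._add_one_side(frame, slot, value)
        inverse = INVERSES.get(slot)
        if inverse is not None:
            self._add_one_side(value, inverse, frame)

    def get_slot_values(self, frame, slot):
        return list(self.slots[frame][slot])


kb = TinyKB()
kb.add_slot_value("sdhA", "product", "Sdh-flavo")
print(kb.get_slot_values("Sdh-flavo", "gene"))  # ['sdhA']
```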
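
Representation of Function
[Figure: the succinate dehydrogenase example annotated with attributes: EC# and Keq on the reaction Succinate + FAD = fumarate + FADH2; Cofactors and Inhibitors on Enzymatic-reaction; Molecular wt and pI on Succinate dehydrogenase and its subunits Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Left-end-position on genes sdhA, sdhB, sdhC, sdhD; TCA Cycle as the pathway]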

Monofunctional Monomer
[Figure: one Gene encodes one Monomer, which is linked through one Enzymatic-reaction to one Reaction in a Pathway]

Bifunctional Monomer
[Figure: one Gene encodes one Monomer, which is linked through two Enzymatic-reactions to two Reactions in a Pathway]

Monofunctional Multimer
[Figure: four Genes encode four Monomers that form one Multimer, which is linked through one Enzymatic-reaction to one Reaction in a Pathway]
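The three figures differ only in how many Enzymatic-reaction frames an enzyme participates in and how many monomers make up the enzyme. The hypothetical Python helper `classify_enzyme` below recovers the figure labels from those two counts. It assumes each enzyme is a dictionary whose `catalyzes` slot lists Enzymatic-reaction IDs and whose `components` slot lists subunits.

```python
def classify_enzyme(enzyme: dict) -> str:
    """Label an enzyme frame the way the three figures do (hypothetical helper).

    Assumed slots, each a list of frame IDs:
      'catalyzes'  -> Enzymatic-reaction frames, one per catalyzed reaction
      'components' -> subunit frames (absent or empty for a monomer)
    """
    n_reactions = len(enzyme.get("catalyzes", []))
    n_subunits = len(enzyme.get("components", []))
    function = {0: "Non-catalytic", 1: "Monofunctional", 2: "Bifunctional"}.get(
        n_reactions, "Multifunctional"
    )
    form = "Multimer" if n_subunits > 1 else "Monomer"
    return f"{function} {form}"


sdh = {
    "catalyzes": ["Enzrxn-sdh"],
    "components": ["Sdh-flavo", "Sdh-Fe-S", "Sdh-membrane-1", "Sdh-membrane-2"],
}
print(classify_enzyme(sdh))  # Monofunctional Multimer
print(classify_enzyme({"catalyzes": ["Enzrxn-1", "Enzrxn-2"]}))  # Bifunctional Monomer
```

Because it only counts entries, the sketch ignores subunit stoichiometry: a homomultimer whose components slot lists a single subunit would be labeled Monomer.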

Pathway and Substrates
[Figure: a Pathway contains several Reactions, each linked to it by in-pathway; a Reaction's left slot holds its reactants (Reactant-1, Reactant-2) and its right slot holds its products (Product-1, Product-2)]
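
Transcriptional Regulation
[Figure: regulation of the trpLEDCBA transcription unit (genes trpL, trpE, trpD, trpC, trpB, trpA); labeled frames include trp, apoTrpR, the complex TrpR*trp, RpoSig70, promoter pro001, binding site site001, and interactions Int001, Int003, Int005]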

Principal Classes
- Class names are capitalized, plural, separated by dashes
- Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
  - rRNAs, snRNAs, tRNAs, Charged-tRNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes

Principal Classes (continued)
- Reactions, with subclasses:
  - Transport-Reactions
- Enzymatic-Reactions
- Pathways
- Compounds-And-Elements

Slots in Multiple Classes
- Common-Name
- Synonyms
- Comment
- Citations
- DB-Links

Genes Slots
- Component-Of (links to replicon, transcription unit)
- Left-End-Position
- Right-End-Position
- Centisome-Position
- Transcription-Direction
- Product

Proteins Slots
- Molecular-Weight-Seq
- Molecular-Weight-Exp
- pI
- Locations
- Modified-Form
- Unmodified-Form
- Component-Of

Polypeptides Slots
- Gene

Protein-Complexes Slots
- Components

Reactions Slots
- EC-Number
- Left, Right
- DeltaG0
- Keq
- Spontaneous?

Enzymatic-Reactions Slots
- Enzyme
- Reaction
- Activators
- Inhibitors
- Physiologically-Relevant
- Cofactors
- Prosthetic-Groups
- Alternative-Substrates
- Alternative-Cofactors

Pathways Slots
- Reaction-List
- Predecessors
- Primaries
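Reaction-List says which reactions a pathway contains; Predecessors says how they are ordered. Below is a Python sketch that uses the standard-library `graphlib` to derive an order. It assumes the predecessor information has already been converted into a mapping from each reaction to the reactions that precede it. The actual value layout of the Predecessors slot is not shown here, and the reaction IDs are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pathway: reaction ID -> set of reactions that must precede it.
predecessors = {
    "RXN-B": {"RXN-A"},
    "RXN-C": {"RXN-B"},
    "RXN-D": {"RXN-B"},
}
reaction_list = ["RXN-A", "RXN-B", "RXN-C", "RXN-D"]


def ordered_reactions(reaction_list, predecessors):
    """Return the pathway's reactions in an order consistent with Predecessors."""
    sorter = TopologicalSorter()
    for rxn in reaction_list:
        # add() registers the node even when it has no predecessors
        sorter.add(rxn, *predecessors.get(rxn, ()))
    return list(sorter.static_order())


order = ordered_reactions(reaction_list, predecessors)
print(order)
```

`static_order` raises `graphlib.CycleError` when the links form a loop, which a cyclic pathway such as the TCA cycle would produce; ordering such a pathway would first require breaking the cycle at a chosen reaction.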

GKB Editor
- Browse class hierarchy and slot definitions
- Tools -> Ontology Browser
- GKB Editor described at
  - http://www.ai.sri.com/~gkb/user-man.html

Pathway Tools Data Access Mechanisms

Introduction
- MANY ways to access and update PGDBs:
  - APIs in Java, Perl, and Lisp
  - Import/export of files in many formats
  - Registry of Pathway/Genome Databases
  - Import PGDB data into BioWarehouse
  - Updating a PGDB from an external genome DB

Pathway Tools APIs
- Support programmatic queries and updates to PGDBs
- APIs in Java, Perl, and Lisp all provide access to a common set of procedures:
  - Generic Frame Protocol: the Ocelot object database API
  - Additional Pathway Tools functions
- For more information see
  - http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Generic Frame Protocol (GFP)
- A library of procedures for accessing Ocelot DBs
- GFP specification:
  - http://www.ai.sri.com/~gfp/spec/paper/paper.html
- A small number of GFP functions are sufficient for most complex queries
- Knowledge of the Pathway Tools schema is critical for using the APIs:
  - Appendix I of Pathway Tools User’s Guide, Vol I

Generic Frame Protocol
- get-class-all-instances(Class)
  - Returns the instances of Class
- Key Pathway Tools classes:
  - Genetic-Elements
  - Genes
  - Proteins
  - Polypeptides (a subclass of Proteins)
  - Protein-Complexes (a subclass of Proteins)
  - Pathways
  - Reactions
  - Compounds-And-Elements
  - Enzymatic-Reactions
  - Transcription-Units
  - Promoters
  - DNA-Binding-Sites
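A Python mock of get-class-all-instances shows why the class hierarchy matters: asking for Proteins should also return instances of its subclasses Polypeptides and Protein-Complexes. The hierarchy and instances are a hypothetical miniature, and the inclusion of subclass instances is an assumption based on the "all" in the function name. Results are sorted only to make the output deterministic.

```python
# Hypothetical miniature class hierarchy: class -> direct superclasses.
SUPERCLASSES = {
    "Proteins": [],
    "Polypeptides": ["Proteins"],
    "Protein-Complexes": ["Proteins"],
    "Genes": [],
}
# Hypothetical instances: frame ID -> classes it is a direct instance of.
INSTANCE_OF = {
    "Sdh-flavo": ["Polypeptides"],
    "Sdh-Fe-S": ["Polypeptides"],
    "Succinate-dehydrogenase": ["Protein-Complexes"],
    "sdhA": ["Genes"],
}


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or inherits from it through SUPERCLASSES."""
    if cls == ancestor:
        return True
    return any(is_subclass(parent, ancestor) for parent in SUPERCLASSES.get(cls, []))


def get_class_all_instances(cls):
    """Mock of get-class-all-instances: direct and inherited instances of cls."""
    return sorted(
        frame
        for frame, classes in INSTANCE_OF.items()
        if any(is_subclass(c, cls) for c in classes)
    )


print(get_class_all_instances("Proteins"))
# ['Sdh-Fe-S', 'Sdh-flavo', 'Succinate-dehydrogenase']
```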

Generic Frame Protocol
- The notation Frame.Slot means a specified slot of a specified frame
- get-slot-value(Frame Slot)
  - Returns first value of Frame.Slot
- get-slot-values(Frame Slot)
  - Returns all values of Frame.Slot as a list
- slot-has-value-p(Frame Slot)
  - Returns T if Frame.Slot has at least one value
- member-slot-value-p(Frame Slot Value)
  - Returns T if Value is one of the values of Frame.Slot
- print-frame(Frame)
  - Prints the contents of Frame

Generic Frame Protocol
- coercible-to-frame-p(Thing)
  - Returns T if Thing is the name of a frame, or a frame object
- save-kb
  - Saves the current KB
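The read functions above can be mimicked over a dictionary to show their semantics. This Python mock is hypothetical: the underscore names only mirror the GFP names, True, False, and None stand in for Lisp T and NIL, and frames are identified only by their names, so `coercible_to_frame_p` has no frame objects to accept.

```python
# Hypothetical frame store standing in for a PGDB: frame -> slot -> values.
KB = {
    "Succinate-dehydrogenase": {
        "Common-Name": ["succinate dehydrogenase"],
        "Components": ["Sdh-flavo", "Sdh-Fe-S", "Sdh-membrane-1", "Sdh-membrane-2"],
    },
    "Sdh-flavo": {"Component-Of": ["Succinate-dehydrogenase"]},
}


def get_slot_values(frame, slot):
    """All values of Frame.Slot as a list (empty if none)."""
    return list(KB.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """First value of Frame.Slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """True if Frame.Slot has at least one value."""
    return bool(get_slot_values(frame, slot))


def member_slot_value_p(frame, slot, value):
    """True if value is one of the values of Frame.Slot."""
    return value in get_slot_values(frame, slot)


def coercible_to_frame_p(thing):
    """True if thing names a frame in this mock store."""
    return isinstance(thing, str) and thing in KB


def print_frame(frame):
    """Print each slot of frame with its values."""
    print(frame)
    for slot, values in KB.get(frame, {}).items():
        print(f"  {slot}: {', '.join(map(str, values))}")


print(get_slot_value("Sdh-flavo", "Component-Of"))  # Succinate-dehydrogenase
print(member_slot_value_p("Succinate-dehydrogenase", "Components", "Sdh-Fe-S"))  # True
print_frame("Succinate-dehydrogenase")
```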

Generic Frame Protocol – Update Operations
- put-slot-value(Frame Slot Value)
  - Replace the current value(s) of Frame.Slot with Value
- put-slot-values(Frame Slot Value-List)
  - Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
- add-slot-value(Frame Slot Value)
  - Add Value to the current value(s) of Frame.Slot, if any
- remove-slot-value(Frame Slot Value)
  - Remove Value from the current value(s) of Frame.Slot
- replace-slot-value(Frame Slot Old-Value New-Value)
  - In Frame.Slot, replace Old-Value with New-Value
- remove-local-slot-values(Frame Slot)
  - Remove all of the values of Frame.Slot
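A Python mock of the update operations makes their differences concrete: put-* replaces, add-* appends, remove-* and replace-* act on one value, and remove-local-slot-values clears the slot. The store is a hypothetical dictionary that treats slots as bags (duplicates allowed), and the synonyms in the usage lines are invented. Nothing here is persisted, whereas in Pathway Tools the KB is saved with save-kb.

```python
from collections import defaultdict

# Hypothetical frame store: frame -> slot -> list of values (a bag).
KB = defaultdict(dict)


def put_slot_values(frame, slot, value_list):
    """Replace all current values of Frame.Slot with value_list."""
    if not isinstance(value_list, list):
        raise TypeError("value_list must be a list")
    KB[frame][slot] = list(value_list)


def put_slot_value(frame, slot, value):
    """Replace all current values of Frame.Slot with the single value."""
    put_slot_values(frame, slot, [value])


def add_slot_value(frame, slot, value):
    """Append value to Frame.Slot, keeping any existing values."""
    KB[frame].setdefault(slot, []).append(value)


def remove_slot_value(frame, slot, value):
    """Remove (one occurrence of) value from Frame.Slot if present."""
    values = KB[frame].get(slot, [])
    if value in values:
        values.remove(value)


def replace_slot_value(frame, slot, old_value, new_value):
    """In Frame.Slot, replace old_value with new_value, keeping its position."""
    values = KB[frame].get(slot, [])
    if old_value in values:
        values[values.index(old_value)] = new_value


def remove_local_slot_values(frame, slot):
    """Remove all values of Frame.Slot."""
    KB[frame].pop(slot, None)


put_slot_value("sdhA", "Synonyms", "sdh1")
add_slot_value("sdhA", "Synonyms", "sdh2")
replace_slot_value("sdhA", "Synonyms", "sdh1", "sdhA1")
print(KB["sdhA"]["Synonyms"])  # ['sdhA1', 'sdh2']
remove_local_slot_values("sdhA", "Synonyms")
print("Synonyms" in KB["sdhA"])  # False
```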

Additional Pathway Tools Functions – Semantic Inference Layer
- The semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
  - http://bioinformatics.ai.sri.com/ptools/ptoolsfns.html

Internal note
- Note: Refer to the local copy of ptools-fns.html to go through the semantic inference layer functions
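The idea behind the semantic inference layer is that a relationship such as "which genes encode enzymes in this pathway" is not stored in any one slot but is derived by chaining several. Below is a hypothetical Python sketch of one such derived query, using slot names from the schema slides (Reaction-List, the Enzyme and Reaction slots of Enzymatic-Reactions, Components, Gene). It is neither the implementation nor the name of any actual Pathway Tools function, and the frame IDs are invented.

```python
# Hypothetical frames keyed by ID; slot names follow the schema slides.
FRAMES = {
    "TCA-cycle": {"Reaction-List": ["Succinate-FAD-rxn"]},
    "Enzrxn-sdh": {"Reaction": ["Succinate-FAD-rxn"], "Enzyme": ["Succinate-dehydrogenase"]},
    "Succinate-dehydrogenase": {"Components": ["Sdh-flavo", "Sdh-Fe-S"]},
    "Sdh-flavo": {"Gene": ["sdhA"]},
    "Sdh-Fe-S": {"Gene": ["sdhB"]},
}
# Stands in for get-class-all-instances of Enzymatic-Reactions.
ENZYMATIC_REACTIONS = ["Enzrxn-sdh"]


def values(frame, slot):
    return FRAMES.get(frame, {}).get(slot, [])


def polypeptides_of(protein):
    """Expand a complex into its polypeptides; a polypeptide expands to itself."""
    components = values(protein, "Components")
    if not components:
        return [protein]
    result = []
    for component in components:
        result.extend(polypeptides_of(component))  # complexes may nest
    return result


def genes_of_pathway_sketch(pathway):
    """Derived relationship: genes encoding enzymes for a pathway's reactions."""
    reactions = set(values(pathway, "Reaction-List"))
    genes = []
    for enzrxn in ENZYMATIC_REACTIONS:
        if reactions.intersection(values(enzrxn, "Reaction")):
            for enzyme in values(enzrxn, "Enzyme"):
                for poly in polypeptides_of(enzyme):
                    for gene in values(poly, "Gene"):
                        if gene not in genes:
                            genes.append(gene)
    return genes


print(genes_of_pathway_sketch("TCA-cycle"))  # ['sdhA', 'sdhB']
```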

File Import/Export Capabilities
- PGDBs can be exported in whole or part to:
  - SBML (Systems Biology Markup Language, sbml.org)
    - Import supported by many simulation packages
    - File -> Export -> Selected Reactions to SBML File
  - Pathway Tools attribute-value format and column-delimited format files
    - http://brg.ai.sri.com/ptools/flatfile-format.shtml
    - Dump entire PGDB to a suite of files: File -> Export -> Entire DB to Flat Files
    - Dump selected frames to a single file: File -> Export -> Selected Frames to File
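Here is a minimal Python reader for attribute-value files that shows the general shape of the format. The layout it assumes is one `ATTRIBUTE - value` pair per line, `//` ending each record, `#` starting a comment, and a leading `/` marking a continuation line. Check this against the flat-file format page listed above. Other line types, such as annotation lines, are ignored, and the sample records are invented.

```python
def parse_attribute_value(text):
    """Parse attribute-value text into a list of records (dict: attribute -> values).

    Assumed layout (verify against the flat-file format documentation):
      ATTRIBUTE - value   one pair per line; attributes may repeat
      /more text          continuation of the previous value
      # comment           ignored
      //                  end of record
    """
    records, current, last_attr = [], {}, None
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        if line.startswith("//"):
            if current:
                records.append(current)
            current, last_attr = {}, None
        elif line.startswith("/") and last_attr is not None:
            # Continuation line: extend the most recent value of last_attr.
            current[last_attr][-1] += " " + line[1:].strip()
        elif " - " in line:
            attr, value = line.split(" - ", 1)
            current.setdefault(attr, []).append(value)
            last_attr = attr
    if current:  # tolerate a missing final //
        records.append(current)
    return records


SAMPLE = """# invented example records
UNIQUE-ID - GENE-1
COMMON-NAME - sdhA
PRODUCT - Sdh-flavo
//
UNIQUE-ID - GENE-2
COMMON-NAME - sdhB
COMMENT - first part
/second part
//
"""

for record in parse_attribute_value(SAMPLE):
    print(record)
```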

Import/Export
- Import from attribute-value or column-delimited files
  - File -> Import -> Frames From File
- Import/export to/from internal Pathway Tools format that allows pathways, reactions, enzymes, and compounds to be easily moved between Pathway Tools installations
  - Edit -> Add Pathway to File Export List
  - File -> Export -> Selected Pathways to File
  - File -> Import -> Pathways from File
- Import/export to/from MDL molfile format
  - Edit -> Import compound structure from molfile
  - Edit -> Export compound structure to molfile

Miscellaneous Exports
- Overview -> Highlight -> Save to File
- Overview -> Highlight -> Load from File
- Gene / Protein Sequence / Save to file
- Chromosome -> Show Sequence of a Segment of Replicon

Napster Comes to Bioinformatics
- Public sharing of Pathway/Genome Databases
  - PGDB registry maintained by SRI at URL http://biocyc.org/registry.html
- Registry operations
  - List contents of registry
  - Download PGDBs listed in the registry
  - Register PGDBs you have created

Registry Details
- Why register your PGDB?
  - Declare existence of your PGDB in a central location
  - Facilitate download by other scientists
- Why download a PGDB?
  - Desktop Navigator provides more functionality than the Web
  - Comparative operations
  - Programmatic querying and processing of PGDB
- Registration process
  - Registered PGDBs have open availability by default
  - Authors can provide their own license agreements
  - Registered PGDBs reside on authors’ FTP site

BioWarehouse
- Biospice.org

New Import/Export Tools
- Suggestions?
- Volunteers?

Updating a PGDB From an External Genome DB
- Example: AraCyc forms a pathway module to the TAIR DB
  - TAIR is the authoritative source for gene and gene-product information
  - Update AraCyc to reflect updates in TAIR

Proposed Approach
- Export TAIR to PathoLogic files
- Build AraCyc2 from those PathoLogic files (automated PathoLogic only)
- Compare AraCyc1 (A1) to AraCyc2 (A2)
  A. Import new genes/proteins from A2 to A1
  B. Delete from A1 genes/proteins not found in A2
  C. Rename genes/proteins in A1 whose names changed in A2
- Run name matcher on A1’
- Check for pathways with no enzymes and report them so the user can keep any that PathoLogic would otherwise delete
  - What about enzymes that were assigned to a pathway by the hole filler?
- Re-run pathway predictor
  - Remember what pathways the user deletes so they are not re-predicted by PathoLogic
- Consider movement of genes from contig to chromosome
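Steps A to C of the comparison reduce to set differences once genes in A1 and A2 can be matched. Below is a Python sketch that assumes each PGDB has been reduced to a mapping from a stable gene identifier shared by both builds to the gene's name. `plan_update` and `keep_user_deletions` are hypothetical helpers, and the identifiers are invented. If no shared identifier exists, matching genes becomes the main problem.

```python
def plan_update(a1: dict, a2: dict):
    """List the A/B/C actions from comparing A1 with A2.

    a1, a2: hypothetical mappings {stable gene identifier: gene name}
    for the current AraCyc (A1) and the rebuilt AraCyc2 (A2).
    """
    to_import = sorted(set(a2) - set(a1))  # A. in A2 but not in A1
    to_delete = sorted(set(a1) - set(a2))  # B. in A1 but not in A2
    to_rename = sorted(                    # C. same gene, new name in A2
        (gid, a1[gid], a2[gid]) for gid in set(a1) & set(a2) if a1[gid] != a2[gid]
    )
    return to_import, to_delete, to_rename


def keep_user_deletions(predicted: list, user_deleted: set) -> list:
    """Drop pathways the user deleted earlier so they are not re-predicted."""
    return [p for p in predicted if p not in user_deleted]


a1 = {"G1": "abc1", "G2": "xyz1", "G3": "old-name"}
a2 = {"G1": "abc1", "G3": "new-name", "G4": "novel1"}
print(plan_update(a1, a2))
# (['G4'], ['G2'], [('G3', 'old-name', 'new-name')])
print(keep_user_deletions(["PWY-1", "PWY-2"], {"PWY-2"}))  # ['PWY-1']
```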
