The Structured Advanced Query Page
Tomer Altman & Mario Latendresse
Bioinformatics Research Group, SRI International
August 18, 2009

Introduction
  BioVelo is a query language
    Like SQL, but simpler and easier to learn
    Documentation: http://biocyc.org/bioveloLanguage.html
    The Free-Form Advanced Query Page allows Web submission of BioVelo queries
  Structured Advanced Query Page (SAQP)
    Web page for interactively constructing precise queries to PGDBs
    Queries are translated to BioVelo and sent to the server for processing
    SAQP: http://biocyc.org/query.html
    Documentation: http://biocyc.org/webQueryDoc.html

Why a query interface?
  Allows a structured way to access the rich data representation stored in a PGDB.
  Most advanced databases have a high-level, declarative method of access (e.g., SQL).
  Provides an intermediate level of access between graphically browsing the PGDB and programmatically querying the data using an API or BioVelo.

The Structured Advanced Query Page
  'Advanced', in that it allows you to ask more complicated queries than the basic search interface.
    In other words, the SAQP allows you to search for data that satisfy a precise set of conditions.
  'Structured', in that it is a dynamic HTML form that guides you in creating a well-formed query.
  'Page', in that it is accessed via the Web interface for Pathway Tools.

The Structure of the SAQP:
  Database specification
  Class specification
  Conditions on attributes of classes
  Output attributes description
  Data format (HTML vs TXT)

Example #1:
  A simple query usually consists of querying a particular database about a particular class.
  Find all the proteins in E. coli K-12.
  Display the protein names.
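
To make the translation step concrete, here is a minimal Python sketch of how the parts of a structured query (database, class, conditions, output attributes) could be assembled into a BioVelo expression for Example #1. The `SearchComponent` class and `to_biovelo` function are hypothetical names invented for this illustration; they are not part of Pathway Tools, and the real SAQP translation may work differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchComponent:
    """Hypothetical model of one SAQP search component (illustration only)."""
    database: str                      # database identifier, e.g. "ecoli"
    cls: str                           # class to search, e.g. "proteins"
    variable: str = "p"                # variable bound by the generator
    conditions: list[str] = field(default_factory=list)  # raw condition strings
    outputs: list[str] = field(default_factory=list)     # attribute names to display


def to_biovelo(c: SearchComponent) -> str:
    """Assemble a BioVelo-style query string from one search component."""
    generator = f"{c.variable} <- {c.database}^^{c.cls}"
    body = ", ".join([generator, *c.conditions])
    # With no output attributes, return the object itself.
    columns = [f"{c.variable}^{attr}" for attr in c.outputs] or [c.variable]
    head = columns[0] if len(columns) == 1 else "(" + ", ".join(columns) + ")"
    return f"[{head} : {body}]"


# Example #1: all proteins in E. coli K-12, displaying the protein names.
example_1 = SearchComponent(database="ecoli", cls="proteins", outputs=["name"])
print(to_biovelo(example_1))  # [p^name : p <- ecoli^^proteins]
```

The sketch only builds a string; submitting it to a server is outside its scope.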

Structure of the Results
  A line that shows the equivalent BioVelo expression that the SAQP generated to answer the query.
  An HTML table of the results, with the corresponding entries hyperlinked to the matching Pathway Tools Web pages.
  If a text data format was requested, a tab-delimited text file is generated instead, containing just the table data.

Example #2:
  Find all the proteins of E. coli K-12 for which the DNA-FOOTPRINT-SIZE is smaller than 10.
  Display the protein name and the DNA footprint size.

Example #3:
  In EcoCyc, display polypeptides constrained by experimentally determined molecular weight and isoelectric point (pI).
  The experimental molecular weight should be between 50 and 100 kD.
  The pI should be less than 7.
  Display the polypeptide name, the experimental molecular weight, and the pI.

Example #4:
  The SAQP allows for specifying quantifiers on relations between PGDB objects.
  Extend Example #3 to select only proteins whose encoding gene is situated within the first 500 kilobases of the E. coli chromosome.

Example #5: Queries with Several Components
  A second search component searches another class of objects, potentially in another database, for each element found in the first search component.
  This is called a 'cross-product' search.
  Any number of search components can be added.
  In general, each new search component is evaluated for each set of objects found in the previous components.
  Some restraint is needed to avoid building a query that takes too long to answer. (The server limits a query to a few minutes.)
  Example: Search for MetaCyc pathways in the taxonomic range of Bacteria that also exist in E. coli K-12, using the common-name attribute.

Introduction to BioVelo
  BioVelo is based on set and list comprehension.
  In mathematics, a set comprehension describes a set of values, as in:
    {x | x in Prime, x > 100}
  The output is 'x'; the body has a generator 'x in Prime' and a condition 'x > 100'.
  Several conditions and several generators can be used.
  BioVelo uses a concise syntax:
    1) [ output-expression : generator, condition, ... ]
    2) a generator has the form v <- database^^class
    3) a condition uses logical and relational operators

Examples of BioVelo Queries
  [r : r <- ecoli^^reactions]
  [p^name : p <- ecoli^^proteins]
  [p^?name : p <- ecoli^^proteins]
  [p^?name : p <- ecoli^^proteins, p^dna-footprint-size < 10]
  [(g^?name, g^left-end-position) : g <- ecoli^^genes, g^left-end-position < 153000]
  [(g^?name, k) : g <- ecoli^^genes, k := abs(g^left-end-position - g^right-end-position) + 1, k < 200]
  [(r^?name, c^?name) : r <- ecoli^^reactions, c <- r^left, c in r^right]
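
For readers who know Python, the output / generator / condition structure of these queries can be mirrored with Python comprehensions. The sketch below is an analogy only: the `primes` list and the `genes` records are invented stand-ins (not real EcoCyc data), and Python dictionaries are merely an assumed substitute for PGDB objects and their attributes.

```python
# Invented sample values, standing in for the set "Prime".
primes = [89, 97, 101, 103, 107]

# {x | x in Prime, x > 100}: output x, generator "x in primes", condition "x > 100".
big_primes = {x for x in primes if x > 100}

# Hypothetical stand-in for ecoli^^genes; names and coordinates are made up.
genes = [
    {"name": "geneA", "left": 1000, "right": 1150},
    {"name": "geneB", "left": 200000, "right": 201200},
    {"name": "geneC", "left": 152000, "right": 152120},
]

# Analogue of:
# [(g^?name, g^left-end-position) : g <- ecoli^^genes, g^left-end-position < 153000]
near_origin = [(g["name"], g["left"]) for g in genes if g["left"] < 153000]

# Analogue of:
# [(g^?name, k) : g <- ecoli^^genes,
#                 k := abs(g^left-end-position - g^right-end-position) + 1, k < 200]
short_genes = [
    (g["name"], k)
    for g in genes
    for k in [abs(g["left"] - g["right"]) + 1]  # local binding, playing the role of k := ...
    if k < 200
]

print(big_primes)
print(near_origin)
print(short_genes)
```

In both notations the output expression comes first, followed by generators that bind variables and conditions that filter them.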