BINF 4360, Fall 2007
Rachel Adams, Jerry Choate, Nathan Harrelson, Divya Mistry, and Whitney Smith

Overview
Goals
Implementation
Interface
Images
Final product
Conclusions

Goals
Create a dynamic map of the Shewanella oneidensis MR-1 genome.
Populate a local database with relevant information from web-based databases.
Provide an efficient search algorithm for key terms.
Implement user-friendly navigation and readability.

Implementation
SQL schema
Parsing
Databases

Parsing
XPath
XPath was used to quickly parse the XML documents generated by NCBI’s SOAP interface.
    use strict;
    use warnings;
    use XML::XPath;
    my $file = shift @ARGV or die "usage: $0 file.xml\n";  # XML saved from NCBI's SOAP interface
    my $xp   = XML::XPath->new(filename => $file);
    # gets the gene symbol and locus tag of every Gene-ref element
    foreach my $gene ($xp->find('//Gene-ref')->get_nodelist) {
        my $name  = $gene->find('Gene-ref_locus')->string_value;
        my $locus = $gene->find('Gene-ref_locus-tag')->string_value;
        print "$locus\t$name\n";
    }
LWP::Simple
LWP::Simple was used to fetch content from a URL so it could easily be written to an XML file.
Regular Expressions
Regular expressions were used to parse HTML files, match specific string patterns, and manipulate text.
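As an illustration of the regular-expression step, here is a minimal Perl sketch. The helper names `extract_locus_tags` and `html_to_text`, the sample HTML row, and the assumption that MR-1 locus tags follow the pattern `SO_` plus four digits (e.g. SO_1776) are hypothetical, not taken from the project’s code. Regular expressions work well for pulling patterns out of predictable, machine-generated pages like these; a real HTML parser is safer for arbitrary markup.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct locus tags found in a page,
# in order of first appearance.
# ASSUMPTION: MR-1 locus tags look like "SO_" followed by four digits.
sub extract_locus_tags {
    my ($html) = @_;
    my (%seen, @tags);
    while ($html =~ /\b(SO_\d{4})\b/g) {
        push @tags, $1 unless $seen{$1}++;    # skip tags already collected
    }
    return @tags;
}

# Hypothetical helper: strip markup and tidy whitespace so the remaining
# text can be stored in the local database or searched.
sub html_to_text {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>/ /g;      # replace every tag with a space
    $text =~ s/\s+/ /g;                       # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;                  # trim both ends
    return $text;
}

# Hypothetical sample row from a gene listing page
my $html = '<tr><td><a href="/gene/SO_1776">SO_1776</a></td>'
         . '<td>hypothetical protein</td></tr>';
print join(', ', extract_locus_tags($html)), "\n";
print html_to_text($html), "\n";
```

The hash `%seen` keeps each locus tag once even when a page links to the same gene several times, which matters when the results are inserted into a table keyed on locus_tag.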
Schema
area_area_id_seq and img_img_id_seq (sequences with identical columns): sequence_name name, last_value bigint, increment_by bigint, max_value bigint, min_value bigint, cache_value bigint, log_cnt bigint, is_cycled boolean, is_called boolean
img: img_id integer, map varchar(5)
ncbi_proteins: locus_tag text, description text, date date, gene text, defintion text
area: area_id integer, target text, href text, coords text, title text, img_id integer
imgplacement: img_id integer, tilex integer, tiley integer
pdb: id text, pdb text, locus_tag text
ncbi_genes: id integer, location text, name text, description text, function text, month integer, day integer, gi text, year integer, img_id text
kegg: gene_id text, kegg_id text, cog_id text

Databases
Local databases were populated using information retrieved from web-based gene, protein, and 3D domain databases.
NCBI COG
Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in complete genomes representing major phylogenetic lineages.
IMG
The goal of the Integrated Microbial Genomes (IMG) system is to facilitate the visualization and exploration of genomes from a functional and evolutionary perspective.
KEGG
KEGG, the Kyoto Encyclopedia of Genes and Genomes, stores knowledge for uncovering higher-order systemic behaviors of the cell and the organism from genomic information.

More Databases
MIST
The Microbial Signal Transduction database contains the signal transduction proteins of 591 complete bacterial and archaeal organisms.
ORNL
The Genome Analysis and System Modeling Group of ORNL’s Life Sciences Division provides bioinformatics and analytic services and resources to collaborators, predicts gene and protein models for analysis, and provides user services for the general community.
PDB
The RCSB PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships to sequence, function, and disease.
ShewCyc
ShewCyc, part of BioCyc (a collection of 371 Pathway/Genome Databases), describes the genome and metabolic pathways of Shewanella oneidensis MR-1.

Interface
The Google Maps API was used to display the pathways of the Shewanella genome.
A small overview map gives a bird’s-eye view of the entire image, with the current view marked by a translucent box.
Users can view the pathways at five zoom levels.
Users can either pan over the images and click on areas of interest, or enter a query in the search bar to jump quickly to specific information.
Text balloons show information relevant to the selected target.
When a search term is submitted, matching targets are marked on the map with colored pins.

Images
ImageMagick is a free software suite for creating, editing, and composing bitmap images. The main features we used were resizing, sharpening, padding, and stitching images together. We combined 212 separate images into a single composite image; placing them within a 16384 by 16384 pixel canvas took strategic manipulation and tedious offset calculations.

Final Product: zoomed image
Final Product: query for glycogen
Final Product: query for ATP

Conclusions
Using Google Maps, we created a searchable map of the pathways in the Shewanella genome. Efficient parsing methods made collecting and querying the data far simpler. With more time, further improvements could make the application more usable. Currently we offer links to images, but it would be better to show thumbnails of the pictures directly.
Google Web Toolkit has several features that would make more information available to the user. Tabs on text balloons could separate data into topical subgroups. Overlaying a transparent map on the current map could be a useful way to compare two pathways. Additionally, the project would benefit from deeper zoom levels at which the user could see the actual nucleotide and amino acid sequences.
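Whether deeper zoom levels are practical comes down to tile arithmetic. The following minimal Perl sketch assumes 256 by 256 pixel tiles (the standard Google Maps tile size), that the deepest of the five zoom levels shows the full 16384 by 16384 pixel composite, and that each shallower level halves the width and height. The constants and sub names are hypothetical, and the project’s imgplacement.tilex and tiley columns may follow a different convention.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# ASSUMPTIONS (not from the original project): 256 x 256 pixel tiles,
# the deepest level shows the full 16384 x 16384 composite, and each
# shallower level halves the width and height.
my $TILE_SIZE  = 256;
my $FULL_SIZE  = 16384;
my $NUM_LEVELS = 5;

# Width (and height) of the image in pixels at zoom level $z,
# where 0 is the most zoomed-out level and $NUM_LEVELS - 1 the deepest.
sub image_size {
    my ($z) = @_;
    return $FULL_SIZE / 2 ** ($NUM_LEVELS - 1 - $z);
}

# Number of tiles along one side of the image at zoom level $z.
sub tiles_per_side {
    my ($z) = @_;
    return ceil(image_size($z) / $TILE_SIZE);
}

# Tile (column, row) containing full-resolution pixel ($px, $py)
# when the map is displayed at zoom level $z.
sub tile_for_pixel {
    my ($px, $py, $z) = @_;
    my $scale = image_size($z) / $FULL_SIZE;   # shrink factor for this level
    return (int($px * $scale / $TILE_SIZE), int($py * $scale / $TILE_SIZE));
}

for my $z (0 .. $NUM_LEVELS - 1) {
    printf "zoom %d: %5d px, %2d x %2d tiles\n",
        $z, image_size($z), tiles_per_side($z), tiles_per_side($z);
}
```

Under these assumptions the five levels run from 1024 pixels (4 by 4 tiles) to 16384 pixels (64 by 64 tiles). A sixth, deeper level would double the side to 32768 pixels and need 128 by 128 tiles, four times as many as the current deepest level, so each extra level quadruples the tiles to generate and store.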