Genome Browser: The Plot

Deepak Purushotham
Hamid Reza Hassanzadeh
Haozheng Tian
Juliette Zerick
Lavanya Rishishwar
Piyush Ranjan
Lu Wang

The Outline
• The Need & The Requirement
• The Options
• The Chosen One
• The New Age

Why one should develop a Genome Browser
THE NEED

Why A Genome Browser?
I want to analyze this organism
Metabolic Pathways

What is expected out of a Genome Browser
THE REQUIREMENT

A Genome Browser?
I want something manageable

A Genome Browser!

The Genome Browser
“Genome browsers facilitate genomic analysis by presenting alignment, experimental and annotation data in the context of genomic DNA sequences.”
Melissa S Cline & James W Kent, 2009

Genome browsers aggregate data
Taken from Andy Conley’s slides without permission

A Short Survey of the available Genome Browser Modules
THE OPTIONS

A Brief Time Travel
• The four model organism databases (MODs): FlyBase, SGD, MGD, and WormBase
• Setting up an MOD is expensive and time-consuming.
• The four MODs agreed in the fall of 2000 to pool their resources and to make reusable components available to the community free of charge under an open source license.
• The goal of this NIH-funded project, christened GMOD, is “…to generate a model organism database construction set that would allow a new model organism to be assembled by mixing and matching various components.”

GMOD
Who uses GMOD?

GMOD Components
Visualization - GBrowse

Visualization
• JBrowse
• GBrowse
• Synteny
• CMAP

DATA MANAGEMENT
• Chado
• Tripal (http://www.cacaogenomedb.org/)
• TableEdit
• BioMart
• InterMine

ANNOTATION
• MAKER
• DIYA
• Galaxy
• Ergatis
• Apollo

REALLY EXCITING OPTION!
JBrowse
• Smooth, fast navigation (think Google Maps for genomes)
• Supports BED, GFF, Bio::DB::*, Chado, WIG, BAM, UCSC (intron/exon structure, name lookups, quantitative plots)
• Relies on pre-indexing to minimize security exposure and runtime bandwidth/CPU load on the server (future versions are more likely to do some server work at runtime)
• Has an API for customized track/glyph extensions
• Is stably funded by NHGRI, with many interesting innovations implemented and pending integration

Smoother UI
Most genome browsers
How is JBrowse different?

First look: Live Demo
A couple of JBrowses around the web
• http://intron.ccam.uchc.edu/JBrowse/Dmel/
• http://jbrowse.org/ucsc/hg19/

Types of Tracks

Pros
• Fast and smooth!
• User-friendly
• Works nicely on an iPad/iPhone too

Cons
• No support for user-uploaded data
• Slow for large numbers of reference sequences (e.g. 5,000 annotated contigs)
• Few glyph options; feature tracks are limited by what HTML <div> elements can render

What to pick?
Tried and tested?
Fancy concept

GBrowse and its Features
THE CHOSEN ONE

GBrowse
• Most popular web-based genome browser
• Visualizes genome features along a reference sequence
• Open source
• Highly customizable
• Excellent usability
• Rich set of “glyphs”
– Genome features
– Quantitative data
– Sequence alignments

GBrowse
[Annotated screenshot: Header, Main Browser Window, Track Menu]

Under The Hood
• Client-Server Architecture
• GBrowse Architecture
• Installation Issues
• Input Data
• Configuration File
• Customization

Client Server Architecture
1. The user types in the URL: browser2012.biology.gatech.edu
2. The browser interprets the URL and sends the request to the HTTP server.
3. The web server receives the request and “serves” the client, i.e., starts GBrowse.
4. On success, the relevant hypertext and multimedia are generated by accessing the database.
5. The output traverses the same path back.
6. The whole process repeats when the user interacts with the browser.

How you see what you see
Juxtaposed images
How are so many images generated?
+ Hypertext files
Multimedia files + hypertext

GBrowse Architecture
Stein L D et al. Genome Res. 2002;12:1599-1610
©2002 by Cold Spring Harbor Laboratory Press

The Bio::DB::SeqFeature database
[Schema diagram: tables Feature, Name, Type List, Attribute, Attribute List, Location List, and Parent2Child, linked by one-to-many (1:n) relationships]

Data file (.gff3)
Each feature line has nine tab-separated columns:
1. Reference sequence (Chr/Clone/Contig)
2. Source (e.g. Prodigal/Glimmer)
3. Type (Sequence Ontology (SO) terms)
4. Start
5. End
6. Score (e.g. E-value)
7. Strand
8. Phase (0/1/2)
9. Attributes (format: tag=value)

Attributes (Data file)
Different tags have predefined meanings:
• ID: Gives the feature a unique identifier. Useful when grouping features together (such as all the exons in a transcript).
• Name: Display name for the feature. This is the name shown to the user.
• Alias: A secondary name for the feature. It is suggested that this tag be used whenever a secondary identifier for the feature is needed, such as locus names and accession numbers.
• Note: A descriptive note attached to the feature. This is displayed as the feature's description.
Alias and Note fields can have multiple values separated by commas. For example: Alias=M19211,gna-12,GAMMA-GLOBULIN
• Other good stuff can go into the attributes field.
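To make the column layout and the tag=value attribute format concrete, here is a minimal Python sketch of how one GFF3 feature line could be split into its parts. It is not GBrowse's own loader (GBrowse reads GFF3 through Bio::DB::SeqFeature). The function names and the example line (contig name, coordinates, IDs) are illustrative assumptions; only the Alias value comes from the slide above.

```python
from urllib.parse import unquote

# The nine GFF3 columns, in file order.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_attributes(field):
    """Split a GFF3 attributes field into {tag: [value, ...]}."""
    attributes = {}
    if field in ("", "."):
        return attributes
    for pair in field.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, raw = pair.partition("=")
        # Commas separate multiple values; percent-escapes are decoded afterwards.
        attributes[unquote(tag)] = [unquote(value) for value in raw.split(",")]
    return attributes


def parse_gff3_line(line):
    """Parse one tab-separated GFF3 feature line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    record["attributes"] = parse_attributes(record["attributes"])
    return record


# Hypothetical feature line: the contig name, coordinates, and ID are made up.
example = "\t".join([
    "contig00001", "Prodigal", "gene", "1200", "2450", ".", "+", ".",
    "ID=gene0001;Name=gna-12;Alias=M19211,gna-12,GAMMA-GLOBULIN;Note=example gene",
])
record = parse_gff3_line(example)
print(record["type"], record["start"], record["end"])
print(record["attributes"]["Alias"])
```

Every tag maps to a list of values, so single-valued tags such as ID and multi-valued tags such as Alias can be handled the same way.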
GBrowse Configuration File
• Global Website Settings
• Additional HTML Pages
• JavaScript
• jQuery
• Global Database Settings
• Data Source Definitions

Customizations

Configuration file (.conf)
Making a new Track

    ### TRACK CONFIGURATION ###
    [ExampleFeatures]
    feature  = remark
    glyph    = generic
    stranded = 1
    bgcolor  = orange
    height   = 10
    key      = Example Features

Adding Multiple Tracks
[Figure panels: Data; Configuration]

Searchable Links
[Figure panel: Result]

UI: Popup balloons with links

Searching for Features
• Gene symbols
• Gene IDs
• Sequence IDs
• Genetic markers
• Relative nucleotide coordinates
• Absolute nucleotide coordinates
• etc...

Viewing Multiple Tracks
Low Magnification
High Magnification

In short…
• Main features (determination of protein-coding and non-coding regions, …)
• Quantitative data (E-value, identity percentage)
• Other evidence (InterPro, COGs, etc.)
• GC content and other useful measurements
• Protein and DNA sequences

Value-Added Additions
THE NEW AGE

What’s New
RICHER ANNOTATION

Richer Annotation
INCREASED ANNOTATION INFO
[Bar chart: Total Genes, Pangenome Hits, and UniProt hits per strain (M19107, M19501, M21127, M21621, M21639, M21709); y-axis from 0 to 3000]

Richer Annotation
INTEGRATED QUALITY SCORE

Origin of Database Matches
A color code is used for matches that originate from different databases.

Quality Value Integration
The color code distinguishes between different databases…
However, for matches from the same database…

Quality Scores
The color code will also be used for matches of different quality…
Different E-values are shown in different shades of a color.

What’s New
MORE LINK-OUTS
• COGs
• KEGG ID

What’s New
PATHWAYS
• KEGG ID
• KEGG Compound
• KEGG Genes
• KEGG Pathway

Synthesis!
ORGANISM SPECIFIC PAGES

Organism Summary Page
• At this point in the course, we have gathered a lot of information about the strains we are dealing with.
• Not all of this information can be represented inside the genome browser.
• We propose a separate section in the browser containing summarized information for each strain.

Organism Summary Page
• Conceptually, the page could contain:
– Biological information
– Assembly information: genome size, number of contigs, N50, sequencing platform
– Gene prediction information: number of protein-coding and non-protein-coding genes, links to the 16S rRNA gene
– Annotation information: percent annotation, function distribution pie chart
– Comparative information: unique protein clusters, etc.

Adding more values
OPERONS

Operons
• An operon “…is a functioning unit of genomic DNA containing a cluster of genes under the control of a single regulatory signal or promoter”
• ~70% of the genes have been assigned a unique OperonID
• OperonID will provide an additional browsing mechanism for biologists, connecting co-transcribed and co-regulated genes.

Operons
Incorporating Operon Information

More with Comparison
BRIG PATTERN

BRIG Patterns
• Concept: Either generate BRIG images at run time or load static images when the user requests a BRIG pattern between two species

That’s All Folks!
• Questions?
• Comments?
• Concerns?
• If you have any suggestions, we would love to hear from you! (There is a page on the Wiki for it!)