Core 2: Bioinformatics
CBio-Berkeley

Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Discussion

Berkeley group: genomics
• Formerly BDGP (Berkeley Drosophila Genome Project) Informatics
  – Genome sequencing, analysis and annotation
  – Genomic application development
  – Database development
    • FlyBase
    • Generic Model Organism Database

Apollo

GBrowse

In-situ expression database

Genomics applications
• GadFly
  – analysis and annotation database
  – pipeline software
• BOP
  – computational analysis integration
• CGL
  – Comparative Genomics Software Library

SO and SOFA
• Sequence Ontology for Feature Annotation
• Ontology for genomics
  – Sequence feature classes:
    • mRNA, intron, UTR, sequence_variant, …
  – Sequence feature relations
    • exon part_of transcript
    • polypeptide derives_from mRNA

Chado
• Model organism relational database schema
  – FlyBase, GMOD
• Modules
  – sequence annotations
  – expression
  – map
  – genotype
  – phenotype
  – ontology/cv
  – …
• Generic schema
  – Uses ontologies for strong typing

Berkeley group: GO
• Gene Ontology - Informatics
  – Database, web portal
  – Ontology editing tools
  – Ontology QC and integration
  – OBO

OBO-Edit (formerly DAG-Edit)

AmiGO and GO Database

Obol
• Problem: large ontologies of composite terms are difficult to manage
• Solution: partial automation (reasoners)
• Requires logical definitions
  – how do we obtain them?
• Solution: Obol
  – Parses logical definitions from class names
  – Logical definitions can be reasoned over
    • detect errors and automation
  – Integrates OBO ontologies

OBO Relations Ontology
• Common relations used across ontologies must mean the same thing
  – is_a
  – part_of
  – derives_from
  – has_participant
  – …
• OBO relations ontology provides precise definitions
  – defines class-level relations in terms of their instances
• http://obo.sourceforge.net/relationship
  – collaboration with core5, Manchester & others

Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Open questions

Core 2 specific aims
• Aims
  1. Capture and describe data
  2. Reconcile annotation and ontology changes
  3. Store, view and compare annotations
  4. Link disease genes
• First round
  – phenotypes: Fly and Zebrafish
  – HIV clinical trial data

Aim 1: Capture and describe data
• Phenotype data capture
  – OBO-Edit plug-ins
  – Combine classes from multiple ontologies
    • PATO, anatomical ontologies
  – NLP tools?
• Clinical trial data capture
  – what are the appropriate tools?

Aim 1: Capture and describe data
• Zebrafish, fly
  – PaTO: Phenotype and trait ontology
    • phenotype ‘primitives’
  – ‘Entity-Attribute-Value’ model
  – Phenotype ontologies
  – Genetic data
  – Orthologs
• Clinical trial data
  – generic instance model
  – what are the appropriate ontologies here?

PATO
• An ontology of attributes and attribute values
  – e.g. morphology, structure, placement
• Current status of PATO?
  – needs work to conform to sound ontology principles
    • definitions
    • formalisation of attributes
  – working with core3-cambridge (Gkoutos) and core5 (Neuhaus)

Phenotype annotation
• Entity-attribute structured annotations
  – Entity term; PATO term
    • brain FBbt:00005095; fused PATO:0000642
    • gut MA:0000917; dysplastic PATO:0000640
    • tail fin ZDB:020702-16; ventralized PATO:0000636
    • kidney ZDB:020702-16; hypertrophied PATO:0000636
    • midface ZDB:020702-16; hypoplastic PATO:0000636
• Pre-composed phenotype terms
  – Mammalian Phenotype Ontology
    • “increased activated B-cell number” MPO:0000319
    • “pink fur hue” MPO:0000374

Example (Fly)
A481G
Gene: Jra
Allele: Jra[bZIP.Scer\UAS]
Allele Description: bZIP defects in head and dorsal cuticle. Scer\GAL4[hs.PB] induces…..

Entity         | Attribute | Value    | Background/Environment
embryo         | viability | lethal   | Scer\GAL4[hs.PB]
dorsal cuticle | shape     | abnormal |
…              | …         | …        | …
wing vein L2   | shape     | branched | temperature sensitive

Genotype-Phenotype datamodel
• Need to model complex genotypes
• Environment
• Phenotype
  – E-A-V is not enough
    • Relational attributes
    • Complex phenotypes
    • Measurements and assays
  – CSHL 2005 Phenotype meeting

Aim 2: Reconcile annotation and ontology changes
• Ontology evolution can trigger annotation changes
• Identifiers
  – all classes and annotations will have stable identifiers
  – Cores 1 and 2 to decide on identifier model
    • LSID URNs
• OntoTrack

Aim 3: Store, view and compare annotations
• OBO: ontologies
• OBD: data annotated using ontologies
  – genotype-phenotype
  – clinical trials
  – others

OBD: A Database for OBO
• Data warehouse
  – collected from MODs and other sources
• Annotation versioning
• Generic data model
  – Any data typed by OBO classes can be stored
• Specific annotation data views
  – Clinical trial data view
  – Phenotype data view
• Chado-compliant
• Entity-attribute-(value) model

Key technologies
• ‘Semantic Web’ database technology
  – ontology-aware
    • ontologies are part of meta-model
    • higher level query languages
      – SPARQL, SeRQL, …
    • tool interoperability
      – Protégé-OWL, Jena, ..
  – SQL compatibility
    • optionally layered on relational model
  – Standards? Maturity?
    • Many implementations
      – Sesame, Kowari,

Aim 3: Store, view and compare annotations
• Browsing
  – AmiGO-2
• Advanced visualization
  – work with core 1 (University of Victoria)

Comparing annotations
• process vs state
  – regulatory processes:
    • acidification of midgut has_quality reduced rate
    • midgut has_quality low acidity
• development vs behavior
  – wing development has_quality abnormal
  – flight has_quality intermittent
• granularity (scale)
  – chemical vs molecular vs cell vs tissue vs anatomical part

Integrating anatomical ontologies
• Annotations should be comparable between species
  – phenotype annotations are composed of anatomical terms
• Multiple species-centric anatomical ontologies
  – Problem: how do we compare across species?
  – XSPAN (Bard et al): creating mappings
  – Core 1: ontology mappings

Aim 4: Linking disease genes
• Homology data
  – Orthologous genes
• Genomic data
  – SNPs, sequence variants
• Ontologies
  – Disease ontologies
  – Semantic similarity
  – Ontology integration
    • Obol, XSPAN

Linking disease to phenotype
• Relationship of phenotype to diseases and disorders
  – essentialist
  – statistical
• Disease ontologies
  – OBO disease ontology (Northwestern)
  – EVOC disease ontology (EVOC)
  – Others
• Disease ontology workshop (core 5)
  – November 2006

Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress
• Open questions

Software lifecycle
• Software is developed in phases
• Different phases require interaction with different cores
• Iterative “Agile” methodology
  – fast cycles
  – involve ‘customer’ (core3) at all phases

Outline
• Berkeley group background
• Core 2 first round
  – what: aims, milestones
  – how: software lifecycle, interaction w/ other cores
• Current progress

Current progress
• Meetings
  – CSHL November 2005
    • Phenotype ontology meeting
    • Phenotype tools workshop
      – Berkeley, UVic, Core 3
• OBO-Edit complex class plug-in
• Phenotype browser prototype
• Genotype-Phenotype datamodel

OBO-Edit complex class plug-in
• Combinatorial composition of classes
• Current use-cases:
  – plant anatomical structures
  – integrating GO and OBO-Cell
• Ideal for phenotype classes
  – extend to make ‘phenotype’ plug-in

OBD Progress
• Genotype-Phenotype data model defined
• Prototype implemented
• Evaluating technologies

Phenotype browser
• Experimental branch of AmiGO code
• Allows browsing and querying of combinatorial phenotype annotations
• Experimental dataset
• Demo
  – http://yuri.lbl.gov/amigo/obd
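
Sketch: querying combinatorial phenotype annotations

One way to picture the combinatorial annotations the phenotype browser queries is as pairs of an entity term and a PATO term, filtered by either identifier. The Python below is a minimal sketch under stated assumptions, not the OBD or AmiGO implementation: the names `Term`, `PhenotypeAnnotation`, `find` and `describe` are hypothetical, the genotype labels are placeholders, and the third annotation is an invented combination added only so the queries return more than one hit. The first two entity and PATO pairs come from the phenotype annotation slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An ontology class reference: a stable identifier plus a label."""
    id: str
    label: str


@dataclass(frozen=True)
class PhenotypeAnnotation:
    """One combinatorial annotation: an entity term paired with a PATO term."""
    genotype: str  # placeholder label, not real genotype data
    entity: Term
    quality: Term


BRAIN = Term("FBbt:00005095", "brain")
GUT = Term("MA:0000917", "gut")
FUSED = Term("PATO:0000642", "fused")
DYSPLASTIC = Term("PATO:0000640", "dysplastic")

ANNOTATIONS = [
    PhenotypeAnnotation("genotype-1", BRAIN, FUSED),       # pair from the slide
    PhenotypeAnnotation("genotype-2", GUT, DYSPLASTIC),    # pair from the slide
    PhenotypeAnnotation("genotype-3", BRAIN, DYSPLASTIC),  # invented for illustration
]


def find(annotations, entity_id=None, quality_id=None):
    """Return annotations matching the given entity and/or quality identifier."""
    return [
        a for a in annotations
        if (entity_id is None or a.entity.id == entity_id)
        and (quality_id is None or a.quality.id == quality_id)
    ]


def describe(a):
    """Render an annotation as 'entity has_quality quality' with identifiers."""
    return (f"{a.genotype}: {a.entity.label} ({a.entity.id}) "
            f"has_quality {a.quality.label} ({a.quality.id})")


if __name__ == "__main__":
    for hit in find(ANNOTATIONS, entity_id="FBbt:00005095"):
        print(describe(hit))
```

Matching on identifiers rather than labels follows the stable-identifier requirement stated under Aim 2. A real store would also need to follow is_a and part_of relations when matching, which this sketch leaves out.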