WormBase -- one Web site, many roles
PATO, December 2006

Caenorhabditis elegans

[Figure: bar chart titled "Diverse data (from 3,739 papers)", showing counts per curated data type; bar values range from 12 to 1,129. The category labels could not be recovered from the extraction.]

Prior to July, 2006:
• 127 phenotype objects in WormBase
• three-tiered organization (specialization_of or generalization_of)
• redundancy existed between terms
• no phenotype term definitions, references
• many RNAi experiments annotated to ‘Unclassified’ phenotype term
• ‘Not’ phenotype associations were not captured
• Phenotype vocabulary was not used for annotation of alleles and transgene objects

A controlled and structured vocabulary for phenotypes:
• allows complex data queries, and expedites analysis of genes that act in the same processes or pathways.
• helps to integrate a massive array of data from many different sources into a common body of knowledge.
• provides the option of linking phenotype data with other data in WormBase or with data from other databases.
• facilitates communication within and outside of the C. elegans community

Expansion of the phenotype ontology, source for term names:
• text descriptions in WormBase
• free text phenotype descriptions associated with alleles
• text associated with RNAi objects annotated to ‘Unclassified’ phenotype
• prior phenotype terms in WormBase
• GO ontology
• WormBase anatomy ontology
• Life stage ontology
Term names and synonyms reflect the language of researchers.

The WormBase phenotype ontology is a pre-coordinated ontology:
1,348 terms, ~20% of terms are defined

Current term usage:
• 40% used for annotation
• 60% not associated with an annotation

RNAi-phenotype data:
• 272,759 total RNAi-phenotype connections
• 63,439 RNAi experiments
• 19,692 genes associated with phenotypes via RNAi experiments:
    • 19,185 genes connected via “Not” associations
    • 4,577 genes connected directly

Allele-phenotype data:
• Most phenotype connections are to knockout alleles (NBP).
• Ongoing:
    • Continuing to collect phenotype data from the community.
    • Starting to annotate early papers describing large collections of mutants -> many high-level phenotype annotations.
    • Starting to annotate new papers.
Currently, 4,401 total allele-phenotype connections to 2,585 alleles, defining 1,296 genes.
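
The kind of query a structured phenotype vocabulary makes possible can be sketched roughly as follows. This is a minimal illustration, not WormBase's implementation: the parent links, the gene names, and the helper functions are all hypothetical, and only the general ideas (a term hierarchy, plus annotations that may carry a "Not" flag) come from the slides.

```python
from collections import defaultdict

# Hypothetical child -> parents links; NOT the real WormBase hierarchy.
PARENTS = {
    "lethal": [],
    "embryonic_lethal": ["lethal"],
    "early_embryonic_lethal": ["embryonic_lethal"],
    "larval_lethal": ["lethal"],
}

# Hypothetical annotations: (gene, phenotype term, is_not).
# is_not=True stands for a "Not" association (phenotype tested, not observed).
ANNOTATIONS = [
    ("gene-a", "early_embryonic_lethal", False),
    ("gene-b", "larval_lethal", False),
    ("gene-c", "embryonic_lethal", True),
    ("gene-d", "embryonic_lethal", False),
]


def descendants(term, parents=PARENTS):
    """Return the term itself plus every more specific term beneath it."""
    children = defaultdict(list)
    for child, parent_list in parents.items():
        for parent in parent_list:
            children[parent].append(child)
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children[current])
    return seen


def genes_with_phenotype(term, annotations=ANNOTATIONS):
    """Genes positively annotated to the term or any of its descendants."""
    terms = descendants(term)
    return sorted({g for g, t, is_not in annotations if t in terms and not is_not})


def genes_not_showing(term, annotations=ANNOTATIONS):
    """Genes with an explicit "Not" association to exactly this term."""
    return sorted({g for g, t, is_not in annotations if t == term and is_not})


print(genes_with_phenotype("lethal"))
print(genes_not_showing("embryonic_lethal"))
```

In this sketch a query on a high-level term picks up genes annotated to any more specific term, while "Not" associations are kept separate so that a negative result is never counted as evidence for the phenotype. Matching "Not" associations only on the exact term is a simplifying assumption made here.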

Lots of RNAi data -> dense early_embryonic_lethal node:

Vague collections of phenotypes present challenges for ontology/annotation:
• pleiotropic_defects_severe_early_emb: “Often multiple pronuclei, aberrant cytoplasmic texture, drop in overall pace of development, osmotic sensitivity.”
• complex_phenotype_early_emb: “Complex combination of defects that does not match other class definitions.”

Looking ahead to an entity-quality compatible schema:
• Within OBO-Edit we store relevant GO term names within primary names or synonym names (GO ID stored in relevant dbxref field)
• Phenotype ontology is developed using existing anatomy and life stage term names

Phenotype data integration:
• Phenotype annotations are associated with molecular information for alleles, transgenes, and RNAi objects, which permits mapping these objects to the genome.
• High-level phenotype annotations associated with RNAi objects are automatically converted to GO terms (RNAi2GO) and associated with gene objects.
• Phenotype annotations describing gene regulation (‘transgene_expression_abnormal’) are linked with detailed gene regulation information.
• Phenotypes are linked to life stage and anatomy terms

RNAi summary on gene page:

Sample detailed RNAi report:

Sample allele report:

Immediate future plans:
• Ontology:
    • Define terms, further refine ontology (expansion will be dictated by community feedback and curation needs)
    • Solicit more expert community feedback
• Web site:
    • Enhance phenotype search tools

Ontology browser to be integrated into WormBase:
http://elbrus.caltech.edu/cgi-bin/igor/ontology/ontology.cgi

WormBase = ~30 people, 4 centers

Cold Spring Harbor Laboratory: Payan Canaran, Jack Chen, Tristan Fiedler, Todd Harris, Sheldon McKay, Will Spooner, Lincoln Stein

Washington University at St. Louis: Tamberlyn Bieri, Darin Blasiar, Phil Ozersky, John Spieth

California Institute of Technology: Igor Antoshechkin, Carol Bastiani, Juancarlos Chan, Wen Chen, Ranjana Kishore, Raymond Lee, Hans-Michael Müller, Cecilia Nakamura, Andrei Petcherski, Gary Schindelman, Erich Schwarz, Paul Sternberg, Kimberly Van Auken, Daniel Wang, Xiaodong Wang

Wellcome Trust Sanger Institute: Paul Davis, Richard Durbin, Michael Han, Anthony Rogers, Mary Ann Tuli, Gary Williams

Other acknowledgements:
• NIH/NHGRI
• C. elegans research community