Data Integration, Gene Ontology, and the Mouse*
Joel Richardson, Ph.D.
Mouse Genome Informatics Group
The Jackson Laboratory
Bar Harbor, Maine 04609
* Not necessarily in that order.

We have the human sequence: OK, now what?
- One species is not enough:
- The sequence is just the beginning
- model organisms (one strain is not enough)
- comparative studies
- sequence variants
- gene regulation and interaction networks
- non-coding functional elements
- environmental effects
- Genotype to phenotype

The Mouse
- the premier animal model for studying human disease
- > 95% same genes
- same diseases, similar reasons (e.g., cancer, hypertension, diabetes, osteoporosis, …)
- 1000s lab strains, diff. characteristics
- precise genetic control

The Jackson Laboratory
- Private nonprofit research institution (est. 1929)
- Studying mouse as a model of human biology and disease
- National Cancer Research Center
- Supplier of laboratory strains to researchers worldwide
- Areas: metabolism, development, cancer, immune response
- www.jax.org
- Bar Harbor, ME 04609

Mouse Genome Informatics (MGI)
- Consortium of NIH-funded projects
- Housed at TJL
- Integrates and disseminates public data resources covering selected aspects of mouse biology
- First program project funding 1989
- > $10M/y total, >60 people
- Online since 1994.
- www.informatics.jax.org

MGI Concept Map
[Figure; labels in extraction order: Phenotypes, Genotypes, Strains, Alleles, Expression Data, Anatomy, Genes and other loci, Mapping Data, Variants, DNA and Protein Sequences, References, Molecular Fragments, Accession IDs]

Integration in MGI
- Identifying objects.
- Resolving or noting discrepancies.
- Integration is key to knowledge discovery in age of genomics

The Power Of Integration: Queries
- What transcription factors are expressed in a 2-cell embryo and not in a blastocyst? What development QTLs contain these TFs?
  - integration of multiple expression assay data sets and data types.
  - standardization of anatomical references and developmental stages
  - integration of expression data and mapping data
  - genetic map result of integrating lots of mapping data
- What strains are distinguished by SNPs in this region?
- And so on…

The MGI System (from 40,000 feet)
[Diagram; labels in extraction order: Data Downloads, Literature Curation, Editing Interface, Load scripts, MGI RDBMS, Servlets, CGI Scripts, Files, Web, SQL, Report Scripts, Files]

MGI in Context
[Diagram; resources shown around the MGI db: Unigene, TIGR, GO, DoTS, Ensembl, Anatomy, Interpro, OMIM, SwissProt, RefSeq, RPCI, NCBI, I.M.A.G.E., LocusLink, MGC, GenBank, NIA, Mutagenesis Centers, Scientific Literature, RIKEN, ATCC, RatMap]

Integration relies on Standard Vocabularies
- Structured vocabularies
  - The common semantic frameworks
  - Structured into is-a/part-of hierarchies
- Evidence-based annotation
  - Associations of vocabulary terms with objects
  - Evidence (codes), citations, etc., decorate the associations
- Structured annotations and queries

Structured Vocabularies in MGI
- Gene Ontology (GO): functional gene annotations
- Mammalian Phenotype (MP): annotations to genotypes (e.g. knockouts)
- Mouse Anatomical Dictionary: annotations of expression
- Other standardized, non-structured vocabularies
  - Mouse strains
  - cell lines
  - clone libraries
  - tissues
  - lots of smaller ones

Challenges
- Domain very difficult to frame
- Huge variability, variety of data, formats, providers, update schedules & semantics, etc…
- Biologists and Computer Scientists think differently.
  - communication is paramount, but difficult
- Rapid changes, e.g., in last 10 years:
  - genetic crosses -> YAC/BAC mapping -> RH mapping -> genome sequence
  - northern blots -> microarrays -> MPSS

System Evolution
- The system is a software ecosystem
- Maintenance is the cost of success
- Changes and cost/benefit
- If it ain’t broke, don’t fix it
- Commitments/agenda/priorities

Credits
Richard Baldarelli, Matt Baya, Jon Beal, Dale Begley, Judy Blake, John Boddy, Dirck Bradt, Carol Bult, Nancy Butler, Donna Burkart, Jeff Campbell, Lori Corbani, Rebecca Corey, Sharon Cousins, Diane Dahmen, Harold Drabkin, Janan Eppig, Jackie Finger, David Garippa, Lucette Glass, Carroll Goldsmith, Pat Grant, Terry Hayamizu, David Hill, Jim Kadin, Ben King, Debbie Krupke, Moyha Lennon-Pierce, Jill Lewis, Ira Lu, Cathy Lutz, Lois Maltais, Prita Mani, Mike McCrossin, Louise McKenzie, David Miers, Daniel Modrusan, Dieter Naf, Li Ni, Janice Ormsby, Sridhar Ramachandran, Deborah Reed, Joel Richardson, Martin Ringwald, David Shaw, Bob Sinclair, Cynthia Smith, Connie Smith, Paul Szauter, Leslie Trombley, Pierre Vanden Borre, Michael Walker, Linda Washburn, Josh Winslow, Iry Witham, Sophia Zhu
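
Supplementary sketch (not part of the original slides). The slides describe structured vocabularies arranged in is-a/part-of hierarchies, annotations that associate vocabulary terms with objects and carry evidence and citations, and queries such as "what transcription factors are expressed in a 2-cell embryo and not in a blastocyst?". The Python below is a minimal illustration of how such a query might be answered over annotations, under loud assumptions: the vocabulary edges, gene symbols, evidence values, and references are all made-up toy data, the names PARENTS, Annotation, ancestors, and objects_annotated_to are hypothetical, and none of this reflects the actual MGI schema or software.

```python
from dataclasses import dataclass

# Toy vocabulary (assumed, not real GO or anatomy content):
# child term -> list of (relation, parent term).
PARENTS = {
    "transcription factor activity": [("is-a", "DNA binding")],
    "DNA binding": [("is-a", "molecular function")],
    "inner cell mass": [("part-of", "blastocyst")],
}


@dataclass(frozen=True)
class Annotation:
    obj: str       # annotated object, here a made-up gene symbol
    term: str      # vocabulary term the object is associated with
    evidence: str  # evidence code or assay type (placeholder values)
    citation: str  # supporting reference (placeholder values)


def ancestors(term):
    """Return the term plus every term reachable via is-a/part-of links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parent for _, parent in PARENTS.get(current, []))
    return seen


def objects_annotated_to(term, annotations):
    """Objects annotated to `term` itself or to any of its descendants."""
    return {a.obj for a in annotations if term in ancestors(a.term)}


# Functional and expression annotations kept in one list for brevity.
ANNOTATIONS = [
    Annotation("GeneA", "transcription factor activity", "IDA", "ref-1"),
    Annotation("GeneB", "transcription factor activity", "ISS", "ref-2"),
    Annotation("GeneC", "DNA binding", "IDA", "ref-3"),
    Annotation("GeneA", "2-cell embryo", "in situ", "ref-4"),
    Annotation("GeneB", "2-cell embryo", "in situ", "ref-5"),
    Annotation("GeneB", "inner cell mass", "in situ", "ref-6"),
]

tfs = objects_annotated_to("transcription factor activity", ANNOTATIONS)
in_two_cell = objects_annotated_to("2-cell embryo", ANNOTATIONS)
in_blastocyst = objects_annotated_to("blastocyst", ANNOTATIONS)

# Simplification: "not expressed" is treated as "no annotation found",
# which a real resource would likely record explicitly instead.
result = (tfs & in_two_cell) - in_blastocyst
print(sorted(result))
```

In this toy data, GeneB is excluded because its "inner cell mass" annotation propagates up the part-of link to "blastocyst". That propagation is the reason the hierarchy matters: a query posed at a general term can pick up annotations made at more specific terms.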