Phenotype database interoperability and integration
Damian Smedley, EBI
Mouse models for human disease, The Royal Society, London, May 19-21st, 2010

Why do we need data integration and interoperability?

Centralised vs distributed
Centralised warehouse solution vs distributed solutions
Diagram labels: Genomics portal; IKMC projects; Strains; Phenotype/Expression; Central database; nightly data syncs; web services.
Resources shown: MGI, JaxMice, Ensembl, KOMP, EUCOMM, NorCOMM, IMSR, EMMA, TIGM, Eurexpress/GXD etc, Europhenome.

Centralised solutions
Advantages
– Better query performance for large datasets
– Easier to analyse raw data in one location
Disadvantages
– Regular data deposition is non-trivial
– Designing a single schema to store different types of data is not simple
– Persuading people to “give up” their data/databases/websites
– Will still need to be made interoperable with other data sources

Distributed solutions
Advantages
– Domain expertise at production site exploited
– Different types of data easily integrated as long as they share something in common such as a gene identifier
– No need for nightly data flow to keep data up to date
– No need for redundant data in each database
– Easier to persuade people to collaborate in a distributed scenario
Disadvantages
– Technical knowledge required to deploy the web services
– Potential query performance problems for large datasets (may need to provide summary level data)
– Potential problems performing analysis over all datasets
– Problems with services going down

1000 Genomes - centralisation

International Cancer Genome Consortium
France: Liver (alcohol-related), Breast (HER2+ve)
UK: Breast (several subtypes)
Japan: Liver (virus related)
Canada: Pancreas
China: Stomach
Spain: CLL
India: Oral Cavity
Australia: Pancreas

ICGC - distributed

Joint Ensembl and EurExpress query

IKMC portal: knockoutmouse.org
Resources shown: NorCOMM, Eurexpress, IMSR, CMMR, Europhenome, EUCOMM, GXD, EMMA, KOMP rep, KOMP, TIGM, Ensembl, CREATE.

IKMC interoperability strategy
Resources shown: CREATE; Ensembl; GXD; IKMC; EURExpress; ES cells + lines; EMMA (UK), KOMP (USA), CMMR (Canada); MGI; Phenotype (EuroPhenome etc).
Sites shown: Sanger, UK; JAX, USA; EBI, UK; Edinburgh, UK; Harwell, UK.
Connections between the resources and the BioMart query interface(s) are labelled MGI ID.

www.knockoutmouse.org/martsearch

Europhenome: raw and summary data

Possible strategy for phenotype data
Resources shown: CREATE; Ensembl; GXD; IKMC; EURExpress; ES cells + lines; EMMA (UK), KOMP (USA), CMMR (Canada).
Sites shown: Sanger, UK; JAX, USA; EBI, UK; Edinburgh, UK.
Added components: High throughput phenotyping centres; raw; Central database; Analysis to assign MGI phenotypes to genes; Presentation of results.
Connections between the resources and the BioMart query interface(s) are labelled MGI ID.

Linking from IKMC portal
Phenotype searches; Phenotyping

Linking from IKMC portal

Acknowledgements
The whole CASIMIR consortium and in particular:
• MouseFinder tool: Paul Schofield, Michael Gruenberger, Chao-Kung Chen, George Gkoutos, Ann-Marie Mallon, John Hancock
• MartSearch: Vivek Iyer, Darren Oakley, Bill Skarnes
• BioMart: Arek Kasprzyk, Syed Haider, Edoardo Marcora
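
Illustrative sketch: integrating distributed data by a shared gene identifier
The distributed solutions slide notes that different types of data are easily integrated as long as they share something in common such as a gene identifier, and the IKMC strategy links resources by MGI ID. The following is a minimal sketch of that idea, assuming each resource can return its records keyed by MGI ID. The resource names, the record contents, the identifiers and the function `integrate_by_gene_id` are hypothetical placeholders for illustration; this is not real data and not the BioMart API.

```python
from collections import defaultdict


def integrate_by_gene_id(sources):
    """Merge per-resource records that share a gene identifier.

    sources maps a resource name to a dict of {gene_id: record}.
    Returns {gene_id: {resource_name: record}} covering every gene seen.
    """
    merged = defaultdict(dict)
    for resource, records in sources.items():
        for gene_id, record in records.items():
            merged[gene_id][resource] = record
    return dict(merged)


# Hypothetical placeholder records; the MGI-style identifiers are made up.
sources = {
    "ikmc_products": {
        "MGI:0000001": {"es_cells": True},
        "MGI:0000002": {"es_cells": False},
    },
    "expression": {"MGI:0000001": {"annotated": True}},
    "phenotype": {"MGI:0000002": {"summary": "placeholder"}},
}

integrated = integrate_by_gene_id(sources)
for gene_id in sorted(integrated):
    # List which resources hold data for each gene.
    print(gene_id, sorted(integrated[gene_id]))
```

In this sketch each resource keeps its own records and only the identifier is shared, so a resource that returns nothing for a gene simply contributes no entry for it. A real deployment would fetch the records through each resource's web service rather than from in-memory dictionaries.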