Data Mining in Ensembl with EnsMart

Possible queries…
• All genes from a candidate region
• Genes with a particular protein domain
• Members of a protein family
• Genes associated with SNPs

Specific queries
• Disease related genes between markers D10S255 and D10S259
• Transmembrane proteins with an Ig-MHC domain (IPR003006) on chromosome 2
• Genes with associated coding SNPs on chromosomal band 5q35.3
• Mouse homologues for human disease genes

More specific queries
• Human genes with upstream regions conserved w.r.t. mouse
• Upstream sequence for all Ensembl genes mapped to U95A chip (similarly, complete genomic annotation of MG_U74)
• Genomic location and description of all mouse, rat and fugu homologues of all human genes, with transmembrane domains, expressed in cardiovascular system and have non-synonymous SNPs

EnsMart – vertical and horizontal data integration
• Human, Rat, Mouse, Anopheles, Zebrafish, Fugu
• Ensembl Genes, SNPs, EST Genes, Vega Genes

Ensembl data sets
• Genes, EST, Markers, Diseases, Protein Annotation, SNPs, Homology, Expression

EnsMart
• Data retrieval tool
• Query builder interface
• Gene or SNP lists
• Associated features or sequences
• Various output formats

Information flow
[Diagram: start → filter → output]
• start: SPECIES, FOCUS
• filter: REGION, GENE, EXPRESSION, HOMOLOGY, PROTEIN, SNP
• output: REGION, GENE, EXPRESSION, HOMOLOGY, PROTEIN, SNP
• other labels in the diagram: REFSEQ, EMBL, AFFY, SWISSPROT, GO, INTERPRO, FASTA, GTF, HTML, TEXT, EXCEL, FILE

Species and focus

Restrict your query

Select output options

Output formats
• HTML

Obtaining sequences

Ensembl core database
• Normalised
• Each data point stored only once
• Quick updates
• Minimal storage requirements
But:
• Many tables
• Many joins for complicated queries
• Slow for data mining questions

Mart database
• De-normalised
• Tables with ‘redundant’ information
• Query-optimised
• Fast and flexible
• Ideal for data mining
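
The contrast between the normalised core database and the de-normalised mart can be illustrated with a small sketch. This is not the real Ensembl or EnsMart schema: the table names (gene, domain, gene_domain, gene_mart), the gene names and the second domain accession are hypothetical placeholders, and an in-memory SQLite database stands in for the actual server. It asks one of the example questions above (genes on chromosome 2 with domain IPR003006) in both layouts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised layout: each fact is stored once, in its own table.
# Hypothetical tables and rows, NOT the real Ensembl core schema.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
CREATE TABLE domain (domain_id INTEGER PRIMARY KEY, accession TEXT);
CREATE TABLE gene_domain (gene_id INTEGER, domain_id INTEGER);
INSERT INTO gene VALUES (1, 'geneA', '2'), (2, 'geneB', '2'), (3, 'geneC', '5');
INSERT INTO domain VALUES (10, 'IPR003006'), (11, 'IPR000001');
INSERT INTO gene_domain VALUES (1, 10), (2, 11), (3, 10);
""")

# Against the normalised tables the question needs two joins.
normalised_sql = """
SELECT g.name FROM gene g
JOIN gene_domain gd ON gd.gene_id = g.gene_id
JOIN domain d ON d.domain_id = gd.domain_id
WHERE g.chromosome = ? AND d.accession = ?
ORDER BY g.name
"""

# De-normalised, mart-style layout: the joins are done once, up front,
# and the 'redundant' result is stored as one wide table.
cur.execute("""
CREATE TABLE gene_mart AS
SELECT g.gene_id, g.name, g.chromosome, d.accession AS domain_accession
FROM gene g
JOIN gene_domain gd ON gd.gene_id = g.gene_id
JOIN domain d ON d.domain_id = gd.domain_id
""")

# Against the mart table the same question is a single-table filter.
mart_sql = """
SELECT name FROM gene_mart
WHERE chromosome = ? AND domain_accession = ?
ORDER BY name
"""

params = ("2", "IPR003006")
normalised_result = [row[0] for row in cur.execute(normalised_sql, params)]
mart_result = [row[0] for row in cur.execute(mart_sql, params)]
print(normalised_result, mart_result)
```

Both queries return the same genes. The mart version trades extra storage and a rebuild step after each update for simpler, join-free queries, which is the trade-off the two slides describe.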