Download Data mining - Bilkent University

Data Mining in Ensembl with EnsMart
Possible queries…
• All genes from a candidate region
• Genes with a particular protein domain
• Members of a protein family
• Genes associated with SNPs
2 of 24
Specific queries
• Disease-related genes between markers D10S255 and D10S259
• Transmembrane proteins with an Ig-MHC domain (IPR003006) on chromosome 2
• Genes with associated coding SNPs on chromosomal band 5q35.3
• Mouse homologues for human disease genes
3 of 24
More specific queries
• Human genes with upstream regions conserved with respect to mouse
• Upstream sequence for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74)
• Genomic location and description of all mouse, rat and fugu homologues of human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs
4 of 24
EnsMart – vertical and horizontal data integration
Species: Human, Rat, Mouse, Anopheles, Zebrafish, Fugu
Data sets: Ensembl Genes, SNPs, EST Genes, Vega Genes
5 of 24
Ensembl data sets
Genes
EST
Markers
Diseases
Protein Annotation
SNPs
Homology
Expression
6 of 24
EnsMart
• Data retrieval tool
• Query builder interface
• Gene or SNP lists
• Associated features or sequences
• Various output formats
7 of 24
Information flow
[Diagram] start (SPECIES, FOCUS) → filter → output
Filter: REGION, GENE, EXPRESSION, HOMOLOGY, PROTEIN, SNP
Output: REGION, GENE, EXPRESSION, HOMOLOGY, PROTEIN, SNP
External references: REFSEQ, EMBL, AFFY, SWISSPROT, GO, INTERPRO
Formats: FASTA, GTF, HTML, TEXT, EXCEL, FILE
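The flow on this slide (choose a species and focus, apply filters, then select output) can be illustrated with a minimal Python sketch. This is not the EnsMart interface or API: the `MartQuery` class, the `run_query` and `render_text` helpers, the field names and the toy gene records are all hypothetical and exist only to show the three stages in order.

```python
from dataclasses import dataclass, field


@dataclass
class MartQuery:
    """Hypothetical container for the start -> filter -> output stages."""
    species: str                                      # start: SPECIES
    focus: str                                        # start: FOCUS
    filters: dict = field(default_factory=dict)       # filter stage
    attributes: list = field(default_factory=list)    # output stage


def run_query(query, records):
    """Keep records that match every filter, then project the chosen attributes."""
    rows = []
    for rec in records:
        if all(rec.get(key) == value for key, value in query.filters.items()):
            rows.append([str(rec.get(attr, "")) for attr in query.attributes])
    return rows


def render_text(query, rows):
    """Render rows as tab-separated text with a header line."""
    lines = ["\t".join(query.attributes)]
    lines += ["\t".join(row) for row in rows]
    return "\n".join(lines)


# Toy records, invented for illustration only (not real Ensembl data).
TOY_GENES = [
    {"gene_id": "GENE_A", "chromosome": "2", "domain": "IPR003006", "transmembrane": True},
    {"gene_id": "GENE_B", "chromosome": "2", "domain": "OTHER_DOMAIN", "transmembrane": True},
    {"gene_id": "GENE_C", "chromosome": "5", "domain": "IPR003006", "transmembrane": False},
]

# "Transmembrane proteins with an Ig-MHC domain (IPR003006) on chromosome 2"
query = MartQuery(
    species="human",
    focus="genes",
    filters={"chromosome": "2", "domain": "IPR003006", "transmembrane": True},
    attributes=["gene_id", "chromosome"],
)
result = run_query(query, TOY_GENES)
print(render_text(query, result))
```

Keeping filters and output attributes as separate fields mirrors the separation between the filter and output stages in the diagram; a real tool would resolve them against a database rather than an in-memory list.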
8 of 24
Species and focus
9 of 24
Restrict your query
10 of 24
Restrict your query
11 of 24
Select output options
12 of 24
Select output options
13 of 24
Output formats
HTML
14 of 24
Obtaining sequences
15 of 24
Ensembl core database
• Normalised
• Each data point stored only once
• Quick updates
• Minimal storage requirements
But:
• Many tables
• Many joins for complicated queries
• Slow for data mining questions
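To make the "many joins" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The three-table schema (`gene`, `transcript`, `protein_feature`) is a hypothetical, heavily simplified stand-in and not the actual Ensembl core schema; the rows are invented. Even this small question (genes on chromosome 2 carrying domain IPR003006) needs two joins.

```python
import sqlite3

# Hypothetical, simplified normalised schema: each fact is stored exactly once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
    CREATE TABLE transcript (
        transcript_id INTEGER PRIMARY KEY,
        gene_id INTEGER REFERENCES gene(gene_id));
    CREATE TABLE protein_feature (
        feature_id INTEGER PRIMARY KEY,
        transcript_id INTEGER REFERENCES transcript(transcript_id),
        domain TEXT);

    -- Toy rows, invented for illustration only.
    INSERT INTO gene VALUES (1, 'GENE_A', '2'), (2, 'GENE_B', '2'), (3, 'GENE_C', '5');
    INSERT INTO transcript VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO protein_feature VALUES
        (100, 10, 'IPR003006'), (200, 20, 'OTHER_DOMAIN'), (300, 30, 'IPR003006');
""")

# The chromosome lives in one table and the domain in another,
# so the query has to walk gene -> transcript -> protein_feature.
core_rows = conn.execute("""
    SELECT DISTINCT g.name
    FROM gene AS g
    JOIN transcript AS t ON t.gene_id = g.gene_id
    JOIN protein_feature AS pf ON pf.transcript_id = t.transcript_id
    WHERE g.chromosome = '2' AND pf.domain = 'IPR003006'
    ORDER BY g.name
""").fetchall()
print(core_rows)
```

Each additional criterion (expression, homology, SNPs) would add further tables and joins to a query of this shape.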
16 of 24
Mart database
• De-normalised
• Tables with ‘redundant’ information
• Query-optimised
• Fast and flexible
• Ideal for data mining
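As a rough illustration of a de-normalised, query-optimised table, the sketch below uses Python's built-in `sqlite3` module. The `gene_mart` table, its columns and its rows are hypothetical and do not reproduce the real mart schema. Gene name and chromosome are deliberately repeated on every row (the "redundant" information), so a question combining location and protein domain is answered from a single table with no joins.

```python
import sqlite3

# Hypothetical de-normalised table: one wide row per transcript,
# with gene-level columns repeated on each row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene_mart (
        gene_name TEXT, chromosome TEXT, transcript_id INTEGER, domain TEXT);

    -- Toy rows, invented for illustration only.
    INSERT INTO gene_mart VALUES
        ('GENE_A', '2', 10, 'IPR003006'),
        ('GENE_A', '2', 11, 'IPR003006'),
        ('GENE_B', '2', 20, 'OTHER_DOMAIN'),
        ('GENE_C', '5', 30, 'IPR003006');
""")

# Genes on chromosome 2 with domain IPR003006: a single-table filter.
mart_rows = conn.execute("""
    SELECT DISTINCT gene_name
    FROM gene_mart
    WHERE chromosome = '2' AND domain = 'IPR003006'
    ORDER BY gene_name
""").fetchall()
print(mart_rows)
```

The trade-off is the usual one for de-normalisation: the repeated columns cost storage and make updates more expensive, in exchange for simpler and faster read queries.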
17 of 24