http://creativecommons.org/licenses/by-sa/2.0/

Lecture/Lab 7.3

Ensembl Database and Web Browser
Erin Pleasance
Canada’s Michael Smith Genome Sciences Centre, Vancouver
www.ensembl.org

What is Ensembl?
• Joint project of EBI and Sanger
• Automated annotation of eukaryotic genomes
• Open source software
• Relational database system
• Web interface
“The main aim of this campaign is to encourage scientists across the world - in academia, pharmaceutical companies, and the biotechnology and computer industries - to use this free information.” - Dr. Mike Dexter, Director of the Wellcome Trust

Ensembl components
Data:
• Chromosomes (ChromoView, KaryoView, CytoView, MapView)
• Diseases (DiseaseView)
• SNPs and Haplotypes (SNPView, GeneSNPView, HaploView, LDView)
• Functions (GOView)
• Genes (GeneView, TransView, ExonView, ProtView)
• Families (DomainView, FamilyView)
• Genome Sequence (ContigView)
• Markers (MarkerView)
• Comparative Genomics (ContigView, MultiContigView, SyntenyView, GeneView)
• Other Annotations
Search tools:
• Sequence Similarity (BLAST, SSAHA)
• Text (TextView)
• Anything (EnsMart)

Species in Ensembl
• Focus on vertebrates
• No fungi/plants
• Arabidopsis genome browser based on Ensembl at http://atensembl.arabidopsis.info/
Vertebrates
– Mammals: Human, Chimp, Mouse, Rat, Dog, Cow, Opossum
– Fish: Zebrafish, Fugu Pufferfish, Tetraodon Pufferfish
– Other: Chicken, Frog
Invertebrates
– Insects: Fruitfly, Mosquito, Honeybee
– Other: Nematode

Ensembl Gene Annotation
• “Basis for initial analysis and publication of most vertebrate genomes”
• Genome assembly from NCBI
• Gene build system
– Targetted gene builds predict known genes
– Similarity gene builds predict novel genes

Curwen et al, Genome Res 14: 942-950, 2004

Targetted gene build
• Align known proteins with pmatch and BLAST
• Incorporate aligned cDNA sequences to find splice sites, UTRs with genewise
[Figure: ContigView of best in genome gene (known gene, p53) with associated evidence. Labels: UTRs predicted; proteins aligned; Unigene clusters aligned; cDNAs aligned]

Similarity gene build
• Identify novel exons ab initio using Genscan
• Confirm exons by BLAST to known proteins, mRNAs, UniGene clusters
[Figure: ContigView of homology gene (novel gene) with associated evidence. Labels: Unigene clusters aligned; proteins aligned; GenScan predictions]

Ensembl Gene Annotation
• Resulting “Ensembl genes” are highly accurate with low false positive rates
• Ensembl human gene identifiers are 95% stable between builds

Manually curated genes: VEGA
• Some chromosomes contain manually curated genes from VEGA database
• Otter database/server allows integration of automatic and manual annotations (e.g. from Apollo)
[Figure label: VEGA gene]

Ensembl EST genes
• ESTs not accurate enough to produce Ensembl genes, but important especially for identifying alternative transcripts
• ESTs aligned to genome and merged to create an independent set of “EST genes”
[Figure labels: known gene; EST genes; Unigene clusters aligned]

Pseudogenes
• Processed pseudogenes in annotation identified (lack of introns, frameshifts, presence of multi-exon version elsewhere in genome, etc.)
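The pseudogene criteria listed above can be sketched as a simple filter. This is only an illustration: the `GenePrediction` record, its fields, the example data, and the way the criteria are combined (no introns, plus either a frameshift or a multi-exon copy supported by the same protein) are all assumptions made for the example, not Ensembl's actual rules or data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenePrediction:
    """Hypothetical record for one predicted gene (not an Ensembl type)."""
    name: str
    exon_count: int        # number of exons in the prediction
    has_frameshift: bool   # supporting alignment required a frameshift
    protein_id: str        # protein used as supporting evidence


def is_candidate_processed_pseudogene(pred: GenePrediction,
                                      all_preds: List[GenePrediction]) -> bool:
    """Flag intronless predictions that also show a second warning sign."""
    # Criterion 1: lack of introns (a single exon).
    if pred.exon_count != 1:
        return False
    # Criterion 2: a multi-exon version of the same gene exists elsewhere.
    multi_exon_elsewhere = any(
        other is not pred
        and other.protein_id == pred.protein_id
        and other.exon_count > 1
        for other in all_preds
    )
    # Criterion 3: frameshifts. Either of criteria 2 or 3 is enough here;
    # that combination rule is an assumption of this sketch.
    return multi_exon_elsewhere or pred.has_frameshift


# Made-up example data.
predictions = [
    GenePrediction("geneA", 5, False, "P1"),
    GenePrediction("geneA_copy", 1, True, "P1"),
    GenePrediction("geneB", 1, False, "P2"),
]
candidates = [p.name for p in predictions
              if is_candidate_processed_pseudogene(p, predictions)]
```

A genuine single-exon gene such as the hypothetical `geneB` is not flagged, because lack of introns alone is not treated as sufficient evidence.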
[Figure label: Pseudogene]

Noncoding RNA Genes
• Genes with no ORFs that are functional (tRNAs, rRNAs, miRNAs …)
• 7220 annotations from Sean Eddy and Tom Jones
[Figure labels: miRNAs; coding gene]

Example 1: Exploring Caspase-3
• Aim to demonstrate basic browsing and views
• Caspase-3 is a gene involved in apoptosis (cell suicide)
• We will look at:
– Gene annotation
– SNPs
– Orthologs and genome alignments
– Alternative transcripts and EST genes

Example 1: Exploring Caspase-3
http://www.ensembl.org
Go to human homepage

Species-specific homepage
[Screenshot labels: site map; statistics of current release]

Finding the tool/view: Site Map

Text Search
[Screenshot labels: Click; Back to; Gene; caspase-3; Species-specific homepage]

GeneView
[Screenshot labels: ContigView; ExportView; SNPView; TransView, ProteinView, ExonView of transcript]

GeneView
• Orthologs predicted by sequence similarity and synteny
• GeneDAS: Get data from external sources

GeneView
• On the same page, information provided for each transcript individually
• Links to external databases

GeneView

GeneSNPView

Other SNP/Haplotype tools
• SNPView
• ProteinView (protein sequence with SNP markup)
• LDView: View linkage disequilibrium (only limited regions)
• HaploView: View haplotypes (only limited regions)

Click Back to GeneView

ContigView
[Screenshot labels: chromosome and bands; sequence contigs]

ContigView: Detailed View
See other tracks, options in menus
[Screenshot labels: gene annotations; Genscan predictions; Targetted gene predictions (2 alternative transcripts); EST genes; other tracks: aligned sequences etc.]
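A detailed view like the one above amounts to showing, for each track, the features that overlap the displayed window. The sketch below illustrates that idea only; the `Feature` record, the track names, and all coordinates are invented for the example and are not taken from Ensembl.

```python
from typing import Dict, List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical annotation feature on one chromosome (not an Ensembl type)."""
    track: str   # e.g. "Ensembl genes", "Genscan", "EST genes"
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def features_in_window(features: List[Feature],
                       win_start: int,
                       win_end: int) -> Dict[str, List[Feature]]:
    """Group the features overlapping [win_start, win_end] by track."""
    by_track: Dict[str, List[Feature]] = {}
    for feat in features:
        # Two closed intervals overlap when each starts before the other ends.
        if feat.start <= win_end and feat.end >= win_start:
            by_track.setdefault(feat.track, []).append(feat)
    return by_track


# Made-up coordinates, for illustration only.
example_features = [
    Feature("Ensembl genes", "gene1", 1_000, 5_000),
    Feature("Genscan", "genscan1", 4_500, 6_000),
    Feature("EST genes", "est1", 9_000, 9_500),
]
visible = features_in_window(example_features, 4_000, 7_000)
```

With this window, the hypothetical `est1` feature falls outside the view, so the "EST genes" track would be empty.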
ContigView

MultiContigView
[Screenshot labels: DNA sequence homology; rat ortholog]

Other Comparative Genomics Tools
• Saw gene orthology, DNA homology
• Other view is SyntenyView
• Also access comparative genomics through EnsMart

Data Mining with EnsMart
• Allows very fast, cross-data source querying
• Search for genes (features, sequences, etc.) or SNPs based on
– Position; function; domains; similarity; expression; etc.
• Accessible from Ensembl website (MartView) as well as stand-alone
• Extremely powerful for data mining

Example 2: EnsMart
• A new disease locus has been mapped between markers D21S1991 and D21S171. It may be that the gene involved has already been identified as having a role in another disease. What candidates are in this region?

Example 2: EnsMart
• EnsMart is based on BioMart
• http://www.ensembl.org/Multi/martview OR
• http://www.ebi.ac.uk/BioMart/martview

EnsMart: Choosing your dataset

EnsMart: Filtering
[Screenshot labels: 21; D21S1991; D21S171]

EnsMart: Output
Note you can output different types of information, e.g. sequences

EnsMart: Output

Sequence Similarity Searching
• Use SSAHA for exact matches (fast)
• Use BLAST for more distant similarity (slow)

Finding anything else: Help

DAS: Getting your Own Data in Ensembl
• DAS (Distributed Annotation System)
– Anyone can load data into Ensembl and allow others to view it in the same view (e.g. ContigView) as other Ensembl annotations
– Some built-in DAS sources
• http://www.ensembl.org/Docs/ldas.html

Other Ways to Access Ensembl
• MySQL database directly accessible
• APIs for Perl and Java
• Other software
– Apollo Java genome annotation viewer/editor
– Sockeye Java viewer
• You can get your own local version of Ensembl: software and data freely available
[Figure label: Sockeye]

For more information
• Publications (listed at http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/EnsemblPublications.html)
– Ensembl Special: Genome Research May 2004
– Ensembl updates: NAR Jan. 2002-2005
– EnsMart: Kasprzyk et al, Genome Res Jan. 2004
• Documentation on how to download software and database:
– http://www.ensembl.org/Docs/

Exercises
• Homologues of human genes are often present in Fugu rubripes in more condensed form (with shorter introns). Is this true for the gene PTEN, a tumor suppressor often mutated in advanced cancers?
– Try MultiContigView; can you think of another way to get this information as well?
• The microRNA bantam regulates the Drosophila (fruitfly) gene hid by binding the 3’ UTR. Hid is involved in apoptosis, and it is possible that binding sites for bantam could be found in the 3’ UTR of other apoptosis genes as well. Obtain the 3’ UTR sequence of all Drosophila genes known to be involved in apoptosis.
– Using EnsMart, the GO term for apoptosis is GO:0006915, evidence code TAS
• The file “PCR_product.txt” contains the sequence of a PCR product amplified from a mouse cDNA library. What gene does the product correspond to? Does it contain the complete coding sequence of that gene?
– Would it be better to use BLAST or SSAHA?
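When weighing BLAST against SSAHA in the last exercise, it may help to see why hash-based lookup is fast for exact matches. The sketch below is a simplified k-mer index written for illustration; it is not the SSAHA implementation, and the function names, the choice of k, and the toy sequences are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, List


def build_kmer_index(reference: str, k: int) -> DefaultDict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: DefaultDict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def find_exact_matches(index: DefaultDict[str, List[int]],
                       reference: str,
                       query: str,
                       k: int) -> List[int]:
    """Return 0-based positions where the whole query occurs exactly."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    hits = []
    # One dictionary lookup on the first k-mer gives the candidate positions;
    # only those candidates are then checked against the full query.
    for pos in index.get(query[:k], []):
        if reference.startswith(query, pos):
            hits.append(pos)
    return hits


# Toy sequences, for illustration only.
K = 4
toy_reference = "ACGTACGTTAGC"
toy_index = build_kmer_index(toy_reference, K)
exact_hits = find_exact_matches(toy_index, toy_reference, "ACGTT", K)
no_hits = find_exact_matches(toy_index, toy_reference, "GGGGA", K)
```

The lookup is fast because only positions sharing the first k-mer are examined, but a query that differs from the reference within that k-mer finds nothing, which is why more distant similarity calls for a slower, alignment-based search.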