How to access genomic information using Ensembl
Damian Smedley and Xosé Fernández
Ensembl Project, European Bioinformatics Institute, Cambridge, UK
November 2004

Schedule
Today
• Introduction to the Ensembl system
• Hands-on examples to introduce the system
• Evaluating genes and transcripts
• Variation in Ensembl (SNPs, haplotypes)
Tomorrow
• Data mining with EnsMart
• Comparative genomics and proteomics in Ensembl
• BioMart
• Advanced topics (upload your own data, DAS)

Our goal

Assembly
From 325,109 initial contigs, plus other ordering data, to 26,720 overlapping clones, presented as a non-redundant "virtual contig" view.

Mapping and sequencing the human genome
• Map: BACs (bacterial artificial chromosomes), avg size 150 kb
• BAC fragments subcloned into pUCs, avg size 2-4 kb
• WGS sequence; sequence assembly from draft to finished
References: Shizuya et al 1992; Dib et al 1996; Deloukas et al 1998; Osoegawa et al 2001; Bentley et al 2001; Bruls et al 2001; McPherson et al 2001; Montgomery et al 2001; Tilford et al 2001

Status of the human sequence
• Finished (red/orange): ~96% (99.999% accurate)
• 30-40% repetitive elements (e.g. alpha satellite, Alu repeats)
• All known genes correctly identified (99.74%)
• Heterochromatin (grey): ~4%
• Assembled draft sequence totals 2.85 Gb

Human genome: current status
• 22,287 "gene loci" defined, consisting of 19,599 protein-coding genes and 2,188 additional DNA segments "predicted" to be protein-coding genes
  – 1,183 genes were "born" in the last 60-100 My (million years)
  – ~30 genes "died" in a similar time period
Source: Finishing the euchromatic sequence of the human genome, Nature 431:931-45 (2004)

Ensembl - project aims
• Funded to provide metazoan genomes to the world
• Aims to provide the world's best automated genome annotation
• A leading group for human and mouse analysis
• All software, data and results freely available

Ensembl - project background
• Group split between EBI and Sanger
• Mainly Wellcome Trust funded
• Largest dedicated compute in biology in Europe
• Developer community of more than 100 people, including companies

Ensembl - open source
• Freely available
• Community development
  – More than 51 Ensembl installs worldwide
  – Both public and commercial, e.g. Gramene (CSHL), Fugu-sg (ICMB), Ciona-sg (Temasek)

Ensembl system overview (diagram: analysis DB, final DB, supporting databases, SNP, manual annotation, CPU)

Genome browsing: why present the whole genome?
• Explore what is in a chromosome region
• See features in and around a specific gene
• Search and retrieve across the whole genome
• Investigate genome organization
• Compare to other genomes

Genome browsers
• Ensembl: public site + installable system, http://www.ensembl.org
• UCSC Human Genome Browser, http://genome.ucsc.edu
• NCBI Map Viewer, http://www.ncbi.nlm.nih.gov/mapview

Introduction to the Ensembl web site
Ensembl:
• takes genomic sequence assemblies (human build 34, mouse, rat, Fugu, mosquito)
• adds annotation and links (an automated process)
• presents all the data on a web site

Annotation: genes
Known genes
• Where?
• Genomic structure?
• Transcript(s)?
• Protein(s)?
• Orthologues?
• Attach useful links
Novel genes
• How to predict? (requires evidence)
• Transcript(s)?
• Protein(s)?
• Orthologues?
• Attach useful links

Annotation: other features
• Markers and SNPs
• Cytogenetic bands
• Repeated sequences
• ESTs and other sequence records: where do they show sequence similarity?
• Regions homologous to other species

How to get started
• Species homepage
• Site map
• MapView
• Text search
• BLAST
• SSAHA
• DiseaseView

Homepage
Site map
MapView
AnchorView
BLAST and SSAHA

Regions, maps and markers
• ContigView
• CytoView
• SyntenyView
• MultiContigView
• MarkerView
• SNPView

Ensembl ContigView
ContigView close-up
Customising and short cuts
Evidence
Transcripts: red and black (Ensembl predictions), blue (Vega)
Pop-up menu
ContigView - chromosome 20 close-up (figure labels: forward strand; manual annotation via Vega; reverse strand; Ensembl predictions; Ensembl EST-based predictions)
Other chromosomes with manual annotation from http://vega.sanger.ac.uk: 6, 7, 9, 10, 13, 14, 20, 22, X
CytoView
GeneSNPView
MarkerView
SNPView
SyntenyView
MultiContigView

Genes & gene products
• GeneView
• TransView
• ExonView
• ProteinView
• FamilyView
• DomainView
• GOView
• DiseaseView

Ensembl GeneView
TransView
ExonView
ProteinView
FamilyView
GOView
DiseaseView

Data retrieval
• EnsMart
• ExportView
• Data sets on the FTP site
• MySQL queries of databases
• Perl API access to databases

ExportView
EnsMart

Mouse differences
• Genomic sequence assembly based on whole-genome shotgun, with finished "stitched" BACs
• BACs are shown in CytoView (FPC map), but for most no sequence is available

Mouse CytoView

Help!
• Context-sensitive help pages (click)
• Access other documentation via the generic home page
• Email the helpdesk
HelpDesk / Suggestions

Thanks

Ensembl Team, November 2004
Project Leader: Ewan Birney (EBI), Tim Hubbard (Sanger)
Database Schema and Core API: Arne Stabenau, Yuan Chen, Ian Longden, Craig Melsopp, Glenn Proctor, Daniel Ríos, Guy Slater
Distributed Annotation System: Andreas Kähäri
Vega Web Team: Patrick Meidl, Steve Trevianon
User Support: Xosé Mª Fernández, Michael Schuster
Comparative Genomics: Abel Ureta-Vidal, Javier Herrero Sánchez, Jessica Severin, Cara Woodwark
Ensembl Web Team: James Stalker, Fiona Cunningham, James Smith
Analysis and Annotation Pipeline: Val Curwen, Steve Searle, Dan Andrews, Mario Caccamo, Laura Clarke, Martin Hammond, Jan Hinnerck-Vogel, Kevin Howe, Vivek Iyer, Kerstin Jekosch, Felix Kokocinski, Simon White
EnsMart & BioMart: Arek Kasprzyk, Damian Keefe, Darin London, Damian Smedley
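The Assembly slide describes going from 325,109 initial contigs to a non-redundant "virtual contig" view over 26,720 overlapping clones. The core idea of collapsing overlapping placed clones into non-redundant segments can be sketched as interval merging. This is a minimal illustration only, not the Ensembl or genome-assembly pipeline. It assumes every clone has already been placed on a chromosome with 1-based inclusive coordinates, and the clone names and coordinates are hypothetical.

```python
from typing import List, Tuple

# A clone already placed on a chromosome: (name, start, end), 1-based inclusive.
# Names and coordinates used with this sketch are hypothetical.
Clone = Tuple[str, int, int]
Segment = Tuple[int, int, List[str]]


def merge_clones(clones: List[Clone]) -> List[Segment]:
    """Collapse overlapping (or end-to-end abutting) clones into non-redundant segments.

    Returns (start, end, member_clone_names) for each segment, ordered by start.
    """
    segments: List[Segment] = []
    for name, start, end in sorted(clones, key=lambda c: (c[1], c[2])):
        if segments and start <= segments[-1][1] + 1:
            # Overlaps or directly abuts the current segment: extend it.
            seg_start, seg_end, members = segments[-1]
            segments[-1] = (seg_start, max(seg_end, end), members + [name])
        else:
            # A gap precedes this clone: open a new segment.
            segments.append((start, end, [name]))
    return segments


if __name__ == "__main__":
    # Hypothetical ~150 kb clones (the BAC average size quoted on the mapping slide).
    demo = [("cloneA", 1, 150_000), ("cloneB", 120_001, 270_000), ("cloneC", 400_001, 550_000)]
    for seg in merge_clones(demo):
        print(seg)
```

Sorting first makes a single left-to-right pass sufficient. In this sketch, a gap between segments corresponds to a region that no placed clone covers.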
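The status slide quotes finished sequence as 99.999% accurate and the assembled sequence as 2.85 Gb. As a back-of-envelope reading of that accuracy figure, assume (for illustration only) that the finished-sequence error rate held uniformly across all 2.85 Gb:

```latex
\begin{aligned}
\text{error rate} &= 1 - 0.99999 = 10^{-5}
  \;\Rightarrow\; \approx 1 \text{ error per } 10^{5}\ \text{bp} = 100\ \text{kb},\\
\text{errors in } 2.85\ \text{Gb} &\approx 2.85\times10^{9} \times 10^{-5} = 2.85\times10^{4} \approx 28{,}500.
\end{aligned}
```

So the accuracy figure corresponds to roughly one base error per 100 kb. The genome-wide count is only an order-of-magnitude estimate under the uniformity assumption.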
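The Data retrieval slide lists MySQL queries of the Ensembl databases, and the genome browsing slide mentions seeing features "in and around" a gene. The basic region query behind that is an interval-overlap test. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL. Its `gene` table and the columns `chromosome`, `chr_start`, `chr_end` and `strand` are hypothetical and heavily simplified; this is not the real Ensembl core schema, and the rows are made up.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, heavily simplified table; the real Ensembl core schema is different.
SCHEMA = """
CREATE TABLE gene (
    name       TEXT    NOT NULL,
    chromosome TEXT    NOT NULL,
    chr_start  INTEGER NOT NULL,  -- 1-based, inclusive
    chr_end    INTEGER NOT NULL,  -- 1-based, inclusive
    strand     INTEGER NOT NULL   -- 1 = forward, -1 = reverse
)
"""

# Hypothetical example rows, not real annotation.
DEMO_GENES = [
    ("geneA", "20", 1000, 5000, 1),
    ("geneB", "20", 4500, 9000, -1),
    ("geneC", "20", 20000, 25000, 1),
    ("geneD", "X", 1000, 5000, 1),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Index so lookups restricted to one chromosome do not scan the whole table.
    conn.execute("CREATE INDEX gene_region ON gene (chromosome, chr_start)")
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?, ?)", DEMO_GENES)
    return conn


def genes_in_region(
    conn: sqlite3.Connection, chromosome: str, region_start: int, region_end: int
) -> List[Tuple[str, int, int, int]]:
    """Return genes overlapping [region_start, region_end] on one chromosome.

    Two closed intervals overlap when each starts no later than the other ends.
    """
    sql = (
        "SELECT name, chr_start, chr_end, strand FROM gene "
        "WHERE chromosome = ? AND chr_start <= ? AND chr_end >= ? "
        "ORDER BY chr_start"
    )
    return conn.execute(sql, (chromosome, region_end, region_start)).fetchall()


if __name__ == "__main__":
    print(genes_in_region(build_demo_db(), "20", 4000, 10000))
```

With the demo rows, `genes_in_region(build_demo_db(), "20", 4000, 10000)` returns geneA and geneB. Both overlap the window even though neither lies entirely inside it. Against the real databases, table and column names should come from the Ensembl schema documentation rather than from this sketch, or the query can go through the Perl API that the slide also lists.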
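The ContigView slides draw features on the forward and reverse strands, and the Data retrieval slide lists ExportView for getting sequence out. The sketch below exports a region as FASTA, including the reverse-strand case. It assumes an in-memory sequence string, 1-based inclusive coordinates and a 60-column line width. The function names and header text are hypothetical, and this is not how ExportView is implemented.

```python
from typing import List

# Complement table for A/C/G/T/N in both cases; other IUPAC codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence: the same region read off the other strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record, wrapping the sequence at `width` columns."""
    lines: List[str] = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


def export_region(genome: str, start: int, end: int, strand: int,
                  header: str, width: int = 60) -> str:
    """Export genome[start..end] (1-based, inclusive) as FASTA.

    strand=1 gives the forward strand; strand=-1 gives the reverse complement.
    """
    if not 1 <= start <= end <= len(genome):
        raise ValueError("region out of bounds")
    sub = genome[start - 1:end]  # convert 1-based inclusive to a Python slice
    if strand == -1:
        sub = reverse_complement(sub)
    elif strand != 1:
        raise ValueError("strand must be 1 or -1")
    return to_fasta(header, sub, width)


if __name__ == "__main__":
    print(export_region("AACCGGTTAC", 3, 8, -1, "demo reverse strand", width=4), end="")
```

For example, `export_region("AACCGGTTAC", 3, 8, -1, "demo", width=4)` takes `CCGGTT` and reverse-complements it to `AACCGG`, which is printed as two wrapped lines. Keeping the strand flip separate from FASTA formatting lets the same formatter serve both strands.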