Investigating Genomes with Ensembl
Drs. Bert Overduin and Giulietta Spudich

Overview of the day
• Introduction and website walk-through
• Hands-on exercises (the browser)
Tea/Coffee
• Introduction to BioMart
• Hands-on exercises (BioMart)
Lunch
• Determining the gene set
• Hands-on exercises (gene set)
Tea/Coffee
• Variations presentation and hands-on

Introducing…
• Genome browsing: a comparison
• Consensus genes
• Ensembl annotation and software
• How to find help

Sequencing the genome
[Figure labels: DNase I sensitive site, histone modification, gene, conserved sequence, SNP]

What can we learn about genomes?
• Within one genome: regulatory elements, gene order, chromatin structure…
• Through comparative studies: evolution, conserved regions, rearrangements…
• Gene quality and prediction

Genome Browsers Today
• Ensembl Genome Browser http://www.ensembl.org
• NCBI Map Viewer http://www.ncbi.nlm.nih.gov/mapview/
• UCSC Genome Browser http://genome.ucsc.edu
[Screenshots: Ensembl Genome Browser, NCBI Map Viewer, UCSC Genome Browser]

What Distinguishes Ensembl from the UCSC and NCBI Browsers?
• The gene set: automatic annotation based on mRNA and protein information
• Programmatic access via the Perl API (open source)
• BioMart
• Integration with other databases (DAS)
• Comparative analysis (gene trees)

Challenges of genome browsers
• Increasing sequence information: 198,879,188,987 nt (Aug 2007)

Challenges of genome browsers
• Increasing annotation: ENCODE
• Pilot project completed in 2007: 1% of the human genome
• Discovered that promoter elements lie on either side of the transcription start site

To meet a challenge…
Ensembl's aim: to provide annotation for the biological community that is freely available and of high quality
• Started in 1999
• Joint project between EBI and Sanger
• Funded primarily by the Wellcome Trust, with additional funding from EMBL, NIH-NIAID, EU, BBSRC and MRC
• Team of ca. 40 people, led by Ewan Birney (EBI) and Tim Hubbard (Sanger)

The Ensembl gene set
• All Ensembl genes start from a known protein or mRNA
[Figure labels: sequence assembly, mRNAs, protein, Ensembl gene set]
• An initial alignment of protein and mRNA to the genome begins the 'Genebuild'

Have you heard of…
• Ensembl – strives for the best possible gene set www.ensembl.org
• Havana (VEGA) – same goal http://vega.sanger.ac.uk
• HGNC – a unique name and symbol for every gene in human http://www.genenames.org/
• UniProt – focus on proteins and functional information www.uniprot.org

Ensembl vs Havana annotation
All genes at once (Ensembl Genebuild)
• Quick, keeps current
• Consistent annotation
• Can apply rules to more species
Gene by gene (Havana/VEGA)
• Flexible, can deal with inconsistencies
• Consult publications as well as databases
• 'Out of the ordinary' biology
• However… slow, expensive

Merging sets
• Havana transcripts are incorporated into Ensembl
• UniProt proteins are aligned to the genome in the Ensembl genebuild
• UniProt imports Ensembl peptides for human
• HGNC moved to Hinxton… coordination

Consensus across genome browsers: the CCDS set
http://www.ensembl.org/info/about/docs/ccds.html
• A protein is deposited into the 'Consensus CDS protein set' (CCDS set) if NCBI, UCSC, Havana and Ensembl have all determined the same sequence

More about Ensembl…
• Genome browsing: a comparison
• Consensus genes
• Ensembl annotation and software
• How to find help

Ensembl Genes – biological basis
All Ensembl gene predictions are based on proteins and mRNAs in:
• UniProt/Swiss-Prot (manually curated)
• UniProt/TrEMBL
• NCBI RefSeq (manually curated)
[Figure labels: protein/mRNA, sequence assembly, Ensembl genes]

Genes and Transcripts in Ensembl
• Ensembl known genes or transcripts
• Ensembl novel genes or transcripts
• Ensembl EST genes or transcripts
Non-Ensembl genes:
• Imports for yeast, C. elegans, fly, mosquito, takifugu and tetraodon

Names in Ensembl
• ENSG### Ensembl Gene ID
• ENST### Ensembl Transcript ID
• ENSP### Ensembl Peptide ID
• ENSE### Ensembl Exon ID
• For species other than human, a species code is inserted after "ENS": MUS (Mus musculus) for mouse gives ENSMUSG###, DAR (Danio rerio) for zebrafish gives ENSDARG###, etc.

Gene Structure in Ensembl
[Figure: calmodulin gene models in chicken and human; labels: No UTRs, UTRs annotated]

What annotation is available?
• Gene/transcript/peptide models (coding and non-coding (ncRNAs))
• IDs in other databases
• Mapped cDNAs, peptides, microarray probes, BAC clones etc.
• Cytogenetic bands, markers, repeats etc.
• Comparative data: orthologues and paralogues, protein families, whole genome alignments, syntenic regions
• Variation data: Single Nucleotide Polymorphisms (SNPs)
• Regulatory data: "best guess" set of regulatory elements from ENCODE
• Data from external sources (DAS)

Specific data sources
• Microarrays (Affymetrix, Illumina, Agilent)
• GO (Gene Ontology: functional classes) http://www.geneontology.org/
• OMIM (human diseases and phenotypes) http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM
• Identifiers in Entrez, UniProt, RefSeq, etc.
• PDB, MSD (structural databases) http://www.rcsb.org/pdb/ http://www.ebi.ac.uk/msd/

InterPro
Collection of protein data: sequences, motifs, structures
http://www.ebi.ac.uk/interpro/

How is this information organised?
• Ensembl Views (website)
• Ensembl Database (open source) (Perl API, FTP site)
• BioMart 'DataMining tool'

Ensembl – Open Source
• Data and software freely available
• More than 50 installs worldwide
• Academia and industry
• Local or available via the web
• Mirrors with Ensembl data, e.g. http://ensembl.genome.tugraz.at/index.html and http://ensembl.genomics.org.cn/, or user projects with their own data

Powered by Ensembl

Help and Information
• Use our helpdesk! [email protected]
• View our help pages! (the 'using Ensembl' link)
• View our animated tutorials http://www.ensembl.org/common/Workshops_Online
• Mailing lists: [email protected]
• Come visit our blog! http://ensembl.blogspot.com/

Ensembl Team
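
Worked example: reading an Ensembl identifier
The naming scheme on the "Names in Ensembl" slide (ENS, an optional species code, one letter for gene/transcript/peptide/exon, then a number) can be taken apart mechanically. The Python sketch below is illustrative only and is not part of the Ensembl software: the function name parse_ensembl_id, the three-letter limit on species codes, and the decision to accept a number of any length are assumptions made here. Only the two species codes named on the slide (MUS, DAR) are mapped, and the identifiers in the usage lines are sample values.

```python
import re
from typing import NamedTuple, Optional

# Feature letters listed on the "Names in Ensembl" slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Species codes named on the slide; no code means human.
# Other codes exist but are deliberately not guessed at here.
SPECIES_CODES = {"": "Homo sapiens", "MUS": "Mus musculus", "DAR": "Danio rerio"}

# Assumption: species codes are exactly three capital letters and the
# numeric part is one or more digits.
_ID_PATTERN = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d+)$")


class EnsemblId(NamedTuple):
    species_code: str        # "" for human, otherwise e.g. "MUS"
    species: Optional[str]   # None when the code is not in SPECIES_CODES
    feature_type: str        # "gene", "transcript", "peptide" or "exon"
    number: str              # numeric part, kept as text to preserve zeros


def parse_ensembl_id(identifier: str) -> EnsemblId:
    """Split an identifier such as ENSMUSG... into its parts."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not an Ensembl-style identifier: {identifier!r}")
    code = match.group(1) or ""
    return EnsemblId(
        species_code=code,
        species=SPECIES_CODES.get(code),
        feature_type=FEATURE_TYPES[match.group(2)],
        number=match.group(3),
    )


if __name__ == "__main__":
    # Sample identifiers, used only to exercise the parser.
    for sample in ("ENSG00000139618", "ENSMUSG00000000001", "ENSDART00000000005"):
        print(sample, parse_ensembl_id(sample))
```

Keeping the numeric part as a string avoids losing leading zeros, which matters if the identifier is later reassembled for a database lookup.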