Ensembl Genome Repository

Main Data Repositories
• Ensembl - BLAST or BLAT
• UCSC - BLAT
• NCBI (Entrez) - BLAST
• Ensembl, NCBI, and UCSC use the same human genome assembly, which is generated by NCBI

Ensembl
• Provide automatic annotation of sequenced genomes
• Integrate with biological data
• Make available from the web:
  – Genome Browser
  – Web interface
  – BioMart
  – Direct database access
  – Perl API

Outline
• Where the data comes from
• Questions that can be answered

Ensembl Genomes

Genome Annotation
• Identify elements on the genome
• Attach biological information to the elements
• Automatic annotation and curation

Vega/Havana Annotation
• Addition of positional, functional, regulatory and evolutionary datasets to a raw assembled genome.
• Genes, exon-intron boundaries, protein products, miRNAs, alternative splicing, transcriptional start sites, expression, orthologs, paralogs, repeats, structural features, syntenic relationships, ChIP-chip data ...
• Based on experimental data and computational predictions.

Genebuild
• Align species-specific proteins to the genome to create CDS models (targeted build)
• Align proteins from closely related species to locate additional CDS models (similarity build)
• Add UTRs using cDNA/EST evidence and ditag data
• Cluster transcripts into genes
• Classify transcripts
• Name genes

Human/Mouse Genebuild
• Additional steps not included in the standard Ensembl build.
• For both species, transcripts from the Consensus Coding Sequence (CCDS) set are imported directly and are not altered by the genebuild process.
• In addition, where manual curation is available for a transcript, the Ensembl and HAVANA transcript models are compared.
• The Ensembl and HAVANA models are merged when they agree on the same coding sequence.

Ensembl Identifiers
• Format: ENS_Species_Type_ID
• Species: blank for human; for all other species, a three-letter code (e.g. MUS for mouse)
• Type: G (gene), T (transcript), P (protein)
• ID: 11-digit, zero-padded number
• Examples:
  – ENSMUST00000118022 (mouse transcript)
  – ENSMUSP00000113891 (mouse protein)
  – ENSMUSG00000021944 (mouse gene)

Ensembl Organization
• Views are organized into four classes:
  – Gene
  – Transcript
  – Location (Genome Browser)
  – Variation

Questions
• Are there splice variants?
• How do I find orthologs and paralogs?
• Are there variations in the genomic sequence?
• How can I download different parts of the mRNA sequence?
• What protein domains exist?
• Gene Ontology
• Can I download sets of data (DNA, cDNA, protein) for a species?
  – BioMart question

Resources
• Ensembl Tutorials: http://www.ensembl.org/info/website/tutorials/index.html
• Ensembl 2009, Nucleic Acids Research, PMID: 19033362
• Bert Overduin, Ph.D., Ensembl: http://www.ebi.ac.uk/~bert/workshops/london_080509/browser_london_080509.pdf
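The Genebuild step "Cluster transcripts into genes" can be pictured with a simplified rule: transcripts on the same sequence region and strand that share at least one overlapping exon belong to the same gene, and the relation is applied transitively. The Python sketch below implements that assumed rule with a small union-find structure. It is an illustration only, not the Ensembl pipeline's clustering code, whose exact criteria these slides do not describe; `Transcript`, `cluster_into_genes` and the coordinates in the usage example are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # inclusive (start, end) genomic coordinates


@dataclass
class Transcript:
    name: str          # hypothetical transcript label
    seq_region: str    # chromosome or scaffold name
    strand: int        # +1 or -1
    exons: List[Exon]


def exons_overlap(a: Transcript, b: Transcript) -> bool:
    """True if both transcripts lie on the same region and strand and share an overlapping exon."""
    if a.seq_region != b.seq_region or a.strand != b.strand:
        return False
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in a.exons for s2, e2 in b.exons)


def cluster_into_genes(transcripts: List[Transcript]) -> List[List[str]]:
    """Group transcripts into gene-like clusters by transitive exon overlap (assumed rule)."""
    parent: Dict[str, str] = {t.name: t.name for t in transcripts}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union every pair of overlapping transcripts into one cluster.
    for a, b in combinations(transcripts, 2):
        if exons_overlap(a, b):
            parent[find(a.name)] = find(b.name)

    clusters: Dict[str, List[str]] = {}
    for t in transcripts:
        clusters.setdefault(find(t.name), []).append(t.name)
    return list(clusters.values())


# Hypothetical models on one region.
models = [
    Transcript("tA", "chr1", 1, [(100, 200), (300, 400)]),
    Transcript("tB", "chr1", 1, [(350, 450)]),   # overlaps the second exon of tA
    Transcript("tC", "chr1", -1, [(120, 180)]),  # same coordinates as tA's exon, opposite strand
    Transcript("tD", "chr1", 1, [(210, 290)]),   # falls inside tA's intron, no exon overlap
]
print(cluster_into_genes(models))
```

Requiring exon overlap rather than mere span overlap keeps a transcript nested in another gene's intron (tD above) in its own cluster.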
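The Human/Mouse Genebuild rule that Ensembl and HAVANA models are merged when they agree on the same coding sequence can be sketched as an exact-match join. Assuming each model is reduced to its sequence region, strand and coding-segment coordinates, two models agree when those keys are identical, since identical coordinates on the same assembly imply the same coding sequence. This Python sketch only reports which pairs would merge; how the merged model's UTRs are chosen is not covered by the slides and is left out. `match_by_cds`, `cds_key` and all names and coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[int, int]          # inclusive (start, end) of one coding segment
Model = Tuple[str, int, List[Segment]]  # hypothetical record: (seq_region, strand, coding segments)


def cds_key(model: Model) -> Tuple:
    """Key that is equal only for models with identical coding structure at the same location."""
    seq_region, strand, segments = model
    return (seq_region, strand, tuple(sorted(segments)))


def match_by_cds(
    ensembl: Dict[str, Model], havana: Dict[str, Model]
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Pair Ensembl and HAVANA models whose coding segments agree exactly."""
    havana_by_cds: Dict[Tuple, List[str]] = defaultdict(list)
    for name, model in havana.items():
        havana_by_cds[cds_key(model)].append(name)

    merged: List[Tuple[str, str]] = []
    matched_havana = set()
    for name, model in ensembl.items():
        # .get() does not insert into the defaultdict, so lookups have no side effects.
        for partner in havana_by_cds.get(cds_key(model), []):
            merged.append((name, partner))
            matched_havana.add(partner)

    matched_ensembl = {e for e, _ in merged}
    ensembl_only = [n for n in ensembl if n not in matched_ensembl]
    havana_only = [n for n in havana if n not in matched_havana]
    return merged, ensembl_only, havana_only


# Hypothetical models: e1/h1 share a CDS (listed in different order); e2/h2 differ by 5 bp.
ens = {"e1": ("chr2", 1, [(500, 620), (700, 790)]), "e2": ("chr2", 1, [(900, 1000)])}
hav = {"h1": ("chr2", 1, [(700, 790), (500, 620)]), "h2": ("chr2", 1, [(905, 1000)])}
print(match_by_cds(ens, hav))
```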
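The identifier scheme on the Ensembl Identifiers slide can be checked mechanically. The Python sketch below assumes exactly the layout the slide describes: "ENS", an optional three-letter species code (absent for human), a one-letter type (G, T or P) and an 11-digit number as in the slide's examples. Ensembl also uses other type letters (E for exons, for instance) and versioned IDs with a suffix such as ".1"; this pattern deliberately rejects those, so treat it as an illustration rather than a complete validator. `parse_ensembl_id` and `EnsemblId` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout from the slide: ENS + optional species code + type letter + 11 digits.
ENSEMBL_ID = re.compile(r"^ENS([A-Z]{3})?([GTP])(\d{11})$")

FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "protein"}


class EnsemblId(NamedTuple):
    species: Optional[str]  # None for human IDs, e.g. "MUS" for mouse
    feature: str            # "gene", "transcript" or "protein"
    number: str             # kept as a string to preserve the leading zeros


def parse_ensembl_id(stable_id: str) -> EnsemblId:
    """Split an Ensembl stable ID into species code, feature type and number."""
    match = ENSEMBL_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not a recognised Ensembl stable ID: {stable_id!r}")
    species, type_letter, number = match.groups()
    return EnsemblId(species, FEATURE_TYPES[type_letter], number)


# The three mouse examples from the slide.
for example in ["ENSMUST00000118022", "ENSMUSP00000113891", "ENSMUSG00000021944"]:
    print(example, parse_ensembl_id(example))
```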
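The first question on the Questions slide, whether a gene has splice variants, reduces to counting transcripts per gene once gene/transcript identifier pairs have been downloaded (for example from BioMart). The Python sketch below assumes a hypothetical two-column, tab-separated export with a header row; the column layout and the placeholder IDs (GENE_A, TX_A1, and so on) are not real BioMart output. A gene with more than one transcript has alternative transcripts, which include splice variants but may also differ only in their start or end.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical export: header row, then gene ID <TAB> transcript ID (placeholder IDs).
EXPORT = (
    "gene_id\ttranscript_id\n"
    "GENE_A\tTX_A1\n"
    "GENE_A\tTX_A2\n"
    "GENE_B\tTX_B1\n"
)


def transcripts_per_gene(tsv_text: str) -> Dict[str, List[str]]:
    """Group transcript IDs under their gene ID."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    genes: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        genes[row[0]].append(row[1])
    return dict(genes)


def genes_with_alternative_transcripts(tsv_text: str) -> Dict[str, List[str]]:
    """Keep only genes annotated with more than one transcript."""
    return {g: txs for g, txs in transcripts_per_gene(tsv_text).items() if len(txs) > 1}


print(genes_with_alternative_transcripts(EXPORT))
```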