Web Databases for Drosophila: Introduction to FlyBase and the Ensembl Database
Wilson Leung, 6/06

Outline
• Introduction to FlyBase
• Introduction to Ensembl
• Using web databases to assist annotation of novel sequences

Introduction to FlyBase
Available at http://www.flybase.org
• FlyBase is primarily funded by the National Institutes of Health.
• The FlyBase consortium includes Drosophila researchers and computer scientists at Harvard University, Indiana University, and the University of Cambridge, plus scientists worldwide.
• In addition to the main site at www.flybase.org, there are also many mirror sites.

What is FlyBase?
FlyBase is a comprehensive database of genetic and molecular data for many Drosophila species:
• Information on genes and mutant alleles
• Expression and function of gene products
• Genetic, cytological, and molecular map information
• Data from the Berkeley Drosophila Genome Project
• Data from the European Drosophila Genome Project

Introduction to Ensembl
Available at http://www.ensembl.org

What is Ensembl?
• Ensembl is a joint project between the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute.
• Ensembl seeks to develop an automated system for producing and maintaining annotations on eukaryotic genomes.
• These annotations should also be easily accessible to researchers.
• Although it was originally developed for eukaryotes, the Ensembl system has also been used to analyze prokaryotic genomes (EBI Genome Reviews, covering archaea and bacteria).
• The most recent version is v38 (April 2006).
• Available genomes include human, chimpanzee, mouse, dog, C. elegans, fruit fly, honey bee, and mosquito, among others.

Ensembl Gene Annotation System
• All Ensembl gene predictions are based on experimental evidence.
• Predictions are based on the manually curated UniProt/Swiss-Prot and RefSeq databases.
• UTRs are annotated only if they are supported by EMBL mRNA records.
Val Curwen, et al. The Ensembl Automatic Gene Annotation System. Genome Res., May 2004; 14: 942-950.
Using Web Databases for Annotation
Figure: list of species available in the FlyBase BLAST service when searching for sequences homologous to your query.
Figure: Exon View in Ensembl, used to obtain the sequence of a gene exon by exon.

Motivations for using FlyBase:
• Learn the biological functions of the gene of interest
• Use the FlyBase BLAST service to detect sequence homology to Drosophila species or to species related to Drosophila
Motivations for using Ensembl:
• Obtain records of the gene from multiple databases
• Obtain the coding sequence of each exon of a gene

Walkthrough
A typical use of web databases is to identify a putative homolog of a D. melanogaster gene.
• We have a novel 20 kb sequence from D. erecta.
• Using RepeatMasker, we masked all Drosophila-specific repeats in the sequence.
• Using blastx, we searched this sequence against the Swiss-Prot database.
• The blastx results indicate that our sequence is similar to the paired-box protein Pax-6 in D. melanogaster.

Function of Pax-6
• Clicking on the accession number of the first hit in the blastx output shows that Pax-6 is also known as eyeless.
• We can learn more about eyeless on the FlyBase web site at http://flybase.org.
• Type "eyeless" in the search field, then click on the hit "ey" (#17).
This brings up the gene report for eyeless in D. melanogaster:
• eyeless is important for brain and eye development.
• It is expressed in the embryo, larva, and adult.
• Phenotypic changes in mutants include changes in the antenna, arista, and eye of the fruit fly.

Finding Homologs in Other Species
• Click on the BLAST button to access the BLAST service.
• Search our masked sequence against the D. melanogaster, D. yakuba, D. mojavensis, and D. virilis genome assemblies using blastn.
• Most of these species, other than D. melanogaster, are unannotated.
• Nonetheless, this search is useful for finding putative orthologs and for discovering regulatory regions through multiple sequence alignments.

Using the Ensembl Database
• Navigate to Ensembl at http://www.ensembl.org.
• Click on "Drosophila melanogaster" to access the data specific to this species.
• In the search box, type the name "eyeless", then click "Go".
• We find only one match: CG1464 (the eyeless protein).

Transcripts of eyeless
• There are four different isoforms of eyeless in D. melanogaster.
• We would typically annotate the most "comprehensive" isoform; in this case, isoform D.
• The Fruitfly GeneView page provides a general overview of the gene structure and function of eyeless.
• Links to the FlyBase, RefSeq, Swiss-Prot, and EMBL records of eyeless are also available on this page.

Obtaining Transcript Sequence
• Click on "Exon Info" for the transcript CG1464-RD.
• This brings us to the exon report for this transcript: 9 exons, 3,024 bp, 898 residues.
• The sequence is shown with each exon in its own block.
• The sequence is color-coded:
  Purple = UTRs
  Black = coding DNA sequence (CDS)
  Blue = intronic sequence
  Green = upstream or downstream sequence

Obtaining Peptide Sequence
• Click on the link "Protein Information" to obtain the peptide sequence of CG1464-RD.
• This brings us to the protein report for this transcript.
• The "Protein Family" section shows that there are six gene members in this species.
• Clicking on the link brings up the Family view, which allows visualization of multiple sequence alignments of the members of this family.
• The peptide sequence has the following color code:
  Black/Blue = alternating text color for exons
  Red = residue overlapping a splice site
  Green = synonymous SNP
  Yellow = non-synonymous SNP

Next Step
• Annotate the exact boundaries of each exon in our D. erecta sequence based on sequence homology to the D. melanogaster eyeless gene.
• Use an exon-by-exon BLAST search with BLAST 2 Sequences (bl2seq).

Questions?
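The SNP colors in the protein report (green = synonymous, yellow = non-synonymous) rest on a simple rule: a single-base change is synonymous if the altered codon still encodes the same amino acid. The Python sketch below illustrates that rule with the standard genetic code. The function name `classify_snp` and the toy sequence are hypothetical, and Ensembl's own variant classification distinguishes more categories than these two.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(cds: str, position: int, alt_base: str) -> str:
    """Return 'synonymous' or 'non-synonymous' for a single-base change.

    cds      -- coding sequence, starting at the first base of the start codon
    position -- 1-based position of the changed base within the CDS
    alt_base -- the new base (A, C, G or T)
    """
    cds = cds.upper()
    if not 1 <= position <= len(cds) - len(cds) % 3:
        raise ValueError("position must fall inside a complete codon of the CDS")
    index = position - 1
    codon_start = index - index % 3          # first base of the affected codon
    ref_codon = cds[codon_start:codon_start + 3]
    offset = index - codon_start             # 0, 1 or 2 within the codon
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    # Stop codons are compared like amino acids ('*'),
    # so a sense-to-stop change is reported as non-synonymous.
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


toy_cds = "ATGGCTTAA"                        # hypothetical CDS: Met-Ala-stop
print(classify_snp(toy_cds, 6, "C"))         # GCT -> GCC, both Ala: synonymous
print(classify_snp(toy_cds, 4, "A"))         # GCT -> ACT, Ala -> Thr: non-synonymous
```

Only the codon containing the changed base matters, which is why the sketch locates the codon start with `index - index % 3` before comparing the two translations.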
Walkthrough Example

Determining Exon Boundaries
• Use bl2seq to determine the exon boundaries of the putative ortholog in our D. erecta sequence.
• Go to www.ncbi.nlm.nih.gov/blast/ and select bl2seq.
• Copy the D. erecta sequence and paste it into the Sequence 1 box.
• Copy the first exon of D. melanogaster eyeless and paste it into the Sequence 2 box.
• Change the program to tblastx.
• Click "BLAST".
• We find that the first exon corresponds to bases 19307-19414 in our sequence.
• We can repeat the previous steps to locate the other exons in our sequence.
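Choosing tblastx in bl2seq means both sequences are translated in all six reading frames (three on each strand) and compared as protein, which helps locate an exon even when the DNA has diverged between D. erecta and D. melanogaster. The Python sketch below shows what a six-frame translation produces. The helper names are hypothetical, and the frame labels follow a simple convention assumed here (-1 reads the reverse complement from its first base) that may not match every BLAST report's numbering.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons; codons containing N (e.g. masked repeats) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translate the three forward frames (+1..+3) and three reverse frames (-1..-3)."""
    forward = seq.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


for frame, peptide in six_frame_translation("ATGGCTTAA").items():
    print(frame, peptide)
```

Incomplete trailing codons are simply dropped, so the six translations can differ in length by one residue.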
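Once bl2seq has located each exon, the hit coordinates can be refined so that every intron between consecutive exons begins with GT and ends with AG, the canonical splice-site dinucleotides found in most eukaryotic introns. The Python sketch below reports those dinucleotides for a set of candidate exon coordinates. The function name and the toy locus are hypothetical; the sketch assumes 1-based inclusive coordinates on the plus strand (reverse-complement the sequence first for a minus-strand gene), and a non-canonical result is a prompt to look more closely rather than proof of an error, since rarer GC-AG and AT-AC introns also exist.

```python
def check_splice_sites(genomic: str, exons: list[tuple[int, int]]) -> list[dict]:
    """Report donor/acceptor dinucleotides of each intron between consecutive exons.

    genomic -- plus-strand genomic sequence
    exons   -- (start, end) exon coordinates, 1-based and inclusive
    """
    genomic = genomic.upper()
    ordered = sorted(exons)                  # exons may be found in any order
    report = []
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        # The intron spans 1-based positions end1+1 .. start2-1,
        # which is the Python slice [end1 : start2 - 1].
        intron = genomic[end1:start2 - 1]
        report.append({
            "intron": (end1 + 1, start2 - 1),
            "length": len(intron),
            "donor": intron[:2],
            "acceptor": intron[-2:],
            "canonical": intron.startswith("GT") and intron.endswith("AG"),
        })
    return report


# Hypothetical toy locus: exon (1-3), intron (4-10), exon (11-13).
toy = "AAA" + "GTCCCAG" + "TTT"
for row in check_splice_sites(toy, [(1, 3), (11, 13)]):
    print(row)
```

Sorting first lets the exons be entered in whatever order the exon-by-exon bl2seq searches return them.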