
Web Databases for Drosophila
Introduction to FlyBase and
Ensembl Database
Wilson Leung
6/06
Outline

Introduction to FlyBase
Introduction to Ensembl
Using web databases to assist annotation of novel sequences
Introduction to FlyBase
Available at http://www.flybase.org
Introduction to FlyBase

FlyBase is primarily funded by the National Institutes of Health

The FlyBase consortium includes Drosophila researchers and computer scientists at Harvard University, Indiana University, and the University of Cambridge, plus scientists worldwide

In addition to the main site at www.flybase.org, there are also many mirror sites
What is FlyBase?

It is a comprehensive database of genetic and molecular data for many Drosophila species:

• Information on genes and mutant alleles
• Expression and function of gene products
• Genetic, cytological, and molecular map information
• Data from the Berkeley Drosophila Genome Project
• Data from the European Drosophila Genome Project
Introduction to Ensembl
Available at http://www.ensembl.org
What is Ensembl?

Ensembl is a joint project between the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute

Ensembl seeks to develop an automated system for the production and maintenance of annotations on eukaryotic genomes

These annotations should also be easily accessible to researchers
What is Ensembl?

While originally developed for eukaryotes, the Ensembl system has also been used to analyze prokaryotic genomes
• EBI Genome Review (archaea and bacteria)

Most recent version is v38 (Apr 2006)
• Genomes available include human, chimp, mouse, dog, C. elegans, fruit fly, honey bee, and mosquito, among others
Ensembl Gene Annotation System

All Ensembl gene predictions are based on experimental evidence

Predictions are based on the manually curated UniProt/Swiss-Prot and RefSeq databases

UTRs are annotated only if they are supported by EMBL mRNA records

Val Curwen, et al. The Ensembl Automatic Gene Annotation System. Genome Res., May 2004; 14: 942-950.
Using Web Databases for Annotation
List of available species in the FlyBase BLAST service to use in a search for sequences homologous to your query

Exon View in Ensembl: used to obtain the sequence of a gene, exon-by-exon
Using Web Databases for Annotation

Motivations for using FlyBase
• Learn the biological functions of the gene of interest
• Use the FlyBase BLAST service to detect sequence homology to Drosophila species or species related to Drosophila

Motivations for using Ensembl
• Obtain records of a gene from multiple databases
• Obtain the coding sequence of each exon of a gene
Walkthrough

A typical use of web databases is to identify a putative homolog of a D. melanogaster gene

We have a novel 20 kb sequence from D. erecta
• Using RepeatMasker, we masked all Drosophila-specific repeats from the sequence
• Using blastx, we searched this sequence against the Swiss-Prot database
• blastx results indicate our sequence is similar to the Paired-box protein (Pax6) in D. melanogaster
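To make the blastx step more concrete: blastx compares a nucleotide query against a protein database by first translating the query in all six reading frames. The sketch below shows only that translation step, not the database search itself. The function names are hypothetical, the standard genetic code is assumed, and codons that contain masked bases (N) are reported as X.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG-ordered table.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with masked bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame (+1..+3, -1..-3) to the conceptual translation in that frame."""
    frames = {}
    rc = reverse_complement(seq)
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # Toy sequence for illustration only, not part of the D. erecta contig.
    for frame, peptide in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(frame, peptide)
```

Each of the six translations is then compared against the protein database, which is why a blastx hit is reported together with the frame in which it was found.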
Function of Pax-6

Clicking on the accession number of the first hit in the blastx output shows that Pax-6 is also known as eyeless

We can learn more about eyeless using the FlyBase web site @ http://flybase.org

Type in eyeless in the search field, then click on the hit “ey” (#17)
Function of Pax-6

This brings up the gene report for eyeless in D. melanogaster

We find that eyeless is important for brain and eye development

It is expressed in embryo, larva, and adult

Phenotypic changes in mutants include changes in the antenna, arista, and eye of the fruit fly
Finding Homologs in Other Species

Click on the BLAST button to access the BLAST service

Search our masked sequence against the D. melanogaster, D. yakuba, D. mojavensis, and D. virilis genome assemblies using blastn

Most of the species, other than D. melanogaster, are unannotated.

Nonetheless, this is useful for finding putative orthologs and for discovering regulatory regions using multiple sequence alignments
Using the Ensembl Database

Navigate to Ensembl @ http://www.ensembl.org

Click on “Drosophila melanogaster” to access the data specific for this species

In the search box, type in the name “eyeless” then click “Go”

We find only one match - CG1464 (the eyeless protein)
Transcripts of eyeless

There are four different isoforms of eyeless in D. melanogaster

We would typically annotate the most “comprehensive” isoform
• In this case, isoform D

The Fruitfly GeneView provides a general overview of the gene structure and function of eyeless

Links to the FlyBase, RefSeq, Swiss-Prot, and EMBL records of eyeless are also available on this page.
Obtaining Transcript Sequence

Click on “Exon Info” for the transcript CG1464-RD

This brings us to the exon report for this transcript
• 9 exons, 3024 bp, 898 residues

The sequence is shown with each exon in its own block.

Sequence is color-coded:
• Purple = UTRs
• Black = Coding DNA sequences (CDS)
• Blue = intronic sequences
• Green = upstream or downstream sequences
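The numbers in the exon report can be sanity-checked against each other. The sketch below assumes that the 3024 bp figure is the spliced transcript length (UTRs included) and that the stop codon is part of the transcript but is not counted among the 898 residues. Both are assumptions about how the report counts, and the function name is hypothetical.

```python
def cds_and_utr_lengths(transcript_bp, residues, stop_codon_in_transcript=True):
    """Split a spliced transcript length into coding and untranslated bases.

    Assumes each residue uses exactly one codon and, optionally, that one
    extra codon (the stop) is present in the transcript.
    """
    codons = residues + (1 if stop_codon_in_transcript else 0)
    cds_bp = 3 * codons
    if cds_bp > transcript_bp:
        raise ValueError("coding sequence cannot be longer than the transcript")
    return cds_bp, transcript_bp - cds_bp


if __name__ == "__main__":
    # Values reported for CG1464-RD in the exon report.
    cds_bp, utr_bp = cds_and_utr_lengths(3024, 898)
    print(f"CDS: {cds_bp} bp, UTR (5' + 3' combined): {utr_bp} bp")
```

Under these assumptions the coding portion accounts for 2697 bp, which leaves 327 bp for the 5' and 3' UTRs combined (the purple sequence in the exon report).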
Obtaining Peptide Sequence

Click on the link “Protein Information” to obtain the peptide sequence of CG1464-RD

This brings us to the protein report for this transcript
• The “Protein Family” section shows that there are six gene members in this species
• Clicking on the link brings up the Family view, which allows visualization of multiple sequence alignments of members of this family

The peptide sequence has the following color-code:
• Black/Blue = Alternating text color for exons
• Red = Residue overlaps a splice site
• Green = Synonymous SNP
• Yellow = Non-synonymous SNP
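A residue overlaps a splice site when the three bases of its codon are split between two neighboring exons. As a rough sketch of that idea, the function below takes the coding length of each exon (hypothetical values, not the real eyeless exon lengths) and reports which residues have a codon that spans an exon-exon junction. This illustrates the concept only; it is not Ensembl's own code.

```python
def junction_residues(cds_exon_lengths):
    """Return 1-based residue numbers whose codon spans an exon-exon junction.

    cds_exon_lengths lists the number of coding bases contributed by each
    exon, in transcript order.
    """
    spanning = []
    coding_bases_so_far = 0
    for length in cds_exon_lengths[:-1]:  # no junction after the last exon
        coding_bases_so_far += length
        # A junction that falls inside a codon leaves a remainder of 1 or 2.
        if coding_bases_so_far % 3 != 0:
            spanning.append(coding_bases_so_far // 3 + 1)
    return spanning


if __name__ == "__main__":
    # Hypothetical coding lengths for a three-exon gene.
    print(junction_residues([100, 50, 30]))
```

In this toy example the first junction falls after base 100, so the codon for residue 34 (bases 100-102) is split between exons 1 and 2, while the second junction falls cleanly between codons.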
Next Step

Annotate the exact boundaries of each exon in our D. erecta sequence based on sequence homology to the D. melanogaster eyeless gene

Use an exon-by-exon BLAST search with BLAST 2 Sequences (bl2seq)
Questions?
Walk-through example
Determining Exon Boundaries

Use bl2seq to determine exon boundaries of the putative ortholog in our D. erecta sequence

Go to www.ncbi.nlm.nih.gov/blast/ and select bl2seq

Copy the D. erecta sequence and paste it into the Sequence 1 box. Copy the first exon of D. melanogaster eyeless and paste it into the Sequence 2 box.

Change the program to tblastx. Click “BLAST”
Determining Exon Boundaries

We find that the first exon corresponds to bases 19307-19414 in our sequence

We can repeat the previous steps to locate the other exons in our sequence
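When this is repeated for every exon, it can help to collect the coordinates programmatically. The sketch below assumes the alignments were saved in the standard 12-column tabular BLAST format (the layout BLAST+ writes with `-outfmt 6`), with the D. erecta sequence as the query and each exon as the subject. The sequence names and hit rows are illustrative only and are not real BLAST output.

```python
# Column names of the standard 12-column tabular BLAST format.
FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def parse_hits(text):
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.split("\t")))
        for key in ("qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits


def best_span_per_exon(hits):
    """Keep the highest-scoring hit per exon and return its query span.

    Coordinates are ordered low-to-high so that minus-strand hits
    (reported with qstart > qend) are handled the same way.
    """
    best = {}
    for hit in hits:
        current = best.get(hit["sseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["sseqid"]] = hit
    return {
        exon: (min(h["qstart"], h["qend"]), max(h["qstart"], h["qend"]))
        for exon, h in best.items()
    }


# Illustrative rows only (hypothetical names and scores).
HITS = (
    "erecta_20kb\tey_exon1\t91.67\t36\t3\t0\t19307\t19414\t1\t108\t2e-20\t95.1\n"
    "erecta_20kb\tey_exon1\t45.00\t20\t11\t0\t5021\t5080\t40\t99\t0.5\t22.3\n"
)

if __name__ == "__main__":
    for exon, (start, end) in best_span_per_exon(parse_hits(HITS)).items():
        print(f"{exon}: bases {start}-{end}")
```

The span of the best alignment is only an approximate boundary, since an alignment can stop short of the true exon edge or run past it, so the coordinates should still be checked by eye against the sequence.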