Exercise databases
Bioinformatics (updated 2015)
EXERCISES BIOLOGICAL DATABASES PART ENSEMBL
Exercises Biological databases PART Ensembl
Discovering genome projects with Ensembl
  Location view
    Genome view
    Chromosome view
    Region overview
    Region in detail
    Comparative genomics
  Gene view
  Transcript level
DISCOVERING GENOME PROJECTS WITH ENSEMBL
http://www.ensembl.org/index.html
For more tutorials, see also:
http://www.ensembl.org/info/website/tutorials/index.html (what is an assembly, what is genome annotation)
Kathleen Marchal
In this exercise we will discover more about Ensembl, which is, next to NCBI, one of the most important databases in molecular biology (the European counterpart of NCBI).
Go to the Ensembl home page.
“The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.”
Check for which species information is available in Ensembl (click "view the list of all Ensembl species").
Search for the human PAX6 gene.
We find one entry, with the Ensembl accession number ENSG00000007372. For this gene we have a location view, a gene view and a transcript view. Study each of the views in detail.
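The same gene record can also be retrieved programmatically. The following is only a minimal sketch and not part of the original exercise: it assumes the public Ensembl REST server at https://rest.ensembl.org and its `lookup/id` endpoint, and the helper names `lookup_url` and `fetch_gene` are made up for this illustration.

```python
import json
import re
import urllib.request

# Assumption: the public Ensembl REST server (not mentioned in the exercise).
REST_SERVER = "https://rest.ensembl.org"


def lookup_url(stable_id):
    """Build the REST lookup URL for a human Ensembl gene ID."""
    # Human gene IDs consist of "ENSG" followed by 11 digits.
    if not re.fullmatch(r"ENSG\d{11}", stable_id):
        raise ValueError(f"not a human Ensembl gene ID: {stable_id!r}")
    return f"{REST_SERVER}/lookup/id/{stable_id}?content-type=application/json"


def fetch_gene(stable_id):
    """Download the gene record as a dict (requires network access)."""
    with urllib.request.urlopen(lookup_url(stable_id)) as response:
        return json.load(response)
```

With network access, `fetch_gene("ENSG00000007372")` should return a dictionary describing the gene (chromosome, start, end and strand), which can be compared with what the location view shows in the browser.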
LOCATION VIEW
GENOME VIEW
Look at the whole genome. This gives you very general information on the assembly and genome build that was used.
CHROMOSOME VIEW
The chromosome view gives you an overview of the chromosome on which PAX6 is located.
REGION OVERVIEW
REGION IN DETAIL
The location view provides a view of the gene on the human chromosome.
On which chromosome is the gene? What are the start and stop positions of the gene?
The detailed region view shows a region of about 1 Mb (500,000 bp upstream and downstream of the PAX6 gene). The alternating dark and light blue bars are the contigs that make up the assembly of the genome. You can observe the genes neighbouring PAX6 (> indicates that the gene is on the forward strand; < indicates that it is on the reverse strand). Yellow bars represent coding genes, blue bars represent processed RNAs (non-coding elements), and grey bars are pseudogenes.
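The window described above is simple coordinate arithmetic. As a small sketch (the function names and the example coordinates in the usage note are hypothetical, not the real PAX6 coordinates), it can be computed as follows, together with the strand symbols used in the display:

```python
def flanking_window(start, end, flank=500_000):
    """Return the (start, end) of a gene extended by `flank` bp on each side.

    Coordinates are 1-based, so the window start is clamped at 1.
    """
    if start > end:
        raise ValueError("start must not be larger than end")
    return max(1, start - flank), end + flank


def strand_symbol(strand):
    """Map a strand (1 or -1) to the symbol shown next to a gene name."""
    # ">" marks the forward strand, "<" the reverse strand.
    return {1: ">", -1: "<"}[strand]
```

For a hypothetical gene at positions 1,000,000 to 1,020,000, `flanking_window(1_000_000, 1_020_000)` gives `(500000, 1520000)`.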
The lower panel zooms in on the PAX6 gene. Splice variants (alternative transcripts) are shown: open boxes are UTRs, dark boxes are exons and lines are introns. Red or orange transcripts are protein coding. Blue transcripts are processed transcripts. Click on the orange transcript.
A popup window appears showing details on the transcript. It says that the transcript is confirmed by both the Ensembl and the Havana annotation, so it is a highly reliable transcript. Green transcripts come from the Consensus Coding Sequence (CCDS) project and are confirmed by Havana, Ensembl, NCBI and the UCSC Genome Browser, so they are consensus transcripts.
In this view several tracks are shown and turned on by default: the %GC, the regulatory regions (look at the legend: can you pinpoint the promoter of PAX6 and the transcription factor binding sites?) and the variants (variants with a frequency above 1% in the 1000 Genomes Project, and variants that have been linked to a phenotype). Zoom in on some variants (draw a box around the variants and click "jump to region").
What type of variants do you observe?
Jump back to the original region (e.g. by zooming out).
About 400 different tracks are available, and most are not shown by default. To show more tracks, click on the configure wheel.
Look at the different tracks you can display. For instance, display the tilepaths (these are the clones that were sequenced to make the assembly; they are listed under clones and miscellaneous regions). Close the window and check whether they have been added.
Now try to display somatic mutations. What are somatic mutations, and how do they differ from the other variants? Are any somatic mutations known for PAX6? Add the ones from COSMIC. Click on one variant and check the information. Look at the phenotypic data. In which organ was the variant found?
To save your image, select
All information shown in the course can be reviewed in the following online tutorial:
http://www.ensembl.org/info/website/tutorials/index.html
COMPARATIVE GENOMICS
Alignment image
Shows the alignment of the PAX6 region in 12 species or in 8 primates: white regions are unaligned and usually correspond to introns or intergenic regions.
Corresponding multiple alignment view
Region comparison
Shows the extent to which the gene neighbourhood of PAX6 is aligned. You can select your species in the configure view.
At this moment human is compared with zebrafish, rat and gorilla.
The genomic context of human PAX6 is most similar to that of gorilla.
View the detailed alignment. Green regions correspond to aligned regions. Human PAX6 is in the middle panel, as it is the reference.
Although the genomic context is less conserved, the PAX6 gene itself is highly conserved at the exon level.
Synteny
The view between mouse and human shows on which mouse chromosomes the regions of the human chromosome containing the PAX6 gene are located (the human chromosome is spread over 2 mouse chromosomes). The human chromosome is in the middle. The small red box indicates the location of PAX6.
Change the species to view the synteny with chimpanzee. This is much higher than with mouse.
GENE VIEW
The gene is identified by a unique Ensembl ID and its common name (HGNC symbol). Its location on the genome is indicated (Contig Z95332.1.1.20874.). The annotation was produced by an automatic pipeline and then manually curated.
How many different splice variants are available? You see a table, and when you scroll down the page you get a graphical overview. Get a detailed view of the splice variants.
Show the supporting evidence: these are the individual sequences that were used to construct the different gene models and splice variants.
View the sequence (secondary structure). This gives you a view of the sequence with the exons indicated in red. If you now click “export your sequence from the Ensembl browser” you can download the sequence.
You can select the features you want to download.
Also look at the following tutorial:
http://www.ensembl.org/info/website/tutorials/index.html
Comparative genomics
Look at the gene tree (the alignment at the gene level, in contrast to the region alignment).
The PAX6 gene family is a highly conserved gene family that did not undergo significant gene losses or gains. How many paralogs does the gene have? What are paralogs? How do they differ from orthologs?
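When reading a gene tree, it can help to see how the two relationships follow from the type of the internal nodes. The sketch below uses a toy tree with made-up gene names (it is not the real PAX6 tree), and the dictionary layout and the function names are assumptions made for this illustration: two genes whose last common ancestor is a speciation node are called orthologs, and two genes whose last common ancestor is a duplication node are called paralogs.

```python
def leaves(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node["children"]:
        found |= leaves(child)
    return found


def relationship(node, a, b):
    """Classify two distinct leaves by the event at their last common ancestor."""
    if isinstance(node, str):
        raise ValueError("a and b must be two different leaves of the tree")
    # Descend while both genes sit in the same subtree.
    for child in node["children"]:
        below = leaves(child)
        if a in below and b in below:
            return relationship(child, a, b)
    if not {a, b} <= leaves(node):
        raise ValueError("both genes must be present in the tree")
    # This node is the last common ancestor of a and b.
    return "ortholog" if node["event"] == "speciation" else "paralog"


# Toy tree: one duplication followed by a human/mouse speciation in each copy.
toy_tree = {
    "event": "duplication",
    "children": [
        {"event": "speciation", "children": ["human|geneA", "mouse|geneA"]},
        {"event": "speciation", "children": ["human|geneB", "mouse|geneB"]},
    ],
}
```

In this toy tree, `relationship(toy_tree, "human|geneA", "mouse|geneA")` returns `"ortholog"`, while `relationship(toy_tree, "human|geneA", "human|geneB")` returns `"paralog"`.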
TRANSCRIPT LEVEL
Click on one of the transcripts and explore the transcript centered view. View the schematic view of the transcript.
Check the supporting evidence
Yellow and green are exon-based evidence, while purple is EST-based evidence. The EST-based evidence supports the untranslated regions, while the exon-based evidence supports the translated regions.
View the ontology information
Gene Ontology provides a structured annotation of the gene. It is subdivided into 3 classes: molecular function, cellular component (localization) and biological process. The last one is the most interesting. View it as a graph and try to understand in which functions PAX6 is involved.
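The three classes can be pictured as a simple grouping of annotated terms. The sketch below is illustrative only: the three GO terms are real terms typical of a transcription factor, but they were chosen by hand and not downloaded from the PAX6 page, and the function name `group_by_namespace` is made up.

```python
from collections import defaultdict

# Illustrative annotations (ID, name, class); not retrieved from Ensembl.
annotations = [
    ("GO:0003677", "DNA binding", "molecular_function"),
    ("GO:0005634", "nucleus", "cellular_component"),
    ("GO:0006355", "regulation of DNA-templated transcription", "biological_process"),
]


def group_by_namespace(terms):
    """Group (id, name, namespace) tuples by their Gene Ontology class."""
    grouped = defaultdict(list)
    for go_id, name, namespace in terms:
        grouped[namespace].append((go_id, name))
    return dict(grouped)
```

Calling `group_by_namespace(annotations)["biological_process"]` then lists only the terms that describe the processes in which the gene product takes part.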
View the presence of protein domains
Note on Havana sequences:
At the Sanger Institute, the automated predictions provided by Ensembl are subjected to high-quality manual curation. Finished genomic sequence is analysed on a clone-by-clone basis using a combination of similarity searches (against various DNA and protein databases) and ab initio gene predictions (Genscan, Fgenes). A sizable proportion of the human genome has been manually annotated by the Havana group. The results of this project are released in the VEGA database.