Download Practical

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Ridge (biology) wikipedia , lookup

Microevolution wikipedia , lookup

History of genetic engineering wikipedia , lookup

Gene wikipedia , lookup

Epigenetics of diabetes Type 2 wikipedia , lookup

Genomic imprinting wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Nutriepigenomics wikipedia , lookup

Long non-coding RNA wikipedia , lookup

Designer baby wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

Gene expression programming wikipedia , lookup

Epigenetics of human development wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

NEDD9 wikipedia , lookup

RNA-Seq wikipedia , lookup

Gene expression profiling wikipedia , lookup

Mir-92 microRNA precursor family wikipedia , lookup

Transcript
ArrayExpress and Expression Atlas practical: querying and
exploring gene expression data at EMBL-EBI
Last updated: 14 October 2013
Sarah Morgan – [email protected] / Gabriella Rustici – [email protected]
This practical will introduce you to the data content and query functionality of
ArrayExpress and Gene Expression Atlas databases. We suggest using Firefox for
this tutorial.
Additional information on these two resources including dedicated courses and
more exercises can be found on the EBI eLearning portal, Train Online.
ArrayExpress: http://www.ebi.ac.uk/training/online/course/arrayexpressexploring-functional-genomics-data-ar
Expression Atlas: http://www.ebi.ac.uk/training/online/course/arrayexpressinvestigating-gene-expression-pattern-0
Please consider that the results you will obtain while doing the exercises might
differ from what illustrated here due to a recent database update.
Exercise 1
ArrayExpress - High-throughput sequencing example
Scenario
High-throughput sequencing (HTS) is becoming a popular tool in cancer research to
decipher the genetic make-up of a tumour. Mutations, epigenetic misregulation and
genomic reorganisation are just some of the things that can be studied using this
technology.
Imagine that you have just started a project working on human ‘prostate
adenocarcinoma’ and you want to find out all RNA-seq experiments in ArrayExpress
that study this cancer.
Task
Use the ArrayExpress database
(http://www.ebi.ac.uk/arrayexpress/experiments/browse.html) to find relevant
experiments.
1
Exercise 2
ArrayExpress – Effect of sodium dodecyl sulfate on human skin
Scenario
Sodium dodecyl sulfate (SDS) / sodium lauryl sulfate (SLS) is a common ingredient
in many household cleaning products. SDS is a known irritant and upon direct topical
application, can cause skin inflammation in some individuals, with possible
perturbations of gene expression underlying the inflammatory symptoms. You are
interested in finding out experiments which studied how SDS affects gene expression
in human skin or its in vitro models.
Task
Use the ArrayExpress database
(http://www.ebi.ac.uk/arrayexpress/experiments/browse.html) to find relevant
experiments.
Exercise 3
Expression Atlas - Regulation of transcription
Scenario
Carcinoma of the prostate is the most frequently diagnosed neoplasm in men in
industrialized countries. The androgen receptor (AR), a transcription factor that
mediates the action of androgens in target tissues, is expressed in nearly all
prostate cancers. During prostatic carcinogenesis, major changes in the androgen
receptor pathways occur. Androgen receptor signaling in the nuclei of malignant
cells directly stimulates growth of tumor cells.
Imagine that you have a mouse model for human prostate carcinoma, and are
trying to elucidate the role of androgen receptor-dependent transcription. You
want to find out which mouse genes, annotated as members of the ‘androgen
receptor signaling pathway’, are differentially expressed in prostate carcinoma,
and which of these are also transcription factors themselves, i.e. involved in
‘regulation of transcription from RNA polymerase II promoter’.
Task
Use the Expression Atlas (http://www.ebi.ac.uk/gxa/) to search for such genes.
2
Exercise 4
Expression Atlas – Holt-Oram Syndrome
Scenario
Imagine you are a scientist working on Holt-Oram syndrome, a genetic disease
also known as ‘heart-hand syndrome’ as it affects the development of the heart
septum and forelimbs in affected individuals. Linkage studies mapped the disease
to the TBX5 gene on human chromosome 12, and there is evidence that HoltOram patients suffer from haploinsufficiency of TBX5.
You’ve heard about the availability of mouse models for Tbx5 (e.g. a Tbx5knockout mouse). Before you start working on the project, you are curious about
the expression pattern of Tbx5 in mouse, to help you decide whether the mouse
models would be useful to your project.
Task
Use the Expression Atlas database (http://www.ebi.ac.uk/gxa/) to find information
about Tbx5 expression in mouse, and then compare the expression pattern with
human TBX5.
3
Need some help?
Exercise 1
1. Open the ArrayExpress experiments page,
http://www.ebi.ac.uk/arrayexpress/experiments/browse.html so you can see
both the search box and the drop-down filters.
2. Start typing ‘prostate’ in the experiment search box [A] and then select the
EFO term ‘prostate adenocarcinoma’. Click ‘Search’.
3. Filter your search by organism ‘Homo sapiens’ [B], ‘RNA assay’ [C] and
‘Sequencing assay’ [D], then click ‘Filter’. You do not need to touch the ‘All
arrays’ option as it is only used when you want to filter for experiments done
on a specific microarray platform (e.g. Affymetrix mouse 3’ IVT array).
4. The results would look like this:
4
5. You can now look more carefully at individual experiments to identify the one
that might be more relevant for your research. Let’s explore E-GEOD-24284.
Click on experiment E-GEOD-24284 and explore the information available for
this experiment.
This is an interesting experiment because the investigator performed both
microarray and sequencing analyses on the samples. In particular, take a look at
the ‘Samples (20) Click for detailed sample information and links to data’
section (circled in red in the screenshot above) to find out more information about
the samples analyzed in this experiment and how they relate to the data files.
5
Exercise 2
1. Open the ArrayExpress experiments page,
http://www.ebi.ac.uk/arrayexpress/experiments/browse.html so you can see
both the search box and the drop-down filters.
2. Type ‘sodium dodecyl sulfate’ in the search box (there is no auto-completion
of this term yet because it is not in EFO), click ‘Search’ [A]. Make sure you
put the search term in quotes so you’re searching for the chemical’s name,
rather than ‘sodium’, ‘dodecyl’ and ‘sulfate’ separately.
3. Filter the results by organism, select ‘Homo sapiens’ [B], then click ‘Filter’:
6
4.
Out of the few experiments returned, have a look at each of them to see
which one studied the effect of SDS on skin, and which ones merely
mentioned the term somewhere in the experiment (e.g. in the description or
protocols):
Alternatively, you can find the experiment of interest by typing three search
terms to start with: ‘sodium dodecyl sulfate’, ‘human’, ‘skin’. (It should
return ‘E-MTAB-943’ only.)
Remember that if your search terms are very specific, you may not get any
results at all, so you may find it easier to start with a broader search and then
narrow it down by adding more search terms later, or by using filters.
5.
Sodium dodecyl sulfate is not yet in Experimental Factor Ontology, so when
you searched for ‘sodium dodecyl sulfate’, its synonyms (e.g. SDS, SLS,
‘sodium lauryl sulfate’) or spelling variations (e.g. ‘sodium lauryl sulphate’)
are NOT searched automatically.
Try searching for the alternatives terms of ‘sodium dodecyl sulfate’ and see
what results are returned.
7
Exercise 3
1. Open the Atlas homepage, http://www.ebi.ac.uk/gxa/.
2. Start typing ‘androgen’ in the ‘Genes’ search box [A] and select the
suggestion ‘androgen receptor signaling pathway’ from the drop-down list.
3. Restrict your search to the organism ‘Mus musculus’ [B].
4. Type ‘prostate’ in the condition search box [C]. You will see a list of
suggested terms in a drop-down list. Hover your mouse cursor over the
suggested terms to reveal their EFO IDs. The term you need is ‘prostate
carcinoma’, [EFO_0001663].
5. Click ‘Search Atlas’ [D]. You will get results which look like this:
6. The result of your search are presented as a heatmap view [E] that shows genes
belonging to the ‘androgen receptor signaling pathway’, which were identified as
differentially expressed in the condition ‘prostate carcinoma’ in mouse. The EFO
‘tree’ has been expanded so you can see child term too, ‘prostate intraepithelial
neoplasia’.
7. You can now add additional search criteria to your search and search only for the
genes involved in regulation of transcription from RNA PolII promoter using
advanced search [F].
8
8. Click on the ‘Advanced search’ link [F]. You will be presented with a more
elaborate search box:
9. From the ‘Gene property’ menu [H] select Gene Ontology (GO) term. By
doing this, you will add an additional search box [G].
10. In the new search box, type in the GO term ‘Regulation of transcription from
RNA polymerase II promoter’ and select the suggestion for the matching term.
11. Click on ‘Search Atlas’ to refresh your results. The results now show only
genes differentially expressed in mouse prostate carcinoma, annotated as
members of the androgen signaling pathway and which are involved in the
regulation of transcription. They should represent an interesting set of genes
for your project.
12. Click at the heatmap cells to explore the studies which identified the genes as
differentially expressed under your queried conditions. You can also click on
an individual gene name to explore its wider expression pattern (e.g. in
different cell types, tissues).
You can download this list
of genes and related
information for further
analysis.
The downloaded file will
contain the experiments
from which these genes
were identified to be
differentially expressed
under these conditions.
9
Exercise 4
1. Open the Atlas homepage, http://www.ebi.ac.uk/gxa/.
2. Since you are interested in Tbx5 expression, you could type in the ‘Genes
search box’ ‘Tbx5’ and select, from the suggestions, the matching term for the
mouse gene. Always select one with the Ensembl gene ID shown (prefixed by
‘ENS’).
3. Select ‘Mus musculus’ in ‘Organism’, and then click ‘Search Atlas’.
4. Now you have the gene view page for Tbx5. You can scroll down the page
and explore. For example, click on the heat-map cell corresponding to ‘heart’
in the mouse anatomogram to see experiments in which Tbx5 is differentially
expressed.
10
5. Back to the Tbx5 results page. At the top, the gene’s orthologs from different
species are listed. Click at ‘Compare orthologs’
6. Now you can explore the expression pattern of Tbx5 and its orthologs across
different conditions (cell type, cell line, organism part, etc.).
Scroll to the right for more conditions.
As Holt-Oram syndrome mainly affects heart and forelimb development, you
may want to explore the expression patterns in ‘organism part’. Expand the
full set of organism parts by clicking ‘more>>‘. You will notice that mouse
Tbx5 and human TBX5 share similar expression patterns in the heart, lung,
skeletal muscles and part of the gut.
11