* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Practical
Survey
Document related concepts
Ridge (biology) wikipedia , lookup
Microevolution wikipedia , lookup
History of genetic engineering wikipedia , lookup
Epigenetics of diabetes Type 2 wikipedia , lookup
Genomic imprinting wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Designer baby wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Gene expression programming wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Transcript
ArrayExpress and Expression Atlas practical: querying and exploring gene expression data at EMBL-EBI Last updated: 14 October 2013 Sarah Morgan – [email protected] / Gabriella Rustici – [email protected] This practical will introduce you to the data content and query functionality of ArrayExpress and Gene Expression Atlas databases. We suggest using Firefox for this tutorial. Additional information on these two resources including dedicated courses and more exercises can be found on the EBI eLearning portal, Train Online. ArrayExpress: http://www.ebi.ac.uk/training/online/course/arrayexpressexploring-functional-genomics-data-ar Expression Atlas: http://www.ebi.ac.uk/training/online/course/arrayexpressinvestigating-gene-expression-pattern-0 Please consider that the results you will obtain while doing the exercises might differ from what illustrated here due to a recent database update. Exercise 1 ArrayExpress - High-throughput sequencing example Scenario High-throughput sequencing (HTS) is becoming a popular tool in cancer research to decipher the genetic make-up of a tumour. Mutations, epigenetic misregulation and genomic reorganisation are just some of the things that can be studied using this technology. Imagine that you have just started a project working on human ‘prostate adenocarcinoma’ and you want to find out all RNA-seq experiments in ArrayExpress that study this cancer. Task Use the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/experiments/browse.html) to find relevant experiments. 1 Exercise 2 ArrayExpress – Effect of sodium dodecyl sulfate on human skin Scenario Sodium dodecyl sulfate (SDS) / sodium lauryl sulfate (SLS) is a common ingredient in many household cleaning products. SDS is a known irritant and upon direct topical application, can cause skin inflammation in some individuals, with possible perturbations of gene expression underlying the inflammatory symptoms. You are interested in finding out experiments which studied how SDS affects gene expression in human skin or its in vitro models. Task Use the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/experiments/browse.html) to find relevant experiments. Exercise 3 Expression Atlas - Regulation of transcription Scenario Carcinoma of the prostate is the most frequently diagnosed neoplasm in men in industrialized countries. The androgen receptor (AR), a transcription factor that mediates the action of androgens in target tissues, is expressed in nearly all prostate cancers. During prostatic carcinogenesis, major changes in the androgen receptor pathways occur. Androgen receptor signaling in the nuclei of malignant cells directly stimulates growth of tumor cells. Imagine that you have a mouse model for human prostate carcinoma, and are trying to elucidate the role of androgen receptor-dependent transcription. You want to find out which mouse genes, annotated as members of the ‘androgen receptor signaling pathway’, are differentially expressed in prostate carcinoma, and which of these are also transcription factors themselves, i.e. involved in ‘regulation of transcription from RNA polymerase II promoter’. Task Use the Expression Atlas (http://www.ebi.ac.uk/gxa/) to search for such genes. 2 Exercise 4 Expression Atlas – Holt-Oram Syndrome Scenario Imagine you are a scientist working on Holt-Oram syndrome, a genetic disease also known as ‘heart-hand syndrome’ as it affects the development of the heart septum and forelimbs in affected individuals. Linkage studies mapped the disease to the TBX5 gene on human chromosome 12, and there is evidence that HoltOram patients suffer from haploinsufficiency of TBX5. You’ve heard about the availability of mouse models for Tbx5 (e.g. a Tbx5knockout mouse). Before you start working on the project, you are curious about the expression pattern of Tbx5 in mouse, to help you decide whether the mouse models would be useful to your project. Task Use the Expression Atlas database (http://www.ebi.ac.uk/gxa/) to find information about Tbx5 expression in mouse, and then compare the expression pattern with human TBX5. 3 Need some help? Exercise 1 1. Open the ArrayExpress experiments page, http://www.ebi.ac.uk/arrayexpress/experiments/browse.html so you can see both the search box and the drop-down filters. 2. Start typing ‘prostate’ in the experiment search box [A] and then select the EFO term ‘prostate adenocarcinoma’. Click ‘Search’. 3. Filter your search by organism ‘Homo sapiens’ [B], ‘RNA assay’ [C] and ‘Sequencing assay’ [D], then click ‘Filter’. You do not need to touch the ‘All arrays’ option as it is only used when you want to filter for experiments done on a specific microarray platform (e.g. Affymetrix mouse 3’ IVT array). 4. The results would look like this: 4 5. You can now look more carefully at individual experiments to identify the one that might be more relevant for your research. Let’s explore E-GEOD-24284. Click on experiment E-GEOD-24284 and explore the information available for this experiment. This is an interesting experiment because the investigator performed both microarray and sequencing analyses on the samples. In particular, take a look at the ‘Samples (20) Click for detailed sample information and links to data’ section (circled in red in the screenshot above) to find out more information about the samples analyzed in this experiment and how they relate to the data files. 5 Exercise 2 1. Open the ArrayExpress experiments page, http://www.ebi.ac.uk/arrayexpress/experiments/browse.html so you can see both the search box and the drop-down filters. 2. Type ‘sodium dodecyl sulfate’ in the search box (there is no auto-completion of this term yet because it is not in EFO), click ‘Search’ [A]. Make sure you put the search term in quotes so you’re searching for the chemical’s name, rather than ‘sodium’, ‘dodecyl’ and ‘sulfate’ separately. 3. Filter the results by organism, select ‘Homo sapiens’ [B], then click ‘Filter’: 6 4. Out of the few experiments returned, have a look at each of them to see which one studied the effect of SDS on skin, and which ones merely mentioned the term somewhere in the experiment (e.g. in the description or protocols): Alternatively, you can find the experiment of interest by typing three search terms to start with: ‘sodium dodecyl sulfate’, ‘human’, ‘skin’. (It should return ‘E-MTAB-943’ only.) Remember that if your search terms are very specific, you may not get any results at all, so you may find it easier to start with a broader search and then narrow it down by adding more search terms later, or by using filters. 5. Sodium dodecyl sulfate is not yet in Experimental Factor Ontology, so when you searched for ‘sodium dodecyl sulfate’, its synonyms (e.g. SDS, SLS, ‘sodium lauryl sulfate’) or spelling variations (e.g. ‘sodium lauryl sulphate’) are NOT searched automatically. Try searching for the alternatives terms of ‘sodium dodecyl sulfate’ and see what results are returned. 7 Exercise 3 1. Open the Atlas homepage, http://www.ebi.ac.uk/gxa/. 2. Start typing ‘androgen’ in the ‘Genes’ search box [A] and select the suggestion ‘androgen receptor signaling pathway’ from the drop-down list. 3. Restrict your search to the organism ‘Mus musculus’ [B]. 4. Type ‘prostate’ in the condition search box [C]. You will see a list of suggested terms in a drop-down list. Hover your mouse cursor over the suggested terms to reveal their EFO IDs. The term you need is ‘prostate carcinoma’, [EFO_0001663]. 5. Click ‘Search Atlas’ [D]. You will get results which look like this: 6. The result of your search are presented as a heatmap view [E] that shows genes belonging to the ‘androgen receptor signaling pathway’, which were identified as differentially expressed in the condition ‘prostate carcinoma’ in mouse. The EFO ‘tree’ has been expanded so you can see child term too, ‘prostate intraepithelial neoplasia’. 7. You can now add additional search criteria to your search and search only for the genes involved in regulation of transcription from RNA PolII promoter using advanced search [F]. 8 8. Click on the ‘Advanced search’ link [F]. You will be presented with a more elaborate search box: 9. From the ‘Gene property’ menu [H] select Gene Ontology (GO) term. By doing this, you will add an additional search box [G]. 10. In the new search box, type in the GO term ‘Regulation of transcription from RNA polymerase II promoter’ and select the suggestion for the matching term. 11. Click on ‘Search Atlas’ to refresh your results. The results now show only genes differentially expressed in mouse prostate carcinoma, annotated as members of the androgen signaling pathway and which are involved in the regulation of transcription. They should represent an interesting set of genes for your project. 12. Click at the heatmap cells to explore the studies which identified the genes as differentially expressed under your queried conditions. You can also click on an individual gene name to explore its wider expression pattern (e.g. in different cell types, tissues). You can download this list of genes and related information for further analysis. The downloaded file will contain the experiments from which these genes were identified to be differentially expressed under these conditions. 9 Exercise 4 1. Open the Atlas homepage, http://www.ebi.ac.uk/gxa/. 2. Since you are interested in Tbx5 expression, you could type in the ‘Genes search box’ ‘Tbx5’ and select, from the suggestions, the matching term for the mouse gene. Always select one with the Ensembl gene ID shown (prefixed by ‘ENS’). 3. Select ‘Mus musculus’ in ‘Organism’, and then click ‘Search Atlas’. 4. Now you have the gene view page for Tbx5. You can scroll down the page and explore. For example, click on the heat-map cell corresponding to ‘heart’ in the mouse anatomogram to see experiments in which Tbx5 is differentially expressed. 10 5. Back to the Tbx5 results page. At the top, the gene’s orthologs from different species are listed. Click at ‘Compare orthologs’ 6. Now you can explore the expression pattern of Tbx5 and its orthologs across different conditions (cell type, cell line, organism part, etc.). Scroll to the right for more conditions. As Holt-Oram syndrome mainly affects heart and forelimb development, you may want to explore the expression patterns in ‘organism part’. Expand the full set of organism parts by clicking ‘more>>‘. You will notice that mouse Tbx5 and human TBX5 share similar expression patterns in the heart, lung, skeletal muscles and part of the gut. 11