Exercise 5a: Exploring gene lists with DAVID & protein databases

Due Date: Tuesday, June 7th at 4:00 pm
Name:

Background:
This exercise is designed to help you understand the process of finding biological meaning in a larger list of genes and to drill down into the list to propose hypotheses, which you can test experimentally, about the role of specific genes or proteins in the list. There will not be a single right answer, but I’m interested in your ability to process the data and generate hypotheses, even if they are a bit far-fetched given the limited background knowledge we have for this set of genes.

Resources:
DAVID bioinformatics tools
Protein domain information: UniProt/InterPro/SMART databases
Protein information sites: CBS prediction/EMBOSS

Starting point: GeneListforEx5.xlsx
This is identical to what I sent for ex 4, except that I added 2 worksheets:
UP regulated genes: worksheet titled UP-regulated
DOWN regulated genes: worksheet titled DOWN-regulated

5-1) Functional Annotation Clustering analysis

1. Import the UP and DOWN regulated genes as separate lists in DAVID using the Entrez GeneIDs.
2. Use the default values and conduct a Functional Annotation Clustering analysis on both the UP and DOWN regulated lists. A new window should pop up that will allow you to examine the clusters in more detail.

Describe in two tables (one for each list) the top 4 clusters, based on enrichment score. For each cluster, the tables should include the number of genes in that cluster, the enrichment score, and a brief description of how the genes in that cluster appear to be related.

Write a narrative explaining what you’ve found out about these genes. In your narrative, describe or explain:
1. How many clusters were generated for each gene list?
2. Did any of the top clusters identify genes that you think might be important in or related to cellular senescence? If so, which ones and why?
3. Compare your top enrichment clusters with the top over-represented GO terms from the Panther or DAVID analysis done in Exercise 4. Do the clusters seem to overlap with the top terms?

BCHM 6280 2016 Exercise 5a Page 1 of 3

5-2) Generate a sublist of genes of interest

For each of the UP and DOWN regulated gene lists, generate a sublist from one of the top 4 annotation clusters. Your sublist should have between 15 and 35 genes. You can do this in one of 2 ways:
1. Within the cluster, click on the red G and download the file from the window that opens. Save it as a text file and open it in Excel. It will contain the Entrez Gene IDs, gene symbols and a short description of the gene or the gene name.
2. Click on one or more of the check boxes to the left of the terms and, when you’ve selected all you want to select, click the Generate Sublist button. Give it a meaningful name. When you return to the main page after closing the Functional Annotation Window, the new list will appear. Select the sublist in the dialog box on the left and click Use. Then use the Gene Name Batch Viewer to see the list and download it as a text file. Import it into Excel. This will also have the Entrez Gene IDs, gene symbols and names.

In a single Excel file, have each gene list as its own worksheet with an appropriate name. Describe how you generated the sublists and why you chose the ones that you saved.

5-3) Explore your gene lists using UniProt

On the UniProt main page, click the Retrieve ID/mapping link on the menu bar.
Under 1: Provide your identifiers: Copy the Entrez Gene IDs from your sublist and paste them in the provided text box.
Under 2: Select Options: Set the From dropdown menu to GeneID (Entrez Gene) and the To dropdown menu to UniProtKB.
Click the GO button.

You should note several things:
1. The Entrez Gene ID may map to more than 1 protein record in UniProt.
2. Some of the UniProt records are Reviewed (Gold) and some are Unreviewed (Blue). Usually, if the Entrez ID maps to more than 1 UniProt record, one is reviewed and the others are unreviewed.
3. The Entry column provides the UniProt accession and is a link to the full UniProt record for that protein.
4. You can download the data as sequences or as tab-delimited data that can be imported into Excel.

Save the exported data as an Excel workbook, with each gene list as a separate worksheet. Your lists should now include the UniProt accessions.

Spend some time looking at your lists. When choosing a gene for follow-up studies, at least within the context of this exercise, I would recommend choosing one that has been reviewed. It is likely to have more data associated with it. You can also submit a list of the accession numbers to see which proteins in your list are in the SMART database. This may help you narrow down your list to the few genes you want to study. Choose one gene from each sublist and open the UniProt record for it.

Describe the two genes with the following information:
1. Why you chose them
2. Gene name
3. Probable function/role in the cell
4. Potential role in senescence (this can be quite far-fetched)
5. What protein family they belong to
6. What domains they have
7. Likely or known cellular location(s)
8. Are there any disease-associated variants located within the protein sequence?

Most, if not all, of this information can be found within or linked from the UniProt record. Within the UniProt record, the Feature viewer shows much of this information in a compact view that can be expanded.
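
If you would rather sort the tab-delimited mapping export with a script than by hand in Excel, the advice in 5-3 (one Entrez Gene ID can map to several UniProt records, so prefer a reviewed one) can be sketched roughly as below. This is only an illustration, not part of the assignment. The column names ("From", "Entry", "Reviewed"), the status values, and every ID in the sample are made-up placeholders, so check them against the header of your own export. The helper names are hypothetical as well.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows in the shape of a tab-delimited ID-mapping export.
# Column names and all IDs here are assumptions for illustration only.
SAMPLE_EXPORT = (
    "From\tEntry\tReviewed\n"
    "1001\tACC_A1\treviewed\n"
    "1001\tACC_A2\tunreviewed\n"
    "1002\tACC_B1\tunreviewed\n"
    "1003\tACC_C1\treviewed\n"
)


def group_by_gene_id(tsv_text):
    """Map each Entrez Gene ID to the list of (accession, status) pairs it hit."""
    hits = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        status = row["Reviewed"].strip().lower()
        hits[row["From"]].append((row["Entry"], status))
    return dict(hits)


def pick_reviewed(hits):
    """For each Gene ID with a reviewed record, return its first reviewed accession."""
    chosen = {}
    for gene_id, records in hits.items():
        reviewed = [acc for acc, status in records if status == "reviewed"]
        if reviewed:
            chosen[gene_id] = reviewed[0]
    return chosen


def sublist_size_ok(gene_ids, low=15, high=35):
    """Check the 15 to 35 gene range requested for a sublist in 5-2."""
    return low <= len(set(gene_ids)) <= high


if __name__ == "__main__":
    hits = group_by_gene_id(SAMPLE_EXPORT)
    for gene_id, accession in pick_reviewed(hits).items():
        print(gene_id, accession)
```

Gene IDs that have only unreviewed records are dropped by pick_reviewed rather than guessed at, which matches the recommendation to follow up on reviewed entries. To use it on a real export, read the file's text and pass it to group_by_gene_id in place of SAMPLE_EXPORT.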