Exercise 5a: Exploring gene lists with DAVID & protein databases
Due Date: Tuesday, June 7th at 4:00 pm
Name:
Background:
This exercise is designed to help you understand the process of finding biological meaning in a
larger list of genes and to drill down into the list to propose hypotheses about the role of specific
genes or proteins in the list that you can test experimentally. There will not be a single right
answer, but I’m interested in your ability to process the data and generate hypotheses, even if they
are a bit far-fetched given the limited background knowledge we have for this set of genes.
Resources:
DAVID bioinformatics tools
Protein domain information: UniProt/InterPro/SMART databases
Protein information sites: CBS prediction/EMBOSS
Starting point:
GeneListforEx5.xlsx: This is identical to what I sent for Exercise 4, except that I added 2 worksheets:
UP regulated genes – worksheet titled UP-regulated
DOWN regulated genes – worksheet titled DOWN-regulated
5-1) Functional Annotation Clustering analysis
1. Import the UP and DOWN regulated genes as separate lists in DAVID using the Entrez GeneIDs.
2. Use the default values and conduct a Functional Annotation Clustering analysis on both the UP
and DOWN regulated lists.
A new window should pop up that will allow you to examine the clusters in more detail.
Describe in two tables (one for each list) the top 4 clusters, based on enrichment score. The
tables should include:
• the number of genes in that cluster
• the enrichment score
• a brief description of how the genes in that cluster appear to be related.
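If you save the clustering results with DAVID's download link, a short script can pull out the numbers for these tables. The sketch below is not part of the assignment and rests on an assumption about the export layout: each cluster begins with a line such as "Annotation Cluster 1<TAB>Enrichment Score: 3.2", followed by a header row and tab-separated term rows whose Genes column is a comma-separated list. The function names and the sample text are made up for illustration, so check them against your own file.

```python
# Made-up example of the ASSUMED export layout (not real DAVID output).
SAMPLE = (
    "Annotation Cluster 1\tEnrichment Score: 3.2\n"
    "Category\tTerm\tCount\t%\tPValue\tGenes\n"
    "CATEGORY_X\tterm A\t3\t1.0\t0.001\t101, 102, 103\n"
    "CATEGORY_Y\tterm B\t2\t0.7\t0.01\t102, 104\n"
    "\n"
    "Annotation Cluster 2\tEnrichment Score: 1.5\n"
    "Category\tTerm\tCount\t%\tPValue\tGenes\n"
    "CATEGORY_X\tterm C\t2\t0.7\t0.02\t201, 202\n"
)


def parse_david_clusters(text):
    """Parse the assumed clustering layout into a list of cluster dicts."""
    clusters = []
    current = None  # cluster currently being filled
    header = None   # column names for the current cluster
    for line in text.splitlines():
        fields = line.split("\t")
        if not fields[0].strip():
            continue  # blank separator line between clusters
        if fields[0].startswith("Annotation Cluster"):
            score = float(fields[1].split(":", 1)[1])
            current = {"name": fields[0].strip(), "score": score,
                       "terms": [], "genes": set()}
            clusters.append(current)
            header = None
        elif current is not None and header is None:
            header = fields  # first row after the cluster line
        elif current is not None:
            row = dict(zip(header, fields))
            current["terms"].append(row["Term"])
            # A gene can appear under several terms; the set counts it once.
            current["genes"].update(
                g.strip() for g in row["Genes"].split(",") if g.strip())
    return clusters


def top_cluster_table(clusters, n=4):
    """Return (name, number of unique genes, enrichment score) for the top n."""
    ranked = sorted(clusters, key=lambda c: c["score"], reverse=True)[:n]
    return [(c["name"], len(c["genes"]), c["score"]) for c in ranked]


for name, n_genes, score in top_cluster_table(parse_david_clusters(SAMPLE)):
    print(f"{name}\t{n_genes} genes\tenrichment score {score}")
```

The description of how the genes in each cluster are related still has to come from your own reading of the terms.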
Write a narrative explaining what you’ve found out about these genes. In your narrative,
describe or explain:
1. How many clusters were generated for each gene list?
2. Did any of the top clusters identify genes that you think might be important in or
related to cellular senescence? If so, which ones and why?
3. Compare your top enrichment clusters with the top over-represented GO terms from
the Panther or DAVID analysis done in Exercise 4. Do the clusters seem to overlap
with the top terms?
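One way to make the comparison in question 3 concrete is to reduce each term to its GO accession and intersect the two sets. This is only a sketch: the term labels below are illustrative rather than results from the exercise data, the helper names are hypothetical, and it assumes DAVID labels look like "GO:0006954~inflammatory response" while PANTHER labels look like "inflammatory response (GO:0006954)".

```python
import re

# Matches a GO accession such as GO:0006954 anywhere in a label.
GO_ID = re.compile(r"GO:\d{7}")


def term_key(label):
    """Use the GO accession when present, else the lowercased label."""
    match = GO_ID.search(label)
    return match.group(0) if match else label.strip().lower()


def shared_terms(cluster_terms, ex4_terms):
    """Return the sorted keys that appear in both term lists."""
    return sorted({term_key(t) for t in cluster_terms}
                  & {term_key(t) for t in ex4_terms})


# Illustrative labels only; substitute the terms from your own results.
cluster_terms = ["GO:0006954~inflammatory response", "GO:0007049~cell cycle"]
ex4_terms = ["inflammatory response (GO:0006954)", "cell adhesion (GO:0007155)"]
print(shared_terms(cluster_terms, ex4_terms))
```

Exact accession matches will miss closely related parent and child terms, so treat an empty result as a prompt to compare the term names by eye rather than as proof of no overlap.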
BCHM 6280 2016
Exercise 5a
Page 1 of 3
5-2) Generate a sublist of genes of interest
For each of the UP and DOWN regulated gene lists, generate a sublist from one of the top 4
annotation clusters. Your sublist should have between 15 and 35 genes. You can do this in one of
2 ways:
1. Within the cluster, click on the red G and download the file from the window that opens.
Save it as a text file and open it in Excel. It will contain the Entrez Gene IDs, gene
symbols and a short description of the gene or the gene name.
2. Click on one or more of the check boxes to the left of the terms and, when you’ve
selected all you want, click the Generate Sublist button. Give it a meaningful
name. When you return to the main page after closing the Functional Annotation
window, the new list will appear. Select the sublist in the dialog box on the left and
click Use. Then use the Gene Name Batch Viewer to see the list and download it as a text
file. Import it into Excel. This will also have the Entrez Gene IDs, gene symbols and
names.
In a single Excel file, have each gene list as its own worksheet with an appropriate name.
Describe how you generated the sublists and why you chose the ones that you saved.
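Before moving on, it can help to confirm that a downloaded sublist falls in the required 15 to 35 gene range and to get the IDs into a form you can paste into another tool. A minimal sketch, assuming the text file is tab-delimited with the Entrez Gene ID in the first column (verify this against your download); the function names and the generated sample rows are hypothetical.

```python
def read_entrez_ids(text):
    """Collect unique numeric IDs from the first tab-delimited column, in file order."""
    ids = []
    for line in text.splitlines():
        first = line.split("\t")[0].strip()
        # Header rows and blank lines are skipped because they are not numeric.
        if first.isdigit() and first not in ids:
            ids.append(first)
    return ids


def size_ok(ids, low=15, high=35):
    """Check the sublist size against the range required by the exercise."""
    return low <= len(ids) <= high


# Generated placeholder rows: a header, 20 fake IDs, and one duplicate row.
rows = ["ID\tGene Name"] + [f"{9000 + i}\tplaceholder gene {i}" for i in range(20)]
rows.append("9000\tplaceholder gene 0")
sample = "\n".join(rows)

ids = read_entrez_ids(sample)
print(len(ids), "genes; within range:", size_ok(ids))
print("\n".join(ids))  # one ID per line, ready to paste into a text box
```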
5-3) Explore your gene lists using UniProt
On the UniProt main page, click the Retrieve ID/mapping link on the menu bar.
Under 1: Provide your identifiers:
Copy the Entrez Gene IDs from your sublist and paste them in the provided text box.
Under 2: Select Options:
Set the From dropdown menu to GeneID (Entrez Gene) and the To dropdown menu to UniProtKB.
Click the GO button.
You should note several things:
1. The Entrez Gene ID may map to more than 1 protein record in UniProt.
2. Some of the UniProt records are Reviewed (Gold) and some are Unreviewed (Blue). Usually
if the Entrez ID maps to more than 1 UniProt record, one is reviewed and the others are
unreviewed.
3. The Entry column provides the UniProt accession and is a link to the full UniProt record for
that protein.
4. You can download the data as sequences or tab-delimited data that can be imported into
Excel. Save the exported data as an Excel workbook, with each gene list as a separate
worksheet.
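The points above suggest a simple way to shortlist candidates from the exported table: group the rows by Entrez Gene ID and keep the reviewed accession where one exists. The sketch below assumes a tab-delimited export with columns named From, Entry and Reviewed, whose status values read "reviewed" or "unreviewed". The column names in your export may differ, so they are passed as parameters. The IDs and accessions in the sample are placeholders, not real records.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows in the ASSUMED export layout (not real UniProt data).
SAMPLE_TSV = (
    "From\tEntry\tReviewed\n"
    "GENE_A\tACC_A1\treviewed\n"
    "GENE_A\tACC_A2\tunreviewed\n"
    "GENE_B\tACC_B1\tunreviewed\n"
)


def group_by_gene(tsv_text, id_col="From", acc_col="Entry", status_col="Reviewed"):
    """Map each gene ID to a list of (accession, is_reviewed) pairs."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        is_reviewed = row[status_col].strip().lower() == "reviewed"
        groups[row[id_col]].append((row[acc_col], is_reviewed))
    return dict(groups)


def pick_reviewed(groups):
    """Pick the first reviewed accession per gene, or None if there is none."""
    picks = {}
    for gene_id, entries in groups.items():
        reviewed = [acc for acc, ok in entries if ok]
        picks[gene_id] = reviewed[0] if reviewed else None
    return picks


groups = group_by_gene(SAMPLE_TSV)
for gene_id, accession in pick_reviewed(groups).items():
    print(gene_id, "->", accession or "no reviewed record")
```

Genes that come back with no reviewed record are not wrong choices, but they will usually have less annotation to work with.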
Spend some time looking at your lists. When choosing a gene for follow-up studies, at least
within the context of this exercise, I would recommend choosing one that has been reviewed. It
is likely to have more data associated with it. Choose one from each sublist and open the UniProt
record for each.
Your lists should now include the UniProt accessions. You can submit a list of these accession
numbers to see which proteins in your list are also in the SMART database. This may help you
narrow down your list to the few genes you want to study.
Describe the two genes with the following information:
1. Why you chose them
2. Gene name
3. Probable function/role in the cell
4. Potential role in senescence (this can be quite far-fetched)
5. What protein family they belong to
6. What domains they have
7. Likely or known cellular location(s)
8. Are there any disease-associated variants located within the protein sequence?
Most, if not all, of this information can be found within or linked from the UniProt record.
Within the UniProt record, the Feature viewer shows much of this information in a compact
view that can be expanded.