Homework 2 – DAVID functional annotation
BMI 6030 – Eilbeck
Chad Hodge u0584663
Section I
1. How many DAVID IDs have been analyzed?
a. 1234 DAVID IDs were analyzed
2. How many gene IDs were submitted?
a. 1238+43 = 1281? (not 43, but 4?) screen shot.
3. How many genes from the original gene list will not be included in the DAVID analysis?
a. 18 (4?)
4. Why are there more genes included in GOTERM_BP_1 analysis than in GOTERM_BP_5?
a. As noted in the instructions, BP_1 is a more general, higher-level node in the ontology,
whereas BP_5 is more specific. Thus, the more general term is less restrictive by the
very nature of the ontology and will have more genes included.
5. This dataset is derived from vein endothelial cells. Are there any terms in either chart that you
might expect to see with this dataset?
a. After reading the Hang article and doing a little Googling, I would expect to see
these sorts of terms related to vein endothelial cells: angiogenesis, blood vessel,
morphogenesis, actin filament-based movement, artery, and perhaps even umbilical and
cobalt, given Hang's focus.
6. Which chart provides these ‘endothelial cell’ relevant terms?
a. GOTERM_BP_5 has many of these. BP_1 is much more generalized, as one would expect
due to its order in the ontology.
7. Between the GOTERM_BP_FAT and GOTERM_BP_ALL, which is the most significant term in each
chart? Which of these two GO terms is most informative?
a. The most significant term for GOTERM_BP_FAT, in terms of its p-value, is “RNA
biosynthetic process” (m-phase of mitotic cells?)
b. The most significant term for GOTERM_BP_ALL, in terms of its p-value, is “regulation of
transport” (organelle organization?)
c. The GOTERM_BP_ALL term “regulation of transport” is much more descriptive, and
thus more informative, especially when you drill into the term itself and read its
description. (m-phase)
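The reasoning in question 4 can be illustrated with a toy model. The sketch below assumes that a gene annotated to a specific term also counts toward every ancestor of that term, so broader terms can only gain genes. The ontology fragment, the gene symbols, and the helper names (`PARENTS`, `DIRECT`, `ancestors`, `propagate`) are all made up for illustration; this is not DAVID's implementation or its data.

```python
# Toy ontology: each term maps to its parent terms (illustrative fragment only).
PARENTS = {
    "biological_process": [],
    "developmental process": ["biological_process"],
    "blood vessel development": ["developmental process"],
    "angiogenesis": ["blood vessel development"],
    "cellular process": ["biological_process"],
    "cell cycle": ["cellular process"],
}

# Hypothetical direct annotations (placeholder gene symbols).
DIRECT = {
    "angiogenesis": {"GENE_A", "GENE_B"},
    "blood vessel development": {"GENE_C"},
    "cell cycle": {"GENE_D", "GENE_E"},
}


def ancestors(term):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def propagate(direct):
    """Copy each gene to its own term and to all ancestors of that term."""
    total = {term: set() for term in PARENTS}
    for term, genes in direct.items():
        total[term] |= genes
        for ancestor in ancestors(term):
            total[ancestor] |= genes
    return total


counts = {term: len(genes) for term, genes in propagate(DIRECT).items()}
for term, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{n:2d}  {term}")
```

In this toy example the most general term collects every gene, while "angiogenesis" keeps only its own two, which mirrors why a BP_1 chart covers more of the gene list than a BP_5 chart.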
Section II
8. What is the most significant GO Molecular Function term?
a. Based on p-value, that would be “deoxyribonuclease activity” (nucleotide binding?)
9. What is the most significant Interpro domain?
a. According to p-value, “EGF”, the epidermal growth factor-like domain. (pleckstrin
homology?)
10. Are these terms (or related terms) identified in both analyses?
a. The molecular function chart does not show EGF or any of its clan members.
b. Interpro does not have the “deoxyribonuclease activity” term, nor any of its child terms,
on its list.
c. In terms of general overlap between the two, the functional
annotation cluster report shows some overlaps, such as protein kinase,
serine/threonine protein kinase, insulin-like growth factor binding, aspartyl-tRNA,
and aminoacyl-tRNA, as well as several others.
d. Nucleotide binding appears in both analyses.
11. Look for kinase activity related terms in these two charts. How similar are these terms between
the two charts?
a. There is a lot of kinase overlap, such as protein kinase, serine/threonine and tyrosine.
Section III
12. What is the most significant KEGG pathway?
a. Based on p-value, the most significant KEGG pathway is “nucleotide excision repair”
(DNA replication)
13. How many genes are associated with this term?
a. There are 8 genes associated with this term.
b. (16?)
Section IV
14. We found that ‘programmed cell death’ (Fisher’s exact test, P = 2.1 x 10^-7) is only significantly
observed in the upregulated genes, which indicates that apoptosis is initiated in response to
mimicked hypoxia in HUVECs. Does your GO Biological Process analysis confirm this statement?
a. Yes. I do see a programmed cell death entry, but not with the exact same Fisher exact
p-value, and I also see cell death in the downregulated genes, not just in the upregulated ones.
i. GOTERM_BP_FAT programmed cell death 1.8E-5
ii. Apoptosis?
iii. P-values are significant for both, but do not match the quoted value exactly.
15. Are there any other Biological Processes that are identified by the upregulated and
downregulated dataset analyses that may be relevant?
a. Interestingly, when I look at the Fisher exact score in the downregulated genes, I see that
‘programmed cell death’ is the most significant biological process. ‘Induction of cell
death’ appears as well.
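For reference, the Fisher's exact p-values discussed in questions 14 and 15 can be sketched as a one-sided hypergeometric tail. The function names `enrichment_p` and `ease_score` are illustrative, and the counts in the example are invented rather than taken from this dataset. DAVID describes its EASE score as a modified, more conservative Fisher exact p-value; the variant below assumes the modification is to remove one gene from the list hits before testing.

```python
from math import comb


def enrichment_p(list_hits, list_total, pop_hits, pop_total):
    """One-sided Fisher exact p-value: P(X >= list_hits) under the
    hypergeometric distribution (drawing list_total genes from a
    background of pop_total genes, pop_hits of which carry the term)."""
    max_hits = min(list_total, pop_hits)
    denominator = comb(pop_total, list_total)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / denominator


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """Conservative variant (assumption): test with one fewer list hit."""
    return enrichment_p(max(list_hits - 1, 0), list_total, pop_hits, pop_total)


# Invented counts: 30 of 300 list genes carry the term,
# versus 500 of 20000 genes in the background.
p_fisher = enrichment_p(30, 300, 500, 20000)
p_ease = ease_score(30, 300, 500, 20000)
print(f"Fisher exact: {p_fisher:.3g}   EASE-style: {p_ease:.3g}")
```

Because the p-value depends on the background and on exactly which genes are counted, small differences in the gene list or annotation version are enough to explain why a recomputed value differs from the one quoted in the paper.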
Section V
16. How many DAVID IDs are in the current ‘combined’ gene list?
a. 1524
17. Compare the BP_FAT tables from all three gene lists. What is the benefit of carrying out a
combined analysis in comparison to looking at the upregulated and downregulated gene lists
independently?
a. The genes must pass a threshold in order to be considered significant. Combining these
data sets changes those counts, and thus what it considers significant. This pushes some
terms from the edge of significance to one side or the other of that line.
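The effect described in question 17 can be sketched with invented numbers. The example assumes a plain one-sided Fisher exact test and a cutoff of 0.01. Neither the counts nor the cutoff come from this dataset, and DAVID's own thresholds may differ. A term with a modest excess of genes in each list separately can cross the line once the two lists are pooled.

```python
from math import comb


def enrichment_p(list_hits, list_total, pop_hits, pop_total):
    """One-sided Fisher exact p-value: P(X >= list_hits), hypergeometric tail."""
    max_hits = min(list_total, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / comb(pop_total, list_total)


# Invented background: 100 of 20000 genes carry the term.
POP_HITS, POP_TOTAL = 100, 20000
CUTOFF = 0.01  # illustrative significance cutoff (assumption)

# Invented lists: 6 of 400 upregulated and 6 of 400 downregulated genes hit the term.
p_up = enrichment_p(6, 400, POP_HITS, POP_TOTAL)
p_down = enrichment_p(6, 400, POP_HITS, POP_TOTAL)
# Pooling the two lists gives 12 hits among 800 genes.
p_combined = enrichment_p(12, 800, POP_HITS, POP_TOTAL)

for label, p in [("up", p_up), ("down", p_down), ("combined", p_combined)]:
    verdict = "passes" if p < CUTOFF else "fails"
    print(f"{label:9s} p = {p:.3g}  ({verdict} cutoff {CUTOFF})")
```

The reverse can also happen: a term enriched in only one list can be diluted by the other list and fall back below the cutoff, which is the "one side or the other" behavior noted in the answer.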