Bis732. Bio-Network, Draft for Term-Project
20063562 Hyeyoung Cho
Construction of Genetic Modules by Utilizing Binding and Expression Data
Abstract
Motivation:
Using gene expression profiles together with genome-wide location data, I attempted to discover gene
modules in order to gain biological insight into patterns of combinatorial regulation and into how the
activity of genes involved in related biological processes is coordinated and interconnected.
Introduction
Constructing modules can help to reduce the complexity of a genetic network without significant loss of
explanatory power. A gene module can be defined by two properties: first, its genes are co-bound by the
same set of transcription factors, and second, they are co-expressed with the same expression pattern.
The genes in such a module can be viewed as co-regulated, and hence are likely to share a common
biological function. Expression profiles reflect functional changes in mRNA levels under different
conditions. Genome-wide binding data, on the other hand, provide direct evidence of physical
interactions. The two data sources can therefore offer complementary information.
To determine binding events in location data, researchers have previously used a statistical model and
chosen a relatively stringent P-value threshold (0.001), with the intention of reducing false positives at
the expense of false negatives. However, the P-values form a continuum, and a strict threshold is unlikely
to produce good results. Bar-Joseph et al. (2003) introduced the GRAM algorithm, which integrates
genome-wide binding and expression data and improves on either data source alone. It is therefore
necessary to compensate for the technical limitations of the location data by integrating expression data,
allowing the P-value cutoff to be relaxed when there is sufficient supporting evidence from expression
data.
Method
As genomic location data, I used the ChIP data of Lee et al., which contain genome-wide binding
information for 113 yeast regulators. To determine the target genes of an individual transcription factor,
each gene is checked to see whether its binding p-value is less than 0.001; if so, the gene is regarded as
bound. From these initial binding sets, I construct the possible combinations of regulators.
Theoretically, the number of possible regulator sets is the sum over i of the number of ways of choosing
i regulators from N, where N is the total number of regulators and i is the number chosen. In practice,
however, the procedure generated 564 regulator sets as candidate modules.
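The construction of regulator sets can be illustrated with a minimal sketch. It is not the code used in this project: the gene and regulator names in the toy data are hypothetical, and the `min_genes` parameter (a regulator set is kept only if at least that many genes are bound by every regulator in it) is an assumption added for illustration. Only the 0.001 cutoff comes from the text. The sketch also shows why far fewer sets arise in practice than the theoretical sum, which equals 2^N - 1: only combinations that actually co-occur on some gene are generated.

```python
from itertools import combinations

P_CUTOFF = 0.001  # binding threshold stated in the text


def n_possible_sets(n_regulators):
    """Number of non-empty regulator subsets: sum_{i=1..N} C(N, i) = 2^N - 1."""
    return 2 ** n_regulators - 1


def candidate_modules(pvalues, cutoff=P_CUTOFF, min_genes=2):
    """Map each observed regulator set to the genes bound by all its regulators.

    pvalues: {gene: {regulator: binding p-value}}
    min_genes is an illustrative assumption, not a value from the text.
    """
    modules = {}
    for gene, regs in pvalues.items():
        # Regulators regarded as binding this gene (p < cutoff).
        bound = sorted(r for r, p in regs.items() if p < cutoff)
        # Every non-empty subset of the bound regulators also binds the gene.
        for size in range(1, len(bound) + 1):
            for combo in combinations(bound, size):
                modules.setdefault(frozenset(combo), set()).add(gene)
    return {regs: genes for regs, genes in modules.items()
            if len(genes) >= min_genes}


# Hypothetical toy data: three genes, three regulators.
toy = {
    "geneA": {"R1": 0.0001, "R2": 0.0005, "R3": 0.2},
    "geneB": {"R1": 0.0002, "R2": 0.0009, "R3": 0.5},
    "geneC": {"R1": 0.0003, "R2": 0.3, "R3": 0.0004},
}
mods = candidate_modules(toy)
print(len(mods), "observed sets out of", n_possible_sets(3), "possible")
# prints: 3 observed sets out of 7 possible
```

Enumerating the subsets per gene is exponential in the number of regulators bound to a single gene, so this sketch is only suitable when that number is small.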
As expression data, I chose the data set of Spellman et al. It contains 6316 yeast gene expression
profiles, each with 7 time points. As preprocessing, I filtered out genes with missing values, small
variance over time, very low absolute expression values, or low profile entropy; 683 genes remained and
were used for further analysis. After the expression profiles are clustered, each cluster contains a certain
number of genes. At this step I separated the genes as finely as possible, into 100 clusters in this case,
so that the clusters would represent the functional diversity of the genes as fully as possible. Combining
the two data sources, for each regulator combination I looked for all the genes bound by that set of
regulators. I then assigned to each candidate module the cluster numbers of the genes it contains. This
reveals modules that are related to one another in terms of their expression profiles. Grouping the
modules that fall in the same cluster, I performed re-clustering according to the module frequencies. At
this point I expected the genes to be rearranged according to their biological functions and co-regulatory
modules.
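The step of assigning cluster numbers to candidate modules can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually implemented: the function names, the toy genes, and the choice to group modules by their most frequent cluster label are all hypothetical. Genes without a cluster label (for example, genes removed by the preprocessing filter) are simply skipped.

```python
from collections import Counter, defaultdict


def module_cluster_counts(modules, gene_cluster):
    """For each module, count how often each expression cluster occurs.

    modules:      {module id: iterable of genes bound by the regulator set}
    gene_cluster: {gene: cluster number from the expression clustering}
    """
    return {
        module: Counter(gene_cluster[g] for g in genes if g in gene_cluster)
        for module, genes in modules.items()
    }


def group_by_dominant_cluster(counts):
    """Group modules under their most frequent cluster label (an assumption)."""
    groups = defaultdict(list)
    for module, counter in counts.items():
        if counter:  # skip modules whose genes were all filtered out
            label, _ = counter.most_common(1)[0]
            groups[label].append(module)
    return dict(groups)


# Hypothetical toy input: two modules and cluster labels for five genes.
modules = {
    ("R1", "R2"): ["geneA", "geneB"],
    ("R3",): ["geneC", "geneD", "geneE", "geneF"],  # geneF has no cluster label
}
gene_cluster = {"geneA": 7, "geneB": 7, "geneC": 7, "geneD": 12, "geneE": 12}

counts = module_cluster_counts(modules, gene_cluster)
groups = group_by_dominant_cluster(counts)
print(groups)
# prints: {7: [('R1', 'R2')], 12: [('R3',)]}
```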
Result
Figure 1 shows the intermediate clustering result, in which the genes are separated into 100 clusters so
that the functional diversity is represented as fully as possible. Using this clustering information, I
constructed genetic modules by adding the binding information. Figure 2 shows an example of the
constructed modules after re-clustering according to the number of modules found.
Figure 1 Intermediate clustering result with 100 clusters, chosen to separate the genes as finely as
possible and so represent the functional diversity as fully as possible
Figure 2 Result after re-clustering by the number of modules
Gene       FHL1           PDR1       RAP1
YDR450W    0.00000017     0.094      0.00055
YLR344W    0.0000033      0.88       0.00042
YDR471W    0.000000013    0.00044    0.00041
YNL096C    0.000000097    0.027      0.00013
Table 1 An example of a constructed module (binding p-values): YDR450W, YLR344W, YDR471W and YNL096C are
all ribosomal protein genes, possibly regulated by the module consisting of FHL1, PDR1 and RAP1, even
though some p-values are not statistically significant.
Figure 3 Gene information according to SGD: YDR450W, YDR471W, YLR344W and YNL096C are all ribosomal
protein genes.
Among the regulatory modules constructed, one that caught my attention involves ribosomal protein
genes; ribosomes are important protein biosynthetic machines. One of the regulators, FHL1, is known to
be associated with almost all ribosomal protein genes, but little else about it is well understood.
According to the information in SGD, FHL1 is listed for YDR450W and YNL096C, as is RAP1. There are no
known regulators of YDR471W and YLR344W. From this module, I might tentatively conclude that the four
genes are regulated by the module consisting of FHL1, PDR1 and RAP1. Even though the PDR1 binding
p-values are above 0.001 for three of the four genes (Table 1), PDR1 may still be involved in the
regulation of ribosomal protein genes.
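As a worked check on the p-values in Table 1, the short sketch below applies the 0.001 cutoff to each regulator-gene pair. The values are copied from Table 1; the function name is hypothetical and the code is only an illustration of the threshold test, not part of the original analysis.

```python
P_CUTOFF = 0.001  # binding threshold stated in the Method section

# Binding p-values copied from Table 1.
TABLE1 = {
    "YDR450W": {"FHL1": 0.00000017, "PDR1": 0.094, "RAP1": 0.00055},
    "YLR344W": {"FHL1": 0.0000033, "PDR1": 0.88, "RAP1": 0.00042},
    "YDR471W": {"FHL1": 0.000000013, "PDR1": 0.00044, "RAP1": 0.00041},
    "YNL096C": {"FHL1": 0.000000097, "PDR1": 0.027, "RAP1": 0.00013},
}


def genes_passing(table, cutoff=P_CUTOFF):
    """Return {regulator: [genes whose binding p-value is below cutoff]}."""
    passing = {}
    for gene, regs in table.items():
        for reg, p in regs.items():
            passing.setdefault(reg, [])
            if p < cutoff:
                passing[reg].append(gene)
    return passing


for reg, genes in genes_passing(TABLE1).items():
    print(reg, len(genes), genes)
# prints:
# FHL1 4 ['YDR450W', 'YLR344W', 'YDR471W', 'YNL096C']
# PDR1 1 ['YDR471W']
# RAP1 4 ['YDR450W', 'YLR344W', 'YDR471W', 'YNL096C']
```

FHL1 and RAP1 pass the strict cutoff for all four genes, while PDR1 passes only for YDR471W, which is the situation in which supporting expression evidence would be needed to keep PDR1 in the module.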
Discussion
After carrying out the analysis described above, I obtained two kinds of results. In the first case, a
module is assigned to a single cluster; I regarded this case as the correct one. In the other case, a
module is assigned to more than one cluster; the meaning of this case still needs to be interpreted, and a
way of handling it is needed.
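The two cases can be separated mechanically, which may help when deciding how to treat the second one. The sketch below is only an illustration with hypothetical module names and cluster labels; it reports the cluster composition of each multi-cluster module but does not propose how such modules should be interpreted.

```python
from collections import Counter


def classify_modules(module_clusters):
    """Split modules into single-cluster and multi-cluster cases.

    module_clusters: {module id: list of cluster numbers of its genes}
    Returns (single, multi): single maps a module to its one cluster,
    multi maps a module to its {cluster: gene count} composition.
    """
    single, multi = {}, {}
    for module, labels in module_clusters.items():
        counts = Counter(labels)
        if len(counts) == 1:
            single[module] = next(iter(counts))
        elif counts:  # modules with no labelled genes are ignored
            multi[module] = dict(counts)
    return single, multi


# Hypothetical example: M1 falls in one cluster, M2 spans two.
single, multi = classify_modules({"M1": [7, 7, 7], "M2": [7, 12, 12]})
print(single)  # prints: {'M1': 7}
print(multi)   # prints: {'M2': {7: 1, 12: 2}}
```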