* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Abstract
Oncogenomics wikipedia , lookup
Site-specific recombinase technology wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Metagenomics wikipedia , lookup
History of genetic engineering wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Long non-coding RNA wikipedia , lookup
Essential gene wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Public health genomics wikipedia , lookup
Pathogenomics wikipedia , lookup
Microevolution wikipedia , lookup
Designer baby wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Genome evolution wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Gene expression programming wikipedia , lookup
Mir-92 microRNA precursor family wikipedia , lookup
Genomic imprinting wikipedia , lookup
Genome (book) wikipedia , lookup
Minimal genome wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Ridge (biology) wikipedia , lookup
Bis732. Bio-Network, Draft for Term-Project 20063562 Hyeyoung Cho Construction of Genetic Modules by Utilizing Binding and Expression Data Abstract Motivation : Using gene expression profiles and genome-wide location data together, I attempted to discover gene modules to gain a biological insight into the patterns of combinatorial regulation and how the activity of genes involved in related biological processes is coordinated and interconnected. Introduction Construction of modules can help to reduce genetic network complexity without significant loss of explanatory power. Gene modules can be defined in the sense that first they are co-bound by the same set of transcription factors and second they are co-expressed with the same expression pattern. Maybe this can be viewed as that the genes in the module are co-regulated, and hence likely to have a common biological function. Expression profile reflects functional changes in mRNA levels in different conditions. On the other hand, genome-wide binding data suggests other approaches, since this data provides direct evidence of physical interactions. These two data sources can offer complementary information. To determine binding events in location data, researchers have previously used a statistical model and chosen a relatively stringent P-value threshold (0.001) with the intention of reducing false positives at the expense of false negatives. However the P values form a continuum and a strict threshold is unlikely to produce good results. In the work of Bar-Joseph and others in 2003, they introduced the GRAM algorithm which integrates genome-wide binding and expression data and improves on either data source alone. Therefore, it is necessary to compensate for technical limitations in the location data through the integration of expression data allowing the P-value cutoff to be relaxed if there is sufficient supporting evidence from expression data. Method As a genomic location data, I used Lee et al.'s ChIP data which contains genome-wide binding information of 113 yeast regulators. In order to determine target genes of an individual transcription factor, each gene is checked to see if the corresponding p-value is less than 0.001, which would be regarded as true. With these initiating sets of binding data, I construct all possible combinations of regulator. Theoretically the number of possible sets of regulator is the summation of the combination of choosing i from N, which N and i denote the number of all regulators and those of chosen, respectively. However, it finally generated 564 sets of module. As an expression data, I chose the one which is from Spellman et. al.. It contains 6316 yeast whole genomic profiles with 7 time points. As a preprocessing, I filtered out genes with missing values, small variance over time, very low absolute expression values and those with low entropy of profiles, there remain 683 genes which are used for further analysis. After clustering of expression profiles, each cluster comes to have a certain number of genes. At this step, I make the genes be separated as many as possible, in this case, 100 clusters, with the intention for the genes to be distinguished fully representing the functional diversity. With these data together, for each regulator combination, I looked for all the genes bound by this set of regulators. And then I assigned cluster numbers to each candidate module corresponding to the genes which the module itself has. Then we can find some modules that are related to one another in terms of the results of expression profiles. Putting all these modules in the same cluster together, I performed re-clustering according to the frequencies of module. At this moment, I was expecting for the genes to be re-arranged as their biological functions and co-regulatory modules. Result Figure 1 shows intermediate clustering results which have 100 clusters intending for the genes to be separated as many as possible for fully representing the functional diversity. With this clustering information, I constructed genetic modules after adding the binding information. Figure 2 demonstrates an example of constructed modules after re-clustering according to the number of the found modules. Figure 1 Intermediate clustering results which have 100 clusters for intending to separate genes as many as possible for fully representing the functional diversity Figure 2 Results after re-clustering by the number of modules FHL1 PDR1 RAP1 YDR450W 0.00000017 0.094 0.00055 YLR344W 0.0000033 0.88 0.00042 YDR471W 0.000000013 0.00044 0.00041 YNL096C 0.000000097 0.027 0.00013 Table 1 An example of constructed module : YDR450W, YLR344W, YDR471W and YNL096C, all genes are involved in ribosomal protein genes which are possibly regulated by the module including FHL1, PDR1 and RAP1 even though some pvalues do not appear as statistically significant. Figure 3 Information of genes according to SGD : YDR450W, YDR471W, YLR344W and YNL096C, all genes are involved in ribosomal protein genes. Among regulatory modules constructed, one module that caught my attention involves ribosomal protein genes; ribosomes are important protein biosynthetic machines. One of the regulators, FHL1 is known to appear almost all ribosomal protein genes, but little else is well understood. According to the information from SGD, FHL1 appears to YDR450W and YNL096W, additionally RAP1 as well. There are no known regulators of YDR471W and YLR344W. Through this resulting module, maybe I might conclude that the four genes are regulated by the module including FHL1, PDR1 and RAP1. Even though PDR1 does not appear in the p-value upper than 0.001, there might be some possibilities for the PDR1 to be involved in the regulation of ribosomal protein genes. Discussion After carrying out such analysis as I’ve explained so far, I can get two kinds of results. In the first case, there are modules which are assigned by one cluster; I regarded this case as right one. And In another case, there are modules which are assigned by more than one cluster, I think it is necessary to interpret the meaning and need a way to handle the case.