Download Chromatin modification-aware network model - Bio

Chromatin modification-aware network model based on ARACNe

Cho, Young Mi
Department of Biological Science, KAIST

Abstract

DNA microarray experiments can measure the mRNA expression level of all the genes of an organism, providing a “genomic” viewpoint on gene expression. However, gene expression is controlled not only by genetic mechanisms but also by epigenetic regulatory mechanisms, which play a significant role in its regulation. In this study, we develop a “Chromatin modification-aware network model” that integrates epigenetic regulatory mechanisms into a well-known method for inferring gene regulatory networks, ARACNe. Using ChIP-chip data and the histone modification patterns of genes, we supply prior information about edges and nodes and use it to compose an epigenetic regulatory network with the ARACNe algorithm.

1. Introduction

Inferring gene regulatory networks is one of the main goals of functional genomics. The development of microarray technology has provided large-scale gene expression profile data. DNA microarray experiments can measure the mRNA expression level of all the genes of an organism, providing a “genomic” viewpoint on gene expression. However, organizing gene expression profile data into functionally meaningful genetic information has proven difficult and has so far fallen short of uncovering the intricate structure of cellular interactions. Several methods are available for this challenge, which is called network reverse engineering or deconvolution: optimization methods, which maximize a scoring function over alternative network models [2,3] such as Bayesian networks and Boolean networks; regression techniques, which fit the data to a priori models; integrative bioinformatics approaches, which combine data from a number of independent experimental clues; and statistical methods, which rely on a variety of measures of pairwise gene expression correlation.
Recently, the “epigenetic” viewpoint on gene expression has received increasing emphasis. Epigenetics is the study of epigenetic inheritance, a set of reversible heritable changes in gene function or other cell phenotypes that occur without a change in DNA sequence (genotype). It has been understood for some time that many diseased cells, particularly those in cancer tumors, have altered epigenetic patterns. Only more recently, however, has the role of epigenetic modification mechanisms begun to be appreciated as a new way of attacking cancer. Gene expression is controlled not by genetic regulation alone but also by epigenetic regulation and post-transcriptional regulation. Nevertheless, previous work on gene regulatory networks has ignored the epigenetic part of gene expression and has been concerned mainly with the genetic mechanism.

In this work, we develop a Chromatin modification-aware network model that integrates epigenetic regulatory mechanisms into a well-known method for inferring gene regulatory networks, ARACNe. ARACNe [1] (an Algorithm for the Reconstruction of Accurate Cellular Networks) was designed for the genome-wide reverse engineering of large-scale cellular networks. ARACNe was comparable to Bayesian networks in sensitivity and largely superior in precision [2]. To capture the epigenetic state of gene regulation, we introduce ChIP-chip data and the histone modification pattern of the regulatory region of each gene. ChIP-chip data offers prior information about the edges of the regulatory network, and the histone modification pattern of the regulatory region of a gene provides prior information about the nodes of the network. With these two kinds of prior information, we build a scoring system to determine the edges and nodes of the network more precisely.

2. Background

2.1 Theoretical Background of ARACNe [1]

Temporal gene expression data is difficult to obtain for higher eukaryotes, and cellular populations harvested from different individuals generally capture random steady states of the underlying biochemical dynamics. Therefore only steady-state statistical dependences can be studied. The joint probability distribution (JPD) of the stationary expressions of all genes, $P(\{q_i\})$, $i = 1, \dots, N$, can be written as:

$$P(\{q_i\}_{i=1}^{N}) = \frac{1}{Z} \exp\Big[ -\sum_{i} \phi_i(q_i) - \sum_{i,j} \phi_{ij}(q_i, q_j) - \sum_{i,j,k} \phi_{ijk}(q_i, q_j, q_k) - \cdots \Big] \equiv \frac{1}{Z} e^{-H(\{q_i\})} \qquad (1)$$

where N is the number of genes, Z is the normalization factor, also called the partition function, the $\phi$ are potentials, and $H(\{q_i\})$ is the Hamiltonian that defines the system’s statistics. Within this model, a set of variables interacts if and only if the single potential that depends exclusively on these variables is nonzero. ARACNe aims precisely at identifying which of these potentials are nonzero, and at eliminating the others even though their corresponding marginal JPDs may not factorize.

2.2 Approximations of the interaction structure

Since typical microarray sample sizes are relatively small, inferring the exponential number of potential n-way interactions of Eq. (1) is infeasible, and a set of simplifying assumptions must be made about the dependency structure. The simplest model is one where genes are assumed independent, i.e., $H(\{q_i\}) = \sum_i \phi_i(q_i)$, such that first-order potentials can be evaluated from the marginal probabilities, $P(q_i)$, which are estimated from experimental observations. A sample size of M > 100 is generally sufficient to estimate 2-way marginals in genomics problems, while estimating 3-way marginals $P(q_i, q_j, q_k)$ requires about an order of magnitude more samples. Within the pairwise approximation $H(\{q_i\}) = \sum_i \phi_i(q_i) + \sum_{i,j} \phi_{ij}(q_i, q_j)$, all genes for which $\phi_{ij} = 0$ are declared mutually non-interacting. This includes genes that are statistically independent (i.e., $P(q_i, q_j) \approx P(q_i) P(q_j)$) as well as genes that do not interact directly but are statistically dependent due to their interaction via other genes (i.e., $\phi_{ij} = 0$, but $P(q_i, q_j) \neq P(q_i) P(q_j)$).

3. Algorithm

In the ARACNe algorithm, under the assumption of a two-way network, all statistical dependencies can be inferred from pairwise marginals, and no higher-order analysis is needed. The first step is to identify candidate interactions by estimating the mutual information (MI) between pairwise gene expression profiles. The MIs are then filtered using an appropriate threshold, computed for a specific p-value under the null hypothesis of two independent genes.

[Figure 1]

We now additionally develop a method that introduces epigenetic regulatory mechanisms into the inference of the gene regulatory network. To capture the epigenetic state of gene regulation, we introduce ChIP-chip data and the histone modification pattern of the regulatory region of each gene. ChIP-chip data offers prior information about the edges of the regulatory network, and the histone modification pattern of the regulatory region of a gene provides information about the nodes of the network. With these two kinds of prior information, we build a scoring system to determine the edges and nodes of the network more precisely.

ChIP-chip, a tool for genome-scale mapping of in vivo protein–DNA interactions, allows global views of transcription factor binding and provides physical interaction data for two nodes. This location data is used as prior information for determining edges. When ChIP-chip indicates that the gene product of A binds to the promoter of B, the edge between A and B is strengthened. Conversely, if there is no binding between two genes, the edge between them is weakened.

In addition, the histone modification patterns of a gene region give us the epigenetic state of the nodes of the network. Histone modification patterns such as H3K9Ac, H3K14Ac and H3K4 tri-Me are highly associated with transcription level. The histone modification pattern of each node can be classified into one of two states [5]: “active” or “inactive”.
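As a minimal sketch of the ChIP-chip edge prior described above, the following Python function rescales pairwise mutual information values according to whether a binding event was observed. The text does not state how strongly an edge is strengthened or weakened, so the multiplicative factors `boost` and `damp`, the function name `apply_chip_prior`, and the dictionary-based data layout are all assumptions made for illustration only.

```python
def apply_chip_prior(mi, bound_pairs, boost=1.5, damp=0.5):
    """Rescale pairwise MI values using a ChIP-chip binding prior.

    mi          : dict mapping (gene_a, gene_b) -> mutual information
    bound_pairs : set of (regulator, target) pairs with observed binding
    boost, damp : hypothetical factors (> 1 strengthens, < 1 weakens);
                  the source text does not specify their values
    """
    adjusted = {}
    for (a, b), value in mi.items():
        # MI is symmetric, so binding in either direction supports the edge.
        is_bound = (a, b) in bound_pairs or (b, a) in bound_pairs
        adjusted[(a, b)] = value * (boost if is_bound else damp)
    return adjusted


# Toy example with three genes; only the product of A binds the promoter of B.
mi = {("A", "B"): 0.40, ("A", "C"): 0.20, ("B", "C"): 0.30}
bound_pairs = {("A", "B")}
adjusted = apply_chip_prior(mi, bound_pairs)
```

In this toy example the A-B edge is scaled up while the two unbound edges are scaled down. A multiplicative rule is only one possible choice; an additive score or a log-odds prior would serve the same illustrative purpose.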
Histone modification data is used as prior information about the nodes that make up the network. If a node is marked “active” by its histone modification pattern, all the edges connected to this node are strengthened. If a node is marked “inactive”, all the edges connected to this node are weakened. The strengthened or weakened mutual information of each gene pair is then passed to the edge-determining step, the data processing inequality (DPI). The result is a regulatory network that takes the epigenetic state into account.

3.1 Mutual Information

Mutual information for a pair of discrete random variables, x and y, is defined as I(x,y) = S(x) + S(y) - S(x,y), where S(t) is the entropy of an arbitrary variable t. Entropy for a discrete variable is defined as the negative of the average log probability of its states:

$$S(t) = -\langle \log p(t_i) \rangle = -\sum_i p(t_i) \log p(t_i)$$

where p(ti) = Pr(t = ti) is the probability associated with each discrete state or value of the variable. If the variable is continuous, the entropy is replaced by the differential entropy, which has the same definition as S(t) in the preceding equation but with the summation replaced by an integral and the discrete distribution replaced by a probability density.

To estimate the entropy, we use the property that mutual information is invariant under any invertible reparameterization of either x or y. This can be expressed as I(x' = f1(x), y' = f2(y)) = I(x,y), with both f1 and f2 being invertible. Here, the data are reparameterized using a rank transformation that projects the Nm measurements for each gene onto equally spaced real numbers in the interval [0,1], preserving their original order. This transformation is also called a copula, and it has the advantage of transforming the probability density of the individual variables into a constant, p(x') = p(y') = 1. Under this transformation, both S(x') and S(y') become constant and equal to zero. As a result, only S(x',y') must be estimated.
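The rank (copula) transformation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the original implementation: the function name `copula_transform` is hypothetical, and ties are broken by original position, which is an assumption the text does not address.

```python
import math


def copula_transform(values):
    """Map measurements to equally spaced ranks in [0, 1], preserving order.

    Ties are broken by original position (an assumption of this sketch).
    """
    n = len(values)
    # Indices of the measurements, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (n - 1) if n > 1 else 0.0
    return ranks


# The transform depends only on the ordering of the data, so any strictly
# increasing reparameterization (here exp) gives the same result.
values = [5.2, 0.3, 9.1, 2.0]
ranks = copula_transform(values)
ranks_exp = copula_transform([math.exp(v) for v in values])
```

Because `ranks` and `ranks_exp` coincide, any quantity computed from the transformed data is unchanged by a strictly increasing reparameterization of the input, which mirrors the invariance property of mutual information.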
For the synthetic analysis, this is done using a Gaussian kernel estimator, where

$$I(x,y) = \frac{1}{N_m} \sum_{i} \log \frac{p(x_i, y_i)}{p(x_i)\, p(y_i)}$$

Here, p(xi), p(yi) and p(xi,yi) are defined as

$$p(x_i) = \frac{1}{\sqrt{2\pi}\, N_m d_1} \sum_{j} \exp\Big( -\frac{(x_i - x_j)^2}{2 d_1^2} \Big), \qquad p(x_i, y_i) = \frac{1}{2\pi N_m d_2^2} \sum_{j} \exp\Big( -\frac{(x_i - x_j)^2 + (y_i - y_j)^2}{2 d_2^2} \Big)$$

with p(yi) defined analogously to p(xi). The optimal values of the smoothing parameters d1 and d2 are obtained from Monte Carlo simulations, using a wide range of bivariate normal probability densities. For a large set of cell expression profiles, a slightly less accurate but much more computationally efficient approximation is preferable.

3.2 Statistical threshold for mutual information

For each value of Nm in the synthetic data analysis, the P value associated with a given value of mutual information under the null hypothesis is obtained by Monte Carlo simulation using 10,000 iterations. The null hypothesis corresponds to pairs of nodes that are disconnected from the network and from each other. These follow random-walk dynamics in the range [1,100], with a noise term drawn from a uniform probability density over the interval [-10,10]. For the analysis of experimental data, the P value is also computed by Monte Carlo simulation. Because a null-hypothesis dynamical model is not available in that case, the null hypothesis is defined as a pair of existing genes whose values are randomly shuffled at each iteration with respect to the microarray profile in which they were observed.

3.3 Data Processing Inequality

First, define two genes, x and y, as indirectly interacting through a third gene, z, if the conditional MI I(x,y|z) is equal to zero. The two genes are directly interacting if no such third gene exists, implying that there is a direct transfer of information between them. The DPI asserts that if both (x,y) and (y,z) are directly interacting, and (x,z) are indirectly interacting through y, then I(x,z) ≤ I(x,y) and I(x,z) ≤ I(y,z). This inequality is not symmetric, meaning that there may be situations where the triangle inequality is satisfied but x and z are nevertheless directly interacting.
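As a minimal sketch of how the inequality above can be checked, the following Python function scans every fully connected gene triplet and flags an edge whose mutual information is no larger than that of the other two edges. The function name `dpi_indirect_edges`, the `frozenset`-keyed dictionary, and the optional `tolerance` argument (which tightens the comparison by a factor 1 - tolerance) are assumptions of this sketch, not part of the original method description.

```python
from itertools import combinations


def dpi_indirect_edges(mi, tolerance=0.0):
    """Return edges that satisfy the DPI inequality in at least one triplet.

    mi        : dict mapping frozenset({gene_a, gene_b}) -> mutual information;
                a missing pair means there is no candidate edge
    tolerance : hypothetical fraction in [0, 1); an edge is flagged only if
                its MI is <= (1 - tolerance) times each of the other two
    """
    genes = sorted({gene for pair in mi for gene in pair})
    flagged = set()
    for x, y, z in combinations(genes, 3):
        edges = [frozenset((x, y)), frozenset((y, z)), frozenset((x, z))]
        if not all(edge in mi for edge in edges):
            continue  # the DPI is only tested on fully connected triplets
        for candidate in edges:
            others = [edge for edge in edges if edge != candidate]
            if all(mi[candidate] <= mi[o] * (1.0 - tolerance) for o in others):
                flagged.add(candidate)
    return flagged


# Toy triplet: A-C carries the least information, so it is flagged.
mi = {
    frozenset(("A", "B")): 0.8,
    frozenset(("B", "C")): 0.6,
    frozenset(("A", "C")): 0.3,
}
indirect = dpi_indirect_edges(mi)
kept_with_tolerance = dpi_indirect_edges(mi, tolerance=0.6)
```

With the default tolerance only the A-C edge is flagged. With a large tolerance of 0.6 the A-C edge is no longer flagged, because 0.3 is not below 0.6 × (1 - 0.6) = 0.24.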
As a result, by applying the DPI to discard indirect interactions (i.e., (x,z) relationships for which the inequality is satisfied), we may be discarding some direct interactions as well. These are of two kinds: (i) cyclic or acyclic loops with exactly three genes and (ii) sets of three genes whose information exchange is not completely captured by the pairwise marginals. A typical example of the latter is the Boolean operator XOR, for which the mutual information between any subpair of the three variables is zero.

A percent tolerance is applied to the DPI to account for inaccurate estimates of the difference between two close mutual information values. This is implemented by rewriting the DPI with a percent tolerance threshold ε: I(x,z) ≤ I(x,y)[1 - ε] and I(x,z) ≤ I(y,z)[1 - ε]. This has the advantage of avoiding the rejection of some borderline edges, which allows some loops of size three to occur in the predicted topology.

References

1. Margolin, A. A. et al., 2004, ARACNe: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context
2. Basso, K. et al., 2005, Reverse engineering of regulatory networks in human B cells, Nature Genetics 37, 382-388
3. Bulashevska, S., Eils, R., 2005, Inferring genetic regulatory logic from expression data, Bioinformatics 21, 2706-2713
4. Friedman, N., Linial, M., Nachman, I. & Pe’er, D., 2000, Using Bayesian networks to analyze expression data, J. Comput. Biol. 7, 601-620
5. Liu, C. L., Kaplan, T., Kim, M., Buratowski, S., Schreiber, S. L., Friedman, N. & Rando, O. J., 2005, Single-Nucleosome Mapping of Histone Modifications in S. cerevisiae, PLoS Biol. 3(10): e328
6. Sunjae Lee’s lab seminar presentation.
