An Information-theoretic Dissimilarity For Clustering Gene Expression Profiles Models

Jyotsna Kasturi, Raj Acharya, Shruthi Prabhakara
Department of Computer Science and Engineering, Penn State University

INTRODUCTION

A new method to smooth gene expression data and measure expression dissimilarity between genes is presented [Kasturi, J and Acharya, R. IJCNN 2008]. A Kullback-Leibler (KL) based clustering method for analyzing noisy, time-dependent gene expression data is proposed. The method is a two-step process:

1. Modeling gene expression profiles using Gaussian Radial Basis Functions (GRBF).
2. Computing a location-based match dissimilarity between gene profile distributions, followed by clustering.

MODELING GENE EXPRESSION PROFILES

Let G = {g1, g2, …, gN} denote the data matrix containing the expression levels of N genes measured over time. The expression profile of each gene gi can be approximated by a linear combination of ni non-linear basis functions, where ni is the number of Gaussian components and Θ = (µ, σ) gives the mean and width of each component's distribution. The parameters of the GRBF model are learned using backpropagation.

Figure: Observed data (circles), linear fit (dotted line), GRBF fit with 4 Gaussian components (solid line), and individual components (dash-dotted line).

KL-BASED DISSIMILARITY BETWEEN GENE PROFILE DISTRIBUTIONS

KL Location Match, a new dissimilarity measure, uses a matching strategy: it calculates the distance between every Gaussian component in one gene and its closest paired component in the other gene. The normalized weight of the kth component is denoted βk. The parameter τ is the threshold by which components that contribute significantly to the shape of an expression profile are selected, based on their mixture weights.

Figure: Location-based match between a GRBF with 3 components and a GRBF with 4 components.

Figure: Two Gaussian Radial Basis Functions whose components are used in the KL divergence approximation based on their mixture weights.
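The matching idea can be illustrated with a minimal Python sketch. It is not the authors' implementation; the exact formula is given in the cited paper, and the sketch rests on several assumptions: "closest" is taken to mean nearest component mean, the per-pair distance is the closed-form KL divergence between two 1-D Gaussians, τ is read as a cumulative fraction of mixture weight, and βk is the weight renormalized over the selected components. The names `kl_gauss`, `select_components`, and `kl_loc_match`, and the example parameters, are hypothetical.

```python
import numpy as np

def kl_gauss(mu0, s0, mu1, s1):
    """Closed-form KL( N(mu0, s0^2) || N(mu1, s1^2) ) for 1-D Gaussians."""
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2.0 * s1**2) - 0.5

def select_components(weights, tau):
    """Indices of the heaviest components whose cumulative normalized
    weight reaches tau (assumed reading of the threshold)."""
    w = np.abs(np.asarray(weights, dtype=float))
    w = w / w.sum()
    order = np.argsort(w)[::-1]            # heaviest component first
    cum = np.cumsum(w[order])
    n_keep = int(np.searchsorted(cum, tau - 1e-12)) + 1
    return order[:min(n_keep, len(w))]

def kl_loc_match(f, g, tau=1.0):
    """Asymmetric dissimilarity from profile model f to profile model g.
    Each model is a tuple (means, widths, weights)."""
    mu_f, s_f, w_f = (np.asarray(a, dtype=float) for a in f)
    mu_g, s_g, _ = (np.asarray(a, dtype=float) for a in g)
    keep = select_components(w_f, tau)
    beta = np.abs(w_f[keep])
    beta = beta / beta.sum()               # assumed normalization of beta_k
    total = 0.0
    for b, k in zip(beta, keep):
        j = np.argmin(np.abs(mu_g - mu_f[k]))   # nearest mean in g
        total += b * kl_gauss(mu_f[k], s_f[k], mu_g[j], s_g[j])
    return float(total)

# Hypothetical models: 3 components vs. 4 components.
f = ([1.0, 4.0, 7.0], [0.8, 1.0, 1.2], [0.5, 0.3, 0.2])
g = ([1.2, 3.5, 6.0, 9.0], [0.9, 1.1, 1.0, 1.5], [0.4, 0.3, 0.2, 0.1])

d_fg = kl_loc_match(f, g, tau=0.8)
d_gf = kl_loc_match(g, f, tau=0.8)
print(d_fg, d_gf)
```

Under these assumptions the measure is asymmetric, so `d_fg` and `d_gf` generally differ, and lowering `tau` restricts the comparison to the dominant components of the first model.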
CLUSTERING

Clustering is performed with the k-medoids procedure on the RBF-fitted genes using the KL-LocMatch dissimilarity, which may be made symmetric by taking the sum of the two asymmetric dissimilarities.

Figure: Clusters obtained using the location-based match approximation with threshold values of (A) 50%, (B) 70%, (C) 80%, and (D) 100%.

Figure: Davies-Bouldin cluster validity index calculated as a function of the number of clusters.

CONCLUSION

A new model-based approach to smoothing gene expression data from time-dependent experiments and measuring dissimilarity between profiles is proposed. Results validated on real data show that the proposed method is a powerful tool for exploratory data analysis and clustering of gene expression data. The method can be applied to evenly or unevenly spaced time-series data.
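The clustering step described under CLUSTERING can be illustrated with a small k-medoids routine that operates on a precomputed dissimilarity matrix. This is a sketch under stated assumptions rather than the authors' code: the function names `symmetrize` and `k_medoids`, the alternating assign-and-update scheme, the random initialization, and the toy matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def symmetrize(D_asym):
    """Symmetric dissimilarity as the sum of the two asymmetric directions."""
    D_asym = np.asarray(D_asym, dtype=float)
    return D_asym + D_asym.T

def k_medoids(D, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)   # random initial medoids
    for _ in range(n_iter):
        # Assign every item to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue                              # keep the old medoid
            # New medoid: the member with the smallest total dissimilarity
            # to the other members of its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Toy stand-in for pairwise KL-LocMatch values: two well-separated groups.
pos = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D_asym = np.triu(np.abs(pos[:, None] - pos[None, :]))  # one direction only
D_sym = symmetrize(D_asym)

medoids, labels = k_medoids(D_sym, k=2)
print(medoids, labels)
```

Because k-medoids needs only pairwise dissimilarities and picks actual data items as cluster centers, it works directly with a measure such as KL-LocMatch, for which no mean profile is defined.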