An Information-Theoretic Dissimilarity for Clustering Gene Expression Profile Models
Jyotsna Kasturi, Raj Acharya, Shruthi Prabhakara
Department of Computer Science and Engineering, Penn State University
INTRODUCTION
A new method to smooth gene expression data and to measure expression dissimilarity between genes
is presented [Kasturi, J. and Acharya, R., IJCNN 2008]. A Kullback-Leibler (KL) based clustering method is
proposed for analyzing noisy, time-dependent gene expression data. The method is a two-step process:
1. Modeling gene expression profiles using Gaussian Radial Basis Functions (GRBF).
2. Computing a Location-based Match dissimilarity between gene profile distributions, followed by clustering.
MODELING GENE EXPRESSION PROFILES
Let G = {g1, g2, …, gN} denote the data matrix containing the expression
levels of N genes measured over time. The expression profile of
each gene gi can be approximated by a linear combination of ni
non-linear (Gaussian) basis functions, where ni is the number of
Gaussian components and Θ = (µ, σ) denotes the mean and width of
each component.
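A GRBF expansion of this kind is commonly written as shown here. The weight symbol w_{ik} and the exact parameterization of the width are assumptions for illustration and are not taken from the poster.

```latex
g_i(t) \;\approx\; \sum_{k=1}^{n_i} w_{ik}\,
  \exp\!\left( -\frac{(t - \mu_{ik})^2}{2\,\sigma_{ik}^2} \right),
\qquad
\Theta_i = \{(\mu_{ik}, \sigma_{ik})\}_{k=1}^{n_i}
```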
Figure: Observed data (circles), linear fit (dotted line), GRBF fit with 4 Gaussian components (solid line), and individual components (dash-dotted line).
The parameters of the GRBF model are learned using backpropagation.
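As a rough illustration of fitting such a model by gradient-based learning, the sketch below fits a weighted sum of Gaussians to one profile with plain gradient descent on the squared error. The function names (grbf, fit_grbf), the initialization, the learning rate, the width floor and the synthetic data are all assumptions made for this example; the poster does not specify these details.

```python
import numpy as np

def grbf(t, w, mu, sigma):
    """Evaluate sum_k w[k] * exp(-(t - mu[k])^2 / (2 * sigma[k]^2)) at times t."""
    z = (t[:, None] - mu[None, :]) / sigma[None, :]
    return np.exp(-0.5 * z ** 2) @ w

def fit_grbf(t, y, n_components=4, lr=0.01, n_iter=5000, seed=0):
    """Fit weights, means and widths by gradient descent on 0.5 * mean squared error."""
    rng = np.random.default_rng(seed)
    span = t.max() - t.min()
    mu = np.linspace(t.min(), t.max(), n_components)   # centers spread over the time axis
    sigma = np.full(n_components, span / n_components)  # equal initial widths
    w = rng.normal(scale=0.1, size=n_components)        # small random weights
    floor = 0.05 * span                                  # assumed lower bound on widths
    for _ in range(n_iter):
        diff = t[:, None] - mu[None, :]                  # shape (T, K)
        phi = np.exp(-0.5 * (diff / sigma) ** 2)         # basis activations, shape (T, K)
        err = phi @ w - y                                # residuals, shape (T,)
        common = err[:, None] * phi * w[None, :]         # shared factor of the chain rule
        grad_w = phi.T @ err / len(t)
        grad_mu = (common * diff / sigma ** 2).sum(axis=0) / len(t)
        grad_sigma = (common * diff ** 2 / sigma ** 3).sum(axis=0) / len(t)
        w -= lr * grad_w
        mu -= lr * grad_mu
        sigma = np.maximum(sigma - lr * grad_sigma, floor)
    return w, mu, sigma

# Synthetic profile sampled at unevenly spaced time points (illustrative data only).
t_obs = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 5.0, 7.0, 10.0])
y_obs = (1.5 * np.exp(-0.5 * (t_obs - 2.0) ** 2)
         + 0.8 * np.exp(-0.5 * ((t_obs - 7.0) / 1.5) ** 2))
w, mu, sigma = fit_grbf(t_obs, y_obs)
print("mean squared error:", np.mean((grbf(t_obs, w, mu, sigma) - y_obs) ** 2))
```

Because the model is a function of the time value itself, nothing in the fit requires the sampling times to be evenly spaced.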
KL- BASED DISSIMILARITY BETWEEN GENE PROFILE DISTRIBUTIONS
KL Location Match, a new dissimilarity measure, uses a matching
strategy: it calculates the distance between every Gaussian
component in one gene and its closest paired component in the
other gene. The normalized weight for the kth component is
denoted βk. The parameter τ is the threshold value used to select,
based on mixture weights, the components that contribute
significantly to the shape of an expression profile.
Figure: Location-based match between a GRBF with 3 components and a GRBF with 4 components.
Figure: Two Gaussian Radial Basis Functions whose components are used in the KL divergence approximation based on their mixture weights.
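The matching idea can be sketched as follows. The KL divergence between two univariate Gaussians has a standard closed form, which is what kl_gauss computes. Everything else here is an assumption for illustration: selecting the heaviest components until their cumulative weight share reaches τ, renormalizing the kept weights to obtain βk, and pairing components by nearest mean are one plausible reading of the description, not the poster's stated formulas.

```python
import numpy as np

def kl_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL( N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2) )."""
    return (np.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def select_components(w, tau):
    """Indices of the heaviest components whose cumulative weight share reaches tau (assumed rule)."""
    share = np.abs(w) / np.abs(w).sum()
    order = np.argsort(share)[::-1]              # heaviest component first
    cum = np.cumsum(share[order])
    n_keep = int(np.searchsorted(cum, tau - 1e-12)) + 1
    return order[:min(n_keep, len(w))]

def kl_loc_match(gene_a, gene_b, tau=0.8):
    """Directed dissimilarity from gene_a to gene_b; each gene is a tuple (w, mu, sigma)."""
    w_a, mu_a, sg_a = gene_a
    w_b, mu_b, sg_b = gene_b
    keep_a = select_components(w_a, tau)
    keep_b = select_components(w_b, tau)
    beta = np.abs(w_a[keep_a]) / np.abs(w_a[keep_a]).sum()   # normalized weights (assumed form)
    total = 0.0
    for b_k, k in zip(beta, keep_a):
        # Location-based match: pair with the kept component of gene_b whose mean is closest.
        j = keep_b[np.argmin(np.abs(mu_b[keep_b] - mu_a[k]))]
        total += b_k * kl_gauss(mu_a[k], sg_a[k], mu_b[j], sg_b[j])
    return float(total)

# Two hypothetical fitted genes with 3 and 4 components.
gene_1 = (np.array([1.0, 0.6, 0.2]), np.array([1.0, 4.0, 8.0]), np.array([1.0, 1.5, 1.0]))
gene_2 = (np.array([0.9, 0.5, 0.4, 0.1]), np.array([1.5, 3.0, 6.0, 9.0]), np.array([1.2, 1.0, 2.0, 1.0]))
print(kl_loc_match(gene_1, gene_2), kl_loc_match(gene_2, gene_1))
```

The two directed values generally differ, since both the KL divergence and the matching direction are asymmetric.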
CLUSTERING
Clustering is performed with the k-medoids procedure on the GRBF-fitted genes using the KL-LocMatch
dissimilarity. Because this dissimilarity is asymmetric, it may be made symmetric by summing the two directed dissimilarities.
Figure: Clusters obtained using the Location-based Match approximation with threshold values of (A) 50%, (B) 70%, (C) 80%, (D) 100%.
Figure: Davies-Bouldin cluster validity index calculated for varying numbers of clusters.
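A minimal sketch of this clustering stage on a precomputed dissimilarity matrix is given below. It uses a simple alternating k-medoids (assign each item to its nearest medoid, then re-pick each medoid), which is one common variant and not necessarily the procedure used by the authors. The davies_bouldin function is a medoid-based adaptation of the index for precomputed dissimilarities; it assumes distinct medoids have non-zero dissimilarity. All function names and the toy data are illustrative assumptions.

```python
import numpy as np

def symmetrize(directed):
    """Symmetric dissimilarity as the sum of the two directed dissimilarities."""
    return directed + directed.T

def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed symmetric dissimilarity matrix."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)      # assign to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue                                   # keep old medoid for an empty cluster
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]    # member with smallest total dissimilarity
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

def davies_bouldin(dist, medoids, labels):
    """Davies-Bouldin index with medoids standing in for centroids (lower is better)."""
    k = len(medoids)
    scatter = np.array([dist[labels == c, medoids[c]].mean() for c in range(k)])
    worst = []
    for i in range(k):
        ratios = [(scatter[i] + scatter[j]) / dist[medoids[i], medoids[j]]
                  for j in range(k) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Toy directed dissimilarities for six items forming two well-separated groups.
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
directed = np.abs(x[:, None] - x[None, :])
directed[np.triu_indices(len(x), 1)] *= 1.5               # make the matrix asymmetric
sym = symmetrize(directed)
medoids, labels = k_medoids(sym, k=2)
print(labels, davies_bouldin(sym, medoids, labels))
```

Repeating the last two lines for several values of k and comparing the index values is one way to choose the number of clusters.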
CONCLUSION
A new model-based approach to smoothing gene expression data from time-dependent experiments and
measuring dissimilarity between expression profiles is proposed. Validation on real data shows that the
proposed method is a powerful tool for exploratory data analysis and for clustering gene expression data.
The method can be applied to both evenly and unevenly spaced time-series data.