
Cluster Overlap Distribution Map (CODM)
- Software
by: Makoto Kano, Shuichi Tsutsumi, Nobutaka Kawahara, Yan
Wang, Akitake Mukasa, Takaaki Kirino and Hiroyuki Aburatani
Presenter: Jia Meng
What is CODM?
Cluster Overlap Distribution Map (CODM) is a visualization
methodology.
CODM compares the clustering results generated under two
different conditions.
Background, Problem & Objective



Advances in microarray technologies have made it possible to
measure the expression of 30,000 genes at the same time.
Nobody can handle tens of thousands of genes individually.
Clustering is an important approach to handling them.
Problem

Although many clustering algorithms have been proposed,
there are few effective methods for comparing
clustering results obtained under different conditions.
Objective:
Compare clustering results obtained under different conditions.
Basic Idea: Format

The two cluster sets are mapped
to the X-axis and the Y-axis,
respectively.

The statistical evaluation value
of the overlap between two
clusters, one selected from each
cluster set, is
displayed as the height of a
block.
Basic Idea: Computing the Block Height
$E(g, n_{x_i}, n_{y_j}, k_{ij})$ represents the statistical evaluation value of the
overlap between clusters $X_i$ and $Y_j$:
$g$ is the total number of genes
$n_{x_i}$ is the number of genes in cluster $X_i$
$n_{y_j}$ is the number of genes in cluster $Y_j$
$k_{ij}$ is the number of genes common to $X_i$ and $Y_j$

We evaluate the number of common genes between the two
clusters by using the hypergeometric probability distribution.
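
The four quantities listed above can be counted directly from two hard cluster assignments. The following is a minimal sketch, not the CODM software's own code: the function name `overlap_counts` and the input format (one cluster label per gene for each condition) are assumptions made for illustration.

```python
from collections import Counter

def overlap_counts(labels_x, labels_y):
    """Count genes per cluster and per cluster pair from two hard assignments.

    labels_x[i] and labels_y[i] are the cluster labels of gene i under the
    two conditions (a hypothetical input format, not CODM's file format).
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both assignments must cover the same genes")
    g = len(labels_x)                      # total number of genes
    n_x = Counter(labels_x)                # n_x[i]: genes in cluster X_i
    n_y = Counter(labels_y)                # n_y[j]: genes in cluster Y_j
    k = Counter(zip(labels_x, labels_y))   # k[(i, j)]: genes in both X_i and Y_j
    return g, n_x, n_y, k

# Toy example: six genes clustered under two conditions.
labels_x = ["X1", "X1", "X1", "X2", "X2", "X2"]
labels_y = ["Y1", "Y1", "Y2", "Y2", "Y2", "Y2"]
g, n_x, n_y, k = overlap_counts(labels_x, labels_y)
print(g, n_x["X1"], n_y["Y2"], k[("X1", "Y2")])  # 6 3 4 1
```

Each cluster pair (X_i, Y_j) then supplies one tuple (g, n_x[i], n_y[j], k[(i, j)]) to be evaluated.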

Core idea

Assuming that gene clusters are generated by random selection
from the total set of genes, what we need is the probability of
observing at least k overlapping genes between n1 randomly selected
genes and n2 randomly selected genes from among all g genes.
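
The random-selection assumption can be checked empirically on a small case. The sketch below is an illustration only: the function name, trial count, and seed are arbitrary choices, and it estimates the probability by repeated random draws rather than by the exact calculation.

```python
import random

def simulate_overlap_probability(g, n1, n2, k, trials=20_000, seed=0):
    """Estimate P(overlap >= k) by drawing two random gene sets many times."""
    rng = random.Random(seed)
    genes = range(g)
    hits = 0
    for _ in range(trials):
        set1 = set(rng.sample(genes, n1))   # first random cluster
        set2 = rng.sample(genes, n2)        # second random cluster
        overlap = sum(1 for gene in set2 if gene in set1)
        if overlap >= k:
            hits += 1
    return hits / trials

# 10 genes, clusters of 4 and 5 genes, at least 3 genes in common.
estimate = simulate_overlap_probability(10, 4, 5, 3)
print(estimate)
```

For this toy case the exact hypergeometric tail is 66/252, about 0.26, and the estimate should land close to that value.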
Algorithm

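The probability of observing at least k overlapping genes
between n1 randomly selected genes and n2 randomly selected genes from
among all g genes is
$$P(g, n_1, n_2, k) = \sum_{i=k}^{\min(n_1, n_2)} \frac{\binom{n_1}{i}\binom{g-n_1}{n_2-i}}{\binom{g}{n_2}}$$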
When the P-value is small, the overlap is regarded as statistically
meaningful.
We defined the evaluation value of the overlap as:
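
A minimal sketch of both steps follows. The P-value is the exact hypergeometric tail probability described above. The evaluation value is assumed here, purely for illustration, to be the negative base-10 logarithm of the P-value, so that smaller P-values give taller blocks. That form is an assumption and is not confirmed by the slides, and the function names are hypothetical.

```python
from math import comb, inf, log10

def overlap_p_value(g, n1, n2, k):
    """Exact hypergeometric tail: P(at least k common genes)."""
    # Sum the probabilities of overlaps of size k, k+1, ..., min(n1, n2).
    tail = sum(comb(n1, i) * comb(g - n1, n2 - i)
               for i in range(k, min(n1, n2) + 1))
    return tail / comb(g, n2)

def evaluation_value(g, n1, n2, k):
    """ASSUMED form E = -log10(P); not confirmed by the slides."""
    p = overlap_p_value(g, n1, n2, k)
    return inf if p == 0.0 else -log10(p)

# 10 genes, clusters of 4 and 5 genes, at least 3 genes in common.
p = overlap_p_value(10, 4, 5, 3)
print(p, evaluation_value(10, 4, 5, 3))
```

In this toy case the tail sums to 66/252, roughly 0.26, so an overlap of three genes would not be regarded as statistically meaningful.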
Example
Data acquired from two environments
Compute CODM
Hidden Block
Hidden blocks (When dealing with hierarchical clustering results)
About CODM software




CODM is available on the web site (http://www.genome.rcast.utokyo.ac.jp/CODM).
It runs on a PC with Windows 2000 or Windows XP.
Memory requirements are proportional to the square of the
number of genes to be analyzed.
In addition, a machine with a graphics card that provides hardware
acceleration for OpenGL is recommended.
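
To give a feel for the quadratic memory requirement noted above, the short sketch below compares gene counts under the assumption of pure n-squared scaling. The function name is hypothetical, and no absolute memory figure is implied because the slides do not state one.

```python
def relative_memory(n_genes, baseline_genes):
    """Memory for n_genes relative to baseline_genes, assuming pure n^2 scaling."""
    return (n_genes / baseline_genes) ** 2

print(relative_memory(20_000, 10_000))  # 4.0: twice the genes, four times the memory
print(relative_memory(30_000, 10_000))  # 9.0: three times the genes, nine times the memory
```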
Future Work

This method can help detect similarity between two clustering
results, but how can similar structure be detected among three or
more clustering results?

This method is based on hard assignments. A statistical clustering
method yields only membership probabilities, and this
method cannot work on such soft assignments.