Cluster Overlap Distribution Map (CODM): Software
By: Makoto Kano, Shuichi Tsutsumi, Nobutaka Kawahara, Yan Wang, Akitake Mukasa, Takaaki Kirino and Hiroyuki Aburatani
Presenter: Jia Meng

What is CODM?
Cluster Overlap Distribution Map (CODM) is a visualization method. It compares the clustering results generated under two different conditions.

Background, Problem & Objective
Advances in microarray technology have made it possible to measure 30,000 genes comprehensively and simultaneously. Nobody can handle tens of thousands of genes one by one, so clustering is a key approach for handling them.
Problem: many clustering algorithms have been proposed, but there are few effective methods for comparing clustering results obtained under different conditions.
Objective: compare clustering results obtained under different conditions.

Basic Idea: Format
The two cluster sets are mapped to the X-axis and the Y-axis, respectively.
Each block represents one pair of clusters, with one cluster taken from each set. The block's height is the statistical evaluation value of the overlap between the two clusters.

Basic Idea: Computing the Height of a Block
$E(g, n_{x_i}, n_{y_j}, k_{ij})$ is the statistical evaluation value of the overlap between cluster $X_i$ and cluster $Y_j$, where:
- $g$ is the total number of genes;
- $n_{x_i}$ is the number of genes in cluster $X_i$;
- $n_{y_j}$ is the number of genes in cluster $Y_j$;
- $k_{ij}$ is the number of genes shared by $X_i$ and $Y_j$.
The number of genes common to two different clusters is evaluated using the hypergeometric probability distribution.

Core Idea
Assume that generating a gene cluster amounts to a random selection from the total set of genes. What we need is the probability of observing at least $k$ overlapping genes between $n_1$ genes and $n_2$ genes, both drawn at random from all $g$ genes.
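The core idea can be sketched in Python as the upper tail of a hypergeometric distribution. The sketch uses exact integer binomial coefficients. It is a minimal illustration, not code from the CODM software. The function name `hypergeom_tail` is hypothetical. The summation is the standard hypergeometric tail rather than a formula quoted from the slides.

```python
import math


def hypergeom_tail(g: int, n1: int, n2: int, k: int) -> float:
    """Probability of at least k shared genes when n1 and n2 genes are
    drawn at random (without replacement) from g genes.

    Hypothetical helper for illustration; not part of the CODM software.
    """
    if not (0 <= n1 <= g and 0 <= n2 <= g):
        raise ValueError("cluster sizes must lie between 0 and g")
    # Sum the hypergeometric probability mass for overlaps k, k+1, ..., min(n1, n2).
    # Integer arithmetic keeps every term exact; math.comb returns 0 when the
    # second cluster would need more genes from outside the first than exist.
    favourable = sum(
        math.comb(n1, i) * math.comb(g - n1, n2 - i)
        for i in range(max(k, 0), min(n1, n2) + 1)
    )
    # A single division at the end converts the exact ratio to a float.
    return favourable / math.comb(g, n2)


if __name__ == "__main__":
    # 4 genes, two clusters of 2 genes each: both genes shared with probability 1/6.
    print(hypergeom_tail(4, 2, 2, 2))
```

With SciPy installed, the same tail can be computed as `scipy.stats.hypergeom.sf(k - 1, g, n1, n2)`. That route works in floating point, so it avoids building the very large integers that arise when $g$ is around 30,000.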
Algorithm
The probability of observing at least $k$ overlapping genes between $n_1$ and $n_2$ genes drawn at random from all $g$ genes is the upper tail of the hypergeometric distribution:
$$P(g, n_1, n_2, k) = \sum_{i=k}^{\min(n_1, n_2)} \frac{\binom{n_1}{i}\binom{g - n_1}{n_2 - i}}{\binom{g}{n_2}}$$
When this P-value is small, the overlap is regarded as statistically meaningful. The evaluation value $E$ of the overlap is then defined from this P-value.

Example
Data acquired from two environments; computing the CODM.

Hidden Blocks
Hidden blocks (when dealing with hierarchical clustering results).

About the CODM Software
CODM is available on the web site (http://www.genome.rcast.utokyo.ac.jp/CODM) and runs on a PC with Windows 2000 or Windows XP. Its memory requirement is proportional to the square of the number of genes to be analyzed. A machine whose graphics board provides hardware acceleration for OpenGL is also recommended.

Future Work
This method can detect similarity between two clustering results. How can similar structure be detected among three or more clustering results?
This method is based on hard assignments. A statistical clustering method yields only membership probabilities, that is, soft assignments, and this method cannot work with them.
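The sketch below puts the pieces together. It builds the grid of block heights that a CODM displays, starting from two hard clusterings of the same genes. It counts $n_{x_i}$, $n_{y_j}$ and $k_{ij}$, computes the hypergeometric tail P-value, and converts that P-value into a height. The exact evaluation function is not given in this text, so the choice $E = -\log_{10} P$ is an assumption made for illustration. The function names and the floor value `P_FLOOR` are also hypothetical.

```python
import math
from collections import Counter

# Assumed floor so that P-values that underflow to 0.0 still give a finite height.
P_FLOOR = 1e-300


def hypergeom_tail(g, n1, n2, k):
    """P(overlap >= k) for random draws of n1 and n2 genes out of g (hypothetical helper)."""
    favourable = sum(
        math.comb(n1, i) * math.comb(g - n1, n2 - i)
        for i in range(max(k, 0), min(n1, n2) + 1)
    )
    return favourable / math.comb(g, n2)


def codm_heights(labels_x, labels_y):
    """Return the cluster ids on each axis and a matrix of block heights.

    labels_x[t] and labels_y[t] are the cluster ids assigned to gene t under
    the two conditions (hard assignments). heights[i][j] is the assumed
    evaluation value -log10(P) for clusters xs[i] and ys[j].
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both clusterings must label the same genes")
    g = len(labels_x)
    n_x = Counter(labels_x)                  # genes per cluster X_i
    n_y = Counter(labels_y)                  # genes per cluster Y_j
    k = Counter(zip(labels_x, labels_y))     # genes shared by X_i and Y_j
    xs, ys = sorted(n_x), sorted(n_y)
    heights = []
    for xi in xs:
        row = []
        for yj in ys:
            p = hypergeom_tail(g, n_x[xi], n_y[yj], k[(xi, yj)])
            # log10(1/p) equals -log10(p) and is never negative since p <= 1.
            row.append(math.log10(1.0 / max(p, P_FLOOR)))
        heights.append(row)
    return xs, ys, heights


if __name__ == "__main__":
    # Toy labels for 6 genes clustered under two conditions.
    cond_a = [0, 0, 0, 1, 1, 1]
    cond_b = ["a", "a", "a", "b", "b", "b"]
    xs, ys, heights = codm_heights(cond_a, cond_b)
    for xi, row in zip(xs, heights):
        print(xi, [round(h, 2) for h in row])
```

The sketch accepts only hard assignments (one cluster id per gene), which matches the limitation noted under Future Work. A hierarchical clustering would first have to be cut into flat clusters before it could be used here.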