DISCOVERING LARGER NETWORK MOTIFS
Li Chen
4/16/2009
CSC 8910 Analysis of Biological Network, Spring 2009
Dr. Yi Pan

THE REVIEW ON MODELS AND ALGORITHMS FOR MOTIF DISCOVERY IN PROTEIN-PROTEIN INTERACTION NETWORKS

• Two distinct definitions of a motif, based on frequency and on statistical significance
  • Definition 1: a motif is a sub-graph that appears more than a threshold number of times.
  • Definition 2: a motif is a sub-graph that appears more often than expected by chance (over-represented motif).

• Two characteristics used to evaluate a motif
  • Frequency, counted under one of three overlap rules:
    1. Arbitrary overlaps of nodes and edges (non-identical case)
    2. Only overlaps of nodes (edge-disjoint case)
    3. No overlaps (edge- and vertex-disjoint case)
  • Statistical Significance: compares the frequencies obtained for the observed network and for random networks.
    1. Z-score
    2. Abundance

• Models of Random Graphs
  • Preserve the same degree distribution as the biological network
  • Preserve the degree sequence (search of n-node motifs)
  • Based on geometric random networks and a Poisson distribution of the degree
  • Incorporate node clustering into the model

3. Compact Topological Motifs: introduces a compact graph representation obtained by grouping together maximal sets of nodes that are 'indistinguishable'. The graph on the left shows the sets U1 and U2 as compact nodes and U1U2 as a compact edge.
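The frequency and Z-score ideas above can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and does not come from the slides: the motif is a triangle, frequency is counted with arbitrary overlaps, the random model preserves the degree sequence through double edge swaps, and the Z-score is taken as (observed frequency - mean random frequency) / standard deviation of the random frequency. The function names and the toy network are hypothetical.

```python
import random
from statistics import mean, pstdev

def count_triangles(edges):
    """Count triangle occurrences, allowing arbitrary overlaps."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once per edge, i.e. three times in total.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize a simple graph by double edge swaps; degrees are unchanged."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Proposed rewiring: a-b, c-d  ->  a-d, c-b
        if len({a, b, c, d}) < 4:
            continue  # the two edges share a node; skip this proposal
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def z_score(edges, n_random=200, seed=0):
    """Return (observed, random mean, random std, Z) for the triangle motif."""
    rng = random.Random(seed)
    observed = count_triangles(edges)
    randomized = [
        count_triangles(degree_preserving_shuffle(edges, 10 * len(edges), rng))
        for _ in range(n_random)
    ]
    mu, sigma = mean(randomized), pstdev(randomized)
    # Z is undefined when the random ensemble shows no variance.
    z = (observed - mu) / sigma if sigma > 0 else float("nan")
    return observed, mu, sigma, z

# Hypothetical toy network: two triangles joined by a longer cycle.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4),
             (4, 5), (5, 6), (6, 7), (7, 0)]

if __name__ == "__main__":
    print(z_score(toy_edges))
```

A large positive Z would suggest the triangle is over-represented in the sense of Definition 2. Switching to the edge-disjoint or vertex-disjoint frequency rules would require a different counting routine, which this sketch does not attempt.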
• Motif Discovery Algorithm
  • Exact algorithms for motifs with a small number of nodes
    1. Exhaustive Recursive Search (ERS): the input network is represented by an adjacency matrix M (motif size <= 4).
    2. ESU: starts with individual nodes and adds one node at a time until the required size k is reached (motif size <= 14).
  • Approximate Algorithms
    1. Search Algorithm Based on Sampling (MFINDER): it picks edges of the input graph at random until a set of k nodes is obtained, which gives a sample sub-graph, and assigns weights to the samples to correct for the non-uniform sampling. It scales well with large networks, but does not scale well with large motifs.
    2. Rand-ESU: unlike MFINDER, it does not need to compute weights for all samples. ESU builds a tree whose leaves correspond to sub-graphs of size k, while internal nodes correspond to sub-graphs of size 1 up to k-1, depending on the tree level. Rand-ESU assigns to each level of the tree a probability that its nodes are explored further, so as to guarantee that all leaves are visited with uniform probability.
    3. NeMoFINDER: combines approaches from the data mining and computational biology communities. It searches for repeated trees and extends them to sub-graphs. This reduces the computation time for discovering larger motifs, but at the cost of missing some potentially interesting sub-graphs.
    4. Sub-graph Counting by Scalar Computation: it characterizes a biological network by a set of measures based on scalars and functionals of the adjacency matrix associated with the network. Its advantages are mathematical elegance and computational efficiency.
    5. A-priori-based Motif Detection: the basic idea is that if a sub-graph is frequent, so are all its sub-graphs. It builds candidate motifs of size k by joining motifs of size k-1 and then evaluates their frequency.

A ROADMAP OF CLUSTERING ALGORITHM IN BIOINFORMATICS APPLICATIONS

• Desirable features of clustering algorithms to evaluate
  • Scalability
  • Robustness
  • Order insensitivity
  • Minimum user-specified input
  • Mixed data types
  • Arbitrary-shaped clusters
  • Point proportion admissibility: duplicating data and reclustering should not alter the results.

• Six categories of clustering algorithms
  • Partitioning Clustering Algorithm
  • Hierarchical Clustering Algorithm
  • Grid-based Clustering Algorithm
  • Density-based Clustering Algorithm
  • Model-based Clustering Algorithm
  • Graph-based Clustering Algorithm

• Partitioning Clustering Algorithm
  • Numerical Methods
    1. K-means algorithm and Farthest First Traversal k-center (FFT) algorithm
    2. K-medoids or PAM (Partitioning Around Medoids)
    3. CLARA (Clustering Large Applications)
    4. CLARANS (Clustering Large Applications Based upon Randomized Search) and Fuzzy K-means
  • Discrete Methods
    1. K-modes
    2. Fuzzy K-modes
    3. Squeezer and COOLCAT
  • Mixed Discrete and Numerical Clustering Methods
    1. K-prototypes

• Hierarchical Clustering Algorithm
  • Divides the data into a tree of nodes, where each node represents a cluster.
  • Two categories based on methods or purposes
    1. Agglomerative vs. Divisive
    2. Single vs. Complete vs. Average linkage
  • Popular: nature can have various levels of subsets
  • Drawbacks:
    1. Slow
    2. Errors are not tolerable
    3. Information is lost when moving between levels
  • Two kinds of methods
    1. Numerical Methods: BIRCH, CURE, Spectral clustering
    2. Discrete Methods: ROCK, Chameleon, LIMBO

• Grid-based Clustering Algorithm
  • Forms a grid structure of cells from the input data; each data point is then assigned to a cell of the grid.
  • STING combines a numerical grid-based clustering method and a hierarchical method

• Density-based Clustering Algorithm
  • Uses a local density standard
  • Clusters are dense subspaces separated by low-density spaces
  • Example of a bioinformatics application: finding the densest subspaces in interactome (protein-protein interaction) networks
  • DBSCAN, OPTICS, DENCLUE, WaveCluster, CLIQUE use numerical values for clustering
  • SEQOPTICS is used for sequence clustering
  • HIERDENC (Hierarchical Density-based Clustering), MULIC (Multiple Layer Incremental Clustering), Projected (subspace) clustering, CACTUS, STIRR, CLICK, CLOPE use discrete values for clustering

• Model-based Clustering Algorithm
  • Uses a model, often derived from a statistical distribution
  • Bioinformatics applications
    1. Gene expression
    2. Interactomes
    3. Sequences
  • Numerical model-based methods
    1. Self-Organizing Maps
  • Discrete model-based clustering algorithm
    1. COBWEB
  • Numerical and discrete model-based clustering methods
    1. BILCOM (Bi-level clustering of Mixed Discrete and Numerical Biomedical Data), using an empirical Bayesian approach
  • Examples
    1. Gene expression clustering
    2. Protein sequence clustering
    3. AutoClass
    4. SVM Clustering methods

• Graph-based Clustering Algorithm
  • Applied to interactomes for complex prediction and to sequence networks
  • Examples:
    1. MCODE (Molecular Complex Detection)
    2. SPC (Super Paramagnetic Clustering)
    3. RNSC (Restricted Neighborhood Search Clustering)
    4. MCL (Markov Clustering)
    5. TribeMCL
    6. SPC
    7. CD-HIT
    8. ProClust
    9. BAG algorithms

• Usage in Bioinformatics Applications
  • Gene expression clustering
    1. K-means algorithm
    2. Hierarchical algorithm
    3. SOMs
  • Interactomes
    1. AutoClass
    2. SVM clustering
    3. COBWEB
    4. MULIC
  • Sequence clustering
    1. Hierarchical clustering algorithm

REFERENCES

[1] Bill Andreopoulos, Aijun An, Xiaogang Wang, and Michael Schroeder. A roadmap of clustering algorithms: finding a match for a biomedical application. Brief Bioinform, pages bbn058+, February 2009.
[2] Alberto Apostolico, Matteo Comin, and Laxmi Parida. Bridging Lossy and Lossless Compression by Motif Pattern Discovery. Electronic Notes in Discrete Mathematics, 21:219-225, 2005. General Theory of Information Transfer and Combinatorics.
[3] Giovanni Ciriello and Concettina Guerra. A review on models and algorithms for motif discovery in protein-protein interaction networks. Brief Funct Genomic Proteomic, 7(2):147-156, 2008.
[4] Jun Huan, Wei Wang, and Jan Prins. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. Data Mining, IEEE International Conference on, 0:549, 2003.
[5] Michihiro Kuramochi and George Karypis. Finding Frequent Patterns in a Large Sparse Graph. Data Mining and Knowledge Discovery, 11(3):243-271, November 2005.
[6] Laxmi Parida. Discovering Topological Motifs Using a Compact Notation. Journal of Computational Biology, 14(3):300-323, 2007.

Thank you so much!
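As a supplementary sketch for the ESU item on the exact-algorithms slide (start from individual nodes and add one node at a time until size k is reached), the enumeration could look roughly like the following. The slide gives only that one-line summary, so the details here are assumptions: the extension-set and exclusive-neighbourhood bookkeeping follows the usual published formulation of ESU, node labels are assumed to be comparable (integers), and the function name and toy graph are hypothetical.

```python
def esu(adj, k):
    """Yield every connected k-node vertex set of an undirected graph once.

    adj maps each node to the set of its neighbours.
    """
    def extend(sub, ext, root):
        if len(sub) == k:
            yield frozenset(sub)
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Nodes already in the sub-graph or adjacent to it.
            closed = sub | set().union(*(adj[u] for u in sub))
            # Exclusive neighbourhood of w: new nodes reachable only via w,
            # restricted to labels larger than the root to avoid duplicates.
            new = {u for u in adj[w] if u > root and u not in closed}
            yield from extend(sub | {w}, ext | new, root)

    for v in adj:
        yield from extend({v}, {u for u in adj[v] if u > v}, v)

if __name__ == "__main__":
    # Hypothetical toy graph: a path 0-1-2-3.
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    for nodes in esu(path, 3):
        print(sorted(nodes))
```

Grouping the enumerated vertex sets into motif classes would additionally need an isomorphism test or canonical labelling, which is left out here. Rand-ESU, as described in the slides, would explore each tree level only with some probability instead of always recursing.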