* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download CSE891 Selected Topics in Bioinformatics
Recursive InterNetwork Architecture (RINA) wikipedia , lookup
Distributed firewall wikipedia , lookup
Zero-configuration networking wikipedia , lookup
Computer network wikipedia , lookup
Cracking of wireless networks wikipedia , lookup
Piggybacking (Internet access) wikipedia , lookup
Network tap wikipedia , lookup
CSE891-002 Selected Topics in Bioinformatics Jin Chen 232 Plant Biology Bld. 2011 Spring 1 About me… • Jin Chen, Assistant Professor in CSE and PRL from 2009 • Office: 232 Plant Biology Lab. Tel: (517) 355-5015. Email: [email protected] 2 Outline • Course Description • Introduction to Computational Network Biology 3 Course Description • Course objectives: study interesting computational network biology problems and their algorithms, with a focus on the principles used to design those algorithms. (3 credits) • Instructor: Jin Chen, Office: 232 Plant Biology Bld. Email: [email protected] • Office hours: Thursday 2PM-3PM. If you cannot attend office hours, email me about scheduling a different time. • Web page: http://www.msu.edu/~jinchen/cse891a 4 Course Description • Course work: One 80 minutes lecture, and 80 minutes of discussion & student presentations each week • Grading policies: The course will be graded on attendance (10%), participation (20%), and presentation (70%). • No Final Exam 5 Course Description • Prerequisites: Graduate students in science or engineering. Note: an override is necessary for non-CSE graduate students; please send your PID & NetID to me. • No prior knowledge of biology is required. Computationally inclined biology graduate students are encouraged to take the class as well. 6 Suggested books • A.-L. Barabási, Linked: The new science of networks • U. Alon, An Introduction to Systems Biology • B. Palsson. Systems Biology: Properties of Reconstructed Networks • K. Kaneko, Life: An Introduction to Complex Systems Biology 7 Course Description Graph model Graph clustering subgraph mining Protein-protein interaction network Network Biology Gene regulatory network Metabolic network Integrative study Graph Mining 8 Date Title 1/11/11 Course Organization. Introduction. 1/13/11 Protein-Protein Interaction networks I 1/18/11 Student Presentation & Discussion 1 Topics Introduction to Computational Network Biology PPI network construction and false positive detection 1/20/11 Protein-Protein Interaction networks II 1/25/11 Student Presentation & Discussion 2 Topological analysis in PPI networks. Network motif. Applications of PPI network (protein function prediction, 1/27/11 Protein-Protein Interaction networks III network comparison) 2/1/11 Student Presentation & Discussion 3 2/3/11 Gene Correlation Networks I Gene co-expression study 2/8/11 Student Presentation & Discussion 4 2/10/11 Gene Correlation Networks II Gene co-regulation study 2/15/11 Student Presentation & Discussion 5 2/17/11 Gene Transcriptional Regulation Networks I cis-elements and gene co-regulation 2/22/11 Student Presentation & Discussion 6 2/24/11 Gene Transcriptional Regulation Networks II Bayesian network for GRN construction 3/1/11 Student Presentation & Discussion 7 3/3/10 Gene Transcriptional Regulation Networks III ChIP-seq and its applications in GRN construciton 3/7-11 Spring Break 3/15/11 Student Presentation & Discussion 8 3/17/11 Gene Transcriptional Regulation Networks IV GRN topological study 3/22/11 Student Presentation & Discussion 9 3/24/11 Metabolic Networks I Flux balance analysis and metabolic control analysis 3/29/11 Student Presentation & Discussion 10 3/31/11 Metabolic Networks II Integrative study: r-FBA model 4/5/11 Student Presentation & Discussion 11 4/7/11 Graph Mining I Graph models 4/12/11 Student Presentation & Discussion 12 4/14/11 Graph Mining II Graph clustering and partitioning 4/19/11 Student Presentation & Discussion 13 4/21/11 Graph Mining III Frequent subgraph mining 4/26/11 Student Presentation & Discussion 14 9 Paper list 1. Chua et al. Exploiting indirect neighbours and topological weight to predict protein function from protein–protein interactions. Bioinformatics (2006) 22 (13): 1623-1630. 2. Kashani et al. Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics 2009, 10:318 3. Deng et al. Prediction of Protein Function Using Protein–Protein Interaction Data. Journal of Computational Biology. December 2003, 10(6): 947-960. 4. Hu et al. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics. Vol. 21 Suppl. 1 pp. i213–i221. 2005 5. Xu et al. Mining Shifting-and-Scaling Co-Regulation Patterns on Gene Expression Profiles. ICDE 2006 6. Xu et al, Discovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines. PLoS Computational Biology. 5(4) 2009 7. Huang et al. Large-scale regulatory network analysis from microarray data: modified Bayesian network learning and association rule mining. Decision Support Systems. 43. 1207–1225. 2007 8. Honkela et al. Model-based method for transcription factor target identification with limited data. PNAS vol 107(17) pp. 7793–7798. 2009 9. Vermeirssen et al. Transcription factor modularity in a Gene-Centered C. elegans Protein-DNA interaction network. Genome Research 17, 061-1071. 2007 10. Covert et al, Transcriptional Regulation in Constraints-Based Metabolic Models of Escherichia coli, Journal of Biological Chemistry, 277(31): pp. 28058-28064. 2002 11. Herrgard et al. Integrated analysis of regulatory and metabolic networks reveals novel regulatory mechanisms in Saccharomyces cerevisiae. Genome Research. 16:627–635. 2006 12. Barabási et al. Network Biology: Understanding the Cell's Functional Organization. Nature Reviews Genetics 5, 101-113. 2004 13. Dongen. A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000 14. Huan et al. Mining Family Specific Residue Packing Patterns from Protein Structure Graphs, RECOMB, pp. 308-315, 2004 10 Course Description • Select at least one paper for presentation from the paper list. Email me which paper you will present by next Mon (1/17/2011) • Each presentation is 45 min, including 15 min Q&A, followed with a discussion • Your grade will be largely determined by the presentation (70%) • Presentation starts from next Tue (1/18/2011) 11 Important Days: Class Begins Open adds end Last day to drop with refund Last day to drop with no grade reported Class Ends 1/10/2011 1/14/2011 2/3/2011 3/2/2011 5/6/2011 12 Introduction to Computational Network Biology • Network biology belongs to systems biology, which belongs to genomics • Interested in the relations between entities rather than the entities themselves http://bionet.bioapps.biozentrum.uni-wuerzburg.de/ 13 Network’s everywhere • Internet, social network, anti-terrorism network • Biological networks – – – – – – Protein-protein interaction (PPI) network protein-DNA interaction network gene correlation network gene regulatory network metabolic network signaling network… • Network is a tool for under standing complex systems • Network models explains network properties and support network behavior study • Network measures provide quantitative analysis for complex systems 14 Definition of network (graph) Self-loop Multi-set of edges Edge G(V,E) Node (vertex) Simple graph: does not have loops (self-edges) and does not have multi-edges. 15 Definition of network (graph) Directed graph vs. Undirected graph Labeled graph vs. Unlabeled graph Symmetric graph vs. Asymmetric graph 16 Webpage layout Pages on a web site and the hyperlinks between them 17 M. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 2004 Adopted from R Albert’s 18 slides Biological networks 19 Yeast Protein-Protein Interaction network Hawoong Jeong 20 Gene regulation network of sea urchin Eric Davidson 21 Metabolic flux analysis of E. coli Abhishek Murarka 22 Why study networks? • Complex systems cannot be described in a reductionist view • Behavior study of complex systems starts with understanding the network topology • Network - related questions: – How do we reconstruct a network? – How can we quantitatively describe large networks? – How did networks get to be the way they are? 23 Simple measures • Node Degree: the number of edges connected to the node – In-degree & Out-degree – Total in-degree == total out-degree • Average Degree: the average of node degrees for all the nodes in the network, denoted as: where N is the number of nodes in the network, ki is the node degree of node i • Degree distribution: the degree distribution P(k) gives the fraction of nodes that have k edges 24 Simple measures • Shortest path: to find a path between two nodes such that the sum of the weights of its constituent edges is minimized • Graph diameter: the longest shortest path between any pair of nodes in the graph. • Connected graph: any two vertices can be joined by a path • Bridge: if we erase the edge, the graph becomes disconnected 25 Simple measures • Betweenness centrality: for all node pairs (i, j), find all the shortest paths between nodes i and j, denoted as C(i,j), and determine how many of these pass through node k, denoted as Ck(i,j). Betweenness centrality of node k is • Calculating the betweenness involves calculating the shortest paths between all pairs of vertices on a graph. O(V2logV + VE) for sparse graph with Johnson’s algorithm. L. C. Freeman, Sociometry 40, 35 (1977); P. E. Black, Dictionary of Algorithms and Data Structures (2004) 26 Complex measures • Frequent subgraph mining • Graph comparison & classification • Graph isomorphic testing 28 Useful software • Visualization & Topological Analysis – Cytoscape (www.cytoscape.org) – Pajek (vlado.fmf.uni-lj.si/pub/networks/pajek) • Graph related programming – LEDA (www.algorithmic-solutions.com) – Nauty (www.cs.sunysb.edu/~algorith/implement/nauty/impleme nt.shtml) 29 1960 1999 2002 Real networks are much more complex • Transcription regulatory networks of Yeast and E. coli show an interesting example of mixed characteristics – how many genes a TF interacts with - scale-free – how many TFs interact with a given gene - exponential Modularity and network motif • Cellular function are likely to be carried out in a highly modular manner • Modular -- a group of genes/proteins that work together to achieve distinct functions • Biology is full of examples of modularity Remaining challenges • Discovery of network motifs is closely related to the generation of random networks • Structure of network motifs does not necessary determine function • Relation between higher-level organizational, functional states and networks has not yet been studied Voigt, W. et al. Genetics 2005 Ingram P.J.et al. BMC Genomics 2006 Eric Werner. Nature 2007 Next class • PPI network construction • False-positive detection 38