* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Computer science
Protein moonlighting wikipedia , lookup
Signal transduction wikipedia , lookup
Multi-state modeling of biomolecules wikipedia , lookup
Histone acetylation and deacetylation wikipedia , lookup
Intrinsically disordered proteins wikipedia , lookup
Transcription factor wikipedia , lookup
List of types of proteins wikipedia , lookup
Protein–protein interaction wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Understanding Science Through the Lens of Computation Richard M. Karp Nov. 3, 2007 The Power of the Computational Perspective • Exposes the computational nature of natural processes and provides a language for their description. • Brings to bear fundamental algorithmic concepts: adversarial and probabilistic models, asymptotic analysis, intractability, computational learning theory, threshold behavior, fault tolerance, … • Alters the worldviews of many scientific fields. Computational Processes in the Sciences • Regulation of protein production, metabolism and embryonic development • Mechanisms of learning • Molecular self-assembly • Strategic behavior of companies A Computational View of Quantum Mechanics • QM is the right setting for studying computation at subatomic levels. • Theoretical computer science provides a mathematical model for a quantum computer, in which the unit of information is the qubit, a superposition of the values 0 and 1; n qubits exist in 2n states at the same time. • There is evidence that quantum computers are more powerful than classical computers. • The quest to realize quantum computation will test the foundations of QM. “Quantum Computation is as much about testing Quantum Physics as it is about building powerful computers” Umesh Vazirani Links Between Statistical Physics and Computer Science • Both fields study how macroscopic properties of large systems arise from local interactions. – Statistical physics: properties of water and magnetic materials – Computer science: global properties of World Wide Web, structure of complex combinatorial problems Similarities of Models and Methods • Probabilistic models capture statistical behavior of large, complex, heterogeneous and incompletely known systems. • Phase transitions in statistical physics have close parallels with sharp thresholds in computer science. Areas of Convergence • Constraint satisfaction problems • Belief propagation and error-correcting codes • Markov Chain Monte Carlo • Percolation and sensor networks Computational Models of the Web • The Internet and the Web are simultaneously computational, social and economic. They support new modes of interaction. • Novel algorithmic problems: ranking methods of search engines, reputation systems, recommendation systems, mechanism design. Social Sciences and the Web • The Web is a powerful laboratory for studying social and economic systems as computational processes. • Insights from algorithmic game theory are indispensable for understanding the new markets and economic mechanisms that the Internet has spawned. “The Internet is an equilibrium, we just have to identify the game.” Scott Shenker Game Theory Payoff Matrix Left Middle Right Top 0/0 1,2 2,1 Middle 2,1 0.0 1,2 Bottom 1,2 2,1 0,0 • Nash equilibrium: each player chooses each pure strategy with probability 1/3 The Computational Lens on Game Theory • Finding a Nash equilibrium is an intractable problem. • But computational tractability is an important modeling prerequisite. • “If your laptop can’t find it, then neither can the market.” Kamal Jain Computational Processes in Biology • Learning in neural networks • Response of immune system to an invading microbe • Specialization of cells during embryonic development • Collective behavior of animal communities: flocking of birds, self-organization of ant colonies • Design of sensor-actuator control systems for regulation of biological processes • Evolution of species Algorithmic Challenges in Computational Molecular Biology Revolution in Biology • Advances in computation and instrumentation enable a quantitative characterization of biological systems. • Opportunity to advance understanding of molecular processes of life and change the ways we diagnose and treat disease. • Multidisciplinary field: involves the biological, physical, engineering and mathematical sciences. Biological background The eukaryotic cell Goals of Computational Molecular Biology • Sequence and compare the genomes of many organisms. • Identify the genes and determine the functions of the proteins they encode. • Understand how genes, proteins and other molecules act in concert to control cellular processes. Goals of Computational Molecular Biology • Trace the evolutionary history and evolutionary relationships of existing species. • Understand the structure, function and evolutionary history of proteins. • Identify the associations between genetic mutations and disease. Regulation of Gene Expression • Animals can be viewed as highly complex, precisely regulated spatial and temporal arrays of differential gene expression. • Gene expression is regulated by a complex network of interactions among proteins, genomic DNA, RNA and chemicals within the cell. Levels of Regulation • Genome: spells the names of the proteins. • Transcription of genes to mRNA: regulated by binding of transcription factors to DNA in control regions of genes. • Translation of mRNA into functioning proteins, regulated by complex networks of protein-protein and protein-RNA interactions, and by posttranslational modifications of proteins. Levels of Regulation (Cont.) • Regulation of metabolic processes: complex network of chemical reactions catalyzed by enzymes. • Global phenotype such as disease: regulated by interaction of many metabolic processes. Regulatory Networks “We can approach understanding how the whole genome works by breaking it down into groups of genes that interact strongly with each other. Once researchers identify and understand these network modules, the next step will be to figure out the interactions within networks of networks, and so on until we eventually understand how the whole genome works, many years from now. ” Garrett Odell Key Research Areas • Analysis of protein-DNA interactions: breaking the cis-regulatory code. “ Regulatory interactions mandated by circuitry encoded in the genome determine whether each gene is expressed in each cell, throughout developmental space and time, and, if so, at what amplitude.” Eric Davidson • Analysis of protein-protein interactions: identification of molecular machines and signal transduction cascades. Tools for Analysis • Measurement of protein-DNA and protein-protein interactions, and of mRNA production under selected perturbed conditions. • DNA sequence analysis to identify genes, their regulatory regions and the transcription factor binding sites within them. • Phylogenetic analysis to identify regulatory structures conserved across species. • Classification of proteins according to structure and function. Regulatory Control of Transcription • E.H. Davidson Genomic Regulatory Systems • Regulatory interactions mandated by circuitry encoded in the genome determine whether each gene is expressed in each cell, throughout developmental space and time and, if so, at what amplitude. The Ultimate Goal • ``Portions of the endo16 cis-regulatory system of Strongylocentrotus are to date the most extensively explored of any, with respect to the functional meaning of each interaction that takes place within them. What emerges is almost astounding: a network of logic interactions programmed into the DNA sequence that amounts essentially to a hardwired biological computational device. Analysis of Protein-DNA Interactions • Transcription is regulated by proteins called transcription factors that bind to DNA near the transcription start site of a gene and influence the rate of transcription. • Goals: identify the transcription factors, characterize the sites they bind to in the genome, and determine how they act in combination to enhance or inhibit transcription. This information is referred to as the cis-regulatory code. Transcription Factors and Binding Sites Binding Motifs • Motif: short sequence pattern recognized by a transcription factor. • Occurs repeatedly in the genome, but with considerable stochastic variation. Some positions are highly conserved, others exhibit great variation. • Certain combinations of motifs occur repeatedly in clusters. Occurrences of A Motif 5’- TCTCTCTCCACGGCTAATTAGGTGATCATGAAAAAATGAAAAATTCATGAGAAAAGAGTCAGACATCGAAACATACAT …HIS7 5’- ATGGCAGAATCACTTTAAAACGTGGCCCCACCCGCTGCACCCTGTGCATTTTGTACGTTACTGCGAAATGACTCAACG …ARO4 5’- CACATCCAACGAATCACCTCACCGTTATCGTGACTCACTTTCTTTCGCATCGCCGAAGTGCCATAAAAAATATTTTTT …ILV6 5’- TGCGAACAAAAGAGTCATTACAACGAGGAAATAGAAGAAAATGAAAAATTTTCGACAAAATGTATAGTCATTTCTATC …THR4 5’- ACAAAGGTACCTTCCTGGCCAATCTCACAGATTTAATATAGTAAATTGTCATGCATATGACTCATCCCGAACATGAAA …ARO1 5’- ATTGATTGACTCATTTTCCTCTGACTACTACCAGTTCAAAATGTTAGAGAAAAATAGAAAAGCAGAAAAAATAAATAA …HOM2 1: 2: . . . . . M: AAAAGAGTCA AAATGACTCA AAGTGAGTCA AAAAGAGTCA GGATGAGTCA AAATGAGTCA GAATGAGTCA AAAAGAGTCA Informativeness: 2+∑bpbllog2pbl 5’- GGCGCCACAGTCCGCGTTTGGTTATCCGGCTGACTCATTCTGACTCTTTTTTGGAAAGTGTGGCATGTGCTTCACACA …PRO3 1YSA Regulatory Module • Set of mutually cooperating transcription factors that can bind to the control regions of many genes to enhance or inhibit transcription. • Given a database of transcription factors and their binding motifs, can identify such modules by searching for sets of transcription factors whose binding sites tend to co-occur in control regions. Discovery of Conserved Protein Networks Signaling and regulatory pathways and protein complexes consist of genes, proteins and small molecules ``wired’’ together in a complex network of intermolecular interactions. Two proteins interact if they bind to each other or one of them binds to the control regions of the other’s gene. Conserved Complexes and Pathways Many complexes and pathways are conserved – they have evolved over evolutionary time and occur, in modified forms, in many organisms. Our goal: using databases of protein-protein interactions in several species, in conjunction with data about protein sequence, structure, function and expression, identify conserved protein complexes and signaling pathways. Discovery of conserved pathways and complexes allows transfer of functional annotation and prediction of interactions from one species tto another. Pairwise Network Alignment Bacterial pathogen (Helicobacter pylori) http://www.pathblast.org Baker’s yeast (Saccharomyes cerevisiae) Kelley et al. PNAS 2003 Sharan et al. RECOMB 2004 3-way Comparison S. cerevisiae • 4389 proteins • 14319 interactions C. elegans • 2718 proteins • 3926 interactions D. melanogaster • 7038 proteins • 20720 interactions Putative Complexes Conserved in Yeast, Worm and Fly 173 complexes associated with: DNA, RNA and phosphorus metabolism intracellular transport regulation of transcription protein folding, synthesis and degradation homeostasis cell proliferation, development and growth RNA localization Future Work • Identifying the proteins that interact to form a molecular machine is only a first step. The next, and far more difficult step, is to construct detailed predictive dynamical models of the behavior of molecular machines, under both normal and perturbed conditions.