
DNA in Action! A 3D Swarm-based Model of a Gene Regulatory System

Ian Burleigh, Garret Suen, and Christian Jacob

Department of Computer Science, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4
Phone: (403) 220-7682, Fax: (403) 284-4707
{burleigh, sueng, jacob}@cpsc.ucalgary.ca
http://www.cpsc.ucalgary.ca/~jacob/ESD/

Abstract. We present a 3D swarm-based model of the classic lactose (lac) operon gene regulatory system. The lac operon is a well-understood genetic switch that regulates itself according to whether lactose is available as an energy source. Our model includes a 3-dimensional visualization that simulates proteins interacting with DNA and incorporates many of the important aspects of the gene system. It uses a decentralized swarm approach in which multiple agents act independently and together exhibit complex behaviour. The visualization is implemented with the Java 3D™ Runtime Environment and is also being used in a CAVE Automated Virtual Environment.

Keywords: Biological Agents, Bioinformatics, Simulation, Swarm Intelligence

ACAL 2003, First Australian Conference on Artificial Life, Canberra, Australia.

1 Introduction

Current research in genetics and genomics focuses on understanding the genetics of model organisms, such as the bacterium Escherichia coli, the nematode Caenorhabditis elegans, and the fruit fly Drosophila melanogaster. Working with these simple biological models helps to elucidate the more complex processes found in higher-order gene networks. One of the major advances in this area of research is the use of computers as an integral research tool, which has led to the new interdisciplinary fields of Bioinformatics and Biological Computing. Innovative computer graphics and specialized visualization technology, such as the CAVE Automated Virtual Environment, provide biologists with unprecedented tools for research in ‘virtual laboratories’.
In this paper, we present a 3D model of the lactose (lac) operon, one of the most basic and well-understood biological models of gene regulation. Several computer models of the lac operon exist, including simple grammar-based approaches [2], Functional Hybrid Petri Net models [5], systems based on rewrite rules [9], and the E-CELL project [10]. However, existing visualizations of the lac operon are limited to two dimensions. We propose a 3-dimensional model of the lac operon that simulates many important aspects of the system, including the processes of transcription and translation.

The paper is organized as follows. In Section 2, we give a brief synopsis of the lac operon gene regulatory system as it is commonly understood in biology. In Section 3, we discuss our implementation of the lac operon, highlighting many of the modelled processes and structures; we then describe the swarm approach used to simulate this gene regulatory system and its visualization in a CAVE immersive environment.

2 The lac Operon: A Gene Regulatory System

An operon is a group of adjacent genes on bacterial DNA that are regulated together. François Jacob and Jacques Monod described the lac operon in 1961 [4]. Found in the bacterium Escherichia coli (E. coli), it stands as one of the most important findings in genetics: it is one of the most basic gene regulatory systems known and is consequently used as a basis for studies of more complex gene systems. The lac operon is responsible for converting the sugar lactose into glucose, a key energy source for the bacterium. E. coli is a prokaryote (it has no nucleus) that is normally found in lactose-rich environments such as the human gut. E. coli requires glucose for much of its growth and has evolved a way of obtaining glucose from its environment by converting lactose into glucose and galactose.
This conversion is accomplished by the enzyme β-galactosidase, which is one of the products of the lac operon. In the presence of lactose, the lac operon turns itself on and produces β-galactosidase. When lactose is no longer present, the lac operon turns itself off, stopping the production of β-galactosidase and thereby conserving cellular resources. In this manner, the lac operon is capable of self-regulation [1], [6], [7].

2.1 Self-Regulation of the lac Operon

Gene-based self-regulation is an emergent property resulting from the interaction of proteins, enzymes, and DNA. To understand how this ‘emergence’ is accomplished, we have to look at the lactose operon in much closer detail. As a regulatory unit on the bacterial DNA, the lac operon consists of four main genes: lacZ, lacY, lacA, and lacI.

Gene Complex 1: lacZ-Y-A

The lacZ-Y-A genes form a single module and are located adjacent to one another on the operon (Fig. 1a). A control complex consisting of a promoter and an operator precedes these three genes. The operator controls protein production from these genes. A protein is produced from a given gene through the action of RNA polymerase, which reads a sequence of genes; the corresponding proteins are then produced through the processes of transcription and translation (Section 2.2).

Gene Complex 2: lacI

The lacI gene, the second key module, is located upstream of the lacZ-Y-A complex (Fig. 1a). It has its own promoter region and is likewise expressed through the action of RNA polymerase. The lacI protein product is known as a repressor; it can bind to the operator region and prevent RNA polymerase from reading the lacZ-Y-A genes. Hence, the repressor serves as the basic control mechanism of the lac operon.

Turning the Switch

When lactose is present in the system, the lac operon can turn itself on (Fig. 1b).
This is accomplished through the binding of lactose to the repressor to form a repressor-lactose complex. Because of a conformational change, the repressor-lactose complex can no longer bind to the operator region of the lacZ-Y-A genes. RNA polymerase can then read lacZ, lacY, and lacA, producing β-galactosidase, lactose permease, and transacetylase, respectively. Among these three gene products, β-galactosidase is the enzyme that converts lactose into glucose and galactose, whereas lactose permease enhances the movement of lactose from the outer environment into the cell. Transacetylase does not appear to play a role in this regulatory system. Once lactose is removed from the system, the repressor is again free to bind to the operator region and stop the production of β-galactosidase (Fig. 1a). In this manner, the lac operon regulates its own gene products, thus conserving cellular resources.

[Fig. 1 diagram: gene order LacI, Pi, P, O, LacZ, LacY, LacA. (a) Repressor I binds to the operator: no mRNA and no proteins from LacZ-Y-A. (b) Lactose causes a conformational change in I; LacZ-Y-A mRNA plus ribosomes yield proteins Z, Y, A.]

Fig. 1. Schematic of the lac operon switch. (a) After RNA polymerase docks onto Pi, the LacI promoter site, it transcribes the LacI gene into its mRNA representation, which is then translated by ribosomes into the repressor protein I. This repressor binds to the LacZ-Y-A operator site, which in turn blocks RNA polymerase; hence, none of the three genes are expressed. (b) When lactose enters the cell, it induces a shape change in the repressors that prevents them from binding to the operator. Consequently, the LacZ-Y-A genes are accessible to RNA polymerase and are expressed as proteins Z, Y, and A.

2.2 Transcription and Translation

Once genes are ‘switched on’, RNA polymerase has access to the coding regions of the structural genes on the DNA. Transcription and translation are the intermediate steps by which a protein is produced from a given gene. Transcription is the process of copying deoxyribonucleic acid (DNA) into an intermediate molecule known as messenger ribonucleic acid (mRNA). The enzyme RNA polymerase carries out this conversion as follows:

1. RNA polymerase searches along the DNA until it encounters an appropriate promoter region.
2. Starting at the promoter region, RNA polymerase synthesizes mRNA based on the genes adjacent to the promoter.
3. Once transcription is complete, the mRNA strand is free to undergo a second conversion process (translation), while RNA polymerase repeats the process of transcription.

During translation, a protein is synthesized from an mRNA strand. This mRNA-to-protein mapping is achieved through the action of ribosomes and transfer RNA (tRNA), as follows:

1. A ribosome locates and attaches to a free mRNA strand.
2. The ribosome reads the strand and, with the support of tRNA, synthesizes a chain of amino acids. The chain then folds into a 3-dimensional protein structure. Multiple ribosomes can simultaneously read and synthesize proteins from a single mRNA strand.
3. Once translation is complete, the ribosome detaches from the mRNA strand and releases the newly made protein.

The lac operon is a complex gene-regulatory process that involves the interactions of many agents. To model these processes as they occur spatially in nature, we present a 3D visualization of this simulation.

3 Visualization in 3D

Our model of the lac operon visualizes many of the important structures and events associated with the gene regulation system. These include DNA and its associated gene structures, transcription and translation, and the self-regulatory controls that govern the interactions and system dynamics.
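The self-regulatory control that the model reproduces can be summarized, at the level of whole-cell molecule counts, as a simple rule. The Python sketch below is a simplification, not the authors' implementation: it assumes, hypothetically, that each lactose molecule inactivates exactly one repressor and that any repressor left free will occupy the operator. The names CellState, bind_lactose, and lacZYA_expressed are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class CellState:
    """Hypothetical snapshot of the molecules that matter for the lac switch."""
    free_repressors: int  # repressors still able to bind the operator
    lactose: int          # free lactose molecules


def bind_lactose(state: CellState) -> CellState:
    """Assumed 1:1 binding: each lactose molecule inactivates one free repressor."""
    bound = min(state.free_repressors, state.lactose)
    return CellState(state.free_repressors - bound, state.lactose - bound)


def lacZYA_expressed(state: CellState) -> bool:
    """lacZ-Y-A is transcribed only if no free repressor is left to occupy the operator."""
    return bind_lactose(state).free_repressors == 0


print(lacZYA_expressed(CellState(free_repressors=4, lactose=0)))   # False: operon off
print(lacZYA_expressed(CellState(free_repressors=4, lactose=10)))  # True: operon on
```

In the swarm model itself, this all-or-nothing outcome is not computed centrally; it arises from individual repressor, lactose, and RNA polymerase agents meeting in space.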
The 3-dimensional model is implemented using a swarm approach, with multiple independent ‘agents’ following simple rules of interaction. The visualization, developed as a 2D projection on a normal computer screen (Fig. 2), is further enhanced through stereoscopic 3D in a CAVE immersive environment (Section 3.6).

Fig. 2. The lac operon 3D visualization and its graphical user interface.

3.1 Modelling Circular DNA

Bacterial DNA has the same double-helix structure as the DNA of higher organisms; unlike the linear chromosomes of higher organisms, however, it is circular. We represent the actual encoding of the lactose operon genes as a circular DNA double helix with its characteristic Watson-Crick complementarity pattern (Fig. 3) [11]. DNA consists of four kinds of bases: adenine, cytosine, guanine, and thymine. A group of three bases is known as a codon, which codes either for a specific amino acid, the basic building block of proteins, or for a stop signal. Because of the large number of bases that make up the genes involved in the lac operon, we represent the genetic information as codons. These codons correspond directly to the amino acid composition of the associated proteins. In nature, proteins are built from 20 standard amino acids (see the right margin of Fig. 2). We visualize the codons as colour-coded cylinders that make up the DNA strand (Fig. 4).

Fig. 3. The E. coli circular DNA represented as a double helix composed of colour-encoded codons.

3.2 Modelling Gene Structures

There are two distinct gene regions in the lac operon: the lacI region and the lacZ-lacY-lacA region (compare Section 2.1). For the purposes of this model, we have chosen to model only the lacI and lacZ gene regions. The lacY and lacA genes do not greatly affect the understanding of the system and are therefore not included. The lacI and lacZ gene regions are labelled appropriately in the model (Fig. 4). In addition, the promoter and operator regions that precede the lacZ gene are also included.
To further clarify the model, the codon numbering is shown as well, highlighting various aspects of the DNA coding sequences. Conventional models of bacterial promoters include the −10 TATA box and the −35 TTGACA RNA polymerase recognition sites, whose positions are given relative to the transcription start site (Fig. 4).

Fig. 4. The operator region and the lacZ gene. Also shown are the −10 TATA box and the −35 TTGACA RNA polymerase recognition sites.

3.3 Modelling Transcription and Translation

RNA polymerase (Pol), the initiator of transcription, is represented as a dark brown sphere (Fig. 5). RNA polymerase has a natural affinity for DNA and is usually found near or on the DNA, which we have incorporated into the model. Once RNA polymerase attaches to a DNA region, it starts scanning along the chain of nucleotide bases (codons). Transcription begins once RNA polymerase encounters a viable promoter region. Genes adjacent to the promoter region are transcribed into mRNA, represented as a twisted single-strand helix (Fig. 6). Again, we have taken the liberty of representing the mRNA genetic material as codons corresponding to the actual nucleotide base sequence. As before, codons are represented as colour-coded cylinders corresponding to the amino acids they encode (Fig. 2).

Translation takes place once the mRNA strand has been synthesized. Ribosomes, represented as small red spheres, attach to a free mRNA strand and begin to synthesize the associated protein (Fig. 7). The unfolded protein is shown as a strand of disks. Multiple ribosomes can simultaneously read a single mRNA strand, as illustrated in Fig. 7. Once a chain of amino acids is completely synthesized, the ‘unfolded protein’ folds into its associated protein, such as a repressor or β-galactosidase. All folded proteins are represented as spheres of different sizes and colours.
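Because both the DNA and the mRNA are represented as codons, transcription and translation reduce to simple operations on codon sequences. The following Python sketch illustrates this under assumptions not taken from the paper: sequences are given as the DNA coding strand, transcription is modelled as replacing thymine (T) with uracil (U), translation starts at the first AUG and ends at a stop codon, and CODON_TABLE is a deliberately tiny, hypothetical subset of the standard genetic code (codons missing from it map to "?").

```python
# Hypothetical subset of the standard genetic code (mRNA codon -> one-letter amino acid).
# A complete model would list all 64 codons; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine (also the usual start codon)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "GCU": "A",  # alanine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def transcribe(coding_strand: str) -> str:
    """mRNA carries the coding-strand sequence with uracil (U) in place of thymine (T)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon, as a ribosome would."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon: nothing is translated
    protein = []
    for codon in codons(mrna[start:]):
        amino_acid = CODON_TABLE.get(codon, "?")  # "?" = codon absent from this subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, translate(transcribe("CCATGTTTGGCAAAGCTTAAGG")) yields "MFGKA": the leading "CC" is skipped, translation begins at AUG, and the trailing "GG" after the UAA stop codon is ignored.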
Fig. 5. RNA polymerase attaches to the lac operon and searches for a viable promoter region to begin transcription.

Fig. 6. An mRNA strand transcribed by RNA polymerase.

Fig. 7. mRNA undergoes translation by multiple ribosomes. The ribosomes construct unfolded proteins based on the mRNA codon sequence.

3.4 Simulating Gene Regulation

The two kinds of proteins synthesized through transcription and translation in the model are repressor proteins and β-galactosidase enzymes. Repressors are represented as green spheres (Fig. 8). Repressors have a natural affinity for the operator region of the lac operon. They attempt to bind to the operator region and physically block transcription of the lacZ gene, which turns the lac operon off. If lactose, represented as a small purple sphere (Fig. 8), is introduced into the system, the lac operon turns on. This is accomplished through the formation of a repressor-lactose complex, which, owing to its conformational change, cannot bind to the operator region. This in turn allows RNA polymerase to transcribe the lacZ gene and produce β-galactosidase, which can then convert lactose into glucose and galactose.

Fig. 8. The repressor binds to the operator region of the lac operon and turns it off. A lactose-repressor complex has formed, preventing the repressor protein from binding to the operator.

3.5 A Swarm Approach

This model and its visualization are implemented using a swarm-based approach. Each individual element in the simulation is an independent agent governed by simple rules of interaction. Dynamic elements in the system move randomly, executing specific actions when they interact with other agents. An agent operates within the confines of the cell. Each agent follows a set of simple rules that define its actions in the system. As an example, Table 1 shows the behaviour of RNA polymerase.
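Random movement confined to the cell can be sketched as a bounded 3D random walk. The following Python sketch is an illustration under stated assumptions, not the authors' Java 3D code: it treats the cell as a sphere of radius cell_radius centred at the origin (a real E. coli cell is rod-shaped), uses a fixed step length, and rejects any step that would leave the cell. The optional target and bias parameters, also hypothetical, add a drift toward a point, which is one simple way to make an otherwise random walk directional.

```python
import math
import random
from typing import Optional

Vec = tuple[float, float, float]


def random_unit_vector(rng: random.Random) -> Vec:
    """Uniformly distributed direction: a normalised 3D Gaussian sample."""
    while True:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            return (x / norm, y / norm, z / norm)


def random_step(pos: Vec, step: float, cell_radius: float, rng: random.Random,
                target: Optional[Vec] = None, bias: float = 0.0) -> Vec:
    """Take one random step inside a spherical cell centred at the origin.

    If `target` is given, a drift of weight `bias` toward it is added to the
    random direction. A step that would leave the cell is rejected, and the
    agent stays where it is for this tick.
    """
    direction = random_unit_vector(rng)
    if target is not None and bias > 0.0:
        to_target = [t - p for t, p in zip(target, pos)]
        dist = math.sqrt(sum(c * c for c in to_target))
        if dist > 1e-12:
            mixed = [d + bias * c / dist for d, c in zip(direction, to_target)]
            norm = math.sqrt(sum(c * c for c in mixed))
            if norm > 1e-12:
                direction = tuple(c / norm for c in mixed)
    new_pos = tuple(p + step * d for p, d in zip(pos, direction))
    if math.dist(new_pos, (0.0, 0.0, 0.0)) <= cell_radius:
        return new_pos
    return pos
```

Rejecting out-of-bounds steps keeps every agent inside the cell without any global coordination; each agent only needs its own position and the cell boundary.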
The system provides each agent with basic services, such as the ability to move, rotate, and determine the existence and position of other agents. A scheduler implements time slicing by invoking each agent's Iterate method, which executes a specific action. These actions depend on the agent's current state and on the state of other agents in the system.

Table 1. Rules governing the behaviour of RNA polymerase as an example swarm agent. Pseudocode is presented with each state of RNA polymerase outlined; the corresponding biological actions are described below it.

Iterate pseudocode:

    case state of
      FLOATING:  /* initial state */
        if near DNA:
          attach self to nearest DNA codon
          state = DOCKED
        else:
          move randomly within the cell
      DOCKED:
        if promoter region is reached:
          state = READY_TO_TRANSCRIBE
        else:
          move along DNA to next codon
      READY_TO_TRANSCRIBE:
        create a new (empty) mRNA molecule
        state = TRANSCRIBING
      TRANSCRIBING:
        if a stop codon is reached:
          release constructed mRNA
          state = DETACHED
        else if blocked by a repressor:
          destroy partial mRNA
          state = DETACHED
        else:
          move to the next codon
          append codon to mRNA
      DETACHED:
        detach self from DNA
        move randomly
        state = FLOATING
    end case

Biological state and action:

Floating: RNA polymerase is usually found near DNA and moves about the cell in a random manner. In this state, RNA polymerase attempts to attach itself to the nearest free DNA strand.
Docked: Once RNA polymerase has docked onto a free DNA strand, it begins reading the DNA.
Ready to Transcribe: When a promoter/operator sequence is found, RNA polymerase initiates transcription.
Transcribing: RNA polymerase transcribes the DNA sequence into an mRNA molecule. It reads each codon sequentially and appends it to the growing mRNA molecule. This process is completed once RNA polymerase encounters the appropriate stop codon, after which RNA polymerase detaches itself from the DNA.
Detached: Once RNA polymerase has detached from the DNA, it resumes its random movement in the cell.

Fig. 9. Different stages of the lac operon simulation. (a) RNA polymerase searches for a promoter region. (b) RNA polymerase synthesizes an mRNA molecule. (c) Ribosomes synthesize protein molecules. (d) Repressors block RNA polymerase from transcribing the LacZ gene. (e) Lactose is introduced into the system. (f) Lactose binds to repressors, preventing them from blocking RNA polymerase; an RNA polymerase molecule has just started transcribing the LacZ gene.

There are two specific instances in which we have restricted the random movement of agents in order to model the lac operon properly. First, RNA polymerase has a natural affinity for DNA; hence, our RNA polymerase agents move randomly within a defined area around the DNA. Second, repressor proteins have a high affinity for the operator region of the lac operon; consequently, we direct each repressor towards the operator region while maintaining its random movements.

[Fig. 10 diagram labels: front, left, and floor projectors; mirrors; left screen.]

Fig. 10. The CAVE immersive virtual reality setup. Forward, left, right, and floor projections provide a 3-dimensional, stereoscopic image to the viewer. (Adapted from www.visualgenomics.ca)

A swarm-based approach affords a measure of modularity: agents can be added to and removed from the system, and because their movement is random, each run of the simulation produces different results. In contrast, common models of gene regulatory systems are usually scripted. New kinds of agents can also be introduced into the simulation, allowing other aspects of the lac operon to be modelled, such as the CAP activator complex that promotes the production of β-galactosidase. A decentralized swarm approach to modelling the lac operon closely approximates the way in which biologists view such systems.
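To make the rules of Table 1 concrete, the following Python sketch implements them as a state machine driven by a time-slicing scheduler. It is a simplified illustration, not the authors' Java implementation: the DNA is a circular list of codon strings with a single promoter index, "near DNA" is replaced by a hypothetical per-tick docking probability p_near_dna, docking picks a random codon as a stand-in for the nearest one, repressor blocking is reduced to a single operator_blocked flag, and random movement in the cell is omitted. All class and parameter names are hypothetical except iterate, which is modelled on the Iterate method described above.

```python
import random
from enum import Enum, auto

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons on the DNA coding strand


class State(Enum):
    FLOATING = auto()
    DOCKED = auto()
    READY_TO_TRANSCRIBE = auto()
    TRANSCRIBING = auto()
    DETACHED = auto()


class World:
    """Hypothetical shared environment: circular DNA plus the mRNAs released so far."""

    def __init__(self, codons: list[str], promoter: int, seed: int = 0):
        self.codons = codons           # circular DNA, one string per codon
        self.promoter = promoter       # index at which the promoter region is reached
        self.operator_blocked = False  # True while a repressor occupies the operator
        self.mrnas = []                # released mRNAs, each a list of codons
        self.rng = random.Random(seed)


class RNAPolymerase:
    """Agent following the FLOATING -> ... -> DETACHED rules of Table 1."""

    def __init__(self, world: World, p_near_dna: float = 0.3):
        self.world = world
        self.p_near_dna = p_near_dna   # assumed per-tick chance of being close to the DNA
        self.state = State.FLOATING
        self.pos = None                # codon index while attached to the DNA
        self.mrna = None               # mRNA under construction

    def _advance(self) -> None:
        self.pos = (self.pos + 1) % len(self.world.codons)  # the DNA is circular

    def iterate(self) -> None:
        w = self.world
        if self.state is State.FLOATING:
            if w.rng.random() < self.p_near_dna:
                self.pos = w.rng.randrange(len(w.codons))  # stand-in for "nearest codon"
                self.state = State.DOCKED
            # otherwise: random movement in the cell (omitted in this sketch)
        elif self.state is State.DOCKED:
            if self.pos == w.promoter:
                self.state = State.READY_TO_TRANSCRIBE
            else:
                self._advance()
        elif self.state is State.READY_TO_TRANSCRIBE:
            self.mrna = []
            self.state = State.TRANSCRIBING
        elif self.state is State.TRANSCRIBING:
            if w.codons[self.pos] in STOP_CODONS:
                w.mrnas.append(self.mrna)  # release the finished mRNA
                self.mrna = None
                self.state = State.DETACHED
            elif w.operator_blocked:
                self.mrna = None           # destroy the partial mRNA
                self.state = State.DETACHED
            else:
                self._advance()
                self.mrna.append(w.codons[self.pos])
        elif self.state is State.DETACHED:
            self.pos = None                # leave the DNA; random movement omitted
            self.state = State.FLOATING


class Scheduler:
    """Time slicing: each tick invokes every agent's iterate() once."""

    def __init__(self, agents):
        self.agents = agents

    def run(self, ticks: int) -> None:
        for _ in range(ticks):
            for agent in self.agents:
                agent.iterate()


if __name__ == "__main__":
    world = World(["GGG", "TTT", "ATG", "TTT", "GGC", "TAA", "CCC"], promoter=1, seed=1)
    Scheduler([RNAPolymerase(world, p_near_dna=1.0)]).run(50)
    print(world.mrnas)
```

With this seven-codon DNA and a docking probability of 1.0, each released mRNA is ['ATG', 'TTT', 'GGC', 'TAA']; setting world.operator_blocked = True before running causes every partial transcript to be destroyed. Because each agent only reads shared state and its own fields, further agent types (ribosomes, repressors) could be added to the scheduler's list without changing the polymerase rules.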
Although our simulations have so far been tested only with a relatively small number (fewer than 100) of interacting agents, the system is designed to handle a much larger number of proteins and other cellular entities. This would bring it closer to accurate simulations of the massively parallel interaction processes within a cell, which involve thousands of particles or more.

A brief sequence of images from the lac operon simulation is shown in Fig. 9. This sequence illustrates different stages of the lac operon, including the action of RNA polymerase (a), transcription (b), translation (c), repressor inhibition (d), and lactose-repressor interaction (e, f).

3.6 The CAVE Automated Virtual Environment

The CAVE Automated Virtual Environment is an immersive 3D environment that allows users to view images in stereoscopic 3D. Multiple image projections, namely a forward, right, left, and floor projection, are integrated into a stereo image (Fig. 10). This particular CAVE is the first such immersive system to allow implementation and development using Java 3D™ [8]. We developed our visualization model using the Java 3D™ Runtime Environment. Integrating the visualization into the CAVE environment at the Sun Center of Excellence for Visual Genomics at the University of Calgary provides viewers with a virtual 3D model of the lac operon. Users wear stereoscopic glasses to view the simulation (Fig. 11). The CAVE tracks the movements of these glasses and automatically zooms the simulation in and out as the user moves through the modelled environment. An interactive wand (3D cursor) lets users rotate and manipulate the simulation into any desired orientation. The immersive platform provides enough space for several people to discuss and interact with the model simultaneously. The visualization of the lac operon simulation within the CAVE serves as an innovative visualization, simulation, and teaching tool for students, educators, and researchers who wish to study gene regulation.

Fig. 11. The lac operon model and its 3D visualization in the CAVE at the University of Calgary.

4 Conclusion and Future Work

We have presented a 3D visualization model of the lac operon gene regulatory system. The model simulates many important aspects of the system, including basic gene processes such as transcription and translation. Developed using the Java 3D™ Runtime Environment, the visualization was integrated into and exported to the CAVE, providing users with a unique 3D virtual model. We believe that such visualizations will greatly support biologists in understanding complex gene regulatory systems and decentralized, massively parallel biological systems in general.

Future work includes integrating additional aspects of the lac operon not covered in the current model. These include the CAP (catabolite activator protein) complex, which activates transcription of the lacZ-Y-A genes, and thus the production of β-galactosidase, depending on glucose concentration. We would also like to incorporate an evolutionary computation engine, such as the Evolvica system [3], into the simulation. Evolving this gene system may lead to interesting and complex behaviours that can be compared with other gene regulatory systems evolved by nature. Further information about the lactose operon and other swarm-based models of biological systems, such as the λ-switch and an artificial immune system, is available at: http://www.cpsc.ucalgary.ca/~jacob/ESD/LacOperon.

5 Acknowledgement

We would like to thank Julie Andreotti for her help with the implementation of the 3D visualization model. We would also like to thank Dr. Christoph Sensen and Paul Gordon for their help in incorporating this simulation into the CAVE. Further information about the CAVE environment and research in visual genomics is available at: http://www.visualgenomics.ca.
References

1. J. R. Beckwith and D. Zipser, editors. The Lactose Operon. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1970.
2. J. Collado-Vides. Towards a grammatical paradigm for the study of the regulation of gene expression. In B. Goodwin and P. Saunders, editors, Theoretical Biology: Epigenetic and Evolutionary Order from Complex Systems, pages 211–224. Johns Hopkins University Press, Baltimore, MD, 1992.
3. C. Jacob. Illustrating Evolutionary Computation with Mathematica. Morgan Kaufmann Publishers, San Francisco, CA, 2001.
4. F. Jacob and J. Monod. Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology, 3:318–356, 1961.
5. H. Matsuno, A. Doi, A. Tanaka, H. Aoshima, Y. Hirata, and S. Miyano. Genomic Object Net: Basic architecture for representing and simulating biopathways. In Ninth International Conference on Intelligent Systems for Molecular Biology, Copenhagen, Denmark, 2001.
6. B. Müller-Hill. The lac Operon: A Short History of a Genetic Paradigm. Walter de Gruyter, Berlin, 1996.
7. M. Ptashne and A. Gann. Genes & Signals. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2002.
8. C. W. Sensen. Using CAVE technology for functional genomics studies. Diabetes Technology & Therapeutics, 4:867–871, 2002.
9. G. Suen and C. Jacob. A symbolic and graphical gene regulation model of the lac operon. In Fifth International Mathematica Symposium, pages 73–80, London, England, 2003. Imperial College Press.
10. M. Tomita, K. Hashimoto, K. Takahashi, Y. Matsuzaki, R. Matsushima, K. Saito, K. Yugi, F. Miyoshi, H. Nakano, S. Tanida, Y. Saito, A. Kawase, N. Watanabe, T. Shimizu, and Y. Nakayama. The E-CELL project: Towards integrative simulation of cellular processes. New Generation Computing, 18(1):1–12, 2000.
11. J. D. Watson and F. H. C. Crick. A structure for deoxyribose nucleic acid. Nature, 171:737–738, 1953.
