DNA in Action!
A 3D Swarm-based Model of a Gene Regulatory System
Ian Burleigh, Garret Suen, and Christian Jacob
Department of Computer Science, University of Calgary,
2500 University Drive NW,
Calgary, Alberta, Canada T2N 1N4
Phone: (403) 220-7682, Fax: (403) 284-4707
{burleigh, sueng, jacob}@cpsc.ucalgary.ca
http://www.cpsc.ucalgary.ca/~jacob/ESD/
Abstract. We present a 3D swarm-based model of the classic lactose (lac) operon gene regulatory system. The lac operon is a well-understood genetic switch that regulates itself depending on whether lactose is available as an energy source. Our model includes a 3-dimensional visualization that simulates proteins interacting with DNA and incorporates many of the important aspects of the gene system. It uses a decentralized swarm approach in which multiple agents act independently to produce complex behaviour. The visualization is implemented using the Java 3D™ Runtime Environment and is also being used in a CAVE Automated Virtual Environment.
Keywords: Biological Agents, Bioinformatics, Simulation, Swarm Intelligence
ACAL 2003, First Australian Conference on Artificial Life, Canberra, Australia.
1 Introduction
Current research in genetics and genomics focuses on understanding the genetics of model organisms, such as the bacterium Escherichia coli, the nematode Caenorhabditis elegans, and the fruit fly Drosophila melanogaster. Working with these simple biological models helps to elucidate more complex processes found in higher-order gene networks. One of the major advances in this area of research is the use of computers as an integral research tool, leading to the new interdisciplinary fields of Bioinformatics and Biological Computing. Innovative computer graphics and specialized visualization technology, such as the CAVE Automated Virtual Environment, provide biologists with unprecedented tools for research in ‘virtual laboratories’.
In this paper, we present a 3D model of the lactose (short: lac) operon, one of the most basic and well-understood biological models of gene regulation. Several computer models of the lac operon exist, including simple grammar-based approaches [2], Functional Hybrid Petri Net models [5], systems based on rewrite rules [9], and the E-CELL project [10]. However, current visualization models of the lac operon only involve 2-dimensional space. We propose a 3-dimensional model of the lac operon that simulates many important aspects of the system, including the processes of transcription and translation.
The paper is organized as follows. In Section 2, we present a brief synopsis of the lac operon gene regulatory system as it is commonly understood in biology. In Section 3, we discuss our implementation of the lac operon, highlighting many of the modelled processes and structures. Finally, we look at the swarm approach used to simulate this gene regulatory system and detail its visualization in a CAVE immersive environment.
2 The lac Operon: A Gene Regulatory System
An operon is a group of adjacent genes on bacterial DNA that are regulated together. Jacques Monod and François Jacob first studied the lac operon in the 1960s [4]. Found in the bacterium Escherichia coli (E. coli), it stands as one of the most important findings in genetics, as it is one of the most basic gene regulatory systems known, and is consequently used as a basis for studies of more complex gene systems.
The lac operon, in particular, is a gene system that is responsible for converting the sugar lactose into glucose, a key energy source for the bacterium. E. coli is a prokaryotic organism without a nucleus that is normally found in a lactose-rich environment, such as the human gut. E. coli requires glucose for much of its growth and has evolved a solution for obtaining glucose from its environment by converting lactose into glucose and galactose. This conversion is accomplished through the enzyme β-galactosidase, which is one of the products of the lac operon. In the presence of lactose, the lac operon has the ability to turn itself on and produce β-galactosidase. When lactose is no longer present, the lac operon turns itself off, which stops the production of β-galactosidase and conserves cellular resources. In this manner, the lac operon is capable of self-regulation [1], [6], [7].
2.1 Self-Regulation of the lac Operon
Gene-based self-regulation is an emergent property resulting from the interaction of proteins, enzymes, and DNA. In order to understand how this ‘emergence’ is accomplished, we will have to look at the lactose operon in much closer detail. The main components of the lac operon as a regulatory unit on the bacterial DNA are four genes: lacZ, lacY, lacA, and lacI.
Gene Complex 1: lacZ-Y-A The lacZ-Y-A genes appear as a single module
and are located adjacent to one another on the operon (Fig. 1a). A control
complex consisting of an operator and a promoter precedes these three genes.
The operator controls the protein production from these genes. Producing a
protein from a given gene is accomplished through the action of RNA polymerase.
RNA polymerase reads a sequence of genes, resulting in the production of their
corresponding proteins through the processes of transcription and translation
(Section 2.2).
Gene Complex 2: lacI The lacI gene, the second key module, is located upstream of the main lac complex (Fig. 1a). It likewise contains a promoter region, and produces proteins through the same action of RNA polymerase. The lacI protein product is known as a repressor, which has the ability to bind to the operator region and prevent RNA polymerase from reading the lacZ-Y-A genes. Hence, the repressor serves as the basic control mechanism for the lac operon.
Turning the Switch When lactose is present in the system, the lac operon
can turn itself on (Fig. 1b). This is accomplished through the binding of lactose
to the repressor to form a repressor-lactose complex. Due to conformational
changes, the repressor-lactose complex cannot bind to the operator region of the
lacZ-Y-A genes any more. This allows RNA polymerase to now read lacZ, lacY,
and lacA, producing β-galactosidase, lactose permease, and transacetylase, respectively. Among these three gene products, β-galactosidase is the enzyme that
converts lactose into glucose and galactose, whereas lactose permease enhances
the movement of lactose from the outer environment into the cell. Transacetylase
does not seem to play a role in this regulatory system.
Once lactose is removed from the system, the repressor is, again, free to bind
to the operator region and stop the production of β-galactosidase (Fig. 1a). In
this manner, the lac operon is able to regulate its gene products, thus conserving
cellular resources.
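The switching logic described above can be summarized in a few lines. The following is a minimal sketch in Python and is not part of the model's Java 3D implementation; the function names are assumptions introduced for illustration, and the all-or-nothing boolean treatment ignores repressor concentrations and binding kinetics.

```python
def repressor_can_bind(lactose_present):
    """A repressor binds the operator only if no lactose has changed its shape."""
    return not lactose_present


def lacz_is_transcribed(lactose_present):
    """RNA polymerase can read lacZ-Y-A only while the operator is unblocked."""
    operator_blocked = repressor_can_bind(lactose_present)
    return not operator_blocked


# Print the two rows of the switch's truth table.
for lactose in (False, True):
    state = "on" if lacz_is_transcribed(lactose) else "off"
    print(f"lactose present: {lactose} -> operon {state}")
```

In this simplified view the operon state is a pure function of lactose availability; the swarm model instead lets the same behaviour emerge from individual repressor and lactose agents meeting in space.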
2.2 Transcription and Translation
Fig. 1. Schematic of the lac operon switch. (a) After RNA polymerase docks onto Pi, the LacI promoter site, it transcribes the LacI gene into its mRNA representation, which is then translated by ribosomes into the repressor protein I. This repressor binds to the LacZ-Y-A operator site, which in turn blocks RNA polymerase; hence, none of the three genes are expressed. (b) When lactose enters the cell, it induces a shape change in the repressors that prevents them from binding to the operator. Consequently, the LacZ-Y-A genes are accessible to RNA polymerase and are expressed as proteins Z, Y, and A.
Once genes are ‘switched on’, RNA polymerase has access to the encoding regions for the structural genes on the DNA. The processes of transcription and translation serve as intermediary steps in producing proteins from a given gene. Transcription is the process of converting deoxyribonucleic acid (DNA) into an intermediate molecule known as messenger ribonucleic acid (mRNA). The enzyme RNA polymerase is responsible for this particular conversion, which proceeds as follows:
1. RNA polymerase searches along the DNA structure until it encounters an
appropriate promoter region.
2. Starting at the promoter region, RNA polymerase begins to synthesize mRNA
based on the genes found adjacent to the promoter.
3. Once transcription is complete, the mRNA strand is free to undergo a second conversion process (translation), whereas RNA polymerase repeats the process of transcription.
During translation a protein is synthesized from an mRNA strand. This mRNA-to-protein mapping is achieved through the action of two types of molecules, ribosomes and transfer RNA (tRNA), as follows:
1. A ribosome locates and attaches to a free mRNA strand.
2. The ribosome begins to read the strand and synthesizes a chain of amino
acids with the support of tRNA. The chain then folds into a 3-dimensional
protein structure. Multiple ribosomes can simultaneously read and synthesize
proteins from a single mRNA strand.
3. Once translation is complete, the ribosome detaches from the mRNA strand
and releases the newly made protein.
The lac operon is a complex gene-regulatory process that involves the interactions of many agents. To model these processes as they occur spatially in
nature, we present a 3D visualization of this simulation.
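The transcription and translation steps of this section can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the codon table is a deliberately tiny subset of the standard genetic code, the input is taken to be the coding strand so that transcription reduces to replacing T with U, and the example gene is invented for illustration rather than taken from the lac operon.

```python
# Tiny subset of the standard genetic code (mRNA codon -> amino acid).
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        # Raises KeyError for codons outside the small table above.
        protein.append(CODON_TABLE[codon])
    return protein


gene = "ATGTTTGGCAAATAA"  # hypothetical example gene, not a lac sequence
mrna = transcribe(gene)
print(mrna)             # AUGUUUGGCAAAUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Lys']
```

The sketch leaves out everything spatial (searching for a promoter, ribosome attachment, protein folding), which is exactly what the agent-based model adds.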
3 Visualization in 3D
Our model of the lac operon visualizes many of the important structures and events associated with the gene regulation system. This includes DNA and its associated gene structures, transcription and translation, and the self-regulatory controls that govern the interactions and system dynamics. The 3-dimensional model is implemented using a swarm approach with multiple independent ‘agents’ following simple rules of interaction. The visualization, developed as a 2D projection on a normal computer screen (Fig. 2), is further enhanced through stereoscopic 3D in a CAVE immersive environment (Section 3.6).
Fig. 2. The lac operon 3D visualization and its graphical user interface.
3.1 Modelling Circular DNA
DNA in bacteria is similar to that found in higher organisms, except that bacterial DNA is circular. We represent the actual encoding of the lactose operon gene as a circular DNA double helix with its characteristic Watson-Crick complementarity pattern (Fig. 3) [11]. DNA consists of four kinds of bases: Adenine, Cytosine, Guanine, and Thymine. A group of three such bases is known as a codon, which codes for a specific amino acid; amino acids are the basic building blocks of proteins. Due to the large number of bases that make up the genes involved in the lac operon, we represent the genetic information as codons. These codons directly correspond to the amino acid composition of their associated proteins. In nature, there are 20 amino acids that make up proteins (see the right margin of Fig. 2). We visualize these codons as colour-coded cylinders that make up the DNA strand (Fig. 4).
Fig. 3. The E. coli circular DNA represented as a double helix composed of colour-encoded codons.
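The codon-level, circular representation described in this section can be sketched as a small data structure. This is an illustrative Python sketch only: the class name, the example sequence, and the colour palette are assumptions and do not reflect the colours or classes used in the actual visualization.

```python
# Hypothetical palette: these colours are illustrative assumptions,
# not the colours used in the authors' visualization.
CODON_COLOURS = {"ATG": "green", "TTT": "blue", "GGC": "orange"}


def to_codons(bases):
    """Group a base sequence into codons, dropping an incomplete trailing group."""
    usable = len(bases) - len(bases) % 3
    return [bases[i:i + 3] for i in range(0, usable, 3)]


class CircularDNA:
    """A ring of codons; indices wrap around, so there is no first or last codon."""

    def __init__(self, bases):
        self.codons = to_codons(bases.upper())

    def codon_at(self, index):
        return self.codons[index % len(self.codons)]

    def colour_at(self, index):
        # Codons missing from the palette fall back to grey.
        return CODON_COLOURS.get(self.codon_at(index), "grey")


dna = CircularDNA("ATGTTTGGCAAA")
for i in range(5):  # index 4 wraps around to the first codon
    print(i, dna.codon_at(i), dna.colour_at(i))
```

Wrapping the index with the modulo operator is what lets an agent scanning the strand pass the end of the list and continue from the beginning, as it would on a circular chromosome.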
3.2 Modelling Gene Structures
There are two distinct gene regions in the lac operon: the lacI and the lacZ-lacY-lacA region (compare Section 2.1). For the purposes of this model, we have chosen to model only the lacI and lacZ gene regions. The lacY and lacA genes do not greatly impact the understanding of the system and are therefore not included. The lacI and lacZ gene regions are labelled accordingly on the model (Fig. 4). In addition, the promoter and operator regions that precede the lacZ gene are also included. To further clarify the model, the codon numbering is shown as well, highlighting various aspects of the DNA coding sequences. Following conventional models of DNA, the model includes the −10 TATA box and the −35 TTGACA RNA polymerase recognition sites, positioned relative to the promoter region (Fig. 4).
Fig. 4. The operator region and the lacZ gene. Also shown are the −10 TATA box and the −35 TTGACA RNA polymerase recognition sites.
3.3 Modelling Transcription and Translation
RNA polymerase (Pol), the initiator of transcription, is represented as a dark brown sphere (Fig. 5). RNA polymerase has a natural affinity for DNA and is usually found near or on the DNA, which we have incorporated into the model. Once RNA polymerase attaches to a DNA region, it starts scanning along the chain of nucleotide bases, which the model represents as codons. Transcription begins once RNA polymerase has encountered a viable promoter region. Genes adjacent to the promoter region are transcribed into mRNA, represented as a twisted single-strand helix (Fig. 6). Again, we have taken the liberty of representing the mRNA gene material as codons corresponding to the actual nucleotide base sequence. As before, codons are represented as colour-coded cylinders corresponding to the amino acids they encode (Fig. 2). The process of translation begins once the mRNA strand has been synthesized. Ribosomes, represented as small red spheres, attach to a free mRNA strand and begin to synthesize the associated protein (Fig. 7). The unfolded protein is shown as a strand of disks. Multiple ribosomes can simultaneously read a single mRNA strand, as illustrated in Figure 7. Once a chain of amino acids is completely synthesized, the ‘unfolded protein’ turns into its associated protein, such as a repressor or β-galactosidase. All folded proteins are represented as spheres of different sizes and colours.
3.4 Simulating Gene Regulation
The two kinds of proteins synthesized through the processes of transcription and translation are repressor proteins and β-galactosidase enzymes. Repressors are represented as green spheres (Fig. 8). Repressors have a natural affinity for the operator region of the lac operon. They attempt to bind to the operator region and physically block transcription of the lacZ gene. This turns the lac operon off. If lactose, represented as a small purple sphere (Fig. 8), is introduced into the system, the lac operon will turn on. This is accomplished through the formation of a repressor-lactose complex, which, due to conformational changes, cannot bind to the operator region. This, in turn, allows RNA polymerase to transcribe the lacZ gene and produce β-galactosidase, which can then convert lactose into glucose and galactose.
Fig. 5. RNA polymerase attaches to the lac operon and searches for a viable promoter region to begin transcription.
Fig. 6. An mRNA strand transcribed by RNA polymerase.
Fig. 7. mRNA undergoes translation by multiple ribosomes. The ribosomes construct unfolded proteins based on the mRNA codon sequence.
Fig. 8. The repressor binds to the operator region of the lac operon and turns it off.
A lactose-repressor complex has formed, preventing the repressor protein from binding
to the operator.
3.5 A Swarm Approach
Implementation of this model and its visualization is achieved through a swarm-based approach. Each individual element in the simulation is an independent agent that operates within the confines of the cell and follows a set of simple rules defining its actions in the system. Dynamic elements move randomly, executing specific actions when interacting with other agents. As an example, we show the behaviour of RNA polymerase in Table 1. The system provides each agent with basic services, such as the ability to move, rotate, and determine the existence and position of other agents. A scheduler implements time slicing by invoking each agent’s Iterate method, which executes a specific action. These actions are based on the agent’s current state and the state of other agents in the system.

Iterate pseudocode                        Biological state and action
case state of
  FLOATING: /* initial state */           Floating: RNA polymerase is usually
    if near DNA:                          found near DNA and moves about the cell
      attach self to nearest DNA codon    in a random manner. In this state, RNA
      state = DOCKED                      polymerase will attempt to attach itself
    else:                                 to the nearest free DNA strand.
      move randomly within the cell

  DOCKED:                                 Docked: Once RNA polymerase has docked
    if promoter region is reached:        onto a free DNA strand, it will begin
      state = READY_TO_TRANSCRIBE         reading the DNA.
    else:
      move along DNA to next codon

  READY_TO_TRANSCRIBE:                    Ready to Transcribe: When a promoter/
    create a new (empty) mRNA molecule    operator sequence is found, RNA
    state = TRANSCRIBING                  polymerase will initiate transcription.

  TRANSCRIBING:                           Transcribing: RNA polymerase will
    if a stop codon is reached:           transcribe the DNA sequence into an mRNA
      release constructed mRNA            molecule. RNA polymerase reads each
      state = DETACHED                    codon sequentially and appends it to the
    else if blocked by a repressor:       growing mRNA molecule. This process is
      destroy partial mRNA                completed once RNA polymerase encounters
      state = DETACHED                    the appropriate stop codon. RNA
    else:                                 polymerase will then detach itself from
      move to the next codon              the DNA.
      append codon to mRNA

  DETACHED:                               Detached: Once RNA polymerase has
    detach self from DNA                  detached from DNA, it will again resume
    move randomly                         its random movement in the cell.
    state = FLOATING
end case

Table 1. Rules governing the behaviour of RNA polymerase as an example swarm agent. Pseudocode is presented with each state of RNA polymerase outlined. The corresponding biological actions are described in the right column.
Fig. 9. Different stages of the lac operon simulation. (a) RNA polymerase searches for a promoter region. (b) RNA polymerase synthesizes an mRNA molecule. (c) Ribosomes synthesize protein molecules. (d) Repressors block RNA polymerase from transcribing the LacZ gene. (e) Lactose is introduced into the system. (f) Lactose binds to repressors, preventing them from blocking RNA polymerase. An RNA polymerase molecule has just started transcribing the LacZ gene.
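The rules of Table 1 and the scheduler's time slicing can be made concrete with a small runnable sketch. The following Python version is an illustration under simplifying assumptions, not the authors' Java 3D implementation: the DNA is a plain list of symbols in which "P" marks the promoter and "STOP" the stop codon, the agent is always treated as being near the DNA, and a single boolean stands in for a repressor agent sitting on the operator. All class, method, and parameter names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    FLOATING = auto()
    DOCKED = auto()
    READY_TO_TRANSCRIBE = auto()
    TRANSCRIBING = auto()
    DETACHED = auto()


class RnaPolymerase:
    """One agent. `dna` is a list: 'P' = promoter, 'STOP' = stop codon, else a codon."""

    def __init__(self, dna, operator_blocked=False):
        self.dna = dna
        self.operator_blocked = operator_blocked  # stands in for a bound repressor
        self.state = State.FLOATING
        self.position = None
        self.mrna = None
        self.released = []  # completed mRNA strands

    def iterate(self):
        """Perform one time slice, following the case analysis of Table 1."""
        if self.state is State.FLOATING:
            # Simplification: always "near DNA", so dock at codon 0 immediately.
            self.position = 0
            self.state = State.DOCKED
        elif self.state is State.DOCKED:
            if self.dna[self.position] == "P":
                self.state = State.READY_TO_TRANSCRIBE
            else:
                self.position = (self.position + 1) % len(self.dna)
        elif self.state is State.READY_TO_TRANSCRIBE:
            self.mrna = []
            self.state = State.TRANSCRIBING
        elif self.state is State.TRANSCRIBING:
            if self.dna[self.position] == "STOP":
                self.released.append(self.mrna)  # release constructed mRNA
                self.mrna = None
                self.state = State.DETACHED
            elif self.operator_blocked:
                self.mrna = None  # destroy partial mRNA
                self.state = State.DETACHED
            else:
                self.position = (self.position + 1) % len(self.dna)
                self.mrna.append(self.dna[self.position])
        elif self.state is State.DETACHED:
            self.position = None
            self.state = State.FLOATING


def run(agents, steps):
    """Minimal scheduler: each time slice invokes every agent's iterate() once."""
    for _ in range(steps):
        for agent in agents:
            agent.iterate()


free = RnaPolymerase(["GGC", "P", "ATG", "TTT", "STOP"])
blocked = RnaPolymerase(["GGC", "P", "ATG", "TTT", "STOP"], operator_blocked=True)
run([free, blocked], 9)
print(free.released)     # [['ATG', 'TTT', 'STOP']]
print(blocked.released)  # []
```

Because each agent only inspects its own state and its local surroundings, new agent types can be added to the list passed to `run` without changing the scheduler.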
There are two specific instances where we have restricted the random movement of agents in order to properly model the lac operon. RNA polymerase has
a natural affinity for DNA. Hence, our RNA polymerase ‘agents’ will randomly
move within a defined area located around the DNA. In addition, repressor
proteins have a high affinity for the operator region of the lac operon. Consequently, we direct the repressor towards the operator region, while maintaining
the repressor’s random movements.
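One simple way to combine random movement with an attraction towards a target, as described for the repressor, is a random walk with drift. The sketch below is an assumption about how such directed movement could be realized, not a description of the authors' code; the function name, the `bias` and `step` parameters, and the coordinates are all hypothetical.

```python
import random


def biased_step(position, target, bias=0.3, step=1.0, rng=random):
    """Return a new 3D position: uniform random noise plus a pull towards `target`.

    bias = 0 gives a pure random walk; larger values give stronger attraction.
    """
    new_position = []
    for p, t in zip(position, target):
        noise = rng.uniform(-step, step)  # random component of the movement
        drift = bias * (t - p)            # directed component towards the target
        new_position.append(p + noise + drift)
    return tuple(new_position)


def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


rng = random.Random(0)             # fixed seed so the run is repeatable
operator = (0.0, 0.0, 0.0)         # hypothetical operator location
repressor = (50.0, 50.0, 50.0)     # hypothetical starting point
for _ in range(200):
    repressor = biased_step(repressor, operator, rng=rng)
print(distance(repressor, operator))  # small compared with the starting distance
```

The same function with `bias=0` and a clamp on the distance from the DNA would model the restricted random movement of RNA polymerase.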
Fig. 10. The CAVE immersive virtual reality setup, showing the mirrors, the left screen, and the floor, front, and left projectors. Forward, left, right, and floor projections provide a 3-dimensional, stereoscopic image to the viewer. (Adapted from www.visualgenomics.ca)
A swarm-based approach affords a measure of modularity, as agents can be added to and removed from the system. Because the agents move randomly, the simulation produces different results each time it is run. This is in contrast to common models of gene regulatory systems, which are usually scripted. New kinds of agents can also be introduced into the simulation, which allows other aspects of the lac operon to be modelled, such as the CAP activator complex that promotes the production of β-galactosidase. A decentralized swarm approach to modelling the lac operon closely approximates the way in which biologists view such systems. Although our simulations have so far only been tested for a relatively small number (< 100) of interacting agents, the system is designed to handle a much larger number of proteins and other cellular entities, thus getting closer to more accurate simulations of massively-parallel interaction processes within a cell that involve thousands of particles or more.
A brief sequence of images from the lac operon simulation is shown in Figure 9. This sequence illustrates different stages of the lac operon including the action of RNA polymerase (a), transcription (b), translation (c), repressor inhibition (d), and lactose-repressor interaction (e, f).
3.6 The CAVE Automated Virtual Environment
The CAVE Automated Virtual Environment is an immersive 3D environment that allows users to view images in stereoscopic 3D. Multiple image projections (forward, right, left, and floor) are integrated into a single stereo image (Fig. 10). This particular CAVE is the first such immersive system to allow for implementation and development using Java 3D™ [8]. We developed our visualization model using the Java 3D™ Runtime Environment.
Integration of the visualization into the CAVE environment in the Sun Center of Excellence for Visual Genomics at the University of Calgary provides viewers with a virtual 3D model of the lac operon. Users wear stereoscopic glasses in order to view the simulation (Fig. 11). The CAVE can track movements of these glasses, automatically zooming the simulation in and out as the user moves through the modelled environment. An interactive wand (3D cursor) gives users the ability to rotate and manipulate the simulation in any desired orientation. The immersive platform provides enough space for several people to simultaneously discuss and interact with the model. The lac operon simulation within the CAVE serves as an innovative visualization, simulation, and teaching tool for students, educators, and researchers who wish to study gene regulation.
4 Conclusion and Future Work
We have presented a 3D visualization model of the lac operon gene regulatory system. The model focuses on simulating many important aspects of the system, including basic gene processes such as transcription and translation. Developed using the Java 3D™ Runtime Environment, the visualization was integrated and exported into the CAVE, providing users with a unique 3D virtual model. We believe that such visualizations will greatly support biologists in their understanding of complex gene regulatory systems, and decentralized, massively-parallel biological systems in general.
Future work includes integrating additional aspects of the lac operon not covered in the current model. This includes the catabolite activator protein (CAP) complex, which acts as an initiation factor for promoting the production of β-galactosidase based on glucose concentrations. We would also like to incorporate an evolutionary computation engine into the simulation, such as the Evolvica system [3]. Evolution of this gene system may lead to interesting and complex behaviours that can be compared with other gene regulatory systems evolved by nature. Further information about the lactose operon and other swarm-based models of biological systems, such as the λ-switch and an artificial immune system, is available at:
http://www.cpsc.ucalgary.ca/~jacob/ESD/LacOperon.
Fig. 11. The lac operon model and its 3D visualization in the CAVE at the University of Calgary.
5 Acknowledgement
We would like to thank Julie Andreotti for her help with the implementation of the 3D visualization model. We would also like to thank Dr. Christoph Sensen and Paul Gordon for their help in incorporating this simulation into the CAVE. Further information about the CAVE environment and research in visual genomics is available at:
http://www.visualgenomics.ca.
References
1. J. R. Beckwith and D. Zipser, editors. The Lactose Operon. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1970.
2. J. Collado-Vides. Towards a grammatical paradigm for the study of the regulation of gene expression. In B. Goodwin and P. Saunders, editors, Theoretical Biology. Epigenetic and Evolutionary Order from Complex Systems, pages 211–224. Johns Hopkins University Press, Baltimore, MD, 1992.
3. C. Jacob. Illustrating Evolutionary Computation with Mathematica. Morgan Kaufmann Publishers, San Francisco, CA, 2001.
4. F. Jacob and J. Monod. Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology, 3:318–356, 1961.
5. H. Matsuno, A. Doi, A. Tanaka, H. Aoshima, Y. Hirata, and S. Miyano. Genomic object net: Basic architecture for representing and simulating biopathways. In Ninth International Conference on Intelligent Systems for Molecular Biology, Copenhagen, Denmark, 2001.
6. B. Müller-Hill. The lac Operon: A Short History of a Genetic Paradigm. Walter de Gruyter, Berlin, 1996.
7. M. Ptashne and A. Gann. Genes & Signals. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2002.
8. C. W. Sensen. Using CAVE technology for functional genomics studies. Diabetes Technology & Therapeutics, 4:867–871, 2002.
9. G. Suen and C. Jacob. A symbolic and graphical gene regulation model of the lac operon. In Fifth International Mathematica Symposium, pages 73–80, London, England, 2003. Imperial College Press.
10. M. Tomita, K. Hashimoto, K. Takahashi, Y. Matsuzaki, R. Matsushima, K. Saito, K. Yugi, F. Miyoshi, H. Nakano, S. Tanida, Y. Saito, A. Kawase, N. Watanabe, T. Shimizu, and Y. Nakayama. The E-CELL project: Towards integrative simulation of cellular processes. New Generation Computing, 18(1):1–12, 2000.
11. J. D. Watson and F. H. C. Crick. A structure for deoxyribose nucleic acid. Nature, 171:737–738, 1953.