Download Towards Understanding the Origin of Genetic Languages

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Metabolism wikipedia , lookup

Molecular ecology wikipedia , lookup

Proteolysis wikipedia , lookup

Biosynthesis wikipedia , lookup

Personalized medicine wikipedia , lookup

Genetic engineering wikipedia , lookup

Biochemistry wikipedia , lookup

Genetic code wikipedia , lookup

Transcript
Towards Understanding
the Origin of Genetic Languages
Why do living organisms use
4 nucleotide bases and 20 amino acids?
Apoorva Patel
Centre for High Energy Physics and
Supercomputer Education and Research Centre
Indian Institute of Science, Bangalore
26 January 2013, IIT, Jodhpur
Origin of Genetic Languages – p.1/28
What is Life?
Life is fundamentally a non-equilibrium process,
commonly charaterised in terms of two basic phenomena.
Origin of Genetic Languages – p.2/28
What is Life?
Life is fundamentally a non-equilibrium process,
commonly charaterised in terms of two basic phenomena.
Metabolism: Many biochemical processes are needed to
sustain a living organism. That requires a continuous
supply of free energy, extracted from the environment.
(The ultimate source of this energy is gravity—the only
interaction that is not in equilibrium.)
Reproduction: A particular structure cannot survive forever,
because of environmental disturbances and damages.
So life perpetuates itself by a succession of generations.
Origin of Genetic Languages – p.2/28
What is Life?
Life is fundamentally a non-equilibrium process,
commonly charaterised in terms of two basic phenomena.
Metabolism: Many biochemical processes are needed to
sustain a living organism. That requires a continuous
supply of free energy, extracted from the environment.
(The ultimate source of this energy is gravity—the only
interaction that is not in equilibrium.)
Reproduction: A particular structure cannot survive forever,
because of environmental disturbances and damages.
So life perpetuates itself by a succession of generations.
Both these phenomena are sustaining/protecting/improving
something, and that too against the odds.
But what is it that is being sustained/protected/improved?
Origin of Genetic Languages – p.2/28
The meaning of it all
All living organisms are made up of atoms.
Atoms are fantastically indestructible.
They just get rearranged in different ways.
Each of us would have a billion atoms that once belonged
to the Buddha, or Genghis Khan, or Isaac Newton.
Origin of Genetic Languages – p.3/28
The meaning of it all
All living organisms are made up of atoms.
Atoms are fantastically indestructible.
They just get rearranged in different ways.
Each of us would have a billion atoms that once belonged
to the Buddha, or Genghis Khan, or Isaac Newton.
It is not the atoms themselves but their arrangements,
which carries biochemical information.
Living organisms evolve, by altering atomic arrangements.
Molecules keep on changing, and atoms are shuffled.
Origin of Genetic Languages – p.3/28
The meaning of it all
All living organisms are made up of atoms.
Atoms are fantastically indestructible.
They just get rearranged in different ways.
Each of us would have a billion atoms that once belonged
to the Buddha, or Genghis Khan, or Isaac Newton.
It is not the atoms themselves but their arrangements,
which carries biochemical information.
Living organisms evolve, by altering atomic arrangements.
Molecules keep on changing, and atoms are shuffled.
Hardware is recycled,
while software is improved!
Preservation of information requires complex structures!!
Origin of Genetic Languages – p.3/28
Typical scales:
Atoms: H, C, N, O, and infrequently P, S.
Nucleotide bases and amino acids: 10-20 atoms
Peptides and drugs: 40-100 atoms
Proteins: 100-1000 amino acids
Genomes: 103 -109 nucleotide base pairs
Size: 1 nm (molecules)-104 nm (cells)
Origin of Genetic Languages – p.4/28
Typical scales:
Atoms: H, C, N, O, and infrequently P, S.
Nucleotide bases and amino acids: 10-20 atoms
Peptides and drugs: 40-100 atoms
Proteins: 100-1000 amino acids
Genomes: 103 -109 nucleotide base pairs
Size: 1 nm (molecules)-104 nm (cells)
Gene and protein databases are accumulating a lot of
information, which can be used to test hypotheses and
consequences of optimised information processing.
We hope to understand physical evolutionary reasons for
(1) specific languages, and (2) their specific realisations.
These have a bearing on probability of finding life elsewhere in the universe.
Origin of Genetic Languages – p.4/28
Biological Facts
1. Languages of genes and proteins are universal:
The same 4 nucleotide bases and 20 amino acids
are used in DNA, RNA and proteins,
all the way from viruses and bacteria to human beings.
This is despite the fact that other nucleotide bases and
amino acids exist in living cells.
⇒ Selection of a specific language has taken place.
Origin of Genetic Languages – p.5/28
Biological Facts
1. Languages of genes and proteins are universal:
The same 4 nucleotide bases and 20 amino acids
are used in DNA, RNA and proteins,
all the way from viruses and bacteria to human beings.
This is despite the fact that other nucleotide bases and
amino acids exist in living cells.
⇒ Selection of a specific language has taken place.
2. Genetic information is encoded close to
the data compression limit (for genes)
and maximal packing (for proteins).
⇒ Optimisation of information storage has occurred.
Origin of Genetic Languages – p.5/28
Biological Facts
1. Languages of genes and proteins are universal:
The same 4 nucleotide bases and 20 amino acids
are used in DNA, RNA and proteins,
all the way from viruses and bacteria to human beings.
This is despite the fact that other nucleotide bases and
amino acids exist in living cells.
⇒ Selection of a specific language has taken place.
2. Genetic information is encoded close to
the data compression limit (for genes)
and maximal packing (for proteins).
⇒ Optimisation of information storage has occurred.
3. Evolution occurs through random mutations,
which are local changes in the genetic sequence.
Only a small fraction of the mutations survive.
Darwinian selection (competition for finite resources
among the users) is the optimising mechanism.
Origin of Genetic Languages – p.5/28
Two Scenarios
Frozen accident?
No!
The language somehow came into existence, and became
such a vital part of life’s machinery that any change
in it would be highly deleterious to living organisms.
Requires an extremely rare event,
without sufficient time to explore other possibilities.
Optimal solution?
Yes!!
The language arrived at its best form by trial and error,
and it did not change thereafter, because any change
in it would make the information processing worse.
Requires sufficient time to generate many possibilities,
and subsequent competition amongst them.
The optimal solution then wins over all other options.
Origin of Genetic Languages – p.6/28
The Tasks
DNA/RNA: Assemble a chain of building blocks on top of
a pre-existing master template.
DNA is the read-only-memory (ROM) of living organisms.
This is a 1-dimensional problem.
Origin of Genetic Languages – p.7/28
The Tasks
DNA/RNA: Assemble a chain of building blocks on top of
a pre-existing master template.
DNA is the read-only-memory (ROM) of living organisms.
This is a 1-dimensional problem.
Proteins: Design structurally stable molecules of specific
shapes, with precise locations of active chemical groups.
Proteins carry out various functions, by highly specific
binding and short range interactions (hard-core repulsion,
screened charges, van der Waals forces, hydrogen bonds),
i.e. lock-and-key mechanism.
The key shape accuracy is a fraction of atomic size.
This is a 3-dimensional problem.
Origin of Genetic Languages – p.7/28
Optimisation Criteria
Minimise errors: Use a digital language—clearly
distinguishable building blocks, with discrete operations.
Discrete values are associated with islands of physical
properties, instead of precise points.
Practical applications need only bounded error calculations.
Origin of Genetic Languages – p.8/28
Optimisation Criteria
Minimise errors: Use a digital language—clearly
distinguishable building blocks, with discrete operations.
Discrete values are associated with islands of physical
properties, instead of precise points.
Practical applications need only bounded error calculations.
Minimise resources: Use a small number of building blocks,
which allow simple and quick operations.
In a versatile language, the building blocks can be joined
together in as many different ways as possible, giving rise
to distinct structures.
Origin of Genetic Languages – p.8/28
Top-down vs. Bottom-up
Michelangelo’s David
Origin of Genetic Languages – p.9/28
Top-down vs. Bottom-up
Michelangelo’s David
Lego Construction Blocks
Origin of Genetic Languages – p.10/28
Minimal Language
The language with the smallest set of building blocks
(for a given task) has a unique status in the
optimisation procedure:
• Largest tolerance against errors.
(Discrete variables are spread as far apart as possible
in the available range of physical hardware properties.)
• Smallest instruction set.
(Number of possible transformations is limited.)
• High density of packing and quick operations.
(These more than make up for the increased depth of
computation.)
• Simplest language, without need of translation.
(Simple physical responses of the hardware can be used.)
Origin of Genetic Languages – p.11/28
Boolean Algebra
It is the minimal classical language for encoding information
as 1-dimensional sequences.
Its two letters can have a variety of realisations:
0 and 1, on and off, up and down, etc.
Its operations form the smallest field Z2 .
It is sufficient (though not necessarily optimal)
to encode any information, as our computers demonstrate.
When starting from scratch (e.g. at the origin of life),
it is the simplest language to arrive at.
Origin of Genetic Languages – p.12/28
Boolean Algebra
It is the minimal classical language for encoding information
as 1-dimensional sequences.
Its two letters can have a variety of realisations:
0 and 1, on and off, up and down, etc.
Its operations form the smallest field Z2 .
It is sufficient (though not necessarily optimal)
to encode any information, as our computers demonstrate.
When starting from scratch (e.g. at the origin of life),
it is the simplest language to arrive at.
So why did evolution opt for more complex languages,
and how? We have to understand the optimisation of
languages, while implementing specific tasks!
Origin of Genetic Languages – p.12/28
Summary (Proteins)
1. What is the purpose of the language of amino acids?
. . . contd.
Origin of Genetic Languages – p.13/28
Summary (Proteins)
1. What is the purpose of the language of amino acids?
To form protein molecules of different shapes and sizes,
and containing different chemical groups.
. . . contd.
Origin of Genetic Languages – p.13/28
Summary (Proteins)
1. What is the purpose of the language of amino acids?
To form protein molecules of different shapes and sizes,
and containing different chemical groups.
2. What is the optimal discrete geometry for designing
3-dimensional structures (translations and rotations)?
. . . contd.
Origin of Genetic Languages – p.13/28
Summary (Proteins)
1. What is the purpose of the language of amino acids?
To form protein molecules of different shapes and sizes,
and containing different chemical groups.
2. What is the optimal discrete geometry for designing
3-dimensional structures (translations and rotations)?
Simplicial tetrahedral geometry and diamond lattice.
(Secondary protein structures, i.e. α-helices, β -bends
and β -sheets, fit easily on the diamond lattice.)
Diamond is the hardest material, with the largest band gap.
. . . contd.
Origin of Genetic Languages – p.13/28
Summary (Proteins)
1. What is the purpose of the language of amino acids?
To form protein molecules of different shapes and sizes,
and containing different chemical groups.
2. What is the optimal discrete geometry for designing
3-dimensional structures (translations and rotations)?
Simplicial tetrahedral geometry and diamond lattice.
(Secondary protein structures, i.e. α-helices, β -bends
and β -sheets, fit easily on the diamond lattice.)
Diamond is the hardest material, with the largest band gap.
3. What are the best physical components to realise this
geometry?
. . . contd.
Origin of Genetic Languages – p.13/28
Summary (Proteins)
1. What is the purpose of the language of amino acids?
To form protein molecules of different shapes and sizes,
and containing different chemical groups.
2. What is the optimal discrete geometry for designing
3-dimensional structures (translations and rotations)?
Simplicial tetrahedral geometry and diamond lattice.
(Secondary protein structures, i.e. α-helices, β -bends
and β -sheets, fit easily on the diamond lattice.)
Diamond is the hardest material, with the largest band gap.
3. What are the best physical components to realise this
geometry?
Covalently bonded carbon atoms. Also N + and H2 O .
(In the graphite sheet form, carbon also provides the
simplicial geometry for 2-dim membrane patterns.)
. . . contd.
Origin of Genetic Languages – p.13/28
4. What is a convenient way to assemble these
components in the desired structure?
Synthesise 1-dim polypeptide chains, which contain
information about how to fold into 3-dim structures.
The same structure may be covered by different folding patterns,
so all the folds may not occur with equal probability.
Origin of Genetic Languages – p.14/28
4. What is a convenient way to assemble these
components in the desired structure?
Synthesise 1-dim polypeptide chains, which contain
information about how to fold into 3-dim structures.
The same structure may be covered by different folding patterns,
so all the folds may not occur with equal probability.
5. What are the elementary operations needed to fold a
polypeptide chain on a diamond lattice, in any desired
manner?
Nine discrete rotations, represented as 3 × 3 array
on the Ramachandran map.
Trans-cis flip and long distance bonds are additional operations.
Origin of Genetic Languages – p.15/28
The allowed orientation angles for the Cα bonds in real polypeptide chains for
chiral L-type amino acids, taking into account hard core repulsion between atoms.
Stars mark the nine discrete possibilities for the angles, when the polypeptide chain
is folded on a diamond lattice.
Origin of Genetic Languages – p.16/28
4. What is a convenient way to assemble these
components in the desired structure?
Synthesise 1-dim polypeptide chains, which contain
information about how to fold into 3-dim structures.
The same structure may be covered by different folding patterns,
so all the folds may not occur with equal probability.
5. What are the elementary operations needed to fold a
polypeptide chain on a diamond lattice, in any desired
manner?
Nine discrete rotations, represented as 3 × 3 array
on the Ramachandran map.
Trans-cis flip and long distance bonds are additional operations.
6. What can the side-chain R-groups do?
They favour particular orientations by interactions
amongst themselves. They fill up cavities in the
structure by variations in their size.
Chemical properties decide orientation. Physical volume adjusts to cavity size.
Origin of Genetic Languages – p.17/28
Amino acid
R-group property
Mol. wt.
Class
G Gly (Glycine)
A Ala (Alanine)
Propensity
Non-polar
75
II
turn
aliphatic
89
II
α
P Pro (Proline)
115
II
turn
V Val (Valine)
117
I
β
L Leu (Leucine)
131
I
α
I Ile (Isoleucine)
131
I
β
S Ser (Serine)
Polar
105
II
turn
T Thr (Threonine)
uncharged
119
II
β
N Asn (Asparagine)
132
II
turn
C Cys (Cysteine)
121
I
β
M Met (Methionine)
149
I
α
Q Gln (Glutamine)
146
I
α
D Asp (Aspartate)
Negative
133
II
turn
E Glu (Glutamate)
charge
147
I
α
K Lys (Lysine)
Positive
146
II
α
R Arg (Arginine)
charge
174
I
α
H His (Histidine)
Ring/
155
II
α
F Phe (Phenylalanine)
aromatic
165
II
β
Y Tyr (Tyrosine)
181
I
β
W Trp (Tryptophan)
204
I
β
Origin of Genetic Languages – p.18/28
Summary (DNA)
1. What is the information processing task carried out by
the genetic code?
Assembling molecules by picking up components from
an unsorted database.
Origin of Genetic Languages – p.19/28
Summary (DNA)
1. What is the information processing task carried out by
the genetic code?
Assembling molecules by picking up components from
an unsorted database.
2. What is the optimal way of carrying out this task?
Lov Grover’s quantum search algorithm.
(Requires wave dynamics.)
Origin of Genetic Languages – p.20/28
Biochemical Assembly Process
• Instead of waiting for a desired complex biomolecule
to come along, it is far more efficient to synthesise it
from common, simple ingredients.
• There should be a sufficient number of clearly
distinguishable building blocks to create
the wide variety of required biomolecules.
• The database of building blocks is unsorted.
The assembly of DNA, RNA and polypeptide chains
takes place on pre-existing templates.
• The assembly process is linear. At each step,
base-pairings decide (by complementarity rule)
which building block is the correct one to add.
• Molecular bonds are binary questions;
either they form or they do not form.
Origin of Genetic Languages – p.21/28
Summary (DNA)
1. What is the information processing task carried out by
the genetic code?
Assembling molecules by picking up components from
an unsorted database.
2. What is the optimal way of carrying out this task?
Lov Grover’s quantum search algorithm.
(Requires wave dynamics.)
3. What is the signature of this algorithm?
(2Q + 1) sin−1
√1
N
=
π
2
=⇒
(
Q = 1,
Q = 2,
Q = 3,
N=4
N=10.5
N=20.2
. . . contd.
Origin of Genetic Languages – p.22/28
Database Search
The computer science paradigm for the “Name the person”
game is “Database search”.
Classical:
Binary tree search is the optimal classical algorithm.
A sorted database of N items can be searched using
log2 N binary questions.
An unsorted database of N items can be searched using
N/2 binary questions with memory, and using N binary
questions without memory.
Quantum:
Wave mechanics works with amplitudes and not with
probabilities. Superposition of amplitudes can yield
constructive as well as destructive interference.
Optimal search solutions differ from the classical ones.
Origin of Genetic Languages – p.23/28
Quantum Database Search
The steps of the algorithm for the simplest case of 4 items
in the database. Let the first item be desired by the oracle.
Algorithmic Steps
Amplitudes
0.5
(1)
0
U
?b
0.25
0
(2)
−Us
?
(3)
p
p
(4) Projection
p
0.25
0
Physical Implementation
Uniform
distribution
Equilibrium
configuration
Quantum oracle
Binary question
Amplitude of
desired state
flipped in sign
Sudden
perturbation
Reflection
about average
Overrelaxation
Desired state
reached
Opposite end
of oscillation
Algorithm
is stopped
Measurement
(Dashed line denotes the average amplitude.)
Origin of Genetic Languages – p.24/28
4. Does the genetic machinery have the ingredients
to implement this algorithm?
Yes!
Superposition can be quantum (e.g. wavefunction), classical (e.g. vibrations),
or illusory (selection time longer than transit time between possibilities).
All this means that if we have to design a language
to implement this task, knowing all the physical laws
that we do, we would opt for something like what is
present in nature.
Origin of Genetic Languages – p.25/28
4. Does the genetic machinery have the ingredients
to implement this algorithm?
Yes!
Superposition can be quantum (e.g. wavefunction), classical (e.g. vibrations),
or illusory (selection time longer than transit time between possibilities).
All this means that if we have to design a language
to implement this task, knowing all the physical laws
that we do, we would opt for something like what is
present in nature.
Classically two nucleotide bases (one complementary pair)
are sufficient to encode the genetic information.
Such a simpler system should have preceded (during
evolution) the four nucleotide base system found in nature.
Was the speed-up provided by the wave algorithm the real
incentive for nature to complicate the language?
Origin of Genetic Languages – p.25/28
Nucleotide Base-pairings
Origin of Genetic Languages – p.26/28
4. Does the genetic machinery have the structure
to implement this algorithm?
Yes!
5. What did nature do? Was this algorithm exploited
when the genetic code evolved billions of years ago?
“?”
(Evolution of life cannot be repeated, and there is a
limit to extrapolating evolutionary trees back in time.)
Origin of Genetic Languages – p.27/28
4. Does the genetic machinery have the structure
to implement this algorithm?
Yes!
5. What did nature do? Was this algorithm exploited
when the genetic code evolved billions of years ago?
“?”
(Evolution of life cannot be repeated, and there is a
limit to extrapolating evolutionary trees back in time.)
6. Do living organisms use this algorithm even today?
In principle, experimentally testable!
Rely on Darwinian selection: Construct artificial
languages having a subset of the building blocks,
and make them compete against the natural language.
Origin of Genetic Languages – p.27/28
4. Does the genetic machinery have the structure
to implement this algorithm?
Yes!
5. What did nature do? Was this algorithm exploited
when the genetic code evolved billions of years ago?
“?”
(Evolution of life cannot be repeated, and there is a
limit to extrapolating evolutionary trees back in time.)
6. Do living organisms use this algorithm even today?
In principle, experimentally testable!
Rely on Darwinian selection: Construct artificial
languages having a subset of the building blocks,
and make them compete against the natural language.
Need a believable, and testable, atomic scale model
implementing Grover’s algorithm using nucleotide bases.
Origin of Genetic Languages – p.27/28
References
All my papers are available at http://arXiv.org/
quant-ph/0002037:Pramana 56 (2001) 367-381
quant-ph/0102034:J. Genet. 80 (2001) 39-43
quant-ph/0105001:J. Biosc. 26 (2001) 145-151
quant-ph/0103017:J. Biosc. 27 (2002) 207-218
quant-ph/0206014:Fluct. Noise Lett. 2 (2002) L279-284
quant-ph/0202022 (review):Computing and Information Sciences: Recent Trends,
Ed. J.C. Misra, Narosa (2003) 271-294
q-bio.GN/0403036:J. Theor. Biol. 233 (2005) 527-532
quant-ph/0503068:Proceedings of QICC 2005, IIT Kharagpur, February 2005,
Ed. S.P. Pal and S. Kumar, Allied Publishers (2006) 197-206
quant-ph/0401154:Int. J. Quant. Inform. 4 (2006) 815-825
quant-ph/0609042:AIP Conference Proceedings 864 (2006) 261-272
0705.3895[q-bio.GN] (review):Chapter 10, Quantum Aspects of Life,
Eds. D. Abbott et al., Imperial College Press (2008) 187-219
Origin of Genetic Languages – p.28/28