Course
Chemical Biology of Nucleic Acids
Established 2004 in Haifa at the Technion
by
Prof. Dr. Thomas Carell
Ludwig Maximilians University Munich
Butenandtstr. 5-13
D-81377 Munich
Recommended Literature:
W. Saenger, Principles of Nucleic Acid Structure, Springer Verlag, 1983.
G. Quinkert, E. Egert, C. Griesinger, Aspekte der Organischen Chemie, Struktur, VCH, 1995.
1. The Structure and the Constituents of DNA and RNA
1.1 The genetic information (background)
Most organisms use the macromolecule DNA to encode their genetic information.
In some viruses DNA is replaced by RNA. Both molecules are built from three
constituents: A) the nucleobase, B) a sugar, which is either D-(-)-ribose or
D-(-)-2'-deoxyribose, and C) a phosphodiester linkage. All three components were,
and still are, prime targets for chemical modification. This course covers the
general principles of nucleic acids with a strong emphasis on the chemical
manipulation of the nucleic acid structure and hence of the genetic system.
Schematic representation of the double helix structure
DNA and RNA are “just” encoding the genetic information. For DNA no other function
in cells is known. RNA, in contrast, is known to have additional catalytic (splicing,
ribosome) and gene regulatory functions. Recently, mRNA molecules were found that
encode proteins needed for the biosynthesis of vitamins. The mRNA binds the vitamin
that is produced, which changes the structure of the mRNA and stops its translation
into the proteins and hence vitamin biosynthesis (riboswitches).[1, 2]
The genetic information is basically the sequence of the four canonical bases:
adenine, thymine, cytosine and guanine. This base sequence is translated in a
complex and highly controlled process into a sequence of amino acids, which folds
into a protein. To this end, the information first has to be read from the DNA, which
is called transcription, and then has to be converted into the amino acid sequence,
which is termed translation. In order to reproduce a cell, the genetic information has
to be copied. This process is called replication. Replication is tightly controlled in all
organisms. Uncontrolled cell growth is the basis for cancer.
No wonder that many chemically modified nucleobases, which interfere with the
processes of DNA transcription, translation or replication, have strong biological
effects. In higher eukaryotic cells, the DNA molecule is present in the cell nucleus. In
the process of transcription, the DNA sequence that needs to be decoded is
“copied” into an mRNA molecule (m = messenger). This mRNA molecule leaves the
cell nucleus and travels to the endoplasmic reticulum, where it is bound by
ribosomes.
Schematic depiction of the process of protein biosynthesis
Schematic depiction of the process of transcription
The ribosomes (made up of RNA and proteins) are the catalytic machines that
generate the amino acid sequence, which subsequently folds into a protein. For the
translational process, special adapter molecules are needed, the tRNAs (t = transfer).
Each tRNA carries one amino acid and possesses in addition a small RNA sequence,
the “anticodon loop”, to interact with the mRNA.
Schematic depiction of the process of protein biosynthesis
The structure of the ribosome was recently solved.[3, 4] The catalytic activity is
established by the RNA and by the proteins, but RNA alone is sufficient to catalyze
the peptide bond making process.[5]
1.2 The nucleic acids
The genetic nucleic acids are long chain-like polymers. Every
base + sugar + phosphodiester unit is one building block, which later forms a base
pair (bp) in the double helix. The length of a gene is measured in kilo-bp (kbp). The
total genetic information comprises even mega-bp (Mbp).
The genetic information of the E. coli bacterium is made up of about 4 Mbp =
4 × 10^6 base pairs. It has a molecular weight of about 3 × 10^9 Da and a length of
1.5 mm.
The genetic information of the haploid fruit fly contains about 180 Mbp distributed
on 4 chromosomes. The total length of the DNA polymer is about 56 mm.
The now totally deciphered genetic system of humans has in each cell 3900 Mbp
and a total length of about 990 mm.
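The quoted lengths can be checked with a rough estimate. The following Python sketch
multiplies the number of base pairs by an assumed rise of 3.4 Å (0.34 nm) per base
pair, the stacking distance of B-form DNA. The function name and the constant are
illustrative choices and not part of the course material. The estimates (about 1.4 mm,
61 mm and 1.3 m) agree only roughly with the figures quoted above, which presumably
rest on slightly different genome sizes or rise values.

```python
# Rough contour length of double-stranded DNA from its size in base pairs.
# Assumption: every base pair contributes 0.34 nm (3.4 Å) along the helix axis.
RISE_PER_BP_NM = 0.34


def contour_length_mm(base_pairs: float, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Return the estimated contour length in millimetres."""
    length_nm = base_pairs * rise_nm
    return length_nm * 1e-6  # 1 nm = 1e-6 mm


# Genome sizes as quoted in the text (in base pairs).
genomes = {"E. coli": 4e6, "fruit fly (haploid)": 180e6, "human": 3900e6}

for name, bp in genomes.items():
    print(f"{name}: {contour_length_mm(bp):.1f} mm")
```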
Questions mainly of interest for contemporary chemists:
How can we chemically manipulate the genetic information? Can we prepare drugs
that interfere with the processes of transcription, translation, replication and the
biosynthesis of the molecules needed to construct DNA in our cells?
Which parameters determine the structure of nucleic acids?
Can we create alternative genetic systems? How did the system evolve?
1.2.1 Structure and nomenclature
The nucleobases
Depiction of the nucleobases: the purines adenine (Ade) and guanine (Gua) and the
pyrimidines cytosine (Cyt), thymine (Thy) and uracil (Ura)
The base pairing units in nucleic acids are the bases shown above. The monocyclic
bases are the pyrimidines cytosine, thymine and uracil. The bicyclic bases are the
purines adenine and guanine.
The nucleobases are connected via one ring nitrogen to the anomeric centre of
the sugar (C1'). In the case of the pyrimidines, the connecting nitrogen is N-1. All
purines are connected via N-9. The resulting bond is called the glycosidic or
sometimes the nucleosidic bond. The resulting nucleosides in the RNA series are
adenosine (Ado), guanosine (Guo), cytidine (Cyd), thymidine (Thd) and uridine (Urd).
If the ribose is replaced by 2'-deoxyribose, the nucleosides are accordingly dAdo,
dGuo, dCyd, dThd and dUrd.
Depiction of the nucleosides and numbering of pyrimidines and purines:
deoxyadenosine, guanosine, cytidine, uridine (R = H) and thymidine (R = Me)
In DNA we find the nucleoside thymidine. In RNA it is replaced by uridine. This rule
is not strict because RNA molecules containing thymidine are also known. It is
therefore important to distinguish between deoxyuridine and uridine as well as
between deoxythymidine and thymidine.
The ribose sugar
Like many other aldose sugars, ribose is a polyhydroxylated aldehyde.
Open chain structure of ribose
The ribose molecule used by nature for the construction of DNA and RNA is
D-(-)-ribose. The D and L nomenclature was introduced by E. Fischer to distinguish
the two forms of glyceraldehyde, (+)-D-glyceraldehyde and (-)-L-glyceraldehyde.
The letters L and D describe the position of the chiral secondary OH-group that is
farthest away from the most highly oxidized centre. The structure is drawn with this
centre at the top. If the OH-group in this Fischer projection points to the left side, we
term the compound L. If the OH-group points to the right side, it is the D-structure.
(+) and (-) only tell us whether the compound rotates the plane of linearly polarized
light to the left (-) or to the right (+).
The figure below recalls the D-series of the aldoses and shows that D-ribose is just
one out of 4 possible D-pentoses. Why did nature choose ribose and not arabinose
to construct the genetic system?
Acyclic forms of the D-series of aldoses
All sugars exist in solution only to a very small extent in the open chain form. They
are in thermodynamic equilibrium with the cyclic hemiacetal structures. Most stable
are the six-membered rings, the pyranoses. Upon formation of the ring structure a
new stereocentre is formed, the anomeric centre. After ring closure the anomeric
OH-group is either α- or β-configured. An alternative ring closure provides
5-membered rings, which are called the furanose forms of the sugars. They are higher
in energy and hence formed only to a small extent in solution.
Regarding the α/β equilibrium, the β-anomers are more stable because fewer
unfavourable syn-pentane interactions are present in the β-configuration. However,
the α-anomer is present in equilibrium to a higher extent than expected if we just
consider the unfavourable syn-pentane interactions. There must be a force which
stabilizes the α-anomer. This force is called the anomeric effect. It is strong when the
group at the anomeric centre is electron withdrawing, such as an OH-group or, even
better, F. Critical for the anomeric effect is the delocalization of electron density from
the non-bonding electron pairs (n) at the ring oxygen into the σ*-orbital of the
glycosidic bond. The interaction is only strong when the substituent at the anomeric
centre is axially oriented.
Schematic of the anomeric effect: overlap of a non-bonding electron pair (n) at the
ring oxygen with the σ*-orbital of the C-O bond at the anomeric centre
α/β-Nomenclature: It describes the configuration of the anomeric centre relative to
the stereocentre with the highest locant number. If the OH-group at the anomeric
centre in the Fischer projection is on the same side as the O-atom of the hemiacetal,
we name the bond α, otherwise β.
Fischer projection of the cyclic structure of α-D-glucopyranose
Besides the configurational aspects mentioned above, conformational settings are of
paramount importance for the structure of the genetic information.
Unlike pyranoses, which exist mainly in the chair conformation, furanoses do not
have a strongly favoured conformation. Furanoses may exist in an envelope
conformation E, in which four of the five ring atoms form a plane and one atom
stands out of the plane. Alternatively they can exist in the twist conformation T, in
which only three of the five atoms are in the plane and two are out of plane in
opposite directions.
¹E means, for example, that the sugar is in the envelope conformation with atom 1
shifted out of the plane. The shift is in the direction from which the remaining atoms
in the plane are numbered clockwise. E₂ means accordingly that the sugar exists in
the envelope conformation with atom 2 out of plane in the direction from which the
remaining ring atoms are numbered counterclockwise.
Examples of envelope (E) and twist (T) conformations of the furanose ring
The macroscopic structure of the double helix is determined by the sugar
conformation, the sugar pucker. The ribofuranose exists either in an envelope or in a
twist conformation. In the furanose, the plane is formed by the atoms C1'-O4'-C4'.
Endo-pucker means that C2' or C3' are twisted out of this plane towards O5'. A shift
into the opposite direction is called an exo-pucker.
The two puckers C2'-endo and C3'-endo are in equilibrium with each other. The
energy barrier is about 20 kJ mol⁻¹. The favoured conformation is determined by the
substituent at C2'. An electron withdrawing substituent at C2' favours an axial
position, which leads to a C3'-endo conformation of the sugar as found in RNA.
Deoxyribose is more flexible and can adopt both conformations. C2'-endo is slightly
favoured.
C(2')-endo is ²E and C(3')-endo is ³E.
The nucleosides are finally connected via phosphodiester groups to long chains.
These chains are the carrier of the genetic information. It is however important to
note that nature makes the nucleosides not only for the purpose of storing genetic
information. Many other very important processes in our cells are regulated or even
performed by nucleotides and nucleotide derivatives. First, the various phosphates
play an important role in signalling (cAMP) and also in energy storage (ATP).
Structures shown: a phosphodiester-linked oligonucleotide chain, cAMP, ATP,
3'-UMP and 2',3'-AMP
Finally, nucleotides are frequently constituents of coenzymes such as NAD+. In
summary, nature developed highly sophisticated molecules to store the genetic
information. At the same time these molecules are used for a plethora of other tasks.
Structure of NAD+
Due to the connection of the nucleotides in DNA and RNA via phosphodiesters, each
connecting unit bears a negative charge. DNA and RNA are therefore polyanions.
This is important to keep the macromolecule soluble in water. However, it also causes
an enormous Coulomb repulsion when two single strands come together to form a
double strand. For the formation of soluble strands, metal ions are strictly required to
compensate the charge. They closely associate with the polyanions to establish
electric neutrality. All experiments with DNA and RNA have to be performed under
strict control of the ionic strength. The amount of salt in solution strongly affects the
properties of DNA and RNA. Normally one adds about 150 mM NaCl to DNA and
RNA solutions in a buffer at pH = 7. More salt stabilizes the duplexes. Less salt
destabilizes them.
If the salts are replaced by a polymer that bears positively charged groups or by
long chain tetraalkylammonium salts, then the DNA becomes soluble also in
organic solvents.[6, 7]
The sequence of an oligonucleotide is noted with letters: Uridine = U, Thymidine = T,
Adenosine = A, Guanosine = G and Cytidine = C. For the deoxy series one adds the
letter d to get dU, dT, dA, dG and dC.
The full name of the trinucleotide G-C-U is guanylyl-3',5'-cytidylyl-3',5'-uridine.
One writes, however, GpCpU or even shorter GCU, with the G forming the 5'-end and
the U forming the 3'-end of the oligonucleotide. We always write oligonucleotide
sequences in the 5' to 3' direction.
The formation of double strands requires the antiparallel alignment of two single
strands. The base pairs have to be complementary, so G faces C and A faces T. The
Figure below shows an antiparallel double strand forming a right-handed helix. It is
important to note that the bases are inside the helix, stacking on top of each other at
a distance of 3.4 Å. The negatively charged backbone is on the outside, contacted by
metal ions. The typical helix has two grooves, a major groove and a minor groove,
which are the binding and recognition sites for proteins (major groove) and small
molecules (minor groove).
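The combination of antiparallel alignment and the 5' to 3' writing convention means
that the partner strand of a DNA sequence is its reverse complement. A minimal
Python sketch of this bookkeeping follows; the function names are illustrative, and
only the four DNA letters A, T, G and C are handled.

```python
# Watson-Crick partners in DNA: G faces C and A faces T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the partner strand of `seq`, also written 5' -> 3'.

    The strands are antiparallel, so the partner is read in the opposite
    direction: reverse the sequence, then complement each base.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_form_duplex(strand_1: str, strand_2: str) -> bool:
    """True if the two strands (both 5' -> 3') are fully complementary."""
    return reverse_complement(strand_1) == strand_2.upper()


print(reverse_complement("GATTC"))
print(can_form_duplex("GATTC", "GAATC"))
```

Both strands are kept in the 5' to 3' direction throughout, so the reversal step is
what encodes the antiparallel alignment.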
1.2.2 Physical properties of nucleobases, nucleosides and nucleotides
Structures and ring numbering of adenine, guanine, cytosine, thymine and uracil
The H-bond donor and acceptor groups of the nucleobases determine the base
pairing and the positions where proteins are able to bind in a sequence-specific
manner. The H-bonding strength is controlled by the pKa-values of the nucleobases.
The pKa-values also show that the bases are neutral at pH = 7. The pKa-values are
summarized in Table 1.
Table 1: pKa-values of the nucleobases at 20°C without salt.
Base      Group  Nucleoside  3'-Nucleotide  5'-Nucleotide
Adenine   N-1    3.52        3.70           3.88
Cytosine  N-3    4.17        4.43           4.56
Guanine   N-7    3.3         3.5            3.6
Guanine   N-1    9.42        9.84           10.00
Thymine   N-3    9.93        -              10.47
Uracil    N-3    9.38        9.96           10.06
All bases are uncharged under physiological conditions (5 < pH < 9). This is of
course also true for the ribose, whose secondary OH groups have a pKa-value of
about 12. The three nucleobases A, C, and G are first protonated at the ring
nitrogens. The exocyclic NH2-groups are not very basic because the lone pair of the
NH2-group is delocalized into the aromatic system. The C-NH2 bonds are
consequently shortened to about 1.34 Å (bond lengths: C-N 1.47 Å, C=N 1.25 Å).
The phosphodiester group is negatively charged (pKa-values for H3PO4 at 25 °C:
pK1 = 2.16, pK2 = 7.20 and pK3 = 12.33).
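The statement that the bases are uncharged at neutral pH follows from the
Henderson-Hasselbalch relation. The Python sketch below applies it to the nucleoside
pKa-values of Table 1. The function names are illustrative, and each site is treated as
an independent one-step acid-base equilibrium, which is a simplifying assumption.

```python
# Fraction of a site that carries a charge at a given pH, from
# pH = pKa + log10([base] / [acid])  (Henderson-Hasselbalch).


def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction in the protonated form (relevant for basic ring nitrogens)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction in the deprotonated form (relevant for acidic N-H groups)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Nucleoside pKa-values taken from Table 1.
basic_sites = {"adenosine N-1": 3.52, "cytidine N-3": 4.17, "guanosine N-7": 3.3}
acidic_sites = {"guanosine N-1": 9.42, "thymidine N-3": 9.93, "uridine N-3": 9.38}

PH = 7.0
for site, pka in basic_sites.items():
    print(f"{site}: {fraction_protonated(pka, PH):.2e} protonated (cationic)")
for site, pka in acidic_sites.items():
    print(f"{site}: {fraction_deprotonated(pka, PH):.2e} deprotonated (anionic)")
```

With these values every charged fraction at pH 7 stays below one percent.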
Nucleobases can exist in many different tautomeric forms. This would be a
catastrophe for exact base pairing. The nucleobases A, T, C and G exist to more
than 99.99% in the amino and keto forms that are always shown.
Tautomeric forms of pyridin-2-one (keto/enol) and 2-aminopyridine (amine/imine) as
examples
1.2.3 H-bonds connect the bases
All four bases have to be able to form a series of highly specific H-bonds. The
NH-groups in the ring systems and the exocyclic NH2-groups are the H-bond donors
(d). The keto groups (C=O) and the unprotonated ring nitrogens function as H-bond
acceptors (a).
The energy that holds two molecules together in an H-bond is electrostatic in nature.
The partial charge is about +0.2e at the H and –0.2e at the O of the C=O group. A
typical H-bond provides about 6-10 kJ mol⁻¹ of attractive energy. This energy,
however, does not contribute strongly to the strength of a base pair in a DNA duplex.
H-bonds would also be formed between the bases and the surrounding water. If you
bring two complementary bases together, you lose the H-bonds to water and gain the
energy from the newly formed H-bonds, which in sum is a zero-sum game.
For shape reasons, all bases in the DNA duplex form exclusively Watson-Crick
H-bonds as shown below.
Watson-Crick base pairs
The distance between the centres of N...O is between 2.8 Å and 2.95 Å. Again for
shape reasons it is always a purine base that pairs with a pyrimidine base.
Kool et al. developed the difluorinated base analogue difluorotoluene and showed
that this base behaves like thymine although it cannot form any H-bonds, showing
that H-bonds are not essential for the formation of the double strand.[8-11]
Structure of the difluorinated thymine analogue
In the case of the A-T base pair one sometimes finds the reversed Watson-Crick
base pairing mode in tRNA.
Reversed Watson-Crick A-T base pair
Next to Watson-Crick base pairing, Hoogsteen and reversed Hoogsteen base
pairing are important base pairing modes. The Hoogsteen sites are frequently
employed by proteins and small molecules to interact with DNA through the minor or
major grooves.
Hoogsteen, reverse Hoogsteen and G-C+ Hoogsteen base pairs
In the Hoogsteen mode, the adenine pairs via N-6 and N-7 with the thymine base.
After protonation of C to C+, this base can form a Hoogsteen base pair with G. This
is in fact an important binding motif found in triple helix complexes.
In tRNA a G-U base pair is sometimes formed, which is connected through two
H-bonds. This base pairing motif is called a wobble base pair. The bases have to
slide a little to form these H-bonds.
G-U wobble base pair
The H-Bond
The H-bond is the most important orienting interaction between molecules in nature.
The concept was developed in 1919 by Huggins at UC Berkeley. The H-bond is
essentially the bonding of a covalently bonded H-atom to another atom.
R-X-H + Y-R' → R-X-H ---- Y-R'
Typical H-donors are: -OH, -NH2, -COOH, -CONH2, NH2CONH2
Typical H-acceptors are O-atoms in alcohols, ethers and C=O systems, and N-atoms
in amines and N-heterocycles
Strong H-bonds: O-H...O, O-H...N, N-H...O
Medium H-bonds: N-H...N
Weak interactions: Cl2C-H...O, Cl2C-H...N, O-H...π systems
The attractive force is best described by electrostatic forces (Coulomb force).
However, small orbital contributions are also discussed, particularly to explain
NMR coupling constants through an H-bond (³hJ coupling). In general the H-atom
carries a positive partial charge and the acceptor atom carries a negative partial
charge. One therefore expects that H-bonds are linear with an angle D-H...A of about
180°. Analysis of crystal structures (molecular packing) indeed shows typical angles
of 167±20° for O-H...O and 161±20° for O-H...N H-bonds, in good agreement with
the expectation.
A typical H-bond is asymmetric. The X-H bond is clearly covalent and the H...Y bond
is non-covalent.
Because the Coulomb potential drops only slowly with distance (1/r), H-bonds can be
far-reaching forces.
Very recently, even the existence of very strong H-bonds, so-called low-barrier
symmetric H-bonds, in biological systems has been discussed. Here the H-atom is
placed symmetrically between the donor and the acceptor. It is currently speculated
that such strong H-bonds may participate in transition state stabilization in enzyme
reactions.
Normal H-bond and a low barrier hydrogen bond (LBHB)
Examples of H-bonds which fall into the LBHB regime are:
F-H...F-, O-H...O-, O+-H...O.
In these cases one observes a very short H-acceptor distance of 1.2 Å – 1.5 Å. The
distance between the two heteroatoms is reduced to a van der Waals distance of
2.5 Å. The energy of the H-bond is > 40 kJ mol⁻¹ and the angle is exactly 180°. In the
IR spectrum, the LBHB has vibrational bands < 1600 cm⁻¹. The 1H-NMR spectrum
shows the proton at > 17 ppm.
Normal H-bonds form between one donor-H and one acceptor. However, H-bonds
are also possible between one D-H and two acceptors. These are called bifurcated
H-bonds.
Bifurcated H-bond: one D-H donor shared between two acceptors A
Normal H-bonds show a rather large variation of angles and distances. The strength
of an H-bond also varies between 3 and 7 kcal/mol in the gas phase and in nonpolar
media. In water the strength of a hydrogen bond is very low. Here, the solvent
competes with the two molecules for binding. The strength of an H-bond depends on
the pKa-values of the two centres involved.
At constant donor strength, the basicity of the acceptor is decisive.
At constant acceptor strength, the acidity of the donor is decisive.
Examples for: HO-H...B
B-Species: MeNH2 (-6.8 kcal/mol) > CH3CN (-4.9 kcal/mol) because MeNH2 is the
better acceptor (more basic). MeOH (-6.8 kcal/mol) > water (-6.2 kcal/mol) because
MeOH has the more basic O.
In the series of C=O compounds we get the following order:
Urea > N-methylurea > acetic acid > acetic acid methyl ester > acetone
Examples for: D-H...OH2
D-H species: All amines are very weak H-bond donors because they are not acidic
enough (CH3NH2...OH2: 3.5 kcal/mol). Amides are much better donors because they
are more acidic (CH3CONHCH3...OH2: -6.7 kcal/mol). Acetic acid is an excellent
donor because of its strong acidity, pKa = 4.7 (CH3COOH...OH2: -8.8 kcal/mol).
For more details see the calculated values in the Table below (calculated by W. L.
Jorgensen in the gas phase using the OPLS force field).
Secondary H-bonds
If one compares the four complexes below, measured in chloroform, it becomes clear
that the pairing strengths are very different although all complexes are held together
by 3 H-bonds (Δ(ΔG) ≈ 3-4 kcal/mol).
Four triply H-bonded complexes in chloroform with association constants
Ka = 1.7 × 10^4 L mol⁻¹, Ka ca. 10^4-10^5 L mol⁻¹, Ka = 90 L mol⁻¹ and
Ka = 170 L mol⁻¹
One model explaining these differences stems from W. L. Jorgensen (Yale
University). This model is strongly supported by many experiments.
All H-bonds in these complexes are approximately equally strong. The primary
interactions should therefore lead to equally strong complexes. However, one also
has to consider secondary electrostatic interactions between neighbouring H-bonding
centres. Again, all H's carry a positive partial charge while the acceptors bear a
negative partial charge. The model is depicted below:
Secondary electrostatic interactions (i.a.) for the different donor/acceptor
arrangements: 4 positive i.a.; 2 positive and 2 non-positive i.a.; 4 non-positive i.a.
The model of secondary interactions nicely explains the binding constants found for
the three complexes below in chloroform.
(DAD)-(ADA): Ka = 78 M⁻¹
(DDA)-(AAD): Ka = 9.3 × 10^3 M⁻¹
(DDD)-(AAA): Ka > 10^5 M⁻¹
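To see what these association constants mean energetically, they can be converted
into free energies of binding with ΔG° = -RT ln Ka. The Python sketch below does this
for the three complexes. The temperature of 298.15 K is an assumption (it is not
stated in the text), and the function name is illustrative. For the (DDD)-(AAA)
complex only the lower limit Ka = 10^5 M⁻¹ is used.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def delta_g_kcal(ka: float, temperature_k: float = 298.15) -> float:
    """Standard free energy of association in kcal/mol for a given Ka (in M^-1)."""
    return -R_KCAL * temperature_k * math.log(ka)


# Association constants in chloroform as quoted in the text.
complexes = {"(DAD)-(ADA)": 78.0, "(DDA)-(AAD)": 9.3e3, "(DDD)-(AAA), lower limit": 1e5}

for name, ka in complexes.items():
    print(f"{name}: dG = {delta_g_kcal(ka):.2f} kcal/mol")

# Difference between the two complexes with known Ka values.
ddg = delta_g_kcal(78.0) - delta_g_kcal(9.3e3)
print(f"ddG (DAD-ADA vs DDA-AAD) = {ddg:.2f} kcal/mol")
```

Under this assumption the step from (DAD)-(ADA) to (DDA)-(AAD) is worth a little
under 3 kcal/mol, although the number of primary H-bonds is unchanged.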
In DNA, the H-bonds determine the selectivity of base pairing because they
contribute to the shape of the nucleobases. Not forming them would cost in addition a
significant amount of energy. The stability of the duplex, however, is not determined
by the H-bonds. The bases want to stack because of hydrophobic interactions and
dispersion forces.
Dispersion forces (induced dipole interactions)
A dipole in one molecule can induce a dipole in a second molecule. The size of the
induced dipole depends on the size of the inducing dipole and the polarizability of the
partner:
$$U_{\text{ind. dipole-dipole}} = -\frac{1}{4\pi\varepsilon_0}\,\frac{\alpha_1\mu_2^2 + \alpha_2\mu_1^2}{r^6}$$
α = polarizability, μ = dipole moment, r = distance
Dispersion forces are attractive forces that arise when a temporary dipole in one
molecule induces a dipole in a second molecule. These London dispersion forces are
strongly distance dependent (1/r^6). The individual forces are small, but they sum up
to significant contributions when large contact surfaces are present, such as in
nucleobases stacking on top of each other. These forces are presumably the most
important attractive forces between nonpolar molecules. In the Lennard-Jones
potential, which describes the potential energy of molecules brought together, the
first term A describes the repulsive forces when molecules are pressed together. The
second term B describes the attractive forces, which for nonpolar molecules are the
dispersion forces.
$$U = \frac{A}{r^{12}} - \frac{B}{r^{6}}$$
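The balance between the two terms fixes an optimal contact distance. Setting dU/dr = 0
for U = A/r^12 - B/r^6 gives r_min = (2A/B)^(1/6) and U(r_min) = -B^2/(4A). The short
Python sketch below evaluates this; the function names are illustrative and the
values of A and B in the example are arbitrary numbers, not parameters of any real
atom pair.

```python
def lennard_jones(r: float, a: float, b: float) -> float:
    """Potential energy U(r) = A / r**12 - B / r**6."""
    return a / r**12 - b / r**6


def lj_minimum(a: float, b: float) -> tuple[float, float]:
    """Return (r_min, U_min) of the Lennard-Jones potential.

    From dU/dr = -12 A / r**13 + 6 B / r**7 = 0 it follows that r**6 = 2 A / B.
    """
    r_min = (2.0 * a / b) ** (1.0 / 6.0)
    u_min = -b * b / (4.0 * a)
    return r_min, u_min


# Arbitrary illustrative parameters (dimensionless).
r_min, u_min = lj_minimum(a=1.0, b=2.0)
print(r_min, u_min)
```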
The term B for temporary induced dipole-dipole interactions can be described by the
Slater-Kirkwood equation:
$$B = \frac{3}{2}\, e\, \frac{h}{2\pi m^{1/2}}\, \frac{\alpha_i \alpha_j}{(\alpha_i/N_i)^{1/2} + (\alpha_j/N_j)^{1/2}}$$
In this equation α is the polarizability, e the elementary charge, m the mass of an
electron, h the Planck constant and N the effective number of electrons in the outer
shell. It is clear that atoms with many valence electrons and increasing polarizability
have larger dispersive power. The table below gives some numbers. O is only weakly
polarizable, -CH, -CH2 and -CH3 groups are moderately polarizable, and S is highly
polarizable.
The hydrophobic interaction
If hydrophobic molecules are inserted into water, the water molecules order around
the molecule to build a quasi-crystalline surface. This is done in order to maximise
the hydrogen bonding of the water molecules around the hydrophobic surface. If two
hydrophobic molecules meet, they will associate with their hydrophobic surfaces
facing each other. The water molecules that were attached to these surfaces are
released back into the bulk solvent. This generates a favourable entropy. The
entropic gain is responsible for almost all associations in water and hence of
extreme importance for life (formation of membranes and micelles, and protein
folding, where folding often starts with tryptophan residues forming a hydrophobic
core). The Figure below illustrates the hydrophobic effect.
One can measure the hydrophobic effect by observing how an alkane molecule
enters a water or a cyclohexane phase from the gas phase. The entropic effect is
large in water.
The strongly negative entropy is the main reason for the positive free enthalpy ΔG,
which makes the whole process endergonic. The entropic effect even outweighs
the small enthalpic gain.
What is the nature of the small enthalpic gain?
The ordered water molecules cannot establish all 4 possible H-bonds while they are
part of the quasi-crystalline surface structures. If they are released into the bulk, they
can re-form all H-bonds, which gives the required small negative enthalpic
contribution. Also, the dispersion forces between water and the hydrophobic molecule
are smaller than those between two hydrophobic molecules sticking to each other.
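The sign argument for the transfer of an alkane into water can be written out with the
Gibbs-Helmholtz relation. This is only a restatement of the reasoning above in
symbols, with no measured values implied:

```latex
\Delta G = \Delta H - T\,\Delta S ,
\qquad
\Delta H < 0 \ (\text{small}),\quad \Delta S < 0 \ (\text{large})
\;\Longrightarrow\;
\Delta G > 0 \quad \text{whenever} \quad T\,|\Delta S| > |\Delta H| .
```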
The hydrophobic character of a molecule is extremely important in pharmaceutical
research. It determines whether a molecule is able to cross cell membranes. The
hydrophobic character of a group is measured by a distribution experiment between
n-octanol and water. The hydrophobic character of a molecule can be calculated from
the sum of the hydrophobic characters of the individual groups. One determines the
distribution of a compound H-S between the organic phase and water,
Porg/Pwater = Po. Then the same value is determined for the substituted compound
R-S, giving P. The hydrophobicity constant for the group R is then defined by
π = log(P/Po).
The larger P, the larger the solubility in organic media. For new pharmaceuticals the
logP-value is today an important value, which can even be calculated for not yet
synthesized compounds using special computer programs.
The π-values are additive. Every methyl group in a molecule contributes an
increment of 0.5 to the overall character of the molecule in question.
One also finds a good correlation between the size of a hydrophobic surface and the
enthalpy of transfer into water. Covering 1 Å^2 of surface gives a hydrophobic
energy of about 20-25 cal/mol.
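The additivity of the π-values can be illustrated with a small Python sketch. It assumes
the usual logarithmic (Hansch) definition π = log10(P/Po), which is what makes the
increments additive on the logP scale. The function names and the example numbers
are hypothetical and chosen only for illustration.

```python
import math


def hansch_pi(p_substituted: float, p_parent: float) -> float:
    """Hydrophobicity constant of a group R: pi = log10(P(R-S) / P(H-S))."""
    return math.log10(p_substituted / p_parent)


def estimate_log_p(log_p_parent: float, pi_values: list[float]) -> float:
    """Estimate logP of a substituted compound by adding group increments."""
    return log_p_parent + sum(pi_values)


def hydrophobic_energy_cal(area_a2: float, cal_per_a2: float = 22.5) -> float:
    """Hydrophobic energy in cal/mol for a buried surface area in Å^2.

    The default of 22.5 cal/mol per Å^2 is the midpoint of the 20-25 range
    quoted in the text.
    """
    return area_a2 * cal_per_a2


# Hypothetical parent compound with logP = 2.0 carrying two extra methyl groups
# (increment 0.5 each, as stated in the text).
print(estimate_log_p(2.0, [0.5, 0.5]))
print(hydrophobic_energy_cal(100.0))
```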
1.3 Parameters which determine the structure of the double helix
The exact orientation of the base relative to the sugar determines the structure of the
double helix. These structural aspects are very important because oligonucleotide
duplexes can exist in many different global conformations (A, B, and Z).
Excursus: Definition of angles in molecules
The three-dimensional structure of molecules is determined by bond lengths and
bond angles. The torsion angle θ describes the rotation about a central bond B-C in
the system A-B-C-D.
θ describes the angle between the bonds A-B and C-D if one looks along the central
axis B-C as shown in picture c below. One can look in the direction B-C or C-B.
θ is 0° if A-B and C-D are fully eclipsed (cis and coplanar, syn-periplanar). The sign
of θ is positive if the distal bond is turned clockwise away from the proximal bond.
The torsion angle is therefore given either as a value between 0° and 360° or
between –180° and +180°. We can also derive the torsion angle from the relation of
the two planes ABC and BCD, as shown below in picture d.
Next to torsion angles we can also describe the position of atoms relative to each
other using dihedral angles φ. These dihedral angles are derived as shown in picture
b below. It is important to distinguish both angles.
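The torsion angle defined above can be computed directly from the Cartesian
coordinates of the four atoms A, B, C and D. The following Python sketch uses the
standard atan2 formulation with the sign convention described in the text (positive
when the distal bond C-D is turned clockwise from the proximal bond A-B, looking
from B to C). The helper names and the test coordinates are illustrative.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion_angle(a, b, c, d) -> float:
    """Torsion angle A-B-C-D in degrees, in the range -180 to +180."""
    b1 = _sub(b, a)          # bond A -> B
    b2 = _sub(c, b)          # central bond B -> C
    b3 = _sub(d, c)          # bond C -> D
    n1 = _cross(b1, b2)      # normal of plane ABC
    n2 = _cross(b2, b3)      # normal of plane BCD
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def to_0_360(angle_deg: float) -> float:
    """Convert an angle from the -180..+180 range to the 0..360 range."""
    return angle_deg % 360.0


# B at the origin, C on the z axis; A and D are placed around this axis.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # eclipsed (cis)
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # quarter turn
```

The same value is obtained as the angle between the normals of the planes ABC and
BCD, which is the second construction mentioned in the text.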
The conformation of nucleotides (these are the 3'-monophosphates of the
nucleosides) is described by a number of angles. Important for the structure and
function of nucleic acids are the torsion angles δ and χ. Important is also the pucker
of the sugar.
1.3.1 The angle δ
The angle δ is critical for the formation of a double helix. If δ = 60°, a double helix
will not form. The single strand will in contrast exist as a zigzag chain. If δ > 60°, a
double helix will form. In the case of DNA and RNA we find δ = 80°, giving rise to a
helical structure. The group of Prof. Eschenmoser (ETH Zürich) recently prepared
nucleic acids which contain sugars other than D-(-)-ribose. Particularly interesting are
nucleic acids constructed from pyranose forms of sugars, such as homo-DNA, which
contains one additional CH2-group. The angle δ indeed adopts a value of 60° in these
nucleotides, forcing the double strand to form a quasi-linear pairing system. The
six-membered ring of the pyranose is conformationally much more restricted and
strongly favours the chair conformation.
It is therefore the repetition of nucleotidic structural elements and an angle δ of 80°
which finally gives rise to the formation of the helix that is so typical for
oligonucleotides.
1.3.2 The torsion angle χ
This angle describes the orientation of the base relative to the sugar. All bases have
to be in the so-called anti conformation to enable efficient base pairing and hence
double strand formation. If the bases are rotated around the glycosidic bond by 180°,
they adopt the so-called syn conformation, in which base pairing is impossible for
pyrimidines and purines have to use the Hoogsteen mode to pair. This is highly
unusual and in nature only realized in Z-DNA, in which the dG base adopts the syn
conformation.
Comparison between DNA and homo-DNA.
Description of the angle χ which determines the base pairing characteristics of the
bases
All canonical bases dA, dC, dG and dT exist in general in the anti conformation
required for base pairing. If, however, the purines are modified at C8, then the syn
conformation becomes energetically more favourable for steric reasons.
Our genome is constantly damaged, and one process that is frequently observed is
the oxidation of deoxyguanosine to 8-oxo-deoxyguanosine (8-oxodG) shown below. This
base is constantly formed in our genome, and the lesion is held responsible for the
occurrence of spontaneous mutations and for the ageing process. The mutational
properties of the base are currently explained with a syn/anti equilibrium around the
glycosidic bond. In the anti conformation, the base pairs with dC. However, it
easily rotates into the syn conformation, where base pairing occurs with dA. If our
genome is copied, such a base will instruct the polymerase to insert a wrong dA
opposite the lesion with a 50% chance. This is believed to be the basis for many
mutations occurring in response to oxidative stress.[12]
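The consequence of this mispairing over two rounds of copying can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of a real polymerase: the letter `O` for 8-oxodG, the function name `copy_strand` and the parameter `p_misread` are hypothetical, and strands are written position by position without reversing direction. Only the 50% default comes from the text above.

```python
import random

# Watson-Crick partners of the four canonical bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def copy_strand(template, p_misread=0.5, rng=random):
    """Return the base inserted opposite each template position.

    'O' stands for 8-oxodG (hypothetical one-letter code). With
    probability p_misread it is read in the syn conformation and
    dA is inserted; otherwise dC is inserted.
    """
    copied = []
    for base in template:
        if base == "O":
            copied.append("A" if rng.random() < p_misread else "C")
        else:
            copied.append(PAIR[base])
    return "".join(copied)
```

If dA is inserted opposite the lesion, copying the new strand once more places dT at the position that originally carried dG, so the original G:C pair ends up as a T:A pair.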
[Structures: base pairs 8-oxodG:dA and 8-oxodG:dC]
Similarly, the second major oxidative lesion, the formamidopyrimidine of the dG base
(FaPydG), is currently believed to be able to pair with dC and possibly with dT.
[Structures: base pairs FaPydG:dC and FaPydG:dT]
1.3.3 The sugar pucker
The sugar pucker determines the shape of the double helix, that is, whether the helix
will exist in the A-form or in the B-form. The five-membered ring is not planar,
because in a planar ring all bonds would be eclipsed (Pitzer strain). One or two
atoms are turned out of the plane by 50 pm. In the ribofuranose, the plane
C1’-O4’-C4’ is fixed. Endo-pucker means that C2’ or C3’ is turned out of this plane
in the direction of O5’. Exo-pucker describes a shift in the opposite direction.
C2’-endo and C3’-endo are in equilibrium. In RNA we find predominantly the C3’-endo
conformation. DNA may adjust and is able to take on both conformations. For an
exact nomenclature of sugar puckers see below.
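A common quantitative description of the pucker, which the script itself does not introduce, is the pseudorotation phase angle of Altona and Sundaralingam, calculated from the five endocyclic torsion angles ν0 to ν4. The sketch below assumes that formalism and the usual ranges (C3’-endo for a phase of 0° to 36°, C2’-endo for 144° to 180°); all function names are hypothetical, and `ideal_torsions` is only a helper that generates idealized test input.

```python
import math

def pseudorotation_phase(nu0, nu1, nu2, nu3, nu4):
    """Pseudorotation phase angle in degrees, in the range [0, 360).

    Assumed torsion definitions: nu0 = C4'-O4'-C1'-C2',
    nu1 = O4'-C1'-C2'-C3', nu2 = C1'-C2'-C3'-C4',
    nu3 = C2'-C3'-C4'-O4', nu4 = C3'-C4'-O4'-C1'.
    """
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * (math.sin(math.radians(36.0))
                               + math.sin(math.radians(72.0)))
    # atan2 selects the correct quadrant when nu2 is negative.
    return math.degrees(math.atan2(numerator, denominator)) % 360.0

def pucker_family(phase_deg):
    """Map a phase angle onto the two puckers discussed in the text."""
    if 0.0 <= phase_deg <= 36.0:
        return "C3'-endo"
    if 144.0 <= phase_deg <= 180.0:
        return "C2'-endo"
    return "other"

def ideal_torsions(phase_deg, nu_max=38.0):
    """Idealized nu0..nu4 for a given phase angle and amplitude nu_max."""
    return [nu_max * math.cos(math.radians(phase_deg + 144.0 * (j - 2)))
            for j in range(5)]
```

For example, `pucker_family(pseudorotation_phase(*ideal_torsions(18.0)))` classifies an idealized ring with a phase of 18° as C3’-endo, the pucker found predominantly in RNA.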
Description of the various puckers of the ribose system
Today it is clear that DNA adopts a variety of conformations depending on sequence.
Sequence also affects the flexibility of the DNA molecule. A:T sequences are more
flexible than G:C sequences. These facts are called the secondary genetic code,
because some proteins seem to bind to DNA based on its structure and flexibility and
not by contacting DNA in a sequence-specific manner.
In the cell, DNA is tightly packed, which also changes the structure of the DNA
molecule dramatically. The DNA duplex is wound around a histone octamer. The
structure was solved by the Richmond group at ETH Zürich.
Packing of DNA around a histone octamer
The histone-bound DNA is further packed into larger structures called chromatin.
Chromatin condensation and decondensation, which deny or allow proteins involved in
transcription and replication access to DNA, are currently active research areas.[13, 14]
Packing of the histones
1.4 Helical parameters
The various double helices are described by three parameters:
P: the pitch of the helix, corresponding to the distance between a base and
the base reached after walking along the DNA one full turn of 360°.
n: the number of nucleotides within one pitch.
h: the distance between neighbouring bases along the helix axis.
The winding of double-stranded DNA to give a double helix results in the formation of
grooves. These grooves are very different for A-, B- and Z-DNA. The normal B-DNA
has two grooves: one large major groove and a small minor groove. The H-bond
donors and acceptors which point into these grooves are the recognition sites that
allow proteins and small molecules to bind specifically to certain positions in these
grooves. All the protein-driven gene regulatory functions operate in the major groove.
Base pairs and functional groups pointing into the various grooves
The position of the base pairs relative to the helix axis is described by another four
parameters:
D: Distance between the centre of the base pair and the helix axis.
ΘT: Base pair tilt. Shift of the base pair’s short axis relative to the vertical helix axis.
Θr: Base pair roll. Shift of the base pair’s long axis relative to the vertical helix axis.
Θp: Propeller twist. Twist of the bases against each other.
1.5 The conformation of the double strands
Three main double strand conformations are known today, termed A, B, and Z. RNA
double strands all adopt an A conformation. DNA exists mainly in the B-conformation.
DNA which contains one or a few RNA nucleotides, however, will shift towards an
A-conformation. In the A form, the P...P distance is reduced from 7.0 Å
(B-conformation) to 5.9 Å due to the C3’-endo conformation of the ribose. This also
reduces the distance between the bases from h = 3.30–3.37 Å (B-conformation) to
2.59–3.29 Å in the A duplex. The A structure is slightly unwound, with 11–12
nucleotides for every 360° turn (B-conformation: 10 nucleotides).
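The parameters P, n and h from section 1.4 are linked by P = n · h, and the rotation per nucleotide follows as 360°/n. The short sketch below applies these relations to the values quoted above; the function names are hypothetical, and the calculation assumes that h is measured along the helix axis.

```python
def pitch(n, h):
    """Pitch P in angstroms from n nucleotides per turn and rise h (angstroms)."""
    return n * h

def twist_per_nucleotide(n):
    """Rotation around the helix axis per nucleotide, in degrees."""
    return 360.0 / n

# Values quoted in the text for the B- and A-conformations.
b_pitch_range = (pitch(10, 3.30), pitch(10, 3.37))
a_pitch_range = (pitch(11, 2.59), pitch(12, 3.29))
b_twist = twist_per_nucleotide(10)
a_twist_range = (twist_per_nucleotide(12), twist_per_nucleotide(11))
```

The smaller twist per nucleotide in the A form is what the text refers to as the helix being slightly unwound.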
A major difference between the two structures is the shift of the base pairs relative
to the helix axis. In the B-form, the helix axis runs almost straight through the centre
of the base pair (D = -0.2 Å). In the A conformation, however, the centres of the base
pairs are strongly shifted away from the helix axis. Here the base pairs wind around
the helix axis with D = 4.5 Å. The result is a very narrow but deep major groove, only
accessible to water and metal ions, and a shallow but wide minor groove. In B-DNA,
both grooves are equally deep but the width differs strongly. Both grooves are able to
participate in molecular recognition phenomena.
The table below summarizes the main data.
Z-DNA is not a right-handed helix but a left-handed one. The dG bases all exist in the
unusual syn conformation.
B-DNA
A-DNA
Some additional literature:
[1] A. M. Jose, G. A. Soukup, R. R. Breaker, Nucleic Acids Res. 2001, 29, 1631.
[2] N. A. Winkler WC, Roth A, Collins JA, Breaker RR, 2004, 428, 281.
[3] M. Valle, R. Gillet, S. Kaur, A. Henne, V. Ramakrishnan, J. Frank, Science 2003, 300, 127.
[4] P. B. Moore, Biochemistry 2001, 40, 3243.
[5] P. Schimmel, R. Alexander, Science 1998, 281, 658.
[6] K. Tanaka, Y. Okahata, J. Am. Chem. Soc. 1996, 118, 10679.
[7] Y. Okahata, T. Kobayashi, K. Tanaka, M. Shimomura, J. Am. Chem. Soc. 1998, 120, 6165.
[8] T. J. Matray, E. T. Kool, J. Am. Chem. Soc. 1998, 120, 6191.
[9] L. Dzantiev, Y. O. Alekseyev, J. C. Morales, E. T. Kool, L. J. Romano, Biochemistry 2001, 40, 3215.
[10] E. T. Kool, Annu. Rev. Biochem. 2002, 71, 191.
[11] E. T. Kool, Acc. Chem. Res. 2002, 35, 936.
[12] M. Ober, U. Linne, J. Gierlich, T. Carell, Angew. Chem. Int. Ed. 2003, 42, 4947.
[13] K. Luger, A. W. Mäder, R. K. Richmond, D. F. Sargent, T. J. Richmond, Nature 1997, 389, 251.
[14] T. J. Richmond, C. A. Davey, Nature 2003, 423, 145.