Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Course Chemical Biology of Nucleic Acids Established 2004 in Haifa at the Technion by Prof. Dr. Thomas Carell Ludwig Maximilians University Munich Butenandtstr. 5-13 D-81377 Munich Recommended Literature: W. Saenger, Principles of Nucleic Acid Structure, Springer Verlag,1983. G. Quinkert, E. Egert, C. Griesinger, Aspekte der Organischen Chemie, Struktur, VCH, 1995. 1 1. The Structure and the Constituents of DNA and RNA 1.1 The genetic information (background) Most organisms use the macromolecule DNA to encode the genetic information. Some viruses have the DNA molecule replaced by RNA. Both molecules contain three different constituents. A) the nucleobase, B) a sugar, which is either D-(-) ribose or D(-)-2’-deoxyribose, and C) a phosphodiester linkage. All three components were and are still prime targets for chemists for modifications. This course will cover the general principles of nucleic acids with a strong emphasis on the chemical manipulation of the nucleic acid structure and hence of the genetic system. Schematic representation of the double helix structure DNA and RNA are “just” encoding the genetic information. For DNA no other function in cells is known. RNA in contrast is known to have in addition catalytic (splicing, ribosome) and gene regulatory functions. Recently mRNA molecules were found, 2 which encode proteins needed for the biosynthesis of vitamins. The mRNA is binding to the produced vitamins, which changes the structure of the mRNA stopping its translation into the proteins and hence vitamin biosynthesis (riboswitches).[1, 2] The genetic information is basically the sequence of the four canonical bases: adenosine, thymidine, cytidine and guanosine. This base sequence is translated in a complex and highly controlled process into a sequence of amino acids, which folds into a protein. To this end, the information has to be read from the DNA, which is called transcription and then has to be translated into the amino acid sequence, which is termed translation. In order reproduce a cell, the genetic information has to be copied. This process is called replication. Replication is tightly controlled in all organisms. Uncontrolled cell growth is the basis for cancer. No wonder that many chemically modified nucleobases, which interfere with the processes of DNA transcription, translation or replication have biologically strong effects. In higher eukaryotic cells, the DNA molecule is present in the cell nucleus. In the process of transcription, the DNA sequence, which needs to be decoded, is “copied” into a mRNA molecule (m = messenger). This mRNA molecule leaves the cell nucleus and travels to the endoplasmatic reticulum where it is bound by ribosomes. 3 Schematic depiction of the process of protein biosynthesis Schematic depiction of the process of transcription The ribosomes (made up of RNA and proteins) are the catalytic machines, which generate the amino acid sequence, which folds subsequently into a protein. For the translational process, special adapter molecules are needed the tRNAs ‘(t = transfer). They carry each one an amino acid and they possess in addition a small RNAsequence, the “anticodon loop” to interact with the mRNA. 4 Schematic depiction of the process of protein biosynthesis The structure of the ribosome was recently solved. [3, 4] The catalytic activity is established by the RNA and by the proteins, but RNA alone is sufficient to catalyze the peptide bond making process.[5] 1.2 The nucleic acids The genetic nucleic acids are long chain-like polymers. Every base+sugar+phorsphordiester is one unit which forms later in the double helix a base pair (bp). The length of a gene is measured in kilo-bp (kbp). The total genetic information comprises even mega-bp (mbp). The genetic information of the E. coli bacterium is made up from about 4 Mbp = 4x106 base pairs. It has a molecular weight of about 3x109 Da and a length of 1.5 mm. The genetic information of the haploid fruit fly contains about 180 Mbp distributed on 4 chromosomes. The total length of the DNA polymer is about 56 mm. The now totally deciphered genetic system of humans has in each cell 3900 Mbp and a total length of about 990 mm. 5 Questions mainly of interest for contemporary chemists: How can we chemically manipulate the genetic information. Can we prepare drugs that interfere with the processes of transcription, translation, replication and biosynthesis of the molecules needed to construct DNA in our cells. Which parameters determine the structure of nucleic acids. Can we create alternative genetic systems? How did the system evolve 6 1.2.1 Structure and nomenclature The nucleobases O NH2 N N N N H N H N Adenine (Ade) N N H NH2 Guanine (Gua) NH NH N NH O O NH2 N H O Cytosine (Cyt) N H O Thymine (Thy) O Uracil (Ura) Pyrimidine Purine Depiction of the nucleobases The base pairing units in nucleic acids are the bases shown above. The monocyclic bases are the pyrimidines cytosine, thymine and uracil. The bicyclic bases are the purines adenine and guanine. The nucleobases are connected with one ring nitrogen with the anomeric center of the sugar (C1'). In the case of the pyrimidines, the connecting nitrogen is N-1. All purines are connected via N-9. The resulting bond is called the glycosidic or sometimes the nucleosidic bond. The resulting nucleosides are adenosine (Ado), guanosine (Guo), cytidine (Cyd), thymidine (Thd) and uridine (Urd) in the RNA series. If the ribose is replaced by the 2’deoxyribose the bases are accordingly dGuo, dCyd, dThd and dUrd. NH2 N N HO N N O OH H (OH) H OH 7 N Desoxyadenosine HO O 6 NH 1 8 2 N NH2 HO N 9 4 3 O 5 OH H (OH) guanosine 5 4 6 NH2 N N 1 O 2 O O OH H (OH) cytidin R 3 NH N HO O O OH H (OH) R= H: Uridine R= Me: Thymidine Depiction of the nucleosides and numbering of pyrimidines and purines 7 In DNA we find the base thymidine. In RNA this base is replaced by uridine. This rule is not strict because also RNA molecules are known containing thymidine. It is therefore important to distinguish between desoxyuridine and uridine as well as between desoxythymidine and thymidine. The ribose sugar Ribose is similar to many other aldose sugars a polyhydroxylated aldehyde with the open chain structure: O H OH H OH H OH OH The ribose molecule used by nature for the construction of DNA and RNA is the D (-) ribose. The D and L nomenclature was introduced by E. Fischer to distinguish the two form of glyceraldehyde: (+)-D-Glyceraldehyde (-)-L-Glyceraldehyde O H OH OH O HO H OH The letters L und D describe the position of the chiral secondary OH-group, which is farest away from the most highly oxidized centre. The structure is drawn with this centre up. Does the OH-group in this Fischer projection point to the left site, we term the compound L. Does the OH group point to the right side, it is the D-structure. (+) and (-) are only telling us if the compound turns linear polarized light to the left (-) or to the right (+). The figure below recalls the D-series of the aldoses and shows that D-ribose is just one out of 4 possible D-pentoses. Why did nature choose ribose and not arabinose to construct the genetic system? 8 Acylic forms of the D-series of aldoses All sugars exist in solution only to a very small extend in the open chain form. They are in thermodynamic equilibrium with the cyclic, semi-acetal structures. Most stable are the six-membered rings, the pyranosides. Upon formation of the ring-structure a new stereocenter is formed, the anomeric centre. The anomeric OH-group is after ring closure either α- or β- configured. An alternative ring closure provides 5membered rings which are called the furanosidic forms of sugars. They are higher in energy and hence only to a small extend formed in solution. Regarding the α/β equilibrium, the β-anomers are more stable because in the βconfiguration less unfavourable syn-pentane interactions are present. However, the α-anomer is present in equilibrium to a higher extend than expected if we just consider 9 the unfavourable syn-pentane interactions. There must be a force which stabilizes the α-anomer. This force is called the anomeric effect. It is strong n when the group at the anomeric R O C O σ*-Orbital σ* n centre is electron withdrawing such as an OH-group or even better a F. R Critical for the anomeric effect is the delocalization of electron density present in the non-bonding electron pairs at the ring oxygen into the σ*-orbital of the glycosidic bond. The interaction is only strong when the substituent at the anomeric centre is axially oriented. α/β-Nomenclature: It describes the relative configuration of the anomeric centre relative to the centre with the highest locant number. Is the OH group at the anomeric centre in the Fischer projection at the same side as the O-atom of the hemiacetal we name the bond α, otherwise β. α-D-Glucopyranose H OH H OH HO H H OH H O CH2OH Fischer-Projection of the cyclic structure Besides the above mentioned configurational aspects, conformational settings are of paramount importance for the structure of the genetic information. All furanoses do not have a strongly favoured preference conformation such as pyranosides which exist mainly in the chair conformation. Furanoses may exist in an envelope conformation E, in which four of the five ring atoms form a plane and one atom stands out of the plane. Alternatively they can exist in the twist conformation T 10 in which only three of the five atoms are in the plane and two are out of plane in opposite direction. 1 E means for example that the sugar is in the envelope conformation with the atom 1 shifted out of the plane. The shift is in the direction from which the remaining atoms in the plane can be numbered clockwise. E2 mean accordingly that the sugar exist in the envelope conformation with atom 2 out of plane in the direction from where the remaining ring atoms are numbered counter clockwise. 3 1 O 4 3 4 3 2 1 O 1 4 2 E2 E O 1 2 3 T2 The macroscopic structure of the double helix is determined by the sugar conformation, the sugar pucker. The ribofuranose does either in an envelope or in a twist conformation. In the furanose, the plane is formed by the atoms C1’-O4’-C4’t. Endo-pucker means that C2’ or C3’ are twisted out of this plane towards O5’. A shift into the opposite direction is called an exo-pucker. The two puckers C2’-endo und C3’-endo are in equilibrium with each other. The energy barrier is about 20 kJ mol-1. The favoured conformation is determined by the substituent at C2’. An electron withdrawing substituent favours at C2’ an axial position which leads to a C3’-endo conformation of the sugar as found in RNA. Deoxyribose is more flexible and can adopt both conformations. C2’-endo is slightly favoured. C(2‘)-endo is 2 E and C(3‘)-endo is 3 E 11 The nucleosides are finally connected via phophordiester groups to long chains. These chains are the carrier of the genetic information. It is however important to note that nature makes NH2 the nucleosides not only N for the purpose of storing genetic Many O information. -O P O Oother very important processes in our cells are regulated or O even derivatives. First important the signalling role (cAMP) O NH N N NH2 NH2 N OH O O P O O O- various phosphates play an N N performed by nucleotides nucleotide N OH O O P O O O- also and N N and NH2 N O P O N N O O N O OH OH energy storage (ATP). O O NH O OH O P O O O- in N O O NH2 N O O O HO P O P O P O O O O OH NH2 N cAMP N HO N N O OH OH ATP NH N N N HO O O O OH O P O O 3'-UMP N O 2',3'-AMP O O P O O 12 Finally, nucleotides are frequently constituent of co-enzymes such as NAD. In summary, nature developed highly sophisticated molecules to store the genetic information. At the same time the molecules are used for a plethora of other tasks. NH2 OH OH N N H2N O O O O O P O P O O O N N N O NAD+ OH OH Due to the connection of the nucleotides in DNA and RNA via phosphodiesters, each connecting unit bears a negative charge. DNA and RNA are therefore polyanions. This is important to keep the macromolecule soluble in water. It however causes also an enormous Coulomb repulsion when two single strands come together to form a double strand. For the formation of soluble strands, metal ions are strictly required to compensate the charge. They closely associate with the polyanions to establish electric neutrality. All experiments with DNA and RNA have to be performed under strict control of the ion strength. The amount of salt in solution strongly affects the properties of DNA and RNA. Normally one adds to DNA and RNA solutions about 150 mM NaCl in a buffer buffereing at pH = 7. More salt stabilizes the duplexes. Less salt destabilizes. If the salts are replaced by a polymer, which bears positively charged groups or by long chain tetraalkylammonium salts, than the DNA will become soluble also in organic solvents.[6, 7] The sequence of an oligonucleotide is noted with letters Uridine = U, Thymidine = T, Adenosine = A, Guanosine = G and Cytidine = C. For desoxy one adds the letter d to get: dU, dT, dA, dG and dC. The trinucleotide G-C-U is completely named Guanylyl-3‘,5‘-cyctidylyl-3‘,5‘-uridine, one writes however GpCpU or even shorter GCU with the G forming the 5’-end and the U forming the 3‘-end of the oligonucleotide. We write oligonucleotide sequences always in 5‘ to 3‘-direction. 13 The formation of double strands requires the antiparallel alignment of two single strands. The base pairs have to be complementary so G faces C and A faces T. The Figure below shows an antiparallel double strand forming a right handed helix. It is important to note that the bases are inside the helix stacking on top of each other at a distance of 3.4 Å facing each other. The negatively charged backbone is outside contacted by metal ions. The typical helix has two grooves, a major groove and a minor groove, which are the binding and recognition sides for proteins (major groove) and small molecules (minor groove) 14 1.2.2 Physical properties of nucleobases, nucleosides and nucleotides NH2 N N H N N Adenine O 7 6 N 8 NH2 9 N H N NH 1 5 4 N 2 3 4 N H NH2 Guanine O Cytosine 5 6 O 3 O NH NH 2 N O H 1 Thymine N H O Uracil The H-bond donor and acceptor groups of the nucleobases determine the base pairing and the positions, where proteins are able to bind in a sequence specific manner. The H-bonding strength is controlled by the pKa values of the nucleobases. The pKa -values also show that the bases are neutral under pH = 7 conditions. The pKa-values are summarized in Table 1. Table 1: pKa-values of the nucleobases at 20°C without salt. Base Group Nucleoside 3’-Nucleotide 5’-Nucleotide Adenine N-1 3.52 3.70 3.88 Cytosine N-3 4.17 4.43 4.56 Guanine N-7 3.3 3.5 3.6 Guanine N-1 9.42 9.84 10.00 Thymine N-3 9.93 Uracil N-3 9.38 10.47 9.96 10.06 All bases are under physiological conditions uncharged (5 < pH < 9). This is of course also true for the ribose which’s’ secondary OH groups have a pKa-value of about 12. The three nucleobases A, C, and G are first protonated at the ring nitrogens. The exocylic NH2-groups are not very basic because they delocalize the lone pair of the NH2-group into the aromatic system. The C-NH2 bonds are consequently with about 1.34 Å shortened (bond length: C-N 1.47 Å, C=N 1.25 Å ) The phosphodiester group is negatively charged. (pKa-values for H3PO4 at 25ºC pK1 = 2.16, pK2 = 7.20 und pK3 = 12.33). 15 Nucleobases can exist in many different tautomeric forms. This would be a catastrophe for exact base pairing. The nucleobases A, T, C and G exist to more then 99.99% in the always shown amino- and ketoforms N H O N OH N H NH2 NH Imin Amin Enol Keto N Tautomeric forms of pyridine-2-on and 2-Aminopyridine as examples 1.2.3 H-bonds connect the bases All of the four bases have to be able to form a series of highly specific H-bonds. The NH-groups in the ring systems and the exocyclic NH2-groups are the H-bond donors (d). The keto-groups (C=O) function as H-bond acceptors (a). The energy that keeps two molecules to from an H-bond is by nature electrostatic. The electrostatic charge is about +0.2e at the H and –0.2e at the C=O. A typical Hbond gives about 6-10 kJmol-1 in attractive energy. This energy is however not strongly contributing to the strength of a base pair in a DNA duplex. H-bonds would also be formed between the bases and surrounding water. If you bring two complementary bases together, you loose the H-bonds to water and you gain the energy from the newly formed H-bonds, which is in summary a plus-minus game. For shape reasons, all bases form in the DNA duplex exclusively Watson-Crick Hbonds as shown below. HN H O N H N N r O N H NH N r N H N N r N H H N O N r N N O 16 The distances between the centers of N...O is between 2.8 Å – 2.95 Å. Again for shape reasons it is always a purine base which is pairing with a pyrimidine base. Kool et al developed the base bisfluorpyrimidine and showed with this base behaves like thymine, although the base can not from any H-bonds, showing that H-bonds are not essential for the formation of the double strand.[8-11] F F r In the case of the A--T base pair one finds sometimes in tRNA the reversed WatsonCrick base pairing mode. r H H N O N N N r N N H N O Next to Watson-Crick base pairing, Hoogsteen und reversed Hoogsteen base pairing is an important base pairing mode. The Hoogsteen sites are frequently employed by proteins and small molecules to interact with DNA through the minor or major grooves. N HN H O N N N N r Hoogsteen H N r O N HN H N N O H r N H2 N reverse Hoogsteen H H N O N N N N N r H N H N r O N r O GC+ Hoogsteen Basenpaar In this mode, the adenine is pairing via N-6 and N-7 with the thymine base. After protonation of C to C+ this base can form a Hoogsteen base pair with A. This is in fact an important binding motif found in triple helix complexes. 17 In tRNA a G-U base pair is sometimes formed which is connected through two Hbonds. This base pairing motif is called Wobble-base pair. The bases have to slide al little to form this H-bond. O N H O N r O H N N N r N H2N The H-Bond The H-bond is the most important orienting interaction between molecules in nature. The concept was developed in 1919 by Huggins at the UC Berkeley. The H-bonds is essentially the bonding of a covalently bonded H-atom to another atom. R-X-H + Y-R' R-X-H ---- Y-R' Typical H-donors are: -OH, -NH2, -COOH, -CONH2, NH2CONH2 Typical H-acceptors are O-atoms in alcohols, ethers, C=O systems and N-atoms in Amines and N-heterocycles Strong H-bonds : O-H...O, OH...N, N-H...O Medium H-bonds: N-H...N Weak interactions: Cl2C-H...O, Cl2C-H...N O-H...π-Systeme The attractive force is best described by electrostatic forces (Coulomb force). However, small orbital contributions are always also discussed particularly to explain NMR coupling constants through a H-bond (3HJ-coupling). In general the H-atom carries a positive partial charge and the acceptor hat possesses a negative charge. One therefore expects that H-bonds ar linear with an angle D-H...A of about 180º. Analysis of crystal structure (molecular packing) shows indeed typical angles of 167±20º for O-H...O and 161±20º for O-H...N H-bonds in very good agreement with the expectation. 18 A typical H-bond is a-symmetric. The X-H bond is clearly covalent and the H...Y bond is non-covalent. Because the Coulomb-potential drops only slowly with distance (1/r) H-bond can be far reaching forces. Very recently even the existence of very strong H-bonds, so called low-barrier symmetric H-bonds in biological systems has been discussed. Here the H-atom is symmetrically placed between the donor and the acceptor. It is currently speculated that such strong H-bond may participate in transition state stabilization of enzyme reactions. Normal H-bond and a low barrier hydrogen bond (LBHB) 19 Examples of H-bonds, which fall into the LBHB regime are: F-H...F- , O-H...O- , O+-H...O. In these cases one observes a very short H-acceptor distance of 1.2 Å – 1.5 Å. The distance between the two heteroatoms is reduced to van der Waals distance of 2.5 Å. The energy of the H-bond is >40KJ mol-1 and the angle is exactly 180º. This LBHB has in the IR spectrum vibrational bands <1600 cm-1. The 1H-NMR shows the proton at > 17 ppm. Normal H-bonds from between one donor-H and one acceptor. However, H-bonds are also possible between one D-H and two acceptors. These are called bifurcated H-bonds. A bifurcated H-bond: D H A Normal H-bonds show a rather large variation of angles and distances. The strength of an H-bonds varies also between 3 – 7 kcal/mol in the gas phase and in unpolar media. In water is the strength of a hydrogen bond very low. Here, the solvents competes with the two molecules for binding. The strength of a H-bonds depends on the pKa-values of the two centers involved. At constant donor strength, the basicity of the acceptor is determining At constant acceptor strength, the acidity of the donor is determining. Examples for: HO-H...B B-Species: MeNH2 (-6.8 kcal/mol) > CH3CN (-4.9 kcal/mol) because MeNH2 is the better acceptor (more basic). MeOH (-6.8 kcal/mol) > water (-6.2 kcal/mol) because MeOH has the more basic O. In the series of C=O compounds we get the following order: 20 Urea > N-methylurea > acetic acid > acetic acid methyl ester > acetone Examples for: HO-H...B HOH-Species: All amines are very weak H-bond donors because they are not acidic enough (CH3NH2...OH2: 3.5 kcal/mol). Amides are much better donors because they are more acidic (CH3CONHCH3 ... OH2, -6.7 kcal/mol). Excellent is acetic acid because of the strong acidity of pKa = 4.7 (CH3COOH...OH2 = -8.8 kcal/mol). For more details see the calculated values in the Table below (W. L. Jorgensen calculated in the gas phase using the OPLS-force field) 21 Secundary H-bonds If one compares the four complexes below, measured in chloroform, it becomes clear that the pairing strengths are very different although all complexes are hold together by 3 H-bonds (Δ(ΔG) ≈ 3-4 kcal/mol). H O N N Et Ar H N N N R O N H H N N Bz HN H N H N N Br N 4 N Et O H N N N H H CH3 O N -1 Ka = 1.7 x 10 Lmol H3C HN H N O Ka ca. 104 -105 Lmol-1 H O N R O H N C3H7 N N H N R O H N C3H7 O O -1 -1 Ka = 90 Lmol Ka = 170 Lmol One model explaining these differences stems from W. L. Jorgensen (Yale University). This model is strongly supported by many experiments. All H-bonds are in these complexes approximately equally strong. The primary interaction should therefore lead to equally strong complexes. One has to consider, however, also secondary electrostatic interactions between neighbouring H-bonding centres. Again all H’s carry a positive partial charge while the acceptors bear a negative charge. The model is depicted below : + - + - + - + - - + - + + - - + + - 4 positive i. a. 2 positive i. a. 2 non-positive i.a. 4 non-positive i.a. 22 The model of secondary interactions nicely explains the binding constants found for the three complexes below in Chloroform. O O Ph N H O H N N H N OC3H7 N H N Ph H O OC3H7 (DAD)-(ADA) Ka 78 M-1 N N H HN H N N Ph N N N H H N H N H N H O Ph O C3H7O H Ar OC3H7 N N H Me O (DDA)-(AAD) Ka 9.3 x 103 M-1 CO2C8H17 CO2C8H17 (DDD)-(AAA) Ka > 105 M-1 In DNA, the H-bonds determine the selectivity of base pairing because they contribute to the shape of the nucleobases. Not forming them would cost in addition a significant amount of energy. The stability of the duplex, however, is not determined by the H-bonds. The bases want to stack because of hydrophobic interactions and dispersion forces. Dispersion forces (induced dipole interactions) A dipole in one molecule can induce a dipole in a second molecule. The size of the induced dipole depends on the size of the dipole and the polarizability of the partner UInd.Dipol-Dipol = - 1 ---------4πεo α1 μ22 + α2 μ21 ------------------------r6 α = Polarizability 23 Dispersion forces are attractive forces, which form when an induced dipole induces a dipole in a second molecule. These London dispersion forces are strongly distance dependent (1/r6) The individual forces are small but they sum up to significant contributions when large contact surfaces are present such as in nucleobases stacking on top of each other. These forces are presumably the most important attractive forces of unpolar molecules. In the Lennard Jones potential that described the potential energy of molecules brought together, the first term A describes the repulsive forces when molecules are pressed together. The second term B describes the attractive forces, which for unpolar molecules are the dispersion forces. U = A -------r12 − B ---------r6 The term B for temporary induced dipole dipole interactions can be described by the Slater-Kirkwood equation: B = 3/2 e (h/2πm1/2) αl αj -----------------------------------(ai/Ni)1/2 + (aj/Nj)1/2 In this equation is α the polarizability, e the elementary charge, m the mass of an electron, h the Planck constant, N the effective number of electrons in the outer shell. It is clear that atoms with many valence electrons and increasing polarizability have larger dispersive power. The table below gives some numbers. O is only weakly, CH, -CH2 and -CH3 groups medium and S highly polarizable. 24 25 The hydrophobic interaction If hydrophobic molecules are inserted into water, the water molecules order around the molecule to build a quasi crystalline surface. This is done in order to maximise the hydrogen bonding of the water molecules around a hydrophobic surface. If two hydrophobic molecules meet they will associate with their hydrophobic surfaces towards each other. The water molecules, which were attached to these surfaces are distributed back into the bulk solvent. This generated a favorable entropy. The entropic gain is responsible for almost all associations in the medium water and hence of extreme importance for life (Formation of membranes, micelles, and for protein folding where folding starts often with tryptophane residues forming a hydrophobic core). The Figure below illustrates the hydrophobic effect. 26 One can measure the hydrophobic effect if one observes how an alkane molecule enters a water or a cyclohexane phase from the gas phase. The entropic effect is large in water. The strongly negative entropy is the main reason for the positive free enthalpy ΔG which make the whole process endothermic. The entropic effect compensates even the small enthalpic gain. What is the nature of the small enthalpic gain? The ordered water molecules have not all possible 4 H-bonds established if present in the quasi crystalline surface structures. If they are released into the bulk, they can reform all H-bonds, which gives the required small negative enthalpic contribution. Also the dispersion forces are smaller between water and the hydrophobic molecule compared to two hydrophobic molecules sticking on each other. 27 The hydrophobic character of a molecule is extremely important in pharmaceutical research. It determines if a molecule is able to cross cell membranes. The hydrophobic character of a group is measured by a distribution experiment between n-octanol and water. The hydrophobic character of a molecule can be calculated from the sum of the hydrophobic character of the individual groups. One determines the distribution of a compound H-S in the organic phase and in water Porg/Pwater = Po . Than the same value is obtained for a substituted R-S = P. The hydrophobicity constant for that group R is then defined by π = P/Po The larger P the larger is the solubility in organic media. For new pharmaceuticals the logP-value is today an important value, which can be even calculated for not yet synthesized compounds using special computer programs. The π-values are additive. Every methyl group in a molecule will contribute an increment of 0.5 to the overall characteristics of the molecule at question. One also finds a good correlation between the size of a hydrophobic surface and the enthalpy of transfer into water. 1Å2 surface gives upon coverage a hydrophobic energy of about 20-25 cal/mol. 28 1.3 Parameter which determine the structure of the double helix. The exact orientation of the base relative to the sugar determines the structure of the double helix. These structural aspects are very important because oligonucleotide duplexes can exist in many different global conformations (A, B, and Z). Excurse: Definition of angles in molecules The three dimensional structure of molecules is determined by bond length and bond angles. The torsion angle θ describes the angle of a central bond B-C in the system A-B-C-D. θ Describes the angle between the bonds A-B und C-D if one looks along the central axis B-C as shown in the picture c below. One can look into the direction B-C or C-B. θ is 0° if A-B and C-D are fully eclipsed (cis and coplanar, syn-periplanar). The sign of θ is positive, if the distal bond is turned clockwise away from the proximal bond. The torsion angle is therefore given either as a value between 0° – 360° or as –180° until +180°. We can also derive the torsion angle if we take the relation of the two planes ABC and BCD as shown below in picture d Next to torsion angles we can also describe the position of atoms relative to each other using dieder angles φ. This dieder angles are derived as shown in the picture b below. It is important to distinguish both angles. 29 The conformation of nucleosides (these are the 3’-mono-phosphates of the nucleosides) is described by number of angles. Important for the structure and function of nucleic acids are the torsion angles δ und χ. Important is also the pucker of the sugar. 1.3.1 The angle δ The angle δ is critical for the formation of a double helix. If δ = 60° a double helix will not form. The single strand will in contrast exist as a zic-zac chain. Is δ > 60° then formation of a double helix will occur. In the case of DNA and RNA we find δ = 80° given rise to a helical structure. The group of Prof. Prof. Eschenmoser (ETH Zürich) recently prepared nucleic acids which contain sugars other then D-(-)-ribose. Particularly interesting are nucleic acids constructed from pyranosidic forms of sugars such as the homo-DNA, containing one CH2-group more. The angle δ adopts in these nucleotides indeed a value of 60° forcing the double strand to form a quasi linear pairing system. The six membered ring of the pyranose is conformationally much more restricted and strongly favours the chair conformation. It is therefore the repetition of nucleotidic structural elements and an angle δ of 80°C which finally gives rise to the formation of the α-helix is so typical for oligonucleotides. 1.3.2 The torsion angle χ This angle described the orientation of the base relative to the sugar. All bases have to be in the so called anti conformation to enable efficient base pairing and hence double strand formation. If the bases are rotated around the glycosidic bond by 180° they adopt the so called syn conformation in which base pairing for pyrimidines is impossible and purines have now to use the Hoogsteen mode to pair. This is highly unusual and in nature only realized in Z DNA, in which the dG base adopt the syn conformation. 30 Comparison between DNA and homo-DNA. Description of the angle χ which determines the base pairing characteristics of the bases 31 All canonical bases dA, dC, dG and dT exist in general in the anti conformation required for base pairing. If however the purines are modified at C8, then the synconformation for steric reason will be energetically more favourable. Our genome is constantly damaged and one process that is frequently observed is the oxidation of deoxyguanosine to deoxy-8-oxoguanosine shown below. This base is constantly formed in our genome and the lesion is made responsible for the occurrence of spontaneous mutations and the ageing process. The mutational properties of the base are currently explained with a syn/anti equilibrium around the glycosidic bond. In the anti conformation, the base is pairing with dC. However, it easily rotates into the syn conformation were base pairing occurs with dA. If our genome is copied, such a base will instruct polymerase to insert a wrong dA opposite the lesion with a 50% chance. This is believed to be the basis for many mutation occurring in response to oxidative stress.[12] RO O H O H N H N O N N H N OR N N N H O H OR H O N RO N N O OR O dA 8-oxodG OR OR N N H N N O OR dC 8-oxodG H H N O H N H N Similarly the second major oxidative lesion, the formamidopyrimidine of the dG base, is currently believed to be able to pair with dC and possibly with dT. H H RO O O H N H N O NH HN N NH2 N OR N O O RO FaPydG H N H2N dC N RO O OR O NH NH H O RO O HN O FaPydG OR N O dT OR 32 1.3.3 The sugar pucker The sugar pucker determines the shape of the α-helix, whether the helix will exist in the A-form or in the B-form. The five membered ring is for steric reasons not planar (Pitzer –transannular– tension, because all bond would be ecliptic). One atom or two are turned out of the plane by 50 pm. In the ribofuranose, the plane C1’-O4’-C4’ is fixed. Endo-pucker means that C2’ or C3’ are turned out of this plane into the direction of O5’. Exo-pucker describes a shift in the opposite direction. C2’-endo and C3’-endo are in equilibrium. In RNA we find predominantly the C3’-endo conformation. DNA may adjust and is able to take on both conformations. For an exact nomenclature of sugar puckers see below. Description of the various puckers of the ribose system 33 Today it is clear that DNA adopts a variety of conformations depending on sequence. Sequence also affects the flexibility of the DNA molecules. A:T sequences are more flexible than G:C sequences. These facts are called the secondary genetic code because some proteins seem to bind based on structure and flexibility to DNA and not by contacting DNA in sequence specific manner. In the cell, DNA is tightly packed, which also changes the structure of the DNA molecule dramatically. The DNA duplex is wound around a Histone-octamer. The structure was solved by the Richmond group at ETH Zürich. Packing of DNA around a histone octamer Even the histone is further packed into larger structures called chromatin. Chromatin condensation and decondensation which allows or denies proteins involved in transcription and replication access to DNA are currently active research areas.[13, 14] Packing of the histones 34 1.4 Helical parameters The various double helices are described by three parameters: P: P is the pitch of the helix corresponding to the distance between a base and the base obtained after walking along the DNA one full turn of 360°. n: n is the number on nucleotides within one pitch. h: distances between bases. The winding of double stranded DNA to give a double helix results in the formation of grooves . These grooves are very different for A, B and Z-DNA. The normal B-DNA has two groves. One large major groove and a small minor groove. The H-bond donors and acceptors, which point into these grooves are the recognition sites that allow proteins and small molecules to bind specifically to certain positions in these grooves. All the protein driven gene regulatory functions operate in the major groove. Base pairs and functional groups pointing into the various grooves 35 The position of the base pairs relative to the helix axis are described by another four parameters D: Distance between the centre of the base pair and the helix axis. ΘT: Basepair-tilt. Shift of the base pairs short axis relative to the vertical helix axis. Θr: Basepair-roll, Shift of the base pairs long axis relative to the vertical helix axis. Θp: Propeller-twist, Twist of the bases against each other. 1.5 The conformation of the double strands We know today three main double strand conformations termed A, B, and Z. RNA double strands all adopt an A conformation. DNA exists mainly in the B-conformation. DNA, however, which contains one or a few RNA bases will shift towards an Aconformation. In the A form, the P...P distance is due to the C3’-endo conformation of the ribose reduced from 7.0 Å (B-conformation) to 5.9 Å. This also reduces the distance between the bases from h = 3.30–3.37 Å (B-conformation) to 2.59-3.29 Å in the A duplex. The A structure is slightly unwound with 11-12 nucleotides for every 360º turn (B-conformation: 10 nucleotides). A major difference between both structures is the shift of the base pairs relative to the helix axis. In B-form, the helix axis runs almost straight through the centre of the base pair. (D = -0.2 Å). In the A conformation, however, the centres of the base pairs are strongly shifted away from the helix axis. Here the base pairs wind around the helix axis with D = 4.5 Å. The result is a very small but deep major groove, only accessible to water and metal ions, and a shallow but wide minor. In B-DNA, both 36 grooves are equally deep but the width differs strongly. Both grooves are able to participate in molecule recognition phenomena. The table below summarizes the main data. Z-DNA is not a right handed helix but is left handed. The dG bases exist all in the unusual syn-conformation. 37 B-DNA 38 A-DNA 39 Some additional literature: [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] A. M. Jose, G. A. Soukup, R. R. Breaker, Nucleic Acids Res. 2001, 29, 1631. N. A. Winkler WC, Roth A, Collins JA, Breaker RR, 2004, 428, 281. M. Valle, R. Gillet, S. Kaur, A. Henne, V. Ramakrishnan, J. Frank, Science 2003, 300, 127. P. B. Moore, Biochemistry 2001, 40, 3243. P. Schimmel, R. Alexander, Science 1998, 281, 658. K. Tanaka, Y. Okahata, J. Am. Chem. Soc. 1996, 118, 10679. Y. Okahata, T. Kobayashi, K. Tanaka, M. Shimomura, J. Am. Chem. Soc. 1998, 120, 6165. T. J. Matray, E. T. Kool, J. Am. Chem. Soc. 1998, 120, 6191. L. Dzantiev, Y. O. Alekseyev, J. C. Morales, E. T. Kool, L. J. Romano, Biochemistry 2001, 40, 3215. E. T. Kool, Annu. Rev. Biochem. 2002, 71, 191. E. T. Kool, Acc. Chem. Res. 2002, 35, 936. M. Ober, U. Linne, J. Gierlich, T. Carell, Angew. Chem. Int. Ed 2003, 42, 4947. K. Luger, A. W. Mäder, R. K. Richmond, D. F. Sargent, T. J. Richmond, Nature 1997, 389, 251. T. J. Richmond, C. A. Davey, Nature 2003, 423, 145. 40