* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download DNA RNA Protein
Bisulfite sequencing wikipedia , lookup
Human genome wikipedia , lookup
Frameshift mutation wikipedia , lookup
Gel electrophoresis of nucleic acids wikipedia , lookup
DNA vaccination wikipedia , lookup
Molecular cloning wikipedia , lookup
Short interspersed nuclear elements (SINEs) wikipedia , lookup
Cell-free fetal DNA wikipedia , lookup
DNA polymerase wikipedia , lookup
Epigenomics wikipedia , lookup
Microevolution wikipedia , lookup
RNA interference wikipedia , lookup
DNA supercoil wikipedia , lookup
Extrachromosomal DNA wikipedia , lookup
Epigenetics of human development wikipedia , lookup
History of genetic engineering wikipedia , lookup
Nucleic acid double helix wikipedia , lookup
Transfer RNA wikipedia , lookup
Cre-Lox recombination wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Non-coding DNA wikipedia , lookup
Polyadenylation wikipedia , lookup
Messenger RNA wikipedia , lookup
Point mutation wikipedia , lookup
Helitron (biology) wikipedia , lookup
Expanded genetic code wikipedia , lookup
RNA silencing wikipedia , lookup
Nucleic acid tertiary structure wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Genetic code wikipedia , lookup
History of RNA biology wikipedia , lookup
Non-coding RNA wikipedia , lookup
Epitranscriptome wikipedia , lookup
Deoxyribozyme wikipedia , lookup
DNA, RNA, and Protein Central Dogma of Molecular Biology • • • The flow of information in the cell starts at DNA, which replicates to form more DNA. Information is then ‘transcribed” into RNA, and then it is “translated” into protein. The proteins do most of the work in the cell. Information does not flow in the other direction. This is a molecular version of the incorrectness of “inheritance of acquired characteristics”. Changes in proteins do not affect the DNA in a systematic manner (although they can cause random changes in DNA. Vocabulary: – DNA is replicated to make more DNA, using the enzyme DNA polymerase – DNA is transcribed into RNA using RNA polymerase – RNA is translated into protein using ribosomes. Reverse Transcription • • • However, a few exceptions to the Central Dogma exist. Most importantly, some RNA viruses, called retroviruses make a DNA copy of themselves using the enzyme reverse transcriptase. The DNA copy incorporates into one of the chromosomes and becomes a permanent feature of the genome. The DNA copy inserted into the genome is called a “provirus”. This represents a flow of information from RNA to DNA. Closely related to retroviruses are retrotransposons, sequences of DNA on the chromosome that make RNA copies of themselves, which then get reversetranscribed into DNA that inserts into new locations in the genome. Unlike retroviruses, retrotransposons always remain within the cell. They lack genes to make the protein coat that surrounds viruses. Prions • • • • A prion is an “infectious protein”. Prions are the agents that cause mad cow disease (bovine spongiform encephalopathy), chronic wasting disease in deer and elk, scrapie in sheep, and Creutzfeld-Jakob syndrome in humans. These diseases cause neural degeneration. In humans, the symptoms are approximately those of Alzheimer’s syndrome accelerated to go from onset to death in about 1 year. Fortunately, the disease is very hard to catch and very rare, and they usually have a long incubation time. No cure is known, and not enough is known about how it is spread to do a thorough job of preventing it. Avoid eating brains is a good start though. The prion protein (PrP) is normally present in the body. Like all proteins, it is folded into a specific conformation, a state called PrPC. Prion diseases are caused by the same protein folded abnormally, a state called PrPSc. A PrPSc can bind to a normal PrPC protein and convert it to PrPSc. This conversion spreads throughout the body, causing the disease to occur. It is also a form of inheritance that does not involve nucleic acids. Nucleotide Structure • DNA and RNA are macromolecules composed of subunits called nucleotides. • Each nucleotide of DNA or RNA has 3 parts: a nitrogenous base, a sugar, and a phosphate group. • The phosphate group, PO4, links two sugar molecules in the backbone. Each phosphate carries a -1 charge. This causes DNA to have an overall negative charge. • The sugar is ribose in the case of RNA and deoxyribose in the case of DNA. has 5 carbons, numbered 1’ through 5’. – the nitrogenous base is attached to the 1’ carbon – the 2’ carbon has a free -OH group in the case of RNA, but a -H group in the case of DNA. The lack of the oxygen atom makes DNA far less reactive than RNA. – the 3’ carbon has an -OH group on it that links to the phosphate group on the next base. The “end” of the DNA molecule is a free 3’ OH group. – the 5’ carbon is attached to the phosphate group. Nucleotides More Nucleotide Structure • • • • • • There are 4 possible DNA bases). : adenine (A), guanine (G), cytosine (C), and thymine (T) Adenine and guanine are purines: they consist of two linked rings of mixed nitrogen and carbon atoms. Thymine and cytosine are pyrimidines, which consist of a single ring. In RNA, thymine is replaced by uracil (U), which looks like thymine except for a single methyl group. Each strand of DNA pairs with a complementary DNA strand. This pairing happens because each A is paired with a T, and each G is paired with a C. Thus, the information on one DNA strand easily allows the other strand to be deduced. The amount of A in DNA always equals the amount of T, and the amount of G always equals the amount of C. This is not true in RNA, which is usually single-stranded. Pairing is caused by hydrogen bonds, weak links between oxygen and nitrogen atoms where one of them has a hydrogen attached. A-T pairs have 2 hydrogen bonds, while G-C pairs have 3 hydrogen bonds. G-C pairs are stronger, and they are more frequent in high temperature organisms. Paired Nucleotides Replication • • • • Watson and Crick recognized that the double stranded DNA molecule could replicate by unwinding, then synthesizing a new strand for each of the old stands. This mode of replication is called semi-conservative. It means that after one DNA molecule has replicated to become 2 DNA molecules, each new molecule consists of one old strand (from the original molecule) and one new strand. The information from each old strand can be used to create the new strands, since A always pairs with T, and G always pairs with C. DNA replication starts at specific locations origins of replication, and proceeds in both directions. Replication Components • • • • • The raw materials of DNA synthesis are nucleoside triphosphates, often written as dNTPs. dNTPs have a chain of 3 phosphate groups attached to the 5’ carbon of the deoxyribose sugar. Just as with ATP, the bonds between the phosphates are high energy bonds, and releasing them produces the energy needed to drive the synthesis of DNA. Each new nucleotide is added to a growing DNA chain by removing the outer 2 phosphates and attaching the remaining phosphate to the 3’ OH group of the previous nucleotide. The DNA chain is said to grow from 5’ to 3’, which means that the first DNA base has a free 5’ end, with attached phosphates. The last nucleotide has a free 3’ OH group on it. All other bases have their 5’ carbons attached to a phosphate, which is attached to the 3’ OH group of the previous nucleotide. DNA polymerase is the main enzyme used to replicate DNA. However, DNA polymerase is only one enzyme in the replication complex. Several other enzymes are needed to cause replication to occur. Continuous and Discontinuous Synthesis • DNA can only be synthesized from 5’ to 3’, by adding new nucleotides to the 3’ end. • This is a problem, because both strands must be synthesized at the replication fork, and one strand will necessarily be synthesized in the opposite direction from the movement of the replication fork. • In reality, one strand is synthesized continuously, in the same direction that the replication fork is moving. This is called the leading strand. • The other strand is synthesized in short, discontinuous pieces, that are then attached together to form the final DNA strand. This is the lagging strand. Each fragment of the lagging strand is called an Okazaki fragment, and they are synthesized in the opposite direction that the replication fork moves. Discontinuous Synthesis • • • • Another peculiarity of DNA synthesis is that DNA polymerase must attach new bases to the 3’ end of a pre-existing nucleic acid chain. All DNA synthesis starts at a short double-stranded region. In the cell, short pieces of RNA, called primers are paired with the DNA bases to create to the short double stranded regions that DNA synthesis builds. The RNA primers are synthesized by an enzyme called primase, and they are removed by DNA polymerase during the synthesis of the next Okazaki fragment. Joining of the Okazaki fragments is done by the enzyme DNA ligase. RNA • RNA plays a central role in the life of the cell. We are mostly going to look at its role in protein synthesis, but RNA does many other things as well. • RNA can both store information (like DNA) and catalyze chemical reactions (like proteins). • One theory for the origin of life has it starting out as RNA only, then adding DNA and proteins later. This theory is called the “RNA World”. • RNA/protein hybrid structures are involved in protein synthesis (ribosome), splicing of messenger RNA, telomere maintenance, guiding ribosomes to the endoplasmic reticulum, and other tasks. • Recently it has been found that very small RNA molecules are involves in gene regulation. RNA Used in Protein Synthesis • messenger RNA (mRNA). A copy of the gene that is being expressed. Groups of 3 bases in mRNA, called “codons” code for each individual amino acid in the protein made by that gene. – in eukaryotes, the initial RNA copy of the gene is called the “primary transcript”, which is modified to form mRNA. • ribosomal RNA (rRNA). Four different RNA molecules that make up part of the structure of the ribosome. They perform the actual catalysis of adding an amino acid to a growing peptide chain. • transfer RNA (tRNA). Small RNA molecules that act as adapters between the codons of messenger RNA and the amino acids they code for. Transcription • • • Transcription is the process of making an RNA copy of a single gene. Genes are specific regions of the DNA of a chromosome. The enzyme used in transcription is RNA polymerase. The raw materials for the new RNA are the 4 ribonucleoside triphosphates (NTPs): ATP, CTP, GTP, and UTP. – It’s the same ATP as is used for energy in the cell. • • • As with DNA replication, transcription proceeds 5’ to 3’: new bases are added to the free 3’ OH group. Unlike replication, transcription does not need to build on a primer. Instead, transcription starts at the promoter, which is where RNA polymerase binds to the DNA. For proteincoding genes, the promoter is located a few bases 5’ to (upstream from) the first base that is transcribed into RNA. Transcription factors are other proteins that bind to sites near the promoter and help initiate the start of transcription. Process of Transcription • Transcription starts with RNA polymerase binding to the promoter. – • This binding only occurs under some conditions: when the gene is “on”. Various other proteins (transcription factors) help RNA polymerase bind to the promoter. RNA polymerase unwinds a small section of the DNA and uses it as a template to synthesize an exact RNA copy of the DNA strand. – The DNA strand used as a template is the antisense strand; the other strand is the sense strand . The bases of the sense strand are the same as the bases of the RNA. The bases of the antisense strand are complementary to the RNA bases. – Notice that the RNA is made from 5’ end to 3’ end, so the antisense strand is actually read from 3’ to 5’. • • RNA polymerase proceeds down the DNA, synthesizing the RNA copy. In prokaryotes, each RNA ends at a specific terminator sequence. In eukaryotes transcription doesn’t have a definite end point; the RNA is given a definitive termination point during RNA processing. After Transcription • In prokaryotes, the RNA copy of a gene is messenger RNA, ready to be translated into protein. In fact, translation starts even before transcription is finished. • In eukaryotes, the primary RNA transcript of a gene needs further processing before it can be translated. This step is called RNA processing. Also, it needs to be transported out of the nucleus into the cytoplasm. • Steps in RNA processing: – 1. Add a cap to the 5’ end – 2. Add a poly-A tail to the 3’ end – 3. splice out introns. Capping and poly-A Addition • RNA is inherently unstable, especially at the ends. The ends are modified to protect it. • At the 5’ end, a slightly modified guanine (7-methyl G) is attached “backwards”, by a 5’ to 5’ linkage, to the triphosphates of the first transcribed base. • At the 3’ end, the primary transcript RNA is cut at a specific site and 100-200 adenine nucleotides are attached: the poly-A tail. Note that these A’s are not coded in the DNA of the gene. Introns • Introns are regions within a gene that don’t code for protein and don’t appear in the final mRNA molecule. Protein-coding sections of a gene (called exons) are interrupted by introns. • The function of introns remains unclear. They may help is RNA transport or in control of gene expression in some cases, and they may make it easier for sections of genes to be shuffled in evolution. But , no generally accepted reason for the existence of introns exists. • There are a few prokaryotic examples, but most introns are found in eukaryotes. • Some genes have many long introns: the dystrophin gene (mutants cause muscular dystrophy) has more than 70 introns that make up more than 99% of the gene’s sequence. However, not all eukaryotic genes have introns: histone genes, for example, lack introns. Intron Splicing • Introns are removed from the primary RNA transcript while it is still in the nucleus. • Introns are “spliced out” by RNA/protein hybrids called “spliceosomes”. The intron sequences are removed, and the remaining ends are reattached so the final RNA consists of exons only. Summary of RNA processing • • • • • In eukaryotes, RNA polymerase produces a primary transcript, an exact RNA copy of the gene. A cap is put on the 5’ end. The RNA is terminated and poly-A is added to the 3’ end. All introns are spliced out. At this point, the RNA can be called messenger RNA. It is then transported out of the nucleus into the cytoplasm, where it is translated. Proteins • Proteins are composed of one or more polypeptides, plus (in some cases) additional small molecules (cofactors). – Polypeptides are linear chains of amino acids. – The sequence of amino acids in a polypeptide is known as its “primary structure”. • After synthesis, the new polypeptide folds spontaneously into its active configuration and combines with the other necessary subunits to form an active protein. Thus, all the information necessary to produce the protein is contained in the DNA base sequence that codes for the polypeptides. Amino Acids and Peptide Bonds • There are 20 different amino acids coded in DNA. • They all have an amino group (-NH2) group on one end, and an acid group (-COOH) on the other end. Attached to the central carbon is an R group, which differs for each of the different amino acids. • When polypeptides are synthesized, the acid group of one amino acid is attached to the amino group of the next amino acid, forming a peptide bond. Translation • Translation of mRNA into protein is accomplished by the ribosome, an RNA/protein hybrid. Ribosomes are composed of 2 subunits, large and small. Both subunits contain both RNA and polypeptides. • Ribosomes bind to the translation initiation sequence on the mRNA, then move down the RNA in a 5’ to 3’ direction, creating a new polypeptide. • The first amino acid on the polypeptide has a free amino group, so it is called the N-terminus. The last amino acid in a polypeptide has a free acid group, so it is called the C-terminus. • Each group of 3 nucleotides in the mRNA is a codon, which codes for 1 amino acids. Transfer RNA is the adapter between the 3 bases of the codon and the corresponding amino acid. Transfer RNA • • • • Transfer RNA molecules are short RNAs that fold into a characteristic cloverleaf pattern. Some of the nucleotides are modified to become things like pseudouridine and ribothymidine. Each tRNA has 3 bases that make up the anticodon. These bases pair with the 3 bases of the codon on mRNA during translation. Each tRNA has its corresponding amino acid attached to the 3’ end. A set of enzymes, the “aminoacyl tRNA synthetases”, are used to “charge” the tRNA with the proper amino acid. Some tRNAs can pair with more than one codon. The third base of the anticodon is called the “wobble position”, and it can form base pairs with several different nucleotides. Initiation of Translation • In prokaryotes, ribosomes bind to specific translation initiation sites. There can be several different initiation sites on a messenger RNA: a prokaryotic mRNA can code for several different proteins. Translation begins at an AUG codon, or sometimes a GUG or UUG. The modified amino acid N-formyl methionine is always the first amino acid of the new polypeptide. • In eukaryotes, ribosomes bind to the 5’ cap, then move down the mRNA until they reach the first AUG, the codon for methionine. Translation starts from this point. Eukaryotic mRNAs code for only a single gene. (Although there are a few exceptions, mainly among the eukaryotic viruses). • Note that translation does not start at the first base of the mRNA. There is an untranslated region at the beginning of the mRNA, the 5’ untranslated region (5’ UTR). – There is also a 3’ UTR, a region at the end of the mRNA that is not translated. More Initiation • The initiation process involves first joining the mRNA, the initiator methionine-tRNA, and the small ribosomal subunit. Several “initiation factors”--additional proteins--are also involved. The large ribosomal subunit then joins the complex. Elongation • The ribosome has 2 sites for tRNAs, called P and A. The initial tRNA with attached amino acid is in the P site. A new tRNA, corresponding to the next codon on the mRNA, binds to the A site. The ribosome catalyzes a transfer of the amino acid from the P site onto the amino acid at the A site, forming a new peptide bond. • The ribosome then moves down one codon. The now-empty tRNA at the P site is displaced off the ribosome, and the tRNA that has the growing peptide chain on it is moved from the A site to the P site. • The process is then repeated: – the tRNA at the P site holds the peptide chain, and a new tRNA binds to the A site. – the peptide chain is transferred onto the amino acid attached to the A site tRNA. – the ribosome moves down one codon, displacing the empty P site tRNA and moving the tRNA with the peptide chain from the A site to the P site. Elongation Termination • • • Three codons are called “stop codons”. They code for no amino acid, and all protein-coding regions end in a stop codon. When the ribosome reaches a stop codon, there is no tRNA that binds to it. Instead, proteins called “release factors” bind, and cause the ribosome, the mRNA, and the new polypeptide to separate. The new polypeptide is completed. Note that the mRNA continues on past the stop codon. The remaining portion is not translated: it is the 3’ untranslated region (3’ UTR). Post-Translational Modification • New polypeptides usually fold themselves spontaneously into their active conformation. However, some proteins are helped and guided in the folding process by chaperone proteins • Many proteins have sugars, phosphate groups, fatty acids, and other molecules covalently attached to certain amino acids. Most of this is done in the endoplasmic reticulum. • Many proteins are targeted to specific organelles within the cell. Targeting is accomplished through “signal sequences” on the polypeptide. In the case of proteins that go into the endoplasmic reticulum, the signal seqeunce is a group of amino acids at the N terminal of the polypeptide, which are removed from the final protein after translation. The Genetic Code • Each group of 3 nucleotides in the translated part of the mRNA is a codon. Since there are 4 bases, there are 43 = 64 possible codons, which must code for 20 different amino acids. • More than one codon is used for most amino acids: the genetic code is “degenerate”. This means that it is not possible to take a protein sequence and deduce exactly the base sequence of the gene it came from. • In most cases, the third base of the codon (the wobble base) can be altered without changing the amino acid. • AUG is used as the start codon. All proteins are initially translated with methionine in the first position, although it is often removed after translation. There are also internal methionines in most proteins, coded by the same AUG codon. • There are 3 stop codons, also called “nonsense” codons. Proteins end in a stop codon, which codes for no amino acid. More Genetic Code • The genetic code is almost universal. It is used in both prokaryotes and eukaryotes. • However, some variants exist, mostly in mitochondria which have very few genes. • For instance, CUA codes for leucine in the universal code, but in yeast mitochondria it codes for threonine. Similarly, AGA codes for arginine in the universal code, but in human and Drosophila mitochondria it is a stop codon. • There are also a few known variants in the code used in nuclei, mostly among the protists.