Download DNA RNA Protein

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bisulfite sequencing wikipedia , lookup

Human genome wikipedia , lookup

Frameshift mutation wikipedia , lookup

Gel electrophoresis of nucleic acids wikipedia , lookup

DNA vaccination wikipedia , lookup

Molecular cloning wikipedia , lookup

Genomics wikipedia , lookup

Short interspersed nuclear elements (SINEs) wikipedia , lookup

Cell-free fetal DNA wikipedia , lookup

DNA polymerase wikipedia , lookup

Epigenomics wikipedia , lookup

Microevolution wikipedia , lookup

RNA interference wikipedia , lookup

DNA supercoil wikipedia , lookup

Extrachromosomal DNA wikipedia , lookup

Epigenetics of human development wikipedia , lookup

History of genetic engineering wikipedia , lookup

Nucleic acid double helix wikipedia , lookup

Transfer RNA wikipedia , lookup

Cre-Lox recombination wikipedia , lookup

Vectors in gene therapy wikipedia , lookup

Non-coding DNA wikipedia , lookup

Polyadenylation wikipedia , lookup

Messenger RNA wikipedia , lookup

Gene wikipedia , lookup

RNA world wikipedia , lookup

Point mutation wikipedia , lookup

Helitron (biology) wikipedia , lookup

Expanded genetic code wikipedia , lookup

Replisome wikipedia , lookup

RNA silencing wikipedia , lookup

RNA-Seq wikipedia , lookup

Nucleic acid tertiary structure wikipedia , lookup

Therapeutic gene modulation wikipedia , lookup

RNA wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Genetic code wikipedia , lookup

History of RNA biology wikipedia , lookup

Non-coding RNA wikipedia , lookup

Epitranscriptome wikipedia , lookup

Deoxyribozyme wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Primary transcript wikipedia , lookup

Transcript
DNA, RNA, and Protein
Central Dogma of Molecular Biology
•
•
•
The flow of information in the cell
starts at DNA, which replicates to
form more DNA. Information is then
‘transcribed” into RNA, and then it is
“translated” into protein. The
proteins do most of the work in the
cell.
Information does not flow in the
other direction. This is a molecular
version of the incorrectness of
“inheritance of acquired
characteristics”. Changes in
proteins do not affect the DNA in a
systematic manner (although they
can cause random changes in DNA.
Vocabulary:
– DNA is replicated to make more
DNA, using the enzyme DNA
polymerase
– DNA is transcribed into RNA using
RNA polymerase
– RNA is translated into protein using
ribosomes.
Reverse Transcription
•
•
•
However, a few exceptions to the Central
Dogma exist.
Most importantly, some RNA viruses, called
retroviruses make a DNA copy of
themselves using the enzyme reverse
transcriptase. The DNA copy incorporates
into one of the chromosomes and becomes a
permanent feature of the genome. The DNA
copy inserted into the genome is called a
“provirus”. This represents a flow of
information from RNA to DNA.
Closely related to retroviruses are
retrotransposons, sequences of DNA on
the chromosome that make RNA copies of
themselves, which then get reversetranscribed into DNA that inserts into new
locations in the genome. Unlike retroviruses,
retrotransposons always remain within the
cell. They lack genes to make the protein
coat that surrounds viruses.
Prions
•
•
•
•
A prion is an “infectious protein”.
Prions are the agents that cause mad cow disease (bovine
spongiform encephalopathy), chronic wasting disease in deer and
elk, scrapie in sheep, and Creutzfeld-Jakob syndrome in humans.
These diseases cause neural degeneration. In humans, the
symptoms are approximately those of Alzheimer’s syndrome
accelerated to go from onset to death in about 1 year. Fortunately,
the disease is very hard to catch and very rare, and they usually have
a long incubation time. No cure is known, and not enough is known
about how it is spread to do a thorough job of preventing it. Avoid
eating brains is a good start though.
The prion protein (PrP) is normally present in the body. Like all
proteins, it is folded into a specific conformation, a state called PrPC.
Prion diseases are caused by the same protein folded abnormally, a
state called PrPSc. A PrPSc can bind to a normal PrPC protein and
convert it to PrPSc. This conversion spreads throughout the body,
causing the disease to occur. It is also a form of inheritance that
does not involve nucleic acids.
Nucleotide Structure
• DNA and RNA are macromolecules composed of subunits called
nucleotides.
• Each nucleotide of DNA or RNA has 3 parts: a nitrogenous base, a
sugar, and a phosphate group.
• The phosphate group, PO4, links two sugar molecules in the
backbone. Each phosphate carries a -1 charge. This causes DNA
to have an overall negative charge.
• The sugar is ribose in the case of RNA and deoxyribose in the
case of DNA. has 5 carbons, numbered 1’ through 5’.
– the nitrogenous base is attached to the 1’ carbon
– the 2’ carbon has a free -OH group in the case of RNA, but a -H group
in the case of DNA. The lack of the oxygen atom makes DNA far less
reactive than RNA.
– the 3’ carbon has an -OH group on it that links to the phosphate group
on the next base. The “end” of the DNA molecule is a free 3’ OH group.
– the 5’ carbon is attached to the phosphate group.
Nucleotides
More Nucleotide Structure
•
•
•
•
•
•
There are 4 possible DNA bases). : adenine (A),
guanine (G), cytosine (C), and thymine (T)
Adenine and guanine are purines: they consist of
two linked rings of mixed nitrogen and carbon
atoms.
Thymine and cytosine are pyrimidines, which
consist of a single ring. In RNA, thymine is replaced
by uracil (U), which looks like thymine except for a
single methyl group.
Each strand of DNA pairs with a complementary
DNA strand. This pairing happens because each A
is paired with a T, and each G is paired with a C.
Thus, the information on one DNA strand easily
allows the other strand to be deduced. The amount
of A in DNA always equals the amount of T, and the
amount of G always equals the amount of C. This is
not true in RNA, which is usually single-stranded.
Pairing is caused by hydrogen bonds, weak links
between oxygen and nitrogen atoms where one of
them has a hydrogen attached.
A-T pairs have 2 hydrogen bonds, while G-C pairs
have 3 hydrogen bonds. G-C pairs are stronger,
and they are more frequent in high temperature
organisms.
Paired Nucleotides
Replication
•
•
•
•
Watson and Crick recognized that the
double stranded DNA molecule could
replicate by unwinding, then
synthesizing a new strand for each of
the old stands.
This mode of replication is called
semi-conservative. It means that
after one DNA molecule has replicated
to become 2 DNA molecules, each
new molecule consists of one old
strand (from the original molecule) and
one new strand.
The information from each old strand
can be used to create the new strands,
since A always pairs with T, and G
always pairs with C.
DNA replication starts at specific
locations origins of replication, and
proceeds in both directions.
Replication Components
•
•
•
•
•
The raw materials of DNA synthesis are
nucleoside triphosphates, often written as
dNTPs.
dNTPs have a chain of 3 phosphate groups
attached to the 5’ carbon of the deoxyribose
sugar. Just as with ATP, the bonds between
the phosphates are high energy bonds, and
releasing them produces the energy needed
to drive the synthesis of DNA.
Each new nucleotide is added to a growing
DNA chain by removing the outer 2
phosphates and attaching the remaining
phosphate to the 3’ OH group of the previous
nucleotide.
The DNA chain is said to grow from 5’ to 3’,
which means that the first DNA base has a
free 5’ end, with attached phosphates. The
last nucleotide has a free 3’ OH group on it.
All other bases have their 5’ carbons attached
to a phosphate, which is attached to the 3’ OH
group of the previous nucleotide.
DNA polymerase is the main enzyme used to
replicate DNA. However, DNA polymerase is
only one enzyme in the replication complex.
Several other enzymes are needed to cause
replication to occur.
Continuous and Discontinuous
Synthesis
• DNA can only be synthesized from 5’ to 3’, by
adding new nucleotides to the 3’ end.
• This is a problem, because both strands must be
synthesized at the replication fork, and one strand
will necessarily be synthesized in the opposite
direction from the movement of the replication fork.
• In reality, one strand is synthesized continuously, in
the same direction that the replication fork is moving.
This is called the leading strand.
• The other strand is synthesized in short,
discontinuous pieces, that are then attached
together to form the final DNA strand. This is the
lagging strand. Each fragment of the lagging
strand is called an Okazaki fragment, and they are
synthesized in the opposite direction that the
replication fork moves.
Discontinuous Synthesis
•
•
•
•
Another peculiarity of DNA
synthesis is that DNA polymerase
must attach new bases to the 3’
end of a pre-existing nucleic acid
chain. All DNA synthesis starts at
a short double-stranded region.
In the cell, short pieces of RNA,
called primers are paired with the
DNA bases to create to the short
double stranded regions that DNA
synthesis builds.
The RNA primers are synthesized
by an enzyme called primase,
and they are removed by DNA
polymerase during the synthesis
of the next Okazaki fragment.
Joining of the Okazaki fragments
is done by the enzyme DNA
ligase.
RNA
• RNA plays a central role in the life of
the cell. We are mostly going to look at
its role in protein synthesis, but RNA
does many other things as well.
• RNA can both store information (like
DNA) and catalyze chemical reactions
(like proteins).
• One theory for the origin of life has it
starting out as RNA only, then adding
DNA and proteins later. This theory is
called the “RNA World”.
• RNA/protein hybrid structures are
involved in protein synthesis
(ribosome), splicing of messenger
RNA, telomere maintenance, guiding
ribosomes to the endoplasmic
reticulum, and other tasks.
• Recently it has been found that very
small RNA molecules are involves in
gene regulation.
RNA Used in Protein Synthesis
• messenger RNA (mRNA). A copy of the gene that is
being expressed. Groups of 3 bases in mRNA, called
“codons” code for each individual amino acid in the
protein made by that gene.
– in eukaryotes, the initial RNA copy of the gene is called the
“primary transcript”, which is modified to form mRNA.
• ribosomal RNA (rRNA). Four different RNA molecules
that make up part of the structure of the ribosome. They
perform the actual catalysis of adding an amino acid to a
growing peptide chain.
• transfer RNA (tRNA). Small RNA molecules that act as
adapters between the codons of messenger RNA and
the amino acids they code for.
Transcription
•
•
•
Transcription is the process of making an
RNA copy of a single gene. Genes are specific
regions of the DNA of a chromosome.
The enzyme used in transcription is RNA
polymerase.
The raw materials for the new RNA are the 4
ribonucleoside triphosphates (NTPs): ATP,
CTP, GTP, and UTP.
– It’s the same ATP as is used for energy in the
cell.
•
•
•
As with DNA replication, transcription proceeds
5’ to 3’: new bases are added to the free 3’ OH
group.
Unlike replication, transcription does not need
to build on a primer. Instead, transcription
starts at the promoter, which is where RNA
polymerase binds to the DNA. For proteincoding genes, the promoter is located a few
bases 5’ to (upstream from) the first base that
is transcribed into RNA.
Transcription factors are other proteins that
bind to sites near the promoter and help initiate
the start of transcription.
Process of Transcription
•
Transcription starts with RNA polymerase
binding to the promoter.
–
•
This binding only occurs under some conditions: when
the gene is “on”. Various other proteins (transcription
factors) help RNA polymerase bind to the promoter.
RNA polymerase unwinds a small section of
the DNA and uses it as a template to
synthesize an exact RNA copy of the DNA
strand.
– The DNA strand used as a template is the
antisense strand; the other strand is the sense
strand . The bases of the sense strand are the
same as the bases of the RNA. The bases of
the antisense strand are complementary to the
RNA bases.
– Notice that the RNA is made from 5’ end to 3’
end, so the antisense strand is actually read
from 3’ to 5’.
•
•
RNA polymerase proceeds down the DNA,
synthesizing the RNA copy.
In prokaryotes, each RNA ends at a specific
terminator sequence. In eukaryotes
transcription doesn’t have a definite end point;
the RNA is given a definitive termination point
during RNA processing.
After Transcription
• In prokaryotes, the RNA copy of a gene is messenger
RNA, ready to be translated into protein. In fact,
translation starts even before transcription is finished.
• In eukaryotes, the primary RNA transcript of a gene
needs further processing before it can be translated.
This step is called RNA processing. Also, it needs to be
transported out of the nucleus into the cytoplasm.
• Steps in RNA processing:
– 1. Add a cap to the 5’ end
– 2. Add a poly-A tail to the 3’ end
– 3. splice out introns.
Capping and poly-A Addition
• RNA is inherently unstable,
especially at the ends. The
ends are modified to protect it.
• At the 5’ end, a slightly
modified guanine (7-methyl G)
is attached “backwards”, by a
5’ to 5’ linkage, to the
triphosphates of the first
transcribed base.
• At the 3’ end, the primary
transcript RNA is cut at a
specific site and 100-200
adenine nucleotides are
attached: the poly-A tail. Note
that these A’s are not coded in
the DNA of the gene.
Introns
• Introns are regions within a gene that don’t code for protein and
don’t appear in the final mRNA molecule. Protein-coding sections of
a gene (called exons) are interrupted by introns.
• The function of introns remains unclear. They may help is RNA
transport or in control of gene expression in some cases, and they
may make it easier for sections of genes to be shuffled in evolution.
But , no generally accepted reason for the existence of introns
exists.
• There are a few prokaryotic examples, but most introns are found in
eukaryotes.
• Some genes have many long introns: the dystrophin gene (mutants
cause muscular dystrophy) has more than 70 introns that make up
more than 99% of the gene’s sequence. However, not all eukaryotic
genes have introns: histone genes, for example, lack introns.
Intron Splicing
• Introns are removed from
the primary RNA
transcript while it is still in
the nucleus.
• Introns are “spliced out”
by RNA/protein hybrids
called “spliceosomes”.
The intron sequences are
removed, and the
remaining ends are reattached so the final RNA
consists of exons only.
Summary of RNA processing
•
•
•
•
•
In eukaryotes, RNA polymerase produces a primary transcript, an exact RNA copy
of the gene.
A cap is put on the 5’ end.
The RNA is terminated and poly-A is added to the 3’ end.
All introns are spliced out.
At this point, the RNA can be called messenger RNA. It is then transported out of the
nucleus into the cytoplasm, where it is translated.
Proteins
• Proteins are composed of one or more polypeptides,
plus (in some cases) additional small molecules (cofactors).
– Polypeptides are linear chains of amino acids.
– The sequence of amino acids in a polypeptide is known as its
“primary structure”.
• After synthesis, the new polypeptide folds spontaneously
into its active configuration and combines with the other
necessary subunits to form an active protein. Thus, all
the information necessary to produce the protein is
contained in the DNA base sequence that codes for the
polypeptides.
Amino Acids and Peptide Bonds
• There are 20 different amino
acids coded in DNA.
• They all have an amino group
(-NH2) group on one end, and
an acid group (-COOH) on the
other end. Attached to the
central carbon is an R group,
which differs for each of the
different amino acids.
• When polypeptides are
synthesized, the acid group of
one amino acid is attached to
the amino group of the next
amino acid, forming a peptide
bond.
Translation
• Translation of mRNA into protein is accomplished by
the ribosome, an RNA/protein hybrid. Ribosomes are
composed of 2 subunits, large and small. Both subunits
contain both RNA and polypeptides.
• Ribosomes bind to the translation initiation sequence on
the mRNA, then move down the RNA in a 5’ to 3’
direction, creating a new polypeptide.
• The first amino acid on the polypeptide has a free amino
group, so it is called the N-terminus. The last amino
acid in a polypeptide has a free acid group, so it is called
the C-terminus.
• Each group of 3 nucleotides in the mRNA is a codon,
which codes for 1 amino acids. Transfer RNA is the
adapter between the 3 bases of the codon and the
corresponding amino acid.
Transfer RNA
•
•
•
•
Transfer RNA molecules are short RNAs
that fold into a characteristic cloverleaf
pattern. Some of the nucleotides are
modified to become things like
pseudouridine and ribothymidine.
Each tRNA has 3 bases that make up the
anticodon. These bases pair with the 3
bases of the codon on mRNA during
translation.
Each tRNA has its corresponding amino
acid attached to the 3’ end. A set of
enzymes, the “aminoacyl tRNA
synthetases”, are used to “charge” the
tRNA with the proper amino acid.
Some tRNAs can pair with more than one
codon. The third base of the anticodon is
called the “wobble position”, and it can form
base pairs with several different
nucleotides.
Initiation of Translation
• In prokaryotes, ribosomes bind to specific translation
initiation sites. There can be several different initiation
sites on a messenger RNA: a prokaryotic mRNA can
code for several different proteins. Translation begins at
an AUG codon, or sometimes a GUG or UUG. The
modified amino acid N-formyl methionine is always the
first amino acid of the new polypeptide.
• In eukaryotes, ribosomes bind to the 5’ cap, then move
down the mRNA until they reach the first AUG, the
codon for methionine. Translation starts from this point.
Eukaryotic mRNAs code for only a single gene.
(Although there are a few exceptions, mainly among the
eukaryotic viruses).
• Note that translation does not start at the first base of the
mRNA. There is an untranslated region at the beginning
of the mRNA, the 5’ untranslated region (5’ UTR).
– There is also a 3’ UTR, a region at the end of the mRNA that is
not translated.
More Initiation
• The initiation process
involves first joining the
mRNA, the initiator
methionine-tRNA, and the
small ribosomal subunit.
Several “initiation
factors”--additional
proteins--are also
involved. The large
ribosomal subunit then
joins the complex.
Elongation
• The ribosome has 2 sites for tRNAs, called P and A. The initial
tRNA with attached amino acid is in the P site. A new tRNA,
corresponding to the next codon on the mRNA, binds to the A site.
The ribosome catalyzes a transfer of the amino acid from the P site
onto the amino acid at the A site, forming a new peptide bond.
• The ribosome then moves down one codon. The now-empty tRNA
at the P site is displaced off the ribosome, and the tRNA that has the
growing peptide chain on it is moved from the A site to the P site.
• The process is then repeated:
– the tRNA at the P site holds the peptide chain, and a new tRNA binds to
the A site.
– the peptide chain is transferred onto the amino acid attached to the A
site tRNA.
– the ribosome moves down one codon, displacing the empty P site tRNA
and moving the tRNA with the peptide chain from the A site to the P
site.
Elongation
Termination
•
•
•
Three codons are called “stop
codons”. They code for no amino
acid, and all protein-coding
regions end in a stop codon.
When the ribosome reaches a
stop codon, there is no tRNA that
binds to it. Instead, proteins
called “release factors” bind, and
cause the ribosome, the mRNA,
and the new polypeptide to
separate. The new polypeptide is
completed.
Note that the mRNA continues on
past the stop codon. The
remaining portion is not translated:
it is the 3’ untranslated region (3’
UTR).
Post-Translational Modification
• New polypeptides usually fold themselves spontaneously
into their active conformation. However, some proteins
are helped and guided in the folding process by
chaperone proteins
• Many proteins have sugars, phosphate groups, fatty
acids, and other molecules covalently attached to certain
amino acids. Most of this is done in the endoplasmic
reticulum.
• Many proteins are targeted to specific organelles within
the cell. Targeting is accomplished through “signal
sequences” on the polypeptide. In the case of proteins
that go into the endoplasmic reticulum, the signal
seqeunce is a group of amino acids at the N terminal of
the polypeptide, which are removed from the final protein
after translation.
The Genetic Code
• Each group of 3 nucleotides in the translated part of the mRNA is
a codon. Since there are 4 bases, there are 43 = 64 possible
codons, which must code for 20 different amino acids.
• More than one codon is used for most amino acids: the genetic
code is “degenerate”. This means that it is not possible to take a
protein sequence and deduce exactly the base sequence of the
gene it came from.
• In most cases, the third base of the codon (the wobble base) can
be altered without changing the amino acid.
• AUG is used as the start codon. All proteins are initially
translated with methionine in the first position, although it is often
removed after translation. There are also internal methionines in
most proteins, coded by the same AUG codon.
• There are 3 stop codons, also called “nonsense” codons.
Proteins end in a stop codon, which codes for no amino acid.
More Genetic Code
• The genetic code is almost universal. It is used
in both prokaryotes and eukaryotes.
• However, some variants exist, mostly in
mitochondria which have very few genes.
• For instance, CUA codes for leucine in the
universal code, but in yeast mitochondria it
codes for threonine. Similarly, AGA codes for
arginine in the universal code, but in human and
Drosophila mitochondria it is a stop codon.
• There are also a few known variants in the code
used in nuclei, mostly among the protists.