Download RNA structure

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Microsatellite wikipedia , lookup

Replisome wikipedia , lookup

DNA nanotechnology wikipedia , lookup

Helicase wikipedia , lookup

Helitron (biology) wikipedia , lookup

Transcript
CS 177
DNA
RNA
Mutations
Amino acids,
protein structure
DNA, RNA, protein overview
DNA, RNA, protein overview
Questions about the genome in an organism:
How much DNA, how many nucleotides?
How many genes are there?
What types of proteins appear to be coded by these genes?
Questions about the proteome:
What proteins are present?
DNA
RNA
Mutations
Amino acids,
protein structure
Where are they?
When are they present - under what
conditions?
DNA, RNA, protein overview
Lecture 2
* DNA and its components
* RNA and its components
* Mutations
* Amino acids, review of protein structure
DNA
RNA
Mutations
Amino acids,
protein structure
DNA overview
DNA
deoxyribonucleic acid
4 bases
Pyrimidine (C4N2H4)
A = Adenine
Purine (C5N4H4)
T = Thymine
C = Cytosine
G = Guanine
Nucleoside
Nucleotide
base + sugar (deoxyribose)
base + sugar
O
--
- PO
O
P4 O
OH
DNA
RNA
Mutations
Amino acids,
protein structure
5’ CH2
O
O
4’
1’
H
H
3’
OH
H
H
2’
H
Numbering of carbons?
sugar
+ phosphate
Linking nucleotides
3’
5’
3’
Hydrogen bonds
Linking nucleotides:
3’
N-H------N
3’
3’
N-H------O
The 3’-OH of one nucleotide is
linked to the 5’-phosphate of
the next
nucleotide
What
next?
3’
Thymine
3’
2nm
Adenine
3’
3’
Cytosine
DNA
RNA
3’
Mutations
5’
Amino acids,
protein structure
3’
Guanine
Base pairing
3’
5’
A
T
3’
Base pairing (Watson-Crick):
C
3’
A/T (2 hydrogen bonds)
G
G/C (3 hydrogen bonds)
3’
Always pairing a purine and a
pyrimidine yields a constant width
A
3’
T
3’
DNA base composition:
A + G = T + C (Chargaff’s rule)
T
3’
A
3’
DNA
RNA
C
3’
Mutations
G
5’
Amino acids,
protein structure
3’
DNA conventions
1. DNA is a right-handed helix
DNA
RNA
Mutations
Amino acids,
protein structure
DNA conventions
1. DNA is a right-handed helix
2. The 5’ end is to the left by convention
5’ -ATCGCAATCAGCTAGGTT- 3’
sense (forward)
3’ -TAGCGTTAGTCGATCCAA- 5’
antisense (reverse)
3’ -TAGCGTTAGTCGATCCAA- 5’
5’ -ATCGCAATCAGCTAGGTT- 3’
DNA
Amino acids,
protein structure
3
’
T
A
G
C
G
T
T
A
G
T
C
G
A
T
C
C
A
A
5
’
Mutations
5
’
A
T
C
G
C
A
A
T
C
A
G
C
T
A
G
G
T
T
3
’
RNA
DNA structure
Some more facts:
1. Forces stabilizing DNA structure: Watson-Crick-H-bonding and base stacking
(planar aromatic bases overlap geometrically and electronically  energy gain)
2. Genomic DNAs are large molecules:
Eschericia coli: 4.7 x 106 bp; ~ 1 mm contour length
Human: 3.2 x 109 bp; ~ 1 m contour length
3. Some DNA molecules (plasmids) are circular and have no free ends:
mtDNA
bacterial DNA (only one circular chromosome)
4. Average gene of 1000 bp can code for average protein of about 330 amino acids
5. Percentage of non-coding DNA varies greatly among organisms
Organism
DNA
RNA
Mutations
Amino acids,
protein structure
small virus
‘typical’ virus
bacterium
yeast
human
amphibians
plants
# Base pairs
# Genes
4 x 103
3
3x
5 x 106
1 x 107
3.2 x 109
< 80 x 109
< 900 x 109
Non-coding DNA
105
3000
6000
30,000?
very little
200
10 - 20%
> 50%
99%
?
?
23,000 - >50,000
> 99%
very little
RNA structure
RNA
3 major types of RNA
messenger RNA (mRNA); template for protein synthesis
transfer RNA (tRNA); adaptor molecules that decode the genetic code
ribosomal RNA (rRNA); catalyzing the synthesis of proteins
ribonucleic acid
4 bases
Pyrimidine (C4N2H4)
A = Adenine
Purine (C5N4H4)
U = Uracil
C = Cytosine
G = Guanine
Thymine (DNA)
Nucleoside
base
Uracil (RNA)
Nucleotide
+ sugar (ribose)
base + sugar
O
--
DNA
OH
RNA
5’ CH2
Mutations
Amino acids,
protein structure
- PO
O
P4 O
O
O
4’
1’
H
H
3’
OH
H
H
2’
OH
sugar
+ phosphate
Base interactions in RNA
Base pairing:
U/A/(T) (2 hydrogen bonds)
G/C
(3 hydrogen bonds)
RNA base composition:
A+G/
=U+C
Chargaff’s rule does not apply (RNA usually prevails as single strand)
RNA structure:
- usually single stranded
DNA
RNA
Mutations
Amino acids,
protein structure
- many self-complementary regions  RNA commonly exhibits an intricate secondary structure
(relatively short, double helical segments alternated with single stranded regions)
- complex tertiary interactions fold the RNA in its final three dimensional form
- the folded RNA molecule is stabilized by interactions (e.g. hydrogen bonds and base stacking)
RNA structure
Primary structure
A) single stranded regions
formed by unpaired nucleotides
Secondary structure
B) duplex
double helical RNA (A-form with 11 bp per turn)
C
C) hairpin
duplex bridged by a loop of unpaired nucleotides
D) internal loop
D
nucleotides not forming Watson-Crick base pairs
E
DNA
RNA
Mutations
Amino acids,
protein structure
F
G
E) bulge loop
unpaired nucleotides in one strand,
other strand has contiguous base pairing
F) junction
B
A
three or more duplexes separated by single
stranded regions
G) pseudoknot
tertiary interaction between bases of hairpin loop
and outside bases
RNA structure
Primary structure
Secondary structure
Tertiary structure
C
D
E
DNA
RNA
Mutations
Amino acids,
protein structure
F
B
A
G
RNA structure
How to predict RNA secondary/tertiary structure?
Probing RNA structure experimentally:
- physical methods (single crystal X-ray diffraction, electron microscopy)
- chemical and enzymatic methods
- mutational analysis (introduction of specific mutations to test change in some
function or protein-RNA interaction)
Thermodynamic prediction of RNA structure:
- RNA molecules comply to the laws of thermodynamics, therefore it should be
possible to deduce RNA structure from its sequence by finding the conformation
with the lowest free energy
- Pros: only one sequence required; no difficult experiments; does not rely on
alignments
- Cons: thermodynamic data experimentally determined, but not always accurate;
possible interactions of RNA with solvent, ions, and proteins
Comparative determination of RNA structure:
DNA
RNA
Mutations
Amino acids,
protein structure
- basic assumption: secondary structure of a functional RNA will be conserved in the
evolution of the molecule (at least more conserved than the primary structure);
when a set of homologous sequences has a certain structure in common, this structure can
be deduced by comparing the structures possible from their sequences
- Pros: very powerful in finding secondary structure, relatively easy to use, only sequences
required, not affected by interactions of the RNA and other molecules
- Cons: large number of sequences to study preferred, structure constrains in fully conserved
regions cannot be inferred, extremely variable regions cause problems with alignment
Amino acids/proteins
The “central dogma” of modern biology: DNA  RNA  protein
Getting from DNA to protein:
Two parts: 1. Transcription in which a short portion of chromosomal DNA is used to
make a RNA molecule small enough to leave the nucleus.
2. Translation in which the RNA code is used to assemble the protein at the
ribosome
The genetic code
- The code problem: 4 nucleotides in RNA, but 20 amino acids in proteins
- Bases are read in groups of 3 (= a codon)
- The code consists of 64 codons (43 = 64)
- All codons are used in protein synthesis:
- 20 amino acids
- 3 stop codons
- AUG (methionine) is the start codon (also used internally)
DNA
RNA
Mutations
Amino acids,
protein structure
- The code is non-overlapping and punctuation-free
- The code is degenerate (but NOT ambiguous): each amino acid is specified by at
least one codon
- The code is universal (virtually all organisms use the same code)
The genetic code
Base 2
T
C
Phenylalanine
F
T
Leucine
L
A
Leucine
L
Isoleucine
I
Mutations
Amino acids,
protein structure
Valine
V
Cysteine
C
STOP
Proline
P
Threonine
T
DNA
G
Tyrosine
Y
Serine
S
Methionine M
RNA
G
Alanine
A
Histidine
H
Glutamine
Q
STOP
T
C
A
Tryptophan
W
G
Arginine
R
Asparagine
N
Serine
S
Lysine
K
Arginine
R
Aspartate
B
Glutamate
Z
Glycine
G
T
C
A
G
T
C
A
G
T
C
A
G
In-class exercise
1. Which amino acids are
specified by single codons?
methionine and tryptophan
Base 3
Base 1
C
A
2. How many amino acids
are specified by the first
two nucleotides only?
five: proline, threonine,
valine, alanine, glycine
3. What is the RNA code for
the start codon?
AUG
Amino acids
G
Hydrophobic
A
V
L
I
DNA
RNA
Mutations
Amino acids,
protein structure
M
F
W
P
Amino acids
Hydrophyllic
S
T
C
Y
N
Q
K
R
H
DNA
RNA
Mutations
Amino acids,
protein structure
D
E
Reading frames
Reading frame (also open reading frame):
The stretch of triplet sequence of DNA that potentially encodes
a protein. The reading frame is designated by the initiation or
start codon and is terminated by a stop codon.
- a reading frame is not always easily recognizable
- each strand of RNA/DNA has three possible starting
points (position one, two, or three):
Position 1
CAG AUG AGG UCA GGC AUA
gln met arg ser gly ile
Position 2
C AGA UGA GGU CAG GCA UA
arg trp gly gln ala
Position 3
CA GAU GAG GUC AGG CAU A
asp glu val arg his
- mutations within an open reading frame that delete or add nucleotides can disrupt
the reading frame (frameshift mutation):
DNA
RNA
Wildtype
gln met arg
Mutations
Amino acids,
protein structure
CAG AUG AGG UCA GGC AUA GAG
Mutant
ser gly
ile
glu
CAG AUG AGU CAG GCA UAG AG
gln met ser
gln
ala
Up to 30% of mutations
causing humane disease
are due to premature
termination of translation
(nonsense mutations or
frameshift)
Mutations
Mutation: any heritable change in DNA
Sources of mutation:
Spontaneous mutations: mutations occur for unknown reasons
Induced mutations: exposure to substance (mutagen) known to cause mutations,
e.g. X-rays, UV light, free radicals
Mutations may influence one or several base pairs
a) Nucleotide substitutions (point mutation)
1) Transitions (Pu  Pu; Py  Py)
2) Transversions (Pu  Py)
In-class exercise
How many transition and transversion
events are possible?
2 transitions:
T  C; A  G
4 transversions: T  A; T  G
C  A; C  G
one to many bases can be involved
frequently associated with repeated sequences (“hot spots”)
lead to frameshift in protein-coding genes, except when N = 3X
also caused by insertion of transposable elements into genes
b) Insertion or deletion (“indels”)
DNA
-
RNA
Mutations
Amino acids,
protein structure
“Weighting” of mutation events plays important role for phylogenetic analyses
(model of sequence evolution)
Mutations
Mutations may influence phenotype
a) Silent (or synonymous) substitution
-
nucleotide substitution without amino acid change
no effect on phenotype
mostly third codon position
other possible silent substitutions: changes in non-coding DNA
b) Replacement substitution
- causes amino acid change
- neutral: protein still functions normally
- missense: protein loses some functions (e.g. sickle cell anemia: mutation in ß-globin)
c) Sense/nonsense substitution
- sense: involves a change from a termination codon
to one that codes for an amino acid
- nonsense: creates premature termination codon
Mutation rates
DNA
RNA
Mutations
Amino acids,
protein structure
= a measure of the frequency of a given mutation per generation
-
mutation rates are usually given for specific loci (e.g. sickle cell anemia)
the rate of nucleotide substitutions in humans is on the order of 1 per 100,000,000
range varies from 1 in 10,000 to 1 in 10,000,000,000
every human has about 30 new mutations involving nucleotide substitutions
mutation rate is about twice as high in male as in female meiosis
Mutations
A single amino acid substitution in a protein causes sickle-cell disease
DNA
RNA
Mutations
Amino acids,
protein structure
Review of protein structure
DNA
RNA
Mutations
Amino acids,
protein structure
Making a polypeptide chain
Review of protein structure
Primary structure
Proteins are chains of amino acids joined by peptide bonds
Polypeptide chain
The structure of two amid acids
The N-C-C sequence is repeated throughout the protein, forming the backbone
DNA
RNA
Mutations
Amino acids,
protein structure
The bonds on each side of the C atom are free to rotate within spatial constrains,
the angles of these bonds determine the conformation of the protein backbone
The R side chains also play an important structural role
Review of protein structure
Secondary structure:
Interactions that occur between the C=O and N-H groups on amino acids
Much of the protein core comprises  helices and  sheets, folded into a threedimensional configuration:
-
regular patterns of H bonds are formed between neighboring amino acids
the amino acids have similar angles
the formation of these structures neutralizes the polar groups on each amino acid
the secondary structures are tightly packed in a hydrophobic environment
Each R side group has a limited volume to occupy and a limited number of interactions
with other R side groups
 helix
DNA
RNA
Mutations
Amino acids,
protein structure
 sheet
Secondary structure
Other Secondary structure elements
(no standardized classification)
- random coil
- loop
- others (e.g. 310 helix, -hairpin, paperclip)
Super-secondary structure
DNA
RNA
Mutations
Amino acids,
protein structure
- In addition to secondary structure elements that apply to all proteins
(e.g. helix, sheet) there are some simple structural motifs in some proteins
- These super-secondary structures (e.g. transmembrane domains, coiled
coils, helix-turn-helix, signal peptides) can give important hints about
protein function
Secondary structure
Structural classification of proteins (SCOP)
DNA
RNA
Mutations
Amino acids,
protein structure
Class 1: mainly alpha
Class 2: mainly beta
Class 3: alpha/beta
Class 4: few secondary structures
Secondary structure
Alternative SCOP
DNA
RNA
Mutations
Amino acids,
protein structure
Class  : only  helices
Class  : antiparallel  sheets
Class / : mainly  sheets
with intervening  helices
Class + : mainly
segregated  helices with
antiparallel  sheets
Membrane structure:
hydrophobic  helices with
membrane bilayers
Multidomain: contain
more than one class
Review of protein structure
Q: If we have all the Psi and Phi angles in a protein, do we then have enough
information to describe the 3-D structure?
A: No, because the detailed packing of the amino acid side chains is not
revealed from this information. However, the Psi and Phi angles do
determine the entire secondary structure of a protein
DNA
RNA
Mutations
Amino acids,
protein structure
Tertiary structure
Tertiary structure
The tertiary structure describes the organization in three dimensions
of all the atoms in the polypeptide
The tertiary structure is determined by a combination of different types of bonding
(covalent bonds, ionic bonds, h-bonding, hydrophobic interactions, Van der Waal’s forces)
between the side chains
Many of these bonds are very week and easy to break, but hundreds or thousands working
together give the protein structure great stability
DNA
RNA
Mutations
Amino acids,
protein structure
If a protein consists of only one polypeptide chain, this level then describes the
complete structure
Tertiary structure
Proteins can be divided into two general classes based on their tertiary structure:
- Fibrous proteins have elongated structure with the polypeptide chains arranged
in long strands. This class of proteins serves as major structural component of cells
Examples: silk, keratin, collagen
- Globular proteins have more
compact, often irregular structures.
This class of proteins includes most
enzymes and most proteins involved
in gene expression and regulation
DNA
RNA
Mutations
Amino acids,
protein structure
Quaternary structure
The quaternary structure defines the conformation assumed by a multimeric protein.
The individual polypeptide chains that make up a multimeric protein are often referred to
as protein subunits. Subunits are joined by ionic, H and hydrophobic interactions
Example:
Haemoglobin
(4 subunits)
DNA
RNA
Mutations
Amino acids,
protein structure
Structure displays
Common displays are (among others) cartoon, spacefill, and backbone
cartoon
DNA
RNA
Mutations
Amino acids,
protein structure
spacefill
backbone
Summary protein structure
Primary structure:
Sequence of amino acids
Secondary structure:
Interactions that occur between
the C=O and N-H groups on amino acids
Tertiary structure:
Organization in three dimensions of all the atoms in the
polypeptide
Quaternary structure:
DNA
Conformation assumed by a multimeric protein
RNA
Mutations
Amino acids,
protein structure
The four levels of protein structure are hierarchical:
each level of the build process is dependent upon the one below it
Next week
First quiz
Lecture 1
- Bioinformatics definitions
- The human genome project
Lecture 2
- DNA structure
- RNA structure
- Mutations
- Amino acids
- Proteins
DNA
RNA
Mutations
Amino acids,
protein structure