How do proteins recognize DNA
Modes:
Single-stranded Binding Proteins
Double-stranded Binding Proteins
Sequence-Independent Binding Proteins
Sequence-dependent Binding Proteins
Lesion-specific Binding Proteins (mismatch, 8-oxoG, thymidine dimers, etc.)
Information Content:
Major Groove versus Minor Groove
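The information-content point can be made concrete by listing the hydrogen-bond acceptors (A), donors (D), methyl groups (M), and nonpolar hydrogens (H) that each base pair exposes on the floor of each groove. The Python sketch below uses the commonly taught textbook patterns. Encoding them as strings and the helper name `distinguishable_pairs` are illustrative choices, not part of the original slides. It counts how many base pairs look different when viewed from each groove.

```python
# Chemical groups each base pair presents in each groove, read in a fixed
# direction across the groove floor (commonly taught textbook patterns):
#   A = H-bond acceptor, D = H-bond donor, M = methyl group, H = nonpolar H
MAJOR_GROOVE = {
    "A:T": "ADAM",
    "T:A": "MADA",
    "G:C": "AADH",
    "C:G": "HDAA",
}
MINOR_GROOVE = {
    "A:T": "AHA",
    "T:A": "AHA",
    "G:C": "ADA",
    "C:G": "ADA",
}


def distinguishable_pairs(groove: dict) -> int:
    """Number of distinct base-pair patterns a protein could read in a groove."""
    return len(set(groove.values()))


if __name__ == "__main__":
    print("major groove:", distinguishable_pairs(MAJOR_GROOVE), "of 4 base pairs")
    print("minor groove:", distinguishable_pairs(MINOR_GROOVE), "of 4 base pairs")
```

With these patterns all four base pairs are distinct from the major groove, while the minor groove cannot tell A:T from T:A or G:C from C:G. This helps explain why sequence readout is usually associated with the major groove.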
DNA Packaging in Bacteria
Prokaryotic HU protein
HU protein (bacteria) is a small (9 kDa), basic (cationic), abundant (60,000 copies per cell) protein with two subunits, HUα and HUβ. HU protein condenses and packages DNA and regulates DNA replication.
See HU PyMOL script
DNA Packaging in Eukaryotes: Histones and Nucleosomes
Image from Wikipedia
The nucleosome core particle is around 147 base pairs of DNA wrapped around a histone octamer (8 proteins) with 2 copies each of histones H2A, H2B, H3, and H4. Nucleosome core particles are connected by "linker DNA", which can be up to about 80 bp long.
See nucleosome PyMOL script
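The core and linker lengths allow a quick estimate of how many nucleosomes an idealized, evenly spaced array could fit on a stretch of DNA. The Python sketch below assumes that every core wraps 147 bp and every linker has the same length. The function name and the 53 bp linker in the example are hypothetical values chosen for illustration; real linker lengths vary between cell types and along the genome.

```python
def nucleosome_count(dna_length_bp: int, linker_bp: int, core_bp: int = 147) -> int:
    """Maximum number of nucleosome cores on a linear DNA of the given length.

    Assumes an idealized, evenly spaced array: each core particle wraps
    `core_bp` of DNA and adjacent cores are separated by `linker_bp` of linker.
    """
    if dna_length_bp < core_bp:
        return 0
    repeat_bp = core_bp + linker_bp  # nucleosome repeat length
    # The first core uses core_bp; each additional core needs one linker + one core.
    return 1 + (dna_length_bp - core_bp) // repeat_bp


if __name__ == "__main__":
    # Hypothetical 10 kb fragment with a 53 bp linker (200 bp repeat length)
    print(nucleosome_count(10_000, linker_bp=53))  # 50
```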
Models for the 30 nm fiber
The genetic switch of the lambda bacteriophages (lambda, 434, P22).
The regulation of phage λ is reasonably well understood at a mechanistic and structural level. A phage in an infected bacterium decides between two alternative lifestyles, the lytic or the lysogenic pathway.
Lysogenic: dormant, integrated into the host bacterium's chromosome; favored under conditions of low temperature, starvation, and high multiplicity of infection.
Lytic: lyse the host bacterium; induced by the SOS response.
The genes that express the proteins Repressor and Cro are on opposite sides of a
common operator region.
Repressor protein stimulates transcription of Repressor, blocks transcription of Cro, and maintains the lysogenic state.
Cro protein does the opposite. It stimulates transcription of Cro, blocks transcription of Repressor, and turns the switch to the lytic state.
The operator region contains three operators (OR1, OR2, OR3). Operators are DNA sequences to which Repressor and Cro bind. The Repressor and Cro genes are transcribed in opposite directions. The two promoters (RNA polymerase binding and initiation sites) overlap in the central operator, OR2.
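The operator logic described above can be captured in a toy model. This Python sketch is an illustration under stated assumptions, not a model from the source: it assumes the textbook arrangement in which the Cro promoter (PR) overlaps OR1 and OR2, the Repressor promoter (PRM) overlaps OR2 and OR3, and Repressor bound at OR2 stimulates PRM. The names `switch_state` and `Occupancy` are hypothetical. The model ignores concentrations and binding affinities and only reports which promoters are open for a given pattern of bound operators.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: each operator is bound by "repressor", "cro", or None.
Occupancy = Dict[str, Optional[str]]


def switch_state(occ: Occupancy) -> Tuple[bool, str]:
    """Toy readout of the lambda right operator.

    Assumes PR overlaps OR1/OR2, PRM overlaps OR2/OR3, and Repressor at OR2
    stimulates PRM. Returns (is PR active, PRM status).
    """
    # PR (Cro transcription) is open only if nothing sits on OR1 or OR2
    pr_active = occ.get("OR1") is None and occ.get("OR2") is None
    if occ.get("OR3") is not None:
        prm = "off"           # any protein at OR3 blocks PRM
    elif occ.get("OR2") == "repressor":
        prm = "stimulated"    # Repressor at OR2 activates its own promoter
    else:
        prm = "basal"
    return pr_active, prm


if __name__ == "__main__":
    lysogen = {"OR1": "repressor", "OR2": "repressor", "OR3": None}
    lytic = {"OR1": None, "OR2": None, "OR3": "cro"}
    print(switch_state(lysogen))  # (False, 'stimulated')
    print(switch_state(lytic))    # (True, 'off')
```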
The helix-turn-helix (HTH) DNA-binding motif contains two α helices (helices 2 and 3 on this figure) joined by a short linker. The HTH motif is seen in Cro, CAP, and λ repressor. Recognition and binding take place in the major groove. Helix 3 (figure) contributes most to DNA recognition and is called the "recognition helix". The recognition helix binds by a combination of hydrogen bonds and/or van der Waals interactions with the edges of the bases. The other α helix locks the recognition helix into position.
See P22 and lambda repressor PyMOL script
Transcription Factors
A transcription factor is a protein that binds to specific DNA sequences, thereby controlling transcription.
Transcription factors act alone or with other proteins, by promoting (activator) or blocking (repressor) the binding of RNA polymerase to specific genes.
A transcription factor contains one or more DNA-binding domains (DBDs), which bind to specific DNA sequences adjacent to the genes that they regulate.
Transcription factors can be constitutively active or conditionally active.
Stegmaier P, Kel AE, Wingender E (2004) Systematic DNA-Binding Domain Classification of Transcription Factors. Genome Inform 15:276-286.
1 Superclass: Basic Domains (Basic-helix-loop-helix)
1.1 Class: Leucine zipper factors (bZIP)
1.1.1 Family: AP-1(-like) components; includes (c-Fos/c-Jun)
1.1.2 Family: CREB
1.1.3 Family: C/EBP-like factors
1.1.4 Family: bZIP / PAR
1.1.5 Family: Plant G-box binding factors
1.1.6 Family: ZIP only
1.2 Class: Helix-loop-helix factors (bHLH)
1.2.1 Family: Ubiquitous (class A) factors
1.2.2 Family: Myogenic transcription factors (MyoD)
1.2.3 Family: Achaete-Scute
1.2.4 Family: Tal/Twist/Atonal/Hen
1.3 Class: Helix-loop-helix / leucine zipper factors (bHLH-ZIP)
1.3.1 Family: Ubiquitous bHLH-ZIP factors; includes USF (USF1, USF2); SREBP
1.3.2 Family: Cell-cycle controlling factors; includes c-Myc
1.4 Class: NF-1
1.4.1 Family: NF-1 (A, B, C, X)
1.5 Class: RF-X
1.5.1 Family: RF-X (1, 2, 3, 4, 5, ANK)
1.6 Class: bHSH
2 Superclass: Zinc-coordinating DNA-binding domains
2.1 Class: Cys4 zinc finger of nuclear receptor type
2.1.1 Family: Steroid hormone receptors
2.1.2 Family: Thyroid hormone receptor-like factors
2.2 Class: diverse Cys4 zinc fingers
2.2.1 Family: GATA-Factors
2.3 Class: Cys2His2 zinc finger domain
2.3.1 Family: Ubiquitous factors, includes TFIIIA, Sp1
2.3.2 Family: Developmental / cell cycle regulators; includes Krüppel
2.3.4 Family: Large factors with NF-6B-like binding properties
2.4 Class: Cys6 cysteine-zinc cluster
2.5 Class: Zinc fingers of alternating composition
3 Superclass: Helix-turn-helix
3.1 Class: Homeo domain
3.1.1 Family: Homeo domain only; includes Ubx
3.1.2 Family: POU domain factors; includes Oct
3.1.3 Family: Homeo domain with LIM region
3.1.4 Family: homeo domain plus zinc finger motifs
3.2 Class: Paired box
3.2.1 Family: Paired plus homeo domain
3.2.2 Family: Paired domain only
3.3 Class: Fork head / winged helix
3.3.1 Family: Developmental regulators; includes forkhead
3.3.2 Family: Tissue-specific regulators
3.3.3 Family: Cell-cycle controlling factors
3.3.0 Family: Other regulators
3.4 Class: Heat Shock Factors
3.4.1 Family: HSF
3.5 Class: Tryptophan clusters
3.5.1 Family: Myb
3.5.2 Family: Ets-type
3.5.3 Family: Interferon regulatory factors
3.6 Class: TEA (transcriptional enhancer factor) domain
3.6.1 Family: TEA (TEAD1, TEAD2, TEAD3, TEAD4)
4 Superclass: beta-Scaffold Factors with Minor Groove Contacts
4.1 Class: RHR (Rel homology region)
4.1.1 Family: Rel/ankyrin; NF-kappaB
4.1.2 Family: ankyrin only
4.1.3 Family: NFAT (Nuclear Factor of Activated T-cells) (NFATC1, NFATC2, NFATC3)
4.2 Class: STAT
4.2.1 Family: STAT
4.3 Class: p53
4.3.1 Family: p53
4.4 Class: MADS box
4.4.1 Family: Regulators of differentiation; includes (Mef2)
4.4.2 Family: Responders to external signals; SRF (serum response factor)
4.5 Class: beta-Barrel alpha-helix transcription factors
4.6 Class: TATA binding proteins
4.6.1 Family: TBP
4.7.1 Family: SOX genes, SRY
4.7.2 Family: TCF-1 (TCF1)
4.7.3 Family: HMG2-related, SSRP1
4.7.5 Family: MATA
4.8 Class: Heteromeric CCAAT factors
4.8.1 Family: Heteromeric CCAAT factors
4.9 Class: Grainyhead
4.9.1 Family: Grainyhead
4.10 Class: Cold-shock domain factors
4.10.1 Family: csd
4.11 Class: Runt
4.11.1 Family: Runt
0 Superclass: Other Transcription Factors
0.1 Class: Copper fist proteins
0.2 Class: HMGI(Y) (HMGA1)
0.2.1 Family: HMGI(Y)
0.3 Class: Pocket domain
0.4 Class: E1A-like factors
0.5 Class: AP2/EREBP-related factors
0.5.1 Family: AP2
0.5.2 Family: EREBP
0.5.3 Superfamily: AP2/B3
0.5.3.1 Family: ARF
0.5.3.2 Family: ABI
0.5.3.3 Family: RAV
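Because each entry in the classification is identified by a dotted code, the superclass, class, and family of a factor can be recovered simply by taking prefixes of its code. The Python sketch below stores a small excerpt of the list above in a dictionary; the excerpt and the `lineage` helper are illustrative and are not part of the cited classification's own software.

```python
# A small excerpt of the classification above, keyed by its dotted codes.
CLASSIFICATION = {
    "1": "Basic Domains",
    "1.1": "Leucine zipper factors (bZIP)",
    "1.1.2": "CREB",
    "2": "Zinc-coordinating DNA-binding domains",
    "2.3": "Cys2His2 zinc finger domain",
    "2.3.1": "Ubiquitous factors, includes TFIIIA, Sp1",
    "3": "Helix-turn-helix",
    "3.1": "Homeo domain",
    "3.1.2": "POU domain factors; includes Oct",
}


def lineage(code: str) -> list[tuple[str, str]]:
    """Return (code, name) for every level above and including `code`, superclass first."""
    parts = code.split(".")
    prefixes = [".".join(parts[:k]) for k in range(1, len(parts) + 1)]
    return [(p, CLASSIFICATION.get(p, "<not in excerpt>")) for p in prefixes]


if __name__ == "__main__":
    for level in lineage("2.3.1"):
        print(level)
```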
The Basic Leucine Zipper Domain
(bZIP domain)
contains a basic region that mediates DNA binding and a leucine zipper region that mediates dimerization.
[Figures: GCN4, Fos, Jun, and Max bound to DNA; GCN4 amino acid sequence (bZIP protein)]
Show 1YSA in VMD
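A leucine zipper is usually recognized by leucines recurring every seventh residue (one per heptad of the coiled coil). The Python sketch below is a deliberately crude heuristic built on that single rule. The function name, the requirement of four consecutive heptads, and the test peptide are all hypothetical choices, and the sketch does not evaluate helix propensity or the other heptad positions.

```python
def leucine_heptads(seq: str, repeats: int = 4, period: int = 7) -> list[int]:
    """Return 0-based start positions where Leu recurs every `period` residues.

    Crude leucine-zipper heuristic: `repeats` leucines spaced exactly `period`
    apart. Real coiled-coil predictors weigh many more features.
    """
    seq = seq.upper()
    span = period * (repeats - 1)  # distance from the first to the last Leu
    hits = []
    for i in range(len(seq) - span):
        if all(seq[i + k * period] == "L" for k in range(repeats)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical test peptide built from four "LEKKVEE" heptads
    peptide = "GS" + "LEKKVEE" * 4 + "GR"
    print(leucine_heptads(peptide))  # [2]
```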
Basic Helix-Loop-Helix
Two α-helical regions connected by a loop. The smaller helix is the dimerization region. The larger helix contains basic amino acid residues that interact with DNA. bHLH proteins generally bind to a consensus sequence called an E-box, CANNTG.
Homodimer of Max
Max recognizes the same consensus sequence as MyoD (the E-box, CANNTG), but with different amino acids.
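Finding candidate bHLH binding sites amounts to a pattern search for CANNTG. The Python sketch below (the name `find_eboxes` is hypothetical) uses a lookahead regular expression so that overlapping sites are also reported. Because the reverse complement of CANNTG is again CANNTG, scanning one strand finds every site on both strands.

```python
import re

# CANNTG, where N is any base; the lookahead lets overlapping sites match.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(dna: str) -> list[tuple[int, str]]:
    """Return (0-based position, site) for every E-box in the sequence."""
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(dna.upper())]


if __name__ == "__main__":
    print(find_eboxes("ggCACGTGaaCAGCTGtt"))  # [(2, 'CACGTG'), (10, 'CAGCTG')]
```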
Transcription
During transcription, an RNA complement (a transcript) of a DNA sequence is synthesized. If the DNA template (antisense) sequence is 5' ...GGGCATT... 3', then the RNA transcript has the sequence 5' ...AAUGCCC... 3'.
http://biology.unm.edu/ccouncil/Biology_124/Summaries/T&T.html
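The antiparallel base-pairing rule in this example can be written as a short function. In the Python sketch below (the name `transcribe` is hypothetical), the template is given 5' to 3', each base is replaced by its RNA complement, and the result is reversed so that the transcript is also reported 5' to 3'.

```python
# Watson-Crick pairing of each DNA template base with the incoming ribonucleotide
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template_5to3: str) -> str:
    """Return the RNA (5'->3') made from a DNA template strand written 5'->3'.

    The transcript is antiparallel to its template, so each base is
    complemented and the string is then reversed.
    """
    return template_5to3.upper().translate(TEMPLATE_TO_RNA)[::-1]


if __name__ == "__main__":
    print(transcribe("GGGCATT"))  # AAUGCCC
```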
Transcription
A DNA 'transcription unit' can contain:
(1) the sequence that will eventually be translated directly into the protein (the coding sequence);
(2) introns, which will be removed by splicing;
(3) regulatory sequences, which direct and regulate the synthesis of the protein. The untranslated region before (upstream of) the coding sequence is called the 5' UTR, and the one following (downstream of) it is called the 3' UTR. UTRs can contain riboswitches and other regulatory elements.
All RNA is made by transcription. There are many types of RNA produced by transcription.
1)  Messenger RNAs (mRNA) are coding RNAs. mRNAs carry information contained within DNA
to the ribosome, where they direct the sequence of amino acids during protein synthesis, according to
the mRNA sequence and the 'genetic code'. The sequence of codons (nucleotide triplets) in an mRNA
determines the amino acid sequence in a protein. Some mRNAs contain cis regulatory elements, such
as riboswitches, in the untranslated regions (either 3' or 5' UTRs).
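The codon-by-codon reading of an mRNA can be sketched in a few lines. To keep the Python example short, the codon table below contains only a handful of entries from the standard genetic code (a complete table has 64 codons), and the function name `translate` is hypothetical. It reads in frame from the first nucleotide and stops at the first stop codon.

```python
# Small excerpt of the standard genetic code; a full table lists all 64 codons.
CODON_TABLE = {
    "AUG": "M",  # Met, also the usual start codon
    "UUU": "F",  # Phe
    "GGC": "G",  # Gly
    "AAA": "K",  # Lys
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside the excerpt
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGUUUGGCAAAUAA"))  # MFGK
```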
2)  Ribosomal RNAs (rRNA) are structural and catalytic components of the ribosome, the large RNA-protein assembly where protein is synthesized in all living systems. In the ribosome, amino acids are transferred from tRNAs to a nascent (growing) polypeptide chain, with the amino acid sequence controlled by the mRNA. The peptidyl transferase center, which is the catalytic site of the ribosome, consists entirely of rRNA. So technically the ribosome is a ribozyme, not a protein enzyme.
3)  Transfer RNAs (tRNA) are RNAs that become covalently linked to amino acids (activating the amino acids). tRNAs contain anticodons that interact with codons on mRNAs. tRNAs transfer amino acids to a nascent polypeptide chain in the ribosome. The covalent linkage between a given amino acid and the correct (cognate) tRNA is catalyzed by a specific aminoacyl-tRNA synthetase (one for each amino acid). The aminoacyl-tRNA synthetases establish and enforce the genetic code.
4)  Regulatory RNAs
MicroRNAs (miRNAs) are ~22 nucleotides in length and downregulate or silence gene expression (by mRNA degradation and sequestration, and by translational suppression).
CRISPR RNAs work in the prokaryotic immune system.
RNA polymerase (RNAP) is an enzyme that produces RNA using DNA as a template. RNAP is essential to modern life and is found in all living systems. RNAP is a nucleotidyl transferase that adds a ribonucleotide to the 3' hydroxyl group of an RNA molecule. The reaction is driven by the release of pyrophosphate (PPi) and its hydrolysis to Pi + Pi. RNAP can initiate without a primer.
Bacterial RNA polymerase:
[Figure: Taq RNAP core enzyme. α subunits: yellow and green; β: cyan; β': pink; ω: gray]
* α: two α subunits help with assembly and bind with regulatory elements.
* β: the polymerase activity (catalyzes the synthesis of RNA).
* β': binds to DNA (nonspecifically).
* ω: promotes assembly
The RNAP is large (~450 kDa). The core enzyme is α2ββ'ω (5 subunits).
Only one strand of DNA is transcribed (unlike replication). The sense strand has the same sequence as the transcribed RNA (with U in place of T). The antisense strand is the DNA template.
Many bacterial genes are organized in operons, so the transcription of several genes with related functions can be controlled by a single control element. An operon is a cluster of genes under the control of a single regulatory signal or promoter. Eukaryotic genes are (generally) controlled independently.
Steps in transcription
  (1) Initiation
Starts at the +1 position, usually a purine. RNAP does not require a primer. Initiation involves a DNA promoter, transcription factors, DNA helicases, RNAP, activators, and repressors. RNAP binds very tightly to the promoter (KD ≈ 10^-14 M).
A promoter is a DNA sequence from -1 to around -40 (i.e., on the 5' side of the sense strand of the gene). During initiation RNAP binds to the Pribnow box, which is the first region where base pairs are disrupted. The Pribnow box (TATAAT, around -10) is the most conserved part of the promoter.
Expression of various genes (or operons) is controlled by various σ factors that recognize the -10 and -35 regions; the -35 region has the consensus sequence TTGACA. Different σ factors recognize different sequences. The α subunit recognizes an upstream element of the DNA (-40 to -70 base pairs).
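The two promoter boxes can be located with a simple consensus scan. The Python sketch below searches one strand for a TTGACA (-35) box followed, after a spacer of 15 to 19 bp, by a TATAAT (-10) box, allowing one mismatch in each. The spacer range, the mismatch limit, and the function names are hypothetical parameters chosen for illustration; real σ factors tolerate more variation, and other σ factors recognize other consensus sequences.

```python
def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq: str, max_mismatches: int = 1,
                   spacers: range = range(15, 20)) -> list[tuple[int, int]]:
    """Return (start of -35 box, start of -10 box) pairs on the given strand.

    Assumed rule: TTGACA, then a 15-19 bp spacer, then TATAAT, each box
    allowed up to `max_mismatches` differences from the consensus.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        if mismatches(seq[i:i + 6], "TTGACA") > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # candidate start of the -10 box
            if j + 6 <= len(seq) and mismatches(seq[j:j + 6], "TATAAT") <= max_mismatches:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: -35 box, 17 bp GC spacer, -10 box
    promoter = "GC" + "TTGACA" + "GC" * 8 + "G" + "TATAAT" + "GCGC"
    print(find_promoters(promoter))  # [(2, 25)]
```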
Steps in transcription
 (2) Promoter clearance
After initiation, RNAP tends to release truncated RNA transcripts (abortive initiation). Abortive initiation continues until the σ factor rearranges and is released.
Steps in transcription
 (3) Elongation
RNA polymerase traverses the template (antisense) strand and, following the rules of Watson-Crick complementarity with the antisense strand, creates an RNA copy of the sense (coding) strand. Polymerization is processive (without dissociation). Transcripts can be thousands or even millions of nucleotides long. The rate of polymerization is around 50 nucleotides/second, slower than replication. The error rate of transcription is around 1 in 4000. RNA polymerase traverses the template strand from 3' → 5'. Polymerization occurs in the 5' → 3' direction. The resulting RNA transcript is a copy of the sense (coding, non-template) strand, except that thymines are replaced with uracils, and deoxyriboses are replaced by riboses. A second RNAP can quickly reinitiate from the same site.
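The rate and error figures quoted for elongation translate directly into transcription times and expected error counts. The Python sketch below uses exactly those approximate values (50 nucleotides/second and 1 error in 4000) and assumes a constant rate with no pausing; the function names are hypothetical.

```python
RATE_NT_PER_S = 50      # approximate elongation rate quoted in the text
ERROR_RATE = 1 / 4000   # approximate misincorporation frequency quoted in the text


def transcription_time_s(length_nt: int, rate: float = RATE_NT_PER_S) -> float:
    """Seconds for one RNAP to copy `length_nt` nucleotides at a constant rate."""
    return length_nt / rate


def expected_errors(length_nt: int, error_rate: float = ERROR_RATE) -> float:
    """Mean number of misincorporated nucleotides per transcript."""
    return length_nt * error_rate


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        print(f"{n} nt: {transcription_time_s(n):.0f} s, "
              f"{expected_errors(n):.2f} errors expected")
```

For a 1,000-nucleotide transcript this gives 20 s and an average of 0.25 errors; a 1,000,000-nucleotide transcript would take 20,000 s (about 5.6 hours) at the same rate.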
Topology issues during elongation.
[Figure 26-7: topoisomerase I, gyrase, re-initiation]
Steps in transcription
  (4) Termination
Transcription terminates at specific sites. Bacteria use two strategies for termination.
In Rho-independent termination, the newly synthesized RNA molecule forms a GC-rich stem-loop followed by a run of U's (paired with A's in the template). The stem-loop appears to cause RNAP to pause and ultimately to dissociate.
In Rho-dependent termination, a protein, Rho, destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex.
Figure 26-9a: Rho-independent termination at a GC-rich palindrome.
Figure 26-10: Rho
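The description of Rho-independent terminators suggests a simple scanner: look for a GC-rich inverted repeat that folds into a stem-loop, immediately followed by a run of U's. The Python sketch below is a toy version of that idea. The stem length, loop range, GC threshold, U-run length, and function names are all hypothetical parameters. It accepts only perfect Watson-Crick stems (no G-U pairs, no bulges), so it will miss many real terminators.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(s: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return s.translate(COMPLEMENT)[::-1]


def find_terminators(rna: str, stem: int = 6, loops: range = range(3, 9),
                     min_u: int = 4, min_gc: float = 0.6) -> list[tuple[int, int]]:
    """Return (hairpin start, start of U tract) for candidate terminators."""
    rna = rna.upper()
    hits = []
    for i in range(len(rna)):
        left = rna[i:i + stem]               # 5' arm of the candidate stem
        if len(left) < stem:
            break
        if sum(base in "GC" for base in left) / stem < min_gc:
            continue                         # stem not GC-rich enough
        for loop in loops:
            j = i + stem + loop              # start of the 3' arm
            end = j + stem                   # first base after the hairpin
            if end + min_u > len(rna):
                break                        # longer loops will not fit either
            paired = rna[j:end] == reverse_complement_rna(left)
            u_tract = rna[end:end + min_u] == "U" * min_u
            if paired and u_tract:
                hits.append((i, end))
    return hits


if __name__ == "__main__":
    # Hypothetical RNA: GC-rich stem, GAAA loop, then a U tract
    print(find_terminators("CCCGCCGCGGAAACGCGGCUUUUUU"))  # [(3, 19)]
```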
Eukaryotic Transcription:
Multiple RNAPs for different kinds of RNAs. Eukaryotes have more complicated polymerases and control mechanisms: RNAPs plus control proteins add up to ~100 proteins for recognition and initiation.
RNAP I: rRNAs (except 5S rRNA); requires a core promoter and an upstream promoter element (between -107 and -187).
RNAP II: mRNAs; long, diverse promoters. Around ¼ of human genes are regulated by a TATA box (position -27). The TATA box is observed in both archaea and eukarya and is thought to be evolutionarily ancient. Enhancers: regulatory sequences that act from sites remote from the start site.
RNAP III: 5S rRNA, tRNA, small RNAs
Core promoter sequences. A given promoter can contain all, some,
or none of these.
Figure 26-14
Transcription Factors
TBP: (part of TFIID) Figure 26-15
Figure 26-17b
Transcription Factors
Figure 26-17a
yeast RNAP II
  [bacterial homolog: α: yellow and green]
  Rpb2 (forms clamp, wall) [bacterial homolog: β: cyan]
  Rpb1, catalysis [bacterial homolog: β': pink]
  [bacterial homolog: ω: gray]
yeast RNAP II
Rpb1 C-terminal domain: heptapeptide repeats
-(Pro-Thr-Ser-Pro-Ser-Tyr-Ser)n-, with n = 26 (yeast) or 52 (mammals)
unphosphorylated: initiation
phosphorylated: elongation
Figure 26-11
Trapped elongation (minus rU)
RNAP II
Zinc Fingers
- small (~25 aa), independently-folding motifs that coordinate zinc ions with some
combination of cysteine and histidine residues,
-  bind to DNA, RNA, proteins, or small molecules.
-  at least 1000 mammalian zinc finger proteins.
Type by zinc coordination: Cys2His2, Cys4, Cys6, etc.
Type by protein fold: classic zinc finger, treble clef, and zinc ribbon.
Cys2His2 (most frequent)
β-strand – turn – β-strand – turn – α-helix
(F/Y)-X-C-X2-C-X3-(F/Y)-X5-ψ-X2-H-X3-H, where X represents any amino acid and ψ is a hydrophobic residue.
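The consensus pattern can be turned directly into a regular expression. The Python sketch below uses the fixed spacings exactly as written (real fingers allow some variation in loop lengths), and the set of residues accepted at the hydrophobic position is an assumption. The function name and the test sequence are hypothetical; the test sequence is not a real protein.

```python
import re

# Residues accepted at the hydrophobic (psi) position: an assumption for this sketch
HYDROPHOBIC = "AVLIMFWY"

# Fixed-spacing form of (F/Y)-X-C-X2-C-X3-(F/Y)-X5-psi-X2-H-X3-H
C2H2 = re.compile(r"[FY].C.{2}C.{3}[FY].{5}[" + HYDROPHOBIC + r"].{2}H.{3}H")


def find_c2h2(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in C2H2.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical test sequence containing one finger-like segment
    print(find_c2h2("MSFKCAACGKSFSRSDELTRHIRIHTG"))
    # [(2, 'FKCAACGKSFSRSDELTRHIRIH')]
```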
Pavletich NP, Pabo CO (1991) Zinc Finger-DNA Recognition: Crystal Structure of a Zif268-DNA Complex at 2.1 Å. Science 252:809-817.
Amino acid sequence of the zinc finger domains of Zif268
Basic structure predicted by the Klug group from sequence and zinc requirement.
The classic zinc finger motif