Nucleic Acids and Chromatin
John O. Thomas
Objectives:
- Use the principles of nucleic acid biology to be able to select the most appropriate diagnostic
test for your patient and interpret the test results in light of limitations of the test.
- Understand how proteins can interact strongly with specific nucleotide sequences and be able
to apply these general principles to an understanding of gene expression.
- Understand how chromosomal structure and chromatin structure affect gene expression.
- Understand how modification of chromatin structure can lead to epigenetic inheritance.
Supplementary materials: These can be found on the Molecular basis of Medicine web site,
I. Nucleic acid structure
A. Chemical structure and nomenclature of the nucleotides.
1. DNA and RNA are polymers of nucleotides (polynucleotides). Nucleotides contain a
base, a sugar and a phosphate.
a. The base is either a purine (A & G), or a pyrimidine (T & C for DNA or U & C for
RNA). In many cases the bases contain chemical modifications which may affect
their function. Some of these are discussed below and in later lectures.
b. The sugar is either ribose in the case of RNA or 2' deoxyribose in the case of DNA.
The carbons of the sugar are numbered with primes (1' to 5'). The base is
connected to the sugar through an N-glycosidic linkage with the 1' position.
c. A phosphate is joined, through a phosphoester bond, to the 5' position of the sugar.
d. The 2' OH of RNA can, like the serine-OH, function as a catalytic center. Two
important consequences are 1) that RNA is much more susceptible to hydrolysis than
DNA and 2) some RNAs catalyze biologically important reactions.
2. Nucleotides are joined together by phosphodiester bonds.
a. Usually the bonds are between the 5' and 3' positions. Thus, polynucleotides have a
polarity and one can refer to a 5' to 3' direction or a 3' to 5' direction. For linear (as
opposed to circular) polynucleotides, one can also refer to 3' and 5' ends.
b. Other phosphodiester linkages are also possible. For example, a 2'-5'
phosphodiester is formed as an intermediate in RNA splicing, and the RNA cap
structure of eukaryotic mRNA and snRNA contain nucleotides linked 5'-5'. These
will be discussed in the lectures on transcription.
3. The length of a polynucleotide is measured as the number of bases or base pairs (b or
bp). DNAs and RNAs may contain thousands to millions of bases or base pairs, in
which case their sizes are expressed as kilo- or mega- bases or base pairs (kb or Mb).
4. Nucleotide sequences can be written in several ways. Often the nucleotides are
represented by single letters (A,C,G,T or U) denoting the bases and p for phosphates.
Unless otherwise indicated, the sequence is written from 5' to 3' (left to right):
pppApCpGpT (5' triphosphate, 3' OH);
pApCpGpT (5' phosphate, 3'OH);
ApCpGpTp (5' OH, 3' phosphate).
Usually, the presence of the phosphodiesters is not written, as for example: pACGT.
Usually, when writing sequences of double stranded DNAs, the sequence of only one
of the two strands is written with it being understood that the second strand has the
complementary sequence. Unless otherwise noted, the sequence is written in the 5' to 3'
direction:
ACGT refers to the sequence:  5'ACGT3'
                              3'TGCA5'
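The convention above (one strand written, the other implied) can be sketched in a few lines of Python. This is only an illustration; the function name and the example sequence are hypothetical and assume plain A, C, G, T input.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

top = "AACGT"                        # hypothetical example, written 5' to 3'
bottom_5_to_3 = reverse_complement(top)
print("5'" + top + "3'")
print("3'" + bottom_5_to_3[::-1] + "5'")  # same strand, displayed 3' to 5' under the top strand
```

Note that ACGT is its own reverse complement, which is why both strands of the example in the text read ACGT from 5' to 3'.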
II. Some examples of nucleic acids, and their relative sizes (for general information; not to be
learned).
DNA                        Form       Approx. base pairs   Approx. length
E. coli chromosomal DNA    circular   4 x 10^6             1.5 mm
E. coli plasmid            circular   1-200 x 10^3         0.3-70 µm
human genome               linear     3 x 10^9
human chromosome           linear     50-250 x 10^6        1.7-8.5 cm
human mitochondrial DNA    circular   20 x 10^3            7 µm
adenovirus DNA             linear     36 x 10^3            12 µm

RNA                                   Approx. size (kb)
messenger RNA (mRNA)                  ~2
primary mRNA transcript (pre-mRNA)    0.2-30
ribosomal RNA (5S, 5.8S, 18S, 28S)    0.12-5.1
transfer RNA                          0.08-0.1
small nuclear RNA                     0.1-0.2
polio virus RNA (genome)              7.44
HIV RNA                               9.7
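The lengths listed for the DNAs follow from their sizes in base pairs. A minimal sketch, assuming the standard B-form rise of about 0.34 nm per base pair (a value not stated in these notes); the function name is hypothetical.

```python
RISE_NM_PER_BP = 0.34  # assumed axial rise per base pair of B-form DNA

def contour_length_m(base_pairs: float) -> float:
    """Approximate end-to-end length, in meters, of fully extended double-stranded DNA."""
    return base_pairs * RISE_NM_PER_BP * 1e-9

for name, bp in [("E. coli chromosome", 4e6),
                 ("human genome", 3e9),
                 ("adenovirus DNA", 36e3)]:
    print(f"{name}: {contour_length_m(bp):.3g} m")
```

With this assumption the E. coli chromosome comes out at roughly 1.4 mm and adenovirus DNA at roughly 12 µm, in line with the approximate values listed.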
III. DNA based diagnostics.
A. Hybridization
1. The two strands of a DNA double helix can be separated by heating a solution containing
the DNA. The transition from double stranded DNA to single stranded DNA occurs over
a temperature range of a few degrees. The midpoint of this transition is the melting
temperature, abbreviated as Tm.
2. The melting temperature, Tm, is largely dependent on the number of hydrogen bonds that
hold the two DNA strands together.
a. A double helix formed between two strands that are not perfectly complementary in
sequence has a lower Tm than a double helix formed between perfectly complementary
strands
b. DNA with a high G:C content has a higher Tm than DNA with high A:T content.
c. If reagents that disrupt hydrogen bonds (such as urea or formamide) are added to the
solution of the DNA, the Tm is lowered.
d. Lowering the ionic strength of the solution of DNA lowers the Tm. This is because
counterions reduce the repulsive forces between the negatively charged phosphates; with
fewer counterions, the two strands repel each other more strongly.
e. Extremes of pH disrupt the hydrogen bonds and hence convert double stranded
DNA to single strands.
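The dependence of Tm on base composition (point b) can be illustrated for short oligonucleotides with the Wallace rule of thumb, Tm ≈ 2 °C per A:T pair + 4 °C per G:C pair. This rule is not part of these notes; it is a rough approximation for short oligonucleotides under typical hybridization conditions, and the function name is hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Rough melting temperature (degrees C) of a short oligonucleotide duplex.

    Wallace rule of thumb: 2 degrees per A or T, 4 degrees per G or C.
    It ignores salt, mismatches, and denaturants such as formamide.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc

print(wallace_tm("ATATATATAT"))  # A:T-rich 10-mer
print(wallace_tm("GCGCGCGCGC"))  # G:C-rich 10-mer of the same length
```

The G:C-rich oligonucleotide gets the higher estimate, reflecting its three hydrogen bonds per base pair versus two for A:T.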
3. When a solution of denatured DNA is cooled, complementary bases form base pairs.
a. If the solution is cooled very slowly, the original duplex structure will reform since it is
thermodynamically the most stable state.
b. If the solution is cooled rapidly, the original duplex does not reform: base pairing
between short complementary sequences on the same strand takes place before the
complementary strands have a chance to find each other.
[Figure: % double helix (0 to 100) plotted against temperature (°C, 80 to 90) for
heating, slow cooling and rapid cooling; the Tm is marked on the temperature axis.]
Denaturation and renaturation of DNA by heating. The Tm is the temperature at which half of the
nucleotides are base-paired.
4. Hybrid DNA molecules can be formed by denaturing DNA (the target DNA) then
renaturing it in the presence of a single-stranded competing DNA.
a. Typically the target DNA is a chromosome or PCR product (see below).
b. Typically the competing DNA is a synthetic oligonucleotide with a sequence that is
complementary to a specific region of the target DNA. The oligonucleotides are usually
about 20 nucleotides long. Based on probability, oligonucleotides of this length are
likely to hybridize to just one location within the 3 x 10^9 base pairs of the human
genome (there are 4^20, about 10^12, possible 20-nucleotide sequences).
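The probability argument can be checked with a short calculation. This is a sketch under simplifying assumptions that the notes do not state: the genome is treated as a random sequence with equal base frequencies, and only one strand is counted. The function name is hypothetical.

```python
def expected_chance_sites(oligo_length: int, genome_bp: float = 3e9) -> float:
    """Expected number of exact matches to one oligonucleotide in a random genome."""
    possible_sequences = 4 ** oligo_length
    return genome_bp / possible_sequences

for n in (10, 15, 20):
    print(n, 4 ** n, expected_chance_sites(n))
```

Under these assumptions a 15-mer is still expected to match by chance more than once, while a 20-mer is expected to match by chance far less than once, which is why probes and primers of about 20 nucleotides are usually adequate.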
B. Fluorescence In situ hybridization (FISH) is used to detect anomalies in the number or
structure of chromosomes. It is widely used in prenatal diagnosis and in the diagnosis of
cancers. With this technique, individual chromosomes can be identified and the
position(s) of particular DNA sequences on a chromosome can be observed.
1. Procedure:
a. Obtain a probe DNA: a fluorescently labeled DNA that is homologous to the
chromosomal region of interest. FISH probes are typically quite large, limiting
resolution to large genetic changes; point mutations cannot be detected. Fluorescent
DNA probes that are homologous to specific regions of the chromosomes are
commercially available (e.g. see www.vysis.com).
b. Mount chromosomes (or nuclei) on a microscope slide.
c. Denature the DNA (it remains attached to the slide).
d. Renature the DNA in the presence of the probe DNA.
e. Observe the slide by fluorescence microscopy. The probes that are bound to the DNA
will be observed as colored regions.
2. Applications (these will be discussed in greater detail in the cytogenetics lectures):
a. Prenatal diagnosis of disorders such as trisomy 21 (Down syndrome). FISH is
especially useful when a prompt diagnosis is of importance.
b. Diagnosing the presence of deletions, insertions or rearrangements. To be visible by
FISH, these abnormalities must be large, on the order of thousands of base-pairs.
c. Diagnosis of chromosomal abnormalities that are common in some types of cancer.
This is used in diagnosis and to monitor the progress of chemotherapy.
C. Polymerase Chain Reaction (PCR) is an elegant method for amplifying a defined region
of DNA (or RNA). Although it is simple, it is a very powerful tool that is used widely in
diagnostics. Central to the process of PCR is DNA polymerase, the enzyme that
synthesizes DNA. PCR depends on the fact that DNA polymerases are only capable of
elongating a preexisting polynucleotide or oligonucleotide chain. They cannot initiate
polymerization. An oligonucleotide that serves as the starting point for elongation is
referred to as a primer. (The biological functions of the DNA polymerases will be
discussed in detail in lectures on DNA synthesis and repair.)
Steps in the PCR procedure:
1. Set up:
Step 1. Purchase two synthetic oligonucleotide primers with sequences such that:
- The primers will hybridize to sequences that flank the region to be amplified
- The primers will hybridize with opposite strands.
Oligonucleotides of any desired sequence can be automatically synthesized by
machines. A primer is typically about 20 nucleotides long, so that its sequence is
likely to be present at only one location in the entire genome.
Step 2. Mix the DNA to be amplified with a large molar excess of the primers, the four
deoxynucleoside triphosphates, heat-stable DNA polymerase and buffer.
2. Steps of an automated process. The following steps are repeated many times. For n cycles,
a 2^n-fold amplification of the DNA will result; 30 cycles will produce about a billion-fold
amplification (2^30 is about 1.07 x 10^9).
Step 1. The DNA strands are separated by heating
Step 2. The solution is cooled to allow the primers to hybridize with the DNA
Step 3. The heat stable DNA polymerase synthesizes complementary strands by
extending the primers.
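The amplification arithmetic can be sketched as follows. The `efficiency` parameter is an illustrative assumption that is not part of these notes: it stands for the fraction of templates copied in each cycle, with 1.0 giving the ideal doubling described above.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold increase in target DNA after a number of PCR cycles.

    efficiency = 1.0 is ideal doubling (2**cycles); lower values model
    cycles in which only a fraction of the templates is copied.
    """
    return (1.0 + efficiency) ** cycles

print(fold_amplification(30))        # ideal case: 2**30
print(fold_amplification(30, 0.9))   # hypothetical 90% efficiency per cycle
```

Even a modest drop in per-cycle efficiency reduces the final yield substantially, because the loss compounds over every cycle.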
3. Steps in the automated process:

First step 1: unwind by heating
    5'....__________________________________....3'
                        +
    3'....__________________________________....5'

First step 2: hybridize primer by cooling
    5'....__________________________________....3'
                                   primerB5'
                        +
          5'primerA
    3'....__________________________________....5'

First step 3: polymerase extends primer
    5'....__________________________________....3'
    3'....___________________________ primerB5'
                        +
          5'primerA_____________________________....3'
    3'....__________________________________....5'

Second step 1: again unwind by heating
    5'....__________________________________....3'
                        +
    3'...._______________________5'
                        +
          5'______________________________....3'
                        +
    3'....__________________________________....5'

After nth step 3 this will be the major product
          5'primerA __________________ 3'
          3'___________________ primerB5'
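The geometry in the diagram (two primers on opposite strands, flanking the target) can be sketched as a small "in silico PCR". This is a minimal illustration that assumes exact primer matches on a single template written 5' to 3'; the function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def amplicon(template: str, primer_a: str, primer_b: str):
    """Return the region bounded by the two primers, or None if either fails to anneal.

    template : top strand, written 5' to 3'
    primer_a : identical to the top strand at the left end of the target
    primer_b : complementary to the top strand at the right end of the target
    """
    start = template.find(primer_a)
    if start == -1:
        return None
    site_b = reverse_complement(primer_b)   # where primer B pairs on the top strand
    end = template.find(site_b, start + len(primer_a))
    if end == -1:
        return None
    return template[start:end + len(site_b)]

template = "TTTTACGTACGGGGGGCATTGAAAAA"      # hypothetical target DNA
print(amplicon(template, "ACGTAC", "TCAATG"))
```

The returned sequence begins with primer A and ends with the complement of primer B, matching the major product shown at the end of the diagram.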
D. Electrophoresis is commonly used to separate DNAs and RNAs on the basis of their size
and/or shape. Nucleic acids ranging from mononucleotides to entire chromosomes can be
analyzed by electrophoresis.
1. The electrophoretic mobility is dependent on:
a. Size of the nucleic acid.
b. Conformation (single stranded, double stranded, circular, supercoiled).
c. Conditions of electrophoresis (e.g. porosity of the media; polyacrylamide is used for
oligonucleotides and DNAs up to about 500 bp, agarose is used for larger DNAs.
Specialized electrophoretic methods can be used to separate very big molecules such as
chromosomes).
2. Example: detection of the cystic fibrosis ∆F508 allele in heterozygous or homozygous
patients.
E. Blotting. By combining electrophoresis with hybridization, it is possible to identify one
particular DNA in a complex mixture of DNAs.
1. DNA that is present in the electrophoresis gel is transferred to the surface of a paper-like
substrate to which it binds. This is easy to handle and allows oligonucleotides and other
chemicals to have easy access to the DNA. An oligonucleotide that is complementary to a
sequence of interest is then added and hybridized (denaturation and renaturation) with the
targeted DNA sequence. The oligonucleotide, referred to as a probe, is long enough
(typically 18-20 nucleotides) so that its sequence is likely to be present at only one
location in the entire genome. The oligonucleotide is labeled (by radioactivity or color)
so that it can be detected.
2. If DNA is the molecule that has been electrophoresed and transferred to the paper, the
blotting and hybridization procedure is known as a Southern Blot.
3. If RNA is electrophoresed and transferred to the paper, the procedure is known as a
northern blot.
4. When proteins are separated by electrophoresis, transferred to paper, and detected by
reaction with a specific antibody, the procedure is known as a western blot (discussed in
the Proteomics lectures).
F. Allele Specific Oligonucleotides (ASOs) are used in conjunction with PCR to
specifically detect a particular allele.
1. An ASO is an oligonucleotide, typically about 18 nucleotides long, with a sequence that is
complementary to the DNA sequence of the allele to be detected.
a. ASOs are usually used in pairs, one ASO being complementary to the normal allele,
and the other being complementary to the variant allele that one wishes to detect.
ASOs of any sequence can be synthesized and labeled with radioactivity or a chemical
tag so that they can be detected.
b. A region surrounding and including the mutation to be analyzed is amplified by PCR to
provide sufficient DNA for the test.
c. Each allele is detected by hybridizing the PCR-amplified DNA with the ASO for the
allele of interest under conditions such that the ASO binds only to a perfectly
complementary sequence, but not to a sequence with a mismatched base pair (high
stringency hybridization).
2. Example: Detection of a cystic fibrosis point mutation (also see the example in the
“courseware” section of the course web site).
3. A serious drawback to the use of ASOs is that each ASO will detect only one allele. The
suspected disorder will be missed if it is due to a mutation that is different from the
specific mutations that are examined.
a. For diseases due to genes that have one or only a few alleles in the population, ASO
testing can provide a powerful screening method (for cystic fibrosis screening, the
American College of Medical Genetics recommends a panel of 25 ASOs
corresponding to the 25 most common mutations).
b. For diseases due to genes that have many alleles (such as familial
hypercholesterolemia), screening by ASO testing is not practical.
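The logic of an ASO pair, including the drawback described in point 3, can be sketched as below. High stringency is modeled simply as a requirement for an exact complementary match. The sequences are invented toy examples, not real gene sequences, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def aso_binds(target: str, aso: str) -> bool:
    """High-stringency model: the ASO binds only a perfectly complementary site."""
    return reverse_complement(aso) in target

def genotype(alleles, normal_aso: str, variant_aso: str) -> str:
    """Interpret the signals from a normal/variant ASO pair on PCR-amplified alleles."""
    normal_signal = any(aso_binds(a, normal_aso) for a in alleles)
    variant_signal = any(aso_binds(a, variant_aso) for a in alleles)
    if normal_signal and variant_signal:
        return "heterozygous"
    if variant_signal:
        return "homozygous variant"
    if normal_signal:
        return "normal at this site"
    return "no signal"

# Toy amplified alleles differing at one base (A versus T).
NORMAL = "GGATCCAAGTCTTGA"
VARIANT = "GGATCCATGTCTTGA"
NORMAL_ASO = reverse_complement("TCCAAGTCT")
VARIANT_ASO = reverse_complement("TCCATGTCT")

print(genotype([NORMAL, VARIANT], NORMAL_ASO, VARIANT_ASO))
```

An allele carrying a different change at the same site binds neither ASO, and a mutation outside the probed site still reads as "normal at this site"; both cases illustrate why an ASO panel detects only the mutations it was designed for.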
G. DNA arrays are currently used primarily for analyzing patterns of gene expression. For
example, the amount of each of thousands of specific mRNAs that are made by a cancer cell
can be compared to the amount made by a normal cell.
1. DNA arrays contain thousands of DNA sequences mounted on a substrate (such as a
microscope slide or silicon chip). The DNA sequences can be in the form of:
a. Small dots of cDNA clones of known genes attached to a microscope slide.
b. Oligonucleotides that are synthesized directly on a silicon matrix (about 300,000
sequences on a 1.28 X 1.28 cm array).
2. How DNA arrays are used for analysis of gene expression is illustrated in the following
figure of a DNA array containing ten DNAs. In practice, a DNA array would contain
thousands of DNAs. mRNA is isolated from a normal cell type and then converted to
fluorescently labeled cDNA (several enzymatic steps). mRNA from the cell type to be
compared to normal is isolated and converted to cDNA of a contrasting fluorescent
color. The two cDNAs are mixed together and hybridized to the DNAs on the matrix. Each
spot on the matrix corresponds to a particular mRNA. The resulting fluorescent color of
the spot reflects the relative concentrations of that mRNA that are present in the normal
cell vs variant cell.
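[Figure: analysis of gene expression with a DNA array. A biopsy provides normal cells and
tumor cells. mRNA from the normal cells is converted to red cDNA, and mRNA from the
tumor cells is converted to green cDNA. The two cDNAs are mixed and hybridized with the
DNA array.]
Results:
Red spots: genes that are underexpressed in the tumor.
Yellow spots (most of them): equally expressed genes.
Green spots: genes that are overexpressed in the tumor.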
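The interpretation of spot colors can be sketched as a comparison of the two fluorescence intensities at each spot. The twofold cutoff (a log2 ratio of 1) is an illustrative assumption, not a value from these notes, and the function and parameter names are hypothetical.

```python
import math

def classify_spot(normal_red: float, tumor_green: float, threshold: float = 1.0) -> str:
    """Classify one array spot from its red (normal) and green (tumor) intensities.

    threshold is the assumed log2 fold change needed to call a difference.
    """
    log_ratio = math.log2(tumor_green / normal_red)
    if log_ratio >= threshold:
        return "green: overexpressed in the tumor"
    if log_ratio <= -threshold:
        return "red: underexpressed in the tumor"
    return "yellow: similar expression"

for red, green in [(100, 400), (400, 100), (100, 110)]:
    print(red, green, classify_spot(red, green))
```

Using a log ratio treats a fourfold increase and a fourfold decrease symmetrically, which is why array data are usually compared on a log scale.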
V. Nucleic acid - protein interactions. A large number of proteins interact with nucleic acids.
These interactions are essential for the proper expression of the information that is encoded by
DNA and mRNA and for the functions of other RNAs such as rRNA, tRNA and snRNA.
A. Some proteins bind to DNA and RNA with little sequence specificity. Proteins such as the
histones and viral nucleic acid packaging proteins function to condense or package DNA.
Proteins such as the single stranded DNA binding proteins that are involved in DNA
synthesis and in recombination also interact with little sequence specificity.
B. Some proteins recognize DNA or RNA sequences with a high degree of specificity.
Examples of these are proteins that control the expression of genes by binding to specific
DNA sequences. As an example of specificity, the E. coli lac repressor protein binds to a
28 bp DNA sequence that must be distinguished from the other four million base pairs of
E. coli DNA. It does this by binding 4 million times more strongly to its target sequence than to
any other region of the DNA. The dissociation constant of the complex is about 10^-13 M.
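The numbers quoted for the lac repressor can be converted to free energies with the relation ΔG° = RT ln(Kd). This is a sketch under assumptions not stated in these notes: a temperature of 25 °C and a 1 M standard state. The function names are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol K)

def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard free energy of binding (kcal/mol) from a dissociation constant."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_molar)

def specificity_energy(affinity_ratio: float, temp_k: float = 298.15) -> float:
    """Free energy difference (kcal/mol) corresponding to a ratio of binding strengths."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(affinity_ratio)

print(round(binding_free_energy(1e-13), 1))   # binding to the target sequence
print(round(specificity_energy(4e6), 1))      # target versus other DNA
```

Under these assumptions the complex is stabilized by roughly 18 kcal/mol, of which roughly 9 kcal/mol distinguishes the target sequence from other DNA. That difference is on the scale of a handful of hydrogen bonds, consistent with the recognition patterns described in the next section.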
C. The structures of many DNA-protein complexes have been determined by x-ray
crystallography, and several general patterns for how proteins interact with nucleic acids
have emerged from these studies. You may view and manipulate 3-D models of some of
these structures in the tutorial in the “courseware” section of the course web site.
1. Ionic interactions with the phosphate backbone add stability to the DNA-protein
complex, but do not confer specificity.
2. Specificity in DNA-protein interactions is usually achieved by recognizing
combinations of sequence specific atoms present in the major groove. The minor
groove is too small to allow for the recognition of base-specific features by most
proteins. Some proteins, however, have the ability to enlarge the minor groove and
bind to it by causing a bend in the DNA.
You should be able to identify the major groove and the minor groove in the above
picture and explain why the two grooves are different sizes (observe the positions of
the N-glycosidic bonds in the A:T and C:G pairs shown in the figure below). You
should also be able to identify the phosphates, sugars and bases.
3. When DNA is viewed from the major groove, each of the four base pairs offers a
different combination of hydrogen bond acceptors, hydrogen bond donors, and van der
Waals interactions. Proteins interact with these sequence-specific features when they
bind to specific DNA sequences. Similar principles apply for proteins that bind in the
minor groove.
[Figure: chemical structures of the A:T and G:C base pairs, with the major groove and
minor groove sides of each pair labeled.]
Arrangement of potential H-bond donor (↑) and acceptor (↓) groups and Van der
Waals interactions (VdW) in the major groove of A:T and G:C base pairs. You
should identify the N-glycosidic linkages in the above pictures and note the relative
positions of the sugars with respect to the major and minor grooves.
4. Amino acid side chains can form hydrogen bonds with nucleotides. In a typical
DNA-binding protein, several amino acid side chains are spatially oriented so that many
hydrogen bonds form with bases. The following are a few examples of how amino acid
side chains can interact specifically with bases by binding to sites in the major groove.
[Figure: chemical structures showing hydrogen bonds between amino acid side chains
(asparagine, arginine, glutamate) and bases (deoxyguanosine, deoxyadenosine).]
Examples of how proteins can recognize specific DNA sequences by binding to groups that
extend into the major groove.
5. Several common motifs have been found in proteins that interact with DNA. Examples
in the form of 3-dimensional molecular models that can be manipulated are presented at
our web site. Examples are also shown in most Biochemistry texts.
a. In a "helix-turn-helix" protein, the amino acid side chains that interact with DNA
are located on an alpha helix that fits into the major groove of DNA. The
recognition helix is held in position by interactions with a second helix connected to
the recognition helix by a stretch of relatively unstructured peptide that forms the
"turn" in the name helix-turn-helix.
b. Proteins containing "zinc fingers" are another important class of DNA binding
proteins. Cysteines coordinated with zinc atoms orient a recognition helix so that it
will fit into the major groove of the DNA. The “nuclear receptor” class of
transcription factors are zinc finger proteins. The nuclear receptors include the
glucocorticoid receptor, estrogen receptor, vitamin D receptor, thyroid hormone
receptor, and several other receptors that will play prominent roles in this course.
c. The "leucine zipper" motif is found in several important transcription factors
involved in growth control. These proteins contain a long helical section. Part of
the helix fits into the major groove of the target DNA, and part of the helix forms a
dimerization domain where every seventh amino acid is a leucine. The
dimerization domain looks like a zipper hence the name leucine zipper.
IV. Eukaryotic Chromosome structure. The human genome includes both nuclear
chromosomes, which are large (50-250 Mb) linear DNAs, and mitochondrial DNA, which is
much smaller (about 20 kb) and is circular. The following topics refer to the nuclear
chromosomes.
A. Each chromosome contains specialized structures required for their replication and
segregation during mitosis and meiosis.
1. The centromeres are regions where sister chromatids associate during mitosis. They
are involved in mitotic spindle formation and are required for the proper segregation of
the chromosomes to daughter cells.
2. Telomeres are located at the ends of chromosomes and are required for the completion
of DNA synthesis.
3. Each chromosome must contain at least one origin of replication; most chromosomes
contain multiple origins.
B. Most (about 80 - 90%) of the DNA in the genome is present in introns and in the noncoding regions between genes. Although most of this DNA has no known function, it
contains sequences that are very useful as genetic markers.
1. One particularly useful class of sequences is known as Simple Sequence Repeats or
SSRs (also referred to as microsatellites or Short Tandem Repeat polymorphisms, STRs).
a. An SSR consists of a repeating short DNA sequence (most often a di-, tri- or
tetranucleotide repeat), for example (CG)n.
b. SSRs occur frequently (there are tens of thousands of them in the human genome)
and are distributed rather uniformly over the entire genome.
c. A repeat such as (CG)n may occur multiple times throughout the genome. One
specific instance of the repeat can be identified by the unique DNA sequences that
flank it.
d. SSRs are polymorphic. That is, if a population is analyzed for the length of a
particular SSR, a number of different repeat lengths will be observed.
e. The length of an SSR is an inheritable trait. Normally, a person has two copies of a
particular SSR: one from the person's mother and one from the father. The two
copies may be the same length or they may be different lengths.
f. A specific SSR can be isolated from an individual’s DNA by PCR amplification
with primers that are complementary to the DNA sequences that flank the SSR.
Electrophoresis of the PCR product will reveal the length(s) of the SSR that the
individual possesses.
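The idea that one instance of a repeat is identified by its unique flanking sequences, and that its length is what varies between alleles, can be sketched as follows. The flanking sequences and alleles are invented toy examples, and the function name is hypothetical.

```python
import re

def ssr_repeat_count(allele: str, left_flank: str, right_flank: str, unit: str):
    """Count the repeat units lying between two unique flanking sequences.

    Returns None if the flanks with an uninterrupted repeat between them are not found.
    """
    pattern = (re.escape(left_flank)
               + "((?:" + re.escape(unit) + ")+)"
               + re.escape(right_flank))
    match = re.search(pattern, allele)
    if match is None:
        return None
    return len(match.group(1)) // len(unit)

LEFT, RIGHT = "AATTG", "TTACA"          # hypothetical unique flanking sequences
maternal = LEFT + "CG" * 7 + RIGHT      # (CG)7 allele
paternal = LEFT + "CG" * 10 + RIGHT     # (CG)10 allele

print(ssr_repeat_count(maternal, LEFT, RIGHT, "CG"),
      ssr_repeat_count(paternal, LEFT, RIGHT, "CG"))
```

In the laboratory the same information is read from the sizes of the PCR products on a gel: here the two alleles differ by three repeat units, so their PCR products would differ in length by 6 bp.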
2. Microsatellite instability and cancer.
Microsatellite is a term commonly used in tumor biology to describe an SSR (the
term microsatellite is derived from experimental observations made in the early days of
chromatin research). In some types of cancer, the lengths of microsatellites
throughout the genome differ from those found in the patient’s normal tissue. The
altered microsatellite lengths can be attributed to improperly repaired DNA in one or a
few cells that then undergo clonal expansion to form the tumor. The degree of
microsatellite instability (the number of microsatellites with altered lengths) in a tumor
may be of importance in diagnosis and for determining the best course of treatment
(Hampel et al N Engl J Med. 2005 352(18):1851-60). Microsatellite instability is a
commonly seen feature of nonpolyposis colorectal cancer, where it is usually attributed
to the loss of the DNA repair enzyme hMLH1, which will be discussed in the lectures
on DNA replication and repair.
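The degree of microsatellite instability, defined above as the number of microsatellites with altered lengths, can be sketched as a comparison of tumor and normal tissue from the same patient. The marker names and repeat lengths are invented for illustration, and no diagnostic cutoff is implied.

```python
def unstable_fraction(normal: dict, tumor: dict) -> float:
    """Fraction of shared markers whose allele lengths differ between tumor and normal tissue.

    Each dict maps a marker name to the pair of repeat lengths observed for it.
    """
    shared = [marker for marker in normal if marker in tumor]
    if not shared:
        raise ValueError("no markers were typed in both samples")
    changed = sum(1 for marker in shared
                  if sorted(normal[marker]) != sorted(tumor[marker]))
    return changed / len(shared)

# Hypothetical repeat lengths (two alleles per marker).
normal_tissue = {"m1": (12, 14), "m2": (20, 20), "m3": (9, 11), "m4": (15, 15)}
tumor_tissue = {"m1": (12, 14), "m2": (18, 20), "m3": (9, 11), "m4": (13, 15)}

print(unstable_fraction(normal_tissue, tumor_tissue))
```

Comparing against the patient's own normal tissue matters because microsatellite lengths are polymorphic in the population; only a difference within one individual indicates instability.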
3. Another type of polymorphism that is becoming increasingly popular as a genetic tool is
the single nucleotide polymorphism (SNP - pronounced "snip"). The polymorphisms
that are observed (usually by sequencing) are differences between individuals in the
nucleotides found at particular locations in the genome. SNPs occur much more
frequently in the genome than SSRs. On average there is about one SNP per kb.
VI. Chromatin
A. In the nucleus, DNA is found associated with a large number of proteins to form
chromatin. The packaging of DNA into chromatin serves two main functions: 1) the
physical packaging of the chromosome into an ordered and untangled structure that can be
replicated and segregated to daughter cells, and 2) the regulation of gene transcription.
B. The nucleosome is the fundamental packaging unit of chromatin.
1. Histones H2A, H2B, H3 and H4 (two of each) form an octamer.
2. 146 bp of DNA is wrapped, in about 1.75 turns, around this histone octamer.
3. There are about 50 bp of DNA between nucleosomes (amount is variable).
[Figure: the nucleosome. 146 bp of DNA wrapped 1.75 turns around an octamer of histones.
The histone N-terminal tails are toward the surfaces of the nucleosome. Modifications of
the histone tails affect nucleosome and chromatin structure.]
4. The amino terminal regions of the histones are positively charged, and are located
toward the surfaces of the nucleosome. They likely play important roles in
maintaining the nucleosome structure and in directing the interactions between
nucleosomes that are responsible for higher-order chromatin structures. These
N-terminal tails are subject to a number of modifications including acetylation of lysines,
methylation of lysines and arginines and phosphorylation of serines. These
modifications can result in profound effects on chromatin structure and the expression
of the packaged genes.
5. In humans, adjacent nucleosomes are separated by about 50 bp of DNA. When viewed
by electron microscopy under partially denaturing conditions the nucleosomes appear
as a 10 nm thick filament resembling beads on a string.
6. In some regions, the position of the nucleosomes on the DNA can be critical since they
may mask important DNA sequences. The nucleosomes can be moved by "chromatin
remodeling enzymes". These are multi-subunit complexes that are highly regulated and
require the hydrolysis of ATP.
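The figures given above (146 bp per nucleosome core plus about 50 bp of linker) allow a rough estimate of how many nucleosomes are needed to package a given DNA. This is a sketch that assumes uniform spacing along the whole molecule, which the notes say is variable; the function name is hypothetical.

```python
CORE_BP = 146    # DNA wrapped around one histone octamer
LINKER_BP = 50   # approximate DNA between adjacent nucleosomes

def nucleosome_count(dna_bp: int, core_bp: int = CORE_BP, linker_bp: int = LINKER_BP) -> int:
    """Approximate number of nucleosomes on a DNA of the given length."""
    return dna_bp // (core_bp + linker_bp)

print(nucleosome_count(3_000_000_000))   # one copy of the human genome
print(nucleosome_count(250_000_000))     # a large human chromosome
```

With these numbers one copy of the human genome corresponds to roughly 15 million nucleosomes, each containing eight histone molecules.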
C. In the nucleus, chromatin is separated into regions of highly condensed and less condensed
chromatin that can be distinguished microscopically. The highly condensed regions are
referred to as heterochromatin; the less condensed regions as euchromatin.
Functionally, DNA sequences that are condensed into heterochromatin are, for the most
part, not transcribed into RNA.
1. In heterochromatin, the nucleosomes are further condensed into a 30 nm thick fiber. The
structure of the 30 nm fiber is not currently clear. One model suggests that it consists of
a helix of nucleosomes.
[Figure: the 10 nm fiber and the 30 nm fiber.]
2. The compact higher order packing of nucleosomes in heterochromatin is associated with
transcriptional inactivity, and is an important mechanism for regulating the expression
of genes. A number of factors are involved in the condensation of chromatin into the
30 nm fiber, including histone modification, DNA methylation, and the binding of
histone H1.
3. Regions of a chromosome that are highly condensed are separated from regions that are
less highly condensed by short DNA sequences referred to as insulators. Insulator
function is mediated by specific insulator binding proteins (Gaszner & Felsenfeld Nat
Rev Genet. 2006 7(9):703-13.)
D. Higher order chromatin structures
1. In the nucleus, chromatin is organized into large loops of about 20-100kb. These loops
may function to delineate regions of more or less highly condensed chromatin structure,
and hence regions that are more or less available for transcription into RNA.
2. During mitosis, each chromosome is extremely condensed, with the loops of chromatin
being further compacted into a structure organized around a protein scaffold to form the
mitotic chromosomes that can be viewed by light microscopy.
VII. Chromatin structure is critically important for the regulation of gene expression.
A. Three broad classes of chromatin structure can be distinguished according to differences
in ability to be transcribed, differences in structure, and differences in histological
appearance. Chromatin structures are formed during stem cell differentiation.
1. Repressed chromatin is chromatin that will never be transcribed in a particular cell
line. It tends to be packaged into a compact chromatin structure, is methylated at many
CpG sequences, and contains histone H3 that is methylated on specific lysines including
lysine 27. It is seen in the nucleus as heterochromatin.
2. Potentially active chromatin is not currently transcribed, but may be transcribed in the
future in a particular cell line. Transcriptional potential can be transferred to daughter
cells. Potentially active chromatin tends to be undermethylated at CpG sequences, and
its nucleosomes contain histone H3 that is methylated on several specific lysines,
including lysine 4.
3. Active chromatin is being actively transcribed. The chromatin structure of the
promoter regions is loosely folded, and histones in the promoter regions are modified
by acetylation and other modifications.
B. Chromatin structure is determined by many effectors. Three particularly important ones
are histone methylation, histone acetylation, and DNA methylation at CpG dinucleotides.
Other factors are also associated with transcriptional activity including other
modifications of histones, and the binding of a number of nonhistone proteins.
1. Histone methylation
a. Several lysine and arginine residues in the amino-terminal tails of histones can be
modified by adding methyl groups (one, two or three on a lysine; one or two on an
arginine). The methylation of specific lysines provides signals for determining
chromatin conformation.
b. One example of the role of histone H3 methylation is in chromatin formation and gene
expression during differentiation of embryonic stem cells.
i. In embryonic stem cells, many genes are associated with histone H3 that carries a di-
or tri-methyl group on lysine 4 (an H3K4 mark) and/or on lysine 27 (an H3K27 mark).
ii. Genes that are condensed into chromatin containing an H3K27 mark tend to be
transcriptionally repressed.
iii. During differentiation of embryonic stem cells, some genes lose the H3K27 marks
but retain the H3K4 marks. These genes tend to be expressed in the differentiated
cell. Other genes lose the H3K4 marks but retain the H3K27 marks. These genes
tend to be repressed in the differentiated cell.
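The tendencies in items i-iii can be restated as a small lookup. The sketch below is only a toy restatement of the notes, not a predictive model: the names `ES_CELL_MARKS`, `tendency` and `lose_mark` are hypothetical, and a gene is reduced to the set of H3 marks it carries.

```python
# Toy restatement of items i-iii. All names are hypothetical; a gene is
# represented only by the set of histone H3 methylation marks it carries.
ES_CELL_MARKS = frozenset({"H3K4", "H3K27"})  # a gene carrying both marks


def tendency(marks):
    """Return the transcriptional tendency the notes associate with the marks."""
    if "H3K27" in marks:
        # Item ii: chromatin containing an H3K27 mark tends to be repressed.
        return "repressed"
    if "H3K4" in marks:
        # Item iii: genes that keep only H3K4 tend to be expressed.
        return "expressed"
    return "not covered by the notes"


def lose_mark(marks, mark):
    """Return the marks that remain after one mark is lost in differentiation."""
    return frozenset(marks) - {mark}


gene_a = lose_mark(ES_CELL_MARKS, "H3K27")  # retains H3K4
gene_b = lose_mark(ES_CELL_MARKS, "H3K4")   # retains H3K27

print(tendency(gene_a))  # expressed
print(tendency(gene_b))  # repressed
```

These are tendencies ("tend to be"), so the function returns the expected direction rather than a certain outcome.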
2. Histone acetylation is a key mechanism for regulating transcription.
a. Lysines near the N-terminus of the core histones (H2A, H2B, H3 and H4) are subject to
acetylation. Histone acetylation results in a less condensed chromatin structure;
deacetylation favors condensation into the 30 nm form. Histone acetylation is usually
required for active transcription.
b. The cell contains many different histone acetyltransferases (HATs) and histone
deacetylases (HDACs) that function in conjunction with other proteins (such as
transcription factors and 5-methyl-C binding proteins) that target their activities to
specific regions of the chromosome in response to specific cellular and developmental
conditions. HATs and HDACs will be discussed in detail in the gene expression
lectures.
3. DNA methylation at CpG dinucleotides
a. DNA is methylated on the 5 position of C at some CpG sequences. In humans, DNA
methylation at CpG is important for X-inactivation, imprinting, and determining the
pattern of gene expression during cellular differentiation.
[Figure: chemical structures of deoxycytidine, deoxy 5-methyl cytidine, and thymidine]
The structures of deoxycytidine and deoxy 5-methyl cytidine. Note that 5-methyl
cytidine can be converted to thymidine by deamination (a non-enzymatic
reaction). This is an important mutagenic event.
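Why the deamination is mutagenic can be followed at the sequence level. The sketch below is a minimal illustration under stated assumptions: a strand is a plain string, methylated positions are given as a set of indices, and the helper names `deaminate_5mc` and `complement` are hypothetical. It shows that once 5-methyl-C has become T, a strand copied from the altered template carries A where G used to be.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def deaminate_5mc(strand, methylated, position):
    """Return the strand after deamination of the 5-methyl-C at `position`."""
    if strand[position] != "C" or position not in methylated:
        raise ValueError("position does not hold a 5-methyl-C")
    # Deamination converts 5-methyl-C into T, an ordinary DNA base.
    return strand[:position] + "T" + strand[position + 1:]


def complement(strand):
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


original = "ACGT"  # the C at index 1 belongs to a CpG and is methylated
mutated = deaminate_5mc(original, methylated={1}, position=1)

print(complement(original))  # TGCA
print(complement(mutated))   # TACA
```

The example strand and the index convention are illustrative only; the point is that the change is copied faithfully at the next replication.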
b. Highly CpG-methylated DNA induces the formation of heterochromatin; unmethylated
DNA induces the formation of euchromatin. One mechanism that couples CpG
methylation to gene expression is as follows. A complex containing a 5-methyl-C binding
protein and a histone deacetylase binds to 5-methyl-C and deacetylates neighboring
histones. The deacetylation of the histones condenses the chromatin structure in the
region near the methylated CpG DNA. The condensed chromatin structure blocks
transcription. Other modifications to nucleosomes near methylated DNA such as
histone methylation are also likely to play a role in determining the local chromatin
structure and hence gene activity.
c. The dinucleotide CpG and its methylation are not randomly distributed
throughout the genome.
i. Intergenic regions have far less CpG than expected by chance, and the CpG is usually
methylated.
ii. The 5’ ends of genes are often enriched in the dinucleotide sequence CpG. These are
referred to as “CpG islands”. The state of methylation of CpG islands is correlated
with gene activity: genes with highly methylated CpG islands are silenced.
iii. In differentiated cells, the methylation pattern is an important determinant of the set
of genes that can be expressed by that cell type. Undifferentiated cells (such as
embryonic stem cells) have a low level of CpG methylation.
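"Less CpG than expected by chance" can be made concrete with an observed/expected ratio. The sketch below assumes the commonly used approximation that, if C and G occurred independently, a sequence of length N would contain about (number of C x number of G) / N CpG dinucleotides. The function name `cpg_obs_exp` and the example sequences are hypothetical, and no threshold for calling a CpG island is implied.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a single DNA strand.

    Expected CpG count is approximated as count(C) * count(G) / N,
    i.e. what independent placement of C and G would give.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap with itself
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * n / (c_count * g_count)


print(cpg_obs_exp("GCGCGCGC"))  # 1.5, CpG-rich toy sequence
print(cpg_obs_exp("CAGTCAGT"))  # 0.0, contains C and G but no CpG
```

A ratio well below 1 corresponds to the CpG depletion described for intergenic regions, and a ratio near or above 1 to the enrichment seen at the 5' ends of genes.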
Untranscribed genes
Repressed chromatin
--Methylated DNA
--H3K27 mark
--Deacetylated histones

Untranscribed genes
Potentially active chromatin
--Unmethylated DNA
--H3K4 mark
--Deacetylated histones

Transcribed genes
Noncompact chromatin
--Acetylated histones
DNA and histone modifications in determining chromatin structure and gene regulation
d. The pattern of DNA methylation can be transferred from one cell to its daughter
cells. This is an example of epigenetic inheritance.
                  ...CmG ...
                  ...G Cm...
                      |
               DNA replication
                      |
     ...CmG ...       +       ...C G ...
     ...G C ...               ...G Cm...
                      |
           CmG specific methylation
                      |
     ...CmG ...       +       ...CmG ...
     ...G Cm...               ...G Cm...
A specific pattern of methylation can be transferred to daughter cells. Since the methylation
pattern is an important determinant of a cell's phenotype, this is an example of epigenetic
inheritance.
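The two steps in the diagram, replication followed by methylation of the half-methylated sites, can be simulated in a few lines. This is a minimal sketch under stated assumptions: a duplex is a list of CpG sites, each site is a hypothetical `(top_methylated, bottom_methylated)` pair, and the function names `replicate` and `maintain` are illustrative.

```python
def replicate(duplex):
    """Semi-conservative replication of a list of CpG sites.

    Each daughter keeps one parental strand with its methyl groups;
    the newly synthesized strand starts out unmethylated.
    """
    daughter_1 = [(top, False) for top, _ in duplex]        # keeps the top strand
    daughter_2 = [(False, bottom) for _, bottom in duplex]  # keeps the bottom strand
    return daughter_1, daughter_2


def maintain(duplex):
    """Methylate the new strand wherever the parental strand is methylated."""
    return [(top or bottom, top or bottom) for top, bottom in duplex]


# Three CpG sites: methylated, unmethylated, methylated.
parent = [(True, True), (False, False), (True, True)]
d1, d2 = replicate(parent)

print(d1)            # [(True, False), (False, False), (True, False)]
print(maintain(d1))  # [(True, True), (False, False), (True, True)]
```

Because only half-methylated sites are completed, unmethylated sites stay unmethylated and both daughters end up with the parental pattern.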
e. During gametogenesis the methylation pattern is erased.
i. Some genes are remethylated during gametogenesis (e.g. imprinting).
ii. Some genes are remethylated during early embryogenesis (e.g. X-inactivation).
iii. Some genes are remethylated during cellular differentiation.
f. Aberrant DNA methylation and cancer
i. Cancer cells isolated from many types of tumors often show the loss of one or more
critical proteins involved in limiting the growth of cells (the functions of many of these
critical proteins will be discussed later in this and other courses). One mechanism that
can lead to the loss of these proteins is over-methylation of the promoter sequence.
Tumor cells with gene inactivation due to over-methylation are said to have the “CpG
island methylator phenotype” (CIMP) and the tumors are referred to as CIMP positive.
ii. One example of this is nonpolyposis colorectal cancer, where the DNA repair enzyme
hMLH1 is missing in the tumor cells (this enzyme will be discussed in detail in the
DNA replication and repair lectures). In some of these cancers, the loss of the
hMLH1 protein is due to over-methylation of the promoter of the hMLH1 gene.