[III] Genes, Genomics, and Chromosomes
• Eukaryotic gene structure, Cot analysis, Rot analyses, chromosomal organization of genes and noncoding DNA
• Genomics: Genome-wide analysis of gene structure and expression
• Structural organization of eukaryotic chromosomes
• Morphology and functional elements of eukaryotic chromosome
Molecular Definition of a Gene
• Definition of a “gene”: The entire nucleic acid sequence that is necessary for the synthesis of a functional gene product (polypeptide or RNA)
• A gene includes:
  - The nucleic acid sequence encoding the amino acid sequence of the protein (coding region)
  - The other sequences required for the synthesis of an RNA transcript
  - The transcription-control regions (e.g., enhancers or silencers)
  - Sequences that specify 3’ cleavage and polyadenylation [poly(A)] sites, and splice sites
• Most genes are transcribed into mRNAs, but some are transcribed into other RNA molecules such as tRNA, rRNA and snRNA
Gene Expression in Prokaryotes and Eukaryotes
[Figure panels: Prokaryotes; Eukaryotes]
• Gene expression in prokaryotes takes place in a single compartment, but gene expression in eukaryotes takes place in multiple compartments and in multiple stages
Eukaryotic Genes Produce Monocistronic mRNAs and Contain Lengthy Introns
• While prokaryotes produce polycistronic mRNAs, eukaryotes produce monocistronic mRNAs
• In a polycistronic mRNA, a ribosome-binding site is present near the start site of each cistron, and translation can be initiated from each of these sites
• In eukaryotic mRNA, the 5’ cap directs the binding of the ribosome to the mRNA, and protein synthesis begins from the closest AUG codon. Furthermore, most eukaryotic mRNAs also possess poly(A) tails
• In eukaryotes, introns, which are often larger than exons, need to be removed from the precursor mRNA (pre-mRNA) before it can direct protein synthesis. Some introns in human genes are as long as 17 kb. The median intron length is about 3 kb.
Comparison of Structures of the cDNA and Its Genomic Gene
The main differences between a cDNA and a genomic gene are:
cDNA does not have introns
cDNA does not have regulatory/promoter sequences
Distribution of Uninterrupted and Interrupted Genes in Various Eukaryotes
• The majority of genes in yeast are uninterrupted
• Most genes in flies are interrupted by one or two introns
• Most genes in mammals are interrupted by many introns
Sizes of Genes in Various Organisms
• Yeast genes are short
• Genes in flies and mammals have a dispersed bimodal distribution extending to very long sizes
Sizes of Exons and Introns
[Figure panels: Exons; Introns]
• Exons coding for proteins are usually short, but introns range from very short to very long
Simple Eukaryotic Transcription Unit
• In eukaryotes, some genes encode a single protein while others encode more than one protein
• That is, some genes have simple transcription units while others have complex transcription units. This slide shows a simple transcription unit
Complex Eukaryotic Transcription Unit
• Three different ways to process the primary transcript of a gene to give rise to different mRNAs:
  - Using different splice sites to produce different mRNA species
  - Using alternative poly(A) sites to produce mRNAs with different 3’ exons
  - Using alternative promoters to produce mRNAs with different 5’ exons and the same 3’ exons
• Differential splicing of a precursor mRNA leads to the production of isoforms of gene products
Kinetics of DNA Hybridization
Suggested Reading:
1. Integration of Cot analysis, DNA cloning, and high-throughput sequencing facilitates genome characterization and gene discovery. Peterson et al. (2002) Genome Res 12: 795-807.
2. Repeated sequences in DNA. Britten and Kohne (1968) Science 161: 529-540.
• The rate of DNA annealing depends on the concentration of nucleic acid and the time of hybridization
• $dC/dt = -kC^2$. Integrating between the initial concentration $C_0$ and the concentration $C$ after time $t$ gives $C/C_0 = 1/(1 + k C_0 t)$. If $C/C_0 = 1/2$, then $C_0 t_{1/2} = 1/k$
Kinetics of DNA Reassociation (Cot Analysis)
• Britten and Kohne (1968) studied genomic DNA sequences by measuring the kinetics of DNA reassociation
  - Assigned Reading: Repeated sequences in DNA
• The rate of DNA reassociation depends on random collision of the complementary strands (i.e., on the concentration of DNA) and on the time available for collisions to occur:
  $dC/dt = -kC^2$, where $k$ is the reassociation rate constant
  By integration, $C/C_0 = 1/(1 + k C_0 t)$
  This indicates that the parameter controlling the reassociation reaction is the product of the initial DNA concentration and time ($C_0 t$, written Cot)
• At half reassociation, $C/C_0 = 1/2 = 1/(1 + k C_0 t_{1/2})$, so $C_0 t_{1/2} = 1/k$
  Cot1/2 ($C_0 t_{1/2}$) is the product of concentration and time required for 50% reassociation
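As a rough numerical illustration of the rate law above, the following Python sketch evaluates C/C0 at a few Cot values. The function names and the value of k are hypothetical and were chosen only for illustration; they do not come from the slides.

```python
def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """Cot value at which half of the DNA has reassociated: C0t1/2 = 1/k."""
    return 1.0 / k


# Hypothetical reassociation rate constant (illustrative value only).
k = 0.25
half = cot_half(k)

# Evaluate the curve at Cot = 0, at Cot1/2, and at ten times Cot1/2.
for cot in (0.0, half, 10 * half):
    print(f"Cot = {cot:g}: C/C0 = {fraction_single_stranded(cot, k):.3f}")
```

Because Cot1/2 = 1/k, a sequence that reassociates quickly (large k) has a small Cot1/2, which is how repeated sequences are told apart from unique ones on a Cot curve.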
Reassociation Kinetics of Eukaryotic DNA
Calculating the Complexity of a Genome
$\frac{C_0 t_{1/2}\ (\text{DNA of any genome})}{C_0 t_{1/2}\ (\text{E. coli})} = \frac{\text{Complexity of any genome}}{4.2 \times 10^6\ \text{bp}}$
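The proportionality above can be sketched in Python as follows. The function name and the example Cot1/2 values are hypothetical; only the E. coli standard of 4.2 x 10^6 bp is taken from the slide.

```python
# E. coli genome complexity used as the reference standard on the slide.
ECOLI_COMPLEXITY_BP = 4.2e6


def genome_complexity_bp(cot_half_sample, cot_half_ecoli):
    """Estimate complexity (bp) from the ratio of Cot1/2 values.

    Both Cot1/2 values are assumed to be measured under the same conditions.
    """
    return ECOLI_COMPLEXITY_BP * cot_half_sample / cot_half_ecoli


# Hypothetical measurements: the sample needs 100 times the Cot1/2 of E. coli.
print(genome_complexity_bp(cot_half_sample=400.0, cot_half_ecoli=4.0))
```

The estimate applies to a single kinetic component; a eukaryotic genome with several components needs one such calculation per component.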
Repetitive and Unique DNA Sequence in Eukaryotes
• Non-repetitive DNA:
  - Present only once per genome
  - Found in prokaryotic and eukaryotic genomes
• Intermediate (moderately) repetitive DNA:
  - Repeated 10-1000 times per genome
  - Dispersed throughout the genome in eukaryotes
• Highly repetitive DNA:
  - Short repetitive DNA (<100 bp) present up to 1 million times in the eukaryotic genome
• Larger genomes are not generated by increasing the number of copies of the same sequences present in smaller genomes; their larger size is due to the presence of more repetitive DNA
Suggested Reading II:
  - Initial sequencing and analysis of the human genome. Nature 409: 860-921, 2001.
  - Finishing the euchromatic sequence of the human genome. Nature 431: 931-945, 2004.
The Proportions of Different Sequence Components in Eukaryotic Genomes
• The absolute content of non-repetitive DNA increases with genome size but reaches a plateau at ~2-3 x 10^9 bp
• mRNA is typically derived from non-repetitive DNA sequences
• A significant part of the moderately repetitive DNA consists of transposons (sequences able to move around the genome)
Genomes of Many Organisms Contain Much Noncoding DNA
• Much of the DNA in many eukaryotic cells does not encode RNA or have any apparent regulatory function
  - Yeast, 12 Mb; fruit flies, 180 Mb; chicken, 1300 Mb; human, ~3300 Mb DNA
  - Many organisms considered simpler than humans have higher DNA contents than humans
• Data from DNA sequence analysis revealed that the genomes of higher eukaryotes contain a large amount of noncoding DNA
• Gene-rich regions vs. gene desert regions
Genome Size and Gene Numbers in Various Organisms
The number of genes in bacterial and archaeal genomes is proportional to the genome size
Relationship of Gene Number and Genome Size
• The number of genes in prokaryotes correlates well with the size of their genomes
• The number of genes in eukaryotes does not correlate well with their genome sizes
Protein-Coding Genes
• Solitary genes: About 25-50 percent of the protein-coding genes are represented only once in the haploid genome
  - The chicken lysozyme gene spans 15 kb of DNA, which constitutes a simple transcription unit with four exons and three introns
• Duplicated genes: Genes with close but nonidentical sequences that often are located within 5-50 kb of one another; a set of such genes is called a “gene family”
  - Each gene family may contain from a few to 30 or so members
  - Gene family: A set of duplicated genes that encode proteins with similar but not identical amino acid sequences. Examples are: cytoskeletal proteins, the myosin heavy chain, the α- and β-globins
  - Protein family: The closely related, homologous proteins encoded by a gene family. Examples: protein kinases, vertebrate immunoglobulins and olfactory receptors. Protein families include from just a few to 30 or more members
  - The genes encoding the β-like globins are a good example of a gene family; it contains five functional genes: β, δ, Aγ, Gγ, and ε
Total Number of Genes and Duplicated Genes
• In bacteria, most genes are unique, so the number of distinct gene families is close to the total gene number
• In eukaryotes, many genes are duplicated, and as a result the number of different gene families is much less than the total number of genes
Proportions of Unique and Duplicated Genes
The proportion of unique genes drops sharply with genome size: bacteria have the highest proportion of unique genes, and the proportion is much lower in yeast, flies, worms and Arabidopsis
Heavily Used Gene Products (rRNA and snRNA Genes) are Arranged in Tandem Repeats
• In vertebrates and invertebrates, the genes encoding rRNAs and some other noncoding RNAs such as snRNAs are arranged in tandemly repeated arrays
• Tandemly repeated genes appear one after the other and encode identical or almost identical proteins or functional RNAs
• The tandemly repeated rRNA and snRNA genes are needed to meet the great cellular demand for their transcripts. Example: cells have 100 copies or more of 5S rRNA genes
• Multiple copies of tRNA and histone genes are also present in clusters, but generally not in tandem repeats
A Tandem rDNA Gene Cluster
A tandem gene cluster of rRNA genes
Electron Micrograph of DNA Being Transcribed into RNA
• Green arrow indicates DNA and red arrow indicates RNA
• This micrograph was taken by O.L. Miller, Jr. and Barbara R. Beatty at Oak Ridge National Lab, showing the transcription of tandem repeats of rRNA genes in Xenopus oocytes
Non-Protein Coding Genes Encode Functional RNAs
• There are non-protein-coding genes in the genome that encode functional RNAs. These RNAs are important in regulating the expression of genes
Assigned Reading: The functional genomics of noncoding RNA. Mattick et al. (2005), Science 309: 1527-1528.
How Many Genes Are There in All Organisms?
• This slide shows the comparison of fly genes to those of the worm and yeast
• Orthologous genes (orthologs): Genes encoding corresponding polypeptides in different organisms. Two gene products from different organisms whose sequences are similar over >80% of their lengths are considered orthologs
• In flies, ~20% of the genes have orthologs in both worm and yeast. These are considered required genes
• When fly genes are compared with those of the worm alone, an additional 10% of genes have orthologs. This means that these 30% of genes are required in both flies and worms
• The total number of proteins can be a good estimate of the total proteome size
Proportion of Protein Encoding Genes in Human Genome
• The human haploid genome contains 22 autosomes plus the X and Y chromosomes, and the chromosomes range from 45 to 279 Mb of DNA
• The total haploid genome size is 3286 Mb (~3.3 x 10^9 bp)
• The euchromatin comprises the majority of the genome (~2.9 x 10^9 bp)
• Although about 25% of the human genome is occupied by protein-coding genes, the actual exons make up only ~1%
The Structure of Average Human Gene
Different Classes of Repetitive DNA Sequences in Human Genome
• Five classes of repetitive DNA sequences in the human genome:
  - Transposons, 45% of the genome, multiple copies
  - Pseudogenes, ~3,000 in all
  - Simple sequence repetitive DNA, ~3% of total DNA
  - Segmental duplications, blocks of 10 to 300 kb that have been duplicated, ~5%
  - Tandem repeats forming blocks of one type of sequence
Genomic DNA of Eukaryotic Organisms
Classes of DNA                 #/genome        % of Human Genome
Protein coding genes           ~25,000         55
Tandemly repeated genes
  U2 snRNA                     ~20             <0.001
  rRNA                         ~300            0.4
Repetitious DNA
  Simple sequence DNA          variable        ~6
  Interspersed repeats         ~3.26 x 10^6    45
Processed pseudogenes          1-~100          ~0.4
Unclassified spacer DNA        n.a.            25
Interspersed repeats: DNA transposons, LTR retrotransposons, and non-LTR retrotransposons (LINEs and SINEs)
Satellite DNAs
• When eukaryotic DNA is centrifuged on a CsCl gradient, two components are observed:
  - Main band: most of the genomic DNA
  - Satellite band: one or multiple minor bands; they can be heavier or lighter than the main band
• The main band DNA has a buoyant density of 1.701 g/cm^3 with a G-C content of 42%, and the minor band DNA has a buoyant density of 1.690 g/cm^3 with a G-C content of 30%
Satellite DNAs Lie in Heterochromatin
• Highly repetitive DNA (simple sequence DNA): Satellite DNA is characterized by a rapid rate of hybridization and consists of very short sequences repeated many times in tandem in large clusters. It typically makes up <10% of the genome
• In addition, multicellular eukaryotes have complex satellites with longer repeat units, mainly in heterochromatic regions
• In humans, the α-satellite DNA consists of 171 bp repeats. The β-satellite DNA family has 68 bp repeat units interspersed with a longer 3.3 kb repeat
• Tandemly repeated DNA often has a distinct physical property that can be used to isolate it: its buoyant density, which differs from (and is often lower than) that of non-repetitive DNA
• Therefore, by equilibrium centrifugation on a CsCl gradient, the satellite DNA can be separated from the non-repetitive DNA
• The buoyant density of a duplex DNA depends on its G-C content according to the following formula:
  Buoyant density = 1.660 + 0.00098 x (% G-C) g/cm^3
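The buoyant density formula can be checked against the main-band and satellite-band values quoted on the earlier slide. This is a minimal Python sketch; the function names are hypothetical.

```python
def buoyant_density(gc_percent):
    """CsCl buoyant density (g/cm^3) of duplex DNA from its % G-C content."""
    return 1.660 + 0.00098 * gc_percent


def gc_percent_from_density(density):
    """Invert the formula: estimate % G-C from a measured buoyant density."""
    return (density - 1.660) / 0.00098


# G-C contents quoted for the main band (42%) and the satellite band (30%).
for label, gc in (("main band", 42), ("satellite band", 30)):
    print(f"{label}: {gc}% G-C -> {buoyant_density(gc):.4f} g/cm^3")
```

The computed values, about 1.701 and 1.689 g/cm^3, are close to the 1.701 and 1.690 g/cm^3 given for the two bands.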
Most Simple-Sequence DNAs are Concentrated in Specific Chromosomal Locations
• Repetitious DNA is present in the genomes of eukaryotic cells:
  - Simple-sequence DNA, also called satellite DNA (6% of the human genome), with repeat units of 14 to 500 bp
  - Microsatellites, with repeat units of 1-13 bp
  - Interspersed repetitive DNA dispersed throughout the genome (also called transposable elements)
• By fluorescence in situ hybridization (FISH), the simple-sequence DNAs are localized near the centromeres and telomeres of mouse chromosomes
• Centromeric heterochromatin is necessary for the separation of chromosomes to daughter cells
Diseases Associated with Microsatellites
• Microsatellites occasionally occur within transcription units
• At least 14 different types of neuromuscular disease are associated with microsatellite repeats in the transcription unit of a gene
• Myotonic dystrophy and spinocerebellar ataxia are examples. In myotonic dystrophy, the transcript of the DMPK (dystrophia myotonica protein kinase) gene contains 1000 to 4000 repeats of the sequence CUG in the 3’ untranslated region, which interfere with normal RNA processing and export of the mature RNA from the nucleus to the cytosol
Probing Minisatellite DNA by Southern Blot Hybridization
• DNA samples from three different individuals were digested with the restriction enzyme HinfI, separated on agarose gels, transferred to nylon membranes and probed with three different radiolabeled minisatellites
• A unique banding pattern was observed for each individual
• DNA fingerprinting depends on differences in the length of simple-sequence DNA
DNA Fingerprinting
• Minisatellite DNA: 14 to 100 bp repeat units in a region of 1 to 5 kb, which is made up of 20-50 repeat units
• A slight difference in the total length of the repeats can be detected by PCR analysis. This forms the basis of DNA fingerprinting
• This technique can be used in population studies, paternity or maternity tests and criminal identification
Hybridization Kinetics of cDNAs to mRNAs
• The population complexity of mRNA isolated from a cell can be estimated by studying the kinetics of hybridization of mRNAs to their cDNAs
• The example given below compares the mRNA populations of RNA isolated from estrogen-treated trout liver and its untreated control:
  - Isolate total RNA samples from livers of estrogen-treated fish and control fish (RNAind and RNAunind)
  - Prepare 32P-labeled cDNAind by reverse transcription
  - Set up hybridization between 32P-cDNAind and RNAunind at different Rot values (initial RNA concentration x time)
  - Determine the amount of hybridization by treating the hybridization mixture with S1 nuclease
Hybridization between mRNA and cDNA
• This slide shows the hybridization profile of excess chick oviduct mRNA with chick oviduct cDNA
• 32P-labeled cDNA was synthesized from chick oviduct mRNA and hybridized to excess chick oviduct mRNA
• The result showed that three components of cDNA, present at different frequencies, hybridize to chick oviduct mRNA:
  - About 50% of the cDNA hybridizes at a Rot1/2 of 0.0015
  - About 15% of the cDNA hybridizes at a Rot1/2 of 0.04
  - About 35% of the cDNA hybridizes at a Rot1/2 of 30
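The three components listed above can be combined into a single predicted Rot curve. The Python sketch below assumes, beyond what the slide states, that each component follows pseudo-first-order kinetics because the RNA is in excess, so that the hybridized fraction of a component is 1 - exp(-ln 2 x Rot / Rot1/2). The function name is hypothetical.

```python
import math

# (fraction of cDNA, Rot1/2) for the three components quoted on the slide.
COMPONENTS = [(0.50, 0.0015), (0.15, 0.04), (0.35, 30.0)]


def fraction_hybridized(rot, components=COMPONENTS):
    """Total fraction of cDNA hybridized at a given Rot.

    Assumption (not from the slide): RNA-excess hybridization is
    pseudo-first-order, so each component is half hybridized at its Rot1/2.
    """
    total = 0.0
    for fraction, rot_half in components:
        total += fraction * (1.0 - math.exp(-math.log(2) * rot / rot_half))
    return total


# Evaluate the curve over several decades of Rot.
for rot in (0.0001, 0.0015, 0.04, 1.0, 30.0, 1000.0):
    print(f"Rot = {rot:g}: {100 * fraction_hybridized(rot):.1f}% hybridized")
```

Plotted against log(Rot), such a curve rises in three steps, one per abundance class.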
Rot Analysis of Excess mRNA and cDNA of Chick Oviduct Cells
• Total mRNA was isolated from chick oviduct cells
• 32P-cDNA was prepared from the total mRNA by reverse transcription
• Rot analysis was conducted between radiolabeled cDNA and an excess amount of total mRNA
[Figure: cDNA of estrogen-treated oviduct RNA hybridized to untreated oviduct RNA]
• The Rot analysis data showed that there are three components of sequences hybridizing to cDNA:
  - The first component has the characteristics of ovalbumin mRNA
  - The second component has a total complexity of 15 kb (7-8 different mRNAs of 2000 bases)
  - The last component has a complexity of 26 Mb (~13,000 mRNAs)
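The mRNA counts quoted for the second and third components follow from dividing each complexity by the average mRNA length of 2000 bases. A short Python sketch of that arithmetic, with a hypothetical function name:

```python
def number_of_mrna_species(complexity_bases, average_mrna_length=2000):
    """Number of different mRNAs implied by a sequence complexity."""
    return complexity_bases / average_mrna_length


# Second component: 15 kb of complexity.
print(number_of_mrna_species(15_000))
# Third component: 26 Mb of complexity.
print(number_of_mrna_species(26_000_000))
```

The results, 7.5 and 13,000, match the "7-8 different mRNAs" and "~13,000 mRNAs" given for these components.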
Number of Expressed Genes Measured by DNA Microarray Analysis
• Although Rot analysis can reveal the complexity of the mRNA population in any cell type, the number of genes expressed in a cell type can be determined by DNA microarray
• In this assay, the mRNA isolated from the cell type of interest is reverse transcribed to cDNA carrying tags
• The labeled cDNA is hybridized to a DNA array that contains all the genes of the organism of interest
• The genes that hybridize to the tagged cDNA can be visualized by scanning the array
• This slide shows results of DNA microarray analysis to determine expression of 12 genes in 59 individual breast tumor tissues of breastfed and breast-unfed women
• Genes highly expressed are shown in “red”, lower expression in “blue”, equal expression in “grey”
Genomics: Genome-wide analysis of gene structure and expression
Database of Genomes
• Using automated DNA sequencing techniques, methods for cloning DNA fragments on the order of 100 kb in length, and computer algorithms to piece together the stored sequence data, scientists have determined vast amounts of DNA sequence, including the entire genomes of humans and of many key experimental organisms, e.g., the roundworm (C. elegans), fruit flies, mice, medaka and zebrafish
• Since the cost of sequencing a Mb of DNA is becoming very low, the genomes of many organisms are rapidly being determined
• There are two databases for the human genome:
  - GenBank at the National Institutes of Health in Bethesda, MD
  - The EMBL Sequence Data Base at the European Molecular Biology Laboratory in Heidelberg, Germany
Comparison of the Regions of Human NF1 Protein with Ira Protein of S. cerevisiae
• Ira, a GTPase-activating protein (GAP), modulates the GTPase activity of the monomeric G protein called Ras. Both GAP and Ras function to control cell replication and differentiation in response to signals from outside the cell
Structural Motifs
• When a protein shows no significant similarity to other proteins with the BLAST (basic local alignment search tool) algorithm, it may nevertheless share a short sequence that is functionally important. Such short sequences recurring in many different proteins are referred to as structural motifs
Comparison of Related Sequences from Different Species Can Give Clues to Evolutionary Relationships Among Proteins
• Paralogous: sequences that diverged as the result of gene duplication
• Orthologous: sequences that arose because of speciation
Genes Can be Identified within Genomic DNA Sequences
• By scanning for an “Open Reading Frame” (ORF)
• An ORF is defined as a stretch of DNA of at least 100 bp that begins with a start codon and ends with a stop codon
• ORF analysis has identified more than 90% of the genes in bacteria and yeast
• Both very short genes and long genes are missed by this method
• For eukaryotic genes, because of the presence of multiple exons and introns, scanning for ORFs is not a good method to identify genes. One needs to use computer programs to compare the genomic DNA sequences to cDNA sequences, splice site sequences and expressed sequence tags (ESTs)
• Another powerful method for identifying human genes is to compare the human genomic sequence with that of the mouse, since human and mouse are sufficiently related to have most genes in common
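ORF scanning as described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not from the slides: it scans the three reading frames of one strand only, treats ATG as the sole start codon, and uses the 100 bp minimum length as an adjustable parameter. The function name and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_length=100):
    """Return (start, end, frame) for ORFs on the given strand.

    An ORF here runs from an ATG to the first in-frame stop codon
    (inclusive) and must span at least min_length nucleotides.
    Coordinates are 0-based, with end exclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_length:
                    orfs.append((start, end, frame))
                start = None
    return orfs


# Hypothetical test sequence: ATG, 40 alanine codons, then a stop codon.
sequence = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"
print(find_orfs(sequence))
```

A scan of this kind cannot follow a coding sequence across introns, which is why the comparison with cDNA and EST sequences is needed for eukaryotic genes.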
Comparison of the Gene Number and Type of Proteins Encoded in the Genomes of Different Organisms
Structural organization of eukaryotic chromosomes
Questions?
• How are DNA molecules organized within eukaryotic cells?
  - The total length of cellular DNA is up to a hundred thousand times the cell’s length, so the packing of DNA is crucial to cell architecture
  - During interphase, DNA exists as a nucleoprotein complex, called chromatin, dispersed throughout the nucleus
  - During mitosis, chromatin further compacts into metaphase chromosomes, which can be visualized under a microscope
Packaging of DNA in Microorganisms
• In viruses, the genomic DNA molecule is associated with protein molecules and packaged inside the viral capsid. In bacteria, the genomic DNA is associated with proteins and is packaged as a compact mass inside the cell, called the “nucleoid”
Electron Micrographs of Extended and Condensed Chromatin
[Figure panels: Extended form; Condensed form]
• Nucleosomes: When chromatin is isolated from nuclei under low salt and without divalent cations (e.g., Mg2+), it resembles “beads on a string”. The beads are termed nucleosomes and the string is termed linker DNA
• The nucleosome is about 10 nm in diameter and is the primary structural unit of chromatin
Nuclear DNA Associates with Histones to Form Chromatin
• When DNA from eukaryotic nuclei is isolated in an isotonic buffer (i.e., ~0.15 M KCl), it is associated with an equal mass of proteins (histones, which are basic proteins) as chromatin
• There are five different histones found in chromatin, namely H1, H2A, H2B, H3 and H4. The sequences of four histones (H2A, H2B, H3 and H4) are similar among different organisms, suggesting that these proteins fold into similar three-dimensional conformations
Four Classes of Histones
• Histones: Small basic chromosomal proteins rich in basic amino acids (lysine and arginine; positively charged)
Separation of Nucleosomes
• When the chromatin of nuclei is digested with micrococcal nuclease, discrete DNA fragments of definite sizes can be recovered from the digested fraction
• When the digested material is separated by gradient centrifugation, particles of different sizes are isolated
• These particles are monomers, dimers, trimers and tetramers with DNA fragments of different sizes
[Figure: DNA fragments isolated from chromatin digested with micrococcal nuclease (limited digestion)]
Individual Nucleosomes Released by Digestion of Chromatin with Limited Amounts of Micrococcal Nuclease
[Figure scale bar: 100 nm]
Structure of Nucleosomes
• The DNA in nucleosomes is less susceptible to digestion by nuclease than the DNA in the linkers
• By controlling the digestion with nuclease, free nucleosomes can be isolated
• A nucleosome consists of a protein core with DNA wrapped around its surface like thread around a spool
• The protein core is an octamer containing two copies each of H2A, H2B, H3 and H4
• Nucleosomes from all eukaryotes contain 147 bp of DNA wrapped slightly less than 2 turns around the protein core
• The length of the linker DNA is variable, ranging from 8 to 114 bp; H1 associates with the linker DNA
Nucleosome
Dimer and Monomer of Nucleosomes
• Mononucleosomes typically have ~200 bp of DNA. End trimming reduces the DNA to ~165 bp
• The core particles have a DNA fragment of ~140 bp
• The linker DNA between two nucleosomes varies from 8 to 114 bp
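The relationship between core DNA, linker DNA and the nucleosome repeat length can be illustrated with a short Python sketch. It assumes the 147 bp core figure from the "Structure of Nucleosomes" slide; the function names and the example genome size are used only for illustration.

```python
# DNA wrapped around the histone octamer, from the nucleosome structure slide.
CORE_DNA_BP = 147


def repeat_length(linker_bp):
    """Nucleosome repeat length: core DNA plus one linker."""
    return CORE_DNA_BP + linker_bp


def nucleosome_count(dna_bp, linker_bp):
    """Approximate number of nucleosomes needed to package dna_bp of DNA."""
    return dna_bp // repeat_length(linker_bp)


# The quoted linker range of 8 to 114 bp gives the range of repeat lengths.
print(repeat_length(8), repeat_length(114))

# A 53 bp linker gives the typical ~200 bp repeat; applied to ~3.3 x 10^9 bp.
print(nucleosome_count(3_300_000_000, linker_bp=53))
```

With the typical ~200 bp repeat, a haploid human genome corresponds to roughly 16.5 million nucleosomes.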
Beads-on-a-String Structure of Chromatin
a, In the presence of histone H1
b, In the absence of histone H1
Structures of Nucleosome
• A mononucleosome is a 10 nm particle that contains 200 bp of DNA and a histone octamer (consisting of two copies each of H2A, H2B, H3, and H4)
• DNA occupies most of the outer surface of the nucleosome
• Sequences on the DNA that lie on different turns around the nucleosome may be close together
Structures of the Four Core Histones
• Histones H2A and H2B each have two short α-helical regions and one long α-helical region. These regions form a characteristic fold
• Similarly, histones H3 and H4 also have two short α-helical regions and one long α-helical region. These regions also form a characteristic fold, as shown in the next slide
Histone Fold
• The histone fold is formed by the three α-helical regions of the core histones (a)
• The histone fold regions of two histone molecules allow them to associate to form a heterodimer (b)
Structure of the Nucleosome
(a) Left: Top view of nucleosome; Right: Side view of nucleosome
(b) Model of a nucleosome viewed from the top with histones shown as ribbon diagrams
Interaction of Histone H1 with Nucleosome
Histone H1 interacts with the central gyre of the DNA at the dyad axis, as well as with the linker DNA at either the entry or the exit
Histone Tails
The N- and C-termini of histones H2A, H2B, H3 and H4 project out from the core of the nucleosome. These regions are termed histone tails
Modification of Histone Tails
• Histone tails are chemically modified by acetylation, methylation, phosphorylation and ubiquitination to form the histone code
• The lysine residues in the histone tails of H3 and H4 can undergo reversible acetylation and deacetylation. Acetylation of the lysine side chain prevents the chromatin from condensing
• Histone tails can also associate with other chromosomal proteins and thus affect transcription and DNA replication; this interaction can be affected by acetylation of lysine or methylation of lysine and arginine in the histone tails
• Phosphorylation of serine residues on histones is another modification of histone tails
Acetylation and Deacetylation of Histones
• Enzymes responsible for acetylation of histones are histone acetyltransferases (HATs) [Gcn5-related N-acetyltransferases, the p300/CBP family and the MYST family]
• Enzymes responsible for deacetylation of histones are histone deacetylases (HDACs)
• The level of acetylation of the histone N-terminal tails is controlled by the balance between HAT and HDAC activities
Acetylation of Lysine Residues on Core Histones
Acetylation and Methylation of H3 and H4 Histones
• The enzymes responsible for methylation are histone methyltransferases (methylases)
• The methyl groups can also be removed by specific enzymes
• Lysine 9 in H3 can be either acetylated or methylated
• Methylation of lysine 9 in H3 can inhibit the acetylation of lysine 14
Histone Modifications
• Lysine ε-amino groups can be methylated several times; methylation prevents acetylation and thus maintains their positive charge
• Arginine side chains can also be methylated
• Serine and threonine side chains can be reversibly phosphorylated, introducing a negative charge
• A single 76-amino acid ubiquitin molecule can be reversibly added to a lysine in the C-terminal tails of H2A and H2B. Addition of ubiquitin could reduce the positive charge of histones H2A and H2B
Overall Modifications of Histone Tails
Important Terms
• Histone code: The pattern of acetylation, methylation, phosphorylation, ubiquitination and sumoylation of the histone tails. The pattern of modification affects the activity of the genes on the chromatin
• Changes in the charges on the histones resulting from histone modification also affect the binding of non-histone proteins to the chromatin. This is essential for gene expression
• Bromodomain: A protein domain found in many transcription factors that recognizes acetylated lysine residues in the histone tails
• Chromodomain (chromatin organization modifier domain): A protein structural domain of about 60 amino acid residues found in proteins associated with remodeling and manipulation of chromatin
• Chromoshadow domain: A protein domain that is distantly related to the chromodomain. Proteins containing a chromoshadow domain include Su(var)205 (HP1) and mammalian modifier 1 and modifier 2
• PHD finger: Cys4-His-Cys3 motif of HAT3. It is involved in epigenetic regulation
• TUDOR domain: A protein domain that recognizes methylated histones
Sites of Histone Modification and Functions
Most modified sites in histones have a single, specific type of modification, but some sites have more than one type of modification
Structure of Condensed Chromatin
• When chromatin is extracted from cells in isotonic buffers, it appears as fibers ~30 nm in diameter
• Nucleosomes in this type of chromatin are packaged into an irregular spiral or solenoid arrangement (about 6 nucleosomes per turn)
• H1 histone is associated with each nucleosome
• Electron microscopic observation revealed that the 30 nm fiber is less uniform than a perfect solenoid
• Condensed chromatin may be very dynamic, with regions occasionally unfolding and then refolding into the solenoid structure
• Chromatin in chromosomal regions that are not being actively transcribed exists in condensed fiber form or in higher-order folded structures
[Figure: Condensed chromatin in 30 nm fiber structure]
[Figure: The solenoid model for the structure of the 30 nm chromatin fiber]
Structure of the 30-nm Chromatin Fiber
• The structure of chromatin is highly conserved in different organisms
• The amino acid sequences of four histones (H2A, H2B, H3 and H4) are highly conserved between distantly related species
• The amino acid sequence of H1 varies more from organism to organism
• The similarity in histone structures suggests that they fold into very similar 3-dimensional conformations, which were optimized for histone function early in evolution in a common ancestor of all modern eukaryotes
H1 and Other Modified Histones in the Formation of the 30 nm Fiber
(a) The positive charge of H4 and the negative charge of H2A and H2B result in closer packing of these two nucleosomes
(b) Acetylation of lysine 16 (K16) results in loss of the positive charge at this position and leads to neutralization of the “+” and “-” attraction between these two nucleosomes
The globular region of H1 (in pink) interacts with the linker DNA as it exits the nucleosome and changes its path to produce a more compact structure
Further Condensation of Chromatin Fiber
The 30 nm chromatin fiber can be thrown into a series of loops, each approximately 50,000-200,000 bp in size, producing a looped fiber with a diameter of approximately 300 nm
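To give a sense of scale for these loops, the Python sketch below converts loop sizes into approximate numbers of nucleosomes and solenoid turns. It assumes the ~200 bp of DNA per nucleosome and the ~6 nucleosomes per solenoid turn quoted on earlier slides; the function names are hypothetical and the results are rough estimates only.

```python
BP_PER_NUCLEOSOME = 200    # typical mononucleosome DNA, from an earlier slide
NUCLEOSOMES_PER_TURN = 6   # solenoid model of the 30 nm fiber


def nucleosomes_per_loop(loop_bp):
    """Approximate number of nucleosomes in a chromatin loop."""
    return loop_bp / BP_PER_NUCLEOSOME


def solenoid_turns_per_loop(loop_bp):
    """Approximate number of solenoid turns in a chromatin loop."""
    return nucleosomes_per_loop(loop_bp) / NUCLEOSOMES_PER_TURN


# The two ends of the quoted loop size range.
for loop_bp in (50_000, 200_000):
    print(loop_bp, nucleosomes_per_loop(loop_bp),
          round(solenoid_turns_per_loop(loop_bp)))
```

Under these assumptions a single loop holds roughly 250 to 1000 nucleosomes.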
Nonhistone Proteins, a Structural Scaffold for Long Chromatin Loops
• In addition to histones, non-histone proteins are also involved in organizing chromosome structure
• Chromosome scaffold: non-histone proteins associated with the metaphase chromosome
• The shape of the scaffold is maintained even after the DNA on the metaphase chromosome is removed by DNase digestion
The loops of the 30 nm fiber are attached to the nuclear scaffold via the matrix-attachment regions (MARs)
An individual loop can alter its structure from that of the 30 nm fiber to the beads-on-a-string structure, allowing transcription to occur. These regions can be detected by their sensitivity to limited digestion by DNase I
Is DNA attached to the scaffold via a specific
sequence?
• MARs (matrix attachment regions): DNA in the chromatin
  that attaches to the scaffold proteins. They are also
  called SARs (scaffold attachment regions)
• The figure in this slide shows how MARs can be isolated
• DNA sequence analysis revealed no consensus sequence in
  the DNA that binds to the scaffold matrix, except that
  the DNA is ~70% AT
• Furthermore, MARs have been found to contain
  cis-elements that regulate transcription, or
  topoisomerase II recognition sites, suggesting that MARs
  may provide sites for topological change in DNA
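The AT-richness criterion above can be illustrated with a short Python sketch. It is not the isolation method shown in the figure; it only scans a sequence for windows whose A+T fraction reaches a cutoff. The function names, the window size, and the toy sequence are assumptions made for illustration, and the 0.70 threshold simply mirrors the ~70% figure quoted above.

```python
def at_fraction(seq):
    """Return the fraction of A and T bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_rich_windows(seq, window=10, threshold=0.70):
    """Return (start, fraction) for each window whose AT fraction meets the threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        frac = at_fraction(seq[start:start + window])
        if frac >= threshold:
            hits.append((start, frac))
    return hits


# Toy sequence (hypothetical): a GC-only stretch followed by an AT-only stretch.
toy = "GCGCGCGCGC" + "ATATATATAT"
candidate_starts = [start for start, _ in at_rich_windows(toy)]
```

An AT-rich window is only a candidate: as noted above, MARs have no consensus sequence, so base composition alone cannot identify them.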
Heterochromatin
• Heterochromatin: a region of the chromatin that does not
  uncoil after mitosis. It is a dark-staining area of the
  chromatin
• In mammalian cells, heterochromatin appears as darkly
  staining regions of the nucleus, often associated with
  the nuclear envelope
• Pulse-labeling experiments with 3H-uridine and
  autoradiography showed that most transcription occurs in
  regions of euchromatin and the nucleolus
• In general, heterochromatic regions are sites of inactive
  genes; however, some transcribed genes have been located
  in regions of heterochromatin. Conversely, not all
  inactive genes and non-transcribed regions of DNA are
  visible as heterochromatin
Heterochromatin Versus Euchromatin
Bone marrow stem cell
Dark stained regions show
heterochromatin and the light
stained regions show euchromatin
Modifications of the histone H3 N-terminal tail
in heterochromatin and euchromatin
Probing Nontranscribed Genes versus Transcribed Genes
• Transcribable genes are sensitive to limited digestion by
  DNase I
• Nuclei from chicken embryo erythroblasts at 14 days and
  undifferentiated chicken lymphoblastic leukemia cells
  were exposed to increasing amounts of DNase I, and DNA
  was isolated from the digested nuclei
• The DNA was digested with BamHI, separated on an agarose
  gel, transferred to a nylon membrane, and probed with the
  4.5 kb globin gene fragment
Model for the Formation of
Heterochromatin by Binding to
Histone H3 Trimethylated at
Lysine 9
• HP1 (heterochromatin protein 1) contributes to the
  condensation of heterochromatin by binding to lysine 9
  in the N-terminal tail of histone H3 after it is
  trimethylated
• HP1 molecules bound to histone H3 then associate with
  each other (HP1 oligomerization), causing chromatin
  aggregation and condensation
• Heterochromatin condensation can spread along a
  chromosome because HP1 binds a histone methyltransferase
  (HMT) that methylates lysine 9 of histone H3. This
  creates a binding site for HP1 on the neighboring
  nucleosome. The spreading process continues until a
  “boundary element” is encountered
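The spreading model above can be sketched as a toy simulation in Python. This is only an illustration under simplifying assumptions: nucleosomes are positions in a list, spreading always reaches the next neighbor, and a boundary element blocks methylation at its own position and beyond. The function name and the example indices are hypothetical.

```python
def spread_heterochromatin(n_nucleosomes, start, boundaries):
    """Return a list of booleans: True where H3 lysine 9 ends up methylated.

    Spreading begins at nucleosome `start` and moves outward one neighbor
    at a time in both directions until a boundary element or the end of
    the array is reached.
    """
    marked = [False] * n_nucleosomes
    marked[start] = True
    for step in (-1, 1):  # spread to the left, then to the right
        i = start + step
        while 0 <= i < n_nucleosomes and i not in boundaries:
            marked[i] = True  # HMT recruited by HP1 methylates the neighbor
            i += step
    return marked


# Ten nucleosomes, spreading starts at index 4, boundaries at 1 and 8.
state = spread_heterochromatin(10, 4, {1, 8})
```

In this toy run the methylated domain covers nucleosomes 2 through 7 and stops at both boundary elements.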
Chromatin Contains Small Amounts of
Nonhistone Proteins
• Besides histones and scaffold proteins, chromatin also
  contains small amounts of other non-histone proteins
• High mobility group (HMG) proteins:
– HMG proteins can bind to transcription factors. In yeast,
  removal of HMG genes disturbs the expression of other
  genes all over the genome
– By binding to transcription factors, HMG proteins
  stabilize the transcription factor complex that
  regulates the expression of genes
• DNA-binding transcription factors: regulate the
  transcription of genes
Model for the Folding of the 30 nm Chromatin Fiber in a
Metaphase Chromosome
Model for the Packing of Chromatin and the
Chromosome Scaffold in Metaphase
Overview of the Structure of Genes & Chromosomes
Eukaryotic Chromosomes Contain One Linear
DNA Molecule
• The largest intact DNA molecules of lower eukaryotes can
  be extracted from cells, and their sizes indicate that
  each chromosome contains a single DNA molecule
– DNA molecules (2.3 x 10^5 to 1.5 x 10^6 bp) from
  S. cerevisiae can be separated by pulsed-field gel
  electrophoresis
– Drosophila genomic DNA (6 x 10^7 to 1 x 10^8 bp) can be
  readily analyzed
– The largest DNA molecules of human chromosomes
  (2.8 x 10^8 bp) are too large to be extracted intact
• In summary, a eukaryotic chromosome is a linear structure
  composed of an immensely long, single DNA molecule that
  is wound around histone octamers about every 200 bp,
  forming strings of closely packed nucleosomes. The
  nucleosomes fold to form a 30 nm chromatin fiber. The
  fibers attach to scaffold proteins to form loops. In
  addition, thousands of transcription factors and HMG
  proteins are also found
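To give a feel for the scale of packing summarized above, the following Python sketch works through the arithmetic for the largest human chromosome DNA quoted on this slide (2.8 x 10^8 bp). The 200 bp repeat comes from the summary; the 0.34 nm rise per base pair of B-form DNA is an assumption added here, and the function names are illustrative.

```python
BP_PER_NUCLEOSOME = 200   # nucleosome repeat length quoted in the summary
NM_PER_BP = 0.34          # rise per base pair of B-form DNA (assumed value)


def nucleosome_count(bp):
    """Approximate number of nucleosomes on a DNA molecule of `bp` base pairs."""
    return bp // BP_PER_NUCLEOSOME


def contour_length_cm(bp):
    """Length of the fully extended DNA molecule in centimeters (1 nm = 1e-7 cm)."""
    return bp * NM_PER_BP * 1e-7


largest_human_dna = 280_000_000  # 2.8 x 10^8 bp
n_nucleosomes = nucleosome_count(largest_human_dna)
length_cm = contour_length_cm(largest_human_dna)
```

Under these assumptions the molecule carries about 1.4 million nucleosomes and would stretch to roughly 9.5 cm if fully extended, which is why several levels of folding are needed to fit it into a metaphase chromosome.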
Morphology and functional elements of
eukaryotic chromosome
Microscopic Appearance of a Typical
Metaphase Chromosome
• Colchicine or colcemid: compounds that depolymerize
  microtubules and thus leave the two sister chromatids
  attached together in metaphase
• Karyotype: the number, sizes, and shapes of the metaphase
  chromosomes
Karyotypes of Human Chromosomes
• In non-dividing cells, chromosomes are not visible
• During mitosis or meiosis, chromosomes condense and
  become visible by light microscopy
• During metaphase of mitosis, each chromosome consists of
  two sister chromatids attached at the centromere
Giemsa Staining of Chromosomes
• G bands: Giemsa staining of human chromosomes gives
  specific banding patterns (G-banding)
• G-bands correspond to large regions of the human genome
  that have low “G+C” content
• R bands: produced by treating human chromosomes with hot
  alkaline solution and subsequently staining with Giemsa
  reagent. The pattern of R-bands is opposite to the
  pattern of G-bands
• R-bands and G-bands are used by cytogeneticists to
  identify chromosome aberrations
• Chromosome painting: revealing chromosomes by in situ
  hybridization with fluorescent probes (FISH). It can be
  done in a single color or in multiple colors
Giemsa Staining of Chromosomes
Using G-Banding and Multicolor FISH to
Reveal Translocation
Translocation between chromosome 9 and chromosome 22 results in the
Philadelphia chromosome in nearly all chronic myelogenous leukemia patients
Banding on Drosophila Polytene Salivary
Gland Chromosomes
Bands revealed by in situ hybridization
• Polytene chromosomes are caused by DNA amplification in
  which the daughter chromosomes do not separate
Interphase Polytene Chromosomes in the Salivary Gland of
Drosophila melanogaster Arise by DNA Amplification
Functional Elements Required for Replication
and Stable Inheritance of Chromosomes
• Although chromosomes differ in length and number between
  species, they behave similarly at the time of cell
  division
• Three functional elements are required for any eukaryotic
  chromosome to replicate and segregate correctly:
– Replication origins
– The centromere
– Two telomeres
• The experiments described in the next few slides are
  designed to demonstrate the importance of these
  functional elements
Yeast Transfection Experiment
ARS (autonomously replicating sequence) is required for
DNA replication in yeast
Yeast Transfection Experiment
CEN (Yeast centromere sequence) is required for proper
segregation
Yeast Transfection Experiment
TEL (telomere sequence) is required for replication of linear
chromosomal DNA
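The conclusions of the three transfection experiments can be summarized as a small decision function. This Python sketch is only a mnemonic for the statements above (ARS for replication, CEN for proper segregation, TEL for linear DNA), not a model of the experiments themselves; the function name, the argument names, and the outcome strings are all hypothetical.

```python
def plasmid_fate(has_ars, has_cen, is_linear, has_tel):
    """Summarize the expected behavior of a plasmid introduced into yeast."""
    if not has_ars:
        # Without a replication origin the DNA is not copied.
        return "not replicated"
    if is_linear and not has_tel:
        # Linear DNA needs telomeres at its ends to be replicated stably.
        return "linear DNA not stably replicated"
    if not has_cen:
        # Replication occurs, but daughter cells do not reliably inherit a copy.
        return "replicated, poor segregation"
    return "replicated and properly segregated"


circular_ars_only = plasmid_fate(True, False, False, False)
circular_ars_cen = plasmid_fate(True, True, False, False)
linear_no_tel = plasmid_fate(True, True, True, False)
linear_complete = plasmid_fate(True, True, True, True)
```

Only the last case, a linear molecule carrying ARS, CEN, and TEL, has all three functional elements listed earlier for a eukaryotic chromosome.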
Comparison of CEN Sequence between
Yeast and Drosophila
• Centromeres from yeast and Drosophila vary greatly in
  length
• Region I and region III are short and their sequences are
  conserved
• Region II, although variable in sequence, is fairly
  constant in length and is rich in AT content
• While regions I and III bind to about 30 proteins, which
  also bind to microtubules of the spindle apparatus during
  mitosis, region II is bound to a nucleosome in which H3
  has been replaced by a variant form of H3 (e.g., CENP-A
  in humans)
Yeast Artificial Chromosomes Serve as
Cloning Vectors to Clone Megabase DNA
Fragments
• A yeast artificial chromosome (YAC) consists of TEL
  sequences from yeast, yeast CEN and ARS, plus a selection
  marker and the DNA to be cloned, making up more than
  50 kb in total
• Only 1 daughter cell out of 1,000 to 10,000 fails to
  receive an artificial chromosome
• The successful propagation of YACs and the studies
  presented earlier strongly support the conclusion that
  yeast chromosomes, and probably all eukaryotic
  chromosomes, are linear double-stranded DNA molecules
  containing special regions that ensure replication and
  proper segregation
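The loss rate quoted above can be turned into a rough estimate of how stable a YAC is over many divisions. The Python sketch below assumes that loss is independent at each division and occurs at a constant rate, which is a simplification; the function name and the choice of 20 divisions are illustrative only.

```python
def fraction_retaining_yac(loss_per_division, divisions):
    """Fraction of a cell lineage still carrying the YAC after `divisions`
    divisions, assuming a constant, independent loss rate per division."""
    return (1.0 - loss_per_division) ** divisions


# The two ends of the range quoted above, followed for 20 divisions.
worst_case = fraction_retaining_yac(1 / 1_000, 20)
best_case = fraction_retaining_yac(1 / 10_000, 20)
```

Under these assumptions about 98% of the lineage still carries the YAC after 20 divisions at the higher loss rate, and about 99.8% at the lower one.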
Action of Telomerase to Prevent Shortening
of Chromosomes
• Telomeres of several organisms have been shown to contain
  repetitive oligomers with a high G content in the strand
  whose 3’ end lies at the end of the chromosome. In humans
  and other vertebrates the repeat sequence is TTAGGG
• The total length of the repeats is a few hundred bp in
  protozoans and several thousand bp in vertebrates
• The region is bound by specific proteins that protect the
  ends of the linear chromosomes from exonuclease digestion
• Synthesis of the lagging strand cannot reach completion
  as the leading strand does, which results in shortening
  of the chromosomes. Telomerase extends the G-rich 3’ end
  so that the missing lagging-strand sequence can be filled
  in, thus maintaining the proper length of the chromosome
• Reading list:
– Maintenance of chromosomes by telomeres and telomerase. The
  Nobel Prize in Physiology or Medicine 2009
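The effect of incomplete lagging-strand synthesis, and its compensation by telomerase, can be sketched with simple bookkeeping in Python. All numbers here are hypothetical placeholders rather than measured values: the starting length, the loss per division, and the number of repeats added are chosen only to make the arithmetic easy to follow.

```python
REPEAT = "TTAGGG"  # telomere repeat unit named above


def telomere_length_after(divisions, start_bp, loss_per_division, repeats_added=0):
    """Telomere length in bp after a number of divisions.

    Each division removes `loss_per_division` bp (incomplete lagging-strand
    synthesis) and adds `repeats_added` copies of the repeat (telomerase).
    """
    length = start_bp
    for _ in range(divisions):
        length -= loss_per_division
        length += repeats_added * len(REPEAT)
        length = max(length, 0)  # a telomere cannot have negative length
    return length


without_telomerase = telomere_length_after(50, 10_000, 60)
with_telomerase = telomere_length_after(50, 10_000, 60, repeats_added=10)
```

With these placeholder values the telomere shrinks from 10,000 bp to 7,000 bp in 50 divisions without telomerase, while adding ten repeats (60 bp) per division exactly offsets the loss.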
Assigned Readings [III]:
1. Repeated Sequences in DNA
2. Integration of Cot analysis, DNA cloning, and high-throughput
sequencing facilitates genome characterization and gene
discovery
3. Initial sequencing and analysis of the human genome
4. Finishing the euchromatic sequence of the human genome
5. The functional genomics of noncoding RNA
6. Maintenance of chromosomes by telomeres and telomerase, A
Nobel Lecture
7. DNA methylation and histone modifications: teaming up to
silence genes
8. Histone lysine demethylases: emerging roles in development,
physiology and disease
9. The key to development : interpreting the histone code?
10. Histones