From Genes to Genomes
Jinsong Pang
Tel. 85099367
E-mail: [email protected]
Key laboratory of molecular
biology epigenetics of MOE
Watson and Crick: DNA double helix (1953)
Eukaryotic chromosome
SV40 minichromosome
Nucleosides
•The bases are covalently attached to the 1’ position of a pentose sugar ring (ribose or 2’-deoxyribose) to form a nucleoside
•The linkage is a glycosidic (glycoside, glycosylic) bond
•Nucleosides: adenosine, guanosine, cytidine, thymidine, uridine
•Bases in DNA
- purines: adenine and guanine (attached to the sugar through N9)
- pyrimidines: cytosine and thymine (attached to the sugar through N1)
BASES | NUCLEOSIDES | NUCLEOTIDES
Adenine (A) | Adenosine | Adenosine 5’-triphosphate (ATP)
Adenine (A) | Deoxyadenosine | Deoxyadenosine 5’-triphosphate (dATP)
Guanine (G) | Guanosine | Guanosine 5’-triphosphate (GTP)
Guanine (G) | Deoxyguanosine | Deoxyguanosine 5’-triphosphate (dGTP)
Cytosine (C) | Cytidine | Cytidine 5’-triphosphate (CTP)
Cytosine (C) | Deoxycytidine | Deoxycytidine 5’-triphosphate (dCTP)
Thymine (T) | Thymidine/deoxythymidine | Thymidine/deoxythymidine 5’-triphosphate (dTTP)
Uracil (U) | Uridine | Uridine 5’-triphosphate (UTP)
Gene
Gene: A molecular unit of heredity of a living organism.
The segment of DNA specifying production of a
polypeptide chain; it includes regions preceding and following
the coding region (leader and trailer) as well as intervening
sequences (introns) between individual coding segments
(exons).
A brief history of genetics
A gene is a "particulate factor" that passes unchanged from parent to
progeny (Mendel, 1865).
The central dogma: Crick 1958
A gene codes for an RNA, which may code for protein.
Eukaryotic genes are often interrupted
Exon is any segment of an interrupted gene that is represented in the mature
RNA product.
Intron is a segment of DNA that is transcribed, but removed from within the
transcript by splicing together the sequences (exons) on either side of it.
RNA splicing is the process of excising the sequences in RNA that
correspond to introns, so that the sequences corresponding to exons are
connected into a continuous mRNA.
Structural gene codes for any RNA or protein product other than a regulator.
Transcript is the RNA product produced by copying one strand of DNA. It
may require processing to generate mature RNAs.
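The splicing definitions above can be sketched in a few lines of Python. This is only an illustration: the toy transcript, its exon coordinates, and the function name `splice` are made up for the example (uppercase marks exon sequence, lowercase marks intron sequence), and real splicing depends on splice-site recognition that is not modeled here.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical toy transcript: three exons separated by two introns.
pre_mrna = "AUGGCC" + "guaaguuuag" + "GAUUAC" + "guccccacag" + "UGA"
exons = [(0, 6), (16, 22), (32, 35)]

mature = splice(pre_mrna, exons)
print(mature)  # the exons joined into one continuous mRNA
```

The exon sequences appear in the mature RNA in the same order as in the gene; only the intervening sequences are removed.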
2.5 Organization of interrupted genes may be conserved
Comparison of the cDNA and genomic DNA for mouse β-globin
shows that the gene has two introns that are not present in the
cDNA. The exons can be aligned exactly between cDNA and
gene.
Alternative splicing generates the α
and β variants of troponin T.
Alternative splicing uses the same
pre-mRNA to generate mRNAs
that have different combinations
of exons.
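As a rough sketch of how one pre-mRNA can yield two mRNAs, the snippet below assembles two variants that share their first and last exons but use different middle exons. The exon names and sequences are hypothetical placeholders, not the real troponin T exons.

```python
def build_mrna(exons: dict[str, str], order: list[str]) -> str:
    """Concatenate the named exons in the given order."""
    return "".join(exons[name] for name in order)


# Hypothetical exons available in a single pre-mRNA.
exons = {"E1": "AUGGCU", "E2a": "GAAGAA", "E2b": "CCUCCU", "E3": "UAA"}

alpha = build_mrna(exons, ["E1", "E2a", "E3"])  # variant using exon E2a
beta = build_mrna(exons, ["E1", "E2b", "E3"])   # variant using exon E2b
print(alpha, beta)
```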
Most genes are uninterrupted in yeast, but
most genes are interrupted in flies and
mammals.
Yeast genes are small, but
genes in flies and mammals
have a dispersed distribution
extending to very large sizes.
Exons coding for proteins are usually short. Introns range from very
short to very long.
•Polymorphism (more fully genetic polymorphism): The
simultaneous occurrence in the population of genomes
showing variations at a given position.
•Single nucleotide polymorphism (SNP): The
polymorphism (variation in sequence between individuals)
caused by a change in a single nucleotide.
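A minimal sketch of the SNP definition: given sequences from several individuals that are already aligned to the same length, report the positions where they do not all carry the same nucleotide. The sequences and the function name `snp_positions` are invented for illustration; real SNP calling works on sequencing reads and is considerably more involved.

```python
def snp_positions(seqs: list[str]) -> list[int]:
    """Return 0-based columns where equal-length aligned sequences disagree."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


# Hypothetical aligned sequences from four individuals.
individuals = ["ACGTTGCA", "ACGTTGCA", "ACATTGCA", "ACGTTGTA"]
print(snp_positions(individuals))
```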
An exon surrounded by flanking sequences that is translocated into an
intron may be spliced into the RNA product.
Human chromosomes
The contrast between interphase
chromatin and mitotic chromosomes
Heterochromatin describes regions of the genome that are
permanently in a highly condensed condition, are not transcribed,
and are late-replicating. It may be constitutive or facultative.
Euchromatin comprises all of the genome in the interphase
nucleus except for the heterochromatin.
•The importance of packing of DNA into chromosomes
- A chromosome is a compact form of the DNA that readily fits inside the cell
- Packing protects DNA from damage
- DNA in a chromosome can be transmitted efficiently to both daughter cells during cell division
- The chromosome confers an overall organization on each molecule of DNA, which facilitates gene expression as well as recombination
•Chromosome sequence & diversity
Chromosomes
- Shape: circular or linear
- Number in an organism is characteristic
- Copy: haploid, diploid, polyploid
•Genomes
•3/15/05
Genome
Genome: The complete set of sequences in the genetic material of an organism.
Transcriptome: The complete set of RNAs present in a cell, tissue, or organism.
Proteome: The complete set of proteins that is expressed by the entire genome.
Genome & the complexity of the organism
- Genome size: the length of DNA associated with one haploid complement of chromosomes
- Gene number: the number of genes included in a genome
- Gene density: the average number of genes per Mb of genomic DNA
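Gene density follows directly from the other two quantities. The sketch below applies the definition to approximate, rounded figures chosen for illustration; these numbers are assumptions and do not come from the slides.

```python
def gene_density(gene_number: int, genome_size_mb: float) -> float:
    """Average number of genes per Mb of genomic DNA."""
    return gene_number / genome_size_mb


# Approximate round figures (assumptions): (gene number, genome size in Mb).
examples = {
    "E. coli": (4_400, 4.6),
    "S. cerevisiae": (6_000, 12.0),
    "H. sapiens": (20_000, 3_200.0),
}

for organism, (genes, size_mb) in examples.items():
    print(f"{organism}: {gene_density(genes, size_mb):.1f} genes/Mb")
```

With figures like these, gene density falls by roughly two orders of magnitude from bacteria to humans even though gene number rises only a few-fold.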
•Genome & genes
•Genome: all DNA sequences in a cell
•Genes: a stretch of continuous DNA sequence
encoding a protein or RNA
•C-value is the quantity of DNA in the genome
(per haploid set of chromosomes).
•C-value paradox refers to the lack of a
correlation between genome size and genetic
complexity (e.g. lungfish 139 Gb vs. human 3 Gb)
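To put the two genome sizes quoted above side by side, a short calculation gives the fold difference. The function name `c_value_ratio` is just a label for this example.

```python
def c_value_ratio(larger_gb: float, smaller_gb: float) -> float:
    """Fold difference between two haploid genome sizes given in Gb."""
    return larger_gb / smaller_gb


# Genome sizes as quoted on the slide.
ratio = c_value_ratio(139, 3)
print(f"The lungfish genome is about {ratio:.0f} times the size of the human genome")
```

A difference of this size is not matched by any comparable difference in organismal complexity, which is the point of the paradox.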
Non-coding sequence
•DNA sequence that does not code for protein or RNA, including
1. Introns (unique sequence) in genes
2. DNA consisting of multiple repeats, which can be tandemly repeated sequences
(e.g. satellite DNA) or interspersed repeats (e.g. Alu element) etc.
Repetitive DNA
•Tandem gene clusters:
1. Moderately repetitive DNA consists of a number of types of repeated
sequence.
2. Tandem gene clusters contain genes whose products are required in unusually large
quantities, e.g. there are 10-10000 copies of rDNA encoding the 45S precursor and >100 copies of
histone genes.
•The proportions of different sequence
components vary in different genomes.
•The largest component of the human genome
consists of transposons. Other repetitive sequences
include large duplications and simple repeats.
•Satellite DNA (simple sequence):
- Highly repetitive DNA (>10^6 copies)
- Very short repeat units (2 to 20-30 bp, mini- or microsatellites), in tandem
arrays
- Concentrated near the centromeres; forms a large
part of heterochromatin
- Appears as a separate band in a buoyant density gradient
- No function found, except a possible role in kinetochore
binding
- Minisatellite repeats are the basis of DNA
fingerprinting techniques
•Drosophila satellite DNA repeat (several million copies)
5’ – ATAAACTATAAACTATAAACT – 3’
3’ – TATTTGATATTTGATATTTGA – 5’
•ACAAACT, 1.1 x 10^7 bp, 25% of genome
•ATAAACT, 3.6 x 10^6 bp, 8% of genome
•ACAAATT, 3.6 x 10^6 bp, 8% of genome
•AATATAG, cryptic
•Satellites comprise more than 40% of the genome
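The two strands of the satellite repeat shown above can be checked with a short sketch: the lower strand (written 3’ to 5’) should be the base-by-base complement of the upper strand, and the upper strand should consist of tandem copies of the 7 bp unit ATAAACT. The helper names are made up for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tandem_copies(seq: str, unit: str) -> int:
    """Count back-to-back copies of `unit` starting at the first base of `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    count = 0
    while seq.startswith(unit, count * len(unit)):
        count += 1
    return count


top = "ATAAACTATAAACTATAAACT"            # written 5' to 3'
bottom_3_to_5 = "TATTTGATATTTGATATTTGA"  # written 3' to 5', as on the slide

print(top.translate(COMPLEMENT) == bottom_3_to_5)  # strands are complementary
print(tandem_copies(top, "ATAAACT"))               # copies of the 7 bp unit
```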
•Human mitochondrial DNA has 22 tRNA genes, 2 rRNA genes, and 13 protein-coding
regions. 14 of the 15 protein-coding or rRNA-coding regions are transcribed in the same
direction. 14 of the tRNA genes are expressed in the clockwise direction and 8 are read
counterclockwise.
CpG islands and the promoters of housekeeping genes
•Proteins in chromosome
- Half of the molecular mass of a eukaryotic chromosome is protein
- In eukaryotic cells, a given region of DNA with its associated proteins is called chromatin
- The majority of the associated proteins are small, basic proteins called histones
- Other proteins associated with the chromosome are referred to as non-histone proteins, including numerous DNA-binding proteins that regulate the transcription, replication, repair and recombination of DNA
- Nucleosomes: regular association of DNA with histones to form a structure effectively compacting DNA
Centromeres, origins of replication, and telomeres are
required for eukaryotic chromosome maintenance
Eukaryotic chromosome duplication & segregation occur
in separate phases of the cell cycle
Cell cycle: a single round of cell division
Mitotic cell division: the chromosome number is maintained
during cell division
•Centromeres
- Required for the correct segregation of the chromosomes after replication
- Direct the formation of the kinetochore (an elaborate protein complex) essential for chromosome segregation
- One chromosome, one centromere
- The size varies (200 bp to >40 kb)
- Composed largely of repetitive DNA sequences
•The centromere
1. The region where two chromatids are
joined
2. The sites of attachment to the mitotic
spindle via kinetochore
3. Centromere DNA:
- Yeast: AT-rich (88 bp)
- Mammalian cells: much longer, flanked by satellite DNA
[Figure: mitotic chromosome with centromere and mitotic spindle; yeast centromere]
•The Telomere
1. Specialized DNA sequences which form the ends of the
linear DNA of the eukaryotic chromosome
2. Contains up to hundreds of copies of a short repeated
sequence (5’-TTAGGG-3’ in human)
3. Synthesized by the enzyme telomerase (a ribonucleoprotein)
independent of normal DNA replication.
4. The telomeric DNA forms a special secondary structure to
protect the chromosomal ends from degradation
•Telomere & Telomerase
•Repeat sequence: Tetrahymena- TTGGGG; human- TTAGGG
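As a small illustration of the repeat structure, the sketch below counts how many uninterrupted copies of the telomeric unit sit at the 3’ end of a sequence. The example sequence is invented, and the function name `terminal_repeats` is hypothetical; real telomeres are far longer and end in a single-stranded overhang that is not modeled here.

```python
import re


def terminal_repeats(seq: str, unit: str = "TTAGGG") -> int:
    """Number of uninterrupted copies of `unit` at the 3' end of `seq`."""
    match = re.search(f"(?:{re.escape(unit)})+$", seq)
    return len(match.group(0)) // len(unit) if match else 0


# Hypothetical chromosome end: unique sequence followed by four human repeats.
chromosome_end = "ACGTACGT" + "TTAGGG" * 4
print(terminal_repeats(chromosome_end))                 # human unit
print(terminal_repeats("ACGT" + "TTGGGG" * 3, "TTGGGG"))  # Tetrahymena unit
```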
•Telomere: structure
•A loop structure forms at the end of
chromosomal DNA
The eukaryotic mitotic cell cycle
Nucleosomes are the building blocks of chromosomes
- The nucleosome is composed of a core of eight histone proteins and the DNA (core DNA, 147 bp) wrapped around them. The DNA between each nucleosome is called linker DNA. Each eukaryote has a characteristic average linker DNA length (20-60 bp)
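The figures above give a quick estimate of how many nucleosomes a genome can hold: each nucleosome accounts for 147 bp of core DNA plus one linker. The sketch assumes, for simplicity, that all of the DNA is packaged into evenly spaced nucleosomes; the 3 Gb genome size is the round human figure, and the function name is invented for this example.

```python
CORE_DNA_BP = 147  # core DNA wrapped around the histone octamer


def nucleosome_count(genome_bp: int, linker_bp: int) -> int:
    """Whole nucleosomes that fit if every repeat is core DNA plus one linker."""
    return genome_bp // (CORE_DNA_BP + linker_bp)


human_haploid_bp = 3_000_000_000  # about 3 Gb, a round figure

for linker in (20, 60):
    n = nucleosome_count(human_haploid_bp, linker)
    print(f"linker {linker} bp: about {n / 1e6:.1f} million nucleosomes")
```

Across the 20-60 bp linker range this comes to roughly 14 to 18 million nucleosomes per haploid human genome.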
The path of nucleosomes in the chromatin fiber
[Figure labels: histone octamer, DNA, histone H1]
•Nucleosome core: 146 bp, 1.8 superhelical turns
•Chromatosome: 166 bp, 2 superhelical turns
The 10 nm fiber in partially unwound
state can be seen to consist of a string of
nucleosomes. Photograph kindly
provided by Barbara Hamkalo.
The structure of nucleosome
Histones are small, positively charged (basic) proteins
DNA packaged into nucleosome
Six-fold DNA compaction

The five abundant histones are H1 (linker histone, 20 kDa) and H2A,
H2B, H3 and H4 (core histones, 11-15 kDa).

The core histones share a common structural fold, called
histone-fold domain

The core histones each have an N-terminal “tail”, the sites of
extensive modifications
Many DNA sequence-independent
contacts (?) mediate interaction
between the core histones and DNA
The histone N-terminal tails stabilize
DNA wrapping around the octamer
The histone tails emerge from the core of the nucleosome at specific
positions, serving as the grooves of a screw to direct the DNA wrapping
around the histone core in a left-handed manner.
Histone H1 binds to the linker DNA
between nucleosomes, inducing tighter
DNA wrapping around the nucleosome
The role of H1
1. Stabilizes the point at which DNA enters and
leaves the nucleosome core.
2. C- tail of H1: stabilizes the DNA between the
nucleosome cores.
• 23 kDa, located outside of nucleosome core,
binds to DNA more loosely
• Less conserved in its sequence
•The interaction of DNA with the
histone octamer is dynamic
There are factors acting on the
nucleosome to increase or decrease the
dynamic nature
The dynamic nature of DNA binding to
the histone core is important for access to
DNA by other proteins essential for genome
expression, etc.
Histone variants and modification
•The major mechanisms for the condensing and decondensing of chromatin operate directly through the histone proteins, which carry out the packaging.
•Short-term changes in chromosome packing are modulated by chemical modification of histone proteins.
•Actively transcribed chromatin: via acetylation of lysine residues in the N-terminal regions of the core histones.
•Condensation of chromosomes at mitosis: by the phosphorylation of histone H1.
•Longer-term differences in chromatin condensation: associated with changes due to stages in development and different tissue types.
•Utilization of alternative histone variants, e.g. H5 replacing H1 in some very inactive chromatin.
Modification of the histone N-terminal tails alters the
function of chromatin
Interphase chromosomes: chromatin
Heterochromatin
1. Highly condensed
2. Transcriptionally inactive
3. Can be the repeated satellite DNA close to the centromeres,
and sometimes a whole chromosome (e.g. one X
chromosome in mammals)
Euchromatin: chromatin other than heterochromatin.
1. More diffuse and not visible
2. The region where transcription takes place
3. Not homogeneous: only a portion (~10%) of euchromatin is
transcriptionally active, where the 30 nm fiber has been
dissociated to a “beads on a string” structure; parts of
these regions may be depleted of nucleosomes.
DNase I hypersensitivity
•Euchromatin
•CpG methylation: CpG islands
- Methylation of C-5 in the cytosine base of 5’-CG-3’
- Occurs in mammalian cells
- Signals the appropriate level of chromosomal packing at the sites of expressed genes
- CpG methylation is associated with transcriptionally inactive regions of chromatin
- Islands of unmethylated CpG are coincident with regions of DNase I hypersensitivity
- “Islands” surround the promoters of housekeeping genes
- Responsible for epigenetic effects and may also be linked to RNA silencing
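CpG islands are often located computationally by looking for stretches that are GC-rich and contain more CpG dinucleotides than expected from their base composition. The sketch below uses one commonly cited heuristic (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6); these thresholds are an assumption brought in for illustration and are not taken from the slides, and the function names are invented.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply the assumed length, GC-content and CpG-ratio thresholds."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_oe


print(cpg_stats("CGCGCGCG"))
print(looks_like_cpg_island("CG" * 100), looks_like_cpg_island("AT" * 100))
```

This test looks only at sequence composition; whether a given CpG is actually methylated has to be measured experimentally.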
Brief summary
1. Prokaryotic chromosome: closed-circular DNA, domains/loops,
negatively supercoiled, HU & H-NS
2. Eukaryotic chromatin: Histones (octamer: H2A, H2B, H3,
H4)+146bp DNA > Nucleosomes + H1 > chromatosome + Linker
DNA > beads on string > 30nm fiber > fiber loop + nuclear
matrix > highly ordered chromatin > > > chromosome
3. Eukaryotic chromosome structure: centromere, kinetochore,
telomere, hetero- or euchromatin, CpG island and methylation
4. Genome complexity: noncoding DNA, unique sequence,
repetitive DNA, satellite DNA