Genome Organization
Genome

Complete set of instructions for making an organism
• master blueprints for all enzymes, cellular structures &
activities




An organism's complete set of DNA
The total genetic information carried by a single set of
chromosomes in a haploid nucleus
Located in every nucleus of trillions of cells
Consists of tightly coiled threads of DNA organized into
chromosomes
Typical viral genome
DNA
or
RNA
4-200 genes
Viral genomes
Viral genomes: ssRNA, dsRNA, ssDNA, dsDNA, linear or circular
Viruses with RNA genomes:
• Almost all plant viruses and some bacterial and animal viruses
• Genomes are rather small (a few thousand nucleotides)
Viruses with DNA genomes (e.g. lambda = 48,502 bp):
• Often a circular genome.
Replicative form of viral genomes
• all ssRNA viruses produce dsRNA molecules
• many linear DNA molecules become circular
Molecular weight and contour length:
• Duplex length per base pair = 3.4 Å
• Molecular weight per base pair ≈ 660 Da
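The two conversion factors above can be applied to any double-stranded genome. The following is a minimal sketch, using the lambda genome size quoted on this slide (48,502 bp); the function and constant names are illustrative only.

```python
# Physical size of a double-stranded DNA genome from its length in base pairs.
# Conversion factors are the ones quoted on the slide.
ANGSTROM_PER_BP = 3.4    # duplex length per base pair
DALTON_PER_BP = 660.0    # approximate molecular weight per base pair


def contour_length_um(n_bp):
    """Contour length in micrometres (1 micrometre = 10,000 angstrom)."""
    return n_bp * ANGSTROM_PER_BP / 1e4


def molecular_weight_da(n_bp):
    """Approximate molecular weight in daltons."""
    return n_bp * DALTON_PER_BP


lambda_bp = 48_502  # phage lambda genome size from the slide
print(f"contour length: {contour_length_um(lambda_bp):.1f} um")
print(f"molecular weight: {molecular_weight_da(lambda_bp):.2e} Da")
```

Under these factors, lambda DNA comes out at roughly 16.5 micrometres and about 3.2 x 10^7 Da.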
Procaryotic genomes





Generally 1 circular chromosome (dsDNA)
Usually without introns
Relatively high gene density (~2500 genes per
mm of E. coli DNA)
Contour length of E.coli genome: 1.7 mm
Often indigenous plasmids are present
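As a rough check of the gene density quoted above, the sketch below divides the number of protein-coding genes given on the E. coli slide (4288) by the contour length given here (1.7 mm). The function name is illustrative.

```python
# Gene density = number of genes / contour length of the chromosome.
def genes_per_mm(n_genes, contour_mm):
    return n_genes / contour_mm


E_COLI_GENES = 4288        # protein-coding genes (E. coli slide)
E_COLI_CONTOUR_MM = 1.7    # contour length of the E. coli genome (this slide)

density = genes_per_mm(E_COLI_GENES, E_COLI_CONTOUR_MM)
print(f"{density:.0f} genes per mm")
```

The result is close to the ~2500 genes per mm stated on the slide.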
Typical Procaryotic genome
one circular doublestranded DNA chromosome
500-12,000 genes
often
plasmid(s)
Bacterial genomes: E. coli



4288 protein coding genes:
• Average ORF 317 amino acids
• Very compact: average distance
between genes 118 bp
Numerous paralogous gene families:
38–45% of genes have arisen through
duplication
Homologues:
• H. influenzae (1130 of 1703)
• Synechocystis (675 of 3168)
• M. jannaschii (231 of 1738)
• S. cerevisiae (254 of 5885)
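The homologue counts above are easier to compare as percentages. This sketch assumes that "X of Y" means X genes with an E. coli homologue out of Y genes in the other organism; that reading is an assumption, not something the slide states.

```python
# (genes with an E. coli homologue, total genes in that genome), from the slide.
HOMOLOGUE_COUNTS = {
    "H. influenzae": (1130, 1703),
    "Synechocystis": (675, 3168),
    "M. jannaschii": (231, 1738),
    "S. cerevisiae": (254, 5885),
}


def homologue_percent(matched, total):
    """Share of a genome's genes that have a homologue, in percent."""
    return 100.0 * matched / total


for species, (matched, total) in HOMOLOGUE_COUNTS.items():
    print(f"{species}: {homologue_percent(matched, total):.1f}%")
```

The share drops steadily from the closely related bacterium to the eukaryote.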
Easy problem
Bacterial Gene-finding





Dense Genomes
Short intergenic regions
Uninterrupted ORFs
Conserved signals
Abundant comparative information
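Because bacterial ORFs are uninterrupted, a first-pass gene finder can simply scan all six reading frames for a start codon followed by an in-frame stop codon. The sketch below is a deliberately simplified illustration: it considers only ATG starts, ignores the conserved signals and comparative information that real gene finders use, and all names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """List (strand, start, end, orf) for every ATG...stop ORF.

    start and end are 0-based, end-exclusive, and relative to the strand
    that was scanned. min_codons counts the stop codon as well.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon in this frame
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))
```

In a genome with introns this approach breaks down, which is one reason gene finding in bacteria is the comparatively easy case.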
Plasmids
Extra chromosomal circular DNAs










[Figure labels: β-lactamase gene, ori, foreign gene]
Found in bacteria, yeast and other fungi
Size varies from ~3,000 bp to 100,000 bp.
Replicate autonomously (origin of replication)
May contain resistance genes
May be transferred from one bacterium to another
May be transferred across kingdoms
Multicopy plasmids (up to ~400 plasmids per cell)
Low-copy plasmids (1–2 copies per cell)
Plasmids may be incompatible with each other
Are used as vectors that could carry a foreign gene of interest (e.g.
insulin)
Agrobacterium tumefaciens

Characteristics
• Plant parasite that causes Crown Gall Disease
• Carries a large (~250 kbp) plasmid called the tumor-inducing (Ti) plasmid

A portion of the Ti plasmid is transferred from bacterial
cells to plant cells → T-DNA (transferred DNA)
Agrobacterium tumefaciens
T-DNA integrates stably into plant genome
Single-stranded T-DNA fragment is converted to
a dsDNA fragment by the plant cell
Then integrated into plant genome
2 x 23 bp direct repeats play an important role in the
excision and integration process

Agrobacterium tumefaciens



Tumor formation = hyperplasia
Hormone imbalance
Caused by A. tumefaciens
• Lives in intercellular spaces of the plant
• Plasmid contains genes responsible for the disease
→ Part of plasmid is inserted into plant DNA
→ Wound = entry point → 10-14 days later, tumor
forms
Agrobacterium tumefaciens

What is naturally encoded in T-DNA?
• Enzymes for auxin and cytokinin synthesis


Causing hormone imbalance → tumor formation/undifferentiated
callus
Mutants in enzymes have been characterized
• Opine synthesis genes (e.g. octopine or nopaline)


Carbon and nitrogen source for A. tumefaciens growth
Insertion genes
• Virulence (vir) genes
• Allow excision and integration into plant genome
Ti plasmid of A. tumefaciens
1. Auxin, cytokinin,
opine synthetic genes
transferred to plant
2. Plant makes all 3
compounds
3. Auxins and cytokinins
cause gall formation
4. Opines provide unique
carbon/nitrogen
source only A.
tumefaciens can use!
Eucaryotic genomes
Located on several chromosomes
Relatively low gene density (50 genes per mm of
DNA in humans)
Contour length of DNA
Carry organellar genome as well
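The human gene density quoted above can be reproduced from figures given elsewhere in this transcript. The sketch below assumes a nuclear genome of 2800 million bp (genome size table), 50,000 genes (Human Genomes slide) and 3.4 Å per base pair (viral genomes slide); these inputs are the deck's estimates, not current reference values.

```python
ANGSTROM_PER_BP = 3.4     # duplex length per base pair
MM_PER_ANGSTROM = 1e-7    # 1 angstrom = 1e-10 m = 1e-7 mm


def contour_length_mm(n_bp):
    """Contour length of a DNA duplex in millimetres."""
    return n_bp * ANGSTROM_PER_BP * MM_PER_ANGSTROM


human_bp = 2_800_000_000   # nuclear genome size used in this deck
human_genes = 50_000       # gene count used in this deck

length_mm = contour_length_mm(human_bp)
density = human_genes / length_mm
print(f"contour length: {length_mm:.0f} mm")
print(f"gene density: {density:.0f} genes per mm")
```

With these inputs the haploid genome is a little under one metre long, and the density lands near the 50 genes per mm stated on the slide.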
Typical eukaryotic genome
4-224, linear chromosomes
5,000 - 125,000 genes
Fungal genomes: S. cerevisiae




First completely sequenced
eukaryote genome
Very compact genome:
• Short intergenic regions
• Scarcity of introns
• Lack of repetitive sequences
Strong evidence of duplication:
• Chromosome segments
• Single genes
Redundancy: non-essential genes
provide selective advantage
Human Genomes
Human
50,000 genes X 2 kbp=100 Mbp
Introns=300 Mbp?
Regulatory regions=300 Mbp?
• Only 5-10% of the human genome codes for genes
- function of other DNA (mostly repetitive sequences) unknown,
but it might serve structural or regulatory roles
Plant genomes








A plant cell contains three genomes (nuclear, plastid and mitochondrial)
The size of genomes is given in base pairs (bp)
The size of genomes is species dependent
The difference in genome size is mainly due to differing numbers of
repeated sequences of various sizes arranged one after another
The genes for ribosomal RNAs occur as repetitive sequences and, together
with the genes for some transfer RNAs, are present in several thousand
copies
Structural genes are present in only a few copies, sometimes
just a single copy. Structural genes encoding structurally
and functionally related proteins often form a gene family
Genetic information is divided among the chromosomes
The DNA in the genome is replicated during interphase, before
mitosis
Size of the genome in plants and in humans (millions of base pairs)

Genome          Arabidopsis thaliana   Zea mays   Vicia faba   Human
Nucleus         70                     3900       14500        2800
Plastid         0.156                  0.136      0.120        -
Mitochondrion   0.370                  0.570      0.290        0.017
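To see how small the organellar genomes are next to the nuclear genome, the table values can be expressed as a percentage of nuclear genome size. This sketch assumes the flattened table reads species by species in the order listed (Arabidopsis, Zea mays, Vicia faba, human); the dictionary and function names are illustrative.

```python
# Genome sizes in millions of base pairs, as read from the table.
GENOME_SIZES_MBP = {
    "Arabidopsis thaliana": {"nucleus": 70, "plastid": 0.156, "mitochondrion": 0.370},
    "Zea mays": {"nucleus": 3900, "plastid": 0.136, "mitochondrion": 0.570},
    "Vicia faba": {"nucleus": 14500, "plastid": 0.120, "mitochondrion": 0.290},
    "Human": {"nucleus": 2800, "mitochondrion": 0.017},  # no plastid
}


def percent_of_nuclear(species, compartment):
    """Organellar genome size as a percentage of the nuclear genome size."""
    sizes = GENOME_SIZES_MBP[species]
    return 100.0 * sizes[compartment] / sizes["nucleus"]


for species, sizes in GENOME_SIZES_MBP.items():
    for compartment in ("plastid", "mitochondrion"):
        if compartment in sizes:
            pct = percent_of_nuclear(species, compartment)
            print(f"{species} {compartment}: {pct:.4f}% of nuclear genome")
```

The plastid genome size is nearly constant across the three plants, so its share of the total is driven almost entirely by the size of the nuclear genome.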
Plant genomes: Arabidopsis thaliana









A dicotyledonous plant
A weed growing at the roadside of
central Europe
It has only 2 x 5 chromosomes
Its genome is just 70 Mbp
It has a life cycle of only 6 weeks
A model plant for the investigation of
plant function
Contains 25,498 structural genes from
11,000 families
The structural genes are present in only
a few copies, sometimes just one copy
Structural genes encoding for structurally
and functionally related proteins often
form a gene family
Plant genomes: Arabidopsis thaliana



Cross-phylum matches:
• Vertebrates 12%
• Bacteria / Archaea 10%
• Fungi 8%
60% have no match in non-plant
databases
Evolution involved whole genome
duplication followed by
subsequent gene loss and
extensive local gene duplications
Global Increase in Genome Size

Polyploidization (whole genome duplication):
Allopolyploidy: combination of genetically distinct
chromosome sets. (Wheat…)
Autopolyploidy: multiplication of one basic set of
chromosomes. (Goldfish, rose…)

Regional duplication
Repetitive Structure of
Eukaryotic Genome

Eukaryotic genomes contain various degrees of
repetitive structure: satellites, micro/minisatellites, retrotransposons, retrovirus, etc.
Repetitive sequence size correlates with
genome size:
[Figure: heterochromatin content and genome size (both ×10^9 bp) for
Gorilla gorilla, Symphalangus syndactylus, Pan troglodytes, Homo sapiens and
Hylobates muelleri]
Mechanisms for Regional
Increase in Genome Size




Duplicative transposition
Unequal crossing-over
Replication slippage
Gene amplification (rolling circle replication)
Gene Duplication

duplication of a part of the gene:
domain/internal sequence duplication → enhanced function,
novel function by new combination

duplication of a complete gene (gene family)
invariant duplication: dose repetitions,
variant duplication: new functions.

duplication of a cluster of genes
Internal Gene Duplication
[Figure: origin of the antifreeze glycoprotein gene from an ancestral trypsinogen gene]
1. Ancestral trypsinogen gene (5’, exons 1-6, 3’)
2. Deletion, leaving exons 1 and 6’ and a segment coding for Thr Ala Ala Gly
3. 4-fold duplication + addition of spacer sequence
4. Internal duplications + addition of intron sequence
5. Antifreeze glycoprotein gene (segments 1-41, spacer: Gly)
Complete Gene Duplication

Invariant duplication:
RNA specifying genes: Number of tRNA and rRNA correlates with
genome size.

Variant duplication:
[Figure: X-linked and autosomal color-vision genes. Human female: trichromatic;
human male: trichromatic, or dichromatic if color blind; New World monkey
female: dichromatic or trichromatic; New World monkey male: dichromatic]
Gene Loss


Duplicated genes → unprocessed pseudogenes.
Single-copy genes devoid of selection pressure
→ unitary pseudogenes.
Loss of L-gulono-γ-lactone oxidase in humans, guinea pigs, etc.
compared with other vertebrates: the enzyme catalyzes the terminal
step in the synthesis of L-ascorbic acid (vitamin C).
Genome organization
Protein Coding Gene
• A segment of DNA which encodes protein synthesis
• DNA sequence encoding protein
Gene classification
[Figure: a simplified chromosome consists of genes and intergenic regions]
• Coding genes → messenger RNA → proteins (structural proteins, enzymes)
• Non-coding genes → structural RNA (transfer RNA, ribosomal RNA, other RNA)
Coding region
Nucleotides (open reading frame) encoding the
amino acid sequence of a protein
The molecular definition of gene includes more
than just the coding region
Noncoding regions

Regulatory regions
• RNA polymerase binding site
• Transcription factor binding sites


Introns
Polyadenylation [poly(A)] sites
Eukaryotic genes



Most have introns
Produce monocistronic mRNA: only one
encoded protein
Large
Appearance of genomes



One to many
chromosomes
Repeat sequences
common in some genomes,
e.g. 35% of the human genome consists of
transposable elements (10% Alu, 14.6% LINE1
sequences)
Gene structure varies –
no. and length of introns
What does 50 kb of sequence look
like?
repeat
Pseudogene
Intron-exon components of a gene
Human – very few genes - repeats
Yeast – many genes (~25) – few
repeats
Maize – mostly repeats
What do the genes encode?
Genes for basic cellular functions such as
translation, transcription, replication and repair
share similarity among all organisms
Gene families expand to meet biological needs.
Basic functions, plus:
• Microbes – highly specialized
• Yeast – simplest eukaryote
• Fly – complex development
• Worm – programmed development
• Arabidopsis – plant life cycle
Repetitive DNA

Moderately repeated DNA
• Tandemly repeated rRNA, tRNA and histone genes (gene
products needed in high amounts)
• Large duplicated gene families
• Mobile DNA

Simple-sequence DNA
• Tandemly repeated short sequences
• Found in centromeres and telomeres (and others)
• Used in DNA fingerprinting to identify individuals
Types of DNA repeats
Tandem repeats (e.g. satellite DNA)
5’-CATGTGCTGAAGGCTATGTGCTGCGACG- 3’
3’-GTACACGACTTCCGATACACGACGCTGC- 5’
Inverted repeats (e.g. in transposons)
5’-CATGTGCTGAAGGCTCAGCACATCGACG- 3’
3’-GTACACGACTTCCGAGTCGTGTAGCTGC- 5’
• Form stem-loop structures
Palindromes = adjacent inverted repeats
(e.g. restriction sites)
• Form hairpin structures
Loop
Stem
Hairpin
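The repeat types above can be checked directly on the example sequences from this slide. The sketch below is illustrative: the helper names are hypothetical, and GAATTC (the EcoRI recognition site) is added as an example of a restriction-site palindrome that does not appear on the slide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """A DNA palindrome reads the same 5'->3' on both strands."""
    return seq == reverse_complement(seq)


# Top strands of the two example duplexes from the slide.
repeat_same_orientation = "CATGTGCTGAAGGCTATGTGCTGCGACG"
inverted_repeat = "CATGTGCTGAAGGCTCAGCACATCGACG"
unit = "ATGTGCTG"  # the repeated unit in both examples

# Same-orientation repeat: the unit occurs twice on the same strand.
print(repeat_same_orientation.count(unit))

# Inverted repeat: the unit is followed downstream by its reverse complement,
# so the two arms can base-pair with each other to form a stem-loop.
print(inverted_repeat.find(unit), inverted_repeat.find(reverse_complement(unit)))

# Palindrome: the inverted repeats are adjacent, with no spacer between them.
print(is_palindrome("GAATTC"))
```

The spacer between the two arms of the inverted repeat becomes the loop; with no spacer the structure is a hairpin.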
Repetitive sequences
[Figure: satellite DNA separated from chromosomal DNA on a caesium chloride
density gradient]
Repeats in the mouse genome

Type                    No. of repeats   Size             Percent of genome
Highly repetitive       > 1 million      < 10 bp          10 %
Moderately repetitive   > 1000           ~150 - ~300 bp   20 %
Mobile DNA


Move within genomes
Most of moderately repeated DNA sequences
found throughout higher eukaryotic genomes
• L1 LINE is ~5% of human DNA (~50,000 copies)
• Alu is ~5% of human DNA (>500,000 copies)

Some encode enzymes that catalyze
movement
Transposition


Movement of mobile DNA
Involves copying of mobile DNA element and
insertion into new site in genome
Why?


Molecular parasite: “selfish DNA”
Probably have significant effect on evolution by
facilitating gene duplication, which provides
the fuel for evolution, and exon shuffling
RNA or DNA intermediate


Transposon
moves using DNA
intermediate
Retrotransposon
moves using RNA
intermediate
Types of mobile DNA elements
LTR (long terminal repeat)


Flank viral retrotransposons and retroviruses
Contain regulatory sequences: transcription start site and
poly(A) site
LINES and SINES

Non-viral retro-transposons
• RNA intermediate
• Lack LTR

LINES (long interspersed elements)
• ~6000 to 7000 base pairs
• L1 LINE (~5% of human DNA)
• Encode enzymes that catalyze movement

SINES (short interspersed elements)
• ~300 base pairs
• Alu (~5% of human DNA)
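The genome fractions quoted for these elements can be sanity-checked from copy number and element length. The sketch below assumes the copy numbers given on the Mobile DNA slide (>500,000 Alu, ~50,000 L1), the element lengths given here, and a 2800 million bp genome from the genome size table. It treats every copy as full length, so the L1 figure is an upper bound under that assumption.

```python
def genome_percent(copies, element_bp, genome_bp):
    """Percent of the genome occupied by `copies` full-length elements."""
    return 100.0 * copies * element_bp / genome_bp


HUMAN_GENOME_BP = 2_800_000_000  # nuclear genome size used in this deck

alu_pct = genome_percent(500_000, 300, HUMAN_GENOME_BP)
l1_pct_upper = genome_percent(50_000, 6_500, HUMAN_GENOME_BP)  # midpoint of 6000-7000 bp

print(f"Alu: about {alu_pct:.1f}% of the genome")
print(f"L1: at most about {l1_pct_upper:.1f}% if every copy were full length")
```

The Alu estimate agrees with the ~5% on the slide. The L1 estimate overshoots it, which is consistent with many L1 copies in the genome being truncated rather than full length.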
Mitochondrial genome (mtDNA)








Number of mitochondria in plant cells can be between 50 and 2000
One mitochondrion contains 1–100 genomes (multiple
identical circular chromosomes; one large and
several smaller)
Size ~15 kb in animals
Size ~200 kb to 2,500 kb in plants
mtDNA is replicated before or during mitosis
Transcription of plant mtDNA can yield an mRNA that does not
contain the correct information for the protein to be
synthesized; RNA editing in plant mitochondria corrects
the transcript
Over 95% of mitochondrial proteins are encoded in the
nuclear genome.
Often A+T rich genomes
Chloroplast genome (ctDNA)






Multiple circular molecules, similar to procaryotic
cyanobacteria, although much smaller (0.001-0.1% of the size
of nuclear genomes)
Cells contain many copies of plastids and each plastid contains
many genome copies
Size ranges from 120 kb to 160 kb
Plastid genome has changed very little during evolution.
Though two plants are very distantly related, their genomes
are rather similar in gene composition and arrangement
Some of plastid genomes contain introns
Many chloroplast proteins are encoded in the nucleus (separate
signal sequence)
The family of plastids
Buchanan et al. Fig. 1.44
Endosymbiosis



Well accepted that chloroplasts and mitochondria
were once free living bacteria
Their metabolism is bacterial (e.g. photosynthesis)
Retain some DNA (circular chromosome)
• Protein synthesis sensitive to chloramphenicol
• Cytosolic protein synthesis sensitive to cycloheximide

Most genes transferred from symbiont to nucleus
• Requires protein targeting
DNA for chloroplast proteins can be in
the nucleus or chloroplast genome
Buchanan et al. Fig. 4.4
Import of proteins into chloroplasts
Buchanan et al. Fig. 4.6
Biochemistry inside plastids


Photosynthesis – reduction of C, N, and S
Amino acids, essential amino acid synthesis
restricted to plastids
• Phenylpropanoid amino acids and secondary compounds
start in the plastids (shikimic acid pathway)
• Site of action of several herbicides, including glyphosate
• Branched-chain amino acids
• Sulfur amino acids

Fatty acids – all fatty acids in plants made in plastids
“Cellular” Genomes
[Figure: where genomes are found]
Viruses: viral genome (inside the capsid)
Procaryotes: bacterial chromosome, plasmids
Eucaryotes: chromosomes (nuclear genome, in the nucleus), mitochondrial
genome, chloroplast genome
Genome: all of an organism’s genes plus intergenic DNA
Intergenic DNA = DNA between genes
Methods of regulation

Gene expression
• Normally slow relative to metabolic control that will
be discussed most of the time in this course
• Allows metabolism to be changed in response to
environmental factors
• Transcriptional control most common

Sometimes variation in transcription rate not reflected in
enzyme amount
• Translational control also found

No change in mRNA levels but changes in protein amounts
Gene structure relevant to metabolic
regulation
Promoters
Exploring metabolism by genetic
methods

Antisense – what happens when the amount of an
enzyme is reduced
• not clear how antisense works

Knockouts
• Often more clear-cut since all of the enzyme is gone
• Use of T-DNA insertion lines (e.g. SALK lines)

Overexpression
• Use an unregulated version of the protein or express from a
strong promoter
• Sometimes leads to cosuppression

RNA interference
• 21- to 26-mers seem very effective in regulating translation