Microbial Genomics
Topics
• Describe the new area of genomics
• Outline the rapid progress in genomic sequencing
• Describe the analysis of sequences - bioinformatics
• Show the use of genomics in the study of microbes
• Use the sequence of a human pathogen, Escherichia coli O157:H7, to illustrate the above points
Ref: Perna et al. (2001) Nature 409:529 (USA)
• Relevant to next lectures.
Dr M. D-S, 2007
Microbial genome sequences
GenBank (NCBI), Bethesda, Maryland, USA
Completed microbial genomes:
2007: 481
2006: 319
2003: 112
Sizes range from 0.58 Mb to over 9 Mb
GenBank - main genomic database
There is some duplication...
Genomics
- the study of entire genomes of organisms
• assumes the entire sequence of at least one representative example has been determined
• includes study of all the genes, gene products and non-coding regions
• includes study of genome organisation and evolution
The explosion of ‘-ome’ and ‘-omics’ words
• Functional genomics
• Proteome
• Transcriptome
• Metabolome, Glycome, Lipidome
e.g. a recent journal article with the title: “Functional genomics by integrated analysis of metabolome and transcriptome of Arabidopsis”
Genomics
What can microbial genomics tell us?
• Full gene complement of the cell
• Complete description of cell metabolism
• How genomes are structured
• Virulence genes
• Potential drug targets
• Gene flow between cells (evolution)
Genome Sequencing: Two methods
1. Sanger di-deoxy sequencing (using fluorescently labelled ddNTPs) on cloned DNA templates.
2. Pyrosequencing method on 454 machine using uncloned DNA templates
Genome Sequencing: Two methods
1. Sanger di-deoxy sequencing (using fluorescently labelled ddNTPs) on cloned DNA templates. ‘Shotgun’ strategy.
• Dye-terminator chemistry, ABI sequencing apparatus, commercial software for handling sequence data
Genomic sequencing methods
Shear chromosomal DNA (chDNA) & isolate fragments of about 2 kb
Clone thousands of fragments into a plasmid vector (library). Prepare DNA for sequencing
Dideoxy chain termination
http://www.plattsburgh.edu/acadvp/artsci/biology/bio401/DNASeq.html
Sequence: methods section
Applied Biosystems Inc (ABI) latest sequencing machine, PE 3700
Capillary electrophoresis
96 capillaries at a time
Robotically loaded and run (24 hr)
How many bp can it do in a day?
- each run is 2 hr, get 600-1000 nt per capillary, 96 capillaries/run
Sequence: methods section
Applied Biosystems Inc (ABI) latest sequencing machine, PE 3700
How many bp can it do in a day?
- each run is 2 hr, about 800 bp per lane, 96 lanes
= (24/2) × 800 × 96 = 921,600
Or about 1 Mb/machine/day
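The throughput estimate can be checked with a few lines of Python. This is only a sketch: the function name `bases_per_day` and its parameters are made up for illustration, and it assumes back-to-back runs with no loading time and no failed capillaries.

```python
def bases_per_day(run_hours, bases_per_lane, lanes, hours_per_day=24):
    """Rough daily output of one capillary sequencer (idealised, no downtime)."""
    runs_per_day = hours_per_day // run_hours  # whole runs that fit in a day
    return runs_per_day * bases_per_lane * lanes


# Figures from the slide: 2 hr runs, about 800 bp per lane, 96 lanes
daily = bases_per_day(run_hours=2, bases_per_lane=800, lanes=96)
print(f"{daily:,} bases per machine per day")  # 921,600, i.e. roughly 1 Mb
```

Changing `bases_per_lane` to 600 or 1000 gives the range implied by the 600-1000 nt read lengths quoted for this machine.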
Sequence data
Laser scanning of the 96 capillary tubes identifies the colour and positions of the closely spaced bands of ssDNA.
[Figure labels: top of capillary tubes; - and + ends; example read: TAATCATGGTC....]
Shotgun sequencing: how much do you need to do?
~ 1 Mb/machine/day
Want both strands, good sequence for both; random coverage means you will need 6-8x the genome size in sequence data
Speed makes it efficient?
Counter argument is the difficulty in linking up reads, particularly when genomes have long repeat sequences.
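To get a feel for the 6-8x figure, here is a rough planning calculation. It is a sketch under stated assumptions: reads land at random positions (a Poisson model, so the chance that a base is never covered is about e^(-coverage)), every read is 800 bp, and one machine delivers 921,600 bases per day. The function `shotgun_plan` is hypothetical; the genome size is the one listed in this lecture for E. coli O157:H7 (USA).

```python
import math


def shotgun_plan(genome_bp, coverage, read_bp=800, machine_bp_per_day=921_600):
    """Estimate reads, machine-days and unsequenced fraction for a shotgun project."""
    total_bp = genome_bp * coverage
    reads = math.ceil(total_bp / read_bp)
    machine_days = total_bp / machine_bp_per_day
    # With randomly placed reads, the chance that a given base is never
    # covered is about e^(-coverage) (Poisson assumption).
    unsequenced_fraction = math.exp(-coverage)
    return reads, machine_days, unsequenced_fraction


genome_bp = 5_528_970  # E. coli O157:H7 (USA) genome size
for coverage in (6, 8):
    reads, days, missed = shotgun_plan(genome_bp, coverage)
    print(f"{coverage}x: {reads:,} reads, {days:.0f} machine-days, "
          f"~{missed * genome_bp:,.0f} bp expected uncovered")
```

The model ignores repeats entirely, which is exactly where real assemblies run into trouble.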
Genome Sequencing:
Two methods
In the E. coli O157:H7 genome sequence paper by Perna et al., there were 2 gaps remaining in the genome sequence! They couldn’t complete it.
“Extended exact matches pose a significant assembly problem.” ??
Repeat sequences, e.g. prophage genomes
Nearly identical prophage sequences at 3 locations on genome, all > 2000 nt
What sequences do you observe when inside a prophage genome?
Repeat sequences, e.g. prophage genomes
Nearly identical prophage sequences at 2 locations on genome
What sequences do you see going across the borders of prophages?
Repeat sequences, e.g. prophage genomes
Nearly identical prophage sequences at 2 locations on genome
What information do you need to place the repeats properly?
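The repeat problem can be illustrated with a toy example. Everything here is invented for illustration: a 70 nt "genome" with a 20 nt stand-in for a prophage repeated twice, and a hypothetical helper `find_all`. A read taken from inside the repeat matches both copies, while a read that crosses a repeat border matches only one place.

```python
def find_all(genome, read):
    """Return every start position at which the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy sequences, not real data
repeat = "ACGTTGCAAGGCTTAACCGG"  # stand-in for a prophage
genome = "TTTTACCCAG" + repeat + "GGATCCATTA" + repeat + "CATGCATGAA"

inside_read = repeat[5:15]   # lies entirely within the repeat
border_read = genome[5:15]   # half unique flank, half repeat

print(find_all(genome, inside_read))  # two candidate positions
print(find_all(genome, border_read))  # one position only
```

Reads that reach into unique flanking sequence are what anchor each copy of a repeat to its correct location.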
The 454 machines: the next revolution
www.454.com
40 million bases/5.5 hr
DNA immobilised on micro-beads
Positioned in wells of special tray (44 µm diameter, 1.2 million per chip)
Sequencing enzymes on smaller beads.
Only one DNA-bead can fit in each well
Each bead has only one DNA fragment attached, so will give unique sequence.
When a base is incorporated (by DNA polymerase), light is emitted, and the light is detected under each well. If several identical bases are incorporated in a row, the light is proportional to the number. Chain lengths of 200 nt are possible. With 200,000 wells and 200 nt/well, 40 million bases can be sequenced.
www.454.com
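The light-per-flow idea can be sketched as a small simulation. This is an idealised model, not the instrument's software: it assumes the four nucleotides are flowed one at a time in a fixed cyclic order (`FLOW_ORDER = "TACG"` is an assumption), that the signal is exactly proportional to the number of bases incorporated, and that there is no noise. The names `flowgram` and `decode` are hypothetical.

```python
from itertools import groupby

FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are flowed


def flowgram(read, flow_order=FLOW_ORDER):
    """Ideal light signal per flow for the strand being synthesised."""
    if set(read) - set(flow_order):
        raise ValueError("read contains bases that are never flowed")
    # Collapse the read into runs of identical bases, e.g. TTAGGC -> T2 A1 G2 C1
    runs = [(base, len(list(group))) for base, group in groupby(read)]
    signals = []
    flow = 0
    next_run = 0
    while next_run < len(runs):
        nucleotide = flow_order[flow % len(flow_order)]
        base, length = runs[next_run]
        if base == nucleotide:
            signals.append((nucleotide, length))  # light proportional to run length
            next_run += 1
        else:
            signals.append((nucleotide, 0))       # no incorporation, no light
        flow += 1
    return signals


def decode(signals):
    """Rebuild the sequence from (nucleotide, signal) pairs."""
    return "".join(nucleotide * count for nucleotide, count in signals)


WELLS = 200_000
READ_LENGTH = 200

print(flowgram("TTAGGC"))
print(decode(flowgram("TTAGGC")))
print(f"{WELLS * READ_LENGTH:,} bases per run")
```

In practice the signal for long runs of one base is hard to quantify exactly, which this noise-free sketch does not capture.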
Genomics
• Papers filled with JARGON. Mainly genetic terms. Some terms are relatively new (e.g. replichore)
• Use the E. coli paper example, stopping to investigate each new term or concept
• Emphasise the uses of this data, and the future of genomic research.
What do you know about microbial genomes?
Exercise: Think of a typical bacterial genome, like that of E. coli, and
• Sketch the genome and the most significant features you know about it (as a whole genome, not individual genes)
• Jot down what you think the main selective pressures are on it
Escherichia coli genome
Circular, ~ 4.6 Mb
Ori and Ter, bidirectional replication
Replichores about equal
[Figure: circular chromosome with oriC and ter marked]
Replichore ‘balance’?
• If you move oriC relative to Ter, the growth rate of E. coli K-12 is reduced.
• Chromosomal inversions around the origin or terminus of replication are usually symmetrical, conserving the replichore balance.
Hill, C. W., and J. A. Gray. 1988. Effects of chromosomal inversion on cell fitness in Escherichia coli K-12. Genetics 119:771–778.
Eisen, J. A., J. F. Heidelberg, O. White, and S. L. Salzberg. 2000. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 1:0011.1–0011.9
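Replichore balance is easy to quantify once the positions of the origin and terminus are known. The sketch below uses the K12 genome size given in this lecture; the `ORI` and `TER` coordinates are illustrative round numbers chosen for the example, not values from the slides, and both function names are hypothetical.

```python
def replichore_lengths(genome_bp, ori, ter):
    """Lengths of the two replication arms of a circular chromosome."""
    clockwise = (ter - ori) % genome_bp  # from ori to ter, increasing coordinates
    return clockwise, genome_bp - clockwise


def imbalance(genome_bp, ori, ter):
    """Difference between the two replichores as a fraction of the genome."""
    arm_a, arm_b = replichore_lengths(genome_bp, ori, ter)
    return abs(arm_a - arm_b) / genome_bp


GENOME_BP = 4_639_221              # E. coli K12
ORI, TER = 3_920_000, 1_590_000    # illustrative positions only

print(replichore_lengths(GENOME_BP, ORI, TER))
print(f"imbalance: {imbalance(GENOME_BP, ORI, TER):.3f}")
# Shifting the origin by 500 kb makes one arm much longer than the other
print(f"after moving ori: {imbalance(GENOME_BP, ORI + 500_000, TER):.3f}")
```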
E. coli genome - global features
Gene dosage
Gene direction relative to ori
Recombination/inversion rates vary around chromosome
Gene Dosage
• Genes near the origin of replication will almost always be present in more copies than genes near the terminus
• So the position of a gene relative to the origin will affect its expression, and the regulatory systems would have evolved to accommodate the gene dosage effect.
• So what would happen if you moved genes?
[Figure: chromosome with oriC and ter marked]
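The size of the dosage effect can be estimated with a standard textbook approximation for exponentially growing cells, in which the average copy number of a gene relative to the terminus is 2^(C(1 - m)/T), where m is the fractional distance from origin (0) to terminus (1), C is the time taken to replicate the chromosome and T is the doubling time. This formula is not from the slides, and the C period of 40 min and the doubling times used below are illustrative assumptions.

```python
def relative_dosage(position, c_period_min, doubling_time_min):
    """Average copies of a gene relative to a gene at the terminus.

    position: 0.0 at oriC, 1.0 at ter (fraction of the replichore).
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 (oriC) and 1 (ter)")
    return 2 ** (c_period_min * (1.0 - position) / doubling_time_min)


C_PERIOD = 40  # minutes, assumed
for doubling_time in (20, 40, 80):
    ratio = relative_dosage(0.0, C_PERIOD, doubling_time)
    print(f"doubling time {doubling_time} min: ori/ter ratio about {ratio:.2f}")
```

Under these assumptions the effect is largest in fast-growing cells, where new rounds of replication start before the previous one has finished.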
Gene Direction
• What happens when a DNA pol meets an RNA pol going in the opposite direction?
[Figure: RNA polymerase and DNA polymerase]
[Figure: RNA polymerase and DNA polymerase moving in the same direction]
This is better….
Gene Direction
[Figure: chromosome with ori marked]
There is a preference for genes to be on ONE strand of the replichore, so that the directions of transcription and replication are the same.
This bias may have other implications.
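Given an annotation, the strand preference can be measured directly. The following is a minimal sketch on a made-up 100 bp circular genome; the gene list, the coordinates and the function names are all hypothetical. Strand is +1 for genes transcribed towards increasing coordinates and -1 for the opposite.

```python
def is_codirectional(start, strand, genome_bp, ori, ter):
    """True if a gene is transcribed in the direction the replication fork travels."""
    on_clockwise_arm = (start - ori) % genome_bp < (ter - ori) % genome_bp
    # Forks on the clockwise arm move towards increasing coordinates (+1),
    # forks on the other arm move towards decreasing coordinates (-1).
    fork_direction = 1 if on_clockwise_arm else -1
    return strand == fork_direction


def codirectional_fraction(genes, genome_bp, ori, ter):
    """Fraction of genes whose transcription follows the replication fork."""
    hits = sum(is_codirectional(start, strand, genome_bp, ori, ter)
               for start, strand in genes)
    return hits / len(genes)


# Toy annotation: (start position, strand)
toy_genes = [(10, 1), (20, 1), (30, -1), (60, -1), (70, -1), (90, 1)]
fraction = codirectional_fraction(toy_genes, genome_bp=100, ori=0, ter=50)
print(f"{fraction:.0%} of toy genes are co-directional with replication")
```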
Recombination/inversions
• Genomes often have large repeated sequences, e.g. ribosomal RNA gene clusters (16S-23S-5S), or phage genomes.
• Such repeats allow large inversions of DNA segments or recombination between chromosomes
Inversion via repeated sequences
Homologous recombination between rRNA genes
[Figure labels: origin, GC-skew, Chi sequences, terminus]
Genomics: What is GC-skew?
Systematic bias in base composition of one strand as you go around the genome
GC skew = (G - C) / (G + C)
[Figure: GC skew plotted around the genome, with origin and ter marked]
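GC skew is simple to compute from one strand of sequence. The sketch below applies (G - C)/(G + C) to successive windows; the window size, the function names and the toy sequence are assumptions made for illustration.

```python
def gc_skew(window):
    """(G - C) / (G + C) for one stretch of sequence; 0.0 if it has no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def skew_profile(seq, window=1000, step=1000):
    """GC skew for successive windows along one strand."""
    seq = seq.upper()
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Toy sequence: a G-rich half followed by a C-rich half
toy = "GGGAGGTG" * 3 + "CCCTCCAC" * 3
print(skew_profile(toy, window=8, step=8))
```

On a real genome, plotting this profile typically shows the skew changing sign near the origin and the terminus.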
GC-skew of genomes
GC-skew of genomic DNA
Compositional bias:
Leading strand enriched in G/T (keto)
Lagging strand enriched in C/A (amino)
WHY?
Perhaps due to deamination of exposed C’s in the leading strand, producing C>T mutations. Theory only.
E. coli O157:H7 - K12 genome comparison:
Chi sequences
GCTGGTGG
• Sequence recognised (and cut nearby) by the RecBCD enzyme
• Promotes homologous recombination (by RecA)
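Because Chi is a fixed octamer, its sites can be located by exact matching on both strands. A minimal sketch, with hypothetical function names and a made-up test sequence; a hit on the "-" strand means the reverse complement of Chi appears in the sequence as given.

```python
CHI = "GCTGGTGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def chi_sites(seq):
    """Return (position, strand) for every exact Chi octamer in seq."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", CHI), ("-", reverse_complement(CHI))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


toy = "AAGCTGGTGGTTCCACCAGCAA"  # one Chi on each strand
print(chi_sites(toy))
```

Counting "+" and "-" hits separately on each replichore is one way to look at the orientation of Chi sites around the genome.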
Lateral Gene Transfer (LGT)
• Literally, the natural transfer of genetic material between different organisms (species, genera, etc.)
• Doesn’t say how the DNA was transferred or integrated, or where it came from.
• Does imply that the DNA can be identified as ‘foreign’
• Since DNA doesn’t have a ‘made in X’ sticker, how can the ‘foreignness’ be identified? …. Ideas?….
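One idea that is often tried (offered here as a suggestion; the slide leaves the question open) is to look for regions whose base composition differs from the rest of the genome. The sketch below flags windows whose GC content lies more than a chosen number of standard deviations from the mean of all windows. The window size, the cutoff, the function names and the toy sequence are all assumptions.

```python
from statistics import mean, pstdev


def gc_content(seq):
    """Fraction of G and C in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(seq, window, z_cutoff=2.0):
    """Return (start, gc) for windows with unusual GC content."""
    values = [gc_content(seq[i:i + window])
              for i in range(0, len(seq) - window + 1, window)]
    if not values:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every window has the same composition
    return [(index * window, gc) for index, gc in enumerate(values)
            if abs(gc - mu) / sigma > z_cutoff]


# Toy genome: 50% GC everywhere except one AT-rich 20 nt segment
toy = "ACGT" * 20 + "AT" * 10 + "ACGT" * 25
print(atypical_windows(toy, window=20))
```

Atypical composition is only a hint: DNA acquired long ago, or from an organism with similar composition, would not stand out this way.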
Lateral Gene Transfer (LGT)
Known mechanisms of DNA transfer between bacteria:
• Transduction
- transducing bacteriophages introduce host DNA, and this recombines with the genome
• Transformation
- DNA uptake from the surroundings, and recombination.
• Conjugation
- natural transfer method, sex pilus, one-way transfer, recombination.
Prophage
• Bacteriophages that are temperate (as compared to lytic) can exist inside host cells in a stable and relatively inactive state as prophages.
• The host cell, with a prophage, is called a lysogen.
• Some prophages express virulence determinants, such as toxins (= lysogenic conversion), e.g. Shiga toxin
• Some prophages exist as plasmids, but most integrate into the genome.
• If the prophage becomes damaged…. ?
E. coli genome sequences
STRAIN                        SIZE          DATE
E. coli K12                   4639221 bp    Oct 13, 1998
E. coli O157:H7 (USA)         5528970 bp    Jan 25, 2001
E. coli O157:H7 (Japanese)    5498450 bp    Mar 7, 2001
*about 4.1 Mb in common
Data from NCBI: http://www.ncbi.nlm.nih.gov:80/PMGifs/Genomes/eub.html
E. coli O157:H7 - K12 genome comparison
• Unexpected complex segmented relationship
• Share a 4.1 Mb ‘backbone’ of common, generally colinear sequence (only 1 inversion)
• Homologous sequences are interspersed with HUNDREDS of ISLANDS of INTROGRESSED DNA
[Figure: schematic of backbone segments A, B, C, D, with an island X between B and C]
E. coli O157:H7 - K12 genome comparison
• The DNA segments specific to each strain were named ‘O islands’ (O157:H7-specific DNA segments) or ‘K islands’ (K12-specific DNA segments)
• Backbone of 4.1 Mb common sequence. Not identical (e.g. 75% of proteins differ by at least one aa).
• O-islands total 1.34 Mb (about 26% of genes!)
• Largest O-island is a 106-gene region (not small!)
E. coli O157:H7 - K12 genome comparison
• Virulence genes do not seem to be concentrated in one particular ‘island’; there appear to be several
• Often (189 cases), the backbone-island junction is WITHIN an ORF.
[Figure: protein-coding ORF (AUG to UGA) with an O-island junction inside it]
What does this pattern suggest?
E. coli O157:H7 - K12 genome comparison
• Suggests that incoming DNA recombined with the genome (somehow?) rather than simply being inserted.
Comparative Genome Map
Genome Map
Figure legend:
- Distribution of O-islands of EDL933-specific sequence (red), ‘K-islands’ of K12-specific sequence (green) and common ‘backbone’ sequence (blue)
- GC-content of genes, plotted around mean
- GC-skew for 3rd codon positions
- Scale, in base pairs
- Octamer Chi sequences
Genome sequence - Figure 2
Figure labels: O-specific ‘islands’; K-specific ‘islands’; O157:H7 genes and their orientation; scale (10 kb/tick)
Genome sequence - Figure 2
CP-933 = Cryptic Prophage. Also an O island
How many kb is this phage genome?
E. coli O157:H7 genome sequence
Summary of main findings:
1. Many insertions of DNA around chromosome
2. Inserted DNA is foreign (HGT or Lateral GT)
3. Several virulence gene clusters; widely spread
4. Prophage genomes prominent
5. Systematic variations in base composition
- coding strand, GC skew, chi seqs
E. coli O157:H7 genome sequence
Summary of main findings:
6. E. coli O157:H7 undergoes relatively high rates of recombination and mutation.
- where is the DNA coming from? unknown, phage, mobile elements (e.g. transposons)
- what is the main method of transfer?
- is defective DNA mismatch repair important?
E. coli O157:H7 genome sequence
Summary of main findings:
These large differences can be exploited:
• Diagnostic tools (discriminate between E. coli strains)
• New virulence gene candidates can be tested for function, and new drugs developed
• Effects of antibiotics on toxin synthesis examined
• Note: in the genome sequences of many microbes, the percentage of ORFs that cannot be identified is often > 20%