What Have We Learned
From Unicellular Genomes?
Propionibacterium acnes
Bacteroides thetaiotaomicron
Mycoplasma genitalium
Mimivirus
Cyanobacteria
Plasmodium
Yeast
Why do I get so many pimples?
The genome of Propionibacterium acnes was
sequenced in July of 2004.
P. acnes lives in sebaceous cysts and
sometimes stimulates an immune response.
A group in Paris, along with two groups in
Germany sequenced P. acnes.
They found 2,333 genes in its 2.6 Mb
genome.
68% of these had orthologs in other species.
20% had none, and 12% encoded only RNA.
Anatomy of a pimple
Genome-wide evaluations
A first step following bacterial genome sequencing
is finding the ori and terminus for replication.
GC skew (non-uniform distribution of G’s & C’s
between the two strands) is used to find them.
The cumulative skew tends to be lowest at the
ori and highest at the terminus.
Genes that have originated by horizontal transfer
are identified using a sliding window to find
segments with abnormal GC content.
Codon bias is also used to detect HT. Immunogenic
and metabolic genes were detected.
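The two genome-wide scans on this slide can be sketched in a few lines. This is a minimal illustration, not the pipeline the sequencing groups used: the function names, window sizes, and the 0.10 tolerance are assumptions chosen for the toy data, and the skew is computed with the common (G - C)/(G + C) convention.

```python
def cumulative_gc_skew(seq, window=1000):
    """Running sum of (G - C) / (G + C) over consecutive, non-overlapping windows.

    Under this convention the minimum of the curve is a candidate ori
    and the maximum a candidate terminus.
    """
    seq = seq.upper()
    total, curve = 0.0, []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # skip windows with no G or C to avoid dividing by zero
            total += (g - c) / (g + c)
        curve.append(total)
    return curve


def atypical_gc_windows(seq, window=1000, step=500, tolerance=0.10):
    """Start positions of sliding windows whose GC fraction differs from the
    genome-wide GC fraction by more than `tolerance` (an assumed cutoff)."""
    seq = seq.upper()
    genome_gc = (seq.count("G") + seq.count("C")) / len(seq)
    flagged = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        if abs(gc - genome_gc) > tolerance:
            flagged.append(start)
    return flagged


# Toy data (invented): a C-rich half followed by a G-rich half.
toy = "ACCT" * 50 + "AGGT" * 50
curve = cumulative_gc_skew(toy, window=20)
print(curve.index(min(curve)))  # 9, i.e. the skew bottoms out at the midpoint

# Toy data (invented): an AT-only island inside a 50% GC background.
island = "ACGT" * 100 + "AATT" * 25 + "ACGT" * 100
print(atypical_gc_windows(island, window=100, step=100))  # [400]
```

A flagged window is only a candidate for horizontal transfer; codon bias, as the slide notes, is a second line of evidence.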
Transcriptional Phase Variation
During finishing, it was found that P. acnes had a
variable # of G’s associated with some genes.
It is hypothesized that the initiation of
transcription depends on the # of consecutive
G’s.
As rows of G’s are replicated, the # will change.
This leads to a mixed population of bacteria with
varying degrees of protein production.
This diverse population is optimized to respond
differentially to various skin treatments.
Digesting Our Cells For Food
P. acnes was found to be able to grow
anaerobically as well as aerobically.
Cells produce many enzymes that are able to
degrade lipids, esters, and amino acids.
Some of these degradation products increase
adhesion to our cells.
Many of the digestive enzymes contain a motif
(LPXTG) that targets them to the cell wall.
Hyaluronate lyase is also found on the surface
of the bacteria; it destroys the extracellular
matrix that binds our cells together.
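Scanning predicted proteins for the LPXTG motif mentioned on this slide is a simple pattern match. The sketch below assumes one-letter amino acid codes and treats X as any residue; the function name and the example peptide are made up for illustration.

```python
import re

# L-P-(any residue)-T-G, in one-letter amino acid code.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(protein):
    """Return the 0-based start position of every LPXTG match."""
    return [m.start() for m in LPXTG.finditer(protein.upper())]


# Hypothetical peptide, not a real P. acnes protein.
print(find_lpxtg("MKKLPETGAAALPATGXX"))  # [3, 11]
```

A match only marks a candidate; it does not by itself show that the protein is anchored to the cell wall.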
Stimulating the Immune Response
P. acnes produces 5 CAMP factors (secreted
proteins that bind antibodies) that can form
pores in the cell membrane.
A dipeptide motif (PT) is present in certain
proteins; this motif is also found in M.
tuberculosis.
The bacterium also has at least 7 heat shock
protein genes.
Porphyrin is also secreted, which produces
toxic forms of oxygen, further stimulating the
immune response.
Withstanding the Environment
P. acnes can signal nearby cells that
something has changed in the environment.
Sensors called two-component systems (1 to
sense & 1 to signal) exist in some bacteria; P.
acnes has 10 pairs.
Quorum sensing is the ability to detect
conditions of overcrowding. The LuxS gene
is expressed in these instances; its product
generates a universal signal for interspecies
communication among bacteria.
Cells meshed together in biofilms protect
themselves.
Are all bacteria living in us bad for us?
An average human body is composed of about
10^13 cells.
Our intestines have about 10^10 microbes/ml
and contain at least 1,000 ml.
A majority of the cells in our bodies may be
bacteria! (500 - 1,000 different species)
This accounts for 2-4 million non-human genes
Bacteroides thetaiotaomicron constitutes a
substantial portion of our intestinal flora.
A group from Wash. U. in St. Louis sequenced
its genome.
Overview of the Genome
B. thetaiotaomicron’s genome contains 6.3 Mb,
as well as 4,779 genes (and a 33 kb plasmid).
58% of ORFs have known function, 18% have
orthologs of no known function, and 24% have
no homology with known proteins.
COGs (functional categories of genes) are
determined following sequencing to create an
overview of a given genome.
Many of the genes specialize in sugar uptake,
cell wall synthesis, environmental sensing and
signaling, as well as transposition.
Major COGs
Sugar metabolism: 170 genes fit into this
category; most bacteria have a set of 23.
61% of these appear to be secreted, which not
only benefits other bacteria but us as well.
163 paralogs of 2 genes (SusC & SusD) import
sugars into the cytoplasm of the microbe.
Many two-component genes are present for
signaling; some of these interact with σ factors.
63 transposons are present, which may help
spread antibiotic resistance.
Does Size Matter?
The coding capacity for this genome is very
high (89% coding DNA) but it has a lower
ratio of gene # to genome size than expected.
This was a paradox until it was determined
that the ORFs of this microbe are unusually
large. It is unclear why this is the case.
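The reasoning behind the paradox can be checked with the figures given for B. thetaiotaomicron: if 89% of a 6.3 Mb genome is coding and there are 4,779 genes, the average ORF must be long. The calculation below is a back-of-the-envelope sketch; it ignores the 33 kb plasmid and treats all coding DNA as belonging to the counted genes, both of which are simplifying assumptions.

```python
def mean_orf_length(genome_bp, coding_fraction, n_genes):
    """Average coding length per gene, in base pairs."""
    return genome_bp * coding_fraction / n_genes


genome_bp = 6_300_000   # 6.3 Mb chromosome
coding_fraction = 0.89  # 89% coding DNA
n_genes = 4_779

b_theta_orf = mean_orf_length(genome_bp, coding_fraction, n_genes)
kb_per_gene = genome_bp / n_genes / 1_000

print(f"mean ORF length: {b_theta_orf:.0f} bp")       # about 1173 bp
print(f"genome per gene: {kb_per_gene:.2f} kb/gene")  # about 1.32 kb
```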
Summary
Gut symbionts provide us with predigested
sugars, stimulate blood vessel formation,
crowd out pathogens, sequester limited
resources, and stimulate our mucosal layer.
Can Microbial Genomes
Become Dependent Upon Us?
In the microbial world, if you don’t use it, you
lose it.
Mycoplasma genitalium has one of the most
reduced microbial genomes and the 2nd
smallest prokaryotic genome with 580 kb (the
smallest is N. equitans, an archaeon, with 490 kb).
TIGR sequenced its genome in 1995.
470 ORFs were found, 96 of which have no
known orthologs.
M. genitalium has an 88% coding capacity.
Genes that have been lost:
M. genitalium has presumably lost many genes
involved in the synthesis of amino acids,
cofactors, cell envelope, and regulatory factors.
It has only 1 σ factor.
The microbe has retained genes for energy
metabolism, fatty acid and phospholipid
metabolism, nucleotide production, replication,
transcription, and protein transport.
The only category overrepresented is
translation, namely rRNA and tRNA genes.
What is the Minimum # of Genes?
Craig Venter, along with Hamilton O. Smith, is
trying to construct an organism with the
fewest possible genes.
A new field called synthetic biology seeks to
synthesize a functioning genome de novo.
A better understanding of evolutionary
principles and genome circuitry is sought.
Japanese & European scientists have tried to
identify the essential genes of B. subtilis.
They have found that only 192 genes are
indispensable to life.
Do all Viruses have Small Genomes?
Most viral genomes are much smaller than
bacterial ones:
HIV- 9,200 nt
WNV- 10,962 nt
SARS- 29,727 nt
T7- 39,900 nt
λ- 48,502 nt
In 2003, a new virus that infects amoebae was
described that has a 1.2 Mb genome! A group in
Marseille, France sequenced Mimivirus, as it is called.
Mimivirus Genome
1,262 ORFs were identified, the coding
capacity is 90.5%.
Like most viruses, the genome is linear, but
it has inverted repeats at both ends by which
it may circularize, perhaps during replication.
Isoleucine is used twice as often as usual,
and there is a strong codon bias for codons
lacking G or C. The genome is 28% GC.
Mimivirus is overrepresented in genes for
translation, posttranslational modification,
and amino acid transport and metabolism.
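GC content and codon bias, as quoted above for Mimivirus, are both simple counts over the sequence. The following is a minimal sketch with invented helper names and a made-up coding sequence; it only tallies codons in frame 0 and does not reproduce the published analysis.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_counts(cds):
    """Count codons in reading frame 0, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def at_only_codon_share(cds):
    """Share of codons that contain no G or C at any position."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    at_only = sum(n for codon, n in counts.items() if set(codon) <= {"A", "T"})
    return at_only / total


# Hypothetical 5-codon sequence: ATG ATT ATA GGC TAA
toy_cds = "ATGATTATAGGCTAA"
print(codon_counts(toy_cds)["ATT"])  # 1
print(at_only_codon_share(toy_cds))  # 0.6 (ATT, ATA, TAA out of 5)
```

Comparing these counts between a gene and the rest of its genome is also the basis of the codon-bias test for horizontal transfer.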
Is Mimivirus Alive?
The genome of Mimivirus resembles a bacterial
genome, and Mimivirus even stains Gram +. Is it a virus?
In 1957, the definition of a virus was proposed:
1) smaller than 0.2 microns
2) possesses DNA or RNA, not both
3) not able to synthesize its own proteins
4) cannot generate energy from substrates
5) cannot grow by binary fission
Mimivirus only satisfies the 4th criterion; we
are not sure about the 5th.
What is it then?
Mimivirus has blurred the distinction between
prokaryotes and viruses.
It is hypothesized that, like M. genitalium,
Mimivirus has lost genes over time.
We will learn of more obligate intracellular
parasites later in class.
Mimivirus may resemble some of the earliest
forms of life, which were able to replicate
independently until they became parasites.
Genomes Reflect an Organism’s
Ecological Niche
Cyanobacteria are the most productive
phytoplankton in the world.
The two most abundant genera of cyanobacteria
are Prochlorococcus and Synechococcus.
Three genomes in the former group and one
in the latter were sequenced in 2003.
Individual cells from both genera are referred
to using a numbering system to indicate
different ecotypes. Species designations are
still difficult to assign; Prochlorococcus was
only discovered in the late 1980s.
Prochlorococcus Dot Plot Alignment
Prochlorococcus MED4 vs. MIT9313
These ecotypes share 1,352 orthologs.
Short diagonal segments indicate synteny.
A negative slope indicates that the segment
was inverted in one type relative to the other.
Segments with positive slope but located off
the diagonal indicate chromosome
rearrangements.
Genes plotted along an axis are missing
from the other ecotype. MED4 has 364 genes
not found in MIT9313, which has 923 genes
not found in MED4.
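The way a dot plot is read can be made concrete with a small sketch. It assumes each ortholog is represented as a pair of gene-order indices (position in genome A, position in genome B), groups adjacent genes into runs, and labels each run by its slope. The function name and the toy coordinates are invented, and real comparisons work on base-pair coordinates and tolerate gaps.

```python
def synteny_blocks(pairs):
    """Group ortholog pairs into runs of adjacent genes.

    `pairs` is a non-empty list of (index_in_a, index_in_b) tuples.
    Returns a list of (orientation, run) where orientation is
    '+' (same order, positive slope), '-' (inverted, negative slope),
    or '.' (a single gene with no adjacent partner).
    """
    pairs = sorted(pairs)
    blocks, current, direction = [], [pairs[0]], None
    for prev, nxt in zip(pairs, pairs[1:]):
        step = nxt[1] - prev[1]
        adjacent = nxt[0] - prev[0] == 1 and abs(step) == 1
        step_dir = "+" if step > 0 else "-"
        if adjacent and direction in (None, step_dir):
            current.append(nxt)      # the run continues with the same slope
            direction = step_dir
        else:
            blocks.append((direction or ".", current))
            current, direction = [nxt], None
    blocks.append((direction or ".", current))
    return blocks


# Toy data: three collinear genes, then four genes inverted and moved.
toy = [(0, 0), (1, 1), (2, 2), (3, 6), (4, 5), (5, 4), (6, 3)]
for orientation, run in synteny_blocks(toy):
    print(orientation, run)
```

Genes with no ortholog in the other genome never appear in `pairs`; on the plot they are the points that sit on an axis.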
pcb gene family
A major difference between the ecotypes is in
the pcb gene family, which encodes
chlorophyll-binding, light-harvesting antenna
complex proteins that help capture a wider
spectrum of light.
MED4 (high light) has only 1 pcb gene
MIT9313 (medium light) has 2 (A & B)
SS120 (low light) has 8 (A-H)
MED4’s gene does not respond to changes
in Fe+3 but MIT9313’s is induced 7-fold and
SS120’s is induced 23-fold.
MED4’s Small Genome
MED4’s genome is the smallest known for a
photoautotroph and may represent the
minimum for a photosynthetic organism.
MED4 appears to have lost genes over time.
A more streamlined genome means the organism
is adapted to a narrower ecological range.
Synechococcus has the largest
genome of this group and the largest
ecological range as well.
People have proposed seeding the ocean
with Fe+3 to help stimulate CO2 consumption.
Gene deletions in Cyanobacteria
Malaria
Malaria, although it rarely makes news
headlines, is a daily threat to the 3 billion
people who live in tropical climates.
In 2002, about 500 million people were
infected. About 2.7 million people die each
year (about 90% of these are < 5 years old).
The cause of malaria has been known for 100
years but we still can’t stop its spread.
The most lethal form of malaria is caused by
Plasmodium falciparum.
Lifecycle of
Plasmodium
RBC Infection
The most vulnerable time for Plasmodium is
during the RBC infection stage.
The parasite must force its way into an RBC
without rupturing any plasma membranes.
Three structures are important during
infection:
1) extracellular coating to make cells sticky
2) apical end of cell must be oriented downward
3) apicoplast is an internalized algal symbiont
Plasmodium Genomes
Plasmodium actually has three genomes:
nuclear, mitochondrial, and apicoplastic.
Pulsed-field gel electrophoresis was used to
separate the chromosomes, which were then
sequenced by the shotgun method.
This proved to be the most AT-rich genome
sequenced so far (19.4% GC).
The 22.9 Mb genome has 52.6% coding
capacity and 5,268 ORFs (60% of which have
no known function, the largest fraction of any
genome).
Tricking the Immune System
The genes of Plasmodium that are responsible
for binding to RBCs and for avoiding the
immune system are located near the telomeres
of this eukaryote.
Genes located near Plasmodium telomeres have
been duplicated many times; all three gene families
in these categories (var, rif, & stevor) are
polymorphic.
There are 59 var paralogs, 149 rif, and 28
stevor. This may account for our immune
system’s inability to deal with this parasite.
The Plasmodium Proteome
1% of proteins are used for host cell invasion
4% help evade the immune response
31% are integral to the membrane
14% are enzymes (about 4x < most proteomes)
10% are transported to the apicoplast
60% have unknown function
The Krebs cycle is present, but the organism
grows anaerobically and only uses this cycle for
heme biosynthesis (which it could get from us)
Apicoplast Proteome
Similar to a chloroplast in origin but used for a
different purpose now.
Only two photosynthetic orthologs remain.
This organelle synthesizes fatty acids,
isoprenoids, and heme groups.
Nuclear proteins sent here assist in DNA
replication & repair, transcription, translation,
posttranslational glycosylation, protein import,
and protein degradation.
Comparing Plasmodia
The Plasmodium sequencing project took 45
people 6 years to complete.
At the same time, other groups were working
on P. yoelii, which infects rodents and is used as
a model organism for malaria research.
Unfortunately, this latter genome was never
finished, making comparisons difficult.
P. yoelii has 600 additional ORFs, and the two
have 3,310 genes in common (56%).
Is this similar enough to make a good model
organism?
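The 56% figure can be reproduced from the numbers on these slides, which helps show what it measures. The check below assumes that "600 additional ORFs" means the P. yoelii count is the P. falciparum count (5,268) plus 600; that reading is an inference from the slides, not something they state.

```python
p_falciparum_orfs = 5_268                 # from the Plasmodium Genomes slide
p_yoelii_orfs = p_falciparum_orfs + 600   # assumed reading of "600 additional ORFs"
shared_genes = 3_310

share_of_yoelii = shared_genes / p_yoelii_orfs
share_of_falciparum = shared_genes / p_falciparum_orfs

print(f"shared / P. yoelii ORFs:     {share_of_yoelii:.1%}")      # about 56.4%
print(f"shared / P. falciparum ORFs: {share_of_falciparum:.1%}")  # about 62.8%
```

Under that assumption the quoted 56% matches the share of P. yoelii ORFs, so a somewhat larger share of the P. falciparum genes has a counterpart in the model organism.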
Malaria Treatment Options?
Recently, a German & American team used
reverse genetics (starting with a gene
sequence and deducing its function) to
target a gene and produce a knockout
strain. This strain is expected to be less
pathogenic than wild type. Mice injected
with this strain were protected for 30 days.
Even if a better drug were produced,
funding and health care infrastructure are
lacking in many problem areas. Very little $
is spent on malaria research.
Yeast
Yeast Genome
The S. cerevisiae genome was sequenced in
1996.
It took over 600 scientists in Europe, North
America, and Japan working together to
sequence the 12 Mb genome.
Yeast has a 70.3% coding capacity, higher
than Plasmodium but lower than the bacteria
discussed above.
There is a gene every 2 kb in yeast, one
every 6 kb in C. elegans, and one every 30
kb in humans. Eukaryotes have more junk
DNA than prokaryotes, and enhancers,
promoters, and introns add substantially to
the size of eukaryotic genes.
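The "one gene every 2 kb" figure can be cross-checked against the genome size and the ORF count quoted elsewhere in these slides (12 Mb and 5,885 originally annotated ORFs). This is a rough consistency check only; the helper name is made up.

```python
def expected_gene_count(genome_bp, bp_per_gene):
    """Number of genes implied by a genome size and an average gene spacing."""
    return genome_bp / bp_per_gene


yeast_genome_bp = 12_000_000
implied_genes = expected_gene_count(yeast_genome_bp, 2_000)
implied_spacing = yeast_genome_bp / 5_885  # bp per originally annotated ORF

print(implied_genes)           # 6000.0, close to the 5,885 annotated ORFs
print(round(implied_spacing))  # 2039 bp per ORF
```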
Chromosome Structure in Yeast
The 4 smallest chromosomes in yeast have a
unique structure. It was known from using
YACs that chromosomes smaller than 150 kb
were not stable in yeast. These chromosomes
are relatively gene-poor and undergo
recombination at high frequencies, perhaps to
protect the larger ones from the same fate.
Transcriptionally silent genes are found in the
sub-telomeric regions of many chromosomes;
this may help identify the right and left sides of
a chromosome.
Yeast Chromosomes
Evolutionary History of Yeast
A substantial number of genes were
found in duplicate copies in yeast.
It was proposed that yeast had undergone
“duplication events” at some point in time.
Many regions of chromosomes are syntenic
with regions on other chromosomes. Such
paralogs are seen as evolutionary experiments
where one gene can drift to provide new
specialized functions.
Some genes were initially thought to be extra
copies, but experiments showed that they differ.
Predictions for the Future
The authors of the landmark 1996 yeast
sequencing publication made the following
predictions:
1) They described plans to produce a collection
of single, double, and even triple KO mutations.
2) They addressed the value of making all
genome sequences publicly available.
3) They felt WGS sequencing of large genomes
was not feasible.
4) They looked forward to comparing yeast with
S. pombe as well as the human genome.
Better Annotation
A number of yeast genomes have been
sequenced since 1996. With these, the need
to annotate genes based on Gene Ontology
(GO) became clear.
Improvements in computers, search
algorithms, and the increased volume of
genes in the databases led to better
annotation.
The original 5,885 annotated ORFs have
increased to 6,672; many of the new ones fall
below the original cutoff of 100 codons.
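The effect of a length cutoff on ORF calling can be seen in a minimal ORF finder. The sketch below is an assumption-laden simplification, not the annotation method used for yeast: it scans only the forward strand in three frames, takes the first ATG after each stop as the start, and ignores introns. The function name and the toy sequence are invented.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, n_codons) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `n_codons` counts the
    ATG and all sense codons but not the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:     # apply the length cutoff
                    orfs.append((start, i + 3, n_codons))
                start = None                   # close it either way
    return sorted(orfs)


# Toy sequence: a 51-codon ORF followed by a 121-codon ORF.
short_orf = "ATG" + "GCT" * 50 + "TAA"
long_orf = "ATG" + "GCT" * 120 + "TAA"
toy = short_orf + long_orf

print(find_orfs(toy))                 # only the 121-codon ORF passes the cutoff
print(find_orfs(toy, min_codons=50))  # lowering the cutoff recovers both
```

Lowering the cutoff finds more short genes but also more spurious ORFs, which is why comparison with other sequenced yeast genomes helps decide which short ORFs are real.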