Genomic island analysis: Improved web-based software and insights into an apparent gene pool associated with genomic islands
William Hsiao
Brinkman Laboratory
Simon Fraser University
Burnaby, BC, Canada

Prokaryotic Genomic Islands (GIs)

Definition: Genomic DNA segments with particular characteristics that indicate horizontal origins
[Figure: a bacterium, with a GI labeled]

Genomic Island Characteristics
- Exhibit sequence and annotation features (%G+C, sequence composition bias)
[Figure: chromosome diagram of a Genomic Island (e.g. PAI). Labels: Direct Repeats, tRNA gene, mob, VF. mob: mobility genes]
- Often contain genes encoding adaptive functions of medical and environmental importance
  - Pathogenicity Islands: virulence factors (genes that contribute to disease)
  - Resistance Islands: antibiotic resistance
  - Metabolic Islands: secondary metabolism (e.g. sucrose)

IslandPath: Aiding identification of GIs
[Figure: IslandPath V.2 view of Vibrio cholerae N16961 Chr1, with the TCP island marked. TCP = toxin co-regulated pili]
Legend:
- A yellow circle: %G+C above high cutoff
- A green circle: %G+C between cutoffs
- A pink circle: %G+C below low cutoff
- A black bar: transfer RNA
- A purple bar: ribosomal RNA
- A deep blue bar: both tRNA and rRNA
- A black square: transposase
- A black triangle: integrase
- A strike-line: regions with dinucleotide bias
(Hsiao et al 2003 Bioinformatics p418-20)

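
The %G+C colour coding in the legend can be illustrated with a small sketch. The slides do not say how the high and low cutoffs are chosen, so this example assumes they sit one standard deviation above and below the mean per-gene %G+C. The function names, the `n_sd` parameter, and the toy sequences are hypothetical and are not taken from IslandPath.

```python
from statistics import mean, pstdev


def gc_percent(seq: str) -> float:
    """Return the %G+C of a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def classify_genes(gene_seqs: dict, n_sd: float = 1.0) -> dict:
    """Label each gene 'high', 'mid', or 'low' relative to the cutoffs.

    ASSUMPTION: cutoffs are mean +/- n_sd standard deviations of the
    per-gene %G+C values; the slides do not state the actual cutoffs.
    """
    values = {name: gc_percent(seq) for name, seq in gene_seqs.items()}
    all_values = list(values.values())
    mu = mean(all_values)
    sd = pstdev(all_values)
    high_cutoff = mu + n_sd * sd
    low_cutoff = mu - n_sd * sd

    labels = {}
    for name, value in values.items():
        if value > high_cutoff:
            labels[name] = "high"  # yellow circle in the legend
        elif value < low_cutoff:
            labels[name] = "low"   # pink circle in the legend
        else:
            labels[name] = "mid"   # green circle in the legend
    return labels


# Toy sequences, made up for illustration only.
genes = {
    "geneA": "ATGCGCGCGCGGCC",  # G+C rich
    "geneB": "ATGAATTTAAATAA",  # A+T rich
    "geneC": "ATGCATGCATGCAT",
    "geneD": "ATGCAAGCTTGCAT",
}
labels = classify_genes(genes)
print(labels)
```

With these toy inputs, geneA falls above the high cutoff, geneB below the low cutoff, and the other two between the cutoffs.
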
Which Features Best Identify GIs
Examined prevalence of features in 95 published islands

- 85% of islands have >25% dinucleotide bias coverage (62% have >50% dinucleotide bias coverage)

- Mobility genes identified in >75% of the islands

- tRNA genes observed in <50% of known islands

- Only 20% of the islands show atypical %G+C

Properties of genes in GIs?
Defined a "putative island" as
- 8 or more genes in a row with dinucleotide bias
- 8 or more genes in a row with dinucleotide bias + an associated mobility gene
Any difference for genes in islands versus outside of islands in terms of their protein functional categories?
- 63 genomes (67 chromosomes) analyzed
- COG: clusters of orthologous groups of proteins

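
The two "putative island" definitions can be sketched as a scan for runs of consecutive genes. This is a minimal illustration, not the authors' implementation. The `Gene` class and its flags are hypothetical, and "an associated mobility gene" is assumed here to mean a mobility gene inside the run, which the slides do not specify.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    dinuc_bias: bool        # hypothetical flag: gene shows dinucleotide bias
    mobility: bool = False  # hypothetical flag: e.g. integrase or transposase


MIN_RUN = 8  # from the slide: 8 or more genes in a row


def putative_islands(genes, min_run=MIN_RUN, require_mobility=False):
    """Return (start, end) index pairs, end exclusive, of qualifying runs.

    A run is a maximal stretch of consecutive genes with dinucleotide bias.
    ASSUMPTION: with require_mobility=True, the mobility gene must lie
    inside the run itself.
    """
    genes = list(genes)
    islands = []
    start = None
    # A trailing None acts as a sentinel so the final run is closed too.
    for i, gene in enumerate(genes + [None]):
        biased = gene is not None and gene.dinuc_bias
        if biased and start is None:
            start = i
        elif not biased and start is not None:
            run = genes[start:i]
            has_mob = any(g.mobility for g in run)
            if len(run) >= min_run and (has_mob or not require_mobility):
                islands.append((start, i))
            start = None
    return islands


# Toy gene order: runs of 9, 8, and 7 biased genes; gene 5 is a mobility gene.
flags = [False] * 3 + [True] * 9 + [False] * 2 + [True] * 8 + [False] + [True] * 7
genes = [Gene(f"g{i}", biased, mobility=(i == 5)) for i, biased in enumerate(flags)]

dinuc = putative_islands(genes)
dinuc_mob = putative_islands(genes, require_mobility=True)
print(dinuc, dinuc_mob)
```

In this toy example the run of 7 genes is too short under either definition, and only the first run also contains a mobility gene.
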
More novel genes inside of islands
[Figure: Proportions of Genes with no COG Assignment in Islands vs. Outside. Paired bars (OUTSIDE, ISLAND) per genome; y-axis 0.00% to 70.00%. Genomes labeled: Bacillus subtilis 168, Borrelia burgdorferi B31, Buchnera sp. APS, Escherichia coli K12, Clostridium acetobutylicum ATCC824, Chlamydia trachomatis D, Escherichia coli O157, Haemophilus influenzae Rd-KW20, Helicobacter pylori 26695, Listeria innocua Clip11262, Mycoplasma pneumoniae M129, Mycobacterium tuberculosis CDC1551, Mycobacterium leprae, Neisseria meningitidis MC58, Pseudomonas aeruginosa PAO1, Salmonella typhimurium LT2, Staphylococcus aureus N315, Streptococcus pneumoniae TIGR4, Sulfolobus solfataricus, Vibrio cholerae chromosome I, Vibrio cholerae chromosome II, Yersinia pestis CO92]
Paired t-test P value: 1.27E-18
Hsiao et al. PLOS Genetics e62, Nov. 2005

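
The paired t-test compares, genome by genome, the proportion of unassigned genes inside islands with the proportion outside. The sketch below computes only the paired t statistic and its degrees of freedom using the standard library. The function name and the four pairs of proportions are made up for illustration; they are not values from the study.

```python
import math
from statistics import mean, stdev


def paired_t(island, outside):
    """Return (t statistic, degrees of freedom) for paired samples.

    Each position in the two lists must refer to the same genome.
    """
    if len(island) != len(outside):
        raise ValueError("samples must be paired (equal length)")
    diffs = [a - b for a, b in zip(island, outside)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# ILLUSTRATIVE numbers only: proportion of genes with no assignment,
# one pair per hypothetical genome.
island_props = [0.50, 0.40, 0.45, 0.35]
outside_props = [0.30, 0.25, 0.20, 0.25]

t_stat, df = paired_t(island_props, outside_props)
print(f"t = {t_stat:.3f}, df = {df}")
```

The p-value follows from the t distribution with `df` degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel` returns both the statistic and the p-value directly.
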
Control for Analysis Biases

- Control for mis-prediction of genes in sequence composition biased regions
  - Excluded genes < 300 bp
- Control for bias of COG Protein Classification
  - Used SUPERFAMILY classification, which is better at detecting distant homologs
- Control for compositional bias due to other factors
  - Used the dinucleotide bias plus mobility gene dataset

More novel genes in islands in all experiments

Island Dataset          Classification Method   Paired t-test p-value
DINUC (all genes)       COG                     1.27E-18
DINUC+MOB (all genes)   COG                     1.20E-18
DINUC (all genes)       SUPERFAMILY             1.13E-18
DINUC+MOB (all genes)   SUPERFAMILY             4.43E-14
DINUC (>300 bp)         COG                     1.05E-17
DINUC+MOB (>300 bp)     COG                     7.65E-16
DINUC (>300 bp)         SUPERFAMILY             3.01E-16
DINUC+MOB (>300 bp)     SUPERFAMILY             2.04E-10

Hsiao et al. PLOS Genetics e62, Nov. 2005

Phage may be the predominant donors of GIs

- Some GIs are clearly of bacteriophage origin, but more may be from phage as well

- Predicted subcellular localizations of proteins encoded in our GIs are similar to those of phage genomes (lower proportion of cytoplasmic membrane proteins)
  Hsiao et al. PLOS Genetics e62, Nov. 2005

- Many GI-encoded genes have sequence characteristics similar to phage genes (A+T rich and short)
  Daubin et al. Genome Biol. 4(9): R57

Proportions of virulence factors in Islands vs. Outside of Islands in 26 pathogens
[Figure: bar chart of % of VFs (y-axis, 0 to 7) for Outside vs. Island, grouped by Island Types: DINUC and DINUC + Mob Gene]
P value: < 2.2E-16
Higher proportions of genes in Islands are VFs
Fedynak, Hsiao, and Brinkman (unpublished)
http://zdsys.chgb.org.cn/VFs/

Certain classes of VFs overrepresented in GIs
Virulence Factor Database (VFDB) classification of VFs in GIs and non-GIs

                                GIs                        non-GIs
VFDB Classification             VFs (#)   Proportion of    VFs (#)   Proportion of    p-value
                                          genes (%)                  genes (%)
Unclassified                    185       1.89             158       0.23             < 2.20E-16
Secretion system                95        0.97             138       0.20             < 2.20E-16
Adherence                       59        0.60             138       0.20             5.69E-13
Iron uptake                     33        0.34             59        0.09             5.83E-11
Type III translocated protein   6         0.06             1         0.00             1.54E-07
Antiphagocytosis                23        0.23             66        0.10             3.34E-04
Protease                        5         0.05             5         0.01             2.08E-03
Toxin                           18        0.18             53        0.08             2.34E-03

Most of these are "offensive" virulence factors
Fedynak, Hsiao, and Brinkman (unpublished)

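
One way to read the VFDB figures is as a fold enrichment: the GI proportion divided by the non-GI proportion for each class. The slides do not report this ratio, so the sketch below is only an illustrative derivation from the percentages given above. The function name is hypothetical, and because the percentages are rounded to two decimals the ratios are approximate.

```python
# (VFDB class, % of GI genes, % of non-GI genes), as reported on the slide.
rows = [
    ("Unclassified", 1.89, 0.23),
    ("Secretion system", 0.97, 0.20),
    ("Adherence", 0.60, 0.20),
    ("Iron uptake", 0.34, 0.09),
    ("Type III translocated protein", 0.06, 0.00),
    ("Antiphagocytosis", 0.23, 0.10),
    ("Protease", 0.05, 0.01),
    ("Toxin", 0.18, 0.08),
]


def fold_enrichment(gi_pct, non_gi_pct):
    """Return gi_pct / non_gi_pct, or None when the denominator rounds to 0."""
    if non_gi_pct == 0:
        return None  # ratio undefined at the reported precision
    return gi_pct / non_gi_pct


enrichment = {name: fold_enrichment(gi, non_gi) for name, gi, non_gi in rows}

for name, ratio in enrichment.items():
    shown = "undefined (non-GI proportion rounds to 0.00)" if ratio is None else f"{ratio:.1f}x"
    print(f"{name}: {shown}")
```

Every class with a defined ratio is at least about twofold more frequent among GI genes than among non-GI genes, which is consistent with the slide's heading.
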
Conclusions

- Genomic islands contain a disproportionately high number of novel genes, suggesting a large and understudied gene pool contributing to horizontal gene transfer

- These novel genes appear to be drawn from a large pool of phage; metagenomics studies would be useful

- These novel genes may contribute to microbial adaptation and may play a role in pathogenesis and in antibiotic resistance

Acknowledgements
Fiona Brinkman
- Amber Fedynak - VF studies
- Brian Coombes, Michael Lowden, and Brett Finlay (UBC) - Microarray data
- Jenny Bryan (UBC) - Stats analysis
- Brinkman Laboratory

http://www.pathogenomics.sfu.ca/islandpath

Other categories more common in islands

Category                                  In putative islands:     In putative islands + mobility genes:
                                          paired t-test p-value    paired t-test p-value
Cell motility                             7.73E-5                  0.002087 (may be a sample size issue)
Intracellular trafficking, secretion,     8.124E-3                 0.406955 (may be a sample size issue)
and vesicular transport

Several metabolism-associated categories are under-represented in islands
* Novel genes not included in analysis due to potential skew of other category results

[Figure: Proportions of Genes with no SUPERFAMILY Assignment in Islands vs. Outside. Paired bars (OUTSIDE, ISLAND) per genome; y-axis 0.00% to 80.00%. Genomes labeled: Bacillus subtilis 168, Borrelia burgdorferi B31, Buchnera sp. APS, Escherichia coli K12, Clostridium acetobutylicum ATCC824, Chlamydia trachomatis D, Helicobacter pylori 26695, Haemophilus influenzae Rd-KW20, Escherichia coli O157, Listeria innocua Clip11262, Mycoplasma pneumoniae M129, Mycobacterium tuberculosis CDC1551, Mycobacterium leprae, Neisseria meningitidis MC58, Pseudomonas aeruginosa PAO1, Salmonella typhimurium LT2, Sulfolobus solfataricus, Streptococcus pneumoniae TIGR4, Staphylococcus aureus N315, Vibrio cholerae chromosome I, Vibrio cholerae chromosome II, Yersinia pestis CO92]
P value: 3.0E-16

IslandPath V.2
Experiment: S. typhimurium LT2 ssrB gene KO
Track 1: IslandPath
Track 2: Microarray expression (overexpressed & underexpressed)