Comparative Genomics
The Finale
Angela Pena, Ambily Sivadas, Amit Rupani
Shimantika Sharma, Juliette Zerick
Keerti Surapaneni, Artika Nath, Hema Nagrajan
Outline
Results
Goal 1 – PCR Assay
Goal 2 – Comparative genome analysis
Goal 3 – Haemolysis study
Goal 4 – Virulence factors
Discussion
Goal 1
Identification and characterization
of target genes for PCR Assay
Identification of target genes
[Figure: schematic of genes A, B and C in Hhae and NTHi, with the forward (Fw) and reverse (Rv) primers marked.]
PCR products of different size
One copy of B or multiple copies? Is A-C organized the same
way in both organisms?
Identify candidate clusters/genes for assay development and
conserved regions for primer/probe design
Cluster Analysis – Genome Set
Cluster statistics
Protein sequences were clustered using BLASTClust.
• Total clusters: 8402
• Total common to all genomes: 361
• Total unique to Hhae: 82
• Total unique to Hinf: 38
• Total unique to pathogenic strains: 0
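The counts above can be reproduced from cluster membership with a short sketch like the one below. It is not the script used for the slides. It assumes each cluster has already been reduced to the set of genome IDs that contribute at least one protein, and it assumes "unique to a species" means that every member genome belongs to that species. The function name and the genome IDs in the usage note are hypothetical.

```python
def classify_clusters(clusters, species_of):
    """Count common and species-unique clusters.

    clusters   : list of sets of genome IDs (one set per cluster)
    species_of : dict mapping every genome ID to "Hhae" or "Hinf"
    """
    all_genomes = set(species_of)
    counts = {"total": 0, "common": 0, "unique_Hhae": 0, "unique_Hinf": 0}
    for members in clusters:
        counts["total"] += 1
        species = {species_of[g] for g in members}
        if set(members) == all_genomes:
            # present in every genome of the set
            counts["common"] += 1
        elif species == {"Hhae"}:
            # assumption: "unique" = only Hhae genomes in the cluster
            counts["unique_Hhae"] += 1
        elif species == {"Hinf"}:
            counts["unique_Hinf"] += 1
    return counts
```

For example, with genomes `h1`, `h2` (Hhae) and `i1` (Hinf), the clusters `{h1, h2, i1}`, `{h1}`, `{i1}` and `{h1, i1}` give one common cluster, one unique to Hhae and one unique to Hinf.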
Common clusters – Functional
breakdown
Metabolism
protein synthesis / post trans
transport
DNA repair replication
chaperone/protein folding
proteolysis
tRNA processing
DNA repair/ replication
secretion
competence
cell division cell cycle
regulation
Hypothetical
ATP/DNA binding
unknown
transcription
stress response
secretion
antibiotic resistance
Target identification - Protocol 1
1. Cluster analysis: take all proteins common to all 25 genomes
• Common = most conserved proteins
2. Compute and compare inter-cluster distances for Hhae vs Hinf
• Look for species-specific patterns
• Look into including unique genes
If a pistol just isn't working for you . . .
Protocol II:
BLAST Everything
Our method: for every unique Hhae gene, we located its corresponding contig.
• We checked the flanking regions (on the contig) for conserved genes.
• We then located those conserved genes in the Hinf genome to see whether they are adjacent.
• Because BLASTn searches cast a wide net, this includes homologs.
[Flowchart]
1. Start with a set of Hhae genes.
2. Select a unique Hhae gene from the set.
3. Search the set of common Hhae/Hinf (conserved) genes for genes in the flanking regions.
4. Is there at least one conserved gene in each flanking region? NO: reject the gene and start over. YES: continue.
5. Get the locations of the conserved flanking genes in the Hinf genome.
6. Are the conserved genes adjacent or “close enough” in the Hinf genome? NO: reject the gene and start over. YES: we found a (more) unique gene!
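The screening loop in the flowchart can be sketched as follows. This is an illustration under stated assumptions, not the group's pipeline: flanking genes are given as lists ordered nearest-first, Hinf locations are gene-order indices, and "close enough" is modelled by a hypothetical `max_gap` parameter.

```python
def screen_candidates(unique_genes, conserved, hinf_position, max_gap=2):
    """Return the unique Hhae genes that pass both flowchart checks.

    unique_genes  : dict gene -> (left_flank, right_flank), each a list of
                    gene names on the contig, nearest gene first
    conserved     : set of genes common to Hhae and Hinf
    hinf_position : dict conserved gene -> gene-order index in Hinf
    max_gap       : assumed meaning of "close enough" (in gene-order units)
    """
    accepted = []
    for gene, (left, right) in unique_genes.items():
        left_hits = [g for g in left if g in conserved and g in hinf_position]
        right_hits = [g for g in right if g in conserved and g in hinf_position]
        if not left_hits or not right_hits:
            continue  # no conserved gene on one side: reject and move on
        gap = abs(hinf_position[left_hits[0]] - hinf_position[right_hits[0]])
        if max_gap >= gap:
            accepted.append(gene)  # flanks are adjacent or close in Hinf
    return accepted
```

A gene is kept only when both flanks carry a conserved gene and those two genes sit next to each other (within `max_gap`) in Hinf, which is the situation that yields PCR products of different size in the two species.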
PCR Assay: Results
Target 1
[Figure: gene map of the Target 1 region in Hh and NTHi. Genes labeled: 50S ribosomal protein; fatty acid/phospholipid synthesis protein; nucleic acid binding protein (hypothetical); 3-oxoacyl-(acyl carrier protein) synthase III. Region sizes shown: 1020 bp and 170 bp. PCR product sizes shown: 1250 bp and 380 bp.]
No duplication was found for these genes.
PCR Assay: Results
Target 2
[Figure: gene map of the Target 2 region in Hh and NTHi. Genes labeled: fructose-bisphosphate aldolase; predicted membrane protein; purine nucleoside phosphorylase. Sizes shown for both the region and the PCR product: 1451 bp and 1934 bp.]
Target validation by in silico PCR
Step 1: Multiple sequence alignment with ClustalW2 - Overview
[Figure: alignment overview with positions 1, 870, 1775 and 2749 marked. Sequences: non-typeable H. influenzae (19 strains + 1 typeable) and H. haemolyticus (5 strains). Target 1: 905 nts.]
Step 2: Phylogenetic analysis
Neighbor-joining tree by percentage identity, built with Jalview
Step 3: Finding primers
[Figure: alignment of non-typeable H. influenzae (20 strains) and H. haemolyticus (5 strains) with positions 1, 870, 1775 and 2749 and the primer sites marked.]
Forward: 5’-CTCACTTACGCCACCACGTA-3’
Reverse: 3’-TGCAACAATAATCAGTTCAATATCT-5’
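For readers unfamiliar with in silico PCR, the idea can be sketched in a few lines: find the forward primer on the template, find the site where the reverse primer anneals downstream, and report the product length. This sketch uses exact matching only and expects both primers written 5'->3', so it is not a substitute for the tool used in the slides. Note that the reverse primer above is written 3'->5' and would need to be re-oriented first. Function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def in_silico_pcr(template, fwd_5to3, rev_5to3):
    """Return (start, stop, product_length) in 1-based coordinates, or None.

    Exact matches only; real primers tolerate mismatches, which this
    sketch deliberately ignores.
    """
    template = template.upper()
    fwd = fwd_5to3.upper()
    # The reverse primer anneals to the top strand where its reverse
    # complement occurs.
    rev_site = reverse_complement(rev_5to3)
    start = template.find(fwd)
    if start == -1:
        return None
    site = template.find(rev_site, start + len(fwd))
    if site == -1:
        return None
    stop = site + len(rev_site)
    return (start + 1, stop, stop - start)
```

Running the function over each genome and tabulating the product lengths gives the kind of per-strain comparison shown on the following slides.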
In silico PCR Analysis
Non-typeable H. influenzae AAZD00000000: product length 487
H. haemolyticus M21621: product length 1354
In silico PCR Analysis
Sequence (5'->3'): FORWARD
STRAIN      PRODUCT LENGTH   START   STOP
M19107      1363             654     673
M19501      1364             654     673
M21127      1363             654     673
M21621      1354             655     673
M21709      487              654     673
CP000671    489              653     672
L42023      487              654     673
MSA – Target 2
[Figure: alignment overview spanning positions 1 to 5372 for non-typeable H. influenzae (20 strains) and H. haemolyticus (5 strains).]
Goal 2
Comparative genomic analysis
Horizontal Gene Transfer
• Horizontal gene transfer (HGT), also called lateral gene transfer (LGT), is the transfer of genetic material between organisms other than from parent to offspring.
Alien Hunter
• Predicts putative horizontally transferred regions.
• Standalone software
• Available at http://www.sanger.ac.uk/Software/analysis/alien_hunter
Usage:
./alien-hunter <input_file> <output_file>
INPUT: raw genomic sequence
PREDICTION: HGT regions based on Interpolated Variable Order Motifs (IVOMs)
.sco file
• Last time, we got many hits with varied scores that covered almost 90% of the genes in each genome, so we decided to place a threshold on the scores.
• We studied the score distribution for each genome by plotting a histogram of its scores.
• After studying all the histograms, we set the threshold at a score >70.
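The thresholding step can be sketched as below. The sketch assumes the predicted regions have already been parsed into `(start, end, score)` tuples. Parsing the Alien Hunter output itself is not shown, and the bin width is an arbitrary choice for illustration.

```python
from collections import Counter

def score_histogram(scores, bin_width=10):
    """Count scores per bin; each key is the lower edge of its bin."""
    return Counter(int(s // bin_width) * bin_width for s in scores)

def filter_regions(regions, threshold=70.0):
    """Keep (start, end, score) regions scoring strictly above threshold."""
    return [r for r in regions if r[2] > threshold]
```

Inspecting `score_histogram` per genome is what motivates a cut-off. `filter_regions` then applies it, with the >70 value from the slide as the default.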
Screenshot of M21621
[Figure: bar chart of HGT gene count (y-axis, 0 to 6000) for genomes 1 to 6, before and after filtering.]
Genome   Before filtering   After filtering
1        1699               360
2        2709               145
3        3065               225
4        5601               253
5        1140               108
6        2717               185
Insertion elements
• An insertion element is a short DNA sequence that acts as a simple transposable element.
• A transposable element (TE) is a DNA sequence that can change its relative position (self-transpose) within the genome of a single cell. The mechanism of transposition can be either "copy and paste" or "cut and paste".
IS Finder
FASTA sequences
We retrieved the FASTA sequences by submitting the accession IDs to NCBI.
BLAST
We ran BLAST with these insertion sequences against each strain to get their locations in that strain.
A Perl script was written to extract the insertion sequences from their respective contigs in each strain.
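The extraction step might look like the sketch below, written in Python rather than Perl. It is not the original script. It assumes the contigs are already loaded into a dictionary and that hit coordinates are 1-based and inclusive, with start greater than end signalling a minus-strand hit (the convention BLAST uses for subject coordinates). The function name is hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def extract_hit(contigs, contig_id, start, end):
    """Return the hit sequence from a contig.

    contigs    : dict contig ID -> sequence string
    start, end : 1-based inclusive coordinates; start > end means the
                 hit lies on the minus strand, so the reverse complement
                 is returned
    """
    seq = contigs[contig_id]
    lo, hi = min(start, end), max(start, end)
    sub = seq[lo - 1:hi]
    if start > end:
        return sub.translate(_COMPLEMENT)[::-1]
    return sub
```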
Comparative Analysis Table
Feature                  Tools          M19107    M19501    M21127    M21621    M21639    M21709    Average
Genome size              -              1774129   1809865   2029793   1959123   2397857   1808157   1963154
GC content %             IGIPT          39.39     38.23     39.11     39.11     38.30     38.73     38.81
Total number of genes    -              1973      1785      2086      1923      2669      1840      2046
Operons                  -              76 (27)   69 (14)   71 (25)   79 (21)   87 (36)   70 (14)   73
Virulence factors                       124       115       116       115       124       144       115
HGT gene count           Alien Hunter   360       145       225       253       108       185       213
Pathogenic               -              No        No        Yes       Yes       Yes       No        -
Insertion elements       IS Finder      6         6         -         -         15        -         4.5
Hemolytic activity       -              Y         N         Y         Y         N         N         -
M19107 – Circular alignment using BRIG
M19501 – Circular alignment using BRIG
M21127– Circular alignment using BRIG
M21621– Circular alignment using BRIG
M21639 – Circular alignment using BRIG
M21709 – Circular alignment using BRIG
Goal 3
Identification and
Characterization of Haemolysin
in Hhae
AIM #1
Look for the hemolysin BA operon in the H. haemolyticus strains and characterize it as present or absent in the hemolytic and non-hemolytic strains
HEMOLYSIN
• Haemophilus ducreyi requires two adjacent genes, hhdB and hhdA, for hemolysis.
• hhdB is an outer membrane protein required for secretion and activation of the hemolysin structural protein, hhdA.
• Once secreted, hhdA interacts with target cell membranes, oligomerizes, and forms pores 2.5 to 3.0 nm in diameter, which lyse the target cell.
TWO PROTEIN SECRETION SYSTEM
OUR STRATEGY
• Downloaded the FASTA files of all hemolysin protein sequences of the Pasteurellaceae family from the NCBI protein database.
• Ran BLAST with the predicted protein sequences of the six strains against these.
Cut-off thresholds: identity 70%, coverage 80%
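Applying the two cut-offs to BLAST results can be sketched as follows. The sketch assumes the hits are in the default 12-column tabular layout of BLAST+ (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and that query lengths are supplied separately. The slides do not say which output format was used or how coverage was computed, so both are assumptions here.

```python
def filter_hits(lines, query_lengths, min_identity=70.0, min_coverage=80.0):
    """Keep hits meeting both the identity and query-coverage cut-offs.

    lines         : iterable of tab-separated BLAST tabular rows
    query_lengths : dict query ID -> query length (residues)
    Returns a list of (qseqid, sseqid, pident, coverage_percent).
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        # assumed definition: aligned query span over full query length
        coverage = 100.0 * (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if pident >= min_identity and coverage >= min_coverage:
            kept.append((qseqid, sseqid, pident, round(coverage, 1)))
    return kept
```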
RESULTS
Haemophilus haemolyticus M19107 (hemolysis: Y)
  Gene A / contig: 51_11|7343|11596, 1417 amino acids, ZP_09185204.1 hemolysin [Haemophilus parainfluenzae]
  Gene B / contig: 51_1216|11855|13366, 503 amino acids, ZP_09185203.1 hemolysin activation/secretion protein [Haemophilus parainfluenzae]
Haemophilus haemolyticus M21127 (hemolysis: Y)
  Gene A / contig: 20_113|106934|111307, 1457 amino acids, ZP_09185204.1 hemolysin [Haemophilus parainfluenzae]
  Gene B / contig: 20_112|105150|106760, 536 amino acids, ZP_09185203.1 hemolysin activation/secretion protein [Haemophilus parainfluenzae]
Haemophilus haemolyticus M21621 (hemolysis: Y)
  Gene A / contig: 1_361|369207|373580, 1457 amino acids, ZP_09185204.1 hemolysin [Haemophilus parainfluenzae]
  Gene B / contig: 1_362|373754|375349, 531 amino acids, ZP_09185203.1 hemolysin activation/secretion protein [Haemophilus parainfluenzae]
Haemophilus haemolyticus M19501 (hemolysis: N)
  Gene A / contig: None
  Gene B / contig: None
Haemophilus haemolyticus M21639 (hemolysis: N)
  Gene A / contig: None
  Gene B / contig: None
Haemophilus influenzae M21709 (hemolysis: N)
  Gene A / contig: None
  Gene B / contig: None
All hits had at least 70% identity and 95-100% coverage.
AIM# 2
• Characterize the domains/motifs/residues in hemolysin.
• Depict the secondary structures in hemolysin.
• Predict the 3D structure of hemolysin.
SIGNAL PEPTIDE & HAEMAGGLUTINATION
ACTIVITY DOMAIN
[Figure: N-terminal region showing the signal peptide followed by the haemagglutination activity domain.]
• A signal peptide (25 aa) transports the hemolysin to the outer membrane or periplasm. LipoP predicts an SPase I cleavage site at 25-26, so it is not a lipoprotein.
• Haemagglutination activity domain: suggested to be a carbohydrate-dependent haemagglutination activity site, found in a range of haemagglutinins and haemolysins.
HAEMAGGLUTININ REPEAT
• The haemagglutinin repeat is a highly divergent repeat that occurs in a number of proteins implicated in cell aggregation
TPS DOMAIN
All TPS-secreted proteins contain a distinctive N-proximal module essential for secretion, the TPS domain. TpsA proteins display two conserved regions, C1 and C2, and two less-conserved (LC) regions. The ANPNL and NPNGIS motifs are found in this region.
TpsA proteins include the hemolysins/cytolysins ShlA of Serratia marcescens, HpmA of Proteus mirabilis, EthA of Edwardsiella tarda and HhdA of Haemophilus ducreyi, the large supernatant proteins LspA1 and LspA2 of H. ducreyi, and the HecA adhesin of E. chrysanthemi.
Clantin et al., 2004. The crystal structure of filamentous hemagglutinin secretion domain and its implications for the two-partner secretion pathway. PNAS.
Does the TPS domain exist in H.haemolyticus strains?
[Figure: comparison of the TPS domain from H. haemolyticus (H.H) strains 21127, 21621 and 19107 with Fha30, EthA, HpmA, LspA1, LspA2, ShlA and HhdA.]
CONSERVED RESIDUES IN TPS DOMAIN
ANPNL
NPNLGI
The NPNL and NPNGI motifs form type I β-turns, which might play important stabilizing roles. The conserved residues of the TPS domain serve to drive its folding into a β-helix and to stabilize the helix.
TPS
HAD 39-159 (Pfam) or TPS 39-270
STRATEGY SECONDARY STRUCTURE
AIM #3
• Identify the domains in the hemolysin
activator gene
• Determine the secondary and 3D structure
of hemolysin activator gene
HEMOLYSIN ACTIVATOR PROTEIN
TRANSMEMBRANE PROTEIN
MEMBRANE PROTEINS
α-helical
β-barrel
Proteins of the β-barrel membrane protein class are located in the outer membrane of Gram-negative bacteria.
Their membrane-spanning segments are formed by antiparallel β-strands, creating a barrel-shaped channel that spans the outer membrane.
DOMAINS IN HEMOLYSIN ACTIVATOR
[Figure: domain layout: SP, POTRA_2, Activator Domain]
SP (LipoP): SPase I cleavage site between positions 19 and 20; not a lipoprotein.
POTRA_2: polypeptide-transport-associated domain. In ShlB this domain has a chaperone-like function over ShlA.
Activator domain: in ShlB it has been shown to interact with ShlA during secretion and to impose a conformational change in ShlA that forms the active hemolysin.
ShlA/B: Serratia marcescens
Strategy
• Prediction of TransMembrane Beta Barrels (PRED-TMBB)
The method is powerful for discrimination purposes, as it can discriminate outer membrane proteins from water-soluble ones with high accuracy in large datasets.
• The 'TransMembrane protein Re-Presentation in 2 Dimensions' tool automates the creation of uniform, two-dimensional, high analysis graphical images/models of alpha-helical or beta-barrel transmembrane proteins.
Work Flow
[Figure: workflow from sequence to discrimination score.]
Step 1: Find Discrimination Score
Haemophilus haemolyticus M19107 – hemolytic
Contig: 51_1216, Start/End: 11855/13366, Length: 503 amino acids
Step 2: Predicted 2D structure of hemolysin activator
Haemophilus haemolyticus M19107 – hemolytic
[Figure: predicted 2D topology with the cytoplasmic side labeled, colored by hydrophobic potential from more hydrophobic (+) to hydrophilic (-).]
3D STRUCTURES
STRUCTURE PREDICTION METHODS
• Homology modelling: requires a template with a high percentage identity
• De novo protein structure prediction: models the structure based on general principles that govern protein folding energetics
• Protein threading: works when homology modelling fails
PROTEIN THREADING APPROACH
• We followed a protein threading approach to predict the structure of the target protein sequence.
• In practice, when sequence identity in an alignment is low (i.e. <25%), homology modeling may not produce a significant prediction. If distant homology is found for the target, protein threading can still generate a good prediction.
• Protein threading, also known as fold recognition, is a protein modeling method (i.e. computational protein structure prediction) for proteins that have the same fold as proteins of known structure but have no homologous proteins of known structure.
• We used the RaptorX server, which predicts structure by protein threading.
3D STRUCTURES OF HEMOLYSIN FROM RaptorX
Segment 1 (37-158)
Segment 2 (159-1457)
3D STRUCTURES OF ACTIVATOR FROM RaptorX
[Figure: domain layout: SP, POTRA_2, Activator Domain]
Segment 3 (1-66)
POTRA_2 Domain: Segment 2 (67-136)
Activator Domain: Segment 1 (149-500)
POTRA DOMAIN
POLYPEPTIDE-TRANSPORT-ASSOCIATED DOMAIN
[Figure: domain layout: SP, POTRA_2, Activator Domain]
• POTRA domains have a similar structure, comprising a three-stranded β-sheet overlaid with a pair of antiparallel helices.
[Figure: E. coli POTRA domain]
Kim et al., 2007. Structure and Function of an Essential Component of the Outer Membrane Protein Assembly Machine. Science, vol. 317.
Goal 4
Identify and characterize the
potential virulence factors in
Haemophilus haemolyticus
Human Immune System
An immune system (IS) is a system of biological structures and processes within an organism that protects against disease.
To function properly, the IS must detect a wide variety of agents, from viruses to parasitic worms, and distinguish them from the organism’s own healthy tissue.
Pathogens can rapidly evolve and adapt to new environments to avoid
detection and destruction by the immune system:
Phase variation
Phase variation
Phase variation is defined as the random switching of phenotype at frequencies that are much higher (sometimes >1%) than classical mutation rates.
It is a widespread source of intraspecific genotypic and phenotypic variation.
Bacteria exploit several different mechanisms to switch gene and/or protein expression “on” or “off”.
Combinatorial math:
A bacterium with just 20 phase-variable loci can exist in 2^20 different states (more than a million).
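The combinatorial claim is easy to check: each phase-variable locus is either on or off, so n independent loci give 2^n combinations. A minimal sketch (function names are illustrative only):

```python
from itertools import product

def state_count(n_loci):
    """Number of on/off combinations for n independent loci."""
    return 2 ** n_loci

def enumerate_states(n_loci):
    """List every on/off combination; only practical for small n."""
    return list(product(("off", "on"), repeat=n_loci))
```

`state_count(20)` evaluates to 1048576, which is the "more than a million" on the slide.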
Multiple defense mechanisms have evolved to recognize and neutralize pathogens:
Immune System
• Innate
  - Non-specific response to microorganisms or toxins found in the cell
  - Microbes are identified by pattern recognition receptors, which detect molecules such as LPS
  - Generic response
  - Does not confer long-lasting immunity against the pathogen
  1. Inflammatory response
  2. Activation of the complement system (major component of the innate IS; immune-evasion target)
  3. Antimicrobial peptides
• Adaptive
  - Creates immunological memory after an initial response to a specific pathogen
Complement System
Biochemical cascade that attacks the surfaces of foreign cells
It can contain over 20 different proteins
Complements the killing of pathogens by antibodies:
- produces peptides that attract immune cells
- opsonizes (coats) the surface of a pathogen, marking it for destruction
- increases vascular permeability
Host Immune Evasion
Microorganisms have developed many ways to evade complement actions:
• Trapping endogenous C1 inhibitor
• Inactivating antibodies through capture of their Fc regions
• Mimicking structural regulators
• Degrading crucial components of the complement system
How did we do the analysis?
1. Search for VFs in the Haemophilus genus in the Virulence Factor Database: 132 VFs were retrieved.
2. Search NCBI (RefSeq protein sequence database) for all VFs in the Haemophilus genus.
3. Blastp against all 25 Haemophilus genomes, with constraints: at least 40% identity and at least 70% query coverage.
4. Build a presence/absence matrix: +3 = presence, -3 = absence.
5. Upload the matrix to MeV: a heat map was built and an HCL was generated.
Results….
HCL: Hierarchical Cluster
Distance Metric: Pearson correlation
Linkage method: Average linkage clustering
132 virulence factors analyzed
25 samples: 19 NTHi
5 Hhae
1 THi
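The +3/-3 encoding and the distance used for clustering can be sketched in plain Python. The clustering in the slides was done in MeV, not with this code. The sketch takes the Pearson distance as 1 - r, a common convention, and treats a constant row (a factor present or absent everywhere, where r is undefined) as uncorrelated. Both choices are assumptions.

```python
from math import sqrt

def presence_matrix(hits, factors, genomes, present=3, absent=-3):
    """Rows = virulence factors, columns = genomes.

    hits : set of (factor, genome) pairs that passed the blastp filters
    """
    return [[present if (f, g) in hits else absent for g in genomes]
            for f in factors]

def pearson_distance(x, y):
    """1 - Pearson correlation between two equally long rows."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 1.0  # assumption: constant rows count as uncorrelated
    return 1.0 - cov / sqrt(var_x * var_y)
```

Average linkage then merges, at each step, the two clusters whose members have the smallest mean pairwise distance. Rows with identical presence patterns have distance 0 and end up adjacent in the heat map.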
Cluster A : Uniform/regular pattern
Cluster B : heterogeneous/ irregular pattern
Cluster A : Uniform/regular pattern
• Transferrin-binding protein 1
• tad locus
• LPS biosynthesis
• tad locus
• Type IV pili
• Adherence
• Exopolysaccharide
• LPS biosynthesis
• Haemophilus iron transport locus
• Hemoglobin and hemoglobin-haptoglobin binding proteins
• Heme biosynthesis
• tad locus
• Cytolethal distending toxin
Cluster B : heterogeneous/irregular pattern
Btuc: vitamin B12 receptor protein, E. coli
rfaC: 1,5 heptosyltransferase I
ABC_T: ATP-binding cassette transporter family
hptE: lipopolysaccharide heptosyltransferase I
IgA1: immunoglobulin A protease
hxuA, hxuB, hxuC: heme/hemopexin-binding complex
FepA: ferric enterobactin receptor, E. coli
Prevalence Ratio Analysis
Patho vs Asymptomatic - Adherence
HAEMAGGLUTINATING PILI (hifABCDE operon)
Function
• Promote adherence to respiratory mucus and human oropharyngeal epithelial cells
• Facilitate colonization
Mechanism:
Binding to the Anton antigen (An-Wj), common to buccal epithelial cells and erythrocytes
Prevalence Ratio analysis
Patho vs Asymptomatic - Adherence
HAEMAGGLUTINATING PILI (hifABCDE operon)
Role in virulence
• Expression of pili is phase-variable
• Variation in the number of (TA) repeat units within the overlapping promoter region of hifA and hifB regulates transcription
• 11 repeat units – reduced expression
• 10 repeat units – maximal expression
• 9 repeat units – transcriptional silencing
Found in M19501 (5,4), AAZD00000000 (9), AAZE00000000 (9), AAZJ00000000 (5)
Pathogenic – Hi F3047 strain (9)
In M19107, only hifA and hifB were found, split across two contigs, so there is no information on TA repeats.
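Counting the (TA) repeat units in a promoter sequence can be sketched with a regular expression. The sketch takes the longest uninterrupted run of TA as the repeat count and maps it to the three expression states listed on the slide. Any other count is reported as not covered. The function names and the use of the longest run are assumptions for illustration.

```python
import re

# Mapping taken from the slide; other repeat counts are not described there.
EXPRESSION = {
    11: "reduced expression",
    10: "maximal expression",
    9: "transcriptional silencing",
}

def longest_ta_run(promoter):
    """Number of TA units in the longest uninterrupted (TA)n run."""
    runs = re.findall(r"(?:TA)+", promoter.upper())
    return max((len(run) // 2 for run in runs), default=0)

def predicted_expression(promoter):
    """Look up the slide's expression state for the promoter's repeat count."""
    return EXPRESSION.get(longest_ta_run(promoter), "not covered by the slide")
```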
Prevalence Ratio Analysis
Patho vs Asymptomatic - Adherence
High Molecular Weight Protein 1/2
Function
•Adhesins that mediate attachment to human
epithelial cells
Structure features
• Autotransport protein
• Secretion of these adhesins (HMW1A/HMW2A) requires accessory proteins called HMW1B/HMW2B and HMW1C/HMW2C
Prevalence Ratio Analysis
Patho vs Asymptomatic-LOS
[Figure: LOS structure with the genes rfaC, lic3A, lex2B, siaA and lgtA marked.]
rfaC
Characteristics:
The LOS biosynthesis pathway produces a branched oligosaccharide attached to lipid A via two 3-deoxy-D-manno-2-octulosonic acid (KDO) molecules.
The rfaC gene product adds the first heptose (Hep I) to KDO.
rfaC mutants have been shown to produce truncated LPS.
rfaC is absent in all H. influenzae but present in all H. haemolyticus.
rfaC mutants have also been shown to decrease haemolytic activity and expression in E. coli.
So retaining rfaC may help Hhae retain its haemolytic activity.
Prevalence Ratio Analysis
Patho vs Asymptomatic-LOS
[Figure: LOS structure with the genes rfaC, lic3A, lex2B, siaA and lgtA marked.]
Sialic Acid Transporter
Lic3A, SiaA or LsgB
Characteristics:
Sialic acid is added as the terminal non-reducing sugar to LOS, which is important for bacterial virulence.
These genes code for sialyltransferases, which incorporate sialic acid into LOS.
In the absence of this transporter, Hinf cannot survive when exposed to serum.
Found to be absent in all H. haemolyticus.
Prevalence Ratio Analysis
Patho vs Asymptomatic-LOS
[Figure: LOS structure with the genes rfaC, lic3A, lex2B, siaA and lgtA marked.]
Phase-variable glycosyltransferases
Lex2B, LgtB
Characteristics:
Contribute to the significant intrastrain heterogeneity of lipopolysaccharide (LPS) composition in H. influenzae, and show phase-variable expression.
Found to be absent in all H. haemolyticus.
Prevalence Ratio Analysis
Patho vs Asymptomatic- Immuno-evasion
• No difference in pattern observed in this case!
Prevalence Ratio Analysis
Patho vs Asymptomatic- Iron acquisition
HxuABC
HxuA binds to heme-hemopexin
HxuB releases HxuA from the cell surface into the medium
HxuC is involved in the transport of heme within the cell
Function
• Uses host heme-hemopexin as the source of heme iron for growth
Absent in all H. haemolyticus
YadA – Yersinia adhesin A; a potent serum resistance factor, as it inhibits the classical pathway of complement
Fba – fibronectin-binding protein, Streptococcus spp.
IgA1 – serine protease; cleaves IgA1
Protein E – adhesin; captures vitronectin (Vn), which prevents the formation of the MAC
MAC (Membrane Attack Complex): forms transmembrane channels that disrupt the phospholipid bilayer of target cells, leading to cell lysis and death.
[Figure labels: IgA1; mrsA, glmM, galU, galE, manA, manB; Protein E]
Is H. haemolyticus an opportunistic or pathogenic bacterium ?
[Figure: spectrum from environmental to opportunistic to pathogenic, set against the innate and adaptive IS, with the factors IgA1, ompP2, Protein E, exopolysaccharides, Btuc, YadA, HxuABC, rfaC/hptE, FepA and fba.]
DISCUSSION
Hydrogenase-4 10-gene operon
• First identified in E. coli in 1997
• hyfABCDEFGHIJ - hyfR (transcriptional activator)
• The proteins encoded by the hyf operon are proposed to constitute a
proton-translocating formate hydrogenlyase
• Hyf catalyzes dihydrogen production and ion transport when the cells
are grown at a starting pH of 7.5
• This operon is silent in E. coli; Hyd-3 is the active H2-evolving operon.
Hydrogenase and Virulence
• According to recent studies, Hyd-4 in Yersinia enterocolitica helps in gut colonization
▫ Using H2 produced during fermentation by intestinal microflora
• Hydrogenases also facilitate respiratory hydrogen use by Salmonella enterica, which is considered essential for virulence
• So understanding the expression and role of this hydrogenase operon could provide critical information for
▫ Characterization of Hhae
▫ Understanding a possible new mode of virulence in Hhae
ABC transporter system
• ATP-binding cassette (ABC) transporter system
• One of the largest protein families.
• Found in all species and are evolutionarily
related.
• Functionally diverse and have roles in a wide
range of important cellular functions.
Structural schema
• Bacterial genomes encode different numbers of ABC transporters, and these numbers correlate with their lifestyles. This suggests that bacterial ABC transporters are likely to be necessary for growth and/or survival of the bacteria in their ecological niches
ABC Transporters and Virulence
• Virulence associated with uptake of nutrients
▫ Polyamine, glutamine, sugar
• Virulence associated with uptake of metal ions
▫ such as iron, zinc, and manganese
• Virulence associated with cell attachment
• ABC transporter (outer membrane) proteins are sometimes immunogenic as well.
• Given the role of the ABC system in virulence, certain components could be potential targets for vaccine development.
• In our case, one of the closest homologs is the SalX gene from Pasteurella multocida, which is an ABC-type antimicrobial peptide transport system, ATPase component.
• Hence, characterizing this ABC transporter system could be insightful.