MCB 428
Fall 2007
LABORATORY EXERCISE 7: PATHOGEN BIOINFORMATICS ON THE WEB
(25 POINTS)
Exercise developed by: Lina Dovilas
Edited by: Dr. Gail K. Grabner
The goal of this exercise is to familiarize students with web-based tools that are
available in the study of various pathogens. In the lab on gastrointestinal pathogens
just completed, we reviewed some of the more common food- and water-borne
pathogens. In this exercise we will look in greater detail at another of these
enteropathogens.
Listeria monocytogenes is a Gram-positive, facultative, highly motile rod that is
commonly found in soil, in water, on plants, and in sewage. It is a prominent food-borne
pathogen and the cause of listeriosis. In most healthy adults, L.
monocytogenes infections are mild or asymptomatic. In fact, 5-10% of the adult
population carries this bacterium as part of its normal microflora. However, individuals
who are immunocompromised in some way are susceptible to listeriosis.
Additionally, L. monocytogenes is one of only a few microorganisms that can cross the
placenta and infect a fetus in utero.
L. monocytogenes enters host epithelial cells by means of surface D-galactose
residues, which bind to D-galactose receptors on the host cell. This is the reverse
of the adherence method used by most pathogens, which bind to carbohydrate
residues on the host cell. Phagocytic engulfment is induced by means of the protein
internalin. Once the bacterium is inside the phagosome, the acidic environment stimulates
synthesis of listeriolysin O (LLO). LLO allows L. monocytogenes to escape the phagosomal
vacuole and enter the nutrient-rich cytosol, where it can replicate. Listeriolysin O is
a thiol-activated cytolysin (pore-forming toxin) that is most active in acidic conditions.
This virulence factor allows the bacteria to escape before a phagolysosome forms,
without lysing the host plasma membrane. In the absence of LLO, virulence is
attenuated. After release into the cytosol, expression of the surface protein ActA induces
actin polymerization, producing comet tails that allow Listeria monocytogenes to
propel itself into an adjacent cell. In this manner, it can infect and replicate in other cells
without ever being exposed to the extracellular environment, thereby rendering it
invisible to the immune system. This actin polymerization is also the means by which
the bacterium is transmitted through the placenta. All of the Listeria strains listed in
the database that we will use in the exercise are virulent except Listeria innocua and
Listeria welshimeri.
The National Microbial Pathogen Data Resource (NMPDR) is an online database
containing genome information including genomes of Archaea, Bacteria and Eukarya.
NMPDR specializes in a few specific core pathogens: Campylobacter, Listeria,
Staphylococcus, Streptococcus, and Vibrio. By using comparative tools for analyzing a
specific organism, researchers can use this information to locate drug targets or
possible vaccines. The type of data that can be researched on NMPDR includes
complete genomes, populated subsystems, functional clusters, SigGenes, and BLAST
hits to get a table of bidirectional best hits.
After opening your browser, go to the following link: http://www.nmpdr.org/. This is the
main webpage for NMPDR. To search for Listeria monocytogenes, click on ‘Listeria’ on
the left navigation bar, under ‘Organism Data Summaries.’ This will take you to the
Listeria main page.
This page also contains external source links for more information via PathInfo, uBio
Journals, Google News, and Chain Link. PathInfo provides basic information concerning
the target organism, such as taxonomy, lifestyle and morphology. It also includes
microscopic photographs of the organism. uBio Journals is a continually updated
source providing the most current scientific literature on Listeria. Chain Link provides
sources for purchasing the strains and microarrays.
Click on PathInfo. Here you will see information about the organism and associated
references in outline format. As you scroll down, you will find a table under the heading
“Listeria genome sequence annotation status.” This table lists all of the strains of
Listeria recorded in NMPDR. Included in the listing for each strain are the serotype,
genome size, and number of protein-encoding genes (PEGs). Serotypes 1/2a, 1/2b, and 4b are
known to be the major strain types that cause food-borne listeriosis. The next four
columns list the number and percentage (of total genes) of named or hypothetical genes
in or not in subsystems. Hypothetical genes are those that do not have defined roles. A
subsystem consists of genes that have related functional roles (e.g. genes involved in
the biosynthesis of the amino acid glycine).
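As a rough illustration of how the four percentage columns relate to the total gene count, the Python sketch below splits a genome's PEGs into the four categories and reports each as a share of the total. The counts are invented for illustration and are not NMPDR values.

```python
# Hypothetical counts for one strain; these are NOT real NMPDR values.
counts = {
    ("named", "in subsystem"): 1200,
    ("named", "not in subsystem"): 800,
    ("hypothetical", "in subsystem"): 100,
    ("hypothetical", "not in subsystem"): 900,
}


def category_percentages(counts):
    """Return each category's share of the total gene count, in percent."""
    total = sum(counts.values())
    return {key: round(100.0 * n / total, 1) for key, n in counts.items()}


percentages = category_percentages(counts)
for (naming, membership), pct in percentages.items():
    print(f"{naming:12s} {membership:17s} {pct:5.1f}%")
```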
Scroll down to Listeria monocytogenes 1/2a F6854, the strain that contains the gene for
LLO (mentioned above). Click on the name to open a basic statistics page that lists the
aforementioned information. The number of contigs (contiguous stretches of
DNA sequence data) and the number of CDSs (coding sequences, also known as PEGs) are also listed.
Click on Show Subsystems. This gives a complete list of the subsystems and the roles
of the genes within each subsystem. We will return to this strain later in the exercise.
Go back to the previous page and click on show reactions. Now you see the same list
of subsystems, but with the addition of the known or possible reactions associated with
each subsystem. The numbers prefixed with the letter “R” provide links to the KEGG
database, which gives more details of the reactions and the chemical structures of the
compounds involved. These reactions reveal the metabolic capacity of the cell – a kind
of metabolic fingerprint. Our gene of interest does not play a role in metabolism so it is
not included in any of these reactions.
Click on the NMPDR icon on the top left corner of your screen to return to the home
page. Then click on ‘Listeria’ to return to the Listeria home page. At the top of the page
you will see a heading marked ‘Virtual structural proteome.’ Click on table to the right of
this heading. This link lists homologs in the Protein Data Bank (PDB), a database of
protein structures. You can read the introductory paragraph at the top of the page to better
understand this link and its purpose. Click on Drug Targets. Recall that NMPDR was
created for the purpose of comparing genomes to locate possible drug, antitoxin, and
vaccine targets. Scroll down (or search with Ctrl+F) to find Listeria monocytogenes 1/2a
F6854 fig|267409.1.peg.1504 Thiol-activated cytolysin/ listeriolysin O (LLO). (This is
the same strain we examined earlier in the exercise.) Click on NMPDR. This link
contains all the specific information for Listeria monocytogenes 1/2a F6854
fig|267409.1.peg.1504 Thiol-activated cytolysin/ listeriolysin O (LLO). Note that this is
the virulence gene that encodes listeriolysin, described in the introduction to this
exercise. Directly below the NMPDR designation for this gene is the NCBI Taxonomy
ID (267409). This page shows a graphical display of the genomic context of the protein-encoding gene of interest (the "focus peg"), centered in a 10–12 kbp region.
In contrast to the other members of the cholesterol-dependent cytolysin (CDC) family, LLO
contains a PEST-like (Pro-Glu-Ser-Thr) sequence at the N-terminus. This sequence
determines whether the host cell is lysed. A bacterium whose LLO contains the PEST
sequence does not lyse the host plasma membrane. A bacterium whose LLO lacks
the PEST sequence lyses the membrane, exposing Listeria
monocytogenes to host defenses.
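One way to picture the search for a PEST-like stretch is a sliding window that scores each segment by its fraction of P, E, S, and T residues. The Python sketch below is a simple heuristic written for this worksheet, not an established PEST-finding algorithm, and the toy sequence is invented (it is not the LLO sequence).

```python
def pest_fraction(window):
    """Fraction of residues in the window that are P, E, S, or T."""
    return sum(residue in "PEST" for residue in window) / len(window)


def richest_pest_window(sequence, width=21):
    """Return (start, end, window) for the first window richest in P/E/S/T.

    start and end are 1-based residue numbers, counted from the N-terminus.
    """
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the window width")
    best = None
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = pest_fraction(window)
        if best is None or score > best[0]:
            best = (score, i + 1, i + width, window)
    return best[1], best[2], best[3]


# Made-up toy sequence (NOT the real LLO sequence), scanned with a short window.
toy = "MKKAVLGAKDAPSPPSSPETPSSPKAVLGMKKAVLAGKDLMNA"
start, end, window = richest_pest_window(toy, width=10)
print(start, end, window)
```

With a real protein sequence pasted in place of the toy string, the default width of 21 matches the length asked for in question 1.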
Click on show to the left of ‘Protein Sequence.’ (The sequence is written N to C.)
1. (1 point) Look at the Protein Sequence and locate a PEST-like sequence 21 amino acids long. Look for mostly P and S, though there is also one E and one T.
Write the sequence below, including beginning and ending residue numbers.
type answer here
2. (2 points) How could this sequence be used as a drug target?
type answer here
Now look for the sequence ECTGLAWEWWR near the C-terminus. The relevance of this
will be explained later.
The DNA sequence can also be examined by clicking on the associated ‘show’ button.
Click on the ‘show’ button next to the ‘DNA with Flanking Sequence’ header. Note that
the target gene sequence is highlighted and the flanking sequences are not. This
feature can be used in the design of PCR primer sequences. Lastly, Functional
Coupling shows genes whose chromosomal proximity is conserved, regardless of actual
function. Functional coupling covers genes that lie within 8 kb upstream or downstream of
the focus gene and whose proximity is conserved in a minimum of 4 other species (not strains
of the same species). Note that genes clustered close together are more likely to be
functionally related, though this is true only for bacteria (not for eukaryotes). If you
click show for Functional Coupling, a table displays the number of species
with the functional coupling. LLO does not have functional coupling since listeriolysin
is specific to Listeria. Therefore, there are no blue arrows (functionally coupled
genes) in the arrow diagram below.
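The two criteria for functional coupling stated above (within 8 kb of the focus gene, and conserved in at least 4 other species) can be written as a small filter. This Python sketch only restates those criteria; the class, gene names, and species labels are hypothetical, and it is not NMPDR's actual implementation.

```python
from dataclasses import dataclass, field

WINDOW_BP = 8_000        # neighbors must lie within 8 kb of the focus gene
MIN_OTHER_SPECIES = 4    # proximity must be conserved in at least 4 other species


@dataclass
class Neighbor:
    name: str
    distance_bp: int                                      # distance from the focus gene
    species_with_pair: set = field(default_factory=set)   # species where the pair is adjacent


def is_functionally_coupled(neighbor, focus_species):
    """Apply the distance and conservation criteria to one neighboring gene."""
    others = neighbor.species_with_pair - {focus_species}
    return neighbor.distance_bp <= WINDOW_BP and len(others) >= MIN_OTHER_SPECIES


# Hypothetical neighbors of a focus gene in "species_0".
neighbors = [
    Neighbor("gene_a", 1_500, {"species_0", "species_1", "species_2", "species_3", "species_4"}),
    Neighbor("gene_b", 3_000, {"species_0", "species_1"}),
    Neighbor("gene_c", 12_000, {"species_1", "species_2", "species_3", "species_4"}),
]
coupled = [n.name for n in neighbors if is_functionally_coupled(n, "species_0")]
print(coupled)
```

A gene that is found only in one genus, as described for LLO, would fail the species count no matter how close its neighbors are.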
The colored arrows show protein encoding genes (PEGs). Placing the mouse over an
arrow displays the function and location of the gene in the genome. Our green focus
peg is the gene of interest. Neighboring arrows are colored red when genes are not
conserved (less than 70% identical), or blue when genes are functionally coupled.
(Colors may vary with browser.)
3. (1 point) What are the identities of nearby genes that might be functionally
related in the virulence pathway?
type answer here
Click on show for Compare Regions (directly under the arrow diagram) to display the five (5)
strains with the most homologous regions. As above, placing the mouse
over an arrow reveals additional information about each gene. The genes
labeled 1 are aligned in the center of the page. These are homologs of the target gene,
LLO in this case. All of the genes within about 8 kb of this central "pin" are shown.
Genes that share the same numerical label are homologs. The numbers above the
arrows correspond to the frequency of the homologous proteins within these 5 strains.
A peg labeled “1” is most commonly found in the strains, while one labeled ‘14’ is less
common. The identity of the gene and genome are revealed upon mouse-over. The
gray arrows are binding sites, promoter regions, etc., which do not code for protein. To
see a list of numbers and their functions, click commentary (directly above the colored
arrows). On this page, each numbered peg is listed as a separate “set” with the gene
identifications for each occurrence in the 5 strains being compared. You may notice
that a particular gene or set (e.g. “2”) may have more than one occurrence in the same
strain of Listeria. Scroll to the bottom of the page and you will see a list of the 5 strains
under comparison. A different set of strains of Listeria monocytogenes can be
compared by checking the boxes corresponding to the desired strains. In this case, you
would click Picked Maps Only to view them. Close this screen and return to the Drug
Targets page.
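The numbering of homolog sets by how often they occur in the compared regions can be sketched as a simple frequency ranking. The Python example below assumes that ties are broken alphabetically (the worksheet does not say how the tool breaks ties), and the strain and family names are invented.

```python
from collections import Counter


def label_sets_by_frequency(regions):
    """Number gene families so that label 1 is the most frequent.

    regions maps a strain name to the list of gene families found in its
    displayed region; a family may appear more than once in one strain.
    """
    counts = Counter(family for genes in regions.values() for family in genes)
    ranked = sorted(counts, key=lambda fam: (-counts[fam], fam))
    return {family: rank for rank, family in enumerate(ranked, start=1)}


# Hypothetical regions for five strains.
regions = {
    "strain_1": ["fam_a", "fam_b", "fam_c"],
    "strain_2": ["fam_a", "fam_b"],
    "strain_3": ["fam_a", "fam_c"],
    "strain_4": ["fam_a"],
    "strain_5": ["fam_a", "fam_b", "fam_b"],
}
labels = label_sets_by_frequency(regions)
print(labels)
```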
Scroll down below the colored arrows and click on Bidirectional Best Hits. A table will
appear with the strains and species that contain a protein with function “Thiol-activated
cytolysin.” Look down the list and notice that Listeria innocua and Listeria welshimeri are
absent. These species are not pathogenic: they do not contain the LLO
sequence and therefore do not appear in this list.
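The idea behind a bidirectional best hit is that gene A's top-scoring match in genome B is gene B, and gene B's top-scoring match back in genome A is gene A. The Python sketch below shows that logic on invented similarity scores; the gene names and scores are hypothetical, and the code does not reproduce how NMPDR computes its table.

```python
def best_hit(scores):
    """Return the target with the highest score, or None if there are no hits."""
    return max(scores, key=scores.get) if scores else None


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    pairs = []
    for gene_a, hits in a_vs_b.items():
        gene_b = best_hit(hits)
        if gene_b is not None and best_hit(b_vs_a.get(gene_b, {})) == gene_a:
            pairs.append((gene_a, gene_b))
    return pairs


# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {
    "a1": {"b1": 250.0, "b2": 40.0},
    "a2": {"b2": 90.0},
    "a3": {"b2": 120.0},
}
b_vs_a = {
    "b1": {"a1": 245.0},
    "b2": {"a3": 118.0, "a2": 88.0},
}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here a2's best hit is b2, but b2's best hit is a3, so a2 has no bidirectional best hit.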
In the Select column, check Listeria monocytogenes Aureli 1997, Clostridium tetani
E88, Clostridium perfringens str. 13, Bacillus cereus G9241, Bacillus anthracis str.
Ames, Streptococcus pyogenes SSI-1, Streptococcus pyogenes M1 GAS, Bacteroides
fragilis YCH46. Make sure the SELECT current PEG is checked. Then scroll back up
to the top of the table and click align. A new window will show the alignment under the
program “The Seed.” The light blue and yellow colors mark positions where the
residues match the consensus.
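When comparing rows of an alignment by eye, it can help to put a number on the similarity. The Python sketch below computes percent identity between two aligned sequences, skipping any column where either sequence has a gap; this is one common convention among several, and the aligned rows are invented rather than taken from the alignment in the exercise.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either row has a gap
        compared += 1
        matches += (x == y)
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned rows (not real toxin sequences).
row_1 = "MKTAYLAGWE"
row_2 = "MKTAYLSGWE"
row_3 = "MR-AFLSG-D"
print(percent_identity(row_1, row_2))
print(percent_identity(row_1, row_3))
```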
4. (2 points) How do the sequences compare between different strains of the
same species (e.g. between the two strains of Listeria)? How does this relate
to the biochemical (enzymatic) activity of the listed proteins?
type answer here
5. (2 points) How do the sequences compare between different species (e.g.
between Listeria and Streptococcus)? How similar are different species of the
same genus (e.g. B. cereus vs. B. anthracis)? Of the organisms listed, which
exhibits the greatest disparity with regard to amino acid sequence?
type answer here
Scroll down to the bottom of the page and find a heading labeled “Neighbor-joining Tree
of Selected Proteins.” This portion of the page allows you to select some or all of the
proteins aligned at the top of the page in order to examine them in their chromosomal
context, e.g. to look at neighboring genes. Select Check All in order to select all of the
genes for comparison, then click Show Regions. The Seed program will open a new
window to compare the neighboring regions around the selected toxin proteins. Notice
that the heading of this web page is “Chromosomal clusters.” Pins are defined as
neighboring sequences that are conserved in at least 4 other species. Placing the cursor
over an arrow displays the function and identity of the gene.
6. (2 points) Does the LLO gene have any pins? How do you know?
type answer here
Close this page and return to the previous page of Protein Alignment. Now select the
circle for residue. Click update to open a new page that displays a color alignment by
residue type. Each color corresponds to a different type of amino acid property, such as
basic, aromatic, etc. As before, notice that the sequences are similar within a species
but not between species.
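Coloring by residue type amounts to mapping each amino acid to a property class. The Python sketch below uses one common textbook-style grouping; the classes actually used by the alignment viewer may differ, so the grouping here is an assumption.

```python
# Assumed grouping of the 20 standard amino acids; the viewer's own scheme may differ.
RESIDUE_CLASSES = {
    "basic": "KRH",
    "acidic": "DE",
    "aromatic": "FWY",
    "polar": "STNQ",
    "aliphatic": "AVLIM",
    "special": "CGP",
}
CLASS_OF = {aa: name for name, members in RESIDUE_CLASSES.items() for aa in members}


def classify(sequence):
    """Return the property class of each residue in the sequence."""
    return [CLASS_OF.get(aa, "unknown") for aa in sequence.upper()]


def same_class_positions(aligned_a, aligned_b):
    """Count aligned positions whose residues fall in the same class."""
    return sum(
        CLASS_OF.get(x) is not None and CLASS_OF.get(x) == CLASS_OF.get(y)
        for x, y in zip(aligned_a, aligned_b)
    )


print(classify("KDWS"))
print(same_class_positions("KDWS", "REFT"))
```

Two sequences can differ at every position and still agree in residue class, which is why the residue-type view can show conservation that the identity view hides.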
Close this page and return to the NMPDR Protein Page for Listeria monocytogenes
Thiol-activated cytolysin/listeriolysin O. Scroll down to the Genomic Context Table.
This table lists proteins upstream and downstream of the target protein LLO. Each
protein is labeled with its function and location in the genome (beginning and ending base
pair numbers). In the right-most column labeled “Aliases” are links to other
bioinformatics sites.
Notice the column heading "find best clusters." Functional Clusters are genes with
exact functions that are homologs in other genomes. Conserved clustering implies
related functions. The context of the focus peg may be preserved in other organisms in
clusters made up of more or fewer other genes. Whether or not the gene of interest
appears to be functionally clustered in its genome, orthologs of this gene may occur in
clusters in other genomes. Note that if conserved clusters can be identified within a
number of different pathogens, these genes become potential drug targets that could be
effective against all of the associated pathogens.
Under the column labeled “Function,” find Thiol-activated cytolysin/listeriolysin O (LLO)
and click on the CL link in the column headed "find best clusters." A table of homologs in
clusters is returned, ordered by the size of the cluster. These clusters probably contain
different members, so how do you know if the proximity is meaningful? Proximity is
most likely to have a functional basis when the clustering is observed with high
frequency and across a wide variety of organisms.
Back on the Genomic Context Table, a button in the column labeled “Pins” would be a
link to a graphical display of homologous genes across all species, arranged
phylogenetically and centered on the “pin,” or focus gene.
7. (2 points) Based on your answer in question 6 above, would you expect LLO
to have a button in this column? Does the Genomic Context Table confirm
your analysis?
type answer here
8. (2 points) Would LLO be a good candidate for a drug to target multiple
species? Explain.
type answer here
9. (1 point) Would LLO be a potential candidate for a drug to target Listeria
monocytogenes specifically? Explain.
type answer here
Under the Genomic Context Table you will find a heading marked ‘Subsystems in Which
This Protein Plays a Role.’ Click on the subsystem Listeria Pathogenicity Island LIPI-1
extended subsystem. This unique feature groups proteins according to functional role
such as metabolic pathways or pathogen-specific virulence factors. The top table lists
the strains that contain the same subsystem and their corresponding base pair
locations. Functional roles included in this subsystem are in columns. Mouse-over the
column heading to see the definition of each role. The corresponding gene numbers
are listed under each role.
10. (1 point) Are the proteins in this subsystem located together in the genome?
Explain.
type answer here
To better understand the infection process of Listeria, click on Listeria_infection_cycle.
Each of the listed proteins is associated with the infection cycle, and each
belongs to the LIPI-1 pathogenicity island. These virulence factors are listed in the table
on the right by the same abbreviations used in the Functional Roles table of the
previous page. The function of each protein is also listed with the abbreviation. PrfA is
a regulatory virulence factor that activates transcription of hly, the gene that
encodes LLO. PrfA is thermoregulated and drives transcription at a maximal rate at 37°C.
Therefore, these virulence genes are fully expressed only when Listeria monocytogenes
invades a host whose body temperature is ~37°C. Both phospholipases (PlcA and PlcB) help
break down the inner membrane of the vacuole, after which LLO breaks down the outer membrane.
11. (2 points) Based on your observations from these results, what would be a
good definition for a pathogenicity island?
type answer here
Close the window for the Listeria infection cycle and return to the subsystem page.
Scroll down below the spreadsheet and click on Show Phylogenetic Tree. This will
open a new window displaying a whole tree for several species of pathogenic
microorganisms.
12. (1 point) To which branch of bacteria is Listeria most phylogenetically related?
Listeria is most closely related to Staphylococcus and Bacillus.
13. (2 points) How could knowing the phylogeny of the bacterium aid in finding a
vaccine or drug? Is this supported by the data obtained from comparing
Bidirectional Best Hits?
type answer here
Back on the Drug Targets page (go back 2 pages), locate the same strain we examined
before (Listeria monocytogenes 1/2a F6854 fig|267409.1.peg.1504). Click on
Virulence Associated to open an NCBI PubMed article on Listeria monocytogenes.
You are not required to read the article.
The last column in the table on the Drug Targets page is a column for “Best Hit to
Human.” If there were an equivalent protein in humans, a link would be provided in the
right-most column.
14. (2 points) Why is this information essential for locating potential drug targets?
type answer here
Go to the main NMPDR webpage. Click on BLAST or Scan to open a search engine.
Under the drop-down menu for Tool, select protScan. This tool scans proteins for a
selected sequence. Enter the amino acid sequence ‘ECTGLAWEWWR’ into the empty
box to the right of the box marked “Sequence in Raw or FASTA Format.” Select a few
of the following genomes: Listeria, Bacillus, Streptococcus. You will need to use the
Ctrl key to select multiple genomes. (The more genomes you choose, the longer it will
take to load the page.) Scroll to the bottom of the page and press Scan. Notice all of
the genomes that contain that conserved sequence and their assignments (function).
15. (2 points) What can you conclude about this conserved sequence?
type answer here
This conserved undecapeptide, ECTGLAWEWWR, in the C-terminal region is the site of
cholesterol binding. Remember that we found this sequence near the C-terminus when
looking at the LLO Protein Sequence. In order to enter the phagosome through induced
phagocytosis, the bacteria must bind to the cell. Listeria monocytogenes binds the host
cell using cholesterol.
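Scanning a set of proteins for this undecapeptide is, at its simplest, an exact substring search. The Python sketch below illustrates that idea on invented sequences; it is not the protScan implementation, and the protein names and sequences are hypothetical.

```python
MOTIF = "ECTGLAWEWWR"  # the undecapeptide named in the exercise


def find_motif(proteins, motif=MOTIF):
    """Return {protein_id: [1-based start positions]} for proteins containing motif."""
    hits = {}
    for protein_id, sequence in proteins.items():
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start + 1)  # convert 0-based index to residue number
            start = sequence.find(motif, start + 1)
        if positions:
            hits[protein_id] = positions
    return hits


# Hypothetical toy sequences (not real database entries).
toy_proteins = {
    "toy_protein_1": "MKKLAAGSECTGLAWEWWRKVIDN",
    "toy_protein_2": "MSTNPLLAVVGGDEKRRA",
}
hits = find_motif(toy_proteins)
print(hits)
```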
Go back to the Drug Targets page. Locate Listeria monocytogenes Thiol-activated
cytolysin/listeriolysin O str. 4b H7858 (fig|267410.1.peg.2853). In the column to the
right of the strain designation, click on 1s3r. This link will take you to the RCSB Protein
Data Bank (PDB). Besides the overall information provided in the table, observe the 3D
crystal model for this protein to the right. On the picture under Display options: click
MBT SimpleViewer* (needs Java).
By clicking on the screen, you can rotate the 3D crystal structure. Place the cursor on
any part of the protein and notice the labeling of each part including the residue number,
conformation, and amino acid. Having such a crystal model in a drug target database
shows possible binding sites and locates functional domains. Also, notice that the three
β-sheets in the middle form a β-barrel. This β-barrel inserts itself into the membrane
bilayer and creates a hydrophobic pore through which it enters the cell.
Go back to the main NMPDR website. In the left navigation bar select Signature Genes
Tool. This feature allows the comparison of selected genomes while excluding other
genomes. Select Listeria monocytogenes str. 4b F2365 as the reference
genome. Under Inclusion Genomes, select L. monocytogenes strains Aureli 1997, FSL
N1-017, str. 4b F2365, and str. 4b H7858. Under Exclusion Genomes, select L.
monocytogenes strains 10403S, EGD-e, FSL R2-503, and str. 1/2a F6854. Click Go
(bottom of page). This may take a while, so be patient.
The table shows all of the genes that are present in the Inclusion Genomes and
absent from the Exclusion Genomes. The rightmost column provides an equivalence
score. A score of 1.00 indicates that the gene is present in all of the Inclusion
Genomes and absent from all of the Exclusion Genomes. A score of less than 1.00
indicates that the gene may be present in only some of the Inclusion Genomes or absent
from only some of the Exclusion Genomes.
Notice that the four strains selected for Inclusion are all serotype 4b. The hypothetical
genes in the table are candidates for the genes that make 4b strains so virulent. To obtain more
information about these hypothetical genes, explore NMPDR and compare them across different
genomes. It is the sequence of a gene in the genome that causes virulence.
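To make the meaning of the score concrete, the Python sketch below computes a simple stand-in score: the fraction of selected genomes in which a gene behaves as a perfect signature gene would (present in an Inclusion Genome, absent from an Exclusion Genome). This formula is an assumption chosen so that a perfect signature gene scores 1.00; it is not NMPDR's actual calculation, and the genome and gene names are invented.

```python
def signature_score(gene, inclusion, exclusion):
    """Hypothetical score in [0, 1]; NOT NMPDR's actual formula.

    inclusion and exclusion map genome names to the set of genes each contains.
    """
    present_in = sum(gene in genes for genes in inclusion.values())
    absent_from = sum(gene not in genes for genes in exclusion.values())
    return (present_in + absent_from) / (len(inclusion) + len(exclusion))


# Hypothetical gene content of four inclusion and four exclusion genomes.
inclusion = {
    "inc_1": {"g1", "g2"},
    "inc_2": {"g1", "g2"},
    "inc_3": {"g1"},
    "inc_4": {"g1", "g2"},
}
exclusion = {
    "exc_1": set(),
    "exc_2": {"g2"},
    "exc_3": set(),
    "exc_4": set(),
}
print(signature_score("g1", inclusion, exclusion))
print(signature_score("g2", inclusion, exclusion))
```

In this toy data, g1 is a perfect signature gene, while g2 is missing from one Inclusion Genome and present in one Exclusion Genome, so it scores below 1.00.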
Sources:
http://textbookofbacteriology.net/Listeria.html
http://www.jcb.org/cgi/content/full/158/3/409#FIG2
http://www.jem.org/cgi/content/full/186/7/1159
http://www.mgc.ac.cn/cgi-bin/VFs/vfs.cgi?Genus=Listeria&Keyword=Toxin
http://www.prospec.co.il/~prospec/cart/catalog/rLLO.html
http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1828513
http://www.sciencemag.org/cgi/content/abstract/290/5493/992
For further information read Schnupf, Pamela and Daniel A. Portnoy. “Listeriolysin O: a
phagosome-specific lysin.” Microbes and Infection 9 (2007) 1176-1187
This worksheet was based on the October 2007 version of NMPDR.