TITLE: Pathogen Genomics
COURSE: MED263, “Bioinformatics Applications to Human Disease”
INSTRUCTORS: Christopher H. Woelk & Sergei Kosakovsky Pond
1) Introduction
Today you are going to screen a patient’s HIV sequence for resistance mutations, compare
pathogenic and non-pathogenic strains of Escherichia coli, help your disorganized professor predict
genes, and identify vaccine candidates by reverse vaccinology.
2) The HIV Drug Resistance Database
2.1) Open a web browser and navigate to the HIV Drug Resistance Database
(http://hivdb.stanford.edu/). Click on the link “HIVdb PROGRAM > Genotype Resistance
Interpretation” in the top right-hand corner. Then click on “ANALYSIS sequences > Enter Complete
Sequences”.
2.2) In Box A, “Text Input”, paste the following sequence and click “ANALYZE”:
>NC598-1997|AY030412
CCTCAAATCACTCTTTGGCAACGACCCATCGTCACAATAAAGATAGGGGGGCAGCTAARGGAAGCTCTATTAGATACAGGA
GCAGATGATACAGTATTAGAAGATATAAATTTGCCAGGAAGATGGACACCAAAAATKATAGTGGGAATTGGAGGTTTTACC
AAAGTAAGACAGTATGATCAGATACCTGTAGAAATTTGTGGACATAAAGCTATAGGTACAGTRTTAGTAGGACCTACACCT
GCCAACATAATTGGAAGAAATCTGTTGACYCAGATTGGTTGCACTTTAAATTTTCCCATTAGTCCTATTGACACTGTACCA
GTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAA
ATTTGTGCAGAATTGGAASAGGACGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATA
AAGAAAAAGAACAGYGATAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTT
CAATTAGGAATACCACATCCCGGAGGGTTAAAAAAGAACAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCA
RTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGACTAGATAT
CAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAGAATCTTAGAACCTTTT
AGAAAACAGAATCCAGACATAGTTATCTGTCAATAYGTGGATGATTTGTATGTAGGATCTGACTTAGAAATAGAGMAGCAT
AGAACAAAAGTAGAGGAACTGAGACAACATTTGTGGAAGTGGGGNTTTTACACACCAGACAAMAAACATCAGAAAGAACCT
CCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACA
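The sequence above is not pure A/C/G/T: it contains IUPAC ambiguity codes such as R (A or G), Y (C or T), K (G or T), M (A or C), S (C or G) and N (any base). In population sequencing of HIV these usually mark positions where the patient carries a mixture of viral variants. The Python sketch below is not part of the HIVdb workflow; it is a hypothetical helper that parses a FASTA string and lists where the ambiguous bases fall, so you can see where mixtures occur before submitting the sequence.

```python
# IUPAC nucleotide ambiguity codes and the bases each one stands for
IUPAC = {
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def ambiguous_sites(seq):
    """List (1-based position, code, possible bases) for every ambiguity code."""
    return [(i + 1, base, IUPAC[base]) for i, base in enumerate(seq) if base in IUPAC]


if __name__ == "__main__":
    # Paste the full NC598-1997 record here; only its first line is shown.
    fasta = """>NC598-1997|AY030412
CCTCAAATCACTCTTTGGCAACGACCCATCGTCACAATAAAGATAGGGGGGCAGCTAARGGAAGCTCTATTAGATACAGGA
"""
    for name, seq in parse_fasta(fasta).items():
        print(name, len(seq), "nt")
        for pos, code, bases in ambiguous_sites(seq):
            print(f"  position {pos}: {code} = {'/'.join(bases)}")
```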
2.3) View the resulting Genotypic Resistance Interpretation Algorithm report. Q1: Which HIV gene
sequences were in the sequence data above? HIV drug regimens normally consist of a protease
inhibitor (PI) or a non-nucleoside reverse transcriptase inhibitor (NNRTI) on a background of 2 or 3
nucleoside reverse transcriptase inhibitors (NRTIs). Q2: Would you put the patient infected with this
HIV strain on a PI- or an NNRTI-based regimen? Q3: Which PI or NNRTI would you select for this
regimen?
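The reasoning behind Q2 and Q3 is to choose the drug (and hence the drug class) to which the virus shows the least resistance. A minimal Python sketch of that comparison follows, assuming you transcribe each drug's resistance level from the HIVdb report by hand. The five level labels follow the categories HIVdb reports; the drug names in the example are placeholders, not results for this patient, and the function name is hypothetical.

```python
# Ordinal ranking of the HIVdb resistance categories (lower = less resistance)
LEVELS = [
    "Susceptible",
    "Potential Low-Level Resistance",
    "Low-Level Resistance",
    "Intermediate Resistance",
    "High-Level Resistance",
]
RANK = {level: i for i, level in enumerate(LEVELS)}


def least_resistant(report):
    """report maps drug class -> {drug name: HIVdb level}.

    Returns (drug class, drug name, level) for the drug with the lowest
    resistance level across all classes; ties keep the first drug seen.
    Returns None for an empty report.
    """
    best = None
    for drug_class, drugs in report.items():
        for drug, level in drugs.items():
            if best is None or RANK[level] < RANK[best[2]]:
                best = (drug_class, drug, level)
    return best


if __name__ == "__main__":
    # Placeholder values only: replace them with the levels from your own report.
    report = {
        "PI": {"PI_1": "Low-Level Resistance", "PI_2": "Susceptible"},
        "NNRTI": {"NNRTI_1": "High-Level Resistance"},
    }
    print(least_resistant(report))
```

In practice you would also weigh cross-resistance within a class and the NRTI backbone, which this sketch ignores.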
3) Genome Comparison with TaxPlot
3.1) Escherichia coli K-12 is a harmless strain, whereas E. coli O157:H7 is often in the news in
connection with contaminated food, which when eaten can cause diseases such as hemorrhagic colitis.
You will use TaxPlot (http://www.ncbi.nlm.nih.gov/sutils/taxik2.cgi) to compare the protein-coding
genes of these two bacteria and reveal genes that may be associated with pathogenesis.
3.2) On the TaxPlot page at the NCBI, select “E. coli str. K-12 substr. MG1655” (Taxonomy ID:
511145) as the query genome. Then select “E. coli str. K-12 substr. MG1655” again as the first
comparison species and “E. coli O157:H7 EDL933” (Taxonomy ID: 155864) as the second
comparison species. Click “compare” and view the resulting plot. Q4: How many proteins are
considered identical between these two strains? Q5: There appear to be two major outliers; what are
they with reference to E. coli K-12?
3.3) The two major outliers suggest that “membrane” proteins and “adhesins” may be
important for the pathogenesis of E. coli O157:H7. You can use the “Query” function in TaxPlot to
highlight other membrane proteins and adhesins in the plot. Q6: Are there other membrane proteins
and adhesins that are different between these two strains of E. coli?
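Roughly, each point in the TaxPlot plot is a protein of the query genome, positioned by the similarity score of its best match in each of the two comparison genomes. Proteins on the diagonal score equally against both, while points far below it are conserved in K-12 but divergent or missing in O157:H7. The Python sketch below shows that logic on a hypothetical list of (protein, score, score) tuples you might transcribe from the plot; the input format, the function name and the 0.5 ratio cut-off are all assumptions, not TaxPlot features.

```python
def classify_points(points, ratio_cutoff=0.5):
    """Split TaxPlot-style points into 'identical' and 'outlier' proteins.

    points: iterable of (protein_name, score_vs_first, score_vs_second).
    A protein is 'identical' when both scores are equal, and an 'outlier'
    when its score against the second genome is below ratio_cutoff times
    its score against the first genome (hypothetical cut-off).
    """
    identical, outliers = [], []
    for name, first, second in points:
        if first <= 0:
            continue  # no usable score against the first genome
        if second == first:
            identical.append(name)
        elif second / first < ratio_cutoff:
            outliers.append(name)
    return identical, outliers


if __name__ == "__main__":
    # Made-up numbers purely to show the call; not real TaxPlot output.
    demo = [("protA", 100.0, 100.0), ("protB", 200.0, 50.0), ("protC", 300.0, 290.0)]
    print(classify_points(demo))
```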
4) Gene Prediction with GLIMMER
4.1) Your professor has many bacterial genome sequencing projects under way at once. He has
found a text file of DNA sequence data on his desktop but doesn’t know whether it contains useful
information (i.e. genes) or which project it comes from. Download the text file (unknown_DNA.txt) of DNA
sequence data from the course website (http://hyphy.org/w/index.php/MED263).
4.2) First use GLIMMER (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi) at the
NCBI to determine whether the DNA sequence contains protein-coding genes. Upload the text file
containing the DNA sequence, select the appropriate genetic code and then run GLIMMER. Q7: How
many putative genes are found? Q8: What is the “orf” designation of the largest gene found,
which reading frame is it in, and how many nucleotides long is it?
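GLIMMER does more than look for open reading frames: it scores candidate ORFs with interpolated Markov models trained on the sequence itself. As a rough cross-check of Q7 and Q8, though, a naive six-frame scan for long ATG-to-stop ORFs often recovers the largest genes. The Python sketch below is that naive scan, not GLIMMER. The 300-nt minimum length is an assumption, and its frame labels (+1 to +3, -1 to -3) are simply offsets within each strand, which may not match GLIMMER's frame numbering.

```python
STOPS = {"TAA", "TAG", "TGA"}
# Ambiguity codes other than N are left uncomplemented by this table.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Return (start, end, frame, length) for ATG..stop ORFs on both strands.

    Coordinates are 1-based and inclusive on the input sequence; for
    reverse-strand ORFs start > end. Length includes the stop codon.
    Results are sorted from longest to shortest.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in frame opens the ORF
                elif start is not None and codon in STOPS:
                    length = i + 3 - start
                    if length >= min_len:
                        if strand == "+":
                            coords = (start + 1, i + 3)
                        else:
                            # map reverse-complement indices back to the input
                            coords = (n - start, n - i - 2)
                        orfs.append((*coords, f"{strand}{offset + 1}", length))
                    start = None
    return sorted(orfs, key=lambda orf: orf[3], reverse=True)
```

For example, `find_orfs(open("unknown_DNA.txt").read().replace("\n", ""))[0]` would give the longest ORF, assuming the file holds raw sequence with no FASTA header line.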
4.3) Use your favorite alignment editor to extract the DNA sequence of the largest ORF from the text
file, translate it, and then BLAST (blastp) the protein against the non-redundant protein database at
the NCBI. Q9: What is the gene? Q10: What species does it come from?
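If you prefer code to an alignment editor, the extraction and translation in step 4.3 take only a few lines. This Python sketch assumes GLIMMER reports 1-based, inclusive coordinates and lists reverse-strand genes with start > stop (the usual GLIMMER 3 convention; check your own output). It translates with the standard codon table, which bacterial table 11 shares apart from its alternative start codons, so a gene starting with GTG or TTG will begin with V or L here rather than M. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_orf(genome, start, stop):
    """Return the coding sequence between 1-based inclusive coordinates.

    If start > stop the gene is on the reverse strand, so the slice is
    reverse-complemented.
    """
    genome = genome.upper()
    if start <= stop:
        return genome[start - 1:stop]
    return genome[stop - 1:start].translate(COMPLEMENT)[::-1]


def translate(cds, to_stop=True):
    """Translate codon by codon; unknown codons become 'X'.

    With to_stop=True, translation ends at the first stop codon.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

The resulting protein string can then be pasted into the blastp form at the NCBI.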
5) Protein Annotation for Reverse Vaccinology
5.1) You are going to annotate two proteins from the Gram-negative bacterium Helicobacter pylori
(associated with gastric ulcers and stomach cancer) to determine which is the better vaccine candidate.
Note: A vaccine candidate should be surface-exposed or secreted so that it can be recognized by the
human immune system. In addition, proteins with too many transmembrane domains (>2) may be hard
to clone and express for testing in animal models and should be avoided.
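The two screening rules in the note above can be written as a simple filter over the predictions you collect in the following steps. The Python sketch below is one hypothetical way to do it. The localization labels are the Gram-negative categories PSORTb reports (Cytoplasmic, CytoplasmicMembrane, Periplasmic, OuterMembrane, Extracellular, Unknown); treating only OuterMembrane and Extracellular as "surface-exposed or secreted" is an assumption, and the class and function names are made up for illustration.

```python
from dataclasses import dataclass

# PSORTb localizations treated as "surface-exposed or secreted" for a
# Gram-negative bacterium (assumption based on the note above)
SURFACE_OR_SECRETED = {"OuterMembrane", "Extracellular"}
MAX_TM_DOMAINS = 2  # more than 2 TM domains: hard to clone and express


@dataclass
class Prediction:
    name: str
    localization: str  # PSORTb final localization
    tm_domains: int    # number of TM helices predicted by TMHMM


def is_vaccine_candidate(p):
    """Apply both screening rules from the note to one protein."""
    return p.localization in SURFACE_OR_SECRETED and p.tm_domains <= MAX_TM_DOMAINS


if __name__ == "__main__":
    # Hypothetical proteins; fill in real predictions for gltA and vacA yourself.
    for p in [Prediction("protein_X", "Extracellular", 0),
              Prediction("protein_Y", "Cytoplasmic", 0)]:
        print(p.name, is_vaccine_candidate(p))
```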
5.2) Retrieve the FASTA-formatted protein sequences of citrate synthase (gltA, NP_222744) and
vacuolating cytotoxin (vacA, AAD04290) from the NCBI.
5.3) Use PSORTb (http://www.psort.org/psortb/) to predict the subcellular localization of each protein.
Q11: What is the subcellular localization of citrate synthase? Q12: What is the subcellular
localization of vacA?
5.4) Use SignalP (http://www.cbs.dtu.dk/services/SignalP/) to predict whether each protein has a
signal peptide. Q13: Does citrate synthase contain a signal peptide? Q14: Does vacA contain a
signal peptide?
5.5) Use TMHMM (http://www.cbs.dtu.dk/services/TMHMM/) to predict the number of transmembrane
domains in each protein. Q15: How many transmembrane domains does citrate synthase have?
Q16: How many transmembrane domains does vacA have? Q17: Which protein is the better vaccine
candidate? Q18: Would you be surprised to know that citrate synthase also protected mice from H.
pylori [1]?
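TMHMM uses a hidden Markov model, but the intuition behind counting transmembrane domains can be illustrated with the older Kyte-Doolittle hydropathy method: a membrane-spanning helix tends to appear as a stretch where a 19-residue window averages a hydropathy above about 1.6, the values commonly quoted for that method. The Python sketch below counts such stretches. It is a crude stand-in for building intuition and will not reproduce TMHMM's counts; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein, window=19, threshold=1.6):
    """Return 1-based (start, end) ranges of windows averaging above threshold.

    Overlapping high-scoring windows are merged into one segment, so the
    number of segments is a rough count of candidate TM helices.
    Unknown residues (e.g. X) score 0.
    """
    protein = protein.upper()
    segments = []
    for i in range(len(protein) - window + 1):
        score = sum(KD.get(aa, 0.0) for aa in protein[i:i + window]) / window
        if score > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend current segment
            else:
                segments.append((start, end))
    return segments
```

Calling `len(hydrophobic_segments(seq))` on each FASTA sequence from step 5.2 gives a rough segment count to compare with the TMHMM predictions.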
6) Summary of questions to be answered
Q1: Which HIV gene sequences were in the sequence data above?
Q2: Would you put the patient infected with this HIV strain on a PI- or an NNRTI-based regimen?
Q3: Which PI or NNRTI would you select for this regimen?
Q4: How many proteins are considered identical between these two strains?
Q5: There appear to be two major outliers; what are they with reference to E. coli K-12?
Q6: Are there other membrane proteins and adhesins that are different between these two strains of
E. coli?
Q7: How many putative genes are found?
Q8: What is the “orf” designation of the largest gene found, which reading frame is it in, and how
many nucleotides long is it?
Q9: What is the gene?
Q10: What species does it come from?
Q11: What is the subcellular localization of citrate synthase?
Q12: What is the subcellular localization of vacA?
Q13: Does citrate synthase contain a signal peptide?
Q14: Does vacA contain a signal peptide?
Q15: How many transmembrane domains does citrate synthase have?
Q16: How many transmembrane domains does vacA have?
Q17: Which protein is the better vaccine candidate?
Q18: Would you be surprised to know that citrate synthase also protected mice from H. pylori?
7) References
1. Dunkley, ML, et al., Protection against Helicobacter pylori infection by intestinal immunisation
with a 50/52-kDa subunit protein. FEMS Immunol Med Microbiol, 1999. 24(2): p. 221-5.