TITLE: Pathogen Genomics
COURSE: MED263, “Bioinformatics Applications to Human Disease”
INSTRUCTOR: Christopher H. Woelk & Sergei Kosakovsky Pond

1) Introduction

Today you are going to screen a patient’s HIV sequence for resistance mutations, compare pathogenic and non-pathogenic strains of Escherichia coli, help your disorganized professor predict genes, and identify vaccine candidates by reverse vaccinology.

2) The HIV Drug Resistance Database

2.1) Open a web browser and navigate to the HIV Drug Resistance Database (http://hivdb.stanford.edu/). Click on the link “HIVdb PROGRAM > Genotype Resistance Interpretation” in the top right-hand corner. Then click on “ANALYSIS sequences > Enter Complete Sequences”.

2.2) In Box A, “Text Input”, insert the following sequence and click “ANALYZE”:

>NC598-1997|AY030412
CCTCAAATCACTCTTTGGCAACGACCCATCGTCACAATAAAGATAGGGGGGCAGCTAARGGAAGCTCTATTAGATACAGGA
GCAGATGATACAGTATTAGAAGATATAAATTTGCCAGGAAGATGGACACCAAAAATKATAGTGGGAATTGGAGGTTTTACC
AAAGTAAGACAGTATGATCAGATACCTGTAGAAATTTGTGGACATAAAGCTATAGGTACAGTRTTAGTAGGACCTACACCT
GCCAACATAATTGGAAGAAATCTGTTGACYCAGATTGGTTGCACTTTAAATTTTCCCATTAGTCCTATTGACACTGTACCA
GTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAA
ATTTGTGCAGAATTGGAASAGGACGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATA
AAGAAAAAGAACAGYGATAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGTT
CAATTAGGAATACCACATCCCGGAGGGTTAAAAAAGAACAAATCAGTAACAGTACTGGATGTGGGTGATGCATATTTTTCA
RTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGACTAGATAT
CAGTACAATGTGCTTCCACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAGAATCTTAGAACCTTTT
AGAAAACAGAATCCAGACATAGTTATCTGTCAATAYGTGGATGATTTGTATGTAGGATCTGACTTAGAAATAGAGMAGCAT
AGAACAAAAGTAGAGGAACTGAGACAACATTTGTGGAAGTGGGGNTTTTACACACCAGACAAMAAACATCAGAAAGAACCT
CCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACA

2.3) View the resulting Genotypic Resistance Interpretation Algorithm report.

Q1: Which HIV gene sequences were in the sequence data above?
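
The sequence pasted into Box A is not made up of A, C, G and T alone: it also contains IUPAC ambiguity codes (R, K, Y, S, M and N), which in population sequencing usually mark positions where more than one base was called. The following Python sketch is not part of the original exercise; it simply lists those positions for a single FASTA record. The function names `read_single_fasta` and `ambiguity_positions` are made up for this illustration, and only the first line of the patient sequence is used as example input.

```python
# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def read_single_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = ""
    chunks = []
    seen_header = False
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here; ignore the rest
            header = line[1:]
            seen_header = True
        else:
            chunks.append(line.upper())
    return header, "".join(chunks)


def ambiguity_positions(seq):
    """List (1-based position, code, possible bases) for every ambiguous base."""
    return [
        (i + 1, base, IUPAC_AMBIGUITY[base])
        for i, base in enumerate(seq)
        if base in IUPAC_AMBIGUITY
    ]


if __name__ == "__main__":
    # First line of the handout sequence only, to keep the example short.
    example = (
        ">NC598-1997|AY030412\n"
        "CCTCAAATCACTCTTTGGCAACGACCCATCGTCACAATAAAGATAGGGGGGCAGCTAARGGAAGCTCTATTAGATACAGGA\n"
    )
    header, seq = read_single_fasta(example)
    print(header, len(seq))
    for pos, code, bases in ambiguity_positions(seq):
        print(f"position {pos}: {code} (one of {bases})")
```

Running the same check on the full sequence before submitting it is a quick way to see how many mixed positions the interpretation report has to resolve.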
HIV drug regimens normally consist of a protease inhibitor (PI) or a non-nucleoside reverse transcriptase inhibitor (NNRTI) in a background of 2 or 3 nucleoside reverse transcriptase inhibitors (NRTIs).

Q2: Would you put the patient infected with this HIV strain on a PI- or an NNRTI-based regimen?
Q3: Which PI or NNRTI would you select for this regimen?

3) Genome Comparison with TaxPlot

3.1) Escherichia coli K-12 is a harmless strain, whereas E. coli O157:H7 is often in the news in connection with contaminated food and can cause diseases such as hemorrhagic colitis when eaten. Using TaxPlot (http://www.ncbi.nlm.nih.gov/sutils/taxik2.cgi), you will compare the protein-coding genes of these two bacteria to reveal genes that may be associated with pathogenesis.

3.2) On the TaxPlot page at the NCBI select “E. coli str. K-12 substr. MG1655” (Taxonomy ID: 511145) for the query genome. Then select “E. coli str. K-12 substr. MG1655” again for the first comparison species and “E. coli O157:H7 EDL933” (Taxonomy ID: 155864) as the second comparison species. Select “compare” and then view the resulting plot.

Q4: How many proteins are considered identical between these two strains?
Q5: There appear to be two major outliers. What are they with reference to E. coli K-12?

3.3) The two major outliers suggest that “membrane” proteins and “adhesins” may be important for the pathogenesis of E. coli O157:H7. You can use the “Query” function in TaxPlot to highlight other membrane proteins and adhesins in the plot.

Q6: Are there other membrane proteins and adhesins that are different between these two strains of E. coli?

4) Gene Prediction with GLIMMER

4.1) Your professor has many bacterial genome sequencing projects on the go at once. He has found a text file of DNA sequence data on his desktop but doesn’t know whether it contains useful information (i.e. genes) or which project it is from.
Download the text file (unknown_DNA.txt) of DNA sequence data from the course website (http://hyphy.org/w/index.php/MED263).

4.2) First use GLIMMER (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi) at the NCBI to determine whether the DNA sequence contains protein-coding genes. Upload the text file containing the DNA sequence, select the appropriate genetic code and then run GLIMMER.

Q7: How many putative genes are found?
Q8: What is the “orf” designation of the largest gene that is found, what reading frame is it in, and how many nucleotides long is it?

4.3) Use your favorite alignment editor to extract the DNA for the largest ORF from the DNA in the text file, translate it, and then BLAST (blastp) the translation against the non-redundant protein database at the NCBI.

Q9: What is the gene?
Q10: What species does it come from?

5) Protein Annotation for Reverse Vaccinology

5.1) You are going to annotate two proteins from the Gram-negative bacterium Helicobacter pylori (gastric ulcers and stomach cancer) to determine which is the better vaccine candidate.

Note: A vaccine candidate should be surface exposed or secreted from the cell so that it can be recognized by the human immune system. In addition, proteins with too many transmembrane domains (>2) may be hard to clone and express for testing in animal models and should be avoided.

5.2) Retrieve the FASTA-formatted protein sequence data for citrate synthase (gltA, NP_222744) and vacuolating cytotoxin (vacA, AAD04290) from the NCBI.

5.3) Use PSORTb (http://www.psort.org/psortb/) to predict the subcellular localization of each protein.

Q11: What is the subcellular localization of citrate synthase?
Q12: What is the subcellular localization of vacA?

5.4) Use SignalP (http://www.cbs.dtu.dk/services/SignalP/) to predict whether each protein has a signal peptide.

Q13: Does citrate synthase contain a signal peptide?
Q14: Does vacA contain a signal peptide?
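
The two selection criteria in the note at the start of this section (surface exposed or secreted, and no more than 2 transmembrane domains) can be written down as a simple first-pass filter once the prediction results are in hand. The Python sketch below is only an illustration: the class and function names are hypothetical, the localization labels are an assumption about how you record the predictor's output, and the two example proteins are invented placeholders, not the answers for gltA or vacA.

```python
from dataclasses import dataclass

# Assumed labels meaning "surface exposed or secreted"; adjust them to the
# wording your localization tool actually reports.
EXPOSED_LOCALIZATIONS = {"outermembrane", "extracellular"}


@dataclass
class ProteinAnnotation:
    name: str
    localization: str  # predicted subcellular localization, as free text
    tm_domains: int    # predicted number of transmembrane domains


def passes_screen(protein, max_tm_domains=2):
    """Apply the handout's two criteria: exposed location and few TM domains."""
    label = protein.localization.replace(" ", "").replace("-", "").lower()
    exposed = label in EXPOSED_LOCALIZATIONS
    return exposed and protein.tm_domains <= max_tm_domains


if __name__ == "__main__":
    # Hypothetical proteins with made-up annotations, for illustration only.
    examples = [
        ProteinAnnotation("protein_A", "Outer Membrane", 1),
        ProteinAnnotation("protein_B", "Cytoplasmic", 0),
    ]
    for p in examples:
        verdict = "keep" if passes_screen(p) else "drop"
        print(f"{p.name}: {verdict}")
```

A filter like this is a heuristic for prioritizing candidates rather than a verdict on whether a protein can protect, so proteins it rejects may still be worth a second look.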
5.5) Use TMHMM (http://www.cbs.dtu.dk/services/TMHMM/) to predict the number of transmembrane domains in each protein.

Q15: How many transmembrane domains does citrate synthase have?
Q16: How many transmembrane domains does vacA have?
Q17: Which protein is the better vaccine candidate?
Q18: Would you be surprised to know that citrate synthase also protected mice from H. pylori [1]?

6) Summary of questions to be answered

Q1: Which HIV gene sequences were in the sequence data above?
Q2: Would you put the patient infected with this HIV strain on a PI- or an NNRTI-based regimen?
Q3: Which PI or NNRTI would you select for this regimen?
Q4: How many proteins are considered identical between these two strains?
Q5: There appear to be two major outliers. What are they with reference to E. coli K-12?
Q6: Are there other membrane proteins and adhesins that are different between these two strains of E. coli?
Q7: How many putative genes are found?
Q8: What is the “orf” designation of the largest gene that is found, what reading frame is it in, and how many nucleotides long is it?
Q9: What is the gene?
Q10: What species does it come from?
Q11: What is the subcellular localization of citrate synthase?
Q12: What is the subcellular localization of vacA?
Q13: Does citrate synthase contain a signal peptide?
Q14: Does vacA contain a signal peptide?
Q15: How many transmembrane domains does citrate synthase have?
Q16: How many transmembrane domains does vacA have?
Q17: Which protein is the better vaccine candidate?
Q18: Would you be surprised to know that citrate synthase also protected mice from H. pylori?

7) References

1. Dunkley, ML, et al., Protection against Helicobacter pylori infection by intestinal immunisation with a 50/52-kDa subunit protein. FEMS Immunol Med Microbiol, 1999. 24(2): p. 221-5.
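
Supplementary sketch for the GLIMMER exercise: the step that asks you to cut the largest ORF out of the DNA and translate it before running blastp can also be done with a few lines of Python instead of an alignment editor. This is a minimal illustration under stated assumptions, not the course's prescribed method: the function names are made up, coordinates are assumed to be 1-based and inclusive with start greater than end for a reverse-strand gene (check this against the GLIMMER output you actually get), and the codon table is the standard one, whose amino acid assignments are shared by the bacterial code (NCBI translation table 11).

```python
# Standard codon table built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Complement map that also handles IUPAC ambiguity codes.
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVSWN", "TGCAYRMKVHDBSWN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extract_orf(genome, start, end):
    """Cut out an ORF given 1-based inclusive coordinates.

    Assumption: start > end means the gene lies on the reverse strand.
    """
    lo, hi = min(start, end), max(start, end)
    fragment = genome[lo - 1:hi].upper()
    return fragment if start <= end else reverse_complement(fragment)


def translate(dna):
    """Translate DNA codon by codon; unknown or ambiguous codons become 'X'.

    Note: an alternative start codon such as GTG or TTG appears as V or L
    here rather than M, because only the plain codon table is applied.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    # Toy sequence, not taken from unknown_DNA.txt.
    toy = "CCATGAAATAGGG"
    orf = extract_orf(toy, 3, 11)
    print(orf, translate(orf))
```

The translated protein, with any trailing stop symbol removed, is what you would paste into blastp.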