Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
CSCI 474 Lab 1 : NCBI Nucleotide and RCSB PDB Spring 2017 Computer Science This lab is meant to acquaint you with the NCBI Nucleotide database and RCSB Protein Data Bank (PDB). These two depositories are maintained and administered by USA federally funded science initiatives. They are databases of nucleotide sequences and three-dimensional structural data of large biological molecules. The data contained in Nucleotide (which is an umbrella database that culls information from several other federally funded data sources) and the PDB is the output of experiments conducted by biologists, physicists, chemists, and other scientists here in the United States and around the world. This lab introduces you to PDB and NCBI Nucleotide web interfaces and a few of their web tools. We’ll be using sequences and protein structures from these two databases throughout the quarter. I. Activate your CS account If you haven't yet activated your CS account, remember to do it BEFORE attending the first lab. Full instructions are at the following : http://password.cs.wwu.edu II. Lab Overview For this lab you'll explore the sequence and biological structure of hemoglobin, the iron-containing oxygen transport molecule that makes up more than 90% (dry volume) of blood. The biological structure, shown left, is a tetramer (4 chain protein) made up of two subunits (red) and two subunits (blue). The four heme, or iron groups, are shown green. Recall from the bio/chem review lecture that the base pairs A, C, T and G in the DNA gene that encodes for the hemoglobin protein (you’ll first look at the beta subunit) are transcribed to make an hnRNA molecule, from which the exons are extracted to construct an mRNA molecule made up of A, C, U and G bases. The mRNA passes through the nuclear pores to the cytoplasm, where the ribosome synthesizes the hemoglobin polypeptide (the protein) with the help of tRNA molecules. I've briefly alluded to how mutations in the gene manifest as malformed (mutant) proteins. In future labs we'll explore mutations to hemoglobin which results in sickle cell anemia. There are questions in bold throughout this lab handout, which you should compose answers to using your favorite word editor. Upload your final document to Canvas. Here are the first 2 questions. Q1 : What does the acronym NCBI stand for? Q2 : What does the acronym RCSB stand for? WWU CSCI 474, Lab 1 Filip Jagodzinski III. Windows versus Linux (or even Mac), Browser Most CS lab computers are dual boot Linux and Windows. Although you can complete this lab (which requires only the use of a web browser) using either Windows or Linux, it is recommended that you log in via the Linux partition, as most future labs will require you to use Linux. Reboot a lab computer and select the Linux boot option at startup. Once logged on, you’ll see the KUbuntu desktop. Click on the KUbuntu menu icon on the lower left corner and type “Chrome” or "Chromium," or "Firefox", to start a search for a web browser. Start the web browser. IV. NCBI Nucleotide Proceed to the NCBI website at http://www.ncbi.nlm.nih.gov. Q3 : What does the acronym NIH stand for? 1. The unique identifier given to the gene that encodes for the chain of hemoglobin is HBB. Perform a search for HBB using the nucleotide database: Q4 : How many sequences of the HBB gene (across all species) are returned? 2. In this lab we are exploring the human version of HBB. Select the "Animals" link in the "Species" section on the left side of the screen, and from the Results by taxon menu on the right, select Homo sapiens. Notice that as you make selections by clicking on web links, both the original search query is modified, and the Search details entry on the right side of the screen is changed, in this case to: HBB[All Fields] AND "Homo sapiens"[porgn] AND animals[filter] 3. Each nucleotide sequence has a unique identifier. The GeneBank database uses GI (GeneInfo Identifier). Select the link that is the long title for the sequence with GI 49168543. By default the GeneBank database entry is returned (shown right), but you can toggle between the Graphics and FASTA views. Inspect the GeneBank text entry. Q5 : How many base pairs are in the GI 49168543 GeneBank entry? WWU CSCI 474, Lab 1 Filip Jagodzinski 4. There are annotation standards that specify the agreed-upon format that all nucleotide database entries adhere to. For genetic data, that format is FASTA. Click on the FASTA hyperlink. The individual characters in the FASTA view (shown right) is the raw data that is the nucleotide sequence (of the 5' to 3' strand) of the DNA entry. Q6 : What are the first three base pairs of the sequence? Be sure to list both bases of a pair. Q7 : What is the amino acid that the first DNA triplet encodes? 5. A more "visual" representation of the HBB gene is available if you click on Graphics link (shown right). Explore the several UI features that allow you to zoom in on and explore different portions of the nucleotide sequence. Q8 : What is the mRNA codon corresponding to base pairs 211-213 of the DNA? Q9 : When the mRNA molecule is processed by the ribosome, what is the anticodon sequence of the tRNA molecule that is recruited for mRNA base pairs that correspond to the DNA base pairs 211-213? Q10 : When the mRNA molecule is processed by the ribosome, what is the amino acid that is attached to the tRNA when base pairs 211-213 are read? 6. Return to the GeneBank view of the GI 49168543 sequence. Scroll half-way down the page, and you'll see a long list of database cross reference (db_xref) entries that are related to the nucleotide sequence you are viewing. Each db_xref specifies the database and its entry. For example: /db_xref="PDB:1GZX" Follow the 1GZK link to proceed to the Protein Data Bank entry 1GZX (shown right). Note that the PDB entry 1GZX includes BOTH alpha and beta chains of hemoglobin, all 4 heme groups, and an additional 4 oxygen molecules, while the nucleotide entry you were viewing includes the base pair sequence of ONLY the beta subunit of hemoglobin. WWU CSCI 474, Lab 1 Filip Jagodzinski V. The Protein Data Bank The PDB contains experimental data for macromolecules (polypeptide chains of amino acids that are produced by the ribosome). Most PDB entries are determined using an intricate, lengthy (requiring possibly months of wet lab, imaging, and analysis work) process called X-ray crystallography, but structures determined by Nuclear Magnetic Resonance are also available. Q11 : How many entries (all structure files, not just 1GZX) are in the PDB? Q12 : The PDB entry 1GZX was submitted to the PDB by which author(s), and in what year? It is associated with which publication? 7. Click on the HBA1 gene view button (shown right) to proceed to the 1GZX's detailed information about the first alpha subunit of hemoglobin. Q13 : What is the cytogenetic location of HBA1? Explain what each digit and character in the cytogenetic location means (perform a web search to learn about cytogenetic location). 8. Return to the main view page of the 1GZX entry of the PDB. From the Display Files menu, select PDB format (shown right), which will take you to the raw data of the PDB entry. Just like FASTA in the GeneBank, all entries in the PDB adhere to a specific format. Each line of a PDB file begins with a keyword, followed by a number (an ID). Among the many pieces of information in a PDB file, the atom type, its coordinates, the amino acid, the chain, etc. are specified for each line that begins with the keyword ATOM. For example : ATOM ATOM ATOM ATOM ATOM ATOM ATOM 1 2 3 4 5 6 7 N CA C O CB CG1 CG2 VAL VAL VAL VAL VAL VAL VAL A A A A A A A 1 1 1 1 1 1 1 18.432 19.662 19.282 18.421 20.659 20.109 21.982 -2.931 -2.549 -1.939 -2.497 -3.754 -4.992 -3.272 3.579 2.806 1.441 0.695 2.825 2.222 2.245 1.00 1.00 1.00 1.00 1.00 1.00 1.00 37.68 35.41 34.04 33.95 35.59 37.84 36.73 N C C O C C C Q14 : What is the amino acid (the full name, not the abbreviation) in Chain B, amino acid 172? Q15 : What are the x, y and z coordinates of the Gamma Carbon atom for amino acid 173 in Chain B? You might need to perform a web search to learn about the PDB format and nomenclature. VI. Submission and rubric Upload to Canvas your answers to the 15 questions. Only.pdf, .docx, or .doc files are accepted Component of Lab 15 questions answered, each 2 points WWU CSCI 474, Lab 1 Points 30 points Filip Jagodzinski