CSCI 474
Lab 1 : NCBI Nucleotide and RCSB PDB
Spring 2017
Computer Science
This lab is meant to acquaint you with the NCBI Nucleotide
database and the RCSB Protein Data Bank (PDB). These two
repositories are maintained and administered by U.S. federally
funded science initiatives. They are databases of nucleotide sequences and of three-dimensional
structural data for large biological molecules, respectively.
The data contained in Nucleotide (an umbrella database that pulls information from several
other federally funded data sources) and in the PDB are the output of experiments conducted by biologists,
physicists, chemists, and other scientists in the United States and around the world.
This lab introduces you to the PDB and NCBI Nucleotide web interfaces and a few of their web tools. We’ll
be using sequences and protein structures from these two databases throughout the quarter.
I. Activate your CS account
If you haven't yet activated your CS account, remember to do it BEFORE attending the first lab. Full
instructions are at http://password.cs.wwu.edu
II. Lab Overview
For this lab you'll explore the sequence and biological structure of
hemoglobin, the iron-containing oxygen transport molecule that
makes up more than 90% of the dry weight of red blood cells. The biological
structure, shown left, is a tetramer (a four-chain protein) made up of two
α subunits (red) and two β subunits (blue). The four heme groups, each
containing an iron atom, are shown in green.
Recall from the bio/chem review lecture that the bases A, C, T and G in the gene that encodes
the hemoglobin protein (you’ll first look at the beta subunit) are transcribed to make an hnRNA
molecule, from which the exons are extracted to construct an mRNA molecule made up of A, C, U and G
bases. The mRNA passes through the nuclear pores to the cytoplasm, where the ribosome synthesizes
the hemoglobin polypeptide (the protein) with the help of tRNA molecules. I've briefly alluded to how
mutations in the gene manifest as malformed (mutant) proteins. In future labs we'll explore mutations
to hemoglobin that result in sickle cell anemia.
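If you would like to see the gene-to-protein flow described above in code, here is a minimal Python sketch. It is an illustration only, not something the lab requires: the function names, the toy DNA string, and the deliberately partial codon table are assumptions made for this example, and the input is treated as the coding (5' to 3') strand with introns already removed.

```python
# Partial codon table (standard genetic code); only the codons used below.
CODON_TABLE = {
    "AUG": "Met",
    "GCU": "Ala",
    "UGG": "Trp",
    "UAA": "Stop",
}

def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein

# Hypothetical toy sequence, not taken from HBB.
mrna = transcribe("ATGGCTTGGTAA")
print(mrna)
print(translate(mrna))
```

A full translator would need all 64 codons; the partial table keeps the sketch short and raises a KeyError for any codon it does not know.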
There are questions in bold throughout this lab handout. Compose your answers using
your favorite word processor and upload your final document to Canvas. Here are the first 2 questions.
Q1 : What does the acronym NCBI stand for?
Q2 : What does the acronym RCSB stand for?
WWU CSCI 474, Lab 1
Filip Jagodzinski
III. Windows versus Linux (or even Mac), Browser
Most CS lab computers dual-boot Linux and Windows. Although you can
complete this lab (which requires only a web browser) using either
Windows or Linux, it is recommended that you log in via the Linux partition, as
most future labs will require you to use Linux. Reboot a lab computer and select the Linux boot option at
startup. Once logged on, you’ll see the Kubuntu desktop. Click on the Kubuntu menu icon in the lower
left corner and type "Chrome", "Chromium", or "Firefox" to search for a web browser. Start
the web browser.
IV. NCBI Nucleotide
Proceed to the NCBI website at http://www.ncbi.nlm.nih.gov.
Q3 : What does the acronym NIH stand for?
1. The unique identifier given to the gene that encodes the β chain of hemoglobin is HBB. Perform a
search for HBB using the nucleotide database:
Q4 : How many sequences of the HBB gene (across all species) are returned?
2. In this lab we are exploring the human version of HBB. Select the "Animals" link in the "Species"
section on the left side of the screen, and from the Results by taxon menu on the right, select Homo
sapiens. Notice that as you make selections by clicking on web links, the original search query is
modified and the Search details entry on the right side of the screen changes, in this case to:
HBB[All Fields] AND "Homo sapiens"[porgn] AND animals[filter]
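The query string shown in Search details can also be reused outside the browser. As a hedged sketch, the Python below builds (but does not send) a request URL for NCBI's E-utilities ESearch service. The function name is hypothetical, and the use of "nuccore" as the database name for Nucleotide is an assumption you should check against the E-utilities documentation before relying on it.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term, db="nuccore"):
    """Build (but do not send) an ESearch request URL for an Entrez query."""
    # urlencode percent-encodes brackets and quotes and turns spaces into '+'.
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term})

query = 'HBB[All Fields] AND "Homo sapiens"[porgn] AND animals[filter]'
url = build_esearch_url(query)
print(url)
```

Building the URL separately from sending it lets you inspect exactly how the brackets and quotes in the query are encoded.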
3. Each nucleotide sequence has a unique identifier. The GenBank database uses a GI (GenInfo Identifier)
number. Select the link that is the long title for the sequence with GI 49168543. By default the GenBank
database entry is returned (shown right), but you can toggle between the Graphics and FASTA views.
Inspect the GenBank text entry.
Q5 : How many base pairs are in the GI 49168543 GenBank entry?
4. There are annotation standards that specify the agreed-upon format that all nucleotide database
entries adhere to. For sequence data, that format is FASTA. Click on the FASTA hyperlink. The individual
characters in the FASTA view (shown right) are the raw data: the nucleotide sequence (of the 5'
to 3' strand) of the DNA entry.
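A FASTA record is a header line that starts with ">" followed by one or more lines of sequence characters. As a rough sketch of how such text can be read programmatically, the Python below parses FASTA text into a dictionary. The function name and the example record (header and bases) are made up for illustration and are not the HBB entry.

```python
def parse_fasta(text):
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines are joined; FASTA wraps long sequences.
            records[header] += line.upper()
    return records

# Hypothetical record; the header and bases are made up for illustration.
example = """>example_id Hypothetical sequence
ATGGCCAAT
TGCGGATAA
"""
records = parse_fasta(example)
for header, seq in records.items():
    print(header, len(seq), seq[:3])
```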
Q6 : What are the first three base pairs of the sequence? Be sure to list both bases of a pair.
Q7 : What is the amino acid that the first DNA triplet encodes?
5. A more "visual" representation of the HBB gene is available if you click on the Graphics link
(shown right). Explore the UI features that allow you to zoom in on and explore different portions
of the nucleotide sequence.
Q8 : What is the mRNA codon corresponding to base pairs 211-213 of the DNA?
Q9 : When the mRNA molecule is processed by the ribosome, what is the anticodon sequence of the
tRNA molecule that is recruited for the mRNA bases that correspond to DNA base pairs 211-213?
Q10 : When the mRNA molecule is processed by the ribosome, what is the amino acid that is attached
to the tRNA when the codon for base pairs 211-213 is read?
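To check your reasoning for questions like these, you could sketch the position-to-codon-to-anticodon steps in Python. The example below is a simplified illustration under stated assumptions: the function names and the toy sequence are hypothetical, positions are 1-based and inclusive as in the sequence viewer, the input is the coding strand, and wobble pairing is ignored.

```python
# Watson-Crick pairing between RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codon_at(coding_dna, start, end):
    """Return the mRNA codon for 1-based, inclusive DNA positions start..end."""
    return coding_dna[start - 1:end].upper().replace("T", "U")

def anticodon(codon, five_to_three=False):
    """Return the anticodon that pairs with an mRNA codon.

    By default the result is aligned base by base with the codon, so it
    reads 3' to 5'. Set five_to_three=True to get it written 5' to 3'.
    """
    paired = "".join(RNA_COMPLEMENT[base] for base in codon)
    return paired[::-1] if five_to_three else paired

# Hypothetical toy sequence, not taken from HBB.
toy_dna = "ATGGCTTGGTAA"
codon = codon_at(toy_dna, 4, 6)
print(codon)
print(anticodon(codon))
print(anticodon(codon, five_to_three=True))
```

Textbooks differ on the direction in which an anticodon is written, which is why the sketch exposes both; state the direction you use in your answer.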
6. Return to the GenBank view of the GI 49168543 sequence. Scroll halfway down the page, and you'll
see a long list of database cross-reference (db_xref) entries that are related to the nucleotide sequence
you are viewing. Each db_xref specifies a database and the identifier of an entry in it. For example:
/db_xref="PDB:1GZX"
Follow the 1GZX link to proceed to the Protein Data Bank entry 1GZX (shown right). Note that the PDB
entry 1GZX includes BOTH alpha and beta chains of hemoglobin, all 4 heme groups, and an additional 4
oxygen molecules, while the nucleotide entry you were viewing includes the base pair sequence of ONLY
the beta subunit of hemoglobin.
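Because every db_xref follows the same database:identifier pattern, the cross references can be pulled out of the text entry with a short regular expression. The Python below is a minimal sketch: the function name is hypothetical, and the second sample line ("ExampleDB:12345") is invented purely to show more than one database.

```python
import re

# Matches /db_xref="DATABASE:IDENTIFIER" and captures both parts.
DB_XREF = re.compile(r'/db_xref="([^:"]+):([^"]+)"')

def parse_db_xrefs(text):
    """Return (database, identifier) pairs for every db_xref qualifier."""
    return DB_XREF.findall(text)

# The first line is from the handout; the second is a made-up example.
sample = '''
                     /db_xref="PDB:1GZX"
                     /db_xref="ExampleDB:12345"
'''
xrefs = parse_db_xrefs(sample)
pdb_ids = [ident for db, ident in xrefs if db == "PDB"]
print(xrefs)
print(pdb_ids)
```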
V. The Protein Data Bank
The PDB contains experimentally determined structural data for biological macromolecules, mostly
proteins (polypeptide chains of amino acids that are produced by the ribosome). Most PDB entries are
determined by X-ray crystallography, an intricate and lengthy process (possibly requiring months of
wet lab, imaging, and analysis work), but structures determined by nuclear magnetic resonance (NMR)
are also available.
Q11 : How many entries (all structure files, not just 1GZX) are in the PDB?
Q12 : The PDB entry 1GZX was submitted to the PDB by which author(s), and in what year? It is
associated with which publication?
7. Click on the HBA1 gene view button (shown right) to proceed to 1GZX's
detailed information about the first alpha subunit of hemoglobin.
Q13 : What is the cytogenetic location of HBA1? Explain what each digit and character in the
cytogenetic location means (perform a web search to learn about cytogenetic location).
8. Return to the main view page of the 1GZX entry of the PDB. From the Display Files
menu, select PDB format (shown right), which will take you to the raw data of the PDB
entry. Just like FASTA in GenBank, all entries in the PDB adhere to a specific
format. Each line of a PDB file begins with a keyword that identifies the record type; in atom records the keyword is followed by a number (an ID).
Among the many pieces of information in a PDB file, the atom name, its coordinates, the amino acid, the
chain, etc. are specified on each line that begins with the keyword ATOM. For example:
ATOM      1  N   VAL A   1      18.432  -2.931   3.579  1.00 37.68           N
ATOM      2  CA  VAL A   1      19.662  -2.549   2.806  1.00 35.41           C
ATOM      3  C   VAL A   1      19.282  -1.939   1.441  1.00 34.04           C
ATOM      4  O   VAL A   1      18.421  -2.497   0.695  1.00 33.95           O
ATOM      5  CB  VAL A   1      20.659  -3.754   2.825  1.00 35.59           C
ATOM      6  CG1 VAL A   1      20.109  -4.992   2.222  1.00 37.84           C
ATOM      7  CG2 VAL A   1      21.982  -3.272   2.245  1.00 36.73           C
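ATOM records are fixed-width: each field lives in a set range of columns rather than being separated by a delimiter. As a hedged sketch of how you might read one, the Python below slices a single ATOM line by column. The function name and dictionary keys are hypothetical; the column ranges follow the fixed-width layout of the PDB format, and the sample line is the second record from the example above.

```python
def parse_atom_line(line):
    """Parse one ATOM record using the fixed column ranges of the PDB format."""
    return {
        "serial": int(line[6:11]),         # atom serial number (the ID)
        "name": line[12:16].strip(),       # atom name, e.g. CA
        "res_name": line[17:20].strip(),   # three-letter amino acid code
        "chain": line[21],                 # chain identifier
        "res_seq": int(line[22:26]),       # residue sequence number
        "x": float(line[30:38]),           # coordinates in angstroms
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),    # temperature factor
        "element": line[76:78].strip(),    # element symbol
    }

line = "ATOM      2  CA  VAL A   1      19.662  -2.549   2.806  1.00 35.41           C"
atom = parse_atom_line(line)
print(atom["chain"], atom["res_seq"], atom["res_name"], atom["name"])
print(atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real PDB files can run together with no space between them.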
Q14 : What is the amino acid (the full name, not the abbreviation) in Chain B, amino acid 172?
Q15 : What are the x, y and z coordinates of the Gamma Carbon atom for amino acid 173 in Chain B?
You might need to perform a web search to learn about the PDB format and nomenclature.
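As a starting point for the nomenclature, the sketch below maps the three-letter residue codes of the 20 standard amino acids to their full names and spells out the Greek-letter position encoded in a carbon atom name (CA is the alpha carbon, CB the beta carbon, and so on). The function name and the wording of the descriptions are assumptions made for this illustration, and the function only handles carbon atoms in ATOM records.

```python
# Three-letter codes of the 20 standard amino acids.
RESIDUE_NAMES = {
    "ALA": "Alanine", "ARG": "Arginine", "ASN": "Asparagine",
    "ASP": "Aspartic acid", "CYS": "Cysteine", "GLN": "Glutamine",
    "GLU": "Glutamic acid", "GLY": "Glycine", "HIS": "Histidine",
    "ILE": "Isoleucine", "LEU": "Leucine", "LYS": "Lysine",
    "MET": "Methionine", "PHE": "Phenylalanine", "PRO": "Proline",
    "SER": "Serine", "THR": "Threonine", "TRP": "Tryptophan",
    "TYR": "Tyrosine", "VAL": "Valine",
}

# Second letter of an atom name gives its position along the side chain.
GREEK_POSITIONS = {
    "A": "alpha", "B": "beta", "G": "gamma", "D": "delta",
    "E": "epsilon", "Z": "zeta", "H": "eta",
}

def describe_carbon(atom_name):
    """Describe a carbon atom name such as 'CA' or 'CG1'; None otherwise."""
    if len(atom_name) < 2 or atom_name[0] != "C":
        return None
    if atom_name[1] not in GREEK_POSITIONS:
        return None
    label = GREEK_POSITIONS[atom_name[1]] + " carbon"
    if len(atom_name) > 2:
        # A trailing digit distinguishes branches at the same position.
        label += " (branch " + atom_name[2:] + ")"
    return label

print(RESIDUE_NAMES["VAL"])
print(describe_carbon("CA"))
print(describe_carbon("CG1"))
```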
VI. Submission and rubric
Upload to Canvas your answers to the 15 questions. Only .pdf, .docx, or .doc files are accepted.
Component of Lab                        Points
15 questions answered, each 2 points    30 points