Södertörns högskola & Karolinska Institutet
Structural biochemistry
Computer lab 2 HT05
Protein Structure Hierarchy
Introduction
With great insight, and before the first 3D protein structure was
determined, the Danish chemist Kaj Linderstrøm-Lang (the father
of physical biochemistry) reasoned that there should be at least
four levels of structural organization present in protein structure.
In Linderstrøm-Lang's hierarchy of protein structure, each level
is characterized by a particular type of organizing force, and higher
levels of organization are composed of elements described by the
previous level. We now know that this organization is an
oversimplification, but the organization of structure into levels is
still useful from a pedagogical viewpoint. Altogether, 80-90% of
amino acids in globular proteins are in one of the three classical secondary structures: α helix,
β sheet, or turn. Here we will begin with a description of primary structure and continue on to
the higher levels of protein structure.
[Figure: Glutaredoxin 3 (E. coli)]
Objectives
1. Gain experience using different macromolecular representations in RasMol.
2. Measure conformational details (distances, angles) in a protein.
3. Extract information from the coordinate file header.
4. Investigate disulfide bonds in ribonuclease A.
5. Use the molecular-graphics program RasMol to analyze the structural features of a
protein.
6. Recognize the main elements of secondary structure within a protein fold.
Primary structure of ribonuclease A
This computer lab will use the enzyme ribonuclease A to explore secondary structure in
proteins. A basic question often asked about a particular structure entry is: what is its
primary sequence?
Exercises:
1. 3-letter code. Last time you downloaded the coordinate file for 5RSA. Search the
header for the amino acid sequence. Copy out the sequence and place it in a text
editor.
2. 1-letter code. If you do not like the format (3-letter code) of what you find, you can
revisit the RCSB homepage and search for the entry 5RSA again. Here you will be
able to find the sequence in a number of formats. The FASTA format is a convenient
1-letter code format. Copy this form of the sequence to the same file.
3. Formatting sequences. Try to format your sequence in a useful manner so it will be
easy to find a particular residue number. If you are using a word processor as
opposed to a text editor, be careful with the choice of font. Proportionally spaced fonts
(the ones that look nice, e.g., Times New Roman) do not allow sequences to be
compared, as some letters like “W” get more space than others like “I”. The solution
is to choose a monospaced (non-proportional) font like Courier.
4. Take the amino acid challenge. If you did not have time for this during lab 1, try it
now! Follow the link for this lab on the course web site to the amino acid quiz. Each link
represents a coordinate file of one of the 20 naturally occurring amino acids. It is
important to learn to identify the amino acids visually in the different representations.
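The sequence handling in exercises 1-3 can also be sketched in a few lines of Python. This is a minimal illustration and not part of the lab itself: it assumes a single-chain entry whose sequence is stored in SEQRES records (residue names from column 20 onward), and the function names seqres_to_one_letter and format_numbered are made up for this example.

```python
# Sketch only: assumes one protein chain and standard 3-letter residue names.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_one_letter(pdb_lines):
    """Join the residue names of all SEQRES records into a 1-letter string."""
    names = []
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            # Residue names begin at column 20 of a SEQRES record.
            names.extend(line[19:].split())
    # Unknown residue names are shown as "X" rather than dropped.
    return "".join(THREE_TO_ONE.get(name, "X") for name in names)


def format_numbered(sequence, per_line=60, block=10):
    """Split a sequence into numbered lines made of space-separated blocks."""
    lines = []
    for start in range(0, len(sequence), per_line):
        chunk = sequence[start:start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # Each line is prefixed with the number of its first residue.
        lines.append(f"{start + 1:>4} " + " ".join(blocks))
    return lines


# Hypothetical five-residue record, for illustration only.
demo = ["SEQRES   1 A    5  LYS GLU THR ALA ALA"]
for text in format_numbered(seqres_to_one_letter(demo)):
    print(text)
```

When the output is viewed in a monospaced font, each line starts with the number of its first residue and the blocks of ten make it easy to count to any position.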
Display a protein structure using different representations
Find your copy of the coordinate file for ribonuclease A and open it with RasMol. With
this exercise, you will cover most of the important commands and manipulations in RasMol.
Exercises:
1. Change the display. Open ribonuclease A in RasMol. By default, all atoms of all
structures are displayed using a “wireframe” representation, in which atoms are invisible
and bonds are drawn as lines. Explore the different display modes (wireframe,
backbone, sticks, spacefill, ball & stick, ribbons, strands and cartoons) from the
“pull-down” menus. Think about the advantages and disadvantages of these different
representations as you see them. Choosing the best way to view a structural detail
can only be learned through experience.
Disulfide bonds and RasMol scripts
Often the primary sequence is defined as the covalent structure of the protein. This then
includes the covalent disulfide bonds which form crosslinks within and sometimes between
proteins. They are a very important structural detail that can be elucidated through chemical
sequencing, or by inspecting the 3D structure if it has been determined at high enough
resolution.
Exercises:
1. Identification of disulfide bonds. Print out a copy of your sequence for
ribonuclease A. Which cysteine residues are paired? One way to find out is to open
the coordinate file in a text editor and search the header section for this information
(look for the SSBOND records). On your printout, draw lines connecting the cysteine
residues that are disulfide bonded.
2. Display of Disulfide bonds. Now display the disulfide bonds in ribonuclease A
using RasMol. Load the 5RSA structure into RasMol and display the protein using
the “backbone” representation from the menu (“Display -> Backbone”). In RasMol,
“backbone” display as a menu option means “virtual” bonds connecting Cα positions.
Once you have convinced yourself of this, label the cysteine residues at the Cα
position.
RasMol> select cys.ca
RasMol> label %r%s
RasMol can (usually) identify disulfide bonds with the terminal command “ssbonds”.
This will output the number of disulfide bonds identified in the terminal window and
display dotted lines connecting the cysteine sulfur atoms (the default behavior). As
the sulfur atoms are not displayed, the disulfide bonds will appear to float in the structure.
For backbone and cartoons it is better to draw the disulfide bonds between the Cα positions.
To do this use the commands:
RasMol> select all
RasMol> set ssbonds backbone
RasMol> ssbonds 50
RasMol> color ssbonds yellow
Notice how the disulfide bonds form crosslinks in the protein structure. Now display
the entire cysteine side chain and draw the disulfide bonds between the sulfurs.
RasMol> set bondmode or
RasMol> select cys and sidechain
RasMol> wireframe 50
RasMol> set ssbonds sidechain
RasMol> ssbonds 50
From the “Options” menu, the side chain hydrogens can be removed from the
display.
Disulfides have characteristic values of χ3 near +90° or -90°, yielding right- and
left-handed disulfide bonds, respectively. What are the values of χ3 in this protein? Use
your chemical intuition to rationalize the value of this dihedral, given that the staggered
rotamers about a carbon-carbon bond are -60°, 60°, and 180°.
3. RasMol scripts. Combinations of commands can be saved as a text file and read into
RasMol to produce the same image as if the commands were entered, one at a time,
into the terminal window. Make a RasMol script using the above commands to
produce this type of display for a generic protein. Save the file and test it after
opening a new PDB file by entering the “script your.filename” command. A
working knowledge of RasMol scripts is VERY useful.
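As a cross-check on exercises 1 and 3, the disulfide pairing can be read from the coordinate file and turned into script lines automatically. The Python sketch below is only an illustration: it assumes the pairing is listed in SSBOND records with the two residue numbers in columns 18-21 and 32-35, it ignores chain identifiers, and the function names are hypothetical. The RasMol commands it emits are the ones used in the exercises above.

```python
def parse_ssbonds(pdb_lines):
    """Return (residue_1, residue_2) number pairs from SSBOND records."""
    pairs = []
    for line in pdb_lines:
        if line.startswith("SSBOND"):
            # Fixed-column fields: the two residue numbers occupy
            # columns 18-21 and 32-35 of the record.
            pairs.append((int(line[17:21]), int(line[31:35])))
    return pairs


def disulfide_script(pairs):
    """Build RasMol script lines that label and draw the paired cysteines."""
    residues = sorted({number for pair in pairs for number in pair})
    lines = []
    if residues:
        selection = ",".join(str(number) for number in residues)
        # Label the C-alpha atom of every cysteine found in an SSBOND record.
        lines.append(f"select ({selection}) and *.CA")
        lines.append("label %r")
    # Draw the disulfide bonds between C-alpha positions, as in exercise 2.
    lines += ["select all", "set ssbonds backbone", "ssbonds 50",
              "color ssbonds yellow"]
    return lines


# Made-up record for illustration; the residue numbers are not from 5RSA.
example = ["SSBOND   1 CYS A   12    CYS A   34"]
print("\n".join(disulfide_script(parse_ssbonds(example))))
```

The printed lines can be saved to a text file and loaded with the “script” command described in exercise 3.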
Higher order structure
The description of higher order structure is, in general, more ambiguous than the
description of the covalent (primary) structure.
Exercises:
2. Describe the secondary/tertiary structure composition of ribonuclease. One way
to get an overview of the protein is to display the protein in cartoon or backbone
format and to color by structure. Details like disulfide bonds can also be added.
RasMol> set ssbonds backbone
RasMol> ssbonds 100
RasMol> color ssbonds green
Labels containing residue numbers can also be useful.
RasMol> select (1,25,50,75,100,124) and *.CA
RasMol> label %r
Labels can be removed at any time with the command
RasMol> labels off
Specific residue positions can be identified by clicking the left mouse button on the
atom. The atom selected appears in the terminal window. Note: atom identification
does not work for cartoons, ribbons, or strands.
Prepare a simple 2D plot identifying the secondary structure segment as a function of the
amino acid sequence (shown below). Use arrows to indicate strands of a sheet and cylinders
to indicate helices.
1       10        20        30        40        50        60
KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQ
        70        80        90       100       110       120
KNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQANKHIIVACEGNPYVPVHFDASV
How many structural domains does the protein have?
How does RasMol know where the secondary structures are?
Are the beginning and end of the secondary structure unambiguous?
Where might the active site be?
Identification of H-bonds
In this exercise, you will analyze a segment of α helix and a segment of β sheet for the
characteristic hydrogen bonding pattern and measure the distances between hydrogen-bonded
atoms.
Exercises:
1. α helix. Examine one α helix and identify the characteristic hydrogen-bonding
pattern. Select a portion of the polypeptide chain (e.g., residues 5-12) that is in an α
helical conformation for display. Now we will use the “real” bonds between
backbone atoms.
RasMol> select all
“DisplayMenu->Wireframe”
RasMol> restrict backbone and 5-12
RasMol> wireframe 40
RasMol> color cpk
RasMol> center 9
It will be helpful to have labels.
RasMol> select 5-12 and *.CA
RasMol> label %r
Note that in most structures determined by diffraction techniques, there are no
hydrogen atoms (hydrogen atoms do not diffract X-rays very well). Therefore,
identification of hydrogen bonds will require that you recognize the donor (amide
nitrogen) and acceptor (carbonyl oxygen). Protein backbone hydrogen bonds
normally have a donor-acceptor distance of ~ 3.0 Å. Use RasMol to measure these
distances in your helical segment.
RasMol> set picking distance
RasMol> set picking monitor
Now use the mouse to select possible hydrogen bond donor-acceptor pairs. The
selection will be shown as a dotted line connecting the two atoms and the distance
will be displayed. To remove an unwanted selection, simply re-select the two atoms.
When you have found the hydrogen bonds in this segment, reset the identification
mode of RasMol.
RasMol> set picking ident
RasMol> monitors off
2. Regular H-bonds. Using the template below, finish drawing the polypeptide chain
to represent residues 5-12. Draw line segments connecting the atoms involved in
regular α-helical hydrogen bonds:
[Figure: template of the polypeptide backbone beginning at residue 5, showing N, H, C and O atoms, to be completed by hand.]
Typical α-helical hydrogen bonds are between the carbonyl oxygen of residue i and
the amide hydrogen of residue i + ___ ?
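The distance measurements in exercise 1 can be repeated outside RasMol as a consistency check. The following Python sketch is an illustration under stated assumptions: coordinates are taken from ATOM records using the fixed PDB columns (atom name 13-16, residue number 23-26, x/y/z 31-54), chain identifiers are ignored (reasonable for a single-chain entry), and the function names and the offset parameter are invented for this example.

```python
import math


def read_backbone_atoms(pdb_lines, names=("N", "O")):
    """Map (residue_number, atom_name) to (x, y, z) for the named atoms."""
    atoms = {}
    for line in pdb_lines:
        if line.startswith("ATOM"):
            name = line[12:16].strip()
            if name in names:
                number = int(line[22:26])
                xyz = (float(line[30:38]), float(line[38:46]),
                       float(line[46:54]))
                # setdefault keeps the first copy if an atom is listed twice
                # (for example, alternate locations).
                atoms.setdefault((number, name), xyz)
    return atoms


def o_to_n_distances(atoms, first, last, offset):
    """Distance (in Å) from O of residue i to N of residue i + offset."""
    distances = {}
    for i in range(first, last + 1):
        oxygen = atoms.get((i, "O"))
        nitrogen = atoms.get((i + offset, "N"))
        if oxygen is not None and nitrogen is not None:
            distances[i] = math.dist(oxygen, nitrogen)
    return distances
```

Calling o_to_n_distances for residues 5-12 with several different values of offset, and looking for the one that gives distances near 3.0 Å throughout the helix, is one way to check the hydrogen-bonding pattern found with the mouse.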
Measure backbone dihedral angles φ, ψ, and ω
The fold of the polypeptide backbone is defined by three dihedral angles φ, ψ, and ω for
each amino acid in the sequence. Due to partial double bond character, the amide bond
(dihedral angle ω) is more-or-less planar. That is to say, the atoms Cα(i), C(i), N(i+1), and
Cα(i+1) lie in a plane. There are two possible planar orientations, which are called cis (ω ≈ 0°)
and trans (ω ≈ 180°). The trans conformation is more stable than the cis (markedly so for
residues other than proline) and is the predominant form in folded proteins.
The regular secondary structures can be conveniently identified by a plot of φ versus ψ
commonly referred to as a "Ramachandran" plot. Not all values of φ and ψ are possible due to
steric hindrance in the polypeptide chain. Thus another use of a Ramachandran plot is as a
check of the accuracy of a structural model. Residues with values outside the "allowed"
regions may represent inaccuracies.
Exercises:
1. Dihedral angles of an α helix. Select a segment of α helix (5-7 residues long).
Display only backbone atoms (N, Cα, C, O and perhaps Cβ). Using the definitions
provided below, measure the dihedral angles φ, ψ, and ω and record them in the
table provided. Enter the values for φ and ψ as points on the Ramachandran plot
below.
RasMol> restrict (backbone or *.CB) and 5-12
RasMol> wireframe 40
RasMol> select *.CA and 5-12
RasMol> label %r
The commands for measuring the dihedral angles are:
RasMol> set picking torsion
Use the mouse to select the following four atoms in succession for each angle:
φ(i): C(i-1), N(i), Cα(i), C(i)
ψ(i): N(i), Cα(i), C(i), N(i+1)
ω(i): Cα(i), C(i), N(i+1), Cα(i+1)
The values will be displayed in the terminal window. Fill in the numbers on the
answer sheet.
2. Dihedral angles of a β strand. Select a piece of β sheet (e.g., residues 42-48,
80-86, 98-103). Again, display only backbone atoms, measure the dihedral angles φ,
ψ, and ω of the central strand (80-86) and record them in the table provided. Add
the values for φ and ψ as points on the Ramachandran plot below.
RasMol> restrict backbone and (42-48,80-86,98-103)
RasMol> wireframe 40
RasMol> select 80-86 and *.CA
RasMol> label %r
Fill in these values too on the answer sheet.
Do the values of φ and ψ fall in “allowed” regions?
How do they compare to the average values for these secondary structures (see compendium
2.7.1-2.7.3)?
What are the relative orientations of the strands in the sheet?
A new feature in RasMol 2.7.1 allows for the dihedral angles φ and ψ to be written out
directly to an ASCII file suitable for subsequent plotting (Excel) using the command
RasMol> write RDF myfile.name
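The torsion values read from the terminal window can be checked against a direct calculation from atomic coordinates. The Python sketch below is a minimal illustration and not RasMol's own code: the function name dihedral is made up, and the atoms are passed in the same order used for picking (for example C(i-1), N(i), Cα(i), C(i) for φ, or Cβ, Sγ, Sγ', Cβ' for χ3 of a disulfide).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four points (x, y, z)."""
    b0 = _sub(p1, p2)            # points from atom 2 back to atom 1
    b1 = _sub(p3, p2)            # central bond, atom 2 to atom 3
    b2 = _sub(p4, p3)            # points from atom 3 on to atom 4
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Remove the components along the central bond, leaving the two
    # outer bonds projected onto the plane perpendicular to it.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    # atan2 returns the angle between the projections, with its sign.
    return math.degrees(math.atan2(y, x))
```

With this sign convention a planar cis arrangement gives 0° and a planar trans arrangement gives 180° (or -180°), which matches the definition of ω given above.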
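[Figure: definition of the backbone dihedral angles φ(i), ψ(i), and ω(i) on a polypeptide segment spanning residues i-1 to i+1, showing the N, HN, Cα, Hα, Cβ, C and O atoms.]
[Figure: blank Ramachandran plot with φ on the horizontal axis and ψ on the vertical axis, both running from -180° to 180°, with example points marked for residues 6 and 81.]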
Turns
Turns are the third of the three "classical" secondary structures. Approximately 1/3 of all
residues in globular proteins are contained in turns, which serve to reverse the direction of the
polypeptide chain. This is perhaps not so surprising, as the diameter of the average globular
protein domain is roughly 25 Å (at roughly 3.5 Å per residue, an extended polypeptide
conformation would require ~7 residues to traverse the domain before having to change
direction). Turns are located primarily on the protein surface and, accordingly, contain polar
and charged residues. Antibody recognition, phosphorylation, glycosylation, hydroxylation, and
intron/exon splicing are frequently found at, or adjacent to, turns. However, it is not clear
whether this is due to specific recognition or simply to the surface location of turns.
[Figure: Type I turns. Note the hydrogen bond between CO of residue i and NH of residue i+3.]
Many have speculated on the role of turns in the folding of globular proteins. Perhaps the two
extremes can be classified as pacifists and activists. The pacifists view turns as a weak link in
the polypeptide chain, allowing the other secondary structures (helix and sheet) to determine
the conformational outcome. The activists, (encouraged by the recent experimental finding of
"turn-like" structures in short peptides in aqueous solution) view turns as structure nucleating
segments, formed early in the folding process. Neither is entirely accurate and evidence can
be given to support each view.
Exercises:
1. RNase A. RasMol can automatically identify turns in a coordinate file of a protein
structure. Focus on the turns in RNase A. To classify turns, one needs to
display the backbone atoms N, Cα, C and O, as type I and II turns have a
characteristic hydrogen bond between C=O of residue i and HN of residue i+3. The
following is a suggestion for such a display.
RasMol> structure                  # have RasMol calculate the secondary structure
RasMol> wireframe off
RasMol> restrict protein
RasMol> strands                    # trace the backbone
RasMol> select backbone and turns
RasMol> wireframe 40
RasMol> select turns and *.ca      # label the Cα atoms of the turn residues
RasMol> label %r%n
RasMol> color labels green
2. Thioredoxin. To see some “classical” examples of reverse turns, we will use the
small globular protein thioredoxin from E. coli (PDB ID 2TRX). Thioredoxin
is one of a handful of proteins that shuttle reducing equivalents from NADPH to ribonucleotides
in the synthesis of deoxyribonucleotides. Visit the RCSB and download the
structure. If the commands above (minus the first statement) are placed into a RasMol script,
they can simply be applied using the “script” command.
Appendix I
Plotting data using Microsoft Excel
ASCII text files containing columns of data can be opened for analysis in MS Excel. To open
a text file and create a plot, follow these steps.
Importing ASCII files into Excel
Step 1. Start Excel.
Step 2. Select File-Open to get a file window and select the text file containing your data.
You will be prompted by the Text Import Wizard.
Step 3. Choose “delimited” and “finish”.
Step 4. Remove the header from the file: click on the row number of the first row to delete and,
while holding down <SHIFT>, select the row number of the last row to delete. Select
Edit/Delete so that the data look like the following, with the residue number, φ, and ψ in
columns B, C and D, respectively.
GLU    2    -53.1    126.8
THR    3    -72      158.8
ALA    4    -61.1    -39.6
ALA    5    -65.4    -45.7
ALA    6    -68.8    -36.9
Simple plot in Excel
Step 1. Select the letter of the column to be on the x-axis (C), then, while pressing <SHIFT>,
select the column to be on the y-axis (D).
Step 2. Click on the Chart Wizard. Select a type like XY (Scatter) and then select <finish>.
A plot is produced.
Step 3. You can make the plot look a bit nicer by double-clicking each axis and setting the
limits for both x and y to -180 to 180, and adding the labels for each axis.
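The same table can be read with a short script instead of a spreadsheet. The Python sketch below is an illustration only: it assumes the layout shown above (residue name, residue number, φ, ψ separated by whitespace), which may differ from the exact file RasMol writes, and the function names read_phi_psi and mean_phi_psi are hypothetical.

```python
def read_phi_psi(rows):
    """Parse 'NAME NUMBER PHI PSI' rows into (name, number, phi, psi) tuples."""
    table = []
    for row in rows:
        fields = row.split()
        if len(fields) < 4:
            continue  # blank or incomplete line
        try:
            entry = (fields[0], int(fields[1]),
                     float(fields[2]), float(fields[3]))
        except ValueError:
            continue  # header line or other non-numeric text
        table.append(entry)
    return table


def mean_phi_psi(table, first, last):
    """Average phi and psi over residues first..last, or None if none found.

    A plain arithmetic mean is only meaningful when the angles in the
    segment do not straddle the -180/180 degree boundary.
    """
    segment = [(phi, psi) for _, number, phi, psi in table
               if first <= number <= last]
    if not segment:
        return None
    count = len(segment)
    return (sum(phi for phi, _ in segment) / count,
            sum(psi for _, psi in segment) / count)


# The five example rows shown in the appendix above.
rows = [
    "GLU 2 -53.1 126.8",
    "THR 3 -72 158.8",
    "ALA 4 -61.1 -39.6",
    "ALA 5 -65.4 -45.7",
    "ALA 6 -68.8 -36.9",
]
print(mean_phi_psi(read_phi_psi(rows), 4, 6))
```

Averaging over a helical or strand segment gives a single (φ, ψ) pair that can be compared with the reference values in the compendium.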
Appendix II Allowed regions in φ,ψ space
The allowed regions were determined from 378 different crystal structures with a resolution
of 2.5 Å or better. The regions where the density of (φ,ψ) points from this statistic is highest
are shaded. The black regions contain 80% of all points, the black and medium grey regions
together 95%, and all shaded regions together 98%. In the color version of the plots, the
regions up to and including yellow together contain 95%, and all colored regions together 98%.
There are three separate plots: one for all non-Gly/Pro residues (“normal”), one for
Val/Ile/Thr, and one for Gly.
[Figure: three Ramachandran plots of the allowed regions, labeled “normal”, “Thr, Val, Ile”, and “Gly”.]
Answer sheet
Use this form to fill in answers to the questions appearing in the text as you go through it.
What are the values of the χ3 torsion in RNase A?
How many structural domains does the protein have?
How does RasMol know where the secondary structures are?
Are the beginning and end of the secondary structure unambiguous?
Where might the active site be?
Between which atoms do we find typical α-helical hydrogen bonds?
How long are these typically (the donor-acceptor distance or the hydrogen-acceptor distance may be used, but specify which it is)?
Do the values of φ and ψ fall in “allowed” regions?
How do they compare to the average values for these secondary structures (see compendium 2.7.1-2.7.3)?
What are the relative orientations of the strands in the sheet?
Helix residue    6          7    8    9    10    11
φ                -68.8°
ψ                -36.9°
ω                178.1°

Strand residue   81         82   83   84
φ                -133.8°
ψ                154.7°
ω                178.0°