page 66
Lab Times
2-2011
Methods
Bench philosophy (29): X-ray data collection in protein crystallography
Solving Protein Structures
Diffraction concepts are a headache for novice crystallographers and the underlying mathematics can sometimes be tricky but, after a reasonable amount of time spent studying the phenomena, it all suddenly becomes clearer.
In the Bench philosophy article in issue 6 of Lab Times 2010 (page 74), we described the purification and crystallisation techniques used in protein crystallography. Now, we will continue our journey with the downstream steps needed to solve a protein structure from a diffracting protein crystal. But first, let’s refresh some theoretical aspects of the X-ray diffraction of crystals.
Diffraction is an optical phenomenon that occurs when waves encounter an obstacle of a size similar to the incident wavelength. X-ray electromagnetic radiation consists of waves ranging from 0.1 Å to 100 Å in wavelength (1 Å = 10^-10 m), which are emitted by excited electrons that return to lower-energy atomic states. In order to resolve atomic features, it is necessary to use radiation of the same order of magnitude as the atomic objects. In typical in-house rotating anode X-ray generators, the wavelength of the X-ray beam is about 1.54 Å, corresponding to the Kα transition of copper (Cu Kα). In proteins, the length of the covalent chemical bonds between carbon, nitrogen and oxygen varies from 1.24 Å (C=O) to 1.53 Å (C-Cα); these dimensions are very close to the X-ray wavelength and, therefore, enable X-ray radiation to “see” the three-dimensional arrangement of protein atoms.
Beamtime at the synchrotron
In synchrotrons, electromagnetic radiation is released when electrons travelling at close to the speed of light are bent off a straight path. This type of X-ray beam has a much higher intensity than conventional lab sources and, therefore, much smaller crystals and crystals with very large unit cell dimensions may be used. Synchrotrons also allow data collection at almost any wavelength, as opposed to the fixed value of a home source. However, due to the high intensity of synchrotron radiation, the crystals are more prone to radiation damage, even at cryogenic temperatures.
[Figure: Experimental hutch of Swiss Light Source (SLS) synchrotron. Labels: cryo cooling, collimator, X-ray beam, goniometer head with mounted crystal, camera. Photo: Jörg M. Harms]
A perfect crystal lattice is an ordered array of unit cells that continuously repeat over three-dimensional space by translation in all directions. The unit cell is the smallest repeated element that generates the crystal, and is defined by three distances (a, b, c) and three angles (α, β, γ).
In three dimensions, there are seven lattice systems: cubic, hexagonal, tetragonal,
rhombohedral, orthorhombic, monoclinic
and triclinic, which are subdivided into 230
space groups. These space groups condense
all the possible symmetry operations that
the asymmetric unit can adopt for packing
the unit cell.
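As a small illustration of how the six unit cell parameters are used in practice, the sketch below computes the unit cell volume from the three edges and three angles with the standard general (triclinic) formula. The function name and the example cell are hypothetical and only serve to show the arithmetic.

```python
import math

def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Volume (in Å^3) of a unit cell from edges (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    # General triclinic formula; reduces to a*b*c when all angles are 90°
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

# Hypothetical orthorhombic cell (all angles 90°)
print(unit_cell_volume(50.0, 60.0, 70.0, 90.0, 90.0, 90.0))
```

For the higher-symmetry lattice systems the expression simplifies, for example to a·b·c·sin β for a monoclinic cell.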
The asymmetric unit is the fundamental unit of crystal construction, so called because it is the smallest unit that can be rotated and translated in order to generate the unit cell. The Miller indices are a set of three integers (h, k, l) used to define a family of parallel planes by specifying how the planes cut the unit cell axes (at a/h, b/k and c/l). They are also used to label the spots that arise due to diffraction from these planes.
Electromagnetic waves are periodic (sinusoidal) functions consisting of two orthogonal components, electric (E) and magnetic (H), which oscillate at 90° (π/2) to one another. In X-ray crystallography, only the electric component interacts with the electrons around the atoms, producing the diffraction phenomena. Waves are characterised by amplitude and wavelength (or period, in time units); the wavelength does not change when diffraction occurs. The amplitude is greatest when the waves interfere constructively, that is, when the summed waves have a phase difference of exactly 0 or 2nπ. Therefore, spots are only observed on the detector where constructive interference happens.
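The effect of the phase difference on the summed amplitude can be checked numerically. The following is a minimal sketch, assuming unit-amplitude waves represented as complex exponentials; the function name and the number of waves are arbitrary choices for illustration.

```python
import numpy as np

def summed_amplitude(n_waves, phase_step):
    """Amplitude of the sum of n_waves unit-amplitude waves,
    each shifted by phase_step (radians) relative to the previous one."""
    phases = phase_step * np.arange(n_waves)
    return abs(np.exp(1j * phases).sum())

in_phase = summed_amplitude(10, 2 * np.pi)   # phase difference 2π: waves reinforce
out_of_phase = summed_amplitude(10, np.pi)   # phase difference π: waves cancel
print(in_phase, out_of_phase)
```

With a phase step of 2π the ten waves add up to roughly ten times the single-wave amplitude, whereas with a step of π they cancel almost completely.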
This idea of constructive interference is the underlying concept behind the Bragg equation, which states that diffraction occurs only when the path difference between waves reflected from adjacent planes of a family of lattice planes is an exact multiple of the wavelength: $n\lambda = 2d_{hkl}\sin\theta$, where $n$ is an integer, $\lambda$ is the wavelength, $d_{hkl}$ is the distance between adjacent planes of the family hkl and $\theta$ is the diffraction angle.
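The Bragg equation can be rearranged to give the diffraction angle for a known plane spacing and wavelength. The sketch below does this for the Cu Kα wavelength quoted earlier; the spacing of 2.0 Å is an illustrative value and the function name is hypothetical.

```python
import math

def bragg_angle(d_spacing, wavelength, order=1):
    """Diffraction angle in degrees for plane spacing d (Å) and wavelength (Å)."""
    s = order * wavelength / (2.0 * d_spacing)
    if s > 1.0:
        # n*lambda > 2d: the condition cannot be satisfied for any angle
        raise ValueError("no diffraction possible for this spacing and wavelength")
    return math.degrees(math.asin(s))

# Cu K-alpha wavelength (1.54 Å); d = 2.0 Å is an assumed example spacing
theta = bragg_angle(2.0, 1.54)
print(f"angle = {theta:.2f} degrees")
```

The check on `s` also shows why the wavelength limits the resolution: spacings smaller than half the wavelength can never fulfil the equation.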
Real and reciprocal space
Crystallographers often distinguish real
space from reciprocal space. Real space is
the three-dimensional space as it is in the
crystal, and reciprocal space is related to
the space containing the diffraction spots.
As the detector is usually flat, it is necessary to rotate the crystal in real space while acquiring the data; this procedure enables all three dimensions of reciprocal space to be recorded. Real and reciprocal space are related to each other by a Fourier transform (FT), which means that you can swap from real space to reciprocal space by applying an FT, and vice versa. In real space, the atoms are positioned in repeating planes separated by a distance d, which corresponds in reciprocal space to a point at a distance 1/d from the origin, in the direction perpendicular to the planes in real space. In summary, big distances in real space appear as small distances in reciprocal space, and planes in real space give rise to perpendicular vectors in reciprocal space.
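The reciprocal relationship between d and 1/d can be made concrete with a short calculation. The sketch below builds Cartesian cell vectors from the cell parameters, inverts them to obtain the reciprocal vectors (crystallographic convention, without a factor of 2π) and derives the plane spacing for a given hkl. The function names, the axis convention (a along x, b in the xy plane) and the example cell are assumptions made for illustration.

```python
import numpy as np

def real_space_vectors(a, b, c, alpha, beta, gamma):
    """Cartesian cell vectors (rows), with a along x and b in the xy plane."""
    al, be, ga = np.radians([alpha, beta, gamma])
    cx = c * np.cos(be)
    cy = c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)
    cz = np.sqrt(c**2 - cx**2 - cy**2)
    return np.array([[a, 0.0, 0.0],
                     [b * np.cos(ga), b * np.sin(ga), 0.0],
                     [cx, cy, cz]])

def reciprocal_vectors(cell):
    """Rows a*, b*, c* such that a_i . a*_j = 1 if i == j, else 0."""
    return np.linalg.inv(cell).T

def d_spacing(hkl, recip):
    """Plane spacing d_hkl = 1 / |h a* + k b* + l c*|."""
    return 1.0 / np.linalg.norm(np.asarray(hkl, dtype=float) @ recip)

cell = real_space_vectors(50.0, 60.0, 70.0, 90.0, 90.0, 90.0)  # hypothetical cell
recip = reciprocal_vectors(cell)
print(d_spacing((1, 0, 0), recip), d_spacing((2, 0, 0), recip))
```

Doubling the index h halves the spacing d, so the corresponding spot lies twice as far from the origin of reciprocal space.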
Diffraction analysis
Once the theoretical basis has been understood, we can proceed with the steps needed to analyse diffraction images. Basically, the position of the spots in reciprocal space depends on the unit cell, and the intensity of the spots depends on the arrangement of the atoms. Therefore, it is possible to obtain the unit cell dimensions and angles (a, b, c, α, β, γ) from the position of the spots. Thereafter, it is necessary to integrate the spot intensities from all the images obtained by rotating the crystal and to average them in order to get a probable space group. At this point, you must bear in mind that the space group remains hypothetical until you have the structure factor amplitudes |F(hkl)| for each unique set of hkl planes.
The initial calculations in X-ray crystallography are performed in reciprocal space. The electron density at a point x, y, z can be calculated from the Fourier transform over all measured hkl of the structure factors:
$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl} |F_{hkl}| \exp[-2\pi i(hx+ky+lz) + i\alpha(hkl)],$$
where V is the volume of the unit cell.
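The Fourier synthesis can be tried out on a one-dimensional toy "crystal". In the sketch below, structure factors are first calculated from three hypothetical point atoms (positions and electron counts are invented) and the density is then rebuilt from amplitudes and phases using the summation above with V = 1. It is a simplified illustration, not a description of how crystallographic software is implemented.

```python
import numpy as np

# Hypothetical 1D structure: fractional positions and electron counts
positions = np.array([0.10, 0.35, 0.70])
electrons = np.array([6.0, 7.0, 8.0])

h = np.arange(-20, 21)  # indices of the "measured" reflections
# Structure factors F(h) = sum_j f_j exp(2*pi*i*h*x_j)
F = (electrons * np.exp(2j * np.pi * np.outer(h, positions))).sum(axis=1)
amplitudes, phases = np.abs(F), np.angle(F)

x = np.linspace(0.0, 1.0, 200, endpoint=False)
# rho(x) = (1/V) sum_h |F_h| exp[-2*pi*i*h*x + i*alpha_h], with V = 1
rho = (amplitudes[:, None]
       * np.exp(-2j * np.pi * np.outer(h, x) + 1j * phases[:, None])).sum(axis=0).real

print(x[np.argmax(rho)])  # position of the highest density peak
```

The reconstructed density peaks at the atomic positions, with the highest peak at the atom carrying the most electrons. Including more reflections (a larger range of h) sharpens the peaks, which is the one-dimensional analogue of higher resolution.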
The structure factor amplitudes |F(hkl)| can be measured from the diffraction pattern; however, the other half of the information, the phase α(hkl), is missing, as it cannot be measured directly. No detector can record the phase, and obtaining its value is far from trivial, which is why crystallographers call it ‘the phase problem’. The solution to the phase problem can be achieved using experimental or computational approaches, depending on the protein crystal under investigation.
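Why the missing phases matter can be seen in a one-dimensional toy calculation. The sketch below, with invented atom positions and electron counts, computes a density map twice: once with the correct phases and once with all phases set to zero, which is what is left if only the amplitudes are known. All names are hypothetical.

```python
import numpy as np

positions = np.array([0.10, 0.35, 0.70])  # hypothetical atoms (fractional coordinates)
electrons = np.array([6.0, 7.0, 8.0])
h = np.arange(-20, 21)
F = (electrons * np.exp(2j * np.pi * np.outer(h, positions))).sum(axis=1)

x = np.linspace(0.0, 1.0, 200, endpoint=False)

def synthesis(amplitudes, phases):
    """1D Fourier synthesis of the density, taking V = 1."""
    terms = amplitudes[:, None] * np.exp(
        -2j * np.pi * np.outer(h, x) + 1j * phases[:, None])
    return terms.sum(axis=0).real

with_phases = synthesis(np.abs(F), np.angle(F))
without_phases = synthesis(np.abs(F), np.zeros(h.size))

print(x[np.argmax(with_phases)], x[np.argmax(without_phases)])
```

With the correct phases the strongest peak sits on the heaviest atom; with the phases discarded the strongest peak moves to the origin and no longer marks an atomic position, even though the amplitudes are identical.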
The experimental methods employed to solve the phase problem involve either the addition of heavy atoms to the protein crystals or the use of anomalously scattering atoms in the protein structure (in most cases, heavy atoms). Isomorphous replacement (IR) requires the insertion of heavy atoms into the crystal. The reflections of similar (isomorphous) crystals, with and without the heavy atoms, are compared to obtain the positions of the heavy atoms. From these positions, the initial phases can be calculated. Single- or multiple-wavelength anomalous dispersion (SAD or MAD) takes advantage of the anomalous scattering of atoms in the protein, such as selenium introduced by adding selenomethionine to the growth medium. Selenium is the most commonly used element when SAD or MAD methods are used to obtain initial phase information. With SAD, only one wavelength is used to collect diffraction data, whereas MAD uses up to three different wavelengths.
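The signal exploited by SAD and MAD can be illustrated with a toy calculation. When all atoms scatter normally, the reflections h and -h (a Friedel pair, a term not used above) have equal amplitudes; an anomalous scatterer adds an imaginary component to its scattering factor and breaks this equality. The sketch below uses a 1D structure with invented positions and scattering factors; the value 30 + 4i standing in for selenium is purely illustrative.

```python
import numpy as np

def structure_factor(h, positions, factors):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a 1D toy structure."""
    return np.sum(factors * np.exp(2j * np.pi * h * positions))

positions = np.array([0.10, 0.35, 0.70])  # hypothetical fractional coordinates

# Purely real scattering factors: no anomalous signal
normal = np.array([6.0, 7.0, 30.0], dtype=complex)
# Same atoms, but the third one has an assumed imaginary (anomalous) term
anomalous = np.array([6.0, 7.0, 30.0 + 4.0j])

def friedel_difference(h, factors):
    """|F(h)| - |F(-h)| for one reflection."""
    return (abs(structure_factor(h, positions, factors))
            - abs(structure_factor(-h, positions, factors)))

print(friedel_difference(1, normal))     # essentially zero
print(friedel_difference(1, anomalous))  # clearly non-zero
```

These small amplitude differences are what allow the anomalous scatterers to be located, from which initial phases can then be estimated.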
In addition to the experimental methods for obtaining initial phase information, there are computational methods such as molecular replacement (MR) and ab initio (direct) calculations. The latter require a resolution better than 1 Å, which has only been achieved for a handful of protein crystals. MR, in contrast, takes a solved homologous protein structure, from which the initial phases can be obtained, as similar structures will tend to have related phases as long as they are in the same position in the asymmetric unit. MR tries to place the known structure in the unknown crystal by using rotation and translation functions. First, using the rotation function, the approximate relative orientation of the two molecules is calculated and then, using the translation function, a superimposition of the two molecules is achieved. If more than one molecule is present in the unit cell, further rotation and translation searches are performed until all molecules are placed in the unit cell.
MR delivers initial phases from the
known structure generating a first electron
density map. Since this map is biased towards the solved structure, a refinement
is necessary to get the final model closer to
the experimental data than the structure
used in MR.
Manual building
In addition to the previously mentioned
bias, several amino acids that were not homologous might be missing from the model or they might be “mutated” to glycine
or alanine. Manual building can be used
where extra density appears. The addition
of big aromatic amino acids such as tryptophan, phenylalanine or tyrosine is a good
start as they can act as beacons in the structure. The observation of extra density that
fits into the amino acids side chains is a
hint that MR has worked properly. Following manual building, several cycles of refinement can be applied and the outcome is a new electron density map that contains more information based on the experimental data, so that more amino acids, loops, conformations and ligands can be added into the model.
Also, a restrained refinement process can be applied, which restrains the model geometry (bond lengths and angles) to typical values from the organic chemistry literature.
These restraints effectively add more data,
overcoming the lack of measured structure
factor amplitudes due to poor diffraction
of protein crystals. This process of manual
building and refinement is repeated until
no extra information is added to the model.
The refinement progress is monitored by using the Rwork and Rfree values. Rwork measures how well the refined model fits the data, and Rfree is the same statistic calculated for the 5% of the data that is omitted from the refinement; it detects over-fitting of the model. The equation of Rwork is:
$$R_{\mathrm{work}} = \frac{\sum_h \big|\,|F_o(h)| - |F_c(h)|\,\big|}{\sum_h |F_o(h)|},$$
where |Fo(h)| are the observed structure factor amplitudes and |Fc(h)| are the calculated structure factor amplitudes. Initial models often have an Rwork of 40 to 50%, whereas in a refined model Rwork is below 25% in most cases (high resolution structures can have an Rwork below 20%). Rfree is typically a few percent higher. Finally, the refined protein model needs to be validated on parameters such as geometry and torsion angles to verify that they are within acceptable limits.
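The R-factor calculation itself is simple arithmetic. The sketch below applies it to synthetic amplitudes, with about 5% of the reflections flagged as a test set; the data, the noise level and all names are invented for illustration, and no refinement is performed.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    f_obs, f_calc = np.abs(f_obs), np.abs(f_calc)
    return np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs)

rng = np.random.default_rng(0)
f_obs = rng.uniform(10.0, 100.0, 2000)              # synthetic observed amplitudes
f_calc = f_obs * rng.normal(1.0, 0.2, f_obs.size)   # "model" amplitudes with 20% scatter

free = rng.random(f_obs.size) < 0.05                # roughly 5% test-set flags
r_work = r_factor(f_obs[~free], f_calc[~free])
r_free = r_factor(f_obs[free], f_calc[free])
print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because nothing is refined against the working set here, the two values come out similar. In a real refinement the model is fitted only to the working reflections, so a gap opening up between Rwork and Rfree is the sign of over-fitting.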
The geometry and stereochemical properties of the final protein model can be assessed using the MolProbity (Davis et al., Nucleic Acids Res 35: W375-383) or ProCheck (Laskowski et al., J Mol Biol 231(4): 1049-1067) servers. The main-chain torsion angles (φ, ψ) can be evaluated using Ramachandran plots (Ramachandran et al., J Mol Biol 7: 95-99) to verify that the dihedral angles fall in allowed regions.
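The torsion angles plotted in a Ramachandran diagram are dihedral angles defined by four consecutive backbone atoms: C(i-1), N, Cα, C for φ and N, Cα, C, N(i+1) for ψ. The sketch below shows one common way to compute a signed dihedral angle from four points; the function name and test coordinates are hypothetical, and validation servers may implement this differently.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)      # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

# Simple test geometry: central bond along z
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [1, 0, 1]))   # cis arrangement
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]))   # quarter turn
```

Applied to the backbone atoms of every residue, this yields the (φ, ψ) pairs that are compared against the allowed regions of the plot.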
Following structure validation, we are now in a position to upload the structure model, along with the observed structure factor amplitudes, to the Protein Data Bank (PDB), where a final validation is performed by the curators.
Juan David Guzman and
Dimitrios Evangelopoulos (ISMB,
London)