Structural Analysis of the EGR Family of Transcription Factors:
Templates for Predicting Protein-DNA Interactions
Jamie Duke (1, 2)
(1) BBSI - Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA 15261
(2) Department of Biological Sciences, Rochester Institute of Technology, Rochester, NY 14623
Zinc Fingers – Introduction
Nucleic acid binding domain
Classic C2H2 conformation, coordinating a zinc ion
24-residue β-β-α motif
Conserved pattern: x-C-x(1-5)-C-x(12)-H-x(3-6)-H
Conserved aromatic ring approximately 4 residues away from the second cysteine
Multiple domains are used to recognize specific DNA sequences
The most commonly studied family is the Early Growth Response (EGR) family
Contains 2-3 zinc fingers
Other names:
Zif268
Nerve Growth Factor Induced Protein

Figure 1. Zinc Finger, C2H2 type Domain Profile.
The conserved cysteine residues are located within the β-strands of the domain and are shown in pink. The conserved histidine residues are located within the α-helix and are shown in blue. Together these four residues coordinate a zinc ion that is required for the structural stability of the domain. A conserved aromatic residue, shown in green, also contributes to the stability of the domain.
Referenced from Pfam Acc. No: PF00096 (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00096)

zf-C2H2 Family Diversity
There are currently 32,874 identified zinc fingers of the zf-C2H2 type (Pfam 17.0).
These occur in 5264 proteins, which are represented in 235 different architectures.
Distribution:
Eukaryota: 5233 proteins
  Vertebrata: 3435 proteins
    Amphibians: 218 proteins
    Humans: 1390 proteins
    Mice: 1085 proteins
  Fungi: 395 proteins
Viruses: 19 proteins
Archaea: 12 proteins

Snapshot of the multiple sequence alignment for the domain. Conserved residues are marked with a * under the column.
EGR1_HUMAN/396-418    FACD...ICG...RKFARS...DERKRHTKI...H
ZFP60_MOUSE/484-506   FECK...ECG...KAFHFS...SQLNNHKTS...H
ACE2_YEAST/633-657    YSCDF.PGCT...KAFVRN...HDLIRHKIS...H
SUHW_DROAN/349-373    YACK...ICG...KDFTRS...YHLKRHQKYS.SC
ZNF76_HUMAN/285-309   YTCPE.PHCG...RGFTSA...TNYKNHVRI...H
TTKB_DROME/538-561    YPCP...FCF...KEFTRK...DNMTAHVKI..IH
XFIN_XENLA/1044-1066  YKCG...LCE...RSFVEK...SALSRHQRV...H
Q17793_CAEEL/209-234  YQCQ...LCK...KSISRHGQYANLLNHLSR...H
TF3A_BUFAM/161-187    YPCRKDSTCP...FVGKTW...SDYMKHAAE..LH
ZN592_HUMAN/1043-1069 YTCG...YCTEDSPSFPRP...SLLESHISL..MH
                        *     *                  *      *
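The conserved pattern can be made concrete by translating it into a regular expression and using that expression to scan a sequence. The sketch below is only an illustration. The `pattern_to_regex` helper is hypothetical and is not part of Pfam or PROSITE tooling. It handles only the `x`, `x(n)` and `x(m-n)` elements used on this poster. The test sequence is the 1AAY (Zif268) sequence taken from the 1AAY/1G2D alignment on this poster.

```python
import re

# C2H2 signature as written on the poster: "x" is any residue and x(m-n)
# a run of m to n residues (PROSITE-like; PROSITE itself writes x(m,n)).
C2H2_PATTERN = "x-C-x(1-5)-C-x(12)-H-x(3-6)-H"

# One token per pattern element: x(m-n) or x(n), a bare x, or a residue letter.
TOKEN = re.compile(r"x\((\d+)(?:-(\d+))?\)|x|[A-Z]")


def pattern_to_regex(pattern: str) -> str:
    """Translate the dash-separated motif into an equivalent Python regex."""
    parts = []
    for m in TOKEN.finditer(pattern):
        tok = m.group(0)
        if tok == "x":
            parts.append(".")                    # exactly one residue
        elif tok.startswith("x("):
            lo, hi = m.group(1), m.group(2)
            parts.append(f".{{{lo},{hi}}}" if hi else f".{{{lo}}}")
        else:
            parts.append(tok)                    # conserved residue (C or H)
    return "".join(parts)


# 1AAY (Zif268) sequence as given in the poster's 1AAY/1G2D alignment.
ZIF268 = ("MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFS"
          "RSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLRQKD")

regex = pattern_to_regex(C2H2_PATTERN)
print(regex)  # .C.{1,5}C.{12}H.{3,6}H
for m in re.finditer(regex, ZIF268):
    print(m.start(), m.group(0))
# 5 ACPVESCDRRFSRSDELTRHIRIH
# 35 QCRICMRNFSRSDHLTTHIRTH
# 63 ACDICGRKFARSDERKRHTKIH
```

Each of the three Zif268 fingers matches once. Because `.{1,5}` and `.{3,6}` are greedy, a sequence with an extra histidine inside the allowed spacing could produce a match that ends later than the canonical second His.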
Homology Modeling
There are 42 structures of zf-C2H2 proteins in the Protein Data Bank, 11 of which were applicable to our interests:
20 / 42 structures were determined by X-ray crystallography
22 / 42 structures were determined by NMR
At least 15 / 42 structures were duplicates
We considered only structures that were determined by X-ray crystallography and contained either 2 or 3 zinc fingers, since such proteins would belong to the EGR family.
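The selection rule above can be written as a simple filter. This is a minimal sketch under stated assumptions. The `Entry` record, its field names and the method labels are hypothetical, and the entries are placeholder IDs rather than real PDB structures. The poster does not say how the duplicate structures were handled, so the sketch does not address them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    pdb_id: str     # hypothetical placeholder IDs below, not real PDB entries
    method: str     # experimental method, e.g. "X-RAY" or "NMR"
    n_fingers: int  # number of zf-C2H2 domains in the protein


def select_egr_like(entries: list[Entry]) -> list[Entry]:
    """Keep crystal structures that have two or three zinc fingers."""
    return [e for e in entries if e.method == "X-RAY" and e.n_fingers in (2, 3)]


entries = [
    Entry("HYP1", "X-RAY", 3),
    Entry("HYP2", "NMR", 3),      # rejected: solved by NMR
    Entry("HYP3", "X-RAY", 1),    # rejected: single finger
    Entry("HYP4", "X-RAY", 2),
    Entry("HYP5", "X-RAY", 6),    # rejected: too many fingers
]
print([e.pdb_id for e in select_egr_like(entries)])  # ['HYP1', 'HYP4']
```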
Alignment of Zif268 (1AAY) and the Zif268 variant (1G2D):
1AAY  MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFS
      MERPYACPVESCDRRFS+   L  HIRIHTGQKPFQCRICMRNFS
1G2D  MERPYACPVESCDRRFSQKTNLDTHIRIHTGQKPFQCRICMRNFS

1AAY  RSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLRQKD
      +   L  HIRTHTGEKPFACDICGRKFA    R RHTKIHLRQKD
1G2D  QHTGLNQHIRTHTGEKPFACDICGRKFATLHTRDRHTKIHLRQKD
We were primarily concerned with the residues at positions 3 and 6 of the α-helix, as well as the residue directly preceding the α-helix (position -1). These are the residues used in DNA recognition.
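These recognition positions can be located automatically. The sketch below assumes the standard zinc finger helix numbering, in which the first zinc-coordinating histidine sits at helix position 7. There is no position 0 in this numbering, so positions -1, 3 and 6 lie 7, 4 and 1 residues before that histidine. The regex is a simplified form of the C2H2 pattern, and the `recognition_residues` helper is hypothetical. The sequence is the 1AAY (Zif268) template from this poster.

```python
import re

# 1AAY (Zif268) sequence as given in the poster's 1AAY/1G2D alignment.
ZIF268 = ("MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFS"
          "RSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLRQKD")

# Simplified C2H2 finger; group 1 captures the first conserved histidine.
FINGER = re.compile(r"C.{1,5}C.{12}(H).{3,6}H")

# Assumed helix numbering: the first His is helix position 7 and there is no
# position 0, so -1, 3 and 6 lie 7, 4 and 1 residues before that His.
OFFSET_FROM_HIS = {-1: -7, 3: -4, 6: -1}


def recognition_residues(seq: str) -> list[dict[int, str]]:
    """Return the residues at helix positions -1, 3 and 6 for each finger."""
    fingers = []
    for m in FINGER.finditer(seq):
        his = m.start(1)
        fingers.append({pos: seq[his + off] for pos, off in OFFSET_FROM_HIS.items()})
    return fingers


for i, res in enumerate(recognition_residues(ZIF268), start=1):
    print(f"finger {i}: -1={res[-1]} 3={res[3]} 6={res[6]}")
# finger 1: -1=R 3=E 6=R
# finger 2: -1=R 3=H 6=T
# finger 3: -1=R 3=E 6=R
```

Anchoring on the conserved histidine avoids having to know where each helix begins. The trade-off is that the approach depends on the helix numbering assumed above.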
The Consensus Server, developed in part by Dr. Camacho, was used to perform the homology modeling (http://structure.bu.edu/cgi-bin/consensus/consensus.cgi).
Because the Consensus method relies on threading algorithms, amino acid side chains can only be predicted to the extent that they are matched by the corresponding amino acid in the template. For instance, when predicting the structure of a lysine using a serine as the template, the method can place only the Cα and Cβ atoms. This leaves the positions of the remaining four side-chain heavy atoms (three carbons and the terminal nitrogen) undetermined. The side chains were then completed and minimized with CHARMM, and polar hydrogen atoms were added to the structure.
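The lysine-on-serine example can be reproduced by comparing atom names. This minimal sketch assumes a simple rule: an atom's coordinates can be copied from the template only when an atom with the same PDB name exists there. The Consensus server's actual atom-transfer logic may differ, and `split_atoms` is a hypothetical helper.

```python
# Heavy-atom names (PDB convention) for a few residue types.
BACKBONE = ["N", "CA", "C", "O"]
SIDE_CHAIN = {
    "GLY": [],
    "ALA": ["CB"],
    "SER": ["CB", "OG"],
    "GLN": ["CB", "CG", "CD", "OE1", "NE2"],
    "LYS": ["CB", "CG", "CD", "CE", "NZ"],
    "ARG": ["CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"],
}


def split_atoms(target: str, template: str) -> tuple[list[str], list[str]]:
    """Split the target residue's heavy atoms into those whose names also
    exist in the template residue (coordinates can be copied) and those
    that must be rebuilt afterwards, e.g. with CHARMM."""
    template_atoms = set(BACKBONE + SIDE_CHAIN[template])
    target_atoms = BACKBONE + SIDE_CHAIN[target]
    copied = [a for a in target_atoms if a in template_atoms]
    missing = [a for a in target_atoms if a not in template_atoms]
    return copied, missing


copied, missing = split_atoms("LYS", "SER")
print(copied)   # ['N', 'CA', 'C', 'O', 'CB']
print(missing)  # ['CG', 'CD', 'CE', 'NZ']
```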
Side Chain Relaxation via Molecular Dynamics Simulations
We relaxed the side chains of each domain independently to find their most favorable conformations within the structure.
The simulations were performed with a constrained backbone to preserve the structure predicted in the previous step, so only side-chain atoms were allowed to move. Each domain was simulated for a total of 4.2 ns with a time step of 2 fs, of which 200 ps was used for system equilibration.
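These settings imply a simple step count, which the arithmetic below works out. Treating the 4.0 ns that remain after equilibration as the analyzed window is our reading. It is consistent with the 0 to 4 ns time axis of the RMSD plots, but the poster does not state it explicitly.

```python
# Simulation lengths from the protocol above, converted to femtoseconds.
FS_PER_PS = 1_000
total_fs = 4_200 * FS_PER_PS          # 4.2 ns per domain
equilibration_fs = 200 * FS_PER_PS    # 200 ps equilibration
timestep_fs = 2                       # 2 fs integration step

total_steps = total_fs // timestep_fs
equilibration_steps = equilibration_fs // timestep_fs
production_steps = total_steps - equilibration_steps
production_ns = production_steps * timestep_fs / 1_000_000

print(total_steps, equilibration_steps, production_steps, production_ns)
# 2100000 100000 2000000 4.0
```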
The simulation did not include ions in the solution or the zinc ions that stabilize the domain. Furthermore, the simulation was conducted without DNA present, in order to find the structure of the side chains in the unbound state.
We were particularly interested in the conformations of the three residues involved in DNA recognition.
RMSD Analysis and Clustering
For the side chains of each domain, RMSD was calculated between the snapshots from the MD simulations and the crystal structure. The structures were first superimposed on their Cα atoms to minimize the RMSD. For residues with symmetric side chains (e.g. arginine), the RMSD was calculated for each equivalent atom assignment and the lower value was kept, further minimizing the RMSD.
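The procedure can be sketched as follows. The sketch assumes Kabsch superposition on the Cα atoms, since the poster does not name its superposition method. It treats symmetry as a single swappable pair of atoms, such as the arginine NH1/NH2 nitrogens. It uses NumPy; the function names and toy coordinates are hypothetical.

```python
import numpy as np


def kabsch(P: np.ndarray, Q: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rotation R and translation t minimizing the RMSD of P @ R.T + t to Q."""
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q0 - p0 @ R.T


def rmsd(A: np.ndarray, B: np.ndarray) -> float:
    return float(np.sqrt(((A - B) ** 2).sum(axis=1).mean()))


def side_chain_rmsd(model_ca, ref_ca, model_sc, ref_sc, swap=None):
    """Superimpose on CA atoms, then score side-chain atoms. `swap` is a pair
    of symmetry-equivalent atom indices; the lower RMSD of both labelings wins."""
    R, t = kabsch(model_ca, ref_ca)
    moved = model_sc @ R.T + t
    best = rmsd(moved, ref_sc)
    if swap is not None:
        i, j = swap
        alt = moved.copy()
        alt[[i, j]] = alt[[j, i]]
        best = min(best, rmsd(alt, ref_sc))
    return best


# Toy arginine: CA trace and side chain (CB, CG, CD, NE, CZ, NH1, NH2).
ref_ca = np.array([[0, 0, 0], [3.8, 0, 0], [3.8, 3.8, 0], [0, 3.8, 0], [0, 0, 3.8]], float)
ref_sc = np.array([[1, 1, 1], [2, 1, 1], [2, 2, 1], [3, 2, 1],
                   [3, 3, 1], [4, 3, 1], [3, 4, 1]], float)
theta = np.radians(30)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta), np.cos(theta), 0],
               [0, 0, 1]])
shift = np.array([1.0, 2.0, 3.0])
model_ca = ref_ca @ Rz.T + shift
model_sc = ref_sc @ Rz.T + shift
model_sc[[5, 6]] = model_sc[[6, 5]]           # same geometry, NH1/NH2 relabeled
print(round(side_chain_rmsd(model_ca, ref_ca, model_sc, ref_sc), 3))               # 0.756
print(round(side_chain_rmsd(model_ca, ref_ca, model_sc, ref_sc, swap=(5, 6)), 3))  # 0.0
```

The toy model is an exact rigid-body copy of the reference with the two terminal nitrogens relabeled. The plain RMSD is therefore nonzero purely because of atom naming, and the symmetric comparison removes that artifact.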
A neighbor clustering algorithm was also applied to the snapshots produced by the MD simulation, analyzing each side chain independently of the others. The RMSD was calculated for all pairs of snapshots, and snapshots were clustered together if their RMSD was within a 1.0 Å threshold. Clusters were ranked by the number of snapshots they contained.
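The poster does not spell out the exact clustering rule, so the sketch below assumes one common greedy neighbor-clustering scheme. The snapshot with the most neighbors within the cutoff becomes a cluster center. That snapshot and its neighbors are then removed, and the process repeats. With this scheme, clusters come out ranked by size. The function takes a precomputed pairwise RMSD matrix; its name and the toy data are hypothetical.

```python
def neighbor_clusters(dist: list[list[float]], cutoff: float = 1.0) -> list[tuple[int, list[int]]]:
    """Greedy neighbor clustering on a symmetric pairwise RMSD matrix (Å).

    Returns (center, members) pairs, largest cluster first."""
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        # Neighbors of each unclustered snapshot, counting the snapshot itself.
        neighbors = {i: {j for j in remaining if dist[i][j] <= cutoff} for i in remaining}
        # Most-connected snapshot becomes the center; ties go to the lowest index.
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = neighbors[center]
        clusters.append((center, sorted(members)))
        remaining -= members
    return clusters


# Toy example: snapshots placed on a line, so the "RMSD" is a plain difference.
positions = [0.0, 0.3, 0.5, 0.7, 3.0, 3.2, 6.0]
dist = [[abs(a - b) for b in positions] for a in positions]
print(neighbor_clusters(dist))
# [(0, [0, 1, 2, 3]), (4, [4, 5]), (6, [6])]
```

A neighbor set can only shrink as snapshots are removed, so each new cluster is no larger than the previous one. This ordering provides the size ranking directly.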
We chose two proteins with known structures for homology modeling. This allows the predicted structure to be compared against the known structure to determine the accuracy of the prediction.
A Zif268 variant (PDB ID: 1G2D) was selected as the target of the homology modeling, with Zif268 (PDB ID: 1AAY) as the template.
The variant recognizes the DNA sequence 5'-GCTATAAAA-3'.
The sequences are 83% similar, with 81% sequence identity.
The sequences diverge in the α-helices of the zinc finger domains, conferring a different DNA recognition sequence. The α-helical regions are highlighted in red.
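The reported identity and similarity can be checked against the 1AAY/1G2D alignment. This sketch assumes a gapless alignment, which holds for this pair. It takes similarity from the "+" marks in the BLAST-style match line rather than recomputing it from a substitution matrix. The helper name is hypothetical.

```python
# Aligned 1AAY (template) and 1G2D (target) sequences and the match line,
# transcribed from the poster's alignment. In the match line a letter marks
# an identical residue, "+" a similar one, and a space neither.
TEMPLATE_1AAY = ("MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFS"
                 "RSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLRQKD")
TARGET_1G2D = ("MERPYACPVESCDRRFSQKTNLDTHIRIHTGQKPFQCRICMRNFS"
               "QHTGLNQHIRTHTGEKPFACDICGRKFATLHTRDRHTKIHLRQKD")
MATCH = ("MERPYACPVESCDRRFS+   L  HIRIHTGQKPFQCRICMRNFS"
         "+   L  HIRTHTGEKPFACDICGRKFA    R RHTKIHLRQKD")


def identity_and_similarity(a: str, b: str, match: str) -> tuple[float, float]:
    """Fractions of aligned positions that are identical, and that are
    identical or similar, for a gapless pairwise alignment."""
    if not len(a) == len(b) == len(match):
        raise ValueError("aligned strings must have equal length")
    identical = sum(x == y for x, y in zip(a, b))
    similar = match.count("+")
    return identical / len(a), (identical + similar) / len(a)


identity, similarity = identity_and_similarity(TEMPLATE_1AAY, TARGET_1G2D, MATCH)
print(f"identity {identity:.1%}, similarity {similarity:.1%}")
# identity 81.1%, similarity 83.3%
```

Over the 90 aligned residues there are 73 identical positions and 2 similar ones. These round to the 81% identity and 83% similarity quoted above.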
Abstract
The EGR family of transcription factors is activated in cells exposed to growth factors. The overall structure of the family is highly conserved, while the amino acid sequence can be quite diverse, allowing for a wide array of DNA recognition sequences. Using homology modeling, we were able to reproduce the structure of the DNA-binding domain of EGR proteins, which consists of three zinc fingers. Molecular dynamics simulations further showed that most side chains within the domain reach an equilibrium state. Furthermore, the three recognition residues in each zinc finger adopt side-chain conformations that are favorable for DNA recognition. These studies suggest a possible mechanism for zinc finger recognition of DNA.
[Plots: RMSD vs. Time for Position 3 in Helix (Figure 2) and RMSD vs. Time for Residue 6 in Helix (Figure 3); RMSD (Å) plotted against Time (ns), 0 to 4 ns]
Figure 2. RMSD Analysis for Position 3 in Helix. Relative to the crystal structure, the RMSD of the glutamine at helix position 3 of domain 1 fluctuates steadily between 0.9 and 1.5 Å. This behavior is typical of most residues at equilibrium.
Figure 3. RMSD Analysis for Position 6 in Helix. The RMSD of the glutamine at helix position 6 of domain 2 shows two distinct states: one close to the crystal structure and one considerably farther from it. This behavior is representative of the few residues that exist in two different states.
Results
Through RMSD and cluster analysis, we determined that most residues reach an equilibrium state that is highly similar to the crystal structure.
Cluster analysis revealed that the cluster with the most neighbors is generally highly similar to the crystal structure.
A few residues appear to fluctuate between two states during the simulation, as seen in Figure 3. We believe this fluctuation may be related to the mechanism by which the protein recognizes DNA.
Figure 4. Stereo View of the Superimposition of a Crystallized and Modeled Zinc Finger for 1G2D. The backbone is shown in grey and is identical in the two structures. The crystallized side chains are shown in red and the modeled side chains in green. The three residues represented as sticks are those used in DNA recognition. Most side chains adopt a conformation consistent with the crystal structure. Image created with PyMOL.
Conclusions and Future Applications
This method produces accurate homology models of zinc finger proteins, specifically zinc finger proteins in the EGR family. The modeled side chains adopt conformations similar to those in the crystal structure even in the unbound state. This is particularly important for the key residues involved in DNA recognition.
Because the modeled domains adopt a favorable conformation, homology-modeled zinc fingers can be used in docking experiments. This work is currently under way using a DNA-protein docking algorithm developed in the lab.
Future applications include modeling EGR proteins whose structures have not been determined, to test whether the models recognize the correct DNA sequence.
Acknowledgements
Dr. Carlos J. Camacho, Advisor
BBSI – Department of Computational Biology, University of Pittsburgh
NIH - NSF
References
J.C. Prasad, S.R. Comeau, S. Vajda, and C.J. Camacho. Consensus alignment for reliable framework prediction in homology modeling. Bioinformatics 2003 19: 1682-1691.
G. Paillard, C. Deremble, and R. Lavery. Looking into DNA recognition: zinc finger binding specificity. Nucleic Acids Research 2004 32: 6673-6682.
A. Bateman, L. Coin, R. Durbin, R.D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E.L.L. Sonnhammer, D.J. Studholme, C. Yeats, and S.R. Eddy. The Pfam protein families database. Nucleic Acids Research 2004 32 (Database issue): D138-D141.