Download Homology Modeling Zinc Fingers – Introduction zf

Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein–DNA Interactions

Jamie Duke (1, 2)
(1) BBSI – Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA 15261
(2) Department of Biological Sciences, Rochester Institute of Technology, Rochester, NY 14623

Zinc Fingers – Introduction

Nucleic acid binding domain
Classic C2H2 conformation, coordinating a zinc ion
24-residue β–β–α motif
Conserved pattern: x-C-x(1-5)-C-x(12)-H-x(3-6)-H
Conserved aromatic ring approximately 4 residues away from the second cysteine
Multiple domains are used to recognize specific DNA sequences
The most commonly studied family is the Early Growth Response (EGR) family
    Contains 2–3 zinc fingers
    Other names: Zif268, Nerve Growth Factor Induced Protein

Figure 1. Zinc Finger, C2H2 type Domain Profile. The conserved cysteine residues are located within the β-strands of the domain and are shown in pink. The conserved histidine residues are located within the α-helix and are shown in blue. Together the four residues coordinate a zinc ion that is required for the stability of the domain's structure. Additionally, a conserved aromatic residue, shown in green, also confers structural stability to the domain. Referenced from Pfam Acc. No. PF00096 (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00096).

zf-C2H2 Family Diversity

Currently, there are 32,874 identified zinc fingers of the type zf-C2H2 (Pfam 17.0).
There are 5264 proteins with identified zinc fingers, which are represented in 235 different architectures.

Distribution:
    Eukaryota: 5233 proteins
        Vertebrata: 3435 proteins
            Amphibians: 218 proteins
            Humans: 1390 proteins
            Mice: 1085 proteins
        Fungi: 395 proteins
    Viruses: 19 proteins
    Archaea: 12 proteins

Snapshot of the multiple sequence alignment for the domain. Conserved residues are marked with a * under the column.

EGR1_HUMAN/396-418     FACD...ICG...RKFARS...DERKRHTKI...H
ZFP60_MOUSE/484-506    FECK...ECG...KAFHFS...SQLNNHKTS...H
ACE2_YEAST/633-657     YSCDF.PGCT...KAFVRN...HDLIRHKIS...H
SUHW_DROAN/349-373     YACK...ICG...KDFTRS...YHLKRHQKYS.SC
ZNF76_HUMAN/285-309    YTCPE.PHCG...RGFTSA...TNYKNHVRI...H
TTKB_DROME/538-561     YPCP...FCF...KEFTRK...DNMTAHVKI..IH
XFIN_XENLA/1044-1066   YKCG...LCE...RSFVEK...SALSRHQRV...H
Q17793_CAEEL/209-234   YQCQ...LCK...KSISRHGQYANLLNHLSR...H
TF3A_BUFAM/161-187     YPCRKDSTCP...FVGKTW...SDYMKHAAE..LH
ZN592_HUMAN/1043-1069  YTCG...YCTEDSPSFPRP...SLLESHISL..MH
                         *     *                  *      *

There are 42 structures of zf-C2H2 proteins in the Protein Data Bank, of which 11 were applicable to our interests.
    20 / 42 structures were solved by X-ray crystallography
    22 / 42 structures were solved by NMR
    At least 15 / 42 structures were duplicates
We only considered structures that were solved by X-ray crystallography and had either 2 or 3 zinc fingers, as they would belong to the EGR family.

Homology Modeling

Alignment of the template (1AAY) and the target (1G2D). The middle line shows identical residues, with + marking a similar residue.

1AAY  MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFS
      MERPYACPVESCDRRFS+   L  HIRIHTGQKPFQCRICMRNFS
1G2D  MERPYACPVESCDRRFSQKTNLDTHIRIHTGQKPFQCRICMRNFS

1AAY  RSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLRQKD
      +   L  HIRTHTGEKPFACDICGRKFA    R RHTKIHLRQKD
1G2D  QHTGLNQHIRTHTGEKPFACDICGRKFATLHTRDRHTKIHLRQKD

We were primarily concerned with the 3rd and 6th residues in the α-helix, as well as the residue directly before the α-helix. These are the residues used in DNA recognition.

The Consensus Server, developed in part by Dr. Camacho, was used to perform the homology modeling (http://structure.bu.edu/cgi-bin/consensus/consensus.cgi).

Because the Consensus method relies on threading algorithms, the side chain of each amino acid can only be predicted as far as the corresponding template residue extends. For instance, when modeling a lysine on a serine template, the method can place only the backbone and Cβ atoms, leaving the positions of the four remaining side-chain heavy atoms (Cγ, Cδ, Cε and Nζ) undetermined. The side chains were completed and minimized with CHARMM, which was also used to add polar hydrogen atoms to the structure.

Side Chain Relaxation via Molecular Dynamics Simulations

We chose to relax the side chains of each domain independently to find their most favorable states within the structure. The simulations were performed with a constrained backbone to conserve the structure predicted in the previous step, so the only fluctuations allowed were those of the side-chain atoms. Each domain was simulated for a total of 4.2 ns, of which 200 ps was used for system equilibration, with a time step of 2 fs. The simulation included neither ions in solution nor the zinc ions that confer stability to the domain. Furthermore, the simulation was conducted without DNA present, in order to find the structure of the side chains in the unbound state. We were particularly interested in the states of the three residues involved in DNA recognition.
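The conserved pattern given in the introduction, x-C-x(1-5)-C-x(12)-H-x(3-6)-H, can be read as a regular expression and run against the Zif268 (1AAY) sequence from the alignment. The following is a minimal sketch, not part of the original analysis. The function names are hypothetical, and the helix numbering is an assumption: it takes the seven residues immediately before the first zinc-coordinating histidine as helix positions -1 and 1 through 6, the convention commonly used for Zif268.

```python
import re

# Zif268 (PDB id 1AAY) sequence, copied from the alignment in the text.
ZIF268 = (
    "MERPYACPVESCDRRFSRSDELTRHIRIHTGQKPFQCRICMRNFS"
    "RSDHLTTHIRTHTGEKPFACDICGRKFARSDERKRHTKIHLRQKD"
)

# C-x(1-5)-C-x(12)-H-x(3-6)-H; the 12-residue spacer is captured as group 1.
C2H2 = re.compile(r"C.{1,5}C(.{12})H.{3,6}H")


def find_fingers(seq):
    """Return (start_index, matched_text, helix) for each non-overlapping match."""
    fingers = []
    for match in C2H2.finditer(seq):
        spacer = match.group(1)
        # Assumption: the last 7 spacer residues are helix positions -1, 1..6.
        helix = spacer[-7:]
        fingers.append((match.start(), match.group(0), helix))
    return fingers


def recognition_residues(helix):
    """Residues at helix positions -1, 3 and 6 (assumed numbering)."""
    return helix[0], helix[3], helix[6]


fingers = find_fingers(ZIF268)
for start, text, helix in fingers:
    # Report 1-based start positions, as is usual for protein sequences.
    print(start + 1, text, helix, recognition_residues(helix))
```

On this sequence the pattern matches three times, once per zinc finger, with helices RSDELTR, RSDHLTT and RSDERKR. The same function can be applied to the 1G2D sequence to compare the recognition residues of the target and the template.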
RMSD Analysis and Clustering

RMSD analysis was performed between the results of the MD simulations and the crystal structure for the side chains of each domain. The structures were first aligned on their Cα atoms so that the calculated RMSD was minimized. Where a side chain has symmetry-equivalent atoms (e.g. arginine residues), the symmetric arrangement was also considered, further minimizing the RMSD.

A neighbor clustering algorithm was also applied to analyze the snapshots produced by the MD simulation. Each side chain was analyzed independently of the other side chains. The RMSD was calculated for all pairs of snapshots, and two snapshots were clustered together if their RMSD was within a 1.0 Å threshold. Clusters were ranked by the number of snapshots they contained.

Homology Modeling – Target and Template

We chose two proteins with known structures to perform homology modeling. This allows us to compare the predicted structure against the known structure to determine the accuracy of the prediction.
A Zif268 variant (PDB id: 1G2D) was selected as the target of the homology modeling, with Zif268 (PDB id: 1AAY) as the template.
    The variant recognizes the DNA sequence 5'-GCTATAAAA-3'
The sequences are 83% similar, with 81% sequence identity.
The sequences diverge in the α-helices of the zinc finger domains, conferring a different DNA recognition sequence. In the sequence alignment, the α-helical regions are highlighted in red.

Abstract

The EGR family of transcription factors is activated in cells exposed to growth factors. The overall structure of the family is highly conserved, while the amino acid sequence can be quite diverse, allowing for a wide array of DNA recognition sequences. Through homology modeling we have found that we are able to reproduce the structure of the DNA binding domain of EGR proteins, which consists of three zinc fingers. We have also determined through molecular dynamics simulations that most side chains within the domain reach an equilibrium state. Furthermore, the three recognition residues in each zinc finger are found to have side chain conformations that are optimal for DNA recognition. These studies help to show a possible mechanism for zinc finger recognition of DNA.

Results

[Plots: "RMSD vs. Time for Position 3 in Helix" and "RMSD vs. Time for Residue 6 in Helix". Axes: RMSD (Å) against Time (ns), 0 to 4 ns.]

Figure 2. RMSD Analysis for Position 3 in Helix. The RMSD for the glutamine located at position 3, domain 1 in the α-helix can be seen to fluctuate consistently between 0.9 and 1.5 Å when compared to the crystal structure. This graph represents the fluctuation that occurs for most of the residues in equilibrium.

Figure 3. RMSD Analysis for Position 6 in Helix. The RMSD for the glutamine located at position 6, domain 2 in the α-helix can be seen to exist in two different states, one that is closer to the crystal structure and one that is considerably farther away from it. This graph represents the few residues that exist in two different states.

Through RMSD and cluster analysis, we have determined that most of the residues reach an equilibrium point that is highly similar to the crystal structure. Cluster analysis revealed that the cluster with the largest number of neighbors is in general highly similar to the crystal structure. A few residues seem to fluctuate between two states during the simulation, as can be seen in Figure 3. We believe that this fluctuation may be correlated with the mechanism by which the protein recognizes the DNA.

Figure 4. Stereo View of the Superimposition of a Crystallized and Modeled Zinc Finger for 1G2D. The backbone of the structures is shown in grey, and remains identical for the two structures. The crystallized side chains are in red, and the modeled side chains are in green. The three residues that are represented as sticks are the residues used in DNA recognition. Most side chains adopt a conformation that is consistent with the crystallized structure. Image was created with PyMOL.

Conclusions and Future Applications

Through this method we are able to effectively determine a homology model of zinc finger proteins, more specifically zinc finger proteins in the EGR family. The modeled side chains are found to be in a state that is similar to the crystal structure, even in an unbound state, which is particularly important for the key residues involved in DNA recognition. Since the modeled domains are in a desirable conformation, it is possible to perform docking experiments with homology modeled zinc fingers, which is currently being done using a DNA-protein docking algorithm developed in the lab. Future applications include modeling EGR proteins with an undetermined structure to see if the model is able to recognize the proper DNA sequence.

Acknowledgements

Dr. Carlos J. Camacho, Advisor
BBSI – Department of Computational Biology, University of Pittsburgh
NIH – NSF

References

Prasad J.C., Comeau S.R., Vajda S., Camacho C.J. Consensus alignment for reliable framework prediction in homology modeling. Bioinformatics 2003 19: 1682-1691.
Paillard G., Deremble C., Lavery R. Looking into DNA Recognition: Zinc Finger Binding Specificity. Nucleic Acids Research 2004 32: 6673-6682.
Bateman A., Coin L., Durbin R., Finn R.D., Hollich V., Griffiths-Jones S., Khanna A., Marshall M., Moxon S., Sonnhammer E.L.L., Studholme D.J., Yeats C., Eddy S.R. The Pfam Protein Families Database. Nucleic Acids Research: Database Issue 2004 32: D138-D141.
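The neighbor clustering described under RMSD Analysis and Clustering can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the poster. The poster does not specify the exact algorithm, so a common greedy variant is assumed here: the snapshot with the most neighbors within the threshold becomes a cluster, its members are removed, and the procedure repeats. The snapshots are assumed to be already superimposed (the backbone was constrained during the simulation), so no refitting is done, and the coordinates below are toy values rather than simulation data.

```python
import math


def rmsd(a, b):
    """RMSD between two equally sized lists of (x, y, z) tuples, without refitting."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    total = sum((p[i] - q[i]) ** 2 for p, q in zip(a, b) for i in range(3))
    return math.sqrt(total / len(a))


def neighbor_clusters(snapshots, threshold=1.0):
    """Greedy neighbor clustering (assumed variant).

    Returns clusters as sorted lists of snapshot indices, largest cluster first.
    """
    n = len(snapshots)
    # Each snapshot counts as its own neighbor.
    neighbors = {i: {i} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rmsd(snapshots[i], snapshots[j]) <= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)

    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Pick the unassigned snapshot with the most unassigned neighbors;
        # ties go to the lowest index because the candidates are sorted.
        center = max(sorted(unassigned),
                     key=lambda i: len(neighbors[i] & unassigned))
        members = sorted(neighbors[center] & unassigned)
        clusters.append(members)
        unassigned -= set(members)
    return clusters


# Toy two-atom "side chains" (hypothetical coordinates in Å): three snapshots
# near one conformation and two near another.
snapshots_demo = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.4, 0.0, 0.0), (1.4, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(5.5, 0.0, 0.0), (6.5, 0.0, 0.0)],
]
clusters_demo = neighbor_clusters(snapshots_demo, threshold=1.0)
print(clusters_demo)
```

With the toy data the first three snapshots form the largest cluster and the last two form a second one. A residue that occupies two states, like the one in Figure 3, would be expected to show two well-populated clusters in this kind of analysis.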
