Comparative Protein Modeling
Jason Wiscarson ([email protected]), Lloyd Spaine ([email protected])

Introduction
Comparative modeling, also called homology modeling, is a computational method used to predict the three-dimensional (3-D) structure of a protein whose structure is unknown. If the target sequence shares sequence similarity with proteins of known 3-D structure, those proteins may serve as templates for predicting the unknown structure. The term "homology" refers to the evolutionary relationship between two or more proteins that share a common ancestor in an evolutionary tree, regardless of their sequence similarity. Proteins from similar families often have similar functions, yet there are many instances in which proteins have similar structures but different functions. The process used to construct 3-D models of proteins, shown in Figure 1, is therefore paramount.

[Figure 1 flow chart: Find known sequences and 3-D structures related to the target protein → Align the target and template amino acid residues → Select templates and adjust/improve the alignments → Construct Model → Refine Model → Evaluate Model → Final Model]
Figure 1. Flow chart showing the construction of comparative protein models. The solid lines represent comparative modeling steps, and the dotted lines represent parameters (template, alignment, construction environment, or refinement method) that can improve the quality of the protein model.

Finding Related Sequences and Structures
In comparative protein modeling, several databases and tools are used to find genomic, amino acid, and protein data.
The Expert Protein Analysis System (ExPASy) is the starting point for searching for proteins and their related sequences.
Swiss-Prot contains data that has been refined by removing unnecessary information, and TrEMBL receives and stores the initial genomic data.
PROSITE characterizes proteins by tertiary structure and key amino acid residues, based on biologically significant patterns.
ENZYME provides an enzyme's recommended name, alternative names, catalytic activity, cofactors, associated human genetic diseases, and cross-references.
SWISS-MODEL holds comparative models of proteins that do not have a known 3-D structure.
The Basic Local Alignment Search Tool (BLAST) uses a protein sequence to search for and analyze sequences of interest; it locates similar protein sequences and reports them as sequence alignments.
The Protein Data Bank (PDB) is a repository for experimentally determined protein 3-D structures.
The Sequence Alignment and Modeling System with Hidden Markov Models (SAM)-T02 aligns the target sequence to all templates in the following steps:
1. Find sequences similar to the target sequence.
2. Predict the secondary structure.
3. Find probable templates for threading.
4. Align the target with the templates.
5. Construct a fragment library for the target.
6. Build a 3-D model of the target.
Threading, applied to different proteins that have similar structures:
1. Creates pseudo-protein models based on solved proteins.
2. Calculates an energy value for each pseudo-protein model.
3. Ranks the alignments based on that energy value.

Sequence Alignment
The amino acid residues of the target protein are aligned on the basis of evolutionary history. The types of alignment are:
a) Global alignment, which aligns the sequences over their entire length, including regions that lack similarity.
b) Local alignment, which first finds regions with significant similarity and then aligns around those optimally aligned residues.
To prepare sequences, the Sequence to Coordinates (S2C) database is used to examine differences that originate from mutagenesis studies. Alignment programs differ in the methods they use, but they all score or evaluate the final alignment using gap penalties, similarity matrices, and alignment scores.
Similarity matrices describe the probability of a specific amino acid residue mutating to a different residue type. Common similarity matrices include:
1. Point Accepted Mutation per 100 amino acid residues (PAM) matrices are based on the probability of one amino acid residue mutating to another.
2. BLOcks SUbstitution Matrix (BLOSUM) matrices are similar to PAM but are derived from a more diverse set of sequences.
3. Gonnet similarity matrices are built by indexing and reorganizing the amino acid sequences with a tree structure on a small cluster of computers.
Clustal is an alignment program that quickly aligns large sequences of varying similarity. Sequences are progressively aligned based on the branching order in the phylogenetic tree.
Tree-Based Consistency Objective Function for Alignment Evaluation (T-Coffee) is a method designed to remedy a weakness of progressive-alignment (heuristic) methods: errors in the first alignment cannot be corrected as other sequences are added to the alignment. Progressive alignment suffers from this greediness, that is, its inability to correct errors (the addition or extension of a gap).
The Divide-and-Conquer Alignment (DCA) method aligns sequences simultaneously, using the simultaneous multiple sequence alignment (MSA) methodology.

Selecting Templates and Improving Alignments
The first step is to improve the alignment and select the template. This is where the sequence of interest (target) and the other sequences and structures (templates) are aligned. Afterwards, the best templates are chosen based on evolutionary distance, as determined by a phylogenetic tree.
Selecting Templates: a template structure for a protein model is selected by considering its R-factor (residual index), a value that indicates how well the structure matches the experimental electron density maps.
Improving Sequence Alignment With Primary and Secondary Structure Analysis: this analysis is used to reveal regions rich in proline, glutamic acid, serine, and threonine (PEST regions); locate sequence repeats; predict the percentage of buried versus accessible residues; and provide information about the protein's isoelectric point.
Pattern and Motif-Based Secondary Structure Prediction: amino acid (AA) sequence → 3-D structure. Well-known pattern and motif-based secondary structure prediction methods include PSIPRED, GenTHREADER, PREDATOR, PROF, MEMSAT, and PHD.
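
When candidate alignments are being compared and improved, the scoring ingredients named under Sequence Alignment (a similarity matrix, gap penalties, and a resulting alignment score) can be illustrated with a minimal sketch that scores an already-aligned pair of sequences. The residue groups, similarity values, and penalty values below are toy assumptions chosen for readability; they are not PAM, BLOSUM, or Gonnet values, and the function names are hypothetical rather than part of any program named in this handout.

```python
# Toy residue groups, for illustration only (NOT PAM, BLOSUM, or Gonnet data).
TOY_GROUPS = ("AVLIM", "FWY", "ST", "DE", "KRH", "NQ")


def toy_similarity(x, y):
    """Return a toy similarity score for two residues (one-letter codes)."""
    if x == y:
        return 4          # identical residues
    for group in TOY_GROUPS:
        if x in group and y in group:
            return 1      # residues from the same toy group
    return -2             # unrelated residues


def score_alignment(seq_a, seq_b, similarity=toy_similarity,
                    gap_open=-10, gap_extend=-1):
    """Score two aligned sequences of equal length; '-' marks a gap.

    The first column of a gap costs gap_open, and each further column
    of the same gap costs gap_extend.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    in_gap_a = in_gap_b = False   # is a gap currently open in seq_a / seq_b?
    for x, y in zip(seq_a, seq_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-":
            score += gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            score += gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            score += similarity(x, y)
            in_gap_a = in_gap_b = False
    return score


# Two candidate alignments of the same pair of sequences.
print(score_alignment("MKT--AYIA", "MRTGGAYLA"))  # one gap of length 2
print(score_alignment("MK-T-AYIA", "MRTGGAYLA"))  # two separate gaps
```

Under these toy values the first alignment scores 11 and the second scores -4, because opening two separate gaps is penalized more heavily than extending a single gap. Real alignment programs apply the same idea with a full similarity matrix and tuned penalties.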

Constructing Protein Models

Protein Model Refinement
Side-Chains with Rotamer Library (SCWRL) determines the most likely side-chain conformations by:
1) Reading the initial structure and determining the possible low-energy side-chain conformations (rotamers).
2) Defining disulfide bridges and performing a dead-end elimination to discard unfavorable rotamers.
3) Constructing a residue graph, determining the rotamer clusters, and outputting the final structure.
Molecular Mechanics (MM) is a method that removes repulsive contacts between side chains by allowing the side chains to relax to low-energy rotamers.
Molecular Dynamics (MD) simulation involves:
1. Warm-up, equilibration, and cool-down.
2. Sampling the trajectory during a "production" run and analyzing the results.
Molecular Dynamics with Simulated Annealing (MDSA) is an optimization method that heats a system, samples many energy states, and then slowly cools the system so that low-energy structures are found.

Evaluating Protein Models
Several methods exist to construct models and to check them for imperfections, including:
Satisfaction of Spatial Restraints (SSR) constructs a 3-D protein model using spatial restraints based on distances, bond angles, dihedral angles, dihedral pairs, etc.
PROCHECK performs statistical checks and indicates regions of a protein structure that might require modification because of nonoptimal stereochemistry.
Segment Match Modeling (SMM) constructs a protein model by:
1. Choosing a protein template.
2. Building a list of possible template matches.
3. Sorting the templates by best fit to the target's structure.
4. Using probabilities to select the "best segment" from a low pseudo-energy subset group.
5. Transferring the coordinates from the best segment's template protein.
Verify 3D scores 3-D models with a probability table and assesses the probability that each amino acid residue would occupy its specific position in the 3-D structure.
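
Several of the methods above work with dihedral angles: SSR lists them among its spatial restraints, and stereochemical checks such as those in PROCHECK examine the backbone dihedrals phi and psi. As a minimal sketch, and not code taken from any of the programs named here, a signed dihedral angle can be computed from four atom positions as shown below. The function name `dihedral_angle`, the helper names, and the example coordinates are illustrative assumptions.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3-D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product a x b of two 3-D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_angle(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    For the backbone angle phi of residue i, the four atoms are
    C(i-1), N(i), CA(i), C(i); for psi they are N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 must be distinct points")
    b1 = tuple(c / norm for c in b1)
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four made-up points whose end atoms are a quarter turn apart.
print(dihedral_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Computing phi and psi for every residue of a model and plotting one against the other gives a Ramachandran plot, which is one way to spot residues with unusual backbone geometry.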
Multiple Template Method (MTM) uses solved X-ray structures to build the target sequence's protein model.
3D-JIGSAW creates a homology model in the following steps:
1. Select and align templates, based on sequence.
2. Select template segments.
3. Create the backbone (framework, scaffold).
4. Add side chains, then refine and evaluate the target protein model.
ERRAT examines the nonbonded distances between C-C, C-N, C-O, N-N, N-O, and O-O atom pairs.
Protein Structure Analysis (ProSa) uses the potential of mean force, which is the change in potential energy of a system caused by the variation of a specific coordinate, to locate regions of the protein structure that may contain improper or unsuitable geometries.
Protein Volume Evaluation (PROVE) uses the computed volume of individual atoms as a means of evaluating the viability of a protein model.
Model Clustering Analysis uses NMRCLUST, NMRCORE, and OLDERADO, which are programs that aid in the superposition and clustering of protein structures.

Figure 2. Peptide bonds create rigid plates which rotate about phi and psi.
Figure 3. A Ramachandran plot for the tripeptide in Figure 2.

References
[1] Esposito, E. X.; Tobi, D.; Madura, J. D. "Comparative Protein Modeling." Reviews in Computational Chemistry, Volume 22, 2006, Wiley-VCH, John Wiley & Sons, Inc. (to be published).
[2] Ramachandran plot and alanine structure: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/AAA/AAA.html