Allele Mining: with respect to Comparative Protein Structure Modelling and Docking study
Sunil Kumar
Institute of Life Sciences, Bhubaneswar
E-mail: [email protected]

Allele Mining: an Introduction
• Enormous sequence information is available in public databases as a result of the sequencing of diverse crop genomes.
• It is important to use this genomic information to identify and isolate novel and superior alleles of agronomically important genes from crop gene pools, so that they can be deployed in the development of improved cultivars.
• Allele mining is a promising approach to dissect naturally occurring allelic variation at candidate genes controlling key agronomic traits, with potential applications in crop improvement programs.
• It helps in tracing the evolution of alleles, identifying new haplotypes and developing allele-specific markers for use in marker-assisted selection.

Allele Mining (cont.)
• Initial studies of allele mining focused only on the identification of SNPs/InDels in the coding sequences (exons) of the gene, since these variations were expected to affect the structure and/or function of the encoded protein.
• However, recent reports indicate that nucleotide changes in non-coding regions (the 5' UTR including the promoter, introns and the 3' UTR) also have a significant effect on transcript synthesis and accumulation, which in turn alters trait expression.

Information transfer pathway within the cell
DNA (…ATGCATGCATGCATGCATGC…) → RNA (…CGUACGUACGUACGU…) → decoding mechanism → protein sequence → protein structure → biological function

Proteins
Proteins are the building blocks of life. In a cell, 70% is water and 15%-20% is protein.
Examples:
• hormones: regulate metabolism
• structural: hair, wool, muscle, …
• antibodies: immune response
• enzymes: chemical reactions

Amino Acids
A protein is composed of a central backbone and a collection of (typically) 50-2000 amino acids. There are 20 different kinds of amino acids.

Name            3-letter code   1-letter code
Leucine         Leu             L
Alanine         Ala             A
Serine          Ser             S
Glycine         Gly             G
Valine          Val             V
Glutamic acid   Glu             E
Threonine       Thr             T

Amino Acids: Side chain
Each amino acid is identified by its side chain, which determines the properties of that amino acid.

Side Chain Properties
• Hydrophobic side chains stay inside the protein, while hydrophilic side chains stay close to water.
• Oppositely charged amino acids can form salt bridges.
• Polar amino acids can participate in hydrogen bonding.

Protein Folding
• Proteins must fold to function.
• Some diseases are caused by misfolding, e.g., mad cow disease.

Three Structure Levels
• Primary structure: sequence of amino acids, e.g., DRVYIHPF
• Secondary structure: local folding patterns, e.g., alpha-helix, beta-sheet, loop
• Tertiary structure: complete 3D fold
[Figure labels: Helix, Beta Sheet, Loop]

Beta Sheet Examples
[Figure: parallel beta sheet; anti-parallel beta sheet]

Helix Examples
[Figure]

Domain, Fold, Motif
• A protein chain can have several domains.
▫ A domain is a discrete portion of a protein that can fold independently and possess its own function.
• The overall shape of a domain is called a fold. There are only a few thousand possible folds.
• Sequence motif: highly conserved protein subsequence
• Structure motif: highly conserved substructure

Protein Data Bank
About 50,000 protein structures, solved using experimental techniques. ~800 are unique structural folds.
[Figure: same structural folds; different structural folds]

The Problem
• Protein functions are determined by 3D structures.
• ~50,000 protein structures are in the PDB (Protein Data Bank).
• Experimental determination of protein structures is time-consuming and expensive.
• Many protein sequences are available.
[Figure labels: sequence, protein structure, function, medicine]

Why Protein 3D Structures?
3D structures of proteins → better understanding of protein functions
“Three-dimensional protein structures are important in understanding the mechanisms of human genetic diseases, predicting the effect of non-synonymous single nucleotide polymorphisms and developing new personalized medicines” Xie and Bourne (2005) PLoS Comput. Biol. 1:e31

What is Homology Modeling?
An approach to predict a model of the three-dimensional structure of a given protein sequence (TARGET) based on an alignment to one or more known protein structures (TEMPLATES). The homology modeling method is based on the assumption that the structure of an unknown protein is similar to the known structures of reference proteins.

Why a Model?
A model is desirable when neither X-ray crystallography nor NMR spectroscopy can determine the structure of a protein in time or at all. The 3D structure of proteins can be determined by X-ray crystallography and NMR spectroscopy, but these experimental techniques are time-consuming and are not possible if a sufficient quantity and quality of the protein is not available. The built model provides a wealth of information on how the protein functions, down to the level of individual residue properties. This information can then be used for mutational studies or for drug design.

Protein Structure Determination
• High-resolution structure determination
▫ X-ray crystallography (~1 Å)
▫ Nuclear magnetic resonance (NMR) (~1-2.5 Å)
• Low-resolution structure determination
▫ Cryo-EM (electron microscopy) (~10-15 Å)

X-ray crystallography
• Most accurate.
• An extremely pure protein sample is needed.
• The protein sample must form crystals that are relatively large and without flaws. This is generally the biggest problem.
• Many proteins aren't amenable to crystallization at all (e.g., proteins that do their work inside a cell membrane).
• ~$100K per structure.

Nuclear Magnetic Resonance
• Fairly accurate.
• No need for crystals.
• Limited to small, soluble proteins only.
Steps in homology modelling
Input: target's sequence
1. Identification of structures that will form the template for modelling
2. Sequence alignment of the target with the template
3. Transfer of the coordinates of the structurally conserved regions (SCRs) from the template(s) to the target
4. Modelling the missing regions
5. Refinement and validation of the model
Output: target's structure

Template search
• Homology modeling is based on using similar structures, i.e. no similar structures = no model.
• 40% amino acid identity or higher is best; below that is not advisable, but examples of success do exist.
• Sequence similarity is needed across the whole sequence, not just in one part.

Searching Databases
[Figure: a query is searched against a database using BLAST or FASTA]

Key Step: Sequence alignment of the target with the basis structures
Good alignment → good model
• Sequence alignment is a basic technique in homology modeling.
• It is used to establish a one-to-one correspondence between the amino acids of the reference protein (template) and those of the unknown protein (target) in the structurally conserved regions.
• This correspondence is the basis for transferring coordinates from the reference to the model protein.
[Figure: a sample alignment of two DNA sequences, Sequence A and Sequence B, shown as (a) an un-gapped alignment and (b) a gapped alignment. The “|” indicates matching nucleotides.]

Sequence Alignment
• Global alignment: essential for comparative modeling.
• Local alignment: sufficient for functional domains.
N.B.: Global alignment is computationally more time-consuming than local alignment.

Sequence Homology vs. Sequence Similarity

Dotplot
A dotplot gives an overview of all possible alignments.
[Figure: dotplot of Sequence 1 against Sequence 2]

Dynamic Programming
Dynamic programming is a computational method used for aligning two protein or nucleotide sequences.
The method compares every pair of residues/nucleotides in the two sequences and generates an alignment. In the alignment, matches, mismatches and gaps in the two sequences are positioned in such a way that the number of matches between identical or similar residues is the maximum possible.
• Needleman and Wunsch algorithm: global alignment
• Smith and Waterman algorithm: local alignment

Needleman and Wunsch (global alignment)
\[
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) - d \\
F(i,\,j-1) - d
\end{cases}
\]
Here s(x_i, y_j) is the score for aligning residue x_i with residue y_j, and d is the gap penalty.
[Figure: cell F(i, j) is reached from F(i-1, j-1) by adding s(x_i, y_j), from F(i-1, j) by subtracting d, or from F(i, j-1) by subtracting d]

Steps
1. Initialization: the first row and first column are filled with multiples of the gap penalty.
2. Rest of the cells: each cell is filled with the maximum of the three candidate values.
3. Generation of the optimal path: through backtracking.
4. Generation of the optimal alignment: one for each optimal path (number of optimal paths = number of optimal alignments).

Scoring Scheme
Given an alignment between two sequences, we can compute its similarity by:
1) Rewarding a match
2) Penalizing a mismatch
3) Penalizing a gap
Match => +1
Mismatch => -1
Gap or Indel => -2

Smith and Waterman (local alignment)
Two differences:
1. The recurrence has an additional option of 0:
\[
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,\,j-1) + s(x_i, y_j) \\
F(i-1,\,j) - d \\
F(i,\,j-1) - d
\end{cases}
\]
2. An alignment can now end anywhere in the matrix.

Example:
Sequence 1: HEAGAWGHEE
Sequence 2: PAWHEAE
Scoring parameters: BLOSUM
Gap penalty: linear gap penalty of 8

Comparative Modelling Methods
Restraint-based methods: MODELLER (Sali and Blundell, 1993)

MODELLER
MODELLER is a computer program that models three-dimensional structures of proteins and their assemblies by satisfaction of spatial restraints. MODELLER is most frequently used for homology or comparative protein structure modeling. The user provides an alignment of a sequence to be modeled with known related structures, and MODELLER automatically calculates a model with all non-hydrogen atoms. A 3D model is obtained by optimization of a molecular probability density function (pdf).
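Returning to the dynamic-programming slides, the global and local recurrences can be sketched in Python as below. This is a minimal illustration that assumes the simple scoring scheme from the slides (match +1, mismatch -1, linear gap penalty 2) rather than a BLOSUM matrix. The function names `score_pair`, `needleman_wunsch` and `smith_waterman_score` are made up for this sketch, and the traceback returns only one of the possibly several optimal alignments.

```python
def score_pair(a, b, match=1, mismatch=-1):
    """s(x_i, y_j) for the simple match/mismatch scheme on the slides."""
    return match if a == b else mismatch


def needleman_wunsch(x, y, gap=2):
    """Global alignment: returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Step 1: first row and column hold multiples of the gap penalty.
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for j in range(1, m + 1):
        F[0][j] = -gap * j
    # Step 2: every other cell takes the best of the three moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score_pair(x[i - 1], y[j - 1]),
                F[i - 1][j] - gap,
                F[i][j - 1] - gap,
            )
    # Steps 3-4: backtrack from the bottom-right corner (one optimal path).
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score_pair(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


def smith_waterman_score(x, y, gap=2):
    """Local alignment score: same recurrence with a floor of 0,
    and the best score may sit anywhere in the matrix."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,
                F[i - 1][j - 1] + score_pair(x[i - 1], y[j - 1]),
                F[i - 1][j] - gap,
                F[i][j - 1] - gap,
            )
            best = max(best, F[i][j])
    return best


print(needleman_wunsch("ACGT", "AGT"))
print(smith_waterman_score("ACGT", "AGT"))
```

Swapping `score_pair` for a lookup into a substitution matrix and setting the gap penalty to 8 would bring the sketch closer to the HEAGAWGHEE / PAWHEAE example, but no matrix values are assumed here.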
Format for Modeller:
    INCLUDE
    SET ATOM_FILES_DIRECTORY = './:../'
    SET PDB_EXT = '.atm'
    SET STARTING_MODEL = 1
    SET ENDING_MODEL = 20
    SET MD_LEVEL = 'refine1'
    SET DEVIATION = 4.0
    SET KNOWNS = '1JKE'
    SET HETATM_IO = off
    SET WATER_IO = off
    SET ALIGNMENT_FORMAT = 'PIR'
    SET SEQUENCE = 'target1'
    SET ALNFILE = 'multiple1.ali'
    CALL ROUTINE = 'model'

Loop Modelling
• Identify the loop region.
• Calculate the distances between the anchor residues.
• Search the fragment database.
Loop generation process:
1. Select a loop for each region
2. Fixing of the loop

Loop Library
• Loops are extracted from the PDB using high-resolution (<2 Å) X-ray structures.
• Typically thousands of loops in the database.
• Includes loop coordinates, sequence, number of residues in the loop, Cα-Cα distance, and the preceding and following secondary structure (or their Cα coordinates).

Structure Validation
(a) Stereochemical quality check
(b) Residue environment check

Stereochemical Quality Check
PROCHECK (Thornton and co-workers)
The following properties are calculated and analysed in comparison with those of highly refined structures solved at varying resolutions.
Torsional angles:
- (φ, ψ) combination
- χ1-χ2 combination
- χ1 torsion for those residues without χ2
- combined χ3 and χ4 angles
- ω angles
Covalent geometry:
- main-chain bond lengths
- main-chain bond angles

Profiles-3D
• Amino acid residues in proteins can be classified according to their local environments:
▫ solvent accessibility
▫ secondary structure
▫ polarity of other protein chemical groups in contact with them

Refining the Model
- Energy minimize the N- and C-termini.
- Repair spliced peptide bonds.
- Minimize loop regions.
- Energy minimize mutated side chains in SCRs.
- Minimize segments together.

Energy Minimization
• Energy minimization adjusts the structure of the molecule in order to lower the energy of the system.
• For small molecules, a global minimum energy configuration can often be found.
• For large macromolecular systems, energy minimization allows one to examine the local minimum around a particular conformation.

Modelling on the Web
• Prior to 1998, homology modelling could only be done with commercial software or command-line freeware.
• The process was time-consuming and labor-intensive.
• The past few years have seen an explosion in automated web-based homology modelling servers.
• Now anyone can homology model!

Application of Comparative Modeling
Comparative modeling is an efficient way to obtain useful information about the proteins of interest. For example, comparative modeling can be helpful in:
- Designing mutants to test hypotheses about the protein's function.
- Identifying active and binding sites.
- Searching for, designing and improving ligands.
- Modeling substrate specificity.
- Predicting antigenic epitopes.
- Simulating protein-protein docking.
- Confirming a remote structural relationship.

What is docking?
Prediction of the optimal physical configuration and energy between two molecules.
The docking problem optimizes:
• The binding between two molecules, such that their orientation maximizes the interaction.
• The total energy of interaction, such that for the best binding configuration the binding energy is the minimum.
• The resultant structural changes brought about by the interaction.

Molecular Docking
• The process of “docking” a ligand to a binding site mimics the natural course of interaction of the ligand and its receptor via a lowest-energy pathway.
• Put a compound in the approximate area where binding occurs and evaluate the following:
– Do the molecules bind to each other?
– If yes, how strong is the binding?
– What does the molecule or the protein-ligand complex look like? (Understand the intermolecular interactions.)
– Quantify the extent of binding.

Few terms related to docking
• Receptor: The receiving molecule, most commonly a protein or other biopolymer.
• Ligand: The complementary partner molecule which binds to the receptor. Ligands are most often small molecules but could also be another biopolymer.
• Docking: Computational simulation of a candidate ligand binding to a receptor.
• Binding mode: The orientation of the ligand relative to the receptor, as well as the conformation of the ligand and receptor when bound to each other.
• Pose: A candidate binding mode.
• Scoring: The process of evaluating a particular pose by counting the number of favorable intermolecular interactions such as hydrogen bonds and hydrophobic contacts.
• Ranking: The process of classifying which ligands are most likely to interact favorably with a particular receptor based on the predicted

Classes of Docking
Protein-protein docking
• Both molecules are usually considered rigid.
• 6 degrees of freedom: 3 for rotation, 3 for translation.
• First apply only steric constraints to limit the search space.
• Then examine the energetics of possible binding conformations.
Protein-ligand docking
• Flexible ligand, rigid receptor.
• The search space is much larger.
• Either reduce the flexible ligand to rigid fragments connected by one or several hinges (this reduces the conformational space),
• or search the conformational space using Monte Carlo methods or molecular dynamics.
[Figures: 1. Protein-protein docking; 2. Protein-ligand docking (optimized)]

Docking uses a “search and score” method
It involves:
• Finding useful ways of representing the molecules and molecular properties.
• Exploring the configuration spaces available for interaction between ligand and receptor.
• Evaluating and ranking configurations using a scoring system, in this case the binding energy.
However, since it is difficult to evaluate the binding energy because the binding sites may not be easily accessible, the binding energy is modeled as follows:
\[
\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{vdw}} + \Delta G_{\mathrm{hbond}} + \Delta G_{\mathrm{elect}} + \Delta G_{\mathrm{conform}} + \Delta G_{\mathrm{tor}} + \Delta G_{\mathrm{sol}}
\]

3D Structure of the Complex
Experimental information: the active site can be identified based on the position of the ligand in the crystal structures of protein-ligand complexes.
What if the active site is not known?

Some Available Programs to Perform Docking
• Affinity
• AutoDock
• BioMedCAChe
• CAChe for Medicinal Chemists
• DOCK
• DockVision
• FlexX
• Glide
• GOLD
• Hammerhead
• PRO_LEADS
• SLIDE
• VRDD

Ligand in Active Site Region
[Figure: ligand and active site residues]
Active site residues: Histidine 6; Phenylalanine 5; Tyrosine 21; Aspartic acid 91; Aspartic acid 48; Tyrosine 51; Histidine 47; Glycine 29; Leucine 2; Glycine 31; Glycine 22; Alanine 18; Cysteine 28; Valine 20; Lysine 62

Examples of Docked Structures
[Figures: HIV protease inhibitors; COX2 inhibitors]

Rigid Docking
• Shape-complementarity method: find binding mode(s) without any steric clashes.
• Only 6 degrees of freedom (translations and rotations).
• Move the ligand to the binding site and monitor the decrease in the energy.
• Only non-bonded terms remain in the energy term.
• Try to find a good steric match between ligand and receptor.

The DOCK algorithm in rigid-ligand mode
1. Define the target binding site points.
2. Match the distances.
3. Calculate the transformation matrix for the orientation.
4. Dock the molecule.
5. Score the fit.
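Step 2 of the rigid-ligand procedure above ("match the distances") can be illustrated with a toy sketch. It is a brute-force stand-in written for this summary, not DOCK's actual matching code: it tries every assignment of ligand atoms to distinct site points and keeps those whose internal distances agree within a tolerance. The function name `match_distances`, the tolerance `tol` and the coordinates are all hypothetical.

```python
from itertools import permutations
from math import dist


def match_distances(ligand_atoms, site_points, tol=0.5):
    """Return every assignment (tuple of site-point indices, one per ligand
    atom) whose pairwise distances match the ligand's within tol."""
    n = len(ligand_atoms)
    matches = []
    for combo in permutations(range(len(site_points)), n):
        consistent = all(
            abs(
                dist(ligand_atoms[a], ligand_atoms[b])
                - dist(site_points[combo[a]], site_points[combo[b]])
            ) <= tol
            for a in range(n)
            for b in range(a + 1, n)
        )
        if consistent:
            matches.append(combo)
    return matches


# Hypothetical coordinates: a three-atom ligand with internal distances 3, 4, 5,
# and four site points, three of which form a congruent triangle.
ligand = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
site = [(10.0, 10.0, 10.0), (10.0, 13.0, 10.0), (6.0, 10.0, 10.0), (20.0, 20.0, 20.0)]

print(match_distances(ligand, site))
```

Each surviving assignment would then feed step 3, where a rotation and translation superimposing the ligand atoms on their matched site points is computed. The exhaustive search is only practical for a handful of points.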
Flexible Docking
• Dock flexible ligands into the binding pocket of a rigid protein.
• The binding site is broken down into regions of possible interactions.
[Figure labels: binding site from X-ray; H-bonds; parameterised binding site]

Need for Scoring
• Detailed calculations on all possibilities would be very expensive.
• The major challenge in structure-based drug design is to identify the best position and orientation of the ligand in the binding site of the target.
• This is done by scoring or ranking the various possibilities, based on empirical parameters, knowledge-based approaches, or rigorous calculations.

Caspase Dependent Programmed Cell Death in Developing Embryos: A Potential Target for Therapeutic Intervention against Pathogenic Nematodes
For the first time, we developed and evaluated flow cytometry based assays to assess several conserved features of apoptosis in developing embryos of a pathogenic filarial nematode, Setaria digitata, in vitro. We validated programmed cell death in developing embryos by using immunofluorescence microscopy and scoring the expression profile of nematode-specific proteins related to apoptosis [e.g. CED-3, CED-4 and CED-9]. Mechanistically, apoptotic death of embryonic stages was found to be a caspase-dependent phenomenon mediated primarily through induction of intracellular ROS. The apoptogenicity of some pharmacological compounds, viz. DEC, Chloroquine, Primaquine and Curcumin, was also evaluated. Curcumin was found to be the most effective pharmacological agent, followed by Primaquine, while Chloroquine displayed minimal effect and DEC had no demonstrable effect. Further, demonstration of induction of apoptosis in embryonic stages by lipid peroxidation products [molecules commonly associated with inflammatory responses in filarial disease] and demonstration of in-situ apoptosis of developing embryos in adult parasites in a natural bovine model of filariasis have offered a framework to understand anti-fecundity host immunity operational against parasitic helminths.
PLoS NTD, 2011

Induction of apoptosis in developing embryos of a pathogenic nematode
[Figure: CED-4 model showing the CARD domain, α/β (P-loop) domain, helical domain and winged helix domain; cytochrome-c]

Binding efficiencies of carbohydrate ligands with different genotypes of cholera toxin B: molecular modeling, dynamics and docking simulation studies
J Mol Model, 2011
[Figure: molecular interaction plots between carbohydrate ligand and genotype 1. a) Galactose b) Sialic acid c) N-acetyl galactosamine]
[Figure: molecular interaction plots between carbohydrate ligand and genotype 3. a) Galactose b) Sialic acid c) N-acetyl galactosamine]
[Figure: molecular interaction plots between carbohydrate ligand and genotype 5. a) Galactose b) Sialic acid c) N-acetyl galactosamine]
[Figure: molecular interaction plots between carbohydrate ligand and genotype 6. a) Galactose b) Sialic acid c) N-acetyl galactosamine]

Molecular cloning of cDNA and peptide structure prediction of Plzf expressed in the spermatogonial cells of Labeo rohita
Marine Genomics, 2010
• The promyelocytic leukemia zinc finger (Plzf) gene, containing an evolutionarily conserved BTB domain, plays a key role in the self-renewal of mammalian spermatogonial stem cells (SSC).
• Little is known about the function of plzf in vertebrates, especially in fish species.
• plzf was cloned from the testis of Labeo rohita (rohu), a commercially important freshwater carp. It contains a conserved N-terminal BTB domain and C-terminal C2H2 zinc finger motifs.
• A 3D model of the BTB domain of the Plzf protein was constructed by a homology modeling approach.
• Molecular docking on this 3D structure established a homo-dimer between two BTB domains, creating a charged pocket containing the conserved amino acid residues L33, C34, D35 and R49.
Thus, Plzf of SSC is structurally and possibly functionally conserved.
The identified plzf could be the first step towards exploring its role in rohu SSC behavior.

Thank you

• Alok Das Mohapatra, Sunil Kumar, Ashok Kumar Satapathy and Balachandran Ravindran (2011). Apoptosis in a pathogenic nematode involves mitochondrial pathway. PLoS Neglected Tropical Diseases (In Press).
• MHU Turabe Fazil, Sunil Kumar, Rohit Farmer, HP Pandey and DV Singh (2011). Binding efficiencies of carbohydrate ligands with different genotypes of cholera toxin B: molecular modeling, dynamics and docking simulation studies. J Mol Model, DOI 10.1007/s00894-010-0947-6 (Springer publication).
• Biswaranjan Paital, Sunil Kumar*, Rohit Farmer, Niraj Kanti Tripathy, Gagan Bihari Nityananda Chainy (2011). In silico prediction and characterization of 3D structure and binding properties of catalase from the commercially important crab, Scylla serrata. Interdiscip Sci Comput Life Sci 3: 110–120 (Springer publication). *Corresponding author.
• Chinmayee Mohapatra, Hirak Kumar Barman, Rudra Prasanna Panda, Sunil Kumar, Varsha Das, Ramya Mohanta, Shibani Mohapatra, Pallipuram Jayasankar (2010). Cloning of cDNA and prediction of peptide structure of plzf expressed in the spermatogonial cells of Labeo rohita. Mar. Genomics, doi: 10.1016/j.margen.2010.09.002 (Elsevier publication).
• MHU Turabe Fazil*, Sunil Kumar*, N Subbarao, HP Pandey and Durg V. Singh (2010). Homology modeling of a sensor histidine kinase from Aeromonas hydrophila. J Mol Model, 16: 1003-1009 (Springer publication). *Equal contribution.
• Babu A Manjasetty, Sunil Kumar, Andrew P Turnbull and Niraj Kanti Tripathy (2009). Homology Modeling and Analysis of Human Disease Proteins: Structural Investigations of Shwachman-Bodian-Diamond Syndrome (SBDS) model through Bioinformatics Approach. InterJRI Science and Technology, Vol. 1, Issue 2, 97-104.