Structural Bioinformatics

In this presentation……
Part 1 – Proteins & Proteomics
Part 2 – Protein Structure & Function
Part 3 – Analysis & Visualization
Part 4 – Protein Structure Prediction

Part 1 Proteins & Proteomics

Proteins
• Proteins are the fundamental building blocks of life
• Enzymes are proteins that act as molecular machines, responsible for all the chemical transformations cells are capable of
• Those structures that are not made of proteins are produced by enzymes (which are proteins)
• A human contains on the order of 100,000 different proteins
• Proteins are of variable length and shape

Structural types and conceptual models
• Globular proteins are soluble in predominantly aqueous solvents such as the cytosol and extracellular fluids, whereas integral membrane proteins exist within the lipid-dominated environment of biological membranes
• Conceptual models of protein structure are valuable aids to understanding protein bioinformatics

Globular proteins
• The linear amino acid polymer forms a 3D structure by folding into a compact globular shape
• Globular proteins tend to be soluble in aqueous solvents, and folding is dominated by the hydrophobic effect, which directs hydrophobic amino acid side-chains to the structural core of the protein, away from the solvent

Secondary structure
• Globular proteins usually contain elements of regular secondary structure, including α-helices and β-strands
• These are stabilized by hydrogen bonding and contribute most of the amino acids to globular protein cores
• Residues in regular secondary structures are given the symbol H, meaning helix, or E (or B), meaning extended or strand

[Figure: Folding of the polypeptide chain into an α-helix. Labels: pitch 0.54 nm (3.6 amino acid residues per turn); rise 0.15 nm per residue (100° rotation per residue); Cα atoms of consecutive amino acid residues; position of the polypeptide backbone, consisting of Cα and peptide-bond C-N atoms; amino acid side-chains R1 to R5; hydrogen bond.]
• In the α-helix, the CO group of residue n is hydrogen bonded to the NH group of residue n+4
[Figure: Cross-sectional view of an α-helix showing the positions of the side-chains (R groups) of the amino acids on the outside of the helix.]

Tertiary structure
• It is the full 3D atomic structure of a single peptide chain
• It can be viewed as the packing together of secondary structure elements, which are connected by irregular loops that lie predominantly on the protein surface
• Loop residues are given the symbol C to distinguish them from residues in helices or strands
[Figure: Tertiary Structures]

Quaternary structures
• Several tertiary structures may pack together to form the biologically functional quaternary structure
[Figure: Quaternary Structures]

Integral membrane proteins
• These exist within biological lipid membranes and obey different structural principles compared with globular proteins
• They contain runs of generally hydrophobic amino acids, associated with membrane-spanning segments (often but not exclusively helices), connected by more hydrophilic loops that lie in aqueous environments outside the membrane
• Membrane proteins are very important components of cellular signaling and transport systems

Domains
• Proteins tend to have a modular architecture, and many proteins contain a number of domains, often of mixed types, for example mixed integral membrane and globular domains

Evolution
• In globular proteins, surface residues in loops evolve (change) more quickly than residues in the hydrophobic core
• In integral membrane proteins, the most slowly evolving residues are those in the membrane-spanning regions

Protein structure prediction
• Identifying all of the proteins in a human is one thing, but to truly understand a protein’s function scientists must discern its shape and structure
• The structural genomics initiative calls
for use of quasi-automated X-ray crystallography to study normal and abnormal proteins
• Conventional structural biology is based on purifying a molecule, coaxing it to grow into crystals and then bombarding the sample with X-rays. The X-rays bounce off the molecule’s atoms, leaving a diffraction pattern that can be interpreted to yield the molecule’s overall 3D shape
• A structural genomics initiative would depend on scaling up and speeding up the current techniques
• By figuring out which of the unknown proteins associate with previously identified ones, the CuraGen and University of Washington scientists were able to sort them into functional categories, such as energy generation, DNA repair and aging
• Even though yeast is an excellent prototype, Drosophila is a good choice when the aim is to study a multicellular organism

Other methods for protein prediction
• Another method for studying proteomes is called “guilt by association”: learning about the function of a protein by assessing whether it interacts with another protein whose role in a cell is known
• A group led by Stanley Fields of the University of Washington reported that they had deduced 957 interactions among 1,004 proteins in baker’s yeast (S. cerevisiae)
• A machine devised by Hochstrasser and his research group goes one step further than the robots: it would automatically extract the protein spots from the gels, use enzymes to chop the proteins into bits, feed the pieces into a laser mass spectrometer and transfer the information to a computer for analysis
• With or without robotic arms, 2-D gels have their problems. Besides being tricky to make, they do not resolve highly charged or low-mass proteins very well
• They also do a poor job of resolving proteins with hydrophobic regions, such as those that span the cell membrane.
This is a major limitation, because membrane-spanning receptors are important drug targets
• Fields and his colleagues first devised a widely used method for studying protein interactions called the yeast two-hybrid system, which uses known protein “baits” to find “prey” proteins that bind to them
• Another way to study proteins that has recently become available involves so-called protein chips. Ciphergen Biosystems, a biotechnology company in Palo Alto, is selling a range of strips for isolating proteins according to various properties, such as whether they dissolve in water or bind to charged metal atoms. The strips can then be placed in a chip reader, which includes a mass spectrometer, for identifying the proteins

What’s new
• Knowing the exact structural form of each of the proteins in the human proteome should, in theory, help drug designers devise chemicals to fit the slots on the proteins that either activate them or prevent them from interacting
• Such efforts, which are generally known as rational drug design, have not shown widespread success so far, but then only roughly one percent of all human proteins have had their structures determined
• After scientists catalogue the human proteome, it will be the proteins, not the genes, that will be all the rage

Part 2 Protein Structure & Function

Structure and function
• Proteins rely upon the shapes and properties of key functional areas of their 3D structures to carry out biological functions
• Knowledge of protein structure is the key to understanding protein function, and this is one reason for its importance in bioinformatics
[Figure: panels labeled MUTZM and WTZM]

Structural and functional constraints
• Evolution accepts changes to amino acid residues in proteins where they have a neutral or advantageous effect on protein structural stability or protein function
• Residues can be conserved for structural or functional reasons
• Amino acids are conserved where they are uniquely able to fulfill particular structural roles
• This often
occurs with cysteine, glycine and proline
[Figure: panels labeled RPIP, TOLC, FGF1 and XRCC4]

Evolution of the overall protein fold
• If two naturally occurring protein sequences can be aligned to show more than 25 percent identity over an alignment of 80 or more residues, then they will share the same basic structure
• The Sander-Schneider formula gives the higher threshold percentage identities necessary to guarantee structural similarity from shorter alignments

Conservation of structure
• Protein structures tend to be conserved even when evolution has changed the sequence almost beyond recognition
• Structural knowledge is therefore a key factor in understanding protein evolution

Evolution of function
• While structure tends to be conserved by evolution, function is observed to change
• There are many examples of proteins whose sequence and structure are very similar, but which have different functions
• When function has changed, key functional residues change as well, and this is often clear in multiple sequence alignments

Multiple sequence alignment
• Understanding how structures evolve can help us understand multiple sequence alignments
• Key structural and functional residues are often observed to be conserved
• Insertions and deletions occur preferentially in hydrophilic surface loops rather than in regular secondary structure elements
• Loops are also subject to faster mutational change
• Conservation of hydrophobic core residues in secondary structure elements is also common, as are conservation patterns associated with amphipathic helices

Part 3 Analysis & Visualization

Software, data and WWW sites
• A large variety of software for structure visualization, alignment and analysis is available on the WWW
• All published protein structures are submitted to a public database.
Database searches and downloads can be performed at various WWW sites
• Rasmol, Chime and Cn3d are commonly used programs for viewing structural data

Structural and functional analysis of structures
• There is an enormous amount of software available for structural data analysis, and also several WWW sites holding pre-prepared analyses
• Functional sites in protein structures typically contain a few residues in defined spatial positions
• Software and databases have been developed to locate and search for similarity in such sites

Structural alignment
• It can be very difficult to find correct, biologically meaningful alignments of very distantly related protein sequences because they contain only a very small proportion of identical residues
• In such cases, structural information can help because evolution tends to change structure less
• Superimposing the backbones of similar structures identifies structurally equivalent residues; this process is known as structural alignment

Structural similarity
• Structural alignment methods often produce a measure of structural similarity
• The most common of these is the RMSD, which is reported by most programs
• This is the root mean square deviation in position between the α-carbon atoms of aligned residues in optimal structural superposition

Why classify protein structures?…
• Classification groups together proteins with similar structures and common evolutionary origins
• Examples
– CATH, available at http://www.biochem.ucl.ac.uk/bsm/cath
– SCOP, available at http://scop.mrc-lmb.cam.ac.uk/scop

Structural classes
• Proteins can be assigned to broad structural classes based on secondary structure content and other criteria
• CATH has four such broad classes, but SCOP uses more, giving a more detailed description of structural class

Fold or topology
• All classifications gather together proteins with the same overall fold or topology
• Proteins in the same fold or topology class contain more or less the same secondary structure elements (SSEs), connected in the same
way and in similar relative spatial positions

Homologs and analogs
• Homologs (homologous proteins) are related by divergent evolution from a common ancestor, and have the same fold
• Analogs (analogous proteins) have the same fold, but other evidence for common ancestry is weak

Super-folds
• Super-folds are protein folds that seem likely to have arisen more than once in evolution
• They are thought to have advantageous physicochemical properties
• They appear in SCOP and CATH as fold or topology levels containing several homologous super-families
• Examples are the TIM barrel and the immunoglobulin fold
• They tend to exhibit approximate symmetries and are characterized by repeated super-secondary structures

Part 4 Protein Structure Prediction

Why predict structure?…
• Structure prediction is interesting because experimental structure determination is still much slower than sequence determination
• Structure predictions help us to understand function and mechanism and can be used for rational drug design
• The early work of Levinthal and Anfinsen made structure prediction a fascinating scientific problem

Structure prediction methods
• Comparative modeling
• Secondary structure prediction
• Fold recognition
• Ab initio prediction
• Transmembrane segment prediction

Theoretical basis of comparative modeling
• Sequences with more than 25 percent identity over an alignment of 80 residues or more adopt the same basic structure
• This is the basis of prediction by comparative modeling

Ingredients
• All that is needed is an alignment between a sequence of unknown structure (target) and one or more of known structure (template(s)) with the above property
• Template structures can be found by standard sequence similarity search methods
• Lack of suitable template structures is the main limitation of the method, but structural genomics projects are likely to change this in coming years
• The accuracy of the alignment is crucial if good prediction is to
be obtained

The process of prediction
• Known structure(s) (templates) are used as the basis of prediction
• The process can then be viewed conceptually as comprising placement of conserved core residues, modeling of variable loops, side-chain positioning and optimization, and model refinement
• Conserved residues and some side-chain positions can be obtained directly from structural information in the templates
• Modeling of variable loops often makes use of the spare parts algorithm, and there are sophisticated algorithms for side-chain placement to obtain an optimally packed hydrophobic core
[Figure: Protein prediction – I]
[Figure: Protein prediction – II]

Accuracy of comparative modeling
• Accuracy is controlled almost entirely by the quality of the alignment
• Good alignments yield good predictions with most of the main software packages
• Of all prediction methods, comparative modeling produces the most accurate models

Secondary structure prediction
• It predicts the conformational state of each residue in three categories
– Helical
– Extended or strand
– Coil

Methods
• Many methods are based on ideas related to secondary structure propensity, which is a number reflecting the preference of a residue for a particular secondary structure
• Early methods had accuracies of around 60 percent (the percentage of residues predicted in the correct helical/extended/coil state)
• Examples of early methods are the Chou-Fasman rule-based method and the information-theoretical GOR method

Multiple sequence information
• Using multiple alignments of related sequences can improve prediction accuracy enormously by revealing patterns of conservation indicative of certain secondary structures

Accuracy of state-of-the-art methods
• Current methods claim an average accuracy, over trusted test sets of proteins, of more than 70 percent of residues correctly predicted
• This increase in accuracy can be attributed to the availability of more structural data, and the use of more sophisticated
algorithms or methods

Prediction of trans-membrane segments
• Membrane-spanning segments in integral membrane proteins can be predicted with reasonable accuracy
• Most methods search for contiguous runs of hydrophobic residues long enough to span a lipid membrane
• Some methods also predict the orientation (in-out) or topology of the membrane-spanning segments, but this is usually less accurate

Availability of tools
• Most of the secondary structure and trans-membrane segment prediction tools are available from the ExPASy WWW site, at http://www.expasy.ch

Fold recognition
• It aims to detect very distant structural and evolutionary relationships
• It aims to detect when a protein adopts a known fold even if it does not have significant sequence similarity to any protein of known structure
• Methods generally try to find the most compatible fold in a library of known folds using both sequence and structural information
• An alternative term for fold recognition is threading

Ab initio prediction
• These methods rely on first-principles calculations and are not yet sufficiently well developed to be of real use in practical structure prediction

Difficulties in modeling in silico
• Not every occurrence of a desired part or fragment is to be found and changed, only particular ones
• Proteins are globular and not solid objects.
They behave differently for different drug molecules
• Penetration through the cell wall and then through the nucleus to the DNA is not possible, as this could affect the entire cell; the only way is to modulate it via mRNA

Membrane protein
• Total entries in PDB: 20173; proteins: 18162; membrane proteins: only 8
• Membrane proteins are highly suitable for docking drugs into the proteins
• They do not dissolve in water; NMR has been tried extensively
• Crystallization of membrane proteins is a difficult task as they cause damage to other structures of the cell
• After the failure of crystallography and NMR, it is the turn of computers (for in silico protein modeling and drug design)

Approach to protein modeling
• The conventional protein modeling technique is to compute all the folding and side-chain arrangement together and then visualize the final protein structure
• A newer method computes the side-chain arrangements separately first and then performs the folding computations in parallel. Finally, complete information is integrated.
This method proved to be a million times faster than the conventional method

Template library of fragments
• The process of protein modeling becomes much easier if all the common fragments or side-chains are developed and stored as a molecule template library
• This technique greatly reduces the time consumed and speeds up the visualization
• To date, about 180 templates of various organic compounds have been identified and developed at IIT Delhi
• All other compounds or molecules can be modeled by suitably assembling these templates
• It would also be easier to compare different proteins with this approach

Protein folding
• It is a well-known fact that any protein folds once it reaches 30 nm
• Also, it has been found that, due to the globular structure, proteins cannot accommodate more than 8 sheets and strands
• It has been seen that, due to their heavy molecular weight, extra-large protein molecules disintegrate
• Phosphates in DNA repel each other and hence form coil structures, which adds to the difficulty of folding and modeling them in silico

Folding through computers
• Keeping in view the possible number of sheet or strand attachments to a protein, which can occur at 45° intervals, in 3D there could be 26 possibilities of folding
• The folds could be simulated or modeled by computer, but at one fold per minute folding a protein would take 226 minutes, which is about 50 years!!
• With 100 processors running continuously, ideal linear speedup would reduce this to about half a year
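
The counting and timing in the last slide can be sketched in Python. This is a minimal sketch under assumptions the slides do not state: the 26 folding possibilities are read as the 26 neighbor directions on a cubic lattice (a step of -1, 0 or +1 along each axis, which gives 45° spacing within each coordinate plane), the roughly 50-year serial estimate is taken at face value, and the processors are assumed to give perfect linear speedup. The function names are hypothetical.

```python
from itertools import product


def lattice_directions():
    """Return every non-zero direction vector with components in {-1, 0, 1}.

    Assumption: this is one way to read the "26 possibilities" in 3D;
    the slides do not say how the count was obtained.
    """
    return [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def ideal_parallel_days(serial_years, processors):
    """Wall-clock days under perfect linear speedup (an optimistic assumption).

    Uses 365-day years and ignores communication or load-balancing overhead.
    """
    return serial_years * 365 / processors


print(len(lattice_directions()))     # 26
print(ideal_parallel_days(50, 100))  # 182.5
```

Real parallel runs rarely scale linearly, so the second figure is best read as a lower bound on the wall-clock time.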