An Introduction to Protein Structure Databases
Michael Tress, CNIO

The Protein Data Bank

Small targets
To the PDB …

The Protein Data Bank
The PDB is the only repository for the three-dimensional structures of proteins and related biological macromolecules. It provides links to a variety of tools and resources for studying the structures and their relationship to sequence, function, and disease.

The RCSB Protein Data Bank

Sequence Tab
UniProt sequence cross-reference, missing (disordered) residues, secondary structure

Other Tabs …
Statistical scores showing deviations, structural features, mean deviation from norm

Header …
Chain ID, Journal

Atom, Residue, Co-ordinates, Chain ID, Flexibility

SCOP (Structural Classification of Proteins)
A detailed description of the structural relationships between proteins, classified largely by manual inspection.
FAMILY: Clear evolutionary relationship. Pairwise sequence identities are generally greater than 30%.
SUPERFAMILY: Probable common evolutionary origin. Proteins often have low sequence identities, but structural and functional features suggest a common evolutionary origin.
FOLD: Major structural similarity. Proteins are defined as having a common fold if they have the same arrangement of secondary structures and the same topological connections.

The CATH Database
CATH follows the same idea as SCOP: it is another structural classification of proteins, but with four levels. The CATH classification system is semi-automatic.
Class: Secondary structure content and packing.
Architecture: Overall shape, domain structure and orientation of the secondary structures, ignoring the connectivities between them.
Topology (fold family): Overall shape and connectivities.
Homologous superfamily: Proteins thought to share a common ancestor. Similarities are identified first by sequence alignment and then by structure comparison using the SSAP structural alignment program.

Structural Classification by 3D Alignments
A 3D alignment is the superposition of two (or more) 3D structures so that as many atoms as possible match. Alignment is usually carried out on the C-alpha atoms only. 3D alignments are not sequence alignments, but they can be converted into sequence alignments. Structural alignment is also important for evolutionary comparisons and functional studies. There are many different methods, each based on a different principle. For example, DALI (contact maps), Mammoth (secondary structure), SSAP (dynamic programming), LGA (longest segment).

Measurements of Structural Similarity
The two structures being compared can be experimental structures or predicted models.
RMSD: Root Mean Squared Deviation is the square root of the mean squared distance between all the equivalent atoms in two superimposed structures. It is measured in Angstroms.
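
As a rough illustration of the RMSD measure, the sketch below computes the square root of the mean squared distance between paired atoms. It assumes the two structures have already been superimposed and that equivalent atoms (for example, C-alpha atoms) are paired by list position. The function name `rmsd` and the toy coordinates are hypothetical and do not come from any particular program.

```python
import math


def rmsd(coords_a, coords_b):
    """Return the RMSD (in the input units, e.g. Angstroms) between two
    equal-length lists of (x, y, z) tuples.

    Assumption: the structures are already superimposed and the i-th atom
    of coords_a is equivalent to the i-th atom of coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom pair is required")

    # Sum the squared distance between each pair of equivalent atoms.
    total_sq = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total_sq += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2

    # Mean over the atom pairs, then the square root.
    return math.sqrt(total_sq / len(coords_a))


# Toy example: the second structure is the first shifted by 2 along z,
# so every atom pair is exactly 2 units apart.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (2.0, 0.0, 2.0)]
print(rmsd(structure_a, structure_b))
```

Because the distances are squared before averaging, a few badly placed atoms can dominate the value, so an RMSD is best read alongside the number of atom pairs it was computed over.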