BMMB597E Protein Evolution
Protein classification

Protein families
• The first protein structures determined by X-ray crystallography, myoglobin and haemoglobin, were solved (in 1959-60) before their amino acid sequences were determined
• It came as a surprise that the structures were quite similar
• Soon it became clear, on the basis of both sequences and structures, that there were families of proteins

Figure: myoglobin, haemoglobin

50 years earlier, there were some hints …
• E.T. Reichert & A.P. Brown. The differentiation and specificity of corresponding proteins and other vital substances in relation to biological classification and organic evolution: the crystallography of hemoglobins. (Carnegie Institution of Washington, 1909)
• Crystallography 3 years before the discovery of X-ray diffraction?

Reichert and Brown studied interfacial angles in haemoglobin crystals
• Steno's law (1669): different crystals of the same substance may have different sizes and shapes, but the angles between faces are constant for each substance
• They found that the angles differed from species to species
• Similarities in values of interfacial angles were consistent with the classical taxonomic tree
• They even found differences between oxy- and deoxyhaemoglobin

Most premature scientific result ever?
• These results implied:
  – That proteins adopted (or at least could adopt) unique structures, to form a crystal
  – That protein structures varied between species
  – That this variation paralleled the evolution of the species
  – That proteins could change structure as a result of changes in state of ligation
• In 1909!

M.O. Dayhoff
• Pioneer of bioinformatics
• Collected protein sequences
• First curated ‘database’
• Recognized that proteins form families, on the basis of amino acid sequences
• Computational sequence alignments
• First evolutionary tree
• First amino-acid substitution matrix (later replaced by BLOSUM)

Can relationships among proteins be extended beyond families?
• Families = sets of proteins with such obvious similarities that we assume that they are related
• One question: how much similarity do we need to believe in a relationship?
• How far can evolution go?
• Convergent evolution?
• Cautionary tale: chymotrypsin / subtilisin

Chymotrypsin-subtilisin
• Both are proteolytic enzymes
  – Chymotrypsin: mammalian
  – Subtilisin: from B. subtilis
• Both have catalytic triads
• Same function, same mechanism
• Sequences 12% similar (near noise level)
• However, structures show them to be unrelated

Figure: Chymotrypsin / Subtilisin

Figure: Catalytic triad in serine proteinases

Figure: Chymotrypsin and subtilisin have similar catalytic triads

How can we classify proteins that belong to families?
• Align sequences
• Calculate a phylogenetic tree (there are various ways to do this; all depend on the sequence alignment)
• Usually, the phylogenetic tree of homologous proteins from different species follows the phylogenetic tree based on classical taxonomy
• That is reassuring
• But what happens as divergence proceeds?

How can we classify proteins that do not obviously belong to families?
• Base this on structure rather than sequence
• Structural similarities are maintained as divergence proceeds, better than sequence similarities
• For closely related proteins, expect no difference between sequence-based and structure-based classification
• How far can classification be extended?

SCOP: Structural Classification of Proteins
• Idea of A.G. Murzin, based on old work by C. Chothia and M. Levitt
• Even if two proteins are not obviously homologous, they may share structural features, to a greater or lesser degree.
• For instance, the secondary structures of some proteins are only α-helices
• Others have β-sheets but no α-helices

SCOP
• SCOP is a database that gives a hierarchical classification of all protein domains
• Recall that a domain is a compact subunit of a protein structure that ‘looks as if’ it would have independent stability

Figure: Fragment of fibronectin

Dissection of structure into domains
• It is not always quite so obvious how to divide a protein into domains
• There is some (not a lot of) room for argument
• Note that sometimes the chain passes back and forth between domains
• In these cases one or both domains do not consist entirely of a consecutive set of residues

Figure: lactoferrin

SCOP, CATH and the DALI Database classify protein structures
• SCOP (Structural Classification of Proteins)
• CATH (Class, Architecture, Topology, Homologous superfamily)
• DALI Database
• These web sites have many useful features:
  – information-retrieval engines, including search by keyword or sequence
  – presentation of structure pictures
  – links to other related sites, including bibliographical databases

SCOP http://www.scop.mrc-lmb.cam.ac.uk
• SCOP organizes protein structures in a hierarchy according to evolutionary origin and structural similarity.
• Domains are extracted from the Protein Data Bank entries.
• Sets of domains are grouped into families: sets of domains for which similarities in structure, function and sequence imply a common evolutionary origin.

The SCOP hierarchy
• Families that share a common structure, or even a common structure and a common function, but lack adequate sequence similarity (so that the evidence for evolutionary relationship is suggestive but not compelling) are grouped into superfamilies.
• Superfamilies that share a common folding topology, for at least a large central portion of the structure, are grouped as folds.
• Finally, each fold group falls into one of the general classes.
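
The grouping levels described above (family, superfamily, fold, class) can be read as a path from the most general to the most specific label, so comparing two domains amounts to finding the deepest level at which their paths still agree. The following is a minimal sketch of that idea, not SCOP's own software. The flavodoxin labels are taken from the SCOP record for Clostridium beijerinckii flavodoxin quoted in this lecture; every label marked "placeholder" is an invented stand-in, and the `Lineage` type and `deepest_shared_level` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Optional


class Lineage(NamedTuple):
    """A SCOP-style lineage, ordered from most general to most specific."""
    scop_class: str
    fold: str
    superfamily: str
    family: str


LEVELS = Lineage._fields  # ('scop_class', 'fold', 'superfamily', 'family')


def deepest_shared_level(a: Lineage, b: Lineage) -> Optional[str]:
    """Return the most specific level at which both lineages agree.

    Levels are compared from the top down; the first disagreement stops
    the comparison. Returns None if even the classes differ.
    """
    shared = None
    for level, label_a, label_b in zip(LEVELS, a, b):
        if label_a != label_b:
            break
        shared = level
    return shared


# Labels for flavodoxin come from the SCOP record quoted in the lecture.
flavodoxin = Lineage("a/b", "Flavodoxin-like", "Flavoproteins", "Flavodoxin-related")

# The lecture states only which levels these proteins share with flavodoxin;
# the labels marked "placeholder" are invented for this sketch.
p450_reductase = Lineage("a/b", "Flavodoxin-like", "Flavoproteins", "placeholder-family-1")
chey = Lineage("a/b", "Flavodoxin-like", "placeholder-superfamily-2", "placeholder-family-2")
ferredoxin_reductase = Lineage("a/b", "placeholder-fold-3", "placeholder-superfamily-3",
                               "placeholder-family-3")

for name, other in [("NADPH-cytochrome P450 reductase", p450_reductase),
                    ("CheY", chey),
                    ("spinach ferredoxin reductase", ferredoxin_reductase)]:
    print(f"flavodoxin vs {name}: deepest shared level = "
          f"{deepest_shared_level(flavodoxin, other)}")
```

Storing the lineage as an ordered tuple keeps the comparison a simple prefix match, which mirrors the way each level of the hierarchy is defined as a grouping of the level below it.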
Major classes in SCOP
• α – secondary structure all helical
• β – secondary structure all sheet
• α/β – contain β-α-β supersecondary structure
• α+β – helices and sheets, but in different parts of the structure
• ‘small proteins’ – which often have little secondary structure and are held together by disulphide bridges or ligands (for instance, wheat-germ agglutinin)

Summary of SCOP hierarchy
• Class
• Fold
• Superfamily
• Family
• Domain

SCOP classification of flavodoxin
Protein: Flavodoxin from Clostridium beijerinckii [TaxId: 1520]
Lineage:
Root: scop
Class: Alpha and beta proteins (a/b) [51349]
  Mainly parallel beta sheets (beta-alpha-beta units)
Fold: Flavodoxin-like [52171]
  3 layers, a/b/a; parallel beta-sheet of 5 strand, order 21345
Superfamily: Flavoproteins [52218]
Family: Flavodoxin-related [52219]
  binds FMN
Protein: Flavodoxin [52220]
Species: Clostridium beijerinckii [TaxId: 1520] [52226]
PDB Entry Domains:
  5nul complexed with fmn; mutant
    chain a [31191]
  2fax complexed with fmn; mutant
    chain a [31194]
  … many others

Figure: Clostridium beijerinckii flavodoxin (stereo pair)

Figure: Flavodoxin and NADPH-cytochrome P450 reductase: same superfamily, different family

Figure: Flavodoxin and CheY: same fold, different superfamily

Figure: Flavodoxin and spinach ferredoxin reductase: same class, different folds

Flavodoxin in the SCOP hierarchy
• To give some idea of the nature of the similarities expressed by the different levels of the hierarchy:
• Flavodoxin from Clostridium beijerinckii and NADPH-cytochrome P450 reductase are in the same superfamily, but different families.
• Flavodoxin and the signal transduction protein CheY are in the same fold category, but different superfamilies.
• Flavodoxin and spinach ferredoxin reductase are in the same class, α/β, but have different folds.

CATH presents a classification scheme similar to that of SCOP
• CATH = Class, Architecture, Topology, Homologous superfamily, the levels of its hierarchy.
• In CATH, proteins with very similar structures, sequences and functions are grouped into sequence families.
• A homologous superfamily contains proteins for which similarity of sequence and structure gives evidence of common ancestry.
• A topology or fold family comprises sets of homologous superfamilies that share the spatial arrangement and connectivity of helices and strands.
• Architectures are groups of proteins with similar arrangements of helices and sheets, but with different connectivity. For instance, different four-α-helix bundles with different connectivities would share the same architecture but not the same topology in CATH.
• General classes of architectures in CATH are: α, β, α-β (subsuming the α/β and α+β classes of SCOP), and domains of low secondary structure content.

Do different classification schemes agree?
• To classify protein structures (or any other set of objects) you need to be able to measure the similarities among them.
• The measure of similarity induces a tree-like representation of the relationships.
• CATH, SCOP, DALI and the others agree, for the most part, on what is similar, and the tree structures of their classifications are therefore also similar.
• However, even an objective measure of similarity does not specify how to define the different levels of the hierarchy.
• These are interpretative decisions, and any apparent differences in the names and distinctions between the levels disguise the underlying general agreement about what is similar and what is different.
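
To illustrate how a measure of similarity induces a tree, here is a minimal sketch of agglomerative clustering with single linkage: the two closest groups are merged repeatedly until one tree remains. This is not the procedure used by SCOP, CATH or DALI; it is one simple, standard way of turning pairwise distances into a tree. The four labels A to D stand for hypothetical domains, the distance values are invented for the example, and `single_linkage_tree` is a name introduced here.

```python
from itertools import combinations


def single_linkage_tree(names, dist):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters.

    names: list of labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    The distance between two clusters is the smallest distance between any
    pair of their members (single linkage).
    """
    # Each cluster is a pair: (tree so far, set of member labels).
    clusters = [(name, frozenset([name])) for name in names]

    def cluster_distance(c1, c2):
        return min(dist[frozenset((a, b))] for a in c1[1] for b in c2[1])

    while len(clusters) > 1:
        # Find the indices of the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Invented pairwise distances between four hypothetical domains.
toy_distances = {
    frozenset(("A", "B")): 1.0,
    frozenset(("C", "D")): 2.0,
    frozenset(("B", "C")): 5.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 7.0,
    frozenset(("B", "D")): 8.0,
}

tree = single_linkage_tree(["A", "B", "C", "D"], toy_distances)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

The tree records only which groups merge and in what order. Deciding which merge depth should count as a "family", a "superfamily" or a "fold" is a separate choice that the distances alone do not make, which is the interpretative step on which classification schemes can differ.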