Amino acid substitution and protein structure
Protein Structure II (end I3+I4-I6)

Amino Acid Substitution
Let us define “amino acid substitution”:
  A viable mutation that changes a protein so that the amino acid that was at some location becomes another amino acid
Are some amino acid substitutions more likely than others?
  Name a table that describes some details
Do likelihoods depend on the protein, the location in it, neither, or both? Why?

3 Hard-to-Substitute Amino Acids
Cysteine (“cyst-e-een”)
  Stabilizes structures via side chain bonds
  Two cysteines make a cystine
Glycine
  Has a tiny side “chain” consisting of a single hydrogen atom
  This makes glycine very flexible
  Useful in very tight turns
Proline
  Flips more easily than other amino acids

The Proline Example
“Proline [is] common in transmembrane helices”
“They...produce deviations from canonical helical structure”
  Source: S. Yohannan et al., “Proline substitutions are not easily accommodated in a membrane protein,” J. Mol. Biol. 2004, 341(1):1-6
Flips more easily than other amino acids between cis and trans forms
  This changes the shape and can affect folding

Patterns of conserved AAs can imply facts about the fold
Consider this quote:
  “…the alternation between blocks of conserved residues and blocks where the sequences are more variable…can be interpreted in terms of probable secondary structure elements alternating with surface loops.”
  Source: Instant Notes Bioinformatics, Westhead et al., 2002
Would this refer to fibrous, globular, or integral membrane proteins?

A Different Conservation Example
See handout (Westhead et al. Figure 2)
A4, I8, V1, L5 are hydrophobic
D2, E6, K3, S7 are hydrophilic
How many amino acids per turn?
  End view hides the turns but #1-8 tell the story
  This is about 2 turns of an alpha helix
  Average turn in an alpha helix = 3.6 amino acids

Example (continued)
A4, I8, V1, L5 are hydrophobic
D2, E6, K3, S7 are hydrophilic
Many alpha helices have one side facing the inside of the protein and one facing the outer, polar water
  Called an “amphipathic” alpha helix
  Amphipathic: having both hydrophobic and hydrophilic parts
If a sequence shows alternating hydrophobic and hydrophilic subsequences, that suggests…
  …an amphipathic alpha helix
What can you say about the length of the subsequences?
What kinds of AA substitutions will tend/not tend to occur?

Conserving Overall Structure
Sander and Schneider found that for typical naturally occurring proteins…
  surprisingly (to me) few identical amino acids were needed to conserve structure
Let t(L) be the % of identically aligned amino acids required to conserve structure
  $t(L) = 290.15 \, L^{-0.562}$
  L is the length of the sequence
Does the % go up or down with greater length?
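To explore the question about sequence length, here is a minimal sketch that evaluates the threshold formula from the slide, t(L) = 290.15 · L^(-0.562). The function name `identity_threshold` and the sample lengths are illustrative assumptions, not part of the original slides.

```python
def identity_threshold(length: int) -> float:
    """Percent identity threshold t(L) = 290.15 * L**(-0.562), as given on the slide.

    `length` is the number of aligned residues (L). The function name is
    a hypothetical choice for this sketch.
    """
    if length <= 0:
        raise ValueError("sequence length must be positive")
    return 290.15 * length ** -0.562


if __name__ == "__main__":
    # Sample lengths chosen only for illustration.
    for length in (10, 20, 40, 80):
        print(f"L = {length:3d}  t(L) = {identity_threshold(length):5.1f}%")
```

With these sample lengths the threshold falls from roughly 80% at L = 10 to roughly 25% at L = 80, so longer alignments need a smaller share of identical residues.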
Let’s try a couple of examples

View Formats for Proteins I: Wire Frame (or Line) (I4)
Shows bonds as line segments
Does not show atoms
  But you can figure out where atoms are
  Where?
Consider alpha-conotoxin (see next slide)
  Conotoxins are produced by poisonous snails called cone shells

Alpha-conotoxin
[Figure: alpha-conotoxin, wire frame view]

Cone shell catches, eats fish
Speaking of conotoxins…
Check out a movie of a snail catching and eating a fish!
  (It uses conotoxins as part of the process)
Downloaded from: http://grimwade.biochem.unimelb.edu.au/cone/fish2.mov
Originally from “Neurex Corp - Science and Publications - Articles”
  http://grimwade.biochem.unimelb.edu.au/cone/envenom.html

Alpha-conotoxin
This is a type of conotoxin
  There are also omega-, mu-, delta-, kappa-conotoxins
Conotoxins: made by cone shells
  A category of carnivorous, often beautiful snails
“Typically 12-30 amino acid residues in length”
“…highly constrained peptides due to their high density of disulphide bonds.”
  http://grimwade.biochem.unimelb.edu.au/cone/vencomp.html

View Formats for Proteins II: Space filling
Each atom is shown as a sphere
The size of each sphere varies with the size of its atom
Alpha-conotoxin again next…
  What is the orientation compared to the wire frame picture?

Alpha-conotoxin
[Figure: alpha-conotoxin, space-filling view]

View Formats for Proteins III: Ball and Stick
Each atom is shown with a small sphere
Bonds shown with sticks
Alpha-conotoxin again…
  What is the orientation compared to the previous models?

Alpha-conotoxin
[Figure: alpha-conotoxin, ball-and-stick view]

View Formats for Proteins IV: Cartoon
Shows secondary structures
Alpha-conotoxin again…

Alpha-conotoxin
[Figure: alpha-conotoxin, cartoon view]

View Formats for Proteins: Cartoon
Compare the orientations!
That can also be called a ribbon diagram
Above is another take on cartoon vs. ribbon…
  from http://www.sander.embl-ebi.ac.uk/tops/ExplainDetailed.html

Finding Functional Sites
Proteins have functional sites and “the other stuff”
Functional sites are where the molecule “binds” (interacts with other molecules)

Finding Functional Sites II
Functional sites are where the molecule binds
  Does the rest of the protein matter at all? (why?)
We’d like to find these “active” sites
  Some active sites are harder to know about than others
The largest cavity on a protein surface is often the active site
SURFNET is a program that can locate active sites

Identifying Active Site Function
Suppose SURFNET finds a likely site
Finding an active site is only step 1
Figuring out its function is then step 2

Identifying Active Site Function II
Step 2: Figuring out active site function
Similar sites tend to have similar functions
  Find another protein with known function and a similar active site
  Hypothesize that the active site of interest…
  …has a similar function
Databases and sequence matching are key
Do you think phylograms could help? Cladograms? Dendrograms?

Structural Alignment (I5)
Sequence alignment of DNA often works great
  It tells us about relatedness of organisms, proteins, genomes…
When organisms are very divergent:
  homologous sequences align little or no better than random sequences
What is a way to circumvent that?

Structural Alignment II
Superpose (superimpose) two molecules…
  “…so that peptide backbones of structurally equivalent residues lie close together in space.” p. 144, Westhead et al.
  This might be ambiguous. Does it refer to
    1. backbones made of residues
    …or…
    2. the peptide backbone parts of residues
Then, sequence align the structurally aligned segments
This often works because structure tends to be conserved more than sequence over time
Source: glinka.bio.neu.edu/SEDB/Examples/Structure_search_FRIEND.JPG

Structural Alignment: Example 2
[Figure: structural alignment example]

Structural Alignment: Example 3
Source: a, Structure-based alignment of the hOGG1 sequence with those of E. coli AlkA (refs 27, 30), E. coli endonuclease III (refs 26, 28) and E. coli MutY (ref. 29). Secondary structure assignments are listed above the primary sequence with α-helices highlighted by cylinders and β-sheets highlighted as arrows. The highly conserved HhH–GPD motif is shown in orange. Residues in hOGG1 are highlighted as follows: the catalytic Lys 249 and Asp 268 are boxed; residues that interact with the oxoG and estranged cytosine are red and blue, respectively; residues making DNA backbone contacts are green. b, The conserved HhH–GPD motif (orange) in structurally characterized members of the HhH–GPD superfamily.
Source: Steven D. Bruner, Derek P. G. Norman and Gregory L. Verdine, “Structural basis for recognition and repair of the”, Nature 403, 859-866 (24 February 2000)

Another Kind of Structural Alignment
Source: www.herner.hu/daniel/shaolin.html

A Last Example
[Figure: structural alignment example]

Measuring Structural Alignment
One approach is RMSD
  RMSD = Root Mean Square Deviation
Superpose the two proteins in 3-D
Identify aligned residues
Measure the distance between each pair of aligned residues
  (distance between their alpha-carbon atoms)
Average the distances
  (Actually, average the squared distances)
Then take the square root of the result
RMSD = square Root of the Mean of the Squares of the Deviations
Take a minute to think about this now…

Measuring Structural Alignment
RMSD = square Root of the Mean of the Squares of the Deviations
  $$\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i} d_i^{2}}$$
“…a small RMSD computed over a large number of residues (N) is more significant than a small RMSD computed over a small number of residues.”
  Source: Westhead et al., p. 145
Why?

Structural Alignment Redux
Why are both structural alignment and sequence alignment useful for comparing proteins?
Which is not useful for DNA comparisons? Why?
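The RMSD recipe from the Measuring Structural Alignment slides can be sketched in a few lines. This is a minimal illustration that assumes the two structures have already been superposed and that the aligned residue pairs are known; the function name `rmsd` and the toy coordinates are hypothetical, and the superposition step itself is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired alpha-carbon coordinates.

    coords_a[i] and coords_b[i] are (x, y, z) tuples for the i-th pair of
    aligned residues. Assumes the structures are already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    total_squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # d_i squared: squared distance between the i-th aligned pair
        total_squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    # mean of the squared distances, then the square root
    return math.sqrt(total_squared / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates, invented purely for illustration.
    protein_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    protein_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(protein_a, protein_b))
```

In the toy input one pair coincides and the other is 2 units apart, so the mean squared distance is 2 and the RMSD is the square root of 2. Note that squaring makes the one large deviation dominate the result more than a plain average of distances would.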
Protein Structure Classification (I6)
Protein structure tends to be conserved
Therefore, classification can show relatedness between
  very different organisms
  very different homologs
For example, if “classification” produced trees
  Then branching near the root implies ancient splits
  …and branching near the leaves implies recency
  Just like other kinds of evolutionary dendrograms
Of all the things “classification” could mean, this is what it means here!

Protein Structure Classification: CATH and SCOP
CATH and SCOP are systems of classification for proteins
They use structure
They produce trees (dendrograms)
They work differently, but produce similar results (why?)

CATH
Main classification levels
  Class 1: Mainly-Alpha
  Class 2: Mainly-Beta
  Class 3: Mixed Alpha-Beta
  Class 4: Few Secondary Structures
(slightly modified from http://www.cathdb.info/cgi-bin/cath/GotoCath.pl?link=)

CATH II
CATH has that and 3 other levels of classification (totalling 4):
  Class
  Architecture
  Topology
  Homologous superfamily
Each level is a level of a tree
  Root is at top (the 0th level)
  Class is the 1st level, etc. (let’s draw)
Can you guess why the system is called CATH?

Small Part of the CATH Tree
Source: the CATH Website, www.cathdb.info

CATH III
Classes are based on secondary structure
  Mainly-alpha
  Mainly-beta
  Alpha-beta (mixed)
  Low secondary structure content
What do alpha & beta refer to?
What is homology?
Homologous proteins
  Are evolutionarily related
  Typically share sequence similarities
  Typically share structural similarities
  Might share no significant similarities

CATH IV
Recall the four top levels of the CATH tree
  Class, Architecture, Topology, Homologous superfamily
A node at which level represents a category of proteins thought to be evolutionarily related?

SCOP
Has a tree, like CATH
Tree is significantly different, however…yet…
Proteins classified as homologs by one tend to be classified as homologs by the other (why?)
http://scop.mrc-lmb.cam.ac.uk/scop/
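The four-level CATH tree described in the slides can be sketched as a small lookup. This sketch assumes a node is written as four dot-separated numbers (one each for Class, Architecture, Topology, Homologous superfamily); that notation, the function names, and the example identifiers are assumptions introduced for illustration and do not come from the slides. The class names are the ones listed on the CATH slide.

```python
# Level names, from the root downward (the root itself is level 0).
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

# Class numbers and names as listed on the CATH slide.
CLASS_NAMES = {
    1: "Mainly-Alpha",
    2: "Mainly-Beta",
    3: "Mixed Alpha-Beta",
    4: "Few Secondary Structures",
}


def parse_cath_id(cath_id):
    """Split an assumed 'C.A.T.H' identifier into four integers."""
    parts = tuple(int(p) for p in cath_id.split("."))
    if len(parts) != len(CATH_LEVELS):
        raise ValueError("expected four dot-separated numbers")
    return parts


def class_name(cath_id):
    """Name of the Class (1st level) that the identifier belongs to."""
    return CLASS_NAMES[parse_cath_id(cath_id)[0]]


def deepest_shared_level(id_a, id_b):
    """Deepest tree level at which two identifiers still share a node."""
    depth = 0
    for a, b in zip(parse_cath_id(id_a), parse_cath_id(id_b)):
        if a != b:
            break
        depth += 1
    return CATH_LEVELS[depth - 1] if depth else "Root"


if __name__ == "__main__":
    # Illustrative identifiers only.
    print(class_name("3.40.50.300"))
    print(deepest_shared_level("3.40.50.300", "3.40.50.720"))
```

Two proteins that share a node only near the root split apart early in the tree, while two that share a node down at the Homologous superfamily level split only near the leaves.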