From Sequences to Structure
Illustrations from: C. Branden and J. Tooze, Introduction to Protein Structure, 2nd ed., Garland Pub., ISBN 0815302703

Protein Functions
•Mechanoenzymes: myosin, actin
•Rhodopsin: allows vision
•Globins: transport oxygen
•Antibodies: immune system
•Enzymes: pepsin, renin, carboxypeptidase A
•Receptors: transmembrane signaling
•Vitellogenin: molecular velcro
–And hundreds of thousands more…

Proteins are Chains of Amino Acids
•Polymer: a molecule composed of repeating units

The Peptide Bond
•Dehydration synthesis
•Repeating backbone: N–Cα–C(=O)–N–Cα–C(=O)
–Convention: start at the amino terminus and proceed to the carboxy terminus

Peptidyl polymers
•A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids.
•Since part of the amino acid is lost during dehydration synthesis, we call the units of a protein amino acid residues.
[Figure labels: carbonyl carbon, amide nitrogen]

Side Chain Properties
•Recall that the electronegativity of carbon is at about the middle of the scale for light elements
–Carbon does not make hydrogen bonds with water easily: hydrophobic
–O and N are generally more likely than C to H-bond to water: hydrophilic
•We group the amino acids into three general groups:
–Hydrophobic
–Charged (positive/basic & negative/acidic)
–Polar

The Hydrophobic Amino Acids
•Proline severely limits allowable conformations!

The Charged Amino Acids

The Polar Amino Acids

More Polar Amino Acids
And then there’s…

Planarity of the Peptide Bond

Phi and psi
•φ = ψ = 180° is the extended conformation
•φ: Cα to N–H
•ψ: C=O to Cα

The Ramachandran Plot
[Figure panels: Observed (non-glycine), Calculated, Observed (glycine)]
•G. N. Ramachandran: first calculations of sterically allowed regions of phi and psi
•Note the structural importance of glycine

Primary and Secondary Structure
•Primary structure = the linear sequence of amino acids comprising a protein: AGVGTVPMTAYGNDIQYYGQVT…
•Secondary structure
–Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the α-helix and the β-sheet
–The location and direction of these periodic, repeating structures is known as the secondary structure of the protein

The alpha Helix
[Figure: φ ≈ ψ ≈ −60°]

Properties of the alpha helix
•φ ≈ ψ ≈ −60°
•Hydrogen bonds between C=O of residue n and N–H of residue n+4
•3.6 residues/turn
•1.5 Å/residue rise
•100°/residue turn

Properties of α-helices
•4 to 40+ residues in length
•Often amphipathic or “dual-natured”
–Half hydrophobic and half hydrophilic
–Mostly when surface-exposed
•If we examine many α-helices, we find trends…
–Helix formers: Ala, Glu, Leu, Met
–Helix breakers: Pro, Gly, Tyr, Ser

The beta Strand (and Sheet)
[Figure: φ ≈ −135°, ψ ≈ +135°]

Properties of beta sheets
•Formed of stretches of 5-10 residues in extended conformation
•Pleated: each Cα sits a bit above or below the previous one
•Parallel/antiparallel, contiguous/non-contiguous

OCCBIO 2006 – Fundamental Bioinformatics

Parallel and anti-parallel β-sheets
•Anti-parallel is slightly energetically favored
[Figure panels: Anti-parallel, Parallel]

Turns and Loops
•Secondary structure elements are connected by regions of turns and loops
•Turns: short regions of non-α, non-β conformation
•Loops: larger stretches with no secondary structure. Often disordered.
•“Random coil”
•Sequences vary much more than secondary structure regions

Levels of Protein Structure
•Secondary structure elements combine to form tertiary structure
•Quaternary structure occurs in multienzyme complexes
–Many proteins are active only as homodimers, homotetramers, etc.
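
The α-helix figures quoted earlier (3.6 residues/turn, 1.5 Å rise per residue, 100° turn per residue) are enough for a little arithmetic. The sketch below is only an illustration under those idealized numbers; the function names are made up for this example. It computes the helix pitch and shows why residues i, i+3, i+4 and i+7 end up on the same face of the helix, which is the geometric basis for amphipathic helices.

```python
# Idealized alpha-helix parameters taken from the slide text.
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5   # Angstroms along the helix axis
TURN_PER_RESIDUE = 100   # degrees of rotation about the axis


def helix_pitch():
    """Axial distance covered by one full turn (Angstroms)."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE


def helix_length(n_residues):
    """Approximate axial length of an n-residue helix (Angstroms)."""
    return n_residues * RISE_PER_RESIDUE


def wheel_angle(i):
    """Angular position of residue i on a helical wheel, in degrees."""
    return (i * TURN_PER_RESIDUE) % 360


if __name__ == "__main__":
    print(f"pitch: {helix_pitch():.1f} A per turn")
    print(f"10-residue helix: {helix_length(10):.1f} A long")
    # Residues i, i+3, i+4, i+7 fall within a 100-degree arc of the wheel.
    for i in (0, 3, 4, 7):
        print(f"residue i+{i}: {wheel_angle(i)} degrees")
```

Placing hydrophobic residues at positions such as i, i+3, i+4 and i+7 therefore lines them up along one side of the helix, leaving the other side free for polar residues.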
Disulfide Bonds
•Two cysteines in close proximity can form a covalent bond
•Disulfide bond, disulfide bridge, or dicysteine bond.
•Significantly stabilizes tertiary structure.

Protein Structure Examples

Determining Protein Structure
•There are ~100,000 distinct proteins in the human proteome.
•3D structures have been determined for 14,000 proteins, from all organisms
–Includes duplicates with different ligands bound, etc.
•Coordinates are determined by X-ray crystallography

X-Ray diffraction
•Image is averaged over:
–Space (many copies)
–Time (of the diffraction experiment)

Electron Density Maps
•Resolution is dependent on the quality/regularity of the crystal
•R-factor is a measure of “leftover” electron density
•Solvent fitting
•Refinement

The Protein Data Bank
•http://www.rcsb.org/pdb/

ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228

Views of a Protein
[Figure panels: Wireframe, Ball and stick]

Views of a Protein
[Figure panels: Spacefill, Cartoon]
CPK colors
•Carbon = green, black
•Nitrogen = blue
•Oxygen = red
•Sulfur = yellow
•Hydrogen = white

The Protein Folding Problem
•Central question of molecular biology: “Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?”
•Input: AAVIKYGCAL…
•Output: φ₁ψ₁, φ₂ψ₂, … = backbone conformation (no side chains yet)

Forces Driving Protein Folding
•It is believed that hydrophobic collapse is a key driving force for protein folding
–Hydrophobic core
–Polar surface interacting with solvent
•Minimum volume (no cavities)
•Disulfide bond formation stabilizes
•Hydrogen bonds
•Polar and electrostatic interactions

Folding Help
•Proteins are, in fact, only marginally stable
–Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form
•Many proteins help in folding
–Protein disulfide isomerase: catalyzes shuffling of disulfide bonds
–Chaperones: break up aggregates and (in theory) unfold misfolded proteins

The Hydrophobic Core
•Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.
•The mutation E6V in the β chain places a hydrophobic Val on the surface of hemoglobin
•The resulting “sticky patch” causes hemoglobin S to agglutinate (stick together) and form fibers which deform the red blood cell and do not carry oxygen efficiently
•Sickle cell anemia was the first identified molecular disease

Sickle Cell Anemia
•Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.

Computational Problems in Protein Folding
•Two key questions:
–Evaluation: how can we tell a correctly folded protein from an incorrectly folded protein?
  •H-bonds, electrostatics, hydrophobic effect, etc.
  •Derive a function, see how well it does on “real” proteins
–Optimization: once we get an evaluation function, can we optimize it?
  •Simulated annealing/Monte Carlo
  •EC (evolutionary computing)
  •Heuristics

Fold Optimization
•Simple lattice models (HP models)
–Two types of residues: hydrophobic and polar
–2-D or 3-D lattice
–The only force is hydrophobic collapse
–Score = number of H–H contacts

Scoring Lattice Models
•H/P model scoring: count noncovalent hydrophobic interactions.
•Sometimes: penalize for buried polar or surface hydrophobic residues

What can we do with lattice models?
•For smaller polypeptides, exhaustive search can be used
–Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process
•For larger chains, other optimization and search methods must be used
–Greedy, branch and bound
–Evolutionary computing, simulated annealing
–Graph theoretical methods

Learning from Lattice Models
•The “hydrophobic zipper” effect: Ken Dill ~ 1997

Representing a lattice model
•Absolute directions: UURRDLDRRU
•Relative directions: LFRFRRLLFFL
–Advantage: the immediate reversals (UD or RL) that are possible in the absolute encoding cannot occur
–Only three directions: L, R, F
•What about bumps? LFRRR
–Bad score
–Use a better representation

Preference-order representation
•Each position has two “preferences”
–If it can’t have either of the two, it will take the “least favorite” path if possible
•Example: {LR},{FL},{RL},{FR},{RL},{RL},{FR},{RF}
•Can still cause bumps: {LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}

More Realistic Models
•Higher resolution lattices (45° lattice, etc.)
•Off-lattice models
–Local moves
–Optimization/search methods and φ/ψ representations
  •Greedy search
  •Branch and bound
  •EC, Monte Carlo, simulated annealing, etc.

The Other Half of the Picture
•Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold).
•Theoretical force field: $\Delta G = \Delta G_{\text{van der Waals}} + \Delta G_{\text{H-bonds}} + \Delta G_{\text{solvent}} + \Delta G_{\text{Coulomb}}$
•Empirical force fields
–Start with a database
–Look at neighboring residues: similar to known protein folds?
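
The 2-D HP lattice model described above (relative L/F/R encoding, bumps, score = number of H–H contacts, exhaustive search for short chains) can be sketched in a few lines. This is a minimal illustration, not the course's implementation. It assumes the chain starts at the origin facing "up", that each letter turns the heading and then steps one lattice unit, and that a bump makes a fold invalid rather than merely badly scored. The function names and the test sequence `HPPH` are hypothetical.

```python
from itertools import product


def turn(heading, move):
    """Rotate the heading (dx, dy) for a relative move: L, R, or F."""
    dx, dy = heading
    if move == "F":
        return (dx, dy)
    if move == "L":
        return (-dy, dx)   # counter-clockwise quarter turn
    if move == "R":
        return (dy, -dx)   # clockwise quarter turn
    raise ValueError(f"unknown move: {move!r}")


def decode(moves):
    """Turn relative moves into lattice coordinates; None if the chain bumps."""
    heading, pos = (0, 1), (0, 0)
    coords, seen = [pos], {pos}
    for m in moves:
        heading = turn(heading, m)
        pos = (pos[0] + heading[0], pos[1] + heading[1])
        if pos in seen:        # two residues on one lattice site: a bump
            return None
        seen.add(pos)
        coords.append(pos)
    return coords


def hh_contacts(seq, coords):
    """Count H-H lattice neighbours that are not adjacent in the chain."""
    index = {c: i for i, c in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Looking only right and up counts each neighbouring pair once.
        for nb in ((x + 1, y), (x, y + 1)):
            j = index.get(nb)
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                score += 1
    return score


def best_fold(seq):
    """Exhaustive search over all relative encodings (small chains only)."""
    best = (-1, None)
    for combo in product("LFR", repeat=len(seq) - 1):
        coords = decode(combo)
        if coords is None:
            continue
        score = hh_contacts(seq, coords)
        if score > best[0]:
            best = (score, "".join(combo))
    return best


if __name__ == "__main__":
    print(decode("LFRRR"))       # the bump example from the slide
    print(best_fold("HPPH"))     # hypothetical four-residue chain
```

The search visits 3^(n-1) encodings for an n-residue chain, which is why the slides turn to greedy, branch-and-bound, and stochastic methods for longer chains.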
Threading: Fold recognition
•Given:
–Sequence: IVACIVSTEYDVMKAAR…
–A database of molecular coordinates
•Map the sequence onto each fold
•Evaluate
–Objective 1: improve scoring function
–Objective 2: folding

Secondary Structure Prediction
[Figure: a query sequence, a multiple sequence alignment, and the predicted secondary structure; text as extracted]
Query: AGVGTVPMTAYGNDIQYYGQVT…
Aligned sequences: A-VGIVPM-AYGQDIQYAG-GIIP--AYGNELQ-GQVT… AGVCTVPMTA---ELQYYG-GQVT… T…
Prediction: ----hhhHHHHHHhhh-… eeEE…

Secondary Structure Prediction
•Easier than folding
–Current algorithms can predict secondary structure with 70-80% accuracy
•Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
–Based on frequencies of occurrence of residues in helices and sheets
•Neural network based
–Uses a multiple sequence alignment
–Rost & Sander, Proteins, 1994, 19, 55-72

Chou-Fasman Parameters

Name           Abbrv  P(a)  P(b)  P(turn)  f(i)   f(i+1)  f(i+2)  f(i+3)
Alanine        A      142    83    66      0.060  0.076   0.035   0.058
Arginine       R       98    93    95      0.070  0.106   0.099   0.085
Aspartic Acid  D      101    54   146      0.147  0.110   0.179   0.081
Asparagine     N       67    89   156      0.161  0.083   0.191   0.091
Cysteine       C       70   119   119      0.149  0.050   0.117   0.128
Glutamic Acid  E      151    37    74      0.056  0.060   0.077   0.064
Glutamine      Q      111   110    98      0.074  0.098   0.037   0.098
Glycine        G       57    75   156      0.102  0.085   0.190   0.152
Histidine      H      100    87    95      0.140  0.047   0.093   0.054
Isoleucine     I      108   160    47      0.043  0.034   0.013   0.056
Leucine        L      121   130    59      0.061  0.025   0.036   0.070
Lysine         K      114    74   101      0.055  0.115   0.072   0.095
Methionine     M      145   105    60      0.068  0.082   0.014   0.055
Phenylalanine  F      113   138    60      0.059  0.041   0.065   0.065
Proline        P       57    55   152      0.102  0.301   0.034   0.068
Serine         S       77    75   143      0.120  0.139   0.125   0.106
Threonine      T       83   119    96      0.086  0.108   0.065   0.079
Tryptophan     W      108   137    96      0.077  0.013   0.064   0.167
Tyrosine       Y       69   147   114      0.082  0.065   0.114   0.125
Valine         V      106   170    50      0.062  0.048   0.028   0.053

Chou-Fasman Algorithm
•Identify α-helices
–Find 4 out of 6 contiguous amino acids that have P(a) > 100
–Extend the region until 4 contiguous amino acids with P(a) < 100 are found
–Compute ΣP(a) and ΣP(b) over the region; if the region is >5 residues and ΣP(a) > ΣP(b), identify it as a helix
•Repeat for β-sheets [use P(b)]
•If an α and a β region overlap, the overlapping region is predicted according to ΣP(a) and ΣP(b)

Chou-Fasman, cont’d
•Identify hairpin turns:
–P(t) = f(i) of the residue × f(i+1) of the next residue × f(i+2) of the following residue × f(i+3) of the residue at position (i+3)
–Predict a hairpin turn starting at positions where:
  •P(t) > 0.000075
  •The average P(turn) for the four residues > 100
  •Average P(a) < average P(turn) > average P(b) for the four residues
•Accuracy 60-65%

Chou-Fasman Example
•CAENKLDHVRGPTCILFMTWYNDGP
•CAENKL: potential helix (4 of 6 residues have P(a) > 100; C and N do not)
•Residues with P(a) < 100: R, N, C, G, P, S, T, Y
–Extend: when we reach RGPT (four contiguous residues with P(a) < 100), we must stop
–CAENKLDHV: ΣP(a) = 972, ΣP(b) = 843
–Declare alpha helix
•Identifying a hairpin turn
–VRGP: P(t) = 0.000085
–Average P(turn) = 113.25
  •Avg P(a) = 79.5, Avg P(b) = 98.25

Lots More to Come
•Microarray analysis
•Mass Spectrometry
•Interactions/Knockouts
•Synthetic Lethality
•RPPA
•.....
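
The worked Chou-Fasman example can be reproduced with a short script. This is a simplified sketch rather than a full implementation: it uses the parameter table from the slides, extends a helix nucleus to the right only, and skips β-sheet prediction and overlap resolution. The function and constant names are invented for this illustration.

```python
# residue: (P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3)), from the slide table
CF = {
    "A": (142, 83, 66, 0.060, 0.076, 0.035, 0.058),
    "R": (98, 93, 95, 0.070, 0.106, 0.099, 0.085),
    "D": (101, 54, 146, 0.147, 0.110, 0.179, 0.081),
    "N": (67, 89, 156, 0.161, 0.083, 0.191, 0.091),
    "C": (70, 119, 119, 0.149, 0.050, 0.117, 0.128),
    "E": (151, 37, 74, 0.056, 0.060, 0.077, 0.064),
    "Q": (111, 110, 98, 0.074, 0.098, 0.037, 0.098),
    "G": (57, 75, 156, 0.102, 0.085, 0.190, 0.152),
    "H": (100, 87, 95, 0.140, 0.047, 0.093, 0.054),
    "I": (108, 160, 47, 0.043, 0.034, 0.013, 0.056),
    "L": (121, 130, 59, 0.061, 0.025, 0.036, 0.070),
    "K": (114, 74, 101, 0.055, 0.115, 0.072, 0.095),
    "M": (145, 105, 60, 0.068, 0.082, 0.014, 0.055),
    "F": (113, 138, 60, 0.059, 0.041, 0.065, 0.065),
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),
    "S": (77, 75, 143, 0.120, 0.139, 0.125, 0.106),
    "T": (83, 119, 96, 0.086, 0.108, 0.065, 0.079),
    "W": (108, 137, 96, 0.077, 0.013, 0.064, 0.167),
    "Y": (69, 147, 114, 0.082, 0.065, 0.114, 0.125),
    "V": (106, 170, 50, 0.062, 0.048, 0.028, 0.053),
}
PA, PB, PT = 0, 1, 2  # column indices into the CF tuples


def total(seq, col):
    """Sum one propensity column over a stretch of residues."""
    return sum(CF[r][col] for r in seq)


def is_helix_nucleus(window):
    """At least 4 of the 6 residues have P(a) > 100."""
    return len(window) == 6 and sum(CF[r][PA] > 100 for r in window) >= 4


def extend_right(seq, start):
    """Grow a nucleus rightward until 4 contiguous residues have P(a) < 100."""
    end = start + 6
    while end < len(seq):
        nxt = seq[end:end + 4]
        if len(nxt) == 4 and all(CF[r][PA] < 100 for r in nxt):
            break
        end += 1
    return seq[start:end]


def turn_probability(tetra):
    """P(t) = f(i) * f(i+1) * f(i+2) * f(i+3) over four consecutive residues."""
    p = 1.0
    for offset, residue in enumerate(tetra):
        p *= CF[residue][3 + offset]
    return p


def is_turn(tetra):
    """Apply the three hairpin-turn criteria from the slides."""
    avg_a, avg_b, avg_t = (total(tetra, c) / 4 for c in (PA, PB, PT))
    return turn_probability(tetra) > 0.000075 and avg_t > 100 and avg_a < avg_t > avg_b


if __name__ == "__main__":
    seq = "CAENKLDHVRGPTCILFMTWYNDGP"
    helix = extend_right(seq, 0) if is_helix_nucleus(seq[:6]) else ""
    print(helix, total(helix, PA), total(helix, PB))
    print(f"P(t) for VRGP: {turn_probability('VRGP'):.6f}", is_turn("VRGP"))
```

Histidine has P(a) exactly 100, so it counts neither toward the nucleus (which needs P(a) > 100) nor toward the stop condition (which needs P(a) < 100). That is why the extension runs through DHV and halts only at RGPT.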