Protein Structure

Amino Acid Structure
• You are going to have to learn the names and structures of the 20 amino acids, along with their 3 letter and 1 letter codes.
• One of the best ways to learn them is to draw the structures out yourself, not just look at pictures of them. I am going to spend a little bit of time going through them systematically to help you out.
• Most of the one letter codes are simply the first letter of the amino acid, or an obvious extension of it: R for arginine, Y for tyrosine, F for phenylalanine. A few are arbitrary and you just have to memorize them.
• The basic structure of an amino acid: the central carbon atom, the alpha carbon (Cα), is connected to an amino group (–NH2) on one side and a carboxylic acid group (–COOH) on the other side.
• The alpha carbon is also connected to the R group, which is different for each of the amino acids. The fourth bond on the alpha carbon is to a hydrogen atom.

Glycine and Alanine, the Simplest Amino Acids
• The R group on glycine (Gly, G) is just a hydrogen. Thus, the alpha carbon on glycine has 2 hydrogens attached to it, and for that reason it does not have an L form and a D form. It is nonpolar, but only weakly hydrophobic. It can be found in either hydrophilic or hydrophobic environments.
• Alanine (Ala, A) has a methyl group as its R group. This makes it weakly hydrophobic, but like glycine, it can be found anywhere in a protein.
[Figures: glycine, alanine]

Adding an –OH: serine and threonine
• Serine (Ser, S) is alanine with an –OH group attached. This makes serine a polar amino acid: any time you have a C-O or a C-N bond, you get polarity, which implies a hydrophilic character and the likelihood of hydrogen bonds.
• Threonine (Thr, T) adds another methyl group to the R group carbon of serine. Serine and threonine have very similar properties.
[Figures: serine, threonine]

Acidic Amino Acids: aspartic acid and glutamic acid
• Aspartic acid (Asp, D) is just alanine with a carboxylic acid group attached.
At physiological pH this group is in the –COO⁻ form. This makes it hydrophilic and subject to electrostatic interactions with basic amino acids that have a + charge.
• Glutamic acid (Glu, E) simply has one more carbon between the acid group and the alpha carbon than aspartic acid. Its properties are very similar.
[Figures: aspartic acid, glutamic acid]

Amide Derivatives: asparagine and glutamine
• Asparagine (Asn, N) has the carboxylic acid on its R group converted to an amide: –CONH2 instead of –COOH. This makes asparagine polar but not charged.
• Glutamine (Gln, Q) does the same thing with glutamic acid: an amide group instead of an acid group.
[Figures: asparagine, glutamine]

Basic Amino Acids: Varied Structures
• Lysine (Lys, K) is the simplest of the basic amino acids. Its R group is a 4 carbon chain ending in an amino group. At physiological pH, the amino group is in the –NH3+ form, making lysine a hydrophilic charged amino acid that forms ionic bonds with acidic amino acids.
• Arginine (Arg, R) has a 3 carbon chain ending in a more complicated structure: a central carbon connected to 3 nitrogens. This structure carries a + charge at physiological pH.
• Histidine (His, H) is like alanine connected to a 5 member ring containing 3 carbons and 2 nitrogens. Histidine can carry a + charge, but its side-chain pKa is 6.10, close to physiological pH. This means that only some histidines are charged under physiological conditions, and small changes in pH will change the amount of charge on the histidine.
[Figures: lysine, arginine, histidine]

Sulfur-containing: cysteine and methionine
• Cysteine (Cys, C) is identical to serine except that the –OH group has been replaced by an –SH group. That is, it's like alanine with an –SH attached. Cysteine often forms disulfide bridges with other cysteines, which help stabilize the three dimensional structure of proteins.
• Methionine (Met, M) has a linear R group that is 2 carbons, a sulfur and a carbon. This makes methionine fairly hydrophobic.
Methionine is the first amino acid in every protein as it is being synthesized, although it is often removed after synthesis is complete. Methionines are also found in the middle of protein sequences.

Aliphatic and Hydrophobic: Leucine, Isoleucine, Valine
• These three amino acids are very similar. They all contain hydrocarbon chains, which makes them hydrophobic and usually found in the interior of proteins. Aliphatic means there are no benzene-type rings (which are called aromatic).
• Valine (Val, V) has an R group that is a V of 3 carbons.
• Leucine (Leu, L) is like valine with an extra carbon between the 3 carbon V and the alpha carbon.
• Isoleucine (Ile, I) has the same number of carbons in its R group as leucine, but arranged slightly differently: it's like valine with an extra carbon on one of the arms.
[Figures: valine, leucine, isoleucine]

Aromatic: Phenylalanine, Tyrosine, Tryptophan
• Phenylalanine (Phe, F) is like alanine with a phenyl group (a benzene ring) attached. It is a very hydrophobic amino acid, usually buried in the interior of proteins or membranes.
• Tyrosine (Tyr, Y) is phenylalanine with an –OH group attached to the ring. It is also hydrophobic, but not as much as phenylalanine. The –OH group can form hydrogen bonds, so in some proteins, tyrosine is found exposed to water.
• Tryptophan (Trp, W) is the largest and least common amino acid. It has an R group with 2 rings. The 5 member ring contains a nitrogen. This structure is an indole ring, with indole being the compound that gives feces its characteristic odor. Because of the nitrogen, tryptophan can form hydrogen bonds even though it is quite hydrophobic.
[Figures: phenylalanine, tyrosine, tryptophan]

Imino Acid: Proline
• Proline (Pro, P) is unique in that its R group is attached to the amino nitrogen as well as to the alpha carbon. For a chemist, this makes proline an imino acid, not an amino acid. This bond means that proline necessarily introduces a kink in the polypeptide backbone.
The lack of an H attached to the N means that the proline nitrogen can't donate a hydrogen bond once it is in the backbone. All other amino acids can at least form hydrogen bonds with the N-H in the backbone. Proline is hydrophobic.

Peptide Linkage
• The peptide bond connects the amino group of one amino acid to the acid group of the next amino acid. Organic chemists would call this an amide bond.
– The peptide bond region is almost always planar, with the C=O sticking out one side and the H on the nitrogen sticking out the other side. These groups are both polar and easily form hydrogen bonds.
• The other two bonds in the polypeptide backbone are called psi (ψ), between the acid carbon and the alpha carbon, and phi (φ), between the amino nitrogen and the alpha carbon. These bonds can rotate, but the rotation is constrained by steric hindrance between the R groups (i.e. they bump into each other); the book refers to this as Van der Waals forces.
• Formation of hydrogen bonds and ionic bonds also influences the phi and psi bond angles. So do hydrophobic interactions: the need for some amino acids to get away from water and others to be in contact with water.

Levels of Protein Structure
• Primary (1°) structure: the amino acid sequence.
• Secondary (2°) structure: local structures, mostly the alpha helix and the beta sheet.
• Tertiary (3°) structure: the overall folding pattern of the whole polypeptide.
• Quaternary (4°) structure: how different polypeptides join together to form a protein with multiple subunits.
• Between secondary and tertiary is a very important level: the domain. A domain is a region of a polypeptide that can fold into a compact functional structure independent of the rest of the protein. Most proteins are composed of several domains. Domains are usually 100-200 amino acids long. Domains are more conserved in evolution than whole proteins are.
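Primary structure is simply a string of residues, so the 3 letter and 1 letter codes introduced at the start of these notes can be kept in a small lookup table. The following is a minimal Python sketch for practicing the codes. The names `THREE_TO_ONE` and `to_one_letter`, and the hyphen-separated input format, are assumptions made for illustration.

```python
# Lookup table of the 20 amino acids: 3 letter code -> 1 letter code.
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Ser": "S", "Thr": "T",
    "Asp": "D", "Glu": "E", "Asn": "N", "Gln": "Q",
    "Lys": "K", "Arg": "R", "His": "H",
    "Cys": "C", "Met": "M",
    "Val": "V", "Leu": "L", "Ile": "I",
    "Phe": "F", "Tyr": "Y", "Trp": "W",
    "Pro": "P",
}


def to_one_letter(three_letter_sequence: str) -> str:
    """Convert a sequence such as 'Met-Gly-Ser' to one letter codes.

    Assumes residues are separated by hyphens or spaces; an unknown
    code raises KeyError.
    """
    residues = three_letter_sequence.replace("-", " ").split()
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


if __name__ == "__main__":
    print(to_one_letter("Met-Gly-Ser-Trp"))
```

Writing a primary structure both ways, and checking it against the table, is a quick complement to drawing the structures by hand.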
[Figure: Pyruvate kinase, an enzyme with 3 domains]

Secondary Structure
• There are just 2 common secondary structures in proteins: the alpha helix and the beta sheet. Both are held together by hydrogen bonds between the C=O in one peptide bond and the N-H in another peptide bond.
• The alpha helix is a rigid cylinder formed when a single chain is twisted so the C=O of one amino acid is hydrogen-bonded to the N-H of the fourth amino acid down the backbone.
– This gives one turn every 3.6 amino acids.
• In an alpha helix, the protein backbone is in the center, with the R groups jutting out.
• The transmembrane regions of proteins often consist of alpha helices with hydrophobic R groups. The hydrophilic backbone is shielded from the membrane interior by the R groups.

More Alpha Helix
• Many transmembrane proteins use several alpha helices wrapped up together.
– Ion channels and other transporter molecules
• Sometimes 2 or 3 alpha helices will wrap around each other: this is called a coiled coil.
– In an amphipathic alpha helix, the hydrophobic R groups on one side of each helix interact with each other, while hydrophilic R groups on the other side of each helix can interact with water.
– Structural proteins like keratin (in skin) and myosin (muscle) are long coiled-coil rods.

More Secondary Structure: Beta Sheet
• A beta strand is a region of a polypeptide in which the V-shaped N-C-C backbone of the amino acids alternates up and down. The R groups also alternate up and down.
• Two or more beta strands next to each other form a beta pleated sheet.
• Beta sheets are held together by hydrogen bonds between the amino acids of the different strands.
• The strands can be parallel or anti-parallel, or even mixed.

More Beta Sheet
• Beta sheets form strong and rigid proteins, such as silk protein.
• Beta sheets are usually not planar: they tend to curl up.
• A beta barrel is a common transmembrane structure that readily forms a pore.
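The alpha helix geometry described above (one turn every 3.6 amino acids) means each residue is rotated 360/3.6 = 100 degrees around the helix axis from the one before it. The Python sketch below uses that number to place residues on a "helical wheel" and to score how strongly the hydrophobic residues cluster on one face, which is the signature of an amphipathic helix. The function names, the simplified hydrophobic set, and the two test sequences are illustrative assumptions, not a standard hydrophobicity scale.

```python
import math

# One turn per 3.6 residues: 360 / 3.6 = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100.0

# Assumption: a deliberately simplified set of strongly hydrophobic
# residues (one letter codes), based on the descriptions in these notes.
HYDROPHOBIC = set("VLIMFW")


def helical_wheel(sequence):
    """Return (residue, angle in degrees) pairs looking down the helix axis."""
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360.0)
            for i, aa in enumerate(sequence)]


def hydrophobic_face_score(sequence):
    """Length of the mean direction vector of the hydrophobic residues.

    Near 1: hydrophobic residues point the same way (amphipathic helix).
    Near 0: hydrophobic residues are spread evenly around the helix.
    """
    angles = [math.radians(angle)
              for aa, angle in helical_wheel(sequence)
              if aa in HYDROPHOBIC]
    if not angles:
        return 0.0
    x = sum(math.cos(a) for a in angles) / len(angles)
    y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(x, y)


if __name__ == "__main__":
    # Hypothetical sequences: leucines spaced i, i+3, i+4, i+7 versus
    # leucines at every second position.
    print(hydrophobic_face_score("LKKLLKKLKKKLKKLLKK"))
    print(hydrophobic_face_score("LKLKLKLKLKLKLKLKLK"))
```

In the first sequence the leucines fall within about 60 degrees of one another and the score is about 0.77. In the second they are spread evenly around the wheel and the score is close to 0, even though both sequences contain plenty of hydrophobic residues.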
Protein Domains
• Protein domains are structures that fold independently into compact units that have a specific function.
– Domains can contain alpha helices, beta sheets, or both, plus less well defined regions.
– Proteins can be composed of just a single domain, or they can contain several different domains.
• Example of a multi-domain protein: steroid hormone receptors are proteins that have two domains:
– a ligand binding domain that binds to the steroid hormone
– a DNA binding domain that binds to DNA and stimulates transcription
– The two domains are connected by a few amino acids called a hinge region.
• Some domains are found in several different proteins: they are shuffled between different proteins during evolution.
– An example: the TIM barrel is found in at least 30 different proteins.
• The three dimensional structure is more evolutionarily conserved than the amino acid sequence.
[Figure: Some TIM Barrel Proteins]

Domain Structure of a Family of Proteins
• Proteins often have several different domains, with similar proteins having slightly different arrangements of the domains.
• Here are the domain structures of several extracellular proteins. The red domain is the "chordin-like cysteine-rich domain".

Protein Folding
• Proteins spontaneously fold into a lowest energy conformation. This is the active conformation, which allows the protein to function properly.
– In the 1950s Christian Anfinsen showed that pancreatic RNase could refold itself into its active configuration after denaturation, without any external guidance. This, and many confirming experiments on other proteins, has led to the general belief that the amino acid sequence of a protein contains all the information needed to fold itself properly, without any additional energy input.
• Natural selection has strongly favored protein sequences that have a single conformation that forms easily and without making mistakes.
The vast majority of randomly chosen protein sequences don't do this: they have multiple conformations that are about equally low energy.
[Figure: Anfinsen experiment. RNase is held together by disulfide bonds as well as the non-covalent bonds of the folding. The protein was denatured by adding urea and reducing the disulfides to –SH groups. The experimental group had the urea removed first, then the disulfides re-formed by oxidation. The control had disulfides re-formed first, while the protein was still in a random conformation.]

Forces That Fold Proteins
• Proteins are thought to fold into the lowest free energy conformation.
• The fastest acting forces are thought to be hydrophilic and hydrophobic interactions: the need for amino acids with hydrophobic R groups to aggregate together away from water, and the need for the hydrophilic amino acids to fit into the structure of water molecules.
• Formation of hydrogen bonds and ionic bonds (electrostatic interactions) happens after the initial hydrophobic interactions.
• Van der Waals forces: the slight attraction between all atoms coupled with repulsion if they get too close. This makes the atoms of a protein pack together (as seen in space-filling models).

Protein Folding Energetics
• An unfolded protein has high entropy: there are many different conformations it can be in. This is symbolized by the width of the funnel.
• Unfolded proteins also have a high free energy, meaning that the protein chain moves easily between different conformations.
• As protein folding proceeds, both the free energy and the entropy decrease to a minimum. The protein assumes a single conformation (meaning entropy is reduced to a minimum) that is the lowest free energy state.
• There are various intermediate folding states, some of which can trap the protein into a misfolded and inactive conformation.
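The contrast drawn above, between evolved sequences with one clearly lowest free energy conformation and random sequences with several conformations of about equal energy, can be made concrete with the Boltzmann distribution, which gives the fraction of molecules in each state at equilibrium. The Python sketch below is only an illustration: the function name and the free energy values are invented for the example and are not measurements of any real protein.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 0.001987


def boltzmann_populations(free_energies_kcal, temperature_k=310.0):
    """Equilibrium fraction of molecules in each conformation.

    free_energies_kcal: free energy of each conformation in kcal/mol.
    temperature_k: temperature in kelvin (310 K is about body temperature).
    """
    rt = R_KCAL * temperature_k
    lowest = min(free_energies_kcal)
    # Energies are taken relative to the lowest state; only differences matter.
    weights = [math.exp(-(g - lowest) / rt) for g in free_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical "evolved" protein: one state 5 kcal/mol below the rest.
    evolved = [0.0, 5.0, 5.0, 5.0]
    # Hypothetical "random" sequence: several states of nearly equal energy.
    random_like = [0.0, 0.2, 0.3, 0.1]
    print(boltzmann_populations(evolved))
    print(boltzmann_populations(random_like))
```

With these made-up numbers, more than 99% of the "evolved" molecules sit in the single lowest state, while no state of the "random" sequence holds even half of the molecules.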
Bioinformatics: The Protein Folding Problem
• The protein folding problem: predict the three dimensional structure of a protein from its amino acid sequence.
• If proteins fold into their lowest energy configuration, based entirely on their amino acid sequence, you would think that we could figure out the rules and be able to predict three dimensional structure just from the sequence.
– It's easy to get the amino acid sequence: just translate the DNA sequence of the genes.
• However, this hasn't proved true: we can make useful guesses about structure, but they are still very inaccurate.
[Figure: A few small proteins whose structure has been predicted from the amino acid sequence (blue), then compared to the actual structure (red).]

Chaperone Proteins
• Cells contain machinery to unfold and refold proteins that are mis-folded.
– Some cellular proteins require specific chaperones to fold properly: they can't fold into their active configuration by themselves.
• Best studied is Hsp100, also called Clp. It uses ATP energy to unfold a protein (and also break any disulfide bridges). Then, the protein chain gets fed through a very tiny opening, allowing it to refold on the other side.

Protein Aggregates
• Under various stress conditions (like heat or high pH), proteins unfold from their correct, functional configuration. Often, the unfolded proteins bind together into insoluble aggregates.
– Think what happens when cooking an egg white: the clear water-soluble liquid before cooking is due to albumin proteins existing as individual globular proteins.
– When you heat the egg whites, you unfold the proteins, and they aggregate together into an insoluble (but more easily digested) white mass, because the hydrophobic amino acids from different polypeptides stick together to get away from the aqueous environment. This is a protein aggregate.

Protein Aggregation in Neurodegenerative Diseases
• A more relevant protein aggregate issue: neural degenerative diseases.
Alzheimer's, prion diseases, Huntington's disease, Parkinson's disease are all caused by the formation of insoluble protein aggregates in the brain.
– These aggregates are mis-folded proteins that form fibrils rich in beta sheet structures. They are called amyloid.
– As the protein folds, it gets caught in a local free energy minimum, and in that state it can combine with other mis-folded proteins.
– Why these aggregates are toxic to neurons is still unclear.

Prions
• A prion is an "infectious protein". Prion proteins are encoded in the genome and expressed at high levels in the nervous system. They presumably have a function in normal cells, as yet unknown.
• Prions are the agents that cause various neural degenerative diseases: mad cow disease (bovine spongiform encephalopathy), chronic wasting disease in deer and elk, scrapie in sheep, and kuru and Creutzfeldt-Jakob syndrome in humans.
• The normal prion protein (PrP) is folded into a specific conformation, a state called PrPC. Prion diseases are caused by the same protein folded abnormally, a state called PrPSc. A PrPSc can bind to a normal PrPC protein and convert it to PrPSc. This conversion spreads throughout the body, causing the disease to occur.
– It is also a form of inheritance that does not involve nucleic acids.
• Several prion-like proteins are known in yeast, a model eukaryote. These proteins have 2 stable conformations, which can be inherited across generations.

Quaternary Structure: Assembly of Subunits
• Most proteins are composed of more than one polypeptide chain (subunit).
• Simplest cases: 2 identical subunits bind together to form a symmetrical structure.
– Sometimes 4 identical subunits, as in neuraminidase
• Protein complexes with many different subunits are common.
• Even larger structures: viruses, ribosomes, etc. can often be disassembled, and then they spontaneously reassemble.
– This implies all the information needed for assembly is present in the proteins themselves.
– Some of these structures also contain RNA or DNA.
[Figure: Superoxide dismutase, a dimer of 2 identical subunits]
[Figure: ATP synthase, multiple copies of at least 5 different subunits]

Quaternary Structure of Fibrous Proteins
• Fibrous proteins provide structure to a cell: actin filaments, microtubules, collagen, keratin, etc.
• These fibers are formed from many identical subunits binding together.

Virus Assembly
• This is also a spontaneous process, but it can be very complicated.