CS882, Fall 2006
1001 Stories of
Protein Folding
Ming Li
School of Computer Science
University of Waterloo
By the time I finish telling these
protein stories, I hope we know
better how to fold them by
computers.
Prelude: Why should you care?
• Through 3 billion years of evolution, nature has created an
enormous number of protein structures for different biological
functions. Understanding these structures is key to
proteomics. Fast computation of protein structures is one of
the most important unsolved problems in science today. Much
more important than, for example, the P≠NP conjecture.
• We now have a real chance to solve it.
• This course:
  • I do ½ of the course, so that we understand everything about proteins.
  • You do ½ of the course, to present all methods for protein folding. 50% marks.
  • You do a final project designing your method for folding proteins. 50% marks.
Proteins – the life story
• Proteins are building blocks of life. In a cell, 70% is
water and 15%-20% is protein.
• Examples:
  • hormones – regulate metabolism
  • structures – hair, wool, muscle, …
  • antibodies – immune response
  • enzymes – chemical reactions
• Sickle-cell anemia: the hemoglobin protein is made of 4
chains, 2 alphas and 2 betas. A single mutation from
Glu to Val happens at residue 6 of the beta chain.
This is recessive. Homozygotes die, but
heterozygotes have resistance to malaria, hence it
had some evolutionary advantage in Africa.
[Figure: the central dogma. DNA (A, C, G, T) is transcribed into mRNA (A, C, G, U), which is translated into protein (20 amino acids). Reverse transcription turns mRNA back into cDNA.]
Human: 3 billion bases, 30k genes.
E. coli: 5 million bases, 4k genes.
Codon: three nucleotides encode an amino acid.
64 codons, 20 amino acids, some with more than one codon.
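The codon idea above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `transcribe` and `translate` are made up here, the table is the standard genetic code written in the usual compact TCAG order, and the output uses one-letter amino acid symbols (`*` marks a stop codon). The GAG and GTG codons are used as an example of a Glu-to-Val change.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three bases at a time, stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODON_TABLE), "codons")
    print(translate(transcribe("GAG")), "->", translate(transcribe("GTG")))
```

Because 64 codons map onto only 20 amino acids plus stop, several codons share an amino acid, which is why some single-base changes are silent while others, like GAG to GTG, change the residue.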
Proteins are built from 20 amino acids and
fold in space into functional shapes.
Several polypeptide chains can form
more complex structures.
What happened in sickle-cell anemia
Hemoglobin: mutating to valine creates a hydrophobic patch on the surface.
Amino acid stories
• There are about 500 amino acids in nature. Only 20
(22) are used in proteins.
• The first amino acid was discovered in asparagus in 1806,
hence the name asparagine.
All 20 amino acids in proteins had been discovered
by 1935.
• Traces of glycine, alanine, etc. were found in a
meteorite in Australia in 1969. That led to the
conjecture that life is of extraterrestrial
origin.
20 Amino acids – the boring part
• Hydrophobic (neutral, non-polar) amino acids
  • Alanine
  • Valine
  • Phenylalanine
  • Proline
  • Methionine
  • Isoleucine
  • Leucine
• Charged amino acids
  • Aspartic acid
  • Glutamic acid
  • Lysine
  • Arginine
• Polar amino acids
(Polar: one positively and one negatively charged end, e.g. H2O is polar, oil is non-polar.)
  • Serine
  • Threonine
  • Tyrosine
  • Histidine
  • Cysteine
  • Asparagine
  • Glutamine
  • Tryptophan
• Simplest amino acid
  • Glycine
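The grouping above can be written down as a small lookup table. The sketch below is only an illustration: the names `GROUPS`, `GROUP_OF` and `describe_mutation` are invented for this example, and the group labels simply follow this slide's classification (other textbooks draw the boundaries slightly differently).

```python
# Grouping of the 20 amino acids as listed on the slide.
GROUPS = {
    "hydrophobic": ["Alanine", "Valine", "Phenylalanine", "Proline",
                    "Methionine", "Isoleucine", "Leucine"],
    "charged": ["Aspartic acid", "Glutamic acid", "Lysine", "Arginine"],
    "polar": ["Serine", "Threonine", "Tyrosine", "Histidine", "Cysteine",
              "Asparagine", "Glutamine", "Tryptophan"],
    "simplest": ["Glycine"],
}

# Reverse lookup: amino acid name -> group label.
GROUP_OF = {name: group for group, names in GROUPS.items() for name in names}


def describe_mutation(old, new):
    """Return the (old group, new group) pair for a substitution."""
    return GROUP_OF[old], GROUP_OF[new]


if __name__ == "__main__":
    # The sickle-cell substitution: a charged residue becomes hydrophobic.
    print(describe_mutation("Glutamic acid", "Valine"))
```

Looking up the sickle-cell substitution this way shows why it matters: a charged residue on the surface is replaced by a hydrophobic one.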
Why do proteins fold? Some philosophy
• Taken alone, the folded structure of a protein is thermodynamically less favorable
because it reduces the disorder (entropy) of the protein chain. So, why do proteins
fold? One of the most important factors driving the folding of a protein is the
interaction of polar and nonpolar side chains with the environment. Nonpolar
(water-hating) side chains tend to push themselves to the inside of a protein,
while polar (water-loving) side chains tend to place themselves on the outside of
the molecule. In addition, other noncovalent interactions, including electrostatic
and van der Waals interactions, make the protein, once folded, slightly more stable
than the unfolded chain.
• When oil, a nonpolar, hydrophobic molecule, is placed into water, the two push
each other away.
• Since proteins have nonpolar side chains, their behavior in a watery environment
is similar to that of oil in water. The nonpolar side chains are pushed to the
interior of the protein, allowing them to avoid water molecules and giving the
protein a globular shape. There is, however, a substantial difference in how the
polar side chains react to the water. The polar side chains place themselves on
the outside of the protein molecule, which allows them to interact with water
molecules by forming hydrogen bonds. Folding increases the entropy of the
surrounding water by burying the nonpolar side chains inside, which in turn
compensates for the decrease in entropy as hydrogen bonds form between the polar
side chains and water molecules.
1 letter label & how to remember them
• If only one amino acid begins with a letter, that letter is used:
  • C = Cys = Cysteine
  • H = His = Histidine
  • I = Ile = Isoleucine
  • M = Met = Methionine
  • S = Ser = Serine
  • V = Val = Valine
• Otherwise the letter is assigned to the more frequent one:
  • A = Ala = Alanine
  • G = Gly = Glycine
  • L = Leu = Leucine
  • P = Pro = Proline
  • T = Thr = Threonine
• The losers try phonetically:
  • F = Phe = Phenylalanine
  • R = Arg = Arginine
  • Y = Tyr = Tyrosine
  • W = Trp = Tryptophan (double ring)
• When everything fails:
  • D = Asp = Aspartic acid
  • N = Asn = Asparagine
  • E = Glu = Glutamic acid
  • Q = Gln = Glutamine
  • K = Lys = Lysine
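In practice nobody memorizes the table; it is a dictionary lookup. The following is a minimal sketch using the 20 standard one-letter and three-letter symbols listed above. The helper names `to_one_letter` and `to_three_letter` are made up for this example.

```python
# Three-letter -> one-letter symbols for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """['Met', 'Glu', 'Val'] -> 'MEV' (input case is ignored)."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


def to_three_letter(sequence):
    """'MEV' -> 'Met-Glu-Val' (input case is ignored)."""
    return "-".join(ONE_TO_THREE[c] for c in sequence.upper())


if __name__ == "__main__":
    print(to_one_letter(["Met", "Glu", "Val"]))
    print(to_three_letter("MEV"))
```

The reverse dictionary only works because the one-letter symbols are unique, which is exactly what the assignment rules above are designed to guarantee.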
They really look all the same:
One amino acid. The difference is only in the side chain R.
Many amino acids connected into a polypeptide chain, losing H2O at each link.
The amino acids are connected to form polypeptide
chains, going from the N terminal to the C terminal.
Water (H2O) is lost when forming the peptide bond.
The peptide bond is planar and rigid, with known bond distances
and angles.
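The bookkeeping of the water loss can be checked with molecular formulas: a chain of n amino acids forms n - 1 peptide bonds and therefore loses n - 1 water molecules. The sketch below is an illustration only. It covers just glycine (C2H5NO2) and alanine (C3H7NO2), and the names `FORMULAS` and `peptide_formula` are invented for this example.

```python
from collections import Counter

WATER = Counter({"H": 2, "O": 1})

# Molecular formulas of two free amino acids (a deliberately tiny table).
FORMULAS = {
    "Gly": Counter({"C": 2, "H": 5, "N": 1, "O": 2}),
    "Ala": Counter({"C": 3, "H": 7, "N": 1, "O": 2}),
}


def peptide_formula(residues):
    """Atom counts of a linear peptide built from the given amino acids."""
    total = Counter()
    for name in residues:
        total.update(FORMULAS[name])
    # One water is lost per peptide bond; n residues form n - 1 bonds.
    for _ in range(len(residues) - 1):
        total.subtract(WATER)
    return dict(total)


if __name__ == "__main__":
    print(peptide_formula(["Gly", "Gly"]))  # glycylglycine
```

For two glycines this gives C4H8N2O3, the formula of the dipeptide glycylglycine.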
They could have been different
• L-form vs D-form: looking down the H-Cα bond from H, the
L-form reads CORN (CO, R, N). The D-form reads NRCO.
• All amino acids that occur in proteins have the L-form.
• It is unclear why the D-form was not chosen.
Mirror image: in functioning proteins, only the L-form occurs.
In nature, L- and D-forms occur with equal chance.
Story of cysteines
• Two cysteine residues in different (non-adjacent)
parts of a protein sequence can be
oxidized to form a disulfide bridge, as the end
product of air oxidation:
2 cysteines + ½ O2 = 2 linked cysteines + H2O
• Disulfide bridges have two functions:
  • Stabilizing a single protein fold
  • Linking two chains (e.g. linking the A and B chains in insulin)
Disulfide bond between two cysteines:
Cysteine:
SH
|
CH2
|
Note: We will not study
amino acids one by one,
but we will study
their structures when we
meet them. The red bond
connects to Cα.
The Φ and Ψ angles
• The rotation angle about the N-Cα bond is the Φ angle.
• The rotation angle about the Cα-C’ bond is the Ψ angle.
• No side chain is involved (the side chain is attached at Cα).
• These angles determine the backbone structure.
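Both angles are ordinary dihedral (torsion) angles over four consecutive backbone atoms: Φ uses C’(i-1), N, Cα, C’, and Ψ uses N, Cα, C’, N(i+1). Here is a minimal pure-Python sketch of that calculation. The function names are made up for this example, the coordinates in the usage lines are toy points rather than real protein atoms, and the sign is meant to follow the usual convention (positive when the far bond is rotated clockwise, looking along the central bond).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Phi: rotation about N-CA, defined by C'(i-1), N, CA, C'."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Psi: rotation about CA-C', defined by N, CA, C', N(i+1)."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Toy points only: the outer atoms are a quarter turn apart.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `phi` and `psi` to every residue of a structure gives the list of angle pairs that, together with the fixed peptide-bond geometry, describes the backbone.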
The Ramachandran plot
Red: good
Yellow: OK
White: forbidden (except for glycine)
L-amino acids cannot form a
long left-handed helix, but
Gly (also Asn, Asp) can form a
short left-handed helix, with the
side chain forming a hydrogen
bond with the main chain.
The story of Glycine
• Glycine has no side chain (just H), so it can
adopt phi and psi angles in all 4 quadrants of
the Ramachandran plot.
• Thus, it frequently occurs in turn regions of
proteins where any other residue would be
sterically hindered.
Glycine:
H
|
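Staggered carbon atoms for side chains
Ethane (CH3CH3): the staggered conformation is most favorable, and so are its + 120° rotations.
The aligned (eclipsed) conformation is too crowded.
Valine: conformation (b) is more favorable, being the least crowded.
[Figure labels: Cβ, Cα]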
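The "staggered plus 120° rotations" picture can be reproduced with a simple threefold cosine model of the rotation energy about the C-C bond of ethane. This is a sketch under stated assumptions, not something taken from the slides: the functional form is a common textbook approximation, the barrier of about 2.9 kcal/mol is an approximate literature value, and the function names are invented for this example.

```python
import math

# Assumed, approximate rotational barrier of ethane in kcal/mol.
BARRIER = 2.9


def torsion_energy(angle_deg, barrier=BARRIER):
    """Threefold cosine model: maximal when eclipsed (0 degrees), zero when staggered."""
    return 0.5 * barrier * (1.0 + math.cos(3.0 * math.radians(angle_deg)))


def staggered_angles(step=1):
    """Scan one full turn and return the angles with the lowest energy."""
    energies = {a: torsion_energy(a) for a in range(0, 360, step)}
    lowest = min(energies.values())
    return [a for a, e in energies.items() if e - lowest < 1e-9]


if __name__ == "__main__":
    print(staggered_angles())
```

The three minima are 120° apart, which is why side-chain torsion angles in proteins cluster around a small number of preferred values.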