Protein Structure Predictions 1

Roadmap
The topics:
basic concepts of molecular biology
more on Perl
overview of the field
biological databases and database searching
sequence alignments
phylogenetics
structure prediction
microarray data analysis
Protein Synthesis
(Figure credit: the National Health Museum)
Proteins
Proteins perform a vast array of biological functions including:
Transport: hemoglobin (carries O2 from the lungs to the tissues)
Mechanical support: collagen
Storage: ferritin (stores iron)
Regulation: repressor proteins (gene expression)
Antibodies: immunoglobulin
Catalysis: SOD (superoxide dismutase)
…
Misfolding is linked to disease:
mad cow disease, Alzheimer's disease, …
Amino acid composition
Basic amino acid structure: the side chain, R, varies for each of the 20 amino acids
[Figure: generic amino acid, H2N–CH(R)–COOH, with the amino group, the side chain R, and the carboxyl group labeled]
The Peptide Bond
Dehydration synthesis
Polypeptide with repeating backbone: N–Cα–C–N–Cα–C
Side chain properties
What gives amino acids their different properties?
Carbon does not make hydrogen bonds with water easily – hydrophobic
O and N are generally more likely than C to hydrogen-bond to water – hydrophilic
The amino acids form three general groups:
Hydrophobic
Polar
Charged (positive/basic & negative/acidic)
The Hydrophobic Amino Acids
Proline severely limits allowable conformations!
The Charged Amino Acids
(Figure: Krane & Raymer)
The Polar Amino Acids
(Figure: Krane & Raymer)
More Polar Amino Acids
Peptidyl polymers

A chain of amino acids joined by peptide bonds is called a polypeptide. A protein is usually composed of 50 to 400+ amino acids.
Primary & Secondary Structure

Primary structure = the linear sequence of amino acids comprising a protein:
AGVGTVPMTAYGNDIQYYGQVT…

Secondary structure
Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the α-helix and the β-sheet
The location and direction of these periodic, repeating structures is known as the secondary structure of the protein
Levels of Protein Structure
Secondary structure elements combine to form tertiary structure
Quaternary structure occurs in multi-enzyme complexes
Many proteins are active only as homodimers, homotetramers, etc.
Dihedral angles
α Helix
Most abundant secondary structure
3.6 amino acids per turn
Hydrogen bond formed between each residue and the residue four positions further along the chain (i to i+4)
Avg length: 10 amino acids, or 3 turns
Varies from 5 to 40 amino acids
α Helix
Normally found on the surface of protein cores
Interact with aqueous environment
Inner-facing side has hydrophobic amino acids
Outer-facing side has hydrophilic amino acids
Every third amino acid tends to be hydrophobic
Pattern can be detected computationally
Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M)
Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
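One way the helix pattern above might be detected computationally is sketched below. It is not the method used in the lecture. The first function computes a mean hydrophobic moment, rotating by 100° per residue (360° / 3.6 residues per turn); the Kyte-Doolittle hydropathy scale is an assumed choice of scale. The second function is a toy composition score built only from the "rich in" and "poor in" lists above. All function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; the slides do not name one here).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment of seq for a given per-residue rotation.

    100 degrees per residue corresponds to 3.6 residues per helical turn.
    A large value means hydrophobic residues cluster on one face of the helix.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


def helix_composition_score(seq):
    """Toy score in [-1, 1]: +1 per A/E/L/M, -1 per P/G/Y/S, averaged over seq."""
    rich, poor = set("AELM"), set("PGYS")
    return (sum(aa in rich for aa in seq) - sum(aa in poor for aa in seq)) / len(seq)


if __name__ == "__main__":
    fragment = "AGVGTVPMTAYGNDIQYYGQVT"  # example sequence from the slides
    print(hydrophobic_moment(fragment), helix_composition_score(fragment))
```

In practice such scores would be computed over a sliding window along the sequence, and a threshold would have to be calibrated against proteins of known structure.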
β Sheet

Hydrogen bonds between 5-10 consecutive amino acids in one portion of the chain with another 5-10 farther down the chain
Interacting regions may be adjacent with a short loop, or far apart with other structures in between
Directions:
  Same: Parallel Sheet
  Opposite: Anti-parallel Sheet
  Mixed: Mixed Sheet
Alpha carbons (and R side groups) alternate above & below the sheet
Prediction difficult, due to wide range of φ and ψ angles
Ramachandran Plot (alpha)
Ramachandran Plot (beta)
Ramachandran Plot
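A Ramachandran plot places each residue at its (φ, ψ) backbone dihedral angles. As a minimal sketch, assuming atom coordinates are available as (x, y, z) tuples, a dihedral angle can be computed from four consecutive atoms as below. φ uses C(i-1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1). The function name and the toy coordinates are illustrative only.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees, in (-180, 180], defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates (not real protein atoms): a planar trans arrangement.
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```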
Helices and Sheets
Loop
Regions between α helices and β sheets
Various lengths and three-dimensional configurations
Located on surface of the structure
Hairpin loops: complete turn in the polypeptide chain (antiparallel β sheets)
More variable sequence structure
Tend to have charged and polar amino acids
Coil
Region of secondary structure that is not a helix, sheet, or loop
Determining Protein Structure
There are on the order of 100,000 distinct proteins in the human proteome.
Two methods for revealing positions of atoms in 3D:
X-Ray Crystallography
  X-ray diffraction pattern + mathematical construction
  Good protein crystal needed, good resolution of diffraction needed
Nuclear Magnetic Resonance
  Small proteins only (< 250 residues)
  Inter-proton distances + geometric constraints
Bovine Ribonuclease
Christian Anfinsen, 1957.
Disulfide Bonds
Two cysteines in close proximity can form a covalent bond
Called a disulfide bond, disulfide bridge, or dicysteine bond.
Significantly stabilizes tertiary structure.
Principles that govern the folding of protein chains - Christian Anfinsen, Science 1973
Ribonuclease
Disulfide Bonds
# of cysteines    # of S-S bonds    # of combinations
 4                2                 3
 6                3                 15
 8                4                 105
10                5                 945
12                6                 10395
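The combination counts in the table are consistent with the number of ways to split 2n cysteines into n unordered pairs, (2n)! / (2^n · n!). The short sketch below reproduces the table under that assumption; the function name is hypothetical.

```python
import math


def disulfide_pairings(n_cysteines):
    """Number of ways to pair up all cysteines into S-S bonds.

    Assumes every cysteine takes part in exactly one disulfide bond,
    so the count is (2n)! / (2**n * n!) for 2n cysteines.
    """
    if n_cysteines < 0 or n_cysteines % 2:
        raise ValueError("need a non-negative, even number of cysteines")
    n = n_cysteines // 2
    return math.factorial(2 * n) // (2 ** n * math.factorial(n))


if __name__ == "__main__":
    for cys in (4, 6, 8, 10, 12):
        print(cys, cys // 2, disulfide_pairings(cys))
```

For the 8 cysteines of ribonuclease, only one of the 105 possible pairings is the native one.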
Levinthal’s paradox
How do proteins find the right conformation out of the practically endless number of potential three-dimensional forms that they could randomly fold into?
Consider a 100 residue protein. If each residue can take only 3 positions, there are 3^100 ≈ 5 × 10^47 possible conformations.
If it takes 10^-13 s to convert from one structure to another, exhaustive search would take ≈ 1.6 × 10^27 years!
Current Opinion in Structural Biology, 2004, 14, 70-75
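The Levinthal estimate can be checked with a few lines of arithmetic. This sketch assumes a year of 365.25 days; the function and parameter names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def levinthal_search(n_residues=100, states_per_residue=3, seconds_per_step=1e-13):
    """Return (number of conformations, years needed to visit each one once)."""
    conformations = states_per_residue ** n_residues
    years = conformations * seconds_per_step / SECONDS_PER_YEAR
    return conformations, years


if __name__ == "__main__":
    conformations, years = levinthal_search()
    print(f"{conformations:.2e} conformations, {years:.2e} years")
```

With the defaults this gives roughly 5 × 10^47 conformations and 1.6 × 10^27 years, matching the figures quoted above.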
What determines fold?
Anfinsen’s experiments in 1957 demonstrated that proteins can fold spontaneously into their native conformations under physiological conditions. This implies that primary structure does indeed determine folding or 3-D structure.
Exceptions exist
  Chaperone proteins assist folding
  Abnormally folded prion proteins can catalyze misfolding of normal prion proteins, which then aggregate
Other factors
Physical properties of a protein that influence its stability and therefore determine its fold:
Rigidity of backbone
Amino acid interaction with water
  Hydropathy index for side chains
Interactions among amino acids
Electrostatic interactions
Hydrogen and disulfide bonds
Volume constraints
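A hydropathy index for side chains can be turned into a per-position profile by averaging over a sliding window. The sketch below assumes the Kyte-Doolittle scale and a window of 7 residues. Both are illustrative choices, since the slides do not say which scale or window is meant.

```python
# Kyte-Doolittle hydropathy index (assumed scale): positive = hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Average hydropathy over each window of consecutive residues.

    Returns one value per window start, so len(seq) - window + 1 values,
    or an empty list if the sequence is shorter than the window.
    """
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    fragment = "AGVGTVPMTAYGNDIQYYGQVT"  # example sequence from the slides
    for start, value in enumerate(hydropathy_profile(fragment)):
        print(start, round(value, 2))
```

Stretches with a high average are candidates for buried or membrane-spanning regions, and low averages suggest surface exposure.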
Understanding protein folding
Structure: Given a sequence, what tertiary structure does it adopt?
  Global optimization, Monte Carlo, Molecular dynamics, Coarse-grained dynamics, etc.
Thermodynamics: How does a mutation change the free energy of the native state relative to the native sequence?
  MC, MD, Free energy methods, etc.
Kinetics: How fast does the protein fold? Does a different sequence fold faster, and why?
  Lattice Monte Carlo, Molecular dynamics, Coarse-grained dynamics
CASP changed the landscape
Critical Assessment of Structure Prediction competition, held in even-numbered years since 1994
  Solved but unpublished structures are posted in May; predictions are due in September
Various categories
  Relation to existing structures, ab initio, homology, fold, etc.
  Partial vs. fully automated approaches
Produces lots of information about which aspects of the problem are hard, and ends arguments about test sets.
Results show steady improvement and the value of integrative approaches.