Protein Structure Prediction
Protein Structure
• Amino-acid chains can fold to form 3-dimensional structures
• Proteins are sequences that have a (more or less) stable 3-dimensional configuration
Why Is Structure Important?
The structure a protein takes is crucial for its function
• Forms “pockets” that can recognize an enzyme substrate
• Positions the side chains of specific groups together to form areas with desired chemical/electrical properties
• Creates firm structures such as collagen, keratins, fibroins
Determining Structure
• X-ray and NMR methods make it possible to determine the structure of proteins and protein complexes
• These methods are expensive and difficult
• Processing one protein can take several months of work
• A centralized database (PDB) contains all solved protein structures
• XYZ coordinates of atoms within a specified precision
• ~19,000 solved structures
Growth of the Protein Data Bank
Structure is Sequence Dependent
• Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence
• Force the protein to lose its structure by introducing agents that change the environment
• After the sequences are put back in water, the original conformation/activity is restored
• However, for complex proteins, there are cellular processes that “help” in folding
Amino Acids
What Forces Hold the Structure?
• Structure is supported by several types of chemical bonds/forces
• Hydrogen bonds
• Charge-charge interactions: positively charged groups prefer to be situated against negatively charged groups
• Disulfide bonds: S-S bonds between cysteine residues; these form during folding
• Hydrophobic effect
Levels of structure
Secondary Structure
α-helix
β-strands
Hydrogen Bonds in α-Helices
β-Strands Form Sheets
Parallel
Anti-parallel
These sheets are held together by hydrogen bonds across strands
Angular Coordinates
• Secondary structures force specific angles between residues
Ramachandran Plot
• We can relate angles to types of structures
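The backbone angles plotted in a Ramachandran plot are torsion (dihedral) angles computed from four consecutive backbone atoms. The following is a minimal sketch of that computation using only the standard library. The function name `dihedral` and the toy coordinates are illustrative assumptions, and the sign convention should be checked against whatever reference tool you compare with.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (not from a real protein), just to exercise the function.
# For residue i, phi uses C(i-1), N(i), CA(i), C(i);
# psi uses N(i), CA(i), C(i), N(i+1).
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Applying this to every residue of a solved structure gives the (phi, psi) pairs that form the points of the plot.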
Labeling Secondary Structure
• Using both hydrogen bond patterns and angles, we can assign secondary structure labels from the XYZ coordinates of the amino acids
• These do not lead to an absolute definition of secondary structure
Prediction of Secondary Structure
Input:
• Amino-acid sequence
Output:
• Annotation sequence of three classes:
  • alpha
  • beta
  • other (sometimes called coil/turn)
Measure of success:
• Percentage of residues that were correctly labeled
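The measure of success above (often called Q3 for three classes) is a per-residue accuracy. Here is a small sketch of it, assuming the labels are encoded as the characters `H` (alpha), `E` (beta) and `C` (other). That encoding and the function name are choices made for this example only.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted class matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label sequences must have the same length")
    if not observed:
        raise ValueError("label sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical six-residue example: one residue is mislabeled.
score = q3_accuracy("HHHECC", "HHEECC")
print(f"{score:.1f}% of residues correct")
```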
Protein Folds: sequential, spatial and topological arrangement of secondary structures
The Globin fold
Approaches for structure prediction
Homology modeling
• (25-30% identity as a predictor)
Fold recognition
• Remote homology
Ab initio Prediction
• Heavy computations
Newly Determined Structures: Fraction of New Folds
(PDB new entries in 1998)
Koppensteiner et al., 2000, JMB 296:1139-1152.
A Finite Number of Protein Folds
Aim:
• Recognize a fold that “matches” a given sequence
Approaches:
• PSI-BLAST, profile HMMs, etc.
• Threading
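Threading: Essential components
• structural template
• neighbor definition
• energy function
Example: the sequence ACCECADAAC is placed on a 10-position template (A1, C2, C3, E4, C5, A6, D7, A8, A9, C10). The total energy sums the pair energy over neighboring positions:
$$E = \sum_{\text{positions } i,j} E_{a_i b_j}$$
-3-1-4-4-1-4-3-3 = -23
Pair energies E_ab:
      A    C    D    E   ...
A    -3   -1    0    0   ..
C    -1   -4    1    2   ..
D     0    1    5    6   ..
E     0    2    6    7   ..
...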
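The threading score above is a sum of pair energies over positions that the template defines as neighbors. The sketch below uses the pair energies for A, C, D and E as read from the slide's table. The contact list is hypothetical (the slide's figure defines the real one) and was chosen only so that the total matches the slide's -23.

```python
# Symmetric pair energies E_ab for the four residue types shown on the slide.
PAIR_ENERGY = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}

def pair_energy(a, b):
    """Look up E_ab regardless of the order of the two residue types."""
    key = (a, b) if (a, b) in PAIR_ENERGY else (b, a)
    return PAIR_ENERGY[key]

def threading_energy(sequence, contacts):
    """Sum pair energies over template contacts (1-based positions)."""
    return sum(pair_energy(sequence[i - 1], sequence[j - 1]) for i, j in contacts)

# Hypothetical neighbor definition: pairs of template positions in contact.
HYPOTHETICAL_CONTACTS = [(1, 6), (1, 9), (6, 8), (2, 9),
                         (5, 8), (2, 5), (3, 10), (5, 10)]

total = threading_energy("ACCECADAAC", HYPOTHETICAL_CONTACTS)
print(total)
```

Fold recognition repeats this scoring for each template in a fold library and ranks the templates by score.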
Find best fold for a protein sequence:
Fold recognition (threading)
[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is threaded onto each fold in a library (1) ... 56) ... n)), and each potential fold receives a score (e.g., -10, -123, 20.5)]
GenTHREADER
(Jones, 1999, JMB 287:797-815)
For each template, provide an MSA
• Align the query sequence with the MSA
• Assess the alignment by sequence alignment score
• Assess the alignment by pairwise potentials
• Assess the alignment by solvation function
• Record lengths of: alignment, query, template
Essentials of GenTHREADER
Ab-initio Structure Recognition
Goal:
• Predict structure from “first principles”
Benefits:
• Works for novel folds
• Shows that we understand the process
Approaches to Ab-initio Prediction
Molecular Dynamics
• Simulates the forces that govern the protein within water
• Since proteins fold naturally, this would lead to the solved structure
Problems:
• Thousands of atoms
• Huge number of time steps to reach the folded protein
Intractable problem
Approaches to Ab-initio Prediction
Minimal Energy
• Assumption: the folded form is the minimal energy conformation of the protein
Decomposition:
• Define an energy function
• Search for the 3-D conformation that minimizes energy
Energy Function
• Account for the forces that apply on the molecule
  • Van der Waals forces
  • Covalent bonds
  • Hydrogen bonds
  • Charges
  • Hydrophobic effects
Issues:
• Estimating parameters
• How do we compute it? The cost is O( (# atoms)^2 )
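To make the quadratic cost concrete, here is a sketch of an all-pairs energy with only two of the terms listed above: a Lennard-Jones term standing in for Van der Waals forces and a Coulomb term for charges. The parameter values (`epsilon`, `sigma`, `coulomb_k`) are placeholders in arbitrary units, not fitted force-field parameters. Covalent, hydrogen-bond and hydrophobic terms are left out.

```python
import math
from itertools import combinations

def pairwise_energy(coords, charges, epsilon=1.0, sigma=1.0, coulomb_k=1.0):
    """Sum a Lennard-Jones and a Coulomb term over every pair of atoms."""
    total = 0.0
    # combinations() visits n * (n - 1) / 2 pairs, hence the O(n^2) cost.
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        lennard_jones = 4.0 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
        coulomb = coulomb_k * charges[i] * charges[j] / r
        total += lennard_jones + coulomb
    return total

# Toy system of three atoms (coordinates and charges are made up).
atoms = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.5, 0.0)]
charges = [0.5, -0.5, 0.0]
print(pairwise_energy(atoms, charges))
```

Practical codes usually reduce this cost with distance cutoffs or neighbor lists, which is one motivation for the simplified energy functions discussed next.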
Simplified Energy Functions
Different levels of granularity
• Residue-residue energy function (bead model)
• Partial model
  • Backbone as a bead
  • Side chain as a rigid body that can move with respect to the backbone
• Many other variants
Search Strategy
• High-dimensional search problem
How do we represent partial solutions?
• Position of each atom (too detailed!)
• Position of each residue (too coarse!)
• Intermediate solutions (e.g., backbone and side chain)
Search Strategy
Representation tradeoffs
• X,Y,Z coordinates
  • Easy to compute distances between residues
  • Might represent infeasible solutions
• Angles between successive residues
  • Easy to ensure a “legal” protein
  • Harder to compute distances
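The tradeoff can be seen in a toy 2D chain (a deliberate simplification; real backbones are 3D and use torsion angles). With an angle representation every bond has the right length by construction, but a distance between two residues is only available after the coordinates are rebuilt. The function name and the unit bond length are assumptions of this sketch.

```python
import math

def chain_from_turn_angles(turn_angles_deg, bond_length=1.0):
    """Build 2D bead coordinates from turning angles between successive bonds."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    # The first bond is laid along the +x axis.
    x += bond_length
    points.append((x, y))
    for turn in turn_angles_deg:
        heading += math.radians(turn)
        x += bond_length * math.cos(heading)
        y += bond_length * math.sin(heading)
        points.append((x, y))
    return points

# Any list of angles yields a chain with correct bond lengths ("legal"),
# but the end-to-end distance needs the rebuilt coordinates.
points = chain_from_turn_angles([90.0, 90.0])
end_to_end = math.dist(points[0], points[-1])
print(points, end_to_end)
```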
Search Strategy
Typical approach:
• Secondary structure prediction
• Attempts at different conformations keeping secondary structure fixed
• Finer moves relaxing secondary structure
Use
• Greedy search
• Simulated annealing
• …
Rosetta Method
Idea:
• “Structural” signatures recur within protein structures
• Use these as cues during structure search
Local structure motifs
I-sites Library = a catalog of local sequence-structure correlations
diverging type-2 turn
Frayed helix
Serine hairpin
Proline helix C-cap
Type-I hairpin
alpha-alpha corner
glycine helix N-cap
Example: Non-polar Alpha-helix
Example: Non-polar beta-strand
Example: Gly alpha-C-cap Type 1
Construction of I-sites library
• Construct profiles (PSI-BLAST like) for each solved structure
• Collect all possible segments of fixed length (len = 3, 9, 15)
• Perform k-means clustering of segments
• Check each cluster for a “coherent” structure (in terms of dihedral angles)
• Prune incoherent structures
• Iteratively refine remaining clusters by removing structurally different segments, redefining cluster membership, etc.
All proteins can be constructed from fragments
Recent experiment:
For representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.
Rosetta: a folding simulation program
Fragment insertion Monte Carlo
[Figure: flowchart of the loop over backbone torsion angles: choose a fragment from the fragment library, change backbone angles, convert to 3D, evaluate the energy function, accept or reject]
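The fragment insertion loop can be sketched as follows. This is a toy illustration of the idea, not Rosetta's implementation: the two-entry fragment library, the fixed temperature, and `toy_energy` are all hypothetical, and the "convert to 3D" step is skipped because the toy energy scores the torsion angles directly.

```python
import math
import random

def fragment_insertion_mc(torsions, fragments, energy, steps=500,
                          temperature=1.0, seed=0):
    """Monte Carlo search whose moves overwrite a window of (phi, psi) pairs."""
    rng = random.Random(seed)
    current = list(torsions)
    current_e = energy(current)
    for _ in range(steps):
        fragment = rng.choice(fragments)                 # choose a fragment
        start = rng.randrange(len(current) - len(fragment) + 1)
        trial = list(current)
        trial[start:start + len(fragment)] = fragment    # change backbone angles
        trial_e = energy(trial)                          # evaluate
        delta = trial_e - current_e
        # Metropolis criterion: accept or reject the insertion.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
    return current, current_e

def toy_energy(torsions):
    """Hypothetical score: one penalty unit per residue without helix-like angles."""
    return sum(0.0 if t == (-60.0, -45.0) else 1.0 for t in torsions)

# Hypothetical fragment library: one helix-like and one strand-like 3-mer.
FRAGMENTS = [[(-60.0, -45.0)] * 3, [(-120.0, 130.0)] * 3]
start_chain = [(180.0, 180.0)] * 9
final_chain, final_energy = fragment_insertion_mc(start_chain, FRAGMENTS, toy_energy)
print(final_energy)
```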
Rosetta’s energy function
Sequence dependent features
Residue-residue contact energies are derived from the database
Rosetta’s energy function
Sequence-independent features
[Figure labels: current structure; vector representation; probabilities from the database]
The energy score for a contact between secondary structures is summed using database statistics.
Rosetta prediction results
61% “topologically correct”
60% “locally correct”
73% secondary structure (Q3) correct
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
Evaluation of partially correct predictions
Tertiary structure %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0Å.
Local structure %correct is the fraction of the sequence that has mda < 90°.
mda = maximum deviation in backbone angles over an 8-residue window.
[Figure: RMSD versus sequence position for window sizes L=8, L=20 and L=30 (L = window size), with the 6.0Å threshold marked; MDA versus sequence position, with the 90° threshold marked]
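A sketch of the local-structure measure is given below. It assumes, by analogy with the tertiary definition, that a residue counts as correct when it lies in at least one 8-residue window whose mda is below 90°. That reading, the function names, and the representation of angles as (phi, psi) pairs in degrees are assumptions of this example.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def local_percent_correct(predicted, true, window=8, threshold=90.0):
    """Percent of residues covered by a window with max angle deviation < threshold."""
    n = len(predicted)
    if n != len(true) or n < window:
        raise ValueError("need equal-length angle lists at least one window long")
    # Per-residue deviation: the worse of the phi and psi differences.
    deviation = [max(angle_diff(p[0], t[0]), angle_diff(p[1], t[1]))
                 for p, t in zip(predicted, true)]
    covered = [False] * n
    for start in range(n - window + 1):
        mda = max(deviation[start:start + window])
        if mda < threshold:
            for i in range(start, start + window):
                covered[i] = True
    return 100.0 * sum(covered) / n

# Hypothetical 10-residue example in which only the last residue is badly off.
true_angles = [(-60.0, -45.0)] * 10
predicted_angles = [(-60.0, -45.0)] * 9 + [(120.0, -45.0)]
print(local_percent_correct(predicted_angles, true_angles))
```

The tertiary measure has the same windowed shape but needs a structural superposition to compute the RMSD of each window, which is omitted here.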
T0116 262-322 (61 residues)
prediction
true structure
Topologically correct (rmsd=5.9Å) but helix is mispredicted as loop.
T0121 126-199 (66 residues)
prediction
true structure
Topologically correct (rmsd=5.9Å) but loop is mispredicted as helix.
T0122 57-153 (97 residues)
prediction
true structure
...contains a 53 residue stretch with max deviation = 96°
T0112 153-213
prediction
true structure
Low rmsd (5.6Å) and all angles correct (mda = 84°), but topologically wrong!
(this is rare)