Solvent Accessibility Prediction in proteins
Shandar Ahmad
Department of Biosciences,
Jamia Millia Islamia University,
New Delhi-110025, India
“Surfaces were invented by the devil.” (usually attributed to Wolfgang Pauli)
Is that really so?
• The surface of a material hides its interior from view, which is what frustrated Pauli.
• The surface is the most important part of a protein, as enzyme activity and binding are governed by surface residues.
• Solvent accessibility is a measure of the exposed surface of an entire protein, of individual amino acid residues, or of constituent atoms.
Definition
Importance of solvent accessibility
• Accessible surface area (ASA) determines the stability of proteins, because hydrophobic transfer energy is a direct measure of the residue-wise solvent-accessible surface.
• Buried residues reside in the core of the protein and are hence crucial to stability, even if they are not active sites.
• A protein may lose function upon mutation either because the new amino acid does not bind or because the protein loses its structure upon that mutation, which makes stability a crucial factor.
• Accessible residues include the active sites of the protein.
• Secondary structure is less sensitive to point mutations than ASA is.
DNA-binding probability and ASA
[Figure: fraction of binding residues (y-axis, 0 to 0.25) plotted against ASA range (%) (x-axis, bins from 0-10 to 90-100).]
Source: Ahmad S. et al. 2004, Bioinformatics
How do we calculate ASA?
• There are several free programs to
calculate ASA for proteins with known
structures.
Comparison
Characteristic features of the most common methods for calculating ASA

Feature                                       | ACCESS | DSSP | NACCESS  | ASC | GETAREA
Standalone executable availability            | Yes    | Yes  | Licensed | Yes | No
Online calculations/database                  | No     | Yes  | No       | Yes | Yes
Polar and nonpolar area                       | No     | No   | Yes      | No  | Yes
Atom-wise surface area                        | Yes    | No   | Yes      | Yes | Yes
Source code availability                      | No     | Yes  | No       | Yes | No
Choice of probe radius                        | Yes    | No   | Yes      | Yes | No
Choice of van der Waals and other parameters  | Yes    | No   | Yes      | Yes | By manual editing
Secondary structure                           | No     | Yes  | No       | No  | No

References: ACCESS, Lee and Richards (1971); DSSP, Kabsch and Sander (1983); NACCESS, Hubbard and Thornton (1993); ASC, Eisenhaber and Argos (1993); GETAREA, Fraczkiewicz and Braun (1998).
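To make concrete what such programs compute, here is a minimal numerical sketch in the spirit of the Shrake-Rupley point-sampling approach. It is not the algorithm of any specific program listed here. The input format (a list of x, y, z, radius tuples in angstroms), the 1.4 Å probe radius, the number of sample points and the function names are all illustrative assumptions.

```python
import math

def sphere_points(n):
    """Return n roughly uniform points on the unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = i * golden
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def atom_asa(atoms, probe=1.4, n_points=960):
    """Approximate the accessible surface area of each atom.

    atoms: list of (x, y, z, vdw_radius) tuples, in angstroms (assumed format).
    Returns one area per atom, in square angstroms.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # sphere traced by the probe centre around atom i
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cutoff = rj + probe
                # the probe cannot sit here if it would overlap atom j
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < cutoff * cutoff:
                    buried = True
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas
```

Summing the per-atom areas over the atoms of a residue gives a residue-wise ASA. This brute-force version compares every sample point with every atom, so it is only suitable for small inputs.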
ASAView: A tool to plot and view
Solvent accessibility
• An online service to calculate and view
ASA was developed
(www.netasa.org/asaview).
• A database of plots for the entire PDB is
included.
• The database can be downloaded.
• ASAView is linked from PDB.
ASAView: Database and tool for solvent accessibility representation
in proteins. Shandar Ahmad, Michael Gromiha, Hamed Fawareh
and Akinori Sarai, BMC Bioinformatics (2004) 5:51
ASAView
• Residues are colored in polar, hydrophobic,
positive and negative charge categories.
• Each residue is represented by a solid circle of
radius proportional to its solvent accessibility.
• Residues are arranged outward in a spiral
diagram, such that the lowest ASA is in the
interior of the diagram, emulating the actual
three-dimensional environment.
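The layout described above can be sketched as follows. This is not the ASAView code. It assumes an Archimedean spiral and hypothetical scaling parameters (`step`, `max_circle_radius`), and it only computes circle positions and sizes, leaving the drawing and coloring to any plotting library.

```python
import math

def spiral_layout(residues, step=1.0, max_circle_radius=0.5):
    """Place residues on an outward spiral, lowest relative ASA at the centre.

    residues: list of (label, relative_asa) with relative_asa in [0, 100].
    Returns a list of (label, x, y, circle_radius) tuples.
    """
    ordered = sorted(residues, key=lambda item: item[1])
    layout = []
    for index, (label, rel_asa) in enumerate(ordered):
        # an angle growing as sqrt(index) spaces circles roughly evenly along the arm
        theta = 2.0 * math.sqrt(math.pi * index)
        distance = step * theta / (2.0 * math.pi)
        x = distance * math.cos(theta)
        y = distance * math.sin(theta)
        # circle radius is proportional to the residue's solvent accessibility
        layout.append((label, x, y, max_circle_radius * rel_asa / 100.0))
    return layout
```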
Solvent accessibility and protein
interfaces
• Change in ASA is often used to define an
interface and identify interacting residues.
• We have defined native and isolated
domain ASAs to estimate the extent of
unsaturated bonds in a protein sequence.
• Domain-domain and protein-protein
interactions may be predictable from
native ASAs.
M. Firdaus Raih, Shandar Ahmad, Zheng Rong, Rahmah Mohamed
Biophysical Chemistry 114 (2005) 63-69
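A minimal sketch of using the change in ASA to flag interface residues is given below. It assumes per-residue ASA values are already available for the isolated domain and for the native (interfaced) state as dictionaries keyed by a residue identifier. The 1.0 Å² threshold and the function name are illustrative assumptions, not values taken from the cited work.

```python
def interface_residues(asa_isolated, asa_native, min_loss=1.0):
    """Return residues whose ASA drops by more than min_loss on interfacing.

    asa_isolated, asa_native: {residue_id: ASA in square angstroms}.
    Returns {residue_id: (asa_loss, percent_of_isolated_asa_retained)}.
    """
    result = {}
    for res_id, isolated in asa_isolated.items():
        native = asa_native[res_id]
        loss = isolated - native
        if loss > min_loss:
            # native ASA relative to isolated-domain ASA, as a percentage
            retained = 100.0 * native / isolated
            result[res_id] = (loss, retained)
    return result
```

For example, a residue with 100 Å² in the isolated domain and 25 Å² in the native state would be reported with a loss of 75 Å² and 25% of its ASA retained.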
Post interface ASA in different secondary structure conformations
[Figure: native ASA relative to doma… (y-axis, 0.0 to 70.0) for each residue type (x-axis: Ala, Cys, Asp, Glu, Phe, Gly, His, Ile, Lys, Leu, Met, Asn, Pro, Gln, Arg, Ser, Thr, Val, Trp), with one series per conformation: Helix, Strand, BetaB, 3-10helix, Turn, Bend, Coil.]
Relative loss of ASA by interfacing is not dependent on secondary structure.
Charged residues retain most of their ASA even after interfacing
[Figure: relative number of res… (y-axis, 0.0 to 35.0) against post interface ASA range, i.e. native ASA relative to isolated domain ASA (x-axis, bins from <10 to >90), with series for Positive, Negative, Hydrophobic and Polar neutral residues.]
Surface hydrophobic residues lose more ASA upon interfacing than charged ones.
Solvent accessibility predictions
• Many more sequences are available than
structures.
• Structure prediction requires good templates
or extensive computing.
• Knowledge of solvent accessibility ahead of
structure is useful.
Methods of ASA prediction
• Goal
– ASA categories/ states
– Real relative value of ASA
– Real absolute value of ASA
• Information indices
– Amino acid sequence
– Evolutionary information
– Burial potentials
• Model type
– Information theory
– Neural network (single and two-stage)
– Multiple linear regression
ASA states or categories
• ASA cutoffs:
– ASA values are normalized by dividing by either (1) the extended-state ASA of the residue in Ala-X-Ala or Gly-X-Gly, or (2) the highest ASA.
– Residues are annotated as buried or exposed based on certain values of relative ASA.
– Two, three or up to 10 categories are defined (Ahmad & Gromiha, 2002, Bioinformatics).
• Cutoffs are arbitrary
– Different people use different categories.
– Prediction quality strongly depends on these cutoffs.
– Comparing performance between methods is difficult.
• Recent work by Vardarajan (2006) shows a cutoff at 5%.
– It is shown experimentally that mutations at sites with > 5% ASA most strongly affect the protein function, i.e. activity.
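The normalization and categorization steps can be sketched as below. The reference ASA table is deliberately left as a parameter, because it depends on which normalization (extended-state tripeptide or highest ASA) is chosen. The function names and the example cutoffs are illustrative assumptions.

```python
def relative_asa(asa, residue, reference_asa):
    """Normalize an absolute ASA (square angstroms) to a percentage.

    reference_asa: {residue_code: reference ASA}, supplied by the user from
    the chosen normalization scheme.
    """
    return 100.0 * asa / reference_asa[residue]

def asa_state(rel_asa, cutoffs=(5.0,)):
    """Map a relative ASA (%) to a category index.

    0 is the most buried state; len(cutoffs) is the most exposed state.
    One cutoff gives two states, two cutoffs give three states, and so on.
    """
    state = 0
    for cutoff in sorted(cutoffs):
        if rel_asa > cutoff:
            state += 1
    return state
```

With a single 5% cutoff, a residue at 3% relative ASA falls in state 0 (buried) and one at 25% in state 1 (exposed). Because the state boundaries move with the cutoffs, accuracies reported for different cutoffs are not directly comparable.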
A multi-layer neural network
[Figure: network diagram with an input layer, hidden layer(s) and an output layer.]
Connection weights: $W_{ijk}$ connects the jth unit of layer i to the kth unit of layer i+1.
Unit activation (kth unit of ith layer): $U_{ik} = f\left(\sum_j U_{(i-1)j}\, W_{(i-1)jk}\right)$
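The unit activation rule above can be written as a short forward pass. This sketch assumes a logistic sigmoid for f and no bias terms, neither of which is specified on the slide. The nested-list weight layout `weights[i][j][k]` mirrors the W indices and is an illustrative choice.

```python
import math

def sigmoid(x):
    """Logistic function, used here as an assumed choice of f."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, weights, f=sigmoid):
    """Propagate input activations through a layered network.

    weights[i][j][k] connects unit j of layer i to unit k of layer i+1.
    Returns the activations of the final layer.
    """
    activations = list(inputs)
    for layer in weights:
        n_next = len(layer[0])
        activations = [
            f(sum(activations[j] * layer[j][k] for j in range(len(layer))))
            for k in range(n_next)
        ]
    return activations
```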
Neural networks and digitization
of amino acids
• Binary orthogonal codification
• Substitution matrices as amino acid codes.
• Dimensionality reduction, using neural
network.
Dimensionality of amino-acid space … , Arauzo Bravo, Ahmad S, and Sarai,
Comp. Biol. & Chem. (In Press) 2006
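Binary orthogonal codification corresponds to what is now commonly called one-hot encoding. The sketch below encodes a residue and its sequence neighbors using 21 bits per position. Treating the 21st bit as a marker for positions beyond the chain termini (or unrecognized residues) is an assumption, as are the function names and the window size.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_residue(residue):
    """Return a 21-bit orthogonal code: 20 amino acids plus one extra bit.

    The extra bit (index 20) is assumed to mean "no standard residue here".
    """
    bits = [0] * 21
    index = AMINO_ACIDS.find(residue)
    bits[index if index >= 0 else 20] = 1
    return bits

def encode_window(sequence, position, half_window=1):
    """Concatenate the codes of a residue and its neighbors on both sides."""
    bits = []
    for offset in range(-half_window, half_window + 1):
        i = position + offset
        residue = sequence[i] if 0 <= i < len(sequence) else "-"
        bits.extend(encode_residue(residue))
    return bits
```

A window of one neighbor on each side yields 63 input units. Substitution-matrix codes would replace each 21-bit vector with a row of real numbers.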
PSSM based predictions
Development of non-redundant databases
• Several data sets are available.
– Barton: 512 proteins
– Rost and Sander: 126
– Yuan: 1260
– Meller: ~800
– Ahmad: ~2300 domains
• The largest data set we have used so far is based on ASTRAL, with entries picked domain-wise rather than protein-wise.
• Redundancy is removed by sequence identity.
• Completeness and quality of structures are checked using WhatIF and ProCheck.
Cross-validation
• Three-fold cross-validation
• Leave-one-out cross validation for domain
vs native ASAs
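A minimal sketch of how such splits can be generated is given below. It assumes the indices refer to whole proteins or domains rather than individual residues, so that all residues of one chain stay in the same fold. The function name is illustrative. Leave-one-out cross-validation is the special case where k equals the number of items.

```python
def k_fold_splits(n_items, k=3):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    # item i goes to fold i mod k, so fold sizes differ by at most one
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_items) if i not in held_out]
        yield train, test
```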
Real value Prediction of ASA
We reported the first real-value prediction of ASA (Ahmad et al., Proteins 2003). Several other authors followed.
Real Value Prediction
[Figure: network diagram with an input layer and two output units, x1 and x2.]
Input layer: residue and neighbor information (each residue and its neighbor are coded by 21 bits).
Connection weights: $W_{ijk}$ connects the jth unit of layer i to the kth unit of layer i+1.
Unit activation (kth unit of ith layer): $U_{ik} = \sum_j U_{(i-1)j}\, W_{(i-1)jk}$
Output: $P = 1/[1 + \exp(x_2 - x_1)]$, where $x_1$ and $x_2$ are the activation values of the units in the output layer. P is multiplied by 100 to get a percentage-scale prediction.
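The output transformation P = 1/[1+exp(x2-x1)] can be checked with a few lines of code. The function name is illustrative, and the example values in the note below are arbitrary activations, not results from the trained network.

```python
import math

def predict_relative_asa(x1, x2):
    """Convert the two output-layer activations into a percentage-scale ASA."""
    p = 1.0 / (1.0 + math.exp(x2 - x1))
    return 100.0 * p
```

Equal activations give 50%, and the prediction approaches 100% as x1 grows relative to x2. The expression is algebraically the same as a two-way softmax, exp(x1) / (exp(x1) + exp(x2)), so the output always lies strictly between 0 and 100.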
Results of RV Predictions
[Figure: percentage (y-axis, 0.0 to 70.0) against ASA range (%) (x-axis, bins from 0-10 to 90-100).]
As ASA values increase, so does prediction error (black) due to a corresponding fall in the relative abundance of data (gray).
Residue-wise variation in prediction error
[Figure: percentage (y-axis, 0.0 to 35.0) for each amino acid residue (x-axis: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y).]
Residue-specific prediction error and ASA variability. Dark circles represent the prediction error, and gray squares show the corresponding standard deviation in the experimental ASA for that residue type. A very high correlation (r = 0.97) is observed between the prediction error and standard deviation in the original data.
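A comparison of this kind can be reproduced with the sketch below. It assumes that prediction error means the mean absolute difference between predicted and observed relative ASA, which is not defined on this slide, and it uses the population standard deviation. The record format and function names are illustrative.

```python
import math
from collections import defaultdict

def per_residue_stats(records):
    """Group records by residue type and summarize them.

    records: iterable of (residue_type, observed_asa, predicted_asa).
    Returns {residue_type: (mean_absolute_error, std_of_observed_asa)}.
    """
    grouped = defaultdict(list)
    for residue, observed, predicted in records:
        grouped[residue].append((observed, predicted))
    stats = {}
    for residue, pairs in grouped.items():
        n = len(pairs)
        mae = sum(abs(o - p) for o, p in pairs) / n
        mean_obs = sum(o for o, _ in pairs) / n
        std = math.sqrt(sum((o - mean_obs) ** 2 for o, _ in pairs) / n)
        stats[residue] = (mae, std)
    return stats

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

Passing the per-residue errors and standard deviations to `pearson` gives the kind of correlation coefficient quoted in the caption.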
Prediction histogram
[Figure: relative number of residues (%) (y-axis, 0 to 45) against prediction error per residue (%) (x-axis, bins from 0-10 to 90-100).]
Look-up tables for Solvent Accessibility Predictions
• We recently developed residue pattern libraries to serve as dictionaries of ASA values (www.netasa.org/look-up/).
• 1P, 1N, 2P2N type prediction.
• Smaller patterns give better results due to lack of convergence in longer patterns.
Look up tables for prediction and analysis of nearest neighbor effects on solvent accessibility, Jung-Ying Wang, Shandar Ahmad, Michael Gromiha and Akinori Sarai, Biopolymers 75 (2004) 209-216
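The dictionary idea can be sketched as follows. This is not the published implementation. It assumes a pattern is the residue together with a chosen number of neighbors on each side, and it stores the mean observed ASA for each pattern. How the `before` and `after` parameters map onto the 1P, 1N and 2P2N labels is not specified here, so those names are hypothetical.

```python
from collections import defaultdict

def build_lookup(chains, before=1, after=1):
    """Build {pattern: mean ASA of the central residue}.

    chains: list of (sequence, asa_values) pairs of equal length.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for sequence, asa_values in chains:
        for i in range(before, len(sequence) - after):
            pattern = sequence[i - before:i + after + 1]
            entry = totals[pattern]
            entry[0] += asa_values[i]
            entry[1] += 1
    return {pattern: total / count for pattern, (total, count) in totals.items()}

def predict_from_lookup(lookup, sequence, i, before=1, after=1, default=None):
    """Look up the mean ASA for the pattern around position i, if it exists."""
    if i < before or i + after >= len(sequence):
        return default
    return lookup.get(sequence[i - before:i + after + 1], default)
```

Longer patterns produce many more dictionary keys, each supported by fewer observations, which is one way to read the remark about convergence in longer patterns.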
Variation between 1P and 1N ASA information
[Figure: Stdev in ASA with change in ne… (y-axis, 0.00 to 0.12) for each residue (x-axis: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, X, Y, Z), with series Stdev (1P) and Stdev (1N).]
Prediction of ASA for each atom
• 167 different atomic groups occur in proteins.
• We have carried out the first large-scale analysis and prediction of ASA for each of these atoms (Ahmad et al., submitted for publication, 2006).
• Interesting observations are made about the ASA distribution.
Main results from analysis and prediction of atomic ASA
• Most atoms are primarily distributed in a very small ASA range near 0.
• Proline CB atoms are frequently exposed.
• Some carbon atoms show a sharp second peak, suggesting two stable conformations.
• Acidic and basic residues have exposed nitrogen/oxygen.
Other results
• 167 neural networks were designed, and most atomic ASAs could be predicted close to 1 Å, except those with a bimodal distribution.
• Some atoms were more sensitive to neighbor information than others.
• Some were more sensitive to the C-terminal neighbor and some to the N-terminal neighbor.
Thank you