Modeling Cell Proliferation Activity
of Human Interleukin-3 (IL-3)
Upon Single Residue Replacements
Majid Masso
Bioinformatics and Computational Biology
George Mason University, Manassas, Virginia, USA
BIOSTEC BIOINFORMATICS 2011
IL-3 Structure, Function, and
Experimental Mutagenesis Data
• IL-3 promotes the growth of many hematopoietic cell lines
• Theoretically, there are 19 × 112 = 2128 possible IL-3 mutants
via single residue substitutions at all positions in the structure
• Experimental dataset: 630 of these IL-3 mutants were
synthesized, representing substitutions at all but 12 positions
• Activity of synthesized IL-3 mutants measured as % of wild
type (wt) using erythroleukemic cell proliferation assays:
27 “increased” mutants (>100% wt); 373 “full” (20 – 100% wt);
75 “moderate” (5 – 19% wt); and 155 “low” (< 5% wt)
• Alternatively, there are 400 “unaffected” (“increased” + “full”)
and 230 “affected” (“moderate” + “low”) IL-3 mutants
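The class definitions above can be expressed as a small binning step. This is a minimal sketch: the helper names are hypothetical, and because the slide gives "moderate" as 5 – 19% wt and "full" as 20 – 100% wt, it assumes non-integer activities between 19 and 20 fall into "moderate" (5 ≤ x < 20). The final lines check that the reported class counts add up to 630 mutants and to the 400/230 split.

```python
# Hypothetical helpers: bin a measured activity (% of wild type) into the
# four experimental classes, then collapse them into the two classes used
# for supervised learning.

FOUR_TO_TWO = {
    "increased": "unaffected",
    "full": "unaffected",
    "moderate": "affected",
    "low": "affected",
}


def activity_class(percent_wt: float) -> str:
    """Four-class label; assumes 'moderate' covers 5 <= x < 20 (% wt)."""
    if percent_wt > 100:
        return "increased"
    if percent_wt >= 20:
        return "full"
    if percent_wt >= 5:
        return "moderate"
    return "low"


def binary_class(percent_wt: float) -> str:
    """Two-class label: 'unaffected' (increased + full) or 'affected'."""
    return FOUR_TO_TWO[activity_class(percent_wt)]


def collapse_counts(four_class_counts: dict) -> dict:
    """Sum four-class mutant counts into unaffected/affected totals."""
    totals = {"unaffected": 0, "affected": 0}
    for label, count in four_class_counts.items():
        totals[FOUR_TO_TWO[label]] += count
    return totals


# Counts reported for the 630 synthesized IL-3 mutants.
counts = {"increased": 27, "full": 373, "moderate": 75, "low": 155}
print(sum(counts.values()), collapse_counts(counts))
```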
Delaunay Tessellation of Protein Structure
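Abstract every amino acid residue to a point (its Cα coordinates), using atomic coordinates from the Protein Data Bank (PDB)
[Figure: an aspartic acid (Asp or D) residue reduced to its Cα point; example Cα points labeled A22, L6, D3, F7, G62, K4, S64, R5, C63]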
Delaunay tessellation: 3D “tiling” of space into non-overlapping,
irregular tetrahedral simplices. Each simplex objectively identifies
a quadruplet of nearest-neighbor amino acids at its vertices.
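The defining property of a Delaunay tessellation is that no point lies strictly inside the circumsphere of any simplex. The sketch below illustrates that property by brute force: it tests every quadruplet of points and keeps those with an empty circumsphere. It is O(n^5), assumes general position (no five points on a common sphere), and is meant only for tiny toy inputs; a real tessellation of a protein would use a Qhull-based routine such as `scipy.spatial.Delaunay`. The function names and toy points are hypothetical.

```python
from itertools import combinations


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(a, b, c, d):
    """Center and squared radius of the sphere through four 3D points,
    or None if the points are (nearly) coplanar."""
    # The center x satisfies 2(p - a)·x = |p|^2 - |a|^2 for p in (b, c, d).
    rows, rhs = [], []
    norm_a = sum(v * v for v in a)
    for p in (b, c, d):
        rows.append([2 * (p[k] - a[k]) for k in range(3)])
        rhs.append(sum(v * v for v in p) - norm_a)
    det = _det3(rows)
    if abs(det) < 1e-12:
        return None
    center = []
    for col in range(3):  # Cramer's rule, one coordinate at a time
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]
        center.append(_det3(m) / det)
    r2 = sum((center[k] - a[k]) ** 2 for k in range(3))
    return center, r2


def brute_force_delaunay(points, eps=1e-9):
    """Index quadruplets whose circumsphere contains no other point."""
    simplices = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad))
        if sphere is None:
            continue
        center, r2 = sphere
        if all(sum((points[j][k] - center[k]) ** 2 for k in range(3)) >= r2 - eps
               for j in range(len(points)) if j not in quad):
            simplices.append(quad)
    return simplices


# Five toy "Cα" points forming a bipyramid over a triangle.
pts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 2, 2)]
print(brute_force_delaunay(pts))
```

Each returned tuple is one simplex, i.e. one quadruplet of nearest-neighbor points.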
Delaunay Tessellation of Interleukin-3 (IL-3)
• Ribbon (left) from PDB file 1jli (112 residues, positions 14 – 125)
• Each amino acid residue is represented by its Cα in 3D space
• Tessellation of the 112 Cα points (right) is performed with a 12 Å
edge-length cutoff, so that only simplices representing "true" residue quadruplet interactions are retained
Four-Body Statistical Potential
Training set: nearly 1,400 diverse high-resolution X-ray structures from the PDB, e.g. 1bniA (barnase), 3lzm (T4 lysozyme), 1rtjA (HIV-1 RT), 1efaB (lac repressor), …
Tessellate each structure, pool together all simplices from the tessellations, and
compute the observed frequencies of simplicial quadruplets
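The slide does not show how the pooled frequencies become a potential. A minimal sketch, assuming the log-likelihood form often used for Delaunay-based four-body potentials: for each unordered residue quadruplet, q = log10(f / p), where f is its observed relative frequency among all pooled simplices and p = C · a_i · a_j · a_k · a_l is the frequency expected by chance, with a_i the composition of residue i (counted here over all simplex vertices, an assumption) and C = 4! / ∏ t_i! the number of distinct orderings of the quadruplet. The function name and toy quadruplets are hypothetical.

```python
import math
from collections import Counter


def four_body_potential(quads):
    """Log-likelihood score per unordered residue quadruplet (assumed form).

    quads: one 4-residue tuple per simplex, pooled over all tessellations.
    """
    total = len(quads)
    # Observed relative frequency of each quadruplet, ignoring vertex order.
    observed = Counter(tuple(sorted(q)) for q in quads)
    # Residue composition a_i, counted over all simplex vertices (assumption).
    residue_counts = Counter(r for q in quads for r in q)
    n_vertices = sum(residue_counts.values())
    a = {r: c / n_vertices for r, c in residue_counts.items()}

    potential = {}
    for quad, count in observed.items():
        f = count / total
        # C = 4! / prod(t_i!): distinct orderings of this residue multiset.
        c = math.factorial(4)
        for t in Counter(quad).values():
            c //= math.factorial(t)
        p = c * math.prod(a[r] for r in quad)
        potential[quad] = math.log10(f / p)
    return potential


# Toy pooled simplices (hypothetical); vertex order does not matter.
quads = [("A", "A", "A", "A"), ("A", "A", "A", "A"),
         ("A", "G", "A", "A"), ("G", "G", "G", "G")]
for quad, score in sorted(four_body_potential(quads).items()):
    print(quad, round(score, 3))
```

Positive scores mark quadruplets seen more often than chance would predict; negative scores mark under-represented ones.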
Computational Mutagenesis
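[Figure: computational mutagenesis on the IL-3 tessellation at position D21 (large Cα point), which participates in 14 simplices and has 11 neighbors; the environmental change (EC) at each position forms the residual profile vector Rmut of the IL-3 D21S mutant, whose residual score is EC21]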
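The figure does not spell out how EC scores are computed. A minimal sketch, assuming the following scheme: each position's environment score is the sum of four-body potential values over all simplices containing it; the mutant reuses the wild-type tessellation, with only the residue identity changed; and the residual profile is the mutant profile minus the wild-type profile, position by position, so the residual score is the component at the mutated position. Function names, the 0-based indexing, the treatment of unseen quadruplets, and the toy potential values are all hypothetical.

```python
def environment_profile(sequence, simplices, potential):
    """Per-position sum of four-body potential values over incident simplices."""
    profile = [0.0] * len(sequence)
    for simplex in simplices:
        key = tuple(sorted(sequence[i] for i in simplex))
        score = potential.get(key, 0.0)  # unseen quadruplets scored 0 (assumption)
        for i in simplex:
            profile[i] += score
    return profile


def residual_profile(wt_seq, position, new_residue, simplices, potential):
    """Mutant minus wild-type environment profile (0-based `position`),
    reusing the wild-type tessellation for the mutant."""
    mut_seq = wt_seq[:position] + new_residue + wt_seq[position + 1:]
    wt = environment_profile(wt_seq, simplices, potential)
    mut = environment_profile(mut_seq, simplices, potential)
    return [m - w for m, w in zip(mut, wt)]


# Toy example (hypothetical values): 5 residues, 2 simplices, A -> S at index 0.
potential = {("A", "D", "G", "K"): 0.5, ("D", "G", "K", "S"): -0.2,
             ("D", "G", "K", "L"): 0.3}
simplices = [(0, 1, 2, 3), (1, 2, 3, 4)]
r = residual_profile("ADGKL", 0, "S", simplices, potential)
print([round(v, 3) for v in r])  # r[0] is the residual score
```

Only the mutated position and the positions sharing a simplex with it receive nonzero EC scores; index 4, which shares no simplex with index 0, stays at zero.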
IL-3 Experimental Data:
Structure – Function Relationship
Feature Vectors for IL-3 Mutants
• For an IL-3 mutation at position N, nonzero EC scores in the residual
profile vector Rmut occur only at N and its structural neighbors
• Every position has at least 6 neighbors, which can be ordered by
Euclidean distance from position N (tessellation edge lengths)
• So, create a new 7D vector: the residual score (EC score at N) and the EC
scores of the 6 closest neighbors (ordered by distance from N)
• 20 additional features: position number N, wt and replacement
residues, residues at neighbor positions, primary sequence
location of neighbors relative to N, mean tetrahedrality and
volume of simplices using N, secondary structure at N,
tessellation-defined depth of N, and number of surface contacts
• Total: each IL-3 mutant is represented as a 27D feature vector (7 + 20)
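The 7D portion of the feature vector can be sketched as follows: collect the tessellation neighbors of position N (positions sharing a simplex with it), sort them by Euclidean distance from N, and append their EC scores after the residual score. This is a sketch only; function names are hypothetical, and the coordinates, simplices, and residual values are toy inputs chosen so the ordering is easy to check.

```python
import math


def tessellation_neighbors(position, simplices):
    """Positions sharing at least one simplex (a Delaunay edge) with `position`."""
    return {i for s in simplices if position in s for i in s} - {position}


def ec_feature_vector(position, residual, coords, simplices, k=6):
    """Residual score at `position` followed by the EC scores of its k
    nearest tessellation neighbors, ordered by Euclidean distance."""
    neighbors = sorted(tessellation_neighbors(position, simplices),
                       key=lambda i: math.dist(coords[position], coords[i]))
    if len(neighbors) < k:
        raise ValueError(f"position {position} has fewer than {k} neighbors")
    return [residual[position]] + [residual[i] for i in neighbors[:k]]


# Toy inputs: 8 positions on a line, hand-made simplices around position 0.
coords = [(float(i), 0.0, 0.0) for i in range(8)]
simplices = [(0, 1, 2, 3), (0, 4, 5, 6), (0, 7, 1, 2)]
residual = [10.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(ec_feature_vector(0, residual, coords, simplices))
```

The remaining 20 attributes listed above would be concatenated to this vector to reach 27 dimensions.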
Supervised Classification (unaffected/affected)
• Algorithm: random forest (RF); Training set: 630 IL-3 mutants
• Testing: tenfold cross-validation (10-fold CV), leave-one-out CV
(LOOCV), and random split (2/3 for training, 1/3 for prediction)
• Evaluation of performance:
  • Overall accuracy, or proportion of correct predictions: Q
  • Balanced accuracy rate: BAR = 1 – BER, where BER is the balanced error rate
  • Matthews correlation coefficient: MCC
  • Area under the ROC curve: AUC
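The four performance measures can be computed from predicted labels (Q, BAR, MCC) and from classifier scores (AUC). A minimal stdlib sketch; treating "affected" as the positive class (label 1) is an assumption, and the function names are hypothetical. AUC is computed here as the Mann-Whitney probability that a randomly chosen positive outscores a randomly chosen negative, which equals the area under the ROC curve.

```python
import math


def confusion_counts(y_true, y_pred):
    """TP, TN, FP, FN with 1 = 'affected' (positive class, an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    q = (tp + tn) / (tp + tn + fp + fn)        # overall accuracy Q
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    bar = (sensitivity + specificity) / 2      # BAR = 1 - BER
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Q": q, "BAR": bar, "MCC": mcc}


def auc(y_true, scores):
    """ROC AUC via Mann-Whitney: P(positive score > negative score), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
```

In this toy case TP = 2, TN = 4, FP = 1, FN = 1, so Q = 0.75, BAR = (2/3 + 4/5) / 2, and MCC = (2·4 − 1·1) / √(3·3·5·5) = 7/15.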
Statistical Significance of Predictions
[Figure: probability density of RF 10-fold CV performance (MCC and BAR) over 1000 random class label permutations; the values obtained with the original class labels, MCC = 0.54 and BAR = 0.77, are marked for comparison]
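The significance test in the figure compares performance on the original labels with a null distribution built from randomly permuted labels. A minimal sketch of such a label-permutation test, with hypothetical names: `score_fn` stands in for "train the RF and compute its 10-fold CV MCC or BAR", which here is replaced by a trivial fixed threshold rule so the example stays self-contained. The add-one correction in the empirical p-value is a common convention, not something stated on the slide.

```python
import random


def permutation_test(score_fn, features, labels, n_permutations=1000, seed=0):
    """Score on the true labels, its null distribution under random label
    permutations, and the empirical one-sided p-value."""
    rng = random.Random(seed)
    observed = score_fn(features, labels)
    null = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null.append(score_fn(features, shuffled))
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + sum(s >= observed for s in null)) / (1 + n_permutations)
    return observed, null, p_value


def threshold_accuracy(features, labels):
    """Hypothetical stand-in scorer: predict 'affected' (1) when x < 0."""
    preds = [1 if x < 0 else 0 for x in features]
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)


x = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
y = [1, 1, 1, 1, 0, 0, 0, 0]
observed, null, p = permutation_test(threshold_accuracy, x, y, n_permutations=200)
print(observed, round(p, 3))
```

A small p-value means that performance as high as the observed value rarely arises once the link between features and labels is destroyed.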
Application: Predict Activity of Remaining
IL-3 Mutants
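With 19 × 112 = 2128 possible single-residue mutants and 630 measured, 2128 − 630 = 1498 mutants remain whose activity must be predicted. A minimal sketch for enumerating them in the "D21S" notation used above; the helper names are hypothetical, and because the IL-3 sequence is not given here, the example uses a toy three-residue sequence numbered from 14 (the first position in PDB file 1jli).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_single_mutants(sequence, first_position=1):
    """Labels such as 'D21S' for every single-residue substitution."""
    return [f"{wt}{pos}{new}"
            for pos, wt in enumerate(sequence, start=first_position)
            for new in AMINO_ACIDS if new != wt]


def untested_mutants(sequence, measured, first_position=1):
    """All single mutants that are not in the experimentally measured set."""
    measured = set(measured)
    return [m for m in all_single_mutants(sequence, first_position)
            if m not in measured]


# Toy sequence (hypothetical), numbered from 14.
toy = "ADG"
print(len(all_single_mutants(toy, 14)),
      len(untested_mutants(toy, {"A14G", "D15S"}, 14)))
```

Each untested mutant would then be converted to a 27D feature vector and passed to the trained random forest.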
Conclusion and Future Directions
• Computational mutagenesis procedure effectively elucidates the
IL-3 structure-function relationship (via residual scores)
• A random forest model was developed to predict the effect of any single
residue replacement on IL-3 activity, using attributes based on:
• computational geometry (Delaunay tessellation of IL-3 structure)
• computational mutagenesis (EC scores of residual profile vectors)
• Current work focused on inductive learning; a future project could
apply transductive learning to predict the activity of the unknown mutants
• The techniques can be applied to any similar experimental protein
mutant dataset – motivation for robust wet-lab collaborations
Slides available at: http://binf.gmu.edu/mmasso