Modeling Cell Proliferation Activity of Human Interleukin-3 (IL-3) Upon Single Residue Replacements
Majid Masso
Bioinformatics and Computational Biology, George Mason University, Manassas, Virginia, USA
BIOSTEC BIOINFORMATICS 2011

IL-3 Structure, Function, and Experimental Mutagenesis Data
• IL-3 promotes the growth of many hematopoietic cell lines
• In principle, there are 19 × 112 = 2128 possible IL-3 mutants with a single residue substitution (19 alternative amino acids at each of the 112 positions in the structure)
• Experimental dataset: 630 of these IL-3 mutants were synthesized, representing substitutions at all but 12 positions
• Activity of each synthesized IL-3 mutant was measured as % of wild type (wt) using erythroleukemic cell proliferation assays: 27 “increased” mutants (> 100% wt), 373 “full” (20–100% wt), 75 “moderate” (5–19% wt), and 155 “low” (< 5% wt)
• Alternatively, grouping these into two classes gives 400 “unaffected” (“increased” + “full”) and 230 “affected” (“moderate” + “low”) IL-3 mutants

Delaunay Tessellation of Protein Structure
[Figure: atomic coordinates of aspartic acid (Asp or D) from the Protein Data Bank (PDB) abstracted to its Cα point; a tessellated fragment with labeled residues]
• Abstract every amino acid residue to a single point, its Cα coordinates, taken from the atomic coordinates in the Protein Data Bank (PDB)
• Delaunay tessellation: a 3D “tiling” of space into non-overlapping, irregular tetrahedral simplices
• Each simplex objectively identifies a quadruplet of nearest-neighbor amino acids at its vertices
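The tessellation step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes the points are in general position (no four coplanar, no five cospherical) and simply tests every 4-point subset for an empty circumsphere, which costs O(n^5). A real 112-residue structure would instead go through a library routine such as `scipy.spatial.Delaunay`. The function names, the toy coordinates, and the optional `max_edge` filter (mirroring the 12 Å edge-length cutoff used for IL-3) are hypothetical.

```python
from itertools import combinations
from math import dist


def _det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def circumsphere(a, b, c, d, eps=1e-9):
    """Center and squared radius of the sphere through four 3D points,
    or None if the points are (nearly) coplanar."""
    # |x - a|^2 = |x - p|^2  <=>  2 (p - a) . x = |p|^2 - |a|^2  for p in b, c, d
    rows = [[2 * (p[i] - a[i]) for i in range(3)] for p in (b, c, d)]
    rhs = [sum(p[i] ** 2 - a[i] ** 2 for i in range(3)) for p in (b, c, d)]
    det = _det3(rows)
    if abs(det) < eps:
        return None
    center = []
    for col in range(3):
        # Cramer's rule: replace column `col` with the right-hand side
        m = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(rows)]
        center.append(_det3(m) / det)
    return center, dist(center, a) ** 2


def delaunay_tetrahedra(points, max_edge=None, eps=1e-9):
    """Hypothetical brute-force Delaunay tessellation of 3D points
    (e.g. C-alpha coordinates), for illustration only.

    Keeps every 4-point subset whose circumsphere has no other point strictly
    inside it. If max_edge is given (e.g. 12.0 for a 12 A cutoff), simplices
    with a longer edge are dropped. Returns sorted index quadruplets.
    """
    simplices = []
    for quad in combinations(range(len(points)), 4):
        sphere = circumsphere(*(points[i] for i in quad), eps=eps)
        if sphere is None:
            continue  # flat tetrahedron, not a valid simplex
        center, r2 = sphere
        if any(dist(center, points[j]) ** 2 < r2 - eps
               for j in range(len(points)) if j not in quad):
            continue  # another point lies inside: not a Delaunay simplex
        if max_edge is not None and any(
                dist(points[i], points[j]) > max_edge
                for i, j in combinations(quad, 2)):
            continue  # edge-length cutoff
        simplices.append(quad)
    return simplices


# Toy example (hypothetical coordinates, in angstroms): four "residues" at the
# corners of a tetrahedron plus one interior "residue".
ca = [(0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4), (1, 1, 1)]
print(delaunay_tetrahedra(ca))
# → [(0, 1, 2, 4), (0, 1, 3, 4), (0, 2, 3, 4), (1, 2, 3, 4)]
```

Each returned index quadruplet is one simplex, that is, one quadruplet of nearest-neighbor residues. The outer tetrahedron (0, 1, 2, 3) is rejected because the interior point lies inside its circumsphere, and with `max_edge=5.0` every simplex is dropped because each contains an edge of about 5.66 Å.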
Delaunay Tessellation of Interleukin-3 (IL-3)
• Ribbon diagram (left) from PDB file 1jli (112 residues, positions 14–125)
• Each amino acid residue is represented by its Cα point in 3D space
• Tessellation of the 112 Cα points (right) is performed using a 12 Å edge-length cutoff, so that only “true” residue quadruplet interactions are retained

Four-Body Statistical Potential
• Training set: nearly 1,400 diverse high-resolution X-ray structures from the PDB (e.g., 1bniA barnase, 3lzm T4 lysozyme, 1rtjA HIV-1 RT, 1efaB lac repressor)
• Tessellate each structure, pool together all simplices from the tessellations, and compute the observed frequencies of simplicial residue quadruplets

Computational Mutagenesis
• Example: in the IL-3 tessellation, residue D21 (large Cα point) participates in 14 simplices and has 11 neighbors
• A substitution at D21 (e.g., the D21S mutant) produces environmental change (EC) scores at D21 and its neighbors; together these form the residual profile vector Rmut of the mutant
• Residual score of the D21S mutant = EC21, the EC score at the mutated position

IL-3 Experimental Data: Structure–Function Relationship
[Figure: IL-3 experimental data, structure–function relationship]

Feature Vectors for IL-3 Mutants
• For an IL-3 mutation at position N, nonzero EC scores in the residual profile vector Rmut occur only at N and its structural neighbors
• Every position has at least 6 neighbors, which can be ordered by Euclidean distance from position N (tessellation edge lengths)
• So, create a new 7D vector: the residual score (EC score at N) and the EC scores of the 6 closest neighbors (ordered by distance from N)
• 20 additional features: position number N, wt and replacement residues, residues at the neighbor positions, primary sequence location of the neighbors relative to N, mean tetrahedrality and volume of the simplices using N, secondary structure at N, tessellation-defined depth of N, and number of surface contacts
• Total: each IL-3 mutant is represented as a 27D feature vector (7 + 20)

Supervised Classification (unaffected/affected)
• Algorithm: random forest (RF); training set: 630 IL-3 mutants
• Testing: tenfold cross-validation (10-fold CV), leave-one-out CV (LOOCV), and random split (2/3 for training, 1/3 for prediction)
• Evaluation of performance:
  • Overall accuracy, or proportion of correct predictions: Q
  • Balanced accuracy rate: BAR = 1 – BER, where BER is the balanced error rate
  • Matthews correlation coefficient: MCC
  • Area under the ROC curve: AUC

Statistical Significance of Predictions
[Figure: probability density (y-axis) of RF 10-fold CV performance (x-axis) for MCC and BAR over 1000 random class label permutations, compared with the values obtained with the original class labels: MCC = 0.54, BAR = 0.77]

Application: Predict Activity of Remaining IL-3 Mutants
• The model is used to predict the activity of the IL-3 mutants not in the experimental dataset (2128 – 630 = 1498 mutants)

Conclusion and Future Directions
• The computational mutagenesis procedure effectively elucidates the IL-3 structure–function relationship (via residual scores)
• A random forest model predicting the effect of any single residue mutation on IL-3 activity was developed using attributes based on:
  • computational geometry (Delaunay tessellation of the IL-3 structure)
  • computational mutagenesis (EC scores of residual profile vectors)
• The current work focused on inductive learning; a future project could apply transductive learning to predict the unknown mutants
• The techniques can be applied to any similar experimental protein mutant dataset, which motivates robust wet-lab collaborations
• Contact: [email protected]
• Slides available at: http://binf.gmu.edu/mmasso
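The performance measures listed on the Supervised Classification slide (Q, BAR = 1 – BER, MCC) and the class-label permutation test on the Statistical Significance slide can be sketched in plain Python. This is a minimal illustration, not the author's code. The function names, the choice of “affected” as the positive class, the add-one p-value estimate, and the `cv_predict` callable (which would wrap the random forest and 10-fold CV) are all assumptions. AUC is omitted because it needs predicted class probabilities rather than hard labels.

```python
import math
import random


def confusion(y_true, y_pred, positive="affected"):
    """Count TP, TN, FP, FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def performance(y_true, y_pred, positive="affected"):
    """Q, BAR = 1 - BER, and MCC; assumes both classes occur in y_true."""
    tp, tn, fp, fn = confusion(y_true, y_pred, positive)
    q = (tp + tn) / (tp + tn + fp + fn)
    # BER: mean of the two per-class error rates
    ber = 0.5 * (fn / (tp + fn) + fp / (tn + fp))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Q": q, "BAR": 1 - ber, "MCC": mcc}


def label_permutation_test(y, cv_predict, metric="MCC", n_perm=1000, seed=0,
                           positive="affected"):
    """Compare a score on the true labels with scores on shuffled labels.

    `cv_predict(labels)` is a hypothetical callable that retrains the model
    (e.g. an RF under 10-fold CV) on the given labels and returns its
    cross-validated predictions in the same order.
    """
    observed = performance(y, cv_predict(y), positive)[metric]
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # keeps class sizes, breaks the feature-label link
        null.append(performance(shuffled, cv_predict(shuffled), positive)[metric])
    # Add-one estimate, so the p-value is never reported as exactly zero
    p_value = (1 + sum(s >= observed for s in null)) / (1 + n_perm)
    return observed, p_value


# Worked example with hypothetical labels: TP = 3, FN = 1, TN = 5, FP = 1
y_true = ["affected"] * 4 + ["unaffected"] * 6
y_pred = ["affected"] * 3 + ["unaffected"] * 6 + ["affected"]
print(performance(y_true, y_pred))  # Q = 0.8, BAR ≈ 0.792, MCC ≈ 0.583
```

Shuffling preserves the class sizes (400 unaffected and 230 affected for the IL-3 dataset), so the permuted scores describe chance-level performance at that class balance. The original-label values on the slide (MCC 0.54, BAR 0.77) are judged against a distribution of this kind.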