Prediction of amyloid aggregation in vivo
by Mattia Belli, Matteo Ramazzotti and Fabrizio Chiti
SUPPLEMENTARY INFORMATION
Supplementary Table 1: The modus operandi of the algorithms listed in Table 1
Name
Reference
Modus Operandi
Chiti and Dobson
Chiti et al, 2003
It is based on an empirical equation that yields the change in the rate of protofibril
formation of an unstructured peptide or protein following mutation, as a function of
the change, at the site of mutation, in both hydrophobicity and propensity to convert
from α-helical to β-sheet structure of the chain as well as the change in net charge of
the entire protein.
TANGO
Fernandez-Escamilla et al, 2004
TANGO is a statistical mechanics algorithm able to predict β-sheet aggregation of
proteins (not amyloid formation, although a correlation exists between the two
phenomena). While scanning a given amino acid sequence, TANGO evaluates the
probability, for each residue, of adopting one of the major conformational states,
including the β-aggregate, by estimating the energy from statistical and empirical
considerations and assuming that in β-aggregates the core regions are buried. The
algorithm considers protein stability and intrinsic factors of the polypeptide chain
(hydrophobicity, electrostatic interactions, hydrogen-bonding contributions,
structural conformation propensities), as well as extrinsic factors (pH, protein
concentration, ionic strength, TFE concentration). According to TANGO, a peptide
segment has β-aggregation tendency if it includes at least five consecutive residues
with a probability to populate the β-aggregate state higher than 5% per residue.
TANGO was thus the first algorithm proposed to predict, in addition to the effect of
a given mutation in a given sequence, the sequence segments that promote
aggregation and form the core regions in the resulting β-aggregates.
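The segment criterion quoted above (at least five consecutive residues above 5%) can be illustrated with a short sketch. This is not TANGO code: the function name and the per-residue percentages below are invented for illustration, and only the run-length rule is taken from the description.

```python
def aggregating_segments(probabilities, threshold=5.0, min_length=5):
    """Return (start, end) pairs (0-based, end exclusive) for every run of at
    least `min_length` consecutive residues whose value exceeds `threshold`."""
    segments = []
    run_start = None
    for i, p in enumerate(probabilities):
        if p > threshold:
            if run_start is None:
                run_start = i  # a new run begins here
        else:
            if run_start is not None and i - run_start >= min_length:
                segments.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(probabilities) - run_start >= min_length:
        segments.append((run_start, len(probabilities)))
    return segments


# Hypothetical per-residue beta-aggregation percentages (invented values).
profile = [0.0, 1.2, 6.0, 8.5, 40.1, 55.0, 12.3, 2.0, 7.0, 9.0, 0.5]
print(aggregating_segments(profile))
```

In this toy profile only the first run is long enough to qualify; the two-residue run near the end is ignored.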
DuBay et al
DuBay et al, 2004
It extends the Chiti and Dobson equation to predict the absolute rate of fibril
formation (elongation phase) of polypeptide chains from fully or partially unfolded
states under different conditions and without any experimental information on the
regions most sensitive to aggregation. It accounts for extrinsic factors such as pH,
ionic strength and peptide concentration, and replaces the secondary structure
propensity factor with another one that accounts for the presence of patterns of
alternating hydrophobic and hydrophilic residues (PATs), considered to be ideal for
adopting β-sheet structure. The algorithm uses the solution conditions and the
sequence as input and yields the rate constant for aggregation (elongation phase
when a lag phase is present) after calculating the charge and average hydrophobicity
of the sequence and counting the number and lengths of the PATs.
Pawar et al.
Pawar et al, 2005
This algorithm identifies the regions of unstructured peptides or naturally unfolded
proteins that are most important for promoting the formation of amyloid aggregates.
It uses the DuBay et al. algorithm, but considers only the intrinsic factors (charge,
hydrophobicity and PATs) to calculate the amyloid aggregation propensities for all
the naturally occurring amino acid residues. While the algorithm scans a polypeptide
sequence, it assigns to each residue i the average aggregation propensity of the
residues within a window of a few residues centred on the residue i and generates the
aggregation propensity profile (aggregation propensity plotted versus the residue
number). Aggregation propensities are standardized by calculating the standard
scores (Z-scores) relative to a reference set of random polypeptides with the same
length as the analysed sequence and with a residue composition related to the amino
acid frequencies in the Swiss-Prot database. Aggregation-prone regions can thus be
identified as those presenting consecutive residues with scores higher than a threshold
of 1 (1 std. dev. above the mean aggregation propensity of random sequences).
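The window averaging and Z-score standardization described above can be sketched as follows. This is only an illustration under stated assumptions, not the Pawar et al. implementation: the window width of 7, the number of random sequences, the pooling of all windowed scores to obtain the reference mean and standard deviation, and the toy scale and frequencies are all hypothetical choices.

```python
import random
import statistics


def windowed_profile(sequence, scale, window=7):
    """Assign to each residue the mean intrinsic score of a window centred on
    it (the window is truncated at the sequence ends)."""
    half = window // 2
    raw = [scale[aa] for aa in sequence]
    profile = []
    for i in range(len(raw)):
        chunk = raw[max(0, i - half): i + half + 1]
        profile.append(sum(chunk) / len(chunk))
    return profile


def z_score_profile(sequence, scale, frequencies, window=7, n_random=200, seed=0):
    """Standardize the profile against random sequences of the same length,
    drawn with the given residue frequencies (assumption: scores are pooled)."""
    rng = random.Random(seed)
    residues = list(frequencies)
    weights = [frequencies[aa] for aa in residues]
    pooled = []
    for _ in range(n_random):
        random_seq = rng.choices(residues, weights=weights, k=len(sequence))
        pooled.extend(windowed_profile(random_seq, scale, window))
    mu = statistics.fmean(pooled)
    sigma = statistics.pstdev(pooled)
    return [(p - mu) / sigma for p in windowed_profile(sequence, scale, window)]


# Toy three-letter alphabet with invented scores and frequencies.
toy_scale = {"V": 1.0, "G": 0.0, "K": -1.0}
toy_freq = {"V": 1.0, "G": 1.0, "K": 1.0}
z = z_score_profile("VVVVKKKK", toy_scale, toy_freq, window=3)
hot = [i for i, value in enumerate(z) if value > 1.0]  # residues above the threshold of 1
print(hot)
```

A real application would substitute the published intrinsic propensity scale and database-derived residue frequencies for the toy dictionaries.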
Tartaglia et al.
Tartaglia et al, 2004; Tartaglia et al, 2005
Like the algorithm from Chiti and Dobson, it predicts the change in the rate of
protofibril formation of unstructured peptides or proteins following mutation. In
addition to charge and β-propensity, the algorithm also considers water-accessible
surface area, dipole moment and π-stacking interaction, to describe the aggregation
propensity of the 20 naturally occurring amino acid residues. Moreover,
experimentally determined coefficients were avoided with the aim of generalizing the
model as much as possible. The method was improved in 2005 to predict the
absolute aggregation propensity and the β-aggregating regions, together with their
preference to adopt a parallel or antiparallel configuration within fibrils. The new
method takes into account mainly aromaticity (π-stacking), β-propensity and charge,
but also water-accessible surface area, water solubility and extrinsic factors such as
protein concentration and temperature.
Zyggregator
Tartaglia et al, 2008
Zyggregator is a development of the Pawar et al. algorithm. It is able to determine
the aggregation propensity profile of a given polypeptide in any conformational
state. The algorithm predicts which regions of the sequence aggregate into fibrillar
or proto-fibrillar structures from disordered or folded states under physiological
conditions. It calculates the aggregation propensity for each residue relying on the
same intrinsic properties of the previous model plus an additional factor accounting
for the presence of gatekeeper residues, i.e. residues that prevent aggregation and
usually flank aggregation-prone regions. The parametric nature of Zyggregator
allows re-calibration for different purposes (e.g. prediction under non-physiological
conditions or prediction of protofibrillar species as opposed to fibrils).
Finally, in combination with a local stability score derived from the CamP method
(Tartaglia et al, 2007), Zyggregator can take into account the influence
of secondary and tertiary structure formation within folded states on the aggregation
propensity of individual amino acid residues. This allows the generation of an
aggregation propensity profile where the contribution of each residue is weighted by
the level of structure formation. Zyggregator is thus highly versatile and adaptable to
various conditions and purposes.
SALSA
Zibaee et al, 2007
Simple ALgorithm for Sliding Averages (SALSA) was developed to locate regions
with high propensity for β-strand structure (fibrillogenic hotspots) within
polypeptide sequences, assuming a strong correlation between β-strand propensity
and formation of fibrillar aggregates. It calculates a mean β-strand propensity (MβP)
for each residue in a polypeptide. Several sliding windows of different lengths
containing a given residue are evaluated, their MβP is determined by averaging Chou
and Fasman secondary structure propensities, and those with an MβP score below 1.2
are discarded. Finally, the scores of all remaining windows are summed to obtain a specific
score per residue.
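The sliding-average procedure described above can be sketched as follows. This is a hedged illustration rather than the original SALSA software: the range of window lengths is an assumed default, and the two-letter propensity table is invented (a real run would use the full Chou and Fasman β-strand scale).

```python
def salsa_like_profile(sequence, propensity, window_lengths=range(4, 21), cutoff=1.2):
    """Sum, for each residue, the mean beta-strand propensity of every window
    that contains it and whose mean is at least `cutoff`."""
    scores = [0.0] * len(sequence)
    for length in window_lengths:
        for start in range(len(sequence) - length + 1):
            window = sequence[start:start + length]
            mean_bp = sum(propensity[aa] for aa in window) / length
            if mean_bp < cutoff:
                continue  # windows below the cutoff are discarded
            for i in range(start, start + length):
                scores[i] += mean_bp
    return scores


# Invented propensities for a two-letter toy alphabet.
toy_propensity = {"V": 1.7, "G": 0.5}
print(salsa_like_profile("GGVVVVGG", toy_propensity, window_lengths=[4]))
```

With a single window length of 4, only the windows dominated by the high-propensity residue survive the cutoff, so the score peaks in the centre of the toy sequence and is zero at both ends.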
AGGRESCAN
Conchillo-Solé et al, 2007
It predicts the aggregation-prone regions in polypeptide sequences without reference
to a particular type of aggregate morphology. By replacing residue
19 of the amyloid β peptide (Aβ42) with all possible natural amino acids and by
estimating the aggregation propensity of each of the resulting 20 variants through a
GFP reporter after expression in E. coli, a scale of aggregation propensity for the 20
naturally occurring residues was derived. AGGRESCAN uses a sliding window
procedure to assign to each residue a score by averaging the values, taken from the
new scale, of the surrounding residues. From the resulting aggregation propensity
profile a set of mathematical descriptors is provided to help identify the
aggregation hot spots. It is the first predictive method based entirely on empirical
information in the cellular context and provides predictions of aggregation hot spots
of protein sequences and of mutational effects in vitro.
Net-CSSP
Yoon & Welsh, 2004; Yoon et al, 2007
It is the first method derived from a structural analysis of folded proteins. It is based
on the observation that native β-strands and α-helices occur more frequently in
regions with high and low numbers of tertiary contacts (TCs), respectively, and that
some sequences having α-helical or random coil conformations in native folds can
form β-strands and promote amyloid fibril formation under certain conditions.
Amyloidogenic segments in polypeptide sequences can be identified by searching for
regions adopting a non-β-strand conformation within folded proteins, yet having
high values of TCs, namely a hidden β-strand propensity (HβP). An early version from
2004 was based on the average number of TCs for each amino acid, calculated from
the SCOP20 dataset. Later implementations from 2007 (Net-CSSP) rely on
pairwise potential energy and use artificial neural networks to improve the prediction
capability.
PASTA
Trovato et al, 2006
Prediction of Amyloid STructure Aggregation (PASTA) predicts the regions of
polypeptide sequence involved in the formation of ordered cross-β structure. It is
based on the assumption that β-strands involved in fibril formation adopt β-pairings
with minimum energies and with a preference for parallel or anti-parallel in-register
arrangement. A dataset of globular proteins with strictly defined secondary
structures was investigated in order to calculate the pairing energies for each pair of
residues facing one another on parallel or antiparallel neighbouring strands within a
β-sheet. Assuming that amyloid fibrils originate from the stacking of β-strands
belonging to different polypeptide molecules with identical sequence, PASTA scans
an input polypeptide sequence determining both parallel and antiparallel pairing
energies for each possible stretch of a given length. Finally, the predicted β-pairing
involved in aggregation and the related orientation correspond to those with the
lowest pairing energy. Like other algorithms, PASTA identifies aggregation-prone
regions. However, unlike most other methods, it also aims to determine
whether the identified β-strands in the fibrils adopt a parallel or antiparallel
orientation.
FoldAmyloid
Galzitskaya et al, 2006
It is a software tool that aims to predict the amyloid fibril-forming regions as well as
the intrinsically disordered regions of polypeptide sequences through the estimation
of the mean packing density. Investigating the SCOP database, the observed mean
packing density for each of the 20 amino acid residues was derived, i.e. the mean
number of “close” residues around each of the 20 amino acid residues (where
“close” means that any pair of the heavy atoms is within a distance of 8 Å). A
polypeptide segment within a protein sequence appears amyloidogenic if it includes
more than 5 consecutive residues with a strong packing density; it is considered
intrinsically disordered when more than 11 consecutive residues show a weak
packing density.
3D profile
Thompson et al, 2006
This work exploited the available crystallographic structural details of the amyloid
fibril-forming hexapeptide NNQQNY to predict the regions of polypeptide
sequences forming amyloid fibrils. The crystal structure of NNQQNY was used to
create in silico a collection of near-native templates, or 3D profile. The various
templates differ by small atomic displacements along the 3 orthogonal axes. The
algorithm scans a given sequence by sliding a window of 6 residues, threading the
resulting hexapeptide onto each of the templates from the 3D profile and then
evaluating the energetic fit by using the ROSETTADESIGN software (Kuhlman &
Baker, 2000). Hexapeptides that yield a minimum energy score lower than a defined
threshold have the potential to form amyloid aggregates. The 3D profile method is
based on the novel idea of analysing the structure of a peptide in a fibril-like
conformation.
BETASCAN
Bryan et al, 2009
Following the evidence that the core structure of amyloid is based on β-strand
pairing, the BETASCAN algorithm was designed to determine the most likely
β-strands within a polypeptide and the preferred β-strand pairings. For an input
sequence, BETASCAN estimates the probability for every possible stretch of length
2-13 residues of pairing with any other stretch of the same length, relying on pairwise
probability tables compiled by the authors. The preference for each pair of amino
acids to be hydrogen bonded in a β-sheet was derived from a selected subset of the
PDB database. All the possible β-strands (starting point and length) are visualized in
a lattice where each node represents the relative propensity score and predicted
β-strands appear as triangular signals. In addition, the most likely β-strand pairs are
depicted in a scatter plot.
Waltz
Maurer-Stroh et al, 2010
It was developed to better distinguish between ordered amyloid aggregates and
amorphous β-sheet aggregates, with the aim of predicting the most likely amyloid-forming
regions. By assuming that peptides involved in amyloid fibril formation
show amino acid preferences in key positions, the authors explored the sequence
diversity of several amyloid hexapeptides from the AmylHex dataset, which includes
examples of hexapeptides positive or negative for fibril formation. This work
allowed the construction of a composite scoring function that includes a position-specific
scoring matrix (PSSM), a set of physicochemical properties and a position-specific
pseudoenergy matrix. This tool allows the identification of amyloid-forming
regions in sequences. According to the authors, the PSSM that summarizes the
amino acid preferences for distinct positions in amyloid-forming hexapeptides is
responsible for the predictive power of Waltz.
Supplementary Section 1
Criteria adopted for estimating the correlation between predicted and observed changes in
amyloid aggregation following mutation.
The available experimental data concerning amyloid aggregation in vivo do not allow the
determination of absolute values of aggregation rate/propensity of different proteins or the
aggregation-promoting regions within them. They only allow a quantitative estimate of the change
in the aggregation rate/propensity of a given protein following a given mutation, with this parameter
estimated repeatedly using a number of single or multiple mutations.
Our review aimed at evaluating the correlation between such experimental data and those
estimated in silico by several algorithms. The algorithms are, however, different in the type of
predictions they provide. Two of the predictive methods provide directly the change in the
aggregation rate upon mutation (Chiti & Dobson and Tartaglia et al.), while the others rather
provide aggregation propensity profiles, that is the aggregation propensity as a function of residue
number along the sequence (see Table 1 in the main text). Nevertheless, even for an algorithm of
this type, it is possible to determine the change in the aggregation rate/propensity upon mutation by
comparing the aggregation propensity profiles generated using the wild-type and mutant sequences.
As described previously, the change in the aggregation propensity following mutation can be
derived by calculating the difference between the aggregation propensity of the mutant sequence
and the aggregation propensity of the wild-type sequence (Pawar et al, 2005, see “Determination of
the log(kmut/kwt) profiles” in the Materials and methods section).
Therefore, for the profile-generating algorithms, the change in the aggregation propensity of a
given polypeptide sequence following one or more mutations can be assessed by determining one of
the following:
• Pmut-Pwt, where P is the value of the profile at the site of mutation (Pmut-Pwt corresponds to the
change in the aggregation propensity at the site of mutation);
• Amut-Awt, where A is the average of all the values of the profile (Amut-Awt corresponds to the
change in the average aggregation propensity);
• Smut-Swt, where S is the sum of all the values of the profile (Smut-Swt is the change in the total
aggregation propensity).
Since Amut-Awt appeared to be redundant with respect to Smut-Swt, the former was not considered.
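The three quantities listed above can be written as a minimal sketch operating on two per-residue profiles. The function names and profile values are hypothetical; the only assumption is that the wild-type and mutant profiles have the same length, as is the case for substitutions.

```python
def delta_at_site(profile_wt, profile_mut, site):
    """Pmut-Pwt: change in the profile value at the mutated position (0-based)."""
    return profile_mut[site] - profile_wt[site]


def delta_sum(profile_wt, profile_mut):
    """Smut-Swt: change in the sum of all profile values."""
    return sum(profile_mut) - sum(profile_wt)


def delta_average(profile_wt, profile_mut):
    """Amut-Awt: change in the mean profile value."""
    return sum(profile_mut) / len(profile_mut) - sum(profile_wt) / len(profile_wt)


# Invented profiles for a four-residue peptide mutated at index 1.
wt = [0.1, 0.5, 0.2, 0.0]
mut = [0.1, 0.9, 0.4, 0.0]
print(delta_at_site(wt, mut, 1), delta_sum(wt, mut), delta_average(wt, mut))
```

For profiles of equal length N, Amut-Awt equals (Smut-Swt)/N, so the two quantities differ only by a constant factor and carry the same information in a correlation analysis.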
The 4 datasets of in vivo experimental data considered in our analysis included either amino acid
substitutions at single positions or substitutions at multiple positions. To calculate the effect of
single substitutions on the aggregation propensity using profile-generating algorithms we used
Pmut-Pwt values. To calculate the effect of multiple substitutions we used Smut-Swt values. The algorithms
that directly calculate relative aggregation rates (Chiti & Dobson and Tartaglia et al) directly provide
the effect of single substitutions on the aggregation propensity, whereas we summed the
contributions of single substitutions to calculate the effect of multiple mutations.
The correlation between the change in the aggregation propensity following mutation(s)
predicted by a given algorithm and the corresponding change in the aggregation propensity (or
solubility) observed in vivo was evaluated by applying a linear model. The correlation between the
predicted change in the aggregation propensity and the change in solubility or in relative GFP
fluorescence was observed to be linear in previous studies (for example Figure 5 in Winkelmann et
al 2010 and Figure 4 in de Groot et al 2006). Even though non-linear or multi-modal trends were
sporadically observed, more complex analyses were not performed.
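A minimal sketch of such a linear analysis is given below, using only the Python standard library. It is not the authors' analysis script: the function name and data points are invented, and the p-value for the slope is not computed because it requires the Student t distribution, which would come from a statistics library.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = slope*x + intercept, Pearson r, and the
    t statistic (n-2 degrees of freedom) for the null hypothesis slope = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    residual = 1.0 - r * r
    if residual > 0:
        t = r * math.sqrt((n - 2) / residual)
    else:
        t = math.copysign(math.inf, r)  # perfect correlation
    return slope, intercept, r, t


# Invented example: predicted change (x) versus observed change (y).
predicted = [0.0, 1.0, 2.0, 3.0]
observed = [0.0, 1.0, 1.0, 3.0]
slope, intercept, r, t = linear_fit(predicted, observed)
print(slope, intercept, abs(r), t)
```

The absolute value of r is printed because the sign of the correlation depends on whether the experimental observable is an aggregation propensity or a solubility.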
Supplementary Section 2
Explanation of the parameters used for each algorithm.
Here we describe for each algorithm the procedure used to obtain the predicted change in the
aggregation propensity upon mutation(s). We will refer to <P> as the aggregation propensity score
at the site of mutation and to <S> as the total aggregation propensity score.
The Chiti & Dobson and Tartaglia et al methods directly provide the predicted change in
the aggregation rate upon mutation, ln(Pwt/Pmut), so further steps were not necessary. The equations
described in the original papers were implemented in two distinct scripts using the Perl
programming language in order to automate the computation.
Net-CSSP estimates three propensities: “helix propensity” (Pα), “beta propensity” (Pβ) and
“coil propensity” (Pcoil). These scores were used to calculate the HβP according to the following
equation (Yoon & Welsh, 2005):

$$H\beta P = \ln\left(\frac{P_{\beta}}{P_{\alpha} + P_{coil}}\right)$$

HβP for individual amino acids (<P>) is
calculated by taking the residue-specific propensities whereas total HβP for the entire sequence
(<S>) is calculated by taking the overall propensities. Net-CSSP is available at the following
address: http://cssp2.sookmyung.ac.kr/.
TANGO provides a “beta aggregation” score for each residue within a sequence besides the
total tendency for β-sheet aggregation (“AGG score”). We used the beta aggregation score at the
site of mutation as the individual residue propensity (<P>) and the AGG score as the total
aggregation propensity (<S>). TANGO is available at the following address: http://tango.crg.es/.
The algorithm from Pawar et al provides a propensity score for each residue within a
sequence. We used the score at the site of mutation as the individual residue propensity (<P>) and
the sum of all the profile scores as the total propensity (<S>). Software implementing the Pawar
algorithm was written by M. Ramazzotti and is available upon request.
The ZipperDB database collects all the aggregation propensity profiles calculated using the
3D profile method. For a given sequence an energy score is provided for each residue, the energy
score being related to the hexapeptide starting at that position. We took the energy score at the site
of mutation as individual residue propensity (<P>) and the sum of all the profile scores as total
propensity (<S>). ZipperDB is available at the following address:
http://services.mbi.ucla.edu/zipperdb/.
PASTA gives an aggregation propensity profile showing the “normalized per-residue
probability h(k)”. The score at the site of mutation was used as the individual residue propensity
(<P>) and the sum of all the profile scores as the total propensity (<S>). PASTA is available at
the following address: http://protein.cribi.unipd.it/pasta/.
The FoldAmyloid server calculates profiles with different scales; we chose the default scale
(“expected number of contacts 8Å”). From the resulting profile we took the value at the site of
mutation for individual residue propensity (<P>) and the sum for the total propensity (<S>).
FoldAmyloid is available at the following address: http://antares.protres.ru/fold-amyloid/oga.cgi.
AGGRESCAN provides an aggregation profile in addition to several mathematical
descriptors. From the profile we used the a4v score at the site of mutation as individual residue
propensity (<P>) and the Na4vSS score as the total propensity (<S>). AGGRESCAN is available at
the following address: http://bioinf.uab.es/aggrescan/.
The SALSA algorithm computes the aggregation propensity profile of a given sequence.
The residue score at the site of mutation was used as the individual residue propensity (<P>) and
the sum of all the profile scores as the total propensity (<S>). Since the original software is not
available online, the algorithm was re-implemented using the Perl programming language.
Zyggregator computes the aggregation profile of a given sequence. The residue score at the
site of mutation was used as the individual residue propensity (<P>) and the sum of all the profile
scores as the total propensity (<S>). Zyggregator is available at the following address:
http://www-vendruscolo.ch.cam.ac.uk/zyggregator.php/.
For the Waltz algorithm the “detailed with graphics” output was chosen with a threshold of
0 in order to obtain complete profiles. The value on the graph corresponding to the residue at the
site of mutation was used as individual residue propensity (<P>) and the sum of all the values as the
total propensity (<S>). Waltz is available at the following address: http://waltz.vub.ac.be/.
Experimental conditions such as temperature, pH and ionic strength were set, when
requested by the algorithms, according to the information reported by the original experimental
papers, while other parameters were left at their default values. The software we implemented in-house
for the algorithms by Chiti & Dobson, Tartaglia et al, Pawar et al and SALSA was checked before
use to ensure that it reproduced the results published in the original papers.
Supplementary References
Bryan AW, Menke M, Cowen LJ, Lindquist SL, Berger B (2009) BETASCAN: probable β-amyloids
identified by pairwise probabilistic analysis. PLoS Comput Biol 5: e1000333.
Chiti F, Stefani M, Taddei N, Ramponi G, Dobson CM (2003) Rationalization of the effects of
mutations on peptide and protein aggregation rates. Nature 424: 805-8.
Conchillo-Solé O, de Groot NS, Avilés FX, Vendrell J, Daura X, Ventura S (2007) AGGRESCAN:
a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides. BMC
Bioinformatics 8: 65.
de Groot NS, Aviles FX, Vendrell J, Ventura S (2006) Mutagenesis of the central hydrophobic
cluster in Abeta42 Alzheimer's peptide. Side-chain properties correlate with aggregation
propensities. FEBS J 273: 658-68.
DuBay KF, Pawar AP, Chiti F, Zurdo J, Dobson CM, Vendruscolo M (2004) Prediction of the
absolute aggregation rates of amyloidogenic polypeptide chains. J Mol Biol 341: 1317-26.
Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L (2004) Prediction of
sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol 22:
1302-6.
Galzitskaya OV, Garbuzynskiy SO, Lobanov MY (2006) Prediction of amyloidogenic and
disordered regions in protein chains. PLoS Comput Biol 2: e177.
Kuhlman B, Baker D (2000) Native protein sequences are close to optimal for their structures. Proc
Natl Acad Sci U S A 97: 10383-8.
Maurer-Stroh S et al (2010) Exploring the sequence determinants of amyloid structure using
position-specific scoring matrices. Nat Methods 7: 237-42.
Monsellier E, Ramazzotti M, de Laureto PP, Tartaglia GG, Taddei N, Fontana A, Vendruscolo
M, Chiti F (2007) The distribution of residues in a polypeptide sequence is a determinant of
aggregation optimized by evolution. Biophys J 93: 4382-91.
Pawar AP, DuBay KF, Zurdo J, Chiti F, Vendruscolo M, Dobson CM (2005) Prediction of
"aggregation-prone" and "aggregation-susceptible" regions in proteins associated with
neurodegenerative diseases. J Mol Biol 350: 379-92.
Tartaglia GG, Cavalli A, Pellarin R, Caflisch A (2004) The role of aromaticity, exposed surface,
and dipole moment in determining protein aggregation rates. Protein Sci 13: 1939-41.
Tartaglia GG, Cavalli A, Pellarin R, Caflisch A (2005) Prediction of aggregation rate and
aggregation-prone segments in polypeptide sequences. Protein Sci 14: 2723-34.
Tartaglia GG, Cavalli A, Vendruscolo M (2007) Prediction of local structural stabilities of proteins
from their amino acid sequences. Structure 15: 139-43.
Tartaglia GG, Vendruscolo M (2008) The Zyggregator method for predicting protein aggregation
propensities. Chem Soc Rev 37: 1395-401.
Thompson MJ, Sievers SA, Karanicolas J, Ivanova MI, Baker D, Eisenberg D (2006) The 3D
profile method for identifying fibril-forming segments of proteins. Proc Natl Acad Sci U S A
103: 4074-8.
Trovato A, Chiti F, Maritan A, Seno F (2006) Insight into the structure of amyloid fibrils from the
analysis of globular proteins. PLoS Comput Biol 2: e170.
Winkelmann J, Calloni G, Campioni S, Mannini B, Taddei N, Chiti F (2010) Low-level expression
of a folding-incompetent protein in Escherichia coli: search for the molecular determinants of
protein aggregation in vivo. J Mol Biol 398: 600-13.
Yoon S, Welsh WJ (2004) Detecting hidden sequence propensity for amyloid fibril formation.
Protein Sci 13: 2149-60.
Yoon S, Welsh WJ (2005) Rapid assessment of contact-dependent secondary structure propensity:
relevance to amyloidogenic sequences. Proteins 60: 110-7.
Yoon S, Welsh WJ, Jung H, Yoo YD (2007) CSSP2: an improved method for predicting
contact-dependent secondary structure propensity. Comput Biol Chem 31: 373-7.
Zibaee S, Makin OS, Goedert M, Serpell LC (2007) A simple algorithm locates β-strands in the
amyloid fibril core of α-synuclein, Aβ, and tau using the amino acid sequence alone. Protein Sci
16: 906-18.
Figure S1. Predicted change in the aggregation propensity upon mutation versus experimental (in
vivo) solubility for Aβ42 variants. Each graph reports the predicted change in the aggregation
propensity upon mutation (calculated according to the algorithm indicated in each graph and the
procedure described in the suppl. info.) versus experimental relative fluorescence of GFP fused to
Aβ42 mutants, as described (Wurth et al 2002). Scales on the y-axis have been adjusted to show, for
each plot, the full dataset. The lines represent the best fits of the data to linear functions. For each
plot the name of the algorithm is reported, as well as the absolute value of the Pearson linear
correlation coefficient (r) and the statistical significance of the slope (p).
Figure S2. Predicted change in the aggregation propensity upon mutation versus experimental (in
vivo) solubility in E. coli cytosol for Aβ42 variants. Each graph reports the predicted change in the
aggregation propensity upon mutation (according to the algorithm indicated in each graph and
following the procedure described in the suppl. info.) versus experimental relative fluorescence of
GFP fused to Aβ42 mutants, as described (Kim et al 2006). Scales on the y-axis have been adjusted
to show, for each plot, the full dataset. The lines represent the best fits of the data to linear
functions. For each plot the name of the algorithm is reported, as well as the absolute value of the
Pearson linear correlation coefficient (r) and the statistical significance of the slope (p).