Appendix 3
Assessment of the effects of the observed variants
We assessed the likely effects of the observed variants as follows:
For the previously reported variants, allele frequencies observed in our sample were compared with published allele frequencies using a contingency-table chi-squared test (Pearson χ²) in the software CONTING ver. 2.61 (1).
Four bioinformatic algorithms were used to perform splice site predictions:
NNsplice (http://www.fruitfly.org/seq_tools/splice.html) analyses donor and acceptor sites separately using neural-network recognisers. Each predicted splice site is given a score from 0.0 to 1.0, and the cut-off threshold can be adjusted to exclude low-scoring predictions (2).
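One way to use NNsplice output when judging a variant is to score the reference and the variant sequence separately and flag splice sites whose score crosses the chosen cut-off. The sketch below assumes the per-site scores have already been collected from the web tool; the `SiteScore` record, the function name, the scores and the cut-off and minimum-change values in the usage lines are hypothetical choices, not NNsplice defaults.

```python
from dataclasses import dataclass


@dataclass
class SiteScore:
    """NNsplice-style scores (0.0 to 1.0) for one candidate splice site."""
    position: int     # position of the site in the analysed sequence
    kind: str         # "donor" or "acceptor"
    ref_score: float  # score with the reference allele (0.0 if not predicted)
    alt_score: float  # score with the variant allele (0.0 if not predicted)


def flag_splice_changes(sites, cutoff, min_delta):
    """Return (site, effect) pairs for sites affected by the variant.

    A site is 'lost' or 'gained' when the variant moves its score across the
    cut-off, and 'score changed' when it stays above the cut-off but shifts by
    at least min_delta. Sites below the cut-off for both alleles are ignored.
    """
    flagged = []
    for site in sites:
        ref_pass = site.ref_score >= cutoff
        alt_pass = site.alt_score >= cutoff
        if ref_pass and not alt_pass:
            flagged.append((site, "site lost"))
        elif alt_pass and not ref_pass:
            flagged.append((site, "site gained"))
        elif ref_pass and abs(site.alt_score - site.ref_score) >= min_delta:
            flagged.append((site, "score changed"))
    return flagged


# Hypothetical scores for illustration only.
sites = [
    SiteScore(120, "donor", ref_score=0.95, alt_score=0.20),
    SiteScore(300, "acceptor", ref_score=0.10, alt_score=0.85),
    SiteScore(450, "donor", ref_score=0.90, alt_score=0.88),
]
for site, effect in flag_splice_changes(sites, cutoff=0.4, min_delta=0.1):
    print(site.kind, site.position, effect)
```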
SpliceView (http://zeus2.itb.cnr.it/~webgene/wwwspliceview_ex.html) uses a classification approach based on sets of consensus sequences constructed around splice sites. Sequences of 9 bp and 20 bp are considered for predicting donor and acceptor sites, respectively. Predictive scores range from 0 to 100 (3).
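To illustrate the consensus-matching idea, the sketch below scores 9-bp donor windows (3 exonic and 6 intronic positions around a GT dinucleotide) by their percentage agreement with the widely cited donor consensus MAG|GTRAGT, giving a score from 0 to 100. This is a simplified stand-in, not SpliceView's actual classification procedure: the function names and the 75% reporting threshold are hypothetical, and the 20-bp acceptor windows are not covered.

```python
# IUPAC codes used in the donor consensus MAG|GTRAGT (M = A or C, R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

DONOR_CONSENSUS = "MAGGTRAGT"  # 3 exonic + 6 intronic positions, 9 bp in total


def consensus_score(window, consensus=DONOR_CONSENSUS):
    """Percentage (0 to 100) of positions in window compatible with the consensus."""
    if len(window) != len(consensus):
        raise ValueError("window and consensus must have the same length")
    matches = sum(base in IUPAC[code] for base, code in zip(window.upper(), consensus))
    return 100.0 * matches / len(consensus)


def scan_donor_sites(seq, min_score=75.0):
    """Score every GT dinucleotide with 3 exonic bases before it and 4 intronic bases after it.

    Returns (index of the G, 9-bp window, score) for windows scoring >= min_score.
    """
    seq = seq.upper()
    hits = []
    for i in range(3, len(seq) - 5):
        if seq[i:i + 2] == "GT":
            window = seq[i - 3:i + 6]
            score = consensus_score(window)
            if score >= min_score:
                hits.append((i, window, score))
    return hits


print(scan_donor_sites("AAACAGGTAAGTCCC"))
```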
SplicePort (http://spliceport.cs.umd.edu/) recognises sequence features by combining candidate feature construction with relevant feature selection, using a feature generation algorithm (FGA). By default, the FGA score threshold is set at 0.0, which corresponds to a sensitivity of 88.5% for donor sites and 88.8% for acceptor sites. This threshold can be altered depending on the stringency required for predictions, and the sensitivity values are recalculated accordingly (4).
Human Splicing Finder (HSF; www.umd.be/HSF/) is based on information theory and uses a weight matrix model built from the nucleotide frequencies at each position of the splice site. Consensus values (from 0 to 100) are generated for each splice site (5).
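A common way to turn a position weight matrix into a 0 to 100 consensus value is the normalisation of Shapiro and Senapathy: the summed frequencies of the observed bases are rescaled between the lowest and highest sums the matrix allows, CV = 100 × (S − S_min) / (S_max − S_min). The sketch below assumes this normalisation; HSF's own matrices and any further adjustments it applies are not reproduced, and the three-position matrix is a toy example.

```python
def consensus_value(site, matrix):
    """Shapiro-Senapathy-style consensus value (0 to 100) of a candidate site.

    matrix[i] maps each nucleotide to its frequency at position i among aligned
    true splice sites; the observed sum is rescaled between the lowest and the
    highest sums the matrix allows.
    """
    if len(site) != len(matrix):
        raise ValueError("site length must match the matrix length")
    observed = sum(column[base] for base, column in zip(site.upper(), matrix))
    highest = sum(max(column.values()) for column in matrix)
    lowest = sum(min(column.values()) for column in matrix)
    return 100.0 * (observed - lowest) / (highest - lowest)


# Toy three-position matrix (invented frequencies, each column sums to 1).
TOY_MATRIX = [
    {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    {"A": 0.0, "C": 0.0, "G": 1.0, "T": 0.0},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
for candidate in ("AGT", "GGT", "CAC"):
    print(candidate, round(consensus_value(candidate, TOY_MATRIX), 1))
```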
Non-synonymous variants were assessed using the SNP prediction tools SIFT, PolyPhen-2, SNPs3D, PMut and SNPs&GO. SIFT (6) is based on the principles of protein evolution: it uses sequence homology to predict the degree of amino acid conservation at each position of a protein sequence, which correlates with the extent to which an amino acid substitution is tolerated (i.e. substitutions at highly conserved positions are unlikely to be tolerated).
Probability scores in SIFT range from 0 to 1, with a cut-off threshold of 0.05 separating tolerated from intolerant substitutions (≤ 0.05 damaging; > 0.05 tolerated). ‘Median info’ and ‘# Seqs at position’ provide, respectively, a measure of the diversity of the aligned sequences and the number of sequences aligned at the query position, which allows the reliability of each prediction to be estimated. In addition to sequence homology, PolyPhen-2 (7) and SNPs3D (Yue, Melamud and Moult 2006) incorporate protein structure into their prediction models, which may give insights into protein stability and possible links with levels of protein expression. PolyPhen-2 uses a scale from 0 (benign) to 1 (probably damaging) and also reports the number of aligned sequences at the query position. SNPs3D returns positive scores for non-deleterious substitutions and negative scores for deleterious ones; confidence increases with the absolute value of the score, and accuracy is significantly higher for scores above 0.5 or below -0.5. PMut (8) computes a neural network (NN) output ranging from 0 to 1 (mutants scoring above 0.5 are classified as pathological), accompanied by a reliability score from 0 (low) to 9 (very reliable). PMut can also perform an ‘alanine scan’ of a protein sequence, which considers only mutations to alanine (Ala). Because such mutations are putatively the least detrimental to protein structure, positions where a mutation to Ala is nonetheless predicted to be pathological with high reliability are thought to be ‘sensitive’ positions in the protein. The data generated by these web interfaces were assembled to help understand the possible effects of genetic variation on the levels of expression of PSA.

Calabrese et al. 2009 (9) argue that the limited availability of three-dimensional structural data for proteins hinders predictions that incorporate structural information. Instead, their tool SNPs&GO uses gene ontology (GO) terms, which describe gene products in a standardised manner in terms of biological processes (BP), molecular functions (MF) and cellular components (CC). GO terms, together with local sequence-derived information, sequence alignments and predictions from the PANTHER algorithm (Protein ANalysis THrough Evolutionary Relationships), are intended to improve the reliability of functional predictions for coding variants. Using SNPs&GO, the PSA (KLK3) GO terms associated with our non-synonymous variants include proteolysis (GO:0006508) and negative regulation of angiogenesis (GO:0016525) [BP]; serine-type endopeptidase activity (GO:0004252) [MF]; and extracellular region (GO:0005576) [CC]. These terms can be accessed through AmiGO [http://amigo.geneontology.org/cgi-bin/amigo/go.cgi] or QuickGO [http://www.ebi.ac.uk/QuickGO/].
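The score thresholds described above can be combined into a per-variant summary of the kind assembled from these web interfaces. The sketch below is hypothetical: the `Predictions` record, the function names and the variant name are invented, SNPs&GO is omitted because its output format is not described here, and the PolyPhen-2 cut-off of 0.5 is an assumption, since the text gives only the two ends of its scale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Predictions:
    """Raw scores for one non-synonymous variant; None means the tool gave no result."""
    variant: str
    sift: Optional[float] = None       # 0 to 1; <= 0.05 damaging
    polyphen2: Optional[float] = None  # 0 (benign) to 1 (probably damaging)
    snps3d: Optional[float] = None     # negative = deleterious, positive = non-deleterious
    pmut: Optional[float] = None       # NN output 0 to 1; > 0.5 pathological


POLYPHEN2_CUTOFF = 0.5  # hypothetical cut-off; not given in the text


def calls(p):
    """Return {tool: True if predicted damaging, else False} for tools that gave a score."""
    result = {}
    if p.sift is not None:
        result["SIFT"] = p.sift <= 0.05
    if p.polyphen2 is not None:
        result["PolyPhen-2"] = p.polyphen2 > POLYPHEN2_CUTOFF
    if p.snps3d is not None:
        result["SNPs3D"] = p.snps3d < 0
    if p.pmut is not None:
        result["PMut"] = p.pmut > 0.5
    return result


def summarise(p):
    """Count damaging calls and note SNPs3D calls with |score| <= 0.5 (lower accuracy)."""
    c = calls(p)
    damaging = sum(c.values())
    note = ""
    if p.snps3d is not None and abs(p.snps3d) <= 0.5:
        note = " (SNPs3D call low confidence)"
    return f"{p.variant}: {damaging}/{len(c)} tools predict damaging{note}"


# Invented scores for a hypothetical variant.
print(summarise(Predictions("variant_A", sift=0.01, polyphen2=0.98, snps3d=-0.3, pmut=0.42)))
```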
The effect of the variants on the secondary structure of PSA was also analysed. The nucleotide and amino acid sequences were aligned, showing both the previously reported variants and the variants found in our work. The alignment was annotated with the protein secondary structure to show in which structural features the amino acid substitutions are located. The secondary structure of PSA was obtained from the Protein Data Bank (PDB); the annotation shown is from the STRIDE algorithm, which assigns protein secondary structure elements from the atomic coordinates of the protein as determined by X-ray crystallography.
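Locating substitutions within a STRIDE-annotated structure amounts to looking up the secondary-structure code at each substituted residue. The sketch below assumes a residue sequence and a per-residue string of STRIDE one-letter codes of equal length; the function name, sequence, structure string and substitutions are toy values, and `offset` is a placeholder for any difference between the numbering used to name the variants and the numbering of the PDB entry.

```python
import re

# One-letter secondary-structure codes used by STRIDE.
STRIDE_CODES = {
    "H": "alpha helix",
    "G": "3-10 helix",
    "I": "pi helix",
    "E": "extended (beta strand)",
    "B": "isolated bridge",
    "T": "turn",
    "C": "coil",
}

# Substitutions written as reference residue, position, variant residue, e.g. "L4P".
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def locate_substitutions(sequence, structure, substitutions, offset=0):
    """Return (substitution, structural element) pairs.

    offset is the number of residues preceding the first residue in the
    sequence/structure strings, so residue pos maps to index pos - 1 - offset.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure strings must be aligned")
    located = []
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        ref, pos = match.group(1), int(match.group(2))
        index = pos - 1 - offset
        if not 0 <= index < len(sequence):
            located.append((sub, "outside the resolved structure"))
            continue
        if sequence[index] != ref:
            raise ValueError(f"{sub}: residue {pos} is {sequence[index]}, not {ref}")
        located.append((sub, STRIDE_CODES.get(structure[index], "unknown")))
    return located


# Toy example (not PSA data).
print(locate_substitutions("MKTLLVAG", "CCHHHHEE", ["L4P", "A7T", "G20A"]))
```

Checking that the reference residue matches before reporting the structural element guards against numbering mismatches, which would otherwise silently place a substitution in the wrong structural feature.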
Reference List
1. Ott J. CONTING ver. 2.61. Utility programs for analysis of genetic linkage. 1988.
2. Reese MG, Eeckman FH, Kulp D, Haussler D. Improved splice site detection in Genie. J Comput Biol 1997;4:311-23.
3. Rogozin IB, Milanesi L. Analysis of donor splice sites in different eukaryotic organisms. J Mol Evol 1997;45:50-9.
4. Dogan RI, Getoor L, Wilbur WJ, Mount SM. SplicePort--an interactive splice-site analysis tool. Nucleic Acids Res 2007;35:W285-91.
5. Desmet FO, Hamroun D, Lalande M, Collod-Beroud G, Claustres M, Beroud C. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res 2009;37:e67.
6. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 2003;31:3812-4.
7. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods 2010;7:248-9.
8. Ferrer-Costa C, Gelpi JL, Zamakola L, Parraga I, de la Cruz X, Orozco M. PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics 2005;21:3176-8.
9. Calabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R. Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat 2009;30:1237-44.