Appendix 3. Assessment of the effects of the observed variants

We assessed the likely effects of the observed variants as follows.

For the previously reported variants, allele frequencies observed in our sample were compared with published allele frequencies by means of a contingency-table chi-squared test (Pearson χ²) using the software CONTING ver. 2.61 (1).

Four bioinformatic algorithms were used to perform splice site predictions:

NNsplice (http://www.fruitfly.org/seq_tools/splice.html) analyses the structure of donor and acceptor sites separately using neural network recognisers. A score from 0.0 to 1.0 is provided for each splice site, and the cut-off threshold can be altered to exclude low-scoring predictions (2).

SpliceView (http://zeus2.itb.cnr.it/~webgene/wwwspliceview_ex.html) constructs sets of consensus sequences around splice sites in a classification approach. Sequences of 9 bp and 20 bp are considered for predicting donor sites and acceptor sites, respectively. Predictive scores range from 0 to 100 (3).

SplicePort (http://spliceport.cs.umd.edu/) recognises sequence features by integrating candidate feature construction with relevant feature selection, using a feature generation algorithm (FGA). By default, sensitivity is set at 88.5% for donor sites and 88.8% for acceptor sites, corresponding to an FGA score threshold of 0.0. This threshold can be altered according to the stringency required for predictions, and the sensitivity values are recalculated accordingly (4).

Human Splicing Finder (HSF; www.umd.be/HSF/) is based on information theory, using a weight matrix model to calculate the nucleotide frequencies at each position in the splice site. A consensus value (from 0 to 100) is generated for each splice site (5).

Non-synonymous variants were assessed using the SNP prediction tools SIFT, PolyPhen-2, SNPs3D, PMUT and SNPs&GO.
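As an illustration of the allele-frequency comparison described at the start of this appendix, the following is a minimal sketch of a Pearson chi-squared test on a 2 x 2 contingency table (two alleles by two samples). It is not the CONTING program: the function names and the allele counts are hypothetical, no continuity correction is applied, and the p-value helper is valid only for one degree of freedom.

```python
import math


def pearson_chi2(table):
    """Return (chi-squared statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column classification.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-squared statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical allele counts (minor allele, major allele); not data from this study.
our_sample = [30, 170]
published = [45, 155]

chi2, dof = pearson_chi2([our_sample, published])
p = p_value_df1(chi2)  # appropriate here because dof == 1
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
```

For tables larger than 2 x 2 the statistic is computed in the same way, but the p-value must be taken from a chi-squared distribution with the corresponding degrees of freedom.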
The SIFT algorithm (6) is based on the principles of protein evolution. It uses sequence homology to predict the degree of amino acid conservation in protein sequences, which correlates with the extent to which an amino acid substitution is tolerated (i.e. a substitution is unlikely to be tolerated at a highly conserved position). Probability scores in SIFT range from 0 to 1, with a cut-off threshold of 0.05 separating intolerant from tolerant substitutions (≤ 0.05 damaging; > 0.05 tolerated). ‘Median info’ and ‘# Seqs at position’ provide, respectively, measures of the diversity and the number of sequences aligned for prediction, which enables the reliability of the analysis to be estimated.

In addition to sequence homology approaches, PolyPhen-2 (7) and SNPs3D (Yue, Melamud and Moult 2006) add protein structure principles to their prediction models, which may give insights into protein stability and possible links with levels of protein expression. PolyPhen-2 uses a scale from 0 (benign) to 1 (probably damaging) and also shows the number of aligned sequences at the query position. SNPs3D returns positive scores for non-deleterious substitutions and negative scores for deleterious substitutions. Confidence increases with the magnitude of the score, with accuracy significantly higher for scores greater than 0.5 or less than -0.5.

PMUT (8) computes a neural network (NN) output that ranges from 0 to 1 (mutants that score above 0.5 are classified as pathological), accompanied by a reliability score from 0 (low) to 9 (very reliable). Furthermore, PMUT has a feature that performs ‘alanine scans’ on a protein sequence, focussing solely on mutations to alanine (Ala). These mutations are putatively the least detrimental to protein structure. Hence, mutations to Ala that are predicted to be pathological with high reliability are thought to be located at ‘sensitive’ positions in the protein.
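The score cut-offs quoted above for SIFT, SNPs3D and PMUT can be collected into a small helper, sketched below. The function names, the example scores, the "neutral" label for PMUT outputs of 0.5 or less, and the "undetermined" label for an SNPs3D score of exactly zero are assumptions made for illustration; PolyPhen-2 is omitted because no cut-off is stated in the text.

```python
from typing import Tuple


def classify_sift(score: float) -> str:
    """SIFT: probabilities at or below 0.05 are damaging, above 0.05 tolerated."""
    return "damaging" if score <= 0.05 else "tolerated"


def classify_pmut(nn_output: float) -> str:
    """PMUT: neural network outputs above 0.5 are classified as pathological."""
    # "neutral" is an assumed label for the remaining outputs.
    return "pathological" if nn_output > 0.5 else "neutral"


def classify_snps3d(score: float) -> Tuple[str, bool]:
    """SNPs3D: negative scores are deleterious, positive scores non-deleterious.

    The second element flags the higher-accuracy range (score > 0.5 or < -0.5).
    """
    if score < 0:
        label = "deleterious"
    elif score > 0:
        label = "non-deleterious"
    else:
        label = "undetermined"  # assumption: the text does not cover a score of zero
    return label, abs(score) > 0.5


# Hypothetical scores for a single non-synonymous variant; not data from this study.
scores = {"sift": 0.02, "pmut": 0.71, "snps3d": -0.8}

summary = {
    "SIFT": classify_sift(scores["sift"]),
    "PMUT": classify_pmut(scores["pmut"]),
    "SNPs3D": classify_snps3d(scores["snps3d"]),
}
for tool, call in summary.items():
    print(tool, call)
```

Keeping each tool's call separate, rather than merging them into a single verdict, preserves the differences in scale and direction between the scoring systems.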
Data generated from these web interfaces were assembled to aid understanding of the possible effects of genetic variation on the levels of expression of PSA.

Calabrese et al. 2009 (9) argue that the limited data available on the three-dimensional structure of proteins hinder predictions that incorporate structural information. Instead, their tool SNPs&GO includes gene ontology (GO) terms, which describe gene products in a standardised manner in terms of biological processes (BP), molecular functions (MF) and cellular components (CC). GO terms, in addition to local sequence-derived information, sequence alignments, and prediction data from the PANTHER algorithm [Protein ANalysis THrough Evolutionary Relationships], are meant to improve the reliability of functional predictions for coding variants. Using SNPs&GO, the PSA (KLK3) GO terms associated with our non-synonymous variants include proteolysis (GO:0006508) and negative regulation of angiogenesis (GO:0016525) [BP]; serine-type endopeptidase activity (GO:0004252) [MF]; and extracellular region (GO:0005576) [CC]. These terms can be accessed through AmiGO [http://amigo.geneontology.org/cgi-bin/amigo/go.cgi] or QuickGO [http://www.ebi.ac.uk/QuickGO/].

The effect of the variants on the secondary structure of PSA was also analysed. The nucleotide and amino acid sequences were aligned, showing both previously reported variants and the variants found in our work. The sequences were then aligned with the protein secondary structure to show in which structural features the amino acid substitutions are located. The secondary structure was obtained from the Protein Data Bank. The annotation shown is from the STRIDE algorithm, which assigns protein secondary structure elements from the atomic coordinates of the protein as defined by X-ray crystallography.

Reference List

1. Ott J. CONTING ver 2.61. Utility programs for analysis of genetic linkage. 1988.
2. Reese MG, Eeckman FH, Kulp D, Haussler D. Improved splice site detection in Genie. J Comput Biol 1997;4:311-23.
3. Rogozin IB, Milanesi L. Analysis of donor splice sites in different eukaryotic organisms. J Mol Evol 1997;45:50-9.
4. Dogan RI, Getoor L, Wilbur WJ, Mount SM. SplicePort--an interactive splice-site analysis tool. Nucleic Acids Res 2007;35:W285-W291.
5. Desmet FO, Hamroun D, Lalande M, Collod-Beroud G, Claustres M, Beroud C. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res 2009;37:e67.
6. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 2003;31:3812-4.
7. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P et al. A method and server for predicting damaging missense mutations. Nat Methods 2010;7:248-9.
8. Ferrer-Costa C, Gelpi JL, Zamakola L, Parraga I, de la Cruz X, Orozco M. PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics 2005;21:3176-8.
9. Calabrese R, Capriotti E, Fariselli P, Martelli PL, Casadio R. Functional annotations improve the predictive score of human disease-related mutations in proteins. Hum Mutat 2009;30:1237-44.