Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
1. About the dataset Human nsSNPs were extracted via analysis of the VARIANT field in the corresponding Swiss-Prot entries. nsSNPs annotated as “Disease” are disease-associated, and those annotated as “Polymorphism” are neutral nsSNPs. Major histocompatibility complex proteins and membrane proteins were excluded. We selected nsSNPs with experimentally determined structure or structural homologs. In case of multiple representative PDB entries, the one with highest sequence identity was chosen. This dataset includes 532 neutral nsSNPs within 305 genes and 3686 disease-associated nsSNPs within 323 genes. 2. Field description Field Name SNP PhenoClass Environ SecStruct AreaBuried FracPolar ScopLink SIFTscore NoHomology Meaning The SNP identity. Phenotype annotated by the Swiss-Prot DB. "Disease" or "Neutral". The structural environment of the SNP (1) B1, B2, B3, P1, P2 and E * Secondary structure. H: alpha-helix, S: beta-sheet, C: coil. Solvent accessibility score (1). Environmental polarity score (1). SCOP identifier for the homologous 3D structure. Normalized probability in a multiple sequence alignment (2). Number of homologous sequences used to get the SIFT score * The first character denotes the solvent accessibility B: buried, P: partially buried, E: exposed. The second number (if exists) denotes different environmental polarity provided the solvent accessibility is the same, with a larger number corresponding to a larger polarity. 1. Bowie JU, Luthy R, Eisenberg D. A method to identify protein sequences that fold into a known three-dimensional structure. Science. 1991 253:164-70 2. Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res. 2001 11:863-74