1. About the dataset
Human nsSNPs were extracted via analysis of the VARIANT field in the
corresponding Swiss-Prot entries. nsSNPs annotated as “Disease” are disease-associated,
and those annotated as “Polymorphism” are neutral nsSNPs. Major histocompatibility
complex proteins and membrane proteins were excluded. We selected nsSNPs in proteins with
an experimentally determined structure or a structural homolog. When multiple
representative PDB entries were available, the one with the highest sequence identity was chosen. This
dataset includes 532 neutral nsSNPs within 305 genes and 3686 disease-associated
nsSNPs within 323 genes.
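The rule for choosing among multiple representative PDB entries can be sketched as below. This is only an illustration, not the procedure actually used to build the dataset: the PdbHit type, its field names, and the PDB identifiers are hypothetical, and the tie-breaking behaviour (first entry wins) is an assumption, since the description does not say how ties were resolved.

```python
from typing import NamedTuple, Optional, Sequence


class PdbHit(NamedTuple):
    """One candidate structure for a protein (hypothetical record layout)."""
    pdb_id: str
    seq_identity: float  # percent sequence identity to the Swiss-Prot sequence


def pick_representative(hits: Sequence[PdbHit]) -> Optional[PdbHit]:
    """Return the candidate with the highest sequence identity.

    Returns None when there are no candidates. On ties, max() keeps the
    first candidate encountered (an assumption, not stated in the source).
    """
    if not hits:
        return None
    return max(hits, key=lambda hit: hit.seq_identity)


# Placeholder identifiers, not real PDB entries.
candidates = [PdbHit("XXX1", 62.0), PdbHit("XXX2", 98.5), PdbHit("XXX3", 75.3)]
best = pick_representative(candidates)
```

Here best would be the XXX2 entry, since it has the largest seq_identity value.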
2. Field description
Field Name   Meaning
SNP          The SNP identity.
PhenoClass   Phenotype annotated by the Swiss-Prot DB: "Disease" or "Neutral".
Environ      The structural environment of the SNP (1): B1, B2, B3, P1, P2 or E. *
SecStruct    Secondary structure. H: alpha-helix, S: beta-sheet, C: coil.
AreaBuried   Solvent accessibility score (1).
FracPolar    Environmental polarity score (1).
ScopLink     SCOP identifier for the homologous 3D structure.
SIFTscore    Normalized probability in a multiple sequence alignment (2).
NoHomology   Number of homologous sequences used to get the SIFT score.

* The first character denotes the solvent accessibility. B: buried, P: partially buried, E: exposed.
The second character, a number (if present), distinguishes environmental polarity classes within the
same solvent accessibility class, with a larger number corresponding to a larger polarity.
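As an illustration of the Environ encoding described in the footnote, the following minimal sketch decodes a code into its solvent accessibility class and polarity rank. The function and type names are hypothetical, and the sketch assumes that only the six codes listed in the table (B1, B2, B3, P1, P2, E) occur in the data.

```python
from typing import NamedTuple, Optional

# First character of an Environ code -> solvent accessibility class.
ACCESSIBILITY = {"B": "buried", "P": "partially buried", "E": "exposed"}

# The six environment classes listed in the field description.
VALID_CODES = {"B1", "B2", "B3", "P1", "P2", "E"}


class Environment(NamedTuple):
    accessibility: str
    polarity_rank: Optional[int]  # None when the code has no number (E)


def parse_environ(code: str) -> Environment:
    """Split an Environ code into accessibility class and polarity rank."""
    code = code.strip()
    if code not in VALID_CODES:
        raise ValueError(f"unknown Environ code: {code!r}")
    # A larger rank means a more polar environment within the same class.
    rank = int(code[1]) if len(code) > 1 else None
    return Environment(ACCESSIBILITY[code[0]], rank)


buried_polar = parse_environ("B3")
exposed = parse_environ("E")
```

Polarity ranks are only comparable between codes that share the same first character, so B3 is more polar than B1, but the rank of B2 says nothing relative to P2.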
(1) Bowie JU, Luthy R, Eisenberg D. A method to identify protein sequences that fold
into a known three-dimensional structure. Science. 1991;253:164-170.
(2) Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res.
2001;11:863-874.