Protein-DNA interaction dataset

Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. This dataset contains 56 proteins bound to double-stranded DNA (dsDNA). A total of 427 protein-DNA complexes with resolution better than 3.0 Å were extracted from the Nucleic Acid Database (NDB). The proteins in these complexes were clustered into homologous families, and a non-redundant representative set of 56 proteins was obtained.

Each input in the data set contains the identities of $2n+1$ residues in the form $X = (x_{t-n}, x_{t-n+1}, \ldots, x_{t-1}, x_t, x_{t+1}, \ldots, x_{t+n-1}, x_{t+n})$, where $x_t$ is the identity of the target residue, and $x_{t-n}, x_{t-n+1}, \ldots, x_{t-1}$ and $x_{t+1}, \ldots, x_{t+n-1}, x_{t+n}$ are the identities of the $n$ residues on either side of the target residue. In this case, $n = 4$ was found to yield optimal performance.

The relative accessible surface area (rASA) of each residue, in the absence of DNA, was computed using the NACCESS program. A training example with the target residue's rASA added has the form $X = (x_{t-n}, x_{t-n+1}, \ldots, x_{t-1}, x_t, x_{t+1}, \ldots, x_{t+n-1}, x_{t+n}, r_t, c)$, where $x_i$ is defined as above, $c$ is the class label (it is absent from test examples), and $r_t$ is the rASA of the target residue. A test example is a vector of the form $X = (x_{t-n}, x_{t-n+1}, \ldots, x_{t-1}, x_t, x_{t+1}, \ldots, x_{t+n-1}, x_{t+n}, r_t)$. Input is encoded in a similar way when other features are added.

The entropy of each sequence position was extracted from the HSSP database (http://www.cmbi.kun.nl/gv/hssp/). The entropy is normalized to the range 0-100. The lower the entropy, the more conserved the corresponding sequence position.
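The window encoding described above can be illustrated with a minimal Python sketch. It is not the dataset authors' code: the function name `make_examples`, the `PAD` symbol used for positions beyond the chain termini, and the 0/1 encoding of the class label are all assumptions made for illustration, since the description does not say how terminal residues or labels are represented.

```python
from typing import List, Optional, Sequence, Tuple

PAD = "-"  # assumed placeholder for window positions beyond the chain termini


def make_examples(
    sequence: str,
    rasa: Sequence[float],
    labels: Optional[Sequence[int]] = None,
    n: int = 4,
) -> List[Tuple]:
    """Build one example per residue.

    Each example holds the 2n+1 residue identities x_{t-n} .. x_{t+n},
    followed by r_t (rASA of the target residue) and, when labels are
    given (training data), the class label c.
    """
    if len(rasa) != len(sequence):
        raise ValueError("rasa must contain one value per residue")
    if labels is not None and len(labels) != len(sequence):
        raise ValueError("labels must contain one value per residue")

    # Pad both ends so that every residue has n neighbours on each side.
    padded = PAD * n + sequence + PAD * n
    examples = []
    for t in range(len(sequence)):
        window = tuple(padded[t : t + 2 * n + 1])  # x_{t-n} .. x_{t+n}
        example = window + (rasa[t],)              # append r_t
        if labels is not None:
            example += (labels[t],)                # append c for training
        examples.append(example)
    return examples


if __name__ == "__main__":
    seq = "MKRLSTAVG"                      # made-up sequence, not from the dataset
    rasa = [10.0 * i for i in range(9)]    # made-up rASA values
    labels = [0, 0, 1, 1, 0, 0, 0, 0, 0]   # made-up labels (1 = assumed "binding")
    for row in make_examples(seq, rasa, labels):
        print(row)
```

With `n = 4`, each training tuple has 11 entries (9 residue identities, `r_t`, and `c`) and each test tuple has 10. Additional features such as the HSSP entropy could be appended in the same way as `r_t`.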

