Motif Space Database Design
  Kiranjit Sidhu

Outline
  Schema Design
  Content of Database
  Functionality
  Future Plans

Sample PDB File
  Each PDB file is represented as a text file (~60K lines)
  Inefficient for pattern matching
  Relational database required for most efficient solution

Structure of Database
  DB divided into two major components:
    Protein Data
    Motif (Occurrence) Data
  Protein Data
    Obtained from PDB files (Protein Data Bank)
    Derived data
  Motif Data
    Obtained from Luke's FFSM technique
    Derived data

Schema Design

Schema Design - Protein

Schema Design - Motif

Tools Used
  Obtaining data: Perl scripts
  Database: SQL Server 2000 and SQL Server 2005
  T-SQL (bulk import of data)

Obtaining Data
  PDB File -> (Extract) -> CSV File -> (Import) -> Temp Tables (T-SQL) -> (Convert and Derive, T-SQL Procedures) -> Final DB

Uploading Protein Data
  Input dataset: ~70,000 PDB/chain combinations
  Entries in tables: e.g. approx. 800 million rows in the proteinchaindistance table
  Initial version imported 10 PDB files in 1 day
  Current version: under 3 minutes

Current Functionality
  Protein (PDB) data has been completely uploaded into both:
    Production database (MotifSpace)
    Development database (MotifSpaceDev)
  Visualize protein structure using data from the database (data available)
  Data can be obtained from the server using SOAP or web services
  Basic queries, such as: which PDBs does a specific motif occur in?
  Histograms to compute statistics

Demo
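
The Extract step of the Obtaining Data flow (PDB file to CSV file) could look roughly like the sketch below. It is written in Python for illustration only and is not the Perl scripts mentioned under Tools Used. It assumes the standard fixed-column layout of PDB ATOM records; the function names, the CSV column order, and the PDB ID in the usage note are hypothetical.

```python
import csv
import io


def parse_atom_line(line):
    """Parse one fixed-column PDB ATOM record.

    Returns (serial, atom_name, residue, chain, res_seq, x, y, z),
    or None if the line is not an ATOM record.
    """
    if not line.startswith("ATOM  "):
        return None
    return (
        int(line[6:11]),        # atom serial number (columns 7-11)
        line[12:16].strip(),    # atom name (columns 13-16)
        line[17:20].strip(),    # residue name (columns 18-20)
        line[21],               # chain identifier (column 22)
        int(line[22:26]),       # residue sequence number (columns 23-26)
        float(line[30:38]),     # x coordinate (columns 31-38)
        float(line[38:46]),     # y coordinate (columns 39-46)
        float(line[46:54]),     # z coordinate (columns 47-54)
    )


def pdb_to_csv(pdb_id, pdb_text):
    """Turn the ATOM records of one PDB file into CSV text.

    Each output row is prefixed with pdb_id (hypothetical column order)
    so that rows from many files can share one import table.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in pdb_text.splitlines():
        row = parse_atom_line(line)
        if row is not None:
            writer.writerow((pdb_id,) + row)
    return out.getvalue()
```

Calling `pdb_to_csv("1ABC", text)` on the contents of a PDB file yields one CSV line per ATOM record, which a bulk import step could then load into a temporary table.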
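
The basic query listed under Current Functionality (which PDBs does a specific motif occur in?) and the histogram statistics can be sketched as follows. This is a minimal sketch that uses Python's built-in sqlite3 module in place of SQL Server. The schema diagrams are not reproduced in this text, so the table name `motif_occurrence`, its columns, and all rows are made-up assumptions for illustration.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with a hypothetical occurrence table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE motif_occurrence ("
        " motif_id INTEGER NOT NULL,"
        " pdb_id TEXT NOT NULL,"
        " chain_id TEXT NOT NULL)"
    )
    # Toy rows only: motif 1 occurs in two PDB entries, motif 2 in one.
    conn.executemany(
        "INSERT INTO motif_occurrence VALUES (?, ?, ?)",
        [
            (1, "1ABC", "A"),
            (1, "1ABC", "B"),
            (1, "2XYZ", "A"),
            (2, "2XYZ", "A"),
        ],
    )
    return conn


def pdbs_containing_motif(conn, motif_id):
    """Return the distinct PDB IDs in which the given motif occurs."""
    rows = conn.execute(
        "SELECT DISTINCT pdb_id FROM motif_occurrence"
        " WHERE motif_id = ? ORDER BY pdb_id",
        (motif_id,),
    ).fetchall()
    return [pdb_id for (pdb_id,) in rows]


def pdb_count_per_motif(conn):
    """Return {motif_id: number of distinct PDBs}, the input to a histogram."""
    rows = conn.execute(
        "SELECT motif_id, COUNT(DISTINCT pdb_id) FROM motif_occurrence"
        " GROUP BY motif_id ORDER BY motif_id"
    ).fetchall()
    return dict(rows)
```

The same `SELECT DISTINCT` and `GROUP BY` statements should carry over to T-SQL largely unchanged, apart from the parameter placeholder syntax.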