Uploading Protein Data

Motif Space Database Design
Kiranjit Sidhu

Outline
- Schema Design
- Content of Database
- Functionality
- Future Plans

Sample PDB File
- Each PDB file is represented as a text file (~60K lines)
- Inefficient for pattern matching
- A relational database is required for the most efficient solution

Structure of Database
- DB divided into two major components:
  - Protein Data
  - Motif (Occurrence) Data
- Protein Data
  - Obtained from PDB files (Protein Data Bank)
  - Derived data
- Motif Data
  - Obtained from Luke's FFSM technique
  - Derived data

Schema Design (schema diagram)
Schema Design - Protein (schema diagram)
Schema Design - Motif (schema diagram)

Tools Used
- Obtaining data:
  - Perl scripts
- Database:
  - SQL Server 2000 and SQL Server 2005
  - T-SQL (bulk import of data)

Obtaining Data
PDB File -> Extract -> CSV File -> Import -> Temp Tables (T-SQL) -> Convert and Derive (T-SQL Procedures) -> Final DB

Uploading Protein Data
- Input dataset: ~70,000 PDB/chain combinations
- Entries in tables:
  - E.g. approx. 800 million rows in the proteinchaindistance table
- Initial version imported 10 PDB files in 1 day
- Current version: under 3 minutes

Current Functionality
- Protein (PDB) data has been completely uploaded into both:
  - Production database (MotifSpace)
  - Development database (MotifSpaceDev)
- Visualize protein structure using data from the database (data available)
- Data can be obtained from the server using SOAP or web services
- Basic queries such as:
  - Which different PDBs does a specific motif occur in?
  - Histograms to compute statistics
- Demo
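
The extract and derive steps of the Obtaining Data pipeline can be illustrated with a minimal sketch. The slides say the original extraction used Perl scripts and T-SQL procedures; the Python below is only an illustration, not the author's code. It assumes the standard fixed-width PDB ATOM record layout, and it assumes that the derived distance data consists of pairwise C-alpha distances within a chain, which the slides do not state. All function names, CSV column names, and the synthetic sample records are hypothetical.

```python
import csv
import io
import math

# Hypothetical CSV column names; the real schema is not given in the slides.
FIELDS = ["serial", "atom_name", "res_name", "chain_id", "res_seq", "x", "y", "z"]


def parse_atom_line(line):
    """Parse one fixed-width PDB ATOM record into a dict."""
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def extract_atoms(pdb_text):
    """Extract step: keep only ATOM records, skip everything else."""
    return [parse_atom_line(l) for l in pdb_text.splitlines() if l.startswith("ATOM  ")]


def atoms_to_csv(atoms):
    """Serialize parsed atoms as CSV text, ready for a bulk import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(atoms)
    return buf.getvalue()


def ca_distances(atoms):
    """Derive step (assumed): pairwise C-alpha distances within each chain."""
    cas = [a for a in atoms if a["atom_name"] == "CA"]
    rows = []
    for i, a in enumerate(cas):
        for b in cas[i + 1:]:
            if a["chain_id"] != b["chain_id"]:
                continue
            d = math.dist((a["x"], a["y"], a["z"]), (b["x"], b["y"], b["z"]))
            rows.append((a["chain_id"], a["res_seq"], b["res_seq"], round(d, 3)))
    return rows


def make_atom_line(serial, name, res, chain, seq, x, y, z):
    """Build a synthetic ATOM record with the fixed-width layout parsed above."""
    return f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"


# Synthetic sample data, not taken from any real PDB entry.
SAMPLE_PDB = "\n".join([
    "REMARK synthetic example",
    make_atom_line(1, "N", "ALA", "A", 1, -1.0, 0.0, 0.0),
    make_atom_line(2, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0),
    make_atom_line(3, "CA", "GLY", "A", 2, 3.0, 4.0, 0.0),
    make_atom_line(4, "CA", "SER", "A", 3, 0.0, 0.0, 12.0),
])

if __name__ == "__main__":
    atoms = extract_atoms(SAMPLE_PDB)
    print(atoms_to_csv(atoms))
    for row in ca_distances(atoms):
        print(row)
```

The pairwise loop grows quadratically with chain length, which is consistent with a distance table reaching hundreds of millions of rows across ~70,000 PDB/chain combinations. At that scale a set-based bulk import, as the slides describe with T-SQL, is the more practical route than row-by-row inserts.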
