Motif Space
Database Design
Kiranjit Sidhu

Outline
• Schema Design
• Content of Database
• Functionality
• Future Plans

Sample PDB File
• Each PDB file represented as a text file (~60K lines)
• Inefficient for pattern matching
• Relational database required for most efficient solution
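
To make the contrast concrete, here is a minimal sketch of what pulling structured fields out of one PDB text line involves. It assumes the standard fixed-column layout of ATOM records; the `AtomRecord` type, the `parse_atom_line` helper and the sample line are illustrative and not taken from the project (whose extraction scripts were written in Perl).

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    serial: int
    atom_name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one fixed-width ATOM line into typed fields (0-based slices)."""
    return AtomRecord(
        serial=int(line[6:11]),          # columns 7-11
        atom_name=line[12:16].strip(),   # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21],               # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
    )


# Made-up sample line, built field by field so the column widths are explicit.
SAMPLE = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
    "ATOM", 1, " N", "", "MET", "A", 1, "", 27.340, 24.430, 2.614
)

atom = parse_atom_line(SAMPLE)
```

Once every line is a typed record, a question such as "all atoms of chain A" becomes a filter on a field rather than a text search, which is the kind of access a relational table provides directly.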

Structure of Database
DB divided into two major components:
• Protein Data
• Motif (Occurrence) Data

Protein Data
  - Obtained from PDB Files (Protein Data Bank)
  - Derived Data

Motif Data
  - Obtained from Luke’s FFSM technique
  - Derived Data

Schema Design

Schema Design - Protein

Schema Design - Motif

Tools Used
• Obtaining Data:
  - Perl Scripts
• Database:
  - SQL Server 2000 and SQL Server 2005
  - T-SQL (Bulk Import Data)

Obtaining Data
PDB File -> Extract -> CSV File -> Import -> Temp Tables (T-SQL) -> Convert and Derive (T-SQL Procedures) -> Final DB
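
The import and convert stages of this flow can be sketched in a few lines. This is only an illustration under stated assumptions: Python's `csv` and `sqlite3` modules stand in for the SQL Server bulk import and T-SQL procedures, and the table names (`atom_staging`, `atom`), the columns and the sample rows are invented for the example.

```python
import csv
import io
import sqlite3

# Stand-in for the CSV file produced by the extract step (hypothetical columns).
CSV_TEXT = """pdb_id,chain_id,res_seq,atom_name,x,y,z
1ABC,A,1,CA,1.0,2.0,3.0
1ABC,A,2,CA,4.0,6.0,3.0
"""


def load(csv_text: str) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Import: every column lands as text in a temporary staging table.
    conn.execute(
        "CREATE TEMP TABLE atom_staging "
        "(pdb_id TEXT, chain_id TEXT, res_seq TEXT, atom_name TEXT, "
        "x TEXT, y TEXT, z TEXT)"
    )
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip the header row
    conn.executemany("INSERT INTO atom_staging VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    # Convert: cast the staged text into typed columns of the final table.
    conn.execute(
        "CREATE TABLE atom "
        "(pdb_id TEXT, chain_id TEXT, res_seq INTEGER, atom_name TEXT, "
        "x REAL, y REAL, z REAL)"
    )
    conn.execute(
        "INSERT INTO atom "
        "SELECT pdb_id, chain_id, CAST(res_seq AS INTEGER), atom_name, "
        "CAST(x AS REAL), CAST(y AS REAL), CAST(z AS REAL) FROM atom_staging"
    )
    # The staging table is no longer needed once the final table is filled.
    conn.execute("DROP TABLE atom_staging")
    return conn


conn = load(CSV_TEXT)
final_rows = conn.execute("SELECT res_seq, x FROM atom ORDER BY res_seq").fetchall()
```

Loading raw text into a staging table first keeps the bulk import simple and leaves type conversion and derivation to set-based SQL, which is the division of labour the flow above suggests.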

Uploading Protein Data
• Input dataset: ~70,000 PDB/Chain Combinations
• Entries in tables:
  - E.g. approx. 800 million rows in the proteinchaindistance table
• Initial version imported 10 PDB files in 1 day
• Current version: under 3 minutes
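
A little arithmetic shows how a distance table reaches this size. The sketch below assumes, purely for illustration, that proteinchaindistance holds one row per unordered pair of residues within a chain; the slides do not define the table. Under that assumption a chain of n residues contributes n(n-1)/2 rows, and 800 million rows over 70,000 chains averages roughly 11,400 rows per chain, about what a chain of 150 residues would produce. The function names and coordinates are made up.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Yield (i, j, distance) for every unordered pair of points."""
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        yield i, j, math.dist(a, b)


def rows_for_chain(n_residues: int) -> int:
    """Number of unordered residue pairs in a chain of the given length."""
    return n_residues * (n_residues - 1) // 2


# Three made-up residue positions.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
rows = list(pairwise_distances(coords))

# Figures quoted on the slide: ~800 million rows, ~70,000 PDB/chain combinations.
avg_rows_per_chain = 800_000_000 / 70_000
```

Because the row count grows quadratically with chain length, the derive step dominates the load, which is consistent with the import time being the figure worth reporting.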

Current Functionality
• Protein (PDB) data has been completely uploaded into both:
  - Production Database (MotifSpace)
  - Development Database (MotifSpaceDev)
• Visualize protein structure using data from database (data available)
• Data can be obtained from Server using SOAP or web services
• Basic queries such as:
  - Which PDBs does a specific motif occur in?
  - Histograms to compute statistics

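The two basic queries can be sketched against a toy table. Everything here is an assumption made for illustration: the `motif_occurrence` table, its columns and the sample identifiers are hypothetical, and SQLite stands in for the SQL Server database described in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical occurrence table: one row per place a motif was found.
conn.execute(
    "CREATE TABLE motif_occurrence (motif_id INTEGER, pdb_id TEXT, chain_id TEXT)"
)
conn.executemany(
    "INSERT INTO motif_occurrence VALUES (?, ?, ?)",
    [
        (1, "1ABC", "A"),
        (1, "1ABC", "B"),
        (1, "2XYZ", "A"),
        (2, "2XYZ", "A"),
    ],
)


def pdbs_containing(conn, motif_id):
    """Distinct PDB entries in which the given motif occurs."""
    cur = conn.execute(
        "SELECT DISTINCT pdb_id FROM motif_occurrence "
        "WHERE motif_id = ? ORDER BY pdb_id",
        (motif_id,),
    )
    return [row[0] for row in cur]


def occurrence_histogram(conn):
    """Map each motif to the number of distinct PDB entries it occurs in."""
    cur = conn.execute(
        "SELECT motif_id, COUNT(DISTINCT pdb_id) "
        "FROM motif_occurrence GROUP BY motif_id"
    )
    return dict(cur.fetchall())


pdbs_for_motif_1 = pdbs_containing(conn, 1)
histogram = occurrence_histogram(conn)
```

Both questions reduce to a single SELECT once occurrences are stored one per row, which is the benefit of the relational layout over scanning text files.
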
Demo