* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Lecture 3: Molecular database development - BIDD
Survey
Document related concepts
Commitment ordering wikipedia , lookup
Microsoft Access wikipedia , lookup
Serializability wikipedia , lookup
Schengen Information System wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Oracle Database wikipedia , lookup
Ingres (database) wikipedia , lookup
Functional Database Model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Relational model wikipedia , lookup
Clusterpoint wikipedia , lookup
Transcript
LSM3241: Bioinformatics and Biocomputing Lecture 3: Molecular database development Prof. Chen Yu Zong Tel: 6516-6877 Email: [email protected] http://bidd.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore Development of a protein inhibitor database • Step 1: Start from a protein (alternative names from Swissprot) • Step 2: Search for inhibitors using protein name and alternative names (from Medline, or Google) • Step 3: Find additional information about each of the inhibitors (from ChemIDplus database) • Step 4: Database construction 2 Assignment of a protein • Each student is given an Id number, use this number to get your protein from the protein list posted on the module home page. • In case that you wish to work on a different protein instead of the one assigned to you, get another one from a list of back-up proteins posted on the module home page. • Earn extra credit by working on extra number of proteins 3 Inhibitor search • Search Medline by using keywords (protein name + inhibitor). • In case no inhibitor or too few inhibitors can be found from medline, search google or yahoo as an alternative source. • Make sure the protein name and its alternative names (get from Swissprot) are used in the search. • Record inhibitor name and reference source (journal issue, page and publication year; internet source etc.). • Record the number of references checked (including those found from the keyword-search but without inhibitors) • Try to find as many inhibitors as you can. The more known inhibitors you get, the more credit you can earn. 4 Finding chemical information • Search ChemIDPlus database (http://chem.sis.nlm.nih.gov/chemidplus/) for additional information of each inhibitor • Before using ChemIDPlus, download Chime from MDL Click here to search Input inhibitor name Select “Equals” for inhibitor name Select “Contains” for other keywords Click here to select Chime option in ChemIDPlus (Chime is needed to view structure) More Info at: http://sis.nlm.nih.gov/Chem/ChemMain.html 5 ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? Click here to get full record Click here to get structure 6 ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? 7 ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? 8 ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? 9 ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? 10 ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? 11 ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? 12 ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? Right-click in this field to get 3D structure file 13 ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? Steps for saving the file: • Open a word or text file • Select all text in the source file • Copy (using Ctrl+C) and paste (using Ctrl+V) to the word or text file • Save the word or text file as chem_name.doc or chem_name.txt 14 Generating 3D structure from 2D structure Use of ViewerPro software Save 2D structure as a MDL file 15 ViewerPro software Load 2D structure MDL file (viracept.mol) Save 2D structure as a MDL file 16 ViewerPro software Save to 3D structure: select File => Save as (pdb format) Save 2D structure as a MDL file 17 ViewerPro software Re-load pdb file and clean structure: select Modify => Clean Structure (click here 10 times) Save 2D structure as a MDL file 18 ViewerPro software Save as pdb file: select File => Save as (pdb file) Save 2D structure as a MDL file 19 Database construction Information needed to construct database Inhibitor Name Aspirin Protein Target Cox I (SwissProt accession number) [1] CAS Number 50-78-2 [2] Formula C9-H8-O4 [2] Systematic Name O-Acetylsalicylic acid, Salicylic acid, acetate Synonyms 2-(Acetyloxy)benzoic acid, 2-Acetoxybenzoic acid [2], A.S.A. empirin [2] Classification Analgesic, Anti-inflammatory agents, non-steroidal Notes: The prototypical analgesic used in the treatment of mild to moderate pain. It has anti-inflammatory and antipyretic properties and acts as an inhibitor of cyclooxygenase which results in the inhibition of the biosynthesis of prostaglandins. Aspirin also inhibits platelet aggregation and is used in the prevention of arterial and venous thrombosis. (From Martindale, The Extra Pharmacopoeia, 30th ed,p5) References: 1:OMIM 176805 (http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?176805) 2:ChemIDplus Database (http://chem.sis.nlm.nih.gov/chemidplus/) 3D structure aspirin.pdb 20 Database construction Database development 21 Database construction Database development 22 Database construction Database development 23 Database Construction Database Definition: • A Database is a shared collection of Inter-Related data designed to meet the needs of multiple types of users and applications. This implies that multiple user VIEWS can be defined • Data stored is independent of the programs which use it • Data is structured to provide a basis for future applications DATABASE = Stored Collection of Related Data May be physically distributed 24 Database construction Database development 25 Database Construction Data/Information A General Definition: DATA - raw (unprocessed or partly processed) facts which represent the state of entities (things) which have occurred INFORMATION - data which has been processed into a form USEFUL TO THE USER What is Information to one user may be Data to another user. 26 Database construction Database development 27 Database Models - Hierarchical owner/parent owner child /parent member child child/parent 28 Database Models - Network ‘set’ of data owner member owner member ‘set’ of data Note: Only linked sets can be accessed 29 Database Models - Relational table A table B table C table D table E Any table(s) can be joined to any other table(s), provided there is a means of effecting the join Primary key / Foreign key concept. Data redundancy No fixed linkages 30 Example: 2 Relations in a Database EMPNUM 3 7 11 15 18 NAME JONES SMITH ADAMS NGUYEN PHAN Date of Birth 16-05-1956 23-09-1965 11-08-1972 23-10-1964 16-11-1976 DEPTNUM 605 432 201 314 201 Relation (Table) Name : EMP Relation Schema: EMP(empnum,name,date of birth,deptnum) DEPTNUM 201 314 432 605 DEPTNAME Production Finance Information Systems Administration Relation (Table) Name : DEPT Relation Schema: DEPT(deptnum, deptname) 31 Definition of a Relational Database • A relational database is a collection of relations or two-dimensional tables. Database Table Name: EMP EMPNO ENAME JOB 7839 7698 7782 7566 KING BLAKE CLARK JONES EXECUTIVE MANAGER MANAGER MANAGER DEPTNO 10 30 10 20 Table Name: DEPT DEPTNO 10 20 30 40 DNAME ACCOUNTING RESEARCH SALES OPERATIONS LOC CLAYTON CAULFIELD PENINSULA GIPPSLAND Data Models Model of system in client’s mind Entity model of client’s model Table model of entity model Server Tables on disk Relational Database Management System Server User tables Data dictionary String Matching Based on 1. Analysis of query pattern 2. Auxiliary table 3. Character mapping 1 2 3 4 5 6 7 8 9 10 11 LECTUREROOM L occurs in position 1 E occurs in positions 2 and 7 C occurs in position 3 T occurs in position 4 U occurs in position 5 R occurs in positions 6 and 8 O occurs in positions 9 and 10 M occurs in position 11 Matching starts <Right> character and progressively shifts left on match 36 Database construction Database development: data format 37 Database construction Database development: data languages 38 Database construction Database development: An example of data language 39 Communicating with a RDBMS Using SQL SQL statement is entered SQL> SELECT loc FROM dept; Statement is sent to database Database Data is displayed LOC ------------CLAYTON CAULFIELD PENINSULA GIPPSLAND Database construction Data Definition Language (DDL) used to describe data at the database level Schema level - complete database description Sub-Schema level - user views (restricted) Data Manipulation Language (DML) Provides for Create Update Delete Modify Report Insert Retrieve (extract) Drop Calculation capabilities 41 Using Oracle • SQL – Structured Query Language: DDL, DML, DCL – Universally supported by database servers • SQL*Plus – Oracle command-line SQL interpreter – Account (schema) = independent working space 42 SQL data types Type description Oracle SQL variable-length char. string VARCHAR2(l)a fixed-length char. string CHAR(l) a number NUMBER(p,s)b currency NUMBER(10,2) date DATE a: length. b: length, scale. 43 General Data Types • Used to augment an attribute description and to provide a means of Integrity ’Normal’ data types are : Character (or text) Numeric - Integer, Decimal, Money, Float (in Access Long and Short Integer, Decimal with options of a number of ‘decimal’ places) Date - Standard date format - Access has various forms Logical - Yes/No True/False Exists/Does Not Exist 44 Data Attribute Size In many cases this is set by the Data Definition facilities e.g.. Date, Short Integer, Long Integer, Logical,Number Others are set by the Designer: Number of Characters, Size of a ‘Decimal Number’ Access allows for a Default value Duplicates/ No Duplicates allowed Primary Key nomination Indexing ‘Required’ Status of an Attribute 45 DDL Example CREATE TABLE books ( title VARCHAR2(60), author VARCHAR2(60), isbn NUMBER(10,0) CONSTRAINT pk_books PRIMARY KEY, pub_date DATE DEFAULT SYSDATE ); PRIMARY KEY: column serves as a unique identifier SYSDATE: returns the system’s current date “;” terminates a SQL statement 46 DDL Example CREATE TABLE book_reviews ( isbn NUMBER(10,0) CONSTRAINT fk_books_booksrev REFERENCES books(isbn), reviewer VARCHAR2(30), comments VARCHAR2(150) ); • REFERENCES: value in this column must be one of the values in the isbn column of the table “books”. This is called referential integrity. 47 DML Example INSERT INTO books VALUES( ’The Importance of Being Earnest’, ’Oscar Wilde’, -- this is a comment 9876543210, ’14-FEB-1895’ ); --: Comment. The part from this symbol till the end of the line will be ignored by the database server. ’14-FEB-1895’: Date format. DD-MMM-YYYY or DD-MMM-YY 48 DML Example SELECT * FROM books; SELECT * FROM books WHERE author = ’Oscar Wilde’; *: every column ’Oscar Wilde’: strings are quoted by ‘. If ‘ is part of the string, it is escaped by double it (e.g. string a’b should be ‘a’’b’ in SQL) 49 DML Example SELECT title FROM books WHERE author LIKE ’Oscar%’; You will get: The Importance of Being Earnest LIKE: fuzzy matching (in contrast to =) %: wild character: % represents any string of any length (Osc%r matches Oscar, Osceer…) _ represents any single character (Osc_r matches Oscar, Oscer…) 50 DML Example COMMIT; ROLLBACK; if you have modified the data in database, such as inserted or deleted an entry, you can either: COMMIT to make this change permanent, or ROLLBACK to cancel your operation (the data will look as if you just finished your last COMMIT). 51 Further readings • Oracle TechNet – http://otn.oracle.com/index.html – Free subscription • Online tutorials – http://oldweb.uwp.edu/academic/mis/baldwin/sqlplus.htm – http://www.ilook.fsnet.co.uk/index/oracle.htm – Google for more! 52 Database construction Database development webpage: http://bidd.nus.edu.sg/group/teach/submit/welcome.asp 53 Database construction Database development webpage: http://bidd.nus.edu.sg/group/teach/submit/welcome.asp 54 Database construction Database development webpage: http://bidd.nus.edu.sg/group/teach/submit/welcome.asp 55 Database construction Database development webpage: http://bidd.nus.edu.sg/group/teach/submit/welcome.asp 56 Summary of Today’s lecture • Molecular database development: – Protein inhibitor search – Getting additional info about selected inhibitors (using ChemIDPlus database) – Generating 3D structure (using DS ViewerPro) – Database construction 57