Databases

Data must be in a certain format for software to recognize it. Every database can have its own format, but some data elements are essential for every database:
1. Unique identifier, or accession code
2. Name of depositor
3. Literature references
4. Deposition date
5. The real data

©CMBI 2008

Quality of Data

SwissProt
• Data is only entered by annotation experts

EMBL, PDB
• “Everybody” can submit data
• No human intervention when submitted; some automatic checks

SwissProt database

• Database of protein sequences
• 399749 entries (Oct 2008)
• Ca. 200 annotation experts worldwide
• Keyword-organised flat file
• Obligatory deposit of sequences in SwissProt before publication
• Presently, databases are being merged into UniProt

Important records in SwissProt (1)

    ID   HBA_HUMAN               Reviewed;         142 AA.
    AC   P69905; P01922; Q3MIF5; Q96KF1; Q9NYR7;
    DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
    DT   23-JAN-2007, sequence version 2.
    DT   23-SEP-2008, entry version 63.
    DE   RecName: Full=Hemoglobin subunit alpha;
    DE   AltName: Full=Hemoglobin alpha chain;
    DE   AltName: Full=Alpha-globin;

Important records in SwissProt (2)

Cross references section: hyperlinks to all entries in other databases that are relevant for the protein sequence HBA_HUMAN.

Important records in SwissProt (3)

Features section: post-translational modifications, signal peptides, binding sites, enzyme active sites, domains, disulfide bridges, local secondary structure, sequence conflicts between references, etc.

And finally, the amino acid sequence!

Protein Data Bank (PDB)

• Databank for macromolecular structure data (3-dimensional coordinates)
• Started ca. 30 years ago (on punched cards!)
• Obligatory deposit of coordinates in the PDB before publication
• ~50000 entries (April 2008) (~2500 “unique” structures)
• A PDB file is a keyword-organised flat file (80 columns):
  1) human readable
  2) every line starts with a keyword (3-6 letters)
  3) platform independent

PDB important records (1)

PDB nomenclature
• Filename = accession number = PDB code
• Filename is 4 positions (often 1 digit & 3 letters, e.g. 1CRN)

HEADER describes the molecule & gives the deposition date

    HEADER    PLANT SEED PROTEIN                      30-APR-81   1CRN

COMPND name of molecule

    COMPND    CRAMBIN

SOURCE organism

    SOURCE    ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED

PDB important records (2)

SEQRES sequence of the protein. Be aware: 3D coordinates are not always present for all the amino acids in SEQRES!

    SEQRES   1     46  THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE  1CRN  51
    SEQRES   2     46  ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS  1CRN  52
    SEQRES   3     46  ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR  1CRN  53
    SEQRES   4     46  CYS PRO GLY ASP TYR ALA ASN                          1CRN  54

SSBOND disulfide bridges

    SSBOND   1 CYS      3    CYS     40
    SSBOND   2 CYS      4    CYS     32

PDB important records (3)

And at the end of the PDB file, the “real” data:

ATOM one line for each atom, with its unique name and its x, y, z coordinates

    ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79      1CRN  70
    ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80      1CRN  71
    ATOM      3  C   THR     1      15.685  12.755   5.133  1.00  9.19      1CRN  72
    ATOM      4  O   THR     1      15.268  13.825   5.594  1.00  9.85      1CRN  73
    ATOM      5  CB  THR     1      18.170  12.703   5.337  1.00 13.02      1CRN  74
    ATOM      6  OG1 THR     1      19.334  12.829   4.463  1.00 15.06      1CRN  75
    ATOM      7  CG2 THR     1      18.150  11.546   6.304  1.00 14.23      1CRN  76
    ATOM      8  N   THR     2      15.115  11.555   5.265  1.00  7.81      1CRN  77
    ATOM      9  CA  THR     2      13.856  11.469   6.066  1.00  8.31      1CRN  78
    ATOM     10  C   THR     2      14.164  10.785   7.379  1.00  5.80      1CRN  79
    ATOM     11  O   THR     2      14.993   9.862   7.443  1.00  6.94      1CRN  80

MRS home page

MRS Search Steps

• Select database(s) of choice
• Formulate your query
• Hit “Search”
• The result is a “query set” or “hitlist”
• Analyze the results

MRS Search options

Simply type your keywords in the keyword field and choose SEARCH. If you know the fields of the database you are searching in, you can specify your query further. But think about your query first!
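
Example: reading ATOM records

Because a PDB file is a keyword-organised, fixed-column flat file, the ATOM records described earlier can be read with plain string slicing. The Python sketch below is only an illustration and is not part of the original slides: the names Atom, parse_atom_line and parse_atoms are hypothetical, and the column positions are assumed to follow the standard PDB layout for ATOM lines.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    """One ATOM record (hypothetical container for this sketch)."""
    serial: int
    name: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse a single ATOM line by fixed column positions (0-based slices)."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),          # columns 7-11: atom serial number
        name=line[12:16].strip(),        # columns 13-16: atom name
        res_name=line[17:20].strip(),    # columns 18-20: residue name
        res_seq=int(line[22:26]),        # columns 23-26: residue number
        x=float(line[30:38]),            # columns 31-38: x coordinate
        y=float(line[38:46]),            # columns 39-46: y coordinate
        z=float(line[46:54]),            # columns 47-54: z coordinate
        occupancy=float(line[54:60]),    # columns 55-60: occupancy
        b_factor=float(line[60:66]),     # columns 61-66: temperature factor
    )


def parse_atoms(text: str) -> List[Atom]:
    """Collect every ATOM record in a PDB text; other keywords are skipped."""
    return [parse_atom_line(line)
            for line in text.splitlines()
            if line.startswith("ATOM")]


# Three ATOM lines from entry 1CRN, preceded by its HEADER record.
SAMPLE = """\
HEADER    PLANT SEED PROTEIN                      30-APR-81   1CRN
ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79      1CRN  70
ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80      1CRN  71
ATOM      8  N   THR     2      15.115  11.555   5.265  1.00  7.81      1CRN  77
"""

if __name__ == "__main__":
    for atom in parse_atoms(SAMPLE):
        print(atom.serial, atom.name, atom.res_name, atom.res_seq,
              atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace is a deliberate choice here: in a fixed-column format, neighbouring fields may touch without a separating space, so whitespace splitting can silently merge or shift values.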