Databases
Data must be in a defined format for software to recognize it.
Every database can have its own format, but some data elements are
essential for every database:
1. Unique identifier, or accession code
2. Name of depositor
3. Literature references
4. Deposition date
5. The real data
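The five elements above can be pictured as the fields of a single record. The following is only a minimal sketch: the class name `DatabaseEntry` and its field names are hypothetical, and the depositor is left as a placeholder. The accession code, date and sequence are taken from the 1CRN example shown in the PDB slides.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DatabaseEntry:
    """Hypothetical record holding the five essential data elements."""
    accession: str          # 1. unique identifier, or accession code
    depositor: str          # 2. name of depositor
    deposition_date: date   # 4. deposition date
    data: str               # 5. the real data (here: a protein sequence)
    references: list[str] = field(default_factory=list)  # 3. literature references


entry = DatabaseEntry(
    accession="1CRN",
    depositor="(depositor name)",  # placeholder, not taken from the slides
    deposition_date=date(1981, 4, 30),
    data="TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN",
)
print(entry.accession, entry.deposition_date.isoformat(), len(entry.data))
```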
©CMBI 2008
Quality of Data
SwissProt
• Data is only entered by annotation experts
EMBL, PDB
• “Everybody” can submit data
• No human intervention on submission; only some automatic checks
SwissProt database
Database of protein sequences
399749 entries (Oct 2008)
Ca. 200 annotation experts worldwide
Keyword-organised flat file
Obligatory deposit of sequences in SwissProt before publication
Presently, databases are being merged into UniProt.
Important records in SwissProt (1)
ID   HBA_HUMAN               Reviewed;         142 AA.
AC   P69905; P01922; Q3MIF5; Q96KF1; Q9NYR7;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   23-SEP-2008, entry version 63.
DE   RecName: Full=Hemoglobin subunit alpha;
DE   AltName: Full=Hemoglobin alpha chain;
DE   AltName: Full=Alpha-globin;
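Because every line of a SwissProt entry starts with a two-letter line code (ID, AC, DT, DE, ...), an entry can be read by grouping lines on that code. The sketch below assumes the line code sits in the first two columns and the content follows after whitespace. The function name `group_by_line_code` is hypothetical, and a real parser would need to handle many more line types.

```python
from collections import defaultdict

SAMPLE = """\
ID   HBA_HUMAN               Reviewed;         142 AA.
AC   P69905; P01922; Q3MIF5; Q96KF1; Q9NYR7;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   23-SEP-2008, entry version 63.
DE   RecName: Full=Hemoglobin subunit alpha;
DE   AltName: Full=Hemoglobin alpha chain;
DE   AltName: Full=Alpha-globin;
"""


def group_by_line_code(text):
    """Group flat-file lines by their two-letter line code."""
    records = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 2:
            continue  # skip empty or truncated lines
        code, content = line[:2], line[2:].strip()
        records[code].append(content)
    return dict(records)


records = group_by_line_code(SAMPLE)
# The entry name is the first token of the ID line.
entry_name = records["ID"][0].split()[0]
# Accession codes are separated by semicolons on the AC line.
accessions = [a.strip() for a in records["AC"][0].split(";") if a.strip()]
print(entry_name, accessions[0], len(records["DT"]))
```

The first accession code on the AC line (here P69905) is the primary one; the others are secondary codes kept from earlier entries.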
Important records in SwissProt (2)
Cross references section:
Hyperlinks to all entries in other databases which are relevant for the
protein sequence HBA_HUMAN
Important records in SwissProt (3)
Features section:
Post-translational modifications, signal peptides, binding sites, enzyme
active sites, domains, disulfide bridges, local secondary structure,
sequence conflicts between references, etc.
And finally, the amino acid sequence!
Protein Data Bank (PDB)
Databank for macromolecular structure data (3-dimensional
coordinates). Started ca. 30 years ago (on punched cards!)
Obligatory deposit of coordinates in the PDB before publication
~50000 entries (April 2008) (~2500 “unique” structures)
A PDB file is a keyword-organised flat file (80 columns):
1) human readable
2) every line starts with a keyword (3-6 letters)
3) platform independent
PDB important records (1)
PDB nomenclature
Filename = accession number = PDB code
The code has 4 characters (often 1 digit & 3 letters, e.g. 1CRN)
HEADER
describes molecule & gives deposition date
HEADER    PLANT SEED PROTEIN                      30-APR-81   1CRN
COMPND
name of molecule
COMPND    CRAMBIN
SOURCE
organism
SOURCE    ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED
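Since PDB records are fixed-width, fields are usually read by column position rather than by splitting on whitespace. The sketch below reads a HEADER record, assuming the standard PDB column layout (classification in columns 11-50, deposition date in columns 51-59, ID code in columns 63-66). The function name `parse_header` is hypothetical, and the sample line is built with padding so that the columns are guaranteed to line up.

```python
from datetime import datetime


def parse_header(line):
    """Read the fixed-width fields of a PDB HEADER record."""
    return {
        "classification": line[10:50].strip(),  # columns 11-50
        "deposition_date": datetime.strptime(line[50:59], "%d-%b-%y").date(),  # columns 51-59
        "id_code": line[62:66],  # columns 63-66
    }


# Build the 1CRN HEADER line with explicit padding instead of typing spaces by hand.
header_line = (
    "HEADER".ljust(10)
    + "PLANT SEED PROTEIN".ljust(40)
    + "30-APR-81"
    + "   "
    + "1CRN"
)
header = parse_header(header_line)
print(header["id_code"], header["classification"], header["deposition_date"])
```

Note that the two-digit year is interpreted by Python's `%y` rule (69-99 map to 1969-1999), which happens to suit early PDB entries but is an assumption of this sketch.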
PDB important records (2)
SEQRES
Sequence of the protein; be aware: 3D coordinates are not always present
for all the amino acids in SEQRES!
SEQRES   1     46  THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE  1CRN  51
SEQRES   2     46  ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS  1CRN  52
SEQRES   3     46  ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR  1CRN  53
SEQRES   4     46  CYS PRO GLY ASP TYR ALA ASN                          1CRN  54
SSBOND
disulfide bridges
SSBOND   1 CYS      3    CYS     40
SSBOND   2 CYS      4    CYS     32
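As an illustration of how SEQRES and SSBOND records can be used together, the sketch below converts the three-letter residue names to a one-letter sequence and checks that each disulfide bridge connects two cysteines. It assumes the old record layout shown on this slide (no chain identifier) and ignores columns 73-80. The function names are hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

RECORDS = [
    "SEQRES   1     46  THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE",
    "SEQRES   2     46  ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS",
    "SEQRES   3     46  ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR",
    "SEQRES   4     46  CYS PRO GLY ASP TYR ALA ASN",
    "SSBOND   1 CYS      3    CYS     40",
    "SSBOND   2 CYS      4    CYS     32",
]


def seqres_to_sequence(lines):
    """Concatenate SEQRES residues into a one-letter sequence ('X' = unknown)."""
    residues = []
    for line in lines:
        if line.startswith("SEQRES"):
            # Fields: keyword, line number, residue count, then residue names.
            residues.extend(line[:72].split()[3:])
    return "".join(THREE_TO_ONE.get(name, "X") for name in residues)


def ssbond_pairs(lines):
    """Return (residue number, residue number) for each SSBOND record."""
    pairs = []
    for line in lines:
        if line.startswith("SSBOND"):
            fields = line[:72].split()
            pairs.append((int(fields[3]), int(fields[5])))
    return pairs


sequence = seqres_to_sequence(RECORDS)
bridges = ssbond_pairs(RECORDS)
# Residue numbers are 1-based, Python string indices are 0-based.
all_cysteines = all(sequence[i - 1] == "C" and sequence[j - 1] == "C" for i, j in bridges)
print(sequence, bridges, all_cysteines)
```

Using SSBOND numbers as positions in the SEQRES sequence only works here because residue numbering in 1CRN starts at 1 without gaps; in general the two numbering schemes can differ.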
PDB important records (3)
and at the end of the PDB file the “real” data:
ATOM
one line for each atom with its unique name and its x,y,z coordinates
ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79      1CRN  70
ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80      1CRN  71
ATOM      3  C   THR     1      15.685  12.755   5.133  1.00  9.19      1CRN  72
ATOM      4  O   THR     1      15.268  13.825   5.594  1.00  9.85      1CRN  73
ATOM      5  CB  THR     1      18.170  12.703   5.337  1.00 13.02      1CRN  74
ATOM      6  OG1 THR     1      19.334  12.829   4.463  1.00 15.06      1CRN  75
ATOM      7  CG2 THR     1      18.150  11.546   6.304  1.00 14.23      1CRN  76
ATOM      8  N   THR     2      15.115  11.555   5.265  1.00  7.81      1CRN  77
ATOM      9  CA  THR     2      13.856  11.469   6.066  1.00  8.31      1CRN  78
ATOM     10  C   THR     2      14.164  10.785   7.379  1.00  5.80      1CRN  79
ATOM     11  O   THR     2      14.993   9.862   7.443  1.00  6.94      1CRN  80
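ATOM records are also fixed-width, so coordinates are best read by column. The sketch below assumes the standard PDB column layout (serial in columns 7-11, atom name in 13-16, residue name in 18-20, residue number in 23-26, x, y, z in 31-38, 39-46, 47-54, occupancy in 55-60, B-factor in 61-66) and computes the distance between the N and CA atoms of the first residue. The function name `parse_atom` is hypothetical.

```python
import math

ATOM_LINES = [
    "ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79",
    "ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80",
    "ATOM      8  N   THR     2      15.115  11.555   5.265  1.00  7.81",
]


def parse_atom(line):
    """Read the fixed-width fields of a PDB ATOM record."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }


atoms = [parse_atom(line) for line in ATOM_LINES if line.startswith("ATOM")]
n_atom, ca_atom = atoms[0], atoms[1]
# Coordinates are in angstrom, so the distance is too.
bond_length = math.dist(n_atom["xyz"], ca_atom["xyz"])
print(f"{n_atom['name']}-{ca_atom['name']} distance: {bond_length:.2f} angstrom")
```

Splitting on whitespace would also work for these particular lines, but it breaks as soon as two fields touch (for example large coordinates or four-character atom names), which is why column slicing is used here.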
MRS home page
MRS Search Steps
• Select database(s) of choice
• Formulate your query
• Hit “Search”
• The result is a “query set” or “hitlist”
• Analyze the results
MRS Search options
Simply type your keywords in the keyword field and choose SEARCH.
If you know the fields of the database you are searching in, you can
specify your query further.
But think about your query first!