Download ppt - IPAW

The craft of annotation
Carole Goble
Based on observations of the
PRINTS protein fingerprint
database
Primary & Secondary databases
Primary source generated by experimentalists.
Role: standards, quality thresholds, dissemination
•Sequence databases: EMBL, GenBank
•Increasingly other data types: micro-array
Secondary source derived from repositories,
other secondary databases, analysis and
expertise.
Role: Distilled and accumulated specialist knowledge.
Value added commentary.
•Swiss-Prot, PRINTS, CATH, PAX6, Enzyme, dbSNP…
Role: Warehouses to support analysis over replicated
data
• GIMS, aMAZE, InterPro…
The “Annotation Pipeline”
[Diagram: EMBL, TrEMBL, SwissProt, PRINTS, InterPro, BLOCKS and GPCRDB, linked by “Analysis” steps]
Annotation Distillation
Expressed Sequence Tags: millions
nrdb: 503,479
TrEMBL: 234,059
Swiss-Prot: 85,661
InterPro: 2990
PRINTS: 1310
PRINTS
PRINTS - a database of protein family
“fingerprints”
Fingerprints - groups of motifs excised
from alignments
–used to provide diagnostic signatures for
protein families
PRINTS forms basis of derived resources
–e.g., blocks, emotif, InterPro
Used in gene family analysis, genome
annotation, etc.
Swiss-Prot annotation
ID   PRIO_HUMAN     STANDARD;      PRT;   253 AA.
AC   P04156;
DE   MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR).
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=86300093 [NCBI, ExPASy, Israel, Japan]; PubMed=3755672;
RA   Kretzschmar H.A., Stowring L.E., Westaway D., Stubblebine W.H., Prusiner S.B., Dearmond S.J.
RT   "Molecular cloning of a human prion protein cDNA.";
RL   DNA 5:315-324(1986).
RN   [6]
RP   STRUCTURE BY NMR OF 23-231.
RX   MEDLINE=97424376 [NCBI, ExPASy, Israel, Japan]; PubMed=9280298;
RA   Riek R., Hornemann S., Wider G., Glockshuber R., Wuethrich K.;
RT   "NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231).";
RL   FEBS Lett. 413:282-288(1997).
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE HOST GENOME AND IS
CC       EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC   -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED "RODS".
CC   -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC   -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE BRAIN OF HUMANS AND ANIMALS INFECTED WITH
CC       NEURODEGENERATIVE DISEASES KNOWN AS TRANSMISSIBLE SPONGIFORM ENCEPHALOPATHIES OR PRION
CC       DISEASES, LIKE: CREUTZFELDT-JAKOB DISEASE (CJD), GERSTMANN-STRAUSSLER SYNDROME (GSS),
CC       FATAL FAMILIAL INSOMNIA (FFI) AND KURU IN HUMANS; SCRAPIE IN SHEEP AND GOAT; BOVINE
CC       SPONGIFORM ENCEPHALOPATHY (BSE) IN CATTLE; TRANSMISSIBLE MINK ENCEPHALOPATHY (TME);
CC       CHRONIC WASTING DISEASE (CWD) OF MULE DEER AND ELK; FELINE SPONGIFORM ENCEPHALOPATHY
CC       (FSE) IN CATS AND EXOTIC UNGULATE ENCEPHALOPATHY(EUE) IN NYALA AND GREATER KUDU. THE
CC       PRION DISEASES ILLUSTRATE THREE MANIFESTATIONS OF CNS DEGENERATION: (1) INFECTIOUS (2)
CC       SPORADIC AND (3) DOMINANTLY INHERITED FORMS. TME, CWD, BSE, FSE, EUE ARE ALL THOUGHT TO
CC       OCCUR AFTER CONSUMPTION OF PRION-INFECTED FOODSTUFFS.
CC   -!- SIMILARITY: BELONGS TO THE PRION FAMILY.
DR   HSSP; P04925; 1AG2. [HSSP ENTRY / SWISS-3DIMAGE / PDB]
DR   MIM; 176640; -. [NCBI / EBI]
DR   InterPro; IPR000817; -.
DR   Pfam; PF00377; prion; 1.
DR   PRINTS; PR00341; PRION.
KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal; Polymorphism; Disease mutation.
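The two-letter line codes of the Swiss-Prot entry (ID, AC, CC, DR, KW and so on) make it easy to group an entry mechanically before any annotation is distilled from it. A minimal Python sketch, assuming every line starts with a two-character code followed by spaces and that CC topics are introduced by "-!- TOPIC:"; the function names and the shortened sample are illustrative and not part of any Swiss-Prot or PRINTS tooling:

```python
from collections import defaultdict


def group_by_line_code(entry_text):
    """Group the lines of a Swiss-Prot-style flat-file entry by line code."""
    fields = defaultdict(list)
    for line in entry_text.splitlines():
        if len(line) < 2:
            continue  # skip blank lines
        code, content = line[:2], line[2:].strip()
        fields[code].append(content)
    return dict(fields)


def comment_topics(cc_lines):
    """Split CC lines into {TOPIC: text}; a topic starts with '-!- TOPIC:'."""
    topics = {}
    current = None
    for text in cc_lines:
        if text.startswith("-!-"):
            topic, _, body = text[3:].partition(":")
            current = topic.strip()
            topics[current] = body.strip()
        elif current is not None:
            topics[current] += " " + text  # continuation of the previous topic
    return topics


# Shortened sample taken from the PRIO_HUMAN entry on the slide.
SAMPLE = """\
ID   PRIO_HUMAN     STANDARD;      PRT;   253 AA.
AC   P04156;
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE
CC       HOST GENOME AND IS EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC   -!- SIMILARITY: BELONGS TO THE PRION FAMILY.
DR   PRINTS; PR00341; PRION.
"""

fields = group_by_line_code(SAMPLE)
topics = comment_topics(fields["CC"])
print(topics["SIMILARITY"])
```

Grouping by topic first means later steps can work on all FUNCTION or DISEASE comments across a set of entries without re-parsing the flat file.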
Nude PRINTS entry
[Callouts on slide: “manual annotation” (the empty gc; to gd; fields); “SWISS-PROT IDs” (the tp; and tt; lines)]
gc;
gx;
gn;
ga;
gt;
gp;
bb;
gr;
bb;
gd;
bb;
si; SUMMARY INFORMATION
si; -------------------
sd;   37 codes involving  8 elements
sd;    0 codes involving  7 elements
sd;    0 codes involving  6 elements
sd;    0 codes involving  5 elements
sd;    0 codes involving  4 elements
sd;    1 codes involving  3 elements
sd;    0 codes involving  2 elements
bb;
ci; COMPOSITE FINGERPRINT INDEX
ci; ---------------------------
cr;
cd;  8|  37  37  37  37  37  37  37  37
cd;  7|   0   0   0   0   0   0   0   0
cd;  6|   0   0   0   0   0   0   0   0
cd;  5|   0   0   0   0   0   0   0   0
cd;  4|   0   0   0   0   0   0   0   0
cd;  3|   1   0   0   0   1   1   0   0
cd;  2|   0   0   0   0   0   0   0   0
cd; --+--------------------------------
cd;   |   1   2   3   4   5   6   7   8
bb;
tp; PRIO_COLGU   PRIO_MACFA   PRIO_CEREL   PRIO_ODOHE
KA; P40251 M1    P40254 M1    P79142 M1    P47852 M1
tp; PRIO_GORGO   PRIO_PANTR   PRIO_HUMAN   O46648
KA; P40252 M1    P40253 M1    P04156 M1    O46648 M1
tp; PRIO_SHEEP   PRIO_CALJA   PRIO_BOVIN   PRP2_BOVIN
KA; P23907 M1    P40247 M1    P10279 M1    Q01880 M1
bb;
tt; PRIO_COLGU MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - COLOBUS GUEREZA.
tt; PRIO_MACFA MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - MACACA FASCICULARIS (CRAB EATING MACAQUE)
tt; PRIO_CEREL MAJOR PRION PROTEIN PRECURSOR (PRP) - CERVUS ELAPHUS (RED DEER).
tt; PRIO_ODOHE MAJOR PRION PROTEIN PRECURSOR (PRP) - ODOCOILEUS HEMIONUS (MULE DEER) (BLACK-TAILED DEER).
tt; PRIO_GORGO MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - GORILLA GORILLA GORILLA (LOWLAND GORILLA)
tt; PRIO_PANTR MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) - PAN TROGLODYTES (CHIMPANZEE)
tt; PRIO_HUMAN MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR) - HOMO SAPIENS (HUMAN).
Low Level Annotation
Prion protein signature
PROSITE; PS00291 PRION_1; PS00706 PRION_2
BLOCKS; BL00291
PFAM; PF00377 prion
INTERPRO; IPR000817
1. STAHL, N. AND PRUSINER, S.B.
Prions and prion proteins.
FASEB J. 5 2799-2807 (1991).
Annotation: “High-level”
Semi-structured text-based annotation,
representing the accumulated knowledge of the
biological community about the data entry
Intellectually formed – the accumulated
knowledge of an expert distilling the aggregated
information drawn from multiple data sources and
analyses, and the annotator's knowledge.
Culled from other sources such as other
database entries' annotations and the literature.
Intended to be human readable rather than
machine processable.
PRINTS Annotation (manual)
gc; PRION
gx; PR00341
gt; Prion protein signature
gp; INTERPRO; IPR000817
gp; PROSITE; PS00291 PRION_1; PS00706 PRION_2
gp; BLOCKS; BL00291
gp; PFAM; PF00377 prion
bb;
gr; 1. STAHL, N. AND PRUSINER, S.B.
gr; Prions and prion proteins.
gr; FASEB J. 5 2799-2807 (1991).
gr;
gr; 2. BRUNORI, M., CHIARA SILVESTRINI, M. AND POCCHIARI, M.
gr; The scrapie agent and the prion hypothesis.
gr; TRENDS BIOCHEM.SCI. 13 309-313 (1988).
gr;
gr; 3. PRUSINER, S.B.
gr; Scrapie prions.
gr; ANNU.REV.MICROBIOL. 43 345-374 (1989).
bb;
gd; Prion protein (PrP) is a small glycoprotein found in high quantity in the brain of animals infected with
gd; certain degenerative neurological diseases, such as sheep scrapie and bovine spongiform encephalopathy (BSE),
gd; and the human dementias Creutzfeldt-Jacob disease (CJD) and Gerstmann-Straussler syndrome (GSS). PrP is
gd; encoded in the host genome and is expressed both in normal and infected cells. During infection, however, the
gd; PrP molecules become altered and polymerise, yielding fibrils of modified PrP protein.
gd;
gd; PrP molecules have been found on the outer surface of plasma membranes of nerve cells, to which they are
gd; anchored through a covalent-linked glycolipid, suggesting a role as a membrane receptor. PrP is also
gd; expressed in other tissues, indicating that it may have different functions depending on its location.
gd;
gd; The primary sequences of PrP's from different sources are highly similar: all bear an N-terminal domain
gd; containing multiple tandem repeats of a Pro/Gly rich octapeptide; sites of Asn-linked glycosylation; an
gd; essential disulphide bond; and 3 hydrophobic segments. These sequences show some similarity to a chicken
gd; glycoprotein, thought to be an acetylcholine receptor-inducing activity (ARIA) molecule. It has been
gd; suggested that changes in the octapeptide repeat region may indicate a predisposition to disease, but it is
gd; not known for certain whether the repeat can meaningfully be used as a fingerprint to indicate susceptibility.
gd;
gd; PRION is an 8-element fingerprint that provides a signature for the prion proteins. The fingerprint was
gd; derived from an initial alignment of 5 sequences: the motifs were drawn from conserved regions spanning
gd; virtually the full alignment length, including the 3 hydrophobic domains and the octapeptide repeats
gd; (WGQPHGGG). Two iterations on OWL18.0 were required to reach convergence, at which point a true set comprising
gd; 9 sequences was identified. Several partial matches were also found: these include a fragment (PRIO_RAT)
gd; lacking part of the sequence bearing the first motif,and the PrP homologue found in chicken - this matches
gd; well with only 2 of the 3 hydrophobic motifs (1 and 5) and one of the other conserved regions (6), but has an
gd; N-terminal signature based on a sextapeptide repeat (YPHNPG) rather than the characteristic PrP octapeptide.
High level annotation
Prion protein (PrP) is a small glycoprotein
found in high quantity in the brain of
animals infected with certain degenerative
neurological diseases, such as sheep
scrapie and bovine spongiform encephalopathy
(BSE), and the human dementias
Creutzfeldt-Jacob disease (CJD) and
Gerstmann-Straussler syndrome (GSS).
PRINTS Annotation Process
[Diagram labels: Fingerprint Process; Blank Annotation; MEDLINE; OMIM; PRINTS; SWISS-PROT; GRAP; Annotation gathering; Editorial culling; Tag Decoration; Filled Annotation; Knowledge; heuristics; mapping rules]
PRINTS Annotation Process
For all matches to a fingerprint, the full SWISS-PROT entry is
retrieved:
tp; PRIO_COLGU PRIO_MACFA PRIO_CEREL PRIO_ODOHE
tp; PRIO_GORGO PRIO_PANTR PRIO_HUMAN O46648
tp; PRIO_SHEEP PRIO_CALJA PRIO_BOVIN PRP2_BOVIN
ID analysis determines whether the entry is a super-family, family or
domain.
This is essential, as it influences how the annotation is processed:
tp; URIC_RAT URIC_MOUSE URIC_RABIT URIC_PAPHA
tp; URIC_PIG URIC_DROPS URIC_DROME URIC_DROVI
tp; URIC_SOYBN URIC_EMENI URIC_ASPFL URID_CANLI
tp; MUP5_MOUSE LACB_BOVIN LACB_BUBAR LACB_CAPHI
tp; MUP_RAT RET1_ONCMY RET2_ONCMY PURP_CHICK
tp; RETB_HUMAN ICYA_MANSE ICYB_MANSE CRA2_HOMGA
tp; UROT_HUMAN PLMN_PIG PLMN_HUMAN PLMN_BOVIN
tp; APOA_HUMAN UROK_HUMAN APOA_MACMU UROK_PIG
tp; THRB_BOVIN HGFL_MOUSE THRB_HUMAN HGFL_HUMAN
PRINTS Annotation Process
ID analysis usually reveals families unambiguously;
the comment field helps to resolve super-families from domains:
CC -!- SIMILARITY: BELONGS TO THE PRION FAMILY
CC -!- SIMILARITY: BELONGS TO THE URICASE FAMILY
CC -!- SIMILARITY: BELONGS TO THE LIPOCALIN FAMILY
CC -!- SIMILARITY: CONTAINS 38 KRINGLE REGIONS
Once the entry type is established, an appropriate precis is constructed.
Shared annotation is engineered to provide a report detailing
the function & structure of the protein
the disease(s) with which it is associated
the family to which it belongs
a set of literature references
a list of keywords
Any other remarks
The precis is then fed into a naked pre-PRINTS file.
Output is English.
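The entry-type decision described on the preceding slides (ID analysis first, then the SIMILARITY comments) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0.8 prefix share, the function names and the regular expressions are invented here, and reading "BELONGS TO THE ... FAMILY" as a superfamily hint and "CONTAINS n ... REGIONS" as a domain hint is an interpretation of the slide examples, not the documented PRINTS procedure.

```python
import re
from collections import Counter


def id_prefixes(ids):
    """Return the part before '_' of each SWISS-PROT ID (e.g. 'PRIO')."""
    return [i.split("_")[0] for i in ids if "_" in i]


def classify_entry(ids, similarity_lines, family_share=0.8):
    """Guess whether a fingerprint describes a family, superfamily or domain.

    family_share is an assumed threshold, not a value from the slides.
    """
    prefixes = Counter(id_prefixes(ids))
    if prefixes:
        top_count = prefixes.most_common(1)[0][1]
        if top_count / len(ids) >= family_share:
            return "family"  # one ID prefix dominates the true positives
    belongs = sum(1 for s in similarity_lines
                  if re.search(r"BELONGS TO THE .+ FAMILY", s))
    contains = sum(1 for s in similarity_lines
                   if re.search(r"CONTAINS \d+ .*(REGION|DOMAIN)", s))
    return "domain" if contains > belongs else "superfamily"


PRION_IDS = ["PRIO_COLGU", "PRIO_MACFA", "PRIO_CEREL", "PRIO_ODOHE",
             "PRIO_GORGO", "PRIO_PANTR", "PRIO_HUMAN", "O46648",
             "PRIO_SHEEP", "PRIO_CALJA", "PRIO_BOVIN", "PRP2_BOVIN"]
LIPOCALIN_IDS = ["MUP5_MOUSE", "LACB_BOVIN", "LACB_BUBAR", "LACB_CAPHI",
                 "MUP_RAT", "RET1_ONCMY", "RET2_ONCMY", "PURP_CHICK",
                 "RETB_HUMAN", "ICYA_MANSE", "ICYB_MANSE", "CRA2_HOMGA"]
KRINGLE_IDS = ["UROT_HUMAN", "PLMN_PIG", "PLMN_HUMAN", "PLMN_BOVIN",
               "APOA_HUMAN", "UROK_HUMAN", "APOA_MACMU", "UROK_PIG",
               "THRB_BOVIN", "HGFL_MOUSE", "THRB_HUMAN", "HGFL_HUMAN"]

print(classify_entry(PRION_IDS, ["SIMILARITY: BELONGS TO THE PRION FAMILY"]))
print(classify_entry(LIPOCALIN_IDS, ["SIMILARITY: BELONGS TO THE LIPOCALIN FAMILY"]))
print(classify_entry(KRINGLE_IDS, ["SIMILARITY: CONTAINS 38 KRINGLE REGIONS"]))
```

With the ID lists from the slides, the prion set is settled by the ID prefixes alone, while the other two sets have mixed prefixes and fall through to the comment field.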
Swiss-Prot tag                   Heuristics                                   PRINTS tag
-------------------------------  -------------------------------------------  -----------------------
Description                      Copy                                         gt (title)
RAuthor, RTitle, RLocation       Common + filters: top four; date priority;   gr (reference)
                                 mixed paper subject portfolio
Database cross-reference fields  Common + filters: preferred links            gp (other databases)
KeyWords                         Up to a threshold of common keywords         gd (general annotation)
Comment field:
  Function                       Majority vote                                function
  Subcellular location           Majority vote                                subcellular location
  Disease                        Golden vote - sequence provenance            disease
  Similarity tag                 Cluster on SWISS-PROT codes; majority vote   family
                                 for families; even distribution for
                                 superfamilies and domains
  Subunit                        An indication of structure                   subunit (structure)
RP Structure                     Paper type classification: 1                 structure
                                 crystallographic, 1 NMR; preferred order
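Two of the heuristics listed above, the majority vote and the keyword threshold, are simple enough to sketch. The Python below is an illustration only: the function names, the 0.5 threshold and the sample data are assumptions, and the slides do not say how ties are broken (here the statement seen first wins).

```python
from collections import Counter


def majority_vote(statements):
    """Return the statement made by most entries (ties: first seen wins)."""
    if not statements:
        return None
    return Counter(statements).most_common(1)[0][0]


def common_keywords(keyword_sets, threshold=0.5):
    """Keep keywords that occur in at least `threshold` of the entries."""
    counts = Counter(k for ks in keyword_sets for k in set(ks))
    n = len(keyword_sets)
    return sorted(k for k, c in counts.items() if c / n >= threshold)


# Illustrative inputs: one FUNCTION comment and one keyword set per matched entry.
functions = [
    "The function of prp is not known.",
    "The function of prp is not known.",
    "May play a role in neuronal development.",  # hypothetical minority statement
]
keywords = [
    {"Prion", "Brain", "Glycoprotein"},
    {"Prion", "Repeat"},
    {"Prion", "Brain"},
]

print(majority_vote(functions))
print(common_keywords(keywords))
```

A vote over exact strings only works once near-identical comments have been normalised, which is why redundancy elimination matters before voting.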
Swiss-Prot Redundancy
OPSD_SHEEP  DR PRINTS; PR00237; GPCRRHODOPSN.
OPSD_HUMAN  DR PRINTS; PR00237; GPCRRHODOPSN.
OPSD_MOUSE  DR PRINTS; PR00237; GPCRRHODOPSN.
OPSD_SHEEP: VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION
OPSD_HUMAN: VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION
OPSD_MOUSE: VISUAL PIGMENTS ARE THE LIGHT-ABSORBING MOLECULES THAT MEDIATE VISION
Redundancy elimination
ACM1_HUMAN: Primary transducing effect is pi turnover.
ACM4_HUMAN: Primary transducing effect is inhibition of adenylate cyclase.
ACM2_HUMAN: Primary transducing effect is adenylate cyclase inhibition.
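The ACM example shows why exact string comparison is not enough: two of the statements say the same thing in a different word order. One possible way to catch this, sketched in Python, is to compare statements by their set of content words. The slides do not describe the method actually used; the stopword list and function names here are assumptions.

```python
import re

STOPWORDS = {"of", "the", "is", "a", "an"}  # assumed, deliberately tiny


def content_key(sentence):
    """Reduce a sentence to its set of lower-cased content words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def eliminate_redundancy(statements):
    """Keep the first statement for each distinct set of content words."""
    seen = set()
    kept = []
    for s in statements:
        key = content_key(s)
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept


statements = [
    "Primary transducing effect is pi turnover.",
    "Primary transducing effect is inhibition of adenylate cyclase.",
    "Primary transducing effect is adenylate cyclase inhibition.",
]

for line in eliminate_redundancy(statements):
    print(line)
```

Ignoring word order is a blunt instrument: it would also merge sentences that use the same words to say different things, so a real pipeline would need something more careful.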
Databases: majority vote
Major prion protein precursor (PRP)
PRINTS; PR00341 PRION
PROSITE; PS00291 PRION_1; PS00706 PRION_2
PFAM; PF00377 prion
INTERPRO; IPR000817
PDB; 1B10; 1AG2
References: date ranking ++
1. CERVENAKOVA, L., [...]
Infectious amyloid precursor gene sequences in primates used for experimental transmission of human spongiform encephalopathy.
PROC. NATL. ACAD. SCI. USA 91 12159-12162 (1994).
2. LOWENSTEIN, D.H., [...]
Three hamster species with different scrapie incubation times and neuropathological features encode distinct prion proteins.
MOL. CELL. BIOL. 10 1153-1163 (1990).
3. KALUZ, S., [...]
Disease – Golden Voting.
(PRIO_HUMAN; P04156): Prp is found in high quantity in the brain of humans and animals infected with neurodegenerative diseases known as transmissible spongiform encephalopathies or prion diseases [...]
(PRIO_HUMAN; P04156): Kuru is transmitted during ritualistic cannibalism, among natives of the new guinea highlands. [...]
(PRIO_SHEEP; P23907): Polymorphism at position 171 may be related to the alleles of scrapie [...]
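The date ranking and golden voting shown above can be sketched as follows. This Python is illustrative only: the record layout and function names are assumptions, newest-first ordering is inferred from the 1994 paper being listed before the 1990 one, and the limit of four comes from the "Top four" filter in the heuristics table. Golden voting is read here as "a single entry's disease comment is enough, but it is tagged with the entry it came from".

```python
def rank_references(refs, limit=4):
    """Return the `limit` most recent references (newest first)."""
    return sorted(refs, key=lambda r: r["year"], reverse=True)[:limit]


def golden_vote(disease_comments):
    """Keep every distinct disease comment, tagged with its source entry.

    disease_comments: list of (swissprot_id, accession, text) tuples.
    """
    lines = []
    seen = set()
    for sp_id, accession, text in disease_comments:
        if text in seen:
            continue  # identical comment already recorded
        seen.add(text)
        lines.append(f"({sp_id}; {accession}): {text}")
    return lines


refs = [
    {"year": 1990, "first_author": "LOWENSTEIN"},
    {"year": 1994, "first_author": "CERVENAKOVA"},
]
comments = [
    ("PRIO_HUMAN", "P04156", "Kuru is transmitted during ritualistic cannibalism [...]"),
    ("PRIO_SHEEP", "P23907", "Polymorphism at position 171 may be related to [...]"),
]

print([r["first_author"] for r in rank_references(refs)])
for line in golden_vote(comments):
    print(line)
```

The source tag is the only provenance that survives into the precis, which is the point the later "Implications for provenance" slides pick up.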
PRECIS annotation
gc; PRIO
gx;
gt; Major prion protein precursor (PRP) signature
gp; PROSITE; PS00291 PRION_1; PS00706 PRION_2
gp; INTERPRO; IPR000817
gp; PFAM; PF00377 prion
gp; PDB; 1B10; 1AG2
gp; SCOP; 1B10; 1AG2
gp; CATH; 1B10; 1AG2
gp; MIM; 176640; 123400; 137440; 245300; 600072
bb;
gr; 1. LOWENSTEIN, D.H., BUTLER, D.A., WESTAWAY, D., MCKINLEY, M.P., DEARMOND, S.J. AND PRUSINER, S.B.
gr; Three hamster species with different scrapie incubation times and neuropathological features encode distinct
gr; prion proteins.
gr; MOL.CELL.BIOL. 10 1153-1163 (1990).
gr;
gr; 5. RIEK, R., HORNEMANN, S., WIDER, G., GLOCKSHUBER, R. AND WUETHRICH, K.
gr; NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231).
gr; FEBS LETT. 413 282-288 (1997).
bb;
gd; The function of prp is not known. Prp is encoded in the host genome and is expressed both in normal and
gd; infected cells.
gd;
gd; (PRIO_HUMAN; P04156):
gd; Prp is found in high quantity in the brain of humans and animals infected with neurodegenerative diseases
gd; known as transmissible spongiform encephalopathies or prion diseases, like: creutzfeldt-jakob disease (cjd),
gd; gerstmann-straussler syndrome (gss), fatal familial insomnia (ffi) and kuru in humans; scrapie in sheep and
gd; goat; bovine spongiform encephalopathy (bse) in cattle; transmissible mink encephalopathy (tme); chronic
gd; wasting disease (cwd) of mule deer and elk; feline spongiform encephalopathy (fse) in cats and exotic ungulate
gd; encephalopathy (eue) in nyala and greater kudu. The prion diseases illustrate three manifestations of cns
gd; degeneration: (1) infectious (2) sporadic and (3) dominantly inherited forms. Tme, cwd, bse, fse, eue are all
gd; thought to occur after consumption of prion-infected foodstuffs.
gd;
gd; Prp has a tendency to aggregate yielding polymers called "rods".
gd;
gd; The structure has been determined, e.g. "NMR characterization of the full-length recombinant murine prion
gd; protein, mPrP(23-231)" [5].
gd;
gd; Belongs to the prion family.
gd;
gd; Keywords: GPI-anchor; Repeat; Signal; Prion; Brain; Glycoprotein; Polymorphism; Disease mutation; 3D-structure.
gd;
gd; PRIO is an 8-element fingerprint that provides a signature for the Major prion protein precursor (PRP). The
gd; fingerprint was derived from an initial alignment of 6 sequences: the motifs were drawn from conserved regions
gd; spanning virtually the full alignment length. Two iterations on SPTR37_9f were required to reach convergence,
gd; at which point a true set comprising 37 sequences was identified. A single partial match was also found:
gd; (PRIO_CHICK; P27177).
Human annotation
[The manually written PRION entry (PR00341), as shown earlier under “PRINTS Annotation (manual)”, repeated on this slide for comparison with the PRECIS annotation]
Implications for provenance
Tools used by the service providers can be
sophisticated.
Provenance information may be recorded in
those tools.
But it is not passed on into the annotation
(e.g. SWISS-PROT and PRINTS)
•Why?
Implications for provenance
Mining, Aggregating, Distilling, Summarising and
Generating phrases and texts from comment
fields.
Distillation to create compact and
comprehensive summary.
Urge to be non-redundant.
•How to represent the provenance?
•How does the provenance get aggregated?
•How does it get propagated?
•Degrees of evidence -> Degrees of provenance
Implications for provenance
gr; 5. RIEK, R., HORNEMANN, S., WIDER, G., GLOCKSHUBER, R. AND WUETHRICH, K.
gr; NMR characterization of the full-length recombinant murine prion protein, mPrP(23-231).
gr; FEBS LETT. 413 282-288 (1997).
bb;
gd; The function of prp is not known. Prp is encoded in the host genome and is expressed both in normal and
gd; infected cells.
gd;
gd; (PRIO_HUMAN; P04156):
gd; Prp is found in high quantity in the brain of humans and animals infected with neurodegenerative diseases
gd; known as transmissible spongiform encephalopathies or prion diseases, like: creutzfeldt-jakob disease (cjd),
gd; gerstmann-straussler syndrome (gss), fatal familial insomnia (ffi) and kuru in humans; scrapie in sheep and
gd; goat; bovine spongiform encephalopathy (bse) in cattle; transmissible mink encephalopathy (tme); chronic
gd; wasting disease (cwd) of mule deer and elk; feline spongiform encephalopathy (fse) in cats and exotic ungulate
gd; encephalopathy (eue) in nyala and greater kudu. The prion diseases illustrate three manifestations of cns
gd; degeneration: (1) infectious (2) sporadic and (3) dominantly inherited forms. Tme, cwd, bse, fse, eue are all
gd; thought to occur after consumption of prion-infected foodstuffs.
gd;
gd; Prp has a tendency to aggregate yielding polymers called "rods".
gd;
gd; The structure has been determined, e.g. "NMR characterization of the full-length recombinant murine prion
gd; protein, mPrP(23-231)" [5].
Swiss-Prot
•Inter and Intra provenance
Implications for provenance
Inheritance of errors, e.g. SWISS-PROT errors:
gd; Polymorphism at position 171 may be related to the
gd; alleles of scarpie incubation-control (sic) gene in this species.
Poor quality begets poor quality, e.g. where the SWISS-PROT annotation is poor
or inconsistent:
gd; Visual pigments are the light-absorbing molecules that mediate vision. They consist
gd; of an apoprotein, opsin, covalently linked to cis-retinal. This receptor is coupled
gd; to the activation of phospholipase c.
gd;
gd; Visual pigments are the light-absorbing molecules that mediate vision. They consist
gd; of an apoprotein, opsin, covalently linked to cis-retinal. This receptor is coupled
gd; to the activation of phospholipase c (by similarity).
•How do we record that it’s a copy but it’s been
corrected and why?
Implications for provenance
Hugely subjective.
e.g. if only one annotation claims that the family
is implicated in a disease, and that annotation was
by a group Terri Attwood respects then it gets
in.
• How to capture that subjectivity and use it
when using the annotation?
•The workflow is complex – how to capture this?
• It’s more like argumentation than reproducible
derivation.
Questions, questions …
Where does provenance come from?
–Incidental vs supplied by the scientist, somehow.
What is provenance used for?
–Reliability & quality:
–Justification & audit:
–Reusability, reproducibility & repeatability
–Change & evolution:
–Ownership, security, credit & copyright.
–Identity - LSID
–Immutability
–Migration & storage
–Aggregation
–Versioning