Annotation extension meeting summary
• Held in Hinxton, June 2014 (full day)
• Editors, Berkeley, EBI, PomBase, PAINT, UCL
• Considerable work undertaken before and after the meeting by all attendees:
Jane Lomax
David Osumi-Sutherland
Ruth Lovering
Rebecca Foulger
Pascale Gaudet
Aleks Shypitsyna
Rachael Huntley
Valerie Wood
Chris Mungall
• Meeting took place to discuss
– Completion of wiki page examples
– Inconsistent approaches to the use of relationships
• 2 wiki pages summarise the meeting
– http://wiki.geneontology.org/index.php/Annotation_Extension_Relation_Documentation_Jamboree
– http://wiki.geneontology.org/index.php/Annotation_Extension_meeting_2014-06-16
Annotation Extension Key Points
1. If users of our files in their current state do not make any changes to their procedures, they will be unaffected by the addition of annotation extensions
2. Folding creates a new ontology term on the fly, with each extended annotation materializing as a new GO term. An OWL reasoner is used to automatically construct the graph in this new ontology

Folded/unfolded | Gene Name (col 2) | GO ID/term (col 5) | Annotation Extension (col 16)
Unfolded* | MGI:2448712 Ren | GO:0030154 cell differentiation | results_in_the_formation_of(CL:0000540 neuron)
Folded | MGI:2448712 Ren | New GO ID: neuron cell differentiation | is_a parent to new GO term: GO:0030154 cell differentiation
*Existing annotation
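The folding step can be pictured with a small data-structure sketch. It is only an illustration: the real folding is done by an OWL reasoner, and the class names (`ExtendedAnnotation`, `FoldedTerm`), the `fold` helper and the underscore spelling of the relation are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtendedAnnotation:
    gene: str          # col 2
    go_id: str         # col 5
    extensions: tuple  # col 16, as (relation, target_id) pairs


@dataclass(frozen=True)
class FoldedTerm:
    genus: str              # existing GO term; becomes the is_a parent
    differentia: frozenset  # (relation, target_id) pairs from col 16


def fold(annotation):
    """Represent an extended annotation as an unnamed 'new GO term'."""
    return FoldedTerm(
        genus=annotation.go_id,
        differentia=frozenset(annotation.extensions),
    )


# The Ren example: cell differentiation extended with a neuron cell type.
ren = ExtendedAnnotation(
    gene="Ren",
    go_id="GO:0030154",
    extensions=(("results_in_the_formation_of", "CL:0000540"),),
)
new_term = fold(ren)
print(new_term.genus, sorted(new_term.differentia))
```

The new term keeps GO:0030154 as its is_a parent, which is the behaviour described for 'neuron cell differentiation'.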
3. Unfolding replaces existing highly specific GO terms with a more basic GO term, with the specific information now expressed in the extension field

Folded/unfolded | Gene Name (col 2) | GO ID/term (col 5) | Annotation Extension (col 16)
Unfolded* | MGI:2448712 Ren | GO:0030154 cell differentiation | results_in_the_formation_of(CL:0000540 neuron)
Folded | MGI:2448712 Ren | New GO ID: neuron cell differentiation | is_a parent to new GO term: GO:0030154 cell differentiation
“Folded”* | Q9H9Q4 NHEJ1 | GO:0030183 B cell differentiation |
“Unfolded” | Q9H9Q4 NHEJ1 | GO:0030154 cell differentiation | results_in_the_formation_of(CL:0000236 B-cell)
*Existing annotation
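Unfolding can be sketched as a lookup, assuming each highly specific GO term has a logical definition of the form "basic term + relation + target". The `LOGICAL_DEFINITIONS` table and the `unfold` function below are hypothetical names for this example, and the single entry simply restates the NHEJ1 example.

```python
# Hypothetical logical definitions: specific term -> (basic term, relation, target)
LOGICAL_DEFINITIONS = {
    # B cell differentiation = cell differentiation that forms a B cell
    "GO:0030183": ("GO:0030154", "results_in_the_formation_of", "CL:0000236"),
}


def unfold(go_id):
    """Return (col 5 GO ID, col 16 extension) for the unfolded annotation.

    Terms without a logical definition are returned unchanged with an
    empty extension.
    """
    if go_id not in LOGICAL_DEFINITIONS:
        return go_id, ""
    basic_term, relation, target = LOGICAL_DEFINITIONS[go_id]
    return basic_term, f"{relation}({target})"


print(unfold("GO:0030183"))
```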
QuickGO C16 relationship graph
Rachael Huntley (GOA, EBI), Tony Sawford (GOA, EBI), Eugene Kulesha (Ensembl, EBI)
Relationship included in column 16
Domain is the GO term in column 5
Range is the ID in column 16

Gene Name (col 2) | GO ID/term (col 5) | Annotation Extension (col 16)
MGI:2448712 Ren | GO:0030154 cell differentiation | occurs_in(CL:0000540 neuron)

Specific rules govern how each relationship is used
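To make the domain/range reading concrete, here is a minimal sketch that pulls the relationship and the range ID out of a column 16 value written the way these slides write it, i.e. `relation(ID optional label)`. The function names and the regular expression are assumptions for this example; the sketch does not attempt to cover the full annotation file syntax.

```python
import re

# relation name, "(", an ID of the form PREFIX:local, optional label, ")"
_EXTENSION = re.compile(r"(\w+)\s*\(\s*([A-Za-z_]+:[\w.-]+)[^)]*\)")


def parse_extensions(col16):
    """Return (relation, range_id) pairs found in a column 16 value."""
    return [(m.group(1), m.group(2)) for m in _EXTENSION.finditer(col16)]


def describe(go_id, col16):
    """Pair the col 5 GO term (domain) with each relation and range ID."""
    return [
        {"domain": go_id, "relation": relation, "range": range_id}
        for relation, range_id in parse_extensions(col16)
    ]


print(describe("GO:0030154", "occurs_in (CL:0000540 neuron)"))
```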
Examples in ‘occurs_in’ wiki

Unfolded:
Gene | GO ID (col 5) | Annotation Extension (col 16)
CASQ2 | GO:0051208 sequestering of calcium ion | occurs_in(GO:0016529 sarcoplasmic reticulum), occurs_in(CL:0000746 cardiac muscle cell)

Folded:
Gene | NEW GO term | Parent terms of new GO term
CASQ2 | sequestering of calcium ion in sarcoplasmic reticulum | is_a GO:0051208 sequestering of calcium ion
CASQ2 | sequestering of calcium ion in cardiac muscle cell | is_a GO:0051208 sequestering of calcium ion
CASQ2 | sequestering of calcium ion in sarcoplasmic reticulum of cardiac muscle cell | is_a GO:0051208 sequestering of calcium ion, is_a New GO term sequestering of calcium ion in sarcoplasmic reticulum, is_a New GO term sequestering of calcium ion in cardiac muscle cell
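The CASQ2 example shows that two extensions give three folded terms, with the combined term sitting below both single-extension terms. A minimal sketch of that pattern, assuming folded terms are identified by their set of (relation, ID) pairs; `folded_terms` is a hypothetical helper, not the OWL reasoner. GO:0051208 is the ID given for sequestering of calcium ion in the unfolded row.

```python
from itertools import combinations


def folded_terms(go_id, extensions):
    """Map each non-empty subset of extensions to its is_a parents.

    Every folded term is a child of the unextended GO term and of every
    folded term built from a strict subset of its extensions.
    """
    parents = {}
    ordered = sorted(extensions)
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            key = frozenset(subset)
            parents[key] = [go_id] + [p for p in parents if p < key]
    return parents


SR = ("occurs_in", "GO:0016529")       # sarcoplasmic reticulum
CARDIAC = ("occurs_in", "CL:0000746")  # cardiac muscle cell

casq2 = folded_terms("GO:0051208", [SR, CARDIAC])
for differentia, is_a in casq2.items():
    print(sorted(differentia), "has", len(is_a), "is_a parent(s)")
```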
Annotation using annotation extensions (C16)
Tony Sawford at EBI
32 relationships
Relationship filter in Protein2GO
Limits the choice of relationships, based on the domain and range in the annotation
• The relationships need to be consistently applied because the OWL reasoner is used to fold the terms
• Potential for a curator to create an annotation which is folded differently from how they had anticipated
• Many annotation extensions are not folded at all
• Inappropriate IDs present in C16
Relationships covered by meeting
• Has_input
• Has_direct_input
• Has_regulation_target
• Has_output
• Localization_dependent_on
• Occurs_at
• Occurs_in
• During - removed
• Happens_during
• Exists_during
• Part_of
• Causally_upstream_of (documentation after meeting)
In addition:
• Change in the ‘response to’ ontology domain: is_a children changed to part_of
• Proposal to increase number of relationships
• Encourage curators who are new to annotation extensions to start with the following relations:
– part_of
– occurs_in
• Identified areas where the annotation extension is difficult to use due to the complexity of the GO term being extended
Completion of wiki page examples
• 32 relationships

# relationships | Wiki page | Usage example | Folding example
8 | - | - | -
4 | ✔ | - | -
9 | ✔ | ✔ | -
11 | ✔ | ✔ | ✔

21 relationships need documentation completed
Some of the ‘completed’ examples need further discussion
Has_input and has_direct_input
Lots of issues arose here
1. Use of has_direct_input only when PPI confirmed or predicted based on ortholog PPIs
2. Use of has_direct_input with MF not BP
3. Use of has_input or in_presence_of to specify the chemical in the BP ‘response_to_chemical’
4. What relationship to use to specify the gene which is a target of a transcription factor?
5. Would more relationships in this area be useful?
Use of has_direct_input only when PPI confirmed or predicted based on ortholog PPIs
• Unanimous agreement on this
• Agreed during annotation calls
• QC to be applied?
Use of has_direct_input with MF not BP
• Not complete agreement
In support:
– BPs are multistep processes so it is not appropriate to use has_direct_input
Note: a multistep process such as 'negative regulation of intrinsic apoptotic signaling pathway' should not specify a protein using the relationship has_input
Against:
– Some BPs are single step processes, eg phosphorylation/methylation etc, therefore use of has_direct_input identifies the ‘direct’ target
– Looks odd to have:

Gene Name (col 2) | GO ID (col 5) | Annotation Extension (col 16)
Endopeptidase A | GO:0004175 endopeptidase activity | has_direct_input(substrate of endopeptidase A)
Endopeptidase A | GO:0006508 proteolysis | has_input(substrate of endopeptidase A)
Use of has_input or in_presence_of to specify the chemical in the BP ‘response_to_chemical’
Not complete agreement
• Has_input: Identifies an entity affected by (bound, transported, modified, consumed or destroyed), or a cellular response process involved in the gene product's participation in a molecular function or biological process.
– Therefore the ID in C16 can either be changed or can stimulate a cellular response
• In_presence_of: Identifies a chemical, gene product or complex in the presence of which an ontology term is observed to apply to the annotated gene product.
– This seems to provide a more consistent usage of the relationship
It was agreed that ‘has_input’ was a better relationship to use, but we didn’t have time to fully discuss the RO relation ‘in_presence_of’
has_input and 'response to'
Example: ‘proteolysis [involved] in cellular response to drug’
• Two has_input relationships:
– has_input: drug
– has_input: proteolysis target
• The drug isn’t an input to the proteolysis.
• The proteolysis is part of the cellular response to drug.
• Decided to change the is_a child terms of ‘response to x’ to part_of relationships, so that the extension can then use part_of with the ‘response to x’ GO term.

Gene | GO ID (col 5) | Annotation Extension (col 16)
peptidase | GO:0006508 proteolysis | has_direct_input(UNIPROT:Qxxx protein A), part_of(GO:0071396 cellular response to lipid)
peptidase | GO:0071396 cellular response to lipid | has_direct_input(ChEBI:XXX cholesterol)
What relationship to use to specify the gene which is a target of a transcription factor?
DNA binding transcription factor A binds the promoter of Polo2 and increases transcription of Polo2

Gene | GO ID (col 5) | Annotation Extension (col 16)
TF A | GO:0044212 transcription regulatory region DNA binding | has_direct_input(UNIPROT:Qxxx Polo2)
TF A | GO:0045944 positive regulation of transcription from RNA polymerase II promoter | has_regulation_target(UNIPROT:Qxxx Polo2)
TF A | GO:0001228 RNA polymerase II transcription regulatory region sequence-specific DNA binding transcription factor activity involved in positive regulation of transcription | What ‘target’ is included here?

• For the DNA binding annotation, C16 could instead have a feature ID to specify the motif, eg SO ID or Ensembl gene ID
• For the regulation annotation, C16 could instead have the Ensembl gene ID. Would it be appropriate to have a feature ID to specify the motif, eg SO ID?
• GO:0001228 potentially has 3 aspects: DNA binding/TF activity/regulation of transcription. So although it is ‘obvious’ that Polo2 is the target, what relationship should be used?
More specific relationship terms?
has_input
--has_direct_input
----binds
----has_substrate
----transports
has_regulation_target
----has_direct_regulation_target
has_output
--------transports
--------has_product
Has_regulation_target
Domain: biological regulation
• It seems redundant to have a regulation GO term with 'regulation' in the annotation extension relationship
Agreed to:
• Continue to use the relationship has_regulation_target when extending 'regulation of BP' GO terms
• Extension of MF GO terms such as endopeptidase inhibitor activity should use the relationship 'has_direct_input'
– The protein included in the annotation extension should be known to bind the protein annotated as an inhibitor.
• has_regulation_target should not be used to specify a downstream process regulated by a signaling pathway. Possibly use 'causally_upstream_of' instead.
localization_dependent_on
• This is suitable for BP annotations where A is localizing B, but it shouldn't be used for CC annotations.
• This needs some further discussion, as this relation is currently only allowed when annotating to CC.
• We will also need to discuss in_presence_of and dependent_on (and maybe also requires_substance) at the same time.
Occurs_at
• Often redundant with occurs_in
Conclusion: We'll use occurs_in and occurs_at in the following ways, and redefine the relationships:
• OCCURS_IN: All parts of the process are contained within the entity (CL, UBERON, GO-CC).
• OCCURS_AT: Adjacent to or in the vicinity of the entity (SO or GO-CC).
• NB: because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations
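The range part of this conclusion could be turned into a simple check. The sketch below assumes the caller already knows which resource the C16 ID comes from and passes it as a label ("CL", "UBERON", "GO-CC" or "SO"); `ALLOWED_RANGES` and `range_is_allowed` are hypothetical names, and the table only encodes the two relationships discussed here.

```python
# Ranges as stated in the conclusion above; other relationships are not covered.
ALLOWED_RANGES = {
    "occurs_in": {"CL", "UBERON", "GO-CC"},
    "occurs_at": {"SO", "GO-CC"},
}


def range_is_allowed(relation, namespace):
    """True if an ID from `namespace` may be used with `relation` in C16.

    Relationships missing from ALLOWED_RANGES are reported as not allowed,
    because this sketch has no rule for them.
    """
    return namespace in ALLOWED_RANGES.get(relation, set())


print(range_is_allowed("occurs_in", "CL"))   # cell type as range of occurs_in
print(range_is_allowed("occurs_at", "CL"))   # cell type as range of occurs_at
```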
Has_output - proposal
• Restrict has_output to BP only
• For MFs this would be appropriate only where a catalytic activity can create >1 choice of output
• If you find you need to use this for MF either:
– bring up your example at annotation call
– request a new GO term
• Annotation call disagreed with this suggestion
Part_of
Identifies the cell, tissue, anatomical entity, biological process or developmental stage in which the molecular function or biological process occurs or the cellular component exists
• The extracellular terms need a bit of work because there are some annotations at the moment to ‘extracellular matrix’ part_of x_cell, whereas logically you can’t have an extracellular space that is part of a cell.
• Allow use of the RO relation ‘adjacent_to’ for annotation extensions for CC extracellular annotations. When this is done, MGI will need to relook at their ‘extracellular space’ part_of ‘x-cell’ annotations.
• Add a restriction to prevent a part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16 (not possible). Should use happens_during instead.
During, happens_during, exists_during
• Agreed to remove ‘During’
– 150 annotations need revision
• Happens_during (BP/MF only)
– Identifies a process or life stage during which a molecular function or biological process occurs
– Rule now in place to prevent MF terms in C16 being used with this relationship
– To add a PHASE to the annotation extension field of a biological process, the ONLY relationship to use is ‘happens_during’ (not part_of)
– Also use it to add a biological process to the annotation extension field when the role of the gene product in that process is unclear, but its activity occurs during that specific biological process. Consider using part_of instead if it is known that the annotated process is definitely part of the biological process.
• Exists_during (CC only)
– Identifies a process or life stage during which a cellular component is present
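These rules lend themselves to an automated QC. A minimal sketch follows, assuming each annotation is reduced to the relationship, the aspect of the GO term in column 5 ("BP", "MF" or "CC") and a flag saying whether the C16 ID is a phase term; the function name and its parameters are hypothetical.

```python
def check_during_relation(relation, annotated_aspect, target_is_phase=False):
    """Return a list of problems for one annotation extension (empty if none)."""
    problems = []
    if relation == "during":
        problems.append("'during' was removed; use happens_during or exists_during")
    elif relation == "happens_during" and annotated_aspect not in {"BP", "MF"}:
        problems.append("happens_during applies to BP/MF annotations only")
    elif relation == "exists_during" and annotated_aspect != "CC":
        problems.append("exists_during applies to CC annotations only")
    elif relation == "part_of" and annotated_aspect == "BP" and target_is_phase:
        problems.append("use happens_during, not part_of, to add a phase to a BP")
    return problems


# A BP annotation that tries to attach a phase with part_of is flagged.
print(check_during_relation("part_of", "BP", target_is_phase=True))
```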
Errors in use of relationship
• Annotations need revision:
– Obsolete relationships: eg has_participant, has_downstream_target
– Invalid domain: eg stabilizes, has_regulation_target, has_input
– Common error: trying to imply too much in one annotation
• eg: regulation of signaling pathway has_input protein ID
• Problem relationships need discussion:
– Known problems:
• dependent on/in presence of/activated by/inhibited by
– Overlap of options for which relationship to use
• transcription
– Unknown problems:
• Relationships where there is no documentation
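Finding annotations that still use obsolete relationships is the easiest of these checks to automate. A minimal sketch, assuming column 16 values are available as plain strings in the `relation(ID)` notation used in these slides; the set below contains only the two obsolete relationships named above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Only the obsolete relationships named in the slide; not a complete list.
OBSOLETE_RELATIONS = {"has_participant", "has_downstream_target"}

_RELATION = re.compile(r"(\w+)\s*\(")


def obsolete_relation_counts(col16_values):
    """Count uses of obsolete relationships across column 16 values."""
    counts = Counter()
    for value in col16_values:
        for relation in _RELATION.findall(value):
            if relation in OBSOLETE_RELATIONS:
                counts[relation] += 1
    return counts


# Made-up column 16 values, using the slides' placeholder ID style.
example_values = [
    "has_participant(UNIPROT:Qxxx)",
    "occurs_in(CL:0000540)",
    "has_participant(UNIPROT:Qyyy),part_of(GO:0071396)",
]
counts = obsolete_relation_counts(example_values)
print(dict(counts))
```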
Future work required
• Provide examples for all relationships
– Ideally covering all domains the relationship can be applied to
– Time scale for this?
– Create QCs where possible
• Tool required similar to TermGenie
– Ideally creating an ontology graph showing parent relationships for the annotation
– To check whether the relationship is appropriate
• Filter on browsers to enable filtering via relationship
• Are there any relationships that could be used to automatically generate annotations?
– Eg kinase activity part_of Wnt signaling pathway
• GOC create Wnt signaling pathway annotation
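The kinase/Wnt idea could work roughly as below. This is a sketch of the proposal only, not an agreed procedure: it assumes annotations are available as (gene, GO ID, aspect, extensions) tuples, that the caller supplies the set of GO IDs known to be biological processes, and it ignores evidence codes and references. The gene name is a placeholder.

```python
def derive_process_annotations(annotations, process_ids):
    """Derive (gene, BP GO ID) pairs from MF annotations extended with part_of.

    annotations: iterable of (gene, go_id, aspect, extensions), where
                 extensions is a list of (relation, target_id) pairs.
    process_ids: set of GO IDs known to be biological processes.
    """
    derived = []
    for gene, _go_id, aspect, extensions in annotations:
        if aspect != "MF":
            continue
        for relation, target in extensions:
            if relation == "part_of" and target in process_ids:
                derived.append((gene, target))
    return derived


annotations = [
    # kinase activity part_of Wnt signaling pathway
    ("GeneA", "GO:0016301", "MF", [("part_of", "GO:0016055")]),
    # a BP annotation: nothing is derived from it
    ("GeneA", "GO:0030154", "BP", [("occurs_in", "CL:0000540")]),
]
derived = derive_process_annotations(annotations, process_ids={"GO:0016055"})
print(derived)
```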