Annotation extension meeting summary
• Held in Hinxton, June 2014 (full day)
• Editors, Berkeley, EBI, PomBase, PAINT, UCL
• Considerable work undertaken before and after meeting by all attendees: Jane Lomax, David Osumi-Sutherland, Ruth Lovering, Rebecca Foulger, Pascale Gaudet, Aleks Shypitsyna, Rachael Huntley, Valerie Wood, Chris Mungall

Annotation extension meeting summary
• Meeting took place to discuss
  – Completion of wiki page examples
  – Inconsistent approaches to the use of relationships
• 2 wiki pages summarise meeting
  – http://wiki.geneontology.org/index.php/Annotation_Extension_Relation_Documentation_Jamboree
  – http://wiki.geneontology.org/index.php/Annotation_Extension_meeting_2014-06-16

Annotation Extension Key Points
1. .. if users of our files in their current state do not make any changes to their procedures, they will be unaffected by the addition of annotation extensions
2. Folding creates a new ontology term on the fly, with each extended annotation materializing as a new GO term. An OWL reasoner is used to automatically construct the graph in this new ontology.

Folded/unfolded | Gene name (col 2) | GO ID/term (col 5) | Annotation extension (col 16)
Unfolded* | MGI:2448712 Ren | GO:0030154 cell differentiation | results_in_the_formation_of(CL:0000540 neuron)
Folded | MGI:2448712 Ren | New GO ID: neuron cell differentiation (is_a parent of new GO term: GO:0030154 cell differentiation) |
*Existing annotation

3.
Unfolding replaces existing highly specific GO terms with a more basic GO term, with the specific information now expressed in the extension field.

Folded/unfolded | Gene name (col 2) | GO ID/term (col 5) | Annotation extension (col 16)
Unfolded* | MGI:2448712 Ren | GO:0030154 cell differentiation | results_in_the_formation_of(CL:0000540 neuron)
Folded | MGI:2448712 Ren | New GO ID: neuron cell differentiation (is_a parent of new GO term: GO:0030154 cell differentiation) |
“Folded”* | Q9H9Q4 NHEJ1 | GO:0030183 B cell differentiation |
“Unfolded” | Q9H9Q4 NHEJ1 | GO:0030154 cell differentiation | results_in_the_formation_of(CL:0000236 B-cell)
*Existing annotation

QuickGO C16 relationship graph
Rachael Huntley (GOA, EBI), Tony Sawford (GOA, EBI), Eugene Kulesha (Ensembl, EBI)
• Relationship included in column 16
• Domain is the GO term in column 5
• Range is the ID in column 16
• Specific rules on how relationships are used

Gene name (col 2) | GO ID/term (col 5) | Annotation extension (col 16)
MGI:2448712 Ren | GO:0030154 cell differentiation | occurs_in(CL:0000540 neuron)

QuickGO C16 relationship graph
Examples in ‘occurs_in’ wiki

Unfolded:
Gene | GO ID (col 5) | Annotation extension (col 16)
CASQ2 | GO:0051208 sequestering of calcium ion | occurs_in(GO:0016529 sarcoplasmic reticulum), occurs_in(CL:0000746 cardiac muscle cell)

Folded:
Gene | NEW GO term | Parent terms of new GO term
CASQ2 | sequestering of calcium ion in sarcoplasmic reticulum | is_a GO:0051208 sequestering of calcium ion
CASQ2 | sequestering of calcium ion in cardiac muscle cell | is_a GO:0051208 sequestering of calcium ion
CASQ2 | sequestering of calcium ion in sarcoplasmic reticulum of cardiac muscle cell | is_a GO:0051208 sequestering of calcium ion, is_a New GO term sequestering of calcium ion in sarcoplasmic reticulum, is_a New GO term sequestering of calcium ion in cardiac muscle cell

Annotation using annotation extensions (C16)
Tony Sawford at EBI
• 32 relationships
• Relationship
filter in Protein2GO
  – Limits the choice of relationships
  – Based on the domain and ranges in the annotation
• The relationships need to be consistently applied because the OWL reasoner is used to fold the terms
• Potential for a curator to create an annotation which is folded differently from how they had anticipated
• Many annotation extensions are not folded at all
• Inappropriate IDs present in C16

Relationships covered by meeting
• Has_input
• Has_direct_input
• Has_regulation_target
• Has_output
• Localization_dependent_on
• Occurs_at
• Occurs_in
• During - removed
• Happens_during
• Exists_during
• Part_of
• Causally_upstream_of (documentation after meeting)

Relationships covered by meeting
In addition:
• Change in response to ontology domain: is_a children to part_of
• Proposal to increase number of relationships
• Encourage curators who are new to annotation extensions to start with the following relations:
  – part_of
  – occurs_in
• Identified areas where the annotation extension is difficult to use due to the complexity of the GO term being extended

Completion of wiki page examples
• 32 relationships

# relationships | 8 | 4 | 9 | 11
Wiki page | - | ✔ | ✔ | ✔
Usage example | - | - | ✔ | ✔
Folding example | - | - | - | ✔

• 21 relationships need documentation completed
• Some of the ‘completed’ examples need further discussion

Has_input and has_direct_input
Lots of issues arose here:
1. Use of has_direct_input only when PPI confirmed or predicted based on ortholog PPIs
2. Use of has_direct_input with MF not BP
3. Use of has_input or in_presence_of to specify the chemical in the BP ‘response_to_chemical’
4. What relationship to use to specify the gene which is a target of a transcription factor?
5. Would more relationships in this area be useful?

Use of has_direct_input only when PPI confirmed or predicted based on ortholog PPIs
• Unanimous agreement on this
• Agreed during annotation calls
• QC to be applied?
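The relationship filter and the QC question above both depend on reading column 16 as structured data. The following is a minimal sketch, not the Protein2GO implementation: the names `Extension`, `parse_c16`, `ALLOWED_RANGE_PREFIXES` and `range_problems` are hypothetical, and the range table only covers occurs_in (CL, UBERON, GO-CC) and occurs_at (SO or GO-CC), the two ranges stated in the meeting conclusions. A prefix check alone cannot tell a GO-CC term from a BP or MF term, so that part would need an ontology lookup.

```python
import re
from typing import List, NamedTuple


class Extension(NamedTuple):
    relation: str
    target_id: str


# relation(ID optional label), e.g. occurs_in(CL:0000540 neuron)
_EXT = re.compile(r"([A-Za-z_]+)\s*\(\s*([^\s)]+)(?:\s+[^)]*)?\)")

# Assumption: ID prefixes accepted as the range of each relation, taken from
# the occurs_in / occurs_at conclusions in this summary. Illustrative only.
ALLOWED_RANGE_PREFIXES = {
    "occurs_in": {"CL", "UBERON", "GO"},
    "occurs_at": {"SO", "GO"},
}


def parse_c16(field: str) -> List[Extension]:
    """Extract (relation, ID) pairs from a column 16 string, dropping labels."""
    return [Extension(m.group(1), m.group(2)) for m in _EXT.finditer(field)]


def range_problems(field: str) -> List[str]:
    """Report extensions whose ID prefix is outside the relation's range."""
    problems = []
    for ext in parse_c16(field):
        allowed = ALLOWED_RANGE_PREFIXES.get(ext.relation)
        prefix = ext.target_id.split(":", 1)[0]
        # Relations missing from the table are not checked in this sketch.
        if allowed is not None and prefix not in allowed:
            problems.append(f"{ext.relation}({ext.target_id}): {prefix} not in range")
    return problems


casq2 = ("occurs_in(GO:0016529 sarcoplasmic reticulum), "
         "occurs_in(CL:0000746 cardiac muscle cell)")
print(parse_c16(casq2))
print(range_problems(casq2))
```

The CASQ2 string is the ‘occurs_in’ wiki example from the slides; any relation not listed in the table passes unchecked, which is a deliberate simplification.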
Use of has_direct_input with MF not BP
• Not complete agreement
In support:
  – BPs are multistep processes so it is not appropriate to use has_direct_input
Against:
  – Some BPs are single step processes, eg phosphorylation/methylation etc, therefore use of has_direct_input identifies the ‘direct’ target
  – Looks odd to have:

Gene name (col 2) | GO ID (col 5) | Annotation extension (col 16)
Endopeptidase A | GO:0004175 endopeptidase activity | has_direct_input(substrate of endopeptidase A)
Endopeptidase A | GO:0006508 proteolysis | has_input(substrate of endopeptidase A)

Note: a multistep process such as 'negative regulation of intrinsic apoptotic signaling pathway' should not specify a protein using the relationship has_input

Use of has_input or in_presence_of to specify the chemical in the BP ‘response_to_chemical’
Not complete agreement
• Has_input: Identifies an entity affected by (bound, transported, modified, consumed or destroyed), or a cellular response process involved in the gene product's participation in a molecular function or biological process.
  – Therefore the ID in C16 can either be changed or can stimulate a cellular response
• In_presence_of: Identifies a chemical, gene product or complex in the presence of which an ontology term is observed to apply to the annotated gene product.
  – This seems to provide a more consistent usage of the relationship
  – We didn’t have time to fully discuss this RO

Use of has_input or in_presence_of to specify the chemical in the BP ‘response_to_chemical’
• Has_input: Identifies an entity affected by (bound, transported, modified, consumed or destroyed), or a cellular response process involved in the gene product's participation in a molecular function or biological process.
  – Therefore the ID in C16 can either be changed or can stimulate a cellular response
• In_presence_of: Identifies a chemical, gene product or complex in the presence of which an ontology term is observed to apply to the annotated gene product.
  – This seems to provide a more consistent usage of the relationship
It was agreed that ‘has_input’ was a better relationship to use, but we didn’t have time to discuss RO ‘in_presence_of’

has_input and 'response to'
Example: ‘proteolysis [involved] in cellular response to drug’
• Two has_input relationships:
  – has_input: drug
  – has_input: proteolysis target
• The drug isn’t an input to the proteolysis.
• The proteolysis is part of the cellular response to drug.
• Decided to change the is_a child terms of ‘response to x’ to part_of relationships, then we can use the GO term part_of ‘response to x’ in the extension.

Gene | GO ID (col 5) | Annotation extension (col 16)
peptidase | GO:0006508 proteolysis | has_direct_input(UNIPROT:Qxxx protein A), part_of(GO:0071396 cellular response to lipid)
peptidase | GO:0071396 cellular response to lipid | has_direct_input(ChEBI:XXX cholesterol)

What relationship to use to specify the gene which is a target of a transcription factor?
DNA binding transcription factor A binds the promoter of Polo2 and increases transcription of Polo2

Gene | GO ID (col 5) | Annotation extension (col 16)
TF A | GO:0044212 transcription regulatory region DNA binding | has_direct_input(UNIPROT:Qxxx Polo2), OR C16 could have feature ID to specify the motif, eg SO ID or Ensembl gene ID
TF A | GO:0045944 positive regulation of transcription from RNA polymerase II promoter | has_regulation_target(UNIPROT:Qxxx Polo2), or C16 could have Ensembl gene ID. Would it be appropriate to have feature ID to specify the motif, eg SO ID?

What ‘target’ is included here?
TF A: GO:0001228 RNA polymerase II transcription regulatory region sequence-specific DNA
binding transcription factor activity involved in positive regulation of transcription
• Potentially 3 aspects to this term: DNA binding/TF activity/regulation of transcription
• So although it is ‘obvious’ that Polo2 is the target, what relationship should be used?

More specific relationship terms?
has_input
--has_direct_input
----binds
----has_substrate
----transports
has_regulation_target
----has_direct_regulation_target
has_output
--------transports
--------has_product

Has_regulation_target
Domain: biological regulation
• It seems redundant to have a regulation GO term with 'regulation' in the annotation extension relationship
Agreed to:
• Continue to use the relationship has_regulation_target when extending 'regulation of BP' GO terms
• Extension of the MF GO terms such as endopeptidase inhibitor activity should use the relationship 'has_direct_input'
  – The protein included in the annotation extension should be known to bind the protein annotated as an inhibitor.
• has_regulation_target should not be used to specify a downstream process regulated by a signaling pathway. Possibly instead use 'causally_upstream_of'

localization_dependent_on
• This is suitable for BP annotations where A is localizing B, but it shouldn't be used for CC annotations.
• This needs some further discussion as this relation is currently only allowed when annotating to CC.
• We will also need to discuss in_presence_of and dependent_on (and maybe also requires_substance) at the same time.

Occurs_at
• Often redundant with occurs_in
Conclusion: We'll use occurs_in and occurs_at in the following ways, and redefine the relationships:
• OCCURS_IN: All the parts of the process are contained within (CL, UBERON, GO-CC).
• OCCURS_AT: Adjacent to or in the vicinity of
(SO or GO-CC)
• NB, because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations

Has_output - proposal
• Restrict has_output to BP only
• For MFs this would be appropriate only where a catalytic activity can create >1 choice of output
• If you find you need to use this for MF either:
  – bring up your example at annotation call
  – request a new GO term
• Annotation call disagreed with this suggestion

Part_of
Identifies the cell, tissue, anatomical entity, biological process or developmental stage in which the molecular function or biological process occurs or the cellular component exists
• The extracellular terms need a bit of work because there are some annotations at the moment to ‘extracellular matrix’ part_of x_cell, where logically you can’t have an extracellular space that is part of a cell.
• Allow use of the RO relation ‘adjacent_to’ for annotation extensions for CC extracellular annotations. When this is done, MGI will need to relook at their ‘extracellular space’ part_of ‘x-cell’ annotations.
• Add a restriction to prevent a part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16 (not possible). Should use happens_during

During, happens_during, exists_during
• Agreed to remove ‘During’
  – 150 annotations need revision
• Happens_during (BP/MF only)
  – Identifies a process or life stage during which a molecular function or biological process occurs
  – Rule now in place to prevent C16 MF terms using this relationship
  – To add a PHASE to the annotation extension field of a biological process, the ONLY relationship to use is ‘happens_during’ (not part_of)
  – Use it to add a biological process to the annotation extension field when the role of the gene product in that process is unclear, but the annotated process occurs during it. Consider the use of the relationship part_of if it is known that the process is definitely part of the biological process.
• Exists_during (CC only)
  – Identifies a process or life stage during which a cellular component is present

Errors in use of relationship
• Annotations need revision:
  – Obsolete relationships: eg has_participant, has_downstream_target
  – Invalid domain, eg: stabilizes, has_regulation_target, has_input
  – Common error: trying to imply too much in one annotation
    • eg: regulation of signaling pathway has_input protein ID
• Problem relationships need discussion:
  – Known problems:
    • dependent on/in presence of/activated by/inhibited by
  – Overlap of options for which relationship to use
    • transcription
  – Unknown problems:
    • Relationships where there is no documentation

Future work required
• Provide examples for all relationships
  – Ideally covering all domains the relationship can be applied to
  – Time scale for this?
  – Create QCs where possible
• Tool required similar to TermGenie
  – Ideally creating an ontology graph showing parent relationships for the annotation
  – To check whether the relationship is appropriate
• Filter on browsers to enable filtering via relationship
• Are there any relationships that could be used to automatically generate annotations?
  – Eg kinase activity part_of wnt signaling pathway
    • GOC create wnt signaling pathway annotation
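As an illustration of the "create QCs where possible" item, here is a minimal sketch of three checks named in this summary: obsolete relationships, the BP/MF versus CC domain split for happens_during and exists_during, and the ban on part_of with a cell cycle phase. It is not an existing GOC check. The function name `qc_extension` and the three tables are hypothetical, and `PHASE_IDS` holds only GO:0022403 itself, where a real check would also need its descendants from the ontology.

```python
from typing import List

# Relations this summary lists as removed or obsolete.
OBSOLETE_RELATIONS = {"during", "has_participant", "has_downstream_target"}

# GAF aspect codes (P = process, F = function, C = component) allowed as the
# domain of each relation, following "BP/MF only" and "CC only" above.
ALLOWED_ASPECTS = {
    "happens_during": {"P", "F"},
    "exists_during": {"C"},
}

# Assumption: stand-in for "cell cycle phase" and all of its descendants.
PHASE_IDS = {"GO:0022403"}


def qc_extension(aspect: str, relation: str, target_id: str) -> List[str]:
    """Return QC messages for one extension on an annotation of the given aspect."""
    issues = []
    rel = relation.lower()
    if rel in OBSOLETE_RELATIONS:
        issues.append(f"{relation} is obsolete and needs revision")
    allowed = ALLOWED_ASPECTS.get(rel)
    if allowed is not None and aspect not in allowed:
        issues.append(f"{relation} is not valid for aspect {aspect}")
    if rel == "part_of" and target_id in PHASE_IDS:
        issues.append("use happens_during, not part_of, for a phase")
    return issues


print(qc_extension("P", "part_of", "GO:0022403"))
print(qc_extension("C", "happens_during", "GO:0022403"))
```

Keeping each rule as a small table entry means a newly documented relationship can be added without touching the checking logic.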