PattArAn – From Annotation Triplets to Sentence Fingerprints
The PattArAn Team at the University of Maryland,
the University of Iowa, and St. Bonaventure University
Motivation
• Scientific concepts are annotated with controlled vocabulary (CV) terms
from ontologies such as Gene Ontology (GO) and Plant Ontology (PO).
• Our Arabidopsis-specific tool, Patterns in Arabidopsis Annotation
(PattArAn), will focus on creating patterns from the annotation knowledge in
(gene, GO, PO) triplets and on validating triplets against the scientific literature.
• PattArAn will help scientists scour the literature, understand how it
connects to the annotation evidence and biological knowledge, and
develop hypotheses.
Gene-GO-PO Triplets
• Each triplet is a combination of GO and PO terms centered on a gene.
• Documents supporting the annotations are identified and collected.
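As a minimal sketch of the data involved, a (gene, GO, PO) triplet can be split into its three doublets, and a sentence can be checked for the doublets it mentions. The class name `Triplet`, the helper `doublets_in_sentence`, the naive substring matching, the term labels, and the example sentence are all illustrative assumptions, not PattArAn's actual implementation or data.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for one (gene, GO, PO) annotation triplet."""
    gene: str
    go_term: str
    po_term: str

    def doublets(self):
        """Return the three pairs: (gene, GO), (gene, PO), (GO, PO)."""
        return list(combinations((self.gene, self.go_term, self.po_term), 2))


def doublets_in_sentence(triplet, sentence):
    """Return the doublets whose two members both occur in the sentence.

    Assumption: naive case-insensitive substring matching. Real text
    would need synonym handling and term normalization.
    """
    text = sentence.lower()
    return [
        pair for pair in triplet.doublets()
        if all(term.lower() in text for term in pair)
    ]


# Term labels and sentence are made up for illustration only.
example = Triplet(gene="TSO2", go_term="cell division", po_term="flower")
sentence = "TSO2 is required for normal cell division in developing tissues."
matched = doublets_in_sentence(example, sentence)
```

Under these assumptions, a sentence that yields all three doublets mentions the whole triplet, while a sentence that yields only one contributes partial evidence.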
Goals:
(1) Explore new research ideas in three areas of interest using PattArAn.
(2) Build a gold standard dataset using manual annotation of triplet
fingerprints.
• Area 1: regulation of flower and fruit development by genes and
signal pathways (e.g., genes TSO1, TSO2, MSI1).
• Area 2: signal transduction of the plant hormone ethylene
(e.g., genes ETR1, ERS1, ETR2).
• Area 3: integration of metabolite transporters with plant growth,
development and survival (e.g., genes AtCHX17, AtNHX1, AtKEA2).
Document Annotation Guidelines
Observations
                                                           Area 1  Area 2  Area 3
# triplets in document set (8 documents)                       32      14      14
Found In Full-Text:
# triplets w/ at least 1 sentence                               1      11       6
# triplets w/ all 3 doublets in at least 1 sentence each        0       1       0
# triplets w/ only 2 doublets in at least 1 sentence           24      57       5
# triplets w/ only 1 doublet in at least 1 sentence            51      58      54
# triplets found                                               31       3       8
# doublets found                                                8      34      69
Found In Supplementary Data:
• Annotations: Triplets are represented by sentences to varying degrees.
The supplementary material is quite rich. Doublets have the most potential.
• Knowledge Underlying Triplets: The annotations of document
16399800 explain a biological process of Arabidopsis thaliana well. The
TSO2 gene relates to cell division by controlling dNTP balance. All
annotating GO terms are linked through the function of TSO2. TSO2 is also
expressed in the organs named by the PO terms. Thus, this paper nicely
links the PO terms and GO terms.
• Cross-document inference: Document 9880378 indicates that the
redox gene AtCB5-D is expressed at varying levels across plant tissues.
Document 17028151 indicates that upon infection with Pseudomonas
syringae, its expression levels drop significantly in Arabidopsis leaves. This
process is one aspect of a complex, genome-wide response to bacterial
infection involving many genes.
• Inferred Triplet: Using doublets in document 18305484, we may
infer that: “The plasma membrane protein SLAC1 is essential for stomatal
closure in response to CO2, abscisic acid, ozone, light/dark transitions,
humidity change, calcium ions, hydrogen peroxide and nitric oxide.” This
is interesting because it describes a single protein that is involved in
responses to many different environmental signals.
Summary
Using our triplets, we could identify connections between a specific area and
other fields in biology in under four weeks. It was also interesting to see how
biologists’ genes of interest may function in concert to influence different
bioprocesses. This serves well as the beginning of an exploration that
may eventually lead to new hypotheses and discoveries.
Future Work
• Check inter-annotator agreement.
• Extract gene interaction sentences in the context of our annotation
triplets.
• Develop algorithms to rank sentences by importance with this gold
standard data.
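
One common way to check agreement between two annotators is Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance. The sketch below is an assumption about how such a check might look; the poster does not state which agreement measure will be used, and the function name `cohen_kappa` and the label values are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: each annotator assigns exactly one categorical label
    (e.g. "supports" / "does not support") per sentence.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for four sentences, for illustration only.
annotator_1 = ["yes", "yes", "no", "no"]
annotator_2 = ["yes", "no", "no", "no"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A kappa near 1 indicates strong agreement, while a value near 0 indicates agreement no better than chance.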