Curation bottleneck

Moving beyond free text
Authors
Old Paradigm:
Scientist does research
Scientist publishes research results in journal article
Want:
All genes involved in seed development
(name, species, protein sequence)
Read 3,404 articles???
Read 592,000 articles???
Old Paradigm - extended:
Scientist does research
Scientist publishes research results as free text
manual curation (+ NLP…?)
Results extracted from free text and converted to a
structured format (ontology annotations)
Database
Structured data combined with other data for queries,
further analysis
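To illustrate why the structured format matters, here is a minimal sketch of the query from the "Want" slide, run over annotation records instead of articles. The `GeneAnnotation` schema, the `genes_for_process` helper, and the records themselves are hypothetical placeholders, not data or code from the talk.

```python
from typing import NamedTuple


class GeneAnnotation(NamedTuple):
    """One structured record extracted from an article (hypothetical schema)."""
    gene_name: str
    species: str
    protein_sequence: str
    process: str  # label of the ontology term for the biological process


def genes_for_process(records, process):
    """Return (name, species, protein sequence) for genes annotated to `process`."""
    return [
        (r.gene_name, r.species, r.protein_sequence)
        for r in records
        if r.process == process
    ]


# Placeholder records, not real data.
EXAMPLE_RECORDS = [
    GeneAnnotation("gene_A", "species_X", "<sequence A>", "seed development"),
    GeneAnnotation("gene_B", "species_Y", "<sequence B>", "leaf development"),
    GeneAnnotation("gene_C", "species_X", "<sequence C>", "seed development"),
]

for row in genes_for_process(EXAMPLE_RECORDS, "seed development"):
    print(row)
```

Once results are stored this way, the question becomes a filter over records rather than a reading list of thousands of articles.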
Example –
Journal article about
gene function
The goal: an annotation
that captures the result
Manual curation:
Time consuming,
does not scale well
NLP:
Very challenging
Example – phylogenetic treatment
Relatively high degree of structure compared to a journal article
May be more amenable to natural language processing, but still very
challenging: the information is complex
http://www.mobot.org/mobot/research/apweb/welcome.html
Scientist does research
Scientist publishes research results as free text
manual curation (+ NLP)
Can we get authors involved?
Results extracted from free text and converted to a
structured format (ontology annotations)
Database
Structured data combined with other data for queries,
further analysis
Scientific Publishers are interested in this problem…
Link to external resource
Science Direct: http://www.sciencedirect.com/science/article/pii/S0378111910001502
Databases are interested in this problem…
What if we had a good general tool
for authors to do this themselves?
Example:
Morphological description of species
http://herbarium.usu.edu/webmanual/
Example:
Mutant phenotype description
PO:0025034 (leaf), PATO:0000599 (decreased width)
PO:0009010 (seed), PATO:0001997 (reduced)
PO:0020003 (ovule), PATO:0000460 (abnormal)
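Each line above pairs an entity term (a plant structure, PO) with a quality term (PATO). A minimal sketch of how such pairs might be held as structured data is given below; the `EQAnnotation` class and `parse_annotation` helper are illustrative assumptions, not part of any tool mentioned in the talk, and the term IDs and labels are copied from the slide as-is.

```python
import re
from dataclasses import dataclass

# Matches "PREFIX:digits (label)", e.g. "PO:0025034 (leaf)".
TERM = re.compile(r"([A-Za-z]+:\d+)\s*\(([^)]*)\)")


@dataclass(frozen=True)
class EQAnnotation:
    """Entity-quality pair (hypothetical structure for illustration)."""
    entity_id: str
    entity_label: str
    quality_id: str
    quality_label: str

    def as_text(self) -> str:
        return (f"{self.entity_id} ({self.entity_label}), "
                f"{self.quality_id} ({self.quality_label})")


def parse_annotation(line: str) -> EQAnnotation:
    """Parse one 'entity, quality' line in the format shown on the slide."""
    terms = TERM.findall(line)
    if len(terms) != 2:
        raise ValueError(f"expected two ontology terms, got {len(terms)}: {line!r}")
    (entity_id, entity_label), (quality_id, quality_label) = terms
    return EQAnnotation(entity_id, entity_label, quality_id, quality_label)


SLIDE_LINES = [
    "PO:0025034 (leaf), PATO:0000599 (decreased width)",
    "PO:0009010 (seed), PATO:0001997 (reduced)",
    "PO:0020003 (ovule), PATO:0000460 (abnormal)",
]

annotations = [parse_annotation(line) for line in SLIDE_LINES]
for a in annotations:
    print(a.entity_label, "->", a.quality_label)
```

Keeping the term ID alongside the human-readable label lets a database match on the ID while the author still sees familiar wording.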
New Paradigm:
Scientist does research
Scientist publishes research results as free text
and as annotations using ontology terms
Benefit to scientist – wider exposure and reuse of results
Benefit to publishers – tagged text allows enhanced
presentation for subscribers
Benefit to research community – Better access to data