Ontology Extraction from
Unstructured Text in the
Biodiversity Domain
Daniele Palazzi Krempser1,2, Ana Maria de C. Moura1,
Maria Luiza Machado Campos2, Luiz Gadelha1
1 Brazilian Biodiversity Information System (SiBBr)
National Laboratory of Scientific Computing (LNCC)
Petrópolis – RJ – Brazil
2 Graduate Program in Informatics (PPGI)
Federal University of Rio de Janeiro (UFRJ)
Rio de Janeiro – RJ – Brazil
{dpalazzi,anamoura,lgadelha}@lncc.br, [email protected]
Motivation
● The increased availability of high-capacity sensors in various scientific domains is causing an exponential growth in the amount of scientific data generated
● SiBBr – Sistema de Informação sobre a Biodiversidade Brasileira (Brazilian Biodiversity Information System)
  – Integration with species lists
  – Integration with repositories for publications about biodiversity (e.g. SciELO)
  – Implementation of Semantic Web technologies
    ● Controlled vocabularies
    ● Ontologies
    ● Linked Open Data
    ● …
Ontology Learning
● Information Retrieval (IR) is the task of fulfilling a demand for information from a collection of unstructured documents, mainly text
● Ontologies are frequently seen as an answer to the problem of semantic interoperability in current information systems
Ontology Learning =
Information Retrieval + Ontologies
Stages
TEXT → Tokenization (NLTK)
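As a rough illustration of the tokenization stage, the sketch below splits text into word tokens with a regular expression. The slides name NLTK for this step; the regex is only a stand-in so that the example runs without NLTK or its data files, and the function name `tokenize` is an assumption made for illustration.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word tokens; hyphens and apostrophes stay inside a word."""
    # Assumption: a simple regex replaces the NLTK tokenizer named on the slide.
    return re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text)

sample = "Biodiversity also means the number, or abundance of different species."
tokens = tokenize(sample)
print(tokens)
```

With NLTK installed and its tokenizer data downloaded, `nltk.word_tokenize(sample)` would play the same role, although it also returns punctuation as separate tokens.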
Stages
TEXT → Tokenization → Remove Stop Words (the, is, at, which, on...)
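Stop-word removal can be sketched as a case-insensitive filter against a fixed word list. The list below starts from the five words shown on the slide; the remaining entries and the name `remove_stop_words` are assumptions added for illustration, not the list used by the authors.

```python
# First five entries come from the slide; the rest are illustrative assumptions.
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an", "and", "or", "of", "in", "also"}

def remove_stop_words(tokens: list[str], stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Keep only tokens whose lowercase form is not a stop word."""
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["Biodiversity", "also", "means", "the", "number", "or",
          "abundance", "of", "different", "species"]
content_tokens = remove_stop_words(tokens)
print(content_tokens)
```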
Stages
TEXT → Tokenization → Remove Stop Words → Named Entity Extraction (NLTK)
Vegetation NN
species NNS
living VBG
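The example output on this slide pairs each word with a Penn Treebank part-of-speech tag (NN, NNS, VBG). The slides use NLTK for this; as a self-contained stand-in, the sketch below uses a tiny dictionary-lookup tagger with a default tag. The lexicon contains the three slide examples plus two entries assumed for illustration, and the names `LEXICON` and `lookup_tag` are hypothetical.

```python
# Toy lexicon: the first three entries are the slide's examples,
# the last two are assumptions added for illustration.
LEXICON = {
    "vegetation": "NN",
    "species": "NNS",
    "living": "VBG",
    "inhabit": "VBP",
    "means": "VBZ",
}

def lookup_tag(tokens: list[str], lexicon: dict[str, str] = LEXICON,
               default: str = "NN") -> list[tuple[str, str]]:
    """Tag each token by dictionary lookup, falling back to a default tag."""
    return [(t, lexicon.get(t.lower(), default)) for t in tokens]

tagged = lookup_tag(["Vegetation", "species", "living"])
for word, tag in tagged:
    print(word, tag)
```

A trained tagger such as `nltk.pos_tag` uses context rather than a fixed lexicon, so its tags for ambiguous words can differ from this lookup.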
Stages
TEXT → Tokenization → Remove Stop Words → Named Entity Extraction → Acquisition of Synonyms (WordNet)
Vegetation {flora, botany}
Suggests {propose, suggest, advise}
Living {life}
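Synonym acquisition maps each term to a set of related words. The slides obtain these from WordNet; to keep the sketch runnable without the WordNet corpus, the table below simply hard-codes the three examples from the slide. The names `SYNONYMS`, `synonyms_of`, and `expand` are assumptions for illustration.

```python
# Hard-coded from the slide's examples; a real run would query WordNet instead.
SYNONYMS = {
    "vegetation": {"flora", "botany"},
    "suggests": {"propose", "suggest", "advise"},
    "living": {"life"},
}

def synonyms_of(term: str) -> list[str]:
    """Return the known synonyms of a term in alphabetical order (empty if unknown)."""
    return sorted(SYNONYMS.get(term.lower(), set()))

def expand(terms: list[str]) -> dict[str, list[str]]:
    """Map every term to its list of synonyms."""
    return {t: synonyms_of(t) for t in terms}

print(expand(["Vegetation", "Suggests", "Living"]))
```

With NLTK and its WordNet data available, the lemma names of `nltk.corpus.wordnet.synsets(term)` would be the natural source for such a table.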
Stages
TEXT → Tokenization → Remove Stop Words → Named Entity Extraction → Acquisition of Synonyms → Term Frequency
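The term-frequency stage can be sketched as a plain occurrence count over the remaining tokens. The slides do not say whether raw counts or a weighted scheme are used, so raw counts are an assumption here, as is the function name `term_frequency`.

```python
from collections import Counter

def term_frequency(tokens: list[str]) -> Counter:
    """Count how often each term occurs, ignoring case."""
    return Counter(t.lower() for t in tokens)

tokens = ["species", "plants", "animals", "Species", "region", "plants", "species"]
tf = term_frequency(tokens)
print(tf.most_common())
```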
Stages
TEXT → Tokenization → Remove Stop Words → Named Entity Extraction → Acquisition of Synonyms → Term Frequency → Sort and Select Terms
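Sorting and selecting terms can be sketched as ranking by frequency and keeping the terms above a cutoff. The slides do not state the selection criterion, so both the minimum count and the optional `top_k` limit below are assumptions, and `select_terms` is a hypothetical name.

```python
from typing import Optional

def select_terms(freqs: dict[str, int], min_count: int = 2,
                 top_k: Optional[int] = None) -> list[tuple[str, int]]:
    """Rank terms by descending frequency (ties alphabetical) and apply the cutoffs."""
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = [(term, count) for term, count in ranked if count >= min_count]
    return kept[:top_k] if top_k is not None else kept

freqs = {"species": 3, "plants": 2, "animals": 2, "region": 1}
print(select_terms(freqs))
print(select_terms(freqs, top_k=1))
```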
Stages
TEXT → Tokenization → Remove Stop Words → Named Entity Extraction → Acquisition of Synonyms → Term Frequency → Sort and Select Terms → Select Nouns / Select Verbs
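Selecting nouns and verbs can be sketched as a filter on the tag prefix, since Penn Treebank noun tags start with NN and verb tags start with VB. The tagged input below reuses the slide's three examples plus two pairs assumed for illustration; `select_by_tag` is a hypothetical helper name.

```python
def select_by_tag(tagged: list[tuple[str, str]], prefix: str) -> list[str]:
    """Return the words whose part-of-speech tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

tagged = [("Vegetation", "NN"), ("species", "NNS"), ("living", "VBG"),
          ("inhabit", "VBP"), ("different", "JJ")]
nouns = select_by_tag(tagged, "NN")  # candidate concepts
verbs = select_by_tag(tagged, "VB")  # candidate relations
print(nouns, verbs)
```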
Stages
TEXT → Tokenization → Remove Stop Words → Named Entity Extraction → Acquisition of Synonyms → Term Frequency → Sort and Select Terms → Select Nouns / Select Verbs → Extraction of Relations for Ontologies (ERO)
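The slides do not describe how ERO works internally. Purely as an illustration of what relation extraction over tagged text can look like, the sketch below uses a naive heuristic: whenever a verb appears between two nouns, it emits a (noun, verb, noun) triple. The heuristic, the function name `extract_triples`, and the sample input are all assumptions and should not be read as the ERO algorithm.

```python
def extract_triples(tagged: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Emit (subject, verb, object) for each noun-verb-noun sequence in order."""
    triples = []
    last_noun = None      # most recent noun seen
    pending_verb = None   # verb seen since that noun, if any
    for word, tag in tagged:
        if tag.startswith("NN"):
            if last_noun is not None and pending_verb is not None:
                triples.append((last_noun, pending_verb, word))
            last_noun = word
            pending_verb = None
        elif tag.startswith("VB") and last_noun is not None:
            pending_verb = word
    return triples

tagged = [("species", "NNS"), ("inhabit", "VBP"), ("regions", "NNS"),
          ("scientists", "NNS"), ("study", "VBP"), ("ecosystems", "NNS")]
triples = extract_triples(tagged)
print(triples)
```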
Stages
TEXT → Tokenization → Remove Stop Words → Named Entity Extraction → Acquisition of Synonyms → Term Frequency → Sort and Select Terms → Select Nouns / Select Verbs → Extraction of Relations for Ontologies (ERO) → Validation with Ontobio
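The slides do not detail the validation procedure. One plausible reading is that extracted terms are checked against the concepts of an existing ontology; the sketch below shows that idea as a case-insensitive comparison with a list of reference labels. The labels are hypothetical placeholders, not actual Ontobio content, and `validate_terms` is an assumed name.

```python
def validate_terms(candidates: list[str],
                   reference_labels: list[str]) -> tuple[list[str], list[str]]:
    """Split candidate terms into those found in the reference labels and the rest."""
    reference = {label.lower() for label in reference_labels}
    matched = [t for t in candidates if t.lower() in reference]
    unmatched = [t for t in candidates if t.lower() not in reference]
    return matched, unmatched

# Hypothetical reference labels standing in for concepts of a domain ontology.
reference_labels = ["Species", "Ecosystem", "Region"]
matched, unmatched = validate_terms(["species", "ecosystem", "abundance"], reference_labels)
print(matched, unmatched)
```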
Stages
TEXT → Tokenization → Remove Stop Words → Named Entity Extraction → Acquisition of Synonyms → Term Frequency → Sort and Select Terms → Select Nouns / Select Verbs → Extraction of Relations for Ontologies (ERO) → Validation with Ontobio → OWL
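The final stage produces an OWL document. As a minimal sketch of that step, the code below serializes (subject, verb, object) triples as RDF/XML, declaring each noun as an `owl:Class` and each verb as an `owl:ObjectProperty` with a domain and range. This mapping, the base IRI `http://example.org/biodiversity#`, and the function name `build_owl` are assumptions for illustration; the slides do not show how the authors generate their OWL output.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/biodiversity#"  # hypothetical base IRI

def build_owl(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples as RDF/XML: nouns become classes, verbs object properties."""
    for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
        ET.register_namespace(prefix, uri)
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": BASE.rstrip("#")})
    class_names = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    for name in class_names:
        ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name})
    for subj, verb, obj in triples:
        prop = ET.SubElement(root, f"{{{OWL}}}ObjectProperty",
                             {f"{{{RDF}}}about": BASE + verb})
        ET.SubElement(prop, f"{{{RDFS}}}domain", {f"{{{RDF}}}resource": BASE + subj})
        ET.SubElement(prop, f"{{{RDFS}}}range", {f"{{{RDF}}}resource": BASE + obj})
    return ET.tostring(root, encoding="unicode")

owl_xml = build_owl([("Species", "inhabit", "Region")])
print(owl_xml)
```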
Use Case
Text fragment:
Biological diversity, or the shorter "biodiversity," (bio-di-ver-si-ty) simply
means the diversity, or variety, of plants and animals and other living things
in a particular area or region. For instance, the species that inhabit Los
Angeles are different from those in San Francisco, and desert plants and
animals have different characteristics and needs than those in the
mountains, even though some of the same species can be found in all of
those areas.
Biodiversity also means the number, or abundance of different species living
within a particular region. Scientists sometimes refer to the biodiversity of an
ecosystem, a natural area made up of a community of plants, animals, and
other living things in a particular physical and chemical environment.
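To make the use case concrete, the sketch below runs the first stages (tokenization, stop-word removal, term frequency, sort and select) on the second paragraph of the fragment. It is a simplified stand-in written with the standard library only: the stop-word list, the minimum count of 2, and the name `candidate_terms` are assumptions, and the result is not the output reported by the authors.

```python
import re
from collections import Counter

# First five entries come from the slides; the rest are illustrative assumptions.
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an", "and", "or", "of", "in", "to"}

FRAGMENT = (
    "Biodiversity also means the number, or abundance of different species living "
    "within a particular region. Scientists sometimes refer to the biodiversity of an "
    "ecosystem, a natural area made up of a community of plants, animals, and "
    "other living things in a particular physical and chemical environment."
)

def candidate_terms(text: str, min_count: int = 2) -> list[tuple[str, int]]:
    """Tokenize, drop stop words, count terms, and keep the frequent ones."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
    kept = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter(kept)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, count) for term, count in ranked if count >= min_count]

top_terms = candidate_terms(FRAGMENT)
print(top_terms)
```

On such a short fragment only a few terms repeat, which suggests why a frequency cutoff is better tuned on a larger corpus.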
…
Future Work
● Generation of axioms from textual information
● Applications in other areas of the Semantic Web
  – Linked Open Data
● Use scientific workflow management systems to coordinate the execution of the different tools used in the ontology extraction process and to apply it at large scale
Thank you!
Daniele Palazzi Krempser
[email protected]