Ontology Extraction from Unstructured Text in the Biodiversity Domain

Daniele Palazzi Krempser¹,², Ana Maria de C. Moura¹, Maria Luiza Machado Campos², Luiz Gadelha¹
¹ Brazilian Biodiversity Information System (SiBBr), National Laboratory of Scientific Computing (LNCC), Petrópolis – RJ – Brazil
² Graduate Program in Informatics (PPGI), Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro – RJ – Brazil
{dpalazzi,anamoura,lgadelha}@lncc.br, [email protected]

Motivation
● The increased availability of high-capacity sensors in various scientific domains is causing exponential growth in the amount of scientific data generated.
● SiBBr – Sistema de Informação sobre a Biodiversidade Brasileira (Brazilian Biodiversity Information System)
  – Integration with species lists
  – Integration with repositories of publications about biodiversity (e.g., SciELO)
  – Implementation of Semantic Web technologies
    ● Controlled vocabularies
    ● Ontologies
    ● Linked Open Data
    ● …

Ontology Learning
● Information Retrieval (IR) is the task of fulfilling a demand for information from a collection of unstructured documents, mainly text.
● Ontologies are frequently seen as an answer to the problem of semantic interoperability in current information systems.
● Ontology Learning = Information Retrieval + Ontologies

Stages
TEXT
→ Tokenization (NLTK)
→ Remove Stop Words (the, is, at, which, on...)
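In the original work these two stages rely on NLTK. The dependency-free sketch below only illustrates what they do and is an assumption, not the authors' code: a regular expression stands in for NLTK's tokenizer, the stop-word set is a small hypothetical sample built around the examples on the slide, and `tokenize` and `remove_stop_words` are hypothetical names.

```python
import re

# Hypothetical, deliberately small stop-word set (an assumption); the
# original pipeline uses NLTK, whose English stop-word list is much larger.
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an", "and", "or",
              "of", "in", "to", "that", "those", "than"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens, dropping punctuation.

    Regex stand-in for an NLTK tokenizer: runs of letters/digits, with
    internal hyphens kept, so "bio-di-ver-si-ty" stays one token.
    """
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def remove_stop_words(tokens: list[str], stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Keep only tokens that are not stop words, preserving their order."""
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    sentence = "Biodiversity also means the number, or abundance of different species."
    print(remove_stop_words(tokenize(sentence)))
    # prints ['biodiversity', 'also', 'means', 'number', 'abundance', 'different', 'species']
```

Tokenization and filtering are kept as separate functions so that each stage can be swapped for its NLTK counterpart independently; the tokens produced by this stand-in may differ from NLTK's on punctuation and contractions.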
Stages: Named Entity Extraction (NLTK)
● Part-of-speech tags, e.g.: Vegetation → NN, species → NNS, living → VBG

Stages: Acquisition of Synonyms (WordNet)
● Vegetation → {flora, botany}
● Suggests → {propose, suggest, advise}
● Living → {life}

Stages (complete pipeline)
TEXT
→ Tokenization
→ Remove Stop Words
→ Named Entity Extraction
→ Acquisition of Synonyms
→ Term Frequency
→ Sort and Select Terms
→ Select Nouns / Select Verbs
→ Extraction of Relations for Ontologies (ERO)
→ Validation with Ontobio
→ OWL

Use Case
Text fragment:
Biological diversity, or the shorter "biodiversity," (bio-di-ver-si-ty) simply means the diversity, or variety, of plants and animals and other living things in a particular area or region. For instance, the species that inhabit Los Angeles are different from those in San Francisco, and desert plants and animals have different characteristics and needs than those in the mountains, even though some of the same species can be found in all of those areas. Biodiversity also means the number, or abundance of different species living within a particular region.
Scientists sometimes refer to the biodiversity of an ecosystem, a natural area made up of a community of plants, animals, and other living things in a particular physical and chemical environment. …

Future Works
● Generation of axioms from textual information
● Applications in other areas of the Semantic Web
  – Linked Open Data
● Use scientific workflow management systems to coordinate the execution of the different tools used in the ontology extraction process and to apply it on a large scale

Thank you!
Daniele Palazzi Krempser
[email protected]
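As a supplementary illustration, the Term Frequency and Sort and Select Terms stages of the pipeline can be sketched on the last two sentences of the use-case text fragment. This is a minimal sketch under stated assumptions, not the authors' implementation: tokens come from a simple regex, the stop-word set is a small hypothetical stand-in for NLTK's, frequencies are raw counts from `collections.Counter`, ties are broken alphabetically, and `term_frequencies` and `select_terms` are hypothetical names. The Select Nouns / Select Verbs steps are omitted because they need a part-of-speech tagger such as NLTK's.

```python
import re
from collections import Counter

# Hypothetical stop-word set (an assumption standing in for NLTK's corpus).
STOP_WORDS = {"the", "is", "at", "which", "on", "a", "an", "and", "or",
              "of", "in", "to", "also", "other", "up", "within"}


def term_frequencies(text: str) -> Counter:
    """Term Frequency stage: count each non-stop-word token in the text."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def select_terms(freqs: Counter, k: int) -> list[tuple[str, int]]:
    """Sort and Select Terms stage: rank by descending count, ties
    alphabetically, and keep the top k (k is a hypothetical cut-off)."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Last two sentences of the use-case text fragment.
FRAGMENT = (
    "Biodiversity also means the number, or abundance of different species "
    "living within a particular region. Scientists sometimes refer to the "
    "biodiversity of an ecosystem, a natural area made up of a community of "
    "plants, animals, and other living things in a particular physical and "
    "chemical environment."
)

if __name__ == "__main__":
    print(select_terms(term_frequencies(FRAGMENT), 3))
    # prints [('biodiversity', 2), ('living', 2), ('particular', 2)]
```

Raw counts are the simplest weighting; a corpus-level measure such as TF-IDF would down-weight terms that are frequent across all documents, but nothing in the slides indicates which weighting the original pipeline uses.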