Overview of Research
- Computational Terminology
- Knowledge extraction from Text
- Study of causal relation
- Corpus building
- Uncertainty
- Computer Assisted Language Learning (CALL)
- Interdisciplinary project on French Second Language
- Text understanding
- From speech to sentence
CLiNG - May 24 2002
SeRT - a tool for knowledge extraction from text
Caroline Barrière
School of Information Technology and Engineering
University of Ottawa
Ottawa, Ontario, Canada
[email protected]
A few questions...
- Why knowledge extraction from text?
For building a Knowledge Base...
- What’s a Knowledge Base?
It depends who defines it....
- From a terminological standpoint:
A static repository of domain-specific knowledge,
giving the important concepts and their relations.
- What kind of relations?
Hyperonymy (is-a), meronymy (part-of), synonymy,
function, definition, causality
- Why start from text?
What are the alternatives?
Semantic Relations in Text (SeRT)
- Goal : Starting from a corpus of texts on a specific domain,
capture and store the important concepts (terms)
of that domain, as well as their relations.
- Hypothesis
- definitions can be derived from text analysis
- text is used as language and meta-language
- paradigmatic relations can be found in texts by pattern search
- present knowledge representation formalisms allow the
representation of this information
Example of a pattern search for hyperonymy (Corpus on Composting)
In clay soils, organic materials such as compost and pine bark increase drainage and air
space.
Some yard wastes, such as wood chips, are very difficult to compost fully and are therefore
not suitable for incorporation into the soil.
Grass clippings and other green vegetation tend to have a higher proportion of nitrogen
(and therefore a lower C/N ratio) than brown vegetation such as dried leaves or wood
chips.
To help meet that requirement, North Carolina passed a law that prohibits depositing
organic yard wastes such as leaves, grass clippings, or tree trimmings in the state's
landfills.
Table 2. Semantic relation hypernym found through the patterns "such as" and "and other"
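The kind of search behind these examples can be sketched as a small keyword-in-context lookup. This is only an illustration, not SeRT's code: the class `PatternContext`, its method names and the `window` parameter are hypothetical, and the pattern is matched as a literal string. The left and right contexts are what a user would inspect to pick the two terms of the relation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: show a surface pattern with a few words of context on each side.
class PatternContext {

    /** Returns one "left [pattern] right" line per hit, with at most `window` words per side. */
    static List<String> search(String text, String pattern, int window) {
        List<String> hits = new ArrayList<>();
        // Pattern.quote makes the search a plain string match; \b keeps whole words only.
        Matcher m = Pattern
                .compile("\\b" + Pattern.quote(pattern) + "\\b", Pattern.CASE_INSENSITIVE)
                .matcher(text);
        while (m.find()) {
            String left = lastWords(text.substring(0, m.start()), window);
            String right = firstWords(text.substring(m.end()), window);
            hits.add((left + " [" + m.group() + "] " + right).trim());
        }
        return hits;
    }

    // First n whitespace-separated words of s.
    static String firstWords(String s, int n) {
        String[] w = s.trim().split("\\s+");
        return String.join(" ", Arrays.copyOfRange(w, 0, Math.min(n, w.length)));
    }

    // Last n whitespace-separated words of s.
    static String lastWords(String s, int n) {
        String[] w = s.trim().split("\\s+");
        return String.join(" ", Arrays.copyOfRange(w, Math.max(0, w.length - n), w.length));
    }
}
```

Calling `PatternContext.search(sentence, "such as", 3)` on the second example sentence shows "Some yard wastes," on the left and "wood chips, are" on the right; choosing the exact terms is left to the user, in line with the tool's interactive design.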
SeRT - Features
- parallel search of terms and relations
- term extraction
- search for surface patterns leading to semantic relations
- focus on user interaction (nothing fully automatic)
- term selection and validation
- user definition of surface patterns corresponding to semantic relations
- user selection of concepts involved (tuple) in the semantic relation
- raw text used (no preprocessing necessary)
- easy access to KB : save and retrieval
- to be used in “bootstrapping” mode
Term extraction
- Usage of a stop list
a, able, about, above, according, accordingly, across, actually …
- appropriate method for English (but maybe not for French)
satellite link - liaison par satellite
laser printer - imprimante au laser
communication network - réseau de communication
- no syntactic analysis
- different from:
Daille 1994: linguistic patterns (French)
Bourigault 1994: morpho-syntactic markers (French)
- lemmatization
'moving quickly' -> 'mov[ing] quick[ly]' -> 'mov* quick*'
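A minimal sketch of stop-list based term extraction, under stated assumptions: candidate terms are taken to be single words and adjacent word pairs not interrupted by a stop word or punctuation, text is lowercased, and truncation handles only the two suffixes shown above. The class `StopListTerms` and the stop words beyond the eight listed on the slide are illustrative, not SeRT's actual list or algorithm.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of term counting with a stop list and no syntactic analysis.
class StopListTerms {

    // First eight words come from the slide; the last six are added for illustration.
    static final Set<String> STOP = Set.of(
            "a", "able", "about", "above", "according", "accordingly", "across", "actually",
            "the", "is", "of", "to", "and", "in");

    /** Counts single words and adjacent word pairs that contain no stop word. */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> freq = new TreeMap<>();
        // Punctuation ends a segment, so pairs never span a comma or a sentence boundary.
        for (String segment : text.toLowerCase().split("[.,;:!?()\"]+")) {
            String prev = null; // previous non-stop word in the current run, if any
            for (String tok : segment.trim().split("\\s+")) {
                if (tok.isEmpty() || STOP.contains(tok)) {
                    prev = null; // a stop word breaks the run
                    continue;
                }
                freq.merge(tok, 1, Integer::sum);
                if (prev != null) {
                    freq.merge(prev + " " + tok, 1, Integer::sum);
                }
                prev = tok;
            }
        }
        return freq;
    }

    /** Crude truncation in the spirit of 'mov* quick*'; only "ing" and "ly" are stripped. */
    static String truncate(String word) {
        for (String suffix : new String[] {"ing", "ly"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length()) + "*";
            }
        }
        return word;
    }
}
```

Because a stop word breaks the run, "liaison par satellite" would never be counted as one term with a French stop list containing "par", which is the limitation noted above for French.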
Results
- Corpus on “composting”
- Terms (frequency and term; three lists as shown on the slides)

  503  compost          402  compost          402  compost
  373  pile             369  pile             369  pile
  258  composting       199  soil             295  materi*
  202  soil             187  composting       260  compost*
  170  materials        149  material         199  soil
  155  material         146  materials        133  nitrogen
  142  nitrogen         133  nitrogen         105  compost pile
  110  compost pile     105  compost pile     105  temperatur*
  103  water            102  bin              102  bin
  102  bin               96  time              96  time
  100  time              95  water             95  leav*
   92  leaves            94  Compost           95  water
   83  bacteria          85  leaves            94  Compost
Search for patterns indicating semantic relations
- pre-encoded patterns (earlier work - Barrière 1997)
- compile the pattern lists of other authors
- pattern search has multiple possibilities:
- string matching
- lemmatized token matching
- part of speech matching
- inclusion of a dictionary look-up
(derived from Collins + morphological rules added)
- possibility of searching for a pattern around 1 term
- usually what Computational Terminologists want to do
- display limited or enlarged context
Example of search patterns

Hyperonymy
  such as                                                     (string matching)
  and other *|n                                               (string + POS)
  includ* *|n                                                 (lemmatized string + POS)
  *|n is a *|a of  [~part]                                    (negative filter)
  *|y organic materi*  [mostly, especially, specifically]     (positive filter) + (search with specific term)

Synonymy
  known as                                                    (string matching)
  also called                                                 (string matching)

Meronymy
  contains *|n                                                (string + POS)
  is a *|a part of                                            (string + POS)
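A rough sketch of how such patterns could be matched token by token. It assumes a reading of the notation that the slides do not spell out: `form|pos` requires the word's dictionary entry to include the POS letter, a trailing `*` is a prefix wildcard, and a bare `*` matches any word. The class `SurfacePattern`, its toy dictionary and the tokenization are hypothetical; the bracketed positive and negative filters are not covered.

```java
import java.util.Map;

// Hypothetical sketch of matching a surface pattern with wildcards and POS constraints.
class SurfacePattern {

    // Toy "word -> POS letters" dictionary; the entries are illustrative only.
    static final Map<String, String> DICT = Map.of(
            "vegetation", "n", "green", "an", "tannins", "n", "moisture", "n");

    /** True if one text word satisfies one pattern token. */
    static boolean tokenMatches(String pat, String word) {
        String[] parts = pat.split("\\|", 2); // "form|pos", the POS part is optional
        String form = parts[0];
        if (parts.length == 2 && !DICT.getOrDefault(word, "").contains(parts[1])) {
            return false; // the word is unknown or lacks the required POS letter
        }
        if (form.equals("*")) {
            return true; // any word
        }
        if (form.endsWith("*")) {
            return word.startsWith(form.substring(0, form.length() - 1)); // prefix match
        }
        return word.equals(form); // plain string match
    }

    /** Returns the index of the first word where the whole pattern matches, or -1. */
    static int find(String pattern, String sentence) {
        String[] pat = pattern.trim().split("\\s+");
        String[] words = sentence.toLowerCase().split("\\W+");
        for (int i = 0; i + pat.length <= words.length; i++) {
            boolean ok = true;
            for (int j = 0; j < pat.length && ok; j++) {
                ok = tokenMatches(pat[j], words[i + j]);
            }
            if (ok) {
                return i;
            }
        }
        return -1;
    }
}
```

With this toy dictionary, `find("and other *|n", ...)` accepts "green" after "and other" because its entry lists both `a` and `n`, which shows why words with multiple POS make such matches permissive.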
regular dictionary:
  aback,y
  abactinally,y
  abashedly,y
  abdominally,y
  abed,y
  abhorrently,y
irregular dictionary:
  a',a
  ablebodied,a
  ablebodieder,a
  ablebodiedest,a
  abranchial,a
  abranchialer,a
  abranchialest,a
entries with multiple POS:
  roughcast,nv
  huggermugger,anvy
  broadcast,anvy
  ground,anv
  like,acnrvy
  cut,anv
  draft,nv
sizes:
  77,000 (1046 kb)
  26,000 (387 kb)
  94,000 (333 kb)
  TOTAL: 197,000 entries (1766 kb)
// Morphological rules as pairs of word endings
public String[][] inflect = {
    // plural nouns
    { "", "s" },
    { "", "es" },
    { "y", "ies" },
    { "an", "en" },
    { "um", "a" },
    { "", "e" },
    { "us", "i" },
    // ...
    // comparative adjectives
    { "", "er" },
    { "e", "er" },
    { "y", "ier" },
    { "c", "caler" },
    { "", "der" },
    // ... (the slide ends here)
};
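One way the ending pairs above could be used during dictionary look-up is sketched below. It assumes, without confirmation from the slides, that each pair reads { base-form ending, inflected ending } and that a word is accepted when undoing one rule yields a dictionary entry. The class `InflectionLookup` and the sample dictionary are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: recover base forms by undoing one suffix rule and checking a dictionary.
class InflectionLookup {

    // Plural-noun pairs copied from the slide; read here as { base ending, inflected ending }.
    static final String[][] PLURAL = {
        { "", "s" }, { "", "es" }, { "y", "ies" }, { "an", "en" },
        { "um", "a" }, { "", "e" }, { "us", "i" }
    };

    /** Returns the dictionary entries reachable from `word` by undoing exactly one rule. */
    static List<String> baseForms(String word, Set<String> dictionary) {
        List<String> found = new ArrayList<>();
        for (String[] rule : PLURAL) {
            String base = rule[0];
            String inflected = rule[1];
            if (word.endsWith(inflected) && word.length() > inflected.length()) {
                String stem = word.substring(0, word.length() - inflected.length());
                String candidate = stem + base;
                if (dictionary.contains(candidate) && !found.contains(candidate)) {
                    found.add(candidate);
                }
            }
        }
        return found;
    }
}
```

For example, with a dictionary containing "waste", "bacterium" and "man", the words "wastes", "bacteria" and "men" each map back to one base form, while a form such as "leaves" would need a rule that is not among those shown.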
Information storage in the TKB
- transfer of info found at previous step
- user selects the terms (concepts) around the pattern
- semantic relation / pattern / tuple are stored in the TKB
- an uncertainty factor can also be added to the tuple
- research on the causal relation has led us to realize
the necessity of this information
- applies to different relations
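The stored information can be pictured with a small in-memory sketch. The slides do not describe the TKB schema or how the uncertainty factor is encoded, so everything here is an assumption: the class `TkbSketch`, the field names, and the choice to keep the uncertainty marker as an optional string (for example the modal "can").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the TKB: one entry per relation / pattern / tuple.
class TkbSketch {

    /** One stored fact; `uncertainty` is null when no marker was recorded. */
    static final class Entry {
        final String relation;
        final String pattern;
        final String place1;
        final String place2;
        final String uncertainty;

        Entry(String relation, String pattern, String place1, String place2, String uncertainty) {
            this.relation = relation;
            this.pattern = pattern;
            this.place1 = place1;
            this.place2 = place2;
            this.uncertainty = uncertainty;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Saves the terms the user selected around a pattern. */
    void store(String relation, String pattern, String place1, String place2, String uncertainty) {
        entries.add(new Entry(relation, pattern, place1, place2, uncertainty));
    }

    /** Retrieves all entries of one relation, in insertion order. */
    List<Entry> byRelation(String relation) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : entries) {
            if (e.relation.equals(relation)) {
                out.add(e);
            }
        }
        return out;
    }
}
```

Keeping the pattern next to each tuple makes it possible to trace later which surface pattern produced a given relation.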
Semantic relation extraction
Results - semantic relations
- Exploration of a few patterns
- contain? (meronymy)
- such as & and other (hypernymy)
Fresh, young weeds from your irrigated garden can contain 60-70% moisture - no
need to add water to them.
Leaves from eucalyptus, walnuts, and laurel trees contain tannins.
Every piece of organic material contains carbon and nitrogen in differing ratios.
Most compost also contains as much as 2 percent calcium.
Table 1. Semantic relation meronym found through the pattern contain
In clay soils, organic materials such as compost and pine bark increase drainage and air
space.
Some yard wastes, such as wood chips, are very difficult to compost fully and are therefore
not suitable for incorporation into the soil.
Grass clippings and other green vegetation tend to have a higher proportion of nitrogen
(and therefore a lower C/N ratio) than brown vegetation such as dried leaves or wood
chips.
To help meet that requirement, North Carolina passed a law that prohibits depositing
organic yard wastes such as leaves, grass clippings, or tree trimmings in the state's
landfills.
Table 2. Semantic relation hypernym found through the patterns "such as" and "and other"
relation <meronym>
  tuple (place 1)      tuple (place 2)
  60-70% moisture      young weeds
  tannins              leaves from eucalyptus tree
  tannins              leaves from walnut tree
  tannins              leaves from laurel tree
  carbon               organic material
  nitrogen             organic material
  calcium              compost

relation <hypernym>
  tuple (place 1)      tuple (place 2)
  compost              organic material
  pine bark            organic material
  wood chips           yard wastes
  grass clippings      green vegetation
  dried leaves         brown vegetation
  wood chips           brown vegetation
  leaves               organic yard wastes
  grass clippings      organic yard wastes
  tree trimmings       organic yard wastes

Table 3. Possible relations extracted from a text
Could we infer is-a relations and extend the type hierarchy?
SeRT use
- Parallel mode
- searching on patterns can suggest terms to be explored
- search on terms can suggest patterns around them
- Bootstrapping mode for relations
- start with one pattern: enhance
- the tuple compost/soil, once found, is used to find other patterns
As a soil amendment, compost is thought to enhance the physical, chemical, and biological
properties of soils.
When worm compost is added to soil, it boosts the nutrients available to plants and enhances soil
structure and drainage.
This discussion is an attempt to enhance your understanding of the conditions which can lead to
odor formation, in the hopes that they can be avoided or at least minimized in the future.
No matter your soil type, your climatic zone, or your choice of crops, composting will enhance your
garden soil, resulting in stronger plants and healthier produce.
Table 4. Sentences containing the verbal pattern enhance
Before using compost, be sure to study a copy of any soil or waste chemical nutrient analyses,
pesticide and heavy metal analyses, and stability tests that the producer of the compost performed.
When worm compost is added to soil, it boosts the nutrients available to plants and enhances soil
structure and drainage.
How does compost help soil structure?
Some people get around the problem of nitrogen loss by adding bloodmeal to the soil before they
bury the compost materials.
Composting is really quite simple, inexpensive, ecologically sound, and utterly failproof - no matter
what you do, your pile will eventually rot into soil-enriching compost!
While compost is a panacea for all garden soils, poor soils especially will benefit from consistent
applications.
Table 5. Some examples of the tuple "compost/soil" in the corpus
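The bootstrapping step from a known tuple to new candidate patterns can be sketched as follows. This is an illustration under simple assumptions, not SeRT's procedure: both terms are matched as whole words by plain string matching, only the order "first term ... second term" is considered, and the text between the two terms is proposed as a candidate pattern for the user to validate. The class `TupleBootstrap` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: propose candidate patterns from the text found between two known terms.
class TupleBootstrap {

    /** For each sentence containing `a` and later `b`, returns the words between them. */
    static List<String> between(List<String> sentences, String a, String b) {
        List<String> candidates = new ArrayList<>();
        // Lazy (.*?) stops at the first occurrence of b after a; \b keeps whole words only.
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(a) + "\\b(.*?)\\b" + Pattern.quote(b) + "\\b",
                Pattern.CASE_INSENSITIVE);
        for (String sentence : sentences) {
            Matcher m = p.matcher(sentence);
            if (m.find()) {
                candidates.add(m.group(1).trim());
            }
        }
        return candidates;
    }
}
```

On the Table 5 sentences, the tuple compost/soil yields candidates such as "is added to" and "help". The sentence with "garden soils" is missed because plain string matching does not cover the plural, which is where lemmatized matching would help.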
Future work
Short term (tool itself)
- Add list of predefined relations & patterns
- Add flexibility in pattern search
- toward a mix of semantic and syntactic search
- Construction of a graphical representation of the semantic network built
Future work
Long term (tool + theoretical background)
- Work on compound nouns
- much implicit information that could be put explicitly in the KB
- Work on representational scheme
- the relational database is too limiting
- causal relation requires a different type of representation
- contexts for expressing the relation (possibly nested)
- uncertainty factors
- inferencing
- Explore pattern search in French
- Batch mode extraction (no user)
- automatic selection of terms around patterns
- after certain terms and patterns have been identified
- need an integration of confidence levels on patterns