Overview of Natural Language Interfaces
Chapter 1
Overview of Research Project
Spatial information can be stored in computers, but it is difficult to search for required
information based purely on the visual content. This spatial information must be described
in order to facilitate searching. One aspect of this project is to become familiar
with the problems involved in storing and representing this type of data.
In everyday usage, items are described using subjective wording. For example, a person's
face might be described by saying that they have a big nose, a light complexion and a wide
mouth. The terms "big", "light" and "wide" are not strictly defined, but there is community
agreement on their meaning. The second aspect of this project is to use fuzzy logic and/or
modal logic to create a system where pictures are described by community-agreed subjective
terms. The hope is that this will allow better searching of spatial information given a query
that uses subjective wording.
Queries that use such subjective wording are not easily stated in SQL. The third aspect of the
project is to automate the translation of natural language queries using this subjective
language into SQL statements. This is the portion of the project addressed by this report.
The complete system should allow a user to ask a question such as "Find me a man with a
big nose, long hair, dark complexion and blue eyes" and have the system return all the
pictures that meet those criteria.
Chapter 2
Natural Language to SQL basics
Introduction
The purpose of a natural language interface for a database system is to accept requests in
English and attempt to "understand" them. A natural language interface usually has its own
dictionary, containing words related to a database and its relationships. In addition,
the interface also maintains a standard dictionary (e.g. Webster's dictionary).
A natural language interface refers to words in its own dictionary, as well as to the words in
the standard dictionary, in order to interpret a query. If the interpretation is successful, the
interface generates a SQL query corresponding to the natural language request and submits it
to the DBMS for processing; otherwise, a dialogue is started with the user to clarify the
request.
The purpose of this paper is to present an overview of Natural Language Interfaces. The
PRECISE and MASQUE interfaces are discussed. The ELF (English Language Front End)
interface is used to translate the natural language question into SQL for the project described
in this paper. The project makes this SQL available to the fuzzy database research group,
which enables them to modify the SQL and use it for fuzzy queries.
The area of NLP research is still very experimental, and systems so far have been limited to
small domains where only certain types of sentences can be used. When systems are
scaled up to cover larger domains, NLP becomes difficult due to the vast amount of
information that must be incorporated in order to parse sentences. For example, the
sentence "The woman saw the man on the hill with a telescope" could have many different
meanings. To understand the intended meaning, we have to take into account the
current context, such as the woman being a witness, and any background information, such as
there being a hill nearby with a telescope on it. Alternatively, the man could be on the hill, and
the woman may be looking through the telescope. All this information is difficult to
represent, so restricting the domain of an NLP system is a practical way to get a manageable
subset of English to work with.
The standard approach to database NLP systems is well established. This approach creates a
'semantic grammar' for each database and uses it to parse the English question. The
semantic grammar creates a representation of the semantics of a sentence. After some
analysis of the semantic representation, a database query can be generated in SQL or any
other database language.
The drawback of this approach is that the grammar must be tailor-made for each database.
Some systems allow automatic generation of an NLP system for each database, but in almost
all cases there is insufficient information in the database to create a reliable NLP system.
Many databases cover a small domain, so that an English question about the data within them
can easily be analyzed by an NLP system; the database can then be consulted and an appropriate
response generated.
The need for a Natural Language Interface (NLI) to databases has become increasingly
important as more and more people access information through web browsers, PDAs and
cell phones. These are casual users, and it is necessary to have a way for them to
make queries in their own natural language rather than first learning and then writing SQL
queries. The important point, however, is that NLIs are only usable if they map natural language
questions to SQL queries correctly.
2.1 Fundamental steps involved in the conversion
The transformation of a given English query to an equivalent SQL form requires some basic
steps, and all natural language to SQL software packages deal with these basic steps in some
manner.
First there is a dictionary, where all the words that are expected to be used in any English
question are declared. These words consist of all the elements (relations, attributes and
values) of the database and their synonyms. Then these words are mapped to the database
system; that is, the meaning of each word must be defined.
These two things (the declaration and the mapping of the words) may go by different names
in different systems, but they form the basis of the conversion. They are domain-dependent
modules and must be present.
The architecture shown in Figure 2.1 captures the three basic steps of NL to SQL conversion.

[Figure 2.1 (Ref [5]): Architecture of the transformation process. An English question is parsed using the lexical dictionary into a parse tree; the semantic interpreter, using the semantic dictionary and type hierarchy, produces an LQL query; the LQL to SQL translator, using the interface with data, produces a SQL query; a DBMS transceiver submits the query to the DBMS and database, and a response generator returns the query result. The lexical dictionary, semantic dictionary and interface with data are the domain-dependent modules.]
Note that the domain-dependent modules (the lexical dictionary, semantic dictionary and
interface with data) depend on the data contained in the database. Below is a detailed
explanation of each of these modules.
Lexical dictionary: This holds the definition of all the words that may occur in a question.
The first step of any system is to parse an English question and identify all the words that are
found in the lexical dictionary. The lexical dictionary also contains the synonyms of root
words.
Semantic dictionary: Once the words are extracted from the English question using the
lexical dictionary, they are mapped to the database. The semantic dictionary contains these
mappings. This process transforms the English question into an internal language (LQL in the
architecture shown in Figure 2.1), which is then converted to SQL.
During the mapping process, words are attached to each other, to entities, or to relations, so
the output of this step is a function. For example, consider the question "What is the salary
of each manager?" Here the words salary and manager are attached, and so the output is the
function has_salary(salary, manager).
Interface with data: The next step is the conversion of the internal query developed above
into an equivalent SQL statement. This is done somewhat differently by different systems,
and this step may be combined with the step above to produce a SQL statement directly.
There are some predefined rules (depending on the interface) that change the internal
language statement into SQL, and the interface with data contains all those rules.
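To make these modules concrete, here is a minimal Python sketch of the pipeline under illustrative assumptions: the word lists, root-word mappings and table/column names below are invented for this example and do not come from any particular system.

# Hypothetical toy pipeline: lexical dictionary -> semantic dictionary -> SQL.
# All names below are invented for illustration.

LEXICAL_DICTIONARY = {          # words (and synonyms) mapped to root words
    "salary": "salary", "pay": "salary",
    "manager": "manager", "boss": "manager",
}

SEMANTIC_DICTIONARY = {         # root word -> (table, column) in the database
    "salary": ("Staff", "salary"),
    "manager": ("Staff", "name"),
}

def to_sql(question):
    # Step 1: parse, keeping only words declared in the lexical dictionary
    words = [w.strip("?.,").lower() for w in question.split()]
    roots = [LEXICAL_DICTIONARY[w] for w in words if w in LEXICAL_DICTIONARY]
    # Step 2: map each root word to its database element
    elements = [SEMANTIC_DICTIONARY[r] for r in roots]
    # Step 3: apply a (trivial) rule to emit SQL from the mapped elements
    columns = ", ".join(f"{t}.{c}" for t, c in elements)
    tables = ", ".join(sorted({t for t, _ in elements}))
    return f"SELECT {columns} FROM {tables}"

print(to_sql("What is the salary of each manager?"))
# SELECT Staff.salary, Staff.name FROM Staff

Real systems differ mainly in the third step, where join conditions and WHERE clauses are derived from the relationships recorded during the mapping.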
2.2 MASQUE/SQL
Introduction:
There are many commercial natural language query interfaces available; MASQUE/SQL is
one of them. MASQUE/SQL is a modification of the earlier product MASQUE, which
answered questions by generating Prolog queries [5]. MASQUE (Modular Answering System
for Queries in English) was developed at the Artificial Intelligence Applications Institute and
the Department of Artificial Intelligence of the University of Edinburgh. Complex queries
were answered by MASQUE in a couple of seconds. But since it answered queries in Prolog,
existing commercial databases were unable to use it, and so MASQUE/SQL was developed.
MASQUE/SQL can answer users' questions almost as quickly as the original MASQUE.
MASQUE/SQL answered most test questions in fewer than six seconds of CPU time, including
the time taken by the DBMS to retrieve data from the sample database [5]. One important
point about MASQUE/SQL is that its performance is not significantly affected when using
larger databases. This is because it transforms each user question into a single SQL query,
and the relational DBMS is left to find answers to the query using its own optimization
and planning techniques. Thus the full power of a relational DBMS is available during
question answering, and the system is therefore expected to scale up easily to larger
databases. The drawback of larger databases is that the dictionary gets bigger and more
complicated, and parsing time increases.
Working of MASQUE/SQL:
MASQUE/SQL follows the same basic architecture as MASQUE. It starts with the building
of the lexicon, which is done using a built-in domain editor. The domain editor helps
the user declare the words expected to appear in a question.
The second step, as per the general architecture, is building the semantic dictionary. This is
also done with the built-in domain editor, which allows the user to define the meaning of each
declared word in terms of a logic predicate. The logic predicate defines the mapping of a
word to the database. An example is given in Figure 2.2.
[Figure 2.2 (is-hierarchies): the left hierarchy organizes entities (person, with subtypes customer and staff; staff, with subtypes salesperson, manager and technician); the right hierarchy organizes features (numeric features such as age and salary; non-numeric features such as name and address)]
Figure 2.2 shows two is-hierarchies. The one on the left is for a set of entities, and the
right-hand one describes the properties of entities. In the database there is a relationship
defined between these properties and entities. For instance, if salary is related to staff and
age to person, then there are logic predicates such as has_salary(Salary, Staff) and
has_age(Age, Person). There are also predicates such as is_manager(M) to tell whether M is a
manager or not. Now when the noun "manager" (as in the query "Is PG a manager?") is
encountered, it is mapped to the predicate is_manager(M), and similarly the meaning of the
noun "salary" (as in "What is the salary of each manager?") is expressed as the predicate
has_salary(Sal, Stf). Here the predicate has two arguments. The first is the salary, and since
only staff can have a salary (from the relationship in the database), 'Stf' (signifying staff) is
the second argument. An entity that is not staff cannot fill this argument, so 'customer'
cannot be used.
There are two types of predicates used in MASQUE/SQL.
1. Type A - predicates that show a relationship (mapping) between individual
entities. For example, has_salary links Staff and Salary.
2. Type B - predicates that express relations linking entity sets to entities. For
example, the meaning of "average" as in "What is the average age of the managers?"
is av(Aver, Set), where Set stands for a set of numeric entities and Aver stands for a
numeric entity corresponding to the average of Set.
These logic predicates form the subparts of a Prolog-like LQL (Logical Query Language).
This LQL is the internal language that will ultimately be converted to SQL. For example,
"What is the salary of each manager?" is translated into LQL as follows:
answer([S, M]) :- is_manager(M), has_salary(S, M)
The process is as follows. When an English query is entered, the parser extracts the words
that occur in the lexical dictionary (in this case "salary" and "manager"). Next the semantic
dictionary gives the logic predicates for these two nouns: is_manager(M) for "manager" and
has_salary(S, M) for "salary".
The last step is to convert the LQL into a SQL query that can be executed by the DBMS.
This conversion process is explained with the help of the example given in figure 2.3.
Given the English Query “What is the average age of the managers?”, the following LQL is
generated:
answer([Aver]) :-
    setof(Age:Mng,
          (is_manager(Mng),
           has_age(Age, Mng)),
          Ages_list),
    av(Aver, Ages_list)
Figure 2.3
The translation algorithm uses a binding structure, which maps LQL variables to SQL code
fragments. Whenever a new LQL variable is encountered, a suitable SQL fragment is stored
as the binding of that variable. When the variable is re-encountered, its binding is used to
create a WHERE condition. Returning to the example, the type-A predicate
is_manager(Mng) is translated into the SQL shown in Figure 2.4.
SELECT *
FROM is_manager#1 rel1
Figure 2.4
and rel1.arg1 becomes the binding of Mng. Then has_age(Age, Mng) is processed. The
binding of Mng (i.e. rel1.arg1) is used to create a WHERE condition, and rel2.arg1 becomes
the binding of Age (see Figure 2.5).
SELECT *
FROM has_age#2 rel2
WHERE rel2.arg2 = rel1.arg1
Figure 2.5
To obtain the final SQL query, these two subqueries need to be combined. To do this, a
conjunction list is formed; to translate it, the FROM and WHERE parts of the translations of
the conjuncts are merged. In the example, the translation of the second argument of setof
(Figure 2.3) is as shown in Figure 2.6.
SELECT *
FROM is_manager#1 rel1,
has_age#2 rel2
WHERE rel2.arg2 = rel1.arg1
Figure 2.6
The processing of the overall setof instance gives the binding of Ages_list shown in Figure
2.7.
SELECT rel2.arg1, rel1.arg1
FROM is_manager#1 rel1,
has_age#2 rel2
WHERE rel2.arg2 = rel1.arg1
Figure 2.7
The SELECT part in Figure 2.7 is generated by observing the first argument of setof
(Age:Mng) and by using the bindings of Age (rel2.arg1) and Mng (rel1.arg1). The type-B
predicate av is linked to the pseudo-SQL query given in Figure 2.8.
SELECT avg(first)
FROM pair_set
Figure 2.8
Consulting the SELECT part of the binding of Ages_list, first can be associated with
rel2.arg1 and second with rel1.arg1. Processing the av instance causes the binding of
Aver to become as shown in Figure 2.9.
SELECT avg(rel2.arg1)
FROM is_manager#1 rel1,
has_age#2 rel2
WHERE rel2.arg2 = rel1.arg1
Figure 2.9
where the FROM and WHERE parts are the same as in the binding of Ages_list. The
translation of the full LQL query is the binding of Aver.
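The binding-structure mechanics of Figures 2.4 to 2.6 can be illustrated with a short sketch. The Python below is not the original MASQUE/SQL implementation (which was logic-based); it simply reproduces the conjunction-translation step for the example predicates, with the #1/#2 suffixes mirroring the table names used in the figures.

# Illustrative re-implementation of the binding-structure translation.
# predicates: list of (name, [variables]), processed left to right.

def translate_conjunction(predicates):
    bindings = {}      # LQL variable -> SQL fragment such as "rel1.arg1"
    from_parts = []
    where_parts = []
    for i, (name, args) in enumerate(predicates, start=1):
        alias = f"rel{i}"
        # the numeric suffix here mirrors the figures (predicate arity)
        from_parts.append(f"{name}#{len(args)} {alias}")
        for pos, var in enumerate(args, start=1):
            fragment = f"{alias}.arg{pos}"
            if var in bindings:
                # variable re-encountered: emit a WHERE join condition
                where_parts.append(f"{fragment} = {bindings[var]}")
            else:
                bindings[var] = fragment
    sql = "SELECT *\nFROM " + ",\n     ".join(from_parts)
    if where_parts:
        sql += "\nWHERE " + " AND ".join(where_parts)
    return sql, bindings

sql, bindings = translate_conjunction(
    [("is_manager", ["Mng"]), ("has_age", ["Age", "Mng"])])
print(sql)
# SELECT *
# FROM is_manager#1 rel1,
#      has_age#2 rel2
# WHERE rel2.arg2 = rel1.arg1
print(bindings["Age"])   # rel2.arg1

The binding of Age (rel2.arg1) returned here is exactly what the setof/av processing consults to build the SELECT avg(rel2.arg1) query of Figure 2.9.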
2.3 PRECISE
PRECISE is a natural language interface developed at the University of Washington,
Seattle, WA. The PRECISE natural language interface to databases is designed on the
principle that it should guarantee the correctness of its output, or else indicate that it does not
understand the input question. PRECISE is the first system with formal guarantees on the
soundness and completeness of an NLI [2]. A demo of PRECISE is available at the University
of Washington website [1]. The database used for this demo is the Airline Travel Information
Service (ATIS), which contains information on flight schedules, fares, ground transportation,
airports, and airplanes. For example, asking "Show me all flights from Seattle to Boston"
returns the result along with the SQL generated. However, the PRECISE interface is not
available for commercial use, and so it was not tested for this project.
A database is made up of three types of elements: relations, attributes and values. An attribute
is a particular column in a particular relation, and each value element is the value of a particular
attribute. The words that usually appear in questions (what, which, where, who, when) are
known as Wh-words; these words help in identifying attributes in questions. A set of word
stems that matches a database element is called a token.
For example, {require, experience} and {need, experience} could refer to the attribute
Required Experience. There are two types of tokens: value tokens and attribute tokens. If the
token corresponds to a value in the database then it is a value token; similarly, if the token
corresponds to an attribute then it is an attribute token. This token system helps match words
in the question with database elements.
A set of tokens such that every word in the question appears in exactly one token is known
as a complete tokenization of the question. PRECISE uses an attachment function, which
maps pairs of tokens to TRUE or FALSE (i.e., it tells whether two given tokens are attached).
For example, consider the question "What French restaurants are located downtown?" Here
the tokens located and downtown are attached, while the tokens what and downtown are not.
A valid mapping from a complete sentence tokenization to a set of database elements has the
following characteristics:
- each token matches a unique database element
- each attribute token is attached to a unique value token
- each relation token is attached to either an attribute token or a value token
A question is semantically tractable if it has at least one complete tokenization with only
distinct tokens, has at least one value token that matches a Wh-word, and results in a valid
mapping. Examples are given in Figures 2.10 and 2.11.
What French restaurants are located downtown?
- "What" is a Wh-word that corresponds to "restaurant"
- the relation token "restaurant" refers to a Restaurants relation with attributes Name, Cuisine, and Location
- the value token "French" is paired with the attribute token "Cuisine"; in this case, Cuisine is called an implicit attribute
- the value token "Downtown" is paired with the attribute token "Location"; in this case, Location is called an explicit attribute
Figure 2.10
What French restaurants are located downtown?
Examples of what the attachment function does:
- the tokens "located" and "downtown" are attached
- the tokens "what" and "downtown" are not attached
- the relation token "restaurant" is attached to the value token "French"
Figure 2.11
How PRECISE works
Given an English question, PRECISE determines whether it is semantically tractable, i.e.
whether it has at least one complete tokenization with a valid mapping. If it is, PRECISE
generates the corresponding SQL query or queries.
PRECISE uses a max-flow algorithm for graph matching to find a valid mapping from the
complete sentence tokenization to database elements. For example, consider "What are the
HP jobs on a UNIX system?" Given a Job relation with attributes Description, Platform, and
Company, PRECISE produces a complete tokenization of the question as shown in Figure
2.12. The syntactic markers (which have no impact on the interpretation of the question) are:
are, the, on, a. The value tokens are: what, HP, UNIX. The only attribute token is system,
and the only relation token is job.
Figure 2.12 [Ref 4]
An attribute-value graph is constructed as shown in Figure 2.13.
Figure 2.13 [Ref 4]
The max-flow algorithm automatically handles ambiguity (here, the ambiguity caused by
"HP") and "decides" on the final data flow path, indicated by the solid lines in Figure 2.13.
After all attribute and value tokens have been matched to database elements, the system
checks that every relation token corresponds to a value token or attribute token. When
multiple relations are involved in a question, a relation flow graph is constructed and the
max-flow algorithm is applied in a similar manner.
If all syntactic constraints are satisfied by the resulting value token-attribute token pairings, a
valid mapping has been found and the resulting SQL query is generated.
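The role of the matching step can be sketched briefly. PRECISE itself formulates it as max-flow [4]; with unit capacities this is equivalent to maximum bipartite matching, which the Python below implements via augmenting paths. The candidate edges are hypothetical, loosely modeled on the HP/UNIX example, and show how resolving one token can force the interpretation of another.

# Illustrative sketch only: match each token to a distinct database element.
# candidates: dict mapping each token to the elements it could match.

def max_bipartite_matching(candidates):
    match = {}  # element -> token currently holding it

    def try_assign(token, seen):
        for elem in candidates[token]:
            if elem in seen:
                continue
            seen.add(elem)
            # take a free element, or re-route the token currently holding it
            if elem not in match or try_assign(match[elem], seen):
                match[elem] = token
                return True
        return False

    for token in candidates:
        try_assign(token, set())
    return {tok: el for el, tok in match.items()}

tokens = {
    "what": ["Job.Description"],
    "HP":   ["Job.Company", "Job.Platform"],   # ambiguous value token
    "UNIX": ["Job.Platform"],
}
print(max_bipartite_matching(tokens))
# "UNIX" can only take Job.Platform, so any complete matching must
# send the ambiguous "HP" to Job.Company.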
PRECISE System Architecture

[Figure 2.14: PRECISE system architecture. An English question flows through the tokenizer, matcher, equivalence checker and query generator, supported by the lexicon, a parser plug-in and the database, to produce a SQL query set and answer set.]
Figure 2.14 shows the system architecture of PRECISE. The lexicon supports two operations.
First, when given a word stem ws, it returns the set of tokens which contain ws. Second,
when given a token t, it returns the set of database elements matching t. In this way the
names of all database elements are extracted and split into individual words. Each word is
then stemmed and a corresponding set of synonyms is identified using the Lexicon.
The tokenizer's input is a natural language question, and its output is the set of all possible
complete tokenizations of the question. In the next step, the problem of finding a semantic
interpretation of natural language tokens as database elements is reduced to a maximum
matching problem; this is done by the matcher.
PRECISE then extracts attachment relationships between tokens. For example, consider
"What are the capitals of the US states?" The parser enables PRECISE to understand that the
token capital is attached to the token state.
The query generator takes the database elements selected by the matcher and generates a
SQL query. The equivalence checker tests whether there are multiple distinct solutions to a
max-flow problem and whether these solutions translate into distinct SQL queries.
2.4 Summary
For a broad class of semantically tractable natural language questions, PRECISE is
guaranteed to map each question to the corresponding SQL query. Experiments conducted
using three databases are discussed in [4]. Several questions were asked of these databases;
it was found that 80% of the questions were semantically tractable, and PRECISE answered
them correctly. PRECISE automatically recognized the 20% of questions that it could not
handle, and requested a rephrase.
Chapter 3
Building Lexicon and Semantic Dictionary
The study of natural language interfaces such as MASQUE and PRECISE makes clear
that the building of the lexicon and the semantic dictionary is the soul of the
transformation. This chapter discusses how to build a lexicon and semantic dictionary,
and describes how these are built in ELF (English Language Frontend), a commercial
interface for natural languages [3]. The chapter details the steps for translating an
English query into SQL using ELF.
3.1 Introduction to ELF
ELF is a commercial system, developed by ELF Software Co., that generates a natural
language processing system for a database. ELF is an interface that works with
Microsoft Access and Visual Basic.
3.2 How are the Lexicon and Semantic dictionary built in ELF?
The lexicon is built automatically in ELF. In other words, ELF takes an existing database and
scans through it so that it can understand both the data and the relationships. This process is
the Analyze function, and its interface is shown in Figure 3.1.
For simpler cases, Express Analysis is sufficient. This causes ELF to automatically read all
the information it needs out of the database. Words related to the attributes and relationships
of the database are stored in the lexicon.
Figure 3.1
There might be situations when certain tables and relationships need to be excluded from the
lexicon; Custom Analysis is selected for such situations. Using this function, decisions can be
made to help Access ELF decide where to concentrate, what to evaluate, and what to ignore.
This allows the use of human common sense, which no computer program can claim, and
comes in very handy. Figure 3.2 shows the Custom Analysis window, where the tables to be
considered can be manually selected.
This window contains all the table names. When a table (or query) in the Custom Analysis
window is de-selected, ELF is excused from answering any questions related to that table: it
will not attempt to look at how the table's fields relate to each other, and will not store any of
the words related to the table and its relationships in its own dictionary. Of course, this
speeds up the Analysis process. Depending upon the situation, some information is used in
searches frequently, occasionally, or not at all. If a table holds a significant amount of data, it
may be wise to reduce processing time by selectively ignoring the parts of it that are rarely
searched. To do this, right-click on any table in the Custom Analysis window's Data Set list.
A listing of the fields in that table will appear, giving the option of "Acknowledging" (Ack)
and/or "Memorizing" (Mem) each one (Figure 3.3).
Figure 3.2
If Acknowledge is de-selected for a field, the effect is like ignoring an entire table: ELF acts
as if the field does not exist and will not be able to answer questions about it. If
Acknowledge is selected but Memorize is de-selected, ELF will know the field's type and
which table it comes from, as well as other details such as whether it participates in
relationships or whether it seems to be a person's name or a place.
Figure 3.3
The only thing it will not do is save all the data entries from that particular field in the
lexicon. During the Analysis process, ELF examines the terms used in defining fields and
tables, and uses its lexicon to try to predict what kinds of synonyms might be used in
queries. It also stores each word's type (e.g. noun, verb, preposition) and which table it
comes from; this builds the semantic dictionary.
For the database used in this project, the lexicon entry for the element "brown" is shown in
Figure 3.4.
Figure 3.4
The last two items in the entry, EYES_COLOR and HAS_EYE-COLOR, mean that
"brown" is a data item of the COLOR field of the EYES table and also a data item of the
EYE-COLOR field of the HAS table. NOBLE indicates that "brown" is not part of a
compound data item, CAPDATA indicates that it is spelled with an initial capital letter, and
DATA means that it comes from a database field. Thus all the information needed for
mapping is also stored in the lexicon. This completes the building of the lexicon and the
semantic dictionary.
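As a rough illustration (not ELF's internal format, which Figure 3.4 shows only as a screen dump), the information carried by such an entry can be pictured as a small record; the field names below are invented, and only the flag meanings come from the description above.

# Hypothetical shape of a lexicon entry for "brown".
from dataclasses import dataclass

@dataclass
class LexiconEntry:
    word: str
    flags: tuple        # e.g. NOBLE (not a compound), CAPDATA, DATA
    locations: tuple    # (table, field) pairs where the word occurs as data

brown = LexiconEntry(
    word="brown",
    flags=("NOBLE", "CAPDATA", "DATA"),
    locations=(("EYES", "COLOR"), ("HAS", "EYE-COLOR")),
)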
3.3 Transformation of an English query to SQL in ELF
Now the database system is ready to answer an English query. ELF does this in three steps.
The question is typed in the query box as shown in Figure 3.5.
24
Figure 3.5
The first step is to parse the English question and find the words that are stored in the
lexicon. Figure 3.6 shows the result after parsing.
Figure 3.6
The words extracted are name, eye color and brown. One important point is that
since ELF found brown, and brown is a data item, it is put in the WHERE clause. In the next
step (Figure 3.7) ELF finds the tables for these words; this information is in the lexicon.
Figure 3.7
After getting the tables, the last step involves joining them on their common attributes.
In this example there are two tables involved, 'person' and 'has'. Both tables have
the attribute ID, and so the join is made on this attribute. Figure 3.8 shows the final SQL
statement, and Figure 3.9 shows the final result.
Figure 3.8
Figure 3.9 (elfWorksheetSub):

name  | EYE Color
anshu | brown
3.4 Storing the generated SQL in a file
The natural language query has now been converted into SQL. The main aim of the project
was to use this for fuzzy database queries, so there had to be a way for the fuzzy group to
access this SQL and modify it to include the fuzzy data. The fuzzy group decided to use
SQL Server for creating the fuzzy database, while, as stated earlier, this experiment was done
using Access; saving the generated SQL to a file provided a common interface. Since ELF
supports VBScript, a script was written and embedded in the ELF software. The function was
named getsql (Figure 3.10).
Figure 3.10
This function is triggered after the SQL has been generated and, as indicated in the script,
stores the SQL in a file named elfsqlGenerated.txt.
Chapter 4
Fuzzy word extraction
As stated in Chapter 2, the goal was to get the fuzzy database working with the natural
language interface. The SQL generated for the fuzzy-word version of the query in Figure 3.8
should be:
SELECT DISTINCT Person.name, has.[EYE Color] FROM Person LEFT JOIN has ON Person.ID = has.ID WHERE has.[EYE Color] = "very Brown";
The first thought for achieving this was to include fuzzy words in the database as attribute
values. For example, the values for the attribute COLOR in the table EYE could be "very
brown" or "light brown". In that case, however, the words are no longer fuzzy but become
distinct database values. For words to be fuzzy they need to have weights associated with
them [6]. For example, "very brown" might have an associated weight of 8.1, and "light
brown" a weight of 3.2. The SQL generated for the query "List all people with very brown
eyes" should then be:
SELECT * FROM EYE WHERE COLOR = "brown" AND WEIGHT >= 8.1
This query is now fuzzy.
The second option was to add "very brown" as a synonym of "brown" in the lexicon. This
also does not help, because the SQL generated by ELF always takes the root word (i.e. the
attribute value), which is "brown" and not "very brown".
The last option was to extract the fuzzy word, send it to a separate file, and leave the
generated SQL as is. This was the better choice, since the ultimate aim was that in the final
SQL query an "and" clause for the weight could be added in place of "very". The fuzzy
group then picked up the generated SQL from one file and the fuzzy words from the other.
Based on the fuzzy word, the appropriate "and" clause was then added to the SQL, for
example:
SELECT DISTINCT Person.name, has.[EYE Color] FROM Person LEFT JOIN has ON Person.ID = has.ID WHERE has.[EYE Color] = "Brown" AND weight = 0.7
To achieve this, the VBScript shown in Figure 4.1 was written.
Function putq
    ' Write the original English question to a file
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTextFile = objFSO.OpenTextFile("c:\elfNatural.txt", 2, True)
    objTextFile.WriteLine(Question)
    objTextFile.Close

    Dim QArray
    Dim str
    Dim result
    Dim final

    ' Open the file that will receive the extracted fuzzy words
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objTextFile = objFSO.OpenTextFile("c:\elfFuzzyWord.txt", 2, True)

    ' Scan the question word by word for the fuzzy modifier "very";
    ' when found, write "very" plus the word that follows it
    ' (stop one word early so QArray(i + 1) is always in range)
    QArray = Split(Question, " ")
    str = "very"
    For i = LBound(QArray) To UBound(QArray) - 1
        If QArray(i) = str Then
            result = QArray(i + 1)
            final = str & " " & result
            objTextFile.Write(final)
        End If
    Next
    objTextFile.Close
End Function
Figure 4.1
This function (putq) is triggered after the question has been asked; the generated file,
elfFuzzyWord.txt, contains the extracted fuzzy words. The fuzzy group then read the fuzzy
words from the file and made the corresponding changes in the generated SQL to get the
correct result for the fuzzy database.
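The rewrite performed by the fuzzy group can be sketched as follows. This Python version is hypothetical (the group's actual code is not part of this report): it reads the two files produced above, strips the modifier from the string literal, and appends a weight clause. The weight values follow the examples given earlier.

# Hypothetical sketch of the fuzzy group's rewrite step.

FUZZY_WEIGHTS = {"very": 8.1, "light": 3.2}   # community-defined weights [6]

def add_weight_clause(sql_path="elfsqlGenerated.txt",
                      fuzzy_path="elfFuzzyWord.txt"):
    with open(sql_path) as f:
        sql = f.read().strip().rstrip(";")
    with open(fuzzy_path) as f:
        modifier, value = f.read().split()     # e.g. "very brown"
    # Replace "very brown" with the root value "brown" in the literal
    # (a real version would need to match case-insensitively, e.g. "very Brown")
    sql = sql.replace(f"{modifier} {value}", value)
    # ...and compare the membership weight against the modifier's threshold.
    return f"{sql} AND weight >= {FUZZY_WEIGHTS[modifier]};"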
Chapter 5
Future Work
Currently the interface does not handle negation with fuzzy words: it cannot handle
changing "very brown" to "not very brown". For example, consider the query "Find people
with not very brown eyes". This is interpreted as:
SELECT DISTINCT Person.name, has.[EYE Color] FROM Person LEFT JOIN has ON Person.ID = has.ID WHERE has.[EYE Color] is not "Brown"
This is not correct; the query was supposed to fetch people with light brown eyes.
The current implementation allows a natural language query in terms of only one attribute.
Supporting compound attributes in a query (e.g., "List all people with very brown eyes and a
broad face") would make the system more useful and powerful.
Comparing other natural language interfaces in terms of efficiency, accuracy and quality
would be an interesting research project; the most efficient and accurate interface could then
be used with the fuzzy database and spatial database.
Another research project could address the handling of synonyms. The key to a good
interface is a good lexicon; working with an interface that has a simple lexicon (no
synonyms) and observing how it behaves would be an interesting subject.
References
[1] www.cs.washington.edu/research/projects/WebWare1/www/precise/precise.html
[2] Knowles, S., A Natural Language Database Interface for SQL-Tutor, Nov 1993.
[3] ELF Software Co., Natural Language Database Interfaces from ELF Software Co., available at www.elfsoft.com.
[4] Popescu, A.M., Etzioni, O., Kautz, H., Towards a Theory of Natural Language Interfaces to Databases, Jan 2003.
[5] Androutsopoulos, I., Ritchie, G., Thanisch, P., MASQUE/SQL - An Efficient and Portable Natural Language Query Interface for Relational Databases, Edinburgh, 1993.
[6] Joy, K. and Dattatri, S., Implementing a Fuzzy Relational Database and Querying System With Community Defined Membership Values, VCU Directed Research Report, November 2004.