Extending databases
to precision-controlled retrieval
of qualitative information
Victor Polo de Gyves,1 Adolfo Guzman1,2, and
Serguei Levachkine2
(1) SoftwarePro International,
(2) Centro de Investigación en Computación,
[email protected]
Qualitative variable
• A variable that can take a symbolic value
• A symbolic value represents a set E if it can be
considered E's name or description. Example:
Pale represents {white, yellow, orange, beige}
Hierarchy
• For an element set E, a hierarchy H is another set of elements
where each element ei is a symbolic value representing either:
– An element of E, or
– A partition. K is a partition of a set S if it is both a
covering for S and an exclusive set: the members of K are
mutually exclusive and collectively exhaust S, so each element
of S is in exactly one Kj.
• The union of the sets represented by the ei is E. Example:
live_being
    animal
        mammal
            cat
            dog
        bird
        snake
    plant
        citric
            lemon
            orange
        pine
Confusion
• If I ask for an animal and a snake is given, is there a mistake?
• If I ask for a plant and an animal is given, how large is the error?
Can this error be measured?
The error in using a qualitative value ‘r’ instead of another
qualitative value ‘s’ is measured as follows1:
• conf (r, r) = conf (r, any ascendant of r) = 0
• conf (r, s) = 1 + conf (r, parent of s)
The confusion in using r instead of s (the desired value) is the
number of descending links on the path from r to s.
conf (r, s) is neither a distance nor an ultradistance, and it is not symmetric.
1 A. Guzman-Arenas and S. Levachkine. Hierarchies Measuring Qualitative Variables. Lecture Notes in Computer Science
LNCS 2945 (Computational Linguistics and Intelligent Text Processing), (Springer-Verlag 2004). 262-274. ISSN 0302-9743.
conf (cat, mammal) = 0 (If I am using a cat instead of a mammal)
conf (cat, animal) = 0
conf (mammal, cat) = 1
conf (cat, dog) = 1
conf (cat, bird) = 1
conf (cat, lemon) = 3
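The recursive definition of conf can be checked against these values with a short sketch. It is written in Python purely for illustration and is not the authors' implementation; the dictionary `PARENT` and the helper `ancestors` are hypothetical names, and the parent links are read off the live_being hierarchy.

```python
# Child -> parent links of the live_being hierarchy (assumed from the slide).
PARENT = {
    "animal": "live_being", "plant": "live_being",
    "mammal": "animal", "bird": "animal", "snake": "animal",
    "cat": "mammal", "dog": "mammal",
    "citric": "plant", "pine": "plant",
    "lemon": "citric", "orange": "citric",
}

def ancestors(node):
    """Return the set holding node itself and all of its ascendants."""
    seen = {node}
    while node in PARENT:
        node = PARENT[node]
        seen.add(node)
    return seen

def conf(r, s):
    """Confusion in using r instead of s (the desired value)."""
    up = ancestors(r)
    steps = 0
    # conf(r, s) = 1 + conf(r, parent of s) until s is r or an ascendant of r.
    while s not in up:
        s = PARENT[s]
        steps += 1
    return steps

for r, s in [("cat", "mammal"), ("cat", "animal"), ("mammal", "cat"),
             ("cat", "dog"), ("cat", "bird"), ("cat", "lemon")]:
    print(f"conf({r}, {s}) = {conf(r, s)}")
```

The loop climbs from s toward the root and counts the links needed to reach r or one of its ascendants, which is the number of descending links from r to s. The asymmetry is visible in conf(cat, mammal) = 0 versus conf(mammal, cat) = 1.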
Predicates with controlled confusion
P is true for x if1:
•The object x satisfies predicate P with confusion  if and only
if:
•P is true for x when P does not contains hierarchical variables
•When pr is a hierarchical variable and P is of the form (pr = c),
if and only if for the value v of the property pr for the object x,
v= c (if the value v can be used instead of c with confusion )
•When P = P1 P2, if and only if P1 is satisfied by x, or P2 is
satisfied by x.
•When P = P1  P2, if and only if P1 is satisfied by x and P2 is
satisfied by x.
•When P = ¬P1, if and only if P1 is not satisfied by x.
1 A. Guzman-Arenas and S. Levachkine. Graduated errors in approximate queries using hierarchies and ordered sets.
Lecture Notes in Artificial Intelligence LNAI 2972, (Springer-Verlag 2004). 139-148. ISSN 0302-9743
Objects:
(Ann (lives_in USA) (pet snake)),
(Bill (lives_in English_speaking_islands) (pet citric)),
(Fred (lives_in USA) (pet cat)),
(Tom (lives_in Mexico) (pet cat)),
(Sam (lives_in Cuba) (pet pine)),
(Pedro (lives_in Mexico) (pet dog)).
Hierarchy H for lives_in:
North_America
    Canada
    USA
    Mexico
Central_America
    Guatemala
    Costa_Rica
    Honduras
Caribbean_Islands
    Spanish_speaking_islands
        Cuba
        Puerto_Rico
    English_speaking_islands
        Jamaica
Hierarchy for pet: the live_being hierarchy (live_being, animal, mammal, cat, dog, bird, snake, plant, citric, lemon, orange, pine).
Predicate:
P = (lives_in = USA) ∧ (pet = cat)
P is true for:
ε=0 Fred
ε=1 Fred, Tom, Pedro
ε=2 Fred, Tom, Pedro, Ann
ε=3 Fred, Tom, Pedro, Ann, Sam, Bill
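The answer sets for each ε can be reproduced with a small sketch of the conjunction rule. This is an illustration only, not the authors' program. The root name `america` is an assumption, since the slide does not show the root of the places hierarchy, and `PLACES`, `PETS`, `PEOPLE`, `satisfies` and `answer` are hypothetical names.

```python
def conf(r, s, parent):
    """Confusion in using r instead of s, given child -> parent links."""
    up = {r}
    node = r
    while node in parent:          # collect r and all of its ascendants
        node = parent[node]
        up.add(node)
    steps = 0
    while s not in up:             # climb from s, counting links
        s = parent[s]
        steps += 1
    return steps

# "america" is a hypothetical root; the slide does not name it.
PLACES = {
    "North_America": "america", "Central_America": "america",
    "Caribbean_Islands": "america",
    "Canada": "North_America", "USA": "North_America", "Mexico": "North_America",
    "Guatemala": "Central_America", "Costa_Rica": "Central_America",
    "Honduras": "Central_America",
    "Spanish_speaking_islands": "Caribbean_Islands",
    "English_speaking_islands": "Caribbean_Islands",
    "Cuba": "Spanish_speaking_islands", "Puerto_Rico": "Spanish_speaking_islands",
    "Jamaica": "English_speaking_islands",
}
PETS = {
    "animal": "live_being", "plant": "live_being",
    "mammal": "animal", "bird": "animal", "snake": "animal",
    "cat": "mammal", "dog": "mammal",
    "citric": "plant", "pine": "plant",
    "lemon": "citric", "orange": "citric",
}
PEOPLE = [
    ("Ann", "USA", "snake"), ("Bill", "English_speaking_islands", "citric"),
    ("Fred", "USA", "cat"), ("Tom", "Mexico", "cat"),
    ("Sam", "Cuba", "pine"), ("Pedro", "Mexico", "dog"),
]

def satisfies(person, eps):
    """P = (lives_in = USA) AND (pet = cat), each conjunct with confusion eps."""
    _, lives_in, pet = person
    return conf(lives_in, "USA", PLACES) <= eps and conf(pet, "cat", PETS) <= eps

def answer(eps):
    return [person[0] for person in PEOPLE if satisfies(person, eps)]

for eps in range(4):
    print(f"eps={eps}:", ", ".join(answer(eps)))
```

Because both conjuncts must hold with the same ε, each person enters the answer at the larger of their two confusion values; Ann, for example, has confusion 0 on lives_in but 2 on pet, so she first appears at ε=2.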
Implementing conf(r, s) for
databases
• Looking for the perpetrator of an assassination:
Witness 1: red car, tall, scar on the nose
Witness 2: elegant car, normal height, scar on the face
Witness 3: small car
Confusion can sort a database of suspects: first the most probable ones,
and last the least probable perpetrators of the murder.
• A manager wishes to give a special discount to those
customers interested in office products.
• Given a document “d” in a digital library, sort the other documents
by subject similarity to “d”. The most similar documents
come first.
Confusion selects objects from a database according to the
closeness or similarity of their properties.
Example 1. (address = california)₁
Extended SQL (user):
[confusion@~/tesis/jdbc/jdbc/jars]$ java -classpath
$CLASSPATH:postgresql.jar: Conexion "select customers.name,
customers.address, conf(customers.address) from customers
where conf(customers.address,'california')<=1 order by
conf(customers.address)"
-------------
Pure SQL (machine), printed by the program as "SQL traducido" (translated SQL):
SQL traducido: select customers.name, customers.address,
confusion.customers_address_norm."california" from
customers_address_norm, customers where (
confusion.customers_address_norm.nombre = customers.address
AND confusion.customers_address_norm."california"<=1 ) order
by confusion.customers_address_norm."california"
-------------
Result:
NAME              ADDRESS          CALIFORNIA
Tom's Hamburgers  pasadena         0
Microsol          silicon valley   0
East coast meat   florida          1
Media Tools       new york         1
Texas fruits      texas            1
[confusion@~/tesis/jdbc/jdbc/jars]$
Example 2. Update: (industrial_branch = food)₀
Extended SQL (user):
[confusion@~/tesis/jdbc/jdbc/jars]$ java -classpath $CLASSPATH:postgresql.jar:
Conexion "update customers set discount=0.07 where customers.name in
conf(customers.industrial_branch,'food')<=0"
-------------
Pure SQL (machine), printed by the program as "SQL traducido" (translated SQL):
SQL traducido: update customers set discount=0.07 where customers.name in
(select customers.name from customers_industrial_branch_norm, customers where (
confusion.customers_industrial_branch_norm.nombre = customers.industrial_branch
AND confusion.customers_industrial_branch_norm."food"<=0 ) )
Result:
[confusion@~/tesis/jdbc/jdbc/jars]$psql conf
conf=# select * from customers;
name                 | industrial_branch | address        | discount
---------------------+-------------------+----------------+---------
Media Tools          | computers         | new york       | 0
Garcia Productores   | tequila           | mexico city    | 0
Microsol             | software          | silicon valley | 0
Tom's Hamburgers     | food              | pasadena       | 0.07
East coast meat      | meat              | florida        | 0.07
Luigi's italian food | italian food      | north america  | 0.07
Mole Doña Rosa       | mexican food      | mexico         | 0.07
Texas fruits         | fruits            | texas          | 0.07
Canada seeds         | food              | canada         | 0.07
Example 3: Options for a bachelor's degree application.
Predicate: [(profesion = informatica)₂ ∧ (sistema = abierto semestral)₂]₃
Extended SQL (user):
[confusion@dlguzman1 jars]$ java -classpath $CLASSPATH:postgresql.jar:
Conexion "select profesiones.*,
conf(profesiones.profesion),conf(profesiones.sistema) from profesiones
where conf(profesiones.profesion,'informatica')<=2 AND
conf(profesiones.sistema,'abierto semestral')<=2 and
conf(profesiones.sistema)+conf(profesiones.profesion)<=3 order by
conf(profesiones.profesion)+conf(profesiones.sistema)";
-------------
Pure SQL (machine), printed by the program as "SQL traducido" (translated SQL):
SQL traducido: select profesiones.*,
confusion.profesiones_profesion_norm."informatica",
confusion.profesiones_sistema_norm."abierto semestral" from profesiones_sistema_norm,
profesiones_profesion_norm, profesiones where (
confusion.profesiones_profesion_norm.nombre = profesiones.profesion AND
confusion.profesiones_profesion_norm."informatica"<=2 ) AND (
confusion.profesiones_sistema_norm.nombre = profesiones.sistema AND
confusion.profesiones_sistema_norm."abierto semestral"<=2 ) and
confusion.profesiones_sistema_norm."abierto semestral"+
confusion.profesiones_profesion_norm."informatica"<=3 order
by
confusion.profesiones_profesion_norm."informatica"+
confusion.profesiones_sistema_norm."abierto semestral"
Result:
ESCUELA    PROFESION    DISPONIBLE  SISTEMA                 INFORMATICA  ABIERTO SEMESTRAL
uam        informatica  5           abierto trimestral      0            1
unam       informatica  30          escolarizado anual      0            2
ipn        informatica  20          escolarizado semestral  0            2
monterrey  informatica  20          escolarizado semestral  0            2
ipn        computacion  30          escolarizado anual      1            2
unam       computacion  30          escolarizado semestral  1            2
monterrey  computacion  10          escolarizado semestral  1            2
uam        quimica      5           abierto trimestral      2            1
Finalizando...
Fin.
[confusion@dlguzman1 jars]$
The method to create confusion tables
a) Select the attribute(s) of the
entities on which we want to use
the extended SQL.
b) Join hierarchies with
attributes
Tables:
Alimento (Nombre/String, Padre/String)
Mascota (Nombre/String, Alimento/String*)
Mascota_alimento
Hierarchy A:
Alimento
    Nutritivo
        Fruta
        Carne
    Basura
c) Calculate and create the confusion table associated with each attribute of the entity
A = {alimento, nutritivo,
basura, fruta, carne}
A×A = A² = {
(a,a),(a,n),(a,b),(a,f),(a,c),
(n,a),(n,n),(n,b),(n,f),(n,c),
(b,a),(b,n),(b,b),(b,f),(b,c),
(f,a),(f,n),(f,b),(f,f),(f,c),
(c,a),(c,n),(c,b),(c,f),(c,c)
}
conf(A²) = {
conf(a,a),
conf(a,n),
conf(a,b),
...,
conf(c,c)
}
conf(A²) = {
0, 1, 1, 2, 2,
0, 0, 1, 1, 1,
0, 1, 0, 2, 2,
0, 0, 1, 0, 1,
0, 0, 1, 1, 0
}
conf=# select * from mascota_alimento_norm ;
nombre    | alimento | nutritivo | basura | fruta | verdura
----------+----------+-----------+--------+-------+--------
alimento  | 0        | 1         | 1      | 2     | 2
nutritivo | 0        | 0         | 1      | 1     | 1
basura    | 0        | 1         | 0      | 2     | 2
fruta     | 0        | 0         | 1      | 0     | 1
verdura   | 0        | 0         | 1      | 1     | 0
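Step c) amounts to evaluating conf over every ordered pair of A. The sketch below, in Python and for illustration only, builds the table from a parent table like Alimento (Nombre, Padre). It uses the element names of the set A (with carne as the fifth element), and `PADRE`, `ELEMENTS` and `confusion_table` are hypothetical names, not part of the authors' program.

```python
# Child -> parent links, as they would be stored in Alimento (Nombre, Padre).
PADRE = {
    "nutritivo": "alimento", "basura": "alimento",
    "fruta": "nutritivo", "carne": "nutritivo",
}
ELEMENTS = ["alimento", "nutritivo", "basura", "fruta", "carne"]

def conf(r, s):
    """Confusion in using r instead of s."""
    up = {r}
    node = r
    while node in PADRE:           # r and all of its ascendants
        node = PADRE[node]
        up.add(node)
    steps = 0
    while s not in up:             # links climbed from s
        s = PADRE[s]
        steps += 1
    return steps

def confusion_table(elements):
    """One row per value r (column nombre), one column per desired value s."""
    return {r: [conf(r, s) for s in elements] for r in elements}

table = confusion_table(ELEMENTS)
print("nombre".ljust(10), " ".join(name.ljust(9) for name in ELEMENTS))
for nombre, row in table.items():
    print(nombre.ljust(10), " ".join(str(value).ljust(9) for value in row))
```

Each row is the value stored in the database and each column is the value asked for, so a query only has to read the cell instead of walking the hierarchy.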
Steps to write predicates
• Step 1: Create predicate
• Step 2: Write the XSQL sentence
• Step 3: Execute the program
Step 1. Writing the predicate
• I wish to find every baseball player living
in Xochimilco, with confusion 1.
• The predicate is:
[(sport = baseball) ∧ (address = xochimilco)]₁
Step 2. Writing XSQL
• The user looks at the predicate and writes it in XSQL:
[(sport = baseball) ∧ (address = xochimilco)]₁
is rewritten by the user as:
conf(sport, baseball)<=1 AND conf(address, xochimilco)<=1
Step 3. Converting to Pure SQL
• A program converts the XSQL expression to pure SQL,
which can be executed in any database.
[(sport = baseball) ∧ (address = xochimilco)]₁
conf(sport, baseball)<=1 AND conf(address, xochimilco)<=1
The program converts it to:
(confusion.friends_sport_norm.nombre = friends.sport AND
confusion.friends_sport_norm."baseball"<=1)
AND
(confusion.friends_address_norm.nombre = friends.address
AND confusion.friends_address_norm."xochimilco"<=1)
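The rewriting of a WHERE clause can be approximated with a regular expression. This is a minimal sketch and not the authors' converter: it assumes the fully qualified form conf(table.column,'value')<=n used in the command-line examples, it does not handle conf(...) in the select list or add the _norm tables to the FROM clause, and `CONF_PREDICATE` and `to_pure_sql` are hypothetical names.

```python
import re

# Matches conf(table.column,'value')<=n (assumed input form).
CONF_PREDICATE = re.compile(
    r"conf\(\s*(\w+)\.(\w+)\s*,\s*'([^']*)'\s*\)\s*<=\s*(\d+)"
)

def to_pure_sql(where_clause):
    """Replace each conf(...)<=n test with a lookup in its confusion table."""
    def expand(match):
        table, column, value, limit = match.groups()
        norm = f"confusion.{table}_{column}_norm"
        return (f"({norm}.nombre = {table}.{column} "
                f'AND {norm}."{value}"<={limit})')
    return CONF_PREDICATE.sub(expand, where_clause)

xsql = "conf(friends.sport,'baseball')<=1 AND conf(friends.address,'xochimilco')<=1"
print(to_pure_sql(xsql))
```

The table name follows the naming pattern visible in the translated queries: the confusion table for friends.sport is confusion.friends_sport_norm, and the desired value becomes a quoted column name.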
Confusion tables
• The previous expression refers to the tables
confusion.friends_address_norm
and confusion.friends_sport_norm
• Our program creates the tables only the first
time
– Using the hierarchical tree and the method to
create confusion tables
• With these tables the query is more
efficient because there is no need to
recalculate the confusion values
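The effect of a precomputed confusion table can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: SQLite stands in for the PostgreSQL database of the examples, the `confusion.` schema prefix is dropped, the confusion values are copied from the mascota_alimento_norm listing, and the rows of `mascota` (loro, tortuga, mapache) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mascota (nombre TEXT, alimento TEXT)")
conn.execute(
    "CREATE TABLE mascota_alimento_norm (nombre TEXT, alimento INTEGER, "
    "nutritivo INTEGER, basura INTEGER, fruta INTEGER, verdura INTEGER)"
)
# Hypothetical pets and the food each one eats.
conn.executemany("INSERT INTO mascota VALUES (?, ?)", [
    ("loro", "fruta"), ("tortuga", "verdura"), ("mapache", "basura"),
])
# Precomputed confusion values, one row per stored value.
conn.executemany("INSERT INTO mascota_alimento_norm VALUES (?, ?, ?, ?, ?, ?)", [
    ("alimento", 0, 1, 1, 2, 2),
    ("nutritivo", 0, 0, 1, 1, 1),
    ("basura", 0, 1, 0, 2, 2),
    ("fruta", 0, 0, 1, 0, 1),
    ("verdura", 0, 0, 1, 1, 0),
])

# Pure-SQL form of conf(mascota.alimento,'fruta')<=1: a join plus a column test.
rows = conn.execute(
    'SELECT mascota.nombre, n."fruta" '
    "FROM mascota, mascota_alimento_norm AS n "
    'WHERE n.nombre = mascota.alimento AND n."fruta" <= 1 '
    'ORDER BY n."fruta", mascota.nombre'
).fetchall()
for nombre, confusion in rows:
    print(nombre, confusion)
```

The query never walks the hierarchy: the confusion of each stored value with respect to fruta is read from one column, so the cost is that of an ordinary join.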
Conclusions
• Using hierarchies, our program extends
any database so that it can retrieve
objects with controlled precision
• It also allows deleting and updating objects with
controlled precision
• Controlled precision orders the
objects in a “smart” way