Extending databases
to precision-controlled retrieval
of qualitative information
Victor Polo de Gyves (1), Adolfo Guzman (1, 2), and
Serguei Levachkine (2)
(1) SoftwarePro International,
(2) Centro de Investigación en Computación,
[email protected]
Qualitative variable
• A variable that can take a symbolic value
• A symbolic value represents a set E if it can be
considered E's name or description
Pale = {white, yellow, orange, beige}
Hierarchy
• For an element set E, a hierarchy H is another set of elements
where each element ei is a symbolic value representing:
– An element of E, or
– A partition. K is a partition of a set S if it is both a
covering for S and an exclusive set: the members of K are
mutually exclusive and collectively exhaust S, so each element
of S is in exactly one Kj.
• The union of the elements represented by the ei is E. Example:
live_being
    animal
        mammal
            cat
            dog
        bird
        snake
    plant
        citric
            lemon
            orange
        pine
Confusion
• If I ask for an animal and I am given a snake, is there a mistake?
• If I ask for a plant and I am given an animal, how large is the error?
Can this error be measured?
The error in using a qualitative value ‘r’ instead of another
qualitative value ‘s’ is measured as follows¹:
• conf(r, r) = conf(r, s) = 0 when s is any ascendant of r
• conf(r, s) = 1 + conf(r, parent of s) otherwise
The confusion in using r instead of s (the desired value) is the
number of descending links on the path from r to s.
conf(r, s) is neither a distance nor an ultradistance, and it is not symmetric.
¹ A. Guzman-Arenas and S. Levachkine. Hierarchies Measuring Qualitative Variables. Lecture Notes in Computer Science
LNCS 2945 (Computational Linguistics and Intelligent Text Processing), (Springer-Verlag 2004). 262-274. ISSN 0302-9743.
conf (cat, mammal) = 0 (If I am using a cat instead of a mammal)
conf (cat, animal) = 0
conf (mammal, cat) = 1
conf (cat, dog) = 1
conf (cat, bird) = 1
conf (cat, lemon) = 3
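The definition of conf can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation: it assumes the hierarchy is stored as a child-to-parent map (the name PARENT and the helper ancestors_or_self are illustrative) and counts how many links must be descended to reach s from the chain of ascendants of r.

```python
# Hierarchy from the slides, stored as child -> parent (the root has no entry).
# This representation is an assumption made for illustration.
PARENT = {
    "animal": "live_being", "plant": "live_being",
    "mammal": "animal", "bird": "animal", "snake": "animal",
    "cat": "mammal", "dog": "mammal",
    "citric": "plant", "pine": "plant",
    "lemon": "citric", "orange": "citric",
}


def ancestors_or_self(node, parent=PARENT):
    """Return the set containing node and all of its ascendants."""
    seen = {node}
    while node in parent:
        node = parent[node]
        seen.add(node)
    return seen


def conf(r, s, parent=PARENT):
    """Confusion of using r where s was wanted.

    Climbing from r is free; every link that has to be descended
    to reach s costs 1.
    """
    free = ancestors_or_self(r, parent)
    cost = 0
    while s not in free:
        s = parent[s]  # one descending link on the way from r to s
        cost += 1
    return cost


if __name__ == "__main__":
    for r, s in [("cat", "mammal"), ("cat", "animal"), ("mammal", "cat"),
                 ("cat", "dog"), ("cat", "bird"), ("cat", "lemon")]:
        print(f"conf({r}, {s}) = {conf(r, s)}")
```

The loop applies conf(r, s) = 1 + conf(r, parent of s) until s becomes r or an ascendant of r, which is where the confusion is 0. The asymmetry is visible directly: conf(cat, mammal) is 0 while conf(mammal, cat) is 1.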
Predicates with controlled confusion
The object x satisfies predicate P with confusion ε (P is true for x) if and only if¹:
• When P does not contain hierarchical variables: P is true for x.
• When pr is a hierarchical variable and P is of the form (pr = c):
the value v of the property pr for the object x satisfies conf(v, c) ≤ ε
(the value v can be used instead of c with confusion ε).
• When P = P1 ∨ P2: P1 is satisfied by x, or P2 is
satisfied by x.
• When P = P1 ∧ P2: P1 is satisfied by x and P2 is
satisfied by x.
• When P = ¬P1: P1 is not satisfied by x.
1A. Guzman-Arenas and S. Levachkine. Graduated errors in approximate queries using hierarchies and ordered sets.
Lecture Notes in Artificial Intelligence LNAI 2972, (Springer-Verlag 2004). 139-148. ISSN 0302-9743
Example database:
(Ann (lives_in USA) (pet snake)),
(Bill (lives_in English_speaking_islands) (pet citric)),
(Fred (lives_in USA) (pet cat)),
(Tom (lives_in Mexico) (pet cat)),
(Sam (lives_in Cuba) (pet pine)),
(Pedro (lives_in Mexico) (pet dog)).
Hierarchy H for lives_in (top-level regions and their members):
North_America
    Canada
    USA
    Mexico
Central_America
    Guatemala
    Costa_Rica
    Honduras
Caribbean_Islands
    Spanish_speaking_islands
        Cuba
        Puerto_Rico
    English_speaking_islands
        Jamaica
Hierarchy for pet: the live_being hierarchy of the Hierarchy slide.
Predicate:
P = (lives_in = USA) ∧ (pet = cat)
P is true for:
ε = 0: Fred
ε = 1: Fred, Tom, Pedro
ε = 2: Fred, Tom, Pedro, Ann
ε = 3: Fred, Tom, Pedro, Ann, Sam, Bill
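The answers listed for each confusion level can be reproduced with a short Python sketch. It is illustrative only and rests on assumptions: P is read as the conjunction (lives_in = USA) ∧ (pet = cat), which is the reading the listed answers imply; the place hierarchy is given a hypothetical root named "place" because the slide does not name one; and Bill's address is spelled English_speaking_islands to match the hierarchy node.

```python
# Child -> parent maps. The root name "place" is hypothetical.
PET_PARENT = {
    "animal": "live_being", "plant": "live_being",
    "mammal": "animal", "bird": "animal", "snake": "animal",
    "cat": "mammal", "dog": "mammal",
    "citric": "plant", "pine": "plant",
    "lemon": "citric", "orange": "citric",
}
PLACE_PARENT = {
    "North_America": "place", "Central_America": "place",
    "Caribbean_Islands": "place",
    "Canada": "North_America", "USA": "North_America",
    "Mexico": "North_America",
    "Guatemala": "Central_America", "Costa_Rica": "Central_America",
    "Honduras": "Central_America",
    "Spanish_speaking_islands": "Caribbean_Islands",
    "English_speaking_islands": "Caribbean_Islands",
    "Cuba": "Spanish_speaking_islands",
    "Puerto_Rico": "Spanish_speaking_islands",
    "Jamaica": "English_speaking_islands",
}
PEOPLE = {
    "Ann": {"lives_in": "USA", "pet": "snake"},
    "Bill": {"lives_in": "English_speaking_islands", "pet": "citric"},
    "Fred": {"lives_in": "USA", "pet": "cat"},
    "Tom": {"lives_in": "Mexico", "pet": "cat"},
    "Sam": {"lives_in": "Cuba", "pet": "pine"},
    "Pedro": {"lives_in": "Mexico", "pet": "dog"},
}


def conf(r, s, parent):
    """Number of descending links needed to go from r to s."""
    free = {r}
    node = r
    while node in parent:       # collect r and all of its ascendants
        node = parent[node]
        free.add(node)
    cost = 0
    while s not in free:        # climb from s until the chain of r is met
        s = parent[s]
        cost += 1
    return cost


def satisfies(person, eps):
    """P = (lives_in = USA) AND (pet = cat), each term with confusion eps."""
    return (conf(person["lives_in"], "USA", PLACE_PARENT) <= eps
            and conf(person["pet"], "cat", PET_PARENT) <= eps)


def answer(eps):
    """Names of the people that satisfy P with confusion eps, sorted."""
    return sorted(name for name, p in PEOPLE.items() if satisfies(p, eps))


if __name__ == "__main__":
    for eps in range(4):
        print(eps, answer(eps))
```

Raising ε only ever adds people to the answer, which is why each line of the slide contains the previous one.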
Implementing conf(r, s) for
databases
• Looking for the perpetrator of an assassination:
Witness 1: red car, tall, scar on the nose
Witness 2: elegant car, normal height, scar on the face
Witness 3: small car
Confusion can sort a database of suspects: the most probable ones first
and the least probable perpetrators of the murder last.
• A company manager wishes to offer a special discount to
customers interested in office products.
• Given a document “d” in a digital library, sort the other documents
by their subject similarity to “d”, so that the most similar documents
come first.
Confusion selects objects from a database according to the
closeness or similarity of their properties.
Example 1. (address = california)₁
Extended SQL (user):
    [confusion@~/tesis/jdbc/jdbc/jars]$ java -classpath
    $CLASSPATH:postgresql.jar: Conexion "select customers.name,
    customers.address, conf(customers.address) from customers
    where conf(customers.address,'california')<=1 order by
    conf(customers.address)"
Pure SQL (machine), printed by the program as "SQL traducido" (translated SQL):
    select customers.name, customers.address,
    confusion.customers_address_norm."california"
    from customers_address_norm, customers
    where ( confusion.customers_address_norm.nombre = customers.address
    AND confusion.customers_address_norm."california"<=1 )
    order by confusion.customers_address_norm."california"
Result:
    NAME              ADDRESS         CALIFORNIA
    Tom's Hamburgers  pasadena        0
    Microsol          silicon valley  0
    East coast meat   florida         1
    Media Tools       new york        1
    Texas fruits      texas           1
    [confusion@~/tesis/jdbc/jdbc/jars]$
Example 2. Update: (industrial_branch = food)₀
Extended SQL (user):
    [confusion@~/tesis/jdbc/jdbc/jars]$ java -classpath $CLASSPATH:postgresql.jar:
    Conexion "update customers set discount=0.07 where customers.name in
    conf(customers.industrial_branch,'food')<=0"
Pure SQL (machine), printed by the program as "SQL traducido" (translated SQL):
    update customers set discount=0.07 where customers.name in
    (select customers.name from customers_industrial_branch_norm, customers where (
    confusion.customers_industrial_branch_norm.nombre = customers.industrial_branch
    AND confusion.customers_industrial_branch_norm."food"<=0 ) )
Result:
    [confusion@~/tesis/jdbc/jdbc/jars]$ psql conf
    conf=# select * from customers;
     name                 | industrial_branch | address        | discount
    ----------------------+-------------------+----------------+----------
     Media Tools          | computers         | new york       | 0
     Garcia Productores   | tequila           | mexico city    | 0
     Microsol             | software          | silicon valley | 0
     Tom's Hamburgers     | food              | pasadena       | 0.07
     East coast meat      | meat              | florida        | 0.07
     Luigi's italian food | italian food      | north america  | 0.07
     Mole Doña Rosa       | mexican food      | mexico         | 0.07
     Texas fruits         | fruits            | texas          | 0.07
     Canada seeds         | food              | canada         | 0.07
Example 3: Options for a bachelor's degree
application.
Predicate: [(profesion = informatica)₂ ∧ (sistema = abierto semestral)₂]₃
(profesion = profession, sistema = study system, abierto semestral = open, by semester)
Extended SQL (user):
    [confusion@dlguzman1 jars]$ java -classpath $CLASSPATH:postgresql.jar:
    Conexion "select profesiones.*,
    conf(profesiones.profesion),conf(profesiones.sistema) from profesiones
    where conf(profesiones.profesion,'informatica')<=2 AND
    conf(profesiones.sistema,'abierto semestral')<=2 and
    conf(profesiones.sistema)+conf(profesiones.profesion)<=3 order by
    conf(profesiones.profesion)+conf(profesiones.sistema)";
Pure SQL (machine), printed by the program as "SQL traducido" (translated SQL):
    select profesiones.*,
    confusion.profesiones_profesion_norm."informatica",
    confusion.profesiones_sistema_norm."abierto semestral"
    from profesiones_sistema_norm, profesiones_profesion_norm, profesiones
    where (
    confusion.profesiones_profesion_norm.nombre = profesiones.profesion AND
    confusion.profesiones_profesion_norm."informatica"<=2 ) AND (
    confusion.profesiones_sistema_norm.nombre = profesiones.sistema AND
    confusion.profesiones_sistema_norm."abierto semestral"<=2 ) and
    confusion.profesiones_sistema_norm."abierto semestral"
    +confusion.profesiones_profesion_norm."informatica"<=3
    order by
    confusion.profesiones_profesion_norm."informatica"
    +confusion.profesiones_sistema_norm."abierto semestral"
Result:
    ESCUELA    PROFESION    DISPONIBLE  SISTEMA                 INFORMATICA  ABIERTO SEMESTRAL
    uam        informatica  5           abierto trimestral      0            1
    unam       informatica  30          escolarizado anual      0            2
    ipn        informatica  20          escolarizado semestral  0            2
    monterrey  informatica  20          escolarizado semestral  0            2
    ipn        computacion  30          escolarizado anual      1            2
    unam       computacion  30          escolarizado semestral  1            2
    monterrey  computacion  10          escolarizado semestral  1            2
    uam        quimica      5           abierto trimestral      2            1
    Finalizando...
    Fin.
    [confusion@dlguzman1 jars]$
The method to create confusion tables
a) Select the attribute(s) of the
entities on which we want to use
the extended SQL.
b) Join hierarchies with
attributes:
    Mascota (pet): Nombre/String, Alimento/String*
    Mascota_alimento (hierarchy of Alimento): Nombre/String, Padre/String
    Hierarchy A (alimento = food, nutritivo = nutritious, basura = junk, fruta = fruit, carne = meat):
    Alimento
        Nutritivo
            Fruta
            Carne
        Basura
c) Calculate and create the confusion table associated with each attribute of the entity:
A = {alimento, nutritivo,
basura, fruta, carne}
A x A = A² = {
(a,a),(a,n),(a,b),(a,f),(a,c),
(n,a),(n,n),(n,b),(n,f),(n,c),
(b,a),(b,n),(b,b),(b,f),(b,c),
(f,a),(f,n),(f,b),(f,f),(f,c),
(c,a),(c,n),(c,b),(c,f),(c,c)
}
conf(A²) = {
conf(a,a),
conf(a,n),
conf(a,b),
...,
conf(c,c)
}
conf(A²) = {
0, 1, 1, 2, 2,
0, 0, 1, 1, 1,
0, 1, 0, 2, 2,
0, 0, 1, 0, 1,
0, 0, 1, 1, 0
}
conf=# select * from mascota_alimento_norm ;
 nombre    | alimento | nutritivo | basura | fruta | verdura
-----------+----------+-----------+--------+-------+---------
 alimento  |        0 |         1 |      1 |     2 |       2
 nutritivo |        0 |         0 |      1 |     1 |       1
 basura    |        0 |         1 |      0 |     2 |       2
 fruta     |        0 |         0 |      1 |     0 |       1
 verdura   |        0 |         0 |      1 |     1 |       0
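Step c) amounts to evaluating conf on every ordered pair of A. The sketch below is a hedged Python illustration, not the program described in the slides: it assumes the hierarchy arrives as (Nombre, Padre) pairs like the rows of Mascota_alimento, and the function names conf and confusion_table are illustrative.

```python
# (Nombre, Padre) pairs for the hierarchy A; the root "alimento" has no row.
# The in-memory representation is an assumption made for illustration.
PARENT = {
    "nutritivo": "alimento",
    "basura": "alimento",
    "fruta": "nutritivo",
    "carne": "nutritivo",
}
NODES = ["alimento", "nutritivo", "basura", "fruta", "carne"]


def conf(r, s, parent):
    """Number of descending links needed to go from r to s."""
    free = {r}
    node = r
    while node in parent:       # r and all of its ascendants cost nothing
        node = parent[node]
        free.add(node)
    cost = 0
    while s not in free:        # each step up from s is one descending link
        s = parent[s]
        cost += 1
    return cost


def confusion_table(nodes, parent):
    """Return {r: {s: conf(r, s)}} for every ordered pair (r, s)."""
    return {r: {s: conf(r, s, parent) for s in nodes} for r in nodes}


if __name__ == "__main__":
    table = confusion_table(NODES, PARENT)
    print("nombre".ljust(10) + " ".join(s.rjust(9) for s in NODES))
    for r in NODES:
        print(r.ljust(10) + " ".join(str(table[r][s]).rjust(9) for s in NODES))
```

Each row is the value actually stored (r) and each column is the value asked for (s), which matches the layout of the _norm table: the query only has to read the column named after the requested value.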
Steps to write predicates
• Step 1: Create predicate
• Step 2: Write the XSQL sentence
• Step 3: Execute the program
Step 1. Writing the predicate
• I wish to find every baseball player living
in Xochimilco with confusion 1.
• The predicate is:
[(sport = baseball) ∧ (address = xochimilco)]₁
Step 2. Writing XSQL
• The user reads the predicate and writes it in XSQL:
[(sport = baseball) ∧ (address = xochimilco)]₁
It is rewritten by the user as:
conf(sport, baseball)<=1 AND conf(address, xochimilco)<=1
Step 3. Converting to pure SQL
• A program converts the XSQL expression to pure SQL,
which can be executed in any database.
[(sport = baseball) ∧ (address = xochimilco)]₁
conf(sport, baseball)<=1 AND conf(address, xochimilco)<=1
The program converts it to:
(confusion.friends_sport_norm.nombre = friends.sport AND
confusion.friends_sport_norm."baseball"<=1)
AND
(confusion.friends_address_norm.nombre = friends.address
AND confusion.friends_address_norm."xochimilco"<=1)
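The rewriting in Step 3 can be approximated with a regular expression. This Python sketch is an assumption-laden illustration, not the authors' translator: it only handles conditions of the form conf(table.column,'value')<=n, as written in the earlier examples, and the names CONF_CALL and translate_condition are hypothetical. It does not add the _norm tables to the FROM clause, which a full translator would also have to do.

```python
import re

# Matches conf(table.column, 'value') <= n.
# This restricted grammar is an assumption made for illustration.
CONF_CALL = re.compile(
    r"conf\(\s*(\w+)\.(\w+)\s*,\s*'([^']*)'\s*\)\s*<=\s*(\d+)"
)


def translate_condition(xsql):
    """Rewrite every conf(...) <= n test as a lookup in a confusion table."""
    def expand(match):
        table, column, value, limit = match.groups()
        norm = f"confusion.{table}_{column}_norm"
        # Join the confusion table on the stored value, then bound the
        # column named after the requested value.
        return (f"({norm}.nombre = {table}.{column} "
                f'AND {norm}."{value}"<={limit})')
    return CONF_CALL.sub(expand, xsql)


if __name__ == "__main__":
    print(translate_condition(
        "conf(friends.sport,'baseball')<=1 "
        "AND conf(friends.address,'xochimilco')<=1"
    ))
```

Text outside the conf(...) calls, such as the AND between the two conditions, passes through unchanged, so ordinary SQL conditions can be mixed with confusion tests.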
Confusion tables
• The previous expression refers to the tables
confusion.friends_address_norm
and confusion.friends_sport_norm
• Our program creates these tables only once, the first
time they are needed
– Using the hierarchical tree and the method to
create confusion tables
• With these tables the query is more
efficient because there is no need to
recalculate the confusion values
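To illustrate why a precomputed table makes the query a plain lookup, the sketch below loads the mascota_alimento_norm table shown earlier into an in-memory SQLite database and runs a join of the same shape as the translated SQL. Several things here are assumptions: SQLite stands in for PostgreSQL, so the confusion. schema prefix is dropped; the rows of the mascota table (rex, tom, kiki, lola) are hypothetical; and the requested value 'fruta' with confusion 1 is chosen only for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Confusion table copied from the mascota_alimento_norm listing.
con.execute(
    "CREATE TABLE mascota_alimento_norm ("
    "nombre TEXT PRIMARY KEY, alimento INT, nutritivo INT, "
    "basura INT, fruta INT, verdura INT)"
)
con.executemany(
    "INSERT INTO mascota_alimento_norm VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("alimento", 0, 1, 1, 2, 2),
        ("nutritivo", 0, 0, 1, 1, 1),
        ("basura", 0, 1, 0, 2, 2),
        ("fruta", 0, 0, 1, 0, 1),
        ("verdura", 0, 0, 1, 1, 0),
    ],
)

# Hypothetical pets, invented only to have something to query.
con.execute("CREATE TABLE mascota (nombre TEXT, alimento TEXT)")
con.executemany(
    "INSERT INTO mascota VALUES (?, ?)",
    [("rex", "fruta"), ("tom", "basura"),
     ("kiki", "verdura"), ("lola", "nutritivo")],
)

# Equivalent of conf(mascota.alimento,'fruta')<=1: no confusion value is
# computed at query time, it is read from the column named "fruta".
rows = con.execute(
    'SELECT mascota.nombre, mascota.alimento, n."fruta" '
    "FROM mascota JOIN mascota_alimento_norm AS n "
    "ON n.nombre = mascota.alimento "
    'WHERE n."fruta" <= 1 '
    'ORDER BY n."fruta", mascota.nombre'
).fetchall()

for row in rows:
    print(row)
```

The pet fed with basura is excluded because its entry in the "fruta" column is 2; the others come back ordered by confusion.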
Conclusions
• Using hierarchies, our program extends
any database so that it can retrieve
objects with controlled precision
• It also allows deleting and updating objects with
controlled precision
• Controlled precision orders the
objects in a “smart” way