Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Extending databases to precision-controlled retrieval of qualitative information Victor Polo de Gyves,1 Adolfo Guzman1,2, and Serguei Levachkine2 (1) SoftwarePro International, (2) Centro de Investigación en Computación, [email protected] Qualitative variable • A variable able to take a symbolic value • A symbolic value is a set if can be considered E its name or description Pale{white, yellow, orange, beige} Hierarchy • For an element set E, a hierarchy H is another set of elements where any element ei is a symbolic value representing: – An element of E or: – A Partition. Partition: K is a partition of set S if it is both a covering for S and an exclusive set. The members of K are mutually exclusive and collectively exhaust S. Each element of S is in exactly one Kj. • And the union of every elements by represented by ei is E. Example: live_being animal mammal cat dog bird plant snake citric lemon pine orange Confusion • If I ask for an animal and a snake is given, Is there a mistake? • I ask for a plant and an animal is given, which is the error value? can this error be measured? Measuring the error in using a qualitative value ‘r’ instead of another qualitative value ‘s’ is defined as follows1: • conf (r, r) = conf (r, parent of r) = 0 • conf (r, s) = 1 + conf (r, parent of s)) The confusion in using r instead of s (the desired value) is the number of descending links from r s. Conf(r,s) is not a distance, nor ultradistance and it is not symmetric. 1 A. Guzman-Arenas and S. Levachkine. Hierarchies Measuring Qualitative Variables. Lecture Notes in Computer Science LNCS 2945 (Computational Linguistics and Intelligent Text Processing), (Springer-Verlag 2004). 262-274. ISSN 0372-9743. live_being animal mammal cat bird dog plant snake citric lemon pine orange conf (cat, mammal) = 0 (If I am using a cat instead of a mammal) conf (cat, animal) = 0 conf (mammal, cat) = 1 conf (cat, dog) = 1 conf (cat, bird) = 1 conf (cat, lemon) = 3 Predicates with controlled confusion P is true for x if1: •The object x satisfies predicate P with confusion if and only if: •P is true for x when P does not contains hierarchical variables •When pr is a hierarchical variable and P is of the form (pr = c), if and only if for the value v of the property pr for the object x, v= c (if the value v can be used instead of c with confusion ) •When P = P1 P2, if and only if P1 is satisfied by x, or P2 is satisfied by x. •When P = P1 P2, if and only if P1 is satisfied by x and P2 is satisfied by x. •When P = ¬P1, if and only if P1 is not satisfied by x. 1A. Guzman-Arenas and S. Levachkine. Graduated errors in approximate queries using hierarchies and ordered sets. Lecture Notes in Artificial Intelligence LNAI 2972, (Springer-Verlag 2004). 139-148. ISSN 0302-9743 (Ann (lives_in USA) (pet snake)), (Bill (lives_in English_Speaking_Island) (pet citric)), (Fred (lives_in USA) (pet cat)), (Tom (lives_in Mexico)(pet cat)), (Sam (lives_in Cuba) (pet pine)), (Pedro (lives_in Mexico)(pet dog)). 1 Caribbean_Islands Canada USA Mexico Spanish_speaking_islands Cuba Puerto Rico P is true for: E=0 Fred E=1 Fred, Tom , Pedro E=2 Fred, Tom, Pedro, Ann E=3 Fred, Tom, Pedro, Ann, Sam, Bill H North_America Predicate: P= (lives_in=USA) (pet=cat) Central_America live_being Guatemala Costa_Rica Honduras English_speaking_islands Jamaica animal mammal cat dog bird plant snake citric lemon pine orange Implementing conf(r, s) for databases • Looking for the author of an assasination • Witness 1: Witness 2: - Red car Elegant car - Tall normal height - Scar in nose - Scar in the face Witness 3: - Small car Confusion can sort a database with suspects: first, the most probably ones and at the end the least probably authors of the murdering. • An enterprise manager wishes to make a special discount to their customers interested in office products. • Given a document “d” on a digital library, sort any documents in function of the subject similarity with “d”. The most similar documents will be closer. Confusion delimitates objects from a database in agreement with the closeness or similarity of their properties. Example 1. (address = california)1 [confusion@~/tesis/jdbc/jdbc/jars]$ java -classpath $CLASSPATH:postgresql.jar: Conexion "select customers.name, customers.address, conf(customers.address)from customers where conf(customers.address,'california')<=1 order by Extended SQL conf(customers.address)" ------------(user) SQL traducido: select customers.name, customers.address, confusion.customers_address_norm."california"from Pure SQL customers_address_norm, customers where ( (machine) confusion.customers_address_norm.nombre = customers.address AND confusion.customers_address_norm."california"<=1 ) order by confusion.customers_address_norm."california" ------------NAME ADDRESS CALIFORNIA Tom's Hamburgers pasadena 0 Microsol silicon valley 0 East coast meat florida 1 Result Media Tools new york 1 Texas fruits texas 1 [confusion@~/tesis/jdbc/jdbc/jars]$ Example 2. Update: (industrial branch, food)0 [confusion@~/tesis/jdbc/jdbc/jars]$ java -classpath $CLASSPATH:postgresql.jar: Conexion "update customers set discount=0.07 where customers.name in conf(customers.industrial_branch,'food')<=0" ------------- Extended SQL (user) SQL traducido: update customers set discount=0.07 where customers.name in (select customers.name from customers_industrial_branch_norm, customers where ( confusion.customers_industrial_branch_norm.nombre = customers.industrial_branch AND confusion.customers_industrial_branch_norm."food"<=0 ) ) [confusion@~/tesis/jdbc/jdbc/jars]$psql conf conf=# select * from customers; name | industrial_branch| address |discount ---------------------+------------------+---------------+--------Media Tools | computers | new york | 0 Garcia Productores | tequila | mexico city | 0 Microsol | software | silicon valley| 0 Tom's Hamburgers | food | pasadena | 0.07 East coast meat | meat | florida | 0.07 Luigi's italian food| italian food | north america | 0.07 Mole Doña Rosa | mexican food | mexico | 0.07 Texas fruits | fruits | texas | 0.07 Canada seeds | food | canada | 0.07 Pure SQL (machine) Result Example 3: Options for a bachelor application. [confusion@dlguzman1 jars]$ java -classpath $CLASSPATH:postgresql.jar: 3 ( (profesion, (sistema, abierto semestral) ) Conexion "selectinformatica) profesiones.*, 2 2 conf(profesiones.profesion),conf(profesiones.sistema)from profesiones where conf(profesiones.profesion,'informatica')<=2 AND Extended SQL conf(profesiones.sistema,'abierto semestral')<=2 and conf(profesiones.sistema)+conf(profesiones.profesion)<=3 order by conf(profesiones.profesion)+conf(profesiones.sistema)"; ------------Pure SQL SQL traducido: select profesiones.*, confusion.profesiones_profesion_norm."informatica",confusion.profesione s_sistema_norm."abierto semestral"from profesiones_sistema_norm, profesiones_profesion_norm, profesiones where ( confusion.profesiones_profesion_norm.nombre = profesiones.profesion AND confusion.profesiones_profesion_norm."informatica"<=2 ) AND ( confusion.profesiones_sistema_norm.nombre = profesiones.sistema AND confusion.profesiones_sistema_norm."abierto semestral"<=2 ) and confusion.profesiones_sistema_norm."abierto semestral"+confusion.profesiones_profesion_norm."informatica"<=3 order by confusion.profesiones_profesion_norm."informatica"+confusion.profesione s_sistema_norm."abierto semestral" ESCUELA PROFESION DISPONIBLE uam informatica 5 unam informatica 30 ipn informatica 20 monterrey informatica 20 ipn computacion 30 unam computacion 30 monterrey computacion 10 uam quimica 5 Finalizando... Fin. [confusion@dlguzman1 jars]$ SISTEMA INFORMATICA abierto trimestral 0 escolarizado anual 0 escolarizado semestral 0 escolarizado semestral 0 escolarizado anual 1 escolarizado semestral 1 escolarizado semestral 1 abierto trimestral 2 ABIERTO SEMESTRAL 1 2 2 2 2 2 2 1 Result The method to create confusion tables a) Select the attribute(s) from the entities in which we desire to use the extended SQL. b) Join hierarchies with attributes Alimento Mascota_alimento Nombre/String Padre/String Mascota Nombre/String Alimento/String* A= Nutritivo Fruta Basura Carne c) Calculate and create the confusion table associated to each attribute of the entity A = {alimento, nutritivo, basura, fruta, carne} AxA = A2 = { {a,a},{a,n},{a,b},{a,f},{a,c}, {n,a},{n,n},{n,b},{n,f},{n,c}, {b,a},{b,n},{b,b},{b,f},{b,c}, {f,a},{f,n},{f,b},{f,f},{f,c}, {c,a},{c,n},{c,b},{c,f},{c,c} } conf(A2)=} conf(a,a), conf(a,n), conf(a,b), ..., conf(c,c) } conf(A2)=} 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, } 2, 1, 2, 0, 1, 2, 1, 2, 1, 0, conf=# select * from mascota_alimento_norm ; nombre |alimento|nutritivo|basura|fruta|verdura ---------+--------+---------+------+-----+------alimento | 0 | 1 | 1 | 2 | 2 nutritivo| 0 | 0 | 1 | 1 | 1 basura | 0 | 1 | 0 | 2 | 2 fruta | 0 | 0 | 1 | 0 | 1 verdura | 0 | 0 | 1 | 1 | 0 Steps to write predicates • Step 1: Create predicate • Step 2: Write the XSQL sentence • Step 3: Execute the program Step 1. Writing the predicate • I whish to find every baseball players living in Xochimilco with confusion 1. • The predicate is: [(sport = baseball) (address = xochimilco)]1 Step 2. Writing XSQL • The user looks the predicate and writes it in XSQL: [(sport = baseball) (address = xochimilco)]1 It is rewritten by the user as: conf(sport, baseball)<=1 AND conf(address, xochimilco)<=1 Step 3. Converting to Pure SQL • A program converts the expression in XSQL to pure SQL, able to be executed in any database. [(sport = baseball) (address = xochimilco)]1 conf(sport, baseball)<=1 AND conf (address, xochimilco)<=1 The program converts it to: (confusion.friends_sport_norm.nombre = friends.sport AND confusion.friends_sport_norm.“baseball"<=1) AND (confusion.friends_address_norm.nombre = friends.address AND confusion.friends_address_norm."xochimilco"<=1) Confusion tables • The previous expression refers to the tables confusion.friends_address_norm and confusion.friends_sport_norm • Our program creates the tables just the first time – Using the hierarchical tree and the method to create confusion tables • Using these tables the query is more efficient because there is no need to recalculate the confusion values Conclusions • Using hierarchies our program extends any database to make it able to recover objects with controlled precision • Allows to delete and update objects with controlled precision • The controlled precision organizes the objects in a “smart” way