Discovering Robust Knowledge from Databases that Change
Authors: Chun-Nan Hsu, Craig A. Knoblock
Advisor: Dr. Hsu
Graduate student: Yu-Wei Su
2001/12/18

Abstract
Databases usually change over time, which can make machine-discovered knowledge inconsistent.
Useful knowledge should be robust against database changes, so that it is unlikely to become inconsistent after the database changes.
The paper defines this notion of robustness for databases and describes how the robustness of first-order Horn-clause rules can be estimated and applied in knowledge discovery.

Outline
- Motivation
- Objective
- Terminology
- Robustness of knowledge
  - Definitions of robustness
  - Estimating robustness
  - Templates for estimating robustness
  - Empirical demonstration of robustness estimation
- Applying robustness in knowledge discovery
- Experimental results
- Conclusion and future work
- Opinion

Motivation
Many applications require discovered knowledge to be consistent in all database states.
Most approaches to these problems assume static databases.

Objective
To discover robust knowledge that is unlikely to become inconsistent with new database states.
To present an efficient approach to estimating and using this new measure.

Terminology
Robustness can be defined as the probability that the knowledge is consistent with a database state.
The paper considers relational databases, which consist of a set of relations.
Knowledge is expressed as Horn-clause rules that describe regularities in the data.
Literals defined on database relations are called database literals; literals defined on built-in relations are called built-in literals.
- Range rule: the consequent is a built-in literal.
- Relational rule: the consequent is a database literal.
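The distinction between database literals and built-in literals, and hence between range rules and relational rules, can be sketched as a small data structure. This is a minimal illustration under stated assumptions: the `Literal` and `Rule` classes, the set of built-in comparison predicates, and the simplified `geoloc(?name, ?country, ?latitude)` and `seaport(?name)` relations are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical set of built-in predicates (comparisons); the slides do not list them.
BUILT_INS = {"=", "!=", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class Literal:
    predicate: str              # relation name (database literal) or comparison (built-in literal)
    args: Tuple[object, ...]    # strings starting with "?" are variables, anything else is a constant

    @property
    def is_built_in(self) -> bool:
        return self.predicate in BUILT_INS


@dataclass(frozen=True)
class Rule:
    antecedents: Tuple[Literal, ...]  # conjunction of antecedent literals
    consequent: Literal

    def kind(self) -> str:
        # Range rule: built-in consequent; relational rule: database-literal consequent.
        return "range" if self.consequent.is_built_in else "relational"


# Hypothetical range rule on a simplified geoloc relation:
#   ?latitude >= 35.89  <=  geoloc(?name, ?country, ?latitude) AND ?country = "malta"
range_rule = Rule(
    antecedents=(
        Literal("geoloc", ("?name", "?country", "?latitude")),
        Literal("=", ("?country", "malta")),
    ),
    consequent=Literal(">=", ("?latitude", 35.89)),
)

# Hypothetical relational rule:  seaport(?name)  <=  geoloc(?name, ?country, ?latitude)
relational_rule = Rule(
    antecedents=(Literal("geoloc", ("?name", "?country", "?latitude")),),
    consequent=Literal("seaport", ("?name",)),
)

print(range_rule.kind(), relational_rule.kind())  # range relational
```

Classifying by the consequent alone keeps the distinction independent of how many built-in literals appear among the antecedents.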
A database state at a given time t is the collection of the instances present in the database at time t.
The closed-world assumption (CWA) is used to interpret the semantics of a database state.
A rule is consistent with a database state if all variable instantiations that satisfy the antecedents of the rule also satisfy the consequent of the rule.

Robustness of knowledge
- Definitions of robustness
- Estimating robustness
- Templates for estimating robustness
- Empirical demonstration of robustness estimation

Definitions of robustness
Definition 1 (Robustness for all states): Given a rule r, let D be the event that a database is in a state that is consistent with r. The robustness of r is
$$\mathrm{Robust}_1(r) = \Pr(D) = \frac{\text{number of database states consistent with } r}{\text{number of all possible database states}}$$
This definition has two problems: it treats all database states as equally probable, and the number of possible database states is intractably large.

Definition 2 (Robustness for accessible states): Given a rule r and a database in a state d in which r is consistent, new database states are accessible from d by performing transactions. Let t denote the event of performing a transaction on d that results in a new database state inconsistent with r. The robustness of r in the states accessible from the current state d is
$$\mathrm{Robust}(r \mid d) = \Pr(\neg t \mid d) = 1 - \Pr(t \mid d)$$

Corollary 3: If r is consistent with d, new database states are accessible from d only by performing transactions, and all transactions are equally probable, then $\mathrm{Robust}_1(r) = \mathrm{Robust}(r \mid d)$.
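Definition 1 can be made concrete on a toy example. The sketch below assumes a hypothetical universe of four possible (country, latitude) tuples and a hypothetical rule stating that every Malta tuple has latitude >= 35.89. Under the CWA each subset of the universe is one database state, and a state is consistent with the rule if every tuple satisfying the antecedents also satisfies the consequent.

```python
from itertools import combinations

# Hypothetical universe of every geoloc tuple (country, latitude) that could ever exist.
UNIVERSE = [("malta", 35.90), ("malta", 35.50), ("italy", 41.90), ("malta", 36.00)]


def consistent(state):
    """Hypothetical rule r: ?latitude >= 35.89 <= geoloc(?country, ?latitude) AND ?country = "malta".

    Under the closed-world assumption a state contains exactly the listed tuples,
    so r is consistent iff every tuple satisfying the antecedents also satisfies
    the consequent.
    """
    return all(lat >= 35.89 for country, lat in state if country == "malta")


def all_states(universe):
    """Yield every subset of the universe; there are 2 ** len(universe) of them."""
    for size in range(len(universe) + 1):
        yield from combinations(universe, size)


def robust1(universe):
    """Definition 1: fraction of all possible states that are consistent with r."""
    states = list(all_states(universe))
    return sum(consistent(s) for s in states) / len(states)


print(robust1(UNIVERSE))  # 0.5: the 8 states without ("malta", 35.50), out of 16
```

Each tuple added to `UNIVERSE` doubles the number of states to enumerate, and every state is counted as equally likely; these are exactly the two problems with Definition 1 listed above.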
Example: to reach a state inconsistent with r, state d1 requires deleting ten tuples, whereas state d2 requires deleting only one tuple. Therefore $\mathrm{Robust}(r \mid d_1) > \mathrm{Robust}(r \mid d_2)$.

Estimating robustness
Laplace law of succession: given a repeatable experiment whose outcome is one of k classes, suppose the experiment has been performed n times and r of those runs resulted in outcome C. The probability that the outcome of the next experiment will be C can be estimated as
$$\frac{r + 1}{n + k}$$
m-probability: let r, n, and C be as in the Laplace law, let $\Pr(C)$ be the prior probability of outcome C, and let m be an adjusting constant that indicates our confidence in $\Pr(C)$. The probability that the outcome of the next experiment will be C can be estimated as
$$\frac{r + m \cdot \Pr(C)}{n + m}$$
The Laplace law is a special case of the m-probability with $\Pr(C) = 1/k$ and $m = k$.
The robustness of a rule is estimated from the probability of the transactions that may invalidate the rule. The estimation is decomposed into two steps: deriving a set of invalidating transactions, and estimating the probability of those transactions.
Example (figure): rule R2.1 and the transactions T1, T2, T3 that invalidate it.
T1, T2, and T3 are mutually exclusive, and together they cover all possible transactions that will invalidate R2.1. Therefore
$$\Pr(T_1 \lor T_2 \lor T_3) = \Pr(T_1) + \Pr(T_2) + \Pr(T_3)$$
$$\mathrm{Robust}(R2.1 \mid d) = 1 - \Pr(T_1 \lor T_2 \lor T_3)$$
Each transaction is decomposed into more primitive statements whose local probabilities are estimated. The decomposition is based on a Bayesian network model.

Estimating robustness: example
$$\Pr(T_1) = \Pr(x_1 \land x_2 \land x_3 \land x_4 \land x_5) = \Pr(x_1)\,\Pr(x_2 \mid x_1)\,\Pr(x_3 \mid x_2 \land x_1)\,\Pr(x_4 \mid x_2 \land x_1)\,\Pr(x_5 \mid x_4 \land x_2 \land x_1)$$
- x1: a tuple is updated.
- x2: a tuple of geoloc is updated.
- x3: a tuple of geoloc whose ?country = "malta" is updated.
- x4: a tuple of geoloc whose ?latitude is updated.
- x5: a tuple of geoloc whose ?latitude is updated to a new value less than 35.89.
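The Laplace law, the m-probability, and the sum over mutually exclusive invalidating transactions described above can be sketched in a few lines. This is a minimal illustration: the function names are hypothetical, exact fractions are used only to make the special-case check exact, and the probabilities passed for T1, T2, T3 are made-up placeholders rather than values from the paper.

```python
from fractions import Fraction


def laplace(r, n, k):
    """Laplace law of succession: estimate (r + 1) / (n + k)."""
    return Fraction(r + 1, n + k)


def m_probability(r, n, prior, m):
    """m-probability estimate: (r + m * prior) / (n + m)."""
    return (r + m * prior) / (n + m)


def robustness(invalidating_probs):
    """Robust(r | d) = 1 - Pr(T1 or ... or Tk).

    Summing is valid only because the invalidating transactions are assumed to be
    mutually exclusive and to cover every transaction that invalidates the rule.
    """
    return 1 - sum(invalidating_probs)


# The Laplace law is the special case prior = 1/k, m = k.
r, n, k = 4, 10, 3
print(laplace(r, n, k), m_probability(r, n, Fraction(1, k), k))  # 5/13 5/13

# Placeholder probabilities for T1, T2, T3.
print(robustness([Fraction(1, 100), Fraction(2, 100), Fraction(1, 100)]))  # 24/25
```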
x1 (a tuple is updated): $t_u$ is the number of previous updates and $t$ is the total number of previous transactions.
$$\Pr(x_1) = \frac{t_u + 1}{t + 3}$$
If no information is available, assume $t_u = t = 0$.

x2 (a tuple of geoloc is updated): $R$ is the number of relations in the database, and $t_{u,\mathrm{geoloc}}$ is the number of updates made to tuples of relation geoloc.
$$\Pr(x_2 \mid x_1) = \frac{t_{u,\mathrm{geoloc}} + 1}{t_u + R}$$

x3 (a tuple of geoloc whose ?country = "malta" is updated): $G$ is the size of relation geoloc, $I_{a3}$ is the number of tuples in geoloc that satisfy ?country = "malta", and $t_{u,a3}$ is the number of updates made to those tuples. This is an m-probability with prior $I_{a3}/G$ and $m = G/I_{a3}$:
$$\Pr(x_3 \mid x_2 \land x_1) = \frac{t_{u,a3} + 1}{t_{u,\mathrm{geoloc}} + G / I_{a3}}$$

x4 (a tuple of geoloc whose ?latitude is updated): $A$ is the number of attributes of geoloc, and $t_{u,\mathrm{geoloc},\mathrm{latitude}}$ is the number of updates made to the latitude attribute of relation geoloc.
$$\Pr(x_4 \mid x_2 \land x_1) = \frac{t_{u,\mathrm{geoloc},\mathrm{latitude}} + 1}{t_{u,\mathrm{geoloc}} + A}$$

x5 (a tuple of geoloc whose ?latitude is updated to a new value less than 35.89):
$$\Pr(x_5 \mid x_4 \land x_2 \land x_1) = \begin{cases} 0.5 & \text{if no information is available} \\ 0.398 & \text{with range information} \end{cases}$$

Templates for estimating robustness
The templates allow the system to estimate the robustness of knowledge automatically.
The parameters of these equations can be evaluated by accessing the database schema or the transaction log.
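The local estimates for x1 through x5 can be combined into $\Pr(T_1)$ by multiplying them as in the Bayesian decomposition. The sketch below is a hypothetical instantiation of that one template: the parameter names are invented for illustration, and the schema sizes in the usage line (4 relations, 100 geoloc tuples of which 10 are in Malta, 5 attributes) are made-up values, not data from the paper.

```python
def pr_t1(t_u, t, R, t_u_geoloc, G, I_a3, t_u_a3, A, t_u_geoloc_lat, p_x5=0.5):
    """Pr(T1) as the product of the local estimates for x1..x5 (hypothetical parameter names)."""
    # x1: a tuple is updated; the 3 presumably counts insertion, deletion and update.
    p_x1 = (t_u + 1) / (t + 3)
    # x2: the updated tuple belongs to geoloc, one of R relations.
    p_x2 = (t_u_geoloc + 1) / (t_u + R)
    # x3: the tuple satisfies ?country = "malta" (m-probability, prior I_a3 / G, m = G / I_a3).
    p_x3 = (t_u_a3 + 1) / (t_u_geoloc + G / I_a3)
    # x4: the updated attribute is ?latitude, one of A attributes.
    p_x4 = (t_u_geoloc_lat + 1) / (t_u_geoloc + A)
    # x5: the new latitude is below 35.89 (0.5 without information, 0.398 with range information).
    return p_x1 * p_x2 * p_x3 * p_x4 * p_x5


# Hypothetical schema with an empty transaction log (t_u = t = 0).
p = pr_t1(t_u=0, t=0, R=4, t_u_geoloc=0, G=100, I_a3=10, t_u_a3=0, A=5, t_u_geoloc_lat=0)
print(p)  # approximately 1/1200
```

The result is one term of the sum for $\mathrm{Robust}(R2.1 \mid d)$; T2 and T3 would be estimated with their own templates and subtracted in the same way.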
Empirical demonstration of robustness estimation
Definition 4 (Probability of consistency): Given a rule r, a database state d, and a set of n transactions, the probability that r remains consistent after applying the n transactions to d is defined as
$$P_c(r, n \mid d) = \left(\mathrm{Robust}(r \mid d)\right)^n$$

Applying robustness in knowledge discovery
Robustness alone is not enough to guide discovery, so it is used together with other measures of usefulness.
One such measure is applicability: a rule is more applicable if it is shorter. A pruned rule should be both highly applicable and robust.
The learning process is divided into two stages: rule construction and rule pruning.
Specification of rule pruning:
- Input: a machine-generated rule that is consistent with the database but overly specific.
- Output: the rule with antecedent literals removed, so that it remains consistent but is short and robust.
The pruner searches for a subset of antecedent literals to remove, until any further removal would yield an inconsistent rule. A beam-search algorithm trims the search space, guided by two properties: robustness and length.
The pruner discards pruned rules that are inconsistent or that contain dangling literals.
To identify an inconsistent rule, the pruner can consult the database directly.
A set of literals is dangling if the variables occurring in those literals do not occur in any other literal of the rule.
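The pruning search described above can be sketched as a beam search over antecedent removals. This is a minimal sketch under several assumptions: the literal encoding, the connectivity-based dangling check (one reading of the definition above, coarser than the three conditions that follow), the ranking by robustness first and length second, and the callbacks `is_consistent` (standing in for a database query) and `robustness` (standing in for the estimation templates) are all hypothetical, not the paper's implementation.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical literal encoding: (predicate, args); arguments starting with "?" are variables.
Lit = Tuple[str, Tuple[str, ...]]
Antecedents = Tuple[Lit, ...]


def variables(lit: Lit) -> Set[str]:
    return {a for a in lit[1] if a.startswith("?")}


def has_dangling(antecedents: Sequence[Lit], consequent: Lit) -> bool:
    """True if some antecedents share no variable, even transitively, with the consequent."""
    reached = set(variables(consequent))
    pending = list(antecedents)
    changed = True
    while changed:
        changed = False
        for lit in list(pending):
            if variables(lit) & reached:
                reached |= variables(lit)
                pending.remove(lit)
                changed = True
    return bool(pending)


def prune(antecedents: Sequence[Lit], consequent: Lit,
          is_consistent: Callable[[Antecedents, Lit], bool],
          robustness: Callable[[Antecedents, Lit], float],
          beam_width: int = 3) -> Antecedents:
    """Beam search over antecedent removals; returns a rule that allows no further removal."""

    def score(ants: Antecedents):
        # Hypothetical ranking: higher estimated robustness first, then shorter rules.
        return (robustness(ants, consequent), -len(ants))

    def allowed(ants: Antecedents) -> bool:
        return is_consistent(ants, consequent) and not has_dangling(ants, consequent)

    start: Antecedents = tuple(antecedents)
    beam: List[Antecedents] = [start]
    seen = {start}
    terminals: List[Antecedents] = []
    while beam:
        candidates: List[Antecedents] = []
        for ants in beam:
            children = [ants[:i] + ants[i + 1:] for i in range(len(ants))]
            valid = [c for c in children if allowed(c)]
            if not valid:
                terminals.append(ants)  # every further removal is inconsistent or dangling
            for c in valid:
                if c not in seen:
                    seen.add(c)
                    candidates.append(c)
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    return max(terminals or [start], key=score)


# Toy example: constants are omitted from the encoding; the oracle and robustness are placeholders.
GEO = ("geoloc", ("?name", "?country", "?latitude"))
MALTA = ("=", ("?country",))              # stands for ?country = "malta"
PORT = ("seaport", ("?name", "?port"))
PORT_FILTER = ("=", ("?port",))           # stands for a condition on ?port
CONS = (">=", ("?latitude",))             # stands for ?latitude >= 35.89

result = prune([GEO, MALTA, PORT, PORT_FILTER], CONS,
               is_consistent=lambda ants, cons: GEO in ants and MALTA in ants,
               robustness=lambda ants, cons: 0.9)
print(result == (GEO, MALTA))  # True
```

In the toy run, removing `seaport` first is rejected because it would leave `PORT_FILTER` dangling, so the search removes `PORT_FILTER` and then `PORT`, stopping when any further removal would make the rule inconsistent.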
To ensure that removing a database literal L does not yield dangling literals, L must satisfy the following conditions:
- No built-in literal in the antecedents of the rule is defined on the variables occurring in L.
- If a variable occurring in the consequent of r also occurs in L, this variable must occur in some other database literal in the rule.
- Removing L from the rule does not disconnect existing join paths between any database literals in the rule.

Experimental results
The experiments used the rule discovery system BASIL, two large ORACLE relational databases, and 123 synthesized transactions (27 updates, 29 deletions, and 67 insertions).
Experiment design:
- Train BASIL to discover a set of rules and estimate their robustness. BASIL exhausted its search space during rule discovery and generated 355 rules; meanwhile, it estimated the robustness of the rules using another 202 sample transactions.
- Use the 123 transactions to generate a new database state.
- Check whether highly robust rules have a better chance of remaining consistent with the data in the new database state.

Conclusion and future work
- The paper formalizes the notion of robustness against database changes.
- Applying the approach to a variety of KDD applications in database management.
- Improving the precision of robustness estimation by refining the estimation templates to prevent overestimation.

Opinion
This approach could be applied to reliability testing.