Association Rules Mining with SQL
Kirsten Nelson, Deepen Manek
November 24, 2003

Organization of Presentation
- Overview: data mining and RDBMS
  - Loosely-coupled data and programs
  - Tightly-coupled data and programs
  - Architectural approaches
- Methods of writing efficient SQL
  - Candidate generation, pruning, support counting
  - K-way join, SubQuery, GatherJoin, Vertical, Hybrid
- Integrating taxonomies
- Mining sequential patterns

Early data mining applications
- Most early mining systems were developed largely on file systems, with specialized data structures and buffer management strategies devised for each.
- All data was read into memory before beginning computation.
- This limits the amount of data that can be mined.

Advantages of SQL and RDBMS
- Make use of database indexing and query-processing capabilities.
- More than a decade has been spent making these systems robust, portable, scalable, and concurrent.
- Exploit the underlying SQL parallelization.
- For long-running algorithms, use checkpointing and space management.

Use of Database in Data Mining
- "Loose coupling" of application and data.
- How would you write an Apriori program?
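One naive, loose-coupled answer can be sketched in a few lines of Python. This is an illustration only: sqlite3 stands in for the RDBMS, the `sales` table mirrors the one used in the UDF slides below, and the function counts every k-item combination rather than maintaining candidate sets, so it shows the data-movement pattern (every row pulled through a cursor into application memory) rather than full Apriori.

```python
# A minimal sketch of the "loose coupling" pattern: the application pulls
# every (tid, item) row through a cursor and does all counting itself.
# Table contents are the class example used later in the deck.
import sqlite3
from itertools import combinations
from collections import Counter

con = sqlite3.connect(":memory:")
con.execute("create table sales (tid integer, item text)")
rows = [(10, "A"), (10, "C"), (10, "D"),
        (20, "B"), (20, "C"), (20, "E"),
        (30, "A"), (30, "B"), (30, "C"), (30, "E"),
        (40, "B"), (40, "E")]
con.executemany("insert into sales values (?, ?)", rows)

def frequent_itemsets(con, k, minsup):
    # One full cursor scan per call: every record is copied from the
    # database into application memory -- the bottleneck the papers cite.
    baskets = {}
    for tid, item in con.execute("select tid, item from sales order by tid"):
        baskets.setdefault(tid, []).append(item)
    counts = Counter()
    for items in baskets.values():
        for combo in combinations(sorted(items), k):
            counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= minsup}

print(frequent_itemsets(con, 3, 2))   # {('B', 'C', 'E'): 2}
```

Every pass repeats the same scan and row-by-row copy, which is exactly the overhead (record copying plus per-record context switching) that the tightly-coupled alternatives below try to eliminate.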
- Use SQL statements in an application.
- Use a cursor interface to read through the records sequentially for each pass.
- Still two major performance problems:
  - Copying of each record from the database to memory
  - Process context switching for each record retrieved

Tightly-coupled applications
- Push computations into the database system to avoid performance degradation.
- Take advantage of user-defined functions (UDFs); this does not require changes to the database software.
- Two types of UDFs are used:
  - Ones executed only a few times, regardless of the number of rows
  - Ones executed once for each selected row

Tight coupling using UDFs

  procedure TightlyCoupledApriori():
  begin
    exec sql connect to database;
    exec sql select allocSpace() into :blob from onerecord;
    exec sql select * from sales where GenL1(:blob, TID, ITEMID) = 1;
    notDone := true;
    while notDone do {
      exec sql select aprioriGen(:blob) into :blob from onerecord;
      exec sql select * from sales where itemCount(:blob, TID, ITEMID) = 1;
      exec sql select GenLk(:blob) into :notDone from onerecord;
    }
    exec sql select getResult(:blob) into :resultBlob from onerecord;
    exec sql select deallocSpace(:blob) from onerecord;
    compute Answer using resultBlob;
  end

Methodology
- Comparison done with association rules against IBM DB2.
- Only the generation of frequent itemsets using the Apriori algorithm is considered.
- Five alternatives considered:
  - Loose coupling through the SQL cursor interface, as described earlier
  - UDF tight coupling, as described earlier
  - Stored procedure encapsulating the mining algorithm
  - Cache-mine: caching the data and mining on the fly
  - SQL implementations that force processing into the database, in two classes: SQL-92 (four implementations) and SQL-OR with object-relational extensions (six implementations)

Architectural options
- Stored procedure
  - Apriori algorithm encapsulated as a stored procedure.
  - Implication: it runs in the same address space as the DBMS.
  - Mined results are stored back into the DBMS.
- Cache-mine
  - Variation of the stored procedure.
  - Read the entire data once from the DBMS and temporarily cache it in a lookaside buffer on a local disk; the cached data is discarded when execution completes.
  - Disadvantage: requires additional disk space for caching.
  - Uses Intelligent Miner's "space" option.

Terminology
- T: table of {tid, item} pairs; the data is normally sorted by transaction id.
- Ck: candidate k-itemsets, obtained by joining and pruning the frequent itemsets of the previous iteration.
- Fk: frequent itemsets of length k, obtained from Ck and T.

Candidate generation in SQL: join step
Generate Ck from Fk-1 by joining Fk-1 with itself:

  insert into Ck
  select I1.item1, ..., I1.itemk-1, I2.itemk-1
  from Fk-1 I1, Fk-1 I2
  where I1.item1 = I2.item1 and ... and I1.itemk-2 = I2.itemk-2
    and I1.itemk-1 < I2.itemk-1

Candidate generation example
F3 = {{1,2,3}, {1,2,4}, {1,3,4}, {1,3,5}, {2,3,4}}; joining F3 (as I1) with itself (as I2) gives C4 = {{1,2,3,4}, {1,3,4,5}}.

  F3: item1  item2  item3
      1      2      3
      1      2      4
      1      3      4
      1      3      5
      2      3      4

Pruning
- Modify the candidate generation algorithm to ensure all subsets of length (k-1) of each candidate in Ck are in Fk-1.
- Do a k-way join, skipping itemn-2 when joining with the nth table (2 < n <= k).
- Create a primary index (item1, ..., itemk-1) on Fk-1 to process the k-way join efficiently.
- For k = 4, this becomes:

  insert into C4
  select I1.item1, I1.item2, I1.item3, I2.item3
  from F3 I1, F3 I2, F3 I3, F3 I4
  where I1.item1 = I2.item1 and I1.item2 = I2.item2 and I1.item3 < I2.item3
    and I1.item2 = I3.item1 and I1.item3 = I3.item2 and I2.item3 = I3.item3
    and I1.item1 = I4.item1 and I1.item3 = I4.item2 and I2.item3 = I4.item3

Pruning example
Evaluating the join with I3 (and I4) on the previous example leaves C4 = {{1,2,3,4}}: candidate {1,3,4,5} is pruned because its subset {3,4,5} is not in F3.

Support counting using SQL
Two different approaches:
- SQL-92: use 'standard' SQL syntax such as joins and subqueries to find the support of itemsets.
- Object-relational extensions of SQL (SQL-OR): user-defined functions (UDFs), table functions, and binary large objects (BLOBs).

Support counting using SQL-92
- Four different methods, two of which are detailed in the papers: K-way join and SubQuery.
- The other methods (3-way join, 2 Group-Bys) are not discussed because of unacceptable performance.

SQL-92: K-way join
Obtain Fk by joining Ck with k copies of the (tid, item) table T, then group by the itemset:

  insert into Fk
  select item1, ..., itemk, count(*)
  from Ck, T t1, ..., T tk
  where t1.item = Ck.item1 and ... and tk.item = Ck.itemk
    and t1.tid = t2.tid and ... and tk-1.tid = tk.tid
  group by item1, ..., itemk
  having count(*) > :minsup

K-way join example
C3 = {B,C,E}, minimum support required is 2. The result inserted into F3 is {B,C,E,2}.

K-way join: Pass-2 optimization
- When calculating C2, no pruning is required after joining F1 with itself.
- Don't calculate and materialize C2; replace C2 in the 2-way join algorithm with a join of F1 with itself:

  insert into F2
  select I1.item1, I2.item1, count(*)
  from F1 I1, F1 I2, T t1, T t2
  where I1.item1 < I2.item1
    and t1.item = I1.item1 and t2.item = I2.item1
    and t1.tid = t2.tid
  group by I1.item1, I2.item1
  having count(*) > :minsup

SQL-92: SubQuery-based
- Split the support counting into a cascade of k subqueries.
- The nth subquery Qn finds all tids that match the distinct itemsets formed by the first n items of Ck.

  insert into Fk
  select item1, ..., itemk, count(*)
  from (Subquery Qk) t
  group by item1, item2, ..., itemk
  having count(*) > :minsup

  Subquery Qn (for any n between 1 and k):
  select item1, ..., itemn, tid
  from T tn, (Subquery Qn-1) as rn-1,
       (select distinct item1, ..., itemn from Ck) as dn
  where rn-1.item1 = dn.item1 and ... and rn-1.itemn-1 = dn.itemn-1
    and rn-1.tid = tn.tid and tn.item = dn.itemn

Example of SubQuery-based
Using the previous example from class: C3 = {B,C,E}, minimum support = 2. There is no subquery Q0, so Q1 in this case becomes:

  select item1, tid
  from T t1, (select distinct item1 from C3) as d1
  where t1.item = d1.item1

Q2 becomes:

  select item1, item2, tid
  from T t2, (Subquery Q1) as r1,
       (select distinct item1, item2 from C3) as d2
  where r1.item1 = d2.item1 and r1.tid = t2.tid and t2.item = d2.item2

Q3 becomes:

  select item1, item2, item3, tid
  from T t3, (Subquery Q2) as r2,
       (select distinct item1, item2, item3 from C3) as d3
  where r2.item1 = d3.item1 and r2.item2 = d3.item2
    and r2.tid = t3.tid and t3.item = d3.item3

The output of Q3 is:

  item1  item2  item3  tid
  B      C      E      20
  B      C      E      30

The insert statement becomes:

  insert into F3
  select item1, item2, item3, count(*)
  from (Subquery Q3) t
  group by item1, item2, item3
  having count(*) > :minsup

which inserts the row {B,C,E,2}. For Q2, the pass-2 optimization can be used.

Performance comparisons of SQL-92 approaches
- Used Version 5 of DB2 UDB on an RS/6000 Model 140: 200 MHz CPU, 256 MB main memory, 9 GB of disk space, transfer rate of 8 MB/sec.
- Used four different datasets based on real-world data:

  Dataset    Records (M)  Transactions (M)  Items (K)  Avg items per transaction
  Dataset-A  2.5          0.57              85         4.4
  Dataset-B  7.5          2.5               15.8       2.62
  Dataset-C  6.6          0.21              15.8       31
  Dataset-D  14           1.44              480        9.62

- Built the following indexes, which are not included in any cost calculations:
  - Composite index (item1, ..., itemk) on Ck
  - k different indices, one on each of the k items in Ck
  - (item, tid) and (tid, item) indexes on the data table T
- The best performance was obtained by the SubQuery approach, but it was only comparable to loose coupling in some cases, failing to complete in others.
- On Dataset-C with support of 2%, SubQuery outperforms loose coupling, but decreasing support to 1% makes SubQuery take 10 times as long to complete.
- Lower support increases the size of Ck and Fk at each step, causing the joins to process more rows.

Support counting using SQL with object-relational extensions
- Six different methods, four of which are detailed in the papers: GatherJoin, GatherCount, GatherPrune, Vertical.
- The other methods (Horizontal, SBF) are not discussed because of unacceptable performance.

SQL object-relational extension: GatherJoin
- Generates all possible k-item combinations of the items contained in a transaction and joins them with Ck.
- An index is created on all items of Ck.
- Uses the following table functions:
  - Gather: outputs records {tid, item-list}, with item-list being a BLOB or VARCHAR containing all items associated with the tid.
  - Comb-K: returns all k-item combinations from the transaction; the output has k attributes T_itm1, ..., T_itmk.

  insert into Fk
  select item1, ..., itemk, count(*)
  from Ck,
       (select t2.T_itm1, ..., t2.T_itmk
        from T, table(Gather(T.tid, T.item)) as t1,
             table(Comb-K(t1.tid, t1.item-list)) as t2)
  where t2.T_itm1 = Ck.item1 and ... and t2.T_itmk = Ck.itemk
  group by Ck.item1, ..., Ck.itemk
  having count(*) > :minsup

Example of GatherJoin
t1 (the output from Gather) looks like:

  tid  item-list
  10   A,C,D
  20   B,C,E
  30   A,B,C,E
  40   B,E

t2 (generated by Comb-K from t1) will be joined with C3 to obtain F3: one row from tid 10, one row from tid 20, four rows from tid 30, and none from tid 40. Insert {B,C,E,2}.

GatherJoin: Pass-2 optimization
- When calculating C2, no pruning is required after joining F1 with itself.
- Don't calculate and materialize C2; replace it with a join to F1 before the table function, so that Gather is only passed frequent 1-itemset rows:

  insert into F2
  select t2.T_itm1, t2.T_itm2, count(*)
  from F1 I1,
       (select t2.T_itm1, t2.T_itm2
        from T, table(Gather(T.tid, T.item)) as t1,
             table(Comb-K(t1.tid, t1.item-list)) as t2
        where T.item = I1.item1)
  group by t2.T_itm1, t2.T_itm2
  having count(*) > :minsup

Variations of GatherJoin: GatherCount
- Perform the GROUP BY inside the table function Comb-K for the pass-2 optimization.
- The output of the table function is then not the candidate itemsets (Ck) but the actual frequent itemsets (Fk) along with their support.
- A 2-dimensional array stores the possible frequent itemsets inside Comb-K; this may lead to excessive memory use.

Variations of GatherJoin: GatherPrune
- Push the join with Ck into the table function Comb-K.
- Ck is converted into a BLOB and passed as an argument to the table function.
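What Gather and Comb-K compute can be sketched in plain Python, with `gather` and `comb_k` as illustrative stand-ins for the DB2 table functions and the slide's transaction data as input. This is a sketch of the dataflow, not table-function code:

```python
# Gather folds T into {tid, item-list} records; Comb-K expands each record
# into its k-item combinations, which are then counted to obtain Fk.
from itertools import combinations
from collections import Counter

T = [(10, "A"), (10, "C"), (10, "D"), (20, "B"), (20, "C"), (20, "E"),
     (30, "A"), (30, "B"), (30, "C"), (30, "E"), (40, "B"), (40, "E")]

def gather(rows):
    # {tid, item-list} pairs, one per transaction (the Gather step).
    out = {}
    for tid, item in rows:
        out.setdefault(tid, []).append(item)
    return out

def comb_k(item_list, k):
    # All k-item combinations of one transaction (the Comb-K expansion).
    return list(combinations(sorted(item_list), k))

t1 = gather(T)
# tids 10 and 20 each contribute 1 combination, tid 30 contributes 4,
# tid 40 (only 2 items) contributes none -- as in the slide example.
counts = Counter(c for items in t1.values() for c in comb_k(items, 3))
f3 = {c: n for c, n in counts.items() if n >= 2}
print(f3)   # {('B', 'C', 'E'): 2}
```

The blow-up visible in tid 30 (4 items already give 4 combinations) is why GatherJoin degrades as transactions get longer, which motivates the Vertical approach described next.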
- A drawback: the BLOB has to be passed on each invocation of Comb-K, i.e. once per row of table T.

SQL object-relational extension: Vertical
- For each item, create a BLOB containing the tids of the transactions the item belongs to.
- Use the function Gather to generate {item, tid-list} pairs, storing the results in a table TidTable; the tid-lists are all in the same sorted order.
- Use the function Intersect to compare two tid-lists and extract the common values.
- The pass-2 optimization can be used for Vertical, similar to the K-way join method; the example below does not show the optimization.

  insert into Fk
  select item1, ..., itemk, count(tid-list) as cnt
  from (Subquery Qk) t
  where cnt > :minsup

  Subquery Qn (for any n between 2 and k):
  select item1, ..., itemn, Intersect(rn-1.tid-list, tn.tid-list) as tid-list
  from TidTable tn, (Subquery Qn-1) as rn-1,
       (select distinct item1, ..., itemn from Ck) as dn
  where rn-1.item1 = dn.item1 and ... and rn-1.itemn-1 = dn.itemn-1
    and tn.item = dn.itemn

  Subquery Q1: (select * from TidTable)

Example of Vertical
Using the previous example from class: C3 = {B,C,E}, minimum support = 2. Q1 is the TidTable:

  item  tid-list
  A     10,30
  B     20,30,40
  C     10,20,30
  D     10
  E     20,30,40

Q2 becomes:

  select item1, item2, Intersect(r1.tid-list, t2.tid-list) as tid-list
  from TidTable t2, (Subquery Q1) as r1,
       (select distinct item1, item2 from C3) as d2
  where r1.item1 = d2.item1 and t2.item = d2.item2

Q3 becomes:

  select item1, item2, item3, Intersect(r2.tid-list, t3.tid-list) as tid-list
  from TidTable t3, (Subquery Q2) as r2,
       (select distinct item1, item2, item3 from C3) as d3
  where r2.item1 = d3.item1 and r2.item2 = d3.item2 and t3.item = d3.item3

Performance comparisons using SQL-OR
[Bar charts: running time in seconds, broken down into Prep and Passes 1-4, for GatherCount (Gcnt), GatherJoin (Gjoin), GatherPrune (Gprun), and Vertical (Vert). Dataset-A at supports 0.5%, 0.35%, 0.20%; Dataset-B at supports 0.10%, 0.03%, 0.01%; Dataset-C at supports 2.0%, 1.0%, 0.25%; Dataset-D at supports 0.2%, 0.07%, 0.02%. Datasets as in the earlier table.]

- Vertical has the best overall performance, sometimes an order of magnitude better than the other three approaches; the majority of its time is spent transforming the data into {item, tid-list} pairs.
- The pass-2 optimization has a huge impact on performance: Vertical otherwise spends too much time on the second pass. For Dataset-B with support of 0.1%, the running time for Pass 2 went from 5.2 hours to 10 minutes.
- Comb-K in GatherJoin generates a large number of potential frequent itemsets that must be worked with.

Hybrid approach
- The previous charts and algorithm analysis show that Vertical spends too much time on pass 2 compared to the other algorithms, especially when the support is decreased, while GatherJoin degrades as the number of frequent items per transaction increases.
- To improve performance, use a hybrid algorithm:
  - Use Vertical for most cases.
  - When the size of the candidate itemset is too large, GatherJoin is a good option if the number of frequent items per transaction (Nf) is not too large.
  - When Nf is large, GatherCount may be the only good option.

Architecture comparisons
- Compare five alternatives:
  - Loose coupling and stored procedure: basically the same except for the address space the program runs in; because of the limited difference in performance, the following charts focus solely on the stored procedure.
  - Cache-mine
  - UDF tight coupling
  - Best SQL approach (Hybrid)

Performance comparisons of architectures
[Bar charts: running time in seconds, broken down into Passes 1-4, for Cache, SProc, UDF, and SQL. Dataset-A at supports 0.5%, 0.35%, 0.2%; Dataset-B at supports 0.1%, 0.03%, 0.01%; Dataset-C at supports 2.0%, 1.0%, 0.25%; Dataset-D at supports 0.2%, 0.07%, 0.02%. Datasets as in the earlier table.]

- Cache-mine has the best or close to the best performance in all cases, a factor of 0.8 to 2 times faster than the SQL approach; the stored procedure is the worst.
- The difference between Cache-mine and the stored procedure is directly related to the number of passes through the data: passes increase as support goes down, and multiple passes may be needed if all candidates cannot fit in memory.
- UDF time per pass decreases 30-50% compared to the stored procedure because of the tighter coupling with the DBMS.
- The SQL approach comes in second to Cache-mine, and is somewhat better than Cache-mine for high support values; it is 1.8-3 times better than the stored-procedure/loose-coupling approach, improving as the support value decreases.
- The cost of converting the data to the Vertical format is less than the cost of converting it to the binary format used by Cache-mine.
- For the second pass through the data, the SQL approach takes much more time than Cache-mine, particularly when the support is decreased.

Taxonomies: example

  Beverages                      Snacks
    Soft Drinks: Pepsi, Coke       Pretzels
    Alcoholic Drinks: Beer         Chocolate Bar

Example rule: Soft Drinks => Pretzels with 30% confidence, 2% support.

The taxonomy as a (parent, child) table:

  parent            child
  Beverages         Soft Drinks
  Beverages         Alcoholic Drinks
  Soft Drinks       Pepsi
  Soft Drinks       Coke
  Alcoholic Drinks  Beer
  Snacks            Pretzels
  Snacks            Chocolate Bar

Taxonomy augmentation
- The algorithms are similar to the previous slides, with two additions:
  - Pruning itemsets containing an item and its ancestor
  - Pre-computing the ancestors for each item
- We will also consider support counting.

Pruning items and ancestors
- In the second pass we join F1 with F1 to give C2.
- This will give, for example: {beverages, pepsi}, {snacks, coke}, {pretzels, chocolate bar}.
- But {beverages, pepsi} is redundant!
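The ancestor pre-computation mentioned above can be checked in SQLite, whose WITH RECURSIVE plays the role of the recursive formulation used in the deck (the CTE is written RTax here because an unquoted hyphen is not legal in an identifier). The taxonomy is the beverages/snacks example:

```python
# Transitive closure of the (parent, child) taxonomy table: the Ancestor
# relation contains every (ancestor, descendant) pair, direct or derived.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table Tax (parent text, child text)")
con.executemany("insert into Tax values (?, ?)",
                [("Beverages", "Soft Drinks"), ("Beverages", "Alcoholic Drinks"),
                 ("Soft Drinks", "Pepsi"), ("Soft Drinks", "Coke"),
                 ("Alcoholic Drinks", "Beer"),
                 ("Snacks", "Pretzels"), ("Snacks", "Chocolate Bar")])

ancestors = con.execute("""
    with recursive RTax(ancestor, descendant) as (
        select parent, child from Tax
        union all
        select p.ancestor, c.child
        from RTax p, Tax c
        where p.descendant = c.parent)
    select ancestor, descendant from RTax""").fetchall()

# Transitive pairs such as (Beverages, Pepsi) now appear alongside the
# seven direct parent/child edges.
print(("Beverages", "Pepsi") in ancestors)   # True
```

With this table in hand, the redundant {beverages, pepsi}-style candidates can be filtered out with the `except` query shown on the next slide.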
The following modification to the SQL statement eliminates such redundant combinations from being selected:

  insert into C2
  (select I1.item1, I2.item1
   from F1 I1, F1 I2
   where I1.item1 < I2.item1)
  except
  (select ancestor, descendant from Ancestor
   union
   select descendant, ancestor from Ancestor)

Pre-computing ancestors
- An Ancestor table with format (ancestor, descendant) is created using the transitive closure operation:

  insert into Ancestor
  with R-Tax (ancestor, descendant) as
    (select parent, child from Tax
     union all
     select p.ancestor, c.child
     from R-Tax p, Tax c
     where p.descendant = c.parent)
  select ancestor, descendant from R-Tax

Support counting
- The extensions to handle taxonomies are straightforward, but non-trivial: we need an extended transaction table.
- For example, if a transaction contains {coke, pretzels}, we also add {soft drinks, pretzels}, {beverages, pretzels}, {coke, snacks}, {soft drinks, snacks}, {beverages, snacks}.

Extended transaction table
The extended table T* can be obtained by the following SQL query:

  select item, tid from T
  union
  select distinct A.ancestor as item, T.tid
  from T, Ancestor A
  where A.descendant = T.item

The "select distinct" clause gets rid of duplicates from items with a common ancestor, e.g. we don't want {beverages} added twice to a transaction containing both {pepsi} and {coke}.

Pipelining of the query
There is no need to actually build T*; make the following modification to the SQL:

  insert into Fk
  with T* (tid, item) as (Query for T*)
  select item1, ..., itemk, count(*)
  from Ck, T* t1, ..., T* tk
  where t1.item = Ck.item1 and ... and tk.item = Ck.itemk
    and t1.tid = t2.tid and ... and tk-1.tid = tk.tid
  group by item1, ..., itemk
  having count(*) > :minsup

Sequential patterns
- Similar to the papers covered on Nov 17.
- Input is sequences of transactions, e.g. ((computer, modem), (printer)).
- Similar to association rules, but dealing with sequences as opposed to sets.
- Can also specify maximum and minimum time gaps, as well as sliding time windows: max-gap, min-gap, window-size.

Input and output formats
- Input has three columns: sequence identifier (sid), transaction time (time), item identifier (item).
- Output is a collection of frequent sequences, in a fixed-width table (item1, eno1, ..., itemk, enok, len); for smaller lengths, the extra column values are set to NULL.

GSP algorithm
- Similar to the algorithms shown earlier.
- Each Ck has items and element numbers, but no len column: it has fixed length k.
- Candidates are generated in two steps:
  - Join: join Fk-1 with itself. Sequence s1 joins with s2 if the subsequence obtained by dropping the first item of s1 is the same as the one obtained by dropping the last item of s2. When generating C2, we need to generate sequences where both items appear as a single element as well as as two separate elements.
  - Prune: all candidate sequences that have a non-frequent contiguous (k-1)-subsequence are deleted.

GSP: join SQL

  insert into Ck
  select I1.item1, I1.eno1, ..., I1.itemk-1, I1.enok-1,
         I2.itemk-1, I1.enok-1 + I2.enok-1 - I2.enok-2
  from Fk-1 I1, Fk-1 I2
  where I1.item2 = I2.item1 and ... and I1.itemk-1 = I2.itemk-2
    and I1.eno3 - I1.eno2 = I2.eno2 - I2.eno1 and ...
    and I1.enok-1 - I1.enok-2 = I2.enok-2 - I2.enok-3

GSP: prune SQL
- Written as a k-way join, similar to before.
- There are at most k contiguous subsequences of length (k-1) for which Fk-1 needs to be checked for membership.
- Note that not all (k-1)-subsequences are checked: because of the max-gap constraint between consecutive elements, only the contiguous ones need to be frequent.

GSP: support counting
- In each pass, we use the candidate table Ck and the input data-sequences table D to count the support, with a k-way join.
- We use select distinct before the group by to ensure that only distinct data-sequences are counted.
- We add predicates between element numbers to handle the timing constraints:

  (Ck.enoj = Ck.enoi and abs(dj.time - di.time) <= window-size) or
  (Ck.enoj = Ck.enoi + 1 and dj.time - di.time <= max-gap
                         and dj.time - di.time > min-gap) or
  (Ck.enoj > Ck.enoi + 1)

References
1. Rakesh Agrawal, Kyuseok Shim. Developing Tightly-Coupled Data Mining Applications on a Relational Database System, 1996.
2. Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal. Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications, 1998. (Builds on 1.)
3. Shiby Thomas, Sunita Sarawagi. Mining Generalized Association Rules and Sequential Patterns Using SQL Queries, 1998. (Builds on 1 and 2.)
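As a closing sketch, the GSP join and prune steps from the slides above can be written in a few lines of Python. Everything here is illustrative: elements are flattened to single items (so element-number arithmetic and the max-gap subtlety disappear), the earlier F3 values are reused as length-3 sequences, and the prune step checks all (k-1)-subsequences, which is correct when no max-gap constraint is in force:

```python
# GSP join: s1 joins with s2 when dropping the first item of s1 equals
# dropping the last item of s2; the candidate extends s1 by s2's last item.
def gsp_join(fk_1):
    candidates = []
    for s1 in fk_1:
        for s2 in fk_1:
            if s1[1:] == s2[:-1]:
                candidates.append(s1 + (s2[-1],))
    return candidates

# GSP prune (no max-gap): drop candidates having any (k-1)-subsequence,
# obtained by deleting one element, that is not in Fk-1.
def gsp_prune(candidates, fk_1):
    frequent = set(fk_1)
    return [c for c in candidates
            if all(c[:i] + c[i + 1:] in frequent for i in range(len(c)))]

f3 = [("1", "2", "3"), ("1", "2", "4"), ("1", "3", "4"),
      ("1", "3", "5"), ("2", "3", "4")]
c4 = gsp_join(f3)
print(c4)                 # [('1', '2', '3', '4')]
print(gsp_prune(c4, f3))  # [('1', '2', '3', '4')]
```

Note that the sequence join produces only {1,2,3,4}: unlike the itemset join, order matters, so (1,3,4) and (1,3,5) no longer combine.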