Knowledge discovery & data mining
Towards KD Support Environments
Fosca Giannotti and Dino Pedreschi
Pisa KDD Lab, CNUCE-CNR & Univ. Pisa
http://www-kdd.di.unipi.it/
A tutorial @ EDBT2000
EDBT2000 tutorial - KDSE, Konstanz, 27-28.3.2000

Module outline
- Data analysis and KD Support Environments
- Data mining technology trends
  - from tools …
  - … to suites …
  - … to solutions
- Towards data mining query languages
- DATASIFT: a logic-based KDSE
- Future research challenges

Vertical applications
We outlined three classes of vertical data analysis applications that can be tackled using KDD & DM techniques:
- Fraud detection
- Market basket analysis
- Customer segmentation

Why are these applications challenging?
They require manipulation and reasoning over knowledge and data at different abstraction levels:
- conceptual
  - semantic integration of domain knowledge, expert (business) rules and extracted knowledge
  - semantic integration of different analysis paradigms
- logical/physical
  - interoperability with external components: DBMSs, data mining tools, desktop tools
  - querying/mining optimization: loose vs. tight coupling between query language and specialized mining tools

Why are these applications challenging?
The associated KDD process needs to be carefully specified, tuned and controlled.
[Figure: the KDD process. Data Sources -> Data Consolidation -> Consolidated Data (Warehouse) -> Selection and Preprocessing -> Prepared Data -> Data Mining -> Patterns & Models (e.g. p(x)=0.02) -> Interpretation and Evaluation -> Knowledge]

Why are these applications challenging?
Still not properly supported by available KDD technology:
- what is offered: horizontal, customizable toolkits/suites of data mining primitives
- what is needed: KD support environments for vertical applications

Data mining vs. traditional software development process
Traditional:
- Focus on knowledge transfer, design and coding
- 30% - analysis and design
- 70% - program design, coding and testing
- Prototyping - expensive
- Development process has few loops
- Maintenance requires human analysis
Data mining:
- Focus on data selection, representation and search
- 70% - data preparation
- 30% - model generation and testing
- Prototyping - cheap
- Development process is inherently iterative
- Maintenance requires relearning model

From R. Agrawal's invited lecture @ KDD'99
[Figure labels: Early Market, Chasm, Mainstream Market]
The greatest peril in the development of a high-tech market lies in making the transition from an early market dominated by a few visionaries to a mainstream market dominated by pragmatists.

Is data mining in the chasm?
- Perceived to be sophisticated technology, usable only by specialists
- Long, expensive projects
- Stand-alone, loosely-coupled with data infrastructures
- Difficult to infuse into existing mission-critical applications

Generation 1: data mining tools
~1980: first generation of DM systems
- research-driven tools for single tasks, e.g.
  - build a decision tree - say C4.5
  - find clusters - say Autoclass (Cheeseman 88)
  - …
- Difficult to use more than one tool on the same data: lots of data/metadata transformation
- Intended user: a specialist, technically sophisticated

Generation 2: data mining suites
~1995: second generation of DM systems
- toolkits for multiple tasks with support for data preparation and interoperability with DBMS, e.g.
  - SPSS Clementine
  - IBM Intelligent Miner
  - SAS Enterprise Miner
  - SFU DBMiner
- Intended user: data analyst; suites require significant knowledge of statistics and databases

Growth of DM tools (source: kdnuggets.com)
From G. Piatetsky-Shapiro. The data-mining industry coming of age. IEEE Intelligent Systems, Dec. 1999.

Generation 3: data mining solutions
Beginning at the end of the 1990s
- vertical data mining-based applications and solutions oriented to solving one specific business problem, e.g.
  - detecting credit card fraud
  - customer retention
  - …
- Address the entire KDD process, and push the result into a front-end application
- Intended user: business user; the interfaces hide the data mining complexity

Emerging short-term technology trends
Tighter interoperability by means of standards which facilitate the integration of data mining with other applications:
- KDD process: e.g., the Cross-Industry Standard Process for Data Mining model (www.crisp-dm.org)
- representation of mining models: e.g., the PMML predictive modeling markup language (www.dmg.org)
- DB interoperability: the Microsoft OLE DB for data mining interface

Approaches in data mining suites
- Database-oriented approach: IBM Intelligent Miner
- OLAP-based mining: DBMiner - Jiawei Han's group @ SFU
- Machine learning: CART, ID3/C4.5/C5.0, Angoss Knowledge Studio
- Statistical approaches: The SAS Institute Enterprise Miner
- Visualization approach: SGI MineSet, VisDB (Keim et al. 94)

Other approaches in data mining suites
- Neural network approach: Cognos 4thoughts, NeuroRule (Lu et al. '95)
- Deductive DB integration: KnowledgeMiner (Shen et al. '96), Datasift (Pisa KDD Lab - see refs)
- Rough sets, fuzzy sets: Datalogic/R, 49er
- Multi-strategy mining: INLEN, KDW+, Explora

SFU DBMiner: OLAP-centric mining
[Screenshot; callout labels: Active Object Elements Warehouse Workplace Active Object]

IBM Intelligent Miner – DB-centric mining
[Screenshot; callout labels: Contents Container, Mining Base Container, Work Area]

IBM – IM architecture

Angoss Knowledge Studio: ML-centric mining
[Screenshot; callout labels: Work Area, Project Outline, Additional Visualizations]

KS project outline tool
(Limited) support to the KDD process

Support for data consolidation step
- DBMiner
  - ODBC databases: SQL + SmartDrives
  - Single database, multiple tables
  - Consolidation of heterogeneous sources unsupported
- Intelligent Miner
  - DB2 and text: SQL without SmartDrives
  - Multiple databases
  - Consolidation of heterogeneous sources supported
- Knowledge Studio
  - ODBC databases and text
  - Single table
  - Consolidation of heterogeneous sources unsupported

Support for selection and preprocessing
- DBMiner: SQL only
- Intelligent Miner: SQL + standard and advanced statistical functionalities
- Knowledge Studio: descriptive statistics

Support for data mining step
- Knowledge Studio: decision trees, clustering, prediction
- DBMiner: association rules, decision trees, prediction
- Intelligent Miner: association rules, sequential patterns, clustering, classification, prediction, similar time series

Support for interpretation and evaluation
- Predefined interestingness measures
- Emphasis on visualization
- Limited export capability of analysis results
- Gain charts for comparison of predictive models (KS and IM)
- Limited model combination capabilities (KS)

Data Mining Query Languages
A DMQL can provide the ability to support ad-hoc and interactive data mining.
Hope: achieve the same effect that SQL had on relational databases.
Various proposals:
- DMQL (Han et al 96)
- MINE operator (Meo et al 96)
- M-SQL (Imielinski et al 99)
- query flocks (Tsur et al 98)

MINE operator of (Meo et al 96)

References - DMQL
- J. Han, Y. Fu, W. Wang, K. Koperski, and O. R. Zaiane. DMQL: A Data Mining Query Language for Relational Databases. In Proc. 1996 SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), pp. 27-33, Montreal, Canada, June 1996.
- R. Meo, G. Psaila, S. Ceri. A New SQL-like Operator for Mining Association Rules. In Proc. VLDB'96, 1996 Int. Conf. Very Large Data Bases, Bombay, India, pp. 122-133, Sept. 1996.
- T. Imielinski and A. Virmani. MSQL: a query language for database mining. Data Mining and Knowledge Discovery, 3:373-408, 1999.
- S. Tsur, J. Ullman, S. Abiteboul, C. Clifton, R. Motwani, S. Nestorov. Query flocks: a generalization of association rule mining. In Proc. 1998 ACM-SIGMOD, pp. 1-12, 1998.

DATASIFT - towards a logic-based KDSE
- DATASIFT is LDL++ (Logic Data Language, MCC & UCLA) extended with mining primitives (decision trees & association rules)
- LDL++ syntax: Prolog-like deductive rules
- LDL++ semantics: SQL extended with recursion (and more)
- Integration of deduction and induction
- Employed to systematically develop the methodology for MBA and audit planning
- See Pisa KDD Lab references

Our position
A suitable integration of
- deductive reasoning (logic database languages)
- inductive reasoning (association rules & decision trees)
provides a viable solution to high-level problems in knowledge-intensive data analysis applications.

Our goal
Demonstrate how we support design and control of the overall KDD process and the incorporation of background knowledge:
- data preparation
- knowledge extraction
- post-processing and knowledge evaluation
- business rules
- autofocus data mining

With respect to other DMQLs
Extending logic query languages yields extra expressiveness, needed to bridge the gap between
- data mining (e.g., association rule mining)
- vertical applications (e.g., market basket analysis)

Architecture - client agent
- User interface
- Access to business rules and visualization of results through
  - web browser to control interaction
  - MS Excel objects (sheets and charts) to represent output of analysis (association rules)

Architecture - server agent
- A query engine (mediator)
- record previous analyses
- Metadata/meta knowledge
- interaction with other components
- LDL++ server extended with external calls to DBMSs and to …
- Inductive modules
  - Apriori
  - classifiers (decision trees)
- Coupling with DBMS using the Cache-mine approach
  - Performance comparable with SQL-based approaches on same mining queries (Giannotti et al 2000)

Deductive rules in LDL++
A small database of cash register transactions:
  basket(1,fish). basket(1,bread).
  basket(2,bread). basket(2,milk). basket(2,onions). basket(2,fish).
  basket(3,bread). basket(3,orange). basket(3,milk).
E.g.: select transactions involving milk
  milk_basket(T,I) <- basket(T,I), basket(T,milk).
Querying
  ?- milk_basket(T,I)
  milk_basket(2,bread). milk_basket(2,milk). milk_basket(2,onions). milk_basket(2,fish).
  milk_basket(3,bread). milk_basket(3,orange). milk_basket(3,milk).

Aggregates in LDL++
A small database of cash register transactions:
  basket(1,fish). basket(1,bread).
  basket(2,bread). basket(2,milk). basket(2,onions). basket(2,fish).
  basket(3,bread). basket(3,orange). basket(3,milk).
E.g.: count occurrences of pairs of distinct items in all transactions (count<T> is the aggregate)
  pair(I1,I2,count<T>) <- basket(T,I1), basket(T,I2), I1 ~= I2.
Querying
  ?- pair(fish,bread,N)
  pair(fish,bread,2) (i.e., N=2)
Aggregates are the logical interface between the deductive and the inductive environment.

Association rules in LDL++
  basket(1,fish). basket(1,bread).
  basket(2,bread). basket(2,milk). basket(2,onions). basket(2,fish).
  basket(3,bread). basket(3,orange). basket(3,milk).
E.g., compute one-to-one association rules with at least 40% support
  rules(patterns<0.4,0,{I1,I2}>) <- basket(T,I1), basket(T,I2).
patterns is the aggregate interfacing the computation of association rules:
  patterns<min_supp, min_conf, trans_set>

Association rules in LDL++
  basket(1,fish). basket(1,bread).
  basket(2,bread). basket(2,milk). basket(2,onions). basket(2,fish).
  basket(3,bread). basket(3,orange). basket(3,milk).
Result of the query
  ?- rules(X,Y,S,C)
  rules({milk},{bread},0.66,1)      i.e. milk => bread [0.66,1]
  rules({bread},{milk},0.66,0.66)
  rules({fish},{bread},0.66,1)
  rules({bread},{fish},0.66,0.66)
Same status for data and induced rules

Reasoning on item hierarchies
Hierarchy levels (top to bottom): Department, Sector, Family, Product (item)
Which rules survive/decay up/down the item hierarchy?
  rules_at_level(I,pattern<S,C,Itemset>) <- itemset_abstraction(I,Tid,Itemset).
  preserved_rules(Left,Right) <- rules_at_level(I,Left,Right,_,_), rules_at_level(I+1,Left,Right,_,_).

Business rules: reasoning on promotions
Which rules are established by a promotion?
  interval(before, -, 3/7/1998).
  interval(promotion, 3/8/1998, 3/30/1998).
  interval(after, 3/31/1998, +).
  established_rules(Left, Right) <- not rules_partition(before, Left, Right, _, _),
                                    rules_partition(promotion, Left, Right, _, _),
                                    rules_partition(after, Left, Right, _, _).

Business rules: temporal reasoning
How does rule support change along time?
[Chart: support (y-axis, 0 to 35) per day (x-axis, 25/11/97 to 05/12/97). Legend: Pasta => Fresh Cheese 14; Bread Subsidiaries => Fresh Cheese 28; Biscuits => Fresh Cheese 14; Fresh Fruit => Fresh Cheese 14; Frozen Food => Fresh Cheese 14]

Decision tree construction in DATASIFT
- construct training and test set using rules
    training_set(P,Case_list) <- ...
    test_tuple(ID,F1,...,F20,Rec,Act_rec,CAR) <- ...
- construct classifier using external call to C5.0
    tree_rules(Tree_name,P,PF,MC,BO,Rule_list) <- training_set(P,Case_list), tree_induction(Case_list,PF,MC,BO,Rule_list).
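
As a cross-check of the association-rule results listed earlier in this module, the basket database is small enough to recompute by hand or with a few lines of Python. The sketch below is only an illustration: the names `BASKETS`, `pair_count` and `one_to_one_rules` are made up for this example and are not part of LDL++ or DATASIFT, and a plain scan over item pairs stands in for the Apriori module that sits behind the `patterns` aggregate.

```python
from itertools import permutations

# Basket facts copied from the slides: transaction id -> set of items.
BASKETS = {
    1: {"fish", "bread"},
    2: {"bread", "milk", "onions", "fish"},
    3: {"bread", "orange", "milk"},
}


def pair_count(baskets, i1, i2):
    """Number of transactions containing both items (cf. pair(I1,I2,count<T>))."""
    return sum(1 for items in baskets.values() if i1 in items and i2 in items)


def one_to_one_rules(baskets, min_supp, min_conf):
    """Return {(left, right): (support, confidence)} for single-item rules.

    Brute-force stand-in (an assumption of this sketch) for the computation
    behind patterns<min_supp, min_conf, trans_set>.
    """
    n = len(baskets)
    all_items = sorted(set().union(*baskets.values()))
    rules = {}
    for left, right in permutations(all_items, 2):
        both = pair_count(baskets, left, right)
        with_left = sum(1 for items in baskets.values() if left in items)
        support = both / n
        confidence = both / with_left  # with_left >= 1: left occurs in some basket
        if support >= min_supp and confidence >= min_conf:
            rules[(left, right)] = (support, confidence)
    return rules


RULES = one_to_one_rules(BASKETS, min_supp=0.4, min_conf=0.0)
for (left, right), (supp, conf) in sorted(RULES.items()):
    print(f"{left} => {right} [{supp:.2f}, {conf:.2f}]")
```

Only the four rules between bread, milk and fish pass the 40% support threshold, which agrees with the query result on the slide. The slide writes 2/3 as 0.66, whereas the `:.2f` format rounds it to 0.67.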
Parameters: pruning factor PF, misclassification costs MC, boosting BO
[Slide callouts: external call; induced classifier]

Putting decision trees at work
- prediction of target variable
    prediction(Tree_name,ID,CAR,Predicted_CAR) <- tree_rules(Tree_name, _, _, _, Rule_list),
                                                  test_subject(ID, F1, …, F20, _, _, CAR),
                                                  classify(Rule_list, [F1, …, F20], Predicted_CAR).
- Model evaluation: actual recovery of a classifier (= sum of the recovery of tuples classified as positive); sum<Actual_Recovery> is an aggregate
    actual_recovery(Tree_name,sum<Actual_Recovery>) <- prediction(Tree_name, ID, _, pos),
                                                       test_subject(ID, F1, …, F20, _, Actual_Recovery, _).

Combining decision trees
Model conjunction:
  tree_conjunction(T1,T2,ID,CAR,pos) <- prediction(T1, ID, CAR, pos), prediction(T2, ID, CAR, pos).
  tree_conjunction(T1,T2,ID,CAR,neg) <- test_subject(ID, F1, …, F20, _, _, CAR),
                                        ~tree_conjunction(T1, T2, ID, CAR, pos).
More interesting combinations are readily expressible: e.g. meta learning (Chan and Stolfo 93)

We proposed ...
- a KDD methodology for audit planning:
  - define an audit cost model
  - monitor training- and test-set construction
  - assess the quality of a classifier
  - tune classifier construction to specific policies
- and its formalization in a prototype logic-based KDSE, supporting:
  - integration of deduction and induction
  - integration of domain and induced knowledge
  - separation of conceptual and implementation level

A data mining research agenda
1. Integration with data warehouse and relational DB
2. Scalable, parallel/distributed and incremental mining
3. Data mining query language optimization
4. Multiple, integrated data mining methods
5. KDSE and methodological support for vertical applications
6. Interactive, exploratory data mining environments
7. Mining on other forms of data:
   - spatio-temporal databases
   - text
   - multimedia
   - web

Scale up!
Scaling up existing algorithms (AI, ML, IR):
- Association rules
- Correlation rules
- Causal relationship
- Classification
- Clustering
- Bayesian networks

Background knowledge & constraints
Incorporating background knowledge and constraints into existing data mining techniques
Double benefit for DMQL: semantics and optimization!
- traditional algorithms
  - Disproportionate computational cost for selective users
  - Overwhelming volume of potentially useless results
- need user-controlled focus in mining process
  - Association rules containing certain items
  - Sequential patterns containing certain patterns
  - Classification?

Vertical applications of data mining
- More success stories needed!
- Current data mining systems lack a thick semantic layer (similarly to the early relational database systems)
- Verticalized data mining systems, e.g.
  - Market analysis systems
  - Fraud detection systems
- Automated mining and interactive mining: how far are they?

Autofocus data mining
- policy options, business rules
- selection of data mining function
- fine parameter tuning of mining function

DBMS coupling
- Tight-coupling with DBMS
- Most data mining algorithms are based on flat file data (i.e. loose-coupling with DBMS)
- A set of standard data mining operators (e.g. sampling operator)

Web mining – why?
- No standards on the web, enormous blob of unstructured and heterogeneous info
- Very dynamic
  - One new WWW server every 2 hours
  - 5 million documents in 1995
  - 320 million documents in 1998
- Indices get obsolete very quickly
- Better means needed for discovering resources and extracting knowledge

Web mining: challenges
Today's search engines are plagued by problems:
- the abundance problem: 99% of info of no interest to 99% of people!
- limited coverage of the Web
- limited query interface based on keyword-oriented search
- limited customization to individual users

Web mining
- Web content mining
  - mining what Web search engines find
  - Web document classification (Chakrabarti et al 99)
  - warehousing a Meta-Web (Zaïane and Han 98)
  - intelligent query answering in Web search
- Web usage mining
  - Web log mining: find access patterns and trends (Zaiane et al 98)
  - customized user tracking and adaptive sites (Perkowitz et al 97)
- Web structure mining
  - discover authoritative pages: a page is important if important pages point to it (Chakrabarti et al 99, Kleinberg 98)

Warehousing a Meta-Web (Zaïane & Han 98)
- Meta-Web: summarizes the contents and structure of the Web, which evolves with the Web
- Layer0: the Web itself
- Layer1: the lowest layer of the Meta-Web
  - an entry: a Web page summary, including class, time, URL, contents, keywords, popularity, weight, links, etc.
- Layer2 and up: summary/classification/clustering
- Meta-Web is warehoused and incrementally updated
- Querying and mining is performed on or assisted by the Meta-Web
- Is it feasible/sustainable? Is XML of any help?

Meta-Web
From Jiawei Han's panel talk @ SIGMOD99
[Figure: stack of layers, top to bottom]
Layer n: More Generalized Descriptions
...
Layer1: Generalized Descriptions
Layer0

Weblog mining
- Web servers register a log entry for every single access.
- A huge number of accesses (hits) are registered and collected in an ever-growing web log.
- Why warehousing/mining web logs?
  - Enhance server performance by learning access patterns of general or particular users (guess what the user will ask next and pre-cache!)
  - Improve system design of web applications
  - Identify potential prime advertisement locations
- Greatest peril: the privacy pitfall. See e.g. (Markoff 99), the rise of the Little Brother.

Some web mining references
- M. Perkowitz and O. Etzioni. Adaptive sites: Automatically learning from user access patterns. In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
- J. Pitkow. In search of reliable usage data on the www. In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
- T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. In Proc. 3rd Conf. Human Factors & the Web, Denver, Colorado, June 1997.
- O. R. Zaiane, M. Xin, and J. Han. Discovering Web access patterns and trends by applying OLAP and data mining technology on Web logs. In Proc. Advances in Digital Libraries Conf. (ADL'98), pp. 19-29, Santa Barbara, CA, April 1998.
- O. R. Zaiane and J. Han. Resource and knowledge discovery in global information systems: a preliminary design and experiment. In Proc. KDD'95, pp. 331-336, 1995.
- O. R. Zaiane and J. Han. WebML: querying the world-wide web for resources and knowledge. In Proc. Int. Workshop on Web Information and Data Management (WIDM98), pp. 9-12, 1998.
- S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, et al. Mining the web's link structure. COMPUTER, 32:60-67, 1999.
- S. Chakrabarti, B. E. Dom, P. Indyk. Enhanced hypertext classification using hyperlinks. In Proc. 1998 ACM-SIGMOD, pp. 307-318, 1999.
- J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. ACM-SIAM Symp. on Discrete Algorithms, 1998.
- J. Markoff. The Rise of Little Brother. Upside, Apr. 1999; http://www.upside.com/texis/mvm/story?id=36d4613c0

Pisa KDD Lab references
- F. Giannotti and G. Manco. Making Knowledge Extraction and Reasoning Closer. In Proc. PAKDD'00, The Fourth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, 2000.
- F. Giannotti and G. Manco. Querying Inductive Databases via Logic-Based User Defined Aggregates. In Proc. PKDD'99, The Third Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases, Prague, Sept. 1999.
- F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. Using Data Mining Techniques in Fiscal Fraud Detection. In Proc. DaWaK'99, First Int. Conf. on Data Warehousing and Knowledge Discovery, Florence, Italy, Sept. 1999.
- F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. A Classification-based Methodology for Planning Audit Strategies in Fraud Detection. In Proc. KDD-99, ACM-SIGKDD Int. Conf. on Knowledge Discovery & Data Mining, San Diego (CA), August 1999.
- F. Giannotti, G. Manco, D. Pedreschi and F. Turini. Experiences with a logic-based knowledge discovery support environment. In Proc. 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (SIGMOD'99 DMKD), Philadelphia, May 1999.
- F. Giannotti, M. Nanni, G. Manco, D. Pedreschi and F. Turini. Integration of Deduction and Induction for Mining Supermarket Sales Data. In Proc. PADD'99, Practical Application of Data Discovery, Int. Conference, London, April 1999.
- F. Giannotti, G. Manco, M. Nanni, D. Pedreschi. Nondeterministic, Nonmonotonic Logic Databases. IEEE Trans. on Knowledge and Data Engineering, 2000.
- F. Giannotti, M. Nanni, G. Manco, D. Pedreschi and F. Turini. Using deduction for intelligent data analysis. Submitted, 2000. http://www-kdd.di.unipi.it/
- P. Becuzzi, M. Coppola, S. Ruggieri and M. Vanneschi. Parallelisation of C4.5 as a particular divide and conquer computation. In Proc. 3rd Workshop on High Performance Data Mining, Springer-Verlag LNCS, 2000.
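
Looking back at the "Combining decision trees" and model evaluation slides, the logic of model conjunction and actual recovery can be traced on a toy case. The following Python sketch is illustrative only: the two classifiers `t1` and `t2`, their predictions and the recovery amounts are invented data, and the function names merely mirror the LDL++ predicates `tree_conjunction` and `actual_recovery` without reproducing the DATASIFT implementation.

```python
# Hypothetical predictions of two classifiers on four test subjects
# (subject id -> predicted class). Invented for illustration.
PREDICTIONS = {
    "t1": {1: "pos", 2: "pos", 3: "neg", 4: "pos"},
    "t2": {1: "pos", 2: "neg", 3: "neg", 4: "pos"},
}

# Hypothetical actual recovery observed for each test subject.
ACTUAL_RECOVERY = {1: 100.0, 2: 40.0, 3: 0.0, 4: 250.0}


def tree_conjunction(predictions, t1, t2):
    """Combined prediction: pos only if both trees say pos, neg otherwise.

    The else-branch plays the role of the negated rule
    ~tree_conjunction(T1,T2,ID,CAR,pos) on the slide.
    """
    first, second = predictions[t1], predictions[t2]
    return {
        subject: "pos" if first[subject] == "pos" and second[subject] == "pos" else "neg"
        for subject in first
    }


def actual_recovery(predicted, recovery):
    """Sum of the recovery of subjects classified as positive (cf. sum<Actual_Recovery>)."""
    return sum(recovery[subject] for subject, label in predicted.items() if label == "pos")


COMBINED = tree_conjunction(PREDICTIONS, "t1", "t2")
print(COMBINED)
print(actual_recovery(PREDICTIONS["t1"], ACTUAL_RECOVERY))
print(actual_recovery(COMBINED, ACTUAL_RECOVERY))
```

With this invented data the conjunction audits fewer subjects than `t1` alone and gives up the recovery of subject 2, which is the kind of trade-off an audit cost model would have to weigh.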