Knowledge discovery & data mining
Towards KD Support Environments
Fosca Giannotti and
Dino Pedreschi
Pisa KDD Lab
CNUCE-CNR & Univ. Pisa
http://www-kdd.di.unipi.it/
A tutorial @ EDBT2000
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
EDBT2000 tutorial - KDSE
Konstanz, 27-28.3.2000
Vertical applications
We outlined three classes of vertical data
analysis applications that can be tackled
using KDD & DM techniques
Fraud detection
Market basket analysis
Customer segmentation
Why are these applications challenging?
Require manipulation and reasoning over
knowledge and data at different abstraction
levels
conceptual
 semantic integration of domain knowledge, expert
(business) rules and extracted knowledge
 semantic integration of different analysis paradigms
logical/physical
 interoperability with external components: DBMS’s, data
mining tools, desktop tools
 querying/mining optimization: loose vs. tight coupling
between query language and specialized mining tools
Why are these applications challenging?
The associated KDD process needs to be carefully specified, tuned and controlled
[Figure: the KDD process. Data Sources → Data Consolidation → Consolidated Data (Warehouse) → Selection and Preprocessing → Prepared Data → Data Mining → Patterns & Models → Interpretation and Evaluation → Knowledge]
Why are these applications challenging?
Still not properly supported by available KDD
technology
what is offered:
horizontal, customizable toolkits/suites of
data mining primitives
what is needed:
KD support environments for vertical
applications
Data mining vs. traditional software development process
Traditional
 Focus on knowledge transfer, design and coding
 30% - analysis and design
 70% - program design, coding and testing
 Prototyping - expensive
 Development process has few loops
 Maintenance requires human analysis
Data mining
 Focus on data selection, representation and search
 70% - data preparation
 30% - model generation and testing
 Prototyping - cheap
 Development process is inherently iterative
 Maintenance requires relearning the model
From R. Agrawal’s invited lecture @ KDD’99
The greatest peril in the development of a high-tech market lies in making the transition from an early market dominated by a few visionaries to a mainstream market dominated by pragmatists.
[Figure: the Chasm between the Early Market and the Mainstream Market]
Is data mining in the chasm?
Perceived to be sophisticated technology,
usable only by specialists
Long, expensive projects
Stand-alone, loosely-coupled with data
infrastructures
Difficult to infuse into existing mission-critical applications
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites …
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
Generation 1: data mining tools
~1980: first generation of DM systems
research-driven tools for single tasks, e.g.
build a decision tree - say C4.5
find clusters - say Autoclass (Cheeseman 88)
…
Difficult to use more than one tool on the
same data – lots of data/metadata
transformation
Intended user: a specialist, technically
sophisticated.
Generation 2: data mining suites
~1995: second generation of DM systems
toolkits for multiple tasks with support for
data preparation and interoperability with
DBMS, e.g.
SPSS Clementine
IBM Intelligent Miner
SAS Enterprise Miner
SFU DBMiner
Intended user: data analyst – suites require
significant knowledge of statistics and
databases
Growth of DM tools (source: kdnuggets.com)
 From G. Piatetsky-Shapiro. The data-mining
industry coming of age. IEEE Intelligent Systems,
Dec. 1999.
Generation 3: data mining solutions
Beginning at the end of the 1990s
vertical data mining-based applications and
solutions oriented to solving one specific
business problem, e.g.
detecting credit card fraud
customer retention
…
Address entire KDD process, and push
result into a front-end application
Intended user: business user – the
interfaces hide the data mining complexity
Emerging short-term technology trends
Tighter interoperability by means of standards
which facilitate the integration of data mining
with other applications:
KDD process, e.g. the Cross-Industry Standard
Process for Data Mining model (www.crisp-dm.org)
representation of mining models: e.g., the PMML predictive modeling markup language (www.dmg.org)
DB interoperability: the Microsoft OLE DB for data
mining interface
Approaches in data mining suites
Database-oriented approach
IBM Intelligent Miner
OLAP-based mining
DBMiner - Jiawei Han’s group @ SFU
Machine learning
CART, ID3/C4.5/C5.0, Angoss Knowledge Studio
Statistical approaches
The SAS Institute Enterprise Miner.
Visualization approach:
SGI MineSet, VisDB (Keim et al. 94).
Other approaches in data mining suites
Neural network approach:
Cognos 4thoughts, NeuroRule (Lu et al.’95).
Deductive DB integration:
KnowledgeMiner (Shen et al.’96)
Datasift (Pisa KDD Lab - see refs).
Rough sets, fuzzy sets:
Datalogic/R, 49er
Multi-strategy mining:
INLEN, KDW+, Explora
SFU DBMiner: OLAP-centric mining
[Screenshot: DBMiner user interface; callouts: Active Object Elements, Warehouse Workplace, Active Object]
IBM Intelligent Miner – DB-centric mining
[Screenshot: Intelligent Miner user interface; callouts: Contents Container, Mining Base Container, Work Area]
IBM – IM architecture
Angoss Knowledge Studio: ML-centric mining
[Screenshot: Knowledge Studio user interface; callouts: Work Area, Project Outline, Additional Visualizations]
KS project outline tool
 (Limited) support for the KDD process
Support for data consolidation step
DBMiner
 ODBC databases – SQL + SmartDrives
 Single database – multiple tables
 Consolidation of heterogeneous sources unsupported
Intelligent Miner
 DB2 and text – SQL without SmartDrives
 Multiple databases
 Consolidation of heterogeneous sources supported
Knowledge Studio
 ODBC databases and text
 Single table
 Consolidation of heterogeneous sources unsupported
Support for selection and preprocessing
DBMiner
SQL only
Intelligent Miner
SQL + standard and advanced
statistical functionalities
Knowledge Studio
descriptive statistics
Support for data mining step
Knowledge Studio
 Decision trees
 Clustering
 Prediction
DBMiner
 Association rules
 Decision trees
 Prediction
Intelligent Miner
 Association rules
 Sequential patterns
 Clustering
 Classification
 Prediction
 Similar time series
Support for interpretation and evaluation
Predefined interestingness measures
Emphasis on visualization
Limited export capability of analysis results
Gain charts for comparison of predictive
models (KS and IM)
Limited model combination capabilities (KS)
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites …
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
Data Mining Query Languages
A DMQL can provide the ability to support
ad-hoc and interactive data mining
Hope: achieve the same effect that SQL
had on relational databases.
Various proposals:
DMQL (Han et al 96)
mine operator (Meo et al 96)
M-SQL (Imielinski et al 99)
query flocks (Tsur et al 98)
MINE operator of (Meo et al 96)
References - DMQL
J. Han, Y. Fu, W. Wang, K. Koperski, and O. R. Zaiane. DMQL: A Data Mining Query Language for Relational Databases. In Proc. SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), pp. 27-33, Montreal, Canada, June 1996.
R. Meo, G. Psaila, and S. Ceri. A New SQL-like Operator for Mining Association Rules. In Proc. VLDB'96, 1996 Int. Conf. Very Large Data Bases, pp. 122-133, Bombay, India, Sept. 1996.
T. Imielinski and A. Virmani. MSQL: a query language for database mining. Data Mining and Knowledge Discovery, 3:373-408, 1999.
S. Tsur, J. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: a generalization of association rule mining. In Proc. 1998 ACM-SIGMOD, pp. 1-12, 1998.
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites …
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
DATASIFT - towards a logic-based KDSE
DATASIFT is LDL++ (Logic Data Language,
MCC & UCLA) extended with mining
primitives
(decision trees & association rules)
LDL++ syntax: Prolog-like deductive rules
LDL++ semantics: SQL extended with
recursion (and more)
Integration of deduction and induction
Employed to systematically develop the
methodology for MBA and audit planning
See Pisa KDD Lab references
Our position
A suitable integration of
deductive reasoning (logic database languages)
inductive reasoning (association rules & decision
trees)
provides a viable solution to high-level
problems in knowledge-intensive data
analysis applications
Our goal
Demonstrate how we support design and
control of the overall KDD process and the
incorporation of background knowledge
data preparation
knowledge extraction
post-processing and knowledge evaluation
business rules
autofocus datamining
With respect to other DMQL’s
extending logic query languages yields extra
expressiveness, needed to bridge the gap
between
data mining (e.g., association rule mining)
vertical applications (e.g., market basket
analysis)
Architecture - client agent
User interface
Access to business rules and visualization
of results through
web browser to control interaction
MS Excel objects (sheets and charts) to
represent output of analysis (association rules)
Architecture - server agent
 A query engine (mediator)
record previous analyses
Metadata/meta knowledge
interaction with other components
 LDL++ server
extended with external calls to DBMSs and to …
 Inductive modules
Apriori
classifiers (decision trees)
 Coupling with DBMS using the Cache-mine approach
 Performance comparable with SQL-based approaches
on the same mining queries (Giannotti et al 2000)
Deductive rules in LDL++
A small database of cash register transactions
basket(1,fish).
basket(1,bread).
basket(2,bread).
basket(2,milk).
basket(2,onions).
basket(2,fish).
basket(3,bread).
basket(3,orange).
basket(3,milk).
 E.g.: select transactions involving milk
milk_basket(T,I) ← basket(T,I), basket(T,milk).
 Querying ?- milk_basket(T,I)
milk_basket(2,bread).
milk_basket(2,milk).
milk_basket(2,onions).
milk_basket(2,fish).
milk_basket(3,bread).
milk_basket(3,orange).
milk_basket(3,milk).
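A rough Python rendering of the milk_basket rule may help readers unfamiliar with LDL++. The list-of-tuples encoding of the basket facts and the Python function name are assumptions made for illustration only; LDL++ evaluates the rule declaratively rather than with explicit loops.

```python
# Facts from the slide, encoded as (transaction_id, item) tuples.
basket = [
    (1, "fish"), (1, "bread"),
    (2, "bread"), (2, "milk"), (2, "onions"), (2, "fish"),
    (3, "bread"), (3, "orange"), (3, "milk"),
]


def milk_basket(facts):
    """Mirror milk_basket(T,I) <- basket(T,I), basket(T,milk)."""
    # Transactions that contain milk: the second body literal of the rule.
    milk_tids = {tid for (tid, item) in facts if item == "milk"}
    # Every (T, I) fact whose transaction is one of those.
    return [(tid, item) for (tid, item) in facts if tid in milk_tids]


result = milk_basket(basket)
for tid, item in result:
    print(f"milk_basket({tid},{item}).")
```

The rule is a self-join of basket on the transaction identifier, which is why every item of transactions 2 and 3 is returned and transaction 1 is dropped.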
Aggregates in LDL++
A small database of cash register transactions
basket(1,fish).
basket(1,bread).
basket(2,bread).
basket(2,milk).
basket(2,onions).
basket(2,fish).
basket(3,bread).
basket(3,orange).
basket(3,milk).
E.g.: count occurrences of pairs of distinct items in all transactions
pair(I1,I2,count<T>) ← basket(T,I1), basket(T,I2), I1 ≠ I2.
(count<T> is an aggregate)
Querying ?- pair(fish,bread,N)
pair(fish,bread,2) (i.e., N=2)
Aggregates are the logical interface between
deductive and inductive environment.
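The pair-counting aggregate can be sketched in Python as follows. This is only an illustration under assumptions: the tuple encoding of basket and the helper name pair_counts are not part of DATASIFT, and the count<T> aggregate is emulated by collecting the set of transactions per ordered pair.

```python
from collections import defaultdict

# Facts from the slide, encoded as (transaction_id, item) tuples.
basket = [
    (1, "fish"), (1, "bread"),
    (2, "bread"), (2, "milk"), (2, "onions"), (2, "fish"),
    (3, "bread"), (3, "orange"), (3, "milk"),
]


def pair_counts(facts):
    """Count, for each ordered pair of distinct items, the transactions holding both."""
    items_by_tid = defaultdict(set)
    for tid, item in facts:
        items_by_tid[tid].add(item)

    tids_by_pair = defaultdict(set)
    for tid, items in items_by_tid.items():
        for i1 in items:
            for i2 in items:
                if i1 != i2:  # pairs of distinct items only
                    tids_by_pair[(i1, i2)].add(tid)

    # Emulates count<T>: the number of distinct transactions per pair.
    return {pair: len(tids) for pair, tids in tids_by_pair.items()}


counts = pair_counts(basket)
print(counts[("fish", "bread")])
```

Fish and bread appear together in transactions 1 and 2, which matches the answer N=2 given on the slide.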
Association rules in LDL++
basket(1,fish).
basket(1,bread).
basket(2,bread).
basket(2,milk).
basket(2,onions).
basket(2,fish).
basket(3,bread).
basket(3,orange).
basket(3,milk).
E.g., compute one-to-one association rules with
at least 40% support
rules(patterns<0.4,0,{I1,I2}>) ← basket(T,I1), basket(T,I2).
 patterns is the aggregate interfacing the
computation of association rules
 patterns<min_supp, min_conf, trans_set>
Association rules in LDL++
basket(1,fish).
basket(1,bread).
basket(2,bread).
basket(2,milk).
basket(2,onions).
basket(2,fish).
basket(3,bread).
basket(3,orange).
basket(3,milk).
Result of the query ?- rules(X,Y,S,C)
rules({milk},{bread},0.66,1)
i.e. milk => bread [support 0.66, confidence 1]
rules({bread},{milk},0.66,0.66)
rules({fish},{bread},0.66,1)
rules({bread},{fish},0.66,0.66)
Same status for data and induced rules
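The four rules listed above can be checked with a brute-force Python sketch. It is not the Apriori module that DATASIFT calls; the function name one_to_one_rules and the data encoding are assumptions for illustration. Support is the fraction of transactions containing both items, and confidence is that count divided by the number of transactions containing the left-hand item.

```python
from itertools import permutations

# Facts from the slide, encoded as (transaction_id, item) tuples.
basket = [
    (1, "fish"), (1, "bread"),
    (2, "bread"), (2, "milk"), (2, "onions"), (2, "fish"),
    (3, "bread"), (3, "orange"), (3, "milk"),
]


def one_to_one_rules(facts, min_supp, min_conf):
    """Enumerate rules left => right over single items, by exhaustive search."""
    items_by_tid = {}
    for tid, item in facts:
        items_by_tid.setdefault(tid, set()).add(item)
    n_transactions = len(items_by_tid)
    all_items = sorted({item for _, item in facts})

    rules = []
    for left, right in permutations(all_items, 2):
        n_left = sum(1 for items in items_by_tid.values() if left in items)
        n_both = sum(1 for items in items_by_tid.values()
                     if left in items and right in items)
        supp = n_both / n_transactions
        conf = n_both / n_left  # n_left >= 1 because left occurs in the data
        if supp >= min_supp and conf >= min_conf:
            rules.append((left, right, supp, conf))
    return rules


rules = one_to_one_rules(basket, min_supp=0.4, min_conf=0.0)
for left, right, supp, conf in rules:
    print(f"rules({{{left}}},{{{right}}},{supp:.2f},{conf:.2f})")
```

With three transactions, a pair needs to occur in at least two of them to reach 40% support. The exact support is 2/3, which the slide shows truncated to 0.66.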
Reasoning on item hierarchies
[Figure: item hierarchy with levels Department, Sector, Family, Product (item)]
Which rules survive/decay up/down the item hierarchy?
rules_at_level(I,pattern<S,C,Itemset>) ←
itemset_abstraction(I,Tid,Itemset).
preserved_rules(Left,Right) ←
rules_at_level(I,Left,Right,_,_),
rules_at_level(I+1,Left,Right,_,_).
Business rules: reasoning on promotions
Which rules are established by a promotion?
interval(before, -, 3/7/1998).
interval(promotion, 3/8/1998, 3/30/1998).
interval(after, 3/31/1998, +).
established_rules(Left, Right) ←
not rules_partition(before, Left, Right, _, _),
rules_partition(promotion, Left, Right, _, _),
rules_partition(after, Left, Right, _, _).
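The established_rules business rule amounts to set algebra over the rules mined in each interval. A minimal Python sketch follows, assuming the rules of each partition are already available as sets of (left, right) pairs; the rule contents are invented purely for illustration and do not come from the tutorial.

```python
def established_rules(rules_partition):
    """Rules absent before the promotion but present during and after it.

    rules_partition maps an interval name ('before', 'promotion', 'after')
    to the set of (left, right) rules mined on that interval's transactions.
    """
    during_and_after = rules_partition["promotion"] & rules_partition["after"]
    # The negated literal in the rule body becomes a set difference.
    return during_and_after - rules_partition["before"]


# Hypothetical mined rules per interval (illustrative data only).
example = {
    "before": {("pasta", "cheese")},
    "promotion": {("pasta", "cheese"), ("wine", "cheese"), ("beer", "chips")},
    "after": {("pasta", "cheese"), ("wine", "cheese")},
}

established = established_rules(example)
print(established)
```

In this toy data only wine => cheese is established: pasta => cheese already held before the promotion, and beer => chips did not persist after it.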
Business rules: temporal reasoning
How does rule support change along time?
[Figure: line chart of Support (y axis, 0 to 35) by day (x axis, 25/11/97 to 05/12/97) for the rules Pasta => Fresh Cheese 14, Bread Subsidiaries => Fresh Cheese 28, Biscuits => Fresh Cheese 14, Fresh Fruit => Fresh Cheese 14 and Frozen Food => Fresh Cheese 14]
Decision tree construction in DATASIFT
construct training and test set using rules
training_set(P,Case_list) ← ...
test_tuple(ID,F1,...,F20,Rec,Act_rec,CAR) ← ...
construct classifier using external call to C5.0
tree_rules(Tree_name,P,PF,MC,BO,Rule_list) ←
training_set(P,Case_list),
tree_induction(Case_list,PF,MC,BO,Rule_list).
(tree_induction is the external call; Rule_list is the induced classifier)
parameters
pruning factor PF
misclassification costs MC
boosting BO
Putting decision trees at work
prediction of target variable
prediction(Tree_name,ID,CAR,Predicted_CAR) ←
tree_rules(Tree_name, _ ,_ , _ , Rule_list),
test_subject(ID, F1, …, F20, _, _, CAR),
classify(Rule_list ,[F1, …, F20], Predicted_CAR).
Model evaluation: actual recovery of a classifier
(=sum recovery of tuples classified as positive)
actual_recovery(Tree_name,sum<Actual_Recovery>) ←
prediction(Tree_name, ID, _ , pos),
test_subject(ID, F1, …, F20, _,Actual_Recovery, _).
(sum<Actual_Recovery> is an aggregate)
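The actual_recovery measure can be illustrated with a short Python sketch. The dictionaries standing in for the prediction and test_subject relations, and all the numbers in them, are hypothetical; only the logic (sum the actual recovery of the subjects classified as positive) follows the rule.

```python
def actual_recovery(predictions, test_subjects):
    """Sum the actual recovery over test subjects that the classifier labels 'pos'.

    predictions:   dict mapping subject ID -> predicted class ('pos' or 'neg')
    test_subjects: dict mapping subject ID -> actual recovery amount
    """
    return sum(amount for subject_id, amount in test_subjects.items()
               if predictions.get(subject_id) == "pos")


# Hypothetical test set and classifier output (illustrative values only).
test_subjects = {1: 100.0, 2: 0.0, 3: 250.0, 4: 40.0}
predictions = {1: "pos", 2: "pos", 3: "neg", 4: "pos"}

total = actual_recovery(predictions, test_subjects)
print(total)
```

Subject 3 carries the largest recovery but is classified negative, so it does not contribute. This is the kind of effect the measure is meant to expose when comparing classifiers.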
Combining decision trees
Model conjunction:
tree_conjunction(T1,T2,ID,CAR,pos) ←
prediction(T1, ID, CAR, pos),
prediction(T2, ID, CAR, pos).
tree_conjunction(T1, T2, ID, CAR, neg) ←
test_subject(ID, F1, …, F20, _, _, CAR),
~ tree_conjunction(T1, T2, ID, CAR, pos).
More interesting combinations readily
expressible:
e.g. meta learning (Chan and Stolfo 93)
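Model conjunction can be sketched in Python as below. The per-subject prediction dictionaries are assumed stand-ins for the prediction relation of two trees, and the values are made up for illustration.

```python
def tree_conjunction(pred1, pred2, subject_ids):
    """Label a subject 'pos' only when both classifiers predict 'pos'.

    Every other test subject is labelled 'neg', mirroring the negated
    second rule of the LDL++ definition.
    """
    return {
        sid: "pos" if pred1.get(sid) == "pos" and pred2.get(sid) == "pos" else "neg"
        for sid in subject_ids
    }


# Hypothetical predictions of two decision trees on three test subjects.
tree1 = {1: "pos", 2: "pos", 3: "neg"}
tree2 = {1: "pos", 2: "neg", 3: "neg"}

combined = tree_conjunction(tree1, tree2, subject_ids=[1, 2, 3])
print(combined)
```

The conjunction is more conservative than either tree alone: it can only reduce the set of subjects classified as positive.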
We proposed ...
 a KDD methodology for audit planning:
define an audit cost model
monitor training- and test-set construction
assess the quality of a classifier
tune classifier construction to specific policies
 and its formalization in a prototype logic-based
KDSE, supporting:
integration of deduction and induction
integration of domain and induced knowledge
separation of conceptual and implementation level
Module outline
Data analysis and KD Support Environments
Data mining technology trends
from tools …
… to suites …
… to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
A data mining research agenda
1. Integration with data warehouse and relational DB
2. Scalable, parallel/distributed and incremental mining
3. Data mining query language optimization
4. Multiple, integrated data mining methods
5. KDSE and methodological support for vertical appl.
6. Interactive, exploratory data mining environments
7. Mining on other forms of data:
 spatio-temporal databases
 text
 multimedia
 web
Scale up!
 Scaling up existing algorithms (AI, ML, IR)
Association rules
Correlation rules
Causal relationship
Classification
Clustering
Bayesian networks
Background knowledge & constraints
 Incorporating background knowledge and constraints into existing data mining techniques
 Double benefit for DMQL: semantics and optimization!
Traditional algorithms
 Disproportionate computational cost for selective users
 Overwhelming volume of potentially useless results
Need user-controlled focus in mining process
 Association rules containing certain items
 Sequential patterns containing certain patterns
 Classification?
Vertical applications of data mining
More success stories needed!
Current data mining systems lack a thick
semantic layer (similarly to the early
relational database systems)
Verticalized data mining systems, e.g.
Market analysis systems
Fraud detection systems
Automated mining and interactive mining: how
far are they?
Autofocus data mining
policy options, business rules
selection of data mining function
fine parameter tuning of mining function
DBMS coupling
Tight-coupling with DBMS
Most data mining algorithms are based on flat
file data (i.e. loose-coupling with DBMS)
A set of standard data mining operators
(e.g. sampling operator)
Web mining – why?
No standards on the web, enormous blob of
unstructured and heterogeneous info
Very dynamic
 One new WWW server every 2 hours
 5 million documents in 1995
320 million documents in 1998
Indices get obsolete very quickly
Better means needed for discovering
resources and extracting knowledge
Web mining: challenges
 Today's search engines are plagued by problems
– the abundance problem:
99% of info of no interest to 99% of people!
– limited coverage of the Web
– limited query interface based on keyword-oriented search
– limited customization to individual users
Web mining
 Web content mining
mining what Web search engines find
Web document classification (Chakrabarti et al 99)
warehousing a Meta-Web (Zaïane and Han 98)
intelligent query answering in Web search
 Web usage mining
Web log mining: find access patterns and trends (Zaiane et
al 98)
customized user tracking and adaptive sites (Perkowitz et
al 97)
 Web structure mining
discover authoritative pages: a page is important if
important pages point to it
(Chakrabarti et al 99, Kleinberg 98)
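The idea of authoritative pages can be made concrete with a small hubs-and-authorities iteration in the spirit of (Kleinberg 98). This is a simplified sketch, not the algorithm as published: the toy link graph, the fixed iteration count and the function name hits are all assumptions for illustration.

```python
import math


def hits(links, iterations=50):
    """Iteratively score pages as hubs and authorities on a link graph.

    links maps each page to the list of pages it points to.
    """
    nodes = sorted(set(links) | {q for targets in links.values() for q in targets})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages pointing to it.
        auth = {n: sum(hub[p] for p in nodes if n in links.get(p, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # A page's hub score is the sum of the authorities of pages it points to.
        hub = {n: sum(auth[q] for q in links.get(n, ())) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical four-page web: a and b both point to c, and c points to d.
links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
hub, auth = hits(links)
print(max(auth, key=auth.get))
```

In this toy graph page c ends up as the top authority because two pages link to it, while a and b, which nobody links to, receive an authority score of zero.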
Warehousing a Meta-Web (Zaïane & Han 98)
 Meta-Web: summarizes the contents and structure
of the Web, which evolves with the Web
 Layer0: the Web itself
 Layer1: the lowest layer of the Meta-Web
an entry: a Web page summary, including class,
time, URL, contents, keywords, popularity, weight,
links, etc.
 Layer2 and up: summary/classification/clustering
 Meta-Web is warehoused and incrementally updated
 Querying and mining is performed on or assisted by
meta-Web
 Is it feasible/sustainable? Is XML of any help?
Meta-Web
from Jiawei Han’s panel talk @ SIGMOD99
[Figure: layered Meta-Web. Layer0 at the bottom; Layer1: Generalized Descriptions; ...; Layern: More Generalized Descriptions]
Weblog mining
 Web servers register a log entry for every single access.
 A huge number of accesses (hits) are registered and collected in
an ever-growing web log.
 Why warehousing/mining web logs?
Enhance server performance by learning access patterns of
general or particular users (guess what user will ask next and
pre-cache!)
Improve system design of web applications
Identify potential prime advertisement locations
 Greatest peril: the privacy pitfall
See e.g. (Markoff 99) the rise of the Little Brother.
Some web mining references
M. Perkowitz and O. Etzioni. Adaptive sites: Automatically learning from user access patterns. In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
J. Pitkow. In search of reliable usage data on the WWW. In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. In Proc. 3rd Conf. Human Factors & the Web, Denver, Colorado, June 1997.
O. R. Zaiane, M. Xin, and J. Han. Discovering Web access patterns and trends by applying OLAP and data mining technology on Web logs. In Proc. Advances in Digital Libraries Conf. (ADL'98), pp. 19-29, Santa Barbara, CA, April 1998.
O. R. Zaiane and J. Han. Resource and knowledge discovery in global information systems: a preliminary design and experiment. In Proc. KDD’95, pp. 331-336, 1995.
O. R. Zaiane and J. Han. WebML: querying the world-wide web for resources and knowledge. In Proc. Int. Workshop on Web Information and Data Management (WIDM98), pp. 9-12, 1998.
S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, et al. Mining the web’s link structure. COMPUTER, 32:60-67, 1999.
S. Chakrabarti, B. E. Dom, and P. Indyk. Enhanced hypertext classification using hyperlinks. In Proc. 1998 ACM-SIGMOD, pp. 307-318, 1998.
J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. ACM-SIAM Symp. on Discrete Algorithms, 1998.
J. Markoff. The Rise of Little Brother. Upside, Apr. 1999; http://www.upside.com/texis/mvm/story?id=36d4613c0
Pisa KDD Lab references
F. Giannotti and G. Manco. Making Knowledge Extraction and Reasoning Closer. In Proc. PAKDD 2000, The Fourth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, 2000.
F. Giannotti and G. Manco. Querying Inductive Databases via Logic-Based User Defined Aggregates. In Proc. PKDD'99, The Third Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases, Prague, Sept. 1999.
F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. Using Data Mining Techniques in Fiscal Fraud Detection. In Proc. DaWak'99, First Int. Conf. on Data Warehousing and Knowledge Discovery, Florence, Italy, Sept. 1999.
F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. A Classification-based Methodology for Planning Audit Strategies in Fraud Detection. In Proc. KDD-99, ACM-SIGKDD Int. Conf. on Knowledge Discovery & Data Mining, San Diego (CA), August 1999.
F. Giannotti, G. Manco, D. Pedreschi and F. Turini. Experiences with a logic-based knowledge discovery support environment. In Proc. 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (SIGMOD'99 DMKD), Philadelphia, May 1999.
F. Giannotti, M. Nanni, G. Manco, D. Pedreschi and F. Turini. Integration of Deduction and Induction for Mining Supermarket Sales Data. In Proc. PADD'99, Practical Application of Data Discovery, Int. Conference, London, April 1999.
F. Giannotti, G. Manco, M. Nanni, D. Pedreschi. Nondeterministic, Nonmonotonic Logic Databases. IEEE Trans. on Knowledge and Data Engineering, 2000.
F. Giannotti, M. Nanni, G. Manco, D. Pedreschi and F. Turini. Using deduction for intelligent data analysis. Submitted, 2000. http://www-kdd.di.unipi.it/
P. Becuzzi, M. Coppola, S. Ruggieri and M. Vanneschi. Parallelisation of C4.5 as a particular divide and conquer computation. In Proc. 3rd Workshop on High Performance Data Mining, Springer-Verlag LNCS, 2000.