D3M: Domain-Driven Data Mining
An Overview of Domain-Driven Data Mining: Toward Actionable Knowledge Discovery (AKD)
Longbing Cao
Faculty of Engineering and Information Technology
University of Technology, Sydney, Australia
The Smart Lab: datamining.it.uts.edu.au
Outline
- Why Do We Need D3M
- What Is D3M
- The D3M Framework
- D3M Theoretical Underpinnings
- D3M Research Issues
- D3M Applications
- D3M References
15 December 2008
Cao, L: D3M at DDDM2008 Joint with ICDM2008
Why Do We Need D3M
- A common scenario in deploying data mining algorithms
  - I find something interesting!
    - “Many patterns are found.”
    - “They satisfy the technical metric thresholds well.”
  - What do business people say?
    - “So what?”
    - “They are just commonsense.”
    - “I don’t care about them.”
    - “I don’t understand them.”
    - “How can I use them?”
    - “Am I wrong? What can I do better for my business, mate?”
Why Do We Need D3M
- Where is something wrong?
  - The gap:
    - academic objectives vs. business goals
    - technical outputs vs. business expectations
  - Macro-level methodological and fundamental issues
    - Academic: technical interest; innovative algorithms and patterns
    - Practitioner: social, environmental and organizational factors and their impact; getting a problem solved properly
  - Micro-level technical and engineering issues
    - System dynamics, the system environment, and interaction within a system
    - Business processes, organizational factors, and constraints
    - Involvement of human and domain knowledge
- An example: the problem with association mining
  - Existing association rule mining algorithms are specifically designed to find strong patterns with high predictive accuracy or correlation.
  - Frequent patterns, however, tend to be commonsense knowledge, whereas business users are eager to discover new and hidden patterns in databases.
  - Many patterns are found.
  - How can business people take over the discovered associations seamlessly and turn them into operationalizable actions?
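To make the “many patterns, so what?” problem concrete, here is a minimal Python sketch, an illustration rather than part of the original slides. It enumerates single-item association rules over hypothetical toy baskets and keeps every rule that passes support and confidence thresholds. The function name `mine_rules`, the data and the thresholds are assumptions. Production miners such as Apriori or FP-growth also handle larger itemsets.

```python
from itertools import permutations

def mine_rules(transactions, min_support, min_confidence):
    """Enumerate single-item rules a -> b and keep those passing both
    technical thresholds. Returns (a, b, support, confidence, lift) tuples."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions containing every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a, b in permutations(items, 2):
        sup_ab = support({a, b})
        if sup_ab < min_support:
            continue
        confidence = sup_ab / support({a})
        if confidence < min_confidence:
            continue
        # Lift = confidence / P(b); 1.0 means a and b co-occur as if independent.
        lift = confidence / support({b})
        rules.append((a, b, sup_ab, confidence, lift))
    return rules

# Hypothetical toy baskets: bread and milk appear in every basket.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk", "butter"},
    {"bread", "milk", "eggs", "butter"},
]
rules = mine_rules(baskets, min_support=0.5, min_confidence=0.6)
for a, b, s, c, l in rules:
    print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

All six surviving rules clear both thresholds, yet each has a lift of 1.0. The items co-occur exactly as often as independence would predict, which is the kind of technically valid but commonsense output that business users dismiss.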
What Is D3M
- Next-generation data mining methodologies, frameworks, algorithms, evaluation systems, tools and decision support that
  - cater for the business environment
  - satisfy business needs
  - deliver business-friendly, decision-making rules and actions of solid technical and business significance
  - can be understood and taken over by business people to make decisions
- Aims to promote the paradigm shift from data-centered hidden pattern mining to domain-driven actionable knowledge discovery (AKD)
- Involve and synthesize ubiquitous intelligence:
  - human intelligence,
  - domain intelligence,
  - data intelligence,
  - network intelligence, and
  - organizational and social intelligence
- Meta-synthesis of the above ubiquitous intelligence
The D3M Framework
- AKD-based problem-solving
- Interestingness & actionability
- Conflicts & tradeoff
- A framework for AKD
  - Post-analysis-based AKD
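This sketch assumes that post-analysis-based AKD means mining technically interesting patterns first, then filtering and re-ranking them by business interestingness. The `Pattern` fields and both thresholds are illustrative assumptions, not the framework's actual measures. `confidence` stands in for the technical measure and a hypothetical `profit` for the business measure.

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    rule: str          # human-readable description of the pattern
    confidence: float  # technical interestingness (assumed measure)
    profit: float      # business interestingness (hypothetical field)

def technical_mining(candidates, min_confidence):
    """Step 1: keep patterns that satisfy the technical threshold."""
    return [p for p in candidates if p.confidence >= min_confidence]

def post_analysis(patterns, min_profit):
    """Step 2: drop technically interesting patterns below the business
    threshold and rank the rest by business value, highest first."""
    kept = [p for p in patterns if p.profit >= min_profit]
    return sorted(kept, key=lambda p: p.profit, reverse=True)

# Hypothetical candidate patterns.
candidates = [
    Pattern("A -> B", confidence=0.95, profit=10.0),
    Pattern("C -> D", confidence=0.80, profit=500.0),
    Pattern("E -> F", confidence=0.40, profit=900.0),
    Pattern("G -> H", confidence=0.70, profit=250.0),
]
actionable = post_analysis(technical_mining(candidates, 0.6), min_profit=100.0)
print([p.rule for p in actionable])
```

This prints `['C -> D', 'G -> H']`. `E -> F` is the most profitable pattern, but it never reaches the business step because it fails the technical filter first. In a post-analysis design, the order of the two filters therefore determines what business users get to see.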
D3M Theoretical Underpinnings
- artificial intelligence and intelligent systems,
- behavior informatics and analytics,
- business modeling,
- business process management,
- cognitive sciences,
- data integration,
- human-machine interaction,
- human-centered computing,
- knowledge representation and management,
- machine learning,
- ontological engineering,
- organizational and social computing,
- project management methodology,
- social network analysis,
- statistics,
- system simulation, and so on.
D3M Research Issues
- Data intelligence: deep knowledge in complex data structures; mining in-depth data patterns, and mining structured and informative knowledge in complex data
- Domain intelligence: domain and prior knowledge, business processes/logics/workflows, constraints, and business interestingness; their representation and modeling, and their involvement in KDD
- Human intelligence: empirical and implicit knowledge, expert knowledge and thoughts, group/collective intelligence; human-machine interaction; representation and involvement of human intelligence
- Network intelligence: network-based data, knowledge, communities and resources; information retrieval, text mining, web mining, semantic web, ontological engineering techniques, and web knowledge management
- Social intelligence: organizational/social factors, laws/policies/protocols, trust/utility/benefit-cost; collective intelligence, social network analysis, and social cognition interaction
- Intelligence metasynthesis: synthesize ubiquitous intelligence in KDD, with metasynthetic interaction (m-interaction) as the working mechanism and the metasynthetic space (m-space) as an AKD-based problem-solving system
- How to reach an interest tradeoff
  - Balance between technical and business interests
  - Suppose there are multiple metrics for each aspect
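The slides do not specify how to aggregate several metrics per aspect, so the following is one possible approach, offered as an assumption. First, min-max normalize each metric. Next, average the technical metrics into one score and the business metrics into another. Finally, keep the patterns that no other pattern beats on both scores (the Pareto front). The metric names and pattern values are hypothetical. Every metric is assumed to be higher-is-better, so a cost would need negating first.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def combined_score(patterns, metrics):
    """Equal-weight average of the min-max normalized metrics."""
    columns = [min_max([p[m] for p in patterns]) for m in metrics]
    return [sum(col[i] for col in columns) / len(metrics)
            for i in range(len(patterns))]

def pareto_front(patterns, tech_metrics, biz_metrics):
    """Names of patterns that no other pattern beats on both the combined
    technical score and the combined business score."""
    tech = combined_score(patterns, tech_metrics)
    biz = combined_score(patterns, biz_metrics)
    front = []
    for i, p in enumerate(patterns):
        dominated = any(
            tech[j] >= tech[i] and biz[j] >= biz[i]
            and (tech[j] > tech[i] or biz[j] > biz[i])
            for j in range(len(patterns)) if j != i
        )
        if not dominated:
            front.append(p["name"])
    return front

# Hypothetical patterns; all metrics are "higher is better".
patterns = [
    {"name": "P1", "support": 0.30, "confidence": 0.90, "profit": 100, "saving": 20},
    {"name": "P2", "support": 0.10, "confidence": 0.60, "profit": 900, "saving": 80},
    {"name": "P3", "support": 0.20, "confidence": 0.70, "profit": 300, "saving": 30},
    {"name": "P4", "support": 0.25, "confidence": 0.85, "profit": 600, "saving": 60},
]
front = pareto_front(patterns, ["support", "confidence"], ["profit", "saving"])
print(front)
```

P3 drops out because P4 scores higher on both combined scores. That leaves three tradeoff candidates: P1 (technically strongest), P2 (commercially strongest) and P4 (balanced). When stakeholders can agree on weights, a weighted sum of the two scores is the usual alternative to the Pareto filter.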
- Actionable knowledge discovery through m-spaces
- Acquiring and representing unstructured, ill-structured and uncertain domain/human knowledge
- Supporting the dynamic involvement of business experts and their knowledge/intelligence
- Acquiring and representing expert thinking, such as imaginary thinking and creative thinking in group heuristic discussions during KDD modeling
- Acquiring and representing group/collective interaction behavior and impact emergence
- Building infrastructure that supports the involvement and synthesis of ubiquitous intelligence
D3M Applications
- Real-world data mining
  - Our recent case studies
    - Capital markets
      - actionable trading agents
      - actionable trading strategies
    - Social security
      - activity mining
      - combined mining
Actionable Trading Evidence for Brokerage Firms
- Trading strategy/evidence
- Actionable trading evidence
- Domain factors
- Business interest
- Developing in-depth trading strategy
Activity Mining for Australian Commonwealth Government Debt Prevention
- Impact-targeted activity mining
- Impact-targeted activity mining
  - Frequent impact-targeted activity sequences
  - Impact-contrasted activity sequences
  - Impact-reversed activity sequences
  - Impact-targeted combined association clusters
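The first two pattern types can be illustrated with a small sketch built on simplifying assumptions:

- Each customer is a sequence of activity codes, labeled by whether a debt (the impact) followed.
- Patterns are contiguous runs of a fixed length.
- "Contrast" is simply the difference in support between debt and non-debt sequences.

The activity codes, thresholds and function names are hypothetical. The sketch also ignores the class imbalance that the published approach is designed to handle.

```python
from collections import Counter

def subsequence_support(sequences, length):
    """Fraction of sequences containing each contiguous run of `length`
    activities (each run counted at most once per sequence)."""
    counts = Counter()
    for seq in sequences:
        # Distinct contiguous runs, so a sequence counts each run once.
        counts.update({tuple(seq[i:i + length])
                       for i in range(len(seq) - length + 1)})
    return {run: c / len(sequences) for run, c in counts.items()}

def impact_targeted_patterns(labeled, length, min_support, min_contrast):
    """Return (frequent, contrasted) patterns for the impact class.

    frequent:   runs whose support among impact sequences >= min_support
    contrasted: frequent runs whose support in impact sequences exceeds
                their support in non-impact sequences by >= min_contrast"""
    impact = [seq for seq, has_impact in labeled if has_impact]
    no_impact = [seq for seq, has_impact in labeled if not has_impact]
    sup_impact = subsequence_support(impact, length)
    sup_no_impact = subsequence_support(no_impact, length)
    frequent = {run: s for run, s in sup_impact.items() if s >= min_support}
    contrasted = {}
    for run, s in frequent.items():
        diff = s - sup_no_impact.get(run, 0.0)
        if diff >= min_contrast:
            contrasted[run] = diff
    return frequent, contrasted

# Hypothetical activity sequences; True marks sequences followed by a debt.
labeled = [
    (["a", "b", "c"], True),
    (["a", "b", "d"], True),
    (["b", "c", "a"], True),
    (["a", "b", "c"], False),
    (["a", "b", "d"], False),
    (["d", "a", "c"], False),
]
frequent, contrasted = impact_targeted_patterns(labeled, 2, 0.5, 0.3)
print(sorted(frequent), sorted(contrasted))
```

`a -> b` is frequent among debt sequences but equally frequent among non-debt ones, so it says little about the impact. `b -> c`, by contrast, is both frequent and contrasted.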
- Data intelligence
  - Activity data
  - Itemset imbalance
  - Impact imbalance
  - Seasonal effect
  - Demographic data
  - Transactional data
  - Itemset/tuple selection/construction
- Domain intelligence
  - Business process/event for activity selection
  - Domain knowledge
    - Feature selection
    - Sequence construction
  - Impact target
    - Positive impact
    - Negative impact
    - Multi-level impacts
  - Feature/attribute selection
  - Interestingness definition
  - New pattern structures
- Organizational/social factors
  - Operational/intervention activities
  - Seasonal business requirement/interaction changes
  - Business cost (debt amount/duration)
  - Business benefit (saving/preventing debt amount or reducing debt duration)
  - Deliverable format
- Impact-reversed pattern pair
  - Underlying pattern 1:
  - Derivative pattern 2:
- Impact-targeted combined association clusters

- Conditional impact ratio (Cir)
- Conditional Piatetsky-Shapiro’s (P-S) ratio (Cps)
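The slide formulas for these two measures did not survive extraction. Judging by their names, a plausible reading (an assumption, not a transcription) is that they are the lift and the Piatetsky-Shapiro leverage of a derivative pattern Q on the impact T, computed within the records that match the underlying pattern P:

Cir(Q -> T | P) = Prob(Q, T | P) / (Prob(Q | P) * Prob(T | P))
Cps(Q -> T | P) = Prob(Q, T | P) - Prob(Q | P) * Prob(T | P)

Under that reading, Cir > 1 (equivalently, Cps > 0) means Q makes T more likely among records matching P. The sketch below estimates both measures from hypothetical boolean records. The field names are invented for illustration.

```python
def conditional_measures(records, p, q, t):
    """Return (Cir, Cps) for Q -> T conditioned on the underlying pattern P.

    p, q and t are predicates on a record; probabilities are relative
    frequencies among the records that satisfy P (an assumed definition)."""
    base = [r for r in records if p(r)]
    if not base:
        raise ValueError("no record satisfies the underlying pattern P")
    n = len(base)
    prob_q = sum(1 for r in base if q(r)) / n
    prob_t = sum(1 for r in base if t(r)) / n
    prob_qt = sum(1 for r in base if q(r) and t(r)) / n
    if prob_q == 0 or prob_t == 0:
        raise ValueError("Q or T never occurs with P, so Cir is undefined")
    cir = prob_qt / (prob_q * prob_t)   # conditional lift
    cps = prob_qt - prob_q * prob_t     # conditional leverage (P-S)
    return cir, cps

# Hypothetical records: P = in_group, Q = activity_x, T = debt.
records = [
    {"in_group": True,  "activity_x": True,  "debt": True},
    {"in_group": True,  "activity_x": True,  "debt": True},
    {"in_group": True,  "activity_x": False, "debt": False},
    {"in_group": True,  "activity_x": False, "debt": True},
    {"in_group": False, "activity_x": True,  "debt": False},
]
cir, cps = conditional_measures(
    records,
    p=lambda r: r["in_group"],
    q=lambda r: r["activity_x"],
    t=lambda r: r["debt"],
)
print(f"Cir={cir:.3f}, Cps={cps:.3f}")
```

Among the four records matching P, Q holds in half and T in three quarters. Independence would therefore predict Q and T together in 0.375 of them. They actually co-occur in 0.5, which gives Cir of about 1.333 and Cps = 0.125.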
- Interestingness: technical & business
- The process
- Impact-reversed sequential activity patterns
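Read literally, “impact-reversed” means that an underlying sequence P is mostly followed by one impact (say, a debt), while the derivative sequence P followed by an extra activity Q mostly leads to the opposite outcome. The sketch below searches for such Q under simplifying assumptions that are not from the slides:

- P must be a prefix of the sequence.
- Q is the activity immediately after P.
- "Mostly" means an impact rate on the other side of 50%.

The data, the `min_count` parameter and the function names are hypothetical.

```python
def starts_with(seq, prefix):
    """True if `seq` begins with the activities in `prefix`."""
    return list(seq[:len(prefix)]) == list(prefix)

def impact_rate(labeled, prefix):
    """(number of matching sequences, fraction of them with impact)."""
    outcomes = [impact for seq, impact in labeled if starts_with(seq, prefix)]
    if not outcomes:
        return 0, None
    return len(outcomes), sum(outcomes) / len(outcomes)

def impact_reversed(labeled, underlying, min_count=2):
    """Activities Q for which `underlying` and `underlying + [Q]` fall on
    opposite sides of a 50% impact rate. Returns (Q, rate_P, rate_PQ)."""
    n_p, rate_p = impact_rate(labeled, underlying)
    if n_p == 0:
        return []
    k = len(underlying)
    candidates = {seq[k] for seq, _ in labeled
                  if len(seq) > k and starts_with(seq, underlying)}
    pairs = []
    for q in sorted(candidates):
        n_pq, rate_pq = impact_rate(labeled, list(underlying) + [q])
        if n_pq >= min_count and (rate_p > 0.5) != (rate_pq > 0.5):
            pairs.append((q, rate_p, rate_pq))
    return pairs

# Hypothetical sequences; True marks sequences that ended in a debt.
labeled = [
    (["a", "b"], True),
    (["a", "b"], True),
    (["a", "b", "c"], True),
    (["a"], True),
    (["a", "c"], False),
    (["a", "c"], False),
    (["b", "c"], False),
]
result = impact_reversed(labeled, ["a"])
for q, rate_p, rate_pq in result:
    print(f"P=a: debt rate {rate_p:.2f}; P then {q}: debt rate {rate_pq:.2f}")
```

With this data, `a` alone ends in debt for four of six sequences, while `a` followed by `c` never does, so `c` is reported as the impact-reversing activity. `a` followed by `b` keeps the debt outcome and is not reported.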
- Demographic + transactional combined pattern
D3M References
Books:
- Cao, L., Yu, P.S., Zhang, C., Zhao, Y.: Domain Driven Data Mining. Springer, 2009.
- Cao, L., Yu, P.S., Zhang, C., Zhang, H. (eds.): Data Mining for Business Applications. Springer, 2008.
Workshops:
- Domain-Driven Data Mining 2008, joint with ICDM2008.
- Domain-Driven Data Mining 2007, joint with SIGKDD2007.
Special issues:
- Domain-driven data mining. IEEE Transactions on Knowledge and Data Engineering, 2009.
- Domain-driven, actionable knowledge discovery. IEEE Intelligent Systems, Department, 22(4): 78-89, 2007.
Some relevant papers:
- Cao, L., Zhao, Y., Zhang, H., Luo, D., Zhang, C.: Flexible Frameworks for Actionable Knowledge Discovery. Submitted to IEEE Transactions on Knowledge and Data Engineering.
- Cao, L., Zhang, H., Zhao, Y., Zhang, C.: Combined Mining: Discovering More Informative Knowledge in eGovernment Services. Submitted to ACM TKDD, 2008.
- Cao, L., Dai, R., Zhou, M.: Metasynthesis, M-Space and M-Interaction for Open Complex Giant Systems. Technical report, 2008.
- Cao, L., Ou, Y.: Market Microstructure Patterns Powering Trading and Surveillance Agents. Journal of Universal Computer Science, 2008 (to appear).
- Cao, L., He, T.: Developing Actionable Trading Agents. Knowledge and Information Systems: An International Journal, 2008.
- Cao, L.: Developing Actionable Trading Strategies. In: Intelligent Agents in the Evolution of Web and Applications, Springer, 2008.
- Cao, L., Zhao, Y., Zhang, C.: Mining Impact-Targeted Activity Patterns in Imbalanced Data. IEEE Transactions on Knowledge and Data Engineering, 20(8): 1053-1066, 2008.
- Cao, L., Yu, P., Zhang, C., Zhao, Y., Williams, G.: DDDM2007: Domain Driven Data Mining. ACM SIGKDD Explorations Newsletter, 9(2): 84-86, 2007.
- Cao, L., Zhang, C.: Knowledge Actionability: Satisfying Technical and Business Interestingness. International Journal of Business Intelligence and Data Mining, 2(4): 496-514, 2007.
- Cao, L., Zhang, C.: The Evolution of KDD: Towards Domain-Driven Data Mining. International Journal of Pattern Recognition and Artificial Intelligence, 21(4): 677-692, 2007.
- Cao, L.: Domain-Driven Actionable Knowledge Discovery. IEEE Intelligent Systems, 22(4): 78-89, 2007.
- Cao, L., Zhang, C.: Domain-Driven Data Mining: A Practical Methodology. International Journal of Data Warehousing and Mining (IJDWM), IGI Global, 2(4): 49-65, 2006.
Thank you!
Longbing CAO
Faculty of Engineering and IT
University of Technology, Sydney, Australia
Tel: 61-2-9514 4477
Fax: 61-2-9514 1807
email: [email protected]
Homepage: www-staff.it.uts.edu.au/~lbcao/
The Smart Lab: datamining.it.uts.edu.au