Advanced Topics in Data Mining and Research Directions
CSE5610 Intelligent Software Systems
Semester 1, 2006
www.monash.edu.au

Outline
• Mining Different Data Types
  – Spatial, Temporal, Time Series, Data Streams, Multimedia, XML, Web, Text etc.
• Distributed Data Mining (DDM)
• Mobile & Ubiquitous Data Mining (UDM)
• Data Mining E-Services
• Anytime, Anywhere Data Mining E-Services

Generations of Data Mining
• Four Generations of Data Mining Systems (Robert Grossman)
• First Generation
  – Stand-alone, centralised, single algorithm
• Second Generation
  – Integration with databases, support for high dimensionality and complex data types
• Third Generation
  – Distribution and heterogeneity
• Fourth Generation
  – Support for mining embedded, mobile and ubiquitous data sources

Distributed Data Mining
• Inherently distributed data
• Multinational corporations (MNCs) + global markets
• => Physical/geographical separation of users from the data sources
• The traditional data mining model, which assumes the co-location of users, data and computational resources, is inadequate

Distributed Data Mining (DDM)
• The inherent distribution of data and other resources as a result of organisations being distributed.
• The large volumes of data, the transfer of which results in exorbitant communication costs.
• The need to mine heterogeneous data, the integration of which is both non-trivial and expensive.
• The performance and scalability bottlenecks of data mining.

Distributed Data Mining (DDM)
• DDM = Data Mining (DM) + Knowledge Integration (KI)
• DM: performing traditional knowledge discovery at each distributed data site.
• KI: merging the results generated from the individual sites into a body of cohesive and unified knowledge.
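
The DDM = DM + KI decomposition above can be illustrated with a minimal sketch. The slides do not prescribe a particular algorithm, so everything here is an assumption made for illustration: the mining task (frequent single items), the function names `mine_local` and `integrate`, and the toy transactions. Each site sends only its item counts, not its raw data, which is one way to avoid the communication costs mentioned earlier.

```python
from collections import Counter

def mine_local(transactions):
    """DM step (hypothetical helper): count, at one site, how many
    local transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return counts, len(transactions)

def integrate(local_results, min_support):
    """KI step (hypothetical helper): merge per-site counts and keep the
    items whose global support reaches min_support."""
    total_counts = Counter()
    total_transactions = 0
    for counts, n in local_results:
        total_counts.update(counts)      # adds the per-site counts together
        total_transactions += n
    return {
        item: count / total_transactions
        for item, count in total_counts.items()
        if count / total_transactions >= min_support
    }

# Toy data standing in for two distributed data sites (illustrative only).
site_a = [["milk", "bread"], ["milk"], ["bread", "eggs"]]
site_b = [["milk", "eggs"], ["milk", "bread"]]

local_results = [mine_local(site_a), mine_local(site_b)]  # runs at each site
frequent = integrate(local_results, min_support=0.5)      # runs at the integrator
print(frequent)
```

Because every site reports counts for all of its items, the merged supports are exact in this sketch. A real system that ships only locally frequent patterns would need an extra round to reconcile items that are frequent globally but not at every site.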

Parallel Data Mining (PDM)
• Principal distinction between DDM and parallel DM: parallel mining involves parallel processors, with or without shared memory
• Parallel data mining also includes the development of parallel versions of traditional data mining techniques.
• The two can be integrated, e.g. DecisionCentre

DDM – Algorithms & Architectures
• Research in distributed data mining can be divided into two broad categories [Fu01]:
• Data Mining Algorithms
  – focus on efficient techniques for knowledge integration
• Distributed Data Mining Architectures
  – focus on the development of distributed data mining architectures
  – emphasise the processes and technologies that support the construction of software systems to perform distributed data mining

Taxonomy of DDM Architectures
[Figure: tree diagram. Distributed Data Mining Systems Architectures split into Client-Server and Agents; Agents split into Stationary and Mobile (branch label: Self-directed migration).]

Classification – DDM Systems
DDM Architectural Models and the DDM Systems that follow them:
• Client-server: DecisionCentre [CDG99], IntelliMiner [PaS99, PaS01], InterAct [PaD02]
• Agents (Mobile Agent, Stationary Agent): JAM [SPT97], Infosleuth [UMG98, MUU99], BODHI [KPH99], Papyrus [Ram98], PADMA [KHS97a, KHS97b]

Client-Server DDM
[Figure: client-server DDM model. Labels: User (Laptop, PC, Workstation), Data Mining Request, Data Mining Results, Data Mining Server, Data Transfer, Data Server 1, Data Server 2.]

Mobile Agent Model for DDM
[Figure: mobile agent DDM model. Labels: Users (PC, Workstation, Laptop), Task Controlling Agent, Knowledge Integration Agent, Data Mining Result Agent, Agent System, Directory Service, Data Mining Agents, Data Resource Agents, Data Server 1, Data Server 1.]

Hybrid Model for DDM
[Figure: hybrid DDM model. Labels: Agent Centre, DDM Server, Optimiser, Agent, Client-Server, Data Source 1, Data Source 2, Data Source n.]

Ubiquitous Data Mining

Ubiquitous Data Mining (UDM)
• Mining data in a resource-constrained environment to support the time-critical information needs of mobile users
• Typical Characteristics
  – Mobile user: frequent disconnections
  – Handheld device
    > Resource constraints: memory, battery, processor, screen real estate
  – Time critical: real-time and on-line
  – Data streams
• Example Scenarios
• Many Challenges

Current Research
• Kargupta’s Group
  – MobiMine
• @CSSE, Monash Univ.
  – AgentUDM
  – Adaptive, cost-efficient and light-weight data mining techniques for data streams
    > Mohamed Medhat
    > LWC, LWF & LWClass
    > Watch this space!!!

Data Mining E-Services
• “…data analysis and mining functions themselves will be offered as business intelligence e-services that accept operational data from clients and return models or rules” (Umesh Dayal, 2001)
• Why?
  – Knowledge is a key resource
  – Cost of data mining infrastructure

Data Mining E-Services
• Current Commercial Landscape
  – Several ASPs: DigiMine, Information Discovery, WhiteCross Systems, ListAnalyst.com etc.
  – Mode of Operation
• Hybrid Model & Data Mining ASPs
  – Optimise response time
    > Leads to improved throughput
  – QoS estimation
  – Location preferences of clients

Anytime, Anywhere Data Mining E-Services

My Thoughts
• Data is a commodity, analysis is a service
• Access anytime, anywhere
• By anyone…
  – From large corporations to small businesses to individuals
• From home buyers to mobile salespersons to grocery shoppers…

My Thoughts
• A preliminary model for delivery
  – Datacentric Grids
[Figure: preliminary delivery model. Labels: Compute New Model Request + User Data; Compute New Model Request + User Data + User Computation; Private Datacentric Grid; Compute New Model Request + Remote User Data; Compute New Model Request + User Computation; Model Repository; Mining Algorithms; Compute New Model Request; Model Query; Data 1, Data 2, Data n; Data Repository; Mobile Agent Management System; High Performance Servers; Datacentric Grid Management Module.]

References
• http://www.csse.monash.edu.au/projects/MobileComponents/projects/dame/
• http://www.csse.monash.edu.au/~shonali/research.html
• http://www.csee.umbc.edu/~hillol/DDMBIB/
• http://www.csee.umbc.edu/~hillol/diadic.html
• http://www.csse.monash.edu.au/~mgaber/main.html