Data Mining in
Ubiquitous Distributed
Environments
Assaf Schuster
Technion
SEBD Tutorial, June 06
Purpose of this Tutorial
• Convergence of distributed systems and
data mining
• Evolving field, no systematic coverage of
all aspects
• Will present: issues, challenges, examples
of algorithmic approaches, ideas,
accuracy vs. overhead tradeoffs
• Will not present: formal treatment, proofs,
details, technology, systems, hardware…
Ubiquitous Computing Systems
• Various Systems: Grid, P2P, WSN, MANET
• Several similar technological aspects
– Scale, aim for at least 10K (10M in P2P)
• partial failure, heterogeneity, dynamic state / data
– Multi-user, a 10K system serves >= 1K users
• resource sharing, caching, consistency
– Lots of distributed data
• streams, incremental, anytime, local filtering, locality filtering
– Cooperation of self-motivated parties
• trust management, security, privacy, competitive market, self
vs. global optimizations
– Stringently resource limited
• in-network computing, storage distribution
• Non-similar technological aspects
Ubiquitous Data Mining
• For the community
– E.g., P2P recommendations based on e-interaction
• For Security
– E.g., identify and avert DoS attack (Overpeer and
P2P poisoning)
• For Administration
– E.g., misconfiguration detection system
(DataMiningGrid demo)
• For Data Cleansing
– E.g., in-network outliers detection (and removal) in
WSN
• DM Using HPC
– E.g., idle-cycle batch systems for high-complexity
analysis tasks (Superlink-Online)
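The data cleansing bullet above mentions in-network outlier detection in a WSN. A minimal sketch of one way this could look, assuming each sensor can see its neighbors' latest readings and uses a robust median/MAD rule; the function name, the `threshold` and `min_scale` parameters, and the rule itself are illustrative assumptions, not the method used in the tutorial:

```python
from statistics import median

def local_outliers(readings, neighbors, threshold=3.0, min_scale=0.5):
    """Flag nodes whose reading is far from the median of their neighbors.

    readings:  dict node -> latest sensor value
    neighbors: dict node -> list of neighboring nodes (hypothetical topology)
    A node is flagged when |value - median| exceeds `threshold` times the
    neighborhood's median absolute deviation (floored at `min_scale`).
    """
    flagged = set()
    for node, value in readings.items():
        hood = [readings[n] for n in neighbors.get(node, []) if n in readings]
        if len(hood) < 2:
            continue  # too few neighbors to judge locally
        med = median(hood)
        mad = median([abs(v - med) for v in hood])
        scale = max(mad, min_scale)  # avoid flagging tiny deviations when MAD is ~0
        if abs(value - med) / scale > threshold:
            flagged.add(node)
    return flagged

# Four sensors in one fully connected neighborhood; "d" reports a spike.
readings = {"a": 20.1, "b": 20.4, "c": 19.8, "d": 55.0}
neighbors = {n: [m for m in readings if m != n] for n in readings}
suspects = local_outliers(readings, neighbors)
```

Because each decision uses only neighborhood data, a flagged reading can be dropped before it is forwarded, which saves communication toward the sink.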
Technological Challenges: Algorithms
• Scalable and resource limited distributed DM
– Algorithms for 10K peers, algorithms limited to two
messages per peer per hour, synchronization-less,
iteration-less, bag-of-tasks, dynamic divisibility, etc.
• Monitoring
– Distributed, local filtering
• Success, Correctness, and Consistency
– Partial failure, message dropping, heterogeneity, etc.
can yield all sorts of trouble
• Reusability, incrementality
– E.g., multi-class classifiers, multi-metric k-means
clustering, etc.
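The monitoring bullet above names distributed local filtering. A minimal sketch of the idea, assuming the monitored condition is "the sum of all peers' values exceeds a global threshold" and that the threshold is split evenly among peers; the function names and the message accounting are simplifying assumptions for illustration:

```python
def make_local_thresholds(global_threshold, n_peers):
    """Split the global threshold evenly; other splits are possible."""
    return [global_threshold / n_peers] * n_peers

def monitor_round(values, local_thresholds, global_threshold):
    """Return (alarm, messages) for one round of local filtering.

    If no peer exceeds its local threshold, the sum cannot exceed the
    global threshold, so nobody communicates. Otherwise the coordinator
    collects every value (counted here as one message per peer) and
    checks the true sum.
    """
    violators = [i for i, (v, t) in enumerate(zip(values, local_thresholds))
                 if v > t]
    if not violators:
        return False, 0
    return sum(values) > global_threshold, len(values)

thresholds = make_local_thresholds(100, 4)
quiet = monitor_round([10, 20, 5, 24], thresholds, 100)        # no communication
false_alarm = monitor_round([30, 10, 10, 10], thresholds, 100)  # local violation only
real_alarm = monitor_round([40, 30, 20, 20], thresholds, 100)   # global violation
```

The tradeoff is the one stated at the start of the tutorial: the filter never misses a global violation, but a single noisy peer can trigger a full, costly collection round.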
Technological Challenges: Systems
• Exploitation & HCI
– Lay user (parameterless) DM, interactive DM
– DM-based autonomous ubiquitous systems
• Security, Fraud, and Privacy
– Authorization, public-key-infrastructure, trust
management, data pollution
• Longevity of DM jobs
– Resource sharing, non-dedicated resources
• Communication patterns
– Esp. reliability and addressability. Are these
problems best solved by suitable algorithms?
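One way to read the question above is that an algorithm can be designed to tolerate unreliable communication rather than depend on a reliable transport. A minimal sketch, assuming pairwise gossip averaging in which a lost exchange is simply skipped by both peers; the function name, parameters, and the atomic-drop model are assumptions for illustration, and a half-applied exchange would need extra care in a real system:

```python
import random

def gossip_average(values, exchanges, drop_prob=0.0, seed=0):
    """Estimate the global mean by repeated pairwise averaging.

    Each exchange picks two random peers and replaces both values with
    their average. With probability `drop_prob` the exchange is lost and
    both peers keep their values, so the total is still preserved.
    No global synchronization or iteration barrier is needed.
    """
    rng = random.Random(seed)
    vals = list(values)
    for _ in range(exchanges):
        i, j = rng.sample(range(len(vals)), 2)
        if rng.random() < drop_prob:
            continue  # exchange dropped; nothing changes
        avg = (vals[i] + vals[j]) / 2.0
        vals[i] = vals[j] = avg
    return vals

start = [3.0, 9.0, 27.0, 1.0, 14.0, 6.0, 8.0, 12.0]
final = gossip_average(start, exchanges=2000, drop_prob=0.3)
```

Dropped exchanges slow convergence but do not bias the result, since every completed exchange preserves the sum of the values.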