Database Issues in
Smart Homes
Pervasive Intelligent
Environments
Spring 2004
March 2, 2004
CRESCENT
TCU Dept. of Computer Science
Topics: Lecture 3
• Preparing for prediction & decision
making: Data Mining/KDD
• An example of some of the issues
we’ve discussed
– “Towards Sensor Database Systems”,
Bonnet, Gehrke, Seshadri
Data mining taken from
Elmasri & Navathe, 4th
edition
Data Warehouses
(1 more thing)
• Repositories for data mining activities
– Aggregates/summaries of data help efficiency
• Optimized for decision-support, not
transaction processing
• Definition (Elmasri, page 900)
– “A subject-oriented, integrated, non-volatile,
time-variant collection of data in support of
management’s decisions”
• For smart homes, replace “management” with “smart home agents”
Data Mining Definition
• Discovery of new information in terms of patterns
or rules from vast amounts of data
• Extracts patterns that can’t readily be found by
asking the right questions (queries)
– TOO MUCH DATA FOR HUMANS
• Emerged from
– Artificial Intelligence: machine learning, neural nets,
genetic algorithms
– Statistics
– Operations Research
6 STEPS TO DM:
some may be done as part of warehouse creation
• Data selection -- pick the data needed
• Data cleansing
– Fix bad data (e.g., spelling, zip codes)
– Hard to deal with missing, erroneous, conflicting,
redundant data
• Enrichment
– Add data (e.g., age, gender, income)
• Data transformation
– Aggregate (e.g., zip codes → regions)
• Data mining
• Reporting on discovered knowledge (K)
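The steps above can be sketched as a small pipeline. This is only an illustrative sketch: the records, the lookup tables and every function name are made up, and the "mining" step is a trivial total per region rather than a real mining algorithm.

```python
# Hypothetical raw purchase records: (zip_code, item, amount). All values are made up.
RAW = [
    ("76129", "beer", 12.0),
    ("76129", "diapers", 20.0),
    ("7612", "beer", 9.0),      # malformed zip code
    ("75201", "bread", 3.0),
    ("75201", None, 4.0),       # missing item
]

# Assumed lookup tables used for enrichment and transformation.
ZIP_INCOME = {"76129": "medium", "75201": "high"}
ZIP_TO_REGION = {"76129": "Fort Worth", "75201": "Dallas"}


def select(records):
    """Data selection: keep only the fields needed (here, all three)."""
    return [(z, item, amt) for z, item, amt in records]


def cleanse(records):
    """Data cleansing: drop records with an unknown zip code or a missing item."""
    return [r for r in records if r[0] in ZIP_TO_REGION and r[1] is not None]


def enrich(records):
    """Enrichment: add an (assumed) income band for each zip code."""
    return [(z, item, amt, ZIP_INCOME[z]) for z, item, amt in records]


def transform(records):
    """Data transformation: aggregate zip codes into regions."""
    return [(ZIP_TO_REGION[z], item, amt, inc) for z, item, amt, inc in records]


def mine(records):
    """Stand-in for data mining: total spend per region."""
    totals = {}
    for region, _item, amt, _income in records:
        totals[region] = totals.get(region, 0.0) + amt
    return totals


prepared = transform(enrich(cleanse(select(RAW))))
report = mine(prepared)   # the "reporting" step would present this result
```

Keeping each step as its own function mirrors the slide: some of them (selection, cleansing) could run once when the warehouse is built, while the rest run per mining task.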
Types of results
• Association rules
– Buy diapers → buy lots of beer
• Sequential patterns
– Buy house → buy furniture within months
• Classification trees
– Types of buyers (upscale, bargain-conscious, …)
• Why do it?
– Make more money
– Science & medicine
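To make the association-rule example concrete, here is a minimal sketch that scores the rule "diapers → beer" with support and confidence. Those two measures are the standard ones for association rules but are not stated on the slides, and the baskets below are invented for illustration.

```python
# Made-up market baskets, one set of items per purchase.
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"bread", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)


rule_support = support({"diapers", "beer"}, BASKETS)
rule_confidence = confidence({"diapers"}, {"beer"}, BASKETS)
```

With these baskets, 2 of 5 contain both items and 3 of 5 contain diapers, so the rule holds in two out of every three diaper purchases.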
DM/KDD Goals
• Find patterns to predict future
events
• Find major groupings
– Groupings of buyers, stars, diseases …
• Find which group something belongs
to
– creditworthiness
What are we learning?
• Association rules
• Classification hierarchies
• Clustering
• Sequential patterns
• Patterns within time series
• Type of result, inputs & algorithms vary
• Often interested in some combination of these types of K
Clustering
– Unsupervised learning techniques
• Training samples are unclassified
• Vs. supervised learning (classification)
– Drug categories for depression
– Categories of TV viewers
– Categories of buyers (likely, unlikely)
– Categories of households?
• Single male, mother/children, conventional (M/D/kids), DINKs.
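As one illustration of unsupervised learning, the sketch below groups unlabeled values with one-dimensional k-means. The slides do not name a clustering algorithm, so k-means is an assumption chosen for brevity, and the daily TV viewing hours are invented.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster numbers around the given starting centroids (1-D k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical daily TV viewing hours for six households; no labels are given.
hours = [0.5, 1.0, 1.5, 6.0, 7.0, 8.0]
centers, groups = kmeans_1d(hours, centroids=[0.0, 10.0])
```

No sample carries a category; the "light viewer" and "heavy viewer" groups emerge from the data alone, which is the contrast with supervised classification.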
Sequential patterns
• Detecting associations among events
with certain temporal relationships
• Example:
– Cardiac bypass for blocked arteries
– AND within 18 months, high blood urea
– THEN kidney failure likely in next 18
months
• Particularly important in smart homes
Sequential Pattern Discovery
• Sequence of itemsets
– Grocery store purchases by 1 person
(3 itemsets)
• {soy milk, bread, chocolate}, {bananas,
chocolate}, {lettuce, tomato, chocolate}
• 2 Subsequences
– {soy milk, bread, chocolate}, {bananas, chocolate}
– {bananas, chocolate}, {lettuce, tomato, chocolate}
Sequential pattern discovery
• The support for a sequence S is the % of the given
set U of sequences of which S is a subsequence.
– That is: in what fraction of the sequences does S show up?
• Find all subsequences from the given sequence
sets that have a user-defined minimum support.
• The sequence S1, S2, … Sn predicts that a customer
who buys itemset S1 is likely to buy itemset S2,
then S3, …
• Prediction support based on frequency of this
sequence in the past
• Many research issues to create good algos
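The support definition above can be sketched as follows. The sketch assumes that S is a subsequence of a customer's sequence when its itemsets are contained, in order, in that customer's itemsets, with gaps allowed; the slides do not spell out this detail. The purchase data and function names are made up.

```python
def is_subsequence(s, sequence):
    """True if each itemset of s is a subset of some itemset of sequence, in order."""
    pos = 0
    for itemset in s:
        # Advance to the next itemset in sequence that contains this one.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def sequence_support(s, sequences):
    """Percentage of the sequences in U of which s is a subsequence."""
    hits = sum(1 for seq in sequences if is_subsequence(s, seq))
    return 100.0 * hits / len(sequences)


# U: one purchase sequence per (hypothetical) customer.
U = [
    [{"soy milk", "bread", "chocolate"}, {"bananas", "chocolate"},
     {"lettuce", "tomato", "chocolate"}],
    [{"bread"}, {"bananas", "chocolate"}],
    [{"chocolate"}, {"lettuce"}],
    [{"bananas", "chocolate"}, {"bread"}],
]

S = [{"bread"}, {"bananas", "chocolate"}]
pct = sequence_support(S, U)
```

Order matters: the last customer bought the same itemsets in the opposite order and does not count toward the support of S. A miner would keep S only if `pct` meets the user-defined minimum support.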
Patterns within time series
• Finding 2 patterns that occur over
time
– 2003 stock prices of Choice Homes and
Home Depot
– 2 products show same sales pattern in
summer but different one in winter
– Solar magnetic wind patterns may
predict earth atmospheric changes
Time series pattern discovery
• Time series are sequences of events
– Event could be a transaction (closing
daily stock price)
– Look at sequences over n days, or
– Longest period in which change is no
greater than 1%
• Comparing
– Must define similarity measures
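Two of the ideas above can be sketched in a few lines: the longest period with no change greater than 1%, and a similarity measure for comparing two series. The sketch assumes "change" means day-over-day relative change and uses Euclidean distance as the measure; both are assumptions, and the prices are invented.

```python
import math


def longest_stable_run(prices, max_change=0.01):
    """Length, in days, of the longest run whose day-over-day change stays within max_change."""
    if not prices:
        return 0
    best = current = 1
    for prev, cur in zip(prices, prices[1:]):
        if abs(cur - prev) / prev <= max_change:
            current += 1
        else:
            current = 1          # the run is broken; start counting again
        best = max(best, current)
    return best


def euclidean_distance(a, b):
    """One possible similarity measure: smaller distance means more similar series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical daily closing prices.
closing = [100.0, 100.5, 101.0, 105.0, 105.5, 105.0, 104.5, 110.0]
run = longest_stable_run(closing)
```

Euclidean distance is sensitive to scale and time shifts, so a real comparison of two stocks would likely normalize the series first or use a different measure.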
Other approaches in DM/KDD
• Neural nets
– Infer a function from a set of examples
• Non-parametric curve-fitting
• Interpolates to solve new problems
– Supervised & unsupervised algorithms
– Classification
– Time-series
– Can’t see what it learned (not declarative)
Other approaches in DM/KDD
• Genetic algorithms
– Set up
• Representation (strings over an alphabet)
• Evaluation (fitness) function
• Parameters: # of generations, cross-over
rate, mutation rate, etc.
– Randomized (probabilistic operators),
parallel search over search space
– Used for problem solving and clustering
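The set-up listed above maps onto code fairly directly. The sketch below is a toy genetic algorithm, not a method from the lecture: the representation is a bit string, the fitness function simply counts 1s, and all parameter values, the selection scheme (keep the top half, carry over the two best) and the function names are assumptions.

```python
import random


def fitness(bits):
    """Evaluation function: number of 1s in the string (a toy objective)."""
    return bits.count("1")


def evolve(length=20, pop_size=30, generations=40,
           crossover_rate=0.7, mutation_rate=0.02, seed=0):
    """Run a small genetic algorithm and return the fittest string found."""
    rng = random.Random(seed)
    # Representation: strings over the alphabet {0, 1}.
    pop = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]                       # carry the two best over unchanged
        parents = pop[: pop_size // 2]           # select from the fitter half
        while len(next_pop) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < crossover_rate:    # probabilistic cross-over
                cut = rng.randrange(1, length)
                child = a[:cut] + b[cut:]
            else:
                child = a
            # Probabilistic mutation: flip each bit with a small probability.
            child = "".join(
                ("1" if c == "0" else "0") if rng.random() < mutation_rate else c
                for c in child
            )
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


best = evolve()
```

The population is the "parallel" part of the search: many candidate strings are evaluated and recombined in each generation rather than following a single path through the search space.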
Sensor DB Article
• Design
– Distributed vs warehouse approach
– Sensor data
• Measurement uncertainty, communications failures
• Data representation
• Data model
– Relational +
• Sensor descriptions, including location
– Special rep for sensor sequences
• ADT attribute represents sensor data as output of
ADT functions
Sensor DB Article: Queries
• Sample queries/characteristics (2nd page)
and sample extended SQL (3.1)
• Long running (continuous) queries
– An incremental query retrieves all data over a
t-second interval and is repeated every t seconds;
the result is the union of these runs
– WHERE $every() in SQL
• Aggregates over time windows
• Virtual joins for ADT (slow) functions
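The long-running query idea can be sketched outside SQL. This is an illustrative model only and does not reproduce the article's extended SQL or its $every() clause: the readings, field layout and function names are all hypothetical.

```python
# Hypothetical sensor readings: (timestamp_seconds, sensor_id, temperature).
READINGS = [
    (0.5, "s1", 20.0),
    (1.2, "s2", 21.0),
    (2.7, "s1", 22.0),
    (3.1, "s2", 23.0),
    (5.9, "s1", 24.0),
]


def incremental_query(readings, start, t):
    """One run of the query: all readings with start <= timestamp < start + t."""
    return [r for r in readings if start <= r[0] < start + t]


def continuous_query(readings, t, end_time):
    """Repeat the incremental query every t seconds and take the union of the runs."""
    result = []
    start = 0.0
    while start < end_time:
        result.extend(incremental_query(readings, start, t))
        start += t
    return result


def window_averages(readings, t, end_time):
    """Aggregate over time windows: average temperature per t-second window."""
    averages = {}
    start = 0.0
    while start < end_time:
        window = incremental_query(readings, start, t)
        if window:
            averages[start] = sum(r[2] for r in window) / len(window)
        start += t
    return averages


everything = continuous_query(READINGS, t=2.0, end_time=6.0)
averages = window_averages(READINGS, t=2.0, end_time=6.0)
```

Because the windows do not overlap, the union of the runs contains each reading exactly once, and each aggregate only needs the data from its own window.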