Efficient Evaluation of Queries
with Mining Predicates
by Chaudhuri, Narasayya, and Sarawagi
CSci 8701 – Group G07
Charles Braxmeier
Problem Statement
Find more efficient ways to execute queries in which one or more of the predicates are the results of data mining
Example query: Find fans who attended a Minnesota hockey game last year and who may also be football fans
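To make the problem concrete, here is a minimal sketch of the naive evaluation strategy for the example query: the mining model is invoked on every row that passes the ordinary predicate. The `Fan` fields, the sample rows, and the `predicts_football_fan` rule are all hypothetical stand-ins for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Fan:
    name: str
    attended_hockey_last_year: bool
    age: int
    season_tickets: int


def predicts_football_fan(fan: Fan) -> bool:
    """Hypothetical stand-in for a trained mining model (assumption)."""
    return fan.age < 40 and fan.season_tickets >= 1


def naive_query(fans):
    """Apply the relational predicate, then call the model on each survivor."""
    model_calls = 0
    result = []
    for fan in fans:
        if not fan.attended_hockey_last_year:
            continue  # ordinary predicate: attended a hockey game last year
        model_calls += 1  # the model is treated as a black box per row
        if predicts_football_fan(fan):
            result.append(fan.name)
    return result, model_calls


# Hypothetical sample data.
FANS = [
    Fan("Ann", True, 31, 2),
    Fan("Bob", True, 55, 0),
    Fan("Cal", False, 28, 1),
    Fan("Dee", True, 45, 3),
]

names, calls = naive_query(FANS)
print(names, calls)
```

The point of the sketch is the `model_calls` counter: without further help, the optimizer cannot use indexes on `age` or `season_tickets`, because the model's logic is hidden inside the predicate.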
Contributions of the Paper
Detailed coverage of different types of mining models (clustering, decision trees, etc.)
Discussion of the different ways mining predicates can be joined within a query
Analysis of the experiments done to test theories about query optimization based on the structure of the mining model
Key Concepts
Upper Envelope Predicate
Tightness of the Query’s Predicates
Mining Model
  Decision Trees
  Naïve Bayes Classifiers
  Top-Down
Key Concepts (cont’d.)
Mining Model (continued)
  Clustering
    Model-based
    Boundary-based
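An upper envelope predicate is a predicate over ordinary data columns that every row satisfying the mining predicate also satisfies, so it can be handed to the optimizer as a regular filter. The sketch below illustrates the idea for a decision tree, assuming the envelope for a class label is built as the OR of the root-to-leaf path conditions that end in that label. The tree, column names, and helper functions are hypothetical.

```python
# A tiny decision tree (hypothetical). Internal nodes are
# (column, threshold, left, right); left is taken when row[column] <= threshold.
# Leaves are class labels.
TREE = ("age", 40,
        ("season_tickets", 0, "no", "yes"),
        "no")


def predict(tree, row):
    """Walk the tree from the root to a leaf and return the leaf's label."""
    while not isinstance(tree, str):
        column, threshold, left, right = tree
        tree = left if row[column] <= threshold else right
    return tree


def upper_envelope(tree, label, path=()):
    """Collect the path conditions of every leaf carrying `label`.

    The result is a list of paths (an OR); each path is a tuple of
    (column, op, threshold) conditions (an AND).
    """
    if isinstance(tree, str):
        return [path] if tree == label else []
    column, threshold, left, right = tree
    return (upper_envelope(left, label, path + ((column, "<=", threshold),))
            + upper_envelope(right, label, path + ((column, ">", threshold),)))


def satisfies(envelope, row):
    """Evaluate the envelope using only column comparisons, not the model."""
    def holds(cond):
        column, op, threshold = cond
        return row[column] <= threshold if op == "<=" else row[column] > threshold
    return any(all(holds(c) for c in path) for path in envelope)


ENVELOPE = upper_envelope(TREE, "yes")
print(ENVELOPE)
```

Because the envelope consists only of simple comparisons on `age` and `season_tickets`, a query optimizer could in principle use indexes on those columns. For a decision tree built this way the envelope matches the model's prediction exactly; for other model types the envelope may be looser, which is where tightness matters.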
Validation Methodology
Experiments tested the proposed theories about query reorganization
Twenty (20) different data sets were used. The data sets vary by:
  Data set size
  Number of dimensions in the data set
  Size of the data set used to train the mining model
Validation Methodology (cont’d.)
Analysis of Experiment Results
 65%
of query access paths affected by rearranging the query based on the upper
envelope predicate
 Average run-time decreased by 65% by rearranging the query based on the upper
envelope predicate
More variance in run-time decrease than access
paths affected
Clustering can be evaluated via Bayes
  Therefore, the paper gives little background on clustering or on how its experiments differed from the Bayes experiments
Continuous data sets are split into discrete data sets to assist in mining predictions
  Not necessarily realistic
  Example: latitude / longitude
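The concern about discretization can be illustrated with a short sketch. It assumes simple equal-width binning, which is only one possible scheme; the slides do not say which scheme is used. The function name and the bin counts are hypothetical.

```python
def equal_width_bin(value, low, high, bins):
    """Map a continuous value in [low, high] to a bin index in 0..bins-1."""
    if not low <= value <= high:
        raise ValueError("value outside the expected range")
    width = (high - low) / bins
    # Clamp so that value == high falls into the last bin.
    return min(int((value - low) / width), bins - 1)


# Latitude split into 18 bins of 10 degrees each (assumed bin count).
south = equal_width_bin(49.99, -90, 90, 18)
north = equal_width_bin(50.01, -90, 90, 18)
print(south, north)
```

Two points only 0.02 degrees of latitude apart land in different bins, while points almost 10 degrees apart can share one. That is one way to see why discretizing a coordinate such as latitude or longitude may not reflect how the data actually behaves.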
Possible Revisions to Paper
Spend more time on analysis of experiments and results, rather than the background info
  Background information took up approximately 60% of the paper