Efficient Evaluation of Queries with Mining Predicates
by Chaudhuri, Narasayya, and Sarawagi
CSci 8701 – Group G07
Charles Braxmeier

Problem Statement
• Find more efficient ways to execute queries in which one or more predicates are predictions made by a data mining model
• Example query: find fans who went to a Minnesota hockey game last year and who may also be football fans
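To make the problem concrete, the example query can be sketched in Python. This is a minimal illustration only: the fan records, the column names, and the `predict_sport` rule are hypothetical stand-ins for a real table and a trained mining model. It shows the naive plan, in which the model is invoked for every row that passes the ordinary predicate.

```python
# Hypothetical fan records; the column names are illustrative only.
fans = [
    {"name": "Ann", "attended_hockey_last_year": True, "age": 34, "tailgates": True},
    {"name": "Bob", "attended_hockey_last_year": True, "age": 61, "tailgates": False},
    {"name": "Cy", "attended_hockey_last_year": False, "age": 29, "tailgates": True},
]

def predict_sport(fan):
    """Stand-in for a trained mining model (hypothetical rule, not from the paper)."""
    return "football" if fan["tailgates"] and fan["age"] < 50 else "other"

def likely_football_fans(rows):
    """Naive evaluation: the model runs on every row that passes the ordinary predicate."""
    return [
        r["name"]
        for r in rows
        if r["attended_hockey_last_year"]     # ordinary relational predicate
        and predict_sport(r) == "football"    # mining predicate
    ]

result = likely_football_fans(fans)
```

Because `predict_sport` is opaque to a query optimizer, it cannot be used to choose an index or access path, which is the inefficiency the problem statement targets.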

Contributions of the Paper
• Detailed treatment of several types of mining models (clustering, decision trees, etc.)
• Discussion of the different ways mining predicates can be combined within a query
• Analysis of experiments testing how query optimization depends on the structure of the mining model

Key Concepts
• Upper Envelope Predicate
• Tightness of the Query’s Predicates
• Mining Model
  • Decision Tree
  • Naïve Bayes Classifiers
    • Bottom-up
    • Top-down
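The upper envelope idea can be illustrated with a small sketch. Assume an upper envelope is an ordinary predicate over the data columns that every row accepted by the mining predicate also satisfies. For a decision tree such an envelope can be exact: it is the OR of the root-to-leaf path conditions for the leaves labelled with the target class. The tree, column names, and helper functions below are hypothetical and are not the paper's own algorithm or notation.

```python
# A tiny hand-built decision tree (hypothetical; not taken from the paper).
# Internal node: (column, threshold, left_subtree, right_subtree), where the
# left branch is taken when row[column] <= threshold. Leaf: a class label.
TREE = ("age", 40,
        ("income", 50, "other", "football"),
        ("age", 60, "football", "other"))

def predict(tree, row):
    """Walk the tree to a leaf and return its class label."""
    while not isinstance(tree, str):
        column, threshold, left, right = tree
        tree = left if row[column] <= threshold else right
    return tree

def envelope(tree, label, path=()):
    """Return one conjunction of tests per leaf labelled `label`.

    Each conjunction is a tuple of (column, op, threshold) tests; the
    envelope is the OR of these conjunctions.
    """
    if isinstance(tree, str):
        return [path] if tree == label else []
    column, threshold, left, right = tree
    return (envelope(left, label, path + ((column, "<=", threshold),))
            + envelope(right, label, path + ((column, ">", threshold),)))

def satisfies(conjunctions, row):
    """Evaluate the envelope using only simple column comparisons."""
    def test(column, op, threshold):
        return row[column] <= threshold if op == "<=" else row[column] > threshold
    return any(all(test(*t) for t in conj) for conj in conjunctions)

football_envelope = envelope(TREE, "football")
```

Since the envelope consists only of comparisons on `age` and `income`, an optimizer could in principle use it to pick indexes before the model is ever applied.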

Key Concepts (cont’d.)
• Mining Model (continued)
  • Clustering
    • Centroid-based
    • Model-based
    • Boundary-based
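As a rough illustration of the centroid-based case, a mining predicate of the form "cluster(row) = label" can be sketched as a nearest-centroid test. The centroids, labels, and the choice of plain Euclidean distance are assumptions made for this example only.

```python
import math

# Hypothetical centroids of a trained centroid-based clustering model.
CENTROIDS = {"A": (0.0, 0.0), "B": (10.0, 10.0)}

def assign_cluster(point):
    """Return the label of the nearest centroid (plain Euclidean distance assumed)."""
    return min(CENTROIDS, key=lambda label: math.dist(point, CENTROIDS[label]))

def in_cluster(point, label):
    """Mining predicate of the form 'cluster(row) = label'."""
    return assign_cluster(point) == label
```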

Validation Methodology
• Experiments test the proposed theories about query reorganization
• Twenty (20) different data sets used. Data sets vary by:
  • Data set size
  • Number of dimensions in the data set
  • Size of the data set used to train the mining model

Validation Methodology (cont’d.)
• Analysis of Experiment Results
  • 65% of query access paths were affected by rearranging the query based on the upper envelope predicate
  • Average run time decreased by 65% when the query was rearranged based on the upper envelope predicate
• The run-time decrease varied more than the share of access paths affected

Assumptions
• Clustering can be evaluated via Bayes classifiers
  • Therefore, the paper gives little background on clustering or on how its experiments differed from the Bayes experiments
• Continuous data sets are split into discrete data sets to assist in mining predictions
  • Not necessarily realistic
  • Example: latitude / longitude
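The discretization assumption can be sketched as follows. The bin boundaries are hypothetical, chosen only to show what happens to a continuous attribute such as latitude when it is mapped to discrete ranges.

```python
from bisect import bisect_right

# Hypothetical bin boundaries for latitude, in degrees (an assumption for illustration).
LATITUDE_EDGES = [44.0, 45.0, 46.0]

def latitude_bin(latitude):
    """Map a continuous latitude to a discrete bin index (0 .. len(LATITUDE_EDGES))."""
    return bisect_right(LATITUDE_EDGES, latitude)

near_a = latitude_bin(44.999)  # just south of the 45.0 boundary
near_b = latitude_bin(45.001)  # just north of the same boundary
far = latitude_bin(45.999)     # almost a full degree further north
```

In this sketch two nearly identical latitudes fall into different bins, while two points almost a degree apart share one. This may be part of the concern behind "not necessarily realistic".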
Possible Revisions to Paper
• Spend more time on the analysis of experiments and results, and less on background information
  • Background information took up approximately 60% of the paper

Questions?