Efficient Evaluation of Queries with Mining Predicates
by Chaudhuri, Narasayya, and Sarawagi
CSci 8701 – Group G07
Charles Braxmeier
Problem Statement
Find more efficient ways to execute queries in which one or more predicates depend on the prediction of a data mining model
Example query: find fans who attended a Minnesota hockey game last year and who may also be football fans
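
To make the example query concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the toy rows, the stand-in model `predicts_football_fan`, and the cheap filter `upper_envelope` are invented for illustration and are not taken from the paper. It only shows the general idea that a simple predicate implied by the model can cut down how many rows the model has to score.

```python
# Hypothetical toy rows: (fan_id, age, attended_hockey_last_year, tailgates)
FANS = [
    (1, 22, True, True),
    (2, 67, True, False),
    (3, 35, False, True),
    (4, 41, True, True),
    (5, 19, True, False),
]


def predicts_football_fan(age, tailgates):
    """Stand-in for a mining model; these rules are invented."""
    return tailgates and age < 50


def upper_envelope(age, tailgates):
    """Cheap predicate that is true for every row the model accepts."""
    return tailgates


def naive_query(rows):
    """Score every hockey attendee with the model."""
    model_calls, result = 0, []
    for fan_id, age, hockey, tailgates in rows:
        if hockey:
            model_calls += 1
            if predicts_football_fan(age, tailgates):
                result.append(fan_id)
    return result, model_calls


def envelope_query(rows):
    """Filter with the cheap envelope first, then score the survivors."""
    model_calls, result = 0, []
    for fan_id, age, hockey, tailgates in rows:
        if hockey and upper_envelope(age, tailgates):
            model_calls += 1
            if predicts_football_fan(age, tailgates):
                result.append(fan_id)
    return result, model_calls
```

Both functions return the same fans, but the second scores fewer rows. In a real database the gain would come mainly from the optimizer using the simple predicate to choose a better access path, which this sketch does not model.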

Contributions of the Paper
Detailed description of the different types of mining models (clustering, decision trees, etc.)
Discussion of the different ways mining predicates can be joined within a query
Analysis of the experiments run to test the proposed query optimizations, which depend on the structure of the mining model

Key Concepts
Upper Envelope Predicate
Tightness of the Query’s Predicates
Mining Model
  Decision Tree
  Naïve Bayes Classifiers
    Bottom-up
    Top-down
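
For a decision tree, an upper envelope for a class can be read off the tree as the disjunction of the path conditions leading to leaves with that class label. The sketch below illustrates this in Python under assumptions of my own: the tuple encoding of the tree, the attribute names, and the helper names `envelope_for_class` and `to_sql` are all hypothetical.

```python
# Hypothetical tree encoding:
#   internal node = (attribute, threshold, left, right); take left when
#   row[attribute] <= threshold, otherwise right
#   leaf = class label (a string)
TREE = ("age", 30,
        ("income", 50, "no", "yes"),
        "yes")


def envelope_for_class(node, target, path=()):
    """Collect one conjunction of conditions per leaf labeled `target`."""
    if isinstance(node, str):
        return [list(path)] if node == target else []
    attr, threshold, left, right = node
    return (
        envelope_for_class(left, target, path + ((attr, "<=", threshold),))
        + envelope_for_class(right, target, path + ((attr, ">", threshold),))
    )


def to_sql(disjuncts):
    """Render the disjunction of conjunctions as a WHERE-clause fragment."""
    return " OR ".join(
        "(" + " AND ".join(f"{a} {op} {v}" for a, op, v in conj) + ")"
        for conj in disjuncts
    )


print(to_sql(envelope_for_class(TREE, "yes")))
# prints: (age <= 30 AND income > 50) OR (age > 30)
```

The resulting fragment contains only simple comparisons on data columns, so it could be added to the query alongside the mining predicate.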

Key Concepts (cont’d.)

Mining Model (continued)
  Clustering
    Centroid-based
    Model-based
    Boundary-based

Validation Methodology
Experiments based on the proposed theories about query reorganization
Twenty (20) different data sets used. Data sets vary by:
  Data set size
  Number of dimensions in the data set
  Size of the data set used to train the mining model
Validation Methodology (cont’d.)

Analysis of Experiment Results
  65% of query access paths were affected by rearranging the query based on the upper envelope predicate
  Average run time decreased by 65% after rearranging the query based on the upper envelope predicate
  More variance in run-time decrease than in access paths affected
Assumptions

Clustering can be evaluated via Bayes classifiers
  Therefore, the paper gives little background on clustering or on how its experiments differed from the Bayes experiments
Continuous data sets are split into discrete data sets to assist in mining predictions
  Not necessarily realistic
  Example: latitude / longitude
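
The discretization concern can be illustrated with a short Python sketch. The cut points in `LAT_CUTS` and the helper `discretize` are hypothetical and are not how the paper bins its data; the point is only that fixed bins can separate values that are nearly identical.

```python
from bisect import bisect_right

# Hypothetical latitude cut points (degrees); values below the first cut
# fall in bin 0, values at or above the last cut fall in the final bin.
LAT_CUTS = [43.5, 45.0, 46.5, 48.0]


def discretize(value, cut_points):
    """Return the index of the bin that `value` falls into."""
    return bisect_right(cut_points, value)


print(discretize(44.98, LAT_CUTS))  # prints: 1
print(discretize(45.02, LAT_CUTS))  # prints: 2
```

Two locations only 0.04 degrees apart land in different bins, which is one way the discrete view can be unrealistic for data such as latitude and longitude.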
Possible Revisions to Paper

Spend more time on the analysis of experiments and results than on background information
  Background information took up approximately 60% of the paper
Questions?