Efficient Evaluation of Queries with Mining Predicates
by Chaudhuri, Narasayya, and Sarawagi
CSci 8701 – Group G07
Charles Braxmeier

Problem Statement
- Find more efficient ways to execute queries in which one or more of the predicates are the results of data mining decisions
- Example query: find fans who went to a Minnesota hockey game last year and who may be football fans as well

Contributions of the Paper
- Detailed descriptions of different types of mining models (clustering, decision trees, etc.)
- Discussion of the different ways mining predicates can be joined within a query
- Analysis of the experiments run to test the theories about query optimization based on the structure of the mining model

Key Concepts
- Upper envelope predicate
- Tightness of the query's predicates
- Mining model
  - Decision tree
  - Naïve Bayes classifiers
    - Bottom-up
    - Top-down

Key Concepts (cont'd.)
- Mining model (continued)
  - Clustering
    - Centroid-based
    - Model-based
    - Boundary-based

Validation Methodology
- Experiments based on the theories posed about query reorganization
- Twenty (20) different data sets used. The data sets vary in:
  - Data set size
  - Number of dimensions in the data set
  - Size of the data set used to train the mining model

Validation Methodology (cont'd.)
- Analysis of experiment results
  - 65% of query access paths were affected by rearranging the query based on the upper envelope predicate
  - Average run time decreased by 65% when the query was rearranged based on the upper envelope predicate
  - The run-time decrease varied more than the share of access paths affected

Assumptions
- Clustering can be evaluated via Bayes classifiers
  - As a result, the paper gives little background on clustering or on how the clustering experiments differed from the Bayes experiments
- Continuous data sets are split into discrete data sets to assist in mining predictions
  - Not necessarily realistic
  - Example: latitude / longitude

Possible Revisions to Paper
- Spend more time on the analysis of experiments and results rather than on background information
  - Background information took up approximately 60% of the paper

Questions?
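
Illustration: upper envelope for a decision tree
The upper envelope predicate listed under Key Concepts can be illustrated for the decision-tree case. The sketch below is a minimal illustration, not the paper's algorithm as published: it assumes a binary tree with numeric threshold splits, and the class names (Leaf, Split), the function names, and the column names (games_attended, age) are all hypothetical. For a given class label it ANDs the conditions along each root-to-leaf path ending in that label and ORs the paths together, producing an ordinary predicate that an optimizer could use to choose an access path.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Split:
    column: str        # hypothetical column name tested at this node
    threshold: float   # rows with column <= threshold go left
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def paths_to_label(node: Node, label: str, prefix: Tuple[str, ...] = ()) -> List[List[str]]:
    """Collect the conditions on every root-to-leaf path that ends in `label`."""
    if isinstance(node, Leaf):
        return [list(prefix)] if node.label == label else []
    left_cond = f"{node.column} <= {node.threshold}"
    right_cond = f"{node.column} > {node.threshold}"
    return (paths_to_label(node.left, label, prefix + (left_cond,))
            + paths_to_label(node.right, label, prefix + (right_cond,)))


def upper_envelope(tree: Node, label: str) -> str:
    """Return a WHERE-clause string implied by 'model predicts label'."""
    paths = paths_to_label(tree, label)
    if not paths:
        return "FALSE"  # no leaf predicts this label
    # AND the conditions within a path, OR across paths.
    return " OR ".join(
        "(" + " AND ".join(p) + ")" if p else "TRUE" for p in paths
    )


# Hypothetical model: is a hockey fan likely to be a football fan too?
tree = Split("games_attended", 3,
             Leaf("no"),
             Split("age", 40, Leaf("yes"), Leaf("no")))

print(upper_envelope(tree, "yes"))
```

A query such as "fans whose predicted class is yes" could then carry the extra predicate returned by upper_envelope(tree, "yes"), so an index on games_attended or age becomes usable before the model itself is applied to each row.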