Efficient Evaluation of Queries with Mining Predicates
by Chaudhuri, Narasayya, and Sarawagi
CSci 8701 – Group G07
Charles Braxmeier

Problem Statement
- Find more efficient ways to execute queries where one or more of the predicates are the results of data mining decisions
- Example Query: Find fans who went to a Minnesota hockey game last year who may be football fans as well

Contributions of the Paper
- Great detail about different types of mining models (clustering, decision trees, etc.)
- Discussion regarding the different ways mining predicate(s) can be joined within a query
- Analysis on the experiments done to test theories regarding query optimization based on the structure of mining model

Key Concepts
- Upper Envelope Predicate
- Tightness of the Query’s Predicates
- Mining Model
  - Decision Tree
  - Naïve Bayes Classifiers
    - Bottom-up
    - Top-Down

Key Concepts (cont’d.)
- Mining Model (continued)
  - Clustering
    - Centroid-based
    - Model-based
    - Boundary-based

Validation Methodology
- Experimentation based on the theories posed regarding query reorganization
- Twenty (20) different data sets used
- Data sets vary based on:
  - Data set size
  - Number of dimensions in data set
  - Size of data set used to train the mining model

Validation Methodology (cont’d.)
- Analysis of Experiment Results
  - 65% of query access paths affected by rearranging the query based on the upper envelope predicate
  - Average run-time decreased by 65% by rearranging the query based on the upper envelope predicate
  - More variance in run-time decrease than access paths affected

Assumptions
- Clustering can be evaluated via Bayes classifiers
  - Therefore, not too much background info on clustering and how its experiments were different than Bayes experiments
- Continuous data sets are split into discrete data sets to assist in mining predictions
  - Not necessarily realistic
  - Example, latitude / longitude

Possible Revisions to Paper
- Spend more time on analysis of experiments and results, rather than the background info
  - Background information took up approximately 60% of the paper

Questions?
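
Backup: Upper Envelope Sketch (illustrative, not taken from the paper)
The upper envelope predicate listed under Key Concepts can be illustrated with a small decision tree. The sketch below is only an assumption-laden example: the tree, the attribute names (games_attended, age), the class labels, and every helper name are hypothetical. It collects the root-to-leaf path conditions for one class label and renders them as a predicate on ordinary columns, which is the kind of condition a query optimizer could use to pick an access path before the mining model is applied.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf


@dataclass
class Node:
    attr: str          # attribute tested at this node
    threshold: float   # split point
    left: "Tree"       # followed when row[attr] <= threshold
    right: "Tree"      # followed when row[attr] > threshold


Tree = Union[Leaf, Node]


def predict(tree, row):
    """Walk the tree for one row (a dict of attribute values)."""
    while isinstance(tree, Node):
        tree = tree.left if row[tree.attr] <= tree.threshold else tree.right
    return tree.label


def upper_envelope(tree, label, path=()):
    """Return a list of conjunctions, one per leaf that predicts `label`.

    Each conjunction is a tuple of (attr, op, threshold) conditions.
    """
    if isinstance(tree, Leaf):
        return [path] if tree.label == label else []
    left = upper_envelope(tree.left, label,
                          path + ((tree.attr, "<=", tree.threshold),))
    right = upper_envelope(tree.right, label,
                           path + ((tree.attr, ">", tree.threshold),))
    return left + right


def satisfies(row, envelope):
    """True if the row meets at least one conjunction of the envelope."""
    def holds(cond):
        attr, op, t = cond
        return row[attr] <= t if op == "<=" else row[attr] > t
    return any(all(holds(c) for c in conj) for conj in envelope)


def to_sql(envelope):
    """Render the envelope as a WHERE-clause fragment."""
    if not envelope:
        return "FALSE"
    parts = []
    for conj in envelope:
        body = " AND ".join(f"{a} {op} {t}" for a, op, t in conj) or "TRUE"
        parts.append(f"({body})")
    return " OR ".join(parts)


# Hypothetical model: "is this hockey fan likely a football fan?"
tree = Node("games_attended", 3,
            Leaf("no"),
            Node("age", 40, Leaf("yes"), Leaf("no")))

envelope = upper_envelope(tree, "yes")
print(to_sql(envelope))  # (games_attended > 3 AND age <= 40)
```

In this toy case the derived predicate matches the model's "yes" region exactly, because it is read directly off the tree's paths. For other model types listed in the deck (naïve Bayes, clustering), an envelope would generally be a looser superset, which is where the tightness concept comes in.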