Query Relaxation Using Malleable Schemas
Xuan Zhou, Julien Gaugaz, Wolf-Tilo Balke, Wolfgang Nejdl
L3S Research Center, Leibniz University Hanover, Germany
Presented by Aaron Stewart
BYU CS 652, Spring 2009

Problem
(Figure: + = ?)
• Multiple data sources
• Unmatched schemas

Approach
1. Malleable schemas
2. Discover correlations
3. Relax user queries

Malleable Schemas
• Allow duplicate fields
• Allow related fields
(Figure: first_name, sur_name and name)
(Figure: body and contents)

In Practice: Tables
• “…a malleable schema… contains imprecise and overlapping definitions of attributes or relationships.”
• “In this way, a malleable schema can capture such heterogeneous data structures as in Figure 1.”
(Figure: attributes are database fields (columns); entities are database records (rows). Equivalently: distinct tables.)

Query Relaxation Planning
• Multiple queries
  – Different columns or tables
  – As few queries as possible
• Exponential number of relaxed queries
  – Evaluate in order of precision
  – Stop at k results
(Figure: a relaxed attribute with child attributes A1 and A2)
• A “relaxed query always yields better precision than its child queries, so that it should always be evaluated prior to its child queries”

Parent/Child Relationship
• We would think A is the parent, and A1 and A2 are the children, but…
• Put them in order of correlation probability
  – If P(A|A1) > P(A|A2)
  – Then A => A1 => A2

Query Relaxation Planning
(Figure)

Query Relaxation Experiments
• Data sets
  – IMDB Movies
  – Amazon.com DVDs and VHS videos

Results
(Three slides of result figures)

Analysis
• Strengths
  – Handles mixed schemas
  – Well-designed algorithms (IMO)
• Future work
  – Speed
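The planning rules on the Query Relaxation Planning and Parent/Child Relationship slides (evaluate relaxed queries in order of precision, parents before children, stop at k results) can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the paper's algorithm. The correlation table CORRELATION and its probabilities are invented; the helper names alternatives, relaxed_queries and top_k are hypothetical; and the precision of a relaxed query is assumed to be the product of the correlation probabilities of its attribute substitutions. Each attribute's alternatives are ordered by P(A|Ai), following the slide's A => A1 => A2 rule, and each child query advances one attribute to its next alternative. A child therefore never scores higher than its parent, so a best-first search over a heap always evaluates parents first.

```python
import heapq

# HYPOTHETICAL correlation table: CORRELATION[a][b] stands in for P(a | b),
# the probability that attribute b carries the same information as a.
# The approach discovers correlations from data; these values are made up.
CORRELATION = {
    "name": {"sur_name": 0.8, "first_name": 0.5},
    "body": {"contents": 0.7},
}


def alternatives(attr):
    """Attribute choices for `attr`, most precise first (the original scores 1.0)."""
    alts = [(1.0, attr)] + [(p, b) for b, p in CORRELATION.get(attr, {}).items()]
    return sorted(alts, key=lambda pair: -pair[0])


def relaxed_queries(query):
    """Yield (score, relaxed_query) pairs in non-increasing score order.

    `query` maps attribute -> value. A state is a tuple of indices, one per
    attribute, into that attribute's list of alternatives. Its score is
    ASSUMED to be the product of the chosen alternatives' probabilities.
    """
    attrs = list(query)
    options = [alternatives(a) for a in attrs]

    def score(idx):
        s = 1.0
        for opts, i in zip(options, idx):
            s *= opts[i][0]
        return s

    start = (0,) * len(attrs)  # the original, unrelaxed query
    heap = [(-score(start), start)]  # max-heap via negated scores
    seen = {start}
    while heap:
        neg, idx = heapq.heappop(heap)
        relaxed = {options[j][i][1]: query[a] for j, (a, i) in enumerate(zip(attrs, idx))}
        yield -neg, relaxed
        # Children: move exactly one attribute to its next-best alternative.
        for j in range(len(attrs)):
            if idx[j] + 1 < len(options[j]):
                child = idx[:j] + (idx[j] + 1,) + idx[j + 1:]
                if child not in seen:
                    seen.add(child)
                    heapq.heappush(heap, (-score(child), child))


def top_k(query, records, k):
    """Evaluate relaxed queries best-first; stop as soon as k records match."""
    results, matched = [], set()
    for score, relaxed in relaxed_queries(query):
        for rid, rec in enumerate(records):
            if rid not in matched and all(rec.get(a) == v for a, v in relaxed.items()):
                matched.add(rid)
                results.append((score, rec))
                if len(results) == k:
                    return results
    return results


if __name__ == "__main__":
    # Records with heterogeneous attributes, as a malleable schema allows.
    records = [
        {"name": "Lynch", "title": "Mulholland Drive"},
        {"sur_name": "Lynch", "title": "Eraserhead"},
        {"first_name": "Lynch", "title": "Dune"},
    ]
    for score, rec in top_k({"name": "Lynch"}, records, k=2):
        print(score, rec)
```

With k = 2 the demo prints the first two records with scores 1.0 and 0.8. The lower-precision first_name relaxation (score 0.5) is never evaluated, which is the point of ordering parents before children: the search stops as soon as k results are found.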