
Supporting Join Queries

Talk by: Andy Cooke
Collaborators: Alasdair Gray, Lisha Ma, and Werner Nutt
Heriot-Watt University

What queries would users like to ask? (1)

• A continuously executing query that might involve matching tuples across several streams:
  “stream to me average net traffic passing between two ComputingElements (CEs)”
• The query needs to specify the age of tuples that can be matched (a “sliding window”),
  e.g. “consider only tuples no older than 5 min. from now”

Possibly interesting?

What queries would users like to ask? (2)

• A “latest snapshot” query that joins the latest values of keys:
  “return all CEs that Steve is allowed to use” (Resource Broker)
  This query would involve joining tuples from CE tables, VO tables and denied users tables.
  Probably interesting!
• A “history” query involving self-joins and aggregation:
  “what was the growth in net traffic since last week?”
  Possibly interesting?

How can R-GMA answer such queries?

Observation:
• If all the relevant tuples are inside one DBMS, then we can pass the query on to that DBMS query engine. - EASY!
• If there is more than one relevant producer, then our mediator probably needs an execution engine! - HARD!

In any case, we know that some R-GMA users are defining Archivers and querying these directly.

However:
• the local answer may only be a subset of the global answer.
• they may get a wrong answer (if the query involves max, avg, count, etc.)

Answering Joins using Archivers

tables: cpuLoad, discspace
condition: country = ‘britain’

Requirements:
• Complete views (“I publish everything!”)
• “Latest” or “History” query type (so the data is in a database, not a buffer).
• A smart registry (“hmm.. just need to go to 1 Archiver”).
• Tuple matching always needs to take place in the same database, and never across databases.
  e.g. “SELECT * FROM cpuload c, discspace s WHERE c.site = s.site” can easily be answered using site archivers.

Problems with Answering Joins using Archivers

• Archivers can’t access the tuples introduced by LatestProducers and DatabaseProducers.
• A new LatestProducer registers: the Archiver can’t stream from it.
• The mediator needs to mediate between two producers, but doesn’t have a query engine!

If a new LatestProducer is registered, the Archiver cannot access its tuples because LatestProducers can’t answer stream queries.

Answering Joins without Archivers

Query Planning and Execution:
• What are the relevant Producers?
• What sub-queries should we send them?
• How should results be combined and operated on? (need a query engine!) Where?

Possible Query Engines:
• MySQL - dump all the data into MySQL … easy!
• Polar Star (Manchester)? … compatibility?

Conclusions

We could support some “global” join queries quite easily:
• when just one Archiver is enough (needs a smarter Registry)
• suggestions could be given when there isn’t one Archiver available (consumer.getPlan())
• and/or ad hoc joins could be answered (inefficiently) by first loading data into MySQL

But:
• what queries do users want to pose?
• shouldn’t we restrict users to using only StreamProducers?
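
The talk notes that a join matching tuples on site can be answered by site Archivers, whereas an aggregate asked of a single Archiver may be wrong globally. A minimal sketch of both points follows. It assumes two site Archivers modelled as in-memory SQLite databases. The helper make_archiver, the columns host, cpu_load and free_gb, the site names and all tuples are made up for illustration and are not part of R-GMA.

```python
import sqlite3

def make_archiver(cpu_rows, disc_rows):
    """Hypothetical stand-in for an Archiver: one in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cpuload (site TEXT, host TEXT, cpu_load REAL)")
    db.execute("CREATE TABLE discspace (site TEXT, free_gb REAL)")
    db.executemany("INSERT INTO cpuload VALUES (?, ?, ?)", cpu_rows)
    db.executemany("INSERT INTO discspace VALUES (?, ?)", disc_rows)
    return db

# The join from the slides, with explicit columns instead of SELECT *.
JOIN = ("SELECT c.site, c.host, c.cpu_load, s.free_gb "
        "FROM cpuload c, discspace s WHERE c.site = s.site")
AVG = "SELECT AVG(cpu_load) FROM cpuload"

# Made-up tuples published at two sites.
site_a_cpu = [("siteA", "n1", 1.0), ("siteA", "n2", 3.0)]
site_a_disc = [("siteA", 100.0)]
site_b_cpu = [("siteB", "n1", 8.0)]
site_b_disc = [("siteB", 50.0)]

archiver_a = make_archiver(site_a_cpu, site_a_disc)
archiver_b = make_archiver(site_b_cpu, site_b_disc)
# Reference database holding every tuple, used only to compute the global answer.
everything = make_archiver(site_a_cpu + site_b_cpu, site_a_disc + site_b_disc)

join_a = set(archiver_a.execute(JOIN).fetchall())
join_b = set(archiver_b.execute(JOIN).fetchall())
global_join = set(everything.execute(JOIN).fetchall())

local_avg = archiver_a.execute(AVG).fetchone()[0]
global_avg = everything.execute(AVG).fetchone()[0]

# Each site answer is a strict subset; together the site answers cover the join.
print(join_a < global_join, (join_a | join_b) == global_join)
# The aggregate from one Archiver differs from the global aggregate.
print(local_avg, global_avg)
```

Because the join condition only matches tuples from the same site, the union of the per-site answers equals the global join in this sketch. The average from one Archiver does not equal the global average, which is the “wrong answer” case the slides warn about.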
