Supporting Join Queries
Talk by: Andy Cooke
Collaborators: Alasdair Gray, Lisha Ma, and Werner Nutt
Heriot-Watt University
What queries would users like to ask? (1)

• A continuously executing query that might involve matching tuples across several streams.
  “Stream to me the average net traffic passing between two ComputingElements (CEs)”
  • The query needs to specify the age of tuples that can be matched (a “sliding window”),
    e.g. “consider only tuples no older than 5 min. from now”

Possibly interesting?
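
A sliding-window join of this kind could be sketched as follows. This is only an illustration, not R-GMA code: the class name SlidingWindowJoin, the two stream labels, and the (timestamp, key, value) tuple layout are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60  # "no older than 5 min. from now"


class SlidingWindowJoin:
    """Symmetric join of two streams on a key, restricted to recent tuples."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        # One buffer per stream; each entry is (timestamp, key, value).
        self.buffers = {"left": deque(), "right": deque()}

    def _expire(self, now):
        # Drop tuples older than the window from both buffers.
        for buf in self.buffers.values():
            while buf and now - buf[0][0] > self.window:
                buf.popleft()

    def insert(self, side, timestamp, key, value):
        """Add a tuple to one stream and return the matches it produces."""
        self._expire(timestamp)
        other = "right" if side == "left" else "left"
        matches = [
            (key, value, v) if side == "left" else (key, v, value)
            for (_, k, v) in self.buffers[other]
            if k == key
        ]
        self.buffers[side].append((timestamp, key, value))
        return matches


join = SlidingWindowJoin()
join.insert("left", 0, "ce1", 10)             # nothing to match yet
print(join.insert("right", 100, "ce1", 20))   # matches the left tuple
print(join.insert("right", 301, "ce1", 30))   # left tuple has expired
```

Each result is a (key, left value, right value) triple; an aggregate such as an average would be computed over these matches.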
What queries would users like to ask? (2)

• A “latest snapshot” query that joins the latest values of keys.
  “Return all CEs that Steve is allowed to use” (Resource Broker)
  • This query would involve joining tuples from CE tables, VO tables and denied-users tables.
  Probably interesting!
• A “history” query involving self-joins and aggregation.
  “What was the growth in net traffic since last week?”
  Possibly interesting?
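
The “latest snapshot” example might look like the sketch below. The schema is invented for illustration (the tables ce, vo_member and denied and their columns are assumptions, not the R-GMA schema), and SQLite stands in for whatever database holds the latest tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ce (name TEXT PRIMARY KEY, vo TEXT);
    CREATE TABLE vo_member (vo TEXT, username TEXT);
    CREATE TABLE denied (ce_name TEXT, username TEXT);
""")


def publish_ce(name, vo):
    # "Latest" semantics: a newer tuple with the same key replaces the old one.
    conn.execute("INSERT OR REPLACE INTO ce VALUES (?, ?)", (name, vo))


publish_ce("ce1", "atlas")
publish_ce("ce2", "cms")
publish_ce("ce3", "cms")
publish_ce("ce2", "atlas")  # newer value for key ce2
conn.execute("INSERT INTO vo_member VALUES ('atlas', 'steve')")
conn.execute("INSERT INTO denied VALUES ('ce1', 'steve')")


def allowed_ces(username):
    """Join CE, VO-membership and denied-users tables for one user."""
    rows = conn.execute(
        """
        SELECT ce.name
        FROM ce JOIN vo_member m ON ce.vo = m.vo
        WHERE m.username = ?
          AND NOT EXISTS (SELECT 1 FROM denied d
                          WHERE d.ce_name = ce.name
                            AND d.username = m.username)
        ORDER BY ce.name
        """,
        (username,),
    ).fetchall()
    return [name for (name,) in rows]


print(allowed_ces("steve"))
```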
How can R-GMA answer such queries?
Observation:

• If all the relevant tuples are inside one DBMS, then we can pass the query on to that DBMS's query engine.
  - EASY!
• If there is more than one relevant producer, then our mediator probably needs an execution engine!
  - HARD!

In any case, we know that some R-GMA users are defining Archivers and querying these directly. However:

• the local answer may only be a subset of the global answer;
• they may get a wrong answer (if the query involves max, avg, count, etc.).
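
A small worked example of why aggregates go wrong on a local subset. The numbers and the two-site split are made up for illustration; the (sum, count) merge at the end is a standard way to combine partial averages, not something R-GMA is stated to do.

```python
# Hypothetical load values held by two different producers.
site_a = [10.0, 20.0, 30.0]  # tuples visible to the Archiver being queried
site_b = [100.0]             # tuples held elsewhere


def avg(values):
    return sum(values) / len(values)


local_avg = avg(site_a)            # what the single Archiver reports
global_avg = avg(site_a + site_b)  # the answer over all tuples

# Averaging the per-site averages is also wrong when sites differ in size.
avg_of_avgs = (avg(site_a) + avg(site_b)) / 2


def partial(values):
    """Per-site partial aggregate that can be merged correctly."""
    return (sum(values), len(values))


def merge(partials):
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


merged_avg = merge([partial(site_a), partial(site_b)])
print(local_avg, global_avg, avg_of_avgs, merged_avg)
```

Only the merged (sum, count) result agrees with the global average; the local answer and the average of averages both differ from it.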
Answering Joins using Archivers
(Diagram) tables: cpuLoad, discspace; condition: country = ‘britain’
Requirements:
• Complete views (I publish everything!)
• “Latest” or “History” query type (so the data is in a database, not a buffer).
• A smart registry (hmm... just need to go to 1 Archiver).
• Tuple matching always needs to take place within the same database, and never across databases, e.g.
  “SELECT * FROM cpuload c, discspace s WHERE c.site = s.site”
  can easily be answered using site archivers.
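
One way the “smart registry” check could work is sketched below. Everything here is hypothetical: the Archiver record, its fields, and find_single_archiver are illustrative names, and a view's condition is reduced to a single optional country value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Archiver:
    name: str
    tables: frozenset       # tables this Archiver republishes
    country: Optional[str]  # None means no restriction on country
    complete: bool          # True if it publishes everything in its view


def find_single_archiver(registry, tables, country):
    """Return one Archiver that can answer the whole join, or None."""
    for archiver in registry:
        covers_tables = set(tables) <= archiver.tables
        covers_condition = archiver.country in (None, country)
        if archiver.complete and covers_tables and covers_condition:
            return archiver
    return None


registry = [
    Archiver("cpu-only", frozenset({"cpuLoad"}), None, True),
    Archiver("uk-site", frozenset({"cpuLoad", "discspace"}), "britain", True),
    Archiver("partial", frozenset({"cpuLoad", "discspace"}), None, False),
]

hit = find_single_archiver(registry, {"cpuLoad", "discspace"}, "britain")
print(hit.name if hit else "no single Archiver; a query plan is needed")
```

If such an Archiver exists, the join can be passed to its database unchanged, so all tuple matching happens inside one database.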
Problems with Answering Joins using Archivers

• Archivers can’t access the tuples introduced by LatestProducers and DatabaseProducers:
  • A new LatestProducer registers.
  • The Archiver can’t stream from it.
  • The mediator needs to mediate between two producers, but doesn’t have a query engine!
• If a new LatestProducer is registered, the Archiver cannot access its tuples because LatestProducers can’t answer stream queries.
Answering Joins without Archivers
Query Planning and Execution:

• What are the relevant Producers?
• What sub-queries should we send them?
• How should results be combined and operated on? (need a query engine!) Where?

Possible Query Engines:

• MySQL - dump all the data into MySQL … easy!
• Polar Star (Manchester)? … compatibility?
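
The “dump all the data into one database” option could be sketched roughly as follows. SQLite is used here only as a stand-in for MySQL so the example is self-contained, and the two producer functions and their column names are hypothetical placeholders for sub-queries sent to real Producers.

```python
import sqlite3


def cpuload_producer():
    # Hypothetical sub-query result from one Producer: (site, cpu_load).
    return [("edinburgh", 0.7), ("glasgow", 0.2)]


def discspace_producer():
    # Hypothetical sub-query result from another Producer: (site, free_gb).
    return [("edinburgh", 120), ("london", 300)]


def adhoc_join():
    """Load both sub-query results into a scratch database, then join there."""
    scratch = sqlite3.connect(":memory:")
    scratch.execute("CREATE TABLE cpuload (site TEXT, cpu_load REAL)")
    scratch.execute("CREATE TABLE discspace (site TEXT, free_gb INTEGER)")
    scratch.executemany("INSERT INTO cpuload VALUES (?, ?)", cpuload_producer())
    scratch.executemany("INSERT INTO discspace VALUES (?, ?)", discspace_producer())
    rows = scratch.execute(
        "SELECT c.site, c.cpu_load, s.free_gb "
        "FROM cpuload c, discspace s WHERE c.site = s.site"
    ).fetchall()
    scratch.close()
    return rows


print(adhoc_join())
```

The cost is that every relevant tuple is copied before any matching happens, which is why this route is easy but inefficient.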
Conclusions
We could support some “global” join queries quite easily:
• when just one Archiver is enough (needs a smarter Registry)
• suggestions could be given when there isn’t one Archiver available (consumer.getPlan())
• and/or ad hoc joins could be answered (inefficiently) by first loading data into MySQL
But:
• what queries do users want to pose?
• shouldn’t we restrict users to using only StreamProducers?