* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download CS chit-chat - ERI people pages
Global serializability wikipedia , lookup
Commitment ordering wikipedia , lookup
Relational algebra wikipedia , lookup
Serializability wikipedia , lookup
Microsoft Access wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Oracle Database wikipedia , lookup
Functional Database Model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Ingres (database) wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Versant Object Database wikipedia , lookup
Relational model wikipedia , lookup
Database model wikipedia , lookup
Gazetteer database • 4.5 million items, each having: – 1+ names • fair to good discriminator – 1 geospatial footprint • 99.9% points, 0.1% boxes & polygons • anticipate getting more (overlapping) polygons – 1+ types • excellent to horrible discriminator • No obvious way to partition them Greg Janée • chit-chat with CS database folks • 10/26/01 Gazetteer database • Informix Dynamic Server 2000 • Indexes – names: Verity (external; blade interface) – footprint: MapInfo (R-tree; blade interface) – types: B-tree Greg Janée • chit-chat with CS database folks • 10/26/01 Gazetteer database • Queries – dynamic – general case: arbitrary boolean combinations – in practice: • [name] AND [footprint] AND [([type] OR [type] OR ...)] • Desired behavior – see some results immediately queries – after seeing some results, ability to kill query Greg Janée • chit-chat with CS database folks • 10/26/01 Gazetteer database • Observed behavior: – Either: query answered quickly (< 2 minutes) • optimizer picks “right” index – Or: query takes forever (> 30 minutes) • optimizer picks “wrong” index • Complications: – no way to kill or interrupt database thread, through JDBC or otherwise Greg Janée • chit-chat with CS database folks • 10/26/01 Challenge #1 • Solve ADEPT’s query problem – Multiple, different data types • spatial, text, traditional linear types – Discriminability of any given index greatly depends on both data & query – Dynamic queries Greg Janée • chit-chat with CS database folks • 10/26/01 Load balancing • Why – distribute those queries-from-hell – increase reliability • How – multiple independent, identical databases – middleware directs query to “best” database – database executes query in its entirety Greg Janée • chit-chat with CS database folks • 10/26/01 Load balancing • What the middleware knows about a database: – current & maximum connection count – current queries – amount of time each query has been processing (QPT) • Idea: – score each DB inversely proportional to QPT2 Greg Janée • chit-chat with CS database folks • 10/26/01 Challenge #2 • Define metric – overall query processing time – response time • Do better – incorporate connection counts into formula – analyze queries – keep history Greg Janée • chit-chat with CS database folks • 10/26/01 Query translation • Gazetteer is an instance of a much more general problem • To wit: – how to describe the automatic translation of dynamic queries written in an abstract query language to SQL – in an easy, powerful, flexible way – making as few assumptions as possible about the underlying schema – and producing “reasonable” SQL • not so bad as to preclude database’s optimizer from working Greg Janée • chit-chat with CS database folks • 10/26/01 Query translation • Collection ( database) contains items having IDs – query should return IDs; duplicates OK – response time more important than overall QPT • Search bucket – abstract, typed thing against which constraints may be placed; standard buckets • Each collection supports 1+ buckets in idiosyncratic ways • Query language – arbitrary boolean combinations of bucket constraints Greg Janée • chit-chat with CS database folks • 10/26/01 Query translation • Standard buckets – subject-related text • title • assigned term – – – – – – originator geographic location coverage date object type, feature type, ... format identifier Greg Janée • chit-chat with CS database folks • 10/26/01 Query translation • Buckets grouped by types – spatial • e.g., overlaps a given polygon – temporal • e.g., contains a given date range – hierarchical • e.g., is any kind of geographic work – textual, qualified textual • e.g., contains the phrase “luis obispo” – numeric • e.g., > 5.7 meters Greg Janée • chit-chat with CS database folks • 10/26/01 Query translation • Example: translating a spatial bucket – MapInfo datablade • ST_Contains(table.column, “HG_Box(coords)”) – Geodetic datablade • Inside(“GeoBox(coords)”, table.column) – four bounding coordinate columns • table.northcolumn >= ... and table.southcolumn <= ... and ... Greg Janée • chit-chat with CS database folks • 10/26/01 Query translation • Existing Python-based scripting system – easy to configure & extend – comes with library of standard translation techniques – "geographic-location" : Bucket( "spatial", standardSpatialOperators, spatialToInformixMapInfo, ["j_holding", "footprint"]) • How to extend to boolean queries? Greg Janée • chit-chat with CS database folks • 10/26/01 Query translation • Too-easy solution: – given constraints C1, C2 that translate into SQL constraints (T1, S1), (T2, S2) then constraint C1 op C2 where op is AND, OR, or AND NOT becomes • select id from T1 where S1 op id in (select id from T2 where C2) • But: Informix appears to execute subqueries in their entirety before considering the outer query Greg Janée • chit-chat with CS database folks • 10/26/01 Query translation • The problems with JOINs – handling ANDs • tables may have {1, 1?, 1+, 0+} rows per item – handling ORs • may require UNION • but UNION is not nestable Greg Janée • chit-chat with CS database folks • 10/26/01 Query translation • The problem with disjunctive normal form: – may be inefficient • select id from table where S1 and (S2 or S3) – versus • select id from table where S1 and S2 union select id from table where S1 and S3 Greg Janée • chit-chat with CS database folks • 10/26/01 Challenge #3 • Design a translation description system – easy to configure & extend – makes as few assumptions as possible about the underlying schema – should produce reasonable SQL by default – supports customization of translation process – supports pattern-based overrides Greg Janée • chit-chat with CS database folks • 10/26/01