Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
DGrid: A Library of Large-Scale Distributed Spatial Data Structures Pieter Hooimeijer, 2006-05-02 Motivation • DGrid was designed to: – Support very large sets of dynamic point data (i.e. points that move unpredictably). – Offer flexible trade-offs between the cost of search operations and the cost of updates. – Run on parallel and distributed systems. 2 Spatial Data Structures • Definition—Any data structure that holds: - points - rectangles - lines - polygons - curves - etc. • They are typically optimized for a particular type of search operation. • We’ll focus on point data. 3 • Commonly used example: The Quadtree. – Works like a binary search tree. – Each node in the tree has four children: NE, SE, SW, NW. – This is called ‘Recursive Decomposition’ 4 • The quadtree implementation in DGrid works like this: A (1,3) B (1,2) C (2,0) • This is a ‘bottom-up Matrix (MX) Quadtree.’ (section 3.1.2) 5 • Let’s do a search on that quadtree: • We ruled out the ‘entire’ NE quadrant at the root level of the tree. 6 • Trade-offs for bottom-up MX Quadtree, compared to other tree data structures: – The shape of the tree does not depend on the insertion order. – No need to balance the tree (which would be expensive). – Insertion and deletion are cheaper for clustered data. 7 C++ Templates • They look like this: template <typename ItemType> class vector { // ... }; • In this case, a separate vector< > class is generated for each item type. // Strong type-checking using templates: MyType * a = someVector.get(5); // instead of: MyType * a = (MyType)someVector.get(5); 8 • Turns out, C++ templates are a crude functional programming language. • Why ‘crude?’ // C++ {- Haskell Templates -} template fact 1< =int 1 N> struct factfact n = {n * fact (n - 1) static const int value = N * fact <N - 1 >:: value; }; template < > struct fact <1 > { static const int value = 1; }; • This is ‘executed’ by the compiler! 9 • This is called template metaprogramming. • It’s used extensively in DGrid, to make it: – easier to use; – faster; – type safe. 10 Distributed Data • DGrid uses Message Passing Interface (MPI) to run on distributed systems. – MPI is a library of basic ‘send’ and ‘receive’ operations. – Each processor gets a unique ID (‘rank’). – Use if-statements to run different code on different processes. 11 12 DGrid • DGrid has these data structures: – Two types of 2D arrays. – A quadtree. – A distributed data structure. – A location class. • Allows nesting of these data structures. 13 • Let’s see some examples of nested data structures: – A 2D array of quadtrees (implied: the quadtree contains locations). – A quadtree of small 2D arrays. – A 2D array of 2D arrays. • Called ‘tiling.’ • A lot like a ‘shallow’ quadtree. 14 • DGrid uses the Composite Design Pattern: location< > DataStructure 15 • DGrid uses templates instead of a ‘Component’ interface. • The result is that the user can do this: using namespace dgrid::tags; typedef dgrid::dgrid<MyItem, partial_grid_tag< quadtree_tag< > > > bucket; bucket a(0, 0, 639, 639, tiles(64, 64) << tiles(1, 1)); • This is the ‘2D array of quadtrees’ example. 16 • Important: The definition of the data structure is a type! • Consequences: – Can check parameters at compile time. (Must be tiles( ) << tiles ( ) for this example, or it won’t compile.) – Compiler can optimize extensively (it knows which functions are going to call each other). – Can’t define a type at runtime, so ‘composition’ must be known at compile time. 17 • Data structure operations: – insert(x, y, item) – add item at (x, y) – delete(x, y, item) – remove item from (x, y) – get(x, y, some_list) – get all items at (x, y) – get_range(x0, y0, x1, y1) – get all items in the range [ (x0, y0) : (x1, y1) ] • Note: even the location class must support these operations. 18 • In a nested data structure, operations are passed on from level to level. • Because the types are known at compile time, these calls can be inlined. – Pro: eliminates the overhead of the function call. – Con: code size increases (function body is repeated). 19 20 Future Work • Add more data structures, more search operations. • Separate interface further from implementation. • (Dynamic) Load Balancing. 21