XmdvTool: Interactive Visual Data Exploration System for High-dimensional Data Sets
http://davis.wpi.edu/~xmdv
Matthew O. Ward, Elke A. Rundensteiner, Jing Yang, Punit Doshi, Geraldine Rosario, Allen R. Martin, Ying-Huey Fua, Daniel Stroe
Worcester Polytechnic Institute
This work was partially funded by NSF Grants IIS-9732897, IRIS-9729878 and IIS-0119276.

XmdvTool Features
• Hierarchical visualization and interaction tools for exploring very large high-dimensional data sets to discover patterns, trends and outliers
• Applications:
  – Bioterrorism Detection
  – Bioinformatics and Drug Discovery
  – Space Science
  – Geology and Geochemistry
  – Systems Monitoring and Performance Evaluation
  – Economics and Business
  – Simulation Design and Analysis
• Multi-platform support (Unix, Linux, Windows)
• Public domain software: http://davis.wpi.edu/~xmdv

Xmdv: Main Features
• Scale-up to High Dimensions: Visual Hierarchical Dimension Reduction
• Scale-up to Large Data Sets: Interactive Hierarchical Displays; Database Backend with MinMax Encoding, Semantic Caching and Adaptive Prefetching
• Interlinked Multi-Displays: Parallel Coordinates, Glyphs, Scatterplot Matrices, Dimensional Stacking
• Visual Interaction Tools: N-Dimensional Brushes, Structure-Based Brushing, InterRing

Scale-up for Large Number of Dimensions
Solution for high-dimensional data sets:
• Group Similar Dimensions into a Dimension Hierarchy
• Navigate the Dimension Hierarchy with InterRing
• Form Lower-Dimensional Spaces from Dimension Clusters
• Convey Dimension Cluster Information with the Dissimilarity Display

Visual Hierarchical Dimension Reduction Process
[Figure: a 42-dimensional data set; its dimension hierarchy, shown in the interaction tool InterRing; a resulting 4-dimensional subspace]

InterRing: Dimension Hierarchy Navigation and Manipulation
• Roll-up/Drill-down
• Distort
• Rotate
• Zoom in/out
• Modify

Dissimilarity Display
• Three Axes Method
• Diagonal Plot Method
• Axis Width Method
• Mean-Band Method

Scale-up for Large Number of Records
Solution for large-scale data sets:
• Group Similar Records into a Data Hierarchy
• Navigate the Data Hierarchy with Structure-Based Brushing
• Represent Data Clusters with the Mean-Band Method
• Provide Database Backend Support using a MinMax Tree, Caching and Prefetching

Interactive Hierarchical Display
[Figure: 2D example of hierarchical clustering and structure-based brushing]

Interactive Hierarchical Display
[Figure: flat display vs. hierarchical display, using the Mean-Band Method in parallel coordinates]

Scalability of Data Access
• Approach: attach a database system to the visualization front-end
• MinMax hierarchy encoding
  – Key idea: avoid recursive processing
  – Pre-computed
• Caching
  – Key idea: reduce response time and network traffic
• Prefetching
  – Key idea: use application hints and predict user patterns
  – Performed during idle time

Scalability of Data Access: MinMax Hierarchy Encoding
• Pre-compute object positions
  – level-of-detail (L)
  – extent values (x, y)
  – preserve tree structure
• New query semantics
  – objects are now rectangles
  – select objects that touch L
  – select objects that touch (x, y)
  – structure-based brush = intersection of the two selections
[Figure: hierarchy nodes drawn as rectangles over extent values and level of detail; a query is the triple (x, y, L)]

Scalability of Data Access: Caching
• Purpose
  – reduce response time and network traffic
• Issues
  – a visual query cannot be translated directly into object IDs
  – a high-level cache specification is needed to avoid complete scans
• Semantic caching
  – queries are cached rather than objects
  – minimize the cost of cache lookup
  – dynamically adapt cached queries to query patterns

Scalability of Data Access: Prefetching
• Strategy
  – Speculative (no specific hints): navigation remains local; both the user and the data set influence exploration
  – Adaptive (strategy changes over time): evolves as more knowledge becomes available
  – Non-pure (interruptible prefetching): leaves the buffer in a consistent state
• Requirements
  – Non-pure prefetching + large transactions & small object size + semantic caching: small granularity (object level)
  – Speculative, non-pure prefetcher: cache replacement policy + guessing method

Scalability of Data Access: Experimental Evaluation
[Chart: Effectiveness of Caching. Response time (seconds) for each combination of client caching OFF/ON and server caching OFF/ON]
[Chart: Effectiveness of Prefetcher. % improvement in response time vs. delay between user operations (seconds)]
Conclusions:
• Caching reduces response time by 80%
• Prefetching further reduces response time by 30%
• Better prefetching strategies might reduce response time further

Scalability of Data Access: Prefetching
[Figure panels: Random Strategy, Direction Strategy, Mean Strategy, Exponential Weighted Average (EWA) Strategy, Focus Strategy (current navigation window, hot regions); panel groups: Localized Speculative Strategies, Vector Strategies, Data Set Driven Strategy]

Xmdv System Implementation
• Tools: C/C++, Tcl/Tk, OpenGL, Oracle 8i, Pro*C
[Architecture diagram. Off-line process: Flat Data, MinMax Labeling, DB Loader, DB, Schema Info. On-line process: GUI, User Exploration Variables, Translator, Queries, Rewriter, Memory Buffer, Hierarchical Data, Prefetcher, Estimator, Library (Random, Direction, Focus, Mean, EWA)]

Publications (available at http://davis.wpi.edu/~xmdv)
• Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, “InterRing: An Interactive Tool for Visually Navigating and Manipulating Hierarchical Structures”, InfoVis 2002, to appear.
• Punit R. Doshi, Elke A. Rundensteiner, Matthew O. Ward and Daniel Stroe, “Prefetching for Visual Data Exploration”, Technical Report WPI-CS-TR-02-07, 2002.
• Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, “Interactive Hierarchical Displays: A General Framework for Visualization and Exploration of Large Multivariate Data Sets”, Computers and Graphics Journal, 2002, to appear.
• Daniel Stroe, Elke A. Rundensteiner and Matthew O. Ward, “Scalable Visual Hierarchy Exploration”, Database and Expert Systems Applications, pages 784-793, Sept. 2000.
• Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner, “Hierarchical Parallel Coordinates for Exploration of Large Datasets”, IEEE Proceedings of Visualization, pages 43-50, Oct. 1999.
• Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner, “Navigating Hierarchies with Structure-Based Brushes”, IEEE Proceedings of Visualization, pages 43-50, Oct. 1999.
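The "Scale-up for Large Number of Dimensions" slide groups similar dimensions into a hierarchy and forms lower-dimensional spaces from the resulting clusters. The slides do not give the clustering algorithm, so the Python sketch below is only one plausible reading. It makes four assumptions: the dissimilarity between two dimensions is 1 - |Pearson r|; clusters are merged by average linkage until the closest pair exceeds a threshold; one medoid dimension represents each cluster; and `cluster_dimensions`, `subspace` and `threshold` are hypothetical names.

```python
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (va * vb) ** 0.5


def cluster_dimensions(columns, threshold):
    """Average-linkage clustering of dimensions (assumed dissimilarity 1 - |r|).

    columns: dict of dimension name -> list of values (one value per record).
    Returns (clusters, dis): lists of member names, and pairwise dissimilarities.
    """
    names = list(columns)
    dis = {}
    for i, j in combinations(names, 2):
        dis[(i, j)] = dis[(j, i)] = 1.0 - abs(pearson(columns[i], columns[j]))

    def linkage(m1, m2):
        # Average dissimilarity over all cross-cluster pairs.
        return sum(dis[(i, j)] for i in m1 for j in m2) / (len(m1) * len(m2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break  # closest pair is too dissimilar: stop at this level
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, dis


def subspace(columns, threshold):
    """Keep one representative (medoid) dimension per cluster."""
    clusters, dis = cluster_dimensions(columns, threshold)

    def medoid(members):
        return min(members, key=lambda m: sum(dis[(m, o)] for o in members if o != m))

    return [medoid(c) for c in clusters]
```

Consider columns a = [1, 2, 3, 4], b = [2, 4, 6, 8], c = [4, 3, 2, 1] and d = [1, 3, 2, 4]. With a threshold of 0.1, the sketch groups a, b and c (every pair has |r| = 1) and leaves d alone, so the reduced space keeps two dimensions. Raising the threshold merges further clusters. Average-linkage merge heights never decrease, so each threshold yields one level of a nested dimension hierarchy.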
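Several slides use the Mean-Band Method to draw a data cluster in parallel coordinates. The sketch below shows the underlying geometry under three assumptions: the band spans each dimension's minimum to maximum within the cluster, the cluster mean is drawn as the central polyline, and axis k sits at x = k. `mean_band` and `band_polygon` are hypothetical names.

```python
def mean_band(records):
    """Summarize a cluster of records for a Mean-Band parallel-coordinates display.

    records: non-empty list of equal-length numeric tuples.
    Returns (mean, low, high), one value per dimension.
    """
    n, dims = len(records), len(records[0])
    mean = [sum(r[k] for r in records) / n for k in range(dims)]
    low = [min(r[k] for r in records) for k in range(dims)]    # assumed band bottom
    high = [max(r[k] for r in records) for k in range(dims)]   # assumed band top
    return mean, low, high


def band_polygon(low, high):
    """Closed band outline with axis k at x = k: along the top, back along the bottom."""
    top = [(k, high[k]) for k in range(len(high))]
    bottom = [(k, low[k]) for k in reversed(range(len(low)))]
    return top + bottom
```

A renderer would fill `band_polygon(low, high)` semi-transparently and draw the `mean` values as a polyline across the axes.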
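The MinMax Hierarchy Encoding slide can be made concrete with a small Python sketch. One off-line traversal pre-computes, for every node, an extent (min and max leaf position) and a level-of-detail interval. A structure-based brush (x, y, L) then becomes the intersection of two flat range selections, with no recursion at query time. The sketch assumes that tree depth serves as the level of detail and that leaves get consecutive integer positions; the slides specify neither. `Encoded`, `encode` and `brush` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoded:
    name: str
    lo: int         # smallest leaf position under the node (extent min)
    hi: int         # largest leaf position under the node (extent max)
    lod_lo: float   # node is visible for lod_lo <= L < lod_hi
    lod_hi: float


def encode(tree):
    """One off-line traversal; tree nodes are (name, [children]) tuples."""
    out, next_pos = [], 0

    def visit(node, depth):
        nonlocal next_pos
        name, children = node
        if not children:  # leaf: one position, visible at its depth and every finer level
            pos, next_pos = next_pos, next_pos + 1
            out.append(Encoded(name, pos, pos, depth, math.inf))
            return pos, pos
        spans = [visit(child, depth + 1) for child in children]
        lo, hi = spans[0][0], spans[-1][1]
        out.append(Encoded(name, lo, hi, depth, depth + 1))
        return lo, hi

    visit(tree, 0)
    return out


def brush(encoded, x, y, level):
    """Structure-based brush (x, y, L) as the intersection of two flat selections.

    In a database backend each selection is a plain range predicate, e.g.
    lod_lo <= L AND lod_hi > L, and lo <= y AND hi >= x.
    """
    touches_level = {e.name for e in encoded if e.lod_lo <= level < e.lod_hi}
    touches_extent = {e.name for e in encoded if e.lo <= y and e.hi >= x}
    return touches_level & touches_extent
```

Take a root with children A (leaves a1, a2) and B (leaves b1, b2, b3). Here `brush(encode(tree), 1, 2, 2)` selects the leaves a2 and b1, and the same extent at level 1.5 selects A and B.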
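The Caching slide states that queries, not objects, are cached. The Python sketch below illustrates that idea. A new (x, y, L) query is answered from cached regions where possible, and only the uncovered remainder goes to the backend. The integer positions, the `fetch` callback and the `SemanticCache` class and method names are assumptions for illustration. No cache replacement policy is included, although the Prefetching slide lists one as a requirement.

```python
class SemanticCache:
    """Caches query regions and their answers instead of individual object IDs.

    Simplifying assumption: each object at a given level of detail sits at one
    integer position, and fetch(x, y, level) returns {position: object} for
    every object with x <= position <= y at that level.
    """

    def __init__(self, fetch):
        self.fetch = fetch
        self.regions = {}   # level -> list of (x, y) intervals already answered
        self.objects = {}   # level -> {position: object} from those answers
        self.backend_calls = 0

    def remainders(self, x, y, level):
        """Sub-intervals of [x, y] not covered by any cached region."""
        gaps, cursor = [], x
        for cx, cy in sorted(self.regions.get(level, [])):
            if cy < cursor or cx > y:
                continue  # cached interval lies outside the still-uncovered part
            if cx > cursor:
                gaps.append((cursor, cx - 1))
            cursor = max(cursor, cy + 1)
            if cursor > y:
                break
        if cursor <= y:
            gaps.append((cursor, y))
        return gaps

    def query(self, x, y, level):
        # Only the remainder queries reach the backend.
        for gx, gy in self.remainders(x, y, level):
            self.backend_calls += 1
            self.objects.setdefault(level, {}).update(self.fetch(gx, gy, level))
            self.regions.setdefault(level, []).append((gx, gy))
        # Answer by filtering cached objects to the requested region.
        cached = self.objects.get(level, {})
        return {p: obj for p, obj in cached.items() if x <= p <= y}
```

After a query for [0, 4] at level 1, a query for [2, 3] is answered entirely from the cache. A query for [3, 7] sends only the remainder [5, 7] to the backend.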
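The prefetching slides name Random, Direction, Mean and Exponential Weighted Average (EWA) strategies over the navigation history m(n-2), m(n-1), m(n). They also call for an adaptive strategy that changes over time. The Python sketch below is one hedged interpretation, not the published method:

- Moves are assumed to be (dx, dL) steps of the navigation window.
- `k`, `alpha` and all function and class names are invented.
- The adaptive rule (use the strategy with the smallest accumulated prediction error) is a stand-in.
- The data-set-driven Focus strategy is not sketched.

```python
import random


def direction(history):
    """Direction strategy: assume the next move repeats the last one."""
    return history[-1]


def mean(history, k=3):
    """Mean strategy: average of the last k moves (k is an assumed parameter)."""
    recent = history[-k:]
    return (sum(m[0] for m in recent) / len(recent),
            sum(m[1] for m in recent) / len(recent))


def ewa(history, alpha=0.5):
    """Exponentially weighted average of all moves; alpha is an assumed parameter."""
    ex, el = history[0]
    for dx, dl in history[1:]:
        ex = alpha * dx + (1 - alpha) * ex
        el = alpha * dl + (1 - alpha) * el
    return (ex, el)


def random_move(history, rng=random):
    """Random strategy: one of four unit moves, each with probability 1/4 (assumed)."""
    return rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])


def predict_window(window, history, strategy):
    """Shift the current (x, y, level) window by the predicted move m(n+1)."""
    dx, dl = strategy(history)
    x, y, level = window
    return (x + dx, y + dx, level + dl)


class AdaptivePrefetcher:
    """Adaptive choice: use whichever strategy has mispredicted least so far."""

    def __init__(self, strategies):
        self.strategies = strategies            # name -> strategy(history) -> move
        self.error = {name: 0.0 for name in strategies}
        self.history = []

    def observe(self, move):
        # Score every strategy's prediction for this move, then record the move.
        if self.history:
            for name, strategy in self.strategies.items():
                px, pl = strategy(self.history)
                self.error[name] += abs(px - move[0]) + abs(pl - move[1])
        self.history.append(move)

    def best(self):
        return min(self.error, key=self.error.get)
```

A prefetcher would issue the query for the predicted window during idle time and abandon it if the user acts first, in the spirit of the non-pure (interruptible) requirement. A real front end would also snap the fractional windows that Mean and EWA produce to its own grid.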
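The Experimental Evaluation slide reports two relative reductions. Suppose "further reduces by 30%" means 30% of the already-cached response time; this is an assumption, since the slide does not say. Then the two figures combine multiplicatively, with T_0 the response time without caching or prefetching:

```latex
T_{\text{cache}} = (1 - 0.80)\,T_0 = 0.20\,T_0,
\qquad
T_{\text{cache+prefetch}} = (1 - 0.30)\,T_{\text{cache}} = 0.14\,T_0
```

Under that reading, the combined reduction is about 86%, not 80% + 30%.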