CPT-S 415 Big Data
Yinghui Wu
EME B45

Data Visualization and Navigation (Information Visualization)
Information Visualization
Graph visualization
– Graph drawing and graph visualization
– Graph layout
Courtesy of Ivan Herman

“Close the loop”
Data quality -> knowledge quality (potential new area)
Data interpretation
Visual analytics
Graph visualization
http://www.matthiasdittrich.com/projekte/narratives/visualisation/

Information Visualization

Linking data to human
Collecting information is no longer a problem, but extracting value from information collections has become progressively more difficult.
Visualization links the human eye and the computer, helping to identify patterns and to extract insights from large amounts of information.
Visualization technology shows considerable promise for increasing the value of large-scale collections of information.
Visualization can be classified as scientific visualization, software visualization, and information visualization.

Visualization Classification
Scientific visualization helps in understanding physical phenomena in data (Nielson, 1991)
– A mathematical model plays an essential role
– Isosurfaces, volume rendering, and glyphs are commonly used techniques
– Isosurfaces depict the distribution of certain attributes
– Volume rendering allows viewers to see the entire volume of 3-D data in a single image (Nielson, 1991)
– Glyphs provide a way to display multiple attributes through combinations of various visual cues (Chernoff, 1973)

Visualization Classification
Software visualization helps people understand and use computer software effectively (Stasko et al., 1998)
– Program visualization helps programmers manage complex software (Baecker & Price, 1998)
– Visualizing the source code (Baecker & Marcus, 1990), data structures, and the changes made to the software (Eick et al., 1992)
– Algorithm animation is used to motivate and support the learning of computational algorithms
http://www.algomation.com/
http://www.algomation.com/algorithm/quicksort-visualization

What is Information Visualization?
Information visualization helps users identify patterns, correlations, or clusters
– Structured information
  • Graphical representation to reveal patterns
  • Integration with various data mining techniques (Thearling et al., 2002; Johnston, 2002)
– Unstructured information
  • Need to identify variables and construct visualizable structures
The depiction of information using spatial/graphical representations, to facilitate comparison, pattern recognition, change detection, and other cognitive skills by making use of the visual system.

Information Visualization
Problem:
– HUGE datasets: how to understand them?
Solution:
– Take better advantage of the human perceptual system
– Convert information into a graphical representation
Issues:
– How to convert abstract information into graphical form?
– Do visualizations do a better job than other methods?

Goals of Information Visualization
Make large datasets coherent (present huge amounts of information compactly)
Present information from various viewpoints
Present information at several levels of detail (from overviews to fine structure)
Support visual comparisons
Tell stories about the data

“Sci vis” versus “Info vis”
Scientific visualization: specifically concerned with data that has a well-defined representation in 2D or 3D space (e.g., from a simulation mesh or scanner).
*Adapted from The ParaView Tutorial, Moreland

Information visualization
Information visualization: concerned with data that does not have a well-defined representation in 2D or 3D space (i.e., “abstract data”).

Data Attributes
Data attributes: infovis has more data types than numerical values

Data Type    Attribute Domain   Operations            Examples
nominal      Unordered set      Comparison (=)        Text, references, syntax elements
ordinal      Ordered set        Ordering (=, <, >)    Ratings (e.g., bad, average, good)
discrete     Integer            Integer arithmetic    Lines of code
continuous   Real               Real arithmetic       Code metrics

Info Vis vs. Sci Vis

                 Scivis                    Infovis
Data Domain      spatial, compact          non-spatial, abstract
Attribute Type   numerical                 any data type
Data Points      Samples over the domain   Tuples of attributes
Cells            Support interpolation     Describe relations
Interpolation    Piecewise continuous      May not exist

Information representation in InfoViz

Information Representation
Shneiderman (1996) proposed seven types of representation methods:
• 1-D, 2-D, 3-D
• Multidimensional
• Tree
• Network
• Temporal approaches

1-D
TileBars (Hearst, 1995)

2-D & 3-D
2D: To represent information as two-dimensional visual objects
– Visualization systems based on the self-organizing map (SOM) (Kohonen, 1995)
– To help users deal with the large number of categories created for mass textual data
3D: To represent information as three-dimensional visual objects
– WebBook system folds web pages into three-dimensional books (Card et al., 1996)
– 3-D version of a tree or network
  • 3-D hyperbolic tree to visualize large-scale hierarchical relationships (Munzner, 2000)
– http://www.start.umd.edu/gtd/globe/index.html

Multidimensional
To represent information as multidimensional objects and project them into a three-dimensional or two-dimensional space
– A dimensionality reduction algorithm is used
  • Multidimensional scaling (MDS)
  • Hierarchical clustering
  • K-means algorithms
  • Principal component analysis
– Examples
  • SPIRE system (Wise et al., 1995)
  • VxInsight system (Boyack et al., 2002)
  • Glyph representation has been used in various social visualization techniques (Donath, 2002) to describe human behavior during computer-mediated communication (CMC)

Table Visualization
Simple list; does not support analysis or insight

Table Visualization
Aided by
• Sorting
• Bar graph
• Evolution icons
Dense pixel display:
• Bar graph
• Table lens

Tree
To represent hierarchical relationships
– Challenge: the number of nodes grows exponentially
  • Different layout algorithms have been applied
– Examples
  • Tree-Map allocates space according to attributes of nodes (Johnson & Shneiderman, 1991)
  • Cone Tree system uses a 3-D visual structure to pack more nodes on the screen (Robertson et al., 1991)
  • Hyperbolic Tree projects subtrees onto a hyperbolic plane and maps the plane onto the display region (Lamping et al., 1995)

Tree Visualization
The TreeMap (Johnson & Shneiderman ’91)
Idea: Show a hierarchy as a 2D layout
• Fill up the space with rectangles representing objects
• Size on screen indicates relative size of underlying objects
Treemap method: visualize the tree structure in a way that uses virtually every pixel of the display space to convey information
Every subtree is represented by a rectangle that is partitioned into smaller rectangles, which correspond to its children.
The position of the slicing lines determines the relative sizes of the child rectangles.
For every child, repeat the slicing recursively, swapping the slicing direction from vertical to horizontal or vice versa.

Tree Visualization: Examples
Treemap

Tree Visualization
Ball-and-stick visualization: use the position and appearance of the glyphs
Rooted-tree layout
Radial-tree layout

Graphs and Networks
To represent complex relationships that a simple tree structure is insufficient to represent
– Citation among academic papers (C. Chen & Paul, 2001; Mackinlay et al., 1995)
– Documents linked by the internet (Andrews, 1995)
– The spring-embedder model (Eades, 1984), along with its variants (Davidson & Harel, 1996; Fruchterman & Reingold, 1991), has become the most popular drawing algorithm.

Examples
Network visualization (Vizster)

Temporal
To represent information based on temporal order
– Location and animation are commonly used visual variables to reveal the temporal aspect of information
– Examples
  • Perspective Wall lists objects along the x-axis based on time sequence and presents attributes along the y-axis (Robertson et al., 1993)
  • In the VxInsight system (Boyack et al., 2002), the landscape changes as time changes.

Examples
Geo data mapping
Demo: Cyber attacks http://map.norsecorp.com/#/

Additional Examples
http://map.norsecorp.com/v1/
NY Times
words, words, numbers
Visual Complexity (from book by Manuel Lima)
50 examples (from June 2009, somewhat dated)
D3 Gallery

Visualization components
User-interface interaction: color, size, texture, proximity, annotation, interactivity
– Immediate interaction not only allows direct manipulation of the visual objects displayed but also allows users to select what is to be displayed (Card et al., 1999)
– Shneiderman (1996) summarizes six types of interface functionality: overview, zoom, filtering, details on demand, relate, history
Information analytics
– Indexing
  • Extract the semantics of information
– Analysis
  • Clustering, classification

Visualization pipeline
Acquire -> Parse -> Filter -> Mine -> Represent -> Refine -> Interact

Visualization software
Host language (C/C++/Java/Python) plus OpenGL
Stat/math package with graphics
– R
– MATLAB
Special-purpose info viz software
– Earth mapping, biological network visualization, etc.
Browser-enabled graphics/info viz packages
– Google Charts
– Processing / Processing.js
– D3
– Java + Flash (becoming rarer)

Graph Drawing and Graph Visualization

Information Visualization vs. Graph Drawing
Graph drawing
– Old topic, many books, etc.
– May have other goals than visualization
  • E.g., VLSI design
Graph visualization
– Size is a key issue
– Usability requires nodes to be discernible
– Navigation considered

Graph Visualization
Hierarchical graph of the evolution of the UNIX operating system

Graph Visualization
The call graph
Three concentric rings show containment: (1) files, (2) classes, (3) methods
The curved lines indicate function calls

Graph visualization
Circle chart

When is Graph Visualization Applicable?
Ask the question: is there an inherent relation among the data elements to be visualized?
– If YES, then the data can be represented by nodes of a graph, with edges representing the relations.
– If NO, then the data elements are “unstructured” and the goal is to use visualization to analyze and discover relationships among data.
Source: Herman, Graph Visualization and Navigation in Information Visualization: a Survey

Traditional Graph Drawing
Optimization based on a set of criteria (mathematical aesthetics)
– Minimize edge crossings
– Minimize area
– Maximize smallest angle
– Maximize symmetry
– Doing all at once is hard.
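The first criterion above, minimizing edge crossings, can be made concrete by counting the crossings in a given straight-line drawing. The following is a minimal Python sketch under stated assumptions: the function names (`orientation`, `segments_cross`, `count_edge_crossings`) and the example graph are hypothetical, node positions are supplied by the caller, and edges that share an endpoint are assumed not to count as crossing. It only measures one criterion; it does not optimize a layout.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def count_edge_crossings(positions, edges):
    """Count pairs of edges that cross in a straight-line drawing.

    positions: {node: (x, y)}; edges: list of (node, node) pairs.
    """
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if len({u1, v1, u2, v2}) < 4:
            continue  # assumption: edges sharing an endpoint never count
        if segments_cross(positions[u1], positions[v1],
                          positions[u2], positions[v2]):
            crossings += 1
    return crossings


# The same 4-cycle drawn two ways (hypothetical example data).
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
square = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
bowtie = {"a": (0, 0), "b": (1, 1), "c": (1, 0), "d": (0, 1)}

print(count_edge_crossings(square, cycle))
print(count_edge_crossings(bowtie, cycle))
```

The pairwise check is quadratic in the number of edges, which is fine for illustration but hints at why evaluating several aesthetic criteria repeatedly during optimization becomes expensive on large graphs.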
Often unsuitable for interactive visualization
– Many optimizations are NP-hard
– Approximation algorithms are very complex
  • Precompute the layout, or compute it once at the beginning of an application, then support interaction
Slide adapted from Jeff Heer

Traditional Graph Drawing
Poly-line graphs (includes bends)
Orthogonal drawing
Planar, straight-line drawing
Upward drawing of DAGs

Layout Approaches
Tree-ify the graph, then use tree layout
Hierarchical graph layout
Radial graph layout
Optimization-based techniques
– Includes spring-embedding / force-directed layout
Adjacency matrices
Structurally-independent layout
On-demand revealing of subgraphs
Distortion-based views
– Hyperbolic browser
(this list is not meant to be exhaustive)

Tree-based graph layout
Select a tree structure out of the graph
– Breadth-first-search tree
– Minimum spanning tree
– Other domain-specific structures
Use a tree layout algorithm
Benefits
– Fast, supports interaction and refinement
Drawbacks
– Limited range of layouts

Tree-ify the graph

Traditional Tree Layouts
H-tree layout: best for balanced trees
Radial view
Balloon view: related to 3-D cone tree

Hierarchical graph layout
Use the directed structure of the graph to inform layout
Order the graph into distinct levels
– This determines one dimension
Now optimize within levels
– Determines the second dimension
– Minimize edge crossings, etc.
The method used in graphviz’s “dot” algorithm
Great for directed acyclic graphs, but often misleading in the case of cycles

Hierarchical Graph Layout
Evolution of the UNIX operating system
Hierarchical layering based on descent

Hierarchical graph layout
Gnutella network

Radial Layout
Animated Exploration of Graphs with Radial Layout, Yee et al., 2001
Gnutella network

Optimization-based layout
Specify constraints for layout
– Series of mathematical equations
– Hand to a “solver” which tries to optimize the constraints
Examples
– Minimize edge crossings, line bends, etc.
– Multi-dimensional scaling (preserve multi-dimensional distance)
– Force-directed placement (use physics metaphor)
Benefits
– General applicability
– Often customizable by adding new constraints
Drawbacks
– Approximate constraint satisfaction
– Running time; “organic” look not always desired

Example: Force-Directed Layout
Uses a physics model to lay out the graph: nodes repel each other, edges act as springs, and some amount of friction or drag force is used.
Special techniques to dampen “jitter”.
http://getspringy.com/demo.html
Visual WordNet: http://www.kylescholz.com/projects/wordnet
Visuwords: http://www.visuwords.com/

Hyperbolic Browser: Inspiration
Using Distortion and Focus + Context

The Hyperbolic Tree Browser
The Hyperbolic Browser: A Focus + Context Technique for Visualizing Large Hierarchies, Lamping & Rao, CHI 1996.
– http://www.inxight.com/products/sdks/st/
– Uses non-Euclidean geometry as the basis of a focus + context technique
  • The hyperbolic browser is a projection into a Euclidean space (a circle)
– The circumference of a circle in Euclidean space increases linearly with radius (2πr)
– The circumference of a circle in hyperbolic space increases exponentially
Exponential growth in space available with linear growth of radius
– Makes tree layout easy
Size of objects decreases with growth of radius
– Reduces expense of drawing trees when cut off at one pixel

Appearance of Initial Layout
Root mapped at center
Multiple generations of children mapped out towards the edge of the circle
Drawing of nodes cuts off when less than one pixel

Structurally-Independent Layout
Ignore the graph structure.
Base the layout on other attributes of the data
Examples:
– Geography
– Time
Benefits
– Often very quick layout
– Optimizes communication of particular features
Drawbacks
– May or may not present structure well

Structurally Independent Layout
The “Skitter” Layout
– Internet connectivity
Angle = longitude (geography)
Radius = degree (# of connections)
http://www.caida.org/research/topology/as_core_network/2007/images/ascore-simple.2007_big.png
Skitter, www.caida.org
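
The Skitter mapping described above (angle from longitude, radius from degree) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `skitter_layout` and the sample nodes are hypothetical, and the exact radius formula is an assumption (a logarithmic scale that places the highest-degree node at the center and degree-0 nodes on the unit circle); the slides only state that radius depends on degree.

```python
import math


def skitter_layout(longitudes, degrees):
    """Place nodes in polar coordinates without using the edge structure.

    longitudes: {node: longitude in degrees}
    degrees:    {node: number of connections}
    Returns {node: (x, y)} inside the unit circle.
    """
    max_degree = max(degrees.values())
    positions = {}
    for node, lon in longitudes.items():
        theta = math.radians(lon)  # angle = longitude
        if max_degree > 0:
            # Assumed mapping: higher degree -> closer to the center.
            radius = 1.0 - math.log(degrees[node] + 1) / math.log(max_degree + 1)
        else:
            radius = 1.0  # no connections anywhere: everything on the rim
        positions[node] = (radius * math.cos(theta), radius * math.sin(theta))
    return positions


# Hypothetical sample data: one hub and two less-connected nodes.
sample_longitudes = {"hub": 0.0, "east": 90.0, "leaf": 180.0}
sample_degrees = {"hub": 99, "east": 9, "leaf": 0}
layout = skitter_layout(sample_longitudes, sample_degrees)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Because positions depend only on per-node attributes, the layout runs in time linear in the number of nodes, which matches the "often very quick layout" benefit listed for structurally independent approaches.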