LN28 - WSU EECS

CPT-S 415
Big Data
Yinghui Wu
EME B45
Data Visualization and Navigation
(Information Visualization)
 Information Visualization
 Graph visualization
– Graph drawing and graph visualization
– Graph layout
Courtesy of Ivan Herman
“Close the loop”
Data quality -> knowledge quality
(potential new area)
Data interpretation
Visual analytics
Graph visualization
http://www.matthiasdittrich.com/projekte/narratives/visualisation/
Information Visualization
Linking data to humans
 Collecting information is no longer a problem, but extracting
value from information collections has become progressively
more difficult.
 Visualization links the human eye and computer, helping to
identify patterns and to extract insights from large amounts of
information
 Visualization technology shows considerable promise for
increasing the value of large-scale collections of information
 Visualization can be classified as scientific visualization,
software visualization, and information visualization
Visualization Classification
 Scientific Visualization helps in understanding physical
phenomena in data (Nielson, 1991)
– Mathematical models play an essential role
– Isosurfaces, volume rendering, and glyphs are commonly
used techniques
– Isosurfaces depict the distribution of certain attributes
– Volume rendering allows viewers to see the entire volume of 3-D data in a
single image (Nielson, 1991)
– Glyphs provide a way to display multiple attributes through combinations
of various visual cues (Chernoff, 1973)
Visualization Classification
 Software Visualization helps people understand and use
computer software effectively (Stasko et al. 1998)
– Program visualization helps programmers manage complex
software (Baecker & Price, 1998)
– Visualizing the source code (Baecker & Marcus, 1990), data
structures, and the changes made to the software (Eick et al., 1992)
– Algorithm animation is used to motivate and support the
learning of computational algorithms
http://www.algomation.com/
http://www.algomation.com/algorithm/quicksort-visualization
What is Information Visualization?
Information visualization helps users identify patterns,
correlations, or clusters
– Structured information
• Graphical representation to reveal patterns
• Integration with various data mining techniques (Thearling et
al., 2002; Johnston, 2002)
– Unstructured information
• Need to identify variables and construct visualizable
structures
The depiction of information using spatial/graphical
representations, to facilitate comparison, pattern recognition,
change detection, and other cognitive skills by making use of the
visual system.
Information Visualization
 Problem:
– HUGE Datasets: How to understand them?
 Solution
– Take better advantage of human perceptual system
– Convert information into a graphical representation.
 Issues
– How to convert abstract information into graphical form?
– Do visualizations do a better job than other methods?
Goals of Information Visualization
Make large datasets coherent
(Present huge amounts of information compactly)
Present information from various viewpoints
Present information at several levels of detail
(from overviews to fine structure)
Support visual comparisons
Tell stories about the data
“Sci vis” versus “Info vis”
 Scientific visualization: specifically concerned with
data that has a well-defined representation in 2D or 3D
space (e.g., from simulation mesh or scanner).
*Adapted from The ParaView
Tutorial, Moreland
Information visualization
 Information visualization: concerned with data
that does not have a well-defined representation
in 2D or 3D space (i.e., “abstract data”).
Data Attributes
•Data attributes: Infovis handles more data types than just numerical values
Data Type | Attribute Domain | Operations | Examples
nominal | unordered set | comparison (=) | text, references, syntax elements
ordinal | ordered set | ordering (=, <, >) | ratings (e.g., bad, average, good)
discrete | integer | integer arithmetic | lines of code
continuous | real | real arithmetic | code metrics
Info Vis vs. Sci Vis
Property | Scivis | Infovis
Data domain | spatial, compact | non-spatial, abstract
Attribute type | numerical | any data type
Data points | samples over the domain | tuples of attributes
Cells | support interpolation | describe relations
Interpolation | piecewise continuous | can be nonexistent
Information representation in InfoViz
Information Representation
 Shneiderman (1996) proposed seven types of representation
methods:
• 1-D, 2-D, 3-D
• Multidimensional
• Tree
• Network
• Temporal approaches
1-D
TileBars (Hearst, 1995)
2-D & 3-D
 2D: To represent information as two-dimensional visual objects
– Visualization systems based on self-organizing map (SOM)
(Kohonen, 1995)
– To help users deal with the large number of categories
created for the mass textual data
 3D: To represent information as three-dimensional visual
objects
– WebBook system folds web pages into three-dimensional
books (Card et al., 1996)
– 3-D version of a tree or network
• 3-D hyperbolic tree to visualize large-scale hierarchical
relationships (Munzner 2000)
– http://www.start.umd.edu/gtd/globe/index.html
Multidimensional
 To represent information as multidimensional objects
and project them into a three-dimensional or a two-dimensional space
– Dimensionality reduction algorithms are used
• Multidimensional scaling (MDS)
• Hierarchical clustering
• K-means algorithms
• Principal components analysis
– Examples
• SPIRE system (Wise et al. 1995)
• VxInsight System (Boyack et al. 2002)
• Glyph representation has been used in various social
visualization techniques (Donath, 2002) to describe human
behavior during computer-mediated communication (CMC)
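A minimal sketch of one of the listed techniques, principal components analysis, in plain Python (no NumPy). It assumes a small, dense numeric table; the top components are found by power iteration with deflation, a simple textbook method chosen for brevity, not the method used by SPIRE or VxInsight. All function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def mean_center(data: Matrix) -> Matrix:
    """Subtract the per-column mean so every attribute has mean zero."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(X: Matrix) -> Matrix:
    """Sample covariance matrix (d x d) of mean-centered data."""
    n, d = len(X), len(X[0])
    return [[sum(X[k][i] * X[k][j] for k in range(n)) / (n - 1)
             for j in range(d)] for i in range(d)]


def power_iteration(C: Matrix, iters: int = 500, seed: int = 0):
    """Approximate the dominant eigenvalue and unit eigenvector of symmetric C."""
    rng = random.Random(seed)
    d = len(C)
    v = [rng.random() + 0.1 for _ in range(d)]
    n0 = math.sqrt(sum(x * x for x in v))
    v = [x / n0 for x in v]                  # start from a unit vector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                      # no variance left in C
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v is the eigenvalue estimate for unit v.
    eigval = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca_project(data: Matrix, k: int = 2) -> Matrix:
    """Project each row of `data` onto its top-k principal components."""
    X = mean_center(data)
    C = covariance(X)
    d = len(C)
    components = []
    for _ in range(k):
        lam, v = power_iteration(C)
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(row[j] * c[j] for j in range(d)) for c in components] for row in X]
```

Calling pca_project(rows, k=2) returns 2-D coordinates that can be drawn as a scatter plot; for large data a linear-algebra library would be the practical choice.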
Table Visualization
Simple list; does not support analysis or insight
Table Visualization
Aided by
•Sorting,
•Bar Graph,
•Evolution Icons
Dense Pixel Display:
•Bar Graph,
•Table Lens
Tree
 To represent hierarchical relationship
– Challenge: the number of nodes grows exponentially with depth
• Different layout algorithms have been applied
– Examples
• Tree-Map allocates space according to attributes of
nodes (Johnson & Shneiderman 1991)
• Cone Tree system uses 3-D visual structure to pack
more nodes on the screen (Robertson et al., 1991)
• Hyperbolic Tree lays out subtrees on a hyperbolic plane
and maps the plane onto a circular display region (Lamping et al., 1995)
Tree Visualization
 The TreeMap (Johnson & Shneiderman ’91). Idea: show a
hierarchy as a 2D layout
• Fill up the space with rectangles representing objects
• Size on screen indicates relative size of underlying
objects.
Treemap method: visualizes the tree structure using
virtually every pixel of the display space to convey
information.
Every subtree is represented by a rectangle that is
partitioned into smaller rectangles which correspond to its
children.
The position of the slicing lines determines the relative sizes
of the child rectangles.
For every child, repeat the slicing recursively, swapping the
slicing direction from vertical to horizontal or conversely
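The slicing procedure described above can be sketched in Python as follows. This is a minimal slice-and-dice sketch under stated assumptions: leaf sizes are non-negative numbers, node names are unique, and `Node` and `slice_and_dice` are hypothetical names, not from any treemap library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Node:
    name: str                      # assumed unique; used as the key of the result
    size: float = 0.0              # weight of a leaf; ignored for inner nodes
    children: List["Node"] = field(default_factory=list)

    def weight(self) -> float:
        """A subtree's weight is the sum of the sizes of its leaves."""
        if not self.children:
            return self.size
        return sum(child.weight() for child in self.children)


def slice_and_dice(node: Node, rect: Rect, vertical: bool = True,
                   out: Optional[Dict[str, Rect]] = None) -> Dict[str, Rect]:
    """Give every node a rectangle; children split their parent's rectangle
    in proportion to their weights, alternating the slicing direction."""
    if out is None:
        out = {}
    out[node.name] = rect
    x, y, w, h = rect
    total = node.weight()
    offset = 0.0                   # fraction of the parent already used
    for child in node.children:
        frac = child.weight() / total if total else 0.0
        if vertical:               # vertical slicing lines: split along x
            child_rect = (x + offset * w, y, frac * w, h)
        else:                      # horizontal slicing lines: split along y
            child_rect = (x, y + offset * h, w, frac * h)
        slice_and_dice(child, child_rect, not vertical, out)
        offset += frac
    return out
```

Each node name maps to (x, y, width, height). Because every split is proportional to subtree weight, the leaf rectangles exactly tile the display area; flipping `vertical` at each level is the swap of slicing direction described on the slide.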
Tree Visualization: Examples
 Treemap
Tree Visualization
Ball-and-stick visualization: uses the position and
appearance of the glyphs
Rooted-Tree Layout
Radial-Tree Layout
Graphs and Networks
 To represent complex relationships that a simple tree
structure is insufficient to represent
– Citations among academic papers (C. Chen & Paul, 2001; Mackinlay
et al., 1995)
– Documents linked by the internet (Andrews, 1995)
– The spring-embedder model (Eades, 1984) and its variants
(Davidson & Harel, 1996; Fruchterman & Reingold, 1991) have
become the most popular drawing algorithms.
Examples
 Network visualization (vizster)
Temporal
 To represent information based on temporal order
– Location and animation are commonly used visual variables
to reveal the temporal aspect of information
– Examples
• Perspective Wall lists objects along the x-axis based on time
sequence and presents attributes along the y-axis (Robertson
et al., 1993)
• In VxInsight system (Boyack et al., 2002), the landscape
changes as the time changes.
Examples
 Geo data
mapping
Demo
 Cyber
Attacks
http://map.norsecorp.com/#/
Additional Examples
 http://map.norsecorp.com/v1/
 NY Times words, words, numbers
 Visual Complexity (from book by Manuel Lima)
 50 examples (from June 2009, somewhat dated)
 D3 Gallery
Visualization components
 User-Interface Interaction: Color, Size, Texture, Proximity,
Annotation, Interactivity
– Immediate interaction not only allows direct manipulation of
the visual objects displayed but also allows users to select
what is displayed (Card et al., 1999)
– Shneiderman (1996) summarizes seven types of interface
functionality: Overview, Zoom, Filter, Details-on-demand,
Relate, History, Extract
 Information Analytics
– Indexing
• Extract the semantics of information
– Analysis
• Clustering, classification
Visualization pipeline
 Acquire -> Parse -> Filter -> Mine -> Represent -> Refine -> Interact
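The seven stages can be chained as ordinary functions. The sketch below is illustrative only: the input is a made-up in-memory CSV, the "chart" is plain text, and Interact is modeled simply as re-running the pipeline with a filter threshold chosen by the user. All names and data are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple

# Hypothetical toy input: any CSV with "term,count" columns would do.
RAW = "term,count\ngraph,3\ntree,1\ngraph,2\nmatrix,0\n"


def acquire() -> str:
    """Acquire: obtain the raw data (an in-memory string, not a file or URL)."""
    return RAW


def parse(text: str) -> List[Tuple[str, int]]:
    """Parse: give the raw text structure as typed records."""
    return [(row["term"], int(row["count"])) for row in csv.DictReader(io.StringIO(text))]


def filter_rows(rows: List[Tuple[str, int]], min_count: int) -> List[Tuple[str, int]]:
    """Filter: keep only the records of interest."""
    return [(t, c) for t, c in rows if c >= min_count]


def mine(rows: List[Tuple[str, int]]) -> Counter:
    """Mine: extract a pattern, here the total count per term."""
    totals: Counter = Counter()
    for term, count in rows:
        totals[term] += count
    return totals


def represent(totals: Counter) -> List[Tuple[str, str]]:
    """Represent: choose a basic visual form, a text bar chart."""
    return [(term, "#" * n) for term, n in totals.most_common()]


def refine(bars: List[Tuple[str, str]]) -> List[str]:
    """Refine: improve legibility by aligning the labels."""
    width = max((len(term) for term, _ in bars), default=0)
    return [f"{term:<{width}} {bar}" for term, bar in bars]


def run(min_count: int = 1) -> List[str]:
    """Interact: re-run the pipeline whenever the user changes a parameter."""
    return refine(represent(mine(filter_rows(parse(acquire()), min_count))))
```

Here run(1) yields ['graph #####', 'tree  #']; raising the threshold to 3 re-filters the data and yields ['graph ###'].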
Visualization software
 Host language (C/C++/Java/Python) plus OpenGL
 Stat/math package with graphics
– R
– MATLAB
 Special-purpose info viz software
– Earth mapping, biological network visualization, etc.
 Browser-enabled graphics/info viz packages
– Google Charts
– Processing / Processing.js
– D3
– Java + Flash (becoming rarer)
Graph Drawing and Graph Visualization
Information Visualization
vs. Graph Drawing
 Graph Drawing
– Old topic, many books, etc.
– May have other goals than visualization
• E.g. VLSI design
 Graph Visualization
– Size is a key issue
– Usability requires nodes to be discernible
– Navigation must be considered
Graph Visualization
Hierarchical
graph of the
evolution of
the UNIX
operating
system
Graph Visualization
The Call Graph
Three concentric
rings show
containment
(1)Files
(2)Classes
(3)Methods
The curved lines
indicate function
calls
Graph visualization
 Circle chart
When is Graph Visualization Applicable?
 Ask the question: is there an inherent relation among the data
elements to be visualized?
– If YES – then the data can be represented by nodes of a graph,
with edges representing the relations.
– If NO – then the data elements are “unstructured” and the goal is to
use visualization to analyze and discover relationships among
data.
Source: Herman, Graph Visualization and Navigation in Information
Visualization: a Survey
Traditional Graph Drawing
 Optimization based on a set of criteria (mathematical
aesthetics)
– Minimize edge crossings
– Minimize area
– Maximize smallest angle
– Maximize symmetry
– Doing all at once is hard.
 Often unsuitable for interactive visualization
– Many optimizations are NP-hard
– Approximation algorithms are very complex
• Precompute layout, or compute once at the beginning of an
application then support interaction
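To make the first criterion concrete, the sketch below counts edge crossings in a straight-line drawing by testing every pair of edges (O(m²) for m edges). It assumes points in general position, so overlapping collinear edges are not counted; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def orientation(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): +1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments ab and cd properly cross (each separates the other's endpoints)."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0 and
            orientation(c, d, a) * orientation(c, d, b) < 0)


def count_crossings(pos: Dict[str, Point], edges: List[Tuple[str, str]]) -> int:
    """Count pairs of straight-line edges that cross in the drawing `pos`."""
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if len({u1, v1, u2, v2}) < 4:
            continue  # edges sharing an endpoint meet there; not a crossing
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            crossings += 1
    return crossings
```

Drawing the complete graph K4 as a square with both diagonals gives one crossing, while placing the fourth node inside the triangle of the other three gives none: crossings are a property of the layout, not of the graph alone.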
Slide adapted from
Jeff Heer
Traditional Graph Drawing
poly-line graphs
(includes bends)
orthogonal drawing
planar, straight-line
drawing
upward drawing of
DAGs
Layout Approaches
 Tree-ify the graph, then use a tree layout
 Hierarchical graph layout
 Radial graph layout
 Optimization-based techniques
– Includes spring-embedding / force-directed layout
 Adjacency matrices
 Structurally-independent layout
 On-demand revealing of subgraphs
 Distortion-based views
– Hyperbolic browser
 (this list is not meant to be exhaustive)
Tree-based graph layout
 Select a tree-structure out of the graph
– Breadth-first-search tree
– Minimum spanning tree
– Other domain-specific structures
 Use a tree layout algorithm
 Benefits
– Fast, supports interaction and refinement
 Drawbacks
– Limited range of layouts
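A minimal sketch of the tree-ify approach, assuming an undirected graph given as adjacency lists: a breadth-first search from a chosen root keeps the first edge that reaches each node, and the resulting tree is drawn with depth as the vertical coordinate. Names are hypothetical; on weighted graphs a minimum spanning tree (e.g., Kruskal's or Prim's algorithm) could replace the BFS step.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[str]]  # undirected adjacency lists


def bfs_tree(adj: Graph, root: str) -> Tuple[Dict[str, Optional[str]], Dict[str, int]]:
    """Breadth-first search from `root`; returns parent and depth of each reached node."""
    parent: Dict[str, Optional[str]] = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:        # first visit: (u, v) becomes a tree edge
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent, depth


def layered_tree_layout(depth: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Very simple tree layout: y = depth, x = order of discovery within that depth."""
    next_x: Dict[int, int] = {}
    pos: Dict[str, Tuple[int, int]] = {}
    for node, d in depth.items():      # dicts preserve BFS discovery order
        x = next_x.get(d, 0)
        pos[node] = (x, d)
        next_x[d] = x + 1
    return pos
```

In a graph with edges a-b, a-c, b-c, b-d, c-d and root a, the BFS tree keeps three edges (a-b, a-c, b-d); the two non-tree edges are not drawn by the tree layout, though they could be overlaid afterwards.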
Tree-ify the graph
Traditional Tree Layouts
 H-tree layout: best for balanced trees
 Radial view
 Balloon view: related to 3-d cone tree
Hierarchical graph layout
 Use directed structure of graph to inform layout
 Order the graph into distinct levels
– this determines one dimension
 Now optimize within levels
– determines the second dimension
– minimize edge crossings, etc
 The method used in graphviz’s “dot” algorithm
 Great for directed acyclic graphs, but often misleading in the
case of cycles
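The two steps above (levels first, then ordering within levels) can be sketched as follows, under simplifying assumptions: layers come from longest-path layering of a DAG, and ordering uses a single top-down barycenter sweep that only looks at the layer directly above. This is not graphviz's dot algorithm, which uses more elaborate ranking and crossing-reduction heuristics and handles edges that span several layers; the function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def assign_layers(nodes: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Longest-path layering of a DAG (first dimension): sources get layer 0
    and every node sits one layer below its deepest predecessor."""
    preds, succs = defaultdict(list), defaultdict(list)
    indegree = {v: 0 for v in nodes}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indegree[v] += 1
    queue = deque(v for v in nodes if indegree[v] == 0)
    layer: Dict[str, int] = {}
    while queue:                       # Kahn's topological order
        u = queue.popleft()
        layer[u] = max((layer[p] + 1 for p in preds[u]), default=0)
        for v in succs[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(layer) != len(nodes):
        raise ValueError("graph has a cycle; reverse some edges before layering")
    return layer


def order_within_layers(nodes: List[str], edges: List[Edge],
                        layer: Dict[str, int]) -> Dict[int, List[str]]:
    """One top-down barycenter sweep (second dimension): sort each layer by
    the average position of its predecessors in the layer directly above."""
    levels: Dict[int, List[str]] = defaultdict(list)
    for v in nodes:                    # initial order: the order of `nodes`
        levels[layer[v]].append(v)
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    for depth in range(1, max(levels) + 1):
        above = {v: i for i, v in enumerate(levels[depth - 1])}

        def barycenter(v: str, above: Dict[str, int] = above) -> float:
            # Only predecessors in the adjacent layer count in this sketch.
            ps = [above[p] for p in preds[v] if p in above]
            return sum(ps) / len(ps) if ps else 0.0

        levels[depth].sort(key=barycenter)
    return dict(levels)
```

With edges r→a, r→b, b→x, a→y and the initial bottom layer [x, y], edges b→x and a→y cross; after the sweep the bottom layer becomes [y, x] and the crossing disappears. A cyclic input raises ValueError, matching the caveat that the method suits DAGs.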
Hierarchical Graph Layout
 Evolution of the
UNIX operating
system
 Hierarchical
layering based on
descent
Hierarchical graph layout
Gnutella network
Radial Layout
 Animated Exploration of Graphs with Radial Layout, Yee et al.,
2001
 Gnutella network
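A basic radial tree layout can be sketched as follows: the root sits at the center, depth d is drawn on a ring of radius d, and each subtree receives an angular wedge proportional to its number of leaves. This is a common textbook scheme assumed for illustration; it is not the exact algorithm (or the animated transitions) of Yee et al., and the function names are hypothetical. Applied to a general graph, it would be run on a spanning tree such as a BFS tree.

```python
import math
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]  # node -> children


def leaf_counts(tree: Tree, root: str) -> Dict[str, int]:
    """Number of leaves under each node (a leaf counts itself)."""
    counts: Dict[str, int] = {}

    def visit(v: str) -> int:
        kids = tree.get(v, [])
        counts[v] = 1 if not kids else sum(visit(c) for c in kids)
        return counts[v]

    visit(root)
    return counts


def radial_layout(tree: Tree, root: str,
                  ring_gap: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Root at the center, depth d on the ring of radius d * ring_gap; every
    subtree gets an angular wedge proportional to its number of leaves."""
    counts = leaf_counts(tree, root)
    pos: Dict[str, Tuple[float, float]] = {}

    def place(v: str, depth: int, start: float, end: float) -> None:
        angle = (start + end) / 2          # center of the node's wedge
        r = depth * ring_gap
        pos[v] = (r * math.cos(angle), r * math.sin(angle))
        a = start
        for c in tree.get(v, []):
            span = (end - start) * counts[c] / counts[v]
            place(c, depth + 1, a, a + span)
            a += span

    place(root, 0, 0.0, 2 * math.pi)
    return pos
```

In a tree where child "a" has two leaves and child "b" is itself a leaf, "a" receives two thirds of the circle and is placed at the middle of its wedge, angle 2π/3.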
Optimization-based layout
 Specify constraints for layout
– Series of mathematical equations
– Handed to a “solver” that tries to optimize the constraints
 Examples
– Minimize edge crossings, line bends, etc
– Multi-dimensional scaling (preserve multi-dim distance)
– Force-directed placement (use physics metaphor)
 Benefits
– General applicability
– Often customizable by adding new constraints
 Drawbacks
– Approximate constraint satisfaction
– Running time; “organic” look not always desired
Example: Force-Directed Layout
Uses a physics model to lay out the graph:
nodes repel each other, edges act as springs, and
some amount of friction or drag force is used.
Special techniques dampen “jitter”.
http://getspringy.com/demo.html
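A minimal spring-embedder sketch, loosely following the force laws of Fruchterman & Reingold (1991), cited on the Graphs and Networks slide: every pair of nodes repels with force k²/d, every edge attracts with force d²/k, and a "temperature" that shrinks each iteration caps how far a node may move, which damps jitter. The constants (initial temperature, cooling factor 0.95, 200 iterations) are arbitrary assumptions, the paper's frame boundary is omitted, the all-pairs repulsion costs O(n²) per iteration, and the function name is hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def force_directed(nodes: List[str], edges: List[Tuple[str, str]],
                   iters: int = 200, size: float = 1.0,
                   seed: int = 42) -> Dict[str, Tuple[float, float]]:
    """All node pairs repel, edges pull their endpoints together,
    and a cooling step limit damps jitter."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size / math.sqrt(len(nodes))       # ideal edge length
    temp = size / 10                        # max move per iteration
    for _ in range(iters):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsive force k^2 / d between every pair of nodes.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attractive (spring) force d^2 / k along every edge.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temp)
                pos[v][0] += dx / d * step
                pos[v][1] += dy / d * step
        temp *= 0.95                        # cooling: later moves get smaller
    return {v: (p[0], p[1]) for v, p in pos.items()}
```

For the path a-b-c the layout typically settles into a nearly straight line, so the endpoints a and c end up farther apart than either edge.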
visual wordnet
visuwords
http://www.kylescholz.com/projects/wordnet
http://www.visuwords.com/
Hyperbolic Browser: Inspiration
Using Distortion and Focus + Context
 The Hyperbolic Tree Browser
The Hyperbolic Browser: A Focus + Context Technique for Visualizing Large Hierarchies,
Lamping & Rao, CHI 1996.
– http://www.inxight.com/products/sdks/st/
– Uses non-Euclidean geometry as basis of focus + context technique
• The hyperbolic browser is a projection into a Euclidean space – a circle
– In Euclidean space, the circumference of a circle increases linearly with radius (2πr)
– The circumference of a circle in hyperbolic space increases
exponentially
 Exponential growth in space available with linear growth of radius
– Makes tree layout easy
 Size of objects decreases with growth of radius
– Reduces expense of drawing trees when cut-off at one pixel
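The growth contrast on this slide can be written out. Assuming constant curvature −1 (a standard normalization the slide does not state), the circumference of a circle of radius r is:

```latex
C_{\text{Euclid}}(r) = 2\pi r,
\qquad
C_{\text{hyp}}(r) = 2\pi \sinh r = \pi\left(e^{r} - e^{-r}\right) \approx \pi e^{r} \quad (r \gg 1)
```

A tree with branching factor b has b^d nodes at depth d. On a Euclidean ring of radius proportional to d, the number of nodes per unit of arc grows roughly like b^d / (2πd), whereas in the hyperbolic plane the ring length itself grows exponentially, which is why roughly uniform spacing is possible there before the plane is projected into the display circle.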
Appearance of Initial Layout
 Root mapped at center
 Multiple generations of children mapped out towards edge of circle
 Drawing of nodes cuts off when less than one pixel
Structurally-Independent Layout
 Ignore the graph structure.
 Base the layout on other attributes of the data
 Examples:
– Geography
– Time
 Benefits
– Often very quick layout
– Optimizes communication of particular features
 Drawbacks
– May or may not present structure well
Structurally Independent Layout
 The “Skitter” Layout
– Internet Connectivity
 Angle = Longitude
– geography
 Radius = Degree
– # of connections
http://www.caida.org/research/topology/as_core_network/2007/images/ascore-simple.2007_big.png
Skitter, www.caida.org
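The Skitter mapping can be sketched as a polar layout: angle from longitude, radius from degree. The slide does not give the exact radius formula; the sketch assumes a log-scaled, inverted degree so that the best-connected nodes sit near the center and degree-0 nodes on the rim. The function name and all node names, longitudes, and degrees in the usage are hypothetical.

```python
import math
from typing import Dict, Tuple


def skitter_layout(longitude: Dict[str, float],
                   degree: Dict[str, int]) -> Dict[str, Tuple[float, float]]:
    """Angle = longitude (in degrees); radius derived from node degree.
    Assumed radius transform: 1 - log(1 + deg) / log(1 + max_deg), which puts
    the best-connected nodes at the center and degree-0 nodes on the unit circle."""
    max_deg = max(degree.values())
    pos: Dict[str, Tuple[float, float]] = {}
    for node, lon in longitude.items():
        theta = math.radians(lon)
        if max_deg > 0:
            r = 1.0 - math.log1p(degree[node]) / math.log1p(max_deg)
        else:
            r = 1.0                     # no edges at all: every node on the rim
        pos[node] = (r * math.cos(theta), r * math.sin(theta))
    return pos
```

With degrees 99, 9, and 0, the hub lands at the center, the degree-9 node halfway out (log 10 / log 100 = 0.5), and the degree-0 node on the unit circle at the angle of its longitude.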