Integrating Statistical Analysis
with Visualization
CS 4390/5390 Data Visualization
Shirley Moore, Instructor
October 8, 2014
Descriptive Statistics
• Basic statistics such as mean, mode, median,
standard deviation
• Correlation coefficient expresses strength of
an assumed linear correlation of two random
variables on a scale between -1 and 1.
• Probability density describes the relative likelihood
for a continuous random variable to take on a given value.
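As a small illustration of the basic statistics listed above, the following sketch uses Python's standard `statistics` module. The `scores` list is made-up sample data, not course data.

```python
import statistics

# Hypothetical sample: exam scores for ten students (illustrative only).
scores = [72, 85, 85, 90, 64, 78, 85, 91, 70, 80]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value
stdev = statistics.stdev(scores)    # sample standard deviation (n - 1 denominator)

print(f"mean={mean}, median={median}, mode={mode}, stdev={stdev:.2f}")
```

Note that `statistics.stdev` is the sample standard deviation; `statistics.pstdev` would give the population version.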
Pearson’s Correlation Coefficient
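A minimal sketch of how Pearson's r can be computed from its usual definition (covariance divided by the product of the standard deviations). The function name `pearson_r` and the sample lists are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either sample is constant.
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one perfectly linear, one inverted, one noisy.
hours = [1, 2, 3, 4, 5]
r_perfect = pearson_r(hours, [2, 4, 6, 8, 10])
r_inverse = pearson_r(hours, [10, 8, 6, 4, 2])
r_noisy = pearson_r(hours, [2, 1, 4, 3, 5])
print(r_perfect, r_inverse, round(r_noisy, 2))
```

Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship, although a nonlinear one may still exist.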
Scatterplots with Correlation
Normal Distribution
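To connect the normal distribution with the idea of probability density, this sketch evaluates the standard normal density and the probability mass within one and two standard deviations of the mean. It relies on `statistics.NormalDist` from the Python standard library (3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0.0, sigma=1.0)

# Density at the mean: the height of the bell curve's peak, not a probability.
peak = standard.pdf(0.0)

# Probabilities come from areas under the curve, computed with the CDF.
within_one = standard.cdf(1.0) - standard.cdf(-1.0)
within_two = standard.cdf(2.0) - standard.cdf(-2.0)

print(f"pdf(0)={peak:.4f}, P(|x|<1)={within_one:.4f}, P(|x|<2)={within_two:.4f}")
```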
Box Plots
• Graphically display data according to their quartiles
• Indicate dispersion and skewness
• Show outliers
• Examples
– Box plots in D3
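The numbers behind a box plot can be sketched as follows. Two choices here are assumptions rather than anything stated in the slides: quartiles use the "inclusive" interpolation method of `statistics.quantiles`, and outliers are flagged with the common 1.5 × IQR rule. Plotting libraries may use slightly different conventions.

```python
import statistics

def box_plot_summary(values):
    """Return quartiles, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1                    # interquartile range: the height of the box
    low_fence = q1 - 1.5 * iqr       # assumed 1.5 * IQR outlier rule
    high_fence = q3 + 1.5 * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        "whisker_low": inliers[0],    # whiskers end at the most extreme inliers
        "whisker_high": inliers[-1],
        "outliers": outliers,
    }

# Hypothetical data with one extreme value.
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])
print(summary)
```

The box spans q1 to q3, so its height shows dispersion, and the position of the median inside the box hints at skewness.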
Inferential Statistics
• Draw conclusions that reach beyond the
immediate data
• Regression modeling
– linear
– nonlinear
– multiple
• Clustering
– centroid-based
– hierarchical
Linear Regression
• Least squares regression calculates the best-fitting line for the observed data by minimizing
the sum of the squares of the vertical deviations
from each data point to the line.
• Coefficient of determination (R² value) indicates
how much of the total variation in y can be
explained by the relationship between x and y.
• Linear regression in D3:
Anscombe’s Quartet
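Anscombe's quartet is a useful check on the least squares fit and R² value described under Linear Regression: four data sets with nearly the same fitted line and R² that look very different when plotted. The sketch below fits each set with a hand-written `least_squares` helper (an illustrative name, not a library function); the data values are the published quartet.

```python
def least_squares(xs, ys):
    """Fit y = intercept + slope * x by minimizing squared vertical deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation) / (total variation in y).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fits = {name: least_squares(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}
for name, (slope, intercept, r2) in fits.items():
    print(f"{name}: y = {intercept:.2f} + {slope:.3f}x, R^2 = {r2:.2f}")
```

Because the summary numbers are almost identical across all four sets, only a scatterplot reveals the curve in set II and the single influential points in sets III and IV.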
Cluster Analysis
• Task of grouping a set of objects in such a way
that objects in the same group (called a
cluster) are more similar in some sense to
each other than to those in other groups
• Many different algorithms
Centroid-based Clustering
• Clusters are represented by a central vector,
which may not necessarily be a member of the
data set.
• When the number of clusters is fixed to k, k-means clustering finds k cluster centers and
assigns the objects to the nearest cluster center,
such that the sum of the squared distances from
the centers is minimized.
• Since the problem is NP-hard, heuristic algorithms that
converge to a local optimum are often used.
k-means Clustering Formal Definition
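The slide's own notation is not preserved here, so the following is the standard textbook formulation, with symbols chosen for illustration: given observations $x_1, \dots, x_n$, k-means seeks a partition $S = \{S_1, \dots, S_k\}$ that minimizes the within-cluster sum of squared distances to each cluster mean $\mu_i$.

```latex
\underset{S}{\arg\min} \; \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{\lvert S_i \rvert} \sum_{x \in S_i} x
```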
Lloyd’s Algorithm
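A minimal sketch of Lloyd's algorithm, which alternates an assignment step and an update step until the assignments stop changing. The function name, the random initialization from the data points, and the sample points are assumptions for illustration; the result is a local optimum that can depend on the starting centers.

```python
import math
import random

def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization: k random data points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave a center unchanged if its cluster is empty
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, assignment

# Hypothetical data: two well-separated groups of 2-D points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centers, labels = lloyd_kmeans(points, k=2)
print(centers, labels)
```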
See science.js for an implementation by Jason Davies.
See also
Hierarchical Clustering
• Agglomerative and divisive methods
• Distance-based agglomerative method
repeatedly merges “closest” clusters
• Result usually represented as a dendrogram
Hierarchical Clustering Example
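A small sketch of distance-based agglomerative clustering: start with every point in its own cluster and repeatedly merge the two closest clusters. Single linkage (distance between the closest pair of points) is assumed here as the meaning of "closest"; other linkage rules are possible. The function names and sample points are illustrative.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Cluster distance = distance between the closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points, linkage=single_linkage):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        distance = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical 2-D points: two tight pairs and one isolated point.
history = agglomerate([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)])
for left, right, distance in history:
    print(f"merge {left} + {right} at distance {distance:.3f}")
```

Each recorded merge distance corresponds to the height of one join in the dendrogram.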
See science.js for D3 hcluster code
Preparation for Next Class
• Finish Lab 3
– upload files prior to class
• Review for the Quest (quiz/test) that will be held on
Wed., Oct. 15
– Bring questions to class