A Look at Alternatives
to the Usual
Statistical Software Packages
Dominic T. Moore
Forrest W. Young
University of North Carolina at Chapel Hill
Contact: [email protected]
Web: http://www.mindspring.com/~dtmoore
1
SAS
SPSS
S-PLUS
SYSTAT
Stata
Minitab
StatView
Statistica
Others?
2
“Free software is software that comes
with permission for anyone to use, copy
and distribute, either verbatim or with
modifications, either gratis or for a fee.
In particular, this means that source code
is available…”
Free Software Foundation
3
Introduction to Open Source
The basic idea behind open source is very
simple. When programmers on the Internet can
read, redistribute, and modify the source for a
piece of software, it evolves. People improve it,
people adapt it, people fix bugs. And this can
happen at a speed that, if one is used to the slow
pace of conventional software development,
seems astonishing.
We in the open-source community have learned
that this rapid evolutionary process produces
better software than the traditional closed
model, in which only a very few programmers
can see source and everybody else must blindly
use an opaque block of bits.
4
How is `open source' related
to `free software'?
Open Source is a marketing program for
free software. It's a pitch for `free
software' on solid pragmatic grounds
rather than ideological tub-thumping.
The winning substance has not changed,
the losing attitude and symbolism have.
OPENSOURCE.ORG
5
Numerous ways
to philosophize or think
about software
• Politically
• Economically
• Religiously???
(Microsoft as the Great Satan?)
6
Two Free Statistical
Software Environments
• R - based on, and similar to,
the S language
• XLisp-Stat - based on the Lisp
language
7
LISP-STAT
AN OBJECT-ORIENTED
ENVIRONMENT FOR
STATISTICAL COMPUTING
AND DYNAMIC GRAPHICS.
By Luke Tierney
Wiley Series in Probability and
Mathematical Statistics
8
The implementation
of Lisp-Stat is known
as XLisp-Stat (XLS),
since it is built on XLisp,
which David Betz developed
and released with source code
available to the public.
9
About XLisp-Stat
• Object-oriented programming
  • Prototyping, statistical model representation
• Portable windows interface
  • Macintosh, X Windows, Microsoft Windows
• Graphics
  • Dynamic and customizable
• The Lisp language
  • Extends Lisp arithmetic to element-wise operations
  • Adds statistical and linear algebra functions
10
More about XLisp-Stat
• Comes with complete source code
in ANSI C.
• It is free: you can give it away for
others to use and extend, on any
number of computers you like.
• Porting from Common Lisp is now
easy.
• The whole environment (windows,
dialog boxes, menus, etc.) can be
written and controlled in XLS.
• Dynamic graphics: interactivity
with visualizations.
11
Things I like about
XLisp-Stat:
• Learning to program
“inside out”.
• Interactive, iterative
programming
• Very different from “data
stepping” and “PROC-ing”
12
Pearson Chi-square:
\[ Q_P = \sum_{i=1}^{s} \sum_{j=1}^{r} \frac{(\mathrm{Observed}_{ij} - \mathrm{Expected}_{ij})^2}{\mathrm{Expected}_{ij}} \]
Where
\[ \mathrm{Expected}_{ij} = \frac{(i\text{th row total})\,(j\text{th column total})}{N_{\mathrm{total}}} \]
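As a rough illustration of the formula on this slide, here is a minimal Python sketch that computes the Pearson chi-square for an s x r table of counts. The function name `pearson_chi_square` and the example table are made up for this illustration; they do not come from the talk or from any of the packages discussed.

```python
def pearson_chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts.

    Hypothetical helper written for this illustration only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_total = sum(row_totals)

    q = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected(ij) = (ith row total)(jth column total) / N total
            expected = row_totals[i] * col_totals[j] / n_total
            q += (observed - expected) ** 2 / expected
    return q


# Made-up 2x2 table of counts
example = [[10, 20], [30, 40]]
print(round(pearson_chi_square(example), 4))  # 0.7937
```

The sketch assumes every row and column total is positive, so that no expected count is zero.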
13
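The Quadratic form
for Qp
\[ Q_P = (n - m)' A' (A V A')^{-1} A (n - m) \]
Where
\( n = (n_{11}, n_{12}, \ldots, n_{1r}, \ldots, n_{s1}, \ldots, n_{sr})' \) is the vector of observed counts
m is the corresponding expected vector
V is the covariance matrix
and \( A = [I_{(r-1)} \; 0_{(r-1)}] \otimes [I_{(s-1)} \; 0_{(s-1)}] \)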
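The quadratic form can be tried out numerically with the sketch below, which uses plain Python and exact fractions. Several things here are assumptions and not statements from the slides. The slide does not say which covariance matrix V is meant, so the sketch assumes the multivariate hypergeometric covariance (fixed row and column totals). With that choice the quadratic form works out to (N-1)/N times the Pearson statistic, not Qp exactly. The sketch also builds A as [I(s-1) 0] ⊗ [I(r-1) 0], rows first, so that A lines up with the row-by-row ordering of n. All function names are hypothetical.

```python
from fractions import Fraction


def eye_zero(k):
    """[I_k | 0]: a k x (k+1) matrix that drops the last coordinate."""
    return [[1 if c == r else 0 for c in range(k + 1)] for r in range(k)]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def solve(m, rhs):
    """Solve m x = rhs by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(m)
    aug = [list(row) + [b] for row, b in zip(m, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def quadratic_form(table):
    """(n-m)' A' (A V A')^-1 A (n-m), with V assumed hypergeometric."""
    s, r = len(table), len(table[0])
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    N = sum(rows)
    cells = [(i, j) for i in range(s) for j in range(r)]  # row-major order
    d = [Fraction(table[i][j]) - Fraction(rows[i] * cols[j], N)
         for i, j in cells]
    # Assumed covariance of the counts when both margins are fixed
    V = [[Fraction(rows[i] * ((N if i == k else 0) - rows[k])
                   * cols[j] * ((N if j == l else 0) - cols[l]),
                   N * N * (N - 1))
          for k, l in cells] for i, j in cells]
    A = kron(eye_zero(s - 1), eye_zero(r - 1))
    Ad = [sum(a * x for a, x in zip(row, d)) for row in A]
    A_t = [list(col) for col in zip(*A)]
    AVAt = matmul(matmul(A, V), A_t)
    return sum(a * b for a, b in zip(Ad, solve(AVAt, Ad)))


def pearson_exact(table):
    """Pearson chi-square with exact fractions, for comparison."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    N = sum(rows)
    return sum((table[i][j] - Fraction(rows[i] * cols[j], N)) ** 2
               / Fraction(rows[i] * cols[j], N)
               for i in range(len(rows)) for j in range(len(cols)))


table = [[10, 20], [30, 40]]   # made-up counts
print(quadratic_form(table))   # 11/14
print(pearson_exact(table))    # 50/63
```

For this table N = 100, and 11/14 is 99/100 of 50/63, which matches the (N-1)/N factor that comes with the assumed covariance. For large N the two statistics are nearly identical.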
14
ViSta is:
• Professor Forrest W. Young’s 10-year
software development project that utilizes
cognitive science and visualization
techniques.
• Designed particularly for students and
teachers of statistics at all levels.
• Used as a research and development tool in
computational and graphical statistics.
• Free, extensible and downloadable from the
web (and runs on a variety of platforms).
15
ViSta can:
• Reveal structure in your data
• Guide you through an analysis
• Show you results of your analyses
• Structure your data analysis process
16
Neat Things about
ViSta
• Automatically linked graphics
• Data as objects
• Data entry is intuitive, not a
remote data step (e.g., a 2x2 table).
• Point-and-click guidemap.
• A visual interface for novices,
a command line for experts.
• Everything can be programmed
(windows, dialog boxes, menus).
• Add-ons in the form of plug-ins
17
The Structured Desktop
ViSta's Desktop has
WorkMaps, GuideMaps,
SpreadPlots, Datasheets and
other features designed to
structure and assist the
statistical analyst.
18
Here is an example of a ViSta
desktop:
19
Datasheets
Datasheets display data and provide a
simple datasheet editor to create and edit
your data.
Here is an example of a datasheet:
20
WorkMaps are ViSta's
visualization technique
for structuring data
analysis sessions.
WorkMaps are created by ViSta
as the data analysis session
progresses.
21
Here is an example of a ViSta
WorkMap:
22
SpreadPlots
SpreadPlots help you explore
your data (and models of the
data) to see what they seem to
say.
SpreadPlots are state of the art:
They are structured, multi-window, linked, dynamic and
interactive.
23
• Multi-Window: SpreadPlots are groups
of several plot-windows.
• Structured: Each plot-window shows a
particular aspect of your data or model.
• Linked: The plot-windows can be linked
by the data's observations or variables.
• Dynamic: Each plot-window shows a
dynamic graphic. For example, spinplots
spin to communicate 3D structure.
Boxplots can show a moving parallel
coordinate plot to communicate higher
dimensional structure.
• Interactive: You interact with the
spinplot to make it spin, with the boxplot
to move the parallel coordinate lines.
24
Spreadplot for Three-Way
Table
25
Analyses
Exploratory and Descriptive
Data Analysis
• Dynamic Exploratory Graphics include
Spinplots, Scatterplots, Scatterplot Matrices,
Histograms, Boxplots, Parallel Coordinate
Plots, Mosaic Plots, Quantile Plots, Normal
Probability Plots, Quantile-Quantile Plots,
Diamond Plots, Dotplots, Biplots, and
Guided Tour Plots.
• Plots support brushing and labeling, and are
dynamically linked.
• Smoothers and Contours can be added to
several plots.
• Descriptive Statistics include Means,
Standard Deviations, Variances, Ranges,
Quartiles, Medians, Correlations,
Covariances, and Distances.
26
Analyses
Univariate Analysis
• Univariate Tests include T- and Z-tests
(with confidence intervals) for single-sample,
paired-samples and two-independent-samples data, with
Wilcoxon Signed-Rank and Mann-Whitney tests
in appropriate situations.
• ANOVA - Univariate Analysis of Variance for
balanced and unbalanced, one-way or multi-way data
(data must be complete). The model may or may not
include two-way (but not higher-way)
interactions. The model visualization is a
spreadplot composed of a boxplot, diamond plot,
quantile plot, quantile-quantile plot and effects
plot.
27
More Univariate Analysis
• Multiple Regression - Univariate regression
includes simple, multiple, robust, and monotonic
regression. The model visualization is a
spreadplot composed of regression, added-variable,
influence, leverage, and residuals plots.
Weight plots are also included for robust and
monotonic regression.
28
Analyses
Multivariate Analysis
• Multiple Regression - Multivariate Multiple
Regression Analysis. The spreadplot consists of
a biplot, spinplot, histogram and scatterplot-matrix.
• Principal Component Analysis of correlations
or covariances. The model visualization is a
spreadplot composed of a biplot, spin-plot,
scree-plot and scatterplot-matrix.
• Multidimensional Scaling of one or more
symmetric or asymmetric matrices. The model
visualization is a spreadplot composed of a
scatterplot, spin-plot, scree-plot and scatterplot-matrix.
The spreadplot supports graphical re-estimation
of model parameters.
• Correspondence Analysis of two-way
contingency tables. The model visualization is a
spreadplot composed of a biplot, spinplot,
residuals plot and scree-plot. The spreadplot
supports graphical re-estimation of model
parameters.