Data Visualization with R (II)
Dr. Jieh-Shan George YEH
Outline
• Data Visualization with R
• Visualizing Different Types of Data
– Univariate
– Univariate Categorical
– Bivariate Categorical
– Bivariate Continuous vs Categorical
– Bivariate Continuous vs Continuous
– Bivariate: Continuous vs Time
Data Visualization with R
• Both anecdotally, and per Google Trends, R is
the language and tool most closely associated
with creating data visualizations.
– https://www.google.com/trends/explore?date=all&q=R%20language,Data%20Visualization,D3.js&hl=en-US
Google Trends on R & Data Visualization
GRAPHS FOR DATA MINING
Hierarchical Clustering
• hc<-hclust(dist(mtcars))
• plot(hc)
• rect.hclust(hc, k=4)
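The dendrogram only draws boxes around the four clusters. As a minimal sketch, assuming base R's cutree() and the same k = 4 passed to rect.hclust(), the cluster labels themselves could be extracted like this (the variable name groups is illustrative, not from the slides):

```r
# Hierarchical clustering on Euclidean distances
# (hclust uses complete linkage by default)
hc <- hclust(dist(mtcars))

# Cut the tree into the same 4 groups that rect.hclust(hc, k = 4) outlines
groups <- cutree(hc, k = 4)

# Number of cars in each cluster
print(table(groups))

# Names of the cars assigned to the first cluster
print(names(groups)[groups == 1])
```

The result is a named integer vector, so it can be attached to the data frame or used to color other plots.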
Decision Tree
library(rpart)
library(rpart.plot)
# Classification tree: number of cylinders predicted from mpg
rp1 <- rpart(factor(cyl) ~ mpg, data = mtcars)
# Draw the fitted tree
prp(rp1)
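Beyond drawing the tree, it can help to inspect what it predicts. The following is a rough sketch, assuming the same formula as the slide; the names fit, pred and confusion are illustrative only.

```r
library(rpart)

# Classification tree: predict the number of cylinders from mpg
fit <- rpart(factor(cyl) ~ mpg, data = mtcars)

# Text view of the splits (an alternative to the prp() drawing)
print(fit)

# Predicted class for each car in the training data
pred <- predict(fit, type = "class")

# Cross-tabulate actual against predicted cylinder counts
confusion <- table(actual = mtcars$cyl, predicted = pred)
print(confusion)
```

Because the predictions are made on the training data, the table shows how well the tree fits, not how well it would generalize.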
OTHERS
Financial Timeseries
Quantitative Financial Modeling Framework
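• require(quantmod)
• getSymbols("YHOO", src="google") # from Google Finance; this source no longer works in current quantmod releases
• getSymbols("YHOO", from="2014-01-01") # default source (Yahoo Finance); YHOO has since been delisted, so substitute a current ticker
• chartSeries(YHOO)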
• barChart(YHOO)
• candleChart(YHOO,multi.col=TRUE,theme="white")
• chartSeries(to.weekly(YHOO),up.col='white',dn.col='blue')
GGPLOT2
ggplot2
• The ggplot2 package, created by Hadley Wickham,
offers a powerful graphics language for creating
elegant and complex plots.
• Originally based on Leland Wilkinson's The Grammar of
Graphics, ggplot2 allows you to create graphs that
represent both univariate and multivariate numerical
and categorical data in a straightforward manner.
• Grouping can be represented by color, symbol, size,
and transparency. The creation of trellis plots (i.e.,
conditioning) is relatively simple.
• qplot() (for quick plot) hides much of this complexity
when creating standard graphs.
qplot()
• The qplot() function can be used to create the most common graph
types. While it does not expose ggplot's full power, it can create a
very wide range of useful plots. The format is:
qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=,
formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)
Notes:
• At present, ggplot2 cannot be used to create 3D graphs or mosaic
plots.
• Use I(value) to indicate a specific value. For example, size=z makes
the size of the plotted points or lines proportional to the values of a
variable z. In contrast, size=I(3) sets every point or line to a fixed
size of 3, regardless of the data.
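The difference between mapping an aesthetic to a variable and setting it to a constant can be sketched with ggplot(), where a mapping goes inside aes() and a constant goes outside it. This is an illustrative example on mtcars, not taken from the slides; the object names p_mapped and p_fixed are made up for the sketch.

```r
library(ggplot2)

# Mapped: point size varies with the values of wt, and a legend is added
p_mapped <- ggplot(mtcars, aes(x = hp, y = mpg, size = wt)) +
  geom_point()

# Fixed: every point is drawn at size 3 and no legend is added;
# this is the role that size = I(3) plays in qplot()
p_fixed <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(size = 3)

print(p_mapped)
print(p_fixed)
```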
Customizing ggplot2 Graphs
• Unlike base R graphs, ggplot2 graphs are not
affected by many of the options set in the par()
function.
• They can be modified using the theme() function,
and by adding graphic parameters within the
qplot() function.
• For greater control, use ggplot() and other
functions provided by the package.
• ggplot2 functions can be chained with "+" signs
to generate the final plot.
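As a minimal sketch of chaining with "+", the plot below starts from ggplot(), adds a point layer and labels, and then adjusts the appearance with theme(). The particular theme settings are illustrative choices, not recommendations from the slides.

```r
library(ggplot2)

# Start with the data and aesthetic mappings, then add pieces with "+"
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +                            # scatterplot layer
  labs(title = "MPG by Weight", x = "Weight",
       y = "Miles per Gallon", color = "Cylinders") +  # title and axis labels
  theme_bw() +                                      # a built-in complete theme
  theme(legend.position = "bottom",                 # move the legend
        plot.title = element_text(face = "bold"))   # bold title

print(p)
```

Each "+" returns a new plot object, so intermediate results can be stored and extended later.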
Example
# ggplot2 examples
library(ggplot2)
# create factors with value labels
mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5),
labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am,levels=c(0,1),
labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8),
labels=c("4cyl","6cyl","8cyl"))
# Kernel density plots for mpg
# grouped by number of gears (indicated by color)
qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),
main="Distribution of Gas Mileage", xlab="Miles Per Gallon",
ylab="Density")
# Scatterplot of mpg vs. hp for each combination of gears and cylinders;
# in each facet, transmission type is represented by shape and color
qplot(hp, mpg, data=mtcars,
shape=am, color=am,
facets=gear~cyl, size=I(3),
xlab="Horsepower",
ylab="Miles per Gallon")
# Separate regressions of mpg on weight for each number of cylinders
qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"),
method="lm",
formula=y~x, color=cyl,
xlab="Weight",
ylab="Miles per Gallon",
main="Regression of MPG on Weight")
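qplot() is deprecated as of ggplot2 3.4.0, so it may be useful to see the same regression plot written with ggplot(). The sketch below is one possible translation; it works on a copy named cars (a name chosen here for illustration) so that the built-in mtcars is left unchanged.

```r
library(ggplot2)

# Work on a copy and label the cylinder counts as in the earlier example
cars <- mtcars
cars$cyl <- factor(cars$cyl, levels = c(4, 6, 8),
                   labels = c("4cyl", "6cyl", "8cyl"))

# Points plus one linear fit (with confidence band) per cylinder group;
# the color mapping in aes() is what splits the data into groups
p <- ggplot(cars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Regression of MPG on Weight",
       x = "Weight", y = "Miles per Gallon")

print(p)
```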
# Boxplots of mpg by number of gears
# observations (points) are overlaid and jittered
qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"),
fill=gear,
main="Mileage by Gear Number",
xlab="",
ylab="Miles per Gallon")
• To learn more, see the ggplot2 reference site
– http://docs.ggplot2.org/current/index.html