Data Visualization with R (II)
Dr. Jieh-Shan George YEH
[email protected]

Outlines
• Data Visualization with R
• Visualizing Different Types of Data
  – Univariate
  – Univariate Categorical
  – Bivariate Categorical
  – Bivariate Continuous vs Categorical
  – Bivariate Continuous vs Continuous
  – Bivariate: Continuous vs Time

Data Visualization with R
• Both anecdotally and per Google Trends, R is the language and tool most closely associated with creating data visualizations.
  – https://www.google.com/trends/explore?date=all&q=R%20language,Data%20Visualization,D3.js&hl=en-US

Google Trends on R & Data Visualization

GRAPH FOR DATA MINING

Hierarchical Clustering
    # cluster the cars on Euclidean distances between rows
    hc <- hclust(dist(mtcars))
    # draw the dendrogram
    plot(hc)
    # outline the 4-cluster solution on the dendrogram
    rect.hclust(hc, k=4)

Decision Tree
    require(rpart)
    require(rpart.plot)
    # classify the number of cylinders from mpg
    rp1 <- rpart(factor(cyl)~mpg, data=mtcars)
    # plot the fitted tree
    prp(rp1)

OTHERS

Financial Timeseries
Quantitative Financial Modeling Framework
    require(quantmod)
    getSymbols("YHOO",src="google") # from google finance
    getSymbols("YHOO", from="2014-01-01")
    chartSeries(YHOO)

    barChart(YHOO)
    candleChart(YHOO,multi.col=TRUE,theme="white")
    chartSeries(to.weekly(YHOO),up.col='white',dn.col='blue')

GGPLOT2

ggplot2
• The ggplot2 package, created by Hadley Wickham, offers a powerful graphics language for creating elegant and complex plots.
• Based on Leland Wilkinson's The Grammar of Graphics, ggplot2 lets you create graphs of univariate and multivariate data, both numerical and categorical, in a straightforward manner.
• Grouping can be represented by color, symbol, size, and transparency. Creating trellis plots (i.e., conditioning) is relatively simple.
• qplot() (for quick plot) hides much of this complexity when creating standard graphs.

qplot()
• The qplot() function can be used to create the most common graph types. While it does not expose ggplot's full power, it can create a very wide range of useful plots.
The format is:
    qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)

Notes:
• At present, ggplot2 cannot be used to create 3D graphs or mosaic plots.
• Use I(value) to indicate a specific value. For example, size=z makes the size of the plotted points or lines proportional to the values of a variable z. In contrast, size=I(3) sets every point or line to the constant size 3 instead of mapping size to a variable.

Customizing ggplot2 Graphs
• Unlike base R graphs, ggplot2 graphs are not affected by many of the options set in the par() function.
• They can be modified using the theme() function, and by adding graphic parameters within the qplot() function.
• For greater control, use ggplot() and other functions provided by the package.
• ggplot2 functions can be chained with "+" signs to generate the final plot.

Example
    # ggplot2 examples
    library(ggplot2)

    # create factors with value labels
    mtcars$gear <- factor(mtcars$gear, levels=c(3,4,5), labels=c("3gears","4gears","5gears"))
    mtcars$am <- factor(mtcars$am, levels=c(0,1), labels=c("Automatic","Manual"))
    mtcars$cyl <- factor(mtcars$cyl, levels=c(4,6,8), labels=c("4cyl","6cyl","8cyl"))

    # Kernel density plots for mpg
    # grouped by number of gears (indicated by color)
    qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5), main="Distribution of Gas Mileage", xlab="Miles Per Gallon", ylab="Density")

    # Scatterplot of mpg vs. hp for each combination of gears and cylinders
    # in each facet, transmission type is represented by shape and color
    qplot(hp, mpg, data=mtcars, shape=am, color=am, facets=gear~cyl, size=I(3), xlab="Horsepower", ylab="Miles per Gallon")

    # Separate regressions of mpg on weight for each number of cylinders
    qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"), method="lm", formula=y~x, color=cyl, xlab="Weight", ylab="Miles per Gallon", main="Regression of MPG on Weight")

    # Boxplots of mpg by number of gears
    # observations (points) are overlaid and jittered
    qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"), fill=gear, main="Mileage by Gear Number", xlab="", ylab="Miles per Gallon")

• To learn more, see the ggplot reference site
  – http://docs.ggplot2.org/current/index.html
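
The slides point to ggplot() and "+" chaining for greater control but only show qplot() calls. As a rough sketch, three of the examples above could be written with ggplot() as follows. The data frame copy `cars` and the plot names `p_density`, `p_regression`, and `p_box` are illustrative names, not part of the original slides, and the jitter width of 0.2 is an arbitrary choice. Recent ggplot2 releases mark qplot() as deprecated, so this form may also be the more durable one.

```r
library(ggplot2)

# Work on a copy so the built-in mtcars data set stays untouched
# (assumption: `cars` is just an illustrative name).
cars <- mtcars
cars$gear <- factor(cars$gear, levels = c(3, 4, 5),
                    labels = c("3gears", "4gears", "5gears"))
cars$cyl <- factor(cars$cyl, levels = c(4, 6, 8),
                   labels = c("4cyl", "6cyl", "8cyl"))

# Kernel density of mpg, filled by number of gears.
# alpha is set as a constant here, the ggplot() counterpart of alpha=I(.5).
p_density <- ggplot(cars, aes(x = mpg, fill = gear)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Gas Mileage",
       x = "Miles Per Gallon", y = "Density")

# Separate linear regressions of mpg on weight for each cylinder count.
# method and formula are passed to geom_smooth() rather than to qplot().
p_regression <- ggplot(cars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Regression of MPG on Weight",
       x = "Weight", y = "Miles per Gallon")

# Boxplots of mpg by gear with jittered observations overlaid.
p_box <- ggplot(cars, aes(x = gear, y = mpg, fill = gear)) +
  geom_boxplot() +
  geom_jitter(width = 0.2) +
  labs(title = "Mileage by Gear Number", x = "", y = "Miles per Gallon")

# Plots are only drawn when printed.
print(p_density)
print(p_regression)
print(p_box)
```

Each "+" adds one layer or setting to the plot object, so a plot can be built up step by step and stored in a variable before it is drawn. Aesthetics inside aes() are mapped to variables, while values given outside aes(), such as alpha = 0.5, are constants, which mirrors the size=z versus size=I(3) distinction in the qplot() notes.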