Category Archives: visualization

Interactive Graphics with the iplots Package (from “R in Action”)

The followings introductory post is intended for new users of R.  It deals with interactive visualization using R through the iplots package.

This is a guest article by Dr. Robert I. Kabacoff, the founder of (one of) the first online R tutorials websites: Quick-R. Kabacoff has recently published the book ”R in Action“, providing a detailed walk-through for the R language based on various examples for illustrating R’s features (data manipulation, statistical methods, graphics, and so on…). In previous guest posts by Kabacoff we introduced data.frame objects in R and dealt with the Aggregation and Restructuring of data (using base R functions and the reshape package).

For readers of this blog, there is a 38% discount off the “R in Action” book (as well as all other eBooks, pBooks and MEAPs at Manning publishing house), simply by using the code rblogg38 when reaching checkout.

Let us now talk about Interactive Graphics with the iplots Package:

Interactive Graphics with the iplots Package

The base installation of R provides limited interactivity with graphs. You can modify graphs by issuing additional program statements, but there’s little that you can do to modify them or gather new information from them using the mouse. However, there are contributed packages that greatly enhance your ability to interact with the graphs you create—playwith, latticist, iplots, and rggobi. In this article, we’ll focus on functions provided by the iplots package. Be sure to install it before first use.

While playwith and latticist allow you to interact with a single graph, the iplots package takes interaction in a different direction. This package provides interactive mosaic plots, bar plots, box plots, parallel plots, scatter plots, and histograms that can be linked together and color brushed. This means that you can select and identify observations using the mouse, and highlighting observations in one graph will automatically highlight the same observations in all other open graphs. You can also use the mouse to obtain information about graphic objects such as points, bars, lines, and box plots.

The iplots package is implemented through Java and the primary functions are listed in table 1.

Table 1 iplot functions

Function

Description

ibar() Interactive bar chart
ibox() Interactive box plot
ihist() Interactive histogram
imap() Interactive map
imosaic() Interactive mosaic plot
ipcp() Interactive parallel coordinates plot
iplot() Interactive scatter plot

To understand how iplots works, execute the code provided in listing 1.

Listing 1 iplots demonstration

1
2
3
4
5
6
7
8
9
10
11
12
library(iplots)
attach(mtcars)
cylinders <- factor(cyl)
gears <- factor(gear)
transmission <- factor(am)
ihist(mpg)
ibar(gears)
iplot(mpg, wt)
ibox(mtcars[c("mpg", "wt", "qsec", "disp", "hp")])
ipcp(mtcars[c("mpg", "wt", "qsec", "disp", "hp")])
imosaic(transmission, cylinders)
detach(mtcars)

Six windows containing graphs will open. Rearrange them on the desktop so that each is visible (each can be resized if necessary). A portion of the display is provided in figure 1.

Figure 1 An iplots demonstration created by listing 1. Only four of the six windows are displayed to save room. In these graphs, the user has clicked on the three-gear bar in the bar chart window.

Now try the following:

  • Click on the three-gear bar in the Barchart (gears) window. The bar will turn red. In addition, all cars with three-gear engines will be highlighted in the other graph windows.
  • Mouse down and drag to select a rectangular region of points in the Scatter plot (wt vs mpg) window. These points will be highlighted and the corresponding observations in every other graph window will also turn red.
  • Hold down the Ctrl key and move the mouse pointer over a point, bar, box plot, or line in one of the graphs. Details about that object will appear in a pop-up window.
  • Right-click on any object and note the options that are offered in the context menu. For example, you can right-click on the Boxplot (mpg) window and change the graph to a parallel coordinates plot (PCP).
  • You can drag to select more than one object (point, bar, and so on) or use Shift-click to select noncontiguous objects. Try selecting both the three- and five-gear bars in the Barchart (gears) window.

The functions in the iplots package allow you to explore the variable distributions and relationships among variables in subgroups of observations that you select interactively. This can provide insights that would be difficult and time-consuming to obtain in other ways. For more information on the iplots package, visit the project website at http://rosuda.org/iplots/.

Summary

In this article, we explored one of the several packages for dynamically interacting with graphs, iplots. This package allows you to interact directly with data in graphs, leading to a greater intimacy with your data and expanded opportunities for developing insights.


This article first appeared as chapter 16.4.4 from the “R in action book, and is published with permission from Manning publishing house.  Other books in this serious which you might be interested in are (see the beginning of this post for a discount code):

Engineering Data Analysis (with R and ggplot2) – a Google Tech Talk given by Hadley Wickham

It appears that just days ago, Google Tech Talk released a new, one hour long, video of a presentation (from June 6, 2011) made by one of R’s community more influential contributors, Hadley Wickham.

This seems to be one of the better talks to send a programmer friend who is interested in getting into R.

Talk abstract

Data analysis, the process of converting data into knowledge, insight and understanding, is a critical part of statistics, but there’s surprisingly little research on it. In this talk I’ll introduce some of my recent work, including a model of data analysis. I’m a passionate advocate of programming that data analysis should be carried out using a programming language, and I’ll justify this by discussing some of the requirement of good data analysis (reproducibility, automation and communication). With these in mind, I’ll introduce you to a powerful set of tools for better understanding data: the statistical programming language R, and the ggplot2 domain specific language (DSL) for visualisation.

The video

More resources

R GUI now offers interactive graphics – Deducer 0.4-2 connects with iplots

Earlier today, Ian Fwllows has announced the release of Deducer 0.4-2 and DeducerExtras 1.2 to CRAN (I copy his announcement here):

Deducer 0.4-2 contains a few bug fixes, and an interface to the iplots package. With the new iplots interface it is now possible to do interactive plots with Deducer. An introductory example screen cast (by Ian) is available on the tube:

DeducerExtras 1.2 contains a few new dialogs including ‘load data from package’, and ‘t-test power’.

Additionally, a new Windows R/JGR/Deducer installer is available which installs R-2.12.0, JGR with it’s launcher, Deducer, DeducerExtras, and DeducerPlugInScaling. It is available on the Deducer website:

http://www.deducer.org/pmwiki/pmwiki.php?n=Main.WindowsInstallation

The new GUI for ggplot2 (using Deducer) – the designer wants your opinion

After discovering that R is expected (this summer) to have a GUI for ggplot2 (through deducer), I later found Ian’s gsoc proposal for this GUI.  Since the system is in it’s early stages of development, Ian has invited people to give comments, input and critique on his plans for the project.

For your convenience (and with Ian’s permission), I am reposting his proposal here. You are welcome to send him feedback by e-mailing him (at: [email protected]), or by leaving a comment here (and I will direct him to your comment).

Read more »

Jeroen Ooms’s ggplot2 web interface – a new version released (V0.2)

Good news.

Jeroen Ooms released a new version of his (amazing) online ggplot2 web interface:

yeroon.net/ggplot2 is a web interface for Hadley Wickham’s R package ggplot2. It is used as a tool for rapid prototyping, exploratory graphical analysis and education of statistics and R. The interface is written completely in javascript, therefore there is no need to install anything on the client side: a standard browser will do.

The new version has a lot of cool new features, like advanced data import, integration with Google docs, converting variables from numeric to factor to dates and vice versa, and a lot of new geom’s. Some of which you can watch in his new video demo of the application:

The application is on:
http://www.yeroon.net/ggplot2/

p.s: other posts about this (including videos explaining how some of this was done) can be views on the category page: R and the web

Quantile LOESS – Combining a moving quantile window with LOESS (R function)

In this post I will provide R code that implement’s the combination of repeated running quantile with the LOESS smoother to create a type of “quantile LOESS” (e.g: “Local Quantile Regression”).

This method is useful when the need arise to fit robust and resistant (Need to be verified) a smoothed line for a quantile (an example for such a case is provided at the end of this post).

If you wish to use the function in your own code, simply run inside your R console the following line:

?View Code RSPLUS
source("http://www.r-statistics.com/wp-content/uploads/2010/04/Quantile.loess_.r.txt")

Background

I came a cross this idea in an article titled “High throughput data analysis in behavioral genetics” by Anat Sakov, Ilan Golani, Dina Lipkind and my advisor Yoav Benjamini. From the abstract:

In recent years, a growing need has arisen in different fields, for the development of computational systems for automated analysis of large amounts of data (high-throughput). Dealing with non-standard noise structure and outliers, that could have been detected and corrected in manual analysis, must now be built into the system with the aid of robust methods. [...] we use a non-standard mix of robust and resistant methods: LOWESS and repeated running median.

The motivation for this technique came from “Path data” (of mice) which is

prone to suffer from noise and outliers. During progression a tracking system might lose track of the animal, inserting (occasionally very large) outliers into the data. During lingering, and even more so during arrests, outliers are rare, but the recording noise is large relative to the actual size of the movement. The statistical implications are that the two types of behavior require different degrees of smoothing and resistance. An additional complication is that the two interchange many times throughout a session. As a result, the statistical solution adopted needs not only to smooth the data, but also to recognize, adaptively, when there are arrests. To the best of our knowledge, no single existing smoothing technique has yet been able to fulfill this dual task. We elaborate on the sources of noise, and propose a mix of LOWESS (Cleveland, 1977) and the repeated running median (RRM; Tukey, 1977) to cope with these challenges

If all we wanted to do was to perform moving average (running average) on the data, using R, we could simply use the rollmean function from the zoo package.
But since we wanted also to allow quantile smoothing, we turned to use the rollapply function.

R function for performing Quantile LOESS

Here is the R function that implements the LOESS smoothed repeated running quantile (with implementation for using this with a simple implementation for using average instead of quantile):

Read more »

Nutritional supplements efficacy score – Graphing plots of current studies results (using R)

In this post I showcase a nice bar-plot and a balloon-plot listing recommended Nutritional supplements , according to how much evidence exists for thier benefits, scroll down to see it(and click here for the data behind it)
* * * *
The gorgeous blog “Information Is Beautiful” recently publish an eye candy post showing a “balloon race” image (see a static version of the image here) illustrating how much evidence exists for the benefits of various Nutritional supplements (such as: green tea, vitamins, herbs, pills and so on) . The higher the bubble in the Y axis score (e.g: the bubble size) for the supplement the greater the evidence there is for its effectiveness (But only for the conditions listed along side the supplement).

There are two reasons this should be of interest to us:

  1. This shows a fun plot, that R currently doesn’t know how to do (at least I wasn’t able to find an implementation for it). So if anyone thinks of an easy way for making one – please let me know.
  2. The data for the graph is openly (and freely) provided to all of us on this Google Doc.

The advantage of having the data on a google doc means that we can see when the data will be updated. But more then that, it means we can easily extract the data into R and have our way with it (Thanks to David Smith’s post on the subject)

For example, I was wondering what are ALL of the top recommended Nutritional supplements, an answer that is not trivial to get from the plot that was in the original post.

In this post I will supply two plots that present the data: A barplot (that in retrospect didn’t prove to be good enough) and a balloon-plot for a table (that seems to me to be much better).

Barplot
(You can click the image to enlarge it)

The R code to produce the barplot of Nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition).

?View Code RSPLUS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
 
# loading the data
supplements.data.0 < - read.csv("http://spreadsheets.google.com/pub?key=0Aqe2P9sYhZ2ndFRKaU1FaWVvOEJiV2NwZ0JHck12X1E&output=csv")
supplements.data <- supplements.data.0[supplements.data.0[,2] >2,] # let's only look at "good" supplements
supplements.data < - supplements.data[!is.na(supplements.data[,2]),] # and we don't want any missing data
 
supplement.score <- supplements.data[, 2]
ss <- order(supplement.score, decreasing  = F)	# sort our data
supplement.score <- supplement.score[ss]
supplement.name <- supplements.data[ss, 1]
supplement.benefits <- supplements.data[ss, 4]
supplement.score.col <- factor(as.character(supplement.score))
	levels(supplement.score.col) <-  c("red", "orange", "blue", "dark green")
	supplement.score.col <- as.character(supplement.score.col)
 
# mar: c(bottom, left, top, right) The default is c(5, 4, 4, 2) + 0.1.
par(mar = c(5,9,4,13))	# taking care of the plot margins
bar.y <- barplot(supplement.score, names.arg= supplement.name, las = 1, horiz = T, col = supplement.score.col, xlim = c(0,6.2),
				main = c("Nutritional supplements efficacy score","(by evidence for its effectiveness on the listed condition)", "(2010)"))
axis(4, labels = supplement.benefits, at = bar.y, las = 1) # Add right axis
abline(h = bar.y, col = supplement.score.col , lty = 2) # add some lines so to easily follow each bar

Also, the nice things is that if the guys at Information Is Beautiful will update there data, we could easily run the code and see the updated list of recommended supplements.

Balloon plot
So after some web surfing I came around an implementation of a balloon plot in R (Thanks to R graph gallery)
There where two problems with using the command out of the box. The first one was that the colors where non informative (easily fixed), the second one was that the X labels where overlapping one another. Since there is no “las” parameter in the function, I just opened the function up, found where this was plotted and changed it manually (a bit messy, but that’s what you have to do sometimes…)

Here are the result (you can click the image for a larger image):

And here is The R code to produce the Balloon plot of Nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition).
(it’s just the copy of the function with a tiny bit of editing in line 146, and then using it)

Read more »

Fun interpretive dances for common statistical plots

My wife is a big lover of dance (especially Dance In Israel), and while reading through the NYtimes article: “To Impress, Tufts Prospects Turn to YouTube“, she found me a pearl: A woman performing interpretive dances for math/statistical plots. That includes small dance for: scatter plots, boxplots, barplots and a few others. Enjoy:

Web Development with R – an HD video tutorial of Jeroen Ooms talk

Here is a HD version of a video tutorial on web development with R, a lecture that was given by Jeroen Ooms (the guy who made A web application for R’s ggplot2). This talk was given at the Bay Area UseR Group meeting on R-Powered Web Apps.

You can also view the slides for his talk and view (great) examples for: stockplotlme4, and gpplot2.

Thanks again to Jeroen for sharing his knowledge and experience!

A web application for R’s ggplot2

One of the exciting new frontiers for R programming is of creating website interfaces to R code. At the forefront of this domain is a young and (very) bright man called Jeroen Ooms, whom I had the pleasure of meeting at useR 2009 (press the link to see his presentation).

Today Jeroen announced a new version (0.11) of his web interface to ggplot2. See it here:
http://www.yeroon.net/ggplot2/

As Jeroen wrote:

New features include 1D geom’s (histogram, density, freqpoly), syntax mode (by clicking the tiny arrow at the bottom), and some additional facet options. And some minor improvements and fixes, most notably for Internet Explorer.
The data upload has not been improved yet, I am working on that. For now, it supports .csv, .sav (spss), and tab delimited data. Please make sure your filename has the appropriate extension and every column has a header in your data. If you export a dataframe from R, use:
write.csv(mydf, ”mydf.csv” , row.names=F). If you upload an spss
datafile, none of this should be a concern.
Supported browsers are IE6-8, FF, Safari, and Chrome, but a recent browser is highly recommended. As always, feedback is more than welcome.

Here is a little demo video that shows how to use the new features:

The datafile from the demo is available at http://www.yeroon.net/ggplot2/myMovies.csv.

I wish the best to Jeroen, and hope to see many more such uses in the future.