The new GUI for ggplot2 (using Deducer) – the designer wants your opinion

After discovering that R is expected (this summer) to have a GUI for ggplot2 (through deducer), I later found Ian’s gsoc proposal for this GUI.  Since the system is in it’s early stages of development, Ian has invited people to give comments, input and critique on his plans for the project.

For your convenience (and with Ian’s permission), I am reposting his proposal here. You are welcome to send him feedback by e-mailing him (at: [email protected]), or by leaving a comment here (and I will direct him to your comment).

Continue reading The new GUI for ggplot2 (using Deducer) – the designer wants your opinion

R is going to have a GUI to ggplot2! (by the end of this years google-summer-of-code)

I was delighted to see the following e-mail post from Dirk Eddelbuettel regarding the google-summer-of-code R google group:
* * *

Earlier today Google finalised student / mentor pairings and allocations for
the Google Summer of Code 2010 (GSoC 2010). The R Project is happy to
announce that the following students have been accepted:

Colin Rundel, “rgeos – an R wrapper for GEOS”, mentored by Roger Bivand of
the Norges Handelshoyskole, Norway

Ian Fellows, “A GUI for Graphics using ggplot2 and Deducer”, mentored by
Hadley Wickham of Rice University, USA

Chidambaram Annamalai, “rdx – Automatic Differentiation in R”, mentored by
John Nash of University of Ottawa, Canada

Yasuhisa Yoshida, “NoSQL interface for R”, mentored by Dirk Eddelbuettel,
Chicago, USA

Felix Schoenbrodt, “Social Relations Analyses in R”, mentored by Stefan
Schmukle, Universitaet Muenster, Germany

Details about all proposals are on the R Wiki page for the GSoC 2010 at

The R Project is honoured to have received its highest number of student
allocations yet, and looks forward to an exciting Summer of Code. Please
join me in welcoming our new students.

At this time, I would also like to thank all the other students who have
applied for working with R in this Summer of Code. With a limited number of
available slots, not all proposals can be accepted — but I hope that those
not lucky enough to have been granted a slot will continue to work with R and
towards making contributions within the R world.

I would also like to express my thanks to all other mentors who provided for
a record number of proposals. Without mentors and their project ideas we
would not have a Summer of Code — so hopefully we will see you again next


Dirk (acting as R/GSoC 2010 admin)

* * *

From all the projects, the one I am most excited about is:
Ian Fellows, “A GUI for Graphics using ggplot2 and Deducer”, mentored by Hadley Wickham of Rice University, USA

Deducer (text from the website) attempts to be a free easy to use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an excel-like spreadsheet in which to view and edit data frames. The goal of the project is to two-fold.

  • Provide an intuitive interface so that non-technical users can learn and perform analyses without programming getting in their way.
  • Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming.

Deducer is designed to be used with the Java based R console JGR, though it supports a number of other R environments (e.g. Windows RGUI and RTerm).

This combination (of Deducer and ggplot2) might finally provide the bridge to the layman-statistician that some people recently wrote to be one of R’s weak spots (while other bloogers wrote back that this is o.k., still no one refuted that R doesn’t compete with the point-and-click of softwares like SPSS or JMP.)
I came across Ian in the discussion forums, where he provided very kind help to his package “deducer”. Coupled with having Hadley as his mentor, I am very optimistic about the prospects of seeing this project reaching very high standards.
Very exciting development indeed!

Update: Ian’s proposal is available to view here.

p.s: for some intuition about how a GUI for ggplot2 can look like, have a look at this video of Jeroen Ooms’s ggplot2 web interface

How to upgrade R on windows XP – another strategy (and the R code to do it)

Update: This post has a follow-up for how to upgrade R on windows 7 explaining how to deal with permission issues.

Background – how I heard that there is more then one way to upgrade R

If you didn’t hear it by now – R 2.11.0 is out with a bunch of new features.

After Andrew Gelman recently lamented the lack of an easy upgrade process for R, a Stackoverflow thread (by JD Long) invited R users to share their strategies for easily upgrading R.

Upgrading strategy – moving to a global R library

In that thread, Dirk Eddelbuettel suggested another idea for upgrading R. His idea is of using a folder for R’s packages which is outside the standard directory tree of the installation (a different strategy then the one offered on the R FAQ).

The idea of this upgrading strategy is to save us steps in upgrading. So when you wish to upgrade R, instead of doing the following three steps:

  • download new R and install
  • copy the “library” content from the old R to the new R
  • upgrade all of the packages (in the library folder) to the new version of R.

You could instead just have steps 1 and 3, and skip step 2 (thus, saving us time…).

For example, under windows XP, you might have R installed on:
C:Program FilesRR-2.11.0
But (in this alternative model for upgrading) you will have your packages library on a “global library folder” (global in the sense of independent of a specific R version):
C:Program FilesRlibrary

So in order to use this strategy, you will need to do the following steps (all of them are performed in an R code provided later in the post)-

  1. In the OLD R installation (in the first time you move to the new system of managing the upgrade):
    1. Create a new global library folder (if it doesn’t exist)
    2. Copy to the new “global library folder” all of your packages from the old R installation
    3. After you move to this system – the steps 1 and 2 would not need to be repeated. (hence the advantage)
  2. In the NEW R installation:
    1. Create a new global library folder (if it doesn’t exist – in case this is your first R installation)
    2. Premenantly point to the Global library folder whenever R starts
    3. (Optional) Delete from the “Global library folder” all the packages that already exist in the local library folder of the new R install (no need to have doubles)
    4. Update all packages. (notice that you picked a mirror where the packages are up-to-date, you sometimes need to choose another mirror)

Thanks to help from Dirk, David Winsemius and Uwe Ligges, I was able to write the following R code to perform all the tasks I described :-)

So first you will need to run the following code:
Continue reading How to upgrade R on windows XP – another strategy (and the R code to do it)

The difference between "letters[c(1,NA)]" and "letters[c(NA,NA)]"

In David Smith’s latest blog post (which, in a sense, is a continued response to the latest public attack on R), there was a comment by Barry that caught my eye. Barry wrote:

Even I get caught out on R quirks after 20 years of using it. Compare letters[c(12,NA)] and letters[c(NA,NA)] for the most recent thing that made me bang my head against the wall.

So I did, and here’s the output:

> letters[c(12,NA)]
[1] "l" NA
>  letters[c(NA,NA)]

Interesting isn’t it?
I had no clue why this had happened but luckily for us, Barry gave a follow-up reply with an explanation. And here is what he wrote:
Continue reading The difference between "letters[c(1,NA)]" and "letters[c(NA,NA)]"

Parallel Multicore Processing with R (on Windows)

This post offers simple example and installation tips for “doSMP” the new Parallel Processing backend package for R under windows.
* * *

The required packages are not yet now available on CRAN, but until they will get online, you can download them from here:
REvolution foreach windows bundle
(Simply unzip the folders inside your R library folder)

* * *

Recently, REvolution blog announced the release of “doSMP”, an R package which offers support for symmetric multicore processing (SMP) on Windows.
This means you can now speed up loops in R code running iterations in parallel on a multi-core or multi-processor machine, thus offering windows users what was until recently available for only Linux/Mac users through the doMC package.


For now, doSMP is not available on CRAN, so in order to get it you will need to download the REvolution R distribution “R Community 3.2” (they will ask you to supply your e-mail, but I trust REvolution won’t do anything too bad with it…)
If you already have R installed, and want to keep using it (and not the REvolution distribution, as was the case with me), you can navigate to the library folder inside the REvolution distribution it, and copy all the folders (package folders) from there to the library folder in your own R installation.

If you are using R 2.11.0, you will also need to download (and install) the revoIPC package from here:
revoIPC package – download link (required for running doSMP on windows)
(Thanks to Tao Shi for making this available!)


Once you got the folders in place, you can then load the packages and do something like this:

workers <- startWorkers(2) # My computer has 2 cores
# create a function to run in each itteration of the loop
check <-function(n) {
	for(i in 1:1000)
		sme <- matrix(rnorm(100), 10,10)
times <- 10	# times to run the loop
# comparing the running time for each loop
system.time(x <- foreach(j=1:times ) %dopar% check(j))  #  2.56 seconds  (notice that the first run would be slower, because of R's lazy loading)
system.time(for(j in 1:times ) x <- check(j))  #  4.82 seconds
# stop workers

Points to notice:

  • You will only benefit from the parallelism if the body of the loop is performing time-consuming operations. Otherwise, R serial loops will be faster
  • Notice that on the first run, the foreach loop could be slow because of R’s lazy loading of functions.
  • I am using startWorkers(2) because my computer has two cores, if your computer has more (for example 4) use more.
  • Lastly – if you want more examples on usage, look at the “ParallelR Lite User’s Guide”, included with REvolution R Community 3.2 installation in the “doc” folder


(15.5.10) :
The new R version (2.11.0) doesn’t work with doSMP, and will return you with the following error:

Loading required package: revoIPC
Error: package ‘revoIPC’ was built for i386-pc-intel32

So far, a solution is not found, except using REvolution R distribution, or using R 2.10
A thread on the subject was started recently to report the problem. Updates will be given in case someone would come up with better solutions.

Thanks to Tao Shi, there is now a solution to the problem. You’ll need to download the revoIPC package from here:
revoIPC package – download link (required for running doSMP on windows)
Install the package on your R distribution, and follow all of the other steps detailed earlier in this post. It will now work fine on R 2.11.0

Update 2: Notice that I added, in the beginning of the post, a download link to all the packages required for running parallel foreach with R 2.11.0 on windows. (That is until they will be uploaded to CRAN)

Update 3 (04.03.2011): doSMP is now officially on CRAN!

An article attacking R gets responses from the R blogosphere – some reflections

In this post I reflect on the current state of the R blogosphere, and share my hopes for it’s future.

* * *


I am very grateful to Dr. AnnMaria De Mars for writing her post “The Next Big Thing”.
In her post, Dr. De Mars attacked R by accusing it of being “an epic fail” (in being user-friendly) and “NOT the next big thing”. Of course one should look at Dr. De Mars claims in their context. She is talking about particular aspects in which R fails (the lacking of a mature GUI for non-statisticians), and had her own (very legitimate) take on where to look for “the next big thing”. All in all, her post was decent, and worth contemplating upon respectfully (even if one, me for example, doesn’t agree with all of Dr. De Mars claims.)

R bloggers are becoming a community

But Dr. De Mars post is (very) important for a different reason. Not because her claims are true or false, but because her writing angered people who love and care for R (whether legitimately or not, it doesn’t matter). Anger, being a very powerful emotion, can reveal interesting things. In our case, it just showed that R bloggers are connected to each other.

So far there are 69 R bloggers who wrote in reply to Dr. De Mars post (some more kind then others), they are:

  • R and the Next Big Thing by David Smith
  • This is good news, since it shows that R has a community of people (not “just people”) who write about it.
    In one of the posts, someone commented about how R current stage reminds him of how linux was in 1998, and how he believes R will grow to be amazingly dominant in the next 10 years.
    In the same way, I feel the R blogosphere is just now starting to “wake up” and become aware that it exists. Already 6 bloggers found they can write not just about R code, but also reply to does who “attack” R (in their view). Imagine how the R blogosphere might look in a few years from now…

    I would like to end with a more general note about the importance of R bloggers collaboration to the R ecosystem.

    Continue reading An article attacking R gets responses from the R blogosphere – some reflections

    "The next big thing", R, and Statistics in the cloud

    A friend just e-mailed me about a blog post by Dr. AnnMaria De Mars titled “The Next Big Thing”.

    In it Dr. De Mars wrote (I allowed myself to emphasize some parts of the text):

    Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. […]
    for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

    Here are my two cents on the subject:
    Continue reading "The next big thing", R, and Statistics in the cloud