How to upgrade R on windows 7

Background – time to upgrade to R 2.13.0

The news of the new release of R 2.13.0 is out, and the R blogosphere is buzzing. Bloggers posting excitedly about the new R compiler package that brings with it the hope to speed up our R code with up to 4 times improvement and even a JIT compiler for R. So it is time to upgrade, and bloggers are here to help. Some wrote how to upgrade R on Linux and mac OSX (based on posts by Paolo). And it is now my turn, with suggestions on how to upgrade R on windows 7.

Upgrading R on windows – the two strategies

The classic description of how to upgrade R can be found in the R project FAQ page (and also the FAQ on how to install R on windows)

There are basically two strategies for R upgrading on windows. The first is to install a new R version and copy paste all the packages to the new R installation folder. The second is to have a global R package folder, each time synced to the most current R installation (thus saving us the time of copying the package library each we upgrade R).

I described the second strategy in detail in a post I wrote a year ago titled: “How to upgrade R on windows XP – another strategy” which explains how to upgrade R using the simple two-liner code:


p.s: If this is the first time you are upgrading R using this method, then first run the following two lines on your old R installation (before running the above code in the new R intallation):


The above code should be enough.  However, there are some common pitfalls you might encounter when upgrading R on windows 7, bellow I outline the ones I know about, and how they can be solved.

Continue reading How to upgrade R on windows 7

Article about plyr published in JSS, and the citation was added to the new plyr (version 1.5)

The plyr package (by Hadley Wickham) is one of the few R packages for which I can claim to have used for all of my statistical projects. So whenever a new version of plyr comes out I tend to be excited about it (as was when version 1.2 came out with support for parallel processing)

So it is no surprise that the new release of plyr 1.5 got me curious. While going through the news file with the new features and bug fixes, I noticed how (quietly) Hadley has also released (6 days ago) another version of plyr prior to 1.5 which was numbered 1.4.1. That version included only one more function, but a very important one – a new citation reference for when using the plyr package. Here is how to use it:

install.packages("plyr") # so to upgrade to the latest release

The output gives both a simple text version as well as a BibTeX entry for LaTeX users. Here it is (notice the download link for yourself to read):

To cite plyr in publications use:
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data
Analysis. Journal of Statistical Software, 40(1), 1-29. URL

I hope to see more R contributers and users will make use of the ?citation() function in the future.

Beeswarm Boxplot (and plotting it with R)

(The image above is called a “Beeswarm Boxplot” , the code for producing this image is provided at the end of this post)

The above plot is implemented under different names in different softwares. This “Scatter Dot Beeswarm Box Violin – plot” (in the lack of an agreed upon term) is a one-dimensional scatter plot which is like “stripchart”, but with closely-packed, non-overlapping points; the positions of the points are corresponding to the frequency in a similar way as the violin-plot. The plot can be superimposed with a boxplot to give a very rich description of the underlaying distribution.

This plot has been implemented in various statistical packages, in this post I will list the few I came by so far. And if you know of an implementation I’ve missed please tell me about it in the comments.

Continue reading Beeswarm Boxplot (and plotting it with R)

Book review: 25 Recipes for Getting Started with R

Recently I was asked by O’Reilly publishing to give a book review for Paul Teetor new introductory book to R.  After giving the book some attention and appreciating it’s delivery of the material, I was happy to write and post this review.  Also, I’m very happy to see how a major publishing house like O’Reilly is producing more and more R books, great news indeed.

And now for the book review:

Executive summary: a book that offers a well designed gentle introduction for people with some background in statistics wishing to learn how to get common (basic) tasks done with R.


By: Paul Teetor
MediaReleased: January 2011
Pages: 58 (est.)


The book “25 Recipes for Getting Started with R” offers an interesting take on how to bring R to the general (statistically oriented) public.

Continue reading Book review: 25 Recipes for Getting Started with R

How to label all the outliers in a boxplot

In this post I present a function that helps to label outlier observations When plotting a boxplot using R.

An outlier is an observation that is numerically distant from the rest of the data. When reviewing a boxplot, an outlier is defined as a data point that is located outside the fences (“whiskers”) of the boxplot (e.g: outside 1.5 times the interquartile range above the upper quartile and bellow the lower quartile).

Identifying these points in R is very simply when dealing with only one boxplot and a few outliers. That can easily be done using the “identify” function in R. For example, running the code bellow will plot a boxplot of a hundred observation sampled from a normal distribution, and will then enable you to pick the outlier point and have it’s label (in this case, that number id) plotted beside the point:

y <- rnorm(100)
identify(rep(1, length(y)), y, labels = seq_along(y))

However, this solution is not scalable when dealing with:

  • Many outliers
  • Overlapping data-points, and
  • Multiple boxplots in the same graphic window

For such cases I recently wrote the function “boxplot.with.outlier.label” (which you can download from here). This function will plot operates in a similar way as “boxplot” (formula) does, with the added option of defining “label_name”. When outliers are presented, the function will then progress to mark all the outliers using the label_name variable. This function can handle interaction terms and will also try to space the labels so that they won’t overlap (my thanks goes to Greg Snow for his function “spread.labs” from the {TeachingDemos} package, and helpful comments in the R-help mailing list).

Here is some example code you can try out for yourself:

source("") # Load the function
# sample some points and labels for us:
y <- rnorm(2000)
x1 <- sample(letters[1:2], 2000,T)
x2 <- sample(letters[1:2], 2000,T)
lab_y <- sample(letters[1:4], 2000,T)
# plot a boxplot with interactions:
boxplot.with.outlier.label(y~x2*x1, lab_y)

Here is the resulting graph:

You can also have a try and run the following code to see how it handles simpler cases:

# plot a boxplot without interactions:
boxplot.with.outlier.label(y~x1, lab_y, ylim = c(-5,5))
# plot a boxplot of y only
boxplot.with.outlier.label(y, lab_y, ylim = c(-5,5))
boxplot.with.outlier.label(y, lab_y, spread_text = F) # here the labels will overlap (because I turned spread_text off)

Here is the output of the last example, showing how the plot looks when we allow for the text to overlap (we would often prefer to NOT allow it).

boxplot - with one group and identifiyed outliers (allowing label overlap)

Regarding package dependencies: notice that this function requires you to first install the packages {TeachingDemos} (by Greg Snow) and {plyr} (by Hadley Wickham)

19.04.2011 – I’ve added support to the boxplot “names” and “at” parameters.

You are very much invited to leave your comments if you find a bug, think of ways to improve the function, or simply enjoyed it and would like to share it with me.

Call for proposals for writing a book about R (via Chapman & Hall/CRC)

Rob Calver wrote an interesting invitation on the R mailing list today, inviting potential authors to submit their vision of the next great book about R. The announcement originated from the Chapman & Hall/CRC publishing houses, backed up by an impressive team of R celebrities, chosen as the editors of this new R books series, including:

Bellow is the complete announcement:
Continue reading Call for proposals for writing a book about R (via Chapman & Hall/CRC)

R-bloggers in 2010: Top 14 R posts, site statistics and invitation for sponsors

A year ago (on December 9th 2009), I wrote about founding, an (unofficial) online R journal written by bloggers who agreed to contribute their R articles to the site.

In this post I wish to celebrate R-bloggers’ first birthday by sharing with you:

  1. Links to the top 14 posts of 2010
  2. Reflections about the origin of R-bloggers
  3. Statistics on “how well” R-bloggers did this year
  4. Links to other related projects
  5. An invitation for sponsors/supporters to help keep the site alive

Continue reading R-bloggers in 2010: Top 14 R posts, site statistics and invitation for sponsors

The R Journal, Vol.2 Issue 2 is out

The second issue of the second volume of The R Journal is now available .

Download complete issue

Refereed articles may be downloaded individually using the links below. [Bibliography of refereed articles]

Table of Contents

Editorial 3

Contributed Research Articles

Solving Differential Equations in R
Karline Soetaert, Thomas Petzoldt and R. Woodrow Setzer
Source References
Duncan Murdoch
hglm: A Package for Fitting Hierarchical Generalized Linear Models
Lars Rönnegård, Xia Shen and Moudud Alam
dclone: Data Cloning in R
Péter Sólymos
stringr: modern, consistent string processing
Hadley Wickham
Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations
David Ardia and Lennart F. Hoogerheide
cudaBayesreg: Bayesian Computation in CUDA
Adelino Ferreira da Silva
binGroup: A Package for Group Testing
Christopher R. Bilder, Boan Zhang, Frank Schaarschmidt and Joshua M. Tebbs
The RecordLinkage Package: Detecting Errors in Data
Murat Sariyar and Andreas Borg
spikeslab: Prediction and Variable Selection Using Spike and Slab Regression
Hemant Ishwaran, Udaya B. Kogalur and J. Sunil Rao

From the Core

What’s New? 74

News and Notes

useR! 2010 77
Forthcoming Events: useR! 2011 79
Changes in R 81
Changes on CRAN 90
News from the Bioconductor Project 101
R Foundation News 102

New edition of "R Companion to Applied Regression" – by John Fox and Sandy Weisberg

Just two hours ago, Professor John Fox has announced on the R-help mailing list of a new (second) edition to his book “An R and S Plus Companion to Applied Regression”, now title . “An R Companion to Applied Regression, Second Edition”.

John Fox is (very) well known in the R community for many contributions to R, including the car package (which any one who is interested in performing SS type II and III repeated measures anova in R, is sure to come by), the Rcmdr pacakge (one of the two major GUI’s for R, the second one is Deducer), sem (for Structural Equation Models) and more.  These might explain why I think having him release a new edition for his book to be big news for the R community of users.

In this new edition, Professor Fox has teamed with Professor Sandy Weisberg, to refresh the original edition so to cover the development gained in the (nearly) 10 years since the first edition was written.

Here is what John Fox had to say:

Dear all,

Sandy Weisberg and I would like to announce the publication of the second
edition of An R Companion to Applied Regression (Sage, 2011).

As is immediately clear, the book now has two authors and S-PLUS is gone
from the title (and the book). The R Companion has also been thoroughly
rewritten, covering developments in the nearly 10 years since the first
edition was written and expanding coverage of topics such as R graphics and
R programming. As before, however, the R Companion provides a general
introduction to R in the context of applied regression analysis, broadly
construed. It is available from the publisher at (US) or (UK), and from Amazon (see here)

The book is augmented by a web site with data sets, appendices on a variety of topics, and more, and it associated with the car package on CRAN, which has recently undergone an overhaul.

John and Sandy

Continue reading New edition of "R Companion to Applied Regression" – by John Fox and Sandy Weisberg

WP-CodeBox: A better R syntax highlighter plugin for WordPress

Today I was informed of (what I believe is) a better the best WordPress plugin for R syntax highlighting called WP-CodeBox.  This plugin doesn’t require any hacks to make it work (as opposed to the WP-Syntax plugin, which I wrote about in the past).  WP-CodeBox can be downloaded and installed on a WordPress by searching for it in the “Add New” section in the plugins menu.

WP-CodeBox provides some nice features (some AJAX based) to the display of the code in the post:

  1. The code box in the post can now be folded (top right of the code box) so the code can be hidden so to not clutter the post (if the code is too long)
  2. The code box is added with another button  (top left of the code box) which allows the reader to see the code in a new window – so to easily enable a copy paste of the code.
  3. The options of the plugin allows automatic row numbering of the code, control over “tab” length and some other features.

p.s: Lastly, my thanks goes to guangchuang yu who’s comment on my original post, and he’s post on wp-codebox and R, has introduced me to this better plugin.

p.p.s: in case you blog on, there is also a solution for R syntax highlighting for bloggers.