R 3.0.2 and RStudio 0.98 are released!

R 3.0.2 (codename “Frisbee Sailing”) was released yesterday. The full list of new features and bug fixes is provided below.

Also, RStudio v0.98 (in a “secret” preview) was announced two days ago with MANY new features.

Upgrading to R 3.0.2

You can download the latest version from here. Or, if you are using Windows, you can upgrade to the latest version using the installr package (also available on CRAN and GitHub). Simply run the following code:

# Install installr if it is missing, then load it:
if (!require(installr)) {
  install.packages("installr")
  require(installr)
}

# to_checkMD5sums = FALSE works around a slight bug in the MD5 file for R 3.0.2.
# The issue is already fixed in the GitHub version of installr; the fix should reach CRAN in about a month.
updateR(to_checkMD5sums = FALSE)
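Once the upgrade has finished, one quick way to confirm that a freshly started R session is running the new version is to compare the running version against the target. This is only a small sketch using base R; the variable names `current`, `target`, and `is_new_enough` are illustrative.

```r
# Run this in a new R session, started after the upgrade.
current <- getRversion()               # version of the running R, as a numeric_version
target  <- numeric_version("3.0.2")    # the release discussed in this post

is_new_enough <- current >= target

if (is_new_enough) {
  message("Running R ", as.character(current), " (3.0.2 or newer).")
} else {
  message("Still running R ", as.character(current), "; the upgrade may not have taken effect.")
}
```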

I try to keep the installr package updated and useful. If you have any suggestions or remarks on the package, you’re invited to leave a comment below.

If you use the global library system (as I do), you can run the following in the new version of R:

source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()

P.S.: You can also use the installr package to quickly install the new RStudio:

# Install installr if it is missing, then load it:
if (!require(installr)) {
  install.packages("installr")
  require(installr)
}

install.RStudio()


A speed test comparison of plyr, data.table, and dplyr


Guest post by Jake Russ

For a recent project I needed to make a simple sum calculation on a rather large data frame (0.8 GB, 4+ million rows, and ~80,000 groups). As an avid user of Hadley Wickham’s packages, my first thought was to use plyr. However, the job took plyr roughly 13 hours to complete.

plyr is extremely efficient and user-friendly for most problems, so it was clear to me that I was using it for something it wasn’t meant to do, but I didn’t know of any alternative screwdrivers to use.

I asked for some help on the manipulatr Google group, and their feedback led me to data.table and dplyr, a new and still in-progress package by Hadley.

What follows is a speed comparison of these three packages incorporating all the feedback from the manipulatr folks. They found it informative, so Tal asked me to write it up as a reproducible example.
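A grouped-sum comparison of this kind might be set up along the following lines. This is a minimal sketch, not the benchmark from the full post: the synthetic data frame, its (much smaller) size, and the names `df`, `group`, `value`, `reference`, `timings`, and `results` are all assumptions made for illustration. Each package is timed only if it is installed, and functions are called with their namespace prefix so that plyr and dplyr do not mask each other.

```r
# Synthetic stand-in for the real data (sizes are assumptions, scaled down
# so that the example runs in seconds rather than hours).
set.seed(42)
n_rows   <- 1e5
n_groups <- 1e3
df <- data.frame(
  group = sample(n_groups, n_rows, replace = TRUE),
  value = runif(n_rows)
)

# Base R reference: one sum per group, ordered by group id.
reference <- tapply(df$value, df$group, sum)

timings <- list()
results <- list()

if (requireNamespace("plyr", quietly = TRUE)) {
  timings$plyr <- system.time(
    results$plyr <- plyr::ddply(df, "group", plyr::summarise, total = sum(value))
  )[["elapsed"]]
}

if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  timings$data.table <- system.time(
    # keyby sorts the result by group, matching the other two outputs
    results$data.table <- dt[, list(total = sum(value)), keyby = group]
  )[["elapsed"]]
}

if (requireNamespace("dplyr", quietly = TRUE)) {
  timings$dplyr <- system.time(
    results$dplyr <- dplyr::summarise(dplyr::group_by(df, group), total = sum(value))
  )[["elapsed"]]
}

# Elapsed seconds for each package that was available.
print(unlist(timings))
```

On data this small the differences may be modest; the gap described in the post only shows up at the scale of millions of rows and tens of thousands of groups, so `n_rows` and `n_groups` would need to be raised to see it.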
