Seeking a New Maintainer for the Popular R Package installr

TL;DR

I’m seeking someone to take over maintenance of the popular R package installr (github), due to my shift away from Windows OS. The package has been downloaded over 3.3 million times and is currently downloaded around 61k times a month. The ideal candidate should have experience with Windows OS, be an experienced R developer, and be passionate about helping fellow R users.

Interested? Please leave a comment on the GitHub issue here.

 

Continue reading “Seeking a New Maintainer for the Popular R Package installr”

Installing Pandoc from R (on Windows) – using the {installr} package

The R blogger Rolf Fredheim recently wrote a great piece called “Reproducible research with R, Knitr, Pandoc and Word”, in which he advocates for Pandoc as an essential part of a reproducible research workflow in R: it helps turn documents knitted in R into high-quality Word files for exchanging with colleagues. It is a great post, with many useful bits of code, and I wanted to supplement it with one missing function: “install.pandoc”.

Update: the install.pandoc function is now part of the {installr} package.
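A minimal sketch of how this might be used, assuming you are on Windows and that the {installr} package can be installed from CRAN. The `pandoc_available()` helper is a hypothetical name introduced here for illustration; it is not part of {installr}.

```r
# Hypothetical helper (not part of {installr}):
# TRUE when a pandoc executable can be found on the PATH.
pandoc_available <- function() {
  nzchar(Sys.which("pandoc")[[1]])
}

# Only attempt the installation on Windows, in an interactive session,
# and only when pandoc is not already present.
if (!pandoc_available() && .Platform$OS.type == "windows" && interactive()) {
  if (!requireNamespace("installr", quietly = TRUE)) install.packages("installr")
  installr::install.pandoc()
}
```

After the installer finishes, you may need to restart R before `pandoc_available()` returns TRUE, since the PATH is read when the session starts.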

Continue reading “Installing Pandoc from R (on Windows) – using the {installr} package”

How to load the {rJava} package after the error "JAVA_HOME cannot be determined from the Registry"

If you tried loading a package that depends on the {rJava} package (by Simon Urbanek), you might have come across the following error:

Loading required package: rJava
library(rJava)
Error : .onLoad failed in loadNamespace() for ‘rJava’, details:
call: fun(libname, pkgname)
error: JAVA_HOME cannot be determined from the Registry

The error tells us that there is no entry in the Registry that tells R where Java is located. It is most likely that Java was not installed (or that the registry is corrupt).
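To see whether such a Registry entry exists, something like the following sketch could be used. It assumes that Java registers itself under the key "SOFTWARE\\JavaSoft\\Java Runtime Environment" (an assumption; the key can differ between Java distributions), and `java_registry_entry()` is a hypothetical helper name, not an rJava function.

```r
# Hypothetical helper: look up the Java entry in the Windows Registry.
# Returns NULL on non-Windows systems or when the key is missing.
java_registry_entry <- function() {
  if (.Platform$OS.type != "windows") return(NULL)
  key <- "SOFTWARE\\JavaSoft\\Java Runtime Environment"  # assumed key name
  tryCatch(
    utils::readRegistry(key, hive = "HLM", maxdepth = 2),
    error = function(e) NULL  # the key does not exist (or cannot be read)
  )
}

entry <- java_registry_entry()
if (is.null(entry)) {
  cat("No Java entry found in the Registry\n")
} else {
  str(entry)
}
```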

This error is often resolved by installing a Java version (i.e. 64-bit Java or 32-bit Java) that matches the type of R you are using (i.e. 64-bit R or 32-bit R). This problem can easily affect Windows 7 users, since they might have installed a version of Java that differs from the version of R they are using.

Note that you need to manually download and install the 64-bit version of Java. By default, the download page offers a 32-bit version.

You can pick the exact version of Java you wish to install from this link. If you might (for some reason) work with both versions of R, you can install both versions of Java (installing the “Java Runtime Environment” is probably good enough for your needs).
(Source: Uwe Ligges)
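If you are not sure which type of R you are running, a quick check along these lines may help (a small sketch using base R only):

```r
# Pointer size in bytes times 8 gives the bitness of the running R process.
r_bits <- .Machine$sizeof.pointer * 8L
cat("This R session is ", r_bits, "-bit (", R.version$arch, ")\n", sep = "")

# Show where JAVA_HOME currently points, if it is set at all.
java_home <- Sys.getenv("JAVA_HOME", unset = NA)
if (is.na(java_home)) {
  cat("JAVA_HOME is not set\n")
} else {
  cat("JAVA_HOME =", java_home, "\n")
}
```

The Java you install should match the bitness reported in the first line.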

Another possible solution is to try re-installing rJava.

If that doesn’t work, you could also manually set the directory of your Java location by setting it before loading the library:

# Use only ONE of the two Sys.setenv() lines below, matching your R architecture
Sys.setenv(JAVA_HOME='C:\\Program Files\\Java\\jre7') # for 64-bit version
Sys.setenv(JAVA_HOME='C:\\Program Files (x86)\\Java\\jre7') # for 32-bit version
library(rJava)

(Source: “nograpes” on Stack Overflow, who also describes the find.java function inside rJava:::.onLoad)
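Since the exact Java folder differs between machines, a slightly more defensive variant is sketched below. It sets JAVA_HOME only when one of the candidate folders actually exists. `set_java_home()` is a hypothetical helper, and the candidate paths are simply the two example locations from above; adjust them to your own installation.

```r
# Hypothetical helper: point JAVA_HOME at the first candidate folder that exists.
# Returns TRUE (invisibly) if JAVA_HOME was set, FALSE otherwise.
set_java_home <- function(candidates) {
  existing <- candidates[dir.exists(candidates)]
  if (length(existing) == 0L) return(invisible(FALSE))
  Sys.setenv(JAVA_HOME = existing[1])
  invisible(TRUE)
}

# Example locations only; list the folder that matches your R architecture first.
found <- set_java_home(c(
  "C:\\Program Files\\Java\\jre7",        # 64-bit Java
  "C:\\Program Files (x86)\\Java\\jre7"   # 32-bit Java
))
if (!found) cat("None of the candidate Java folders exist on this machine\n")
```

Run this before `library(rJava)`, just like the plain `Sys.setenv()` call.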

Using the {plyr} (1.2) package parallel processing backend with windows

Hadley Wickham has just announced the release of a new R package “reshape2” which is (as Hadley wrote) “a reboot of the reshape package”. Alongside it, Hadley announced the release of plyr 1.2.1 (now faster and with support for parallel computation!).
Both releases are exciting because of the significant speed increase they have gained.

Yet in the case of the new plyr package, an even more interesting addition is the parallel processing backend.

    A reminder of what the `plyr` package is all about

    (as written in Hadley’s announcement)

    plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to:

    • fit the same model to each patient subset of a data frame
    • quickly calculate summary statistics for each group
    • perform group-wise transformations like scaling or standardising

    It’s already possible to do this with base R functions (like split and the apply family of functions), but plyr makes it all a bit easier with:

    • totally consistent names, arguments and outputs
    • convenient parallelisation through the foreach package
    • input from and output to data.frames, matrices and lists
    • progress bars to keep track of long running operations
    • built-in error recovery, and informative error messages
    • labels that are maintained across all transformations

    Considerable effort has been put into making plyr fast and memory efficient, and in many cases plyr is as fast as, or faster than, the built-in functions.

    You can find out more at http://had.co.nz/plyr/, including a 20 page introductory guide, http://had.co.nz/plyr/plyr-intro.pdf.  You can ask questions about plyr (and data-manipulation in general) on the plyr mailing list. Sign up at http://groups.google.com/group/manipulatr
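To make the split-apply-combine idea concrete, here is a small sketch using the built-in `mtcars` data set (my choice of example data, not taken from the announcement). It does the three steps by hand in base R, and then, if plyr happens to be installed, does the same thing in a single `ddply` call.

```r
# Split: one data frame per number of cylinders.
pieces <- split(mtcars, mtcars$cyl)

# Apply: compute the mean mpg for each piece.
means <- lapply(pieces, function(piece) {
  data.frame(cyl = piece$cyl[1], mean_mpg = mean(piece$mpg))
})

# Combine: bind the per-piece results back into one data frame.
by_hand <- do.call(rbind, means)
print(by_hand)

# The same computation with plyr (only run if the package is installed).
if (requireNamespace("plyr", quietly = TRUE)) {
  with_plyr <- plyr::ddply(mtcars, "cyl", plyr::summarise, mean_mpg = mean(mpg))
  print(with_plyr)
}
```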

    What’s new in `plyr` (1.2.1)

    The exciting news about the release of the new plyr version is the added support for parallel processing.

    l*ply, d*ply, a*ply and m*ply all gain a .parallel argument that, when TRUE, applies functions in parallel using a parallel backend registered with the foreach package.

    The new version also has some minor changes and bug fixes, all of which can be read here.

    In his original announcement, Hadley gave an example of using the new parallel backend with the doMC package for unix/linux.  For Windows (the OS I’m using) you should use the doSMP package (as David mentioned in his post earlier today). However, this package is currently only released for “REvolution R” and not yet for R 2.11 (see more about it here).  But thanks to the kind help of Tao Shi, there is a solution for Windows users who want a parallel processing backend for plyr.

    All you need is to install the doSMP package, according to the instructions in the post “Parallel Multicore Processing with R (on Windows)“, and then use it like this:


    require(plyr) # make sure you have 1.2 or later installed
    x <- seq_len(20)
    wait <- function(i) Sys.sleep(0.1)
    system.time(llply(x, wait))
    #  user  system elapsed
    #     0       0       2
    require(doSMP)
    workers <- startWorkers(2) # My computer has 2 cores
    registerDoSMP(workers)
    system.time(llply(x, wait, .parallel = TRUE))
    #  user  system elapsed
    #  0.09    0.00    1.11

    Update (03.09.2012): the above code will no longer work with updated versions of R (R 2.15 etc.)

    Trying to run it will result in the error message:

    Loading required package: doSMP
    Warning message:
    In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called ‘doSMP’
    

    This is because trying to install the package will give the error message:

    > install.packages("doSMP")
    Installing package(s) into ‘D:/R/library’
    (as ‘lib’ is unspecified)
    Warning message:
    package ‘doSMP’ is not available (for R version 2.15.0)
    

    You can fix this by replacing the {doSMP} package with the {doParallel}+{foreach} packages. Here is how:

    if(!require(foreach)) install.packages("foreach")
    if(!require(doParallel)) install.packages("doParallel")
    # require(doSMP) # will no longer work...
    library(foreach)
    library(doParallel)
    workers <- makeCluster(2) # My computer has 2 cores
    registerDoParallel(workers)
    
    x <- seq_len(20)
    wait <- function(i) Sys.sleep(0.3)
    system.time(llply(x, wait)) # 6 sec
    system.time(llply(x, wait, .parallel = TRUE)) # 3.53 sec
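One thing the snippet above leaves open is how many workers to start and how to shut them down afterwards. The following sketch uses only the {parallel} package that ships with R. Capping the cluster at two workers is an arbitrary choice for this example, not a recommendation.

```r
library(parallel)

# Ask R how many cores it can see (may be NA on some systems),
# and use at most two workers for this small example.
n_cores <- detectCores()
n_workers <- if (is.na(n_cores)) 1L else max(1L, min(2L, n_cores))

cl <- makeCluster(n_workers)

# Run a toy task on the workers; stop the cluster even if the task fails.
squares <- tryCatch(
  parLapply(cl, 1:4, function(i) i^2),
  finally = stopCluster(cl)
)
print(unlist(squares))
```

The same `stopCluster()` call applies to a cluster that was registered with `registerDoParallel()`.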
    

    How to upgrade R on windows XP – another strategy (and the R code to do it)

    Update: This post has a follow-up for how to upgrade R on windows 7 explaining how to deal with permission issues.

    Background – how I heard that there is more than one way to upgrade R

    In case you haven’t heard by now – R 2.11.0 is out with a bunch of new features.

    After Andrew Gelman recently lamented the lack of an easy upgrade process for R, a Stackoverflow thread (by JD Long) invited R users to share their strategies for easily upgrading R.

    Upgrading strategy – moving to a global R library

    In that thread, Dirk Eddelbuettel suggested another idea for upgrading R: using a folder for R’s packages that sits outside the standard directory tree of the installation (a different strategy than the one offered in the R FAQ).

    The idea of this upgrading strategy is to save us steps in upgrading. So when you wish to upgrade R, instead of doing the following three steps:

    • download new R and install
    • copy the “library” content from the old R to the new R
    • upgrade all of the packages (in the library folder) to the new version of R.

    You could instead just have steps 1 and 3, and skip step 2 (thus, saving us time…).

    For example, under windows XP, you might have R installed on:
    C:\Program Files\R\R-2.11.0
    But (in this alternative model for upgrading) you will keep your package library in a “global library folder” (global in the sense of being independent of a specific R version):
    C:\Program Files\R\library

    So in order to use this strategy, you will need to do the following steps (all of them are performed by the R code provided later in the post):

    1. In the OLD R installation (in the first time you move to the new system of managing the upgrade):
      1. Create a new global library folder (if it doesn’t exist)
      2. Copy to the new “global library folder” all of your packages from the old R installation
      3. After you move to this system, steps 1 and 2 will not need to be repeated (hence the advantage)
    2. In the NEW R installation:
      1. Create a new global library folder (if it doesn’t exist – in case this is your first R installation)
      2. Permanently point to the global library folder whenever R starts
      3. (Optional) Delete from the “global library folder” all the packages that already exist in the local library folder of the new R install (no need to have duplicates)
      4. Update all packages. (Make sure you pick a mirror where the packages are up-to-date; you sometimes need to choose another mirror.)
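To give a feel for what these steps involve, here is a rough sketch (not the code from the full post). It uses temporary folders so it can be run safely anywhere; `create_global_library()` and `copy_packages()` are hypothetical helper names, and "fakepkg" is a dummy folder standing in for an installed package.

```r
# Hypothetical helper: create the global library folder if it doesn't exist.
create_global_library <- function(global_lib) {
  if (!dir.exists(global_lib)) dir.create(global_lib, recursive = TRUE)
  invisible(global_lib)
}

# Hypothetical helper: copy every package folder from one library to another.
# Returns the names of the packages that were copied.
copy_packages <- function(from_lib, to_lib) {
  pkgs <- list.dirs(from_lib, full.names = TRUE, recursive = FALSE)
  ok <- file.copy(pkgs, to_lib, recursive = TRUE)
  invisible(basename(pkgs)[ok])
}

# Demo in temporary folders (stand-ins for the real R folders).
old_lib    <- file.path(tempdir(), "old-R", "library")
global_lib <- file.path(tempdir(), "R", "library")
dir.create(file.path(old_lib, "fakepkg"), recursive = TRUE, showWarnings = FALSE)

create_global_library(global_lib)
copied <- copy_packages(old_lib, global_lib)
print(copied)

# In a real setup you would then (not run here):
# .libPaths(c(global_lib, .libPaths()))   # e.g. placed in etc/Rprofile.site
# update.packages(lib.loc = global_lib, ask = FALSE, checkBuilt = TRUE)
```

Putting the `.libPaths()` line in `Rprofile.site` is one common way to make the setting apply on every start of R; setting the `R_LIBS` environment variable is another.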

    Thanks to help from Dirk, David Winsemius and Uwe Ligges, I was able to write the following R code to perform all the tasks I described 🙂

    So first you will need to run the following code:
    Continue reading “How to upgrade R on windows XP – another strategy (and the R code to do it)”

    Parallel Multicore Processing with R (on Windows)

    Parallel Processing backend for R under windows – installation tips and some examples.

    This post offers a simple example and installation tips for “doSMP”, the new parallel processing backend package for R under Windows.
    * * *

    Update:
    The required packages are not yet available on CRAN; until they are, you can download them from here:
    REvolution foreach windows bundle
    (Simply unzip the folders inside your R library folder)

    * * *

    Recently, REvolution blog announced the release of “doSMP”, an R package which offers support for symmetric multicore processing (SMP) on Windows.
    This means you can now speed up loops in R code by running iterations in parallel on a multi-core or multi-processor machine, thus offering Windows users what was until recently available only to Linux/Mac users through the doMC package.

    Installation

    For now, doSMP is not available on CRAN, so in order to get it you will need to download the REvolution R distribution “R Community 3.2” (they will ask you to supply your e-mail, but I trust REvolution won’t do anything too bad with it…)
    If you already have R installed and want to keep using it rather than the REvolution distribution (as was the case with me), you can navigate to the library folder inside the REvolution distribution and copy all the folders (package folders) from there to the library folder of your own R installation.

    If you are using R 2.11.0, you will also need to download (and install) the revoIPC package from here:
    revoIPC package – download link (required for running doSMP on windows)
    (Thanks to Tao Shi for making this available!)

    Usage

    Once you have the folders in place, you can then load the packages and do something like this:

    require(doSMP)
    workers <- startWorkers(2) # My computer has 2 cores
    registerDoSMP(workers)
    
    # create a function to run in each iteration of the loop
    check <- function(n) {
    	for(i in 1:1000)
    	{
    		sme <- matrix(rnorm(100), 10,10)
    		solve(sme)
    	}
    }
    
    
    times <- 10	# times to run the loop
    
    # comparing the running time for each loop
    system.time(x <- foreach(j=1:times ) %dopar% check(j))  #  2.56 seconds  (notice that the first run would be slower, because of R's lazy loading)
    system.time(for(j in 1:times ) x <- check(j))  #  4.82 seconds
    
    # stop workers
    stopWorkers(workers)
    

    Points to notice:

    • You will only benefit from the parallelism if the body of the loop is performing time-consuming operations. Otherwise, R serial loops will be faster
    • Notice that on the first run, the foreach loop could be slow because of R's lazy loading of functions.
    • I am using startWorkers(2) because my computer has two cores; if your computer has more (for example, 4), use more.
    • Lastly - if you want more examples on usage, look at the "ParallelR Lite User's Guide", included with REvolution R Community 3.2 installation in the "doc" folder
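If doSMP is not available for your version of R, the same loop can be sketched with the {foreach} and {doParallel} packages instead. This is a port under assumptions rather than the original doSMP code: it assumes both packages can be installed from CRAN, it falls back to a plain serial `lapply` when they are missing, and `check()` is changed to return its argument so the result can be inspected.

```r
# Same workload as above; returns n so that the results can be inspected.
check <- function(n) {
  for (i in 1:1000) {
    sme <- matrix(rnorm(100), 10, 10)
    solve(sme)
  }
  n
}

times <- 10  # times to run the loop

if (requireNamespace("foreach", quietly = TRUE) &&
    requireNamespace("doParallel", quietly = TRUE)) {
  library(foreach)
  workers <- parallel::makeCluster(2)   # two workers, as in the doSMP example
  doParallel::registerDoParallel(workers)
  print(system.time(x <- foreach(j = 1:times) %dopar% check(j)))
  parallel::stopCluster(workers)        # always stop the workers when done
  foreach::registerDoSEQ()              # reset foreach to the serial backend
} else {
  # Serial fallback when the parallel packages are not installed.
  print(system.time(x <- lapply(1:times, check)))
}
```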

    Updates

    (15.5.10) :
    The new R version (2.11.0) doesn't work with doSMP, and returns the following error:

    Loading required package: revoIPC
    Error: package 'revoIPC' was built for i386-pc-intel32


    So far, no solution has been found, other than using the REvolution R distribution or R 2.10.
    A thread on the subject was started recently to report the problem. Updates will be posted if someone comes up with a better solution.

    Thanks to Tao Shi, there is now a solution to the problem. You'll need to download the revoIPC package from here:
    revoIPC package - download link (required for running doSMP on windows)
    Install the package on your R distribution, and follow all of the other steps detailed earlier in this post. It will now work fine on R 2.11.0


    Update 2: Notice that I added, at the beginning of the post, a download link to all the packages required for running parallel foreach with R 2.11.0 on Windows. (That is, until they are uploaded to CRAN.)

    Update 3 (04.03.2011): doSMP is now officially on CRAN!