The ensurer package (validation inside pipes)

Guest post by Stefan Holst Milton Bache on the ensurer package.

If you use R in a production environment, you have most likely experienced that some circumstances change in ways that will make your R scripts run into trouble. Many things can go wrong; package updates, external data sources, daylight savings time, etc. There is a general increasing focus on this within the R community and words like “reproducibility”, “portability” and “unit testing” are buzzing big time. Many really neat solutions are already helping a lot: RStudio’s Packrat project, Revolution Analytic’s “snapshot” feaure, and Hadley Wickham’s testthat package to name a few. Another interesting package under development is Edwin de Jonge’s “validate” package.

I found myself running into quite a few annoying “runtime” moments, where some typically external factors break R software, and more often than not I spent just too much time tracking down where the bug originated. It made me think about how best to ensure that vulnarable statements behaves as expected and how to know exactly where and when things go wrong. My coding style is heaviliy influenced by the magrittr package’s pipe operator, and I am very happy with the workflow it generates:

data < -
  read_external(...) %>%
  make_transformation(...) %>%
  munge_a_little(...) %>%
  summarize_somehow(...) %>%
  filter_relevant_records(...) %T>%
  maybe_even_store

It’s like a recipe. But the problem is that I found no existing way of tagging potentially vulnarable steps in the above process, leaving the choice of doing nothing, or breaking it up. So I decided to make “ensurer”, so I could do:

data < -
  read_external(...) %>%
  ensure_that(all(is.good(.)) %>%
  make_transformation(...) %>%
  ensure_that(all(is.still.good(.))) %>%
  munge_a_little(...) %>% 
  summarize_somehow(...) %>%
  filter_relevant_records(...) %T>%
  maybe_even_store

Now, I don’t have a blog, but Tal Galili has been so kind to accept the ensurer vignette as a post for r-bloggers.com. I hope that ensurer can help you write better and safer code; I know it has helped me. It has some pretty neat features, so read on and see if you agree!

Continue reading The ensurer package (validation inside pipes)

Analyzing coverage of R unit tests in packages – the {testCoverage} package

(guest post by Andy Nicholls and the team of Mango Business Solutions)

Introduction

Testing is a crucial component in ensuring that the correct analyses are deployed. However it is often considered unglamorous; a poor relation in terms of the time and resources allocated to it in the process of developing a package. But with the increasing popularity and commercial application of R it testing is a subject that is gaining significantly in importance.

At the time of writing there are 5987 packages on CRAN. Due to the nature of CRAN and the motivations of contributors the quality of packages can vary greatly. Some are very popular and well maintained, others are essentially inactive with development having all but ceased. As the number of packages on CRAN continues to grow, determining which packages are fit for purpose in a commercial environment is becomming an increasingly difficult task. There have been numerous articles and blog posts on the subject of CRAN’s growth and the quality of R packages. In particular, Francis Smart’s R-bloggers post entitled Does R have too many packages? highlights five perceived concerns with the growing number of R packages. I would like to expand on one of these themes in particular, namely the “inconsistent quality of individual packages”.

There are many ways in which a package can be assessed for quality. Popularity is clearly one: if lots of people use it then it must be quite good! But popular packages tend to also have authors that actively develop their packages and fix bugs as users identify them. Development activity is therefore another factor; the length of time that a package has existed for; the package dependency tree and the number of reverse ‘Depends’, ‘Imports’ and ‘Suggests'; the number of authors and their reputation; and finally there is testing. Francis briefly mentions testing in his post noting that “testing is still largely left up to the authors and users”. In other words there is no requirement for an author to write tests for their package and often they don’t!

Continue reading Analyzing coverage of R unit tests in packages – the {testCoverage} package

R 3.1.2 release (and upgrading for Windows users)

R 3.1.2 (codename “Pumpkin Helmet“) was released last week. You can get the latest binaries version from here. (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below.

Upgrading to R 3.1.2 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code:

# installing/loading the latest installr package:
install.packages("installr"); library(installr) #load / install+load installr
 
updateR() # updating R.

Running “updateR()” will detect if there is a new R version available, and if so it will download+install it (etc.).

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to leave a comment below.

If you use the global library system (as I do), you can run the following in the new version of R:

source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()

CHANGES IN R 3.1.2:

David smith mentioned in his post some of the main changes, writing:

[…] improvements for the log-Normal distribution function, improved axis controls for histograms, a fix to the nlminb optimizer which was causing rare crashes on Windows (and traced to a bug in the gcc compiler), and some compatibility updates for the Yosemite release of OS X on Macs.

And here is also the full list:

Continue reading R 3.1.2 release (and upgrading for Windows users)

Simpler R coding with pipes > the present and future of the magrittr package

Background

It has only been 7 months and a bit since my initial magrittr commit to GitHub on January 1st. It has had more success than I had anticipated, and it appears that I was not quite alone with a frustration which caused me to start the magrittr project. I am not easily frustrated with R, but after a few weeks working with F# at work, I felt it upon returning to R: I had gotten used to writing code in a different way — all nicely aligned with thought and order of execution. The forward pipe operator |> was so addictive that being unable to do something similar in R was more than mildly irritating. Reversing thought, deciphering nested function calls, and making excessive use of temporary variables almost became deal breakers! Surprisingly, I had never really noticed this before, but once I did my returning to R became a difficult crossing.

An amazing thing about R is that it is a very flexible language and the problem could be solved. The |> operator in F# is indeed very simple: it is defined as let (|>) x f = f x. However, the usefulness of this simplicity relies heavily on a concept that is not available in Rpartial application. Furthermore, functions in F# almost always adhere to certain design principles which make the simple definition sufficient. Suppose that f is a function of two arguments, then in F# you may apply f to only the first argument and obtain a new function as the result — a function of the second argument alone. This is partial application, and works with any number of arguments, but application is always from left to right in the argument list. This is why the most important argument (and the one most likely to be a left-hand side object in the pipeline) is almost always the last argument, which in turn makes the simple definition of |> work. To illustrate, consider the following example:

some_value |> some_function other_value

Here, some_function is partially applied to other_value, creating a new function of a single argument, and by the simple definition of |>, this is applied to some_value.

It was clear to me that because R is lacking native partial application and conventions on argument order, no simple solution would be satisfactory, although definitely possible, see e.g. here or here. I wanted to make something that would feel natural in R, and which would serve the main purpose of improving cognitive performance of those writing the code, and of those reading the code.

It turned out that while I was working on magrittr’s %>% operator, Hadley Wickham and Romain Francois was implementing a similar %.% operator in their dplyr package which they announced on January 17. However, it was not quite as flexible, and we thought that piping functionality was better placed in its own more light-weight package. Hadley joined the magrittr project, and in dplyr 2.0 the %.% operator was deprecated — instead%>% was imported from magrittr.

Continue reading Simpler R coding with pipes > the present and future of the magrittr package

R 3.1.1 is released (and how to quickly update it on Windows OS)

R 3.1.1 (codename “Sock it to Me“) was released today! You can get the latest binaries version from here. (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below.

Upgrading to R 3.1.1 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code:

# installing/loading the latest installr package:
install.packages("installr"); require(installr) #load / install+load installr
 
updateR()

After running “updateR()”, the function will detect that R is available for you, and will download+install it (etc.).

Note that the latest installr version (0.15.3) was released just less than a month ago to CRAN, and it is recommended to upgrade to it, since it has more updated URLs to some software.
I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to leave a comment below.

If you use the global library system (as I do), you can run the following in the new version of R:

source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()

CHANGES IN R 3.1.1:

David smith gave a nice summary of the features here. And here is also the full list:

NEW FEATURES

Continue reading R 3.1.1 is released (and how to quickly update it on Windows OS)

The dendextend package for visualizing and comparing trees of hierarchical clusterings (slides from useR!2014)

This week I presented in the useR!2014 my package dendextend (also on github), for easily manipulating, visualizing, and comparing dendrograms. Put simply, it is a package designed to easily create figures like these:

dendextend_01

Here is my presentation from useR:

Download (PDF, 8.42MB)

You are also invited to give a look to the current version of the package vignettes:

https://github.com/talgalili/dendextend/blob/master/vignettes/dendextend-tutorial.pdf

I highly welcome features suggestions and bug reports (or just “wow, this is awesome”) sent to my e-mail (tal.galili AT gmail.com), you can also leave a comment or use the github issue page.

A sidenote on useR!2014: this year’s useR conference was wonderful! I enjoyed the many talks, sessions, posters, and especially the so many wonderful R users I got to meet (and I will not try to list all of you – but you know who you are, and how much I enjoyed seeing you!). As corny as it may sound – we, the people who use R, are truly a community. There is a lot to be said about getting to meet so many people who share my own passion for statistical programming, open source, collaboration, open science, and a better future in general. Gladly, you can get a sense of what happened there by having a look at the twitter hashtag #useR2014. Several great R bloggers already started writing about it, you can see their posts here: 1, 2, 3, 4, 5. And I hope more posts will follow. I hope to see you in next year’s useR!2015!

R 3.1.0 is released!

R 3.1.0 (codename “Spring Dance“) was released today!

hora jump
Photo credit: The Batsheva Dance Company in Ohad Naharin’s Hora. Photo by Gadi Dagon.

You can get the source code from
http://cran.r-project.org/src/base/R-3/R-3.1.0.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course.

The full list of new features and bug fixes is provided below.

Upgrading to R 3.1.0

You can download the latest version from here.

If you are using Windows, it might take another 24 hours until you could update R. For convenience, you can upgrade to the latest version of R using the installr package. Simply run the following code:

# installing/loading the latest installr package:
install.packages("installr"); require(installr) #load / install+load installr
 
updateR()

After running “updateR()”, the function will detect that R is available for you, and will download+install it (etc.).

Note that the latest installr version (0.14.0) was released a week ago to CRAN, and it is recommended to upgrade to it, since it is now more robust for various extreme cases of upgrading R.
I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to leave a comment below.

If you use the global library system (as I do), you can run the following in the new version of R:

source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()

CHANGES IN R 3.1.0:

NEW FEATURES

Continue reading R 3.1.0 is released!

R 3.0.3 is released

R 3.0.3 (codename “Warm Puppy) was released several days ago. The full list of new features and bug fixes is provided below.

Upgrading to R 3.0.3

You can download the latest version from here. Or, if you are using Windows, you can upgrade to the latest version using the installr package. Simply run the following code:

# installing/loading the package:
if(!require(installr)) { 
install.packages("installr"); require(installr)} #load / install+load installr
 
updateR()

I try to keep the installr package updated and useful. If you have any suggestions or remarks on the package, you’re invited to leave a comment below.

If you use the global library system (as I do), you can run the following in the new version of R:

source("http://www.r-statistics.com/wp-content/uploads/2010/04/upgrading-R-on-windows.r.txt")
New.R.RunMe()

CHANGES IN R 3.0.3:

NEW FEATURES

Continue reading R 3.0.3 is released

Statistics with R, and open source stuff (software, data, community)