Update: the competition was just launched.
* * *
What is the competition about?
Drew Conway and John Myles Whyte have collected data from (52) R users about the packages they have installed. The data is now available on github for download and the contest will be run on the kaggle platform.
For more details, head over to dataists.
And for fun, here is the dependency graph for R packages they have assembled so far:
A graphical visualization of packages’ “suggestion” relationships. Affectionately referred to as the R Flying Spaghetti Monster. More info below.
A tiny bit more on R bloggers virality
Continue reading A competition to recommend "relevant" R packages – and the future of R
A few hours ago, Jens Oehlschlägel has announced on the R-help mailing list of the release of a new version of the ff package.
The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) in main memory – the effective virtual memory consumption per ff object.
Here are the new features of ff, as Jens wrote in his announcement:
Dear R community,
The next release of package ff is available on CRAN. With kind help of Brian Ripley it now supports the Win64 and Sun versions of R. It has three major functional enhancements:
a) new fast in-memory sorting and ordering functions (single-threaded)
b) ff now supports on-disk sorting and ordering of ff vectors and ffdf dataframes
c) ff integer vectors now can be used as subscripts of ff vectors and ffdf dataframes
a) is achieved by careful implementation of NA-handling and exploiting context information
b) although permanently stored, sorting and ordering of ff objects can be faster than the standard routines in R
c) applying an order to ff vectors and ffdf dataframes is substantially slower than in pure R because it involves disk-access AND sorting index positions (to avoid random access).
There is still room for improvement, however, the current status should already be useful. I run some comparisons with SAS (see end of mail):
– both could sort German census size (81e6 rows) on a 3GB notebook
– ff sorts and orders faster on single columns
– sorting big multicolumn-tables is faster in SAS
Continue reading A new version of ff released (version 2.2.0)