R 3.4.0 is released – with new speed upgrades and bug-fixes

R 3.4.0 (codename “You Stupid Darkness”) was released 3 days ago. You can get the latest binary version from here (or the .tar.gz source code from here). The full list of bug fixes and new features is provided below.

As mentioned two months ago by David Smith, R 3.4.0 includes several major changes aimed at improving the performance of R in various ways. These include:

  • The JIT (‘Just In Time’) byte-code compiler is now enabled by default at its level 3. This means functions will be compiled on first or second use and top-level loops will be compiled and then run. (Thanks to Tomas Kalibera for extensive work to make this possible.) For now, the compiler will not compile code containing explicit calls to browser(): this is to support single stepping from the browser() call. JIT compilation can be disabled for the rest of the session using compiler::enableJIT(0) or by setting environment variable R_ENABLE_JIT to 0.
  • Matrix products now consistently bypass BLAS when the inputs have NaN/Inf values. Performance of the check of inputs has been improved. Performance when BLAS is used is improved for matrix/vector and vector/matrix multiplication (DGEMV is now used instead of DGEMM). One can now choose from alternative matrix product implementations via options(matprod = ). The “internal” implementation is not optimized for speed but consistent in precision with other summations in R (using long double accumulators where available). “blas” calls BLAS directly for best speed, but usually with undefined behavior for inputs with NaN/Inf.
  • Speedup in simplify2array() and hence sapply() and mapply() (for the case of names and common length > 1), thanks to Suharto Anggono’s PR#17118.
  • Accumulating vectors in a loop is faster: assigning to an element of a vector beyond the current length now over-allocates by a small fraction. The new vector is marked internally as growable, and the true length of the new vector is stored in the truelength field. This makes building up a vector result by assigning to the next element beyond the current length more efficient, though pre-allocating is still preferred. The implementation is subject to change and not intended to be used in packages at this time.
  • C-LEVEL FACILITIES have been extended.
  • Radix sort (which can be more efficient in some cases) is now chosen by method = “auto” for sort.int() for double vectors (and hence used for sort() for unclassed double vectors), excluding ‘long’ vectors. sort.int(method = “radix”) no longer rounds double vectors. The default method until R 3.2.0 was “shell”. A minimal comparison between the two shows that for very short vectors (100 values), “shell” performs better. At 1000 values they are comparable, and for larger vectors “radix” is 2-3 times faster (which is probably the use case we care about more). More about this can be read in ?sort.int.

 

100 values:

> set.seed(2017-04-24)
> x <- …
> microbenchmark(shell = sort.int(x, method = "shell"), radix = sort.int(x, method = "radix"))
Unit: microseconds
  expr    min     lq     mean median     uq    max neval cld
 shell 15.775 16.606 17.80971 17.989 18.543 33.211   100  a 
 radix 32.657 34.595 35.67700 35.148 35.702 88.561   100   b

1000 values:

> set.seed(2017-04-24)
> x <- …
> microbenchmark(shell = sort.int(x, method = "shell"), radix = sort.int(x, method = "radix"))
Unit: microseconds
  expr    min     lq     mean median      uq    max neval cld
 shell 53.414 55.074 56.54395 56.182 57.0120 96.034   100   b
 radix 45.665 46.772 48.04222 47.325 48.1555 78.598   100  a 

Larger vector:

> set.seed(2017-04-24)
> x <- …
> microbenchmark(shell = sort.int(x, method = "shell"), radix = sort.int(x, method = "radix"))
Unit: milliseconds
  expr      min       lq      mean    median        uq      max neval cld
 shell 93.33140 95.94478 107.75347 103.02756 115.33709 221.0800   100   b
 radix 38.18241 39.01516  46.47038  41.45722  47.49596 159.3518   100  a 
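
For readers who want to try a comparison of the two sort methods without installing microbenchmark, here is a minimal sketch that uses only base R. The vector size, the seed and the use of rnorm() are assumptions picked for illustration, not the settings behind the timings shown above, and a single system.time() call is much cruder than microbenchmark().

```r
# Minimal base-R sketch; size, seed and rnorm() are illustrative assumptions.
set.seed(1)
x <- rnorm(1e5)

# Time each method once (coarse; repeat the runs for anything serious).
t_shell <- system.time(s_shell <- sort.int(x, method = "shell"))
t_radix <- system.time(s_radix <- sort.int(x, method = "radix"))

# Both methods should return the same sorted values.
same_result <- isTRUE(all.equal(s_shell, s_radix))

# Elapsed seconds per method; the numbers depend on the machine.
print(c(shell = t_shell[["elapsed"]], radix = t_radix[["elapsed"]]))
print(same_result)
```

Timings from a single run are noisy, so treat the printed numbers as a rough indication only.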

More about the changes in R can be read in the nice post by David Smith, or in the list of changes given below.
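
The JIT setting described in the list above can be toggled from within a session. The sketch below relies only on the compiler package that ships with R; the function sum_to() is a made-up example, not something from the release notes.

```r
library(compiler)

# A made-up function with a loop, the kind of code the JIT compiler targets.
sum_to <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# enableJIT() returns the level that was active before the call,
# so the earlier setting can be restored afterwards.
previous_level <- enableJIT(0)      # 0 disables JIT compilation
result_no_jit <- sum_to(100)

invisible(enableJIT(previous_level)) # restore whatever was active before
result_restored <- sum_to(100)

print(c(no_jit = result_no_jit, restored = result_restored))
```

Saving the returned level and passing it back avoids hard-coding 3 when restoring the setting.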

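As an illustration of the options(matprod = ) switch from the list above, the following sketch selects the “internal” implementation for one product and then restores the previous setting. The matrix and vector are arbitrary values chosen for the example.

```r
# Arbitrary small inputs for illustration.
A <- matrix(c(1, 2, 3, 4), nrow = 2)  # columns are (1, 2) and (3, 4)
v <- c(1, 1)

# options() returns the old values, which makes restoring them easy.
old_opts <- options(matprod = "internal")
product <- A %*% v                    # matrix/vector product
options(old_opts)

print(product)
```

Restoring the old options right away keeps the change local to the one computation.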
 
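The vector-growing pattern mentioned in the list above looks roughly like the first function below; the second shows the pre-allocated form that remains the preferred one. Both function names are made up for this sketch.

```r
# Grow the result one element at a time (the pattern R 3.4.0 makes cheaper).
squares_by_growing <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out[i] <- i^2   # assigns one past the current length
  out
}

# Pre-allocate the full result first (still the preferred approach).
squares_preallocated <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

n <- 1e4
grown <- squares_by_growing(n)
prealloc <- squares_preallocated(n)

# The two approaches differ only in how memory is obtained, not in the result.
print(identical(grown, prealloc))
```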

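For the sapply() speedup listed above, my reading of “names and common length > 1” is a call where the input has names and every result has the same length greater than one, so the results are simplified into a matrix with dimnames. A small sketch under that assumption, with made-up inputs:

```r
# Named input; each call returns a named vector of length 2.
inputs <- c(a = 1, b = 2, c = 3)
ranges <- sapply(inputs, function(v) c(lower = v - 1, upper = v + 1))

# The list of results is simplified into a 2 x 3 matrix:
# rows come from the returned names, columns from the input names.
print(ranges)
print(dim(ranges))
```
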
Upgrading to R 3.4.0 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install the installr package
if (getRversion() < "3.3.0") setInternet2(TRUE) # only needed for R versions older than 3.3.0
installr::updateR() # check for a newer R version and update to it
# If you wish it to go faster, run: installr::updateR(TRUE)

Running “updateR()” will detect whether a new R version is available and, if so, download and install it (etc.). There is also a step by step tutorial (with screenshots) on how to upgrade R on Windows, using the installr package. If you only see the option to upgrade to an older version of R, then change your mirror or try again in a few hours (it usually takes around 24 hours for all CRAN mirrors to get the latest version of R).
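
Once the installer has finished and a new R session has been started, a quick check like the following can confirm which version is running. This is a small sketch using base R only; the variable name is made up.

```r
# Show the full version string of the running R session.
print(R.version.string)

# TRUE if this session is R 3.4.0 or newer.
is_upgraded <- getRversion() >= "3.4.0"
print(is_upgraded)
```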

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package, you are invited to open an issue on the GitHub page.
