R 3.2.3 is released (with improvements for Windows users, and general bug fixes)

R 3.2.3 (codename “Wooden Christmas Tree”) was released several days ago. You can get the latest binaries from here (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below.

Major changes in R 3.2.3

As highlighted by David Smith, this release makes a few small improvements and bug fixes to R, including:

  • Improved support for users of the Windows OS in time zones, OS version identification, FTP connections, and printing (in the GUI).
  • Performance improvements and more support for long vectors in some functions including which.max
  • Improved accuracy for the Chi-Square distribution functions in some extreme cases

Upgrading to R 3.2.3 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

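install.packages("installr") # install the installr package
setInternet2(TRUE) # use the Internet Explorer (wininet) functions for downloads
installr::updateR() # check for a newer R version and, if found, install it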

Running “updateR()” will detect whether a new R version is available and, if so, download and install it (among other steps). There is also a step-by-step tutorial (with screenshots) on how to upgrade R on Windows using the installr package.
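After upgrading, you may want to confirm which version the new session is actually running. The following is a minimal sketch using only base R; the variable names `running` and `target` are illustrative and not part of installr:

```r
# Compare the running R version with the release discussed in this post.
running <- getRversion()               # version of the current R session
target  <- package_version("3.2.3")    # the release announced here

if (running >= target) {
  message("R ", running, " is at least 3.2.3; no upgrade needed for this release.")
} else {
  message("R ", running, " is older than 3.2.3; consider running installr::updateR().")
}
```

This check is platform independent, so it also works on systems where installr is not applicable.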

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package, you are invited to open an issue on the GitHub page.

NEW FEATURES

  • Some recently-added Windows time zone names have been added to the conversion table used to convert these to Olson names. (Including those relating to changes for Russia in Oct 2014, as in PR#16503.)
  • (Windows) Compatibility information has been added to the manifests for ‘Rgui.exe’, ‘Rterm.exe’ and ‘Rscript.exe’. This should allow win.version() and Sys.info() to report the actual Windows version up to Windows 10.
  • Windows "wininet" FTP first tries EPSV / PASV mode rather than only using active mode (reported by Dan Tenenbaum).
  • which.min(x) and which.max(x) may be much faster for logical and integer x and now also work for long vectors.
  • The ‘emulation’ part of tools::texi2dvi() has been somewhat enhanced, including supporting quiet = TRUE. It can be selected by texi2dvi = "emulation". (Windows) MiKTeX removed its texi2dvi.exe command in Sept 2015: tools::texi2dvi() tries texify.exe if it is not found.
  • (Windows only) Shortcuts for printing and saving have been added to menus in Rgui.exe. (Request of PR#16572.)
  • loess(..., iterTrace=TRUE) now provides diagnostics for robustness iterations, and the print() method for summary(<loess>) shows slightly more.
  • The included version of PCRE has been updated to 8.38, a bug-fix release.
  • View() now displays nested data frames in a more friendly way. (Request with patch in PR#15915.)
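The which.min() / which.max() item in the list above is easy to try out. The sketch below uses small, made-up vectors (`flags` and `counts` are illustrative names) to show the logical and integer cases that the release notes say may now be faster:

```r
# Logical input: which.max() gives the index of the first TRUE,
# which.min() the index of the first FALSE (when both values occur).
flags <- c(FALSE, FALSE, TRUE, TRUE)
first_true  <- which.max(flags)   # 3
first_false <- which.min(flags)   # 1

# Integer input: ties are resolved in favour of the first occurrence.
counts <- c(4L, 9L, 2L, 9L)
idx_max <- which.max(counts)      # 2 (first of the two 9s)
idx_min <- which.min(counts)      # 3

print(c(first_true, first_false, idx_max, idx_min))
```

The speed-up itself only becomes noticeable on very large vectors, which this toy example does not attempt to measure.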

BUG FIXES

  • regexpr(pat, x, perl = TRUE) with Python-style named capture did not work correctly when x contained NA strings. (PR#16484)
  • The description of dataset ToothGrowth has been improved/corrected. (PR#15953)
  • model.tables(type = "means") and hence TukeyHSD() now support "aov" fits without an intercept term. (PR#16437)
  • close() now reports the status of a pipe() connection opened with an explicit open argument. (PR#16481)
  • Coercing a list without names to a data frame is faster if the elements are very long. (PR#16467)
  • (Unix-only) Under some rare circumstances piping the output from Rscript or R -f could result in attempting to close the input file twice, possibly crashing the process. (PR#16500)
  • (Windows) Sys.info() was out of step with win.version() and did not report Windows 8.
  • topenv(baseenv()) returns baseenv() again as in R 3.1.0 and earlier. This also fixes compilerJIT(3) when used in ‘.Rprofile’.
  • detach()ing the methods package keeps .isMethodsDispatchOn() true, as long as the methods namespace is not unloaded.
  • Removed some spurious warnings from configure about the preprocessor not finding header files. (PR#15989)
  • rchisq(*, df=0, ncp=0) now returns 0 instead of NaN, and dchisq(*, df=0, ncp=*) also no longer returns NaN in limit cases (where the limit is unique). (PR#16521)
  • pchisq(*, df=0, ncp > 0, log.p=TRUE) no longer underflows (for ncp > ~60).
  • nchar(x, "w") returned -1 for characters it did not know about (e.g. zero-width spaces): it now assumes 1. It now knows about most zero-width characters and a few more double-width characters.
  • Help for which.min() is now more precise about behavior with logical arguments. (PR#16532)
  • The print width of character strings marked as "latin1" or "bytes" was in some cases computed incorrectly.
  • abbreviate() did not give names to the return value if minlength was zero, unlike when it was positive.
  • (Windows only) dir.create() did not always warn when it failed to create a directory. (PR#16537)
  • When operating in a non-UTF-8 multibyte locale (e.g. an East Asian locale on Windows), grep() and related functions did not handle UTF-8 strings properly. (PR#16264)
  • read.dcf() sometimes misread lines longer than 8191 characters. (Reported by Hervé Pagès with a patch.)
  • within(df, ..) no longer drops columns whose names start with a ".".
  • The built-in HTTP server converted the entire Content-Type to lowercase, including parameters, which can cause issues for multi-part form boundaries. (PR#16541)
  • Modifying slots of S4 objects could fail when the methods package was not attached. (PR#16545)
  • splineDesign(*, outer.ok=TRUE) (splines) is better now (PR#16549), and interpSpline() now allows sparse=TRUE for speedup with non-small sizes.
  • If the expression in the traceback was too long, traceback() did not report the source line number. (Patch by Kirill Müller.)
  • The browser did not truncate the display of the function when exiting with options("deparse.max.lines") set. (PR#16581)
  • When bs(*, Boundary.knots=) had boundary knots inside the data range, extrapolation was somewhat off. (Patch by Trevor Hastie.)
  • var() and hence sd() warn about factor arguments which are deprecated now. (PR#16564)
  • loess(*, weights = *) stored wrong weights and hence gave slightly wrong predictions for newdata. (PR#16587)
  • aperm(a, *) now preserves names(dim(a)).
  • poly(x, ..) now works when either raw=TRUE or coef is specified. (PR#16597)
  • data(package=*) is more careful in determining the path.
  • prettyNum(*, decimal.mark, big.mark): fixed bug introduced when fixing PR#16411.
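Two of the fixes listed above can be checked in a few lines. This is a small sketch that assumes it is run on R 3.2.3 or later; the data frame and its column names (`.id`, `x`, `y`) are made up for illustration:

```r
# rchisq() with df = 0 and ncp = 0 now returns 0 rather than NaN.
draws <- rchisq(3, df = 0, ncp = 0)
print(draws)

# within() keeps columns whose names start with a "." (here: ".id").
df  <- data.frame(.id = 1:3, x = c(10, 20, 30))
out <- within(df, y <- x * 2)
print(names(out))
```

On older versions of R the first call would yield NaN values and the second would lose the `.id` column, which is how the two bugs showed up in practice.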

INSTALLATION and INCLUDED SOFTWARE

  • The included configuration code for libintl has been updated to that from gettext version 0.19.5.1 — this should only affect how an external library is detected (and the only known instance is under OpenBSD). (Wish of PR#16464.)
  • configure has a new argument --disable-java to disable the checks for Java.
  • The configure default for MAIN_LDFLAGS has been changed for the FreeBSD, NetBSD and Hurd OSes to one more likely to work with compilers other than gcc (FreeBSD 10 defaults to clang).
  • configure now supports the OpenMP flags -fopenmp=libomp (clang) and -qopenmp (Intel C).
  • Various macros can be set to override the default behaviour of configure when detecting OpenMP: see file ‘config.site’.
  • Source installation on Windows has been modified to allow for MiKTeX installations without texi2dvi.exe. See file ‘MkRules.dist’.

“Why do people contribute to R?” – conclusions from a new PNAS article

tl;dr: People contribute to R for various reasons, which evolve over time. The main reasons appear to be: “fun coding”, personal commitment to the community, and interaction with like-minded and/or important people, leading to higher self-esteem, future job opportunities, a chance to express oneself and enjoyable social inclusion.

From the abstract

One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This amount of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or influence only specific aspects of participation.

Summary of results

R developers, statisticians, and psychologists from Harvard University, University of Vienna, WU Vienna University of Economics, and University of Innsbruck empirically studied psychosocial drivers of participation of R package authors. Through an online survey they collected data from 1,448 package authors. The questionnaire included psychometric scales (types of motivation, psychological values, work design), sociodemographic variables related to the work on R, and three participation measures (number of packages, participation in mailing lists, participation in conferences).

The data were analyzed using item response models and subsequently generalized linear models (logistic regressions, negative-binomial regression) with SIMEX corrected parameters.
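To give a feel for the kind of models mentioned here, the following rough sketch fits a logistic regression and a negative-binomial regression in R. It does not reproduce the paper's analysis: the data are simulated, the variable names (`motivation`, `attends_conf`, `n_packages`) are hypothetical, and both the item response modelling and the SIMEX correction are left out.

```r
library(MASS)  # recommended package shipped with R; provides glm.nb()

set.seed(1)
n <- 200

# Simulated stand-in for a motivation score (NOT the study's data).
motivation <- rnorm(n)

# Hypothetical participation measures generated from that score.
attends_conf <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * motivation))
n_packages   <- rnbinom(n, size = 2, mu = exp(0.3 + 0.4 * motivation))

# Logistic regression for a binary participation measure.
fit_logit <- glm(attends_conf ~ motivation, family = binomial)

# Negative-binomial regression for an overdispersed count measure.
fit_nb <- glm.nb(n_packages ~ motivation)

print(coef(fit_logit))
print(coef(fit_nb))
```

In the study itself the predictors were scores from the psychometric scales, so measurement error in those scores is what the SIMEX correction is meant to address.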

The analysis reveals that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Hybrid motivation acknowledges that motivation is a complex continuum of intrinsic, extrinsic, and internalized extrinsic motives. Motives evolve over time, as task characteristics shift from need-driven problem solving to mundane maintenance tasks within the R community. For instance, motivation can evolve from pure “fun coding” towards a personal commitment with associated higher responsibilities within the community. The community itself provides a social work environment with high degrees of interaction, two facets of which are strong motivators. First, interaction with persons perceived as important increases one’s own reputation (self-esteem, future job opportunities, etc.). Second, interaction with like-minded persons (i.e., those interested in solving statistical problems) creates opportunities to express oneself and enjoy social inclusion.

The findings do not substantiate the commonly held perception that people develop packages out of purely altruistic motives. It is also notable that in most cases package development is undertaken as part of an individual’s research, which is paid for by an (academic) institution, rather than as uncompensated development that cuts into leisure time.

Full paper (behind PNAS’s paywall for now) is available here:

Mair, P., Hofmann, E., Gruber, K., Hatzinger, R., Zeileis, A., and Hornik, K. (2015). Motivation, values, and work design as drivers of participation in the R open source project for statistical computing. Proceedings of the National Academy of Sciences of the United States of America, 112(48), 14788-14792.