The “Future of Open Source” Survey – an R user’s thoughts and conclusions

Over a month ago, David Smith published a call for people to participate in the “Future of Open Source” Survey. 550 people (me among them) took the survey, and today I received an e-mail announcing that the 2010 survey results have been analysed and published on the “Future.Of.Open.Source” blog in a 38-slide presentation.

I would like to thank Bryan House and everyone else who took part in creating this survey and in analysing and publishing its results.

The presentation left me with some thoughts and conclusions that I would like to share with you here.

Pre-conclusions 1 – thoughts about the graphical/statistical presentation:
(P.S.: all of this is written in good faith, so please don't take offense at anything I write. And if you have anything to add or correct, please enlighten me in the comments.)

  • (-1) For using pie charts instead of bar plots in most of the slides (for more on that, see Wikipedia on pie charts)
  • (+1) For comparing previous years to current year.
  • (+1) For using different font sizes to emphasize quantity on slide 12 (I found it useful)
  • (-1) Because after this presentation was made, “Tufte killed another kitten” (the hat-tip for pointing me to the image goes to David of the Revolutions blog)
  • (+1) Good use of images!
  • (-2) For presenting only one-dimensional (question-by-question) analyses of the data

Pre-conclusions 2 – a plea for releasing the survey's source data:

My big hope is that the raw data collected in the survey will be published, so that other people (me :-) ) will be able to analyse it. “Setting the data free”, as one can take away from O’Reilly’s keynote at the OSBC conference, is a big virtue. Here’s a link to his talk slides, and to David’s wonderful notes about that talk (a great read).

And now for some (humble) conclusions from the survey.

Conclusion 1 – Let’s invest in making it easier to keep track of the growing number of R extensions

Slide 12 – people believe (now more than in previous years) that one of the attractive features of OSS is its rapid pace of innovation.

That’s good news for R, since R is known for offering more up-to-date statistical tools than any other statistical package in existence. This is due to an amazing community of statisticians and statistical programmers, coupled with a solid structure for creating R extensions (packages).

At the same time, open source innovation comes with several challenges.

One such drawback is described by John Chambers in “Facets of R” (a special invited paper on “The Future of R”; see page 3, section “Modular design and collaborative support”). I quote:

On the downside, a large collaborative enterprise with a general practice of making collective decisions has a natural tendency towards conservatism. Radical changes do threaten to break currently working features. The future benefits they might bring will often not be sufficiently persuasive. The very success of R, combined with its collaborative facet, poses a challenge to cultivate the “next big step” in software for data analysis.

Another good discussion of this is given by John Fox in “Aspects of the Social Organization and Trajectory of the R Project”, The R Journal, 1(2):5–13, December 2009.

Both authors reflect on the fact that CRAN hosts so many packages (extensions to the R core). While this diversity is wonderful, users' ability to cope with it does not scale: from a user’s perspective, it is very hard to find, follow, and manage all the innovative R extensions out there. One hope for improvement on this front is the “Crantastic” project, which I hope will get (much) more attention and development. Some optimistic news regarding its future was published recently by Dirk Eddelbuettel, who shared with all of us the list of open R projects in the 2010 Google Summer of Code. Two of them are important in this respect, Crantastic2 and cran_stats, and I hope both will come through.

Conclusion 2 – If you want R to spread, support open source in general

Slide 13 – people believe there are three main drivers for the adoption of OSS (such as R):

  1. Public sector adoption – R is in the universities. Check.
  2. Private sector adoption – R still has a way to go here, but we are on our way.
  3. Past experience with OSS – my conclusion from this is that if you help promote any open source software, chances are you are also helping to promote R.

Conclusion 3 – Get to know what “the cloud” can do for you!

Slide 24 – This year, 40% of the respondents (twice the share of the previous two years) said that cloud computing is going to have an impact on OSS vendors. If you don’t know what you can do with R in the cloud, it might be time to learn about the subject and see whether you are missing out on something.

Some of the tutorials at this year’s useR! 2010 conference will cover cloud computing and R.

My current (humble) contribution to the subject is a post I recently published on how to use Google Forms with R to easily collect and access data for analysis.

* * *

I welcome any comments (or reply posts) on the subject. Please let me know what you think of the survey results and of the points I brought up.