A web application for R's ggplot2

One of the exciting new frontiers for R programming is of creating website interfaces to R code. At the forefront of this domain is a young and (very) bright man called Jeroen Ooms, whom I had the pleasure of meeting at useR 2009 (press the link to see his presentation).

Today Jeroen announced a new version (0.11) of his web interface to ggplot2. See it here:

As Jeroen wrote:

New features include 1D geom’s (histogram, density, freqpoly), syntax mode (by clicking the tiny arrow at the bottom), and some additional facet options. And some minor improvements and fixes, most notably for Internet Explorer.
The data upload has not been improved yet, I am working on that. For now, it supports .csv, .sav (spss), and tab delimited data. Please make sure your filename has the appropriate extension and every column has a header in your data. If you export a dataframe from R, use:
write.csv(mydf, ”mydf.csv” , row.names=F). If you upload an spss
datafile, none of this should be a concern.
Supported browsers are IE6-8, FF, Safari, and Chrome, but a recent browser is highly recommended. As always, feedback is more than welcome.

Here is a little demo video that shows how to use the new features:

The datafile from the demo is available at http://www.yeroon.net/ggplot2/myMovies.csv.

I wish the best to Jeroen, and hope to see many more such uses in the future.

Free statistics e-books for download

This post will eventually grow to hold a wide list of books on statistics (e-books, pdf books and so on) that are available for free download.  But for now we’ll start off with just one several books:


Several of these books were discovered through a CrossValidated discussion:

  1. http://stats.stackexchange.com/questions/614/open-source-statistical-textbooks/
  2. https://stats.stackexchange.com/questions/170/free-statistical-textbooks

* * *

Know of any more e-books freely available for download? Please write to us about them in the comments.

R Flashmob

Today I noticed a call for R users to gather around a single campfire for one hour and share their questions and answers.

The campfire name is stackoverflow.com, a site dedicated for handling programming questions. The event details are bellow:

From: The R Flashmob Project
Subject: R Flashmob #2

You are invited to take part in R Flashmob, the project that makes the
world a better place by posting helpful questions and answers about the
R statistical language to the programmer’s Q & A site stackoverflow.com

Please forward this to other people you know who might like to join.


Q. Why would I want to join an inexplicable R mob?

A. Tons of other people are doing it.

Q. Why else?

A. Stackoverflow was built specifically for handling programming questions.
It’s a better mousetrap. It offers search (and is well indexed by search engines),
tagging, voting, the ability to choose the “best” answer to a question, and the ability to
edit questions and answers as technology progresses. It has a karma system to
reward people who are happy to help and discourage MLJs (mailing list jerks).

Q. Do the organizers of this MOB have any commercial interest in stackoverflow?

A. None at all. We’re just convinced it is the best way to help and promote R. All
the content submitted to stackoverflow is protected by a Creative Commons
CC-Wiki License, meaning anyone is free to copy, distribute, transmit, and
remix the information on stackoverflow. All the content on stackoverflow is
regularly made available for download by the public.

Location: stackoverflow.com
Start Date: Tuesday, September 8th, 2009
Start Time:
10:04 AM – US Pacific
11:04 AM – US Mountain
12:04 PM – US Central
1:04 PM – US Eastern
6:04 PM – UK
7:04 PM – Continental W. Europe
5:04 AM (Weds) – New Zealand (birthplace of R)
Duration: 50 minutes

(1) At some point during the day on September 8th, synchronize your watch to

(2) The mob should form at precisely 4 minutes past the hour and not beforehand.

(3) At 4 minutes past the hour, you should arrive at stackoverflow.com, log in,
and post 3 R questions. Be sure to tag the questions “R”. See the posting
guidelines at http://stackoverflow.com/faq to understand what makes a good

(4) Follow R Flashmob updates at http://twitter.com/rstatsmob

(5) Post twitter messages tagged #rstats and #rstatsmob during the mob,
providing links to your questions.

(6) During the R MOB, you can chat with other participants on the #R channel
on IRC (freenode). To do this, install the Chatzilla extension on Firefox.
Click “freenode” on the main screen. Then type /join #R in the field at the
bottom of the screen. Then chat.

(7) If you finish posting your three questions within the 50 minutes, stick
around to answer questions and give “up votes” to good questions and answers.

(8) IMPORTANT: After posting, sign the R Flashmob guestbook at

(9) Return to what you would otherwise have been doing. Await
instructions for R MOB #3.

This invitation already gained exposure from 3 blogs:

I am waiting to see who else will join the fun.

Simple visualization of a 11X5 table (for WordPress 2.9 Features Vote Results)

Simply Something Sophisicated - a WordPress poster

I guess this is not the number one post I would like to start with on this blog, but I feel the time is right for it (community-wise).

I’ll move on to the subject matter in a moment, but first a short intro: This blog is written by Tal Galili. I am an aspiring statistician who also loves to use R for his work. At the same time I am also a WordPress blogger, writing mainly at www.TalGalili.com where I can use my native language (Hebrew) for self expression.

This combination of statistics and blogging will lead me to sometimes much less statistical, but more Web/Open-Source oriented posts like this one. So for the statisticians in the audience I extend my apologies and invite you to wait for future posts which will be more fully focused on Statistics and R.

And now for the topic at hand. . .

*         *         *         *         *
Continue reading “Simple visualization of a 11X5 table (for WordPress 2.9 Features Vote Results)”

What is R?


  • R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.   If you wish to download R, please choose your preferred CRAN mirror.
  • The R language has become a de facto standard among statisticians for the development of statistical software,and is widely used for statistical software development and data analysis.
  • Basic questions about R like how to download and install the software, or what the license terms are, are answered in the answers to frequently asked questions section.

Introduction to R

R is a language and environment for statistical computing and graphics.  R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

R is available as Free Software under the terms of the Free Software Foundation‘s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

R and S

R is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues.  R can be considered as a different implementation of S.  There are some important differences, but much code written for S runs unaltered under R. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

Many users think of R as a statistics system. We prefer to think of it of an environment within which statistical techniques are implemented.  R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics.

R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hardcopy.

(credit: the R about page and the Wikipedia article R (programming language))