Recently I was asked by O’Reilly publishing to review Paul Teetor’s new introductory book on R. After giving the book some attention and appreciating its delivery of the material, I was happy to write and post this review. I’m also very happy to see a major publishing house like O’Reilly producing more and more R books. Great news indeed.
And now for the book review:
Executive summary: a book that offers a well-designed, gentle introduction for people with some background in statistics who wish to learn how to get common (basic) tasks done with R.
By: Paul Teetor
Publisher: O’Reilly Media
Released: January 2011
Pages: 58 (est.)
The book “25 Recipes for Getting Started with R” offers an interesting take on how to bring R to the general (statistically oriented) public.
Instead of teaching R (or topics in statistics) in a systematic way, the author chose to assemble a set of cheat-sheet-like how-to tasks (“R recipes”) that a new user of R is likely to encounter in their first steps with R. These include installing R, finding help, reading data, selecting data, computing basic summary statistics, plotting some graphs, loading packages, and performing/diagnosing OLS regression.
These recipes were taken from the “R Cookbook” (O’Reilly) which contains over 200 such recipes.
Each of the 25 “R recipes” comprises four sections:
- Problem - a one-sentence statement of the task we wish to accomplish.
- Solution - a direct solution to the problem, presented in very few paragraphs (ranging from one paragraph up to a page).
- Discussion - an extension of the solution, offering several pages of variations and common pitfalls.
- See also - references to further information (not always present).
The book is modest in its scope (which I appreciate) and tries only to offer a bird’s-eye view for statistically oriented, first-time (and short on time) users who want to feel they can get “something” done using R.
I can imagine a first-year student (or an IT professional with some stats background) benefiting from such a book if they have learned their stats with another package (like Stata, SAS, SPSS and so on).
The book’s scope is both an advantage and a disadvantage, depending on the target audience. I would be surprised if experienced R users had much (or anything) to gain from it, and it cannot serve as a reference. This might be different with the extended “R Cookbook” (which I hope to get my hands on at some point, since I enjoyed the author’s writing).
Lastly, I should mention that someone who is already well versed in SAS or SPSS would probably prefer Robert Muenchen’s superb book “R for SAS and SPSS Users” in order to make the transition to R smoother.
Content outline (with some notes)
I added some notes to the chapter names. I’d like to state again that my general impression of the book is good. The points I make are mostly subtle, and are there only to guide you if you give the book as a gift to a friend and wish to emphasize some things that the book does not mention.
The book’s content includes:
- Downloading and Installing R
- Getting Help on a Function
- Viewing the Supplied Documentation
- Searching the Web for Help – credit goes to the author for mentioning stats.stackexchange.com and stackoverflow.com, while highlighting the use of the R tag on stackoverflow. I do wish he had mentioned R-bloggers (edit: after corresponding with the author, he wrote to me that: F.Y.I., I do mention R-bloggers in the full R Cookbook. The 25 Recipes book cannot contain as much useful information. In the Cookbook, I recommend that readers follow R-bloggers as a way to keep up with developments in the R community.).
- Reading Tabular Datafiles – the author makes the proper distinctions about how to manage factors vs characters.
- Reading from CSV Files
- Creating a Vector
- Computing Basic Statistics – the author gives proper room for handling missing values.
- Initializing a Data Frame from Column Data
- Selecting Data Frame Columns by Position
- Selecting Data Frame Columns by Name
- Forming a Confidence Interval for a Mean
- Forming a Confidence Interval for a Proportion
- Comparing the Means of Two Samples
- Testing a Correlation for Significance
- Creating a Scatter Plot – I wish more attention had been given to lattice (which was mentioned, twice, in the book) and ggplot2 (in the see also, the discussion or the preface). The same could be said about many other procedures, but I think graphics in R is a special case: it should be clear to the reader how R packages can extend its statistical procedures, but the reader may not notice that there are R packages that extend its graphical capabilities as well.
- Creating a Bar Chart
- Creating a Box Plot
- Creating a Histogram
- Performing Simple Linear Regression
- Performing Multiple Linear Regression – there might have been room to mention the existence of “I()” (for example: y ~ x + I(x^2)) and of interaction terms (“*”).
- Getting Regression Statistics
- Diagnosing a Linear Regression – this section includes the command outlier.test, which comes from the car package (and is not in base R). It would probably have been clearer if the author had directed the reader to the section on using packages in the “see also” instead of only talking about install.packages (which wasn’t the place for it, IMHO).
- Predicting New Values – I would have recommended highlighting the importance of retaining the same column names in the new data.frame, since failing to do so results in a (quite common) failure of the function.
- Accessing the Functions in a Package – I think this section should have been referenced more, and that the installation of new packages could have been covered here.
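To make two of my notes above concrete (the formula syntax under Multiple Linear Regression and the column names under Predicting New Values), here is a minimal sketch of my own. It is not taken from the book; the data are simulated, and the names d, x, z, y, fit_quad and fit_int are made up for illustration.

```r
# Simulated data; all variable names here are illustrative assumptions.
set.seed(1)
d <- data.frame(x = runif(50, 0, 10), z = rnorm(50))
d$y <- 1 + 2 * d$x - 0.3 * d$x^2 + 0.5 * d$z + rnorm(50, sd = 0.5)

# I() protects arithmetic inside a formula, so I(x^2) is a genuine squared term.
fit_quad <- lm(y ~ x + I(x^2), data = d)
print(names(coef(fit_quad)))

# x * z is shorthand for x + z + x:z (main effects plus their interaction).
fit_int <- lm(y ~ x * z, data = d)
print(names(coef(fit_int)))

# predict() looks up predictors by name, so newdata must reuse the
# column names that appeared in the model formula.
new_ok <- data.frame(x = c(2, 5), z = c(0, 1))
pred_ok <- predict(fit_int, newdata = new_ok)

# With different column names (X, Z) the predictors cannot be found.
# try() captures the resulting error so that the script keeps running.
new_bad <- data.frame(X = c(2, 5), Z = c(0, 1))
pred_bad <- try(predict(fit_int, newdata = new_bad), silent = TRUE)
print(inherits(pred_bad, "try-error"))
```

One caveat on the last part: if objects named x or z happen to exist in your workspace, predict() may pick those up instead of stopping with an error, which is arguably worse than a clean failure.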
* * *
If you have had a look at the book, I’d be very curious to read your thoughts about it in the comments.