In it Dr. De Mars wrote (I allowed myself to emphasize some parts of the text):

Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. […]
for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

Repeated measures ANOVA is a common task for the data analyst.

There are (at least) two ways of performing “repeated measures ANOVA” using R but none is really trivial, and each way has it’s own complication/pitfalls (explanation/solution to which I was usually able to find through searching in the R-help mailing list).

So for future reference, I am starting this page to document links I find to tutorials, explanations (and troubleshooting) of “repeated measure ANOVA” done with R

Functions and packages

(I suggest using the tutorials supplied bellow for how to use these functions)

aov {stats} – offers SS type I repeated measures anova, by a call to lm for each stratum. A short example is given in the ?aov help file

Anova {car} – Calculates type-II or type-III analysis-of-variance tables for model objects produced by lm, and for various other object. The ?Anova help file offers an example for how to use this for repeated measures

ezANOVA {ez} – This function provides easy analysis of data from factorial experiments, including purely within-Ss designs (a.k.a. “repeated measures”), purely between-Ss designs, and mixed within-and-between-Ss designs, yielding ANOVA results and assumption checks. It is a wrapper of the Anova {car} function, and is easier to use. The ez package also offers the functions ezPlot and ezStats to give plot and statistics of the ANOVA analysis. The ?ezANOVA help file gives a good demonstration for the functions use (My thanks goes to Matthew Finkbe for letting me know about this cool package)

friedman.test {stats} – Performs a Friedman rank sum test with unreplicated blocked data. That is, a non-parametric one-way repeated measures anova. I also wrote a wrapper function to perform and plot a post-hoc analysis on the friedman test results

Non parametric multi way repeated measures anova – I believe such a function could be developed based on the Proportional Odds Model, maybe using the {repolr} or the {ordinal} packages. But I still didn’t come across any function that implements these models (if you do – please let me know in the comments).

Repeated measures, non-parametric, multivariate analysis of variance – as far as I know, such a method is not currently available in R. There is, however, the Analysis of similarities (ANOSIM) analysis which provides a way to test statistically whether there is a signiﬁcantdifference between two or more groups of sampling units. Is is available in the {vegan} package through the “anosim” function. There is also a tutorial and a relevant published paper.

Good Tutorials

A basic tutorial about ANOVA with R (only the last bit holds some example of repeated measures) on personality-project

Unbalanced design
Unbalanced design doesn’t work when doing repeated measures ANOVA with aov, it just doesn’t. This situation occurs if there are missing values in the data or that the data is not from a fully balanced design. The way this will show up in your output is that you will see the between subject section showing withing subject variables.

A solution for this might be to use the Anova function from library car with parameter type=”III”. But before doing that, first make sure you understand the difference between SS type I, II and III. Here is a good tutorial for helping you out with that.
By the way, these links are also useful in case you want to do a simple two way ANOVA for unbalanced design

I will “later” add R-help mailing list discussions that I found helpful on the subject.

If you come across good resources, please let me know about them in the comments.

yeroon.net/ggplot2 is a web interface for Hadley Wickham’s R package ggplot2. It is used as a tool for rapid prototyping, exploratory graphical analysis and education of statistics and R. The interface is written completely in javascript, therefore there is no need to install anything on the client side: a standard browser will do.

The new version has a lot of cool new features, like advanced data import, integration with Google docs, converting variables from numeric to factor to dates and vice versa, and a lot of new geom’s. Some of which you can watch in his new video demo of the application:

When analyzing a questionnaire, one often wants to view the correlation between two or more Likert questionnaire item’s (for example: two ordered categorical vectors ranging from 1 to 5).

When dealing with several such Likert variable’s, a clear presentation of all the pairwise relation’s between our variable can be achieved by inspecting the (Spearman) correlation matrix (easily achieved in R by using the “cor.test” command on a matrix of variables).
Yet, a challenge appears once we wish to plot this correlation matrix. The challenge stems from the fact that the classic presentation for a correlation matrix is a scatter plot matrix – but scatter plots don’t (usually) work well for ordered categorical vectors since the dots on the scatter plot often overlap each other.

There are four solution for the point-overlap problem that I know of:

Jitter the data a bit to give a sense of the “density” of the points

Use a color spectrum to represent when a point actually represent “many points”

Use different points sizes to represent when there are “many points” in the location of that point

Add a LOWESS (or LOESS) line to the scatter plot – to show the trend of the data

In this post I will offer the code for the a solution that uses solution 3-4 (and possibly 2, please read this post comments). Here is the output (click to see a larger image):

The integration of R into online web services is (for me) one of the more exciting prospects in R’s future. That is way I was very excited coming across Jamie Love’s recent creation: R-Node.

What is R-Node

R-Node is a (open source) web front-end to R (the statistical analysis package).

Using this front-end, you can from any web browser connect to an R instance running on a remote (or local) server, and interact with it, sending commands and receiving the responses. In particular, graphing commands such as plot() and hist() will execute in the browser, drawing the graph as an SVG image.

You can see a live demonstration of this interface by visiting: http://69.164.204.238:2904/
And using the following user/password login info:
User: pvdemouser
Password: svL35NmPwMnt
(This link was originally posted here)

Here are some screenshots:

In the second screenshot you see the results of the R command ‘plot(x, y)’ (with the reimplementation of plot doing the actual plotting), and in the fourth screenshot you see a similar plot command along with a subsequent best fit line (data points calculated with ‘lowess()’) drawn in.

Once in, you can try out R by typing something like:

x <-rnorm(100)plot(x, main="Random numbers")
l <-lowess(x)lines(l$y)

The plot and lines commands will bring up a graph – you can escape out of it, download the graph as a SVG file, and change the graph type (e.g. do: plot (x, type=”o”) ).
Many R commands will work, though only the hist(), plot() and lines() work for graphing.
Please don’t type the R command q() – it will quit the server, stopping it working for everyone! Also, as everyone shares the same session for now, using more unique variable name than ‘x’ and ‘l’ will help you.

Currently there is only limited error checking but the code continues to be improved and developed. You can download it from: http://gitorious.org/r-node

How do you may imagine yourself using something like this? Feel invited to share with me and everyone else in the comments.

In this post I will provide R code that implement’s the combination of repeated running quantile with the LOESS smoother to create a type of “quantile LOESS” (e.g: “Local Quantile Regression”).

This method is useful when the need arise to fit robust and resistant (Need to be verified) a smoothed line for a quantile (an example for such a case is provided at the end of this post).

If you wish to use the function in your own code, simply run inside your R console the following line:

In recent years, a growing need has arisen in different fields, for the development of computational systems for automated analysis of large amounts of data (high-throughput). Dealing with non-standard noise structure and outliers, that could have been detected and corrected in manual analysis, must now be built into the system with the aid of robust methods. […] we use a non-standard mix of robust and resistant methods: LOWESS and repeated running median.

The motivation for this technique came from “Path data” (of mice) which is

prone to suffer from noise and outliers. During progression a tracking system might lose track of the animal, inserting (occasionally very large) outliers into the data. During lingering, and even more so during arrests, outliers are rare, but the recording noise is large relative to the actual size of the movement. The statistical implications are that the two types of behavior require different degrees of smoothing and resistance. An additional complication is that the two interchange many times throughout a session. As a result, the statistical solution adopted needs not only to smooth the data, but also to recognize, adaptively, when there are arrests. To the best of our knowledge, no single existing smoothing technique has yet been able to fulfill this dual task. We elaborate on the sources of noise, and propose a mix of LOWESS (Cleveland, 1977) and the repeated running median (RRM; Tukey, 1977) to cope with these challenges

If all we wanted to do was to perform moving average (running average) on the data, using R, we could simply use the rollmean function from the zoo package.
But since we wanted also to allow quantile smoothing, we turned to use the rollapply function.

R function for performing Quantile LOESS

Here is the R function that implements the LOESS smoothed repeated running quantile (with implementation for using this with a simple implementation for using average instead of quantile):

Someone on the R mailing list (link) asked: how can you easily (daily) collect data from many people into a spreadsheet and then analyse it using R.

The answer people gave to it where on various ways of using excel. But excel files (at least for now), are not “on the cloud”. A better answer might be to create a google form that will update a google spreadsheet that will then be read by R.

If my last sentence wasn’t clear to you, then this post is for you.

JRR Tolkien wrote a story (sans hobbits) called ‘Leaf by Niggle’ that has always resonated with me. I offer you an imperfect, incomplete tree (but my roof is intact).

Suggestions for improvements are encouraged.

And here is the link tree for the document (for your easy reviewing of the offered content) :

In this post I showcase a nice bar-plot and a balloon-plot listing recommended Nutritional supplements , according to how much evidence exists for thier benefits, scroll down to see it(and click here for the data behind it)
* * * *
The gorgeous blog “Information Is Beautiful” recently publish an eye candy post showing a “balloon race” image (see a static version of the image here) illustrating how much evidence exists for the benefits of various Nutritional supplements (such as: green tea, vitamins, herbs, pills and so on) . The higher the bubble in the Y axis score (e.g: the bubble size) for the supplement the greater the evidence there is for its effectiveness (But only for the conditions listed along side the supplement).

There are two reasons this should be of interest to us:

This shows a fun plot, that R currently doesn’t know how to do (at least I wasn’t able to find an implementation for it). So if anyone thinks of an easy way for making one – please let me know.

The data for the graph is openly (and freely) provided to all of us on this Google Doc.

The advantage of having the data on a google doc means that we can see when the data will be updated. But more then that, it means we can easily extract the data into R and have our way with it (Thanks to David Smith’s post on the subject)

For example, I was wondering what are ALL of the top recommended Nutritional supplements, an answer that is not trivial to get from the plot that was in the original post.

In this post I will supply two plots that present the data: A barplot (that in retrospect didn’t prove to be good enough) and a balloon-plot for a table (that seems to me to be much better).

Barplot
(You can click the image to enlarge it)

The R code to produce the barplot of Nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition).

# loading the data
supplements.data.0 <-read.csv("http://spreadsheets.google.com/pub?key=0Aqe2P9sYhZ2ndFRKaU1FaWVvOEJiV2NwZ0JHck12X1E&output=csv")
supplements.data<- supplements.data.0[supplements.data.0[,2]>2,]# let's only look at "good" supplements
supplements.data<- supplements.data[!is.na(supplements.data[,2]),]# and we don't want any missing data
supplement.score<- supplements.data[, 2]
ss <-order(supplement.score, decreasing =F)# sort our data
supplement.score<- supplement.score[ss]
supplement.name<- supplements.data[ss, 1]
supplement.benefits<- supplements.data[ss, 4]
supplement.score.col<-factor(as.character(supplement.score))levels(supplement.score.col)<-c("red", "orange", "blue", "dark green")
supplement.score.col<-as.character(supplement.score.col)# mar: c(bottom, left, top, right) The default is c(5, 4, 4, 2) + 0.1.par(mar =c(5,9,4,13))# taking care of the plot margins
bar.y<-barplot(supplement.score, names.arg= supplement.name, las =1, horiz =T, col= supplement.score.col, xlim =c(0,6.2),
main =c("Nutritional supplements efficacy score","(by evidence for its effectiveness on the listed condition)", "(2010)"))axis(4, labels= supplement.benefits, at = bar.y, las =1)# Add right axisabline(h = bar.y, col= supplement.score.col , lty =2)# add some lines so to easily follow each bar

Also, the nice things is that if the guys at Information Is Beautiful will update there data, we could easily run the code and see the updated list of recommended supplements.

Balloon plot
So after some web surfing I came around an implementation of a balloon plot in R (Thanks to R graph gallery)
There where two problems with using the command out of the box. The first one was that the colors where non informative (easily fixed), the second one was that the X labels where overlapping one another. Since there is no “las” parameter in the function, I just opened the function up, found where this was plotted and changed it manually (a bit messy, but that’s what you have to do sometimes…)

Here are the result (you can click the image for a larger image):

And here is The R code to produce the Balloon plot of Nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition).
(it’s just the copy of the function with a tiny bit of editing in line 146, and then using it)