Barnard's exact test – a powerful alternative for Fisher's exact test (implemented in R)

(The R code for Barnard’s exact test is at the end of the article, and you could also just download it from here, or from github)

Barnards exact test - p-value based on the nuisance parameter
Barnards exact test - p-value based on the nuisance parameter

About Barnard’s exact test

About half a year ago, I was studying various statistical methods to employ on contingency tables. I came across a promising method for 2×2 contingency tables called “Barnard’s exact test“. Barnard’s test is a non-parametric alternative to Fisher’s exact test which can be more powerful (for 2×2 tables) but is also more time-consuming to compute (References can be found in the Wikipedia article on the subject).

The test was first published by George Alfred Barnard (1945) (link to the original paper in Nature). Mehta and Senchaudhuri (2003) explain why Barnard’s test can be more powerful than Fisher’s under certain conditions:

When comparing Fisher’s and Barnard’s exact tests, the loss of power due to the greater discreteness of the Fisher statistic is somewhat offset by the requirement that Barnard’s exact test must maximize over all possible p-values, by choice of the nuisance parameter, π. For 2 × 2 tables the loss of power due to the discreteness dominates over the loss of power due to the maximization, resulting in greater power for Barnard’s exact test. But as the number of rows and columns of the observed table increase, the maximizing factor will tend to dominate, and Fisher’s exact test will achieve greater power than Barnard’s.

About the R implementation of Barnard’s exact test

After finding about Barnard’s test I was sad to discover that (at the time) there had been no R implementation of it. But last week, I received a surprising e-mail with good news. The sender, Peter Calhoun, currently a graduate student at the University of Florida, had implemented the algorithm in R. Peter had  found my posting on the R mailing list (from almost half a year ago) and was so kind as to share with me (and the rest of the R community) his R code for computing Barnard’s exact test. Here is some of what Peter wrote to me about his code:

On a side note, I believe there are more efficient codes than this one.  For example, I’ve seen codes in Matlab that run faster and display nicer-looking graphs.  However, this code will still provide accurate results and a plot that gives the p-value based on the nuisance parameter.  I did not come up with the idea of this code, I simply translated Matlab code into R, occasionally using different methods to get the same result.  The code was translated from:

Trujillo-Ortiz, A., R. Hernandez-Walls, A. Castro-Perez, L. Rodriguez-Cardozo. Probability Test.  A MATLAB file. URL

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=6198

My goal was to make this test accessible to everyone.  Although there are many ways to run this test through Matlab, I hadn’t seen any code to implement this test in R.  I hope it is useful for you, and if you have any questions or ways to improve this code, please contact me at [email protected]

Continue reading “Barnard's exact test – a powerful alternative for Fisher's exact test (implemented in R)”

Web Development with R – an HD video tutorial of Jeroen Ooms talk

Here is a HD version of a video tutorial on web development with R, a lecture that was given by Jeroen Ooms (the guy who made A web application for R’s ggplot2). This talk was given at the Bay Area UseR Group meeting on R-Powered Web Apps.

You can also view the slides for his talk and view (great) examples for: stockplotlme4, and gpplot2.

Thanks again to Jeroen for sharing his knowledge and experience!

Statistics plugins for WordPress

Today I came across a post named “24 Noble WordPress Plugins To Determine The Performance of your Blog” through Weblog Tools Collection (one of my favorite places to stay updates on wordpress). The post provided a good solid list of statistics plugins for wordpress. Some of them are too old to count (pun intended), others are much more recent and relevant.

As a statistics (and WordPress) lover myself, I was inspired to extend the list of wordpress statistics plugins for the hope of benefiting the community:
Blog Metrics
This plugin is based on ideas in an excellent post by Avinash Kaushik (Whom I consider a Web analytics guru and a brilliant blogger!).

it calculates:

  • Raw Author Contribution:
    • average number of posts per month
    • average number of words per post
  • Conversation Rate:
    • average number of comments per postwithout your own comments
    • average number of words used in comments to posts

Both for all the time you’ve been blogging, and for the last month, it then adds these values in a page on your WordPress dashboard.

Blog Metrics for a single author blogBlog metrics per author

Search Meter

This plugin is a must for any blogger. Period.

If you have a Search box on your blog, Search Meter automatically records what people are searching for — and whether they are finding what they are looking for. Search Meter’s admin interface shows you what people have been searching for in the last couple of days, and in the last week or month. It also shows you which searches have been unsuccessful. If people search your blog and get no results, they’ll probably go elsewhere. With Search Meter, you’ll be able to find out what people are searching for, and give them what they want by creating new posts on those topics.  […]

Google analytics Dashboard

Google Analytics Dashboard gives you the ability to view your Google Analytics data in your WordPress dashboard. You can also alow other users to see the same dashboard information when they are logged in or embed parts of the data into posts or as part of your theme.

The biggest advantage of this plugin in my view is that it adds sparklines in the “posts -> edit” page in the admin area.

Analytics360
I don’t use this one much. But one feature it has that I find interesting is that is adds information of when you posted something with the trend line of the google analytics traffic data. It also mixes data from MailChimp’s, which I don’t use.

MailChimp’s Analytics360 plugin allows you to pull Google Analytics and MailChimp data directly into your dashboard, so you can access robust analytics tools without leaving WordPress.

Broken Link Checker
This plugin is also a must.

This plugin will monitor your blog looking for broken links and let you know if any are found.

  • Monitors links in your posts, pages, the blogroll, and custom fields (optional).
  • Detects links that don’t work and missing images.
  • Notifies you on the Dashboard if any are found.
  • Also detects redirected links.
  • Makes broken links display differently in posts (optional).
  • Link checking intervals can be configured.
  • New/modified posts are checked ASAP.
  • You view broken links, redirects, and a complete list of links used on your site, in the Tools -> Broken Links tab.
  • Searching and filtering links by URL, anchor text and so on is also possible.
  • Each link can be edited or unlinked directly via the plugin’s page, without manually editing each post.

Piwik + WP-Piwik

This plugin adds a Piwik stats site to your WordPress dashboard. It’s also able to add the Piwik tracking code to your blog.
Piwik is an open source (GPL licensed) web analytics software program. It provides you with detailed real time reports on your website visitors: the search engines and keywords they used, the language they speak, your popular pages and so on…

You can install Piwik more or less like you install WordPress, and then you are left to integrate it into your blog. The only real down side of it for me (compared to google analytics) is the advanced segmentation and pivoting. But in general it is a free, great (and growing!) Web analytics solution.

Woopra Analytics Plugin
I have been using Woopra since their release thanks to lorelle. I enjoy the ability to follow the live actions that are happening inside the blog. Although since woopra went from BETA to GOLD, I lost most interest because the total blogs I track have more traffic volume then woopra allow tracking in their free account. But small bloggers could find the service gratifying.

Woopra is the world’s most comprehensive, information rich, easy to use, real-time Web tracking and analysis application.

Features include:

  • Live Tracking and Web Statistics
  • A rich user interface and client monitoring application
  • Real-time Analytics
  • Manage Multiple Blogs and Websites
  • Deep analytic and search capabilities
  • Click-to-chat
  • Visitor and member tagging
  • Real-time notifications
  • Easy Installation and Update Notification

Final notes

If you are into web analytics, I also encourage you to give the following a try: Nuconomy,ClickTale, Crazy Egg. And of course, Google analytics. Each of them (and also Woopra) strips you and your visitors a bit more from their privacy. But that is the ultimate price we pay for the strong Web analytics solutions that exists out there.
If you got any more statistics plugins I missed, feel encouraged to share them with me in the comments 🙂

Animation video of rgl in action

Duncan Murdoch just posted a youtube video presenting an animation clip of a 3d rgl object.

Duncan even went further and wrote an explanation on how he made the video:

here are the steps I used:
1.  Design a shape to be displayed, and then play with the animation functions to make it change over time.  Use play3d to do it live in R, movie3d to write the individual frames of the movie to .png files.
2.  Use the ffmpeg package (not an R package, a separate project at http://ffmpeg.org) to convert the .png files to an .mp4 file.  The individual frames totalled about 1 GB; the compressed movie is about 45 MB.
3.  Upload to Youtube.  I’m not a musician, so I had to use one of their licensed background tracks, I couldn’t write my own.  I spent a lot of time picking one and then adjusting the timing of the video to compensate.  Each render/upload cycle at full resolution took about an hour and a half.  It’s a lot faster to render in a smaller window with fewer frames per second, but it’s still tedious.   It’s easier to synchronize if you actually have a copy of the music locally, but Youtube doesn’t let you download their music.  So the timing isn’t perfect, but it’s good enough for me!

Wonderful work Duncan, thanks for sharing!

Announcing R-bloggers.com: a new R news site (for bloggers by bloggers)

I already wrote about R-bloggers on the R mailing list, so it only seems fitting to write about it more here. I will explain what R-bloggers is and then move to explain what I hope it will accomplish.

R-Bloggers.com is a central hub of content collected from bloggers who write about R (in English) and if you are an R blogger you can join it by filling in this form.

I built the site with the aspiration to  help R bloggers and users to connect and follow the “R blogosphere”. When I am writing these words, R-bloggers already has 17 blogs in it, and I hope for many (many) more.

How does R-Bloggers operate? This site aggregates feeds (only with permission!) from participating R blogs. The beginnings of each participating blog’s posts will automatically be displayed on the main page with links to the original posts; inside every post there is a link to the original blog and links to other related articles. While all participating blogs have links in the “Contributors” section of our sidebar

What does R-Bloggers offer it’s visitors?

  • Discover (for all): Find new R blogs you didn’t know about. And Search in them for content you want.
  • Follow (for people who don’t use RSS): Enter your e-mail and subscribe to receive a daily digest with teasers of new posts from participating blogs. You will more easily get a sense of hot topics in the R blogosphere.
  • Connect (for facebook users): Click on “Fan this site” to become a “fan” of R Bloggers. You can then “friend” other people and share thoughts on our wall. Or just by leaving comments on the blog.
  • Participate (for bloggers): Add your R blog to get increased visibility (for readers and search engines) with permanent links on our Contributors sidebar. Your blog will also gain visibility via our e-mail digest and through your presence on the main page with posts.

Who started R-Bloggers (and way)? R Bloggers was started by Tal Galili (well, me).  After searching for numerous R blogs I decided that there must be more R blogs our there then he knows about, and maybe the best way for finding them is to make them find him.

After writing about it in the R mailing list, I got some good feedbacks but also questions about why use only R blogs and not all the R feeds that exist. Who is the website actually for (when there are services like Google reader for us to read our feeds with), and what am I hoping it will do. So here is what I answered:

For me there are two audiences:
One is that of the web 2.0 power users. That is, people who know what RSS is and use it, maybe evern write their own blogs. These people have only one problem (as I see it) that R-bloggers tries to solve, and that is to know who else lives in their ecosystem. Who else they should follow.
For that, google reader recommendation system is great, but not enough. A much better system is if there was a one place where all R bloggers would go, write down their website, and all of us would know they exist. That is what R-bloggers offers for the power users. I think this is also why over 20 of them subscribed to the site RSS feed.
BTW, The origin of this idea came to me when I was trying to find all the dance bloggers for my wife (who is a dance researcher and blogger herself). After a while we started http://www.dancebloggers.com/ while knowing of only 10 bloggers. They list now has over 80 bloggers, most of which we would have not known about without this hub.
The same thing I am trying to do for the R community, that is way I hope more R bloggers would write about the service – so their network of readers which includes other R bloggers would add themselves and we will all know about them.
If that was my only purpose, a simple directory would have been enough. But I also have a second one and that is to help the second audience.

The second audience I am thinking of are people of our community who are not so much early adopters (and actually quite late adapters) of the new facilities that the new web (a.k.a: web 2.0) provides.
To them the all RSS thing is too much to look at, and they are used to e-mails. And because of that they are (until now) disconected from many of the R bloggers out there, simply because it is in-efficient for them to go through all these blogs each day (or even week). So for them, to see all the content in one place (and even get an e-mail about it) would be (I hope) a service. I believe that’s why 5 of them (so far) has subscribed via e-mail.
I also hope teachers will direct their students to this as a resource for getting a sense of what people who are using R are doing.
Another thing that hints me about the R community is seeing how the “facebook fan box” is still empty. Which tells me that (sadly) very few R users are actively using facebook as a means for connecting with the outer networks of people out there.

All I wrote also explains why R-bloggers will only take feeds of bloggers and only (as much as can be said) their posts that are centered around R (hence the website name 🙂 ).
It both follows what Gabor talked about – having a site who’s content is only about R. But also what I wish, which is to have “content” in the sense of articles to read (mostly). And not so much things like news feeds of wikipedia or new packages published.

I hope this post will both notify people about this new resource, encourage more R bloggers to join, and will help for future people to better understand what this R-Bloggers thing is all about 🙂

A web application for R's ggplot2

One of the exciting new frontiers for R programming is of creating website interfaces to R code. At the forefront of this domain is a young and (very) bright man called Jeroen Ooms, whom I had the pleasure of meeting at useR 2009 (press the link to see his presentation).

Today Jeroen announced a new version (0.11) of his web interface to ggplot2. See it here:
http://www.yeroon.net/ggplot2/

As Jeroen wrote:

New features include 1D geom’s (histogram, density, freqpoly), syntax mode (by clicking the tiny arrow at the bottom), and some additional facet options. And some minor improvements and fixes, most notably for Internet Explorer.
The data upload has not been improved yet, I am working on that. For now, it supports .csv, .sav (spss), and tab delimited data. Please make sure your filename has the appropriate extension and every column has a header in your data. If you export a dataframe from R, use:
write.csv(mydf, ”mydf.csv” , row.names=F). If you upload an spss
datafile, none of this should be a concern.
Supported browsers are IE6-8, FF, Safari, and Chrome, but a recent browser is highly recommended. As always, feedback is more than welcome.

Here is a little demo video that shows how to use the new features:

The datafile from the demo is available at http://www.yeroon.net/ggplot2/myMovies.csv.

I wish the best to Jeroen, and hope to see many more such uses in the future.

Free statistics e-books for download

This post will eventually grow to hold a wide list of books on statistics (e-books, pdf books and so on) that are available for free download.  But for now we’ll start off with just one several books:

 

Several of these books were discovered through a CrossValidated discussion:

  1. http://stats.stackexchange.com/questions/614/open-source-statistical-textbooks/
  2. https://stats.stackexchange.com/questions/170/free-statistical-textbooks

* * *

Know of any more e-books freely available for download? Please write to us about them in the comments.

R Flashmob

Today I noticed a call for R users to gather around a single campfire for one hour and share their questions and answers.

The campfire name is stackoverflow.com, a site dedicated for handling programming questions. The event details are bellow:

From: The R Flashmob Project
Subject: R Flashmob #2

You are invited to take part in R Flashmob, the project that makes the
world a better place by posting helpful questions and answers about the
R statistical language to the programmer’s Q & A site stackoverflow.com

Please forward this to other people you know who might like to join.

FAQ

Q. Why would I want to join an inexplicable R mob?

A. Tons of other people are doing it.

Q. Why else?

A. Stackoverflow was built specifically for handling programming questions.
It’s a better mousetrap. It offers search (and is well indexed by search engines),
tagging, voting, the ability to choose the “best” answer to a question, and the ability to
edit questions and answers as technology progresses. It has a karma system to
reward people who are happy to help and discourage MLJs (mailing list jerks).

Q. Do the organizers of this MOB have any commercial interest in stackoverflow?

A. None at all. We’re just convinced it is the best way to help and promote R. All
the content submitted to stackoverflow is protected by a Creative Commons
CC-Wiki License, meaning anyone is free to copy, distribute, transmit, and
remix the information on stackoverflow. All the content on stackoverflow is
regularly made available for download by the public.

INSTRUCTIONS – R MOB #2
Location: stackoverflow.com
Start Date: Tuesday, September 8th, 2009
Start Time:
10:04 AM – US Pacific
11:04 AM – US Mountain
12:04 PM – US Central
1:04 PM – US Eastern
6:04 PM – UK
7:04 PM – Continental W. Europe
5:04 AM (Weds) – New Zealand (birthplace of R)
Duration: 50 minutes

(1) At some point during the day on September 8th, synchronize your watch to
http://timeanddate.com/worldclock/personal.html?cities=137,75,64,179,136,37,22

(2) The mob should form at precisely 4 minutes past the hour and not beforehand.

(3) At 4 minutes past the hour, you should arrive at stackoverflow.com, log in,
and post 3 R questions. Be sure to tag the questions “R”. See the posting
guidelines at http://stackoverflow.com/faq to understand what makes a good
question.

(4) Follow R Flashmob updates at http://twitter.com/rstatsmob

(5) Post twitter messages tagged #rstats and #rstatsmob during the mob,
providing links to your questions.

(6) During the R MOB, you can chat with other participants on the #R channel
on IRC (freenode). To do this, install the Chatzilla extension on Firefox.
Click “freenode” on the main screen. Then type /join #R in the field at the
bottom of the screen. Then chat.

(7) If you finish posting your three questions within the 50 minutes, stick
around to answer questions and give “up votes” to good questions and answers.

(8) IMPORTANT: After posting, sign the R Flashmob guestbook at
https://bit.ly/6F8B2

(9) Return to what you would otherwise have been doing. Await
instructions for R MOB #3.

This invitation already gained exposure from 3 blogs:

I am waiting to see who else will join the fun.

Simple visualization of a 11X5 table (for WordPress 2.9 Features Vote Results)

Simply Something Sophisicated - a WordPress poster

I guess this is not the number one post I would like to start with on this blog, but I feel the time is right for it (community-wise).

I’ll move on to the subject matter in a moment, but first a short intro: This blog is written by Tal Galili. I am an aspiring statistician who also loves to use R for his work. At the same time I am also a WordPress blogger, writing mainly at www.TalGalili.com where I can use my native language (Hebrew) for self expression.

This combination of statistics and blogging will lead me to sometimes much less statistical, but more Web/Open-Source oriented posts like this one. So for the statisticians in the audience I extend my apologies and invite you to wait for future posts which will be more fully focused on Statistics and R.

And now for the topic at hand. . .

*         *         *         *         *
Continue reading “Simple visualization of a 11X5 table (for WordPress 2.9 Features Vote Results)”

What is R?

Highlights

  • R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.   If you wish to download R, please choose your preferred CRAN mirror.
  • The R language has become a de facto standard among statisticians for the development of statistical software,and is widely used for statistical software development and data analysis.
  • Basic questions about R like how to download and install the software, or what the license terms are, are answered in the answers to frequently asked questions section.

Introduction to R

R is a language and environment for statistical computing and graphics.  R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

R is available as Free Software under the terms of the Free Software Foundation‘s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

R and S

R is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues.  R can be considered as a different implementation of S.  There are some important differences, but much code written for S runs unaltered under R. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R, like S, is designed around a true computer language, and it allows users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of S, which makes it easy for users to follow the algorithmic choices made. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

Many users think of R as a statistics system. We prefer to think of it of an environment within which statistical techniques are implemented.  R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and many more are available through the CRAN family of Internet sites covering a very wide range of modern statistics.

R has its own LaTeX-like documentation format, which is used to supply comprehensive documentation, both on-line in a number of formats and in hardcopy.

(credit: the R about page and the Wikipedia article R (programming language))