Could we run a statistical analysis on iPhone/iPad using R?

Updates (17.07.10 + 13.09.10 + 03.05.11)

03.05.2011: “Satisfaction blog” wrote about the idea to use iPhone with RStudio – great job Julyan!

I recently came across David Smith’s post on the REvolution blog, pointing to instructions on the R wiki for how to install R on the iPhone!
I didn’t try it myself, both because it requires jailbreaking the iPhone and because I don’t have an iPhone. But it is still interesting to know of.

The blog “Computational Mathematics” recently published a post about a package on Cydia that eases R installation on the iPhone. You can read it here: R on the iPhone.

Preface – I don’t use Mac

I don’t use a Mac! Not that there is anything wrong with that, but I don’t use a Mac…

Yet at the same time, wonderful people like my wife, my brother, my thesis advisor and even my mother-in-law all use a Mac. So one can’t help but wonder if I might be missing out on something.

Still, for a Windows user like me it is a bit difficult to understand the hype around the iPhone 4 release:

To me, such releases tend to look more like this spoof video about the release of the Apple “i”.

So while I don’t use Apple’s products, I have deep respect for the impact they have made on people’s lives. Which raises the question: could you use R on an iPhone (or an iPad)?

Can R be run on iPhone/iPad?

This question (and the motivation for this post) was raised in an R help mailing list thread a week ago.

After receiving permission from the thread’s author, I am republishing the content that was presented there in the hope that it might be of interest to other R community members.

And here is what “Marc Schwartz” wrote:
Continue reading “Could we run a statistical analysis on iPhone/iPad using R?”

Syncing files across computers using DropBox

Motivation

In the past few months I have been using DropBox for syncing my work files between my home and work computer. It has saved me from numerous mistakes and from sending the files to myself via e-mail.

Recently I found this service highly useful for sharing files with 4 other people with whom I am working on a data analysis project. Being so happy with it (and also gaining more storage space by inviting friends to use it), I thought of sharing my experience here with other R users who might benefit from this cool (free) service.

What is Dropbox?

Dropbox is a Software/Web2.0 file hosting service which enables users to synchronize files and folders between computers across the internet.
This is done by installing the software and then picking a “shared folder” on your computer. From that moment on, that folder will be synced with any computer you choose to install the software on (for example, your home/work computer, your laptop, and so on).

http://www.youtube.com/watch?v=OFb0NaeRmdg

DropBox also enables users to share some of their folders with other DropBox users. This seamless integration of the service with your OS file system (Windows, Mac or Linux) is what’s making this service so comfortable, by allowing me to work with co-workers and have the same “project tree” of folders, all of which are always synced.

You could also share a file “online” by getting a link to it, which you can then pass on to others. So, for example, you could write some R code, share it online, and call it later with source(). This is the easiest way I know of to do this.
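As a minimal sketch of the source() idea: the snippet below writes a small script to a temporary file as a stand-in for a file shared online, then loads it with source(). The function name shared_mean and the commented-out URL are hypothetical and used only for illustration; with a real public link you would pass that link to source() instead.

```r
# Stand-in for a script that was shared online: write it to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines("shared_mean <- function(x) sum(x) / length(x)", script_path)

# source() accepts a file path or a URL. With a real public link you would run
# something like (hypothetical URL):
# source("https://example.com/my_shared_script.R")
source(script_path)

# The function defined in the shared script is now available in this session.
result <- shared_mean(c(2, 4, 6))
print(result)
```

Anything the sourced script defines ends up in your workspace, so it is worth sourcing only links you trust.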

Dropbox is a “cloud computing” Web2.0 file hosting service offering both free and paid services. The free version (which I use) offers 2GB of “shared storage”, unless you invite other users, in which case you get some extra storage space (which is one of my motivations in writing this post).

Dropbox has other non-trivial uses allowing one to:

The service’s major competitors are Box.net, SugarSync and Mozy, none of which I have had the chance to try.

How to start?

Simply go to: DropBox.com
Sign up, install the software, use the new shared folder, and let me know if it helped you 🙂

How to get Extra space?

You can:

  • Earn another 750MB of space by connecting your Dropbox to your Twitter/Facebook account and sending a status update about it. To get this bonus, head over to the “Get extra space free!” page.
  • Refer a friend to open a Dropbox account (every friend joining earns you another 250MB of space). This bonus is capped at a total of 8GB of added space (after that, you won’t be given any more extra space).
  • Upgrade: pay $10 a month and get an extra 50GB

useR-2010 is looking for a T-shirt design

Katharine Mullen has just published on the R mailing list a call for designeRs who might be willing to create the aRt for the T-shirt that will be given out at useR 2010.

I consider such contests as one of those good-for-the-community things, and hope regular useRs, R bloggers, and companies that are based on R – will consider spreading the word, participating in it (and maybe even offer more bonuses to the designers).

If you design something and put it on picasa or flickr, please tag it with “useR2010Tshirt” (and consider leaving a comment with a link to the design), so there could later be a follow up on your work. Even if you don’t “win” you will get positive “karma points” from the community 🙂 .

Here are the competition details, as published in the mailing list:
Continue reading “useR-2010 is looking for a T-shirt design”

Exporting R output to MS-Word with R2wd (an example session)

UPDATE (2014-11-02): please note that this post is from 2010. These days, it is much simpler to create docx files from R using knitr+pandoc. Using pander (links: [1], [2]) can also help make the markdown output look nicer in the file.

Creating reports is one of the basic tasks in data analysis. R provides numerous functions and packages to export its (beautiful) output and help compile it into a report.

In this post I will present one such (basic) solution for Windows OS users for exporting R output into Microsoft Word using the R2wd package. There are more ways and strategies for doing this, and if encouraged by comments, I will gladly write more on the subject.
* * *

R to Word using {R2wd}

The package R2wd (available through CRAN) relies on rcom. It is a wrapper that uses the statconnDCOM server to communicate with MS-Word via the COM interface.

R2wd can perform the basic tasks you would expect to need when creating a report from R. It allows you to:

  • Create a new Word file
  • Create headers and sub-headers
  • Move to a new page in the document
  • Write text
  • Insert tables (that is, “data.frame” and “matrix” objects)
  • Insert plots
  • Save and close the Word document
  • …(and more)

The current R2wd can still be seen as being in BETA stages.  Some features are not yet available, such as:

  • Choosing the text font (which means most of us will need to manually change the font in the document to “Courier New”, in order for the formatting to look good)
  • Inserting complex object outputs (such as summary.lm, although in the example below I show how that can be achieved using a simple function)
  • Speed – the speed of inserting a table is somewhat slow, I am not sure how it would scale to large documents

But from a (pleasant) correspondence with the package developer, I was assured the next release will supply us with more options and features.

R2wd’s developer, Christian Ritter, invites feedback from users.  So if there are features you are missing in this package, I believe he would like to know about it (you can e-mail Christian at:     christian.ritter <-at-> ridaco <-dot-> be  )

Getting R2wd 1.3

The current version of R2wd on CRAN is 1.1, and Christian Ritter (the package developer) says it is a “first idea” and that a more elaborate version will soon (around July) be available on CRAN.   In the meantime, Christian was so kind as to send me a more recent version of the package, which (until it gets uploaded to CRAN) you are welcome to download from here:
R2wd 1.3 download link

How to use R2wd to create a report – a sample session

Being young doesn’t prevent R2wd from doing some nice things.

Here is the text from the library(help=R2wd) :

If Word is not already running, wdGet() opens a new Word document, otherwise, it establishes a COM handle to the instance which is already running. The functions wdTitle, wdHeader, wdBody, and wdParagraph can be used to inject text elements into Word. Moreover, bookmarks can be added via wdInsertBookmarks and wdGoToBookmark allows to navigate among the bookmarks which also exist. There is another set of convenience functions, wdSection, wdSubsection, and wdSubsubsection which insert headers of level 1, 2, or 3, start new ’Sections’ in Word, and add bookmarks.
Graphs and dataframes can be inserted into Word by the wdPlot and wdTable commands. The wdTable command takes a dataframe or an array as arguments, creates a Word table of the appropriate dimensions and injects the content of the dataframe or array into it. It then formats the table in Word using elementary formatting elements.
The functions wdApplyTheme and wdApplyTemplate allow to work with themes and templates.

Here is an example session to demonstrate some of what is described above:

# install.packages("R2wd")
# library(help=R2wd)
require(R2wd)


wdGet(T)	# If no Word file is open, it will start a new one - you can set whether the file is visible or not
wdNewDoc("c:\\This.doc")	# this creates a new file named "This.doc" (backslashes in R strings must be doubled)

wdApplyTemplate("c:\\This.dot")	# this applies a template


wdTitle("Examples of R2wd (a package to write Word documents from R)")	# adds a title to the file

wdSection("Example 1 - adding text", newpage = T) # This can also create a header

wdHeading(level = 2, "Header 2")
wdBody("This is the first example we will show")
wdBody("(Notice how, by using two different lines in wdBody, we got two different paragraphs)")
wdBody("(Notice how I can use this: '\\ n' (without the space), to \n go to the next line)")
wdBody("האם זה עובד בעברית ?")
wdBody("It doesn't work with Hebrew...")
wdBody("O.k, let's move to the next page (and the next example)")

wdSection("Example 2 - adding tables", newpage = T)
wdBody("Table using 'format'")
wdTable(format(head(mtcars)))
wdBody("Table without using 'format'")
wdTable(head(mtcars))


wdSection("Example 3 - adding lm summary", newpage = T)

## Example from  ?lm
ctl
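The listing above is cut off at this point in this excerpt. As a rough sketch only (not the function from the full post), one way to get an lm summary into a form that wdTable accepts is to turn its coefficient table into a plain data.frame. The data are the ones from the example in ?lm; the helper name lm_coef_table is hypothetical, and the wdTable call is left commented out because it needs Windows, R2wd and a running Word connection.

```r
# Data from the example in ?lm
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)

# Hypothetical helper (not part of R2wd): convert the coefficient matrix of
# summary.lm into a data.frame, with the term names as a regular column.
lm_coef_table <- function(fit, digits = 3) {
  coefs <- coef(summary(fit))
  data.frame(Term = rownames(coefs), round(coefs, digits),
             check.names = FALSE, row.names = NULL)
}

coef_table <- lm_coef_table(lm.D9)
print(coef_table)

# With R2wd loaded and Word connected through wdGet(), the table could then be
# inserted like any other data.frame:
# wdTable(format(coef_table))
```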

Update:
Upon reading my post, Chris suggested that I also add a note here about SWORD, a tool written by Thomas Baier (the creator of the statconnDCOM server) which allows R code to be included in Word documents in a Sweave-like fashion. Here is a link to the project: http://rcom.univie.ac.at

R is going to have a GUI to ggplot2! (by the end of this year’s google-summer-of-code)

I was delighted to see the following e-mail post from Dirk Eddelbuettel regarding the google-summer-of-code R google group:
* * *

Earlier today Google finalised student / mentor pairings and allocations for
the Google Summer of Code 2010 (GSoC 2010). The R Project is happy to
announce that the following students have been accepted:

Colin Rundel, “rgeos – an R wrapper for GEOS”, mentored by Roger Bivand of
the Norges Handelshoyskole, Norway

Ian Fellows, “A GUI for Graphics using ggplot2 and Deducer”, mentored by
Hadley Wickham of Rice University, USA

Chidambaram Annamalai, “rdx – Automatic Differentiation in R”, mentored by
John Nash of University of Ottawa, Canada

Yasuhisa Yoshida, “NoSQL interface for R”, mentored by Dirk Eddelbuettel,
Chicago, USA

Felix Schoenbrodt, “Social Relations Analyses in R”, mentored by Stefan
Schmukle, Universitaet Muenster, Germany

Details about all proposals are on the R Wiki page for the GSoC 2010 at
http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010

The R Project is honoured to have received its highest number of student
allocations yet, and looks forward to an exciting Summer of Code. Please
join me in welcoming our new students.

At this time, I would also like to thank all the other students who have
applied for working with R in this Summer of Code. With a limited number of
available slots, not all proposals can be accepted — but I hope that those
not lucky enough to have been granted a slot will continue to work with R and
towards making contributions within the R world.

I would also like to express my thanks to all other mentors who provided for
a record number of proposals. Without mentors and their project ideas we
would not have a Summer of Code — so hopefully we will see you again next
year.

Regards,

Dirk (acting as R/GSoC 2010 admin)

* * *

From all the projects, the one I am most excited about is:
Ian Fellows, “A GUI for Graphics using ggplot2 and Deducer”, mentored by Hadley Wickham of Rice University, USA

Deducer (text from the website) attempts to be a free, easy-to-use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an excel-like spreadsheet in which to view and edit data frames. The goal of the project is two-fold:

  • Provide an intuitive interface so that non-technical users can learn and perform analyses without programming getting in their way.
  • Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming.

Deducer is designed to be used with the Java based R console JGR, though it supports a number of other R environments (e.g. Windows RGUI and RTerm).

This combination (of Deducer and ggplot2) might finally provide the bridge to the layman-statistician, the lack of which some people recently described as one of R’s weak spots (other bloggers wrote back that this is o.k., but no one disputed that R doesn’t compete with the point-and-click of software like SPSS or JMP).
I came across Ian in the discussion forums, where he provided very kind help with his package “Deducer”. Coupled with having Hadley as his mentor, I am very optimistic about the prospects of this project reaching very high standards.
Very exciting development indeed!

Update: Ian’s proposal is available to view here.

p.s: for some intuition about what a GUI for ggplot2 can look like, have a look at this video of Jeroen Ooms’s ggplot2 web interface

Parallel Multicore Processing with R (on Windows)

Parallel Processing backend for R under Windows – installation tips and some examples.

This post offers simple examples and installation tips for “doSMP”, the new parallel processing backend package for R under Windows.
* * *

Update:
The required packages are not yet available on CRAN, but until they get online, you can download them from here:
REvolution foreach windows bundle
(Simply unzip the folders inside your R library folder)

* * *

Recently, the REvolution blog announced the release of “doSMP”, an R package which offers support for symmetric multiprocessing (SMP) on Windows.
This means you can now speed up loops in R code by running iterations in parallel on a multi-core or multi-processor machine, thus offering Windows users what was until recently available only to Linux/Mac users through the doMC package.

Installation

For now, doSMP is not available on CRAN, so in order to get it you will need to download the REvolution R distribution “R Community 3.2” (they will ask you to supply your e-mail, but I trust REvolution won’t do anything too bad with it…)
If you already have R installed and want to keep using it rather than the REvolution distribution (as was the case with me), you can navigate to the library folder inside the REvolution distribution and copy all the folders (package folders) from there to the library folder in your own R installation.

If you are using R 2.11.0, you will also need to download (and install) the revoIPC package from here:
revoIPC package – download link (required for running doSMP on windows)
(Thanks to Tao Shi for making this available!)

Usage

Once you have the folders in place, you can load the packages and do something like this:

require(doSMP)
workers <- startWorkers(2) # My computer has 2 cores
registerDoSMP(workers)

# create a function to run in each iteration of the loop
check <- function(n) {
	for(i in 1:1000)
	{
		sme <- matrix(rnorm(100), 10,10)
		solve(sme)
	}
}


times <- 10	# times to run the loop

# comparing the running time for each loop
system.time(x <- foreach(j=1:times ) %dopar% check(j))  #  2.56 seconds  (notice that the first run would be slower, because of R's lazy loading)
system.time(for(j in 1:times ) x <- check(j))  #  4.82 seconds

# stop workers
stopWorkers(workers)

Points to notice:

  • You will only benefit from the parallelism if the body of the loop is performing time-consuming operations. Otherwise, R serial loops will be faster
  • Notice that on the first run, the foreach loop could be slow because of R's lazy loading of functions.
  • I am using startWorkers(2) because my computer has two cores. If your computer has more (for example 4), use more.
  • Lastly - if you want more examples on usage, look at the "ParallelR Lite User's Guide", included with REvolution R Community 3.2 installation in the "doc" folder
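If doSMP is not at hand, the same comparison can be sketched with the parallel package that ships with R. To be clear, this swaps in a different technique (parLapply on a socket cluster) and is not the doSMP API; the choice of 2 workers is an assumption about the machine, and the timings will vary, so none are quoted here.

```r
library(parallel)  # ships with base R

# Same workload as check() above: invert 1000 random 10x10 matrices.
check <- function(n) {
  for (i in 1:1000) {
    sme <- matrix(rnorm(100), 10, 10)
    solve(sme)
  }
  n  # return the iteration index so the result can be inspected
}

times <- 10  # times to run the loop

# Parallel version on a socket cluster (assumption: at least 2 cores).
cl <- makeCluster(2)
par_time <- system.time(x_par <- parLapply(cl, 1:times, check))
stopCluster(cl)

# Serial version for comparison.
ser_time <- system.time(x_ser <- lapply(1:times, check))

print(par_time)
print(ser_time)
```

As with doSMP, the parallel version only pays off when each iteration does enough work to outweigh the cost of starting the workers and sending them the tasks.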

Updates

(15.5.10) :
The new R version (2.11.0) doesn't work with doSMP, and will give you the following error:

Loading required package: revoIPC
Error: package 'revoIPC' was built for i386-pc-intel32


So far, no solution has been found, except for using the REvolution R distribution or R 2.10.
A thread on the subject was started recently to report the problem. Updates will be posted if someone comes up with a better solution.

Thanks to Tao Shi, there is now a solution to the problem. You'll need to download the revoIPC package from here:
revoIPC package - download link (required for running doSMP on windows)
Install the package on your R distribution, and follow all of the other steps detailed earlier in this post. It will now work fine on R 2.11.0


Update 2: Notice that I added, in the beginning of the post, a download link to all the packages required for running parallel foreach with R 2.11.0 on windows. (That is until they will be uploaded to CRAN)

Update 3 (04.03.2011): doSMP is now officially on CRAN!

Correlation scatter-plot matrix for ordered-categorical data

When analyzing a questionnaire, one often wants to view the correlation between two or more Likert questionnaire items (for example: two ordered categorical vectors ranging from 1 to 5).

When dealing with several such Likert variables, a clear presentation of all the pairwise relations between our variables can be achieved by inspecting the (Spearman) correlation matrix (easily obtained in R by using the “cor” command with method = "spearman" on a matrix of variables; “cor.test” tests a single pair at a time).
Yet a challenge appears once we wish to plot this correlation matrix. The challenge stems from the fact that the classic presentation for a correlation matrix is a scatter plot matrix, but scatter plots don’t (usually) work well for ordered categorical vectors since the dots on the scatter plot often overlap each other.
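For readers who have not computed such a matrix before, here is a minimal sketch using simulated data. The questionnaire (50 respondents, three items named q1 to q3) is made up for illustration and is not the data behind this post.

```r
set.seed(1)

# Hypothetical questionnaire: 50 respondents, three items scored 1 to 5.
likert <- data.frame(
  q1 = sample(1:5, 50, replace = TRUE),
  q2 = sample(1:5, 50, replace = TRUE),
  q3 = sample(1:5, 50, replace = TRUE)
)

# Spearman correlation matrix for all pairs of items.
spearman_mat <- cor(likert, method = "spearman")
print(round(spearman_mat, 2))

# cor.test() handles one pair at a time and adds a p-value;
# exact = FALSE avoids the warning about ties, which Likert data always has.
pair_test <- cor.test(likert$q1, likert$q2, method = "spearman", exact = FALSE)
print(pair_test$estimate)
```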

There are four solutions to the point-overlap problem that I know of:

  1. Jitter the data a bit to give a sense of the “density” of the points
  2. Use a color spectrum to show when a point actually represents “many points”
  3. Use different point sizes to show when there are “many points” in the location of that point
  4. Add a LOWESS (or LOESS) line to the scatter plot – to show the trend of the data
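As a rough illustration of items 1, 3 and 4 for a single pair of variables (this is a base-graphics sketch on simulated data, not the scatter-plot-matrix code of this post), one could do something like the following. The vectors x and y and the scaling of the point sizes are arbitrary choices made for the example.

```r
set.seed(2)

# Two simulated, positively related Likert items (values 1 to 5).
x <- sample(1:5, 200, replace = TRUE)
y <- pmin(5, pmax(1, x + sample(-1:1, 200, replace = TRUE)))

# Solution 1: jitter the points so that overlapping answers become visible.
plot(jitter(x), jitter(y), xlab = "item 1", ylab = "item 2",
     main = "Jittered points")

# Solution 3: one point per (x, y) combination, sized by how often it occurs.
counts <- as.data.frame(table(x = x, y = y))
counts <- counts[counts$Freq > 0, ]
cx <- as.numeric(as.character(counts$x))
cy <- as.numeric(as.character(counts$y))
plot(cx, cy, cex = sqrt(counts$Freq) / 2, pch = 19,
     xlab = "item 1", ylab = "item 2", main = "Point size = frequency")

# Solution 4: add a LOWESS line to show the trend.
lines(lowess(x, y), col = "red")
```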

In this post I will offer the code for a solution that uses solutions 3 and 4 (and possibly 2; please read this post’s comments). Here is the output:

And here is the code to produce this plot:

Continue reading “Correlation scatter-plot matrix for ordered-categorical data”

R-Node: a web front-end to R with Protovis

Update (April 6 – 2010) : R-Node now has its own website, with a dedicated google group (you can join it here)

* * * *

The integration of R into online web services is (for me) one of the more exciting prospects in R’s future. That is why I was very excited to come across Jamie Love’s recent creation: R-Node.

What is R-Node

R-Node is an (open source) web front-end to R (the statistical analysis package).

Using this front-end, you can from any web browser connect to an R instance running on a remote (or local) server, and interact with it, sending commands and receiving the responses. In particular, graphing commands such as plot() and hist() will execute in the browser, drawing the graph as an SVG image.

You can see a live demonstration of this interface by visiting:
http://69.164.204.238:2904/
And using the following user/password login info:
User: pvdemouser
Password: svL35NmPwMnt
(This link was originally posted here)

Here are some screenshots:


In the second screenshot you see the results of the R command ‘plot(x, y)’ (with the reimplementation of plot doing the actual plotting), and in the fourth screenshot you see a similar plot command along with a subsequent best fit line (data points calculated with ‘lowess()’) drawn in.

Once in, you can try out R by typing something like:

x <- rnorm(100)
plot(x, main="Random numbers")
l <- lowess(x)
lines (l$y)

The plot and lines commands will bring up a graph - you can escape out of it, download the graph as an SVG file, and change the graph type (e.g. do: plot (x, type="o") ).
Many R commands will work, though only hist(), plot() and lines() work for graphing.
Please don't type the R command q() - it will quit the server, so it stops working for everyone! Also, as everyone shares the same session for now, using more unique variable names than 'x' and 'l' will help you.

Currently there is only limited error checking but the code continues to be improved and developed. You can download it from:
http://gitorious.org/r-node

How do you imagine yourself using something like this? Feel free to share with me and everyone else in the comments.

Here are some of the more technical details of R-Node:
Continue reading "R-Node: a web front-end to R with Protovis"

Google spreadsheets + google forms + R = Easily collecting and importing data for analysis

Someone on the R mailing list (link) asked: how can you easily (daily) collect data from many people into a spreadsheet and then analyse it using R?

The answers people gave were about various ways of using Excel.  But Excel files (at least for now) are not “on the cloud”.  A better answer might be to create a Google form that updates a Google spreadsheet, which is then read by R.

If my last sentence wasn’t clear to you, then this post is for you.
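To give a feel for the last step (reading the spreadsheet into R), here is a minimal sketch. It assumes the spreadsheet is made available as CSV through a link; the commented-out URL is only a placeholder, and the column names and values are invented for the example. The inline text stands in for the downloaded sheet so that the snippet runs offline.

```r
# With a real sheet you would read straight from its CSV link, e.g.
# (placeholder URL, replace it with the link of your own sheet):
# sheet_url <- "https://docs.google.com/spreadsheets/d/<your-sheet-id>/export?format=csv"
# responses <- read.csv(sheet_url)

# Offline stand-in for the form responses (invented data).
csv_text <- "Timestamp,Name,Score
2010-03-01 09:00:00,Ann,4
2010-03-01 09:05:00,Bob,5
2010-03-02 10:00:00,Cid,3"
responses <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse the timestamp column, then summarise the score per day.
responses$Timestamp <- as.POSIXct(responses$Timestamp, tz = "UTC")
daily_mean <- tapply(responses$Score, as.Date(responses$Timestamp), mean)
print(daily_mean)
```

Re-running the read.csv() call picks up whatever rows the form has added since the last run, which is what makes this setup convenient for daily data collection.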

Continue reading “Google spreadsheets + google forms + R = Easily collecting and importing data for analysis”

Highlight the R syntax on your (WordPress) blog using the wp-syntax plugin

Update (11.10.10): I found a better solution for R syntax highlighting than the one presented in this post. The plugin is called WP-CodeBox, and I wrote about it in the post – WP-CodeBox: A better R syntax highlighter plugin for WordPress
Download link for WP-Syntax plugin (with GeSHi version 1.0.8.6)

In case you have a self hosted WordPress blog, and you wish to show your R code in it, how would you do it?

The simplest solution would be to just paste the code as plain text, which will look like this:

x <- rnorm(100, mean = 2, sd = 3)
plot(x, xlab = "index", main = "Example code")

But what if you would like to help your readers orient themselves inside your code by giving different colors to different commands in the code (a.k.a. syntax highlighting), so that it would look something like this:

x <- rnorm(100, mean = 2, sd = 3) # Creating a vector
plot(x, xlab = "index", main = "Example code") # Plotting it

How then would you do it?

Plugin Installation

The easiest way to do this inside a self hosted WordPress blog is by installing a plugin called WP-Syntax:

WP-Syntax provides clean syntax highlighting using GeSHi -- supporting a wide range of popular languages (including R). It supports highlighting with or without line numbers and maintains formatting while copying snippets of code from the browser.

But there is a problem. The current WP-Syntax version is using an old version of GeSHi, and only the newer version (currently GeSHi version 1.0.8.6) includes support for R syntax. In order to solve this I patched the plugin and I encourage you to download (the fixed version of) WP-Syntax from here, which will allow you to highlight your R code.

Usage

After installing (and activating) the plugin, in order to add R code to your post you will need to:
1) Only work in HTML mode (not the Visual mode). Otherwise, the code you paste will be messed up.
2) Put your code between the <pre> tag, like this:

(Note: make sure that you retype the " characters yourself, so that it will work.)

<pre lang="rsplus" line="1">
...Your R code here...
</pre>

Final note: R Syntax highlight in other ways

If you wish to have R syntax highlighting inside an HTML file, I encourage you to have a look at the highlight package, by Romain Francois.

If you want to highlight your R syntax inside wordpress.com, here is a blog post by Erik Iverson showing how to do that using Emacs.

p.s: If you have a blog in which you write about R, please let me know about it in the comments (Or just join R-bloggers.com) - I'd love to follow you 🙂

Update: Stephen Turner wrote about a syntax highlighting solution for R and blogger using github gist. And also mentioned there another solution for self hosted wordpress blogs, via J.D. Long: a Github Gist plugin for WordPress. Go publish code 🙂