Exporting R output to MS-Word with R2wd (an example session)

UPDATE (2014-11-02): please note that this post is from 2010. These days, it is much simpler to create docx files from R using knitr+pandoc. Using pander (links: [1], [2]) can also help make the markdown output look nicer in the file.

Creating reports is one of the basic tasks in data analysis. R provides numerous functions and packages to export it’s (beautiful) output and help compile it into a report.

In this post I will present one such (basic) solution for Windows OS users for exporting R output into Microsoft Word using the R2wd (package). There are more ways and strategies for doing this, and if encouraged by comments, I will gladly write more on the subject.
* * *

R to Word using {R2wd}

The package R2wd (available through CRAN) relies on rcom. It is a wrapper that uses the statconnDCOM server to communicate with MS-Word via the COM interface.

R2wd can perform the basic tasks you would expect to need when creating a report from R. It allows you to:

  • Create a new Word file
  • Create headers and sub-headers
  • Move to a new pages in the document
  • Write text
  • Insert tables (that is “data.frame” and “matrix”objects)
  • Insert plots
  • Save and close the Word document
  • …(and more)

The current R2wd can still be seen as being in BETA stages.  Some features are not yet available, such as:

  • Choosing text font (which means most of us will need to manually change the font in the document to “couriers new…”, in order for the formatting to look good)
  • Inserting of complex object outputs (such as summery.lm, although in the example bellow I show how that can be achieved using a simple function)
  • Speed – the speed of inserting a table is somewhat slow, I am not sure how it would scale to large documents

But from a (pleasant) correspondence with the package developer, I was assured the next release will supply us with more options and features.

R2wd package developer, Christan Ritter, invites feedback from users.  So if you have features you are missing in this packages, I believe he would like to know about it (you can e-mail Christan at:     christian.ritter <-at-> ridaco <-dot-> be  )

Getting R2wd 1.3

The current version of R2wd is 1.1 and Christan Ritter (the package developer), says it is a “first idea” and that a more elaborate version will soon (e.g: around July) be available on CRAN.   In the meantime, Christan was so kind as to send me a more recent version of the package, which you (until it gets uploaded to CRAN), you are welcome to download from here:
R2wd 1.3 download link

How to use R2wd to create a report – a sample session

Being young doesn’t prevent from R2wd to do some nice things.

Here is the text from the library(help=R2wd) :

If Word is not already running, wdGet() opens a new Word document, otherwise, it establishes a COM handle to the instance which is already running. The functions wdTitle, wdHeader, wdBody, and wdParagraph can be used to inject text elements into Word. Moreover, bookmarks can be added via wdInsertBookmarks and wdGoToBookmark allows to navigate among the bookmarks which also exist. There is another set of convenience functions, wdSection, wdSubsection, and wdSubsubsection which insert headers of level 1, 2, or 3, start new ’Sections’ in Word, and add bookmarks.
Graphs and dataframes can be inserted intoWord, by the wdPlot, wdTable commands. The wdTable command takes a dataframe or an array as arguments, creates a Word table of the appropriate dimensions and injects the content of the dataframe or array into it. It then formats the table in Word using elementary formating elements.
The functions wdApplyTheme and wdApplyTemplate allow to work with themes and templates.

Here is an example sessions to demonstrate some of what is said:

# install.packages("R2wd")
# library(help=R2wd)
require(R2wd)


wdGet(T)	# If no word file is open, it will start a new one - can set if to have the file visiable or not
wdNewDoc("c:\This.doc")	# this creates a new file with "this.doc" name

wdApplyTemplate("c:\This.dot")	# this applies a template


wdTitle("Examples of R2wd (a package to write Word documents from R)")	# adds a title to the file

wdSection("Example 1 - adding text", newpage = T) # This can also create a header

wdHeading(level = 2, "Header 2")
wdBody("This is the first example we will show")
wdBody("(Notice how, by using two different lines in wdBody, we got two different paragraphs)")
wdBody("(Notice how I can use this: ' n' (without the space), to  n  go to the next
		line)")
wdBody("האם זה עובד בעברית ?")
wdBody("It doesn't work with Hebrew...")
wdBody("O.k, let's move to the next page (and the next example)")

wdSection("Example 2 - adding tables", newpage = T)
wdBody("Table using 'format'")
wdTable(format(head(mtcars)))
wdBody("Table without using 'format'")
wdTable(head(mtcars))


wdSection("Example 3 - adding lm summary", newpage = T)

## Example from  ?lm
ctl

Update:
Upon reading my post, Chris suggested that I’ll also add a note here about SWORD, a tool written by Thomas Baier (the creator of the StatconnDCOM server) which allows to include R-code in a Sweave-like fashion in Word documents. Here is a link to the project: http://rcom.univie.ac.at

The new GUI for ggplot2 (using Deducer) – the designer wants your opinion

After discovering that R is expected (this summer) to have a GUI for ggplot2 (through deducer), I later found Ian’s gsoc proposal for this GUI.  Since the system is in it’s early stages of development, Ian has invited people to give comments, input and critique on his plans for the project.

For your convenience (and with Ian’s permission), I am reposting his proposal here. You are welcome to send him feedback by e-mailing him (at: [email protected]), or by leaving a comment here (and I will direct him to your comment).

Continue reading “The new GUI for ggplot2 (using Deducer) – the designer wants your opinion”

R is going to have a GUI to ggplot2! (by the end of this years google-summer-of-code)

I was delighted to see the following e-mail post from Dirk Eddelbuettel regarding the google-summer-of-code R google group:
* * *

Earlier today Google finalised student / mentor pairings and allocations for
the Google Summer of Code 2010 (GSoC 2010). The R Project is happy to
announce that the following students have been accepted:

Colin Rundel, “rgeos – an R wrapper for GEOS”, mentored by Roger Bivand of
the Norges Handelshoyskole, Norway

Ian Fellows, “A GUI for Graphics using ggplot2 and Deducer”, mentored by
Hadley Wickham of Rice University, USA

Chidambaram Annamalai, “rdx – Automatic Differentiation in R”, mentored by
John Nash of University of Ottawa, Canada

Yasuhisa Yoshida, “NoSQL interface for R”, mentored by Dirk Eddelbuettel,
Chicago, USA

Felix Schoenbrodt, “Social Relations Analyses in R”, mentored by Stefan
Schmukle, Universitaet Muenster, Germany

Details about all proposals are on the R Wiki page for the GSoC 2010 at
http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010

The R Project is honoured to have received its highest number of student
allocations yet, and looks forward to an exciting Summer of Code. Please
join me in welcoming our new students.

At this time, I would also like to thank all the other students who have
applied for working with R in this Summer of Code. With a limited number of
available slots, not all proposals can be accepted — but I hope that those
not lucky enough to have been granted a slot will continue to work with R and
towards making contributions within the R world.

I would also like to express my thanks to all other mentors who provided for
a record number of proposals. Without mentors and their project ideas we
would not have a Summer of Code — so hopefully we will see you again next
year.

Regards,

Dirk (acting as R/GSoC 2010 admin)

* * *

From all the projects, the one I am most excited about is:
Ian Fellows, “A GUI for Graphics using ggplot2 and Deducer”, mentored by Hadley Wickham of Rice University, USA

Deducer (text from the website) attempts to be a free easy to use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an excel-like spreadsheet in which to view and edit data frames. The goal of the project is to two-fold.

  • Provide an intuitive interface so that non-technical users can learn and perform analyses without programming getting in their way.
  • Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming.

Deducer is designed to be used with the Java based R console JGR, though it supports a number of other R environments (e.g. Windows RGUI and RTerm).

This combination (of Deducer and ggplot2) might finally provide the bridge to the layman-statistician that some people recently wrote to be one of R’s weak spots (while other bloogers wrote back that this is o.k., still no one refuted that R doesn’t compete with the point-and-click of softwares like SPSS or JMP.)
I came across Ian in the discussion forums, where he provided very kind help to his package “deducer”. Coupled with having Hadley as his mentor, I am very optimistic about the prospects of seeing this project reaching very high standards.
Very exciting development indeed!

Update: Ian’s proposal is available to view here.

p.s: for some intuition about how a GUI for ggplot2 can look like, have a look at this video of Jeroen Ooms’s ggplot2 web interface

How to upgrade R on windows XP – another strategy (and the R code to do it)

Update: This post has a follow-up for how to upgrade R on windows 7 explaining how to deal with permission issues.

Background – how I heard that there is more then one way to upgrade R

If you didn’t hear it by now – R 2.11.0 is out with a bunch of new features.

After Andrew Gelman recently lamented the lack of an easy upgrade process for R, a Stackoverflow thread (by JD Long) invited R users to share their strategies for easily upgrading R.

Upgrading strategy – moving to a global R library

In that thread, Dirk Eddelbuettel suggested another idea for upgrading R. His idea is of using a folder for R’s packages which is outside the standard directory tree of the installation (a different strategy then the one offered on the R FAQ).

The idea of this upgrading strategy is to save us steps in upgrading. So when you wish to upgrade R, instead of doing the following three steps:

  • download new R and install
  • copy the “library” content from the old R to the new R
  • upgrade all of the packages (in the library folder) to the new version of R.

You could instead just have steps 1 and 3, and skip step 2 (thus, saving us time…).

For example, under windows XP, you might have R installed on:
C:Program FilesRR-2.11.0
But (in this alternative model for upgrading) you will have your packages library on a “global library folder” (global in the sense of independent of a specific R version):
C:Program FilesRlibrary

So in order to use this strategy, you will need to do the following steps (all of them are performed in an R code provided later in the post)-

  1. In the OLD R installation (in the first time you move to the new system of managing the upgrade):
    1. Create a new global library folder (if it doesn’t exist)
    2. Copy to the new “global library folder” all of your packages from the old R installation
    3. After you move to this system – the steps 1 and 2 would not need to be repeated. (hence the advantage)
  2. In the NEW R installation:
    1. Create a new global library folder (if it doesn’t exist – in case this is your first R installation)
    2. Premenantly point to the Global library folder whenever R starts
    3. (Optional) Delete from the “Global library folder” all the packages that already exist in the local library folder of the new R install (no need to have doubles)
    4. Update all packages. (notice that you picked a mirror where the packages are up-to-date, you sometimes need to choose another mirror)

Thanks to help from Dirk, David Winsemius and Uwe Ligges, I was able to write the following R code to perform all the tasks I described 🙂

So first you will need to run the following code:
Continue reading “How to upgrade R on windows XP – another strategy (and the R code to do it)”

The difference between "letters[c(1,NA)]" and "letters[c(NA,NA)]"

In David Smith’s latest blog post (which, in a sense, is a continued response to the latest public attack on R), there was a comment by Barry that caught my eye. Barry wrote:

Even I get caught out on R quirks after 20 years of using it. Compare letters[c(12,NA)] and letters[c(NA,NA)] for the most recent thing that made me bang my head against the wall.

So I did, and here’s the output:

> letters[c(12,NA)]
[1] "l" NA
>  letters[c(NA,NA)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>

Interesting isn’t it?
I had no clue why this had happened but luckily for us, Barry gave a follow-up reply with an explanation. And here is what he wrote:
Continue reading “The difference between "letters[c(1,NA)]" and "letters[c(NA,NA)]"”

Parallel Multicore Processing with R (on Windows)

Parallel Processing backend for R under windows – installation tips and some examples.

This post offers simple example and installation tips for “doSMP” the new Parallel Processing backend package for R under windows.
* * *

Update:
The required packages are not yet now available on CRAN, but until they will get online, you can download them from here:
REvolution foreach windows bundle
(Simply unzip the folders inside your R library folder)

* * *

Recently, REvolution blog announced the release of “doSMP”, an R package which offers support for symmetric multicore processing (SMP) on Windows.
This means you can now speed up loops in R code running iterations in parallel on a multi-core or multi-processor machine, thus offering windows users what was until recently available for only Linux/Mac users through the doMC package.

Installation

For now, doSMP is not available on CRAN, so in order to get it you will need to download the REvolution R distribution “R Community 3.2” (they will ask you to supply your e-mail, but I trust REvolution won’t do anything too bad with it…)
If you already have R installed, and want to keep using it (and not the REvolution distribution, as was the case with me), you can navigate to the library folder inside the REvolution distribution it, and copy all the folders (package folders) from there to the library folder in your own R installation.

If you are using R 2.11.0, you will also need to download (and install) the revoIPC package from here:
revoIPC package – download link (required for running doSMP on windows)
(Thanks to Tao Shi for making this available!)

Usage

Once you got the folders in place, you can then load the packages and do something like this:

require(doSMP)
workers <- startWorkers(2) # My computer has 2 cores
registerDoSMP(workers)

# create a function to run in each itteration of the loop
check <-function(n) {
	for(i in 1:1000)
	{
		sme <- matrix(rnorm(100), 10,10)
		solve(sme)
	}
}


times <- 10	# times to run the loop

# comparing the running time for each loop
system.time(x <- foreach(j=1:times ) %dopar% check(j))  #  2.56 seconds  (notice that the first run would be slower, because of R's lazy loading)
system.time(for(j in 1:times ) x <- check(j))  #  4.82 seconds

# stop workers
stopWorkers(workers)

Points to notice:

  • You will only benefit from the parallelism if the body of the loop is performing time-consuming operations. Otherwise, R serial loops will be faster
  • Notice that on the first run, the foreach loop could be slow because of R's lazy loading of functions.
  • I am using startWorkers(2) because my computer has two cores, if your computer has more (for example 4) use more.
  • Lastly - if you want more examples on usage, look at the "ParallelR Lite User's Guide", included with REvolution R Community 3.2 installation in the "doc" folder

Updates

(15.5.10) :
The new R version (2.11.0) doesn't work with doSMP, and will return you with the following error:

Loading required package: revoIPC
Error: package 'revoIPC' was built for i386-pc-intel32


So far, a solution is not found, except using REvolution R distribution, or using R 2.10
A thread on the subject was started recently to report the problem. Updates will be given in case someone would come up with better solutions.

Thanks to Tao Shi, there is now a solution to the problem. You'll need to download the revoIPC package from here:
revoIPC package - download link (required for running doSMP on windows)
Install the package on your R distribution, and follow all of the other steps detailed earlier in this post. It will now work fine on R 2.11.0


Update 2: Notice that I added, in the beginning of the post, a download link to all the packages required for running parallel foreach with R 2.11.0 on windows. (That is until they will be uploaded to CRAN)

Update 3 (04.03.2011): doSMP is now officially on CRAN!

An article attacking R gets responses from the R blogosphere – some reflections

In this post I reflect on the current state of the R blogosphere, and share my hopes for the future

In this post I reflect on the current state of the R blogosphere, and share my hopes for it’s future.

* * *

Background

I am very grateful to Dr. AnnMaria De Mars for writing her post “The Next Big Thing”.
In her post, Dr. De Mars attacked R by accusing it of being “an epic fail” (in being user-friendly) and “NOT the next big thing”. Of course one should look at Dr. De Mars claims in their context. She is talking about particular aspects in which R fails (the lacking of a mature GUI for non-statisticians), and had her own (very legitimate) take on where to look for “the next big thing”. All in all, her post was decent, and worth contemplating upon respectfully (even if one, me for example, doesn’t agree with all of Dr. De Mars claims.)

R bloggers are becoming a community

But Dr. De Mars post is (very) important for a different reason. Not because her claims are true or false, but because her writing angered people who love and care for R (whether legitimately or not, it doesn’t matter). Anger, being a very powerful emotion, can reveal interesting things. In our case, it just showed that R bloggers are connected to each other.

So far there are 69 R bloggers who wrote in reply to Dr. De Mars post (some more kind then others), they are:

  • R and the Next Big Thing by David Smith
  • This is good news, since it shows that R has a community of people (not “just people”) who write about it.
    In one of the posts, someone commented about how R current stage reminds him of how linux was in 1998, and how he believes R will grow to be amazingly dominant in the next 10 years.
    In the same way, I feel the R blogosphere is just now starting to “wake up” and become aware that it exists. Already 6 bloggers found they can write not just about R code, but also reply to does who “attack” R (in their view). Imagine how the R blogosphere might look in a few years from now…

    I would like to end with a more general note about the importance of R bloggers collaboration to the R ecosystem.

    Continue reading “An article attacking R gets responses from the R blogosphere – some reflections”

    "The next big thing", R, and Statistics in the cloud

    A friend just e-mailed me about a blog post by Dr. AnnMaria De Mars titled “The Next Big Thing”.

    In it Dr. De Mars wrote (I allowed myself to emphasize some parts of the text):

    Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. […]
    for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

    Here are my two cents on the subject:
    Continue reading “"The next big thing", R, and Statistics in the cloud”

    Repeated measures ANOVA with R (functions and tutorials)

    Repeated measures ANOVA is a common task for the data analyst.

    There are (at least) two ways of performing “repeated measures ANOVA” using R but none is really trivial, and each way has it’s own complication/pitfalls (explanation/solution to which I was usually able to find through searching in the R-help mailing list).

    So for future reference, I am starting this page to document links I find to tutorials, explanations (and troubleshooting) of “repeated measure ANOVA” done with R

    Functions and packages

    (I suggest using the tutorials supplied bellow for how to use these functions)

    • aov {stats} – offers SS type I repeated measures anova, by a call to lm for each stratum. A short example is given in the ?aov help file
    • Anova {car} – Calculates type-II or type-III analysis-of-variance tables for model objects produced by lm, and for various other object. The ?Anova help file offers an example for how to use this for repeated measures
    • ezANOVA {ez} – This function provides easy analysis of data from factorial experiments, including purely within-Ss designs (a.k.a. “repeated measures”), purely between-Ss designs, and mixed within-and-between-Ss designs, yielding ANOVA results and assumption checks. It is a wrapper of the Anova {car} function, and is easier to use. The ez package also offers the functions ezPlot and ezStats to give plot and statistics of the ANOVA analysis. The ?ezANOVA help file gives a good demonstration for the functions use (My thanks goes to Matthew Finkbe for letting me know about this cool package)
    • friedman.test {stats} – Performs a Friedman rank sum test with unreplicated blocked data. That is, a non-parametric one-way repeated measures anova. I also wrote a wrapper function to perform and plot a post-hoc analysis on the friedman test results
    • Non parametric multi way repeated measures anova – I believe such a function could be developed based on the Proportional Odds Model, maybe using the {repolr} or the {ordinal} packages. But I still didn’t come across any function that implements these models (if you do – please let me know in the comments).
    • Repeated measures, non-parametric, multivariate analysis of variance – as far as I know, such a method is not currently available in R.  There is, however, the Analysis of similarities (ANOSIM) analysis which provides a way to test statistically whether there is a significantdifference between two or more groups of sampling units.  Is is available in the {vegan} package through the “anosim” function.  There is also a tutorial and a relevant published paper.

    Good Tutorials

    Troubelshooting

    Unbalanced design
    Unbalanced design doesn’t work when doing repeated measures ANOVA with aov, it just doesn’t. This situation occurs if there are missing values in the data or that the data is not from a fully balanced design. The way this will show up in your output is that you will see the between subject section showing withing subject variables.

    A solution for this might be to use the Anova function from library car with parameter type=”III”. But before doing that, first make sure you understand the difference between SS type I, II and III. Here is a good tutorial for helping you out with that.
    By the way, these links are also useful in case you want to do a simple two way ANOVA for unbalanced design

    I will “later” add R-help mailing list discussions that I found helpful on the subject.

    If you come across good resources, please let me know about them in the comments.

    Jeroen Ooms's ggplot2 web interface – a new version released (V0.2)

    Good news.

    Jeroen Ooms released a new version of his (amazing) online ggplot2 web interface:

    yeroon.net/ggplot2 is a web interface for Hadley Wickham’s R package ggplot2. It is used as a tool for rapid prototyping, exploratory graphical analysis and education of statistics and R. The interface is written completely in javascript, therefore there is no need to install anything on the client side: a standard browser will do.

    The new version has a lot of cool new features, like advanced data import, integration with Google docs, converting variables from numeric to factor to dates and vice versa, and a lot of new geom’s. Some of which you can watch in his new video demo of the application:

    The application is on:
    http://www.yeroon.net/ggplot2/

    p.s: other posts about this (including videos explaining how some of this was done) can be views on the category page: R and the web