Update (2019-08-17): to see a good solution for this problem, please go to this link. The solution in the post is old and while it still works, it is better to use the newer methods from the link.
The problem: producing a Word (.docx) file of a statistical report created in R, with as little overhead as possible. The solution: combining R+knitr+rmarkdown+pander+pandoc (it is easier than it sounds).
If you get what this post is about, just jump to the “Solution: the workflow” section.
Preface: why is this (still) a problem
Before turning to the solution, let’s address two preliminary questions:
Q: Why is it important to be able to create a report in Word from R?
A: Because many of the researchers we work with are used to Word for editing their text, tracking changes, merging edits between different authors, and copy-pasting text/tables/images from various sources. This means that a report produced as a PDF file is less useful for collaborating with less-tech-savvy researchers (copying text or tables from a PDF is not fun). Even exchanging HTML files may appear somewhat awkward to fellow researchers.
stargazer is a new R package that creates LaTeX code for well-formatted regression tables, with multiple models side-by-side, as well as for summary statistics tables. It can also output the content of data frames directly into LaTeX. Compared to available alternatives, stargazer excels in three regards: its ease of use, the large number of models it supports, and its beautiful aesthetics.
Ease of use
stargazer was designed with the user’s comfort in mind. The learning curve is very mild and the arguments are intuitive, so that even a beginning user of R or LaTeX can quickly become familiar with the package’s many capabilities. The package tries to minimize the effort the user has to put into adjusting argument values. If stargazer is given a set of regression model objects, for instance, it will create a side-by-side regression table. By contrast, if the user feeds it a data frame, stargazer will assume that the user is most likely looking for a summary statistics table or, if the summary argument is set to FALSE, wants to output the content of the data frame.
A quick reproducible example shows just how easy stargazer is to use. You can install stargazer from CRAN in the usual way:
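As a minimal sketch of that installation and of the behaviour described above (the built-in attitude data set and the two model formulas are illustrative choices, not taken from the original example):

```r
# Install once from CRAN if needed, then load the package
if (!requireNamespace("stargazer", quietly = TRUE)) {
  install.packages("stargazer")
}
library(stargazer)

# Two regression models (illustrative formulas on the built-in attitude data)
m1 <- lm(rating ~ complaints + privileges, data = attitude)
m2 <- lm(rating ~ complaints + privileges + learning, data = attitude)

# Model objects in: a side-by-side regression table, printed as LaTeX code
stargazer(m1, m2)

# Data frame in: a summary statistics table
stargazer(attitude)

# Data frame with summary = FALSE: the content of the data frame itself
stargazer(head(attitude), summary = FALSE)
```

Each call prints LaTeX code to the console, which can then be pasted into a LaTeX document or produced from inside a Sweave/knitr chunk.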
This post shows how to print a prettier nested pivot table, created using the {reshape} package (similar to what you would get with Microsoft Excel), so you could print it either in the R terminal or as a LaTeX table. This task is done by bridging between the cast_df object produced by the {reshape} package, and the tabular function introduced by the new {tables} package.
Here is an example of the type of output we wish to produce in the R terminal:
The cast function is wonderful, but it has one problem: the format of its output. Unlike a pivot table in (for example) MS Excel, the output of a nested table created by cast is very “flat”. That is, there is only one header row and only one column of row names. So in both the R terminal and an Sweave document, a more complex reshaping/aggregation produces a result that is not something you would be proud to send to a journal.
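To see that flatness for yourself, here is a small sketch that assumes the {reshape} package is installed and uses the built-in airquality data (the object names aq, aqm and flat are just illustrative):

```r
library(reshape)  # the original {reshape} package, not reshape2

# Prepare the data: lower-case names, then melt to long format
aq <- airquality
names(aq) <- tolower(names(aq))
aqm <- melt(aq, id = c("month", "day"), na.rm = TRUE)

# Two aggregation functions per variable, one row per month
flat <- cast(aqm, month ~ variable, c(mean, sd))

# The variable and the statistic share a single header row
names(flat)
flat
```

Looking at names(flat) shows how the two header levels (variable and statistic) are squeezed into one row of column names.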
The opportunity: the {tables} package
The good news is that Duncan Murdoch has recently released a new package to CRAN called {tables}. The {tables} package can compute and display complex tables of summary statistics and turn them into nice-looking tables in Sweave (LaTeX) documents. To use the full power of this package, you are invited to read through its detailed (and well written) 23-page vignette. However, some of us might prefer to keep using the syntax of the {reshape} package while also benefiting from the great formatting offered by the new {tables} package. For this purpose, I devised a function that bridges between cast_df (from {reshape}) and the tabular function (from {tables}).
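For comparison, here is a rough sketch of what the native {tables} formula syntax looks like, assuming the package is installed. The choice of the Wind and Temp columns from the built-in airquality data is mine, for illustration only:

```r
library(tables)

# Rows: one per month; columns: mean and sd nested under each variable
tab <- tabular(
  Factor(Month) ~ (Wind + Temp) * (mean + sd),
  data = airquality
)

# Printing shows a two-level header in the R terminal
tab
```

The formula puts row factors on the left of the ~ and nests statistics under variables with * on the right, which is what gives the multi-row header that cast lacks.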
The bridge: between the {tables} and the {reshape} packages
The code for the function is available on my github (link: tabular.cast_df.r on github) and it seems to work fine as far as I can see (though I wouldn’t run it on larger data files, since it relies on melting a cast_df object).
Here is an example for how to load and use the function:
######################
# Loading the functions
######################
# Making sure we can source code from github
source("https://www.r-statistics.com/wp-content/uploads/2012/01/source_https.r.txt")
# Reading in the function for using tabular on a cast_df object:
source_https("https://raw.github.com/talgalili/R-code-snippets/master/tabular.cast_df.r")

######################
# example:
######################
# Loading and preparing some data
require(reshape)
names(airquality) <- tolower(names(airquality))
airquality2 <- airquality
airquality2$temp2 <- ifelse(airquality2$temp > median(airquality2$temp), "hot", "cold")
aqm <- melt(airquality2, id=c("month", "day","temp2"), na.rm=TRUE)
colnames(aqm)[4] <- "variable2" # because otherwise the function is having problem when relying on the melt function of the cast object
head(aqm,3)
#  month day temp2 variable2 value
#1     5   1  cold     ozone    41
#2     5   2  cold     ozone    36
#3     5   3  cold     ozone    12

#############
# Running the example:
tabular.cast_df(cast(aqm, month ~ variable2, c(mean,sd)))
tabular(cast(aqm, month ~ variable2, c(mean,sd))) # notice how we turned tabular to be an S3 method that can deal with a cast_df object
Hmisc::latex(tabular(cast(aqm, month ~ variable2, c(mean,sd)))) # this is what we would have used for an Sweave document
And here are the results in the terminal:
> tabular.cast_df(cast(aqm, month ~ variable2, c(mean,sd)))

       ozone         solar.r        wind           temp
month  mean   sd     mean   sd      mean    sd     mean   sd
5      23.62  22.22  181.3  115.08  11.623  3.531  65.55  6.855
6      29.44  18.21  190.2  92.88   10.267  3.769  79.10  6.599
7      59.12  31.64  216.5  80.57   8.942   3.036  83.90  4.316
8      59.96  39.68  171.9  76.83   8.794   3.226  83.97  6.585
9      31.45  24.14  167.4  79.12   10.180  3.461  76.90  8.356

> tabular(cast(aqm, month ~ variable2, c(mean,sd))) # notice how we turned tabular to be an S3 method that can deal with a cast_df object

       ozone         solar.r        wind           temp
month  mean   sd     mean   sd      mean    sd     mean   sd
5      23.62  22.22  181.3  115.08  11.623  3.531  65.55  6.855
6      29.44  18.21  190.2  92.88   10.267  3.769  79.10  6.599
7      59.12  31.64  216.5  80.57   8.942   3.036  83.90  4.316
8      59.96  39.68  171.9  76.83   8.794   3.226  83.97  6.585
9      31.45  24.14  167.4  79.12   10.180  3.461  76.90  8.356
And in an Sweave document:
Here is an example for the Rnw file that produces the above table: cast_df to tabular.Rnw
I will finish by saying that the tabular function offers more flexibility than the function I provided. If you find any bugs or have suggestions for improvement, you are invited to leave a comment here or inside the code on github.
(Link-tip goes to Tony Breyal for putting together a solution for sourcing R code from github.)