Installing Pandoc from R (on Windows) – using the {installr} package

The R blogger Rolf Fredheim recently wrote a great piece called “Reproducible research with R, Knitr, Pandoc and Word”, in which he advocates Pandoc as an essential part of a reproducible research workflow in R: it helps turn documents knitted in R into high-quality Word files that can be exchanged with colleagues. It is a great post, with many useful bits of code, and I wanted to supplement it with one missing function: “install.pandoc”.

Update: the install.pandoc function is now part of the {installr} package.
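
As a minimal sketch of how the function might be called: the wrapper below, install_pandoc_on_windows(), is a hypothetical helper (it is not part of {installr}) that only attempts the installation on Windows and otherwise does nothing. It assumes a working internet connection and a configured CRAN mirror.

```r
# Hypothetical helper, not part of {installr}: guard the call so it only runs on Windows.
install_pandoc_on_windows <- function() {
  if (.Platform$OS.type != "windows") {
    message("install.pandoc() targets Windows; skipping on this platform.")
    return(invisible(FALSE))
  }
  # Install {installr} itself if it is not available yet
  if (!requireNamespace("installr", quietly = TRUE)) {
    install.packages("installr")
  }
  # Download and run the Pandoc installer
  installr::install.pandoc()
  invisible(TRUE)
}
```

Calling install_pandoc_on_windows() from an interactive R session on Windows should then start the Pandoc installer.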

{stargazer} package for beautiful LaTeX tables from R statistical models output

stargazer is a new R package that creates LaTeX code for well-formatted regression tables, with multiple models side-by-side, as well as for summary statistics tables. It can also output the content of data frames directly into LaTeX. Compared to available alternatives, stargazer excels in three regards:  its ease of use, the large number of models it supports, and its beautiful aesthetics.

Ease of use

stargazer was designed with the user’s comfort in mind. The learning curve is very mild and all arguments are very intuitive, so that even a beginning user of R or LaTeX can quickly become familiar with the package’s many capabilities. The package is intelligent, and tries to minimize the amount of effort the user has to put into adjusting argument values. If stargazer is given a set of regression model objects, for instance, the package will create a side-by-side regression table. By contrast, if the user feeds it a data frame, stargazer will assume that the user is most likely looking for a summary statistics table or, if the summary argument is set to FALSE, wants to output the content of the data frame.

A quick reproducible example shows just how easy stargazer is to use. You can install stargazer from CRAN in the usual way:

install.packages("stargazer")
library(stargazer)
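
The behaviour described above can be sketched as follows. This is not the original post's example: the built-in mtcars data and the model formulas are illustrative assumptions, chosen only so that the snippet is self-contained.

```r
library(stargazer)

# Two nested linear models on the built-in mtcars data (an illustrative choice)
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp, data = mtcars)

# Model objects in: a side-by-side regression table, printed as LaTeX code
regression_tex <- stargazer(fit_small, fit_large, title = "Fuel economy models")

# A data frame in: a summary statistics table by default ...
summary_tex <- stargazer(mtcars[, c("mpg", "wt", "hp")])

# ... or the raw content of the data frame when summary = FALSE
contents_tex <- stargazer(head(mtcars[, c("mpg", "wt", "hp")]), summary = FALSE)
```

Each call prints the LaTeX code to the console and invisibly returns it as a character vector, so it can also be written to a file.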

Generation of E-Learning Exams in R for Moodle, OLAT, etc.

(Guest post by Achim Zeileis)
Development of the R package exams for automatic generation of (statistical) exams in R started in 2006 and version 1 was published in JSS by Grün and Zeileis (2009). It was based on standalone Sweave exercises that can be combined into exams and then rendered into different kinds of PDF output (exams, solutions, self-study materials, etc.). Now, a major revision of the package has been released that extends the capabilities and adds support for learning management systems. It is still based on the same type of Sweave files for each exercise but can also render them into output formats like HTML (with various options for displaying mathematical content) and XML specifications for online exams in learning management systems such as Moodle or OLAT. Supplementary files such as graphics or data are handled automatically. Here, I give a brief overview of the new capabilities. A detailed discussion is in the working paper by Zeileis, Umlauf, and Leisch (2012) that is also contained in the package as a vignette.

Comparing Shiny with gWidgetsWWW2.rapache

(A guest post by John Verzani)

A few days back the RStudio blog announced Shiny, a new product for easily creating interactive web applications (http://www.rstudio.com/shiny/). I wanted to compare this new framework to one I’ve worked on, gWidgetsWWW2.rapache – a version of the gWidgets API for use with Jeffrey Horner’s rapache module for the Apache web server (available at GitHub). The gWidgets API has a similar aim to make it easy for R users to create interactive applications.

I don’t want to worry here about deployment of apps, just the writing side. The shiny package uses websockets to transfer data back and forth from browser to server. Though this may cause issues with wider deployment, the industrious RStudio folks have a hosting program in beta for internet-wide deployment. For local deployment, no problems as far as I know – as long as you avoid older versions of internet explorer.

Now, Shiny seems well suited for applications where the user can parameterize a resulting graphic, so that was the point of comparison. Peter Dalgaard’s tcltk package ships with a classic demo tkdensity.R. I use that for inspiration below. That GUI allows the user a few selections to modify a density plot of a random sample.
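
For orientation, here is a rough sketch of what such a density GUI could look like in Shiny. It is not the code from the comparison itself: the input names (n, kernel, adjust) and the layout are assumptions made for illustration, and it uses the layout functions of current Shiny releases, which may differ from the version that was first announced.

```r
library(shiny)

# UI: a few controls on the left, the density plot on the right
ui <- fluidPage(
  titlePanel("Density of a random sample"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("n", "Sample size", min = 10, max = 500, value = 100),
      selectInput("kernel", "Kernel",
                  choices = c("gaussian", "epanechnikov", "rectangular")),
      sliderInput("adjust", "Bandwidth adjustment",
                  min = 0.2, max = 3, value = 1, step = 0.1)
    ),
    mainPanel(plotOutput("density_plot"))
  )
)

# Server: redraw the sample only when n changes, the plot when any input changes
server <- function(input, output) {
  sample_data <- reactive(rnorm(input$n))
  output$density_plot <- renderPlot({
    plot(density(sample_data(), kernel = input$kernel, adjust = input$adjust),
         main = "Kernel density estimate")
    rug(sample_data())
  })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive session, start the app with: runApp(app)
```

Keeping the sample in its own reactive means that changing the kernel or bandwidth re-plots the same sample rather than drawing a new one.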

Speed up your R code using a just-in-time (JIT) compiler

This post is about speeding up your R code using the JIT (just-in-time) compilation capabilities offered by the new (well, now a year old) {compiler} package. Specifically, it deals with the practical difference between the enableJIT and cmpfun functions.

If you do not want to read much, you can just skip to the example part.

As always, I welcome any comments on this post, and hope to update it as future JIT solutions come along.
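
To make the difference between the two functions concrete, here is a small sketch (not the example from the full post). The function sum_of_squares() is a made-up toy; cmpfun() compiles one function explicitly, while enableJIT() changes the JIT level for the whole session and returns the previous level. Note that since R 3.4.0 the JIT is enabled by default, so on a recent R the timings may barely differ.

```r
library(compiler)

# A deliberately loop-heavy toy function (illustrative only)
sum_of_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# cmpfun(): byte-compile a single function explicitly
sum_of_squares_cmp <- cmpfun(sum_of_squares)

# enableJIT(): set the JIT level for the whole session;
# the previous level is returned so it can be restored afterwards
previous_level <- enableJIT(3)
result_jit <- sum_of_squares(1000)
enableJIT(previous_level)

# Compare timings of the plain and the explicitly compiled version
system.time(sum_of_squares(1e6))
system.time(sum_of_squares_cmp(1e6))
```

In short, cmpfun() is the targeted tool for one known hot function, whereas enableJIT() is a global switch that compiles closures as they are used.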

Do more with dates and times in R with lubridate 1.1.0

This is a guest post by Garrett Grolemund (mentored by Hadley Wickham)

Lubridate is an R package that makes it easier to work with dates and times. The newest release of lubridate (v 1.1.0) comes with even more tools and some significant changes over past versions. Below is a concise tour of some of the things lubridate can do for you. At the end of this post, I list some of the differences between lubridate (v 0.2.4) and lubridate (v 1.1.0). If you are an old hand at lubridate, please read this section to avoid surprises!

Lubridate was created by Garrett Grolemund and Hadley Wickham.

Parsing dates and times

Getting R to agree that your data contains the dates and times you think it does can be a bit tricky. Lubridate simplifies that. Identify the order in which the year, month, and day appear in your dates. Now arrange “y”, “m”, and “d” in the same order. This is the name of the function in lubridate that will parse your dates. For example,

library(lubridate)
ymd("20110604"); mdy("06-04-2011"); dmy("04/06/2011")
## "2011-06-04 UTC"
## "2011-06-04 UTC"
## "2011-06-04 UTC"

Parsing functions automatically handle a wide variety of formats and separators, which simplifies the parsing process.
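
As a small illustration of that flexibility, the sketch below feeds one parsing function the same date written with several different separators. The input strings are made up for this example, and the exact class and printed form of the result depend on the lubridate version installed.

```r
library(lubridate)

# The same day, written with different separators (illustrative inputs)
same_day <- c("20110604", "2011-06-04", "2011/06/04", "2011.06.04")

# One call parses all of them; only the y-m-d order has to be known
parsed <- ymd(same_day)

# All four strings should resolve to a single date
unique(parsed)
```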

If your date includes time information, add h, m, and/or s to the name of the function. ymd_hms() is probably the most common date time format. To read the dates in with a certain time zone, supply the official name of that time zone in the tz argument.

arrive <- ymd_hms("2011-06-04 12:00:00", tz = "Pacific/Auckland")
## "2011-06-04 12:00:00 NZST"
leave <- ymd_hms("2011-08-10 14:00:00", tz = "Pacific/Auckland")
## "2011-08-10 14:00:00 NZST"

Setting and Extracting information

Extract information from date times with the functions second(), minute(), hour(), day(), wday(), yday(), week(), month(), year(), and tz(). You can also use each of these to set (i.e., change) the given information. Notice that this will alter the date time. wday() and month() have an optional label argument, which replaces their numeric output with the name of the weekday or month.

second(arrive)
## 0
second(arrive) <- 25
arrive
## "2011-06-04 12:00:25 NZST"
second(arrive) <- 0
wday(arrive)
## 7
wday(arrive, label = TRUE)
## Sat

Time Zones

There are two very useful things to do with dates and time zones. First, display the same moment in a different time zone. Second, create a new moment by combining a given clock time with a new time zone. These are accomplished by with_tz() and force_tz().

For example, I spent last summer researching in Auckland, New Zealand. I arranged to meet with my advisor, Hadley, over Skype at 9:00 in the morning Auckland time. What time was that for Hadley, who was back in Houston, TX?

meeting <- ymd_hms("2011-07-01 09:00:00", tz = "Pacific/Auckland")
## "2011-07-01 09:00:00 NZST"
with_tz(meeting, "America/Chicago")
## "2011-06-30 16:00:00 CDT"

So the meeting occurred at 4:00 pm Hadley’s time (and the day before, no less). Of course, this was the same actual moment of time as 9:00 am in New Zealand. It just appears to be a different day due to the curvature of the Earth.

What if Hadley made a mistake and signed on at 9:00 his time? What time would it then be my time?

mistake <- force_tz(meeting, "America/Chicago")
## "2011-07-01 09:00:00 CDT"
with_tz(mistake, "Pacific/Auckland")
## "2011-07-02 02:00:00 NZST"

His call would arrive at 2:00 am my time! Luckily he never did that.

Printing nested tables in R – bridging between the {reshape} and {tables} packages

This post shows how to print a prettier nested pivot table, created using the {reshape} package (similar to what you would get with Microsoft Excel), so you could print it either in the R terminal or as a LaTeX table. This task is done by bridging between the cast_df object produced by the {reshape} package, and the tabular function introduced by the new {tables} package.

Here is an example of the type of output we wish to produce in the R terminal:

       ozone       solar.r        wind         temp
 month mean  sd    mean    sd     mean   sd    mean  sd
 5     23.62 22.22 181.3   115.08 11.623 3.531 65.55 6.855
 6     29.44 18.21 190.2    92.88 10.267 3.769 79.10 6.599
 7     59.12 31.64 216.5    80.57  8.942 3.036 83.90 4.316
 8     59.96 39.68 171.9    76.83  8.794 3.226 83.97 6.585
 9     31.45 24.14 167.4    79.12 10.180 3.461 76.90 8.356

Or in a LaTeX document:

Motivation: creating pretty nested tables

In a recent post we learned how to use the {reshape} package (by Hadley Wickham) in order to aggregate and reshape data (in R) using the melt and cast functions.

The cast function is wonderful but it has one problem – the format of the output. As opposed to a pivot table in (for example) MS Excel, the output of a nested table created by cast is very “flat”. That is, there is only one row for the header, and only one column for the row names. So in both the R terminal and a Sweave document, when we deal with more complex reshaping/aggregating, the result is not something you would be proud to send to a journal.
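
To see what “flat” means here, the following sketch melts the built-in airquality data and casts it with two aggregation functions. The object names aq, aq_long and flat are made up for this illustration; the point is that the variable and the statistic end up fused into a single header row.

```r
library(reshape)

# Prepare the built-in airquality data
aq <- airquality
names(aq) <- tolower(names(aq))

# Long format: one row per (month, day, variable)
aq_long <- melt(aq, id = c("month", "day"), na.rm = TRUE)

# Aggregate each variable by month with two functions
flat <- cast(aq_long, month ~ variable, c(mean, sd))

# A single header row: each column name combines a variable and a statistic
names(flat)
flat
```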

The opportunity: the {tables} package

The good news is that Duncan Murdoch has recently released a new package to CRAN called {tables}. The {tables} package can compute and display complex tables of summary statistics and turn them into nice-looking tables in Sweave (LaTeX) documents. To use the full power of this package, you are invited to read through its detailed (and well-written) 23-page vignette. However, some of us might prefer to keep using the syntax of the {reshape} package, while also benefiting from the great formatting that is offered by the new {tables} package. For this purpose, I devised a function that bridges between cast_df (from {reshape}) and the tabular function (from {tables}).
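
For readers who have not seen the {tables} syntax, here is a short sketch of a direct call to tabular, in the style of the package's own documentation. It uses the built-in iris data rather than the airquality data of this post, and it is independent of the bridging function.

```r
library(tables)

# Rows: each species plus an overall row (the "+ 1").
# Columns: a count, then mean and sd nested under each measured variable.
iris_table <- tabular(
  (Species + 1) ~ (n = 1) + Format(digits = 2) *
    (Sepal.Length + Sepal.Width) * (mean + sd),
  data = iris
)

# Printing shows the nested header in the R terminal
iris_table
```

The formula puts row factors on the left-hand side and columns on the right, with `*` nesting one term inside another and `+` placing terms side by side.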

The bridge: between the {tables} and the {reshape} packages

The code for the function is available on my GitHub (link: tabular.cast_df.r on github) and it seems to work fine as far as I can see (though I wouldn’t run it on larger data files, since it relies on melting a cast_df object).

Here is an example of how to load and use the function:

######################
# Loading the functions
######################
# Making sure we can source code from github
source("http://www.r-statistics.com/wp-content/uploads/2012/01/source_https.r.txt")
 
# Reading in the function for using tabular on a cast_df object:
source_https("https://raw.github.com/talgalili/R-code-snippets/master/tabular.cast_df.r")
 
 
 
######################
# example:
######################
 
############
# Loading and preparing some data
require(reshape)
names(airquality) <- tolower(names(airquality))
airquality2 <- airquality
airquality2$temp2 <- ifelse(airquality2$temp > median(airquality2$temp), "hot", "cold")
aqm <- melt(airquality2, id=c("month", "day","temp2"), na.rm=TRUE)
colnames(aqm)[4] <- "variable2"  # renamed because the function otherwise has problems when it melts the cast object
head(aqm,3)
#  month day temp2 variable2 value
#1     5   1  cold     ozone    41
#2     5   2  cold     ozone    36
#3     5   3  cold     ozone    12
 
############
# Running the example:
tabular.cast_df(cast(aqm, month ~ variable2, c(mean,sd)))
tabular(cast(aqm, month ~ variable2, c(mean,sd))) # notice how we turned tabular to be an S3 method that can deal with a cast_df object
Hmisc::latex(tabular(cast(aqm, month ~ variable2, c(mean,sd)))) # this is what we would have used for an Sweave document

And here are the results in the terminal:

>
> tabular.cast_df(cast(aqm, month ~ variable2, c(mean,sd)))
 
       ozone       solar.r        wind         temp
 month mean  sd    mean    sd     mean   sd    mean  sd
 5     23.62 22.22 181.3   115.08 11.623 3.531 65.55 6.855
 6     29.44 18.21 190.2    92.88 10.267 3.769 79.10 6.599
 7     59.12 31.64 216.5    80.57  8.942 3.036 83.90 4.316
 8     59.96 39.68 171.9    76.83  8.794 3.226 83.97 6.585
 9     31.45 24.14 167.4    79.12 10.180 3.461 76.90 8.356
> tabular(cast(aqm, month ~ variable2, c(mean,sd))) # notice how we turned tabular to be an S3 method that can deal with a cast_df object
 
       ozone       solar.r        wind         temp
 month mean  sd    mean    sd     mean   sd    mean  sd
 5     23.62 22.22 181.3   115.08 11.623 3.531 65.55 6.855
 6     29.44 18.21 190.2    92.88 10.267 3.769 79.10 6.599
 7     59.12 31.64 216.5    80.57  8.942 3.036 83.90 4.316
 8     59.96 39.68 171.9    76.83  8.794 3.226 83.97 6.585
 9     31.45 24.14 167.4    79.12 10.180 3.461 76.90 8.356

And in a Sweave document:

Here is an example of the Rnw file that produces the above table:
cast_df to tabular.Rnw

I will finish by saying that the tabular function offers more flexibility than the function I provided. If you find any bugs or have suggestions for improvement, you are invited to leave a comment here or inside the code on GitHub.

(Link-tip goes to Tony Breyal for putting together a solution for sourcing R code from GitHub.)