Amazon Web Services (AWS) includes many different computational tools, ranging from storage systems and virtual servers to databases and analytical tools. For us R programmers, being familiar and experienced with these tools can be extremely beneficial in terms of efficiency, style, saving money and more.
In this post we present a step-by-step screenshot tutorial that will introduce you to the Amazon EC2 service. We will set up an EC2 instance (an Amazon virtual server), install RStudio Server on it and use our beloved RStudio via the browser (all for free!). The slides below also include a basic introduction to Linux commands, instructions for connecting to a remote server via ssh and more. No previous knowledge is required.
Set up an AWS account (do not worry about the credit card details, you will not be charged for any of our actions) – the steps are presented in the slides below.
Windows users: download MobaXterm (or any other ssh client software). Mac users: make sure you are familiar with the terminal (because I’m not).
# installing/loading the package:
if(!require(installr)) { install.packages("installr"); require(installr)} #load / install+load installr
# to_checkMD5sums = FALSE is used because of a slight bug in the MD5 file on R 3.0.2.
# This issue is already resolved in the installr version on GitHub, and will be released to CRAN in about a month from now.
updateR(to_checkMD5sums = FALSE)
I try to keep the installr package updated and useful. If you have any suggestions or remarks on the package, you’re invited to leave a comment below.
If you use the global library system (as I do), you can run the following in the new version of R:
What are the top 100 (most downloaded) R packages in 2013? Thanks to RStudio’s recent release of the log files from their “0-cloud” CRAN mirror (which do not include downloads from the primary CRAN mirror or any of the 88 other CRAN mirrors), we can now answer this question (at least for the months of January through May)!
By relying on the nice code that Felix Schonbrodt recently wrote for tracking package downloads, I have updated my installr R package with functions that enable the user to easily download and visualize the popularity of R packages over time. In this post I will share some nice plots and quick insights that can be drawn from this great data. The code for this analysis is given at the end of this post.
Top 8 most downloaded R packages – downloads over time
Let’s first have a look at the number of downloads per day for these 5 months, of the top 8 most downloaded packages (click the image for a larger version):
We can see the strong weekly seasonality of the downloads, with Saturday and Sunday having far fewer downloads than other days. This is not surprising, since we know that the countries which use R the most have these days as rest days (see James Cheshire’s world map of R users). It is also interesting to note how some packages had exceptional peaks on some dates. For example, I wonder what happened on January 23rd 2013 that made the digest package suddenly get so many downloads, or why colorspace started getting more downloads from April 15th 2013.
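The weekly pattern described above can be checked directly from the logs by tabulating downloads per day of the week. The following is a minimal sketch in base R, not the code used for the plot in this post: the `downloads` data frame is a made-up toy stand-in for the real log (which has one row per download, with a `date` and a `package` column, among others), and `downloads_per_weekday` is a hypothetical helper name.

```r
# Made-up toy stand-in for the download log (one row per download).
downloads <- data.frame(
  date    = as.Date(c("2013-01-21", "2013-01-22", "2013-01-22",
                      "2013-01-26", "2013-01-27", "2013-01-28")),
  package = c("digest", "digest", "colorspace",
              "digest", "colorspace", "digest"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count downloads for each day of the week.
downloads_per_weekday <- function(log) {
  # "%u" is the weekday as a number, 1 = Monday, ..., 7 = Sunday,
  # which avoids depending on locale-specific day names.
  day_number <- as.integer(format(log$date, "%u"))
  day_names  <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
  # Using a factor keeps days with zero downloads in the result.
  table(factor(day_names[day_number], levels = day_names))
}

weekday_counts <- downloads_per_weekday(downloads)
print(weekday_counts)
```

On the real logs, passing the result to `barplot()` should make the weekend dip easy to see.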
“Family tree” of the top 100 most downloaded R packages
We can extract from this data the top 100 most downloaded R packages. Moreover, we can create a matrix showing, for each package, which of our unique ids (censored IP addresses) has downloaded it. Using this indicator matrix, we can think of the “similarity” (or distance) between every two packages, and based on that we can create a hierarchical clustering of the packages, showing which packages “go along” with one another.
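The steps just described (pick the top packages, build the indicator matrix, compute distances, cluster) can be sketched in base R as follows. This is an illustration under stated assumptions rather than the exact code behind the plot in this post: `toy_log` is a made-up data frame standing in for the real log (which has an `ip_id` and a `package` column), the "binary" (Jaccard) distance and average linkage are my choices for the example, and all variable names are hypothetical.

```r
# Made-up toy log: one row per download (anonymized user id + package name).
toy_log <- data.frame(
  ip_id   = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  package = c("ggplot2", "plyr", "ggplot2", "plyr",
              "digest", "colorspace", "digest", "colorspace", "boot"),
  stringsAsFactors = FALSE
)

# Keep only the n most downloaded packages (the post uses the top 100).
n_top <- 4
download_counts <- sort(table(toy_log$package), decreasing = TRUE)
top_packages <- names(download_counts)[seq_len(min(n_top, length(download_counts)))]
top_log <- toy_log[toy_log$package %in% top_packages, ]

# Indicator matrix: rows = packages, columns = user ids,
# 1 if that user downloaded the package at least once, 0 otherwise.
indicator <- 1 * (unclass(table(top_log$package, top_log$ip_id)) > 0)

# "binary" distance between two packages: among the users who downloaded
# at least one of them, the proportion who downloaded only one.
package_dist <- dist(indicator, method = "binary")

# Hierarchical clustering of the packages, then a cut into two groups.
package_tree <- hclust(package_dist, method = "average")
groups <- cutree(package_tree, k = 2)
print(groups)
```

Calling `plot(package_tree)` draws the corresponding dendrogram. Packages downloaded by the same users end up on the same branch.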
With this analysis, you can locate a package on the list that you often use, and then see which other packages are “related” to it. If you don’t know one of those related packages, consider having a look at it, since other R users are clearly finding the two packages to be “of use” together.
Such analysis can (and should!) be extended. For example, we can imagine creating a “suggest a package” feature based on this data, utilizing the packages that you use, the OS that you use, and other parameters. But such coding is beyond the scope of this post.
Here is the “family tree” (dendrogram) of related packages:
To make it easier to navigate, here is a table of the top 100 R packages, with a link to each one:
Update (2019-08-17): to see a good solution for this problem, please go to this link. The solution in the post is old and while it still works, it is better to use the newer methods from the link.
The problem: producing a Word (.docx) file of a statistical report created in R, with as little overhead as possible. The solution: combining R+knitr+rmarkdown+pander+pandoc (it is easier than it is spelled).
If you get what this post is about, just jump to the “Solution: the workflow” section.
Preface: why is this a problem (/still)
Before turning to the solution, let’s address two preliminary questions:
Q: Why is it important to be able to create a report in Word from R?
A: Because many researchers we may work with are used to working with Word for editing their text, tracking changes and merging edits between different authors, and copy-pasting text/tables/images from various sources. This means that a report produced as a PDF file is less useful for collaborating with less-tech-savvy researchers (copying text or tables from PDF is not fun). Even exchanging HTML files may appear somewhat awkward to fellow researchers.