shinyHeatmaply – a shiny app for creating interactive cluster heatmaps

My friend Jonathan Sidi and I (Tal Galili) are pleased to announce the release of shinyHeatmaply (0.1.0): a new Shiny application (and Shiny gadget) for creating interactive cluster heatmaps. shinyHeatmaply is based on the heatmaply R package which strives to make it easy as possible to create interactive cluster heatmaps.

The app introduces a functionality that saves to disk a self contained copy of the htmlwidget as an html file with your data and specifications you set from the UI, so it can be embedded in webpages, blogposts and online web appendices for academic publications.

You can see some of shinyHeatmaply‘s capabilities in the following 40 seconds video:

Installing shinyHeatmaply

From CRAN:

install.packages('shinyHeatmaply')

From github:

devtools::install_github('yonicd/shinyHeatmaply')

Running the app/gadget

The application has an import interface as part of the application which currently supports csv, txt, tab, xls, xlsx, rd, rda. You can start the app using:

library(shiny)
library(heatmaply)
# If you didn't get shinyHeatmaply yet, you can run it through github:
# runGitHub("yonicd/shinyHeatmaply",subdir = 'inst/shinyapp')
# or just use your locally installed package:
library(shinyHeatmaply)
runApp(system.file("shinyapp", package = "shinyHeatmaply"))

The gadget is called from the R console and accepts input arguments. The object defined as the input to the shinyHeatmaply gadget is a data.frame or a list of data.frames. You can start it using the following code:

library(shinyHeatmaply)

#single data.frame
data(mtcars)
launch_heatmaply(mtcars)

#list
data(iris)
launch_heatmaply(list('Example1'=mtcars,'Example2'=iris))

You can see an example of a saved shinyHeatmaply output here. Or view the following iframe:

Continue reading “shinyHeatmaply – a shiny app for creating interactive cluster heatmaps”

R 3.3.3 is released!

R 3.3.3 (codename “Another Canoe”) was released yesterday You can get the latest binaries version from here. (or the .tar.gz source code from here). The full list of bug fixes and new features is provided below.

A quick summary by David Smith:

R 3.3.3 fixes an issue related to attempting to use download.file on sites that automatically redirect from http to https: now, R will re-attempt to download the secure link rather than failing. Other fixes include support for long vectors in the vapply function, the ability to use pmax (and pmin) on ordered factors, improved accuracy for qbeta for some extreme cases, corrected spelling for “Cincinnati” in the precip data set, and a few other minor issues.

Upgrading to R 3.3.3 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE) # only for R versions older than 3.3.0
installr::updateR() # updating R.

Running “updateR()” will detect if there is a new R version available, and if so it will download+install it (etc.). There is also a step by step tutorial (with screenshots) on how to upgrade R on Windows, using the installr package. If you only see the option to upgrade to an older version of R, then change your mirror or try again in a few hours (it usually take around 24 hours for all CRAN mirrors to get the latest version of R).

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to open an issue in the github page.

Continue reading “R 3.3.3 is released!”

Data Driven Cheatsheets

Guest post by Jonathan Sidi

Cheatsheets are currently built and used exclusivley as a teaching tool. We want to try and change this and produce a cheat sheet that gives a roadmap to build a known product, but also is built as a function so users can input data into it to make the cheatsheet more personalized. This gives a versalility of a consistent format that people can share with each other, but has the added value of conveying a message through data driven visual changes.

Example

ggplot2 themes

The ggplot2 theme object is an amazing object you can specify nearly any part of the plot that is not conditonal on the data. What sets the theme object apart is that its structure is consistent, but the values in it change. In addition to change a theme it is a single function that too has a consistent call. The reoccuring challenge for users is to remember all the options that can be used in the theme call (there are approximately 220 unique options to calibrate at last count) or bookmark the help page for the theme and remember how you deciphered it last time.

This becomes a problem to pass all the information of the theme to someone who does not know what the values are set in your theme and attach instructions on it to let them recreate it without needing to open any web pages.

In writing the library ggedit we tried to make it easy to edit your theme so you don’t have to know too much about ggplots to make a large number of changes at once, for a quick clip see here. We had to make it easy to track those changes for people who are not versed in R, and plot.theme() was the outcome. In short think of the theme as a lot of small images that are combined to create a singel portrait.

Continue reading “Data Driven Cheatsheets”

ggedit 0.0.2: a GUI for advanced editing of ggplot2 objects

Guest post by Jonathan Sidi, Metrum Research Group

Last week the updated version of ggedit was presented in RStudio::conf2017. First, a BIG thank you to the whole RStudio team for a great conference and being so awesome to answer the insane amount of questions I had (sorry!). For a quick intro to the package see the previous post.

To install the package:

devtools::install_github("metrumresearchgroup/ggedit",subdir="ggedit")

Highlights of the updated version.

verbose script handling during updating in the gagdet (see video below)
verbose script output for updated layers and theme to parse and evaluate in console or editor
colourpicker control for both single colours/fills and and palletes
output for scale objects eg scale*grandient,scale*grandientn and scale*manual
verbose script output for scales eg scale*grandient,scale*grandientn and scale*manual to parse and evaluate in console or editor
input plot objects can have the data in the layer object and in the base object.
- ggplot(data=iris,aes(x=Sepal.Width,y=Sepal.Length,colour=Species))+geom_point()
- ggplot(data=iris,aes(x=Sepal.Width,y=Sepal.Length))+geom_point(aes(colour=Species))
- ggplot()+geom_point(data=iris,aes(x=Sepal.Width,y=Sepal.Length,colour=Species))
plot.theme(): S3 method for class ‘theme’
- visualizing theme objects in single output
- visual comparison of two themes objects in single output
- will be expanded upon in upcoming post

Continue reading “ggedit 0.0.2: a GUI for advanced editing of ggplot2 objects”

ggedit – interactive ggplot aesthetic and theme editor

Guest post by Jonathan Sidi, Metrum Research Group

ggplot2 has become the standard of plotting in R for many users. New users, however, may find the learning curve steep at first, and more experienced users may find it challenging to keep track of all the options (especially in the theme!).

ggedit is a package that helps users bridge the gap between making a plot and getting all of those pesky plot aesthetics just right, all while keeping everything portable for further research and collaboration.

ggedit is powered by a Shiny gadget where the user inputs a ggplot plot object or a list of ggplot objects. You can run ggedit directly from the console from the Addin menu within RStudio.

Continue reading “ggedit – interactive ggplot aesthetic and theme editor”

R 3.3.2 is released!

R 3.3.2 (codename “Sincere Pumpkin Patch”) was released yesterday You can get the latest binaries version from here. (or the .tar.gz source code from here). The full list of bug fixes and new features is provided below.

Upgrading to R 3.3.2 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE) # only for R versions older than 3.3.0
installr::updateR() # updating R.

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to open an issue in the github page.

Continue reading “R 3.3.2 is released!”

Set Application Domain Name with Shiny Server

Guest post by AVNER KANTOR

I used the wonderful tutorial of Dean Attall to set my machine in Google cloud. After I finished to configure it successfully I wanted to redirect my domain to the Shiny application URL. This is a short description how you can do it.

Continue reading “Set Application Domain Name with Shiny Server”

Presidential Election Predictions 2016 (an ASA competition)

Guest post by Jo Hardin, professor of mathematics, Pomona College.

ASA’s Prediction Competition

In this election year, the American Statistical Association (ASA) has put together a competition for students to predict the exact percentages for the winner of the 2016 presidential election. They are offering cash prizes for the entry that gets closest to the national vote percentage and that best predicts the winners for each state and the District of Columbia. For more details see:

http://thisisstatistics.org/electionprediction2016/

To get you started, I’ve written an analysis of data scraped from fivethirtyeight.com. The analysis uses weighted means and a formula for the standard error (SE) of a weighted mean. For your analysis, you might consider a similar analysis on the state data (what assumptions would you make for a new weight function?). Or you might try some kind of model – either a generalized linear model or a Bayesian analysis with an informed prior. The world is your oyster!

Continue reading “Presidential Election Predictions 2016 (an ASA competition)”

The reproducibility crisis in science and prospects for R

Guest post by Gregorio Santori (<[email protected]>)

The results that emerged from a recent Nature‘s survey confirm as, for many researchers, we are living in a weak reproducibility age (Baker M. Is there a reproducibility crisis? Nature 2016;533:453-454). Although the definition of reproducibility can vary widely between disciplines, in this survey was adopted the version for which “another scientist using the same methods gets similar results and can draw the same conclusions” (Reality check on reproducibility. Nature 2016;533:437). Already in 2009, Roger Peng formulated a definition of reproducibility very attractive: “In many fields of study there are examples of scientific investigations that cannot be fully replicated because of a lack of time or resources. In such a situation there is a need for a minimum standard that can fill the void between full replication and nothing. One candidate for this minimum standard is «reproducible research», which requires that data sets and computer code be made available to others for verifying published results and conducting alternative analyses” (Peng R. Reproducible research and Biostatistics. Biostatistics. 2009;10:405-408). For many readers of R-bloggers, the Peng’s formulation probably means in the first place a combination of R, LaTeX, Sweave, knitr, R Markdown, RStudio, and GitHub. From the broader perspective of scholarly journals, it mainly means Web repositories for experimental protocols, raw data, and source code.

Although researchers and funders can contribute in many ways to reproducibility, scholarly journals seem to be in a position to give a decisive advancement for a more reproducible research. In the incipit of the “Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals“, developed by the International Committee of Medical Journals Editors (ICMJE), there is an explicit reference to reproducibility. Moreover, the same ICMJE Recommendations reported as “the Methods section should aim to be sufficiently detailed such that others with access to the data would be able to reproduce the results“, while “[the Statistics section] describe[s] statistical methods with enough detail to enable a knowledgeable reader with access to the original data to judge its appropriateness for the study and to verify the reported results“.

In December 2010, Nature Publishing Group launched Protocol Exchange, “[…] an Open Repository for the deposition and sharing of protocols for scientific research“, where “protocols […] are presented subject to a Creative Commons Attribution-NonCommercial licence“.

In December 2014, PLOS journals announced a new policy for data sharing, resulted in the Data Availability Statement for submitted manuscripts.

In June 2014, at the American Association for the Advancement of Science headquarter, the US National Institute of Health held a joint workshop on the reproducibility, with the participation of the Nature Publishing Group, Science, and the editors representing over 30 basic/preclinical science journals. The workshop resulted in the release of the “Principles and Guidelines for Reporting Preclinical Research“, where rigorous statistical analysis and data/material sharing were emphasized.

In this scenario, I have recently suggested a global “statement for reproducibility” (Research papers: Journals should drive data reproducibility. Nature 2016;535:355). One of the strong points of this proposed statement is represented by the ban of “point-and-click” statistical software. For papers with a “Statistical analysis” section, only original studies carried out by using source code-based statistical environments should be admitted to peer review. In any case, the current policies adopted by scholarly journals seem to be moving towards stringent criteria to ensure more reproducible research. In the next future, the space for “point-and-click” statistical software will progressively shrink, and a cross-platform/open source language/environment such as R will be destined to play a key role.

Using 2D Contour Plots within {ggplot2} to Visualize Relationships between Three Variables

Guest post by John Bellettiere, Vincent Berardi, Santiago Estrada

The Goal

To visually explore relations between two related variables and an outcome using contour plots. We use the contour function in Base R to produce contour plots that are well-suited for initial investigations into three dimensional data. We then develop visualizations using ggplot2 to gain more control over the graphical output. We also describe several data transformations needed to accomplish this visual exploration.

Continue reading “Using 2D Contour Plots within {ggplot2} to Visualize Relationships between Three Variables”