ggedit – interactive ggplot aesthetic and theme editor

Guest post by Jonathan Sidi, Metrum Research Group

ggplot2 has become the standard for plotting in R for many users. New users, however, may find the learning curve steep at first, and more experienced users may find it challenging to keep track of all the options (especially in the theme!).

ggedit is a package that helps users bridge the gap between making a plot and getting all of those pesky plot aesthetics just right, all while keeping everything portable for further research and collaboration.

ggedit is powered by a Shiny gadget where the user inputs a ggplot plot object or a list of ggplot objects. You can run ggedit directly from the console or from the Addins menu within RStudio.

Installation

devtools::install_github("metrumresearchgroup/ggedit",subdir="ggedit")

Layers

The gadget creates a popup window which is populated by the information found in each layer. You can edit the aesthetic values found in a layer and see the changes happen in real time.

You can edit the aesthetic layers while still preserving the original plot, because the changed layers are cloned from the original plot object and are independent of it. The edited layers are provided in the output as objects, so you can use the layers independently of the plot using regular ggplot2 grammar. This is a great advantage when collaborating with other people: you can send a plot to team members to edit the layer aesthetics, and they can send you back just the new layers for you to implement.
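Because a layer is an ordinary R object, it can be stored, passed around, and added to any compatible plot. The following is a minimal sketch using plain ggplot2 only (no ggedit); the dataset and the object names `p`, `new_layer`, and `p_new` are assumptions chosen for illustration.

```r
library(ggplot2)

# A base plot with no layers yet
p <- ggplot(mtcars, aes(x = wt, y = mpg))

# A layer stored as a standalone object, as a collaborator might send it back
new_layer <- geom_point(colour = "steelblue", size = 3)

# Adding the layer creates a new plot; the original `p` is left unchanged
p_new <- p + new_layer

length(p$layers)      # number of layers in the original plot
length(p_new$layers)  # number of layers after adding `new_layer`
```

The same `new_layer` object could be added to any other plot whose data contains the mapped variables.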

Themes

ggedit also has a theme editor inside. You can edit any element in the theme and see the changes in real time, making the trial and error process quick and easy. Once you are satisfied with the edited theme you can apply it to other plots in the plot list with one click or even make it the session theme regardless of the gadget. As with layers, the new theme object is part of the output, making collaboration easy.
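In plain ggplot2 terms, a theme is also just an object that can be reused across plots or set as the session default. This is a small sketch of that idea without ggedit; `my_theme`, `p1`, `p2`, and the chosen theme settings are hypothetical.

```r
library(ggplot2)

# A theme stored as a standalone, portable object
my_theme <- theme_minimal() + theme(legend.position = "bottom")

p1 <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) + geom_point()
p2 <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

# Apply the same theme object to several plots
themed <- lapply(list(p1, p2), function(p) p + my_theme)

# Make it the session default; theme_set() returns the previous theme
old_theme <- theme_set(my_theme)

# Restore the previous default when done
theme_set(old_theme)
```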

Outputs

The gadget returns a list containing 4 elements:

  • updatedPlots
    • List containing updated ggplot objects
  • updatedLayers
    • For each plot a list of updated layers (ggproto) objects
    • Portable object
  • updatedLayersElements
    • For each plot a list of elements and their values in each layer
    • Can be used to update the new values in the original code
  • updatedThemes
    • For each plot a list of updated theme objects
    • Portable object
    • If the user doesn’t edit the theme updatedThemes will not be returned

rgg

After you finish editing the plots, the natural progression is to use them in the rest of the script. For this, ggedit provides the function rgg (remove and replace ggplot). Using this function you can chain changes to the plot into the original code without needlessly duplicating script.

With this function you can

Specify which layer you want to remove from a plot:

ggObj%>%rgg('line')

Provide an index to a specific layer, in instances where there is more than one layer of the same type in the plot

ggObj%>%rgg('line',2)

Remove a layer from ggObj and replace it with a new one from the ggedit output p.out

ggObj%>%rgg('line',newLayer = p.out$UpdatedLayers)

Remove a layer and replace it with a new one and the new theme

ggObj%>%rgg('line',newLayer = p.out$UpdatedLayers)+p.out$UpdatedThemes
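To make the remove-and-replace idea concrete, here is a rough sketch of doing the same thing by hand in plain ggplot2. It is not rgg's implementation, only an illustration of the concept; the dataset and the names `is_line`, `p_noline`, and `p_new` are assumptions.

```r
library(ggplot2)

p <- ggplot(economics, aes(date, unemploy)) +
  geom_line() +
  geom_point()

# Flag the layers whose geom is a line
is_line <- vapply(p$layers, function(l) inherits(l$geom, "GeomLine"), logical(1))

# Copy the plot and drop the flagged layers
p_noline <- p
p_noline$layers <- p$layers[!is_line]

# Add a replacement layer with different aesthetics
p_new <- p_noline + geom_line(colour = "red")
```

Note that in this manual version the replacement layer is appended last, so it is drawn on top of the points rather than in the position of the layer it replaces.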

There is also a plotting function for ggedit objects that creates a grid.view for you and finds the best grid size for the number of plots you have in the list. For more exotic layouts you can give specific positions and the rest will be done for you. If you didn’t use ggedit, you can still add the class to any ggplot and use the plotting function just the same.

plot(as.ggedit(list(p0,p1,p2,p3)),list(list(rows=1,cols=1:3),
                                       list(rows=2,cols=2),
                                       list(rows=2,cols=1),
                                       list(rows=2,cols=3))
)

Addin Launch

To launch the Shiny gadget from the Addins menu, highlight the code that creates the plot object, or the plot name, in the source pane of RStudio, then click on the ggedit addin in the Addins dropdown menu.


Jonathan Sidi joined Metrum Research Group in 2016 after working for several years on problems in applied statistics, financial stress testing and economic forecasting in both industrial and academic settings.

To learn more about additional open-source software packages developed by Metrum Research Group please visit the Metrum website.

Contact: For questions and comments, feel free to email me at: [email protected] or open an issue in github.

R 3.3.2 is released!

R 3.3.2 (codename “Sincere Pumpkin Patch”) was released yesterday. You can get the latest binary version from here (or the .tar.gz source code from here). The full list of bug fixes and new features is provided below.

Upgrading to R 3.3.2 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE) # only for R versions older than 3.3.0
installr::updateR() # updating R.

Running “updateR()” will detect if there is a new R version available, and if so it will download+install it (etc.). There is also a step by step tutorial (with screenshots) on how to upgrade R on Windows, using the installr package. If you only see the option to upgrade to an older version of R, then change your mirror or try again in a few hours (it usually takes around 24 hours for all CRAN mirrors to get the latest version of R).
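If you just want to see whether your current session is older than the release discussed here, base R can compare version numbers directly. This is a minimal sketch that does not contact CRAN (unlike updateR()); the names `current`, `target`, and `needs_upgrade` are illustrative.

```r
# Version of the running R session, as a comparable version object
current <- getRversion()

# Target release taken from this post
target <- package_version("3.3.2")

# Version objects compare component-wise, not as strings
needs_upgrade <- current < target

if (needs_upgrade) {
  message("R ", as.character(current), " is older than ", as.character(target), ".")
} else {
  message("R ", as.character(current), " is at or beyond ", as.character(target), ".")
}
```

Comparing version objects rather than strings matters because, for example, "3.10.0" sorts before "3.9.0" as text but is the newer version.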

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to open an issue in the github page.

Continue reading “R 3.3.2 is released!”

Set Application Domain Name with Shiny Server

Guest post by AVNER KANTOR

I used the wonderful tutorial by Dean Attali to set up my machine on Google Cloud. After I finished configuring it successfully, I wanted to redirect my domain to the Shiny application URL. This is a short description of how you can do it.

Continue reading “Set Application Domain Name with Shiny Server”

Presidential Election Predictions 2016 (an ASA competition)

Guest post by Jo Hardin, professor of mathematics, Pomona College.

ASA’s Prediction Competition

In this election year, the American Statistical Association (ASA) has put together a competition for students to predict the exact percentages for the winner of the 2016 presidential election. They are offering cash prizes for the entry that gets closest to the national vote percentage and that best predicts the winners for each state and the District of Columbia. For more details see:

http://thisisstatistics.org/electionprediction2016/

To get you started, I’ve written an analysis of data scraped from fivethirtyeight.com. The analysis uses weighted means and a formula for the standard error (SE) of a weighted mean. For your analysis, you might consider a similar analysis on the state data (what assumptions would you make for a new weight function?). Or you might try some kind of model – either a generalized linear model or a Bayesian analysis with an informed prior. The world is your oyster!
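As a rough illustration of a weighted mean and its standard error, here is a small sketch. The poll numbers are made up, the weights are simply the sample sizes, and the SE formula is one simple estimator (it assumes independent polls with a common variance); it is not necessarily the formula used in the linked analysis. The function name `weighted_mean_se` is hypothetical.

```r
# Hypothetical poll results (percent support) and sample sizes; not real data
polls <- data.frame(
  pct = c(46.0, 44.5, 47.2, 45.1, 48.0),
  n   = c(800, 1200, 650, 1500, 900)
)

weighted_mean_se <- function(x, w) {
  w <- w / sum(w)                       # normalize weights to sum to 1
  m <- sum(w * x)                       # weighted mean
  # Weighted variance with the bias correction for normalized weights
  s2 <- sum(w * (x - m)^2) / (1 - sum(w^2))
  # Var(sum(w * x)) = sigma^2 * sum(w^2) for independent, equal-variance x
  se <- sqrt(s2 * sum(w^2))
  c(mean = m, se = se)
}

res <- weighted_mean_se(polls$pct, polls$n)
res
```

With equal weights this reduces to the familiar sd(x) / sqrt(n), which is a useful sanity check before trying a different weight function.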

Continue reading “Presidential Election Predictions 2016 (an ASA competition)”

Using 2D Contour Plots within {ggplot2} to Visualize Relationships between Three Variables

Guest post by John Bellettiere, Vincent Berardi, Santiago Estrada

The Goal

To visually explore relations between two related variables and an outcome using contour plots. We use the contour function in Base R to produce contour plots that are well-suited for initial investigations into three dimensional data. We then develop visualizations using ggplot2 to gain more control over the graphical output. We also describe several data transformations needed to accomplish this visual exploration.
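A minimal sketch of the two approaches, on a made-up surface, might look like the following. The variables `x`, `y`, `z`, and `grid_df` are assumptions for illustration and are not taken from the full post; the main point is the reshaping step from a matrix (for base R) to long format (for ggplot2).

```r
library(ggplot2)

# Hypothetical predictors and an outcome defined on a regular grid
x <- seq(-2, 2, length.out = 40)
y <- seq(-2, 2, length.out = 40)
z <- outer(x, y, function(a, b) exp(-(a^2 + b^2)) + 0.5 * a * b)

# Base R: contour() takes the grid vectors and a matrix of outcomes
pdf(NULL)  # null device so the sketch also runs non-interactively
contour(x, y, z, xlab = "x", ylab = "y")
invisible(dev.off())

# ggplot2: reshape the matrix to long format, one row per grid point.
# expand.grid() varies x fastest, matching the column-major order of z.
grid_df <- expand.grid(x = x, y = y)
grid_df$z <- as.vector(z)

p <- ggplot(grid_df, aes(x = x, y = y, z = z)) +
  geom_contour()
```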

Continue reading “Using 2D Contour Plots within {ggplot2} to Visualize Relationships between Three Variables”

heatmaply: interactive heat maps (with R)

I am pleased to announce heatmaply, my new R package for generating interactive heat maps, based on the plotly R package.

tl;dr

By running the following 3 lines of code:

install.packages("heatmaply")
library(heatmaply)
heatmaply(mtcars, k_col = 2, k_row = 3) %>% layout(margin = list(l = 130, b = 40))

You will get this output in your browser (or RStudio console):

Continue reading “heatmaply: interactive heat maps (with R)”

R 3.3.0 is released!

R 3.3.0 (codename “Supposedly Educational”) was released today. You can get the latest binary version from here (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below.

Upgrading to R 3.3.0 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE) # only for R versions older than 3.3.0
installr::updateR() # updating R.

Running “updateR()” will detect if there is a new R version available, and if so it will download+install it (etc.). There is also a step by step tutorial (with screenshots) on how to upgrade R on Windows, using the installr package. If you only see the option to upgrade to an older version of R, then change your mirror or try again in a few hours (it usually takes around 24 hours for all CRAN mirrors to get the latest version of R).

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to open an issue in the github page.

CHANGES IN R 3.3.0

SIGNIFICANT USER-VISIBLE CHANGES

  • nchar(x, *)‘s argument keepNA governing how the result for NAs in x is determined, gets a new default keepNA = NA which returns NA where x is NA, except for type = "width" which still returns 2, the formatting / printing width of NA.
  • All builds have support for https: URLs in the default methods for download.file(), url() and code making use of them. Unfortunately that cannot guarantee that any particular https: URL can be accessed. For example, server and client have to successfully negotiate a cryptographic protocol (TLS/SSL, …) and the server’s identity has to be verifiable via the available certificates. Different access methods may allow different protocols or use private certificate bundles: we encountered a https: CRAN mirror which could be accessed by one browser but not by another nor by download.file() on the same Linux machine.
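The nchar() change described above can be seen with a short example (R 3.3.0 or later). The vector `x` and the result names are just for illustration.

```r
x <- c("apple", NA, "kiwi")

# Default keepNA = NA with type = "chars": NA in, NA out
n_chars <- nchar(x)

# type = "width": NA still counts as the 2-character printed "NA"
n_width <- nchar(x, type = "width")

# keepNA = FALSE: NA is always treated as the string "NA"
n_nokeep <- nchar(x, keepNA = FALSE)

n_chars
n_width
n_nokeep
```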

NEW FEATURES

Continue reading “R 3.3.0 is released!”

Election tRends: An interactive US election tracker (using Shiny and Plotly)

Guest post by Jonathan Sidi

Introduction

The US primaries are coming on fast with almost 120 days left until the conventions. After building a Shiny app for the Israeli Elections I decided to update features in the app and try out plotly in the Shiny framework.

As a casual voter, trying to gauge the true temperature of the political landscape from the overwhelming abundance of polling is a heavy task. Polling data is continuously published during the state primaries and the variety of pollsters makes it hard to keep track of what is going on. The app self-updates using data published publicly by realclearpolitics.com.

The app keeps track of polling trends and delegate counts daily for you. You can create a personal analysis, from the granular level data all the way to distributions, using interactive ggplot2 and plotly graphs, and check out the general election polling to peek into the near future.

The app can be accessed in two ways. I set up an AWS instance to host the app for real-time use, and the GitHub repository is the maintained home of the app, meant for members of the R community who can host Shiny locally.

Running the App through Github

(github repo: yonicd/Elections)

#changing locale to run on Windows
if (Sys.info()[1] == "Windows") Sys.setlocale("LC_TIME","C") 
 
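#check to see if libraries need to be installed
libs <- c("shiny","shinyAce","plotly","ggplot2","rvest","reshape2","zoo","stringr","scales","plyr","dplyr")
#require() returns FALSE for any package that could not be loaded
missing_libs <- libs[!sapply(libs, require, character.only = TRUE)]
if (length(missing_libs) > 0) install.packages(missing_libs)
rm(libs, missing_libs)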
 
#run App
shiny::runGitHub("yonicd/Elections",subdir="USA2016/shiny")
 
#reset to original locale on Windows
if (Sys.info()[1] == "Windows") Sys.setlocale("LC_ALL")

Application Layout:

(see next section for details)

  1. Current Polling
  2. Election Analysis
  3. General Elections
  4. Polling Database

Usage Instructions:

Current Polling

  • The top row depicts the current accumulation of delegates by party and candidate in a step plot, with a horizontal reference line for the threshold needed per party to receive the nomination. The accumulation does not include super delegates since it is uncertain which way they will vote. Currently this dataset is updated offline due to its somewhat static nature, and because the way the data is posted online forces the use of Selenium drivers. An action button will be added to let users refresh the data as needed.
  • The bottom row is a 7 day moving average of all polling results published on the state and national level. The ribbon around the moving average is the moving standard deviation on the same window. This is helpful to pick up any changes in uncertainty regarding how the voting public is perceiving the candidates. It can be seen that candidates with lower polling averages and increased variance trend up while the opposite is true with the leading candidates, where voter uncertainty is a bad thing for them.
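The moving average and its ribbon can be sketched in base R as follows. The series is simulated rather than real polling data, a trailing window is assumed (the app may align its window differently), and the names `poll`, `windows`, and `ribbon` are hypothetical.

```r
# Hypothetical daily polling averages for one candidate; not real data
set.seed(1)
poll <- 40 + cumsum(rnorm(30, sd = 0.5))

window <- 7

# embed() returns one row per complete 7-day window
windows <- embed(poll, window)

ma  <- rowMeans(windows)        # 7-day moving average
msd <- apply(windows, 1, sd)    # moving standard deviation on the same window

# Ribbon bounds: moving average plus/minus one moving standard deviation
ribbon <- data.frame(
  day   = window:length(poll),  # last day of each trailing window
  mean  = ma,
  lower = ma - msd,
  upper = ma + msd
)

head(ribbon)
```

A data frame in this shape maps directly onto a line for `mean` and a ribbon between `lower` and `upper`.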

Snapshot of Overview Plot

Continue reading “Election tRends: An interactive US election tracker (using Shiny and Plotly)”

50 years of Data Science – by David Donoho

David Donoho published a fascinating paper based on a presentation at the Tukey Centennial workshop, Princeton NJ Sept 18 2015. You can download the full paper from here. 

The paper got quite the attention on Hacker News, Data Science Central, Simply Stats, Xi’an’s blog, srown ion medium, and probably others. Share your thoughts in the comments.

Here is the abstract and table of contents.

Abstract

More than 50 years ago, John Tukey called for a reformation of academic statistics. In ‘The Future of Data Analysis’, he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or ‘data analysis’. Ten to twenty years ago, John Chambers, Bill Cleveland and Leo Breiman independently once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics; Chambers called for more emphasis on data preparation and presentation rather than statistical modeling; and Breiman called for emphasis on prediction rather than inference. Cleveland even suggested the catchy name “Data Science” for his envisioned field.

A recent and growing phenomenon is the emergence of “Data Science” programs at major universities, including UC Berkeley, NYU, MIT, and most recently the Univ. of Michigan, which on September 8, 2015 announced a $100M “Data Science Initiative” that will hire 35 new faculty. Teaching in these new programs has significant overlap in curricular subject matter with traditional statistics courses; in general, though, the new initiatives steer away from close involvement with academic statistics departments.

This paper reviews some ingredients of the current “Data Science moment”, including recent commentary about data science in the popular media, and about how/whether Data Science is really different from Statistics.

The now-contemplated field of Data Science amounts to a superset of the fields of statistics and machine learning which adds some technology for ‘scaling up’ to ‘big data’. This chosen superset is motivated by commercial rather than intellectual developments. Choosing in this way is likely to miss out on the really important intellectual event of the next fifty years.

Because all of science itself will soon become data that can be mined, the imminent revolution in Data Science is not about mere ‘scaling up’, but instead the emergence of scientific studies of data analysis science-wide. In the future, we will be able to predict how a proposal to change data analysis workflows would impact the validity of data analysis across all of science, even predicting the impacts field-by-field. Drawing on work by Tukey, Cleveland, Chambers and Breiman, I present a vision of data science based on the activities of people who are ‘learning from data’, and I describe an academic field dedicated to improving that activity in an evidence-based manner. This new field is a better academic enlargement of statistics and machine learning than today’s Data Science Initiatives, while being able to accommodate the same short-term goals.

Contents

1 Today’s Data Science Moment

2 Data Science ‘versus’ Statistics

2.1 The ‘Big Data’ Meme

2.2 The ‘Skills’ Meme

2.3 The ‘Jobs’ Meme

2.4 What here is real?

2.5 A Better Framework

3 The Future of Data Analysis, 1962

4 The 50 years since FoDA

4.1 Exhortations

4.2 Reification

5 Breiman’s ‘Two Cultures’, 2001

6 The Predictive Culture’s Secret Sauce

6.1 The Common Task Framework

6.2 Experience with CTF

6.3 The Secret Sauce

6.4 Required Skills

7 Teaching of today’s consensus Data Science

8 The Full Scope of Data Science

8.1 The Six Divisions

8.2 Discussion

8.3 Teaching of GDS

8.4 Research in GDS

8.4.1 Quantitative Programming Environments: R

8.4.2 Data Wrangling: Tidy Data

8.4.3 Research Presentation: Knitr

8.5 Discussion

9 Science about Data Science

9.1 Science-Wide Meta Analysis

9.2 Cross-Study Analysis

9.3 Cross-Workflow Analysis

9.4 Summary

10 The Next 50 Years of Data Science

10.1 Open Science takes over

10.2 Science as data

10.3 Scientific Data Analysis, tested Empirically

10.3.1 DJ Hand (2006)

10.3.2 Donoho and Jin (2008)

10.3.3 Zhao, Parmigiani, Huttenhower and Waldron (2014)

10.4 Data Science in 2065

11 Conclusion

You can download the full paper from here. 

R 3.2.3 is released (with improvements for Windows users, and general bug fixes)

R 3.2.3 (codename “Wooden Christmas Tree”) was released several days ago. You can get the latest binary version from here (or the .tar.gz source code from here). The full list of new features and bug fixes is provided below.

Major changes in R 3.2.3

As highlighted by David Smith, this release makes a few small improvements and bug fixes to R, including:

  • Improved support for users of the Windows OS in time zones, OS version identification, FTP connections, and printing (in the GUI).
  • Performance improvements and more support for long vectors in some functions including which.max
  • Improved accuracy for the Chi-Square distribution functions in some extreme cases

Upgrading to R 3.2.3 on Windows

If you are using Windows you can easily upgrade to the latest version of R using the installr package. Simply run the following code in Rgui:

install.packages("installr") # install 
setInternet2(TRUE)
installr::updateR() # updating R.

Running “updateR()” will detect if there is a new R version available, and if so it will download+install it (etc.). There is also a step by step tutorial (with screenshots) on how to upgrade R on Windows, using the installr package.

I try to keep the installr package updated and useful, so if you have any suggestions or remarks on the package – you are invited to open an issue in the github page.

NEW FEATURES

  • Some recently-added Windows time zone names have been added to the conversion table used to convert these to Olson names. (Including those relating to changes for Russia in Oct 2014, as in PR#16503.)
  • (Windows) Compatibility information has been added to the manifests for ‘Rgui.exe’, ‘Rterm.exe’ and ‘Rscript.exe’. This should allow win.version() and Sys.info() to report the actual Windows version up to Windows 10.
  • Windows "wininet" FTP first tries EPSV / PASV mode rather than only using active mode (reported by Dan Tenenbaum).
  • which.min(x) and which.max(x) may be much faster for logical and integer x and now also work for long vectors.
  • The ‘emulation’ part of tools::texi2dvi() has been somewhat enhanced, including supporting quiet = TRUE. It can be selected by texi2dvi = "emulation". (Windows) MiKTeX removed its texi2dvi.exe command in Sept 2015: tools::texi2dvi() tries texify.exe if it is not found.
  • (Windows only) Shortcuts for printing and saving have been added to menus in Rgui.exe. (Request of PR#16572.)
  • loess(..., iterTrace=TRUE) now provides diagnostics for robustness iterations, and the print() method for summary(<loess>) shows slightly more.
  • The included version of PCRE has been updated to 8.38, a bug-fix release.
  • View() now displays nested data frames in a more friendly way. (Request with patch in PR#15915.)
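As a small illustration of which.min() and which.max() on integer and logical input, mentioned in the list above, consider the following. The vector `x` and the result names are made up for the example.

```r
x <- c(2L, 7L, 7L, 1L)

i_max <- which.max(x)   # index of the first maximum
i_min <- which.min(x)   # index of the first minimum

# On a logical vector, which.max() gives the position of the first TRUE
first_big <- which.max(x > 5L)

# Caveat: with no TRUE at all, the maximum is FALSE, so the result is 1
none <- which.max(c(FALSE, FALSE))
```

Because of that caveat, it is worth checking any() before relying on which.max() to locate the first TRUE.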

BUG FIXES

  • regexpr(pat, x, perl = TRUE) with Python-style named capture did not work correctly when x contained NA strings. (PR#16484)
  • The description of dataset ToothGrowth has been improved/corrected. (PR#15953)
  • model.tables(type = "means") and hence TukeyHSD() now support "aov" fits without an intercept term. (PR#16437)
  • close() now reports the status of a pipe() connection opened with an explicit open argument. (PR#16481)
  • Coercing a list without names to a data frame is faster if the elements are very long. (PR#16467)
  • (Unix-only) Under some rare circumstances piping the output from Rscript or R -f could result in attempting to close the input file twice, possibly crashing the process. (PR#16500)
  • (Windows) Sys.info() was out of step with win.version() and did not report Windows 8.
  • topenv(baseenv()) returns baseenv() again as in R 3.1.0 and earlier. This also fixes compilerJIT(3) when used in ‘.Rprofile’.
  • detach()ing the methods package keeps .isMethodsDispatchOn() true, as long as the methods namespace is not unloaded.
  • Removed some spurious warnings from configure about the preprocessor not finding header files. (PR#15989)
  • rchisq(*, df=0, ncp=0) now returns 0 instead of NaN, and dchisq(*, df=0, ncp=*) also no longer returns NaN in limit cases (where the limit is unique). (PR#16521)
  • pchisq(*, df=0, ncp > 0, log.p=TRUE) no longer underflows (for ncp > ~60).
  • nchar(x, "w") returned -1 for characters it did not know about (e.g. zero-width spaces): it now assumes 1. It now knows about most zero-width characters and a few more double-width characters.
  • Help for which.min() is now more precise about behavior with logical arguments. (PR#16532)
  • The print width of character strings marked as "latin1" or "bytes" was in some cases computed incorrectly.
  • abbreviate() did not give names to the return value if minlength was zero, unlike when it was positive.
  • (Windows only) dir.create() did not always warn when it failed to create a directory. (PR#16537)
  • When operating in a non-UTF-8 multibyte locale (e.g. an East Asian locale on Windows), grep() and related functions did not handle UTF-8 strings properly. (PR#16264)
  • read.dcf() sometimes misread lines longer than 8191 characters. (Reported by Hervé Pagès with a patch.)
  • within(df, ..) no longer drops columns whose name start with a ".".
  • The built-in HTTP server converted entire Content-Type to lowercase including parameters which can cause issues for multi-part form boundaries (PR#16541).
  • Modifying slots of S4 objects could fail when the methods package was not attached. (PR#16545)
  • splineDesign(*, outer.ok=TRUE) (splines) is better now (PR#16549), and interpSpline() now allows sparse=TRUE for speedup with non-small sizes.
  • If the expression in the traceback was too long, traceback() did not report the source line number. (Patch by Kirill Müller.)
  • The browser did not truncate the display of the function when exiting with options("deparse.max.lines") set. (PR#16581)
  • When bs(*, Boundary.knots=) had boundary knots inside the data range, extrapolation was somewhat off. (Patch by Trevor Hastie.)
  • var() and hence sd() warn about factor arguments which are deprecated now. (PR#16564)
  • loess(*, weights = *) stored wrong weights and hence gave slightly wrong predictions for newdata. (PR#16587)
  • aperm(a, *) now preserves names(dim(a)).
  • poly(x, ..) now works when either raw=TRUE or coef is specified. (PR#16597)
  • data(package=*) is more careful in determining the path.
  • prettyNum(*, decimal.mark, big.mark): fixed bug introduced when fixing PR#16411.
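Two of the fixes above are easy to check in a session running R 3.2.3 or later. This is only a quick sketch; the data frame and object names are invented for the example.

```r
# rchisq() with df = 0 and ncp = 0 is a point mass at zero
draws <- rchisq(5, df = 0, ncp = 0)
draws

# within() keeps columns whose names start with "."
df  <- data.frame(.id = 1:3, v = 4:6)
df2 <- within(df, w <- v * 2)
names(df2)
```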

INSTALLATION and INCLUDED SOFTWARE

  • The included configuration code for libintl has been updated to that from gettext version 0.19.5.1 — this should only affect how an external library is detected (and the only known instance is under OpenBSD). (Wish of PR#16464.)
  • configure has a new argument --disable-java to disable the checks for Java.
  • The configure default for MAIN_LDFLAGS has been changed for the FreeBSD, NetBSD and Hurd OSes to one more likely to work with compilers other than gcc (FreeBSD 10 defaults to clang).
  • configure now supports the OpenMP flags -fopenmp=libomp (clang) and -qopenmp (Intel C).
  • Various macros can be set to override the default behaviour of configure when detecting OpenMP: see file ‘config.site’.
  • Source installation on Windows has been modified to allow for MiKTeX installations without texi2dvi.exe. See file ‘MkRules.dist’.