A guest post by Jeff Hemsley, who co-authored the new book Going Viral with Karine Nahon.
In Going Viral (Polity Press, 2013) we explore the topic of virality, the process of sharing messages that results in a fast, broad spread of information. What does that have to do with R, or the R-bloggers community? First and foremost, we use the R-bloggers community as an example of the role of interest networks (see description below) in driving viral events. But we also used R as our go-to tool for the research that went into the book. Even the cover art, pictured here, was created with R, using the igraph package. Included below is an excerpt from chapter 4 that includes the section on interest networks and R-bloggers.
Disclaimer: This post is not intended to be a comprehensive review, but more of a “getting started guide”. If I did not mention an important tool or package I apologize, and invite readers to contribute in the comments.
I recently had the pleasure of participating in a “Brain Hackathon” organized as part of the OHBM2013 conference. The hackathon was supported by Amazon, and participants were given Amazon credit to encourage analysis using Amazon’s Web Services (AWS). We badly needed this computing power, as we had 14*10^9 p-values to compute in order to localize genetic associations in the brain, leading to Figure 1.
Figure 1: Brain volumes significantly associated with genotype.
While imaging genetics is an interesting research topic, and the hackathon was a great idea in itself, it is AWS that I wish to present in this post. Starting with the conclusion:
Storing your data and analyzing it on the cloud, be it AWS, Azure, Rackspace or others, is a quantum leap in analysis capabilities. I fell in love with my new cloud powers and I strongly recommend all statisticians and data scientists get friendly with these services. I will also note that if statisticians do not embrace these new-found powers, we should not be surprised if data analysis becomes synonymous with Machine Learning and not with Statistics (if you have no idea what I am talking about, read this excellent post by Larry Wasserman).
As motivation for analysis in the cloud consider:
The ability to do your analysis from any device, be it a PC, tablet or even smartphone.
The ability to instantaneously augment your CPU and memory to any imaginable configuration just by clicking a menu, and then to scale back down to save costs once you are done.
The ability to instantaneously switch between operating systems and system configurations.
The ability to launch hundreds of machines creating your own cluster, parallelizing your massive job, and then shutting it down once done.
Here is a quick FAQ before going into the setup stages.
R-bloggers.com is now three years young. The site is an (unofficial) online journal of the R statistical programming environment, written by bloggers who agreed to contribute their R articles to the site.
Last year, I posted on the top 24 R posts of 2011. In this post I wish to celebrate R-bloggers’ third birth month by sharing with you:
Links to the top 100 most read R posts of 2012
Statistics on “how well” R-bloggers did this year
My wishlist for the R community for 2013 (blogging about R, guest posts, and sponsors)
1. Top 100 R posts of 2012
R-bloggers’ success is thanks to the content submitted by the more than 400 R bloggers who have joined R-bloggers. The R community currently has around 245 active R bloggers (links to the blogs are clearly visible in the right navigation bar on the R-bloggers homepage). In the past year, these bloggers wrote around 3200 posts about R!
Here is a list of the top visited posts on the site in 2012 (you can see the number of unique visitors in parentheses, while the list is ordered by the number of total page views):
(Guest post by Achim Zeileis) Development of the R package exams for automatic generation of (statistical) exams in R started in 2006, and version 1 was published in JSS by Grün and Zeileis (2009). It was based on standalone Sweave exercises, which can be combined into exams and then rendered into different kinds of PDF output (exams, solutions, self-study materials, etc.). Now, a major revision of the package has been released that extends the capabilities and adds support for learning management systems. It is still based on the same type of Sweave files for each exercise but can also render them into output formats like HTML (with various options for displaying mathematical content) and XML specifications for online exams in learning management systems such as Moodle or OLAT. Supplementary files such as graphics or data are handled automatically. Here, I give a brief overview of the new capabilities. A detailed discussion is in the working paper by Zeileis, Umlauf, and Leisch (2012) that is also contained in the package as a vignette.
A few days back the RStudio blog announced Shiny, a new product for easily creating interactive web applications (http://www.rstudio.com/shiny/). I wanted to compare this new framework to one I’ve worked on, gWidgetsWWW2.rapache – a version of the gWidgets API for use with Jeffrey Horner’s rapache module for the Apache web server (available at GitHub). The gWidgets API has a similar aim to make it easy for R users to create interactive applications.
I don’t want to worry here about deployment of apps, just the writing side. The shiny package uses websockets to transfer data back and forth between browser and server. Though this may cause issues with wider deployment, the industrious RStudio folks have a hosting program in beta for internet-wide deployment. For local deployment, no problems as far as I know, as long as you avoid older versions of Internet Explorer.
Now, Shiny seems well suited for applications where the user can parameterize a resulting graphic, so that was the point of comparison. Peter Dalgaard’s tcltk package ships with a classic demo tkdensity.R. I use that for inspiration below. That GUI allows the user a few selections to modify a density plot of a random sample.
In it Dr. De Mars wrote (I allowed myself to emphasize some parts of the text):
Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. [...] for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.
The new version has a lot of cool new features, like advanced data import, integration with Google Docs, converting variables from numeric to factor to dates and vice versa, and a lot of new geoms. You can see some of these in his new video demo of the application:
One of the exciting new frontiers for R programming is creating website interfaces to R code. At the forefront of this domain is a young and (very) bright man called Jeroen Ooms, whom I had the pleasure of meeting at useR 2009 (press the link to see his presentation).
New features include 1D geoms (histogram, density, freqpoly), syntax mode (by clicking the tiny arrow at the bottom), and some additional facet options, plus some minor improvements and fixes, most notably for Internet Explorer. The data upload has not been improved yet; I am working on that. For now, it supports .csv, .sav (SPSS), and tab-delimited data. Please make sure your filename has the appropriate extension and every column in your data has a header. If you export a data frame from R, use: write.csv(mydf, "mydf.csv", row.names=FALSE). If you upload an SPSS data file, none of this should be a concern. Supported browsers are IE6-8, FF, Safari, and Chrome, but a recent browser is highly recommended. As always, feedback is more than welcome.
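As a minimal sketch of the export step mentioned above, the following R snippet writes a data frame to a .csv file with a header row and no row-name column, then reads it back to check the result. The data frame contents and the use of a temporary directory are assumptions made for illustration; only the write.csv call with row names turned off comes from the text.

```r
# Hypothetical example data; `mydf` mirrors the name used in the text.
mydf <- data.frame(
  group = c("a", "b", "a", "b"),
  x     = c(1.2, 3.4, 2.2, 4.1),
  y     = c(10, 20, 15, 25)
)

# Assumption: write to a temporary directory so the example leaves no
# files in the working directory. The ".csv" extension matters for upload.
out_file <- file.path(tempdir(), "mydf.csv")

# row.names = FALSE stops R from writing an extra, unnamed first column,
# so every column in the file has a header.
write.csv(mydf, out_file, row.names = FALSE)

# Read the file back to confirm the header row and the number of records.
check <- read.csv(out_file)
print(names(check))
print(nrow(check))
```

Spelling out FALSE rather than F is a little safer, since F is an ordinary variable that can be reassigned in a session.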
Here is a little demo video that shows how to use the new features:
In a live webinar today hosted by Alteryx, five industry experts shared 14 analytics predictions for 2014. The panel included Paul Ross (Alteryx), Charles Zedlewski (Cloudera), Rick Schultz (Alteryx), Ellie Fields (Tableau) and Michele Chambers (Revolution Analytics). Their predictions were: Analysts will matter more than data scientists R will replace legac […]
We have written a bit on sample size for common events. We would like to extend this analysis to rare events. In web marketing and a lot of other applications you are trying to estimate a probability of an event (like conversion) where the probability is fairly low (say 5% to 0.5%). In this case […] Related posts: A bit more on sample size Estimating rates f […]
Consumers will not complete long questionnaires, so marketing research must get the most it can from every item. In this post, we look into the toolbox of R packages and search for statistical models that enable us to learn a great deal about eac...
Today, we’re excited to announce the release of Shiny Server version 0.4 as well as the availability of a beta version of Shiny Server Professional Edition. Shiny Server is a platform for hosting Shiny Applications over the Web and has undergone substantial work in the past few months. We have fixed many bugs, added stability enhancements, […]
I’ve been doodling a chart in R/ggplot using geom_text() to generate a labelled scatterplot. The chart actually builds up several layers using different datasets, so it’s not obvious how to set the ranges cleanly: I know the lower bound I want for the y-axis (y=0), but I want to let the upper bound float. There’s […]
digest version 0.6.4 is now on CRAN and in Debian. This is a pure maintenance release which should help with a build issue affecting users on Solaris. CRANberries provides the usual summary of changes to version 0.6.3. Our package is available... […]
An edited book titled Data Mining Applications with R will be on the market soon, featuring 15 real-world applications of data mining with R. A preview of the book is available on Google Books. R code, data and color figures …
Following the very positive feedback that Andreas and I have received from delegates of the first R in Insurance conference in July of this year, we are planning to repeat the event next year. We have already reserved a bigger auditorium. The second conference on R in Insurance will be held on Monday 14 July 2014 at Cass Business School in London, UK. This o […]