This post is a call for both R community members and R-bloggers to come and help make the R Programming wikibook amazing.
The R Programming wikibook is not just another one of the many free books about statistics/R; it is a community project that aims to create a cross-disciplinary practical guide to the R programming language. Here is how you can join:
It appears that just days ago, Google Tech Talk released a new, one-hour-long video of a presentation (from June 6, 2011) given by one of the R community’s more influential contributors, Hadley Wickham.
This seems to be one of the better talks to send a programmer friend who is interested in getting into R.
Talk abstract
Data analysis, the process of converting data into knowledge, insight and understanding, is a critical part of statistics, but there’s surprisingly little research on it. In this talk I’ll introduce some of my recent work, including a model of data analysis. I’m a passionate advocate of the view that data analysis should be carried out using a programming language, and I’ll justify this by discussing some of the requirements of good data analysis (reproducibility, automation and communication). With these in mind, I’ll introduce you to a powerful set of tools for better understanding data: the statistical programming language R, and the ggplot2 domain specific language (DSL) for visualisation.
Foursquare, the mobile location-sharing app (of which I'm a big fan), has an excellent recommendation system. Based on your recent checkins, places your friends found popular, and even the time of day, Foursquare Explore will recommend a great place for a sushi lunch, or the best place to buy new shoes. This presentation from Foursquare engineer Ben Lee […]
The recent Hack/Reduce hackathon in Montreal was a tonne of fun. Our team tackled a data set consisting of Bixi (Montreal’s bicycle share system) station states at one-minute temporal resolution. We used Hadoop and MapReduce to pull out some features of user behaviour. One of the things we extracted was the flux at […]
by Yanchang Zhao, RDataMining.com. With a Mac, parallel computing can be achieved with the multicore package. Unfortunately, it does not work under Windows. A simple way to do parallel computing under Windows (and also on a Mac) is to use the snowfall package, which can work … […]
Diversification is hard to find nowadays because financial markets are becoming increasingly correlated. I found a good visual presentation of the cross-sectional correlation of stocks in the S&P 500 index in the article Trading correlation by D. Varadi and C. Rittenhouse. Let’s compute and plot the average correlation among stocks in the S&P 500 index […]
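To make "average correlation among stocks" concrete, here is a minimal sketch of one common reading of it: the mean of the Pearson correlations over all distinct pairs of return series. It is written in Python rather than the R the post presumably uses, the tickers and returns are made up for illustration, and the function names are hypothetical. The post itself may compute the quantity differently (for example over a rolling window).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length return series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_pairwise_correlation(returns):
    """Mean correlation over all distinct pairs of series (off-diagonal only)."""
    pairs = list(combinations(returns.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical daily returns for three made-up tickers (not real market data).
returns = {
    "AAA": [0.01, -0.02, 0.03, 0.00, 0.01],
    "BBB": [0.02, -0.04, 0.06, 0.00, 0.02],    # 2 x AAA, so perfectly correlated
    "CCC": [-0.01, 0.02, -0.03, 0.00, -0.01],  # -1 x AAA, so perfectly anti-correlated
}

avg_corr = average_pairwise_correlation(returns)
print(round(avg_corr, 4))
```

With real index data the same calculation would run over several hundred series, so a vectorised correlation matrix would be the practical choice; the pairwise loop is kept here only because it shows the definition directly.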
We have now completed our revision of the paper Relevant statistics for Bayesian model choice, written with Judith Rousseau, Jean-Michel Marin, and Natesh Pillai. It has been resubmitted to Series B and reposted on arXiv. The major change in the paper is the inclusion of a check about the relevance of a given summary statistic, […]
DEADLINE FAST APPROACHING – 8th Annual International R User Conference useR! 2012, Nashville, Tennessee USA. Registration Deadlines: Early Registration: Passed; Regular Registration: Mar 1 – May 12; Late Registration: May 13 – June 4; On-Site Registration: June 12 – June 15. Please note: Nashville is offering several large entertainment events the month of June, a […]
The story about the great work that SUNY Buffalo has been doing to find a cure for Multiple Sclerosis with Revolution R Enterprise and IBM Netezza has generated a lot of attention, with stories in Forbes, InformationWeek and eWeek (amongst others). To continue the discussion, IBM has put together a panel for a "Tweet Chat" on Thursday (May 10) at n […]
In a previous post I used JAGS to build the Bayesian equivalent of a two-way ANOVA. The effects of products, panelists and their interaction were determined. In this post this model will be rebuilt to provide a more simplified and advanced model. The inter... […]
Ph.D. candidate in sociology Ethan Fosse just switched from Stata to doing 100% of his analysis with R. His reasons? If you want to do Bayesian analysis or graph modeled coefficients (or work with complex data structures more generally), then R is much easier than Stata due to the object-oriented programming environment. It's unbelievably liberating to b […]
We've been more sensitive to accounting for multiple comparisons recently, in part due to work that Nick and colleagues published on the topic. In this entry, we consider results from a randomized trial (Kypri et al., 2009) to reduce problem drinking ... […]
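As a rough illustration of what accounting for multiple comparisons can look like, the sketch below adjusts a set of p-values with the Bonferroni and Holm procedures. The excerpt does not say which adjustment the entry uses, so the choice of these two methods is an assumption, the function names are hypothetical, and the p-values are invented (they are not results from Kypri et al., 2009). The sketch is in Python; in R the built-in `p.adjust` function offers the same adjustments.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values for four outcomes (illustration only).
p_values = [0.01, 0.04, 0.03, 0.20]

bonf = bonferroni(p_values)
holm_adj = holm(p_values)
print(bonf)
print(holm_adj)
```

Holm is never more conservative than Bonferroni while still controlling the family-wise error rate, which is why it is often preferred when only a handful of outcomes are compared.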