
Dumping functions from the global environment into an R script file

Looking at a project you didn’t touch for years poses many challenges. The less documentation and organization you had in your files, the more time you’ll have to spend tracing back what you did back when the code was written.

I just opened up such a project, from before I ever knew to split my .r files into “data.r”, “functions.r” and “do.r”. All I have are several versions of an old .RData file and many .r files with a mix of functions and commands (oh the shame!).

One idea I had for the tracing back was to take the latest version of the .RData file I had and see which functions were in its environment. Simply typing ls() wouldn’t work, since it lists all the objects and not just the functions, and I wanted to have a list of all the functions that were defined in my .RData environment. Thanks to the code recently published by Richie Cotton, I was able to create the function “save.functions.from.env”. This function will go through all your defined functions and write them into “d:\\temp.r”.

I hope this might be useful to one of you in the future, here is the code to do it:

save.functions.from.env <- function(file = "d:\\temp.r")
{
	# This function goes through all the functions defined in the global
	# environment and appends them to `file` (by default "d:\\temp.r").
	# Let's get all the functions from the environment
	# (mget() always returns a list, whereas sapply() may simplify it):
	funs <- Filter(is.function, mget(ls(globalenv()), envir = globalenv()))
 
	# Let's write the functions into the file, one after the other
	for(i in seq_along(funs))
	{
		cat(	# number the function we are about to add
			paste("\n" , "#------ Function number ", i , "-----------------------------------" ,"\n"),
			append = TRUE, file = file
			)
 
		# printing a function may add "<bytecode: ...>" or "<environment: ...>" lines,
		# which are not valid R code, so we drop them
		src <- capture.output(funs[[i]])
		src <- src[!grepl("^<(bytecode|environment): ", src)]
 
		cat(	# print the function into the file
			paste(names(funs)[i] , "<-", paste(src, collapse = "\n"), collapse = "\n"),
			append = TRUE, file = file
			)
 
		cat(
			paste("\n" , "#-----------------------------------------" ,"\n"),
			append = TRUE, file = file
			)
	}
 
	cat( # writing at the end of the file how many new functions were added to it
		paste("# A total of ", length(funs), " functions were written into", file, "\n"),
		append = TRUE, file = file
		)
	print(paste("A total of ", length(funs), " functions were written into", file))
}
 
# save.functions.from.env() # this is how you run it

Update: on Stack Overflow, Joshua Ulrich gave another solution to this challenge:

	newEnv <- new.env()
	load("myFunctions.Rdata", newEnv)
	dump(c(lsf.str(newEnv)), file="normalCodeFile.R", envir=newEnv)

He also suggested looking into ?prompt (which creates documentation files for objects) and/or ?package.skeleton.
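Joshua's load-then-dump approach can be tried end to end without touching any real project files. The following is only a minimal sketch: the object names (add_one, square, some_data) and the temporary file paths are made up for the illustration, and it builds its own small .RData file first so that it can run on its own.

```r
# Build a throwaway .RData file holding two functions and one data object.
# (All object and file names here are hypothetical, for illustration only.)
src_env <- new.env()
assign("add_one", function(x) x + 1, envir = src_env)
assign("square", function(x) x^2, envir = src_env)
assign("some_data", 1:10, envir = src_env)

rdata_file <- tempfile(fileext = ".RData")
save(list = ls(src_env), envir = src_env, file = rdata_file)

# Load the file into a fresh environment, so the global environment stays untouched.
new_env <- new.env()
load(rdata_file, envir = new_env)

# lsf.str() lists only the functions; dump() writes them out as R source code.
fun_names <- as.character(lsf.str(new_env))
script_file <- tempfile(fileext = ".R")
dump(fun_names, file = script_file, envir = new_env)

# Check the round trip: source the generated script into yet another environment.
check_env <- new.env()
sys.source(script_file, envir = check_env)
print(sort(ls(check_env)))     # only the functions should be listed
print(check_env$add_one(1))
```

Because the file is loaded into its own environment, data objects such as some_data are left out of the generated script, and nothing in your current session is overwritten.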

Using the {plyr} (1.2) package parallel processing backend with windows

Hadley Wickham has just announced the release of a new R package “reshape2” which is (as Hadley wrote) “a reboot of the reshape package”. Alongside, Hadley announced the release of plyr 1.2.1 (now faster and with support for parallel computation!).
Both releases are exciting due to a significant speed increase they have now gained.

Yet in the case of the new plyr package, an even more interesting new feature is the introduction of the parallel processing backend.

    A reminder of what the `plyr` package is all about

    (as written in Hadley’s announcement)

    plyr is a set of tools for a common set of problems: you need to __split__ up a big data structure into homogeneous pieces, __apply__ a function to each piece and then __combine__ all the results back together. For example, you might want to:

    • fit the same model to each patient subset of a data frame
    • quickly calculate summary statistics for each group
    • perform group-wise transformations like scaling or standardising

    It’s already possible to do this with base R functions (like split and the apply family of functions), but plyr makes it all a bit easier with:

    • totally consistent names, arguments and outputs
    • convenient parallelisation through the foreach package
    • input from and output to data.frames, matrices and lists
    • progress bars to keep track of long running operations
    • built-in error recovery, and informative error messages
    • labels that are maintained across all transformations

    Considerable effort has been put into making plyr fast and memory efficient, and in many cases plyr is as fast as, or faster than, the built-in functions.
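To make the split / apply / combine idea concrete, here is a small sketch that uses only base R (split, lapply and rbind), which is the route the announcement says is already possible without plyr. The data set and its column names are invented for the illustration.

```r
# A small made-up data set: three patients, four measurements each.
patients <- data.frame(
  patient = rep(c("a", "b", "c"), each = 4),
  dose    = rep(1:4, times = 3),
  outcome = c(1.1, 2.0, 2.9, 4.2,  0.5, 1.1, 1.4, 2.1,  2.0, 4.1, 5.9, 8.2),
  stringsAsFactors = FALSE
)

# split: one data frame per patient
pieces <- split(patients, patients$patient)

# apply: fit the same linear model to each piece and keep its coefficients
fits <- lapply(pieces, function(piece) {
  model <- lm(outcome ~ dose, data = piece)
  data.frame(patient   = piece$patient[1],
             intercept = unname(coef(model)[1]),
             slope     = unname(coef(model)[2]),
             stringsAsFactors = FALSE)
})

# combine: stack the per-patient results back into one data frame
result <- do.call(rbind, fits)
rownames(result) <- NULL
print(result)
```

With plyr, the three steps would typically collapse into a single call such as ddply(patients, "patient", ...), with the per-piece function passed as the last argument.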

    You can find out more at http://had.co.nz/plyr/, including a 20 page introductory guide, http://had.co.nz/plyr/plyr-intro.pdf.  You can ask questions about plyr (and data-manipulation in general) on the plyr mailing list. Sign up at http://groups.google.com/group/manipulatr

    What’s new in `plyr` (1.2.1)

    The exciting news about the release of the new plyr version is the added support for parallel processing.

    l*ply, d*ply, a*ply and m*ply all gain a .parallel argument that, when TRUE, applies functions in parallel using a parallel backend registered with the foreach package.

    The new package also has some minor changes and bug fixes, all can be read here.

    In the original announcement by Hadley, he gave an example of using the new parallel backend with the doMC package for Unix/Linux.  For Windows (the OS I’m using) you should use the doSMP package (as David mentioned in his post earlier today). However, this package is currently only released for “REvolution R” and not yet for R 2.11 (see more about it here).  But thanks to the kind help of Tao Shi, there is a solution for Windows users who want a parallel processing backend for plyr.

    All you need is to install the doSMP package, according to the instructions in the post “Parallel Multicore Processing with R (on Windows)“, and then use it like this:


    require(plyr) # make sure you have 1.2 or later installed
    x <- seq_len(20)
    wait <- function(i) Sys.sleep(0.1)
    system.time(llply(x, wait))
    # user system elapsed
    # 0 0 2
    require(doSMP)
    workers <- startWorkers(2) # My computer has 2 cores
    registerDoSMP(workers)
    system.time(llply(x, wait, .parallel = TRUE))
    # user system elapsed
    # 0.09 0.00 1.11

    Update (03.09.2012): the above code will no longer work with updated versions of R (R 2.15 etc.)

    Trying to run it will result in the error message:

    Loading required package: doSMP
    Warning message:
    In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
      there is no package called ‘doSMP’

    This is because trying to install the package will give the error message:

    > install.packages("doSMP")
    Installing package(s) into ‘D:/R/library’
    (as ‘lib’ is unspecified)
    Warning message:
    package ‘doSMP’ is not available (for R version 2.15.0)

    You can fix this by replacing the {doSMP} package with the {doParallel} + {foreach} packages. Here is how:

    if(!require(foreach)) install.packages("foreach")
    if(!require(doParallel)) install.packages("doParallel")
    # require(doSMP) # will no longer work...
    library(plyr) # needed for llply
    library(foreach)
    library(doParallel)
    workers <- makeCluster(2) # My computer has 2 cores
    registerDoParallel(workers)
     
    x <- seq_len(20)
    wait <- function(i) Sys.sleep(0.3)
    system.time(llply(x, wait)) # 6 sec
    system.time(llply(x, wait, .parallel = TRUE)) # 3.53 sec
     
    stopCluster(workers) # release the worker processes when done
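If you want to see the effect of the worker processes without installing anything, the same timing experiment can be sketched with the parallel package that ships with R. Note that this swaps plyr's .parallel argument for parLapply(), so it only illustrates the idea and is not the plyr backend itself. The number of workers (2) is an assumption about your machine.

```r
library(parallel)  # ships with R, no extra installation needed

x <- seq_len(20)
wait <- function(i) { Sys.sleep(0.1); i }  # sleeps, then returns its input

# Sequential baseline
time_seq <- system.time(res_seq <- lapply(x, wait))

# The same work spread over 2 worker processes (adjust to your machine)
workers <- makeCluster(2)
time_par <- system.time(res_par <- parLapply(workers, x, wait))
stopCluster(workers)  # always release the workers when done

print(time_seq["elapsed"])
print(time_par["elapsed"])
```

The results of both runs are the same list. Only the elapsed time should differ, since each worker handles roughly half of the sleeps.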

    Tips for the R beginner (a 5 page overview)

    In this post I publish a PDF document titled “A collection of tips for R in Finance”.
    It is a basic 5 page introduction to R in finances by Arnaud Amsellem (linked in profile).

    The article offers tips related to the following points:

    • Code Editor
    • Organizing R code
    • Update packages
    • Getting external data into R
    • Communicating with external applications
    • Optimizing R code

    This article is well articulated, and offers a perspective of someone who is experienced in the field and touches points that I can imagine beginners might otherwise overlook. I hope publishing it here will be of use to some readers out there.

    Update: as some readers have noted to me (by e-mail, and by commenting), this document touches very lightly on the topic of “finances” in R. I therefore decided to update the title from “R in finance – some tips for beginners” to its current form.

    Lastly: if you (a reader of this blog) feel you have an article (“post”) to contribute, but don’t feel like starting your own blog, feel welcome to contact me, and I’ll be glad to post what you have to say on my blog (and subsequently, also on R bloggers).

    Here is the article:
    Read more »

    Rose plot using Deducer’s ggplot2 plot builder

    The (excellent!) LearnR blog had a post today about making a rose plot in ggplot2.

    Following today’s announcement by Ian Fellows of the release of the new version of Deducer (0.4), which offers strong support for ggplot2 through a GUI plot builder, Ian also sent an e-mail in which he shows how to create a rose plot using the new ggplot2 GUI included in the latest version of Deducer. After the template is made, the plot can be generated with 4 clicks of the mouse.

    Here is a video tutorial (Ian published) to show how this can be used:

    The generated template file is available at:
    http://neolab.stat.ucla.edu/cranstats/rose.ggtmpl

    I am excited about the work Ian is doing, and hope to see more people publish use cases with Deducer.

    ggplot2 plot builder is now on CRAN! (through Deducer 0.4 GUI for R)

    Ian Fellows, a hard-working contributor to the R community (and a cool guy), announced today the release of Deducer (0.4) to CRAN (scheduled to update in the next day or so).
    This major update also includes the release of a new plug-in package (DeducerExtras), containing additional dialogs and functionality.

    Following is the e-mail he sent out with all the details and demo videos.

    Read more »

    Richard Stallman talk+Q&A at the useR! 2010 conference (audio files attached)

    The audio files of the full talk by Richard Stallman are attached to the end of this post.

    —————–

    Videos of all the invited talks of the useR! 2010 conference can be viewed on the R User Group blog

    —————–

    Last week I had the honor of attending the talk given by Richard Stallman, the last keynote speaker at the useR! 2010 conference.  In this post I will give a brief context for the talk, and then provide the audio files of the talk, with some description of what was said.

    Context for the talk

    Richard Stallman can be viewed as (one of) the fathers of free software (free as in speech, not as in beer).

    He is the man who led the GNU project for the creation of a free (as in speech, not as in beer) operating system, on the basis of which GNU-Linux, with its numerous distributions, was created.
    Richard also developed a number of pieces of widely used software, including the original Emacs, the GNU Compiler Collection, the GNU Debugger, and many tools in the GNU Coreutils.

    Richard also initiated the free software movement, and in October 1985 he founded its formal foundation (the Free Software Foundation). In 1989 he co-founded the League for Programming Freedom.

    Stallman pioneered the concept of “copyleft” and he is the main author of several copyleft licenses including the GNU General Public License, the most widely used free software license.

    You can read about him in the wiki article titled “Richard Stallman”.

    The useR! conference is an annual 4-day conference of the community of people using R.  R is free, open source software for data analysis and statistical computing (Here is a bit more about what is R).

    The conference this year was truly a wonderful experience for me.  I  had the pleasure of giving two talks (about which I will blog later this month), listened to numerous talks on the use of R, and had a chance to meet many (many) kind and interesting people.

    Richard Stallman’s talk

    The talk took place on July 23rd 2010 at NIST U.S.  and was the concluding talk for the useR2010 conference.  The talk consisted of a two hour lecture followed by a half-hour question and answer session.

    On a personal note, I was very impressed by Richard’s talk.  Richard is not a shy computer geek, but rather a serious leader and thinker trying to stir people to action.  His speech was a sermon on free software, the history of GNU-Linux, the various versions of the GPL, and his own history involving them.

    I believe this talk would be of interest to anyone who cares about social solidarity, free software, programming and the hope of a better world for all of us.

    I am eager for your thoughts in the comments (but please keep a kind tone).

    Here is Richard Stallman’s (2-hour) talk:

    Read more »

    Want to join the closed BETA of a new Statistical Analysis Q&A site – NOW is the time!

    The bottom line of this post is for you to go to:
    Stack Exchange Q&A site proposal: Statistical Analysis
    And commit yourself to using the website for asking and answering questions.

    (And also consider giving the contender, MetaOptimize, a visit)

    * * * *

    Statistical analysis Q&A website is about to go into BETA

    A month ago I invited readers of this blog to commit to using a new Q&A website for data analysis (based on the Stack Overflow engine) once it opens (the site was originally proposed by Rob Hyndman).
    And now, a month later, I am happy to write that over 500 people have shown interest in the website and chose to commit themselves. This means we have reached 100% completion of the website proposal process, and in the next few days we will move to the next step.

    The next step is that the website will go into closed BETA for about a week. If you want to be part of this – now is the time to join (<--- call for action people).
    Having taken part in other closed BETAs of similar projects, I can attest that the enthusiasm of the people trying to answer questions in the BETA is very impressive, so I strongly recommend the experience.

    If you don't make it by the time you see this post, no worries: about a week or so after the website goes online, it will be open to the general public.

    (p.s: thanks Romunov for pointing out to me that the BETA is about to open)

    p.s: MetaOptimize

    I would like to finish this post by mentioning MetaOptimize. This is a Q&A website with more of a “machine learning” than a “statistical” community. It also started out a short while ago, and it already has around 700 users who have submitted ~160 questions with ~520 answers given. From my experience on the site so far, I have enjoyed the high quality of the questions and answers.
    When I first came by the website, I feared that supporting it would split the R community of users between this website and the Area 51 StackExchange website.
    But after a lengthy discussion (published recently as a post) with the MetaOptimize founder, Joseph Turian, I came to have a more optimistic view of the competition between the two websites. Where at first I was afraid, I am now hopeful that each of the two websites will manage to draw slightly different communities of people (who otherwise wouldn’t be present on the other website), thus offering all of us a wider variety of knowledge to tap into.

    See you there…

    New versions for ggplot2 (0.8.8) and plyr (1.0) were released today

    As rich in packages as the CRAN website is, there are several R packages that succeed in standing out for their widespread use (and quality). Hadley Wickham’s ggplot2 and plyr are two such packages.
    And today (through Twitter) Hadley has updated the rest of us with the news:

    just released new versions of plyr and ggplot2. source versions available on cran, compiled will follow soon #rstats

    Going to the CRAN website shows that plyr has gone through the more major update of the two, with the last update (before the current one) taking place on 2009-06-23. And now, over a year later, we are presented with plyr version 1, which includes new functions, new features, some bug fixes and much anticipated speed improvements.
    ggplot2, has made a tiny leap from version 0.8.7 to 0.8.8, and was previously last updated on 2010-03-03.

    I am (and I am sure many other R users are) very thankful for the amazing work that Hadley Wickham is doing (both on his code, and in helping other useRs on the help lists). So Hadley, thank you!

    Here is the complete change-log list for both packages:
    Read more »

    Visualization of regression coefficients (in R)

    Update (07.07.10): The function in this post has a more mature version in the “arm” package. See at the end of this post for more details.
    * * * *

    Imagine you want to give a presentation or report of your latest findings running some sort of regression analysis. How would you do it?

    This was exactly the question Wincent Rong-gui HUANG has recently asked on the R mailing list.

    One person, Bernd Weiss, responded by linking to the chapter “Plotting Regression Coefficients” in an interesting online book (which I had never heard of before) called “Using Graphs Instead of Tables” (I should add this link to the free statistics e-books list…)

    Later in the conversation, Achim Zeileis surprised us (well, me) by saying the following:

    I’ve thought about adding a plot() method for the coeftest() function in the “lmtest” package. Essentially, it relies on a coef() and a vcov() method being available – and that a central limit theorem holds. For releasing it as a general function in the package the code is still too raw, but maybe it’s useful for someone on the list. Hence, I’ve included it below.

    (I allowed myself to add some bolds in the text)

    So for the convenience of all of us, I uploaded Achim’s code in a file for easy access. Here is an example of how to use it:

    source("http://www.r-statistics.com/wp-content/uploads/2010/07/coefplot.r.txt")
     
    data("Mroz", package = "car")
    fm <- glm(lfp ~ ., data = Mroz, family = binomial)
    coefplot(fm, parm = -1)

    Here is the resulting graph:

    I hope Achim will get around to improving the function so that he might think it worthy of joining his “lmtest” package. I am glad he shared his code for the rest of us to have something to work with in the meantime :)
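To illustrate the idea Achim describes (a coef() and a vcov() method plus a normal approximation), here is a minimal base R sketch. It is not Achim's coefplot code: the function names coef_intervals and plot_coef_intervals are made up for this example, and a simple lm() fit on the built-in mtcars data stands in for the Mroz model.

```r
# Wald-type intervals from coef() and vcov(), assuming approximate normality.
# (coef_intervals and plot_coef_intervals are hypothetical helper names.)
coef_intervals <- function(model, level = 0.95) {
  est <- coef(model)
  se  <- sqrt(diag(vcov(model)))
  z   <- qnorm(1 - (1 - level) / 2)
  data.frame(term     = names(est),
             estimate = unname(est),
             lower    = unname(est - z * se),
             upper    = unname(est + z * se),
             stringsAsFactors = FALSE)
}

# Draw each estimate as a point with a horizontal interval line.
plot_coef_intervals <- function(ci) {
  k <- nrow(ci)
  plot(ci$estimate, seq_len(k), xlim = range(ci$lower, ci$upper),
       yaxt = "n", pch = 19, xlab = "Estimate", ylab = "")
  axis(2, at = seq_len(k), labels = ci$term, las = 1)
  segments(ci$lower, seq_len(k), ci$upper, seq_len(k))
  abline(v = 0, lty = 2)  # reference line at zero
}

fm <- lm(mpg ~ wt + hp, data = mtcars)
ci <- coef_intervals(fm)
print(ci)

# Plot everything except the intercept (only when run interactively).
if (interactive()) plot_coef_intervals(ci[-1, ])
```

Any model class that provides coef() and vcov() methods can be passed to coef_intervals(), which is the generality Achim points to in his message.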

    * * *

    Update (07.07.10):
    Thanks to a comment by David Atkins, I found out there is a more mature version of this function (called coefplot) inside the {arm} package. This version offers many features, one of which is the ability to easily stack several confidence intervals one on top of the other.

    It works for bayesglm, glm, lm and polr objects, and a default method is available which takes pre-computed coefficients and associated standard errors from any suitable model.

    Example:
    (Notice that the Poisson model in comparison with the binomial models does not make much sense, but is enough to illustrate the use of the function)

    library("arm")
    data("Mroz", package = "car")
    M1<-      glm(lfp ~ ., data = Mroz, family = binomial)
    M2<- bayesglm(lfp ~ ., data = Mroz, family = binomial)
    M3<-      glm(lfp ~ ., data = Mroz, family = binomial(probit))
    coefplot(M2, xlim=c(-2, 6),            intercept=TRUE)
    coefplot(M1, add=TRUE, col.pts="red",  intercept=TRUE)
    coefplot(M3, add=TRUE, col.pts="blue", intercept=TRUE, offset=0.2)

    (hat tip goes to Allan Engelhardt for help improving the code, and to Achim Zeileis for extending and improving the narration for the example)

    Resulting plot

    * * *
    Lastly, another method worth mentioning is the nomogram, implemented in Frank Harrell’s rms package.

    Contest: Road Traffic Prediction for Intelligent GPS Navigation

    About prize-bearing contests

    Competitions with prizes are an amazing thing. If you are not sure of that, I urge you to listen to Peter Diamandis talk about his experience with the X Prize (start listening at minute 11:40):

    In short: prizes can give up to a 1 to 50 ratio of return on investment for the people funding the prize. The money is spent only when results are achieved. And there is a lot of value in terms of public opinion and publicity. And best of all (for the promoter of the competition), prizes encourage people to take risks (at their own expense) in order to get results.

    All of that said, I look at prize-bearing competitions as something worth spreading, especially in cases where the results of the winning team will be shared with the public.

    About the IEEE ICDM Contest

    The IEEE ICDM Contest (“Road Traffic Prediction for Intelligent GPS Navigation”), seems to be one of those cases. Due to a polite request, I am republishing here the details of this new competition, in the hope that some of my R colleagues will bring the community some pride :)
    Read more »