Archive for the ‘R community’ Category

Blogging about R – presentation and audio

At the useR!2010 conference I had the honor of giving a (~15 minute) talk titled “Blogging about R”. The following is the abstract I submitted, followed by the slides of the talk and an audio recording I made of it (I am sad it picked up a bit of “hall echo”, but it’s still listenable…)

P.S: this post does not absolve me from writing up something (with many thanks and links to people) about the useR2010 conference, but I can see it taking a bit longer till I do that.

—————–

Abstract of the talk

This talk is a basic introduction to blogs: why to blog, how to blog, and the importance of the R blogosphere to the R community.

Because R is an open-source project, the R community members rely (mostly) on each other’s help for statistical guidance, generating useful code, and general moral support.

Current online tools available for us to help each other include the R mailing lists, the community R-wiki, and the R blogosphere. The emerging R blogosphere is the only source, besides the R journal, that provides our community with articles about R. While these articles are not peer reviewed, they do come in higher volume (and often are of very high quality).

According to the meta-blog R-bloggers.com, the (English) R blogosphere produced, in January 2010, about 115 “articles” about R. There are (currently) a bit over 50 bloggers (now about 100) who write about R, with about 1000 (now ~2200) subscribers who read them daily (through e-mails or RSS). These numbers lead me to believe that there is a genuine interest in our community for more people – perhaps you? – to start (and continue) blogging about R.

In this talk I intend to share knowledge about blogging so that more people are able to participate (freely) in the R blogosphere – both as readers and as writers. The talk will have three main parts:

  • What is a blog
  • How to blog – using the (free) blogging service WordPress.com (with specific emphasis on R)
  • How to develop readership – integration with other social media/network platforms, SEO, and other best practices

* * *
Tal Galili founded www.R-bloggers.com and blogs on www.R-statistics.com
* * *

Audio recording of the talk

(more…)


Richard Stallman talk+Q&A at the useR! 2010 conference (audio files attached)

The current hosting provider of the files couldn’t handle the work load.
I am now moving the file to a different (hopefully more robust) hosting solution.
Please come back in an hour or so to download the files.

The files are online again!
(The audio files of the full talk by Richard Stallman are attached to the end of this post.)

—————–

Last week I had the honor of attending the talk given by Richard Stallman, the last keynote speaker at the useR! 2010 conference. In this post I will give a brief context for the talk, and then provide the audio files of the talk, with some description of what was said in it.

Context for the talk

Richard Stallman can be viewed as (one of) the fathers of free software (free as in speech, not as in beer).

He is the man who led the GNU project for the creation of a free (as in speech, not as in beer) operating system, on the basis of which GNU/Linux, with its numerous distributions, was created.
Richard also developed a number of pieces of widely used software, including the original Emacs, the GNU Compiler Collection, the GNU Debugger, and many tools in the GNU Coreutils.

Richard also initiated the free software movement; in October 1985 he founded its formal organization, the Free Software Foundation, and in 1989 he co-founded the League for Programming Freedom.

Stallman pioneered the concept of “copyleft” and he is the main author of several copyleft licenses including the GNU General Public License, the most widely used free software license.

You can read about him in the wiki article titled “Richard Stallman”.

The useR! conference is an annual four-day conference of the community of people using R. R is free, open-source software for data analysis and statistical computing (Here is a bit more about what is R).

The conference this year was truly a wonderful experience for me.  I  had the pleasure of giving two talks (about which I will blog later this month), listened to numerous talks on the use of R, and had a chance to meet many (many) kind and interesting people.

Richard Stallman’s talk

The talk took place on July 23rd 2010 at NIST in the U.S. and was the concluding talk of the useR! 2010 conference. The talk consisted of a two-hour lecture followed by a half-hour question and answer session.

On a personal note, I was very impressed by Richard’s talk. Richard is not a shy computer geek, but rather a serious leader and thinker trying to stir people to action. His speech was a sermon on free software, the history of GNU/Linux, the various versions of the GPL, and his own history involving them.

I believe this talk would be of interest to anyone who cares about social solidarity, free software, programming and the hope of a better world for all of us.

I am eager for your thoughts in the comments (but please keep a kind tone).

Here is Richard Stallman’s (2-hour) talk:

(more…)


Want to join the closed BETA of a new Statistical Analysis Q&A site – NOW is the time!

The bottom line of this post is for you to go to:
Stack Exchange Q&A site proposal: Statistical Analysis
And commit yourself to using the website for asking and answering questions.

(And also consider giving the contender, MetaOptimize, a visit)

* * * *

Statistical analysis Q&A website is about to go into BETA

A month ago I invited readers of this blog to commit to using a new Q&A website for Data-Analysis (based on the StackOverFlow engine) once it opens (the site was originally proposed by Rob Hyndman).
And now, a month later, I am happy to write that over 500 people have shown interest in the website and chose to commit themselves. This means we have reached 100% completion of the website proposal process, and in the next few days we will move to the next step.

The next step is that the website will go into closed BETA for about a week. If you want to be part of this – now is the time to join (<--- call to action, people).
Having taken part in other closed BETAs of similar projects, I can attest that the enthusiasm of the people trying to answer questions in the BETA is very impressive, so I strongly recommend the experience.

If you don't make it by the time you see this post, then no worries - about a week or so after the website goes online, it will be open to the general public.

(p.s: thanks Romunov for pointing out to me that the BETA is about to open)

p.s: MetaOptimize

I would like to finish this post by mentioning MetaOptimize. This is a Q&A website whose community is more “machine learning” than “statistical”. It also started out a short while ago, and it already has around 700 users who have submitted ~160 questions with ~520 answers given. From my experience on the site so far, I have enjoyed the high quality of the questions and answers.
When I first came by the website, I feared that supporting it would split the R community of users between this website and the area 51 StackExchange website.
But after a lengthy discussion (published recently as a post) with the MetaOptimize founder, Joseph Turian, I came to have a more optimistic view of the competition between the two websites. Where at first I was afraid, I am now hopeful that each of the two websites will manage to draw slightly different communities of people (who otherwise wouldn’t be present on the other website) – thus offering all of us a wider variety of knowledge to tap into.

See you there…


StackOverFlow and MetaOptimize are battling to be the #1 “Statistical Analysis Q&A website” – which would you sign up for?

A new statistical analysis Q&A website launched

While the proposal for a statistical analysis Q&A website on area51 (stackexchange) is taking its time, and the website is still collecting people who will commit to it, Joseph Turian (who seems a nice guy, judging from his various comments online) seems to feel that this website is not what the community needs, and that we shouldn’t hold back our questions until it goes online. Therefore, Joseph is pushing with all his might his newest creation, “MetaOptimize QA”, a StackOverFlow-like website for (long list follows): machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization.
It comes with all the bells and whistles that the OSQA framework (an open source stackoverflow clone, and more) can offer (you know, rankings, badges and so on).

Is this new website better than the area51 website? Will all the people go to just one of the two websites, or will we end up with two places that attract more people than we had to begin with? These are the questions that come to mind when faced with the story in front of us.

My own suggestion is to try both websites (the stackoverflow statistical analysis website to come and “MetaOptimize QA”) and let time tell.

More info on this story below.

MetaOptimize’s online impact so far

The need for such a Q&A site is clearly evident. Just several days after being promoted online, MetaOptimize has claimed the eyes of almost 300 users, who submitted 59 questions and 129 answers.
Already many bloggers in the statistical community have contributed their voices with encouraging posts. Here is just a collection of the posts I was able to find with some googling:

But is it good to have two websites?

But wait, didn’t we just start pushing forward another statistical Q&A website two weeks ago?  I am talking about the Stack Exchange Q&A site proposal: Statistical Analysis.

So what should we (the community of statistically minded people) do the next time we have a question?

Should we wait for Stack Exchange’s offer of a new website to start? Or should we start using MetaOptimize?

Update: after a lengthy e-mail exchange with Joseph (the person who founded MetaOptimize), I decided to erase what I originally wrote as my doubts, and instead give a Q&A session that he and I had over e-mail. It is a bit edited from the original, and some of the content will probably get updated – so if you are into this subject, check in again in a few hours :)


Honestly, I am split in two (and Joseph, I do hope you’ll take this in a positive way, since personally I feel confident you are a good guy). I very strongly believe in the need for and value of such a Q&A website. Yet I am wondering how I feel about such a website being hosted as MetaOptimize and outside the hands of the stackoverflow guys.
On the one hand, open source lovers (like myself) tend to like decentralization and reliance on OSS (open source software) solutions (such as the one the OSQA framework offers). On the other hand, I do believe that the stackoverflow people have (much) more experience in handling such websites than Joseph. I can very easily trust them to do regular database backups, share the websites’ database dumps with the general community, smoothly test and upgrade to provide new features, and generally speaking perform in a more experienced way with the online Q&A community.
It doesn’t mean that Joseph won’t do a great job; personally I hope he will.

Q&A session with Joseph Turian (MetaOptimize founder)

Tal: Let’s start with the easy question, should I worry about technical issues in the website (like, for example, backups)?

Joseph:

The OSQA team (backed by DZone) have got my back. They have been very helpful since day one to all OSQA users, and have given me a lot of support. Thanks, especially Rick and Hernani!

They provide email and chat support for OSQA users.

I will commit to putting up regular automatic database dumps, whenever the OSQA team implements it:
http://meta.osqa.net/questions/3120/how-do-i-offer-database-dumps
If, in six months, they don’t have this feature as part of their core, and someone (e.g. you) emails me reminding me that they want a dump, I will manually do a database dump and strip the user table.

Also, I’ve got a scheduled daily database dump that is mirrored to Amazon S3.

Tal: Why did you start MetaOptimize instead of supporting the area51 proposal?
Joseph:

  1. On Area51, people asked to have AI merged with ML, and ML merged with statistical analysis, but their requests seemed to be ignored. This seemed like a huge disservice to these communities.
  2. Area 51 didn’t have academics in ML + NLP. I know from experience it’s hard to get them to buy in to new technology. So why would I risk my reputation getting them to sign up for Area 51, when I know that I will get a 1% conversion? They aren’t early adopters interested in the process; many are late adopters who won’t sign up for something until they have to.
  3. If the Area 51 sites had a strong newbie bent, which is what it seemed like the direction was going, then the academic experts definitely wouldn’t waste their time. It would become a support community for newbies, without core expert discussion.

So basically, I know that I and a lot of my colleagues wanted the site I built. And I felt like area 51 was shaping the communities really incorrectly in several respects, and was also taking a while. I could have fought an institutional process and maybe gotten half the results above after a few months, or I could just build the site and invite my friends, and shape the community correctly.

Besides that, there are also personal motives:

  • I wanted the recognition for having a good vision for the community, and driving forward something they really like.
  • I wanted to experiment with some NLP and ML extensions for the Q+A software, to help organize the information better. Not possible on a closed platform.

Tal: I (and maybe some other people) fear that this might fork the people in the field into two websites, instead of bringing them together. What are your thoughts about that?
Joseph:
How am I forking the community? I’m bringing a bunch of people in who wouldn’t have even been part of the Area 51 community.
Area 51 was going to fork it into five communities: stat analysis, ML, NLP, AI, and data mining. And then a lot fewer people would have been involved.

Tal: What are the things that people who support your website are saying?
Joseph:
Here are some quotes about my site:

Philip Resnick (UMD): “Looking at the questions being asked, the people responding, and the quality of the discussion, I can already see this becoming the go-to place for those ‘under the hood’ details you rarely see in the textbooks or conference papers. This site is going to save a lot of people an awful lot of time and frustration.”

Aria Haghighi (Berkeley): “Both NLP and ML have a lot of folk wisdom about what works and what doesn’t. A site like this is crucial for facilitating the sharing and validation of this collective knowledge.”

Alexandre Passos (Unicamp): “Really thank you for that. As a machine learning phd student from somewhere far from most good research centers (I’m in brazil, and how many brazillian ML papers have you seen in NIPS/ICML recently?), I struggle a lot with this folk wisdom. Most professors around here haven’t really interacted enough with the international ML community to be up to date”
(http://news.ycombinator.com/item?id=1476247)

Ryan McDonald (Google): “A tool like this will help disseminate and archive the tricks and best practices that are common in NLP/ML, but are rarely written about at length in papers.”

esoom on Reddit: “This is awesome. I’m really impressed by the quality of some of the answers, too. Within five minutes of skimming the site, I learned a neat trick that isn’t widely discussed in the literature.”
(http://www.reddit.com/r/MachineLearning/comments/ckw5k/stackoverflow_for_machine_learning_and_natural/c0tb3gc)

Tal: To be fair to the area51 work, they have gotten wonderful responses for the “statistical analysis” proposal as well (see it here).
I have also contacted area51 directly and invited them to come and join the discussion. I’ll update this post with their reply.

So what’s next?

I don’t know.
If the Stack Exchange website were to launch today, I would probably focus on using it and hint at the MetaOptimize site (for the reasons I just mentioned, and also for some that Rob Hyndman raised when he first wrote on the subject).
If the stack exchange version of the website were to start in a few weeks, I would probably sit on the fence and see if people are using it. I suspect that by that time, there wouldn’t be many people left to populate it (but I could always be wrong).
And what if the website were to start in a week, what then? I have no clue.

Good question.
My current feeling is that I am glad to let this play out.
It seems this is a good case study for some healthy competition between platforms and models (OSQA vs stackoverflow/area51-system) – one that I hope will generate more good features from both companies. And also will make both parties work hard to get people to participate.
It also seems that this situation is getting many people in our field to be approached with the same idea (a Q&A website). After Joseph’s input on the subject, I am starting to think that maybe at the end of the day this will benefit all of us. Instead of forking one community into two, maybe what we’ll end up with is more (experienced) people online (in two locations) who would otherwise have stayed in the shadows.

The verdict is still out, but I am a bit more optimistic than I was when first writing this post. I’ll update this post after getting more input from people.

And as always – I would love to know your thoughts on the subject.


A new Q&A website for Data-Analysis (based on StackOverFlow engine) – is waiting for you

The bottom line of this post is for you to go to:
Stack Exchange Q&A site proposal: Statistical Analysis
And commit yourself to using the website for asking and answering questions.
144 people have already committed to using the website, we need 356 more… :-)
If you are looking for the reasons to do so – read on…

What is the StackOverFlow Q&A website about?

StackOverFlow.com (“SO” for short) is a programming Q & A site that’s free. Free to ask questions, free to answer questions, free to read. Free. And fast.

For the R community, SO offers a growing database of R-related questions and answers (click the link to check them out).

You might be asking yourself what’s so special about SO over other available resources such as R mailing lists, R blogs, R wiki and so on?
That is a great question.

The answer is that SO succeeds in doing a great job synthesizing aspects of Wikis, Blogs, Forums, and Digg/Reddit to offer a very powerful Q&A website.

In SO, new questions are like forum/blog posts (a main text with comments/answers). After someone answers a question, other users can give the answer a thumbs-up or a thumbs-down (like digg/reddit). And all content can be edited, like a wiki page, by the users (provided the user has enough “karma points”).
You also get badges (“awards”) for a bunch of actions (like coming to the website every day for a month, giving an answer that got X thumbs-up, and so on). The awards allow someone who is asking a question to see how good a reputation the person answering has (in terms of acceptance/appreciation of that person’s answers by other SO members).
It also offers a small (but effective) ego-boost for the person who gives answers.

So if StackOverFlow is so great – what is this new website you wrote about in the title?

Well, StackOverFlow has one limitation. It deals ONLY with programming questions. Other questions like:

  • Which of the following three graphics best displays this data set? Why?
  • Can you give an example of where I might prefer to use a z-test vs a t-test?
  • What is the relationship between Bayesian and neural networks?

will not be answered, and the threads will get closed as being “off topic”. Why? Because such questions deal with statistics, data analysis, data mining, and data visualization, but by no means with programming.

So there is no StackOverFlow-like Q&A website for data analysis… Until now!

In the past few weeks, Rob Hyndman and other users have put much effort into pushing for the creation of a new website, based on the StackOverFlow engine, to allow for statistics-related Q&A.
His proposal for a new website is almost complete. All it needs is for you (yes you) to go to the following link:
Stack Exchange Q&A site proposal: Statistical Analysis
And commit yourself to the website (that is, click the button called “commit” to declare that you are interested in reading, asking and answering questions on such a website)

Once a few hundred more people commit – the website will go online!

Hope to see you there.


June 20, online Registration deadline for useR! 2010

useR!2010 is coming. I am going to give two talks there (I will write more about that soon), but in the meantime, please note that the online registration deadline is approaching.

This was published on the R-help mailing list today:

————-

The final registration deadline for the R User Conference is June 20,
2010, one week away.  Later registration will not be possible on site!

Conference webpage:  http://www.R-project.org/useR-2010
Conference program: http://www.R-project.org/useR-2010/program.html

Registration:
http://www.R-project.org/useR-2010/registration/registration.html

The conference is scheduled for July 21-23, 2010, and will take place at
the campus of the National Institute of Standards and Technology (NIST) in
Gaithersburg, Maryland, USA.

(more…)


useR-2010 is looking for a T-shirt design

Katharine Mullen has just published on the R mailing list a call for designeRs who might be willing to create the aRt for the T-shirt that will be given out at useR 2010.

I consider such contests as one of those good-for-the-community things, and hope regular useRs, R bloggers, and companies that are based on R – will consider spreading the word, participating in it (and maybe even offer more bonuses to the designers).

If you design something and put it on picasa or flickr, please tag it with “useR2010Tshirt” (and consider leaving a comment with a link to the design), so there could later be a follow up on your work. Even if you don’t “win” you will get positive “karma points” from the community :-) .

Here are the competition details, as published in the mailing list:
(more…)


An article attacking R gets responses from the R blogosphere – some reflections

In this post I reflect on the current state of the R blogosphere, and share my hopes for its future.

* * *

Background

I am very grateful to Dr. AnnMaria De Mars for writing her post “The Next Big Thing”.
In her post, Dr. De Mars attacked R by accusing it of being “an epic fail” (in being user-friendly) and “NOT the next big thing”. Of course one should look at Dr. De Mars’s claims in their context. She is talking about particular aspects in which R fails (the lack of a mature GUI for non-statisticians), and has her own (very legitimate) take on where to look for “the next big thing”. All in all, her post was decent, and worth contemplating respectfully (even if one, me for example, doesn’t agree with all of Dr. De Mars’s claims.)

R bloggers are becoming a community

But Dr. De Mars’s post is (very) important for a different reason. Not because her claims are true or false, but because her writing angered people who love and care for R (whether legitimately or not, it doesn’t matter). Anger, being a very powerful emotion, can reveal interesting things. In our case, it just showed that R bloggers are connected to each other.

So far there are 69 R bloggers who wrote in reply to Dr. De Mars’s post (some more kind than others), they are:

  • R and the Next Big Thing by David Smith

This is good news, since it shows that R has a community of people (not “just people”) who write about it.
In one of the posts, someone commented about how R’s current stage reminds him of how Linux was in 1998, and how he believes R will grow to be amazingly dominant in the next 10 years.
In the same way, I feel the R blogosphere is just now starting to “wake up” and become aware that it exists. Already 6 bloggers found they can write not just about R code, but also reply to those who “attack” R (in their view). Imagine how the R blogosphere might look a few years from now…

I would like to end with a more general note about the importance of R bloggers’ collaboration to the R ecosystem.

(more…)


The “Future of Open Source” Survey – an R user’s thoughts and conclusions

Over a month ago, David Smith published a call for people to participate in the “Future of Open Source” Survey. 550 people (and me) took the survey, and today I got an e-mail with the news that the 2010 survey results have been analysed and were published on the “Future.Of.Open.Source blog” in the following (38-slide) presentation:

I would like to thank Bryan House and anyone else who took part in making this survey, analyzing it and publishing its results.

The presentation has left me with some thoughts and conclusions, which I would like to share with you here.

(more…)


Announcing R-bloggers.com: a new R news site (for bloggers by bloggers)

I already wrote about R-bloggers on the R mailing list, so it only seems fitting to write about it more here. I will explain what R-bloggers is and then move on to explain what I hope it will accomplish.

R-Bloggers.com is a central hub of content collected from bloggers who write about R (in English), and if you are an R blogger you can join it by filling in this form.

I built the site with the aspiration of helping R bloggers and users to connect and follow the “R blogosphere”. As I write these words, R-bloggers already has 17 blogs in it, and I hope for many (many) more.

How does R-Bloggers operate? This site aggregates feeds (only with permission!) from participating R blogs. The beginnings of each participating blog’s posts will automatically be displayed on the main page with links to the original posts; inside every post there is a link to the original blog and links to other related articles. All participating blogs also have links in the “Contributors” section of our sidebar.
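
For readers curious how this kind of aggregation can work, here is a minimal sketch in Python. It is only an illustration and not the code R-bloggers actually runs: it assumes each participating blog exposes a plain RSS 2.0 feed, and the names `make_teasers`, `truncate` and `SAMPLE_FEED` (as well as the example.org addresses) are made up for this example. It takes a feed, keeps the beginning of each post, and pairs it with the link back to the original post.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed standing in for a participating blog.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example R blog</title>
    <link>http://example.org/</link>
    <item>
      <title>Plotting with base graphics</title>
      <link>http://example.org/plotting</link>
      <description>This post walks through a few base graphics tricks for quick exploratory plots.</description>
    </item>
    <item>
      <title>Short note</title>
      <link>http://example.org/note</link>
      <description>Tiny post.</description>
    </item>
  </channel>
</rss>"""


def truncate(text, max_chars):
    """Keep only the beginning of a post, cutting at a word boundary."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."


def make_teasers(rss_xml, max_chars=40):
    """Return one teaser per post, each linking back to the original."""
    channel = ET.fromstring(rss_xml).find("channel")
    blog = channel.findtext("title", default="")
    teasers = []
    for item in channel.findall("item"):
        teasers.append({
            "blog": blog,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "teaser": truncate(item.findtext("description", default=""), max_chars),
        })
    return teasers


if __name__ == "__main__":
    for entry in make_teasers(SAMPLE_FEED):
        print(entry["title"], "|", entry["teaser"], "|", entry["link"])
```

A real aggregator would also have to fetch the feeds over the network, handle Atom as well as RSS, and remember which posts it has already shown; those parts are left out here to keep the sketch short.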

What does R-Bloggers offer its visitors?

  • Discover (for all): Find new R blogs you didn’t know about, and search them for content you want.
  • Follow (for people who don’t use RSS): Enter your e-mail and subscribe to receive a daily digest with teasers of new posts from participating blogs. You will more easily get a sense of hot topics in the R blogosphere.
  • Connect (for facebook users): Click on “Fan this site” to become a “fan” of R Bloggers. You can then “friend” other people and share thoughts on our wall, or just leave comments on the blog.
  • Participate (for bloggers): Add your R blog to get increased visibility (for readers and search engines) with permanent links on our Contributors sidebar. Your blog will also gain visibility via our e-mail digest and through your presence on the main page with posts.

Who started R-Bloggers (and why)? R-Bloggers was started by Tal Galili (well, me). After searching for numerous R blogs I decided that there must be more R blogs out there than I know about, and that maybe the best way of finding them is to make them find me.

After writing about it on the R mailing list, I got some good feedback, but also questions: why use only R blogs and not all the R feeds that exist? Who is the website actually for (when there are services like Google Reader for us to read our feeds with)? And what am I hoping it will do? So here is what I answered:

For me there are two audiences:
One is that of the web 2.0 power users. That is, people who know what RSS is and use it, and maybe even write their own blogs. These people have only one problem (as I see it) that R-bloggers tries to solve, and that is to know who else lives in their ecosystem, and who else they should follow.
For that, Google Reader’s recommendation system is great, but not enough. A much better system would be one place where all R bloggers could go and write down their website, so that all of us would know they exist. That is what R-bloggers offers the power users. I think this is also why over 20 of them subscribed to the site’s RSS feed.
BTW, the idea first came to me when I was trying to find all the dance bloggers for my wife (who is a dance researcher and blogger herself). After a while we started http://www.dancebloggers.com/ while knowing of only 10 bloggers. The list now has over 80 bloggers, most of whom we would not have known about without this hub.
I am trying to do the same thing for the R community. That is why I hope more R bloggers will write about the service – so that their network of readers, which includes other R bloggers, would add themselves and we would all know about them.
If that was my only purpose, a simple directory would have been enough. But I also have a second one, and that is to help the second audience.

The second audience I am thinking of are people in our community who are not so much early adopters (and actually quite late adopters) of the new facilities that the new web (a.k.a: web 2.0) provides.
To them the whole RSS thing is too much to look at, and they are used to e-mails. Because of that they are (until now) disconnected from many of the R bloggers out there, simply because it is inefficient for them to go through all these blogs each day (or even week). So for them, seeing all the content in one place (and even getting an e-mail about it) would be (I hope) a service. I believe that’s why 5 of them (so far) have subscribed via e-mail.
I also hope teachers will direct their students to this as a resource for getting a sense of what people who are using R are doing.
Another hint about the R community is that the “facebook fan box” is still empty, which tells me that (sadly) very few R users are actively using facebook as a means of connecting with the outer networks of people out there.

Everything I wrote also explains why R-bloggers will only take feeds of bloggers, and only (as much as possible) their posts that are centered around R (hence the website name :) ).
It follows both what Gabor talked about – having a site whose content is only about R – and what I wish, which is to have “content” in the sense of articles to read (mostly), and not so much things like news feeds of wikipedia or new packages published.

I hope this post will notify people about this new resource, encourage more R bloggers to join, and help future readers better understand what this R-Bloggers thing is all about :-)