StackOverflow and MetaOptimize are battling to be the #1 “Statistical Analysis Q&A website” – with whom would you sign up?

A new statistical analysis Q&A website launched

While the proposal for a statistical analysis Q&A website on area51 (stackexchange) is taking its time, and the website is still collecting people who will commit to it,
Joseph Turian, who seems a nice guy from his various comments online, seems to feel that this website is not what the community needs and that we shouldn’t hold back our questions until it goes online. Therefore, Joseph is pushing with all his might his newest creation, “MetaOptimize QA”, a StackOverflow-like website for (long list follows): machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization.
It comes with all the bells and whistles that the OSQA framework (an open source StackOverflow clone, and more) can offer (you know, rankings, badges and so on).

Is this new website better than the area51 website? Will all the people go to just one of the two websites, or will we end up with two places that attract more people than we had to begin with? These are the questions that come to mind when faced with the story in front of us.

My own suggestion is to try both websites (the stackoverflow statistical analysis website to come and “MetaOptimize QA“) and let time tell.

More info on this story below.

MetaOptimize online impact so far

The need for such a Q&A site is clearly evident. Just several days after being promoted online, MetaOptimize has caught the eyes of almost 300 users, who have submitted 59 questions and 129 answers.
Already many bloggers in the statistical community have contributed their voices with encouraging posts. Here is a collection of the posts I was able to find with some googling:

But is it good to have two websites?

But wait, didn’t we just start pushing forward another statistical Q&A website two weeks ago?  I am talking about the Stack Exchange Q&A site proposal: Statistical Analysis.

So what should we (the community of statistically minded people) do the next time we have a question?

Should we wait for the new website offered by Stack Exchange to start?  Or should we start using MetaOptimize?

Update: after a lengthy e-mail exchange with Joseph (the person who founded MetaOptimize), I decided to erase what I originally wrote as my doubts, and instead present a Q&A session that he and I had over e-mail.  It is lightly edited from the original, and some of the content will probably get updated – so if you are into this subject, check in again in a few hours 🙂


Honestly, I am split in two (and Joseph, I do hope you’ll take this in a positive way, since personally I feel confident you are a good guy).  I very strongly believe in the need and value of such a Q&A website.  Yet I am wondering how I feel about such a website being hosted as MetaOptimize and outside the hands of the stackoverflow guys.
On the one hand, open source lovers (like myself) tend to like decentralization and reliance on OSS (open source software) solutions (such as the one the OSQA framework offers).  On the other hand, I do believe that the stackoverflow people have (much) more experience in handling such websites than Joseph.  I can very easily trust them to do regular database backups, share the website’s database dumps with the general community, smoothly test and upgrade to provide new features, and generally speaking perform in a more experienced way with the online Q&A community.
It doesn’t mean that Joseph won’t do a great job. Personally, I hope he will.

Q&A session with Joseph Turian (MetaOptimize founder)

Tal: Let’s start with the easy question, should I worry about technical issues in the website (like, for example, backups)?

Joseph:

The OSQA team (backed by DZone) have got my back. They have been very helpful since day one to all OSQA users, and have given me a lot of support. Thanks, especially Rick and Hernani!

They provide email and chat support for OSQA users.

I will commit to putting up regular automatic database dumps, whenever the OSQA team implements it:
http://meta.osqa.net/questions/3120/how-do-i-offer-database-dumps
If, in six months, they don’t have this feature as part of their core, and someone (e.g. you) emails me reminding me that they want a dump, I will manually do a database dump and strip the user table.

Also, I’ve got a scheduled daily database dump that is mirrored to Amazon S3.

Tal: Why did you start MetaOptimize instead of supporting the area51 proposal?
Joseph:

  1. On Area51, people asked to have AI merged with ML, and ML merged with statistical analysis, but their requests seemed to be ignored. This seemed like a huge disservice to these communities.
  2. Area 51 didn’t have academics in ML + NLP. I know from experience it’s hard to get them to buy in to new technology. So why would I risk my reputation getting them to sign up for Area 51, when I know that I will get a 1% conversion? They aren’t early adopters interested in the process; many are late adopters who won’t sign up for something until they have to.
  3. If the Area 51 sites had a strong newbie bent, which is what it seemed like the direction was going, then the academic experts definitely wouldn’t waste their time. It would become a support community for newbies, without core expert discussion.

So basically, I know that I and a lot of my colleagues wanted the site I built. And I felt like area 51 was shaping the communities really incorrectly in several respects, and was also taking a while.  I could have fought an institutional process and maybe gotten half the results above after a few months, or I could just build the site and invite my friends, and shape the community correctly.

Besides that, there are also personal motives:

  • I wanted the recognition for having a good vision for the community, and driving forward something they really like.
  • I wanted to experiment with some NLP and ML extensions for the Q+A software, to help organize the information better. Not possible on a closed platform.

Tal: I (and maybe some other people) fear that this might fork the people in the field into two websites, instead of bringing them together. What are your thoughts about that?
Joseph:
How am I forking the community? I’m bringing a bunch of people in who wouldn’t have even been part of the Area 51 community.
Area 51 was going to fork it into five communities: stat analysis, ML, NLP, AI, and data mining. And then a lot fewer people would have been involved.

Tal: What are the things that people who support your website are saying?
Joseph:
Here are some quotes about my site:

Philip Resnick (UMD): “Looking at the questions being asked, the people responding, and the quality of the discussion, I can already see this becoming the go-to place for those ‘under the hood’ details you rarely see in the textbooks or conference papers. This site is going to save a lot of people an awful lot of time and frustration.”

Aria Haghighi (Berkeley): “Both NLP and ML have a lot of folk wisdom about what works and what doesn’t. A site like this is crucial for facilitating the sharing and validation of this collective knowledge.”

Alexandre Passos (Unicamp): “Really thank you for that. As a machine learning phd student from somewhere far from most good research centers (I’m in brazil, and how many brazillian ML papers have you seen in NIPS/ICML recently?), I struggle a lot with this folk wisdom. Most professors around here haven’t really interacted enough with the international ML community to be up to date”
(http://news.ycombinator.com/item?id=1476247)

Ryan McDonald (Google): “A tool like this will help disseminate and archive the tricks and best practices that are common in NLP/ML, but are rarely written about at length in papers.”

esoom on Reddit: “This is awesome. I’m really impressed by the quality of some of the answers, too. Within five minutes of skimming the site, I learned a neat trick that isn’t widely discussed in the literature.”
(http://www.reddit.com/r/MachineLearning/comments/ckw5k/stackoverflow_for_machine_learning_and_natural/c0tb3gc)

Tal: To be fair to the area51 effort, they have received wonderful responses to the “statistical analysis” proposal as well (see it here).
I have also contacted area51 directly and invited them to come and join the discussion. I’ll update this post with their reply.

So what’s next?

I don’t know.
If the Stack Exchange website were to launch today, I would probably focus on using it and point from that site to MetaOptimize (for the reasons I just mentioned, and also for some that Rob Hyndman raised when he first wrote on the subject).
If the Stack Exchange version of the website were to start in a few weeks, I would probably sit on the fence and see if people are using it.  I suspect that by that time, there wouldn’t be many people left to populate it (but I could always be wrong).
And what if the website were to start in a week, what then?  I have no clue.

Good question.
My current feeling is that I am glad to let this play out.
It seems this is a good case study for some healthy competition between platforms and models (OSQA vs stackoverflow/area51-system) – one that I hope will generate more good features from both companies and will make both parties work hard to get people to participate.
It also seems that this situation is exposing many people in our field to the same idea (a Q&A website). After Joseph’s input on the subject, I am starting to think that maybe at the end of the day this will benefit all of us. Instead of forking one community into two, maybe what we’ll end up with is getting more (experienced) people online (in two locations) who would otherwise have stayed in the shadows.

The verdict is still out, but I am a bit more optimistic than I was when first writing this post. I’ll update this post after getting more input from people.

And as always – I would love to know your thoughts on the subject.

22 thoughts on “StackOverflow and MetaOptimize are battling to be the #1 ‘Statistical Analysis Q&A website’ – with whom would you sign up?”

  1. You guys are such fanboys… So you can trust stack exchange to do regular backups and provide new features etc, etc. Do you trust them to keep the site alive? What guarantees do you have that they won’t be altering their business model again and leave everyone holding their pants once again?

    1. Hernâni, the same argument could also be made about Joseph’s website. I don’t think SO wants to damage people; I think they are making an effort to create a good service – because it is in their best interests (and because it is fun).
      And yes, I trust them to try and keep the site alive (as I do of Joseph, just as well).

  2. The most important part of a Q+A site is the community. As Chris Manning (Stanford NLP professor) said, Area 51 (StackOverflow) never got the buy-in from the academic community. I have used my academic connections to form a solid core user base, from which I am expanding. This is the main reason to use MetaOptimize.

    But the main mistake Area 51 made, in my mind, was niching the sites too distinctly. They wanted a statistical analysis site, a data mining site, an NLP site, a ML site, and an AI site. Why should these be separate? The last thing we need is for ML people and statistics people to talk LESS! Or for NLP and ML communities not to interact! The fact that these groups attend different conferences is a BUG not a feature. My site was designed so that these adjacent fields can cross-pollinate information. As we’ve learned from StackOverflow, it’s better to pick a broad topic and have a lot of experts in one place, communicating and helping with each other’s problems.

    I definitely agree that the StackOverflow people have more experience running a large website than me. Luckily, the OSQA team (backed by DZone) also has a lot of experience in running large websites, and they’ve got my back. They have been very helpful since day one to all OSQA users, and have given me a lot of support. Thanks, especially Rick and Hernani!

    1. No better way to learn than by jumping in. There’s very little that will go wrong for you running the site as long as you regularly back it up. Even downtime isn’t a dealbreaker (as long as it’s not excessive or frequent).

      1. Cullen,
        I mostly agree with what you say (although I would hope to discover that this is not Joseph’s first experience with such a website). Still, from what he writes I gather that he has support from the OSQA team, which makes me much calmer about this.

    2. Hi Joseph.

      I wrote to you at length in private, but I will just respond here to the points you have made:

      1) Community – I think both sites will have a community.

      2) Over diversity in area51 (copying someone else’s reply) –
      I don’t think that it’s a foregone conclusion that Area51 generates niche sites, just because it generates niche site proposals. I for one believe that it’s the big blockbuster stackoverflow-size sites that will actually make it to the end of the Area 51 process and will actually get created. One that will probably be similar to your vision as well.

      3) I am glad you got the OSQA support on this. It really makes me feel better, and once you answer my e-mail with more details, I will add it to my post.

      Lastly, I didn’t write this post to make one website look better than the other, but because I thought that it would:
      1) make the situation clearer (one way or the other, I didn’t know).
      2) get publicity for both websites (which it did…), thus helping both MetaOptimize and area51 to get more traffic and grow (which one it would support more, I wasn’t sure).

  3. What’s important to me is having users who are experts in that long list of fields and who will actively participate in discussions, hopefully generating a body of knowledge that others will later appreciate.

    When choosing a Q&A website I would personally put much more emphasis on such a community, rather than regular back-ups (I hadn’t even thought of that issue before reading this post). I think that technical issues that you present are secondary to other (much more difficult to solve!) problems such as convincing too-busy-researchers to contribute, or maintaining certain quality standards, or even just making sure the site stays active beyond the first couple of weeks of publicity. Buying space on Amazon S3 is easy; convincing my PhD advisor to share his knowledge is hard.

    Disclaimer: I know and work with (but not for 🙂 ) Joseph.

    1. Hi Dumitru,
      Thank you for the disclaimer!

      1) I think backups are VERY important – but in my post they were just an example of the technical side (which also matters here; it is not as small as you make it when dealing with big websites).
      2) I fully agree with you that the bigger issue is who is willing to participate. Yet presenting MetaOptimize as “the place of the community” ignores that over 350 people committed to the area51 website (some of whom I know to be very strong in their field). So getting the knowledgeable people to help is, I agree, the big issue. But it would be more or less the same on both websites (since we can see both got support). In other words, if what it takes for someone like you or Joseph to talk to the smart people you know and get them online is to do it on your own hosted website – that is fine by me. But that by itself is not enough to make me think this is “the place” of this community (especially since the community is probably built from many sub-communities you haven’t yet reached – for example, all the people in my statistics department).

  4. There do seem to be some differences in the topic areas, with the Statistical Analysis Stack Exchange site leaning towards the Statistics community and MetaOptimize leaning towards the Machine Learning community. Perhaps there’s enough distinction in the types of questions for both sites to thrive? I certainly intend to keep using Stack Overflow to ask R programming questions, which are generally off-topic for either of these boards.

    1. Harlen,
      Very good point – it might turn out that way in the end.
      And Joseph, I am not sure this website would solve this gap – although I do value the effort.

  5. Wasn’t there an issue with Jeff Atwood (StackOverflow guy) losing two websites due to a bad backup policy? Seems like an unfair metric to base something off of.

    1. Turns out you’re right.

      “Jeff Atwood, one of the two founders behind the popular StackOverflow question-and-answer site, has suffered a complete loss of his entire blog archive including both his personal Coding Horror blog and the StackOverflow blog site. Given Atwood has admonished people in the past for not following his advice on backup regimes what went wrong?”

      Article

    2. Cullen,

      First – I’ll tell you that the person whom I would trust more than anyone to back up is the person who once didn’t.

      But my real answers are:
      1) I didn’t know this when writing the post. and,
      2) here is the reply on the same issue from someone on hacker news: “I believe the data loss was on Jeff Atwood’s personal blog codinghorror.com, not Stack Overflow, and it was due to outsourcing the hosting/backup to an (incompetent) third party.”
      I do agree that he should have still done backups but lastly,
      3) Stackoverflow releases database dumps – so they are backed up. I didn’t see Joseph writing that he’ll do that (I hope now he will)

  6. Frankly I think he’ll provide a better final result focused on the target community, especially because he comes from that community. The SO guys will give you a one-size-fits-themselves solution. You’ll have to beg and plead and fight for any changes, and they’ll ignore it unless it pleases them to fix it.

    Backups? What are the odds that it’ll become a problem after this discussion? Very low, I’d say.

    1. Hi Richard,

      Regarding the backups – if you are right then at least one good thing came out from my post 🙂

      Regarding the “better final result” claim – I don’t know. That is why (since we are now late in the game, with people committed to both websites) I would much rather see both websites open and moving forward – and in a month or two have more insight into how each is performing.

      Cheers 🙂
      Tal

  7. What about semanticoverflow.com?
    That site has been up for a while and has a small but knowledgeable and active user base.

    Have a question on semantic technologies, artificial intelligence, machine learning, etc?

    Try stackoverflow.com as well as the others.

    I am in no way affiliated with them, other than being a user.

    1. Hi Bill, thanks for the link, I never knew about them.
      Are they affiliated with SO (I wasn’t able to see it in the FAQ)? I wonder who is behind the website.

      In any case, I don’t think this website would do. Going through its tag cloud, it doesn’t seem that anyone there is asking or answering questions about statistical analysis.
      Also, go through the questions on MetaOptimize – I can’t imagine any of them getting answered on that website. The same is true for the other (simpler) questions presented in the area51 proposal.

      But again, thanks for the link (I have just the friend to send there :))

  8. Tal and Joseph: thanks for the discussion. My aim in proposing the statistical analysis site on SE was for it to cover statistics *and* data mining, machine learning, etc. That is, I wanted it to include all the groups involved in data science. I think it would be a great shame if we end up with several Q&A sites with each group favouring a different one. I guess we’ll have to see how it pans out.
