<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R-statistics blog &#187; statistics</title>
	<atom:link href="http://www.r-statistics.com/on/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.r-statistics.com</link>
	<description>Writing about statistics with R, and open source stuff (software, data, community)</description>
	<lastBuildDate>Mon, 30 Jan 2012 07:45:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Diagram for a Bernoulli process (using R)</title>
		<link>http://www.r-statistics.com/2011/11/diagram-for-a-bernoulli-process-using-r/</link>
		<comments>http://www.r-statistics.com/2011/11/diagram-for-a-bernoulli-process-using-r/#comments</comments>
		<pubDate>Thu, 10 Nov 2011 12:44:41 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[Bernoulli process]]></category>
		<category><![CDATA[binomial distribution]]></category>
		<category><![CDATA[distribution]]></category>
		<category><![CDATA[statistical distribution]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=829</guid>
		<description><![CDATA[A Bernoulli process is a sequence of Bernoulli trials (the realization of n binary random variables), taking two values (0/1, Heads/Tails, Boy/Girl, etc&#8230;). It is often used in teaching introductory probability/statistics classes about the binomial distribution. When visualizing a Bernoulli process, it is common to use a binary tree diagram in order to show the [...]]]></description>
			<content:encoded><![CDATA[<p>A Bernoulli process is a sequence of Bernoulli trials (the realization of n binary random variables), taking two values (0/1, Heads/Tails, Boy/Girl, etc&#8230;).  It is often used in teaching introductory probability/statistics classes about the binomial distribution.</p>
<p>When visualizing a Bernoulli process, it is common to use a binary tree diagram in order to show the progression of the process, as well as the possible outcomes of each trial.  We might also include the number of &#8220;successes&#8221;, and the probability of reaching a specific terminal node.</p>
<p>I wanted to be able to create such a diagram using R.  For this purpose I wrote some code which uses the {<a href="http://cran.r-project.org/web/packages/diagram/">diagram</a>} R package.  The final function should allow one to create diagrams of different sizes, while allowing flexibility with regard to the text used in the tree.</p>
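<p>To make the quantities on such a tree concrete, here is a minimal base-R sketch (it is not the plotting function from this post) that enumerates the terminal nodes of an n-stage process.  The names <code>n</code>, <code>p</code>, <code>leaves</code> and <code>by_successes</code> are illustrative assumptions; each row of <code>leaves</code> is one path through the tree, with its number of successes and the probability of reaching that leaf:</p>

```r
n <- 2    # number of trials (assumed for illustration)
p <- 0.5  # probability of success on each trial (assumed)

# Every row is one root-to-leaf path: 1 = success, 0 = failure
leaves <- expand.grid(rep(list(0:1), n))
names(leaves) <- paste0("trial_", seq_len(n))

# Number of successes along each path
leaves$successes <- rowSums(leaves[, seq_len(n), drop = FALSE])

# Probability of reaching each terminal node
leaves$prob <- p^leaves$successes * (1 - p)^(n - leaves$successes)

print(leaves)

# Summing leaf probabilities by number of successes recovers B(n, p)
by_successes <- tapply(leaves$prob, leaves$successes, sum)
print(by_successes)
```

<p>Changing <code>n</code> and <code>p</code> gives the leaf probabilities for larger or unfair trees; the number of rows grows as 2^n, so this is only practical for small n.</p>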
<p>Here is an example of the simplest use of the function:</p>

<div class="wp_codebox"><table><tr id="p8294"><td class="code" id="p829code4"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">source</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://www.r-statistics.com/wp-content/uploads/2011/11/binary.tree_.for_.binomial.game_.r.txt&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># loading the function</span>
binary.<span style="">tree</span>.<span style="">for</span>.<span style="">binomial</span>.<span style="">game</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># creating a tree for B(2,0.5)</span></pre></td></tr></table></div>

<p>The resulting diagram will look like this:</p>
<p><a href="http://www.r-statistics.com/wp-content/uploads/2011/11/binary.tree_.for_.binomial.game001.png"><img src="http://www.r-statistics.com/wp-content/uploads/2011/11/binary.tree_.for_.binomial.game001-300x257.png" alt="" title="binary.tree.for.binomial.game001" width="300" height="257" class="alignnone size-medium wp-image-832" /></a></p>
<p>The same can be done to create larger trees.  For example, here is the code for a 4-stage Bernoulli process:</p>

<div class="wp_codebox"><table><tr id="p8295"><td class="code" id="p829code5"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">source</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://www.r-statistics.com/wp-content/uploads/2011/11/binary.tree_.for_.binomial.game_.r.txt&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># loading the function</span>
binary.<span style="">tree</span>.<span style="">for</span>.<span style="">binomial</span>.<span style="">game</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">4</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># creating a tree for B(4,0.5)</span></pre></td></tr></table></div>

<p>The resulting diagram will look like this:</p>
<p><a href="http://www.r-statistics.com/wp-content/uploads/2011/11/binary.tree_.for_.binomial.game-BIG.png"><img src="http://www.r-statistics.com/wp-content/uploads/2011/11/binary.tree_.for_.binomial.game-BIG-300x150.png" alt="" title="binary.tree.for.binomial.game - BIG" width="300" height="150" class="alignnone size-medium wp-image-830" /></a></p>
<p>The function can also be tweaked in order to describe a more specific story.  For example, the following code describes a 3-stage Bernoulli process in which an unfair coin is tossed 3 times (with a probability of 0.8 of landing heads):</p>

<div class="wp_codebox"><table><tr id="p8296"><td class="code" id="p829code6"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">source</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://www.r-statistics.com/wp-content/uploads/2011/11/binary.tree_.for_.binomial.game_.r.txt&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># loading the function</span>
binary.<span style="">tree</span>.<span style="">for</span>.<span style="">binomial</span>.<span style="">game</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">3</span>, <span style="color: #ff0000;">0.8</span>, first_box_text <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Tossing an unfair coin&quot;</span>, <span style="color: #ff0000;">&quot;(3 times)&quot;</span><span style="color: #080;">&#41;</span>, left_branch_text <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Failure&quot;</span>, <span style="color: #ff0000;">&quot;Playing again&quot;</span><span style="color: #080;">&#41;</span>, right_branch_text <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Success&quot;</span>, <span style="color: #ff0000;">&quot;Playing again&quot;</span><span style="color: #080;">&#41;</span>, 
    left_leaf_text <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Failure&quot;</span>, <span style="color: #ff0000;">&quot;Game ends&quot;</span><span style="color: #080;">&#41;</span>, right_leaf_text <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Success&quot;</span>, 
        <span style="color: #ff0000;">&quot;Game ends&quot;</span><span style="color: #080;">&#41;</span>, cex <span style="color: #080;">=</span> <span style="color: #ff0000;">0.8</span>, rescale_radx <span style="color: #080;">=</span> <span style="color: #ff0000;">1.2</span>, rescale_rady <span style="color: #080;">=</span> <span style="color: #ff0000;">1.2</span>, 
    box_color <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;lightgrey&quot;</span>, shadow_color <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;darkgrey&quot;</span>, left_arrow_text <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Tails <span style="color: #000099; font-weight: bold;">\n</span>(P = 0.2)&quot;</span><span style="color: #080;">&#41;</span>, 
    right_arrow_text <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Heads <span style="color: #000099; font-weight: bold;">\n</span>(P = 0.8)&quot;</span><span style="color: #080;">&#41;</span>, distance_from_arrow <span style="color: #080;">=</span> <span style="color: #ff0000;">0.04</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>The resulting diagram is:</p>
<p><a href="http://www.r-statistics.com/wp-content/uploads/2011/11/binary.tree_.for_.binomial.game002.png"><img src="http://www.r-statistics.com/wp-content/uploads/2011/11/binary.tree_.for_.binomial.game002-300x257.png" alt="" title="binary.tree.for.binomial.game002" width="300" height="257" class="alignnone size-medium wp-image-833" /></a></p>
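<p>The probabilities behind the unfair-coin tree can be checked with a few lines of base R.  This is only a sketch under the assumptions stated in the example (3 tosses, P(heads) = 0.8); the variable names are mine and are not part of the function above:</p>

```r
n <- 3    # three tosses, as in the unfair-coin example
p <- 0.8  # probability of heads

# Probability of one specific path (a single leaf) with k heads
k <- 0:n
path_prob <- p^k * (1 - p)^(n - k)

# Number of leaves that share the same number of heads
n_paths <- choose(n, k)

# Binomial probability = number of paths times per-path probability
heads_dist <- data.frame(heads = k,
                         n_paths = n_paths,
                         path_prob = path_prob,
                         prob = n_paths * path_prob)
print(heads_dist)
```

<p>The <code>prob</code> column should agree with <code>dbinom(0:3, 3, 0.8)</code>, which is a convenient way to verify the values on the tree's terminal nodes.</p>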
<p>If you come up with neat examples of using the code, happen to find a bug, or have any other feedback, you are <strong>welcome to leave a comment</strong>.</p>
<p>(note: the images above are licensed under CC BY-SA)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2011/11/diagram-for-a-bernoulli-process-using-r/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Article about plyr published in JSS, and the citation was added to the new plyr (version 1.5)</title>
		<link>http://www.r-statistics.com/2011/04/article-about-plyr-published-in-jss-and-the-citation-was-added-to-the-new-plyr-version-1-5/</link>
		<comments>http://www.r-statistics.com/2011/04/article-about-plyr-published-in-jss-and-the-citation-was-added-to-the-new-plyr-version-1-5/#comments</comments>
		<pubDate>Mon, 11 Apr 2011 15:36:26 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[citation]]></category>
		<category><![CDATA[Hadley Wickham]]></category>
		<category><![CDATA[packages]]></category>
		<category><![CDATA[plyr]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=682</guid>
		<description><![CDATA[The plyr package (by Hadley Wickham) is one of the few R packages that I can claim to have used in all of my statistical projects. So whenever a new version of plyr comes out I tend to be excited about it (as I was when version 1.2 came out with support for parallel processing) [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://cran.r-project.org/web/packages/plyr/index.html">plyr package</a> (by <a href="http://had.co.nz/">Hadley Wickham</a>) is one of the few R packages that I can claim to have used in all of my statistical projects.  So whenever a new version of plyr comes out I tend to be excited about it (as I was when <a href="http://www.r-statistics.com/2010/09/using-the-plyr-1-2-package-parallel-processing-backend-with-windows/">version 1.2 came out with support for parallel processing</a>).</p>
<p>So it is no surprise that the new release of plyr 1.5 got me curious.  While going through the <a href="http://cran.r-project.org/web/packages/plyr/NEWS">news file</a> with the new features and bug fixes, I noticed that Hadley had also (quietly) released, 6 days ago, another version of plyr prior to 1.5, numbered 1.4.1.  That version included only one addition, but a very important one &#8211; a new citation reference to use when citing the plyr package.  Here is how to use it:</p>

<div class="wp_codebox"><table><tr id="p6828"><td class="code" id="p682code8"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">install.<span style="">packages</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;plyr&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># upgrade to the latest release</span>
<span style="color: #0000FF; font-weight: bold;">citation</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;plyr&quot;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>The output gives both a simple text version and a BibTeX entry for LaTeX users.  Here it is (note the link, where you can download the article and read it yourself):</p>
<blockquote><p>To cite plyr in publications use:<br />
<strong>  Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data<br />
  Analysis. Journal of Statistical Software, 40(1), 1-29. URL<br />
  <a href="http://www.jstatsoft.org/v40/i01">http://www.jstatsoft.org/v40/i01/</a>.<br />
</strong></p></blockquote>
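<p>The BibTeX entry can also be extracted programmatically.  The sketch below uses the citation for R itself so that it runs without any extra package installed; assuming plyr is installed, <code>citation("plyr")</code> can be substituted in the same way.  The variable names <code>ref</code> and <code>bib</code> are illustrative:</p>

```r
# Citation object for R itself; with plyr installed, citation("plyr")
# returns the same kind of object for the package
ref <- citation()

# Convert the citation to BibTeX (toBibtex() is in the utils package)
bib <- toBibtex(ref)
print(bib)

# The BibTeX lines could then be written to a .bib file, for example
# with writeLines(bib, "references.bib") (file name is hypothetical)
```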
<p>I hope that more R contributors and users will make use of the citation() function (see ?citation) in the future.</p>
<p><a href="http://www.r-statistics.com/wp-content/uploads/2011/04/plyr-240px-Cutting_tool_1b.jpg"><img src="http://www.r-statistics.com/wp-content/uploads/2011/04/plyr-240px-Cutting_tool_1b.jpg" alt="" title="plyr - 240px-Cutting_tool_1b" width="240" height="445" class="alignright size-full wp-image-684" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2011/04/article-about-plyr-published-in-jss-and-the-citation-was-added-to-the-new-plyr-version-1-5/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Book review: 25 Recipes for Getting Started with R</title>
		<link>http://www.r-statistics.com/2011/02/book-review-25-recipes-for-getting-started-with-r/</link>
		<comments>http://www.r-statistics.com/2011/02/book-review-25-recipes-for-getting-started-with-r/#comments</comments>
		<pubDate>Thu, 24 Feb 2011 13:45:29 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[book review]]></category>
		<category><![CDATA[introduction]]></category>
		<category><![CDATA[O’Reilly]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=646</guid>
		<description><![CDATA[Recently I was asked by O&#8217;Reilly publishing to give a book review of Paul Teetor&#8217;s new introductory book on R.  After giving the book some attention and appreciating its delivery of the material, I was happy to write and post this review.  Also, I&#8217;m very happy to see how a major publishing house like O&#8217;Reilly is producing more and [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I was asked by O&#8217;Reilly publishing to give a book review of Paul Teetor&#8217;s new introductory book on R.  After giving the book some attention and appreciating its delivery of the material, I was happy to write and post this review.  Also, I&#8217;m very happy to see how a major publishing house like O&#8217;Reilly is producing more and more R books, great news indeed.</p>
<p>And now for the book review:</p>
<p><strong>Executive summary:</strong> a book that offers a well-designed, gentle introduction for people with some background in statistics who wish to learn how to get common (basic) tasks done with R.</p>
<p><a href="http://www.r-statistics.com/wp-content/uploads/2011/02/Getting-started-with-R-Book-cover.gif"><img class="size-full wp-image-648 alignright" title="Getting started with R - Book cover" src="http://www.r-statistics.com/wp-content/uploads/2011/02/Getting-started-with-R-Book-cover.gif" alt="" width="180" height="236" /></a></p>
<h3>Information</h3>
<p>By: Paul Teetor<br />
Publisher: O&#8217;Reilly Media<br />
Released: January 2011<br />
Pages: 58 (est.)</p>
<h3>Format</h3>
<p>The book &#8220;25 Recipes for Getting Started with R&#8221; offers an interesting take on how to bring R to the general (statistically oriented) public.</p>
<p><span id="more-646"></span></p>
<p>Instead of teaching R (or topics in statistics) in a systematic way, the author chose to assemble a set of cheat-sheet-like how-to tasks (&#8220;R recipes&#8221;) that a new user of R is likely to encounter in their first steps with R.  These include tasks such as installing R, finding help, reading data, selecting data, basic summary statistics, plotting some graphs, loading packages, and performing/diagnosing OLS regression.</p>
<p>These recipes were taken from the &#8220;R Cookbook&#8221; (O’Reilly) which contains over 200 such recipes.</p>
<p>Each of the 25 &#8220;R recipes&#8221; comprises four sections:</p>
<ul>
<li><strong>Problem</strong> &#8211; states in one sentence the task we wish to accomplish.</li>
<li><strong>Solution</strong> &#8211; a direct solution to the problem, presented in very few paragraphs (ranging from one paragraph up to a page).</li>
<li><strong>Discussion</strong> &#8211; an extension of the solution, offering several pages of variations and common pitfalls.</li>
<li><strong>See also</strong> &#8211; references for further information (not always present).</li>
</ul>
<p>The book is modest in its presumptions of scope (which I appreciate) and tries only to offer a bird&#8217;s eye view for statistically oriented, first-time (short on time) users who want to feel they can do &#8220;something&#8221; using R.</p>
<h3>Audience</h3>
<p>I can imagine a first-year student (or an IT professional with some stats background) benefiting from such a book if they have learned their stats with another package (like Stata, <a title="SAS news" href="http://sas-x.com/">SAS</a>, SPSS and so on).</p>
<p>The book&#8217;s scope is both an advantage and a disadvantage, depending on the target audience.  I would find it surprising if experienced R users had much (if anything) to gain from it, and it cannot serve as a reference.  This might be a different case with the extended &#8220;R Cookbook&#8221; (which I hope to get my hands on at some point, since I enjoyed the author&#8217;s writing).</p>
<p>Lastly, I should mention that someone who is already well versed in SAS or SPSS would probably prefer Robert Muenchen&#8217;s superb book &#8220;<a href="http://www.springer.com/statistics/computanional+statistics/book/978-0-387-09417-5">R for SAS and SPSS Users</a>&#8221; in order to make the transition to R smoother.</p>
<h3>Content outline (with some notes)</h3>
<p>I added some notes to the chapter names.  I&#8217;d like to state again that my general impression of the book is good.  The points I make are mostly subtle, and are meant as a guide in case you give the book as a gift to a friend and wish to point out some things that were not mentioned in it.</p>
<p>The book&#8217;s content includes:</p>
<ul>
<li>Downloading and Installing R</li>
<li>Getting Help on a Function</li>
<li>Viewing the Supplied Documentation</li>
<li>Searching the Web for Help &#8211; credit goes to the author for mentioning <a href="http://stats.stackexchange.com/">stats.stackexchange.com</a> and stackoverflow.com, while highlighting the use of <a href="http://stackoverflow.com/questions/tagged/r">the R tag on stackoverflow</a>.  I do wish he had mentioned <a title="R news from blogs" href="http://www.r-bloggers.com/">R-bloggers</a> (edit: after corresponding with the author, he wrote to me that: <em>F.Y.I., I do mention R-bloggers in the full R Cookbook. The 25 Recipes book cannot contain as much useful information. In the Cookbook, I recommend that readers follow R-bloggers as a way to keep up with developments in the R community.</em>).</li>
<li>Reading Tabular Datafiles &#8211; the author properly distinguishes between how to manage factors and how to manage characters.</li>
<li>Reading from CSV Files</li>
<li>Creating a Vector</li>
<li>Computing Basic Statistics &#8211; the author devotes appropriate space to handling missing values.</li>
<li>Initializing a Data Frame from Column Data</li>
<li>Selecting Data Frame Columns by Position</li>
<li>Selecting Data Frame Columns by Name</li>
<li>Forming a Confidence Interval for a Mean</li>
<li>Forming a Confidence Interval for a Proportion</li>
<li>Comparing the Means of Two Samples</li>
<li>Testing a Correlation for Significance</li>
<li>Creating a Scatter Plot &#8211; I wish more attention had been given to lattice (which was mentioned, twice, in the book) and ggplot2 (in the see also, the discussion or the preface).  The same could be said about many other procedures, but I think graphics in R is a special case: it should be clear to the reader how R packages can extend its statistical procedures, but the reader may not notice that there are R packages that extend its graphical capabilities as well.</li>
<li>Creating a Bar Chart</li>
<li>Creating a Box Plot</li>
<li>Creating a Histogram</li>
<li>Performing Simple Linear Regression</li>
<li>Performing Multiple Linear Regression &#8211; there might have been room to mention the existence of &#8220;I&#8221; (for example: y~x+I(x^2)) and interactions (&#8220;*&#8221;).</li>
<li>Getting Regression Statistics</li>
<li>Diagnosing a Linear Regression &#8211; this section includes the command outlier.test, which comes from the car package (and is not in base R).  It would probably have been clearer if the author had directed the reader to the section on using packages in the &#8220;see also&#8221; instead of only talking about install.packages (which wasn&#8217;t the place for it, IMHO).</li>
<li>Predicting New Values &#8211; I would have recommended highlighting the importance of retaining the same column names in the new data.frame, since failing to do so results in a (quite common) failure of the function.</li>
<li>Accessing the Functions in a Package &#8211; I think this section should have been referenced more, and that the installation of new packages could have been covered here.</li>
</ul>
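<p>To illustrate the notes on the regression recipes above, here is a short sketch using simulated data.  It is not taken from the book; the data frame <code>dat</code> and all variable names are assumptions made for the example.  It shows <code>I()</code> for a quadratic term, <code>*</code> for an interaction, and why the column names in the new data.frame matter when predicting:</p>

```r
set.seed(1)  # simulated data, purely for illustration
dat <- data.frame(x = runif(50, 0, 10), z = rnorm(50))
dat$y <- 1 + 2 * dat$x - 0.3 * dat$x^2 + 0.5 * dat$z + rnorm(50)

# I() protects arithmetic inside a formula; x * z expands to x + z + x:z
fit_quad <- lm(y ~ x + I(x^2), data = dat)
fit_int  <- lm(y ~ x * z, data = dat)

# predict() looks up predictors by name, so newdata must reuse "x"
new_ok <- data.frame(x = c(2, 5, 8))
pred <- predict(fit_quad, newdata = new_ok)
print(pred)

# A misnamed column ("X" instead of "x") makes predict() fail here,
# because no object named x can be found; the error is captured as text
new_bad <- data.frame(X = c(2, 5, 8))
bad <- tryCatch(predict(fit_quad, newdata = new_bad),
                error = function(e) conditionMessage(e))
print(bad)
```

<p>One caveat with this sketch: if a variable named <code>x</code> happens to exist in the workspace, R may pick it up instead of failing, which is arguably an even harder mistake to spot.</p>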
<p>* * *</p>
<p>If you have had a look at the book, I&#8217;d be very curious to read your thoughts about it in the comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2011/02/book-review-25-recipes-for-getting-started-with-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The R Journal, Vol.2 Issue 2 is out</title>
		<link>http://www.r-statistics.com/2010/12/the-r-journal-vol-2-issue-2-is-out/</link>
		<comments>http://www.r-statistics.com/2010/12/the-r-journal-vol-2-issue-2-is-out/#comments</comments>
		<pubDate>Fri, 31 Dec 2010 14:29:27 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[The R Journal]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=609</guid>
		<description><![CDATA[The second issue of the second volume of The R Journal is now available . Download complete issue Refereed articles may be downloaded individually using the links below. [Bibliography of refereed articles] Table of Contents Editorial 3 Contributed Research Articles Solving Differential Equations in R Karline Soetaert, Thomas Petzoldt and R. Woodrow Setzer 5 Source [...]]]></description>
			<content:encoded><![CDATA[<p>The second issue of the second volume of <a href="http://journal.r-project.org/current.html">The R Journal is now available</a>.</p>
<p><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2.pdf">Download complete issue</a></p>
<p>Refereed articles may be downloaded individually using the links below. [<a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2.html">Bibliography of refereed articles</a>]</p>
<h2>Table of Contents</h2>
<table>
<tbody>
<tr>
<td width="94%">Editorial</td>
<td width="4%" valign="top">3</td>
</tr>
<tr>
<td width="94%">
<h3>Contributed Research Articles</h3>
</td>
<td width="4%" valign="top"></td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Soetaert~et~al.pdf">Solving Differential Equations in R </a><br />
<em>Karline Soetaert, Thomas Petzoldt and R. Woodrow Setzer</em></td>
<td width="4%" valign="top">5</td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Murdoch.pdf">Source References </a><br />
<em>Duncan Murdoch</em></td>
<td width="4%" valign="top">16</td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Roennegaard~et~al.pdf">hglm: A Package for Fitting Hierarchical Generalized Linear Models </a><br />
<em>Lars Rönnegård, Xia Shen and Moudud Alam</em></td>
<td width="4%" valign="top">20</td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Solymos.pdf">dclone: Data Cloning in R </a><br />
<em>Péter Sólymos</em></td>
<td width="4%" valign="top">29</td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Wickham.pdf">stringr: modern, consistent string processing </a><br />
<em>Hadley Wickham</em></td>
<td width="4%" valign="top">38</td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Ardia+Hoogerheide.pdf">Bayesian Estimation of the GARCH(1,1) Model with Student-t Innovations </a><br />
<em>David Ardia and Lennart F. Hoogerheide</em></td>
<td width="4%" valign="top">41</td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Ferreira~da~Silva.pdf">cudaBayesreg: Bayesian Computation in CUDA </a><br />
<em>Adelino Ferreira da Silva</em></td>
<td width="4%" valign="top">48</td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Bilder~et~al.pdf">binGroup: A Package for Group Testing </a><br />
<em>Christopher R. Bilder, Boan Zhang, Frank Schaarschmidt and Joshua M. Tebbs</em></td>
<td width="4%" valign="top">56</td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Sariyar+Borg.pdf">The RecordLinkage Package: Detecting Errors in Data </a><br />
<em>Murat Sariyar and Andreas Borg</em></td>
<td width="4%" valign="top">61</td>
</tr>
<tr>
<td width="94%"><a href="http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Ishwaran~et~al.pdf">spikeslab: Prediction and Variable Selection Using Spike and Slab Regression </a><br />
<em>Hemant Ishwaran, Udaya B. Kogalur and J. Sunil Rao</em></td>
<td width="4%" valign="top">68</td>
</tr>
<tr>
<td width="94%">
<h3>From the Core</h3>
</td>
<td width="4%" valign="top"></td>
</tr>
<tr>
<td width="94%">What&#8217;s New?</td>
<td width="4%" valign="top">74</td>
</tr>
<tr>
<td width="94%">
<h3>News and Notes</h3>
</td>
<td width="4%" valign="top"></td>
</tr>
<tr>
<td width="94%">useR! 2010</td>
<td width="4%" valign="top">77</td>
</tr>
<tr>
<td width="94%">Forthcoming Events: useR! 2011</td>
<td width="4%" valign="top">79</td>
</tr>
<tr>
<td width="94%">Changes in R</td>
<td width="4%" valign="top">81</td>
</tr>
<tr>
<td width="94%">Changes on CRAN</td>
<td width="4%" valign="top">90</td>
</tr>
<tr>
<td width="94%">News from the Bioconductor Project</td>
<td width="4%" valign="top">101</td>
</tr>
<tr>
<td width="94%">R Foundation News</td>
<td width="4%" valign="top">102</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/12/the-r-journal-vol-2-issue-2-is-out/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New edition of &#8220;R Companion to Applied Regression&#8221; – by John Fox and Sandy Weisberg</title>
		<link>http://www.r-statistics.com/2010/12/new-edition-of-r-companion-to-applied-regression-by-john-fox-and-sandy-weisberg/</link>
		<comments>http://www.r-statistics.com/2010/12/new-edition-of-r-companion-to-applied-regression-by-john-fox-and-sandy-weisberg/#comments</comments>
		<pubDate>Fri, 10 Dec 2010 19:26:30 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[ANOVA]]></category>
		<category><![CDATA[John Fox]]></category>
		<category><![CDATA[r books]]></category>
		<category><![CDATA[R regression]]></category>
		<category><![CDATA[regression]]></category>
		<category><![CDATA[Sandy Weisberg]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=590</guid>
		<description><![CDATA[Just two hours ago, Professor John Fox announced on the R-help mailing list a new (second) edition of his book &#8220;An R and S-PLUS Companion to Applied Regression&#8221;, now titled &#8220;An R Companion to Applied Regression, Second Edition&#8221;. John Fox is (very) well known in the R community for many contributions [...]]]></description>
			<content:encoded><![CDATA[<p><a rel="attachment wp-att-592" href="http://www.r-statistics.com/2010/12/new-edition-of-r-companion-to-applied-regression-by-john-fox-and-sandy-weisberg/anrcomapniontoregression-cover/"><img class="size-full wp-image-592 alignright" title="An R companion to Applied Regression (cover)" src="http://www.r-statistics.com/wp-content/uploads/2010/12/AnRcomapniontoRegression-cover.jpg" alt="" width="150" height="214" /></a></p>
<p>Just two hours ago, Professor <a href="http://socserv.mcmaster.ca/jfox/">John Fox</a> <a href="http://www.mail-archive.com/r-help@r-project.org/msg119817.html">announced on the R-help</a> mailing list a new (second) edition of his book &#8220;An R and S-PLUS Companion to Applied Regression&#8221;, now titled <a href="http://socserv.mcmaster.ca/jfox/Books/Companion/">&#8220;An R Companion to Applied Regression, Second Edition&#8221;</a>.</p>
<p>John Fox is (very) well known in the R community for <strong>many</strong> contributions to <a title="the R project" href="http://www.r-project.org/">R</a>, including the <a href="http://cran.r-project.org/web/packages/car/index.html">car package</a> (which anyone interested in <a title="Repeated measures ANOVA with R and car" href="http://www.r-statistics.com/2010/04/repeated-measures-anova-with-r-tutorials/">performing SS type II and III repeated measures ANOVA in R</a> is sure to come across), the <a href="http://cran.r-project.org/web/packages/Rcmdr/index.html">Rcmdr package</a> (one of the two major GUIs for R; the other is <a title="Deducer by Ian Fellows" href="http://cran.r-project.org/web/packages/Deducer/index.html">Deducer</a>), <a href="http://cran.r-project.org/web/packages/sem/index.html">sem</a> (for Structural Equation Models) and <a title="John Fox on Crantastic" href="http://crantastic.org/authors/169">more</a>.  This may explain why I consider the release of a new edition of his book to be big news for the R community.</p>
<p>In this new edition, Professor Fox has teamed up with Professor <a href="http://www.stat.umn.edu/~sandy/">Sandy Weisberg</a> to refresh the original edition so that it covers the developments of the (nearly) 10 years since the first edition was written.</p>
<p>Here is what John Fox had to say:</p>
<blockquote><p>Dear all,</p>
<p>Sandy Weisberg and I would like to announce the publication of the second<br />
edition of An R Companion to Applied Regression (Sage, 2011).</p>
<p>As is immediately clear, the book now has two authors and S-PLUS is gone<br />
from the title (and the book). The R Companion has also been thoroughly<br />
rewritten, covering developments in the nearly 10 years since the first<br />
edition was written and expanding coverage of topics such as R graphics and<br />
R programming. As before, however, the R Companion provides a general<br />
introduction to R in the context of applied regression analysis, broadly<br />
construed. It is available from the publisher at <a href="http://www.sagepub.com/books/Book233899">(US)</a> or <a href="http://www.uk.sagepub.com/books/Book233899">(UK)</a>, and <a href="http://www.amazon.ca/R-Companion-Applied-Regression/dp/141297514X/ref=sr_1_3?s=books&amp;ie=UTF8&amp;qid=1291995545&amp;sr=1-3">from Amazon (see here)</a></p>
<p>The book is augmented by a web site with data sets, appendices on a variety of topics, and more, and is associated with the car package on CRAN, which has recently undergone an overhaul.</p>
<p>Regards,<br />
John and Sandy</p></blockquote>
<p><span id="more-590"></span></p>
<p>If you are interested in more information about <a href="http://socserv.mcmaster.ca/jfox/Books/Companion/">&#8220;An R Companion to Applied Regression, Second Edition&#8221;</a>, you can head over to the book&#8217;s web page. Here are some of the highlights of what you will find there:</p>
<ul>
<li><span style="font-size: 9.02778px;"><a href="http://socserv.mcmaster.ca/jfox/Books/Companion/preface.pdf">Preface to the book</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://socserv.mcmaster.ca/jfox/Books/Companion/contents.pdf">Table of contents</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://www.sagepub.com/books/Book233899?#tabview=samples">Sample chapters from Sage</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://socserv.mcmaster.ca/jfox/Books/Companion/reviews.html">Reviews of the book</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://socserv.mcmaster.ca/jfox/Books/Companion/errata.html">Errata and updates</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://socserv.mcmaster.ca/jfox/Books/Companion/scripts.html">Scripts for examples by chapter and appendix</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://socserv.mcmaster.ca/jfox/Books/Companion/data.html">Data files used in the book</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://cran.r-project.org/web/packages/car/index.html">The car package for R</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://socserv.mcmaster.ca/jfox/Books/Companion/appendix.html">Web appendix to the text</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://socserv.mcmaster.ca/jfox/Books/Companion-1E/index.html">Web site for the first edition</a></span></li>
<li><span style="font-size: 9.02778px;"><a href="http://socserv.mcmaster.ca/jfox/Books/Companion/resources.html">Other R resources</a></span></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/12/new-edition-of-r-companion-to-applied-regression-by-john-fox-and-sandy-weisberg/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Managing a statistical analysis project &#8211; guidelines and best practices</title>
		<link>http://www.r-statistics.com/2010/09/managing-a-statistical-analysis-project-guidelines-and-best-practices/</link>
		<comments>http://www.r-statistics.com/2010/09/managing-a-statistical-analysis-project-guidelines-and-best-practices/#comments</comments>
		<pubDate>Thu, 30 Sep 2010 16:03:12 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[best practices]]></category>
		<category><![CDATA[code management]]></category>
		<category><![CDATA[r tips]]></category>
		<category><![CDATA[statistical analysis]]></category>
		<category><![CDATA[tips]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=556</guid>
		<description><![CDATA[In the past two years, a growing community of R users (and statisticians in general) have been participating in two major Question-and-Answer websites: The R tag page on Stackoverflow, and Stat over flow (which will soon move to a new domain, no worries, I&#8217;ll write about it once it happens) In that time, several long (and fascinating) [...]]]></description>
			<content:encoded><![CDATA[<p>In the past two years, a growing community of R users (and statisticians in general) have been participating in two major Question-and-Answer websites:</p>
<ol>
<li><a href="http://stackoverflow.com/questions/tagged/r">The R tag page on Stack Overflow</a>, and</li>
<li><a href="http://stats.stackexchange.com/">Stat over flow</a> (which will soon move to a new domain; no worries, I&#8217;ll write about it once it happens)</li>
</ol>
<p>In that time, several long (and fascinating) discussion threads were started, reflecting on tips and best practices for managing a statistical analysis project.  They are:</p>
<ul>
<li><a href="http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing">&#8220;Workflow for statistical analysis and report writing&#8221;</a></li>
<li><a href="http://stackoverflow.com/questions/2284446/organizing-r-source-code">&#8220;Organizing R Source Code&#8221;</a></li>
<li><a href="http://stackoverflow.com/questions/1266279/how-to-organize-large-r-programs">&#8220;How to organize large R programs?&#8221;</a></li>
<li><a href="http://stackoverflow.com/questions/2712421/r-and-version-control-for-the-solo-data-analyst">&#8220;R and version control for the solo data analyst&#8221;</a></li>
<li><a href="http://stackoverflow.com/questions/2295389/how-does-software-development-compare-with-statistical-programming-analysis">&#8220;How does software development compare with statistical programming/analysis ?&#8221;</a></li>
<li><a href="http://stackoverflow.com/questions/2286831/how-do-you-combine-revision-control-with-workflow-for-r">&#8220;How do you combine “Revision Control” with “WorkFlow” for R?&#8221;</a></li>
<li><a href="http://stats.stackexchange.com/questions/2910/how-to-efficiently-manage-a-statistical-analysis-project">How to efficiently manage a statistical analysis project?</a></li>
</ul>
<p>In the last thread on the list, the user <a href="http://stats.stackexchange.com/users/930/chl">chl</a> set out to <a href="http://stats.stackexchange.com/questions/2910/how-to-efficiently-manage-a-statistical-analysis-project/3191#3191">compile all the tips and suggestions</a> together.  With his permission, I am now republishing that compilation here.  I encourage you to contribute from your own experience (either in the comments, or by answering any of the threads I&#8217;ve linked to).</p>
<p><span id="more-556"></span></p>
<p>From here on is what &#8220;chl&#8221; wrote:</p>
<p><span style="font-size: 13.1944px;">These guidelines were compiled from </span><span style="font-size: 13.1944px;"><a rel="nofollow" href="http://www.stackoverflow.com/">SO</a> (as suggested by @Shane), <a rel="nofollow" href="http://biostar.stackexchange.com/">Biostar</a> (hereafter, BS), and <a href="http://stats.stackexchange.com/">SE</a>. I tried my best to acknowledge ownership for each item, and to select the first or a highly upvoted answer. I also added things of my own, and flagged items that are specific to the [<a href="http://www.r-project.org/">R</a>] environment.</span></p>
<p><strong>Data management</strong></p>
<ul>
<li>create a project structure for keeping everything in the right place (data, code, figures, etc., <a rel="nofollow" href="http://biostar.stackexchange.com/questions/822/how-do-you-manage-your-files-directories-for-your-projects/826#826">giovanni</a>/BS)</li>
<li>never modify raw data files (ideally, they should be read-only); copy/rename to new ones when making transformations, cleaning, etc.</li>
<li>check data consistency (<a href="http://stats.stackexchange.com/questions/2768/what-is-a-consistency-check/2785#2785">whuber</a> /SE)</li>
</ul>
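<p>As a rough illustration of the data management items above, here is a minimal R sketch. The directory names, the file name <code>measurements.csv</code> and the toy data are all assumptions made for this example; they are not part of the original guidelines.</p>

```r
# Hypothetical project layout; the directory names are illustrative only.
project_dir <- file.path(tempdir(), "my_analysis")
sub_dirs <- c("data/raw", "data/processed", "code", "figures", "reports")
for (d in sub_dirs) {
  dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Store a (toy) raw data file, then mark it read-only so that it is not
# overwritten by accident.
raw_file <- file.path(project_dir, "data/raw", "measurements.csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
          raw_file, row.names = FALSE)
Sys.chmod(raw_file, mode = "0444")

# Transformations are written to a new file; the raw file is left untouched.
raw <- read.csv(raw_file)
cleaned <- transform(raw, value_centered = value - mean(value))
clean_file <- file.path(project_dir, "data/processed", "measurements_clean.csv")
write.csv(cleaned, clean_file, row.names = FALSE)
```

<p>Note that <code>Sys.chmod()</code> has only limited effect on Windows, so the read-only step may need to be done through the file system there.</p>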
<p><strong>Coding</strong></p>
<ul>
<li>organize source code in logical units or building blocks (<a href="http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing/1434424#1434424">Josh Reich</a>/<a href="http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing/1430569#1430569">hadley</a>/<a href="http://stackoverflow.com/questions/1266279/how-to-organize-large-r-programs/1269808#1269808">ars</a> /SO; <a rel="nofollow" href="http://biostar.stackexchange.com/questions/822/how-do-you-manage-your-files-directories-for-your-projects/826#826">giovanni</a>/<a rel="nofollow" href="http://biostar.stackexchange.com/questions/822/how-do-you-manage-your-files-directories-for-your-projects/829#829">Khader Shameer</a> /BS)</li>
<li>separate source code from editing stuff, especially for large projects &#8212; partly overlapping with previous item and reporting</li>
<li>document everything, with e.g. [R]oxygen (<a href="http://stackoverflow.com/questions/2284446/organizing-r-source-code/2284486#2284486">Shane</a> /SO) or consistent self-annotation in the source file</li>
<li>[R] custom functions can be put in a dedicated file (that can be sourced when necessary), in a new environment (so as to avoid populating the top-level namespace, <a href="http://stackoverflow.com/questions/1266279/how-to-organize-large-r-programs/1319786#1319786">Brendan OConnor</a> /SO), or a package (<a href="http://stackoverflow.com/questions/1266279/how-to-organize-large-r-programs/1266400#1266400">Dirk Eddelbuettel</a>/<a href="http://stackoverflow.com/questions/2284446/organizing-r-source-code/2284486#2284486">Shane</a> /SO)</li>
</ul>
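<p>The [R] item about keeping custom functions in a dedicated file and a separate environment could look roughly like the sketch below. The file name <code>helpers.R</code> and the function <code>standardize()</code> are hypothetical and exist only to keep the example self-contained.</p>

```r
# Write a small helper file (hypothetical name and content) so that the
# example can be run as-is.
helper_file <- file.path(tempdir(), "helpers.R")
writeLines(
  "standardize <- function(x) (x - mean(x)) / sd(x)",
  helper_file
)

# Source the file into its own environment instead of the global workspace.
helpers <- new.env()
sys.source(helper_file, envir = helpers)

# Call the function through the environment ...
z <- helpers$standardize(c(1, 2, 3, 4, 5))

# ... or attach the environment if unqualified calls are preferred:
# attach(helpers, name = "helpers")
```

<p>Once a collection of such helpers grows, moving them into a package (as the item suggests) is the more robust choice; the environment approach is simply the lighter-weight option.</p>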
<p><strong>Analysis</strong></p>
<ul>
<li>don&#8217;t forget to set/record the seed you used when calling RNG or stochastic algorithms (e.g. k-means)</li>
<li>for Monte Carlo studies, it may be interesting to store specs/parameters in a separate file (<a rel="nofollow" href="http://neuralensemble.org/trac/sumatra">sumatra</a> may be a good candidate, <a rel="nofollow" href="http://biostar.stackexchange.com/questions/822/how-do-you-manage-your-files-directories-for-your-projects/826#826">giovanni</a> /BS)</li>
<li>don&#8217;t limit yourself to one plot per variable, use multivariate (Trellis) displays and interactive visualization tools (e.g. GGobi)</li>
</ul>
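<p>To make the point about seeds concrete, here is a small sketch using k-means on simulated data. The helper <code>run_kmeans()</code>, the seed value and the simulated clusters are assumptions for illustration only.</p>

```r
# Hypothetical helper: simulate two clusters and run k-means under a
# recorded seed.
run_kmeans <- function(seed) {
  set.seed(seed)  # fixes both the simulated data and kmeans()'s random starts
  x <- rbind(
    matrix(rnorm(100, mean = 0), ncol = 2),
    matrix(rnorm(100, mean = 3), ncol = 2)
  )
  fit <- kmeans(x, centers = 2, nstart = 5)
  list(seed = seed, cluster = fit$cluster, centers = fit$centers)
}

seed <- 20100930  # arbitrary value; store it alongside the results
first_run  <- run_kmeans(seed)
second_run <- run_kmeans(seed)

# Comparing the two runs is a quick check that the analysis is reproducible.
identical(first_run$cluster, second_run$cluster)
```

<p>Returning the seed together with the results is one simple way to make sure it ends up in whatever file or report the analysis produces.</p>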
<p><strong>Versioning</strong></p>
<ul>
<li>use some kind of version control system (VCS) for easy tracking/export, e.g. Git (<a href="http://stackoverflow.com/questions/2712421/r-and-version-control-for-the-solo-data-analyst/2715569#2715569">Sharpie</a>/<a href="http://stackoverflow.com/questions/2545765/how-can-i-email-someone-a-git-repository/2545784#2545784">VonC</a>/<a href="http://stackoverflow.com/questions/2286831/how-do-you-combine-revision-control-with-workflow-for-r/2290194#2290194">JD Long</a> /SO) &#8212; this follows from nice questions asked by @Jeromy and @Tal</li>
<li>back up everything, on a regular basis (<a href="http://stackoverflow.com/questions/2712421/r-and-version-control-for-the-solo-data-analyst/2715569#2715569">Sharpie</a>/<a href="http://stackoverflow.com/questions/2286831/how-do-you-combine-revision-control-with-workflow-for-r/2290194#2290194">JD Long</a> /SO)</li>
<li>keep a log of your ideas, or rely on an issue tracker, like <a rel="nofollow" href="http://ditz.rubyforge.org/ditz/">ditz</a> (<a rel="nofollow" href="http://biostar.stackexchange.com/questions/822/how-do-you-manage-your-files-directories-for-your-projects/826#826">giovanni</a> /BS) &#8212; partly redundant with the previous item since it is available in Git</li>
</ul>
<p><strong>Editing/Reporting</strong></p>
<ul>
<li>[R] Sweave (<a href="http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing/1430013#1430013">Matt Parker</a> /SO)</li>
<li>[R] brew (<a href="http://stackoverflow.com/questions/1429907/workflow-for-statistical-analysis-and-report-writing/1436809#1436809">Shane</a> /SO)</li>
<li>[R] <a rel="nofollow" href="http://cran.r-project.org/web/packages/R2HTML/index.html">R2HTML</a> or <a rel="nofollow" href="http://cran.r-project.org/web/packages/ascii/index.html">ascii</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/09/managing-a-statistical-analysis-project-guidelines-and-best-practices/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Tips for the R beginner (a 5 page overview)</title>
		<link>http://www.r-statistics.com/2010/08/tips-for-the-r-beginner-5-pages-overview/</link>
		<comments>http://www.r-statistics.com/2010/08/tips-for-the-r-beginner-5-pages-overview/#comments</comments>
		<pubDate>Mon, 23 Aug 2010 20:24:34 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[begginers]]></category>
		<category><![CDATA[Finance]]></category>
		<category><![CDATA[Finances]]></category>
		<category><![CDATA[tips]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=521</guid>
		<description><![CDATA[In this post I publish a PDF document titled &#8220;A collection of tips for R in Finance&#8221;. It is a basic 5 page introduction to R in finances by Arnaud Amsellem (linked in profile). The article offers tips related to the following points: Code Editor Organizing R code Update packages Getting external data into R [...]]]></description>
			<content:encoded><![CDATA[<p>In this post I publish a PDF document titled &#8220;A collection of tips for R in Finance&#8221;.<br />
It is a basic 5-page introduction to R in finance by Arnaud Amsellem (<a href="http://fr.linkedin.com/pub/arnaud-amsellem/9/33b/20a">LinkedIn profile</a>).</p>
<p>The article offers tips related to the following points:</p>
<ul>
<li>Code Editor</li>
<li>Organizing R code</li>
<li>Update packages</li>
<li>Getting external data into R</li>
<li>Communicating with external applications</li>
<li>Optimizing R code</li>
</ul>
<p>The article is well written. It offers the perspective of someone experienced in the field and touches on points that I imagine beginners might otherwise overlook.  I hope publishing it here will be of use to some readers out there.</p>
<p>Update: as some readers have pointed out to me (by e-mail and in the comments), this document touches only lightly on the topic of &#8220;finance&#8221; in R.  I therefore decided to change the title from &#8220;R in finance &#8211; some tips for beginners&#8221; to its current form.</p>
<p><strong>Lastly</strong>: if you (a reader of this blog) feel you have an article (&#8220;post&#8221;) to contribute, but don&#8217;t feel like <a href="http://www.r-statistics.com/2010/07/blogging-about-r-presentation-and-audio/">starting your own blog</a>, feel welcome to <a href="http://www.r-statistics.com/contact-me/">contact me</a>, and I&#8217;ll be glad to post what you have to say on my blog (and subsequently, also on <a href="http://www.r-bloggers.com/">R bloggers</a>).</p>
<p>Here is the article:<br />
<span id="more-521"></span><br />
<p class="gde-text"><a href="http://www.r-statistics.com/wp-content/uploads/2010/08/A-collection-of-tips-for-R-in-Finance.pdf" target="_blank" class="gde-link">Download (PDF, 418.09KB)</a></p>
<iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fwww.r-statistics.com%2Fwp-content%2Fuploads%2F2010%2F08%2FA-collection-of-tips-for-R-in-Finance.pdf&hl=en_US&embedded=true" class="gde-frame" style="width:500px; height:700px; border: none;" scrolling="no"></iframe>

</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/08/tips-for-the-r-beginner-5-pages-overview/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>ggplot2 plot builder is now on CRAN! (through Deducer 0.4 GUI for R)</title>
		<link>http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/</link>
		<comments>http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 18:53:03 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[deducer]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[google summer of code]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[Hadley Wickham]]></category>
		<category><![CDATA[Ian fellows]]></category>
		<category><![CDATA[interfaces]]></category>
		<category><![CDATA[plot builder]]></category>
		<category><![CDATA[R GUI]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[tutorials]]></category>
		<category><![CDATA[videos]]></category>
		<category><![CDATA[youtube]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=507</guid>
		<description><![CDATA[Ian Fellows, a hard-working contributor to the R community (and a cool guy), announced today the release of Deducer (0.4) to CRAN (scheduled to update in the next day or so). This major update also includes the release of a new plug-in package (DeducerExtras), containing additional dialogs and functionality. Following is the e-mail [...]]]></description>
			<content:encoded><![CDATA[<p>Ian Fellows, a hard-working contributor to the R community (and a cool guy), announced today the release of <a href="http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual">Deducer</a> (0.4) to <a href="http://cran.r-project.org/web/packages/Deducer/index.html">CRAN</a> (scheduled to update in the next day or so).<br />
This major update also includes the release of a new plug-in package (DeducerExtras), containing additional dialogs and functionality.</p>
<p>Following is the e-mail he sent out with all the details and demo videos.</p>
<p><span id="more-507"></span></p>
<h3>Deducer</h3>
<p>Deducer is designed to be a free, easy-to-use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an Excel-like spreadsheet in which to view and edit data frames. The goal of the project is twofold:</p>
<ol>
<li>Provide an intuitive interface so that non-technical users can learn and perform analyses without programming getting in their way.</li>
<li>Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming.</li>
</ol>
<p>Deducer is designed to be used with the Java-based R console JGR, though it supports a number of other R environments (e.g. Windows RGUI and RTerm).</p>
<p>For those not familiar with Deducer, an online manual is available at: <a href="http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual">http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual</a></p>
<p>An introductory tour of Deducer (4.5 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/iZ857h2j6wA?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/iZ857h2j6wA?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>There is also an &#8220;expert users introduction&#8221; (8 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/AjLToyuluSM?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/AjLToyuluSM?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<h3>ggplot2 Plot Builder</h3>
<p>The major change to Deducer is the inclusion of a new plotting GUI built on the ggplot2 package. This Google Summer of Code project provides an easy-to-use system to make anything from simple histograms to custom publication-ready graphics. Feel free to check out the video introduction:</p>
<p>Part 1 (6 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/-Rym6Ucraes?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/-Rym6Ucraes?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Part 2 (6 min): </p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/k6elEgB3OCE?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/k6elEgB3OCE?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Additional videos:<br />
Templates (5 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/ktdifzqbLW8?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/ktdifzqbLW8?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Extending the Builder (4 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/RsxOo0jx0II?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/RsxOo0jx0II?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<h3>Deducer Extras</h3>
<p>The DeducerExtras package is an add-on package containing a variety of additional analysis dialogs. These include:</p>
<ul>
<li>Distribution quantiles</li>
<li>Single/multiple sample proportion tests</li>
<li>Paired t-test, and Wilcoxon signed-rank test</li>
<li>Levene&#8217;s test and Bartlett&#8217;s test</li>
<li>K-means clustering</li>
<li>Hierarchical clustering</li>
<li>Factor analysis</li>
<li>Multi-dimensional scaling</li>
</ul>
<p>Introduction to Deducer Extras (~2 min): </p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/UCrhxB8tSJY?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/UCrhxB8tSJY?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<h3>Final thanks</h3>
<p>I would like to take this opportunity to thank the R community for choosing this project for a Google Summer of Code grant, and for the support and encouragement. In particular I would like to thank Hadley Wickham for mentoring the Plot Builder GUI, and Dirk Eddelbuettel for his organization of students and mentors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Want to join the closed BETA of a new Statistical Analysis Q&amp;A site &#8211; NOW is the time!</title>
		<link>http://www.r-statistics.com/2010/07/want-to-join-the-closed-beta-of-a-new-statistical-analysis-qa-site-now-is-the-time/</link>
		<comments>http://www.r-statistics.com/2010/07/want-to-join-the-closed-beta-of-a-new-statistical-analysis-qa-site-now-is-the-time/#comments</comments>
		<pubDate>Fri, 16 Jul 2010 07:06:56 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[R community]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[communites]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[online]]></category>
		<category><![CDATA[Q&A]]></category>
		<category><![CDATA[statistical analysis]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=474</guid>
		<description><![CDATA[The bottom line of this post is for you to go to: Stack Exchange Q&#038;A site proposal: Statistical Analysis And commit yourself to using the website for asking and answering questions. (And also consider giving the contender, MetaOptimize a visit) * * * * Statistical analysis Q&#038;A website is about to go into BETA A [...]]]></description>
			<content:encoded><![CDATA[<p><strong>The bottom line of this post is for you to go to:<br />
<a href="http://area51.stackexchange.com/proposals/33/statistical-analysis?referrer=3OUOcMUJcOo1">Stack Exchange Q&#038;A site proposal: Statistical Analysis </a><br />
And commit yourself to using the website for asking and answering questions.</strong></p>
<p>(Also consider giving the contender, <a href="http://metaoptimize.com/qa">MetaOptimize</a>, a visit.)</p>
<p>* * * * </p>
<h3>Statistical analysis Q&#038;A website is about to go into BETA</h3>
<p>A month ago I <a href="http://www.r-statistics.com/2010/06/a-new-qa-website-for-data-analysis-based-on-stackoverflow-engine-is-waiting-for-you/">invited readers of this blog to commit to using a new Q&#038;A website for Data-Analysis</a> (based on the StackOverFlow engine) once it opens (the site was originally proposed by <a href="http://robjhyndman.com/researchtips/">Rob Hyndman</a>).<br />
And now, a month later, I am happy to write that <strong>over 500 people</strong> have shown interest in the website and chosen to commit themselves.  This means we have reached 100% completion of the website proposal process, and in the next few days we will move to the next step.</p>
<p>The next step is that the website will go into closed BETA for about a week.  If you want to be part of this &#8211; now is <a href="http://area51.stackexchange.com/proposals/33/statistical-analysis?referrer=3OUOcMUJcOo1">the time to join</a> (<--- call to action, people).<br />
Having taken part in closed BETAs of similar projects, I can attest that the enthusiasm of the people trying to answer questions in the BETA is very impressive, so I strongly recommend the experience.</p>
<p>If you don't make it by the time you see this post, no worries - about a week or so after the website goes online, it will be open to the general public.</p>
<p>(p.s: thanks Romunov for pointing out to me that the BETA is about to open)</p>
<h3>p.s: MetaOptimize</h3>
<p>I would like to finish this post by mentioning <a href="http://metaoptimize.com/qa/">MetaOptimize</a>.   This is a Q&#038;A website whose community is more &#8220;machine learning&#8221; than &#8220;statistical&#8221;.  It also started out a short while ago, and it already has <a href="http://metaoptimize.com/qa/users/">around 700 users</a> who have submitted ~160 questions with ~520 answers given.  From my experience on the site so far, I have enjoyed the high quality of the questions and answers.<br />
When I first came by the website, I feared that supporting it would split the R community of users between this website and the <a href="http://area51.stackexchange.com/proposals/33/statistical-analysis?referrer=3OUOcMUJcOo1">area 51 StackExchange website</a>.<br />
But after a lengthy discussion (<a href="http://www.r-statistics.com/2010/07/statistical-analysis-qa-website-did-stackoverflow-just-lose-it-to-metaoptimize-and-is-it-good-or-bad/">published recently as a post</a>) with the MetaOptimize founder, Joseph Turian, I came to have a more optimistic view of the competition between the two websites.  Where at first I was afraid, I am now <strong>hopeful</strong> that each of the two websites will manage to draw a slightly different community of people (who otherwise wouldn&#8217;t be present on the other website) &#8211; thus offering all of us a wider variety of knowledge to tap into.</p>
<p>See you there&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/07/want-to-join-the-closed-beta-of-a-new-statistical-analysis-qa-site-now-is-the-time/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>StackOverFlow and MetaOptimize are battling to be the #1 &#8220;Statistical Analysis Q&amp;A website&#8221; &#8211; which would you sign up for?</title>
		<link>http://www.r-statistics.com/2010/07/statistical-analysis-qa-website-did-stackoverflow-just-lose-it-to-metaoptimize-and-is-it-good-or-bad/</link>
		<comments>http://www.r-statistics.com/2010/07/statistical-analysis-qa-website-did-stackoverflow-just-lose-it-to-metaoptimize-and-is-it-good-or-bad/#comments</comments>
		<pubDate>Fri, 02 Jul 2010 21:55:05 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R community]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[area51]]></category>
		<category><![CDATA[artificial intelligence]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[Q&A website]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[stack exchange]]></category>
		<category><![CDATA[stackoverflow]]></category>
		<category><![CDATA[statistical modeling]]></category>
		<category><![CDATA[text analysis]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=442</guid>
		<description><![CDATA[A new statistical analysis Q&#38;A website launched While the proposal for a statistical analysis Q&#38;A website on area51 (stackexchange) is taking its time, and the website is still collecting people who will commit to it, Joseph Turian, who seems like a nice guy from his various comments online, seems to feel this website is not what [...]]]></description>
			<content:encoded><![CDATA[<h3>A new statistical analysis Q&amp;A website launched</h3>
<p>While <a href="http://bit.ly/aDuRKV">the proposal for a statistical analysis Q&amp;A website</a> on area51 (stackexchange) is taking its time, and the website is still collecting people who will commit to it,<br />
<a href="http://www-etud.iro.umontreal.ca/~turian/">Joseph Turian</a>, who seems like a nice guy from his various comments online, seems to feel that this website is not what the community needs and that we shouldn&#8217;t hold back our questions until it goes online.  Therefore, Joseph is pushing with all his might his newest creation, &#8220;<a href="http://metaoptimize.com/qa">MetaOptimize QA</a>&#8221;, a <a href="http://StackOverFlow.com">StackOverFlow</a>-like website for (long list follows): <em>machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization</em>.<br />
It comes with all the bells and whistles that the <a href="http://www.osqa.net/">OSQA framework</a> (an open source stackoverflow clone, and more) can offer (you know, rankings, badges and so on).</p>
<p>Is this new website better than the area51 website?  Will all the people go to just one of the two websites, or will we end up with two places that attract more people than we had to begin with?  These are the questions that come to mind when faced with the story in front of us.</p>
<p>My own suggestion is to try both websites (<a href="http://bit.ly/aDuRKV">the stackoverflow statistical analysis website to come</a> and &#8220;<a href="http://metaoptimize.com/qa">MetaOptimize QA</a>&#8221;) and let time tell.</p>
<p>More info on this story below.</p>
<h3>MetaOptimize online impact so far</h3>
<p>The need for such a Q&amp;A site is clearly evident.  Just several days after being promoted online, MetaOptimize has claimed the eyes of almost 300 users, who have submitted 59 questions and 129 answers.<br />
Already many bloggers in the statistical community have contributed their voices with encouraging posts.  Here is a collection of the posts I was able to find with some googling:</p>
<ul>
<li><a href="http://hunch.net/?p=1425">http://hunch.net/?p=1425</a></li>
<li><a href="http://ebiquity.umbc.edu/blogger/2010/06/30/training-examples-qa-stackoverflow-for-nlp-and-ml/">http://ebiquity.umbc.edu/blogger/2010/06/30/training-examples-qa-stackoverflow-for-nlp-and-ml/</a></li>
<li><a href="http://lingpipe-blog.com/2010/06/29/training-examples-a-stack-overflow-for-nlp-and-ml-and/">http://lingpipe-blog.com/2010/06/29/training-examples-a-stack-overflow-for-nlp-and-ml-and/</a></li>
<li><a href="http://www.stat.columbia.edu/~cook/movabletype/archives/2010/06/question_answer.html">http://www.stat.columbia.edu/~cook/movabletype/archives/2010/06/question_answer.html</a></li>
<li><a href="http://kaggle.com/blog/2010/07/02/new-machine-learning-and-natural-language-processing-qa-site/">http://kaggle.com/blog/2010/07/02/new-machine-learning-and-natural-language-processing-qa-site/</a></li>
<li><a href="http://www.jroller.com/otis/entry/metaoptimize_com_q_a_site">http://www.jroller.com/otis/entry/metaoptimize_com_q_a_site</a></li>
<li><a href="http://sbseminar.wordpress.com/2010/06/17/statistics-version-of-mathoverflow-looking-for-beta-testers/">http://sbseminar.wordpress.com/2010/06/17/statistics-version-of-mathoverflow-looking-for-beta-testers/</a></li>
<li><a href="http://myumbc3.my.umbc.edu/news/1841">http://myumbc3.my.umbc.edu/news/1841</a></li>
</ul>
<h3>But is it good to have two websites?</h3>
<p>But wait, didn&#8217;t we just start pushing forward another <a href="http://www.r-statistics.com/2010/06/a-new-qa-website-for-data-analysis-based-on-stackoverflow-engine-is-waiting-for-you/">statistical Q&amp;A website two weeks ago</a>?  I am talking about the <strong><a href="http://bit.ly/aDuRKV">Stack Exchange Q&amp;A site proposal: Statistical Analysis</a>.</strong></p>
<p>So what should we (the community of statistically minded people) do the next time we have a question?</p>
<p>Should we wait for the new Stack Exchange website to start?  Or should we start using MetaOptimize?</p>
<p><strong>Update: <span style="font-weight: normal;">after a lengthy e-mail exchange with Joseph (the person who founded MetaOptimize), I decided to erase what I originally wrote as my doubts, and instead give a Q&amp;A session that he and I had over e-mail.  It is slightly edited from the original, and some of the content will probably get updated &#8211; so if you are into this subject, check in again in a few hours <img src='http://www.r-statistics.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </span></strong></p>
<p><del datetime="2010-07-03T09:28:16+00:00"><br />
Honestly, I am split in two (and <a href="http://www-etud.iro.umontreal.ca/~turian/">Joseph</a>, I do hope you&#8217;ll take this in a positive way, since personally I feel confident you are a good guy).  I very strongly believe in the need and value of such a Q&amp;A website.  Yet I am wondering how I feel about such a website being hosted as MetaOptimize and outside the hands of the stackoverflow guys.<br />
On the one hand, open source lovers (like myself) tend to like decentralization and reliance on OSS (open source software) solutions (such as the one the <a href="http://www.osqa.net/">OSQA framework</a> offers).  On the other hand, I do believe that the stackoverflow people have (much) more experience in handling such websites than Joseph.  I can very easily trust them to do regular database backups, share the websites&#8217; database dumps with the general community, smoothly test and upgrade to provide new features, and generally speaking perform in a more experienced way with the online Q&amp;A community.<br />
It doesn&#8217;t mean that Joseph won&#8217;t do a great job, personally I hope he will.</del></p>
<h3><strong><span style="text-decoration: underline;">Q&amp;A session with Joseph Turian (MetaOptimize founder)</span></strong></h3>
<p><strong><span style="text-decoration: underline;">Tal</span></strong>: Let&#8217;s start with the easy question, should I worry about technical issues in the website (like, for example, backups)?</p>
<p><span style="text-decoration: underline;"><strong>Joseph</strong></span>:</p>
<div id="_mcePaste">The OSQA team (backed by DZone) have got my back. They have been very helpful since day one to all OSQA users, and have given me a lot of support. Thanks, especially Rick and Hernani!</div>
<p>They provide email and chat support for OSQA users.</p>
<p>I will commit to putting up regular automatic database dumps, whenever the OSQA team implements it:<br />
<a href="http://meta.osqa.net/questions/3120/how-do-i-offer-database-dumps">http://meta.osqa.net/questions/3120/how-do-i-offer-database-dumps</a><br />
If, in six months, they don&#8217;t have this feature as part of their core, and someone (e.g. you) emails me reminding me that they want a dump, I will manually do a database dump and strip the user table.</p>
<p>Also, I&#8217;ve got a scheduled daily database dump that is mirrored to Amazon S3.</p>
<p><span style="text-decoration: underline;"><strong><strong><span style="text-decoration: underline;">Tal</span></strong>:</strong></span> Why did you start MetaOptimize instead of supporting the area51 proposal?<br />
<span style="text-decoration: underline;"><strong>Joseph</strong></span>:</p>
<ol>
<li><span style="font-size: 13.1944px;">On Area51, people asked to have AI merged with ML, and ML merged with statistical analysis, but their requests seemed to be ignored. This seemed like a huge disservice to these communities.</span></li>
<li><span style="font-size: 13.1944px;">Area 51 didn&#8217;t have academics in ML + NLP. I know from experience it&#8217;s hard to get them to buy in to new technology. So why would I risk my reputation getting them to sign up for Area 51, when I know that I will get a 1% conversion? They aren&#8217;t early adopters interested in the process, many are late adopters who won&#8217;t sign up for something until they have to.</span></li>
<li><span style="font-size: 13.1944px;">If the Area 51 sites had a strong newbie bent, which is what it seemed like the direction was going, then the academic experts definitely wouldn&#8217;t waste their time. It would become a support community for newbies, without core expert discussion.  So basically, I know that I and a lot of my colleagues wanted the site I built. And I felt like area 51 was shaping the communities really incorrectly in several respects, and was also taking a while.  I could have fought an institutional process and maybe gotten half the results above, and it would have taken a few months, or I could just build the site and invite my friends, and shape the community correctly.</span></li>
</ol>
<p>Besides that, there are also personal motives:</p>
<ul>
<li><span style="font-size: 13.1944px;">I wanted the recognition for having a good vision for the community, and driving forward something they really like.</span></li>
<li><span style="font-size: 13.1944px;">I wanted to experiment with some NLP and ML extensions for the Q+A software, to help organize the information better. Not possible on a closed platform.</span></li>
</ul>
<p><span style="text-decoration: underline;"><strong><strong><span style="text-decoration: underline;">Tal</span></strong>:</strong></span> I (and maybe some other people) fear that this might fork the people in the field into two websites, instead of bringing them together.  What are your thoughts about that?<br />
<span style="text-decoration: underline;"><strong>Joseph</strong></span>:<br />
How am I forking the community? I&#8217;m bringing a bunch of people in who wouldn&#8217;t have even been part of the Area 51 community.<br />
Area 51 was going to fork it into five communities: stat analysis, ML, NLP, AI, and data mining.  And then a lot fewer people would have been involved.</p>
<p><span style="text-decoration: underline;"><strong><strong><span style="text-decoration: underline;">Tal</span></strong>:</strong></span> What are the things that people who support your website are saying?<br />
<span style="text-decoration: underline;"><strong>Joseph</strong></span>:<br />
Here are some quotes about my site:</p>
<blockquote><p>Philip Resnick (UMD): &#8220;Looking at the questions being asked, the people responding, and the quality of the discussion, I can already see this becoming the go-to place for those &#8216;under the hood&#8217; details<br />
you rarely see in the textbooks or conference papers. This site is going to save a lot of people an awful lot of time and frustration.&#8221;</p>
<p>Aria Haghighi (Berkeley): &#8220;Both NLP and ML have a lot of folk wisdom about what works and what doesn&#8217;t. A site like this is crucial for facilitating the sharing and validation of this collective knowledge.&#8221;</p>
<p>Alexandre Passos (Unicamp): &#8220;Really thank you for that. As a machine learning phd student from somewhere far from most good research centers (I&#8217;m in brazil, and how many brazillian ML papers have you<br />
seen in NIPS/ICML recently?), I struggle a lot with this folk wisdom. Most professors around here haven&#8217;t really interacted enough with the international ML community to be up to date&#8221;<br />
(http://news.ycombinator.com/item?id=1476247)</p>
<p>Ryan McDonald (Google): &#8220;A tool like this will help disseminate and archive the tricks and best practices that are common in NLP/ML, but are rarely written about at length in papers.&#8221;</p>
<p>esoom on Reddit: &#8220;This is awesome. I&#8217;m really impressed by the quality of some of the answers, too. Within five minutes of skimming the site, I learned a neat trick that isn&#8217;t widely discussed in the literature.&#8221;<br />
(http://www.reddit.com/r/MachineLearning/comments/ckw5k/stackoverflow_for_machine_learning_and_natural/c0tb3gc)</p>
<p><span style="text-decoration: underline;"><strong><strong><span style="text-decoration: underline;">Tal</span></strong>:</strong></span> To be fair to the area51 work, they have gotten wonderful responses for the &#8220;statistical analysis&#8221; proposal as well (<a href="http://bit.ly/aDuRKV">see it here</a>).<br />
I have also contacted area51 directly and invited them to come and join the discussion.  I&#8217;ll update this post with their reply.</p></blockquote>
<h3><span style="text-decoration: underline;">So what&#8217;s next?</span></h3>
<p><del datetime="2010-07-03T08:08:02+00:00">I don&#8217;t know.<br />
If the Stack Exchange website were to launch today, I would probably focus on using it and hint to the site for MetaOptimize (for the reasons I just mentioned, and also for some that Rob Hyndman maintained when he <a href="http://robjhyndman.com/researchtips/stack-exchange-for-statistical-analysis-needs-you/">first wrote on the subject</a>).<br />
If the stack exchange version of the website were to start in a few weeks, I would probably sit on the fence and see if people are using it.  I suspect that by that time, there wouldn&#8217;t be many people left to populate it (but I could always be wrong).<br />
And what if the website were to start in a week, what then?  I have no clue.</del><br />
Good question.<br />
My current feeling is that I am glad to let this play out.<br />
It seems this is a good case study for some healthy competition between platforms and models (OSQA vs stackoverflow/area51-system) &#8211; one that I hope will generate more good features from both companies, and will also make both parties work hard to get people to participate.<br />
It also seems that this situation is exposing many people in our field to the same idea (a Q&amp;A website).  After Joseph&#8217;s input on the subject, I am starting to think that maybe at the end of the day this will benefit all of us.  Instead of forking one community into two, maybe what we&#8217;ll end up with is getting more (experienced) people online (in two locations) who would otherwise have stayed in the shadows.</p>
<p>The verdict is still out, but I am a bit more optimistic than I was when first writing this post.  I&#8217;ll update this post after getting more input from people.</p>
<p>And as always &#8211; I would love to know <strong><span style="text-decoration: underline;">your thoughts</span></strong> on the subject.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/07/statistical-analysis-qa-website-did-stackoverflow-just-lose-it-to-metaoptimize-and-is-it-good-or-bad/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
	</channel>
</rss>

