<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R-statistics blog &#187; ggplot2</title>
	<atom:link href="http://www.r-statistics.com/tag/ggplot2/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.r-statistics.com</link>
	<description>Writing about statistics with R, and open source stuff (software, data, community)</description>
	<lastBuildDate>Mon, 30 Jan 2012 07:45:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Comparison of ave, ddply and data.table</title>
		<link>http://www.r-statistics.com/2011/08/comparison-of-ave-ddply-and-data-table/</link>
		<comments>http://www.r-statistics.com/2011/08/comparison-of-ave-ddply-and-data-table/#comments</comments>
		<pubDate>Thu, 25 Aug 2011 13:29:44 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[ave]]></category>
		<category><![CDATA[ddply]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[plyr]]></category>
		<category><![CDATA[speed]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=792</guid>
		<description><![CDATA[A guest post by Paul Hiemstra. &#8212;&#8212;&#8212;&#8212; Fortran and C programmers often say that interpreted languages like R are nice and all, but lack speed. How fast something works in R greatly depends on how it is implemented, i.e. which packages and functions one uses. A prime example, which shows up regularly on [...]]]></description>
			<content:encoded><![CDATA[<p>A guest post by <a href="http://intamap.geo.uu.nl/~paul/">Paul Hiemstra</a>.<br />
&#8212;&#8212;&#8212;&#8212;</p>
<p>Fortran and C programmers often say that interpreted languages like R are nice and all, but lack speed. How fast something works in R greatly depends on how it is implemented, i.e. which packages and functions one uses. A prime example, which shows up regularly on the R-help list, is letting a vector grow as you perform an analysis. In pseudo-code this might look like:</p>

<div class="wp_codebox"><table><tr id="p7924"><td class="code" id="p792code4"><pre class="rsplus" style="font-family:monospace;">dum <span style="color: #080;">=</span> NULL
<span style="color: #0000FF; font-weight: bold;">for</span><span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">100000</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
   new_outcome <span style="color: #080;">=</span> i  <span style="color: #228B22;"># placeholder for ...do some stuff...</span>
   dum <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span>dum, new_outcome<span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span></pre></td></tr></table></div>

<p>The problem here is that dum is continuously growing in size. On every iteration R has to allocate memory for a new, longer vector and copy the old contents into it, which is terribly slow. Preallocating dum to its final length greatly improves performance. Alternatively, using apply-type functions, or functions from the plyr package, avoids these kinds of problems. But even among more advanced methods there are large differences in speed between implementations.</p>
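<p>As a minimal sketch of the difference, the snippet below fills the same vector in three ways: by growing it, by preallocating it, and with an apply-type function. The function names (grow, prealloc, applied) and the toy computation i^2 are made up for illustration and are not part of the original comparison.</p>

```r
n = 10000

# Growing: every c() call allocates a new vector and copies the old one.
grow = function(n) {
  dum = NULL
  for (i in 1:n) {
    dum = c(dum, i^2)
  }
  dum
}

# Preallocating: the vector is created once at full length and filled in place.
prealloc = function(n) {
  dum = numeric(n)
  for (i in 1:n) {
    dum[i] = i^2
  }
  dum
}

# Apply-type alternative: vapply() builds the result vector for us.
applied = function(n) vapply(1:n, function(i) i^2, numeric(1))

res_grow = grow(n)
res_prealloc = prealloc(n)
res_applied = applied(n)

# Compare elapsed times; exact numbers depend on the machine and R version.
system.time(grow(n))
system.time(prealloc(n))
system.time(applied(n))
```

<p>All three variants return the same vector; only the time needed to build it differs, and the gap should widen as n grows.</p>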
<p>Take the next example. We create a dataset with two columns: one with values (e.g. amount of rainfall) and one with a category (e.g. monitoring station id). We would like to know the mean value per category. One way is to use for loops, but I&#8217;ll skip that one for now. I know of three other possibilities: ddply (plyr), ave (base R) and data.table. The piece of code at the end of this post compares these three methods. The outcome in terms of speed (elapsed time in seconds) is:<br />
(click the image to see a larger version)<br />
<a href="http://www.r-statistics.com/wp-content/uploads/2011/08/comparing-ddply-ave-and-data.table_.png"><img src="http://www.r-statistics.com/wp-content/uploads/2011/08/comparing-ddply-ave-and-data.table_-300x201.png" alt="" title="comparing ddply ave and data.table" width="300" height="201" class="alignnone size-medium wp-image-791" /></a></p>
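<p>To make the task concrete, here is a small sketch on a toy dataset. The data and the object names (per_row, per_cat, by_plyr, by_dt) are invented for illustration, and tapply is added only to show a one-value-per-category result in base R. The plyr and data.table calls run only if those packages are installed.</p>

```r
# Toy data: six rainfall values at three hypothetical stations
expdata = data.frame(value = c(1, 3, 2, 4, 10, 20),
                     cat   = c("a", "a", "b", "b", "c", "c"))

# ave() returns one value per row: each row gets the mean of its category
per_row = with(expdata, ave(value, cat))

# tapply() returns one value per category
per_cat = with(expdata, tapply(value, cat, mean))

# The same per-category summary with plyr, if it is installed
if (requireNamespace("plyr", quietly = TRUE)) {
  by_plyr = plyr::ddply(expdata, "cat", plyr::summarise, value = mean(value))
}

# ... and with data.table, if it is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  by_dt = data.table::as.data.table(expdata)[, list(value = mean(value)), by = cat]
}

per_row
per_cat
```

<p>Note that ave returns a vector as long as the input, while the other approaches return one row or element per category, so the three methods in the comparison do not produce identically shaped results.</p>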

<div class="wp_codebox"><table><tr id="p7925"><td class="code" id="p792code5"><pre class="rsplus" style="font-family:monospace;">   datsize noClasses  tave tddply tdata.<span style="">table</span>
<span style="color: #ff0000;">1</span>    1e<span style="color: #080;">+</span>05        <span style="color: #ff0000;">10</span> <span style="color: #ff0000;">0.091</span>  <span style="color: #ff0000;">0.035</span>       <span style="color: #ff0000;">0.011</span>
<span style="color: #ff0000;">2</span>    1e<span style="color: #080;">+</span>05        <span style="color: #ff0000;">50</span> <span style="color: #ff0000;">0.102</span>  <span style="color: #ff0000;">0.050</span>       <span style="color: #ff0000;">0.012</span>
<span style="color: #ff0000;">3</span>    1e<span style="color: #080;">+</span>05       <span style="color: #ff0000;">100</span> <span style="color: #ff0000;">0.105</span>  <span style="color: #ff0000;">0.065</span>       <span style="color: #ff0000;">0.012</span>
<span style="color: #ff0000;">4</span>    1e<span style="color: #080;">+</span>05       <span style="color: #ff0000;">200</span> <span style="color: #ff0000;">0.109</span>  <span style="color: #ff0000;">0.101</span>       <span style="color: #ff0000;">0.010</span>
<span style="color: #ff0000;">5</span>    1e<span style="color: #080;">+</span>05       <span style="color: #ff0000;">500</span> <span style="color: #ff0000;">0.113</span>  <span style="color: #ff0000;">0.248</span>       <span style="color: #ff0000;">0.012</span>
<span style="color: #ff0000;">6</span>    1e<span style="color: #080;">+</span>05      <span style="color: #ff0000;">1000</span> <span style="color: #ff0000;">0.123</span>  <span style="color: #ff0000;">0.438</span>       <span style="color: #ff0000;">0.012</span>
<span style="color: #ff0000;">7</span>    1e<span style="color: #080;">+</span>05      <span style="color: #ff0000;">2500</span> <span style="color: #ff0000;">0.146</span>  <span style="color: #ff0000;">0.956</span>       <span style="color: #ff0000;">0.013</span>
<span style="color: #ff0000;">8</span>    1e<span style="color: #080;">+</span>05     <span style="color: #ff0000;">10000</span> <span style="color: #ff0000;">0.251</span>  <span style="color: #ff0000;">3.525</span>       <span style="color: #ff0000;">0.020</span>
<span style="color: #ff0000;">9</span>    1e<span style="color: #080;">+</span>06        <span style="color: #ff0000;">10</span> <span style="color: #ff0000;">0.905</span>  <span style="color: #ff0000;">0.393</span>       <span style="color: #ff0000;">0.101</span>
<span style="color: #ff0000;">10</span>   1e<span style="color: #080;">+</span>06        <span style="color: #ff0000;">50</span> <span style="color: #ff0000;">1.003</span>  <span style="color: #ff0000;">0.473</span>       <span style="color: #ff0000;">0.100</span>
<span style="color: #ff0000;">11</span>   1e<span style="color: #080;">+</span>06       <span style="color: #ff0000;">100</span> <span style="color: #ff0000;">1.036</span>  <span style="color: #ff0000;">0.579</span>       <span style="color: #ff0000;">0.105</span>
<span style="color: #ff0000;">12</span>   1e<span style="color: #080;">+</span>06       <span style="color: #ff0000;">200</span> <span style="color: #ff0000;">1.052</span>  <span style="color: #ff0000;">0.826</span>       <span style="color: #ff0000;">0.106</span>
<span style="color: #ff0000;">13</span>   1e<span style="color: #080;">+</span>06       <span style="color: #ff0000;">500</span> <span style="color: #ff0000;">1.079</span>  <span style="color: #ff0000;">1.508</span>       <span style="color: #ff0000;">0.109</span>
<span style="color: #ff0000;">14</span>   1e<span style="color: #080;">+</span>06      <span style="color: #ff0000;">1000</span> <span style="color: #ff0000;">1.092</span>  <span style="color: #ff0000;">2.652</span>       <span style="color: #ff0000;">0.111</span>
<span style="color: #ff0000;">15</span>   1e<span style="color: #080;">+</span>06      <span style="color: #ff0000;">2500</span> <span style="color: #ff0000;">1.167</span>  <span style="color: #ff0000;">6.051</span>       <span style="color: #ff0000;">0.117</span>
<span style="color: #ff0000;">16</span>   1e<span style="color: #080;">+</span>06     <span style="color: #ff0000;">10000</span> <span style="color: #ff0000;">1.338</span> <span style="color: #ff0000;">23.224</span>       <span style="color: #ff0000;">0.132</span></pre></td></tr></table></div>

<p>It is quite obvious that ddply performs very badly when the number of unique categories is large. The ave function performs better. However, the data.table option is by far the fastest, easily outperforming both alternatives. Hadley Wickham (author of plyr) responded to these results:</p>
<blockquote><p>This is a drawback of the way that ddply always works with data frames.  It will be a bit faster if you use summarise instead of data.frame (because data.frame is very slow), but I&#8217;m still thinking about how to overcome this fundamental limitation of the ddply approach.
</p></blockquote>
<p>I hope this comparison is of use to readers. And remember, think before complaining that R is slow <img src='http://www.r-statistics.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> .</p>
<p>Paul (e-mail: <a href="mailto:p.h.hiemstra@gmail.com">p.h.hiemstra@gmail.com</a>)</p>
<p>P.S. This blog post is based on discussions on the R-help and manipulatr mailing lists:<br />
- <a href="http://www.mail-archive.com/r-help@r-project.org/msg142797.html">http://www.mail-archive.com/r-help@r-project.org/msg142797.html</a><br />
- <a href="http://groups.google.com/group/manipulatr/browse_thread/thread/5e8dfed85048df99">http://groups.google.com/group/manipulatr/browse_thread/thread/5e8dfed85048df99</a></p>
<p>R code to perform the comparison:</p>

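<div class="wp_codebox"><table><tr id="p7926"><td class="code" id="p792code6"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>ggplot2<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>plyr<span style="color: #080;">&#41;</span>      <span style="color: #228B22;"># provides ddply</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>reshape2<span style="color: #080;">&#41;</span>  <span style="color: #228B22;"># provides melt</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>data.<span style="">table</span><span style="color: #080;">&#41;</span>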
theme_set<span style="color: #080;">&#40;</span>theme_bw<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
datsize <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span>10e4, 10e5<span style="color: #080;">&#41;</span>
noClasses <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">10</span>, <span style="color: #ff0000;">50</span>, <span style="color: #ff0000;">100</span>, <span style="color: #ff0000;">200</span>, <span style="color: #ff0000;">500</span>, <span style="color: #ff0000;">1000</span>, <span style="color: #ff0000;">2500</span>, 10e3<span style="color: #080;">&#41;</span>
comb <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">expand.<span style="">grid</span></span><span style="color: #080;">&#40;</span>datsize <span style="color: #080;">=</span> datsize, noClasses <span style="color: #080;">=</span> noClasses<span style="color: #080;">&#41;</span>
res <span style="color: #080;">=</span> ddply<span style="color: #080;">&#40;</span>comb, .<span style="color: #080;">&#40;</span>datsize, noClasses<span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
  expdata <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>value <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span>x$datsize<span style="color: #080;">&#41;</span>,
                      <span style="color: #0000FF; font-weight: bold;">cat</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">runif</span><span style="color: #080;">&#40;</span>x$datsize, <span style="color: #0000FF; font-weight: bold;">min</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">max</span> <span style="color: #080;">=</span> x$noClasses<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  expdataDT <span style="color: #080;">=</span> data.<span style="">table</span><span style="color: #080;">&#40;</span>expdata<span style="color: #080;">&#41;</span>
&nbsp;
  t1 <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">system.<span style="">time</span></span><span style="color: #080;">&#40;</span>res1 <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">with</span><span style="color: #080;">&#40;</span>expdata, <span style="color: #0000FF; font-weight: bold;">ave</span><span style="color: #080;">&#40;</span>value, <span style="color: #0000FF; font-weight: bold;">cat</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  t2 <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">system.<span style="">time</span></span><span style="color: #080;">&#40;</span>res2 <span style="color: #080;">&lt;-</span> ddply<span style="color: #080;">&#40;</span>expdata, .<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">cat</span><span style="color: #080;">&#41;</span>, summarise, value <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">mean</span><span style="color: #080;">&#40;</span>value<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
  t3 <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">system.<span style="">time</span></span><span style="color: #080;">&#40;</span>res3 <span style="color: #080;">&lt;-</span> expdataDT<span style="color: #080;">&#91;</span>, <span style="color: #0000FF; font-weight: bold;">mean</span><span style="color: #080;">&#40;</span>value<span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">by</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">cat</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
  <span style="color: #0000FF; font-weight: bold;">return</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>tave <span style="color: #080;">=</span> t1<span style="color: #080;">&#91;</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span>, tddply <span style="color: #080;">=</span> t2<span style="color: #080;">&#91;</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span>, tdata.<span style="">table</span> <span style="color: #080;">=</span> t3<span style="color: #080;">&#91;</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>, .<span style="">progress</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">'text'</span><span style="color: #080;">&#41;</span>
&nbsp;
res
&nbsp;
ggplot<span style="color: #080;">&#40;</span>aes<span style="color: #080;">&#40;</span>x <span style="color: #080;">=</span> noClasses, y <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">log</span><span style="color: #080;">&#40;</span>value<span style="color: #080;">&#41;</span>, color <span style="color: #080;">=</span> variable<span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">data</span> <span style="color: #080;">=</span>
melt<span style="color: #080;">&#40;</span>res, id.<span style="">vars</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;datsize&quot;</span>,<span style="color: #ff0000;">&quot;noClasses&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span> facet_wrap<span style="color: #080;">&#40;</span>~ datsize<span style="color: #080;">&#41;</span> <span style="color: #080;">+</span>
geom_line<span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2011/08/comparison-of-ave-ddply-and-data-table/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Engineering Data Analysis (with R and ggplot2) &#8211; a Google Tech Talk given by Hadley Wickham</title>
		<link>http://www.r-statistics.com/2011/06/engineering-data-analysis-with-r-and-ggplot2-a-google-tech-talk-given-by-hadley-wickham/</link>
		<comments>http://www.r-statistics.com/2011/06/engineering-data-analysis-with-r-and-ggplot2-a-google-tech-talk-given-by-hadley-wickham/#comments</comments>
		<pubDate>Fri, 17 Jun 2011 08:30:48 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[R links]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[ggplot2 book]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[google tech talk]]></category>
		<category><![CDATA[Hadley Wickham]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=760</guid>
		<description><![CDATA[It appears that just days ago, Google Tech Talk released a new, one-hour-long video of a presentation (from June 6, 2011) given by one of the R community&#8217;s more influential contributors, Hadley Wickham. This seems to be one of the better talks to send a programmer friend who is interested in getting into R. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.r-statistics.com/wp-content/uploads/2011/06/YouTube-Engineering-Data-Analysis-with-R-and-ggplot2-Google-Chrome_2011-06-17_11-31-21.png"><img class="alignnone size-full wp-image-764" title="YouTube - Engineering Data Analysis (with R and ggplot2) - Google Chrome_2011-06-17_11-31-21" src="http://www.r-statistics.com/wp-content/uploads/2011/06/YouTube-Engineering-Data-Analysis-with-R-and-ggplot2-Google-Chrome_2011-06-17_11-31-21-e1308299835422.png" alt="" width="500" height="307" /></a></p>
<p>It appears that just days ago, Google Tech Talk released a new, one-hour-long video of a presentation (from June 6, 2011) given by one of the R community&#8217;s more influential contributors, <a href="http://had.co.nz/">Hadley Wickham</a>.</p>
<p>This seems to be one of the better talks to send a programmer friend who is interested in getting into <a href="http://www.r-project.org/">R</a>.</p>
<h3>Talk abstract</h3>
<p>Data analysis, the process of converting data into knowledge, insight and understanding, is a critical part of statistics, but there&#8217;s surprisingly little research on it. In this talk I&#8217;ll introduce some of my recent work, including a model of data analysis. I&#8217;m a passionate advocate of the idea that data analysis should be carried out using a programming language, and I&#8217;ll justify this by discussing some of the requirements of good data analysis (reproducibility, automation and communication). With these in mind, I&#8217;ll introduce you to a powerful set of tools for better understanding data: the statistical programming language R, and the ggplot2 domain specific language (DSL) for visualisation.</p>
<h3>The video</h3>
<p><object width="500" height="306"><param name="movie" value="http://www.youtube.com/v/TaxJwC_MP9Q?version=3"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/TaxJwC_MP9Q?version=3" type="application/x-shockwave-flash" width="500" height="306" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<h3>More resources</h3>
<ul>
<li><a href="http://had.co.nz/">Hadley&#8217;s homepage</a></li>
<li><a href="http://hadley.github.com/">More talks/presentations by Hadley</a></li>
<li><a href="http://had.co.nz/ggplot2/book/">The ggplot2 book (sample chapters)</a></li>
<li><a href="http://cran.r-project.org/web/packages/ggplot2/index.html">ggplot2 on CRAN</a></li>
<li>A hat tip goes to my good friend Eyal Sela, a <a href="http://productivewise.com/">social media, internet and productivity researcher</a>, for informing me about this talk.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2011/06/engineering-data-analysis-with-r-and-ggplot2-a-google-tech-talk-given-by-hadley-wickham/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Rose plot using Deducer&#8217;s ggplot2 plot builder</title>
		<link>http://www.r-statistics.com/2010/08/rose-plot-using-deducers-ggplot2-plot-builder/</link>
		<comments>http://www.r-statistics.com/2010/08/rose-plot-using-deducers-ggplot2-plot-builder/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 22:35:52 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[deducer]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[Hadley Wickham]]></category>
		<category><![CDATA[Ian fellows]]></category>
		<category><![CDATA[interfaces]]></category>
		<category><![CDATA[plot builder]]></category>
		<category><![CDATA[R GUI]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[tutorials]]></category>
		<category><![CDATA[videos]]></category>
		<category><![CDATA[youtube]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=517</guid>
		<description><![CDATA[The (excellent!) LearnR blog had a post today about making a rose plot in ggplot2. Following today&#8217;s announcement by Ian Fellows of the release of the new version of Deducer (0.4), which offers strong support for ggplot2 through a GUI plot builder, Ian also sent an e-mail where he shows how to create a rose [...]]]></description>
			<content:encoded><![CDATA[<p>The (excellent!) <a href="http://learnr.wordpress.com/2010/08/16/consultants-chart-in-ggplot2/">LearnR blog had a post today</a> about making a rose plot in <a href="http://had.co.nz/ggplot2/">ggplot2</a>.</p>
<p>Following today&#8217;s announcement by <a href="http://www.deducer.org/pmwiki/index.php/">Ian Fellows</a> of <a href="http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/">the release of the new version of Deducer (0.4)</a>, which offers strong support for ggplot2 through a GUI plot builder, Ian also sent an e-mail showing how to create a rose plot using the new ggplot2 GUI included in the latest version of Deducer. After the template is made, the plot can be generated with 4 clicks of the mouse.</p>
<p>Here is a video tutorial, published by Ian, showing how this can be used:</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/CHYATHLM5sY?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/CHYATHLM5sY?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>The generated template file is available at:<br />
<a href="http://neolab.stat.ucla.edu/cranstats/rose.ggtmpl">http://neolab.stat.ucla.edu/cranstats/rose.ggtmpl</a></p>
<p>I am excited about the work Ian is doing, and hope to see more people publish use cases with Deducer.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/08/rose-plot-using-deducers-ggplot2-plot-builder/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ggplot2 plot builder is now on CRAN! (through Deducer 0.4 GUI for R)</title>
		<link>http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/</link>
		<comments>http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 18:53:03 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[deducer]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[google summer of code]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[Hadley Wickham]]></category>
		<category><![CDATA[Ian fellows]]></category>
		<category><![CDATA[interfaces]]></category>
		<category><![CDATA[plot builder]]></category>
		<category><![CDATA[R GUI]]></category>
		<category><![CDATA[SPSS]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[tutorials]]></category>
		<category><![CDATA[videos]]></category>
		<category><![CDATA[youtube]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=507</guid>
		<description><![CDATA[Ian Fellows, a hard-working contributor to the R community (and a cool guy), announced today the release of Deducer (0.4) to CRAN (scheduled to update in the next day or so). This major update also includes the release of a new plug-in package (DeducerExtras), containing additional dialogs and functionality. Following is the e-mail [...]]]></description>
			<content:encoded><![CDATA[<p>Ian Fellows, a hard-working contributor to the R community (and a cool guy), announced today the release of <a href="http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual">Deducer</a> (0.4) to <a href="http://cran.r-project.org/web/packages/Deducer/index.html">CRAN</a> (scheduled to update in the next day or so).<br />
This major update also includes the release of a new plug-in package (DeducerExtras), containing additional dialogs and functionality.</p>
<p>Following is the e-mail he sent out with all the details and demo videos.</p>
<p><span id="more-507"></span></p>
<h3>Deducer</h3>
<p>Deducer is designed to be a free, easy-to-use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an Excel-like spreadsheet in which to view and edit data frames. The goal of the project is twofold:</p>
<ul>
<li>Provide an intuitive interface so that non-technical users can learn and perform analyses without programming getting in their way.</li>
<li>Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming.</li>
</ul>
<p>Deducer is designed to be used with the Java-based R console JGR, though it supports a number of other R environments (e.g. Windows RGUI and RTerm).</p>
<p>For those not familiar with Deducer, an online manual is available at: <a href="http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual">http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual</a></p>
<p>An introductory tour of Deducer (4.5 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/iZ857h2j6wA?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/iZ857h2j6wA?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>There is also an &#8220;expert users introduction&#8221; (8 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/AjLToyuluSM?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/AjLToyuluSM?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<h3>ggplot2 Plot Builder</h3>
<p>The major change to Deducer is the inclusion of a new plotting GUI built on the ggplot2 package. This Google Summer of Code project provides an easy-to-use system for making anything from simple histograms to custom, publication-ready graphics. Feel free to check out the video introduction:</p>
<p>Part 1 (6 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/-Rym6Ucraes?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/-Rym6Ucraes?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Part 2 (6 min): </p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/k6elEgB3OCE?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/k6elEgB3OCE?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Additional videos:<br />
Templates (5 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/ktdifzqbLW8?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/ktdifzqbLW8?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Extending the Builder (4 min):</p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/RsxOo0jx0II?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/RsxOo0jx0II?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<h3>Deducer Extras</h3>
<p>The DeducerExtras package is an add-on package containing a variety of additional analysis dialogs. These include:</p>
<ul>
<li>Distribution quantiles</li>
<li>Single/multiple sample proportion tests</li>
<li>Paired t-test and Wilcoxon signed-rank test</li>
<li>Levene&#8217;s test and Bartlett&#8217;s test</li>
<li>K-means clustering</li>
<li>Hierarchical clustering</li>
<li>Factor analysis</li>
<li>Multi-dimensional scaling</li>
</ul>
<p>Introduction to Deducer Extras (~2 min): </p>
<p><object width="500" height="400"><param name="movie" value="http://www.youtube.com/v/UCrhxB8tSJY?fs=1"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/UCrhxB8tSJY?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<h3>Final thanks</h3>
<p>I would like to take this opportunity to thank the R community for choosing this project for a Google Summer of Code grant, and for the support and encouragement. In particular I would like to thank Hadley Wickham for mentoring the Plot Builder GUI, and Dirk Eddelbuettel for his organization of students and mentors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>New versions for ggplot2 (0.8.8) and plyr (1.0) were released today</title>
		<link>http://www.r-statistics.com/2010/07/released-today-new-versions-for-ggplot2-0-8-8-and-plyr-1-0/</link>
		<comments>http://www.r-statistics.com/2010/07/released-today-new-versions-for-ggplot2-0-8-8-and-plyr-1-0/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 07:32:11 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[Hadley Wickham]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[plyr]]></category>
		<category><![CDATA[update]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=459</guid>
		<description><![CDATA[Rich in packages as the CRAN website is, a few R packages manage to stand out for their widespread use (and quality). Hadley Wickham&#8217;s ggplot2 and plyr are two such packages. And today (through twitter) Hadley has updated the rest of us with the news: just released new versions of [...]]]></description>
			<content:encoded><![CDATA[<p>Rich in packages as the CRAN website is, a few R packages manage to stand out for their widespread use (and quality). <a href="http://had.co.nz/">Hadley Wickham</a>&#8217;s <a href="http://had.co.nz/ggplot2/">ggplot2</a> and <a href="http://had.co.nz/plyr/">plyr</a> are two such packages.<br />
<img src="http://had.co.nz/plyr/pliers.jpg" alt="plyr image" /><br />
And today (<a href="http://twitter.com/hadleywickham/status/17814050267">through twitter</a>) Hadley has updated the rest of us with the news:</p>
<blockquote><p>just released new versions of plyr and ggplot2. source versions available on cran, compiled will follow soon #rstats</p></blockquote>
<p>Going to the CRAN website shows that plyr has gone through the more major update of the two, with the last update (before the current one) taking place on 2009-06-23.  And now, over a year later, we are presented with plyr version 1, which includes new functions, new features, some bug fixes and much anticipated speed improvements.<br />
ggplot2 has made a tiny leap from version 0.8.7 to 0.8.8, and was previously last updated on 2010-03-03.</p>
<p>I am (as I am sure many R users are) very thankful for the amazing work that Hadley Wickham is doing (both on his code, and in helping other useRs on the help lists).  So Hadley, <strong>thank you</strong>!</p>
<p>Here is the complete change-log list for both packages:<br />
<span id="more-459"></span></p>
<h3>plyr 1.0 (2010-07-02)</h3>
<p>(taken from <a href="http://cran.r-project.org/web/packages/plyr/NEWS">the CRAN website</a>)<br />
<strong> New functions:</strong></p>
<ul>
<li>arrange, a new helper method for reordering a data frame.</li>
<li>count, a version of table that returns data frames immediately and that is much much faster for high-dimensional data.</li>
<li>desc makes it easy to sort any vector in descending order</li>
<li>join, works like merge but can be much faster and has a somewhat simpler syntax drawing from SQL terminology</li>
<li>rbind.fill.matrix is like rbind.fill but works for matrices, code contributed by C. Beleites</li>
</ul>
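<p>As a rough illustration of the new helpers listed above, here is a minimal sketch on made-up data. The data frames scores and labels are hypothetical and are not taken from the release notes; the calls assume plyr 1.0 or later is installed.</p>

```r
library(plyr)

# Hypothetical toy data, made up for illustration only.
scores <- data.frame(
  name  = c("a", "b", "c", "d"),
  group = c("x", "y", "x", "y"),
  score = c(3, 1, 4, 2),
  stringsAsFactors = FALSE
)
labels <- data.frame(
  group = c("x", "y"),
  label = c("first", "second"),
  stringsAsFactors = FALSE
)

# arrange + desc: reorder the rows by descending score.
by_score <- arrange(scores, desc(score))

# count: like table(), but returns a data frame (columns: group, freq).
per_group <- count(scores, "group")

# join: like merge(), with SQL-style join types; a left join keeps every row of scores.
labelled <- join(scores, labels, by = "group", type = "left")
```

<p>Unlike merge, join keeps the rows of its first argument in their original order, which is often what one wants when adding lookup columns to an existing data frame.</p>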
<p><strong>Speed improvements</strong></p>
<ul>
<li>experimental immutable data frame (idata.frame) that vastly speeds up subsetting &#8211; for large datasets with large numbers of groups, this can yield 10-fold speed ups. See examples in ?idata.frame to see how to use it.</li>
<li>rbind.fill rewritten again to increase speed and work with more data types</li>
<li>d*ply now much faster with nested groups</li>
</ul>
<p><strong>New features:</strong></p>
<ul>
<li>d*ply now accepts NULL for splitting variables, indicating that the data should not be split</li>
<li>plyr no longer exports internal functions, many of which were causing clashes with other packages</li>
<li>rbind.fill now works with data frame columns that are lists or matrices</li>
<li>test suite ensures that plyr behaviour is correct and will remain correct as I make future improvements.</li>
</ul>
<p><strong>Bug fixes:</strong></p>
<ul>
<li>**ply: if zero splits, empty list(), data.frame() or logical() returned, as appropriate for the output type</li>
<li>**ply: leaving .fun as NULL now always returns list (thanks to Stavros Macrakis for the bug report)</li>
<li>a*ply: labels now respect options(stringAsFactors)</li>
<li>each: scoping bug fixed, thanks to Yasuhisa Yoshida for the bug report</li>
<li>list_to_dataframe is more consistent when processing a single data frame</li>
<li>NAs preserved in more places</li>
<li>progress bars: guaranteed to terminate even if **ply prematurely terminates</li>
<li>progress bars: misspelling gives informative warning, instead of uninformative error</li>
<li>splitter_d: fixed ordering bug when .drop = FALSE</li>
</ul>
<h3>ggplot2 0.8.8 (2010-07-02)</h3>
<p>(taken from <a href="http://cran.r-project.org/web/packages/ggplot2/NEWS">the CRAN website</a>)</p>
<p><strong>Bug fixes:</strong></p>
<ul>
<li>coord_equal finally works as expected (thanks to continued prompting from Jean-Olivier Irisson)</li>
<li>coord_equal renamed to coord_fixed to better represent capabilities</li>
<li>coord_polar and coord_polar: new munching system that uses distances (as defined by the coordinate system) to figure out how many pieces each segment should be broken in to (thanks to prompting from Jean-Olivier Irisson)</li>
<li>fix ordering bug in facet_wrap (thanks to bug report by Frank Davenport)</li>
<li>geom_errorh correctly responds to height parameter outside of aes</li>
<li>geom_hline and geom_vline will not impact legend when used for fixed intercepts</li>
<li>geom_hline/geom_vline: intercept values not set quite correctly which caused a problem in conjunction with transformed scales (reported by Seth Finnegan)</li>
<li>geom_line: can now stack lines again with position = &#8220;stack&#8221; (fixes #74)</li>
<li>geom_segment: arrows now preserved in non-Cartesian coordinate system (fixes #117)</li>
<li>geom_smooth now deals with missing values in the same way as geom_line (thanks to patch from Karsten Loesing)</li>
<li>guides: check all axis labels for expressions (reported by Benji Oswald)</li>
<li>guides: extra 0.5 line margin around legend (fixes #71)</li>
<li>guides: non-left legend positions now work once more (thanks to patch from Karsten Loesing)</li>
<li>label_bquote works with more expressions (factors now cast to characters, thanks to Baptiste Auguie for bug report)</li>
<li>scale_color: add missing US spellings</li>
<li>stat: panels with no non-missing values triggered errors with some statistics. (reported by Giovanni Dall&#8217;Olio)</li>
<li>stat: statistics now also respect layer parameter inherit.aes (thanks to bug report by Lorenzo Isella and investigation by Brian Diggs)</li>
<li>stat_bin no longer drops 0-count bins by default</li>
<li>stat_bin: fix small bug when dealing with single bin with NA position (reported by John Rauser)</li>
<li>stat_binhex: uses range of data from scales when computing binwidth so hexes are the same size in all facets (thanks to Nicholas Lewin-Koh for the bug report)</li>
<li>stat_qq has new dparam parameter for specifying distribution parameters (thanks to Yunfeng Zhang for the bug report)</li>
<li>stat_smooth now uses built-in confidence interval (with small sample correction) for linear models (thanks to suggestion by Ian Fellows)</li>
<li>sta</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/07/released-today-new-versions-for-ggplot2-0-8-8-and-plyr-1-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Clustergram: visualization and diagnostics for cluster analysis (R code)</title>
		<link>http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/</link>
		<comments>http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 08:22:34 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[base graphics]]></category>
		<category><![CDATA[cluster analysis]]></category>
		<category><![CDATA[clustergram]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[Dendrogram]]></category>
		<category><![CDATA[diagnose]]></category>
		<category><![CDATA[diagnosing]]></category>
		<category><![CDATA[diagnostics]]></category>
		<category><![CDATA[functions]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[hierarchical clustering]]></category>
		<category><![CDATA[iris]]></category>
		<category><![CDATA[iris data set]]></category>
		<category><![CDATA[large data]]></category>
		<category><![CDATA[matlines]]></category>
		<category><![CDATA[non-hierarchical]]></category>
		<category><![CDATA[parallel coordinates]]></category>
		<category><![CDATA[R code]]></category>
		<category><![CDATA[R functions]]></category>
		<category><![CDATA[tree]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=391</guid>
		<description><![CDATA[About Clustergrams In 2002, Matthias Schonlau published in &#8220;The Stata Journal&#8221; an article named &#8220;The Clustergram: A graph for visualizing hierarchical and . As explained in the abstract: In hierarchical cluster analysis dendrogram graphs are used to visualize how clusters are formed. I propose an alternative graph named “clustergram” to examine how cluster members are [...]]]></description>
			<content:encoded><![CDATA[<h3>About Clustergrams</h3>
<p>In 2002, <a href="http://www.schonlau.net/clustergram.html">Matthias Schonlau </a>published in &#8220;The Stata Journal&#8221; an article named &#8220;<a href="https://docs.google.com/viewer?url=http://www.schonlau.net/publication/02stata_clustergram.pdf">The Clustergram: A graph for visualizing hierarchical and </a>.  As explained in the abstract:</p>
<blockquote><p>In hierarchical cluster analysis dendrogram graphs are used to visualize how clusters are formed. I propose an alternative graph named “clustergram” to examine how cluster members are assigned to clusters as the number of clusters increases.<br />
This graph is useful in exploratory analysis for non-hierarchical clustering algorithms like k-means and for hierarchical cluster algorithms when the number of observations is large enough to make dendrograms impractical.</p></blockquote>
<p>A <a href="https://docs.google.com/viewer?url=http://www.schonlau.net/publication/04compstat_clustergram.pdf">similar article</a> was later written and was (maybe) published in &#8220;Computational Statistics&#8221;.</p>
<p>Both articles give some nice background on known methods like k-means and methods for hierarchical clustering, and then go on to present examples of using these methods (with the clustergram) to analyse some datasets.</p>
<p>Personally, I understand the clustergram to be a type of parallel coordinates plot where each observation is given a vector.  The vector contains the observation&#8217;s location for each number of clusters the dataset was split into.  The scale of the vector is the scale of the first principal component of the data. </p>
<h3>Clustergram in R (a basic function)</h3>
<p>After finding out about this method of visualization, I was haunted by the curiosity to play with it a bit.  Therefore, and since I didn&#8217;t find any implementation of the graph in R, I went about writing the code to implement it.</p>
<p>The code only works for kmeans, but it shows how such a plot can be produced, and it could later be modified to offer methods that connect with different clustering algorithms.</p>
<p>The function I present here takes a data.frame/matrix with one row per observation and one column per variable (dimension).<br />
The function assumes the data is scaled.<br />
It then calculates the cluster centers for the data, for a varying number of clusters.<br />
For each number of clusters, the cluster centers are multiplied by the first loading of the principal components of the original data.  This yields a weighted mean of each cluster center&#8217;s dimensions that might give a decent representation of that cluster (this method has the known limitations of using the first component of a PCA for dimensionality reduction, but I won&#8217;t go into that in this post).<br />
Finally, all of the data points are ordered according to their respective cluster&#8217;s first component, and plotted against the number of clusters (thus creating the clustergram).</p>
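<p>For a single number of clusters, the steps above can be sketched roughly as follows. This is only a minimal illustration using base R and the built-in iris data; the choice of dataset, the seed and the variable names are assumptions made for the example, and it is not the full function.</p>

```r
# Minimal sketch for one value of k, on the built-in iris data (assumed example).
set.seed(1)                               # kmeans starts from random centers
Data <- scale(as.matrix(iris[, 1:4]))     # the function assumes scaled data

k  <- 3
cl <- kmeans(Data, centers = k)

# Loadings of the first principal component: one weight per column of Data.
pc1 <- princomp(Data)$loadings[, 1]

# One number per cluster: the PCA-weighted mean of that cluster's center.
weighted_centers <- as.vector(cl$centers %*% pc1)

# Every observation takes the value of its own cluster;
# this is where its line would pass at this k in the clustergram.
y_at_k <- weighted_centers[cl$cluster]
```

<p>Repeating this for every k in a range, and joining each observation&#8217;s values across the different k, gives the lines of the clustergram.</p>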
<p>My thanks go to <a href="http://had.co.nz/">Hadley Wickham</a> for offering some good tips on how to prepare the graph.</p>
<p>Here is the code (example follows)</p>

<div class="wp_codebox"><table><tr id="p39112"><td class="code" id="p391code12"><pre class="rsplus" style="font-family:monospace;">&nbsp;
&nbsp;
clustergram.<span style="">kmeans</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>Data, k, ...<span style="color: #080;">&#41;</span>
<span style="color: #080;">&#123;</span>
	<span style="color: #228B22;"># this is the type of function that the clustergram</span>
	<span style="color: #228B22;"># 	function takes for the clustering.</span>
	<span style="color: #228B22;"># 	using similar structure will allow implementation of different clustering algorithms</span>
&nbsp;
	<span style="color: #228B22;">#	It returns a list with two elements:</span>
	<span style="color: #228B22;">#	cluster = a vector of length n (the number of subjects/items)</span>
	<span style="color: #228B22;">#				indicating to which cluster each item belongs.</span>
	<span style="color: #228B22;">#	centers = a k dimensional vector.  Each element is a single number that represents that cluster</span>
	<span style="color: #228B22;">#				In our case, we are using the weighted mean of the cluster dimensions,</span>
	<span style="color: #228B22;">#				weighted by the first component (loading) of the PCA of the Data.</span>
&nbsp;
	cl <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">kmeans</span><span style="color: #080;">&#40;</span>Data, k,...<span style="color: #080;">&#41;</span>
&nbsp;
	cluster <span style="color: #080;">&lt;-</span> cl$cluster
	centers <span style="color: #080;">&lt;-</span> cl$centers <span style="color: #080;">%*%</span> <span style="color: #0000FF; font-weight: bold;">princomp</span><span style="color: #080;">&#40;</span>Data<span style="color: #080;">&#41;</span>$loadings<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>	<span style="color: #228B22;"># 1 number per center</span>
												<span style="color: #228B22;"># here we are using the weighted mean for each</span>
&nbsp;
	<span style="color: #0000FF; font-weight: bold;">return</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span>
				cluster <span style="color: #080;">=</span> cluster,
				centers <span style="color: #080;">=</span> centers
			<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>		
&nbsp;
clustergram.<span style="">plot</span>.<span style="">matlines</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>X,Y, k.<span style="">range</span>, 
											x.<span style="">range</span>, y.<span style="">range</span> , COL, 
											add.<span style="">center</span>.<span style="">points</span> , centers.<span style="">points</span><span style="color: #080;">&#41;</span>
	<span style="color: #080;">&#123;</span>
		<span style="color: #0000FF; font-weight: bold;">plot</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;white&quot;</span>, xlim <span style="color: #080;">=</span> x.<span style="">range</span>, ylim <span style="color: #080;">=</span> y.<span style="">range</span>,
			axes <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">FALSE</span>,
			xlab <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;Number of clusters (k)&quot;</span>, ylab <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;PCA weighted Mean of the clusters&quot;</span>, main <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;Clustergram of the PCA-weighted Mean of the clusters k-mean clusters vs number of clusters (k)&quot;</span><span style="color: #080;">&#41;</span>
		<span style="color: #0000FF; font-weight: bold;">axis</span><span style="color: #080;">&#40;</span>side <span style="color: #080;">=</span><span style="color: #ff0000;">1</span>, at <span style="color: #080;">=</span> k.<span style="">range</span><span style="color: #080;">&#41;</span>
		<span style="color: #0000FF; font-weight: bold;">axis</span><span style="color: #080;">&#40;</span>side <span style="color: #080;">=</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span>
		<span style="color: #0000FF; font-weight: bold;">abline</span><span style="color: #080;">&#40;</span>v <span style="color: #080;">=</span> k.<span style="">range</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;grey&quot;</span><span style="color: #080;">&#41;</span>
&nbsp;
		<span style="color: #0000FF; font-weight: bold;">matlines</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">t</span><span style="color: #080;">&#40;</span>X<span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">t</span><span style="color: #080;">&#40;</span>Y<span style="color: #080;">&#41;</span>, pch <span style="color: #080;">=</span> <span style="color: #ff0000;">19</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> COL, lty <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>, lwd <span style="color: #080;">=</span> <span style="color: #ff0000;">1.5</span><span style="color: #080;">&#41;</span>
&nbsp;
		<span style="color: #0000FF; font-weight: bold;">if</span><span style="color: #080;">&#40;</span>add.<span style="">center</span>.<span style="">points</span><span style="color: #080;">&#41;</span>
		<span style="color: #080;">&#123;</span>
			<span style="color: #0000FF; font-weight: bold;">require</span><span style="color: #080;">&#40;</span>plyr<span style="color: #080;">&#41;</span>
&nbsp;
			xx <span style="color: #080;">&lt;-</span> ldply<span style="color: #080;">&#40;</span>centers.<span style="">points</span>, <span style="color: #0000FF; font-weight: bold;">rbind</span><span style="color: #080;">&#41;</span>
			<span style="color: #0000FF; font-weight: bold;">points</span><span style="color: #080;">&#40;</span>xx$y~xx$x, pch <span style="color: #080;">=</span> <span style="color: #ff0000;">19</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;red&quot;</span>, cex <span style="color: #080;">=</span> <span style="color: #ff0000;">1.3</span><span style="color: #080;">&#41;</span>
&nbsp;
			<span style="color: #228B22;"># add points	</span>
			<span style="color: #228B22;"># temp &lt;- l_ply(centers.points, function(xx) {</span>
									<span style="color: #228B22;"># with(xx,points(y~x, pch = 19, col = &quot;red&quot;, cex = 1.3))</span>
									<span style="color: #228B22;"># points(xx$y~xx$x, pch = 19, col = &quot;red&quot;, cex = 1.3)</span>
									<span style="color: #228B22;"># return(1)</span>
									<span style="color: #228B22;"># })</span>
						<span style="color: #228B22;"># We assign the lapply to a variable (temp) only to suppress the lapply &quot;NULL&quot; output</span>
		<span style="color: #080;">&#125;</span>	
	<span style="color: #080;">&#125;</span>
&nbsp;
&nbsp;
&nbsp;
clustergram <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">10</span> , 
							clustering.<span style="">function</span> <span style="color: #080;">=</span> clustergram.<span style="">kmeans</span>,
							clustergram.<span style="">plot</span> <span style="color: #080;">=</span> clustergram.<span style="">plot</span>.<span style="">matlines</span>, 
							line.<span style="">width</span> <span style="color: #080;">=</span> .004, add.<span style="">center</span>.<span style="">points</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">TRUE</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#123;</span>
	<span style="color: #228B22;"># Data - should be a scaled matrix, where each column belongs to a different dimension of the observations</span>
	<span style="color: #228B22;"># k.range - is a vector with the numbers of clusters to plot the clustergram for</span>
	<span style="color: #228B22;"># clustering.function - the function that performs the clustering; it offers a basis to later extend the function to other algorithms </span>
	<span style="color: #228B22;">#			Although that would require more work on the code</span>
	<span style="color: #228B22;"># line.width - is the amount to lift each line in the plot so they won't superimpose each other</span>
	<span style="color: #228B22;"># add.center.points - whether to plot the points of the cluster means</span>
&nbsp;
	n <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">dim</span><span style="color: #080;">&#40;</span>Data<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>
&nbsp;
	PCA.1 <span style="color: #080;">&lt;-</span> Data <span style="color: #080;">%*%</span> <span style="color: #0000FF; font-weight: bold;">princomp</span><span style="color: #080;">&#40;</span>Data<span style="color: #080;">&#41;</span>$loadings<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>	<span style="color: #228B22;"># first principal component of our data</span>
&nbsp;
	<span style="color: #0000FF; font-weight: bold;">if</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">require</span><span style="color: #080;">&#40;</span>colorspace<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
			COL <span style="color: #080;">&lt;-</span> heat_hcl<span style="color: #080;">&#40;</span>n<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>PCA.1<span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>	<span style="color: #228B22;"># line colors</span>
		<span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #080;">&#123;</span>
			COL <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rainbow</span><span style="color: #080;">&#40;</span>n<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>PCA.1<span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>	<span style="color: #228B22;"># line colors</span>
			<span style="color: #0000FF; font-weight: bold;">warning</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'Please consider installing the package &quot;colorspace&quot; for prettier colors'</span><span style="color: #080;">&#41;</span>
		<span style="color: #080;">&#125;</span>
&nbsp;
	line.<span style="">width</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span>line.<span style="">width</span>, n<span style="color: #080;">&#41;</span>
&nbsp;
	Y <span style="color: #080;">&lt;-</span> NULL	<span style="color: #228B22;"># Y matrix</span>
	X <span style="color: #080;">&lt;-</span> NULL	<span style="color: #228B22;"># X matrix</span>
&nbsp;
	centers.<span style="">points</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
&nbsp;
	<span style="color: #0000FF; font-weight: bold;">for</span><span style="color: #080;">&#40;</span>k <span style="color: #0000FF; font-weight: bold;">in</span> k.<span style="">range</span><span style="color: #080;">&#41;</span>
	<span style="color: #080;">&#123;</span>
		k.<span style="">clusters</span> <span style="color: #080;">&lt;-</span> clustering.<span style="">function</span><span style="color: #080;">&#40;</span>Data, k<span style="color: #080;">&#41;</span>
&nbsp;
		clusters.<span style="">vec</span> <span style="color: #080;">&lt;-</span> k.<span style="">clusters</span>$cluster
			<span style="color: #228B22;"># the.centers &lt;- apply(cl$centers,1, mean)</span>
		the.<span style="">centers</span> <span style="color: #080;">&lt;-</span> k.<span style="">clusters</span>$centers 
&nbsp;
		noise <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">unlist</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">tapply</span><span style="color: #080;">&#40;</span>line.<span style="">width</span>, clusters.<span style="">vec</span>, <span style="color: #0000FF; font-weight: bold;">cumsum</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">seq_along</span><span style="color: #080;">&#40;</span>clusters.<span style="">vec</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>clusters.<span style="">vec</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>	
		<span style="color: #228B22;"># noise &lt;- noise - mean(range(noise))</span>
		y <span style="color: #080;">&lt;-</span> the.<span style="">centers</span><span style="color: #080;">&#91;</span>clusters.<span style="">vec</span><span style="color: #080;">&#93;</span> <span style="color: #080;">+</span> noise
		Y <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span>Y, y<span style="color: #080;">&#41;</span>
		x <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span>k, <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>y<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
		X <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span>X, x<span style="color: #080;">&#41;</span>
&nbsp;
		centers.<span style="">points</span><span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>k<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>y <span style="color: #080;">=</span> the.<span style="">centers</span> , x <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span>k , k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>	
	<span style="color: #228B22;">#	points(the.centers ~ rep(k , k), pch = 19, col = &quot;red&quot;, cex = 1.5)</span>
	<span style="color: #080;">&#125;</span>
&nbsp;
&nbsp;
	x.<span style="">range</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">range</span><span style="color: #080;">&#40;</span>k.<span style="">range</span><span style="color: #080;">&#41;</span>
	y.<span style="">range</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">range</span><span style="color: #080;">&#40;</span>PCA.1<span style="color: #080;">&#41;</span>
&nbsp;
	clustergram.<span style="">plot</span><span style="color: #080;">&#40;</span>X,Y, k.<span style="">range</span>, 
											x.<span style="">range</span>, y.<span style="">range</span> , COL, 
											add.<span style="">center</span>.<span style="">points</span> , centers.<span style="">points</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
<span style="color: #080;">&#125;</span></pre></td></tr></table></div>

<h3>Example on the iris dataset</h3>
<p>The <a href="http://en.wikipedia.org/wiki/Iris_flower_data_set">iris data set</a> is a favorite example of many <a href="http://www.r-bloggers.com/?s=iris">R bloggers</a> when writing about <a href="http://opendatagroup.com/2009/10/21/r-accessors-explained/">R accessors</a>, <a href="http://learnr.wordpress.com/2009/10/06/export-data-frames-to-multi-worksheet-excel-file/">data exporting</a>, <a href="http://yihui.name/en/2009/09/how-to-import-ms-excel-data-into-r/">data importing</a>, and <a href="http://weitaiyun.blogspot.com/2009/03/unison-graph-and-parallel-coordinate.html">different</a> <a href="http://weitaiyun.blogspot.com/2009/03/scatterplots.html">visualization</a> techniques.<br />
So it seemed only natural to experiment on it here.</p>

<div class="wp_codebox"><table><tr id="p39113"><td class="line_numbers"><pre>1
2
3
4
5
</pre></td><td class="code" id="p391code13"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">&#40;</span><span style="color: #CC9900; font-weight: bold;">iris</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">250</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>cex.<span style="">lab</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">1.5</span>, cex.<span style="">main</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">1.2</span><span style="color: #080;">&#41;</span>
Data <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">scale</span><span style="color: #080;">&#40;</span><span style="color: #CC9900; font-weight: bold;">iris</span><span style="color: #080;">&#91;</span>,<span style="color: #080;">-</span><span style="color: #ff0000;">5</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># notice that I am scaling the vectors</span>
clustergram<span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">8</span>, line.<span style="">width</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.004</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># notice how I am using line.width.  Play with it on your problem, according to the scale of Y.</span></pre></td></tr></table></div>

<p>Here is the output:<br />
<a href="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-1.png"><img src="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-1.png" alt="" title="clustergram 1" width="500"></a></p>
<p>Looking at the image, we can notice a few interesting things.  One of the clusters formed (the lower one) stays as is no matter how many clusters we allow (except for one observation that goes away and then comes back).<br />
We can also see that the second split is a solid one (in the sense that it splits the first cluster into two clusters which are not &#8220;close&#8221; to each other, and that about half the observations go to each of the new clusters).<br />
And then notice how moving to 5 clusters makes almost no difference.<br />
Lastly, notice how when going for 8 clusters, we are practically left with 4 clusters (remember &#8211; this is according to the mean of the cluster centers, weighted by the loadings of the first component of the PCA on the data).</p>
<p>If I were to take something from this graph, I would say I have a strong tendency to use 3-4 clusters on this data.</p>
<p>But wait, did our clustering algorithm do a stable job?<br />
Let&#8217;s try running the algorithm 6 more times (each run will have a different starting point for the clusters):</p>

<div class="wp_codebox"><table><tr id="p39114"><td class="line_numbers"><pre>1
2
3
4
5
</pre></td><td class="code" id="p391code14"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">500</span><span style="color: #080;">&#41;</span>
Data <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">scale</span><span style="color: #080;">&#40;</span><span style="color: #CC9900; font-weight: bold;">iris</span><span style="color: #080;">&#91;</span>,<span style="color: #080;">-</span><span style="color: #ff0000;">5</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># notice that I am scaling the vectors</span>
<span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>cex.<span style="">lab</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">1.2</span>, cex.<span style="">main</span> <span style="color: #080;">=</span> .7<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>mfrow <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">3</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">for</span><span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">6</span><span style="color: #080;">&#41;</span> clustergram<span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">8</span> , line.<span style="">width</span> <span style="color: #080;">=</span> .004, add.<span style="">center</span>.<span style="">points</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">TRUE</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>The result (click the image to enlarge it):<br />
<a href="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-6.png"><img src="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-6.png" alt="" title="clustergram 6" width="500"></a><br />
Repeating the analysis offers even more insights.<br />
First, it would appear that up to 3 clusters, the algorithm gives rather stable results.<br />
From 4 onwards we get various outcomes at each iteration.<br />
In some cases, we got 3 clusters when we asked for 4 or even 5 clusters.</p>
<p>Reviewing the new plots, I would prefer to go with the 3 clusters option, noting how the two &#8220;upper&#8221; clusters might have similar properties while the lower cluster is quite distinct from the other two.</p>
<p>By the way, the iris data set is composed of three types of flowers.  I imagine that kmeans did a decent job in distinguishing the three.</p>
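<p>A quick way to check this impression is to cross-tabulate a 3-cluster kmeans solution against the known species.  The following is only a minimal sketch and not part of the clustergram code: the seed and nstart = 25 are arbitrary choices, species.by.cluster is a name used only here, and the cluster numbers are arbitrary labels that need to be matched to the species by eye.</p>

<pre class="rsplus" style="font-family:monospace;">set.seed(250)	# arbitrary seed, only for reproducibility
Data &lt;- scale(iris[, -5])	# same scaling as in the examples above
fit &lt;- kmeans(Data, centers = 3, nstart = 25)	# nstart = 25 is an assumption
# rows: the true species; columns: the (arbitrarily numbered) kmeans clusters
species.by.cluster &lt;- table(Species = iris$Species, Cluster = fit$cluster)
print(species.by.cluster)</pre>

<p>A row concentrated in a single column means that species was recovered as one cluster; a row spread over several columns means the species was split between clusters.</p>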
<h3>Limitation of the method (and a possible way to overcome it?!)</h3>
<p>It is worth noting that the way the algorithm is currently built has a fundamental limitation:  the plot is good for detecting a situation where there are several clusters but each of them is clearly &#8220;bigger&#8221; than the one before it (on the first principal component of the data).</p>
<p>For example, let&#8217;s create a dataset with 3 clusters, each drawn from a normal distribution with a higher mean than the previous one:</p>

<div class="wp_codebox"><table><tr id="p39115"><td class="line_numbers"><pre>1
2
3
4
5
6
7
</pre></td><td class="code" id="p391code15"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">250</span><span style="color: #080;">&#41;</span>
Data <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rbind</span><span style="color: #080;">&#40;</span>
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">2</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">2</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">2</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
				<span style="color: #080;">&#41;</span>				
clustergram<span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">5</span> , line.<span style="">width</span> <span style="color: #080;">=</span> .004, add.<span style="">center</span>.<span style="">points</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">TRUE</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p>The resulting plot for this is the following:<br />
<a href="http://www.r-statistics.com/wp-content/uploads/2010/06/Clustergram-3-ordered-clusters.png"><img src="http://www.r-statistics.com/wp-content/uploads/2010/06/Clustergram-3-ordered-clusters.png" alt="" title="Clustergram-3-ordered-clusters" width="500" class="alignnone size-full wp-image-402" /></a><br />
The image shows a clear distinction between three ranks of clusters.  There is no doubt (for me) from looking at this image, that three clusters would be the correct number of clusters.</p>
<p>But what if the clusters were different but didn&#8217;t have an ordering to them?<br />
For example, look at the following 4-dimensional data:</p>

<div class="wp_codebox"><table><tr id="p39116"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
</pre></td><td class="code" id="p391code16"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">250</span><span style="color: #080;">&#41;</span>
Data <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rbind</span><span style="color: #080;">&#40;</span>
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
				<span style="color: #080;">&#41;</span>				
clustergram<span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">8</span> , line.<span style="">width</span> <span style="color: #080;">=</span> .004, add.<span style="">center</span>.<span style="">points</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">TRUE</span><span style="color: #080;">&#41;</span></pre></td></tr></table></div>

<p><a href="http://www.r-statistics.com/wp-content/uploads/2010/06/Clustergram-4-UNordered-clusters.png"><img src="http://www.r-statistics.com/wp-content/uploads/2010/06/Clustergram-4-UNordered-clusters.png" alt="" title="Clustergram-4-UNordered-clusters" width="500" class="alignnone size-full wp-image-403" /></a></p>
<p>In this situation, it is not clear from the location of the clusters on the Y axis that we are dealing with 4 clusters.<br />
What is interesting, though, is that as the number of clusters grows, we can notice 4 &#8220;strands&#8221; of data points moving more or less together (until we reach 4 clusters, at which point the clusters start breaking up).<br />
Another hope for handling this might be using the color of the lines in some way, but I haven&#8217;t yet figured out how.</p>
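<p>One way to see why the Y axis is less informative here is to project the simulated data on the first principal component and compare the four true groups there.  This is a sketch under assumptions: it assumes the clustergram computes PCA.1 as the data multiplied by the first princomp loading, the group labels are reconstructed from the order in which the rows were bound together, and group.means and group are names introduced only for this illustration.</p>

<pre class="rsplus" style="font-family:monospace;">set.seed(250)
# the mean vector of each simulated group, one row per group (as in the code above)
group.means &lt;- rbind(c(1, 0, 0, 0), c(0, 1, 0, 0), c(0, 1, 1, 0), c(0, 0, 0, 1))
# 100 observations per group, each column drawn with sd = 0.3 around its mean
Data &lt;- do.call(rbind, lapply(1:4, function(g)
	sapply(group.means[g, ], function(m) rnorm(100, m, sd = 0.3))))
group &lt;- rep(1:4, each = 100)	# the true group of each row
# projection of every observation on the first principal component (assumed Y axis)
PCA.1 &lt;- as.vector(Data %*% princomp(Data)$loadings[, 1])
tapply(PCA.1, group, mean)	# where each true group sits on the Y axis
tapply(PCA.1, group, range)	# and how much the groups overlap there</pre>

<p>Groups whose ranges overlap on this projection cannot be told apart by their height in the plot, which is why the strands carry more information than the cluster centers in this example.</p>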
<h3>Clustergram with ggplot2</h3>
<p><a href="http://had.co.nz/">Hadley Wickham</a> has kindly played with recreating the clustergram using the ggplot2 engine.  You can see the result here:<br />
<a href="http://gist.github.com/439761">http://gist.github.com/439761</a><br />
And this is what he wrote about it in the comments:</p>
<blockquote><p>I’ve broken it down into three components:<br />
* run the clustering algorithm and get predictions (many_kmeans and all_hclust)<br />
* produce the data for the clustergram (clustergram)<br />
* plot it (plot.clustergram)<br />
I don’t think I have the logic behind the y-position adjustment quite right though.</p></blockquote>
<p>Here is an example of how it looks:<br />
<a href="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-ggplot2-1.png"><img src="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-ggplot2-1.png" alt="" title="clustergram-ggplot2-1" width="500" class="alignnone size-full wp-image-407" /></a></p>
<h3>Conclusions (some rules of thumb and questions for the future)</h3>
<p>At first glance, it would appear that the clustergram can be of use.  I can imagine using this graph to quickly run various clustering algorithms and then compare them to each other and review their stability (in the way I just demonstrated in the example above).</p>
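<p>As an illustration of plugging in another algorithm, here is a hedged sketch of a clustering function based on hierarchical clustering.  It relies only on what the clustergram loop uses: a function called as clustering.function(Data, k) that returns a list with the elements cluster and centers.  The name clustergram.hclust is hypothetical, and passing it to clustergram through a clustering.function argument is an assumption about the function&#8217;s signature.</p>

<pre class="rsplus" style="font-family:monospace;"># Hypothetical helper (not part of the original code): hierarchical clustering
# wrapped so that it returns the same structure the clustergram loop expects.
clustergram.hclust &lt;- function(Data, k, ...)
{
	Data &lt;- as.matrix(Data)
	tree &lt;- hclust(dist(Data), ...)	# e.g. method = "average" can be passed through ...
	cluster &lt;- cutree(tree, k = k)	# cluster membership, one integer (1..k) per row
	# mean of every variable within every cluster (a k by p matrix)
	cluster.means &lt;- apply(Data, 2, function(column) tapply(column, cluster, mean))
	# one number per cluster: its mean weighted by the first PCA loading
	centers &lt;- cluster.means %*% princomp(Data)$loadings[, 1]
	list(cluster = cluster, centers = centers)
}

# quick check of the returned structure on the scaled iris data
str(clustergram.hclust(scale(iris[, -5]), k = 3))

# assumed usage, if clustergram exposes a clustering.function argument:
# clustergram(Data, k.range = 2:8, line.width = 0.004, clustering.function = clustergram.hclust)</pre>

<p>Since hclust is deterministic, repeated runs of such a plot would look identical, so the stability check by re-running applies only to algorithms with random starting points.</p>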
<p>The three rules of thumb I have noticed so far are:</p>
<ol>
<li>Look at the location of the cluster points on the Y axis. See when they remain stable, when they start flying around, and what happens to them with a higher number of clusters (do they regroup?).</li>
<li>Observe the strands of the data points.  Even if the cluster centers are not ordered, the lines for each item might (this needs more research and thinking) tend to move together &#8211; hinting at the real number of clusters.</li>
<li>Run the plot multiple times to observe the stability of the cluster formation (and location).</li>
</ol>
<p>Yet there is more work to be done and questions to seek answers to:</p>
<ul>
<li>The code needs to be extended to offer methods for various clustering algorithms.
</li>
<li>How can the colors of the lines be used better?
</li>
<li>How can this be done using other graphical engines (ggplot2/lattice?) &#8211; (<strong>Update</strong>: look at Hadley&#8217;s reply in the comments)
</li>
<li>What to do in case the first principal component doesn&#8217;t capture enough of the data? (Maybe plot this graph for all the relevant components.  But then &#8211; how do you draw conclusions from it?)
</li>
<li>What other uses/conclusions can be made based on this graph?
</li>
</ul>
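<p>For the question about the first principal component, a simple starting point is to look at how much of the total variance it captures.  Here is a minimal sketch on the scaled iris data; variance.share is a name used only here, and what counts as &#8220;enough&#8221; is left open.</p>

<pre class="rsplus" style="font-family:monospace;">Data &lt;- scale(iris[, -5])
pca &lt;- princomp(Data)
# share of the total variance captured by each principal component
variance.share &lt;- pca$sdev^2 / sum(pca$sdev^2)
round(variance.share, 2)	# the first value belongs to the component used for the Y axis
round(cumsum(variance.share), 2)	# cumulative share, to see how many components are relevant</pre>

<p>If the first share is small, the heights in the clustergram reflect only a small part of the differences between the clusters.</p>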
<p>I am looking forward to reading your input/ideas in the comments (or in reply posts).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>The new GUI for ggplot2 (using Deducer) &#8211; the designer wants your opinion</title>
		<link>http://www.r-statistics.com/2010/05/the-new-gui-for-ggplot2-using-deducer-the-designer-wants-your-opinion/</link>
		<comments>http://www.r-statistics.com/2010/05/the-new-gui-for-ggplot2-using-deducer-the-designer-wants-your-opinion/#comments</comments>
		<pubDate>Sat, 01 May 2010 14:29:22 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[deducer]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[interfaces]]></category>
		<category><![CDATA[R GUI]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=331</guid>
		<description><![CDATA[After discovering that R is expected (this summer) to have a GUI for ggplot2 (through Deducer), I later found Ian&#8217;s GSoC proposal for this GUI.  Since the system is in its early stages of development, Ian has invited people to give comments, input and critique on his plans for the project. For your convenience (and [...]]]></description>
			<content:encoded><![CDATA[<p>After <a href="http://www.r-statistics.com/2010/04/r-and-the-google-summer-of-code-2010-accepted-students-and-projects/">discovering that R is expected (this summer) to have a GUI for ggplot2</a> (through <a href="http://cran.r-project.org/web/packages/Deducer/index.html">Deducer</a>), I later found <a href="http://neolab.stat.ucla.edu/cranstats/gsoc.pdf">Ian&#8217;s GSoC proposal</a> for this GUI.  Since the system is in its early stages of development, Ian has invited people to give comments, input and critique on his plans for the project.</p>
<p>For your convenience (and with Ian&#8217;s permission), I am reposting his proposal here.  You are welcome to send him feedback by e-mailing him (at: ifellows@gmail.com), or by leaving a comment here (and I will direct him to your comment).</p>
<p><span id="more-331"></span></p>
<p class="gde-text"><a href="http://neolab.stat.ucla.edu/cranstats/gsoc.pdf" target="_blank" class="gde-link">Download (PDF, 2.9MB)</a></p>
<iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fneolab.stat.ucla.edu%2Fcranstats%2Fgsoc.pdf&hl=en_US&embedded=true" class="gde-frame" style="width:500px; height:700px; border: none;" scrolling="no"></iframe>


]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/05/the-new-gui-for-ggplot2-using-deducer-the-designer-wants-your-opinion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R is going to have a GUI for ggplot2! (by the end of this year&#8217;s google-summer-of-code)</title>
		<link>http://www.r-statistics.com/2010/04/r-and-the-google-summer-of-code-2010-accepted-students-and-projects/</link>
		<comments>http://www.r-statistics.com/2010/04/r-and-the-google-summer-of-code-2010-accepted-students-and-projects/#comments</comments>
		<pubDate>Mon, 26 Apr 2010 20:46:40 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[deducer]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[google summer of code]]></category>
		<category><![CDATA[gsoc]]></category>
		<category><![CDATA[GUI]]></category>
		<category><![CDATA[news]]></category>
		<category><![CDATA[R GUI]]></category>
		<category><![CDATA[R news]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=320</guid>
		<description><![CDATA[I was delighted to see the following e-mail post from Dirk Eddelbuettel regarding the google-summer-of-code R google group: * * * Earlier today Google finalised student / mentor pairings and allocations for the Google Summer of Code 2010 (GSoC 2010). The R Project is happy to announce that the following students have been accepted: Colin [...]]]></description>
			<content:encoded><![CDATA[<p>I was delighted to see the following<del datetime="2010-04-27T05:29:29+00:00"> e-mail </del><a href="http://dirk.eddelbuettel.com/blog/2010/04/26/#gsoc2010_r_students">post from Dirk Eddelbuettel</a> regarding the google-summer-of-code R google group:<br />
*  *  *</p>
<p>Earlier today Google finalised student / mentor pairings and allocations for<br />
the Google Summer of Code 2010 (GSoC 2010).  The R Project is happy to<br />
announce that the following students have been accepted:</p>
<p>  Colin Rundel, &#8220;rgeos &#8211; an R wrapper for GEOS&#8221;, mentored by Roger Bivand of<br />
     the Norges Handelshoyskole, Norway</p>
<p>  Ian Fellows, &#8220;A GUI for Graphics using ggplot2 and Deducer&#8221;, mentored by<br />
     Hadley Wickham of Rice University, USA</p>
<p>  Chidambaram Annamalai, &#8220;rdx &#8211; Automatic Differentiation in R&#8221;, mentored by<br />
     John Nash of University of Ottawa, Canada</p>
<p>  Yasuhisa Yoshida, &#8220;NoSQL interface for R&#8221;, mentored by Dirk Eddelbuettel,<br />
     Chicago, USA</p>
<p>  Felix Schoenbrodt, &#8220;Social Relations Analyses in R&#8221;, mentored by Stefan<br />
     Schmukle, Universitaet Muenster, Germany</p>
<p>  Details about all proposals are on the R Wiki page for the GSoC 2010 at<br />
  <a href="http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010">http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010</a></p>
<p>The R Project is honoured to have received its highest number of student<br />
allocations yet, and looks forward to an exciting Summer of Code.  Please<br />
join me in welcoming our new students.</p>
<p>At this time, I would also like to thank all the other students who have<br />
applied for working with R in this Summer of Code. With a limited number of<br />
available slots, not all proposals can be accepted &#8212; but I hope that those<br />
not lucky enough to have been granted a slot will continue to work with R and<br />
towards making contributions within the R world.</p>
<p>I would also like to express my thanks to all other mentors who provided for<br />
a record number of proposals.  Without mentors and their project ideas we<br />
would not have a Summer of Code &#8212; so hopefully we will see you again next<br />
year.</p>
<p>  Regards,</p>
<p>  Dirk (acting as R/GSoC 2010 admin)</p>
<p>*  *  *</p>
<p>From all the projects, the one I am most excited about is:<br />
Ian Fellows, &#8220;A GUI for Graphics using ggplot2 and Deducer&#8221;, mentored by Hadley Wickham of Rice University, USA</p>
<p><a href="http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.DeducerManual">Deducer</a> (text from the website) attempts to be a free easy to use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an excel-like spreadsheet in which to view and edit data frames. The goal of the project is two-fold.</p>
<ul>
<li>Provide an intuitive interface so that non-technical users can learn and perform analyses without programming getting in their way.</li>
<li>Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming.
</li>
</ul>
<p>Deducer is designed to be used with the Java based R console JGR, though it supports a number of other R environments (e.g. Windows RGUI and RTerm).</p>
<p>This combination (of Deducer and ggplot2) might finally provide the bridge to the layman statistician, the lack of which some people <a href="http://www.thejuliagroup.com/blog/?p=433">recently described</a> as one of R&#8217;s weak spots (while <a href="http://www.r-statistics.com/2010/04/an-article-attacking-r-gets-responses-from-the-r-blogosphere-some-reflections/">other bloggers wrote back</a> that this is o.k., no one refuted the claim that R doesn&#8217;t compete with the point-and-click of software like SPSS or JMP).<br />
I came across Ian in the discussion forums, where he provided very kind help with his package &#8220;deducer&#8221;.  Coupled with having Hadley as his mentor, I am very optimistic about the prospects of seeing this project reach very high standards.<br />
A very exciting development indeed!</p>
<p><strong>Update</strong>: Ian&#8217;s proposal is available to view <a href="http://neolab.stat.ucla.edu/cranstats/gsoc.pdf">here</a>.</p>
<p>p.s: for some intuition about what a GUI for ggplot2 can look like, have a look at <a href="http://www.r-statistics.com/2010/04/jeroen-oomss-ggplot2-web-interface-a-new-version-released-v0-2/">this video of Jeroen Ooms’s ggplot2 web interface</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/04/r-and-the-google-summer-of-code-2010-accepted-students-and-projects/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Jeroen Ooms&#8217;s ggplot2 web interface &#8211; a new version released (V0.2)</title>
		<link>http://www.r-statistics.com/2010/04/jeroen-oomss-ggplot2-web-interface-a-new-version-released-v0-2/</link>
		<comments>http://www.r-statistics.com/2010/04/jeroen-oomss-ggplot2-web-interface-a-new-version-released-v0-2/#comments</comments>
		<pubDate>Mon, 12 Apr 2010 20:34:04 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[R and the web]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[interfaces]]></category>
		<category><![CDATA[jeroen ooms]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[WebSites]]></category>
		<category><![CDATA[youtube]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=266</guid>
		<description><![CDATA[Good news. Jeroen Ooms released a new version of his (amazing) online ggplot2 web interface: yeroon.net/ggplot2 is a web interface for Hadley Wickham&#8217;s R package ggplot2. It is used as a tool for rapid prototyping, exploratory graphical analysis and education of statistics and R. The interface is written completely in javascript, therefore there is no [...]]]></description>
			<content:encoded><![CDATA[<p>Good news.</p>
<p><a href="http://www.stat.ucla.edu/~jeroen/">Jeroen Ooms</a> released a new version of his <a href="http://www.stat.ucla.edu/~jeroen/ggplot2/">(amazing) online ggplot2 web interface</a>:</p>
<blockquote><p><a href="http://www.yeroon.net/ggplot2/">yeroon.net/ggplot2</a> is a web interface for Hadley Wickham&#8217;s R package ggplot2. It is used as a tool for rapid prototyping, exploratory graphical analysis and education of statistics and R. The interface is written completely in javascript, therefore there is no need to install anything on the client side: a standard browser will do.</p></blockquote>
<p>The new version has a lot of cool new features, like advanced data import, integration with Google docs, converting variables from numeric to factor to dates and vice versa, and a lot of new geoms, some of which you can watch in his new video demo of the application:<br />
<object width="640" height="385"><param name="movie" value="http://www.youtube.com/v/pCzQP7kVEOc&#038;hl=en_US&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/pCzQP7kVEOc&#038;hl=en_US&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object></p>
<p>The application is on:<br />
<a href="http://www.yeroon.net/ggplot2/">http://www.yeroon.net/ggplot2/</a></p>
<p>p.s: other posts about this (including videos explaining how some of this was done) can be viewed on the category page: <a href="http://www.r-statistics.com/category/r-and-the-web/">R and the web</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/04/jeroen-oomss-ggplot2-web-interface-a-new-version-released-v0-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Web Development with R &#8211; an HD video tutorial of Jeroen Ooms&#8217;s talk</title>
		<link>http://www.r-statistics.com/2010/02/web-development-with-r-an-hd-video-tutorial-of-jeroen-ooms-talk/</link>
		<comments>http://www.r-statistics.com/2010/02/web-development-with-r-an-hd-video-tutorial-of-jeroen-ooms-talk/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 07:35:20 +0000</pubDate>
		<dc:creator>Tal Galili</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[R and the web]]></category>
		<category><![CDATA[visualization]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[jeroen ooms]]></category>
		<category><![CDATA[lecture]]></category>
		<category><![CDATA[tutorial]]></category>
		<category><![CDATA[video]]></category>
		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://www.r-statistics.com/?p=73</guid>
		<description><![CDATA[Here is an HD version of a video tutorial on web development with R, a lecture that was given by Jeroen Ooms (the guy who made A web application for R’s ggplot2). This talk was given at the Bay Area UseR Group meeting on R-Powered Web Apps. You can also view the slides for his talk and [...]]]></description>
			<content:encoded><![CDATA[<p>Here is an HD version of <strong>a video tutorial on web development with R</strong>, a lecture that was given by <a href="http://www.stat.ucla.edu/~jeroen/">Jeroen Ooms</a> (the guy who made <a title="A web application for R’s ggplot2" href="http://www.r-statistics.com/2009/12/a-web-application-of-rs-ggplot2/">A web application for R’s ggplot2</a>). This talk was given at <a href="http://blog.revolution-computing.com/2010/01/quick-thoughts-on-rpowered-web-apps.html">the Bay Area UseR Group meeting</a> on R-Powered Web Apps.</p>
<p><object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/7x0UdUghANI&#038;hl=en_US&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/7x0UdUghANI&#038;hl=en_US&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object></p>
<p>You can also view the <a href="http://www.stat.ucla.edu/~jeroen/files/barug2010.pdf">slides</a> for his talk and view (great) examples for: <a href="http://www.stat.ucla.edu/~jeroen/stockplot.html">stockplot</a>, <a href="http://www.stat.ucla.edu/~jeroen/lme4.html">lme4</a>, and <a href="http://www.stat.ucla.edu/~jeroen/ggplot2.html">ggplot2</a>.</p>
<p>Thanks again to Jeroen for sharing his knowledge and experience!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.r-statistics.com/2010/02/web-development-with-r-an-hd-video-tutorial-of-jeroen-ooms-talk/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

