<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>R-statistics blog &#187; visualization</title> <atom:link href="http://www.r-statistics.com/category/visualization/feed/" rel="self" type="application/rss+xml" /><link>http://www.r-statistics.com</link> <description>Writing about statistics with R, and open source stuff (software, data, community)</description> <lastBuildDate>Tue, 07 Sep 2010 18:37:16 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.0.1</generator> <item><title>Rose plot using Deducers ggplot2 plot builder</title><link>http://www.r-statistics.com/2010/08/rose-plot-using-deducers-ggplot2-plot-builder/</link> <comments>http://www.r-statistics.com/2010/08/rose-plot-using-deducers-ggplot2-plot-builder/#comments</comments> <pubDate>Mon, 16 Aug 2010 22:35:52 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[R]]></category> <category><![CDATA[visualization]]></category> <category><![CDATA[deducer]]></category> <category><![CDATA[ggplot2]]></category> <category><![CDATA[GUI]]></category> <category><![CDATA[Hadley Wickham]]></category> <category><![CDATA[Ian fellows]]></category> <category><![CDATA[interfaces]]></category> <category><![CDATA[plot builder]]></category> <category><![CDATA[R GUI]]></category> <category><![CDATA[SPSS]]></category> <category><![CDATA[tutorial]]></category> <category><![CDATA[tutorials]]></category> <category><![CDATA[videos]]></category> <category><![CDATA[youtube]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=517</guid> <description><![CDATA[The (excellent!) LearnR blog had a post today about making a rose plot in ggplot2. Following today&#8217;s announcement, by Ian Fellows, regarding the release of the new version of Deducer (0.4) offering a strong support for ggplot2 using a GUI plot builder, Ian also sent an e-mail where he shows how to create a rose plot using the new ggplot2 GUI included in the latest version of Deducer. After the template is made, the plot can be generated with 4 [...]]]></description> <content:encoded><![CDATA[<p>The (excellent!) <a
href="http://learnr.wordpress.com/2010/08/16/consultants-chart-in-ggplot2/">LearnR blog had a post today</a> about making a rose plot in<br
/> <a
href="http://had.co.nz/ggplot2/">ggplot2</a>.</p><p>Following today&#8217;s announcement, by <a
href="http://www.deducer.org/pmwiki/index.php/">Ian Fellows</a>, regarding <a
href="http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/">the release of the new version of Deducer (0.4)</a>, which offers strong support for ggplot2 through a GUI plot builder, Ian also sent an e-mail showing how to create a rose plot using the new ggplot2 GUI included in the latest version of Deducer.  After the template is made, the plot can be generated with 4 clicks of the mouse.</p><p>Here is a video tutorial (published by Ian) showing how this can be used:</p><p><object
width="500" height="400"><param
name="movie" value="http://www.youtube.com/v/CHYATHLM5sY?fs=1"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/CHYATHLM5sY?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p><p>The generated template file is available at:<br
/> <a
href="http://neolab.stat.ucla.edu/cranstats/rose.ggtmpl">http://neolab.stat.ucla.edu/cranstats/rose.ggtmpl</a></p><p>I am excited about the work Ian is doing, and hope to see more people publish use cases with Deducer.</p> ]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/08/rose-plot-using-deducers-ggplot2-plot-builder/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>ggplot2 plot builder is now on CRAN! (through Deducer 0.4 GUI for R)</title><link>http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/</link> <comments>http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/#comments</comments> <pubDate>Mon, 16 Aug 2010 18:53:03 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[R]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[visualization]]></category> <category><![CDATA[deducer]]></category> <category><![CDATA[ggplot2]]></category> <category><![CDATA[google summer of code]]></category> <category><![CDATA[GUI]]></category> <category><![CDATA[Hadley Wickham]]></category> <category><![CDATA[Ian fellows]]></category> <category><![CDATA[interfaces]]></category> <category><![CDATA[plot builder]]></category> <category><![CDATA[R GUI]]></category> <category><![CDATA[SPSS]]></category> <category><![CDATA[tutorial]]></category> <category><![CDATA[tutorials]]></category> <category><![CDATA[videos]]></category> <category><![CDATA[youtube]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=507</guid> <description><![CDATA[Ian Fellows, a hard-working contributor to the R community (and a cool guy), has announced today the release of Deducer (0.4) to CRAN (scheduled to update in the next day or so). This major update also includes the release of a new plug-in package (DeducerExtras), containing additional dialogs and functionality. Following is the e-mail he sent out with all the details and demo videos. Deducer Deducer is designed to be a free easy to use alternative to proprietary data [...]]]></description> <content:encoded><![CDATA[<p>Ian Fellows, a hard-working contributor to the R community (and a cool guy), has announced today the release of <a
href="http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual">Deducer </a>(0.4) to <a
href="http://cran.r-project.org/web/packages/Deducer/index.html">CRAN</a> (scheduled to update in the next day or so).<br
/> This major update also includes the release of a new plug-in package (DeducerExtras), containing additional dialogs and functionality.</p><p>Following is the e-mail he sent out with all the details and demo videos.</p><p><span
id="more-507"></span></p><h3>Deducer</h3><p>Deducer is designed to be a free, easy-to-use alternative to proprietary data analysis software such as SPSS, JMP, and Minitab. It has a menu system to do common data manipulation and analysis tasks, and an Excel-like spreadsheet in which to view and edit data frames. The goal of the project is twofold.</p><p>Provide an intuitive interface so that non-technical users can learn and perform analyses without programming getting in their way.<br
/> Increase the efficiency of expert R users when performing common tasks by replacing hundreds of keystrokes with a few mouse clicks. Also, as much as possible the GUI should not get in their way if they just want to do some programming.<br
/> Deducer is designed to be used with the Java based R console JGR, though it supports a number of other R environments (e.g. Windows RGUI and RTerm).</p><p>For those not familiar with Deducer, an online manual is available at: <a
href="http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual">http://www.deducer.org/pmwiki/pmwiki.php?n=Main.DeducerManual</a></p><p>An introductory tour of Deducer (4.5 min):</p><p><object
width="500" height="400"><param
name="movie" value="http://www.youtube.com/v/iZ857h2j6wA?fs=1"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/iZ857h2j6wA?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p><p>There is also an &#8220;expert users introduction&#8221; (8 min):</p><p><object
width="500" height="400"><param
name="movie" value="http://www.youtube.com/v/AjLToyuluSM?fs=1"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/AjLToyuluSM?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p><h3>ggplot2 Plot Builder</h3><p>The major change to Deducer is the inclusion of a new plotting GUI built on the ggplot2 package. This Google Summer of Code project provides an easy to use system to make anything from simple histograms, to custom publication ready graphics. Feel free to check out the video introduction:</p><p>Part 1 (6 min):</p><p><object
width="500" height="400"><param
name="movie" value="http://www.youtube.com/v/-Rym6Ucraes?fs=1"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/-Rym6Ucraes?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p><p>Part 2 (6 min):</p><p><object
width="500" height="400"><param
name="movie" value="http://www.youtube.com/v/k6elEgB3OCE?fs=1"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/k6elEgB3OCE?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p><p>Additional videos:<br
/> Templates (5 min):</p><p><object
width="500" height="400"><param
name="movie" value="http://www.youtube.com/v/ktdifzqbLW8?fs=1"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/ktdifzqbLW8?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p><p>Extending the Builder (4 min):</p><p><object
width="500" height="400"><param
name="movie" value="http://www.youtube.com/v/RsxOo0jx0II?fs=1"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/RsxOo0jx0II?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p><h3>Deducer Extras</h3><p>The DeducerExtras package is an add-on package containing a variety of additional analysis dialogs. These include:</p><ul><li>Distribution quantiles</li><li>Single/multiple sample proportion tests</li><li>Paired t-test and Wilcoxon signed-rank test</li><li>Levene&#8217;s test and Bartlett&#8217;s test</li><li>K-means clustering</li><li>Hierarchical clustering</li><li>Factor analysis</li><li>Multi-dimensional scaling</li></ul><p>Introduction to Deducer Extras (~2 min):</p><p><object
width="500" height="400"><param
name="movie" value="http://www.youtube.com/v/UCrhxB8tSJY?fs=1"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/UCrhxB8tSJY?fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p><h3>Final thanks</h3><p>I would like to take this opportunity to thank the R community for choosing this project for a Google Summer of Code grant, and for the support and encouragement. In particular I would like to thank Hadley Wickham for mentoring the Plot Builder GUI, and Dirk Eddelbuettel for his organization of students and mentors.</p> ]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/08/ggplot2-plot-builder-is-now-available-on-cran-through-deducer-0-4-gui-for-r/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Visualization of regression coefficients (in R)</title><link>http://www.r-statistics.com/2010/07/visualization-of-regression-coefficients-in-r/</link> <comments>http://www.r-statistics.com/2010/07/visualization-of-regression-coefficients-in-r/#comments</comments> <pubDate>Fri, 02 Jul 2010 19:46:56 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[R]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[visualization]]></category> <category><![CDATA[coefficients]]></category> <category><![CDATA[Coefficients Visualization]]></category> <category><![CDATA[graph]]></category> <category><![CDATA[plot]]></category> <category><![CDATA[regression]]></category> <category><![CDATA[regression plot]]></category> <category><![CDATA[regression Visualization]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=435</guid> <description><![CDATA[Update (07.07.10): The function in this post has a more mature version in the &#8220;arm&#8221; package. See at the end of this post for more details. * * * * Imagine you want to give a presentation or report of your latest findings running some sort of regression analysis. How would you do it? This was exactly the question Wincent Rong-gui HUANG has recently asked on the R mailing list. One person, Bernd Weiss, responded by linking to the chapter [...]]]></description> <content:encoded><![CDATA[<p><strong>Update (07.07.10)</strong>: The function in this post has a more mature version in the &#8220;arm&#8221; package.  See at the end of this post for more details.<br
/> * * * *</p><p>Imagine you want to give a presentation or report of your latest findings running some sort of regression analysis.  How would you do it?</p><p>This was exactly the question Wincent Rong-gui HUANG has recently asked <a
href="http://r.789695.n4.nabble.com/Visualization-of-coefficients-tt2276010.html#none">on the R mailing list</a>.</p><p>One person, Bernd Weiss, responded by linking to the chapter &#8220;<a
href="http://tables2graphs.com/doku.php?id=04_regression_coefficients">Plotting Regression Coefficients</a>&#8221; in an interesting online book (which I had never heard of before) called &#8220;<a
href="http://tables2graphs.com/doku.php">Using Graphs Instead of Tables</a>&#8221; (I should add this link to the <a
href="http://www.r-statistics.com/2009/10/free-statistics-e-books-for-download/">free statistics e-books list</a>&#8230;)</p><p>Later in the conversation, <a
href="http://statmath.wu.ac.at/~zeileis/">Achim Zeileis</a> surprised us (well, me) by saying the following:</p><blockquote><p>I&#8217;ve thought about adding a plot() method for the coeftest() function in the <a
href="http://cran.r-project.org/web/packages/lmtest/index.html">&#8220;lmtest&#8221; package</a>. Essentially, it relies on a coef() and a vcov() method being available &#8211; <strong>and that a central limit theorem holds</strong>. For releasing it as a general function in the package the code is still too raw, but maybe it&#8217;s useful for someone on the list. Hence,<strong> I&#8217;ve included it below</strong>.</p></blockquote><p> (I allowed myself to add some <strong>bolds</strong> in the text)</p><p>So for the convenience of all of us, I uploaded Achim&#8217;s code in a file for easy access.  Here is an example of how to use it:</p><div
class="wp_syntax"><div
class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">source</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://www.r-statistics.com/wp-content/uploads/2010/07/coefplot.r.txt&quot;</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Mroz&quot;</span>, package <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;car&quot;</span><span style="color: #080;">&#41;</span>
fm <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">glm</span><span style="color: #080;">&#40;</span>lfp ~ ., <span style="color: #0000FF; font-weight: bold;">data</span> <span style="color: #080;">=</span> Mroz, <span style="color: #0000FF; font-weight: bold;">family</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">binomial</span><span style="color: #080;">&#41;</span>
coefplot<span style="color: #080;">&#40;</span>fm, parm <span style="color: #080;">=</span> <span style="color: #080;">-</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span></pre></div></div><p>Here is the resulting graph:<br
/> <a
href="http://www.r-statistics.com/wp-content/uploads/2010/07/regression-coefficient-plot.png"><img
src="http://www.r-statistics.com/wp-content/uploads/2010/07/regression-coefficient-plot.png" alt="" title="regression coefficient plot" width="550" class="alignright size-full wp-image-437" /></a></p><p>I hope Achim will get around to improving the function so that he finds it worthy of joining his <a
href="http://cran.r-project.org/web/packages/lmtest/index.html">&#8220;lmtest&#8221; package</a>.  I am glad he shared his code for the rest of us to have something to work with in the meantime <img
src='http://www.r-statistics.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p><p>* * *</p><p><strong>Update (07.07.10)</strong>:<br
/> Thanks to a comment by David Atkins, I found out there is a more mature version of this function (called <strong>coefplot</strong>) inside the {arm} package.  This version offers many features, one of which is the ability to easily stack several confidence intervals one on top of the other.</p><p>It works for bayesglm, glm, lm, and polr objects, and a default method is available that takes pre-computed coefficients and associated standard errors from any suitable model.</p><p><strong>Example:</strong><br
/> (Notice that the Poisson model in comparison with the binomial models does not make much sense, but is enough to illustrate the use of the function)</p><div
class="wp_syntax"><div
class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;arm&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Mroz&quot;</span>, package <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;car&quot;</span><span style="color: #080;">&#41;</span>
M1<span style="color: #080;">&lt;-</span>      <span style="color: #0000FF; font-weight: bold;">glm</span><span style="color: #080;">&#40;</span>lfp ~ ., <span style="color: #0000FF; font-weight: bold;">data</span> <span style="color: #080;">=</span> Mroz, <span style="color: #0000FF; font-weight: bold;">family</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">binomial</span><span style="color: #080;">&#41;</span>
M2<span style="color: #080;">&lt;-</span> bayesglm<span style="color: #080;">&#40;</span>lfp ~ ., <span style="color: #0000FF; font-weight: bold;">data</span> <span style="color: #080;">=</span> Mroz, <span style="color: #0000FF; font-weight: bold;">family</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">binomial</span><span style="color: #080;">&#41;</span>
M3<span style="color: #080;">&lt;-</span>      <span style="color: #0000FF; font-weight: bold;">glm</span><span style="color: #080;">&#40;</span>lfp ~ ., <span style="color: #0000FF; font-weight: bold;">data</span> <span style="color: #080;">=</span> Mroz, <span style="color: #0000FF; font-weight: bold;">family</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">binomial</span><span style="color: #080;">&#40;</span>probit<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
coefplot<span style="color: #080;">&#40;</span>M2, xlim<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #080;">-</span><span style="color: #ff0000;">2</span>, <span style="color: #ff0000;">6</span><span style="color: #080;">&#41;</span>,            intercept<span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
coefplot<span style="color: #080;">&#40;</span>M1, add<span style="color: #080;">=</span>TRUE, col.<span style="">pts</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;red&quot;</span>,  intercept<span style="color: #080;">=</span>TRUE<span style="color: #080;">&#41;</span>
coefplot<span style="color: #080;">&#40;</span>M3, add<span style="color: #080;">=</span>TRUE, col.<span style="">pts</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;blue&quot;</span>, intercept<span style="color: #080;">=</span>TRUE, <span style="color: #0000FF; font-weight: bold;">offset</span><span style="color: #080;">=</span><span style="color: #ff0000;">0.2</span><span style="color: #080;">&#41;</span></pre></div></div><p>(hat tip goes to Allan Engelhardt for help improving the code, and for Achim Zeileis in extending and improving the narration for the example)</p><p><strong>Resulting plot </strong></p><p><a
href="http://www.r-statistics.com/wp-content/uploads/2010/07/coeff-visualization-3.png"><img
src="http://www.r-statistics.com/wp-content/uploads/2010/07/coeff-visualization-3.png" alt="" title="coeff visualization 3" width="550" class="alignright size-full wp-image-471" /></a></p><p>* * *<br
/> Lastly, another method worth mentioning is the nomogram, implemented in Frank Harrell&#8217;s <a
href="http://biostat.mc.vanderbilt.edu/wiki/Main/Rrms">rms package</a>.</p> ]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/07/visualization-of-regression-coefficients-in-r/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Clustergram: visualization and diagnostics for cluster analysis (R code)</title><link>http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/</link> <comments>http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/#comments</comments> <pubDate>Tue, 15 Jun 2010 08:22:34 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[R]]></category> <category><![CDATA[visualization]]></category> <category><![CDATA[base graphics]]></category> <category><![CDATA[cluster analysis]]></category> <category><![CDATA[clustergram]]></category> <category><![CDATA[clustering]]></category> <category><![CDATA[Dendrogram]]></category> <category><![CDATA[diagnose]]></category> <category><![CDATA[diagnosing]]></category> <category><![CDATA[diagnostics]]></category> <category><![CDATA[functions]]></category> <category><![CDATA[ggplot]]></category> <category><![CDATA[ggplot2]]></category> <category><![CDATA[hierarchical clustering]]></category> <category><![CDATA[iris]]></category> <category><![CDATA[iris data set]]></category> <category><![CDATA[large data]]></category> <category><![CDATA[matlines]]></category> <category><![CDATA[non-hierarchical]]></category> <category><![CDATA[parallel coordinates]]></category> <category><![CDATA[R code]]></category> <category><![CDATA[R functions]]></category> <category><![CDATA[tree]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=391</guid> <description><![CDATA[About Clustergrams In 2002, Matthias Schonlau published in &#8220;The Stata Journal&#8221; an article named &#8220;The Clustergram: A graph for visualizing hierarchical and . As explained in the abstract: In hierarchical cluster analysis dendrogram graphs are used to visualize how clusters are formed. I propose an alternative graph named “clustergram” to examine how cluster members are assigned to clusters as the number of clusters increases. This graph is useful in exploratory analysis for non-hierarchical clustering algorithms like k-means and for hierarchical [...]]]></description> <content:encoded><![CDATA[<h3>About Clustergrams</h3><p>In 2002, <a
href="http://www.schonlau.net/clustergram.html">Matthias Schonlau </a>published in &#8220;The Stata Journal&#8221; an article named &#8220;<a
href="https://docs.google.com/viewer?url=http://www.schonlau.net/publication/02stata_clustergram.pdf">The Clustergram: A graph for visualizing hierarchical and nonhierarchical cluster analyses</a>&#8221;.  As explained in the abstract:</p><blockquote><p>In hierarchical cluster analysis dendrogram graphs are used to visualize how clusters are formed. I propose an alternative graph named “clustergram” to examine how cluster members are assigned to clusters as the number of clusters increases.<br
/> This graph is useful in exploratory analysis for non-hierarchical clustering algorithms like k-means and for hierarchical cluster algorithms when the number of observations is large enough to make dendrograms impractical.</p></blockquote><p>A <a
href="https://docs.google.com/viewer?url=http://www.schonlau.net/publication/04compstat_clustergram.pdf">similar article</a> was later written and was (maybe) published in &#8220;computational statistics&#8221;.</p><p>Both articles give some nice background on known methods like k-means and methods for hierarchical clustering, and then go on to present examples of using these methods (with the clustergram) to analyse some datasets.</p><p>Personally, I understand the clustergram to be a type of parallel coordinates plot where each observation is given a vector.  The vector contains the observation&#8217;s location according to how many clusters the dataset was split into.  The scale of the vector is the scale of the first principal component of the data.</p><h3>Clustergram in R (a basic function)</h3><p>After finding out about this method of visualization, I was haunted by the curiosity to play with it a bit.  Therefore, and since I didn&#8217;t find any implementation of the graph in R, I went about writing the code to implement it.</p><p>The code only works for kmeans, but it shows how such a plot can be produced, and could later be modified to offer methods that connect with different clustering algorithms.</p><p>The function I present here takes a data.frame/matrix with a row for each observation and a column for each variable (dimension).<br
/> The function assumes the data is scaled.<br
/> The function then goes about calculating the cluster centers for our data, for varying number of clusters.<br
/> For each number of clusters, the cluster centers are multiplied by the loadings of the first principal component of the original data.  This gives a weighted mean of each cluster center&#8217;s dimensions that might give a decent representation of that cluster (this method has the known limitations of using the first component of a PCA for dimensionality reduction, but I won&#8217;t go into that in this post).<br
/> Finally, all of our data points are ordered according to their respective cluster&#8217;s first-component value, and plotted against the number of clusters (thus creating the clustergram).</p><p>My thanks go to <a
href="http://had.co.nz/">Hadley Wickham</a> for offering some good tips on how to prepare the graph.</p><p>Here is the code (example follows)</p><div
class="wp_syntax"><div
class="code"><pre class="rsplus" style="font-family:monospace;">&nbsp;
&nbsp;
clustergram.<span style="">kmeans</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>Data, k, ...<span style="color: #080;">&#41;</span>
<span style="color: #080;">&#123;</span>
	<span style="color: #228B22;"># this is the type of function that the clustergram</span>
	<span style="color: #228B22;"># 	function takes for the clustering.</span>
	<span style="color: #228B22;"># 	using similar structure will allow implementation of different clustering algorithms</span>
&nbsp;
	<span style="color: #228B22;">#	It returns a list with two elements:</span>
	<span style="color: #228B22;">#	cluster = a vector of length of n (the number of subjects/items)</span>
	<span style="color: #228B22;">#				indicating to which cluster each item belongs.</span>
	<span style="color: #228B22;">#	centers = a k dimensional vector.  Each element is one number that represents its cluster.</span>
	<span style="color: #228B22;">#				In our case, it is the mean of the cluster center's dimensions, weighted by</span>
	<span style="color: #228B22;">#				the first component (loadings) of the PCA of the Data.</span>
&nbsp;
	cl <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">kmeans</span><span style="color: #080;">&#40;</span>Data, k,...<span style="color: #080;">&#41;</span>
&nbsp;
	cluster <span style="color: #080;">&lt;-</span> cl$cluster
	centers <span style="color: #080;">&lt;-</span> cl$centers <span style="color: #080;">%*%</span> <span style="color: #0000FF; font-weight: bold;">princomp</span><span style="color: #080;">&#40;</span>Data<span style="color: #080;">&#41;</span>$loadings<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>	<span style="color: #228B22;"># 1 number per center</span>
												<span style="color: #228B22;"># here we are using the weighted mean for each cluster center</span>
&nbsp;
	<span style="color: #0000FF; font-weight: bold;">return</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span>
				cluster <span style="color: #080;">=</span> cluster,
				centers <span style="color: #080;">=</span> centers
			<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>		
&nbsp;
clustergram.<span style="">plot</span>.<span style="">matlines</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>X,Y, k.<span style="">range</span>, 
											x.<span style="">range</span>, y.<span style="">range</span> , COL, 
											add.<span style="">center</span>.<span style="">points</span> , centers.<span style="">points</span><span style="color: #080;">&#41;</span>
	<span style="color: #080;">&#123;</span>
		<span style="color: #0000FF; font-weight: bold;">plot</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;white&quot;</span>, xlim <span style="color: #080;">=</span> x.<span style="">range</span>, ylim <span style="color: #080;">=</span> y.<span style="">range</span>,
			axes <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">FALSE</span>,
			xlab <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;Number of clusters (k)&quot;</span>, ylab <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;PCA weighted Mean of the clusters&quot;</span>, main <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;Clustergram of the PCA-weighted Mean of the clusters k-mean clusters vs number of clusters (k)&quot;</span><span style="color: #080;">&#41;</span>
		<span style="color: #0000FF; font-weight: bold;">axis</span><span style="color: #080;">&#40;</span>side <span style="color: #080;">=</span><span style="color: #ff0000;">1</span>, at <span style="color: #080;">=</span> k.<span style="">range</span><span style="color: #080;">&#41;</span>
		<span style="color: #0000FF; font-weight: bold;">axis</span><span style="color: #080;">&#40;</span>side <span style="color: #080;">=</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span>
		<span style="color: #0000FF; font-weight: bold;">abline</span><span style="color: #080;">&#40;</span>v <span style="color: #080;">=</span> k.<span style="">range</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;grey&quot;</span><span style="color: #080;">&#41;</span>
&nbsp;
		<span style="color: #0000FF; font-weight: bold;">matlines</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">t</span><span style="color: #080;">&#40;</span>X<span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">t</span><span style="color: #080;">&#40;</span>Y<span style="color: #080;">&#41;</span>, pch <span style="color: #080;">=</span> <span style="color: #ff0000;">19</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> COL, lty <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>, lwd <span style="color: #080;">=</span> <span style="color: #ff0000;">1.5</span><span style="color: #080;">&#41;</span>
&nbsp;
		<span style="color: #0000FF; font-weight: bold;">if</span><span style="color: #080;">&#40;</span>add.<span style="">center</span>.<span style="">points</span><span style="color: #080;">&#41;</span>
		<span style="color: #080;">&#123;</span>
			<span style="color: #0000FF; font-weight: bold;">require</span><span style="color: #080;">&#40;</span>plyr<span style="color: #080;">&#41;</span>
&nbsp;
			xx <span style="color: #080;">&lt;-</span> ldply<span style="color: #080;">&#40;</span>centers.<span style="">points</span>, <span style="color: #0000FF; font-weight: bold;">rbind</span><span style="color: #080;">&#41;</span>
			<span style="color: #0000FF; font-weight: bold;">points</span><span style="color: #080;">&#40;</span>xx$y~xx$x, pch <span style="color: #080;">=</span> <span style="color: #ff0000;">19</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;red&quot;</span>, cex <span style="color: #080;">=</span> <span style="color: #ff0000;">1.3</span><span style="color: #080;">&#41;</span>
&nbsp;
			<span style="color: #228B22;"># add points	</span>
			<span style="color: #228B22;"># temp &lt;- l_ply(centers.points, function(xx) {</span>
									<span style="color: #228B22;"># with(xx,points(y~x, pch = 19, col = &quot;red&quot;, cex = 1.3))</span>
									<span style="color: #228B22;"># points(xx$y~xx$x, pch = 19, col = &quot;red&quot;, cex = 1.3)</span>
									<span style="color: #228B22;"># return(1)</span>
									<span style="color: #228B22;"># })</span>
						<span style="color: #228B22;"># We assign the lapply to a variable (temp) only to suppress the lapply &quot;NULL&quot; output</span>
		<span style="color: #080;">&#125;</span>	
	<span style="color: #080;">&#125;</span>
&nbsp;
&nbsp;
&nbsp;
clustergram <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">10</span> , 
							clustering.<span style="">function</span> <span style="color: #080;">=</span> clustergram.<span style="">kmeans</span>,
							clustergram.<span style="">plot</span> <span style="color: #080;">=</span> clustergram.<span style="">plot</span>.<span style="">matlines</span>, 
							line.<span style="">width</span> <span style="color: #080;">=</span> .004, add.<span style="">center</span>.<span style="">points</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#123;</span>
	<span style="color: #228B22;"># Data - should be a scaled matrix, where each column is a different dimension of the observations</span>
	<span style="color: #228B22;"># k.range - a vector with the numbers of clusters to plot the clustergram for</span>
	<span style="color: #228B22;"># clustering.function - the clustering function to use (defaults to clustergram.kmeans); offers a basis to later extend the function to other algorithms,</span>
	<span style="color: #228B22;">#			although that would require more work on the code</span>
	<span style="color: #228B22;"># line.width - the amount to lift each line in the plot so they won't superimpose each other</span>
	<span style="color: #228B22;"># add.center.points - whether to plot points at the cluster means</span>
&nbsp;
	n <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">dim</span><span style="color: #080;">&#40;</span>Data<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>
&nbsp;
	PCA.1 <span style="color: #080;">&lt;-</span> Data <span style="color: #080;">%*%</span> <span style="color: #0000FF; font-weight: bold;">princomp</span><span style="color: #080;">&#40;</span>Data<span style="color: #080;">&#41;</span>$loadings<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>	<span style="color: #228B22;"># first principal component of our data</span>
&nbsp;
	<span style="color: #0000FF; font-weight: bold;">if</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">require</span><span style="color: #080;">&#40;</span>colorspace<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
			COL <span style="color: #080;">&lt;-</span> heat_hcl<span style="color: #080;">&#40;</span>n<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>PCA.1<span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>	<span style="color: #228B22;"># line colors</span>
		<span style="color: #080;">&#125;</span> <span style="color: #0000FF; font-weight: bold;">else</span> <span style="color: #080;">&#123;</span>
			COL <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rainbow</span><span style="color: #080;">&#40;</span>n<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>PCA.1<span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>	<span style="color: #228B22;"># line colors</span>
			<span style="color: #0000FF; font-weight: bold;">warning</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">'Please consider installing the package &quot;colorspace&quot; for prettier colors'</span><span style="color: #080;">&#41;</span>
		<span style="color: #080;">&#125;</span>
&nbsp;
	line.<span style="">width</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span>line.<span style="">width</span>, n<span style="color: #080;">&#41;</span>
&nbsp;
	Y <span style="color: #080;">&lt;-</span> NULL	<span style="color: #228B22;"># Y matrix</span>
	X <span style="color: #080;">&lt;-</span> NULL	<span style="color: #228B22;"># X matrix</span>
&nbsp;
	centers.<span style="">points</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">list</span><span style="color: #080;">&#40;</span><span style="color: #080;">&#41;</span>
&nbsp;
	<span style="color: #0000FF; font-weight: bold;">for</span><span style="color: #080;">&#40;</span>k <span style="color: #0000FF; font-weight: bold;">in</span> k.<span style="">range</span><span style="color: #080;">&#41;</span>
	<span style="color: #080;">&#123;</span>
		k.<span style="">clusters</span> <span style="color: #080;">&lt;-</span> clustering.<span style="">function</span><span style="color: #080;">&#40;</span>Data, k<span style="color: #080;">&#41;</span>
&nbsp;
		clusters.<span style="">vec</span> <span style="color: #080;">&lt;-</span> k.<span style="">clusters</span>$cluster
			<span style="color: #228B22;"># the.centers &lt;- apply(cl$centers,1, mean)</span>
		the.<span style="">centers</span> <span style="color: #080;">&lt;-</span> k.<span style="">clusters</span>$centers 
&nbsp;
		<span style="color: #228B22;"># offset each observation by line.width times its position within its cluster, so lines sharing a center don't overlap;</span>
		<span style="color: #228B22;"># the outer order(...) index puts the per-cluster cumulative sums back in the original row order</span>
		noise <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">unlist</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">tapply</span><span style="color: #080;">&#40;</span>line.<span style="">width</span>, clusters.<span style="">vec</span>, <span style="color: #0000FF; font-weight: bold;">cumsum</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">seq_along</span><span style="color: #080;">&#40;</span>clusters.<span style="">vec</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>clusters.<span style="">vec</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>	
		<span style="color: #228B22;"># noise &lt;- noise - mean(range(noise))</span>
		y <span style="color: #080;">&lt;-</span> the.<span style="">centers</span><span style="color: #080;">&#91;</span>clusters.<span style="">vec</span><span style="color: #080;">&#93;</span> <span style="color: #080;">+</span> noise
		Y <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span>Y, y<span style="color: #080;">&#41;</span>
		x <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span>k, <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>y<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
		X <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span>X, x<span style="color: #080;">&#41;</span>
&nbsp;
		centers.<span style="">points</span><span style="color: #080;">&#91;</span><span style="color: #080;">&#91;</span>k<span style="color: #080;">&#93;</span><span style="color: #080;">&#93;</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>y <span style="color: #080;">=</span> the.<span style="">centers</span> , x <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">rep</span><span style="color: #080;">&#40;</span>k , k<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>	
	<span style="color: #228B22;">#	points(the.centers ~ rep(k , k), pch = 19, col = &quot;red&quot;, cex = 1.5)</span>
	<span style="color: #080;">&#125;</span>
&nbsp;
&nbsp;
	x.<span style="">range</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">range</span><span style="color: #080;">&#40;</span>k.<span style="">range</span><span style="color: #080;">&#41;</span>
	y.<span style="">range</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">range</span><span style="color: #080;">&#40;</span>PCA.1<span style="color: #080;">&#41;</span>
&nbsp;
	clustergram.<span style="">plot</span><span style="color: #080;">&#40;</span>X,Y, k.<span style="">range</span>, 
											x.<span style="">range</span>, y.<span style="">range</span> , COL, 
											add.<span style="">center</span>.<span style="">points</span> , centers.<span style="">points</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
<span style="color: #080;">&#125;</span></pre></div></div><h3>Example on the iris dataset</h3><p>The<a
href="http://en.wikipedia.org/wiki/Iris_flower_data_set"> iris data set</a> is a favorite example of many <a
href="http://www.r-bloggers.com/?s=iris">R bloggers </a> when writing about <a
href="http://opendatagroup.com/2009/10/21/r-accessors-explained/">R accessors</a>, <a
href="http://learnr.wordpress.com/2009/10/06/export-data-frames-to-multi-worksheet-excel-file/">data exporting</a>, <a
href="http://yihui.name/en/2009/09/how-to-import-ms-excel-data-into-r/">data importing</a>, and for <a
href="http://weitaiyun.blogspot.com/2009/03/unison-graph-and-parallel-coordinate.html">different </a><a
href="http://weitaiyun.blogspot.com/2009/03/scatterplots.html">visualization </a>techniques.<br
/> So it seemed only natural to experiment on it here.</p><div
class="wp_syntax"><div
class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">&#40;</span><span style="color: #CC9900; font-weight: bold;">iris</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">250</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>cex.<span style="">lab</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">1.5</span>, cex.<span style="">main</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">1.2</span><span style="color: #080;">&#41;</span>
Data <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">scale</span><span style="color: #080;">&#40;</span><span style="color: #CC9900; font-weight: bold;">iris</span><span style="color: #080;">&#91;</span>,<span style="color: #080;">-</span><span style="color: #ff0000;">5</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># notice that I am scaling the variables</span>
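<span style="color: #228B22;"># Aside, not part of the original example: a minimal sketch of where the y-axis range comes from,</span>
<span style="color: #228B22;"># assuming the same projection that clustergram() computes internally (its PCA.1 variable).</span>
<span style="color: #228B22;"># pc1.scores is a hypothetical name introduced only for this illustration.</span>
pc1.scores &lt;- Data %*% princomp(Data)$loadings[, 1]
range(pc1.scores) <span style="color: #228B22;"># clustergram() passes this range to the plot as ylim</span>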
clustergram<span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">8</span>, line.<span style="">width</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.004</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># notice how I am using line.width.  Play with it on your problem, according to the scale of Y.</span></pre></div></div><p>Here is the output:<br
/> <a
href="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-1.png"><img
src="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-1.png" alt="" title="clustergram 1" width="500"></a></p><p>Looking at the image, we can notice a few interesting things.  One of the clusters formed (the lower one) stays as is no matter how many clusters we allow (except for one observation that goes away and then back).<br
/> We can also see that the second split is a solid one (in the sense that it splits the first cluster into two clusters which are not &#8220;close&#8221; to each other, and that about half the observations go to each of the new clusters).<br
/> And then notice how moving to 5 clusters makes almost no difference.<br
/> Lastly, notice how when going for 8 clusters, we are practically left with 4 clusters (remember &#8211; this is according to the mean of the cluster centers, weighted by the loadings of the first principal component of the data).</p><p>If I were to take something from this graph, I would say I have a strong tendency to use 3-4 clusters on this data.</p><p>But wait, did our clustering algorithm do a stable job?<br
/> Let&#8217;s try running the algorithm 6 more times (each run will have a different starting point for the clusters)</p><div
class="wp_syntax"><div
class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">500</span><span style="color: #080;">&#41;</span>
Data <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">scale</span><span style="color: #080;">&#40;</span><span style="color: #CC9900; font-weight: bold;">iris</span><span style="color: #080;">&#91;</span>,<span style="color: #080;">-</span><span style="color: #ff0000;">5</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># notice that I am scaling the variables</span>
<span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>cex.<span style="">lab</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">1.2</span>, cex.<span style="">main</span> <span style="color: #080;">=</span> .7<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>mfrow <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">3</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">for</span><span style="color: #080;">&#40;</span>i <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">6</span><span style="color: #080;">&#41;</span> clustergram<span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">8</span> , line.<span style="">width</span> <span style="color: #080;">=</span> .004, add.<span style="">center</span>.<span style="">points</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span></pre></div></div><p>This results in the following (click the image to enlarge it):<br
/> <a
href="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-6.png"><img
src="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-6.png" alt="" title="clustergram 6" width="500"></a><br
/> Repeating the analysis offers even more insights.<br
/> First, it would appear that up to 3 clusters, the algorithm gives rather stable results.<br
/> From 4 onwards we get various outcomes at each iteration.<br
/> In some cases, we got 3 clusters when we asked for 4 or even 5 clusters.</p><p>Reviewing the new plots, I would prefer to go with the 3 clusters option, noting how the two &#8220;upper&#8221; clusters might have similar properties while the lower cluster is quite distinct from the other two.</p><p>By the way, the Iris data set is composed of three types of flowers.  I imagine that kmeans did a decent job of distinguishing the three.</p><h3>Limitation of the method (and a possible way to overcome it?!)</h3><p>It is worth noting that the current way the algorithm is built has a fundamental limitation:  the plot is good for detecting a situation where there are several clusters, but each of them is clearly &#8220;bigger&#8221; than the one before it (on the first principal component of the data).</p><p>For example, let&#8217;s create a dataset with 3 clusters, each drawn from a normal distribution with a higher mean than the previous one:</p><div
class="wp_syntax"><div
class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">250</span><span style="color: #080;">&#41;</span>
Data <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rbind</span><span style="color: #080;">&#40;</span>
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">2</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">2</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">2</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
				<span style="color: #080;">&#41;</span>				
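<span style="color: #228B22;"># Aside, not part of the original example: a quick check that the three simulated groups are ordered by mean.</span>
<span style="color: #228B22;"># group is a hypothetical label vector introduced only for this illustration (rows 1-100, 101-200, 201-300).</span>
group &lt;- rep(1:3, each = 100)
rowsum(Data, group) / 100 <span style="color: #228B22;"># per-group column means; expected to be close to 0, 1 and 2</span>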
clustergram<span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">5</span> , line.<span style="">width</span> <span style="color: #080;">=</span> .004, add.<span style="">center</span>.<span style="">points</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span></pre></div></div><p>The resulting plot for this is the following:<br
/> <a
href="http://www.r-statistics.com/wp-content/uploads/2010/06/Clustergram-3-ordered-clusters.png"><img
src="http://www.r-statistics.com/wp-content/uploads/2010/06/Clustergram-3-ordered-clusters.png" alt="" title="Clustergram-3-ordered-clusters" width="500" class="alignnone size-full wp-image-402" /></a><br
/> The image shows a clear distinction between three ranks of clusters.  There is no doubt (for me) from looking at this image, that three clusters would be the correct number of clusters.</p><p>But what if the clusters were different but didn&#8217;t have an ordering to them?<br
/> For example, look at the following 4-dimensional data:</p><div
class="wp_syntax"><div
class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">250</span><span style="color: #080;">&#41;</span>
Data <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rbind</span><span style="color: #080;">&#40;</span>
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,
				<span style="color: #0000FF; font-weight: bold;">cbind</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">0</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span>,<span style="color: #ff0000;">1</span>, <span style="color: #0000FF; font-weight: bold;">sd</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">0.3</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
				<span style="color: #080;">&#41;</span>				
clustergram<span style="color: #080;">&#40;</span>Data, k.<span style="">range</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">:</span><span style="color: #ff0000;">8</span> , line.<span style="">width</span> <span style="color: #080;">=</span> .004, add.<span style="">center</span>.<span style="">points</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span></pre></div></div><p><a
href="http://www.r-statistics.com/wp-content/uploads/2010/06/Clustergram-4-UNordered-clusters.png"><img
src="http://www.r-statistics.com/wp-content/uploads/2010/06/Clustergram-4-UNordered-clusters.png" alt="" title="Clustergram-4-UNordered-clusters" width="500" class="alignnone size-full wp-image-403" /></a></p><p>In this situation, it is not clear from the location of the clusters on the Y axis that we are dealing with 4 clusters.<br
/> What is interesting, though, is that as the number of clusters grows, we can see 4 &#8220;strands&#8221; of data points moving more or less together (until we reach 4 clusters, at which point the clusters start breaking up).<br
/> Another way of handling this might be to use the color of the lines in some way, but I haven&#8217;t yet figured out how.</p><h3>Clustergram with ggplot2</h3><p><a
href="http://had.co.nz/">Hadley Wickham</a> has kindly played with recreating the clustergram using the ggplot2 engine.  You can see the result here:<br
/> <a
href="http://gist.github.com/439761">http://gist.github.com/439761</a><br
/> And this is what he wrote about it in the comments:</p><blockquote><p>I’ve broken it down into three components:<br
/> * run the clustering algorithm and get predictions (many_kmeans and all_hclust)<br
/> * produce the data for the clustergram (clustergram)<br
/> * plot it (plot.clustergram)<br
/> I don’t think I have the logic behind the y-position adjustment quite right though.</p></blockquote><p>Here is an example of how it looks:<br
/> <a
href="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-ggplot2-1.png"><img
src="http://www.r-statistics.com/wp-content/uploads/2010/06/clustergram-ggplot2-1.png" alt="" title="clustergram-ggplot2-1" width="500" class="alignnone size-full wp-image-407" /></a></p><h3>Conclusions (some rules of thumb and questions for the future)</h3><p>At first glance, it would appear that the clustergram can be of use.  I can imagine using this graph to quickly run various clustering algorithms and then compare them to each other and review their stability (in the way I just demonstrated in the example above).</p><p>The three rules of thumb I have noticed so far are:</p><ol><li>Look at the location of the cluster points on the Y axis. See when they remain stable, when they start flying around, and what happens to them with a higher number of clusters (do they regroup?)</li><li>Observe the strands of the data points.  Even if the cluster centers are not ordered, the lines for each item might (this needs more research and thinking) tend to move together &#8211; hinting at the real number of clusters</li><li>Run the plot multiple times to observe the stability of the cluster formation (and location)</li></ol><p>Yet there is more work to be done and there are questions to seek answers to:</p><ul><li>The code needs to be extended to offer methods for various clustering algorithms.</li><li>How can the colors of the lines be used better?</li><li>How can this be done using other graphical engines (ggplot2/lattice?) &#8211; (<strong>Update</strong>: look at Hadley&#8217;s reply in the comments)</li><li>What to do in case the first principal component doesn&#8217;t capture enough of the data? (maybe plot this graph for all the relevant components. 
but then &#8211; how do you draw conclusions from it?)</li><li>What other uses/conclusions can be made based on this graph?</li></ul><p>I am looking forward to reading your input/ideas in the comments (or in reply posts).</p> ]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/06/clustergram-visualization-and-diagnostics-for-cluster-analysis-r-code/feed/</wfw:commentRss> <slash:comments>14</slash:comments> </item> <item><title>The new GUI for ggplot2 (using Deducer) &#8211; the designer wants your opinion</title><link>http://www.r-statistics.com/2010/05/the-new-gui-for-ggplot2-using-deducer-the-designer-wants-your-opinion/</link> <comments>http://www.r-statistics.com/2010/05/the-new-gui-for-ggplot2-using-deducer-the-designer-wants-your-opinion/#comments</comments> <pubDate>Sat, 01 May 2010 14:29:22 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[R]]></category> <category><![CDATA[visualization]]></category> <category><![CDATA[deducer]]></category> <category><![CDATA[ggplot2]]></category> <category><![CDATA[GUI]]></category> <category><![CDATA[interfaces]]></category> <category><![CDATA[R GUI]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=331</guid> <description><![CDATA[After discovering that R is expected (this summer) to have a GUI for ggplot2 (through deducer), I later found Ian&#8217;s GSoC proposal for this GUI.  Since the system is in its early stages of development, Ian has invited people to give comments, input and critique on his plans for the project. For your convenience (and with Ian&#8217;s permission), I am reposting his proposal here. You are welcome to send him feedback by e-mailing him (at: ifellows@gmail.com), or by leaving a [...]]]></description> <content:encoded><![CDATA[<p>After <a
href="http://www.r-statistics.com/2010/04/r-and-the-google-summer-of-code-2010-accepted-students-and-projects/">discovering that R is expected (this summer) to have a GUI for ggplot2</a> (through <a
href="http://cran.r-project.org/web/packages/Deducer/index.html">deducer</a>), I later found <a
href="http://neolab.stat.ucla.edu/cranstats/gsoc.pdf">Ian&#8217;s GSoC proposal</a> for this GUI.  Since the system is in its early stages of development, Ian has invited people to give comments, input and critique on his plans for the project.</p><p>For your convenience (and with Ian&#8217;s permission), I am reposting his proposal here.  You are welcome to send him feedback by e-mailing him (at: ifellows@gmail.com), or by leaving a comment here (and I will direct him to your comment).</p><p><span
id="more-331"></span></p><p
class="gde-text"><a
href="http://neolab.stat.ucla.edu/cranstats/gsoc.pdf" target="_blank" class="gde-link">Download (PDF, 2.9MB)</a></p> <iframe
src="http://www.r-statistics.com/wp-content/plugins/google-document-embedder/proxy.php?url=http%3A%2F%2Fneolab.stat.ucla.edu%2Fcranstats%2Fgsoc.pdf&hl=cs&gdet=&embedded=true" width="500" height="700" frameborder="0" class="gde-frame"></iframe>]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/05/the-new-gui-for-ggplot2-using-deducer-the-designer-wants-your-opinion/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Jeroen Ooms&#8217;s ggplot2 web interface &#8211; a new version released (V0.2)</title><link>http://www.r-statistics.com/2010/04/jeroen-oomss-ggplot2-web-interface-a-new-version-released-v0-2/</link> <comments>http://www.r-statistics.com/2010/04/jeroen-oomss-ggplot2-web-interface-a-new-version-released-v0-2/#comments</comments> <pubDate>Mon, 12 Apr 2010 20:34:04 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[R]]></category> <category><![CDATA[R and the web]]></category> <category><![CDATA[visualization]]></category> <category><![CDATA[ggplot2]]></category> <category><![CDATA[interfaces]]></category> <category><![CDATA[jeroen ooms]]></category> <category><![CDATA[video]]></category> <category><![CDATA[WebSites]]></category> <category><![CDATA[youtube]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=266</guid> <description><![CDATA[Good news. Jeroen Ooms released a new version of his (amazing) online ggplot2 web interface: yeroon.net/ggplot2 is a web interface for Hadley Wickham&#8217;s R package ggplot2. It is used as a tool for rapid prototyping, exploratory graphical analysis and education of statistics and R. The interface is written completely in javascript, therefore there is no need to install anything on the client side: a standard browser will do. The new version has a lot of cool new features, like advanced [...]]]></description> <content:encoded><![CDATA[<p>Good news.</p><p><a
href="http://www.stat.ucla.edu/~jeroen/">Jeroen Ooms</a> released a new version of his <a
href="http://www.stat.ucla.edu/~jeroen/ggplot2/">(amazing) online ggplot2 web interface</a>:</p><blockquote><p><a
href="http://www.yeroon.net/ggplot2/">yeroon.net/ggplot2</a> is a web interface for Hadley Wickham&#8217;s R package ggplot2. It is used as a tool for rapid prototyping, exploratory graphical analysis and education of statistics and R. The interface is written completely in javascript, therefore there is no need to install anything on the client side: a standard browser will do.</p></blockquote><p>The new version has a lot of cool new features, like advanced data import, integration with Google Docs, converting variables from numeric to factor to dates and vice versa, and a lot of new geoms. You can watch some of them in his new video demo of the application:<br
/> <object
width="640" height="385"><param
name="movie" value="http://www.youtube.com/v/pCzQP7kVEOc&#038;hl=en_US&#038;fs=1&#038;"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/pCzQP7kVEOc&#038;hl=en_US&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="640" height="385"></embed></object></p><p>The application is on:<br
/> <a
href="http://www.yeroon.net/ggplot2/">http://www.yeroon.net/ggplot2/</a></p><p>P.S.: other posts about this (including videos explaining how some of this was done) can be viewed on the category page: <a
href="http://www.r-statistics.com/category/r-and-the-web/">R and the web</a></p> ]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/04/jeroen-oomss-ggplot2-web-interface-a-new-version-released-v0-2/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Correlation scatter-plot matrix for ordered-categorical data</title><link>http://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/</link> <comments>http://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/#comments</comments> <pubDate>Wed, 07 Apr 2010 21:37:26 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[R]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[visualization]]></category> <category><![CDATA[code]]></category> <category><![CDATA[correlation]]></category> <category><![CDATA[correlation matrix]]></category> <category><![CDATA[correlation scatter plot]]></category> <category><![CDATA[non-parametric]]></category> <category><![CDATA[non-parametric test]]></category> <category><![CDATA[nonparametric]]></category> <category><![CDATA[nonparametric test]]></category> <category><![CDATA[R code]]></category> <category><![CDATA[scatter plot]]></category> <category><![CDATA[scatter plot matrix]]></category> <category><![CDATA[spearman correlation]]></category> <category><![CDATA[spearman test]]></category> <category><![CDATA[stackoverflow]]></category> <category><![CDATA[survey]]></category> <category><![CDATA[tutorial]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=256</guid> <description><![CDATA[When analyzing a questionnaire, one often wants to view the correlation between two or more Likert questionnaire items (for example: two ordered categorical vectors ranging from 1 to 5). When dealing with several such Likert variables, a clear presentation of all the pairwise relations between our variables can be achieved by inspecting the (Spearman) correlation matrix (easily obtained in R by using the &#8220;cor&#8221; command with method = &quot;spearman&quot; on a matrix of variables). Yet, a challenge appears once we wish to plot this [...]]]></description> <content:encoded><![CDATA[<p>When analyzing a questionnaire, one often wants to view the correlation between two or more <a
href="http://en.wikipedia.org/wiki/Likert_scale">Likert questionnaire</a> items (for example: two ordered categorical vectors ranging from 1 to 5).</p><p>When dealing with several such Likert variables, a clear presentation of all the pairwise relations between our variables can be achieved by inspecting the (Spearman) correlation matrix (easily obtained in R by using the &#8220;cor&#8221; command with method = &quot;spearman&quot; on a matrix of variables).<br
/> Yet, a challenge appears once we wish to plot this correlation matrix.  The challenge stems from the fact that the classic presentation for a correlation matrix is a <strong>scatter plot matrix</strong> &#8211; but scatter plots don&#8217;t (usually) work well for ordered categorical vectors since the dots on the scatter plot often overlap each other.</p><p>There are four solutions to the point-overlap problem that I know of:</p><ol><li>Jitter the data a bit to give a sense of the &#8220;density&#8221; of the points</li><li>Use a color spectrum to show when a point actually represents &#8220;many points&#8221;</li><li>Use different point sizes to show when there are &#8220;many points&#8221; in the location of that point</li><li>Add a LOWESS (or LOESS) line to the scatter plot &#8211; to show the trend of the data</li></ol><p>In this post I will offer the code for a solution that uses solutions 3 and 4 (and possibly 2; please read this post&#8217;s comments). Here is the output (click to see a larger image):</p><p><a
href="http://www.r-statistics.com/wp-content/uploads/2010/04/scatter-plot-correlation-matrix.png"><img
class="alignnone size-full wp-image-257" title="scatter plot correlation matrix" src="http://www.r-statistics.com/wp-content/uploads/2010/04/scatter-plot-correlation-matrix.png" alt="" width="550"/></a></p><p>And here is the code to produce this plot:</p><p><span
id="more-256"></span></p><h3>R code for producing a Correlation scatter-plot matrix &#8211; for ordered-categorical data</h3><p><strong>Note</strong> that this code will work fine for continuous data points (although I would suggest increasing the &#8220;point.size.rescale&#8221; parameter to something bigger than 1.5 in the &#8220;panel.smooth.ordered.categorical&#8221; function)</p><div
class="wp_syntax"><table><tr><td
class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># -----------------</span>
<span style="color: #228B22;"># Functions</span>
<span style="color: #228B22;"># -----------------</span>
&nbsp;
panel.<span style="">cor</span>.<span style="">ordered</span>.<span style="">categorical</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>x, y, digits<span style="color: #080;">=</span><span style="color: #ff0000;">2</span>, prefix<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;&quot;</span>, cex.<span style="">cor</span><span style="color: #080;">&#41;</span> 
<span style="color: #080;">&#123;</span>
&nbsp;
    usr <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;usr&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">on.<span style="">exit</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>usr<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> 
    <span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>usr <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>, <span style="color: #ff0000;">1</span>, <span style="color: #ff0000;">0</span>, <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> 
&nbsp;
    r <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">abs</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">cor</span><span style="color: #080;">&#40;</span>x, y, method <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;spearman&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># notice we use Spearman's non-parametric correlation here</span>
    r.<span style="">no</span>.<span style="">abs</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">cor</span><span style="color: #080;">&#40;</span>x, y, method <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;spearman&quot;</span><span style="color: #080;">&#41;</span>
&nbsp;
&nbsp;
    txt <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">format</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span>r.<span style="">no</span>.<span style="">abs</span> , <span style="color: #ff0000;">0.123456789</span><span style="color: #080;">&#41;</span>, digits<span style="color: #080;">=</span>digits<span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span> 
    txt <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">paste</span><span style="color: #080;">&#40;</span>prefix, txt, sep<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;&quot;</span><span style="color: #080;">&#41;</span> 
    <span style="color: #228B22;"># use the supplied cex.cor when given; otherwise scale the text to the panel width</span>
    cex <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">if</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">missing</span><span style="color: #080;">&#40;</span>cex.<span style="">cor</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #ff0000;">0.8</span><span style="color: #080;">/</span><span style="color: #0000FF; font-weight: bold;">strwidth</span><span style="color: #080;">&#40;</span>txt<span style="color: #080;">&#41;</span> <span style="color: #0000FF; font-weight: bold;">else</span> cex.<span style="">cor</span> 
&nbsp;
    test <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">cor.<span style="">test</span></span><span style="color: #080;">&#40;</span>x,y, method <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;spearman&quot;</span><span style="color: #080;">&#41;</span> 
    <span style="color: #228B22;"># borrowed from printCoefmat</span>
    Signif <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">symnum</span><span style="color: #080;">&#40;</span>test$p.<span style="">value</span>, corr <span style="color: #080;">=</span> FALSE, na <span style="color: #080;">=</span> FALSE, 
                  cutpoints <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>, <span style="color: #ff0000;">0.001</span>, <span style="color: #ff0000;">0.01</span>, <span style="color: #ff0000;">0.05</span>, <span style="color: #ff0000;">0.1</span>, <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>,
                  <span style="color: #0000FF; font-weight: bold;">symbols</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;***&quot;</span>, <span style="color: #ff0000;">&quot;**&quot;</span>, <span style="color: #ff0000;">&quot;*&quot;</span>, <span style="color: #ff0000;">&quot;.&quot;</span>, <span style="color: #ff0000;">&quot; &quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> 
&nbsp;
    <span style="color: #0000FF; font-weight: bold;">text</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0.5</span>, <span style="color: #ff0000;">0.5</span>, txt, cex <span style="color: #080;">=</span> cex <span style="color: #080;">*</span> r<span style="color: #080;">&#41;</span> 
    <span style="color: #0000FF; font-weight: bold;">text</span><span style="color: #080;">&#40;</span>.8, .8, Signif, cex<span style="color: #080;">=</span>cex, <span style="color: #0000FF; font-weight: bold;">col</span><span style="color: #080;">=</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span> 
<span style="color: #080;">&#125;</span>
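&nbsp;
<span style="color: #228B22;"># A minimal sketch (not part of the original post) of how symnum() turns</span>
<span style="color: #228B22;"># p-values into the significance stars drawn by the panel function above.</span>
<span style="color: #228B22;"># The names p.example and stars.example, and the p-values themselves,</span>
<span style="color: #228B22;"># are hypothetical and chosen only for illustration.</span>
p.example &lt;- c(0.0005, 0.02, 0.07, 0.5)
stars.example &lt;- as.character(symnum(p.example, corr = FALSE, na = FALSE,
                  cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                  symbols = c(&quot;***&quot;, &quot;**&quot;, &quot;*&quot;, &quot;.&quot;, &quot; &quot;)))
<span style="color: #228B22;"># stars.example now holds one symbol per p-value, in the same order as p.example</span>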
&nbsp;
&nbsp;
&nbsp;
&nbsp;
panel.<span style="">smooth</span>.<span style="">ordered</span>.<span style="">categorical</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span> <span style="color: #080;">&#40;</span>x, y, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;col&quot;</span><span style="color: #080;">&#41;</span>, bg <span style="color: #080;">=</span> NA, pch <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;pch&quot;</span><span style="color: #080;">&#41;</span>, 
												cex <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>, col.<span style="">smooth</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;red&quot;</span>, span <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">/</span><span style="color: #ff0000;">3</span>, iter <span style="color: #080;">=</span> <span style="color: #ff0000;">3</span>, 
												point.<span style="">size</span>.<span style="">rescale</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">1.5</span>, ...<span style="color: #080;">&#41;</span> 
<span style="color: #080;">&#123;</span>
	<span style="color: #228B22;">#require(colorspace)</span>
    <span style="color: #0000FF; font-weight: bold;">require</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">reshape</span><span style="color: #080;">&#41;</span>
    z <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">merge</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>x,y<span style="color: #080;">&#41;</span>, melt<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">table</span><span style="color: #080;">&#40;</span>x ,y<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">sort</span> <span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span><span style="color: #080;">&#41;</span>$value
    <span style="color: #228B22;">#the.col &lt;- heat_hcl(length(x))[z]</span>
    z <span style="color: #080;">&lt;-</span> point.<span style="">size</span>.<span style="">rescale</span><span style="color: #080;">*</span>z<span style="color: #080;">/</span> <span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># notice how we rescale the dots relative to the largest value z could have reached, length(x)</span>
&nbsp;
    <span style="color: #0000FF; font-weight: bold;">symbols</span><span style="color: #080;">&#40;</span> x, y,  circles <span style="color: #080;">=</span> z,<span style="color: #228B22;">#rep(0.1, length(x)), #sample(1:2, length(x), replace = T) ,</span>
			inches<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>, bg<span style="color: #080;">=</span> <span style="color: #ff0000;">&quot;grey&quot;</span>,<span style="color: #228B22;">#the.col ,</span>
			fg <span style="color: #080;">=</span> bg, add <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
&nbsp;
    <span style="color: #228B22;"># points(x, y, pch = pch, col = col, bg = bg, cex = cex)</span>
    ok <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">is.<span style="">finite</span></span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span> <span style="color: #080;">&amp;</span> <span style="color: #0000FF; font-weight: bold;">is.<span style="">finite</span></span><span style="color: #080;">&#40;</span>y<span style="color: #080;">&#41;</span>
    <span style="color: #0000FF; font-weight: bold;">if</span> <span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">any</span><span style="color: #080;">&#40;</span>ok<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> 
        <span style="color: #0000FF; font-weight: bold;">lines</span><span style="color: #080;">&#40;</span>stats<span style="color: #080;">::</span><span style="color: #0000FF; font-weight: bold;">lowess</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#91;</span>ok<span style="color: #080;">&#93;</span>, y<span style="color: #080;">&#91;</span>ok<span style="color: #080;">&#93;</span>, f <span style="color: #080;">=</span> span, iter <span style="color: #080;">=</span> iter<span style="color: #080;">&#41;</span>, 
            <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> col.<span style="">smooth</span>, ...<span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>
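&nbsp;
<span style="color: #228B22;"># A minimal sketch (an alternative offered under assumptions, not the code used above)</span>
<span style="color: #228B22;"># of counting how many observations share each (x, y) pair while keeping the</span>
<span style="color: #228B22;"># counts aligned with the original order of the data, using base R only.</span>
<span style="color: #228B22;"># x.example, y.example, pair.counts and z.example are hypothetical names.</span>
x.example &lt;- c(1, 1, 2, 3, 3, 3)
y.example &lt;- c(2, 2, 1, 3, 3, 3)
pair.counts &lt;- table(x.example, y.example)
<span style="color: #228B22;"># index the table with a two-column character matrix: one (row name, column name) per point</span>
z.example &lt;- as.vector(pair.counts[cbind(as.character(x.example), as.character(y.example))])
<span style="color: #228B22;"># each element of z.example is the number of points sitting at that point's location</span>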
&nbsp;
&nbsp;
panel.<span style="">hist</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>x, ...<span style="color: #080;">&#41;</span>
<span style="color: #080;">&#123;</span>
    usr <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;usr&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">;</span> <span style="color: #0000FF; font-weight: bold;">on.<span style="">exit</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>usr<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
    <span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>usr <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span>usr<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span>, <span style="color: #ff0000;">0</span>, <span style="color: #ff0000;">1.5</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#41;</span>
    h <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">hist</span><span style="color: #080;">&#40;</span>x, <span style="color: #0000FF; font-weight: bold;">plot</span> <span style="color: #080;">=</span> FALSE, br <span style="color: #080;">=</span> <span style="color: #ff0000;">20</span><span style="color: #080;">&#41;</span>
    breaks <span style="color: #080;">&lt;-</span> h$breaks<span style="color: #080;">;</span> nB <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>breaks<span style="color: #080;">&#41;</span>
    y <span style="color: #080;">&lt;-</span> h$counts<span style="color: #080;">;</span> y <span style="color: #080;">&lt;-</span> y<span style="color: #080;">/</span><span style="color: #0000FF; font-weight: bold;">max</span><span style="color: #080;">&#40;</span>y<span style="color: #080;">&#41;</span>
    <span style="color: #0000FF; font-weight: bold;">rect</span><span style="color: #080;">&#40;</span>breaks<span style="color: #080;">&#91;</span><span style="color: #080;">-</span>nB<span style="color: #080;">&#93;</span>, <span style="color: #ff0000;">0</span>, breaks<span style="color: #080;">&#91;</span><span style="color: #080;">-</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>, y, <span style="color: #0000FF; font-weight: bold;">col</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;orange&quot;</span>, ...<span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>
&nbsp;
&nbsp;
pairs.<span style="">ordered</span>.<span style="">categorical</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>xx,...<span style="color: #080;">&#41;</span>
		<span style="color: #080;">&#123;</span>
			<span style="color: #0000FF; font-weight: bold;">pairs</span><span style="color: #080;">&#40;</span>xx , 
					diag.<span style="">panel</span> <span style="color: #080;">=</span> panel.<span style="">hist</span> ,
					lower.<span style="">panel</span><span style="color: #080;">=</span>panel.<span style="">smooth</span>.<span style="">ordered</span>.<span style="">categorical</span>,
					upper.<span style="">panel</span><span style="color: #080;">=</span>panel.<span style="">cor</span>.<span style="">ordered</span>.<span style="">categorical</span>,
					cex.<span style="">labels</span> <span style="color: #080;">=</span> <span style="color: #ff0000;">1.5</span>, ...<span style="color: #080;">&#41;</span> 
		<span style="color: #080;">&#125;</span>
&nbsp;
&nbsp;
&nbsp;
&nbsp;
<span style="color: #228B22;"># -----------------</span>
<span style="color: #228B22;"># Example</span>
<span style="color: #228B22;"># -----------------</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">set.<span style="">seed</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">666</span><span style="color: #080;">&#41;</span>
a1 <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">100</span>, <span style="color: #0000FF; font-weight: bold;">replace</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
a2 <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">sample</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">1</span><span style="color: #080;">:</span><span style="color: #ff0000;">5</span>, <span style="color: #ff0000;">100</span>, <span style="color: #0000FF; font-weight: bold;">replace</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>
a3 <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">jitter</span><span style="color: #080;">&#40;</span>a2, <span style="color: #ff0000;">7</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#41;</span>
	a3<span style="color: #080;">&#91;</span>a3 <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span> <span style="color: #080;">|</span> a3 <span style="color: #080;">&gt;</span> <span style="color: #ff0000;">5</span><span style="color: #080;">&#93;</span> <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">3</span>
a4 <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">6</span><span style="color: #080;">-</span><span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">jitter</span><span style="color: #080;">&#40;</span>a1, <span style="color: #ff0000;">7</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#41;</span>
	a4<span style="color: #080;">&#91;</span>a4 <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span> <span style="color: #080;">|</span> a4 <span style="color: #080;">&gt;</span> <span style="color: #ff0000;">5</span><span style="color: #080;">&#93;</span> <span style="color: #080;">&lt;-</span> <span style="color: #ff0000;">3</span>
&nbsp;
aa <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>a1,a2,a3, a4<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #0000FF; font-weight: bold;">require</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">reshape</span><span style="color: #080;">&#41;</span>
&nbsp;
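# (Added sketch, not part of the original post.) A quick numeric cross-check:
# assuming the upper panels report Spearman correlations, as the credits
# state, the same coefficients can be printed as a matrix with base R.
round(cor(aa, method = "spearman"), 2)
&nbsp;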
<span style="color: #228B22;"># plotting :)		</span>
pairs.<span style="">ordered</span>.<span style="">categorical</span><span style="color: #080;">&#40;</span>aa<span style="color: #080;">&#41;</span></pre></td></tr></table></div><h3> Credits:</h3><ul><li>The original R code for the correlation matrix plot was taken from <a
href="http://addictedtor.free.fr/graphiques/graphcode.php?graph=137">R Graph Gallery</a>. The differences are: 1) the use of Spearman correlation; 2) the addition of a histogram panel; and 3) the changing of point sizes.</li><li>The idea to use symbols for changing the point sizes was <a
href="http://stackoverflow.com/questions/2593643/correlation-scatter-matrix-plot-with-different-point-size-in-r">offered</a> by <a
href="http://www.linkedin.com/pub/doug-y-barbo/2/356/416">Doug Y&#8217;barbo</a>.<br
/> Thanks also to <a
href="http://dirk.eddelbuettel.com/">Dirk Eddelbuettel</a> for suggesting the use of cex (although I ended up not using that)</li></ul><p>If you have ideas on how to improve this code (or how to reproduce it with ggplot2 or lattice), please share them in the comments (or on your own blog, but be sure to let me know <img
src='http://www.r-statistics.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> )</p> ]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/04/correlation-scatter-plot-matrix-for-ordered-categorical-data/feed/</wfw:commentRss> <slash:comments>9</slash:comments> </item> <item><title>R-Node: a web front-end to R with Protovis</title><link>http://www.r-statistics.com/2010/04/r-node-a-web-front-end-to-r-with-protovis/</link> <comments>http://www.r-statistics.com/2010/04/r-node-a-web-front-end-to-r-with-protovis/#comments</comments> <pubDate>Sat, 03 Apr 2010 12:22:53 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[R]]></category> <category><![CDATA[R and the web]]></category> <category><![CDATA[visualization]]></category> <category><![CDATA[fun]]></category> <category><![CDATA[interfaces]]></category> <category><![CDATA[R internet]]></category> <category><![CDATA[R server]]></category> <category><![CDATA[Rserve]]></category> <category><![CDATA[server]]></category> <category><![CDATA[WebSites]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=241</guid> <description><![CDATA[Update (April 6, 2010): R-Node now has its own website, with a dedicated Google group (you can join it here) * * * * The integration of R into online web services is (for me) one of the more exciting prospects in R&#8217;s future. That is why I was very excited to come across Jamie Love&#8217;s recent creation: R-Node. What is R-Node? R-Node is an (open source) web front-end to R (the statistical analysis package). Using this front-end, [...]]]></description> <content:encoded><![CDATA[<p><strong>Update (April 6, 2010):</strong> R-Node now has <a
href="http://www.squirelove.net/r-node">its own website</a>, with a <a
href="http://groups.google.com/group/r-node-users">dedicated Google group</a> (you can <a
href="http://groups.google.com/group/r-node-users/subscribe">join it here</a>)</p><p>*  *  *  *</p><p>The integration of R into online web services is (for me) one of the more exciting prospects in R&#8217;s future. That is why I was very excited to <a
href="http://twitter.com/ChrisDiehl/status/11495443959">come across</a> Jamie Love&#8217;s recent creation: R-Node.</p><h3>What is R-Node?</h3><p><a
href="http://gitorious.org/r-node">R-Node</a> is an (open source) web front-end to R (the statistical analysis package).</p><p>Using this front-end, you can connect from any web browser to an R instance running on a remote (or local) server and interact with it, sending commands and receiving the responses. In particular, graphing commands such as plot() and hist() will execute in the browser, drawing the graph as an SVG image.</p><p>You can see a <strong>live demonstration</strong> of this interface by visiting:<br
/> <a
href="http://69.164.204.238:2904/">http://69.164.204.238:2904/ </a><br
/> And using the following user/password login info:<br
/> User: pvdemouser<br
/> Password: svL35NmPwMnt<br
/> (This link was originally posted <a
href="http://groups.google.com/group/protovis/browse_thread/thread/f0899d436102164a">here</a>)</p><p>Here are some screenshots:</p><p> <a
href='http://www.r-statistics.com/2010/04/r-node-a-web-front-end-to-r-with-protovis/r1/' title='R1'><img
width="150" height="150" src="http://www.r-statistics.com/wp-content/uploads/2010/04/R1-150x150.png" class="attachment-thumbnail" alt="R1" title="R1" /></a> <a
href='http://www.r-statistics.com/2010/04/r-node-a-web-front-end-to-r-with-protovis/r2/' title='R2'><img
width="150" height="150" src="http://www.r-statistics.com/wp-content/uploads/2010/04/R2-150x150.png" class="attachment-thumbnail" alt="R2" title="R2" /></a> <a
href='http://www.r-statistics.com/2010/04/r-node-a-web-front-end-to-r-with-protovis/r3/' title='R3'><img
width="150" height="150" src="http://www.r-statistics.com/wp-content/uploads/2010/04/R3-150x150.png" class="attachment-thumbnail" alt="R3" title="R3" /></a> <a
href='http://www.r-statistics.com/2010/04/r-node-a-web-front-end-to-r-with-protovis/r4/' title='R4'><img
width="150" height="150" src="http://www.r-statistics.com/wp-content/uploads/2010/04/R4-150x150.png" class="attachment-thumbnail" alt="R4" title="R4" /></a> <br
/> <em>In the second screenshot you see the results of the R command &#8216;plot(x, y)&#8217; (with the reimplementation of plot doing the actual plotting), and in the fourth screenshot you see a similar plot command along with a subsequent best fit line (data points calculated with &#8216;lowess()&#8217;) drawn in. </em></p><p>Once in, you can try out R by typing something like:</p><div
class="wp_syntax"><div
class="code"><pre class="rsplus" style="font-family:monospace;">x <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">rnorm</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">100</span><span style="color: #080;">&#41;</span> 
<span style="color: #0000FF; font-weight: bold;">plot</span><span style="color: #080;">&#40;</span>x, main<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Random numbers&quot;</span><span style="color: #080;">&#41;</span> 
l <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">lowess</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span> 
<span style="color: #0000FF; font-weight: bold;">lines</span> <span style="color: #080;">&#40;</span>l$y<span style="color: #080;">&#41;</span></pre></div></div><p>The plot and lines commands will bring up a graph &#8211; you can escape out of it, download the graph as an SVG file, and change the graph type (e.g. do: plot(x, type=&quot;o&quot;) ).<br
/> Many R commands will work, though only hist(), plot() and lines() work for graphing.<br
/> Please<strong><u> don&#8217;t</u></strong> type the R command q() &#8211; it will quit the server, stopping it from working for everyone! Also, as everyone shares the same session for now, using more distinctive variable names than &#8216;x&#8217; and &#8216;l&#8217; will help you.</p><p>Currently there is only limited error checking, but the code continues to be improved and developed. You can download it from:<br
/> <a
href="http://gitorious.org/r-node">http://gitorious.org/r-node</a></p><p>How do you imagine yourself using something like this? Feel free to share with me and everyone else in the comments.</p><p>Here are some of the more technical details of R-Node:<br
/> <span
id="more-241"></span></p><h3>How does R-Node work</h3><p>(Credit: The following text is based on <a
href="http://groups.google.com/group/protovis/browse_thread/thread/13633e3ae1229993">this forum thread</a>)</p><p>R-Node uses Protovis for drawing graphs. <a
href="http://vis.stanford.edu/protovis/">Protovis</a> is a visualization toolkit written in JavaScript using the canvas element. Using simple graphical marks, like boxes and dots, one can construct custom views to present or explore data.</p><p>Besides Protovis, R-Node also uses jQuery and ExtJS core on the front-end.</p><p>Most R commands are passed back to the server and their results returned to the client. Some, such as the graph commands, are parsed and the arguments used in JavaScript re-implementations of the R commands (e.g. the R command &#8216;plot&#8217; has a Protovis equivalent).</p><p>The server side is R+Rserve, and to connect the browser client to the R server Jamie used a Node.js-based application server.</p><p>Projects utilised in this include:</p><ul><li>Protovis &#8211; http://vis.stanford.edu/protovis/</li><li>Nodejs &#8211; http://nodejs.org/</li><li>R &#8211; http://www.r-project.org/</li><li>Rserve  &#8211; http://www.rforge.net/Rserve/doc.html</li><li>Shjs &#8211; http://shjs.sourceforge.net/</li></ul><p>I would love to read your thoughts about this in the comments.</p> ]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/04/r-node-a-web-front-end-to-r-with-protovis/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> <item><title>Nutritional supplements efficacy score &#8211; Graphing plots of current studies results (using R)</title><link>http://www.r-statistics.com/2010/02/nutritional-supplements-efficacy-score-graphing-plots-of-current-studies-results-using-r/</link> <comments>http://www.r-statistics.com/2010/02/nutritional-supplements-efficacy-score-graphing-plots-of-current-studies-results-using-r/#comments</comments> <pubDate>Thu, 25 Feb 2010 21:17:07 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[R]]></category> <category><![CDATA[statistics]]></category> <category><![CDATA[visualization]]></category> <category><![CDATA[allergy research supplements]]></category> 
<category><![CDATA[amino acids supplements]]></category> <category><![CDATA[balloon]]></category> <category><![CDATA[balloon plot]]></category> <category><![CDATA[balloon plot R]]></category> <category><![CDATA[barplot]]></category> <category><![CDATA[benefits supplements]]></category> <category><![CDATA[capsules supplements]]></category> <category><![CDATA[dietary research]]></category> <category><![CDATA[effects supplements]]></category> <category><![CDATA[fibromyalgia research]]></category> <category><![CDATA[glucosamine research]]></category> <category><![CDATA[glucosamine supplements]]></category> <category><![CDATA[google excel]]></category> <category><![CDATA[google spread sheet]]></category> <category><![CDATA[google spreadsheet]]></category> <category><![CDATA[green tea research]]></category> <category><![CDATA[hair loss research]]></category> <category><![CDATA[herbal research]]></category> <category><![CDATA[herbs research]]></category> <category><![CDATA[herbs supplements]]></category> <category><![CDATA[immune system supplements]]></category> <category><![CDATA[liquid research]]></category> <category><![CDATA[liquid supplements]]></category> <category><![CDATA[magnesium research]]></category> <category><![CDATA[mineral research]]></category> <category><![CDATA[minerals research]]></category> <category><![CDATA[natural health supplements]]></category> <category><![CDATA[natural research]]></category> <category><![CDATA[nutritional research]]></category> <category><![CDATA[plot]]></category> <category><![CDATA[pregnancy supplements]]></category> <category><![CDATA[R code]]></category> <category><![CDATA[side effects supplements]]></category> <category><![CDATA[sports nutrition supplements]]></category> <category><![CDATA[supplement research]]></category> <category><![CDATA[supplements body building]]></category> <category><![CDATA[supplements bodybuilding]]></category> <category><![CDATA[supplements dietary]]></category> <category><![CDATA[supplements 
foods]]></category> <category><![CDATA[supplements herbal]]></category> <category><![CDATA[supplements mineral]]></category> <category><![CDATA[supplements minerals]]></category> <category><![CDATA[supplements nutritional]]></category> <category><![CDATA[supplements products]]></category> <category><![CDATA[supplements protein]]></category> <category><![CDATA[supplements research]]></category> <category><![CDATA[take supplements]]></category> <category><![CDATA[taking supplements]]></category> <category><![CDATA[thyroid research]]></category> <category><![CDATA[vitamin b supplements]]></category> <category><![CDATA[vitamin c research]]></category> <category><![CDATA[vitamin c supplements]]></category> <category><![CDATA[vitamin d research]]></category> <category><![CDATA[vitamins discount]]></category> <category><![CDATA[vitamins minerals supplements]]></category> <category><![CDATA[vitamins research]]></category> <category><![CDATA[weight loss research]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=171</guid> <description><![CDATA[In this post I showcase a nice bar-plot and a balloon-plot listing recommended Nutritional supplements, according to how much evidence exists for their benefits. Scroll down to see it (and click here for the data behind it) * * * * The gorgeous blog &#8220;Information Is Beautiful&#8221; recently published an eye candy post showing a “balloon race” image (see a static version of the image here) illustrating how much evidence exists for the benefits of various Nutritional supplements (such as: [...]]]></description> <content:encoded><![CDATA[<p>In this post I showcase a nice <strong>bar-plot and a balloon-plot listing recommended Nutritional supplements</strong>, according to how much evidence exists for their benefits. Scroll down to see it (and <a
href="http://spreadsheets.google.com/ccc?key=0Aqe2P9sYhZ2ndFRKaU1FaWVvOEJiV2NwZ0JHck12X1E&amp;hl=en_GB">click here</a> for the data behind it)<br
/> *  *  *  *<br
/> The gorgeous blog <a
href="http://www.informationisbeautiful.net/">&#8220;Information Is Beautiful&#8221;</a> recently publish an <a
href="http://www.informationisbeautiful.net/play/snake-oil-supplements/">eye candy post</a> showing a “balloon race” image (see a static version of the image <a
href="http://www.informationisbeautiful.net/visualizations/snake-oil-supplements/">here</a>) illustrating how much evidence exists for the benefits of various Nutritional supplements (such as: green tea, vitamins, herbs, pills and so on) . The higher the bubble in the Y axis <del
datetime="2010-03-06T11:34:54+00:00">score (e.g: the bubble size)</del> for the supplement the greater the evidence there is for its effectiveness (But only for the conditions listed along side the supplement).</p><p>There are two reasons this should be of interest to us:</p><ol><li>This shows a fun plot, that R currently doesn&#8217;t know how to do (at least I wasn&#8217;t able to find an implementation for it). So if anyone thinks of an easy way for making one &#8211; please let me know.</li><li>The data for the graph is openly (and freely) provided to all of us on <a
href="http://spreadsheets.google.com/ccc?key=0Aqe2P9sYhZ2ndFRKaU1FaWVvOEJiV2NwZ0JHck12X1E&amp;hl=en_GB">this Google Doc</a>.</li></ol><p>The advantage of having the data on a google doc means that we can see when the data will be updated. But more then that, it means we can easily extract the data into R and have our way with it  (Thanks to <a
href="http://blog.revolution-computing.com/2009/09/how-to-use-a-google-spreadsheet-as-data-in-r.html">David Smith&#8217;s post </a>on the subject)</p><p>For example, I was wondering what are ALL of the top recommended Nutritional supplements, an answer that is not trivial to get from the plot that was in the <a
href="http://www.informationisbeautiful.net/play/snake-oil-supplements/">original post</a>.</p><p>In this post I will supply two plots that present the data: A barplot (that in retrospect didn&#8217;t prove to be good enough) and a balloon-plot for a table (that seems to me to be much better).</p><p><strong>Barplot</strong><br
/> (You can <strong>click the image to enlarge</strong> it)<br
/> <a
href="http://www.r-statistics.com/wp-content/uploads/2010/02/Nutritional-supplements-efficacy.png"><img
class="alignnone size-full wp-image-172" title="Nutritional supplements efficacy" src="http://www.r-statistics.com/wp-content/uploads/2010/02/Nutritional-supplements-efficacy.png" alt="" width="550" /></a></p><p>Here is the R code to produce the barplot of the Nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition):</p><div
class="wp_syntax"><table><tr><td
class="code"><pre class="rsplus" style="font-family:monospace;">&nbsp;
<span style="color: #228B22;"># loading the data</span>
supplements.<span style="">data</span>.0 <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">read.<span style="">csv</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;http://spreadsheets.google.com/pub?key=0Aqe2P9sYhZ2ndFRKaU1FaWVvOEJiV2NwZ0JHck12X1E&amp;output=csv&quot;</span><span style="color: #080;">&#41;</span>
supplements.<span style="">data</span> <span style="color: #080;">&lt;-</span> supplements.<span style="">data</span>.0<span style="color: #080;">&#91;</span>supplements.<span style="">data</span>.0<span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span> <span style="color: #080;">&gt;</span><span style="color: #ff0000;">2</span>,<span style="color: #080;">&#93;</span> <span style="color: #228B22;"># let's only look at &quot;good&quot; supplements</span>
supplements.<span style="">data</span> <span style="color: #080;">&lt;-</span> supplements.<span style="">data</span><span style="color: #080;">&#91;</span><span style="color: #080;">!</span><span style="color: #0000FF; font-weight: bold;">is.<span style="">na</span></span><span style="color: #080;">&#40;</span>supplements.<span style="">data</span><span style="color: #080;">&#91;</span>,<span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span> <span style="color: #228B22;"># and we don't want any missing data</span>
&nbsp;
supplement.<span style="">score</span> <span style="color: #080;">&lt;-</span> supplements.<span style="">data</span><span style="color: #080;">&#91;</span>, <span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span>
ss <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>supplement.<span style="">score</span>, decreasing  <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">F</span><span style="color: #080;">&#41;</span>	<span style="color: #228B22;"># sort our data</span>
supplement.<span style="">score</span> <span style="color: #080;">&lt;-</span> supplement.<span style="">score</span><span style="color: #080;">&#91;</span>ss<span style="color: #080;">&#93;</span>
supplement.<span style="">name</span> <span style="color: #080;">&lt;-</span> supplements.<span style="">data</span><span style="color: #080;">&#91;</span>ss, <span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>
supplement.<span style="">benefits</span> <span style="color: #080;">&lt;-</span> supplements.<span style="">data</span><span style="color: #080;">&#91;</span>ss, <span style="color: #ff0000;">4</span><span style="color: #080;">&#93;</span>
supplement.<span style="">score</span>.<span style="">col</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">factor</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">as.<span style="">character</span></span><span style="color: #080;">&#40;</span>supplement.<span style="">score</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
	<span style="color: #0000FF; font-weight: bold;">levels</span><span style="color: #080;">&#40;</span>supplement.<span style="">score</span>.<span style="">col</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&lt;-</span>  <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;red&quot;</span>, <span style="color: #ff0000;">&quot;orange&quot;</span>, <span style="color: #ff0000;">&quot;blue&quot;</span>, <span style="color: #ff0000;">&quot;dark green&quot;</span><span style="color: #080;">&#41;</span>
	supplement.<span style="">score</span>.<span style="">col</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">as.<span style="">character</span></span><span style="color: #080;">&#40;</span>supplement.<span style="">score</span>.<span style="">col</span><span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># mar: c(bottom, left, top, right) The default is c(5, 4, 4, 2) + 0.1.</span>
<span style="color: #0000FF; font-weight: bold;">par</span><span style="color: #080;">&#40;</span>mar <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">5</span>,<span style="color: #ff0000;">9</span>,<span style="color: #ff0000;">4</span>,<span style="color: #ff0000;">13</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>	<span style="color: #228B22;"># taking care of the plot margins</span>
bar.<span style="">y</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">barplot</span><span style="color: #080;">&#40;</span>supplement.<span style="">score</span>, names.<span style="">arg</span><span style="color: #080;">=</span> supplement.<span style="">name</span>, las <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span>, horiz <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">T</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> supplement.<span style="">score</span>.<span style="">col</span>, xlim <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>,<span style="color: #ff0000;">6.2</span><span style="color: #080;">&#41;</span>,
				main <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Nutritional supplements efficacy score&quot;</span>,<span style="color: #ff0000;">&quot;(by evidence for its effectiveness on the listed condition)&quot;</span>, <span style="color: #ff0000;">&quot;(2010)&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">axis</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">4</span>, <span style="color: #0000FF; font-weight: bold;">labels</span> <span style="color: #080;">=</span> supplement.<span style="">benefits</span>, at <span style="color: #080;">=</span> bar.<span style="">y</span>, las <span style="color: #080;">=</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># Add right axis</span>
<span style="color: #0000FF; font-weight: bold;">abline</span><span style="color: #080;">&#40;</span>h <span style="color: #080;">=</span> bar.<span style="">y</span>, <span style="color: #0000FF; font-weight: bold;">col</span> <span style="color: #080;">=</span> supplement.<span style="">score</span>.<span style="">col</span> , lty <span style="color: #080;">=</span> <span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span> <span style="color: #228B22;"># add guide lines to make each bar easy to follow</span></pre></td></tr></table></div><p>Also, the nice thing is that if the guys at Information Is Beautiful update their data, we can easily re-run the code and see the updated list of recommended supplements.</p><p><strong>Balloon plot</strong><br
/> So after some web surfing I came across an implementation of a balloon plot in R (thanks to <a
href="http://addictedtor.free.fr/graphiques/graphcode.php?graph=60">R graph gallery</a>).<br
/> There were two problems with using the command out of the box. The first was that the colors were uninformative (easily fixed); the second was that the X labels were overlapping one another. Since there is no &#8220;las&#8221; parameter in the function, I just opened the function up, found where the labels were plotted and changed it manually (a bit messy, but that&#8217;s what you have to do sometimes&#8230;)</p><p>Here is the result (you can click the image for a larger version):</p><p><a
href="http://www.r-statistics.com/wp-content/uploads/2010/02/balloonplot.png"><img
src="http://www.r-statistics.com/wp-content/uploads/2010/02/balloonplot.png" alt="" title="balloonplot" width="550" class="alignnone size-full wp-image-199" /></a></p><p>And here is the R code to produce the balloon plot of the Nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition).<br
/> (it&#8217;s just a copy of the function with a tiny edit at line 146, followed by a call to it)</p><p><span
id="more-171"></span></p><div
class="wp_syntax"><table><tr><td
class="code"><pre class="rsplus" style="font-family:monospace;">&nbsp;
<span style="color: #0000FF; font-weight: bold;">require</span><span style="color: #080;">&#40;</span>colorspace<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">require</span><span style="color: #080;">&#40;</span>gplots<span style="color: #080;">&#41;</span>
&nbsp;
<span style="color: #228B22;"># I was able to find the function by using</span>
<span style="color: #228B22;"># methods(balloonplot)[1]</span>
<span style="color: #228B22;"># This command: getAnywhere(&quot;balloonplot.default&quot;) # Wouldn't work...</span>
balloonplot2 <span style="color: #080;">&lt;-</span> gplots<span style="color: #080;">:::</span><span style="">balloonplot</span>.<span style="">default</span> <span style="color: #228B22;"># This one works :)</span>
&nbsp;
<span style="color: #228B22;"># now run:</span>
<span style="color: #0000FF; font-weight: bold;">fix</span><span style="color: #080;">&#40;</span>balloonplot2<span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># search for </span>
<span style="color: #228B22;"># y &lt;- ny + 0.75 + (nlabels.x - i + 0.5) * colmar</span>
<span style="color: #228B22;"># And add beneath it the following line:</span>
<span style="color: #228B22;"># y &lt;- rep(y, dim(xlabs)[1]) - c(0,.5,1)</span>
&nbsp;
supplement.<span style="">benefits</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">tolower</span><span style="color: #080;">&#40;</span>supplement.<span style="">benefits</span> <span style="color: #080;">&#41;</span>
supplement.<span style="">name</span>		<span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">tolower</span><span style="color: #080;">&#40;</span>supplement.<span style="">name</span><span style="color: #080;">&#41;</span>
&nbsp;
balloonplot2<span style="color: #080;">&#40;</span> supplement.<span style="">name</span>,supplement.<span style="">benefits</span>, supplement.<span style="">score</span>, xlab <span style="color: #080;">=</span><span style="color: #ff0000;">&quot;supplement&quot;</span>, ylab<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Benefit&quot;</span>,
			show.<span style="">margins</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>, dotsize <span style="color: #080;">=</span> <span style="color: #ff0000;">15</span>,fun<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span><span style="color: #0000FF; font-weight: bold;">max</span><span style="color: #080;">&#40;</span>x,na.<span style="">rm</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">T</span><span style="color: #080;">&#41;</span>,			
			rowmar <span style="color: #080;">=</span> <span style="color: #ff0000;">7</span>,
			colmar <span style="color: #080;">=</span> <span style="color: #ff0000;">7</span>,
			dotcolor <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">rev</span><span style="color: #080;">&#40;</span>heat_hcl<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">max</span><span style="color: #080;">&#40;</span> supplement.<span style="">score</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#91;</span> supplement.<span style="">score</span><span style="color: #080;">-</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>,
			main <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Balloon plot of&quot;</span>, <span style="color: #ff0000;">&quot;Nutritional supplements efficacy score&quot;</span>,<span style="color: #ff0000;">&quot;(by evidence for its effectiveness on the listed condition)&quot;</span>, <span style="color: #ff0000;">&quot;(2010)&quot;</span><span style="color: #080;">&#41;</span>,
			<span style="color: #0000FF; font-weight: bold;">sub</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Published on www.r-statistics.com&quot;</span><span style="color: #080;">&#41;</span>				
			<span style="color: #080;">&#41;</span></pre></td></tr></table></div><p>Got any good ideas of how else to plot the data? let me know in the comments <img
src='http://www.r-statistics.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p> ]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/02/nutritional-supplements-efficacy-score-graphing-plots-of-current-studies-results-using-r/feed/</wfw:commentRss> <slash:comments>17</slash:comments> </item> <item><title>Fun interpretive dances for common statistical plots</title><link>http://www.r-statistics.com/2010/02/fun-interpretive-dance-for-common-statistical-plots/</link> <comments>http://www.r-statistics.com/2010/02/fun-interpretive-dance-for-common-statistical-plots/#comments</comments> <pubDate>Wed, 24 Feb 2010 11:06:39 +0000</pubDate> <dc:creator>Tal Galili</dc:creator> <category><![CDATA[visualization]]></category> <category><![CDATA[dance]]></category> <category><![CDATA[fun]]></category> <category><![CDATA[interpretive dance]]></category> <category><![CDATA[video]]></category> <category><![CDATA[youtube]]></category><guid
isPermaLink="false">http://www.r-statistics.com/?p=167</guid> <description><![CDATA[My wife is a big lover of dance (especially Dance In Israel), and while reading through the NYtimes article: &#8220;To Impress, Tufts Prospects Turn to YouTube&#8220;, she found me a pearl: A woman performing interpretive dances for math/statistical plots. That includes small dance for: scatter plots, boxplots, barplots and a few others. Enjoy:]]></description> <content:encoded><![CDATA[<p>My wife is a big lover of dance (especially <a
href="http://www.danceinisrael.com/">Dance In Israel</a>), and while reading through the NYtimes article: &#8220;<a
href="http://www.nytimes.com/2010/02/23/education/23tufts.html">To Impress, Tufts Prospects Turn to YouTube</a>&#8220;, she found me a pearl: A woman performing interpretive dances for math/statistical plots. That includes small dance for: scatter plots, boxplots, barplots and a few others. Enjoy:</p><p><object
width="500" height="400"><param
name="movie" value="http://www.youtube.com/v/CNPXUWsMdIo&#038;fs=1"></param><param
name="allowFullScreen" value="true"></param><param
name="allowscriptaccess" value="always"></param><embed
src="http://www.youtube.com/v/CNPXUWsMdIo&#038;fs=1" type="application/x-shockwave-flash" width="500" height="400" allowscriptaccess="always" allowfullscreen="true"></embed></object></p> ]]></content:encoded> <wfw:commentRss>http://www.r-statistics.com/2010/02/fun-interpretive-dance-for-common-statistical-plots/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>