Nutritional supplements efficacy score – Graphing plots of current studies results (using R)

In this post I showcase a bar-plot and a balloon-plot of recommended nutritional supplements, ranked by how much evidence exists for their benefits. Scroll down to see them (and click here for the data behind them).
* * * *
The gorgeous blog “Information Is Beautiful” recently published an eye-candy post showing a “balloon race” image (see a static version of the image here) illustrating how much evidence exists for the benefits of various nutritional supplements (such as green tea, vitamins, herbs, pills and so on). The higher a supplement’s bubble sits on the Y axis, the greater the evidence for its effectiveness (but only for the conditions listed alongside the supplement).

There are two reasons this should be of interest to us:

  1. This shows a fun plot that R currently doesn’t know how to do (at least I wasn’t able to find an implementation for it). So if anyone thinks of an easy way of making one – please let me know.
  2. The data for the graph is openly (and freely) provided to all of us on this Google Doc.

The advantage of having the data on a Google Doc is that we can see when the data is updated. But more than that, it means we can easily extract the data into R and have our way with it (thanks to David Smith’s post on the subject).

For example, I was wondering what ALL of the top recommended nutritional supplements are, an answer that is not trivial to get from the plot in the original post.
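As a rough illustration of that question, here is a minimal sketch of the filter-and-sort step on a toy table. The column names (`supplement`, `score`, `condition`) and every value in it are made up for the example and are not taken from the spreadsheet.

```r
# Toy stand-in for the spreadsheet; names, scores and conditions are made up.
toy <- data.frame(
  supplement = c("alpha", "beta", "gamma", "delta", "epsilon"),
  score      = c(5, 2, NA, 6, 3),
  condition  = c("cond A", "cond B", "cond C", "cond D", "cond E"),
  stringsAsFactors = FALSE
)

# Keep rows with a known score above 2; which() drops the NA comparison.
good <- toy[which(toy$score > 2), ]

# Strongest evidence first.
top <- good[order(good$score, decreasing = TRUE), ]
print(top)
```

Using `which()` handles the missing score in the same step as the filter, so no separate `is.na()` pass is needed.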

In this post I will supply two plots that present the data: A barplot (that in retrospect didn’t prove to be good enough) and a balloon-plot for a table (that seems to me to be much better).

Barplot

Here is the R code to produce the barplot of nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition):


# loading the data
supplements.data.0 <- read.csv("http://spreadsheets.google.com/pub?key=0Aqe2P9sYhZ2ndFRKaU1FaWVvOEJiV2NwZ0JHck12X1E&output=csv")
supplements.data <- supplements.data.0[supplements.data.0[, 2] > 2, ] # let's only look at "good" supplements
supplements.data <- supplements.data[!is.na(supplements.data[, 2]), ] # and we don't want any missing data

supplement.score <- supplements.data[, 2]
ss <- order(supplement.score, decreasing = FALSE) # sort our data
supplement.score <- supplement.score[ss]
supplement.name <- supplements.data[ss, 1]
supplement.benefits <- supplements.data[ss, 4]
# map each distinct score to a colour (this assumes exactly four distinct scores remain)
supplement.score.col <- factor(as.character(supplement.score))
levels(supplement.score.col) <- c("red", "orange", "blue", "dark green")
supplement.score.col <- as.character(supplement.score.col)

# mar: c(bottom, left, top, right) The default is c(5, 4, 4, 2) + 0.1.
par(mar = c(5, 9, 4, 13)) # taking care of the plot margins
bar.y <- barplot(supplement.score, names.arg = supplement.name, las = 1, horiz = TRUE, col = supplement.score.col, xlim = c(0, 6.2),
                 main = c("Nutritional supplements efficacy score", "(by evidence for its effectiveness on the listed condition)", "(2010)"))
axis(4, labels = supplement.benefits, at = bar.y, las = 1) # add right axis
abline(h = bar.y, col = supplement.score.col, lty = 2) # add some lines to easily follow each bar

Also, the nice thing is that if the guys at Information Is Beautiful update their data, we can simply rerun the code and see the updated list of recommended supplements.
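One caveat when rerunning on updated data: the barplot code assigns colours by relabelling factor levels, which relies on exactly four distinct scores being present. If a score disappears or a new one shows up, the colours could land on the wrong scores or the assignment could fail. A minimal sketch of a more defensive mapping follows; the helper name `score.to.col`, the assumed score range of 3 to 6 and the grey fallback are all illustrative choices, not part of the original code.

```r
# Lookup from score to colour. The range 3..6 is an assumption based on
# the "score > 2" filter and the four colours used in the barplot.
score.palette <- c("3" = "red", "4" = "orange", "5" = "blue", "6" = "dark green")

# Hypothetical helper: look each score up by name, with a fallback colour.
score.to.col <- function(score) {
  cols <- unname(score.palette[as.character(score)])
  cols[is.na(cols)] <- "grey" # scores outside the lookup
  cols
}

score.to.col(c(3, 3, 6, 4))
```

The result could be passed as the `col` argument of `barplot()` and `abline()` in place of `supplement.score.col`.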

Balloon plot
So after some web surfing I came across an implementation of a balloon plot in R (thanks to the R graph gallery).
There were two problems with using the command out of the box. The first was that the colors were uninformative (easily fixed); the second was that the X labels were overlapping one another. Since there is no "las" parameter in the function, I just opened the function up, found where the labels were plotted and changed it manually (a bit messy, but that's what you have to do sometimes...)

Here is the result:

And here is the R code to produce the balloon plot of nutritional supplements efficacy score (by evidence for its effectiveness on the listed condition).
(It's just a copy of the function with a tiny bit of editing at line 146, followed by a call to it.)


library(colorspace) # for heat_hcl()
library(gplots)     # for balloonplot()

# I was able to find the function by using
# methods(balloonplot)[1]
# This command: getAnywhere("balloonplot.default") # Wouldn't work...
balloonplot2 <- gplots:::balloonplot.default # This one works :)

# now run (this opens the function in an interactive editor):
fix(balloonplot2)
# search for
# y <- ny + 0.75 + (nlabels.x - i + 0.5) * colmar
# And add beneath it the following line:
# y <- rep(y, dim(xlabs)[1]) - c(0,.5,1)

supplement.benefits <- tolower(supplement.benefits)
supplement.name <- tolower(supplement.name)

balloonplot2(supplement.name, supplement.benefits, supplement.score, xlab = "supplement", ylab = "Benefit",
             show.margins = FALSE, dotsize = 15, fun = function(x) max(x, na.rm = TRUE),
             rowmar = 7,
             colmar = 7,
             dotcolor = rev(heat_hcl(max(supplement.score)))[supplement.score - 1],
             main = c("Balloon plot of", "Nutritional supplements efficacy score", "(by evidence for its effectiveness on the listed condition)", "(2010)"),
             sub = "Published on www.r-statistics.com"
)
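For readers who would rather not patch a package function by hand, a balloon-style grid can also be approximated with base graphics, where `axis()` does accept `las`. This is only a sketch under assumptions: the rows are made up, and sizing the dots with `cex = score` is an illustrative choice rather than what `balloonplot()` does internally.

```r
# Made-up rows for illustration; not taken from the spreadsheet.
name    <- c("alpha", "beta", "gamma", "beta")
benefit <- c("cond A", "cond B", "cond A", "cond C")
score   <- c(6, 4, 3, 5)

# Map each label to an integer grid position.
name.f    <- factor(name)
benefit.f <- factor(benefit)
x <- as.integer(name.f)
y <- as.integer(benefit.f)

# Empty canvas, then one dot per (supplement, benefit) pair.
plot(x, y, type = "n", axes = FALSE, xlab = "", ylab = "",
     xlim = c(0.5, max(x) + 0.5), ylim = c(0.5, max(y) + 0.5))
points(x, y, pch = 19, cex = score, col = "orange") # dot size follows the score

# las = 2 turns the x labels perpendicular to the axis, so they do not overlap.
axis(1, at = seq_along(levels(name.f)), labels = levels(name.f), las = 2)
axis(2, at = seq_along(levels(benefit.f)), labels = levels(benefit.f), las = 1)
box()
```

With long labels, the bottom and left margins would likely need widening through `par(mar = ...)`, as in the barplot code.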

Got any good ideas for how else to plot the data? Let me know in the comments 🙂
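One possible starting point for a static “balloon race” style plot is base R's `symbols()` combined with `text()` to put a label inside each bubble. The sketch below uses made-up values, and the columns `evidence` and `popularity` are assumed names chosen for illustration. It does not attempt the interactivity of the original graphic.

```r
# Made-up values for illustration only; not taken from the spreadsheet.
bubbles <- data.frame(
  name       = c("alpha", "beta", "gamma", "delta"),
  evidence   = c(6, 4, 3, 5),          # vertical position
  popularity = c(900, 400, 100, 250),  # drives the bubble area
  stringsAsFactors = FALSE
)

# Radius proportional to sqrt(popularity), so that area tracks popularity.
bubbles$radius <- sqrt(bubbles$popularity / pi)
x.pos <- seq_len(nrow(bubbles)) # spread the bubbles along the x axis

symbols(x.pos, bubbles$evidence, circles = bubbles$radius,
        inches = 0.5, # rescale so the largest bubble stays a sensible size
        bg = "lightblue", fg = "grey40",
        xlim = c(0, nrow(bubbles) + 1), ylim = c(0, 7),
        xlab = "Supplement (arbitrary order)", ylab = "Evidence score")

# Write each name at the centre of its bubble.
text(x.pos, bubbles$evidence, labels = bubbles$name, cex = 0.8)
```

Scaling the radius by the square root keeps the visual area, rather than the diameter, in proportion to the value being shown.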

19 thoughts on “Nutritional supplements efficacy score – Graphing plots of current studies results (using R)”

  1. The charts and graphs in the book Information is Beautiful are awesome, but is the R code available anywhere? Thanks, Samuel

    1. Hi Samuel,
      Thank you for the comment.
      The R code inside my post is for the barplot. I don’t think there exists R code for doing the “Racing balloons” plot (although if someone were to create such a function – I’d be happy to know about it).

  2. Couldn’t you do it in R using a vector for the cex argument of plot?

    For example:
    x <- rnorm(10)
    y <- rnorm(10)
    plot(x,y,cex=seq(1,1.9,0.1))

    Then you need to make the cex values vector meaningful based on the number of google hits, play with the axes etc and Bob's your uncle. Maybe.

    1. Hi Ron,
      Thanks for the suggestions.

      My issue with trying to do the bubble plot in R is actually in how to combine what you showed with many “text” calls, so as to also include the relevant text in the bubbles. (I am not even talking about interaction.)
      I agree that something similar (to the static image) can be reproduced using the R graphic system, but I fear the hurdles are many…

  3. Second point, I think you have misinterpreted the original graphic. You say “The higher the score (e.g: the bubble size) for the supplement the greater the evidence there is for its effectiveness”.

    My reading of their graphic and text is that evidence is the y axis and google popularity is the bubble size parameter.

    Ron.

    1. Hi J S –

      Selenium got both 6 and 3 because it has different levels of evidence for different aspects of helping patients with cancer.

      The Google spreadsheet indicates that Selenium is “3” for “cancer” but “6” for chemotherapy. I didn’t go into the details of the difference, but you can review the source for the info here: http://www.ncbi.nlm.nih.gov/pubmed/9290116

      For grade 6, the source abstracts are:
      Russo, M. W.; Murray, S. C.; Wurzelmann, J. I.; Woosley, J. T.; Sandler R. S. (1997). “Plasma selenium levels and the risk of colorectal adenomas”. Nutr Cancer. 28 (2): 125–129. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=%209290116
      Patterson, B. H.; Levander O. A. (1997). “Naturally occurring selenium compounds in cancer chemoprevention trials: a workshop summary”. Cancer Epidemiol Biomarkers Prev. 6 (1): 63–69. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=8993799
      Is low selenium status a risk factor for lung canc… [Am J Epidemiol. 1998]
      Dietary selenium repletion may reduce cancer incid… [Nutr Rev. 1997]
      The genotoxicity of selenium. [Mutat Res. 1985]
      Blood serum selenium in the province of Mérida, Ve… [J Trace Elem Electrolytes Health Dis. 1990]

      While for grade 3, the source abstract is:
      Russo MW, Murray SC, Wurzelmann JI, Woosley JT, Sandler RS (1997). “Plasma selenium levels and the risk of colorectal adenomas”. Nutrition and Cancer 28 (2): 125–9. doi:10.1080/01635589709514563. ISSN 0163-5581. PMID 9290116

      If after reading you come up with a deeper understanding, please feel welcome to share.

      Best,
      Tal

  4. Great work, and a nice complement to the informationisbeautiful.com infographic. One little quibble is that you used “gyno” as the general category for folic acid. I think it would make more sense to say obstetric, since folic acid affects fetal development, not general reproductive health. I think the gyno tag would work better for supplements that affect PMS/menstrual cramps/etc.

    1. Hi Kate,
      I am glad you came by, and that you found my post of use 🙂
      Regarding your comment, it sounds worthwhile, and I could make the change in the graph I made – BUT – since my graph is based on the data from informationisbeautiful, wouldn’t it be better to offer this to them, so that their table will include the correction? (I already noticed they respond to comments from the audience; I think it would be worth your while to jump over to their post and leave your comment there.)

      All the best,
      Tal
