Statistics with R, and open source stuff (software, data, community)

labels.dendrogram in R 3.2.2 can be ~70 times faster (for trees with 1000 labels)

The recent release of R 3.2.2 came with a small but highly valuable improvement to the stats:::labels.dendrogram function. For a dendrogram with (say) 1000 labels, the new function is roughly 70 times faster than the version from R 3.2.1. It is even faster than the Rcpp version of labels.dendrogram from the dendextendRcpp package.

Here is some R code to demonstrate this speed improvement:

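# If you are missing any of these packages, install them first:
install.packages("dendextend")
install.packages("dendextendRcpp")
install.packages("microbenchmark")

# Getting labels from dendextendRcpp
labelsRcpp <- ...  # right-hand side not preserved

# (the definitions of labels_3.2.1 and labels_3.2.2, and the input data, are not preserved)
dend <- ... %>% dist %>% hclust %>% as.dendrogram
labels(dend)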
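For readers who want to experiment without installing the extra packages, here is a minimal sketch that uses base R only. The random input, the "obs" label names and the helper collect_labels() are illustrative assumptions, not code from this post, and the helper is not the R 3.2.1 implementation. It only shows where a dendrogram keeps its labels (on the leaves, in the "label" attribute) and gives a rough timing against the built-in labels().

```r
# Minimal sketch, base R only. The random input, the "obs" label names and the
# helper collect_labels() are illustrative assumptions, not code from the post.
set.seed(42)
x <- rnorm(1000)
names(x) <- paste0("obs", seq_along(x))

# Build a dendrogram with 1000 labels (nested calls instead of the %>% pipe).
dend <- as.dendrogram(hclust(dist(x)))

# Hand-written recursive walk: each leaf stores its label in the "label" attribute.
collect_labels <- function(node) {
  if (is.leaf(node)) {
    return(attr(node, "label"))
  }
  unlist(lapply(seq_along(node), function(i) collect_labels(node[[i]])))
}

# Both approaches should give the same labels in the same left-to-right order.
stopifnot(identical(as.character(collect_labels(dend)),
                    as.character(labels(dend))))

# Rough timing comparison; microbenchmark() gives more reliable numbers.
system.time(for (i in 1:20) labels(dend))
system.time(for (i in 1:20) collect_labels(dend))
```

The absolute timings depend on the machine and the R version, so they are not expected to match the numbers reported in this post.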

And here are the results:

> microbenchmark(labels_3.2.1(dend), labels_3.2.2(dend), labelsRcpp(dend))
Unit: milliseconds
               expr        min         lq     median         uq       max neval
 labels_3.2.1(dend) 186.522968 189.395378 195.684164 208.328365 321.98368   100
 labels_3.2.2(dend)   2.604766   2.826776   2.891728   3.006792  21.24127   100
   labelsRcpp(dend)   3.825401   3.946904   3.999817   4.179552  11.22088   100

> microbenchmark(labels_3.2.2(dend), order.dendrogram(dend))
Unit: microseconds
                   expr      min        lq   median        uq      max neval
     labels_3.2.2(dend) 2520.218 2596.0880 2678.677 2885.2890 9572.460   100
 order.dendrogram(dend)  665.191  712.2235  954.951  996.1055 2268.812   100
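The headline ratios can be read off the median column. A quick check in R, with the values copied from the output above (the variable names are only for illustration):

```r
# Median timings copied from the first benchmark (milliseconds).
median_3.2.1 <- 195.684164
median_3.2.2 <- 2.891728
median_rcpp  <- 3.999817

round(median_3.2.1 / median_3.2.2, 1)  # old vs. new labels(): 67.7
round(median_rcpp / median_3.2.2, 1)   # Rcpp version vs. new labels(): 1.4

# Median timings from the second benchmark (microseconds):
# labels() vs. order.dendrogram()
round(2678.677 / 954.951, 1)           # 2.8
```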

As we can see, the new labels function (in R 3.2.2) is about 70 times faster than the older version (from R 3.2.1): a median of roughly 2.9 ms versus 195.7 ms. If all you need is something like the number of labels, calling length on the result of order.dendrogram is still about 3 times faster than calling labels.
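A minimal sketch of the two ways to count labels, using the built-in USArrests data purely as an example input (it is not the data used in the benchmark):

```r
# Minimal sketch; USArrests is only an example input with 50 rows.
dend <- as.dendrogram(hclust(dist(USArrests)))

# order.dendrogram() returns the integer leaf order; no labels are collected.
n_from_order <- length(order.dendrogram(dend))

# labels() gathers every leaf label before we count them.
n_from_labels <- length(labels(dend))

stopifnot(n_from_order == n_from_labels)
n_from_order  # 50
```

Both calls give the same count, so the cheaper one is enough whenever the label text itself is not needed.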