R-statistics blog

The difference between "letters[c(1,NA)]" and "letters[c(NA,NA)]"

In David Smith’s latest blog post (which, in a sense, is a continued response to the latest public attack on R), there was a comment by Barry that caught my eye. Barry wrote:

Even I get caught out on R quirks after 20 years of using it. Compare letters[c(12,NA)] and letters[c(NA,NA)] for the most recent thing that made me bang my head against the wall.

So I did, and here’s the output:

> letters[c(12,NA)]
[1] "l" NA
> letters[c(NA,NA)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>

Interesting, isn’t it?
I had no clue why this happens, but luckily for us, Barry followed up with an explanation. Here is what he wrote:

My example with ‘letters’ comes from a collision of three features:

  1. recycling of short subscripts
  2. silent coercion of types (boolean NA to numeric NA)
  3. and the existence of five different NA values that all print the same.

[…] to really understand that letters[c(1,NA)] is different from letters[c(NA,NA)] you have to see that:

To make sure I understood Barry correctly, I tried the following code:

> letters[c(T,NA)]
 [1] "a" NA  "c" NA  "e" NA  "g" NA  "i" NA  "k" NA  "m" NA  "o" NA  "q" NA  "s" NA  "u" NA  "w" NA  "y" NA
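
The three features Barry lists can each be checked on their own. What follows is a minimal sketch rather than Barry's own code; the variable names idx_numeric and idx_logical are made up for illustration.

```r
# Feature 2: c() silently coerces the logical NA to match its numeric neighbour,
# whereas two bare NAs stay logical.
idx_numeric <- c(12, NA)
idx_logical <- c(NA, NA)
class(idx_numeric)   # "numeric"
class(idx_logical)   # "logical"

# Feature 3: NA comes in several types, all of which print as NA.
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"
typeof(NA_complex_)    # "complex"

# Feature 1: a logical subscript shorter than the vector is recycled,
# so two logical NAs are stretched to cover all 26 letters.
length(letters[idx_numeric])  # 2
length(letters[idx_logical])  # 26
```

A numeric subscript selects by position, so it returns exactly as many elements as it has entries. A logical subscript acts as a mask over the whole vector, which is why it gets recycled.
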
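The two-element logical subscript is recycled across all 26 letters, so every odd position is kept and every even position comes back as NA.
Barry gave the letters example to illustrate how R violates the Zen of Python idea that “Simple is better than complex”, since (so he claims) subscript recycling shoots you in the foot.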
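
If the goal is to treat an all-NA subscript as positions rather than as a mask, one possible workaround (my own sketch, not something Barry suggested) is to make the NA type explicit. The names by_position and idx are hypothetical.

```r
# NA_integer_ is an integer-typed missing value, so the subscript is
# positional and nothing is recycled.
by_position <- letters[c(NA_integer_, NA_integer_)]
by_position          # NA NA
length(by_position)  # 2

# Coercing an existing all-NA logical index has the same effect.
idx <- c(NA, NA)
length(letters[as.integer(idx)])  # 2
```

This does not remove the quirk. It only makes the intended kind of subscript explicit at the point of use.
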
To follow up on that discussion, head over to Barry’s comment on the REvolution blog.