Way back when I was first learning R, I ran across an old listserv post that talked about how the colon (:) operator was the fastest way to generate a sequence. I never gave it much thought, but I got in the habit of using it whenever I needed a sequence.
Anecdotally, I knew from running a few simulations that seq() should be avoided if you’re generating a lot of small sequences repeatedly, but that’s a relatively rare case. Is the colon operator really that much faster than the alternatives (seq(), seq.int(), or seq_len()) in general cases?
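For reference, here is a minimal sketch of the four approaches side by side. The variable names are mine, chosen for illustration. The comparison uses all.equal() rather than identical() because the documentation for seq() says not to rely on whether the result is stored as integer or double.

```r
n <- 10L

# Four ways to build the sequence 1, 2, ..., n
via_colon   <- 1:n
via_seq     <- seq(1, n)
via_seq_int <- seq.int(1, n)
via_seq_len <- seq_len(n)

# All four hold the same values (storage type may differ, hence all.equal)
same_values <- c(
  seq     = isTRUE(all.equal(via_colon, via_seq)),
  seq.int = isTRUE(all.equal(via_colon, via_seq_int)),
  seq_len = isTRUE(all.equal(via_colon, via_seq_len))
)
print(same_values)
```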
Turns out the answer is “yes”, most of the time. Using a simple microbenchmark script, I timed the generation of sequences of length $10^1$ through $10^9$ with each of the four approaches. Then I plotted the mean time on a log-log plot, with bars spanning the 2.5th and 97.5th percentiles.
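A benchmark along these lines might look like the sketch below. It is not the original script: it assumes the microbenchmark package is installed, stops at $10^5$ so that it finishes quickly, and the names sizes and results, the column names, and the choice of 100 repetitions are all mine.

```r
# Assumes the microbenchmark package is available:
# install.packages("microbenchmark")
library(microbenchmark)

# Sequence lengths to test (kept small here; raise the exponent to go further)
sizes <- 10^(1:5)

results <- do.call(rbind, lapply(sizes, function(n) {
  mb <- microbenchmark(
    colon   = 1:n,
    seq     = seq(1, n),
    seq.int = seq.int(1, n),
    seq_len = seq_len(n),
    times   = 100L
  )
  # mb$time holds one timing per run, in nanoseconds
  data.frame(
    n       = n,
    expr    = levels(mb$expr),
    mean_ns = as.numeric(tapply(mb$time, mb$expr, mean)),
    lo_ns   = as.numeric(tapply(mb$time, mb$expr, quantile, probs = 0.025)),
    hi_ns   = as.numeric(tapply(mb$time, mb$expr, quantile, probs = 0.975)),
    row.names = NULL
  )
}))

print(results)
```

Each row of results holds the mean and the 2.5th and 97.5th percentile timings for one approach at one sequence length, which is the shape needed to draw the means with percentile bars on log-log axes.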
If you’re generating large sequences, it really doesn’t seem to matter which function you use, but for the common cases (e.g., slicing a vector or enumerating a loop), the colon operator outperforms the others. I’m not really sure there’s a lesson here except to trust R listserv posts and use : as often as possible. Code here.