Way back when I was first learning R, I ran across an old listserv post that talked about how the colon (:) operator was the fastest way to generate a sequence. I never really thought about it, but I got in the habit of always using it whenever I needed a sequence.
Anecdotally, I knew from running a few simulations that seq() should be avoided if you’re generating a lot of small sequences repeatedly, but that’s a relatively rare case. Is the colon operator really that much faster than alternatives such as seq_len() in the general case?
Turns out the answer is “yes,” most of the time. Running a simple microbenchmark script, I timed the generation of sequences from $latex 10^1$ to $latex 10^9$ numbers long with each of the four functions. Then I plotted the mean timings on a log-log plot, with bars representing the $latex 2.5$th and $latex 97.5$th percentiles.
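For a rough idea of what such a benchmark might look like, here is a minimal sketch. It is not the original script: the helper name bench_seq, the number of repetitions, and the reduced range of sizes are all assumptions made for illustration, and it only covers the three functions named in the text (:, seq(), and seq_len()). It assumes the microbenchmark package is installed.

```r
library(microbenchmark)  # assumed to be installed; not part of base R

# Hypothetical helper: time three ways of building the sequence 1..n and
# summarise each one by its mean and its 2.5th/97.5th percentiles.
bench_seq <- function(n, times = 100L) {
  res <- microbenchmark(
    colon   = 1:n,
    seq     = seq(1, n),
    seq_len = seq_len(n),
    times   = times
  )
  # res$time holds one timing per run, in nanoseconds
  stats <- function(x) c(mean = mean(x), quantile(x, c(0.025, 0.975)))
  # One row per expression, one column per statistic
  t(sapply(split(res$time, res$expr), stats))
}

# A much smaller range than the post's 10^1 to 10^9, to keep the run short
sizes <- 10^(1:4)
results <- lapply(sizes, bench_seq)

results[[1]]  # summary for sequences of length 10
```

Each element of results is a small matrix with one row per function, which is the shape you would want before plotting means with percentile bars. Absolute timings will vary with the machine and the R version, so treat any numbers it produces as indicative only.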
If you’re generating large sequences, it really doesn’t seem to matter which function you use, but for the common cases (e.g., slicing a vector or enumerating a loop), the colon operator outperforms the others. I’m not really sure there’s a lesson here except to trust R listserv posts and use : as often as possible. Code here.