Friday, 15 Sep 2017 [Tuesday, 19 Sep 2017]

Some time ago I had occasion to implement (mostly exponential) back-off in an application for the first time. This is not a hard problem, but at the outset I expected it to be one of those annoying cases where the code is only clear to read when you are immersed in the problem it solves.

Not so. It turns out there is a trivially simple algorithm if you pick the right form of stored state – namely, a pair of timestamps: (last_success, next_retry). The essential form of the algorithm goes like this:

if succeeded { last_success = now }
next_retry = last_success, i = 0
until next_retry > now {
  next_retry += step_size(i)
  i += 1
}

Because this recalculates the schedule for the next retry from scratch every time, starting over from the previous success, it is resilient against any fluctuation in the environment. Belated or missing retry attempts have no effect on its output. Even swapping the step_size function for an entirely different one mid-flight just works!
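As a concrete sketch, here is the algorithm as a pure function (the names and the doubling step formula are my own, not anything prescribed by the algorithm):

```python
def next_retry_after(last_success, now, step_size):
    """Recompute the next retry time from scratch: start at the last
    success and add ever-larger steps until we pass `now`."""
    next_retry = last_success
    i = 0
    # "until next_retry > now"
    while not (next_retry > now):
        next_retry += step_size(i)
        i += 1
    return next_retry

# A hypothetical exponential step: 1s, 2s, 4s, ... capped at 60s.
def step_size(i):
    return min(2 ** i, 60)

# With last_success = 0 and now = 5, the schedule walks 1, 3, 7
# and returns 7 -- the first point strictly past `now`.
print(next_retry_after(0, 5, step_size))
```

Note that the function takes `now` as a parameter rather than reading the clock itself, which makes the statelessness obvious: call it as late or as erratically as you like, the answer depends only on the last success and the current time.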

At the same time, it is trivial to reason out and verify that this algorithm works correctly.

I was quite pleased.

(In practice you will likely not have a separate step_size function and i counter but rather some kind of step variable iterated along with next_retry. But here I wanted to abstract away from the specific formula used.)

Update: As prompted by a question from Matthew Persico, let me clarify that my use case is scheduling polls that succeed only intermittently, meaning that I always want to wait at least once between attempts, which is why I used “until next_retry > now”.

If instead you want to add backoff to an operation that only fails intermittently (e.g. draining a buffer to I/O) then you’ll want to use “while next_retry < now” for your loop, so you can have zero-delay back-to-back attempts.