Newsfeed

“When was this stored” vs “when did it happen”

Sunday, 4 Nov 2018

Over on his weblog, Chris Siebenmann talks about metadata in file based websites:

In a file based website engine, any form of metadata that you can’t usefully commit into common version control systems is a mistake.

This is part of a class of mistake I’ve variously made or run into in a number of contexts.

A prominent example that comes to mind is an application I work on which tracks records of some kind, and in one place, users enter “number of X for the thing recorded here” as data points over time. Originally the point in time used for a data point was the time of entry of the data point. At some point I realised that this was a mistake: the point in time at which the number of X was Y is not the same as the point in time at which that fact was recorded in our system. Among other things, treating them the same means you cannot enter “number of X” data points after the fact (such as when newly capturing a record retroactively). One is metadata about a real-world event, the other is metadata about the usage of our application to record that real-world event.

So the generalised principle goes maybe something like this:

Metadata about a storage entity should not be confused with metadata about whatever the payload of that storage entity represents.

These may (obviously) coincide but are nevertheless not the same thing. If an article is stored in a file, the creation date of the file just says when that article was saved into that file – not when the article was written. As long as the “writing an article” and “saving it into a file” actions are strongly bound to each other, these metadata will coincide so that it may seem attractive to treat one as the other. But once you introduce ways for them to diverge – such as adding a VCS to the mix, which creates/removes/modifies files without a corresponding article-writing/editing action – you have a problem.

Doubly detached IRC

Thursday, 8 Feb 2018

For long-running terminal sessions such as IRC, I’ve long used dtach (or Screen, or tmux, it doesn’t matter) to be able to leave them running on a server without having to be connected to it… just like everyone else does. But recently I put together some simple facts in a retrospectively obvious way that I haven’t heard of anyone else doing, and thus achieved a little quality of life improvement.

I wrote before about how much I love Mosh when I need it. And when I had the insight I am writing about, I was in a situation where I seriously needed it. (If you don’t know what Mosh is, read about it first, because ⓐ you’re missing out and ⓑ the rest of this article won’t make much sense otherwise.)

Here’s the thing about Mosh, though: it still has to go through regular SSH in order to bring up a Mosh session. And on a severely packet-lossy connection, that alone can be hell.

The real beauty of Mosh comes to the fore only if you keep the session around once it’s set up. As long as it’s up, then no matter how bad the packet loss, you get decent interactivity. But to fully benefit from that, you have to avoid tearing the session down.

My problem is that try as I might, I have never been able to break with my compulsion to close terminal windows once I am done with them. For IRC that means sooner or later I’ll want to detach from a session I’m not actively chatting in. And because I use Mosh to run dtach on the remote end, detaching from IRC means that the dtach client exits on the remote end… which tears down the Mosh session.

The simple fact that suddenly occurred to me is that I can also use dtach on my end of the connection, in front of Mosh:

dtach -A ~/.dtach/irssi mosh hostname dtach -A ~/.dtach/irssi irssi

Now when I detach, it is only from my local dtach session, not the one on the server. So the Mosh session behind it sticks around – without me having to keep the terminal window open.

The upshot is a dtach ↔ Mosh ↔ dtach sandwich which gives me the full benefits of Mosh.


If you want to use this yourself, note the procedure to end the Mosh session in this case:

dtach -a ~/.dtach/irssi -E # and then press the detach shortcut

The -E switch disables the keyboard shortcut for detaching in the local dtach client. This means when you press the shortcut it gets sent to the remote dtach client. What follows then is exactly the same chain of events as always when you detach from the remote dtach session: the remote dtach client exits, so Mosh exits. And therefore the local dtach session ends as well. Detaching from the remote end thus brings the whole edifice down, and the simplest way to trigger that is to disable detaching on the local end.

Backoff

Friday, 15 Sep 2017 [Tuesday, 19 Sep 2017]

Some time ago I had occasion to implement (mostly exponential) back-off in an application for the first time. This is not a hard problem, but at the outset I expected it to be one of those annoying cases where the code is only clear to read when you are immersed in the problem it solves.

Not so. It turns out there is a trivially simple algorithm if you pick the right form of stored state – namely, a pair of timestamps: (last_success, next_retry). The essential form of the algorithm goes like this:

if succeeded { last_success = now }
next_retry = last_success, i = 0
until next_retry > now {
  next_retry += step_size(i)
  i += 1
}

Because this recalculates the scheduling for the next retry by starting over from the previous success, every time, it is totally resilient against all fluctuations in the environment. Belated or missing retry attempts have no effect on its output. Even swapping the step_size function for an entirely different one mid-flight just works!

At the same time, it is trivial to reason out and verify that this algorithm works correctly.

I was quite pleased.


(In practice you will likely not have a separate step_size function and i counter but rather some kind of step variable iterated along with next_retry. But here I wanted to abstract away from the specific formula used.)

Update: As prompted by a question from Matthew Persico, let me clarify that my use case is scheduling polls that succeed only intermittently, meaning that I always want to wait at least once between attempts, which is why I used “until next_retry > now”.

If instead you want to add backoff to an operation that only fails intermittently (e.g. draining a buffer to I/O) then you’ll want to use “while next_retry < now” for your loop, so you can have zero-delay back-to-back attempts.