Meanwhile, elsewhere…

  • “POSIX hardlink heartache”

    Michael Orlitzky:

    It follows that, on POSIX systems without any non-standard protections, it’s unsafe for anyone (but in particular, root) to do anything sensitive in a directory that is writable by another user. Cross-platform programs designed to do so are simply flawed.

  • “Learning to Read, Again”

    David Kolb:

    I picked my own case because the focus on words and print makes various stages of learning to navigate a media world easily identifiable. […]

    When I moved to Bates College in Maine in the late ’70s, the computer revolution was just beginning. I rented a standalone word processing device the size of a small refrigerator from Digital Equipment Corporation and used it to begin a book. A few years later I bought my first computer. The fledgling Internet was gradually coming up and everything I knew about reading, research, and paper was about to be challenged.

    At first I didn’t notice how much was changing[.]

    His summary of the developments is succinct:

    The media have become predatory, grasping; you’re not in control. […]

    Recall the old advertising tactic: stimulate people into a constant state of low level unfulfilled sexual excitement, and into that gap you can pour an infinity of products. That still continues, but now add another tactic: stimulate people into a constant state of unsatisfied anger and resentment, and out of that gap you can pull an infinity of votes and cash contributions. So we get exposés, fake news, endless conspiracy theories, all with urgent appeals.

    And, as the salesmen say, “but wait, there’s more.” Aside from all those predatory grasping Internet manipulators there is something diffuse and deceptive going on, a general hyping and intensifying of our everyday encounters.

    His thoughts on what the answer must be are that wall-building isolation cannot possibly work. And so he proposes something else.

  • A proposed adequate definition of a data breach

    Troy Hunt:

    A data breach occurs when information is obtained by an unauthorised party in a fashion in which it was not intended to be made available.

    Eminently sensible; the definition shouldn’t hinge on technicalities about how the data got away. (I wonder if that means we should be using a different word than “breach”?)

  • Stop The Humour

    Jeff Johnson:

    To be more specific, NSURL is based on an obsolete RFC. (Note: RFC is an acronym for Read the F-ing Commandment, while NS is an acronym for No Swift.)

  • Early Signs

    Jamie Zawinski:

    Today is day 10,000 of The September That Never Ended.

    The Internet: Mistakes Were Made.™

rename 1.601

Friday, 30 Aug 2019

Over 6 years have gone by since I cut the last release of rename, because no new features have been needed in that time. Unfortunately, a number of small bugfixes and documentation additions have therefore sat unreleased in the repository for years. A bug report about one of those long-fixed issues is what finally made me notice. I am hereby rectifying this situation:

Share and enjoy.

“When was this stored” vs “when did it happen”

Sunday, 4 Nov 2018

Over on his weblog, Chris Siebenmann talks about metadata in file based websites:

In a file based website engine, any form of metadata that you can’t usefully commit into common version control systems is a mistake.

This is part of a class of mistake I’ve variously made or run into in a number of contexts.

A prominent example that comes to mind is an application I work on which tracks records of some kind, and in one place, users enter “number of X for the thing recorded here” as data points over time. Originally the point in time used for a data point was the time of entry of the data point. At some point I realised that this was a mistake: the point in time at which the number of X was Y is not the same as the point in time at which that fact was recorded in our system. Among other things, treating them the same means you cannot enter “number of X” data points after the fact (such as when newly capturing a record retroactively). One is metadata about a real-world event, the other is metadata about the usage of our application to record that real-world event.

So the generalised principle goes maybe something like this:

Metadata about a storage entity should not be confused with metadata about whatever the payload of that storage entity represents.

These may (obviously) coincide but are nevertheless not the same thing. If an article is stored in a file, the creation date of the file just says when that article was saved into that file – not when the article was written. As long as the “writing an article” and “saving it into a file” actions are strongly bound to each other, these metadata will coincide so that it may seem attractive to treat one as the other. But once you introduce ways for them to diverge – such as adding a VCS to the mix, which creates/removes/modifies files without a corresponding article-writing/editing action – you have a problem.
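The distinction can be kept explicit in the record type itself; a minimal sketch (the class and field names here are my own, not from any particular application):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DataPoint:
    """One 'number of X' entry, with the two kinds of metadata kept apart."""
    value: float
    occurred_at: datetime  # when the real-world quantity had this value
    recorded_at: datetime  # when this fact was entered into the system

# Retroactive capture: the event happened long before it was recorded.
point = DataPoint(
    value=42.0,
    occurred_at=datetime(2018, 1, 15, tzinfo=timezone.utc),
    recorded_at=datetime.now(timezone.utc),
)
```

With the two stored separately, entering data points after the fact poses no problem, and “sort by when it happened” and “sort by when we learned of it” become distinct, answerable queries.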

Doubly detached IRC

Thursday, 8 Feb 2018

For long-running terminal sessions such as IRC, I’ve long used dtach (or Screen, or tmux, it doesn’t matter) to be able to leave them running on a server without having to be connected to it… just like everyone else does. But recently I put together some simple facts in a retrospectively obvious way that I haven’t heard of anyone else doing, and thus achieved a little quality of life improvement.

I’ve written before about my love for Mosh during times of need. And when I had the insight I am writing about, I was in a situation where I seriously needed it. (If you don’t know what Mosh is, read about it first, because ⓐ you’re missing out and ⓑ the rest of this article won’t make much sense otherwise.)

Here’s the thing about Mosh, though: it still has to go through regular SSH in order to bring up a Mosh session. And on a severely packet-lossy connection, that alone can be hell.

The real beauty of Mosh comes to the fore only if you keep the session around once it’s set up. As long as it’s up, then no matter how bad the packet loss, you get decent interactivity. But to fully benefit from that, you have to avoid tearing the session down.

My problem is that try as I might, I have never been able to break with my compulsion to close terminal windows once I am done with them. For IRC that means sooner or later I’ll want to detach from a session I’m not actively chatting in. And because I use Mosh to run dtach on the remote end, detaching from IRC means that the dtach client exits on the remote end… which tears down the Mosh session.

The simple fact that suddenly occurred to me is that I can also use dtach on my end of the connection, in front of Mosh:

dtach -A ~/.dtach/irssi mosh hostname dtach -A ~/.dtach/irssi irssi

Now when I detach, it is only from my local dtach session, not the one on the server. So the Mosh session behind it sticks around – without me having to keep the terminal window open.

The upshot is a dtach ↔ Mosh ↔ dtach sandwich which gives me the full benefits of Mosh.

Should you want to use this yourself, you will need the last piece of the puzzle, namely how to bring down the Mosh session while keeping the IRC session around. To do that you have to detach on the remote end, and the simplest way of doing that is this:

dtach -a ~/.dtach/irssi -E # and then press the detach shortcut

The -E switch disables the keyboard shortcut for detaching in the local dtach client. This means when you press the shortcut it gets sent to the remote dtach client.

(What follows then is exactly the same chain of events as always when you detach from the remote dtach session: the remote dtach client exits, so the remote Mosh session ends, so the local Mosh client exits – and therefore the local dtach session ends as well. Detaching from the remote end thus brings the whole edifice down.)


Friday, 15 Sep 2017 [Tuesday, 19 Sep 2017]

Some time ago I had occasion to implement (mostly exponential) back-off in an application for the first time. This is not a hard problem, but at the outset I expected it to be one of those annoying cases where the code is only clear to read when you are immersed in the problem it solves.

Not so. It turns out there is a trivially simple algorithm if you pick the right form of stored state – namely, a pair of timestamps: (last_success, next_retry). The essential form of the algorithm goes like this:

if succeeded { last_success = now }
next_retry = last_success, i = 0
until next_retry > now {
  next_retry += step_size(i)
  i += 1

Because this recalculates the scheduling for the next retry by starting over from the previous success, every time, it is totally resilient against all fluctuations in the environment. Belated or missing retry attempts have no effect on its output. Even swapping the step_size function for an entirely different one mid-flight just works!

At the same time, it is trivial to reason out and verify that this algorithm works correctly.

I was quite pleased.

(In practice you will likely not have a separate step_size function and i counter but rather some kind of step variable iterated along with next_retry. But here I wanted to abstract away from the specific formula used.)
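A direct transcription into Python, to make the sketch concrete (the exponential `step_size` formula and its cap are stand-ins; any schedule works):

```python
def step_size(i, base=1.0, cap=300.0):
    # Stand-in exponential schedule: 1s, 2s, 4s, ... capped at 5 minutes.
    return min(base * 2 ** i, cap)

def next_retry_after(last_success, now, step=step_size):
    """Recompute next_retry from scratch, starting over from last_success.

    Because nothing but last_success and now feed into the result,
    belated or missing attempts cannot perturb the schedule, and
    swapping in a different step function mid-flight just works.
    """
    next_retry = last_success
    i = 0
    while not next_retry > now:  # the "until" loop from the pseudocode
        next_retry += step(i)
        i += 1
    return next_retry
```

With `base=1`, the scheduled points after a success at t=0 fall at 1, 3, 7, 15, …; a call at t=10 yields 15 no matter how many attempts were skipped in between.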

Update: As prompted by a question from Matthew Persico, let me clarify that my use case is scheduling polls that succeed only intermittently, meaning that I always want to wait at least once between attempts, which is why I used “until next_retry > now”.

If instead you want to add backoff to an operation that only fails intermittently (e.g. draining a buffer to I/O) then you’ll want to use “while next_retry < now” for your loop, so you can have zero-delay back-to-back attempts.
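The change is just the loop guard; sketched with a toy exponential schedule (the function name and formula are mine):

```python
def next_attempt(last_success, now, step=lambda i: min(2.0 ** i, 300.0)):
    # "while next_retry < now": a call made right at last_success falls
    # through the loop, allowing zero-delay back-to-back attempts.
    next_retry = last_success
    i = 0
    while next_retry < now:
        next_retry += step(i)
        i += 1
    return next_retry
```

Immediately after a success (`now == last_success`) the loop body never runs and the next attempt is due at once; only when attempts start failing does the delay grow.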