plasmasturm.org

Supplying Git a commit message prepared ahead of time

2025-03-28T10:19:17+01:00

Say you want to write script that will set up the next Git commit for you without creating it – just stage some content and prepare a commit message. The next bare git commit should automatically populate the editor with that message, without any flags passed to it, nor any Git configuration changes, and without affecting any subsequent invocations of git commit. The staging part is obvious… but where do you put the commit message?

The answer, which appears to be undocumented, turns out to be .git/MERGE_MSG:

echo 'there we go' > .git/MERGE_MSG

If you now run git commit, your editor will open with the “there we go” commit message already entered.

Update: originally in this entry I suggested using .git/SQUASH_MSG, which probably seems more eccentric than .git/MERGE_MSG. Because I could not remember how to trigger this behaviour in Git outside of a merge or rebase, I was reduced to reading the code of git commit, where the related logic is somewhat convoluted, and the .git/SQUASH_MSG case is the easiest to discover. I finally remembered that it is git cherry-pick --no-commit where I saw this behaviour first, which allowed me to discover what it does by just running the command and examining the .git directory. I am writing this down because that should remain an easy route to figuring out whatever the new mechanism is, should it ever change.

How to eject external hard disks used for APFS TimeMachine backups (I’m not making this up)

2023-01-20T00:32:44+01:00

Ever since upgrading to a recent Mac that came with the disk formatted with AFPS, a perennial irritation has been Time Machine. I use a USB hard drive for backups, which of course needs unplugging when I want to take the machine with me somewhere. There are long stretches of time when I don’t even think about this because it works just fine. And then there are the other stretches of time when this has been impossible: clicking the eject button in Finder does nothing for a few ponderous moments and then shows a force eject dialog. (And of course the command line tools and other methods all equally fail.)

I could of course forcibly eject the disk, as the dialog offers. And maybe I would, if it this was just a USB stick I was using to shuffle around some files. But doing this with my backup disk seems rather less clever. This disk I want to unmount in good order.

Unfortunately when this happens, there is no help for it: even closing all applications does not stop the mystery program from using it. So what is the program which is using the disk? The Spotlight indexer, it turns out.

$ sudo lsof +d /Volumes/TimeMachine\ XXXXXX/
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
mds     1234 root    5r   DIR   1,24      160    2 /Volumes/TimeMachine XXXXXX
mds     1234 root    6r   DIR   1,24      160    2 /Volumes/TimeMachine XXXXXX
mds     1234 root    7r   DIR   1,24      160    2 /Volumes/TimeMachine XXXXXX

How do you ask this to stop?

Beats me. I have not found any documented, official way of doing so. Not the Spotlight privacy settings, not removing the disk from the list of backup disks in the Time Machine settings, not the combination of those, no conceivable variation of using tmutil on the command line, not a number of other things – nothing. Even killall -HUP mds does not help: obviously the Spotlight service notices, and just respawns the processes.

And this state will persist for hours and days – literally. On one occasion, I wanted but didn’t need the machine with me, so I left it to its own devices out of curiosity. It took over 2 days before ejecting the Time Machine volume worked again.

For a purportedly portable computer, this is… you know… a bit of a showstopper.

So after suffering this issue long enough, I finally tried something stupid the other day – and whaddayaknow, it works:

$ sudo killall -HUP mds ; sudo umount /Volumes/TimeMachine\ XXXXXX/

This will not always work the first time, it may need a repeat or two. But sooner rather than later it does take. Evidently the mds process respawn is not so quick that it wouldn’t leave a window during which the disk can be unmounted properly.

And so I put the following in ~/bin/macos/unmount-despite-mds and made it executable:

#!/bin/bash
if (( $# != 1 )) ; then echo "usage: $0 " 1>&2 ; exit 1 ; fi
parent_dev=$( stat -f %d "$1"/.. ) || exit $?
while [[ -d $1 ]] && [[ "$( stat -qf %d "$1" )" != $parent_dev ]] && ! umount "$1" ; do
  killall -HUP mds
  lsof +d "$1"
done

Now I can invoke it at any time from the terminal like so:

$ sudo ~/bin/macos/unmount-despite-mds /Volumes/TimeMachine\ XXXXXX/

What this does is check whether the given path, if it is a mount point, fails to unmount. If so, it sends a signal to terminate the Spotlight indexer processes and immediately retries. In a loop.

Or to put it more colloquially, it machineguns Spotlight until umount can slip in under the suppressing fire and pull the disk out from under it.

This is not a solution. It is the bluntest of instruments. But this is what works. And as far as I have been able to find, this is the only thing that works.

Three cheers for Apple software quality, I guess.

Update: Sending HUP instead of TERM is a better idea. Both cause the mds processes to shut down, but TERM seems to leave Spotlight in a bad state: for some time after, some search results take longer to show up and search result order gets messed up, which painfully disrupts my muscle memory. With HUP I have not observed the same issue. I have added it to all the examples in the article.

Low-brow clipboard integration with remote X

2024-02-11T21:15:11+01:00

This is another one of those small-lightbulb moments like this one was. The setup for this one is that I run some X programs on my home server for remote use from the machine I actually work on – but via VNC, where I never got clipboard integration working.¹ I only rarely need to paste in X, so I’ve just been living with this… until the obvious recently occcured to me:

The low-brow but perfectly serviceable solution is to just use the clipboard commands of both systems, strung together with a pipe over a SSH connection. Of course this needs to be done manually to transfer the clipboard every single time I copy something on one side and want to paste it on the other – the low-brow bit. But it is far more convenient than pasting into terminals like I was doing before, to the point I won’t feel the lack any more.

Only… it didn’t quite work right.

I searched the web for a fix and unsurprisingly found that plenty of people have had the same idea:

# copy in X, run this, then paste locally
ssh server.home xclip -selection clipboard -out | pbcopy

(If you’re not using a Mac locally, just replace pbcopy (and pbpaste) with your system’s equivalent.)

The trouble is, I was looking for the other direction – and far from novel though the idea may be, I didn’t find a command written up anywhere. There turns out to be a minor trick to it (and maybe that’s why), which I ultimately had to figure out for myself:

# copy locally, run this, then paste in X
pbpaste | exec ssh server.home 'exec xclip -selection clipboard &> /dev/null'

Namely, this won’t work as desired without the “&> /dev/null” bit.

It will work, except without returning to the prompt. It just hangs. This comes down to the way that selections work in X: the program the selected content came from must grab that selection and then answer requests to paste it – no program, no paste. So the only way xclip can work is to stick around as a background process after putting something on the clipboard – until something else is copied, then it can exit. And because xclip doesn’t close stdout and stderr, ssh won’t know to quit any sooner than that, so it sits there waiting. To create a compound command that immediately returns to the prompt after shipping over the local clipboard contents, it is therefore necessary to close stdout and stderr on xclip explicitly.

With that, I have a solution I can happily live with.²

This is after trying all the usual suggestions (like running vncconfig on the server). Presumably they didn’t work because I am using the Screen Sharing app that comes with MacOS, which apparently is not actually a VNC client but just uses that protocol for most of its functionality.

The backstory to that is that I used to use Xpra for remote X because it makes that rather neat: individual windows on the server are displayed remotely as individual windows on the client, complete with native local windowing UI, so there is none of the ungainly window-in-window hassle and no need for an X window manager. The catch is that Xpra requires reasonably matching versions on server and client, and I have at times fallen well behind running the latest OS version on either side, which on a few occasions has made them tricky to align. A while ago I failed to find any working constellation at all, at which point I decided I was tired of doing that and would switch to something less bespoke. Now I no longer need to install anything on the Mac and have multiple highly compatible options on the server. ↩
Or maybe I’ll go back to Xpra, who knows… ↩

Closing for trouble in the land of streams

2024-02-08T17:22:22+01:00

Recently I learned of a mistake I had been making without realizing it. Zygo:

Closing FD 2, without simultaneously opening something else in its place (like a new TTY, if you’re trying to shed a controlling TTY or drop the last references to your old namespace/chroot parent), is weird and occasionally dangerous.

It’s asking for trouble because libraries are chatty and they write noise to stderr and it’s nearly impossible to stop them. Libraries also like to make their own open file descriptors, and they tend not to notice when the thing they’ve opened/connected to and expect to exclusively control/have a private conversation with happens to also be stderr and full of noise from random library functions. Chaos follows when those patterns collide.

Jamie Zawinski:

XScreenSaver has had code in it since the beginning to re-open FDs 1 and 2 as /dev/null lest XOpenDisplay put the connection to the X server on FD 2 with hilarious results.

In other words, closing one of the standard streams might lead to flooding somewhere else. You must ensure drainage, even if only to the void.

Doubly detached IRC

2018-02-08T05:13:25+01:00

For long-running terminal sessions such as IRC, I’ve long used dtach (or Screen, or tmux, it doesn’t matter) to be able to leave them running on a server without having to be connected to it… just like everyone else does. But recently I put together some simple facts in a retrospectively obvious way that I haven’t heard of anyone else doing, and thus achieved a little quality of life improvement.

I’ve written before about my love for Mosh during times of need. And when I had the insight I am writing about, I was in a situation where I seriously needed it. (If you don’t know what Mosh is, read about it first, because ⓐ you’re missing out and ⓑ the rest of this article won’t make much sense otherwise.)

Here’s the thing about Mosh, though: it still has to go through regular SSH in order to bring up a Mosh session. And on a severely packet-lossy connection, that alone can be hell.

The real beauty of Mosh comes to the fore only if you keep the session around once it’s set up. As long as it’s up, then no matter how bad the packet loss, you get decent interactivity. But to fully benefit from that, you have to avoid tearing the session down.

My problem is that try as I might, I have never been able to break with my compulsion to close terminal windows once I am done with them. For IRC that means sooner or later I’ll want to detach from a session I’m not actively chatting in. And because I use Mosh to run dtach on the remote end, detaching from IRC means that the dtach client exits on the remote end… which tears down the Mosh session.

The simple fact that suddenly occurred to me is that I can also use dtach on my end of the connection, in front of Mosh:

dtach -A ~/.dtach/irssi mosh hostname dtach -A ~/.dtach/irssi irssi

Now when I detach, it is only from my local dtach session, not the one on the server. So the Mosh session behind it sticks around – without me having to keep the terminal window open.

The upshot is a dtach ↔ Mosh ↔ dtach sandwich which gives me the full benefits of Mosh.

Should you want to use this yourself, you will need the last piece of the puzzle, namely how to bring down the Mosh session while keeping the IRC session around.

Update: The obvious answer is of course to just quit the Mosh session using Ctrl^ . so the following is only of academic interest.

To do that you can detach on the remote end, and the simplest way of doing that is this:

dtach -a ~/.dtach/irssi -E # and then press the detach shortcut

The -E switch disables the keyboard shortcut for detaching in the local dtach client. This means when you press the shortcut it gets sent to the remote dtach client.

(What follows then is exactly the same chain of events as always when you detach from the remote dtach session: the remote dtach client exits, so the remote Mosh session ends, so the local Mosh client exits – and therefore the local dtach session ends as well. Detaching from the remote end thus brings the whole edifice down.)

Comparing the contents of gzipped tarballs

2022-12-18T18:12:51+01:00

Some time ago I had a pile of tarballs which were created periodically by a cron job on a machine, regardless of whether anything had changed between runs, and they were eating up all the storage space. To free up space I wanted to get rid of the redundant ones so I needed a quick way to identify which of them had (no) changes relative to the respective preceding tarball.

I was hoping I could just compare the tarballs themselves rather than doing anything more complicated that involved actually extracting their contents (and then, I don’t know, doing some kind of fingerprinting on top). Unfortunately some naïve attempts using cmp failed and seemed to indicate that I was going to have to take the kind of more complicated approach I was hoping to avoid.

Now, there is quite a bit of discussion online about how to make Tar generate reproducible (so in a sense, canonical) tarballs. Of course that isn’t much use in hindsight (such as in my case), when a pile of tarballs is already sitting around on disk.

But in my case, the part of the filesystem that all of the tarballs were made from was an area where, in case of the tarballs that were redundant, absolutely nothing would have happened. By that I don’t just mean logical non-changes like creating and then deleting a temporary file. I mean that nothing was writing to that part of the filesystem in any capacity. Therefore Tar should encounter files and directories in the exact same order each time it was iterating that directory tree. Why then should it ever generate non-identical tarballs? Exasperated, I dug into the question of archive reproducibility for quite a while.

Spoiler: that was a waste of time.

I finally discovered that Tar is not the culprit at all…

Gzip is! Namely, the Gzip file header includes a timestamp.

So my instincts were right: Tar should have been creating the exact same archive over and over, and in fact it was. I just hadn’t thought to suspect Gzip at all.

Luckily, the timestamp is found at a fixed location in a gzipped file: it is the 32-bit value at offset 4. And handily, cmp has a switch to tell it to seek past the start of the file(s) it is comparing.

So to make a long story short:

cmp -i8 file1.tar.gz file2.tar.gz

(This entry is brought to you by the hope to not have to figure this all out a third time in my life. Some time after the events described above, it came back to me that I had already figured this out years before but lost all memory of it by the next occasion to use the knowledge.)

The Programmers’ Credo

2022-12-18T17:25:18+01:00

Maciej Cegłowski:

The Programmers’ Credo:

We do these things not because they are easy,
but because we thought they were going to be easy.

“CAPTCHA”

2022-12-06T14:41:57+01:00

Bruce Schneier:

This is an actual CAPTCHA I was shown when trying to log into PayPal.

[…]

As an actual human and not a bot, I had no idea how to answer. Is this a joke? (Seems not.) […] I stared at the screen, paralyzed, for way too long.

Update: some follow-up humour.

“POSIX hardlink heartache”

2022-04-03T15:53:04+02:00

Michael Orlitzky:

It follows that, on POSIX systems without any non-standard protections, it’s unsafe for anyone (but in particular, root) to do anything sensitive in a directory that is writable by another user. Cross-platform programs designed to do so are simply flawed.

“Learning to Read, Again”

2022-03-27T15:37:48+02:00

David Kolb:

I picked my own case because the focus on words and print makes various stages of learning to navigate a media world easily identifiable. […]

When I moved to Bates College in Maine in the late ’70s, the computer revolution was just beginning. I rented a standalone word processing device the size of a small refrigerator from Digital Equipment Corporation and used it to begin a book. A few years later I bought my first computer. The fledgling Internet was gradually coming up and everything I knew about reading, research, and paper was about to be challenged.

At first I didn't notice how much was changing[.]

His summary of the developments is succint:

The media have become predatory, grasping; you’re not in control. […]

Recall the old advertising tactic: stimulate people into a constant state of low level unfulfilled sexual excitement, and into that gap you can pour a infinity of products. That still continues, but now add another tactic: stimulate people into a constant state of unsatisfied anger and resentment, and out of that gap you can pull an infinity of votes and cash contributions. So we get exposes, fake news, endless conspiracy theories, all with urgent appeals.

And, as the salesmen say, “but wait, there’s more.” Aside from all those predatory grasping Internet manipulators there is something diffuse and deceptive going on, a general hyping and intensifying of our everyday encounters.

His thoughts on what answer must be are that wall-building isolation cannot possibly work. And so he proposes something else.

A proposed adequate definition of a data breach

2021-12-18T14:05:46+01:00

Troy Hunt:

A data breach occurs when information is obtained by an unauthorised party in a fashion in which it was not intended to be made available.

Eminently sensible; the definition shouldn’t hinge on technicalities about how the data got away. (I wonder if that means we should be using a different word than “breach”?)

Stop The Humour

2021-04-10T21:49:07+02:00

Jeff Johnson:

To be more specific, NSURL is based on an obsolete RFC. (Note: RFC is an acronym for Read the F-ing Commandment, while NS is an acronym for No Swift.)

Early Signs

2021-01-17T16:15:45+01:00

Jamie Zawinski:

Today is day 10,000 of The September That Never Ended.

The Internet: Mistakes Were Made.™

Useful GitHub Issues overviews

2016-04-24T14:06:46+02:00

I’ve always found the default, easily available views of GitHub Issues inadequate for my purposes. I want to separate issues by the kind of action I’ll want to take, but the interface is fundamentally oriented around a single list of issues, and by default that is just a big dump of every issue that involves you in some way. Luckily all the buttons are just UI over a query language, and the query language turns out to be just barely powerful enough to allow me to get the overviews I really want.

So here are the queries I’ve arrived at. Together they approximate a basic dashboard. Unfortunately there is not, to my knowledge, a keyword in the query language to refer to “whoever the currently logged in user is”, so I cannot demonstrate them as effectively as I’d like: you will have to manually edit them to subsitute your username for mine.

user:ap -author:ap

This shows all issues filed by others against my own repositories.

Semantically, this one is “stuff waiting for me to fix”.
user:ap author:ap

This shows all issues I have filed on my own repositories.

Semantically, this one is “my personal todo list”.
author:ap -user:ap

This shows all issues I have filed against repositores I do not own.

Semantically, this one is “stuff I need to keep bugging others about”.
commenter:ap -author:ap -user:ap

This shows all issues filed by others against repositories I do not own, which I have nevertheless commented on.

Semantically, this one is “stuff I care about as a bystander”.
involves:ap -commenter:ap -author:ap -user:ap

This shows all issues filed against repositories I do not own, which I have been mentioned in but have not commented on. There can be dross in here; I have a short username, and people importing content into GitHub sometimes trigger bogus mentions by having @ap somewhere in it. By isolating the things passively attached to me, I gain more use of the other queries.

Semantically, this one is “stuff someone considers me relevant to (or maybe spam)”.
Manual Subscriptions

This one is not a GitHub Issues search query, but is useful to include in this context.

Obviously this is stuff I’m not involved with but want to stay informed about.

That collection gives me a reasonable handle on everything I need to take care of one way or another, which I could not get from GitHub’s own built in views.

Update: I’ve split the last query, “involves:ap -author:ap -user:ap”, in two. Now it is divided on the source of my involvement as a bystander: myself or others.

Update: I’ve split the first query, “user:ap”, in two, to divide it on the origin of the issue: others or myself. I have also added a link to the issue subscriptions page.

`rename` 1.601

2019-08-30T20:35:32+02:00

Over 6 years have gone by since I cut the last release of rename because no new features have been needed in the time since. Unfortunately a number of small bugfixes and documentation additions have therefore sat unreleased in the repository for years. You have a bug report about a long-fixed issue to thank for me finally noticing. I am hereby rectifying this situation:

Share and enjoy.

Canonical Log Lines

2019-08-09T01:35:14+02:00

Brandur Leach:

Although logs offer additional flexibility in the examples above, we’re still left in a difficult situation if we want to query information across the lines in a trace. [… At Stripe, we] use canonical log lines to help address this. They’re a simple idea: in addition to their normal [logfmt-structured] log traces, requests […] also emit one long log line at the end that pulls all its key telemetry into one place. [… They] are a simple enough idea that implementing them tends to be straightforward regardless of the tech stack in use. […] Over the years our implementation has been hardened to maximize the chance that canonical log lines are emitted for every request, even if an internal failure or other unexpected condition occurred.

Glimmer of hope

2019-07-31T12:23:59+02:00

Fraser Speirs:

It’s also worth noting the significant impact that the rise of tablets has had on the design and capability of laptops. In 2010, laptops weighed four-plus pounds – not including a weighty charger – and got 3–4 hours of battery life. Today, they’ve halved in weight and more than doubled in battery life while getting faster, more robust and more flexible. In the final analysis, I think that the long-term effect of tablets will be that they forced laptops to get better.

“When was this stored” vs “when did it happen”

2018-11-04T21:15:12+01:00

Over on his weblog, Chris Siebenmann talks about metadata in file based websites:

In a file based website engine, any form of metadata that you can’t usefully commit into common version control systems is a mistake.

This is part of a class of mistake I’ve variously made or run into in a number of contexts.

A prominent example that comes to mind is an application I work on which tracks records of some kind, and in one place, users enter “number of X for the thing recorded here” as data points over time. Originally the point in time used for a data point was the time of entry of the data point. At some point I realised that this was a mistake: the point in time at which the number of X was Y is not the same as the point in time at which that fact was recorded in our system. Among other things, treating them the same means you cannot enter “number of X” data points after the fact (such as when newly capturing a record retroactively). One is metadata about a real-world event, the other is metadata about the usage of our application to record that real-world event.

So the generalised principle goes maybe something like this:

Metadata about a storage entity should not be confused with metadata about whatever the payload of that storage entity represents.

These may (obviously) coincide but are nevertheless not the same thing. If an article is stored in a file, the creation date of the file just says when that article was saved into that file – not when the article was written. As long as the “writing an article” and “saving it into a file” actions are strongly bound to each other, these metadata will coincide so that it may seem attractive to treat one as the other. But once you introduce ways for them to diverge – such as adding a VCS to the mix, which creates/removes/modifies files without a corresponding article-writing/editing action – you have a problem.

The HTTPS divide

2018-08-11T00:17:15+02:00

Eric Meyer:

I saw a piece that claimed, “Investing in HTTPS makes it faster, cheaper, and easier for everyone.” If you define “everyone” as people with gigabit fiber access, sure. Maybe it’s even true for most of those whose last mile is copper. But for people beyond the reach of glass and wire, every word of that claim was wrong.

Someone who goes by Roy called this quite a while ago: an HTTPS-only web is a web without intermediaries, built on a costlier protocol, and that has real costs. This (as opposed to the likes of Dave Winer’s barely-sensical paranoia) is why I am skeptical about the move to HTTPS, albeit acknowleding the necessity given the lack of better solutions in the near term. Non-benign network operators and universal surveillance are real problems that need to be addressed. We are not in a great place right now.

There are better options beyond the horizon, though. Don’t miss Eric’s comment section: several people mention ideas and proposals currently in the works. There is hope for… someday.

Who knows an XML document from a hole in the ground?

2006-01-16T02:21:27+01:00

Hello back, everyone. (With a tip o’ the titular hat to Phil Ringnalda.) Some of you who’re seeing this will now be scampering to catch up on 20 of my entries, posted since the beginning of last December.

Reticent a person though I am, in general, I am not that reticent. For those of you who wondered why it’s gotten so silent around here, the reason is simple: because you didn’t notice that you need to file a bug report against the aggregator of your choice.

Yes, that’s right. The software you use is broken. And make no mistake, you’re not alone. There is a wide range of choices among bug trackers (or customer support forms), and their associated software, to file reports against.

This all happened because I tried to be a clever and good citizen and, in the process, save a bit of space in a smart way. But let’s backtrack a bit, and let me tell you what happened, and what the effect has been.

Until the beginning of December, the structure of my feed looked like this (and from today on looks like so again):


  plasmasturm.org
  
  
  
    
      <div xmlns="http://www.w3.org/1999/xhtml">Foo</div>
    
    
      Bar
    
    
      Baz

You can see that there are xmlns="http://www.w3.org/1999/xhtml" bits strewn everywhere; in practice the effect is less drastic than it appears here where I’m eliding a bunch of Atom tags and any sensible content.

This setup means that in the document at large, the default namespace is http://www.w3.org/2005/Atom – so tags like are to be interpreted according to RFC 4287 –, but for the

tags and their contents, the default namespace is http://www.w3.org/1999/xhtml – so that the tags are to be interpreted according to the XHTML Recommendation.

This works perfectly in everything that claims support for Atom.

Then, in the beginning of December, I read an enlightening piece on XML citizenship by Joe English, titled A plea for Sanity, whose focus is how to structure XML documents with regards to namespaces such that software to process them can be kept simple, and which sets forth a few definitions on documents that are not sane. According to his definitions, my feed was neurotic: two different namespaces are mapped to the same prefix (i.e. the null prefix) at different points in the document.

I chose the neurotic structure I outlined above because it seemed logical to choose the Atom namespace as the default namespace for an Atom document at large. Since I author my weblog by editing a master feed (which contains the entire archive of the log) by hand, however, I definitely want the XHTML namespace to be the default for the sections of the document which contain HTML: having to write namespace prefixes in every tag would make the already tiresome experience of hand-written markup truly painful.

An added bonus is that despite the frequent repetition of the default namespace declaration, it actually saves space on declaring a prefix to map to the namespace once at the top of the feed and then having to write Foo everywhere. Across the whole feed, these two characters per tag easily add up to much more than the fixed cost of a namespace declaration in every text/content construct.

However, reading the plea for sanity inspired me to try something counterintuitive: declare the XHTML namespace as the default for the entire document and instead declare a prefix for the Atom namespace. This leads to a structure like so:


  plasmasturm.org
  
  
  
    Foo
    Bar
    Baz

This is a sane XML document. And according to XML specifications, both this form of the document and the previous one are semantically exactly identical. They mean the exact same thing, and any compliant software which can process one of them will produce the exact same results given the other.

It’s also worth noting that the number of Atom tags within an Atom entry is small and varies very little. So even though I’m having to prefix every Atom tag, this actually saves space on redeclaring the XHTML namespace over and over. (Right now, the master feed, even though the savings are particularly diminished in it because it contains the entire archive and thus has a much higher HTML-to-Atom ratio than the newsfeed on the site, is about 1.2% smaller in its sane version than in the neurotic one. For the on-site feed, the ratio is a tad larger; not much, but with frequently requested documents like newsfeeds, a penny saved is a penny got.)

I was very pleased with myself for figuring out a way to improve my feed while reducing its size in the process.

Too pleased.

As a matter of fact, within a few days, Scott Arrington got in touch on IRC and informed me that my feed was suddenly throwing an error in Safari: Safari can’t open the page “feed://plasmasturm.org/feeds/plasmasturm/”, it said, and misleadingly continued: The error was: “unknown error” (NSURLErrorDomain:-1) (after all, the problem has nothing to do with a domain in the internet sense; though maybe error domain is Apple framework lingo for error type or error class).

I shrugged. Surely, this was an outlier. Certainly, breakage like that wouldn’t go unnoticed. Likely, it would be fixed with the next batch of updates.

As a matter of fact, yes, it is nice to live with the illusion that the basics of XML are at least moderately well understood, in this year 2006 of the Lord. Thanks for asking.

Until tonight delivered a rude awakening from Jagath Narayan to my inbox. He informed me that my feed failed to parse in a number of aggregators. So I got on IRC and asked Scott’s assistance to test drive a few more Mac aggregators with the feed.

Here’s the list of known broken aggregators as of this writing:

Safari 2.0.3
Firefox 1.5
Thunderbird 1.5
NewsFire 1.2 (v45)
Feedreader 2.90
BottomFeeder 4.1
Bloglines (big surprise…)

What about the remaining major browser? Opera 8.51 is not broken – because it doesn’t support Atom 1.0 at all. (The upcoming version 9.0 will.)

None of the major browsers with feed support are compliant in their latest version. How depressing.

Here’s a list of known working aggregators:

And here’s a test case. Please mail me further results, and of course, file bugs avidly.

So now my feed is back to its old, neurotic form, and works for a lot more people. It feels good to know I’m back, even though I never knew I was gone.

Mutually assured intel

2018-05-10T19:18:44+02:00

Bruce Schneier:

Supply-chain security is an incredibly complex problem. [National]-only design and manufacturing isn’t an option; the tech world is far too internationally interdependent for that.

We can’t trust anyone, yet we have no choice but to trust everyone.

Don’t be a problem-solver

2017-12-24T12:45:28+01:00

halvarflake:

The one rule of thumb is: If you allow complexity into a place that should be simple, more complexity will follow.

Backoff

2017-09-15T03:12:06+02:00

Some time ago I had occasion to implement (mostly exponential) back-off in an application for the first time. This is not a hard problem, but at the outset I expected it to be one of those annoying cases where the code is only clear to read when you are immersed in the problem it solves.

Not so. It turns out there is a trivially simple algorithm if you pick the right form of stored state – namely, a pair of timestamps: (last_success, next_retry). The essential form of the algorithm goes like this:

if succeeded { last_success = now }
next_retry = last_success, i = 0
until next_retry > now {
  next_retry += step_size(i)
  i += 1
}

Because this recalculates the scheduling for the next retry by starting over from the previous success, every time, it is totally resilient against all fluctuations in the environment. Belated or missing retry attempts have no effect on its output. Even swapping the step_size function for an entirely different one mid-flight just works!

At the same time, it is trivial to reason out and verify that this algorithm works correctly.

I was quite pleased.

(In practice you will likely not have a separate step_size function and i counter but rather some kind of step variable iterated along with next_retry. But here I wanted to abstract away from the specific formula used.)

Update: As prompted by a question from Matthew Persico, let me clarify that my use case is scheduling polls that succeed only intermittently, meaning that I always want to wait at least once between attempts, which is why I used “until next_retry > now”.

If instead you want to add backoff to an operation that only fails intermittently (e.g. draining a buffer to I/O) then you’ll want to use “while next_retry < now” for your loop, so you can have zero-delay back-to-back attempts.

Six Stages of Debugging

2012-03-25T09:59:47+02:00

That can’t happen.
That doesn’t happen on my machine.
That shouldn’t happen.
Why does that happen?
Oh, I see.
How did that ever work?

[This is not mine. I posted it in the interest of personal archival because the oldest mention I could track down on the web appeared on a now-defunct weblog. In the meantime, Mike W. Cremer (who bills himself The Newton™ Scapegoat ☺) has claimed credit for coining it after a particularly frustrating DMA debugging session while slaving away on Dante (Newton OS 2.0). According to his account, this took place in Apple’s building at 5 Infinite Loop (nicknamed RD5 or IL5). The list was later to be found taped to Mike Engber’s door in IL2.]

Fixing a Google Chrome failure to save passwords

2017-07-02T00:33:41+02:00

This is a kind of post that people used to write back in the heady early days of blogging and a more communal web: putting something out there to help Google help other people.

The problem

For some time I had been having an irritating persistent failure with Google Chrome that I could not find an answer for:

After logging into some website, it would offer me to save the password, as usual.
I would click on the save button.
Chrome would not show any kind of error.
But the password would not be saved:
It was not filled in automatically next time I went to the same site.
No password at all showed up in the list on chrome://settings/passwords – the list just stayed blank no matter what I did.

Mysteriously, a handful of passwords did get stored, somehow, somewhere. Chrome could fill those in, even as it woudn’t list them on the settings screen. I checked the MacOS keychain and did not find them there, so they had to be stored by Chrome, even though it refused to show them to me.

The quest

Searching the web about my problems, almost all answers I could find related to the case of people who are logged into Google within Chrome and use its password syncing service… which I don’t. I simply want my passwords saved locally.

The few answers I did find that seemed to relate to my situation invariably suggested resetting one’s profile. Now, that approach does appear not to be mere superstition: of the people I found who had this problem, the ones who reported back all wrote that resetting their profile fixed the problem. So I had a way of making the problem go away – but I also have a lot of data in my profiles. It’s not just my bookmarks. I have tweaked many of the settings, individually for each profile (the whole point of using profiles, after all), and I also use a number of extensions, many of which themselves have extensive configurations. Recreating that all is a big task.

I want password auto-fill fixed while keeping my profiles intact. I am only willing to lose my stored passwords. (Because I save them in a password manager first anyway.)

So I went poking around in the directories where Chrome stores its profiles and other user data. There’s no need to look far: there’s a file called Login Data. This is an SQLite database (like most of Chrome’s user data files). It can be opened using the sqlite3 command line utility and examined using SQL queries. I did that, and there they were, my mysteriously saved passwords… plus a bunch more.

I also discovered that some of the data in those tables is scrambled in some form. Presumably there are several separatedly stored pieces of data required to unscramble that data, and some of those pieces somehow become mismatched on my system – I don’t know how nor why, and was too lazy to research. All I cared was that this looked like the right vicinity.

As an experiment, I moved the Login Data file and its Login Data-journal pair out of one profile… and bingo, password auto-fill started working there as expected. After saving a password, Chrome would subsequently successfully auto-fill it as well as list it on the saved passwords screen.

Good enough for me.

The solution

Deleting the files Login Data and Login Data-journal from a profile fixes password saving in that profile – without affecting any other data in it. A full profile reset is not necessary – you can reset just the password storage by deleting just the files that it uses.

This does mean you lose any passwords you had stored previously, unfortunately. But since you cannot really access them any more anyway, that data loss has effectively already happened by the time you delete the files.

Instructions

Quit Chrome.
Go to the directory where Chrome stores its user-specific data, below your user home directory:

Mac
~/Library/Application Support/Google/Chrome

Linux
~/.config/google-chrome

Windows
%UserProfile%\AppData\Local\Google\Chrome\User Data
From there, go into the directory called Default if you want to fix your main profile, or into Profile 1 or Profile 2 etc. to fix one of your extra profiles.
Delete the files Login Data and Login Data-journal.
Repeat for other profiles as necessary.

plasmasturm.org

Supplying Git a commit message prepared ahead of time

How to eject external hard disks used for APFS TimeMachine backups (I’m not making this up)

Low-brow clipboard integration with remote X

Closing for trouble in the land of streams

Doubly detached IRC

Comparing the contents of gzipped tarballs

The Programmers’ Credo

“CAPTCHA”

“POSIX hardlink heartache”

“Learning to Read, Again”

A proposed adequate definition of a data breach

Stop The Humour

Early Signs

Useful GitHub Issues overviews

`user:ap -author:ap`

`user:ap author:ap`

`author:ap -user:ap`

`commenter:ap -author:ap -user:ap`

`involves:ap -commenter:ap -author:ap -user:ap`

Manual Subscriptions

`rename` 1.601

Canonical Log Lines

Glimmer of hope

“When was this stored” vs “when did it happen”

The HTTPS divide

Who knows an XML document from a hole in the ground?

Mutually assured intel

Don’t be a problem-solver

Backoff

Six Stages of Debugging

Fixing a Google Chrome failure to save passwords

The problem

The quest

The solution

Instructions

plasmasturm.org

Supplying Git a commit message prepared ahead of time

How to eject external hard disks used for APFS TimeMachine backups (I’m not making this up)

Low-brow clipboard integration with remote X

Closing for trouble in the land of streams

Doubly detached IRC

Comparing the contents of gzipped tarballs

The Programmers’ Credo

“CAPTCHA”

“POSIX hardlink heartache”

“Learning to Read, Again”

A proposed adequate definition of a data breach

Stop The Humour

Early Signs

Useful GitHub Issues overviews

user:ap -author:ap

user:ap author:ap

author:ap -user:ap

commenter:ap -author:ap -user:ap

involves:ap -commenter:ap -author:ap -user:ap

Manual Subscriptions

rename 1.601

Canonical Log Lines

Glimmer of hope

“When was this stored” vs “when did it happen”

The HTTPS divide

Who knows an XML document from a hole in the ground?

Mutually assured intel

Don’t be a problem-solver

Backoff

Six Stages of Debugging

Fixing a Google Chrome failure to save passwords

The problem

The quest

The solution

Instructions

`user:ap -author:ap`

`user:ap author:ap`

`author:ap -user:ap`

`commenter:ap -author:ap -user:ap`

`involves:ap -commenter:ap -author:ap -user:ap`

`rename` 1.601