“When was this stored” vs “when did it happen”

Sunday, 4 Nov 2018

Over on his weblog, Chris Siebenmann talks about metadata in file-based websites:

In a file based website engine, any form of metadata that you can’t usefully commit into common version control systems is a mistake.

This is part of a class of mistake I’ve variously made or run into in a number of contexts.

A prominent example that comes to mind is an application I work on which tracks records of some kind. In one place, users enter “number of X for the thing recorded here” as data points over time. Originally the point in time used for a data point was the time at which it was entered. At some point I realised that this was a mistake: the point in time at which the number of X was Y is not the same as the point in time at which that fact was recorded in our system. Among other things, treating them as the same means you cannot enter “number of X” data points after the fact (such as when newly capturing a record retroactively). One is metadata about a real-world event; the other is metadata about the use of our application to record that real-world event.
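To make the distinction concrete, here is a minimal sketch of a data point that carries both timestamps. It is not the actual application's model: the names `DataPoint`, `observed_at`, `recorded_at` and `series` are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DataPoint:
    """One "number of X" observation (hypothetical model, not the real one)."""
    count: int
    # When the number of X actually was `count`: metadata about the real-world event.
    observed_at: datetime
    # When the fact was entered: metadata about the use of the application.
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def series(points):
    """Return (observed_at, count) pairs ordered by when things happened."""
    ordered = sorted(points, key=lambda p: p.observed_at)
    return [(p.observed_at, p.count) for p in ordered]
```

With the two fields kept apart, a data point entered today for an observation made last year sorts into the series by `observed_at`, while `recorded_at` still says when it was typed in.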

So the generalised principle goes maybe something like this:

Metadata about a storage entity should not be confused with metadata about whatever the payload of that storage entity represents.

These may (obviously) coincide but are nevertheless not the same thing. If an article is stored in a file, the creation date of the file just says when that article was saved into that file – not when the article was written. As long as the “writing an article” and “saving it into a file” actions are strongly bound to each other, the two will coincide, which makes it tempting to treat one as the other. But once you introduce ways for them to diverge – such as adding a VCS to the mix, which creates/removes/modifies files without a corresponding article-writing/editing action – you have a problem.
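For the article case, a rough sketch of what keeping the two apart might look like, assuming (purely for illustration) that each article starts with a header block containing a `Date: YYYY-MM-DD` line. The function names are hypothetical, and the file timestamp used here is the modification time, since a true creation time is not portably available.

```python
import os
from datetime import date, datetime, timezone


def article_date(text):
    """Return the date declared inside the article (payload metadata), or None."""
    for line in text.splitlines():
        if not line.strip():
            break  # the assumed header block ends at the first blank line
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "date":
            return date.fromisoformat(value.strip())
    return None


def file_saved_at(path):
    """Return when the file was last written (storage metadata)."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
```

A fresh VCS checkout changes what `file_saved_at` returns but leaves `article_date` alone, because the latter travels inside the content that gets committed.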