Aggregator engineering: feed refresh

Saturday, 7 May 2005

Something that no desktop aggregator I’ve used so far does very well is feed refresh rates. The typical setup is that there’s a global refresh rate setting, which constitutes the default for feeds without an individually specified refresh, and an optional separate setting per feed. Typically, when the user adds a feed to poll, they are given the choice to specify a refresh rate or use the global default.

Pretty straightforward, right?

Too straightforward. This is one of those cases of a programmer-friendly interface that doesn’t assist the user as it should. It is also one of those interfaces that users will have a hard time finding fault with once they’ve gotten used to it. The problem is only apparent to users who are new to aggregators. I remember that when I first started using one, I never knew how to decide on a good interval to use. If I had that problem, how is my grandfather or my 10-year-old nephew supposed to make a good, informed choice? And why should they have to?

After all, all the necessary information is already there. Good feeds make it easy:

RSS 2.0 has the optional pubDate element to signify when an entry was published.
RSS 1.0 can optionally use Dublin Core elements for this purpose.
The upcoming Atom 1.0 has two mandatory and two optional elements per feed entry to very granularly express when each of the various events in the life of a feed entry occurred.

With this information, it is trivial to calculate an average feed update interval and a standard deviation and use this data to make an intelligent automatic choice for when to schedule the next refresh.
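As a rough illustration of that calculation, here is a minimal Python sketch. It assumes the entry timestamps have already been parsed out of the feed into `datetime` objects. The function names, the 15-minute minimum, and the "mean minus one standard deviation" policy are my own assumptions for the sake of the example, not a rule taken from any aggregator.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def update_stats(timestamps):
    """Return (mean, sample standard deviation) in seconds of the gaps between entries."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps to estimate a deviation")
    return mean(gaps), stdev(gaps)


def next_refresh(timestamps, now, min_interval=timedelta(minutes=15)):
    """Pick a time for the next poll from the feed's own publishing history."""
    avg, dev = update_stats(timestamps)
    # Assumed policy: check a little earlier than the average gap would suggest,
    # by one standard deviation, but never more often than min_interval.
    wait = max(avg - dev, min_interval.total_seconds())
    due = max(timestamps) + timedelta(seconds=wait)
    # If the computed time has already passed, wait one minimum interval from now.
    return max(due, now + min_interval)
```

A feed that publishes erratically has a large deviation and is therefore polled earlier relative to its average gap, while a feed that publishes like clockwork is polled almost exactly when the next entry is expected.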

However, even in the absence of such information, as permitted by all the formats which share the RSS initialism, a good choice is not difficult, though it is more complicated to achieve. Exponential backoff of the refresh interval (probably with a small, fractional base of around 1.5), reduced by a function of the number of new entries found when there are any, should keep the update interval close to an ideal frequency. After a certain amount of time has passed and/or a certain number of new entries has been collected, the aggregator has a statistically sufficient number of samples to make the aforementioned calculations.
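The backoff scheme could look something like the following Python sketch. The base of 1.5 comes from the paragraph above; the particular reduction function (dividing by the base once per new entry) and the floor and ceiling values are assumptions chosen only to make the example concrete.

```python
def next_interval(current, new_entries, base=1.5, floor=900.0, ceiling=86400.0):
    """Return the next polling interval in seconds.

    current      -- the interval that was just used, in seconds
    new_entries  -- how many new entries the last poll found
    """
    if new_entries == 0:
        # Nothing new: back off exponentially.
        proposed = current * base
    else:
        # Assumed reduction: shrink by one factor of `base` per new entry,
        # so a poll that finds many entries pulls the interval down quickly.
        proposed = current / (base ** new_entries)
    # Clamp so the feed is polled neither too often nor too rarely.
    return min(max(proposed, floor), ceiling)
```

Starting from an hour, an empty poll stretches the interval to 90 minutes, while a poll that finds one new entry shrinks it to 40 minutes.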

So given that it’s perfectly reasonable to expect an aggregator to handle refresh intervals intelligently on its own, and that users should not be expected to make sound calls on this matter, why should the interface even offer an option for them to twiddle? The aggregator should figure this all out by itself.

Users should only have a way to force an immediate, full refresh – that is, without If-Modified-Since headers and other bandwidth-saving methods. Such a forced refresh should also invalidate the currently scheduled refresh and its results should be accounted for when rescheduling.
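To make the distinction between a scheduled and a forced refresh concrete, here is a small Python sketch. `FeedState`, `build_headers`, and `forced_refresh` are hypothetical names, and the `fetch` and `reschedule` callables stand in for whatever HTTP client and scheduling logic the aggregator actually uses. Only the `If-Modified-Since` and `If-None-Match` header names are standard HTTP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedState:
    url: str
    etag: Optional[str] = None           # validator from the last response, if any
    last_modified: Optional[str] = None  # Last-Modified value from the last response
    next_due: Optional[float] = None     # scheduled poll time, seconds since epoch


def build_headers(feed, force=False):
    """Build request headers; a forced refresh omits the conditional ones."""
    headers = {"User-Agent": "example-aggregator/0.1"}  # placeholder value
    if force:
        return headers
    if feed.last_modified:
        headers["If-Modified-Since"] = feed.last_modified
    if feed.etag:
        headers["If-None-Match"] = feed.etag
    return headers


def forced_refresh(feed, fetch, reschedule, now):
    """Fetch unconditionally, then replace the pending schedule with a fresh one."""
    feed.next_due = None  # invalidate the currently scheduled refresh
    new_entries = fetch(feed.url, build_headers(feed, force=True))
    # The forced poll's results count like any other sample when rescheduling.
    feed.next_due = reschedule(feed, len(new_entries), now)
    return new_entries
```

Passing `fetch` and `reschedule` in as arguments keeps the sketch independent of any particular HTTP library and lets either of the scheduling approaches described earlier be plugged in.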

This is a substantial usability gain: subscribing becomes a one-step process which can be performed equally well by anyone. As a bonus, the aggregator would consume less overall bandwidth and yet most likely receive updates in a more timely fashion than with user-set intervals.