The quantum leap

Sunday, 21 Aug 2005

This site is now officially Atom 1.0 powered. You may have noticed, if you are subscribed to the weblog or to any of my scraped feeds.

The changes specific to Atom 1.0 were all trivial, but required me to review all the XSLT I’ve written for this site. As a result, I spent much more time down in the bowels of the transforms than I had planned to.

The CPAN ratings feed had to be fixed because changes at that site broke the scraper. use Perl; has acquired Atom feeds, so I rewrote its scraper to use those, throwing out its laborious RSS-to-Atom conversion fluff and improving it so it won’t easily miss entries anymore. I also realized that generating tag: URIs that use other people’s domain names is perhaps not so wise… Some of the scraped feeds have hence changed their ID scheme, others are still pending an update; if you saw old items in one of the feeds reappear as new in your aggregator, sorry.
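For readers unfamiliar with the scheme: a tag: URI (RFC 4151) has the shape `tag:authority,date:specific`, where the authority is a domain or email address the minter controlled on the given date. The following is only a minimal sketch of minting entry IDs under one's own domain rather than the scraped site's; `example.org`, the date, the path layout and the function name `mint_tag_uri` are all hypothetical and not the scheme actually used for these feeds.

```python
from datetime import date
from urllib.parse import quote


def mint_tag_uri(authority: str, minted: date, specific: str) -> str:
    """Build a tag: URI of the form tag:<authority>,<date>:<specific>.

    `authority` should be a domain the minter controlled on `minted`,
    which is why borrowing someone else's domain is questionable.
    """
    # Percent-encode characters such as spaces; keep '/' and ':' readable.
    safe_specific = quote(specific, safe="/:")
    return f"tag:{authority},{minted.isoformat()}:{safe_specific}"


# Hypothetical ID for one scraped entry, minted under the scraper's own domain.
entry_id = mint_tag_uri("example.org", date(2005, 8, 21), "scraped/cpanratings/42")
```

Because an aggregator treats the ID as the identity of an entry, changing the authority in this way changes every ID, which is exactly why old items can resurface as new ones.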

I also ended up completely restructuring the transforms that generate my own weblog. Previously, there were two transforms: one to generate the live feed from the archive, and one to generate all the HTML from the archive. The latter was a monolithic moloch which took a mode parameter to decide whether it should generate the frontpage, the archive or a permalink page, dispatching to templates with the corresponding mode attribute. It was a hack, and though each responsibility in the transform was cleanly separated, the whole was an unsightly pile of incongruent units. Now each target has its own single-purpose transform that imports a library of templates common to the HTML generators, and another library of templates common to all transforms, including the live feed generator. In the process, I collected a bunch of the functions I used in both the scrapers and the weblog transforms into two further libraries.

As a bonus, I added some features that automatically cross-reference information available elsewhere in the feed. Namely, any intra-site link now automatically carries the title of the linked page, and every entry now includes a Backlinks section when other entries link to it. The former applies to all outputs, while the latter only appears on HTML pages.
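The backlink idea boils down to inverting the link graph between entries. The real thing lives in the XSLT transforms; what follows is just a rough Python sketch of the principle, assuming each entry is known by its URL and comes with a list of the URLs it links to. The data shape and the name `build_backlinks` are made up for illustration.

```python
from collections import defaultdict


def build_backlinks(entries: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each entry URL to the URLs of the other entries that link to it.

    `entries` maps an entry's URL to the list of URLs it links to.
    Links leaving the site (targets that are not entries) are ignored,
    as are self-links and repeated links from the same entry.
    """
    backlinks: defaultdict[str, list[str]] = defaultdict(list)
    for source, targets in entries.items():
        for target in targets:
            is_intra_site = target in entries
            if is_intra_site and target != source and source not in backlinks[target]:
                backlinks[target].append(source)
    return dict(backlinks)


# Hypothetical three-entry weblog.
links = {
    "/a": ["/b", "/c"],
    "/b": ["/c", "http://elsewhere.example/"],
    "/c": [],
}
result = build_backlinks(links)
```

Here `/c` ends up with backlinks from both `/a` and `/b`, while the external link is dropped. The same lookup against the set of known entries is what would let an intra-site link pick up the title of its target.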

In short, I spent a considerable amount of time bringing all my old code up to date with my current understanding, and gained some significant new insights in the process. The exercise also yielded a bunch of code nice enough that I wager others might find it useful, so I plan to wrap up various bits and put them up for download. I have yet to update my Atom-to-HTML transform anyway, and in so doing will have a good opportunity to tie everything together for release.

I’ve noticed that some aggregators currently balk at my feeds… let’s hope people update their codebases quickly. Not that I have grounds to speak, what with how long it took me to do this for my own site. I also notice that Bloglines really does have the bug I thought was solely my own fault.