Good HTTP citizenship for DiggBar protesters
Saturday, 11 Apr 2009 [Thursday, 16 Apr 2009]
John Gruber started a bit of a wave by blocking the DiggBar on his site and explaining how others can do the same.
However, his implementation as it currently stands does not play well with search engines and caching proxies.
The problem with search engines is easily stated and easily solved: a search engine that comes in via a DiggBar link will index the get-lost page as the regular content for that page.
To avoid that, simply send the page with a status of 403 Forbidden rather than 200 OK, thus telling the search engine that there’s nothing there for it to see. This is trivial to do – in PHP, just add a line like this to the code:
header("Status: 403");
Note that this needs to happen before any output is produced; i.e. in John’s example it would go above the echo line.
Fixing the caching proxy problem is not as nice.
The issue is what happens when two users request the page through the same caching proxy: the version of the page that’s served to the first visitor will be cached and served to the second visitor as well. But if one of them hits the page via the DiggBar and the other comes in via legitimate venues, then the second visitor will get the wrong version of the page no matter the order in which they hit the site. Of course the worst case is when the DiggBar user was first: then the second, legitimate visitor will be served the get-lost page.
Just the response status fix above should help a little with that: proxies generally use different expiration rules for error responses such as a 403.
To really fix the problem, however, you need to also send a “Vary: Referer” header. This asks proxies to keep a separate copy of the page for each distinct Referer they see a client request it with. This ensures that the correct page will be served to everyone; it even ensures that caching remains possible despite the page content possibly varying depending on the source link. However, while correct, it is probably hard to stomach: it will increase the volume of proxy requests by roughly a factor of the number of distinct links to your site. That’s easily an order of magnitude for popular weblogs like Daring Fireball. Then again, not implementing it will penalise some innocent visitors.
Sending variable pages properly in an infrastructure with caches is not cheap.
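In John’s PHP setting this is again a single call, made before any output. Note that for the trick to work, the header has to accompany every response for the URI, blocked or not — a sketch under that assumption:

```php
<?php
// Declare that this URI's content depends on the request's Referer,
// so caches keep a separate copy per referring page. Must be sent
// on all responses for the URI, not only the blocked ones.
header("Vary: Referer");
```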
Update: in email, Mark Nottingham suggests simply making the 403 explicitly uncacheable by sending along a “Cache-Control: no-store” header.
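In the PHP example, that would be one more header() call alongside the status line (a sketch, again to be placed before any output):

```php
<?php
// Make the get-lost page explicitly uncacheable: a proxy that sees
// this response will not store it, so it can never be replayed to
// an innocent visitor sharing that proxy.
header("Status: 403");
header("Cache-Control: no-store");
```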
Update: Adrian Sutton:
So that’s roughly what has now been deployed to Symphonious.net. The key difference is that the “Vary: Referer” header that Aristotle suggests is only added when the page is blocked. This means it’s possible for someone using the DiggBar to get the real page from a caching proxy, but it shouldn’t be possible for an innocent user to get the blocked page.
That’s a clever trade-off. He compounds this with a Javascript solution to bust the DiggBar frame, because a user might be coming in through a link from another site that in turn has been framed – which isn’t obvious from the referrer. Nice work.
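Rendered in PHP, the idea might look roughly like this. The DiggBar test and the exact header set are illustrative assumptions (including Mark Nottingham’s no-store header), not Adrian’s actual deployment:

```php
<?php
// Sketch: compute the extra headers for a request. Only blocked
// (DiggBar) requests carry Vary and the cache-buster; everyone
// else gets the real page with normal caching.
function blocked_headers($referer) {
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host) || !preg_match('/(^|\.)digg\.com$/i', $host)) {
        return array(); // legitimate visitor: no special headers
    }
    return array(
        "Status: 403",
        "Vary: Referer",
        "Cache-Control: no-store", // per Mark Nottingham's suggestion
    );
}

$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
foreach (blocked_headers($referer) as $h) {
    header($h);
}
```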
Update: Mark Nottingham remarks that serving responses both with and without a Vary header for the same URI is likely to confuse caches.