Good HTTP citizenship for DiggBar protesters
However, his implementation as it currently stands does not play well with search engines and caching proxies.
The problem with search engines is easily stated and easily solved: a search engine that comes in via a DiggBar link will index the get-lost page as the regular content for that page.
To avoid that, simply send the page with a status of 403 Forbidden rather than 200 OK, thus telling the search engine that there’s nothing there for it to see. This is trivial to do – in PHP, just add a line like this to the code:
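In its plainest form such a line is just `header('HTTP/1.1 403 Forbidden');`. The sketch below is a slightly more defensive variant under a few assumptions: `forbidden_status_line()` is a hypothetical helper name, and the fallback to HTTP/1.1 when the server does not report a protocol is a choice made for illustration, not part of John's script.

```php
<?php
// Hypothetical helper: builds the status line, reusing the protocol version
// the client spoke and falling back to HTTP/1.1 when it is unknown.
function forbidden_status_line(array $server): string
{
    $protocol = $server['SERVER_PROTOCOL'] ?? 'HTTP/1.1';
    return $protocol . ' 403 Forbidden';
}

// The status can only be changed while no output has been sent yet.
if (!headers_sent()) {
    header(forbidden_status_line($_SERVER));
}
```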
Note that this needs to happen before any output is produced; i.e. in John’s example it would go above the
Fixing the caching proxy problem is not as nice.
The issue is what happens when two users request the page through the same caching proxy: the version of the page that’s served to the first visitor will be cached and served to the second visitor as well. But if one of them hits the page via the DiggBar and the other comes in via a legitimate link, then the second visitor will get the wrong version of the page no matter the order in which they hit the site. Of course the worse case is when the DiggBar user comes first: then the second, legitimate visitor will be served the get-lost page.
The response status fix above should already help a little with that: proxies generally apply different expiration rules to error responses such as 403.
To really fix the problem, however, you also need to send a “Vary: Referer” header. This asks proxies to request a separate copy of the page each time they see a client request it with a different Referer. That ensures that the correct page will be served to everyone; it even ensures that caching remains possible despite the page content possibly varying depending on the source link.
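A rough sketch of how the header could be sent alongside the status fix. Everything named here is an assumption for illustration: `is_diggbar_referer()` and `headers_for_referer()` are hypothetical helpers, and the digg.com pattern is a placeholder rather than the check John's script actually uses. Note that the header goes out on the regular page as well, since a proxy that cached the regular page without it would have no reason to keep the two versions apart.

```php
<?php
// Assumption: DiggBar requests are recognised by a Referer on digg.com.
// The pattern is a placeholder, not the one from the original script.
function is_diggbar_referer(string $referer): bool
{
    return preg_match('#^https?://(www\.)?digg\.com/#i', $referer) === 1;
}

// Returns the header lines to send for a request with the given Referer.
function headers_for_referer(string $referer): array
{
    // Both versions of the page carry Vary: Referer, not just the blocked one.
    $headers = ['Vary: Referer'];
    if (is_diggbar_referer($referer)) {
        array_unshift($headers, 'HTTP/1.1 403 Forbidden');
    }
    return $headers;
}

// Headers can only be sent while no output has been produced yet.
if (!headers_sent()) {
    foreach (headers_for_referer($_SERVER['HTTP_REFERER'] ?? '') as $line) {
        header($line);
    }
}
```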
However, while correct, this fix is probably hard to stomach: it will increase the volume of proxy requests by roughly a factor of the number of distinct links to your site. That’s easily an order of magnitude for popular weblogs like Daring Fireball. Then again, not implementing it will penalise some innocent visitors.
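To make that factor concrete, here is a toy model of a single shared cache. It is only a sketch: `count_origin_requests()` is a hypothetical function, the traffic is made up, and expiry and revalidation are ignored entirely. It counts how often the proxy has to go back to the site for one page, with and without the header.

```php
<?php
// Toy model of a shared cache: counts how often the origin is contacted.
// $varyOnReferer = true models "Vary: Referer"; false models no Vary header.
function count_origin_requests(array $referers, bool $varyOnReferer): int
{
    $cache = [];
    $originRequests = 0;
    foreach ($referers as $referer) {
        // Without Vary the URL alone is the cache key; with it, URL plus Referer.
        $key = $varyOnReferer ? '/post|' . $referer : '/post';
        if (!isset($cache[$key])) {
            $cache[$key] = true; // the proxy fetches and stores a copy
            $originRequests++;
        }
    }
    return $originRequests;
}

// Made-up traffic: six visits to one page, arriving from three different links.
$referers = [
    'http://example.org/a', 'http://example.net/b', 'http://example.org/a',
    'http://example.com/c', 'http://example.net/b', 'http://example.com/c',
];

echo count_origin_requests($referers, false), "\n"; // 1
echo count_origin_requests($referers, true), "\n";  // 3
```

In this model the number of origin requests with the header equals the number of distinct referring links, which is the multiplication described above.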
Sending variable pages properly in an infrastructure with caches is not cheap.