Save our servers!

Sick and tired of getting a million hits, all to the same page, which more often than not hasn’t been updated in the meantime? Want to reduce your bandwidth and server-time loads without necessarily impairing your visitors’ experience of your site?

If you haven’t ever had cause to use it, there’s an HTTP header called ETag out there which you can probably implement using existing technology, and which can sharply cut the cost of serving repeat visitors. Along with the longer-standing HTTP header Last-Modified it can be used with compliant browser/aggregator software to drastically lower your overheads, while scarcely impacting on your non-compliant users. And although the two headers see most use in the blogosphere, they can be used for anything else from a company’s record in a directory to enormous high-resolution image feeds from astronomy laboratories.

The idea is that you attach a couple of HTTP header lines to every page you send out in response to a request. That’s easier for the total n00b than it sounds: you can do it in one line each with, say, the PHP function header() or the ColdFusion tag <cfheader>. If you don’t have that level of access you can try it in the HTML using <meta http-equiv=.../>, but treat that as a last resort: it isn’t a real HTTP header, and aggregators and proxies generally won’t act on it. The point is that you set the following two flags on the “envelope” that surrounds the page you send to the browser:

Last-Modified: Wed, 15 Nov 2006 18:20:54 GMT
ETag: "78c4d3d8-1834-11dc-8314-0800200c9a66"

Typically for the first field you’ll want e.g. the latest date from your RSS feed, or the date on which a semi-static page was last edited. The second field is really up to you: if you never go back and edit posts without changing the published date then (a) well done you—there’s a space in heaven already reserved—and (b) you can just calculate ETag from the published date. It can actually be the published date if you’re a bit slack, although you might want to hash it instead, in the way that you might with passwords, to avoid any sloppy client software depending on ETags being dates.
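
If you’d rather see that spelled out, here’s a minimal sketch in Python of building both values from a single last-edited date. The function name make_validators is made up for illustration, and SHA-1 is just one convenient choice of hash; nothing here is the one true way to do it.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def make_validators(last_edited: datetime) -> dict[str, str]:
    """Build Last-Modified and ETag header values from a last-edited time.

    Hypothetical helper: assumes `last_edited` is a timezone-aware datetime.
    """
    # HTTP dates have one-second resolution and are always given in GMT.
    last_edited = last_edited.astimezone(timezone.utc).replace(microsecond=0)
    last_modified = format_datetime(last_edited, usegmt=True)
    # Hash the date so no client can mistake the ETag for a date.
    digest = hashlib.sha1(last_modified.encode("ascii")).hexdigest()
    # The ETag value includes its surrounding double quotes.
    return {"Last-Modified": last_modified, "ETag": f'"{digest}"'}


headers = make_validators(datetime(2006, 11, 15, 18, 20, 54, tzinfo=timezone.utc))
```

Note that this only works as an ETag if the published date really does change every time the content does. If you do edit posts on the quiet, hash the content itself instead.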

What happens next? Well, you won’t see anything at first. But compliant software that’s visited your site before will start sending you two headers that correspond to your original submissions:

If-Modified-Since: Wed, 15 Nov 2006 18:20:54 GMT
If-None-Match: "78c4d3d8-1834-11dc-8314-0800200c9a66"

The idea is that, quite early on in your workflow, you compare these to the values you’re about to send out. If they match, you can immediately terminate all further work and just send a “304 Not Modified” status with no page body. The result? Well, with quite complex pages, involving the computation of tag clouds and term hierarchies and archive structures, you skip all of that computation and the bandwidth of resending the page, because the remote client already has a copy and knows exactly what to do when you tell it, concisely, that nothing has changed since it last looked.
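
The comparison itself might look something like the following Python sketch. The function name is_not_modified and the plain-dict view of the request headers are assumptions for illustration; your framework will hand you the headers in its own way. It gives If-None-Match priority when both headers are present, and it splits a multi-value If-None-Match on commas, which is a simplification.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers: dict, current_etag: str,
                    last_modified: datetime) -> bool:
    """Return True when the client's cached copy still matches ours.

    Hypothetical helper: `last_modified` must be timezone-aware.
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Ignore any weak-validator prefix (W/) when comparing.
        ours = current_etag.removeprefix("W/")
        theirs = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        return "*" in theirs or ours in theirs

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # Unparseable date: play safe and send the page.
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        return last_modified.replace(microsecond=0) <= since

    return False  # No conditional headers: a first visit, so send the page.


edited = datetime(2006, 11, 15, 18, 20, 54, tzinfo=timezone.utc)
etag = '"78c4d3d8-1834-11dc-8314-0800200c9a66"'
fresh = is_not_modified({"If-None-Match": etag}, etag, edited)
stale = is_not_modified({"If-None-Match": '"something-older"'}, etag, edited)
```

Call this before you build the tag cloud, not after.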

Word to the wise, though: if you are going to send a 304, you should also send your two headers as you always would. If you don’t then you only win out every other time, because the remote client will see the absence of ETag and Last-Modified headers and duly forget the ones it had in its cache.
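
As a rough Python sketch of that last point: the function respond and its (status, headers, body) return shape are invented for illustration, and the ETag check is reduced to a bare string comparison to keep it short. The thing to notice is that both branches send the same two headers.

```python
from typing import Callable, Optional


def respond(client_etag: Optional[str], validators: dict,
            render_page: Callable[[], bytes]) -> tuple:
    """Return (status, headers, body), repeating the validators on a 304.

    Hypothetical helper: `validators` holds the Last-Modified and ETag values.
    """
    if client_etag == validators["ETag"]:
        # Nothing has changed: skip the rendering, but still send both
        # headers so the client keeps its cached validators.
        return "304 Not Modified", dict(validators), b""
    # First visit or stale copy: do the expensive work and send the page.
    return "200 OK", dict(validators), render_page()


validators = {
    "Last-Modified": "Wed, 15 Nov 2006 18:20:54 GMT",
    "ETag": '"78c4d3d8-1834-11dc-8314-0800200c9a66"',
}
render_calls = []


def render_page() -> bytes:
    render_calls.append(1)  # Stands in for tag clouds, archives and so on.
    return b"<html>the whole page</html>"


first_visit = respond(None, validators, render_page)
repeat_visit = respond(validators["ETag"], validators, render_page)
```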