Universal re'locator

What happens when nobody will take responsibility for a standard that the web relies on?

RSS, the standard millions of us use to syndicate content and to read other people’s syndicated content, was originally invented by Ramanathan Guha at Netscape, for use on its my.netscape.com portal. Soon afterwards Netscape lost interest in the format, leaving it ownerless until it was picked up by a development community spearheaded by UserLand Software. RSS 0.91 became 1.0 and 2.0, yet despite its deprecation the granddaddy of them all, 0.91, is still around and in use, arguably because of the vast overcomplications in its immediate successor and the divisions those caused in the community.

The problem with that is as follows. Every time someone views an RSS 0.91 syndication feed with certain types of syndication software, their computer attempts to fetch the DTD from a location on the my.netscape.com portal: the URL is hardcoded into the feed’s document type declaration, and it’s how the software establishes which XML format it’s dealing with. So this URL gets plenty of hits:

http://my.netscape.com/publish/formats/rss-0.9.dtd
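For reference, a typical RSS 0.91 feed opens with a document type declaration along these lines (the public identifier here is quoted from memory, so treat the exact strings as illustrative):

<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.9.dtd">

A validating parser dutifully dereferences the system identifier, the URL above, every time it loads the feed.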

Which is great, until Netscape decide—legitimately, one might argue—to update the my.netscape portal and get rid of the DTD. Which they did, at the start of the year. At that point, a good portion of the syndication lights go out across the world. And although we now have a moratorium until July 2007, nothing has really been solved in the long run.

Anyway, Netscape shouldn’t have to support the bandwidth of millions of DTD downloads for a standard they declared defunct—when did they sign the don’t-be-evil contract?—and maybe people should “just” move to a newer version of RSS, or Atom. But this whole episode is an Ozymandian warning of what is to come. We’ve reached the point where the URLs of industry (one-time) giants are simply no longer to be trusted as the location of standards.

One day Microsoft, and Sun, and IBM, will cease to exist, and their websites will become the 22nd-century equivalent of Google-adsensed search engines (Google, of course, will be around forever, more’s the pity). Sooner or later something really horrible will happen to the open communities: say, PURL disappearing for good, taking things like the Dublin Core XML specification with it. We need to work out now how to deal with the loss of specifications and standards, with the unreliability of the URL as a locator for DTDs and schemata. Or is the only lesson we can draw from history that we’re destined to wander from standard to standard as the specifications drop off the radar, leading the nomadic life of those standardized today, obsolete tomorrow?

Comments

Isn't this what the Public identifier is for? I've not fiddled with RSS, but the equivalent for XHTML 1.0 is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

where even if the URL is unavailable the "-//W3C//DTD XHTML 1.0 Strict//EN" should be sufficient to identify the DTD that, frankly, every RSS reader should have cached locally anyway.

(Relatedly, ISTR this is why there wasn't supposed to be anything downloadable at the URL that defines a namespace.)
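To make the caching point concrete: here's one sketch of how a reader could map public identifiers onto local copies, using the resolver hook in Python's lxml (the identifier-to-file table and the cache/ paths are invented for the example, not anything a real reader ships with):

from lxml import etree

class CachedDTDResolver(etree.Resolver):
    # Hand the parser a local copy for identifiers we recognise,
    # so validation survives the canonical URL going dark.
    LOCAL_COPIES = {
        "-//W3C//DTD XHTML 1.0 Strict//EN": "cache/xhtml1-strict.dtd",
        "-//Netscape Communications//DTD RSS 0.91//EN": "cache/rss-0.91.dtd",
    }

    def resolve(self, system_url, public_id, context):
        local = self.LOCAL_COPIES.get(public_id)
        if local is not None:
            return self.resolve_filename(local, context)
        return None  # unknown identifier: let lxml's default resolution take over

parser = etree.XMLParser(load_dtd=True, no_network=True)
parser.resolvers.add(CachedDTDResolver())
doc = etree.parse("feed.xml", parser)

With no_network=True the parser never goes near my.netscape.com at all; anything not in the table simply fails to load, which at least fails fast instead of hammering someone else's server.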

I don't know what you mean by "for", really. I can't take that public identifier and use it to validate the content. You might as well say that you can treat the URL that follows it as a string that isn't really a URL, just an identifier, and junk the PUBLIC identifier as redundant. It doesn't solve the lookup problem.

Whether or not you cache the standard locally is a separate issue too, I think. If there are millions of copies of the DTD about the place then, as far as reliability is concerned, that's as bad as having no single authoritative copy. And if your cache has no expiry time after which it goes and rediscovers the authoritative standard, then you run the risk of being unable to refresh a somehow-polluted cache.
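To put numbers on that trade-off, a cache along these lines is roughly what I have in mind; the TTL, the helper name and the in-memory dict are all made up for illustration:

import time
import urllib.request

TTL_SECONDS = 30 * 24 * 60 * 60  # re-check the authoritative copy roughly monthly
_cache = {}  # url -> (fetched_at, dtd_bytes)

def get_dtd(url):
    entry = _cache.get(url)
    if entry is not None:
        fetched_at, body = entry
        if time.time() - fetched_at < TTL_SECONDS:
            return body  # still fresh: no network hit
    try:
        # Expired or missing: rediscover the authoritative standard.
        with urllib.request.urlopen(url) as response:
            body = response.read()
    except OSError:
        if entry is not None:
            return entry[1]  # authoritative copy gone: serve the stale one
        raise
    _cache[url] = (time.time(), body)
    return body

The expiry lets a polluted cache heal itself eventually, and the stale fallback covers the day the URL finally 404s; neither helps, of course, if the copy you cached in the first place was the polluted one.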

I'm glad to know there's some reason why you can't download anything at the URL that defines a schema namespace, though: it's a shame it doesn't really make up for the fact that any process whereby you might try to automate schema referencing, downloading and validating is fundamentally screwed by having bugger-all to download at the xmlns: URL, even if the relevant authoritative standards are still alive and well.