Crouching Harold, hidden formats

Elliotte Rusty Harold roundly disses microformats, comparing the practice of utilising them to homeopathy, of all “disciplines.” A bit of cheeky banter, so it’s probably churlish to point out that the comparison itself turns out to be unsound within his own argument: whereas homeopathy might arguably be no solution to any problem, Elliotte’s beef with microformats seems to be that they solve a problem—expressing non-XHTML structure within XHTML—for which he believes there are more efficient solutions.
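For anyone who hasn’t met one, a microformat expresses that extra structure using nothing but ordinary, DTD-valid XHTML, typically agreed class names on existing elements. A minimal hCard, purely by way of illustration (the name, organisation and URL are invented):

    <div class="vcard">
      <a class="url fn" href="http://example.com/">Joe Bloggs</a>,
      <span class="org">Example Widgets Ltd</span>
    </div>

No new elements, no new attributes: just class values with agreed meanings.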

Not being immediately aware of any such alternative (microformats having evolved from a number of web practitioners’ frustrated wishes to add extra semantics to their XHTML), I was a bit surprised to read:

The only reason I can imagine you might choose a microformat over a macroformat is because macroformats are invalid XHTML, but so what? XML doesn’t have to be valid! That’s a deliberate design decision in XML. Some say invalidity is the real revolution in XML. It’s what XML brings to the table that SGML never had.

Well. This is true, in a sense, but not really pertinent to the actual problem that microformats are intended to solve. SGML didn’t “have” to be valid, if we’re talking pragmata (you say prag-may-ta). Millions of webpages out there had the most godawful pseudo-HTML on them, and browsers muddled along reasonably well. But that wasn’t enough. Our data didn’t soar. Our browsers didn’t leap, nor did they bound.

There were a number of motivations behind establishing DTDs for HTML, let alone for XHTML. One important one was being able to hand a webpage to someone who wants or needs to maintain it in good HTML, and to give them an editor that could enforce the standards, either by beeping at them (Sean McGrath might turn in his blog at that one) or by quietly fudging good HTML in the background without telling them. If you start putting arbitrary tags in, you break the silent checks that mean non-technical people can actually write webpages.
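By way of example (mine, not Elliotte’s): a DTD-aware editor will beep at the first fragment below, because person is not an element the XHTML 1.0 DTD knows about, but it will pass the second without complaint, even though it carries the same extra meaning:

    <!-- invalid XHTML: <person> isn't in the DTD, so the silent checks fail -->
    <p>Contact <person>Jane Smith</person> about the funding.</p>

    <!-- valid XHTML: the extra semantics ride on a class attribute instead -->
    <p>Contact <span class="fn">Jane Smith</span> about the funding.</p>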

Many of the clients I work with have government or charitable funding, and a proviso of this funding is that their pages be accessible to web users with special needs, and strongly future-proofed in the light of past mistakes. In part for the reasons above, but generally because it’s safest to enforce as strongly as possible without affecting the primary goal of the medium, accessibility standards and funding bodies’ definitions of “future-proofing” tend to require pages to be valid XHTML 1.0. It’s fine for someone technical to say to themselves “this is valid XHTML 1.0, except for the bit I’m putting in now”, but there’s no guarantee that the next person to touch that page will understand or care. Or the next person. Or the next person. Didn’t we… didn’t we tackle the problem of markup rot once already?

Adding your own tags to XHTML is something that Elliotte Rusty Harold can do with aplomb, and probably ought to because he knows what he’s doing. And if he’s got the spare time to maintain every site on the web on their owners’ behalf, to ensure we don’t return to the tag soup of HTML that one couldn’t even begin to validate, then he’s welcome to give it a go. In the meantime you might want to campaign for an exception to current UK disability law, or at any rate to almost every organisation’s interpretation of it, in the case of sites maintained by Elliotte Rusty Harold (I’ll even sign the petition if there’s one passed round), but I can’t see it gaining much traction.

There’s a pragmatic solution to the validation problem, though. We could reprogram our XHTML validation engines to ignore specific blocks of markup based on particular criteria: say, a reserved attribute on particular elements meaning that the content is still checked for well-formedness (CDATA sections won’t do, because they can contain absolutely anything, and you lose the power of XML) but is ignored by DTD and schema validation. Might I suggest <div class="xhtml-ignore"/>…? It’s funny, but that particular method rings a bell.
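In case the bell needs help ringing, the hypothetical wrapper might look something like this (xhtml-ignore being my invented name from the paragraph above; no real validator honours it):

    <div class="xhtml-ignore">
      <!-- well-formed, but skipped by DTD and schema validation -->
      <recipe><ingredient>plain flour</ingredient></recipe>
    </div>

Swap the one made-up class for an agreed vocabulary of class names on ordinary elements and you have, give or take, a microformat.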