Blosxom to WordPress: tying up loose ends

A busy few weeks, but they’ve included an import from a Blosxom blog to a WordPress blog which is worth describing. There are a couple of established methods for importing the data, and I opted for the one that seemed the most modular. This was Eric Davis’ Import-Blosxom method, consisting of a PHP script on the WordPress side and a set of Blosxom flavour files which produce a feed compatible with RSS 2.0. This separation of Blosxom and WordPress behaviours meant that I could thoroughly test the former before proceeding with the latter.

It worked very well with practically no configuration or edits, but there were a few issues with the out-of-the-box behaviour of the import script:

  1. Unicode character entities were being escaped in titles, exposing the raw code, e.g. “Z&#252;rich” instead of “Zürich”.
  2. WordPress converts newlines in post bodies to hard line breaks, so a newline inside a tag breaks it, e.g. ‘<a [newline] href="…">’ becomes ‘<a <br/> href="…">’. Such whitespace must be excised before the import.
  3. Multiple hierarchical categories are not supported (a known problem).
  4. Although categories are created and posts are linked to them, the per-category post count is never incremented, so the list of categories on the front end shows zero posts for every category (possibly owing to a change between WordPress versions in how this count is handled).

I’ve come up with a number of fixes that I’ve mentioned both to Davis and on the WordPress support forums. As they’ve been greeted with an eerie silence that I’ve found typical of such forums, I’ll put them up here instead.

To fix the first three problems I created rss_to_wp, a Blosxom plugin that, along with the standard interpolate_fancy package, you can use to wrap the title and category processing in the RSS 2.0 flavour templates. The plugin tackles the three problems above, respectively, by:

  1. Providing an interpolate_fancy method to unescape entities
  2. Normalizing any whitespace in the body of your Blosxom posts to single spaces
  3. Providing an interpolate_fancy method to convert a Blosxom-style category path into a set of category tags
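
The three transformations are simple enough to sketch. What follows is not the rss_to_wp plugin itself (which is a Blosxom plugin, and therefore Perl) but a rough Python illustration of the same ideas. The function names are hypothetical, and emitting one category tag per path component is an assumption about how a Blosxom path might map onto categories.

```python
import html
import re
from xml.sax.saxutils import escape


def unescape_title(title: str) -> str:
    """Turn entity references such as &#252; back into the characters they encode."""
    return html.unescape(title)


def normalize_whitespace(body: str) -> str:
    """Collapse every run of whitespace, newlines included, to a single space."""
    return re.sub(r"\s+", " ", body).strip()


def category_tags(path: str) -> str:
    """Emit one <category> element per component of a Blosxom-style path.

    Assumption: each path component becomes its own category tag.
    """
    parts = [p for p in path.split("/") if p]
    return "".join(f"<category>{escape(p)}</category>" for p in parts)


if __name__ == "__main__":
    print(unescape_title("Z&#252;rich"))
    print(normalize_whitespace('<a\n   href="x">link</a>'))
    print(category_tags("/travel/europe"))
```

Note that collapsing all whitespace also flattens preformatted blocks, so posts that rely on <pre> may need checking by hand.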

You’ll need to change the Davis-recommended story.rss20 template to implement the two interpolation methods. I’ve made a sample available.

The final issue was a more knotty problem, as it was a bug in the script (possibly caused by WordPress’ handling of categories changing over time). It’s easily fixed by adding a few lines to the category-handling part of import-blosxom.php as follows:

    if (!$exists)
    {
        $wpdb->query("INSERT INTO $wpdb->post2cat (post_id, category_id)
                      VALUES ($post_id, $cat_id)");
    }

    // JPS' addition - increment count if cat ID exists
    if ($cat_id) {
        $wpdb->query("UPDATE $wpdb->categories SET category_count = category_count + 1 WHERE cat_ID = $cat_id");
    }
    // End JPS' addition
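
If the import has already been run without this patch, another option is to recompute every count from the link table in one pass rather than incrementing as you go. The sketch below shows the idea against an in-memory SQLite database in Python, purely for illustration: the table and column names mirror those in the snippet above, but the schema, the cat_name column and the sample rows are assumptions, and a real WordPress database is MySQL with prefixed table names.

```python
import sqlite3


def recount_categories(conn: sqlite3.Connection) -> None:
    """Set each category's count to the number of posts linked to it."""
    conn.execute(
        "UPDATE categories SET category_count = "
        "(SELECT COUNT(*) FROM post2cat "
        " WHERE post2cat.category_id = categories.cat_ID)"
    )
    conn.commit()


# Hypothetical, cut-down schema and data for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (
        cat_ID INTEGER PRIMARY KEY,
        cat_name TEXT,
        category_count INTEGER DEFAULT 0
    );
    CREATE TABLE post2cat (post_id INTEGER, category_id INTEGER);
    INSERT INTO categories (cat_ID, cat_name) VALUES (1, 'travel'), (2, 'code');
    INSERT INTO post2cat VALUES (10, 1), (11, 1), (12, 2);
    """
)
recount_categories(conn)
counts = dict(conn.execute("SELECT cat_name, category_count FROM categories"))
print(counts)
```

A recount like this is idempotent, so it can be rerun safely after each trial import.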

Exit gracefully: exporting and then importing—transporting?—works well if the two tasks are separable. That way the integrity of the exported data can be checked in its transitory state and any bugs worked out, before it’s imported into the new system. It’s certainly worthwhile backing up the target database for the import, as this lets you preserve any quirks of your target database if you have to dump all the imported data and start again. The standard WordPress install includes a plugin for doing this, but the command-line tool mysqldump is arguably more powerful.
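
Checking the export in its transitory state can be partly automated. As a minimal sketch in Python, assuming the export is a single RSS 2.0 document, the function below looks for the kinds of problem described earlier. The list of required elements and the check_feed name are my own assumptions, not part of the import script.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: these are the elements the import needs for each post.
REQUIRED = ("title", "pubDate", "description")


def check_feed(xml_text: str) -> list:
    """Return a list of problems found in an RSS 2.0 export; empty means none.

    Raises xml.etree.ElementTree.ParseError if the feed is not well-formed.
    """
    problems = []
    root = ET.fromstring(xml_text)
    items = root.findall("./channel/item")
    if not items:
        problems.append("feed contains no items")
    for n, item in enumerate(items, start=1):
        for tag in REQUIRED:
            if not (item.findtext(tag) or "").strip():
                problems.append(f"item {n}: missing or empty <{tag}>")
        # A surviving "&...;" sequence suggests a doubly escaped entity.
        if re.search(r"&#?\w+;", item.findtext("title") or ""):
            problems.append(f"item {n}: title may still contain an escaped entity")
    return problems


sample = """<rss version="2.0"><channel><title>Example</title>
<item><title>Z&amp;#252;rich</title><pubDate>Mon, 01 Jan 2007 00:00:00 GMT</pubDate><description>Body</description></item>
<item><title>Fine</title><pubDate></pubDate><description>Body</description></item>
</channel></rss>"""

for problem in check_feed(sample):
    print(problem)
```

Running a check of this sort after every change to the flavour files catches regressions before anything touches the WordPress database.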