Importing from Wordpress to Drupal

The first stage of building this site was to import the content from the previous site. Well, the first stage was actually to set the site up: to install Drupal and enable the relevant modules. At Torchbox we've got a custom install profile that does a lot of this for us, installing and configuring modules and creating users and roles. In fact, the full company profile does a lot more than I needed, and I've had to pare it back a little so that I've got less to maintain and worry about.

Importing content from Wordpress was largely handled by... the Wordpress Import module. There's a Drupal rule of thumb that if a few people have encountered a problem before, there's probably a module for it. Wordpress can export a site as an XML file called a WXR file. You put this file on the filesystem, and the module converts the content, freetext tags, category hierarchy and users/authors it contains.
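To give a feel for what the module is working with, here is a minimal PHP sketch that reads a WXR export with SimpleXML and lists each post's title, author, categories and tags. It is not the module's own code. It assumes the usual WXR conventions: wp:post_type marks posts versus pages, and categories and tags are both <category> elements distinguished by a domain attribute of "category" or "post_tag". The function name wxr_list_posts() and the sample document are hypothetical, made up for illustration.

```php
<?php
// Sketch only: lists the posts in a WXR export with their categories and tags.
// wxr_list_posts() is a hypothetical helper; the sample below is hand-written.

function wxr_list_posts(string $wxr): array
{
    $xml = simplexml_load_string($wxr, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($xml === false) {
        throw new RuntimeException('Could not parse the WXR document');
    }
    // Map prefixes to URIs as declared on <rss>; the wp URI includes the WXR
    // version (e.g. export/1.0/), so it is looked up rather than hard-coded.
    $ns = $xml->getDocNamespaces();

    $posts = [];
    foreach ($xml->channel->item as $item) {
        // wp:post_type distinguishes posts from pages and attachments.
        if ((string) $item->children($ns['wp'])->post_type !== 'post') {
            continue;
        }
        $categories = [];
        $tags = [];
        // Categories and freetext tags are both <category> elements,
        // told apart by their domain attribute.
        foreach ($item->category as $term) {
            $domain = (string) $term['domain'];
            if ($domain === 'category') {
                $categories[] = (string) $term;
            } elseif ($domain === 'post_tag') {
                $tags[] = (string) $term;
            }
        }
        $posts[] = [
            'title'      => (string) $item->title,
            'author'     => (string) $item->children($ns['dc'])->creator,
            'categories' => $categories,
            'tags'       => $tags,
        ];
    }
    return $posts;
}

$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
  <item>
    <title>Hello world</title>
    <dc:creator><![CDATA[admin]]></dc:creator>
    <category domain="category" nicename="news"><![CDATA[News]]></category>
    <category domain="post_tag" nicename="drupal"><![CDATA[Drupal]]></category>
    <wp:post_type>post</wp:post_type>
  </item>
  <item>
    <title>About</title>
    <wp:post_type>page</wp:post_type>
  </item>
</channel>
</rss>
XML;

print_r(wxr_list_posts($sample));
```

The namespace URIs are read from the document rather than hard-coded because the version number embedded in them has changed between Wordpress releases.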

The one tweak I had to make to the module was to import the article summaries, or excerpts: the little "humorous" quotes intended for blog listings. These appear in the WXR file as the <excerpt:encoded> element, and the Wordpress Import module has a handy utility function, wordpress_import_get_tag(), so I only had to add the following code at around line 621 to bring in the excerpt as a CCK field:

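// Added at around line 621 of the module: copy the WXR <excerpt:encoded>
// text into the field_summary CCK field ($post and $params are already
// in scope at that point).
$node["field_summary"][0] = array(
  "value" => wordpress_import_get_tag($post, 'excerpt:encoded'),
  "format" => $params['format'],
);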
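For anyone without the module to hand, here is a minimal sketch of the same idea in plain PHP with SimpleXML. It is not meant to reproduce wordpress_import_get_tag(). The excerpt is a namespaced element, so it is read through children() with the namespace URI declared on the <rss> element, then wrapped in the same value/format array shape used in the snippet above. build_summary_field() is a hypothetical helper, and the input format ID of 1 is a stand-in for $params['format'].

```php
<?php
// Sketch only: read a post's <excerpt:encoded> with SimpleXML and shape it
// like the field_summary array. build_summary_field() is hypothetical.

function build_summary_field(SimpleXMLElement $item, array $ns, $format): array
{
    // excerpt:encoded is in its own namespace, so reach it via children($uri).
    $excerpt = trim((string) $item->children($ns['excerpt'])->encoded);
    return [
        ['value' => $excerpt, 'format' => $format],
    ];
}

$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:excerpt="http://wordpress.org/export/1.0/excerpt/">
<channel>
  <item>
    <title>Hello world</title>
    <excerpt:encoded><![CDATA[ A humorous summary ]]></excerpt:encoded>
  </item>
</channel>
</rss>
XML;

$xml = simplexml_load_string($sample, 'SimpleXMLElement', LIBXML_NOCDATA);
$ns = $xml->getDocNamespaces();

$node = [];
// 1 stands in for an input format ID; in the module it comes from $params['format'].
$node['field_summary'] = build_summary_field($xml->channel->item[0], $ns, 1);
print_r($node);
```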

Overall, importing from Wordpress was pretty smooth (thanks to both Wordpress and Drupal, to give both technologies their due), but having content on a beta site is a double-edged sword. It's great to be able to see broadly how the site is going to look when it goes live, with everything in place; on the other hand, it's disappointing to be able to see broadly how the site is going to break when it goes live. The content exposed all sorts of little bugs: CSS formatting you'd forgotten about, now missing or slightly quirky; oversensitive output filters; slightly wonky imported URL aliases that needed a visit to the database to fix.

Once the content was in place, every configuration tweak I made on the Drupal site effectively froze development: I could no longer roll back and re-import, as that would have lost my configuration changes. The import also meant that for a week or so I was running two sites in parallel, writing blog posts on both, and getting a bit flustered about it all.

I think it was the right, pragmatic decision, even though it initially felt like a lot of overhead. The alternatives were worse. Writing some sort of module to apply the configuration changes was possible, but didn't really suit the way I wanted to fiddle with the site rather than run a set of fixes. Importing at the very last minute would have meant not finding most of the little irritations until they were publicly visible. And not putting content on the blog for a week wasn't really an option, what with Oxford Geek Night 14 rumbling on and our most excellent sponsors all clamouring to give us stuff. I just wish, as always, that I'd had twice as much time to do it all in.