module

Migrating users and profiles with the Migrate module

At Torchbox we’ve been working a lot with the Migrate module recently. It’s a framework for representing relationships in the data you want to import: both relationships between bits of data, and relationships between the data and relevant Drupal entities. It used to be a GUI-driven system, but now it seems quite code-heavy.

That suits us fine, certainly for one-off imports, which are most straightforwardly built by developers. We had to write an import for around 4600 users, plus data which we wanted to store in three Profile2 profiles (address, subscriptions and personal information).

Migrate, along with some code examples, did a lot of the heavy lifting for us. The most useful one was this example import module, started by wusel and edited by (among others) Profile2’s maintainer joachim. As a thank-you I’ve contributed to that post a bit, tidying things up and incorporating some of the lessons we learned.

The results? A flawless migration, including mapping many CSV columns to three multi-valued entity fields for the subscriptions (basically an ORM mapping). And it was blisteringly fast: on a reasonably high-spec server, users imported at a rate of 12592/min, and profile imports peaked at 9954/min.
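For the subscriptions, the "ORM mapping" boiled down to folding a set of yes/no CSV columns into the values of one multi-valued field. Here's a minimal, framework-free sketch of that step. The column names, the Y/N convention and the function name are all assumptions for illustration; in a Migrate class, a row-preparation method such as prepareRow() would be a natural home for logic like this.

```php
<?php
/**
 * Fold several yes/no CSV columns into one list of values, suitable for a
 * multi-valued field. Hypothetical sketch: column names and the "Y" flag are
 * assumptions, not taken from the real import.
 *
 * @param array $row      One CSV row, keyed by column name.
 * @param array $columns  Map of CSV column name => field value to store.
 * @return array          The values whose column was flagged "Y".
 */
function subscriptions_from_row(array $row, array $columns) {
  $values = array();
  foreach ($columns as $column => $value) {
    // Missing columns count as "not subscribed"; tolerate stray whitespace
    // and lower-case flags in the export.
    if (isset($row[$column]) && strtoupper(trim($row[$column])) === 'Y') {
      $values[] = $value;
    }
  }
  return $values;
}

$columns = array(
  'sub_newsletter' => 'newsletter',
  'sub_events' => 'events',
  'sub_appeals' => 'appeals',
);
$row = array(
  'email' => 'a@example.com',
  'sub_newsletter' => 'Y',
  'sub_events' => 'n',
  'sub_appeals' => ' y ',
);
echo implode(',', subscriptions_from_row($row, $columns)), "\n"; // prints "newsletter,appeals"
```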

Webform Protected Downloads and zero expiry time

Webform protected downloads is an extension to the webform module, which turns a webform submission into a key which unlocks protected attachments through a hash emailed to the submitter. The hash can be set to have an expiry time, in seconds from the email being sent; a zero expiry time means “never expire”.

However, the module’s internals handle zero expiry times slightly incorrectly. Dave posted a patch for the first problem in late March: zero times were treated as expiring immediately. I’ve just posted a new patch for the second: zero times went into the database as zero just fine, but then the cron job deleted them, as if “0” meant “1 January 1970”! Hopefully this fix will make it into the next release of WPD.
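Both bugs come down to treating 0 as a sentinel rather than as a number. Here's a minimal sketch of zero-aware expiry handling, assuming expiry is stored as an absolute Unix timestamp with 0 meaning "never"; the function names are hypothetical and this is not the module's actual code. In SQL terms, a cron DELETE of expired rows needs an extra condition along the lines of expires <> 0.

```php
<?php
// Hypothetical helpers modelling zero-aware expiry; not the module's own code.

/** When a hash should expire. 0 seconds of validity means "never" (stored as 0). */
function hash_expiry_timestamp($sent, $valid_for_seconds) {
  // A naive $sent + $valid_for_seconds would turn 0 into
  // "expires the moment it is sent".
  return $valid_for_seconds == 0 ? 0 : $sent + $valid_for_seconds;
}

/** Has a stored expiry timestamp passed? 0 is the sentinel, not 1 January 1970. */
function hash_has_expired($expires_at, $now) {
  return $expires_at != 0 && $expires_at < $now;
}

/** Cron-style purge: keep only rows whose hash is still valid. */
function purge_expired(array $rows, $now) {
  $keep = array();
  foreach ($rows as $row) {
    if (!hash_has_expired($row['expires'], $now)) {
      $keep[] = $row;
    }
  }
  return $keep;
}
```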

Feeds objects within feeds objects

The Drupal Feeds module consists of layers of objects, tunnelling between each other, like a pearl onion on a cocktail stick

We've been doing a lot of work with the Drupal Feeds module recently. The frontend is nice enough, although our theme's CSS rendered the sub-navigation almost illegible. The online tutorials need work, and the admin navigation needs to be a bit more robust to layout changes; but once that's done, it will be the de facto way for people to consume feeds on their Drupal sites.

The most recent work we've been doing involved custom integration with RSS feeds arriving effectively as PHP string variables containing all the XML. This is different from either a file on disk or a remote URL: in fact, we had a Python program creating the RSS for us via a shell (which in turn, horribly, was hitting a remote Oracle database using cx_Oracle). Feeds was definitely up to the job in terms of power. In fact, it was quite a toolkit of useful functionality, which is Drupal code for "incredibly powerful but almost incomprehensible".

It's not that the developer documentation for Feeds isn't decent: it's pretty good. But it's limited in scope: it tells you roughly how to expose your own Feeds-like objects to the admin interface, but not really how all those objects interact. Most importantly, we wanted to know what happened on a cron run: this is the bedrock of how Feeds works on your site, after all.

I poked around a bit and this is what I discovered:

Workflow of a Feeds cron run

Here's a summary of the diagram above, to give you some idea of what's going on.

  1. Drupal's cron creates a FeedsScheduler object and passes it a "job", which is all the configuration for a feed call, including any configuration that was attached originally to the particular node which defines the Feed. The scheduler creates a FeedsImporter and passes it the job; the importer then creates a FeedsSource and embeds itself in it as a parent. In each case, the method ::work() is called to create the child/helper object.
  2. The Source object is what now runs the three phases of feed consumption, via its parent Importer. The Source asks the Importer for the relevant Fetcher, Parser and Processor objects: for example, the HTTP Fetcher, the RSS Parser and the Node Processor objects are strung together to turn an RSS feed at an HTTP URL into a set of nodes, one per entry. Each of these has a relevant, verb-like method name: ::fetch() for the Fetcher, and so on. The common currency is a FeedsBatch object, which gets passed between them and needs to have methods that make it feel like a batch of feed objects.
  3. After the three phases have run, the Source calls hook_feeds_after_import() to do any tidying, then quits to the Importer, which quits to the Scheduler, which then runs its ::finished() method on the job, and the cron run for this particular feed is done.
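To make the three phases concrete, here's a toy model of a single import run, with a batch object carried through fetch, parse and process. The Toy* classes are hypothetical stand-ins that only borrow Feeds' vocabulary: the real classes have different constructors and signatures, the real Source gets its plugins from the Importer rather than having them passed in, and the after-import hook and scheduler bookkeeping are left out.

```php
<?php
// Toy model of fetch -> parse -> process. Hypothetical classes, not Feeds' own.

class ToyBatch {
  public $raw = '';        // Filled in by the fetcher.
  public $items = array(); // Filled in by the parser.
  public $created = 0;     // Counted by the processor.
}

class ToyFetcher {
  private $xml;
  public function __construct($xml) { $this->xml = $xml; }
  public function fetch(ToyBatch $batch) { $batch->raw = $this->xml; }
}

class ToyRssParser {
  public function parse(ToyBatch $batch) {
    $rss = simplexml_load_string($batch->raw);
    if ($rss === FALSE) {
      throw new UnexpectedValueException('Fetched data is not valid XML');
    }
    foreach ($rss->channel->item as $item) {
      $batch->items[] = array('title' => (string) $item->title);
    }
  }
}

class ToyProcessor {
  public $saved = array();
  public function process(ToyBatch $batch) {
    foreach ($batch->items as $item) {
      $this->saved[] = $item['title']; // A node processor would save a node here.
      $batch->created++;
    }
  }
}

class ToySource {
  // The batch is the common currency handed through all three phases.
  public function import($fetcher, $parser, $processor) {
    $batch = new ToyBatch();
    $fetcher->fetch($batch);
    $parser->parse($batch);
    $processor->process($batch);
    return $batch;
  }
}

$xml = '<rss version="2.0"><channel><title>T</title>'
  . '<item><title>One</title></item><item><title>Two</title></item>'
  . '</channel></rss>';
$processor = new ToyProcessor();
$source = new ToySource();
$batch = $source->import(new ToyFetcher($xml), new ToyRssParser(), $processor);
echo $batch->created, "\n"; // prints 2
```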

When you build a new plugin, you need to implement hook_feeds_plugins() in a module and reference a class file: this class will be selectable in the admin interface for one of the three consumption phases, depending on which class it's ultimately based on. You should therefore extend existing classes rather than start from scratch. There are abstract PHP classes in the Feeds module directories, which give you skeleton "interfaces" that you can then flesh out with relevant methods. Better still, extend a concrete class: for example, extend the HTTP fetcher to fetch from a command on disk (which is what we did) or, say, extend the CSV parser to interrogate JSON.

Class hierarchies mean you don't have to spend a lot of time reinventing the wheel or hacking existing modules until they become unupgradeable; instead you can take existing classes and tweak them through inheritance, experimenting as you develop.
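Here's a sketch of that inheritance approach, using a stand-in base class rather than Feeds' real FeedsHTTPFetcher (whose methods and signatures differ). The base class owns the shared behaviour, and each subclass overrides only the step that gets the raw data: one runs a command, as we did, and one uses XML already held in a PHP string. All class names are hypothetical.

```php
<?php
// Hypothetical stand-in for an HTTP fetcher plugin; not Feeds' real class.
class SketchHttpFetcher {
  protected $source;
  public function __construct($source) { $this->source = $source; }

  // Shared behaviour every subclass inherits: get raw data, then sanity-check it.
  public function fetch() {
    $raw = $this->getRaw();
    if (strpos(ltrim($raw), '<') !== 0) {
      throw new UnexpectedValueException('Fetched data does not look like XML');
    }
    return $raw;
  }

  // The one step subclasses override. Here, $source is a URL.
  protected function getRaw() {
    $raw = file_get_contents($this->source);
    if ($raw === FALSE) {
      throw new RuntimeException("Could not fetch {$this->source}");
    }
    return $raw;
  }
}

// $source is a shell command (e.g. a script that prints RSS).
// Never build it from user input.
class SketchCommandFetcher extends SketchHttpFetcher {
  protected function getRaw() {
    $out = shell_exec($this->source);
    if (!is_string($out) || $out === '') {
      throw new RuntimeException("Command produced no output: {$this->source}");
    }
    return $out;
  }
}

// $source is the XML itself, already held in a PHP string.
class SketchStringFetcher extends SketchHttpFetcher {
  protected function getRaw() {
    return $this->source;
  }
}
```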

Drupal module: watermarking your development sites

If you've ever been programming in dev-test-prod environments and thought "now, where am I?" then this might be for you.

Developing Drupal in a development/test/production environment has a lot of advantages. Each developer's work is sandboxed, staging is straightforward, and deployment to live is the stuff of Capistrano scripts, especially if you unite the entire ecosystem of separate environments with version control.

However, it can lead to confusion over precisely where your browser is currently pointing: at best, this can be comical; at worst, it can result in either the loss of live content or a staged site logjammed with content intended for live. Suddenly a staging environment is out of action until that content can be exported to the live site.

Enter devwatermark β-0.1, a D5 module that watermarks any non-live site with a little banner overlay on the right-hand side. When first enabled (you can do this on your live site), it inserts no such banner. However, as you add live domains to its configuration, it begins to work out when it's not on a live site and tags your browser window with a "DEVELOPMENT" banner: like watermarking your printouts with "DRAFT", or maybe like dog-earing a page in a book. If your site has to respond to multiple domains (though if the content is the same, you should really serve up 301s instead), you can add those extra domains to the configuration as required.
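The decision devwatermark makes can be sketched as a small function: compare the current host against the configured domains, and show the banner only when something is configured and the host isn't on the list. This is an illustrative sketch, not the module's code; the function names and banner markup are assumptions, and in the module the markup would be emitted through hook_footer.

```php
<?php
/**
 * Should this request get the "DEVELOPMENT" banner? Hypothetical sketch.
 * $live_domains holds every non-development host: live domains, plus any
 * staging hosts you want to protect from being wiped.
 */
function devwatermark_should_watermark($host, array $live_domains) {
  if (!$live_domains) {
    return FALSE; // Freshly enabled: nothing configured, so no banner anywhere.
  }
  // Host names are case-insensitive and the Host header may carry a port.
  $host = strtolower(preg_replace('/:\d+$/', '', $host));
  foreach ($live_domains as $domain) {
    if ($host === strtolower($domain)) {
      return FALSE;
    }
  }
  return TRUE;
}

/** Banner markup to append to the page footer, or '' on live hosts. */
function devwatermark_banner($host, array $live_domains) {
  return devwatermark_should_watermark($host, $live_domains)
    ? '<div class="devwatermark">DEVELOPMENT</div>'
    : '';
}
```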

Having devwatermark enabled on live means that when you bring a live database down to a staging site to test the next round of updates, the change in domain will make devwatermark automatically show its banner. You know instantly when you're in a development (and hence content-volatile) environment. It also means you can mark staging sites as non-development: by adding them to devwatermark's configuration, you confirm to the developers that this is a database environment they can't wipe and start again.

As I mentioned above, it's currently only available for Drupal 5, and your theme has to respect hook_footer (most do out of the box). But please feel free to download it and give it a try.

New alpha version of Drupal EditInline module

EditInline is four, er, alpha subversions old. I bought it a cake.

My Drupal module for editing nodes inline, EditInline, is now at version ɑ-0.4. To summarize, the module lets you edit the current node, or any other node whose title is rendered through Views or a node template, in a lightbox overlay. That means you don't always have to navigate to (or even know how to navigate to) a piece of content in order to edit it, making editing more accessible and intuitive.

Now you can also edit nodes in nodereference fields while you're on the page editing the current node! That means you can be on the edit page for e.g. a publication, but edit the author biography node attached to it by a CCK nodereference field. There are little edit buttons beside the nodereference autocomplete fields for this. Also, once you've edited the node in the lightbox overlay and it's closed, any title changes are reflected in situ, to help you envisage how the page will look without having to refresh.

I did some of the work (mostly that leading from ɑ-0.2 to ɑ-0.3) during handy gaps between talks at Drupal Camp UK, held at BBC Manchester a couple of weeks ago. I wish I'd caught the wave of blogging about it at the time, as it was tremendous fun. The talks were all of a very high standard, but what felt more important to me was meeting people in the UK's Drupal community, and realising at first hand that the fun-loving, Drupal-interested, hard-drinking weirdos (that I'd always hoped were hiding here and there on the IRC channels and forums) really do exist.

EditInline second alpha release

Further improvements to EditInline mean it's actually worth a second alpha release. Good heavens.

EditInline was first discussed here. It's a Drupal module that adds handy editing links inline with each node title; rather than taking you to a separate editing page, these links open an editing interface in a lightbox overlay on the current page.

It's currently in alpha but available under the GPL on the Torchbox public Subversion repository. I've recently prepared another alpha subversion release, so if you want to have a look at EditInline ɑ-0.2 then feel free to download it and let me know what you think.

(The main improvement from ɑ-0.1 is a dedicated form workflow for editing inline, so that submitting the form automatically closes the lightbox overlay. Future releases should include CSS hooks into the lightbox content and other usability improvements.)

Inline edit links, but not editing inline

Squaring the circle of simple CMS usability with complex content representations, with a neat low-footprint Drupal module

It's heartwarming, and really encouraging, to see that Drupal 7 is undergoing a usability review. Drupal's a massively functional CMS, but all the functionality in the world won't help you when the average CMS user (for which read: can't write HTML, let alone PHP) can't discover it. There's a common misconception that usability is the finishing touches you add to an application if you've got time, the icing on the cake; but if your application lays any claim to maturity then its usability is the cake, and all that functionality you were so proud of is, without usability, just eggs and flour.

One of the main usability improvements suggested by the usability team (and largely shouted down by the technical team) is the ability to edit inline on the page: that is, to log in as an admin and have any bit of the page "active", so that clicking on it turns it into an edit box with the text inside. Flickr does this especially well, letting you edit title and description on photo pages and lists of photos by just clicking on the apparently uneditable text. But Flickr has the advantage that there's very little form on top of its content: it's a delivery mechanism for the raw metadata about photos, and the photo itself.

The other end of the spectrum, which complex CMS sites have every right to sit on, is a rich and complicated mapping between the storage of a node's content in the database and its eventual display in the browser. To take a page from a recent Torchbox project at random: how would you expect areas of this page from the Joseph Rowntree Foundation's website to behave when you clicked on them? If you have to hardcode print statements in your PHP templates, what do you print? How do you get editing inline to work? What happens when content is brought in from other, related nodes, and mixed in with the other content before display?

I can appreciate both sides of this story of user experience versus technical practicality, although it's not sufficient to expect the usability team to discard the idea merely because there's no correspondence between page content and database content: that's only an argument for why Drupal doesn't currently have edit-on-page. The usability project is moving forwards rapidly, and while there's clearly a tension between usability for the CMS user and technical feasibility (usability for the developer, if you like), it will need to be resolved soon if this marvellous work, and a great opportunity, are not to go to waste. And resolving that conflict will involve some sort of compromise on both sides.

One possible compromise would be to offer edit links, when Drupal can spot a sort-of 1-to-1 correspondence between a fragment of page content and the node that supports it. Page templates and views---specifically hook_preprocess_node and hook_views_pre_render---know full well that what they're processing is a node. And they generally know what field the node title will be in. So let Drupal rewrite the title, to add an "edit inline" link. If anyone clicks on this link, then pop the node-edit form up in a lightbox for editing.
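Here's a minimal sketch of the title rewrite itself, written as a plain function so it can run outside Drupal. The function name and CSS classes are assumptions; in a module, the title, node ID and access check would come from the preprocess/pre-render hooks, the link would be built with Drupal's l(), and the "thickbox" class assumes a lightbox script that binds to links carrying that class, as the Thickbox plugin does.

```php
<?php
/**
 * Append an "[edit inline]" link to an already-rendered, already-escaped
 * node title. Hypothetical helper, not EditInline's actual code.
 */
function editinline_title($title_html, $nid, $can_edit) {
  if (!$can_edit) {
    return $title_html; // Visitors without edit access see the title untouched.
  }
  // Cast the node ID so nothing unexpected ends up in the URL.
  $href = '/node/' . (int) $nid . '/edit';
  return $title_html . ' <a class="thickbox editinline" href="'
    . htmlspecialchars($href, ENT_QUOTES) . '">[edit inline]</a>';
}
```

Keeping the original title markup as-is and only appending to it means themes that restyle titles carry on working; the link is the only new element.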

Here are some screenshots of what I've been working on, in an attempt to get people interested (click for bigger versions). Firstly, here's what the anonymous site visitor sees:

Homepage for an anonymous site visitor

Next, here's what happens when a user has just logged in. Note that the brilliant Admin menu module kicks in, giving the user a black navigation bar across the top. But, more pertinently, each node title also now has an "[edit inline]" link beside it:

Homepage for a logged-in admin user

If the logged-in user clicks on one of these new links, then our edit-inline module kicks in and, using the equally brilliant Drupal Thickbox wrapper module, provides a stripped-down version of the node-edit page in a Thickbox overlay, both speeding up node editing using AJAX calls and also letting the user cancel the node-edit procedure and return to the webpage they were on quickly:

Effect of clicking on an 'edit inline' link

To reiterate, you don't have to be on a node's page to edit it. All that matters is that the title of the node you want to edit passes through one of the supported pre-render hooks. Currently, clicking on save/preview/cancel takes you elsewhere rather than keeping you within the Thickbox, and we're also wrestling with getting CSS and JavaScript into the Thickbox overlay to support the nattier bits of node editing; but it's functional and, I hope, gives you some idea of how it would all work given a few more hours of bashing away at keyboards.

Anyway, there it is: a possible compromise. I've mentioned it in a comment on the d7ux blog, but I fear the comment might have been eaten by a spam trap. If anyone's interested in the project then email me, jp.stacey, at either gmail.com or torchbox.com, and say hello.

The multiple magics of Drupal search

Form API is magical; core Drupal search is a twist on that magic; hooking onto that twist puts your code on yet another level of weird.

Drupal's Form API handles so much work for you that you'd be a fool not to use it as much as possible. This code snippet:

function myform_some_form($form_state) {
  $form['text'] = array(
    '#type' => 'textfield',
    '#title' => t('Your submission'),
    '#default_value' => t('Enter some text'),
    '#description' => t('Please use this field to submit some text'),
    '#required' => TRUE,
  );
  return $form;
}

creates a form with:

  • A single textfield element
  • Accessible XHTML with form labels
  • Potentially localized labels, translated into any number of languages
  • A bit of similarly localized help text below the element
  • Validation of the form submission, with the field content marked as required

That's a separate item of form functionality for each array key. And as long as you use Form API, Drupal handles validation and input sanitization for you, thus massively reducing the risk of attack by SQL injection or XSS.

Bookmarkable search URLs with POSTed search terms

But there's a catch. To encourage best practice in terms of form submission and friendly URLs, Form API defaults to HTTP POST. If site searching used Form API (which it does) then what impact would that have? Successful searches could never be bookmarked, because the URL on its own doesn't capture the POST submission.

The search module tackles this by adding an extra twist to Form API. At the end of submission processing are the following two actions:

  • Call either the function named in $form['#submit'] or $ID_submit, where $ID is typically the name of the original form creation function ("myform_some_form" above)
  • Finally, either return to the original action page of the form, or redirect to any path specified in $form_state['redirect']

The search module therefore uses a function called search_form_submit to grab the POSTed search terms and redirect the user to search/$SEARCH_TYPE/$SEARCH_TERMS. $SEARCH_TYPE is "node" for Drupal's out-of-the-box textual node searching, but if you install another search module, such as Apache Solr, it'll be something like "apachesolr_search" instead. Result: bookmarkable search URLs.

Writing your own module to handle searches

This has important ramifications if you're trying to piggyback off core search somehow: if, say, you're still using core search or a third-party module for the actual result-finding, but then you want a page other than core search to display the results.

If you want the main site search form to redirect to your own pages, for example, then you have to (a) add your own submit callback to $form['#submit'] and then (b) use it to change the $form_state['redirect'] that core search has already set:

// Implementation of hook_form_alter(), adding an extra submit callback to
// search forms identified by their existing callback
function mysearch_form_alter(&$form, &$form_state, $form_id) {
  $submits = array(
    'box' => 'search_box_form_submit',
    'form' => 'search_form_submit',
  );
  if (isset($form['#submit']) && is_array($form['#submit'])
      && array_intersect($submits, $form['#submit'])) {
    // Appended, so it runs after core's callback has set the redirect
    $form['#submit'][] = 'mysearch_form_mysubmit';
  }
}
// Submit callback, which changes the redirect using a regular-expression replace.
// $form_state must be taken by reference, or the new redirect is thrown away.
function mysearch_form_mysubmit($form, &$form_state) {
  $form_state['redirect'] = preg_replace('/^search\/[^\/]+/', 'search/my_special_search',
    $form_state['redirect']);
}

Now you've got all your site search forms redirecting to a bookmarkable page at search/my_special_search/$SEARCH_TERMS. All you have to do now is write a menu callback for that page: from here on in you're on your own for now.
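A starting point for that menu callback might look like the following Drupal 6-style sketch. The page path matches the redirect above, but the function names are hypothetical, the page callback only echoes the escaped terms back as a placeholder, and the 'search content' permission is simply core search's own. The terms are read from the path with explode() limited to three parts, rather than taken as separate callback arguments, so that search terms containing slashes survive intact.

```php
<?php
/**
 * Implementation of hook_menu(): register the page the altered search forms
 * now redirect to. Hypothetical names throughout.
 */
function mysearch_menu() {
  $items = array();
  $items['search/my_special_search'] = array(
    'title' => 'Search',
    'page callback' => 'mysearch_page',
    'access arguments' => array('search content'),
    'type' => MENU_CALLBACK,
  );
  return $items;
}

/**
 * Pull the search terms out of a path like search/my_special_search/foo/bar.
 * Limiting explode() to three parts keeps any slashes inside the terms.
 */
function mysearch_keys_from_path($path) {
  $parts = explode('/', $path, 3);
  return count($parts) == 3 ? $parts[2] : '';
}

/** Page callback placeholder: run your chosen search backend here. */
function mysearch_page() {
  $keys = mysearch_keys_from_path($_GET['q']);
  return check_plain($keys);
}
```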

A WTF at the heart of your Drupal feed aggregation

Do try this at home, kids: but please have the decency to feel a little dirty about it.

Embedding JSON in XML. Hah, that's ridiculous, right? Almost as ridiculous as running a successful blog in .NET/ASP. Well, RSS can combine with JSON to quickly get a Drupal site to consume complex data structures over a webservice.

Drupal's core Aggregator module understands RSS2.0 with no tweaking, putting the text in the <description/> element into the content of quasi-node objects, so you can aggregate all sorts of syndicated content. You could build your own Google Reader if you liked that sort of thing, with articles from the BBC sitting alongside those from the Guardian.

So far so boring. And, on one level, it doesn't get much more interesting than that: Aggregator understands neither Atom XML (rich content) nor RSS that contains Dublin Core fields. There's therefore a limit to how much you can extend the actual XML format.

But what if you get a remote application to produce an RSS feed like this:

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
  <channel>
    <title>Hello, world</title>
    <link>http://example.com</link>
    <description>Recent updates</description>
    <language>en</language>
    <item>
      <title>Sample JSON encoded content</title>
      <link>Foo</link>
      <description>
        {"text": "This is some lovely JSON text"}
      </description>
      <pubDate>Mon, 24 Nov 2008 22:07:03 +0000</pubDate>
      <guid isPermaLink="false">none</guid>
    </item>
  </channel>
</rss>

"What if?" Well, you get a quasi-node of content whose body contains the literal JSON text. Not terribly exciting. But Drupal's powerful theming system means you can override the way that such content is displayed.

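Drop a file called aggregator-item.tpl.php into your theme's directory, containing the following:

<?php
// $content holds the item's <description> text, i.e. the JSON string
$data = json_decode($content);
if (is_object($data) && isset($data->text)) {
  // The feed is remote input, so escape it before printing
  print check_plain($data->text);
}
?>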

Voilà! You've unpacked the JSON data packet and accessed the content. And the packet, being JSON, can contain as much hierarchical data as you want. You could essentially encode whatever you liked at the webservice side and unpack it at the webconsumer side. You can't pickle objects very easily, unfortunately, but my recommendation is to avoid doing that sort of thing anyway.
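On the webservice side, the one thing to get right is escaping: JSON can contain "<" and "&", which must be escaped in the XML like any other text, and the feed parser undoes that escaping before Drupal sees the description. Here's a minimal sketch of an item builder in PHP; the function name and fields are illustrative, and a real producer could be written in any language.

```php
<?php
/**
 * Build one RSS <item> whose <description> carries a JSON payload.
 * Hypothetical helper. ENT_NOQUOTES escapes only &, < and >, which is all
 * that element content needs, and leaves the JSON's quotes readable.
 */
function json_rss_item($title, $guid, array $payload) {
  $esc = function ($s) {
    return htmlspecialchars($s, ENT_NOQUOTES, 'UTF-8');
  };
  return "<item>\n"
    . '  <title>' . $esc($title) . "</title>\n"
    . '  <description>' . $esc(json_encode($payload)) . "</description>\n"
    . '  <guid isPermaLink="false">' . $esc($guid) . "</guid>\n"
    . "</item>\n";
}
```

Parsing the result with any XML parser and passing the description text to json_decode() gives back the original array, which is exactly what the template above relies on.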

(You might need to empty your cache, if you've got any sort of zealous caching switched on. And this specific example needs PHP 5.2 or later, unfortunately: json_decode() is a recent addition to the already-polluted default PHP namespace. You could use PHP's serialize() format if you've got an older version of PHP, though calling unserialize() on remote data is risky, or some other serialized data format that PHP can understand.)

If you were building all this from scratch, then of course you'd use either XML or JSON throughout, and not this weird hybrid solution. If you were building it from scratch. And if you are building it from scratch: let me know when you're done.

Drupal for NGOs - first ever meet yesterday

More people use Drupal in UK NGOs than you think. And than was planned for at the first, full-to-bursting Drupal for NGOs meet-up.

Yesterday Neal, Tom and I wandered to London, where Rob Purdie was hosting the first ever Drupal for NGOs meeting at Amnesty International’s UK headquarters. It was a hot, dry evening, and Neal’s attempts to Brompton it over from Marylebone left him dry-mouthed enough to avoid the copious snacks that Rob and others had laid on for us.

It was clear after the first half hour or so that there were going to be far more people there than Rob had expected: I think in the end there were around 50 to 60 attendees. A brief, slightly confusing “speed-dating” session later, I also realised that there was a real cross-section of Drupal fans there. There were freelance theme developers, module coders, hardcore sysadmins, CSSers, end users, tech writers, Drupal beginners… A well-rounded audience, that hopefully stopped the meeting being too focussed on one layer of the CMS.

The talk from Tracy Frauzel at Greenpeace, about their experience with Drupal, was really enlightening, as was the phoned-in discussion from Joel Bassuk of Oxfam International (new site going live in 3–4 weeks’ time). It’s good to hear of people enjoying their transition from other systems to Drupal, even if the imports tend (like all data imports) to be occasionally painful.

It was also interesting to see how far people would tend to go with contributed modules, tweaking the theme and hammering away at the admin config, rather than building their own modules or (shudder) hacking core. Oxfam’s experience with forking Plone shows the perils of hacking core; to avoid doing that in Drupal, Greenpeace had used the usual locale hack to translate core strings to their liking (I say “usual”: I hadn’t seen it in the wild before, so again it was nice to hear a success story).

I’m really looking forward to seeing where Drupal for NGOs goes from here: maybe collaborative/accretive online conversations and resources, but most importantly the next event. This one was a really smooth first event, and it bodes well for future ones. From my experiences with the OGNs, I’ve learnt that organizing a fairly straightforward event can be incredibly stressful, and when it all works perfectly then nobody notices all the effort you’re making: that’s sort of the point of the effort, but it’s incredibly infuriating that people think you’re kicking back and feeling chilled! All the contributors, everyone who spoke or who volunteered some information, contributed to a great evening. Cheers, everyone.
