api

Twitter pull loses time_ago

Drupal’s Twitter Pull module is a useful one: not just for its own UI elements, but also for getting tweets to format yourself. However, in a recent version change, it lost its $tweet->time_ago property on the tweet objects.

You can recreate this straightforwardly using the following PHP:

<?php
// Recreate the time_ago property that newer Twitter Pull versions no longer set.
$tweet->time_ago = format_interval(time() - $tweet->timestamp);

It’s a minor pain, but if you’ve exposed your own theme implementations to preprocess hooks (and why wouldn’t you?), it should be easy to add this to, for example, your template.php.
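As a sketch of where this could live: assuming your theme gets a preprocess hook for Twitter Pull's listing template, the loop below adds time_ago to every tweet. The hook name mytheme_preprocess_twitter_pull_listing and the 'tweets' variable are assumptions to check against your module version. So that it runs outside Drupal, mytheme_format_interval() is a hypothetical stand-in approximating core format_interval()'s default output (two units of granularity, untranslated); inside Drupal you would simply call format_interval().

```php
<?php
// Hypothetical stand-in for Drupal's format_interval(): the two largest
// non-zero units, untranslated, e.g. "1 hour 2 min".
function mytheme_format_interval($seconds, $granularity = 2) {
  $units = array(
    'year' => 31536000,
    'week' => 604800,
    'day' => 86400,
    'hour' => 3600,
    'min' => 60,
    'sec' => 1,
  );
  $parts = array();
  foreach ($units as $name => $size) {
    if ($seconds >= $size) {
      $count = (int) floor($seconds / $size);
      // As in Drupal's English strings, "min" and "sec" are never pluralised.
      $plural = ($count > 1 && $name != 'min' && $name != 'sec') ? 's' : '';
      $parts[] = $count . ' ' . $name . $plural;
      $seconds %= $size;
      $granularity--;
    }
    if ($granularity == 0) {
      break;
    }
  }
  return $parts ? implode(' ', $parts) : '0 sec';
}

// Hypothetical preprocess hook for Twitter Pull's listing theme hook.
function mytheme_preprocess_twitter_pull_listing(&$variables) {
  if (empty($variables['tweets'])) {
    return;
  }
  foreach ($variables['tweets'] as $tweet) {
    // Tweets are objects, so setting the property here updates the listing.
    $tweet->time_ago = mytheme_format_interval(time() - $tweet->timestamp);
  }
}
```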


Feeds objects within feeds objects

The Drupal Feeds module consists of layers of objects, tunnelling between each other like a pearl onion on a cocktail stick.

We've been doing a lot of work with the Drupal Feeds module recently. The frontend is nice enough, although our theme's CSS rendered the sub-navigation almost illegible. The online tutorials need work, and the admin navigation needs to be made a bit more robust to layout changes; once that's done, it will be the de facto way for people to consume feeds on their Drupal sites.

The most recent work we've been doing involved custom integration with RSS feeds arriving, in effect, as PHP string variables containing all the XML. This is different from either a file on disk or a remote URL: in fact, we had a Python program creating the RSS for us via a shell (which in turn, horribly, was hitting a remote Oracle database using cx_Oracle). Feeds was definitely up to the job in terms of power. In fact, it was quite a toolkit of useful functionality, which is Drupal code for "incredibly powerful but almost incomprehensible."
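For anyone in the same position, the core problem is that the "document" is just a string. A minimal standalone sketch of turning such a string into per-item data with PHP's built-in SimpleXML; it is not part of Feeds, not necessarily how we wired it up, and the function name myfeeds_items_from_string is hypothetical.

```php
<?php
// Hypothetical helper: parse RSS 2.0 XML held in a PHP string into arrays.
function myfeeds_items_from_string($xml_string) {
  // Collect libxml errors internally; malformed input makes
  // simplexml_load_string() return FALSE instead of printing warnings.
  $previous = libxml_use_internal_errors(TRUE);
  $rss = simplexml_load_string($xml_string);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  if ($rss === FALSE || !isset($rss->channel)) {
    return array();
  }
  $items = array();
  foreach ($rss->channel->item as $item) {
    $items[] = array(
      'title' => (string) $item->title,
      'link' => (string) $item->link,
      'guid' => (string) $item->guid,
    );
  }
  return $items;
}
```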

It's not that the developer documentation for Feeds isn't decent: it's pretty good. But it's limited in scope: it tells you roughly how to expose your own Feeds-like objects to the admin interface, but not really how all those objects interact. Most importantly, we wanted to know what happened on a cron run: this is the bedrock of how Feeds works on your site, after all.

I poked around a bit and this is what I discovered:

 

Workflow of a Feeds cron run

Here's a summary of the above diagram to give you some idea of what's going on.

 

  1. Drupal's cron creates a FeedsScheduler object and passes it a "job", which is all the configuration for a feed call, including any configuration that was attached originally to the particular node which defines the Feed. The scheduler creates a FeedsImporter and passes it the job; the importer then creates a FeedsSource and embeds itself in it as a parent. In each case, the method ::work() is called to create the child/helper object.
  2. The Source object now runs the three phases of feed consumption, via its parent Importer. The Source asks the Importer for the relevant Fetcher, Parser and Processor objects: for example, the HTTP Fetcher, the RSS Parser and the Node Processor are strung together to turn an RSS feed at an HTTP URL into a set of nodes, one per entry. Each of these has a method with a verb-like name: ::fetch() for the Fetcher, and so on. The common currency is a FeedsBatch object, which gets passed around and needs to have methods that make it behave like a batch of feed items.
  3. After the three phases have run, the Source calls hook_feeds_after_import() to do any tidying, then quits to the Importer, which quits to the Scheduler, which then runs its ::finished() method on the job, and the cron run for this particular feed is done.
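To make that call chain concrete, here is a heavily simplified, standalone PHP model of the cron run summarised above. The classes are deliberately prefixed Sketch: they only mimic the roles of Feeds' scheduler, importer, source, plugins and batch, and none of their method signatures are the real Feeds API.

```php
<?php
// Stand-in for FeedsBatch: the common currency passed between the phases.
class SketchBatch {
  public $raw = '';
  public $items = array();
  public $created = array();
}

class SketchFetcher {
  public function fetch(SketchBatch $batch, array $job) {
    $batch->raw = $job['raw'];
  }
}

class SketchParser {
  // One item per non-empty line, standing in for one item per feed entry.
  public function parse(SketchBatch $batch) {
    $lines = array_map('trim', explode("\n", $batch->raw));
    $batch->items = array_values(array_filter($lines, 'strlen'));
  }
}

class SketchProcessor {
  // Stands in for the node processor: one "node" per item.
  public function process(SketchBatch $batch) {
    foreach ($batch->items as $item) {
      $batch->created[] = array('title' => $item);
    }
  }
}

class SketchImporter {
  public $fetcher, $parser, $processor;

  public function __construct() {
    $this->fetcher = new SketchFetcher();
    $this->parser = new SketchParser();
    $this->processor = new SketchProcessor();
  }

  // Creates the source, embedding itself as the parent.
  public function work(array $job) {
    $source = new SketchSource($this, $job);
    return $source->import();
  }
}

class SketchSource {
  private $importer;
  private $job;

  public function __construct(SketchImporter $importer, array $job) {
    $this->importer = $importer;
    $this->job = $job;
  }

  // The three phases, each using a plugin obtained from the parent importer.
  public function import() {
    $batch = new SketchBatch();
    $this->importer->fetcher->fetch($batch, $this->job);
    $this->importer->parser->parse($batch);
    $this->importer->processor->process($batch);
    // This is roughly where Feeds would invoke hook_feeds_after_import().
    return $batch;
  }
}

class SketchScheduler {
  public $finished = array();

  public function work(array $job) {
    $importer = new SketchImporter();
    $batch = $importer->work($job);
    $this->finished($job);
    return $batch;
  }

  public function finished(array $job) {
    $this->finished[] = $job['id'];
  }
}
```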

 

When you build a new plugin, you need to implement hook_feeds_plugins() in a module and reference a class file: this class will be selectable in the admin interface for one of the three consumption phases, depending on what class it's ultimately based on. You should therefore extend existing classes rather than start from scratch: there are abstract PHP classes in the feeds module directories, which give you skeleton "interfaces" which you can then flesh out with relevant functions. But what's better is to extend e.g. the HTTP fetcher to fetch from a command on disk (which is what we did) or, say, extend the CSV parser to interrogate JSON.

Class hierarchies mean you don't have to spend a lot of time reinventing the wheel or hacking existing modules until they become unupgradeable; instead you can take existing classes and tweak them through inheritance, experimenting as you develop.
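As an illustration of that inheritance approach, again standalone and with hypothetical class names MyCsvParser and MyJsonParser rather than Feeds' own parser classes: the JSON subclass overrides only the step that turns raw text into rows, and inherits everything else.

```php
<?php
// Hypothetical base parser: turns raw text into rows keyed by a header line.
class MyCsvParser {
  public function parse($raw) {
    $rows = array();
    foreach ($this->rawRows($raw) as $row) {
      $rows[] = $this->mapRow($row);
    }
    return $rows;
  }

  // CSV-specific step: the first line is the header, later lines are records.
  protected function rawRows($raw) {
    $lines = array_values(array_filter(explode("\n", trim($raw)), 'strlen'));
    if (!$lines) {
      return array();
    }
    $header = $this->csv(array_shift($lines));
    $rows = array();
    foreach ($lines as $line) {
      $rows[] = array_combine($header, $this->csv($line));
    }
    return $rows;
  }

  // Shared step that subclasses inherit unchanged: normalise keys, trim values.
  protected function mapRow(array $row) {
    $mapped = array();
    foreach ($row as $key => $value) {
      $mapped[strtolower(trim($key))] = trim((string) $value);
    }
    return $mapped;
  }

  private function csv($line) {
    return str_getcsv($line, ',', '"', '\\');
  }
}

// Override only the raw-to-rows step; parse() and mapRow() are inherited.
class MyJsonParser extends MyCsvParser {
  protected function rawRows($raw) {
    $decoded = json_decode($raw, TRUE);
    return is_array($decoded) ? $decoded : array();
  }
}
```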

Blog category:

Drupal feeds PDF

Oxford Geek Night 13 sponsored by the Guardian Open Platform

You probably heard it here second, straight after the mailing list.

Wow. Well, now it's out in the open and I can tell you all. Oxford Geek Night 13 on Wednesday 15 July will be co-sponsored by the Guardian Open Platform.

The Guardian's Open Platform and Datastore work has really put them at the forefront of modern media. They've got a data API, client libraries in PHP, Python, Java and more, and swathes of online documentation, which makes them a match made in heaven for Oxfordshire's geeks.

While we're on the subject, the Guardian Datastore is also running a competition to win a Flip Mino HD camcorder. They want ideas or tools from you which can bring their data to life. More details on their site, but maybe one for the next Oxford Geek Jam?

User loading and saving in Drupal 6.x

Nearly a year ago I broke down user_load() and user_save() in Drupal 5. I had to put together workflows for a number of jobs, specifically integrating the creation, instantiation and updating of users with an external system. Fast forward nearly twelve months, and we have to do it all over again for D6, for different work. So here's a PDF of user_load() and user_save() in Drupal 5 and 6.

flowcharts of user_load and user_save

The flowcharts have been especially useful for coding in the most Drupalish way possible. Drupal core (and well-behaved modules) is built on a hook-based architecture. That means that before and/or after important events, Drupal calls every function that follows a particular naming convention: in effect, the implementation of that hook in every module that provides one. That lets your code tag along with Drupal’s powerful core, which makes hooks essential to developing modules efficiently.

What's changed between Drupal 5 and Drupal 6? Not much, to be honest:

  • Loading now tries to grab an object, rather than checking if an ID has been returned by the database first
  • Updating clears the sessions for newly-blocked users, effectively kicking them out; it also sends notification emails through _user_mail_notify
  • Creating doesn't grab a new ID for the user, pre-creation, owing to D6's better database abstractions

For your convenience and mine, all six workflows are now in the same PDF. That makes it easier to compare 5 and 6 side by side, but it also makes clear some of the very minor errors I made in the original Drupal 5 diagrams. Well, best let them stand, for transparency's sake. And besides: if a man's errors are his portals of discovery, you'd be lucky to fit the chipmunk of serendipity through these.

The multiple magics of Drupal search

Form API is magical; core Drupal search is a twist on that magic; hooking onto that twist puts your code on yet another level of weird.

Drupal's Form API handles so much work for you that you'd be a fool not to use it as much as possible. This code snippet:

function myform_some_form($form_state) {
  $form['text'] = array(
    '#type' => 'textfield',
    '#title' => t('Your submission'),
    '#default_value' => t('Enter some text'),
    '#description' => t('Please use this field to submit some text'),
    '#required' => TRUE,
  );
  return $form;
}

creates a form with:

  • A single textfield element
  • Accessible XHTML with form labels
  • Potentially localized labels, translated into any number of languages
  • A bit of similarly localized help text below the element
  • Validation of the form submission, with the field content marked as required

That's a separate item of form functionality for each array key. And as long as you use Form API, Drupal handles validation and much of the input checking for you, which goes a long way towards reducing the risk of attacks such as SQL injection or XSS.

Bookmarkable search URLs with POSTed search terms

But there's a catch. To encourage best practice in terms of form submission and friendly URLs, Form API defaults to HTTP POST. If site searching used Form API (which it does) then what impact would that have? Successful searches could never be bookmarked, because the URL on its own doesn't capture the POST submission.

The search module tackles this by adding an extra twist to Form API. At the end of submission processing are the following two actions:

  • Call the functions listed in $form['#submit'], which by default is just $ID_submit, where $ID is typically the name of the original form creation function ("myform_some_form" above)
  • Finally, either return to the original action page of the form, or redirect to any path a submit function has placed in $form_state['redirect']

The search module therefore uses a function called search_form_submit to grab the POSTed search terms and redirect the user to search/$SEARCH_TYPE/$SEARCH_TERMS. $SEARCH_TYPE is "node" for Drupal's out-of-the-box textual node searching, but if you install another search module, such as Apache Solr, it will be something else (e.g. "apachesolr_search"). Result: bookmarkable search URLs.
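The round trip is easy to model outside Drupal. Both helpers below are hypothetical (they are not core functions): the first builds the same shape of path that search_form_submit redirects to, and the second shows that the search type and terms can be recovered from that path alone, which is exactly what makes it bookmarkable.

```php
<?php
// Hypothetical: build a search/TYPE/KEYS path from POSTed values.
function mysearch_build_path($type, $keys) {
  return 'search/' . $type . '/' . trim($keys);
}

// Hypothetical: recover the type and keys from a bookmarked path.
// The limit of 3 keeps any slashes inside the keys intact.
function mysearch_parse_path($path) {
  $parts = explode('/', $path, 3);
  if (count($parts) < 2 || $parts[0] !== 'search') {
    return NULL;
  }
  return array('type' => $parts[1], 'keys' => isset($parts[2]) ? $parts[2] : '');
}
```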

Writing your own module to handle searches

This has important ramifications if you're trying to piggyback off core search somehow: if, say, you're still using core search or a third-party module for the actual result-finding, but then you want a page other than core search to display the results.

If you want the main site search form to redirect to your own pages, for example, then you have to (a) add your own function to the $form['#submit'] stack and then (b) use that to change the $form_state['redirect'] that core search has set:

// Implementation of hook_form_alter() (Drupal 6), adding an extra submit
// callback to search forms identified by their existing callback.
function mysearch_form_alter(&$form, &$form_state, $form_id) {
  $submits = array(
    'box' => 'search_box_form_submit',
    'form' => 'search_form_submit',
  );
  if (isset($form['#submit']) && is_array($form['#submit'])
      && array_intersect($submits, $form['#submit'])) {
    // Appended, so it runs after core's callback has set the redirect.
    $form['#submit'][] = 'mysearch_form_mysubmit';
  }
}

// Submit callback, which changes the redirect using a regular-expression replace.
// $form_state must be taken by reference, or the new redirect is lost.
function mysearch_form_mysubmit($form, &$form_state) {
  $form_state['redirect'] = preg_replace('/^search\/[^\/]+/', 'search/my_special_search',
    $form_state['redirect']);
}

Now you've got all your site search forms redirecting to a bookmarkable page at search/my_special_search/$SEARCH_TERMS. All you have to do now is write a menu callback for that page: from here on in you're on your own for now.
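As a starting point only, a minimal Drupal 6-style sketch of that menu callback for a hypothetical mysearch module. The 'search content' permission is core search's own; the page title, the callback name and the escaping with plain PHP's htmlspecialchars() (a real module would more likely use check_plain()) are assumptions, and the actual result-finding is left as a comment.

```php
<?php
// Drupal defines MENU_CALLBACK itself; this guard only lets the sketch run standalone.
defined('MENU_CALLBACK') || define('MENU_CALLBACK', 0x0004);

// Implementation of hook_menu() (Drupal 6 style) for the hypothetical mysearch module.
function mysearch_menu() {
  $items = array();
  // The % wildcard captures the search terms; 'page arguments' => array(2)
  // passes the third path component (counting from 0) to the callback.
  $items['search/my_special_search/%'] = array(
    'title' => 'Search results',
    'page callback' => 'mysearch_results_page',
    'page arguments' => array(2),
    'access arguments' => array('search content'),
    'type' => MENU_CALLBACK,
  );
  return $items;
}

// Page callback: $keys comes straight from the URL, so escape it before output.
function mysearch_results_page($keys) {
  $safe_keys = htmlspecialchars($keys, ENT_QUOTES, 'UTF-8');
  // Run the actual search here (core search or a third-party module).
  return '<h2>Results for ' . $safe_keys . '</h2>';
}
```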

User loading and saving in Drupal 5.x

Workflows of Drupal's user load and save functionality: spot the hooks and win a programmatical prize.

Recently at Torchbox we’ve been looking into how to build extra functionality on top of Drupal users. The standard Drupal user object combines the contents of the users table with any properties provided by the core profile module. This means that the Drupal user is a combination of rows (and, admittedly, unserialized structured data) from a couple of tables in a relational database.

flowcharts of user_load and user_save

That works just fine for most purposes, but we may have to bring in content from not just outside the core Drupal tables but outside the core database, and even on a remote server through webservices. To this end we’ve decomposed the core user module’s user_load() and user_save() functions. This helps us understand better both the workflow and at what points in it our own code can motor into life, query all those extra resources (or set those queries in motion), assemble the rest of the user++ object, and then hand control back over to Drupal.

For those who don’t know much about Drupal, its core has a hook-based API structure. At certain points in its workflow, it checks all the modules for functions following certain naming conventions (typically the module name followed by the hook name e.g. mymodule_init on response startup, or mymodule_block to return details about the module’s support for Drupal page furniture). Any matching hook functions are called in the order defined by module weightings, and then page processing will generally continue: you can crowbar a grind-to-a-halt exit() in your hook, but it’d be unwise, as you can never be sure what tidying up Drupal might need to do after your hook. Outside these hooks, your code has little control over Drupal’s core functioning, unless you stub out entirely the bits of core you need yourself, and get your request to use those bits instead.
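The mechanism can be modelled in a few lines of plain PHP. This is a simplified, hypothetical stand-in for what core's module_invoke_all() does, not its actual code: the module list and weights are passed in explicitly, ties in weight are broken alphabetically, and array results are merged.

```php
<?php
// Simplified, hypothetical stand-in for core's module_invoke_all(): call
// MODULE_HOOK() for each module that defines it, lightest weight first
// (ties broken by name), collecting the results.
function sketch_invoke_all(array $module_weights, $hook, ...$args) {
  $modules = array_keys($module_weights);
  usort($modules, function ($a, $b) use ($module_weights) {
    $by_weight = $module_weights[$a] <=> $module_weights[$b];
    return $by_weight !== 0 ? $by_weight : strcmp($a, $b);
  });

  $return = array();
  foreach ($modules as $module) {
    $function = $module . '_' . $hook;
    if (!function_exists($function)) {
      continue;  // this module doesn't implement the hook
    }
    $result = call_user_func_array($function, $args);
    if (is_array($result)) {
      $return = array_merge($return, $result);
    }
    elseif (isset($result)) {
      $return[] = $result;
    }
  }
  return $return;
}

// Two hypothetical modules implementing a hypothetical "greet" hook.
function alpha_greet($name) {
  return 'alpha greets ' . $name;
}

function beta_greet($name) {
  return array('beta greets ' . $name);
}
```

Giving beta a lower weight than alpha makes its implementation run first, even though alphabetical order would put it second.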

Because of the way they let your code tag along with Drupal’s powerful core, hooks are essential to developing modules in the most Drupalish way. With that in mind, here are flow diagrams of the three basic aspects of user functionality—create new, load, save existing—lifted straight from examination of the code:

Although you have to save a user before you can load them, I’ve put loading first in the above (admittedly unordered) list. There are two main reasons for this:

  1. user_save actually calls user_load, once or twice, to “refresh” the user object
  2. user_load is a more primitive function and so bears examination first

Stripped down, user_load consists of: querying the database for a core user record matching the search criteria; retrieving this and the extended profile data; unserializing a free-data field and inserting its contents into the user object; discovering user roles; triggering hook_user('load'); and returning the object (or boolean FALSE, if no user is found).

What this reveals (which I didn’t realise before) is that the anonymous user is in the Drupal users table, with ID=0. Otherwise, searching for this user would return no records, and the anonymous user object could not be instantiated. You could therefore attach rich data to the anonymous user, if you were in a hacky mood.
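Those steps can be modelled in standalone PHP. Everything here is a hypothetical simplification rather than core's code: an in-memory array replaces the users and users_roles tables (including a uid 0 row for the anonymous user), and a plain list of callbacks replaces the hook_user('load') implementations.

```php
<?php
// Hypothetical in-memory stand-ins for the users and users_roles tables,
// including the uid 0 row that lets the anonymous user be loaded.
$sketch_db = array(
  'users' => array(
    0 => array('uid' => 0, 'name' => '', 'data' => serialize(array())),
    1 => array('uid' => 1, 'name' => 'admin', 'data' => serialize(array('theme_pref' => 'garland'))),
  ),
  'users_roles' => array(1 => array(3 => 'editor')),
);

// Simplified model of the user_load() steps: match, unpack data, add roles,
// run load hooks; FALSE when no record matches.
function sketch_user_load(array $db, array $criteria, array $load_hooks = array()) {
  foreach ($db['users'] as $uid => $record) {
    foreach ($criteria as $field => $value) {
      if (!array_key_exists($field, $record) || $record[$field] !== $value) {
        continue 2;  // this record does not match; try the next one
      }
    }
    $user = (object) $record;
    // Unpack the serialized free-data field without overwriting real columns.
    foreach (unserialize($record['data']) as $key => $value) {
      if (!isset($user->$key)) {
        $user->$key = $value;
      }
    }
    // Every loaded user gets the anonymous or authenticated role, plus extras.
    $user->roles = $uid ? array(2 => 'authenticated user') : array(1 => 'anonymous user');
    if (isset($db['users_roles'][$uid])) {
      $user->roles += $db['users_roles'][$uid];
    }
    // Stand-in for hook_user('load'): each callback may add to the object.
    foreach ($load_hooks as $hook) {
      $hook($user);
    }
    return $user;
  }
  return FALSE;
}
```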

The two user_save workflows are fairly similar. Creating a user means obtaining an ID from the database: because some MySQL providers have poorer feature sets than others, referential integrity is ensured at the application level rather than the database level. In place of obtaining an ID, user update calls hook_user('update') to pre-process the user. Both workflows then set aside special fields, such as the user’s password, user roles and any profile fields managed by that module (determined from user_fields()). Then they save this data into the database in slightly different orders, with user creation calling hook_user('insert') early on, and the update procedure calling hook_user('after_update') much later in the process, just before determining the external authentication mappings (e.g. OpenID) and returning the user object.

What does this mean for us? Well, we’ll want varying amounts of data to piggyback on the core user object, so we have somewhere to cache it. Ideally this data won’t be summoned (brought out of the distributed data ‘cloud’) on every request/response cycle, so we’ll need to do some local caching, but not so much that we get out of sync with the cloud (or duplicate sensitive data). We think that, given the pair of hooks in user_save for existing users, we’ll have just enough leverage to do this: the first hook will effectively “tear down” our extra data, so we can do what we want with it and store it somewhere temporarily; the second hook will “set up” the user for the rest of the request, by putting all that data back in. The existence of user_load within user_save complicates things somewhat, but at the same time it gives us some more wiggle room, because each call to that function fires another hook.
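A rough sketch of that tear-down/set-up pair as a Drupal 5-style hook_user() implementation. The module name mymodule, the remote_profile property and the use of a static variable as the temporary parking space are all hypothetical; the plan assumes, as user_save() appears to allow, that hooks receive $edit by reference, so removing a key during 'update' keeps it out of what is saved, and 'after_update' then restores it on the freshly reloaded account.

```php
<?php
// Hypothetical hook_user() implementation (Drupal 5-style signature).
function mymodule_user($op, &$edit, &$account, $category = NULL) {
  // Parking space for data between the two hooks within one request.
  static $parked = array();

  switch ($op) {
    case 'update':
      // Tear down: take our extra data out of what is about to be saved.
      if (isset($edit['remote_profile'])) {
        $parked[$account->uid] = $edit['remote_profile'];
        unset($edit['remote_profile']);
      }
      break;

    case 'after_update':
      // Set up: put the data back on the account for the rest of the request.
      if (isset($parked[$account->uid])) {
        $account->remote_profile = $parked[$account->uid];
        unset($parked[$account->uid]);
      }
      break;
  }
}
```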

A Drupal hook is worth a thousand lines of module code, but they’re still a bit few and far between for some workflows. Hopefully the accompanying images will help anyone reading to find them, and ditch those thousand lines before they’re even written.

Software simple and software facile

Assaf writes about, among other things, REST as a simplifier of development against an existing system:

REST plays the same role as open source and open APIs: It eliminates tooling and vendoring as artificial barriers to adoption.

Interestingly, a corollary to this was brought up at Barcamp Brighton this weekend. During Gareth Rushgrove’s talk about REST and Nabaztag, a chap whose name I’ve again forgotten (although I’m sure someone like Fatty will enlighten me) pointed out that much of the push of SOAP is coming from the vendors, because the vendors make their money from selling tools, and REST development needs very few tools, most of which are free.

Undoubtedly there’s a set of problems that REST finds hard, but this truism is extended by SOAP vendors to the hard-to-prove (but also hard-to-contradict) claim that it’s a larger set, or a set more pertinent to enterprise solutions, than the set which SOAP finds hard. It convinces the consumers, because intelligent data mining and storage has always been a difficult problem, and a simple solution like REST feels like underkill for the job in hand. They let you confuse libre and gratis, the vendors point out (I see them sitting on the consumer’s shoulders with tridents at this point): so where’s the hidden cost of this free lunch?

(hat tip to Simon Willison)

Drupal logins not working

There’s a long (and old) thread about Drupal logins not working. A lot of the problems are to do with weird PHP version changes; some of them are caused by cookie persistence; but the one we’ve had was the result of losing the login box on the front page.

“How can you log in without a login box?” I hear you holler. Well, even if you drop the block containing the login box from your frontend theme, you can have a separate theme for administering the site (useful as a visual distinction between admin and content editing). This theme can retain a login box. But if you try to log in at /admin, in the login box your admin theme still seems to have, then the submission workflow actually (I think) goes via your live site’s theme, where the form does not get instantiated. This means that Form API can’t handle your login and you just get taken to the front page.

The way we solved it is detailed in the comments to this post, but it basically involves stubbing out PHPTemplate’s core block.tpl.php and wrapping it as follows:

<?php if (!($block->module == "user" && $block->delta == "0")): ?>
  <!-- .. original block content .. -->
<?php endif; ?>

This code effectively masks the output from the user-module-generated block #0, which is the login box. It doesn’t prevent the block from being generated (which is a minor performance issue, I suppose) and so the Form API hooks are all still activated and can grab onto your username and password.
