User loading and saving in Drupal 5.x

Recently at Torchbox we’ve been looking into how to build extra functionality on top of Drupal users. The standard Drupal user object is a combination of the contents from the users table, plus any properties provided by the core profile module. This means that the Drupal user is a combination of rows (and admittedly deserialized, structured data) from a couple of tables in a relational database.

[Image: flowcharts of user_load and user_save]

That works just fine for most purposes, but we may have to bring in content from not just outside the core Drupal tables but outside the core database, and even on a remote server through webservices. To this end we’ve decomposed the core user module’s user_load() and user_save() functions. This helps us understand better both the workflow and at what points in it our own code can motor into life, query all those extra resources (or set those queries in motion), assemble the rest of the user++ object, and then hand control back over to Drupal.

For those who don’t know much about Drupal, its core has a hook-based API structure. At certain points in its workflow, it checks all the modules for functions following certain naming conventions (typically the module name followed by the hook name e.g. mymodule_init on response startup, or mymodule_block to return details about the module’s support for Drupal page furniture). Any matching hook functions are called in the order defined by module weightings, and then page processing will generally continue: you can crowbar a grind-to-a-halt exit() in your hook, but it’d be unwise, as you can never be sure what tidying up Drupal might need to do after your hook. Outside these hooks, your code has little control over Drupal’s core functioning, unless you stub out entirely the bits of core you need yourself, and get your request to use those bits instead.
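To make the naming convention concrete, here is a minimal sketch of that kind of dispatch in plain PHP. It is not Drupal's own code: invoke_all_sketch(), the module list and the three example modules are hypothetical stand-ins for what core does with its real list of enabled modules.

```php
<?php
// Hypothetical stand-in for Drupal's list of enabled modules, already
// sorted by weight. "gamma" implements no hooks at all.
$enabled_modules = array('alpha', 'beta', 'gamma');

// A "module" implements a hook by defining <module>_<hook>().
function alpha_init() {
  return 'alpha started';
}

function beta_init() {
  return 'beta started';
}

function beta_block() {
  return 'beta block list';
}

// Call every <module>_<hook>() that exists, in list order, and collect
// whatever each one returns. Modules without the hook are skipped.
function invoke_all_sketch($modules, $hook) {
  $results = array();
  foreach ($modules as $module) {
    $function = $module . '_' . $hook;
    if (function_exists($function)) {
      $results[] = $function();
    }
  }
  return $results;
}

$init_results = invoke_all_sketch($enabled_modules, 'init');
$block_results = invoke_all_sketch($enabled_modules, 'block');
```

Here $init_results holds one entry each from alpha and beta, in that order, while $block_results holds only beta's. Nothing in the loop lets a module stop the others from running, which is why an exit() inside a hook is such a blunt instrument.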

Because of the way they let your code tag along with Drupal’s powerful core, hooks are essential to developing modules in the most Drupalish way. With that in mind, here are flow diagrams of the three basic aspects of user functionality—create new, load, save existing—lifted straight from examination of the code:

Although you have to save a user before you can load them, I’ll look at loading first. There are two main reasons for this:

  1. user_save actually calls user_load once or twice, to “refresh” the user object
  2. user_load is a more primitive function and so bears examination first

Stripped down, user_load consists of: querying the database for a core user record matching the search criteria; retrieving this and the extended profile data; unserializing a free-data field and inserting it into the user object; discovering user roles; triggering hook_user('load'); and finally returning the object (or boolean false, if no user was found).
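The steps above can be sketched in plain PHP, with in-memory arrays standing in for the database. This is an illustration rather than core's code: user_load_sketch(), the two fake tables, the simplified role handling and the remote_profile property are all assumptions made for the example. Only the hook_user() signature is meant to match Drupal 5.

```php
<?php
// Hypothetical in-memory stand-ins for the users and users_roles tables.
// uid 0 is the anonymous user's row.
$users_table = array(
  0 => array('uid' => 0, 'name' => '', 'data' => serialize(array())),
  1 => array('uid' => 1, 'name' => 'admin',
             'data' => serialize(array('theme' => 'garland'))),
);
$roles_table = array(1 => array('administrator'));

// Hypothetical module hook: bolt an extra property on at load time.
function mymodule_user($op, &$edit, &$account, $category = NULL) {
  if ($op == 'load') {
    $account->remote_profile = 'added by mymodule';
  }
}

function user_load_sketch($uid, $users_table, $roles_table) {
  // 1. Look for a core user record matching the search criteria.
  if (!isset($users_table[$uid])) {
    return FALSE;
  }
  $account = (object) $users_table[$uid];

  // 2. Unserialize the free-data field and copy its values onto the
  //    object, without overwriting core properties.
  foreach (unserialize($account->data) as $key => $value) {
    if (!isset($account->$key)) {
      $account->$key = $value;
    }
  }

  // 3. Discover roles (simplified: one built-in role plus any extras).
  $account->roles = array($uid ? 'authenticated user' : 'anonymous user');
  if (isset($roles_table[$uid])) {
    $account->roles = array_merge($account->roles, $roles_table[$uid]);
  }

  // 4. Fire hook_user('load') so modules can extend the object.
  $edit = array();
  mymodule_user('load', $edit, $account);

  return $account;
}

$admin = user_load_sketch(1, $users_table, $roles_table);
$anonymous = user_load_sketch(0, $users_table, $roles_table);
$missing = user_load_sketch(99, $users_table, $roles_table);
```

In this sketch the profile data would arrive at step 4, through a module's 'load' hook, which is also where our own remote data could be attached.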

What this reveals (which I didn’t realise before) is that the anonymous user is in the Drupal users table, with ID=0. Otherwise, searching for this user would return no records, and the anonymous user object could not be instantiated. You could therefore attach rich data to the anonymous user, if you were in a hacky mood.

The two user_save workflows are fairly similar. Creating a user means obtaining an ID from the database: because some MySQL providers have poorer feature sets than others, referential integrity is ensured at the application level rather than the database level. In place of obtaining an ID, user update calls hook_user('update') to pre-process the user. Both workflows then set aside special fields, such as the user’s password, user roles and any fields managed by the user module itself (determined from user_fields()). Then they save this data into the database in slightly different orders, with user creation calling hook_user('insert') early on, and the update procedure calling hook_user('after_update') much later in the process, just before determining the external authentication mappings (e.g. OpenID) and returning the user object.
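The order in which a module sees these operations can be sketched as follows. The code only records hook calls: user_save_sketch(), user_load_sketch() and hook_log_sketch() are hypothetical, all database work is reduced to comments, and the exact positions of the user_load “refreshes” are my reading of the workflow and should be checked against core.

```php
<?php
// Hypothetical log of hook operations, kept in a static variable.
function hook_log_sketch($op = NULL, $reset = FALSE) {
  static $log = array();
  if ($reset) {
    $log = array();
  }
  if ($op !== NULL) {
    $log[] = $op;
  }
  return $log;
}

// Hypothetical module that does nothing except note each operation.
function mymodule_user($op, &$edit, &$account, $category = NULL) {
  hook_log_sketch($op);
}

// Stand-in for user_load(): here it only fires the 'load' hook.
function user_load_sketch($account) {
  $edit = array();
  mymodule_user('load', $edit, $account);
  return $account;
}

function user_save_sketch($account, $edit) {
  if (empty($account->uid)) {
    // Create: obtain an ID and write the core record...
    $account->uid = 42; // pretend the database handed this out
    $account = user_load_sketch($account);
    // ...then fire 'insert' early, before the remaining data is saved.
    mymodule_user('insert', $edit, $account);
    $account = user_load_sketch($account);
  }
  else {
    // Update: let modules pre-process the values first...
    mymodule_user('update', $edit, $account);
    // ...save everything, refresh the object, then fire 'after_update'.
    $account = user_load_sketch($account);
    mymodule_user('after_update', $edit, $account);
  }
  return $account;
}

$created = user_save_sketch(new stdClass(), array('name' => 'jo'));
$create_order = hook_log_sketch();

hook_log_sketch(NULL, TRUE);
user_save_sketch($created, array('mail' => 'jo@example.com'));
$update_order = hook_log_sketch();
```

Under those assumptions a new user produces 'load', 'insert', 'load', and an existing one produces 'update', 'load', 'after_update'.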

What does this mean for us? Well, we’ll want varying amounts of data to piggyback on the core user object, so we have somewhere to cache it. Ideally this data won’t be summoned (brought out of the distributed data ‘cloud’) on every request/response cycle, so we’ll need to do some local caching, but not so much that we get out of sync with the cloud (or duplicate sensitive data). We think that, given the pair of hooks in user_save for existing users, we’ll have just enough leverage to do this: the first hook will effectively “tear down” our extra data, so we can do what we want with it, and store it somewhere temporarily; the second hook will “set up” the user for the rest of the request, by putting all that data back in. The existence of user_load within user_save complicates things somewhat, but at the same time it gives us some more wiggle room, because each call to that function fires another hook.
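As a rough sketch of that tear-down/set-up idea, a module's hook_user() implementation might look something like the code below. It is only an outline under stated assumptions: mymodule_remote_keys() and the remote_bio and remote_avatar properties are hypothetical, the static stash is one of several places the data could wait, and the little driver at the end merely imitates the two points where core would call the hook.

```php
<?php
// Hypothetical list of properties that really live in the remote 'cloud'.
function mymodule_remote_keys() {
  return array('remote_bio', 'remote_avatar');
}

function mymodule_user($op, &$edit, &$account, $category = NULL) {
  // Holds our values between 'update' and 'after_update'.
  static $stash = array();

  switch ($op) {
    case 'update':
      // "Tear down": lift our fields out of $edit so that core does not
      // store them locally. A real module might also push them to the
      // remote service at this point.
      foreach (mymodule_remote_keys() as $key) {
        if (array_key_exists($key, $edit)) {
          $stash[$key] = $edit[$key];
          unset($edit[$key]);
        }
      }
      break;

    case 'after_update':
      // "Set up": put the values back on the refreshed user object for
      // the rest of the request, then empty the stash.
      foreach ($stash as $key => $value) {
        $account->$key = $value;
      }
      $stash = array();
      break;
  }
}

// Driver imitating core: 'update' before the save, 'after_update' after it.
$account = (object) array('uid' => 7, 'name' => 'jo');
$edit = array('mail' => 'jo@example.com', 'remote_bio' => 'Hello');

mymodule_user('update', $edit, $account);
$seen_by_core = $edit; // what core would go on to save locally
mymodule_user('after_update', $edit, $account);
```

After the first call, $seen_by_core contains only the mail value; after the second, $account->remote_bio is back in place. A 'load' case could be added in the same way to re-attach locally cached data each time user_load fires.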

A Drupal hook is worth a thousand lines of module code, but hooks are still a bit few and far between for some workflows. Hopefully the accompanying images will help anyone reading to find them, and ditch those thousand lines before they’re even written.