layers

Frontend United this weekend

Frontenders! United! Can never be defeated! Except possibly by the Same Origin Policy.

This weekend, Drupal's Frontend United conference is being held in London. The idea is to bring together everyone involved with the visual display of the site - concept designers, UX professionals, web designers, CSS/HTML experts and other theme implementers - to discuss how best to implement frontend development in Drupal.

A prior arrangement means I can't go, but you should do so if you're involved in any aspect of Drupal theming. Tickets are still available, so get one now!

RobotsTxt, robots.txt and /robots.txt

The RobotsTxt module effectively requires a hack to Drupal core in order to function. This is because core Drupal contains a static robots.txt file, and webservers like Apache are configured by Drupal’s .htaccess file to serve static files in preference to asking Drupal for the content. Every time Drupal is upgraded (or you deploy the site to a new staging or development instance), that hack has to be repeated.

One solution which Johan hit upon a while ago was to patch Drupal’s core .htaccess file, to send any request beginning /robots.txt to Drupal rather than serving any file on disk. It still constitutes a hack to core, but as a patch file accessible at a URL it can be incorporated into e.g. drush make files, and applied automatically when Drupal core is upgraded.

The patchfile-based hack is a big improvement, but it still leaves the RobotsTxt module complaining that there’s a problem. This is because it checks for the robots.txt file on disk, not for the /robots.txt URL, which is the real acid test for whether the module is working properly. So now we’ve also added this patch for robotstxt_requirements(), which checks the URL instead.

Webform integrates with Entity Token...

but only from 4.x onwards, it seems.

We have a specific client requirement on Drupal 7, to use webforms to update both a third-party CRM system and local “Drupal storage”: whatever that ends up meaning.

It turns out that the excellent Profile2 module provides us with better storage than D7’s core profile module (which doesn’t even use Field API, unlike the rest of D7!). We were always expecting to have to write the CRM/local updating ourselves, but piping those profile values into the webforms was a chunk of work we wanted to avoid, not least because Profile2 doesn’t modify the $user object in the same way as Profile.

Luckily, as of the 4.x branch, Webform seems to support token replacement; the Entity Token module (part of the Entity API project) lets entities expose themselves as tokens; and Profile2 uses this to hook itself up to Webform. It all works pretty well, although we’re crossing our fingers for a non-alpha release on Webform’s 4.x branch.

Programmatically executing a 2.x View

TL;DR: don't do it, as such! Execute a display instead.

Drupal Views are a good half-way house between user-friendly elegance and writing the SQL yourself. With this in mind, you might want to use them as a robust database API in your own code: you don’t need to think about precisely what’s stored in which table, but execute the view instead.

Working out precisely what you have to do to run a view isn’t straightforward. This now elderly post on Earl “Views” Miles’ blog provides some clues, but only by using page-display rendering as an example. Ideally, we want to avoid running our view through the whole theme layer, as that’s just lost CPU cycles. So what do we do to run our own view?

It turns out to be 90% easy, but 10% ever so subtle. Here is the usual stab, based on Earl’s example:

$view = views_get_view('MY_VIEW_NAME');
$view->set_arguments(array($my_first_arg, $my_second_arg, ...));
$view->set_display("DISPLAY_NAME");
$view->execute(); // - this line is wrong.

However, that doesn’t quite work. The annoying thing is that it almost works; it just doesn’t quite work. For a start, the number of items per page remains at the default, 10. This is a clue that something else needs to be invoked. But what?

Following that clue, go to the view’s own edit page and click preview. You should see that this preview does indeed respect paging. So what gives? Try calling debug_backtrace() inside the view’s set_items_per_page() method, and use the Devel module’s dsm() function to dump the result into Drupal’s messages area.

When you click on “Preview” again to refresh it, you should find that the backtrace includes a call by $view->preview() on $view->pre_execute(), and this is what calls the method to set the paging.

The only other way to call the pre-execute function? $view->execute_display('DISPLAY_NAME'). It’s slightly confusing and leaves you with the worry that there might be a pre_execute_display() method that you’re also somehow omitting to call... Also, DISPLAY_NAME ought to default to, well, “default”, but when we omitted it, paging limits were once again not respected.

Here’s the correct code:

$view = views_get_view('MY_VIEW_NAME');
$view->set_arguments(array($my_first_arg, $my_second_arg, ...));
$view->execute_display("DISPLAY_NAME");

Your results should be available in an array called $view->result (note: singular, not $view->results!). This should respect all paging/offset settings for your display, although of course if you actually want to retrieve page 3 of your results, you’re going to have to do a bit more programming!

Reverting back to defaults

Goodbye, custom theme!

After having spent eighteen months trying to get the time to work on a Panels-based Drupal theme, I've finally accepted it isn't going to happen. I need to concentrate more on improving the site and adding functionality. I also want to enjoy the benefits of some of the work we've been doing recently with Display Suite, an alternative, and more lightweight, display engine.

To that end, I've switched back to Drupal's default Garland theme, which I will augment incrementally as and when I need to. This should hopefully mean I can get improvements onto this website sooner rather than later.

Feeds objects within feeds objects

The Drupal Feeds module consists of layers of objects, tunnelling between each other, like a pearl onion on a cocktail stick

We've been doing a lot of work with the Drupal Feeds module recently. The frontend is nice enough, although the sub-navigation was rendered almost illegible by our theme's CSS. The online tutorials need work, and the admin navigation needs to be made a bit more robust to layout changes; but then it will be the de facto way for people to consume feeds on their Drupal sites.

The most recent work we've been doing involved custom integration with RSS feeds arriving effectively as PHP string variables containing all the XML. This is different from either a file on disk or a remote URL: in fact, we had a Python program creating the RSS file for us via a shell (which in turn, horribly, was hitting a remote Oracle database using cx_Oracle). Feeds was definitely up to the job in terms of power. In fact, it was quite a toolkit of useful functionality, which is Drupal code for "incredibly powerful but almost incomprehensible".

It's not that the developer documentation for Feeds isn't decent: it's pretty good. But it's limited in scope: it tells you roughly how to expose your own Feeds-like objects to the admin interface, but not really how all those objects interact. Most importantly, we wanted to know what happened on a cron run: this is the bedrock of how Feeds works on your site, after all.

I poked around a bit and this is what I discovered:

 

[Diagram: Workflow of a Feeds cron run]

Here's a summary of the above diagram to give you some idea of what's going on.

 

  1. Drupal's cron creates a FeedsScheduler object and passes it a "job", which is all the configuration for a feed call, including any configuration that was attached originally to the particular node which defines the Feed. The scheduler creates a FeedsImporter and passes it the job; the importer then creates a FeedsSource and embeds itself in it as a parent. In each case, the method ::work() is called to create the child/helper object.
  2. The Source object is what now runs the three phases of feed consumption, via its parent Importer. The Source asks the Importer for the relevant Fetcher, Parser and Processor objects: for example, the HTTP Fetcher, the RSS Parser and the Node Processor objects are strung together to turn an RSS feed at an HTTP URL into a set of nodes, one per entry. Each of these has a suitably verb-like method: so ::fetch() for the Fetcher etc. The common currency is a FeedsBatch object, which gets passed around and needs to have methods that make it feel like a batch of feed objects.
  3. After the three phases have run, the Source calls hook_feeds_after_import() to do any tidying, then quits to the Importer, which quits to the Scheduler, which then runs its ::finished() method on the job, and the cron run for this particular feed is done.
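
The three phases in step 2 can be sketched in plain PHP. This is only a minimal, self-contained model of the fetch/parse/process idea: SketchBatch, SketchFetcher, SketchParser, SketchProcessor and sketch_import() are hypothetical names invented for this example, not the Feeds module's real classes or method signatures, and the "processor" just collects titles rather than saving nodes.

```php
<?php
// Hypothetical stand-ins for illustration only; NOT the real Feeds classes.

// The "common currency" handed from phase to phase.
class SketchBatch {
  public $raw = '';
  public $items = array();
}

// Phase 1: fetch the raw feed. Here it is just a string held in memory.
class SketchFetcher {
  private $xml;
  public function __construct($xml) {
    $this->xml = $xml;
  }
  public function fetch() {
    $batch = new SketchBatch();
    $batch->raw = $this->xml;
    return $batch;
  }
}

// Phase 2: parse the raw feed into items on the same batch.
class SketchParser {
  public function parse(SketchBatch $batch) {
    $doc = simplexml_load_string($batch->raw);
    foreach ($doc->channel->item as $item) {
      $batch->items[] = array('title' => (string) $item->title);
    }
  }
}

// Phase 3: process the items. A real processor would create nodes.
class SketchProcessor {
  public $saved = array();
  public function process(SketchBatch $batch) {
    foreach ($batch->items as $item) {
      $this->saved[] = $item['title'];
    }
  }
}

// Runs the three phases in order, passing the batch along.
function sketch_import(SketchFetcher $fetcher, SketchParser $parser, SketchProcessor $processor) {
  $batch = $fetcher->fetch();
  $parser->parse($batch);
  $processor->process($batch);
  return $processor->saved;
}

$rss = '<rss><channel><item><title>First</title></item><item><title>Second</title></item></channel></rss>';
$titles = sketch_import(new SketchFetcher($rss), new SketchParser(), new SketchProcessor());
print implode(', ', $titles) . "\n";
```

The point of the shape is that each phase only needs to know about the batch, not about the other phases, which is what lets you swap one fetcher, parser or processor for another.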

 

When you build a new plugin, you need to implement hook_feeds_plugins() in a module and reference a class file: this class will be selectable in the admin interface for one of the three consumption phases, depending on what class it's ultimately based on. You should therefore extend existing classes rather than start from scratch: there are abstract PHP classes in the feeds module directories, which give you skeleton "interfaces" which you can then flesh out with relevant functions. But what's better is to extend e.g. the HTTP fetcher to fetch from a command on disk (which is what we did) or, say, extend the CSV parser to interrogate JSON.

Class hierarchies mean you don't have to spend a lot of time reinventing the wheel or hacking existing modules until they become unupgradeable; instead you can take existing classes and tweak them through inheritance, experimenting as you develop.
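
As a rough illustration of tweaking through inheritance, here is a minimal sketch in plain PHP. SketchStringFetcher and SketchCommandFetcher are hypothetical classes made up for this example (they are not the Feeds fetcher classes, and no hook_feeds_plugins() registration is shown); the sketch also assumes a Unix-like shell and that shell_exec() is enabled.

```php
<?php
// Hypothetical base class standing in for an existing fetcher plugin.
class SketchStringFetcher {
  protected $source;

  public function __construct($source) {
    $this->source = $source;
  }

  // Subclasses override only this step.
  protected function load() {
    return $this->source;
  }

  // Shared behaviour that every subclass inherits unchanged.
  public function fetch() {
    $raw = $this->load();
    if (!is_string($raw) || $raw === '') {
      throw new Exception('Nothing fetched');
    }
    return $raw;
  }
}

// The tweak: treat the source as a shell command and use its output.
class SketchCommandFetcher extends SketchStringFetcher {
  protected function load() {
    return shell_exec($this->source);
  }
}

$fetcher = new SketchCommandFetcher('echo "<rss></rss>"');
$fetched = trim($fetcher->fetch());
print $fetched . "\n";
```

Only load() changes in the subclass; the error handling in fetch() comes along for free, which is the whole appeal.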

Postcode lookup must not suck

Because people won't put up with much if there's no benefit for them anyway. 

Recently my wife and I were trying to work out why she couldn't submit her address details to a website, even though I could. As we watched her behaviour in filling out the form, we encountered error after error: or rather, exceptional circumstance after exceptional circumstance. And it was clear that very few of the circumstances had been considered, that error handling was the absolute bare minimum, that the form was set up to be almost a trial to use. The postcode lookup part of the process was probably the source of the most unhandled exceptions: difficult if not impossible for the power user to flow through; unwieldy for the standard user; of almost no benefit at all to the web newbie.

People still think of their workflows on the web like they're workflows. Over here there's the start; over there is the goal; somewhere in between there might be some intermediate stages, but ultimately you go from over here to more or less over there eventually. It comes as something of a shock to most people that their beautiful webform does not encompass a workflow: the web has holes all over it; the user is a ball bearing and your application is a pinboard:

[Image: Antique pinball machine]

Above we see a slightly clumsy metaphor for your web application. The end point of your own particular "workflow" isn't even something visually obvious like the shark's head. It's, oh, let's say that small metallic gate in the bottom right with the red doors (luckily for you) open. The best you can hope for is that the user caroms through your website hitting as few pins as they have to and ends up in one of a trillion end points, that they don't close the browser, or reach a form error, or silently lose their submission, or navigate elsewhere in irritation.

In fact, it dawned on my wife first, and then gradually on me, that postcode lookups are not intended to directly benefit the user filling them in. Instead, they're meant to force the user---remember that phrase---to provide a canonical address, and not the address. That is, the user comes to your site with an opinion about where they live and limited good will about your "product", and the postcode lookup is a mechanism for forcing them to discard the former, while the application as a whole is trying hard to get them to keep hold of the latter.

Good luck with that.

Richard Rutter figured out the dirty secret behind postcode lookups---that they're not for the user---long before my wife and I did. In order to mitigate this natural tension between forcing the user and keeping them happy, he's done a sizeable chunk of work to condense the postcode lookup pattern here. Along with a quite lively and informed conversation in the comments, this post nails much of the core of the pattern that lookup needs. Much of what's frankly miserable about using a postcode lookup is indeed tackled there, but there's an important omission that I think needs dealing with. Roughly speaking, that is: never force your users to pretend to be someone they're not.

As an example, consider Spotify. Since inheriting a slightly ropey eMac, I've been able to listen to Spotify, and I like it. I think Spotify is a gamechanger in the field of streaming music. I've heard albums on Spotify that I would never have bought. And yet: I would never consider purchasing Spotify Premium. The obvious barrier for me is that I use Linux, and there are no native Linux binaries for the Spotify desktop client.

People keep telling me that Spotify runs really well on the Windows emulator WINE. I'm sure it does. But that misses a more fundamental point: if something wants me to enter into a relationship with it, commercial or otherwise, I should not have to pretend to be a demographic I'm not in order that the relationship can be properly fulfilling. I'm not a Windows user, and it's an affront to a paying customer to expect them to make out that they're a type of user that they're not if they want to buy your stuff. More concisely: I don't take offence at the interface requirements; I take offence at what they imply about the respect for my needs as a user.

With that in mind, consider the first decision block in Richard's workflow:

does the user know their postcode?

As this is the first pin the user's pinball hits, then this is the one that alters their final resting place the most. This is the critical pin. And what does it tell me, a user who knows his postcode but also knows his address, and doesn't want to bother with lookup? It tells me that I have to pretend to not know my postcode if I want to be in a situation where I don't have to put it in. I have to play games with the application, and mask my true intentions. No, the postcode lookup system must allow for users who simply do not want to fill out their postcode. Postcode lookups should therefore begin with a simple choice of two buttons by a traditional form label. The buttons should not make any assumption about the user's reasons for choosing either route:

Address:  [LOOK UP POSTCODE]
               [JUST LET ME TYPE MY ADDRESS]

When you press the first button, both buttons should still remain on the page: the user might decide they wanted to press the second one after all. In fact, as we're probably using AJAX here, the minimum necessary modification to the form is just to add a postcode box (and to move and maybe change that button) like this:

Address:  [ postcode ]   [LOOK IT UP]
               [JUST LET ME TYPE MY ADDRESS]

You should still present the user with the ability to change their address. The "speed bump" of having to press the button is what works in your favour, is what gets you access to canonical data; anything more than a bump will run the risk of walloping the user right off your pinboard.

If at any stage the user clicks on the second button, the webform should then change to this view:

Address:  [ Address, either as a set of textfields or as a textarea ]
               [ postcode ]   [LOOK IT UP INSTEAD]

For simplicity, I'm almost tempted to drop the lookup button altogether at this point: the user has made their choice, after all. But you should never make it difficult for users to go back in a workflow, especially when (almost certainly) the browser back button will have been disabled by these shenanigans.

In comment 10 on his blogpost, Richard agrees with the idea of one big textarea for the address: possibly even including the postcode in that textarea! Again, the simplicity is appealing. You could do all sorts with this to make the user's experience easier: regex matching behind the scenes would retrieve the postcode; the address could even be automatically split into lines rather than setting real estate aside for the user to split them up themselves. It sounds great, but it's not an established pattern, and I think a lot of users---especially power users---would mistrust it. Better to go with Address 1, Address 2 etc. even though from a data perspective they're a horror, slightly improved---but, again, made more complex for the user---by labelling the last address line as "town". But this last detail is up to you. Do some A/B testing. See how it goes.
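
For anyone who does want to experiment with the single-textarea idea, the behind-the-scenes regex matching might look something like the sketch below. It is a minimal illustration under stated assumptions: sketch_split_address() is a hypothetical helper, and the pattern is a simplified UK postcode shape rather than a complete validation of every real postcode.

```php
<?php
// Hypothetical helper: split a free-text address into lines plus a postcode.
// The pattern is a simplified UK postcode shape (an assumption for this
// sketch), not a full validation of every valid postcode.
function sketch_split_address($text) {
  $pattern = '/\b([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})\b/i';
  $postcode = null;
  if (preg_match($pattern, $text, $m)) {
    // Normalise to upper case with a single space in the middle.
    $postcode = strtoupper($m[1] . ' ' . $m[2]);
    // Take the postcode out of the text before splitting it into lines.
    $text = str_replace($m[0], '', $text);
  }
  // Split on newlines or commas, trim each piece, and drop empty ones.
  $lines = preg_split('/[\r\n,]+/', $text);
  $lines = array_values(array_filter(array_map('trim', $lines), 'strlen'));
  return array('lines' => $lines, 'postcode' => $postcode);
}

$result = sketch_split_address("10 Example Street,\nSometown\nsw1a1aa");
print_r($result);
```

If no postcode-shaped text is found, 'postcode' comes back as null and the address lines are left alone, so the user is never blocked just because the pattern didn't recognise what they typed.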

Richard's workflow, with the addition of the basic prototypes above, permits us to move towards as usable a system as postcode lookup will ever be. Usability means the least number of pins on your pinboard, and exactly the right pins, the ones that nudge and tilt the user just enough. And so we end up with a system that still satisfies your original remit---to nudge the user towards using your shiny, expensive, time-consuming, postcode lookup service, with all its concomitant costs in development and maintenance---while catering for the users who simply do not want to, who will never want to, and who will actively object to your site giving them short shrift if they try not to use it.

I'll make a prediction here, that the users who try not to submit a postcode will tend to be the users you want: the digital natives, the users in flow, the people who will buy ten things from a polished user interface without even stopping to think about it. When they reach that very first pin on the board, they briefly want to be your application's friends on the web. Your application should consider itself honoured. The least it could do in return is be polite.

Django internal architecture: a nice PDF

Get that blasted workflow away from me, you fiends.

I've never been completely happy with this spindly and slightly confusing diagram from the Django Book, ever since it appeared in the first edition. Once I'd digested it, I almost immediately started redrafting it as an exercise in explaining it to others, for a possible seminar for wannabe Djangolians at Torchbox.

Time went by, as it was wont to do, and I still had a slightly incomplete diagram of the Django internal architecture on my desktop. The Django Book had since been reorganized, empires had risen and fallen, we had probably passed peak oil, and I still hadn't any use for that fecking diagram.

In an attempt to just get rid of it I've polished it up and posted it here: a three-page PDF of Django's architecture. It's got callout boxes and different colours and everything. Feast your eyes on it; move rapidly between pages, as in a flick-book; complain about the fact that it's not much of an improvement on the original. Just don't leave it on my desktop, please.

User loading and saving in Drupal 6.x

Nearly a year ago I broke down user_load() and user_save() in Drupal 5. I had to put together workflows for a number of jobs, specifically integrating the creation, instantiation and updating of users with an external system. Fast forward nearly twelve months, and we have to do it all over again for D6, for different work. So here's a PDF of user_load() and user_save() in Drupal 5 and 6.

[Image: flowcharts of user_load and user_save]

The flowcharts have been especially useful in coding in the most Drupalish way possible. Drupal core (and well-behaved modules) is built with a hook-based architecture. That means that before and/or after important events, Drupal calls every function which follows a particular naming convention; any module which defines such a function is, in effect, implementing a hook. Your code can therefore tag along with Drupal’s powerful core, which makes hooks essential to developing modules efficiently.
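
To make the naming convention concrete, here is a simplified, self-contained model of hook invocation. sketch_invoke_all() is a hypothetical stand-in written for this example, not Drupal's own implementation, and the two "modules" (crm and audit) and their simplified hook signature are invented purely for illustration.

```php
<?php
// A simplified model of hook invocation. sketch_invoke_all() is a hypothetical
// stand-in for this example, not Drupal's own code.
function sketch_invoke_all(array $modules, $hook) {
  // Any extra arguments are passed straight through to each implementation.
  $args = array_slice(func_get_args(), 2);
  $results = array();
  foreach ($modules as $module) {
    // The naming convention: MODULENAME_HOOKNAME.
    $function = $module . '_' . $hook;
    if (function_exists($function)) {
      $results[] = call_user_func_array($function, $args);
    }
  }
  return $results;
}

// Two pretend modules. Only functions matching the convention get called.
function crm_user($op, $name) {
  return "crm saw $op for $name";
}

function audit_user($op, $name) {
  return "audit saw $op for $name";
}

// 'silent' is listed as a module but implements nothing, so it is skipped.
$log = sketch_invoke_all(array('crm', 'audit', 'silent'), 'user', 'load', 'bob');
print implode("\n", $log) . "\n";
```

Nothing is registered anywhere: simply defining a function with the right name is enough to take part, which is why the flowcharts of exactly when each hook fires are so handy.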

What's changed between Drupals 5 and 6? Not much, to be honest:

  • Loading now tries to grab an object, rather than checking if an ID has been returned by the database first
  • Updating clears the sessions for newly-blocked users, effectively kicking them out; it also sends notification emails through _user_mail_notify
  • Creating doesn't grab a new ID for the user, pre-creation, owing to D6's better database abstractions

For your convenience and mine, all six workflows are now in the same PDF. That makes it easier to compare 5 and 6 side by side, but it also makes clear some of the very minor errors I made in the original Drupal 5 diagrams. Well, best let them stand, for transparency's sake. And besides: if a man's errors are his portals of discovery, you'd be lucky to fit the chipmunk of serendipity through these.

How to not cache a particular Drupal page

Sometimes you don't want every random visitor seeing the same thing on your cached site.

Edit: if you know reasonably in advance e.g. at the start of a given page request that the page is never going to be cached, there is a better way (thanks to Stack Overflow!)

In a recent Drupal project we turned on standard caching to help site performance. With this in place, however, we found that certain visitor-sensitive details might be revealed. For example, if a submission via the webform module contains an email address, and this is included somehow in the acknowledgement page (through custom code), then this custom page can be guessed for other users. The cause is an interaction between webform and some fairly understandable custom module code. Webform's confirmation page is a GET URL of the form:

http://example.com/form_page/done?sid=1418

where sid is the submission ID: the unique identifier of the data. This is fine with out-of-the-box webform, which just gives all of your site visitors the same confirmation message. But if the message is personalized based on the submission, e.g. to say "Thanks, Bob! We've sent an acknowledgement of your gift of a cheese pastie to your email address, which is..." then we're in trouble. The cache is set to slurp up the response to any HTTP GET request, which while it doesn't affect forms does include confirmation pages, however personalized.

As a matter of course, we firstly made the confirmation page customization contingent on a $_SESSION variable, which was set when the form was processed, and unset when the confirmation page was viewed: without the variable, the page would not be customized. In the uncached situation, this solved the problem of discovery; however, the cache would just serve up the cached version regardless, as it just grabs the raw HTML from the database, never touching the code which checks $_SESSION.

One option would have been to change the webform's confirmation URL to have a random second parameter, e.g.:

http://example.com/form_page/done?sid=1418&random=0d1803d0-fdf7-11dd-87a...

This would still put an entry in the cache, but it becomes hard to stumble across by trial and error! However, while OK in practice, this felt like a bit of a hack: fundamentally, it's safest not to have any cache entry.

With this in mind, we took a different route. In Drupal 6, hook_exit() is called across all modules, immediately before the end of a page request. This happens both in the absence of caching and the presence of standard caching, in drupal_page_footer() and _drupal_bootstrap() respectively. The order of execution is, with some omissions:

  • drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL)
    • Cached version?
      • Set headers and send cache contents to browser
      • module_invoke_all('exit'): execute the exit hooks
      • quit!
    • Otherwise, carry on with page execution
  • ... page execution...
  • drupal_page_footer()
    • Caching on?
      • page_set_cache(): cache this page
    • Regardless, module_invoke_all('exit'): execute the exit hooks
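
The order of execution above can be modelled in a few lines of plain PHP. This is a toy sketch only: SketchSite and its methods are hypothetical stand-ins, not Drupal internals, and the $clear_on_exit flag is a simplification standing in for whatever test an exit hook would really apply.

```php
<?php
// A toy model of the request flow listed above. SketchSite and its methods
// are hypothetical stand-ins for illustration, not Drupal internals.
class SketchSite {
  public $cache = array();
  public $log = array();

  // Stand-in for the "execute the exit hooks" step.
  private function exitHooks($url, $clear_on_exit) {
    $this->log[] = "exit hooks for $url";
    if ($clear_on_exit) {
      // What an exit hook could do: drop this page's cache entry.
      unset($this->cache[$url]);
    }
  }

  public function request($url, $clear_on_exit = false) {
    if (isset($this->cache[$url])) {
      // Cached version: send the stored HTML without running page code.
      $html = $this->cache[$url];
      $this->log[] = "served $url from cache";
    }
    else {
      // No cached version: build the page, then cache it.
      $html = "<p>page for $url</p>";
      $this->log[] = "built $url";
      $this->cache[$url] = $html;
    }
    // Either way, the exit hooks run last.
    $this->exitHooks($url, $clear_on_exit);
    return $html;
  }
}

$site = new SketchSite();
$site->request('/about');
$site->request('/about');
$site->request('/form_page/done?sid=1418', true);
$site->request('/form_page/done?sid=1418', true);
print implode("\n", $site->log) . "\n";
```

In this model the second request for /about is served from the cache, while the confirmation URL is rebuilt on both requests because its entry is removed at the end of each one.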

So either the page is served from the cache, or it's created by code execution and then cached. That means that we can build a module which uses hook_exit to clear the page that's just been cached every time the code is executed. The page content is therefore never cached i.e. always dynamic, and the $_SESSION trick ensures security of submissions. We use cache_clear_all to clear the page out. If we inspect page_set_cache, we can see the page-specific key under which it stores the cached content using cache_set. We can therefore clear out just the entry under this key, for this page, leaving the rest of the cache intact. Here's some sample code that accomplishes this.

/**
 * Implementation of hook_nodeapi().
 */
function mymodule_nodeapi(&$node, $op) {
  if ($node->type == "webform" && ($op == "load") && (arg(2) == "done")) {
    // Compare the sid in the URL with the one stored in the session; a missing
    // value on either side counts as a mismatch.
    $sid_matches = isset($_GET['sid'], $_SESSION['submission']['sid'])
      && ($_GET['sid'] == $_SESSION['submission']['sid']);

    // If sid doesn't match session, then quit unless the current user has admin access
    if (!$sid_matches && !user_access("access webform results")) {
      $node->webform["confirmation"] = "Thank you for your submission.";
      return;
    }

    // Flag the submission for deletion in hook_exit
    $_SESSION['submission']['#delete'] = TRUE;

    /* . . . thankyou customization code . . .*/
  }
}

/**
 * Implementation of hook_exit().
 */
function mymodule_exit() {
  global $base_root;

  // Have we just processed a submission? (!empty() avoids a notice when the
  // session holds no submission at all.)
  if (!empty($_SESSION['submission']['#delete'])) {
    // Firstly remove the submission entirely from the session, just in case
    unset($_SESSION['submission']);
    // Then clear the cache for this page
    cache_clear_all($base_root . request_uri(), 'cache_page');
  }
}

Note on aggressive caching

Later on, owing to performance issues, you might want to increase caching from standard to aggressive on your site. At that point, the site will warn you that your new module is "incompatible with aggressive mode caching and might not function properly." I think this happens purely because of the presence of hook_exit(): it's a warning because, in drupal_bootstrap(), aggressive caching exits before the _exit hooks are actioned. But this only happens if Drupal finds a cached version. The hooks aren't skipped on the first visit to a given URL. So when webform creates a particular visitor's thankyou page, aggressive caching can't find a cache item for that sid: so it still executes the page, puts it in the cache, and executes hook_exit() to clear the page out of the cache! Result: hook_exit() is still called on pages which need it, even with aggressive caching switched on.

Note: unlike Knuth, not only have I not tested this yet, but I've barely proven it works, in an aggressively cached Drupal install. Use it in that environment at your peril.
