How to not cache a particular Drupal page

Edit: if you know reasonably far in advance (e.g. at the start of a given page request) that the page is never going to be cached, there is a better way (thanks to Stack Overflow!).

In a recent Drupal project we turned on standard caching to help site performance. With this in place, however, we found that certain visitor-sensitive details might be revealed. For example, if a submission via the webform module contains an email address, and this is included somehow in the acknowledgement page (through custom code), then other users can guess the URL of this custom page and see it. The reason for this is a complication arising from webform combined with fairly understandable custom module code. Webform's confirmation page is a GET URL of the form:

http://example.com/form_page/done?sid=1418

where sid is the submission ID: the unique identifier of the data. This is fine with out-of-the-box webform, which just gives all of your site visitors the same confirmation message. But if the message is personalized based on the submission, e.g. to say "Thanks, Bob! We've sent an acknowledgement of your gift of a cheese pastie to your email address, which is..." then we're in trouble. The cache is set to slurp up the response to any HTTP GET request. That doesn't affect forms, but it does include confirmation pages, however personalized.

As a matter of course, we firstly made the confirmation page customization contingent on a $_SESSION variable, which was set when the form was processed, and unset when the confirmation page was viewed: without the variable, the page would not be customized. In the uncached situation, this solved the problem of discovery; however, the cache would just serve up the cached version regardless, as it just grabs the raw HTML from the database, never touching the code which checks $_SESSION.

One option would have been to change the webform's confirmation URL to have a random second parameter, e.g.:

http://example.com/form_page/done?sid=1418&random=0d1803d0-fdf7-11dd-87a...

This would still put an entry in the cache, but it becomes hard to stumble across by trial and error! However, while OK in practice, this felt like a bit of a hack: fundamentally, it's safest not to have any cache entry. With this in mind, we took a different route.

In Drupal 6, hook_exit() is called across all modules, immediately before the end of a page request. This happens both in the absence of caching and in the presence of standard caching, in drupal_page_footer() and _drupal_bootstrap() respectively. The order of execution is, with some omissions:

  • drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL)
    • Cached version?
      • Set headers and send cache contents to browser
      • module_invoke_all('exit'): execute the exit hooks
      • quit!
    • Otherwise, carry on with page execution
  • ... page execution...
  • drupal_page_footer()
    • Caching on?
      • page_set_cache(): cache this page
    • Regardless, module_invoke_all('exit'): execute the exit hooks
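
The sequence above can be modelled in a few lines of stand-alone PHP. This is a toy sketch rather than Drupal code: serve_request(), $page_cache and $exit_log are hypothetical names invented for illustration, and a plain array stands in for the page cache table. All it tries to show is that, under standard caching, the exit hooks fire on both branches.

```php
<?php
// Toy model of the request flow listed above, with standard caching on.
// Nothing here is a Drupal API: every name is a hypothetical stand-in.

$page_cache = array();  // stands in for the page cache table
$exit_log = array();    // records each time the "exit hooks" would run

function serve_request($url) {
  global $page_cache, $exit_log;

  // Bootstrap: is there a cached version of this URL?
  if (isset($page_cache[$url])) {
    $html = $page_cache[$url];             // send cache contents to browser
    $exit_log[] = "exit (cached): $url";   // exit hooks still run
    return $html;                          // quit!
  }

  // Otherwise, carry on with page execution.
  $html = "<html>built for $url</html>";

  // Page footer: caching is on, so store the page...
  $page_cache[$url] = $html;
  // ...and, regardless, run the exit hooks.
  $exit_log[] = "exit (built): $url";
  return $html;
}

$url = '/form_page/done?sid=1418';
serve_request($url);  // first visit: page is built, then cached
serve_request($url);  // second visit: page is served from the cache

print_r($exit_log);
```

In this model the second call never reaches the page-building code, which is the behaviour that makes a personalized confirmation page a problem once it has been cached.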

So either the page is served from the cache, or it's created by code execution and then cached. That means that we can build a module which uses hook_exit() to clear the page that's just been cached, every time the code is executed. The page content is therefore never left in the cache, i.e. it is always dynamic, and the $_SESSION trick ensures security of submissions.

We use cache_clear_all() to clear the page out. If we inspect page_set_cache(), we can see the page-specific key under which it stores the cached content using cache_set(). We can therefore clear out just the entry under this key, for this page, leaving the rest of the cache intact. Here's some sample code that accomplishes this.

/**
 * Implementation of hook_nodeapi
 */
function mymodule_nodeapi(&$node, $op) {
  if ($node->type == "webform" && ($op == "load") && (arg(2) == "done")) {
    // If sid doesn't match session, then quit unless the current user has admin access
    if ( ($_GET['sid'] != $_SESSION['submission']['sid']) && !user_access("access webform results")) {
      $node->webform["confirmation"] = "Thank you for your submission.";
      return;
    }

    // Flag the submission for deletion in hook_exit
    $_SESSION['submission']['#delete'] = TRUE;

    /* . . . thankyou customization code . . .*/
  }
}

/**
 * Implementation of hook_exit
 */
function mymodule_exit() {
  global $base_root;

  // Have we just processed a submission?
  // (!empty() avoids an undefined-index notice when the flag was never set.)
  if (!empty($_SESSION['submission']['#delete'])) {
    // Firstly remove the submission entirely from the session, just in case
    unset($_SESSION['submission']);
    // Then clear the cache for this page
    cache_clear_all($base_root . request_uri(), 'cache_page');
  }
}

Note on aggressive caching

Later on, owing to performance issues, you might want to increase caching from standard to aggressive on your site. At that point, the site will warn you that your new module is "incompatible with aggressive mode caching and might not function properly." I think this happens purely because of the presence of hook_exit(): it's a warning because, in drupal_bootstrap(), aggressive caching exits before the exit hooks are invoked.

But this only happens if Drupal finds a cached version. The exit hooks aren't omitted on the first visit to a given URL. So when webform creates a particular visitor's thankyou page, aggressive caching can't find a cache item for that sid: so it still executes the page, puts it in the cache, and executes hook_exit() to clear the page out of the cache! Result: hook_exit() is still called on pages which need it, even with aggressive caching switched on.

Note: unlike Knuth, not only have I not tested this yet, but I've barely proven it works, in an aggressively cached Drupal install. Use it in that environment at your peril.
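
The reasoning can at least be walked through in a toy model, with the same caveat: this proves nothing about a real Drupal install. In the stand-alone sketch below, every name is hypothetical. clearing_exit_hook() is a simplified stand-in for mymodule_exit() (it always clears, ignoring the $_SESSION flag), and the $aggressive switch assumes the behaviour described above, i.e. that the exit hooks are skipped only when a cached copy is found.

```php
<?php
// Toy model only: hypothetical names, no Drupal APIs.

$page_cache = array();  // stands in for the page cache table
$builds = 0;            // how many times the page was actually executed

// Simplified stand-in for mymodule_exit(): drop the entry for this page.
function clearing_exit_hook($url) {
  global $page_cache;
  unset($page_cache[$url]);
}

function serve_request($url, $aggressive) {
  global $page_cache, $builds;

  if (isset($page_cache[$url])) {
    $html = $page_cache[$url];
    // Cache hit: assumed here that aggressive mode skips the exit hooks.
    if (!$aggressive) {
      clearing_exit_hook($url);
    }
    return $html;
  }

  // Cache miss: the page is executed in either caching mode.
  $builds++;
  $html = "<html>personalized page for $url</html>";
  $page_cache[$url] = $html;   // the page is cached...
  clearing_exit_hook($url);    // ...and the exit hook removes it again
  return $html;
}

$url = '/form_page/done?sid=1418';
serve_request($url, TRUE);  // first visit under aggressive caching
serve_request($url, TRUE);  // second visit: still nothing in the cache

var_dump($builds);                    // number of real page executions
var_dump(isset($page_cache[$url]));   // is anything left in the cache?
```

Because the entry is removed at the end of every miss, the model never takes the cache-hit branch for this URL, so the skipped exit hooks never come into play. Whether real aggressive caching behaves the same way is exactly the part that remains untested.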