When iterating on features can break a Drupal site in development

It's generally safe to use Features to maintain and deploy the configuration of Drupal sites, but I'd heard rumours from other developers that, especially during a rapid development process, some phases of featurization could lead to database errors. I hadn't seen that happen for ages, until a few days ago, when re-enabling features led to PDO exceptions. Here's what I did, what I didn't do, and how to mitigate this problem in future.

The process I followed: getting a new live database mid-featurizing

I had been working on a feature locally, part of which included adding new fields to a content type. All the additions had been exported into the feature and checked into version control. The features dashboard was showing green "Default" lights all the way.

At that point, I needed to see how my feature would look when it was finally moved to production after approval. To do that (I don't have shell access to the server) I downloaded a database using Backup/Migrate and unzipped it into MySQL at the command prompt. So far, so good; I then enabled my new features, saw they were listed as "Overridden", and reverted them.
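
The import step looked roughly like this. The filename is illustrative, but Backup/Migrate does hand you a gzipped SQL dump, and drush sql-cli will happily read one from stdin:

gunzip foo_production.mysql.gz
drush sql-cli < foo_production.mysql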

This was when all hell broke loose:

$ drush features-revert-all
The following modules will be reverted: foo_something, foo_another, foo_yetanother
Do you really want to continue? (y/n): y
Do you really want to revert context? (y/n): y
Reverted context.                                                           [ok]
Do you really want to revert field? (y/n): y
Reverted field.                                                             [ok]
Do you really want to revert views_view? (y/n): y
Reverted views_view.                                                        [ok]
Do you really want to revert field? (y/n): y
exception 'DatabaseSchemaObjectExistsException' with message 'Table      [error]
field_data_field_test already exists.' in
/var/www/drupal7-foo/web/includes/database/schema.inc:657
Stack trace:
#0 /var/www/drupal7-foo/web/includes/database/database.inc(2720):
DatabaseSchema->createTable('field_data_fiel...', Array)
...

(Output edited for confidentiality.) So what exactly had happened here? More importantly, how was I to prevent it happening on production?

What broke feature reversion and how to prevent it

Here's the order in which important events occurred on my local build.

  1. Drupal began with a database similar to production.
  2. At least one field was created, called field_test in the above example.
  3. This field, among other things, was featurized into code.
  4. A new database dump was brought down from production, and unzipped directly into MySQL's command prompt (via drush sql-cli).
  5. All features were reverted, during which a PDO exception was thrown because a table unexpectedly already existed.

The key here is step 4, where I unzipped the database dump straight into MySQL. Doing so does successfully drop and recreate tables that production knew about at the point of creating the zipfile. However, it doesn't touch—for better or worse—tables which don't exist on production. They're left lying around, untracked by Drupal.
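
You can verify this by inspecting the dump before importing it: assuming it contains the usual DROP TABLE IF EXISTS statements, it only ever mentions its own tables (filename illustrative):

grep 'DROP TABLE' production-dump.sql

A local-only table like field_data_field_test won't appear in that list, so the import leaves it completely alone.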

The upshot is that I ended up with a database where fields didn't exist in Drupal's field configuration, but their corresponding tables did: created back in step 2, and untouched by the import in step 4. When the feature's field component was reverted, the field configuration was changed to add the new field, and only then was an attempt made to create the table... which already existed, hence the PDO exception.
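
If you suspect you're in this state, you can compare the fields Drupal knows about with the field tables actually present, via drush's sql-query (sqlq):

drush sqlq "SELECT field_name FROM field_config WHERE deleted = 0"
drush sqlq "SHOW TABLES LIKE 'field\_data\_%'"

Any table in the second list without a matching field in the first is an orphan waiting to throw exactly this exception.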

This is straightforwardly, if annoyingly, solved for local development, by ensuring the database is DROPped and reCREATEd before piping the new zipfile in to fill it up. This drops all tables and leaves a clean slate in which production can be faithfully replicated.
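
In drush terms, that's something like the following (sql-drop removes every table in the site's database, so it gives you the same clean slate without needing extra MySQL privileges; filename illustrative):

drush sql-drop -y
drush sql-cli < production-dump.sql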

Will PDO exceptions happen on production?

What about production, which is far more crucial than a development environment? Well, field creation through features has generally been safe for me on production. However, it's at least theoretically possible that this could cause problems.

The only situation I can think of in which PDO exceptions might happen on production is when tables already exist in a way that replicates the above problem. In theory, once you've deleted a field instance, its per-entity data should also be deleted; in practice, this might not happen until cron runs, and perhaps not until several cron runs have purged all the field data. The details aren't clear to me, but it's something to be aware of if anyone has created (and later deleted) fields on production in the past. You can check that your production database matches what Drupal expects of it using the schema module, although it's not clear to me how that copes with deleted but not yet purged fields.
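
In Drupal 7, deleting a field doesn't immediately drop its tables: they're renamed and kept around until the purge completes. A rough way to look for stragglers, assuming drush access to the database in question:

drush sqlq "SHOW TABLES LIKE 'field\_deleted\_%'"
drush sqlq "SELECT field_name FROM field_config WHERE deleted = 1"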

Ultimately, if you really don't feel you can trust your production environment to feature-revert safely, you just shouldn't do it (especially when features contain field creations). Instead, use the diff module to see what parts of the feature are in need of reversion, and then create the fields manually to match. Once you're left with a few fragments of overridden configuration (maybe field weights, or boolean flags you've forgotten to set), it's probably OK to revert by that point. Probably.
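
Features provides a drush wrapper around the diff module's output, which makes that inspection straightforward (feature name illustrative; the diff module must be enabled):

drush features-diff foo_something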

Oh, and always back up production before making changes to it, obviously. That goes without saying. Right?
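
Where you do have drush access, that backup is a one-liner (filename illustrative):

drush sql-dump --gzip --result-file=pre-deploy-backup.sql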

Comments

The short summary: when importing database dumps, make sure to drop all tables first. Example:

drush sql-create
drush sqlc < dump.sql

Oh, yes, I'd forgotten about sql-create: in itself it's definitely neater than the DROP/CREATE statements. But (for reasons of defence in depth) Drupal's MySQL user shouldn't have permission to create databases, so I'm assuming you'd need extra command-line options to specify a more privileged MySQL user. At that point, it starts to approach the verbosity of DROP/CREATE.
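
Something like this, I'd imagine; the --db-su and --db-su-pw options exist for exactly this purpose, though the credentials here are obviously illustrative:

drush sql-create --db-su=root --db-su-pw=secret
drush sqlc < dump.sql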

More broadly, I think people just aren't used to cruft tables being orphaned in the database while they're in the thick of iterating on what feels like their own code, whose behaviour ought to be predictable. I've spoken to a couple of devs now who treat these PDO exceptions as arcane and not entirely predictable, so I think it's instructive to have a forensic breakdown of where they've come from, how to get rid of them, and what it means when you get to the production environment, where you don't have the privilege of running sql-create.

You're right, though, that it's best practice to always drop the entire database before importing a database dump. Mind you, if everyone always followed best practice, we'd probably have much less use for diagnostic modules like schema...!

Thanks for sharing!  Also worth noting that another common error that can occur with this workflow, even if you do drop the database, is refactoring a field to a different type.

For example: you have a textfield "field_summary" which accepts up to 255 characters, and the requirement changes so that it needs to store even more text. So you drop your database, import the remote one, delete that field and re-create it as a Long Text field instead. Sure, you will lose all your data if you don't manually handle it, but you will be able to get away with this refactor and re-create your feature too.

However, once you try to deploy this, Drupal/Features will choke, because it won't know to drop that field before recreating it, and will throw an exception when trying to revert that feature. Now that I think about it, it may be possible to circumvent this by adding an update hook to explicitly drop that field before reverting the feature. Either way, it's messy... :)
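
As a rough, untested sketch of the by-hand equivalent (an update hook would wrap the same Field API calls; the field and feature names are as per the example above):

drush php-eval "field_delete_field('field_summary'); field_purge_batch(1000);"
drush features-revert foo_feature -y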

That's absolutely true. I think in my brain refactoring fields is stored in a box marked "REALLY DANGEROUS" so I didn't even consider it! But I should have: maybe I'll make a quick edit when I get a chance and point at your comment.

To rephrase, when getting a database from live, it is sensible to replace your current dev database, rather than merging the two databases.

The same applies to code: if possible, it's good to have a deployment process that uploads everything from dev to live (except the files directory), rather than just overwriting live with the files that are currently in dev, which will not remove any files that have been deleted from dev.
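
For example, rsync's --delete flag gives that full-replacement behaviour while still skipping the files directory (paths and hosts illustrative):

rsync -av --delete --exclude='sites/default/files' ./web/ user@live:/var/www/site/web/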

This is totally true, but I think my reply to dasjo stands: establishing a process is useful, but understanding forensically what happened helps you work out exactly what you've forgotten to do. Otherwise, you might get a PDO exception and not know what process to follow to stop it happening in future (which I think is where at least a few devs I know have found themselves).

And I absolutely agree with you re deployment, although I think tools like git (which are built to manage filesystem consistency in situ) are preferable to completely tearing down and re-uploading; unless you're using e.g. a drush-make-powered build process like Omega8/Aegir. But, yes, definitely don't just keep rsync'ing or ftp'ing code up!