Example Drupal migration base class to help keep good metrics

Drupal's Migrate framework is powerful, useful and subtle. I've used it on a number of projects and it's always worked well, but one thing I've tried to adopt recently is proper and conscientious use of the metrics pages.

Migration metrics and why they're useful (and underused)

Here's the page of metrics you get if you click on the migration name in the dashboard:

Remember this? Even if you've found it in the past when wandering around, chances are that you—like me, frankly—have always thought of it as a "nice-to-have": in the sense of thinking "oh, that's nice," and then moving on.

On the screenshot above you see a number of tabs, each keeping track of what you've yet to consider: even if "considering" basically means marking as belonging to the group "D(o )N(ot )M(igrate)." This provides breakdowns for each migration of what source and destination fields have and haven't been dealt with one way or another. Here's an example of such an unhandled field:

To create this example, I've deleted the mapping from a source field, into the destination "created" field: you can see "created" is red; but there's also now a red source field too. Migrate is letting me know that my migration plan doesn't include these fields.

The goal offered to you by this page is to ensure that no tabs have red lights on them, much as with a unit test suite. Which makes it suddenly sound a lot more useful: a test harness for your migration specification, if not for your migration itself.
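
To make the "red light" idea concrete, here is a minimal sketch of the bookkeeping involved. It is an illustration only, not Migrate's actual implementation: the function name unhandled_fields() and the field lists are made up for the example. A field shows up red when it is neither mapped nor marked as "do not migrate."

```php
<?php
/**
 * Illustration only: NOT Migrate's own code.
 *
 * Returns the fields that are neither mapped nor marked "do not migrate",
 * i.e. the ones that would show up red on the metrics page.
 */
function unhandled_fields(array $available, array $mapped, array $dnm) {
  // array_diff() keeps the values of the first array that are absent from
  // all the others; array_values() reindexes the result from zero.
  return array_values(array_diff($available, $mapped, $dnm));
}

// Hypothetical destination fields for a node.
$destination_fields = array('title', 'created', 'status', 'revision');
$mapped = array('title', 'status');
$dnm = array('revision');

$red = unhandled_fields($destination_fields, $mapped, $dnm);
print implode(', ', $red) . "\n"; // prints: created
```

The same subtraction applies on the source side, which is why deleting a single mapping can leave one red source field and one red destination field.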

But the framework also has a steep learning curve, something I tend to forget these days. This means that keeping the metrics up to date is a task that tends to go to the wall early on, as you're trying just to get the migration working. Bad habits set in, and before you know it, you're just writing the minimum code that gets the data across. Also, when a deadline is approaching, any "good bookkeeping" you've indulged in so far will tend to suffer.

... Which is OK, until there's some kind of a problem: at which point, those metrics would have come in useful!

How do we make maintaining metrics easier?

In order to keep our metrics in good order, there are a number of decisions we need to tell Migrate about:

  1. We plan to migrate from the source field Sm to the destination field Dm.
  2. We don't plan to migrate from the source field Sdnm, discarding it.
  3. We don't plan to migrate to the destination field Ddnm, leaving it empty.
  4. We plan to create an on-the-fly source field Sotf, not explicitly present in the source, to migrate into the destination field Dm.
  5. We have some custom storage for an "on-the-fly" destination field Dotf that doesn't yet "talk Migrate" but we want to put data there.

Here's how we can take care of these cases:

  • The first case is straightforwardly handled declaratively: $this->addFieldMapping() updates any relevant metrics. No extra work is required.
  • The second and third cases can be handled using $this->addUnmigratedSources() and $this->addUnmigratedDestinations(), but you do need to list each field explicitly.
  • The fourth case can be handled when setting up $this->source: for example, MigrateSourceSQL::__construct() takes an array second argument which can include these.
  • The fifth case is a bit tricky, and needs you to provide a custom destination class or destination handler that declares the field to Migrate: we don't cover it here. Try using Migrate Extras to provide it for e.g. pathauto, media etc. before you reach for customization.
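
Written out by hand in a plain Migration subclass, cases 1 to 4 might look roughly like the sketch below. The class name, the table and the "old_flag" column are hypothetical, and the code only runs inside a Drupal 7 site with Migrate 2.x enabled.

```php
<?php
/**
 * Sketch only: class, table and field names are hypothetical.
 */
class ExampleRawBlogpostMigration extends Migration {
  public function __construct($arguments) {
    parent::__construct($arguments);

    $query = Database::getConnection('default', 'legacy')
      ->select('blogpost', 'b')
      ->fields('b', array('blogpost_id', 'name', 'old_flag'));

    // Case 4: declare on-the-fly source fields in the second argument.
    $this->source = new MigrateSourceSQL(
      $query,
      array('tags' => 'Populated in prepareRow()')
    );
    $this->destination = new MigrateDestinationNode('blog');
    $this->map = new MigrateSQLMap(
      $this->machineName,
      array(
        'blogpost_id' => array(
          'type' => 'int',
          'not null' => TRUE,
          'description' => 'Source blogpost ID',
        ),
      ),
      MigrateDestinationNode::getKeySchema()
    );

    // Case 1: source field to destination field.
    $this->addFieldMapping('title', 'name');
    $this->addFieldMapping('field_tags', 'tags');

    // Case 2: source fields we are deliberately discarding.
    $this->addUnmigratedSources(array('old_flag'));

    // Case 3: destination fields we are deliberately leaving empty.
    $this->addUnmigratedDestinations(array('revision'));
  }
}
```

Every new field means another line somewhere in this constructor, which is the repetition the rest of this post tries to remove.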

All of this metric-keeping is possible, then: but remembering you need to do it, and then having to keep writing extra PHP each time, is a bit of a pain. How do we automate as much of this as possible?

A base class facilitating good metrics

Because Migrate is object-oriented, we can extend the core Migration class to provide an "abstract" base, and then base all of our own "real" migrations on that base.

Below is an abstract class that handles our cases 1–4, and an example extended real class showing how you'd use it. The base class also does the following:

  • Sets up simple metadata e.g. the team members with contact details.
  • Provides a parent::__construct() that you can pass your source/destination objects to and it will "just work" for most cases.
  • Sets up ->defaultValue properties on field mappings (and could do the same with callbacks.)

Here's the code for the abstract base class:

<?php
 
/**
 * @file
 * Migrate class: abstract base.
 */
 
/**
 * @class
 * MigrateMyBase.
 */
abstract class MigrateMyBase extends Migration {
  // Simple field mappings.
  public $simpleFieldMappings = array();
  // Unmigrated source fields.
  public $unmigratedSourceFields = array();
  // Unmigrated target fields.
  public $unmigratedTargetFields = array();
  // Extra source fields, added in say ::prepareRow().
  public $extraSourceFields = array();
  // Adding simple defaults.
  public $defaultValues = array();
 
  /**
   * Implements ::__construct().
   */
  public function __construct($arguments, $from_to = array()) {
    parent::__construct($arguments);
 
    $this->team = array(
      new MigrateTeamMember(
        'J-P Stacey',
        'jp@example.com',
        t('Freelance Drupal Architect')
      ),
    );
 
    // If we've got a from/to configuration, set up the migration from it.
    if ($from_to) {
      $this->setUpMigration($arguments, $from_to);
    }
  }
 
  /**
   * Private: set up source, destination, SQL map and basic field mappings.
   *
   * Subclasses that pass $from_to must implement a query() method returning
   * the source query.
   *
   * @param $arguments array
   *   Standard constructor arguments array, from hook_migrate_api().
   * @param $from_to array
   *   to_object => a MigrateDestination* object.
   *   from_id => array describing the source ID field for migrate mapping.
   *   to_schema => a schema for the destination ID: see example later.
   */
  private function setUpMigration($arguments, $from_to) {
    // Source, destination and map between the two.
    $this->source = new MigrateSourceSQL(
      $this->query(),
      // Extra fields, not provided by the query.
      $this->extraSourceFields,
      NULL,
      array('map_joinable' => FALSE)
    );
    $this->destination = $from_to['to_object'];
    $this->map = new MigrateSQLMap(
      $arguments['machine_name'],
      $from_to['from_id'],
      $from_to['to_schema']
    );
 
    // Standard field mappings.
    $this->mungeFieldMappings();
  }
 
  /**
   * Protected: helper function to add simple field mappings.
   *
   * Eventually move this into __construct().
   */
  protected function mungeFieldMappings() {
    // Add simple field mappings.
    foreach ($this->simpleFieldMappings as $drupal_field => $legacy_field) {
      $this->addFieldMapping($drupal_field, $legacy_field);
    }
 
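    // Add any defaults, creating a source-less mapping where none exists yet.
    foreach ($this->defaultValues as $target_name => $default_value) {
      if (!isset($this->codedFieldMappings[$target_name])) {
        $this->addFieldMapping($target_name);
      }
      $this->codedFieldMappings[$target_name]->defaultValue($default_value);
    }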
 
    // Track intentionally unmigrated source/targets.
    $this->addUnmigratedSources($this->unmigratedSourceFields);
    $this->addUnmigratedDestinations($this->unmigratedTargetFields);
  }
}

And here's an example class which extends it:

<?php
/**
 * @file
 * Migrate class: blogposts.
 */
 
/**
 * @class
 * MigrateMyBlogpost.
 */
class MigrateMyBlogpost extends MigrateMyBase {
  // Description.
  public $description = "From blogpost table";
  // Dependencies.
  public $dependencies = array('MyUser', 'MyTerm');
 
  // Simple field mappings.
  public $simpleFieldMappings = array(
    'title' => 'name',
    'field_tags' => 'tags',
    /* ... */
  );
 
  // Unmigrated destination fields.
  public $unmigratedTargetFields = array(
    'revision',
    /* ... */
  );
 
  // Extra source fields, added in say ::prepareRow().
  public $extraSourceFields = array(
    'tags' => "Added in ::prepareRow() by a subquery",
    /* ... */
  );
 
  // Adding simple defaults.
  public $defaultValues = array(
    'status' => 1,
    /* ... */
  );
 
  /**
   * Implements ::__construct().
   */
  public function __construct($arguments) {
    parent::__construct(
      $arguments,
      array(
        'to_object' => new MigrateDestinationNode('blog'),
        'from_id' => array(
          'blogpost_id' => array(
            'type' => 'int',
            'not null' => TRUE,
            'description' => 'Source blogpost ID',
            'alias' => 'b',
          )
        ),
        'to_schema' => MigrateDestinationNode::getKeySchema()
      )
    );
 
    // Any complex mappings we can't define with arrays?
    /* ... */
  }
 
  /**
   * Protected: helper function to assemble iterable query.
   *
   * Base class ->setUpMigration iterates over this query: one row per source item.
   * Only add fields which won't cause duplicate rows for a given source ID.
   */
  protected function query() {
    $q = Database::getConnection('default', 'legacy')
      ->select('blogpost', 'b')
      // fields() takes the table alias, then an array of column names.
      ->fields('b', array('blogpost_id', 'name'));
    $q->orderBy('b.blogpost_id');
 
    return $q;
  }
 
  /**
   * Implements ::prepareRow().
   *
   * Add fields which would otherwise lead to duplicates in iterable query above.
   */
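  public function prepareRow($row) {
    // Let the parent run first: FALSE means "skip this row".
    if (parent::prepareRow($row) === FALSE) {
      return FALSE;
    }
 
    // Add potentially many tags using a subquery.
    $q = Database::getConnection('default', 'legacy')
      ->select('tags', 't')
      ->fields('t', array('label'));
    $q->condition('t.blogpost_id', $row->blogpost_id);
 
    // Start from an empty list so rows without tags still have the field.
    $row->tags = array();
    $result = $q->execute();
    while ($tag = $result->fetchAssoc()) {
      $row->tags[] = $tag['label'];
    }
 
    return TRUE;
  }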
}

As you can see, most of the example class is now just PHP arrays, and it's really simple to find the right place to add new metrics. In theory, you might be able to abstract a lot of those PHP arrays into the hook_migrate_api() implementation which registers your migrations, leaving you with a generic class to cover several migrations. In practice, I've found that practically every migration needs its own PHP class anyway, so it's best to add them sooner rather than later.
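
For completeness, registering the example class might look something like the sketch below. It assumes Migrate 2.x's hook_migrate_api() array format; the module name "mymodule" and the group "legacy" are hypothetical, and the MyUser and MyTerm migrations named as dependencies would need their own entries.

```php
<?php
/**
 * Implements hook_migrate_api().
 *
 * Sketch only: "mymodule" and the "legacy" group are hypothetical names.
 */
function mymodule_migrate_api() {
  return array(
    'api' => 2,
    'groups' => array(
      // In a real module, wrap the title in t().
      'legacy' => array('title' => 'Legacy imports'),
    ),
    'migrations' => array(
      // The array key is the migration's machine name, which the base class
      // above expects to find in $arguments['machine_name'].
      'MyBlogpost' => array(
        'class_name' => 'MigrateMyBlogpost',
        'group_name' => 'legacy',
      ),
    ),
  );
}
```

Any extra keys you add to a migration's entry are passed through to the constructor in $arguments, which is where the "generic class" approach would pick them up.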

Summary

Migrate works really well, but to get the most out of it you should consider keeping the metrics page(s) up to date. Doing so can be fiddly, but a simple abstract base class can be used to cover the most important ways of registering such metrics. With this example base class, and an example extension of it, you're now set to go full metric when writing your next migration.