
atom

Feeds objects within feeds objects

The Drupal Feeds module consists of layers of objects, tunnelling between each other, like a pearl onion on a cocktail stick

We've been doing a lot of work with the Drupal Feeds module recently. The frontend is nice enough, although the sub-navigation was rendered almost illegible by our theme's CSS. The online tutorials need work, and the admin navigation needs to be made a bit more robust to layout changes; but then it will be the de facto way for people to consume feeds on their Drupal sites.

The most recent work we've been doing involved custom integration with RSS feeds arriving effectively as PHP string variables containing all the XML. This is different from either a file on disk or a remote URL: we had a Python program creating the RSS file for us via a shell (which in turn, horribly, was hitting a remote Oracle database using cx_Oracle). Feeds was definitely up to the job in terms of power. In fact, it was quite a toolkit of useful functionality, which is Drupal code for "incredibly powerful but almost incomprehensible".

It's not that the developer documentation for Feeds isn't decent: it's pretty good. But it's limited in scope: it tells you roughly how to expose your own Feeds-like objects to the admin interface, but not really how all those objects interact. Most importantly, we wanted to know what happened on a cron run: this is the bedrock of how Feeds works on your site, after all.

I poked around a bit and this is what I discovered:

 

[Diagram: Workflow of a Feeds cron run]

Here's a summary of the above diagram to give you some idea of what's going on.

 

  1. Drupal's cron creates a FeedsScheduler object and passes it a "job", which is all the configuration for a feed call, including any configuration that was attached originally to the particular node which defines the Feed. The scheduler creates a FeedsImporter and passes it the job; the importer then creates a FeedsSource and embeds itself in it as a parent. In each case, the method ::work() is called to create the child/helper object.
  2. The Source object is what now runs the three phases of feed consumption, via its parent Importer. The Source asks the Importer for the relevant Fetcher, Parser and Processor objects: for example, the HTTP Fetcher, the RSS Parser and the Node Processor objects are strung together to turn an RSS feed at an HTTP URL into a set of nodes, one per entry. Each of these has a method named after its verb: ::fetch() for the Fetcher, ::parse() for the Parser and ::process() for the Processor. The common currency is a FeedsBatch object, which gets passed from phase to phase and needs to have methods that make it feel like a batch of feed items.
  3. After the three phases have run, the Source calls hook_feeds_after_import() to do any tidying, then quits to the Importer, which quits to the Scheduler, which then runs its ::finished() method on the job, and the cron run for this particular feed is done.
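To make the shape of those three steps concrete, here is a toy model of the chain written in Javascript rather than PHP. It is only a sketch of the structure described above: every class name, method body and property here (Batch, StringFetcher, TitleParser, NodeProcessor, the afterImport list) is made up for illustration and is not the Feeds module's actual API.

```javascript
// Toy model of the cron workflow. None of these are the real Feeds (PHP) classes.
class Batch {
  constructor() {
    this.raw = null;   // filled in by the fetcher
    this.items = [];   // filled in by the parser
    this.created = []; // filled in by the processor
  }
}

// Hypothetical fetcher: the "feed" is already a string in memory.
class StringFetcher {
  constructor(xml) { this.xml = xml; }
  fetch(batch) { batch.raw = this.xml; }
}

// Hypothetical parser: a regex is enough for a toy; real code would use an XML parser.
class TitleParser {
  parse(batch) {
    const itemRe = /<item>[\s\S]*?<title>([\s\S]*?)<\/title>[\s\S]*?<\/item>/g;
    let match;
    while ((match = itemRe.exec(batch.raw)) !== null) {
      batch.items.push({ title: match[1].trim() });
    }
  }
}

// Hypothetical processor: turns each parsed item into a node-like object.
class NodeProcessor {
  process(batch) {
    batch.created = batch.items.map((item, i) => ({ nid: i + 1, title: item.title }));
  }
}

// The Source runs the three phases via its parent Importer, then the tidy-up hooks.
class Source {
  constructor(importer) { this.importer = importer; }
  work() {
    const batch = new Batch();
    this.importer.fetcher.fetch(batch);
    this.importer.parser.parse(batch);
    this.importer.processor.process(batch);
    this.importer.afterImport.forEach((hook) => hook(batch));
    return batch;
  }
}

// The Importer holds the configured plugins and embeds itself in the Source.
class Importer {
  constructor(job) {
    this.fetcher = job.fetcher;
    this.parser = job.parser;
    this.processor = job.processor;
    this.afterImport = job.afterImport || [];
  }
  work() { return new Source(this).work(); }
}

// The Scheduler receives the job from cron and marks it finished afterwards.
class Scheduler {
  constructor() { this.finishedJobs = []; }
  work(job) {
    const batch = new Importer(job).work();
    this.finished(job);
    return batch;
  }
  finished(job) { this.finishedJobs.push(job.id); }
}
```

Calling `new Scheduler().work(job)` with a job object that carries an `id` and the three plugin instances walks the whole chain and hands back the batch. Swapping in a different fetcher only needs an object with a matching `fetch(batch)` method.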

 

When you build a new plugin, you need to implement hook_feeds_plugins() in a module and reference a class file: this class will be selectable in the admin interface for one of the three consumption phases, depending on what class it's ultimately based on. You should therefore extend existing classes rather than start from scratch: there are abstract PHP classes in the feeds module directories, which give you skeleton "interfaces" which you can then flesh out with relevant functions. But what's better is to extend e.g. the HTTP fetcher to fetch from a command on disk (which is what we did) or, say, extend the CSV parser to interrogate JSON.

Class hierarchies mean you don't have to spend a lot of time reinventing the wheel or hacking existing modules until they become unupgradeable; instead you can take existing classes and tweak them through inheritance, experimenting as you develop.


Rolling feed on jpstacey.info

Following my recent success in putting a Flickr feed on my website’s front page, I’ve converted it into an all-purpose feed reporter. RSS/Atom flavour and feed specifics are each dealt with by Javascript associative arrays of functions, keyed on flavour and on feed respectively.
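As a rough illustration of that lookup, here is a minimal sketch. It is not the code from my site: the handler names, the `default` fallback and the shape of the entry objects are all assumptions made up for this example, and the entries are taken to be already parsed into plain objects.

```javascript
// Handlers keyed first by feed flavour, then by feed name (all names hypothetical).
const extractors = {
  rss: {
    default: (entry) => ({ title: entry.title, link: entry.link }),
  },
  atom: {
    default: (entry) => ({ title: entry.title, link: entry.href }),
    flickr: (entry) => ({ title: 'Photo: ' + entry.title, link: entry.href }),
  },
};

// Own-property check, so a feed called "constructor" cannot hit Object.prototype.
function own(table, key) {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Pick the feed-specific handler if there is one, else the flavour's default.
function describeEntry(flavour, feedName, entry) {
  const byFlavour = own(extractors, flavour);
  if (!byFlavour) {
    throw new Error('Unknown feed flavour: ' + flavour);
  }
  const handler = own(byFlavour, feedName) || byFlavour.default;
  return handler(entry);
}
```

Adding a new feed then means adding one function to the table instead of another branch in the display code.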

If you’re lucky then the feed should wait a bit while it loads the XML via Javascript’s XMLHttpRequest object (querying proxies, all hosted on my website, for foreign feeds like del.icio.us). Then it will report the first feed it finds, while still loading other feeds in the background. Every 15s or so it swaps to a new feed.
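The rotation logic on its own can be sketched like this. It assumes each feed arrives through some callback once it has loaded. The `createRotator` name and its methods are invented for the example, and the timer line is left as a comment because `render` is a placeholder for whatever draws the feed on the page.

```javascript
// Keeps track of feeds as they finish loading and cycles through them.
function createRotator() {
  const loaded = []; // feeds in the order they finished loading
  let index = -1;    // -1 until the first feed has arrived

  return {
    // Call this from each feed's load callback.
    add(feed) {
      loaded.push(feed);
      if (index === -1) {
        index = 0; // the first feed to arrive is shown straight away
      }
    },
    // The feed currently on display, or null while nothing has loaded.
    current() {
      return index === -1 ? null : loaded[index];
    },
    // Advance to the next loaded feed, wrapping round at the end.
    next() {
      if (loaded.length === 0) {
        return null;
      }
      index = (index + 1) % loaded.length;
      return loaded[index];
    },
  };
}

// In the page, something like this would drive the swap every 15 seconds:
// setInterval(function () { render(rotator.next()); }, 15000);
```

Because feeds are appended as they arrive, a slow feed simply joins the rotation later without holding up the ones that are already showing.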

After writing it easily in Firefox, I found that a number of drastic changes were required for IE, as follows:

  1. IE doesn’t seem to have great support for the IMG element’s width or height properties, at least before the image is rendered on the page. For the Flickr feed, if imgElem.width == 0, I’ve had to employ some horrible regular expressions on imgElem.outerHTML (which isn’t supported by non-IE browsers, so it’s all a bit of a hack).
  2. IE doesn’t support assigning arbitrary properties to elements. I was using these to smuggle a collection of titles and descriptions (keyed by date) into a function and that of course failed. You can use setAttribute() for string-like properties, but not objects, which I’ve had to pass as parameters or in some reserved scopes.
  3. IE is very unforgiving about character encodings, behaviour which I haven’t been able to work around yet. Some of the programs that create my feeds (specifically Blosxom) aren’t very Unicode-wise and hence will happily produce invalid content. Firefox is very forgiving of this (which may or may not be the right XML way to do it, but is certainly a very patient way of dealing with web content generally). I’m currently working on my proxies to sort this out somehow, even if I have to filter everything down to the ASCII character set.
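For the first of those workarounds, the fallback looks roughly like the sketch below. It is a simplified reconstruction rather than the code in index.js: the function names are made up, and the pattern only looks for plain `width=` and `height=` attributes with digit values, quoted or unquoted and in either case.

```javascript
// Pull width/height attribute values out of an element's markup string.
// Returns null unless both attributes are present with numeric values.
function sizeFromMarkup(markup) {
  const width = /\bwidth\s*=\s*["']?(\d+)/i.exec(markup);
  const height = /\bheight\s*=\s*["']?(\d+)/i.exec(markup);
  if (!width || !height) {
    return null;
  }
  return { width: parseInt(width[1], 10), height: parseInt(height[1], 10) };
}

// Prefer the real properties; fall back to scraping outerHTML when they are 0.
function imageSize(imgElem) {
  if (imgElem.width > 0 && imgElem.height > 0) {
    return { width: imgElem.width, height: imgElem.height };
  }
  if (typeof imgElem.outerHTML === 'string') {
    return sizeFromMarkup(imgElem.outerHTML);
  }
  return null;
}
```

It is still a hack: a size set only through CSS, or an attribute such as `data-width`, will fool it, so it is only worth using where the markup is known to carry real width and height attributes, as the Flickr feed's does.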

The Javascript is a single, monolithic file which relies on a beta-release version of Mochikit, and is available at http://www.jpstacey.info/index/javascript/index.js. Feel free to scrape it and re-use anything you find interesting there.
