database

Drupal module: watermarking your development sites

If you've ever been programming in dev-test-prod environments and thought "now, where am I?" then this might be for you.

Developing Drupal in a development--test--production environment has a lot of advantages. Each developer's work is sandboxed, staging is straightforward, and deployment to live the stuff of Capistrano scripts---especially if you unite the entire ecosystem of separate environments with version control.

However, it can lead to confusion over precisely where your browser's currently pointing at: at best, this can be comical; at worst, it can result in either loss of live content or the logjamming of a staged site with content intended for live. Suddenly a staging environment is out of action until that content can be exported to the live site.

Enter devwatermark β-0.1, a D5 module intended to watermark any non-live sites with a little right-hand banner overlay. When first enabled---you can do this on your live site---it inserts no such banner. However, as you add live domains to its configuration, it begins to work out when it's not on a live site and tags your browser window with the "DEVELOPMENT" banner. Like watermarking your printouts with "DRAFT", or maybe like dogearing a page in a book. If your site has to respond to multiple domains---and if the content is the same then you should really serve up 301s instead---then you can add those extra domains to the configuration as required.

Having devwatermark enabled on live means that when you bring a live database down to a staging site to test the next round of updates, the change in domain will make devwatermark automatically show its banner image. You know instantly when you're in a development---and hence content-volatile---environment. That means that you can also watermark staging sites as non-development too: by adding them to devwatermark's configuration you can confirm to the developers that here's a database environment that they can't wipe and start again.

As I mention above, it's currently only available for Drupal 5, and your theme has to respect hook_footer (most do out of the box.) But please feel free to download it and give it a try.

Inline edit links, but not editing inline

Squaring the circle of simple CMS usability with complex content representations, with a neat low-footprint Drupal module

It's heartwarming, really encouraging to see that Drupal 7 is undergoing a usability review. Drupal's a massively functional CMS, but all the functionality in the world won't help you when the average (for which read: can't write HTML, let alone PHP) CMS user can't discover it. There's a common misconception that usability is the finishing touches you add to an application if you've got time, the icing on the cake; but if your application lays any claim to maturity then its usability is the cake, and all that functionality you were so proud of is, without usability, just eggs and flour.

One of the main usability improvements suggested by the usability team---and largely shouted down by the technical team---is the ability to edit inline on the page: that is, to log in as an admin, then have any bit of the page "active", so that if you click on it then it becomes an edit box with the text inside. Flickr does this especially well, letting you edit title and description on photo pages and lists of photos by just clicking on the apparently uneditable text. But Flickr has the advantage that there's very little form on top of its content: it's a delivery mechanism for the raw metadata about photos, and the photo itself.

The other end of the spectrum---which complex CMS sites have every right to sit on---is a rich and complicated mapping between the storage of a node's content in the database and the eventual display of it in the browser. take a page from a recent Torchbox project at random, how would you expect areas of this page from the Joseph Rowntree Foundation's website to behave when you clicked on them? If you have to hardcode print statements in your PHP templates, what do you print? How do you get editing inline to work? What happens when content is brought in from other, related nodes, and mixed in with the other content before display.

I can appreciate both sides to this story of user experience versus technical practicality, although it's not sufficient to expect the usability team to discard the idea merely because there's no correspondence between page content and database content: that's only an argument for why Drupal doesn't currently have edit-on-page. The usability project is moving forwards rapidly, and while there's clearly a tension between usability for the CMS user and feasible technical limitations---usability for the developer, if you like---it will need to be resolved soon for this marvellous work, and a great opportunity, not to end up wasted. And resolving that conflict will involve some sort of compromise, for both sides.

One possible compromise would be to offer edit links, when Drupal can spot a sort-of 1-to-1 correspondence between a fragment of page content and the node that supports it. Page templates and views---specifically hook_preprocess_node and hook_views_pre_render---know full well that what they're processing is a node. And they generally know what field the node title will be in. So let Drupal rewrite the title, to add an "edit inline" link. If anyone clicks on this link, then pop the node-edit form up in a lightbox for editing.

Here's some screenshots of what I've been working on, in an attempt to get people interested (click for bigger.) Firstly, here's what the anonymous site visitor sees:

Homepage for an anonymous site visitor

Next, here's what happens when a user has just logged in. Note that the brilliant Admin menu module kicks in, giving the user a black navigation bar across the top. But, more pertinently, each node title also now has an "[edit inline]" link beside it:

Homepage for a logged-in admin user

If the logged-in user clicks on one of these new links, then our edit-inline module kicks in and, using the equally brilliant Drupal Thickbox wrapper module, provides a stripped-down version of the node-edit page in a Thickbox overlay, both speeding up node editing using AJAX calls and also letting the user cancel the node-edit procedure and return to the webpage they were on quickly:

Effect of clicking on an 'edit inline' link

To reiterate, you don't have to be on a node's page to edit it. All that matters is that the title of the node you want to edit passes through onee of the supported pre-render hooks. Currently, clicking on save/preview/cancel takes you elsewhere rather than being trapped within the Thickbox, and we're also wrestling with getting CSS and Javascript into the Thickbox overlay to support the nattier bits of node editing, but it's functional and, I hope, gives you some idea of how it would all work given a few more hours of bashing away at keyboards.

Anyway, there it is. A possible compromise. I've mentioned it in a comment on the d7ux blog but I fear I might have been eaten by a spamtrap. If anyone's interested in the project then email me, jp.stacey, either at gmail.com or torchbox.com, and say hello.

Deserving of a serious LAMPing

Reinventing primary keys, one horrifying integrity error at a time.

The LAMP-stack community frequently shows their disdain for foreign and primary keys in databases, and sometimes with reason borne of experience. MySQL historically has been little more than a nice language for comparing a set of unrelated spreadsheets, so referential integrity has had to happen at the application layer or not at all. As such, careful MySQL users put a lot of work into ensuring referential integrity, without the help of the database.

But the eyes of even the most hardened LAMPer would widen, if he were to dip into a conversation to find someone saying the following about something they’ve built over PostgreSQL:

Yeah, I could add the current date/time to the end, so as long as you don’t add more than one [entry] with the same name within 1 second, it’ll be fine.

It might take you a while to work out (a) what that’s referring to (b) what kind of error in thinking it demonstrates and (c) what it implies about the quality of the underlying code. By that time—if you’ve worked at all with that sort of programming—your head may well be in your hands.

A complex CCK module in Drupal

CCK is Drupal’s way of making rich content. It means that nodes of any content type can have any kind of data attached to them, so you can have e.g. a directory of superstore outlets, where the outlet records have their longitude and latitude (editable by a Google Map widget) whereas the contact records (e.g. Sales Manager, South-East) can have a portrait photo, selected from an image gallery in a dropdown widget. (A note on terminology: widgets are the structures which are used to edit the CCK data, typically defined in the same CCK submodule as the data types).

There’s already a tutorial with a CCK submodule available. The premise of the sample module is to provide two fairly generic numerical formats. In principle that simplifies the submodule, but in practice it also means it’s difficult to work out precisely why particular programmatic choices were made and what you can tweak. Also, at work we wanted a much richer content type. Here’s what we wanted to display:

  1. Build an “Available in” field for e.g. all the different formats that a book might come in.
  2. Record the dimensions DxWxH
  3. And some dscriptive information
  4. And let the user pick a thumbnail from a dropdown
  5. And let all of this be done multiple times

Here’s a mockup of what we wanted to appear on the admin page (excuse styling—Wordpress isn’t the best for this):


Available in










And here’s sort of what we wanted on the front end (again, sorry for the HTML: it’s table-heavy as it’s a one off):

Available in
Hardback Paperback Trade p/bk
ins cm ins cm ins cm
W 5.1 13 W 5.1 13 W 5.1 13
H 7.6 19.2 H 7.6 19.2 H 7.6 19.2
D 0.79 2.0 D 0.79 2.0 D 0.79 2.0

It was difficult, from the simplistic submodule available, to work out exactly how you leverage complex editing widgets, and how you get CCK to save all the database information in the five columns we needed: description, thumbnail image reference and three dimensions in inches. What’s worse is that a CCK submodule with an incomplete set of hook functions seems to do a fandango on certain areas of the database (possibly the cacheing of certain bits of the module’s behaviour): certainly developing a CCK submodule was quite painful, as the process of trial and error frequently required a database restore so as to try again. That tutorial does warn that “all hooks are required. CCK expects them to all be present and will not function correctly if some are missing.” But there’s a world of developer pain in those brief phrases.

Since I originally mentioned the problems we were having on the development mailing lists, a couple of people have asked me how to build a CCK submodule to support rich fields. With this in mind my employers Torchbox have very gracefully let me make the module we built public, with some minor edits (the client’s site isn’t live yet, so names and details changed to protect the innocent).

The module is available here. It consists of 4 1/2 pairs of functions, as follows:

  1. Declare:
    1. field types that the module supports
    2. widgets that the module supports, for editing fields on the node-edit page
  2. Settings for:
    1. field (so settings inherent to the way that particular field is stored e.g. decimal accuracy for numeric values)
    2. widgets (so e.g. the image gallery that populates the dropdown for the wireframes)
  3. Handlers for:
    1. field, so what happens to it when a node is loaded
    2. widget, so pre-processing of the field to fit a HTML form, or producing a chunk of Form API, or post-processing the HTML form into a database-saveable format.
  4. BONUS UNNECESSARY FUNCTION: cck_book_complex_input just produces the Form API for function 3b. Ignore this one!
  5. Formatters, which give you output options for fields on the “Display Fields” page.
    1. Declaring formatters
    2. Formatters

A few extra pointers:

  1. CCK does most stuff for you, so you don’t need to actually load/save field contents to the database; declaring the columns for widgets and fields is enough.
  2. If you’re in the developing phase with your CCK submodule, back up the database first. See above.
  3. We sidestepped formatters altogether as they don’t cope with e.g. multiple field values that need a header value. Instead, we just overrode theme('field'). One thing, though: you do need some formatters present, or the admin configuration page for “Display Fields” defaults every dropdown to “<Hidden>”, and you can’t get anything to appear.
  4. To reiterate, CCK needs all hooks present in its submodules from the start. The further you get from Drupal core, the more bleeding-edge APIs will get.

Note that as I say we had to peer into the node (use var_dump($node) and search for the right array) and then override the theme, because we wanted a fancy header to our field. But that was pretty painless—mostly a matter of knowing one’s PHP rather than one’s Drupal—and it all worked smoothly otherwise. And we ended up with by and large exactly what we wanted. It works and, in retrospect, Drupal has helped us to write readable, maintainable code.

Drupal in five minutes

Will wonders never cease? I’m helping a friend build a website for his driving school, and decided to give Drupal a try so he could have all the whistles and bells he might want. It seemed a bit like overkill, but it would also be instructive for me and would mean that he could always turn to someone else in future to do the development. The number of sites that must founder because they’re maintained by a friend that becomes an ex-friend….

Anyway, it turns out that installing Drupal on Mythic Beasts’ standard hosting arrangement takes literally less than five minutes:

  1. Download the .tgz from drupal.org
  2. Unpack it
  3. Visit the index of the installation in your web-browser
  4. Put in the details of your MySQL database
  5. Er…
  6. That’s it.

There’s refinements, of course: setting up multiple sites, that sort of thing. But on the other hand, as I’d rolled a Drupal+modules package at work for this sort of occasion, it’s even quicker for me to get up and running with a protean CMS, chock full of the most useful community contributions. If nothing else I’ve now got Wordpress functionality plus, in about the same time as it takes to install Wordpress.

Now comes the hard part, of course: either writing my own theme and functionality from scratch, or editing what someone else thinks passes for XHTML. Ugh, on both counts.

Now that's magic (quotes)

If your web application ensures that all your incoming CGI variables are free of the most common source of malicious site damage, can you stop worrying?

I wondered this as I got far enough into a PHP publishing system that I had to start thinking about adding new content through the system (rather than just jamming it into the database by hand, which is why the previous incarnation has sadly fallen into disuse). As it’s typically configured, PHP will add backslashes to anything it doesn’t trust: hence the comment “it’s a great site you’ve got here” will, when submitted by a POST request, become “it’s a great site you’ve got here”. Whether or not your server does this automatically can be checked by calling the function magic_quotes_gpc() (I realised only the other day that “gpc” stood for “GET, POST and cookies:” I probably have some catching up to do). In performing this blanket adding of slashes, PHP prevents the unwary coder from leaving his site open to both unintentional database hiccups and intentional malevolent attacks, the SQL injection attack.

All well and good, but my application is heavily object-oriented. Such objects store whatever content you give them, as well as optionally writing it to the database. If I want these objects to persist (even for the course of a single request) then any access to their internal storage must yield sensible data: those slashes have to disappear before the articles appear in an RSS feed, or on the website itself. So when the CGI environment gives slash-added content to an object, the object needs to know to both add it to the database verbatim and to produce it for viewing with the slashes removed. It can either do this by storing it in a slash-removed state or by placing a filter on its outputs.

There’s a further complication, in that content can also be written to an object by the PHP application itself: the publishing of all my unpublished articles, for example, would change the status of their accompanying objects without reference to any CGI variable. If I assumed all of this content had had its slashes escaped, then this article, for example, would lose all of its ’ text, because the object would assume they’d been added by PHP’s internals: in my second paragraph, the “after” string would look like the “before” string, and the “before” string would instead break the database insertion. In addition, what if the server is reconfigured? Can I trust my hosting company to never change the configuration of PHP, even accidentally during an upgrade?

I found myself lost in a maze of adding, removing and then adding slashes, with no clear way of deciding. Suddenly I decided: why not use one of PHP’s major downsides—that it doesn’t support persistence of objects from one request to the next very well, and hence each action is fighting against the overhead of constantly recreating and recompiling code—to ascertain which input/output processes were the most frequent (and most public) and hence needed to be the fastest? I drew a flowchart of a typical object’s behaviour and, by identifying which channels could be safely bottlenecked, arrived at a reasonable solution to the problem.

From my phrasing it’s clear that it was a foregone conclusion: I wanted, more than anything else, for content to flow straight from the database (through the object if applicable) to the user. This content needed to stay in any object in a simple, de-slashed form, so it could flow and flow as long as the object was in existence. That meant that incoming CGI content could not be stored with its added slashes intact. Counter-intuitively, then, my solution was to undo PHP’s default safety mechanisms, unescaping the CGI content and storing it raw, and then without fail adding slashes to anything that CGI or my application wanted to add to the database. This would be my bottleneck: everything else would be as fast as it could be.

Exit gracefully: ensuring all incoming content can be added to the database safely is not necessarily the most efficient or desirable long-term solution. By examining the likely workflows for content, it’s possible to make pragmatic decisions on where content should be pre-processed and where it should be left alone. Consider all your overheads, including that of short-term programming and long-term cumulative processing time: this will vary depending on your environment. Also, if you’re aware of a safety net, over the presence of which you have minimal control, account for the possibility that someone might one day remove it.

SELECT needle FROM haystack;

Ever wondered just where in a complex database you stored that particular bit of information? Or ever had to reverse-engineer someone else’s database, about which you know nothing, with only the output from an application to guide you? In these cases it’s handy to be able to search every text cell, in every column, in every table.I found a solution to this problem on the web, specific to Microsoft SQL Server. This solution uses SQL Server’s inbuilt system tables to find all the table and column names and thus search the entire database. Although it doesn’t seem to run fast enough to be of use in a live environment (several seconds for a large database) it’s still a useful development tool.

Unable to find an equivalent for PostgreSQL, I decided to write the following. The function and datatype can be loaded into a current database by e.g. typing:

i ‘/path/to/file/f_search_all.sql’

at the psql command prompt when connected to the database you want to search. Then run:

SELECT * FROM f_search_all(’some search text’);

to retrieve any matching tables and columns, plus a sample of the text that matched.

Here is the code: http://www.jpstacey.info/blog/files/code/f_search_all.sql.

As with the SQL Server solution, this works by referencing PostgreSQL’s system tables, in this case pg_class and pg_catalog. Most of the code consists of loops over static SELECTs and so is quite speedy: the only slow point is the EXECUTEion of dynamic code where the table and column names (stored in varchars) are evaluated as part of a SELECT string. Interestingly, these same system tables power the psql backslash commands. If you type:

set ECHO_HIDDEN t

within psql, and then e.g. examine a table using

d table_name

then psql will show you the queries it has made to provide you with the well-formatted output (thanks to Ian Bicking for that tip).

Blog category:

Very flat, Excel

Database design, as an intellectual exercise, can be tremendously satisfying. As one builds up relations and constraints, making your database just rigid enough to support the more intelligent model soon to be layered over it, it’s possible to feel a sense of future-problems-solved: this check will prevent the model from trying to call a service a finished product; this trigger will roll back any unpassworded changes to a given provider. At the end, a complex relational system, which is normalized just to the point where normalization yields diminishing returns, is a work of craft if not of art.

So it’s always unsettling when the client then asks for such heavily relational data to be exported to a spreadsheet.

There are many excellent reasons why applications sit on top of a database instead of a spreadsheet, but most of them are hard to explain to clients, many of whom tend to use spreadsheets only as highfalutin Word tables. I calculated that I could express the data relevant to a single provider in one spreadsheet (multiple worksheets); it would therefore take around 1000 files, which the client understandably declined to accept.

Clearly a compromise was necessary, and in this instance it came about by asking the client what they actually needed: in this case, there were two separate requirements pulling in opposite directions that, singly, were easy to solve: their technical advisor wanted a schema, or at any rate a UML diagram, so they could build a companion application along similar lines to ours; their project co-ordinator wanted a checklist of the provider data we had, to avoid duplication during import and see how to proceed with their own data audits.

Exit gracefully: After having finally teased out the two separate requirements, and dealt with one by a simple dump of the database schema, I identified the tables which could, just about, be flattened. I planned ahead by checking exactly how many were involved in each many-to-one relationship—no more than two, in the areas I agreed to flatten—and eventually was able to promise the equivalent of an address book for providers. For the task in hand, this was more than adequate.

When building a database for a client, and one which might see some reuse or multiple simultaneous use, always make sure you can justify each constraint or relation not just in terms you can understand but in ones that stress the benefit to the client’s well-tended data: there will be a benefit to the data, so this isn’t as hard as it sounds. But, as you build, look for where you might need to make quick simplifications in future. Identify the quasi-flat areas you might be able to isolate, because sooner or later the client is bound to ask for data in that shape.

Blog category:

Subscribe to RSS - database