This space intentionally left blank

Tuesday, July 11, 2006 - 11:22

I’ve been asked a couple of times recently, as part of separate projects, to split the results of a SQL query on the whitespace within each value. Simply put, how does one go from:

foo
foo bar
quux
blort wuu spong

to the expanded form:

foo
foo
bar
quux
blort
wuu
spong

efficiently and cleanly, only using SQL? (In case anyone’s worried, I’ve scrubbed the data sets of any personal details they might have previously contained: any resemblance to the real Blort Wuu-Spong is entirely coincidental.)

I finally decided it wasn’t possible, and although without the pure mathematics to back me up I could have kept hunting (partial solutions involving a self-join for each whitespace splitting kept rearing their heads), what finally convinced me was comparing the behaviour of SQL with that of XSL(T). The two are more alike than you might think; and no, I don’t mean SQL and XQuery, although that easy comparison provides a clue for the underlying similarity.

In XSL(T), the XML node in your original document(s) is in a sense king: it’s considered bad form (and is at any rate inefficient) to do data management on some transient data set, created within the template. Loops work best over nodesets rather than with some sort of conditional or from/to structure. This stems from XSL(T)’s underlying functional paradigm, where each nodeset is created

Of course, it’s always possible to twist non-functional behaviour out of the stylesheet (and most real-world solutions have to take a pragmatic approach to such programmatic purity) and interpreter-specific kluges exist to node-ize strings based on some non-XML token, but the language works fastest and cleanest when it’s hanging functions off nodes.

In SQL, the equivalent to the node in an XML document is the row in a query. Rows are passed around, compared with other rows based on the content of some of their cells, tied together and discarded, but very rarely can rows be created out of thin air. The closest one gets is the LEFT/RIGHT OUTER JOIN where the ON-condition is not satisfied: then the left-hand row, rather than being discarded as in the INNER JOIN, is in a sense tied to a row of NULLs. That equates to its being tied to no row at all, but once the SQL99 dust settles and post-processing can begin, those NULLs can be reinterpreted (ColdFusion does this without being asked, for example).
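To make the INNER versus LEFT OUTER distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `people` and `emails` tables are hypothetical, invented purely for illustration; the point is only that the unmatched left-hand row survives the outer join, padded with NULL (which Python reports as `None`).

```python
import sqlite3

# Hypothetical tables, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emails (person_id INTEGER, address TEXT);
    INSERT INTO people VALUES (1, 'foo'), (2, 'quux');
    INSERT INTO emails VALUES (1, 'foo@example.com');
""")

# INNER JOIN: 'quux' has no matching email, so its row is discarded.
inner_rows = conn.execute("""
    SELECT p.name, e.address
    FROM people p
    INNER JOIN emails e ON e.person_id = p.id
    ORDER BY p.id
""").fetchall()

# LEFT OUTER JOIN: 'quux' is kept and tied to a row of NULLs.
outer_rows = conn.execute("""
    SELECT p.name, e.address
    FROM people p
    LEFT OUTER JOIN emails e ON e.person_id = p.id
    ORDER BY p.id
""").fetchall()

print(inner_rows)  # [('foo', 'foo@example.com')]
print(outer_rows)  # [('foo', 'foo@example.com'), ('quux', None)]
```

Note that the outer join still returns no more rows than the left-hand table supplied: it preserves a row rather than inventing one.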

So to create new rows, one can UNION two rowsets, or entangle the rowsets with some sort of a JOIN, but in the simplest, non-iterative SQL, there ought to be no easy way to make one row magically split into two, or maybe three, or maybe four, based on its textual content. It breaks the underlying principle, that rows should flow through the SQL into bit-buckets or the STDOUT tray, but shouldn’t be tossed into the stream with flamboyant verve like chillis into a stir-fry.
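As an illustration of why the partial solutions never quite close the gap, here is one possible shape for such an attempt, sketched with Python's built-in `sqlite3` module. It is not the query from either project: the `phrases` table and the `t1`, `t2`, `t3` names are made up, and it relies on SQLite's `instr` and `substr` functions, so other databases would need their own equivalents. Each UNION ALL branch peels off one more token, which means the query has to be written out once per possible token position.

```python
import sqlite3

# Hypothetical table holding the sample data from the top of the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phrases (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO phrases (val) VALUES
        ('foo'), ('foo bar'), ('quux'), ('blort wuu spong');
""")

# One non-recursive CTE per token position: each takes the text up to the
# first space as its token and hands the remainder on to the next stage.
SPLIT_UP_TO_THREE = """
WITH
t1 AS (
    SELECT p.id, 1 AS pos,
           substr(p.val || ' ', 1, instr(p.val || ' ', ' ') - 1) AS token,
           substr(p.val, instr(p.val || ' ', ' ') + 1) AS rest
    FROM phrases p
),
t2 AS (
    SELECT t1.id, 2 AS pos,
           substr(t1.rest || ' ', 1, instr(t1.rest || ' ', ' ') - 1) AS token,
           substr(t1.rest, instr(t1.rest || ' ', ' ') + 1) AS rest
    FROM t1
    WHERE t1.rest <> ''
),
t3 AS (
    SELECT t2.id, 3 AS pos,
           substr(t2.rest || ' ', 1, instr(t2.rest || ' ', ' ') - 1) AS token,
           substr(t2.rest, instr(t2.rest || ' ', ' ') + 1) AS rest
    FROM t2
    WHERE t2.rest <> ''
)
SELECT token FROM (
    SELECT id, pos, token FROM t1
    UNION ALL
    SELECT id, pos, token FROM t2
    UNION ALL
    SELECT id, pos, token FROM t3
)
ORDER BY id, pos
"""

tokens = [row[0] for row in conn.execute(SPLIT_UP_TO_THREE)]
print(tokens)  # ['foo', 'foo', 'bar', 'quux', 'blort', 'wuu', 'spong']
```

This handles the sample data only because no value has more than three tokens. A fourth word would be silently dropped unless a `t4` branch were added by hand, so the size of the query is tied to the worst case in the data rather than to the problem itself.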

Exit gracefully: regardless of the data itself, the data model that a given language’s designers had in mind can have the most effect on what’s plausible to do in the language. Almost all languages evolve through proprietary extensions until they can do associative arrays, every kind of loop structure and, if left alone for long enough, GOTOs, but being able to complete a task with a given language is not the same as being able to complete it, for a sufficiently large data set, before the death of your server, your development team or the universe.

Blog category:

efficiency, import/export, paradigms

About me

I'm J-P Stacey, and I'm a freelance technical developer and software architect, working with Drupal, Javascript, Symfony, PHP and devops, with experience in project and process management and an emphasis on usability.

I live in the UK; my website is self-hosted on bigv.io; my email is hosted by Google, and that's also what I use to share files.
