You are here

sql

SQL Server error: sqlboot.dll

After several administrative steps over a few months, we suddenly found the following error coming from an instance of SQL Server on a virtual machine:

Your SQL Server installation is either corrupt or has been tampered with (unable to load SQLBOOT.DLL). Please uninstall them re-run setup to correct this problem.

Nobody had been keeping an eye on it, because the virtual machine gets very little use from day to day, but then is suddenly essential and has to have the dust blown off it. So we set to the nasty nuts-and-bolts work of starting/stopping services, running uninstall scripts, fixing registry entries when the scripts couldn’t be found, etc. All very well, but when we finally were able to begin the uninstallation procedure, it promptly keeled over, probably because it too was trying to activate some other “unloadable” .DLL .

It turns out that relabeling the drive that an installed SQL Server instance sits on (in this case, from F: to E:) causes all sorts of problems: even, it seems, if you scour the registry for old instances of F:SQL_Server... and replace with E:SQL_Server.... Moving it back fixed it. This suggests that if you want to relabel the drive, you have to completely uninstall SQL Server first. If you’re clever about it you can keep your databases and reattach them when you reinstall SQL Server on the new drive.

(I only mention this here as lots of other methods—which don’t work—seem to have a lot of Google juice, and we couldn’t get past the uninstallation barrier.)

Blog category: 

Now that's magic (quotes)

If your web application ensures that all your incoming CGI variables are free of the most common source of malicious site damage, can you stop worrying?

I wondered this as I got far enough into a PHP publishing system that I had to start thinking about adding new content through the system (rather than just jamming it into the database by hand, which is why the previous incarnation has sadly fallen into disuse). As it’s typically configured, PHP will add backslashes to anything it doesn’t trust: hence the comment “it’s a great site you’ve got here” will, when submitted by a POST request, become “it’s a great site you’ve got here”. Whether or not your server does this automatically can be checked by calling the function magic_quotes_gpc() (I realised only the other day that “gpc” stood for “GET, POST and cookies:” I probably have some catching up to do). In performing this blanket adding of slashes, PHP prevents the unwary coder from leaving his site open to both unintentional database hiccups and intentional malevolent attacks, the SQL injection attack.

All well and good, but my application is heavily object-oriented. Such objects store whatever content you give them, as well as optionally writing it to the database. If I want these objects to persist (even for the course of a single request) then any access to their internal storage must yield sensible data: those slashes have to disappear before the articles appear in an RSS feed, or on the website itself. So when the CGI environment gives slash-added content to an object, the object needs to know to both add it to the database verbatim and to produce it for viewing with the slashes removed. It can either do this by storing it in a slash-removed state or by placing a filter on its outputs.

There’s a further complication, in that content can also be written to an object by the PHP application itself: the publishing of all my unpublished articles, for example, would change the status of their accompanying objects without reference to any CGI variable. If I assumed all of this content had had its slashes escaped, then this article, for example, would lose all of its ’ text, because the object would assume they’d been added by PHP’s internals: in my second paragraph, the “after” string would look like the “before” string, and the “before” string would instead break the database insertion. In addition, what if the server is reconfigured? Can I trust my hosting company to never change the configuration of PHP, even accidentally during an upgrade?

I found myself lost in a maze of adding, removing and then adding slashes, with no clear way of deciding. Suddenly I decided: why not use one of PHP’s major downsides—that it doesn’t support persistence of objects from one request to the next very well, and hence each action is fighting against the overhead of constantly recreating and recompiling code—to ascertain which input/output processes were the most frequent (and most public) and hence needed to be the fastest? I drew a flowchart of a typical object’s behaviour and, by identifying which channels could be safely bottlenecked, arrived at a reasonable solution to the problem.

From my phrasing it’s clear that it was a foregone conclusion: I wanted, more than anything else, for content to flow straight from the database (through the object if applicable) to the user. This content needed to stay in any object in a simple, de-slashed form, so it could flow and flow as long as the object was in existence. That meant that incoming CGI content could not be stored with its added slashes intact. Counter-intuitively, then, my solution was to undo PHP’s default safety mechanisms, unescaping the CGI content and storing it raw, and then without fail adding slashes to anything that CGI or my application wanted to add to the database. This would be my bottleneck: everything else would be as fast as it could be.

Exit gracefully: ensuring all incoming content can be added to the database safely is not necessarily the most efficient or desirable long-term solution. By examining the likely workflows for content, it’s possible to make pragmatic decisions on where content should be pre-processed and where it should be left alone. Consider all your overheads, including that of short-term programming and long-term cumulative processing time: this will vary depending on your environment. Also, if you’re aware of a safety net, over the presence of which you have minimal control, account for the possibility that someone might one day remove it.

This space intentionally left blank

I’ve been asked a couple of times recently, as part of separate projects, to split the results of a SQL query on whitespace within. Simply put, how does one go from:

foo
foo bar
quux
blort wuu spong

to the expanded form:

foo
foo
bar
quux
blort
wuu
spong

efficiently and cleanly, only using SQL? (In case anyone’s worried, I’ve scrubbed the data sets of any personal details they might have previously contained: any resemblance to the real Blort Wuu-Spong is entirely coincidental.)

I finally decided it wasn’t possible, and although without the pure mathematics to back me up I could have kept hunting—partial solutions involving a self-join for each whitespace splitting kept rearing their heads—what finally convinced me was comparing the behaviour of SQL with that of XSL(T). The two are more alike than you might think; and no, I don’t mean SQL and XQuery, although that easy comparison provides a clue for the underlying similarity.

In XSL(T), the XML node in your original document(s) is in a sense king: it’s considered bad form (and is at any rate inefficient) to do data management on some transient data set, created within the template. Loops work best over nodesets rather than with some sort of conditional or from/to structure. This stems from XSL(T)’s underlying functional paradigm, where each nodeset is created

Of course, it’s always possible to twist non-functional behaviour out of the stylesheet (and most real-world solutions have to take a pragmatic approach to such programmatic purity) and interpreter-specific kluges exist to node-ize strings based on some non-XML token, but the language works fastest and cleanest when it’s hanging functions off nodes.

In SQL, the equivalent to the node in an XML document is the row in a query. Rows are passed around, compared with other rows based on the content of some of their cells, tied together and discarded, but very rarely can rows be created out of thin air. The closest one gets is the LEFT/RIGHT OUTER JOIN where the ON-condition is not satisfied: then the left-hand row, rather than being discarded as in the INNER JOIN, is in a sense tied to a row of NULLs. Although that equates to it being tied to no row at all, then when the SQL99 dust settles and post-processing can begin, NULLs can be reinterpreted (Coldfusion does this without being asked, for example).

So to create new rows, one can UNION two rowsets, or entangle the rowsets with some sort of a JOIN, but in simplest, non-iterative SQL, there ought to be no easy way to make one row magically split into two, or maybe three, or maybe four, based on its textual content. It breaks the underlying principle, that rows should flow through the SQL into bit-buckets or the STDOUT tray, but shouldn’t be tossed into the stream with flamboyant verve like chillis into a stir-fry.

Exit gracefully: regardless of the data itself, the data model that a given language’s designers had in mind can have the most effect on what’s plausible to do in the language. Almost all languages evolve through proprietary extensions until they can do associative arrays, every kind of loop structure and, if left alone for long enough, GOTOs, but being able to complete a task with a given language is not the same as being able to complete it, for a sufficiently large data set, before the death of your server, your development team or the universe.

SELECT needle FROM haystack;

Ever wondered just where in a complex database you stored that particular bit of information? Or ever had to reverse-engineer someone else’s database, about which you know nothing, with only the output from an application to guide you? In these cases it’s handy to be able to search every text cell, in every column, in every table.I found a solution to this problem on the web, specific to Microsoft SQL Server. This solution uses SQL Server’s inbuilt system tables to find all the table and column names and thus search the entire database. Although it doesn’t seem to run fast enough to be of use in a live environment (several seconds for a large database) it’s still a useful development tool.

Unable to find an equivalent for PostgreSQL, I decided to write the following. The function and datatype can be loaded into a current database by e.g. typing:

i ‘/path/to/file/f_search_all.sql’

at the psql command prompt when connected to the database you want to search. Then run:

SELECT * FROM f_search_all(’some search text’);

to retrieve any matching tables and columns, plus a sample of the text that matched.

Here is the code: http://www.jpstacey.info/blog/files/code/f_search_all.sql.

As with the SQL Server solution, this works by referencing PostgreSQL’s system tables, in this case pg_class and pg_catalog. Most of the code consists of loops over static SELECTs and so is quite speedy: the only slow point is the EXECUTEion of dynamic code where the table and column names (stored in varchars) are evaluated as part of a SELECT string. Interestingly, these same system tables power the psql backslash commands. If you type:

set ECHO_HIDDEN t

within psql, and then e.g. examine a table using

d table_name

then psql will show you the queries it has made to provide you with the well-formatted output (thanks to Ian Bicking for that tip).

Blog category: 
Subscribe to RSS - sql