Automated backups to S3

Wrapping python-boto in scriptable magic to back up your websites to S3

Simon Willison put together an excellent short-and-simple backup script over a year ago now, and I've used it intermittently to make backups. It takes files, whole directories, or the output of shell commands, and wraps it all up into a datestamped gzip file, before sending the whole thing up to an Amazon S3 account. It's a nice little piece of kit, in other words.

I found that it worked mostly fine, although it was (a) fiddly to run as a cron and (b) threw occasional errors which I wanted to fix. These two things together meant I wanted to do some proper development on the script and keep track of it.

I wrote a wrapper to help with setting up cronjobs, which means the backup can be scripted using config files in the user's home directory. I then decided on a whim to commit all of this to GitHub, which is the online home of a community that has grown up around the git versioning tool. As befits a DVCS tool, git has spawned a community which is multithreaded and agile and most of all very friendly, with all free repositories available to the public and everyone able to fork off everyone else.
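The config-driven approach can be sketched roughly as follows. This is a minimal illustration only: the `[backup]` section, its option names, and the `make_backup` function are all hypothetical, not taken from Simon's script or my wrapper, and the S3 upload step itself is omitted.

```python
import configparser
import datetime
import os
import tarfile
import tempfile

def make_backup(config_path):
    """Read a simple config file and wrap the named directory into a
    datestamped .tar.gz, ready for upload to S3. Section and option
    names here are illustrative only."""
    cfg = configparser.ConfigParser()
    cfg.read(config_path)
    source = cfg.get("backup", "directory")
    stamp = datetime.date.today().strftime("%Y%m%d")
    archive = os.path.join(tempfile.gettempdir(),
                           "%s-%s.tar.gz" % (cfg.get("backup", "name"), stamp))
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=os.path.basename(source))
    return archive
```

A cronjob then only needs to call the script with a path to the config file, which is the whole point of the wrapper: no arguments to remember.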

The repository was still a bit empty, as out of respect for the original coder I didn't commit his script along with everything else. Then, very kindly, Simon agreed to let me bundle his script with the rest, so you can now check out (clone) the "Configured S3 Backups" repository and follow the README to get scripted backups up and running fairly speedily.

I'd love for people to give it a try and offer some feedback. I definitely want to squash bugs and would also welcome new functionality. But the joy of git and GitHub is that if I don't want to implement any extras, then other programmers can go ahead and fork the project to add them themselves.

In summary? Configured S3 Backups: give the script a try. Git: give DVCS a try. And github: give DVCS-based online community building a try. Message ends.

Python 3 for Absolute Beginners is here

The book you've all been waiting for, when you've not been waiting for Mark Pilgrim's.

I'm ridiculously excited that the book I co-authored with Tim Hall, Python 3 For Absolute Beginners, has been published. Apress very kindly sent me some complimentary copies last week and I immediately took photos of it and posted them on Flickr.

I blogged about how glad I was to be asked to work alongside Tim on the project back in January, and said among other things:

The book's aimed at those learning to program, through the medium of Python 3, rather than those already experienced in Python 2.x. But the new Python looks like an excellent way to teach people about the vagaries of a whole range of programming concepts. Generally the changes in the new version are for the better, and I think Python's benevolent dictatorship was absolutely right to countenance backwards incompatibility in the occasional change.

This is all still true, and I still think that Python 3 is a quietly exciting overhaul of Python 2.x that raises the bar for both object-oriented and scripting programming languages.

My two chapters are on Exception Handling and Modules. Tim's ten chapters are on... everything else. Buy it. Enjoy it. Request it from your local library. Point it out nonchalantly to your friends. Use it to stop your desk from wobbling. Whatever: I'm just proud it's out there.

BeautifulSoup available for Python 3

Python 3 can now strip the hell out of webpages just as well as Python 2.

A Python3-compatible version of BeautifulSoup is now bundled with the Python2 BeautifulSoup tarball. It's actually been available since 27 December, but the most recent version addresses a bug in attribute handling.

It's a bit fiddly to get it working---you need patch, and both python3 and 2to3 on the command line (and 2to3 needs to be callable as 2to3-3.0)---but when it does work, that ol' BS magic is pretty clear. While there are still lots of good reasons not to convert all your Python 2 code to Python 3, there's now one less reason not to begin your next big project in Python 3.
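Once converted, the library works much as it always has. A tiny sketch of that magic, with the caveat that this example uses the modern `bs4` package name rather than the plain `BeautifulSoup` module name the Python 3 port described above actually installed:

```python
# Assumes the bs4 package name; the original Python 3 port was
# imported as plain BeautifulSoup instead.
from bs4 import BeautifulSoup

html = "<html><body><p class='intro'>Hello, <b>soup</b>!</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Strip the page down to just the text we care about
intro = soup.find("p", class_="intro")
print(intro.get_text())  # Hello, soup!
```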

(BeautifulSoup has an active user group on Google Groups, so you can report any bugs there.)


I'm writing approximately fifteen percent of a book

I wrote a whole book when I finished my academic thesis. Never again.

I'm really quite astonishingly happy to be joining Tim Hall in co-authoring Python 3 for Absolute Beginners. It's Tim and Apress' project, but I've been lucky in getting to author two of the chapters.

The book's aimed at those learning to program, through the medium of Python 3, rather than those already experienced in Python 2.x. But the new Python looks like an excellent way to teach people about the vagaries of a whole range of programming concepts. Generally the changes in the new version are for the better, and I think Python's benevolent dictatorship was absolutely right to countenance backwards incompatibility in the occasional change.

There's been plenty of discussion of this on the blogosphere which I don't particularly want to repeat here. Relentless backwards compatibility can be harmful to a living, breathing language, which exists to enable yet constrain a programmer's wildest ideas. It's not as though there are major changes between 2 and 3: there's a kind of wart-removal ethos in the cleaning-up of exception handling and of lists versus iterators, and in the addition or removal of little details like dictionary comprehensions and the weird lesser-spaceship operator "<>". But I'll miss the (deprecated) interpolation operator for string formatting. It's fortunate that "".format() is pretty Pythonic, and about as good as a method equivalent to the operator is likely to get.
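For anyone who hasn't met the two forms side by side, the old operator and its str.format() replacement produce identical results:

```python
# The (soft-)deprecated interpolation operator and its str.format()
# replacement, side by side:
old = "%s has %d warts" % ("Python 2", 3)
new = "{0} has {1} warts".format("Python 2", 3)
assert old == new == "Python 2 has 3 warts"
```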

It's turning into a busy January for me, what with Oxford Geek Night #10, a minor bike whoops nearly two weeks ago, and now this. I intend to take every February evening off, if that's OK.

Playing with Django: a fretless experience

I've been trying for twenty minutes to shoehorn a joke about Grappelling into this excerpt.

Django continues to gather momentum towards its imminent 1.0 release. The 1.0 beta 1 is out; the developer documentation has been refactored; it already plays nicely with Python's powerful debugging and logging tools; indeed, all is proceeding according to the roadmap, more or less. James Turnbull will be speaking about Django 1.0 at the eighth Oxford Geek Night this Wednesday, and it looks like he's got plenty of triumphs to bulletpoint for us.

An Oxford Django sprint had been mooted for this weekend. I didn't hear much more about it, but to be honest I had the great opportunity to actually have my own sprint---against 1.0b1---at work this week, working on a fast-turnaround project. I definitely felt performance improvements, especially when running unit tests. It was also lovely to work on my first internationalized/localized site and to find that it was just a question of dropping in certain bits of middleware to make it work across six languages. We didn't have any translations in place, but I clicked on "Polszczyzna" expecting bugger-all to happen and then suddenly realised that the English-language link read "Angielski." It's characteristic of Python's (and Django's) refreshingly plastic and just-works behaviour. Magic.
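The middleware drop-in amounts to something like the following settings.py fragment. The setting names match Django 1.0's conventions, but the exact LANGUAGES list here is illustrative rather than taken from our project:

```python
# settings.py (Django 1.0-era names): switch on i18n and let
# LocaleMiddleware pick the active language per request.
USE_I18N = True

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.locale.LocaleMiddleware',  # the i18n drop-in;
    # must sit after SessionMiddleware and before CommonMiddleware
    'django.middleware.common.CommonMiddleware',
)

LANGUAGES = (
    ('en', 'English'),
    ('pl', 'Polish'),
    # ...four more, to make six
)
```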

We did encounter one bug, involving model inheritance. I struggled for a while with registering with the project's Trac to report it. It's my first mediocre experience with Django: I waited a day or so for the arrival of an account-confirmation email, but eventually gave up without adding what would admittedly have been a me-too to an existing bug report. But then, email finally in my inbox, I chased it up just now, to find that it's been fixed. Today.

Probably much like Django itself, the project's interface with the user/consumer requires some past experience with its foibles, but the actual endeavour itself is fast, well-factored and puts most closed-source equivalents to shame.

Realplayer to mp3: a configurable Python wrapper

It’s one of the worst-kept tech secrets in the world, but Real Audio streams can be downloaded using software such as mplayer and then converted to MP3 format with lame. Both of these are installable on Ubuntu via the third-party package manager Automatix. The possibility of doing this conversion implies that, although the BBC offer all their programmes in Real Audio and only a few as podcasts, you can in principle put any you like on your portable music device.

Similar solutions abound on the web: Tom Taylor has a method involving mencoder; other methods can be found all over the place. However, these all involve a bit of ad hoc command-line intervention, or scripts which aren’t terribly configurable. There are GUI and proprietary tools, but they tend not to offer great support for command-line (and therefore scheduled) operation.

I’ve knocked together a Python application called rmrip, available as a tar file. If you unzip this to a directory you’ll find a number of .py files and a config.conf configuration file. Edit config.conf to match your system requirements and stream preferences, make sure the main script is executable, then run it. MP3s should eventually appear in a subdirectory called YYYYMMDD unless you configure the system otherwise.

The application can in principle be run from a cronjob, so it could tick over late at night when everyone’s internet is otherwise nice and quiet. In addition, conversion works via a named pipe, which is a funky way of piping the intermediary, enormous .wav audio file straight into lame, rather than saving it to disk. This does unfortunately restrict the application to non-Windows machines, but it’s a great help for audiophiles with limited disk space: .ra and .mp3 files can be in the hundreds of megabytes for many-hour programmes, but the associated .wav would take up gigabytes.
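The named-pipe pattern is worth a sketch in its own right. This toy example is not rmrip's code: a shell `printf` stands in for mplayer dumping a .wav, and a plain read stands in for lame consuming it, but the plumbing (os.mkfifo plus a subprocess writing into the pipe) is the same idea:

```python
import os
import subprocess
import tempfile

def fifo_demo():
    """Stream bytes from a producer process to a consumer through a
    named pipe: the intermediary data never lands on disk. A shell
    printf stands in for mplayer's .wav dump, and a plain read stands
    in for lame's input."""
    workdir = tempfile.mkdtemp()
    fifo = os.path.join(workdir, "audio.wav")
    os.mkfifo(fifo)  # POSIX only, as with rmrip itself
    # Producer: opens the pipe for writing and blocks until a reader appears
    producer = subprocess.Popen(["sh", "-c", "printf RIFFdata > " + fifo])
    # Consumer: reads the "audio" straight out of the pipe
    with open(fifo, "rb") as pipe:
        data = pipe.read()
    producer.wait()
    os.unlink(fifo)
    os.rmdir(workdir)
    return data
```

In the real application the producer writes gigabytes, which is exactly why the pipe's never-touches-disk behaviour matters.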

Current requirements include (please give any feedback on this!):

  • mplayer and lame: their locations are configurable
  • The subprocess module in Python

Current file types supported:

  • Direct rtsp://….ra Real Audio stream links
  • http://….ram references to Real Audio streams
  • http://….rpm Real Audio playlists (the BBC's is the only format tested so far)
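Handling the playlist formats is less magic than it sounds, since a .ram or .rpm file is essentially a plain-text list of URLs. A simplified version of the idea (this helper is illustrative, not rmrip's actual parser):

```python
import re

def extract_stream_urls(playlist_text):
    """Pull rtsp:// stream links out of the text of a .ram or .rpm
    playlist, which is essentially just a list of URLs. (Real
    playlists can carry extra options after a '?'; ignored here.)"""
    return re.findall(r"rtsp://\S+", playlist_text)
```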

To get you started, there's information available elsewhere on how to get stream details using a standalone Python program, along with some potentially out-of-date static pages detailing the current BBC streams.

Throw it all away!

I’ve recently been experimenting with calling external commands (mplayer and lame, so you might be able to guess what I’ve been doing) from within a scripting language (Python, although as it turns out it needn’t have been). Bizarrely, the external commands (argumentae intactae) worked absolutely fine on their own, chained together by me, by hand. However, when executed in the scripted environment the command that produced large volumes of output was stalling at 86.somethingMB, whereas the other command was stalling at 7.737MB of output.

Very odd, I thought. Odder still that the higher-volume command was being permitted to write much more before it stopped. So… it can’t be a limit imposed by Python on runaway output. Unless it’s a bandwidth limit, and the second command was having its input stream throttled…. There was no documentation, though, and very few reports of the error on the web. So what could it be?

The external commands are opened with subprocess.Popen in Python, which lets you specify destinations for the three standard I/O streams: input, output and error. I tried setting the latter two to None. All the output from the child processes was splatted onto the screen, and lo! the Python script ran to its natural conclusion.

It turned out that both commands were producing an on-screen counter or ticker to denote progress, and as this was being written to the standard output and error streams, the pipe buffers set up by Python to collect them were filling up. Once a command can no longer write into its output pipe, it blocks indefinitely until the buffer is drained!

When I added command-line parameters to turn the ticker off, and piped the output into files rather than into any temporary storage, the whole system ran very nicely indeed. I hope to publish it here soon.
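The fix generalizes beyond my ripping script. A minimal sketch of the pattern, with `run_noisy` as a hypothetical helper name: direct the child's streams to a file, which (unlike a pipe) can never fill up and block the child.

```python
import subprocess

def run_noisy(cmd, logpath):
    """Run a chatty external command, sending both of its output
    streams to a logfile on disk. Unlike subprocess.PIPE, a file can
    never fill up and block the child, so no amount of progress-ticker
    chatter will stall it."""
    with open(logpath, "wb") as log:
        return subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
```

You can still inspect the logfile afterwards for diagnostics; you've simply moved the buffering problem onto the filesystem, where it belongs.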

Exit gracefully: it’s not always possible to run external commands in ultra-quiet mode. The diagnostics they produce might be handy, so you can’t suppress them. However, unless you have a good reason to hang onto the output within your program (post-processing, say, to extract meaningful error messages) then you should be directing it to files. Also, look out for progress counters that don’t seem to cause very much output: just because the screen isn’t scrolling past doesn’t mean that the command-line program isn’t generating reams and reams of diagnostics on the standard output and error streams. And if you do direct them into a file, you could end up with a log full of gibberish: a logfile is a single stream of text running in one direction, with no support for the cute rewind-and-rewrite tricks that on-screen counters rely on.
