Exporting all Libreoffice worksheets as CSV

I've been recently looking into importing into Drupal from an Excel spreadsheet, and obviously the first step when confronted with an Excel spreadsheet is to say:

Oh my dear heavens. Can I export this as CSV?

It turns out that exporting a single sheet (the active one) to CSV is trivial. Obviously you can open it in Libreoffice and do that manually, but there's also unoconv which lets you do it from the command line.

Exporting e.g. all sheets from one spreadsheet, each into a different CSV file, is much trickier: manually, and automatically. Here's what unoconv's maintainer said about being willing to add the functionality to do so:

But for your specific case, modifying unoconv should be very simple (look for UNO code for selecting a sheet, or removing sheets) and integrate it just before the export takes place.

... in 2011. Even as of 2014, it looks like hacking away at the Python code of /usr/bin/unoconv, is the only way to make it work. Now, I like Python as much as the next man, but I'd prefer to do this in a more maintainable way.

A more interesting option - exporting semi-manually, if you like - is to script the process with Libreoffice macros, possibly the most minimally tutorialized (as opposed to mere reference documentation of functions) programming language you're likely to encounter this side of Google Spreadsheet formulae. But once you work out what you need to do, and get a feel for how the language works (hint: Visual Basic) then it's fairly straightforward;

Sub ExportAllToCsv
	document = ThisComponent
 
	' Use the global string tools library to generate a base filename for each CSV
	' based on the current prefixless filename
	GlobalScope.BasicLibraries.loadLibrary("Tools")
	BaseFilename = Tools.Strings.GetFileNameWithoutExtension(document.getURL(), "/")
 
	' Work out number of sheets for looping over them later.	
	Sheets = document.Sheets
	NumSheets = Sheets.Count - 1
 
	' Set up a propval object to store the filter properties
	Dim Propval(1) as New com.sun.star.beans.PropertyValue
	Propval(0).Name = "FilterName"
	Propval(0).Value = "Text - txt - csv (StarCalc)"
	Propval(1).Name = "FilterOptions"
	Propval(1).Value ="59,34,0,1,1"   'ASCII  59 = ;  34 = "
		
	For I = 0 to NumSheets
		' For each sheet, assemble a filename and save using the filter
		SheetName = Sheets(I).Name
		document.getCurrentController.setActiveSheet(Sheets(I))		
		Filename = "/tmp/" + BaseFilename + "." + SheetName + ".csv"
		FileURL = convertToURL(Filename)
		document.StoreAsURL(FileURL, Propval())
	Next I
 
	' Libreoffice thinks our filename is now the last-exported-CSV filename, so close.
	Msgbox "Files saved as /tmp/" + BaseFilename + ".*.csv. You'll need to close the spreadsheet now."
End Sub

I doubt if I would want to write a whole web application in it, but that particular script really hit the spot. I might try more Libreoffice macro scripting again in future. If I have to.

Comments

If it were me, I'd do it in Java with Apache POI. If you want fast + low memory + xls only, then the XLS2CSVmra example + a few lines to start writing to a new file on each new sheet should do you. If you can use a bit more memory, and want to support .xlsx files too, then the ToCSV example + a few lines to write a new file on each sheet will do you there too. Would mean you'd have to write some Java though...

Java? Meh, it'll never catch on. Too many curly brackets. [runs]

Thank you JP for this!

One question: where can I find the definitions for the property values for com.sun.star.beans.PropertyValue?  In particular I'm looking for how to set the equivalent of the Export Text File Field Options like Field Delimiter {Tab} and Text Delimiter <blank>.  I tried some googling, and also rooting around in the macro editor, but can't seem to find it.

Thanks again.

Thanks for the feedback! I'm afraid that, over a year after I wrote the original post, I couldn't tell you how to get an exhaustive list of the properties, let alone documentation on how to manipulate them. I wasn't kidding when I called it:

 the most minimally tutorialized programming language you're likely to encounter....

But if you do come up with any solutions yourself, please do either add them as a comment or link to your own blogpost, as I dare say it will come in handy in the future.

Thanks a lot for the script! Saved a lot of time.

The following documentation about the property values may help other users adapt the script to their needs:

http://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filt...

(Sorry, I had to break the link to pacify the spam filter.)

Thanks, Christopher. I thought it let links through but maybe it's having a bit of a fit. I've fixed it for you anyway as I think it's a useful resource.

Add new comment