I really enjoy the overall experience of reading books and articles on my Sony PRS-500 eBook reader, but I dislike having to boot into Windows through Boot Camp or VMware just to purchase books from the Sony eBook Store, especially when there are thousands of books in the public domain and tons of blog and article content on the internets for free.

The problem lies with getting this content onto said device. To make my life easier, I use textutil, a utility that first appeared in OS X 10.4. As you will see, the usefulness of this small tool goes far beyond formatting content for eBook readers. As always, fire up Terminal.app and have it ready to roll as we delve once again down to the command line.

Formats A-Plenty

While I have a very specific and regular use case for textutil, it has plenty of features that make it a highly useful general-purpose tool. (For most of the examples, I will be using the text and HTML versions of A Christmas Carol by Charles Dickens.) The first lies in format support: textutil can convert between txt, html, rtf, rtfd, doc, docx, wordml, odt and webarchive. Picture a scenario where you receive a large number of wretched HTML documents from an existing project and you really only need the raw text to begin anew. While you may have your own techniques for stripping HTML tags, textutil can do the heavy lifting for you:

$ textutil -convert txt ChristmasCarol.html
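If you only want to peek at the result without leaving a file behind, the man page also lists a -stdout option that sends the converted text to standard output. A minimal sketch, where count_words is a hypothetical helper name of my own and not part of textutil:

```shell
#!/bin/sh
# count_words is a hypothetical helper, not a textutil feature.
# It converts one document to plain text on standard output and counts
# the words, so no .txt file is written next to the original.
count_words() {
    textutil -convert txt -stdout "$1" | wc -w
}
```

Calling count_words ChristmasCarol.html should then print a single number.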

Since you can specify as many files as you like on the command line, batch processing an entire directory is just as easy:

$ textutil -convert txt *.html
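The shell glob only covers the current directory. For a nested project, something like the following sketch should do the job. convert_tree is a hypothetical function name, and the sketch assumes none of your file names contain newlines:

```shell
#!/bin/sh
# convert_tree is a hypothetical helper: find every .html file below a
# directory (default: the current one) and convert each to plain text.
# textutil writes each .txt file next to its source.
# Assumes file names do not contain newline characters.
convert_tree() {
    find "${1:-.}" -type f -name '*.html' -print |
    while IFS= read -r file; do
        textutil -convert txt "$file" || echo "failed: $file" >&2
    done
}
```

This launches textutil once per file, which is slower than a single call with many arguments but makes it easy to see which file failed.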

If you have an article broken up into many pieces and want to concatenate them into one large file (converting them or keeping the same format), just use the -cat option:

$ textutil -cat html *.txt
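Two details are worth knowing here. The shell expands *.txt in sorted order, so part10.txt lands before part2.txt unless you zero-pad the numbers, and the -output option lets you name the combined file yourself. A small sketch, with join_parts as a hypothetical wrapper name:

```shell
#!/bin/sh
# join_parts is a hypothetical wrapper around "textutil -cat".
# First argument: name of the combined HTML file.
# Remaining arguments: the pieces, in the order they should appear.
join_parts() {
    out=$1
    shift
    [ "$#" -gt 0 ] || { echo "usage: join_parts OUTPUT FILE..." >&2; return 2; }
    textutil -cat html -output "$out" "$@"
}
```

For example, join_parts book.html part01.txt part02.txt part03.txt keeps the pieces in the order you list them.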

If you have a look at the textutil manual page, you will see that you have complete control over the location, name and extension of output files and can even specify the font name and size. This is very handy for my use case, since I have a certain base font size I like to use with the reader:

$ textutil -convert rtf -font Times -fontsize 14 ArticleToConvert.html
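As an example of that control, the sketch below sends each converted file to a separate folder using -output. Since the man page describes -output as naming the first output file, the sketch runs textutil once per input. The function name convert_to_dir and the folder name in the usage line are made up for illustration:

```shell
#!/bin/sh
# convert_to_dir is a hypothetical helper.
# First argument: destination folder (created if missing).
# Remaining arguments: documents to convert to RTF in my reader font.
convert_to_dir() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for src in "$@"; do
        name=$(basename "$src")
        # Swap the source extension for .rtf and place the file in outdir
        textutil -convert rtf -font Times -fontsize 14 \
            -output "$outdir/${name%.*}.rtf" "$src" || return 1
    done
}
```

Something like convert_to_dir ~/Reader *.html would then leave the originals untouched and collect the RTF files in one place.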

You also do not need to save HTML files from your browser first. The -stdin option lets you work some further command line magic (by pairing textutil with curl) and convert data directly from the web:

$ curl --silent http://slashdot.org/ | textutil -stdin -convert txt -output slashdot.txt
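When the input arrives on a pipe there is no file extension to go by, so textutil has to work out the type from the content. If it guesses wrong, the -format option forces the input format. A sketch, where fetch_as_text is a hypothetical name and the --fail and --location curl flags are my own additions:

```shell
#!/bin/sh
# fetch_as_text is a hypothetical helper: download a page and save it
# as plain text. Arguments: the URL and the output file name.
fetch_as_text() {
    url=$1
    out=$2
    # --fail makes curl emit nothing on an HTTP error instead of passing
    # the server's error page along; --location follows redirects.
    curl --silent --fail --location "$url" |
        textutil -stdin -format html -convert txt -output "$out"
}
```

With that in place, fetch_as_text http://slashdot.org/ slashdot.txt is the same conversion as the one-liner above.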

 

Metadata Madness

textutil does its best to preserve file metadata, but you may not want to keep such data around, or you may want to modify it in some way. The -strip option clears away all metadata, while the -title, -author, -subject, -comment, -editor, and -company flags each take a parameter that sets your own value for that field. You can add your own metadata keywords via the -keywords option and even modify the creation and modification dates through the -creationtime and -modificationtime flags.
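Putting a few of those flags together might look like the sketch below. retitle is a hypothetical helper name. It assumes -strip and the explicit fields can be combined in one call, and it finishes with the -info option so you can check what actually ended up in the file:

```shell
#!/bin/sh
# retitle is a hypothetical helper.
# Arguments: title, author, source document.
# It drops the inherited metadata, sets a fresh title and author on the
# RTF output, then prints textutil's report on the converted file.
retitle() {
    title=$1
    author=$2
    src=$3
    textutil -convert rtf -strip -title "$title" -author "$author" "$src" &&
        textutil -info "${src%.*}.rtf"
}
```

Usage would be along the lines of retitle "A Christmas Carol" "Charles Dickens" ChristmasCarol.html.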

Unearthing textutil From the Command Line

While dropping into Terminal.app to do some conversions is fine, it would be easier for most users if there were a more accessible way to perform conversion tasks, especially somewhat routine ones. For this, we turn to the power of AppleScript and its ability to make Droplets, which are nothing more than applications that respond to files being dropped on them. Fire up Script Editor and enter the following code:

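on open droppedFiles
	-- droppedFiles is the list of items dropped on the application icon
	repeat with macFile in droppedFiles
		-- Turn each item into a quoted POSIX path that is safe to pass to the shell
		set unixFile to quoted form of POSIX path of macFile
		set shellScript to ("/usr/bin/textutil -convert rtf -font Times -fontsize 14 " & unixFile)
		-- Show the command before running it; delete this line once the droplet works
		display alert shellScript
		do shell script shellScript
	end repeat
end open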

 

Save the script twice: once as a normal script (so you can edit it later) and once as an application (so it becomes a Droplet). Now you have a handy tool that you can drop any number of files onto to batch convert them right from the Finder. You can customize this script to perform the transformations you need and create as many droplets as you see fit.


  1. Great post. I have never used textutil, but now that I know about it, I can think of many good uses for it.

  2. You should check out Calibre – it’s an open source eBook manager that can be used with the Sony to get items onto and off of it. It can also do some file conversion between different eBook formats: http://calibre.kovidgoyal.net/

  3. @Randy that is an awesome resource. *thank you*. @Sam I’m hoping to expose more cool, hidden utils

  4. textutil seems like the sort of Unix application just begging for someone to create a full-featured GUI version with drag-and-drop. Anyone interested?

  5. @Mike lay out an interface that you think would work and I’ll be glad to put one together.

  6. Bob, have a look at whsmith.co.uk. They do PRS-500 compatible ebooks cheaper than Sony’s store, and you can use Adobe on the Mac with the PRS now.

  7. there’s an online tool that converts some formats to .lrf
    http://www.lib2go.com

  8. [...] more palatable than starting from scratch, I’d say. As with any command-line based tool there are options a plenty so knock yourself [...]

  9. [...] Tales From the Command Line: textutil Useful tour through the textutil utility, a command line utility in Mac OS X for converting among txt, doc, docx, rtf, rtfd, HTML, wordml, odt (?), and webarchive formats. (tags: macosx) [...]

