Dig Into Unix: Sed and Awk


Time again to pop a shell and dig into the deep, geeky Unix internals of OS X with Dig Into Unix. Today we are going to look at two top-shelf power tools for text editing: sed and awk.

Sed is a Stream EDitor, and if you recall our previous Dig Into Unix installment on standard streams, you’ll know that the streams in question are just text from one source or another. Sed’s bread and butter is text search and replace, very similar to the “Edit” and “Find…” functions in TextEdit and many other GUI text editors. Unlike those editors, though, sed by default writes its output to the screen, or stdout.

As an example, try some basic operations on this string of text:
The quick brown fox jumped over the lazy dog's back.

Save the string of text as a file named test.txt, and type this into the Terminal:
sed s/quick/slow/g test.txt

The fox is now slow on the screen, but the file itself has not changed. To follow the stream: the text came from the file, passed through sed, and landed on the screen. The best set of examples I’ve found for getting right into sed and starting to play with it is the collection of sed one liners hosted at Sourceforge.
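To keep the edited text rather than just look at it, the stream can be redirected into a new file. This is a minimal sketch that works in a scratch directory; the `workdir` variable and the `slow.txt` filename are made up for illustration.

```shell
# Scratch directory so nothing outside it is touched (workdir is a made-up name).
workdir=$(mktemp -d)
printf '%s\n' "The quick brown fox jumped over the lazy dog's back." > "$workdir/test.txt"

# sed writes the edited text to stdout; the redirect captures it in a new file.
sed 's/quick/slow/g' "$workdir/test.txt" > "$workdir/slow.txt"

cat "$workdir/test.txt"   # original file, still says "quick"
cat "$workdir/slow.txt"   # edited copy, says "slow"
```

Quoting the sed expression is optional here, but it keeps the shell from interpreting any special characters in it.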

Personally, I use sed when I’ve got a large number of configuration files that need to be edited. For example, it might be decided that we do not need our Nagios monitoring system alerting on a certain statistic. I could go into 100 different files and perform the same action on all of them, or I could rely on a simple shell script and sed to do it for me.

for each in *.cfg; do
    mv "$each" "$each.bak" # Safety first!
    sed '30,35s/^/#/g' "$each.bak" > "$each"
done

This will plow through all of the config files in a certain directory and add a # sign at the beginning of lines 30 through 35, commenting those lines out. Then I can restart Nagios, and if all goes well, delete all of the .bak files created as backups by the script.
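Before letting a loop like this loose on real files, it can help to preview exactly which lines the address range touches. The sketch below uses a generated 40-line stand-in file (`sample.cfg` and its contents are hypothetical, not a real Nagios config) and sed’s `-n` option with the `p` command to print only lines 30 through 35.

```shell
workdir=$(mktemp -d)

# Hypothetical 40-line config file standing in for a real .cfg file.
awk 'BEGIN { for (i = 1; i <= 40; i++) print "setting" i " = value" }' > "$workdir/sample.cfg"

# -n suppresses sed's default output; p prints only the addressed lines.
sed -n '30,35p' "$workdir/sample.cfg"

# Same address range, now commenting the lines out and counting the result.
commented=$(sed '30,35s/^/#/' "$workdir/sample.cfg" | grep -c '^#')
echo "$commented lines commented out"
```

Neither command changes `sample.cfg`, so this is a safe way to check the range before running the real thing.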

While sed operates on lines and regular expressions (the subject of a future Dig Into Unix article!), awk works with fields. When given a stream of text, either from a text file or piped in from another application, awk can manipulate the text and rearrange the words. By default, awk separates fields on whitespace (spaces and tabs), but you can use any other character you’d like.
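As a rough illustration of how the separator changes what counts as a field, here is a sketch using awk’s `-F` option on a made-up colon-separated record. The record is invented for this example, shaped loosely like a line of /etc/passwd.

```shell
# A made-up colon-separated record (not taken from any real system).
record='alice:x:501:20:Alice Example:/Users/alice:/bin/zsh'

# Default splitting is on whitespace, so $1 is everything up to the first space.
default_first=$(printf '%s\n' "$record" | awk '{ print $1 }')

# -F: tells awk to split on colons instead; NF is the number of fields,
# so $NF is the last field on the line.
login=$(printf '%s\n' "$record" | awk -F: '{ print $1 }')
shell_path=$(printf '%s\n' "$record" | awk -F: '{ print $NF }')

echo "$default_first"   # alice:x:501:20:Alice
echo "$login"           # alice
echo "$shell_path"      # /bin/zsh
```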

Like sed, awk also has a great collection of one liners; the collection compiled by Eric Pement is a great resource. In my day-to-day activities, I call on awk when I want to format text for a report or for input into another application.

To use our quick brown fox example again, we can print only the fourth, third, and second words, in reverse order, with this command:
awk '{ print $4, $3, $2 }' test.txt

That will print out “fox brown quick”. Not very practical or useful. Something more practical might be to manipulate a list of comma-separated values. Using the “-F” flag, you can tell awk to separate its fields based on the comma, as in:
awk -F, '{ print $1 }' test.txt

Since there are no commas in the test file, this will print the entire string of text. So, we could run this command to take care of that:
sed s/\ /,/g test.txt > test2

This will use sed to replace all spaces with commas. The backslash escapes the space so that the shell treats it as part of sed’s expression rather than as a break between arguments. Now you can use awk to manipulate the string of text as needed.
awk -F, '{ print $7, $8, $9, $10, $5, $6, $1, $2, $3, $4 " did" }' test2
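Putting the two steps together, the whole round trip can be sketched as follows. The scratch directory and the `reordered` variable are assumptions added for the demonstration, and the sed expression is quoted here, which is an alternative to escaping the space with a backslash.

```shell
workdir=$(mktemp -d)
printf '%s\n' "The quick brown fox jumped over the lazy dog's back." > "$workdir/test.txt"

# Replace every space with a comma; quoting the expression protects the space.
sed 's/ /,/g' "$workdir/test.txt" > "$workdir/test2"

# Split on commas and print the ten fields in a new order.
# The trailing " did" is a string concatenated onto field 4.
reordered=$(awk -F, '{ print $7, $8, $9, $10, $5, $6, $1, $2, $3, $4 " did" }' "$workdir/test2")

echo "$reordered"   # the lazy dog's back. jumped over The quick brown fox did
```

The commas in the print statement tell awk to join the fields with its output separator, which is a single space by default.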

This article has barely skimmed the surface of what sed and awk can do. There are some rather hefty books dedicated to the pair; one from O’Reilly has been on my desk for years now. Awk is an entire programming language, but the point of this series is not to teach the in-depth details. It’s just to get your feet wet and maybe, just maybe, leave you thirsting for more. The real rub is that everything sed and awk can do can also be done, at times more efficiently, with the Practical Extraction and Report Language…better known as Perl, which is the subject of a future Dig Into Unix article.