Dig Into Unix: Sed and Awk



Time again to pop a shell and dig into the deep, geeky Unix internals of OS X with Dig Into Unix. Today we are going to look at two top-shelf power tools for text editing: sed and awk.

Sed is a Stream EDitor, and if you recall our previous Dig Into Unix installment concerning standard streams, you’ll understand that the streams we are talking about are actually just text from one source or another. Sed’s bread and butter is text search and replace, very similar to the “Edit” and “Find…” functions in TextEdit and many other GUI text editors. Unlike those text editors though, sed, by default, will write its output to the screen, or stdout.

As an example, try some basic operations on this string of text:
The quick brown fox jumped over the lazy dog's back.

Save the string of text as a file named test.txt, and type this into the Terminal:
sed 's/quick/slow/g' test.txt

The fox is now slow on the screen, but not changed in the file itself. Following the stream: the text came from the file, through sed, and out to the screen. The best set of examples I’ve found for getting right into sed and starting to play with it is the collection of sed one-liners hosted at SourceForge.
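Sed will also happily read from standard input when no filename is given, which is what makes it so handy in pipelines. Here is the same substitution fed from echo instead of a file:

```shell
# Same substitution as above, but reading from a pipe:
# sed falls back to stdin when no filename is given.
echo "The quick brown fox jumped over the lazy dog's back." | sed 's/quick/slow/g'
```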

Personally, I use sed when I’ve got a large number of configuration files that need to be edited. For example, it might be decided that we do not need our Nagios monitoring system alerting on the a certain statistic. I could go into 100 different files and perform the same action on all of them, or I could rely on a simple shell script and sed to do it for me.

for each in `ls *.cfg`; do
    mv "$each" "$each.bak" #Safety First!
    sed '30,35s/^/#/g' "$each.bak" > "$each"
done

This will plow through all of the config files in a certain directory and add a # sign at the beginning of lines 30 through 35, commenting those lines out. Then I can restart Nagios, and if all goes well, delete all of the .bak files created as backups by the script.
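And since the loop leaves a .bak copy of every file behind, rolling back a bad edit is itself a one-liner. A quick sketch, assuming the same directory of .cfg files as above:

```shell
# Roll back: restore each .bak copy over its edited original.
for each in *.cfg.bak; do
    mv "$each" "${each%.bak}" # ${each%.bak} strips the .bak suffix
done
```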

While sed operates on lines and regular expressions (the subject of a future Dig Into Unix article!), awk works with fields. When given a stream of text, either from a text file or piped in from another application, awk can manipulate the text and rearrange the words. By default, awk separates the text fields on whitespace, but you can tell it to split on any other character you’d like.
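To see that default splitting in action, here is a minimal sketch; $1 is the first field and $NF is awk’s built-in shorthand for the last one:

```shell
# awk splits each line on whitespace by default;
# $1 is the first field, $NF the last.
echo "alpha beta gamma" | awk '{ print $1, $NF }'
# prints "alpha gamma"
```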

Like sed, awk also has a great collection of one liners, this collection here is a great resource collected by Eric Pement. In my day to day activities, I call on awk when I want to format text for a report or to be input into another application.
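For report formatting, awk’s printf is the workhorse: it lines fields up into fixed-width columns. A sketch with made-up file names and sizes:

```shell
# Format a name/size listing into aligned columns with printf:
# %-12s left-aligns the name in 12 columns, %6d right-aligns the size in 6.
printf 'notes.txt 1204\nlog.txt 88\n' |
    awk '{ printf "%-12s %6d bytes\n", $1, $2 }'
```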

To use our quick brown fox example again, we can print only the fourth, third, and second words, in reverse order, with this command:
awk '{ print $4, $3, $2 }' test.txt

That will print out “fox brown quick”. Not very practical or useful. Something more practical might be to manipulate a list of comma-separated values. Using the “-F” flag, you can tell awk to separate its fields based on the comma, as in:
awk -F, '{ print $1 }' test.txt

Since there are no commas in the test file, this will print the entire string of text. So, we could run this command to take care of that:
sed s/\ /,/g test.txt > test2.txt

This will use sed to replace all spaces with commas. The backslash is there to escape the space, so the shell passes it to sed literally instead of treating it as a word separator. Now, you could use awk to manipulate the string of text as needed.
awk -F, '{ print $7, $8, $9, $10, $5, $6, $1, $2, $3, $4 " did" }' test2.txt
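Incidentally, the intermediate file isn’t strictly necessary; the two tools chain together in a single pipeline, which is worth a quick sketch:

```shell
# Chain sed and awk in one pipeline: no temporary file needed.
echo "The quick brown fox jumped over the lazy dog's back." |
    sed 's/ /,/g' |
    awk -F, '{ print $7, $8, $9, $10, $5, $6, $1, $2, $3, $4 " did" }'
```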

This article has just barely skimmed the surface of what sed and awk can do. There are some rather hefty books dedicated to the pair; one from O’Reilly has been on my desk for years now. Awk is an entire programming language, but the point of this series is not to teach the in-depth details; it’s just to get your feet wet, and maybe, just maybe, leave you thirsting for more. The real rub is that everything sed and awk can do can also be done, at times more efficiently, with the Practical Extraction and Report Language… better known as Perl, which is the subject of a future Dig Into Unix article.



The problems with

for each in `ls *.cfg`;

are the following:

1) you are starting another process to handle this
2) you are neglecting files with spaces in the name (which makes it wrong).

That’s what makes the proposed loop wrong. The sed -i solution is the best, but -i isn’t POSIX.


You actually did for each in ls? Seriously?

Why not just: for each in *.cfg; which also has the added benefit of being correct.

Eric Crist

You are probably using bash, which is not guaranteed on all systems, or at least guaranteed to be the user’s shell. The ‘correct’ way is to use the for each syntax, and not each, as Jon did, above.

For example:

user@host:~-> each in *.txt ; do cat $each
each: Command not found.
each: Undefined variable.


@Eric Crist: you’re missing Tal’s point.

Both the original article and Tal’s answer use “for each in…”, but the original article uses “ls *.cfg” whereas Tal only uses “*.cfg”

The “ls” part is the difference…

And note that if I remember correctly, “for … in” will only work for the *sh family and not *csh shells, so I’d say there is no ‘correct’ way ;)

Hagen Kaye

Awk is great, I use it all the time.

Another great scripting language is Expect (which is a TCL extension).

Between the two I’ve done some neat automated scripts.

Chris Boulton

sed actually has an -i argument for editing in place (editing a file). The great thing about it, is that it can also automatically create backups for you by specifying the extension after the -i option.

So the first example can be something like:

sed -i .bak '30,35s/^/#/g' *.cfg

Nifty, huh?
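One portability wrinkle with that form: the BSD sed that ships with OS X takes the backup suffix as a separate argument (-i .bak), while GNU sed on most Linux systems wants it attached directly to the flag. A quick sketch of the GNU form, using a made-up demo.cfg:

```shell
# GNU sed: suffix attaches to -i with no space; BSD sed wants "-i .bak".
printf 'line1\nline2\n' > demo.cfg
sed -i.bak 's/^/#/' demo.cfg   # edits demo.cfg in place, keeps demo.cfg.bak
```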


Indeed, moreover it seems that the ‘g’ flag is useless here (there is usually only one “line beginning” per line).

Thus the example becomes
sed -i .bak '30,35s/^/#/' *.cfg
