Originally published by Robert Beisert at fortcollinsprogram.robert-beisert.com

Linux + C – Your New Friend, Awk

In modern internet lingo, awk is a synonym for awkward or abnormal. However, the Linux tool awk (and its brethren, gawk and nawk) are anything but awkward.

Awk – the programmable filter

So far, we’ve looked at a set of filters that perform fairly singular tasks. LESS and MORE are designed to filter output to the screen for easier viewing. SED is designed to use regular expressions to find and replace values in a stream. CAT is designed to print a file to the terminal, and ECHO is designed to print some text to the terminal.

What we’ve lacked so far is a filter whose operations are almost entirely under our control. While we had a lot of power with sed, we could only reliably use it to replace simple patterns in a stream; it wasn’t designed to, say, print a set of fields from a table to our terminal, then create a running sum of those field values.

This is where awk comes into play.

AWK (named after its developers, Aho, Weinberger, and Kernighan) is a pseudo-C language designed to manipulate files (and streams, but that’s not it’s best use), allowing us to create complex filtering operations with simple syntax and operations. There are a number of things we can teach the filter to do:

  • Find and replace in a similar manner to sed
  • Create multi-dimensional arrays
  • Print individual fields out of a file (extremely useful for filtering log files and raw data)
  • Print only specific lines out of a file, based on length or contents
  • Perform arithmetic operations, including increment and decrement

Perhaps there are even more applications for this fascinating filter, but so far I’ve not had need for them.

Three Stages of Awk

There are three basic phases of an awk program: the beginning, middle, and end. (Ed. Whoa, what a crazy concept)

The beginning phase occurs before awk begins to operate on the file. We can set up variables here, print lines of text, and perform system operations (terminal commands) exactly once, before we start processing any data.

The middle phase is performed on every line of the input file(s). This is where we do most of our filtering.

The end phase occurs exactly once, when the middle phase has finished reading all inputs. Generally, we use this phase to print out some results, or provide a message, or call some system operation.

An awk script basically looks like this:

awk '
BEGIN{}
{}
END{}
'

Soon we’ll look at some of the nifty features that make awk such a powerful tool.

photo by: