Wednesday, August 25, 2010

AWK - selecting columns for output

AWK is a super-handy old-skool UNIX tool. There are plenty of good tutorials out there for it, but I'm jotting down some basic uses here for my own benefit.

I have mainly used AWK to select and print certain columns from its input. This, for example, prints the 1st and 7th columns:
awk '{print $1,$7}' < /var/log/syslog
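Since the contents of a real syslog vary from machine to machine, here is a small sketch of the same command run against a single made-up log line. The line itself is an assumption for illustration, not real output:

```shell
# Hypothetical syslog-style line, invented for this example
line='Aug 25 10:15:01 myhost CRON[1234]: (root) CMD (run-parts /etc/cron.hourly)'

# Print the 1st and 7th whitespace-separated fields
printf '%s\n' "$line" | awk '{print $1,$7}'
```

Counting along the line, field 1 is the month and field 7 is CMD, so those are the two values printed.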
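How a line is split into columns is determined by the value of the FS (input field separator) variable, which is a single space by default. That default is treated specially: fields are separated by runs of whitespace, and leading and trailing whitespace is ignored (in POSIX mode whitespace here means space and tab but not newline). You can change this with: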
awk 'BEGIN {FS=":"}{print $1,$7}' < /var/log/syslog

OR

awk -F: '{print $1,$7}' < /var/log/syslog
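A quick way to check that setting FS in a BEGIN block and using the -F option behave the same is to run both on one known record. The passwd-style record below is made up for illustration:

```shell
# Made-up passwd-style record with seven colon-separated fields
record='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Set FS in a BEGIN block...
printf '%s\n' "$record" | awk 'BEGIN {FS=":"} {print $1,$7}'

# ...or with the -F option; both split the record on ":"
printf '%s\n' "$record" | awk -F: '{print $1,$7}'
```

Both commands should print the user name and the shell, separated by a space.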
Fields that you list with commas in a print statement are separated in the output by the OFS (output field separator) variable, which is also a space by default. To write out CSV you might use:
awk -F: 'BEGIN{OFS=","}{print $1,$3}' < /var/log/syslog
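As a rough sketch of how OFS behaves, the example below uses a made-up colon-separated record. It also shows that assigning to a field (here the no-op $1=$1) makes awk rebuild the whole line with OFS, which is handy when you want every column. Note that this is only naive CSV: a field that itself contains a comma would not be quoted.

```shell
# Made-up colon-separated record, invented for this example
record='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Commas in print put OFS between the listed fields
printf '%s\n' "$record" | awk -F: 'BEGIN{OFS=","} {print $1,$3}'

# Assigning to a field rebuilds $0 using OFS, so a bare print shows every field
printf '%s\n' "$record" | awk -F: 'BEGIN{OFS=","} {$1=$1; print}'
```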
There is plenty more you can do with awk, including simple programming tasks such as counting and summing. cut is a simple alternative if all you want to do is cut fields from an input stream. It doesn't take much to hit its limitations, however. Consider the output of last, the first two columns of which look like this:
user     pts/0
user     pts/1
reboot   system boot
This awk command will print the first two columns correctly:
last | awk '{print $1,$2}'
Whereas this cut command:
last | cut -d" " -f1-5
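Won't produce the first two columns cleanly, and we need to ask for 5 fields just to try to skip the empty ones. The problem is that there is a variable number of spaces between the username and the tty, and cut treats every single space as a delimiter rather than collapsing runs of them the way awk does.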
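If you do want to stay with cut, one common workaround (not something the original commands above do) is to squeeze runs of spaces with tr -s first. A minimal sketch, using two of the sample lines from the last excerpt:

```shell
# Two sample lines in the shape of the `last` excerpt
sample='user     pts/0
reboot   system boot'

# cut treats each single space as a delimiter, so field 2 is empty here
printf '%s\n' "$sample" | cut -d' ' -f1,2

# Squeezing repeated spaces with tr -s first gives cut one delimiter per gap
printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f1,2

# awk needs no preprocessing because its default FS already collapses runs
printf '%s\n' "$sample" | awk '{print $1,$2}'
```

The last two commands should give the same two columns. The first one prints only the username followed by a stray space.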

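As a small taste of the counting mentioned earlier, here is a sketch that tallies how many lines each value in the first column has, using made-up lines in the shape of last output. The array name count is just a name chosen for this example:

```shell
# Made-up lines in the shape of `last` output
sample='user     pts/0
user     pts/1
reboot   system boot'

# Count lines per value of column 1, then print the totals in the END block.
# The order of a for-in loop over an awk array is unspecified, so sort the result.
printf '%s\n' "$sample" |
  awk '{count[$1]++} END {for (name in count) print name, count[name]}' |
  sort
```

With real data you would replace the printf with last itself.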