Updated Mon Feb 5 10:22:02 EST 2024
Available in paperback and e-book formats. Order at Amazon and other fine booksellers.
Introduction
This page holds material related to the second edition of The AWK Programming Language. The first edition was written by Al Aho, Brian Kernighan and Peter Weinberger in 1988. Awk has evolved since then, there are multiple implementations, and of course the computing world has changed enormously. The new edition of the Awk book reflects some of those changes.
The book is now available on paper and electronically. We are continuing to add material that we hope will be of interest -- historical documents, bits of code, and occasional essays on Awk and related topics.
The table of contents and preface of the new edition are here.
Programs and data files are now available, though not in a very orderly form. Download programs.tar (33 MB).
Contact us at info@awk.dev.
Errata
(These are listed in page number order.)
Sep 20, 2023, page 3, line -5:
The input line Kathy 15.50 10 should be in italic. Thanks to Galen Menzel for spotting this error.
Oct 6, 2023, page 9, line -2:
It should say "$2 is less than 20 ..." to match the code. Thanks to Kevin Lo for spotting this error.
Oct 10, 2023, page 24:
The test at the top of the page should be
NR == 10 { exit }
Otherwise the code prints 11 lines, not 10. Thanks to Mark Konezny.
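A minimal sketch of how such an off-by-one can arise. The book's page is not reproduced here, so this assumes a program that prints each line before testing NR; the uncorrected test NR > 10, the function names, and the generated 20-line input are all illustrative assumptions.

```shell
# Generate 20 numbered lines as sample input (illustrative data only).
gen20() { awk 'BEGIN { for (i = 1; i <= 20; i++) print i }'; }

# Assumed shape of an uncorrected program: line 11 is printed before
# the test NR > 10 first becomes true.
head_off_by_one() { awk '{ print } NR > 10 { exit }'; }

# Corrected test from the erratum: exit right after printing line 10.
head_fixed() { awk '{ print } NR == 10 { exit }'; }

gen20 | head_off_by_one | awk 'END { print NR }'    # prints 11
gen20 | head_fixed      | awk 'END { print NR }'    # prints 10
```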
Sep 16, 2023, pages 26 and 27:
The streaming version of mc on page 26 prints output in 7 columns, not 5, since the loop starts with n set to zero and ends with n = 6. Here's a better version:
{ out = sprintf("%s%-10.10s ", out, $0)
  if (++n >= 5) {
    print substr(out, 1, length(out)-2)
    out = ""
    n = 0
  }
}
The second version of mc on page 27 has a different problem: it doesn't include the two spaces between columns when computing the number of columns, so the result is always too high. Probably easiest fixed like this:
ncol = int(60 / (max+2) + 0.5) # int(x) returns integer value of x
Many thanks to 郭济琳 (Jilin Guo) for spotting these errors.
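To see the effect of the two separator spaces, here is a small arithmetic sketch. The corrected formula is the one given above; the uncorrected form int(60 / max + 0.5) is an assumption about what the page 27 code computes, and the shell function names are hypothetical.

```shell
# Columns that fit in a 60-character line when the widest item is max
# characters wide and each column is followed by two spaces.
ncol_fixed() { awk -v max="$1" 'BEGIN { print int(60 / (max + 2) + 0.5) }'; }

# Assumed uncorrected form, which ignores the two separator spaces.
ncol_unfixed() { awk -v max="$1" 'BEGIN { print int(60 / max + 0.5) }'; }

ncol_unfixed 10    # prints 6: six 12-character columns would need 72 characters
ncol_fixed 10      # prints 5: five 12-character columns need 60 characters
```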
Oct 8, 2023, page 32, line 9:
The expression s = s $n++ " " in the function rest works on Awk but not on Gawk. It's an ambiguity in resolving the precedences of the prefix $ and the postfix ++. Fixed by adding parentheses:
s = s $(n++) " "
The same construction appears near the middle of the page and is fixed in the same way. Thanks to 郭济琳 (Jilin Guo).
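A minimal sketch of the parenthesized form in context. This is not the book's exact code: it assumes a rest-style function that joins fields n through NF with single spaces, and the wrapper name rest_demo is hypothetical.

```shell
# Sketch of a rest-style function: join fields n..NF with single spaces.
# The parentheses in $(n++) make the intent explicit for every Awk:
# fetch field n, then increment n.
rest_demo() {
    awk '
    function rest(n,    s) {
        while (n <= NF)
            s = s $(n++) " "
        return substr(s, 1, length(s) - 1)   # drop the trailing space
    }
    { print rest(2) }'
}

echo "a b c d" | rest_demo    # prints: b c d
```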
Oct 8, 2023, page 33, line -10:
The split function is better written as
split(date, d)
to properly handle single-digit days that might be preceded by two spaces instead of one. Alternatively, the third argument could be / +/. Thanks to 郭济琳 (Jilin Guo).
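The difference shows up with any date string whose day is padded with two spaces. The sketch below uses a hypothetical date string and compares the default split with two separator regular expressions; the single-space regular expression / / is an assumption chosen to illustrate the failure, not necessarily what the book's code used.

```shell
# Hypothetical date string with a single-digit day preceded by two spaces.
date_str='Mon Feb  5 10:22:02 EST 2024'

# Print the number of pieces and the third piece for three ways of splitting.
split_demo() {
    awk -v date="$1" 'BEGIN {
        n = split(date, d);        print n, "[" d[3] "]"   # default: runs of blanks
        n = split(date, d, / /);   print n, "[" d[3] "]"   # every single space
        n = split(date, d, / +/);  print n, "[" d[3] "]"   # one or more spaces
    }'
}

split_demo "$date_str"
# prints:
# 6 [5]
# 7 []
# 6 [5]
```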
Nov 4, 2023, page 42, line -10:
The test should be $5 < 0.5 to match the text, which says "What about beer with less that say 0.5%?" Thanks to 郭济琳 (Jilin Guo).
Nov 21, 2023, page 88:
The derivation at the bottom of the page is missing a couple of intermediate states. It should read
Sentence -> Nounphrase Verbphrase
         -> the girl Verbphrase
         -> the girl Verb Modlist Adverb
         -> the girl runs Modlist Adverb
         -> the girl runs very Modlist Adverb
         -> the girl runs very very Modlist Adverb
         -> the girl runs very very Adverb
         -> the girl runs very very quickly
Thanks to Eran Yarkon.
Nov 21, 2023, page 94, line -10:
In
pfx = tolower($0)
gsub(/[^A-Za-z]/, "", pfx)
the RE doesn't need A-Z since pfx has no upper case letters. Thanks to Eran Yarkon.
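A quick check of that claim, as a sketch with a made-up input line: after tolower, removing everything outside a-z gives the same result as removing everything outside A-Za-z. The wrapper name strip_demo and the variable old are hypothetical.

```shell
# Strip everything but letters after lowercasing. The two character
# classes give the same result because no upper-case letters remain.
strip_demo() {
    awk '{
        pfx = tolower($0); gsub(/[^a-z]/, "", pfx)
        old = tolower($0); gsub(/[^A-Za-z]/, "", old)
        print pfx, (pfx == old ? "same" : "different")
    }'
}

echo "Hello, World 42!" | strip_demo    # prints: helloworld same
```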
Nov 27, 2023, page 111, line 8 of section 7.2:
The display should read title caption, not label.
On page 114, the line { ok = 1 } about 12 lines up from the bottom of the page doesn't do anything since ok is set to zero by the next pattern.
On page 115, inside the for loop in the END block, there's no need to test flag again; it's never empty at this point. Thanks to Eran Yarkon for these.
Awk for Exploratory Data Analysis (Sep 21, 2023)
Awk has always been a good tool for taking a quick look at some dataset. How many items of what kind are there? What is the range of numeric values in some field? Are there anomalies in the data, like rows with too many or too few fields?
A new chapter in the book talks about using Awk for this kind of analysis, using a couple of datasets. But there are plenty of other examples as well. BWK co-taught a course in the Humanities sequence at Princeton where Awk was taught to some very non-technical students as a tool for looking at some neat data about English poetry.
This essay describes some of what went on there; it might give you some ideas about how Awk can be used in a different domain.
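The kinds of quick questions mentioned above can be sketched in a few lines. These are illustrative one-liners, not code from the book; the function names and the sample input are hypothetical, and the input is assumed to be whitespace-separated.

```shell
# Count rows by number of fields: prints "nfields count", one pair per
# line. Unexpected field counts point at malformed rows.
field_counts() {
    awk '{ nf[NF]++ } END { for (n in nf) print n, nf[n] }' | sort -n
}

# Print the minimum and maximum numeric value found in column $1.
# Adding 0 forces a numeric comparison.
field_range() {
    awk -v col="$1" '
        NR == 1            { min = max = $col + 0 }
        ($col + 0) < min   { min = $col + 0 }
        ($col + 0) > max   { max = $col + 0 }
        END                { if (NR > 0) print min, max }'
}

# Made-up sample: the third row has one field too many.
printf 'a 3\nb 10\nc 7 extra\n' | field_counts     # prints "2 2" then "3 1"
printf 'a 3\nb 10\nc 7 extra\n' | field_range 2    # prints "3 10"
```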
Interesting Threads
Ben Hoyt, author of GoAwk and one of the expert technical reviewers of the second edition, has an interesting blog post on an implementation of the Unix make command in 60 lines of Awk, along with a Python version for comparison. One wouldn't make make in Awk, as Ben notes, but it's a good vehicle for learning how something works. (Sep 21, 2023)
There's a Hacker News thread on Ben's original post here, with some interesting comments.
Awk Source and Documentation
Awk source is maintained at https://github.com/onetrueawk/awk.
Gawk releases are at https://ftp.gnu.org/gnu/gawk; the Gawk manual is here.
Arnold Robbins has compiled a list of other implementations of Awk.
Historical Documents
The citations in the original Awk book have by now become quite dusty, but some of the material is still interesting and potentially useful. Here are references to some of the documents, perhaps updated.
AWK - A Pattern Scanning and Processing Language, the original Awk paper from Software Practice and Experience, 1979.
CSTR 118: An internal technical report on Awk, dated June 1985, so it's not the original language but more or less the one described in the Awk book in 1988.
dformat: Dformat is an Awk program, originally written by Jon Bentley, for drawing data-format diagrams. The version here comes from Arnold Robbins (to whom thanks), who has fixed it up and made it work properly in today's environments.
chem: Chem was an experiment in little languages, a language for describing chemical structure diagrams. (Think benzene rings on steroids.) It wasn't much used but it was a good exercise. The link above is to a somewhat blurry but complete PDF of the original chem paper by Bentley, Lynn Jelinski and bwk, published in Computational Chemistry in 1987.
indexing programs: One of the examples in the original Awk book was a simplified version of indexing tools first created by Jon Bentley, and used for both editions of the book. The link above provides the code; the paper was published in Electronic Publishing -- Origination Dissemination and Design in 1988. The second edition of the book has somewhat modified code. Thanks to Taj Khattra for finding the EP-ODD source.
Algorithm animation: by Jon Bentley and bwk. The paper A System for Algorithm Animation (1991) describes a system for embedding simple graphics commands in program output that could be used to display an "animated" version of the output. It all worked on monochrome displays, so it's totally dated now, but it was neat at the time. The original Computing Science Technical Report (CSTR) 132 is here.
Netlib's typesetting collection: Includes some links to chem and indexing programs.
Interview with Al Aho about Awk in Computerworld, May 2008
Interview with Brian Kernighan about Awk and AMPL, Computerworld, October 2009
Autre temps, autres Awks
awk.news (dmr) Tue Jul 15 23:47:57 1986 Rosa Miller, a Tlinget [sic] Indian who lives in Juneau, ... is a member of the Dipper House of the Dog Salmon Clan of the Raven Moiety of the Awk Tribe of the Tlingit (pronounced KLINK-it) Nation.... Mrs. Miller contends that the Awk Tribal Council in Juneau was set up by people who were not Awks but, as she calls them, "Johnny come latelies" to the area.... New York Times, 7/14, p. A8 CORRECTION Because of a transmission error, the Alaska Journal yesterday, from Anchorage, misidentified an Indian tribe. It is the Auk, not Awk. New York Times, 7/15, p. B1
Awk (adj, obs; also awke, auk, awck) [from ON afug, turned the wrong way, back foremost, perverse] 1. Directed the other way or in the wrong direction, back-handed, from the left hand. 1634: "With an awke stroke gaue hym a grete wounde." 2. Untoward, froward, perverse, in nature or disposition. 1642: "Our natures more crooked, inconstante, awk, and perverse." 3. Out of the way, odd, strange (rare) [fortunately] 4. Untoward to deal with, awkward to use, clumsy. There are also awkly, awkness, awkward, awkwardish, awkwardly, awkwardness, and awky.
In Scotland, upon April Day, they have a custom of ``hunting the gowk ...', properly, a cuckoo, and is used here, metaphorically in vulgar language, for a fool. This is done by sending silly people upon fools' errands from place to place, by means of a letter in which it is written: ``On the first day of April, Hunt the gowk another mile.'' John Brand's ``Observations on Popular Antiquities, 1813'' (c) Jeffrey Kacirk
Do you have one to add? Send it along! Mail to info@awk.dev.
