Understanding awk
What Is awk in Simple Terms
While grep finds lines, awk processes fields within those lines. If grep is a filter, awk is a transformation engine. Every line of input is automatically split into fields ($1, $2, $3...) separated by whitespace (or a custom delimiter), and you write rules that process those fields.
For example, `ps aux | awk '{print $2, $11}'` prints just the PID and command name columns from ps output, discarding everything else.
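To see the splitting rule in isolation, the sketch below feeds awk two inline sample lines instead of live `ps` output: one split on the default whitespace, one split on `:` via `-F`. The sample lines and the variable names (`default_split`, `colon_split`) are made up for illustration.

```shell
# Default splitting: any run of spaces or tabs separates fields.
default_split=$(printf 'rahul   1234\tnode\n' | awk '{print $1, $2, $3}')
echo "$default_split"    # rahul 1234 node

# Custom delimiter: -F: splits on colons instead (passwd-style sample line).
colon_split=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F: '{print $1, $3}')
echo "$colon_split"      # root 0
```

Note that the commas in `print $1, $2` join the fields with a single space on output, regardless of how much whitespace separated them in the input.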
How It Works
Take a simplified sample line (real `ps aux` output has eleven columns, which is why the earlier example prints `$11` for the command):

    Input line: "rahul 1234 45.3 2.1 node index.js"

    awk splits by whitespace:
      $1  = rahul     (user)
      $2  = 1234      (PID)
      $3  = 45.3      (CPU%)
      $4  = 2.1       (MEM%)
      $5  = node      (command)
      $6  = index.js  (argument)
      $NF = index.js  (last field)
      NR  = line number
      NF  = 6         (number of fields)

The same flow as a diagram:

    +------------------------------------------+
    | Input line:                              |
    | rahul 1234 45.3 2.1 node index.js        |
    +------------------------------------------+
                         |
         awk splits by whitespace (default)
                         |
                         v
    +------------------------------------------+
    | $1=rahul $2=1234 $3=45.3 $NF=index.js    |
    | NR=line number  NF=field count           |
    +------------------------------------------+
                         |
       awk '{print $1, $2}' (print fields 1,2)
                         |
                         v
    +------------------------------------------+
    | Output: rahul 1234                       |
    +------------------------------------------+

Practical Commands

    # Print specific fields
    ps aux | awk '{print $1, $2, $11}'          # user, PID, command
    df -h | awk '{print $5, $6}'                # usage%, mount point

    # Custom field separator
    awk -F: '{print $1, $3}' /etc/passwd        # username and UID
    awk -F, '{print $1, $3}' data.csv           # first and third CSV columns

    # Conditional processing
    ps aux | awk '$3 > 10 {print $2, $11}'      # PIDs using >10% CPU
    df -h | awk '$5+0 > 80 {print $6, $5}'      # filesystems >80% full

    # BEGIN and END blocks
    awk 'BEGIN {print "=== Report ==="} {sum += $3} END {print "Total CPU:", sum}' <(ps aux)

    # Pattern matching
    awk '/ERROR/ {print NR, $0}' /var/log/app.log   # numbered error lines
    awk '!/DEBUG/ {print}' /var/log/app.log         # skip debug lines

    # Formatted output
    df -h | awk '{printf "%-20s %5s\n", $6, $5}'
    # /                      62%
    # /var                   45%

    # Multi-field operations
    # Count errors by service from structured logs:
    # 2024-01-15 10:23:45 payment-api ERROR Connection failed
    awk '/ERROR/ {errors[$3]++} END {for (svc in errors) print svc, errors[svc]}' /var/log/app.log

    # Sum a column
    awk '{sum += $2} END {print "Total:", sum}' numbers.txt

    # Process only specific line range
    awk 'NR==10,NR==20 {print}' /var/log/app.log

Troubleshooting
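Before debugging against live logs, it can help to rerun a command on a small inline sample where the right answer is known. The sketch below does that for the error-count-by-service command above. The four log lines, the service names, and the variable names are hypothetical; only the line format comes from the example.

```shell
# Hypothetical sample in the format: date time service level message
sample_log='2024-01-15 10:23:45 payment-api ERROR Connection failed
2024-01-15 10:23:46 auth-api INFO Login ok
2024-01-15 10:23:47 payment-api ERROR Timeout
2024-01-15 10:23:48 auth-api ERROR Token expired'

# Count ERROR lines per service ($3). "for (svc in errors)" visits keys in an
# unspecified order, so the result is piped through sort for stable output.
error_counts=$(printf '%s\n' "$sample_log" |
  awk '/ERROR/ {errors[$3]++} END {for (svc in errors) print svc, errors[svc]}' |
  sort)
echo "$error_counts"
# auth-api 1
# payment-api 2
```

If the counts on the sample are right but the real log gives odd results, the likely difference is the line layout, so the field number used as the array key is the first thing to check.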
| Symptom | Command | What to Check |
|---|---|---|
| Wrong field selected | `echo 'a b c' \| awk '{print $2}'` | Fields count from `$1`; confirm the separator matches the input |
| Numeric comparison fails | `awk '$3+0 > 80'` | Force numeric with `+0` |
| Multi-character separator | `awk -F '::' '{print $1}'` | `-F` accepts a regex |
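
The three rows above can be reproduced on inline data. This is a minimal sketch with made-up sample values and variable names; the middle case also shows why `+0` matters, since a field like `9%` is compared as a string unless it is forced to a number.

```shell
# Wrong field selected: confirm field numbering on a known input.
second=$(echo 'a b c' | awk '{print $2}')
echo "$second"           # b

# Numeric comparison fails: "9%" is not a number, so $2 > 80 compares strings
# and "9%" sorts after "80". Both rows match, which is wrong for 9%.
as_string=$(printf '/ 9%%\n/var 85%%\n' | awk '$2 > 80 {print $1}')
echo "$as_string"

# Adding 0 converts the leading numeric part (9 and 85), so only /var matches.
as_number=$(printf '/ 9%%\n/var 85%%\n' | awk '$2+0 > 80 {print $1}')
echo "$as_number"        # /var

# Multi-character separator: -F '::' splits on the two-character sequence.
first=$(echo 'alpha::beta::gamma' | awk -F '::' '{print $1}')
echo "$first"            # alpha
```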
**Tip:** `awk 'NR==1{print; next} !seen[$0]++{print}'` is the classic awk one-liner for removing duplicate lines while preserving the first occurrence and keeping the header row. Useful for deduplicating log entries.
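As a quick check of how this one-liner behaves, the sketch below runs it on a hypothetical seven-line input with a header row and some repeated data rows. The first line is printed unconditionally and never recorded in `seen`, so a later data line identical to the header still appears once.

```shell
# Hypothetical input: header "service", then data rows with repeats.
deduped=$(printf 'service\napi\ndb\napi\nservice\ndb\ncache\n' |
  awk 'NR==1{print; next} !seen[$0]++{print}')
echo "$deduped"
# service   <- header, printed by the NR==1 rule
# api
# db
# service   <- first time this text is seen as a data row
# cache
```

The `!seen[$0]++` test is true only the first time a given line is encountered: the counter is 0 (false) before the increment, and non-zero on every later occurrence.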
**Remember:** awk field numbering starts at `$1`. `$0` is the entire line. `$NF` is always the last field regardless of how many fields there are. This makes `awk '{print $NF}'` reliable for extracting the last word/column even when the number of fields varies.
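A small illustration of that last point, using three made-up lines with one, two, and four fields. Each output row shows `NF`, the first field, and the last field.

```shell
# $NF tracks the last field even as the field count changes per line.
last_fields=$(printf 'one\nalpha beta\nx y z w\n' | awk '{print NF, $1, $NF}')
echo "$last_fields"
# 1 one one
# 2 alpha beta
# 4 x w
```

On a single-field line `$1` and `$NF` are the same field, which is why the first row repeats `one`.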