Overview and What You Will Learn
Every production incident involves reading logs. Every deployment script processes command output. Every monitoring system parses structured data. Text processing tools are the lenses through which Linux engineers see what is happening in their systems.
By the end of this lab you will:
- Filter log files with `grep` using regex patterns and context flags
- Extract and transform fields with `awk` one-liners
- Perform find-and-replace operations with `sed`
- Cut columns from delimited files with `cut`
- Sort, deduplicate, and count with `sort` and `uniq`
- Parse and query JSON from APIs with `jq`
- Build multi-stage log analysis pipelines combining all tools
Why This Matters in Production
A Hotstar on-call engineer gets paged at 2 AM. The service is returning errors, and the log file has 500,000 lines. Without text processing skills, finding the root cause can mean 20 minutes of scrolling. With them, a single pipeline such as `grep -E 'ERROR|FATAL' /var/log/app.log | awk '{print $5}' | sort | uniq -c | sort -rn | head -5` lists the top 5 error types in a few seconds, assuming the error type is the fifth whitespace-separated field of each line.
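To try that pipeline without a real incident, the sketch below runs it against a few made-up log lines. The log layout (date, time, service, level, error type, message) and every value in it are assumptions chosen so that the error type lands in field 5; a real application log may be laid out differently.

```shell
# Hypothetical sample log: date time service level error-type message
log=$(mktemp)
cat > "$log" <<'EOF'
2024-01-15 10:23:45 payment-api ERROR DBTimeout connection to db timed out
2024-01-15 10:23:46 payment-api INFO Started request 8841
2024-01-15 10:23:47 cart-api ERROR DBTimeout connection to db timed out
2024-01-15 10:23:48 payment-api FATAL OutOfMemory heap exhausted
2024-01-15 10:23:49 cart-api ERROR DBTimeout connection to db timed out
2024-01-15 10:23:50 auth-api ERROR TokenExpired session 17 rejected
EOF

# Keep ERROR/FATAL lines, take field 5, then rank values by frequency
top_errors=$(grep -E 'ERROR|FATAL' "$log" | awk '{print $5}' | sort | uniq -c | sort -rn | head -5)
printf '%s\n' "$top_errors"

rm -f "$log"
```

The first line of output is the most frequent error type, prefixed by its count.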
Core Principles
Text processing pipeline — raw log to insight:

    +------------------------------------------+
    | Raw log: 500,000 lines                   |
    | access.log (nginx access log)            |
    +------------------------------------------+
              |  grep '500'
              v
    +------------------------------------------+
    | Filtered: 1,247 error lines              |
    +------------------------------------------+
              |  awk '{print $7}'
              v
    +------------------------------------------+
    | Extracted: URL paths only                |
    +------------------------------------------+
              |  sort | uniq -c | sort -rn
              v
    +------------------------------------------+
    | Ranked: top URLs generating 500 errors   |
    |   342 /api/payment/process               |
    |   198 /api/cart/checkout                 |
    +------------------------------------------+

Detailed Step-by-Step Practical Lab
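The pipeline in the diagram above can be tried on a handful of sample lines. The sketch below assumes the common nginx access log layout, where the URL path is field 7 and the status code is field 9; the log lines are invented. It also shows one thing to watch for: a plain `grep '500'` matches the text 500 anywhere on the line, so comparing the status field in `awk` is a stricter filter.

```shell
# Hypothetical nginx-style access log lines
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.45 - - [15/Jan/2024:10:23:45 +0000] "POST /api/payment/process HTTP/1.1" 500 312
203.0.113.46 - - [15/Jan/2024:10:23:46 +0000] "GET /api/cart/checkout HTTP/1.1" 500 128
203.0.113.47 - - [15/Jan/2024:10:23:47 +0000] "POST /api/payment/process HTTP/1.1" 500 312
203.0.113.48 - - [15/Jan/2024:10:23:48 +0000] "GET /static/logo.png HTTP/1.1" 200 500
203.0.113.49 - - [15/Jan/2024:10:23:49 +0000] "GET /api/items/1500 HTTP/1.1" 200 2048
EOF

# Loose filter: also counts a 500-byte response and the path /api/items/1500
loose=$(grep -c '500' "$log")

# Strict filter: only lines whose status field (field 9) equals 500
strict=$(awk '$9 == 500 {c++} END {print c+0}' "$log")

# Rank the URL paths that returned status 500
ranked=$(awk '$9 == 500 {print $7}' "$log" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

echo "grep matched $loose lines, status field matched $strict lines"
printf '%s\n' "$ranked"
rm -f "$log"
```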
Milestone 1 — grep for filtering
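The commands in this milestone read `/var/log/app.log`, which may not exist or be readable on a lab machine. As a minimal sketch, the block below builds a small throwaway file (its contents are invented) and exercises the counting, inversion, whole-word, context, and quiet flags so the results can be checked by eye.

```shell
# Hypothetical sample log used only for experimenting with grep flags
log=$(mktemp)
cat > "$log" <<'EOF'
10:00:01 INFO service started
10:00:02 DEBUG cache warmed
10:00:03 ERROR database connection failed
10:00:04 INFO retrying
10:00:05 WARN error_code=17 returned by upstream
10:00:06 CRITICAL disk full
10:00:07 INFO shutting down
EOF

error_count=$(grep -c 'ERROR' "$log")        # case-sensitive: only the ERROR line
non_debug=$(grep -vc 'DEBUG' "$log")         # every line except the DEBUG one
any_case=$(grep -ci 'error' "$log")          # also matches error_code
whole_word=$(grep -ciw 'error' "$log")       # error_code is not a whole-word match
context=$(grep -B 1 -A 1 'CRITICAL' "$log")  # one line before and after the match

# -q prints nothing; only the exit status is used
if grep -q 'CRITICAL' "$log"; then alert="yes"; else alert="no"; fi

echo "ERROR=$error_count non-DEBUG=$non_debug any-case=$any_case whole-word=$whole_word alert=$alert"
printf '%s\n' "$context"
rm -f "$log"
```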

    # Basic search
    grep 'ERROR' /var/log/app.log

    # Case-insensitive
    grep -i 'error' /var/log/app.log

    # Show line numbers
    grep -n 'ERROR' /var/log/app.log

    # Count matching lines
    grep -c 'ERROR' /var/log/app.log

    # Invert (lines NOT matching)
    grep -v 'DEBUG' /var/log/app.log

    # Show context: 2 lines before, 3 lines after each match
    grep -B 2 -A 3 'CRITICAL' /var/log/app.log

    # Extended regex (alternation, +, ?)
    grep -E 'ERROR|FATAL|CRITICAL' /var/log/app.log

    # Multiple files: list only the names of files that contain a match
    grep -l 'ERROR' /var/log/*.log

    # Recursive search in directories
    grep -r 'DATABASE_URL' /etc/ --include='*.conf'
    grep -r 'API_KEY' /opt/apps/ --include='*.py' --include='*.js'

    # Highlight matches (useful for piping to less)
    grep --color=always 'ERROR' /var/log/app.log | less -R

    # Search for whole words only
    grep -w 'error' /var/log/app.log
    # Matches 'error' but not 'errors' or 'error_code'

    # Perl regex (lookahead, lookbehind); requires GNU grep
    grep -oP '(?<=user_id=)\d+' /var/log/app.log
    # -o prints only the matched digits that follow 'user_id=', not the whole line

    # Quiet mode for scripts
    if grep -q 'FATAL' /var/log/app.log; then
        echo "Fatal errors detected -- alerting on-call"
    fi

Milestone 2 — awk for field extraction
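The group-and-count idiom used in this milestone is easier to follow on data small enough to check by hand. The sketch below assumes the `timestamp service level message` layout and uses invented log lines. It compares the level field directly rather than matching `/ERROR/` anywhere on the line, which avoids counting messages that merely mention the word.

```shell
# Hypothetical log lines: timestamp service level message
log=$(mktemp)
cat > "$log" <<'EOF'
2024-01-15T10:23:45 payment-api ERROR Database connection failed
2024-01-15T10:23:46 payment-api INFO Retrying connection
2024-01-15T10:23:47 cart-api ERROR Upstream timeout
2024-01-15T10:23:48 payment-api ERROR Database connection failed
2024-01-15T10:23:49 auth-api INFO No ERROR seen in last check
EOF

# Pattern match anywhere on the line: also counts the auth-api INFO line
loose_lines=$(awk '/ERROR/ {c++} END {print c+0}' "$log")

# Exact comparison on field 3 (the level)
strict_lines=$(awk '$3 == "ERROR" {c++} END {print c+0}' "$log")

# Errors per service, highest count first
per_service=$(awk '$3 == "ERROR" {n[$2]++} END {for (s in n) print s, n[s]}' "$log" | sort -k2,2nr -k1,1)

echo "loose=$loose_lines strict=$strict_lines"
printf '%s\n' "$per_service"
rm -f "$log"
```

The `sort` after `awk` is needed because `for (s in n)` visits array keys in no guaranteed order.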

    # Print specific fields (whitespace-delimited by default)
    # nginx access log: IP - - [date] "METHOD /path HTTP" status bytes
    awk '{print $1, $7, $9}' /var/log/nginx/access.log
    # 203.0.113.45 /api/payment 200

    # Custom field separator
    awk -F: '{print $1, $3}' /etc/passwd      # username and UID
    awk -F, '{print $1, $4}' /tmp/report.csv  # CSV columns 1 and 4 (simple CSV, no quoted commas)

    # Filter with condition then extract
    # Show PID, command, and CPU share of processes using more than 10% CPU
    ps aux | awk '$3 > 10 {print $2, $11, $3"%"}'

    # Count pattern occurrences (count+0 prints 0 instead of an empty string)
    awk '/ERROR/ {count++} END {print "Errors:", count+0}' /var/log/app.log

    # Sum a column
    awk '{sum += $10} END {printf "Total bytes: %.2fMB\n", sum/1024/1024}' /var/log/nginx/access.log

    # Group and count
    # Count requests per status code
    awk '{codes[$9]++} END {for (code in codes) print code, codes[code]}' /var/log/nginx/access.log | sort -rn -k2

    # Formatted table output
    df -h | awk 'NR==1 || $5+0 > 70 {printf "%-20s %5s %5s\n", $6, $5, $4}'
    # Shows header row plus filesystems over 70% full

    # Process specific line range
    awk 'NR>=100 && NR<=200 {print NR": "$0}' /var/log/app.log

    # Multi-field log analysis
    # Format: timestamp service level message
    # 2024-01-15T10:23:45 payment-api ERROR Database connection failed
    awk '/ERROR/ {services[$2]++} END {for (s in services) print s, services[s]}' /var/log/app.log
    # Shows error count per service

Milestone 3 — sed for transformation
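Before pointing `sed` at real configuration, it helps to see how first-match, global, and address-scoped substitutions differ on a file small enough to read in full. The sketch below uses an invented config file; the keys and values are placeholders, not a real service layout. Nothing is edited in place, so every command only prints its result.

```shell
# Hypothetical config file for experimenting with substitutions
conf=$(mktemp)
cat > "$conf" <<'EOF'
# service ports
payment-api: localhost:8080 -> localhost:8080
cart-api: localhost:8080

log_dir: /etc/nginx/logs
EOF

# Without g, only the first match on each line is replaced
first_only=$(sed 's/localhost/10.0.2.100/' "$conf" | sed -n '2p')

# With g, every match on the line is replaced
every_match=$(sed 's/localhost/10.0.2.100/g' "$conf" | sed -n '2p')

# An address restricts the substitution to lines starting with payment-api
scoped=$(sed '/^payment-api/s/8080/4000/g' "$conf" | grep -c '4000')

# Two delete expressions: drop comment lines and blank lines
cleaned=$(sed -e '/^#/d' -e '/^$/d' "$conf" | wc -l | tr -d ' ')

# A different delimiter avoids escaping the slashes in a path
new_dir=$(sed -n 's|/etc/nginx|/etc/nginx-prod|p' "$conf")

printf '%s\n' "$first_only" "$every_match" "$new_dir"
echo "lines changed by scoped edit: $scoped, lines left after cleanup: $cleaned"
rm -f "$conf"
```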

    # Basic substitution (first occurrence per line)
    sed 's/localhost/prod-db.razorpay.internal/' config.yaml

    # Global substitution (all occurrences)
    sed 's/localhost/prod-db.razorpay.internal/g' config.yaml

    # In-place edit (always back up first)
    sed -i.bak 's/debug: true/debug: false/' /etc/app/config.yaml
    # Creates config.yaml.bak with original content

    # Delete lines
    sed '/^#/d' config.yaml          # remove comments
    sed '/^$/d' config.yaml          # remove blank lines
    sed '/DEBUG/d' /var/log/app.log  # remove debug lines

    # Extract lines between markers
    sed -n '/BEGIN CERT/,/END CERT/p' certificate.pem

    # Substitute only on lines matching a pattern
    # Only change port on lines containing 'payment-api'
    sed '/payment-api/s/8080/4000/' docker-compose.yaml

    # Multiple expressions
    sed -e 's/localhost/10.0.2.100/g' -e 's/5432/5433/g' config.yaml

    # Use different delimiter (useful when pattern contains /)
    sed 's|/etc/nginx|/etc/nginx-prod|g' config.conf

    # Add line before pattern (the one-line i/a form is a GNU sed extension)
    # Anchored on ExecStart= so the new line stays inside the [Service] section
    sed '/^ExecStart=/i User=payment-svc' myapp.service

    # Add line after pattern
    sed '/^ExecStart=/a Restart=on-failure' myapp.service

    # Comment out a line (anchored with ^ so an already commented line is skipped)
    sed -i.bak '/^PasswordAuthentication yes/s/^/#/' /etc/ssh/sshd_config

    # Remove trailing whitespace from all lines
    sed -i.bak 's/[[:space:]]*$//' file.txt

Milestone 4 — cut, sort, and uniq for structured data
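The difference between lexical and numeric sorting on a chosen field is a frequent source of surprises, so the sketch below shows both on a small colon-delimited file. The file imitates the `/etc/passwd` layout, but the accounts are invented; it is not read from the real system file.

```shell
# Hypothetical passwd-style records: name:password:UID:GID:comment:home:shell
users=$(mktemp)
cat > "$users" <<'EOF'
deploy:x:1002:1002::/home/deploy:/bin/bash
root:x:0:0:root:/root:/bin/bash
nginx:x:101:101::/var/lib/nginx:/sbin/nologin
app:x:1001:1001::/home/app:/bin/bash
EOF

# cut: field 1 only, then sort alphabetically
names=$(cut -d: -f1 "$users" | sort | paste -s -d ' ' -)

# -k3,3 limits the sort key to field 3; without n the UIDs compare as text
lexical=$(sort -t: -k3,3 "$users" | cut -d: -f1,3 | paste -s -d ' ' -)

# Adding n compares field 3 as a number
numeric=$(sort -t: -k3,3n "$users" | cut -d: -f1,3 | paste -s -d ' ' -)

# Frequency analysis: most common login shell
top_shell=$(cut -d: -f7 "$users" | sort | uniq -c | sort -rn | head -n 1 | awk '{print $1, $2}')

echo "names:   $names"
echo "lexical: $lexical"
echo "numeric: $numeric"
echo "top shell: $top_shell"
rm -f "$users"
```

In the lexical result, UID 101 sorts after 1002 because the comparison is character by character.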

    # cut: extract columns from delimited text
    # -d sets delimiter, -f selects field(s)
    cut -d: -f1 /etc/passwd       # extract usernames
    cut -d: -f1,3 /etc/passwd     # fields 1 and 3
    cut -d, -f2,4 report.csv      # CSV columns 2 and 4
    cut -c1-10 /var/log/app.log   # first 10 characters of each line

    # sort: sort lines
    sort /etc/hosts               # alphabetical
    sort -n numbers.txt           # numeric
    sort -rn numbers.txt          # reverse numeric
    sort -t: -k3,3n /etc/passwd   # sort numerically by field 3 (UID), colon-delimited
    sort -u /var/log/ips.txt      # sort and deduplicate

    # uniq: work with consecutive duplicate lines
    # (always sort first for full deduplication)
    sort /var/log/ips.txt | uniq     # deduplicate
    sort /var/log/ips.txt | uniq -c  # count occurrences
    sort /var/log/ips.txt | uniq -d  # show one copy of each duplicated line

    # Combined: frequency analysis
    # Top 10 IPs from nginx access log
    awk '{print $1}' /var/log/nginx/access.log | \
      sort | uniq -c | sort -rn | head -10

    # Top error messages
    grep 'ERROR' /var/log/app.log | \
      sed 's/.*ERROR //' | \
      sort | uniq -c | sort -rn | head -10

Milestone 5 — jq for JSON processing
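The `api.internal` endpoints used in this milestone are placeholders, so the filters are easier to explore against JSON held in a shell variable. The sketch below assumes `jq` is installed and uses an invented array of service objects; the field names mirror the ones used in this milestone but do not come from a real API.

```shell
# Hypothetical API response: an array of service objects
json='[
  {"name": "payment", "status": "up",   "region": "ap-south-1", "uptime_seconds": 86400},
  {"name": "cart",    "status": "down", "region": "ap-south-1", "uptime_seconds": 0},
  {"name": "search",  "status": "up",   "region": "us-east-1",  "uptime_seconds": 3600}
]'

# Number of elements in the top-level array
total=$(printf '%s' "$json" | jq 'length')

# Iterate with .[], filter with select, and print raw strings with -r
down=$(printf '%s' "$json" | jq -r '.[] | select(.status == "down") | .name')

# Build a new object per element; -c prints compact single-line JSON
summary=$(printf '%s' "$json" | jq -c '[.[] | {name, uptime: .uptime_seconds}]')

# Tab-separated rows for further processing with cut, sort, or awk
tsv=$(printf '%s' "$json" | jq -r '.[] | [.name, .status, .region] | @tsv')

echo "total=$total down=$down"
printf '%s\n' "$summary" "$tsv"
```

Because the input is an array, object construction has to happen after `.[]` (or inside `map`); applying `{name: .name}` directly to the array would fail.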

    # Install jq if not present (Debian/Ubuntu)
    sudo apt install jq

    # Pretty-print JSON
    curl -s https://api.internal/status | jq .

    # Extract a field (add -r to print the string without quotes)
    curl -s https://api.internal/health | jq '.status'
    # "healthy"

    # Extract nested field
    curl -s https://api.internal/metrics | jq '.services.payment.latency_ms'

    # Extract from array
    curl -s https://api.internal/servers | jq '.[0].hostname'
    curl -s https://api.internal/servers | jq '.[].hostname'  # all hostnames

    # Filter array by condition
    curl -s https://api.internal/services | jq '.[] | select(.status == "down")'
    curl -s https://api.internal/services | jq '[.[] | select(.healthy == false)]'

    # Build new JSON (iterate with .[] first, because the response is an array)
    curl -s https://api.internal/services | \
      jq '.[] | {name: .name, status: .status, uptime: .uptime_seconds}'

    # Extract multiple fields as TSV
    curl -s https://api.internal/services | \
      jq -r '.[] | [.name, .status, .region] | @tsv'

    # Count elements
    curl -s https://api.internal/services | jq 'length'
    curl -s https://api.internal/services | jq '[.[] | select(.status == "down")] | length'

    # Process docker inspect output
    docker inspect payment-api | jq '.[0].NetworkSettings.IPAddress'
    docker inspect payment-api | jq '.[0].State.Status'

    # Process kubectl output
    kubectl get pods -o json | jq '.items[] | {name: .metadata.name, status: .status.phase}'

Milestone 6 — Production log analysis pipeline
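The script in this milestone accepts a second argument (`SINCE`) and prints it in the header, but none of its stages filter by time. One possible way to add a time window is sketched below. It assumes each line starts with a date and a time in `YYYY-MM-DD HH:MM:SS` form followed by the level, which is the layout the script's field numbers suggest; the sample lines and the fixed cutoff are invented. Timestamps in this form sort the same way as text and as time, so a plain string comparison in `awk` is enough.

```shell
# Hypothetical log lines: date time level service message
log=$(mktemp)
cat > "$log" <<'EOF'
2024-01-15 09:10:00 ERROR payment-api Database connection failed
2024-01-15 09:55:12 ERROR cart-api Upstream timeout
2024-01-15 10:05:30 INFO payment-api Request completed
2024-01-15 10:20:41 ERROR payment-api Database connection failed
EOF

# Fixed cutoff so the result is reproducible. With GNU date, a relative
# cutoff could instead be computed as:
#   cutoff=$(date -d '1 hour ago' '+%Y-%m-%d %H:%M:%S')
cutoff='2024-01-15 09:30:00'

# Rebuild the timestamp from fields 1 and 2 and compare it as a string
recent_errors=$(awk -v since="$cutoff" '($1 " " $2) >= since && $3 == "ERROR"' "$log")
recent_count=$(printf '%s\n' "$recent_errors" | grep -c .)

echo "errors since $cutoff: $recent_count"
printf '%s\n' "$recent_errors"
rm -f "$log"
```

The filtered lines could then be fed to the same `sort | uniq -c` stages in place of the unfiltered `grep 'ERROR'` output.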

    #!/usr/bin/env bash
    # Production incident log analyser
    # Usage: ./analyse-logs.sh /var/log/app/app.log
    # Field numbers below assume lines of the form: date time level service message

    LOG_FILE="${1:-/var/log/app/app.log}"
    SINCE="${2:-1 hour ago}"   # shown in the header only; no time filtering is applied

    echo "=== Log Analysis: $LOG_FILE ==="
    echo "=== Period: since $SINCE ==="
    echo ""

    echo "--- Error Summary ---"
    # grep -c already prints 0 when nothing matches, so no fallback echo is needed
    echo "Errors: $(grep -c 'ERROR' "$LOG_FILE")"
    echo "Fatals: $(grep -c 'FATAL' "$LOG_FILE")"
    echo ""

    echo "--- Top 5 Error Types ---"
    grep 'ERROR' "$LOG_FILE" | \
      awk '{$1=$2=$3=""; print $0}' | \
      sed 's/^ *//' | \
      sort | uniq -c | sort -rn | head -5
    echo ""

    echo "--- Error Rate by Minute (last 10 minutes with errors) ---"
    grep 'ERROR' "$LOG_FILE" | \
      awk '{print $1"T"substr($2,1,5)}' | \
      sort | uniq -c | tail -10
    echo ""

    echo "--- Services with Most Errors ---"
    grep 'ERROR' "$LOG_FILE" | \
      awk '{print $4}' | \
      sort | uniq -c | sort -rn | head -5
    echo ""

    echo "--- Slowest API Calls (>1000ms) ---"
    # Assumes lines containing duration_ms are JSON objects; jq errors are hidden
    grep 'duration_ms' "$LOG_FILE" | \
      jq -r 'select(.duration_ms > 1000) | "\(.duration_ms)ms \(.path)"' 2>/dev/null | \
      sort -rn | head -10

Production Best Practices and Common Pitfalls
| Task | Slow Approach | Fast Approach |
|---|---|---|
| Find error lines | `cat log \| grep ERROR` | `grep ERROR log` |
| Count unique IPs | Loop and count manually | `sort ips.txt \| uniq -c` |
| Find large files | Browse directories | `du -sh /* \| sort -rh \| head` |
| Extract JSON field | String splitting | `jq '.fieldname'` |
| Remove duplicate lines | Manual comparison | `sort file \| uniq` |
Quick Reference and Troubleshooting Commands
| Task | Command |
|---|---|
| Filter log | `grep -E 'ERROR\|FATAL' logfile` |
| Extract field | `awk '{print $3}' logfile` |
| Custom delimiter | `awk -F: '{print $1}' /etc/passwd` |
| Replace text | `sed 's/old/new/g' file` |
| In-place edit | `sed -i.bak 's/old/new/g' file` |
| Frequency count | `sort file \| uniq -c \| sort -rn` |
| JSON field | `jq '.fieldname' response.json` |
| Filter JSON array | `jq '.[] \| select(.status == "down")'` |
**Tip:** Avoid the `cat file | grep pattern` antipattern (Useless Use of Cat). `grep pattern file` is faster and more direct because the extra `cat` process is unnecessary. Similarly, `grep pattern file | awk '{...}'` can often be `awk '/pattern/{...}' file` in a single process.
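A quick way to convince yourself that the single-process form is equivalent is to run both on the same input and compare. The sample lines below are invented for the comparison.

```shell
# Hypothetical three-line log
log=$(mktemp)
printf '%s\n' \
  '10:00:01 api ERROR timeout' \
  '10:00:02 api INFO ok' \
  '10:00:03 db ERROR refused' > "$log"

# Two processes: grep filters, awk extracts field 2
two_procs=$(grep 'ERROR' "$log" | awk '{print $2}')

# One process: awk filters and extracts
one_proc=$(awk '/ERROR/ {print $2}' "$log")

if [ "$two_procs" = "$one_proc" ]; then
    echo "identical output"
fi
rm -f "$log"
```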
**Remember:** `uniq` only removes consecutive duplicate lines. If duplicates are scattered throughout a file, `uniq` alone will not catch them. Always sort first: `sort file | uniq -c | sort -rn` gives frequency analysis of all duplicates regardless of position.
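The effect is easy to reproduce with a short list in which the same value appears in several places. The addresses below are invented sample data.

```shell
# Hypothetical list of client IPs with scattered duplicates
ips=$(mktemp)
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.1 10.0.0.3 10.0.0.2 > "$ips"

# uniq alone collapses only the two adjacent 10.0.0.1 lines
without_sort=$(uniq "$ips" | wc -l | tr -d ' ')

# Sorting first brings all duplicates together
with_sort=$(sort "$ips" | uniq | wc -l | tr -d ' ')

# Frequency analysis: most common value and its count
top=$(sort "$ips" | uniq -c | sort -rn | head -n 1 | awk '{print $2, $1}')

echo "uniq alone: $without_sort lines, sort | uniq: $with_sort lines, top: $top"
rm -f "$ips"
```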
**Security:** Avoid parsing `/etc/passwd` or other security-sensitive files with regex tools in scripts that process user input. If a username containing special characters is interpolated into an awk or sed pattern, it can change what the pattern matches. Use dedicated tools (`getent passwd username`) for user lookups in security-sensitive contexts.
**Common Mistake:** Using `sed -i` without first testing the expression. Always run `sed 's/old/new/g' file` without `-i` first to preview the output. One wrong regex on a production config file can break a service until the backup is restored.
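One possible preview-then-apply workflow is sketched below on a throwaway file: preview the result, look at the exact lines that would change, and only then edit in place with a backup suffix. The file contents are invented for the demonstration.

```shell
# Hypothetical config file
conf=$(mktemp)
printf '%s\n' 'debug: true' 'port: 8080' > "$conf"

# 1. Preview: without -i, sed writes to stdout and leaves the file untouched
preview=$(sed 's/debug: true/debug: false/' "$conf")

# 2. Show exactly which lines would change (diff exits 1 when inputs differ)
changes=$(printf '%s\n' "$preview" | diff "$conf" - || true)
printf '%s\n' "$changes"

# 3. Apply in place, keeping the original as "$conf.bak"
sed -i.bak 's/debug: true/debug: false/' "$conf"

after=$(head -n 1 "$conf")
backup=$(head -n 1 "$conf.bak")
echo "now: $after / backup: $backup"

rm -f "$conf" "$conf.bak"
```

If the edit turns out to be wrong, copying the `.bak` file back over the original restores the previous state.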