Overview and What You Will Learn
systemd is PID 1 on nearly every modern Linux server. It starts services at boot, manages their lifecycle, restarts them on failure, and collects their logs. If a payment service at Razorpay crashes at 3 AM and its unit sets Restart=on-failure with RestartSec=5, systemd restarts it about 5 seconds later, often before any alert fires.
By the end of this lab you will:
- Confidently use all essential `systemctl` commands for production service management
- Read and interpret `systemctl status` output to diagnose failures
- Write a complete systemd unit file for a Node.js application from scratch
- Use override files to customise package-installed unit files without editing them
- Query, filter, and follow logs with `journalctl`
- Understand systemd targets and the boot sequence
Why This Matters in Production
Before systemd, starting a service on reboot required writing brittle shell scripts. Restart-on-failure required monit or supervisor. Log collection was scattered across files. systemd unified all of this into one system — but only engineers who understand it can harness it.
When a Swiggy microservice fails, the first command is `systemctl status payment-api`. It shows the current state, the last 10 log lines, the exit code, and whether systemd is retrying. That 5-second status check contains everything needed to decide the next action.
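The same facts that `systemctl status` prints for a human can also be read in machine-readable form with `systemctl show`. The sketch below is one way to do that, not a prescribed method. It assumes a reasonably recent systemd (older releases do not expose the `NRestarts` property), and the helper names `svc_summary` and `svc_field` are illustrative, not part of systemd.

```shell
# svc_summary UNIT: print key state fields as KEY=value lines.
# Helper names are hypothetical; only the systemctl call is real.
svc_summary() {
    systemctl show "$1" \
        --property=ActiveState \
        --property=SubState \
        --property=Result \
        --property=ExecMainStatus \
        --property=NRestarts
}

# svc_field UNIT KEY: extract a single value from the summary.
svc_field() {
    svc_summary "$1" | sed -n "s/^$2=//p"
}
```

For example, `svc_field payment-api ActiveState` prints only the state word, which is convenient inside an alerting or deployment script.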
Core Principles
systemd boot sequence:

    +------------------------------------------+
    | Kernel loads, starts PID 1: systemd      |
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | systemd reads unit files from:           |
    |   /usr/lib/systemd/system/  (packages)   |
    |   /etc/systemd/system/      (admin)      |
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | Dependency resolution                    |
    | After=, Requires=, Wants= evaluated      |
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | Services start in parallel               |
    | (where dependencies allow)               |
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | multi-user.target reached                |
    | Server is ready for SSH and workloads    |
    +------------------------------------------+

Unit file anatomy (the three sections):

    +------------------------------------------+
    | [Unit]                                   |
    | Description, After=, Requires=, Wants=   |
    +------------------------------------------+
    | [Service]                                |
    | User=, ExecStart=, Restart=, Environment=|
    +------------------------------------------+
    | [Install]                                |
    | WantedBy=multi-user.target               |
    +------------------------------------------+

Detailed Step-by-Step Practical Lab
Milestone 1 — Essential systemctl commands

    ## Start a service
    sudo systemctl start nginx

    ## Stop a service
    sudo systemctl stop nginx

    ## Restart (stop then start)
    sudo systemctl restart nginx

    ## Reload config without restarting (runs the unit's ExecReload=; for nginx this sends SIGHUP)
    sudo systemctl reload nginx

    ## Enable at boot (creates symlink in target wants directory)
    sudo systemctl enable nginx

    ## Enable AND start immediately (most common combination)
    sudo systemctl enable --now nginx

    ## Disable from starting at boot
    sudo systemctl disable nginx

    ## Check current status
    systemctl status nginx

    ## Check if active (exit 0 = yes, non-zero = no; usually 3 when inactive)
    systemctl is-active nginx

    ## Check if enabled (exit 0 = yes, exit 1 = no)
    systemctl is-enabled nginx

    ## Reload systemd after editing any unit file
    sudo systemctl daemon-reload

Milestone 2 — Read systemctl status output

    systemctl status nginx
    nginx.service - A high performance web server and a reverse proxy server
         Loaded: loaded (/lib/systemd/system/nginx.service; enabled; preset: enabled)
                         ^file location                     ^starts at boot
         Active: active (running) since Mon 2024-01-15 10:00:00 UTC; 2h 15m ago
                 ^state                 ^when it started             ^how long running
        Process: 1098 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=0/SUCCESS)
       Main PID: 1100 (nginx)
          Tasks: 5 (limit: 4915)
         Memory: 12.4M
         CGroup: /system.slice/nginx.service
                 +-1100 nginx: master process /usr/sbin/nginx -g daemon off;
                 +-1101 nginx: worker process
                 +-1102 nginx: worker process

    Jan 15 10:00:00 mumbai-prod-node-1 systemd[1]: Starting nginx...
    Jan 15 10:00:00 mumbai-prod-node-1 nginx[1098]: nginx: configuration test is OK
    Jan 15 10:00:00 mumbai-prod-node-1 systemd[1]: Started nginx.

    ## When a service fails, status shows the exit code and signal
    systemctl status payment-api
    ## Active: failed (Result: exit-code) since ...
    ## Process: 2341 ExecStart=... (code=exited, status=1/FAILURE)

    ## List all failed services immediately
    systemctl list-units --type=service --state=failed

    ## List all services with their state
    systemctl list-units --type=service

    ## Show services that start at boot
    systemctl list-unit-files --type=service --state=enabled

Milestone 3 — Write a unit file for a Node.js application

    ## Create the unit file
    sudo tee /etc/systemd/system/payment-api.service << 'EOF'
    [Unit]
    Description=Payment API Service
    Documentation=https://internal.razorpay.in/docs/payment-api
    # Order this unit after the network stack and PostgreSQL have been started
    After=network.target postgresql.service
    # Pull in PostgreSQL; if it cannot start or is stopped, this service is stopped too
    Requires=postgresql.service
    # Max 3 starts in 60 seconds before giving up
    # (on current systemd these rate-limit settings belong in [Unit])
    StartLimitIntervalSec=60
    StartLimitBurst=3

    [Service]
    # Run as dedicated service account -- never root
    User=payment-svc
    Group=payment-svc

    # Working directory for the process
    WorkingDirectory=/opt/payment-api

    # Load environment from file (keeps secrets out of unit file)
    EnvironmentFile=/etc/payment-api/env
    # Individual environment variables
    Environment=NODE_ENV=production
    Environment=PORT=4000

    # The command to start the service
    ExecStart=/usr/bin/node /opt/payment-api/index.js

    # Command to reload config without restart (if app supports it)
    ExecReload=/bin/kill -HUP $MAINPID

    # Restart policy: restart if it exits with non-zero or signal
    Restart=on-failure
    RestartSec=5

    # File descriptor limit for high-traffic service
    LimitNOFILE=65536

    # Service type: simple = ExecStart IS the main process
    Type=simple

    [Install]
    WantedBy=multi-user.target
    EOF

    ## Load the new unit file
    sudo systemctl daemon-reload

    ## Enable and start
    sudo systemctl enable --now payment-api

    ## Verify it is running
    systemctl status payment-api

Milestone 4 — Override package unit files without editing them
Do not edit files in /usr/lib/systemd/system/ directly, because package upgrades overwrite them. Put your changes in override (drop-in) files instead.

    ## Method 1: systemctl edit (recommended -- opens editor, creates override)
    sudo systemctl edit nginx

    ## This creates: /etc/systemd/system/nginx.service.d/override.conf
    ## Type your overrides:
    [Service]
    # Increase file descriptor limit
    LimitNOFILE=65536
    # Always restart, not just on failure
    Restart=always
    RestartSec=3

    ## Method 2: Manual override file
    sudo mkdir -p /etc/systemd/system/nginx.service.d/
    sudo tee /etc/systemd/system/nginx.service.d/limits.conf << 'EOF'
    [Service]
    LimitNOFILE=65536
    EOF

    ## Reload and restart to apply
    sudo systemctl daemon-reload
    sudo systemctl restart nginx

    ## Verify the override is applied
    systemctl show nginx -p LimitNOFILE
    ## LimitNOFILE=65536

    ## Show the full effective unit file (base + all overrides)
    systemctl cat nginx

Milestone 5 — Query logs with journalctl

    ## Logs for a specific service
    journalctl -u payment-api

    ## Follow logs live (like tail -f)
    journalctl -u payment-api -f

    ## Last 50 lines
    journalctl -u payment-api -n 50

    ## Since a specific time
    journalctl -u payment-api --since "1 hour ago"
    journalctl -u payment-api --since "2024-01-15 10:00" --until "2024-01-15 11:00"

    ## Only errors and above (emerg, alert, crit, err)
    journalctl -u payment-api -p err

    ## Only since last boot
    journalctl -u payment-api -b

    ## Without the pager (useful in scripts)
    journalctl -u payment-api --no-pager | grep "ERROR" | tail -20

    ## All logs from a specific PID
    journalctl _PID=1350

    ## Show disk usage of journal
    journalctl --disk-usage

    ## Vacuum old journal logs (keep last 2 weeks)
    sudo journalctl --vacuum-time=2weeks

Milestone 6 — Understand targets and the boot sequence

    ## Show the default boot target (runlevel equivalent)
    systemctl get-default
    ## multi-user.target

    ## Common targets:
    ## multi-user.target  -- standard server mode, no GUI
    ## graphical.target   -- with GUI (not used on servers)
    ## rescue.target      -- single-user recovery mode
    ## emergency.target   -- minimal emergency shell

    ## List all targets
    systemctl list-units --type=target

    ## Change default target
    sudo systemctl set-default multi-user.target

    ## Analyse boot time -- find slow services
    systemd-analyze
    ## Startup finished in 1.234s (kernel) + 8.567s (userspace) = 9.801s

    ## Show which services are slowest
    systemd-analyze blame | head -15

    ## Show critical path of boot
    systemd-analyze critical-chain

Production Best Practices and Common Pitfalls
| Scenario | Wrong | Correct |
|---|---|---|
| Service config changed | Edit /usr/lib/systemd/system/ file | Create /etc/systemd/system/name.service.d/override.conf |
| Service not restarting | Restart server | Check systemctl status, read journalctl |
| Secrets in unit file | Environment=DB_PASS=secret | EnvironmentFile=/etc/service/env (with 640 permissions) |
| Unit file change has no effect | Edit file and restart | Edit file, daemon-reload, then restart |
| Debug a failing start | Guess | journalctl -u service -n 50 immediately |
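
The first row of the table can be scripted. The following is a minimal sketch of a drop-in writer, assuming it runs as root when it targets the real /etc/systemd/system directory. The function name `write_dropin` and its optional third argument (an alternative base directory for dry runs) are illustrative, not part of systemd.

```shell
# write_dropin UNIT NAME [BASE_DIR]
# Writes stdin to BASE_DIR/UNIT.d/NAME.conf.
# BASE_DIR defaults to /etc/systemd/system; pass a scratch path to dry-run.
write_dropin() {
    dropin_dir="${3:-/etc/systemd/system}/$1.d"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/$2.conf"
}
```

Piping `printf '[Service]\nLimitNOFILE=65536\n'` into `write_dropin nginx.service limits` would produce the same file as Method 2 in Milestone 4. The change still needs `systemctl daemon-reload` and a restart before it takes effect.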
Quick Reference and Troubleshooting Commands
| Task | Command |
|---|---|
| Start service | sudo systemctl start name |
| Enable + start | sudo systemctl enable --now name |
| Check status | systemctl status name |
| View logs live | journalctl -u name -f |
| View errors only | journalctl -u name -p err |
| After editing unit | sudo systemctl daemon-reload |
| List failed | systemctl list-units --state=failed |
| Boot time analysis | systemd-analyze blame |
| Validate unit file | systemd-analyze verify /etc/systemd/system/name.service |
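
Several of these commands combine naturally into a first-pass triage. The sketch below lists failed services and prints the most recent journal lines for each. It assumes the `--plain` and `--no-legend` options of `systemctl list-units` are available, and the function names are hypothetical.

```shell
# failed_units: print the names of failed service units, one per line.
failed_units() {
    systemctl list-units --type=service --state=failed --plain --no-legend |
        awk '{ print $1 }'
}

# triage_failed [LINES]: show the last LINES (default 20) journal lines
# for every failed service.
triage_failed() {
    failed_units | while read -r unit; do
        echo "== $unit =="
        journalctl -u "$unit" -n "${1:-20}" --no-pager
    done
}
```

Running `triage_failed 50` gives a quick overview after an incident. It is a starting point for investigation, not a replacement for reading the full logs.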
**Tip:** `systemd-analyze verify /etc/systemd/system/payment-api.service` catches unit file syntax errors before you try to start the service. Run it every time you write or edit a unit file.
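
One way to make that habit automatic is to gate the restart on the verify step. This is a sketch under the assumption that `systemd-analyze verify` exits non-zero for the problems you care about (some warnings may not change the exit code). The function name `apply_unit` is illustrative.

```shell
# apply_unit UNIT_FILE: verify the file first; only if that succeeds,
# reload systemd and restart the unit named after the file.
apply_unit() {
    unit_name=$(basename "$1")
    systemd-analyze verify "$1" || {
        echo "verify failed for $1; not restarting $unit_name" >&2
        return 1
    }
    systemctl daemon-reload && systemctl restart "$unit_name"
}
```

Run as root, `apply_unit /etc/systemd/system/payment-api.service` leaves the running service untouched whenever verification fails.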
**Remember:** After editing any systemd unit file, always run `sudo systemctl daemon-reload` before starting or restarting the service. Without this, systemd runs the old cached version of the unit file and your changes have no effect.
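
systemd can also report whether a reload is pending. The sketch below reads the `NeedDaemonReload` property, assuming a systemd version that supports the `--value` option of `systemctl show`. The function name `needs_reload` is hypothetical.

```shell
# needs_reload UNIT: exit 0 if systemd reports that the unit's files
# changed on disk since they were last loaded, non-zero otherwise.
needs_reload() {
    [ "$(systemctl show "$1" --property=NeedDaemonReload --value)" = "yes" ]
}
```

A deployment script might use it as `needs_reload payment-api && sudo systemctl daemon-reload`.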
**Common Mistake:** Using `systemctl restart` when `systemctl reload` is available. Restart stops the service, dropping active connections, and starts it fresh. Reload runs the unit's `ExecReload=` command (typically sending SIGHUP), which lets the service re-read its config without dropping active connections. For nginx and many other services, reload is the zero-downtime option.
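
Whether a unit supports reload can be checked before choosing. The sketch below makes that decision explicit by reading the `CanReload` property; systemctl also ships a built-in `reload-or-restart` verb that behaves similarly. The function name is illustrative and chosen to avoid clashing with the built-in verb.

```shell
# reload_if_possible UNIT: reload when the unit defines a reload action,
# otherwise fall back to a full restart.
reload_if_possible() {
    if [ "$(systemctl show "$1" --property=CanReload --value)" = "yes" ]; then
        systemctl reload "$1"
    else
        systemctl restart "$1"
    fi
}
```

For nginx specifically, running `nginx -t` before the reload (as the ExecStartPre line in Milestone 2 does) catches a broken config before it is applied.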
**Security:** Never put secrets directly in unit files. The `Environment=` directive is visible to any user who runs `systemctl cat servicename`. Always use `EnvironmentFile=` pointing to a file with `chmod 640` and `chown root:service-group` permissions.
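
A small check can confirm that an environment file is not world-readable before a service is started. This is a sketch assuming GNU `stat` (standard on Linux distributions that run systemd). The function name `env_file_is_private` is hypothetical.

```shell
# env_file_is_private FILE: succeed only if FILE grants no permissions
# to "other" users (for example mode 640 or 600).
env_file_is_private() {
    mode=$(stat -c '%a' "$1") || return 1
    case "$mode" in
        *0) return 0 ;;
        *)  echo "$1 has mode $mode; expected no permissions for others" >&2
            return 1 ;;
    esac
}
```

For example, `env_file_is_private /etc/payment-api/env || exit 1` near the top of a deploy script stops the rollout if the file permissions have drifted.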