PromQL — Querying Your Kubernetes Metrics
What is PromQL in Simple Terms?
PromQL is the language you use to ask questions of Prometheus. Questions like: what is the average CPU usage of all payment-api pods over the last 5 minutes? Which pods have restarted more than 3 times in the last hour? What percentage of requests are returning 5xx errors right now?
Every Prometheus-backed panel in a Grafana dashboard, every alerting rule that Prometheus evaluates and forwards to Alertmanager, and every graph you see in the Prometheus web UI is powered by a PromQL query. Understanding PromQL is the difference between having monitoring and understanding what your monitoring is telling you.
The Core Data Types in PromQL
    +------------------------------------------+
    | Instant Vector                           |
    | A set of time series with ONE value each |
    | at a specific point in time              |
    |                                          |
    | Example: http_requests_total             |
    | Returns: current counter value per pod   |
    +------------------------------------------+

    +------------------------------------------+
    | Range Vector                             |
    | A set of time series with a RANGE of     |
    | values over a time window                |
    |                                          |
    | Example: http_requests_total[5m]         |
    | Returns: all values in last 5 minutes    |
    | Used with: rate(), increase(), delta()   |
    +------------------------------------------+

    +------------------------------------------+
    | Scalar                                   |
    | A single number with no labels           |
    |                                          |
    | Example: 3.14                            |
    | Used in: mathematical expressions        |
    +------------------------------------------+

Building PromQL Queries: Step by Step
    # Step 1: Select a metric by name
    http_requests_total
    # Returns: all time series with this metric name
    # Includes ALL labels: namespace, pod, status, method, etc.

    # Step 2: Filter with label matchers
    http_requests_total{namespace="production", service="payment-api"}
    # Returns: only payment-api time series in the production namespace

    # Label matcher operators:
    #   =   exact match
    #   !=  not equal
    #   =~  regex match     (e.g. status=~"5.." matches 500, 501, 502...)
    #   !~  regex not match (e.g. status!~"2.." excludes 200, 201, 202...)

    # Step 3: Apply a function
    rate(http_requests_total{namespace="production"}[5m])
    # rate() calculates roughly: (value now - value 5m ago) / 300 seconds,
    # corrected for counter resets
    # Result: requests per SECOND averaged over the last 5 minutes
    # This is the correct way to query counters

    # Step 4: Aggregate across labels
    sum(rate(http_requests_total{namespace="production"}[5m]))
    # sum() adds up the per-second rate across all pods
    # Result: total requests per second for all production pods combined

    # Step 5: Group by label
    sum by (pod) (rate(http_requests_total{namespace="production"}[5m]))
    # Result: requests per second broken down per individual pod
    # Shows which pods are handling more or less traffic

The Functions You Will Use Most
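To make the arithmetic behind `rate()` concrete, here is a simplified Python sketch of what it computes for a single time series. It is an illustration rather than Prometheus's actual implementation: it accounts for counter resets but skips the extrapolation to the window boundaries that the real function applies. The helper name `simple_rate` and the sample values are made up for this example.

```python
def simple_rate(samples):
    """Approximate PromQL rate() for one series (hypothetical helper).

    samples: list of (timestamp_seconds, counter_value), oldest first,
    all taken from inside the range window.
    """
    if len(samples) < 2:
        return None  # rate() needs at least two samples in the window

    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter went down: treat it as a reset (e.g. a pod restart),
            # so the whole new value counts as increase.
            increase += value
        else:
            increase += value - previous
        previous = value

    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Made-up samples, one every 60 s across a 5-minute window.
steady = [(0, 1000), (60, 1600), (120, 2200), (180, 2800), (240, 3400), (300, 4000)]
print(simple_rate(steady))      # 10.0 requests per second

# Same traffic, but the counter reset to zero after t=120.
with_reset = [(0, 1000), (60, 1600), (120, 2200), (180, 600), (240, 1200), (300, 1800)]
print(simple_rate(with_reset))  # 10.0, the reset does not yield a negative rate
```

This is why `rate()` is safe on counters that restart with the pod, while subtracting raw counter values by hand is not.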
    # rate(): per-second rate of a counter over a time window
    # Use for: request rates, error rates, bytes transferred
    rate(http_requests_total[5m])

    # increase(): total increase of a counter over a time window
    # Use for: total events (restarts, errors) in a period
    increase(kube_pod_container_status_restarts_total[1h])
    # How many times did this pod restart in the last hour?

    # irate(): per-second rate using only the last two data points
    # Use for: real-time spikes (more sensitive, more noisy)
    irate(http_requests_total[5m])

    # avg_over_time(): average of a gauge over a time window
    # Use for: average memory usage over the last 30 minutes
    avg_over_time(container_memory_usage_bytes{pod="payment-api-xyz"}[30m])

    # topk(): top N time series by value
    # Use for: which pods are using the most memory?
    topk(5, container_memory_usage_bytes{namespace="production"})

    # histogram_quantile(): calculate percentile from histogram
    # Use for: P99 latency, P95 request duration
    histogram_quantile(0.99,
      sum by (le) (
        rate(http_request_duration_seconds_bucket{service="payment-api"}[5m])
      )
    )
    # Returns: 99th percentile latency for payment-api over last 5 minutes

Production-Ready PromQL Queries for Kubernetes
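The `histogram_quantile()` queries in this guide return an estimate, not an exact percentile: Prometheus only knows how many observations fell into each bucket, so it interpolates linearly inside the bucket that contains the requested rank. The Python sketch below mimics that idea for classic histograms under simplifying assumptions (0 < q <= 1, a lowest bucket starting at 0). The helper name and bucket values are hypothetical.

```python
import math


def simple_histogram_quantile(q, buckets):
    """Approximate PromQL histogram_quantile() (hypothetical helper).

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    ending with the (math.inf, total) bucket, like the `le` label series.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total

    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls into the +Inf bucket: the best available
                # answer is the highest finite bucket boundary.
                return lower_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count


# Made-up cumulative bucket counts for request durations in seconds.
buckets = [(0.25, 20), (0.5, 60), (1.0, 90), (2.0, 98), (math.inf, 100)]

print(simple_histogram_quantile(0.50, buckets))  # 0.4375
print(simple_histogram_quantile(0.95, buckets))  # 1.625
print(simple_histogram_quantile(0.99, buckets))  # 2.0 (capped at the top finite bucket)
```

The last result shows why bucket boundaries matter: if the P99 lands in the `+Inf` bucket, the reported latency is only a lower bound, and the histogram needs larger buckets to measure it.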
    # -- Pod Health Queries -----------------------------

    # Pods that are NOT running (stuck Pending, Failed, Unknown)
    # An explicit phase list avoids matching completed (Succeeded) pods
    kube_pod_status_phase{phase=~"Pending|Failed|Unknown", namespace="production"} == 1

    # Pods with high restart count (flapping or CrashLoopBackOff)
    increase(kube_pod_container_status_restarts_total{
      namespace="production"
    }[30m]) > 3
    # Restarts more than 3 times in the last 30 minutes

    # Pods that are not ready (for example, failing readiness probes)
    kube_pod_status_ready{condition="false", namespace="production"} == 1

    # -- Resource Usage Queries -------------------------

    # CPU usage as percentage of request (shows over-provisioning)
    # container!="" drops the pod-level cgroup series so usage is not double counted
    sum by (pod) (
      rate(container_cpu_usage_seconds_total{namespace="production", container!=""}[5m])
    ) /
    sum by (pod) (
      kube_pod_container_resource_requests{
        resource="cpu", namespace="production"
      }
    ) * 100

    # Memory usage as percentage of limit (shows pods near OOMKill)
    # "> 0" skips containers with no memory limit, which would divide by zero
    container_memory_working_set_bytes{namespace="production", container!=""} /
    (container_spec_memory_limit_bytes{namespace="production", container!=""} > 0) * 100
    # Alert when this exceeds 85%

    # Nodes approaching memory capacity
    (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) /
    node_memory_MemTotal_bytes * 100 > 85

    # -- Traffic and Error Queries ---------------------

    # HTTP error rate percentage per service
    sum by (service) (
      rate(http_requests_total{status=~"5..", namespace="production"}[5m])
    ) /
    sum by (service) (
      rate(http_requests_total{namespace="production"}[5m])
    ) * 100

    # P99 API latency per service
    histogram_quantile(0.99,
      sum by (service, le) (
        rate(http_request_duration_seconds_bucket{namespace="production"}[5m])
      )
    ) * 1000
    # Result in milliseconds

    # -- Scaling Queries -------------------------------

    # HPA current replicas as a fraction of max replicas (shows scaling headroom)
    kube_horizontalpodautoscaler_status_current_replicas /
    kube_horizontalpodautoscaler_spec_max_replicas
    # Values close to 1.0 mean the HPA is near its ceiling

    # Pending pods (not enough capacity, so the Cluster Autoscaler should kick in)
    kube_pod_status_phase{phase="Pending"} == 1

Common PromQL Mistakes to Avoid
    # WRONG: querying a counter directly (shows total since start, not rate)
    http_requests_total{service="payment-api"}
    # Returns: 47,832,091 (meaningless cumulative count)

    # CORRECT: use rate() to get per-second throughput
    rate(http_requests_total{service="payment-api"}[5m])
    # Returns: 245.3 (requests per second over last 5 minutes)

    # WRONG: using irate() for alert rules (too noisy)
    irate(http_requests_total[5m]) > 500
    # irate() spikes on single data point anomalies, so the alert fires and resolves constantly

    # CORRECT: use rate() for alert rules (smoothed average)
    rate(http_requests_total[5m]) > 500
    # rate() averages over the full window, giving stable, reliable alerting

    # WRONG: forgetting the by() clause when summing across pods
    sum(container_memory_usage_bytes{namespace="production"})
    # Returns: one number (total memory for all pods combined)
    # You lose which pod is causing the problem

    # CORRECT: add by (pod) to keep pod-level granularity
    sum by (pod) (container_memory_usage_bytes{namespace="production"})
    # Returns: memory per pod, so you can see which pod is the problem

Testing PromQL Queries
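Queries can also be tested from a script through the Prometheus HTTP API, which exposes instant queries at `/api/v1/query`. The Python sketch below uses only the standard library and assumes Prometheus is reachable at `http://localhost:9090` (for instance through a `kubectl port-forward`). The helper names are hypothetical, and the network call is left commented out so the block runs without a live server.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_instant_query(base_url, promql, timeout=10):
    """Run the query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


query = 'sum by (pod) (rate(http_requests_total{namespace="production"}[5m]))'
print(build_query_url("http://localhost:9090", query))

# With a reachable Prometheus (assumed address), each result of an instant
# vector query carries its labels and a [timestamp, "value"] pair:
# for series in run_instant_query("http://localhost:9090", query):
#     print(series["metric"], series["value"][1])
```

URL-encoding the expression matters because PromQL is full of characters such as `{`, `"` and `[` that are not valid in a raw query string.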
    # Access the Prometheus web UI for interactive query testing
    kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring
    # Open http://localhost:9090
    # Use the Graph tab to test queries and visualise results

    # Query Prometheus from the command line
    kubectl exec -it prometheus-kube-prometheus-stack-prometheus-0 \
      -n monitoring -- \
      promtool query instant http://localhost:9090 \
      'rate(http_requests_total{namespace="production"}[5m])'

**Tip:** Use `rate()` over a 5-minute window as your default for dashboard panels; it gives a smooth curve without too much lag. For alerting rules, combine `rate()` over 5 minutes with a `for: 2m` clause so the alert fires only if the condition holds continuously for 2 minutes, which prevents false alerts from single-point metric spikes.
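To see why a `for: 2m` clause suppresses short spikes, the following Python sketch models how an alert moves from pending to firing. It is a simplified illustration: the 30-second evaluation interval, the threshold, the sample values, and the function name are all assumptions, and real rules track this state separately for every label set.

```python
def alert_states(values, threshold, for_seconds, interval_seconds):
    """Return the alert state at each evaluation: inactive, pending, or firing."""
    states = []
    active_since = None  # evaluation time at which the condition first held
    for i, value in enumerate(values):
        now = i * interval_seconds
        if value > threshold:
            if active_since is None:
                active_since = now
            held_for = now - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # condition broke, so the timer resets
            states.append("inactive")
    return states


# Made-up request rates, evaluated every 30 s against "> 500" with for: 2m.
rates = [100, 900, 100, 600, 600, 600, 600, 600, 600]
for state in alert_states(rates, threshold=500, for_seconds=120, interval_seconds=30):
    print(state)
# inactive, pending, inactive, pending, pending, pending, pending, firing, firing
```

The single spike to 900 only reaches the pending state and never notifies anyone, while the sustained run above 500 fires once it has held for the full two minutes.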
**Remember:** Kubernetes metrics in Prometheus carry many labels: namespace, pod, container, node, service. Filter by namespace in every query (`namespace="production"`); otherwise metrics from all namespaces are mixed together, which makes graphs unreadable and alert thresholds meaningless.
**Common Mistake:** Alerting on memory pressure with `container_memory_usage_bytes`. This metric includes page cache that the kernel can reclaim when needed, so it overstates real memory pressure. Use `container_memory_working_set_bytes` instead: it excludes inactive file cache, it is the value the kubelet uses for eviction decisions, and it is a much closer indicator of how near a container is to being OOMKilled.
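A small worked example shows how far apart the two metrics can be. The Python sketch below assumes the working set is derived as usage minus inactive file cache, floored at zero, which is how cAdvisor is commonly described as computing it. The container numbers are invented for illustration.

```python
MIB = 1024 * 1024


def working_set_bytes(usage_bytes, inactive_file_bytes):
    """Working set: usage minus reclaimable inactive file cache, never below 0."""
    return max(usage_bytes - inactive_file_bytes, 0)


# Hypothetical container: 1 GiB limit, 950 MiB usage, 400 MiB of it inactive cache.
limit = 1024 * MIB
usage = 950 * MIB
inactive_file = 400 * MIB

usage_pct = usage / limit * 100
working_set_pct = working_set_bytes(usage, inactive_file) / limit * 100

print(round(usage_pct, 1))        # 92.8, would trip an 85% alert
print(round(working_set_pct, 1))  # 53.7, the container is not under real pressure
```

An 85% threshold on the usage metric would page for this container even though almost half of its memory can be reclaimed.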
**Security:** Never embed Prometheus or Grafana access tokens directly in PromQL queries or in dashboard JSON that gets committed to Git. Use Grafana data source variables for credentials, and store sensitive values in Kubernetes Secrets referenced by the Grafana deployment rather than hardcoding them in dashboard configuration files.