Prometheus — The Metrics Engine of Every Kubernetes Cluster
What is Prometheus in Simple Terms?
Before Prometheus, monitoring usually meant applications pushing metrics to a central server. Prometheus flipped this: it pulls (scrapes) metrics from your applications on a schedule. At a configured interval (15 seconds in the examples here), Prometheus visits every configured target, fetches its /metrics endpoint over HTTP, and stores the numbers in its time-series database.
At Zerodha, Prometheus scrapes 500+ pods every 15 seconds — collecting CPU usage, request latency, order processing rates, and database query times. This data powers the Grafana dashboards that SREs watch during market hours and the alerts that wake engineers up when error rates spike.
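To make the pull model concrete: what Prometheus receives from each scrape is plain text, one sample per line, in the form `name{label="value"} number`. The sketch below parses that text into objects. It is a simplified illustration, not Prometheus's actual parser, and it assumes plain numeric values, no timestamps, and no escaped quotes inside label values; the function name `parseExposition` is made up for this example.

```javascript
// Simplified parser for Prometheus text exposition lines.
// Assumptions: no timestamps, plain numeric values, no escaped quotes in label values.
function parseExposition(text) {
  const samples = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (line === '' || line.startsWith('#')) continue;

    // metric name, optional {labels}, then the value
    const match = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$/);
    if (!match) continue;

    const [, name, labelText, valueText] = match;
    const labels = {};
    if (labelText) {
      // Collect every key="value" pair inside the braces.
      for (const pair of labelText.matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
        labels[pair[1]] = pair[2];
      }
    }
    samples.push({ name, labels, value: Number(valueText) });
  }
  return samples;
}

const scraped = [
  '# TYPE http_requests_total counter',
  'http_requests_total{method="GET", status="200"} 47832',
  'process_memory_bytes 268435456',
].join('\n');

console.log(parseExposition(scraped));
```

Each distinct combination of metric name and label values becomes its own time series once stored, which is why label choice matters later on.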
How Prometheus Works — The Pull Model
    +------------------------------------------+
    |      Prometheus (every 15 seconds)       |
    +------------------------------------------+
         |               |                |
         | HTTP GET      | HTTP GET       | HTTP GET
         | /metrics      | /metrics       | /metrics
         v               v                v
    +----------+    +----------+    +-------------------+
    |  Pod A   |    |  Pod B   |    |  Node Exporter    |
    | payment  |    |  order   |    |  (on each node)   |
    |  :8080   |    |  :8080   |    |  :9100            |
    | /metrics |    | /metrics |    |  /metrics         |
    +----------+    +----------+    +-------------------+

    Each /metrics endpoint returns:
    http_requests_total{method="GET", status="200"} 47832
    http_requests_total{method="POST", status="500"} 12
    process_memory_bytes 268435456

Installing Prometheus on Kubernetes
The standard installation is through the kube-prometheus-stack Helm chart, which installs the Prometheus Operator together with Prometheus, Alertmanager, Grafana, kube-state-metrics, and Node Exporter:
    helm repo add prometheus-community \
      https://prometheus-community.github.io/helm-charts
    helm repo update

    kubectl create namespace monitoring

    # storageClassName=gp3 assumes a StorageClass with that name exists in your cluster
    helm install kube-prometheus-stack \
      prometheus-community/kube-prometheus-stack \
      --namespace monitoring \
      --set prometheus.prometheusSpec.retention=30d \
      --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=gp3 \
      --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=100Gi

    # Verify all components are running
    # (exact pod names and READY counts vary with the chart version)
    kubectl get pods -n monitoring
    # prometheus-kube-prometheus-stack-prometheus-0       2/2  Running
    # alertmanager-kube-prometheus-stack-alertmanager-0   2/2  Running
    # kube-prometheus-stack-grafana-xxx                   3/3  Running
    # kube-prometheus-stack-kube-state-metrics-xxx        1/1  Running
    # kube-prometheus-stack-node-exporter-xxx             1/1  Running  (one per node)

    # Access the Prometheus web UI
    kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring
    # Open http://localhost:9090

Key Components That Come With the Stack
    +------------------------------------------+
    | kube-state-metrics                       |
    | Exposes Kubernetes object state as       |
    | metrics: pod counts, deployment health,  |
    | node conditions, PVC status              |
    | Target: /metrics on port 8080            |
    +------------------------------------------+

    +------------------------------------------+
    | Node Exporter (DaemonSet)                |
    | Exposes host-level metrics from every    |
    | node: CPU, memory, disk I/O, network,    |
    | filesystem usage                         |
    | Target: /metrics on port 9100 per node   |
    +------------------------------------------+

    +------------------------------------------+
    | cAdvisor (built into kubelet)            |
    | Exposes container-level metrics:         |
    | container CPU, memory, filesystem        |
    | Target: kubelet /metrics/cadvisor        |
    +------------------------------------------+

Configuring Prometheus to Scrape Your Application
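With kube-prometheus-stack, you tell Prometheus which services to scrape by creating ServiceMonitor objects (a custom resource). The Prometheus Operator watches these objects and generates the scrape configuration for you, so you do not edit the Prometheus config file directly: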
    # Step 1: Add a /metrics endpoint to your application
    # Example: Node.js with the prom-client library

    # In your Node.js app:
    # const client = require('prom-client')
    # const register = new client.Registry()
    # client.collectDefaultMetrics({ register })
    # app.get('/metrics', async (req, res) => {
    #   res.set('Content-Type', register.contentType)
    #   res.send(await register.metrics())
    # })

    # Step 2: Create a ServiceMonitor to tell Prometheus about your service
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: payment-api
      namespace: production
      labels:
        release: kube-prometheus-stack   # Must match Prometheus serviceMonitorSelector
    spec:
      selector:
        matchLabels:
          app: payment-api               # Selects your Service
      endpoints:
        - port: http                     # Named port on the Service
          path: /metrics                 # Where metrics are exposed
          interval: 15s                  # Scrape frequency
          scrapeTimeout: 10s             # Timeout per scrape
      namespaceSelector:
        matchNames:
          - production

    # Verify Prometheus discovered your target
    kubectl port-forward svc/prometheus-operated 9090:9090 -n monitoring
    # Open http://localhost:9090/targets
    # Your service should appear with state=UP

    # If it shows DOWN, check:
    # 1. ServiceMonitor label matches Prometheus serviceMonitorSelector
    # 2. Service port name matches ServiceMonitor endpoint port
    # 3. /metrics endpoint returns valid Prometheus format

Prometheus Storage and Retention
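Before choosing a retention period and volume size, it helps to estimate how much disk the data will need. The Prometheus documentation gives a rule of thumb: needed disk space is roughly retention time in seconds, times ingested samples per second, times bytes per sample, with about 1 to 2 bytes per sample after compression. The sketch below applies that formula. The figure of 1000 series per pod is an assumption for illustration only, and `estimateDiskBytes` is a hypothetical helper, not part of any Prometheus tooling.

```javascript
// Rough disk estimate using the rule of thumb from the Prometheus storage docs:
// disk = retention_seconds * ingested_samples_per_second * bytes_per_sample
function estimateDiskBytes({ activeSeries, scrapeIntervalSeconds, retentionDays, bytesPerSample }) {
  // Every active series produces one sample per scrape.
  const samplesPerSecond = activeSeries / scrapeIntervalSeconds;
  const retentionSeconds = retentionDays * 24 * 60 * 60;
  return retentionSeconds * samplesPerSecond * bytesPerSample;
}

const estimate = estimateDiskBytes({
  activeSeries: 500 * 1000,   // 500 pods x 1000 series per pod (assumed figure)
  scrapeIntervalSeconds: 15,
  retentionDays: 30,
  bytesPerSample: 2,          // upper end of the 1-2 byte range
});

console.log((estimate / 1e9).toFixed(1) + ' GB');
```

Under these assumed numbers the estimate lands somewhere between roughly 86 GB (1 byte per sample) and 173 GB (2 bytes per sample), so it is worth running the calculation with your own series count before fixing the volume size.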
    # Check current storage usage
    kubectl exec -it prometheus-kube-prometheus-stack-prometheus-0 \
      -n monitoring -- \
      df -h /prometheus

    # Check retention configuration
    kubectl get prometheus kube-prometheus-stack-prometheus \
      -n monitoring -o yaml | grep retention
    # retentionSize: 80GB   (delete oldest data when storage hits 80GB; shown only if set)
    # retention: 30d        (delete data older than 30 days)

    # Analyze the database if it becomes slow. This reports which metrics and
    # labels have the most series; it is read-only and does not compact or defragment.
    kubectl exec -it prometheus-kube-prometheus-stack-prometheus-0 \
      -n monitoring -- \
      promtool tsdb analyze /prometheus

Essential Prometheus Operational Commands
    # Check which targets Prometheus is scraping (and their health)
    # http://localhost:9090/targets after port-forward

    # List configured alert rules (PrometheusRule objects) in all namespaces
    kubectl get prometheusrule -A

    # View the generated Prometheus configuration
    kubectl get secret prometheus-kube-prometheus-stack-prometheus \
      -n monitoring -o jsonpath='{.data.prometheus\.yaml\.gz}' | \
      base64 -d | gunzip

    # Force Prometheus to reload configuration
    # (the operator's config-reloader sidecar normally does this automatically)
    kubectl exec -it prometheus-kube-prometheus-stack-prometheus-0 \
      -n monitoring -c prometheus -- \
      kill -HUP 1

    # Check Prometheus logs for scrape errors
    kubectl logs prometheus-kube-prometheus-stack-prometheus-0 \
      -n monitoring -c prometheus --tail=50 | grep -i error

Troubleshooting Common Prometheus Issues
| Problem | Likely Cause | Fix |
|---|---|---|
| Target shows DOWN | Pod not exposing /metrics correctly | Test the endpoint from inside the pod: `kubectl exec -it <pod> -- curl localhost:8080/metrics` |
| ServiceMonitor not discovered | Label selector mismatch | Check that the ServiceMonitor has the `release: kube-prometheus-stack` label |
| High memory usage on Prometheus | Too many high-cardinality metrics | Check for label explosion: `kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -- promtool tsdb analyze /prometheus` |
| Slow queries in Grafana | Prometheus does not have enough memory | Increase `prometheusSpec.resources.limits.memory` in the Helm values |
| Missing metrics after pod restart | Scrape interval too long | Use a short interval such as `interval: 15s`; a small gap while the pod restarts is normal |
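The two label-related rows in the table come down to the same rule: a `matchLabels` selector picks an object only if every key and value in the selector is present on that object's labels. A minimal sketch of that rule follows, using plain objects that mirror the names in this article. The helper `matchLabelsSelects` and the extra `tier` label are hypothetical, and the Prometheus selector shown is the one this article's examples assume.

```javascript
// Returns true when every key/value pair in matchLabels is present on the object.
// An empty selector therefore matches everything.
function matchLabelsSelects(matchLabels, objectLabels) {
  return Object.entries(matchLabels).every(
    ([key, value]) => objectLabels[key] === value
  );
}

// Hypothetical stand-ins for the real Kubernetes objects.
const prometheusSelector = { release: 'kube-prometheus-stack' };
const serviceMonitor = {
  labels: { release: 'kube-prometheus-stack' },
  selector: { app: 'payment-api' },
};
const service = { labels: { app: 'payment-api', tier: 'backend' } };

// Link 1: does Prometheus pick up the ServiceMonitor?
const monitorSelected = matchLabelsSelects(prometheusSelector, serviceMonitor.labels);
// Link 2: does the ServiceMonitor pick up the Service?
const serviceSelected = matchLabelsSelects(serviceMonitor.selector, service.labels);

console.log({ monitorSelected, serviceSelected });
```

If either link evaluates to false for your real labels, the target never appears on the /targets page, which is different from a target that appears but shows DOWN.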
**Remember:** Prometheus stores data on local disk; it is not replicated or distributed by default. Without persistent storage, deleting the Prometheus pod deletes all historical metrics, and even with it, losing the volume loses the data. Always configure `storageSpec` with a PersistentVolumeClaim so the data survives pod restarts, and use a StorageClass with `reclaimPolicy: Retain` so the underlying volume is kept if the PVC is accidentally deleted.
**Tip:** Avoid high-cardinality labels in your custom metrics. A label that has thousands of unique values (like a user ID or request ID) creates thousands of separate time series, each consuming storage and memory in Prometheus. Good labels have low cardinality: `status` (5-10 values), `method` (5-10 values), `service` (tens of values). Bad labels: `user_id`, `request_id`, `session_token`.
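The arithmetic behind this tip is simple multiplication: the worst-case number of time series for one metric is the product of the distinct values of each label. A small sketch, with made-up cardinalities chosen to match the ranges above (`worstCaseSeries` is a hypothetical helper):

```javascript
// Upper bound on time series for a single metric:
// the product of the number of distinct values of each label.
// Only combinations that actually occur create series, so real counts can be lower.
function worstCaseSeries(labelCardinalities) {
  return Object.values(labelCardinalities).reduce((product, count) => product * count, 1);
}

// Low-cardinality labels only (assumed counts).
const lowCardinality = worstCaseSeries({ status: 5, method: 5, service: 20 });

// The same metric with a user_id label added (assumed 10,000 users).
const withUserId = worstCaseSeries({ status: 5, method: 5, service: 20, user_id: 10000 });

console.log(lowCardinality); // 500
console.log(withUserId);     // 5000000
```

One extra label turns a few hundred series into millions for a single metric, which is usually what a sudden jump in Prometheus memory usage traces back to.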
**Common Mistake:** Using Prometheus as long-term storage beyond about 30 days. Prometheus is designed for short-to-medium-term storage. For compliance requirements, capacity planning, or year-over-year comparisons at Razorpay or Hotstar scale, add Thanos or Grafana Mimir as a long-term storage backend. Both keep Prometheus data in object storage (such as S3), which allows much longer retention at low cost.
**Security:** The Prometheus web UI on port 9090 and the metrics endpoints on your pods should never be exposed to the public internet. Metrics data reveals internal service names, error patterns, and infrastructure topology that attackers can use for reconnaissance. `kubectl port-forward` binds to localhost by default; keep it that way (do not pass `--address 0.0.0.0`), and use Kubernetes NetworkPolicies to restrict which pods can access the Prometheus service.