Pick a module and start learning specific DevOps tools or concepts at your own pace.
Learn GitHub Actions - build automated CI/CD pipelines, run tests, deploy to the cloud, and trigger workflows on every code push.
Learn networking for DevOps engineers - TCP/IP, DNS, HTTP, subnets, firewalls, and how traffic flows in production systems.
Learn Docker from the ground up - understand containers, images, Dockerfile, volumes, networking, and Docker Compose. This module takes you from zero to confidently building, running, and shipping containerized applications the way it's done in real DevOps and cloud environments.
Learn how to stop clicking through cloud consoles and start managing servers, networks, and databases with code - the way every modern DevOps team does it.
Learn Linux for DevOps from scratch - file system, permissions, processes, bash scripting, and production server management.
Learn shell scripting and Bash for DevOps - write production scripts, automate tasks, handle errors, and schedule with cron.
Learn Git and GitHub for DevOps - commits, branching, merging, pull requests, rebasing, and team collaboration workflows.
From understanding what CI/CD is to building real pipelines on Azure DevOps - everything a DevOps engineer needs to know, in the right order.
A complete guide to GitLab - from version control and merge requests to building production-grade CI/CD pipelines, container registry, security scanning, and deployment automation.
Learn Kubernetes from scratch - pods, deployments, services, ingress, Helm, and production-grade container orchestration.
Learn DevOps monitoring and logging - Prometheus, Grafana, ELK stack, alerting, and building production observability pipelines.
This module teaches you how software goes from raw source code to a deployable artifact.
Security is not something you add at the end. It runs through every step of building and deploying software. This module teaches you the mindset, the tools, and the pipeline practices that make security part of your daily DevOps workflow - not an afterthought.
A complete, practical reference for DevOps engineers. Covers all 20 core AWS services with architecture, flows, real-world usage, CLI commands, and comparisons.
Master Apache Kafka - producers, consumers, topics, partitions, consumer groups, and building event-driven architectures at scale.
AWS Signer · Notation · Ratify · Gatekeeper on EKS
AWS Signer · Notation · Ratify · Gatekeeper on EKS · Account A (us-east-1) signs - Account B (ap-southeast-1) enforces
AWS Signer · Notation · Ratify · Gatekeeper on EKS · Account A signs - Account B enforces · Same Region (us-east-1)
Automate file transfer from Amazon S3 to FSx for OpenZFS so that every file uploaded to a specific S3 path automatically appears on FSx - no manual commands needed after setup.
Learn Python from absolute zero - variables, data types, loops, functions, files, APIs, and real projects. No prior experience needed.
Learn anomaly detection with Isolation Forest - understand how the algorithm works, apply it to real ops metrics, and build a complete anomaly detector from scratch.
Learn prompt engineering from scratch - master zero-shot, few-shot, chain-of-thought, and ops-specific patterns to get reliable results from any LLM.
Learn how AI agents work from scratch - reasoning loops, tool calling, ReAct pattern, and build a real ops incident investigator agent in pure Python.
Give your agent a memory built from your own runbooks and ops docs - so it stops guessing and starts answering from what your team actually knows.
During a production incident, 200 alerts fire and 180 of them are noise from one root cause. This module teaches you how to group, deduplicate, inhibit, and correlate alerts so your team sees one clear signal instead of a storm.
Seven ready-to-use prompt templates for RCA, runbook generation, incident updates, and postmortems - so your AI agent reasons like a senior engineer during real incidents.
Build guardrails that prevent your AI agent from running destructive commands in production - with command risk classification, output validation, and human approval gates.
Learn when to use cloud models vs. local open-source models for ops workloads. Run Llama locally with Ollama for sensitive data and high-volume tasks - no data leaves your network.
Learn the eight failure patterns that cause production incidents at scale - cascading failures, split-brain, thundering herds, and more - plus the CAP theorem, circuit breakers, and the reasoning process that turns alert storms into structured diagnosis.
Learn a structured 5-step process to diagnose production incidents - reading logs, metrics, and traces together to find the root cause fast instead of guessing.
Learn to build MCP servers that give AI agents safe, structured access to your Kubernetes cluster, Prometheus metrics, and internal runbooks.
Learn to instrument Python services with OpenTelemetry, collect traces through the OTel Collector, and use distributed tracing to find the exact cause of production incidents.
Learn to write Ansible playbooks, build automated ops runbooks, and connect Prometheus alerts to self-healing workflows that fix production issues without human intervention.
Learn to forecast ops metrics, detect anomalies, and predict capacity issues before they happen using Prophet and ARIMA in Python.
Learn what DevSecOps is, how attackers think, how mature teams embed security into every stage of software delivery, and how to model threats before writing a single line of code.
Learn how to secure Git repositories from leaked secrets, unauthorized commits, and supply chain attacks - covering pre-commit hooks, Gitleaks, TruffleHog, branch protection, CODEOWNERS, signed commits, and Dependabot.
Learn how to build and harden CI/CD pipelines - covering OIDC federation, Vault secrets injection, least privilege runners, GitHub Actions permissions, Jenkins hardening, artifact signing with Cosign, and audit logging.
Learn how to embed security into the development phase - covering SonarQube SAST with quality gates, Snyk and OWASP Dependency Check for SCA, CVSS vulnerability prioritization, and HashiCorp Vault dynamic secrets management including Kubernetes sidecar injection.
Learn DAST with OWASP ZAP, generate Software Bills of Materials with Syft, scan for vulnerabilities with Grype, sign artifacts with Cosign, and understand SLSA provenance - through the lens of real supply chain attacks like SolarWinds and XZ Utils.
Learn how to secure Docker containers from image hardening and vulnerability scanning to runtime protection - covering non-root users, minimal base images, multi-stage builds, Trivy scanning, seccomp/AppArmor profiles, capability dropping, read-only filesystems, and the Docker socket threat.
Learn how to secure Kubernetes clusters end to end - covering RBAC and least privilege, Pod Security Standards, Network Policies, Secrets encryption at rest, service account hardening, Falco runtime security, and etcd protection.
Learn how to secure AWS environments end to end - covering IAM least privilege, S3 public access blocking, CloudTrail audit logging, GuardDuty threat detection, Security Hub compliance, KMS encryption, Secrets Manager, VPC security groups, SCPs with AWS Organizations, and AWS Config compliance monitoring.
Learn how to secure Infrastructure as Code with Terraform - covering sensitive variable handling, state file protection, provider credential security, Checkov scanning with CI/CD integration, OPA policy-as-code, and tfsec static analysis to catch misconfigurations before deployment.
Master zero trust identity architecture - covering IAM RBAC and JIT access, OIDC workload federation for Kubernetes on AWS and GCP, HashiCorp Vault PKI and dynamic secrets, SPIFFE/SPIRE workload identity, OAuth2 PKCE, Privileged Access Management, Zero Standing Privileges, Kubernetes service account hardening, and automated secrets rotation.
Build a production-grade runtime security and detection pipeline - covering eBPF fundamentals, Falco syscall-based threat detection with custom rules, Tetragon eBPF enforcement, MITRE ATT&CK mapping for Cloud and Containers, Sigma rules and SIEM conversion, Elasticsearch SIEM with alert correlation and noise reduction, auditd Linux syscall monitoring, OpenTelemetry for security observability, and automated incident response runbooks.
Learn how SOC 2, ISO 27001, and PCI-DSS map to your pipelines, automate compliance evidence, and run incident response from detection to blameless postmortem.
Master supply chain security, secrets detection, fuzzing, threat modeling, chaos engineering, eBPF policies, zero-day response, and CIS benchmarks.
Go beyond CI/CD scanning - learn threat modeling, runtime security, supply chain integrity, cloud security, and compliance automation.
Understand what developers build and deploy - REST APIs, HTTP, Nginx, databases, environment variables, health checks, and microservices - from an infrastructure perspective, not a developer one.
Master GitOps principles and ArgoCD - from understanding why GitOps exists to operating multi-environment deployments, App of Apps, ApplicationSets, and progressive delivery with Argo Rollouts.
Master FinOps for Kubernetes - understand unit economics, implement cost visibility with Kubecost, cut EC2 costs 60-70% with Karpenter and Spot instances, and right-size workloads to eliminate waste.
Learn what separates Platform Engineers from DevOps Engineers - multi-tenancy, admission controllers, policy as code with Kyverno, CRDs, the operator pattern, and Cluster API for declarative cluster lifecycle management.
Build an Internal Developer Platform from first principles - understand why IDPs exist, what problems they solve, how to implement a software catalog and golden paths with Backstage, and how to measure platform success with DORA metrics.
Build a production-grade application platform from scratch - React frontend, Node.js API, PostgreSQL, Redis - containerised with Docker, deployed to Kubernetes with Ingress, NetworkPolicies, HPA, PDB, resource limits, health checks, Prometheus monitoring, and GitOps delivery via ArgoCD.
Provision a complete production-grade AWS infrastructure from scratch using Terraform - VPC, EKS cluster, IAM roles, IRSA, ECR, and remote state. The infrastructure that Capstone 1's application runs on in a real company.
Build a complete GitOps delivery platform - App of Apps pattern, multi-environment promotion from staging to production, ApplicationSets for scale, and progressive delivery with Argo Rollouts canary deployments.
Build a working Internal Developer Platform using Backstage - software catalog, golden path template that creates a new service end-to-end, TechDocs, and Kubernetes integration.
The final test. Take a running application, inject real production failures - pod crashes, OOMKills, bad deployments, cost spikes, network issues - and learn to diagnose and fix each one.
The mega-capstone. One developer action in Backstage triggers the entire platform - Terraform provisions infrastructure, ArgoCD deploys the application, Prometheus monitors it, Kyverno validates policies, Kubecost tracks spend. Everything from Capstones 1-5 working together as one complete Platform Engineering system.
Build a real AIOps anomaly detection pipeline - collect live Kubernetes metrics from Prometheus, run machine learning to detect anomalies automatically, and fire alerts when something unusual happens. No more waiting for users to report problems.
Build an AI agent that receives production alerts, queries Prometheus for context, retrieves runbooks via RAG, and suggests remediation steps automatically.
Wire anomaly detection, AI diagnosis, and Ansible remediation into one autonomous pipeline that detects, diagnoses, and fixes production incidents without human intervention.
Build a production-grade secure pipeline where every push triggers SAST, SCA, secrets detection, container scanning, SBOM generation, and image signing before any code reaches production.
Lock down a production Kubernetes cluster using RBAC, Pod Security Admission, NetworkPolicies, OPA Gatekeeper, Falco runtime detection, and automated compliance scanning.
The mega-capstone. Wire Terraform IaC scanning, secrets management with Vault, zero-trust mTLS, security chaos engineering, and a unified compliance dashboard into one production security platform.