Union Filesystem (OverlayFS) — How Docker Layers Work
What Is OverlayFS in Simple Terms?
OverlayFS is the Linux union filesystem used by Docker's `overlay2` storage driver, and it is what makes Docker images and containers efficient. Instead of copying gigabytes of data for every container, Docker stacks read-only image layers and adds a thin read-write layer on top for each container. All containers running from the same image share the image layers; only their individual changes are stored separately.
The name "union filesystem" comes from the fact that it presents a unified view of multiple underlying directories merged together.
```
+------------------------------------------+
| Container Read-Write Layer (unique)      |
|   /app/logs/app.log (created at runtime) |
|   /etc/nginx/conf.d/site.conf (modified) |
+==========================================+
| Image Layer 4: App source code           |
|   /app/dist/server.js                    |
|   /app/dist/routes.js                    |
+==========================================+
| Image Layer 3: npm node_modules          |
|   /app/node_modules/express/             |
+==========================================+
| Image Layer 2: package.json              |
|   /app/package.json                      |
+==========================================+
| Image Layer 1: node:20-alpine base       |
|   /usr/bin/node                          |
|   /usr/lib/                              |
|   /bin/sh                                |
+==========================================+

Container sees ONE unified filesystem:
  /usr/bin/node        (from Layer 1)
  /app/package.json    (from Layer 2)
  /app/node_modules/   (from Layer 3)
  /app/dist/           (from Layer 4)
  /app/logs/           (from RW layer)
```

How OverlayFS Works on Disk
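The unified view in the diagram above can be approximated as a top-down search: the first layer that contains a path wins. The following is a minimal sketch of that lookup rule in plain shell, not how the kernel implements it. The function name `resolve_path` and the temporary directory layout are hypothetical, chosen only to mirror the diagram.

```shell
#!/bin/sh
# Sketch of the merged-view lookup rule (illustrative only, not kernel code).
# Usage: resolve_path RELATIVE_PATH TOP_LAYER [NEXT_LAYER ...]
# Layers are passed top-first; the first layer containing the path wins.
resolve_path() {
    rel=$1; shift
    for layer in "$@"; do
        if [ -e "$layer/$rel" ]; then
            printf '%s\n' "$layer/$rel"
            return 0
        fi
    done
    return 1   # not present in any layer
}

# Toy stack in a temp dir (hypothetical layout that mirrors the diagram).
demo=$(mktemp -d)
mkdir -p "$demo/layer1/usr/bin" "$demo/layer2/app" "$demo/rw/app/logs"
echo "node binary"  > "$demo/layer1/usr/bin/node"
echo '{"name":"x"}' > "$demo/layer2/app/package.json"
echo "log line"     > "$demo/rw/app/logs/app.log"

# Resolves to the base layer, because no higher layer has this path.
resolve_path usr/bin/node "$demo/rw" "$demo/layer2" "$demo/layer1"
# Resolves to the read-write layer, which is searched first.
resolve_path app/logs/app.log "$demo/rw" "$demo/layer2" "$demo/layer1"
```

If the same path existed in two layers, the copy in the higher layer would be returned, which is why a file changed in the read-write layer hides the image's version.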
```
# OverlayFS combines three directories into a fourth, merged view:
#   lowerdir = read-only image layers (stacked)
#   upperdir = container's read-write layer
#   workdir  = temporary working directory (OverlayFS internal)
#   merged   = the unified view the container sees

# On the host, Docker stores layers at (root access is usually required):
ls /var/lib/docker/overlay2/
# a84f9c2b1d3e...   (layer 1: base image)
# b72c8a9f4e1d...   (layer 2: package.json)
# c91d8b3f2a5e...   (layer 3: node_modules)
# d03e5f4a6b7c...   (layer 4: app source)
# e15f8g5b7c8d...   (container RW layer, unique per container)

# Each directory contains the files added by that layer
ls /var/lib/docker/overlay2/a84f9c2b1d3e/diff/
# bin/  usr/  lib/  etc/   <- Alpine Linux base files
```

Copy-on-Write: How Modifications Work
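The directory roles listed above map directly onto the options of a manual `mount -t overlay` call. As a rough sketch, the helper below only prints the command rather than running it, since a real mount normally needs root. The function name `overlay_mount_cmd` and the `/tmp/demo/...` paths are hypothetical. In the `lowerdir` option the leftmost directory is the topmost layer.

```shell
#!/bin/sh
# Print (do not run) the mount command for an overlay.
# Usage: overlay_mount_cmd MERGED UPPER WORK LOWER_TOP [LOWER_NEXT ...]
# Lower layers are given top-first and joined with ':' in that order.
overlay_mount_cmd() {
    merged=$1; upper=$2; work=$3; shift 3
    lower=""
    for dir in "$@"; do
        lower="${lower:+$lower:}$dir"
    done
    printf 'mount -t overlay overlay -o lowerdir=%s,upperdir=%s,workdir=%s %s\n' \
        "$lower" "$upper" "$work" "$merged"
}

# Hypothetical paths: two image layers under one container layer.
overlay_mount_cmd /tmp/demo/merged /tmp/demo/upper /tmp/demo/work \
    /tmp/demo/layer2 /tmp/demo/layer1
```

When running the printed command for real, `upperdir` and `workdir` have to live on the same filesystem, and `workdir` should be an empty directory.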
```
# When a container modifies a file from a read-only layer:
# 1. OverlayFS copies the file UP to the read-write layer (copy-on-write)
# 2. The container modifies the copy in the read-write layer
# 3. The original read-only layer is untouched

# Example: container modifies /etc/nginx/nginx.conf (from the base image)
docker exec my-nginx sh -c 'echo "# modified" >> /etc/nginx/nginx.conf'

# What happened on disk:
# Original /etc/nginx/nginx.conf in read-only Layer 1 = UNCHANGED
# A copy of nginx.conf now exists in the RW layer with the modification
# Other containers using the same image still see the original

# Consequence: modifying large files in containers is expensive
# A 100MB database file modified in a container:
# - The full 100MB is copied to the RW layer on first modification
# - All subsequent modifications are in-place on the RW layer copy
# This is why databases should use volumes, not the container filesystem
```

Layer Sharing: Storage Efficiency
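The copy-on-write steps described above can be imitated with two ordinary directories. This is a minimal sketch, assuming a single lower layer and an append-only write. The function name `cow_append` and the temp-dir layout are hypothetical, and the real copy-up happens inside the kernel rather than through `cp`.

```shell
#!/bin/sh
# Sketch of copy-up on first write (illustrative only).
# Usage: cow_append UPPER LOWER RELATIVE_PATH TEXT
cow_append() {
    upper=$1; lower=$2; rel=$3; text=$4
    mkdir -p "$(dirname "$upper/$rel")"
    # Copy-up: only on the first write, and always the whole file.
    if [ ! -e "$upper/$rel" ] && [ -e "$lower/$rel" ]; then
        cp -p "$lower/$rel" "$upper/$rel"
    fi
    # The write itself only ever touches the upper (read-write) copy.
    printf '%s\n' "$text" >> "$upper/$rel"
}

demo=$(mktemp -d)
mkdir -p "$demo/lower/etc/nginx" "$demo/upper"
echo "worker_processes 1;" > "$demo/lower/etc/nginx/nginx.conf"

cow_append "$demo/upper" "$demo/lower" etc/nginx/nginx.conf "# modified"

cat "$demo/lower/etc/nginx/nginx.conf"   # original line only, unchanged
cat "$demo/upper/etc/nginx/nginx.conf"   # original line plus "# modified"
```

On a real container, `docker diff my-nginx` lists the paths that ended up in the read-write layer, marked `A` (added), `C` (changed), or `D` (deleted).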
```
# Multiple containers running from the same image
docker run -d --name api-1 payment-api:v3.1.0
docker run -d --name api-2 payment-api:v3.1.0
docker run -d --name api-3 payment-api:v3.1.0

# On disk:
# Image layers (shared): ~200MB (one copy for all three containers)
# api-1 RW layer:        ~1MB   (only api-1's changes)
# api-2 RW layer:        ~1MB   (only api-2's changes)
# api-3 RW layer:        ~1MB   (only api-3's changes)
# Total:                 ~203MB (not 600MB)

# Without OverlayFS (VMs or full copies):
# vm-1:  200MB
# vm-2:  200MB
# vm-3:  200MB
# Total: 600MB

# docker system df shows actual storage usage
docker system df
# TYPE         TOTAL   ACTIVE   SIZE     RECLAIMABLE
# Images       5       3        892MB    240MB
# Containers   3       3        3.2MB    0B
# Volumes      2       2        1.5GB    0B
```

Performance Implications
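The storage totals quoted above follow from simple arithmetic: shared image layers are counted once, while each container adds only its own read-write layer. The sketch below reproduces those numbers. The function names `overlay_total` and `full_copy_total` are hypothetical helpers, and all sizes are whole megabytes.

```shell
#!/bin/sh
# Storage estimate with shared layers: image counted once, plus one RW layer each.
# Usage: overlay_total IMAGE_MB RW_MB_PER_CONTAINER CONTAINERS
overlay_total() {
    echo $(( $1 + $2 * $3 ))
}

# Storage estimate with full copies: every instance carries the whole image.
# Usage: full_copy_total IMAGE_MB INSTANCES
full_copy_total() {
    echo $(( $1 * $2 ))
}

overlay_total 200 1 3      # 203
full_copy_total 200 3      # 600
```

The gap widens with every extra container: ten containers from the same 200MB image would need about 210MB with shared layers, compared with 2000MB as full copies.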
```
# OverlayFS is fast for reads (close to native filesystem speed)
# OverlayFS is slower for the first write to a file that lives in a
# read-only layer (copy-up cost)
# OverlayFS is a poor fit for database workloads:

# BAD: database storing data in the container filesystem
# (PGDATA is moved off the data path that the official postgres image
#  declares as a volume; the password value is a placeholder)
docker run -d -e POSTGRES_PASSWORD=example -e PGDATA=/pgdata postgres:15
# Every database write goes through OverlayFS
# First write to any file that came from the image = copy-up
# Performance: can be 40-60% slower than the native filesystem, depending on workload

# GOOD: database storing data in a volume
docker run -d -e POSTGRES_PASSWORD=example -v postgres-data:/var/lib/postgresql/data postgres:15
# Database writes go directly to the volume (native filesystem)
# OverlayFS is bypassed for volume paths
# Performance: native filesystem speed
```

**Tip:** Run `docker system df -v` to see the size of each image (including its shared and unique portions) and of each container's read-write layer. This tells you how much storage each container is consuming above the shared image layers, which is useful for identifying containers that are writing large amounts of data to their container filesystem instead of a volume.
**Remember:** When you delete a file inside a container, OverlayFS does not delete the original from the read-only layer. Instead, it adds a "whiteout" entry in the read-write layer that marks the file as deleted. The original bytes are still in the image layer on disk. This is why `RUN rm -rf /tmp/large-file` in a Dockerfile does not reduce image size if the large file was added in a previous layer.
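The whiteout behaviour described above can be sketched with a marker file. This is an illustration under stated assumptions: the kernel's OverlayFS records a whiteout as a character device with device number 0/0, which needs root to create, so the sketch borrows the `.wh.` filename prefix used in image layer archives instead. The function names `overlay_rm` and `overlay_visible` are hypothetical.

```shell
#!/bin/sh
# "Delete" a path from the merged view by recording a whiteout marker in upper.
# Usage: overlay_rm UPPER RELATIVE_PATH
overlay_rm() {
    upper=$1; rel=$2
    mkdir -p "$(dirname "$upper/$rel")"
    rm -f "$upper/$rel"                                    # drop any upper copy
    : > "$(dirname "$upper/$rel")/.wh.$(basename "$rel")"  # marker hides lower copy
}

# Succeed (exit 0) only if the path is visible in the merged view.
# Usage: overlay_visible UPPER LOWER RELATIVE_PATH
overlay_visible() {
    upper=$1; lower=$2; rel=$3
    [ -e "$(dirname "$upper/$rel")/.wh.$(basename "$rel")" ] && return 1
    [ -e "$upper/$rel" ] || [ -e "$lower/$rel" ]
}

demo=$(mktemp -d)
mkdir -p "$demo/lower/tmp" "$demo/upper"
echo "lots of bytes" > "$demo/lower/tmp/large-file"

overlay_rm "$demo/upper" tmp/large-file
overlay_visible "$demo/upper" "$demo/lower" tmp/large-file || echo "hidden in merged view"
ls "$demo/lower/tmp"    # large-file is still stored in the lower layer
```

The same reasoning applies to Dockerfiles: removing a file in the same `RUN` instruction that created it keeps it out of every layer, whereas removing it in a later instruction only adds a whiteout on top.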
**Common Mistake:** Storing database data or large files in the container filesystem instead of a Docker volume. The container filesystem uses OverlayFS, which has overhead on writes and is deleted when the container is removed. Always use named volumes for any data that must persist or that is written frequently.