Overview and What You Will Learn
A Terraform resource block creates something and takes ownership of it — Terraform will create it, update it, and destroy it. But sometimes you need to reference something that already exists and belongs to someone else. The VPC your networking team created. The SSL certificate your security team manages. The latest Amazon Linux AMI that changes every few weeks. You need the value, but you do not want Terraform to manage — or destroy — the thing.
That is exactly what a data source is for. A data source reads existing information from a cloud provider and makes it available in your configuration, without creating anything or taking ownership of anything.
In this lab you will learn:
- The difference between a resource block and a data block — when to use each
- How to fetch the latest Amazon Linux 2 AMI without hardcoding an AMI ID
- How to read an existing VPC, subnet, and security group created by another team
- How to pull secrets from AWS Secrets Manager without hardcoding them in variables or version control (and why they still end up in state)
- How to use `aws_caller_identity` and `aws_region` to make configs self-aware
- How data source results flow into resource arguments
Why This Matters in Production
At Hotstar, the networking team owns the VPC and subnets. The platform team owns the EC2 and RDS resources. The security team owns the SSL certificates. No team creates what another team owns — but every team needs to reference what the others create.
Without data sources, the platform team would have to hardcode VPC IDs (vpc-0a1b2c3d4e) and subnet IDs (subnet-0a1b2c3d4e) into their Terraform configuration. When the networking team recreates the VPC for a new region, every downstream configuration breaks and needs manual updates.
With data sources, the platform team looks up the VPC by name tag at plan time. The VPC ID is fetched dynamically. The configuration is portable across regions and environments — no hardcoded IDs to maintain.
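A minimal sketch of that lookup, assuming the networking team tags each VPC as `<environment>-vpc`. The variable name `environment`, the tag convention, and the output name are illustrative, not taken from any real account:

```hcl
# Hypothetical input: which environment this configuration targets.
variable "environment" {
  type        = string
  description = "Environment name used in the VPC Name tag, e.g. prod"
  default     = "prod"
}

# Look the VPC up by tag at plan time instead of hardcoding a vpc-... ID.
data "aws_vpc" "shared" {
  tags = {
    Name = "${var.environment}-vpc" # assumed tagging convention
  }
}

# Expose the resolved ID so you can confirm which VPC was matched.
output "shared_vpc_id" {
  value = data.aws_vpc.shared.id
}
```

Pointing the same configuration at another environment is then a matter of changing the variable value rather than editing IDs.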
Core Principles

    +------------------------------------------+
    | data block (reads, never creates)        |
    | data "aws_vpc" "main" {                  |
    |   tags = { Name = "hotstar-prod-vpc" }   |
    | }                                        |
    +------------------------------------------+
                         |
                         |  (read-only API call at plan time)
                         v
    +------------------------------------------+
    | Cloud Provider API                       |
    | Fetches: VPC id, cidr_block, owner_id    |
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | Data source result available as:         |
    |   data.aws_vpc.main.id                   |
    |   data.aws_vpc.main.cidr_block           |
    +------------------------------------------+
                         |
                         v
    +------------------------------------------+
    | resource block (uses the read value)     |
    | resource "aws_subnet" "app" {            |
    |   vpc_id = data.aws_vpc.main.id          |
    | }                                        |
    +------------------------------------------+

Key difference: data vs resource

    +------------------------+    +------------------------------+
    | resource block         |    | data block                   |
    |                        |    |                              |
    | Creates new object     |    | Reads existing object        |
    | Stored in state        |    | Snapshot cached in state     |
    | Can be destroyed       |    | Never destroyed by Terraform |
    | Terraform owns it      |    | Terraform reads it only      |
    +------------------------+    +------------------------------+

Detailed Step-by-Step Practical Lab
Part 1 — Data Block Syntax
The syntax of a data block mirrors a resource block (the `data` keyword, a type, and a local name), but it reads an existing object instead of creating one:

    # resource block: creates and manages
    # (named web_hardcoded so it can coexist with the data-driven version below)
    resource "aws_instance" "web_hardcoded" {
      ami           = "ami-0f5ee92e2d63afc18"
      instance_type = "t3.micro"
    }

    # data block: reads existing, creates nothing
    data "aws_ami" "amazon_linux" {
      most_recent = true       # if multiple match, return the newest one
      owners      = ["amazon"] # only AMIs published by Amazon, not community

      filter {
        name   = "name"
        values = ["amzn2-ami-hvm-*-x86_64-gp2"] # Amazon Linux 2 naming pattern
      }

      filter {
        name   = "virtualization-type"
        values = ["hvm"] # hardware virtual machine, required for t3/t4 instances
      }
    }

    # Reference the data source result with data.<type>.<name>.<attribute>
    resource "aws_instance" "web" {
      ami           = data.aws_ami.amazon_linux.id # e.g., ami-0f5ee92e2d63afc18
      instance_type = "t3.micro"
    }

**Tip:** Notice the reference format: `data.aws_ami.amazon_linux.id`. It always starts with `data.` to distinguish data source references from resource references (`aws_instance.web.id`).
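A data source exposes more than an ID. As a small sketch (the names `al2_inspect` and `selected_ami` are made up for illustration), the following surfaces a few attributes returned by the `aws_ami` data source, which can help confirm that the filters matched the image you expected:

```hcl
# Same Amazon Linux 2 filter as above, under a separate illustrative name.
data "aws_ami" "al2_inspect" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Print several attributes of the resolved AMI, not just its ID.
output "selected_ami" {
  description = "Details of the AMI the data source resolved to"
  value = {
    id            = data.aws_ami.al2_inspect.id
    name          = data.aws_ami.al2_inspect.name
    creation_date = data.aws_ami.al2_inspect.creation_date
    architecture  = data.aws_ami.al2_inspect.architecture
  }
}
```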
Part 2 — AMI Data Source (Most Common Use Case)
Hardcoding an AMI ID (ami-0f5ee92e2d63afc18) is a maintenance trap. Amazon releases patched AMIs regularly. A hardcoded ID eventually points to an outdated AMI — or one that no longer exists in a new region.
The `aws_ami` data source resolves the newest AMI matching your filters each time Terraform plans:

    # data.tf: all data sources in one file for clarity

    # Amazon Linux 2, for EC2 instances running yum-based workloads
    data "aws_ami" "amazon_linux_2" {
      most_recent = true
      owners      = ["amazon"]

      filter {
        name   = "name"
        values = ["amzn2-ami-hvm-*-x86_64-gp2"]
      }

      filter {
        name   = "state"
        values = ["available"] # only fetch AMIs that are ready to use
      }
    }

    # Amazon Linux 2023: the successor to AL2, uses dnf instead of yum
    data "aws_ami" "amazon_linux_2023" {
      most_recent = true
      owners      = ["amazon"]

      filter {
        name   = "name"
        values = ["al2023-ami-*-kernel-*-x86_64"]
      }
    }

    # Ubuntu 22.04 LTS, for apt-based workloads
    data "aws_ami" "ubuntu_22_04" {
      most_recent = true
      owners      = ["099720109477"] # Canonical's AWS account ID; always verify this

      filter {
        name   = "name"
        values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
      }

      filter {
        name   = "virtualization-type"
        values = ["hvm"]
      }
    }

    # Using the AMI data sources in resources
    resource "aws_instance" "app_server" {
      ami           = data.aws_ami.amazon_linux_2.id # newest matching AMI at plan time
      instance_type = "t3.medium"

      tags = {
        Name    = "razorpay-api-app"
        AMIUsed = data.aws_ami.amazon_linux_2.name # log which AMI was used
      }
    }

    # Output the AMI ID so you can see what was used
    output "ami_id_used" {
      description = "AMI ID selected by the data source, for auditability"
      value       = data.aws_ami.amazon_linux_2.id
    }

Part 3 — VPC and Subnet Data Sources (Cross-Team References)
In production, the networking team creates the VPC and subnets. The application team creates EC2 and RDS. Data sources let the application team reference the VPC without owning it.

    # Read a VPC by its Name tag; no hardcoded VPC ID
    data "aws_vpc" "main" {
      tags = {
        Name        = "hotstar-prod-vpc" # must match exactly
        Environment = "prod"
      }
    }

    # Read all private subnets in that VPC
    data "aws_subnets" "private" {
      filter {
        name   = "vpc-id"
        values = [data.aws_vpc.main.id] # use the VPC we just read
      }

      tags = {
        Tier = "private" # networking team tags their subnets; read the tag
      }
    }

    # Read a specific subnet by tag
    data "aws_subnet" "app_primary" {
      vpc_id = data.aws_vpc.main.id

      tags = {
        Name = "hotstar-prod-private-ap-south-1a"
      }
    }

    # Use the VPC data source result in RDS subnet group
    resource "aws_db_subnet_group" "app" {
      name       = "razorpay-payments-db-subnet-group"
      subnet_ids = data.aws_subnets.private.ids # list of all private subnet IDs

      tags = {
        Name = "razorpay-payments"
      }
    }

    # Use the VPC in a security group
    resource "aws_security_group" "rds" {
      name   = "razorpay-rds-sg"
      vpc_id = data.aws_vpc.main.id # must be in same VPC as the RDS instance
    }

Part 4 — Account and Region Data Sources (Self-Aware Configs)
These data sources make your configuration environment-aware without hardcoding account IDs, region strings, or availability zone names:

    # aws_caller_identity: reads the current AWS account details
    # No filter arguments needed; it reads the caller (the IAM user or role running Terraform)
    data "aws_caller_identity" "current" {}

    # aws_region: reads the currently configured AWS region
    data "aws_region" "current" {}

    # aws_availability_zones: lists the AZs in the current region
    data "aws_availability_zones" "available" {
      state = "available" # only fetch AZs that are currently operational
    }

    # Use account ID in S3 bucket names
    resource "aws_s3_bucket" "terraform_state" {
      # S3 bucket names are globally unique; including the account ID makes collisions unlikely
      bucket = "razorpay-terraform-state-${data.aws_caller_identity.current.account_id}"
    }

    # Use region in a log group name
    resource "aws_cloudwatch_log_group" "app" {
      name = "/aws/ec2/${data.aws_region.current.name}/app"
      # Result when the region is ap-south-1: /aws/ec2/ap-south-1/app
    }

    # Use available AZs for subnet distribution
    resource "aws_subnet" "private" {
      # Create one subnet per AZ; works in any region without hardcoding AZ names
      count             = length(data.aws_availability_zones.available.names)
      availability_zone = data.aws_availability_zones.available.names[count.index]
      vpc_id            = aws_vpc.main.id
      cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
    }

Part 5 — Security Group and IAM Data Sources

    # Read an existing security group by name, often owned by another team
    data "aws_security_group" "bastion" {
      name   = "hotstar-bastion-sg"
      vpc_id = data.aws_vpc.main.id
    }

    # Read an existing IAM role, to attach a policy without owning the role
    data "aws_iam_role" "ec2_instance_role" {
      name = "hotstar-ec2-instance-role"
    }

    # Read an existing IAM policy by ARN
    data "aws_iam_policy" "ssm_managed" {
      arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
    }

    # Attach the existing IAM policy to the existing role
    resource "aws_iam_role_policy_attachment" "ssm" {
      role       = data.aws_iam_role.ec2_instance_role.name
      policy_arn = data.aws_iam_policy.ssm_managed.arn
    }

Part 6 — Secrets Manager Data Source (Secrets Without Hardcoding)
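Reading a secret from AWS Secrets Manager via a data source is safer than passing the secret in as a Terraform variable: the value never has to appear in `.tfvars` files or version control, and Terraform redacts it as sensitive in plan output. It is still written to the state file, so the state itself must be protected.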

    # Read the secret metadata (name, ARN, description)
    data "aws_secretsmanager_secret" "db_password" {
      name = "razorpay/payments-db/master-password"
    }

    # Read the current secret value
    data "aws_secretsmanager_secret_version" "db_password" {
      secret_id = data.aws_secretsmanager_secret.db_password.id
    }

    # Use the secret value in an RDS resource
    resource "aws_db_instance" "payments" {
      identifier = "razorpay-payments-prod"
      engine     = "postgres"
      username   = "payments_admin"

      # Read from Secrets Manager, not stored as a Terraform variable
      password = data.aws_secretsmanager_secret_version.db_password.secret_string

      instance_class    = "db.r6g.large"
      allocated_storage = 500
    }

**Security:** Even when reading from Secrets Manager via a data source, the secret value is still stored in the Terraform state file after apply, both in the data source's own attributes and in the `aws_db_instance` resource attributes. The state file must be encrypted (for example, with S3 SSE) and access-controlled. For stronger secret isolation, consider the Vault provider with dynamic credentials instead.
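Secrets Manager secrets are often stored as a JSON document rather than a single string. A hedged sketch of that case, assuming a hypothetical secret named `razorpay/payments-db/credentials` whose value has the shape `{"username": "...", "password": "..."}`. The secret name, JSON keys, and resource names are all illustrative:

```hcl
# Read the current version of a JSON-formatted secret.
data "aws_secretsmanager_secret_version" "db_credentials" {
  secret_id = "razorpay/payments-db/credentials" # hypothetical secret name
}

locals {
  # jsondecode turns the JSON string into an object. The result stays
  # marked as sensitive because secret_string is a sensitive attribute.
  db_credentials = jsondecode(data.aws_secretsmanager_secret_version.db_credentials.secret_string)
}

resource "aws_db_instance" "payments_json" {
  identifier        = "razorpay-payments-json-example" # illustrative
  engine            = "postgres"
  instance_class    = "db.r6g.large"
  allocated_storage = 500
  username          = local.db_credentials["username"] # assumed JSON key
  password          = local.db_credentials["password"] # assumed JSON key
}
```

The decoded values are still written to state, so the same encryption and access-control requirements apply.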
Part 7 — SSM Parameter Store Data Source
AWS Systems Manager Parameter Store is a good place for configuration values that are not secret but change per environment (database hostnames, service endpoints, feature flags):

    # Read a plain string parameter
    data "aws_ssm_parameter" "db_endpoint" {
      name = "/razorpay/${var.environment}/database/endpoint"
      # Result: "payments-prod.abc123.ap-south-1.rds.amazonaws.com"
    }

    # Read a SecureString parameter (encrypted with KMS)
    data "aws_ssm_parameter" "api_key" {
      name            = "/razorpay/${var.environment}/third-party/api-key"
      with_decryption = true # decrypt SecureString parameters; the decrypted value is stored in state
    }

    # Use the parameter value in an ECS task definition environment variable
    resource "aws_ecs_task_definition" "api" {
      family = "razorpay-api"

      container_definitions = jsonencode([{
        name  = "api"
        image = "razorpay/api:latest"
        environment = [
          {
            name  = "DB_ENDPOINT"
            value = data.aws_ssm_parameter.db_endpoint.value
          }
        ]
      }])
    }

Part 8 — When Data Sources Fetch Their Data
Understanding when data sources run matters for plan-time vs apply-time behaviour:

    # Data sources that run at PLAN time (most common):
    # - aws_ami, aws_vpc, aws_subnet, aws_caller_identity, aws_region
    # - These fetch data before any apply happens
    # - Results are known in the plan output

    # Data sources that may run at APPLY time:
    # - Data sources that depend on resources being created first

    # Example: reading a security group that Terraform creates in the same apply
    resource "aws_security_group" "app" {
      name   = "app-sg"
      vpc_id = aws_vpc.main.id
    }

    # This data source can only read the security group after it is created
    data "aws_security_group" "app_lookup" {
      name   = aws_security_group.app.name # depends on the resource above
      vpc_id = aws_vpc.main.id
    }

    # In the plan output, data.aws_security_group.app_lookup.id will show as
    # "(known after apply)" because the security group does not exist yet

**Remember:** If a data source result shows `(known after apply)` in the plan, it means the data source depends on a resource that has not been created yet. This is normal: the value will be known after the apply finishes.
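When the object is created in the same configuration, the lookup is usually unnecessary. A sketch of the direct-reference alternative, assuming a hypothetical `vpc_id` input variable and an illustrative HTTPS ingress rule (the resource names and CIDR are made up for this example):

```hcl
variable "vpc_id" {
  type        = string
  description = "ID of the VPC the security group lives in (hypothetical input)"
}

resource "aws_security_group" "app_direct" {
  name   = "app-direct-sg"
  vpc_id = var.vpc_id
}

# Reference the managed resource directly. No data source is involved,
# and Terraform orders this rule after the security group automatically.
resource "aws_security_group_rule" "app_https" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["10.0.0.0/16"] # illustrative CIDR
  security_group_id = aws_security_group.app_direct.id
}
```

The ID is still unknown until apply, but Terraform tracks the dependency itself and no extra read-only API call is needed.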
Part 9 — Complete Example: Multi-Team Infrastructure Reference
This is the pattern used when multiple Terraform configurations manage different layers of infrastructure:

    # data.tf: reading everything the networking team created

    # The VPC: networking team owns this, we just read it
    data "aws_vpc" "main" {
      tags = {
        Name = "${var.environment}-vpc"
        Team = "networking"
      }
    }

    # Private subnets for our EC2 and RDS instances
    data "aws_subnets" "private" {
      filter {
        name   = "vpc-id"
        values = [data.aws_vpc.main.id]
      }

      tags = { Tier = "private" }
    }

    # The bastion security group: we need to allow SSH from it
    data "aws_security_group" "bastion" {
      name   = "${var.environment}-bastion-sg"
      vpc_id = data.aws_vpc.main.id
    }

    # Current account and region
    data "aws_caller_identity" "current" {}
    data "aws_region" "current" {}

    # Latest Amazon Linux 2 AMI for this region
    data "aws_ami" "amazon_linux" {
      most_recent = true
      owners      = ["amazon"]

      filter {
        name   = "name"
        values = ["amzn2-ami-hvm-*-x86_64-gp2"]
      }
    }

    # RDS password from Secrets Manager, not a Terraform variable
    data "aws_secretsmanager_secret_version" "db_password" {
      secret_id = "razorpay/${var.environment}/rds/master-password"
    }

    # main.tf: using everything we read above

    resource "aws_instance" "app" {
      ami           = data.aws_ami.amazon_linux.id
      instance_type = local.ec2_instance_type
      subnet_id     = data.aws_subnets.private.ids[0]

      vpc_security_group_ids = [
        aws_security_group.app.id,
        data.aws_security_group.bastion.id # allow SSH from bastion; owned by another team
      ]

      tags = merge(local.common_tags, {
        Name = "${local.name_prefix}-app"
        AMI  = data.aws_ami.amazon_linux.id
      })
    }

    # Other aws_db_instance arguments (username, storage, and so on) are omitted here
    resource "aws_db_instance" "main" {
      identifier           = "${local.name_prefix}-postgres"
      engine               = "postgres"
      instance_class       = var.database_config.instance_class
      db_subnet_group_name = aws_db_subnet_group.main.name
      password             = data.aws_secretsmanager_secret_version.db_password.secret_string
      tags                 = local.common_tags
    }

    resource "aws_db_subnet_group" "main" {
      name       = "${local.name_prefix}-db-subnet-group"
      subnet_ids = data.aws_subnets.private.ids # all private subnets from data source
    }

Production Best Practices and Common Pitfalls
**Use data sources instead of hardcoding IDs.** Never put `vpc-0a1b2c3d4e5f` or `ami-0f5ee92e2d63afc18` directly in resource arguments. These IDs change per region and per account. Data sources make your config portable.

**Tag your resources consistently so data sources can find them.** A data source filtering by `tags = { Name = "prod-vpc" }` only works if the VPC was actually tagged that way. Establish a tagging standard before you start writing data sources that rely on tags.

**Add `most_recent = true` to AMI data sources.** Without it, if multiple AMIs match your filter, Terraform errors. With `most_recent = true`, Terraform picks the newest, which is usually what you want for security patches. Keep in mind that a newer AMI changes the `ami` argument and forces replacement of the instance on the next apply.

**Do not use a data source to read something you create in the same configuration.** If a `data "aws_vpc"` lookup has no dependency on the VPC resource, it runs at plan time, before the VPC exists, and the plan fails. If it does depend on the resource, the read is deferred to apply and its values show as `(known after apply)`. Either way, reference the resource directly: `aws_vpc.main.id`, not `data.aws_vpc.main.id`.

**Pin the Canonical Ubuntu AMI owner ID.** The value `099720109477` is Canonical's AWS account ID for Ubuntu AMIs. Verify this at https://cloud-images.ubuntu.com/locator/ec2/. It does not change, but setting `owners` explicitly protects you from malicious community AMIs with similar names.

**Data source results are re-read on every plan.** The copy recorded in state is only a snapshot of the last read. If the underlying resource changes (someone renames the VPC tag), your next plan or apply may fail because the data source can no longer find what it is looking for. This is a feature: it forces you to update your configuration when dependencies change.
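One consequence of `most_recent = true` is that the resolved AMI ID changes whenever a newer image is published, and a changed `ami` argument forces the instance to be replaced. If that is not desired for long-lived instances, one option (a sketch, not the only approach) is to tell Terraform to ignore later AMI drift. The names `al2_pinned` and `long_lived` are illustrative:

```hcl
data "aws_ami" "al2_pinned" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "long_lived" {
  ami           = data.aws_ami.al2_pinned.id
  instance_type = "t3.micro"

  lifecycle {
    # Use the newest AMI at creation time, but do not replace the
    # instance just because a newer AMI appears on a later plan.
    ignore_changes = [ami]
  }
}
```

The trade-off is that such instances no longer pick up patched images automatically, so they need another patching or rotation process.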
Quick Reference and Troubleshooting Commands
| Command | What It Does |
|---|---|
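| `terraform plan` | Reads data sources; any read deferred to apply is listed as `<= data.aws_ami.amazon_linux` |
| `terraform console` | Test data source expressions interactively |
| `terraform state list` | Lists data sources with a `data.` prefix (for example `data.aws_ami.amazon_linux`) alongside managed resources |
| `terraform apply -refresh-only` | Re-reads resources and data sources from the API and updates state after confirmation (replaces the deprecated `terraform refresh`) |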
| Error | Root Cause | Fix |
|---|---|---|
| `Error: no matching AMI found` | Filter does not match any AMIs | Broaden the filter or check the `owners` field |
| `Error: no matching VPC found` | Tag filter does not match | Verify tag names and values on the actual VPC |
| `Error: multiple VPCs matched` | Filter is too broad | Add more filters to narrow to exactly one result |
| `(known after apply)` in plan | Data source depends on a resource not yet created | Reference the resource directly instead of using a data source lookup |
| `Error: error reading Secrets Manager secret` | IAM permissions missing | Add `secretsmanager:GetSecretValue` to the IAM role |
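
For the two VPC lookup errors, it can help to see how many VPCs a tag filter matches before tightening it. A diagnostic sketch using the plural `aws_vpcs` data source, which returns a list of IDs instead of requiring exactly one match. The tag value and output names are illustrative, and the behaviour on zero matches may vary with the AWS provider version:

```hcl
# List every VPC that carries the tag the singular lookup relies on.
data "aws_vpcs" "candidates" {
  tags = {
    Name = "hotstar-prod-vpc"
  }
}

# IDs of all matching VPCs; more than one explains a "multiple matched" error.
output "matching_vpc_ids" {
  value = data.aws_vpcs.candidates.ids
}

# Zero points to a tag mismatch; one means the singular lookup should succeed.
output "matching_vpc_count" {
  value = length(data.aws_vpcs.candidates.ids)
}
```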