Infrastructure as Code

Intermediate · 14 min read · 📅 Updated: 2026-02-17

Introduction

You log in to the AWS console, create an EC2 instance, add a security group, set up RDS: all by hand, click click click. The next day you need the same setup in another region, and it's click click click all over again. 😩

With Infrastructure as Code (IaC), you write a file, run terraform apply, and you're DONE! 10 servers, 5 databases, and all the networking, ready in minutes! 🚀

In this article we'll cover IaC fundamentals, hands-on Terraform, AI infrastructure setup, and best practices in detail! ⚙️

Why Infrastructure as Code?

The problems with manual infrastructure management:

Without IaC 😰:

| Problem | Impact |
|---|---|
| Manual setup | Hours of clicking |
| No version history | "Who changed this?" |
| Inconsistency | Dev ≠ Staging ≠ Prod |
| No rollback | Mistakes are nearly impossible to undo |
| Documentation | Always outdated |
| Scaling | Each server is set up manually |

With IaC 😊:

| Benefit | How |
|---|---|
| **Automation** | One command, full setup |
| **Version control** | Track every change in Git |
| **Consistency** | Same code = same infra everywhere |
| **Rollback** | Revert the commit and re-apply = infra rollback |
| **Documentation** | Code IS the documentation |
| **Scaling** | count = 10 → 10 servers |

Real example: Netflix manages thousands of servers with IaC. Doing that manually is simply not possible! 🎬

IaC Tools Landscape

Let's compare the popular IaC tools:

| Tool | Language | Cloud Support | Learning Curve | Best For |
|---|---|---|---|---|
| **Terraform** | HCL | Multi-cloud | Medium | Industry standard |
| **Pulumi** | Python/TS/Go | Multi-cloud | Easy (if dev) | Developers |
| **CloudFormation** | YAML/JSON | AWS only | Medium | AWS-only shops |
| **Bicep** | Bicep | Azure only | Easy | Azure users |
| **CDK** | TypeScript/Python | AWS | Medium | AWS + developers |
| **Ansible** | YAML | Multi-cloud | Easy | Config management |

Categories:

  • 🏗️ Provisioning: Terraform, Pulumi, CloudFormation — create infrastructure
  • ⚙️ Configuration: Ansible, Chef, Puppet — configure servers
  • 📦 Containers: Docker Compose, Kubernetes YAML — app deployment

My recommendation: learn Terraform first, since it is the most widely used IaC tool. Then try Pulumi or CDK. 🎯

Terraform — Core Concepts

Terraform's building blocks:


1. Providers 🌐 — Cloud platform connection

hcl
provider "aws" {
  region = "ap-south-1"
}

2. Resources 🏗️ — What you want to create

hcl
resource "aws_instance" "ai_server" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "g4dn.xlarge"  # GPU instance
  tags = { Name = "AI-Model-Server" }
}

3. Variables 📝 — Reusable values

hcl
variable "instance_count" {
  default = 3
}
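
To show how a variable actually gets used, here is a minimal sketch that feeds instance_count into a resource's count argument. The resource name worker, the t3.micro type, and the tag format are illustrative assumptions, and the AMI ID is the same placeholder used above.

```hcl
variable "instance_count" {
  description = "Number of servers to create"
  type        = number
  default     = 3
}

resource "aws_instance" "worker" {
  count         = var.instance_count          # one instance per count
  ami           = "ami-0abcdef1234567890"     # placeholder AMI ID
  instance_type = "t3.micro"                  # assumed type for illustration

  tags = { Name = "worker-${count.index}" }   # worker-0, worker-1, ...
}
```

The default can be overridden at run time, for example with terraform apply -var="instance_count=5".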

4. Outputs 📤 — Display created resource info

hcl
output "server_ip" {
  value = aws_instance.ai_server.public_ip
}

5. State 💾 — Terraform tracks what it created

  • The terraform.tfstate file stores the current infra state
  • For teams, use remote state (an S3 bucket, for example)

Workflow: Write → Plan → Apply → Destroy 🔄

Terraform Workflow — Step by Step

Example

Complete Terraform workflow:

bash
# 1. Initialize — download providers
terraform init

# 2. Format — clean code
terraform fmt

# 3. Validate — check syntax
terraform validate

# 4. Plan — preview changes (DRY RUN)
terraform plan
# Output ends with a summary such as: "Plan: 3 to add, 0 to change, 0 to destroy."

# 5. Apply — create infrastructure!
terraform apply
# Type "yes" to confirm

# 6. Show — see current state
terraform show

# 7. Destroy — delete everything
terraform destroy

Critical rule: ALWAYS run terraform plan before apply! Apply only after you have read the plan output. Otherwise you can delete resources by accident! ⚠️

Pro tip: In CI/CD, run terraform plan on the PR and terraform apply on merge. Review + automation! 🛡️

AI Infrastructure with Terraform

Real-world AI project infrastructure setup:


hcl
# main.tf — AI Platform Infrastructure

# VPC for isolation
resource "aws_vpc" "ai_vpc" {
  cidr_block = "10.0.0.0/16"
  tags       = { Name = "AI-Platform-VPC" }
}

# GPU Instance for Model Training
resource "aws_instance" "training_server" {
  count         = var.training_nodes
  ami           = "ami-deep-learning-ubuntu"  # placeholder: use a real Deep Learning AMI ID for your region
  instance_type = "p3.2xlarge"  # V100 GPU
  subnet_id     = aws_subnet.private.id

  root_block_device {
    volume_size = 500  # 500GB for datasets
  }

  tags = { Name = "Training-Node-${count.index}" }
}

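# S3 for Model Storage
resource "aws_s3_bucket" "models" {
  bucket = "ai-models-${var.environment}"
}

# Versioning is its own resource in AWS provider v4 and later
resource "aws_s3_bucket_versioning" "models" {
  bucket = aws_s3_bucket.models.id
  versioning_configuration {
    status = "Enabled"
  }
}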

# RDS for Metadata
resource "aws_db_instance" "metadata" {
  engine            = "postgres"
  instance_class    = "db.r5.large"
  allocated_storage = 100  # GB
  # username, password, and networking arguments omitted for brevity
}

# API Server (CPU) for Inference
resource "aws_ecs_service" "inference_api" {
  name            = "ai-inference"
  desired_count   = var.api_replicas
  task_definition = aws_ecs_task_definition.inference.arn
}

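The complete AI platform is defined in one file! Training servers, model storage, database, API: everything! 🤖 (The snippet is abbreviated: the subnet, the ECS task definition, and the variables it references are assumed to be defined elsewhere.)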
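
The main.tf above reads var.training_nodes, var.environment, and var.api_replicas without declaring them. A possible variables.tf is sketched below; the types follow from how the values are used, while the defaults, descriptions, and the list of allowed environment names are assumptions for illustration.

```hcl
# variables.tf (sketch)
variable "environment" {
  description = "Deployment environment name"
  type        = string

  validation {
    # Assumed set of environments; adjust to your own naming
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "The environment must be dev, staging, or production."
  }
}

variable "training_nodes" {
  description = "Number of GPU training instances"
  type        = number
  default     = 1
}

variable "api_replicas" {
  description = "Number of inference API tasks"
  type        = number
  default     = 2
}
```

With the validation block in place, a typo such as environment = "prd" fails at plan time instead of creating a wrongly named bucket.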

Terraform Modules — Reusable Components

Modules = reusable infrastructure packages. Think of them like functions: write once, use everywhere!


Module structure:

code
modules/
  ai-platform/
    main.tf        ← Resources
    variables.tf   ← Inputs
    outputs.tf     ← Outputs
    README.md      ← Documentation

Using a module:

hcl
module "ai_staging" {
  source          = "./modules/ai-platform"
  environment     = "staging"
  gpu_count       = 1
  api_replicas    = 2
}

module "ai_production" {
  source          = "./modules/ai-platform"
  environment     = "production"
  gpu_count       = 4
  api_replicas    = 10
}

Same module, different configs: 1 GPU for staging, 4 GPUs for production. No duplicated code! 🧩
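
A module can also hand values back to its caller through outputs. The sketch below assumes the ai-platform module contains an aws_s3_bucket resource named models (as in the earlier main.tf); the output names model_bucket and staging_model_bucket are hypothetical.

```hcl
# modules/ai-platform/outputs.tf
output "model_bucket" {
  description = "Name of the S3 bucket that stores model artifacts"
  value       = aws_s3_bucket.models.bucket
}

# Root configuration: read the value from one module instance
output "staging_model_bucket" {
  value = module.ai_staging.model_bucket
}
```

Each module instance exposes its own copy of the output, so module.ai_production.model_bucket would return the production bucket.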


Public modules: the Terraform Registry has 10,000+ community modules. You can use ready-made modules for VPC, EKS, RDS, and more!
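
As an illustration of a registry module, here is a sketch using the community terraform-aws-modules/vpc/aws module. The name, CIDR ranges, and availability zones are assumed values for this article's ap-south-1 setup; check the module's documentation for the inputs supported by the version you pin.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # pin a version so upgrades are deliberate

  name = "ai-platform-vpc"
  cidr = "10.0.0.0/16"

  # Assumed layout: two AZs, one private and one public subnet in each
  azs             = ["ap-south-1a", "ap-south-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true # one shared NAT gateway keeps cost down
}
```

Other resources can then reference the module's outputs, for example module.vpc.private_subnets for the list of private subnet IDs.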

State Management — Critical Topic

⚠️ Warning

Terraform state file = Most important & dangerous file! ⚠️

What is state?

- terraform.tfstate — JSON file tracking your infrastructure

- Terraform uses it to compare "what exists" with "what should exist"

NEVER do these:

❌ Don't commit the state file to Git: it contains secrets!

❌ Don't edit the state file by hand: it can get corrupted!

❌ Don't let two people apply at the same time: state conflict!

❌ Don't delete the state file: Terraform "forgets" everything!

Remote state setup (MUST for teams):

hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "ai-platform/terraform.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-locks"  # Locking!
    encrypt        = true
  }
}

DynamoDB lock: if two people try to apply at the same time, the second run cannot acquire the lock and stops (or waits, if -lock-timeout is set). This prevents state corruption! 🔒
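
The backend block expects the bucket and lock table to exist already. One way to create them is a small, separate bootstrap configuration, sketched below with the same bucket and table names as above; the resource labels tf_state and tf_locks are hypothetical. The S3 backend requires the table's partition key to be a string attribute named LockID.

```hcl
# bootstrap/main.tf (sketch): run once, before configuring the backend
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-terraform-state"
}

# Keep old state versions so a bad write can be recovered
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Lock table used by the S3 backend
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Recent Terraform releases also offer S3-native locking through the backend's use_lockfile argument, which may remove the need for the table; check the S3 backend documentation for your version.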

Multi-Environment Setup

Dev, Staging, Production: how to manage separate environments:


Approach 1: Workspaces (Simple)

bash
terraform workspace new dev
terraform workspace new staging
terraform workspace new prod

terraform workspace select prod
terraform apply
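
With workspaces, the same code runs in every environment, so per-environment sizing has to come from the workspace name. A minimal sketch using the built-in terraform.workspace value; the local names and the GPU counts are assumptions for illustration.

```hcl
locals {
  # Per-workspace sizing; keys match the workspaces created above
  gpu_count_by_workspace = {
    dev     = 1
    staging = 2
    prod    = 8
  }

  # Fall back to 1 GPU for any other workspace (including "default")
  gpu_count = lookup(local.gpu_count_by_workspace, terraform.workspace, 1)
}

output "gpu_count" {
  value = local.gpu_count
}
```

A resource can then use count = local.gpu_count, and switching workspaces changes the size without touching the code.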

Approach 2: Directory Structure (Recommended)

code
environments/
  dev/
    main.tf        → module source = "../../modules/ai-platform"
    terraform.tfvars → gpu_count = 1
  staging/
    main.tf
    terraform.tfvars → gpu_count = 2
  prod/
    main.tf
    terraform.tfvars → gpu_count = 8
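
Each environment directory can stay very small. The sketch below shows what environments/dev/main.tf might contain; the module label ai_platform is hypothetical, and it assumes the module's other inputs have defaults.

```hcl
# environments/dev/main.tf (sketch)
variable "gpu_count" {
  type = number
}

module "ai_platform" {
  source      = "../../modules/ai-platform"
  environment = "dev"
  gpu_count   = var.gpu_count
}

# environments/dev/terraform.tfvars is loaded automatically and holds:
# gpu_count = 1
```

Because every directory has its own backend key and state, a mistake in dev cannot touch prod resources.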

Approach 3: Terragrunt (Advanced)

code
# terragrunt.hcl — DRY configuration
terraform {
  source = "../../modules/ai-platform"
}
inputs = {
  environment = "prod"
  gpu_count   = 8
}

Recommendation: start with the directory structure. Move to Terragrunt as the team grows! 📂

IaC Pipeline Architecture

🏗️ Architecture Diagram
┌──────────────────────────────────────────────────────┐
│           INFRASTRUCTURE AS CODE PIPELINE              │
├──────────────────────────────────────────────────────┤
│                                                        │
│  👨‍💻 Developer                                          │
│    │ git push (*.tf files)                             │
│    ▼                                                   │
│  ┌──────────┐    ┌───────────────┐                    │
│  │  GitHub   │───▶│ GitHub Actions │                   │
│  │   PR      │    │   CI/CD       │                   │
│  └──────────┘    └───────┬───────┘                    │
│                     ┌────▼────┐                        │
│                     │ tf init │                        │
│                     │ tf fmt  │                        │
│                     │ tf plan │ ◀── PR Comment         │
│                     └────┬────┘     (plan output)      │
│                          │                             │
│                   ┌──────▼──────┐                      │
│                   │   Review    │ ◀── Team approves    │
│                   │   & Merge   │                      │
│                   └──────┬──────┘                      │
│                          │                             │
│               ┌──────────▼──────────┐                  │
│               │   terraform apply   │                  │
│               └──────────┬──────────┘                  │
│          ┌───────────────┼───────────────┐             │
│          ▼               ▼               ▼             │
│   ┌────────────┐  ┌───────────┐  ┌────────────┐      │
│   │  Dev Infra │  │  Staging  │  │ Production │      │
│   │  (auto)    │  │  (auto)   │  │ (approval) │      │
│   └────────────┘  └───────────┘  └────────────┘      │
│                                                        │
│   📦 State: S3 + DynamoDB Lock                         │
│   🔒 Secrets: HashiCorp Vault / AWS Secrets Manager    │
│                                                        │
└──────────────────────────────────────────────────────┘

Pulumi — Developer-Friendly Alternative

Terraform requires you to learn HCL. With Pulumi, you can write IaC in your favourite language!


Pulumi with Python 🐍:

python
import pulumi
import pulumi_aws as aws

# Create VPC
vpc = aws.ec2.Vpc("ai-vpc",
    cidr_block="10.0.0.0/16")

# GPU Instance for AI Training
training = aws.ec2.Instance("ai-trainer",
    instance_type="p3.2xlarge",
    ami="ami-deep-learning",  # placeholder AMI ID
    vpc_security_group_ids=[sg.id],  # sg: an aws.ec2.SecurityGroup defined elsewhere
    tags={"Name": "AI-Training-Server"})

# S3 Bucket for Models
models = aws.s3.Bucket("ai-models",
    versioning={"enabled": True})

# Output the IP
pulumi.export("training_ip", training.public_ip)

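Terraform vs Pulumi:

| Aspect | Terraform | Pulumi |
|---|---|---|
| Language | HCL (custom) | Python, TS, Go, C# |
| Loops/Logic | Limited | Full programming power |
| Testing | terraform test (1.6+) or external tools | Native unit tests |
| Community | Massive | Growing |
| State | File/S3 | Pulumi Cloud (free tier) or a self-managed backend |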

When to use Pulumi: if you need complex logic, loops, and conditions, Pulumi is the better fit. For simple infra, Terraform is enough! 🎯

IaC Security Best Practices

💡 Tip

Enforce infrastructure security at the code level:

🔒 1. No hardcoded secrets

hcl
# ❌ BAD
password = "mysecretpassword"

# ✅ GOOD
password = var.db_password  # From env/vault
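
Two common ways to supply var.db_password without hardcoding it are sketched below. The secret name ai-platform/db-password and the local name are hypothetical; the secret is assumed to exist already in AWS Secrets Manager.

```hcl
# Option A: pass the secret in as a sensitive input variable
variable "db_password" {
  description = "Master password for the metadata database"
  type        = string
  sensitive   = true # redacted in plan/apply output, but still stored in state
}

# Option B: read it from AWS Secrets Manager at plan time
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "ai-platform/db-password" # hypothetical secret name
}

locals {
  db_password_from_secrets_manager = data.aws_secretsmanager_secret_version.db.secret_string
}
```

For option A, Terraform reads the value from an environment variable named TF_VAR_db_password, so it never has to appear in a .tf file. Either way the value ends up in the state file, which is one more reason to encrypt remote state.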

🔒 2. Least privilege IAM

hcl
# Give the Terraform service account only the minimum permissions it needs
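
As one concrete piece of least privilege, the sketch below limits a role to the remote state bucket, state key, and lock table used elsewhere in this article. The policy name is hypothetical, and a real Terraform role would also need permissions for the resources it manages.

```hcl
data "aws_iam_policy_document" "terraform_state" {
  statement {
    sid       = "StateBucketList"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::my-terraform-state"]
  }

  statement {
    sid       = "StateObjectReadWrite"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::my-terraform-state/ai-platform/terraform.tfstate"]
  }

  statement {
    sid       = "StateLocking"
    actions   = ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:DeleteItem"]
    resources = ["arn:aws:dynamodb:ap-south-1:*:table/terraform-locks"]
  }
}

resource "aws_iam_policy" "terraform_state" {
  name   = "terraform-state-access" # hypothetical policy name
  policy = data.aws_iam_policy_document.terraform_state.json
}
```

Scoping the S3 statement to a single state key means a pipeline for one project cannot read another project's state.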

🔒 3. Encryption everywhere

hcl
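# Encryption is its own resource in AWS provider v4 and later;
# aws_s3_bucket.models is the bucket defined earlier
resource "aws_s3_bucket_server_side_encryption_configuration" "models" {
  bucket = aws_s3_bucket.models.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}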

🔒 4. Security scanning

- tfsec — Terraform security scanner

- checkov — Policy-as-code scanner

- Add them to your CI pipeline!

🔒 5. State encryption: always encrypt remote state!

Prompt: Design AI Infrastructure

📋 Copy-Paste Prompt
You are a Cloud Infrastructure Architect.

Design Terraform configuration for an AI/ML platform with:
- VPC with public/private subnets across 2 AZs
- GPU instances (p3.2xlarge) for model training with auto-scaling
- ECS Fargate for model inference API
- S3 bucket for model artifacts with versioning
- RDS PostgreSQL for metadata
- CloudWatch monitoring and alerts
- All in ap-south-1 (Mumbai) region

Provide:
1. Complete main.tf, variables.tf, outputs.tf
2. Module structure recommendation
3. Remote state configuration
4. CI/CD pipeline for terraform apply
5. Cost estimation per month

Summary

Key takeaways:


IaC = defining & managing infrastructure in code

Terraform = Industry standard, multi-cloud, HCL language

Modules = Reusable infrastructure components

State = Remote backend + locking MUST for teams

Security = No hardcoded secrets, encryption, scanning

Environments = Directory structure for dev/staging/prod


Action item: install Terraform, create an EC2 instance from a single main.tf in an AWS Free Tier account, then run terraform apply and feel the magic! ✨


Next article: Monitoring AI Apps — observability deep dive! 📊

🏁 🎮 Mini Challenge

Challenge: Create a GPU EC2 Instance using Terraform


Define the infrastructure in code → deploy it with one command! 🏗️


Step 1: Terraform Install 📦

bash
# macOS
brew install terraform

# Verify
terraform version

Step 2: AWS Credentials Setup 🔑

bash
# AWS console → create access key
# ~/.aws/credentials file
[default]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_SECRET

Step 3: Create the Terraform Configuration 📝

hcl
# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# VPC (network)
resource "aws_vpc" "ai_vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags = {
    Name = "ai-vpc"
  }
}

# Subnet
resource "aws_subnet" "ai_subnet" {
  vpc_id                  = aws_vpc.ai_vpc.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true  # give instances a public IP so instance_ip is populated
}

# Security Group
resource "aws_security_group" "ai_sg" {
  vpc_id = aws_vpc.ai_vpc.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["YOUR_IP/32"]  # replace YOUR_IP with your own public IP
  }

  ingress {
    from_port   = 8000
    to_port     = 8000
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
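  }

  # Terraform removes AWS's default allow-all egress rule, so declare outbound access explicitly
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}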

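# EC2 with GPU (g4dn.xlarge)
resource "aws_instance" "ai_gpu" {
  ami                    = "ami-0c55b159cbfafe1f0"  # placeholder: look up the current Deep Learning AMI ID for us-east-1
  instance_type          = "g4dn.xlarge"
  subnet_id              = aws_subnet.ai_subnet.id
  vpc_security_group_ids = [aws_security_group.ai_sg.id]  # inside a VPC, pass security group IDs here
  key_name               = "key"  # assumption: an existing EC2 key pair that matches key.pem in Step 5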

  tags = {
    Name = "ai-gpu-server"
  }
}

output "instance_ip" {
  value = aws_instance.ai_gpu.public_ip
}
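
As written, the VPC has no route to the internet, so the SSH step later in the challenge cannot reach the instance. The sketch below adds the missing pieces to the same main.tf; the resource labels ai_igw and ai_public are hypothetical. The instance also needs a public IP, for example via map_public_ip_on_launch = true on the subnet.

```hcl
# Internet gateway so traffic can enter and leave the VPC
resource "aws_internet_gateway" "ai_igw" {
  vpc_id = aws_vpc.ai_vpc.id
}

# Route table that sends all non-local traffic through the gateway
resource "aws_route_table" "ai_public" {
  vpc_id = aws_vpc.ai_vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.ai_igw.id
  }
}

# Attach the route table to the challenge subnet
resource "aws_route_table_association" "ai_public" {
  subnet_id      = aws_subnet.ai_subnet.id
  route_table_id = aws_route_table.ai_public.id
}
```

Terraform works out the creation order from the references, so these blocks can sit anywhere in the file.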

Step 4: Initialize & Deploy 🚀

bash
# Download the AWS provider
terraform init

# Preview changes
terraform plan

# Deploy!
terraform apply

# View the output
terraform output instance_ip

Step 5: Connect & Run 🔌

bash
# SSH connect
ssh -i key.pem ubuntu@<instance_ip>

# NVIDIA GPU check
nvidia-smi

# AI app run!
python train_model.py

Step 6: Destroy (cleanup) 🧹

bash
# When done, remove resources
terraform destroy

# Cost saved! 💰

Completion Time: 2-3 hours

Skills: AWS, Terraform, Infrastructure as Code

Cost: ~$5-10 for usage (GPU instances such as g4dn.xlarge are not covered by the Free Tier) ⭐

💼 Interview Questions

Q1: Why is the Terraform state file important, and what are the security concerns?

A: The state file is a snapshot of the current infrastructure: which resources exist, their IDs, and their attributes. Terraform reads the state, compares it with the desired configuration, and plans the changes. That is what makes destroys safe and updates idempotent. Security: sensitive data (passwords, keys) is stored in the state file, so encrypt it and keep it out of version control.


Q2: Terraform modules are reusable code. What is the best-practice structure?

A: A module is a directory with main.tf, variables.tf, and outputs.tf. Input variables allow customization, and outputs can be consumed by other modules. Structure: a root module that calls child modules (networking, compute, database). Example: reuse one VPC module across multiple environments.


Q3: How do you manage dev/staging/prod environments in Terraform?

A: Option 1: separate directories (dev/, staging/, prod/), each with its own state. Option 2: workspaces (terraform workspace new prod), which share the same code but keep separate state. Option 2 is simpler but needs care (risk of accidentally deleting prod). Recommendation: separate directories for safety, plus variable files (dev.tfvars, prod.tfvars).


Q4: Terraform and version control: should you commit the state file?

A: No! Add the state file to .gitignore and use a remote backend (AWS S3 + DynamoDB lock). For a team this gives shared state (everyone is up to date), and locking prevents conflicts. Keep state only in the remote backend or a local backup, never in a git push.


Q5: Terraform plan output: what should you watch for when reviewing it?

A: The plan shows the exact changes, so review it carefully! Watch for "forces replacement" (the instance is recreated and data loss is possible). Sensitive outputs are hidden (secrets are not shown). Targets deploy a specific resource (terraform apply -target=aws_instance.ai_gpu). Destroy dry run: terraform plan -destroy (a safe check before destroying).
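
Two of the points in that answer can be enforced in code. The sketch below is illustrative only: the bucket name is hypothetical, and db_password stands in for any secret value.

```hcl
# Guard against accidental destruction or replacement
resource "aws_s3_bucket" "models" {
  bucket = "ai-models-production" # hypothetical bucket name

  lifecycle {
    prevent_destroy = true # any plan that would destroy this bucket fails with an error
  }
}

variable "db_password" {
  type      = string
  sensitive = true
}

# Hide a secret value in CLI output
output "db_password" {
  value     = var.db_password
  sensitive = true # shown as <sensitive> in plan and apply output
}
```

prevent_destroy only blocks plans while the block is present in the configuration; removing the resource block entirely still allows the destroy.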

Frequently Asked Questions

What is IaC, in simple terms?
Infrastructure as Code = defining cloud resources (servers, databases, networks) in code files. Instead of clicking through a console manually, you write code, run it, and the infrastructure is created automatically. Version control, reuse, and automation all become possible.

Terraform vs Pulumi: which is better?
Terraform: industry standard, HCL language, massive community. Pulumi: lets you use real programming languages (Python, TypeScript). Terraform is recommended for beginners. If you are already a Python expert, try Pulumi.

How long does it take to learn IaC?
Terraform basics: 1 week. Real project setup (VPC, EC2, RDS): 2-3 weeks. Production-grade modules: 1-2 months. Start with AWS Free Tier + Terraform tutorials.

Can't you manage the cloud without IaC?
You can, but it's a nightmare! Manual setup gives you no consistency, no documentation, and no practical rollback. Setting up 10 servers by hand vs 1 terraform apply: which is better? IaC is non-negotiable for serious teams.