
Cloud cost optimization

Advanced · 16 min read · 📅 Updated: 2026-02-17

Introduction

You've migrated to the cloud and everything works great! But at month end you look at the bill: $3,000! The expected budget was only $800! 😱


30-35% of cloud spend is wasted according to the Flexera 2025 report. Unused resources, oversized instances, the wrong pricing model: money is draining away!


Cloud cost optimization = eliminating unnecessary spending while maintaining the same performance.


In this article:

  • Understanding cloud billing models
  • Rightsizing: selecting the correct instance size
  • Reserved Instances & Savings Plans
  • Spot Instances for up to 90% savings
  • Storage & network cost optimization
  • FinOps practices
  • Real-world cost reduction strategies

You can reduce your cloud bill by 40-60%! 💪

Cloud Billing Models — How You Pay 💳

Pay-as-you-go = default pricing. Most expensive but most flexible.


AWS Pricing Models:


| Model | Discount | Commitment | Best For |
|-------|----------|------------|----------|
| **On-Demand** | 0% | None | Unpredictable workloads |
| **Reserved (1yr)** | 30-40% | 1 year | Steady-state apps |
| **Reserved (3yr)** | 50-60% | 3 years | Long-term production |
| **Savings Plans** | 30-50% | 1 or 3 years | Flexible commitment |
| **Spot Instances** | 60-90% | None (can be interrupted) | Batch, CI/CD, testing |

GCP Pricing Models:


| Model | Discount | How it works |
|-------|----------|--------------|
| **On-Demand** | 0% | Pay per second |
| **Committed Use (CUD)** | 28-55% | 1 or 3 year commitment |
| **Sustained Use** | Up to 30% | **Automatic!** Run >25% of month |
| **Preemptible/Spot** | 60-91% | Can be stopped anytime |

Azure Pricing Models:


| Model | Discount | Commitment |
|-------|----------|------------|
| **Pay-as-you-go** | 0% | None |
| **Reserved (1yr)** | 30-40% | 1 year |
| **Reserved (3yr)** | 50-60% | 3 years |
| **Spot VMs** | Up to 90% | Can be evicted |
| **Azure Hybrid Benefit** | Up to 85% | Existing Windows/SQL licenses |

💡 GCP advantage: Sustained Use Discounts are automatic — no commitment needed! Run an instance for a full month and you get ~30% off automatically.
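To compare the pricing models above on a single workload, here is a minimal sketch that applies a discount percentage to an On-Demand baseline. The function name, the $700 baseline, and the specific discount values are illustrative assumptions; the discounts are simply picked from inside the ranges in the tables above, not quoted from any provider.

```python
def discounted_monthly_cost(on_demand_cost: float, discount_pct: float) -> float:
    """Monthly cost after applying a pricing-model discount (0-100 percent)."""
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")
    return round(on_demand_cost * (1 - discount_pct / 100), 2)


# Illustrative discounts chosen from within the ranges in the tables above.
EXAMPLE_DISCOUNTS = {
    "On-Demand": 0,
    "Reserved (1yr)": 35,
    "Reserved (3yr)": 55,
    "Spot": 70,
}

baseline = 700.0  # hypothetical On-Demand bill in USD/month
for model, pct in EXAMPLE_DISCOUNTS.items():
    print(f"{model:<15} ${discounted_monthly_cost(baseline, pct):,.2f}")
```

Real discounts vary by instance family, region, and payment option, so treat the output as a rough comparison only.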

Rightsizing — Stop Overpaying for Resources 📏

#1 waste source: Oversized instances! Average CPU utilization in the cloud = 15-20%. You're paying for the 80% you never touch and using only 20%! 😰


Rightsizing Process:


code
1. Monitor actual usage (2-4 weeks minimum)
2. Identify underutilized resources (<40% CPU/memory)
3. Recommend smaller instance size
4. Test with new size
5. Apply change
6. Continue monitoring

AWS Rightsizing Example:

code
Current: m5.2xlarge (8 vCPU, 32 GB RAM) — $280/month
Actual usage: 12% CPU, 8 GB RAM average

Recommendation: m5.large (2 vCPU, 8 GB RAM) — $70/month
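Savings: $210/month (75% reduction!) 🎉
Caveat: 8 GB average use leaves no memory headroom on an 8 GB instance, so check peak memory first (m5.xlarge with 16 GB is the safer step).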
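The process and the example above can be sketched in a few lines. This is a hedged illustration, not a replacement for the provider tools: the `InstanceUsage` class and both function names are hypothetical, and it assumes "underutilized" means both CPU and memory average below the 40% threshold from step 2.

```python
from dataclasses import dataclass


@dataclass
class InstanceUsage:
    name: str
    avg_cpu_pct: float
    avg_mem_pct: float
    monthly_cost: float


def is_underutilized(usage: InstanceUsage, threshold_pct: float = 40.0) -> bool:
    """Flag an instance when both CPU and memory average below the threshold."""
    return usage.avg_cpu_pct < threshold_pct and usage.avg_mem_pct < threshold_pct


def savings(current_cost: float, new_cost: float) -> tuple:
    """Return (monthly savings in USD, percentage reduction)."""
    saved = current_cost - new_cost
    return saved, round(saved / current_cost * 100, 1)


# Figures from the example above: 12% CPU, 8 GB used of 32 GB (25% memory).
web = InstanceUsage("m5.2xlarge", avg_cpu_pct=12.0, avg_mem_pct=25.0, monthly_cost=280.0)
print(is_underutilized(web))  # True
print(savings(280.0, 70.0))   # (210.0, 75.0)
```

Averages hide peaks, so a real check would also look at maximum or p95 utilization before downsizing.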

Tools for Rightsizing:


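| Tool | Provider | Free |
|------|----------|------|
| **AWS Compute Optimizer** | AWS | ✅ |
| **GCP Recommender** | GCP | ✅ |
| **Azure Advisor** | Azure | ✅ |
| **Datadog** | Multi-cloud | Trial |
| **CloudHealth** | Multi-cloud | Paid |
| **Spot.io (NetApp)** | Multi-cloud | Paid |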

bash
# AWS CLI - Get rightsizing recommendations
aws compute-optimizer get-ec2-instance-recommendations \
  --filters name=Finding,values=Overprovisioned \
  --output table

# GCP - Get recommender insights
gcloud recommender recommendations list \
  --project=my-project \
  --recommender=google.compute.instance.MachineTypeRecommender \
  --location=us-central1-a

Rightsizing Pro Tip

💡 Tip

Start with non-production environments! 🧪

Dev/staging instances are typically oversized by 3-4x because devs copy production config.

Quick wins:

- Dev environments: Use t3.small instead of m5.large

- Staging: Use 50% of production sizing

- Schedule dev/staging to shut down nights & weekends (save 65%!)

bash
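# AWS - Stop dev instances at 7 PM on weekdays (create a matching schedule to start them at 8 AM)
# Note: the target also needs a RoleArn that EventBridge Scheduler can assume
aws scheduler create-schedule --name "stop-dev" \
  --schedule-expression "cron(0 19 ? * MON-FRI *)" \
  --flexible-time-window '{"Mode":"OFF"}' \
  --target '{"Arn":"arn:aws:ec2:...:stop-instances"}'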
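Where does the "save 65%" figure come from? A quick sketch of the arithmetic, assuming the instances run only from 8 AM to 7 PM on weekdays and are billed per hour of uptime. The function name is hypothetical.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_savings_pct(start_hour: int, stop_hour: int, days_per_week: int = 5) -> float:
    """Percentage of instance-hours saved versus running 24/7."""
    running_hours = (stop_hour - start_hour) * days_per_week
    return round((1 - running_hours / HOURS_PER_WEEK) * 100, 1)


# 8 AM to 7 PM, Monday to Friday: 11 h x 5 days = 55 of 168 hours running.
print(weekly_savings_pct(start_hour=8, stop_hour=19))  # 67.3
```

Storage attached to stopped instances is still billed, which is one reason the practical figure lands closer to 65%.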

Reserved Instances & Savings Plans 📊

Commit to your predictable workloads and you get big discounts!


When to use Reserved Instances:

  • ✅ Production databases (always running)
  • ✅ Core application servers
  • ✅ Baseline capacity (minimum instances needed)
  • ❌ Variable/seasonal workloads
  • ❌ Short-term projects (<1 year)

Savings Plans vs Reserved Instances:


| Feature | Reserved Instances | Savings Plans |
|---------|--------------------|---------------|
| **Flexibility** | Locked to instance type | Any instance family |
| **Region** | Specific region | Any region (Compute SP) |
| **Discount** | Slightly higher | Slightly lower |
| **Ease** | Complex to manage | Simple commitment |
| **Recommendation** | Large, stable fleets | Most teams ✅ |

Optimal Strategy — Layered Approach:

code
Total Capacity Needed
├── 40% — Reserved/Savings Plans (baseline load)
├── 30% — On-Demand (variable load)
├── 20% — Spot Instances (fault-tolerant tasks)
└── 10% — Buffer for spikes
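As a rough illustration of the layered split above, the sketch below divides a total capacity figure across the four layers. The `LAYERS` mapping mirrors the percentages in the diagram; the function name and the 50-instance example are assumptions for illustration.

```python
LAYERS = {
    "Reserved/Savings Plans": 40,
    "On-Demand": 30,
    "Spot": 20,
    "Buffer": 10,
}


def split_capacity(total_units: int) -> dict:
    """Split total capacity across the layers by percentage share."""
    if sum(LAYERS.values()) != 100:
        raise ValueError("layer percentages must add up to 100")
    return {layer: total_units * pct / 100 for layer, pct in LAYERS.items()}


# Hypothetical fleet of 50 instances.
for layer, units in split_capacity(50).items():
    print(f"{layer:<24} {units:g}")
```

The percentages are a starting point; a steadier workload can push more capacity into the committed layer.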

AWS Savings Plan Example:

code
Without commitment:
  10 × m5.large On-Demand = $700/month

With 1-year Compute Savings Plan ($400/month commitment):
  Same 10 instances = $450/month
  Savings: $250/month = $3,000/year! 💰

With 3-year commitment:
  Same 10 instances = $320/month
  Savings: $380/month = $4,560/year! 🤑

Spot Instances — 90% Savings! ⚡

Spot Instances = the cloud provider's unused capacity. 60-90% cheaper than On-Demand! But they can be terminated with a 2-minute warning.


Perfect for:

  • ✅ CI/CD pipelines (build servers)
  • ✅ Batch processing / data pipelines
  • ✅ Machine learning training
  • ✅ Testing environments
  • ✅ Stateless web servers (behind load balancer)

NOT suitable for:

  • ❌ Databases
  • ❌ Single-instance production
  • ❌ Stateful applications
  • ❌ Anything that can't handle interruption

Spot Strategy — Diversify:

yaml
# AWS Auto Scaling Group - Mixed Instances
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandBaseCapacity: 2          # Minimum 2 On-Demand
    OnDemandPercentageAboveBase: 20  # 20% On-Demand, 80% Spot
    SpotAllocationStrategy: capacity-optimized
  LaunchTemplate:
    Overrides:
      - InstanceType: m5.large
      - InstanceType: m5a.large      # AMD variant (cheaper)
      - InstanceType: m5d.large      # Different variant
      - InstanceType: m4.large       # Previous gen (more available)
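To see what the two On-Demand settings above mean for a given fleet size, here is a minimal sketch. The function name is hypothetical, and rounding fractional On-Demand instances up is an assumption of this sketch; check the Auto Scaling documentation for the exact rounding behaviour.

```python
import math


def split_asg_capacity(desired: int, on_demand_base: int, on_demand_pct_above_base: int) -> tuple:
    """Return (on_demand, spot) instance counts for a desired capacity."""
    base = min(desired, on_demand_base)
    above_base = desired - base
    # Assumption: fractional On-Demand instances are rounded up.
    on_demand_above = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + on_demand_above
    return on_demand, desired - on_demand


# 12 instances with the policy above: 2 base + 20% of the remaining 10.
print(split_asg_capacity(desired=12, on_demand_base=2, on_demand_pct_above_base=20))  # (4, 8)
```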

Spot Instance pricing (example):


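| Instance | On-Demand | Spot Price | Savings |
|----------|-----------|------------|---------|
| m5.large | $0.096/hr | $0.029/hr | **70%** |
| c5.xlarge | $0.170/hr | $0.034/hr | **80%** |
| r5.large | $0.126/hr | $0.025/hr | **80%** |
| g4dn.xlarge (GPU) | $0.526/hr | $0.158/hr | **70%** |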

Use Spot for ML training and the savings are massive! 🚀
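The savings column in the example pricing table can be reproduced with a one-line formula. The sketch below uses the example prices from this article (not live Spot prices, which change constantly); the dictionary and function names are hypothetical.

```python
# instance: (On-Demand USD/hr, Spot USD/hr), example figures from the table above
SPOT_PRICES = {
    "m5.large": (0.096, 0.029),
    "c5.xlarge": (0.170, 0.034),
    "r5.large": (0.126, 0.025),
    "g4dn.xlarge": (0.526, 0.158),
}


def spot_savings_pct(on_demand_hr: float, spot_hr: float) -> int:
    """Savings of Spot versus On-Demand, rounded to a whole percent."""
    return round((1 - spot_hr / on_demand_hr) * 100)


for name, (on_demand, spot) in SPOT_PRICES.items():
    print(f"{name}: {spot_savings_pct(on_demand, spot)}%")
```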

Storage Cost Optimization 💾

Storage costs slowly build up — often second largest cloud expense!


S3 Storage Classes (AWS):


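| Class | Cost (per GB/month) | Access | Best For |
|-------|---------------------|--------|----------|
| **S3 Standard** | $0.023 | Frequent | Active data |
| **S3 IA** | $0.0125 | Infrequent | Backups (30+ days) |
| **S3 Glacier Instant** | $0.004 | Rare | Archives (instant access) |
| **S3 Glacier Deep** | $0.00099 | Very rare | Compliance archives |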

10 TB storage cost comparison:


| Class | Monthly Cost |
|-------|--------------|
| S3 Standard | **$230** |
| S3 IA | **$125** |
| S3 Glacier Instant | **$40** |
| S3 Glacier Deep | **$10** |
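The comparison above is just size × price per GB. A small sketch, using the per-GB prices from the storage class table and treating 10 TB as 10,000 GB (an assumption that matches the rounded figures); names are hypothetical. Retrieval and request fees are deliberately left out.

```python
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 IA": 0.0125,
    "S3 Glacier Instant": 0.004,
    "S3 Glacier Deep": 0.00099,
}


def monthly_storage_cost(size_gb: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD (no retrieval or request fees)."""
    return round(size_gb * PRICE_PER_GB_MONTH[storage_class], 2)


for storage_class in PRICE_PER_GB_MONTH:
    print(f"{storage_class:<20} ${monthly_storage_cost(10_000, storage_class):,.2f}")
```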

S3 Lifecycle Policy — Automate tiering:

json
{
  "Rules": [
    {
      "ID": "OptimizeCosts",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER_IR"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}
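To make the effect of the lifecycle policy concrete, this sketch maps an object's age to the storage class the rule above would place it in. It only mirrors the day thresholds in the JSON; the function name and the "EXPIRED" label are hypothetical, and real S3 transitions run asynchronously, so actual timing can lag by a day or more.

```python
def storage_class_for_age(age_days: int) -> str:
    """Storage class an object should be in after age_days under the policy above."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= 2555:      # Expiration: roughly 7 years
        return "EXPIRED"
    if age_days >= 365:
        return "DEEP_ARCHIVE"
    if age_days >= 90:
        return "GLACIER_IR"
    if age_days >= 30:
        return "STANDARD_IA"
    return "STANDARD"


for age in (10, 45, 120, 400, 3000):
    print(age, storage_class_for_age(age))
```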

EBS Volume Optimization:

bash
# Find unattached EBS volumes (wasted money!)
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID:VolumeId,Size:Size,Type:VolumeType}' \
  --output table

# Delete unattached volumes
# (verify first: some may be detached on purpose; take a snapshot if unsure)
# aws ec2 delete-volume --volume-id vol-xxx

Quick wins:

  • 🗑️ Delete unattached EBS volumes
  • 📦 Enable S3 Intelligent-Tiering (auto-moves data)
  • 🗜️ Compress data before storing
  • 🔄 Set lifecycle policies on ALL buckets
  • 📸 Delete old snapshots (>90 days)

Network & Data Transfer Costs 🌐

Hidden cost killer = data transfer! Ingress free, but egress is expensive.


AWS Data Transfer Pricing:


| Transfer Type | Cost |
|---------------|------|
| **Internet → AWS** | FREE |
| **AWS → Internet** | $0.09/GB |
| **Cross-region** | $0.02/GB |
| **Cross-AZ** | $0.01/GB |
| **Same AZ** | FREE |
| **NAT Gateway processing** | $0.045/GB |

NAT Gateway is expensive! 🚨

code
100 GB/day through NAT Gateway:
Processing: 100 × $0.045 = $4.50/day
Hourly charge: 24 × $0.045 = $1.08/day
Monthly: ~$167/month for ONE NAT Gateway!
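The NAT Gateway calculation above generalizes to any traffic volume. A minimal sketch, assuming the same two rates used in the calculation and a 30-day month; the constant and function names are hypothetical.

```python
NAT_PROCESSING_PER_GB = 0.045  # USD per GB processed, from the pricing table above
NAT_HOURLY_CHARGE = 0.045      # USD per hour, as used in the calculation above


def nat_gateway_monthly_cost(gb_per_day: float, days: int = 30) -> float:
    """Monthly cost of one NAT Gateway: data processing plus the hourly charge."""
    daily = gb_per_day * NAT_PROCESSING_PER_GB + 24 * NAT_HOURLY_CHARGE
    return round(daily * days, 2)


print(nat_gateway_monthly_cost(100))  # 167.4
print(nat_gateway_monthly_cost(0))    # 32.4, the idle cost of just keeping it running
```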

Optimization strategies:


| Strategy | Savings |
|----------|---------|
| **CloudFront CDN** | 40-60% on data transfer |
| **VPC Endpoints** | Eliminate NAT for AWS services |
| **Same-AZ placement** | Eliminate cross-AZ costs |
| **Compress responses** | Reduce transfer volume |
| **Cache at edge** | Fewer origin requests |

bash
# Create VPC Endpoint for S3 (free! replaces NAT Gateway for S3 traffic)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-xxx

Pro tip: S3 traffic through NAT Gateway = double charge (NAT processing + data transfer). Use a VPC Endpoint and both charges are eliminated! 💡

FinOps — Cloud Financial Management 📈

FinOps = building a culture of accountability and optimization around cloud cost.


FinOps Lifecycle:

code
Inform ─────────→ Optimize ────────────→ Operate → (repeat)
  │                 │                      │
  ├─ Visibility     ├─ Rightsizing         ├─ Governance
  ├─ Allocation     ├─ Rate optimization   ├─ Automation
  ├─ Benchmarks     ├─ Usage reduction     ├─ Continuous improvement
  └─ Forecasting    └─ Waste elimination   └─ Team accountability

Cost Tagging Strategy (Critical!):

json
{
  "Tags": {
    "Environment": "production",
    "Team": "ai-platform",
    "Project": "recommendation-engine",
    "CostCenter": "CC-1234",
    "Owner": "rathish@company.com"
  }
}

Without tags = impossible to know which team/project is spending how much!
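A tag audit can start as a simple set difference. The sketch below checks a resource's tags against the five keys from the tagging strategy above; the function name and the sample resource are hypothetical, and treating empty values as missing is an assumption of this sketch.

```python
REQUIRED_TAGS = {"Environment", "Team", "Project", "CostCenter", "Owner"}


def missing_tags(resource_tags: dict) -> set:
    """Return the required tag keys that are absent or empty on a resource."""
    present = {key for key, value in resource_tags.items() if value.strip()}
    return REQUIRED_TAGS - present


# Hypothetical resource that was only partially tagged.
partially_tagged = {"Environment": "production", "Team": "ai-platform", "Owner": ""}
print(sorted(missing_tags(partially_tagged)))  # ['CostCenter', 'Owner', 'Project']
```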


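Enforce tagging with AWS SCP (one statement per required tag, because keys inside a single condition are ANDed and would only deny when both tags are missing):

json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRunInstancesWithoutEnvironmentTag",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "Null": {"aws:RequestTag/Environment": "true"}
      }
    },
    {
      "Sid": "DenyRunInstancesWithoutTeamTag",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "Null": {"aws:RequestTag/Team": "true"}
      }
    }
  ]
}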

Budget Alerts:

bash
# AWS Budget Alert
aws budgets create-budget --account-id 123456789012 \
  --budget '{
    "BudgetName": "MonthlyBudget",
    "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80
    },
    "Subscribers": [{
      "SubscriptionType": "EMAIL",
      "Address": "team@company.com"
    }]
  }]'
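The alert above fires when actual spend goes past 80% of the budget. Here is a sketch of that comparison, extended to the 80/100/120% tiers listed in the monitoring architecture later in this article. The function name is hypothetical, and the strict "greater than" check mirrors the `GREATER_THAN` operator.

```python
def crossed_thresholds(actual_spend: float, budget_limit: float, thresholds=(80, 100, 120)) -> list:
    """Return the percentage thresholds that actual spend has exceeded."""
    pct_used = actual_spend / budget_limit * 100
    return [t for t in thresholds if pct_used > t]


# Against the $1,000 monthly budget defined above.
print(crossed_thresholds(850, 1000))   # [80]
print(crossed_thresholds(1250, 1000))  # [80, 100, 120]
```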

Kubernetes Cost Optimization ⎈

Top reasons for wasted cost on K8s:


1. Over-provisioned resource requests:

yaml
# ❌ Bad - requesting way more than needed
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "4"
    memory: "8Gi"
# Actual usage: 200m CPU, 512Mi RAM 😱

# ✅ Good - based on actual metrics
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"
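To put a number on the over-provisioning above, you can compare actual usage with the request. A minimal sketch that handles only the two CPU formats used in this example (whole cores like "2" and millicores like "250m"); the function names are hypothetical.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "2" or "250m" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def request_utilization_pct(requested: str, used: str) -> float:
    """Share of the CPU request that is actually used, in percent."""
    return round(parse_cpu(used) / parse_cpu(requested) * 100, 1)


print(request_utilization_pct("2", "200m"))     # 10.0, the "bad" request above
print(request_utilization_pct("250m", "200m"))  # 80.0, the "good" request above
```

The scheduler reserves node capacity based on requests, so a 10% figure means roughly 90% of the reserved CPU sits idle.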

2. Cluster Autoscaler not configured:

yaml
# Enable Cluster Autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          args:
            - --scale-down-enabled=true
            - --scale-down-delay-after-add=10m
            - --scale-down-unneeded-time=10m
            - --skip-nodes-with-system-pods=false

K8s Cost Tools:


| Tool | What it does | Free |
|------|--------------|------|
| **Kubecost** | Cost allocation per namespace/pod | ✅ (basic) |
| **OpenCost** | CNCF cost monitoring | ✅ |
| **Goldilocks** | Resource request recommendations | ✅ |
| **Karpenter** | Smart node provisioning | ✅ |

Karpenter vs Cluster Autoscaler:

  • Cluster Autoscaler: Node group based, slower scaling
  • Karpenter: Pod-aware, picks optimal instance type, much faster

yaml
# Karpenter NodePool - auto-selects cheapest instance
apiVersion: karpenter.sh/v1beta1
kind: NodePool
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5a.large", "m5d.large", "m4.large"]
      kubelet:
        maxPods: 110
  disruption:
    consolidationPolicy: WhenUnderutilized

Cost Monitoring Architecture

🏗️ Architecture Diagram
**Cloud Cost Monitoring & Optimization Architecture:**

```
Cloud Providers (AWS / GCP / Azure)
        │
        ▼
Cost Data Collection
├── AWS Cost Explorer API
├── GCP Billing Export (BigQuery)
├── Azure Cost Management API
        │
        ▼
Central Cost Platform
├── Data aggregation & normalization
├── Tag-based allocation
├── Anomaly detection 🚨
├── Forecast & budgeting
        │
        ▼
Dashboards & Alerts
├── Team-level cost dashboards
├── Budget threshold alerts (80%, 100%, 120%)
├── Daily cost anomaly notifications
├── Monthly cost review reports
        │
        ▼
Optimization Engine
├── Rightsizing recommendations
├── Reserved/Savings Plan analysis
├── Unused resource detection
├── Spot Instance opportunities
        │
        ▼
Governance & Automation
├── Auto-stop idle resources
├── Tag compliance enforcement
├── Auto-scaling policies
└── Cost approval workflows
```

**Tools for this architecture:**

| Component | Options |
|-----------|---------|
| **Collection** | AWS CUR, GCP Billing Export, Azure Exports |
| **Platform** | CloudHealth, Spot.io, Apptio, Kubecost |
| **Dashboard** | Grafana, custom (Metabase + BigQuery) |
| **Alerts** | PagerDuty, Slack, OpsGenie |
| **Automation** | Lambda functions, Cloud Functions |

Real-World Cost Reduction Case Study

Example

Scenario: AI SaaS Startup — $8,000/month → $3,200/month 🎯

Initial State ($8,000/month):

- 20 EC2 instances (all m5.xlarge On-Demand)

- 5 TB S3 Standard storage

- RDS db.r5.2xlarge (Multi-AZ)

- NAT Gateway processing 500 GB/month

- No tags, no monitoring, no optimization

Optimization Steps:

| Action | Monthly Savings |
|--------|----------------|
| Rightsizing 20 instances → 8 m5.large + 5 t3.medium | $1,800 |
| Savings Plan (1-year) for 8 baseline instances | $840 |
| Spot Instances for CI/CD and batch jobs | $600 |
| S3 Lifecycle: moved 4 TB to IA/Glacier | $380 |
| RDS rightsizing: r5.2xlarge → r5.large | $520 |
| VPC Endpoints: eliminated NAT for S3/DynamoDB | $180 |
| Dev/staging shutdown nights & weekends | $480 |
| TOTAL SAVINGS | $4,800/month |

Result: 60% cost reduction! Same performance, happier CFO 😄💰
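A quick way to sanity-check a savings table like the one above is to add it up. The sketch below uses the figures from this case study; the variable names are hypothetical.

```python
STARTING_BILL = 8000  # USD/month, from the case study above

MONTHLY_SAVINGS = {
    "Rightsizing EC2": 1800,
    "Savings Plan (1-year)": 840,
    "Spot for CI/CD and batch": 600,
    "S3 lifecycle": 380,
    "RDS rightsizing": 520,
    "VPC Endpoints": 180,
    "Dev/staging shutdown": 480,
}

total_savings = sum(MONTHLY_SAVINGS.values())
new_bill = STARTING_BILL - total_savings
reduction_pct = total_savings / STARTING_BILL * 100

print(f"Total savings: ${total_savings:,}/month")  # Total savings: $4,800/month
print(f"New bill: ${new_bill:,}/month")            # New bill: $3,200/month
print(f"Reduction: {reduction_pct:.0f}%")          # Reduction: 60%
```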

Common Cost Traps to Avoid

⚠️ Warning

Watch out for these hidden costs! 🚨

1. NAT Gateway: silently charges $0.045/GB. Use VPC Endpoints!

2. Elastic IPs: since February 2024, every public IPv4 address costs $0.005/hr (about $3.65/month), attached or not. Release the ones you don't need!

3. EBS Snapshots: old snapshots accumulate. Set retention policies!

4. CloudWatch Logs: ingestion $0.50/GB + storage $0.03/GB/month. Set log retention!

5. Idle Load Balancers: $16/month minimum even with zero traffic

6. Cross-region replication: data transfer charges both ways

7. Lambda provisioned concurrency: you pay even when it's not invoked!

Monthly cleanup checklist:

- ☐ Delete unattached EBS volumes

- ☐ Release unused Elastic IPs

- ☐ Remove old EBS snapshots (>90 days)

- ☐ Check for idle RDS instances

- ☐ Review Lambda functions with provisioned concurrency

- ☐ Audit NAT Gateway data processing

Summary

What we learned about cloud cost optimization:


Billing Models: On-Demand, Reserved, Savings Plans, Spot. Use the right mix

Rightsizing: Average CPU utilization is 15-20%. Downsize and save 50%+

Reserved/Savings Plans: 30-60% savings for predictable workloads

Spot Instances: 60-90% savings for fault-tolerant workloads

Storage: Lifecycle policies and tiering (S3 Standard → IA → Glacier)

Network: VPC Endpoints, CDN, same-AZ placement

Kubernetes: Optimize resource requests, Karpenter, Kubecost

FinOps: Tags, budgets, accountability, continuous optimization

Monitoring: Cost anomaly detection, budget alerts, dashboards


Key takeaway: Cloud cost optimization is not a one-time activity; it's a continuous practice. Review monthly, automate what you can, and make every team accountable for their spend. 40-60% savings is realistic! 💰🚀


With this, you've completed the Cloud & DevOps series! From infrastructure to optimization, everything is covered! 🎓🎉

🏁 🎮 Mini Challenge

Challenge: Analyze & Optimize Cloud Billing


Real cost analysis: find and eliminate hidden charges! 💰


Step 1: Billing Dashboard Access 📊

bash
# AWS Console → Cost Management → Cost Explorer
# GCP Console → Billing → Reports
# Azure Portal → Cost Management + Billing

# Analyze the last 3 months of spend
# Trends: increasing, stable, or decreasing?

Step 2: Cost Attribution 🏷️

bash
# Tag all resources:
# - Environment: dev, staging, prod
# - Project: ai-model, chatbot, etc.
# - Team: backend, data, ops
# - Cost center: engineering, research

# AWS CLI
aws ec2 create-tags --resources i-1234567890abcdef0   --tags Key=Environment,Value=prod Key=Project,Value=ai-model

Step 3: Breakdown Analysis 📈

bash
# Services breakdown:
# Identify the top 5 costs
# Example:
# 1. Compute (EC2): 40%
# 2. Storage (S3): 30%
# 3. Data transfer: 15%
# 4. Database (RDS): 10%
# 5. Networking: 5%

# Question: is each service necessary?
# Can it be optimized?

Step 4: Right-Sizing Analysis 🔍

bash
# CloudWatch metrics: CPU and memory usage over the last 30 days
# Example findings:
# - t3.large instance: avg 5% CPU (over-provisioned!)
# - Solution: downsize to t3.small (60% cost saving)
# - Cost: ₹5000/month → ₹2000/month (₹3000/month saved)

# Report: Right-sizing recommendations

Step 5: Reservation Purchase 📋

bash
# Current on-demand: ₹50000/month
# 1-year reservation: 30% discount = ₹35000/month
# Savings/year: ₹180000!

# AWS Reserved Instance calculator
# Check if production workloads are stable (good for RI)

Step 6: Spot Instance Strategy 🎰

bash
# Non-critical batch jobs: on-demand → spot
# Cost: ₹100/day → ₹30/day (70% saved!)
# Risk: interruption (needs checkpointing)

# Example: data processing batch
# Interruption: restart from checkpoint (ok)
# Savings: ₹2100/month

Step 7: Unused Resources Cleanup 🧹

bash
# Find & remove:
# - Unattached EBS volumes
# - Unattached Elastic IPs
# - Old snapshots (beyond retention)
# - Non-running instances

# AWS CLI commands:
aws ec2 describe-volumes --filters "Name=status,Values=available" --query 'Volumes[*].VolumeId'

# Cost: ₹500/month wasted (easy win!)

Step 8: Optimization Summary 📊

bash
# Create spreadsheet:
# Optimization | Current Cost | Optimized | Monthly Save
# Right-size EC2 | 5000 | 2000 | 3000
# Reserve instances | 50000 | 35000 | 15000
# Spot for batch | 3000 | 900 | 2100
# Remove unused | 500 | 0 | 500

# Total potential savings: ₹20,600/month (41% of the ₹50000/month bill)
# Priority: start with the easy recommendations

**Completion Time**: 2-3 hours
**Tools**: AWS Cost Explorer, CloudWatch, Spreadsheet
**Real cost savings** ⭐⭐⭐

💼 Interview Questions

Q1: The cloud bill is unexpectedly high. What are your troubleshooting steps?

A: (1) Check the service breakdown (top 5 services). (2) Analyze the timeline: was there a sudden spike, and when? (3) Review recent changes: were new resources deployed? (4) Find unattached resources (volumes, IPs). (5) Check data transfer, which is often expensive. (6) Check all regions for forgotten resources. (7) Check whether Reserved Instances expired and workloads fell back to On-Demand.


Q2: Data transfer costs are high. How do you minimize them?

A: Follow the data gravity principle: process data where it lives. S3 → EC2 in the same region is free; S3 → internet is expensive. Solutions: (1) CloudFront CDN (cache at the edge). (2) VPC endpoints (avoid NAT Gateway processing charges for AWS service traffic). (3) Keep resources in the same region. (4) Batch downloads (consolidate calls). For monitoring, track data transfer as a separate line item.


Q3: Reserved Instances vs Savings Plans: what is the difference?

A: Reserved Instances (RI): tied to a specific instance type and region, for 1 or 3 years. Savings Plans: flexible across compute (EC2, Fargate, Lambda) and across regions. RIs offer the deeper discount (up to 72%); Savings Plans offer flexibility, which is easier for variable workloads. Choose RIs for predictable workloads and Savings Plans for variable ones.


Q4: How can auto-scaling lead to an unexpected bill?

A: An ASG's max capacity can cause runaway costs. Solutions: (1) Set a realistic max limit (not unlimited). (2) Use an appropriate scale-down cooldown (avoid flapping). (3) Set billing alerts (daily budget threshold). (4) Use scheduled scaling (reduce off-peak capacity). (5) Monitor scale events (debug unnecessary scaling).


Q5: How do you make teams accountable for cost? Which sharing model?

A: Tag all resources (team, project, cost center) and generate reports per tag. Enforce team budgets (quota limits). Chargeback model: usage-based billing, where the team pays. Showback model: visibility without charging, to educate. Hold monthly reviews of trends, anomalies, and optimization opportunities. Personal accountability builds cost consciousness.

Frequently Asked Questions

What should you do if the cloud bill suddenly increases?
First, set up billing alerts (AWS Budgets, GCP Budget Alerts). Then use Cost Explorer to identify the top spending services. Common culprits: forgotten EC2 instances, unattached EBS volumes, NAT Gateway data transfer, idle load balancers.
Reserved Instances vs Savings Plans — which is better?
Savings Plans are more flexible — any instance family, any region. Reserved Instances give slightly more discount but locked to specific instance type and region. For most teams, Savings Plans are better. Start with 1-year no-upfront commitment.
Are Spot Instances reliable?
Spot Instances can be terminated with a 2-minute warning. They are perfect for stateless workloads (batch processing, CI/CD, data analysis). Don't use them on their own for production web servers. Use a Spot + On-Demand mix for reliability.
How long can you sustain on the free tier alone?
The AWS free tier lasts 12 months (t2.micro, 750 hrs/month). GCP has an always-free tier (e2-micro, 1 instance). For learning and small projects, the free tier is sufficient. Plan ahead for production apps: $50-200/month is typical for a small SaaS.
What is a FinOps team?
FinOps = Financial Operations. Engineering, Finance, and Business teams collaborate to optimize cloud spending. A FinOps team monitors cloud costs, makes optimization recommendations, and enforces accountability. Essential for large companies.
🧠Knowledge Check

You need to identify the biggest source of waste in your company's AWS bill. Which metric should you look at FIRST?
