Session 016 — ECS: Capacity Providers and Application Auto Scaling
Estimated duration: 60 minutes
Prerequisites: session-015-ecs-fargate-networking-iam
Objective
By the end, you will be able to configure a Capacity Provider for Fargate and Fargate Spot with weights, scale an ECS Service based on custom metrics (not just CPU/memory), and calculate the cost savings of Fargate Spot vs standard Fargate in a workload scenario with interruption tolerance.
Context
[FACT] Capacity Providers are the ECS abstraction layer that separates the definition of where to run tasks (which capacity pool) from the definition of what to run (task definition). Before Capacity Providers, you specified launchType: FARGATE or launchType: EC2 directly on the service — a binary and static choice. With Capacity Providers, you define a strategy with multiple providers and weights, and ECS distributes tasks among them automatically.
[FACT] Application Auto Scaling is the AWS service that manages the scaling of ECS Services (among other resources like DynamoDB, Aurora, etc.). It is separate from EC2 Auto Scaling: while EC2 Auto Scaling manages instances, Application Auto Scaling manages the desiredCount of an ECS Service. The two can coexist in an EC2-based architecture (Application Auto Scaling increases tasks, EC2 Auto Scaling via Capacity Provider increases instances to accommodate them).
[CONSENSUS] The Fargate + Fargate Spot pair is the most commonly used combination for web services that tolerate some interruption. The typical production strategy maintains a base of standard Fargate tasks (availability guarantee) and scales predominantly on Fargate Spot (cost savings), using weights to control the proportion.
Key concepts
1. Capacity Providers: base and weight
[FACT] A service's Capacity Provider strategy has two parameters per provider:
- base: the absolute minimum number of tasks that must run on this provider. Only one provider per strategy can have base > 0. It is satisfied before any weight calculation.
- weight: the relative weight. After the base is satisfied, additional tasks are distributed proportionally to the weights.
Example strategy:
FARGATE: base=2, weight=1
FARGATE_SPOT: base=0, weight=4
desiredCount=2:
→ FARGATE's base is satisfied with 2 tasks
→ no tasks left to distribute by weight
Result: 2 tasks on FARGATE, 0 on FARGATE_SPOT
desiredCount=7:
→ base: 2 tasks on FARGATE (base=2)
→ 5 tasks left to distribute by weight 1:4
→ 1 task on FARGATE (1/5 × 5 = 1)
→ 4 tasks on FARGATE_SPOT (4/5 × 5 = 4)
Result: 3 tasks on FARGATE, 4 tasks on FARGATE_SPOT
desiredCount=10:
→ base: 2 tasks on FARGATE
→ 8 tasks left by weight 1:4
→ 1.6 → rounds to 2 on FARGATE
→ 6.4 → rounds to 6 on FARGATE_SPOT (adjusted so the total stays 8)
Result: ~4 tasks on FARGATE, ~6 tasks on FARGATE_SPOT
[FACT] The base ensures that even when the desiredCount is low (e.g., 1 task at midnight), you always have at least N tasks on stable capacity. This is critical for services that cannot tolerate zero availability during a Fargate Spot interruption event.
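To make the base-then-weight arithmetic concrete, here is a minimal TypeScript sketch of the distribution logic. It is an approximation for intuition only; the real ECS scheduler's rounding and tie-breaking are internal details and may differ at the margins.

// Approximate sketch of how ECS splits desiredCount across a strategy.
// For intuition only; the real scheduler's rounding may differ slightly.
interface ProviderStrategy {
  name: string;
  base: number;
  weight: number;
}

function distribute(desired: number, strategy: ProviderStrategy[]): Record<string, number> {
  const placement: Record<string, number> = {};
  let remaining = desired;
  // 1. Satisfy base first (only one provider may have base > 0)
  for (const p of strategy) {
    placement[p.name] = Math.min(p.base, remaining);
    remaining -= placement[p.name];
  }
  // 2. Split the remainder proportionally to the weights
  const totalWeight = strategy.reduce((sum, p) => sum + p.weight, 0);
  let assigned = 0;
  strategy.forEach((p, i) => {
    const share = i === strategy.length - 1
      ? remaining - assigned // last provider absorbs rounding drift
      : Math.round((remaining * p.weight) / totalWeight);
    placement[p.name] += share;
    assigned += share;
  });
  return placement;
}

// distribute(7, [
//   { name: 'FARGATE', base: 2, weight: 1 },
//   { name: 'FARGATE_SPOT', base: 0, weight: 4 },
// ]) → { FARGATE: 3, FARGATE_SPOT: 4 }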
2. Fargate Spot: pricing, interruption, and graceful shutdown
[FACT] Fargate Spot runs tasks on surplus Fargate capacity at a reduced price compared to standard Fargate. The historically observed discount is approximately 70% off the Fargate on-demand price (the exact discount varies by region and over time; AWS does not commit to a fixed percentage, so check the current pricing page rather than assuming one).
[FACT] When AWS needs to reclaim Fargate Spot capacity, it issues a 2-minute interruption warning before terminating the task. The warning arrives through two simultaneous channels:
- EventBridge: an ECS Task State Change event with stopCode: SpotInterruption is published.
- SIGTERM on the container: the main process receives SIGTERM, just as in a normal shutdown.
t=0: AWS decides to reclaim the capacity
t=0: EventBridge publishes the SpotInterruption event
t=0: the container receives SIGTERM
t=120: the container receives SIGKILL (if it is still running)
The task is terminated
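If you want to observe interruptions in practice (alerting, dashboards), you can match this event with an EventBridge rule. A minimal CDK sketch, assuming it lives inside a stack with an existing cluster and a hypothetical SNS topic alertTopic:

import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';

// Matches ECS Task State Change events caused by Spot interruptions
const rule = new events.Rule(this, 'SpotInterruptionRule', {
  eventPattern: {
    source: ['aws.ecs'],
    detailType: ['ECS Task State Change'],
    detail: {
      stopCode: ['SpotInterruption'],
      clusterArn: [cluster.clusterArn], // scope to your cluster
    },
  },
});
rule.addTarget(new targets.SnsTopic(alertTopic)); // alertTopic: sns.ITopic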
[FACT] The stopTimeout parameter in the container definition controls how long ECS waits between SIGTERM and SIGKILL. The default is 30 seconds. For Fargate Spot, you can configure up to 120 seconds (the 2-minute warning limit). Setting stopTimeout: 120 gives the container maximum time to:
- Finish in-progress processing.
- Checkpoint state.
- Return SQS messages to the queue (nack/visibility timeout).
- Deregister from the service registry.
Ideal workloads for Fargate Spot:
✅ Suitable for Spot:
- SQS queue processing (messages return to the queue on SIGTERM)
- Batch jobs with checkpointing (restart from the checkpoint)
- Rendering, transcoding (the job is reprocessed)
- Development/staging environments
- Stateless workers with guaranteed idempotency
❌ Unsuitable for Spot (use standard Fargate):
- Stateful databases
- Transactional processing without idempotency
- WebSockets with critical connection state
- Tasks that do not handle SIGTERM
3. Application Auto Scaling for ECS
[FACT] Application Auto Scaling manages the desiredCount of an ECS Service. It requires three components:
- Scalable Target: registers the service as a scaling target.
- Scaling Policy: defines how to scale (target tracking or step scaling).
- CloudWatch Alarm: (automatic for target tracking, manual for step scaling).
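In CDK, these three components collapse into two calls; target tracking creates the CloudWatch alarms for you. A minimal sketch (the full, annotated version appears in the practical example below):

// 1. Scalable Target: registers desiredCount as scalable between bounds
const scaling = service.autoScaleTaskCount({ minCapacity: 2, maxCapacity: 20 });
// 2. Scaling Policy (target tracking); 3. CloudWatch alarms are created automatically
scaling.scaleOnCpuUtilization('CpuTracking', { targetUtilizationPercent: 60 });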
Target Tracking vs Step Scaling:
Target Tracking:
→ You define a TARGET VALUE for a metric
→ Auto Scaling computes how many tasks are needed to hold that value
→ Scales out when the metric > target, in when the metric < target
→ Simpler; recommended for most cases
→ Default scale-in cooldown: 300 seconds
Step Scaling:
→ You define THRESHOLDS and how many tasks to add/remove at each threshold
→ More control, more complex to configure correctly
→ Recommended when you need a fast reaction to spikes
→ Can stack multiple steps (e.g., +2 tasks when CPU > 60%,
+5 tasks when CPU > 80%)
[FACT] Predefined metrics available for target tracking in ECS:
| Metric | Description | Typical target value |
|---|---|---|
| ECSServiceAverageCPUUtilization | Average CPU utilization across tasks | 50-70% |
| ECSServiceAverageMemoryUtilization | Average memory utilization across tasks | 60-80% |
| ALBRequestCountPerTarget | Requests per task routed via the ALB | Depends on app capacity |
4. Scaling by custom metrics: the SQS case
[FACT] The most robust pattern for queue processing services is to scale based on backlog per task — not the absolute number of messages in the queue, but the ratio between pending messages and processing tasks. This avoids the problem of under-scaling when the desiredCount is high and over-scaling when it is low.
Backlog-per-task formula:
backlog_per_task = ApproximateNumberOfMessagesVisible / desiredCount
Example:
ApproximateNumberOfMessagesVisible = 1000 messages in the queue
desiredCount = 5 tasks
backlog_per_task = 200 messages/task
If the target is 100 messages/task:
→ to process 1000 messages at 100 per task
→ we need 1000/100 = 10 tasks
→ Auto Scaling scales from 5 to 10
[FACT] To implement this formula in Application Auto Scaling, you use Metric Math in the target tracking configuration. Note that RunningTaskCount comes from Container Insights, which must be enabled on the cluster:
{
"CustomizedMetricSpecification": {
"Metrics": [
{
"Id": "messages",
"MetricStat": {
"Metric": {
"Namespace": "AWS/SQS",
"MetricName": "ApproximateNumberOfMessagesVisible",
"Dimensions": [{ "Name": "QueueName", "Value": "my-queue" }]
},
"Stat": "Sum"
},
"ReturnData": false
},
{
"Id": "tasks",
"MetricStat": {
"Metric": {
"Namespace": "ECS/ContainerInsights",
"MetricName": "RunningTaskCount",
"Dimensions": [
{ "Name": "ClusterName", "Value": "my-cluster" },
{ "Name": "ServiceName", "Value": "my-worker" }
]
},
"Stat": "Average"
},
"ReturnData": false
},
{
"Id": "backlog",
"Expression": "messages / tasks",
"ReturnData": true
}
]
},
"TargetValue": 100.0
}
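This JSON is the TargetTrackingScalingPolicyConfiguration body you pass to Application Auto Scaling's PutScalingPolicy call (policy type TargetTrackingScaling) after registering the service as a scalable target. Only the expression with ReturnData: true feeds the target tracking comparison; the two inputs are intermediate values.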
Practical example
Scenario: An image processing service (image-processor) that:
- Consumes from an SQS queue.
- Tolerates interruptions (idempotent jobs with SQS visibility timeout as a retry mechanism).
- Must maintain at least 2 on-demand tasks to guarantee minimum processing.
- Must scale predominantly on Fargate Spot for cost savings.
- Must scale based on backlog per task (target: 50 messages/task).
Complete CDK (TypeScript)
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import * as appscaling from 'aws-cdk-lib/aws-applicationautoscaling';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';
export class ImageProcessorStack extends Stack {
constructor(scope: Construct, id: string, props?: StackProps) {
super(scope, id, props);
const vpc = ec2.Vpc.fromLookup(this, 'Vpc', { vpcName: 'prod-vpc' });
// Inbound SQS queue
const queue = new sqs.Queue(this, 'ImageQueue', {
queueName: 'image-processing-queue',
visibilityTimeout: Duration.seconds(300), // 5 min to process each job
});
// Cluster with Fargate and Fargate Spot Capacity Providers
const cluster = new ecs.Cluster(this, 'Cluster', {
vpc,
enableFargateCapacityProviders: true, // enables FARGATE and FARGATE_SPOT
});
// Task Definition
const taskDef = new ecs.FargateTaskDefinition(this, 'TaskDef', {
cpu: 1024,
memoryLimitMiB: 2048,
});
taskDef.addContainer('processor', {
image: ecs.ContainerImage.fromRegistry('my-org/image-processor:latest'),
environment: { QUEUE_URL: queue.queueUrl },
logging: ecs.LogDrivers.awsLogs({ streamPrefix: 'processor' }),
// Maximum time for graceful shutdown on Spot
stopTimeout: Duration.seconds(120),
});
// Permission to consume from the queue
queue.grantConsumeMessages(taskDef.taskRole);
// ECS Service with a Capacity Provider strategy
const service = new ecs.FargateService(this, 'Service', {
cluster,
taskDefinition: taskDef,
desiredCount: 2,
// Strategy: 2 fixed tasks on Fargate, scale on Fargate Spot
capacityProviderStrategies: [
{
capacityProvider: 'FARGATE',
base: 2, // always 2 on-demand tasks
weight: 1, // 1 of every 5 additional tasks on-demand
},
{
capacityProvider: 'FARGATE_SPOT',
base: 0,
weight: 4, // 4 of every 5 additional tasks on Spot (80%)
},
],
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
});
// ─── Application Auto Scaling ──────────────────────────────────────────
const scaling = service.autoScaleTaskCount({
minCapacity: 2,
maxCapacity: 20,
});
// Queue depth metric
const messagesVisible = new cloudwatch.Metric({
namespace: 'AWS/SQS',
metricName: 'ApproximateNumberOfMessagesVisible',
dimensionsMap: { QueueName: queue.queueName },
statistic: 'Sum',
period: Duration.minutes(1),
});
// Step scaling on raw queue depth (simpler to express in CDK; the
// backlog-per-task metric math version appears in Key concepts above)
scaling.scaleOnMetric('ScaleOnQueueDepth', {
metric: messagesVisible,
scalingSteps: [
{ upper: 0, change: -1 }, // empty queue: remove 1 task
{ lower: 100, change: +1 }, // > 100 messages: add 1 task
{ lower: 500, change: +3 }, // > 500 messages: add 3 tasks
{ lower: 1000, change: +5 }, // > 1000 messages: add 5 tasks
],
adjustmentType: appscaling.AdjustmentType.CHANGE_IN_CAPACITY,
cooldown: Duration.seconds(120),
});
// Conservative scale-in: wait 5 minutes before removing tasks
scaling.scaleOnCpuUtilization('ScaleOnCpu', {
targetUtilizationPercent: 60,
scaleInCooldown: Duration.seconds(300),
scaleOutCooldown: Duration.seconds(60),
});
}
}
Calculating cost savings: Fargate vs Fargate Spot
Scenario: a service with an average desiredCount of 10 tasks (1 vCPU, 2 GB each)
Fargate pricing (us-east-1, approximate 2025 reference):
CPU: $0.04048 per vCPU-hour
Memory: $0.004445 per GB-hour
Cost per task-hour (standard Fargate):
= (1 vCPU × $0.04048) + (2 GB × $0.004445)
= $0.04048 + $0.00889
= $0.04937/hour per task
Cost per task-hour (Fargate Spot, ~70% discount):
= $0.04937 × 0.30
= ~$0.01481/hour per task
Monthly cost with the mixed strategy (base=2 Fargate, 8 Spot):
Standard Fargate: 2 tasks × $0.04937 × 720h = $71.09
Fargate Spot: 8 tasks × $0.01481 × 720h = $85.31
Total: $156.40/month
Monthly cost with 100% standard Fargate:
10 tasks × $0.04937 × 720h = $355.46/month
Savings with the mixed strategy: ~56% (~$199/month)
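The same arithmetic as a small TypeScript helper, if you want to plug in your own task sizes and counts. The hardcoded prices are the approximate us-east-1 references above; treat them as assumptions, not current quotes:

// Rough monthly cost comparison; prices are approximate us-east-1 references
const VCPU_HOUR = 0.04048;   // standard Fargate, per vCPU-hour
const GB_HOUR = 0.004445;    // standard Fargate, per GB-hour
const SPOT_FACTOR = 0.30;    // assumed ~70% Spot discount
const HOURS_PER_MONTH = 720;

function taskHourCost(vcpu: number, gb: number): number {
  return vcpu * VCPU_HOUR + gb * GB_HOUR;
}

const onDemand = taskHourCost(1, 2);  // ≈ $0.04937/hour
const spot = onDemand * SPOT_FACTOR;  // ≈ $0.01481/hour

const mixed = (2 * onDemand + 8 * spot) * HOURS_PER_MONTH; // ≈ $156.40
const allOnDemand = 10 * onDemand * HOURS_PER_MONTH;       // ≈ $355.46
console.log(`savings: ${((1 - mixed / allOnDemand) * 100).toFixed(0)}%`); // ≈ 56%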
[UNCERTAIN] Exact Fargate Spot prices vary by region and fluctuate with supply and demand. The 70% discount reflects common historical observations and may differ by the time you read this. Check the current price at aws.amazon.com/fargate/pricing.
Common pitfalls
Pitfall 1: base=0 on both providers — tasks always Spot during reduced scaling
The mistake: You configure the strategy FARGATE: base=0, weight=1 and FARGATE_SPOT: base=0, weight=4. With desiredCount=1 (e.g., service under low load at night), the single task goes to Fargate Spot (higher weight). A Spot interruption at night takes down the only task of the service. The ALB starts returning 503. ECS starts a new task on Spot, but while the new task comes up (~30-60s), the service is unavailable.
Why it happens: With base=0, ECS distributes all tasks by weight, including the first one. Without any task guaranteed on standard Fargate, a Spot interruption results in downtime during the replacement time.
How to avoid: Always set base >= 1 (or base >= 2 for zero-downtime during replacement) on the most stable provider (standard FARGATE). The base is the minimum availability insurance.
Pitfall 2: Short scale-in cooldown causing thrashing
The mistake: You configure target tracking with scaleInCooldown: Duration.seconds(60). An SQS queue has burst processing: fills up quickly, is processed in 2 minutes, empties. The AS scales out to 10 tasks, the queue empties, the AS scales in to 2 tasks in 60 seconds. Thirty seconds later, a new message arrives, the AS scales out again. This cycle creates thrashing — tasks being created and destroyed every few minutes, which:
- Increases cost (minimum 1-minute charge per Fargate task).
- Degrades performance (task cold start time).
- Creates excessive noise in logs and metrics.
Why it happens: Fast scale-in without understanding the message arrival pattern. For queues with periodic bursts, the cooldown should be longer than the period between bursts.
How to avoid: For SQS queues, use scaleInCooldown: Duration.seconds(300) (5 minutes) or longer. Scale out can be fast (30-60s), but scale in should be conservative. Analyze the historical traffic pattern before configuring cooldowns.
Pitfall 3: Not handling SIGTERM in Fargate Spot containers
The mistake: The worker container on Fargate Spot uses a Python process without signal handling. When the Spot interruption occurs, the process receives SIGTERM but does not handle it — it keeps running. After the stopTimeout (default: 30 seconds), ECS sends SIGKILL. The SQS message that was being processed does not return to the queue (the worker did not nack or reset the visibility timeout). The message goes to the DLQ after exceeding the maxReceiveCount, or gets "lost" if processing was halfway through.
Why it happens: Most runtimes ignore SIGTERM by default. Python, Node.js, and Java need explicit handlers for SIGTERM.
How to avoid: Implement a SIGTERM handler that:
1. Stops accepting new messages from the queue.
2. Waits for current processing to finish (or cancels and nacks).
3. Exits with exit(0).
# Python — example SIGTERM handler for an SQS worker
import os
import signal
import sys

import boto3

QUEUE_URL = os.environ["QUEUE_URL"]
sqs = boto3.client("sqs")

shutdown_requested = False

def handle_sigterm(signum, frame):
    global shutdown_requested
    print("SIGTERM received, starting graceful shutdown")
    shutdown_requested = True

signal.signal(signal.SIGTERM, handle_sigterm)

def process(msg):
    ...  # application-specific job logic goes here

while not shutdown_requested:
    # Long polling keeps the loop responsive without hammering the API
    messages = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=5
    )
    if 'Messages' in messages:
        msg = messages['Messages'][0]
        try:
            process(msg)
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])
        except Exception as e:
            # Do not delete: the message returns to the queue after the
            # visibility timeout expires
            print(f"Error: {e}")

print("Shutdown complete")
sys.exit(0)
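One related detail: ECS delivers SIGTERM to the container's main process (PID 1). If your Dockerfile uses the shell form of CMD or ENTRYPOINT, the signal goes to the shell, which by default does not forward it to your worker. Use the exec form (e.g., CMD ["python", "worker.py"]) so the handler above actually runs.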
Reflection exercise
You are sizing a video transcoding service on ECS Fargate that processes jobs from an SQS queue. Each job takes an average of 8 minutes to complete. The job volume has a clear daily pattern: peak of 500 jobs/hour from 2 PM to 6 PM and low of 10 jobs/hour from midnight to 6 AM.
1. How would you configure the Capacity Provider strategy (base and weight between Fargate and Fargate Spot)?
2. What would be an appropriate stopTimeout for the container?
3. Which scaling metric is most suitable (CPU, queue message count, or backlog per task), and what would the target value be?
4. What are the risks of using Fargate Spot for this type of workload, and how does the worker design (SIGTERM handling, idempotency, checkpointing) mitigate them?
5. Finally, how would you calculate the estimated monthly cost of the mixed strategy versus 100% standard Fargate?
Resources for further study
1. Amazon ECS clusters for Fargate (Capacity Providers)
URL: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/fargate-capacity-providers.html
What to find: Complete documentation of FARGATE and FARGATE_SPOT Capacity Providers: how to configure the strategy, base and weight values, the Spot interruption mechanism, and how to handle SIGTERM.
Why it's the right source: It is the primary reference for Fargate Capacity Providers — includes the interruption protocol details that any Spot operator needs to understand.
2. Automatically scale your Amazon ECS service
URL: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-auto-scaling.html
What to find: Complete guide to Application Auto Scaling for ECS: target tracking, step scaling, predefined and custom metrics, and cooldown configuration. Includes AWS CLI examples.
Why it's the right source: It is the official guide that integrates ECS with Application Auto Scaling, covering both concepts and practical configuration examples.
3. Optimizing Amazon ECS service auto scaling
URL: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/capacity-autoscaling-best-practice.html
What to find: Best practices guide specific to ECS scaling: how to choose the right metric by workload type, cooldown pitfalls, and the backlog-per-task pattern for SQS queues.
Why it's the right source: It is the prescriptive guide — tells you what to do beyond how to do it, with justifications for each recommendation.