luizmachado.dev


Session 044 — Secrets Manager: Automatic Rotation, Lambda Rotators and RDS Integration

Prerequisite: none (self-contained session)


Session Objectives

  • Understand the version lifecycle and staging labels (AWSPENDING → AWSCURRENT → AWSPREVIOUS)
  • Implement the four rotation function steps (createSecret, setSecret, testSecret, finishSecret)
  • Distinguish Single User vs Alternating Users and when each strategy is appropriate
  • Configure native rotation for RDS PostgreSQL with AWS-managed Lambda
  • Build a custom rotator for an external service
  • Write applications resilient to rotation (cache + retry without downtime)

1. Secrets Manager Fundamentals

1.1 Versions and Staging Labels

[FACT] Each secret in Secrets Manager can have multiple simultaneous versions. Each version receives one or more staging labels that identify its role in the rotation cycle.

Staging label states during rotation:

  ┌──────────────────────────────────────────────────────────────┐
  │ Before rotation                                              │
  │   version A: [AWSCURRENT]                                    │
  │   version B: [AWSPREVIOUS]  (from the previous rotation)     │
  └──────────────────────────────────────────────────────────────┘

  ┌──────────────────────────────────────────────────────────────┐
  │ During rotation (between createSecret and finishSecret)      │
  │   version A: [AWSCURRENT]                                    │
  │   version B: [AWSPREVIOUS]                                   │
  │   version C: [AWSPENDING]   ← new password; DB updated       │
  └──────────────────────────────────────────────────────────────┘

  ┌──────────────────────────────────────────────────────────────┐
  │ After finishSecret (rotation complete)                       │
  │   version A: [AWSPREVIOUS]  ← was AWSCURRENT                 │
  │   version B: no label       ← deprecated; deleted later      │
  │   version C: [AWSCURRENT]   ← new version promoted           │
  └──────────────────────────────────────────────────────────────┘

[FACT] AWSPENDING must never be removed manually before finishSecret. If AWSPENDING exists without being on the same versionId as AWSCURRENT, any new rotation attempt will assume a previous rotation is in progress and return an error.
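The label movement described above can be modeled with a small in-memory sketch. This is not the Secrets Manager API: `promote_pending` and the version IDs "A", "B", "C" are hypothetical names, and the dictionary only mimics the shape of the `VersionIdsToStages` field returned by `describe_secret`.

```python
def promote_pending(stages: dict[str, list[str]], pending_id: str) -> dict[str, list[str]]:
    """Return the label layout after a successful finishSecret (simulation only)."""
    if "AWSPENDING" not in stages.get(pending_id, []):
        raise ValueError(f"{pending_id} does not carry AWSPENDING")

    result: dict[str, list[str]] = {}
    for version_id, labels in stages.items():
        if version_id == pending_id:
            result[version_id] = ["AWSCURRENT"]   # pending version is promoted
        elif "AWSCURRENT" in labels:
            result[version_id] = ["AWSPREVIOUS"]  # old current becomes previous
        else:
            # The old AWSPREVIOUS version loses its label and becomes deprecated.
            result[version_id] = [label for label in labels if label != "AWSPREVIOUS"]
    return result


before = {"A": ["AWSCURRENT"], "B": ["AWSPREVIOUS"], "C": ["AWSPENDING"]}
after = promote_pending(before, "C")
print(after)  # {'A': ['AWSPREVIOUS'], 'B': [], 'C': ['AWSCURRENT']}
```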

1.2 Encryption

[FACT] All secrets are encrypted at rest. By default, Secrets Manager uses the AWS managed key aws/secretsmanager (no additional KMS cost, but no control over the key policy). For granular control, use a customer managed key (CMK).

[FACT] If a secret uses a custom CMK, the rotation Lambda needs kms:Decrypt and kms:GenerateDataKey permissions on the CMK. Use kms:EncryptionContext:SecretARN to limit the Lambda's access to only the secret it rotates.
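As a rough illustration, the KMS permissions for the rotation Lambda could be expressed as an IAM policy statement like the one below. The helper name `kms_statement_for_secret` and both ARNs are placeholders invented for this sketch; only the two actions and the `kms:EncryptionContext:SecretARN` condition key come from the text above.

```python
import json


def kms_statement_for_secret(key_arn: str, secret_arn: str) -> dict:
    """Build an IAM statement that allows KMS use only for one secret."""
    return {
        "Effect": "Allow",
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": key_arn,
        "Condition": {
            # Secrets Manager passes the secret ARN as KMS encryption context.
            "StringEquals": {"kms:EncryptionContext:SecretARN": secret_arn}
        },
    }


# Placeholder ARNs, for illustration only.
statement = kms_statement_for_secret(
    key_arn="arn:aws:kms:us-east-1:123456789012:key/abc-123",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:rds/postgres/app-user-AbCdEf",
)
print(json.dumps(statement, indent=2))
```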

1.3 Pricing

[FACT] $0.40/secret/month + $0.05 per 10,000 API calls. The 30-day free trial starts when the account stores its first secret; it does not restart for each new secret.
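A quick back-of-the-envelope calculation with the two rates quoted above shows why API call volume matters. The function name and the example workloads are assumptions made for this sketch; actual bills depend on region and current pricing.

```python
def monthly_cost_usd(secrets: int, api_calls: int) -> float:
    """Estimate a monthly bill from the rates quoted in this section."""
    storage = secrets * 0.40                 # $0.40 per secret per month
    requests = (api_calls / 10_000) * 0.05   # $0.05 per 10,000 API calls
    return round(storage + requests, 2)


# 20 secrets read one million times in a month.
small = monthly_cost_usd(secrets=20, api_calls=1_000_000)

# One secret fetched on every request at 10,000 req/s for 30 days (no cache).
uncached = monthly_cost_usd(secrets=1, api_calls=10_000 * 86_400 * 30)

print(small, uncached)
```

At these rates the first case comes to about 13 USD, while the uncached case reaches six figures, which is one reason section 7 caches the secret locally.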


2. The Four Rotation Function Steps

[FACT] Every rotation function — native or custom — must implement four methods invoked sequentially by Secrets Manager:

Sequential rotation flow:

  Secrets Manager
       │
       ├──1──► createSecret  → generates a new password
       │                     → put_secret_value with the AWSPENDING label
       │                     → idempotent: checks whether AWSPENDING already exists
       │
       ├──2──► setSecret     → applies the new password to the target database/service
       │                     → connects using the AWSCURRENT credentials
       │                     → changes the user's password to the value in AWSPENDING
       │
       ├──3──► testSecret    → tests a connection using AWSPENDING
       │                     → reads something from the database to confirm it works
       │                     → on failure → rotation stops; AWSPENDING remains
       │
       └──4──► finishSecret  → update_secret_version_stage:
                               moves AWSCURRENT → old version becomes AWSPREVIOUS
                               AWSPENDING version → becomes the new AWSCURRENT
                               AWSPENDING is removed atomically

2.1 Idempotency and ClientRequestToken

[FACT] Secrets Manager passes a ClientRequestToken (UUID) as VersionId for each rotation call. The createSecret must check if a version with that token already exists before generating a new password — this ensures idempotency if the Lambda is called again after a partial failure.
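The idempotency check can be illustrated without AWS at all. The sketch below uses a tiny in-memory stand-in (`FakeSecretStore`, a hypothetical class, not part of boto3) to show that calling the create step twice with the same token yields one pending version and one password.

```python
import secrets
import uuid


class FakeSecretStore:
    """In-memory stand-in for a secret's versions (not the real service)."""

    def __init__(self, current_password: str):
        self.versions = {
            "v-initial": {"password": current_password, "stages": {"AWSCURRENT"}}
        }

    def has_pending(self, token: str) -> bool:
        version = self.versions.get(token)
        return version is not None and "AWSPENDING" in version["stages"]

    def put_pending(self, token: str, password: str) -> None:
        self.versions[token] = {"password": password, "stages": {"AWSPENDING"}}


def create_secret_step(store: FakeSecretStore, token: str) -> str:
    """Generate a password only if this token has no pending version yet."""
    if store.has_pending(token):
        return store.versions[token]["password"]  # retry: reuse, do not regenerate
    password = secrets.token_urlsafe(24)
    store.put_pending(token, password)
    return password


store = FakeSecretStore("old-password")
token = str(uuid.uuid4())                    # plays the role of ClientRequestToken
first = create_secret_step(store, token)
second = create_secret_step(store, token)    # simulated re-invocation after a failure
print(first == second, len(store.versions))  # True 2
```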


3. Rotation Strategies

3.1 Single User (recommended for most cases)

[FACT] Updates credentials of a single user in a single secret. The sequence is: generate new password → apply to the database → update secret.

Timeline (Single User):

  t0  createSecret:  AWSPENDING created with the new password
  t1  setSecret:     the database changes the user's password
  t2  [risk window]: the database already has the new password, AWSCURRENT still holds the old one
                         → new connections using AWSCURRENT fail
  t3  testSecret:    tests AWSPENDING → success
  t4  finishSecret:  AWSPENDING promoted to AWSCURRENT
  t5  [back to normal]: all new connections use the new password

Risk window duration (t1→t4): milliseconds to seconds
Mitigation: exponential backoff + retry in the application
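The mitigation named above, exponential backoff with retry, might look like the following generic helper. It is a sketch under assumptions: `retry_with_backoff`, `AuthError`, and `flaky_connect` are illustrative names, and the injected `sleep` parameter exists only so the example can run without actually waiting.

```python
import random
import time
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T], retryable: Type[Exception],
                       max_attempts: int = 4, base_delay: float = 0.1,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(), retrying on `retryable` with exponential backoff + jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise                            # out of attempts: surface the error
            delay = base_delay * (2 ** attempt)  # 0.1, 0.2, 0.4, ...
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


class AuthError(Exception):
    """Stand-in for a database authentication failure."""


calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:  # fails twice, as if inside the risk window
        raise AuthError("password authentication failed")
    return "connected"


delays: list = []
result = retry_with_backoff(flaky_connect, AuthError, sleep=delays.append)
print(result, len(delays))  # connected 2
```

In a real application the operation would refresh the secret before reconnecting, as the decorator in section 7 does.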

[FACT] Single User is recommended for: general cases, ad hoc/interactive users, and when the database does not support cloning users. RDS Proxy supports Single User.

3.2 Alternating Users (high availability)

[FACT] Maintains two users (e.g., myuser and myuser_clone) and alternates which one has the updated password. Rotation works as follows:

First rotation:
  AWSPENDING = {username: "myuser_clone", password: generated}
  Lambda clones "myuser" → creates "myuser_clone" with the same permissions
  AWSCURRENT now points to myuser_clone

Second rotation:
  AWSPENDING = {username: "myuser", password: new_password}
  Lambda updates the password of "myuser" (no cloning needed)
  AWSCURRENT now points to myuser

Third rotation: same as the first (alternates back to myuser_clone, which already exists, so only its password changes)
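The alternation itself is just a toggle on the username. A minimal sketch, assuming the `_clone` suffix used in the example names above (the helper `next_username` is hypothetical):

```python
CLONE_SUFFIX = "_clone"


def next_username(current_username: str) -> str:
    """Return the user that the next rotation will write into AWSPENDING."""
    if current_username.endswith(CLONE_SUFFIX):
        return current_username[: -len(CLONE_SUFFIX)]  # back to the original user
    return current_username + CLONE_SUFFIX             # switch to the clone


user = "myuser"
sequence = []
for _ in range(3):  # three consecutive rotations
    user = next_username(user)
    sequence.append(user)

print(sequence)  # ['myuser_clone', 'myuser', 'myuser_clone']
```

Because the user that is being updated is never the one stored in AWSCURRENT, existing credentials keep working while the other user's password changes.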

[FACT] Alternating Users requires a second secret holding superuser credentials (a user with permission to create/clone users in the database). The Lambda uses the superuser to clone the user and modify permissions.

[FACT] Amazon RDS Proxy does NOT support Alternating Users — use Single User when using RDS Proxy.

[FACT] If the original user's permissions are changed after the clone is created, the cloned user is not updated automatically — manual update is required.


4. Network Access — Critical Requirement

[FACT] The rotation Lambda needs to reach two endpoints simultaneously:

Rotation Lambda (in the VPC)
    │
    ├──► Secrets Manager endpoint
    │    Option A: VPC Endpoint (Interface) for secretsmanager (recommended)
    │    Option B: NAT Gateway + Internet Gateway
    │
    └──► RDS PostgreSQL endpoint (port 5432)
         Security Group: allow Lambda SG → RDS SG on port 5432

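[FACT] If the Lambda cannot reach the Secrets Manager endpoint, rotation fails: the function's API calls hang until a connection timeout or the Lambda timeout is reached, and the error appears in the function's CloudWatch logs. The most common cause of rotation failure is a network/VPC issue.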
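When debugging reachability, a plain TCP probe run from inside the Lambda (or from a host in the same subnet) can narrow things down. The helper `can_reach` below is a hypothetical diagnostic, not part of any AWS SDK; the demo probes a local listener so that it runs anywhere.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


# Demo against a local listener instead of real AWS endpoints.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
open_port = listener.getsockname()[1]

reachable = can_reach("127.0.0.1", open_port)
listener.close()
unreachable = can_reach("127.0.0.1", open_port, timeout=0.5)

print(reachable, unreachable)
```

In a rotation Lambda, the two probes of interest would be the regional Secrets Manager endpoint on port 443 and the database host on port 5432.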


5. CDK Python — Native RDS PostgreSQL Rotation

from aws_cdk import (
    Stack, Duration, RemovalPolicy,
    aws_ec2 as ec2,
    aws_rds as rds,
    aws_secretsmanager as sm,
    aws_iam as iam,
    aws_lambda as _lambda,
    aws_kms as kms,
)
from constructs import Construct

class SecretsManagerRDSStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc(self, "VPC", max_azs=2, nat_gateways=1)

        # KMS CMK used to encrypt the secrets
        secret_key = kms.Key(self, "SecretKey",
            description="CMK for Secrets Manager - RDS credentials",
            enable_key_rotation=True,  # rotates the CMK's own key material yearly
        )

        # ──────────────────────────────────────────────────────────────
        # RDS superuser (admin) secret, required by the
        # Alternating Users strategy
        # ──────────────────────────────────────────────────────────────
        superuser_secret = sm.Secret(self, "RDSSuperuserSecret",
            secret_name="rds/postgres/superuser",
            description="RDS PostgreSQL superuser credentials",
            encryption_key=secret_key,
            generate_secret_string=sm.SecretStringGenerator(
                secret_string_template='{"username": "postgres"}',
                generate_string_key="password",
                password_length=32,
                exclude_characters="/@\"\\' ",
                exclude_punctuation=False,
            ),
        )

        # ──────────────────────────────────────────────────────────────
        # RDS PostgreSQL instance whose credentials are rotated
        # ──────────────────────────────────────────────────────────────
        db_sg = ec2.SecurityGroup(self, "DBSG", vpc=vpc)

        db_instance = rds.DatabaseInstance(self, "PostgresDB",
            engine=rds.DatabaseInstanceEngine.postgres(
                version=rds.PostgresEngineVersion.VER_16
            ),
            instance_type=ec2.InstanceType.of(
                ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM
            ),
            vpc=vpc,
            vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            security_groups=[db_sg],
            credentials=rds.Credentials.from_secret(superuser_secret),
            removal_policy=RemovalPolicy.DESTROY,
            deletion_protection=False,
        )

        # ──────────────────────────────────────────────────────────────
        # Application secret (user "app_user"), rotated
        # with the Alternating Users strategy
        # ──────────────────────────────────────────────────────────────
        app_secret = sm.Secret(self, "AppUserSecret",
            secret_name="rds/postgres/app-user",
            description="Application user credentials, rotated automatically",
            encryption_key=secret_key,
            generate_secret_string=sm.SecretStringGenerator(
                secret_string_template=(
                    '{"username": "app_user",'
                    f'"host": "{db_instance.db_instance_endpoint_address}",'
                    f'"port": "{db_instance.db_instance_endpoint_port}",'
                    '"dbname": "appdb","engine": "postgres"}'
                ),
                generate_string_key="password",
                password_length=32,
                exclude_characters="/@\"\\' ;",
            ),
        )

        # ──────────────────────────────────────────────────────────────
        # Automatic rotation: Alternating Users strategy
        # HostedRotationLambda = AWS-managed Lambda (native rotation)
        # ──────────────────────────────────────────────────────────────
        app_secret.add_rotation_schedule("RotationSchedule",
            hosted_rotation=sm.HostedRotation.postgre_sql_multi_user(
                function_name="SecretsManagerRDSPostgreSQLRotation",
                master_secret=superuser_secret,  # superuser used to clone the app user
                vpc=vpc,
                vpc_subnets=ec2.SubnetSelection(
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS
                ),
                security_groups=[ec2.SecurityGroup(self, "RotationLambdaSG", vpc=vpc)],
                exclude_characters="/@\"\\' ;",
            ),
            automatically_after=Duration.days(30),  # rotate every 30 days
        )

        # Allow the rotation Lambdas to connect to RDS.
        # This rule admits any source inside the VPC CIDR; in production,
        # prefer referencing the rotation Lambda security groups instead.
        db_sg.add_ingress_rule(
            peer=ec2.Peer.ipv4(vpc.vpc_cidr_block),
            connection=ec2.Port.tcp(5432),
            description="Allow the rotation Lambda to reach PostgreSQL",
        )

        # VPC Endpoint for Secrets Manager: the Lambda does not need NAT to reach it
        vpc.add_interface_endpoint("SecretsManagerEndpoint",
            service=ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
        )

        # ──────────────────────────────────────────────────────────────
        # Single User rotation example (simpler)
        # For cases where Alternating Users is not needed
        # ──────────────────────────────────────────────────────────────
        simple_secret = sm.Secret(self, "SimpleSecret",
            secret_name="rds/postgres/simple-user",
            encryption_key=secret_key,
            generate_secret_string=sm.SecretStringGenerator(
                secret_string_template=(
                    '{"username": "simple_user",'
                    f'"host": "{db_instance.db_instance_endpoint_address}",'
                    # Closing brace kept out of the f-string (a lone "}" there is a
                    # syntax error); "engine" is required by the hosted rotation function.
                    '"engine": "postgres"}'
                ),
                generate_string_key="password",
                password_length=32,
                exclude_characters="/@\"\\' ",
            ),
        )

        simple_secret.add_rotation_schedule("SimpleRotationSchedule",
            hosted_rotation=sm.HostedRotation.postgre_sql_single_user(
                function_name="SecretsManagerRDSPostgreSQLSingleUserRotation",
                vpc=vpc,
                vpc_subnets=ec2.SubnetSelection(
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS
                ),
            ),
            automatically_after=Duration.days(30),
        )

6. Python — Custom Rotation Function (External Service)

"""
Custom rotation function template for an external service
(e.g., a third-party API key, an OAuth token, a legacy system password).

Deployed as a CDK Lambda Function and set as the secret's rotation function.
"""
import json
import logging
import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

secretsmanager = boto3.client("secretsmanager")


def handler(event: dict, context) -> None:
    """
    Entry point: Secrets Manager invokes this function with an event
    containing the secret ARN, the token, and the step.
    """
    arn   = event["SecretId"]
    token = event["ClientRequestToken"]
    step  = event["Step"]

    # Check that the secret has a version for this token
    metadata = secretsmanager.describe_secret(SecretId=arn)
    versions = metadata.get("VersionIdsToStages", {})

    if token not in versions:
        raise ValueError(f"Secret version {token} has no stage for secret {arn}")

    if "AWSCURRENT" in versions[token]:
        # Rotation already completed for this token (idempotent)
        logger.info("Version %s already set as AWSCURRENT, nothing to do.", token)
        return

    if "AWSPENDING" not in versions[token]:
        raise ValueError(f"Secret version {token} not set as AWSPENDING for {arn}")

    dispatch = {
        "createSecret": create_secret,
        "setSecret":    set_secret,
        "testSecret":   test_secret,
        "finishSecret": finish_secret,
    }
    if step not in dispatch:
        raise ValueError(f"Invalid step parameter: {step}")

    dispatch[step](secretsmanager, arn, token)


def create_secret(client, arn: str, token: str) -> None:
    """
    Step 1: Generates a new credential and stores it as AWSPENDING.
    Idempotent: if AWSPENDING already exists for this token, does nothing.
    """
    # Try to fetch the existing AWSPENDING version (for idempotency)
    try:
        client.get_secret_value(SecretId=arn, VersionId=token, VersionStage="AWSPENDING")
        logger.info("createSecret: AWSPENDING version %s already exists. Idempotent skip.", token)
        return
    except client.exceptions.ResourceNotFoundException:
        pass  # AWSPENDING has no value yet, so proceed

    # Fetch the current secret to reuse its structure
    current = json.loads(
        client.get_secret_value(SecretId=arn, VersionStage="AWSCURRENT")["SecretString"]
    )

    # Generate the new credential (e.g., a new API key)
    new_credential = _generate_new_credential(current)

    # Store it as AWSPENDING
    client.put_secret_value(
        SecretId=arn,
        ClientRequestToken=token,
        SecretString=json.dumps(new_credential),
        VersionStages=["AWSPENDING"],
    )
    logger.info("createSecret: new version stored as AWSPENDING for %s.", arn)


def set_secret(client, arn: str, token: str) -> None:
    """
    Step 2: Applies the new credential to the target service.
    IMPORTANT: check that AWSCURRENT and AWSPENDING point to the same resource
    before modifying anything (defense against confused deputy).
    """
    current_secret = json.loads(
        client.get_secret_value(SecretId=arn, VersionStage="AWSCURRENT")["SecretString"]
    )
    pending_secret = json.loads(
        client.get_secret_value(SecretId=arn, VersionId=token, VersionStage="AWSPENDING")["SecretString"]
    )

    # Safety check: same target resource
    if current_secret.get("endpoint") != pending_secret.get("endpoint"):
        raise ValueError("AWSCURRENT and AWSPENDING point to different endpoints. Aborting.")

    # Apply the new credential to the external service
    _apply_credential_to_service(
        endpoint=pending_secret["endpoint"],
        old_api_key=current_secret["api_key"],
        new_api_key=pending_secret["api_key"],
    )
    logger.info("setSecret: credential updated in service for %s.", arn)


def test_secret(client, arn: str, token: str) -> None:
    """
    Step 3: Validates that the new credential (AWSPENDING) works.
    """
    pending_secret = json.loads(
        client.get_secret_value(SecretId=arn, VersionId=token, VersionStage="AWSPENDING")["SecretString"]
    )

    # Test the credential against the service
    if not _test_credential(pending_secret):
        raise RuntimeError(
            f"testSecret: new credential failed validation for {arn}. "
            "Rotation will not complete."
        )
    logger.info("testSecret: new credential validated successfully for %s.", arn)


def finish_secret(client, arn: str, token: str) -> None:
    """
    Step 4: Promotes AWSPENDING to AWSCURRENT atomically.
    Secrets Manager automatically moves the previous version to AWSPREVIOUS.
    Do NOT remove AWSPENDING manually before this call.
    """
    # Find the version that currently holds AWSCURRENT
    metadata = client.describe_secret(SecretId=arn)
    current_version = next(
        (vid for vid, stages in metadata["VersionIdsToStages"].items()
         if "AWSCURRENT" in stages and vid != token),
        None,
    )
    if current_version is None:
        # The token version already holds AWSCURRENT. boto3 rejects None for
        # RemoveFromVersionId, so return instead of calling the API.
        logger.info("finishSecret: version %s is already AWSCURRENT for %s.", token, arn)
        return

    # Move AWSCURRENT from the old version to the new version (token)
    # This call also removes AWSPENDING atomically
    client.update_secret_version_stage(
        SecretId=arn,
        VersionStage="AWSCURRENT",
        MoveToVersionId=token,
        RemoveFromVersionId=current_version,
    )
    logger.info(
        "finishSecret: rotation complete. New version %s is now AWSCURRENT for %s.",
        token, arn
    )


# ── External service helper functions ──────────────────────────────────────

def _generate_new_credential(current: dict) -> dict:
    """Generates a new credential while keeping the secret's structure."""
    import secrets
    from datetime import datetime, timezone
    return {
        **current,
        "api_key": secrets.token_urlsafe(32),
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
        "generated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
    }


def _apply_credential_to_service(endpoint: str, old_api_key: str, new_api_key: str):
    """
    Applies the new API key to the external service.
    The implementation is service-specific, e.g., a REST call that rotates the key.
    """
    import urllib.request, urllib.error
    # Example: POST /rotate-key authenticated with old_api_key as a Bearer token
    # request = urllib.request.Request(
    #     f"{endpoint}/rotate-key",
    #     data=json.dumps({"new_key": new_api_key}).encode(),
    #     headers={"Authorization": f"Bearer {old_api_key}"},
    #     method="POST",
    # )
    # urllib.request.urlopen(request)
    pass  # implement according to the service


def _test_credential(secret: dict) -> bool:
    """Tests the AWSPENDING credential against the service."""
    # Example: GET /health with the new API key
    # Must return True if the credential works, False otherwise
    return True  # implement according to the service

7. Python — Application Resilient to Rotation

"""
Application pattern that is resilient to secret rotation.
Principle: local cache with TTL + automatic retry on auth failure.
"""
import json
import time
import logging
import functools
from dataclasses import dataclass, field
from typing import Any, Optional
import boto3
import psycopg2
from psycopg2 import OperationalError

logger = logging.getLogger(__name__)

@dataclass
class CachedSecret:
    value: dict
    fetched_at: float
    ttl_seconds: int = 300  # 5 minutes, far shorter than the rotation period (30 days)

    def is_expired(self) -> bool:
        return time.time() - self.fetched_at > self.ttl_seconds


class SecretCache:
    """
    Secret cache with TTL.
    Do NOT call get_secret_value on every request; use the cache.
    Keep cache TTL << rotation period so that new versions are picked up.
    """
    def __init__(self, ttl_seconds: int = 300):
        self._cache: dict[str, CachedSecret] = {}
        self._ttl = ttl_seconds
        self._client = boto3.client("secretsmanager")

    def get(self, secret_name: str, force_refresh: bool = False) -> dict:
        cached = self._cache.get(secret_name)
        if cached and not cached.is_expired() and not force_refresh:
            return cached.value

        logger.info("Fetching secret %s from Secrets Manager.", secret_name)
        response = self._client.get_secret_value(SecretId=secret_name)
        value = json.loads(response["SecretString"])

        self._cache[secret_name] = CachedSecret(
            value=value,
            fetched_at=time.time(),
            ttl_seconds=self._ttl,
        )
        return value

    def invalidate(self, secret_name: str):
        self._cache.pop(secret_name, None)


# Module-level instance, shared across invocations in the same Lambda execution environment
_secret_cache = SecretCache(ttl_seconds=300)


def get_db_connection(secret_name: str, force_refresh: bool = False):
    """
    Gets a PostgreSQL connection using credentials from Secrets Manager.
    If force_refresh=True, fetches the latest version (after an auth failure).
    """
    secret = _secret_cache.get(secret_name, force_refresh=force_refresh)
    return psycopg2.connect(
        host=secret["host"],
        port=int(secret.get("port", 5432)),
        database=secret["dbname"],
        user=secret["username"],
        password=secret["password"],
        connect_timeout=5,
    )


def with_db_rotation_resilience(secret_name: str, max_retries: int = 2):
    """
    Decorator that adds resilience to secret rotation.
    On an authentication failure (OperationalError containing "password authentication"),
    it invalidates the cache, fetches the new version, and tries again.

    Usage:
        @with_db_rotation_resilience("rds/postgres/app-user")
        def my_db_function(conn, ...):
            ...
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(max_retries + 1):
                conn = None
                try:
                    force_refresh = (attempt > 0)
                    if force_refresh:
                        logger.warning(
                            "Auth failure on attempt %d for %s. Refreshing secret.",
                            attempt, secret_name
                        )
                        _secret_cache.invalidate(secret_name)

                    conn = get_db_connection(secret_name, force_refresh=force_refresh)
                    return func(conn, *args, **kwargs)

                except OperationalError as e:
                    last_error = e
                    error_msg = str(e).lower()
                    # Retry only on authentication errors (possible rotation)
                    if "password authentication" not in error_msg \
                       and "authentication failed" not in error_msg:
                        raise  # genuine network/DB error, not a rotation
                    logger.warning("Auth error (attempt %d/%d): %s", attempt+1, max_retries+1, e)
                    if attempt < max_retries:
                        time.sleep(0.5 * (attempt + 1))  # simple linear backoff

                finally:
                    if conn:
                        conn.close()

            raise RuntimeError(
                f"Failed to connect after {max_retries + 1} attempts. "
                f"Last error: {last_error}"
            )
        return wrapper
    return decorator


# Example use of the decorator
@with_db_rotation_resilience("rds/postgres/app-user")
def create_order(conn, order_data: dict) -> int:
    """Creates an order; automatically resilient to rotation."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (customer_id, total) VALUES (%s, %s) RETURNING id",
            (order_data["customer_id"], order_data["total"])
        )
        conn.commit()
        return cur.fetchone()[0]


# Rotation-resilient Lambda handler
def lambda_handler(event: dict, context) -> dict:
    try:
        order_id = create_order(order_data=event)
        return {"statusCode": 200, "orderId": order_id}
    except Exception as e:
        logger.error("Failed to process order: %s", e)
        return {"statusCode": 500, "error": str(e)}

8. CLI — Essential Examples

# 1. Create a secret with a placeholder password (the first rotation replaces it)
aws secretsmanager create-secret \
  --name "rds/postgres/app-user" \
  --description "Checkout application credentials" \
  --secret-string '{
    "username": "app_user",
    "password": "WILL_BE_ROTATED",
    "host": "my-db.cluster-xxx.us-east-1.rds.amazonaws.com",
    "port": "5432",
    "dbname": "appdb",
    "engine": "postgres"
  }' \
  --kms-key-id "arn:aws:kms:us-east-1:123456789012:key/abc-123"

# 2. View a secret's versions and staging labels
aws secretsmanager describe-secret \
  --secret-id "rds/postgres/app-user" \
  --query '{
    Name: Name,
    RotationEnabled: RotationEnabled,
    RotationRules: RotationRules,
    VersionIdsToStages: VersionIdsToStages
  }'

# 3. Fetch the current secret (AWSCURRENT)
aws secretsmanager get-secret-value \
  --secret-id "rds/postgres/app-user" \
  --query 'SecretString' --output text | python3 -m json.tool

# 4. Fetch the previous version (AWSPREVIOUS), useful for rollback diagnostics
aws secretsmanager get-secret-value \
  --secret-id "rds/postgres/app-user" \
  --version-stage AWSPREVIOUS \
  --query 'SecretString' --output text

# 5. Enable automatic rotation with the native Lambda (30 days); this also
#    rotates immediately unless --no-rotate-immediately is passed
aws secretsmanager rotate-secret \
  --secret-id "rds/postgres/app-user" \
  --rotation-lambda-arn "arn:aws:lambda:us-east-1:123456789012:function:SecretsManagerRDSPostgreSQLRotation" \
  --rotation-rules AutomaticallyAfterDays=30

# 6. Trigger an immediate rotation (independent of the schedule)
aws secretsmanager rotate-secret \
  --secret-id "rds/postgres/app-user"

# 7. Check status after rotation (confirm that AWSPENDING was resolved)
aws secretsmanager list-secret-version-ids \
  --secret-id "rds/postgres/app-user" \
  --query 'Versions[*].{VersionId:VersionId,Stages:VersionStages,Date:LastAccessedDate}' \
  --output table

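# 8. Cancel an in-progress rotation and turn off automatic rotation
#    (re-enable later with rotate-secret, as in example 5; a stuck
#    AWSPENDING label may still need to be removed, see example 9)
aws secretsmanager cancel-rotate-secret \
  --secret-id "rds/postgres/app-user"

# 9. Remove a stuck AWSPENDING label (replace <pending-version-id> with the
#    version ID shown by describe-secret; AutomaticallyAfterDays=0 is not a
#    valid way to disable rotation, the minimum value is 1)
aws secretsmanager update-secret-version-stage \
  --secret-id "rds/postgres/app-user" \
  --version-stage AWSPENDING \
  --remove-from-version-id "<pending-version-id>"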

# 10. Check rotation logs (Lambda CloudWatch); date -d requires GNU date
aws logs filter-log-events \
  --log-group-name "/aws/lambda/SecretsManagerRDSPostgreSQLRotation" \
  --start-time $(date -d '1 hour ago' +%s000) \
  --filter-pattern "ERROR" \
  --query 'events[*].message' --output text

# 11. List all secrets with rotation enabled (filtered client-side with JMESPath,
#     since list-secrets has no rotation filter key)
aws secretsmanager list-secrets \
  --query 'SecretList[?RotationEnabled==`true`].{Name:Name,RotationEnabled:RotationEnabled,NextRotation:NextRotationDate}' \
  --output table

# 12. Resource policy: allow cross-account access to the secret
aws secretsmanager put-resource-policy \
  --secret-id "rds/postgres/app-user" \
  --resource-policy '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::987654321098:role/AppRole"},
      "Action": ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"],
      "Resource": "*"
    }]
  }'

9. Diagram: Complete Rotation Cycle

          ROTATION CYCLE: SINGLE USER (30 days)

 t=0 (schedule fires)
  │
  ├─ createSecret ─────────────────────────────────────────────────
  │   put_secret_value(token, AWSPENDING)
  │   → new password generated, stored with the AWSPENDING label
  │
  ├─ setSecret ────────────────────────────────────────────────────
  │   ALTER USER app_user PASSWORD 'new_password_from_AWSPENDING'
  │   ↑ database has the new password; AWSCURRENT still has the old one
  │   ← RISK WINDOW (milliseconds to seconds) →
  │
  ├─ testSecret ───────────────────────────────────────────────────
  │   SELECT 1 using AWSPENDING → OK
  │
  └─ finishSecret ─────────────────────────────────────────────────
      update_secret_version_stage:
        AWSCURRENT  → old version = AWSPREVIOUS
        AWSPENDING  → new version = AWSCURRENT (+ AWSPENDING removed)

  ↓ after finishSecret
  Application with an expired cache TTL fetches AWSCURRENT → new password
  Application with a still-valid cache: refreshes and retries with the new password if auth fails

  ↓ t+30 days: next rotation

10. Pitfalls

[FACT] Lambda without network access = rotation fails: the rotation Lambda needs to reach both the Secrets Manager endpoint and the database. Without a VPC Endpoint or NAT Gateway, calls to Secrets Manager time out and the rotation never completes. Check: private subnet, route table, security groups.

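[FACT] AWSPENDING "stuck" on an empty version = future rotation blocked: if a rotation failed between createSecret and setSecret, AWSPENDING may be attached to a version with empty content. Every subsequent rotation returns an error assuming a rotation is in progress. Solution: remove the orphaned label with update-secret-version-stage --version-stage AWSPENDING --remove-from-version-id <version-id>. Note that cancel-rotate-secret stops the rotation and turns off automatic rotation, but it can leave the label in place.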
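A stuck label can be spotted from the `VersionIdsToStages` mapping that `describe_secret` returns. The helper below is a sketch with a hypothetical name (`find_orphaned_pending`); it works on a plain dictionary, so it can be tested without AWS access. The same condition is also true while a rotation is legitimately running, so it only signals a problem when no rotation is in progress.

```python
from typing import Optional


def find_orphaned_pending(version_ids_to_stages: dict[str, list[str]]) -> Optional[str]:
    """Return the version ID that holds AWSPENDING without AWSCURRENT, if any."""
    for version_id, stages in version_ids_to_stages.items():
        if "AWSPENDING" in stages and "AWSCURRENT" not in stages:
            return version_id
    return None


# Example mappings with made-up version IDs.
healthy = {"v2": ["AWSCURRENT"], "v1": ["AWSPREVIOUS"]}
stuck = {"v2": ["AWSCURRENT"], "v1": ["AWSPREVIOUS"], "v3": ["AWSPENDING"]}

print(find_orphaned_pending(healthy))  # None
print(find_orphaned_pending(stuck))    # v3
```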

[FACT] Alternating Users incompatible with RDS Proxy: AWS documentation is explicit — RDS Proxy does not support the Alternating Users strategy. Use Single User when you have RDS Proxy in front.

[FACT] Do not log SecretString: the rotation Lambda has access to SecretString in plaintext. An accidental logger.debug(response) exposes the password in CloudWatch Logs. AWS documentation explicitly warns about this.
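One way to reduce this risk is to pass responses through a redaction helper before logging them. The sketch below assumes the response is a plain dictionary shaped like the output of `get_secret_value`; `redact` and `SENSITIVE_KEYS` are names invented for the example.

```python
import copy

SENSITIVE_KEYS = ("SecretString", "SecretBinary")


def redact(response: dict) -> dict:
    """Return a copy of the response that is safe to log."""
    safe = copy.deepcopy(response)  # never mutate the caller's response
    for key in SENSITIVE_KEYS:
        if key in safe:
            safe[key] = "***REDACTED***"
    return safe


# Fake response with made-up values.
response = {
    "Name": "rds/postgres/app-user",
    "VersionId": "example-version-id",
    "SecretString": '{"username": "app_user", "password": "hunter2"}',
}

safe_to_log = redact(response)
print(safe_to_log)
```

The original `response` still contains the plaintext, so only the redacted copy should ever reach a logger call.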

[CONSENSUS] Cache too long increases risk window: if the application caches the password for 24 hours and never refreshes on auth failure, a rotation can cause up to 24 hours of auth failures before the cache expires. A 5-minute TTL + retry on auth failure is the safe standard.

[FACT] Minimum permissions on Lambda execution role: the rotation Lambda should not have secretsmanager:*. The minimum permissions are: GetSecretValue, PutSecretValue, UpdateSecretVersionStage, DescribeSecret for the specific secret(s), plus kms:Decrypt/kms:GenerateDataKey if using a CMK.
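The four Secrets Manager actions listed above could be captured in a policy document such as the following. This is a sketch, not a complete role: `rotation_role_policy` is a hypothetical helper, the ARN is a placeholder, and the KMS statement plus any VPC networking permissions would be added separately.

```python
import json


def rotation_role_policy(secret_arn: str) -> dict:
    """Least-privilege Secrets Manager permissions for one rotated secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "secretsmanager:DescribeSecret",
                "secretsmanager:GetSecretValue",
                "secretsmanager:PutSecretValue",
                "secretsmanager:UpdateSecretVersionStage",
            ],
            "Resource": secret_arn,  # one specific secret, never "*"
        }],
    }


policy = rotation_role_policy(
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:rds/postgres/app-user-AbCdEf"
)
print(json.dumps(policy, indent=2))
```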

[FACT] Minimum rotation every 4 hours: Secrets Manager allows a schedule with rate(4 hours) as the minimum. More frequent rotations are rejected.
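A local pre-check can catch a too-frequent schedule before it is sent to the API. The parser below is a deliberately narrow sketch: it only understands `rate(N hours)` and `rate(N days)` expressions, does not handle `cron(...)` schedules, and is not how Secrets Manager itself validates input. The function names are assumptions.

```python
import re

_RATE = re.compile(r"^rate\((\d+) (hours?|days?)\)$")


def rate_to_hours(expression: str) -> int:
    """Convert 'rate(N hours)' or 'rate(N days)' to a number of hours."""
    match = _RATE.match(expression)
    if match is None:
        raise ValueError(f"unsupported schedule expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 24 if unit.startswith("day") else value


def is_allowed_rotation_rate(expression: str, minimum_hours: int = 4) -> bool:
    """True if the rate is at least the 4-hour minimum stated above."""
    return rate_to_hours(expression) >= minimum_hours


print(is_allowed_rotation_rate("rate(4 hours)"))  # True
print(is_allowed_rotation_rate("rate(2 hours)"))  # False
print(is_allowed_rotation_rate("rate(30 days)"))  # True
```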

[CONSENSUS] aws:SourceAccount in the Lambda resource policy: to prevent confused deputy attacks, add an aws:SourceAccount condition to the rotation Lambda's resource policy, so that the Secrets Manager service principal can invoke the function only on behalf of your own account.
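With boto3, the condition can be set through the `SourceAccount` parameter of the Lambda `add_permission` call. The sketch below only builds the keyword arguments so that it runs without credentials; the helper name, the statement ID, and the account ID are placeholders.

```python
def secrets_manager_invoke_permission(function_name: str, account_id: str) -> dict:
    """Keyword arguments for lambda add_permission, scoped to one account."""
    return {
        "FunctionName": function_name,
        "StatementId": "AllowSecretsManagerInvoke",    # placeholder statement ID
        "Action": "lambda:InvokeFunction",
        "Principal": "secretsmanager.amazonaws.com",
        "SourceAccount": account_id,                   # becomes aws:SourceAccount
    }


kwargs = secrets_manager_invoke_permission(
    function_name="SecretsManagerRDSPostgreSQLRotation",
    account_id="123456789012",
)
print(kwargs)

# With credentials available, the permission would be applied with:
#   boto3.client("lambda").add_permission(**kwargs)
```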


11. When to Use Each Strategy

╔══════════════════════════════╦═══════════════════════════════════════╗
║ Scenario                     ║ Recommended strategy                  ║
╠══════════════════════════════╬═══════════════════════════════════════╣
║ General application (most)   ║ Single User + retry with backoff      ║
║ With RDS Proxy               ║ Single User (only supported option)   ║
║ Critical high availability   ║ Alternating Users (zero-downtime)     ║
║ External service API key     ║ Custom rotation function              ║
║ Interactive / ad hoc user    ║ Single User (whitepaper recommends)   ║
║ DB without CLONE support     ║ Single User                           ║
╚══════════════════════════════╩═══════════════════════════════════════╝
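The table can also be read as a small decision function. The sketch below is one possible encoding, with hypothetical parameter names; the ordering of the checks (hard constraints such as RDS Proxy before the availability preference) is an assumption about how the rows interact.

```python
def recommend_strategy(*, external_service: bool = False,
                       uses_rds_proxy: bool = False,
                       supports_user_cloning: bool = True,
                       interactive_user: bool = False,
                       needs_zero_downtime: bool = False) -> str:
    """Map the scenarios in the table above to a rotation strategy."""
    if external_service:
        return "custom rotation function"
    # Hard constraints first: these rule out Alternating Users.
    if uses_rds_proxy or not supports_user_cloning or interactive_user:
        return "single user"
    if needs_zero_downtime:
        return "alternating users"
    return "single user"  # general case, combined with retry and backoff


print(recommend_strategy(needs_zero_downtime=True))                       # alternating users
print(recommend_strategy(needs_zero_downtime=True, uses_rds_proxy=True))  # single user
print(recommend_strategy(external_service=True))                          # custom rotation function
```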

Reflection Exercise

A high-traffic Lambda application (10,000 req/s) connects to an RDS PostgreSQL via RDS Proxy. It currently uses hard-coded credentials in the code. The team wants to migrate to Secrets Manager with automatic rotation every 30 days, without any downtime.

Design the architecture and answer:

  1. Which rotation strategy to choose and why? (hint: RDS Proxy)
  2. How should the application fetch and cache the secret securely? What TTL makes sense?
  3. During the ~2 seconds of the setSecret risk window, what happens to the 10,000 req/s in progress?
  4. How does the with_db_rotation_resilience decorator solve the problem? What is the exact retry flow?
  5. The rotation Lambda is in a private subnet without a NAT Gateway or VPC Endpoint. What happens and how to fix it?
