Session 003 — CloudFormation: changesets, drift detection and stack policies
Estimated duration: 60 minutes
Prerequisites: session-002 — CloudFormation: stacks, templates, parameters, outputs, Ref/GetAtt
Objective
By the end, you will be able to create and review a changeset before applying it, detect drift (changes made to stack resources outside CloudFormation), and configure a stack policy that protects critical resources from accidental replacement or deletion.
Context
[CONSENSUS] The main risk in production IaC operations is not the initial deploy but the update. An aws cloudformation deploy in production can delete a database, replace a security group along with all its rules, or replace an SQS queue whose name changed (losing its in-flight messages) if the engineer doesn't understand the update behavior of each resource type.
This session covers the three tools that exist to mitigate this risk: changesets (to see what will change before changing it), drift detection (to know when the real world has diverged from the template), and stack policies (to prevent certain changes from happening even if they are requested).
[FACT] Changesets, drift detection, and stack policies are all native CloudFormation features. Drift detection, however, is asynchronous, can take several minutes on large stacks, and is not run automatically: you need to trigger it explicitly.
Key concepts
1. Update behaviors — what can happen to a resource during an update
Before understanding changesets, you need to understand what CloudFormation can do to a resource during an update. Each property of each resource type has a documented "update behavior".
[FACT] There are three categories:
Update with No Interruption
→ CloudFormation updates the resource without interrupting its operation
→ The resource keeps its physical ID
→ E.g., changing the tags of an EC2 instance, changing the ingress rules of a Security Group
Update with Some Interruption
→ CloudFormation updates the resource with a temporary interruption
→ The resource keeps its physical ID
→ E.g., changing the instance type of an EBS-backed EC2 instance (the instance is stopped and restarted)
Replacement
→ CloudFormation creates a NEW resource, updates the references, and deletes the old resource
→ The resource gets a NEW physical ID
→ E.g., changing the name of an S3 bucket, changing the engine of an RDS instance,
changing the VPC of a Security Group
Why Replacement is the most dangerous:
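Before:                          After Replacement:
AppBucket (bucket-prod-123)  →   AppBucket (bucket-prod-456) [new]
                                 bucket-prod-123 [deleted during cleanup]
If UpdateReplacePolicy = Delete: the old resource is deleted, and its data with it
(S3 caveat: CloudFormation cannot delete a non-empty bucket, so the old bucket stays behind, but the application now points at the new, empty one)
If UpdateReplacePolicy = Retain: the old bucket remains in the account as an orphan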
[FACT] To find the update behavior of a specific property, check the "Update requires" entry listed under that property in the resource type's reference documentation. There is no universal rule: it depends on the service and the property.
Update behaviors diagram:
Stack update
│
├─ Property with "No Interruption"
│ └─ Resource updated in place, no impact
│
├─ Property with "Some Interruption"
│ └─ Resource briefly interrupted (e.g., restarted), minimal downtime
│
└─ Property with "Replacement"
└─ New resource created → references updated → old one deleted
⚠️ Physical ID changes, data may be lost
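The diagram implies a simple rule: when several properties of one resource change in the same update, the resource experiences the most disruptive behavior among them, so a single Replacement property is enough to replace the whole resource. The minimal Python sketch below encodes that rule. The lookup table is hypothetical illustration data taken only from the examples above. Real values come from each property's "Update requires" entry.

```python
from enum import IntEnum


class UpdateBehavior(IntEnum):
    """Update behaviors, ordered from least to most disruptive."""
    NO_INTERRUPTION = 1
    SOME_INTERRUPTION = 2
    REPLACEMENT = 3


# Hypothetical lookup table built only from the examples in this section;
# real values come from each property's "Update requires" entry in the docs.
EXAMPLE_BEHAVIORS = {
    ("AWS::EC2::Instance", "Tags"): UpdateBehavior.NO_INTERRUPTION,
    ("AWS::EC2::Instance", "InstanceType"): UpdateBehavior.SOME_INTERRUPTION,  # EBS-backed
    ("AWS::S3::Bucket", "BucketName"): UpdateBehavior.REPLACEMENT,
    ("AWS::EC2::SecurityGroup", "VpcId"): UpdateBehavior.REPLACEMENT,
}


def resource_outcome(resource_type, changed_properties):
    """Return what happens to the resource: the worst behavior among its changed properties."""
    if not changed_properties:
        raise ValueError("at least one changed property is required")
    return max(EXAMPLE_BEHAVIORS[(resource_type, prop)] for prop in changed_properties)


# Tags alone would be harmless, but InstanceType pulls the update up to Some Interruption
print(resource_outcome("AWS::EC2::Instance", ["Tags", "InstanceType"]).name)  # SOME_INTERRUPTION
```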
2. Changesets — look before you leap
A changeset is an execution plan that CloudFormation calculates without applying anything. You create it, review it, and only then decide whether to execute.
Full workflow via CLI:
# 1. Create the changeset (applies nothing); keep the name in a variable to reuse it below
CHANGE_SET_NAME="meu-changeset-$(date +%Y%m%d%H%M)"
aws cloudformation create-change-set \
--stack-name minha-stack \
--change-set-name "$CHANGE_SET_NAME" \
--template-body file://template.yaml \
--parameters ParameterKey=Env,ParameterValue=prod \
--capabilities CAPABILITY_NAMED_IAM
# 2. Wait for the changeset to be ready (status: CREATE_COMPLETE)
aws cloudformation wait change-set-create-complete \
--stack-name minha-stack \
--change-set-name "$CHANGE_SET_NAME"
# 3. Review the changeset
aws cloudformation describe-change-set \
--stack-name minha-stack \
--change-set-name "$CHANGE_SET_NAME" \
--query 'Changes[*].ResourceChange.[Action,LogicalResourceId,ResourceType,Replacement]' \
--output table
# 4a. Execute (if approved)
aws cloudformation execute-change-set \
--stack-name minha-stack \
--change-set-name "$CHANGE_SET_NAME"
# 4b. Discard (if not approved)
aws cloudformation delete-change-set \
--stack-name minha-stack \
--change-set-name "$CHANGE_SET_NAME"
Reading the describe-change-set output:
The most important field is Changes[].ResourceChange. Each item has:
{
"Action": "Modify", // Add | Modify | Remove
"LogicalResourceId": "AppDB",
"ResourceType": "AWS::RDS::DBInstance",
"Replacement": "True", // True | False | Conditional
"Scope": ["Properties"],
"Details": [
{
"Target": {
"Attribute": "Properties",
"Name": "DBInstanceIdentifier",
"RequiresRecreation": "Always" // Never | Conditionally | Always
},
"ChangeSource": "DirectModification"
}
]
}
The Replacement field and its values:
False → no modified property requires replacement
True → at least one property requires replacement (certain)
Conditional → replacement depends on other conditions that are only evaluated during the update
(e.g., whether a property changes to a specific value)
[FACT] Conditional is the most treacherous — CloudFormation cannot determine at planning time whether replacement will occur. Treat Conditional as True when reviewing changesets in production.
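A production pipeline can apply that rule mechanically before anyone runs execute-change-set. The Python sketch below assumes the JSON printed by describe-change-set with --output json has been saved to a file or already parsed into a dict. The function names are hypothetical. It flags every Remove and every change whose Replacement is True or Conditional.

```python
import json

# Treat Conditional as True, as recommended above for production reviews
RISKY_REPLACEMENT = {"True", "Conditional"}


def risky_changes(change_set):
    """List (action, logical_id, reason) for every change that needs a human look."""
    findings = []
    for change in change_set.get("Changes", []):
        rc = change.get("ResourceChange", {})
        action = rc.get("Action")
        logical_id = rc.get("LogicalResourceId")
        if action == "Remove":
            findings.append((action, logical_id, "resource will be deleted"))
        elif rc.get("Replacement") in RISKY_REPLACEMENT:  # absent for Add actions
            findings.append((action, logical_id, "Replacement=" + rc["Replacement"]))
    return findings


def review_file(path):
    """Load saved describe-change-set JSON and return a shell-style exit code."""
    with open(path) as f:
        findings = risky_changes(json.load(f))
    for action, logical_id, reason in findings:
        print(f"REVIEW {action} {logical_id}: {reason}")
    return 1 if findings else 0


# Abridged input shaped like describe-change-set output
sample = {"Changes": [
    {"Type": "Resource", "ResourceChange": {
        "Action": "Modify", "LogicalResourceId": "AppDB",
        "ResourceType": "AWS::RDS::DBInstance", "Replacement": "True"}},
    {"Type": "Resource", "ResourceChange": {
        "Action": "Modify", "LogicalResourceId": "WebSG",
        "ResourceType": "AWS::EC2::SecurityGroup", "Replacement": "False"}},
]}
print(risky_changes(sample))  # [('Modify', 'AppDB', 'Replacement=True')]
```

In a pipeline, redirect the describe-change-set JSON output to a file such as changeset.json. Then call sys.exit(review_file("changeset.json")) from a small wrapper script, so that a non-zero exit stops the job and waits for a human decision.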
Comparing aws cloudformation deploy vs manual changeset:
aws cloudformation deploy
├── creates a changeset internally
├── executes it automatically
├── you don't see the changeset before it runs
└── --no-execute-changeset creates the changeset but does NOT execute it
aws cloudformation create-change-set + execute-change-set
├── explicit flow: you control each step
└── recommended for production pipelines with human approval
Multiple changesets on a stack:
[FACT] A stack can have multiple pending changesets simultaneously, but only one can be executed — when you execute one, CloudFormation automatically deletes all other changesets on the stack (they become obsolete after execution).
3. Drift detection — when the real world diverges from the template
Drift happens when someone modifies a resource outside of CloudFormation — via the console, direct CLI, or another tool. The template says one thing, the resource is in a different state.
Triggering drift detection:
# Detect drift across the whole stack (asynchronous operation)
aws cloudformation detect-stack-drift \
--stack-name minha-stack
# Returns: { "StackDriftDetectionId": "abc-123" }
# Track the detection status
aws cloudformation describe-stack-drift-detection-status \
--stack-drift-detection-id abc-123
# DetectionStatus: DETECTION_IN_PROGRESS | DETECTION_COMPLETE | DETECTION_FAILED
# View the per-resource result (only resources that drifted)
aws cloudformation describe-stack-resource-drifts \
--stack-name minha-stack \
--stack-resource-drift-status-filters MODIFIED DELETED \
--query 'StackResourceDrifts[*].[LogicalResourceId,ResourceType,StackResourceDriftStatus]' \
--output table
# Detect drift on a single resource (synchronous: the result comes back in the response)
aws cloudformation detect-stack-resource-drift \
--stack-name minha-stack \
--logical-resource-id AppBucket
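Because stack-level detection is asynchronous, scripts have to poll the status until it leaves DETECTION_IN_PROGRESS. The Python sketch below is a minimal polling loop. wait_for_drift_detection is a hypothetical helper. get_status is any callable that returns the describe-stack-drift-detection-status response as a dict. sleep and clock are injectable so the loop can be tested without actually waiting. The boto3 wiring in the trailing comments is an assumption about your environment (boto3 installed, credentials configured).

```python
import time

TERMINAL_STATUSES = {"DETECTION_COMPLETE", "DETECTION_FAILED"}


def wait_for_drift_detection(get_status, interval=5.0, timeout=600.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Poll until a drift detection leaves DETECTION_IN_PROGRESS.

    `get_status` is any zero-argument callable returning the response of
    describe-stack-drift-detection-status as a dict.
    """
    deadline = clock() + timeout
    while True:
        response = get_status()
        if response["DetectionStatus"] in TERMINAL_STATUSES:
            return response
        if clock() >= deadline:
            raise TimeoutError("drift detection is still in progress")
        sleep(interval)


# Possible wiring with boto3 (assumes boto3 is installed and credentials are set):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   detection_id = cfn.detect_stack_drift(StackName="minha-stack")["StackDriftDetectionId"]
#   result = wait_for_drift_detection(
#       lambda: cfn.describe_stack_drift_detection_status(StackDriftDetectionId=detection_id))
#   print(result["DetectionStatus"], result.get("StackDriftStatus"))
```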
Drift status per resource:
IN_SYNC → properties match the template
MODIFIED → at least one property differs from the template
NOT_CHECKED → resource does not support drift detection or has not been checked
DELETED → resource was deleted outside of CloudFormation
(At stack level, the stack is DRIFTED when at least one resource is MODIFIED or DELETED)
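A report with no drifted resources is not the same as a fully verified stack. It helps to summarize the statuses and list NOT_CHECKED resources explicitly. The Python sketch below assumes you start from describe-stack-resources output (--output json) rather than describe-stack-resource-drifts. In that output, every resource carries a DriftInformation block whose status is IN_SYNC, MODIFIED, DELETED, or NOT_CHECKED. The function name is hypothetical.

```python
from collections import Counter

# Per-resource statuses that count as drift (the stack is then DRIFTED)
DRIFTED_STATUSES = {"MODIFIED", "DELETED"}


def summarize_drift(describe_stack_resources_output):
    """Summarize DriftInformation from describe-stack-resources JSON output."""
    counts = Counter()
    drifted, unchecked = [], []
    for res in describe_stack_resources_output.get("StackResources", []):
        info = res.get("DriftInformation", {})
        status = info.get("StackResourceDriftStatus", "NOT_CHECKED")
        counts[status] += 1
        if status in DRIFTED_STATUSES:
            drifted.append(res["LogicalResourceId"])
        elif status == "NOT_CHECKED":
            unchecked.append(res["LogicalResourceId"])
    return {
        "stack_drifted": bool(drifted),
        "counts": dict(counts),
        "drifted": drifted,
        # A clean report says nothing about these resources
        "unchecked": unchecked,
    }


sample = {"StackResources": [
    {"LogicalResourceId": "AppBucket", "ResourceType": "AWS::S3::Bucket",
     "DriftInformation": {"StackResourceDriftStatus": "MODIFIED"}},
    {"LogicalResourceId": "AppDB", "ResourceType": "AWS::RDS::DBInstance",
     "DriftInformation": {"StackResourceDriftStatus": "IN_SYNC"}},
    {"LogicalResourceId": "Seeder", "ResourceType": "AWS::CloudFormation::CustomResource",
     "DriftInformation": {"StackResourceDriftStatus": "NOT_CHECKED"}},
]}
summary = summarize_drift(sample)
print(summary["drifted"], summary["unchecked"])  # ['AppBucket'] ['Seeder']
```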
What drift detection checks:
[FACT] CloudFormation compares only the properties explicitly defined in the template with the current state of the resource. Properties not declared in the template (which use the service's default values) are not checked.
# The template declares only BucketName and VersioningConfiguration
AppBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: meu-bucket
    VersioningConfiguration:
      Status: Enabled
# Someone added a lifecycle policy via the console
# → drift detection will NOT report it: LifecycleConfiguration is not declared in the template
# → it will not report the absence of CORS either (never declared)
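The rule that only declared properties are compared can be modeled in a few lines of Python. This is a simplified sketch, not CloudFormation's actual algorithm, and declared_property_drift is a hypothetical name. Nested dicts are walked key by key, everything else is compared by equality, and keys that exist only on the live resource are ignored. The labels NOT_EQUAL and REMOVE are borrowed from the DifferenceType field of the drift output. CloudFormation can also report elements added to a list as ADD, which this model does not handle.

```python
def declared_property_drift(declared, actual, path=""):
    """Compare only the template-declared properties with the live resource.

    Simplified model: nested dicts are walked key by key, everything else is
    compared by equality, and keys present only in `actual` are ignored
    because the template never declared them.
    """
    differences = []
    for key, expected in declared.items():
        prop_path = f"{path}/{key}"
        if key not in actual:
            differences.append((prop_path, "REMOVE", expected, None))
        elif isinstance(expected, dict) and isinstance(actual[key], dict):
            differences.extend(declared_property_drift(expected, actual[key], prop_path))
        elif expected != actual[key]:
            differences.append((prop_path, "NOT_EQUAL", expected, actual[key]))
    return differences


template_props = {"BucketName": "meu-bucket",
                  "VersioningConfiguration": {"Status": "Enabled"}}
live_props = {"BucketName": "meu-bucket",
              "VersioningConfiguration": {"Status": "Suspended"},
              # Added in the console, never declared: ignored by this model
              "LifecycleConfiguration": {"Rules": [{"Status": "Enabled"}]}}
print(declared_property_drift(template_props, live_props))
# [('/VersioningConfiguration/Status', 'NOT_EQUAL', 'Enabled', 'Suspended')]
```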
Important limitations of drift detection — [FACT]:
1. Not every resource type supports drift detection
(check the list in "Resources that support import and drift detection operations")
2. Detection is asynchronous and can take several minutes on large stacks
3. Drift detection does NOT fix drift, it only reports it
To fix it: either update the resource manually to match the template,
or update the template to reflect the current state
4. There is no native continuous drift detection: you need to schedule it via EventBridge
or use AWS Config with the `cloudformation-stack-drift-detection-check` rule
5. Stacks in REVIEW_IN_PROGRESS status cannot have drift detected
Drift lifecycle diagram:
Template                            Actual resource
────────                            ───────────────
VersioningConfiguration: Enabled    VersioningConfiguration: Enabled     ← IN_SYNC
[Someone opens the console and suspends versioning]
VersioningConfiguration: Enabled    VersioningConfiguration: Suspended   ← MODIFIED (stack: DRIFTED)
Options to resolve:
A) Re-enable versioning in the console/CLI → IN_SYNC on the next detection
B) Change the template to Suspended and deploy → IN_SYNC
C) Redeploying the original, unchanged template does NOT fix it: CloudFormation compares
the new template with the previous template, not with the live resource, so with no
template change there is nothing to update (deploy reports that there are no changes)
4. Stack policies — declarative protection against destructive updates
A stack policy is a JSON document that defines which update actions are allowed on which resources. Once defined, every stack update goes through the policy before being executed.
Policy structure:
{
"Statement": [
{
"Effect": "Allow" | "Deny",
"Principal": "*",
"Action": [ "Update:Modify", "Update:Replace", "Update:Delete", "Update:*" ],
"Resource": "LogicalResourceId/ResourceName" | "*",
"Condition": { ... } // optional
}
]
}
The four possible Actions:
Update:Modify → update that keeps the resource (no interruption or some interruption, same physical ID)
Update:Replace → update that recreates the resource (new physical ID)
Update:Delete → removal of the resource from the template
Update:* → any update action (wildcard)
Default behavior — [FACT]:
Without a stack policy: all resources can be updated without restriction.
With a stack policy: every resource is protected by default (an implicit deny of Update:*). You need to explicitly Allow the updates you want to permit.
Practical example — protect RDS and SQS, allow everything else:
{
"Statement": [
{
"Effect": "Allow",
"Principal": "*",
"Action": "Update:*",
"Resource": "*"
},
{
"Effect": "Deny",
"Principal": "*",
"Action": ["Update:Replace", "Update:Delete"],
"Resource": "LogicalResourceId/ProductionDatabase"
},
{
"Effect": "Deny",
"Principal": "*",
"Action": ["Update:Replace", "Update:Delete"],
"Resource": "LogicalResourceId/OrdersQueue"
}
]
}
[FACT] Deny always takes precedence over Allow when statements overlap, the same explicit-deny rule that IAM policies follow.
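The evaluation rules above (default deny once a policy exists, explicit Allow, Deny overriding Allow) fit in a short Python function. This is a simplified sketch and is_allowed is a hypothetical name. It ignores Principal, Condition, NotAction, and NotResource, and it approximates wildcard matching with fnmatch-style patterns. The sample policy is the "protect RDS and SQS" example from this section. AppLoadBalancer is an illustrative logical ID.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, logical_id):
    """Decide whether a stack policy permits `action` on one resource.

    Simplified model: Principal, Condition, NotAction and NotResource are
    ignored, and wildcards are matched with fnmatch-style patterns.
    """
    resource = f"LogicalResourceId/{logical_id}"
    allowed = False  # once a policy exists, anything not allowed is denied
    for stmt in policy.get("Statement", []):
        if not any(fnmatchcase(action, a) for a in _as_list(stmt["Action"])):
            continue
        if not any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"])):
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit Deny overrides any Allow
        allowed = True
    return allowed


policy = {"Statement": [
    {"Effect": "Allow", "Principal": "*", "Action": "Update:*", "Resource": "*"},
    {"Effect": "Deny", "Principal": "*", "Action": ["Update:Replace", "Update:Delete"],
     "Resource": "LogicalResourceId/ProductionDatabase"},
    {"Effect": "Deny", "Principal": "*", "Action": ["Update:Replace", "Update:Delete"],
     "Resource": "LogicalResourceId/OrdersQueue"},
]}
print(is_allowed(policy, "Update:Modify", "ProductionDatabase"))   # True
print(is_allowed(policy, "Update:Replace", "ProductionDatabase"))  # False
print(is_allowed(policy, "Update:Replace", "AppLoadBalancer"))     # True
```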
Applying and managing the policy:
# Apply the policy when creating the stack
aws cloudformation create-stack \
--stack-name minha-stack \
--template-body file://template.yaml \
--stack-policy-body file://stack-policy.json
# Apply it to an existing stack
aws cloudformation set-stack-policy \
--stack-name minha-stack \
--stack-policy-body file://stack-policy.json
# View the current policy
aws cloudformation get-stack-policy \
--stack-name minha-stack
# Temporary override to update a protected resource
# (the original policy applies again automatically after the update)
aws cloudformation update-stack \
--stack-name minha-stack \
--template-body file://template.yaml \
--stack-policy-during-update-body file://override-policy.json
Temporary override, an emergency policy saved as override-policy.json (it allows any update, including replacement, on ProductionDatabase only; the file must be plain JSON, without comments):
{
"Statement": [
{
"Effect": "Allow",
"Principal": "*",
"Action": "Update:*",
"Resource": "LogicalResourceId/ProductionDatabase"
}
]
}
[FACT] The override via --stack-policy-during-update-body is applied only during that specific update. After the update completes (success or rollback), the original policy is automatically restored.
5. Stack termination protection — complement to stack policies
Stack policy protects resources from updates. Termination protection protects the entire stack from being deleted.
# Enable
aws cloudformation update-termination-protection \
--stack-name minha-stack \
--enable-termination-protection
# Check
aws cloudformation describe-stacks \
--stack-name minha-stack \
--query 'Stacks[0].EnableTerminationProtection'
# Disable (required before deleting the stack)
aws cloudformation update-termination-protection \
--stack-name minha-stack \
--no-enable-termination-protection
Difference between the protections:
Termination protection → prevents DELETE of the entire stack
Stack policy → prevents destructive UPDATEs to specific resources
DeletionPolicy: Retain → keeps the resource when it is removed from the template or the stack is deleted
All three are complementary and independent.
Practical example
Scenario: you have a production stack with an RDS and need to update the instance type — a Some Interruption operation. You want to make sure you won't accidentally replace the database.
# 1. Check the current state of the stack
aws cloudformation describe-stacks \
--stack-name prod-stack \
--query 'Stacks[0].[StackStatus,EnableTerminationProtection]'
# 2. Check for drift before any operation
aws cloudformation detect-stack-drift --stack-name prod-stack
# wait until describe-stack-drift-detection-status reports DETECTION_COMPLETE, then:
aws cloudformation describe-stack-resource-drifts \
--stack-name prod-stack \
--stack-resource-drift-status-filters MODIFIED DELETED \
--output table
# If there is drift, assess it before proceeding
# 3. Create a changeset with the instance class change
CHANGE_SET_NAME="update-rds-class-$(date +%Y%m%d)"
aws cloudformation create-change-set \
--stack-name prod-stack \
--change-set-name "$CHANGE_SET_NAME" \
--template-body file://template.yaml \
--parameters \
ParameterKey=DBInstanceClass,ParameterValue=db.t3.large \
--capabilities CAPABILITY_NAMED_IAM
aws cloudformation wait change-set-create-complete \
--stack-name prod-stack \
--change-set-name "$CHANGE_SET_NAME"
# 4. Review: confirm Replacement is False (expected for an instance class change)
aws cloudformation describe-change-set \
--stack-name prod-stack \
--change-set-name "$CHANGE_SET_NAME" \
--query 'Changes[*].ResourceChange.[Action,LogicalResourceId,Replacement,Scope]' \
--output table
# Expected output:
# Action | LogicalResourceId | Replacement | Scope
# Modify | ProductionDB | False | ['Properties']
# ✅ Replacement = False → safe to proceed
# 5. Execute after confirmation
aws cloudformation execute-change-set \
--stack-name prod-stack \
--change-set-name "$CHANGE_SET_NAME"
# 6. Follow progress: list the 10 most recent events (newest first); rerun while the update runs
aws cloudformation describe-stack-events \
--stack-name prod-stack \
--query 'StackEvents[0:10].[Timestamp,LogicalResourceId,ResourceStatus,ResourceStatusReason]' \
--output table
# Or block until the update finishes
aws cloudformation wait stack-update-complete --stack-name prod-stack
Common pitfalls
1. Trusting Replacement: False without checking RequiresRecreation in the Details
The Replacement field at the resource change level is an aggregate. To understand which specific property is causing the change — and whether it could cause replacement in different scenarios — you need to look at Changes[].ResourceChange.Details[].Target.RequiresRecreation. A changeset may show Replacement: False today and Replacement: True tomorrow if the property value changes.
2. Assuming drift detection covers all resources
Several resource types do not support drift detection, among them some Lambda resources, resources from newer services, and custom resources (AWS::CloudFormation::CustomResource). describe-stack-resource-drifts only returns resources that were actually checked. Before trusting a clean drift report, run aws cloudformation describe-stack-resources and look for resources whose DriftInformation.StackResourceDriftStatus is NOT_CHECKED.
3. Stack policy blocking its own rollback
[FACT] If a stack policy denies Update:Replace on a resource, and an update fails and attempts to rollback by creating a new version of the resource, the rollback can also be blocked by the policy — leaving the stack in UPDATE_ROLLBACK_FAILED. To recover from this state: aws cloudformation continue-update-rollback with --resources-to-skip to skip the problematic resource.
Reflection exercise
You maintain a production stack with the following resources: an RDS PostgreSQL, an SQS queue, an S3 bucket, and an ALB. A colleague creates a stack policy that only denies Update:Replace and Update:Delete for the RDS and the SQS queue, and allows Update:* for everything else.
Three months later, someone changes the S3 BucketName in the template (a Replacement operation) and deploys. The bucket is replaced and the data is lost.
Where did the protection fail? Rewrite the stack policy to cover this case without blocking legitimate maintenance operations on the other resources. Also consider: what would need to be done in the template beyond the stack policy to ensure multi-layered protection?
Resources for further reading
Changesets:
- Update CloudFormation stacks using change sets — covers the full cycle: creation, viewing, and execution, with the output fields explained.
Update behaviors:
- Understand update behaviors of stack resources — reference for the three behaviors with examples of resources and properties for each category.
Drift detection:
- Detect unmanaged configuration changes to stacks and resources with drift detection — includes the list of supported resources and the detailed format of per-resource drift output.
Stack policies:
- Prevent updates to stack resources — policy JSON structure, the four available actions, and how to perform a temporary override.