luizmachado.dev

Session 011 — CDK: CustomResources and Aspects

Estimated duration: 60 minutes
Prerequisites: session-010-cdk-pipelines-stages-shellsteps


Objective

By the end, you will be able to create a CustomResource that invokes a Lambda to provision resources not natively supported by CloudFormation, and apply an Aspect to traverse all constructs in a stack (e.g., enforce encryption on all S3 buckets automatically).


Context

[FACT] CloudFormation only knows the resource types that AWS has published — approximately 1,200 types as of May 2026. Any resource outside that catalog (configurations in third-party APIs, database initialization operations, DNS records in external providers, data bootstrapping) requires an extension mechanism. This mechanism has existed since 2012 and is called Custom Resource: a Type: Custom::* block in the template that delegates the lifecycle (Create/Update/Delete) to a Lambda or an SNS topic.

[CONSENSUS] The CDK offers two abstractions over Custom Resources: custom_resources.Provider (a mini-framework with support for asynchronous operations, retry, and timeout) and AwsCustomResource (a wrapper that executes a single SDK call without you needing to write Lambda code). The choice between the two follows a simple heuristic: if the logic fits in a single API call, use AwsCustomResource; if it requires business logic, polling, or multiple calls, use Provider.

[FACT] Aspects are a construct tree traversal mechanism introduced in CDK v1 and maintained in v2. They implement the Visitor pattern: you write a visit(node) function and the CDK invokes it for each node in the tree during synthesis, after all constructs have been instantiated. This allows you to inspect or modify the tree in a cross-cutting manner — without each construct needing to know about the rule being applied.


Key concepts

1. CloudFormation Custom Resource: the base protocol

Before looking at the CDK abstraction, understand what's underneath. When CloudFormation needs to create a Custom Resource, it sends a request to the target named in ServiceToken (it invokes the Lambda or publishes to the SNS topic) with a JSON payload like this:

{
  "RequestType": "Create",
  "ResponseURL": "https://s3.amazonaws.com/pre-signed-url...",
  "StackId": "arn:aws:cloudformation:...",
  "RequestId": "abc123",
  "ResourceType": "Custom::AcmeCertRegistration",
  "LogicalResourceId": "MyCert",
  "ResourceProperties": {
    "Domain": "app.example.com",
    "ServiceToken": "arn:aws:lambda:..."
  }
}

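The handler (a Lambda, or a subscriber of the SNS topic) must respond by sending a JSON object with an HTTP PUT to the pre-signed S3 URL given in ResponseURL. The object contains Status (SUCCESS or FAILED), PhysicalResourceId, the StackId, RequestId, and LogicalResourceId copied from the request, and optionally Data (attributes that become available via Fn::GetAtt).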
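
As a minimal sketch of that response, the function below assembles the JSON body a handler would send back. The names CfnRequest, CfnResponse, and buildResponse are illustrative and not part of any AWS SDK, and the HTTP PUT to ResponseURL is deliberately left out. Besides Status and PhysicalResourceId, the body echoes StackId, RequestId, and LogicalResourceId from the request.

```typescript
// Subset of the request fields shown above that the response must echo back.
interface CfnRequest {
  RequestType: 'Create' | 'Update' | 'Delete';
  ResponseURL: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
  PhysicalResourceId?: string; // present on Update and Delete only
}

interface CfnResponse {
  Status: 'SUCCESS' | 'FAILED';
  Reason?: string; // CloudFormation requires a reason when Status is FAILED
  PhysicalResourceId: string;
  StackId: string;
  RequestId: string;
  LogicalResourceId: string;
  Data?: Record<string, string>; // exposed to the template via Fn::GetAtt
}

// Hypothetical helper: builds the response body for a given request.
function buildResponse(
  req: CfnRequest,
  status: 'SUCCESS' | 'FAILED',
  physicalResourceId: string,
  data?: Record<string, string>,
  reason?: string,
): CfnResponse {
  const res: CfnResponse = {
    Status: status,
    PhysicalResourceId: physicalResourceId,
    StackId: req.StackId,
    RequestId: req.RequestId,
    LogicalResourceId: req.LogicalResourceId,
  };
  if (status === 'FAILED') res.Reason = reason ?? 'Unspecified failure';
  if (data !== undefined) res.Data = data;
  return res;
}

// Example request mirroring the payload above (placeholder values kept as-is).
const exampleRequest: CfnRequest = {
  RequestType: 'Create',
  ResponseURL: 'https://s3.amazonaws.com/pre-signed-url...',
  StackId: 'arn:aws:cloudformation:...',
  RequestId: 'abc123',
  LogicalResourceId: 'MyCert',
};

const okResponse = buildResponse(
  exampleRequest, 'SUCCESS', 'acme-cert-app.example.com', { CertificateId: 'cert-123' });
const failedResponse = buildResponse(
  exampleRequest, 'FAILED', 'acme-cert-app.example.com', undefined, 'Registration rejected');
```

The CertificateId attribute is made up for the example; whatever keys you place in Data are the names you would later read with Fn::GetAtt.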

[FACT] The Physical Resource ID is the most critical field in the protocol. It uniquely identifies the instance of the external resource. The rules are:

CREATE:  you generate the PhysicalResourceId (e.g.: "acme-cert-app.example.com")
UPDATE:  you return the same ID if the operation is in-place,
         or a different ID if it's a replacement.
         → If the ID changes, CloudFormation sends a DELETE event
           for the old ID once the update succeeds (cleanup phase)!
DELETE:  CloudFormation sends the ID that was stored during Create/Update.
         You use it to destroy the external resource.
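
One way to keep these rules under control is to derive the ID only from the property that identifies the external resource. The sketch below assumes a hypothetical certificate resource whose identity is its domain; physicalIdFor and isReplacement are illustrative names, not CDK or CloudFormation APIs.

```typescript
type RequestType = 'Create' | 'Update' | 'Delete';

interface CertEvent {
  RequestType: RequestType;
  PhysicalResourceId?: string; // set by CloudFormation on Update and Delete
  ResourceProperties: { Domain: string; Version?: string };
}

// Hypothetical ID scheme: the ID depends on the domain only, so changing any
// other property (such as Version) keeps the update in-place.
function physicalIdFor(ev: CertEvent): string {
  if (ev.RequestType === 'Delete') {
    // Delete always targets the ID that CloudFormation stored earlier.
    if (ev.PhysicalResourceId === undefined) {
      throw new Error('Delete event without PhysicalResourceId');
    }
    return ev.PhysicalResourceId;
  }
  return `acme-cert-${ev.ResourceProperties.Domain}`;
}

// True when an Update returns a different ID, which CloudFormation treats as
// a replacement followed by a Delete for the old ID.
function isReplacement(ev: CertEvent): boolean {
  return ev.RequestType === 'Update' && physicalIdFor(ev) !== ev.PhysicalResourceId;
}

const inPlaceUpdate: CertEvent = {
  RequestType: 'Update',
  PhysicalResourceId: 'acme-cert-app.example.com',
  ResourceProperties: { Domain: 'app.example.com', Version: '3' },
};

const domainChange: CertEvent = {
  RequestType: 'Update',
  PhysicalResourceId: 'acme-cert-app.example.com',
  ResourceProperties: { Domain: 'api.example.com' },
};

const inPlaceIsReplacement = isReplacement(inPlaceUpdate);   // false: same domain, same ID
const domainChangeIsReplacement = isReplacement(domainChange); // true: new domain, new ID
```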

[FACT] If the Lambda fails (uncaught exception) without sending a response, or doesn't respond within 60 minutes (the default Custom Resource timeout), CloudFormation keeps waiting until that timeout expires and only then fails the operation and rolls back. This is why the CDK Provider exists: it writes the HTTP response for you, including a FAILED response when your handler throws, and manages the timeout.


2. custom_resources.Provider — the CDK mini-framework

[FACT] The Provider is a construct that creates all the infrastructure needed to manage a Custom Resource robustly:

┌─────────────────────────────────────────────────────────────┐
│  Provider (construct)                                       │
│                                                             │
│  ┌──────────────┐    ┌──────────────────────────────────┐  │
│  │  onEvent     │    │  (if isComplete is defined)      │  │
│  │  Lambda      │    │  ┌─────────────────────────────┐ │  │
│  │              │───▶│  │ Step Functions state machine│ │  │
│  │  CREATE/     │    │  │  ┌──────────┐  ┌─────────┐  │ │  │
│  │  UPDATE/     │    │  │  │isComplete│  │ waiter  │  │ │  │
│  │  DELETE      │    │  │  │  Lambda  │◀─│  loop   │  │ │  │
│  └──────────────┘    │  │  └──────────┘  └─────────┘  │ │  │
│                      │  └─────────────────────────────┘ │  │
│                      └──────────────────────────────────┘  │
│                                                             │
│  serviceToken: ARN of the framework Lambda                  │
└─────────────────────────────────────────────────────────────┘
         │
         │ serviceToken
         ▼
┌─────────────────┐
│  CustomResource │  ◀─── CloudFormation treats it as an ordinary resource
│  (construct)    │
└─────────────────┘

The minimal setup in TypeScript:

import * as cr from 'aws-cdk-lib/custom-resources';
import { CustomResource, Duration } from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// 1. Your handler Lambda
const onEventHandler = new lambda.Function(this, 'OnEvent', {
  runtime: lambda.Runtime.NODEJS_22_X,
  handler: 'index.handler',
  code: lambda.Code.fromInline(`
    // doSomething is a placeholder for your own provisioning logic
    exports.handler = async (event) => {
      console.log('Event:', JSON.stringify(event));
      const physicalId = event.PhysicalResourceId || 'my-resource-' + Date.now();

      if (event.RequestType === 'Delete') {
        // clean up the external resource here
        return { PhysicalResourceId: physicalId };
      }

      // create or update
      const result = await doSomething(event.ResourceProperties);
      return {
        PhysicalResourceId: result.id,
        Data: { Endpoint: result.endpoint, ApiKey: result.apiKey },
      };
    };
  `),
});

// 2. Provider that registers the handler
const provider = new cr.Provider(this, 'Provider', {
  onEventHandler,
  // isCompleteHandler: optionally, a second Lambda for polling
  // totalTimeout: Duration.minutes(30),  // upper bound for async operations
});

// 3. Custom Resource wired to the Provider
const resource = new CustomResource(this, 'Resource', {
  serviceToken: provider.serviceToken,
  properties: {
    Domain: 'app.example.com',
    Version: '2',   // changing this value later triggers an UPDATE
  },
  resourceType: 'Custom::AcmeCertRegistration',   // the Custom:: prefix is mandatory
});

// 4. Reading attributes returned by the handler (via Data{})
const endpoint = resource.getAttString('Endpoint');

[FACT] The Provider's serviceToken is always the ARN of a Lambda: a framework function that the Provider creates in front of your handlers (CloudFormation accepts only Lambda or SNS ARNs as service tokens). When isCompleteHandler is defined, the Provider also creates a Step Functions state machine. The framework calls onEvent once, then the state machine keeps polling isComplete every queryInterval (default 5 seconds) until it returns { IsComplete: true } or the totalTimeout is reached.
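
The polling contract can be sketched without any AWS dependency. The code below is a simplified model under stated assumptions, not the Provider's implementation: StatusLookup stands in for a call to the external service, isComplete is synchronous for brevity (a real Lambda handler would be async), and simulateWaiter only mimics the "poll until complete or out of time" behavior.

```typescript
interface IsCompleteResult {
  IsComplete: boolean;
  Data?: Record<string, string>; // extra attributes, returned only when complete
}

// Hypothetical status source standing in for the external service.
type StatusLookup = (physicalResourceId: string) => 'PENDING' | 'READY';

// Shape of a user-written isComplete handler.
function isComplete(physicalResourceId: string, lookup: StatusLookup): IsCompleteResult {
  const current = lookup(physicalResourceId);
  return current === 'READY'
    ? { IsComplete: true, Data: { Status: current } }
    : { IsComplete: false };
}

// Simplified model of the waiter: poll until complete or the time budget is spent.
function simulateWaiter(
  physicalResourceId: string,
  lookup: StatusLookup,
  queryIntervalSeconds: number,
  totalTimeoutSeconds: number,
): { completed: boolean; polls: number } {
  const maxPolls = Math.floor(totalTimeoutSeconds / queryIntervalSeconds);
  for (let polls = 1; polls <= maxPolls; polls++) {
    if (isComplete(physicalResourceId, lookup).IsComplete) {
      return { completed: true, polls };
    }
  }
  return { completed: false, polls: maxPolls };
}

// Example 1: the external resource becomes READY on the third poll.
let lookupCalls = 0;
const readyOnThirdPoll: StatusLookup = () => (++lookupCalls >= 3 ? 'READY' : 'PENDING');
const finished = simulateWaiter('my-resource-1', readyOnThirdPoll, 5, 30 * 60);

// Example 2: the resource never becomes ready within a 60-second budget.
const timedOut = simulateWaiter('my-resource-2', () => 'PENDING', 5, 60);
```

With a 5-second interval, a 60-second budget allows 12 polls in this model, which is why long-running external operations need a totalTimeout sized to the slowest realistic case.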


3. AwsCustomResource — the shortcut for single SDK calls

[FACT] AwsCustomResource + AwsSdkCall is a high-level abstraction that allows you to execute any AWS SDK call without writing Lambda code. Internally, it uses the Provider with a generic Lambda that executes SDK calls dynamically.

Classic use case: fetching a parameter from SSM Parameter Store in a different region than the stack (the ssm.StringParameter.valueFromLookup only works at synthesis time, not at deploy time):

import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId } from 'aws-cdk-lib/custom-resources';

const getParam = new AwsCustomResource(this, 'GetParam', {
  onUpdate: {   // when onCreate is not set, the 'onUpdate' call also runs on Create
    service: 'SSM',
    action: 'getParameter',
    parameters: {
      Name: '/shared/database-endpoint',
      WithDecryption: true,
    },
    region: 'us-east-1',                         // different region from the stack!
    // a new ID on every synth forces the call to run again on every deploy
    physicalResourceId: PhysicalResourceId.of(Date.now().toString()),
  },
  policy: AwsCustomResourcePolicy.fromSdkCalls({
    resources: AwsCustomResourcePolicy.ANY_RESOURCE,  // in prod: a specific ARN
  }),
});

const dbEndpoint = getParam.getResponseField('Parameter.Value');

[CONSENSUS] The AwsCustomResource requires you to explicitly define the policy — the IAM permissions that the generic Lambda needs to execute the SDK call. Using ANY_RESOURCE is convenient in development, but in production always restrict to the specific resource ARN.
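
For the SSM example above, restricting the policy means naming the parameter's ARN. The helper below is a small sketch for building that string; ssmParameterArn is a hypothetical name, the account ID is a placeholder, and the default partition aws is an assumption that does not hold for GovCloud or China regions.

```typescript
// Hypothetical helper: builds the ARN of an SSM parameter so the policy can
// name one resource instead of ANY_RESOURCE.
function ssmParameterArn(
  region: string,
  accountId: string,
  parameterName: string,
  partition: string = 'aws',
): string {
  // Names that start with "/" must not produce "parameter//..." in the ARN.
  const relativeName = parameterName.startsWith('/')
    ? parameterName.slice(1)
    : parameterName;
  return `arn:${partition}:ssm:${region}:${accountId}:parameter/${relativeName}`;
}

const databaseEndpointArn = ssmParameterArn(
  'us-east-1', '123456789012', '/shared/database-endpoint');

// The resulting string would replace ANY_RESOURCE in the policy:
//   policy: AwsCustomResourcePolicy.fromSdkCalls({ resources: [databaseEndpointArn] })
```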


4. Aspects and IAspect: the Visitor pattern on the construct tree

[FACT] An Aspect is any object that implements the IAspect interface:

interface IAspect {
  visit(node: IConstruct): void;
}

You apply it to a scope with Aspects.of(scope).add(aspect). During the synthesis phase (after the entire tree has been constructed), the CDK traverses the tree depth-first in pre-order, invoking visit(node) on each construct:

App
└── Stack
    ├── Bucket1          ← visit() called here
    ├── Lambda1          ← and here
    │   └── Role         ← and here (child of Lambda1)
    └── Queue1           ← and here

If you apply the Aspect to the App, it visits all constructs in the application. If you apply it to a specific Stack, it visits only the constructs in that stack.
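
The traversal order and the effect of scope can be illustrated with a toy tree. This is a simplified model, not CDK code: TreeNode and traverse are made-up names, and real constructs carry far more than an id and children.

```typescript
// Simplified stand-in for a construct: just an id and its children.
interface TreeNode {
  id: string;
  children: TreeNode[];
}

// Depth-first, pre-order: visit the node itself, then each child in order.
function traverse(node: TreeNode, visit: (n: TreeNode) => void): void {
  visit(node);
  for (const child of node.children) traverse(child, visit);
}

const leafNode = (id: string): TreeNode => ({ id, children: [] });

// Same shape as the tree drawn above.
const lambdaNode: TreeNode = { id: 'Lambda1', children: [leafNode('Role')] };
const stackNode: TreeNode = {
  id: 'Stack',
  children: [leafNode('Bucket1'), lambdaNode, leafNode('Queue1')],
};
const appNode: TreeNode = { id: 'App', children: [stackNode] };

// Applying the visitor at the App scope reaches every node.
const visitedFromApp: string[] = [];
traverse(appNode, (n) => { visitedFromApp.push(n.id); });

// Applying it at a narrower scope reaches only that subtree.
const visitedFromLambda: string[] = [];
traverse(lambdaNode, (n) => { visitedFromLambda.push(n.id); });
```

In this model the App-level traversal yields App, Stack, Bucket1, Lambda1, Role, Queue1 in that order, while the Lambda1-level traversal yields only Lambda1 and Role.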

[FACT] Inside visit(), you can:

  1. Inspect the construct and add error or warning annotations:
import { Annotations, IAspect } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { IConstruct } from 'constructs';

class BucketEncryptionChecker implements IAspect {
  visit(node: IConstruct): void {
    if (node instanceof s3.CfnBucket) {
      if (!node.bucketEncryption) {
        Annotations.of(node).addError(
          'All S3 buckets must have encryption configured. ' +
          'Use BucketEncryption.S3_MANAGED or KMS.'
        );
      }
    }
  }
}
  2. Mutate the construct, adding properties or calling methods:
class EnforceS3Encryption implements IAspect {
  visit(node: IConstruct): void {
    if (node instanceof s3.CfnBucket) {
      // force SSE-S3 encryption on any bucket that has no encryption configured
      if (!node.bucketEncryption) {
        node.bucketEncryption = {
          serverSideEncryptionConfiguration: [{
            serverSideEncryptionByDefault: {
              sseAlgorithm: 'AES256',
            },
          }],
        };
      }
    }
  }
}

[FACT] addError() causes cdk synth to fail with a descriptive message: the CLI exits with an error, so nothing can be deployed from that synthesis. addWarning() and addInfo() only emit messages but do not interrupt synthesis. This makes Aspects suitable for compliance gates: the CDK pipeline fails at synth before even reaching deploy.


5. Aspects in practice: Tags, Annotations, and limitations

[FACT] The CDK's own Tags system is internally implemented as an Aspect. When you call Tags.of(stack).add('Environment', 'production'), the CDK registers an Aspect that, when visiting each construct, adds the tag to the corresponding CloudFormation resource.

[FACT] When traversing the tree, the node received in visit() is typed as IConstruct. To act only on a specific resource type, use instanceof:

// For L2 constructs (CDK level):
if (node instanceof s3.Bucket) { ... }

// For L1 resources (CloudFormation level):
if (node instanceof s3.CfnBucket) { ... }

[FACT] There is an important difference between visiting L2 and L1:

  • Visiting s3.Bucket (L2) gives access to the CDK's high-level API (bucket.addLifecycleRule(), bucket.grantRead()). However, the L2 may not exist if the resource was created via L1 directly.
  • Visiting s3.CfnBucket (L1) guarantees that you see all buckets, but you manipulate CloudFormation properties directly (as in the bucketEncryption example above).

[CONSENSUS] The dominant convention in the CDK community is to visit L1 in compliance and mutation Aspects, because it is the common denominator — every L2 resource eventually creates an L1.

Critical limitation: [FACT] You must not create new constructs inside visit(). The CDK does not guarantee the order of Aspect application relative to tree construction, and adding constructs during visitation can cause undefined behavior (ignored constructs, synthesis loops). If you need to create infrastructure conditionally, create it in the constructor and enable/disable via properties.


Practical example

Scenario: You have an infrastructure stack that creates S3 buckets in different parts of the code. The company's security policy requires that every bucket has SSE-KMS encryption (not SSE-S3) and has versioning enabled. Instead of auditing each new s3.Bucket(...) manually, you want cdk synth to fail automatically if any bucket violates the rules.

File structure

lib/
  aspects/
    bucket-compliance.ts   ← Validation Aspect
  stacks/
    storage-stack.ts       ← Stack with the buckets
  app.ts                   ← Entry point

lib/aspects/bucket-compliance.ts

import { IAspect, Annotations } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { IConstruct } from 'constructs';

export class BucketComplianceAspect implements IAspect {
  visit(node: IConstruct): void {
    // Visit only CfnBucket (L1) resources so that every bucket is covered
    if (!(node instanceof s3.CfnBucket)) return;

    // Rule 1: KMS encryption is mandatory
    const enc = node.bucketEncryption as s3.CfnBucket.BucketEncryptionProperty | undefined;
    const rules = enc?.serverSideEncryptionConfiguration;
    // The property can also hold an unresolved token, so require a plain array
    const hasKms = Array.isArray(rules) && rules.some(
      (rule: any) =>
        rule.serverSideEncryptionByDefault?.sseAlgorithm === 'aws:kms'
    );
    if (!hasKms) {
      Annotations.of(node).addError(
        '[SECURITY] Bucket without SSE-KMS encryption. ' +
        'Set encryptionKey or use BucketEncryption.KMS.'
      );
    }

    // Rule 2: versioning is mandatory
    const versioning = node.versioningConfiguration as
      s3.CfnBucket.VersioningConfigurationProperty | undefined;
    if (versioning?.status !== 'Enabled') {
      Annotations.of(node).addWarning(
        '[COMPLIANCE] Bucket without versioning enabled. ' +
        'Enable it with versioned: true.'
      );
    }
  }
}

lib/stacks/storage-stack.ts

import { Stack, StackProps } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as kms from 'aws-cdk-lib/aws-kms';
import { Construct } from 'constructs';

export class StorageStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const key = new kms.Key(this, 'StorageKey', { enableKeyRotation: true });

    // Compliant bucket: KMS + versioning
    new s3.Bucket(this, 'LogsBucket', {
      encryptionKey: key,
      encryption: s3.BucketEncryption.KMS,
      versioned: true,
    });

    // NON-compliant bucket: no KMS encryption and no versioning
    // The Aspect will emit an error and a warning here
    new s3.Bucket(this, 'TempBucket');
  }
}

lib/app.ts

import { App, Aspects } from 'aws-cdk-lib';
import { StorageStack } from './stacks/storage-stack';
import { BucketComplianceAspect } from './aspects/bucket-compliance';

const app = new App();
const storageStack = new StorageStack(app, 'StorageStack');

// Apply the Aspect to the entire stack
Aspects.of(storageStack).add(new BucketComplianceAspect());

app.synth();

cdk synth result

[Error at /StorageStack/TempBucket/Resource] [SECURITY] Bucket without SSE-KMS 
encryption. Set encryptionKey or use BucketEncryption.KMS.

[Warning at /StorageStack/TempBucket/Resource] [COMPLIANCE] Bucket without 
versioning enabled. Enable it with versioned: true.

Found errors

Synthesis fails (Found errors). The TempBucket needs to be fixed before any deploy can happen.


Bonus: CustomResource with the full Provider

Real problem: you need, when creating the stack, to register a webhook in an external service (e.g., GitHub) using that service's REST API. AwsCustomResource covers AWS SDK calls, but not arbitrary HTTP calls. Here you use the full Provider.

// lib/constructs/github-webhook.ts
import { Construct } from 'constructs';
import { CustomResource } from 'aws-cdk-lib';
import * as cr from 'aws-cdk-lib/custom-resources';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as path from 'path';

export class GithubWebhook extends Construct {
  constructor(scope: Construct, id: string, props: {
    repo: string;
    webhookUrl: string;
    githubTokenSecret: string; // Secrets Manager ARN
  }) {
    super(scope, id);

    const handler = new lambda.Function(this, 'Handler', {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset(path.join(__dirname, '..', 'lambda', 'github-webhook')),
      environment: {
        GITHUB_TOKEN_SECRET: props.githubTokenSecret,
      },
    });

    // Permission to read the secret
    // handler.addToRolePolicy(new iam.PolicyStatement({ ... }));

    const provider = new cr.Provider(this, 'Provider', {
      onEventHandler: handler,
    });

    new CustomResource(this, 'Resource', {
      serviceToken: provider.serviceToken,
      properties: {
        Repo: props.repo,
        WebhookUrl: props.webhookUrl,
        // Do not pass the token (or SecretValue...toString(), which is a
        // reference to the secret value, not a hash) as a property: it could
        // reach the handler event and its logs in plain text. To re-register
        // the webhook when the token changes, pass a non-secret marker such
        // as a version label instead; the handler reads the token at runtime.
      },
      resourceType: 'Custom::GithubWebhook',
    });
  }
}
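
The handler code under lambda/github-webhook is not shown in this session. As a rough sketch of its core decision, the function below maps each lifecycle event to the GitHub REST call it would make (create, update, or delete a repository webhook). planGithubRequest and the interfaces are hypothetical names, Repo is assumed to be in "owner/name" form, the hook id is assumed to be used as the PhysicalResourceId, and subscribing to the push event is an arbitrary choice for illustration. Sending the request and reading the token from Secrets Manager are left out.

```typescript
interface WebhookEvent {
  RequestType: 'Create' | 'Update' | 'Delete';
  PhysicalResourceId?: string; // assumed to hold the GitHub hook id once created
  ResourceProperties: { Repo: string; WebhookUrl: string }; // Repo as "owner/name"
}

interface PlannedRequest {
  method: 'POST' | 'PATCH' | 'DELETE';
  path: string;
  body?: unknown;
}

// Hypothetical helper: decides which GitHub REST call a lifecycle event needs.
function planGithubRequest(ev: WebhookEvent): PlannedRequest {
  const { Repo, WebhookUrl } = ev.ResourceProperties;
  const config = { url: WebhookUrl, content_type: 'json' };
  switch (ev.RequestType) {
    case 'Create':
      // Create a repository webhook; the response would carry the new hook id.
      return {
        method: 'POST',
        path: `/repos/${Repo}/hooks`,
        body: { name: 'web', active: true, events: ['push'], config },
      };
    case 'Update':
      // In-place update of the existing hook: keep the same PhysicalResourceId.
      return {
        method: 'PATCH',
        path: `/repos/${Repo}/hooks/${ev.PhysicalResourceId}`,
        body: { config },
      };
    case 'Delete':
      return {
        method: 'DELETE',
        path: `/repos/${Repo}/hooks/${ev.PhysicalResourceId}`,
      };
  }
}

const createPlan = planGithubRequest({
  RequestType: 'Create',
  ResourceProperties: { Repo: 'acme/app', WebhookUrl: 'https://hooks.example.com/github' },
});

const deletePlan = planGithubRequest({
  RequestType: 'Delete',
  PhysicalResourceId: '42',
  ResourceProperties: { Repo: 'acme/app', WebhookUrl: 'https://hooks.example.com/github' },
});
```

Note that this sketch treats every Update as in-place. If Repo itself changes, the hook would have to be created in the new repository and a new ID returned, so that CloudFormation deletes the old one.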

Common pitfalls

Pitfall 1: Changing the PhysicalResourceId in an Update causes Delete of the old resource

The mistake: You have a Custom Resource that generates an ID based on some properties. In an update, you change a property that makes the ID different. CloudFormation sends an Update, your handler returns the new ID, and CloudFormation then sends a Delete carrying the old ID. Your Lambda will delete the resource behind the old ID, which is frequently correct, but it can be a silent bug if you weren't expecting the Delete (for example, when the old and new IDs point to the same external resource).

Why it happens: The CloudFormation Custom Resource protocol specifies that, if the PhysicalResourceId changes in an Update, the old resource was "replaced" and must be deleted.

How to recognize it: In the Lambda logs (CloudWatch), you see two events: RequestType: Update followed by RequestType: Delete with the previous ID.

How to avoid it: Be deliberate about when to change the PhysicalResourceId. If the operation is always in-place (e.g., updating configuration of an existing resource), always return the same ID. If the operation creates a new resource (e.g., registering a certificate with a different domain), changing the ID is correct — but ensure that the DELETE handler knows how to deal with attempting to delete something that may no longer exist.
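
A Delete handler that tolerates missing resources can be sketched as follows. FakeExternalService is an in-memory stand-in invented for this example, and handleDelete is a hypothetical name; with a real API the equivalent is treating a "not found" response as success.

```typescript
// Hypothetical in-memory stand-in for the external service.
class FakeExternalService {
  private readonly resources = new Set<string>();

  create(id: string): void {
    this.resources.add(id);
  }

  // Returns false when the resource does not exist, like a 404 from a real API.
  delete(id: string): boolean {
    return this.resources.delete(id);
  }

  has(id: string): boolean {
    return this.resources.has(id);
  }
}

// Delete logic: a missing resource counts as success, so a Delete for an ID
// that is already gone never fails the stack operation.
function handleDelete(
  service: FakeExternalService,
  physicalResourceId: string,
): { PhysicalResourceId: string; alreadyGone: boolean } {
  const existed = service.delete(physicalResourceId);
  return { PhysicalResourceId: physicalResourceId, alreadyGone: !existed };
}

// Replacement scenario: the new resource exists, then the old one is deleted.
const acme = new FakeExternalService();
acme.create('acme-cert-app.example.com'); // old ID
acme.create('acme-cert-api.example.com'); // new ID returned by the Update

const firstDelete = handleDelete(acme, 'acme-cert-app.example.com');
const repeatedDelete = handleDelete(acme, 'acme-cert-app.example.com');
const newResourceSurvives = acme.has('acme-cert-api.example.com');
```

The first Delete removes the old resource, the repeated Delete succeeds without doing anything, and the resource behind the new ID is untouched because the handler acts only on the ID it was given.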


Pitfall 2: Aspect visiting L2 doesn't catch buckets created via L1

The mistake: You write if (node instanceof s3.Bucket) in your Aspect, but part of the infrastructure uses new s3.CfnBucket(...) directly. The Aspect passes over those resources without detecting them.

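Why it happens: s3.Bucket and s3.CfnBucket are unrelated classes: an L2 Bucket does not extend CfnBucket, it creates one as a child construct. instanceof checks the object's class hierarchy, not its relationship with the underlying CloudFormation resource, so a CfnBucket instantiated directly is never an instance of s3.Bucket.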

How to recognize it: cdk synth passes without errors for a CfnBucket that should have been intercepted. A manual review or a unit test (using Template.hasResourceProperties) can reveal the problem.

How to avoid it: In compliance Aspects that need to cover 100% of resources of a type, always visit L1 (CfnBucket, CfnFunction, etc.). If you need to visit L2 for some reason, also add a separate check for L1.
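
The difference is easy to reproduce with toy classes. The sketch below does not use the CDK: ToyBucket and ToyCfnBucket merely play the roles of s3.Bucket and s3.CfnBucket, under the assumption (true for the real classes) that the L2 creates the L1 as a child instead of extending it.

```typescript
// Toy stand-ins for the two levels; these are not the CDK classes.
class ToyNode {
  readonly id: string;
  readonly children: ToyNode[] = [];
  constructor(id: string) {
    this.id = id;
  }
}

// Plays the role of s3.CfnBucket (L1).
class ToyCfnBucket extends ToyNode {}

// Plays the role of s3.Bucket (L2): it wraps an L1 child, it does not extend it.
class ToyBucket extends ToyNode {
  constructor(id: string) {
    super(id);
    this.children.push(new ToyCfnBucket(`${id}/Resource`));
  }
}

// Collects the ids of all nodes in the tree that satisfy the check.
function collect(root: ToyNode, match: (n: ToyNode) => boolean, out: string[] = []): string[] {
  if (match(root)) out.push(root.id);
  for (const child of root.children) collect(child, match, out);
  return out;
}

const toyStack = new ToyNode('Stack');
toyStack.children.push(new ToyBucket('Logs'), new ToyCfnBucket('RawBucket'));

// An L2 check misses the bucket created directly as L1.
const seenByL2Check = collect(toyStack, (n) => n instanceof ToyBucket);

// An L1 check sees both: the child of the L2 and the standalone L1.
const seenByL1Check = collect(toyStack, (n) => n instanceof ToyCfnBucket);
```

Here the L2 check finds only Logs, while the L1 check finds Logs/Resource and RawBucket, which is the coverage a compliance Aspect needs.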


Pitfall 3: Creating constructs inside visit() causes undefined behavior

The mistake: Inside visit(node), you identify that a bucket doesn't have a log bucket and try to create a new s3.Bucket(this, 'AccessLogs', ...) to configure it automatically. On the first run it seems to work, but in subsequent deployments the CDK emits warnings about non-deterministic synthesis, or the new bucket is not included in the already-applied Aspects.

Why it happens: The synthesis phase traverses the tree once. Adding new constructs during that traversal is like adding elements to a list while you iterate over it — the behavior is undefined by the CDK specification.

How to avoid it: Never instantiate new SomeConstruct(...) inside visit(). If you need to create resources conditionally, create them in the constructor and use flags or props to control creation. If you need to detect the absence of something and fix it, consider creating a custom L2 construct that encapsulates the rules in the constructor, instead of using an Aspect for mutation.


Reflection exercise

You are building an internal platform where different teams create independent CDK stacks. The security policy mandates:

  1. Every S3 bucket must use SSE-KMS with a KMS key that has automatic rotation enabled.
  2. Every SNS topic must have encryption at rest enabled.
  3. Every Lambda function must have a DLQ (Dead Letter Queue) configured.
  4. No Security Group can have an ingress rule with cidr 0.0.0.0/0 on port 22.

How would you structure Aspects to implement these four rules? For each rule, decide: would you visit L1 or L2? Would you use addError() (blocking) or addWarning() (informational)? How would you organize the Aspects in code — a single Aspect with all checks, or one Aspect per rule? Consider how this solution would scale if the company grew to 20 compliance rules and 50 teams with independent stacks. Are there cases where an automatic mutation Aspect (silently fixing instead of reporting the error) would be preferable? What are the risks of that approach in a team environment?


Resources for further study

1. CDK v2 Guide — Custom Resources (cfn_layer)

URL: https://docs.aws.amazon.com/cdk/v2/guide/cfn_layer.html#develop-customize-expose
What to find: The "Develop a custom resource provider" section explains the CDK's escape hatch layer for interacting directly with CloudFormation Custom Resources, including how to expose attributes via Data and how to use the escape hatch node.defaultChild for customizations.
Why it's the right source: It's the official CDK documentation for the "expose" approach — the closest point to creating your own resource types via CDK.

2. aws-cdk-lib.custom_resources module README

URL: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.custom_resources-readme.html
What to find: Complete documentation of Provider, AwsCustomResource, and AwsSdkCall. Includes diagrams of the Provider's internal architecture (with Step Functions), examples of isCompleteHandler for asynchronous resources, and the complete list of response properties.
Why it's the right source: It's the canonical reference for the module — more detailed than the narrative guide.

3. CDK v2 Guide — Aspects

URL: https://docs.aws.amazon.com/cdk/v2/guide/aspects.html
What to find: Examples of Aspects for validation (with addError) and for applying tags. Explains the execution order (depth-first, pre-order) and how to apply Aspects at different scopes (App, Stack, individual Construct).
Why it's the right source: It's the official document for the feature, with examples in TypeScript, Python, and Java, and clarifies the guarantees and limitations of the mechanism.