luizmachado.dev


Session 024 — Lambda Observability: structured logging, X-Ray and Lambda Insights

Estimated duration: 60 minutes
Prerequisites: session-023-stepfunctions-parallel-map-error


Objective

By the end, you will be able to emit structured logs (JSON) from a Lambda with correlation fields (requestId, userId), enable X-Ray active tracing and add custom subsegments, and enable Lambda Insights to see duration, error, and init time metrics per function.


Context

[FACT] Observability in distributed systems rests on three classic pillars: logs (discrete event records), metrics (numeric time series), and traces (a request followed across services). In Lambda, each invocation is ephemeral and may be spread across hundreds of concurrent execution environments, which makes correlating these three pillars especially critical.

[CONSENSUS] The biggest observability problem in Lambda is not lack of data, but lack of correlation. CloudWatch already captures native metrics and logs by default. What differentiates an observable system from a monitored system is the ability to, given a request ID or a traceId, quickly find the function's log, the complete X-Ray trace, the performance metrics of the specific worker, and the errors that occurred. Structured logging, X-Ray, and Lambda Insights are the three tools that allow building this correlation systematically in Lambda.

[FACT] Since 2023, Lambda natively supports JSON format for system logs (the messages the Lambda service itself emits, such as START, END and REPORT) as well as for application logs. CloudWatch Logs Insights can then query these records directly, without custom parsers.


Key concepts

1. Structured Logging — logs as objects, not strings

[FACT] The default log format in Lambda is plain text. When the application uses print() or console.log(), CloudWatch receives a text line that needs to be parsed with regex or glob expressions to extract fields. Structured logging replaces strings with JSON objects, making each field directly queryable.

Unstructured log (hard to query):
───────────────────────────────────────────────────────────────
[INFO] 2026-06-24T10:15:32Z - Order P001 processed for user U42 in 245ms

Structured JSON log (CloudWatch Logs Insights discovers the fields automatically):
───────────────────────────────────────────────────────────────
{
  "timestamp": "2026-06-24T10:15:32.410Z",
  "level": "INFO",
  "message": "Order processed",
  "requestId": "abc123-def456",
  "traceId": "1-66795-abc...",
  "pedido_id": "P001",
  "usuario_id": "U42",
  "duracao_ms": 245,
  "service": "pedidos",
  "version": "2.1.0"
}

[FACT] CloudWatch Logs Insights automatically detects fields in JSON lines without any configuration. Once logs are in JSON, queries like the one below work directly:

-- Find all errors for a specific user (set the time range, e.g. the last hour, in the console)
fields @timestamp, level, message, pedido_id, duracao_ms
| filter level = "ERROR" and usuario_id = "U42"
| sort @timestamp desc
| limit 50

Required correlation fields

[CONSENSUS] The practice adopted by most production teams is to include at least four correlation fields in each log:

Field        Source                            Use
───────────  ────────────────────────────────  ─────────────────────────────────────────────
requestId    context.aws_request_id            Correlate all logs of one invocation
traceId      os.environ["_X_AMZN_TRACE_ID"]    Correlate with the X-Ray trace
service      Constant in the function          Filter by service in aggregated log groups
cold_start   Module-level (init) variable      Identify invocations that ran the Init phase

Enabling native JSON in Lambda (function log format)

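[FACT] Since 2023, the function can be configured so that system messages (START, END, REPORT) are also emitted as JSON. The same log-format setting makes the runtime's standard logger (Python's logging module) emit application logs as JSON too; the system and application log levels are configured separately: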

# CDK: native log format and log levels for the function
from aws_cdk import aws_lambda as lambda_
from aws_cdk import aws_logs as logs

fn = lambda_.Function(
    self, "MinhaFuncao",
    # ...
    logging_format=lambda_.LoggingFormat.JSON,                  # system + application logs as JSON
    system_log_level_v2=lambda_.SystemLogLevel.INFO,            # START/END/REPORT verbosity
    application_log_level_v2=lambda_.ApplicationLogLevel.INFO,  # drops application logs below INFO
    log_retention=logs.RetentionDays.ONE_WEEK,
)

[FACT] With LoggingFormat.JSON, the REPORT record becomes:

{
  "time": "2026-06-24T10:15:32.660Z",
  "type": "platform.report",
  "record": {
    "requestId": "abc123",
    "metrics": {
      "durationMs": 245.12,
      "billedDurationMs": 246,
      "memorySizeMB": 256,
      "maxMemoryUsedMB": 89,
      "initDurationMs": 312.5
    },
    "status": "success"
  }
}
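When these numbers are needed outside CloudWatch (for example in a log subscription consumer or an offline script), the JSON REPORT record can be parsed directly. A minimal sketch, assuming the record shape shown above; summarize_report and the derived memory_pct field are hypothetical names, and cold_start is inferred from the presence of initDurationMs:

```python
import json


def summarize_report(line: str):
    """Flatten a JSON platform.report record; return None for any other line."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None  # plain-text line (e.g. TEXT log format)
    if not isinstance(event, dict) or event.get("type") != "platform.report":
        return None  # application log or another platform event
    record = event.get("record", {})
    metrics = record.get("metrics", {})
    used = metrics.get("maxMemoryUsedMB")
    size = metrics.get("memorySizeMB")
    return {
        "requestId": record.get("requestId"),
        "durationMs": metrics.get("durationMs"),
        "billedDurationMs": metrics.get("billedDurationMs"),
        # initDurationMs is only present when the invocation ran the Init phase
        "cold_start": "initDurationMs" in metrics,
        "memory_pct": round(100 * used / size, 1) if used and size else None,
    }


report_line = json.dumps({
    "type": "platform.report",
    "record": {
        "requestId": "abc123",
        "metrics": {"durationMs": 245.12, "billedDurationMs": 246,
                    "memorySizeMB": 256, "maxMemoryUsedMB": 89,
                    "initDurationMs": 312.5},
    },
})
print(summarize_report(report_line))
```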

Structured logging with pure Python (without Powertools)

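import json
import logging
import os
from datetime import datetime, timezone

# Attributes every LogRecord already carries; anything else came in through extra={}
_RESERVED = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message", "asctime"}


# Formatter that turns each log record into a single JSON line
class JsonFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(timespec="milliseconds"),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            "requestId": getattr(record, "requestId", None),
            "traceId": os.environ.get("_X_AMZN_TRACE_ID"),
            "service": "pedidos",
        }
        # Extra fields passed via extra={}
        for key, value in vars(record).items():
            if key not in _RESERVED and not key.startswith("_"):
                log_entry[key] = value
        # default=str keeps non-JSON values (Decimal, datetime) from breaking the log call
        return json.dumps(log_entry, default=str)


logger = logging.getLogger()
logger.setLevel(logging.INFO)
if logger.handlers:
    # Lambda's Python runtime pre-installs a handler on the root logger
    logger.handlers[0].setFormatter(JsonFormatter())
else:
    # Local runs and tests: no pre-installed handler, so add one
    stream_handler = logging.StreamHandler()
    stream_handler.setFormatter(JsonFormatter())
    logger.addHandler(stream_handler)

# Module-level flag to detect cold starts (module code runs once per execution environment)
COLD_START = True


def processar(event, extra):
    # Hypothetical business logic, stubbed so the example runs
    return {"status": "PROCESSADO"}


def handler(event, context):
    global COLD_START
    cold = COLD_START
    COLD_START = False

    # Correlation fields attached to every log line of this invocation
    extra = {"requestId": context.aws_request_id, "cold_start": cold}

    logger.info("Invocation started", extra={**extra, "evento_tipo": event.get("type")})

    try:
        resultado = processar(event, extra)
        logger.info("Invocation completed", extra={**extra, "resultado": resultado["status"]})
        return resultado
    except Exception as e:
        logger.error("Invocation error", extra={**extra, "error_type": type(e).__name__, "error_msg": str(e)})
        raise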

[CONSENSUS] Using the Lambda Powertools Logger (session-021) eliminates the need to implement this boilerplate by hand. The @logger.inject_lambda_context decorator automatically adds fields such as function_request_id, cold_start, function_name and xray_trace_id to every log line of the invocation.
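Without Powertools, the standard library already offers a middle ground for the repeated extra={**extra, ...} calls in the handler above: a logging.LoggerAdapter bound to the invocation's correlation fields. A minimal sketch; CorrelationAdapter and ListHandler are hypothetical names, and the process() override exists because the stock LoggerAdapter replaces a call-site extra= with the adapter's own fields instead of merging them (Python 3.13 added a merge_extra option for this). With the JsonFormatter above, the merged keys become JSON fields.

```python
import logging


class CorrelationAdapter(logging.LoggerAdapter):
    """LoggerAdapter that merges its fields with each call's extra={}."""

    def process(self, msg, kwargs):
        # Adapter fields first, call-site extra second (the call site wins on clashes)
        kwargs["extra"] = {**self.extra, **(kwargs.get("extra") or {})}
        return msg, kwargs


class ListHandler(logging.Handler):
    """Demo handler that keeps records in memory instead of printing them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


base_logger = logging.getLogger("pedidos-demo")
base_logger.setLevel(logging.INFO)
captured = ListHandler()
base_logger.addHandler(captured)

# In a real handler: one adapter per invocation, built from context.aws_request_id
log = CorrelationAdapter(base_logger, {"requestId": "abc123", "cold_start": True})
log.info("Invocation started", extra={"evento_tipo": "novo_pedido"})
log.info("Invocation completed", extra={"resultado": "PROCESSADO"})

for record in captured.records:
    print(record.getMessage(), record.requestId, getattr(record, "resultado", None))
```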


2. X-Ray Active Tracing — distributed tracing in Lambda

[FACT] AWS X-Ray is AWS's distributed tracing service. In Lambda, the X-Ray daemon runs inside the execution environment and listens for UDP datagrams on port 2000, at the address published in the AWS_XRAY_DAEMON_ADDRESS environment variable. The X-Ray SDK sends segment documents to this daemon, which batches them and forwards them to the X-Ray service.
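To make the transport concrete, here is a sketch of what the SDK sends over UDP: a JSON header line followed by one segment document. It assumes the daemon protocol and the independent-subsegment fields (type, trace_id, parent_id) described in the X-Ray developer guide; daemon_address, build_datagram and send_subsegment are hypothetical helpers, and in real code the SDK does all of this for you.

```python
import json
import os
import secrets
import socket


def daemon_address():
    """Read AWS_XRAY_DAEMON_ADDRESS ("host:port"), falling back to the local default."""
    raw = os.environ.get("AWS_XRAY_DAEMON_ADDRESS", "127.0.0.1:2000")
    host, _, port = raw.rpartition(":")
    return host, int(port)


def build_datagram(document: dict) -> bytes:
    """Header line + newline + one segment document, as the daemon expects."""
    header = json.dumps({"format": "json", "version": 1})
    return (header + "\n" + json.dumps(document)).encode("utf-8")


def send_subsegment(trace_id: str, parent_id: str, name: str, start: float, end: float) -> None:
    """Send a finished subsegment on its own; it must name its trace and parent."""
    document = {
        "type": "subsegment",
        "id": secrets.token_hex(8),  # 16 hex digits
        "trace_id": trace_id,
        "parent_id": parent_id,
        "name": name,
        "start_time": start,         # epoch seconds (float)
        "end_time": end,
    }
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_datagram(document), daemon_address())
```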

Anatomy of a trace in Lambda

[FACT] With active tracing enabled, Lambda automatically creates two segments per invocation:

Trace (X-Amzn-Trace-Id: Root=1-...;Sampled=1)
├── Segment 1: "Lambda" (service)
│   └── The Lambda service receiving the invocation
│       Includes: cold start time, queuing time
│
└── Segment 2: "minhaFuncao" (function)
    ├── Subsegment: Initialization (cold starts only)
    │   └── Init phase time (module loading)
    ├── Subsegment: Invocation
    │   └── Handler execution time
    │       └── [your custom subsegments here]
    └── Subsegment: Overhead
        └── Time between the runtime sending the response and being ready for the next invocation

[FACT] The environment variable _X_AMZN_TRACE_ID contains the trace ID of the current invocation in the format Root=1-<timestamp>-<hex>;Parent=<parentId>;Sampled=<0|1>. This string must be propagated in downstream calls (HTTP headers, SQS messages, etc.) to maintain trace continuity.
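When a call goes through a transport the SDK does not patch (for example, a custom field in a message body), the header has to be forwarded by hand. A minimal sketch of parsing and re-emitting it; parse_trace_header and downstream_header are hypothetical helpers, the sample value is illustrative, and new_parent_id is assumed to be the ID of the subsegment that represents the outgoing call.

```python
import os


def parse_trace_header(header: str) -> dict:
    """Split 'Root=...;Parent=...;Sampled=1' into its key/value parts."""
    parts = {}
    for item in header.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:  # ignore empty or malformed fragments
            parts[key] = value
    return parts


def downstream_header(current: str, new_parent_id: str) -> str:
    """Keep Root and Sampled, point Parent at the subsegment for the outgoing call."""
    parts = parse_trace_header(current)
    fields = [f"Root={parts['Root']}", f"Parent={new_parent_id}"]
    if "Sampled" in parts:
        fields.append(f"Sampled={parts['Sampled']}")
    return ";".join(fields)


# Illustrative value; inside Lambda this comes from the environment on each invocation
current = os.environ.get(
    "_X_AMZN_TRACE_ID",
    "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1",
)
print(downstream_header(current, "0123456789abcdef"))
```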

Enabling active tracing

# CDK
fn = lambda_.Function(
    self, "MinhaFuncao",
    # ...
    tracing=lambda_.Tracing.ACTIVE,   # or PASS_THROUGH: trace only requests the upstream already sampled
)
# CDK automatically adds xray:PutTraceSegments and xray:PutTelemetryRecords
# to the function's execution role

# CLI
aws lambda update-function-configuration \
  --function-name MinhaFuncao \
  --tracing-config Mode=Active

Custom subsegments

[FACT] The X-Ray SDK allows creating subsegments for any operation within the handler — database calls, external APIs, heavy processing:

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all

# Automatically patches botocore/boto3, requests, httplib, pymongo, etc.
patch_all()

def handler(event, context):
    pedido_id = event["pedido_id"]

    # Manual subsegment via context manager
    with xray_recorder.in_subsegment("validar-pedido") as subseg:
        subseg.put_annotation("pedido_id", pedido_id)   # indexed: usable in filters
        subseg.put_annotation("valor", event["valor"])
        subseg.put_metadata("evento_completo", event)   # not indexed: stored only
        resultado = validar_pedido(pedido_id)           # validar_pedido: defined elsewhere in the module

    # Helper wrapped by the @xray_recorder.capture decorator below
    salvar_no_banco(pedido_id, resultado)

    return {"status": "ok"}

@xray_recorder.capture("salvar-no-banco")
def salvar_no_banco(pedido_id, dados):
    # boto3 is already patched: DynamoDB calls show up as automatic
    # subsegments nested inside "salvar-no-banco"
    tabela.put_item(Item={"id": pedido_id, **dados})   # tabela: DynamoDB Table created at module level
    return True

Annotations vs Metadata

[FACT] The distinction between put_annotation and put_metadata is critical for X-Ray usage:

Annotations                          Metadata
────────────────────────────────     ────────────────────────────────
Types: string, number, boolean       Types: any JSON-serializable value
Indexed by X-Ray                     NOT indexed
Usable in filter expressions         Visible only in the trace details
Limit: 50 annotations per trace      Limit: 64 KB per segment document
Use: grouping, filters, alerts       Use: debugging, context data
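A small helper can route each field to the right API automatically: scalar values with simple keys become annotations, everything else becomes metadata. A minimal sketch; split_fields is a hypothetical name, the key rule (letters, digits and underscores only) is an assumption mirroring the X-Ray guidance for annotation keys, and the cap of 50 mirrors the table above.

```python
import re

# Assumption: annotation keys limited to letters, digits and underscores
_ANNOTATION_KEY = re.compile(r"^[A-Za-z0-9_]+$")
_SCALAR_TYPES = (str, int, float, bool)


def split_fields(fields: dict, max_annotations: int = 50):
    """Return (annotations, metadata) for a dict of candidate trace fields."""
    annotations, metadata = {}, {}
    for key, value in fields.items():
        indexable = (
            isinstance(key, str)
            and bool(_ANNOTATION_KEY.match(key))
            and isinstance(value, _SCALAR_TYPES)
            and len(annotations) < max_annotations
        )
        if indexable:
            annotations[key] = value   # later: seg.put_annotation(key, value)
        else:
            metadata[key] = value      # later: seg.put_metadata(key, value, namespace=...)
    return annotations, metadata


ann, meta = split_fields({
    "pedido_id": "P001",
    "valor": 99.9,
    "itens": [{"sku": "A1", "qtd": 2}],
    "canal de venda": "web",   # space in the key: not a valid annotation key
})
print(ann)
print(meta)
```

Inside a `with xray_recorder.in_subsegment(...) as seg:` block, each annotation would then go to seg.put_annotation and each metadata item to seg.put_metadata.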

[FACT] Filter expressions in the X-Ray console use annotations:

# Find all traces with errors for a specific order
annotation.pedido_id = "P001" AND error = true

# Slow traces (>2 s) carrying a "service" annotation (the handler must add it)
annotation.service = "pedidos" AND responsetime > 2
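Filter expressions are plain strings, so they can be built from the same annotation names the handler writes, for example to feed a script that searches traces. A minimal sketch; annotation_filter is a hypothetical helper, and it rejects string values containing double quotes instead of guessing at an escaping rule.

```python
def annotation_filter(annotations: dict, errors_only: bool = False, min_response_s=None) -> str:
    """Build an X-Ray filter expression that ANDs annotation equality checks."""
    clauses = []
    for key, value in annotations.items():
        if isinstance(value, bool):          # check bool before int (bool is an int subclass)
            literal = "true" if value else "false"
        elif isinstance(value, (int, float)):
            literal = str(value)
        else:
            text = str(value)
            if '"' in text:
                raise ValueError(f"unsupported double quote in value for {key!r}")
            literal = f'"{text}"'
        clauses.append(f"annotation.{key} = {literal}")
    if errors_only:
        clauses.append("error = true")
    if min_response_s is not None:
        clauses.append(f"responsetime > {min_response_s}")
    return " AND ".join(clauses)


print(annotation_filter({"pedido_id": "P001"}, errors_only=True))
# -> annotation.pedido_id = "P001" AND error = true
print(annotation_filter({"service": "pedidos"}, min_response_s=2))
# -> annotation.service = "pedidos" AND responsetime > 2
```

The resulting string can be passed as FilterExpression to the X-Ray GetTraceSummaries API, e.g. boto3.client("xray").get_trace_summaries(StartTime=..., EndTime=..., FilterExpression=expr).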

Sampling rules

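[FACT] X-Ray's default sampling rule traces the first request each second and 5% of any additional requests, which keeps cost under control in high-volume production. For Lambda functions, the sampling decision is made by the Lambda service, or inherited from an upstream service such as API Gateway through the Sampled flag in the trace header, and the AWS documentation states that the sampling rate cannot be configured on the function itself. Custom sampling rules therefore take effect in services and SDKs that consult X-Ray's sampling rules, for example API Gateway in front of the function. Rules match on request properties (service name and type, host, HTTP method, URL path, request attributes), not on annotations added inside the handler. A custom rule is created like this (every rule needs "Version": 1):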

aws xray create-sampling-rule --cli-input-json '{
  "SamplingRule": {
    "RuleName": "PedidosAltoValor",
    "Version": 1,
    "Priority": 1,
    "FixedRate": 1.0,
    "ReservoirSize": 5,
    "ServiceName": "pedidos",
    "ServiceType": "AWS::Lambda::Function",
    "Host": "*",
    "HTTPMethod": "*",
    "URLPath": "*",
    "ResourceARN": "*",
    "Attributes": { "pedido_valor": "alto" }
  }
}'
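The interplay between ReservoirSize and FixedRate is easier to see with numbers: each second, the first ReservoirSize matching requests are traced, and FixedRate applies only to the rest. A simplified single-process sketch; expected_sampled_per_second and SamplerSketch are hypothetical names, and the sketch ignores that X-Ray splits a rule's reservoir across all hosts through centrally assigned quotas.

```python
import random


def expected_sampled_per_second(rps: float, reservoir: int = 1, fixed_rate: float = 0.05) -> float:
    """Expected traced requests per second: the reservoir first, then fixed_rate of the rest."""
    return min(rps, reservoir) + fixed_rate * max(0.0, rps - reservoir)


class SamplerSketch:
    """Single-process reservoir + fixed-rate sampler (no central quota sharing)."""

    def __init__(self, reservoir: int = 1, fixed_rate: float = 0.05, rng=None):
        self.reservoir = reservoir
        self.fixed_rate = fixed_rate
        self.rng = rng or random.Random()
        self._second = None
        self._used = 0

    def should_sample(self, now: float) -> bool:
        second = int(now)
        if second != self._second:       # a new second refills the reservoir
            self._second, self._used = second, 0
        if self._used < self.reservoir:  # reservoir: always traced
            self._used += 1
            return True
        return self.rng.random() < self.fixed_rate  # everything else: weighted coin flip


# Default rule at 100 req/s: 1 + 0.05 * 99, roughly 6 traces per second
print(round(expected_sampled_per_second(100), 2))
# The custom rule above (ReservoirSize 5, FixedRate 1.0) traces every matching request
print(expected_sampled_per_second(100, reservoir=5, fixed_rate=1.0))
```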

3. Lambda Insights — per-invocation system metrics

[FACT] Lambda Insights is implemented as a Lambda extension (it runs as a separate process next to the function code), distributed as an AWS-managed Lambda layer. When enabled, the extension collects system metrics for each invocation and sends them to CloudWatch Logs, in the /aws/lambda-insights log group, using EMF (Embedded Metric Format); CloudWatch extracts time-series metrics from those records.
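The same Embedded Metric Format is available to your own code: a JSON line with an _aws metadata block, printed to stdout in Lambda, is enough for CloudWatch to extract metrics from it. A minimal sketch of the record structure; emf_record and the Pedidos/Custom namespace are hypothetical, and Lambda Insights builds its own records internally, so this only illustrates the format.

```python
import json
import time


def emf_record(namespace: str, dimensions: dict, values: dict, units: dict) -> str:
    """One EMF log line: the metric values plus the _aws block that declares them."""
    document = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # a single dimension set
                "Metrics": [
                    {"Name": name, "Unit": units.get(name, "None")} for name in values
                ],
            }],
        },
        **dimensions,  # dimension values live at the top level...
        **values,      # ...and so do the metric values
    }
    return json.dumps(document)


line = emf_record(
    namespace="Pedidos/Custom",  # hypothetical namespace
    dimensions={"function_name": "ProcessarPedido"},
    values={"duracao_ms": 245.12, "itens": 3},
    units={"duracao_ms": "Milliseconds", "itens": "Count"},
)
print(line)  # inside Lambda, this stdout line becomes a CloudWatch Logs event
```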

Collected metrics

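[FACT] Lambda Insights collects the following data per invocation. Some items are published as CloudWatch metrics in the LambdaInsights namespace; others appear only as fields in the performance log events: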

Performance metrics:
┌─────────────────────────┬──────────────────────────────────────────────────┐
│ Metric                  │ Description                                      │
├─────────────────────────┼──────────────────────────────────────────────────┤
│ duration                │ Invocation duration in ms                        │
│ billed_duration         │ Billed duration (rounded up to 1 ms)             │
│ init_duration           │ Init phase time (cold starts only)               │
│ memory_utilization      │ % of configured memory used                      │
│ used_memory_max         │ Peak memory usage in MB                          │
│ cpu_total_time          │ Total CPU time in ms                             │
├─────────────────────────┼──────────────────────────────────────────────────┤
│ I/O metrics:            │                                                  │
│ rx_bytes                │ Bytes received over the network                  │
│ tx_bytes                │ Bytes sent over the network                      │
│ tmp_used                │ Space used in /tmp                               │
│ tmp_max                 │ Total space available in /tmp                    │
├─────────────────────────┼──────────────────────────────────────────────────┤
│ Diagnostics:            │                                                  │
│ cold_start              │ 1 if cold start, 0 otherwise                     │
│ out_of_memory           │ 1 if the function ran out of memory              │
│ timeout                 │ 1 if the function timed out                      │
│ errors                  │ 1 if there was an unhandled error                │
└─────────────────────────┴──────────────────────────────────────────────────┘

Enabling Lambda Insights via CDK

# CDK
fn = lambda_.Function(
    self, "MinhaFuncao",
    # ...
    tracing=lambda_.Tracing.ACTIVE,
    insights_version=lambda_.LambdaInsightsVersion.VERSION_1_0_229_0,
    # CDK automatically adds:
    # - the managed layer arn:aws:lambda:<region>:580247275435:layer:LambdaInsightsExtension:...
    # - the CloudWatchLambdaInsightsExecutionRolePolicy managed policy on the execution role
)

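[FACT] The layer ARN varies by region (and by architecture). LambdaInsightsVersion.VERSION_1_0_229_0 is one of the versions the CDK exposes as constants, not necessarily the latest one; check docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights.html for the versions available in your region, and use LambdaInsightsVersion.from_insight_version_arn(layer_arn) for a version the CDK does not list yet.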

Dashboard and Log Insights

[FACT] Once Lambda Insights is enabled, the CloudWatch console shows its data under Insights → Lambda Insights, with multi-function and single-function views. The metrics are published in the LambdaInsights namespace.

-- CloudWatch Logs Insights: slowest cold-start invocations
-- Log group: /aws/lambda-insights
fields @timestamp, function_name, duration, init_duration, memory_utilization, cold_start
| filter cold_start = 1
| sort duration desc
| limit 20

-- Correlate application logs with Insights metrics for one invocation.
-- The Logs Insights query language has no join command: select BOTH log groups
-- (/aws/lambda/MinhaFuncao and /aws/lambda-insights) and filter on the request id.
fields @timestamp, @log, level, message, duration, memory_utilization
| filter requestId = "abc123" or request_id = "abc123"
| sort @timestamp asc

4. Correlation between the three pillars

[FACT] The field that unites logs, traces, and metrics in Lambda is the requestId (also called aws_request_id in the Python context object). The correlation flow is:

Invocation received
       │
       ▼
Lambda service generates requestId ─────────────────────────────────────┐
       │                                                                │
       ▼                                                                ▼
┌──────────────────────┐    ┌────────────────────────┐    ┌────────────────────────┐
│    LOGS              │    │      X-RAY             │    │  LAMBDA INSIGHTS       │
│                      │    │                        │    │                        │
│ Structured log with  │    │ Trace ID created by    │    │ EMF metrics emitted    │
│ "requestId": "abc"   │    │ Lambda or upstream     │    │ with requestId and     │
│ "traceId": "1-..."   │    │ caller                 │    │ function_name          │
│ "cold_start": true   │    │ REPORT line carries    │    │                        │
│ "usuario_id": "U42"  │    │ requestId + TraceId    │    │ init_duration: 312ms   │
│                      │    │                        │    │ memory_utilization: 35%│
└──────────┬───────────┘    └───────────┬────────────┘    └────────────┬───────────┘
           │                            │                              │
           └────────────────────────────┴──────────────────────────────┘
                         requestId as the correlation key

CloudWatch console → ServiceLens: logs + traces in a single view

[FACT] CloudWatch ServiceLens (a tab in the CloudWatch console) correlates logs and X-Ray traces automatically when:
1. The function has active tracing enabled.
2. The trace ID is present in the logs: with active tracing, Lambda's REPORT line carries the X-Ray trace ID (exposed in Logs Insights as @xrayTraceId). Lambda Powertools also adds an xray_trace_id key to every application log line; without Powertools, log the value of os.environ["_X_AMZN_TRACE_ID"] yourself.
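When the two log groups are queried separately (for example, exported query results from each), the correlation can also be done client-side on the request ID. A minimal sketch; group_by_request is a hypothetical helper, and the field names it reads (requestId in the application logs, request_id in the Insights events) are assumptions to check against your own records.

```python
from collections import defaultdict


def group_by_request(app_events, insights_events):
    """Index both event lists by request ID: {rid: {"logs": [...], "insights": [...]}}."""
    grouped = defaultdict(lambda: {"logs": [], "insights": []})
    for event in app_events:
        rid = event.get("requestId")       # field written by the structured logger
        if rid:
            grouped[rid]["logs"].append(event)
    for event in insights_events:
        rid = event.get("request_id")      # assumed field name in Insights events
        if rid:
            grouped[rid]["insights"].append(event)
    return dict(grouped)


app = [{"requestId": "abc123", "level": "ERROR", "message": "Unexpected error"}]
insights = [{"request_id": "abc123", "duration": 245.12, "memory_utilization": 35}]
for rid, both in group_by_request(app, insights).items():
    print(rid, len(both["logs"]), both["insights"][0]["duration"])
```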


Practical example

Scenario: Order processing function with complete structured logging, X-Ray with custom subsegments, and Lambda Insights.

Python handler with all three pillars

import json
import logging
import os
import time
from datetime import datetime, timezone

import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

# Patch boto3/botocore clients for X-Ray automatically
patch_all()

# ── Structured Logger ──────────────────────────────────────────────────────────
class StructuredLogger:
    def __init__(self, service_name: str, level: str = "INFO"):
        self.service = service_name
        self.level = getattr(logging, level)
        self._base_fields: dict = {}

    def set_invocation_context(self, request_id: str, cold_start: bool):
        # Called once per invocation: these fields go into every log line
        self._base_fields = {
            "requestId": request_id,
            "cold_start": cold_start,
            "traceId": os.environ.get("_X_AMZN_TRACE_ID", ""),
            "service": self.service,
        }

    def _emit(self, level: str, message: str, **kwargs):
        entry = {
            # ISO 8601 UTC timestamp with real milliseconds
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
            "level": level,
            "message": message,
            **self._base_fields,
            **kwargs,
        }
        # Lambda captures stdout; one json.dumps call = one line = one log event.
        # default=str keeps non-JSON values (e.g. Decimal) from raising.
        print(json.dumps(entry, default=str), flush=True)

    def info(self, msg: str, **kwargs):  self._emit("INFO", msg, **kwargs)
    def warn(self, msg: str, **kwargs):  self._emit("WARN", msg, **kwargs)
    def error(self, msg: str, **kwargs): self._emit("ERROR", msg, **kwargs)
    def debug(self, msg: str, **kwargs):
        if self.level <= logging.DEBUG:
            self._emit("DEBUG", msg, **kwargs)

logger = StructuredLogger("pedidos")

# ── Clients (created outside the handler = reused across warm starts) ─────────
dynamodb = boto3.resource("dynamodb")
tabela = dynamodb.Table(os.environ["TABELA_PEDIDOS"])

# Cold start detection
_COLD_START = True

# ── Handler ───────────────────────────────────────────────────────────────────
def handler(event, context):
    global _COLD_START
    cold = _COLD_START
    _COLD_START = False

    logger.set_invocation_context(context.aws_request_id, cold)
    logger.info("Invocation started", pedido_id=event.get("pedido_id"))

    inicio = time.perf_counter()  # monotonic clock for measuring durations

    try:
        resultado = processar_pedido(event, context.aws_request_id)
        duracao_ms = int((time.perf_counter() - inicio) * 1000)
        logger.info(
            "Order processed",
            pedido_id=event["pedido_id"],
            status=resultado["status"],
            duracao_ms=duracao_ms,
        )
        return resultado

    except ValueError as e:
        logger.error(
            "Validation error",
            pedido_id=event.get("pedido_id"),
            error_type="ValueError",
            error_msg=str(e),
        )
        raise
    except Exception as e:
        logger.error(
            "Unexpected error",
            pedido_id=event.get("pedido_id"),
            error_type=type(e).__name__,
            error_msg=str(e),
        )
        raise


def processar_pedido(event: dict, request_id: str) -> dict:
    # .get() so a missing field becomes a validation error, not a KeyError
    pedido_id = event.get("pedido_id")

    # ── Subsegment: validation ─────────────────────────────────────────────────
    with xray_recorder.in_subsegment("validar-pedido") as seg:
        seg.put_annotation("pedido_id", pedido_id)
        seg.put_annotation("valor", event.get("valor", 0))
        seg.put_metadata("evento_completo", event, namespace="pedidos")

        if not pedido_id or not isinstance(event.get("valor"), (int, float)):
            raise ValueError("Invalid order: required fields missing")

        if event["valor"] <= 0:
            raise ValueError(f"Order value must be positive: {event['valor']}")

    # ── Subsegment: persistence ────────────────────────────────────────────────
    with xray_recorder.in_subsegment("persistir-pedido") as seg:
        seg.put_annotation("pedido_id", pedido_id)
        # Patched boto3: the DynamoDB call appears as a nested subsegment
        tabela.put_item(Item={
            "pedido_id": pedido_id,
            "valor": str(event["valor"]),  # str: the DynamoDB resource rejects Python floats
            "status": "PROCESSADO",
            "request_id": request_id,      # Lambda request id (not the X-Ray segment id)
        })

    return {"status": "PROCESSADO", "pedido_id": pedido_id}

CDK — function with all three pillars enabled

from aws_cdk import (
    Stack, Duration, BundlingOptions,
    aws_lambda as lambda_,
    aws_logs as logs,
    aws_iam as iam,
)

class PedidosObservabilidadeStack(Stack):

    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Layer with aws-xray-sdk, built inside the Lambda Python image so that
        # compiled dependencies match the Lambda Linux environment
        xray_layer = lambda_.LayerVersion(
            self, "XRayLayer",
            code=lambda_.Code.from_asset(
                "layers/xray",  # directory must exist (it can be empty)
                bundling=BundlingOptions(
                    image=lambda_.Runtime.PYTHON_3_12.bundling_image,
                    command=[
                        "bash", "-c",
                        "pip install aws-xray-sdk -t /asset-output/python",
                    ],
                ),
            ),
            compatible_runtimes=[lambda_.Runtime.PYTHON_3_12],
            description="aws-xray-sdk for custom instrumentation",
        )

        fn = lambda_.Function(
            self, "ProcessarPedido",
            runtime=lambda_.Runtime.PYTHON_3_12,
            handler="handler.handler",
            code=lambda_.Code.from_asset("src/pedidos"),
            memory_size=256,
            timeout=Duration.seconds(30),
            layers=[xray_layer],
            environment={
                "TABELA_PEDIDOS": "pedidos",
                "POWERTOOLS_SERVICE_NAME": "pedidos",  # only read if Powertools is used
            },
            # Pillar 1: native structured (JSON) logging
            logging_format=lambda_.LoggingFormat.JSON,
            system_log_level_v2=lambda_.SystemLogLevel.INFO,
            application_log_level_v2=lambda_.ApplicationLogLevel.INFO,
            log_retention=logs.RetentionDays.ONE_WEEK,
            # Pillar 2: X-Ray active tracing
            tracing=lambda_.Tracing.ACTIVE,
            # Pillar 3: Lambda Insights
            insights_version=lambda_.LambdaInsightsVersion.VERSION_1_0_229_0,
        )

        # Extra DynamoDB permission (CDK already adds the X-Ray permissions)
        fn.add_to_role_policy(iam.PolicyStatement(
            actions=["dynamodb:PutItem", "dynamodb:GetItem"],
            resources=[f"arn:aws:dynamodb:{self.region}:{self.account}:table/pedidos"],
        ))

CloudWatch Logs Insights queries for diagnostics

-- 1. Errors grouped by type (set the time range, e.g. the last 3 hours, in the console)
fields @timestamp, message, error_type, pedido_id
| filter level = "ERROR"
| stats count(*) as total by error_type
| sort total desc

-- 2. Hourly p95 and p99 latency (from the duracao_ms field in the logs)
fields @timestamp, duracao_ms
| filter ispresent(duracao_ms)
| stats
    pct(duracao_ms, 95) as p95,
    pct(duracao_ms, 99) as p99,
    avg(duracao_ms) as media
  by bin(1h)

-- 3. Cold starts and their requestIds (to cross-check in X-Ray)
fields @timestamp, requestId, cold_start, message
| filter cold_start = true
| sort @timestamp desc
| limit 100

-- 4. In the /aws/lambda-insights log group: invocations above 80% memory utilization
fields @timestamp, function_name, memory_utilization, duration, cold_start
| filter memory_utilization > 80
| sort memory_utilization desc
| limit 50
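These queries can also run from a script or a scheduled job through the CloudWatch Logs StartQuery and GetQueryResults APIs. A minimal polling sketch; run_insights_query is a hypothetical helper, and logs_client is expected to be a boto3 CloudWatch Logs client (boto3.client("logs")), although any object exposing the same two methods works, which also makes it easy to test with a fake.

```python
import time


def run_insights_query(logs_client, log_groups, query, start_s, end_s,
                       poll_s=1.0, timeout_s=60.0):
    """Run a Logs Insights query and return its rows as {field: value} dicts."""
    query_id = logs_client.start_query(
        logGroupNames=log_groups,
        startTime=int(start_s),  # epoch seconds
        endTime=int(end_s),
        queryString=query,
    )["queryId"]
    deadline = time.monotonic() + timeout_s
    while True:
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells
            return [{cell["field"]: cell["value"] for cell in row}
                    for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout", "Unknown"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {status} after {timeout_s}s")
        time.sleep(poll_s)  # Scheduled or Running: wait and poll again
```

For example, run_insights_query(boto3.client("logs"), ["/aws/lambda/<function-name>"], query, time.time() - 3 * 3600, time.time()) runs query 1 over the last three hours.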

Common pitfalls

Pitfall 1 — print() with a JSON object is not the same as true structured logging

The mistake: The developer writes print(json.dumps({"level": "INFO", "message": "ok"})) and assumes that is all there is to structured logging. CloudWatch Logs Insights does discover the fields, so it mostly works, but two details bite: the JSON carries no timestamp of its own (only the @timestamp that CloudWatch assigns at ingestion), and if the JSON text spans several lines, CloudWatch records each line as a separate log event.

Why it happens: CloudWatch Logs turns each line written to stdout (each \n) into a separate event. json.dumps produces a single line by default; it only emits line breaks when called with indent=..., for example when pretty-printing for debugging. Newlines inside string values are escaped as \n and do not split the line.

How to avoid:
- Call json.dumps(obj) without indent so each entry is a single line; separators=(',', ':') only removes spaces to make the line shorter.
- For the timestamp, the @timestamp field that CloudWatch adds is usually enough, but including an explicit ISO 8601 timestamp in the JSON does no harm and helps when logs are exported or correlated elsewhere.
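A quick check of the single-line claim: a newline inside a string value is escaped, and only indent= introduces real line breaks.

```python
import json

entry = {"level": "ERROR", "message": "first line\nsecond line", "pedido_id": "P001"}

single = json.dumps(entry)                          # default: one line
pretty = json.dumps(entry, indent=2)                # one physical line per key
compact = json.dumps(entry, separators=(",", ":"))  # shorter, still one line

# The \n inside "message" is escaped, so it never splits the event
print(single.count("\n"), pretty.count("\n"), compact.count("\n"))  # -> 0 4 0
```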


Pitfall 2 — X-Ray SDK's patch_all() outside the handler causes errors in test environments

The mistake: The developer calls patch_all() at module scope and then runs unit tests locally. In tests there is no open X-Ray segment, so every patched call (boto3, requests) and every xray_recorder.in_subsegment(...) block runs without a trace context. Depending on the SDK version and the context_missing setting, this either raises SegmentNotFoundException ("cannot find the current segment/subsegment") or floods the output with logged errors.

Why it happens: patch_all() monkeypatches boto3/botocore and other libraries globally. In Lambda, the runtime provides the trace context for each invocation; in tests there is no such context and no daemon.

How to avoid:
- Tell the SDK what to do when there is no context: xray_recorder.configure(context_missing='LOG_ERROR') or 'IGNORE_ERROR'.
- In tests, set the environment variable AWS_XRAY_CONTEXT_MISSING=LOG_ERROR (or IGNORE_ERROR), or disable the SDK entirely with AWS_XRAY_SDK_ENABLED=false before importing it.
- Lambda itself sets AWS_XRAY_CONTEXT_MISSING in the execution environment, but local test environments do not have that variable.

from aws_xray_sdk.core import xray_recorder, patch_all
xray_recorder.configure(context_missing='IGNORE_ERROR')
patch_all()

Pitfall 3 — Lambda Insights without CloudWatchLambdaInsightsExecutionRolePolicy permission causes the extension to fail silently

The mistake: Lambda Insights is enabled (layer added), but no data shows up in /aws/lambda-insights. The function runs normally; the metrics simply never arrive.

Why it happens: The Lambda Insights extension writes its performance events to the /aws/lambda-insights log group, so the execution role needs logs:CreateLogGroup, logs:CreateLogStream and logs:PutLogEvents there. Without these permissions the extension cannot deliver any data, and the failure does not surface as an error in the invocation itself.

How to avoid:
- With CDK: setting insights_version=lambda_.LambdaInsightsVersion.* attaches the managed policy automatically.
- Manually: attach the AWS managed policy CloudWatchLambdaInsightsExecutionRolePolicy to the function's execution role.
- To verify: look for extension errors in the function's own log group (the /aws/lambda-insights group stays empty in this failure mode); setting the environment variable LAMBDA_INSIGHTS_LOG_LEVEL=info makes the extension log more detail.

# CDK does this automatically; to do it by hand:
fn.role.add_managed_policy(
    iam.ManagedPolicy.from_aws_managed_policy_name(
        "CloudWatchLambdaInsightsExecutionRolePolicy"
    )
)
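To audit existing functions (for example ones not deployed with CDK), both prerequisites can be checked through the Lambda and IAM APIs. A minimal sketch; insights_setup_problems is a hypothetical helper, lambda_client and iam_client are expected to be boto3 clients (boto3.client("lambda"), boto3.client("iam")), and it only looks for the managed policy by name, so an equivalent inline policy would be reported as missing.

```python
INSIGHTS_POLICY = "CloudWatchLambdaInsightsExecutionRolePolicy"


def insights_setup_problems(lambda_client, iam_client, function_name: str) -> list:
    """Return human-readable problems; an empty list means layer and policy were found."""
    problems = []
    config = lambda_client.get_function_configuration(FunctionName=function_name)

    # The extension ships as the LambdaInsightsExtension (or -Arm64) layer
    layer_arns = [layer["Arn"] for layer in config.get("Layers", [])]
    if not any(":layer:LambdaInsightsExtension" in arn for arn in layer_arns):
        problems.append("LambdaInsightsExtension layer is not attached")

    # Role ARNs can carry a path (role/service-role/name); RoleName is the last part
    role_name = config["Role"].rsplit("/", 1)[-1]
    attached = []
    paginator = iam_client.get_paginator("list_attached_role_policies")
    for page in paginator.paginate(RoleName=role_name):
        attached.extend(policy["PolicyName"] for policy in page["AttachedPolicies"])
    if INSIGHTS_POLICY not in attached:
        problems.append(f"{INSIGHTS_POLICY} is not attached to role {role_name}")
    return problems
```

An empty list from insights_setup_problems(boto3.client("lambda"), boto3.client("iam"), "<function-name>") means both the layer and the policy are in place.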

Reflection exercise

You have a Lambda function that processes payments and is receiving user complaints that "some payments don't process." The system has no observability configured beyond Lambda's default logs (plain text, no correlation). You need to propose an observability solution that allows, given a transaction ID reported by the user, finding in less than 5 minutes: (a) the complete log of the invocation that processed that transaction, (b) whether there was a retry or cold start, (c) which external calls (database, payment API) were made and which one was slowest, and (d) whether the problem is systemic (affects X% of transactions) or isolated.

Question: Which fields would you include in the structured logs? How would you configure X-Ray to trace the payment API call (which is external HTTP, not AWS)? What CloudWatch Logs Insights query would you use to identify whether the problem is systemic? Where would Lambda Insights help (or not help) in this diagnosis?


Resources for further study

  1. Monitor function performance with Amazon CloudWatch Lambda Insights
    URL: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-insights.html
    Official guide for enabling Lambda Insights with layer ARNs per region, step-by-step via console/CDK/CLI, and complete list of collected metrics. Includes how to interpret the Lambda Insights dashboards in the CloudWatch console.

  2. Visualize Lambda function invocations using AWS X-Ray
    URL: https://docs.aws.amazon.com/lambda/latest/dg/services-xray.html
    Explains the native X-Ray integration with Lambda: how segments are created automatically, how to enable active tracing, and how to use the X-Ray Python SDK inside Lambda functions. Includes examples of subsegments and sampling configuration.

  3. Configuring JSON and plain text log formats
    URL: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-cloudwatchlogs-logformat.html
    Documentation of the new native JSON format for system logs (START, END, REPORT). Describes the fields emitted in each event type, how to configure via console/CLI/CDK, and how to use with CloudWatch Logs Insights.