Session 019 — Lambda: execution model, cold starts and provisioned concurrency
Estimated duration: 60 minutes
Prerequisites: session-004-cdk-v2-setup-bootstrap
Objective
By the end, you will be able to measure cold start of a function with and without provisioned concurrency, calculate the cost of provisioned concurrency vs on-demand for a specific traffic profile, and identify which languages and package sizes have the greatest impact on cold start.
Context
[FACT] Lambda is a serverless compute service where you pay only for the time your code executes — not for capacity allocated on standby. The corollary of this guarantee is that when there are no active executions, there are no execution environments kept warm. The next invocation that arrives when no environment is available must go through the initialization phase before executing the handler. This latency cost is called a cold start.
[FACT] Cold starts are not a defect of Lambda — they are a direct consequence of the billing model. The trade-off is: you save by paying zero when your function is not invoked, but the first invocation after a period of inactivity (or any invocation that requires a new execution environment due to horizontal scaling) has additional latency. For most asynchronous workloads, this trade-off is acceptable. For synchronous low-latency APIs with sparse traffic, it can be problematic.
[CONSENSUS] The importance of cold start is often exaggerated in community discussions. [FACT] According to AWS data, cold starts occur in less than 1% of invocations in most production workloads. The real problem appears in three specific scenarios: APIs with very sparse traffic (one invocation every 15+ minutes), functions with heavy initialization (Spring Boot Java, ML models), and interactive applications where P99 latency matters more than average.
Key concepts
1. The execution environment lifecycle
[FACT] A Lambda execution environment goes through three phases:
┌─────────────────────────────────────────────────────────────────┐
│ INIT PHASE (cold start; billed as duration) │
│ │
│ 1. Download of the code/layers (from the source: S3, ECR) │
│ └─ Often called the "cold start" in the strict sense │
│ │
│ 2. Runtime initialization (Node.js, Python, JVM, etc.) │
│ │
│ 3. Execution of the static initialization code │
│ └─ Everything OUTSIDE the handler: imports, DB connections, │
│ model loading, SDK initialization │
│ │
│ [signals ready → Next API] │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ INVOKE PHASE (handler execution) │
│ │
│ handler(event, context) { ... } │
│ │
│ Billed by duration × memory │
└─────────────────────────────────────────────────────────────────┘
│
│ function returns, environment goes on standby
▼
┌─────────────────────────────────────────────────────────────────┐
│ SHUTDOWN PHASE (eventual) │
│ │
│ Lambda decides to recycle the environment (after inactivity) │
│ Extensions have up to 2s to finish │
└─────────────────────────────────────────────────────────────────┘
[FACT] The INIT phase has a limit of 10 seconds. If the initialization code (outside the handler) takes more than 10 seconds to complete, Lambda retries on the first invocation using the function's configured timeout. Functions with heavy Spring Boot or multi-GB ML model loading can exceed this limit.
[FACT] What happens between the INVOKE phase of one invocation and the next is important: the execution environment is frozen (CPU suspended, memory retained). Global variables, database connections, and in-memory caches persist between warm invocations. This is the mechanism that enables database connection reuse.
# Python: database connection created ONCE during static initialization.
# It persists across invocations served by the same execution environment.
import os

import boto3
import psycopg2

# Code OUTSIDE the handler: runs only on a cold start
db_connection = psycopg2.connect(
    host=os.environ['DB_HOST'],
    # ... (remaining connection parameters elided)
)
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])


def handler(event, context):
    # Reuses db_connection and table (no reconnection overhead)
    result = table.get_item(Key={'id': event['id']})
    return result['Item']
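The reuse of an execution environment can be observed from inside the function. The following is a minimal sketch, not taken from the session material: the names `_COLD_START` and `_INIT_AT` and the returned keys are hypothetical, and it relies only on the fact that module-level code runs once per execution environment.

```python
import time

# Module scope: executed once per execution environment (during INIT).
_COLD_START = True      # hypothetical flag name
_INIT_AT = time.time()  # when this environment was initialized


def handler(event, context):
    """Report whether this invocation was the first one in its environment."""
    global _COLD_START
    was_cold = _COLD_START
    _COLD_START = False  # every later invocation in this environment is warm
    return {
        "cold_start": was_cold,
        "environment_age_s": round(time.time() - _INIT_AT, 3),
    }
```

Calling the handler twice in the same process returns `cold_start: True` and then `cold_start: False`, which mirrors what two consecutive invocations on one warm environment would report.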
2. Factors that influence cold start duration
[FACT] The main factors, in order of impact:
Runtime/language:
Typical cold start time (runtime initialization, no application code):
Python 3.x: ~100-200ms
Node.js 20+: ~100-200ms
Go (custom runtime): ~50-100ms (static binary, no VM)
.NET 8: ~300-600ms (with SnapStart: ~10ms)
Java 21: ~500-2000ms (with SnapStart: ~10ms)
Java Spring Boot: 3000-8000ms (without optimizations)
Note: these values are approximations based on community benchmarks
and vary with allocated memory, package size, and datacenter load.
[FACT] Allocated memory has an indirect impact on cold start: more memory = more proportional CPU (Lambda allocates CPU proportional to memory). Doubling memory from 512MB to 1024MB can reduce static initialization time by up to 50% for functions with CPU-intensive initialization.
[FACT] Deployment package size impacts the download/extraction time during cold start. Practical references:
Small package (< 5 MB): minimal impact (< 50ms additional)
Medium package (5-50 MB): 50-200ms additional
Large package (> 50 MB): 200ms+ additional
Container image (> 500 MB): can add 1-3s on the first pull
Mitigation: Lambda caches the code for an undisclosed period of time.
Subsequent pulls of the same code are much faster.
[FACT] VPC was historically the biggest cold start multiplier (adding 1-3 seconds to provision an ENI). [FACT] Since 2019, AWS changed the VPC architecture to pre-provision ENIs. Cold starts in functions with VPC are now comparable to functions without VPC in most cases. Residual impact still exists in accounts with few recently created VPC execution environments.
3. Provisioned Concurrency
[FACT] Provisioned Concurrency (PC) maintains a configured number of pre-initialized and warm execution environments, eliminating cold starts for those environments. When an invocation reaches a PC environment, it enters the INVOKE phase directly without going through INIT.
Without PC:
Invocation 1: [INIT 800ms] + [INVOKE 50ms] = 850ms of latency
Invocation 2: [INVOKE 50ms] = 50ms (warm)
With PC (2 provisioned environments, 3 concurrent invocations):
Invocation 1: [INVOKE 50ms] (environment already initialized) = 50ms
Invocation 2: [INVOKE 50ms] = 50ms
Invocation 3: [INIT 800ms] + [INVOKE 50ms] = 850ms (PC exhausted → on-demand)
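The spillover in the timeline can be expressed as a small model. This is an illustrative sketch under a simplifying assumption: all invocations arrive at the same instant and no warm on-demand environments exist yet. The function name and its parameters are hypothetical; the default 800ms and 50ms values are the ones used in the timeline.

```python
def invocation_latencies(concurrent, provisioned, init_ms=800, invoke_ms=50):
    """Latency in ms of each of `concurrent` simultaneous invocations.

    The first `provisioned` invocations land on pre-initialized
    environments; the rest spill over to on-demand and pay the INIT phase.
    """
    if concurrent < 0 or provisioned < 0:
        raise ValueError("counts must be non-negative")
    latencies = []
    for i in range(concurrent):
        if i < provisioned:
            latencies.append(invoke_ms)            # served by a PC environment
        else:
            latencies.append(init_ms + invoke_ms)  # cold on-demand environment
    return latencies


print(invocation_latencies(concurrent=3, provisioned=2))  # [50, 50, 850]
```

The model makes one point explicit: PC removes cold starts only up to the provisioned count, so sizing it requires an estimate of peak concurrency.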
[FACT] PC must be configured on a specific version or alias, not on $LATEST:
# ❌ Wrong: $LATEST does not support PC
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier '$LATEST' \
  --provisioned-concurrent-executions 10
# ✅ Correct: numbered version (version 3)
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier 3 \
  --provisioned-concurrent-executions 10
# ✅ Correct: alias ('prod')
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier prod \
  --provisioned-concurrent-executions 10
[FACT] Provisioned Concurrency cost:
PC billing (us-east-1, reference values used in this session):
$0.0000646234 per GB-second of allocated PC
$0.0000097656 per GB-second of invocation on PC
On-demand billing (for comparison):
$0.0000200000 per GB-second of invocation
Note: confirm these rates against the AWS Lambda pricing page before relying on them;
published rates vary by region and architecture, and every total below scales linearly with them.
Example: 10 PC × 1GB × 24h × 30 days
Allocated PC: 10 × 1 × 86400 × 30 × $0.0000646234 = $1,675.04/month
Invocation on PC: depends on actual traffic
Total PC for 10 environments of 1GB: ~$1,675/month in allocation alone!
→ PC has a HIGH fixed cost. It is only justifiable if the cost of cold starts
(SLA breaches, impacted users) is greater than this amount.
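The allocation arithmetic is simple enough to keep as a reusable function. The sketch below is illustrative: `pc_allocation_cost` is a hypothetical helper, and the rate is passed in as a parameter because it should be treated as an input to verify, not as a constant.

```python
def pc_allocation_cost(environments, memory_gb, hours, rate_per_gb_second):
    """Cost of keeping provisioned environments allocated for `hours`."""
    gb_seconds = environments * memory_gb * hours * 3600
    return gb_seconds * rate_per_gb_second


# Rate taken from the session's reference values (an input to verify).
PC_RATE = 0.0000646234

cost = pc_allocation_cost(
    environments=10, memory_gb=1, hours=24 * 30, rate_per_gb_second=PC_RATE
)
print(f"${cost:,.2f} per month in allocation alone")
```

Because the cost is linear in every argument, halving the memory size or the number of provisioned hours halves the allocation bill.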
[FACT] A more cost-controlled alternative is to use Application Auto Scaling to dynamically adjust PC based on schedule:
# Register the 'prod' alias as a scalable target with PC between 2 and 10.
# Scheduled actions (put-scheduled-action) then set 10 from 8:00 to 18:00
# (business hours) and 2 outside that window.
aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:my-function:prod \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 2 \
  --max-capacity 10
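Registering the scalable target only sets the bounds; the schedule itself comes from scheduled actions. The sketch below builds the keyword arguments for the `put_scheduled_action` operation of the boto3 `application-autoscaling` client. The helper name, the action names, and the assumption that business hours are 8:00 to 18:00 in UTC-3 are all illustrative.

```python
def scheduled_action(name, hour_utc, capacity, weekdays="MON-FRI",
                     function_name="my-function", alias="prod"):
    """Build keyword arguments for put_scheduled_action."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": name,
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        # cron(minutes hours day-of-month month day-of-week year), in UTC
        "Schedule": f"cron(0 {hour_utc} ? * {weekdays} *)",
        # Pinning min and max to the same value forces PC to that level
        "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
    }


# Assumption: 8:00-18:00 in UTC-3 corresponds to 11:00-21:00 UTC.
actions = [
    scheduled_action("scale-up-morning", hour_utc=11, capacity=10),
    scheduled_action("scale-down-evening", hour_utc=21, capacity=2),
]
# With AWS credentials configured, each action could be applied with:
#   boto3.client("application-autoscaling").put_scheduled_action(**action)
```

Setting both bounds in each action is a deliberate choice: lowering only the minimum leaves the current capacity where it is, since a scheduled action scales in only when the current value exceeds the new maximum.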
4. SnapStart: cold starts for Java and .NET
[FACT] SnapStart is a Lambda feature that captures a snapshot of the execution environment after the INIT phase and reuses it for new invocations, eliminating runtime and static code initialization time. Instead of initializing the JVM and loading all classes on every cold start, Lambda restores the memory and disk state from the snapshot in milliseconds.
Without SnapStart (Java Spring Boot):
Cold invocation → [JVM init: 1s] + [Spring init: 5s] + [handler: 100ms]
Total cold start: ~6.1 seconds
With SnapStart:
Deploy new version → Lambda runs INIT and takes a snapshot
Cold invocation → [restore snapshot: ~10ms] + [handler: 100ms]
Total cold start: ~110ms
[FACT] SnapStart limitations (May 2026):
- Available only for specific managed runtimes: Java 11 and later, Python 3.12 and later, and .NET 8 and later.
- Does not support functions using container images (only deployment packages).
- Does not support functions with ephemeralStorage > 512MB.
- The snapshot is taken per version — each PublishVersion generates a new snapshot.
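- Code that generates timestamps, random values, or unique IDs during INIT may behave unexpectedly after restore: those values are captured in the snapshot and shared by every environment restored from it (time appears "frozen" at the snapshot).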
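[FACT] For Java managed runtimes, SnapStart has no additional cost: billing is for normal invocation time (there is no "allocation" cost like PC). For Python and .NET, AWS additionally charges for caching the snapshot and for each restore.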
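The snapshot caveat about values created during INIT can be handled by deferring their creation until after restore. The following is a minimal sketch of that pattern, not an official SnapStart API: the pattern does not depend on the language, and Python is used here only for consistency with the other examples in this session. The names `_environment_id` and `get_environment_id` are hypothetical.

```python
import uuid

# Risky under SnapStart: a value created during INIT is stored in the
# snapshot, so every environment restored from it would see the same ID.
# SNAPSHOT_TIME_ID = uuid.uuid4()

_environment_id = None  # filled lazily, after restore


def get_environment_id():
    """Return an ID generated on first use, not during INIT."""
    global _environment_id
    if _environment_id is None:
        _environment_id = str(uuid.uuid4())
    return _environment_id


def handler(event, context):
    return {
        "environment_id": get_environment_id(),  # stable within one environment
        "request_id": str(uuid.uuid4()),         # fresh on every invocation
    }
```

The same approach applies to timestamps and short-lived credentials: anything that must be unique or current is computed inside the invocation path.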
Practical example
Scenario: You have a REST API with Lambda + API Gateway. The function uses Node.js and connects to RDS PostgreSQL. Traffic is irregular: peaks of 500 req/s during business hours and nearly zero at night. P99 latency must be < 200ms.
Measuring cold start in CloudWatch
# Filter invocations that report Init Duration in the Lambda logs
# (Init Duration appears only on cold starts; date -d is GNU date syntax)
aws logs filter-log-events \
  --log-group-name /aws/lambda/my-api-function \
  --filter-pattern '"Init Duration"' \
  --start-time "$(date -d '1 hour ago' +%s)000" \
  | jq -r '.events[].message' \
  | grep -o 'Init Duration: [0-9.]*'
# Typical output:
# Init Duration: 823.45
# Init Duration: 791.12
# Init Duration: 1203.78
CloudWatch Logs Insights for cold start analysis at scale:
# CloudWatch Logs Insights query: cold start distribution per hour
filter @message like /Init Duration/
| parse @message "Init Duration: * ms" as initDuration
| stats
    count(*) as coldStarts,
    avg(initDuration) as avgInit,
    pct(initDuration, 95) as p95Init,
    max(initDuration) as maxInit
  by bin(1h)
| sort by bin(1h) desc
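The same statistics can be computed offline from exported log lines. The sketch below is illustrative: the sample lines only imitate the shape of Lambda REPORT entries (their values are made up), and the percentile uses the nearest-rank method, which may differ slightly from what CloudWatch reports.

```python
import math
import re
import statistics

INIT_RE = re.compile(r"Init Duration: ([0-9.]+) ms")


def init_durations(log_lines):
    """Extract Init Duration values (ms) from Lambda REPORT log lines."""
    values = []
    for line in log_lines:
        match = INIT_RE.search(line)
        if match:  # only cold starts carry an Init Duration field
            values.append(float(match.group(1)))
    return values


def summarize(values):
    """Count, mean, nearest-rank p95, and max of the durations."""
    if not values:
        return {"cold_starts": 0}
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile
    return {
        "cold_starts": len(ordered),
        "avg_ms": round(statistics.mean(ordered), 2),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }


# Sample lines shaped like Lambda REPORT entries (values are made up).
sample = [
    "REPORT RequestId: a1 Duration: 52.10 ms Billed Duration: 53 ms Init Duration: 823.45 ms",
    "REPORT RequestId: a2 Duration: 48.70 ms Billed Duration: 49 ms",
    "REPORT RequestId: a3 Duration: 50.02 ms Billed Duration: 51 ms Init Duration: 791.12 ms",
]
print(summarize(init_durations(sample)))
```

Dividing `cold_starts` by the total number of REPORT lines gives the cold start rate, which is the figure to compare against the latency target.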
Calculating cost: PC vs on-demand for a traffic profile
Traffic profile:
8:00-18:00 (10h/day × 22 business days = 220h/month): 100 invocations/s
Rest (500h/month in a 720-hour month): 2 invocations/s
Function: 1GB memory, 100ms average execution
On-demand (without PC):
Total invocations/month:
100 req/s × 220h × 3600s = 79,200,000 invocations at peak
2 req/s × 500h × 3600s = 3,600,000 invocations off-peak
Total: 82,800,000 invocations
GB-seconds: 82,800,000 × 0.1s × 1GB = 8,280,000 GB-s
Request cost: $0.20/M × 82.8M = $16.56
GB-s cost: $0.0000200000 × 8,280,000 = $165.60
Total on-demand: ~$182.16/month
(+ cold starts in ~1% of invocations = ~828,000 cold starts)
With PC of 40 environments (business-hours load of 100 req/s × 100ms = 10 concurrent + margin for peaks):
PC allocation: 40 × 1GB × 720h = 28,800 GB-h = 103,680,000 GB-s
Allocated PC cost: 103,680,000 × $0.0000646234 = ~$6,700/month
→ PC is MUCH more expensive for this profile.
With PC of 5 environments (guarantees a fast response for low traffic):
PC allocation: 5 × 1GB × 720h × 3600s/h = 12,960,000 GB-s
PC cost: 12,960,000 × $0.0000646234 = ~$838/month
→ Still more expensive than on-demand. PC only pays off if the SLA
requires that ALL cold starts be eliminated, not only
those during low traffic.
Conclusion for this profile:
Best strategy: optimize the static code to reduce cold start
(lazy DB connection, remove unused dependencies)
+ accept occasional cold starts in exchange for a cost several times lower
(about $182 on-demand vs $838 to $6,700 per month in PC allocation alone).
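The on-demand side of the comparison can be scripted so that other traffic profiles are easy to test. This is a sketch under stated assumptions: the function name is hypothetical, the rates are the session's reference values passed in as parameters, and the month is taken as 720 hours, so 220 business hours leave 500 off-peak hours.

```python
def on_demand_monthly_cost(segments, memory_gb, avg_duration_s,
                           gb_second_rate, per_million_requests):
    """Monthly on-demand cost for a traffic profile.

    `segments` is a list of (requests_per_second, hours_per_month) pairs.
    """
    invocations = sum(rps * hours * 3600 for rps, hours in segments)
    gb_seconds = invocations * avg_duration_s * memory_gb
    return {
        "invocations": invocations,
        "gb_seconds": gb_seconds,
        "request_cost": invocations / 1_000_000 * per_million_requests,
        "duration_cost": gb_seconds * gb_second_rate,
    }


# Assumption: 720-hour month, split into 220 business hours and 500 off-peak.
profile = [(100, 220), (2, 500)]
result = on_demand_monthly_cost(profile, memory_gb=1, avg_duration_s=0.1,
                                gb_second_rate=0.00002, per_million_requests=0.20)
total = result["request_cost"] + result["duration_cost"]
print(f"{result['invocations']:,} invocations, ${total:,.2f} per month")
```

Re-running it with a different memory size or average duration shows how sensitive the on-demand bill is to function tuning, which is usually cheaper to change than concurrency settings.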
CDK with Provisioned Concurrency via alias
import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as appscaling from 'aws-cdk-lib/aws-applicationautoscaling';

// ApiStack is an illustrative wrapper: the constructs below must live inside a Stack
export class ApiStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Function
    const fn = new lambda.Function(this, 'ApiFunction', {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda'),
      memorySize: 1024,
      timeout: Duration.seconds(30),
    });

    // Publishes an immutable version (required for PC)
    const version = fn.currentVersion;

    // Alias 'prod' points to the current version
    const prodAlias = new lambda.Alias(this, 'ProdAlias', {
      aliasName: 'prod',
      version,
      provisionedConcurrentExecutions: 5, // PC configured on the alias
    });

    // Schedule-based Auto Scaling of PC
    const target = prodAlias.addAutoScaling({
      minCapacity: 2,
      maxCapacity: 20,
    });

    // Schedules are in UTC. BRT is UTC-3, so 8:00 BRT = 11:00 UTC and 18:00 BRT = 21:00 UTC.
    // Scale to at least 10 at 8:00 BRT
    target.scaleOnSchedule('ScaleUpMorning', {
      schedule: appscaling.Schedule.cron({ hour: '11', minute: '0' }),
      minCapacity: 10,
      maxCapacity: 20,
    });
    // Back to 2 at 18:00 BRT. Capping maxCapacity forces the scale-in;
    // lowering only minCapacity would leave the current capacity untouched.
    target.scaleOnSchedule('ScaleDownEvening', {
      schedule: appscaling.Schedule.cron({ hour: '21', minute: '0' }),
      minCapacity: 2,
      maxCapacity: 2,
    });
  }
}
Common pitfalls
Pitfall 1: Database connections created inside the handler
The mistake: The code creates a new database connection on every invocation:
def handler(event, context):
    # ❌ Connection created INSIDE the handler, on every invocation
    conn = psycopg2.connect(host=DB_HOST)  # other parameters elided
    cur = conn.cursor()
    cur.execute("SELECT ...")
    result = cur.fetchall()
    conn.close()
    return result
Why it happens: The developer doesn't know the execution environment model or is worried about "stale" connections in warm environments.
The cost: A TCP + TLS connection to RDS takes 50-200ms. For a function that executes in 10ms, this is a 500-2000% overhead. With 1000 invocations/minute, this creates and destroys 1000 connections/minute on the database, potentially exhausting the RDS max_connections.
How to avoid: Create connections in the global scope (outside the handler). Use RDS Proxy to manage the connection pool on the database side — it aggregates connections from multiple execution environments into a smaller pool for RDS.
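# ✅ Connection cached at module scope: created on first use, then reused
import os

import psycopg2

DB_HOST = os.environ['DB_HOST']
conn = None


def get_connection():
    global conn
    # Reconnect only if there is no connection yet or it was closed
    if conn is None or conn.closed:
        conn = psycopg2.connect(host=DB_HOST)  # other parameters elided
    return conn


def handler(event, context):
    with get_connection().cursor() as cur:
        cur.execute("SELECT ...")
        return cur.fetchall()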
Pitfall 2: Provisioned Concurrency on $LATEST
The mistake: The pipeline configures PC on the $LATEST version of the function:
# ← this fails: $LATEST cannot have PC
aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier '$LATEST' \
  --provisioned-concurrent-executions 5
Why it happens: $LATEST is the unpublished version of the function, which always reflects the most recent code (it is not an alias). It's convenient for development but doesn't support PC because it's mutable: Lambda cannot keep pre-initialized environments for code that changes on every deploy.
How to recognize: Error InvalidParameterValueException: Provisioned Concurrency Configurations are not supported on unpublished versions. The cold start continues happening even after configuring PC.
How to avoid: Always publish a version (PublishVersion) before configuring PC, and apply PC to the numbered version or to an alias that points to it. In CDK, use fn.currentVersion which automatically publishes an immutable version.
Pitfall 3: Unnecessary global imports increasing cold start
The mistake: The function imports complete libraries when it only uses one function from each:
# ❌ Imports the entire SDK and heavy libraries during initialization
import boto3
import pandas as pd
import numpy as np
from PIL import Image


def handler(event, context):
    # Only uses S3 and one JSON operation
    s3 = boto3.client('s3')
    # pandas, numpy, PIL are never used in this function
Why it happens: Copying code from elsewhere without reviewing imports, or keeping dependencies from a previous version of the function that was simplified.
The cost: pandas + numpy together can add 300-500ms to cold start just from the import. A function that executes in 50ms can have an 800ms cold start because of unused imports.
How to avoid: Audit imports regularly. Use lazy imports (inside the handler) for rarely used libraries. For Node.js, use bundlers (esbuild via CDK's NodejsFunction) that perform tree-shaking and eliminate unused code from the final bundle.
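The lazy import advice can be sketched as follows. This is an illustration, not a prescribed pattern: the standard-library `csv` module stands in for a heavy dependency such as pandas so that the example runs anywhere, and `timed_import`, `_csv`, and the `format` event field are hypothetical names.

```python
import importlib
import time


def timed_import(module_name):
    """Import a module and return (module, elapsed milliseconds).

    A module that is already loaded returns almost instantly, because
    Python caches imports in sys.modules.
    """
    start = time.perf_counter()
    module = importlib.import_module(module_name)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return module, elapsed_ms


_csv = None  # loaded on first use instead of during INIT


def handler(event, context):
    global _csv
    if event.get("format") == "csv":  # hypothetical rarely used path
        if _csv is None:
            _csv, _ = timed_import("csv")
        return {"rows": list(_csv.reader(event["body"].splitlines()))}
    return {"rows": []}
```

Running `timed_import` on each dependency during development gives a quick ranking of which imports dominate initialization. The trade-off of a lazy import is that the first request on the rare path pays the import cost instead of the cold start.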
Reflection exercise
You have three Lambda functions with distinct profiles:
- auth-validator: Node.js, 128MB, executes in 5ms, invoked 10,000 times/minute constantly 24h/day. Current cold start: 200ms.
- report-generator: Java Spring Boot, 3GB, executes in 8s, invoked 2 times/hour, only during business hours. Current cold start: 6s.
- image-resizer: Python, 512MB, executes in 300ms, invoked in bursts from 0 to 500 req/s in less than 1 minute (unpredictable marketing campaigns). Current cold start: 400ms.
For each function, decide: is the cold start a real problem in the usage context? If so, which strategy would you use — code optimization, Provisioned Concurrency, SnapStart, or another approach? Calculate (approximately) the monthly cost of each chosen strategy and compare with the cost of doing nothing (how many users affected, what impact on SLA).
Resources for deeper learning
1. Understanding the Lambda execution environment lifecycle
URL: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
What to find: Detailed description of the Init, Invoke, and Shutdown phases, the 10-second limit of the Init phase, execution environment reuse behavior, and how extensions interact with the lifecycle.
Why it's the right source: It's the primary documentation of the execution model — the foundation for understanding all other concepts in this session.
2. Configuring provisioned concurrency for a function
URL: https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html
What to find: How to configure PC on versions and aliases, the billing difference between allocated PC and PC invocation, how behavior changes when PC is exhausted (fallback to on-demand), and how to use Application Auto Scaling to dynamically adjust PC.
Why it's the right source: It's the official feature reference, with all configuration and cost details.
3. Optimizing static initialization
URL: https://docs.aws.amazon.com/lambda/latest/dg/static-initialization.html
What to find: Best practices for reducing static initialization time (outside the handler): lazy initialization, caching heavy objects, measuring with Init Duration in logs, and examples by language.
Why it's the right source: It's the practical guide for cold start optimization without needing PC — the first line of defense before considering provisioned concurrency.