Session 030 — DynamoDB: single-table design, adjacency list and overloaded indexes
Estimated duration: 60 minutes
Prerequisites: session-029-dynamodb-access-patterns-pk-sk
Objective
By the end, you will be able to implement the adjacency list pattern to represent many-to-many
relationships in a single table (e.g., users and groups), use composite sort keys for hierarchical
queries (ORDER#123#ITEM#456), and explicitly identify when single-table design creates more
problems than it solves (e.g., teams with weak tooling, highly unpredictable access patterns).
Context
[FACT] The previous session covered the fundamentals: access patterns first, generic PK/SK with
entity prefixes, item collections. This session goes one level up: how to model many-to-many
relationships (the hardest problem in document databases), how to use the sort key to create
queryable data hierarchies at any level, and how a single GSI can index multiple entity types
via overloading.
[OPINION] The single-table vs multi-table debate is genuinely open in the DynamoDB community.
Rick Houlihan (ex-AWS) popularized single-table as dogma around 2018–2020. AWS itself, starting
in 2023–2024, began presenting in the official docs a canonical example of 15 access patterns
using multi-table design with strategic GSIs — a notable retreat from the previous position.
Today, the practical consensus is: single-table is great when entities have high access
correlation; multi-table is simpler when entities have independent lifecycles. There is no
universal answer.
Key concepts
1. Adjacency list — modeling many-to-many without joins
In SQL, a many-to-many relationship between User and Group uses a junction table:
user_group(user_id, group_id). DynamoDB has no joins. The solution is the adjacency list:
each node (entity) and each edge (relationship) becomes an item in the same table, using PK to
represent the source node and SK to represent the destination node.
Relacionamento muitos-para-muitos: Users ↔ Groups
- Um usuário pertence a muitos grupos
- Um grupo tem muitos usuários
Tradução para adjacency list:
PK SK Tipo de item
──────────────────────────────────────────────────────────────
USER#u1 USER#u1 item-raiz do usuário u1
USER#u1 GROUP#g10 aresta: u1 pertence ao grupo g10
USER#u1 GROUP#g20 aresta: u1 pertence ao grupo g20
USER#u2 USER#u2 item-raiz do usuário u2
USER#u2 GROUP#g10 aresta: u2 pertence ao grupo g10
GROUP#g10 GROUP#g10 item-raiz do grupo g10
GROUP#g10 USER#u1 aresta: g10 tem o membro u1
GROUP#g10 USER#u2 aresta: g10 tem o membro u2
GROUP#g20 GROUP#g20 item-raiz do grupo g20
GROUP#g20 USER#u1 aresta: g20 tem o membro u1
The diagram as a graph:
u1 ──── g10 ──── u2
└───── g20
Each edge is represented twice — once in the source node's partition, once in the destination
node's partition. This is intentional duplication to support both query directions with a
single operation:
"Quais grupos o usuário u1 pertence?"
Query(PK="USER#u1", begins_with(SK, "GROUP#"))
→ retorna GROUP#g10 e GROUP#g20
"Quais usuários pertencem ao grupo g10?"
Query(PK="GROUP#g10", begins_with(SK, "USER#"))
→ retorna USER#u1 e USER#u2
[FACT] The advantage of the adjacency list is that both traversal directions are O(1) Query
operations — no Scan, no application-side joins. The disadvantage is that maintaining
consistency is the code's responsibility: if you add the edge USER#u1 → GROUP#g10, you must
also add GROUP#g10 → USER#u1. This is typically done in a DynamoDB transaction
(transact_write_items) to guarantee atomicity.
[FACT] For complex many-to-many relationships (multi-hop: "friends of friends", dependency graphs,
neighbor rankings), the adjacency list in DynamoDB starts to be limiting — each "hop" requires a
new Query. For deep graphs with multi-level queries in real time, the official AWS documentation
recommends using Amazon Neptune (a dedicated graph database) instead of trying to simulate
deep graphs in DynamoDB.
2. Composite sort keys — hierarchies queryable at multiple levels
One of the most powerful properties of the sort key is that, being a string with lexicographic
ordering, you can embed hierarchies in it using separators. This allows queries that stop at
any level of the hierarchy using begins_with.
Example: order system with items and sub-items
PK SK Representa
─────────────────────────────────────────────────────────────────────
ORDER#o1 ORDER#o1 cabeçalho do pedido o1
ORDER#o1 ITEM#i100 item i100 do pedido o1
ORDER#o1 ITEM#i100#RETURN#r1 devolução r1 do item i100
ORDER#o1 ITEM#i100#RETURN#r2 devolução r2 do item i100
ORDER#o1 ITEM#i101 item i101 do pedido o1
ORDER#o1 ITEM#i101#RETURN#r3 devolução r3 do item i101
ORDER#o1 SHIPMENT#s1 envio s1 do pedido o1
ORDER#o1 SHIPMENT#s1#EVENT#e1 evento do envio (ex: saiu do CD)
ORDER#o1 SHIPMENT#s1#EVENT#e2 evento do envio (ex: em trânsito)
With this schema, one operation per hierarchy level:
"Todos os itens do pedido o1":
Query(PK="ORDER#o1", begins_with(SK, "ITEM#"))
→ retorna ITEM#i100, ITEM#i100#RETURN#r1, ITEM#i100#RETURN#r2, ITEM#i101, ...
"Só os itens (sem devoluções) do pedido o1":
Não é possível com begins_with(SK, "ITEM#") sem FilterExpression — returns todos
Solução: usar SK = "ITEM#i100" (nível exato) para GetItem, ou redesenhar o SK
"Só as devoluções do item i100":
Query(PK="ORDER#o1", begins_with(SK, "ITEM#i100#RETURN#"))
→ retorna ITEM#i100#RETURN#r1, ITEM#i100#RETURN#r2
"Todos os eventos do envio s1":
Query(PK="ORDER#o1", begins_with(SK, "SHIPMENT#s1#EVENT#"))
→ retorna EVENT#e1, EVENT#e2
[FACT] The important limitation: you cannot skip levels of the hierarchy with begins_with.
To search for "all returns of all items in order o1", you would need a Query with
begins_with(SK, "ITEM#") and then filter on the client for items containing #RETURN# — or
redesign the schema. begins_with always starts from the beginning of the string; it is not a
"contains".
Geographic example (from AWS documentation):
SK hierárquico para localização:
"BR#SP#SaoPaulo#Pinheiros#AvenidaFaria Lima"
Queries possíveis:
begins_with(SK, "BR#") → todos os endereços no Brasil
begins_with(SK, "BR#SP#") → todos no estado de São Paulo
begins_with(SK, "BR#SP#SaoPaulo#") → todos na cidade de São Paulo
begins_with(SK, "BR#SP#SaoPaulo#Pinheiros#") → só o bairro Pinheiros
3. Overloaded GSI — one index serving multiple access patterns
[FACT] DynamoDB allows up to 20 GSIs per table (default limit, increasable via quota request).
In single-table design with many entity types, 20 GSIs may be insufficient — or simply expensive
(each GSI replicates attributes and charges for storage and writes). GSI overloading uses the
flexibility of DynamoDB's schema (each item can have different attributes) to make a single GSI
serve multiple access patterns.
The core idea: if different entity types need to be queried by different attributes, you create
generic attributes (GSI1PK, GSI1SK) whose values are populated with type-specific semantic
content:
Tabela: plataforma-educacional (single-table)
Item type PK SK GSI1PK GSI1SK
──────────────────────────────────────────────────────────────────────────────
Student STUDENT#s1 PROFILE STUDENT 2024-03-01 ← query: todos alunos
Student STUDENT#s1 COURSE#c10 COURSE#c10 STUDENT#s1 ← query: alunos de um curso
Course COURSE#c10 METADATA INSTRUCTOR#i5 COURSE#c10 ← query: cursos de um instrutor
Enrollment COURSE#c10 STUDENT#s1 COURSE#c10 STUDENT#s1 ← (duplicata da aresta)
The same GSI1 (with GSI1PK as PK and GSI1SK as SK) serves three different patterns:
"Todos os alunos cadastrados (para admin)":
Query GSI1(GSI1PK="STUDENT", GSI1SK >= "2024-01-01")
"Alunos matriculados no curso c10":
Query GSI1(GSI1PK="COURSE#c10", begins_with(GSI1SK, "STUDENT#"))
"Cursos do instrutor i5":
Query GSI1(GSI1PK="INSTRUCTOR#i5", begins_with(GSI1SK, "COURSE#"))
[FACT] The GSI1SK attribute with value "STUDENT" and a date for the first pattern is an
example of the sparse index + static key as GSI PK pattern. The static key ("STUDENT")
creates a potential "hot partition" in the GSI — if thousands of students are being inserted per
second, that GSI partition can become a bottleneck. For high-throughput workloads, a shard is
added to the value: "STUDENT#0" to "STUDENT#9" randomly, and the parallel query across the
10 shards is done on the client.
4. When single-table design creates more problems than it solves
[OPINION — position from Alex DeBrie, The DynamoDB Book] Single-table is not for everyone, and
applying it dogmatically is a frequent mistake. The contexts where multi-table is the safer
choice:
1. TIME COM TOOLING FRACO OU SDK DE ALTO NÍVEL
ORM-style SDKs (Java DynamoDBMapper, Java Enhanced Client, boto3 high-level Resource)
mapeiam classes para tabelas. Misturar tipos em uma tabela exige processar resultados
heterogêneos — o código fica complexo e propenso a bugs de deserialização.
Sintoma: "meu código de Query retorna um mix de User e Order e eu não sei qual é qual"
2. ENTIDADES COM CICLOS DE VIDA E OPERACIONAIS INDEPENDENTES
Se você precisa fazer backup apenas de Orders (não de Users), ou aplicar TTL só em Sessions,
ou habilitar InfrequentAccess storage class apenas em dados históricos — single-table força
a mesma política para todos os tipos. Multi-table dá controle granular.
3. STREAMS COM PROCESSAMENTO POR TIPO
DynamoDB Streams emite um evento para cada item modificado. Em single-table, uma Lambda que
processa o stream recebe todos os tipos misturados e precisa filtrar por `entity_type`. Com
Lambda event filters por atributo isso tem custo zero de invocação para os filtrados, mas
aumenta a complexidade do código de processamento.
4. ACCESS PATTERNS ALTAMENTE IMPREVISÍVEIS OU EM CONSTANTE MUDANÇA
Single-table exige que os access patterns sejam conhecidos e estáveis antes do design. Se o
negócio muda frequentemente e novos patterns surgem, cada novo pattern pode exigir um novo
GSI — e você fica limitado pelos 20 GSIs. Multi-table permite criar índices em tabelas
específicas sem afetar o schema das outras.
5. TIMES QUE NÃO DOMINAM O MODELO MENTAL DO DYNAMODB
Single-table com adjacency lists, overloaded GSIs e composite sort keys exige que TODOS os
desenvolvedores do time entendam o design profundamente. Um desenvolvedor que não entende e
adiciona um atributo no lugar errado pode quebrar queries silenciosamente.
Multi-table tem a "estranheza" da falta de joins, mas o design por tabela é mais intuitivo.
[FACT] The official AWS documentation (page bp-modeling-nosql-B, updated in 2024) now uses an
example of 15 access patterns implemented with multi-table design with strategic GSIs as the
canonical approach — representing a notable shift from the single-table evangelism of previous
years. AWS itself documents the trade-offs of each approach in data-modeling-foundations.html
without declaring a universal winner.
[CONSENSUS] The most useful practical criterion: if you frequently need to fetch data from
multiple entities in a single call (e.g., user profile + their latest orders + their active
sessions), single-table with item collections is clearly superior. If entities are rarely
accessed together, multi-table is simpler to maintain with no performance penalty.
Practical example
Scenario: online course platform with a many-to-many relationship between Students and
Courses, and a hierarchy of Course → Module → Lesson.
Complete schema
Tabela: edtech-platform
PK (String) + SK (String) + GSI1PK (String) + GSI1SK (String)
PK SK GSI1PK GSI1SK Dados extras
─────────────────────────────────────────────────────────────────────────────────────────────
STUDENT#s1 PROFILE STUDENT 2024-09-01 name, email
STUDENT#s1 ENROLLMENT#c10 COURSE#c10 STUDENT#s1 enrolled_at, progress
STUDENT#s1 ENROLLMENT#c20 COURSE#c20 STUDENT#s1 enrolled_at, progress
COURSE#c10 METADATA INSTRUCTOR#i5 COURSE#c10 title, description
COURSE#c10 MODULE#m1 — — title, order_num
COURSE#c10 MODULE#m1#LESSON#l1 — — title, duration_min
COURSE#c10 MODULE#m1#LESSON#l2 — — title, duration_min
COURSE#c10 MODULE#m2 — — title, order_num
COURSE#c10 MODULE#m2#LESSON#l3 — — title, duration_min
COURSE#c10 ENROLLMENT#STUDENT#s1 COURSE#c10 STUDENT#s1 progress, last_seen
COURSE#c10 ENROLLMENT#STUDENT#s2 COURSE#c10 STUDENT#s2 progress, last_seen
Access patterns served
AP1: Perfil do aluno s1
GetItem(PK="STUDENT#s1", SK="PROFILE")
AP2: Todos os cursos em que s1 está matriculado
Query(PK="STUDENT#s1", begins_with(SK, "ENROLLMENT#"))
AP3: Todos os alunos matriculados no curso c10
Query GSI1(GSI1PK="COURSE#c10", begins_with(GSI1SK, "STUDENT#"))
AP4: Todos os cursos do instrutor i5
Query GSI1(GSI1PK="INSTRUCTOR#i5", begins_with(GSI1SK, "COURSE#"))
AP5: Estrutura completa do curso c10 (módulos + lições)
Query(PK="COURSE#c10", begins_with(SK, "MODULE#"))
AP6: Só os módulos do curso c10 (sem as lições)
Query(PK="COURSE#c10", begins_with(SK, "MODULE#"))
+ FilterExpression: "attribute_not_exists(lesson_id)" ← custo: lê tudo, filtra cliente
— ou redesenhar SK para módulos como "MODULE#m1#" e lições como "MODULE#m1#L#l1"
AP7: Só as lições do módulo m1 do curso c10
Query(PK="COURSE#c10", begins_with(SK, "MODULE#m1#LESSON#"))
Python code — adjacency list with transaction
import boto3
from boto3.dynamodb.conditions import Key, Attr
from botocore.exceptions import ClientError
from datetime import datetime, timezone
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("edtech-platform")
# Matricular aluno em curso — operação bidirecional em transação
def enroll_student(student_id: str, course_id: str) -> None:
now = datetime.now(timezone.utc).isoformat()
try:
dynamodb.meta.client.transact_write_items(
TransactItems=[
# Aresta: Student → Course (na partição do aluno)
{
"Put": {
"TableName": "edtech-platform",
"Item": {
"PK": {"S": f"STUDENT#{student_id}"},
"SK": {"S": f"ENROLLMENT#{course_id}"},
"GSI1PK": {"S": f"COURSE#{course_id}"},
"GSI1SK": {"S": f"STUDENT#{student_id}"},
"entity_type": {"S": "ENROLLMENT"},
"enrolled_at": {"S": now},
"progress": {"N": "0"},
},
"ConditionExpression": "attribute_not_exists(PK)", # idempotente
}
},
# Aresta inversa: Course → Student (na partição do curso)
{
"Put": {
"TableName": "edtech-platform",
"Item": {
"PK": {"S": f"COURSE#{course_id}"},
"SK": {"S": f"ENROLLMENT#STUDENT#{student_id}"},
"GSI1PK": {"S": f"COURSE#{course_id}"},
"GSI1SK": {"S": f"STUDENT#{student_id}"},
"entity_type": {"S": "ENROLLMENT"},
"enrolled_at": {"S": now},
"progress": {"N": "0"},
"last_seen": {"S": now},
},
"ConditionExpression": "attribute_not_exists(PK)",
}
},
]
)
except ClientError as e:
if e.response["Error"]["Code"] == "TransactionCanceledException":
print("Aluno já matriculado (condition check falhou)")
else:
raise
# AP2: Quais cursos o aluno está matriculado?
def get_student_courses(student_id: str) -> list[dict]:
response = table.query(
KeyConditionExpression=(
Key("PK").eq(f"STUDENT#{student_id}") &
Key("SK").begins_with("ENROLLMENT#")
)
)
return response.get("Items", [])
# AP3: Quais alunos estão no curso? (via GSI1)
def get_course_students(course_id: str) -> list[dict]:
response = table.query(
IndexName="GSI1",
KeyConditionExpression=(
Key("GSI1PK").eq(f"COURSE#{course_id}") &
Key("GSI1SK").begins_with("STUDENT#")
)
)
return response.get("Items", [])
# AP5: Estrutura do curso (módulos + lições em uma Query)
def get_course_structure(course_id: str) -> dict:
response = table.query(
KeyConditionExpression=(
Key("PK").eq(f"COURSE#{course_id}") &
Key("SK").begins_with("MODULE#")
),
ScanIndexForward=True, # ordem crescente → módulos e lições em ordem
)
items = response.get("Items", [])
# Organizar hierarquia no cliente
modules = {}
for item in items:
sk = item["SK"]
parts = sk.split("#")
if len(parts) == 2:
# É um módulo: MODULE#m1
mod_id = parts[1]
modules[mod_id] = {**item, "lessons": []}
elif len(parts) == 4 and parts[2] == "LESSON":
# É uma lição: MODULE#m1#LESSON#l1
mod_id = parts[1]
lesson_id = parts[3]
if mod_id in modules:
modules[mod_id]["lessons"].append(item)
return {"course_id": course_id, "modules": list(modules.values())}
# AP7: Lições de um módulo específico
def get_module_lessons(course_id: str, module_id: str) -> list[dict]:
response = table.query(
KeyConditionExpression=(
Key("PK").eq(f"COURSE#{course_id}") &
Key("SK").begins_with(f"MODULE#{module_id}#LESSON#")
),
ScanIndexForward=True,
)
return response.get("Items", [])
CLI — queries with overloaded GSI
# AP3: alunos matriculados no curso c10 (via GSI1)
aws dynamodb query \
--table-name edtech-platform \
--index-name GSI1 \
--key-condition-expression "GSI1PK = :pk AND begins_with(GSI1SK, :prefix)" \
--expression-attribute-values '{":pk":{"S":"COURSE#c10"}, ":prefix":{"S":"STUDENT#"}}'
# AP4: cursos do instrutor i5 (via GSI1)
aws dynamodb query \
--table-name edtech-platform \
--index-name GSI1 \
--key-condition-expression "GSI1PK = :pk AND begins_with(GSI1SK, :prefix)" \
--expression-attribute-values '{":pk":{"S":"INSTRUCTOR#i5"}, ":prefix":{"S":"COURSE#"}}'
# AP7: lições do módulo m1 do curso c10
aws dynamodb query \
--table-name edtech-platform \
--key-condition-expression "PK = :pk AND begins_with(SK, :prefix)" \
--expression-attribute-values '{
":pk":{"S":"COURSE#c10"},
":prefix":{"S":"MODULE#m1#LESSON#"}
}'
Common pitfalls
Pitfall 1: forgetting the inverse edge in the adjacency list
The most common mistake when implementing an adjacency list is writing only the edge in the
direction of the current operation. For example: when enrolling a student, the code creates
STUDENT#s1 → ENROLLMENT#c10, but forgets to create COURSE#c10 → ENROLLMENT#STUDENT#s1.
Everything works until the day someone needs AP3 ("which students are in the course?") and
discovers that the course's partition has no data. The problem is silent — no error, no exception,
just empty results. The prevention is to always implement bidirectional operations in
transact_write_items and test both query directions from the start of development.
Pitfall 2: composite sort key with begins_with does not distinguish intermediate levels
If the hierarchical SK is MODULE#m1#LESSON#l1, the query begins_with(SK, "MODULE#m1#")
returns both the module item (MODULE#m1) and all its lessons (MODULE#m1#LESSON#l1,
MODULE#m1#LESSON#l2). To fetch "only the module, without the lessons", there is no way to
express this with KeyConditionExpression alone — you would need a query with SK = "MODULE#m1"
(GetItem or Query with =), or redesign the SK so that modules and lessons have distinct prefixes
that are not a prefix of each other: e.g., modules with MOD#m1 and lessons with LES#m1#l1.
Planning prefixes so there is no ambiguity between hierarchy levels is part of the design.
Pitfall 3: GSI_PK attribute with static value at high cardinality
In the overloaded GSI, if you use GSI1PK = "STUDENT" to index all students and the table has
10 million students, that GSI partition reaches the capacity of a single partition (~10 GB,
~3,000 RCU/WCU). Writes of new students will be limited to ~1,000 WCU/s on that partition — a
classic hot partition. The symptom is ProvisionedThroughputExceededException on the GSI even
with the main table operating normally. The solution is to add a random shard suffix:
GSI1PK = "STUDENT#" + str(random.randint(0, 9)), and when fetching all students, perform 10
parallel queries (shards 0–9) and consolidate on the client.
Reflection exercise
You are modeling a simplified professional social network. The entities are: User, Company, and
the "works at" relationship (Employment) between User and Company. A user can have worked at
multiple companies (employment history), and a company has many current and former employees.
The access patterns are:
- Fetch a user's profile
- List a user's entire employment history (chronological, most recent first)
- List all current employees of a company
- List former employees of a company (status = "former")
- Fetch a user's current employment (only the most recent with status = "current")
Design the schema: what are the PK, SK, and GSI1PK/GSI1SK values for each item type (User,
Company, bidirectional Employment). Explain how the composite sort key in the Employment item
can include the employment start timestamp to solve the "most recent first" requirement. Identify
which access pattern cannot be served with just the SK and needs a GSI — and explain why a
FilterExpression would be inadequate for that case.
Resources for further study
1. Best practices for managing many-to-many relationships in DynamoDB tables
URL: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html
What you'll find: the adjacency list pattern described with the canonical Invoices-Bills
(many-to-many) example, the base table and inversion GSI schema, and the materialized graph
pattern for real-time graph workflows.
Why it's the right source: primary documentation with the exact schema and motivation for the pattern.
2. Overloading Global Secondary Indexes in DynamoDB
URL: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-gsi-overloading.html
What you'll find: a concrete example of how a single GSI with a generic Data attribute serves
multiple query types (by name, by warehouse, by hire date) — what AWS calls "GSI overloading".
Why it's the right source: the only official documentation page that names and explains the GSI
overloading pattern explicitly.
3. Example of modeling relational data in DynamoDB
URL: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-modeling-nosql-B.html
What you'll find: AWS's most complete example — 15 access patterns for an order system,
implemented with multi-table design and strategic GSIs. Includes the table mapping each access
pattern to the corresponding DynamoDB operation.
Why it's the right source: it's the current canonical AWS example (updated 2024) and demonstrates
the shift toward multi-table as a pragmatic approach — important for calibrating when to use
single-table vs multi-table.
4. Data Modeling foundations in DynamoDB
URL: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/data-modeling-foundations.html
What you'll find: explicit and objective trade-offs of single-table vs multi-table, with a list
of advantages and disadvantages of each approach (backup, encryption, streams, GraphQL, cost).
Why it's the right source: it's the most balanced official source on the single vs multi-table
debate, without dogmatism toward either approach.
5. alexdebrie.com — DynamoDB one-to-many and single-table posts
URLs: https://www.alexdebrie.com/posts/dynamodb-one-to-many/ and https://www.alexdebrie.com/posts/dynamodb-single-table/
What you'll find: the most cited technical posts on DynamoDB modeling outside of AWS documentation.
The first covers all 1:N relationship patterns (embedding, item collections, GSI). The second
presents the arguments for and against single-table with more nuance than any re:Invent talk.
Why it's the right source: [OPINION] Alex DeBrie is the author of The DynamoDB Book and probably
the most prolific technical writer on DynamoDB outside of AWS. His posts are based on practical
experience with real workloads, not evangelism of a single pattern.