luizmachado.dev


Session 035 — DynamoDB Global Tables: eventual consistency and conflict resolution

Dependencies: session-034-dynamodb-transacoes-condicionais


Objective

By the end of this session, you will be able to enable Global Tables v2019 across multiple regions, explain the conflict resolution mechanism (last-writer-wins based on internal timestamp), identify patterns where Global Tables adds real value (per-region read latency, active-active DR) vs where it is overkill, and calculate the additional replication cost with rWRU/rWCU.


Context

[FACT] DynamoDB Global Tables is a managed multi-region, multi-active replication feature: each replica accepts reads and writes. When an application writes to a replica, DynamoDB automatically replicates the change to all other replicas. There is no "primary" replica — all are active.

[FACT] There are two versions: 2019.11.21 (Current) and 2017.11.29 (Legacy). Always use the 2019 version, which is more cost-efficient, supports more features (tables with existing data, on-demand mode) and has a simplified billing model.


Key concepts

1. Consistency modes: MREC vs. MRSC

[FACT] Global Tables supports two consistency modes, defined at creation and immutable afterwards:

                    MREC                         MRSC
                    (Multi-Region Eventual        (Multi-Region Strong
                    Consistency), default         Consistency)
────────────────────────────────────────────────────────────────────────
Replication         Asynchronous                 Synchronous (≥ 1 other region
                                                 before acknowledging)
Write latency       Low                          Higher
Strongly consistent Returns local data           Always returns the most
read                (may be stale if written     recent data from any
                    in another region)           region
RPO                 > 0 (typically seconds)      0
TransactWriteItems  Supported (atomic only in    NOT supported
                    the origin region; no
                    cross-region ACID)
TTL                 Supported                    NOT supported
LSI                 Supported                    NOT supported
Regions             Any available region         Fixed sets only:
                                                 US (N. Virginia+Ohio+Oregon),
                                                 EU (Ireland+London+Paris+Frankfurt),
                                                 AP (Tokyo+Seoul+Osaka)
Topology            Any number of replicas       Exactly 3 regions
                                                 (2 replicas + 1 witness,
                                                 or 3 replicas)

[FACT] The mode cannot be changed after creation. To switch from MREC to MRSC you would need to recreate the table.
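The MRSC placement rules from the table above can be checked before any table is created. The following is a minimal sketch, not an AWS API: the function name validate_mrsc_regions and the MRSC_REGION_SETS mapping are illustrative, and the region codes are simply the standard codes for the cities listed in the table.

```python
# Region sets in which MRSC is available, as listed in the comparison table.
MRSC_REGION_SETS = {
    "US": {"us-east-1", "us-east-2", "us-west-2"},
    "EU": {"eu-west-1", "eu-west-2", "eu-west-3", "eu-central-1"},
    "AP": {"ap-northeast-1", "ap-northeast-2", "ap-northeast-3"},
}


def validate_mrsc_regions(regions: list[str]) -> str:
    """Return the name of the matching region set, or raise ValueError.

    The three regions may be 3 replicas, or 2 replicas plus 1 witness.
    """
    unique = set(regions)
    if len(regions) != 3 or len(unique) != 3:
        raise ValueError("MRSC needs exactly 3 distinct regions")
    for name, allowed in MRSC_REGION_SETS.items():
        if unique <= allowed:
            return name
    raise ValueError(
        f"regions {sorted(unique)} do not belong to a single MRSC region set"
    )


print(validate_mrsc_regions(["us-east-1", "us-east-2", "us-west-2"]))
```

Because the mode is immutable, a check like this is cheapest when it runs in CI or at synth time, before the first deployment.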


2. Conflict resolution: last-writer-wins

[FACT] In MREC mode, if the same item is modified simultaneously in multiple regions, DynamoDB resolves the conflict with last-writer-wins based on an internal timestamp:

Region us-east-1                    Region eu-west-1
──────────────────────              ──────────────────────
t=100ms: Write item X               t=101ms: Write item X
         {status: "A"}                       {status: "B"}
         (internal timestamp: T1)            (internal timestamp: T2)

Cross-replication:
  us-east-1 receives the write from eu-west-1 (T2)
  eu-west-1 receives the write from us-east-1 (T1)

Resolution:
  T2 > T1 → version "B" wins in BOTH regions
  Final result: item X = {status: "B"} in both replicas

IMPORTANT: the application in us-east-1 is NOT notified that its
write was overwritten. It received "success" from PutItem.

[FACT] The timestamp used is internal to DynamoDB — it is not a visible attribute in the application nor is it the createdAt/updatedAt that you define. It is not possible to control or influence the conflict resolution outcome.

[CONSENSUS] For workloads where simultaneous conflicts are possible and silent loss of writes is unacceptable (e.g., financial transactions, inventory), Global Tables MREC is not suitable. Use MRSC or design the application to write to a single region.
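To make the convergence behavior concrete, here is a toy in-memory model of last-writer-wins. It is only a sketch: the Replica and Version classes are hypothetical, the integer ts stands in for DynamoDB's internal timestamp (which applications cannot see), and ties are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: dict
    ts: int  # stand-in for the internal timestamp; not visible in real DynamoDB


class Replica:
    def __init__(self, region: str):
        self.region = region
        self.items: dict[str, Version] = {}

    def local_write(self, key: str, value: dict, ts: int) -> Version:
        # A local write always succeeds from the caller's point of view.
        version = Version(value, ts)
        self.items[key] = version
        return version

    def apply_replicated(self, key: str, incoming: Version) -> None:
        # Last-writer-wins: keep the incoming version only if it is newer.
        current = self.items.get(key)
        if current is None or incoming.ts > current.ts:
            self.items[key] = incoming


us = Replica("us-east-1")
eu = Replica("eu-west-1")

v_us = us.local_write("X", {"status": "A"}, ts=100)
v_eu = eu.local_write("X", {"status": "B"}, ts=101)

# Cross-replication in both directions.
us.apply_replicated("X", v_eu)
eu.apply_replicated("X", v_us)

print(us.items["X"].value, eu.items["X"].value)  # both replicas hold {"status": "B"}
```

Note that local_write in us-east-1 returned normally even though its value was later discarded, which is the silent-overwrite behavior described above.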


3. Billing model: rWRU and rWCU

[FACT] When a table becomes part of a Global Table, write units change from WRU/WCU to rWRU/rWCU (replicated), charged in all regions that contain a replica:

Scenario: on-demand, 2 regions (us-east-1 + eu-west-1)
          1 KB item written in us-east-1

─────────────────────────────────────────────────────────────
Single-region table:    1 WRU (in us-east-1)
─────────────────────────────────────────────────────────────
Global table (2 regions):
  us-east-1: 1 rWRU  (where the write happened)
  eu-west-1: 1 rWRU  (destination replica)
  Total: 2 rWRU
─────────────────────────────────────────────────────────────
Global table (3 regions):
  us-east-1: 1 rWRU
  eu-west-1: 1 rWRU
  ap-southeast-1: 1 rWRU
  Total: 3 rWRU
─────────────────────────────────────────────────────────────

Price: rWRU ≈ same price as WRU (us-east-1: $1.25 / million in this example)
       → with 2 regions: write cost ~2×
       → with 3 regions: write cost ~3×

GSI updates: billed in WRU (not rWRU) in every region
             → one write with 1 GSI in 2 regions:
             2 rWRU (base table) + 2 WRU (GSI) = 4 units total

Reads: billed normally in RRU/RCU per replica
       (no multiplier: you pay only for the reads you perform)

Cross-region data transfer: additional charge per GB transferred
                            between regions

[FACT] MRSC mode with witness: replication to the witness does not generate rWRU cost, storage, or data transfer. The witness is transparent in terms of billing.
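The arithmetic above can be wrapped in a small estimator. This is a sketch under stated assumptions: the function names are illustrative, every GSI is assumed to be touched by each write with an index entry the same size as the item, rWRU and WRU are assumed to cost the same, and the default price of $1.25 per million is the example figure used in this session. Cross-region data transfer is not included.

```python
import math


def replicated_write_units(item_kb: float, regions: int, gsis: int = 0) -> dict:
    """Write units consumed by ONE on-demand write to a global table."""
    units_per_copy = max(1, math.ceil(item_kb))  # 1 unit per 1 KB, rounded up
    rwru = units_per_copy * regions              # base table, one copy per region
    gsi_wru = units_per_copy * gsis * regions    # each GSI, in each region
    return {"rWRU": rwru, "GSI_WRU": gsi_wru, "total": rwru + gsi_wru}


def monthly_write_cost(
    writes_per_day: int,
    item_kb: float,
    regions: int,
    gsis: int = 0,
    price_per_million: float = 1.25,  # assumed price, same for rWRU and WRU
    days: int = 30,
) -> float:
    units = replicated_write_units(item_kb, regions, gsis)["total"]
    return writes_per_day * days * units * price_per_million / 1_000_000


print(replicated_write_units(1, regions=2, gsis=1))
print(monthly_write_cost(1_000_000, item_kb=1, regions=3))
```

The first call reproduces the "2 rWRU + 2 WRU = 4 units" case; the second corresponds to 1M writes per day in 3 regions over a 30-day month.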


4. Synchronized vs. non-synchronized settings

[FACT] Not all configurations are synchronized between replicas:

ALWAYS synchronized (change on one replica → changes on all):
  - Capacity mode (provisioned ↔ on-demand)
  - Write capacity (provisioned) and write auto scaling
  - Key schema and GSI definitions
  - SSE type (KMS key type)
  - TTL (the setting, not the item values)
  - Streams definition (MREC)

SYNCHRONIZED but overridable per replica:
  - Read capacity and read auto scaling (per-replica override)
  - Table class
  - On-demand max read throughput

NEVER synchronized (manage manually per replica):
  - Deletion protection
  - Point-in-time recovery (PITR)
  - Tags
  - CloudWatch Contributor Insights
  - Kinesis Data Streams
  - Resource Policies

[CONSENSUS] Non-synchronized PITR is a common pitfall: enabling PITR on the original table does not protect the replicas. You must enable PITR on each replica individually.
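Since PITR has to be checked per replica, a small audit helper can list the regions where it is missing. The sketch below uses the boto3 DynamoDB client call describe_continuous_backups; the function name replicas_without_pitr and the injectable client_factory parameter are illustrative choices (the factory makes the logic testable without AWS credentials).

```python
from typing import Callable


def _boto3_client(region: str):
    import boto3  # imported lazily so the audit logic can run against a stub

    return boto3.client("dynamodb", region_name=region)


def replicas_without_pitr(
    table_name: str,
    regions: list[str],
    client_factory: Callable[[str], object] = _boto3_client,
) -> list[str]:
    """Return the regions whose replica does not have PITR enabled."""
    missing = []
    for region in regions:
        client = client_factory(region)
        resp = client.describe_continuous_backups(TableName=table_name)
        status = (
            resp["ContinuousBackupsDescription"]
            .get("PointInTimeRecoveryDescription", {})
            .get("PointInTimeRecoveryStatus", "DISABLED")
        )
        if status != "ENABLED":
            missing.append(region)
    return missing
```

Called as replicas_without_pitr("content-table", ["us-east-1", "eu-west-1", "ap-southeast-1"]), it would query each region in turn; an empty list means every replica is covered.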


5. DAX with Global Tables: critical behavior

[FACT] Writes replicated from other regions arrive directly at DynamoDB, bypassing DAX. The DAX in the destination region is not updated when a replication arrives — only when the cache TTL expires or when the local application writes through DAX.

Region eu-west-1:
  Application writes via DAX → DynamoDB → replicated to us-east-1

Region us-east-1:
  Replication arrives → DynamoDB updated
  DAX in us-east-1: STALE until the TTL expires
  Application reads via DAX in us-east-1: may return old data
  (even after the replication has reached DynamoDB)

[CONSENSUS] If you use DAX with Global Tables and data is written in multiple regions, adjust the item cache TTL to reflect the acceptable staleness tolerance — or avoid DAX for those access patterns.
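The staleness window can be illustrated with a toy TTL cache placed in front of a dictionary. This is not DAX and does not use the DAX client; TtlCache is a hypothetical stand-in that only mimics the relevant behavior: local writes go through the cache, while replicated writes land in the table directly.

```python
class TtlCache:
    """Toy read-through/write-through item cache with a fixed TTL."""

    def __init__(self, store: dict, ttl_seconds: float):
        self.store = store
        self.ttl = ttl_seconds
        self.entries: dict[str, tuple[object, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str, now: float):
        cached = self.entries.get(key)
        if cached is not None and now < cached[1]:
            return cached[0]  # cache hit, possibly stale
        value = self.store.get(key)  # miss or expired: read the table
        self.entries[key] = (value, now + self.ttl)
        return value

    def put(self, key: str, value, now: float) -> None:
        self.store[key] = value
        self.entries[key] = (value, now + self.ttl)


table_us = {}  # stand-in for the us-east-1 replica
dax_us = TtlCache(table_us, ttl_seconds=300)

dax_us.put("X", "old", now=0)  # local write through the cache
table_us["X"] = "new"          # replicated write from eu-west-1 bypasses the cache

print(dax_us.get("X", now=10))   # old (entry still within its TTL)
print(dax_us.get("X", now=301))  # new (entry expired, re-read from the table)
```

In this model the worst-case staleness equals the TTL, which is why the TTL should be chosen from the staleness the access pattern can tolerate.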


Practical example

Scenario: global content catalog — low read latency per region

Use case: streaming platform with users in North America, Europe, and Asia Pacific. Content is published centrally (us-east-1), read in all regions. User writes (favorites, history) happen locally.

CDK Python — MREC Global Table with 3 regions

# NOTE: in CDK, a Global Table (v2019) maps to the CloudFormation resource
# AWS::DynamoDB::GlobalTable. It can be declared with the L1 construct
# CfnGlobalTable (used here) or with the L2 construct TableV2, which wraps it.
# The older L2 Table construct only adds replicas through its
# replication_regions property, which relies on a custom resource.
#
# This example uses CfnGlobalTable (L1) to keep every property explicit.

from aws_cdk import (
    Stack, RemovalPolicy,
    aws_dynamodb as dynamodb,
)
from constructs import Construct


class GlobalContentTableStack(Stack):
    """
    Stack deployed in us-east-1.
    CfnGlobalTable creates the replicas in the listed regions automatically.
    """

    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # CfnGlobalTable creates the global table with replicas in multiple regions
        global_table = dynamodb.CfnGlobalTable(
            self, "ContentGlobalTable",
            table_name="content-table",
            billing_mode="PAY_PER_REQUEST",
            attribute_definitions=[
                dynamodb.CfnGlobalTable.AttributeDefinitionProperty(
                    attribute_name="PK", attribute_type="S"
                ),
                dynamodb.CfnGlobalTable.AttributeDefinitionProperty(
                    attribute_name="SK", attribute_type="S"
                ),
                dynamodb.CfnGlobalTable.AttributeDefinitionProperty(
                    attribute_name="contentType", attribute_type="S"
                ),
                dynamodb.CfnGlobalTable.AttributeDefinitionProperty(
                    attribute_name="publishedAt", attribute_type="S"
                ),
            ],
            key_schema=[
                dynamodb.CfnGlobalTable.KeySchemaProperty(
                    attribute_name="PK", key_type="HASH"
                ),
                dynamodb.CfnGlobalTable.KeySchemaProperty(
                    attribute_name="SK", key_type="RANGE"
                ),
            ],
            global_secondary_indexes=[
                dynamodb.CfnGlobalTable.GlobalSecondaryIndexProperty(
                    index_name="ContentByTypeDate",
                    key_schema=[
                        dynamodb.CfnGlobalTable.KeySchemaProperty(
                            attribute_name="contentType", key_type="HASH"
                        ),
                        dynamodb.CfnGlobalTable.KeySchemaProperty(
                            attribute_name="publishedAt", key_type="RANGE"
                        ),
                    ],
                    projection=dynamodb.CfnGlobalTable.ProjectionProperty(
                        projection_type="INCLUDE",
                        non_key_attributes=["title", "thumbnail", "duration"],
                    ),
                )
            ],
            # Streams enabled (required for MREC replication; managed by DynamoDB)
            stream_specification=dynamodb.CfnGlobalTable.StreamSpecificationProperty(
                stream_view_type="NEW_AND_OLD_IMAGES"
            ),
            # SSE with an AWS managed key (the SSE type is synchronized across replicas)
            sse_specification=dynamodb.CfnGlobalTable.SSESpecificationProperty(
                sse_enabled=True,
            ),
            # TTL on the "expiresAt" attribute
            time_to_live_specification=dynamodb.CfnGlobalTable.TimeToLiveSpecificationProperty(
                attribute_name="expiresAt",
                enabled=True,
            ),
            # Replicas: one per region
            replicas=[
                # Home region (us-east-1), where the stack is deployed
                dynamodb.CfnGlobalTable.ReplicaSpecificationProperty(
                    region="us-east-1",
                    point_in_time_recovery_specification=dynamodb.CfnGlobalTable.PointInTimeRecoverySpecificationProperty(
                        point_in_time_recovery_enabled=True
                    ),
                    tags=[{"key": "Environment", "value": "production"}],
                ),
                # Europe replica
                dynamodb.CfnGlobalTable.ReplicaSpecificationProperty(
                    region="eu-west-1",
                    point_in_time_recovery_specification=dynamodb.CfnGlobalTable.PointInTimeRecoverySpecificationProperty(
                        point_in_time_recovery_enabled=True
                    ),
                    tags=[{"key": "Environment", "value": "production"}],
                ),
                # Asia Pacific replica
                dynamodb.CfnGlobalTable.ReplicaSpecificationProperty(
                    region="ap-southeast-1",
                    point_in_time_recovery_specification=dynamodb.CfnGlobalTable.PointInTimeRecoverySpecificationProperty(
                        point_in_time_recovery_enabled=True
                    ),
                    tags=[{"key": "Environment", "value": "production"}],
                ),
            ],
        )

Python — Regional write and read with fallback

import boto3
from botocore.exceptions import ClientError
from datetime import datetime, timezone

# Each region gets its own DynamoDB resource
REGIONS = ["us-east-1", "eu-west-1", "ap-southeast-1"]
HOME_REGION = "us-east-1"  # region where content is published

clients = {
    region: boto3.resource("dynamodb", region_name=region)
    for region in REGIONS
}

def get_local_table(region: str):
    return clients[region].Table("content-table")


def publish_content(content_id: str, title: str, content_type: str, body: str) -> dict:
    """
    Publishes content in the home region (us-east-1).
    DynamoDB replicates it automatically to eu-west-1 and ap-southeast-1.

    NOTE: reads in other regions right after this write are eventually
    consistent and may return stale data for milliseconds to seconds,
    depending on ReplicationLatency.
    """
    table = get_local_table(HOME_REGION)
    now = datetime.now(timezone.utc).isoformat()

    table.put_item(
        Item={
            "PK": f"CONTENT#{content_id}",
            "SK": "METADATA",
            "contentId": content_id,
            "title": title,
            "contentType": content_type,
            "body": body,
            "publishedAt": now,
            "status": "PUBLISHED",
            # Origin region: useful for filtering Stream records by region
            "originRegion": HOME_REGION,
        },
        # put-if-not-exists
        ConditionExpression="attribute_not_exists(PK)",
    )
    return {"contentId": content_id, "publishedAt": now}


def get_content(content_id: str, preferred_region: str) -> dict | None:
    """
    Reads content from the replica local to the user.
    If the local replica is throttled, unavailable, or does not have the
    item yet, falls back to the home region.

    For recently published content the local replica may return stale data
    (ReplicationLatency is typically under 1 second for nearby regions).
    """
    regions_to_try = [preferred_region]
    if preferred_region != HOME_REGION:
        regions_to_try.append(HOME_REGION)

    for region in regions_to_try:
        try:
            response = get_local_table(region).get_item(
                Key={"PK": f"CONTENT#{content_id}", "SK": "METADATA"},
            )
            item = response.get("Item")
            if item:
                return item
        except ClientError as e:
            if e.response["Error"]["Code"] in (
                "ProvisionedThroughputExceededException",
                "ServiceUnavailable",
            ):
                # Try the next region
                continue
            raise

    return None


def record_user_favorite(
    user_id: str,
    content_id: str,
    user_region: str,
) -> None:
    """
    The user's favorite is written to the replica local to the user.
    No conflict is expected (a user operates in one region at a time).

    In rare cases (the same user with simultaneous sessions in
    different regions), last-writer-wins resolves the conflict.
    """
    table = get_local_table(user_region)
    table.put_item(
        Item={
            "PK": f"USER#{user_id}",
            "SK": f"FAVORITE#{content_id}",
            "userId": user_id,
            "contentId": content_id,
            "createdAt": datetime.now(timezone.utc).isoformat(),
            "originRegion": user_region,
        }
    )

CLI — Create Global Table and monitor replication

# 1. Create a single-region table first (the base for the Global Table)
aws dynamodb create-table \
  --table-name content-table \
  --attribute-definitions \
      AttributeName=PK,AttributeType=S \
      AttributeName=SK,AttributeType=S \
  --key-schema \
      AttributeName=PK,KeyType=HASH \
      AttributeName=SK,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES \
  --region us-east-1

# Wait for the table to become ACTIVE
aws dynamodb wait table-exists --table-name content-table --region us-east-1

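# 2. Add replicas to turn the table into a Global Table (v2019).
#    Conservative approach: one replica per update-table call, waiting for
#    each replica to become ACTIVE before requesting the next one.
for region in eu-west-1 ap-southeast-1; do
  aws dynamodb update-table \
    --table-name content-table \
    --replica-updates "[{\"Create\": {\"RegionName\": \"$region\"}}]" \
    --region us-east-1
  aws dynamodb wait table-exists --table-name content-table --region "$region"
done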

# 3. Check the status of the replicas
aws dynamodb describe-table \
  --table-name content-table \
  --region us-east-1 \
  --query 'Table.Replicas[*].{Region:RegionName,Status:ReplicaStatus}'

# 4. Enable PITR on EACH replica individually, including us-east-1 (it is NOT synchronized)
for region in us-east-1 eu-west-1 ap-southeast-1; do
  aws dynamodb update-continuous-backups \
    --table-name content-table \
    --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true \
    --region "$region"
  echo "PITR enabled in $region"
done

# 5. Monitor ReplicationLatency (MREC) between regions
# The metric is published per region pair (source→destination)
aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ReplicationLatency \
  --dimensions \
      Name=TableName,Value=content-table \
      Name=ReceivingRegion,Value=eu-west-1 \
  --start-time "$(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -v-1H '+%Y-%m-%dT%H:%M:%SZ')" \
  --end-time "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" \
  --period 60 \
  --statistics Average,Maximum \
  --region us-east-1

# 6. Check the Global Table version (confirm it is v2019)
aws dynamodb describe-table \
  --table-name content-table \
  --region us-east-1 \
  --query 'Table.GlobalTableVersion'
# Expected result: "2019.11.21"

# 7. Estimate the write cost (example: 1M writes/day, 1 KB item, 3 regions)
echo "Writes: 1,000,000/day × 3 regions × \$1.25/million rWRU"
echo "= \$3.75/day in rWRU alone (excluding cross-region transfer)"

# 8. Remove a replica when it is no longer needed
aws dynamodb update-table \
  --table-name content-table \
  --replica-updates '[{"Delete": {"RegionName": "ap-southeast-1"}}]' \
  --region us-east-1

Common pitfalls

1. Silent last-writer-wins on concurrent writes
[FACT] The application is not notified when its write is overwritten by conflict resolution. The PutItem returns success, but the data may be discarded seconds later when replication from another region with a more recent timestamp arrives. Critical workloads that do not tolerate loss of writes should use MRSC or redirect writes to a single region.

2. PITR is not synchronized — each replica needs to be configured
[FACT] Enabling PITR on the original table does not activate PITR on the replicas. In an incident that requires a restore, replicas without PITR cannot be restored to a previous point. Configure PITR on each replica individually and automate this via IaC.

3. ACID transactions only within the origin region
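[FACT] TransactWriteItems is atomic only in the region where it was invoked. Changes are replicated after the transaction commits, item by item, so other replicas may see transient partial state during propagation. TransactGetItems gives an isolated snapshot only against the local replica: in the origin region it never observes a half-applied transaction, but in another region it can still return a partially replicated result. If you use Global Tables MREC and transactions, design clients in other regions to tolerate this behavior, or route reads that need transactional consistency to the region where the transaction was written.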

4. Streams in MREC: records may differ between replicas
[FACT] In MREC, the replication process may combine multiple changes into a single replicated write. The Stream records of one replica may differ from those of another replica — both in content and in ordering between items. If you process Streams for CDC, add an originRegion attribute to the item and filter in the Lambda to process only records originating in the desired region.

5. DAX + Global Tables = stale cache from replication
[FACT] Writes replicated from other regions do not update the local DAX cache. An item updated in eu-west-1 may be read from DAX in us-east-1 with the old value until the TTL expires. Adjust the item cache TTL to reflect the staleness tolerance, or use direct reads to DynamoDB for critical data.

6. Cost multiplication with GSIs
[FACT] GSI writes in Global Tables are charged in WRU (not rWRU) in each replica. With 3 replicas and 1 GSI per table: a 1KB write generates 3 rWRU (base table) + 3 WRU (GSI) = 6 units total. With multiple GSIs the cost scales rapidly — design only the GSIs that are truly necessary.
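For pitfall 4, the originRegion filter can be sketched as a Lambda handler for a DynamoDB Streams event source. The helper names origin_region and filter_local_records are illustrative, and the sketch assumes every write sets originRegion (as the write functions in this session do) and that the function runs in Lambda, where the AWS_REGION environment variable is available.

```python
import os


def origin_region(record: dict):
    """Read originRegion from a stream record (DynamoDB JSON format)."""
    data = record.get("dynamodb", {})
    # REMOVE events carry only OldImage, so fall back to it.
    image = data.get("NewImage") or data.get("OldImage") or {}
    return image.get("originRegion", {}).get("S")


def filter_local_records(records: list, local_region: str) -> list:
    """Keep only the records whose item was written in local_region."""
    return [r for r in records if origin_region(r) == local_region]


def handler(event: dict, context=None) -> dict:
    # Assumption: AWS_REGION is set by the Lambda runtime; the default is for local runs.
    local_region = os.environ.get("AWS_REGION", "us-east-1")
    records = event.get("Records", [])
    local = filter_local_records(records, local_region)
    for record in local:
        # Placeholder for the real CDC logic (publish, index, audit, ...).
        print(record["eventName"], record["dynamodb"]["Keys"])
    return {"processed": len(local), "skipped": len(records) - len(local)}
```

With one such function per region, each change is processed once, in the region where it originated, instead of once per replica stream.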


Reflection exercise

You are designing a user profile system for an app with users in Brazil (sa-east-1), USA (us-east-1), and Europe (eu-west-1). Each user edits their own profile; there are no cross-writes (a Brazilian user never edits a European user's profile).

Answer:

  1. Given the described access pattern, what is the practical probability of a last-writer-wins conflict? Does this make MREC safe for this specific use case? What type of workload would make MREC inadequate even with the same data model?

  2. The team wants to ensure that after a profile write, the subsequent read in the same region returns the updated data. How does strongly consistent read behavior in MREC affect this? Is there any situation where ConsistentRead=True would still return stale data?

  3. Calculate the monthly rWRU cost for the following scenario: 500,000 active users, each updating their profile 2 times per day (2 KB item), 3 regions, no GSIs. Compare with the cost of a single-region table with the same write volume.


Resources for further study

  • [FACT] How DynamoDB global tables work (MREC vs MRSC, conflict resolution): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/V2globaltables_HowItWorks.html
  • [FACT] Understanding billing for global tables (rWRU/rWCU, GSI): https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/global-tables-billing.html
  • [FACT] Best practices for global tables: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-global-table-design.html
  • [FACT] Write modes with global tables: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-global-table-design.prescriptive-guidance.writemodes.html