luizmachado.dev

Session 042 — Cost Explorer, Cost Anomaly Detection and Compute Optimizer

Prerequisite: session-041 (Spot Instances and Fleet)


Session Objectives

  • Master the GetCostAndUsage API with tag filters, dimensions, and correct metrics
  • Understand the differences between BlendedCost, UnblendedCost, AmortizedCost, and NetAmortizedCost
  • Configure Cost Anomaly Detection with managed and tag-based monitors
  • Interpret the anomaly SNS payload and its fields (anomalyScore, impact, rootCauses)
  • Use Compute Optimizer for EC2 and Fargate with rightsizing preferences
  • Export Compute Optimizer recommendations for analysis in S3

1. Cost Explorer — Fundamental Concepts

1.1 Granularity and data window

[FACT] Cost Explorer provides data with a delay of up to 24 hours. Granularity defines the minimum period for each data point:

╔════════════════╦════════════════════════════════════════════════╗
║ Granularity    ║ Constraint                                     ║
╠════════════════╬════════════════════════════════════════════════╣
║ HOURLY         ║ Maximum window: the past 14 days               ║
║ DAILY          ║ Default; 13 months free, 38 months (paid)      ║
║ MONTHLY        ║ Default; 13 months free, 38 months (paid)      ║
╚════════════════╩════════════════════════════════════════════════╝
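
The window constraint can be checked before a request is sent. A minimal sketch, assuming only the 14-day HOURLY limit from the table; the helper name `validate_window` is hypothetical and not part of boto3:

```python
from datetime import date, timedelta

# Maximum lookback per granularity, taken from the table above.
# Assumption: only HOURLY is modeled; DAILY/MONTHLY limits depend on the account.
MAX_LOOKBACK_DAYS = {"HOURLY": 14}


def validate_window(granularity: str, start: date, today: date) -> bool:
    """Return True if `start` falls inside the allowed window for `granularity`."""
    limit = MAX_LOOKBACK_DAYS.get(granularity)
    if limit is None:
        return True  # no limit modeled for this granularity
    return start >= today - timedelta(days=limit)
```

Rejecting an out-of-range window locally gives a clearer error than the API's validation exception.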

1.2 Metrics — which one to use for what

[FACT] Each metric represents a different perspective of cost:

UnblendedCost
  - Actual cost charged to the individual account, recorded on the day it is billed
  - RI/SP upfront fees appear in full on the purchase date (no spreading over the term)
  - Use for: individual account billing, simple chargeback

BlendedCost
  - Spreads RI and SP pricing across the linked accounts using averaged (blended) rates
  - Use for: cost allocation in Organizations (levels out price differences)
  - Caution: may not reflect the real cost of each member account

AmortizedCost
  - Spreads the RI/SP upfront cost over the commitment term
  - E.g.: 1-year RI, $1000 upfront → $2.74/day amortized
  - Use for: analyzing the real cost of a period (e.g., FinOps dashboards)

NetAmortizedCost
  - AmortizedCost after private discounts (Enterprise Discount Program, negotiated rates)
  - Use for: real net cost in accounts with custom contracts

NormalizedUsageAmount / UsageQuantity
  - Usage volume, not cost; useful for analyzing consumption per service

[CONSENSUS] For cost dashboards by project/team, the most useful metric is AmortizedCost — it reflects the real economic cost, spreading RI/SP upfront payments over time.
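
The amortization arithmetic behind the RI example can be reproduced in a few lines. This is only a sketch of even daily spreading; the function name is hypothetical, and real AmortizedCost also includes recurring fees, which are left out here:

```python
def amortized_daily_cost(upfront: float, term_days: int = 365) -> float:
    """Spread an upfront RI/SP payment evenly over the term, in USD per day."""
    if term_days <= 0:
        raise ValueError("term_days must be positive")
    return upfront / term_days


daily = amortized_daily_cost(1000.0)  # 1-year RI, $1000 paid upfront
print(f"${daily:.2f}/day")            # prints $2.74/day
```

With UnblendedCost the same purchase would show up as a single $1000 spike on the purchase date, which is why amortized views are easier to read on dashboards.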

1.3 Cost Allocation Tags

[FACT] To filter or group by tag in Cost Explorer, the tag must be activated as a Cost Allocation Tag in the Billing console. It is not automatic.

Activation flow:
  1. Billing console → Cost allocation tags
  2. Select the tag (e.g., "project") → Activate
  3. Wait up to 24 hours for it to appear in Cost Explorer data
  4. Tags on new resources start appearing as soon as the activation takes effect
  5. Costs incurred before activation are NOT tagged retroactively

Limit: 500 activated Cost Allocation Tags per account
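
The console steps can also be scripted. A minimal sketch, assuming a boto3 Cost Explorer client is passed in; `build_activation_request` and `activate_tags` are hypothetical helper names, while `update_cost_allocation_tags_status` is the boto3 call they wrap:

```python
def build_activation_request(tag_keys: list[str]) -> dict:
    """Build the kwargs for ce.update_cost_allocation_tags_status()."""
    return {
        "CostAllocationTagsStatus": [
            {"TagKey": key, "Status": "Active"} for key in tag_keys
        ]
    }


def activate_tags(ce_client, tag_keys: list[str]) -> list[dict]:
    """Activate the tags and return any per-tag errors reported by the API."""
    response = ce_client.update_cost_allocation_tags_status(
        **build_activation_request(tag_keys)
    )
    return response.get("Errors", [])
```

Usage would look like `activate_tags(boto3.client("ce"), ["project"])`; the 24-hour propagation delay still applies afterwards.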

2. GetCostAndUsage API — Structure and Examples

2.1 Expression structure (filters)

[FACT] The Expression type in the Cost Explorer API supports boolean composition with And and Or (arrays of expressions) and Not (a single expression), with leaves of type Dimensions, Tags, or CostCategories.

# Base structure of an Expression
expression = {
    "And": [
        # Leaf 1: filter by tag
        {
            "Tags": {
                "Key": "project",
                "Values": ["checkout-service", "payments-api"],
                "MatchOptions": ["EQUALS"]
            }
        },
        # Leaf 2: exclude a usage type group (e.g., data transfer)
        {
            "Not": {
                "Dimensions": {
                    "Key": "USAGE_TYPE_GROUP",
                    "Values": ["EC2: Data Transfer - Internet (Out)"],
                    "MatchOptions": ["EQUALS"]
                }
            }
        }
    ]
}

Available dimensions in Dimensions.Key: SERVICE, REGION, LINKED_ACCOUNT, INSTANCE_TYPE, USAGE_TYPE, USAGE_TYPE_GROUP, RECORD_TYPE (charge type: Usage/Credit/Refund/Tax/etc.), OPERATING_SYSTEM, TENANCY, PURCHASE_TYPE (On-Demand/Spot/Reserved/Savings Plans), AZ.

MatchOptions: EQUALS, STARTS_WITH, ENDS_WITH, CONTAINS, ABSENT (resources without the tag), CASE_SENSITIVE, CASE_INSENSITIVE.
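
Nested Expression dictionaries get hard to read quickly, so small builder functions can help. The helpers below are hypothetical conveniences (not part of boto3) that only assemble the dictionary shapes described above:

```python
def tag_equals(key: str, values: list[str]) -> dict:
    """Leaf expression: tag `key` equals any of `values`."""
    return {"Tags": {"Key": key, "Values": values, "MatchOptions": ["EQUALS"]}}


def dimension_equals(key: str, values: list[str]) -> dict:
    """Leaf expression: dimension `key` equals any of `values`."""
    return {"Dimensions": {"Key": key, "Values": values, "MatchOptions": ["EQUALS"]}}


def all_of(*expressions: dict) -> dict:
    """And takes an array of expressions."""
    return {"And": list(expressions)}


def negate(expression: dict) -> dict:
    """Not takes a single expression, not an array."""
    return {"Not": expression}


filter_expr = all_of(
    tag_equals("project", ["checkout-service", "payments-api"]),
    negate(dimension_equals("USAGE_TYPE_GROUP", ["EC2: Data Transfer - Internet (Out)"])),
)
```

The resulting `filter_expr` is the same structure as the hand-written example and can be passed as the `Filter` argument of `get_cost_and_usage`.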

2.2 GroupBy

[FACT] GroupBy defines how the result is segmented. Maximum of 2 groups per call. The type can be DIMENSION, TAG, or COST_CATEGORY.

group_by = [
    {"Type": "TAG",       "Key": "project"},         # activated cost allocation tag
    {"Type": "DIMENSION", "Key": "SERVICE"},           # built-in dimension
]
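
With a TAG group, each group key comes back in the form `tagkey$value`, and untagged resources come back as `tagkey$` with an empty value. A small sketch of splitting those keys; the function name and the "(untagged)" label are assumptions for illustration:

```python
def split_group_keys(keys: list[str], tag_key: str) -> tuple[str, str]:
    """Split ["project$checkout", "Amazon EC2"] into ("checkout", "Amazon EC2")."""
    prefix = f"{tag_key}$"
    raw_tag, service = keys
    # Strip only the leading "tagkey$" so values containing "$" stay intact
    tag_value = raw_tag[len(prefix):] if raw_tag.startswith(prefix) else raw_tag
    return (tag_value or "(untagged)", service)
```

Stripping the prefix by position, rather than with `str.replace`, avoids mangling tag values that happen to contain the same substring.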

3. CDK Python — Cost Monitoring Stack

import json

from aws_cdk import (
    Stack, aws_ce as ce, aws_iam as iam, aws_sns as sns,
    aws_sns_subscriptions as subs,
    aws_budgets as budgets,
)
from constructs import Construct

class CostMonitoringStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # SNS topic for anomaly and budget alerts
        alert_topic = sns.Topic(self, "CostAlertTopic",
            display_name="AWS Cost Anomaly Alerts",
        )
        alert_topic.add_subscription(
            subs.EmailSubscription("finops-team@company.com")
        )
        # Cost Anomaly Detection and Budgets publish through their own service
        # principals, so the topic policy must allow them explicitly.
        alert_topic.add_to_resource_policy(iam.PolicyStatement(
            actions=["sns:Publish"],
            principals=[
                iam.ServicePrincipal("costalerts.amazonaws.com"),
                iam.ServicePrincipal("budgets.amazonaws.com"),
            ],
            resources=[alert_topic.topic_arn],
        ))

        # ──────────────────────────────────────────────────────────────
        # Cost Anomaly Detection: per-service monitor (AWS managed)
        # Monitors all services automatically, top 5000 values
        # ──────────────────────────────────────────────────────────────
        service_monitor = ce.CfnAnomalyMonitor(self, "ServiceMonitor",
            monitor_name="AllServicesMonitor",
            monitor_type="DIMENSIONAL",
            monitor_dimension="SERVICE",  # AWS managed: monitors all services
        )

        # Subscription with two thresholds combined (AND):
        # alert if impact >= $50 AND percentage >= 20%
        ce.CfnAnomalySubscription(self, "ServiceSubscription",
            subscription_name="ServiceAnomalyAlerts",
            monitor_arn_list=[service_monitor.attr_monitor_arn],
            subscribers=[
                ce.CfnAnomalySubscription.SubscriberProperty(
                    address=alert_topic.topic_arn,
                    type="SNS",
                    status="CONFIRMED",
                )
            ],
            frequency="IMMEDIATE",  # individual alerts via SNS (not a daily/weekly digest)
            # ThresholdExpression is a JSON string in CloudFormation, not a dict
            threshold_expression=json.dumps({
                "And": [
                    {
                        "Dimensions": {
                            "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
                            "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
                            "Values": ["50"]   # $50 absolute impact
                        }
                    },
                    {
                        "Dimensions": {
                            "Key": "ANOMALY_TOTAL_IMPACT_PERCENTAGE",
                            "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
                            "Values": ["20"]   # 20% deviation
                        }
                    }
                ]
            }),
        )

        # ──────────────────────────────────────────────────────────────
        # Monitor by Cost Allocation Tag (project): customer managed
        # Monitors specific projects with a custom threshold
        # ──────────────────────────────────────────────────────────────
        tag_monitor = ce.CfnAnomalyMonitor(self, "ProjectTagMonitor",
            monitor_name="ProjectTagMonitor",
            monitor_type="CUSTOM",  # Customer managed
            # MonitorSpecification is also passed as a JSON string
            monitor_specification=json.dumps({
                "Tags": {
                    "Key": "project",
                    "Values": ["checkout-service", "payments-api", "fraud-detector"],
                    "MatchOptions": ["EQUALS"]
                }
            }),
        )

        ce.CfnAnomalySubscription(self, "ProjectSubscription",
            subscription_name="ProjectAnomalyAlerts",
            monitor_arn_list=[tag_monitor.attr_monitor_arn],
            subscribers=[
                ce.CfnAnomalySubscription.SubscriberProperty(
                    address=alert_topic.topic_arn,
                    type="SNS",
                    status="CONFIRMED",
                )
            ],
            frequency="IMMEDIATE",
            threshold_expression=json.dumps({
                "Dimensions": {
                    "Key": "ANOMALY_TOTAL_IMPACT_ABSOLUTE",
                    "MatchOptions": ["GREATER_THAN_OR_EQUAL"],
                    "Values": ["100"]  # higher threshold for core projects
                }
            }),
        )

        # ──────────────────────────────────────────────────────────────
        # Budget per project tag: alerts at 80% actual and 100% forecasted
        # ──────────────────────────────────────────────────────────────
        budgets.CfnBudget(self, "ProjectBudget",
            budget=budgets.CfnBudget.BudgetDataProperty(
                budget_name="checkout-service-monthly",
                budget_type="COST",
                time_unit="MONTHLY",
                budget_limit=budgets.CfnBudget.SpendProperty(
                    amount=5000, unit="USD"
                ),
                cost_filters={
                    "TagKeyValue": ["user:project$checkout-service"]
                },
            ),
            notifications_with_subscribers=[
                budgets.CfnBudget.NotificationWithSubscribersProperty(
                    notification=budgets.CfnBudget.NotificationProperty(
                        comparison_operator="GREATER_THAN",
                        notification_type="ACTUAL",
                        threshold=80,
                        threshold_type="PERCENTAGE",
                    ),
                    subscribers=[
                        budgets.CfnBudget.SubscriberProperty(
                            address=alert_topic.topic_arn,
                            subscription_type="SNS",
                        )
                    ],
                ),
                budgets.CfnBudget.NotificationWithSubscribersProperty(
                    notification=budgets.CfnBudget.NotificationProperty(
                        comparison_operator="GREATER_THAN",
                        notification_type="FORECASTED",
                        threshold=100,
                        threshold_type="PERCENTAGE",
                    ),
                    subscribers=[
                        budgets.CfnBudget.SubscriberProperty(
                            address=alert_topic.topic_arn,
                            subscription_type="SNS",
                        )
                    ],
                ),
            ],
        )

4. Python — Cost Explorer GetCostAndUsage

import boto3
from datetime import datetime, date, timedelta
from dataclasses import dataclass, field
from typing import Optional
import calendar

ce = boto3.client("ce", region_name="us-east-1")


@dataclass
class ProjectCostReport:
    project: str
    service: str
    monthly_cost: float
    previous_month_cost: float
    mom_change_pct: float  # month-over-month %


def get_cost_by_project_and_service(
    tag_key: str = "project",
    tag_values: Optional[list[str]] = None,
    lookback_months: int = 2,
) -> list[ProjectCostReport]:
    """
    Return monthly cost segmented by tag:project × SERVICE,
    using AmortizedCost (suited to FinOps: it amortizes RI/SP upfront fees).
    """
    today = date.today()
    # First day of the current month
    start_current = today.replace(day=1)
    # First day of the month `lookback_months` before the current one.
    # Month arithmetic avoids the drift caused by subtracting 31-day blocks.
    month_index = start_current.year * 12 + (start_current.month - 1) - lookback_months
    start_prev = date(month_index // 12, month_index % 12 + 1, 1)
    # End is exclusive, so use tomorrow to include today
    end = today + timedelta(days=1)

    filter_expr: dict = {}
    if tag_values:
        filter_expr = {
            "Tags": {
                "Key": tag_key,
                "Values": tag_values,
                "MatchOptions": ["EQUALS"]
            }
        }
    # No tag filter = all projects

    params = dict(
        TimePeriod={
            "Start": start_prev.strftime("%Y-%m-%d"),
            "End": end.strftime("%Y-%m-%d"),
        },
        Granularity="MONTHLY",
        Metrics=["AmortizedCost"],
        GroupBy=[
            {"Type": "TAG",       "Key": tag_key},
            {"Type": "DIMENSION", "Key": "SERVICE"},
        ],
    )
    if filter_expr:
        params["Filter"] = filter_expr

    # Paginate through the results
    all_results = []
    next_token = None
    while True:
        if next_token:
            params["NextPageToken"] = next_token
        response = ce.get_cost_and_usage(**params)
        all_results.extend(response.get("ResultsByTime", []))
        next_token = response.get("NextPageToken")
        if not next_token:
            break

    # Organize by (project, service) → {month_key: cost}
    from collections import defaultdict
    cost_map: dict[tuple, dict[str, float]] = defaultdict(dict)
    for result in all_results:
        period_start = result["TimePeriod"]["Start"][:7]  # "2025-11"
        for group in result.get("Groups", []):
            keys = group["Keys"]  # ["project$checkout", "Amazon EC2"]
            project_val = keys[0].replace(f"{tag_key}$", "")
            service_val = keys[1]
            cost = float(group["Metrics"]["AmortizedCost"]["Amount"])
            cost_map[(project_val, service_val)][period_start] = cost

    # Build the report with month-over-month change
    # (the current month is partial, so MoM is indicative only)
    months = sorted({m for costs in cost_map.values() for m in costs})
    if len(months) < 2:
        return []
    prev_month, curr_month = months[-2], months[-1]

    reports = []
    for (project, service), monthly_data in cost_map.items():
        curr = monthly_data.get(curr_month, 0.0)
        prev = monthly_data.get(prev_month, 0.0)
        mom = ((curr - prev) / prev * 100) if prev > 0 else 0.0
        if curr > 0.01 or prev > 0.01:  # drop near-zero rows
            reports.append(ProjectCostReport(
                project=project,
                service=service,
                monthly_cost=curr,
                previous_month_cost=prev,
                mom_change_pct=mom,
            ))

    return sorted(reports, key=lambda r: r.monthly_cost, reverse=True)


def get_cost_forecast(
    tag_key: str = "project",
    tag_value: str = "checkout-service",
    months_ahead: int = 1,
) -> dict:
    """Forecast cost from tomorrow through the end of the month `months_ahead` months ahead, using GetCostForecast."""
    today = date.today()
    # The forecast starts tomorrow
    start = (today + timedelta(days=1)).strftime("%Y-%m-%d")
    # End is exclusive, so use the first day of the month after the target month
    end_index = today.year * 12 + (today.month - 1) + months_ahead + 1
    end = f"{end_index // 12}-{end_index % 12 + 1:02d}-01"

    response = ce.get_cost_forecast(
        TimePeriod={"Start": start, "End": end},
        Metric="AMORTIZED_COST",
        Granularity="MONTHLY",
        Filter={
            "Tags": {
                "Key": tag_key,
                "Values": [tag_value],
                "MatchOptions": ["EQUALS"]
            }
        },
    )
    total = response["Total"]
    return {
        "forecast_amount": float(total["Amount"]),
        "unit": total["Unit"],
        "prediction_interval_lower": float(
            response["ForecastResultsByTime"][0]["PredictionIntervalLowerBound"]
        ),
        "prediction_interval_upper": float(
            response["ForecastResultsByTime"][0]["PredictionIntervalUpperBound"]
        ),
    }


# Usage example
if __name__ == "__main__":
    reports = get_cost_by_project_and_service(
        tag_key="project",
        tag_values=["checkout-service", "payments-api"],
    )
    print(f"\n{'Project':25} {'Service':35} {'This month':12} {'Prev month':12} {'MoM':8}")
    print("-" * 95)
    for r in reports[:20]:
        print(
            f"{r.project:25} {r.service:35} "
            f"${r.monthly_cost:>9.2f}  ${r.previous_month_cost:>9.2f}  "
            f"{r.mom_change_pct:>+6.1f}%"
        )

    fc = get_cost_forecast("project", "checkout-service")
    # The interval bounds below refer to the first forecast period only
    print(f"\nForecast checkout-service: ${fc['forecast_amount']:.2f} "
          f"(range: ${fc['prediction_interval_lower']:.2f}–${fc['prediction_interval_upper']:.2f})")

5. Compute Optimizer — Rightsizing

5.1 Supported resources and prerequisites

[FACT] Compute Optimizer supports: EC2 instances, EC2 Auto Scaling groups, EBS volumes, Lambda functions, ECS services on Fargate, Aurora/RDS databases, commercial software licenses.

[FACT] Compute Optimizer is not active by default — explicit opt-in is required in the account or in the Organization's management account.

[FACT] By default, it analyzes 14 days of CloudWatch metrics. With Enhanced Infrastructure Metrics (paid), it extends to 93 days.

[FACT] For recommendations that consider EC2 memory, you need to install the CloudWatch agent on the instance (or configure external metrics ingestion via Datadog/Dynatrace).

5.2 Findings (possible results)

╔══════════════════════╦══════════════════════════════════════════════════════╗
║ Finding              ║ Meaning                                              ║
╠══════════════════════╬══════════════════════════════════════════════════════╣
║ OVER_PROVISIONED     ║ Instance larger than needed; savings are possible    ║
║ UNDER_PROVISIONED    ║ Instance smaller than needed; performance risk       ║
║ OPTIMIZED            ║ Configuration fits current usage                     ║
║ NOT_OPTIMIZED        ║ Insufficient data or special configuration           ║
╚══════════════════════╩══════════════════════════════════════════════════════╝

5.3 Rightsizing Preferences — Presets

[FACT] Compute Optimizer offers 4 configurable presets, with direct impact on the conservatism of recommendations:

╔══════════════════════╦══════════════╦══════════════╦════════════════╗
║ Preset               ║ CPU Threshold║ CPU Headroom ║ Memory Headroom║
╠══════════════════════╬══════════════╬══════════════╬════════════════╣
║ Maximum savings      ║ P90          ║ 0%           ║ 10%            ║
║ Balanced             ║ P95          ║ 30%          ║ 30%            ║
║ Default              ║ P99.5        ║ 20%          ║ 20%            ║
║ Maximum performance  ║ P99.5        ║ 30%          ║ 30%            ║
╚══════════════════════╩══════════════╩══════════════╩════════════════╝

CPU Threshold = percentile above which data points are ignored (e.g., P90 ignores the top 10% of spikes)
CPU Headroom  = margin added on top of current usage as a buffer
Memory Headroom = margin added on top of memory usage
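
The effect of the threshold is easy to see on a small sample. The sketch below uses a nearest-rank percentile and models headroom as a simple multiplicative margin; both are simplifying assumptions for illustration and not Compute Optimizer's actual algorithm. The sample data is hypothetical:

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Smallest sample with at least pct% of the samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sizing_target(samples: list[float], pct: float, headroom_pct: float) -> float:
    """Utilization at the threshold percentile plus a headroom margin (assumed model)."""
    return percentile_nearest_rank(samples, pct) * (1 + headroom_pct / 100)


# Hypothetical CPU samples: steady 40% with a single spike to 95%
cpu = [40.0] * 19 + [95.0]
print(sizing_target(cpu, 90, 0))     # P90 ignores the spike
print(sizing_target(cpu, 99.5, 20))  # P99.5 keeps the spike and adds headroom
```

Under these assumptions the P90 preset sizes for the steady 40% load, while P99.5 with 20% headroom sizes for the spike, which is why the stricter presets recommend larger instances.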

[CONSENSUS] For conservative production environments, Default (P99.5 + 20% headroom) is adequate. For dev/staging environments where spikes are not critical, Balanced or Maximum savings generate more savings.
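
The preset table maps directly onto the `utilizationPreferences` structure accepted by `put_recommendation_preferences`. A minimal sketch; the `PRESETS` dictionary and the function name are hypothetical, while the enum strings (`P90`, `P95`, `P99_5`, `PERCENT_0` to `PERCENT_30`) are the values the API uses:

```python
# (CPU threshold, CPU headroom, memory headroom) per preset, from the table above
PRESETS = {
    "maximum_savings":     ("P90",   "PERCENT_0",  "PERCENT_10"),
    "balanced":            ("P95",   "PERCENT_30", "PERCENT_30"),
    "default":             ("P99_5", "PERCENT_20", "PERCENT_20"),
    "maximum_performance": ("P99_5", "PERCENT_30", "PERCENT_30"),
}


def utilization_preferences(preset: str) -> list[dict]:
    """Build the utilizationPreferences list for a named preset."""
    threshold, cpu_headroom, mem_headroom = PRESETS[preset]
    return [
        {"metricName": "CpuUtilization",
         "metricParameters": {"threshold": threshold, "headroom": cpu_headroom}},
        # Memory accepts only a headroom; it has no configurable threshold
        {"metricName": "MemoryUtilization",
         "metricParameters": {"headroom": mem_headroom}},
    ]
```

The result can be passed as `utilizationPreferences=utilization_preferences("balanced")` together with `resourceType="Ec2Instance"` and a `scope` when calling the Compute Optimizer client.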


6. Python — Compute Optimizer: recommendation analysis and export

import boto3
import json
from dataclasses import dataclass
from typing import Optional

co = boto3.client("compute-optimizer", region_name="us-east-1")
ec2_client = boto3.client("ec2", region_name="us-east-1")


@dataclass
class EC2Recommendation:
    instance_id: str
    instance_type: str
    finding: str  # OVER_PROVISIONED / UNDER_PROVISIONED / OPTIMIZED
    recommended_instance_type: Optional[str]
    estimated_monthly_savings: float
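    cpu_max_30d: float  # maximum over the lookback window (14 days by default, despite the name)
    memory_max_30d: Optional[float]  # None when no memory metrics are available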
    reason: str


def get_ec2_rightsizing_recommendations(
    account_ids: Optional[list[str]] = None,
    finding_filter: Optional[list[str]] = None,  # ex.: ["OVER_PROVISIONED"]
) -> list[EC2Recommendation]:
    """
    Fetch rightsizing recommendations for EC2 instances.
    finding_filter: list of findings to filter on (None = all).
    """
    params: dict = {}
    if account_ids:
        params["accountIds"] = account_ids
    if finding_filter:
        params["filters"] = [
            {"name": "Finding", "values": finding_filter}
        ]

    recommendations = []
    next_token = None
    while True:
        if next_token:
            params["nextToken"] = next_token
        response = co.get_ec2_instance_recommendations(**params)

        for rec in response.get("instanceRecommendations", []):
            # Take the top recommendation (first in the list)
            top_options = rec.get("recommendationOptions", [])
            if not top_options:
                continue
            top = top_options[0]

            # Extract utilization metrics
            cpu_max = next(
                (float(m["value"]) for m in rec.get("utilizationMetrics", [])
                 if m["name"] == "CPU" and m["statistic"] == "MAXIMUM"),
                0.0
            )
            mem_max = next(
                (float(m["value"]) for m in rec.get("utilizationMetrics", [])
                 if m["name"] == "MEMORY" and m["statistic"] == "MAXIMUM"),
                None
            )

            # Estimated monthly savings (nested under savingsOpportunity)
            savings = (top.get("savingsOpportunity") or {}).get("estimatedMonthlySavings", {})
            monthly_savings = float(savings.get("value", 0.0))

            # Reasons for the finding, e.g. "CPUOverprovisioned".
            # (migrationEffort is a single string, not a list of reasons.)
            reasons = rec.get("findingReasonCodes", [])
            reason_str = ", ".join(reasons) if reasons else "CPU over-provisioned"

            recommendations.append(EC2Recommendation(
                instance_id=rec["instanceArn"].split("/")[-1],
                instance_type=rec["currentInstanceType"],
                finding=rec["finding"],
                recommended_instance_type=top.get("instanceType"),
                estimated_monthly_savings=monthly_savings,
                cpu_max_30d=cpu_max,
                memory_max_30d=mem_max,
                reason=reason_str,
            ))

        next_token = response.get("nextToken")
        if not next_token:
            break

    return sorted(recommendations, key=lambda r: r.estimated_monthly_savings, reverse=True)


def export_ec2_recommendations_to_s3(
    s3_bucket: str,
    s3_prefix: str = "compute-optimizer/",
    file_format: str = "Csv",  # "Csv" or "Json"
    include_member_accounts: bool = True,
) -> str:
    """
    Export EC2 recommendations to S3 (asynchronous processing).
    Returns the JobId for tracking.
    """
    response = co.export_ec2_instance_recommendations(
        s3DestinationConfig={
            "bucket": s3_bucket,
            "keyPrefix": s3_prefix,
        },
        fileFormat=file_format,
        includeMemberAccounts=include_member_accounts,
        # Optional fields for filtering
        # accountIds=["123456789012"],
        # filters=[{"name": "Finding", "values": ["OVER_PROVISIONED"]}],
    )
    return response["jobId"]


def get_ecs_fargate_recommendations() -> list[dict]:
    """
    Recommendations for ECS services on Fargate:
    Compute Optimizer recommends task CPU, task memory, and container CPU/memory.
    """
    response = co.get_ecs_service_recommendations(
        filters=[
            {"name": "Finding", "values": ["OVER_PROVISIONED", "UNDER_PROVISIONED"]}
        ]
    )

    results = []
    for rec in response.get("ecsServiceRecommendations", []):
        current_config = rec.get("currentServiceConfiguration", {})
        top_option = (rec.get("recommendationOptions") or [{}])[0]
        recommended_config = top_option.get("containerRecommendations", [])
        # Savings are nested under savingsOpportunity
        savings = (top_option.get("savingsOpportunity") or {}).get("estimatedMonthlySavings", {})

        results.append({
            "service_arn": rec["serviceArn"],
            "finding": rec["finding"],
            "current_cpu": current_config.get("cpu"),
            "current_memory": current_config.get("memory"),
            "recommended_containers": [
                {
                    "name": c.get("containerName"),
                    # cpu sits directly on the container recommendation
                    "recommended_cpu": c.get("cpu"),
                    "recommended_memory": c.get("memorySizeConfiguration", {}).get("memory"),
                }
                for c in recommended_config
            ],
            "estimated_monthly_savings_usd": float(savings.get("value", 0.0)),
        })

    return sorted(results, key=lambda r: r["estimated_monthly_savings_usd"], reverse=True)


# Usage example
if __name__ == "__main__":
    # EC2: list the top over-provisioned recommendations
    recs = get_ec2_rightsizing_recommendations(
        finding_filter=["OVER_PROVISIONED"]
    )
    total_savings = sum(r.estimated_monthly_savings for r in recs)
    print(f"\nTotal EC2 rightsizing opportunity: ${total_savings:.2f}/month")
    print(f"\n{'ID':20} {'Current':15} {'Recommended':15} {'Savings/month':13} {'CPU max%':9}")
    print("-" * 75)
    for r in recs[:10]:
        print(
            f"{r.instance_id:20} {r.instance_type:15} "
            f"{str(r.recommended_instance_type):15} "
            f"${r.estimated_monthly_savings:>10.2f}   {r.cpu_max_30d:>6.1f}%"
        )

    # Export to S3 (asynchronous)
    job_id = export_ec2_recommendations_to_s3(
        s3_bucket="my-finops-bucket",
        s3_prefix="compute-optimizer/ec2/",
    )
    print(f"\nExport started. Job ID: {job_id}")

    # ECS Fargate
    ecs_recs = get_ecs_fargate_recommendations()
    if ecs_recs:
        print(f"\nECS Fargate recommendations: {len(ecs_recs)} services")
        for r in ecs_recs[:5]:
            print(f"  {r['service_arn'].split('/')[-1]}: "
                  f"{r['finding']} | "
                  f"Savings: ${r['estimated_monthly_savings_usd']:.2f}/month")

7. CLI — Essential Examples

# ── COST EXPLORER ──────────────────────────────────────────────────────────

# 1. Monthly cost per service over the last 3 months (date -d requires GNU date)
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '3 months ago' +%Y-%m-01),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics AmortizedCost \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[].{Month:TimePeriod.Start,Groups:Groups[*].{Service:Keys[0],Cost:Metrics.AmortizedCost.Amount}}' \
  --output json

# 2. Cost per project tag in the current month
aws ce get-cost-and-usage \
  --time-period Start=$(date +%Y-%m-01),End=$(date +%Y-%m-%d) \
  --granularity MONTHLY \
  --metrics AmortizedCost \
  --group-by Type=TAG,Key=project \
  --filter '{
    "Not": {
      "Dimensions": {
        "Key": "RECORD_TYPE",
        "Values": ["Credit", "Refund"],
        "MatchOptions": ["EQUALS"]
      }
    }
  }' \
  --query 'ResultsByTime[0].Groups | sort_by(@, &to_number(Metrics.AmortizedCost.Amount)) | reverse(@) | [:10].[Keys[0], Metrics.AmortizedCost.Amount]' \
  --output table

# 3. Cost forecast through the end of next month
#    (End is exclusive, so use the first day of the month after next)
aws ce get-cost-forecast \
  --time-period Start=$(date -d 'tomorrow' +%Y-%m-%d),End=$(date -d "$(date +%Y-%m-01) +2 months" +%Y-%m-%d) \
  --metric AMORTIZED_COST \
  --granularity MONTHLY \
  --filter '{"Tags": {"Key": "project","Values": ["checkout-service"],"MatchOptions": ["EQUALS"]}}' \
  --query '{Forecast:Total.Amount,Unit:Total.Unit}'

# 4. List activated Cost Allocation Tags
aws ce list-cost-allocation-tags \
  --status Active \
  --query 'CostAllocationTags[*].{Key:TagKey,Type:Type}'


# ── COST ANOMALY DETECTION ──────────────────────────────────────────────────

# 5. List anomaly monitors
aws ce get-anomaly-monitors \
  --query 'AnomalyMonitors[*].{Name:MonitorName,Type:MonitorType,Arn:MonitorArn}'

# 6. List anomalies detected in the last 30 days
aws ce get-anomalies \
  --date-interval StartDate=$(date -d '30 days ago' +%Y-%m-%d),EndDate=$(date +%Y-%m-%d) \
  --query 'Anomalies[*].{
      ID:AnomalyId,
      Start:AnomalyStartDate,
      Impact:Impact.TotalImpact,
      ImpactPct:Impact.TotalImpactPercentage,
      Service:RootCauses[0].Service
  }' \
  --output table

# 7. Show the details of a specific anomaly
#    (get-anomalies has no anomaly ID parameter, so filter client-side)
aws ce get-anomalies \
  --date-interval StartDate=2024-01-01,EndDate=2024-12-31 \
  --query 'Anomalies[?AnomalyId==`12345678-abcd-ef12-3456-987654321a12`] | [0]'


# ── COMPUTE OPTIMIZER ──────────────────────────────────────────────────────

# 8. Opt in (required before any use)
aws compute-optimizer update-enrollment-status \
  --status Active \
  --include-member-accounts   # for Organizations

# 9. Check enrollment status (response keys are lower camel case)
aws compute-optimizer get-enrollment-status \
  --query '{Status:status,NumberOfMemberAccountsOptedIn:numberOfMemberAccountsOptedIn}'

# 10. EC2 recommendations: over-provisioned instances with the largest savings
aws compute-optimizer get-ec2-instance-recommendations \
  --filters name=Finding,values=OVER_PROVISIONED \
  --query 'instanceRecommendations | sort_by(@, &recommendationOptions[0].savingsOpportunity.estimatedMonthlySavings.value) | reverse(@) | [:10] | [*].{
    ID:instanceArn,
    Current:currentInstanceType,
    Recommended:recommendationOptions[0].instanceType,
    Savings:recommendationOptions[0].savingsOpportunity.estimatedMonthlySavings.value,
    SavingsUnit:recommendationOptions[0].savingsOpportunity.estimatedMonthlySavings.currency
  }' \
  --output table

# 11. ECS Fargate recommendations
aws compute-optimizer get-ecs-service-recommendations \
  --filters name=Finding,values=OVER_PROVISIONED \
  --query 'ecsServiceRecommendations[*].{
    Service:serviceArn,
    Finding:finding,
    CurrentCPU:currentServiceConfiguration.cpu,
    CurrentMem:currentServiceConfiguration.memory,
    Savings:recommendationOptions[0].savingsOpportunity.estimatedMonthlySavings.value
  }' \
  --output table

# 12. Export all EC2 recommendations to S3 (asynchronous)
aws compute-optimizer export-ec2-instance-recommendations \
  --s3-destination-config bucket=my-finops-bucket,keyPrefix=compute-optimizer/ \
  --file-format Csv \
  --include-member-accounts \
  --query 'jobId'

# 13. Check the export status
aws compute-optimizer describe-recommendation-export-jobs \
  --job-ids "job-0abc123def456789" \
  --query 'recommendationExportJobs[0].{Status:status,S3:destination.s3}'

# 14. Rightsizing preferences: apply the "Balanced" preset to an account
aws compute-optimizer put-recommendation-preferences \
  --resource-type Ec2Instance \
  --scope name=AccountId,value=123456789012 \
  --utilization-preferences '[
    {"metricName": "CpuUtilization", "metricParameters": {"threshold": "P95", "headroom": "PERCENT_30"}},
    {"metricName": "MemoryUtilization", "metricParameters": {"headroom": "PERCENT_30"}}
  ]'

8. SNS Payload — Interpreting Anomalies

[FACT] The payload sent to SNS when an anomaly is detected has the following structure:

{
  "accountId": "123456789012",
  "anomalyDetailsLink": "https://console.aws.amazon.com/...",
  "anomalyEndDate": null,
  "anomalyId": "12345678-abcd-ef12-3456-987654321a12",
  "anomalyScore": {
    "currentScore": 0.87,   // how anomalous it is now (0–1; not documented as a
                             // percentage, but higher means more anomalous)
    "maxScore":    0.87
  },
  "anomalyStartDate": "2024-03-15T00:00:00Z",
  "dimensionKey": {"type": "DIMENSION", "key": "SERVICE"},
  "dimensionalValue": "Amazon EC2",
  "impact": {
    "maxImpact":            1203.45,   // largest daily impact
    "totalActualSpend":     5412.78,   // actual spend during the anomaly period
    "totalExpectedSpend":   1800.00,   // spend expected by the ML model
    "totalImpact":          3612.78,   // actual - expected
    "totalImpactPercentage": 200.71    // (totalImpact / totalExpectedSpend) * 100
  },
  "rootCauses": [
    {
      "linkedAccount": "987654321098",
      "linkedAccountName": "prod-account",
      "region": "us-east-1",
      "service": "Amazon EC2",
      "usageType": "BoxUsage:c5.4xlarge",
      "impact": {"contribution": 2800.00}
    }
  ],
  "subscriptionId": "...",
  "subscriptionName": "ServiceAnomalyAlerts"
}

Key fields for automated triage:
- impact.totalImpactPercentage → relative severity
- impact.totalImpact → absolute deviation value
- rootCauses[0] → account + region + usageType for quick diagnosis
- anomalyEndDate == null → anomaly still ongoing
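
The triage fields above can be pulled out with a few lines of Python. This is a minimal sketch that assumes the payload has already been parsed into a dict; the `triage` function and the severity cut-offs (`HIGH_PCT`, `MEDIUM_PCT`) are hypothetical names and values chosen for illustration, not part of any AWS API.

```python
# Hypothetical severity cut-offs (percent over expected spend), for illustration only.
HIGH_PCT = 150.0
MEDIUM_PCT = 50.0


def triage(payload: dict) -> dict:
    """Reduce an anomaly payload to the fields used for automated triage."""
    impact = payload["impact"]
    pct = impact["totalImpactPercentage"]
    if pct >= HIGH_PCT:
        severity = "high"
    elif pct >= MEDIUM_PCT:
        severity = "medium"
    else:
        severity = "low"

    # rootCauses may be missing or empty; fall back to an empty dict.
    root_causes = payload.get("rootCauses") or []
    top = root_causes[0] if root_causes else {}

    return {
        "anomaly_id": payload["anomalyId"],
        "severity": severity,
        "total_impact": impact["totalImpact"],
        "ongoing": payload.get("anomalyEndDate") is None,
        "account": top.get("linkedAccount"),
        "region": top.get("region"),
        "usage_type": top.get("usageType"),
    }


# Trimmed copy of the sample payload shown above.
sample = {
    "anomalyId": "12345678-abcd-ef12-3456-987654321a12",
    "anomalyEndDate": None,
    "impact": {"totalImpact": 3612.78, "totalImpactPercentage": 200.71},
    "rootCauses": [
        {
            "linkedAccount": "987654321098",
            "region": "us-east-1",
            "usageType": "BoxUsage:c5.4xlarge",
        }
    ],
}

summary = triage(sample)
print(summary)
```

Keeping the function free of AWS calls makes it easy to unit test; the ticketing or notification step can consume the returned dict separately.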


9. Diagram: Complete Cost Governance Pipeline

                        COST GOVERNANCE PIPELINE

  AWS Resources         Collection/Analysis       Alerts/Action
  ─────────────         ───────────────────       ─────────────
  EC2, ECS,             Cost Explorer             SNS → Email/Slack
  Lambda,          ───► GetCostAndUsage     ────► Lambda Handler:
  RDS, etc.             (AmortizedCost)            • open Jira ticket
      │                      │                     • notify #finops
      │             ┌────────┴──────────┐
      │             │ Cost Anomaly Det. │──────► SNS → Lambda:
      │             │ ML baseline 90d   │         • parse rootCauses
      │             │ 3× checks/day     │         • tag offending resource
      │             │ up to 24h delay   │
      │             └───────────────────┘
      │
      │             Compute Optimizer               Report
      └──────────── (opt-in, 14d default)  ───────► S3 CSV:
                    EC2, ECS, Lambda, EBS,          rightsizing_YYYY-MM.csv
                    RDS, ASG                        • instance_id
                    Finding:                        • current_type
                      OVER_PROVISIONED              • recommended_type
                      UNDER_PROVISIONED             • monthly_savings
                      OPTIMIZED
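
The "SNS → Lambda" branch of the diagram could look roughly like the sketch below. It assumes the standard SNS-to-Lambda event envelope (`Records[].Sns.Message` carrying the anomaly JSON as a string). The `plan_actions` helper and the shape of the action dicts are hypothetical; the handler only returns the actions it would take instead of calling a ticketing or chat API. Tagging the offending resource is left out, since a root cause names a usage type rather than a specific instance ID.

```python
import json


def plan_actions(anomaly: dict) -> list:
    """Turn one anomaly payload into a list of planned actions (hypothetical format)."""
    actions = []
    for cause in anomaly.get("rootCauses") or []:
        actions.append({
            "action": "open_ticket",
            "account": cause.get("linkedAccount"),
            "summary": (
                f"Cost anomaly {anomaly['anomalyId']}: "
                f"{cause.get('service')} {cause.get('usageType')} "
                f"in {cause.get('region')}"
            ),
        })
    actions.append({
        "action": "notify",
        "channel": "#finops",
        "total_impact": anomaly["impact"]["totalImpact"],
    })
    return actions


def lambda_handler(event, context):
    """Unwrap each SNS record and collect the planned actions."""
    planned = []
    for record in event.get("Records", []):
        anomaly = json.loads(record["Sns"]["Message"])
        planned.extend(plan_actions(anomaly))
    return planned
```

In a real deployment each planned action would be handed to the ticketing or chat integration; returning them as plain data keeps the parsing logic testable without network access.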

10. Pitfalls

[FACT] Cost Allocation Tags are not retroactive: when you activate a tag, only future data is indexed. Costs prior to activation will not appear filtered by that tag.

[FACT] Cost Explorer data lags by up to 24h, so an anomaly may be detected up to 24h after the spend occurs. For real-time alerts, use CloudWatch Alarms instead of Cost Anomaly Detection.

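[FACT] Compute Optimizer requires opt-in: it produces no recommendations until explicitly enabled. After opt-in, the first recommendations can take up to 24 hours to appear. They are based on the CloudWatch metrics already available in the lookback period (14 days by default), so resources with too little metric history receive no recommendation.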

[FACT] EC2 memory without CloudWatch agent = CPU-only recommendations: Compute Optimizer cannot recommend based on memory without the CloudWatch agent or external metrics ingestion. Recommendations without memory may overestimate over-provisioning.

[FACT] BlendedCost ≠ UnblendedCost in Organizations: in member accounts, BlendedCost distributes RI/SP discounts from the management account proportionally, potentially appearing lower than the actual cost charged to the account. Use UnblendedCost for real per-account chargeback.
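
A small arithmetic sketch of why the two metrics diverge. It uses a deliberately simplified model (one usage type, one blended rate equal to total cost divided by total hours); the account names and dollar figures are made up, and the real blended-rate calculation is performed by AWS per usage type across the organization.

```python
def blended_costs(usage_hours: dict, unblended_costs: dict) -> dict:
    """Spread the organization's total cost across accounts by usage hours."""
    total_hours = sum(usage_hours.values())
    total_cost = sum(unblended_costs.values())
    blended_rate = total_cost / total_hours  # average $/hour across the organization
    return {acct: round(hours * blended_rate, 2) for acct, hours in usage_hours.items()}


# Account A is covered by a reservation (cheaper effective rate);
# account B pays the On-Demand rate for the same number of hours.
usage = {"A": 100, "B": 100}
unblended = {"A": 5.00, "B": 10.00}

blended = blended_costs(usage, unblended)
print(blended)
```

In this model both accounts show the same blended cost even though account B was actually charged twice as much as account A, which is why per-account chargeback is better served by UnblendedCost.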

[CONSENSUS] Customer Managed Monitor with a single threshold: when using a single customer-managed monitor for multiple projects/tags with very different cost volumes (e.g., $50 and $50,000/month), the same absolute threshold generates many false positives for the larger one or silence for the smaller one, because a fixed dollar amount is a tiny deviation for one and a multiple of the whole budget for the other. Prefer separate monitors for groups with similar cost levels.
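
A quick arithmetic check of how one absolute threshold behaves across two cost levels. The spend figures and the $100 threshold are hypothetical, and the helper functions are illustrative only; they do not model the ML baseline that decides whether something is an anomaly in the first place.

```python
def exceeds_absolute(expected: float, actual: float, threshold_usd: float) -> bool:
    """True when the dollar deviation reaches the alert threshold."""
    return (actual - expected) >= threshold_usd


def impact_pct(expected: float, actual: float) -> float:
    """Deviation as a percentage of expected spend."""
    return (actual - expected) / expected * 100


THRESHOLD_USD = 100.0  # hypothetical single threshold shared by both projects

small = {"expected": 50.0, "actual": 120.0}          # $50/month project, spend more than doubles
large = {"expected": 50_000.0, "actual": 50_400.0}   # $50,000/month project, minor drift

for name, p in (("small", small), ("large", large)):
    alert = exceeds_absolute(p["expected"], p["actual"], THRESHOLD_USD)
    pct = round(impact_pct(p["expected"], p["actual"]), 1)
    print(f"{name}: alert={alert}, deviation={pct}%")
```

The small project more than doubles its spend without reaching the threshold, while the large project crosses it with a deviation under 1%, which is the mismatch that separate monitors avoid.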

[FACT] Compute Optimizer does not consider SP/RI commitment: by default, it recommends instances without considering your existing Savings Plans or Reserved Instances. Use put-recommendation-preferences with preferredResources to restrict to covered families.


11. When to use each tool

┌─────────────────────────────────┬────────────────────────────────────┐
│ Question                        │ Tool                               │
├─────────────────────────────────┼────────────────────────────────────┤
│ How much did project X cost     │ Cost Explorer — GetCostAndUsage    │
│ this month?                     │ with tag filter + AmortizedCost    │
├─────────────────────────────────┼────────────────────────────────────┤
│ Did spend rise unexpectedly?    │ Cost Anomaly Detection             │
│ Which service caused it?        │ (ML detects + rootCauses)          │
├─────────────────────────────────┼────────────────────────────────────┤
│ Will I exceed the budget?       │ Cost Explorer — GetCostForecast    │
│                                 │ + Budgets with FORECASTED threshold│
├─────────────────────────────────┼────────────────────────────────────┤
│ Which instance is               │ Compute Optimizer — EC2/ECS recs   │
│ over-provisioned?               │ finding=OVER_PROVISIONED           │
├─────────────────────────────────┼────────────────────────────────────┤
│ Compare cost periods            │ Cost Explorer — GetCostComparisons │
├─────────────────────────────────┼────────────────────────────────────┤
│ Rightsizing at scale (500+      │ Compute Optimizer export → S3 →    │
│ instances)?                     │ Athena + QuickSight                │
└─────────────────────────────────┴────────────────────────────────────┘
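
The first row of the table can be sketched as a request for the Cost Explorer `GetCostAndUsage` API. The helper below only builds the keyword arguments; the function name and the `Project` tag key are assumptions for illustration, and the tag must already be activated as a cost allocation tag for the filter to match anything.

```python
from datetime import date


def build_project_cost_query(tag_key: str, tag_value: str, start: date, end: date) -> dict:
    """Build GetCostAndUsage arguments for one project's amortized cost by service."""
    return {
        # End is exclusive: use the first day of the following month.
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["AmortizedCost"],
        "Filter": {"Tags": {"Key": tag_key, "Values": [tag_value]}},
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }


query = build_project_cost_query("Project", "X", date(2024, 3, 1), date(2024, 4, 1))
print(query)

# With credentials configured, the query could be sent with boto3:
#   import boto3
#   response = boto3.client("ce").get_cost_and_usage(**query)
```

Separating the query construction from the API call makes the filter and metric choices easy to review and test without AWS access.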

Reflection Exercise

An engineering team wants to implement an automatic cost governance system that:

  1. Detects when any service from a specific project spends more than 150% of expected
  2. Identifies the root cause (account, region, usage type)
  3. Updates an instance tag with the status cost-review=pending
  4. Opens an automatic ticket in the company's ticketing system

Design the complete architecture, answering:

  1. Which Cost Anomaly Detection monitor type to use — AWS Managed or Customer Managed? Why?
  2. How to configure the threshold to detect exactly "150% of expected"? Which field to use: absolute or percentage?
  3. What is the complete data path: from the cost event occurring until the ticket is opened? What are the delays?
  4. Why not use Cost Explorer directly to detect anomalies in real time?
  5. What would Compute Optimizer add to this flow? At what point in the pipeline would it be most useful?

References