Amazon ECS Details

January 10, 2025

1. Cluster Structure and Capacity

Managed Capacity: Capacity providers allow you to dynamically manage the capacity of an ECS cluster. You can specify scaling strategies that automatically adjust the number of EC2 instances, balancing cost and availability.
Advanced Auto Scaling: Combine Auto Scaling groups with capacity providers to react to demand spikes. With managed scaling enabled, ECS maintains a target tracking policy on the CapacityProviderReservation metric and adds or removes instances to hold the target capacity you set. The cluster then scales with the tasks it needs to place, rather than relying on hand-tuned instance counts.
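As a rough sketch of the managed scaling setup described above, the helper below builds a dictionary shaped like the ECS CreateCapacityProvider request. The function name, the provider name, and the ARN in the usage line are hypothetical placeholders; the dictionary would still have to be passed to an AWS SDK call, which is not shown here.

```python
def capacity_provider_params(name, asg_arn, target_capacity=100):
    """Build arguments shaped like the ECS CreateCapacityProvider request."""
    # targetCapacity is a percentage; 100 means "keep instances fully used".
    if not 1 <= target_capacity <= 100:
        raise ValueError("target_capacity must be between 1 and 100")
    return {
        "name": name,
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": asg_arn,
            "managedScaling": {
                "status": "ENABLED",
                "targetCapacity": target_capacity,
            },
            # Left disabled in this sketch; enabling it has extra prerequisites.
            "managedTerminationProtection": "DISABLED",
        },
    }


# Hypothetical names, for illustration only.
params = capacity_provider_params("example-provider", "arn:aws:autoscaling:example", 80)
```

A target below 100 leaves spare headroom on the instances, which trades some cost for faster task placement during spikes.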

EC2:
Greater control over the OS and network layer, enabling specific optimizations such as kernel tuning, use of GPU-optimized instances, and direct server access for third-party security tools.
Greater responsibility for patching and maintaining AMIs.
Fargate:
Simplifies infrastructure management by not exposing the underlying OS.
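Removes the need to manage instance capacity, as each task is provisioned with its own isolated CPU and memory. You still scale the number of tasks, but not the hosts underneath them.
Well suited to workloads of variable size, since billing is per second (with a one-minute minimum) for the vCPU and memory each task requests.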

Amazon EFS:
Allows configuring shared volumes between different containers or services, keeping data persistent.
Useful for applications that need simultaneous access to the same volume (e.g., caching systems or file uploads).
Pay attention to file permissions and encryption at rest (EFS offers native encryption).
Ephemeral Storage:
On Fargate, each task gets 20 GiB of ephemeral storage by default, and you can configure up to 200 GiB. On EC2, tasks can use the instance's local disks.
Ideal for temporary caches and short-lived file manipulation, but remember that the data is lost once the task stops.
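The two storage options above map to separate fields in a task definition. The following is a minimal sketch, assuming a hypothetical volume name ("shared-data") and helper name; it only enforces the 200 GiB upper bound mentioned above and leaves any lower bound to the service.

```python
def storage_settings(file_system_id, ephemeral_gib=None):
    """Build the storage-related fields of an ECS task definition."""
    settings = {
        "volumes": [
            {
                "name": "shared-data",  # hypothetical volume name
                "efsVolumeConfiguration": {
                    "fileSystemId": file_system_id,
                    # Encrypt traffic between the task and EFS.
                    "transitEncryption": "ENABLED",
                },
            }
        ]
    }
    if ephemeral_gib is not None:
        if not 0 < ephemeral_gib <= 200:
            raise ValueError("ephemeral storage must be at most 200 GiB")
        # Applies to Fargate tasks; data here is lost when the task stops.
        settings["ephemeralStorage"] = {"sizeInGiB": ephemeral_gib}
    return settings


settings = storage_settings("fs-EXAMPLE", ephemeral_gib=50)
```

Each container that needs the shared volume would then reference "shared-data" in its own mount points.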

Bridge vs Host vs AWSVPC
Bridge Mode: Uses Docker's bridge network on the instance and maps container ports to host ports. Useful when several containers on the same instance listen on the same container port, since each can be mapped to a different (or dynamically assigned) host port.
Host Mode: The container shares the instance's network stack directly. Lower overhead, but two containers cannot bind the same port on one instance, so port conflicts must be avoided.
AWSVPC Mode: Each task gets its own elastic network interface and private IP address in the VPC. This gives greater isolation and traffic traceability, is required for Fargate, and is the recommended mode for microservices that need granular security.
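The practical difference between the three modes shows up in how port mappings are written. The sketch below uses a hypothetical helper to build one port mapping entry per mode: in bridge mode a host port of 0 asks for a dynamically assigned port, while in host and awsvpc modes the host port has to match the container port.

```python
def port_mapping(network_mode, container_port):
    """Build a container portMappings entry suited to the network mode."""
    if network_mode not in ("bridge", "host", "awsvpc"):
        raise ValueError(f"unsupported network mode: {network_mode}")
    mapping = {"containerPort": container_port, "protocol": "tcp"}
    if network_mode == "bridge":
        # 0 requests a dynamic host port, so identical containers can coexist.
        mapping["hostPort"] = 0
    else:
        # host and awsvpc: the host port must equal the container port.
        mapping["hostPort"] = container_port
    return mapping


bridge = port_mapping("bridge", 8080)
awsvpc = port_mapping("awsvpc", 8080)
```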

Secrets Manager and Parameter Store
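You can inject secrets into a container as environment variables, reading them directly from AWS Secrets Manager or AWS Systems Manager Parameter Store at task start.
In the task definition, list each secret under the container's "secrets" field, giving a "name" (the environment variable to set) and a "valueFrom" (the ARN of the secret or parameter). The task execution role must be allowed to read the referenced values.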
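In a task definition, secret injection is expressed through the container's "secrets" list, where each entry pairs an environment variable name with a "valueFrom" reference. A minimal sketch follows; the helper name, container name, image, and ARN are hypothetical placeholders.

```python
def container_secrets(env_to_arn):
    """Turn {ENV_NAME: secret-or-parameter ARN} into an ECS 'secrets' list."""
    return [
        {"name": env_name, "valueFrom": arn}
        for env_name, arn in sorted(env_to_arn.items())
    ]


# Hypothetical container definition fragment.
container = {
    "name": "api",
    "image": "example/api:1.0.0",
    "secrets": container_secrets(
        {"DB_PASSWORD": "arn:aws:secretsmanager:region:account:secret:example"}
    ),
}
```

Keeping the values out of the plain "environment" list means they never appear in the task definition itself, only the references do.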

Least Privilege: Always create a specific IAM role for each task type, defining only the necessary actions.
Isolation Between Containers: Instead of a single role for the entire cluster, assign different roles to each service (or task) to prevent unauthorized access to unrelated resources.
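To make the least-privilege idea concrete, the sketch below builds two policy documents for a single task type: a trust policy that lets ECS tasks assume the role, and a permissions policy limited to reading one bucket. The bucket name and function names are hypothetical; a real service would list only the actions it actually needs.

```python
import json


def task_role_trust_policy():
    """Trust policy allowing ECS tasks to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def read_only_bucket_policy(bucket_name):
    """Permissions policy scoped to reading objects from a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


policy_json = json.dumps(read_only_bucket_policy("example-uploads"), indent=2)
```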

Network ACLs and Security Groups:
For tasks using awsvpc, each task can have its own security group. This allows highly refined control of ingress and egress traffic.
Consider grouping back-end services in private subnets, using a NAT Gateway for internet access, leaving only public-facing front-ends in public subnets.
AWS Verified Access (ZTA):
If your architecture follows a zero-trust model, consider placing AWS Verified Access in front of ECS services. It provides VPN-less access and evaluates identity and device context on each request; whether it fits depends on your governance model.
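For the awsvpc placement described in this section (back-end tasks in private subnets, each with its own security group), a service's network configuration might look like the sketch below. The helper name and the subnet and security group IDs are hypothetical.

```python
def private_network_config(subnet_ids, security_group_ids):
    """Build an ECS service networkConfiguration for private subnets."""
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnet_ids),
            "securityGroups": list(security_group_ids),
            # No public IP: outbound traffic is expected to use a NAT Gateway.
            "assignPublicIp": "DISABLED",
        }
    }


config = private_network_config(
    ["subnet-aaa", "subnet-bbb"],  # hypothetical private subnets
    ["sg-backend"],                # hypothetical per-service security group
)
```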

Image Lifecycle
Automate image vulnerability scanning via Amazon ECR image scanning or third-party tools (Prisma Cloud, formerly Twistlock; Aqua Security; etc.).
Create a CI/CD pipeline that fails the build when high- or critical-severity findings are detected.
AMI Updates (EC2)
If using EC2 instances, create a regular patching and AMI update process (e.g., via EC2 Image Builder) to keep the OS up to date and reduce security vulnerabilities.
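The pipeline gate for image scans can be a very small check. The sketch below assumes the scan result has already been fetched and that its severity counts are shaped like the "findingSeverityCounts" map that ECR returns with image scan findings; the function name and the choice of blocking levels are illustrative.

```python
BLOCKING_LEVELS = ("CRITICAL", "HIGH")


def should_fail_build(severity_counts, blocking=BLOCKING_LEVELS):
    """Return True when any blocking severity has at least one finding."""
    return any(severity_counts.get(level, 0) > 0 for level in blocking)


# Example counts, as a scan might report them.
clean = {"MEDIUM": 3, "LOW": 12}
risky = {"HIGH": 1, "LOW": 4}
```

In a pipeline step, a True result would typically translate into a non-zero exit code so the build stops before the image is deployed.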

Amazon GuardDuty
Analyzes network activity (VPC Flow Logs, DNS queries) and API calls (CloudTrail events) in your AWS account to identify potentially malicious activity.
Enable the protections relevant to containers, such as Runtime Monitoring for ECS tasks, so that suspicious container behavior (e.g., cryptocurrency mining or port scanning) is detected.
AWS Security Hub
Centralizes security findings, including integrations with GuardDuty, AWS Config, and third-party solutions.
Enables automated responses via Amazon EventBridge for rapid incident remediation.
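An automated response usually starts with an EventBridge rule that matches the findings of interest. The sketch below builds an event pattern for imported Security Hub findings filtered by severity label; the function name and the default labels are assumptions, and the rule target (for example a remediation function) is out of scope here.

```python
import json


def high_severity_finding_pattern(labels=("HIGH", "CRITICAL")):
    """EventBridge event pattern matching Security Hub findings by severity."""
    return {
        "source": ["aws.securityhub"],
        "detail-type": ["Security Hub Findings - Imported"],
        "detail": {"findings": {"Severity": {"Label": list(labels)}}},
    }


# EventBridge expects the pattern as a JSON string.
pattern_json = json.dumps(high_severity_finding_pattern())
```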

CloudWatch Logs
Each container can send application logs directly to CloudWatch.
Configure log drivers (e.g., awslogs, fluentd, splunk) to forward log data to the desired destination.
Container Insights
Provides detailed metrics on CPU, memory, disk usage, network, as well as insights into the performance of each ECS task or service.
Helps identify bottlenecks and abnormal behaviors.
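For the CloudWatch Logs case, the log driver is set per container in the task definition. A minimal sketch of the awslogs settings follows; the helper name, log group, region, and prefix are hypothetical values.

```python
def awslogs_configuration(group, region, stream_prefix):
    """Build a container logConfiguration that ships logs to CloudWatch Logs."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": group,
            "awslogs-region": region,
            # The prefix makes it easier to tell services apart in one group.
            "awslogs-stream-prefix": stream_prefix,
        },
    }


log_config = awslogs_configuration("/ecs/example-api", "us-east-1", "api")
```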

AWS X-Ray
Traces requests as they pass through different microservices.
Collects trace data (latency, errors, and call paths) for end-to-end requests and builds a service map, making it easier to detect performance issues and map dependencies.

Circuit Breakers and Rolling Deployments
ECS supports a deployment circuit breaker that stops a rolling deployment when its tasks repeatedly fail to reach a steady state, and can optionally roll back to the last completed deployment.
Define CloudWatch alarms (e.g., on HTTP 5xx errors) that trigger automatic rollback before problems propagate in the production environment.
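Both safeguards are set in the service's deployment configuration. The following sketch builds that structure with the circuit breaker enabled and, optionally, a list of CloudWatch alarms that trigger rollback. The function name, alarm name, and the 100/200 percent values are illustrative choices, not recommendations from the source.

```python
def deployment_configuration(alarm_names=()):
    """Build an ECS service deploymentConfiguration with rollback enabled."""
    config = {
        "deploymentCircuitBreaker": {"enable": True, "rollback": True},
        # Keep full capacity during a rolling update, allow up to double.
        "minimumHealthyPercent": 100,
        "maximumPercent": 200,
    }
    if alarm_names:
        config["alarms"] = {
            "alarmNames": list(alarm_names),
            "enable": True,
            "rollback": True,
        }
    return config


config = deployment_configuration(["example-api-5xx-rate"])  # hypothetical alarm
```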

ECS services can be configured to run tasks across multiple Availability Zones (AZs), increasing resilience to the failure of a single zone.

ECR Cross-Region
If you need greater image resilience, enable replication so that images are available in different regions.
Useful for DR (Disaster Recovery) architectures, where applications need to be brought up quickly in another region.
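Replication is configured once per registry as a set of rules. The sketch below builds a replication configuration with one rule that copies images to each destination region; the function name, the account ID, and the regions in the usage line are hypothetical.

```python
def replication_configuration(registry_id, destination_regions):
    """Build an ECR replication configuration with a single rule."""
    if not destination_regions:
        raise ValueError("at least one destination region is required")
    return {
        "rules": [
            {
                "destinations": [
                    {"region": region, "registryId": registry_id}
                    for region in destination_regions
                ]
            }
        ]
    }


config = replication_configuration("111122223333", ["us-west-2", "eu-west-1"])
```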

Image Versioning
Always version your Docker images with unique, unambiguous tags (e.g., a commit SHA or build number rather than "latest") so that you can quickly roll back to a known image in case of incidents.
Configuration Snapshots
Use AWS CloudFormation or Terraform to manage infrastructure as code. This way, your ECS configuration (services, tasks, IAM, networks, etc.) can be reproduced in another environment in case of disaster.

ECS Exec
Allows you to access the shell inside a running container, similar to kubectl exec in Kubernetes.
Requires AWS Systems Manager (SSM) configuration, which manages this access securely and auditably: each ExecuteCommand call is recorded in CloudTrail (if enabled), and session output can be sent to CloudWatch Logs or S3.
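Part of the SSM setup is granting the task role permission to open the session channels. A minimal sketch of that policy statement follows; the function name is hypothetical, and the service or task must additionally be started with execute-command enabled.

```python
def ecs_exec_task_policy():
    """Permissions the task role needs for ECS Exec session channels."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ssmmessages:CreateControlChannel",
                    "ssmmessages:CreateDataChannel",
                    "ssmmessages:OpenControlChannel",
                    "ssmmessages:OpenDataChannel",
                ],
                "Resource": "*",
            }
        ],
    }


policy = ecs_exec_task_policy()
```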

AWS WAF
If you are publishing services via ALB (Application Load Balancer), you can integrate with AWS WAF for layer 7 protection, blocking SQL injection, XSS, and other application threats.
Shield and Firewall Manager
For critical workloads that require DDoS protection, enable AWS Shield Advanced and manage proactive blocking rules via AWS Firewall Manager.

AWS Organizations & Service Control Policies (SCPs)
SCPs restrict which actions or services can be used in specific accounts of the organization, reducing the risk of misuse.
Example: blocking the creation of ECS resources outside a specific region, or denying the account root user the ability to modify clusters.
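The region restriction from the example can be sketched as an SCP document that denies ECS actions whenever the requested region is not on an allow list. The function name, statement ID, and region below are hypothetical; a production policy would usually be narrower than "ecs:*" and tested in a non-critical account first.

```python
import json


def ecs_region_restriction_scp(allowed_regions):
    """SCP denying ECS actions outside the allowed regions."""
    if not allowed_regions:
        raise ValueError("at least one allowed region is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyEcsOutsideAllowedRegions",  # hypothetical ID
                "Effect": "Deny",
                "Action": "ecs:*",
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {
                        "aws:RequestedRegion": list(allowed_regions)
                    }
                },
            }
        ],
    }


scp_json = json.dumps(ecs_region_restriction_scp(["us-east-1"]), indent=2)
```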