Amazon ECS Details
1. Cluster Structure and Capacity
1.1 Capacity Providers
- Managed Capacity: Capacity providers let you manage the capacity of an ECS cluster dynamically. You specify a scaling strategy, and the number of EC2 instances in the associated Auto Scaling group is adjusted automatically, balancing cost and availability.
- Advanced Auto Scaling: Combine Auto Scaling groups with capacity providers to react to demand spikes. You can define the metrics and targets that trigger the addition (or removal) of instances, so the cluster keeps enough capacity for ECS tasks to be placed without manual intervention.
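As a rough illustration of the scaling signal behind managed capacity, the sketch below computes a reservation percentage from the number of instances needed versus running and compares it with a target. The function names are hypothetical, and the zero-instance special cases follow AWS's published description of cluster auto scaling as best understood here, so treat them as assumptions.

```python
def capacity_provider_reservation(instances_needed: int, instances_running: int) -> float:
    """Return needed / running as a percentage (the reservation metric)."""
    if instances_running == 0:
        # Assumed special cases: an empty idle cluster sits at the target (100),
        # an empty cluster with pending work reports 200 to force a scale-out.
        return 100.0 if instances_needed == 0 else 200.0
    return instances_needed / instances_running * 100.0


def scaling_direction(reservation: float, target_capacity: float = 100.0) -> str:
    """Compare the reservation with the target capacity percentage."""
    if reservation > target_capacity:
        return "scale out"
    if reservation < target_capacity:
        return "scale in"
    return "hold"
```

For example, a cluster that needs 6 instances but runs 4 has a reservation of 150 and would scale out against a target of 100.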
1.2 EC2 vs Fargate: Advanced Selection Criteria
- EC2:
- Greater control over the OS and network layer, enabling specific optimizations such as kernel tuning, use of GPU-optimized instances, and direct server access for third-party security tools.
- Greater responsibility for patching and maintaining AMIs.
- Fargate:
- Simplifies infrastructure management by not exposing the underlying OS.
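- Removes the need to manage cluster capacity, as each task is provisioned with its own isolated CPU and memory. You still decide how many tasks a service runs.
- Well suited to workloads of variable size, which benefit from per-second billing (with a one-minute minimum) for the vCPU and memory each task requests.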
---
2. Task Definitions: Advanced Features
2.1 Volume Mounting and Storage
- Amazon EFS:
- Allows configuring shared volumes between different containers or services, keeping data persistent.
- Useful for applications that need simultaneous access to the same volume (e.g., caching systems or file uploads).
- Pay attention to file permissions and encryption at rest (EFS offers native encryption).
- Ephemeral Storage:
- On Fargate you can raise ephemeral storage above the 20 GiB default (up to 200 GiB); on EC2 you can use the instance's local disks.
- Ideal for temporary caches and short-lived file processing, but remember that the data does not persist once the task stops.
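The storage options above might look roughly like this in a task definition. This is a minimal sketch that only builds the relevant fragment as a Python dict; the volume name, container name, image, and mount path are hypothetical placeholders, and only the 200 GiB upper bound from the text is validated.

```python
import json

MAX_FARGATE_EPHEMERAL_GIB = 200  # upper bound stated in the text


def storage_fragment(file_system_id: str, ephemeral_gib: int) -> dict:
    """Build the storage-related part of a task definition (illustrative only)."""
    if not 0 < ephemeral_gib <= MAX_FARGATE_EPHEMERAL_GIB:
        raise ValueError("ephemeral storage must be between 1 and 200 GiB")
    return {
        "volumes": [{
            "name": "shared-data",  # hypothetical volume name
            "efsVolumeConfiguration": {
                "fileSystemId": file_system_id,
                "transitEncryption": "ENABLED",  # encrypt traffic to EFS
            },
        }],
        "ephemeralStorage": {"sizeInGiB": ephemeral_gib},
        "containerDefinitions": [{
            "name": "app",              # hypothetical container
            "image": "example/app:1.0",  # hypothetical image
            "mountPoints": [{
                "sourceVolume": "shared-data",
                "containerPath": "/mnt/shared",
                "readOnly": False,
            }],
        }],
    }


fragment = storage_fragment("fs-12345678", 50)
fragment_json = json.dumps(fragment, indent=2)
```

The resulting dict uses the same field names as the ECS task definition JSON, so it could be merged into a full definition before registration.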
2.2 Network Configurations
- Bridge vs Host vs AWSVPC
- Bridge Mode: Uses Docker's built-in virtual network and maps container ports to host ports. Useful when several containers on the same instance listen on the same container port, because dynamic host port mapping avoids collisions.
- Host Mode: The container shares the instance's network stack. Lower overhead, but it requires care to avoid port conflicts, since each port can be bound only once per instance.
- AWSVPC Mode: Each task gets its own elastic network interface and IP address in the VPC. Greater isolation and traffic traceability, recommended for microservices that require granular security (current best practice, and the only mode Fargate supports).
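One way to see the difference between the modes is in how a port mapping is written. The helper below is a hypothetical sketch, not an AWS API: it assumes that in bridge mode a host port of 0 requests a dynamically assigned port, while in host and awsvpc modes the host port matches the container port.

```python
def port_mapping(network_mode: str, container_port: int, dynamic: bool = False) -> dict:
    """Return a portMappings entry appropriate for the given network mode."""
    mapping = {"containerPort": container_port, "protocol": "tcp"}
    if network_mode == "bridge":
        # 0 asks Docker to pick a free host port, so identical containers can coexist.
        mapping["hostPort"] = 0 if dynamic else container_port
    elif network_mode in ("host", "awsvpc"):
        # The container (host) or the task ENI (awsvpc) exposes the port directly.
        mapping["hostPort"] = container_port
    else:
        raise ValueError(f"unsupported network mode: {network_mode}")
    return mapping
```

With dynamic mapping in bridge mode, a load balancer target group is typically what discovers the assigned host port.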
2.3 Secrets and Configurations
- Secrets Manager and Parameter Store
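- You can inject secrets into the container as environment variables, which ECS resolves at task start directly from AWS Secrets Manager or AWS Systems Manager Parameter Store. If the application needs a secret as a file, it has to write or fetch it itself.
- In the container definition, use the "secrets" field: each entry pairs an environment variable "name" with a "valueFrom" holding the ARN of the secret or parameter. The task execution role needs permission to read those values.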
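A minimal sketch of how the secrets section of a container definition could be assembled is shown below. The helper name, the environment variable names, and the ARNs (including the 123456789012 account ID) are placeholders for illustration.

```python
def container_secrets(env_to_arn: dict) -> list:
    """Turn {ENV_NAME: ARN} into the list expected under a container's "secrets" key."""
    return [{"name": env, "valueFrom": arn} for env, arn in sorted(env_to_arn.items())]


# Hypothetical ARNs: one Secrets Manager secret, one Parameter Store parameter.
secrets = container_secrets({
    "DB_PASSWORD": "arn:aws:secretsmanager:us-east-1:123456789012:secret:app/db-password",
    "API_URL": "arn:aws:ssm:us-east-1:123456789012:parameter/app/api-url",
})

container_definition = {
    "name": "app",              # hypothetical container
    "image": "example/app:1.0",  # hypothetical image
    "secrets": secrets,
}
```

Because only the ARN appears in the task definition, the secret value itself never needs to be stored in the definition or in source control.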
---
3. Advanced Security
3.1 IAM Roles for Tasks
- Least Privilege: Always create a specific IAM role for each task type, defining only the necessary actions.
- Isolation Between Containers: Instead of a single role for the entire cluster, assign different roles to each service (or task) to prevent unauthorized access to unrelated resources.
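As an illustration of least privilege per task type, the sketch below builds two IAM policy documents: a trust policy that lets ECS tasks assume the role, and a permissions policy limited to reading objects from one bucket. The bucket name and helper name are hypothetical; a real task would get whatever narrow set of actions it actually needs.

```python
import json

# Trust policy: only the ECS tasks service principal may assume this role.
TASK_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ecs-tasks.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def read_only_bucket_policy(bucket: str) -> dict:
    """Permissions policy allowing object reads from a single bucket, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }


uploads_policy_json = json.dumps(read_only_bucket_policy("example-uploads"), indent=2)
```

Each service would then reference its own role ARN in the taskRoleArn field of its task definition, rather than sharing one role across the cluster.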
3.2 Network Control and Zero Trust
- Network ACLs and Security Groups:
- For tasks using awsvpc, each task can have its own security group. This allows highly refined control of ingress and egress traffic.
- Consider placing back-end services in private subnets with a NAT Gateway for outbound internet access, leaving only public-facing front ends in public subnets.
- AWS Verified Access (ZTA):
- If your architecture is designed with zero trust in mind, it may be beneficial to integrate ECS with VPN-less access solutions and context checks for each request, depending on the governance model.
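The per-task security group and private subnet guidance in this subsection could translate into a service network configuration along these lines. This is a sketch only; the helper name and the subnet and security group IDs are placeholders.

```python
def network_configuration(subnets, security_groups, public: bool = False) -> dict:
    """Build the awsvpc network configuration for an ECS service."""
    if not subnets:
        raise ValueError("at least one subnet is required")
    return {
        "awsvpcConfiguration": {
            "subnets": list(subnets),
            "securityGroups": list(security_groups),
            # Back-end tasks in private subnets should not receive a public IP.
            "assignPublicIp": "ENABLED" if public else "DISABLED",
        }
    }


backend_network = network_configuration(
    subnets=["subnet-private-a", "subnet-private-b"],  # hypothetical private subnets
    security_groups=["sg-backend"],                    # hypothetical task security group
)
```

Giving each service its own security group lets ingress rules reference the caller's group instead of broad CIDR ranges.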
3.3 Patching and Secure Images
- Image Lifecycle
- Automate image vulnerability scanning via Amazon ECR image scanning or third-party tools (Twistlock, Aqua Security, etc.).
- Create a CI/CD pipeline that fails the build when high- or critical-severity findings are detected.
- AMI Updates (EC2)
- If using EC2 instances, create a regular patching and AMI update process (e.g., via EC2 Image Builder) to keep the OS up to date and reduce security vulnerabilities.
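A pipeline gate for the image scanning step could be as simple as the sketch below. It assumes the severity counts have already been retrieved (for example from the findingSeverityCounts map in an ECR scan result); the function name and the choice of blocking levels are illustrative.

```python
BLOCKING_SEVERITIES = ("CRITICAL", "HIGH")  # assumed policy; adjust to your risk appetite


def should_fail_build(severity_counts: dict, blocking=BLOCKING_SEVERITIES) -> bool:
    """Return True when any blocking severity has at least one finding."""
    return any(severity_counts.get(level, 0) > 0 for level in blocking)


def gate_message(severity_counts: dict) -> str:
    """Human-readable summary for the pipeline log."""
    if should_fail_build(severity_counts):
        return "FAIL: blocking vulnerabilities found"
    return "PASS: no blocking vulnerabilities"
```

In a CI job, a failing result would typically be turned into a non-zero exit code so the pipeline stops before the image is deployed.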
3.4 IDS/IPS and GuardDuty Integration
- Amazon GuardDuty
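- Analyzes VPC Flow Logs, DNS query logs, and CloudTrail events within your AWS account, identifying potentially malicious activities.
- For container workloads, consider enabling GuardDuty Runtime Monitoring for ECS to detect suspicious behavior inside tasks (e.g., cryptocurrency mining or port scanning). Image vulnerability scanning in ECR is handled by ECR scanning and Amazon Inspector rather than by GuardDuty.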
- AWS Security Hub
- Centralizes security findings, including integrations with GuardDuty, AWS Config, and third-party solutions.
- Lets you build automated responses through Amazon EventBridge rules for rapid incident remediation.
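For the automated response path, an EventBridge rule needs an event pattern that selects the findings worth acting on. The sketch below builds one such pattern for high and critical Security Hub findings; the choice of severity labels is an assumption, and the target (for example a Lambda function or an SSM automation) is left out.

```python
import json

# Matches Security Hub findings imported with a HIGH or CRITICAL severity label.
HIGH_SEVERITY_FINDINGS_PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {
        "findings": {
            "Severity": {"Label": ["HIGH", "CRITICAL"]},
        }
    },
}

# EventBridge expects the pattern as a JSON string when the rule is created.
pattern_json = json.dumps(HIGH_SEVERITY_FINDINGS_PATTERN)
```

Keeping the pattern narrow avoids triggering remediation on informational findings.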
---
4. Observability and Monitoring
4.1 Logs and Metrics
- CloudWatch Logs
- Each container can send its application logs (stdout and stderr) to CloudWatch Logs through the awslogs log driver.
- Configure other log drivers (e.g., fluentd, splunk, or awsfirelens) to forward log data to a different destination.
- Container Insights
- Provides detailed metrics on CPU, memory, disk usage, network, as well as insights into the performance of each ECS task or service.
- Helps identify bottlenecks and abnormal behaviors.
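The CloudWatch Logs setup described above is configured per container. A minimal sketch of the logConfiguration block for the awslogs driver follows; the helper name, log group, region, and stream prefix are placeholders.

```python
def awslogs_configuration(group: str, region: str, stream_prefix: str) -> dict:
    """Build a container logConfiguration that ships stdout/stderr to CloudWatch Logs."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }


log_config = awslogs_configuration("/ecs/example-app", "us-east-1", "app")
```

The stream prefix makes it easier to tell which container and task produced a given log stream.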
4.2 Distributed Tracing and Telemetry
- AWS X-Ray
- Monitors distributed transactions across different microservices.
- Collects trace data (latency, errors, and annotations) for end-to-end requests, facilitating performance issue detection and dependency mapping.
4.3 Deployment Structure
- Circuit Breakers and Rolling Deployments
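- ECS supports a deployment circuit breaker that automatically stops a rolling deployment whose tasks repeatedly fail to reach a steady state, and can optionally roll back to the last completed deployment.
- Define CloudWatch alarms on key metrics (e.g., HTTP 5xx errors) so that a failing deployment triggers an automatic rollback before problems propagate in the production environment.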
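The circuit breaker and alarm-based rollback could be expressed in a service's deployment configuration roughly as follows. This is a sketch under assumptions: the alarm name is hypothetical, and the healthy and maximum percentages are common rolling-update values rather than recommendations from the text.

```python
def deployment_configuration(alarm_names) -> dict:
    """Build a rolling-deployment configuration with circuit breaker and alarm rollback."""
    alarm_names = list(alarm_names)
    if not alarm_names:
        raise ValueError("provide at least one CloudWatch alarm name")
    return {
        # Stop and roll back when new tasks keep failing to start.
        "deploymentCircuitBreaker": {"enable": True, "rollback": True},
        # Roll back when any listed alarm fires during the deployment.
        "alarms": {"alarmNames": alarm_names, "enable": True, "rollback": True},
        "minimumHealthyPercent": 100,  # assumed value
        "maximumPercent": 200,         # assumed value
    }


deploy_config = deployment_configuration(["example-alb-5xx-rate"])  # hypothetical alarm
```

The circuit breaker catches tasks that never become healthy, while the alarms catch tasks that start fine but misbehave under traffic.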
---
5. High Availability and Disaster Recovery Strategies
5.1 Multi-AZ
- ECS Services can be configured to run tasks across multiple availability zones (AZs), increasing resilience to infrastructure failures.
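For services on EC2 capacity, spreading across AZs is usually requested through a placement strategy; on Fargate, supplying subnets in several AZs has a similar effect. The sketch below shows one plausible strategy list, with the secondary spread by instance added as an assumption.

```python
# Spread tasks across availability zones first, then across instances within each zone.
MULTI_AZ_PLACEMENT_STRATEGY = [
    {"type": "spread", "field": "attribute:ecs.availability-zone"},
    {"type": "spread", "field": "instanceId"},  # assumed secondary strategy
]


def spreads_across_zones(strategy) -> bool:
    """Check that the first strategy entry spreads by availability zone."""
    return bool(strategy) and strategy[0] == {
        "type": "spread",
        "field": "attribute:ecs.availability-zone",
    }
```

Strategies are evaluated in order, so the AZ spread should come first.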
5.2 Cross-Region Replication
- ECR Cross-Region
- If you need greater image resilience, enable replication so that images are available in different regions.
- Useful for DR (Disaster Recovery) architectures, where applications need to be brought up quickly in another region.
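ECR replication is configured at the registry level with a list of destination regions. The sketch below builds such a replication configuration; the helper name, account ID, and regions are placeholders.

```python
def replication_configuration(registry_id: str, regions) -> dict:
    """Build an ECR replication configuration with one rule covering all destinations."""
    regions = list(regions)
    if not regions:
        raise ValueError("at least one destination region is required")
    return {
        "rules": [{
            "destinations": [
                {"region": region, "registryId": registry_id} for region in regions
            ]
        }]
    }


dr_replication = replication_configuration("123456789012", ["us-west-2", "eu-west-1"])
```

Once replication is active, newly pushed images are copied to the destination regions, so a DR deployment can pull from a local registry.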
5.3 Backup and Image Versions
- Image Versioning
- Always give your Docker images unique, immutable tags (for example, a build number or commit hash rather than latest) so you can roll back quickly in case of incidents.
- Configuration Snapshots
- Use AWS CloudFormation or Terraform to manage infrastructure as code. This way, your ECS configuration (services, tasks, IAM, networks, etc.) can be reproduced in another environment in case of disaster.
---
6. Specific Use Cases and Best Practices
6.1 ECS Exec: Secure Container Debugging
- ECS Exec
- Allows you to access the shell inside a running container, similar to kubectl exec in Kubernetes.
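- Requires AWS Systems Manager (SSM) Session Manager, which handles this access securely and auditably: the ExecuteCommand API call is recorded in CloudTrail (if enabled), and command output can be logged to CloudWatch Logs or Amazon S3 when configured.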
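Opening a shell with ECS Exec is typically done through the AWS CLI. The sketch below only assembles the command line; the cluster, task, and container names are placeholders, and it assumes the service was created with execute command enabled and that the Session Manager plugin is installed locally.

```python
import shlex


def execute_command_args(cluster: str, task: str, container: str, command: str = "/bin/sh") -> list:
    """Build the argument list for `aws ecs execute-command`."""
    return [
        "aws", "ecs", "execute-command",
        "--cluster", cluster,
        "--task", task,
        "--container", container,
        "--interactive",
        "--command", command,
    ]


args = execute_command_args("example-cluster", "example-task-id", "app")
shell_line = shlex.join(args)  # quoted form suitable for pasting into a terminal
```

Passing the list to a subprocess call, rather than the joined string, avoids shell quoting problems.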
6.2 Integration with Other Solutions
- AWS WAF
- If you are publishing services via ALB (Application Load Balancer), you can integrate with AWS WAF for layer 7 protection, blocking SQL injection, XSS, and other application threats.
- Shield and Firewall Manager
- For critical workloads that require DDoS protection, enable AWS Shield Advanced and manage proactive blocking rules via AWS Firewall Manager.
6.3 Governance Policies
- AWS Organizations & Service Control Policies (SCPs)
- SCPs restrict specific actions or services across member accounts to reduce the risk of misuse.
- Example: blocking the creation of ECS resources outside an approved region, or denying cluster changes made by the account root user.
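The region restriction example could be written as an SCP along the lines of the sketch below. The helper name, statement ID, and region are illustrative; a production policy would usually add exceptions for global services and specific roles.

```python
import json


def deny_ecs_outside_region(allowed_region: str) -> dict:
    """Build an SCP that denies every ECS action requested in any other region."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyEcsOutsideApprovedRegion",
            "Effect": "Deny",
            "Action": "ecs:*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": [allowed_region]}
            },
        }],
    }


scp_json = json.dumps(deny_ecs_outside_region("us-east-1"), indent=2)
```

Because SCPs only set the upper bound on permissions, accounts still need IAM policies that grant ECS access inside the approved region.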
---