Authentication Failures

What this means

Your ECS containers are logging authentication or authorization failures when trying to access AWS services or external APIs. In the AWS ecosystem, this almost always involves the task's IAM role. Every ECS task can be assigned a task role (for application-level AWS API access) and a task execution role (for ECS infrastructure operations like pulling images and writing logs). When either role lacks the required permissions, or when credentials expire and cannot be refreshed, the application logs authentication errors.

The ECS credential chain works differently from EC2 instance profiles. On Fargate, the ECS agent serves temporary task role credentials from the container credentials endpoint at 169.254.170.2 and sets AWS_CONTAINER_CREDENTIALS_RELATIVE_URI in the container, so the AWS SDKs discover and use these credentials automatically. On the EC2 launch type, the same endpoint is available when using awsvpc network mode. With bridge or host networking, task role credentials are served only when the container agent has ECS_ENABLE_TASK_IAM_ROLE (bridge) or ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST (host) enabled; otherwise the SDK falls through to the container instance's IAM profile. This distinction trips up teams migrating from EC2 to Fargate: an IAM policy that worked on EC2 because it was attached to the instance profile stops working on Fargate, where the task role is a separate entity that needs its own policy.
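
The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular SDK is implemented: it relies only on the two documented container credential variables, AWS_CONTAINER_CREDENTIALS_RELATIVE_URI and AWS_CONTAINER_CREDENTIALS_FULL_URI, and the function name container_credentials_url is hypothetical. Real SDKs check other sources (environment keys, shared config files) before reaching this step.

```python
from typing import Mapping, Optional

# Link-local host that serves task role credentials inside an ECS task.
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def container_credentials_url(env: Mapping[str, str]) -> Optional[str]:
    """Return the URL an SDK would query for task role credentials, or None.

    Hypothetical helper: it only inspects the two container credential
    variables and ignores every other provider in the SDK chain.
    """
    relative = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative:
        return ECS_CREDENTIALS_HOST + relative
    full = env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
    if full:
        return full
    # Neither variable is set: no task role credentials are exposed to this
    # container, so an SDK would fall through to later providers such as
    # the EC2 instance profile.
    return None
```

Calling container_credentials_url(os.environ) inside a container and getting None is a quick hint that the application is not using a task role at all.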

Auth failures can also come from non-AWS sources. If your container authenticates to an external database (RDS with IAM auth, or a third-party database with credentials from Secrets Manager), a failed secret retrieval or an expired IAM database token will surface as an authentication error in the application logs. Similarly, OAuth tokens for external APIs that are stored in AWS Secrets Manager or Parameter Store will cause auth failures if the task role cannot read from those services.

Credential rotation is another frequent trigger. Task role credentials are temporary and are rotated automatically, so the SDK must periodically fetch fresh ones from the credentials endpoint at 169.254.170.2. If the endpoint is unreachable (due to a misconfigured iptables rule on EC2, or a network issue on Fargate), the SDK keeps using its cached credentials until they expire, after which every AWS API call fails with an ExpiredTokenException. This pattern typically appears as a sudden burst of auth failures across all tasks in a service at around the same time.

Detection criteria

ELEVATED

Triggered when more than one authentication error is detected across log windows. The pattern matches "AccessDenied", "UnauthorizedAccess", "ExpiredTokenException", "InvalidIdentityToken", "403 Forbidden", "authentication failed", and similar auth-related error signatures.
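
To make the criteria concrete, here is a rough sketch of how such a check could work. It is an assumption about the mechanics, not the actual detector: the signatures come from the list above, while the regular expressions, the function names, and the choice to count one event per matching log line are illustrative.

```python
import re
from typing import Iterable, Optional

# Signatures from the detection criteria; the exact regexes are illustrative.
AUTH_SIGNATURES = [
    re.compile(r"AccessDenied"),
    re.compile(r"UnauthorizedAccess"),
    re.compile(r"ExpiredTokenException"),
    re.compile(r"InvalidIdentityToken"),
    re.compile(r"\b403 Forbidden\b"),
    re.compile(r"authentication failed", re.IGNORECASE),
]


def count_auth_failures(windows: Iterable[Iterable[str]]) -> int:
    """Count log lines that match at least one auth signature."""
    total = 0
    for lines in windows:
        for line in lines:
            if any(sig.search(line) for sig in AUTH_SIGNATURES):
                total += 1
    return total


def severity(windows: Iterable[Iterable[str]]) -> Optional[str]:
    """Return "ELEVATED" when more than one auth failure is found, else None."""
    return "ELEVATED" if count_auth_failures(windows) > 1 else None
```

Each window is simply a list of log lines, so a CloudWatch export split by time range can be passed in as a list of lists.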

Example finding

ELEVATED Authentication Failures

3 authentication failure events detected across 2 log windows. IAM task role lacks s3:GetObject permission for the reports bucket.

AccessDenied: User: arn:aws:sts::123456:assumed-role/ecs-task-role/task-id is not authorized to perform: s3:GetObject on resource: arn:aws:s3:::company-reports/2026/03/daily-summary.csv

How to fix

Identify which IAM role is failing and what action it tried. The AWS error message includes the assumed role ARN and the denied action. The role ARN tells you whether this is the task role (used by your application code) or the task execution role (used by ECS to pull images and push logs). Check the error: if it mentions ecsTaskExecutionRole or references ECR/CloudWatch operations, it is the execution role. Otherwise, it is the task role. You can also use the IAM policy simulator to test which actions a role is permitted to perform.
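
The triage above can be automated with a small parser. The sketch below assumes the error follows the wording shown in the example finding; the regular expression, the DeniedCall type, and the role heuristic (role name contains ecsTaskExecutionRole, or the action belongs to ecr or logs) are illustrative and will misclassify applications that call ECR or CloudWatch Logs themselves.

```python
import re
from typing import NamedTuple, Optional

# Matches the common "User: <assumed-role ARN> is not authorized to perform"
# wording. Other services phrase denials differently and will not match.
DENIED = re.compile(
    r"User: (?P<principal>arn:aws[\w-]*:sts::\d+:assumed-role/(?P<role>[^/\s]+)/\S+) "
    r"is not authorized to perform: (?P<action>\S+)"
    r"(?: on resource: (?P<resource>\S+))?"
)


class DeniedCall(NamedTuple):
    role_name: str
    action: str
    resource: Optional[str]
    likely_role: str  # "task role" or "task execution role" (heuristic)


def parse_access_denied(message: str) -> Optional[DeniedCall]:
    """Extract role, action, and resource from an AccessDenied message."""
    # Collapse line breaks so messages wrapped by the logger still match.
    match = DENIED.search(" ".join(message.split()))
    if match is None:
        return None
    role, action = match["role"], match["action"]
    service = action.split(":", 1)[0].lower()
    is_execution = "ecstaskexecutionrole" in role.lower() or service in {"ecr", "logs"}
    likely = "task execution role" if is_execution else "task role"
    return DeniedCall(role, action, match["resource"], likely)
```

For the example finding, this yields the role name ecs-task-role, the action s3:GetObject, and a classification of task role.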

Add the missing permissions to the correct IAM role. For the task role, add a policy statement granting the denied action. Use the principle of least privilege: grant access only to the specific resources mentioned in the error, not Resource: "*". For example, if the error says s3:GetObject on a specific bucket, grant s3:GetObject on arn:aws:s3:::that-bucket/*. IAM evaluates a role's policies on every request, so the change applies to running tasks as well as new ones, usually within seconds to a few minutes because IAM is eventually consistent. Tasks do not need to be restarted, although an application that cached the failure or exited on it may need to retry or be relaunched.
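
As a starting point, the statement can be generated from the denied action and resource. This is a minimal sketch under one assumption: for S3 object ARNs it widens the grant from the single object to every object in the bucket, as in the example above. The function names are hypothetical, and the output should be reviewed before it is attached to a role.

```python
import json
from typing import Dict, List

S3_PREFIX = "arn:aws:s3:::"


def least_privilege_statement(action: str, resource_arn: str) -> Dict[str, object]:
    """Build an Allow statement scoped to the resource named in the error."""
    resource = resource_arn
    if resource_arn.startswith(S3_PREFIX) and "/" in resource_arn:
        # Object ARN: grant on all objects in the bucket, not one key.
        bucket = resource_arn[len(S3_PREFIX):].split("/", 1)[0]
        resource = f"{S3_PREFIX}{bucket}/*"
    return {"Effect": "Allow", "Action": [action], "Resource": [resource]}


def policy_document(statements: List[Dict[str, object]]) -> str:
    """Wrap statements in an IAM policy document and return it as JSON."""
    return json.dumps({"Version": "2012-10-17", "Statement": statements}, indent=2)
```

If a narrower grant is enough, for example a single key prefix, replace the generated Resource with that prefix instead of the whole bucket.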

Check for SCP or permission boundary restrictions. Even if the task role has the correct policy, an AWS Organizations Service Control Policy (SCP) or an IAM permission boundary attached to the role can block the action. Neither one grants access: each caps the permissions a role can use, and an explicit deny in either overrides any allow in the role's own policy. SCPs apply to every principal in the member account. If you recently migrated the account into an Organization or updated SCPs, run aws organizations list-policies-for-target --target-id <account-id> --filter SERVICE_CONTROL_POLICY from the management account and review each SCP for deny statements that match the failed action.
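
The layering can be illustrated with a deliberately simplified model. The sketch below treats the role policy, permission boundary, and SCPs as separate layers and reports which ones block an action. It is not the real IAM evaluation engine: it ignores Resource and Condition elements, resource-based policies, and session policies, and the blocking_layers helper is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Mapping, Sequence

# Simplified statement shape: {"Effect": "Allow" | "Deny", "Action": [patterns]}
Statement = Mapping[str, object]


def _matches(statements: Sequence[Statement], effect: str, action: str) -> bool:
    """True if any statement with this effect has a pattern matching action."""
    for statement in statements:
        if statement["Effect"] != effect:
            continue
        # IAM action names are case-insensitive and support * and ? wildcards.
        if any(fnmatchcase(action.lower(), p.lower()) for p in statement["Action"]):
            return True
    return False


def blocking_layers(action: str, layers: Dict[str, Sequence[Statement]]) -> List[str]:
    """List the layers that prevent action, with the reason for each."""
    blocked = []
    for name, statements in layers.items():
        if _matches(statements, "Deny", action):
            blocked.append(f"{name}: explicit deny")
        elif not _matches(statements, "Allow", action):
            blocked.append(f"{name}: no allow")
    return blocked
```

An empty result means every layer allows the action in this simplified model; any entry names the layer to inspect first.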

Verify Secrets Manager and Parameter Store access. If your task definition references secrets (via the secrets block or valueFrom in environment variables), ECS fetches them at task start, so the task execution role must have secretsmanager:GetSecretValue for Secrets Manager or ssm:GetParameters for Parameter Store. If your application instead reads secrets at runtime through the AWS SDK, the task role needs those permissions. If the secrets are encrypted with a customer-managed KMS key, the role that reads them also needs kms:Decrypt on that key. Missing KMS permissions produce particularly confusing error messages that mention AccessDenied without specifying the KMS context.
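
A quick way to audit this is to walk the task definition and list what the execution role needs for each referenced secret. The sketch below assumes the task definition JSON shape returned by aws ecs describe-task-definition (containerDefinitions, secrets, valueFrom); the helper name is hypothetical. It cannot tell whether a secret uses a customer-managed KMS key, so kms:Decrypt has to be checked separately.

```python
from typing import Dict, Set


def execution_role_actions(task_definition: dict) -> Dict[str, Set[str]]:
    """Map each secret's valueFrom to the IAM actions needed to inject it."""
    needed: Dict[str, Set[str]] = {}
    for container in task_definition.get("containerDefinitions", []):
        for secret in container.get("secrets", []):
            source = secret["valueFrom"]
            if ":secretsmanager:" in source:
                needed[source] = {"secretsmanager:GetSecretValue"}
            else:
                # Anything else is treated as a Parameter Store reference:
                # either a full SSM parameter ARN or a bare parameter name.
                needed[source] = {"ssm:GetParameters"}
    return needed
```

Compare the result against the execution role's policy; any valueFrom whose action or ARN is not covered is a likely cause of a task that fails to start.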

Monitor for credential rotation issues. If auth failures appear suddenly across all tasks at the same time, the issue is likely credential refresh rather than missing permissions. Check whether the credentials endpoint is reachable from inside your container by running curl http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI via ECS Exec. The response contains live secret keys, so do not paste it into tickets or logs. If the request fails, investigate the container's network path to that link-local address: iptables rules on EC2, or proxy settings such as HTTP_PROXY without a NO_PROXY entry for 169.254.170.2. Also ensure you are using a recent version of the AWS SDK, as older versions have bugs in the credential refresh logic that can cause intermittent auth failures under high concurrency.
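
Once the endpoint responds, the remaining question is whether the credentials it returns are still valid. The sketch below parses the JSON body and reports the time left before expiry. It assumes the response carries an Expiration field with an ISO 8601 UTC timestamp; the function name and the sample values in the usage note are illustrative.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def seconds_until_expiry(credentials_json: str, now: Optional[datetime] = None) -> float:
    """Return seconds until the credentials expire (negative if already expired)."""
    payload = json.loads(credentials_json)
    # Assumed format: ISO 8601 in UTC with a trailing Z, e.g. 2026-03-01T12:00:00Z.
    # Replace Z so datetime.fromisoformat accepts it on older Python versions.
    expiration = datetime.fromisoformat(payload["Expiration"].replace("Z", "+00:00"))
    current = now if now is not None else datetime.now(timezone.utc)
    return (expiration - current).total_seconds()
```

A value that keeps shrinking toward zero without ever resetting suggests the endpoint is serving stale credentials or the SDK is not refreshing them.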

Related patterns

High Container Error Rate: Percentage of log windows containing container errors

Network Connectivity Failures: Connection errors to downstream services
