Debug EKS Unauthenticated Error

Xing Du
Aug 22, 2023

I ran into an “Unauthorized” error last week while migrating my EKS Terraform provisioning project to Terraform Cloud.

The debugging process was interesting enough to write down, and I hope it helps whoever runs into the same problem in the future.

Context

Project Setup

A simplified version of my Terraform project, which should be a fairly common setup:

provider "aws" {
  region              = var.region
  allowed_account_ids = [var.aws_account_id]

  assume_role {
    role_arn = var.role_arn
  }
}

provider "tfe" {
  hostname = "app.terraform.io"
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.this.token
}

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_name
}

Migration

  • Before the migration, Terraform ran locally with my local AWS credentials.
  • The project contains AWS resources and k8s resources; a running EKS cluster had already been provisioned by Terraform.
  • The migration changes only what is needed to move where Terraform executes.
  • The existing tfstate (on an S3 backend) is migrated to Terraform Cloud automatically by the backend change plus terraform init.
  • On Terraform Cloud, the workspace is configured to use the agent execution mode: Terraform runs on an agent hosted on a dedicated EC2 instance.

Changes

More details on how to migrate to Terraform Cloud will be covered in a different post. For context, here are the high-level changes:

  • Point the backend to Terraform Cloud
  • IAM role change
  • Workspace name change
  • Connectivity: VPC peering connection changes
  • Security group changes

Problem

Symptom

After making the necessary changes (see above) to stop terraform from complaining, I arrived at the last error:

Error: Unauthorized
with kubernetes_some_resource.my_resource
on my_tf.tf line xx, in resource "kubernetes_some_resource" "my_resource":
resource "kubernetes_some_resource" "my_resource" {

Debugging

A couple of thoughts to narrow down the investigation:

  • This did not happen with the S3 backend and local Terraform, so it is caused by a difference in environment.
  • The project contains AWS and k8s resources, and only the k8s resources are affected.
  • A connectivity issue would have surfaced as a TCP dial timeout instead.

So the cause most likely lies in how the Terraform runtime authenticates to EKS, i.e. in this block:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.this.token
}

data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_name
}

Possible reasons include (but are not limited to):

  • the auth token has expired by the time Terraform gets to the k8s resources: a token is valid for 15 minutes, so this is possible (but unlikely) if the TFC run uses a stale token from tfstate instead of issuing a new one
  • the auth token is valid, but the identity behind it does not have permission to access the cluster

Validate token expiration

To verify or rule out the first guess, I:

  • commented out the aws_eks_cluster_auth data source
  • used a sensitive variable to pass the token to the k8s provider
  • issued the token via the AWS CLI (aws eks get-token --cluster-name <cluster_name> --role-arn <tfc_agent_role>) and passed it in through that variable
  • made sure the whole process finished within 15 minutes

i.e.:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  # Token issued manually with the AWS CLI and passed in as a variable
  token                  = var.eks_auth_token
}

variable "eks_auth_token" {
  type      = string
  sensitive = true
}

This run ended with the same error message, so an expired token is probably not the cause.
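Had expiry turned out to be the cause, a commonly used alternative is to let the kubernetes provider fetch a fresh token at run time through its exec block, instead of reading one from a data source. A sketch, assuming the AWS CLI is installed wherever Terraform runs (the TFC agent, in my case) and that it emits credentials in the v1beta1 format:

```hcl
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  # The provider runs this command whenever it needs credentials,
  # so the token is issued at apply time rather than read from state.
  exec {
    # Assumption: must match the ExecCredential version your AWS CLI prints.
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}
```

This only changes how the token is obtained. It does not change which IAM identity the token represents, so it would not help with a permission problem.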

Validate token permission

To verify, we can try hitting the cluster's k8s API with an auth token issued from the AWS CLI, without going through Terraform:

  • fetch cluster_endpoint from either terraform output or the AWS web console
  • fetch cluster_certificate_authority_data from either terraform output or the AWS web console, decode it and save it locally: echo -n '<cluster_certificate_authority_data>' | base64 --decode > /tmp/mycert
  • issue a token: aws eks get-token --cluster-name <cluster_name> --role-arn <tfc_agent_role> (the token is the status.token field of the JSON output)
  • call the API: curl -s --cacert /tmp/mycert --header "Authorization: Bearer <token>" --request GET 'https://<cluster_endpoint>/openapi/v2' (drop the https:// prefix if your <cluster_endpoint> value already includes it)
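If the project does not already expose the endpoint and CA data, a pair of outputs can serve the first two steps. A sketch, assuming the same module.eks attributes used in the provider block; the output names are my own choice:

```hcl
# Endpoint of the cluster's k8s API server
output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}

# Base64-encoded CA certificate; decode it before handing it to curl
output "cluster_certificate_authority_data" {
  value = module.eks.cluster_certificate_authority_data
}
```

After an apply, terraform output -raw cluster_endpoint prints the value without quotes, which is convenient for pasting into the curl command.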

This setup returned the same unauthorized error message. However, when I replaced tfc_agent_role with the role I used for local runs and repeated the steps above, I got a successful response.

This confirmed that the error is caused by the difference in the AWS IAM role being used, a part that is not explicitly covered in my Terraform source.

Root cause

After some research, I found this page explaining the issue. I provisioned my EKS cluster with an OIDC provider to drive RBAC with SSO and intentionally didn’t specify aws-auth related configurations.

The IAM role used to create the cluster automatically gets system:masters permission in k8s and does not need to be added to the aws-auth config map, while other IAM roles must be added to aws-auth explicitly to be authorized.

In this case, the role I used for local runs was the one that created the cluster. Therefore, despite never being listed in aws-auth, it had permission to provision the other k8s resources. When migrating to TFC, the TFC agent role needs to be added to aws-auth to prevent this error, i.e.:

# in EKS module
aws_auth_roles = [
  {
    rolearn  = local.tfc_role_arn
    username = "tfc-agent"
    groups   = ["system:masters"]
  },
  {
    rolearn  = local.local_role_arn
    username = "local-user"
    groups   = ["system:masters"]
  }
]

After adding the above block, the unauthorized error message went away and I successfully migrated this project to Terraform Cloud.
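For context, here is roughly where such a block sits. This is a sketch that assumes the community terraform-aws-modules/eks/aws module at version 19.x, where aws_auth_roles only takes effect when manage_aws_auth_configmap is enabled. The ARN and cluster name are hypothetical, and the remaining cluster settings are omitted:

```hcl
locals {
  # Hypothetical ARN, for illustration only
  tfc_role_arn = "arn:aws:iam::111122223333:role/tfc-agent"
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws" # assumption: community EKS module
  version = "~> 19.0"

  cluster_name = "my-cluster" # hypothetical
  # ... VPC, subnets, node groups and other settings omitted ...

  # Let the module manage the aws-auth config map so the roles below are applied.
  manage_aws_auth_configmap = true

  aws_auth_roles = [
    {
      rolearn  = local.tfc_role_arn
      username = "tfc-agent"
      groups   = ["system:masters"]
    },
  ]
}
```

Since system:masters grants full cluster access, a narrower group may be preferable once the migration is stable.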

Conclusion

If you find this helpful, give it a clap; it would mean the world to me. Please share this with whoever needs it, and I’d appreciate it if you want to buy me a coffee.
