My EKS clusters were created with the terraform EKS module at v19, and I recently upgraded to version 20, which introduced a breaking change in how cluster auth is set up. I didn't find good documentation on how this upgrade process should go, so I decided to write down the things I learned along the way. Hopefully it will benefit others who need to perform the same upgrade.
aws-auth
The `aws-auth` `ConfigMap` has historically been the only option for EKS auth. The cluster creator is granted cluster admin permissions without having to be added to this `ConfigMap` explicitly.
From the cluster provisioning perspective, I'm not a fan of this approach. Almost everything else is an AWS resource created with the `aws` provider; this `ConfigMap` is the only thing (correct me if I missed anything) that's inconsistent when standing up a vanilla EKS cluster with `terraform`.
The cluster creator's access is not reflected anywhere explicitly, which is another thing I don't like.
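For context, a v19 setup typically carried something like this inside the `module "eks"` block (the role ARN and names below are placeholders):

```hcl
# Sketch of the pre-v20 aws-auth inputs on the EKS module
manage_aws_auth_configmap = true

aws_auth_roles = [
  {
    rolearn  = "arn:aws:iam::111122223333:role/platform-admins" # placeholder ARN
    username = "platform-admins"
    groups   = ["system:masters"] # k8s group granting cluster admin
  }
]
```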
EKS access entry
EKS access entry is the replacement for `aws-auth`, addressing exactly the pain points I mentioned above.
EKS access entries offload RBAC from the `k8s` domain to the `aws` domain. For each entry, you can either use a managed EKS access policy or continue binding to `k8s` `group`s.
The AWS documentation mentions that the cluster creator gets its own implicit access entry, which makes the creator's access consistent with how the rest of the authentication is set up.
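To make the two flavors concrete, here is a minimal sketch of v20 `access_entries` on the module; the map keys, ARNs, and namespaces are placeholders:

```hcl
# Inside the module "eks" block (v20+); all values below are hypothetical
access_entries = {
  # Flavor 1: bind the principal to a managed EKS access policy
  viewer = {
    principal_arn = "arn:aws:iam::111122223333:role/read-only"
    policy_associations = {
      view = {
        policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy"
        access_scope = {
          type       = "namespace"
          namespaces = ["default"]
        }
      }
    }
  }

  # Flavor 2: keep binding to k8s groups, as aws-auth did
  legacy = {
    principal_arn     = "arn:aws:iam::111122223333:role/devs"
    kubernetes_groups = ["dev-team"] # RBAC bindings for this group live in k8s
  }
}
```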
EKS access policy
There are 4 managed policies to choose from as of this writing (roughly cluster admin, admin, edit, and view), which is very coarse. However, I believe this will get more granular in the future, and the best part (which could be a double-edged sword) is that it's managed.
k8s RBAC
Used for customizing RBAC on top of the access entry policies (e.g. for custom resources). See my other post on when & how to use k8s groups in EKS access entries.
Upgrade
First, bump up the `aws` provider version to the latest stable release. I suggest applying this as a stand-alone change and validating it before moving forward.
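The bump itself is a one-liner in `required_providers`. I believe v20 of the module requires roughly aws provider 5.34 or newer, but check the module's own version constraint:

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Assumed floor; verify against the EKS module's required providers
      version = ">= 5.34"
    }
  }
}
```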
Second, address the `terraform` complaints, which mainly come from 3 things (see the sketch after this list):
- `aws-auth`-related variables should be removed: `manage_aws_auth_configmap`, `aws_auth_roles`, etc.
- `authentication_mode`
- `access_entries`
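Putting those 3 things together, the change to the module block looks roughly like this; a sketch only, with the usual cluster inputs omitted:

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  # 1. aws-auth related inputs no longer exist in v20 and must be deleted:
  # manage_aws_auth_configmap = true
  # aws_auth_roles            = [...]

  # 2. New in v20; see the two-step CONFIG_MAP -> API transition below.
  authentication_mode = "API_AND_CONFIG_MAP"

  # 3. New in v20; replaces the aws-auth entries (drafted in a later step).
  access_entries = {}

  # ...other inputs (cluster_name, VPC config, etc.) unchanged and omitted here.
}
```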
For EKS clusters created with the EKS module before version 20, `authentication_mode` defaults to `CONFIG_MAP`. I suggest taking this opportunity to upgrade to `API` if you don't want a hard time upgrading to version 21 later.
To upgrade from `CONFIG_MAP` to `API`, you need to run 2 `terraform apply`s, since AWS does not allow the direct transition: change `CONFIG_MAP` to `API_AND_CONFIG_MAP` and `terraform apply`, then change `API_AND_CONFIG_MAP` to `API` and `terraform apply` again.
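In config terms, the two applies look like this:

```hcl
# Apply #1: AWS rejects a direct CONFIG_MAP -> API change, so go through
# the transitional mode first.
authentication_mode = "API_AND_CONFIG_MAP"

# Apply #2: once the first apply succeeds, switch to API-only and apply again.
# authentication_mode = "API"
```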
After this change, AWS converts your existing `aws-auth` data into access entries. These entries will NOT be in your `tfstate`, and you will run into multiple "resource already exists" errors if you try to apply directly.
Note that the cluster creator role will have a matching EKS access entry if `enable_cluster_creator_admin_permissions` is set to `true`. If you create another entry for the same principal under the `access_entries` block, you'll get an error during `terraform apply`.
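A sketch of the relevant input:

```hcl
# The module manages the creator's access entry when this is true; don't
# also list the creator principal under access_entries, or the duplicate
# entry will fail to create.
enable_cluster_creator_admin_permissions = true
```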
It's worth taking the time to run `terraform state list` after creating all the access entries (the creator's plus the additional ones under `access_entries`). When upgrading an existing cluster, I noticed duplicated `tfstate` resources: 2 resources (both `aws_eks_access_entry`, at different addresses) in `tfstate` shared the same content. If the same thing happens to you, use `terraform state rm` to deduplicate.
Draft your entries based on how you configured `aws-auth` previously, run `terraform plan` to get the exact addresses of the `aws_eks_access_entry` resources to be created, use an `import` block (alternatively, `terraform import`) to import these converted access entries into your `tfstate`, and adjust the values (e.g. if you want to use a managed policy instead of a `k8s` `group`).
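A sketch of the `import` block (Terraform 1.5+); the address below is a placeholder you'd copy from the plan output, and the import ID for `aws_eks_access_entry` is `<cluster name>:<principal ARN>`:

```hcl
import {
  # Address of the to-be-created entry, copied from `terraform plan`
  to = module.eks.aws_eks_access_entry.this["platform-admins"]
  # Import ID: cluster name and principal ARN, colon-separated (placeholders)
  id = "my-cluster:arn:aws:iam::111122223333:role/platform-admins"
}
```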
Now you should be able to run your `terraform` without further issues after the upgrade.
Post upgrade
2 important things to call out:
- The upgrade process will empty the `aws-auth` `ConfigMap`, but it will NOT delete the object. Run `kubectl delete configmap/aws-auth -n kube-system` to avoid issues. I believe this is a bug on AWS's side as of this writing: if `aws-auth` exists in an `API` access-mode EKS cluster, `aws-auth` still takes precedence on some occasions (I ran into an "unauthorized" error from `terraform`, which authenticates using the `get-token` API).
- The submodules inside the EKS module, e.g. `karpenter` (ref) or `eks-managed-node-group` (ref), should be upgraded to the matching version if you're utilizing them; see the sketch after this list.
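For example, if you're using the `karpenter` submodule, keep its version in lockstep with the root module (a sketch; other inputs omitted):

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"
  # ...
}

# Submodules ship from the same repo, so pin them to the same version
module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.0"
  # ...
}
```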