I recently set up a pod PIDs limit for my EKS clusters, and I was surprised that I couldn't find good examples or instructions online on how to do this (it's been enabled since k8s@1.20).
In this post, I'll walk through how to set this up for an AWS EKS cluster using the official EKS terraform module. Other kubelet configurations or non-EKS k8s clusters require very similar steps, so this should be useful beyond just podPidsLimit on EKS.
Why Pod PIDs Limit is necessary
Similar to CPU and memory, PIDs (process IDs) are a fundamental resource on kubernetes nodes.
By default, kubernetes does not limit how many PIDs a pod can consume. When a mistake happens (e.g. an application thread leak; on Linux every thread consumes a PID), a pod can exhaust the PIDs on its node, whether that takes a few minutes or a few weeks. Like a "noisy neighbor" hogging CPU/memory, it will take down the critical daemonset pods (e.g. vpc-cni for pod networking) as well as your application pods.
If the root of the problem is a deployment with many replicas, it can cause a lot of damage across your entire kubernetes cluster.
On one hand, application owners should prevent this issue by paying close attention to changes in the architecture and thread model of their applications. On the other hand, a kubernetes cluster should not let workloads consume PIDs without a limit, just as LimitRange guards against unbounded CPU/memory use.
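To watch this kind of exhaustion build up on a node, one option is to read the pids cgroup controller files directly. The sketch below is only an illustration: `top_pid_cgroups` is a hypothetical helper name, and it assumes pod cgroups sit under a path containing `kubepods` and expose `pids.current` and `pids.max`, which can differ depending on your cgroup version and driver.

```shell
# List the cgroups under "kubepods" that hold the most PIDs right now.
# Output columns: <pids.current> <pids.max> <cgroup directory>
# $1 = cgroup root (assumed default: /sys/fs/cgroup), $2 = rows to show (default 10)
top_pid_cgroups() (
  root="${1:-/sys/fs/cgroup}"
  rows="${2:-10}"
  find "$root" -type f -name pids.current -path '*kubepods*' 2>/dev/null |
    while IFS= read -r current_file; do
      dir=$(dirname "$current_file")
      current=$(cat "$current_file")
      # pids.max holds a number, or the literal "max" when no limit is set
      max=$(cat "$dir/pids.max" 2>/dev/null || echo "?")
      printf '%s %s %s\n' "$current" "$max" "$dir"
    done | sort -rn | head -n "$rows"
)

# Usage on a node: top_pid_cgroups /sys/fs/cgroup 5
```

A cgroup whose first column keeps climbing toward the second is a good candidate for a thread leak.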
How to check your current pod PIDs limit
kubectl get --raw "/api/v1/nodes/<nodename>/proxy/configz" | jq '.kubeletconfig.podPidsLimit'
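If you have more than a handful of nodes, the same check can be looped over all of them. This is a rough sketch that assumes `kubectl` and `jq` are installed and pointed at the right cluster; `describe_limit` and `report_pod_pids_limits` are names I made up for illustration. A value of -1 is the kubelet default and means no per-pod limit.

```shell
# Turn the raw podPidsLimit value into something readable.
describe_limit() {
  case "$1" in
    -1) echo "unlimited" ;;
    ''|null) echo "unknown" ;;
    *) echo "$1 PIDs per pod" ;;
  esac
}

# Print "<node name><TAB><limit>" for every node in the current context.
report_pod_pids_limits() {
  kubectl get nodes -o name | while IFS= read -r node; do
    name="${node#node/}"   # "kubectl get nodes -o name" prints node/<name>
    # </dev/null keeps the inner kubectl from touching the loop's input
    limit=$(kubectl get --raw "/api/v1/nodes/${name}/proxy/configz" </dev/null |
      jq -r '.kubeletconfig.podPidsLimit')
    printf '%s\t%s\n' "$name" "$(describe_limit "$limit")"
  done
}

# Usage: report_pod_pids_limits
```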
k8s version
Per-pod PIDs limit is available since k8s@1.20. I've worked on alternatives for k8s versions below that, but they won't be the focus of this post.
I can cover them in a different post; leave a comment if you need this.
AMI
This post uses the official AL2-based EKS AMI.
OS/Node-level PIDs limit
On AL2 this was set to a relatively low value; it was updated in the EKS AMI project in late September 2023.
If your issue comes from the node-level PIDs limit being too low, consider upgrading your EKS AMI or adding a step to node provisioning. Details on the node PIDs limit won't be the focus of this post.
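For a quick look at what a node currently allows, the kernel exposes its limits under procfs. A minimal sketch, assuming the node-level limit in question is the kernel's `pid_max` (with `threads-max` shown alongside for context); `node_pid_limits` is a hypothetical helper name, and the optional argument only exists so the function can be pointed at a different root.

```shell
# Print the kernel-wide PID and thread ceilings for this node.
# $1 = procfs root (default: /proc)
node_pid_limits() (
  proc_root="${1:-/proc}"
  for name in pid_max threads-max; do
    file="$proc_root/sys/kernel/$name"
    if [ -r "$file" ]; then
      printf '%s=%s\n' "$name" "$(cat "$file")"
    else
      printf '%s=unavailable\n' "$name"
    fi
  done
)

# Usage on a node: node_pid_limits
```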
Options
Kubernetes allows you to limit the number of processes running in a Pod. You specify this limit at the node level, rather than configuring it as a resource limit for a particular Pod. Each Node can have a different PID limit.
(from: reference)
There are 2 options to implement this:
1. --pod-max-pids argument for kubelet
2. PodPidsLimit in the kubelet configuration file
According to k8s, option 1 is being deprecated in favor of option 2. If your k8s cluster doesn't need to be upgraded for the next few years, it's fine to use option 1. I'll briefly cover option 1 and focus on option 2.
Option 1: --pod-max-pids
Using AWS EKS terraform module:
bootstrap_extra_args = "--kubelet-extra-args '--pod-max-pids <your_limit>'"
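Once a node comes up, it's worth confirming the flag actually reached the kubelet process. Here is a small sketch for that; `flag_value` is a hypothetical helper that pulls a flag's value out of a command line, and it accepts both the `--flag value` and `--flag=value` forms.

```shell
# Print the value of a flag found in a command line string.
# $1 = flag name, $2 = full command line; returns non-zero if the flag is absent.
flag_value() (
  flag="$1"
  cmdline="$2"
  prev=""
  set -f   # no glob expansion while we split the command line into words
  for word in $cmdline; do
    case "$word" in
      "$flag"=*) echo "${word#*=}"; exit 0 ;;
    esac
    if [ "$prev" = "$flag" ]; then
      echo "$word"
      exit 0
    fi
    prev="$word"
  done
  exit 1
)

# Usage on a node (ps from procps):
#   flag_value --pod-max-pids "$(ps -o args= -C kubelet)"
```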
Option 2: PodPidsLimit
The implementation options depend on the version of k8s.
Before k8s@1.28
You need to inject podPidsLimit into the kubelet configuration. The default configuration is located at: /etc/kubernetes/kubelet/kubelet-config.json
You can inject podPidsLimit (with jq) into this configuration file in your pre_bootstrap_user_data:
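pre_bootstrap_user_data = <<-EOT
# jq reads the file before the shell truncates it for the redirect, so writing back to the same path is safe here
echo "$(jq '.podPidsLimit = <your_limit>' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
EOT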
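If you'd rather not overwrite the config in a single line, a slightly more defensive variant writes to a temp file first and only replaces the original when jq succeeds. This is a sketch under my own assumptions: `set_pod_pids_limit` is a hypothetical helper name, it uses the lowercase `podPidsLimit` key (matching the configz output from the check earlier), and it needs jq 1.5 or newer for `--argjson`.

```shell
# Set podPidsLimit in a kubelet JSON config without risking a half-written file.
# $1 = path to the config file, $2 = numeric limit
set_pod_pids_limit() (
  config="$1"
  limit="$2"
  tmp=$(mktemp) || exit 1
  if jq --argjson limit "$limit" '.podPidsLimit = $limit' "$config" > "$tmp"; then
    # Overwrite in place so the original file keeps its owner and mode
    cat "$tmp" > "$config"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    exit 1
  fi
)

# Usage: set_pod_pids_limit /etc/kubernetes/kubelet/kubelet-config.json 4096
```

If the existing file is not valid JSON, or the limit is not a number, the original file is left untouched and the function returns non-zero.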
Alternatively, you can create your own kubelet configuration file (make sure it's compatible with EKS) that includes podPidsLimit, and point kubelet to that configuration:
pre_bootstrap_user_data = <<-EOT
# create your own kubelet config
EOT
bootstrap_extra_args = "--kubelet-extra-args '--config <path_to_your_kubelet_config>'"
k8s@1.28 or newer
kubelet started accepting the --config-dir argument (referred to as drop-in configuration) in k8s@1.28 (with some caveats): this allows you to specify kubelet configuration overrides in a similar way to other linux system services.
You can only set --config-dir if you set the environment variable KUBELET_CONFIG_DROPIN_DIR_ALPHA for the kubelet process (the value of that variable does not matter)
You need a systemd drop-in .conf file so that the kubelet service loads the KUBELET_CONFIG_DROPIN_DIR_ALPHA environment variable. kubelet-envvar.conf:
[Service]
Environment="KUBELET_CONFIG_DROPIN_DIR_ALPHA=true"
You also need a kubelet configuration file (as an override to the default one). kubelet-config-override.yaml:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podPidsLimit: <your_limit>
cgroupDriver: "systemd"
hairpinMode: "hairpin-veth"
serializeImagePulls: false
Note that a few other settings are specified in addition to podPidsLimit: this is because the kubelet configuration v1beta1 comes with default values (e.g. the cgroupDriver default), and those defaults do NOT work on EKS.
You can check the kubelet config of a vanilla EKS AMI node to see which values are specified (and necessary) for EKS. Alternatively, you can read bootstrap.sh to see which ones are necessary.
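One way to do that comparison is to print, from the vanilla config, just the keys you plan to put in your override. A minimal sketch, assuming jq is available on the node; `vanilla_values` is a hypothetical helper name, and values are printed as JSON (so strings keep their quotes and a missing key shows up as null).

```shell
# Print "<key>: <value>" for selected top-level keys of a kubelet JSON config.
# $1 = path to the config file, remaining arguments = keys to look up
vanilla_values() (
  config="$1"
  shift
  for key in "$@"; do
    printf '%s: %s\n' "$key" "$(jq -c --arg k "$key" '.[$k]' "$config")"
  done
)

# Usage on a vanilla node:
#   vanilla_values /etc/kubernetes/kubelet/kubelet-config.json cgroupDriver hairpinMode serializeImagePulls
```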
Once these 2 files are created, use pre_bootstrap_user_data to put them in the right place and bootstrap_extra_args to tell kubelet to use --config-dir:
pre_bootstrap_user_data = <<-EOT
# systemd drop-in that sets KUBELET_CONFIG_DROPIN_DIR_ALPHA for the kubelet service
cat << EOF > /etc/systemd/system/kubelet.service.d/kubelet-envvar.conf
${file("${path.module}/kubelet-envvar.conf")}
EOF
# kubelet drop-in configuration; the file is YAML but is saved with a .conf name
mkdir -p /etc/kubernetes/kubelet/drop-in-config
cat << EOF > /etc/kubernetes/kubelet/drop-in-config/10-override.conf
${file("${path.module}/kubelet-config-override.yaml")}
EOF
EOT
bootstrap_extra_args = "--kubelet-extra-args '--config-dir=/etc/kubernetes/kubelet/drop-in-config/'"
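A small sanity check for the drop-in directory can save some head-scratching. As far as I know, kubelet only picks up files ending in .conf from that directory and applies them in sorted filename order, so the sketch below lists what would be loaded and what would be ignored. `dropin_order` is a hypothetical helper name, and it only looks at the top level of the directory.

```shell
# Show which files in a kubelet drop-in directory would be loaded, in order.
# $1 = drop-in directory (default: the path used in this post)
dropin_order() (
  dir="${1:-/etc/kubernetes/kubelet/drop-in-config}"
  LC_ALL=C
  export LC_ALL   # plain byte-order sorting for the glob below
  for f in "$dir"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and the unmatched glob
    case "$f" in
      *.conf) echo "load: ${f##*/}" ;;
      *) echo "skip: ${f##*/} (no .conf suffix)" ;;
    esac
  done
)

# Usage on a node: dropin_order
```

After that, rerunning the configz check from earlier in the post should show your new podPidsLimit.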
Conclusion
Echoing what I mentioned in the beginning, the steps here can be easily adjusted for kubelet configuration other than podPidsLimit, or for non-EKS k8s clusters.
With multiple options available, pick the right one based on your use case. The drop-in configuration option requires the most work and may not be the best solution for your project.