Per-pod PIDs limit on EKS

Xing Du
Published in AWS Tip
Jan 1, 2024

I recently set up a per-pod PIDs limit on my EKS clusters and was surprised that I couldn't find good examples or instructions online on how to do this, even though the feature has been available since k8s@1.20.

In this post, I'll walk through how to set this up for an AWS EKS cluster using the official EKS Terraform module. Other kubelet configurations and non-EKS k8s clusters require very similar steps, so this should be useful beyond just podPidsLimit on EKS.

Why Pod PIDs Limit is necessary

Similar to CPU/memory, PIDs (Process IDs) are fundamental resources on kubernetes nodes.

By default, kubernetes does not limit how many PIDs a pod can consume. When mistakes happen (e.g. an application thread leak), a pod can exhaust the PIDs on its node, whether in a few minutes or over a few weeks. Like a "noisy neighbor" hogging CPU/memory, this takes down the critical daemonset pods on that node (e.g. vpc-cni for pod networking) as well as your application pods.

If the root cause is a deployment with many replicas, the damage can spread across your entire kubernetes cluster.

On one hand, application owners should prevent this by paying close attention to changes in their application's architecture and thread model. On the other hand, a kubernetes cluster should not let workloads consume PIDs without bound, just as LimitRange guards against unbounded CPU/memory usage.

How to check your current pod PIDs limit

kubectl get --raw "/api/v1/nodes/<nodename>/proxy/configz" | jq '.kubeletconfig.podPidsLimit'
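To check every node at once instead of one at a time, the command above can be wrapped in a loop. This is only a minimal sketch: it assumes kubectl and jq are installed and that your credentials are allowed to use the nodes/proxy API, and the function name print_pod_pids_limits is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print "<node> <podPidsLimit>" for every node in the cluster.
# Assumes kubectl and jq are on PATH; the function name is illustrative only.
print_pod_pids_limits() {
  local node limit
  # `kubectl get nodes -o name` prints entries such as "node/ip-10-0-0-1"
  for node in $(kubectl get nodes -o name); do
    node="${node#node/}"   # strip the "node/" prefix
    limit="$(kubectl get --raw "/api/v1/nodes/${node}/proxy/configz" \
      | jq '.kubeletconfig.podPidsLimit')"
    echo "${node} ${limit}"
  done
}
```

Call print_pod_pids_limits after defining it. A value of -1 is the kubelet default and means no per-pod limit is set on that node.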

k8s version

The per-pod PIDs limit has been available since k8s@1.20. I've worked on alternatives for older versions, but they are not the focus of this post.

I can cover those in a separate post; leave a comment if you need it.

AMI

This post assumes the official AL2-based EKS AMI.

OS/Node-level PIDs limit

On AL2 this was set to a relatively low value; however, it was updated in the EKS AMI project in late September 2023.

If your issue comes from the node-level PIDs limit being too low, consider upgrading your EKS AMI or adding a step to node provisioning. Details on the node-level PIDs limit are not the focus of this post.
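If you want to see where a node currently stands, the sketch below reads the kernel-wide ceiling and counts the tasks in use. It assumes the node-level limit in question is the Linux kernel.pid_max setting and that you are running it directly on the node (e.g. over SSH or SSM); the helper names node_pid_max and node_pids_in_use are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the kernel-wide PID ceiling with the number of tasks in use.
# Helper names are illustrative; run this on the node itself (Linux only).

# Print the kernel-wide PID ceiling (shared by processes and threads).
node_pid_max() {
  cat /proc/sys/kernel/pid_max
}

# Count live tasks (threads) by walking /proc/<pid>/task/<tid>.
node_pids_in_use() {
  local count=0 entry
  for entry in /proc/[0-9]*/task/[0-9]*; do
    # a task may exit between the glob expansion and this check
    if [ -e "$entry" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

echo "pid_max:     $(node_pid_max)"
echo "pids in use: $(node_pids_in_use)"
```

Threads count against the same ceiling as processes, which is why a thread leak in a single pod can starve the whole node.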

Options

Kubernetes allows you to limit the number of processes running in a Pod. You specify this limit at the node level, rather than configuring it as a resource limit for a particular Pod. Each Node can have a different PID limit.

(from: the Kubernetes documentation)

There are 2 options to implement this:

  1. --pod-max-pids argument for kubelet
  2. podPidsLimit in the kubelet configuration file

According to k8s, option 1 is being deprecated in favor of option 2. If your k8s cluster doesn't need to be upgraded for the next few years, it's fine to use the 1st option. I'll briefly cover option 1 and focus on option 2.

Option 1: --pod-max-pids

Using AWS EKS terraform module:

bootstrap_extra_args = "--kubelet-extra-args '--pod-max-pids <your_limit>'"

Option 2: podPidsLimit

The implementation options depend on the k8s version.

Before k8s@1.28

You need to inject podPidsLimit into the kubelet configuration. The default configuration is located at: /etc/kubernetes/kubelet/kubelet-config.json

You can inject podPidsLimit (with jq) into this configuration file in your pre_bootstrap_user_data:

pre_bootstrap_user_data = <<-EOT
echo "$(jq '.podPidsLimit = <your_limit>' /etc/kubernetes/kubelet/kubelet-config.json)" > /etc/kubernetes/kubelet/kubelet-config.json
EOT
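One caveat with the one-liner above: if jq fails for any reason, the redirect still overwrites the kubelet config with an empty line. A more defensive variant could look like the sketch below. It assumes jq is installed on the node; set_pod_pids_limit is a hypothetical helper name, and the value 1024 in the usage note is only an example.

```shell
#!/usr/bin/env bash
# Sketch: set podPidsLimit in a kubelet JSON config without risking an empty file.
# set_pod_pids_limit is a hypothetical helper name; jq is assumed to be installed.
set_pod_pids_limit() {
  local config="$1" limit="$2" tmp
  tmp="$(mktemp)" || return 1
  # --argjson passes the limit as a JSON number rather than a string
  if jq --argjson limit "$limit" '.podPidsLimit = $limit' "$config" > "$tmp"; then
    # overwrite the original only after jq has succeeded;
    # cat (rather than mv) keeps the original file's owner and mode
    cat "$tmp" > "$config"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would be something like: set_pod_pids_limit /etc/kubernetes/kubelet/kubelet-config.json 1024. The sketch deliberately avoids shell ${...} expansions, because inside a Terraform heredoc those are treated as interpolation and would need to be escaped as $${...}.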

Alternatively, you can create your own kubelet configuration file that includes podPidsLimit (make sure it's compatible with EKS) and point kubelet to it:

pre_bootstrap_user_data = <<-EOT
# create your own kubelet config
EOT
bootstrap_extra_args = "--kubelet-extra-args '--config <path_to_your_kubelet_config>'"

k8s@1.28 or newer

kubelet has accepted the --config-dir argument (referred to as drop-in configuration) since k8s@1.28, with some caveats. It lets you specify kubelet configuration overrides in a similar way to other linux system services.

You can only set --config-dir if you set the environment variable KUBELET_CONFIG_DROPIN_DIR_ALPHA for the kubelet process (the value of that variable does not matter).

You need to drop in a systemd .conf file so that the kubelet process gets the KUBELET_CONFIG_DROPIN_DIR_ALPHA environment variable. kubelet-envvar.conf:

[Service]
Environment="KUBELET_CONFIG_DROPIN_DIR_ALPHA=true"

You also need a kubelet configuration file (as an override to the default one). kubelet-config-override.yaml:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podPidsLimit: <your_limit>
cgroupDriver: "systemd"
hairpinMode: "hairpin-veth"
serializeImagePulls: false

Note that a few other fields are specified in addition to podPidsLimit. This is because kubelet configuration v1beta1 comes with default values (e.g. for cgroupDriver), and those defaults do NOT work on EKS.

You can check the kubelet config of a vanilla EKS AMI node to see which values EKS needs. Alternatively, you can read bootstrap.sh to see which ones are necessary.

Once these 2 files are created, you need to use pre_bootstrap_user_data to put them into the right place and use bootstrap_extra_args to tell kubelet to use --config-dir:

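pre_bootstrap_user_data = <<-EOT
# make sure the systemd drop-in directory exists before writing into it
mkdir -p /etc/systemd/system/kubelet.service.d
cat << EOF > /etc/systemd/system/kubelet.service.d/kubelet-envvar.conf
${file("${path.module}/kubelet-envvar.conf")}
EOF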

mkdir -p /etc/kubernetes/kubelet/drop-in-config

cat << EOF > /etc/kubernetes/kubelet/drop-in-config/10-override.conf
${file("${path.module}/kubelet-config-override.yaml")}
EOF
EOT
bootstrap_extra_args = "--kubelet-extra-args '--config-dir=/etc/kubernetes/kubelet/drop-in-config/'"
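Besides re-running the configz check from earlier, you can confirm the limit on the node itself by looking at the pids controller files of a pod's cgroup. The sketch below is a rough aid under a few assumptions: it is run directly on the node, the pod-level cgroup directory exposes pids.current and pids.max, and show_pids_usage is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report PID usage against the limit for one cgroup directory.
# show_pids_usage is a hypothetical helper; pass it a pod-level cgroup path.
show_pids_usage() {
  local dir="$1"
  if [ ! -r "$dir/pids.max" ] || [ ! -r "$dir/pids.current" ]; then
    echo "no pids controller files under $dir" >&2
    return 1
  fi
  # pids.max holds a number, or the literal string "max" when unlimited
  echo "$(cat "$dir/pids.current") / $(cat "$dir/pids.max")"
}
```

The location of pod cgroups depends on the cgroup version and driver, so it is easier to search for them than to hard-code a path, for example with: find /sys/fs/cgroup -name pids.max -path '*kubepods*'. Pass the directory containing one of the results to show_pids_usage; for a pod on a correctly configured node, the second number should match your podPidsLimit.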

Conclusion

Echoing what I mentioned in the beginning, the steps here can be easily adjusted for kubelet configuration other than podPidsLimit or non-EKS k8s clusters.

With multiple options available, pick the one that fits your use case. The drop-in configuration option requires the most work and may not be the best solution for your project.

