Send AWS VPC CNI metrics to Datadog

Xing Du
Jun 3, 2024


I’ve been polishing my EKS observability recently and noticed missing coverage on VPC CNI.

After some quick research, I couldn't find an official Datadog integration for AWS VPC CNI. That was disappointing, but an equivalent solution is not hard to implement (at a small cost).

I’m sharing this quick solution hoping that someone can correct me or Datadog folks can adopt this as an official integration.

Why

AWS VPC CNI is the add-on responsible for pod networking on AWS EKS (or self-managed k8s on AWS).

It's one of the most important add-ons for your k8s cluster: when its instance on a node fails, that node can no longer launch or terminate pods, and they stay stuck indefinitely. (see my other post on what happens when it fails) Therefore, observability into VPC CNI is just as important.

How

cni-metrics-helper

You can deploy the cni-metrics-helper Helm chart and try to port the metrics from CloudWatch to Datadog.

This solution seems more official, but it also looks a bit complicated to me, and I didn't try to validate that it works.

prometheus crawler

VPC CNI has its own metrics endpoint and the port is exposed (and bound to the host): metrics from each aws-node pod can be accessed from the node.

The Datadog agent also runs as a DaemonSet, so there is an agent pod on each node, and it supports Prometheus scraping through configuration.

All we need is to configure datadog-agent with one additional prometheus instance for AWS VPC CNI at localhost:61678/metrics and properly configure the metric mapping.

You don’t need to set anything in VPC CNI’s Helm chart values or EKS add-on configuration. The only thing you need is to add the following to the Datadog Helm chart values:

agent:
  confd:
    prometheus.yaml: |-
      init_config:
      instances:
        - prometheus_url: http://localhost:61678/metrics
          namespace: kube-system
          metrics:
            # array of metric names, each optionally mapped to a new name.
            - awscni_add_ip_req_count: awscni.add_ip_req_count
            - awscni_assigned_ip_addresses: awscni.assigned_ip_addresses
            # ...
            # skipping the rest for brevity
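
To make the mapping step more concrete, here is a minimal Python sketch of the name translation that the `metrics:` list describes. It is only an illustration, not the agent's actual code: the sample payload, the helper names (`parse_samples`, `translate`) and the hyphen-to-underscore normalization of the namespace are all assumptions (the last one is inferred from the kube_system.awscni.* prefix I see in Datadog).

```python
import re

# Hypothetical sample of what the endpoint might return (Prometheus text format).
SAMPLE = """\
# TYPE awscni_assigned_ip_addresses gauge
awscni_assigned_ip_addresses 12
# TYPE awscni_add_ip_req_count counter
awscni_add_ip_req_count 340
# TYPE go_goroutines gauge
go_goroutines 25
"""

# Same shape as the `metrics:` list in the check config: source name -> new name.
MAPPING = {
    "awscni_add_ip_req_count": "awscni.add_ip_req_count",
    "awscni_assigned_ip_addresses": "awscni.assigned_ip_addresses",
}

SAMPLE_LINE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)")


def parse_samples(text):
    """Yield (metric_name, value) for each non-comment sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line looks like "name{labels} value"; labels are optional.
        match = SAMPLE_LINE.match(line)
        if match:
            yield match.group(1), float(match.group(3))


def translate(text, mapping, namespace):
    """Return {prefixed_name: value} for the mapped metrics only.

    Labels are ignored here, so series that differ only by labels
    overwrite each other in this sketch.
    """
    prefix = namespace.replace("-", "_")  # assumed normalization
    result = {}
    for name, value in parse_samples(text):
        if name in mapping:
            result[f"{prefix}.{mapping[name]}"] = value
    return result


RESULT = translate(SAMPLE, MAPPING, "kube-system")
```

Metrics that are not listed in the mapping (go_goroutines in the sample) are simply dropped, which is why the list needs an entry for every metric you care about.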

To view all the metrics available, you can:

  • ssh to the node and run curl http://localhost:61678/metrics, or
  • kubectl exec daemonset/datadog -n <datadog_namespace> -- curl http://localhost:61678/metrics
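
Typing the whole mapping by hand gets tedious, so a small script can turn the output of either command into `metrics:` entries. This is a rough sketch under a few assumptions: the function name `mapping_lines` is made up, it relies on the `# TYPE` comment lines of the Prometheus text format, and it assumes you want every awscni_ metric renamed the same way as the two entries in my config.

```python
def mapping_lines(metrics_text, prefix="awscni_"):
    """Build `- source: target` entries for every metric starting with prefix."""
    names = set()
    for line in metrics_text.splitlines():
        # "# TYPE <name> <type>" lines name each metric family once.
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            if parts[2].startswith(prefix):
                names.add(parts[2])
    # Replace only the first underscore: awscni_eni_max -> awscni.eni_max
    return [f"- {name}: {name.replace('_', '.', 1)}" for name in sorted(names)]


# Example use (hypothetical file name): save the curl output to metrics.txt, then
#   print("\n".join(mapping_lines(open("metrics.txt").read())))
```

The printed lines can be pasted under `metrics:` in the check configuration, after reviewing them.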

Results

After deploying the updated Datadog helm chart, you'll have VPC CNI metrics available in Datadog under kube_system.awscni.*.

Now I can create monitors that catch issues when awscni_eni_allocated is approaching awscni_eni_max or when awscni_no_available_ip_addresses is not 0, which helps prevent major incidents on my EKS clusters.
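
The two conditions can be sketched in Python as below. This only illustrates the logic; the real monitors would be defined in Datadog itself. The function name `cni_alerts`, the alert labels and the 0.9 ratio used for "approaching" are assumptions of mine, so pick a threshold that fits your instance types.

```python
def cni_alerts(eni_allocated, eni_max, no_available_ip_addresses,
               eni_ratio_threshold=0.9):
    """Return the names of the conditions that would fire for one node."""
    alerts = []
    # ENI usage close to the limit (threshold is an assumed value).
    if eni_max > 0 and eni_allocated / eni_max >= eni_ratio_threshold:
        alerts.append("eni_near_max")
    # Any non-zero value is worth alerting on.
    if no_available_ip_addresses > 0:
        alerts.append("no_available_ip_addresses")
    return alerts
```

For example, a node with 4 of 4 ENIs allocated triggers the first condition, while a node with 1 of 4 ENIs and a non-zero awscni_no_available_ip_addresses triggers only the second.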
