Getting more m̶i̶l̶e̶a̶g̶e̶ “pod-age” out of your Amazon EKS cluster!!

Mani · Aug 11, 2021 · 7 min read

[Header image: A lotus from our rooftop garden at our home in Bengaluru, August 2021]

TLDR:

Warning — There is Kubernetes and networking jargon in this blog ;-)

Update: September 13th 2021 — There is also an official AWS blog which covers this in more detail — https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/

The Amazon VPC Container Network Interface (CNI) plugin now supports running more pods per node on AWS Nitro based EC2 instance types. To achieve higher pod density, the VPC CNI plugin leverages a new VPC capability that enables IP address prefixes to be attached to EC2 instances. That means, for example, that where an m5.large instance type could previously support a maximum of 29 pods per node (actually 27, if you account for the CNI plugin and kube-proxy pods on each node), with VPC prefix delegation enabled a maximum of 110 pods can be launched on the same m5.large instance!! Isn't that beautiful? More bang for the buck and, as they say, more m̶i̶l̶e̶a̶g̶e̶ “pod-age” out of the same EC2 instance type!!

This blog is, as always, my personal experimentation: a write-up on how to enable this feature in my own Amazon EKS cluster.

Background

Amazon EKS supports native VPC networking with the Amazon VPC Container Network Interface (CNI) plugin for Kubernetes. This plugin assigns an IP address from your VPC to each pod. Earlier, the number of pods that could be deployed on an EC2 node was based on the EC2 instance type — specifically the number of ENIs and the number of IP addresses per ENI, as per https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt.

The formula for max pods: (number of network interfaces for the instance type × (number of IP addresses per network interface − 1)) + 2. For example, an m5.large supports 3 ENIs with 10 IPv4 addresses each, giving 3 × (10 − 1) + 2 = 29.
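If you want to check that arithmetic yourself, both inputs come straight from the EC2 API; a quick sanity check for m5.large:

# Fetch ENI and per-ENI IPv4 limits for m5.large, then apply the formula
$ aws ec2 describe-instance-types --instance-types m5.large \
    --query 'InstanceTypes[0].NetworkInfo.[MaximumNetworkInterfaces, Ipv4AddressesPerInterface]' \
    --output text
3       10
$ echo $(( 3 * (10 - 1) + 2 ))
29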

With IP address prefix assignment, additional VPC IPv4 addresses can be attached to each worker node, enabling you to run more pods and fully utilize node resources on Nitro based EC2 instance types. Additionally, fewer network interfaces are required to allocate IP addresses for pods, which allows clusters to scale out faster in response to application usage spikes.
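To see where the new ceiling comes from: with prefix delegation, each of the 9 secondary address slots on an ENI carries a /28 prefix (16 addresses) instead of a single IP. The raw arithmetic for an m5.large therefore lands far above 110; Amazon EKS caps the recommended maximum at 110 for smaller instance types, and that cap is what the calculator script used later in this post reports.

# Prefix-delegation variant of the formula (a sanity check, not an official formula):
# 3 ENIs × 9 secondary slots × 16 addresses per /28 prefix, plus 2
$ echo $(( 3 * (10 - 1) * 16 + 2 ))
434
# theoretical ceiling; the EKS-recommended max for m5.large is capped at 110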

Useful links:

- Official AWS blog: https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/
- EKS documentation: https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
- Max pods calculator script: https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh

Prerequisites: an existing Amazon EKS cluster. Mine is a Kubernetes 1.21 cluster with two m5.large nodes, each currently capped at 29 pods:

$ kubectl get nodes
NAME                                           STATUS   ROLES    AGE    VERSION
ip-192-168-11-126.us-west-2.compute.internal   Ready    <none>   3m5s   v1.21.2-13+d2965f0db10712
ip-192-168-43-76.us-west-2.compute.internal    Ready    <none>   3m7s   v1.21.2-13+d2965f0db10712
$ kubectl get nodes -o json | jq '.items[].status.capacity.pods'
"29"
"29"

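For reference, a comparable starting cluster can be stood up with eksctl in one line; a minimal sketch, assuming the same cluster name that appears in the node group config later in this post:

$ eksctl create cluster --name eksworkshop-eksctl --region us-west-2 \
    --version 1.21 --nodes 2 --node-type m5.large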
Let's deploy a simple nginx deployment. This should run successfully, as the number of pods is only 25, well within the limit of 27 usable pods per node for m5.large.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 25
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: public.ecr.aws/ubuntu/nginx:latest
        name: nginx

$ kubectl apply -f nginx-orig.yaml
$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-94576f674-4mvq5   1/1     Running   0          12s
nginx-94576f674-8prdz   1/1     Running   0          12s
nginx-94576f674-b5tzf   1/1     Running   0          12s
nginx-94576f674-bn9q6   1/1     Running   0          12s
nginx-94576f674-bskls   1/1     Running   0          12s
nginx-94576f674-bxq9q   1/1     Running   0          12s
nginx-94576f674-dvdz8   1/1     Running   0          12s
nginx-94576f674-fkzhc   1/1     Running   0          12s
nginx-94576f674-fnhbk   1/1     Running   0          12s
nginx-94576f674-g4jrz   1/1     Running   0          12s
nginx-94576f674-hpslb   1/1     Running   0          12s
nginx-94576f674-jx2c8   1/1     Running   0          12s
nginx-94576f674-ks7vt   1/1     Running   0          12s

When you increase the replica count to a larger number, say 150, and redeploy, you will find that a lot of pods sit in the Pending state and never reach Running. When you describe one of those pods, you will find that it could not be assigned an IP address:

$ kubectl describe pod <pending-pod-name>
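A quick way to count how many pods are stuck in Pending (plain kubectl, nothing assumed):

$ kubectl get pods --field-selector=status.phase=Pending --no-headers | wc -l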

So, it's time to kick the tires on assigning prefixes to Amazon EC2 network interfaces, using version 1.9.0 or later of the Amazon VPC CNI add-on to assign /28 IP address prefixes (16 IP addresses each). I followed the instructions at https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html for setting this up.

Step 1: Add the vpc-cni managed add-on with the latest version from the EKS console.

[Screenshot: the vpc-cni managed add-on in the EKS console]
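If you prefer the CLI to the console, the equivalent would be something like this (the add-on version is assumed to match the one passed to the calculator script below):

# Use update-addon instead if vpc-cni is already installed as a managed add-on
$ aws eks create-addon --cluster-name eksworkshop-eksctl \
    --addon-name vpc-cni --addon-version v1.9.0-eksbuild.1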

Step 2: Enable prefix delegation on the aws-node DaemonSet, run the max-pods calculator, and set the warm prefix target, as per the documentation.

You can get the max pods for various EC2 Nitro instance types by using the script at https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh, passing the parameters as shown below.

$ kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
$ curl -o max-pods-calculator.sh https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh
$ chmod +x max-pods-calculator.sh
$ ./max-pods-calculator.sh --instance-type m5.large --cni-version 1.9.0-eksbuild.1 --cni-prefix-delegation-enabled
110

110 is the maximum number of pods recommended by Amazon EKS for an m5.large instance. If the ENABLE_PREFIX_DELEGATION parameter is not enabled, the recommended maximum is 29.

$ kubectl set env ds aws-node -n kube-system WARM_PREFIX_TARGET=1
daemonset.apps/aws-node env updated
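To confirm the setting actually landed on the aws-node DaemonSet, you can read it back with a standard kubectl jsonpath filter:

$ kubectl get ds aws-node -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ENABLE_PREFIX_DELEGATION")].value}'
true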

Step 3: Create a new managed node group using the following cluster config file with eksctl. Notice that maxPodsPerNode is now set to 110, and a role label is added so that we can place the pods on the right nodes.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eksworkshop-eksctl
  region: us-west-2
  version: "1.21"
availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c"]
managedNodeGroups:
- name: nodegroup-morepods
  labels: { role: testprefix }
  desiredCapacity: 2
  maxPodsPerNode: 110
  instanceType: m5.large
  ssh:
    enableSsm: true
$ eksctl create nodegroup -f eksworkshop.yaml
$ kubectl get nodes -lrole=testprefix
NAME                                           STATUS   ROLES    AGE     VERSION
ip-192-168-27-144.us-west-2.compute.internal   Ready    <none>   5h34m   v1.21.2-13+d2965f0db10712
ip-192-168-32-182.us-west-2.compute.internal   Ready    <none>   5h34m   v1.21.2-13+d2965f0db10712
$ kubectl get nodes -o json -lrole=testprefix | jq '.items[].status.capacity.pods'
"110"
"110"

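Under the hood, the CNI now attaches /28 prefixes to the nodes' ENIs instead of individual secondary IPs. You can spot-check this from the EC2 side; a sketch, assuming the ENIs carry the cluster-name tag that the VPC CNI normally applies:

# List the /28 prefixes attached to the cluster's ENIs (tag filter assumed)
$ aws ec2 describe-network-interfaces \
    --filters "Name=tag:cluster.k8s.amazonaws.com/name,Values=eksworkshop-eksctl" \
    --query 'NetworkInterfaces[].Ipv4Prefixes[].Ipv4Prefix'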
Step 4: Deploy the nginx deployment with a nodeSelector targeting the newly created nodegroup.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 150
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        role: testprefix
      containers:
      - image: public.ecr.aws/ubuntu/nginx:latest
        name: nginx

$ kubectl apply -f nginx.yaml
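To wait until all 150 replicas are up before checking further:

$ kubectl rollout status deployment/nginx
deployment "nginx" successfully rolled out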

Voilà!! The pods reach the Running state. You can check the number of running pods and the pod distribution across the two nodes:

$ kubectl get po -o wide -A | grep nginx | grep Running | wc -l
150
$ kubectl get po -o wide -A | grep nginx | grep Running | grep ip-192-168-27-144.us-west-2.compute.internal | wc -l
75
$ kubectl get po -o wide -A | grep nginx | grep Running | grep ip-192-168-32-182.us-west-2.compute.internal | wc -l
75
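Instead of one grep per node, the distribution can also be summarized in one shot (the awk column assumes the default kubectl -o wide -A layout, where NODE is the eighth column):

$ kubectl get po -o wide -A --no-headers | grep nginx | awk '{print $8}' | sort | uniq -c
     75 ip-192-168-27-144.us-west-2.compute.internal
     75 ip-192-168-32-182.us-west-2.compute.internal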
[Screenshot: EKS console]

OK, so now all 150 pods got deployed, whereas earlier we were getting errors. Hurray!!

Now, let's get to the next step of actually creating a Kubernetes deployment and a service with an AWS Application Load Balancer as an Ingress. I just followed the steps at https://www.eksworkshop.com/beginner/130_exposing-service/ingress_controller_alb/ but, instead of the 2048-game, I used the nginxdemos/hello container image with 150 pod replicas in the deployment, which was more useful as it displays the hostname and IP address of the serving pod. Also, note the nodeSelector to ensure the pods land on the right nodegroup.

---
apiVersion: v1
kind: Namespace
metadata:
  name: game-2048
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: game-2048
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 150
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      nodeSelector:
        role: testprefix
      containers:
      - image: nginxdemos/hello
        name: app-2048
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: game-2048
  name: service-2048
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: service-2048
            port:
              number: 80

Let's verify the deployment via kubectl, as well as the ALB endpoint from a browser. Note the sed in the pipeline below, which switches the ALB target type from ip to instance before applying the manifest.

$ cat 2048_full_latest.yaml | sed 's=alb.ingress.kubernetes.io/target-type: ip=alb.ingress.kubernetes.io/target-type: instance=g' | kubectl apply -f -
namespace/game-2048 created
deployment.apps/deployment-2048 created
service/service-2048 created
ingress.networking.k8s.io/ingress-2048 created
$ kubectl get ingress/ingress-2048 -n game-2048
NAME           CLASS    HOSTS   ADDRESS                                                                    PORTS   AGE
ingress-2048   <none>   *       k8s-game2048-ingress2-8ae3738fd5-1131464554.us-west-2.elb.amazonaws.com   80      85s
$ kubectl get pods -n game-2048 | grep -i Running | wc -l
150
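Besides the browser, a quick curl against the ALB hostname from the ingress output confirms that it is serving traffic (expect a 200 once the targets are healthy):

$ curl -s -o /dev/null -w '%{http_code}\n' http://k8s-game2048-ingress2-8ae3738fd5-1131464554.us-west-2.elb.amazonaws.com/
200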
[Screenshot: nginxdemos/hello being served by the Application Load Balancer]

So, thanks to the Amazon EKS team for adding this feature, which increases pod density on EC2 nodes by leveraging the prefix feature of Amazon EC2 network interfaces. It helps customers reduce costs by letting us run more pods and fully utilize node resources on Nitro based EC2 instance types, and since fewer network interfaces are required to allocate pod IP addresses, clusters can also scale out faster in response to application usage spikes.

The GitHub mechanism for giving feedback and sharing status at https://github.com/aws/containers-roadmap/projects/1 is also a great way of capturing the voice of the customer. Feel free to add your vote to the various upcoming features and get an insight into what's coming next.

Hope this blog was useful!

Mani

Principal Solutions Architect at AWS India. I blog and post about interesting stuff that I am curious about and which is relevant to developers & customers.