Getting more m̶i̶l̶e̶a̶g̶e̶ “pod-age” out of your Amazon EKS cluster!!

Mani · Aug 11, 2021 · 7 min read

[Header image: A lotus from our rooftop garden at our home in Bengaluru, August 2021]

TLDR:

Warning — There is Kubernetes and networking jargon in this blog ;-)

Update: September 13th 2021 — There is also an official AWS blog which covers this in more detail — https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/

The Amazon VPC Container Network Interface (CNI) plugin now supports running more pods per node on AWS Nitro based EC2 instance types. To achieve higher pod density, the VPC CNI plugin leverages a new VPC capability that enables IP address prefixes to be attached to EC2 instances. That means, for example, that where an m5.large instance type could previously support a maximum of 29 pods per node (actually 27, if you account for the CNI plugin and kube-proxy pods on each node), with VPC prefix delegation enabled a maximum of 110 pods can be launched on the same m5.large instance!! Isn't that beautiful? More bang for the buck and, as they say, more m̶i̶l̶e̶a̶g̶e̶ “pod-age” out of the same EC2 instance type!!

This blog is, as always, my personal experimentation: a write-up on how to enable this feature in my own Amazon EKS cluster.

Background

Amazon EKS supports native VPC networking with the Amazon VPC Container Network Interface (CNI) plugin for Kubernetes. This plugin assigns an IP address from your VPC to each pod. Earlier, the number of pods that could be deployed on an EC2 node was based on the EC2 instance type — specifically the number of ENIs and the number of IP addresses per ENI, as per https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt.

The formula for max pods: (number of network interfaces for the instance type × (number of IP addresses per network interface − 1)) + 2. For example, an m5.large supports 3 ENIs with 10 IPv4 addresses each, giving 3 × (10 − 1) + 2 = 29.
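If you want to check that arithmetic yourself, both inputs come straight from the EC2 API; a quick sanity check for m5.large:

# Fetch ENI and per-ENI IPv4 limits for m5.large, then apply the formula
$ aws ec2 describe-instance-types --instance-types m5.large \
    --query 'InstanceTypes[0].NetworkInfo.[MaximumNetworkInterfaces, Ipv4AddressesPerInterface]' \
    --output text
3       10
$ echo $(( 3 * (10 - 1) + 2 ))
29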

With IP address prefix assignment, additional VPC IPv4 addresses can be attached to each worker node, enabling you to run more pods and fully utilize node resources on Nitro based EC2 instance types. Additionally, fewer network interfaces are required to allocate IP addresses for pods, which allows clusters to scale out faster in response to application usage spikes.
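To see where the new ceiling comes from: with prefix delegation, each of the 9 secondary address slots on an ENI carries a /28 prefix (16 addresses) instead of a single IP. The raw arithmetic for an m5.large therefore lands far above 110; Amazon EKS caps the recommended maximum at 110 for smaller instance types, and that cap is what the calculator script used later in this post reports.

# Prefix-delegation variant of the formula (a sanity check, not an official formula):
# 3 ENIs × 9 secondary slots × 16 addresses per /28 prefix, plus 2
$ echo $(( 3 * (10 - 1) * 16 + 2 ))
434
# theoretical ceiling; the EKS-recommended max for m5.large is capped at 110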

Useful links:

- Official AWS blog: https://aws.amazon.com/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/
- EKS documentation: https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
- Max pods calculator script: https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh

Prerequisites: an existing Amazon EKS cluster. Mine is a Kubernetes 1.21 cluster with two m5.large nodes, each currently capped at 29 pods:

$ kubectl get nodes
NAME                                           STATUS   ROLES    AGE    VERSION
ip-192-168-11-126.us-west-2.compute.internal   Ready    <none>   3m5s   v1.21.2-13+d2965f0db10712
ip-192-168-43-76.us-west-2.compute.internal    Ready    <none>   3m7s   v1.21.2-13+d2965f0db10712
$ kubectl get nodes -o json | jq '.items[].status.capacity.pods'
"29"
"29"

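For reference, a comparable starting cluster can be stood up with eksctl in one line; a minimal sketch, assuming the same cluster name that appears in the node group config later in this post:

$ eksctl create cluster --name eksworkshop-eksctl --region us-west-2 \
    --version 1.21 --nodes 2 --node-type m5.large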
Let's deploy a simple nginx deployment. This should run successfully, as the number of pods is only 25, well within the limit of 27 usable pods per node for m5.large.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 25
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: public.ecr.aws/ubuntu/nginx:latest
        name: nginx

$ kubectl apply -f nginx-orig.yaml
$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-94576f674-4mvq5   1/1     Running   0          12s
nginx-94576f674-8prdz   1/1     Running   0          12s
nginx-94576f674-b5tzf   1/1     Running   0          12s
nginx-94576f674-bn9q6   1/1     Running   0          12s
nginx-94576f674-bskls   1/1     Running   0          12s
nginx-94576f674-bxq9q   1/1     Running   0          12s
nginx-94576f674-dvdz8   1/1     Running   0          12s
nginx-94576f674-fkzhc   1/1     Running   0          12s
nginx-94576f674-fnhbk   1/1     Running   0          12s
nginx-94576f674-g4jrz   1/1     Running   0          12s
nginx-94576f674-hpslb   1/1     Running   0          12s
nginx-94576f674-jx2c8   1/1     Running   0          12s
nginx-94576f674-ks7vt   1/1     Running   0          12s

When you increase the replica count to a larger number, say 150, and redeploy, you will find that a lot of pods sit in the Pending state and never reach Running. When you describe one of those pods, you will find that it could not be assigned an IP address:

$ kubectl describe pod <pending-pod-name>
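A quick way to count how many pods are stuck in Pending (plain kubectl, nothing assumed):

$ kubectl get pods --field-selector=status.phase=Pending --no-headers | wc -l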

So, it's time to kick the tires on assigning prefixes to Amazon EC2 network interfaces, using version 1.9.0 or later of the Amazon VPC CNI add-on to assign /28 IP address prefixes (16 IP addresses each). I followed the instructions at https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html for setting this up.

Step 1: Add the vpc-cni managed add-on with the latest version from the EKS console.

[Screenshot: the vpc-cni managed add-on in the EKS console]
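If you prefer the CLI to the console, the equivalent would be something like this (the add-on version is assumed to match the one passed to the calculator script below):

# Use update-addon instead if vpc-cni is already installed as a managed add-on
$ aws eks create-addon --cluster-name eksworkshop-eksctl \
    --addon-name vpc-cni --addon-version v1.9.0-eksbuild.1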

Step 2: Enable prefix delegation on the aws-node DaemonSet, run the max-pods calculator, and set the warm prefix target, as per the documentation.

You can get the max pods for various EC2 Nitro instance types by using the script at https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh, passing the parameters as shown below.

$ kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true
$ curl -o max-pods-calculator.sh https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh
$ chmod +x max-pods-calculator.sh
$ ./max-pods-calculator.sh --instance-type m5.large --cni-version 1.9.0-eksbuild.1 --cni-prefix-delegation-enabled
110

110 is the maximum number of pods recommended by Amazon EKS for an m5.large instance. If the ENABLE_PREFIX_DELEGATION parameter is not enabled, the recommended maximum is 29.

$ kubectl set env ds aws-node -n kube-system WARM_PREFIX_TARGET=1
daemonset.apps/aws-node env updated
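To confirm the setting actually landed on the aws-node DaemonSet, you can read it back with a standard kubectl jsonpath filter:

$ kubectl get ds aws-node -n kube-system \
    -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ENABLE_PREFIX_DELEGATION")].value}'
true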

Step 3: Create a new managed node group using the following cluster config file with eksctl. Notice that maxPodsPerNode is now set to 110, and a role label is added so that we can place the pods on the right nodes.

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eksworkshop-eksctl
  region: us-west-2
  version: "1.21"
availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c"]
managedNodeGroups:
- name: nodegroup-morepods
  labels: { role: testprefix }
  desiredCapacity: 2
  maxPodsPerNode: 110
  instanceType: m5.large
  ssh:
    enableSsm: true
$ eksctl create nodegroup -f eksworkshop.yaml
$ kubectl get nodes -lrole=testprefix
NAME                                           STATUS   ROLES    AGE     VERSION
ip-192-168-27-144.us-west-2.compute.internal   Ready    <none>   5h34m   v1.21.2-13+d2965f0db10712
ip-192-168-32-182.us-west-2.compute.internal   Ready    <none>   5h34m   v1.21.2-13+d2965f0db10712
$ kubectl get nodes -o json -lrole=testprefix | jq '.items[].status.capacity.pods'
"110"
"110"

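Under the hood, the CNI now attaches /28 prefixes to the nodes' ENIs instead of individual secondary IPs. You can spot-check this from the EC2 side; a sketch, assuming the ENIs carry the cluster-name tag that the VPC CNI normally applies:

# List the /28 prefixes attached to the cluster's ENIs (tag filter assumed)
$ aws ec2 describe-network-interfaces \
    --filters "Name=tag:cluster.k8s.amazonaws.com/name,Values=eksworkshop-eksctl" \
    --query 'NetworkInterfaces[].Ipv4Prefixes[].Ipv4Prefix'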
Step 4: Deploy the nginx deployment with a nodeSelector targeting the newly created nodegroup.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 150
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        role: testprefix
      containers:
      - image: public.ecr.aws/ubuntu/nginx:latest
        name: nginx

$ kubectl apply -f nginx.yaml
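To wait until all 150 replicas are up before checking further:

$ kubectl rollout status deployment/nginx
deployment "nginx" successfully rolled out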

Voilà!! The pods reach the Running state. You can check the number of running pods and the pod distribution across the two nodes:

$ kubectl get po -o wide -A | grep nginx | grep Running | wc -l
150
$ kubectl get po -o wide -A | grep nginx | grep Running | grep ip-192-168-27-144.us-west-2.compute.internal | wc -l
75
$ kubectl get po -o wide -A | grep nginx | grep Running | grep ip-192-168-32-182.us-west-2.compute.internal | wc -l
75
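Instead of one grep per node, the distribution can also be summarized in one shot (the awk column assumes the default kubectl -o wide -A layout, where NODE is the eighth column):

$ kubectl get po -o wide -A --no-headers | grep nginx | awk '{print $8}' | sort | uniq -c
     75 ip-192-168-27-144.us-west-2.compute.internal
     75 ip-192-168-32-182.us-west-2.compute.internal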
[Screenshot: EKS console]

OK, so now all 150 pods got deployed, whereas earlier we were getting errors. Hurray!!

Now, let's get to the next step of actually creating a Kubernetes deployment and a service with an AWS Application Load Balancer as an Ingress. I just followed the steps at https://www.eksworkshop.com/beginner/130_exposing-service/ingress_controller_alb/ but, instead of the 2048-game, I used the nginxdemos/hello container image with 150 pod replicas in the deployment, which was more useful as it displays the hostname and IP address of the serving pod. Also, note the nodeSelector to ensure the pods land on the right nodegroup.

---
apiVersion: v1
kind: Namespace
metadata:
  name: game-2048
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: game-2048
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 150
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      nodeSelector:
        role: testprefix
      containers:
      - image: nginxdemos/hello
        name: app-2048
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: game-2048
  name: service-2048
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: service-2048
            port:
              number: 80

Let's verify the deployment via kubectl, as well as the ALB endpoint from a browser. Note the sed in the pipeline below, which switches the ALB target type from ip to instance before applying the manifest.

$ cat 2048_full_latest.yaml | sed 's=alb.ingress.kubernetes.io/target-type: ip=alb.ingress.kubernetes.io/target-type: instance=g' | kubectl apply -f -
namespace/game-2048 created
deployment.apps/deployment-2048 created
service/service-2048 created
ingress.networking.k8s.io/ingress-2048 created
$ kubectl get ingress/ingress-2048 -n game-2048
NAME           CLASS    HOSTS   ADDRESS                                                                    PORTS   AGE
ingress-2048   <none>   *       k8s-game2048-ingress2-8ae3738fd5-1131464554.us-west-2.elb.amazonaws.com   80      85s
$ kubectl get pods -n game-2048 | grep -i Running | wc -l
150
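Besides the browser, a quick curl against the ALB hostname from the ingress output confirms that it is serving traffic (expect a 200 once the targets are healthy):

$ curl -s -o /dev/null -w '%{http_code}\n' http://k8s-game2048-ingress2-8ae3738fd5-1131464554.us-west-2.elb.amazonaws.com/
200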
[Screenshot: nginxdemos/hello being served by the Application Load Balancer]

So, thanks to the Amazon EKS team for adding this feature, which increases pod density on EC2 nodes by leveraging the prefix feature of Amazon EC2 network interfaces. It helps customers reduce costs by letting us run more pods and fully utilize node resources on Nitro based EC2 instance types, and since fewer network interfaces are required to allocate pod IP addresses, clusters can also scale out faster in response to application usage spikes.

The GitHub mechanism for giving feedback and sharing status at https://github.com/aws/containers-roadmap/projects/1 is also a great way of capturing the voice of the customer. Feel free to add your vote to the various upcoming features and get an insight into what's coming next.

Hope this blog was useful!

Mani

Principal Solutions Architect at AWS India. I blog and post about interesting stuff that I am curious about and which is relevant to developers & customers.