Karpenter vs Cluster Autoscaler

I've been using Cluster Autoscaler on my Kubernetes clusters for years, and overall I was pretty happy with how it did the job.

But time goes on, things change, and it's time to try Karpenter.

Karpenter and Cluster Autoscaler address similar challenges with different approaches. Here's a short comparison:

  • Karpenter provisions right-sized, custom nodes directly via cloud APIs, allowing dynamic configurations tailored to workloads. Cluster Autoscaler relies on predefined node groups (e.g., AWS ASGs) and scales within those constraints.
  • Karpenter optimizes resource utilization by efficiently packing workloads onto nodes, reducing over-provisioning. Cluster Autoscaler can be less efficient because node group sizes and configurations are fixed.
  • Karpenter scales faster by talking to EC2 directly, bypassing ASGs, which is ideal for bursty workloads. Cluster Autoscaler is slower because it works by resizing predefined groups, especially with spot requests.
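To make the bin-packing point concrete, here is a sketch of a pod that requests 3 vCPUs (the pod name and image are placeholders, not from any real workload). With Cluster Autoscaler, the new node must be whatever instance type the node group is configured with, even if that wastes capacity; Karpenter is free to launch the smallest instance from its allowed set that actually fits the request.

```yaml
# Hypothetical pod: 3 vCPUs pending, no existing node has room.
# Cluster Autoscaler: adds one more node of the group's fixed type.
# Karpenter: picks a right-sized instance from the NodePool's requirements.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker   # placeholder name
spec:
  containers:
    - name: worker
      image: busybox   # placeholder image
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "3"
          memory: 6Gi
```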

Karpenter manages nodes with the help of CRDs: EC2NodeClass, NodePool, and NodeClaim. This allows the platform or development team to bundle an application's infrastructure requirements with the application's Helm chart, or simply ship them as Kubernetes manifests in a YAML file like this:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  amiSelectorTerms:
    - name: "amazon-eks-node-${var.cluster_version}-*"

  role: ${module.karpenter.node_iam_role_name}

  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${module.eks.cluster_name}

  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${module.eks.cluster_name}

  tags:
    karpenter.sh/discovery: ${module.eks.cluster_name}
    Name: ${var.cluster_name}-workers-karpenter

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        deleteOnTermination: true

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: "karpenter.k8s.aws/instance-category"
          operator: In
          values: ["m"]
        - key: "karpenter.k8s.aws/instance-cpu"
          operator: In
          values: ["2", "4", "8", "16", "32"]
        - key: "karpenter.k8s.aws/instance-hypervisor"
          operator: In
          values: ["nitro"]
        - key: "karpenter.k8s.aws/instance-generation"
          operator: Gt
          values: ["2"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
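As a quick sketch of how this NodePool gets exercised (the deployment name, image, and sizes below are made up for illustration), a workload only needs ordinary resource requests. When its pods go Pending, Karpenter matches them against the NodePool's requirements, creates NodeClaims, and launches on-demand m-family Nitro instances sized to fit:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api   # hypothetical name
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      containers:
        - name: app
          image: nginx   # placeholder image
          resources:
            requests:
              cpu: "2"    # Karpenter sums pending requests
              memory: 4Gi # and picks instances that fit
```

Once the pods are running, you can inspect what Karpenter provisioned with `kubectl get nodeclaims` and `kubectl get nodepools`, and the `consolidationPolicy: WhenEmpty` setting above will remove a node 30 seconds after its last pod leaves.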