Evaluation of Amazon EKS Auto Mode Compute Options for High Availability and Operational Ownership

| Area | 1) EKS-provided NodeClass + EKS built-in NodePools (system, general-purpose) | 2) EKS-provided NodeClass + Custom NodePool(s) | 3) Custom NodeClass + Custom NodePool(s) |
| --- | --- | --- | --- |
| What you get (big wins) | Fastest + simplest “production-ready baseline.” Built-ins give you: system pool isolation for critical add-ons (CriticalAddonsOnly taint) and a general-purpose pool. (AWS Documentation) | Compute control without touching networking: you can tune AZs/arch/Spot vs On-Demand/instance categories, set CPU+memory limits, and define disruption policies using NodePool. (AWS Documentation) | Full infra policy control (within Auto Mode): customize networking placement, SG selection, SNAT policy, network policy defaults, event logging, pod subnet isolation, plus storage/tagging knobs via NodeClass. (AWS Documentation) |
| What you lose / constraints | Least flexibility: built-ins are fixed. Both built-ins are On-Demand only, C/M/R families, gen≥5, and general-purpose is amd64-only. (AWS Documentation) | You still inherit default NodeClass networking choices. If you need custom subnets/SGs/pod-subnet isolation, you can’t do it here. (AWS Documentation) | Highest complexity. More chances to misconfigure (subnet tags/AZ mismatch, SG selection, IAM/access-entry gaps). Also, still cannot choose AMI (AWS-managed). (AWS Documentation) |
| NodeClass availability / dependency | Default NodeClass is automatically provisioned when built-ins are enabled. (AWS Documentation) | Important: Default NodeClass exists only if at least one built-in pool is enabled. Practically, most teams keep system enabled. (AWS Documentation) | If you disable all built-ins, you must create your NodeClass + NodePool. Also, AWS says do not name your custom NodeClass default. (AWS Documentation) |
| HA posture (cluster add-ons) | Strong default: system NodePool is designed to isolate critical add-ons using CriticalAddonsOnly taint; many add-ons tolerate it. (AWS Documentation) | You can keep the same HA posture by leaving system enabled and moving apps to custom pools. (Common “best of both worlds.”) (AWS Documentation) | If you disable built-ins, you must recreate the “system isolation” pattern yourself (taints/tolerations + capacity plan). Otherwise cluster add-ons and apps compete for the same pool. (AWS Documentation) |
| Networking/security control (big differentiator) | Minimal (defaults). | Minimal (still defaults). | Maximum (within Auto Mode): NodeClass can select node SGs, node subnets, SNAT policy, network policy defaults/logging, and pod subnet isolation. (AWS Documentation) |
| Compute/cost tuning | Limited to AWS defaults (On-Demand only, fixed family/arch constraints). (AWS Documentation) | Strong: NodePool lets you constrain instance types/categories, AZs, arch, Spot/On-Demand, and set CPU/memory limits. (AWS Documentation) | Strongest overall: same as option 2 plus the ability to align networking/security posture to cost/scale requirements (e.g., pod subnet isolation for IP exhaustion scenarios). (AWS Documentation) |
| Upgrades & maintenance (who does what) | AWS patches nodes + rolls AMIs; you mainly ensure workloads tolerate disruption (PDBs/topology spread). (AWS Documentation) | Same AWS responsibility for patching; you additionally manage NodePool policies (limits, consolidation timing, disruption budgets) to control upgrade impact. (AWS Documentation) | Same AWS patching; you also own NodeClass lifecycle (network/storage/tagging changes) + any required IAM/access-entry work for custom roles. (AWS Documentation) |
| DevOps workload (ongoing) | Low: mostly app HA policies + observing events/node health. Node health monitoring/auto-repair capabilities exist and the monitoring agent is included for Auto Mode clusters. (AWS Documentation) | Medium: everything in option 1 plus managing one or more NodePools (requirements, limits, disruption windows/budgets) and avoiding over-constraint. (AWS Documentation) | High: everything in option 2 plus NodeClass governance (subnets/SG/SNAT/pod-subnet isolation, storage/KMS/tagging) and IAM/access-entry associations for node roles. (AWS Documentation) |
| Typical fit | Teams optimizing for simplicity, fastest time-to-production, and standard workloads. | Most common “enterprise sweet spot”: keep AWS defaults for networking, but add NodePools for HA + cost + workload segmentation. | Regulated / complex networking environments: explicit subnet/SG policy, IP management requirements, pod subnet isolation, stricter infra governance. (AWS Documentation) |
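
To make options 2 and 3 more concrete, here is a rough Python sketch that builds a custom NodePool manifest (option 2) and a custom NodeClass manifest (option 3) as plain dicts and prints them as JSON, which kubectl also accepts. The field names follow my reading of the Karpenter v1 NodePool API and the EKS Auto Mode NodeClass API; check them against the current AWS documentation before applying anything. The pool and class names, the role name, the tag keys, the zones, and the limit values are all hypothetical placeholders, not recommendations.

```python
import json


def custom_node_pool(name, node_class="default", zones=None,
                     capacity_types=("on-demand",),
                     cpu_limit="200", memory_limit="800Gi"):
    """Build a NodePool manifest that constrains compute (option 2).

    node_class="default" reuses the EKS-provided NodeClass; pass the name
    of a custom NodeClass to get option 3. All defaults are placeholders.
    """
    requirements = [
        {"key": "karpenter.sh/capacity-type", "operator": "In",
         "values": list(capacity_types)},
        {"key": "eks.amazonaws.com/instance-category", "operator": "In",
         "values": ["c", "m", "r"]},
        {"key": "kubernetes.io/arch", "operator": "In",
         "values": ["amd64", "arm64"]},
    ]
    if zones:
        # Restrict the pool to specific Availability Zones.
        requirements.append({"key": "topology.kubernetes.io/zone",
                             "operator": "In", "values": list(zones)})
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {"spec": {
                "nodeClassRef": {"group": "eks.amazonaws.com",
                                 "kind": "NodeClass", "name": node_class},
                "requirements": requirements,
            }},
            # Upper bound on the total capacity this pool may provision.
            "limits": {"cpu": cpu_limit, "memory": memory_limit},
            # Disruption policy: consolidate, but at most 10% of nodes at once.
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "5m",
                "budgets": [{"nodes": "10%"}],
            },
        },
    }


def custom_node_class(name, role, subnet_tags, security_group_tags):
    """Build a NodeClass manifest with explicit subnet/SG selection (option 3)."""
    if name == "default":
        # The comparison above notes AWS advises against this name.
        raise ValueError("do not name a custom NodeClass 'default'")
    return {
        "apiVersion": "eks.amazonaws.com/v1",
        "kind": "NodeClass",
        "metadata": {"name": name},
        "spec": {
            "role": role,  # hypothetical IAM role name; needs an access entry
            "subnetSelectorTerms": [{"tags": dict(subnet_tags)}],
            "securityGroupSelectorTerms": [{"tags": dict(security_group_tags)}],
            "snatPolicy": "Random",
            "networkPolicy": "DefaultAllow",
            "networkPolicyEventLogs": "Disabled",
        },
    }


# Hypothetical example values, for illustration only.
node_class = custom_node_class(
    "private-compute", "MyAutoModeNodeRole",
    subnet_tags={"example.com/node-subnet": "true"},
    security_group_tags={"example.com/node-sg": "true"},
)
node_pool = custom_node_pool(
    "apps", node_class="private-compute",
    zones=["us-east-1a", "us-east-1b"],
    capacity_types=["spot", "on-demand"],
)
print(json.dumps({"apiVersion": "v1", "kind": "List",
                  "items": [node_class, node_pool]}, indent=2))
```

Calling `custom_node_pool("apps")` with the default `node_class` gives the option 2 shape (custom NodePool on the EKS-provided NodeClass), which only works while at least one built-in pool is enabled so that the default NodeClass exists.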

The table below compares the two built-in NodePools you get with “full” EKS Auto Mode: system and general-purpose.

| Topic | system (built-in) | general-purpose (built-in) |
| --- | --- | --- |
| Primary purpose | Dedicated capacity for cluster-critical add-ons to improve stability/isolation. (AWS Documentation) | Default pool for general workloads (microservices, web apps, etc.) with “reasonable defaults.” (AWS Documentation) |
| How pods get scheduled onto it | Nodes have a CriticalAddonsOnly taint, so pods must have a matching toleration (and typically select the pool) to run here. Example uses nodeSelector: karpenter.sh/nodepool: system + toleration. (AWS Documentation) | Typical workloads just target Auto Mode nodes with eks.amazonaws.com/compute-type: auto; unless you explicitly target another pool, this is the “default” place most apps land. (AWS Documentation) |
| Who should run here (allowed use) | CoreDNS and other critical add-ons that tolerate CriticalAddonsOnly, plus any custom critical components you want isolated (monitoring/ingress controllers, etc.), provided they can tolerate the taint. (AWS Documentation) | Application workloads and services that don’t need “system-only” isolation. (AWS Documentation) |
| Main limitation (behavioral) | Regular app pods won’t schedule here unless you add the toleration (by design). (AWS Documentation) | No built-in isolation; system add-ons and apps can compete unless you keep system enabled and schedule critical add-ons there. (AWS Documentation) |
| CPU architecture support | amd64 + arm64 (AWS Documentation) | amd64 only (AWS Documentation) |
| Capacity type | On-Demand only (AWS Documentation) | On-Demand only (AWS Documentation) |
| Instance families & generations | C/M/R families, gen 5+ (AWS Documentation) | C/M/R families, gen 5+ (AWS Documentation) |
| NodeClass used | Uses the default EKS NodeClass (AWS Documentation) | Uses the default EKS NodeClass (AWS Documentation) |
| Can you edit/customize it? | No (you can only enable/disable). For customization you must create your own NodePool(s). (AWS Documentation) | No (you can only enable/disable). For customization you must create your own NodePool(s). (AWS Documentation) |
| Operational dependency note | If you disable all built-in pools, EKS won’t automatically provision the default NodeClass; you must create a custom NodeClass + NodePool. (AWS Documentation) | Same dependency note. (AWS Documentation) |
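
The scheduling rules in the comparison above can be sketched in a few lines of Python. This is a minimal illustration, not an official helper: it adds the nodeSelector and toleration described for the system pool (or the compute-type selector described for general workloads) to a Deployment manifest held as a dict. The function names and the `metrics-agent` Deployment are hypothetical; the label and taint keys are the ones quoted in the table.

```python
import copy
import json

SYSTEM_TAINT_KEY = "CriticalAddonsOnly"


def placement_for(pool):
    """Return the nodeSelector and tolerations a pod needs for a built-in pool."""
    if pool == "system":
        return {
            # Select the system pool explicitly; the toleration alone only
            # permits scheduling there, it does not steer the pod to it.
            "nodeSelector": {"karpenter.sh/nodepool": "system"},
            "tolerations": [{"key": SYSTEM_TAINT_KEY, "operator": "Exists"}],
        }
    if pool == "general-purpose":
        # General workloads just target Auto Mode nodes; no toleration needed.
        return {
            "nodeSelector": {"eks.amazonaws.com/compute-type": "auto"},
            "tolerations": [],
        }
    raise ValueError(f"unknown built-in pool: {pool!r}")


def place_deployment(deployment, pool):
    """Return a copy of a Deployment manifest with placement fields merged in."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]
    placement = placement_for(pool)
    pod_spec.setdefault("nodeSelector", {}).update(placement["nodeSelector"])
    pod_spec.setdefault("tolerations", []).extend(placement["tolerations"])
    return patched


# Hypothetical critical component that should be isolated on the system pool.
metrics_agent = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "metrics-agent"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "metrics-agent"}},
        "template": {
            "metadata": {"labels": {"app": "metrics-agent"}},
            "spec": {"containers": [
                {"name": "agent", "image": "example.com/metrics-agent:1.0"},
            ]},
        },
    },
}

print(json.dumps(place_deployment(metrics_agent, "system"), indent=2))
```

A regular application would use `place_deployment(app, "general-purpose")` instead, or simply omit both fields if Auto Mode is the only compute in the cluster.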
