Running Kubernetes in production requires careful planning across networking, security, reliability, and operations. After managing multiple production AKS clusters, here are the practices that matter most.
Cluster Configuration
Start with the right foundation. These settings are hard to change later:
- Azure CNI Networking: Use Azure CNI for production. It provides better network performance and is required for Windows containers and some advanced networking scenarios
- Private Clusters: Keep the API server private. Expose it only through VPN or Azure Bastion
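- Availability Zones: Spread nodes across zones for resilience. Zone placement adds no charge for the nodes themselves, but it is only offered in regions that support availability zones and must be chosen when a node pool is created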
- Managed Identity: Use managed identity instead of service principals
# Create production-ready cluster
az aks create \
  --name prod-cluster \
  --resource-group prod-rg \
  --node-count 3 \
  --zones 1 2 3 \
  --network-plugin azure \
  --enable-private-cluster \
  --enable-managed-identity \
  --enable-aad \
  --aad-admin-group-object-ids "$ADMIN_GROUP_ID" \
  --enable-addons azure-policy  # Azure Policy is enabled as an add-on
Node Pool Strategy
Use multiple node pools for workload isolation:
# System node pool for critical addons
az aks nodepool add \
  --resource-group prod-rg \
  --cluster-name prod-cluster \
  --name system \
  --node-count 3 \
  --mode System \
  --node-taints CriticalAddonsOnly=true:NoSchedule
# General workload pool
az aks nodepool add \
  --resource-group prod-rg \
  --cluster-name prod-cluster \
  --name workloads \
  --node-count 5 \
  --enable-cluster-autoscaler \
  --min-count 3 \
  --max-count 20
# High-memory pool for specific workloads
az aks nodepool add \
  --resource-group prod-rg \
  --cluster-name prod-cluster \
  --name highmem \
  --node-vm-size Standard_E8s_v3 \
  --node-count 2
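Node pools only isolate workloads if pods are steered onto them. Here is a minimal sketch of a Deployment targeting the highmem pool, assuming the `kubernetes.azure.com/agentpool` label that AKS applies to nodes. The Deployment name, image, and resource figures are hypothetical.

```yaml
# Hypothetical Deployment pinned to the highmem node pool
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache                    # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
    spec:
      nodeSelector:
        kubernetes.azure.com/agentpool: highmem   # AKS sets this label to the pool name
      containers:
        - name: cache
          image: example.azurecr.io/cache:1.0     # placeholder image
          resources:
            requests:
              cpu: "2"
              memory: 16Gi
            limits:
              memory: 16Gi
```

A nodeSelector pulls these pods onto highmem but does not keep other pods off it. For that, the pool would also need a taint (set with `--node-taints`, as on the system pool) and the Deployment a matching toleration.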
Security Essentials
- Azure AD Integration: Use Azure AD for authentication, RBAC for authorization
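- Pod Managed Identity: Workloads get Azure identities without secrets. Microsoft has since deprecated pod-managed identity in favor of Microsoft Entra Workload ID, which is the better choice for new clusters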
- Network Policies: Enforce network segmentation between namespaces
- Azure Policy: Prevent privileged containers, enforce image sources
- Container Insights: Monitor for security anomalies
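The network segmentation item above can be sketched as a pair of NetworkPolicy manifests: deny all ingress by default, then allow traffic only from within the same namespace. The namespace `team-a` is hypothetical, and enforcement assumes a network policy engine (such as Azure or Calico) is enabled on the cluster.

```yaml
# Deny all ingress to every pod in the namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a              # hypothetical namespace
spec:
  podSelector: {}                # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress
---
# Allow ingress only from pods in the same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}        # any pod in this namespace, none from other namespaces
```

Applying the same pair to each team namespace gives a default of no cross-namespace traffic. Specific exceptions can then be added as separate policies.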
Reliability Patterns
# Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
---
# Resource Quota per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
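Both objects place requirements on the workloads they govern. A PDB with `minAvailable: 2` permits a voluntary eviction only while more than two matching pods are healthy, so the `api` workload needs at least three replicas for node drains to proceed. A quota on `requests.*` and `limits.*` causes pods in that namespace to be rejected if they omit those fields. Here is a sketch of a Deployment that satisfies both, with a placeholder image and illustrative resource values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3                    # one more than minAvailable, so a drain can evict one pod at a time
  selector:
    matchLabels:
      app: api                   # matches the PDB selector
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: example.azurecr.io/api:1.0   # placeholder image
          resources:
            requests:            # counted against requests.cpu / requests.memory
              cpu: 500m
              memory: 512Mi
            limits:              # counted against limits.cpu / limits.memory
              cpu: "1"
              memory: 1Gi
```

At these values, three replicas consume 1.5 CPU and 1.5Gi of the namespace's request quota, which leaves ample headroom under the 20 CPU and 40Gi caps.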
Key Takeaways
- Use Azure CNI, private clusters, and availability zones from the start
- Separate node pools for system components and workloads
- Integrate Azure AD, use pod managed identities, enable network policies
- Define PDBs and resource quotas for reliability