Azure Kubernetes Service: Production Hardening Guide

Running Kubernetes in production takes more than deploying workloads: it demands security hardening, sound network design, observability, and disaster recovery planning. Azure Kubernetes Service (AKS) manages the control plane, but under the shared responsibility model you still have to secure the data plane yourself. This guide covers hardening practices drawn from dozens of enterprise AKS deployments.

Network Architecture

flowchart TB
    Internet["Internet"] --> WAF["Azure WAF / Front Door"]
    WAF --> Ingress["Ingress Controller (Internal LB)"]
    
    subgraph VNET ["Hub-Spoke VNET"]
        subgraph AKS ["AKS Subnet"]
            Ingress --> Pods["Application Pods"]
        end
        
        subgraph Data ["Data Subnet"]
            SQL["Azure SQL (Private Endpoint)"]
            Redis["Redis Cache (Private Endpoint)"]
        end
        
        Pods --> SQL
        Pods --> Redis
    end
    
    style WAF fill:#FFCDD2,stroke:#C62828
    style Ingress fill:#E1F5FE,stroke:#0277BD
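
The diagram places the ingress controller behind an internal load balancer, so only the WAF tier can reach it. A minimal sketch of how that is usually expressed is a Service carrying the Azure internal load balancer annotation. The Service name, namespace, selector label, and ports below are illustrative assumptions, not values from this guide; adapt them to whichever ingress controller you run.

```yaml
# Sketch: expose an ingress controller through an internal Azure load balancer.
# Name, namespace, selector, and ports are hypothetical placeholders.
apiVersion: v1
kind: Service
metadata:
  name: ingress-controller
  namespace: ingress
  annotations:
    # Tells the Azure cloud provider to give the load balancer a private VNET IP
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: ingress-controller
  ports:
    - name: https
      port: 443
      targetPort: 8443
      protocol: TCP
```

With this in place the load balancer frontend gets a private address from the VNET rather than a public IP, so the WAF or Front Door tier has to reach it over private connectivity.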

Azure CNI vs Kubenet

For production, always use Azure CNI:

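  • Pods get VNET IPs, so they reach Private Endpoints and peered networks directly, without NAT through the node
  • Network Policies enforced by Azure (the 'azure' network policy option requires Azure CNI)
  • Lower latency than Kubenet, which adds an extra routing hop for pod traffic

// Fragment: name, location, identity, and agent pools are omitted.
resource aks 'Microsoft.ContainerService/managedClusters@2022-09-01' = {
  properties: {
    networkProfile: {
      networkPlugin: 'azure'
      networkPolicy: 'azure' // or 'calico'
      serviceCidr: '10.0.0.0/16' // must not overlap the VNET or any peered network
      dnsServiceIP: '10.0.0.10' // must fall inside serviceCidr
    }
  }
}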
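
Enabling a network policy engine on the cluster does not restrict any traffic by itself; policies still have to be defined per namespace. A common starting point, sketched here, is a default-deny ingress policy plus a narrow allow rule. The `production` namespace matches the one used later in this guide, while the `app: api` label, the `ingress` namespace, and port 8080 are assumptions made for illustration.

```yaml
# Deny all ingress to every pod in the namespace unless another policy allows it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow only pods from the ingress controller's namespace to reach the API pods.
# The namespace name, pod label, and port are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace by Kubernetes
              kubernetes.io/metadata.name: ingress
      ports:
        - protocol: TCP
          port: 8080
```

Network policies are additive, so the allow rule opens exactly one path through the default deny; egress can be locked down the same way by adding `Egress` to `policyTypes`.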

Security Hardening

Enable Microsoft Defender for Containers (formerly Azure Defender)

az security pricing create -n Containers --tier Standard

This enables:

  • Runtime threat protection
  • Vulnerability scanning of images in ACR
  • Kubernetes audit log analysis

Pod Security Standards

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
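
With `enforce: restricted` on the namespace, pods that do not declare the required security settings are rejected at admission. As a rough guide, a pod spec along these lines should satisfy the restricted profile. The pod name is hypothetical, and the image is assumed to run as a non-root user; if it does not, set an explicit `runAsUser`.

```yaml
# Sketch of a pod that meets the "restricted" Pod Security Standard.
apiVersion: v1
kind: Pod
metadata:
  name: api-restricted-example   # hypothetical name
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true           # kubelet refuses to start containers running as UID 0
    seccompProfile:
      type: RuntimeDefault       # restricted requires RuntimeDefault or Localhost
  containers:
    - name: api
      image: myacr.azurecr.io/api:v1
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]          # restricted requires dropping all capabilities
```

Setting `audit` and `warn` to the same level, as the namespace above does, also surfaces violations in audit logs and in client warnings.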

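Workload Identity (successor to AAD Pod Identity)

Eliminate service principal secrets by using Workload Identity. The cluster needs the OIDC issuer and the workload identity feature enabled, and the managed identity referenced below needs a federated identity credential that trusts this service account: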

apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-service-account
  annotations:
    azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
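spec:
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: api-service-account
      containers:
      - name: api
        image: myacr.azurecr.io/api:v1
        # No secrets needed: the workload identity webhook injects a federated token,
        # which the Azure Identity SDK exchanges for an Azure AD access token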

Observability Stack

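// Fragment: name, location, and other required properties are omitted.
// logAnalytics is a Log Analytics workspace resource declared elsewhere in the template.
resource aks 'Microsoft.ContainerService/managedClusters@2022-09-01' = {
  properties: {
    addonProfiles: {
      // Container Insights: ships container logs and metrics to Log Analytics
      omsagent: {
        enabled: true
        config: {
          logAnalyticsWorkspaceResourceID: logAnalytics.id
        }
      }
      // Key Vault provider for the Secrets Store CSI Driver (secrets, not telemetry)
      azureKeyvaultSecretsProvider: {
        enabled: true
      }
    }
    // Azure Monitor managed service for Prometheus (metrics collection)
    azureMonitorProfile: {
      metrics: {
        enabled: true
      }
    }
  }
}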

Disaster Recovery

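  • Multi-Region: Deploy AKS clusters in paired regions and route between them with Azure Traffic Manager
  • Backup: Use Velero with Azure Blob Storage to back up Kubernetes resources and persistent volumes
  • GitOps: With Flux or Argo CD, cluster configuration lives in Git, so a replacement cluster can be rebuilt from the same repository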
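
The backup and GitOps items above can be sketched as manifests. This is an illustration under stated assumptions rather than a reference setup: it presumes Velero is already installed with a backup storage location named `default` pointing at Azure Blob Storage, and that Flux is installed in `flux-system`. The schedule, retention, repository URL, and path are hypothetical, and the Flux API versions depend on the Flux release you run.

```yaml
# Velero: back up the production namespace every night and keep 30 days of backups.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-daily        # hypothetical name
  namespace: velero
spec:
  schedule: "0 2 * * *"         # cron syntax: 02:00 every day
  template:
    includedNamespaces:
      - production
    storageLocation: default    # assumed BackupStorageLocation backed by Azure Blob
    ttl: 720h0m0s               # retention: 30 days
---
# Flux: the Git repository that holds the cluster configuration.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: cluster-config
  namespace: flux-system
spec:
  interval: 5m
  url: https://example.com/org/cluster-config.git   # placeholder URL
  ref:
    branch: main
---
# Flux: continuously apply one directory of that repository to the cluster.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: production
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: cluster-config
  path: ./clusters/production   # hypothetical path inside the repository
  prune: true                   # delete resources that were removed from Git
```

The two mechanisms complement each other: Git restores the declared configuration of a rebuilt cluster, while Velero restores state that never lived in Git, such as persistent volume data.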

Key Takeaways

  • Use Azure CNI for production (not Kubenet)
  • Enable Microsoft Defender for Containers (formerly Azure Defender) for runtime protection
  • Use Workload Identity for zero-secret deployments
  • Pod Security Standards, enforced by Pod Security Admission, replace PodSecurityPolicies (removed in Kubernetes 1.25)
  • Container Insights + Managed Prometheus for observability
  • GitOps enables declarative cluster management and DR
