Most tutorials explain Pods, Services, and Deployments. But running Kubernetes (K8s) in production requires understanding the control plane, the network model, and the reconciliation loop. In this article, we dissect the architecture of a high-availability K8s cluster and explore advanced scheduling patterns.
The Control Plane vs Data Plane
A K8s cluster consists of the Control Plane (Masters) and the Data Plane (Nodes). Understanding this separation is crucial for debugging.
```mermaid
graph TB
    subgraph "Control Plane (Master)"
        API[API Server]
        ETCD[(etcd)]
        SCH[Scheduler]
        CM[Controller Manager]
    end
    subgraph "Worker Node 1"
        Kubelet
        Proxy[Kube-Proxy]
        Pod1[Pod A]
        Pod2[Pod B]
    end
    subgraph "Worker Node 2"
        Kubelet2[Kubelet]
        Proxy2[Kube-Proxy]
        Pod3[Pod C]
    end
    API --> ETCD
    SCH --> API
    CM --> API
    API --> Kubelet
    API --> Kubelet2
    style API fill:#E1F5FE,stroke:#0277BD
    style ETCD fill:#FFF3E0,stroke:#EF6C00
```
- API Server: The only component that talks to etcd. Every other component reads and writes cluster state through the API Server. It is the gatekeeper.
- etcd: The brain. A distributed key-value store holding the entire cluster state. If you lose etcd data, you lose the cluster. Back up etcd (see the sketch after this list).
- Kubelet: The agent on each node. It reports “I am Node 1, I have 4GB RAM”, pulls images, and runs containers.
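To make the “back up etcd” advice concrete, here is a minimal sketch of a nightly snapshot CronJob. It assumes a kubeadm-style cluster where etcd listens on https://127.0.0.1:2379, its client certificates live under /etc/kubernetes/pki/etcd, and control-plane nodes carry the node-role.kubernetes.io/control-plane label; the image tag, schedule, and backup path are placeholders to adapt to your own cluster.

```yaml
# Sketch only: nightly etcd snapshot via a CronJob (kubeadm-style assumptions).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 2 * * *"                        # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true                    # reach etcd on the node's loopback
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists
              effect: NoSchedule
          restartPolicy: OnFailure
          containers:
            - name: snapshot
              image: registry.k8s.io/etcd:3.5.12-0   # placeholder; use an image matching your etcd version
              command:
                - etcdctl
                - --endpoints=https://127.0.0.1:2379
                - --cacert=/etc/kubernetes/pki/etcd/ca.crt
                - --cert=/etc/kubernetes/pki/etcd/server.crt
                - --key=/etc/kubernetes/pki/etcd/server.key
                - snapshot
                - save
                - /backup/etcd-snapshot.db     # overwritten each run in this sketch
              volumeMounts:
                - name: etcd-certs
                  mountPath: /etc/kubernetes/pki/etcd
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: etcd-certs
              hostPath:
                path: /etc/kubernetes/pki/etcd
            - name: backup
              hostPath:
                path: /var/backups/etcd
                type: DirectoryOrCreate
```

Ship the snapshot off the node as well; a backup that lives only on the control-plane disk disappears with it.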
Understanding Requests vs Limits
This is the #1 cause of outages I see.
- Requests are for Scheduling. “I need 500m CPU.” The scheduler finds a node with 500m free.
- Limits are for Throttling/Killing. “I cannot exceed 1000m.” If you try, the Linux kernel throttles you (CPU) or OOM-kills you (memory).
```yaml
resources:
  requests:
    memory: "64Mi"    # what the scheduler reserves for this container
    cpu: "250m"
  limits:
    memory: "128Mi"   # exceed this and the container is OOM-killed
    cpu: "500m"       # exceed this and the container is throttled
```
Best Practice: Always set the memory request equal to the memory limit. This prevents the eviction chaos that follows when a node comes under memory pressure. CPU can be burstable, but memory should not be. (For the Guaranteed QoS class, CPU requests and limits must match as well.)
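A minimal sketch of that pattern (the pod name, container name, and image are placeholders):

```yaml
# Illustrative only: memory request == limit, CPU allowed to burst.
apiVersion: v1
kind: Pod
metadata:
  name: api                        # placeholder name
spec:
  containers:
    - name: api
      image: example.com/api:1.0   # placeholder image
      resources:
        requests:
          memory: "256Mi"          # equal to the limit below
          cpu: "250m"              # what the scheduler reserves
        limits:
          memory: "256Mi"          # hard ceiling: exceed it and the container is OOM-killed
          cpu: "1"                 # room to burst before CPU throttling kicks in
```

With memory pinned to its limit, a leaking container is OOM-killed and restarted in place rather than dragging the whole node into memory pressure and eviction.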