A running Kubernetes cluster contains node agents (kubelet) and master components (APIs, scheduler, etc.), on top of a distributed storage solution. This diagram shows our desired eventual state, though we're still working on a few things, like making kubelet itself (all our components, really) run within containers, and making the scheduler 100% pluggable.
When looking at the architecture of the system, we'll break it down into the services that run on the worker node and the services that compose the cluster-level control plane.
The Kubernetes node has the services necessary to run application containers and be managed from the master systems.
Each node runs Docker, of course. Docker takes care of the details of downloading images and running containers.
The kubelet manages pods and their containers, their images, their volumes, etc.
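To make the kubelet's job concrete, here is a minimal, purely illustrative Go sketch of a sync step: compare the pods assigned to a node with what the container runtime is actually running, and start or stop containers to converge. Every type and helper here (Pod, fakeRuntime, syncOnce) is a made-up stand-in, not the real kubelet code.

```go
package main

import "fmt"

// Pod is a simplified stand-in for the real pod object.
type Pod struct {
	Name, Image string
}

// fakeRuntime stands in for the container runtime (Docker, per this document).
type fakeRuntime struct {
	running map[string]bool // pod name -> currently running?
}

func (r *fakeRuntime) start(p Pod) {
	r.running[p.Name] = true
	fmt.Println("start", p.Name, "image", p.Image)
}

func (r *fakeRuntime) stop(name string) {
	delete(r.running, name)
	fmt.Println("stop", name)
}

// syncOnce converges actual state toward desired state; this reconcile step
// is the essence of what the kubelet does for the pods bound to its node.
func syncOnce(desired []Pod, rt *fakeRuntime) {
	want := map[string]bool{}
	for _, p := range desired {
		want[p.Name] = true
		if !rt.running[p.Name] {
			rt.start(p)
		}
	}
	for name := range rt.running {
		if !want[name] {
			rt.stop(name)
		}
	}
}

func main() {
	rt := &fakeRuntime{running: map[string]bool{"stale": true}}
	syncOnce([]Pod{{Name: "web", Image: "nginx"}}, rt)
}
```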
Each node also runs a simple network proxy and load balancer (see the services FAQ for more details). This proxy reflects the services defined in the Kubernetes API (see the services doc) on each node and can do simple TCP and UDP stream forwarding (round robin) across a set of backends.
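As an illustration of the forwarding behavior just described, the sketch below implements a tiny round-robin TCP proxy in Go using only the standard library. The listen port and backend addresses are hard-coded examples; the real proxy learns its backends from the Kubernetes API.

```go
package main

import (
	"io"
	"log"
	"net"
)

// Backends for one service. Addresses are made-up examples; the real proxy
// discovers them from the Kubernetes API rather than a static list.
var backends = []string{"10.0.0.1:9376", "10.0.0.2:9376"}

func main() {
	ln, err := net.Listen("tcp", ":8080") // the service's proxy port
	if err != nil {
		log.Fatal(err)
	}
	next := 0
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Print(err)
			continue
		}
		backend := backends[next%len(backends)] // round robin across backends
		next++
		go forward(conn, backend)
	}
}

// forward copies bytes in both directions between the client and one backend.
func forward(client net.Conn, addr string) {
	defer client.Close()
	server, err := net.Dial("tcp", addr)
	if err != nil {
		log.Print(err)
		return
	}
	defer server.Close()
	go io.Copy(server, client)
	io.Copy(client, server)
}
```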
Service endpoints are currently found via DNS or through environment variables (both Docker-links-compatible and Kubernetes {FOO}_SERVICE_HOST and {FOO}_SERVICE_PORT variables are supported). These variables resolve to ports managed by the service proxy.
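For example, a container that wants to reach a service named redis-master (a hypothetical name) could read the injected variables and dial the proxy. A minimal Go sketch:

```go
package main

import (
	"fmt"
	"net"
	"os"
)

func main() {
	// For a service named "redis-master", Kubernetes injects variables like
	// REDIS_MASTER_SERVICE_HOST and REDIS_MASTER_SERVICE_PORT into each
	// container's environment.
	host := os.Getenv("REDIS_MASTER_SERVICE_HOST")
	port := os.Getenv("REDIS_MASTER_SERVICE_PORT")
	conn, err := net.Dial("tcp", net.JoinHostPort(host, port))
	if err != nil {
		fmt.Println("connect failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected to service via proxy at", conn.RemoteAddr())
}
```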
The Kubernetes control plane is split into a set of components. Currently they all run on a single master node, but that is expected to change soon in order to support high-availability clusters. These components work together to provide a unified view of the cluster.
All persistent master state is stored in an instance of etcd. This provides a great way to store configuration data reliably. With watch support, coordinating components can be notified very quickly of changes.
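As a sketch of that watch pattern, the following Go program uses etcd's v3 client (a newer client than the one of this document's era) to stream changes under a key prefix without polling. The /registry/ prefix is illustrative.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Watch every key under the (illustrative) /registry/ prefix; an event
	// is delivered as soon as a key changes, so coordinating components see
	// updates quickly instead of polling.
	for resp := range cli.Watch(context.Background(), "/registry/", clientv3.WithPrefix()) {
		for _, ev := range resp.Events {
			fmt.Printf("%s %s -> %s\n", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```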
The apiserver serves up the Kubernetes API. It is intended to be a
CRUD-y server, with most/all business logic implemented in separate components
or in plug-ins. It mainly processes REST operations, validates them, and updates
the corresponding objects in etcd (and eventually other stores).
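To show what a single REST operation against the apiserver looks like, here is a hedged Go example that lists pods over plain HTTP. It assumes a v1-style API path and an unauthenticated local endpoint such as the one kubectl proxy provides; both are assumptions to adjust for a real cluster.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// A single REST read against the apiserver. The path assumes the v1 API
	// and a local unauthenticated endpoint (e.g. via `kubectl proxy`).
	resp, err := http.Get("http://localhost:8001/api/v1/namespaces/default/pods")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(body))
}
```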
The scheduler binds unscheduled pods to nodes via the /binding API. The scheduler is pluggable, and we expect to support multiple cluster schedulers and even user-provided schedulers in the future.
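A hedged sketch of what "binding" means at the API level: POSTing a Binding object to a pod's binding subresource assigns that pod to a node. The pod name, node name, and local proxy endpoint below are made-up examples.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// A Binding object assigns an unscheduled pod to a node. Pod name
	// ("web-1") and node name ("node-2") are hypothetical; the endpoint
	// assumes the v1 API reachable via `kubectl proxy`.
	binding := []byte(`{
	  "apiVersion": "v1",
	  "kind": "Binding",
	  "metadata": {"name": "web-1"},
	  "target": {"apiVersion": "v1", "kind": "Node", "name": "node-2"}
	}`)
	url := "http://localhost:8001/api/v1/namespaces/default/pods/web-1/binding"
	resp, err := http.Post(url, "application/json", bytes.NewReader(binding))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```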
All other cluster-level functions are currently performed by the Controller
Manager. For instance, Endpoints
objects are created and updated by the
endpoints controller, and nodes are discovered, managed, and monitored by the
node controller. These could eventually be split into separate components to
make them independently pluggable.
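As an illustration of the kind of work these controllers do, the sketch below reimplements the heart of the endpoints controller: select the pods whose labels match a service's selector and collect their IPs. The Pod and Service types here are simplified stand-ins for the real API objects.

```go
package main

import "fmt"

// Simplified stand-ins; the real objects live in the Kubernetes API.
type Pod struct {
	Name   string
	Labels map[string]string
	IP     string
}

type Service struct {
	Name     string
	Selector map[string]string
}

// endpointsFor does the core work of the endpoints controller: pick the pods
// whose labels match the service's selector and collect their IPs.
func endpointsFor(svc Service, pods []Pod) []string {
	var ips []string
	for _, p := range pods {
		match := true
		for k, v := range svc.Selector {
			if p.Labels[k] != v {
				match = false
				break
			}
		}
		if match {
			ips = append(ips, p.IP)
		}
	}
	return ips
}

func main() {
	pods := []Pod{
		{Name: "web-1", Labels: map[string]string{"app": "web"}, IP: "10.1.0.4"},
		{Name: "db-1", Labels: map[string]string{"app": "db"}, IP: "10.1.0.5"},
	}
	svc := Service{Name: "web", Selector: map[string]string{"app": "web"}}
	fmt.Println(svc.Name, "endpoints:", endpointsFor(svc, pods))
}
```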
The replicationcontroller is a mechanism that is layered on top of the simple pod API. We eventually plan to port it to a generic plug-in mechanism, once one is implemented.
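The replication controller's core decision fits in a few lines: diff the desired replica count against the pods that exist, then create or delete the difference through the pod API. A hedged Go sketch with made-up pod names:

```go
package main

import "fmt"

// reconcile shows the decision a replication controller makes: compare the
// desired replica count with the pods that currently exist and create or
// delete the difference. Purely illustrative; the real controller acts
// through the pod API rather than printing.
func reconcile(desired int, existing []string) {
	switch diff := desired - len(existing); {
	case diff > 0:
		fmt.Printf("create %d pod(s)\n", diff)
	case diff < 0:
		fmt.Printf("delete %d pod(s): %v\n", -diff, existing[desired:])
	default:
		fmt.Println("nothing to do")
	}
}

func main() {
	reconcile(3, []string{"web-1", "web-2"})          // create 1 pod(s)
	reconcile(1, []string{"web-1", "web-2", "web-3"}) // delete 2 pod(s)
}
```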