Understanding the Kubelet Core Execution Frame
Kubelet is the node agent in a Kubernetes cluster and is responsible for managing the Pod lifecycle on the local node. Kubelet first obtains the Pod configurations assigned to the local node, and then invokes the underlying container runtime, such as Docker or PouchContainer, to create Pods based on those configurations. Kubelet then monitors the Pods, ensuring that all Pods on the node run in the expected state. This article analyzes this process using the Kubelet source code.
Obtaining Pod Configurations
Kubelet can obtain the Pod configurations required by the local node in multiple ways. The most important source is Apiserver. Kubelet can also obtain Pod configurations from a specified file directory or by accessing a specified HTTP port. Kubelet periodically checks the directory or HTTP port for Pod configuration updates and adjusts the running state of the Pods on the local node accordingly.
During the initialization of Kubelet, a PodConfig object is created, as shown below:
// kubernetes/pkg/kubelet/config/config.go
type PodConfig struct {
    pods *podStorage
    mux  *config.Mux
    // the channel of denormalized changes passed to listeners
    updates chan kubetypes.PodUpdate
    ...
}
PodConfig is essentially a multiplexer of Pod configurations. The built-in mux listens on the various sources of Pod configurations (including apiserver, file, and http) and periodically synchronizes the Pod configuration state from each source. The pods field caches the Pod configuration state of each source as of the last synchronization. By comparing the new state with the cached state, mux determines which Pods have changed configurations. Then, mux classifies these Pods by change type and creates a PodUpdate structure for each type:
// kubernetes/pkg/kubelet/types/pod_update.go
type PodUpdate struct {
    Pods   []*v1.Pod
    Op     PodOperation
    Source string
}
The Op field defines the Pod change type. For example, its value can be ADD or REMOVE, indicating that the Pods listed in Pods are to be added or deleted. Finally, every PodUpdate is sent to the updates channel of PodConfig. Therefore, we only need to listen on the updates channel to obtain the Pod configuration updates for the local node.
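The comparison step can be illustrated with a minimal Go sketch. This is not the Kubelet implementation: the types pod, update, and op and the function diff are hypothetical stand-ins, and a Pod is reduced to a name plus a spec string. The sketch compares the cached snapshot of one source with a newly observed snapshot, groups the changed Pods by change type, and pushes the resulting updates into a single channel.

```go
package main

import (
	"fmt"
	"sort"
)

// op mirrors the idea of PodOperation; the names here are illustrative only.
type op string

const (
	opAdd    op = "ADD"
	opUpdate op = "UPDATE"
	opRemove op = "REMOVE"
)

// pod is a hypothetical stand-in for *v1.Pod: a name plus a spec revision.
type pod struct {
	Name string
	Spec string
}

// update is a simplified analogue of PodUpdate.
type update struct {
	Pods   []pod
	Op     op
	Source string
}

// diff compares the cached snapshot of one source with the observed one and
// returns at most one update per change type. It also refreshes the cache.
func diff(cache map[string]pod, observed []pod, source string) []update {
	byOp := map[op][]pod{}
	seen := map[string]bool{}
	for _, p := range observed {
		seen[p.Name] = true
		old, ok := cache[p.Name]
		switch {
		case !ok:
			byOp[opAdd] = append(byOp[opAdd], p)
		case old.Spec != p.Spec:
			byOp[opUpdate] = append(byOp[opUpdate], p)
		}
		cache[p.Name] = p
	}
	// Anything cached but no longer observed has been removed at the source.
	for name, p := range cache {
		if !seen[name] {
			byOp[opRemove] = append(byOp[opRemove], p)
			delete(cache, name)
		}
	}
	var out []update
	for _, o := range []op{opAdd, opUpdate, opRemove} {
		if pods := byOp[o]; len(pods) > 0 {
			sort.Slice(pods, func(i, j int) bool { return pods[i].Name < pods[j].Name })
			out = append(out, update{Pods: pods, Op: o, Source: source})
		}
	}
	return out
}

func main() {
	cache := map[string]pod{"a": {"a", "v1"}, "b": {"b", "v1"}}
	observed := []pod{{"a", "v2"}, {"c", "v1"}}
	updates := make(chan update, 3) // the single channel that listeners read
	for _, u := range diff(cache, observed, "file") {
		updates <- u
	}
	close(updates)
	for u := range updates {
		fmt.Println(u.Op, u.Source, u.Pods)
	}
}
```

In this example, Pod c is reported as ADD, Pod a as UPDATE, and Pod b as REMOVE, and a consumer only has to read from one channel regardless of how many sources exist.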
Pod Synchronization
After the Kubelet initialization is complete, the syncLoop function shown below is invoked:
// kubernetes/pkg/kubelet/kubelet.go
// syncLoop is the main loop for processing changes. It watches for changes from
// three channels (file, apiserver, and http) and creates a union of them. For
// any new change seen, will run a sync against desired state and running state. If
// no changes are seen to the configuration, will synchronize the last known desired
// state every sync-frequency seconds. Never returns.
func (kl *Kubelet) syncLoop(updates <-chan kubetypes.PodUpdate, handler SyncHandler) {
    ...
    for {
        if !kl.syncLoopIteration(...) {
            break
        }
    }
    ...
}
As indicated in the comments, the syncLoop function is the main loop of Kubelet. This function listens on updates, obtains the latest Pod configurations, and synchronizes the running state with the desired state. In this way, all Pods on the local node run in their expected states. In fact, syncLoop is only a wrapper around syncLoopIteration, which carries out the actual synchronization.
// kubernetes/pkg/kubelet/kubelet.go
func (kl *Kubelet) syncLoopIteration(configCh <-chan kubetypes.PodUpdate ......) bool {
    select {
    case u, open := <-configCh:
        switch u.Op {
        case kubetypes.ADD:
            handler.HandlePodAdditions(u.Pods)
        case kubetypes.UPDATE:
            handler.HandlePodUpdates(u.Pods)
        ...
        }
    case e := <-plegCh:
        ...
        handler.HandlePodSyncs([]*v1.Pod{pod})
        ...
    case <-syncCh:
        podsToSync := kl.getPodsToSync()
        if len(podsToSync) == 0 {
            break
        }
        handler.HandlePodSyncs(podsToSync)
    case update := <-kl.livenessManager.Updates():
        if update.Result == proberesults.Failure {
            ...
            handler.HandlePodSyncs([]*v1.Pod{pod})
        }
    case <-housekeepingCh:
        ...
        handler.HandlePodCleanups()
        ...
    }
}
The syncLoopIteration function has simple processing logic. It listens on multiple channels. Once it receives an event from a channel, it invokes the corresponding function to process the event. The events are processed as follows:
- Pod configuration changes are received from configCh, and the corresponding function is invoked based on the change type. For example, if new Pods are bound to the local node, the HandlePodAdditions function is invoked to create these Pods. If some Pod configurations are changed, the HandlePodUpdates function is invoked to update the Pods.
- If the status of a container in a Pod changes (for example, a new container is created and started), a PodLifecycleEvent is sent to the plegCh channel. The event includes the event type (such as ContainerStarted), the container ID, and the ID of the Pod to which the container belongs. syncLoopIteration then invokes HandlePodSyncs to synchronize the Pod.
- syncCh is in fact a timer. By default, Kubelet triggers this timer every second to synchronize the Pod configurations on the local node.
- During initialization, Kubelet creates a livenessManager to check the health of the configured Pods. If Kubelet detects that a Pod has failed its liveness check, it invokes HandlePodSyncs to synchronize the Pod. This part will be further described later.
- housekeepingCh is also a timer. By default, Kubelet triggers this timer every two seconds and invokes the HandlePodCleanups function for processing. This is a periodic cleanup mechanism in which the resources of stopped Pods are reclaimed at a fixed interval.
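The dispatch pattern described in this list can be sketched in a few lines of Go. This is a simplified illustration under stated assumptions, not the Kubelet code: event, handler, recorder, and iterate are hypothetical names, only three of the five channels are modeled, and the timer periods are shortened so that the demo finishes quickly.

```go
package main

import (
	"fmt"
	"time"
)

// event is a hypothetical stand-in for kubetypes.PodUpdate.
type event struct {
	Op   string
	Pods []string
}

// handler is a reduced, illustrative analogue of SyncHandler.
type handler interface {
	Add(pods []string)
	Sync()
	Cleanup()
}

// iterate waits for exactly one event from any channel and dispatches it.
// It returns false once the config channel has been closed, which tells the
// caller to leave the loop.
func iterate(configCh <-chan event, syncCh, housekeepingCh <-chan time.Time, h handler) bool {
	select {
	case u, open := <-configCh:
		if !open {
			return false
		}
		if u.Op == "ADD" {
			h.Add(u.Pods)
		}
	case <-syncCh:
		h.Sync()
	case <-housekeepingCh:
		h.Cleanup()
	}
	return true
}

// recorder logs which handler method ran.
type recorder struct{ calls []string }

func (r *recorder) Add(pods []string) { r.calls = append(r.calls, fmt.Sprintf("add%v", pods)) }
func (r *recorder) Sync()             { r.calls = append(r.calls, "sync") }
func (r *recorder) Cleanup()          { r.calls = append(r.calls, "cleanup") }

func main() {
	configCh := make(chan event, 1)
	// The article cites 1s and 2s periods; shorter ones keep this demo quick.
	syncTicker := time.NewTicker(10 * time.Millisecond)
	defer syncTicker.Stop()
	housekeepingTicker := time.NewTicker(20 * time.Millisecond)
	defer housekeepingTicker.Stop()

	r := &recorder{}
	configCh <- event{Op: "ADD", Pods: []string{"nginx"}}
	iterate(configCh, syncTicker.C, housekeepingTicker.C, r) // normally handles the ADD
	iterate(configCh, syncTicker.C, housekeepingTicker.C, r) // waits for a timer
	close(configCh)
	// A closed config channel is always ready, so the loop ends soon.
	for iterate(configCh, syncTicker.C, housekeepingTicker.C, r) {
	}
	fmt.Println(r.calls)
}
```

When several channels are ready at once, Go's select picks one of them at random, so the exact order of the recorded calls can vary between runs.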
As shown in the above figure, the execution paths of most processing functions are similar. The functions HandlePodAdditions, HandlePodUpdates, and HandlePodSyncs all invoke the dispatchWork function after completing their own operations. If dispatchWork detects that the Pod to be synchronized is not in the Terminated state, it invokes the Update method of podWorkers to update the Pod. We can regard Pod creation, update, and synchronization alike as a transition from the running state to the desired state, which makes the update and synchronization processes easier to understand. For Pod creation, we can consider the current state of the new Pod to be empty, so creation is also a state transition. Therefore, whether a Pod is being created, updated, or synchronized, invoking the Update function is all that is needed to move it to the target state.
podWorkers is created during Kubelet initialization, as shown below:
// kubernetes/pkg/kubelet/pod_workers.go
type podWorkers struct {
    ...
    podUpdates                map[types.UID]chan UpdatePodOptions
    isWorking                 map[types.UID]bool
    lastUndeliveredWorkUpdate map[types.UID]UpdatePodOptions
    workQueue                 queue.WorkQueue
    syncPodFn                 syncPodFnType
    podCache                  kubecontainer.Cache
    ...
}
Kubelet configures a dedicated pod worker for each created Pod. The pod worker is in fact a goroutine. It creates a channel with a buffer size of 1 and element type UpdatePodOptions (which represents a Pod update event), listens on the channel for Pod update events, and invokes the synchronization function specified in the syncPodFn field of podWorkers to perform synchronization.
In addition, the pod worker registers the channel in the podUpdates map of podWorkers so that a given update event can be sent to the corresponding pod worker for processing.
What happens if another update event arrives while the current event is being processed? podWorkers caches the latest event in lastUndeliveredWorkUpdate and processes it immediately after the processing of the current event is complete.
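The per-Pod worker pattern can be sketched as follows. This is a simplified model rather than the real podWorkers: the workers type, its fields, and the demo function are hypothetical, updates are plain strings instead of UpdatePodOptions, and a WaitGroup is added only so that the demo can wait for the work to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// workers is a simplified, hypothetical analogue of podWorkers.
type workers struct {
	mu              sync.Mutex
	updates         map[string]chan string // one channel per pod UID
	isWorking       map[string]bool
	lastUndelivered map[string]string
	syncFn          func(uid, update string)
	wg              sync.WaitGroup // demo only: tracks in-flight updates
}

func newWorkers(syncFn func(uid, update string)) *workers {
	return &workers{
		updates:         map[string]chan string{},
		isWorking:       map[string]bool{},
		lastUndelivered: map[string]string{},
		syncFn:          syncFn,
	}
}

// Update hands an update to the pod's worker, starting the worker on first
// use. If the worker is busy, only the most recent update is remembered.
func (w *workers) Update(uid, update string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	ch, ok := w.updates[uid]
	if !ok {
		ch = make(chan string, 1) // buffer of 1, as described above
		w.updates[uid] = ch
		go w.loop(uid, ch)
	}
	if !w.isWorking[uid] {
		w.isWorking[uid] = true
		w.wg.Add(1)
		ch <- update
	} else {
		w.lastUndelivered[uid] = update // overwrites any older pending update
	}
}

// loop is the per-pod goroutine: sync, then pick up a pending update if any.
func (w *workers) loop(uid string, ch chan string) {
	for update := range ch {
		w.syncFn(uid, update)
		w.mu.Lock()
		if pending, ok := w.lastUndelivered[uid]; ok {
			delete(w.lastUndelivered, uid)
			w.wg.Add(1)
			ch <- pending // the worker stays busy
		} else {
			w.isWorking[uid] = false
		}
		w.mu.Unlock()
		w.wg.Done()
	}
}

// demo sends three updates while the worker is held busy on the first one.
func demo() []string {
	started, release := make(chan struct{}), make(chan struct{})
	var handled []string
	w := newWorkers(func(uid, update string) {
		if update == "v1" {
			close(started)
			<-release // keep the worker busy while more updates arrive
		}
		handled = append(handled, update)
	})
	w.Update("pod-1", "v1")
	<-started
	w.Update("pod-1", "v2") // cached, then overwritten by v3
	w.Update("pod-1", "v3")
	close(release)
	w.wg.Wait()
	return handled
}

func main() {
	fmt.Println(demo()) // prints [v1 v3]
}
```

Because only the latest undelivered update is kept, v2 is never synchronized. That is acceptable in this model because each update carries the full desired state, so the newest one supersedes the older ones.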
Every time the pod worker finishes processing an update event, it adds the processed Pod to the workQueue of podWorkers with an additional delay. The Pod can be retrieved from the queue only after the delay expires, at which point the next synchronization is performed. As previously mentioned, syncCh is triggered every second to collect the Pods to be synchronized on the local node, and then HandlePodSyncs is invoked to perform synchronization. These are the Pods whose delays have expired at that point in time, and they are obtained from workQueue. The entire Pod synchronization process thus forms a closed loop, as shown in the figure below.
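The delayed requeue step can also be expressed as a small Go sketch. The delayQueue type here is a hypothetical, single-goroutine model and not the Kubelet work queue itself. As an assumption made for determinism, the current time is passed in explicitly instead of being read from the system clock.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// delayQueue is a hypothetical model of the work queue: it maps a pod UID to
// the earliest time at which the pod should be synchronized again.
type delayQueue struct {
	readyAt map[string]time.Time
}

func newDelayQueue() *delayQueue {
	return &delayQueue{readyAt: map[string]time.Time{}}
}

// Enqueue schedules uid for resynchronization after delay, measured from now.
func (q *delayQueue) Enqueue(uid string, now time.Time, delay time.Duration) {
	q.readyAt[uid] = now.Add(delay)
}

// GetWork removes and returns every UID whose delay has expired at time now.
func (q *delayQueue) GetWork(now time.Time) []string {
	var due []string
	for uid, t := range q.readyAt {
		if !now.Before(t) {
			due = append(due, uid)
			delete(q.readyAt, uid)
		}
	}
	sort.Strings(due)
	return due
}

// demo enqueues two pods and polls the queue at two simulated timer ticks.
func demo() (first, second []string) {
	start := time.Now()
	q := newDelayQueue()
	q.Enqueue("pod-a", start, 1*time.Second)
	q.Enqueue("pod-b", start, 10*time.Second)
	first = q.GetWork(start.Add(2 * time.Second))
	second = q.GetWork(start.Add(11 * time.Second))
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println(first)  // prints [pod-a]
	fmt.Println(second) // prints [pod-b]
}
```

Each timer tick therefore picks up only the Pods that are due, and a Pod returns to the queue after its next synchronization.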
When creating the podWorkers object, Kubelet uses its own syncPod method to initialize syncPodFn. However, this method only prepares for the synchronization. For example, it uploads the latest Pod status to Apiserver, creates the dedicated directories for the Pod, and obtains the pull secrets of the Pod. Then, Kubelet invokes the SyncPod method of its containerRuntime to perform the synchronization. containerRuntime abstracts the underlying container runtime of Kubelet and defines the interfaces that a container runtime must provide. SyncPod is one of these interfaces.
Kubelet itself does not perform any container operations. Pod synchronization is essentially a change of container states, and changing container states requires invoking the underlying container runtime, such as PouchContainer.
The following describes the SyncPod method of containerRuntime to show the actual synchronization operations:
// kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go
func (m *kubeGenericRuntimeManager) SyncPod(pod *v1.Pod, _ v1.PodStatus, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, backOff *flowcontrol.Backoff) (result kubecontainer.PodSyncResult)
This function first invokes computePodActions(pod, podStatus) to compare the current Pod status podStatus with the target Pod state pod, and calculates the required synchronization operations. When the calculation is complete, a podActions object is returned, as shown below:
// kubernetes/pkg/kubelet/kuberuntime/kuberuntime_manager.go
type podActions struct {
    KillPod                  bool
    CreateSandbox            bool
    SandboxID                string
    Attempt                  uint32
    ContainersToKill         map[kubecontainer.ContainerID]containerToKillInfo
    NextInitContainerToStart *v1.Container
    ContainersToStart        []int
}
In effect, podActions is an operation list:
- Generally, the values of KillPod and CreateSandbox are the same, indicating whether to kill the current Pod sandbox (when a new Pod is created, there is nothing to kill) and create a new sandbox.
- SandboxID identifies the Pod creation operation. If its value is empty, the Pod is being created for the first time. If its value is not empty, a new sandbox is being created after the old one was killed.
- Attempt indicates the number of times the Pod has recreated its sandbox. When the Pod is created for the first time, this value is 0. It serves a similar purpose to SandboxID.
- ContainersToKill specifies the containers in the Pod to be killed because their configurations have changed or their health checks have failed.
- If the init containers of the Pod have not finished running, or one of them has failed, NextInitContainerToStart indicates the next init container to be created. That init container is created and started, and this round of synchronization is complete.
- If the Pod sandbox has been created and the init containers have finished running, the ordinary containers in the Pod that are not yet running are started according to ContainersToStart.
With such an operation list, the remaining work of SyncPod is simple: it invokes the corresponding interfaces of the underlying container runtime one by one to create and delete containers, which completes the synchronization.
To summarize the Pod synchronization procedure: a Pod synchronization is triggered when the target state of a Pod changes or when a synchronization interval expires. Synchronization compares the target state of the containers with their current state, generates a list of containers to start and stop, and invokes the underlying container runtime interfaces based on that list to start or stop the containers.
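The comparison at the heart of this procedure can be illustrated with a reduced Go sketch. It is not computePodActions itself: the types containerState, desired, and actions and the function computeActions are hypothetical, a spec change is modeled as a differing hash string, and sandboxes, init containers, and restart policies are left out.

```go
package main

import "fmt"

// containerState is a hypothetical summary of one container's observed status.
type containerState struct {
	Running bool
	Hash    string // fingerprint of the spec the container was started from
	Healthy bool
}

// desired describes one container in the target Pod spec.
type desired struct {
	Name string
	Hash string
}

// actions is a reduced, illustrative analogue of podActions.
type actions struct {
	ContainersToKill  []string
	ContainersToStart []int // indexes into the desired container list
}

// computeActions compares the desired containers with the observed ones and
// returns which containers to kill and which to start or restart.
func computeActions(want []desired, have map[string]containerState) actions {
	var a actions
	for i, c := range want {
		st, ok := have[c.Name]
		switch {
		case !ok || !st.Running:
			// Missing or stopped: it only needs to be started.
			a.ContainersToStart = append(a.ContainersToStart, i)
		case st.Hash != c.Hash || !st.Healthy:
			// Spec changed or health check failed: replace the container.
			a.ContainersToKill = append(a.ContainersToKill, c.Name)
			a.ContainersToStart = append(a.ContainersToStart, i)
		}
	}
	return a
}

func main() {
	want := []desired{{"app", "h2"}, {"sidecar", "h1"}, {"logger", "h1"}}
	have := map[string]containerState{
		"app":     {Running: true, Hash: "h1", Healthy: true}, // spec changed
		"sidecar": {Running: true, Hash: "h1", Healthy: true}, // up to date
	}
	fmt.Printf("%+v\n", computeActions(want, have))
	// prints {ContainersToKill:[app] ContainersToStart:[0 2]}
}
```

The up-to-date sidecar container appears in neither list, which matches the point that containers already in the target state are left untouched.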
Conclusion
If a container is a process, then Kubelet is a container-oriented process monitor. The job of Kubelet is to continuously drive the running state of the Pods on the local node toward the target state. During the transition, unwanted containers are deleted and new containers are created and configured. An existing container is never repeatedly modified, started, or stopped. This is the essence of Kubelet's core processing logic.
Notes
- The source code in this article is from Kubernetes v1.9.4, commit: bee2d1505c4fe820744d26d41ecd3fdd4a3d6546
- For detailed comments about Kubernetes source code, visit my GitHub page.
- Reference: What even is a kubelet?