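Kubernetes (k8s) CNI Analysis: The Container Network Interface

Related posts: Kubernetes (k8s) CSI Analysis: The Container Storage Interface

Kubernetes (k8s) CRI Analysis: The Container Runtime Interface

Overview

Kubernetes was designed around a pluggable architecture so that its functionality is easy to extend. Following this idea, Kubernetes defines three purpose-specific interfaces: the Container Network Interface (CNI), the Container Runtime Interface (CRI), and the Container Storage Interface (CSI). Kubernetes calls these interfaces to carry out the corresponding functions.

This post introduces and analyzes the Container Network Interface, CNI.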

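What is CNI

CNI stands for Container Network Interface.

CNI is the standard interface through which Kubernetes invokes a network implementation. The kubelet uses this interface to call different network plugins, each of which configures networking in its own way.

A CNI network plugin is an executable file that conforms to the Container Network Interface (CNI) specification. Common CNI network plugins include Calico, flannel, Terway, and Weave Net.

When the kubelet is configured to use a CNI-type network plugin (through its startup flags), it calls the CNI plugin whenever it creates or deletes a pod, in order to build or destroy the pod's network.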

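kubelet network plugins

The kubelet supports the following three types of network plugin:

(1) CNI;

(2) kubenet;

(3) Noop, which means no network plugin is configured.

This post focuses on the CNI-related source code in the kubelet.

CNI architecture

When the kubelet creates or deletes a pod, it calls the CRI, and the CRI in turn calls CNI to build or destroy the pod's network.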

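How the kubelet builds a pod's network, in outline

(1) The kubelet first creates the pause container (the pod sandbox) through the CRI, which produces a network namespace;

(2) Based on its startup flags, the kubelet calls the configured network plugin, for example a CNI plugin;

(3) The network plugin configures networking for the pause container (pod sandbox);

(4) All other containers in the pod share the network of the pause container (pod sandbox).

Source analysis of CNI in the kubelet

The analysis of the kubelet's CNI source code covers the following parts:

(1) CNI-related startup flags;

(2) Key structs and interfaces;

(3) CNI initialization;

(4) Building the pod network through CNI;

(5) Destroying the pod network through CNI.

The analysis is based on tag v1.17.4: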

https://github.com/kubernetes/kubernetes/releases/tag/v1.17.4

1. CNI-related kubelet startup flags

The code that registers the kubelet's CNI-related startup flags is shown below:

// pkg/kubelet/config/flags.go
func (s *ContainerRuntimeOptions) AddFlags(fs *pflag.FlagSet) {
...
// Network plugin settings for Docker.
fs.StringVar(&s.NetworkPluginName, "network-plugin", s.NetworkPluginName, fmt.Sprintf("<Warning: Alpha feature> The name of the network plugin to be invoked for various events in kubelet/pod lifecycle. %s", dockerOnlyWarning))
fs.StringVar(&s.CNIConfDir, "cni-conf-dir", s.CNIConfDir, fmt.Sprintf("<Warning: Alpha feature> The full path of the directory in which to search for CNI config files. %s", dockerOnlyWarning))
fs.StringVar(&s.CNIBinDir, "cni-bin-dir", s.CNIBinDir, fmt.Sprintf("<Warning: Alpha feature> A comma-separated list of full paths of directories in which to search for CNI plugin binaries. %s", dockerOnlyWarning))
fs.StringVar(&s.CNICacheDir, "cni-cache-dir", s.CNICacheDir, fmt.Sprintf("<Warning: Alpha feature> The full path of the directory in which CNI should store cache files. %s", dockerOnlyWarning))
fs.Int32Var(&s.NetworkPluginMTU, "network-plugin-mtu", s.NetworkPluginMTU, fmt.Sprintf("<Warning: Alpha feature> The MTU to be passed to the network plugin, to override the default. Set to 0 to use the default 1460 MTU. %s", dockerOnlyWarning))
...
}

The default values of the CNI-related startup flags are set in the NewContainerRuntimeOptions function.

// cmd/kubelet/app/options/container_runtime.go
// NewContainerRuntimeOptions will create a new ContainerRuntimeOptions with
// default values.
func NewContainerRuntimeOptions() *config.ContainerRuntimeOptions {
dockerEndpoint := ""
if runtime.GOOS != "windows" {
dockerEndpoint = "unix:///var/run/docker.sock"
}
return &config.ContainerRuntimeOptions{
ContainerRuntime: kubetypes.DockerContainerRuntime,
RedirectContainerStreaming: false,
DockerEndpoint: dockerEndpoint,
DockershimRootDirectory: "/var/lib/dockershim",
PodSandboxImage: defaultPodSandboxImage,
ImagePullProgressDeadline: metav1.Duration{Duration: 1 * time.Minute},
ExperimentalDockershim: false, //Alpha feature
CNIBinDir: "/opt/cni/bin",
CNIConfDir: "/etc/cni/net.d",
CNICacheDir: "/var/lib/cni/cache",
}
}

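The most important CNI-related startup flags are:

(1) --network-plugin: the type of network plugin to use. Accepted values are cni, kubenet, and "" (the empty string). The default is the empty string, which means Noop: no network plugin is configured and no pod network is built. Setting it to cni tells the kubelet to use a CNI network plugin.

(2) --cni-conf-dir: the directory that holds the CNI configuration files. Default: /etc/cni/net.d

(3) --cni-bin-dir: the directory (or comma-separated list of directories) that holds the CNI plugin executables. The kubelet looks up CNI plugin executables here and runs them to perform pod network operations. Default: /opt/cni/bin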
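
As a rough illustration of how these three flags and their defaults fit together, the sketch below registers them on a standard-library flag.FlagSet. The kubelet itself uses pflag and the ContainerRuntimeOptions struct shown above; the cniFlags type and the parseCNIFlags helper are hypothetical names introduced only for this example.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// cniFlags is a hypothetical holder for the three flags discussed above.
type cniFlags struct {
	NetworkPlugin string
	ConfDir       string
	BinDirs       []string
}

// parseCNIFlags parses kubelet-style arguments and applies the defaults
// listed in the text when a flag is not given.
func parseCNIFlags(args []string) (cniFlags, error) {
	fs := flag.NewFlagSet("kubelet-sketch", flag.ContinueOnError)
	plugin := fs.String("network-plugin", "", "cni, kubenet, or empty for no-op")
	confDir := fs.String("cni-conf-dir", "/etc/cni/net.d", "CNI config directory")
	binDir := fs.String("cni-bin-dir", "/opt/cni/bin", "comma-separated CNI binary directories")
	if err := fs.Parse(args); err != nil {
		return cniFlags{}, err
	}
	return cniFlags{
		NetworkPlugin: *plugin,
		ConfDir:       *confDir,
		BinDirs:       strings.Split(*binDir, ","),
	}, nil
}

func main() {
	f, err := parseCNIFlags([]string{"--network-plugin=cni", "--cni-bin-dir=/opt/cni/bin,/usr/libexec/cni"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", f)
}
```

Calling parseCNIFlags with no arguments yields the Noop case: an empty plugin name together with the default directories.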

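2. Key structs and interfaces

interface NetworkPlugin

First, the key interface: NetworkPlugin.

The NetworkPlugin interface declares the operations of a kubelet network plugin; each type of network plugin only has to implement these methods. The most important ones are SetUpPod and TearDownPod, which build and destroy the pod network respectively. cniNetworkPlugin implements this interface.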

// pkg/kubelet/dockershim/network/plugins.go
// NetworkPlugin is an interface to network plugins for the kubelet
type NetworkPlugin interface {
// Init initializes the plugin. This will be called exactly once
// before any other methods are called.
Init(host Host, hairpinMode kubeletconfig.HairpinMode, nonMasqueradeCIDR string, mtu int) error
// Called on various events like:
// NET_PLUGIN_EVENT_POD_CIDR_CHANGE
Event(name string, details map[string]interface{})
// Name returns the plugin's name. This will be used when searching
// for a plugin by name, e.g.
Name() string
// Returns a set of NET_PLUGIN_CAPABILITY_*
Capabilities() utilsets.Int
// SetUpPod is the method called after the infra container of
// the pod has been created but before the other containers of the
// pod are launched.
SetUpPod(namespace string, name string, podSandboxID kubecontainer.ContainerID, annotations, options map[string]string) error
// TearDownPod is the method called before a pod's infra container will be deleted
TearDownPod(namespace string, name string, podSandboxID kubecontainer.ContainerID) error
// GetPodNetworkStatus is the method called to obtain the ipv4 or ipv6 addresses of the container
GetPodNetworkStatus(namespace string, name string, podSandboxID kubecontainer.ContainerID) (*PodNetworkStatus, error)
// Status returns error if the network plugin is in error state
Status() error
}

struct cniNetworkPlugin

The cniNetworkPlugin struct implements the NetworkPlugin interface, including the SetUpPod and TearDownPod methods.

// pkg/kubelet/dockershim/network/cni/cni.go
type cniNetworkPlugin struct {
network.NoopNetworkPlugin
loNetwork *cniNetwork
sync.RWMutex
defaultNetwork *cniNetwork
host network.Host
execer utilexec.Interface
nsenterPath string
confDir string
binDirs []string
cacheDir string
podCidr string
}

struct PluginManager

The plugin field of struct PluginManager has the interface type NetworkPlugin, so it can hold any concrete network plugin implementation, such as struct cniNetworkPlugin.

// pkg/kubelet/dockershim/network/plugins.go
// The PluginManager wraps a kubelet network plugin and provides synchronization
// for a given pod's network operations. Each pod's setup/teardown/status operations
// are synchronized against each other, but network operations of other pods can
// proceed in parallel.
type PluginManager struct {
// Network plugin being wrapped
plugin NetworkPlugin
// Pod list and lock
podsLock sync.Mutex
pods map[string]*podLock
}
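
The comment on PluginManager says that operations on one pod are serialized while different pods proceed in parallel. A minimal sketch of that idea, assuming a reference-counted mutex per pod name, might look like the following. The podLocker type and its methods are hypothetical simplifications for illustration, not the kubelet's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// podLock pairs a mutex with a reference count so that the map entry can be
// dropped once no operation is using or waiting for it.
type podLock struct {
	refcount uint
	mu       sync.Mutex
}

// podLocker is a hypothetical, simplified stand-in for PluginManager.
type podLocker struct {
	mapLock sync.Mutex
	pods    map[string]*podLock
}

func newPodLocker() *podLocker {
	return &podLocker{pods: make(map[string]*podLock)}
}

// lock blocks until the caller holds the lock for the named pod.
func (p *podLocker) lock(name string) {
	p.mapLock.Lock()
	l, ok := p.pods[name]
	if !ok {
		l = &podLock{}
		p.pods[name] = l
	}
	l.refcount++
	p.mapLock.Unlock()
	l.mu.Lock()
}

// unlock releases the pod's lock and removes the entry when it is unused.
func (p *podLocker) unlock(name string) {
	p.mapLock.Lock()
	defer p.mapLock.Unlock()
	l, ok := p.pods[name]
	if !ok {
		return
	}
	l.mu.Unlock()
	l.refcount--
	if l.refcount == 0 {
		delete(p.pods, name)
	}
}

func main() {
	locker := newPodLocker()
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			locker.lock("nginx_default")
			counter++ // protected by the per-pod lock
			locker.unlock("nginx_default")
		}()
	}
	wg.Wait()
	fmt.Println(counter, len(locker.pods))
}
```

The reference count is what lets the map shrink again: an entry is deleted only when no goroutine holds or is waiting for that pod's lock.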

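struct dockerService

struct dockerService was analyzed in detail in the CRI post, which is worth revisiting; a brief summary follows.

struct dockerService implements both the container runtime interface and the container image interface of the CRI shim server, so it represents the server side of dockershim (the CRI shim built into the kubelet).

The network field of struct dockerService is of type struct PluginManager. When the struct is initialized, the concrete network plugin struct, such as struct cniNetworkPlugin, is stored in this field.

When a pod is created or deleted, the concrete network plugin stored in the network field of dockerService (for example cniNetworkPlugin) has its SetUpPod or TearDownPod method called to build or destroy the pod's network.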

// pkg/kubelet/dockershim/docker_service.go
type dockerService struct {
client libdocker.Interface
os kubecontainer.OSInterface
podSandboxImage string
streamingRuntime *streamingRuntime
streamingServer streaming.Server
network *network.PluginManager
// Map of podSandboxID :: network-is-ready
networkReady map[string]bool
networkReadyLock sync.Mutex
containerManager cm.ContainerManager
// cgroup driver used by Docker runtime.
cgroupDriver string
checkpointManager checkpointmanager.CheckpointManager
// caches the version of the runtime.
// To be compatible with multiple docker versions, we need to perform
// version checking for some operations. Use this cache to avoid querying
// the docker daemon every time we need to do such checks.
versionCache *cache.ObjectCache
// startLocalStreamingServer indicates whether dockershim should start a
// streaming server on localhost.
startLocalStreamingServer bool
// containerCleanupInfos maps container IDs to the `containerCleanupInfo` structs
// needed to clean up after containers have been removed.
// (see `applyPlatformSpecificDockerConfig` and `performPlatformSpecificContainerCleanup`
// methods for more info).
containerCleanupInfos map[string]*containerCleanupInfo
}

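3. CNI initialization

During startup, the kubelet does two things for networking: it probes for the network plugins available in the current environment, and it initializes the selected network plugin. CNI is initialized only when the built-in dockershim is chosen as the container runtime; once initialized, the plugin is handed to dockershim for use.

The call chain for CNI initialization:

main (cmd/kubelet/kubelet.go)

-> NewKubeletCommand (cmd/kubelet/app/server.go)

-> Run (cmd/kubelet/app/server.go)

-> run (cmd/kubelet/app/server.go)

-> RunKubelet (cmd/kubelet/app/server.go)

-> CreateAndInitKubelet (cmd/kubelet/app/server.go)

-> kubelet.NewMainKubelet (pkg/kubelet/kubelet.go)

-> dockershim.NewDockerService (pkg/kubelet/dockershim/docker_service.go)

-> cni.ProbeNetworkPlugins (pkg/kubelet/dockershim/network/cni/cni.go) & network.InitNetworkPlugin (pkg/kubelet/dockershim/network/plugins.go)

The call chain is long, so the analysis starts directly at the key function, NewMainKubelet.

NewMainKubelet

The part of NewMainKubelet that matters here is the call to dockershim.NewDockerService.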

// pkg/kubelet/kubelet.go
// NewMainKubelet instantiates a new Kubelet object along with all the required internal modules.
// No initialization of Kubelet and its modules should happen here.
func NewMainKubelet(kubeCfg *kubeletconfiginternal.KubeletConfiguration,...) {
...
switch containerRuntime {
case kubetypes.DockerContainerRuntime:
// Create and start the CRI shim running as a grpc server.
streamingConfig := getStreamingConfig(kubeCfg, kubeDeps, crOptions)
ds, err := dockershim.NewDockerService(kubeDeps.DockerClientConfig, crOptions.PodSandboxImage, streamingConfig,
&pluginSettings, runtimeCgroups, kubeCfg.CgroupDriver, crOptions.DockershimRootDirectory, !crOptions.RedirectContainerStreaming)
...
}

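The analysis here covers the case where the variable containerRuntime equals docker, that is, the kubelet startup flag --container-runtime is set to docker. In that case the kubelet uses its built-in CRI shim, dockershim, as the container runtime, and initializes and starts dockershim.

The call to dockershim.NewDockerService creates and initializes the dockershim server, which includes initializing the docker client and the CNI network configuration.

The main logic of the CNI part is:

(1) Call cni.ProbeNetworkPlugins: using the CNI-related kubelet startup flags, locate the CNI configuration files and the CNI plugin executables, then use that information to initialize and return a cniNetworkPlugin struct;

(2) Call network.InitNetworkPlugin: based on the value of networkPluginName (the kubelet startup flag --network-plugin), select the matching network plugin and call its Init() method to initialize it. Initialization mainly starts a goroutine that periodically rescans the CNI configuration files and executables so that they can be hot-reloaded;

(3) Assign the cniNetworkPlugin struct obtained above to the network field of the dockerService struct, so that later pod creation and deletion can call cniNetworkPlugin's SetUpPod and TearDownPod methods to build and destroy the pod's network.

The kubelet's main CNI implementation is in pkg/kubelet/dockershim/network/cni/cni.go: SetUpPod and TearDownPod, which build and destroy the pod network.

The values of the function parameter pluginSettings *NetworkPluginSettings come from the kubelet startup flags; the CNI-related flags were analyzed in section 1.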

// pkg/kubelet/dockershim/docker_service.go
// NewDockerService creates a new `DockerService` struct.
// NOTE: Anything passed to DockerService should be eventually handled in another way when we switch to running the shim as a different process.
func NewDockerService(config *ClientConfig, podSandboxImage string, streamingConfig *streaming.Config, pluginSettings *NetworkPluginSettings,
cgroupsName string, kubeCgroupDriver string, dockershimRootDir string, startLocalStreamingServer bool, noJsonLogPath string) (DockerService, error) {
...
ds := &dockerService{
client: c,
os: kubecontainer.RealOS{},
podSandboxImage: podSandboxImage,
streamingRuntime: &streamingRuntime{
client: client,
execHandler: &NativeExecHandler{},
},
containerManager: cm.NewContainerManager(cgroupsName, client),
checkpointManager: checkpointManager,
startLocalStreamingServer: startLocalStreamingServer,
networkReady: make(map[string]bool),
containerCleanupInfos: make(map[string]*containerCleanupInfo),
noJsonLogPath: noJsonLogPath,
}
...
// dockershim currently only supports CNI plugins.
pluginSettings.PluginBinDirs = cni.SplitDirs(pluginSettings.PluginBinDirString)
// (1) Using the CNI-related kubelet startup flags, locate the CNI config files and CNI plugin executables, then initialize and return a cniNetworkPlugin struct
cniPlugins := cni.ProbeNetworkPlugins(pluginSettings.PluginConfDir, pluginSettings.PluginCacheDir, pluginSettings.PluginBinDirs)
cniPlugins = append(cniPlugins, kubenet.NewPlugin(pluginSettings.PluginBinDirs, pluginSettings.PluginCacheDir))
netHost := &dockerNetworkHost{
&namespaceGetter{ds},
&portMappingGetter{ds},
}
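// (2) Based on networkPluginName (the kubelet flag --network-plugin), select the matching network plugin and call its Init() method; initialization mainly starts a goroutine that periodically rescans the CNI config files and executables so they can be hot-reloaded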
plug, err := network.InitNetworkPlugin(cniPlugins, pluginSettings.PluginName, netHost, pluginSettings.HairpinMode, pluginSettings.NonMasqueradeCIDR, pluginSettings.MTU)
if err != nil {
return nil, fmt.Errorf("didn't find compatible CNI plugin with given settings %+v: %v", pluginSettings, err)
}
// (3) Store the cniNetworkPlugin struct obtained above in the network field of the dockerService struct, so that pod creation and deletion can later call its SetUpPod and TearDownPod methods
ds.network = network.NewPluginManager(plug)
klog.Infof("Docker cri networking managed by %v", plug.Name())
...
}

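First, a look at pluginSettings. It is a struct NetworkPluginSettings, which holds the network plugin name, the directories of the plugin executables, the directory of the plugin configuration files, and similar fields: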

// pkg/kubelet/dockershim/docker_service.go
type NetworkPluginSettings struct {
// HairpinMode is best described by comments surrounding the kubelet arg
HairpinMode kubeletconfig.HairpinMode
// NonMasqueradeCIDR is the range of ips which should *not* be included
// in any MASQUERADE rules applied by the plugin
NonMasqueradeCIDR string
// PluginName is the name of the plugin, runtime shim probes for
PluginName string
// PluginBinDirString is a list of directiores delimited by commas, in
// which the binaries for the plugin with PluginName may be found.
PluginBinDirString string
// PluginBinDirs is an array of directories in which the binaries for
// the plugin with PluginName may be found. The admin is responsible for
// provisioning these binaries before-hand.
PluginBinDirs []string
// PluginConfDir is the directory in which the admin places a CNI conf.
// Depending on the plugin, this may be an optional field, eg: kubenet
// generates its own plugin conf.
PluginConfDir string
// PluginCacheDir is the directory in which CNI should store cache files.
PluginCacheDir string
// MTU is the desired MTU for network devices created by the plugin.
MTU int
}

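3.1 cni.ProbeNetworkPlugins

cni.ProbeNetworkPlugins uses the CNI-related kubelet startup flags to locate the CNI configuration files and the CNI plugin executables, then initializes and returns a cniNetworkPlugin struct from that information.

Note the call to plugin.syncNetworkConfig(), whose main job is to set the defaultNetwork field of the cniNetworkPlugin struct.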

// pkg/kubelet/dockershim/network/cni/cni.go
// ProbeNetworkPlugins : get the network plugin based on cni conf file and bin file
func ProbeNetworkPlugins(confDir, cacheDir string, binDirs []string) []network.NetworkPlugin {
old := binDirs
binDirs = make([]string, 0, len(binDirs))
for _, dir := range old {
if dir != "" {
binDirs = append(binDirs, dir)
}
}
plugin := &cniNetworkPlugin{
defaultNetwork: nil,
loNetwork: getLoNetwork(binDirs),
execer: utilexec.New(),
confDir: confDir,
binDirs: binDirs,
cacheDir: cacheDir,
}
// sync NetworkConfig in best effort during probing.
plugin.syncNetworkConfig()
return []network.NetworkPlugin{plugin}
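}

plugin.syncNetworkConfig()

Main logic:

(1) getDefaultCNINetwork(): following the kubelet startup flags, look for a CNI configuration file in the CNI conf directory and return a cniNetwork struct that holds the CNI information;

(2) plugin.setDefaultNetwork(): assign the cniNetwork struct from the previous step to the defaultNetwork field of the cniNetworkPlugin struct.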

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) syncNetworkConfig() {
network, err := getDefaultCNINetwork(plugin.confDir, plugin.binDirs)
if err != nil {
klog.Warningf("Unable to update cni config: %s", err)
return
}
plugin.setDefaultNetwork(network)
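}

getDefaultCNINetwork()

Main logic:

(1) Three kinds of CNI configuration file are recognized in the CNI configuration directory: .conf, .conflist, and .json.

(2) sort.Strings() sorts all CNI configuration files in that directory in ascending lexicographic order.

(3) The files are tried in that order. Files that fail to load, have no type, or contain no plugins are skipped with a warning.

(4) cniConfig.ValidateNetworkList() checks that the executables the configuration refers to exist in the CNI binary directories. The first file that passes this check is returned immediately, so even if the directory contains several CNI configuration files, only one of them takes effect.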
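
The selection rule described above (filter by extension, sort, take the first file that validates) can be sketched without libcni as follows. The pickConfig function and its valid callback are hypothetical; the callback stands in for the loading and ValidateNetworkList checks.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"sort"
)

// pickConfig returns the first candidate, in lexicographic order, whose
// extension is recognized and for which valid reports true.
func pickConfig(files []string, valid func(string) bool) (string, error) {
	var candidates []string
	for _, f := range files {
		switch filepath.Ext(f) {
		case ".conf", ".conflist", ".json":
			candidates = append(candidates, f)
		}
	}
	if len(candidates) == 0 {
		return "", errors.New("no networks found")
	}
	sort.Strings(candidates)
	for _, f := range candidates {
		if valid(f) {
			return f, nil // first valid file wins; the rest are ignored
		}
	}
	return "", errors.New("no valid networks found")
}

func main() {
	files := []string{"20-flannel.conflist", "10-calico.conflist", "README.txt"}
	chosen, err := pickConfig(files, func(string) bool { return true })
	fmt.Println(chosen, err)
}
```

This is why CNI configuration files are conventionally given numeric prefixes such as 10- or 20-: the prefix decides which file sorts first.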

// pkg/kubelet/dockershim/network/cni/cni.go
func getDefaultCNINetwork(confDir string, binDirs []string) (*cniNetwork, error) {
files, err := libcni.ConfFiles(confDir, []string{".conf", ".conflist", ".json"})
switch {
case err != nil:
return nil, err
case len(files) == 0:
return nil, fmt.Errorf("no networks found in %s", confDir)
}
cniConfig := &libcni.CNIConfig{Path: binDirs}
sort.Strings(files)
for _, confFile := range files {
var confList *libcni.NetworkConfigList
if strings.HasSuffix(confFile, ".conflist") {
confList, err = libcni.ConfListFromFile(confFile)
if err != nil {
klog.Warningf("Error loading CNI config list file %s: %v", confFile, err)
continue
}
} else {
conf, err := libcni.ConfFromFile(confFile)
if err != nil {
klog.Warningf("Error loading CNI config file %s: %v", confFile, err)
continue
}
// Ensure the config has a "type" so we know what plugin to run.
// Also catches the case where somebody put a conflist into a conf file.
if conf.Network.Type == "" {
klog.Warningf("Error loading CNI config file %s: no 'type'; perhaps this is a .conflist?", confFile)
continue
}
confList, err = libcni.ConfListFromConf(conf)
if err != nil {
klog.Warningf("Error converting CNI config file %s to list: %v", confFile, err)
continue
}
}
if len(confList.Plugins) == 0 {
klog.Warningf("CNI config list %s has no networks, skipping", string(confList.Bytes[:maxStringLengthInLog(len(confList.Bytes))]))
continue
}
// Before using this CNI config, we have to validate it to make sure that
// all plugins of this config exist on disk
caps, err := cniConfig.ValidateNetworkList(context.TODO(), confList)
if err != nil {
klog.Warningf("Error validating CNI config list %s: %v", string(confList.Bytes[:maxStringLengthInLog(len(confList.Bytes))]), err)
continue
}
klog.V(4).Infof("Using CNI configuration file %s", confFile)
return &cniNetwork{
name: confList.Name,
NetworkConfig: confList,
CNIConfig: cniConfig,
Capabilities: caps,
}, nil
}
return nil, fmt.Errorf("no valid networks found in %s", confDir)
}

plugin.setDefaultNetwork

Assigns the cniNetwork struct obtained above to the defaultNetwork field of the cniNetworkPlugin struct.

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) setDefaultNetwork(n *cniNetwork) {
plugin.Lock()
defer plugin.Unlock()
plugin.defaultNetwork = n
}

3.2 network.InitNetworkPlugin

network.InitNetworkPlugin() selects the network plugin that matches networkPluginName (the kubelet startup flag --network-plugin) and calls its Init() method to initialize it. An empty name selects the no-op plugin.
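
A stripped-down sketch of this name-based selection is shown below. The plugin interface, the namedPlugin type, and the "noop" name are hypothetical stand-ins; the sketch also returns at the first error, whereas the real function collects errors into an aggregate.

```go
package main

import (
	"fmt"
)

// plugin is a hypothetical, pared-down version of the NetworkPlugin interface.
type plugin interface {
	Name() string
	Init() error
}

// namedPlugin is a trivial plugin used only for this illustration.
type namedPlugin struct{ name string }

func (p namedPlugin) Name() string { return p.name }
func (p namedPlugin) Init() error  { return nil }

// choosePlugin mirrors the selection logic: an empty name selects a no-op
// plugin, otherwise the name is looked up among the probed plugins and the
// match is initialized.
func choosePlugin(plugins []plugin, name string) (plugin, error) {
	if name == "" {
		noop := namedPlugin{name: "noop"} // placeholder name, an assumption
		return noop, noop.Init()
	}
	byName := make(map[string]plugin, len(plugins))
	for _, p := range plugins {
		if _, dup := byName[p.Name()]; dup {
			return nil, fmt.Errorf("network plugin %q was registered more than once", p.Name())
		}
		byName[p.Name()] = p
	}
	chosen, ok := byName[name]
	if !ok {
		return nil, fmt.Errorf("network plugin %q not found", name)
	}
	if err := chosen.Init(); err != nil {
		return nil, fmt.Errorf("network plugin %q failed init: %w", name, err)
	}
	return chosen, nil
}

func main() {
	probed := []plugin{namedPlugin{"cni"}, namedPlugin{"kubenet"}}
	p, err := choosePlugin(probed, "cni")
	fmt.Println(p.Name(), err)
}
```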

// pkg/kubelet/dockershim/network/plugins.go
// InitNetworkPlugin inits the plugin that matches networkPluginName. Plugins must have unique names.
func InitNetworkPlugin(plugins []NetworkPlugin, networkPluginName string, host Host, hairpinMode kubeletconfig.HairpinMode, nonMasqueradeCIDR string, mtu int) (NetworkPlugin, error) {
if networkPluginName == "" {
// default to the no_op plugin
plug := &NoopNetworkPlugin{}
plug.Sysctl = utilsysctl.New()
if err := plug.Init(host, hairpinMode, nonMasqueradeCIDR, mtu); err != nil {
return nil, err
}
return plug, nil
}
pluginMap := map[string]NetworkPlugin{}
allErrs := []error{}
for _, plugin := range plugins {
name := plugin.Name()
if errs := validation.IsQualifiedName(name); len(errs) != 0 {
allErrs = append(allErrs, fmt.Errorf("network plugin has invalid name: %q: %s", name, strings.Join(errs, ";")))
continue
}
if _, found := pluginMap[name]; found {
allErrs = append(allErrs, fmt.Errorf("network plugin %q was registered more than once", name))
continue
}
pluginMap[name] = plugin
}
chosenPlugin := pluginMap[networkPluginName]
if chosenPlugin != nil {
err := chosenPlugin.Init(host, hairpinMode, nonMasqueradeCIDR, mtu)
if err != nil {
allErrs = append(allErrs, fmt.Errorf("network plugin %q failed init: %v", networkPluginName, err))
} else {
klog.V(1).Infof("Loaded network plugin %q", networkPluginName)
}
} else {
allErrs = append(allErrs, fmt.Errorf("network plugin %q not found", networkPluginName))
}
return chosenPlugin, utilerrors.NewAggregate(allErrs)
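}

chosenPlugin.Init()

When the kubelet startup flag --network-plugin is set to cni, the Init() method of cniNetworkPlugin is called; its code is shown below.

Init starts a goroutine that calls plugin.syncNetworkConfig every 5 seconds. As described earlier, plugin.syncNetworkConfig() looks for a CNI configuration file in the configured CNI conf directory, builds a cniNetwork struct from it, and assigns that struct to the defaultNetwork field of cniNetworkPlugin. As a result, when the CNI conf files or binaries change, the kubelet notices and updates the cniNetworkPlugin struct.

This is the purpose of the goroutine: CNI configuration files and executables can be hot-reloaded without restarting the kubelet.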

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) Init(host network.Host, hairpinMode kubeletconfig.HairpinMode, nonMasqueradeCIDR string, mtu int) error {
err := plugin.platformInit()
if err != nil {
return err
}
plugin.host = host
plugin.syncNetworkConfig()
// start a goroutine to sync network config from confDir periodically to detect network config updates in every 5 seconds
go wait.Forever(plugin.syncNetworkConfig, defaultSyncConfigPeriod)
return nil
}

plugin.platformInit() only checks that nsenter is available; it does nothing else.

// pkg/kubelet/dockershim/network/cni/cni_others.go
func (plugin *cniNetworkPlugin) platformInit() error {
var err error
plugin.nsenterPath, err = plugin.execer.LookPath("nsenter")
if err != nil {
return err
}
return nil
}

4. Building the pod network through CNI

When the kubelet creates a pod, it creates and starts the pod sandbox through the CRI, and the CRI then calls the CNI network plugin to build the pod's network.

The kubelet method that builds the pod network through CNI is SetUpPod in pkg/kubelet/dockershim/network/cni/cni.go.

The call chain that leads to SetUpPod is as follows (only the key steps are listed):

main (cmd/kubelet/kubelet.go)

...

-> klet.syncPod (pkg/kubelet/kubelet.go)

-> kl.containerRuntime.SyncPod (pkg/kubelet/kubelet.go)

-> m.createPodSandbox (pkg/kubelet/kuberuntime/kuberuntime_manager.go)

-> m.runtimeService.RunPodSandbox (pkg/kubelet/kuberuntime/kuberuntime_sandbox.go)

-> ds.network.SetUpPod (pkg/kubelet/dockershim/docker_sandbox.go)

-> pm.plugin.SetUpPod (pkg/kubelet/dockershim/network/plugins.go)

-> SetUpPod (pkg/kubelet/dockershim/network/cni/cni.go)

The code below is listed only to show the call chain that leads to the key method cniNetworkPlugin.SetUpPod(); it is not analyzed in detail.

// pkg/kubelet/kuberuntime/kuberuntime_manager.go
func (m *kubeGenericRuntimeManager) SyncPod(pod *v1.Pod, podStatus *kubecontainer.PodStatus, pullSecrets []v1.Secret, backOff *flowcontrol.Backoff) (result kubecontainer.PodSyncResult) {
...
podSandboxID, msg, err = m.createPodSandbox(pod, podContainerChanges.Attempt)
...
}

// pkg/kubelet/kuberuntime/kuberuntime_sandbox.go
// createPodSandbox creates a pod sandbox and returns (podSandBoxID, message, error).
func (m *kubeGenericRuntimeManager) createPodSandbox(pod *v1.Pod, attempt uint32) (string, string, error) {
...
podSandBoxID, err := m.runtimeService.RunPodSandbox(podSandboxConfig, runtimeHandler)
...
}

RunPodSandbox first creates the pod sandbox, then starts it, and only then builds the network for that pod sandbox.

// pkg/kubelet/dockershim/docker_sandbox.go
func (ds *dockerService) RunPodSandbox(ctx context.Context, r *runtimeapi.RunPodSandboxRequest) (*runtimeapi.RunPodSandboxResponse, error) {
...
createResp, err := ds.client.CreateContainer(*createConfig)
...
err = ds.client.StartContainer(createResp.ID)
...
err = ds.network.SetUpPod(config.GetMetadata().Namespace, config.GetMetadata().Name, cID, config.Annotations, networkOptions)
...
}

PluginManager.SetUpPod calls pm.plugin.SetUpPod. Because of the assignment made during CNI initialization, this ends up calling the SetUpPod method of cniNetworkPlugin.

// pkg/kubelet/dockershim/network/plugins.go
func (pm *PluginManager) SetUpPod(podNamespace, podName string, id kubecontainer.ContainerID, annotations, options map[string]string) error {
defer recordOperation("set_up_pod", time.Now())
fullPodName := kubecontainer.BuildPodFullName(podName, podNamespace)
pm.podLock(fullPodName).Lock()
defer pm.podUnlock(fullPodName)
klog.V(3).Infof("Calling network plugin %s to set up pod %q", pm.plugin.Name(), fullPodName)
if err := pm.plugin.SetUpPod(podNamespace, podName, id, annotations, options); err != nil {
return fmt.Errorf("networkPlugin %s failed to set up pod %q network: %v", pm.plugin.Name(), fullPodName, err)
}
return nil
}

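cniNetworkPlugin.SetUpPod

cniNetworkPlugin.SetUpPod is the entry point through which the CNI network plugin builds a pod's network. Its main logic is:

(1) Call plugin.checkInitialized(): check that the network plugin has finished initializing;

(2) Call plugin.host.GetNetNS(): get the path of the container's network namespace, in the form /proc/${container PID}/ns/net;

(3) Call context.WithTimeout(): set the timeout for calling the CNI network plugin;

(4) Call plugin.addToNetwork(): on Linux, call the CNI network plugin to build the pod's loopback network;

(5) Call plugin.addToNetwork(): call the CNI network plugin to build the pod's default network.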

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) SetUpPod(namespace string, name string, id kubecontainer.ContainerID, annotations, options map[string]string) error {
if err := plugin.checkInitialized(); err != nil {
return err
}
netnsPath, err := plugin.host.GetNetNS(id.ID)
if err != nil {
return fmt.Errorf("CNI failed to retrieve network namespace path: %v", err)
}
// Todo get the timeout from parent ctx
cniTimeoutCtx, cancelFunc := context.WithTimeout(context.Background(), network.CNITimeoutSec*time.Second)
defer cancelFunc()
// Windows doesn't have loNetwork. It comes only with Linux
if plugin.loNetwork != nil {
if _, err = plugin.addToNetwork(cniTimeoutCtx, plugin.loNetwork, name, namespace, id, netnsPath, annotations, options); err != nil {
return err
}
}
_, err = plugin.addToNetwork(cniTimeoutCtx, plugin.getDefaultNetwork(), name, namespace, id, netnsPath, annotations, options)
return err
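}

plugin.addToNetwork

plugin.addToNetwork calls the CNI network plugin to build the given network for a pod. Its main logic is:

(1) Call plugin.buildCNIRuntimeConf(): build the runtime configuration used to call the CNI network plugin;

(2) Call cniNet.AddNetworkList(): call the CNI network plugin to build the network.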

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) addToNetwork(ctx context.Context, network *cniNetwork, podName string, podNamespace string, podSandboxID kubecontainer.ContainerID, podNetnsPath string, annotations, options map[string]string) (cnitypes.Result, error) {
rt, err := plugin.buildCNIRuntimeConf(podName, podNamespace, podSandboxID, podNetnsPath, annotations, options)
if err != nil {
klog.Errorf("Error adding network when building cni runtime conf: %v", err)
return nil, err
}
pdesc := podDesc(podNamespace, podName, podSandboxID)
netConf, cniNet := network.NetworkConfig, network.CNIConfig
klog.V(4).Infof("Adding %s to network %s/%s netns %q", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, podNetnsPath)
res, err := cniNet.AddNetworkList(ctx, netConf, rt)
if err != nil {
klog.Errorf("Error adding %s to network %s/%s: %v", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, err)
return nil, err
}
klog.V(4).Infof("Added %s to network %s: %v", pdesc, netConf.Name, res)
return res, nil
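}

cniNet.AddNetworkList

AddNetworkList mainly calls the addNetwork method for each plugin in the list, so the logic of addNetwork is what matters:

(1) Call c.exec.FindInPath(): resolve the absolute path of the CNI plugin executable;

(2) Call buildOneConfig(): build the configuration;

(3) Call c.args(): build the arguments for calling the CNI network plugin;

(4) Call invoke.ExecPluginWithResult(): run the CNI network plugin to build the pod's network.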

// vendor/github.com/containernetworking/cni/libcni/api.go
func (c *CNIConfig) AddNetworkList(ctx context.Context, list *NetworkConfigList, rt *RuntimeConf) (types.Result, error) {
var err error
var result types.Result
for _, net := range list.Plugins {
result, err = c.addNetwork(ctx, list.Name, list.CNIVersion, net, result, rt)
if err != nil {
return nil, err
}
}
if err = setCachedResult(result, list.Name, rt); err != nil {
return nil, fmt.Errorf("failed to set network %q cached result: %v", list.Name, err)
}
return result, nil
}

func (c *CNIConfig) addNetwork(ctx context.Context, name, cniVersion string, net *NetworkConfig, prevResult types.Result, rt *RuntimeConf) (types.Result, error) {
c.ensureExec()
pluginPath, err := c.exec.FindInPath(net.Network.Type, c.Path)
if err != nil {
return nil, err
}
newConf, err := buildOneConfig(name, cniVersion, net, prevResult, rt)
if err != nil {
return nil, err
}
return invoke.ExecPluginWithResult(ctx, pluginPath, newConf.Bytes, c.args("ADD", rt), c.exec)
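}

c.args

c.args builds the arguments used when running the CNI plugin executable.

As the code shows, the arguments include Command (ADD builds the network, DEL destroys it), ContainerID (the container ID), NetNS (the path of the container's network namespace), IfName (the network interface name), PluginArgs (additional arguments such as the pod name and pod namespace), and Path (the plugin search path).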
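
These arguments reach the plugin as CNI_* environment variables, as defined by the CNI specification. The sketch below shows the mapping with a hypothetical cniArgs type and asEnv method; the library's own AsEnv also carries over the parent process environment, which is omitted here, and the sample values in main are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// cniArgs is a simplified, hypothetical copy of the fields listed above.
type cniArgs struct {
	Command     string
	ContainerID string
	NetNS       string
	IfName      string
	Path        string
	PluginArgs  [][2]string
}

// asEnv renders the arguments as CNI_* environment variables. PluginArgs are
// joined as semicolon-separated KEY=VALUE pairs in CNI_ARGS.
func (a cniArgs) asEnv() []string {
	pairs := make([]string, 0, len(a.PluginArgs))
	for _, kv := range a.PluginArgs {
		pairs = append(pairs, kv[0]+"="+kv[1])
	}
	return []string{
		"CNI_COMMAND=" + a.Command,
		"CNI_CONTAINERID=" + a.ContainerID,
		"CNI_NETNS=" + a.NetNS,
		"CNI_ARGS=" + strings.Join(pairs, ";"),
		"CNI_IFNAME=" + a.IfName,
		"CNI_PATH=" + a.Path,
	}
}

func main() {
	args := cniArgs{
		Command:     "ADD",
		ContainerID: "abc123",
		NetNS:       "/proc/4242/ns/net",
		IfName:      "eth0",
		Path:        "/opt/cni/bin",
		PluginArgs:  [][2]string{{"K8S_POD_NAMESPACE", "default"}, {"K8S_POD_NAME", "nginx"}},
	}
	for _, e := range args.asEnv() {
		fmt.Println(e)
	}
}
```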

// vendor/github.com/containernetworking/cni/libcni/api.go
func (c *CNIConfig) args(action string, rt *RuntimeConf) *invoke.Args {
return &invoke.Args{
Command: action,
ContainerID: rt.ContainerID,
NetNS: rt.NetNS,
PluginArgs: rt.Args,
IfName: rt.IfName,
Path: strings.Join(c.Path, string(os.PathListSeparator)),
}
}

invoke.ExecPluginWithResult

invoke.ExecPluginWithResult converts the call arguments into environment variables, runs the CNI plugin executable, and collects the result it returns.

func ExecPluginWithResult(ctx context.Context, pluginPath string, netconf []byte, args CNIArgs, exec Exec) (types.Result, error) {
if exec == nil {
exec = defaultExec
}
stdoutBytes, err := exec.ExecPlugin(ctx, pluginPath, netconf, args.AsEnv())
if err != nil {
return nil, err
}
// Plugin must return result in same version as specified in netconf
versionDecoder := &version.ConfigDecoder{}
confVersion, err := versionDecoder.Decode(netconf)
if err != nil {
return nil, err
}
return version.NewResult(confVersion, stdoutBytes)
}

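5. Destroying the pod network through CNI

When the kubelet deletes a pod, the CRI calls the CNI network plugin to destroy the pod's network.

The kubelet method that destroys the pod network through CNI is TearDownPod in pkg/kubelet/dockershim/network/cni/cni.go.

The call chain that leads to TearDownPod is as follows (only the key steps are listed):

main (cmd/kubelet/kubelet.go)

...

-> m.runtimeService.StopPodSandbox (pkg/kubelet/kuberuntime/kuberuntime_sandbox.go)

-> ds.network.TearDownPod (pkg/kubelet/dockershim/docker_sandbox.go)

-> pm.plugin.TearDownPod (pkg/kubelet/dockershim/network/plugins.go)

-> TearDownPod (pkg/kubelet/dockershim/network/cni/cni.go)

The code below is listed only to show the call chain that leads to the key method cniNetworkPlugin.TearDownPod(); it is not analyzed in detail.

StopPodSandbox first tears down the pod network and then stops the pod sandbox. If either operation fails, the kubelet keeps retrying until both succeed, so there is no strict requirement on the order in which the two operations succeed. (Removing the pod sandbox is left to the kubelet's garbage collection.)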

// pkg/kubelet/dockershim/docker_sandbox.go
func (ds *dockerService) StopPodSandbox(ctx context.Context, r *runtimeapi.StopPodSandboxRequest) (*runtimeapi.StopPodSandboxResponse, error) {
...
// WARNING: The following operations made the following assumption:
// 1. kubelet will retry on any error returned by StopPodSandbox.
// 2. tearing down network and stopping sandbox container can succeed in any sequence.
// This depends on the implementation detail of network plugin and proper error handling.
// For kubenet, if tearing down network failed and sandbox container is stopped, kubelet
// will retry. On retry, kubenet will not be able to retrieve network namespace of the sandbox
// since it is stopped. With empty network namespcae, CNI bridge plugin will conduct best
// effort clean up and will not return error.
errList := []error{}
ready, ok := ds.getNetworkReady(podSandboxID)
if !hostNetwork && (ready || !ok) {
// Only tear down the pod network if we haven't done so already
cID := kubecontainer.BuildContainerID(runtimeName, podSandboxID)
err := ds.network.TearDownPod(namespace, name, cID)
if err == nil {
ds.setNetworkReady(podSandboxID, false)
} else {
errList = append(errList, err)
}
}
if err := ds.client.StopContainer(podSandboxID, defaultSandboxGracePeriod); err != nil {
// Do not return error if the container does not exist
if !libdocker.IsContainerNotFoundError(err) {
klog.Errorf("Failed to stop sandbox %q: %v", podSandboxID, err)
errList = append(errList, err)
} else {
// remove the checkpoint for any sandbox that is not found in the runtime
ds.checkpointManager.RemoveCheckpoint(podSandboxID)
}
}
...
}

PluginManager.TearDownPod calls pm.plugin.TearDownPod. Because of the assignment made during CNI initialization, this ends up calling the TearDownPod method of cniNetworkPlugin.

// pkg/kubelet/dockershim/network/plugins.go
func (pm *PluginManager) TearDownPod(podNamespace, podName string, id kubecontainer.ContainerID) error {
defer recordOperation("tear_down_pod", time.Now())
fullPodName := kubecontainer.BuildPodFullName(podName, podNamespace)
pm.podLock(fullPodName).Lock()
defer pm.podUnlock(fullPodName)
klog.V(3).Infof("Calling network plugin %s to tear down pod %q", pm.plugin.Name(), fullPodName)
if err := pm.plugin.TearDownPod(podNamespace, podName, id); err != nil {
return fmt.Errorf("networkPlugin %s failed to teardown pod %q network: %v", pm.plugin.Name(), fullPodName, err)
}
return nil
}

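cniNetworkPlugin.TearDownPod

cniNetworkPlugin.TearDownPod is the entry point through which the CNI network plugin destroys a pod's network. Its main logic is:

(1) Call plugin.checkInitialized(): check that the network plugin has finished initializing;

(2) Call plugin.host.GetNetNS(): get the path of the container's network namespace, in the form /proc/${container PID}/ns/net. A failure here is only logged as a warning;

(3) Call context.WithTimeout(): set the timeout for calling the CNI network plugin;

(4) Call plugin.deleteFromNetwork(): on Linux, call the CNI network plugin to destroy the pod's loopback network. A failure here is only logged as a warning;

(5) Call plugin.deleteFromNetwork(): call the CNI network plugin to destroy the pod's default network.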

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) TearDownPod(namespace string, name string, id kubecontainer.ContainerID) error {
if err := plugin.checkInitialized(); err != nil {
return err
}
// Lack of namespace should not be fatal on teardown
netnsPath, err := plugin.host.GetNetNS(id.ID)
if err != nil {
klog.Warningf("CNI failed to retrieve network namespace path: %v", err)
}
// Todo get the timeout from parent ctx
cniTimeoutCtx, cancelFunc := context.WithTimeout(context.Background(), network.CNITimeoutSec*time.Second)
defer cancelFunc()
// Windows doesn't have loNetwork. It comes only with Linux
if plugin.loNetwork != nil {
// Loopback network deletion failure should not be fatal on teardown
if err := plugin.deleteFromNetwork(cniTimeoutCtx, plugin.loNetwork, name, namespace, id, netnsPath, nil); err != nil {
klog.Warningf("CNI failed to delete loopback network: %v", err)
}
} return plugin.deleteFromNetwork(cniTimeoutCtx, plugin.getDefaultNetwork(), name, namespace, id, netnsPath, nil)
}

plugin.deleteFromNetwork

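plugin.deleteFromNetwork invokes the CNI network plugin to tear down one specific network of the pod (loopback or default). Its main logic is:

(1) Call plugin.buildCNIRuntimeConf(): build the runtime configuration used to invoke the CNI network plugin;

(2) Call cniNet.DelNetworkList(): invoke the CNI network plugin to tear down the pod network.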

// pkg/kubelet/dockershim/network/cni/cni.go
func (plugin *cniNetworkPlugin) deleteFromNetwork(ctx context.Context, network *cniNetwork, podName string, podNamespace string, podSandboxID kubecontainer.ContainerID, podNetnsPath string, annotations map[string]string) error {
	rt, err := plugin.buildCNIRuntimeConf(podName, podNamespace, podSandboxID, podNetnsPath, annotations, nil)
	if err != nil {
		klog.Errorf("Error deleting network when building cni runtime conf: %v", err)
		return err
	}

	pdesc := podDesc(podNamespace, podName, podSandboxID)
	netConf, cniNet := network.NetworkConfig, network.CNIConfig
	klog.V(4).Infof("Deleting %s from network %s/%s netns %q", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, podNetnsPath)
	err = cniNet.DelNetworkList(ctx, netConf, rt)
	// The pod may not get deleted successfully at the first time.
	// Ignore "no such file or directory" error in case the network has already been deleted in previous attempts.
	if err != nil && !strings.Contains(err.Error(), "no such file or directory") {
		klog.Errorf("Error deleting %s from network %s/%s: %v", pdesc, netConf.Plugins[0].Network.Type, netConf.Name, err)
		return err
	}
	klog.V(4).Infof("Deleted %s from network %s/%s", pdesc, netConf.Plugins[0].Network.Type, netConf.Name)
	return nil
}

cniNet.DelNetworkList

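DelNetworkList mainly calls delNetwork once for each plugin in the list, in reverse order, so let's look at the logic of delNetwork:

(1) Call c.exec.FindInPath(): resolve the absolute path of the CNI network plugin executable;

(2) Call buildOneConfig(): build the configuration for this plugin;

(3) Call c.args(): build the arguments for invoking the CNI network plugin;

(4) Call invoke.ExecPluginWithoutResult(): invoke the CNI network plugin to tear down the pod network.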

// vendor/github.com/containernetworking/cni/libcni/api.go
// DelNetworkList executes a sequence of plugins with the DEL command
func (c *CNIConfig) DelNetworkList(ctx context.Context, list *NetworkConfigList, rt *RuntimeConf) error {
	var cachedResult types.Result

	// Cached result on DEL was added in CNI spec version 0.4.0 and higher
	if gtet, err := version.GreaterThanOrEqualTo(list.CNIVersion, "0.4.0"); err != nil {
		return err
	} else if gtet {
		cachedResult, err = getCachedResult(list.Name, list.CNIVersion, rt)
		if err != nil {
			return fmt.Errorf("failed to get network %q cached result: %v", list.Name, err)
		}
	}

	for i := len(list.Plugins) - 1; i >= 0; i-- {
		net := list.Plugins[i]
		if err := c.delNetwork(ctx, list.Name, list.CNIVersion, net, cachedResult, rt); err != nil {
			return err
		}
	}
	_ = delCachedResult(list.Name, rt)

	return nil
}

func (c *CNIConfig) delNetwork(ctx context.Context, name, cniVersion string, net *NetworkConfig, prevResult types.Result, rt *RuntimeConf) error {
	c.ensureExec()
	pluginPath, err := c.exec.FindInPath(net.Network.Type, c.Path)
	if err != nil {
		return err
	}

	newConf, err := buildOneConfig(name, cniVersion, net, prevResult, rt)
	if err != nil {
		return err
	}

	return invoke.ExecPluginWithoutResult(ctx, pluginPath, newConf.Bytes, c.args("DEL", rt), c.exec)
}
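
The loop in DelNetworkList walks the plugin chain from the last entry to the first, so DEL undoes the chain in the opposite order from the one in which it is listed. A small sketch of just that ordering, assuming a made-up chain; the plugin type and delOrder are hypothetical names used only here.

```go
package main

import "fmt"

// plugin is a hypothetical stand-in for one entry of a CNI plugin chain.
type plugin struct{ Type string }

// delOrder returns the plugin types in the order a DEL walks the chain:
// last plugin first, mirroring the loop in DelNetworkList.
func delOrder(chain []plugin) []string {
	order := make([]string, 0, len(chain))
	for i := len(chain) - 1; i >= 0; i-- {
		order = append(order, chain[i].Type)
	}
	return order
}

func main() {
	// An example chain; the plugin names are only illustrative.
	chain := []plugin{{Type: "bridge"}, {Type: "portmap"}, {Type: "bandwidth"}}
	fmt.Println(delOrder(chain)) // [bandwidth portmap bridge]
}
```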

c.args

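c.args builds the arguments used when invoking the CNI network plugin executable.

As the code shows, the arguments include Command (ADD builds the network, DEL tears it down), ContainerID (the container ID), NetNS (the path of the container's network namespace), IfName (the network interface name), PluginArgs (extra arguments such as the pod name and pod namespace), and Path (the directories in which to look for plugin executables).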

// vendor/github.com/containernetworking/cni/libcni/api.go
func (c *CNIConfig) args(action string, rt *RuntimeConf) *invoke.Args {
	return &invoke.Args{
		Command:     action,
		ContainerID: rt.ContainerID,
		NetNS:       rt.NetNS,
		PluginArgs:  rt.Args,
		IfName:      rt.IfName,
		Path:        strings.Join(c.Path, string(os.PathListSeparator)),
	}
}
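
These fields reach the plugin as the CNI_* environment variables defined by the CNI specification. The sketch below shows roughly how that mapping looks; cniArgs and asEnv are simplified, hypothetical names for this illustration, the values in main are made up, and unlike the real library the sketch does not merge in the parent process environment.

```go
package main

import (
	"fmt"
	"strings"
)

// cniArgs is a simplified, hypothetical copy of the fields set in c.args.
type cniArgs struct {
	Command     string
	ContainerID string
	NetNS       string
	PluginArgs  [][2]string
	IfName      string
	Path        string
}

// asEnv renders the arguments as CNI_* environment variables. Extra plugin
// arguments are flattened into a single "K=V;K=V" string in CNI_ARGS.
func (a cniArgs) asEnv() []string {
	pairs := make([]string, 0, len(a.PluginArgs))
	for _, kv := range a.PluginArgs {
		pairs = append(pairs, kv[0]+"="+kv[1])
	}
	return []string{
		"CNI_COMMAND=" + a.Command,
		"CNI_CONTAINERID=" + a.ContainerID,
		"CNI_NETNS=" + a.NetNS,
		"CNI_ARGS=" + strings.Join(pairs, ";"),
		"CNI_IFNAME=" + a.IfName,
		"CNI_PATH=" + a.Path,
	}
}

func main() {
	// All values below are invented for the example.
	args := cniArgs{
		Command:     "DEL",
		ContainerID: "sandbox-123",
		NetNS:       "/proc/4242/ns/net",
		PluginArgs:  [][2]string{{"K8S_POD_NAMESPACE", "default"}, {"K8S_POD_NAME", "demo"}},
		IfName:      "eth0",
		Path:        "/opt/cni/bin",
	}
	for _, kv := range args.asEnv() {
		fmt.Println(kv)
	}
}
```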

invoke.ExecPluginWithResult

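invoke.ExecPluginWithResult turns the call arguments into environment variables, runs the CNI network plugin executable, and parses what the plugin writes to stdout into a result. The DEL path shown above goes through invoke.ExecPluginWithoutResult, which runs the plugin in the same way but discards its output.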

func ExecPluginWithResult(ctx context.Context, pluginPath string, netconf []byte, args CNIArgs, exec Exec) (types.Result, error) {
	if exec == nil {
		exec = defaultExec
	}

	stdoutBytes, err := exec.ExecPlugin(ctx, pluginPath, netconf, args.AsEnv())
	if err != nil {
		return nil, err
	}

	// Plugin must return result in same version as specified in netconf
	versionDecoder := &version.ConfigDecoder{}
	confVersion, err := versionDecoder.Decode(netconf)
	if err != nil {
		return nil, err
	}

	return version.NewResult(confVersion, stdoutBytes)
}

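Summary

CNI

CNI stands for Container Network Interface.

CNI is the standard interface through which K8s calls a network implementation. Kubelet uses this standard interface to call different network plugins and so support different ways of configuring the network.

A CNI network plugin is an executable that follows the Container Network Interface (CNI) specification. Common CNI network plugins include Calico, flannel, Terway and Weave Net.

When kubelet is configured to use a CNI network plugin (through its startup parameters), it goes through CRI to call the CNI network plugin whenever a pod is created or deleted, in order to build or tear down the pod's network.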


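How kubelet builds the pod network, in outline

(1) kubelet first creates the pause container (pod sandbox) through CRI, which creates the network namespace;

(2) kubelet calls the concrete network plugin selected by its startup parameters, for example a CNI network plugin;

(3) the network plugin configures the network for the pause container (pod sandbox);

(4) all other containers in the pod share the network of the pause container (pod sandbox).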

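CNI-related kubelet startup parameters

(1) --network-plugin: the type of network plugin to use. Valid values are cni, kubenet and "". The default is the empty string, which means Noop: no network plugin is configured and no pod network is built. Setting it to cni tells kubelet to use a CNI network plugin.

(2) --cni-conf-dir: the directory that holds the CNI configuration files. Default: /etc/cni/net.d

(3) --cni-bin-dir: the directory that holds the CNI plugin executables. Kubelet looks in this directory for the CNI plugin executable that performs the pod's network operations. Default: /opt/cni/bin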
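
The two directories are linked by the plugin type: each plugin entry in a configuration file names the executable to look for in the binary directory. The sketch below illustrates that link under assumptions: the sample configuration is made up, confList and pluginBinaries are hypothetical names, and the real lookup searches a list of directories rather than joining a single one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path"
)

// confList models only the fields of a CNI network configuration list that
// this sketch needs.
type confList struct {
	CNIVersion string `json:"cniVersion"`
	Name       string `json:"name"`
	Plugins    []struct {
		Type string `json:"type"`
	} `json:"plugins"`
}

// sampleConf is an invented configuration, not taken from a real cluster.
const sampleConf = `{
  "cniVersion": "0.4.0",
  "name": "demo-net",
  "plugins": [
    {"type": "bridge"},
    {"type": "portmap"}
  ]
}`

// pluginBinaries parses a configuration list and returns, for each plugin in
// the chain, the path where its executable would be expected under binDir.
func pluginBinaries(data []byte, binDir string) ([]string, error) {
	var list confList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, fmt.Errorf("parse CNI config: %w", err)
	}
	bins := make([]string, 0, len(list.Plugins))
	for _, p := range list.Plugins {
		bins = append(bins, path.Join(binDir, p.Type))
	}
	return bins, nil
}

func main() {
	bins, err := pluginBinaries([]byte(sampleConf), "/opt/cni/bin")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bins) // [/opt/cni/bin/bridge /opt/cni/bin/portmap]
}
```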

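CNI initialization in kubelet

After kubelet starts, it reads the CNI configuration file according to the CNI-related startup parameters and initializes the CNI network plugin. Later, when pods are created or deleted, it calls SetUpPod and TearDownPod to build and tear down the pod network. Initialization also starts a goroutine that periodically re-checks the CNI configuration files and executables, so that changes are picked up without a restart.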

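Building the pod network through CNI

When kubelet creates a pod, it creates and starts the pod sandbox through CRI, and CRI then calls the CNI network plugin to build the pod network.

The method in kubelet that builds the pod network through CNI is SetUpPod in pkg/kubelet/dockershim/network/cni/cni.go.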

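Tearing down the pod network through CNI

When kubelet deletes a pod, CRI calls the CNI network plugin to tear down the pod network.

The method in kubelet that tears down the pod network through CNI is TearDownPod in pkg/kubelet/dockershim/network/cni/cni.go.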
