IP Tunnel One-to-Many Usage

2023-09-25 18:20:40

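Linux IP tunnels come in several types, including IPIP, GRE, and SIT. They are an overlay technique commonly used in virtual networks, and each tunnel is normally configured directly with a local and a remote address. In some SDN virtual networks, however, a host often has a large number of peers, so a large number of tunnel interfaces has to be configured, which is cumbersome to manage.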

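One way to solve this is to leave the remote address unset when configuring the IP tunnel and instead supply it as the next hop of the route that points at the tunnel interface. The comments in the Linux kernel source call this kind of tunnel an NBMA tunnel.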
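As a minimal sketch of the NBMA idea, the script below creates an IPIP tunnel that has a local address but no remote, then points one route at it. The device name nbma0 and the addresses are illustrative assumptions, not taken from a real deployment. The script only prints the commands by default; setting DRY_RUN=0 and running it as root would apply them.

```shell
#!/bin/sh
# Sketch only: "nbma0" and all addresses are assumptions for illustration.
# Prints each command by default; run with DRY_RUN=0 as root to apply.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

setup_nbma_tunnel() {
    # No "remote" keyword: the outer destination is left unset.
    run ip link add nbma0 type ipip local 192.168.122.11
    run ip link set nbma0 up
    # The route's next hop supplies the outer destination for this peer.
    run ip route add 12.1.1.1/32 via 192.168.122.12 dev nbma0 onlink
}

setup_nbma_tunnel
```

Adding another peer then means adding another route, not another tunnel interface.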

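Going one step further, you can use the fallback tunnel: a tunnel device whose tunnel parameters are empty, meaning it carries none of the information needed for encapsulation, such as the source address, destination address, key (GRE), or sequence number.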

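These devices already exist once the kernel has loaded the corresponding tunnel module, and their names are fixed. Loading the GRE IPv4 module creates three default devices, gre0, gretap0, and erspan0. The IPIP tunnel creates a default device named tunl0, and the VTI tunnel's default device is named ip_vti0.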

## By default these interfaces are not visible

# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:cf:65:d1 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:b6:4d:0d brd ff:ff:ff:ff:ff:ff

## Yet creating a tunnel interface with no parameters reports that it already exists
## (presumably the request makes the kernel load the ipip module, whose fallback device tunl0 already has the same empty parameters)

# ip link add ipipx type ipip
RTNETLINK answers: File exists

## Looking again, the interface is now visible

# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:cf:65:d1 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:b6:4d:0d brd ff:ff:ff:ff:ff:ff
18: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
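Instead of relying on a failed "ip link add" to make the device appear, the modules can be loaded explicitly. The sketch below assumes the usual module names ipip, ip_gre, and ip_vti; whether they are built as loadable modules depends on the kernel configuration. It prints the commands by default and applies them only with DRY_RUN=0 as root.

```shell
#!/bin/sh
# Sketch only: module availability depends on the kernel build (assumption).
# Prints each command by default; run with DRY_RUN=0 as root to apply.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

show_fallback_devices() {
    run modprobe ipip      # fallback device: tunl0
    run modprobe ip_gre    # fallback devices: gre0, gretap0, erspan0
    run modprobe ip_vti    # fallback device: ip_vti0
    # -d prints the tunnel details, including the unset local/remote.
    run ip -d link show dev tunl0
}

show_fallback_devices
```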

Usage is simple. Suppose the interfaces are configured as follows:
host1:
lo: 11.1.1.1/32 ; eth1: 192.168.122.11
host2:
lo: 12.1.1.1/32 ; eth1: 192.168.122.12
host3:
lo: 13.1.1.1/32 ; eth1: 192.168.122.13

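host1 then needs two tunnels, one to host2 and one to host3. Using only the default tunl0 fallback tunnel, all that is required is two routes (the peers are configured the same way):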

12.1.1.1 via 192.168.122.12 dev tunl0 onlink

13.1.1.1 via 192.168.122.13 dev tunl0 onlink

Note that onlink must be specified here: the next hop is not in any subnet attached to tunl0, so adding the route without it fails.
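The host1 side of this setup can be sketched as a small script that adds one route per peer. The PEERS list format (inner address, comma, outer address) is a convention invented for this sketch. The transcript above shows tunl0 in state DOWN, so the sketch brings it up first. Commands are only printed unless DRY_RUN=0 is set and the script runs as root.

```shell
#!/bin/sh
# Sketch only: the PEERS format is an assumption made for this example.
# Prints each command by default; run with DRY_RUN=0 as root to apply.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# "inner loopback address,outer eth1 address" for host2 and host3.
PEERS="12.1.1.1,192.168.122.12 13.1.1.1,192.168.122.13"

setup_host1() {
    # The fallback device exists but starts DOWN.
    run ip link set tunl0 up
    for peer in $PEERS; do
        inner=${peer%,*}   # part before the comma
        outer=${peer#*,}   # part after the comma
        run ip route add "$inner/32" via "$outer" dev tunl0 onlink
    done
}

setup_host1
```

Supporting a new peer only requires appending one entry to PEERS; no additional tunnel device is created.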

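When ping 12.1.1.1 -I 11.1.1.1 is run, the lookup matches the first route and the packet is handed to that route's dev, tunl0. tunl0 then uses the route's next hop as the outer destination IP of the tunnel.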

A brief look at the code shows how this works.

The transmit functions of the IP tunnels all call:

void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
                    const struct iphdr *tnl_params, u8 protocol)
{
    struct ip_tunnel *tunnel = netdev_priv(dev);
    const struct iphdr *inner_iph;
    struct flowi4 fl4;
    u8 tos, ttl;
    __be16 df;
    struct rtable *rt; /* Route to the other host */
    unsigned int max_headroom; /* The extra header space needed */
    __be32 dst;
    bool connected;

    inner_iph = (const struct iphdr *)skb_inner_network_header(skb);
    connected = (tunnel->parms.iph.daddr != 0);

    memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));

    dst = tnl_params->daddr;
    if (dst == 0) {           // ========= no remote address is configured
        /* NBMA tunnel */
        if (!skb_dst(skb)) {
            dev->stats.tx_fifo_errors++;
            goto tx_error;
        }

        if (skb->protocol == htons(ETH_P_IP)) {
            // ========= with dst unset, dst is taken from the route's next hop;
            // ========= this route was looked up and attached to the skb
            // ========= before the packet reached the tunnel device
            rt = skb_rtable(skb);
            dst = rt_nexthop(rt, inner_iph->daddr);
        }
#if IS_ENABLED(CONFIG_IPV6)
        else if (skb->protocol == htons(ETH_P_IPV6)) {
......
        }
......

    init_tunnel_flow(&fl4, protocol, dst, tnl_params->saddr,
                     tunnel->parms.o_key, RT_TOS(tos), tunnel->parms.link);

    if (ip_tunnel_encap(skb, tunnel, &protocol, &fl4) < 0)
        goto tx_error;

    // =============== Here, and in ip_route_output_key() below, fl4 is filled in.
    // =============== The outer source IP of the tunnel is the address of the
    // =============== egress interface of the route found for the outer
    // =============== destination IP (the next hop).
    rt = connected ? dst_cache_get_ip4(&tunnel->dst_cache, &fl4.saddr) : NULL;

    if (!rt) {
        rt = ip_route_output_key(tunnel->net, &fl4);
......
    }

    // ======== The local and remote IPs are now known; iptunnel_xmit() uses
    // ======== them to build the outer IP header and sends the packet
    iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl,
                  df, !net_eq(tunnel->net, dev_net(dev)));
    return;
......
}
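The two route lookups in this function can be reproduced from user space with "ip route get". The sketch below uses the example addresses from host1. On a host configured as described, the first query would be expected to resolve via 192.168.122.12 on tunl0, and the second to resolve on eth1 with a source address. Both are expectations, not captured output. The commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: addresses are the example values for host1 (assumption that
# the routes from the text are installed).
# Prints each command by default; run with DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

show_lookups() {
    # Lookup 1: the inner route attached to the skb before it reaches
    # tunl0; its next hop becomes the outer destination (rt_nexthop).
    run ip route get 12.1.1.1 from 11.1.1.1
    # Lookup 2: the outer route for that next hop, which also selects
    # the outer source address (ip_route_output_key).
    run ip route get 192.168.122.12
}

show_lookups
```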

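In fact this is not limited to IP tunnels; other tunnel types have a similar mechanism. VXLAN, for example, is a layer 2 tunnel that forwards on MAC addresses rather than IP addresses. It too can be created without a remote address, with the remote specified per entry in the layer 2 FDB, so that a single tunnel interface reaches multiple peers.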
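A minimal sketch of the VXLAN variant follows. The device name vxlan100, VNI 100, UDP port 4789, and the peer MAC address are all assumptions chosen for illustration. The all-zero MAC entries are the entries VXLAN uses for flooding broadcast and unknown unicast traffic, and the last entry maps one known MAC to one peer. As before, the commands are only printed unless DRY_RUN=0 is set and the script runs as root.

```shell
#!/bin/sh
# Sketch only: device name, VNI, port, and MAC address are assumptions.
# Prints each command by default; run with DRY_RUN=0 as root to apply.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

setup_vxlan() {
    # No "remote" or "group": peers come from FDB entries instead.
    run ip link add vxlan100 type vxlan id 100 dstport 4789 \
        local 192.168.122.11 nolearning
    run ip link set vxlan100 up
    # Flood list: one all-zero MAC entry per remote peer.
    run bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 192.168.122.12
    run bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst 192.168.122.13
    # Unicast entry: a known MAC (made up here) behind a specific peer.
    run bridge fdb append 52:54:00:00:00:12 dev vxlan100 dst 192.168.122.12
}

setup_vxlan
```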

码农公寓
