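CESSNA: Resilient Edge-Computing
This is a paper from the SIGCOMM 2018 Workshop on Mobile Edge Communications (MECOMM).
I translated this paper. Because time was short and my English is limited, errors are inevitable; corrections from readers are welcome.
The paper and its translation are for study purposes only. If anything here is inappropriate, please contact me and I will remove it.
The paper has five co-authors: Yotam Harchol, Aisha Mushtaq, and James McCauley of UC Berkeley; Aurojit Panda of NYU and ICSI; and Scott Shenker of UC Berkeley and ICSI.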
ABSTRACT
The introduction of computational resources at the network edge has moved us from a Client-Server model to a Client-Edge-Server model. By offloading computation from clients and/or servers, this approach can reduce response latency, backbone bandwidth, and computational requirements on clients. While this is an attractive paradigm for many applications, particularly 5G mobile networks and IoT devices, it raises the question of how one can design such a client-edge-server system to tolerate edge failures and client mobility. The key challenge is to ensure correctness when the edge processing is stateful (so the processing depends on state it has previously seen from the client and/or server). In this paper we propose an initial design for meeting this challenge called Client-Edge-Server for Stateful Network Applications (CESSNA).
1 INTRODUCTION
The recent introduction of compute and storage resources at the network edge allows service providers to offer lower latency and higher throughput to geographically nearby content and computation, and this in turn allows applications such as sensors and IoT devices to reduce their upstream bandwidth requirements by pre-processing data at the edge.
Network applications have long been based on the client-server paradigm, where a stateful server (or set of servers) provides services to multiple clients. While consistency issues of the client-server model have been thoroughly studied, the addition of a stateful edge processor in between the two complicates the consistency problem.
To illustrate the problem, consider a simple example of an edge that serves as a packet counter: The server is not interested in getting every packet from the client, but only in the total number of packets the client has emitted. The edge is thus holding the counter value and if the edge fails, even if another edge is brought up immediately, the state is lost and the server never gets an accurate count of the emitted messages.
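The packet-counter example can be made concrete with a short simulation. The following Python sketch is illustrative only: the class `CountingEdge` and the helper `run_session` are hypothetical names, not part of CESSNA, and an edge failure is modeled simply by replacing the edge object mid-session.

```python
class CountingEdge:
    """Edge that reports only a running packet count, not the packets themselves."""

    def __init__(self):
        self.count = 0  # per-session state, held only in the edge's memory

    def on_client_packet(self, packet: bytes) -> None:
        self.count += 1

    def report_to_server(self) -> int:
        return self.count


def run_session(packets, fail_after=None):
    """Return the count the server sees if the edge is replaced before packet index `fail_after`."""
    edge = CountingEdge()
    for i, packet in enumerate(packets):
        if i == fail_after:
            edge = CountingEdge()  # replacement edge starts with empty state
        edge.on_client_packet(packet)
    return edge.report_to_server()


packets = [b"p"] * 10
no_failure = run_session(packets)                  # 10
with_failure = run_session(packets, fail_after=4)  # 6: the first four packets are lost
```

Without recovery support, the replacement edge undercounts because the first four packets were recorded only in the memory of the failed edge.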
This is merely an illustration of a general problem. In this paper we first formally articulate the edge consistency problem, and then propose a general purpose framework for client-edge-server applications that provides strong consistency guarantees as described later in this paper.
Our computational model, elaborated in Section 2, is of a computationally-capable edge that allows offloading of computation from the server, the client, or both (and can receive messages from both clients and servers). We assume that the edge is stateful but keeps state on a per-client basis: that is, a new edge process (or set of processes) is instantiated to handle each client-server session. Thus, the consistency we wish to provide for the edge would guarantee that the edge’s state always correctly reflects both client’s and server’s inputs (relative to this session) for as long as this session is active. Moreover, we only care about preserving consistency in the case of an edge failure; if the client or server fails, we assume the session terminates implicitly. However, note that nothing in our design precludes the use of replication or other techniques to increase the resiliency of the server (or even the client).
Given this model, we would want a framework that will provide client-edge-server applications with these consistency guarantees, even though the edge may arbitrarily fail, and clients may arbitrarily move between edges. Our design aims at no or minimal modification to the source code of existing applications.
Our proposed design is built in two layers, for two different types of edge recovery: local recovery, and remote recovery. Local recovery can be used when the failed edge and the recovered edge are physically close to each other, for example, under the same ToR switch. In this case we use a mechanism similar to the one used in [12], though we strip it down to a much-simplified approach, which is enough due to the client-specific nature of edge state that we consider.
Remote recovery refers to the case when the two edges are far from each other, and can also apply to the client mobility case. In this case, the client and the server cooperate with the newly provisioned edge to quickly restore its state and continue the session from where it has stopped.
We make the following observations in designing such a framework:
- The edge receives messages from two different parties, and its state may be dependent on the exact ordering of these messages. Thus, for any process of reconstruction of the state, we must have this ordering available.
- In order to allow such a reconstruction of the edge’s state, each endpoint (client, server) should keep a copy of the messages it sent to the edge, at least until the edge can guarantee these copies are not needed anymore.
- Since the edge is not reliable, the ordering of incoming messages must be stored elsewhere. One option is to send it to either the client or the server. Since each outgoing message from the edge, to either the client or the server, may indicate some state change at the edge, each such message should be accompanied with the incremental addition to the edge’s total incoming message ordering. Another option is to store it remotely. We intermix the two options in our design as we describe later.
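The third observation, that each outgoing message carries the incremental addition to the edge's incoming-message ordering, can be sketched as follows. This is a minimal illustration under assumptions: the class `OrderingEdge`, the `("C", seq)` / `("S", seq)` tags for client and server messages, and the dictionary message format are all hypothetical, and the sketch tracks a single "already reported" position regardless of which endhost receives the message.

```python
from dataclasses import dataclass, field


@dataclass
class OrderingEdge:
    order: list = field(default_factory=list)  # arrival order of incoming messages
    reported: int = 0                          # number of entries already shipped out

    def on_message(self, source: str, seq: int) -> None:
        """Record that message `seq` from `source` ("C" or "S") was processed next."""
        self.order.append((source, seq))

    def emit(self, payload: str) -> dict:
        """Build an outgoing message carrying only the not-yet-reported ordering."""
        delta = self.order[self.reported:]
        self.reported = len(self.order)
        return {"payload": payload, "order_delta": delta}


edge = OrderingEdge()
edge.on_message("C", 1)
edge.on_message("S", 1)
first = edge.emit("a")   # order_delta: [("C", 1), ("S", 1)]
edge.on_message("C", 2)
second = edge.emit("b")  # order_delta: [("C", 2)]

# Concatenating the deltas held by the endhosts rebuilds the full ordering.
rebuilt = first["order_delta"] + second["order_delta"]
```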
Of course, actually storing all outgoing messages forever may be prohibitive for most applications in terms of memory, and may have significant and negative performance implications in cases of failure. Thus, we use periodic snapshots in order to limit the size of the required buffers, and reduce the time for session reconstruction. We describe the design in detail in Section 3.
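To illustrate how periodic snapshots bound the endhost buffers, here is a small Python sketch of an endhost message log that is truncated when a snapshot arrives. The class `EndhostLog` and the `covered_up_to` parameter are assumptions made for illustration; the paper does not specify how a snapshot tells an endhost which of its messages are already reflected.

```python
class EndhostLog:
    """Keeps copies of sent messages until a snapshot makes them unnecessary."""

    def __init__(self):
        self.next_seq = 1
        self.pending = {}  # seq -> message not yet covered by any snapshot

    def record(self, message: bytes) -> int:
        """Log an outgoing message and return its sequence number."""
        seq = self.next_seq
        self.pending[seq] = message
        self.next_seq += 1
        return seq

    def on_snapshot(self, covered_up_to: int) -> None:
        """Drop every logged message that the snapshot already reflects."""
        for seq in [s for s in self.pending if s <= covered_up_to]:
            del self.pending[seq]


log = EndhostLog()
for i in range(5):
    log.record(f"msg-{i}".encode())
log.on_snapshot(covered_up_to=3)
remaining = sorted(log.pending)  # [4, 5]
```

Only messages sent after the snapshot need to be replayed on recovery, which is also why snapshots shorten session reconstruction.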
2 COMPUTATIONAL MODEL
Having computation at the edge allows one to (i) offload computation from the client (so it can be weak and/or low-powered), and/or (ii) offload computation from the server (so that responses to the client can have lower latency), and/or (iii) reduce bandwidth to the server (by doing preprocessing at the edge). Thus, the edge can be seen as extending the power of the client and/or extending the reach of the server. As a result, one cannot think of the edge as merely splitting the client code, or merely splitting the server code; it could involve a bit of both.
We assume the purpose of a system is to process inputs coming from clients and the server. This processing can result in packets being emitted to the server, or to the client (or both). Thus, the logical model is one of clients sending input to the system, and perhaps receiving responses, or updates from the edge, presumably based on input from the server.
Servers. We assume that clients communicate with the system by logically sending messages to a server. This is done via the edge. The backend system handles all issues of replication and recovery for these servers and any other backend processing. There are many options here, and we leave this up to the application designer. Similar to the current client-server model, a single server can service several (or all) clients, can store data in a database, and allow clients to coordinate among each other. We place no restrictions on how clients communicate and coordinate through a server, and only require that the server be able to play back messages.
Clients. We assume that clients do not depend on detailed timing information between messages or on latency of message response. Beyond this we allow clients to perform arbitrary processing, and to depend on arbitrary input including input from external sensors, video cameras, game controllers, etc. Finally, we assume that clients can be mobile, and as a result they might connect to different edges over time. Thus, applications may not assume that a particular set of clients is connected to a common edge. For applications where such aggregation is desirable and where clients are immobile (such as applications which aggregate inputs from multiple sensors [1]), we treat the set of clients whose input is being aggregated as a single logical client.
The Edge. Clients send a series of messages to the edge, which in turn can send messages to the client and messages to the backend server. The edge also receives messages from the backend server, and can use those in its processing of messages (for example, these messages may cause state changes in the edge). We assume that the edge application correctly and consistently handles inputs during such state changes in the absence of failures (correct behavior in the presence of failures will then be provided by CESSNA automatically). In particular, we require that the edge application be designed so that state updates are atomic and a single message (or packet) is processed using only one version of the state.
2.1 Problem Statement
Given the above assumptions, we would like to design a consistency framework for a stateful edge, such that in case of a failure of an edge instance, another instance can be provisioned and the state is correctly recovered.
The correctness of the recovery process is defined such that the recovered edge continues to process input messages and emit output messages exactly the same as the original edge would have. If there was more than one plausible outcome for the original edge at the time of the failure, the outcome of the recovered edge must be one of these plausible outcomes.
3 FRAMEWORK DESIGN
The design of our framework is illustrated in Figure 1. In this section we discuss the design of each component in the figure.
Figure 1: The design of our framework.
3.1 Edge Design
The main contribution of this work is a design for an edge runtime environment that allows seamless local and remote failover of edge applications, while preserving the correctness of the state at the edge, such that the entire failover process is transparent to the client and the server applications.
3.1.1 Runtime Engine
Edge applications are software, or more precisely, processes. They can run in virtual machines or containers, with some hypervisor underneath. Our design does not require any specific runtime engine or hypervisor. We only require it to provide the following features:
- Generic software: We would like the edge to be able to run generic software as much as possible, without limiting it to specific programming languages or uncommon libraries.
- Efficiency: The runtime engine should allow efficient running of multiple applications on top of it, in parallel.
- Snapshotting: The runtime engine should be able to take snapshots of running instances and to restore an instance given such a snapshot.
Examples of existing products that provide these features include Docker [6], KVM [3], and VMware [13].
3.1.2 Edge Storage
Each edge runtime has some shared storage capabilities. This storage may be used by multiple instances of the same edge application when multiple client sessions benefit from sharing data – a prime example of this being content caching. The storage can be shared with instances on the same physical server, same rack, etc.
Note that the shared storage must not be used for state-related storage. The state of each instance must be managed in memory for each instance, as snapshots do not include data from the shared storage. The state of an edge application must not be dependent on the presence, or the lack of presence, of a specific item in the shared storage.
3.2 Edge Recovery
Our recovery model assumes that the edge application is fully stateful and that its state is a function of both the client and the server. If the edge application is stateless, or if its state is only a function of one side of the communication (client or server), then the recovery model can be much simplified. We discuss these simplifications after presenting the fully stateful model.
The recovery model has two layers: local recovery, which refers to the case when the replacement edge has relatively fast storage shared with the failed edge, and remote recovery, which refers to the case when the two edges are distant from each other. We make this distinction since the local case can be solved using existing techniques as we discuss below, while for the remote case a more complex model is required.
3.2.1 Local Recovery
In order to provide fast local failover, we use a simplified version of the technique presented in FTMB [12]. In FTMB, the framework is able to recover arbitrary network functions by logging and sequencing their incoming and outgoing packets, and their nondeterministic decisions, together with periodic snapshots.
In our case the situation is simpler: we treat sessions as logically independent entities, so we do not need to account for sequencing information across flows or sessions. Furthermore, we assume that the edge application is deterministic and produces consistent output given identical inputs, and thus do not log outgoing packets.
Because of the reasons above we also do not need to pause the application while taking a snapshot. If the underlying runtime supports live snapshotting, we can simply store the last position in the incoming packets log, then take a snapshot, and store the two together. Upon recovery, our framework would restore the snapshot and replay packets from the stored position. Since we assume traffic is sent over TCP, the application will ignore packets it has seen since the time the log position was logged until the snapshot was actually taken, as such packets simply appear to be delayed duplicates.
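The local recovery procedure (record the log position, take a live snapshot, then on recovery restore the snapshot and replay from the stored position) can be sketched in Python as below. This is a simplified model under stated assumptions: `EdgeApp` and `LocalRecovery` are hypothetical names, the snapshot is a deep copy of an in-memory object rather than a container checkpoint, and an explicit sequence-number check in the application stands in for the duplicate suppression that the text attributes to TCP.

```python
import copy


class EdgeApp:
    """Deterministic edge application: sums the values it receives."""

    def __init__(self):
        self.total = 0
        self.last_seq = 0  # highest sequence number already applied

    def process(self, seq, value):
        if seq <= self.last_seq:
            return  # looks like a delayed duplicate, so ignore it
        self.total += value
        self.last_seq = seq


class LocalRecovery:
    def __init__(self):
        self.log = []      # incoming packets as (seq, value)
        self.saved = None  # (log position, snapshot), stored together

    def deliver(self, app, seq, value):
        self.log.append((seq, value))
        app.process(seq, value)

    def checkpoint(self, app, log_position):
        self.saved = (log_position, copy.deepcopy(app))

    def recover(self):
        position, snapshot = self.saved
        app = copy.deepcopy(snapshot)
        for seq, value in self.log[position:]:
            app.process(seq, value)  # packets already in the snapshot are skipped
        return app


recovery, app = LocalRecovery(), EdgeApp()
recovery.deliver(app, 1, 10)
recovery.deliver(app, 2, 20)
position = len(recovery.log)        # log position stored first
recovery.deliver(app, 3, 30)        # arrives before the live snapshot completes
recovery.checkpoint(app, position)  # snapshot already reflects packet 3
recovery.deliver(app, 4, 40)
recovered = recovery.recover()      # replays packets 3 and 4; 3 is ignored
```

In this run the recovered application ends with the same total (100) as the original, even though packet 3 is both inside the snapshot and in the replayed portion of the log.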
In order to store the snapshots and the log, the system should have some local storage. This can be per physical machine, or per rack of multiple machines, for example. This snapshot storage is not to be logically confused with the shared storage discussed in Section 3.1.2, which is used for application data storage and sharing. Physically, they can be colocated.
3.2.2 Remote Recovery and Mobility
In the case when the client fails over to a remote edge, which does not share a packet logger and snapshot storage with the failed (or previous) edge, we delegate the responsibility for the recovery to the client and/or the server.
Upon taking a snapshot, the edge runtime stores it locally, but it also sends it to one or two of the endhosts (the client and/or the server). It is only necessary to send it to one of them for correct remote recovery, and we assume that for most applications it would make sense to only send it to the client (to allow scaling at the server). The snapshot is encrypted and signed by the edge, so the client cannot see its content or tamper with it.
In addition to the snapshot, the edge also sends the endhosts information that will later help a recovered edge to determine the order in which messages from both sides were processed by the failed edge. This is done such that every packet emitted by the edge, to either the client or the server, contains the most up-to-date information on this ordering.
Upon recovery, in order to restart the session, the client and the server send the most up-to-date snapshot they received (if any), their outgoing message logs, and their knowledge of the ordering discussed above. The newly provisioned edge is then restored to the given snapshot, and then it orders the messages given by the endhosts based on the ordering they provided. This process is illustrated in Figure 2.
Figure 2: Illustration of the remote recovery process. Circled numbers indicate the order of events.
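The remote recovery steps can be illustrated with a short Python sketch: the new edge starts from the snapshot state and replays the messages logged at the endhosts in the order the failed edge reported. The function `remote_recover`, the `("C", seq)` / `("S", seq)` ordering entries, and the toy `apply` function are assumptions for illustration; the toy state is a number that client messages add to and server messages multiply, chosen so that the interleaving order changes the result.

```python
def remote_recover(snapshot_state, client_log, server_log, ordering, apply):
    """Rebuild edge state from a snapshot, the endhost logs, and the reported ordering."""
    logs = {"C": client_log, "S": server_log}
    state = snapshot_state
    for source, seq in ordering:
        state = apply(state, source, logs[source][seq])
    return state


def apply(state, source, value):
    """Toy edge logic: client messages add, server messages multiply."""
    return state + value if source == "C" else state * value


client_log = {1: 2, 2: 4}                  # seq -> message logged at the client
server_log = {1: 3}                        # seq -> message logged at the server
ordering = [("C", 1), ("S", 1), ("C", 2)]  # order processed by the failed edge

recovered = remote_recover(1, client_log, server_log, ordering, apply)
# ((1 + 2) * 3) + 4 = 13

wrong_order = remote_recover(1, client_log, server_log,
                             [("C", 1), ("C", 2), ("S", 1)], apply)
# (1 + 2 + 4) * 3 = 21, a different state
```

The two results differ, which is why the ordering information has to survive the edge failure along with the messages themselves.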
3.2.3 Stateless or Semi-Stateful Edge
The recovery mechanisms described so far assume a fully stateful edge. However these mechanisms can be simplified in the following cases:
If the edge application is completely stateless, we only need to be able to replay messages from the client and the server, which were already sent, but have not yet been processed. Thus, we only need the replay mechanisms described above (either using messages logged at the client, for a remote recovery, or using the local logger, in a local recovery).
In the case of a semi-stateful edge application, where its state is only a function of one of the endhosts (client or server), we also need to have state snapshots, in addition to the replay mechanism that is required for the stateless case. The replay from the endhost on which state is dependent should be from the first message after the last snapshot was taken. We do not need the interleaving ordering of client and server messages.
Based on the nature of the application, it can declare whether it is stateless, semi-stateful, or stateful, and the framework can then adjust its recovery mechanisms for this application accordingly.
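The mapping from an application's declared statefulness to the recovery mechanisms it needs can be summarized in a few lines of Python. The enum `Mode` and the function `required_mechanisms` are hypothetical names used only to restate the three cases above; they are not an API of the framework.

```python
from enum import Enum


class Mode(Enum):
    STATELESS = "stateless"
    SEMI_STATEFUL = "semi-stateful"
    STATEFUL = "stateful"


def required_mechanisms(mode: Mode) -> set:
    """Return the recovery mechanisms needed for a declared statefulness mode."""
    mechanisms = {"replay"}              # every mode replays unprocessed messages
    if mode in (Mode.SEMI_STATEFUL, Mode.STATEFUL):
        mechanisms.add("snapshots")      # state must be restored before replay
    if mode is Mode.STATEFUL:
        mechanisms.add("ordering")       # client/server interleaving matters
    return mechanisms
```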
3.3 Discovery
The client should be able to find the correct edge to connect to, based on the application it is connecting to, its location, etc. In our design, there is a discovery service that provides this information. Also, once the client is connected to an edge (e.g., the default one), this edge can provide it with alternative edge addresses, so in case of a failure, the client does not have to use the discovery service again but instead can immediately contact an alternative edge.
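The failover behavior described here, where a client tries the alternative edges it was given instead of going back to the discovery service, might look like the following Python sketch. The function `connect_with_fallback`, the `try_connect` callback, and the example addresses are placeholders, not part of the paper's design.

```python
def connect_with_fallback(default_edge, alternatives, try_connect):
    """Return the first reachable edge address, or None if every attempt fails."""
    for address in [default_edge, *alternatives]:
        if try_connect(address):
            return address
    return None


# Simulate a failure of the default edge; only edge-b is reachable.
alive = {"edge-b.example"}
chosen = connect_with_fallback(
    "edge-a.example",
    ["edge-b.example", "edge-c.example"],
    lambda address: address in alive,
)
```

Here `chosen` is `"edge-b.example"`; a return value of `None` would be the point at which a client falls back to the discovery service.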
3.4 The CESSNA Protocol
Each message sent from each of the entities in our design should be sequenced, so that we could later refer to it in the ordering described above for remote recovery (local recovery uses TCP sequencing). Packets going out of the edge also contain ordering information to be stored at the endhosts.
In order to facilitate that, we design a simple layer-7 protocol, and we wrap all packets with its header. This header may contain just a sequence number (for messages going out of the hosts), or a sequence number and ordering information (for messages going out of the edge). This header precedes any layer-7 payload in packets.
In the current version of the CESSNA protocol, the header for messages from endhosts to the edge is 16 bytes long. The header for messages from the edge is at least 20 bytes long, depending on the frequency of messages emitted by the edge. Each message from the edge contains the differential logging of packets received by the edge. Thus, the more messages emitted by the edge, the shorter the header is. We also note that the header used by our prototype is optimized for simplicity and not size; its size could be reduced by encoding the current information using variable-length integers or by leveraging application-specific properties.
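The paper gives only the header sizes (16 bytes from endhosts, at least 20 bytes from the edge), not the field layout. The Python sketch below shows one hypothetical layout that matches those sizes, purely to illustrate how a fixed part plus a variable-length ordering delta could be encoded; the session-id field, the entry count, and the 9-byte ordering entries are all assumptions.

```python
import struct

# Hypothetical layouts; "!" selects network byte order with no padding.
HOST_HEADER = struct.Struct("!QQ")   # session id, sequence number: 16 bytes
EDGE_PREFIX = struct.Struct("!QQI")  # same fields + number of ordering entries: 20 bytes
ORDER_ENTRY = struct.Struct("!cQ")   # source tag (b"C" or b"S") + sequence number: 9 bytes


def pack_host_header(session_id, seq):
    return HOST_HEADER.pack(session_id, seq)


def pack_edge_header(session_id, seq, order_delta):
    """Encode the fixed prefix followed by one entry per newly processed message."""
    header = EDGE_PREFIX.pack(session_id, seq, len(order_delta))
    for source, msg_seq in order_delta:
        header += ORDER_ENTRY.pack(source, msg_seq)
    return header


def unpack_edge_header(data):
    """Return (session_id, seq, order_delta, payload_offset)."""
    session_id, seq, count = EDGE_PREFIX.unpack_from(data, 0)
    offset = EDGE_PREFIX.size
    delta = []
    for _ in range(count):
        source, msg_seq = ORDER_ENTRY.unpack_from(data, offset)
        delta.append((source, msg_seq))
        offset += ORDER_ENTRY.size
    return session_id, seq, delta, offset


delta = [(b"C", 7), (b"S", 3)]
edge_header = pack_edge_header(1, 42, delta)
```

Under this layout an edge message with no new ordering information carries the 20-byte minimum, and the header grows with the number of packets received since the previous outgoing message, which is consistent with the observation that an edge emitting more often sends shorter headers.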
3.5 Client / Server Design
There is no actual difference between a client and a server in our design, except for their possibly different sets of preferences, for example whether or not to receive snapshots from the edge. An endhost, whether a client or a server, is simply an application running on top of our host platform, which manages the communication with the edge.
4 INITIAL IMPLEMENTATION
We have begun to implement the design described in this paper. In our implementation, we use Docker as the runtime engine for the edge. We briefly examine our prototype in the following subsections.
4.1 CESSNA Library
We design a shared library to be used by applications both at the endhosts and at the edge, to take care of the serialization and deserialization of messages using the CESSNA protocol, logging messages at the endhosts, and so on.
The runtime library is implemented in C++, and it overrides the Linux system calls for socket handling, such as connect, accept, send, recv, close (and several others). The library is loaded dynamically using the LD_PRELOAD environment variable so that applications need no modification in order to use it. This, for example, also enables Java and Python applications to use the library with no modification (as JVM and the Python interpreter use the OS socket library underneath).
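As an illustration of the LD_PRELOAD mechanism, the Python sketch below builds an environment that preloads a shared library for an unmodified application. The helper `preload_env` is a hypothetical name, and the library path and application shown in the usage comment are placeholders; the paper does not name the library file.

```python
import os


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with `library_path` added to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # The dynamic linker accepts a space- or colon-separated list of libraries.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


# Hypothetical usage (library path and app.py are placeholders):
#   import subprocess
#   subprocess.run(["python3", "app.py"], env=preload_env("/opt/cessna/libcessna.so"))
```

Because the interposition happens in the dynamic linker, the launched program's own code is untouched, which is what allows unmodified applications to use the library.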
4.2 Host Agent
Our host agent is implemented in Python. It is responsible for all slow path tasks at the hosts: logging outgoing packets, tracking the order of packets reported by the edge, receiving snapshots, and restarting sessions in case of a failure. The agent communicates with its corresponding edge runtimes out-of-band, in parallel to the application sessions.
4.3 Edge Platform
The edge platform is based on a Python agent that runs adjacent to the Docker engine to manage snapshots and communication with the host agents.
4.4 CESSNA Over HTTP
In addition to the described implementation, which requires the usage of the CESSNA library and the host agent on the client, we are also working on a version of CESSNA specifically suited for web applications (running over HTTP/HTTPS) which does not require the installation of an additional CESSNA agent or usage of the CESSNA library at the client.
CESSNA over HTTP is an extension to the CESSNA edge and server platforms that implements the client features in JavaScript, so that the client can participate in the backup and recovery process just as in the original design, with the web browser handling the entire logic by simply running JavaScript code given by the edge or the server. This code is responsible for logging outgoing requests, storing ordering information received from the edge, and managing snapshots. It is also responsible for recovery of sessions.
5 APPLICABILITY & DISCUSSION
Many applications that benefit from the Client-Edge-Server model do not have a stateful edge, or do not have strong consistency requirements on their edge state. For example, an audio/video conferencing application may benefit from an edge which can reflect streams to other clients on the same edge, and combine and transcode data being sent “upstream” to remote clients. If an edge is transcoding a video frame for a client and the client migrates to another edge, it may be acceptable to simply drop that frame (indeed, if the time taken by the infrastructure to provision a new edge and/or the time taken by the client to establish a session with the new edge is greater than the duration of a frame, there is no point in doing otherwise).
For other applications, however, maintaining state consistency during cases of failure or migration may be vital. For instance, one might imagine an edge-based system which uses deep packet inspection (DPI) to provide network-based security. DPI systems typically must track the state of various protocols; if such an application does not maintain this state perfectly under failover, connections may be aborted or policy violations may occur. It is applications in this class – stateful applications which benefit from consistency during failure or migration – for which CESSNA is ideally suited. This raises two points.
First, CESSNA does not force an application to use stronger guarantees than it needs. Many real-world Client-Edge-Server applications may contain several components, some that require consistency guarantees and some that do not. One can choose to use CESSNA for only the portion of the application that falls into the former class.
Second, it is certainly possible to write applications with seamless failover without CESSNA. However, doing this on a per-application basis (and, in particular, getting it right) is typically nontrivial. The benefit of CESSNA is that it factors out this aspect of the design and provides a general solution that applications can just use.