Repost: [Original translation] Constructing and matching binaries in Erlang (Efficiency Guide)

Reposted from: http://www.cnblogs.com/futuredo/archive/2012/10/19/2727204.html

Constructing and matching binaries

Erlang/OTP R15B02

In R12B, the most natural way to write binary construction and matching is now significantly faster than in earlier releases.

To construct a binary, you can simply write

DO (in R12B) / REALLY DO NOT (in earlier releases)

my_list_to_binary(List) ->
    my_list_to_binary(List, <<>>).

my_list_to_binary([H|T], Acc) ->
    my_list_to_binary(T, <<Acc/binary,H>>);
my_list_to_binary([], Acc) ->
    Acc.

(%% Takes one list element at a time and appends it to the binary %%)

In releases before R12B, Acc would be copied in every iteration. In R12B, Acc will be copied only in the first iteration and extra space will be allocated at the end of the copied binary. In the next iteration, H will be written into the extra space. When the extra space runs out, the binary will be reallocated with more extra space.

The extra space allocated (or reallocated) will be twice the size of the existing binary data, or 256, whichever is larger.
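
The sizing rule above can be written down as a one-line function. This is only a sketch of the arithmetic as stated in the text; the module name growth_sketch and the function reserved_size/1 are made up for illustration and are not part of the run-time system.

```erlang
-module(growth_sketch).
-export([reserved_size/1]).

%% Hypothetical helper (not an OTP function): storage reserved when a binary
%% holding Size bytes is allocated or reallocated for appending, following
%% the rule in the text: twice the existing data, or 256, whichever is larger.
reserved_size(Size) when is_integer(Size), Size >= 0 ->
    max(2 * Size, 256).
```

For a 1-byte binary this gives 256; for a 500-byte binary it gives 1000.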

The most natural way to match binaries is now the fastest:

DO (in R12B)

my_binary_to_list(<<H,T/binary>>) ->
[H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].

1  How binaries are implemented

(%% This part covers the internal mechanism; it is hard to follow, and the terminology is easy to confuse in translation %%)

Internally, binaries and bitstrings are implemented in the same way. In this section, we will call them binaries since that is what they are called in the emulator source code.

There are four types of binary objects internally. Two of them are containers for binary data and two of them are merely references to a part of a binary.

(%% Two are data containers: refc binaries and heap binaries %%)

The binary containers are called refc binaries (short for reference-counted binaries) and heap binaries.

Refc binaries consist of two parts: an object stored on the process heap, called a ProcBin, and the binary object itself stored outside all process heaps.

The binary object can be referenced by any number of ProcBins from any number of processes; the object contains a reference counter to keep track of the number of references, so that it can be removed when the last reference disappears.

All ProcBin objects in a process are part of a linked list, so that the garbage collector can keep track of them and decrement the reference counters in the binary when a ProcBin disappears.

Heap binaries are small binaries, up to 64 bytes, that are stored directly on the process heap. They will be copied when the process is garbage collected and when they are sent as a message. They don't require any special handling by the garbage collector.

There are two types of reference objects that can reference part of a refc binary or heap binary. They are called sub binaries and match contexts.

(%% Two are data references: sub binaries and match contexts %%)

A sub binary is created by split_binary/2 and when a binary is matched out in a binary pattern. A sub binary is a reference into a part of another binary (refc or heap binary, never into another sub binary). Therefore, matching out a binary is relatively cheap because the actual binary data is never copied.
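
One way to observe that a sub binary refers into a larger binary instead of copying it is to compare byte_size/1 with binary:referenced_byte_size/1. The following is a minimal sketch, assuming the binary module (available since R14) is present; the module name sub_binary_sketch and the sizes chosen are illustrative only.

```erlang
-module(sub_binary_sketch).
-export([demo/0]).

%% Returns {size of the part, size of the binary the part refers into,
%% referenced size after copying the part into a binary of its own}.
demo() ->
    Big = binary:copy(<<"x">>, 1000),        % 1000 bytes: too large for a heap binary
    {Head, _Rest} = split_binary(Big, 10),   % Head refers to the first 10 bytes of Big
    Detached = binary:copy(Head),            % copies those 10 bytes into a new binary
    {byte_size(Head),
     binary:referenced_byte_size(Head),
     binary:referenced_byte_size(Detached)}.
```

Head is 10 bytes long, but the referenced size reported for it should be that of the whole underlying binary, so holding on to Head keeps all of Big alive. binary:copy/1 is the usual way to detach a small part when that matters.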

A match context is similar to a sub binary, but is optimized for binary matching; for instance, it contains a direct pointer to the binary data. For each field that is matched out of a binary, the position in the match context will be incremented.

In R11B, a match context was only used during a binary matching operation.

In R12B, the compiler tries to avoid generating code that creates a sub binary, only to shortly afterwards create a new match context and discard the sub binary. Instead of creating a sub binary, the match context is kept.

The compiler can only do this optimization if it can know for sure that the match context will not be shared. If it would be shared, the functional properties (also called referential transparency) of Erlang would break.

(%% I do not understand how sharing of the match context affects the optimization %%)

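(%%

  Ruan Yifeng's blog post 《函数式编程初探》 ("A First Look at Functional Programming") explains referential transparency: a function's result does not depend on external variables or "state", only on its input arguments; whenever the arguments are the same, the call returns the same value. In other kinds of languages, a function's return value often depends on system state and differs from one state to another. That is called "referential opacity", and it makes a program's behavior hard to observe and understand.

  Source: http://www.ruanyifeng.com/blog/2012/04/functional_programming.html

%%)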

2  Constructing binaries

In R12B, appending to a binary or bitstring

<<Binary/binary, ...>>
<<Binary/bitstring, ...>>

is specially optimized by the run-time system. Because the run-time system handles the optimization (instead of the compiler), there are very few circumstances in which the optimization will not work.

To explain how it works, we will go through this code

Bin0 = <<0>>,                    %% 1
Bin1 = <<Bin0/binary,1,2,3>>, %% 2
Bin2 = <<Bin1/binary,4,5,6>>, %% 3
Bin3 = <<Bin2/binary,7,8,9>>, %% 4
Bin4 = <<Bin1/binary,17>>, %% 5 !!!
{Bin4,Bin3} %% 6

line by line.

The first line (marked with the %% 1 comment), assigns a heap binary to the variable Bin0.

The second line is an append operation. Since Bin0 has not been involved in an append operation, a new refc binary will be created and the contents of Bin0 will be copied into it. The ProcBin part of the refc binary will have its size set to the size of the data stored in the binary, while the binary object will have extra space allocated. The size of the binary object will be either twice the size of Bin0 or 256, whichever is larger. In this case it will be 256.

It gets more interesting in the third line. Bin1 has been used in an append operation, and it has 252 bytes of unused storage at the end (Bin1 occupies 4 of the 256 bytes allocated), so the three new bytes will be stored there.

Same thing in the fourth line. There are 249 bytes left, so there is no problem storing another three bytes.

But in the fifth line something interesting happens. Note that we don't append to the previous result in Bin3, but to Bin1. We expect that Bin4 will be assigned the value <<0,1,2,3,17>>. We also expect that Bin3 will retain its value (<<0,1,2,3,4,5,6,7,8,9>>). Clearly, the run-time system cannot write the byte 17 into the binary, because that would change the value of Bin3 to <<0,1,2,3,17,5,6,7,8,9>>.

(%% 17 would land in the fifth position, directly after the four bytes of Bin1, overwriting the 4 %%)

What will happen?

The run-time system will see that Bin1 is the result from a previous append operation (not from the latest append operation), so it will copy the contents of Bin1 to a new binary and reserve extra storage and so on. (We will not explain here how the run-time system can know that it is not allowed to write into Bin1; it is left as an exercise to the curious reader to figure out how it is done by reading the emulator sources, primarily erl_bits.c.)
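
The walk-through can be checked from the outside by wrapping the six lines in a function and asserting the values. This is a sketch only: the module name append_sketch is made up, the assertions verify the visible values and not how the storage is managed, and a compiler is free to evaluate such constant expressions at compile time.

```erlang
-module(append_sketch).
-export([demo/0]).

demo() ->
    Bin0 = <<0>>,
    Bin1 = <<Bin0/binary,1,2,3>>,
    Bin2 = <<Bin1/binary,4,5,6>>,
    Bin3 = <<Bin2/binary,7,8,9>>,
    Bin4 = <<Bin1/binary,17>>,          % appends to Bin1, not to Bin3
    %% Both results keep the values the program text promises,
    %% whatever copying happened underneath.
    <<0,1,2,3,17>> = Bin4,
    <<0,1,2,3,4,5,6,7,8,9>> = Bin3,
    {Bin4, Bin3}.
```

Calling append_sketch:demo() returns {<<0,1,2,3,17>>,<<0,1,2,3,4,5,6,7,8,9>>}; a badmatch error would indicate that an append had overwritten shared data.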

Circumstances that force copying

The optimization of the binary append operation requires that there is a single ProcBin and a single reference to the ProcBin for the binary. The reason is that the binary object can be moved (reallocated) during an append operation, and when that happens the pointer in the ProcBin must be updated. If there would be more than one ProcBin pointing to the binary object, it would not be possible to find and update all of them.

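(%% As mentioned earlier, ProcBins are kept in a linked list; if one pointer can be updated, why can't the pointers in the other ProcBins be updated as well? %%)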

Therefore, certain operations on a binary will mark it so that any future append operation will be forced to copy the binary. In most cases, the binary object will be shrunk at the same time to reclaim the extra space allocated for growing.

When appending to a binary

Bin = <<Bin0,...>>

only the binary returned from the latest append operation will support further cheap append operations. In the code fragment above, appending to Bin will be cheap, while appending to Bin0 will force the creation of a new binary and copying of the contents of Bin0.

If a binary is sent as a message to a process or port, the binary will be shrunk and any further append operation will copy the binary data into a new binary. For instance, in the following code fragment

Bin1 = <<Bin0,...>>,
PortOrPid ! Bin1,
Bin = <<Bin1,...>> %% Bin1 will be COPIED

Bin1 will be copied in the third line.

The same thing happens if you insert a binary into an ets table or send it to a port using erlang:port_command/2.

Matching a binary will also cause it to shrink and the next append operation will copy the binary data:

Bin1 = <<Bin0,...>>,
<<X,Y,Z,T/binary>> = Bin1,
Bin = <<Bin1,...>> %% Bin1 will be COPIED

The reason is that a match context contains a direct pointer to the binary data.

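(%% I do not quite follow this explanation: having a pointer makes it possible to copy the data it points to, but why must the data be copied? %%)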

If a process simply keeps binaries (either in "loop data" or in the process dictionary), the garbage collector may eventually shrink the binaries. If only one such binary is kept, it will not be shrunk. If the process later appends to a binary that has been shrunk, the binary object will be reallocated to make place for the data to be appended.

3  Matching binaries

We will revisit the example shown earlier

DO (in R12B)

my_binary_to_list(<<H,T/binary>>) ->
[H|my_binary_to_list(T)];
my_binary_to_list(<<>>) -> [].

to see what is happening under the hood.

The very first time my_binary_to_list/1 is called, a match context will be created. The match context will point to the first byte of the binary. One byte will be matched out and the match context will be updated to point to the second byte in the binary.

In R11B, at this point a sub binary would be created. In R12B, the compiler sees that there is no point in creating a sub binary, because there will soon be a call to a function (in this case, to my_binary_to_list/1 itself) that will immediately create a new match context and discard the sub binary.

Therefore, in R12B, my_binary_to_list/1 will call itself with the match context instead of with a sub binary. The instruction that initializes the matching operation will basically do nothing when it sees that it was passed a match context instead of a binary.

When the end of the binary is reached and the second clause matches, the match context will simply be discarded (removed in the next garbage collection, since there is no longer any reference to it).

To summarize, my_binary_to_list/1 in R12B only needs to create one match context and no sub binaries. In R11B, if the binary contains N bytes, N+1 match contexts and N sub binaries will be created.

(%% The sub binaries are easy to understand, but why N+1 match contexts? Is the extra one for the non-existent tail at the end? %%)

In R11B, the fastest way to match binaries is:

DO NOT (in R12B)

my_complicated_binary_to_list(Bin) ->
    my_complicated_binary_to_list(Bin, 0).

my_complicated_binary_to_list(Bin, Skip) ->
    case Bin of
        <<_:Skip/binary,Byte,_/binary>> ->
            [Byte|my_complicated_binary_to_list(Bin, Skip+1)];
        <<_:Skip/binary>> ->
            []
    end.

This function cleverly avoids building sub binaries, but it cannot avoid building a match context in each recursion step. Therefore, in both R11B and R12B, my_complicated_binary_to_list/1 builds N+1 match contexts. (In a future release, the compiler might be able to generate code that reuses the match context, but don't hold your breath.)

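(%% I still do not quite understand how building sub binaries is avoided; is it because the compiler determines that a match context will be created and therefore drops the sub binary? %%)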

Returning to my_binary_to_list/1, note that the match context was discarded when the entire binary had been traversed. What happens if the iteration stops before it has reached the end of the binary? Will the optimization still work?

after_zero(<<0,T/binary>>) ->
T;
after_zero(<<_,T/binary>>) ->
after_zero(T);
after_zero(<<>>) ->
<<>>.

Yes, it will. The compiler will remove the building of the sub binary in the second clause

.
.
.
after_zero(<<_,T/binary>>) ->
after_zero(T);
.
.
.

but will generate code that builds a sub binary in the first clause

after_zero(<<0,T/binary>>) ->
T;
.
.
.

Therefore, after_zero/1 will build one match context and one sub binary (assuming it is passed a binary that contains a zero byte).
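
To try after_zero/1 in a shell, it can be wrapped in a module as below. The module name after_zero_sketch is made up for illustration; the comments restate the behavior described in the text and are not output from the compiler.

```erlang
-module(after_zero_sketch).
-export([after_zero/1]).

after_zero(<<0,T/binary>>) ->
    T;                        % T is returned, so a sub binary is built here
after_zero(<<_,T/binary>>) ->
    after_zero(T);            % T is only passed on, so no sub binary is needed
after_zero(<<>>) ->
    <<>>.
```

after_zero_sketch:after_zero(<<1,2,0,3,4>>) returns <<3,4>>, and after_zero_sketch:after_zero(<<1,2>>) returns <<>> because no zero byte is found.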

(%% The compiler makes this decision %%)

Code like the following will also be optimized:

all_but_zeroes_to_list(Buffer, Acc, 0) ->
{lists:reverse(Acc),Buffer};
all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
all_but_zeroes_to_list(T, Acc, Remaining-1);
all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).

The compiler will remove building of sub binaries in the second and third clauses, and it will add an instruction to the first clause that will convert Buffer from a match context to a sub binary (or do nothing if Buffer already is a binary).
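
A small usage sketch may make the three clauses easier to follow. The module name all_but_zeroes_sketch and the sample input are illustrative; the function body is the one shown above.

```erlang
-module(all_but_zeroes_sketch).
-export([all_but_zeroes_to_list/3]).

%% Reads Remaining bytes from the buffer, collecting the non-zero ones.
%% Returns {CollectedBytes, UnreadRestOfBuffer}.
all_but_zeroes_to_list(Buffer, Acc, 0) ->
    {lists:reverse(Acc),Buffer};
all_but_zeroes_to_list(<<0,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, Acc, Remaining-1);
all_but_zeroes_to_list(<<Byte,T/binary>>, Acc, Remaining) ->
    all_but_zeroes_to_list(T, [Byte|Acc], Remaining-1).
```

Calling all_but_zeroes_sketch:all_but_zeroes_to_list(<<1,0,2,3>>, [], 3) reads three bytes, drops the zero, and returns {[1,2],<<3>>}. Note that no clause matches if the buffer runs out before Remaining reaches 0, so such a call fails with function_clause.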

(%% Is it the same principle again? The compiler determines that a match context will be used, so the sub binary is created only at the end %%)

Before you begin to think that the compiler can optimize any binary patterns, here is a function that the compiler (currently, at least) is not able to optimize:

non_opt_eq([H|T1], <<H,T2/binary>>) ->
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.

It was briefly mentioned earlier that the compiler can only delay creation of sub binaries if it can be sure that the binary will not be shared. In this case, the compiler cannot be sure.

(%% Creation can be delayed only if the binary will not be shared; I still do not quite understand the mechanism %%)

We will soon show how to rewrite non_opt_eq/2 so that the delayed sub binary optimization can be applied, and more importantly, we will show how you can find out whether your code can be optimized.

The bin_opt_info option

Use the bin_opt_info option to have the compiler print a lot of information about binary optimizations. It can be given either to the compiler or erlc

erlc +bin_opt_info Mod.erl

or passed via an environment variable

export ERL_COMPILER_OPTIONS=bin_opt_info

Note that the bin_opt_info is not meant to be a permanent option added to your Makefiles, because it is not possible to eliminate all messages that it generates. Therefore, passing the option through the environment is in most cases the most practical approach.
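
The option can also be handed to the compiler from Erlang code, for a one-off check that leaves the Makefiles untouched. The following is a minimal sketch, assuming the standard compile:file/2 interface; the module name bin_opt_info_sketch and the wrapper compile_with_info/1 are made up for illustration.

```erlang
-module(bin_opt_info_sketch).
-export([compile_with_info/1]).

%% Hypothetical wrapper: compiles File (a module name or path, without the
%% .erl extension) with bin_opt_info enabled. The report option makes the
%% compiler print errors and warnings, including the binary optimization
%% messages. Returns {ok, Module} on success or the atom error on failure.
compile_with_info(File) ->
    compile:file(File, [bin_opt_info, report]).
```

In a shell, bin_opt_info_sketch:compile_with_info("efficiency_guide") would compile efficiency_guide.erl in the current directory and, as with any call to compile:file/2 that succeeds, write the resulting .beam file there.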

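(%% This sentence is awkward to translate; the point is probably that it is then easy to change the bin_opt_info setting for all Makefiles at once %%)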

The warnings will look like this:

./efficiency_guide.erl:60: Warning: NOT OPTIMIZED: sub binary is used or returned
./efficiency_guide.erl:62: Warning: OPTIMIZED: creation of sub binary delayed

To make it clearer exactly what code the warnings refer to, in the examples that follow, the warnings are inserted as comments after the clause they refer to:

after_zero(<<0,T/binary>>) ->
%% NOT OPTIMIZED: sub binary is used or returned
T;
after_zero(<<_,T/binary>>) ->
%% OPTIMIZED: creation of sub binary delayed
after_zero(T);
after_zero(<<>>) ->
<<>>.

The warning for the first clause tells us that it is not possible to delay the creation of a sub binary, because it will be returned. The warning for the second clause tells us that a sub binary will not be created (yet).

It is time to revisit the earlier example of the code that could not be optimized and find out why:

non_opt_eq([H|T1], <<H,T2/binary>>) ->
%% INFO: matching anything else but a plain variable to
%% the left of binary pattern will prevent delayed
%% sub binary optimization;
%% SUGGEST changing argument order
%% NOT OPTIMIZED: called function non_opt_eq/2 does not
%% begin with a suitable binary matching instruction
non_opt_eq(T1, T2);
non_opt_eq([_|_], <<_,_/binary>>) ->
false;
non_opt_eq([], <<>>) ->
true.

The compiler emitted two warnings. The INFO warning refers to the function non_opt_eq/2 as a callee, indicating that any functions that call non_opt_eq/2 will not be able to make delayed sub binary optimization. There is also a suggestion to change argument order. The second warning (that happens to refer to the same line) refers to the construction of the sub binary itself.

We will soon show another example that should make the distinction between INFO and NOT OPTIMIZED warnings somewhat clearer, but first we will heed the suggestion to change argument order:

opt_eq(<<H,T1/binary>>, [H|T2]) ->
%% OPTIMIZED: creation of sub binary delayed
opt_eq(T1, T2);
opt_eq(<<_,_/binary>>, [_|_]) ->
false;
opt_eq(<<>>, []) ->
true.
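
For experimenting with the reordered function, it can be placed in a module of its own. The module name opt_eq_sketch is made up; the function is the one shown above, with the binary as the first argument.

```erlang
-module(opt_eq_sketch).
-export([opt_eq/2]).

%% Compares a binary with a list of bytes of the same length, element by
%% element. The binary comes first, so the match context can be reused in
%% the recursive call.
opt_eq(<<H,T1/binary>>, [H|T2]) ->
    opt_eq(T1, T2);
opt_eq(<<_,_/binary>>, [_|_]) ->
    false;
opt_eq(<<>>, []) ->
    true.
```

opt_eq_sketch:opt_eq(<<"abc">>, "abc") returns true and opt_eq_sketch:opt_eq(<<"abc">>, "abd") returns false. As in the original, there is no clause for inputs of different lengths, so a call such as opt_eq(<<1,2>>, [1]) fails with function_clause.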

The compiler gives a warning for the following code fragment:

match_body([0|_], <<H,_/binary>>) ->
%% INFO: matching anything else but a plain variable to
%% the left of binary pattern will prevent delayed
%% sub binary optimization;
%% SUGGEST changing argument order
done;
.
.
.

The warning means that if there is a call to match_body/2 (from another clause in match_body/2 or another function), the delayed sub binary optimization will not be possible. There will be additional warnings for any place where a sub binary is matched out at the end of a binary and passed as the second argument to match_body/2. For instance:

(%% The suggestion is that, for the optimization to apply, the binary argument should be placed on the left %%)

match_head(List, <<_:10,Data/binary>>) ->
%% NOT OPTIMIZED: called function match_body/2 does not
%% begin with a suitable binary matching instruction
match_body(List, Data).

Unused variables

The compiler itself figures out if a variable is unused. The same code is generated for each of the following functions

count1(<<_,T/binary>>, Count) -> count1(T, Count+1);
count1(<<>>, Count) -> Count.

count2(<<H,T/binary>>, Count) -> count2(T, Count+1);
count2(<<>>, Count) -> Count.

count3(<<_H,T/binary>>, Count) -> count3(T, Count+1);
count3(<<>>, Count) -> Count.

In each iteration, the first 8 bits in the binary will be skipped, not matched out.
