我正在编写一些代码,该代码必须根据设置将不同的算法应用于大型数据集.数据集很大,实际的时间表明,如果可能的话,我们需要对此进行优化.
所选算法必须在大型数组的许多数据子集上运行.因此,我决定尝试几种不同的方法:
>初始化功能.委托以引用所需的算法.从主循环中调用此委托.
>遍历数据并从主循环内调用适当的算法.
>调用一个单独的方法,该方法为每种算法本身实现主循环.
在我的测试中,我让每种方法都调用了相同的基础方法calculate(). (当然,实际代码为每个算法调用了不同的方法,但是在这里,我正在测试调用算法的最快方法,而不是算法本身.)
每个测试都会在ITERS次循环中调用所需的算法.
在此测试代码中,DataReductionAlgorithm只是定义各种算法的枚举.除了模拟真实代码中发生的事情外,它并没有真正使用.
这是我对方法(1)的测试实现.这非常简单:分配Func<>一个要调用的算法,然后从循环中调用它:
private static void test1(int[] data, DataReductionAlgorithm algorithm)
{
Func<int[], int, int, int> a;
switch (algorithm)
{
case DataReductionAlgorithm.Max:
a = calculate;
break;
case DataReductionAlgorithm.Mean:
a = calculate;
break;
default:
a = calculate;
break;
}
for (int i = 0; i < ITERS; ++i)
a(data, 0, data.Length);
}
这是我对方法(2)的测试实现.它将if测试用于选择算法的循环之外.我期望这是最快的方法:
private static void test2(int[] data, DataReductionAlgorithm algorithm)
{
switch (algorithm)
{
case DataReductionAlgorithm.Max:
for (int i = 0; i < ITERS; ++i)
calculate(data, 0, data.Length);
break;
case DataReductionAlgorithm.Mean:
for (int i = 0; i < ITERS; ++i)
calculate(data, 0, data.Length);
break;
default:
for (int i = 0; i < ITERS; ++i)
calculate(data, 0, data.Length);
break;
}
}
这是测试方法(3)的代码.如果在循环内移动if测试以选择算法.我期望这种方法(2)的速度较慢,因为if测试将执行ITERS次,而不是一次:
private static void test3(int[] data, DataReductionAlgorithm algorithm)
{
for (int i = 0; i < ITERS; ++i)
{
switch (algorithm)
{
case DataReductionAlgorithm.Max:
calculate(data, 0, data.Length);
break;
case DataReductionAlgorithm.Mean:
calculate(data, 0, data.Length);
break;
default:
calculate(data, 0, data.Length);
break;
}
}
}
由于我得到了奇怪的计时结果,所以我添加了一个与test2()几乎相同的新测试,只是我调用了一个方法来执行完全相同的循环,而不是在切换情况下循环.
因此,我希望这几乎与test2()花费相同的时间:
private static void test4(int[] data, DataReductionAlgorithm algorithm)
{
switch (algorithm)
{
case DataReductionAlgorithm.Max:
iterate(ITERS, data);
break;
case DataReductionAlgorithm.Mean:
iterate(ITERS, data);
break;
default:
iterate(ITERS, data);
break;
}
}
private static void iterate(int n, int[] data)
{
for (int i = 0; i < n; ++i)
calculate(data, 0, data.Length);
}
如果有人想尝试一下,这里是整个程序:
using System;
using System.Diagnostics;
using System.Linq;
namespace Demo
{
public enum DataReductionAlgorithm
{
Single,
Max,
Mean
}
internal class Program
{
private const int ITERS = 100000;
private void run()
{
int[] data = Enumerable.Range(0, 10000).ToArray();
Stopwatch sw = new Stopwatch();
for (int trial = 0; trial < 4; ++trial)
{
sw.Restart();
test1(data, DataReductionAlgorithm.Mean);
Console.WriteLine("test1: " + sw.Elapsed);
sw.Restart();
test2(data, DataReductionAlgorithm.Mean);
Console.WriteLine("test2: " + sw.Elapsed);
sw.Restart();
test3(data, DataReductionAlgorithm.Mean);
Console.WriteLine("test3: " + sw.Elapsed);
sw.Restart();
test4(data, DataReductionAlgorithm.Mean);
Console.WriteLine("test4: " + sw.Elapsed);
Console.WriteLine();
}
}
private static void test1(int[] data, DataReductionAlgorithm algorithm)
{
Func<int[], int, int, int> a;
switch (algorithm)
{
case DataReductionAlgorithm.Max:
a = calculate;
break;
case DataReductionAlgorithm.Mean:
a = calculate;
break;
default:
a = calculate;
break;
}
for (int i = 0; i < ITERS; ++i)
a(data, 0, data.Length);
}
private static void test2(int[] data, DataReductionAlgorithm algorithm)
{
switch (algorithm)
{
case DataReductionAlgorithm.Max:
for (int i = 0; i < ITERS; ++i)
calculate(data, 0, data.Length);
break;
case DataReductionAlgorithm.Mean:
for (int i = 0; i < ITERS; ++i)
calculate(data, 0, data.Length);
break;
default:
for (int i = 0; i < ITERS; ++i)
calculate(data, 0, data.Length);
break;
}
}
private static void test3(int[] data, DataReductionAlgorithm algorithm)
{
for (int i = 0; i < ITERS; ++i)
{
switch (algorithm)
{
case DataReductionAlgorithm.Max:
calculate(data, 0, data.Length);
break;
case DataReductionAlgorithm.Mean:
calculate(data, 0, data.Length);
break;
default:
calculate(data, 0, data.Length);
break;
}
}
}
private static void test4(int[] data, DataReductionAlgorithm algorithm)
{
switch (algorithm)
{
case DataReductionAlgorithm.Max:
iterate(ITERS, data);
break;
case DataReductionAlgorithm.Mean:
iterate(ITERS, data);
break;
default:
iterate(ITERS, data);
break;
}
}
private static void iterate(int n, int[] data)
{
for (int i = 0; i < n; ++i)
calculate(data, 0, data.Length);
}
private static int calculate(int[] data, int i1, int i2)
{
// Just a dummy implementation.
// Using the same algorithm for each approach to avoid differences in timings.
int result = 0;
for (int i = i1; i < i2; ++i)
result += data[i];
return result;
}
private static void Main()
{
new Program().run();
}
}
}
结果
首先,请注意,这些结果来自调试器外部运行的RELEASE BUILD.运行调试版本(或运行调试器的发行版本)会产生误导性的结果.
我正在Windows 8.1上使用.Net 4.51的四核Intel处理器测试构建. (但是,使用.Net 4.5和.Net 4时,我得到了类似的结果.)
根据x64 / AnyCPU还是x86,我得到了不同的结果.
回顾一下:我期望test1()和test3()最快,而test2()最快,而test4()的速度几乎与test2()相同.
这是x86的结果:
test1: 00:00:00.5892166
test2: 00:00:00.5848795
test3: 00:00:00.5866006
test4: 00:00:00.5867143
这是我所期望的,除了test1()比我认为的要快(可能表明高度优化了调用委托).
这是x64结果:
test1: 00:00:00.8769743
test2: 00:00:00.8750667
test3: 00:00:00.5839475
test4: 00:00:00.5853400
哇!
test1()和test2()发生了什么?我无法解释.如何使test2()比test3()慢得多?
为什么test4()的速度与test2()的速度几乎没有相同?
以及为什么x86和x64之间存在巨大差异?
谁能对此有所启示?速度的差异不是微不足道的-可能需要10秒和15秒之间的差异.
附录
我接受了下面的答案.
但是,仅为了说明下面@usr提到的JIT优化的脆弱性,请考虑以下代码:
using System;
using System.Diagnostics;
namespace Demo
{
internal class Program
{
private const int ITERS = 10000;
private void run()
{
Stopwatch sw = new Stopwatch();
int[] data = new int[10000];
for (int trial = 0; trial < 4; ++trial)
{
sw.Restart();
test1(data, 0);
var elapsed1 = sw.Elapsed;
sw.Restart();
test2(data, 0);
var elapsed2 = sw.Elapsed;
Console.WriteLine("Ratio = " + elapsed1.TotalMilliseconds / elapsed2.TotalMilliseconds);
}
Console.ReadLine();
}
private static void test1(int[] data, int x)
{
switch (x)
{
case 0:
{
for (int i = 0; i < ITERS; ++i)
dummy(data);
break;
}
}
}
private static void test2(int[] data, int x)
{
switch (x)
{
case 0:
{
loop(data);
break;
}
}
}
private static int dummy(int[] data)
{
int max = 0;
// Also try with "int i = 1" in the loop below.
for (int i = 0; i < data.Length; ++i)
if (data[i] > max)
max = data[i];
return max;
}
private static void loop(int[] data)
{
for (int i = 0; i < ITERS; ++i)
dummy(data);
}
private static void Main()
{
new Program().run();
}
}
}
注意注释下面的代码行//在下面的循环中也尝试使用“ int i = 1”.
在i = 0的情况下,对于x64版本,我得到以下结果:
Ratio = 1.52235829774506
Ratio = 1.50636405328076
Ratio = 1.52291602053827
Ratio = 1.52803278744701
仅将其更改为i = 1,我得到以下结果:
Ratio = 1.16920209593233
Ratio = 0.990370350435142
Ratio = 0.991150637472754
Ratio = 0.999941245001628
有趣!