Auto-vectorizing a matrix multiplication loop in C++

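When I compile source code that performs a basic matrix-matrix multiplication with auto-vectorization and auto-parallelization enabled, I get the following warnings in the console: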

C5002: loop not vectorized due to reason '1200'
C5012: loop not parallelized due to reason '1000'

I have read the MSDN documentation for these messages, which states:

Reason code 1200: Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences.

Reason code 1000: The compiler detected a data dependency in the loop body.

I'm not sure what in the loop is causing the problem. Here is the relevant part of my source code.

// int** A, int** B, int** result, const int dimension
for (int i = 0; i < dimension; ++i) {
    for (int j = 0; j < dimension; ++j) {
        for (int k = 0; k < dimension; ++k) {
            result[i][j] = result[i][j] + A[i][k] * B[k][j];
        }   
    }
}

Any insight would be greatly appreciated.

Answer:

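The loop-carried dependence is on result[i][j]: every iteration of the innermost loop reads the value that the previous iteration wrote.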

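One way to fix this is to accumulate the sum in a temporary variable and write it to result only once, outside the innermost loop, like this: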

for (int i = 0; i < dimension; ++i) {
    for (int j = 0; j < dimension; ++j) {
        auto tmp = 0;
        for (int k = 0; k < dimension; ++k) {
            tmp += A[i][k] * B[k][j];
        }
        result[i][j] = tmp;
    }
}

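This removes the dependence (there is no longer a read-after-write on result[i][j] inside the inner loop) and should help the vectorizer do a better job. Note that this version overwrites result[i][j] instead of adding to it, so it matches the original only if result was zero-initialized.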
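
To check that the rewrite does not change the answer, here is a minimal sketch that puts both versions side by side and compares them on a small input. The function names (multiply_in_place, multiply_with_tmp, row_pointers, both_versions_agree) are hypothetical and chosen for illustration; only the int** signature and the loop bodies come from the question and the answer above.

```cpp
#include <cassert>
#include <vector>

// Original form from the question: result[i][j] is read and written
// on every iteration of the k loop.
void multiply_in_place(int** A, int** B, int** result, const int dimension) {
    for (int i = 0; i < dimension; ++i) {
        for (int j = 0; j < dimension; ++j) {
            for (int k = 0; k < dimension; ++k) {
                result[i][j] = result[i][j] + A[i][k] * B[k][j];
            }
        }
    }
}

// Form suggested in the answer: accumulate in a local variable and
// store to result[i][j] once per (i, j) pair.
void multiply_with_tmp(int** A, int** B, int** result, const int dimension) {
    for (int i = 0; i < dimension; ++i) {
        for (int j = 0; j < dimension; ++j) {
            int tmp = 0;
            for (int k = 0; k < dimension; ++k) {
                tmp += A[i][k] * B[k][j];
            }
            result[i][j] = tmp;
        }
    }
}

// Illustrative helper (an assumption, not part of the original post):
// builds an array of row pointers over contiguous row-major storage
// so the int** interface can be exercised.
std::vector<int*> row_pointers(std::vector<int>& data, const int dimension) {
    assert(dimension * dimension == static_cast<int>(data.size()));
    std::vector<int*> rows(dimension);
    for (int i = 0; i < dimension; ++i) {
        rows[i] = data.data() + i * dimension;
    }
    return rows;
}

// Runs both versions on the same 3x3 input, starting from a zeroed
// result, and reports whether the outputs are identical.
bool both_versions_agree() {
    const int n = 3;
    std::vector<int> a = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    std::vector<int> b = {9, 8, 7, 6, 5, 4, 3, 2, 1};
    std::vector<int> r1(n * n, 0);
    std::vector<int> r2(n * n, 0);
    std::vector<int*> A = row_pointers(a, n);
    std::vector<int*> B = row_pointers(b, n);
    std::vector<int*> R1 = row_pointers(r1, n);
    std::vector<int*> R2 = row_pointers(r2, n);
    multiply_in_place(A.data(), B.data(), R1.data(), n);
    multiply_with_tmp(A.data(), B.data(), R2.data(), n);
    return r1 == r2;
}
```

Calling both_versions_agree() from main should return true, since both result buffers start at zero. With MSVC, building with /O2 /Qpar /Qvec-report:2 /Qpar-report:2 prints the per-loop vectorizer and parallelizer messages, which is one way to see whether the compiler treats the two functions differently; whether either loop is actually vectorized depends on the compiler version and is not guaranteed by this sketch.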
