The code looks roughly like this:
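#include <cstdint>

// Rounds v up to the next power of two by smearing the highest set bit
// into every lower bit position, then adding one.
inline int64_t RoundUpToPowerOfTwo(int64_t v) {
    --v;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v |= v >> 32;
    ++v;
    return v;
}

void foo(int64_t* src, int64_t* dst, int len) {
    for (int i = 0; i < len; i++) {
        dst[i] = RoundUpToPowerOfTwo(src[i]);
    }
}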
Compiler flags:
$ g++ -fopt-info-vec-optimized -O3 -g -fopt-info-vec-optimized ans.cpp -std=c++11 -mavx2
No output.
$ objdump -d ./a.out |less
...
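No vectorized instructions show up, even though RoundUpToPowerOfTwo was indeed inlined: there is no function call inside the loop.
Adding __restrict to the parameters did not help either.
After narrowing it down, the right shift turned out to be what blocks vectorization. Even this minimal loop is not vectorized: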
void foo(int64_t* src, int64_t* dst, int len) {
    for (int i = 0; i < len; i++) {
        dst[i] = src[i] >> 1;
    }
}
From what I could find, a left shift can be vectorized.
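A likely explanation for the asymmetry, assuming the target is limited to AVX2 as in the flags above: on GCC, >> on int64_t is an arithmetic shift (the sign bit is replicated), while >> on uint64_t is a logical shift (zeros are shifted in). AVX2 provides packed 64-bit left shifts and logical right shifts (vpsllq, vpsrlq), but the packed 64-bit arithmetic right shift (vpsraq) only arrives with AVX-512. The sketch below just illustrates the difference between the two shifts; the function names are made up for illustration and are not part of the original code.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of the original code.

// Arithmetic shift: on GCC the sign bit is copied into the vacated bit.
// (Before C++20 the result for negative values is implementation-defined.)
inline int64_t ShiftRightSigned(int64_t v) { return v >> 1; }

// Logical shift: a zero is shifted in from the left.
inline uint64_t ShiftRightUnsigned(uint64_t v) { return v >> 1; }

// Left shift produces the same bits regardless of signedness; doing it on
// uint64_t avoids undefined behavior for negative inputs before C++20.
inline uint64_t ShiftLeftUnsigned(uint64_t v) { return v << 1; }
```

For non-negative inputs the two right shifts give the same result, so switching to uint64_t does not change the answer as long as the values are known to be non-negative.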
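Fix: make the values unsigned, so that >> is a logical shift rather than an arithmetic one.
// change the input to uint64_t
void foo(uint64_t* src, uint64_t* dst, int len) {
    for (int i = 0; i < len; i++) {
        dst[i] = src[i] >> 1;
    }
}
// same change for the helper
inline uint64_t RoundUpToPowerOfTwo(uint64_t v);
With this change the compiler reports: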
ans.cpp:46:23: optimized: loop vectorized using 32 byte vectors
ans.cpp:46:23: optimized: loop versioned for vectorization because of possible aliasing
ans.cpp:46:23: optimized: loop vectorized using 16 byte vectors
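Putting the pieces together, a minimal sketch of the full unsigned version might look like the following. The body of RoundUpToPowerOfTwo is the signed version from the top of the post with only the type changed. foo_signed is a hypothetical wrapper, not from the original code, for callers that have to keep int64_t buffers; whether it vectorizes as well has not been checked here and would need to be confirmed with -fopt-info-vec-optimized.

```cpp
#include <cstdint>

// Unsigned variant: every >> below is a logical shift.
inline uint64_t RoundUpToPowerOfTwo(uint64_t v) {
    --v;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v |= v >> 32;
    ++v;
    return v;
}

void foo(uint64_t* src, uint64_t* dst, int len) {
    for (int i = 0; i < len; i++) {
        dst[i] = RoundUpToPowerOfTwo(src[i]);
    }
}

// Hypothetical wrapper (an assumption, not from the original post): keeps
// the int64_t interface but does the bit manipulation on unsigned values.
// Converting back to int64_t is only meaningful for results up to 2^62.
void foo_signed(const int64_t* src, int64_t* dst, int len) {
    for (int i = 0; i < len; i++) {
        dst[i] = static_cast<int64_t>(
            RoundUpToPowerOfTwo(static_cast<uint64_t>(src[i])));
    }
}
```

Note the edge cases of the unsigned version: an input of 0 yields 0, and any input above 2^63 wraps around to 0, so negative values reinterpreted as uint64_t need to be filtered out beforehand if they can occur.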