While browsing the CLR/CLI specification and its memory model, I noticed the wording on atomic reads and writes in the ECMA CLI spec:
A conforming CLI shall guarantee that read and write access to
properly aligned memory locations no larger than the native word size
(the size of type native int) is atomic when all the write accesses to
a location are the same size.
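The phrase "properly aligned memory" in particular caught my eye. I wondered whether some trick would let me get a torn read with a long on a 64-bit system. So I wrote the following test case: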
using System;
using System.Runtime.InteropServices;
using System.Threading;

unsafe class Program {
    const int NUM_ITERATIONS = 200000000;
    const long STARTING_VALUE = 0x100000000L + 123L;
    const int NUM_LONGS = 200;

    private static int prevLongWriteIndex = 0;
    private static long* misalignedLongPtr = (long*) GetMisalignedHeapLongs(NUM_LONGS);

    public static long SharedState {
        get {
            Thread.MemoryBarrier();
            return misalignedLongPtr[prevLongWriteIndex % NUM_LONGS];
        }
        set {
            var myIndex = Interlocked.Increment(ref prevLongWriteIndex) % NUM_LONGS;
            misalignedLongPtr[myIndex] = value;
        }
    }

    static unsafe void Main(string[] args) {
        Thread writerThread = new Thread(WriterThreadEntry);
        Thread readerThread = new Thread(ReaderThreadEntry);
        writerThread.Start();
        readerThread.Start();
        writerThread.Join();
        readerThread.Join();
        Console.WriteLine("Done");
        Console.ReadKey();
    }

    // Returns a pointer into unmanaged memory that is intended to be misaligned.
    private static IntPtr GetMisalignedHeapLongs(int count) {
        const int ALIGNMENT = 7;
        IntPtr reservedMemory = Marshal.AllocHGlobal(new IntPtr(sizeof(long) * count + ALIGNMENT - 1));
        long allocationOffset = (long) reservedMemory % ALIGNMENT;
        if (allocationOffset == 0L) return reservedMemory;
        return reservedMemory + (int) (ALIGNMENT - allocationOffset);
    }

    private static void WriterThreadEntry() {
        for (int i = 0; i < NUM_ITERATIONS; ++i) {
            SharedState = STARTING_VALUE + i;
        }
    }

    private static void ReaderThreadEntry() {
        for (int i = 0; i < NUM_ITERATIONS; ++i) {
            var sharedStateLocal = SharedState;
            if (sharedStateLocal < STARTING_VALUE) Console.WriteLine("Torn read detected: " + sharedStateLocal);
        }
    }
}
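However, no matter how many times I run the program, I never legitimately see the "Torn read detected" line. Why not?
I allocated several longs in a single block in the hope that at least one of them would straddle two cache lines, and the starting address of the first long should be misaligned (unless I'm misunderstanding something).
I also know that multithreading bugs are by nature hard to force, and that my "test program" is not as rigorous as it could be, but I have now run the program almost 30 times with no result, each run with 200000000 iterations.
Solution:
There are a number of flaws in this program that hide torn reads. Reasoning about the behavior of unsynchronized threads is never simple and is hard to explain, and the odds of accidental synchronization are always high.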
var myIndex = Interlocked.Increment(ref prevLongWriteIndex) % NUM_LONGS;
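There is nothing very subtle about Interlocked; unfortunately, it affects the reader thread a great deal as well. It is pretty hard to see, but you can use Stopwatch to time the execution of the threads. You will see that Interlocked on the writer slows the reader down by a factor of about 2. That is enough to change the reader's timing so that the problem does not reproduce: accidental synchronization.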
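One way to take that measurement is sketched below. The `ThreadTiming.Timed` helper is a hypothetical name introduced here for illustration, not part of the original program; it only wraps a thread body in a `Stopwatch` and prints the elapsed time when the body returns.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class ThreadTiming {
    // Hypothetical helper: wraps a thread body so that its wall-clock
    // duration is printed when the body finishes.
    public static ThreadStart Timed(string name, Action body) {
        return () => {
            var sw = Stopwatch.StartNew();
            body();
            sw.Stop();
            Console.WriteLine(name + ": " + sw.ElapsedMilliseconds + " ms");
        };
    }
}
```

In the test program this could be used as `new Thread(ThreadTiming.Timed("reader", ReaderThreadEntry))`, once with the Interlocked call in the setter and once without, to compare the reader's running time.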
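The simplest way to eliminate the hazard and maximize the odds of detecting a torn read is to always read and write the same memory location. Fix: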
var myIndex = 0;
if (sharedStateLocal < STARTING_VALUE)
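This test does not help much in detecting torn reads; there are many torn values that simply do not trigger it. Having so many binary zeros in STARTING_VALUE makes it even less likely. A good alternative that maximizes the odds of detection is to alternate between 1 and -1, which ensures that every byte value differs between the two writes and makes the test very simple. Thus: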
private static void WriterThreadEntry() {
    for (int i = 0; i < NUM_ITERATIONS; ++i) {
        SharedState = 1;
        SharedState = -1;
    }
}

private static void ReaderThreadEntry() {
    for (int i = 0; i < NUM_ITERATIONS; ++i) {
        var sharedStateLocal = SharedState;
        if (Math.Abs(sharedStateLocal) != 1) {
            Console.WriteLine("Torn read detected: " + sharedStateLocal);
        }
    }
}
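To illustrate why alternating between 1 and -1 is easy to check, the sketch below builds the values a reader would see if it observed the upper 32 bits of one write and the lower 32 bits of the other. The `TornValueDemo.Mix` helper is hypothetical, and the 32-bit split is an assumption chosen for simplicity; a real torn read can split at other byte positions.

```csharp
using System;

static class TornValueDemo {
    // Hypothetical helper: takes the upper 32 bits from one value and the
    // lower 32 bits from another, mimicking a read that sees half of each write.
    public static long Mix(long highSource, long lowSource) {
        unchecked {
            ulong high = (ulong)highSource & 0xFFFFFFFF00000000UL;
            ulong low = (ulong)lowSource & 0x00000000FFFFFFFFUL;
            return (long)(high | low);
        }
    }

    public static void Main() {
        // Mixing 1 and -1 never yields 1 or -1, so Math.Abs(x) != 1 catches it.
        Console.WriteLine(Mix(1, -1));   // 4294967295
        Console.WriteLine(Mix(-1, 1));   // -4294967295

        // With the original values the upper half is the same for every write,
        // so this kind of mix is never below STARTING_VALUE.
        long start = 0x100000000L + 123L;
        Console.WriteLine(Mix(start + 5, start + 9) < start);   // False
    }
}
```

Under that assumption, the original `sharedStateLocal < STARTING_VALUE` check cannot fire on a mix of two written values, while the 1 / -1 check fires on every mix.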
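With these changes the program quickly fills the console with several pages of torn reads in 32-bit mode. To get them in 64-bit mode as well, you need to do extra work to misalign the variable. It needs to straddle an L1 cache-line boundary so that the processor has to perform two reads and two writes, just as it does in 32-bit mode. Fix: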
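private static IntPtr GetMisalignedHeapLongs(int count) {
    // Offset relative to a 64-byte cache-line boundary; -1 starts the long
    // one byte before the boundary so that it spans two cache lines.
    const int ALIGNMENT = -1;
    IntPtr reservedMemory = Marshal.AllocHGlobal(new IntPtr(sizeof(long) * count + 64 + 15));
    // Round the allocation address up to the next cache-line boundary.
    long cachelineStart = 64 * (((long)reservedMemory + 63) / 64);
    long misalignedAddr = cachelineStart + ALIGNMENT;
    // Stay inside the allocation if the offset moved us before its start.
    if (misalignedAddr < (long)reservedMemory) misalignedAddr += 64;
    return new IntPtr(misalignedAddr);
}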
Any ALIGNMENT value between -1 and -7 will now produce torn reads in 64-bit mode as well.
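The range -1 to -7 follows from simple arithmetic, sketched below. The `CacheLineCheck.Straddles` helper is a hypothetical name, and the 64-byte line size is an assumption that matches the constant used in the fix above. An 8-byte value crosses a boundary only when it starts fewer than 8 bytes before the end of a line.

```csharp
using System;

static class CacheLineCheck {
    // Assumed cache-line size, matching the 64 used in GetMisalignedHeapLongs.
    const int CacheLineSize = 64;

    // Hypothetical helper: true when a value of the given size starting at
    // a non-negative address spans two cache lines.
    public static bool Straddles(long address, int size) {
        long offset = address % CacheLineSize;
        return offset + size > CacheLineSize;
    }

    public static void Main() {
        long cachelineStart = 4096; // any multiple of 64 works for the illustration
        for (int alignment = 0; alignment >= -8; --alignment) {
            bool crosses = Straddles(cachelineStart + alignment, sizeof(long));
            Console.WriteLine(alignment + ": " + crosses);
        }
        // Prints False for 0, True for -1 through -7, and False for -8.
    }
}
```

An offset of -8 places the long entirely inside the previous cache line, and an offset of 0 places it entirely inside the next one, so neither case forces the processor to split the access.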