Using two threads, compute $1+1+1+\ldots+1$ (a total of $2n$ ones)
#define N 100000000
long sum = 0;
void Tsum() { for (int i = 0; i < N; i++) sum++; }
int main() {
create(Tsum);
create(Tsum);
join();
printf("sum = %ld\n", sum);
}
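The `create` and `join` calls above come from a thread helper that is not shown here. As a rough, self-contained sketch, assuming those helpers behave like thin wrappers over POSIX threads, the same experiment might be written with `pthread_create` and `pthread_join`. The function `run_sum` and the variable `limit` are hypothetical names, introduced only so the loop count can be varied.

```c
#include <assert.h>
#include <pthread.h>

static long sum = 0;    // shared counter, deliberately left unprotected
static long limit = 0;  // iterations per thread (hypothetical parameter)

// Each thread increments the shared counter `limit` times without locking.
static void *Tsum(void *arg) {
  (void)arg;
  for (long i = 0; i < limit; i++) sum++;
  return NULL;
}

// Runs two Tsum threads to completion and returns the final (racy) sum.
long run_sum(long n) {
  pthread_t t1, t2;
  sum = 0;
  limit = n;
  int rc1 = pthread_create(&t1, NULL, Tsum, NULL);
  int rc2 = pthread_create(&t2, NULL, Tsum, NULL);
  assert(rc1 == 0 && rc2 == 0);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  return sum;
}
```

Calling `run_sum(100000000)` from `main`, printing the result, and building with `gcc -pthread` should reproduce the experiment. The value is not guaranteed to be stable across runs or optimization levels, because the increments race.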
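What happens if we turn on compiler optimizations?
-O1: 100000000 😱😱
-O2: 200000000 😱😱😱
The compiler treats memory accesses as only “eventually consistent”, so shared memory no longer works as a thread synchronization tool.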
The previous example
-O1: R[eax] = sum; R[eax] += N; sum = R[eax]
-O2: sum += N;
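To see how these two code shapes could lead to the two printed values, the sketch below replays one plausible interleaving for each in a single thread. It is only an illustration: the function names `simulate_O1` and `simulate_O2` are hypothetical, and a real run may interleave differently.

```c
#define N 100000000

// -O1 shape: each thread keeps sum in a register and writes it back once.
// Replay the interleaving where both threads load before either stores.
long simulate_O1(void) {
  long sum = 0;
  long r1 = sum;  // thread 1: R[eax] = sum  (reads 0)
  long r2 = sum;  // thread 2: R[eax] = sum  (also reads 0)
  r1 += N;        // thread 1: R[eax] += N
  r2 += N;        // thread 2: R[eax] += N
  sum = r1;       // thread 1: sum = R[eax]
  sum = r2;       // thread 2: sum = R[eax], overwriting thread 1's store
  return sum;     // one thread's work is lost
}

// -O2 shape: each thread performs a single sum += N.
// Replay the case where the two short updates do not overlap.
long simulate_O2(void) {
  long sum = 0;
  sum += N;  // thread 1
  sum += N;  // thread 2
  return sum;
}
```

Under these assumed interleavings, `simulate_O1()` yields N and `simulate_O2()` yields 2N, matching the two outputs above. The -O2 result is still not guaranteed in a real run, since the two updates can overlap.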
Another example
while (!done);
// would be optimized to
if (!done) while (1);
Recall “compilation correctness”
asm volatile ("" ::: "memory");
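As a minimal sketch of where such a barrier might be placed, the functions below put it inside the two loops from the earlier examples. The empty asm with a `"memory"` clobber is GNU C syntax (GCC and Clang); it tells the compiler that memory may have changed, so `sum` and `done` cannot be cached in registers across it. The names `barrier`, `sum_with_barrier`, and `wait_done` are hypothetical.

```c
static long sum = 0;
static int done = 0;

// Same statement as asm volatile ("" ::: "memory"), spelled with the
// alternate keywords so it also compiles under strict -std=c11.
#define barrier() __asm__ __volatile__("" ::: "memory")

// The barrier forces sum to be loaded and stored on every iteration,
// so the loop cannot be collapsed into a single sum += n.
long sum_with_barrier(long n) {
  sum = 0;
  for (long i = 0; i < n; i++) {
    sum++;
    barrier();
  }
  return sum;
}

// The barrier forces done to be re-read on every iteration,
// so the check cannot be hoisted out of the loop.
void wait_done(void) {
  while (!done) {
    barrier();
  }
}
```

The barrier only restricts what the compiler may reorder or cache. It emits no instruction, so it does not make `sum++` atomic when two threads run it concurrently.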
volatile variables
extern int volatile done;
while (!done) ;
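The spin loop above can be tried out with a second thread that sets the flag. The following is a sketch under assumptions: it uses POSIX threads directly, defines `done` locally instead of declaring it `extern`, and the names `Tsetter` and `run_flag_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

// volatile: every read and write of done must actually access memory.
volatile int done = 0;

// Worker thread: sets the flag that the main thread is waiting on.
static void *Tsetter(void *arg) {
  (void)arg;
  done = 1;
  return NULL;
}

// Starts the worker, spins until it sets done, and returns the flag value.
int run_flag_demo(void) {
  pthread_t t;
  done = 0;
  int rc = pthread_create(&t, NULL, Tsetter, NULL);
  assert(rc == 0);
  while (!done) ;  // done is reloaded on every iteration
  pthread_join(t, NULL);
  return done;
}
```

Here `volatile` only stops the compiler from hoisting the load out of the loop. It provides no atomicity, so it would not repair the racy `sum++` from the first example.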