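Please keep an eye on the assignments posted on the course website
HW2/Lab2 have been released
The PA2.2 deadline is approaching
Judging by the class average, finishing a PA takes longer than the estimated time
We already know how data is represented in a computer. But why is it represented this way? What are the benefits and uses of this representation?
Logic gates and wires are the basic building blocks of a computer (combinational logic circuits)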
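`&` (AND), `|` (OR), `~` (NOT)
`^` (XOR)
`<<` (left shift), `>>` (right shift)
Exercise: implement addition of 4-bit integers using the bitwise operations above and constants / Lab1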
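One possible approach to the exercise, as a minimal sketch rather than the reference solution: XOR adds two bits without carry, AND finds the positions that produce a carry, and a left shift moves each carry into place. Four rounds are enough for 4-bit operands. The names `add4` and `ADD_ROUND` are made up for this illustration.

```c
#include <stdint.h>

/* One carry-propagation round (hypothetical helper):
 * XOR gives the carry-less sum, AND << 1 gives the carries to add next. */
#define ADD_ROUND(x, y) \
  do { unsigned c = ((x) & (y)) << 1; (x) ^= (y); (y) = c; } while (0)

/* Adds two 4-bit values using only &, ^, << and constants.
 * The result wraps modulo 16, like a 4-bit register. */
uint8_t add4(uint8_t a, uint8_t b) {
  unsigned x = a & 0xF, y = b & 0xF;
  ADD_ROUND(x, y);  /* after round k, the low k bits of y are zero */
  ADD_ROUND(x, y);
  ADD_ROUND(x, y);
  ADD_ROUND(x, y);
  return (uint8_t)(x & 0xF);
}
```

For example, `add4(3, 4)` yields 7, and `add4(15, 15)` wraps around to 14.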
142857 -> 0000 0000 0000 0010 0010 1110 0000 1001
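The bit pattern above can be reproduced with shifts and masks. The helper below is a small sketch (the name `format_bits` is an assumption for this example) that writes a 32-bit value in groups of four bits.

```c
#include <stdint.h>

/* Writes the 32 bits of x, most significant first, in groups of four
 * separated by spaces. out needs 40 bytes: 32 digits + 7 spaces + NUL. */
void format_bits(uint32_t x, char out[40]) {
  int k = 0;
  for (int i = 31; i >= 0; i--) {
    out[k++] = ((x >> i) & 1) ? '1' : '0';   /* test bit i */
    if (i % 4 == 0 && i != 0) out[k++] = ' ';
  }
  out[k] = '\0';
}
```

Calling `format_bits(142857, buf)` fills `buf` with `0000 0000 0000 0010 0010 1110 0000 1001`.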
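Warm-up problem: string operations
`&`, `|`, `~`, ... act on every bit of an integer independently (in parallel)
This pays off when the objects we operate on happen to be independent per bit
Example: Bit Set
`bitset`: the performance is very good
Test $x\in S$
(S >> x) & 1
Compute $S' = S\cup\{x\}$
S | (1 << x)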
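A minimal sketch of these set operations on a 32-bit word follows. The type and function names are assumptions for illustration. The sketch shifts an unsigned `1` so that element 31 stays well-defined, and it asserts that the element index is in range.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t bitset_t;  /* hypothetical alias: a subset of {0, ..., 31} */

/* Membership test: is x in S? */
int bitset_contains(bitset_t S, unsigned x) {
  assert(x < 32);
  return (int)((S >> x) & 1u);
}

/* S union {x} */
bitset_t bitset_insert(bitset_t S, unsigned x) {
  assert(x < 32);
  return S | ((bitset_t)1 << x);
}

/* S minus {x} */
bitset_t bitset_remove(bitset_t S, unsigned x) {
  assert(x < 32);
  return S & ~((bitset_t)1 << x);
}

/* Union and intersection handle all 32 elements in one instruction each. */
bitset_t bitset_union(bitset_t A, bitset_t B)     { return A | B; }
bitset_t bitset_intersect(bitset_t A, bitset_t B) { return A & B; }
```

Starting from the empty set `0`, inserting 3 and 31 gives `0x80000008`.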
More exercises
int bitset_size(uint32_t S) {
  int n = 0;  // must start at zero; an uninitialized local is indeterminate
  for (int i = 0; i < 32; i++) {
    n += bitset_contains(S, i);
  }
  return n;
}
int bitset_size1(uint32_t S) { // SIMD
  S = (S & 0x55555555) + ((S >> 1) & 0x55555555);
  S = (S & 0x33333333) + ((S >> 2) & 0x33333333);
  S = (S & 0x0F0F0F0F) + ((S >> 4) & 0x0F0F0F0F);
  S = (S & 0x00FF00FF) + ((S >> 8) & 0x00FF00FF);
  S = (S & 0x0000FFFF) + ((S >> 16) & 0x0000FFFF);
  return S;
}
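Given a binary number `x = 0b+++++100`, we want to extract the trailing `100`
The idea is to cancel out the `+++++` part (below, `-` denotes the complement of the corresponding `+` bit)
Expression | Result |
---|---|
x | 0b+++++100 |
x-1 | 0b+++++011 |
~x | 0b-----011 |
~x+1 | 0b-----100 |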
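Some interesting expressions:
`x & (x-1)` → `0b+++++000`; `x ^ (x-1)` → `0b00000111`
`x & (~x+1)` → `0b00000100` (lowbit)
`x & -x` and `(~x & (x-1)) + 1` both implement lowbit as well
For a nonzero 32-bit $x$, $\lfloor \log_2 x \rfloor$ is equivalent to $31 - \mathrm{clz}(x)$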
int clz(uint32_t x) {
  int n = 0;
  if (x <= 0x0000ffff) n += 16, x <<= 16;
  if (x <= 0x00ffffff) n +=  8, x <<=  8;
  if (x <= 0x0fffffff) n +=  4, x <<=  4;
  if (x <= 0x3fffffff) n +=  2, x <<=  2;
  if (x <= 0x7fffffff) n ++;
  return n;
}
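The lowbit identity and the integer logarithm combine naturally: repeatedly take the lowest set bit, convert it to an index, and clear it with `x & (x-1)`. The sketch below uses hypothetical names, and its `ilog2` is a plain loop standing in for `31 - clz(x)`, which would give the same value for nonzero input.

```c
#include <stdint.h>

/* Lowest set bit of x. Unsigned arithmetic keeps ~x + 1 well-defined. */
uint32_t lowbit(uint32_t x) { return x & (~x + 1); }

/* floor(log2(x)) for x != 0; a simple stand-in for 31 - clz(x). */
int ilog2(uint32_t x) {
  int n = 0;
  while (x >>= 1) n++;
  return n;
}

/* Writes the elements of the bit set S to out in ascending order
 * and returns how many there are. */
int bitset_elements(uint32_t S, int out[32]) {
  int n = 0;
  while (S) {
    out[n++] = ilog2(lowbit(S));  /* index of the lowest remaining element */
    S &= S - 1;                   /* clear that element */
  }
  return n;
}
```

For `S = 0x29` (bits 0, 3, and 5 set) this yields the three elements 0, 3, 5. The loop runs once per element rather than once per bit position.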
(Weird code) What if $x$ is a result produced by lowbit, i.e. has exactly one bit set?
#define LOG2(x) \
("-01J2GK-3@HNL;-=47A-IFO?M:<6-E>95D8CB"[(x) % 37] - '0')
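The table lookup works because the 32 powers of two $2^0, \ldots, 2^{31}$ leave pairwise distinct remainders modulo 37, so each remainder can index one character that encodes the exponent as an offset from `'0'`. A quick way to convince yourself is to check every power of two. The sketch below copies the macro verbatim; the checker function name is an assumption.

```c
#include <stdint.h>

/* Copied verbatim from the slide above. */
#define LOG2(x) \
  ("-01J2GK-3@HNL;-=47A-IFO?M:<6-E>95D8CB"[(x) % 37] - '0')

/* Returns 1 if LOG2 recovers i from 2^i for every 32-bit power of two. */
int log2_table_ok(void) {
  for (int i = 0; i < 32; i++) {
    uint32_t x = (uint32_t)1 << i;
    if (LOG2(x) != i) return 0;
  }
  return 1;
}
```

The `-` entries in the string are padding for remainders that no power of two produces, so the macro is only meaningful for inputs with exactly one bit set.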
This takes a little meta-programming; try log2.c
import json
n, base = 64, '0'
for m in range(n, 10000):
    if len({ (2**i) % m for i in range(n) }) == n:
        M = { j: chr(ord(base) + i)
              for j in range(0, m)
              for i in range(0, n)
              if (2**i) % m == j }
        break
magic = json.dumps(''.join(
    [ M.get(j, '-') for j in range(0, m) ]
)).strip('"')
print(f'#define LOG2(x) ("{magic}"[(x) % {m}] - \'{base}\')')
Henry S. Warren, Jr. Hacker's Delight (2ed), Addison-Wesley, 2012.
It shows you that writing faster code is not "blind guessing"
Undefined behavior (UB) is the result of executing computer code whose behavior is not prescribed by the language specification to which the code adheres, for the current state of the program. This happens when the translator of the source code makes certain assumptions, but these assumptions are not satisfied during execution. -- Wikipedia
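C places no constraints whatsoever on what UB does; blowing up the computer would be allowed
To be as efficient as possible (zero-overhead)
To stay compatible with many hardware architectures
On some of them, `/0` raises a processor exception
This sowed the seeds of disaster
Example: CVE-2018-7445 (RouterOS), where the code merely forgot to check the buffer size...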
while (len) {
  for (i = offset; (i - offset) < len; ++i) {
    dst[i] = src[i+1];
  }
  len = src[i+1]; ...
  offset = i + 1;
}
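For contrast, here is a hedged sketch of a bounds-checked copy of length-prefixed labels. It is not the RouterOS code or its fix. The input format (a length byte before each label, a zero byte at the end), the `.` separator, and the name `copy_labels_checked` are all assumptions for illustration. The point is that every length read from the input is validated against both buffers before any byte is copied.

```c
#include <stddef.h>
#include <string.h>

/* Copies labels from src (format assumed: [len][bytes]...[0]) into dst,
 * joined by '.', NUL-terminated. Returns the string length, or -1 if the
 * input is malformed or dst is too small. */
long copy_labels_checked(char *dst, size_t dst_cap,
                         const unsigned char *src, size_t src_len) {
  size_t in = 0, out = 0;
  while (in < src_len) {
    size_t len = src[in++];
    if (len == 0) {                       /* terminator: finish the string */
      if (out >= dst_cap) return -1;
      dst[out] = '\0';
      return (long)out;
    }
    if (len > src_len - in) return -1;    /* label runs past the input */
    if (out > 0) {                        /* separator between labels */
      if (out >= dst_cap) return -1;
      dst[out++] = '.';
    }
    if (len > dst_cap - out) return -1;   /* label does not fit in dst */
    memcpy(dst + out, src + in, len);
    in += len;
    out += len;
  }
  return -1;                              /* no terminator found */
}
```

With the input bytes `3 'w' 'w' 'w' 2 'c' 'n' 0` and a 16-byte destination, the call returns 6 and leaves `www.cn` in `dst`; with a 4-byte destination it returns -1 instead of writing out of bounds.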
Expression | Value |
---|---|
UINT_MAX+1 | 0 |
INT_MAX+1; LONG_MAX+1 | undefined |
char c = CHAR_MAX; c++; | varies (???) |
1 << -1 | undefined |
1 << 0 | 1 |
1 << 31 | undefined |
1 << 32 | undefined |
1 / 0 | undefined |
INT_MIN % -1 | undefined |
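The undefined rows can be avoided by checking operands before the operation runs, since checking the result afterwards is already too late. The sketch below shows that pattern for addition and remainder. The function names are assumptions, and the pattern is one common approach rather than the only one.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true only if the sum fits in an int.
 * The comparisons themselves never overflow. */
bool checked_add(int a, int b, int *out) {
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) return false;
  *out = a + b;
  return true;
}

/* Stores a % b in *out and returns true only if the operation is defined.
 * a % b is undefined whenever a / b is: b == 0, or INT_MIN with b == -1. */
bool checked_mod(int a, int b, int *out) {
  if (b == 0 || (a == INT_MIN && b == -1)) return false;
  *out = a % b;
  return true;
}
```

`checked_add(INT_MAX, 1, &r)` returns false and leaves `r` untouched, while `checked_add(1, 2, &r)` stores 3.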
int f() { return 1 << -1; }
According to the manual this is UB, so clang handles it like this...
0000000000000000 <f>:
0: c3 retq
The compiler deleted the computation outright
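Because the compiler may assume UB never happens, a shift whose amount comes from untrusted input should be validated first. A minimal sketch follows, assuming non-negative values only; the name `checked_shl` is made up for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores value << amount in *out and returns true only when the shift is
 * well-defined for int: non-negative value, amount within the width, and
 * a result that does not overflow. */
bool checked_shl(int value, int amount, int *out) {
  int width = (int)(sizeof(int) * CHAR_BIT);
  if (value < 0 || amount < 0 || amount >= width) return false;
  if (value > (INT_MAX >> amount)) return false;  /* result would not fit */
  *out = value << amount;
  return true;
}
```

With a 32-bit `int`, `checked_shl(1, 30, &r)` succeeds, while shifts by -1, 31, or 32 are rejected instead of being left to the optimizer.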
X. Wang, et al. Towards optimization-safe systems: Analyzing the impact of undefined behavior. In Proceedings of SOSP, 2013.
There are very, very many real numbers ($\aleph_0 < \mathfrak c$)
Hence IEEE754 (1-bit S, 23/52-bit Fraction, 8/11-bit Exponent)
$$x = (-1)^S \times (1.F) \times 2^{E - B}$$
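An experiment on the magnitude/density of floating-point numbers (float.c)
The larger a number is, the larger the gap to the next representable floating-point number
Example: compute $1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n}$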
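The growing gap can be measured directly from the bit pattern: for a positive finite `float`, adding 1 to its 32-bit encoding yields the next representable value. This is a small sketch assuming IEEE 754 binary32 `float`, and it is not the course's `float.c`. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Next representable float above x. Assumes x is positive and finite and
 * that float is IEEE 754 binary32. */
float next_float_up(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);  /* reinterpret without aliasing issues */
  bits += 1;
  memcpy(&x, &bits, sizeof x);
  return x;
}

/* Distance from x to the next representable float. */
float gap_above(float x) { return next_float_up(x) - x; }
```

Under these assumptions the gap is $2^{-23}$ at 1.0 but already 2.0 at $2^{24} = 16777216$, so integers above that point can no longer all be represented.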
#define SUM(T, st, ed, d) ({ \
  T s = 0; \
  for (int i = st; i != ed + d; i += d) \
    s += (T)1 / i; \
  s; \
})
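Summation order matters for this series. The sketch below uses plain functions with hypothetical names instead of the macro, so it does not depend on GNU statement expressions. It sums in `float` from large terms to small and from small terms to large, with a `double` sum as a reference.

```c
/* 1 + 1/2 + ... + 1/n in float, largest terms first. Once s is large,
 * the small terms fall below half the gap between floats and are lost. */
float harmonic_forward(int n) {
  float s = 0;
  for (int i = 1; i <= n; i++) s += 1.0f / (float)i;
  return s;
}

/* Same sum, smallest terms first, so they accumulate before meeting
 * the large terms. */
float harmonic_backward(int n) {
  float s = 0;
  for (int i = n; i >= 1; i--) s += 1.0f / (float)i;
  return s;
}

/* Reference value in double precision. */
double harmonic_reference(int n) {
  double s = 0;
  for (int i = n; i >= 1; i--) s += 1.0 / i;
  return s;
}
```

For $n = 10^7$ on a typical IEEE 754 platform, the forward `float` sum stalls well below the reference value, while the backward sum stays close to it.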
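Comparing `a == b` requires care (assume a built-in tolerance $\varepsilon$)
Besides $x = (-1)^S \times (1.F) \times 2^{E - B}$, there is more to consider:
Denormalized numbers (Exponent == 0)
Zero: `+0.0` and `-0.0` differ in their $S$ bit, yet `+0.0 == -0.0`
Inf/NaN (Not a Number, e.g. `0.0/0.0`): NaN is a value for which the expression `x != x` holds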
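These special cases are easy to observe. The sketch below assumes IEEE 754 `double` and default compiler flags (no fast-math). The helper names are made up, and the absolute tolerance in `nearly_equal` is only one possible choice; a relative tolerance is often more appropriate for values of widely varying magnitude.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Tolerant comparison with an absolute tolerance eps (caller's choice). */
bool nearly_equal(double a, double b, double eps) {
  double diff = a > b ? a - b : b - a;
  return diff <= eps;
}

/* NaN is the only value that compares unequal to itself. */
bool is_nan(double x) { return x != x; }

/* Extracts the S bit from the 64-bit encoding: 0 for +0.0, 1 for -0.0. */
int sign_bit_of(double x) {
  uint64_t bits;
  memcpy(&bits, &x, sizeof bits);
  return (int)(bits >> 63);
}
```

For instance, `0.1 + 0.2 == 0.3` is false in `double`, yet `nearly_equal(0.1 + 0.2, 0.3, 1e-9)` is true; `+0.0 == -0.0` is true even though `sign_bit_of` tells the two apart.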
An interview with the old man of floating-point. Reminiscences elicited from William Kahan by Charles Severance.
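What if we consider more extreme numerical conditions?
Can you find an example where `x + 1.0 == x`?
A better formula for the roots of a quadratic equation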
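One answer to the `x + 1.0 == x` question: once the gap between neighbouring floating-point values exceeds 1, adding 1.0 rounds back to `x`. The sketch below assumes IEEE 754 with round-to-nearest-even; the function names are hypothetical, and `volatile` only forces the sum to be rounded to the stated type.

```c
#include <stdbool.h>

/* True if adding 1.0 to x is absorbed by rounding (double precision). */
bool absorbs_one_double(double x) {
  volatile double y = x + 1.0;
  return y == x;
}

/* Same test in single precision. */
bool absorbs_one_float(float x) {
  volatile float y = x + 1.0f;
  return y == x;
}
```

Under these assumptions $2^{53} = 9007199254740992$ is the smallest positive `double` that absorbs the addition, and $2^{24} = 16777216$ plays the same role for `float`.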
It looked pretty complicated. On the other hand, we had a rationale for everything. -- William Kahan, 1989 ACM Turing Award Winner for his fundamental contributions to numerical analysis.
`+0.0`/`-0.0` and Inf ensure that $(1/x)/x$ does not undergo a sign shift
The ingenious design of IEEE754 keeps numerical computation stable
Application: surface normals
How can we compute $f(x) = 1/\sqrt{x}$ quickly (approximately) without relying on hardware instructions?
float Q_rsqrt( float number ) {
  union { float f; uint32_t i; } conv;
  float x2 = number * 0.5F;
  conv.f = number;
  conv.i = 0x5f3759df - ( conv.i >> 1 ); // ???
  conv.f = conv.f * ( 1.5F - ( x2 * conv.f * conv.f ) );
  return conv.f;
}
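To see what each line contributes, the two stages can be separated: the integer subtraction produces a rough guess, and the last line is one Newton iteration for $y = 1/\sqrt{x}$. The sketch below uses hypothetical helper names and `memcpy` in place of the union. It measures quality by the residual $|x y^2 - 1|$, which is zero for an exact result and needs no square root to evaluate.

```c
#include <stdint.h>
#include <string.h>

/* Bit-level initial guess for 1/sqrt(x); assumes IEEE 754 binary32
 * and x positive and finite. */
float rsqrt_guess(float x) {
  uint32_t i;
  float y;
  memcpy(&i, &x, sizeof i);
  i = 0x5f3759df - (i >> 1);  /* roughly negates and halves the exponent */
  memcpy(&y, &i, sizeof y);
  return y;
}

/* One Newton step for f(y) = 1/y^2 - x. */
float rsqrt_newton(float x, float y) {
  return y * (1.5f - 0.5f * x * y * y);
}

/* |x * y^2 - 1|: how far y is from being 1/sqrt(x). */
float rsqrt_residual(float x, float y) {
  float r = x * y * y - 1.0f;
  return r < 0 ? -r : r;
}
```

For `x = 4.0f` the guess is about 0.483 and one Newton step brings it to about 0.499, close to the exact 0.5. Applying `rsqrt_newton` again would shrink the residual further.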
Take a look at someone else's graduation project
int64_t multimod_fast(int64_t a, int64_t b, int64_t m) {
  int64_t x = (int64_t)((double)a * b / m) * m;
  int64_t t = (a * b - x) % m;
  return t < 0 ? t + m : t;
}
Let $a \times b = p \cdot m + q$
`a * b` → $(p \cdot m + q) \bmod 2^{64}$
`x` → $(\lfloor \frac{p\cdot m+q}{m} \rfloor \cdot m) \bmod 2^{64}$
Everything is a bit-string!
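When experimenting with `multimod_fast`, it helps to have a slow but simple reference to compare against. The sketch below is one such reference, not part of the original slides. It uses double-and-add with unsigned 64-bit operands (an assumption; the function above takes `int64_t`), and every intermediate value stays below `m`, so nothing wraps around.

```c
#include <stdint.h>

/* (x + y) mod m for x, y < m, written so that no step can overflow. */
static uint64_t addmod(uint64_t x, uint64_t y, uint64_t m) {
  return (x >= m - y) ? x - (m - y) : x + y;
}

/* (a * b) mod m by double-and-add; assumes m > 0. Slow (up to 64 rounds)
 * but free of overflow, so it can serve as a test oracle. */
uint64_t multimod_slow(uint64_t a, uint64_t b, uint64_t m) {
  uint64_t r = 0;
  a %= m;
  while (b) {
    if (b & 1) r = addmod(r, a, m);  /* add the current multiple of a */
    a = addmod(a, a, m);             /* a = 2a mod m */
    b >>= 1;
  }
  return r;
}
```

A handy sanity check is $(m-1)^2 \equiv 1 \pmod m$, which holds for any $m > 1$ and exercises operands close to the modulus.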
PA: writing unmaintainable code is forbidden