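Reposted from: http://hi.baidu.com/lammy/blog/item/bc5e3d4e869073c3d1c86a89.html
The GTK+ 2.0 source code (GLib in particular) makes heavy use of two macros, G_LIKELY and G_UNLIKELY. Take the following code as an example: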
if (G_LIKELY (acat == 1))     /* allocate through magazine layer */
  {
    ThreadMemory *tmem = thread_memory_from_self();
    guint ix = SLAB_INDEX (allocator, chunk_size);
    if (G_UNLIKELY (thread_memory_magazine1_is_empty (tmem, ix)))
      {
        thread_memory_swap_magazines (tmem, ix);
        if (G_UNLIKELY (thread_memory_magazine1_is_empty (tmem, ix)))
          thread_memory_magazine1_reload (tmem, ix);
      }
    mem = thread_memory_magazine1_alloc (tmem, ix);
  }
In the source, the macros G_LIKELY and G_UNLIKELY are defined like this:
#define G_LIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 1))
#define G_UNLIKELY(expr) (__builtin_expect (_G_BOOLEAN_EXPR(expr), 0))
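The macro _G_BOOLEAN_EXPR converts expr to 0 or 1, that is, to a plain false or true. To understand G_LIKELY and G_UNLIKELY you clearly have to understand __builtin_expect. __builtin_expect is a GCC built-in function (not a macro), available since GCC 2.96. It tells the compiler which outcome of a conditional branch to expect, so that the common case does not waste time on a taken jump. Applied to the code above: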
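if (G_LIKELY (acat == 1)) /* the condition is true most of the time, so execution usually goes straight into the if body */
and
if (G_UNLIKELY (thread_memory_magazine1_is_empty (tmem, ix))) /* the condition is false most of the time, so execution usually skips the if body and takes the else path */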
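Why the expression is normalized to 0 or 1 first can be seen in a small sketch. __builtin_expect (EXP, C) states that EXP == C is expected, so passing a truthy value such as 42 together with C = 1 would express the wrong expectation. The names MY_LIKELY, MY_UNLIKELY, likely_value, unlikely_value and raw_value below are made up for illustration and are not GLib's definitions; the sketch assumes GCC or a compatible compiler such as Clang.

```c
/* Hypothetical stand-ins for G_LIKELY / G_UNLIKELY; "!!" plays the
 * role of _G_BOOLEAN_EXPR and maps any non-zero value to 1. */
#define MY_LIKELY(expr)   (__builtin_expect(!!(expr), 1))
#define MY_UNLIKELY(expr) (__builtin_expect(!!(expr), 0))

/* Each helper returns the value the hint macro evaluates to. */
int likely_value(int x)   { return MY_LIKELY(x); }
int unlikely_value(int x) { return MY_UNLIKELY(x); }

/* Without normalization the built-in simply returns its first
 * argument: raw_value(42) is 42, and the hint "expect == 1" would
 * not describe the truthy case at all. */
long raw_value(long x)    { return __builtin_expect(x, 1); }
```

Both macros evaluate to the truth value of the expression; only the hint passed to the compiler differs.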
If this still seems foggy, the following example should make it clear:
//test_builtin_expect.c
#define LIKELY(x) __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

int test_likely(int x)
{
    if(LIKELY(x))
    {
        x = 5;
    }
    else
    {
        x = 6;
    }
    return x;
}

int test_unlikely(int x)
{
    if(UNLIKELY(x))
    {
        x = 5;
    }
    else
    {
        x = 6;
    }
    return x;
}
[lammy@localhost test_builtin_expect]$ gcc -fprofile-arcs -O2 -c test_builtin_expect.c
[lammy@localhost test_builtin_expect]$ objdump -d test_builtin_expect.o
test_builtin_expect.o: file format elf32-i386
Disassembly of section .text:
00000000 <test_likely>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 08 mov 0x8(%ebp),%eax
6: 83 05 38 00 00 00 01 addl $0x1,0x38
d: 83 15 3c 00 00 00 00 adcl $0x0,0x3c
14: 85 c0 test %eax,%eax
16: 74 15 je 2d <test_likely+0x2d> // the key instruction: jumps only when x == 0
18: 83 05 40 00 00 00 01 addl $0x1,0x40
1f: b8 05 00 00 00 mov $0x5,%eax
24: 83 15 44 00 00 00 00 adcl $0x0,0x44
2b: 5d pop %ebp
2c: c3 ret
2d: 83 05 48 00 00 00 01 addl $0x1,0x48
34: b8 06 00 00 00 mov $0x6,%eax
39: 83 15 4c 00 00 00 00 adcl $0x0,0x4c
40: 5d pop %ebp
41: c3 ret
42: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
49: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
00000050 <test_unlikely>:
50: 55 push %ebp
51: 89 e5 mov %esp,%ebp
53: 8b 55 08 mov 0x8(%ebp),%edx
56: 83 05 20 00 00 00 01 addl $0x1,0x20
5d: 83 15 24 00 00 00 00 adcl $0x0,0x24
64: 85 d2 test %edx,%edx
66: 75 15 jne 7d <test_unlikely+0x2d> // the key instruction: jumps only when x != 0
68: 83 05 30 00 00 00 01 addl $0x1,0x30
6f: b8 06 00 00 00 mov $0x6,%eax
74: 83 15 34 00 00 00 00 adcl $0x0,0x34
7b: 5d pop %ebp
7c: c3 ret
7d: 83 05 28 00 00 00 01 addl $0x1,0x28
84: b8 05 00 00 00 mov $0x5,%eax
89: 83 15 2c 00 00 00 00 adcl $0x0,0x2c
90: 5d pop %ebp
91: c3 ret
92: 8d b4 26 00 00 00 00 lea 0x0(%esi,%eiz,1),%esi
99: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
000000a0 <_GLOBAL__I_65535_0_test_likely>:
a0: 55 push %ebp
a1: 89 e5 mov %esp,%ebp
a3: 83 ec 08 sub $0x8,%esp
a6: c7 04 24 00 00 00 00 movl $0x0,(%esp)
ad: e8 fc ff ff ff call ae <_GLOBAL__I_65535_0_test_likely+0xe>
b2: c9 leave
b3: c3 ret
[lammy@localhost test_builtin_expect]$
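The two functions compile to different jump instructions: je in test_likely and jne in test_unlikely. On closer inspection, __builtin_expect lets the compiler lay out the code so that in the expected case no jump is taken, because the expected branch sits on the fall-through path. So __builtin_expect is only an optimization hint to the compiler; it does not change how the condition is evaluated.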
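That the hint leaves the result untouched can be checked directly. The sketch below reuses test_likely and test_unlikely from the listing above and adds two helpers that are not in the original, test_plain (no hint at all) and hints_agree, which compares the three variants over a range of inputs. It assumes GCC or a compatible compiler.

```c
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

int test_likely(int x)   { if (LIKELY(x))   { x = 5; } else { x = 6; } return x; }
int test_unlikely(int x) { if (UNLIKELY(x)) { x = 5; } else { x = 6; } return x; }

/* Hypothetical reference version without any hint. */
int test_plain(int x)    { if (x)           { x = 5; } else { x = 6; } return x; }

/* Returns 1 if all three variants agree for every x in [lo, hi],
 * and 0 as soon as one of them differs. */
int hints_agree(int lo, int hi)
{
    for (int x = lo; x <= hi; x++) {
        int expected = test_plain(x);
        if (test_likely(x) != expected || test_unlikely(x) != expected)
            return 0;
    }
    return 1;
}
```

Calling hints_agree(-100, 100) returns 1: a wrong hint can only cost speed, never correctness.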
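This technique is also widely used in the Linux kernel. There is a related article in English that is worth reading: http://kernelnewbies.org/FAQ/LikelyUnlikely
You may have noticed that I generated the assembly with gcc -fprofile-arcs -O2 -c test_builtin_expect.c rather than gcc -O2 -c test_builtin_expect.c. For details, see http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html.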
FAQ/LikelyUnlikely
likely() and unlikely()
What are they ?
In Linux kernel code, one often finds calls to likely() and unlikely() in conditions, like:
bvl = bvec_alloc(gfp_mask, nr_iovecs, &idx);
if (unlikely(!bvl)) {
        mempool_free(bio, bio_pool);
        bio = NULL;
        goto out;
}
In fact, these macros are hints that allow the compiler to optimize the branch correctly by telling it which outcome is the likeliest. The definitions of these macros, found in include/linux/compiler.h, are the following:
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
The GCC documentation explains the role of __builtin_expect() :
-- Built-in Function: long __builtin_expect (long EXP, long C)
    You may use `__builtin_expect' to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (`-fprofile-arcs'), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

    The return value is the value of EXP, which should be an integral expression. The value of C must be a compile-time constant. The semantics of the built-in are that it is expected that EXP == C. For example:

        if (__builtin_expect (x, 0))
          foo ();

    would indicate that we do not expect to call `foo', since we expect `x' to be zero. Since you are limited to integral expressions for EXP, you should use constructions such as

        if (__builtin_expect (ptr != NULL, 1))
          error ();

    when testing pointer or floating-point values.
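The last point of the quoted documentation, that pointer and floating-point tests should be turned into an integral comparison first, can be sketched as follows. The functions sum_or_error and safe_ratio are hypothetical helpers written for this illustration, and treating a NULL pointer or a zero denominator as the rare case is an assumption about the caller, not something the documentation states.

```c
#include <stddef.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Pointer test: "values == NULL" is an int, so it can be passed to
 * the built-in. *ok is set to 0 on the (assumed rare) error path
 * and to 1 otherwise. */
long sum_or_error(const int *values, size_t count, int *ok)
{
    if (unlikely(values == NULL)) {
        *ok = 0;
        return 0;
    }

    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    *ok = 1;
    return total;
}

/* Floating-point test: compare first, then hint on the int result.
 * A zero denominator is assumed to be rare and yields 0.0 here. */
double safe_ratio(double num, double den)
{
    if (unlikely(den == 0.0))
        return 0.0;
    return num / den;
}
```

In both helpers the hint wraps the comparison, never the pointer or the double itself.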
How does it optimize things ?
It optimizes things by ordering the generated assembly code so as to make better use of the processor pipeline. To do so, the compiler arranges the code so that the likeliest branch is executed without performing any jmp instruction (which has the bad effect of flushing the processor pipeline).
To see how it works, let's compile the following simple C user space program with gcc -O2 :
#include <stdio.h>
#include <stdlib.h>

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int main(int argc, char *argv[])
{
    int a;

    /* Get the value from somewhere GCC can't optimize */
    a = atoi (argv[1]);

    if (unlikely (a == 2))
        a++;
    else
        a--;

    printf ("%d\n", a);

    return 0;
}
Now, disassemble the resulting binary using objdump -S (comments added by me) :
080483b0 <main>:
// Prologue
80483b0: 55 push %ebp
80483b1: 89 e5 mov %esp,%ebp
80483b3: 50 push %eax
80483b4: 50 push %eax
80483b5: 83 e4 f0 and $0xfffffff0,%esp
// Call atoi()
80483b8: 8b 45 08 mov 0x8(%ebp),%eax
80483bb: 83 ec 1c sub $0x1c,%esp
80483be: 8b 48 04 mov 0x4(%eax),%ecx
80483c1: 51 push %ecx
80483c2: e8 1d ff ff ff call 80482e4 <atoi@plt>
80483c7: 83 c4 10 add $0x10,%esp
// Test the value
80483ca: 83 f8 02 cmp $0x2,%eax
// --------------------------------------------------------
// If 'a' is equal to 2 (which is unlikely), then jump,
// otherwise continue directly, without jump, so that it
// doesn't flush the pipeline.
// --------------------------------------------------------
80483cd: 74 12 je 80483e1 <main+0x31>
80483cf: 48 dec %eax
// Call printf
80483d0: 52 push %edx
80483d1: 52 push %edx
80483d2: 50 push %eax
80483d3: 68 c8 84 04 08 push $0x80484c8
80483d8: e8 f7 fe ff ff call 80482d4 <printf@plt>
// Return 0 and go out.
80483dd: 31 c0 xor %eax,%eax
80483df: c9 leave
80483e0: c3 ret
Now, in the previous program, replace unlikely() with likely(), recompile it, and disassemble it again (again, comments added by me):
080483b0 <main>:
// Prologue
80483b0: 55 push %ebp
80483b1: 89 e5 mov %esp,%ebp
80483b3: 50 push %eax
80483b4: 50 push %eax
80483b5: 83 e4 f0 and $0xfffffff0,%esp
// Call atoi()
80483b8: 8b 45 08 mov 0x8(%ebp),%eax
80483bb: 83 ec 1c sub $0x1c,%esp
80483be: 8b 48 04 mov 0x4(%eax),%ecx
80483c1: 51 push %ecx
80483c2: e8 1d ff ff ff call 80482e4 <atoi@plt>
80483c7: 83 c4 10 add $0x10,%esp
// --------------------------------------------------
// If 'a' equals 2 (which is likely), we will continue
// without branching, so without flushing the pipeline. The
// jump only occurs when a != 2, which is unlikely.
// ---------------------------------------------------
80483ca: 83 f8 02 cmp $0x2,%eax
80483cd: 75 13 jne 80483e2 <main+0x32>
// Here the a++ incrementation has been optimized by gcc
80483cf: b0 03 mov $0x3,%al
// Call printf()
80483d1: 52 push %edx
80483d2: 52 push %edx
80483d3: 50 push %eax
80483d4: 68 c8 84 04 08 push $0x80484c8
80483d9: e8 f6 fe ff ff call 80482d4 <printf@plt>
// Return 0 and go out.
80483de: 31 c0 xor %eax,%eax
80483e0: c9 leave
80483e1: c3 ret
How should I use it ?
You should use it only when the likeliest branch is overwhelmingly likely, or when the unlikeliest branch is overwhelmingly unlikely, for example an error path that almost never runs.
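A minimal sketch of that guideline: hint only the argument check that is assumed to almost never fail, and leave the data-dependent test inside the loop alone. The function count_nonzero is a hypothetical example, and the fallback definitions for compilers without __builtin_expect are an assumption added here, not part of the kernel header quoted above.

```c
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
/* Assumed fallback: no hint, same truth value. */
#  define likely(x)   (!!(x))
#  define unlikely(x) (!!(x))
#endif

/* Counts the non-zero bytes in buf; returns -1 on invalid arguments. */
int count_nonzero(const unsigned char *buf, int len)
{
    /* Invalid arguments are assumed to be extremely rare: hint it. */
    if (unlikely(buf == NULL || len < 0))
        return -1;

    int n = 0;
    for (int i = 0; i < len; i++) {
        /* Depends entirely on the data, so no hint is given here. */
        if (buf[i] != 0)
            n++;
    }
    return n;
}
```

If it is unclear how skewed a branch really is, profile feedback as recommended in the GCC documentation quoted earlier is the safer choice.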