gcc提供了-fdump-tree-original、-fdump-tree-all等选项,可以输出gcc处理源代码过程中的AST及GIMPLE中间表示信息。例如使用-fudmp-tree-original就可以输出GCC进行词法/语法解析后所生成的AST信息,然而该AST信息过于繁杂,不便于分析,本文通过在GCC源代码中增加一些调试语句,从而输出特定函数的AST信息。
一. gcc源代码中增加调试代码
在gcc/gimplify.c的gimplify_function_tree函数中添加如下语句,主要调用dump_node函数打印当前函数的AST节点,此时打印的节点信息是在AST转换为GIMPLE之前的内容。
void
gimplify_function_tree (tree fndecl)
{
tree oldfn, parm, ret;
gimple_seq seq;
gimple bind;
int i;
//增加的代码
FILE *fp;
char filename[128];
sprintf(filename, "AST-%s", current_function_name ()); //保存AST信息的文件名称为AST-${function_name}
fp = fopen(filename, "w");
dump_node(fndecl, 0x1FFF8, fp); //0x1FFF8为打印标志,不同的标志会导致输出结果的差异
fclose(fp);
//…….
}
二. 编译GCC源代码,生成新的编译程序cc1
三. 使用cc1(或者生成的gcc程序均可)编译测试程序test.c,生成main函数部分的AST信息,保存在文件AST-main中。
测试程序test.c如下:
[GCC@localhost test]$ cat test.c
int main(int argc, char *argv[]){
int i=0;
int sum=0;
for(i=0; i<10; i++){
sum = sum + i;
}
return sum;
}
编译该源代码:
[GCC@localhost test]$ ~/paag-gcc/host-i686-pc-linux-gnu/gcc/cc1 test.c
查看生成的AST导出文件AST-main
[root@localhost AST]# cat AST-main
@1 function_decl name: @2 type: @3 srcp: test.c:1
args: @4 link: extern body: @5
addr: b7337900
@2 identifier_node strg: main lngt: 4 addr: b7320a10
@3 function_type size: @6 algn: 8 retn: @7
prms: @8 addr: b7338750
@4 parm_decl name: @9 type: @7 scpe: @1
srcp: test.c:1 chan: @10
argt: @7 size: @11 algn: 32
used: 0 addr: b72b11e0
@5 bind_expr type: @12 vars: @13 body: @14
addr: b73051b8
@6 integer_cst type: @15 low : 8 addr: b72a9508
@7 integer_type name: @16 size: @11 algn: 32
prec: 32 sign: signed min : @17
max : @18 addr: b72b82d8
@8 tree_list valu: @7 chan: @19 addr: b7342ab8
@9 identifier_node strg: argc lngt: 4 addr: b733cce8
@10 parm_decl name: @20 type: @21 scpe: @1
srcp: test.c:1 argt: @21
size: @11 algn: 32 used: 0
addr: b72b1230
@11 integer_cst type: @15 low : 32 addr: b72a9690
@12 void_type name: @22 algn: 8 addr: b72bf270
@13 var_decl name: @23 type: @7 scpe: @1
srcp: test.c:2 chan: @24
init: @25 size: @11 algn: 32
used: 1 addr: b7343000
@14 statement_list 0 : @26 1 : @27 2 : @28
3 : @29 4 : @30 5 : @31
6 : @32 7 : @33 8 : @34
9 : @35 10 : @36 addr: b7342b28
@15 integer_type name: @37 size: @38 algn: 64
prec: 36 sign: unsigned min : @39
max : @40 addr: b72b8068
@16 type_decl name: @41 type: @7 srcp: <built-in>:0
addr: b72b8680
@17 integer_cst type: @7 high: -1 low : -2147483648
addr: b72a963c
@18 integer_cst type: @7 low : 2147483647 addr: b72a9658
@19 tree_list valu: @21 chan: @42 addr: b7342a9c
@20 identifier_node strg: argv lngt: 4 addr: b733cd20
@21 pointer_type size: @11 algn: 32 ptd : @43
addr: b73386e8
@22 type_decl name: @44 type: @12 srcp: <built-in>:0
addr: b72c2000
@23 identifier_node strg: i lngt: 1 addr: b733cd90
@24 var_decl name: @45 type: @7 scpe: @1
srcp: test.c:3 init: @25
size: @11 algn: 32 used: 1
addr: b7343058
@25 integer_cst type: @7 low : 0 addr: b72a9ccc
@26 decl_expr type: @12 addr: b72b6500
@27 decl_expr type: @12 addr: b72b6520
@28 modify_expr type: @7 op 0: @13 op 1: @25
addr: b72b060c
@29 goto_expr type: @12 labl: @46 addr: b72b65a0
@30 label_expr type: @12 name: @47 addr: b72b6540
@31 modify_expr type: @7 op 0: @24 op 1: @48
addr: b72b069c
@32 postincrement_expr type: @7 op 0: @13 op 1: @49
addr: b72b0654
@33 label_expr type: @12 name: @46 addr: b72b6580
@34 cond_expr type: @12 op 0: @50 op 1: @51
op 2: @52 addr: b7305190
@35 label_expr type: @12 name: @53 addr: b72b65e0
@36 return_expr type: @12 expr: @54 addr: b72b6600
@37 identifier_node strg: bit_size_type lngt: 13
addr: b72b79a0
@38 integer_cst type: @15 low : 64 addr: b72a978c
@39 integer_cst type: @15 low : 0 addr: b72a9c08
@40 integer_cst type: @15 high: 15 low : -1
addr: b72a9bd0
@41 identifier_node strg: int lngt: 3 addr: b72b7348
@42 tree_list valu: @12 addr: b72a9fc0
@43 pointer_type size: @11 algn: 32 ptd : @55
addr: b72c1af8
@44 identifier_node strg: void lngt: 4 addr: b72b7658
@45 identifier_node strg: sum lngt: 3 addr: b733cdc8
@46 label_decl type: @12 scpe: @1 srcp: test.c:6
note: artificial addr: b72b1320
@47 label_decl type: @12 scpe: @1 srcp: test.c:6
note: artificial addr: b72b12d0
@48 plus_expr type: @7 op 0: @24 op 1: @13
addr: b72b0678
@49 integer_cst type: @7 low : 1 addr: b72a9ce8
@50 le_expr type: @7 op 0: @13 op 1: @56
addr: b72b0630
@51 goto_expr type: @12 labl: @47 addr: b72b6560
@52 goto_expr type: @12 labl: @53 addr: b72b65c0
@53 label_decl type: @12 scpe: @1 srcp: test.c:6
note: artificial addr: b72b1370
@54 modify_expr type: @7 op 0: @57 op 1: @24
addr: b72b06c0
@55 integer_type name: @58 size: @6 algn: 8
prec: 8 sign: signed min : @59
max : @60 addr: b72b81a0
@56 integer_cst type: @7 low : 9 addr: b7342bb4
@57 result_decl type: @7 scpe: @1 srcp: test.c:1
note: artificial size: @11
algn: 32 addr: b72b1280
@58 type_decl name: @61 type: @55 srcp: <built-in>:0
addr: b72b86e8
@59 integer_cst type: @55 high: -1 low : -128
addr: b72a94d0
@60 integer_cst type: @55 low : 127 addr: b72a9594
@61 identifier_node strg: char lngt: 4 addr: b72b70a8
四. AST的可视化
可以看出,上述输出的AST信息阅读起来还是比较晦涩,读者很难把握这些节点之间的关系。当源代码比较大时,生成的AST节点更为复杂,此时AST文件的阅读更加困难。
在GCC的新版本例好像已经有了完整的可视化功能,早期的版本,例如GCC-4.4.0中,还需要自己加点东西才可以实现。
为了对上述的AST信息进行有效分析,尤其是各个AST节点之间的相互关系,可以使用http://www.graphviz.org提供的图形可视化工具Graphviz(Graph Visualization Software)对上述的AST信息进行图示,从而直观地进行AST分析。因此需要首先安装该工具。
在进行图示前,需要对上述的AST信息进行处理,分析节点之间的关系,并转换成绘图脚本,最后调用graphviz提供的绘图工具(例如dot工具)绘制出这些节点之间的关系。
下面给出使用shell工具对AST信息进行提取,并进行图形绘制的Shell脚本。主要包括如下几个步骤:(其中如下的pre.awk和treeviz.awk来自网络,可惜忘了出处,对原作者表示感谢!)
1. pre.awk:使用awk脚本对AST文件信息进行预处理。
[GCC@localhost ast-node]$ cat pre.awk
#! /usr/bin/gawk -f
/^[^;]/{
gsub(/^@/, "~@", $0);
gsub(/( *):( *)/, ":", $0);
print;
}
2. treeviz.awk:使用awk将预处理后的AST信息转换成图形脚本。
[GCC@localhost ast-node]$ cat treeviz.awk
#! /usr/bin/gawk -f
#http://alohakun.blog7.fc2.com/?mode=m&no=355
BEGIN {RS = "~@"; printf "digraph G {\n node [shape = record];\n";}
/^[0-9]/{
s = sprintf("%s [label = \"{%s: %s | {", $1, $1, $2);
for(i = 3; i < NF; i++)
s = s sprintf("%s | ",$i);
s = s sprintf("%s}}\"];\n", $i);
$0 = s;
while (/([0-9a-zA-Z]+):@([0-9]+)/){
format = sprintf("<\\1>\\1 \\3\n %s:\\1 -> \\2;", $1);
$0 = gensub(/([0-9a-zA-Z]+):@([0-9]+)(.*)$/, format, "g");
};
printf " %s\n", $0;
}
END {print "}"}
3. ast_to_dot.sh:调用上述两个awk处理脚本,并最终调用dot等绘图工具,生成AST图形。
[GCC@localhost ast-node]$ cat ast_to_dot.sh
#/bin/bash
# $1 为AST文件名称。
# $2 可以是字符串“all”表示图示AST中的所有节点。
# $2,$3,$4,…也可以是一系列的AST节点编号,则该脚本只图示指定编号的AST节点。
# 例如:
# ./ast_to_dot.sh AST_file all 表示图示所有节点及其相关关系。
# ./ast_to_dot.sh AST_file 1 4 5 8 表示图示AST文件中编号为1、4、5、8等几个节点的信息及其关系。
# 获取AST文件名称。
f=$1
# 对AST文件中一些特殊字段进行处理,将不必要的空格去掉。
sed -i "s/op\ 1/op1/g" $f
sed -i "s/op\ 2/op2/g" $f
sed -i "s/op\ 0/op0/g" $f
# 对AST文件进行预处理,为了清晰起见,可以将一些“次要的”信息删除,减少图形中的信息,用户可以根据需要修改。
./pre.awk $f | sed 's/srcp:[a-z_.:0-9<>-]*//g' | sed 's/note:[a-z]*//g' | sed 's/link:[a-z]*//g' | sed 's/used:[0-9]*//g' | sed 's/algn:[0-9]*//g' |sed 's/prec:[0-9]*//g' | sed 's/lngt:[0-9]*//g' | sed 's/sign:[a-z]*//g' > tmp1
# 对简化后的AST文件进行转换,生成图形脚本文件$f.dot。
./treeviz.awk tmp1 > $f.dot
# 创建临时文件。
rm -f tmp; touch tmp
# 如果$2表示全部转换,则直接使用上述转换后的dot脚本,否则,从上述生成的dot脚本中筛选相应的节点,加入到tmp文件中。
if [ $2 != "all" ]
then
echo "digraph G {" >> tmp
echo " node [shape = record];" >> tmp
shift
rm -rf tmp_header
# 筛选给定的节点。
for n in $*
do
grep " $n " $f.dot >> tmp
grep " $n:" $f.dot >> tmp_header
done
rm -rf tmp_header_tail
for n in $*
do
grep " $n;" tmp_header >> tmp_header_tail
done
# 去除冗余的节点信息。
sort tmp_header_tail | uniq >> tmp
echo " }" >> tmp
# 否则图示所有节点。
else
cat $f.dot > tmp
fi
#调用graphviz中的dot工具绘图,节点字体大小为10point,输出文件格式为svg矢量图形格式,输出文件名称为$f.svg。
#dot -Nfontsize=10 -Tsvg tmp -o $f.svg
#输出文件为png格式
dot -Nfontsize=10 -Tpng tmp -o $f.png
4. 使用脚本生成图像
ast_to_dot.sh脚本有两种典型的执行方式:
(1) 对AST文件中的所有内容进行处理并绘图,生成的图形文件名称为AST-main.png。
[GCC@localhost test]$ ./ast_to_dot AST-main all
生成的dot脚本文件AST-main.dot为:
[root@localhost AST]# cat AST-main.dot
digraph G {
node [shape = record];
1 [label = "{1: function_decl | {<name>name | <type>type | <args>args | <body>body }}"];
1:name -> 2;
1:type -> 3;
1:args -> 4;
1:body -> 5;
2 [label = "{2: identifier_node | {strg:main}}"];
3 [label = "{3: function_type | {<size>size | <retn>retn | <prms>prms }}"];
3:size -> 6;
3:retn -> 7;
3:prms -> 8;
4 [label = "{4: parm_decl | {<name>name | <type>type | <scpe>scpe | <chan>chan | <argt>argt | <size>size }}"];
4:name -> 9;
4:type -> 7;
4:scpe -> 1;
4:chan -> 10;
4:argt -> 7;
4:size -> 11;
5 [label = "{5: bind_expr | {<type>type | <vars>vars | <body>body }}"];
5:type -> 12;
5:vars -> 13;
5:body -> 14;
6 [label = "{6: integer_cst | {<type>type | low:8}}"];
6:type -> 15;
7 [label = "{7: integer_type | {<name>name | <size>size | <min>min | <max>max }}"];
7:name -> 16;
7:size -> 11;
7:min -> 17;
7:max -> 18;
8 [label = "{8: tree_list | {<valu>valu | <chan>chan }}"];
8:valu -> 7;
8:chan -> 19;
9 [label = "{9: identifier_node | {strg:argc}}"];
10 [label = "{10: parm_decl | {<name>name | <type>type | <scpe>scpe | <argt>argt | <size>size }}"];
10:name -> 20;
10:type -> 21;
10:scpe -> 1;
10:argt -> 21;
10:size -> 11;
11 [label = "{11: integer_cst | {<type>type | low:32}}"];
11:type -> 15;
12 [label = "{12: void_type | {<name>name }}"];
12:name -> 22;
13 [label = "{13: var_decl | {<name>name | <type>type | <scpe>scpe | <chan>chan | <init>init | <size>size }}"];
13:name -> 23;
13:type -> 7;
13:scpe -> 1;
13:chan -> 24;
13:init -> 25;
13:size -> 11;
14 [label = "{14: statement_list | {<0>0 | <1>1 | <2>2 | <3>3 | <4>4 | <5>5 | <6>6 | <7>7 | <8>8 | <9>9 | <10>10 }}"];
14:0 -> 26;
14:1 -> 27;
14:2 -> 28;
14:3 -> 29;
14:4 -> 30;
14:5 -> 31;
14:6 -> 32;
14:7 -> 33;
14:8 -> 34;
14:9 -> 35;
14:10 -> 36;
15 [label = "{15: integer_type | {<name>name | <size>size | <min>min | <max>max }}"];
15:name -> 37;
15:size -> 38;
15:min -> 39;
15:max -> 40;
16 [label = "{16: type_decl | {<name>name | <type>type }}"];
16:name -> 41;
16:type -> 7;
17 [label = "{17: integer_cst | {<type>type | high:-1 | low:-2147483648}}"];
17:type -> 7;
18 [label = "{18: integer_cst | {<type>type | low:2147483647}}"];
18:type -> 7;
19 [label = "{19: tree_list | {<valu>valu | <chan>chan }}"];
19:valu -> 21;
19:chan -> 42;
20 [label = "{20: identifier_node | {strg:argv}}"];
21 [label = "{21: pointer_type | {<size>size | <ptd>ptd }}"];
21:size -> 11;
21:ptd -> 43;
22 [label = "{22: type_decl | {<name>name | <type>type }}"];
22:name -> 44;
22:type -> 12;
23 [label = "{23: identifier_node | {strg:i}}"];
24 [label = "{24: var_decl | {<name>name | <type>type | <scpe>scpe | <init>init | <size>size }}"];
24:name -> 45;
24:type -> 7;
24:scpe -> 1;
24:init -> 25;
24:size -> 11;
25 [label = "{25: integer_cst | {<type>type | low:0}}"];
25:type -> 7;
26 [label = "{26: decl_expr | {<type>type }}"];
26:type -> 12;
27 [label = "{27: decl_expr | {<type>type }}"];
27:type -> 12;
28 [label = "{28: modify_expr | {<type>type | <op0>op0 | <op1>op1 }}"];
28:type -> 7;
28:op0 -> 13;
28:op1 -> 25;
29 [label = "{29: goto_expr | {<type>type | <labl>labl }}"];
29:type -> 12;
29:labl -> 46;
30 [label = "{30: label_expr | {<type>type | <name>name }}"];
30:type -> 12;
30:name -> 47;
31 [label = "{31: modify_expr | {<type>type | <op0>op0 | <op1>op1 }}"];
31:type -> 7;
31:op0 -> 24;
31:op1 -> 48;
32 [label = "{32: postincrement_expr | {<type>type | <op0>op0 | <op1>op1 }}"];
32:type -> 7;
32:op0 -> 13;
32:op1 -> 49;
33 [label = "{33: label_expr | {<type>type | <name>name }}"];
33:type -> 12;
33:name -> 46;
34 [label = "{34: cond_expr | {<type>type | <op0>op0 | <op1>op1 | <op2>op2 }}"];
34:type -> 12;
34:op0 -> 50;
34:op1 -> 51;
34:op2 -> 52;
35 [label = "{35: label_expr | {<type>type | <name>name }}"];
35:type -> 12;
35:name -> 53;
36 [label = "{36: return_expr | {<type>type | <expr>expr }}"];
36:type -> 12;
36:expr -> 54;
37 [label = "{37: identifier_node | {strg:bit_size_type}}"];
38 [label = "{38: integer_cst | {<type>type | low:64}}"];
38:type -> 15;
39 [label = "{39: integer_cst | {<type>type | low:0}}"];
39:type -> 15;
40 [label = "{40: integer_cst | {<type>type | high:15 | low:-1}}"];
40:type -> 15;
41 [label = "{41: identifier_node | {strg:int}}"];
42 [label = "{42: tree_list | {<valu>valu }}"];
42:valu -> 12;
43 [label = "{43: pointer_type | {<size>size | <ptd>ptd }}"];
43:size -> 11;
43:ptd -> 55;
44 [label = "{44: identifier_node | {strg:void}}"];
45 [label = "{45: identifier_node | {strg:sum}}"];
46 [label = "{46: label_decl | {<type>type | <scpe>scpe }}"];
46:type -> 12;
46:scpe -> 1;
47 [label = "{47: label_decl | {<type>type | <scpe>scpe }}"];
47:type -> 12;
47:scpe -> 1;
48 [label = "{48: plus_expr | {<type>type | <op0>op0 | <op1>op1 }}"];
48:type -> 7;
48:op0 -> 24;
48:op1 -> 13;
49 [label = "{49: integer_cst | {<type>type | low:1}}"];
49:type -> 7;
50 [label = "{50: le_expr | {<type>type | <op0>op0 | <op1>op1 }}"];
50:type -> 7;
50:op0 -> 13;
50:op1 -> 56;
51 [label = "{51: goto_expr | {<type>type | <labl>labl }}"];
51:type -> 12;
51:labl -> 47;
52 [label = "{52: goto_expr | {<type>type | <labl>labl }}"];
52:type -> 12;
52:labl -> 53;
53 [label = "{53: label_decl | {<type>type | <scpe>scpe }}"];
53:type -> 12;
53:scpe -> 1;
54 [label = "{54: modify_expr | {<type>type | <op0>op0 | <op1>op1 }}"];
54:type -> 7;
54:op0 -> 57;
54:op1 -> 24;
55 [label = "{55: integer_type | {<name>name | <size>size | <min>min | <max>max }}"];
55:name -> 58;
55:size -> 6;
55:min -> 59;
55:max -> 60;
56 [label = "{56: integer_cst | {<type>type | low:9}}"];
56:type -> 7;
57 [label = "{57: result_decl | {<type>type | <scpe>scpe | <size>size }}"];
57:type -> 7;
57:scpe -> 1;
57:size -> 11;
58 [label = "{58: type_decl | {<name>name | <type>type }}"];
58:name -> 61;
58:type -> 55;
59 [label = "{59: integer_cst | {<type>type | high:-1 | low:-128}}"];
59:type -> 55;
60 [label = "{60: integer_cst | {<type>type | low:127}}"];
60:type -> 55;
61 [label = "{61: identifier_node | {strg:char}}"];
}
对应的图形文件为AST-main.png
(2) 对AST文件中的编号为@node_num1,@node_num2,...等节点的内容进行处理并绘图,生成的图形文件名称为AST-main.png。
[GCC@localhost test]$ ./ast_to_dot AST-main 1,2,3,4,5,6,7,8
生成的dot脚本文件AST-main.dot内容如下:
[root@localhost AST]# cat AST-main.dot
digraph G {
node [shape = record];
1 [label = "{1: function_decl | {<name>name | <type>type | <args>args | <body>body }}"];
2 [label = "{2: identifier_node | {strg:main}}"];
3 [label = "{3: function_type | {<size>size | <retn>retn | <prms>prms }}"];
4 [label = "{4: parm_decl | {<name>name | <type>type | <scpe>scpe | <chan>chan | <argt>argt | <size>size }}"];
5 [label = "{5: bind_expr | {<type>type | <vars>vars | <body>body }}"];
6 [label = "{6: integer_cst | {<type>type | low:8}}"];
7 [label = "{7: integer_type | {<name>name | <size>size | <min>min | <max>max }}"];
8 [label = "{8: tree_list | {<valu>valu | <chan>chan }}"];
1:args -> 4;
1:body -> 5;
1:name -> 2;
1:type -> 3;
3:prms -> 8;
3:retn -> 7;
3:size -> 6;
4:argt -> 7;
4:scpe -> 1;
4:type -> 7;
8:valu -> 7;
}
生成的图示如下:
需要说明的是:上述脚本中使用脚本pre.awk和treeviz.awk就可以生成dot的脚本,大家可以根据自己的需求自行修改上述的dot文件,完成dot图示内容的筛选,完成不同的功能。