My favorites | Sign in
Project Home Wiki Issues Source
Repository:
Checkout   Browse   Changes   Clones  
Changes to /ramblings/blogs-optimizing-1.md
61f4f1ae9493 vs. fe414b9cb184 Compare: vs.  Format:
Revision fe414b9cb184
Go to: 
Project members, sign in to write a code review
/ramblings/blogs-optimizing-1.md   61f4f1ae9493 /ramblings/blogs-optimizing-1.md   fe414b9cb184
1 regex-dna - regex matching fast enough
2 ======================================
1 3
2 Since my goal is to improve the compiler optimizer (staticly with B::CC, but also the perl compiler in op.c) I came to produce these interesting benchmarks. 4 Since my goal is to improve the compiler optimizer (staticly with B::CC, but also the perl compiler in op.c) I came to produce these interesting benchmarks.
3 5
4 I took the regex-dna example from *"The Computer Language Benchmarks Game"* at [shootout.alioth.debian.org](http://shootout.alioth.debian.org/) 6 I took the regex-dna example from *"The Computer Language Benchmarks Game"* at [shootout.alioth.debian.org](http://shootout.alioth.debian.org/)
5 7
6 $ time perl t/regex-dna.pl <t/regexdna-input 8 $ time perl t/regex-dna.pl <t/regexdna-input
7 agggtaaa|tttaccct 0 9 agggtaaa|tttaccct 0
8 [cgt]gggtaaa|tttaccc[acg] 3 10 [cgt]gggtaaa|tttaccc[acg] 3
9 a[act]ggtaaa|tttacc[agt]t 9 11 a[act]ggtaaa|tttacc[agt]t 9
10 ag[act]gtaaa|tttac[agt]ct 8 12 ag[act]gtaaa|tttac[agt]ct 8
11 agg[act]taaa|ttta[agt]cct 10 13 agg[act]taaa|ttta[agt]cct 10
12 aggg[acg]aaa|ttt[cgt]ccct 3 14 aggg[acg]aaa|ttt[cgt]ccct 3
13 agggt[cgt]aa|tt[acg]accct 4 15 agggt[cgt]aa|tt[acg]accct 4
14 agggta[cgt]a|t[acg]taccct 3 16 agggta[cgt]a|t[acg]taccct 3
15 agggtaa[cgt]|[acg]ttaccct 5 17 agggtaa[cgt]|[acg]ttaccct 5
16 18
17 101745 19 101745
18 100000 20 100000
19 133640 21 133640
20 22
21 real 0m**0.130s** /(varying from 0.125 to 0.132)/ 23 real 0m**0.130s** /(varying from 0.125 to 0.132)/
22 user 0m0.120s 24 user 0m0.120s
23 sys 0m0.008s 25 sys 0m0.008s
24 26
25 t/regexdna-input contains 100KB 1600 lines of DNA code, which is used to match 27 t/regexdna-input contains 100KB 1600 lines of DNA code, which is used to match
26 DNA 8-mers and substitute nucleotides for IUB codes. 28 DNA 8-mers and substitute nucleotides for IUB codes.
27 29
28 $ wc t/regexdna-input 30 $ wc t/regexdna-input
29 1671 1680 101745 t/regexdna-input 31 1671 1680 101745 t/regexdna-input
30 32
31 Perl behaves pretty good in this [benchmark](http://shootout.alioth.debian.org/u64q/performance.php?test=regexdna), 33 Perl behaves pretty good in this [benchmark](http://shootout.alioth.debian.org/u64q/performance.php?test=regexdna),
32 it is actually the fastest scripting language. But the compiler should 34 it is actually the fastest scripting language. But the compiler should
33 do better, and I had some ideas to try out for the optimizing 35 do better, and I had some ideas to try out for the optimizing
34 compiler. So I thought. 36 compiler. So I thought.
35 37
36 First the simple and stable B::C compiler with -O3: 38 First the simple and stable B::C compiler with -O3:
37 39
38 $ perlcc -O3 -o regex-dna-c -S t/regex-dna.pl 40 $ perlcc -O3 -o regex-dna-c -S t/regex-dna.pl
39 $ time ./regex-dna-c <t/regexdna-input 41 $ time ./regex-dna-c <t/regexdna-input
40 agggtaaa|tttaccct 0 42 agggtaaa|tttaccct 0
41 [cgt]gggtaaa|tttaccc[acg] 3 43 [cgt]gggtaaa|tttaccc[acg] 3
42 a[act]ggtaaa|tttacc[agt]t 9 44 a[act]ggtaaa|tttacc[agt]t 9
43 ag[act]gtaaa|tttac[agt]ct 8 45 ag[act]gtaaa|tttac[agt]ct 8
44 agg[act]taaa|ttta[agt]cct 10 46 agg[act]taaa|ttta[agt]cct 10
45 aggg[acg]aaa|ttt[cgt]ccct 3 47 aggg[acg]aaa|ttt[cgt]ccct 3
46 agggt[cgt]aa|tt[acg]accct 4 48 agggt[cgt]aa|tt[acg]accct 4
47 agggta[cgt]a|t[acg]taccct 3 49 agggta[cgt]a|t[acg]taccct 3
48 agggtaa[cgt]|[acg]ttaccct 5 50 agggtaa[cgt]|[acg]ttaccct 5
49 51
50 101745 52 101745
51 100000 53 100000
52 133640 54 133640
53 55
54 real 0m**0.285s** 56 real 0m**0.285s**
55 user 0m0.272s 57 user 0m0.272s
56 sys 0m0.004s 58 sys 0m0.004s
57 59
58 0.130s vs 0.285s compiled? What's going on? B::C promises faster startup-time and equal run-time. 60 0.130s vs 0.285s compiled? What's going on? B::C promises faster startup-time and equal run-time.
59 With -S we keep the intermediate C source to study it. 61 With -S we keep the intermediate C source to study it.
60 Let's try B::CC, via -O. Here you don't need a -O3 as B::CC already contains all B::C -O3 optimizations 62 Let's try B::CC, via -O. Here you don't need a -O3 as B::CC already contains all B::C -O3 optimizations
61 63
62 $ perlcc -O -o regex-dna-cc t/regex-dna.pl 64 $ perlcc -O -o regex-dna-cc t/regex-dna.pl
63 $ time ./regex-dna-cc <t/regexdna-input 65 $ time ./regex-dna-cc <t/regexdna-input
64 ... 66 ...
65 real 0m**0.267s** 67 real 0m**0.267s**
66 user 0m0.256s 68 user 0m0.256s
67 sys 0m0.008s 69 sys 0m0.008s
68 70
69 Hmm? Let's see what's going on with **-v5**. 71 Hmm? Let's see what's going on with **-v5**.
70 72
71 $ perlcc -O3 -v5 -S -oregex-dna-c -v5 t/regex-dna.pl 73 $ perlcc -O3 -v5 -S -oregex-dna-c -v5 t/regex-dna.pl
72 74
73 script/perlcc: Compiling t/regex-dna.pl 75 script/perlcc: Compiling t/regex-dna.pl
74 script/perlcc: Writing C on regex-dna-c.c 76 script/perlcc: Writing C on regex-dna-c.c
75 script/perlcc: Calling /usr/local/bin/perl5.14.2d-nt -Iblib/arch -Iblib/lib -MO=C,-O3,-Dsp,-v,-oregex-dna-c.c t/regex-dna.pl 77 script/perlcc: Calling /usr/local/bin/perl5.14.2d-nt -Iblib/arch -Iblib/lib -MO=C,-O3,-Dsp,-v,-oregex-dna-c.c t/regex-dna.pl
76 Starting compile 78 Starting compile
77 Walking tree 79 Walking tree
78 done main optree, walking symtable for extras 80 done main optree, walking symtable for extras
79 Prescan 0 packages for unused subs in main:: 81 Prescan 0 packages for unused subs in main::
80 %skip_package: B::Stackobj B::Section B::FAKEOP B::C B::C::Section::SUPER B::C::Flags 82 %skip_package: B::Stackobj B::Section B::FAKEOP B::C B::C::Section::SUPER B::C::Flags
81 B::Asmdata O DB B::CC Term::ReadLine B::Shadow B::C::Section B::Bblock B::Pseudoreg 83 B::Asmdata O DB B::CC Term::ReadLine B::Shadow B::C::Section B::Bblock B::Pseudoreg
82 B::C::InitSection B::C::InitSection::SUPER 84 B::C::InitSection B::C::InitSection::SUPER
83 descend_marked_unused: 85 descend_marked_unused:
84 ... 86 ...
85 %INC and @INC: 87 %INC and @INC:
86 Delete unsaved packages from %INC, so run-time require will pull them in: 88 Delete unsaved packages from %INC, so run-time require will pull them in:
87 Deleting IO::Handle from %INC 89 Deleting IO::Handle from %INC
88 Deleting XSLoader from %INC 90 Deleting XSLoader from %INC
89 Deleting B::C::Flags from %INC 91 Deleting B::C::Flags from %INC
90 Deleting B::Asmdata from %INC 92 Deleting B::Asmdata from %INC
91 Deleting Tie::Hash::NamedCapture from %INC 93 Deleting Tie::Hash::NamedCapture from %INC
92 Deleting B::C from %INC 94 Deleting B::C from %INC
93 Deleting SelectSaver from %INC 95 Deleting SelectSaver from %INC
94 Deleting IO::Seekable from %INC 96 Deleting IO::Seekable from %INC
95 Deleting base from %INC 97 Deleting base from %INC
96 Deleting Config from %INC 98 Deleting Config from %INC
97 Deleting B from %INC 99 Deleting B from %INC
98 Deleting Fcntl from %INC 100 Deleting Fcntl from %INC
99 Deleting IO from %INC 101 Deleting IO from %INC
100 Deleting Symbol from %INC 102 Deleting Symbol from %INC
101 Deleting O from %INC 103 Deleting O from %INC
102 Deleting Carp from %INC 104 Deleting Carp from %INC
103 Deleting mro from %INC 105 Deleting mro from %INC
104 Deleting File::Spec::Unix from %INC 106 Deleting File::Spec::Unix from %INC
105 Deleting FileHandle from %INC 107 Deleting FileHandle from %INC
106 Deleting Exporter::Heavy from %INC 108 Deleting Exporter::Heavy from %INC
107 Deleting strict from %INC 109 Deleting strict from %INC
108 Deleting Exporter from %INC 110 Deleting Exporter from %INC
109 Deleting vars from %INC 111 Deleting vars from %INC
110 Deleting Errno from %INC 112 Deleting Errno from %INC
111 Deleting File::Spec from %INC 113 Deleting File::Spec from %INC
112 Deleting IO::File from %INC 114 Deleting IO::File from %INC
113 Deleting DynaLoader from %INC 115 Deleting DynaLoader from %INC
114 %include_package: warnings warnings::register 116 %include_package: warnings warnings::register
115 %INC: warnings.pm warnings/register.pm 117 %INC: warnings.pm warnings/register.pm
116 amagic_generation = 1 118 amagic_generation = 1
117 Writing output 119 Writing output
118 Total number of OPs processed: 323 120 Total number of OPs processed: 323
119 NULLOP count: 8 121 NULLOP count: 8
120 122
121 %include_package contains: **warnings warnings::register**. These two cost a lot of time. 123 %include_package contains: **warnings warnings::register**. These two cost a lot of time.
122 Carp is also a nice example of code bloat for the static compiler. 124 Carp is also a nice example of code bloat for the static compiler.
123 125
124 Let's try without: 126 Let's try without:
125 127
126 $ perlcc -O3 -Uwarnings -Uwarnings::register -S -oregex-dna-c1 t/regex-dna.pl 128 $ perlcc -O3 -Uwarnings -Uwarnings::register -S -oregex-dna-c1 t/regex-dna.pl
127 $ wc regex-dna-c.c 129 $ wc regex-dna-c.c
128 2293 16084 128953 regex-dna-c.c 130 2293 16084 128953 regex-dna-c.c
129 $ wc regex-dna-c1.c 131 $ wc regex-dna-c1.c
130 1201 7488 57236 regex-dna-c1.c 132 1201 7488 57236 regex-dna-c1.c
131 133
132 128953 down to 57236 bytes. Double size with warnings. So lot of startup-time overhead. 134 128953 down to 57236 bytes. Double size with warnings. So lot of startup-time overhead.
133 135
134 $ perlcc -O -O2 -Uwarnings -Uwarnings::register -S -oregex-dna-cc1 t/regex-dna.pl 136 $ perlcc -O -O2 -Uwarnings -Uwarnings::register -S -oregex-dna-cc1 t/regex-dna.pl
135 137
136 $ time ./regex-dna-c1 <t/regexdna-input 138 $ time ./regex-dna-c1 <t/regexdna-input
137 ... 139 ...
138 real 0m**0.284s** 140 real 0m**0.284s**
139 user 0m0.271s 141 user 0m0.271s
140 sys 0m0.004s 142 sys 0m0.004s
141 143
142 $ time ./regex-dna-cc1 <t/regexdna-input 144 $ time ./regex-dna-cc1 <t/regexdna-input
143 ... 145 ...
144 real 0m**0.266s** 146 real 0m**0.266s**
145 user 0m0.255s 147 user 0m0.255s
146 sys 0m0.008s 148 sys 0m0.008s
147 149
148 Not much gain by stripping warnings, since the main part is run-time, startup-time is usually 150 Not much gain by stripping warnings, since the main part is run-time, startup-time is usually
149 0.010 (uncompiled) to 0.001 (compiled). 151 0.010 (uncompiled) to 0.001 (compiled).
150 152
151 Wait, what perl is perlcc calling at all? Hopefully the same as perl. Nope. As it turns out 153 Wait, what perl is perlcc calling at all? Hopefully the same as perl. Nope. As it turns out
152 perlcc was compiled debugging, and comparing debugging perls with non-debugging explains double run-time. You see it with -v in the output above /usr/local/bin/perl5.14.2d-nt, which is my naming [perlall-derived](http://search.cpan.org/dist/App-perlall/) convention for debugging non-threaded. 154 perlcc was compiled debugging, and comparing debugging perls with non-debugging explains double run-time. You see it with -v in the output above /usr/local/bin/perl5.14.2d-nt, which is my naming [perlall-derived](http://search.cpan.org/dist/App-perlall/) convention for debugging non-threaded.
153 155
154 Recompiling the compiler with normal perl, and re-testing: 156 Recompiling the compiler with normal perl, and re-testing:
155 157
156 $ perl -S perlcc -O3 -Uwarnings -Uwarnings::register -S -oregex-dna-c1 t/regex-dna.pl 158 $ perl -S perlcc -O3 -Uwarnings -Uwarnings::register -S -oregex-dna-c1 t/regex-dna.pl
157 $ perl -S perlcc -O -O2 -Uwarnings -Uwarnings::register -S -oregex-dna-cc1 t/regex-dna.pl 159 $ perl -S perlcc -O -O2 -Uwarnings -Uwarnings::register -S -oregex-dna-cc1 t/regex-dna.pl
158 160
159 $ time ./regex-dna-c1 <t/regexdna-input 161 $ time ./regex-dna-c1 <t/regexdna-input
160 ... 162 ...
161 real 0m0.127s 163 real 0m0.127s
162 user 0m0.124s 164 user 0m0.124s
163 sys 0m0.000s 165 sys 0m0.000s
164 166
165 $ time ./regex-dna-cc1 <t/regexdna-input 167 $ time ./regex-dna-cc1 <t/regexdna-input
166 ... 168 ...
167 real 0m0.121s 169 real 0m0.121s
168 user 0m0.120s 170 user 0m0.120s
169 sys 0m0.008s 171 sys 0m0.008s
170 172
171 0.130s vs 0.127s (compiled) vs 0.121s (optimizing compiled) makes now sense. But not much room to improve here, as the regex engine already has a pretty good DFA (not the fastest as re::Engine::RE2 would be faster) but is not optimizable by the optimizing compiler. 173 0.130s vs 0.127s (compiled) vs 0.121s (optimizing compiled) makes now sense. But not much room to improve here, as the regex engine already has a pretty good DFA (not the fastest as re::Engine::RE2 would be faster) but is not optimizable by the optimizing compiler.
172 174
173 Better optimize numbers. Tomorrow. I want to improve *stack smashing* in B::CC. 175 Better optimize numbers. Tomorrow. I want to improve *stack smashing* in B::CC.
174 Getting rid of copying intermediate C values from the C stack and back to the perl heap. 176 Getting rid of copying intermediate C values from the C stack and back to the perl heap.
175 177
176 See the [arithmetic part 2](http://blogs.perl.org/users/rurban/2012/10/optimizing-compiler-benchmarks-part-2.html) 178 See the [arithmetic part 2](http://blogs.perl.org/users/rurban/2012/10/optimizing-compiler-benchmarks-part-2.html)
Powered by Google Project Hosting