Decompiling the differences

I have created about the simplest test I could, the “SimpleLoop” programs.
[AS3 version](
[HaXe version](
The difference in performance is dramatic: 10- to 20-fold. But all is not as bad as it seems…

I have been looking at the *very* interesting tool, abcdump.exe,
[as described here](

You can really get a feel for the differences between the outputs. There is also a simple explanation for the difference in file size: haXe includes a small library in the SWF. Fair enough.

I will concentrate on the “Run2” test, in which both versions use “while” loops rather than “for” loops. HaXe’s iterator-style for loops are slower than its while loops; I’m hoping that this will not always be the case _\[Edit: actually the timings vary, and the two seem about the same\]_. (As a side note, I hope haXe will support *both* styles of for loop, since among other things it makes porting easier.)

A quick look at the decompiled Run2 function (two nested while loops) shows that AS3 uses the instruction “iflt” (branch if less than), which seems ideal for these loops. HaXe uses three instructions here: coerce\_a, lessthan, iffalse. I believe haXe could easily use this optimisation, especially given its `for (i in 1...1000)` style syntax. The same goes for the increment: AS3 uses a single “inclocal\_i”, where haXe uses four instructions: getlocal, increment, coerce\_a, setlocal. Again, some low-hanging fruit for haXe to pick up.
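
To make these two rewrites concrete, here is a toy peephole pass sketched in Python. The tuple format, the `peephole` name and the rules are my own illustration, not how the haXe compiler actually works. One wrinkle: haXe tests the loop condition at the top and branches *out* when it fails, so the fused form of lessthan/iffalse is “ifnlt” (branch if not less than). AS3 can use “iflt” because it puts the test at the bottom of the loop and branches back *into* the body. The inclocal\_i rewrite also assumes the local is known to hold an int.

```python
# Toy peephole pass over (opcode, operand) pairs, modelled on the abcdump
# listing in this post. Hypothetical sketch: not how the haXe compiler works.
from typing import List, Optional, Tuple

Instr = Tuple[str, Optional[object]]


def peephole(code: List[Instr]) -> List[Instr]:
    out: List[Instr] = []
    i = 0
    while i < len(code):
        ops = [op for op, _ in code[i:i + 4]]
        # getlocal n; increment; coerce_a; setlocal n  ->  inclocal_i n
        # ASSUMPTION: only valid when local n is known to hold an int.
        if (ops == ["getlocal", "increment", "coerce_a", "setlocal"]
                and code[i][1] == code[i + 3][1]):
            out.append(("inclocal_i", code[i][1]))
            i += 4
        # coerce_a; lessthan  ->  lessthan (the comparison accepts any value)
        elif ops[:2] == ["coerce_a", "lessthan"]:
            i += 1  # drop coerce_a; lessthan is handled on the next iteration
        # lessthan; iffalse L  ->  ifnlt L (both branch when a < b is not true)
        elif ops[:2] == ["lessthan", "iffalse"]:
            out.append(("ifnlt", code[i + 1][1]))
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out


# haXe inner loop from the listing (offsets 37-55), label omitted and
# abcdump's short forms such as getlocal3 written as ("getlocal", 3).
HAXE_INNER: List[Instr] = [
    ("getlocal", 3), ("pushint", 10000), ("coerce_a", None),
    ("lessthan", None), ("iffalse", "L6"),
    ("getlocal", 1), ("getlocal", 3), ("add", None), ("coerce_a", None),
    ("setlocal", 1),
    ("getlocal", 3), ("increment", None), ("coerce_a", None), ("setlocal", 3),
    ("jump", "L5"),
]

if __name__ == "__main__":
    optimised = peephole(HAXE_INNER)
    print(len(HAXE_INNER), "->", len(optimised))
    for instr in optimised:
        print(instr)
```

On the haXe inner loop this goes from 15 instructions to 10 (labels not counted), against 9 for the AS3 version. The remaining difference is the extra `jump` that comes from testing at the top of the loop.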

Another trick is using “pushshort” rather than “pushint” where the value is small enough. It also seems that haXe integer constants are followed by “coerce\_a”, whereas AS3 ones are not. Likewise, after the add, AS3 uses “convert\_i” where haXe uses “coerce\_a”. I’m not sure of the performance implications of this.
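
The choice between push instructions can be sketched as below. This rests on my reading of the AVM2 overview: “pushbyte” carries a signed 8-bit immediate, “pushshort” a signed 16-bit value stored as a variable-length u30, and “pushint” a u30 index into the int constant pool. `push_opcode` and `u30_size` are illustrative helpers, not part of abcdump or any compiler.

```python
# Sketch: pick the narrowest AVM2 push instruction for an int constant.
# ASSUMPTIONS (my reading of the AVM2 overview): pushbyte takes a signed
# 8-bit immediate, pushshort a signed 16-bit value encoded as a u30, and
# pushint a u30 index into the int constant pool.


def push_opcode(value: int) -> str:
    """Return the narrowest push instruction that can hold `value`."""
    if -128 <= value <= 127:
        return "pushbyte"
    if -32768 <= value <= 32767:
        return "pushshort"
    return "pushint"  # the value itself lives in the constant pool


def u30_size(value: int) -> int:
    """Bytes used by a non-negative u30: 7 bits per byte, high bit = 'more'."""
    if not 0 <= value < 2 ** 30:
        raise ValueError("not a u30")
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


if __name__ == "__main__":
    for constant in (0, 1000, 10000, 100000):
        print(constant, push_opcode(constant))
    # pushshort 10000 = 1 opcode byte + 2 operand bytes (offsets 41 -> 44)
    print("pushshort 10000 bytes:", 1 + u30_size(10000))
```

Judging from the offsets, haXe’s “pushint 1000” is only two bytes (opcode plus a one-byte pool index), so the gain from “pushshort” is presumably avoiding a constant-pool entry and lookup rather than saving code size.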

So, after some initial doubts, I now think haXe could get about a 10-fold increase in speed (in these very tight loops) pretty easily. HaXe (especially for Flash 9) is very new, and I’m confident these optimisations will come soon enough.

AS3:

```
function Run2():int	/* disp_id 0*/
// local_count=4 max_scope=1
// max_stack=2 code_len=60
0     getlocal0
1     pushscope
2     pushbyte      	0
4     setlocal1
5     pushbyte      	0
7     setlocal2
8     pushbyte      	0
10    setlocal3
11    pushbyte      	0
13    setlocal1
14    pushbyte      	0
16    setlocal2
17    jump          	L1

21    label
22    pushbyte      	0
24    setlocal1
25    pushbyte      	0
27    setlocal3
28    jump          	L3

32    label
33    getlocal1
34    getlocal3
35    add
36    convert_i
37    setlocal1
38    inclocal_i    	3

40    getlocal3
41    pushshort     	10000
44    iflt          	L4

48    inclocal_i    	2

50    getlocal2
51    pushshort     	1000
54    iflt          	L2

58    getlocal1
59    returnvalue
```

haXe:

```
function Run2():*	/* disp_id 0*/
// local_count=4 max_scope=1
// max_stack=2 code_len=70
0     getlocal0
1     pushscope
2     pushbyte      	0
4     coerce_a
5     setlocal1
6     pushbyte      	0
8     coerce_a
9     setlocal2
10    jump          	L1

14    label

15    getlocal2
16    pushint       	1000	// 0x3e8
18    coerce_a
19    lessthan
20    iffalse       	L3

24    pushbyte      	0
26    coerce_a
27    setlocal1
28    pushbyte      	0
30    coerce_a
31    setlocal3
32    jump          	L4

36    label

37    getlocal3
38    pushint       	10000	// 0x2710
40    coerce_a
41    lessthan
42    iffalse       	L6

46    getlocal1
47    getlocal3
48    add
49    coerce_a
50    setlocal1
51    getlocal3
52    increment
53    coerce_a
54    setlocal3
55    jump          	L5

59    getlocal2
60    increment
61    coerce_a
62    setlocal2
63    jump          	L2

67    getlocal1
68    returnvalue
69    returnvoid
```
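
Rather than counting by eye, a few lines of Python can tally one iteration of each inner loop from the listings above (AS3 offsets 32 to 44, haXe offsets 36 to 55). This is a sketch that assumes the “offset opcode [operand]” line format shown here. `opcodes` is an illustrative helper, and labels are skipped because they only mark branch targets.

```python
# Tally one iteration of each inner loop from abcdump-style text.
# ASSUMPTION: each instruction line starts with "offset opcode"; headers,
# comments and blank lines don't start with a number and are skipped.
# 'label' only marks a branch target, so it is left out of the count.
import re
from collections import Counter
from typing import List

LINE = re.compile(r"^\s*(\d+)\s+(\w+)")

# AS3 inner loop, offsets 32-44 of the listing.
AS3_LOOP_TEXT = """
32    label
33    getlocal1
34    getlocal3
35    add
36    convert_i
37    setlocal1
38    inclocal_i    3
40    getlocal3
41    pushshort     10000
44    iflt          L4
"""

# haXe inner loop, offsets 36-55 of the listing.
HAXE_LOOP_TEXT = """
36    label
37    getlocal3
38    pushint       10000
40    coerce_a
41    lessthan
42    iffalse       L6
46    getlocal1
47    getlocal3
48    add
49    coerce_a
50    setlocal1
51    getlocal3
52    increment
53    coerce_a
54    setlocal3
55    jump          L5
"""


def opcodes(listing: str) -> List[str]:
    """Return the opcodes in `listing`, skipping non-instruction lines and labels."""
    ops = []
    for line in listing.splitlines():
        match = LINE.match(line)
        if match and match.group(2) != "label":
            ops.append(match.group(2))
    return ops


if __name__ == "__main__":
    for name, text in (("AS3", AS3_LOOP_TEXT), ("haXe", HAXE_LOOP_TEXT)):
        ops = opcodes(text)
        print(name, len(ops), Counter(ops).most_common(3))
```

On these excerpts it gives 9 instructions per inner iteration for AS3 and 15 for haXe, three of them “coerce\_a”. That is well under a 2x difference in count, so presumably the cost of each instruction (haXe’s values are coerced to the untyped `*` here) also accounts for part of the 10- to 20-fold gap.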

2 Replies to “Decompiling the differences”

  1. Hey there!

    It has been quite a while since you wrote this post, but do you think that the performance of basic `for (i in 0...1000)` style loops has been optimised in Haxe 3?

    Or do integer iterators still get instantiated, and so inevitably get garbage collected? I am concerned that simple loops will lead to lots of extra GC in my game.

    Many thanks
