Due to the great work of Nicolas Cannasse, most of the results below have to be re-written! HaXe now has strong typing in flash9, significantly improving performance. I also have a new machine, so some of the results will not be directly comparable, but you will get the idea. I have also added a new test, inline-grid-while, that uses while loops instead of for loops.
With the new version of haXe comes some very interesting technology: hxasm. This allows you to use haXe syntax to write flash9 “bytecode”. This gives the possibility of decoupling the “per object” bit of the grid iteration from the looping bit by concatenating chunks of bytecode. In theory, you should be able to achieve optimal performance using this method, since you can write any bytecode you like. However, currently I can’t quite get the performance I think should be possible, because ultimately the function is called through a “dynamic” interface, rather than a strongly typed one.
Writing hxasm from scratch can be quite difficult. For starters, the flash api requires time to compile the code, so the api involves a callback to complete the compilation. Also, the haXe syntax is not that of a “proper” assembler, so jumps etc. take a bit of work. And sometimes it is a bit hard to know where to start. To help with this, I’ve written a tool that takes compiled hx code, via the output of “abcdump”, and converts it to hxasm. You can find this code in abctools.zip.
Examining the hxasm code, you can see the difference between the for and while loops. Interestingly, other “hand optimisations” did not seem to give much better results. I suspect the flash vm is doing some pretty good optimisation as it goes. So I think the way to optimise is probably to change the original hx code, rather than the hxasm code (eg, using while loops instead of for loops). Another optimisation I looked at was to “burn in” runtime values. So rather than using the op code to get a member variable, you can burn this variable in as a constant into the bytecode. I think this gave a small improvement, but I could not really tell. In fact, this last optimisation is really the only performance increment to be gained from runtime compilation; the rest could in theory be done in the production of the swf file. However, it does present a very interesting solution to the code decoupling!
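To make the for/while comparison concrete, here is a minimal sketch of the same traversal written both ways. The class name `LoopSketch`, the function names and the flat `Array<Int>` of cells are made up for illustration; they are not taken from src2.zip, and the sketch says nothing about what bytecode each loop compiles to.

```haxe
class LoopSketch {
    // Range-based for loop: haXe iterates over the interval 0...length.
    static function sumFor(cells:Array<Int>):Int {
        var total = 0;
        for (i in 0...cells.length) {
            total += cells[i];
        }
        return total;
    }

    // The same traversal as an explicit while loop.
    static function sumWhile(cells:Array<Int>):Int {
        var total = 0;
        var i = 0;
        var n = cells.length; // read the length once, before the loop
        while (i < n) {
            total += cells[i];
            i++;
        }
        return total;
    }

    static function main() {
        var cells = [1, 2, 3, 4];
        // Both calls visit the same cells and return the same sum.
        trace(sumFor(cells));
        trace(sumWhile(cells));
    }
}
```

The while version is a little more verbose, which matches the trade-off noted for the inline-grid-while test.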
The source code can be found in src2.zip. Unfortunately, this breaks the ability to compile for neko. Also, it requires a small mod to hxasm 1.03, using an additional offset of -4 on the “backwardJump” call in Context.hx.
|Method|Result|Advantages|Disadvantages|
|---|---|---|---|
|Object List|8.1|Easy to understand/debug.|Slowest. Causes stutter while garbage collection runs.|
|HaXe Iterator|10.1|Improved performance over Object List. Direct “drop in” replacement for Object List.|Slightly complex to write. Slightly slower than most.|
|While Iterator|7.1|Slightly faster than for-iterator. Slightly easier to write.|Slightly more complex to use.|
|Closure/Callback|13.9|Slightly faster than for-iterator. Decoupled. Interesting way of writing code.|Interesting way of writing code.|
|Member Callback|6.0|Faster than anonymous callback.|Member function name is explicit in code.|
|Inline GOB|6.4|Faster.|Couples GOB code to grid implementation. Requires separate code for each function.|
|Inline Grid – for|4.5|Fast. Easy to understand/debug. Not as badly coupled as Inline GOB.|Couples Grid code to GOB implementation. Requires separate code for each function.|
|Inline Grid – while|4.0|Fastest. Same as “for” loop, but slightly faster, and slightly more verbose.|Couples Grid code to GOB implementation. Requires separate code for each function.|
|HxASM inline code|5.1|Fast and decoupled.|Requires writing “raw” hxasm callback. 2-phase setup.|
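For readers who have not seen the earlier version of these tests, the coupling trade-off in the table can be sketched roughly as follows. This is my own illustration, not the code from src2.zip: the `Gob` and `Grid` classes, their fields and the method names are all hypothetical, and the mapping of each method to a table row is only my reading of the row names.

```haxe
// Hypothetical per-object class; only the x field matters for the sketch.
class Gob {
    public var x:Float;
    public function new(x:Float) { this.x = x; }
    public function move():Void { x += 1; }
}

class Grid {
    var gobs:Array<Gob>;
    public function new(gobs:Array<Gob>) { this.gobs = gobs; }

    // Closure/Callback style: the grid only loops; the caller supplies
    // the per-object work, so the two pieces of code stay decoupled.
    public function visit(f:Gob->Void):Void {
        var i = 0;
        var n = gobs.length;
        while (i < n) { f(gobs[i]); i++; }
    }

    // Member Callback style: the member function name is written
    // explicitly into the loop.
    public function moveAllMember():Void {
        var i = 0;
        var n = gobs.length;
        while (i < n) { gobs[i].move(); i++; }
    }

    // Inline Grid style: the per-object work is copied into the loop,
    // so the grid must know about Gob internals, and every operation
    // needs its own copy of the loop.
    public function moveAllInline():Void {
        var i = 0;
        var n = gobs.length;
        while (i < n) { gobs[i].x += 1; i++; }
    }
}

class CouplingSketch {
    static function main() {
        var grid = new Grid([new Gob(0.0), new Gob(5.0)]);
        grid.visit(function(g) { g.move(); }); // anonymous callback
        grid.moveAllMember();
        grid.moveAllInline();
    }
}
```

All three calls do the same work per object; they differ only in how much the looping code has to know about the object code.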
Out of all this, the conclusion is pretty similar to before: the tighter coupling creates faster code. But all the code is faster now, which is great. The inline hxasm is very interesting and, while probably not appropriate for this application, shows some promise for certain applications.