I noticed that Google has a site that compares a few languages using a graph traversal benchmark: http://code.google.com/p/multi-language-bench/
So I thought it would be interesting to see how haxe/hxcpp fares. I started with a direct (i.e., almost line-for-line) translation of their cpp implementation. I changed the lists to arrays where appropriate, and implemented one “set” as an array because haxe does not have a native “dictionary” object that can be keyed by generic objects.
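To give a rough idea of what “a set as an array” means, here is a minimal sketch. It is written in C++ rather than haxe and is not the code from the port; `Node` and `ArraySet` are made-up names for illustration. Membership is decided by object identity using a linear scan, which is the trade-off you accept when there is no dictionary keyed by objects.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graph node; not taken from the benchmark sources.
struct Node {
    int id;
};

// A "set" of object pointers backed by a plain array.
// Membership is by identity (pointer equality), found with a linear scan,
// so add() and contains() are O(n) rather than O(1) or O(log n).
class ArraySet {
public:
    // Adds the node if it is absent; returns true when it was inserted.
    bool add(Node* node) {
        if (contains(node)) return false;
        items_.push_back(node);
        return true;
    }

    // Returns true if this exact object is already stored.
    bool contains(const Node* node) const {
        return std::find(items_.begin(), items_.end(), node) != items_.end();
    }

    std::size_t size() const { return items_.size(); }

private:
    std::vector<Node*> items_;
};
```

For small sets the linear scan is usually acceptable; for large ones it becomes a cost that a hash-based container would avoid.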
The supplied cpp target was pretty easy to compile on windows by ignoring the makefile and using:
cl -O2 LoopTesterApp.cc mao-loops.cc /EHsc
The haxe target uses LoopTesterApp.hx and MaoLoops.hx, and compiles with:
haxe -main LoopTesterApp -cpp cpp
The java_pro code can be built and run as follows (note that an increased stack size is required when running):
$JAVA_HOME/bin/javac LoopTesterApp.java
$JAVA_HOME/bin/jar -cvf LoopTesterApp.jar `find . -name \*.class`
java -Xss15500k LoopTesterApp
The java target requires additional stack space, and so does neko. However, neko has no option to increase the stack space like java does, so neko just panics. I have not tried as3; I guess there will be a script timeout that would need to be managed. A JS or v8 time would also be very interesting.
The runtimes are:
Target | Runtime |
---|---|
cpp | 19.2 seconds |
cpp (claimed) | 6 seconds ? |
hxcpp | 26.7 seconds |
java (pro) | 22.5 seconds |
The Google paper cites a 3x speedup with optimised C++ code, but the code for this is not provided.
The java implementation is the hand-optimized version. In hindsight, I should probably have ported this version, rather than the vanilla cpp version. It includes optimizations such as object pooling and task-specific containers. Also, this particular benchmark is designed to allow java to comfortably perform its JIT compiling.
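For readers who have not met object pooling, the idea can be sketched as below. This is only an illustration in C++ under my own assumptions, not the pooling code from the java version; `Scratch` and `ScratchPool` are hypothetical names. Objects are handed back to the pool instead of being freed, so later requests reuse them rather than allocating again.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical reusable work object; not taken from the benchmark sources.
struct Scratch {
    std::vector<int> data;
};

// A very small object pool: released objects are kept and handed out again.
class ScratchPool {
public:
    // Returns a recycled object when one is idle, otherwise allocates a new one.
    std::unique_ptr<Scratch> acquire() {
        if (free_.empty()) return std::unique_ptr<Scratch>(new Scratch());
        std::unique_ptr<Scratch> s = std::move(free_.back());
        free_.pop_back();
        return s;
    }

    // Resets the object and stores it for reuse instead of destroying it.
    void release(std::unique_ptr<Scratch> s) {
        s->data.clear();  // clear() drops the elements but keeps the capacity
        free_.push_back(std::move(s));
    }

    // Number of objects currently waiting to be reused.
    std::size_t idle() const { return free_.size(); }

private:
    std::vector<std::unique_ptr<Scratch>> free_;
};
```

The gain comes from skipping repeated allocation and from reusing already-grown buffers; whether it pays off depends on how allocation-heavy the inner loops are.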
So in conclusion, I would say I’m pretty happy with these results. I’m sure there are some micro-optimizations, such as the “unsafe get” on arrays, which does not check the bounds, and some higher-level stuff that profiling may reveal, that could shave quite a few percent from the hxcpp time. I’m not particularly interested in optimising for a particular benchmark; however, a little bit of profiling here may help speed up the target as a whole, which would be a very good thing.
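The difference between a checked and an unchecked array read can be shown with a small C++ analogue. This is not the hxcpp API, just the same idea using `std::vector`, where `at()` checks the index and `operator[]` does not; the function names are made up for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Bounds-checked access: at() throws std::out_of_range for a bad index,
// so every read pays for a range test.
long long sumChecked(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values.at(i);
    }
    return total;
}

// Unchecked access: operator[] performs no range test. This is only safe
// because the loop condition already guarantees i < values.size().
long long sumUnchecked(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}
```

Both functions return the same result for valid input; the unchecked form simply relies on the surrounding code to keep the index in range, which is why it suits tight loops where the bounds are already known.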