Switched to IMMIX for Internal Garbage Collection

I did some profiling on the iPhone and found that too much time was being spent on garbage collection.
The hxcpp runtime has 2 modes: “Boehm GC with explicit statics” and “internal”. The former is based on a standard, robust code base, while the latter uses built-in code with explicit marking. I added the second mode because Boehm GC was just too slow on the iPhone. I’m not sure why, because it performs well on the other platforms (maybe I missed a configuration option).

The internal GC has some restrictions that make it mainly suitable for games. First, collection must be triggered explicitly, because the stack is not scanned; this is most easily done once per frame. Second, it is not thread safe, although this can be worked around. Within these confines, many different schemes can be tried.
My first attempt could probably be termed “Naive Mark and Sweep”, and used free lists. On Windows/Mac this underperformed Boehm GC, but on the iPhone it worked better.
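The frame-boundary restriction can be illustrated with a minimal sketch. The class and method names (FrameGC, noteAllocation, endOfFrame) are hypothetical, not hxcpp’s API. The assumption is that allocation only requests a collection, and the collection itself runs between frames, when no object is referenced solely from an unscanned C++ local.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical sketch: allocation never collects directly, because locals on
// the C++ stack are not scanned. It only raises a flag; the collection runs at
// a known safe point (end of frame) where every live object is reachable from
// explicitly registered roots.
class FrameGC {
public:
    explicit FrameGC(std::size_t threshold) : threshold_(threshold) {}

    // Called by the allocator: records bytes and requests (but never runs) a collection.
    void noteAllocation(std::size_t bytes) {
        allocatedSinceCollect_ += bytes;
        if (allocatedSinceCollect_ >= threshold_) collectRequested_ = true;
    }

    // Called once per frame, after game code has returned, so no heap pointer
    // lives only on the native stack. Returns true if a collection ran.
    bool endOfFrame(const std::function<void()>& collect) {
        if (!collectRequested_) return false;
        collect();  // mark from explicit roots, then sweep
        allocatedSinceCollect_ = 0;
        collectRequested_ = false;
        ++collections_;
        return true;
    }

    int collections() const { return collections_; }
    bool collectRequested() const { return collectRequested_; }

private:
    std::size_t threshold_;
    std::size_t allocatedSinceCollect_ = 0;
    bool collectRequested_ = false;
    int collections_ = 0;
};
```

In a game loop, the allocator would call noteAllocation on each allocation and the main loop would call endOfFrame after update and render have returned. A threshold of 0 requests a collection whenever anything has been allocated that frame, which approximates the “Collect Every Frame” behaviour.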
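For comparison, a naive precise mark-and-sweep over a free list might look like the sketch below. It is a hypothetical illustration, not the original code. Cells are assumed to be fixed-size with two pointer fields, roots are registered explicitly (matching the no-stack-scanning restriction), and collect() is assumed to be called only at a safe point such as the end of a frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size cell: two precise pointer fields plus GC bookkeeping.
struct Cell {
    Cell* children[2] = {nullptr, nullptr};
    Cell* nextFree = nullptr;  // free-list link while the cell is unallocated
    bool marked = false;
    bool inUse = false;
};

class NaiveHeap {
public:
    explicit NaiveHeap(std::size_t cells) : cells_(cells) {
        for (std::size_t i = 0; i < cells; ++i) pushFree(&cells_[i]);
    }

    // Pop a cell off the free list; returns nullptr when the heap is exhausted.
    Cell* allocate() {
        Cell* c = freeList_;
        if (!c) return nullptr;
        freeList_ = c->nextFree;
        c->nextFree = nullptr;
        c->children[0] = c->children[1] = nullptr;
        c->inUse = true;
        return c;
    }

    void addRoot(Cell* c) { roots_.push_back(c); }

    // Mark everything reachable from the explicit roots, then sweep every
    // unmarked in-use cell back onto the free list. Returns the number freed.
    std::size_t collect() {
        std::vector<Cell*> work(roots_.begin(), roots_.end());
        while (!work.empty()) {
            Cell* c = work.back();
            work.pop_back();
            if (!c || c->marked) continue;
            c->marked = true;
            for (Cell* child : c->children) work.push_back(child);
        }
        std::size_t freed = 0;
        for (Cell& c : cells_) {
            if (c.inUse && !c.marked) {
                c.inUse = false;
                pushFree(&c);
                ++freed;
            }
            c.marked = false;  // reset for the next cycle
        }
        return freed;
    }

private:
    void pushFree(Cell* c) { c->nextFree = freeList_; freeList_ = c; }

    std::vector<Cell> cells_;  // never resized, so cell addresses stay stable
    std::vector<Cell*> roots_;
    Cell* freeList_ = nullptr;
};
```

Note that the sweep visits every cell in the heap, not just the live ones, and allocation hands out cells in whatever order they were freed.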

The current scheme is now “Simplified IMMIX”. It is simplified because it is single threaded, and I have not implemented overflow allocation, defragmentation (although there are hooks in there for moving) or any generational stuff.
I think overflow allocation should be easy enough, and defrag should not be too hard in some form or other. Inserting write barriers for generational collection may also be straightforward using “operator =”. I may also change the code generation to separate stack variables (locals, function args) from member variables, since in the current scheme stack variables never form roots and therefore would not need write barriers.
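The line-and-block idea behind IMMIX can be sketched as follows. This is a minimal, hypothetical sketch, not the hxcpp code. It assumes 128-byte lines and 32 KB blocks (the sizes used in the original IMMIX paper; hxcpp’s constants may differ), a single block, and, like the simplified scheme above, no overflow allocation or defragmentation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLineSize = 128;
constexpr std::size_t kLinesPerBlock = 256;  // 256 * 128 bytes = 32 KB

// Marking records which lines hold live data; allocation bump-allocates
// through "holes" of contiguous unmarked lines.
class ImmixBlock {
public:
    // Bump-allocate `bytes` from the current hole, moving on to the next hole
    // if it does not fit. Returns an offset into the block, or -1 if no hole fits.
    long allocate(std::size_t bytes) {
        bytes = (bytes + 7) & ~std::size_t(7);  // 8-byte alignment
        while (true) {
            if (cursor_ + bytes <= limit_) {
                std::size_t at = cursor_;
                cursor_ += bytes;
                return static_cast<long>(at);
            }
            if (!nextHole()) return -1;
        }
    }

    // Called by the marker for each live object: mark every line it touches.
    void markObject(std::size_t offset, std::size_t bytes) {
        if (bytes == 0) return;
        assert(offset + bytes <= kLinesPerBlock * kLineSize);
        for (std::size_t l = offset / kLineSize; l <= (offset + bytes - 1) / kLineSize; ++l)
            lineMarks_[l] = true;
    }

    // Clear line marks before the next mark phase.
    void clearMarks() { lineMarks_.fill(false); }

    // After marking, restart allocation from the beginning of the block.
    void resetCursor() { cursor_ = limit_ = 0; line_ = 0; }

private:
    // Advance to the next run of unmarked lines; returns false at end of block.
    bool nextHole() {
        while (line_ < kLinesPerBlock && lineMarks_[line_]) ++line_;
        if (line_ >= kLinesPerBlock) return false;
        std::size_t start = line_;
        while (line_ < kLinesPerBlock && !lineMarks_[line_]) ++line_;
        cursor_ = start * kLineSize;
        limit_ = line_ * kLineSize;
        return true;
    }

    std::array<bool, kLinesPerBlock> lineMarks_{};
    std::size_t cursor_ = 0, limit_ = 0, line_ = 0;
};
```

For example, after allocating 100 and 200 bytes (offsets 0 and 104) and marking only the second object, lines 0 to 2 stay marked and the next allocation starts at offset 384. The first object’s dead bytes in line 0 are not reclaimed because they share a line with live data: IMMIX accepts that line-granularity waste in exchange for cheap bump allocation and cheap sweeping.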
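One way a write barrier could hang off “operator =” is card marking. The sketch below is an assumption, not hxcpp’s design: MemberPtr, CardTable, cards() and the 512-byte card size are hypothetical. Member fields use MemberPtr, whose assignment dirties the card containing the field, so a minor collection would only need to rescan dirty cards of the old generation for old-to-young pointers. Stack variables would stay plain pointers with no barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

constexpr std::size_t kCardSize = 512;  // hypothetical card granularity

// One dirty flag per kCardSize bytes of the (old-generation) heap.
struct CardTable {
    std::uintptr_t heapBase = 0;
    std::vector<bool> dirty;

    void init(const void* base, std::size_t heapBytes) {
        heapBase = reinterpret_cast<std::uintptr_t>(base);
        dirty.assign((heapBytes + kCardSize - 1) / kCardSize, false);
    }

    // Dirty the card containing `slot`; addresses outside the heap are ignored.
    void mark(const void* slot) {
        std::uintptr_t a = reinterpret_cast<std::uintptr_t>(slot);
        if (a < heapBase) return;
        std::size_t card = (a - heapBase) / kCardSize;
        if (card < dirty.size()) dirty[card] = true;
    }
};

CardTable& cards() { static CardTable table; return table; }

// Pointer type for member fields: every store goes through the barrier.
template <typename T>
class MemberPtr {
public:
    MemberPtr() = default;
    MemberPtr(const MemberPtr&) = default;

    MemberPtr& operator=(T* value) {
        ptr_ = value;
        cards().mark(this);  // `this` is the address of the field being written
        return *this;
    }
    MemberPtr& operator=(const MemberPtr& other) { return *this = other.ptr_; }

    T* get() const { return ptr_; }

private:
    T* ptr_ = nullptr;
};
```

If a stack variable used MemberPtr, mark() would discard its address because it falls outside the tracked heap, but it would still pay for the call; separating stack and member variables in the generated code would avoid that. A real barrier would typically also skip the store when the new value is not a young object.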

Anyhow, on the “Physaxe” test, which creates lots of small list objects per frame, the Naive GC got about 51fps, Boehm GC got about 65fps and IMMIX got about 69fps, so a bit of a win there. For this test, I triggered all collections exactly once per frame. The difference between Naive and IMMIX is significant, and this performance gain also translates to the iPhone, which is good news.
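Because fps is a reciprocal measure, converting the figures above to time per frame makes the differences easier to compare. This is only arithmetic on the reported numbers, and it measures the whole frame, not just the collector:

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{\text{fps}}, \qquad
t_{\text{Naive}} = \frac{1000}{51} \approx 19.6\ \text{ms}, \quad
t_{\text{Boehm}} = \frac{1000}{65} \approx 15.4\ \text{ms}, \quad
t_{\text{IMMIX}} = \frac{1000}{69} \approx 14.5\ \text{ms}
```

So IMMIX saves roughly 5.1 ms per frame over the naive collector and roughly 0.9 ms over Boehm GC on this test.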

Since the internal scheme is precise, I feel it should be able to outperform Boehm GC by a bit more, and maybe the extra could come from a generational system. The code is actually not that complex (1 cpp file, 1 header file) so any budding GC researchers may want to see what they can do.

Currently, the internal GC is the default only for the iPhone, but you can try it on other platforms by changing the #define in hxGCInternal.h. The reason it is not the default elsewhere is the restrictions mentioned above: the easiest way to conform to them is to enable “Collect Every Frame” in neash.Lib. To remove these restrictions, I will need some way of stopping the world (safe points?) and some way of capturing the stack (code modifications to allow objects to push themselves onto a shadow stack?), both of which are very doable, although I’m not sure of the effect on performance.
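The shadow-stack idea can be sketched as below. This is a hypothetical illustration of the general technique, not a planned hxcpp API: ShadowStack, ShadowFrame and track() are invented names. Generated code would register the address of each local that holds a GC pointer, and the collector would read those slots as roots, so collection would no longer have to wait for a frame boundary. Stopping other threads at safe points is a separate problem and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Object { bool marked = false; };

// Explicit stack of root slots, maintained by (hypothetical) generated code.
class ShadowStack {
public:
    void push(Object** slot) { slots_.push_back(slot); }
    void popTo(std::size_t depth) { slots_.resize(depth); }
    std::size_t depth() const { return slots_.size(); }

    // Root enumeration: each slot is read at collection time, so the current
    // value of the local is visited, not the value it had when registered.
    template <typename F>
    void forEachRoot(F visit) const {
        for (Object** slot : slots_)
            if (*slot) visit(*slot);
    }

private:
    std::vector<Object**> slots_;
};

ShadowStack& shadowStack() { static ShadowStack s; return s; }

// RAII frame: remembers the depth on entry and unwinds to it on scope exit,
// including when an exception propagates.
class ShadowFrame {
public:
    ShadowFrame() : depth_(shadowStack().depth()) {}
    ~ShadowFrame() { shadowStack().popTo(depth_); }
    ShadowFrame(const ShadowFrame&) = delete;
    ShadowFrame& operator=(const ShadowFrame&) = delete;

    void track(Object*& local) { shadowStack().push(&local); }

private:
    std::size_t depth_;
};
```

Storing the slot address rather than the pointer value means a local reassigned after registration is still seen correctly. Each tracked local costs a push, and each frame costs a resize on exit, which is where the performance cost in question would come from.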

11 Replies to “Switched to IMMIX for Internal Garbage Collection”

  1. Thanks for your continuing work on this… I’m doing iPhone development using Cocos2d for iPhone, an Objective-C game library, and really like it, but I’d love to be able to target more than one platform as Haxe does.

    I built and installed some sample apps using the process you detailed in a previous post, but have only gotten 3-4 fps so far, which I assume is because it’s still using SDL.

    Are you working on OpenGL support? And if so, do you have a guesstimate as to when it’ll be available?

    The reason I ask is you mention you were getting some great frame rates (51-69fps) for the Physaxe test… Was this only in the simulator?

  2. I just realized that for the Neash example, “neash.Lib.mOpenGL = true;” is required to enable OpenGL support… Makes sense of course, but I started trying more samples beyond the dead simple one, using code from existing sample projects, which didn’t have this enabled by default.

    So OpenGL support appears to be there for Neash at least… Is it available in some way for the NME-only examples? I tried searching SVN for a public way of setting this type of property, but didn’t see one…

    1. Hi Brad,
      The OpenGL flag is in the constructor of the NME Manager class, so you should be able to set it there.
      Some of the later NME samples allow you to specify “-o” on the command line for OpenGL.
      OpenGL should really always be on in the NME ndll for iPhone; I will probably fix this at some stage.

      I can get about 20-30 fps for the physaxe example, but there are some inefficiencies I’m trying to remove.
      I get up to 60fps for a simple bitmap-based game. Currently text is overly expensive to render, so turn the stats panel off in the physaxe example.
      The other real killer is writing a “copyPixels”-based engine, since these operations are not OpenGL accelerated and the bitmap must be uploaded as a texture, which is very expensive.

      Hugh

  3. Thanks for all the info… I’ve compiled the physaxe sample and have only gotten 15 fps out of it, but I used the physaxe source from an old hxcpp release of yours, and I’m sure you’ve made some optimizations to the physaxe code yourself to get better frame rates…

    I’m building a physics-based platformer that’s about 70% working in Objective C right now. But if I could switch to Haxe and feel confident that a similar iPhone target would be viable fairly soon, I’d drop that and focus my efforts on building a Haxe platformer instead, so I wouldn’t be locked into one platform/vendor, which of course is a huge thing given the fickleness of the App Store.

    What’s your feeling on this? Should I just finish my game in Objective C, or do you think the optimizations you’re doing now will come through enough that a physics-based platformer (with sprites, of course) would be viable under Haxe on the iPhone anytime soon?

    Thanks again! No matter what I’m going to be using Haxe… the only real consideration is whether or not it’s for this project!

  4. One more question 😉

    Hxcpp seems to be able to compile the Flash9 haxe library. Can hxcpp compile the Flash haxe library too (ie Flash 6-8) ?

    I’ve tried adding the “-swf-version 8” flag to the haxe command line, but it doesn’t seem to respect it… It still seems to build against the Flash 9 library…

    Thanks!

  5. Hi Brad,
    Yes, hxcpp uses NME, which implements the flash9 drawing api. There is no reason why someone could not write a flash 8 api, but it has not been done.

    Hugh

  6. Hello
    I’m using the latest OpenFL (but this problem also happens with NME).
    My game suffers from periodic small stalls on mobile devices. Most of the time it runs very smoothly, but because it has fast background scrolling the stalls are very noticeable. I have already tried to pool everything, but it’s not a memory problem: our game runs fine on older devices like the 3GS, so I suspect the garbage collector.
    Checking GCinternal.cpp, I can’t find the implementation of the Boehm GC; maybe I’m missing something here. I wanted to test the incremental/generational option, which in theory could address my problem even if it costs a bit of FPS.
    Can you help me with this issue?
    Laurens

    1. Hi
      Hxcpp has not used Boehm GC for some time now, and there is no incremental option. Is the problem on Android? It could be to do with the Java GC rather than the C++ GC.

  7. Hi Hugh, many thanks for your time.
    The problem happens on iOS and Android devices; on lower-spec Androids it’s a bit more noticeable, but not by much.

    There is some curious evidence pointing to the hxcpp garbage collector:
    We started porting a “big Flash game” to Haxe. When we began fleshing out our levels with all the game entities, we noticed big stalls on mobile devices. We pooled everything we could and the stalls were reduced, but they still happen.

    Now, most of the time the game ran fast; even on very old devices like the 3GS it maintained more than 30 FPS. The curious thing is that the stalls were more noticeable on higher-end devices running at 60 FPS (without pooling). I suspect this was due to more calls to the game update, and therefore more temporary objects being created and then needing to be discarded.

    Is there any version available with the Boehm GC?

    1. The latest version of hxcpp has some multi-threaded collection. This is not concurrent with execution, but should improve the collection time on multi-core devices. One thing with object pooling is that it might improve performance on average (decrease the GC stall frequency), but it might also increase the GC stall time, since this depends on the active number of objects.
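The pooling trade-off in the reply above can be put as a rough model. This is a simplification, assuming collections are triggered after a fixed allocation budget of B bytes and that each pause is dominated by marking, which costs roughly c per live object. The symbols B, r, c and N_live are introduced here for illustration only:

```latex
\text{frames between collections} \approx \frac{B}{r},
\qquad
\text{pause} \approx c \cdot N_{\text{live}}
```

Here r is the number of bytes allocated per frame and N_live is the number of reachable objects. Pooling lowers r, so collections happen less often, but pooled objects stay reachable and raise N_live, so each pause can get longer.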
