Switched to IMMIX for Internal Garbage Collection

I did a little bit of profiling on the iPhone and found a bit too much time was spent doing garbage collection.
The hxcpp runtime has 2 modes – “Boehm GC with explicit statics” and “internal”. The former comes from a standard and robust code base, while the latter uses built-in code with explicit marking. I added the second mode because Boehm GC was just too slow on the iPhone – not sure why, because it is pretty good on the other platforms (maybe I missed a configuration option).

The internal GC has some restrictions that make it mainly suitable for games. First, because no stack searching is done, the collection must be triggered explicitly, and this is most easily done once per frame. Second, it is not thread safe, although this can be worked around. Within these confines, many different schemes can be tried.
My first attempt could probably be termed “Naive Mark and Sweep”, and used free lists. On Windows/Mac this underperformed Boehm GC, but on the iPhone it worked better.
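
As a rough illustration of what a naive mark and sweep collector with a free list and explicit roots can look like, here is a minimal C++ sketch. It is not the hxcpp code: `Obj`, `NaiveHeap` and all of their members are hypothetical names, and the fixed-size pool is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object layout: a mark bit plus outgoing references.
struct Obj {
    bool marked = false;
    bool in_use = false;
    std::vector<Obj*> refs;   // fields that point at other objects
};

class NaiveHeap {
public:
    // The pool is sized once and never resized, so Obj addresses stay stable.
    explicit NaiveHeap(std::size_t capacity) : pool_(capacity) {
        for (Obj& o : pool_) free_list_.push_back(&o);
    }

    // Pop a slot off the free list; returns nullptr when the heap is full.
    Obj* alloc() {
        if (free_list_.empty()) return nullptr;
        Obj* o = free_list_.back();
        free_list_.pop_back();
        o->in_use = true;
        o->marked = false;
        o->refs.clear();
        return o;
    }

    // Explicit collection: the caller supplies the roots, because nothing
    // here scans the stack.
    void collect(const std::vector<Obj*>& roots) {
        std::vector<Obj*> work(roots.begin(), roots.end());
        while (!work.empty()) {                 // mark phase
            Obj* o = work.back();
            work.pop_back();
            if (o == nullptr || o->marked) continue;
            o->marked = true;
            for (Obj* child : o->refs) work.push_back(child);
        }
        for (Obj& o : pool_) {                  // sweep phase
            if (o.in_use && !o.marked) {
                o.in_use = false;
                o.refs.clear();
                free_list_.push_back(&o);       // slot becomes reusable
            }
            o.marked = false;                   // reset for the next cycle
        }
    }

    std::size_t free_count() const { return free_list_.size(); }

private:
    std::vector<Obj> pool_;
    std::vector<Obj*> free_list_;
};
```

A game loop would call `collect` once per frame with its known roots, which matches the "trigger explicitly" restriction described above.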

The current scheme is now “Simplified IMMIX”. It is simplified because it is single threaded, and I have not implemented overflow allocation, defragmentation (although there are hooks in there for moving) or any generational stuff.
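
For readers unfamiliar with the general IMMIX idea, memory is divided into blocks, each block into lines, and new objects are bump-allocated into runs of lines that the last mark phase left unmarked. The C++ sketch below shows only that allocation step. It is an illustration under assumptions, not the hxcpp implementation: the tiny sizes, `Block`, `BumpAllocator` and the offset-returning `alloc` are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Tiny, illustrative sizes (assumptions, chosen so the example is easy to follow).
constexpr std::size_t kLineSize = 16;
constexpr std::size_t kLinesPerBlock = 8;

// One block: the mark phase sets line_live for every line holding a live object.
struct Block {
    std::array<bool, kLinesPerBlock> line_live{};
};

class BumpAllocator {
public:
    explicit BumpAllocator(Block& b) : block_(b) {}

    // Returns the byte offset of the allocation inside the block, or -1 if no
    // remaining hole is big enough. size is assumed to be non-zero.
    long alloc(std::size_t size) {
        while (true) {
            if (cursor_ + size <= limit_) {     // fits in the current hole
                std::size_t at = cursor_;
                cursor_ += size;                // bump the pointer
                return static_cast<long>(at);
            }
            if (!next_hole()) return -1;        // block exhausted
        }
    }

private:
    // Advance to the next run of unmarked lines after the current hole.
    bool next_hole() {
        std::size_t line = scan_line_;
        while (line < kLinesPerBlock && block_.line_live[line]) ++line;
        if (line == kLinesPerBlock) { scan_line_ = line; return false; }
        std::size_t end = line;
        while (end < kLinesPerBlock && !block_.line_live[end]) ++end;
        cursor_ = line * kLineSize;
        limit_ = end * kLineSize;
        scan_line_ = end;
        return true;
    }

    Block& block_;
    std::size_t cursor_ = 0, limit_ = 0, scan_line_ = 0;
};
```

Note that an object too large for the current hole simply skips ahead to the next one in this sketch; dealing with that waste more cleverly is the kind of thing overflow allocation is meant to address.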
I think overflow allocation should be easy enough, and defrag should not be too hard in some form or other. The insertion of write barriers for generational control may also be straightforward using the “operator =”. I may also change the code generation to separate stack variables (locals, function args) from member variables since, in the current scheme, stack variables never form roots and therefore would not need to use write barriers.
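
To make the “operator =” idea concrete, here is a minimal C++ sketch of a member-field reference type whose assignment operator runs a write barrier, next to a stack reference type that does not. Everything in it is an assumption for illustration: `Object`, `MemberRef`, `LocalRef`, the `old_generation` flag and the remembered set are hypothetical and do not come from hxcpp.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object header: just a generation flag.
struct Object {
    bool old_generation = false;
};

// Hypothetical remembered set: old objects that now point at young objects.
inline std::vector<Object*>& remembered_set() {
    static std::vector<Object*> set;
    return set;
}

// Reference stored in a member field: every store goes through the barrier.
template <typename T>
class MemberRef {
public:
    explicit MemberRef(Object* owner) : owner_(owner) {}

    MemberRef& operator=(T* value) {
        // Write barrier: record old-to-young pointers so that a young
        // collection can treat the owner as an extra root.
        if (owner_->old_generation && value != nullptr && !value->old_generation)
            remembered_set().push_back(owner_);
        ptr_ = value;
        return *this;
    }

    T* get() const { return ptr_; }

private:
    Object* owner_;
    T* ptr_ = nullptr;
};

// Reference held in a stack variable: a plain store with no barrier.
template <typename T>
class LocalRef {
public:
    LocalRef& operator=(T* value) { ptr_ = value; return *this; }
    T* get() const { return ptr_; }

private:
    T* ptr_ = nullptr;
};
```

Splitting the two types is what would let generated code skip the barrier cost for locals and function arguments.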

Anyhow, on the “Physaxe” test, which creates lots of small list objects per frame, the Naive GC got about 51fps, Boehm GC got about 65fps and IMMIX got about 69fps – so a bit of a win there. For this test, I triggered all collections exactly once per frame. The difference between Naive and IMMIX is significant, and this performance gain also translates to the iPhone, which is good news.

Since the internal scheme is precise, I feel it should be able to outperform Boehm GC by a bit more, and maybe the extra could come from a generational system. The code is actually not that complex (1 cpp file, 1 header file) so any budding GC researchers may want to see what they can do.

Currently, the internal GC is default only for the iPhone, but you can try it on other platforms by changing the #define in hxGCInternal.h. The reason for this is the restrictions mentioned above – the easiest way to conform to these restrictions is to enable the “Collect Every Frame” option in neash.Lib. To remove these restrictions, I will need to find some way of stopping the world (safe points?) and some way of capturing the stack (code mods to allow objects to push themselves on a shadow stack?), both of which are very doable, although I’m not sure of the effect on performance.
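
The shadow stack idea can be sketched in C++ with a small RAII handle that registers each local reference on construction and removes it on destruction, so the collector can find stack roots without scanning the real stack. This is only a sketch under assumptions: `Object`, `Local`, `shadow_stack` and `mark_from_shadow_stack` are hypothetical names, and the marking here only touches the directly referenced objects.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object header: just a mark bit.
struct Object {
    bool marked = false;
};

// The shadow stack holds the addresses of live local reference slots.
inline std::vector<Object**>& shadow_stack() {
    static std::vector<Object**> slots;
    return slots;
}

// RAII handle for a local reference: pushes itself on construction and pops
// on destruction, so entries follow normal scope nesting.
class Local {
public:
    explicit Local(Object* o = nullptr) : ptr_(o) { shadow_stack().push_back(&ptr_); }
    ~Local() { shadow_stack().pop_back(); }

    Local(const Local&) = delete;             // the slot address must stay unique
    Local& operator=(const Local&) = delete;

    void set(Object* o) { ptr_ = o; }
    Object* get() const { return ptr_; }

private:
    Object* ptr_;
};

// Mark every object directly referenced from the shadow stack and return how
// many objects were newly marked. A real collector would trace on from here.
inline int mark_from_shadow_stack() {
    int newly_marked = 0;
    for (Object** slot : shadow_stack()) {
        if (*slot != nullptr && !(*slot)->marked) {
            (*slot)->marked = true;
            ++newly_marked;
        }
    }
    return newly_marked;
}
```

The cost is one push and one pop per local reference, which is the kind of overhead the performance question above is about.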

Haxe, iPhone & C++ At Last

Hxcpp 1.0, neash 1.0, NME 1.0

The release this week of haXe version 2.0.4 officially includes c++ as a build target, for Windows, Mac, Linux and iPhone. You can download and install from haxe.org. In addition to the standard includes, you will need the “hxcpp” library, which can be installed with the included haxelib management tool.

Coincident with the hxcpp release, I have updated the neash and NME libraries to version 1.0. You can also download these via the haxelib tool. There are several incremental improvements, and the iPhone target has been added!

Getting started with the iPhone

Getting started with the iPhone is quite tricky at the moment, mainly because of the pain of setting up an Xcode project. Also, getting the simplest program onto the device is hard due to the code signing requirements. So if you can already get one of the existing application templates to work, you are half way there.

Note that this solution uses the “SDL” library, and must statically link against this. SDL is covered by the LGPL license, and this has implications should you choose to release your software. I am hoping to remove the LGPL restriction at a later date.

The binaries used here have been compiled for the “2.2.1” iPhone SDK, so choose this version when compiling for simulator or device.

  1. Download and install components
    • Get haxe & neko: Visit haxe.org
    • Get hxcpp: haxelib install hxcpp
    • Get nme: haxelib install nme
    • Get neash: haxelib install neash
    • Get the sdl-static libs for iphone: I have created a project with binary builds of these. You can get the latest builds directly from subversion at:
      http://code.google.com/p/sdl-static/source/checkout.
      Or get the snapshot bundle from this site and install somewhere handy:
      sdl-static-iphone-1.0.zip
  2. Get Xcode with iphone sdk support – visit apple.com
  3. Get a Developer key (you can try simulator without it). You will need to pay to sign up as a developer on the apple site.
  4. Fire up Xcode and do File > New Project.

    Choose iPhone OS > Application. Here choose a “Window-Based Application”,
    but in fact we will use the delegate setup in the SDL code, so we will have
    to delete the one created by the wizard.

    Select a name & directory for the project. I’m calling it “Haxe Test”.

    Now as it stands, you should be able to build for the Simulator and
    get a lovely white screen and a program called “Haxe Test” in the simulator
    start screen.

    Next thing is to delete (to trash) the “…AppDelegate.h” and “…AppDelegate.m” files,
    the “Nib Files” group, Resources/MainWindow.xib and “main.m”.
    Finally, select the “Haxe Test” executable (in the Targets section) and from the “Get Info” –
    “Properties” tab, clear the reference to “MainWindow”.

    We will add replacements for these soon.

  5. Add “main.cpp” from the NME project.
    Select the top-level project folder and then use Action > Add > Existing Files.
    It is probably in /usr/lib/haxe/lib/nme/1,0/ndll/iPhone/ or
    similar depending on which version of NME you have installed. It can be
    very painful to get xcode to load from this location, unless you hit
    Command-Shift-G at the “Add” dialog and type (at least some) of this filename in.
    Choose to “Copy to destinations folder” so
    that you can mess with it if you wish. Note: you need to have a cpp mainline
    in order to automatically link in the correct runtime libraries.

  6. Add the libNME.iphoneos.a and libNME.iphonesim.a files from the haxelib NME project.
    You can add them both and the linker
    will select the correct one depending on your build. They are in the same place
    as main.cpp, so you should be able to use “iPhone” from the pull-down box
    in the add dialog. Probably best not to copy these files – in case you want
    to change them at some stage.
  7. Add the whole sdl-static/lib/iPhone directory.
    Again probably best not to copy.
    I used the “Recursively create groups” option. These will be where you stored them
    in step 1.

  8. Add the whole hxcpp/bin/iPhone directory like above.
    Again, this will
    be in a path like /usr/lib/haxe/lib/hxcpp/1,0,2/bin/iPhone/.
  9. Add the hxcpp include directory to the include path.
    Use the “Info” button
    to get the project properties, and on the build tab, under “Search Paths”
    add something like /usr/lib/haxe/lib/hxcpp/1,0,2/include/ to “Header Search Path”
  10. Now we are ready for the haxe code. If you have an existing project,
    then you can adapt the following instructions.

    Create a new file from Xcode (Other/Empty File). Here I have called it “HaxeTest.hx”, and unticked the “Targets” option. I’m pretty sure there is a way to get “Haxe File” to appear as an option here – but I don’t know the details.

    In the haxe file, enter something like (Note the window size):

    import flash.display.Sprite;
    import flash.display.Shape;
    
    class HaxeTest extends Sprite
    {
    
       public function new()
       {
          super();
          flash.Lib.current.addChild(this);
    
          var circle:Shape = new Shape( );
          circle.graphics.beginFill( 0xff9933 , 1 );
          circle.graphics.drawCircle( 0 , 0 , 40 );
          circle.x = 150;
          circle.y = 200;
          addChild( circle );
       }
    
       static public function main()
       {
          neash.Lib.mOpenGL = true;
          neash.Lib.Init("HaxeTest",320,480);
          neash.Lib.SetBackgroundColour(0x447733);
    
          new HaxeTest();
    
          neash.Lib.ShowFPS();
          neash.Lib.Run();
       }
    }
    

    This is the “main” file for haxe, and the hxcpp compile will create a library matching
    this class name.

  11. Set up a build script to build changes you make to your haxe files into a library.
    Xcode has a few issues with a straight custom build script order due to incorrect
    dependency checking. This can be worked around by first adding a custom target.

    Highlight the “Targets” in the Groups & Files and use “Action > Add > New Target...”.
    Choose “Other > Shell Script Target” and call it something like “Compile Haxe”.
    Close the pop-up and go back to the explorer. There should be a “Run Script”
    entry under the “Compile Haxe” target if you expand it out.

    Get info on “Run Script” and enter the following script:

       # CURRENT_ARCH is i386 for the simulator build; anything else is treated as the device.
       if [ "$CURRENT_ARCH" = "i386" ]
       then
          haxe -main HaxeTest -cpp cpp -lib neash -lib nme --remap neko:cpp --remap flash:neash -D iphonesim
       else
          haxe -main HaxeTest -cpp cpp -lib neash -lib nme --remap neko:cpp --remap flash:neash -D iphoneos
       fi
    


    You can untick the “Show Environment” if you do not need to debug this.

    One last step – drag the “Compile Haxe” target into the “Haxe Test” target.
    It should now also show up as first item “under” the “Haxe Test” target.
    The build order should now be correct. (See image at end of post)

  12. Now you are ready to do the build. The first time you build, the build
    results will show “Running custom shell script…” for quite a while.
    Haxe compiles to cpp very quickly, but it takes a while for the cpp files
    to compile to a library. You can see the progress if you expand out the
    middle tab bit.

    At this stage, you should get a bunch of errors when linking, but also haxe
    should have created a library for you. Add this library to the project –
    it should be in the local cpp/HaxeTest.iphonesim.a.

  13. Compiling now gets a bunch of unresolved functions from frameworks.
    Add the following frameworks to the project (Add > Existing Frameworks):

    • QuartzCore
    • OpenGLES
    • AudioToolbox

    These can be found in /Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS2.2.1.sdk/System/Library/Frameworks/.

  14. Run!
    So you should be good to go. Open up the debug console so you can see
    any traces/printfs.

  15. Change the target to “Device – IPhone OS” from the pull-down and hit “Build and Go”.
    Again, this takes quite a while the first time.
    Now add the new cpp/HaxeTest.iphoneos.a library to the project.

  16. Now you need to sort out your code signing. If you have not done so already,
    set up your apple developer account & certificates on the apple web site.
    Go to the info of the “Haxe Test” executable and the “properties” tab.
    Change the “Identifier” to match one of your certificates. Make sure to
    match your company URL. You may want to use “*” when creating your
    profile for easy changing.

    Under the “Build” tab, in the “Code Signing” section,
    select your profile from the “Any iPhone Device” pull-down. If you don’t have one then
    you will need to create one on the apple website.

  17. Connect up your iPhone (or iPod touch) and build! W00t!

HaxeTest

I have had all sorts of errors when trying to upload to the device.
So far, they have been solved by getting out of the car, walking around it and getting back in.
i.e., disconnect and power down the iPod, fully exit Xcode, then start it all up and try again. Also, uninstalling the app from “Window > Organizer” can help.

But now the easy bit. Change the HaxeTest.hx file, and hit Build & Go. It is that simple.
Errors should show up nicely in xcode.

You can add data files (eg, pngs, xml etc) to the project and they will be copied to device so you can open them with a relative path.

In the properties of the “Info.plist” you can set an Icon File – don’t forget to add the icon to the project too.

Not covered here (because I have not fully sorted it out myself):

  • Syntax highlighting in XCode
  • Debug build (hxcpp can do them – it’s a matter of setting up Xcode)
  • Code completion in Xcode
  • Automating this procedure!

Edit: Add framework path, SDL version, MainWindow clearing.

Haxe on the iPhone – For Real!

To progress this project a bit further, I needed a real device – so I convinced the little woman that an iPod touch would be a good thing to have around. She seems to have taken to it, so now I’m thinking I may need one each :).

After much phaffing about, I’ve finally managed to get stuff running on the actual device. I had to comment out quite a bit of NME, since I only used the base SDL, not all the extras. Boehm GC was also a bit tricky because I didn’t really know what I was doing, but I brought in some bits from the mono project and then disabled USE_MUNMAP because it caused it to crash. In the end, it seems to work – no crash, but then I may not have been running it long enough. I will have to try some memory thrashing later.

One thing I found with Xcode is that if you ever change the project name/AppID settings then you really need to clean the project, exit Xcode and get back in. But the hardest part was working out where to go to get the developer certificate! I guess I’m a bit thick, or missed the memo, but it took me ages to get to the web form to create a certificate.

So the big question is performance. In this demo, initially, it runs at about 2.5 frames a second (I don’t have a fps counter yet), but slows a bit later when things spread out. But this is using the high-quality, anti-aliasing software renderer. Next job is to hook up the OpenGLES renderer, then I’ll really know where I stand.
But overall, pretty positive result I think.

A Second Look (iPhone + Haxe)

Once the basics are in place, the rest comes pretty naturally.

Just a slight tweak to the MovieClip transformation gets Physaxe doing its thing.

Performance seems ok-ish in the simulator, not sure how it would go on the real device.

Hxcpp 0.4, NME 0.9, Neash 0.9 Released!

What the flash?

What is Hxcpp? Hxcpp is the c++ backend for haxe. This means you can compile haxe code to c++ code, and then compile this to a native executable, for Windows, Linux or Mac.

What is NME? NME is the “Neko Media” library that wraps SDL, providing gaming interfaces for neko, and now native compiled haxe code.

What is Neash? Neash is a compatibility layer that presents the flash API to haxe code running on other systems, such as js, neko or c++ native code.

Together these allow you to write code to target flash SWF files, and also cross compile to native code for Windows, Linux or Mac.

Hxcpp on haxelib

I have finally packaged up a bunch of changes into official haxelib releases. Hxcpp is now on haxelib, which means you can get it with “haxelib install hxcpp”. This effectively creates a whole separate install of haxe, which can be run side-by-side so you can test it out without risk.

The cpp backend now supports Mac(intel) and Linux as well as the original Windows platform.

The main change to hxcpp is the packaging – moving towards the final installation form. Currently there are a whole bunch of files distributed in this release that should become redundant once the c++ backend is merged into the main branch. Also, the library coverage has been expanded a bit, but it is still not complete.

Usage

Firstly, you will need to run “haxecpp” instead of “haxe”. This executable is found in the appropriate bin subdirectory. I’m not sure if the “executable” flag will survive the compression, so you may need to “chmod a+x” the file.

It is probably best to place the appropriate bin directory in your executable path. On Windows, this will also solve the problem of finding the dynamic link library, hxcpp.dll. And on all systems, this will allow you to use the “make_cpp” command from the hxml files. On Linux systems, you will have to allow the executable to find the hxcpp.dso. This is most easily done by setting LD_LIBRARY_PATH to the bin/Linux directory, or copying this file into an existing library path. Similarly on Mac, you should set DYLD_LIBRARY_PATH.

To build haxe code, use “haxecpp” in place of “haxe”, with a target specified by “-cpp directory”.
This will place source code and a makefile in the given directory. Then you need to do a “make” on Linux/Mac, or “nmake” on Windows to build the executable. You may need to set the environment variable “HXCPP” to point to the directory that contains this file. On Windows, this will be something like: c:\Progra~1\Motion-Twin\haxe\lib\hxcpp\0,4\

As a shortcut, if you are using a hxml file, you can use “-cmd make_cpp” which will do the build for you assuming you used the “-cpp cpp” directory.

Neash/NME

The big change for NME is that it now supports Linux and Mac(intel) for neko and c++ targets. There have been a few bug fixes as well as a few new features:

  • Bitmap class
  • Expanded and optimised TileRenderer for rendering scaled and rotated sub-rects from a surface
  • A few smarts for finding fonts, if no ttf is supplied
  • Some blend modes have been added
  • Added scale9Rect
  • Added drawTriangles, with perspective correct textures

ToDo

There is still plenty to do, including, but not limited to:

Hxcpp:

  • Proper coverage of all APIs.
  • Resolve the order-of-operation problem: In c++ f(x++,x++) is ambiguous as to what order the increments are performed. Or perhaps agree to live with it.
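
One way a code generator could sidestep the argument-order problem is to hoist each argument into a named temporary before the call, since separate statements are always executed in sequence. The C++ sketch below illustrates this; it is an assumption about a possible approach, not what hxcpp does, and `f` and `call_left_to_right` are hypothetical names. (In C++ the relative order in which function arguments are evaluated is not specified, and before C++17 an expression like f(x++,x++) was undefined behaviour.)

```cpp
#include <cassert>

// Hypothetical callee whose result reveals which argument received which value.
inline int f(int a, int b) { return a * 10 + b; }

// Force left-to-right evaluation by hoisting each argument into its own
// statement. Statements run in order, so the result no longer depends on
// how the compiler orders argument evaluation.
inline int call_left_to_right(int& x) {
    int arg0 = x++;   // first argument is evaluated first
    int arg1 = x++;   // second argument is evaluated second
    return f(arg0, arg1);
}
```

The trade-off is noisier generated code, which an optimising compiler will normally collapse again.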

NME:

  • Add all blend modes
  • Add all filters
  • Discuss with experts the merits of static vs dynamic linking on Mac and Linux.

Neash:

  • Sound is a big omission
  • Loader code
  • Unit testing of supported APIs.

Despite these issues, I think there is a useful core of functionality here.

Let me know what you think.

HXCPP 0.3 Released

I have put together a new version of hxcpp, the c++ backend for haxe. New features include improved coverage of language features. All the unit tests except remoting pass now. I have also cleaned up the ocaml code a bit and improved the output consistency. Still a bit to do here, but not that much.

The code now contains a dependency system that allows for incremental compiling, greatly improving the speed.

Dlls are now all in one directory – by adding this to the exe path no dll copying should be required. This still needs a little thought: I tried to delay load the dll, thereby giving greater control over locating it, but it seems the rtti system brings the dll in before I can change the load path. Apart from this, most of the 0.1 TODO list is finished. There are still a few little language features required – such as “break” from within a return block – but it’s 99% there, and the external libraries are pretty much untested.

I have also hijacked the neko code to provide OS libraries. This means that I had to allow “neko” class paths in the cpp target – this seems a little odd – I will have to think about a solution here.

Strings are implemented with wchar_t, rather than utf8 bytes, so some neko functions that took “string” actually take “byte array” in cpp. On the plus side, multi-byte characters are “native” in the c++ target.

The source and demos are in hxcpp-03.zip.

HXCPP 0.2 – Huge performance increase.

I have switched hxcpp over from using ref-counting to using Boehm garbage collection. I have also added some additional performance improvements, such as integer-index field names to make interaction with neko more efficient.
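
The general idea behind integer-indexed field names is to intern each name once and compare small integers afterwards instead of strings. The C++ sketch below shows one way to do that; it is an assumption for illustration rather than the hxcpp implementation, and `FieldIds`, `id_of` and `name_of` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical intern table: maps each field name to a small stable integer.
class FieldIds {
public:
    // Returns the existing id for a name, or assigns the next free one.
    int id_of(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(names_.size());
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    // Reverse lookup, useful for debugging and reflection.
    const std::string& name_of(int id) const { return names_.at(id); }

private:
    std::unordered_map<std::string, int> ids_;
    std::vector<std::string> names_;
};
```

With this, the string hashing happens once per distinct name, and repeated field accesses can be keyed on the integer.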

The overall result is that for the Physaxe demo, the frame rate went from 24 fps to 82 fps (in opengl mode). The swf file runs at about 35 fps, and neko at about 8 fps. This is about what I was hoping for from the first round, but I got there in the end.

You can download the updated files here.

Boehm GC, virtual inheritance and finalizers.

I’m trying to get a speedup for the cpp backend for haxe by using garbage collection. Initial results are very promising – potentially about twice as fast. However, I spent a good few hours getting to the bottom of a little problem. Boehm garbage collection is a very impressive piece of work – it has all sorts of magic that does magical things, such as dealing with virtual inheritance. This was a bit of a surprise because you do not always get “real” pointers: when you store an object pointer, you may get one with an offset. However, it seemed to work. Until I added finalizers to the external draw objects used by the renderer. Apparently, you can only add finalizers to the “real” pointers (ie, those returned from “GC_MALLOC” et al), rather than a pointer to the same object related by virtual inheritance. The symptom was that the object got finalized in the first “gc_collect”, even though it was still “used” as far as I was concerned. I guess this is not too surprising, and the fix was pretty easy, but the fact that everything else worked so well lulled me into not suspecting this initially.
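
The pointer-offset effect can be shown in plain C++ without the collector. With virtual inheritance, a pointer to the base subobject need not equal the address of the complete object, and `dynamic_cast<void*>` on a polymorphic type recovers the complete object's address. The sketch below is only an illustration of that language feature and not necessarily the fix used here; `Object`, `Drawable` and `complete_object_address` are hypothetical names, and it assumes the object was constructed at the start of its allocation.

```cpp
#include <cassert>

// Hypothetical hierarchy: Drawable reaches Object through virtual inheritance,
// so an Object* pointing into a Drawable may carry an offset.
struct Object {
    virtual ~Object() {}
    int header = 0;
};

struct Drawable : virtual Object {
    int handle = 0;
};

// For polymorphic types, dynamic_cast<void*> yields the address of the most
// derived (complete) object, whichever base pointer you start from.
inline void* complete_object_address(Object* p) {
    return dynamic_cast<void*>(p);
}
```

Passing the recovered address, rather than the base pointer, is what a finalizer registration would need if the collector only accepts pointers it handed out itself.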

C++ backend for haXe

I have just completed an alpha release of a c++ backend for haxe. This means that you can compile haxe code into a 100% compiled executable. You can download the demo file in hxcpp-01.zip. Sorry, windows only at this stage.

The distribution contains a new cpp backend for haxe. It has been based on a 2.0 version of haxe, which may be a tiny bit out of date. Most of the changes are in the new “gencpp.ml”, and to the standard library files, with a few little extra bits here and there. You can re-compile the
haxe compiler if you have ocaml by using the supplied install.ml script.

To try this version for yourself, first back up your haxe distro and copy the supplied “compile/bin/haxe.exe” and “compiler/std/*” files over the top. Use the “-cpp cpp_directory” command line to generate a directory that contains src, include and nmake files. You can then compile these using the microsoft visual studio “nmake” utility. The build system requires the library, include, make and dlls from the “hxcpp” directory. To access these, you should set the environment variable “HXCPP” to point to the hxcpp directory extracted from this distribution. This can be done from right-click-“My Computer”/Properties/Advanced/Environment Variables, or from the command line before compiling.
The resulting “exe” file also needs the hxcpp.dll file from the hxcpp/dll directory. This should be in your “path”, or you can simply copy it next to your exe.

You can recompile the hxcpp.dll using the nmake file in the directory. You can change the compile flags from the $HXCPP/nmake.setup file (eg, turn on debug).

Demos

Two demos have been included – “perf”, a small benchmark program I found on the net
and a “Physaxe” demo. The source is included (slightly modified), and so are the binaries.
The cpp src and include directories have been included to give you taste of the
output if you can’t be bothered setting up the compiler yourself.
The binaries can be found in demos/bin, and are compiled for neko, swf and cpp.
The neko version can be run with “neko phx.n” or “neko TestRunner.n”. You do not
need a very recent version of neko, but you do need the included “nme.ndll” findable
by neko (next to it will work).

The cpp version of Physaxe uses the cpp version of NME. This was compiled from
the same code base as the neko version, except it uses the “neko.h” file found
in the hxcpp directory instead of the one that comes with neko. The nme.dll should
be next to the compiled exe.

If you want to compile the nme versions yourself, you will need the latest nme and neash
versions from code.google.com:
http://code.google.com/p/nekonme/source/checkout
http://code.google.com/p/neash/source/checkout

Performance
-----------
The flash version of physaxe runs the fastest, with the cpp version about 70% of the
speed (when using the opengl version), and neko about 20% of the speed.

One of the problems is that the cpp version uses the neko api, which required fields
to be looked up by name, which is quite slow in this implementation. A faster version
could link directly to the hxcpp objects – but then it could not use the same API.
This problem is made far worse by the fact the physaxe re-renders each point in each
object every frame, rather than simply adjusting the matrix of existing objects.

I think the most significant loss of performance is coming from the reference counting
housekeeping. I will look into a garbage collected system soon.

The results from the “TestRunner” are mixed, with flash being faster for strings, but
cpp faster for maths and looping. Neko is fastest for the string sort in this case,
but this is unusual because the strings are already sorted. When they are not, neko
is very slow. The cpp string code is very simple, so there is scope for improvement there.

TODO
----
There is still plenty to do.

  • A lot of the operators (eg, “*=”) have not been looked at.
  • The actual formatting of the generated code needs a complete overhaul.
  • The ml code needs some simplifying/cleaning.
  • The standard libraries (eg, xml,regex)
  • Need some way of locating the various dlls etc.
  • Split up/refactor the HObject.h et al files.
  • Returning values from blocks/switches.
  • Complete neko.h
  • Look at GC.

Plenty more, I’m sure.