New Releases!

Finally, I’ve managed to make a few more releases. Namely, HXCPP 2.06 and NME 2.0.
HXCPP can be downloaded via “haxelib”, and works with the new 2.06 version of haxe. This release contains many bug fixes and optimizations. Also, when you compile with the -debug flag, you can get additional null checks and stack dumps.
NME 2 is a complete rewrite from the ground up. Most of the logic has been moved from neash to the c++ code for optimization reasons.
For NME developers this only means a couple of things:

  • You use –remap flash=nme instead of –remap flash=neash
  • The “boot” code has changed, so you will need to follow the main line from one of the samples.
  • Improved performance!

I updated my Xcode SDK, which caused a bunch of link errors linking for the simulator with the NME library. It seems Apple have changed the “ABI” for objc (basically, broke binary compatibility). So I must choose pre 4.0, post 4.0 or both. I think I am going to require NME iphone simulator users to have the latest SDK – unless there are any objections?

iPhone Xcode Template

I finally got around to making an Xcode template for haxe compiling. Try it out and let me know if it works. It’s my first one, so I’m open to ideas for improving it.

To use the template, first extract it to your developer template area, eg: /Developer/Platforms/iPhoneOS.platform/Developer/Library/Xcode/Project Templates/Application.
Then when you create an new Xcode project, this template should show up. You should be able to build for the simulator.

To build for a real device, you will need to go though the official Apple developer program, and get yourself signed up. Then you need to get your certificates in order, and finally edit the “plist” file in the Resources folder and change the company URL to match the one you used in your certificates.

You can edit Main.hx code in the “haxe/src” directory, but you will probably want to locate your haxe source tree outside your project area, since we are multi-platform developers an do not want to tie ourselves to Xcode. To do this, you can edit the class path in the Build.xml file, and change the boot code in IPhoneMain.hx. It is done this way so the haxe-generated library always has the same name. If all else fails, you can have a look in the makefile, which is set up to allow building debug and release versions for iphoneos and simulator without having to change other project settings.

Let me know if you have any luck.

Android Port – Second Look

I have looked into some of the performance aspects of the Android port, and I’ve come to some conclusions. Firstly, after looking at the disassembly, there did not seem to be any additional code associated with exception handling, so there was no optimizing to be done there. Secondly, the compile flags meant that software floating point operations were used, rather than the built-in hardware FPU.

So I added compile flags to force hardware floats, and added armv6 instructions while I was at it. You can get the installer here.

This gives my phone a nice 50% boost in frame-rate to about 15fps. However, on some devices this app may fail to run. This change brings the performance in line with iPhone performance for the simulation side of the program. However the OpenGL seemed slower. I guess the Qualcomm MSM 7227 does not have strong 3D acceleration \- or perhaps I am still missing something?

For comparison, you can test the flash version of the code in this directory. I get about 6\-7 fps, which is not too far from my initial software-float based results and is quite impressive. I’m not sure if the flash renderer is using software or hardware rendering \- maybe it’s worth a closer look to find out what it’s doing.

Android Port – First Light

I bought myself and Android (2.1) HTC Legend phone. Obviously, the first thing I wanted to do was to get haxe/NME running on it. Now the Android platform is very well setup for Java development but, unfortunately, the haxe Java target is not fully developed at the moment. Luckily, there is also a “native” development kit (NDK) that allows you to run c++ code on the device using a Java bootstrap to load the code and JNI.

The NDK is based in the GCC toolchain, and therefore is already pretty well supported by the hxcpp haxe backend. The problem is that the standard NDK is somewhat crippled – it does not support exceptions or the STL. It is possible to rework the hxcpp backend to work without the STL, but removing exceptions may take a bit more work. I managed to work around this by using a slightly modified NDK, created by a very helpful member of the comminity over at crystax.net. In theory this should have been all that was required, however the NDK also lacks proper wchar_t support. When I finally worked this out, it was reasonably easy to substitute a simple translation layer to fix these wide-char problems.
So far, I have only implemented the rendering API, with no interactive options. The OpenGL ES rendering code is exactly the same as for the iPhone, except that the initial context and display surface are setup in the Java code, rather than the Objective C code.

Developing with haxe for the Android target requires several modules to work together. First, you compile your haxe code as normal to a shared object – which is actually a JNI module. This exposes a single “main” function that can be called from Java. Your actual Android application starts with Java code, and you can setup the properties etc. from eclipse. Ultimately this will be boilerplate code and you will only need to change the bits you need to. This Java code then calls the haxe main function. In the graphical/NME environment the main thing this does is dynamically load the nme native library and setup a callback for when the graphics are finally initialized. You then return to Java code and setup a OpenGL ES context. When this is done, a JNI call is made into the nme dynamic library, which in turn calls back into your haxe application to complete setup. The “MainLoop” is then processed in Java, and events such as “Redraw” (and later “OnMouse” etc) are passed into the nme library, and then into the haxe application as registered “addEventListeners”.

So in this somewhat convoluted way, Java maintains full control of the app, and therefore correct interaction with the OS, while almost all the real work is done in the haxe code. In the example shown here, no changes were necessary to the haxe code, and the internal garbage collection also worked without modification, so it is a pretty solid cross-platform solution.

Now the bad news. The performance is way down compared to my 2nd generation iPod touch. This example is 10fps on the Android device, and about 24 on the iPod. Which is strange, because the core should actually be running a bit faster. The slowdown seems to be on both the OpenGL side, and the physics calculation side. It should also be noted that this is a first pass, so there is quite a bit of room for improvement. My guess is that the gcc compiler is generating significant runtime overhead per function because of the exception code. This is what I noted while compiling for the iPhone (it was doing a pthread lock per function!) \- but in that case the penalty was only incurred in functions that actually “threw”, so I simply moved the throwing code into a separate function. Which brings me to my second problem …

No easy profiling or debugging support for native code. This is a real pain for debugging (back to “log” debugging) and the lack of profiling makes it very hard to work out what needs optimizing for the target. I may be able to use a simple “setjump” style system for exceptions because the use of garbage collection means that there are no real destructors required for cleanup. But I would need to be sure this would help – premature optimization and all that.

I would also count the fact that you really need to use eclipse as an IDE as a big negative, but I think that this is just my personal dislike of the program. I think there may ultimately be ways around this with command-line compilers etc.

Native compiling tools for the Android target are still in active development and moving forwards, so I’m assuming that most of these problems will be overcome eventually. And of course, the big advantage of the Android OS is the openness \- so I can provide you with the actual application to run on your device \- give it a go, and let me know if it runs for you.

JavaScript – ready or not.

JavaScript Performance

There have been some very promising improvements in JavaScript performance, but exactly how good is it? It turns out, that there is a pretty easy way to work this out – thanks to haxe.

Haxe allows the same code base to be compile to Flash, JavaScript, neko and cpp. The graphics is handled differently – Flash uses its plugin, JS uses canvas and neko is using the NME library, running opengl. To compare these, I’ve chosen the Physaxe library, which is optimized for all these platforms, and can give a feeling for an app that has a computational and graphics load.

Into this mix, I will add another interesting option: The V8 JS engine, running using the NME library in opengl mode. This cross-over mode is actually quite easy to implement because of 3 stars aligning: 1. The NME library has a external interface that uses opaque handles that map very naturally to the v8::Value *. 2. The haxe compiler makes it possible to program JS without losing your mind, and all the existing library code is valid for this target. and 3: The Google V8 JS engine has a clean API that makes it easy to embed (you would almost think they designed it that way – dispite the frugal documentation).

The benchmark I have chosen is the “Pentagonal Rain”, which is nice and stressful for the CPU. You can try for yourself – use the ‘5’ key to switch to this demo.

Engine FPS
Neko/nme 9
Chrome 4.1, JS 11
Opera 10.5.3, JS 18
V8VM/nme 23
Flash 37
CPP/nme 130

So as you can see, the V8VM option is actually quite viable as a scripting vm. Since there is a lot in common between neko, v8vm and cpp haxe targets and plugin architectures, it should be relatively straight forward to switch between them.

The JS demo can run on the iPhone. But just because you can do something, it doesn’t not mean you should \- at about 2 FPS on the title screen, I can’t imagine how slow it would run in the Pentagonal Rain demo. And probably not great for your battery either 🙂

Bravo, Apple

Finally, Apple is doing away with those arrogant upstarts who think then can write a few lines in a high level language and call it a program. Their new developer agreement requires:

3.3.1 – Applications may only use Documented APIs in the manner prescribed by Apple and must not use or call any private APIs. Applications must be originally written in Objective-C, C, C++, or JavaScript as executed by the iPhone OS WebKit engine, and only code written in C, C++, and Objective-C may compile and directly link against the Documented APIs (e.g., Applications that link to Documented APIs through an intermediary translation or compatibility layer or tool are prohibited).

This has a couple of good points – firstly banning stupid languages (used by those people who are not smart enough to learn c++), and secondly getting rid of translation layers. Apple has clearly put a lot of thought into their APIs, so why would anyone want to put a layer on top of them – it’s just going to make things harder to use.

Languages

There has been a lot of talk recently about compiling “foreign” languages, such as haxe, as3, javascript, java, .net based languages, into binaries that will run extremely well on the iPhone. But like all foreigners (who are responsible for all the terrorism in the world) these languages should be cleansed from all iPhones to maintain the iPhones mono-lingual purity. Putting such insidious diversity into a beautifully designed device can be shown to confuse consumers, most of whom don’t even know their device and been compromised by these so call “high level” languages.

By raising the barrier of entry, and only permitting “real” programming languages (ie, “C” based ones), Apple ensures that the quality of apps will remain at its current lofty levels. “Natural Selection” will then weed out those people who are too lazy or too stupid to learn a proper language. In fact, I think Apple has not gone far enough here and should dabble in a bit of “Intelligent Design” by requiring that all developers who wish to submit apps hold at least a 4 year degree in computer science. Just imagine a world where any kid can work out of his garage and build an application with an original language, or bit of hardware, that snubs its nose at the establishment – anarchy would ensue. Therefore, it is important that the responsible companies out there vet such potentially disruptive ideas before they can cause too much damage.

It can’t be said that Apple don’t like new langauges, after all, they championed the greatest NeXT Step in programming ever, Objective-C, it’s just that all the other languages are utter crap. Some of then do away with the beautiful square bracket, some use commas to separate function arguments and nearly all the modern ones perform “Garbage Collection”. What a joke! Apple solved this problem years ago be simply not creating garbage in the first place. Again, it is only those too lazy to learn about how to use allocation pools and correct reference counting that need anything as dirty as Garbage Collection.

The new langages, such as haxe, are so terse that you do not even know when you are using a delegate. How can anyone possibly understand that code like:
addEventListener(KeyboardEvent.KEY_DOWN, function(event) { trace(event); });
Is supposed to do? I mean where is the delegate? Where is the class that implements the UITextFieldDelegate protocol? (And why must these languages continue to call things “Interfaces” when they are clearly “Protocols” ?)

I think Apple are right to ban code generators, such as the haxe c++ backend. While these produce code that could in theory be produced by hand, the code it robotic and lacks the “soul” of hand written code. To err is human, and without the quirks introduced bu a human coding c++ we may as well hand the future over to SkyNet and let the machines run everything.

Layers and Tools

Thankfully, Apple has also done its research into programming techniques as well as programming languages. The problem with programming these days is that where are too many layers and tools to learn, and they are taking us back to a simpler times where you are “close to the metal”. Apple rightfully shuns these extra layers, and focuses only on code. Once you understand Objective-C, Interface Builder, NIB, XIB, Frameworks, .app layouts, provisioning, xml, plist, controllers, delegates, owners and outlets, then you can create pure lovely code, without any of that layering crap getting in your way.

Programmers must beware of code that essentially “lies” by pretending that the beautiful, native API actually looks like one of the ill-conceived APIs from another language. For example, why would anyone want to view a native UIView image as the practically unsable as3 “equivalent” (I use the term loosly) of BitmapData? I don’t think there is a single successful application ever written that uses this BitmapData class.

Isolating your code from the native API will cause your code to lose its identity. If you can compile it for another (obviously inferior) device then your code will become tainted by the lower class device, even it it performs identically on the Apple device. How quickly people forget that the upper class should not mingle with the lower class.

I hope Apple’s ban extends to the gzip “translation layer”. Programmers should not be using this library because it has security implications, and they should simply use the streaming classes and do the decompression in their own code. If more programmers thought like Apple, then there would be a lot fewer security holes in software.

Don’t get me started on Game Making programs. Thank god these are banned – imagine letting a non-programmer create an App. What next, Artist creating games? Don’t make me laugh.

Conclusion

Apple has made a huge stride forwards by tightening the definition of what a real developer is, and I’m looking forward to what’s next. I think they have a little way to go – for example, what about all those people using foreign editors, rather than XCode? Surely if XCode is not good enough for a developer, then that developer is not good enough for Apple. The best way I can see for them enforcing this is for them to install a “watchdog” application the the developer’s machine, and send screenshots back to Apple periodically. That way, if the developer does not conform to the coding purity required by Apple, they could be identified and sent to a camp to help them concentrate on being better programmers. Win-win, what a great idea.

3 Years On

Wow, has it really been 3 years? 2009 was an interesting year – I guess the big ticket items were haxe for the iPhone and getting hxcpp into the standard distribution for haxe. I am very satisfied with these achievements, however there is still quite a bit of polish to add – especially in terms of ease-of-use.
I also started some other projects – fastcgi for haxe, and “waxe“, the wx/haxe interface, as well as continuing with neash and nme development. One of the big changes for hxcpp, although not visible, was using an internal garbage collector which has improved performance and reduced the compile dependence on a library that is hard to debug on other peoples machines.

Currenly, I’m working on an NME rewrite to remove GPL code from the iPhone target, and to help integration. Now that hxcpp has reached a certain level of quality, the diverse projects are starting to coalesce and I’m pushing ahead with a complete hxcpp/nme/iphone solution which should be very useful.

Looks like 2010 may be the year it all comes together (hopefully!).

FastCGI For Neko On Share Hosting

In my previous post, I described you could setup neko web services on a shared host, using CGI. This method is not as efficient as it might be because a separate process is required for each request. However it is possible to extend this to “Fast CGI” (FGCI), which starts a single process, and keeps it alive. Apache talks to this over a socket, sending requests and receiving data and a very efficient manner.

If you got CGI working, and your server supports FCGI, then the transition on pretty simple.

The first thing to do is to download the new “fastcgi” haxelib. From a shell use:

haxelib install fastcgi

If haxelib asks you for a project directory, the following discussion assumes you specify your “haxeneko/lib” directory.
There is one bit of housekeeping you should do at this time – copy the “nekoapi.dso” object from “~/haxeneko/lib/fastcgi/0,1/ndll/Linux” into your “~/haxeneko” directory. This ensures that this dso will be found when neko is run by the web server.

Now it is time to change the cgi script. The code is very similar, except the extension should be “.fcgi”. Here is the script I used Site.fcgi:

#!/bin/sh
export HAXENEKO=~/haxeneko
export LD\_LIBRARY\_PATH=$HAXENEKO
export PATH=$PATH:$HAXENEKO
export NEKOPATH=$HAXENEKO
export HAXE\_LIBRARY\_PATH=$HAXENEKO/std
cd ../../site
exec neko SiteFCGI.n

Note the final “exec” call to ensure the pipes are all correctly plumbed.
And the obvious change to the .htaccess file (.fcgi extension):

RewriteEngine on
RewriteRule \\.(css|jpe?g|gif|png)$ - [L]
RewriteRule ^(.*)?$ cgi-bin/Site.fcgi [L]

Finally, compile the “Test.hx” file that came with the fastcgi lib. I have a slightly altered version here:

class Test
{
   static var processed = 0;
   static public function main()
   {
      // Called in single threaded mode...
      fastcgi.Request.init("");
      // This can be called multi-threaded...
      var req = new fastcgi.Request();
      while( req.nextRequest() )
      {
         req.write( "Content-type: text/html\r\n" +
            "\r\n" +
            "<title>Neko FastCGI<title/>" +
            "<h1>Fast CGI</h1> Requests processed here: " + (processed++) );
         req.write( "\n page = " + req.getParam("REQUEST\_URI") );
         req.close();
      }
   }
}

This version prints the request uri too. To compile, use:

haxe -main Test -neko SiteFCGI.n -lib fastcgi

And that should be that! When you visit your web page, you should see the “processed” counter increase, verifying that it is the same process that is running.

Currently the system does not support easily killing the FCGI process, which is something that you must do when you update the “.n” neko file. The only way at the moment is to use the shell to do “ps -x” to identify the process number, and then “kill -9 number”, where number is the process number of the neko executable.

Neko on Shared Hosting

I started to think about using neko web technology, but since I have shared hosting, it was not obvious now this could be done. Currently I’m with site5.com, which is very cheap for running multiple websites, but since it is shared hosting, you don’t get to install anything. However, it does have few features that made getting a neko site up and running quite possible.
The key features are:

  1. Shell access – not really required if you can copy files to the site (eg, via ftp) but very useful for debugging and getting things going. The shell I have is “jailshell”, which I think prevents directory listings outsite your home directory, but otherwise is pretty functional (based on bash).
  2. gcc access – again, not really required once things work, but as you will see, pretty much required if things go wrong. And also good if you want to compile a c++ target!
  3. CGI access. Since we can’t modify the apache installation, the only way we can get our code to “run” is via and external process – this is what cgi is for. I will talk a bit about “fast-cgi” later (once I get it going).

First thing is to check you have cgi access. When I first set up the site, I have nothing but an empty “cgi-bin” directory. To test this create a file “test.cgi” in there containing:

#!/bin/sh
echo "Content-type: text/plain"
echo
echo "Hello from CGI!"
set


Now to enter this code, I used old-school remote ssh shell (using putty) & vi. You may choose to ftp it on use filezilla or similar. You will also need to add executable permission (chmod a+x test.cgi for ssh, not sure how to do this via ftp). You can then test it with yoursite.com/cgi-bin/test.cgi. With any luck, you should see the expected greeting, plus the “set” command should dump all the environment variables available to your application.

If you get a “500 – server error” at this stage, it must be fixed. The error is spectacularly unhelpful – not sure where to find the additional error info. Start by trying to run the file from the command line, ie type “~/www/cgi-bin/test.cgi” (assuming this is where your script is located). You should see the output, or perhaps a better error message. Also check for “execute” permission for “all”, as the apache server will run this script with limited privileges. Finally, make sure you have specified a “Content-type” and an additional blank line in the output.

Ok, now we have cgi working! Next step is neko – and haxe too since I will be doing some compiling on the server to help with testing. Haxe is not strictly required if you are deploying pre-compiled solutions.

The hard way

As I said before, you can’t “install” anything on the shared host (no package managers, so it all has to go in your home directory. First thing I did was to download the linux binary distro from nekovm.org. This is easy with the magic “wget” shell command. With your desktop browser, go to the download page and find the link – right click and “copy” the link address. Then go the the shell (putty) window and then paste the link in so you get something like “wget http://nekovm.org/_media/neko-1.8.1-linux.tar.gz?id=download” – hey presto a gzipped-tar file (may have a funny name -that’s ok. Try to use “tab” for tab-complete the filename to save typing). Make a suitable directory and “tar xvzf file” the file to extract the neko files. Now go to the directory and try to run neko. (ie, “./neko”).

You will probably get an error like “libneko.so not found”. But it’s right there, wtf? So you need to set you LD\_LIBRARY\_PATH (“export LD\_LIBRARY\_PATH=~/dir/neko-1.8.1-linux”).

Ok, now you get libgc.so.1 is missing, which indeed it is. The easiest way I found to fix this was to use wget to download the source from “http://www.hpl.hp.com/personal/Hans\_Boehm/gc/gc\_source/gc.tar.gz”, unpack it, “./configure” it and “make” it. You end up with the required libgc.so.1 file in the “.libs” directory, which I then copied to be next to the neko executable. And now “./neko” works – apparently. I will save you the suspense – you also need to do the same thing with “libcpre” from “http://sourceforge.net/projects/pcre/files/pcre/7.9/pcre-7.9.tar.gz/download” – I use the 7.9 version, not sure if 8.0 works. This is required for haxelib later.

See, I told you that compiler access would come in handy.

Ok, neko done, time for haxe. Again the installer is not much use, so I downloaded the binaries from “http://haxe.org/file/haxe-2.04-linux.tar.gz”, however when I went to run this, I found the “tls” library required a GCC 2.4 runtime, which I did not have, and could not up grade. So – you guessed it linux fans, compile from source. One small hump to get over first, haxe requires ocaml to compile. Of course, ocaml is not installed, but if you are still with me at this stage you know the answer – compile from source. So “wget” it, and here is the trick – make a ocaml directory in your home directory (or somewhere under it), extract the source and use “./configure -prefix your\_ocaml\_dir” – this provides the “install” directory, since ocaml can’t be used without installing it. The the make is 3-phase “make world opt install”, and now you should have a ocaml install. You will need to put this in your executable path before you can think about compiling haxe.

The online doco suggests that you download and run “install.ml”. I tried this, but the cvs timed out. So I ran this on my windows box (already had ocaml installed!), tarred up the result and ftp-ed it over to my site. Painful, but it worked. One thing is that this uses the cvs “head” – anyone know where to get the 2.0.4 source tar-ball? Once I had the source, I commented out the “download” call in install.ml and “ocaml”ed it. And haxe was built. The haxe distro has a “tools” directory under it, and you can build “haxelib” if you have neko setup correctly.

Getting the paths right is a bit tricky, so I decided to simplify things. I made a directory “haxeneko” in my home directory and “cp -r *” the files from the neko distro (including the new gc and pcre libraries) into this new directory. Also, I copied the bin/haxe built executable in there, and haxelib too (once it was built). Finally, I copied (“-r”) the “haxe/std” files from the haxe distro into this directory too. Now I have everything required in the one spot – and you can too!

The easy way

I have saved you the pain, and you can simply download the files from haxeneko-1.0.tgz. So you should be able to “wget” this, untar it and be almost ready. You may run into problems if there is some incompatible library somewhere – in which case, back to the hard way for you!

Finally, we need to set up the paths. Because my hosting provides the “bash” shell, this setup goes in ~/.bashrc. The required “install” is:

export HAXENEKO=~/haxeneko
export LD\_LIBRARY\_PATH=$HAXENEKO
export PATH=$PATH:$HAXENEKO
export NEKOPATH=$PATH:$HAXENEKO
export HAXE\_LIBRARY\_PATH=$HAXENEKO/std

You may need to login again for this to work (or you could paste it directly to your command-line), but now you should be ready to compile some code!

Start by creating your site-code in a directory that is not under you www (public_html) folder \- I have called mine “site”. And here is a simple example haxe file:

class Site
{
   public function out(inString:String)
   {
      neko.Lib.print(inString);
   }
   public function new()
   {
      out("Content-type: text/plain\n\n");
      out("Hello World!\n");
      out("Page : " + neko.Sys.getEnv("REQUEST\_URI") + "\n" );
   }

   static public function main() { return new Site(); }
}

which can now be compiled with “haxe -main Site -neko Site.n”, and tested with “neko Site.n” to give:

Content-type: text/plain

Hello World!
Page : null

Alright – I think you can see where I’m going here, but we are not quite there yet. The problem is that the setup variables in the .bashrc file are not used by the apache server. Apparently, you can use “SetEnv” in a .htaccess file to get this to work, but I could not get it to (maybe the module was not enabled). But all is not lost. You can simply use a script to launch neko. Back in the cgi-bin directory, you can replace the “test.cgi” script with a “Site.cgi” script containing:

#!/bin/sh
export HAXENEKO=~/haxeneko
export LD\_LIBRARY\_PATH=$HAXENEKO
export PATH=$PATH:$HAXENEKO
export NEKOPATH=$HAXENEKO
cd ../../site
neko Site.n

Now point your browser at http://yoursite.com/cgi-bin/Site.cgi, and you should see the glorious neko output:

Hello World!
Page : /cgi-bin/Site.cgi

Now creating a bunch of cgi files is painful, and you do not want users to see this kind of implementation details, so we use one more trick \- the almighty “mod_rewrite”.

In your base “public_html” (www) directory, create a file called “.htaccess”, and add the following lines:

RewriteEngine on
RewriteRule \.(css|jpe?g|gif|png)$ - [L]
RewriteRule ^(.*)?$ cgi-bin/Site.cgi [L]

This leaves the css and image files in the www directory, but it redirects all other URLs your neko script, where they show up in your REQUEST\_URI. So now if you use the URL “http://yoursite.com/some_dir/file.html?param=abc&other=xyz”, you get the output:

Hello World!
Page : /some_dir/file.html?param=abc&other=xyz

Now the world (wide web) is your oyster \- you can parse the URL anyway you like, and generate any output you like.

This certainly gets you up and running with neko on a shared-hosting web server. One problem is that 2 processes are created for every request. I have done a some initial work with the “fast-cgi” interface, and think I should be able to get this going, in which case there should be a big boost in efficiency.

There should also be no reason why you could not compile the site to a c++ native executable. However, this may reduce your ability to use the neko “.n” template system.

Quick Update

Just a quick update to let you know things are still moving forwards. I have not done very much iPhone specific stuff recently, but I have been working on the graphics code base in general. The idea is to separate the graphics code from SDL (although still use SDL most of the time) to avoid GPL issues and also allow the library to be used in a wider set of applications rather than just games. For example, inside a window in a larger desktop application, or inside Google’s Native Client plugin. Another goal is to improve rendering speed by minimising the amount of data moved between haxe and the graphics code – this involves moving code logic from haxe into c++. I have also been looking at a few minor tweaks such as setting object widths/alphas like flash does.

Once this is done and moved into Neash/NME, I will fix a few of the remaining issues on the C++ backend – maybe even get to add dynamic properties to the runtime. I would also like to spend a bit of time finishing off the internal garbage collection to make it thread-safe and run automatically – this could solve some of the problems compiling boehm-gc on various systems.

And, of course, I would like to write more tutorials…

Switched to IMMIX for Internal Garbage Collection

I did a little bit of profiling on the iPhone and found a bit too much time was spent doing garbage collection.
The hxcpp runtime has 2 modes – “Boehm GC with explicit statics” and “internal”. The former is from a standard and robust code base, with the latter uses built in code with explicit marking. I added the second mode because Boehm GC was just too slow on the iPhone – not sure why because it is pretty good on the other platforms (maybe I missed a configuration option).

The internal GC has some restrictions that make it mainly suitable for games. These are: the collection must be triggered explicitly, since no stack searching is done, which is most easily done once per frame. And it is not thread safe, which can be worked around. Within these confines, many different schemes can be tried.
My first attempt could probably be termed “Naive Mark and Sweep”, and used free lists. On Windows/Mac this underperfromed Boehm GC, but on the iPhone, worked better.

The current scheme is now “Simplified IMMIX“. It is simplified because it is single threaded, and I have not implemented overflow allocation, defragmentation (although there are hooks in there for moving) or any generational stuff.
I think overflow allocation should be easy enough, and defrag should not be too hard in some form or other. The insertion of write barriers for generational control may also be straight-forward using the “operator =”. I may also change the code generation to separate stack variables (local, function args) from member variables since in the current scheme, stack variables never form roots, and therefore would not need to use write-barriers.

Anyhow, on the “Physaxe” test, which creates lots of small list objects per frame, the Naive GC got about 51fps, Boehm GC got about 65fps and IMMIX got about 69fps – so a bit of a win there. For this test, I triggered all collections exactly once per frame. The difference between Naive and IMMIX is significant, and this perfromance gain also translates to the iPhone, which is good news.

Since the internal scheme is precise, I feel it should be able to outperform Boehm GC by a bit more, and maybe the extra could come from a generational system. The code is actually not that complex (1 cpp file, 1 header file) so any budding GC researchers may want to see what they can do.

Currently, the internal GC is default only for the iPhone, but you can try it on other platforms by changing the #define in hxGCInternal.h. The reason for this is the restrictions mentioned above – the easiet way to conform to these restrictions is to enable the “Collect Every Frame” in neash.Lib. To remove these restrictions, I will need to find some way of stopping the world (safe points?) and some way of capturing the stack (code mods to allow objects to push themselves on a shadow stack?), both of which are very doable, although I’m not sure on the effect on performance.