10 Year Anniversary!

Wow, 10 years. We have seen a lot in the last decade. Rise of mobile, fall of flash and the rise of html5. And Hxcpp and Nme still manage to survive and adapt.

Nme just had a new release, 6.0.58. And it has seen some big changes. This release completes the move away from shipping static libraries, and instead ships Acadnme hosts for the desktop platforms. This allow you to test quickly with the cppia target without needing a c++ compile. This target now includes just-in-time compiling and runs at a decent speed. The windows target now uses the “angle” library by default for improved performance though a DirectX emulated OpenglES layer.

The latest release contains the reemergence of the javascript browser target. The approach here is quite different from the original ‘jeash’ – it uses almost exactly the same code base as the c++ target by compiling the core Nme library to a asm.js library, and then calling these compiled Nme functions from normal javascript. This javascript is then bundled with the assets into a single “.nme” file for uploading – you can also add cppia byte code to allow the same .nme file to be run anywhere – desktop, mobile, browser.
The .nme file is kept very small by splitting most of the nme classes and asm.js into separate files, which can be shared and cached between apps.

I have revived the original 1000 Ogre demo. The original was in flash, but is now in html5. The preloader is a bit dodgy, but I will work on that.

You can also check out the ubiquitous BunnyMark demo running under the new system.

You can try your own apps with the latest nme and the:
nme jsprime
command.

The cppia and jsprime targets are still quite new, but they both are looking super promising. Let me know over at gitter.im how you go!

How I Improved Hxcpp Speed 6x

Let me start by qualifying the title with the obligatory phrase in some situations. TL;DR – it was probably too slow to start with.

The performance story starts, like all good performance stories do, with a benchmark. I was looking for a benchmark to measure the allocation system, but also mix in a little calculation, in as few lines as possible – so I settled on the Mandelbrot program. But with a little “deoptimization” to use classes for the complex arithmetic rather than inlining the components. On a whim, I also decided to store the resulting image as an array of RGB classes, rather that an array of Int, which turned out to also have significant implications.

You can see the code in the haxe repo. You may note there are 2 versions controlled by a define, a class based one and an anonymous object (typedef) based one. The typedef option is for future optimization attempts. Also, the performance figures I will be talking about have the “SIZE” parameter increased to 40, to allow more accurate timing. The results are predictable, if not beautiful.

mandelbrot

This benchmark is all about allocation and garbage collection (GC). The allocation part calls the object “new” routine to find some space on one of the memory blocks, and the collection part must scan the allocation space to reclaim object space that is no longer used. These phases have quite different optimization characteristics – the allocation part gets called literally millions of times, and therefore micro-optimizations at the code level may be appropriate, while the collection code get run less frequently, but must do quite a lot of work, so algorithmic optimizations might be more appropriate.

The initial results with hxcpp 3.2.102 were really quite bad. Hxcpp 3.2.102 takes approximately 35 seconds on a windows 32 build. Compare this to the node/js solution of 6.1 second. It should be noted that this test is measuring mostly allocations, and code base is small, so node can quite efficiently inline the code and use its highly-tuned single-thread allocation code. But still, it should be possible to improve things.

Before even looking at the low-level code, one thing that has a big effect on speed is the amount of free space available. The Gc allocates into its free memory, and when it’s full, it reclaims/collects the unused portions. The speed of the reclaiming depends mostly on the the amount of “live” memory after the collection. By increasing the amount of free memory, the time between collections increases, while keeping the collection time roughly the same, thereby improving the overall speed. Hxcpp uses a heuristic to determine the free space it should use depending on the “live” size at last collection. There was a fixed ratio in 3.2.103, but since then I have exposed the settings and increased the minimum working memory to 20M on desktop and 8M on mobile. This alone gives a nice improvement.

After creating a benchmark, the next step in optimization is to run a profiler. And the profiler did indeed show that most of the time was spent in the allocation code. It did not appear to be in one particular part of the code, but instead spread over the whole routine. This implies that I really need to get the line count down, and to that end I decided to simplify some the auxiliary data structures at the expense of memory. The hxcpp Gc uses conservative marking on the stack. That is, it scans the memory and looks for things that look like pointers to Gc objects. It then needs to know with 100% certainty that the thing pointed to is (or was recently) an actual object, otherwise the marking process could trash memory. The means that some extra information needs to be used to track this. Initially, I used what I thought was a clever trick to reuse the byte I was already using to indicate whether a line was marked as a head for a mini linked-list of live-objects. This worked and did not require extra memory, but the maintenance of the list complicated the allocation routine. I switched to using a extra bitmap, which uses 1 bit per 4 bytes of Gc memory to indicate whether an object starts at the particular location. This is 3% extra memory, but it greatly simplifies the allocation logic, allowing for a neater implementation.

The next point of simplification was pre-calculating the “gaps” in the Gc memory blocks. Initially, this was done in the allocation code, but by doing it in advance, it simplifies allocation code that is getting run millions of times. Theoretically, there are a similar number of operations going on here, but practically the separation makes the code much neater and therefore faster. Later, I moved this calculation to a separate thread so it can be done in parallel with the normal code.

I also rearranged the object header to allow the marking code to test whether an object needs marking with a single bit test, and therefore avoid an expensive marking function call in some circumstances. The header also includes a neater row count which makes the actual row marking faster.

These changes sped the allocations up considerably, but the profiler still showed most of the time running the allocation code. The main issue then became the function call depth. Modern c++ compilers can do a pretty good job at inlining code, but they are not magic, so some extra help is required. First a quick review of what happens when you call “new Complex” in haxe. Hxcpp generates a standard c++ “new” call, which calls into hx::Object “operator new” which is where the Gc memory allocation routine is called. C++ then initialises the object by inserting the vtable pointer and hxcpp then calls “__construct” on the object – which is the “new” function defined in haxe. It is split like this to allow the haxe code to call “super” in ways that c++ does not allow with its native constructors. The “operator new” code takes a parameter defining whether the resulting object contains pointers to other object – this allows the marking code to make some optimizations. In the original code, this then called an “InternalNew” function which checked the object size and delegated to either a thread-local allocator, or a large allocator. This allows independent threads to allocate at full speed without needing locks. The problem with this arrangement is that several function calls are required per allocation. The solution here was it inline quite a lot of code into the “operator new” function, and also to observe that haxe objects always allocate from the local allocator, so some some checks can be avoided. Inlining means that some of the guts of the allocator need to be exposed to the haxe code. However, this is kept under control by keeping the complex case (when there is not enough room) in the separate file, and only inlining the “easy” case, which is also helped by the simplifications made earlier. You can see the code here.

There are still a few things that could be done. There are some checks that could be avoided in single threaded app. Another idea I had was to move some of the housekeeping (setting of the header and start bits) to a separate thread to be done later. There are still 2 function calls going on here – the “operator new” (which gets inlined) and the “__construct” call (which does not). It should be possible to inline the construct call if it is simple enough.

These were the changes to the allocation code. As mentioned earlier, there were some improvements to the marking code. Additional things were also done in the collector. The reclaimed memory is cleared in a background thread. This is actually is quite a nice improvement because up to 7% of the time was spent doing the clear. One thing this particular benchmark highlighted was the marking of a single large array (the image array) actually got slower when using multi-threaded marking, due to a slight overhead. I’m not sure how applicable this is to general code, however there was a solution. The multi-threaded array marking code now splits large array marking up if there are idle marking threads.

So with all these improvements, did it beat the node/js speed? Just about – it was a close thing. One more change was really required – compile a 64 bit target instead of a 32 bit target. Then finally the time came in at 5.6 seconds, less than 1/6 the original time. Node is doing a pretty awesome job here and I’m sure it is only going to get better so it will take some work to keep up. If I change the benchmark to use member functions instead of static-inline functions, then hxcpp becomes significantly faster than node, which is nice to know. It should also be noted that I’m cheating a bit here by using multiple threads to get the job done. But I’m not above a bit of cheating.

This brings me to the Java target – 2 seconds. Wow. I’m not actually sure if it is doing any allocations here at all – maybe it has completely inlined the code. Or maybe it just really really good. That sets a seriously high benchmark

Is there still room for improvement? I think there is – particularly, a generational Gc would reduce the marking time to practically nothing in this case, but it requires a bit of effort on the haxe compiler side. Inlining the constructor for simple classes (like the “Complex” class in this code) should also be a nice win.

It has been an interesting experience getting the code faster. I think that it highlights one key thing – if you want the hxcpp code to run faster, then create a simple benchmark.

Cross compile from Mac to Linux with hxcpp

It is possible to use the hxcpp build tool to compile linux 32/64 binaries from a Mac Box. “Why would you want to do this?” I hear you ask. Well, I use this so I can run all my builds from a mac-mini, with windows running in VirtualBox. I know there are other ways, but having a single little box doing all the work is pretty easy to manage.

To do this, you just need to setup the HXCPP_XLINUX variables. There are a number of ways you can do this, but the ~/.hxcpp_config.xml file is probably the cleanest place.

Having installed the cross-compilers from http://crossgcc.rts-software.org/doku.php?id=compiling_for_linux, is it just a matter of pointing to the executables using some xml entries:

<set name="HXCPP_XLINUX64_CXX" value="/usr/local/gcc-4.8.1-for-linux64/bin/x86_64-pc-linux-g++" />
<set name="HXCPP_XLINUX64_STRIP" value="/usr/local/gcc-4.8.1-for-linux64/bin/x86_64-pc-linux-strip" />
<set name="HXCPP_XLINUX64_RANLIB" value="/usr/local/gcc-4.8.1-for-linux64/bin/x86_64-pc-linux-ranlib" />
<set name="HXCPP_XLINUX64_AR" value="/usr/local/gcc-4.8.1-for-linux64/bin/x86_64-pc-linux-ar" />

<set name="HXCPP_XLINUX32_CXX" value="/usr/local/gcc-4.8.1-for-linux32/bin/i586-pc-linux-g++" />
<set name="HXCPP_XLINUX32_STRIP" value="/usr/local/gcc-4.8.1-for-linux32/bin/i586-pc-linux-strip" />
<set name="HXCPP_XLINUX32_RANLIB" value="/usr/local/gcc-4.8.1-for-linux32/bin/i586-pc-linux-ranlib" />
<set name="HXCPP_XLINUX32_AR" value="/usr/local/gcc-4.8.1-for-linux32/bin/i586-pc-linux-ar" />

Then you can build from a “build.n” (such as in $HXCPP/project) file using:

neko build.n linux

wwx2015

Hxcpp – state of the union enum.

Here are the slides from my wwx2015 talk.

You can compile them yourself with:

haxelib install nme
haxelib install gm2d
haxelib install acadnme
haxelib run nme demo gm2d:11 cppia


On linux, you may want to compile for neko or cpp, instead of cppia.

Wwx2014 Talk – Hxcpp magic

Just got back from the haxe conference, wwx2014, after having a great time. Plenty of fantastic people to talk to.

The talk consists of some slides, with a demo at the end. To run the demo, you need to run the binary version on either mac or windows, but the slides can be viewed with flash.

Or, you can build it yourself for you preferred platform and explore the source code with:

haxelib install nme
haxelib install gm2d
haxelib run nme demo gm2d 10-wwx2014 cpp

Edit – must mention that you might need an dev version of haxe to build the demo.

HXCPP Built-in Debugging

The latest version of hxcpp has increased control over how much debugging information is included in the compiled code, as well as some built-in debugging and profiling tools.

Debugging is normally added with the -debug flag to the compiler. This flag turns off compiler optimisation, adds source-line and call-stack tracking and adds additional runtime checks – eg for null pointer access. This is useful for locating problems, but it can make the code run several times slower.

Depending on the application, you may not need all these debugging features. For example, if you do not intend to attach a native debugger, then you probably do not need to turn off compiler optimisations. If you are trying to profile the code, you probably only want the call-stack symbols, but not the other runtime checks. These scenarios are now supported by HXCPP, via compiler defines.

Quick recap: To add a define to the haxe compiler, you can use the “-D DEFINE” syntax on the command line or hxml file, or <haxedef name="DEFINE" /> from an nmml file.

Now, for testing performance, I will be using BunnyMark, and add bunnies until there the a measurable drop in performance. In this case, I added 20000 bunnies to get a frame-rate of about 50fps on my macbook air.

Adding the -debug flag brings the frame-rate down to about 40fps. This is not a huge drop – I guess because the CPU is not that taxed, it the the GPU that is doing more work.

So instead of adding the -debug flag, we can add various defines to achieve the effect we are after.

HXCPP_DEBUG_LINK

Adding this define will keep the symbol tables in the final executable. This is probably not something you want to do in your final release build for external use, but it is something that that is good to have otherwise. This flag will allow you to get meaningful information if you attach your system debugger. Runtime performance hit: none.

DHXCPP_STACK_TRACE

This define will allow you to get a haxe stack trace that includes function names when an exception is thrown, or when the stack is queried directly. There is a small overhead incurred per function call when this is on. Runtime performance hit: very small.

HXCPP_STACK_LINE

This define implies DHXCPP_STACK_TRACE and adds line numbers to the function names. There is additional overhead per line of haxe code, however I had trouble measuring this overhead on the bunnymark – probably because it is not too CPU intensive. Runtime performance hit: small to medium.

HXCPP_CHECK_POINTER

This define explicitly checks pointers for null access and throws an informative exception, rather than just crashing. There is an overhead per member-access. Again, for the bunnymark, it was hard to measure a slowdown. Runtime performance hit: small.

So, these defines can be added without using the -debug flag, which removes the compiler optimisations, which accounts for over 80% of the performance drop.

Built-in Profiler

So where is all the time spent? To answer this question, you can use the built-in profiler. The profiler needs HXCPP_STACK_TRACE, and is started by calling cpp.vm.Profiler.start(filename) from the haxe thread you are interested in. You can call this, say, in response to a button-press, but in this example, I will call it from the mainline. You can provide a log filename, or you can just let it write to stdout. If you use a relative filename in an nme project, the file will be written into the executable directory.

  private static function create()
  {
     cpp.vm.Profiler.start("log.txt");
     Lib.current.addChild (new BunnyMark());
  }

We can then analyse the log. The entries are sorted by total time (including child calls) spent in the routine. The first few are all about 100%, as they look for somewhere to delegate the actual work to. Soon enough, we get to this entry:

Stage::nmeRender 95.60%/0.07%
   DisplayObjectContainer::nmeBroadcast 39.6%
   extern::cffi 60.3%
   (internal) 0.1%


Here the Stage::nmeRender call is almost always active, and is broadcasting an event (the ENTER_FRAME event) for 40% of its time, and calling into cffi (nme c++ render code) for 60% of the time. So we have learnt something already.

You can trace ENTER_FRAME the calls though the event dispatcher until we get to the routine actually in bunnymark:

TileTest::enterFrame 36.55%/8.26%
   Lib::getTimer 0.1%
   Tilesheet::drawTiles 70.2%
   Graphics::clear 6.6%
   DisplayObject::nmeSetY 0.0%
   DisplayObject::nmeSetX 0.0%
   DisplayObject::nmeGetWidth 0.2%
   DisplayObject::nmeGetHeight 0.1%
   DisplayObject::nmeGetGraphics 0.2%
   GC::realloc 0.1%
   (internal) 22.6%

Here the “36.55%/8.26%” means that 35% of the total exe time is spent in this routine, including its children, but only 8% of the total time is spent internally (ie, not in child calls).

And, looking at the drawTiles call:

Tilesheet::drawTiles 25.65%/0.01%
   Graphics::drawTiles 99.9%
   (internal) 0.1%

We see 25% of the total time is spend in the NME Graphics::drawTiles call. So this app spends 60% of the time in drawing routine, and 25% of the time in preparing the draw call. From this, you can assume that it is pretty well optimised!

It is also interesting to note the GC entry:

GC::new 0.17%/0.15%
   GC::collect 12.0%
   (internal) 88.0%


Which is very small, indicating that techniques such as object pooling would have no effect here.

This technique is available on all platforms – you just might have to be a little bit careful about where you write your output file.

Built-in Debugger

The built-in debugger requires you to compile with the HXCPP_DEBUGGER define. Starting the debugger is similar to starting the profiler.

  private static function create()
  {
     new hxcpp.DebugStdio(true);
     Lib.current.addChild (new BunnyMark());
  }

You will note that the “DebugStdio” class is in the hxcpp project, so you will need to add the “-lib hxcpp” command, or via nmml: < haxelib name="hxcpp" />.

The “true” parameter means that the debugger stops as soon as you hit this line. This allows you to set breakpoints etc. Running the above, I get an empty display window (no draw calls have been made yet), and a prompt on the command line:

debug>stopped.
debug>h
help  - print this message
break [file line] - pause execution of one thread [when at certain point]
breakpoints - list breakpoints
delete N - delete breakpoint N
cont  - continue execution
where - print call stack
files - print file list that may be used with breakpoints
vars - print local vars for frame
array limit N - show at most N array elements
mem - print memory usage
collect - run Gc collection
compact - reduce memory usage
exit  - exit programme
bye  - stop debugging, keep running
ok
debug>

Entering “files” shows the input files with indexes 0 to 93. You can use either the filename or file number for setting breakpoints. For example:

debug>break BunnyMark.hx 41

Will set a breakpoint for line 41 on BunnyMark.hx. Currently, you are going to need to have the file handy to look up the line number. So now we want to go back to executing:

debug> c


And immediately, the debugger prints “stopped”. This is because the breakpoint has been hit. To find out where, you can use the “w” command:

debug>stopped.
w
Must break first.
debug>where
*1:FilePos(Method(BunnyMark,addedToStage),BunnyMark.hx,41)
 2:FilePos(Method(Listener,dispatchEvent),neash/events/EventDispatcher.hx,181)
 3:FilePos(Method(EventDispatcher,dispatchEvent),neash/events/EventDispatcher.hx,80)
 4:FilePos(Method(DisplayObject,nmeDispatchEvent),neash/display/DisplayObject.hx,315)
 5:FilePos(Method(DisplayObject,dispatchEvent),neash/display/DisplayObject.hx,198)
 6:FilePos(Method(DisplayObject,nmeOnAdded),neash/display/DisplayObject.hx,458)
 7:FilePos(Method(DisplayObjectContainer,nmeOnAdded),neash/display/DisplayObjectContainer.hx,183)
 8:FilePos(Method(DisplayObject,nmeSetParent),neash/display/DisplayObject.hx,659)
 9:FilePos(Method(DisplayObjectContainer,addChild),neash/display/DisplayObjectContainer.hx,31)
 10:FilePos(Method(BunnyMark,create),BunnyMark.hx,81)
 11:FilePos(Method(BunnyMark,main),BunnyMark.hx,69)
 12:FilePos(Method(Reflect,callMethod),Reflect.hx,55)
 13:FilePos(Method(*,_Function_1_1),ApplicationMain.hx,59)
 14:FilePos(Method(*,_Function_1_1),neash/Lib.hx,75)
 15:FilePos(Method(extern,cffi),/Users/hugh/dev/code.google/hxcpp/src/hx/Lib.cpp,130)
 16:FilePos(Method(Lib,create),neash/Lib.hx,64)
 17:FilePos(Method(Lib,create),nme/Lib.hx,60)
 18:FilePos(Method(ApplicationMain,main),ApplicationMain.hx,39)
ok

The “you must break first” is a little bug in the debugger due to the fact that the break is async. But trying again seems to have fixed this. So here you can see the full call stack. Using the “vars” command shows the local variables, and using the “p” command (short for “print”) shows the values.

debug>vars
[e,this]
ok
debug>p e
nmeIsCancelledNow=false
nmeIsCancelled=false
_type=addedToStage
_target=BunnyMark
_eventPhase=1
_currentTarget=BunnyMark
_cancelable=false
_bubbles=false
type=addedToStage
target=BunnyMark
eventPhase=1
currentTarget=BunnyMark
cancelable=false
bubbles=false
ok
debug>p this
fps=FPS
bg=Background
useHandCursor=false
buttonMode=false
nmeChildren=2 elements
tabChildren=false
numChildren=2
mouseChildren=true
nmeMouseEnabled=true
needsSoftKeyboard=false
moveForSoftKeyboard=false
mouseEnabled=true
doubleClickEnabled=false
nmeScrollRect=(null)
nmeScale9Grid=(null)
nmeParent=neash.display.MovieClip
nmeID=3
nmeGraphicsCache=(null)
nmeFilters=(null)
y=0
x=0
width=580
visible=true
transform=neash.geom.Transform
stage=neash.display.Stage
scrollRect=(null)
scaleY=1
scaleX=1
scale9Grid=(null)
rotation=0
parent=neash.display.MovieClip
opaqueBackground=(null)
nmeHandle=null
name=BunnyMark 3
mouseY=0
mouseX=0
mask=(null)
height=740
graphics=neash.display.Graphics
filters=(empty)
pixelSnapping=NEVER
pedanticBitmapCaching=false
cacheAsBitmap=false
blendMode=NORMAL
alpha=1
nmeTarget=BunnyMark
nmeEventMap=Hash
ok

So now we have a little fun (in a very nerdy sort of way):

debug>set fps.y = 200
ok
debug>cont
running

And you can see that the fps counter has moved down the screen. The more astute reader will notice that the “y” member is actually a “property” and that setting this property has actually caused some haxe code to be run. You can also print the value of a function call to get arbitrary functions to run.
While the code is running you can use “break” to stop it wherever it happens to be.
Using the “exit” command is a quick way out – this can be useful on android when you want to make sure the process is actually dead.

You can jump into a different function on the stack via the "frame" command (notice the "*" has moved), and examine the vars there:

 => frame 17
ok
 => where
 1:FilePos(Method(BunnyMark,addedToStage),BunnyMark.hx,41)
 2:FilePos(Method(Listener,dispatchEvent),neash/events/EventDispatcher.hx,181)
 3:FilePos(Method(EventDispatcher,dispatchEvent),neash/events/EventDispatcher.hx,80)
 4:FilePos(Method(DisplayObject,nmeDispatchEvent),neash/display/DisplayObject.hx,315)
 5:FilePos(Method(DisplayObject,dispatchEvent),neash/display/DisplayObject.hx,198)
 6:FilePos(Method(DisplayObject,nmeOnAdded),neash/display/DisplayObject.hx,458)
 7:FilePos(Method(DisplayObjectContainer,nmeOnAdded),neash/display/DisplayObjectContainer.hx,183)
 8:FilePos(Method(DisplayObject,nmeSetParent),neash/display/DisplayObject.hx,659)
 9:FilePos(Method(DisplayObjectContainer,addChild),neash/display/DisplayObjectContainer.hx,31)
 10:FilePos(Method(BunnyMark,create),BunnyMark.hx,81)
 11:FilePos(Method(BunnyMark,main),BunnyMark.hx,70)
 12:FilePos(Method(Reflect,callMethod),Reflect.hx,55)
 13:FilePos(Method(*,_Function_1_1),ApplicationMain.hx,59)
 14:FilePos(Method(*,_Function_1_1),neash/Lib.hx,75)
 15:FilePos(Method(extern,cffi),/Users/hugh/dev/code.google/hxcpp/src/hx/Lib.cpp,130)
 16:FilePos(Method(Lib,create),neash/Lib.hx,64)
*17:FilePos(Method(Lib,create),nme/Lib.hx,60)
 18:FilePos(Method(ApplicationMain,main),ApplicationMain.hx,39)
ok
 => vars
[icon,title,flags,color,frameRate,height,width,onLoaded]
ok
 => p title
BunnyMark

The command line is all well and good for desktop apps, but it is not much use for mobile apps. To use the debugger on mobile, you can create a debug socket server - the mobile will then connect over WiFi. On your desktop, create and run the hxcpp DdebugTool:

haxe -main hxcpp.DebugTool -lib hxcpp -neko server.n
neko server.n
Waiting for connection on 192.168.0.12:8080

You can see the ip:port that the server is waiting on. Then, back in your code, replace the "DebugStdio" line with a "DebugSocket" line, using the ip:port from the server:

private static function create()
{
   new hxcpp.DebugSocket("192.168.0.12",8080,true);
   Lib.current.addChild (new BunnyMark());
}

And then the operation is exactly the same as before.

The debug code is designed to allow different protocols and backends. It should be possible to replace the code in the hxcpp library with code of your own to present the debug information in any way you like. The "worker" classes are in cpp.vm.Debugger, and you can build your own debugger on top of these.
As a final trick, you can call the debugger functions directly (eg, to always add a breakpoint), or cpp.vm.Debugger.breakBad() (I've been watching too much TV) to allow complex conditions to generate breakpoints, eg:

   if (items.length>0 && !found)
      Debugger.breakBad(); // WTF ?

WWX Conference

I recently got back from ‘WWX’, the World Wide Haxe conference. I had a fantastic time – got to meet a whole bunch of people from the community.

I gave a presentation about the architecture and some details of the hxcpp backend. The slide are in wwx2012-hxcpp.pdf.

NME From Scratch

Part 1 – Get The Basics Going

I’ve just got a shiny new computer at home – nothing installed. So it seems like a good chance to go through exactly what it takes to get and NME sample up and running on a new Windows 7 box.

8:20pm For starters, I’ll need a c++ compiler, so first thing is to start the MSVC 2010 Express downloading: 2010-Visual-CPP.

8:25pm OK – I’ve signed my rights away and that is downloading. The next thing I’ll need is haxe. It is easy to install from here.

8:28pm Haxe 2.07, neko 1.81 downloaded and installed. Windows complained that it might not have installed correctly, but this is just because the exe had “installer” in the name, and did not write an “uninstall” entry.

Test: Start a “cmd” prompt by clicking on the Windows start circle and type “cmd[Enter]” into the search box. And in this box, type “haxe [Enter]”. I am now rewarded with the haxe help message.

8:35pm Visual C++ Express in successfully installed.
Test: Start up a new cmd shell, and type “cl“. This does not work because the exe can’t be found in my path. But here is the trick. Type “c:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Tools\vsvars32.bat” at the prompt (note:include the quotes!), and get the message “Setting environment for using Microsoft Visual Studio 2010 x86 tools.”. Now “cl” is rewarded with the Microsoft banner.

8:41pm I’m on a bit of a roll here, so I’ll see if I can get an haxe project going. As I said, I have nothing installed, so I’ll go old-school. First thing is to make a directory. The cmd prompt starts in my home directory (c:\Users\Hugh), and I will make a directory:
mkdir projects
cd projects
mkdir hello
cd hello

And now, do the best I can:
notepad.exe Hello.hx (yes I do want to create the file)

class Hello
{
  public static function main()
  {
     trace("Hello!");
  }
}


And switch back to the cmd prompt:
haxe -main Hello -neko hello.n
neko hello.n

Hello.hx:5: Hello!

Woo Hoo! 8:48pm and I’ve run my own program.

Now lets get even more adventurous, and try a c++ example. Trying:
haxe -main Hello -cpp bin
Tells me that “Project hxcpp is not installed” – so let’s install it:
haxelib install hxcpp
And try again:
haxe -main Hello -cpp bin
And test:
bin\Hello.exe
8:43pm, I have my first hxcpp prgram working!

Now, try for some graphics:
haxelib intall nme
and start a new project:
cd ..
mkdir graphics1
cd graphics1
copy “c:\Motion-Twin\haxe\lib\nme\2,0,1\samples\02-Text\Sample.hx”
haxe -main Sample -cpp bin -lib nme
bin\Sample.exe

9:05pm And there it is. Haxe, neko, hxcpp, nme VC2010 installed and run in 40 minutes, including this write up.

Part 2 – Compile NME From Source

Well, that went much better than I expected, so I will now attempt some bleeding-edge stuff. The version on NME used above is old, and I have no one to blame but myself. My intentions are to do a release soon, but I just have not got my finger out. Which leave me with the option of compiling NME from source if I want the latest features.

First thing, is to create a place where I can download various bits of source code for compiling. I’m going to put it a “e:\code.google”, because my C drive is a fast SSD, and has limited room.

e:
mkdir code.google
cd code.google

Following the instructions from the source page, but changing the name, I can get a copy with:
svn checkout http://nekonme.googlecode.com/svn/trunk/ nme
– if only I had svn installed. So first install this, I’ll be using this version. Once installed, I have to restart the cmd prompt and do the vsvars32.bat thing again. Now when I try again, I get the required files. There is also a companion project to go with NME, and that is the “sdl-static” project, which contains libraries required by NME. To get this, simply do:
svn checkout http://sdl-static.googlecode.com/svn/trunk/ sdl-static
This takes a while….

Time to build –
cd nme\project
haxelib run hxcpp Build.xml

The “haxelib” tool looks for a file called “run.n” in an installed haxe library and runs it. In the hxcpp project, the run.n file gathers compiler options to build the haxe output. This same program can be used to build other projects – including the NME project. Unfortunately, compiling NME like this gives the error ” cannot open input file ‘ddraw.lib'”. This is because the VC express install does not have all the required system support files. This file can be found in the “DirectX SDK”, and I’ll be using the June 2010 version. This is a huge file, so it will take a while. If you think it is a lot of effort for a tiny lib, then you are right.

10:10pm and the download has finished. I have chosen to install it in “e:\SDKs\Microsoft DirectX SDK (June 2010)”, because I’m trying not to put crap on my C drive, and I will be installing quite a few SDKs, and it is nice to have them all together.

This does not immediately fix the problem, because the NME project does not know where I installed it. This is where the per-machine hxcpp config comes in.

Following the instructions in BuildCommon.xml, I create a file in “C:\Users\Hugh” called “.hxcpp_config.xml”, and put the following in it:

<xml>
  <section id="exes">
     <linker id="dll" if="windows">
        <flag value = "-libpath:e:\SDKs\Microsoft DirectX SDK (June 2010)\Lib\x86"/>
     </linker>
  </section>
</xml>

Oh crikey! Looks like Microsoft in their wisdom have dropped support for this ddraw.lib, and I’m currently using a version of SDL that needs it! It’s OK, problem solved – I’ve added it to the NME project, but you still need the SDK for dxguid.lib, which I guess I should also add.

Anyhow, after a long delay, at 10:30pm I have NME building!

Now, going back to the original graphics1 example, the first thing to do is tell haxe to use our SVN haxe code instead of the 2.0.1 dowloaded from haxelib. This is done via:
haxelib dev nme e:/code.google/nme

Then build & test:
haxe -main Sample -cpp bin -lib nme
bin\Sample.exe

Which works as before. But now we can test some of the new features in NME. First get the new sample, and the new associated project file:
copy e:\code.google\nme\samples\02-Text\Sample.hx .
copy e:\code.google\nme\samples\02-Text\Sample.nmml .

Then you can use the NME build tool, with the command “test” (which is “build” and “run”) on the Sample.nmml project file, and for the target “neko”.
haxelib run nme test Sample.nmml neko
And you can see the result. Then you can test for cpp:
haxelib run nme test Sample.nmml cpp

So it’s now 10:45pm (had to catch the end of “Dexter”) and I’ve successfully compiled the latest version of NME and tested the new project feature.

Part 3 – Android

Things seem to still be going well, so I’m going to take one more step – android (spoiler – this is going to take longer than expected). First thing to so is install the Java Development Kit. (NOTE: Install the “windows” version, not the “x64” version) Then, the android SDK.
I installed java JRE and JDK in my SDK directory, but Google’s (always painful) build tools seem to think I have not installed java, even though it works from the cmd prompt. Thank guys. So I’ve uninstalled it, and reinstalled the JRE in the default location, and now it seems happy. The Android SDK download is just the start – it now runs and downloads a whole bunch more. This looks like it may take some time…

I may as well get on with downloading the NDK too. And while I wait for those I’ll get my phone ADB USB drivers installed. My HTC phone actually installs the drivers when I install HTCSync, found on my sdcard that was shipped with it.
EDIT: The android ndk r5b still has issues with exceptions/c++. However, these can be solved by dropping this version of libstdc++.a from the Crystax r4 distribution over the top of sources/cxx-stl/gnu-libstdc++/libs/armeabi/libstdc++.a in your downloaded ndk. If google ever manage to write a good build system, they might end up being a successful company.

The Google build tools also require the “Cygwin” utilities, so install these too.

Finally, we will need a new version of hxcpp, which we can get with:
e:
svn checkout http://hxcpp.googlecode.com/svn/trunk/ hxcpp
haxelib dev hxcpp e:\code.google\hxcpp

11:45pm, I have finally downloaded and installed the Android prerequisites (I think) but will give up now.

Next day – Here we go again. Now to use the google android NDK, you need to have the cygwin dlls in your exe path. To change the path, right click in the “Computer” shortcut in the start menu, and choose “properties”, then on the left “Advance system settings”, then the “Environment Variables” button and scroll through the top bit for “PATH” and click “edit”. This already has haxe and neko in it, so we add the cygwin:
%HAXEPATH%;%NEKO_INSTPATH%;e:\SDKs\cygwin\bin
Now restart the cmd prompt, and typing “ls” should work.
And one more thing – in lieu of using “eclipse” for java building (which I just can’t stand – don’t get me started), the google tools need the “ant” program, which you install by unzipping somewhere.

Tell the build system where we installed these things.

set ANT_HOME=e:/SDKs/ant
set ANDROID_SDK=e:\SDKs\android-sdk
set ANDROID_NDK_ROOT=e:\SDKs\android-ndk
set JAVA_HOME=e:\SDKs\Java\jdk1.6.0_24

And rebuild nme, like before, except that the “obj” directory should be removed first, because I have not yet allowed 2 compilers to be running at the same time.
haxelib run hxcpp Build.xml -Dandroid

Now, back in the original directory, we can build + run for android:
haxelib run nme test Sample.nmml android

Which, finally, works! You can terminate the debug log with control-c.

So, an awful lot of set up, but subsequent projects should only be a single line.

Android + HXCPP – a Quickstart Guide

After having some success with making an Xcode template, I thought it would be relatively easy make something similar for eclipse and Android. However, there was nothing but pain for me when I tried, so instead I’ve decided to write this guide.

Prerequisites

There are quite a few prerequisites you need to organize before you can get things going. Android allows building from Windows, Mac and Linux. The procedures are quite similar, except that Windows requires some messing about with Cygwin binaries. The method described here avoids most of the Cygwin pain \- and the google make sytem pain \- by avoiding the google make sysem altogether.

  • Download and install the Android SDK. This is the Java tools and libraries required for building and debugging byte-code applications.
  • Download and install eclipse IDE. This is the IDE that runs the SDK – follow the instructions on the Android SDK page and install the Android plugins too.
  • Install the USB drivers for your device (if required). For my device (HTC Legend) I found the drivers on the phone itself by using it as a thumb drive.

You should now be in a position to build some sample (byte-code) applications for your device.

  • Download the Android Native Development Kit. This allows you to build binary code for your device. Now for HXCPP, it is very important to download the latest build provided at Crystax.net, which is a build done by a generous community member that corrects some of the glaring omissions of the official build, namely RTTI and exceptions. If it is all the same to you, extract it to c:/tools/android-ndk for Windows, and ~/tools/android-ndk for other systems, and this will make the remaining instructions easier.
  • Currenly on Windows, you need the svn version of HXCPP (slightly newer that 2.06.1) which has some include path fixes. See the instructions at http://code.google.com/p/hxcpp/ for getting the latest version.
  • Also on windows, you need the Cygwin dlls in your path. One way to do this is to install the whole Cygwin toolchain and put it in your path. The other way so to drop the two dlls from cygwin-extra.tgz into the ndk binary directory, ie c:/tools/android-ndk/build/prebuilt/windows/arm-eabi-4.4.0/bin.

Project structure

An android project consists several components that all work together.

  • Java Code. The Java code provided in the sample project comes from a couple of places. Because the project is graphics based, the copy NME Java code is included. If the version of NME increases, it may be desirable to update the NME code, either by copying the new code in, or instead linking to the NME code directly. Also, the HXCPP bootstrap Java code is included along with a small Activity wrapper file.
  • Native code. The shared object files provide native code for running on the device. These include the standard libraries, the NME library and the haxe code compiled with hxcpp.
  • AndroidManifest.xml. This controls how your application is deployed, and quite a few things can be done with this file. It is best to consult the Android documentation about what can be done here.
  • Resources & Assets. These can be useful if you want to add standard menus or other GUI elements to your application.

The basic workflow starts by making a change to your haxe source files. You then compile the haxe code to Android cpp, which is in turn is compiled to an android shared object. This .so file is then copied to the libs/armeabi directory in the project. Because eclipse does not recognize a change to the shared objects as a important update, it is then necessary to touch one of the Java files so that eclipse rebuilds the project. These steps are handled by the build_haxe batch/shell scripts provided with the project, so all you should have to do is change the code and run the script. Then, press the “play” button in eclipse(the first time you do this, you may need to specify Run-As Android Project) and your application should launch.

The haxe code included in the sample directory uses a fixed class name, AndroidMain, as the bootstrap point for building the haxe shared object, libAndroidMain.so. By fixing these names, the build script is simplified. I encourage you to put your main code for the application outside the provided project directory, and edit the AndroidMain.hx and build.hxml files to point to this external application code. This will help with cross-platform development, and keep the boiler-plate code separate from your precious source code.

Creating a New Project

I could not find a very nice way to make a project template, so this is what I’ve come up with. First, download and extract the example project, android-2.06.1.tgz. You may like to rename the parent directory from android-2.06 to something more meaningful at this stage.

At this point, you should be able to build the sample haxe code using the build-scripts provided. This requires your prerequisite installations to be good, so it is worth testing. If you have downloaded the android-ndk to a different location, you can edit the appropriate build script to reflect this. You will need the latest NME code from haxelib. Windows users may also need the svn version of HXCPP.

So that all worked? Congratulations, your system is set up for development.

Next, fire up eclipse, and create a “File – New Project..”, then select “Android Project”, then select “Create project from existing source”, and browse to your newly created directory. You will notice that down the bottom of the Dialog, the properties are filled out with names from the sample project – we will change these next. Once you select “Finish”, your project should be created, and ready to run on your device.

The project and package names are tied into Java and Android naming conventions, as well as the Android manifest, and can be difficult to budge. It is easiest to use the eclipse Refactor-Rename menu option to change the name from “MyActivity” to something more appropriate for you, say “CircleDisplay”. Then in the source tree under “src”, there is a file in com.company called “MyActivity.Java”. Select this, and use the menu option to change its name to “CircleDisplay” too. Similarly, select the “com.company” and change this to something else, in my case “com.gamehaxe” (select preview and agree to everything). There is one final change required – the refactor option misses a reference in the AndroidManifest.xml because it starts with a period. Double click this and in the “AndroidManifest.xml” tab, change the “.MyActivity” to “.CircleDisplay”.

It is important to rename these items because it effects how your application is ultimately stored in the device.

So now you should be good to go – press the play button and select “Android Project”.

There are quite a few things that can go wrong with so many things to install, so I’ve got my fingers crossed for you.