I started looking at Vessel in January 2013 - initially just in my evenings. In my ‘spare’ time. I was looking at it because John and Martin (the founders of Strange Loop Games and the creators of Vessel) had asked me to consider porting it to PlayStation 3. We knew each other from our time together at Pandemic Studios in Brisbane where I had worked closely with Martin in the Engine team while John worked as an AI programmer. I was keen to do the port but I wanted to be as sure as possible that I knew how much work was involved in it before committing.
I found the idea of porting such a game daunting - it had a custom rendering engine, a bespoke fluid physics system with major parts of the game code written in Lua - but I figured it was time to step up and take on such a challenge. I have an immense amount of respect for John and Martin - they are both incredibly smart coders and I relished the chance of working with them again. I wanted to do it but I didn't want to let them down. They were going to be my clients, but they were also friends.
I started by looking at their code - in order to give a quote on the porting process I needed to have the game running so that I could profile it. Vessel had already been partially ported to the PS3 by a third party, but wasn't compiling when I first got my hands on it. So I spent a few weeks of long evenings fixing the FileIO, the template compilation errors (GCC and MSVC disagree on some of the finer points of templates), and just commenting out anything I didn't think was relevant in order to get the game up and running. Once it was working, I ran my trusty profiler over the game and tried to guess how much time it would take to get this baby to a shippable state. What I found was a little scary.
Vessel runs predominantly on three threads. The game thread which runs all the Lua scripts which control the majority of the game logic and in game objects, the render thread which takes the output of the game thread and converts it to a format that can be fed to the SPUs which then build the graphics command buffer and feed it to the GPU in parallel, plus the physics thread which manages all the physics processing - fluid dynamics, rigid body collisions, everything. John had built the physics system himself and it was pretty impressive (and complicated). It was too much to grok initially, so all I could do was look at the game’s performance on some of the worst levels, profile it there and see how much I needed to speed it up.
I investigated a few of the levels on the PS3 build and picked two that I thought were the worst. What they told me was that the physics thread was the primary cause of slowdown and so I concentrated my analysis on that thread. The game was running at about 60ms per frame in the worst levels I was testing - about 15 to 20 frames per second. From discussions with Strange Loop, I knew that the physics thread had to run at 60fps so this meant that I needed a factor of 4 speed up. From experience, I knew that this was something that was achievable. The game thread was running at around 30ms per frame and the Render thread was clocking in at around 5ms per frame. SPU rendering was pretty fast, and I figured that wouldn't be much of a problem.
Months of fannying about took place and a contract was finally signed in June. So I did the only appropriate thing one could do in that situation - I did two weeks of work on Vessel and then went on a 6 week family holiday to the UK.
In anticipation of the Vessel contract I’d hired an experienced programmer, Ben Driehuis. Ben came highly recommended and had worked on the Bioshock series of games, plus, most importantly, he was living in the same city as me. Having spent the last 6 years working remotely I knew that my best bet of delivering this game on time was to be in the same physical space as my team. I threw Ben in the deep end. He was hired and then given work on platforms and engines that he’d never had to deal with before. Then when I buggered off to the UK he had to manage by himself, dealing with a performance contract on a big name title and then kicking on with Vessel in my absence. Poor bugger.
I also had a friend who was interested in the QA side of things, so James (an old Uni mate) was responsible for playing Vessel over and over and over again, reporting bugs to us and telling us that the performance improvements we’d just made hadn't made any difference. James also helped us out on the testing for the TRC (Technical Requirements Checklist - a massive list of tests that need to be checked off in order to get your game through submission to Sony) and the actual submission of the game.
Once I got back from my holiday we moved into the new office and started working in earnest on the Vessel port. Our goal was to finish Vessel by the end of August - we had about four man-months to do the work, and I had carefully calculated how much time each section of the port would take. Surely that was enough time?
Ben and I divided up the work between us: I took on the optimisation of the physics and Ben was to deal with pretty much everything else - audio was broken, some of the in game objects didn't animate, loading needed to be optimised, memory usage had to be reduced, we needed save games and trophies and a whole swathe of PS3 specific features implemented. Ben has written about some of the fun he had during the port here, here and here.
From my initial inspection of the code, I had noticed a lot of memory allocations taking place, so I was optimistic that I could reduce (or remove) these and scrape back a lot of frame time. I also figured that rearranging the memory usage to be more cache friendly would help the PlayStation 3’s main processor so I could leave the core of the physics code alone. The last thing I wanted to do was to port all of the physics to SPU.
Three weeks later I started porting the physics to SPU. The changes I was making were incrementally improving the performance, but I could see that we’d never hit the speed we needed without dramatic changes. Porting code to SPU isn’t like porting code from one platform to another. You can program SPUs in C++, but they have very limited memory available to them - only 256kb each. So, in order to port any significantly complex task, you need to understand how that task uses memory - where it is, what order it reads it, where it writes it - the things that most programmers of high level languages take for granted. Not only that, but you have to consider how the task will parallelise. Running on one SPU is good, but running on 6 is awesome. (I’ll go into more detail on the SPU porting process in later articles.)
The first SPU task I built was one that looked for drops that were near a set of objects. My first shot at this shaved almost 2ms off the frame time - the function in question dropped from a peak of 2.4ms to 0.6ms! That meant that in order to hit 16ms per frame I just needed to do this 21 more times!
Still, Ben and I kept hammering away at the code. By the end of the first week in August I’d shaved 15ms off the Physics thread and yet that thread remained the bottleneck. Ben had freed up over 200MB of memory and we were close to running on a retail PS3. But we’d started to notice a few things - some of the levers and triggers in the game weren’t working, fluid shaders weren’t working properly, we were getting spikes of over 200ms in some cases and the game still wasn’t playable due to not just the levers not working but the AI of the fluros seemed to be broken (some would teleport away).
It was evident at this point that we weren’t going to hit the end of August deadline. Other projects had taken some time from both Ben and myself, so we still had billable time available going into September, so I let Strange Loop know that things were going to take a little longer than I had expected. They were very understanding and comfortable with the release date pushing out a little more.
Things were progressing well though - by the end of August the physics thread was running at better than 33ms per frame - we were over halfway there! Save games were in place, loading was optimised, and trophies were functioning. Unfortunately, the game thread was slowing down - as Ben was fixing problems with the code and data, more things were being processed. For example, audio wasn’t working before and now there was a lot of audio being triggered so this obviously took more processing time.
It was the second week in September when I realised that I’d made a horribly flawed assumption about the game thread in Vessel. I discovered that it had to function in lock step with the physics thread and so it too had to run at 16ms per frame. Reducing the framerate of the physics wasn’t an option as it was fixed to 16ms per frame - experimentation with varying that resulted in fluid collisions breaking horribly.
So, at that point we had the physics running at sub-30ms per frame, and the game thread running at 30ms and above. We were close, but we still had a long way to go - we now had to figure out how to optimise the Lua heavy game thread down to 16ms per frame. Actually, it had to be faster than that - as the PS3 has only two hardware threads, the 3 software threads have to share those two hardware threads which meant that the game thread plus the render thread had to be faster than 16ms per frame.
Then Ben discovered the reason for the teleporting fluros - a compiler bug triggered only in optimised builds was causing fluros to incorrectly calculate their trajectory during jumps, making them disappear. Turning off optimisation for that function fixed the problem. That was a tricky bug to find, so we were pretty stoked - until we realised that this meant that the number of fluros in view was now higher that it was before. Much higher. More fluros means more processing - more physics, more drops, more rendering, more audio. This one fix resulted in the Physics thread jumping back up to 50ms from around 24ms, the render thread jumping up to 12ms and the Game thread climbing to 40ms.
I was horrified. We were effectively back to where we had started. We now needed to save a total of 60ms over three threads. I wasn't sure we could do it. It was like that bit in horror movies where the protagonist has killed the evil monster, only to have it rise up, stronger than it was before.
The initial optimisations on a project are the easiest. As time goes on and all the low hanging fruit has been picked, the remaining optimisations are harder and harder to perform. We really had our work cut out for us - and we'd come too far to go back now.
With the realisation that both the game and render threads needed to run a 60fps, we had to reassess where we focussed our optimisation efforts. The game thread was doing a lot of different jobs - Lua scripting, playing audio, AI, animating sprites - all sorts of things. There was a lot to understand there, and on top of that, both the physics and render threads still needed more work. But performance wasn't the only concern - there was TRC compliance, game play bugs, new bugs introduced by our changes and, as we eventually discovered, the need for a PC build (more on that later).
Now, Vessel was ‘mostly’ bug free on PC when we started - we did find a few bugs in the Lua and in game code and the few compiler and shader compiler bugs added to that, but there were subtle differences in the way the two platforms dealt with numbers which meant that weird things sometimes happened. For instance, a door on one of the levels would get stuck and your character would have to bump into it in order for it to open. This was due to a variation in position values about 5 decimal places in causing the jamming on PS3. Additionally, there was some code that was used in a number of places that did something like the following;
x = MIN( y/z, 1.0);
What this did on PC was catch the case where z tended to zero and clamped x to 1.0 (
MIN(infinity, 1.0) = 1.0). Unfortunately, on PS3 value/0.0 was
MIN(QNAN, 1.0) was
QNAN. So the clamping never happened resulting in much weirdness in hard to reproduce places (it only occurred when z was zero), and was therefore an absolute bugger to find.
This meant that we weren't just optimising and adding in new PS3 functionality, we were also debugging a large foreign codebase. I hadn't expected that we'd need to change Vessel much - it was a shipped game after all and so I had assumed that it was pretty much bug free.
I had also assumed that for the most part we wouldn't need to change any assets significantly. Things like platform specific images for controllers and buttons were just texture changes and so weren't a problem. Text was fine too. Unfortunately, problems like the sticking door mentioned above meant that we need to go into some of the levels and jiggle bits around. Later in the life of the port we had to tweak fluid flows, emission rates, drains and other fluid based effects to help with performance.
All of this meant that we needed to have the editor up and running, and the editor was built using the same engine as the game so we needed to maintain two builds of the game - one for PS3 and one for PC solely for the editor. This was done crudely by #defineing most of our PS3 changes out and leaving the PC code in there inside the #else conditional. This did slow us down a bit and also made our code that much uglier, but it had the side effect of allowing us to easily test if a bug was present in the PC build or was specific to the PS3.
By the end of September we had fixed many audio issues and managed to get LuaJIT (an optimised version of Lua) working again (it was mostly working but was causing some repeatable problems in some places). Ben also profiled the Lua memory usage (article here) and tweaked its garbage collection to be more CPU friendly (it was occasionally peaking at 13ms in a call). So, slowly, we were scraping back time spent on the game thread. It was still a serious problem, but it was getting better.
Back on the physics system, more and more the CPU code was being ported to the SPUs - in many ways, this was getting easier as I was building a lot of infrastructure which made it more convenient to port. Many of the patterns that I used for one system would translate well to others. I started reducing the amount of memory used by parts of the physics, like dropping the size of the WaterDrop struct to 128 bytes from 144 bytes meant better cache usage and simpler processing on SPU (nicely aligned structs meant that I could assume alignment for SIMD processing).
Some optimisations were straightforward. For example, I changed the way that we processed drops on fluros skeletons. Instead of iterating over the entire drop list and checking each drop to see if it was part of skeleton ‘N’ I changed it to pass over the drop list once, partitioning the drops into buckets of drops per skeleton. Then we could just process a given skeleton, touching only those drops on that skeleton. Obvious in hindsight, and unnecessary on the PC build, but it takes time to get to the point where you really start to understand the code and can begin to change the way it works (rather than optimise a function to perform the same algorithm, just faster).
At this time we also saw the transition of the render thread to the status of low hanging fruit. So we started working on that - unrolling loops, prefetching, minimising data copying. Tweaking, poking, prodding and breaking, testing, measuring, then swearing or cheering and checking in.
The end of September rolled around and we still had no real end in sight. I kept talking to Strange Loop, trying to keep them up to date while we furiously worked to finish this game. As we’d already blown the initial quote, our contract specified that we would have reduce our daily rate for the remainder of the project, to be recouped on sales. So not only did we have significant time pressure, we now had financial pressure on top of that.
October was very productive - it saw over 150 bug fixes and optimisations checked in for the month. As the game was improving in performance and becoming more playable James, our QA guy, was able to play more and more of it. We hadn't actually managed to do a complete play through of the game by this stage, so we focussed heavily on removing all blocking bugs and getting the game in a functionally shippable state (assuming that we would continue to make performance improvements).
Second week in, after discussions with Strange Loop Games we decided to postpone the release of the game to January. This meant submitting the game to Sony in Europe (SCEE) and in America (SCEA) by the end of the year, a mere 4 months after our initial estimate. There was no way I was going to let this date slip again.
Ben was still working on optimising the Lua execution and audio (which at this stage was still taking 4 or 5 ms per frame with peaks way higher than that). I’d thought that as Vessel used FMOD that it would all just work when I turned it on. Unfortunately, the audio was a complete mess on the PS3. While profiling the game thread in detail we discovered that it was taking up huge amounts of time, spiking and causing frame stutters as well as just plain behaving badly. It took weeks of intermittent work plus lots of discussions with the FMOD boys to figure out what the problems were. Turns out, many of the sounds were just set up incorrectly for the PS3 build - effects and layers that were disabled on PC were left on for PS3. Some sounds would loop infinitely and multiple sounds would play at the same time. All these things took time to discover, time to investigate a solution and time to fix. Something we had very little of.
To make things worse, Vessel performed no culling for the rendering, AI or audio. At all. The entire level was submitted to the renderer which culled it and then passed off the visible bits to the SPU renderer. Also, Lua scripts (which are slow at the best of times) were being called even when the things they were modifying weren't in view. Audio was also being triggered by objects that were a long way from the player, relying on distance attenuation to reduce volume. All of this meant that there was a lot of work going on under the hood for no visible or audible impact.
We were still stumbling across weird little issues. For example, the animated textures in the game were never animating. They were loading correctly, but never changing. Ben tracked this down to an endian swap function which was broken (the PC and PS3 have different byte ordering for their numbers and so the same data when loaded from file must be modified depending on platform). This endian swap was only ever called in one place and only by this particular animated material.
The fix is often easy. Finding what to fix is hard.
Even though we had a tight timeline, we also had other client obligations which saw us each lose 2 weeks in November. This helped financially (well, it would have if they had paid on time) but put even more time pressure on us. Regardless, we were confident that we’d make the end of December deadline. Plenty of time.
We finally managed to get a complete play though of the game in November. Only to realise that the endgame didn’t work. Fix, patch, hack, test.
It was during the last week of November that Ben had a breakthrough. He bit the bullet and implemented a simple container based culling system for the game thread. All objects in a loaded level were placed within a rectangular container and there were an arbitrary number of these per level. This meant that we could quickly check what should be rendered or processed (Lua or audio) by looking at the container in view and those next to it. Conceptually simple, but implementation required that we modify the editor (which we’d not done before) and renderer, then visit each level in the editor and manually add the new containers then re-export everything.
This fix made a massive difference to performance. Audio was faster as it wasn’t playing as many sounds. There was less Lua processing happening as there were fewer things active. And rendering was faster as there was less being sent to the render thread. So it worked, and it worked well. In fact, we had to limit the frame rate to 30fps as some places were now rendering at 60fps on all threads!
The container fix was instrumental in getting the game thread to a more playable speed, but it also introduced a lot of little bugs that took a month to fix.
And still, it was too slow.
With only three weeks until the Christmas break we figured we needed to start cutting back on the content in some of the levels in order to get the frame rate up. We picked a few of the worst and closely examined them to try and figure out how we could speed them up. We found the following;
- Trees were very costly (as were their roots). So we cut some of them down.
- Drains were expensive, so we removed some of them.
- Areas with lots of water were obviously expensive, so we reduced the amount of water in some areas, cutting back on the flow or adding drains to reduce the amount of water in pools.
- We tweaked the object limiter code to ensure that there were always enough fluros to finish a given level, yet not too many to make it run too slow. Same with the number of drops and seeds.
All of the above helped, and the game was now running pretty well. There were still areas where it would slow down, and you could (and still can) slow down some areas intentionally by spraying lots of water and generating lots of fluros, but there was no time left for further optimisations. I'd already built 13 different SPU tasks to speed the physics up and one or two for the render thread - it was getting very hard to speed things up more, and at this stage it was risky to make any extra significant changes. This would just have to do.
James was now noticing some play specific bugs. Some fluros weren't moving correctly - jumping too high and destroying themselves on the scenery or just behaving badly. Which is fine in most cases, but this was happening with a key fluro that you needed to use to complete the level. We had to modify the level to make it work again.
In addition to the optimisations that we were still implementing, and the bug fixes for the odd things that were happening, we also had to make sure that the game was TRC compliant. James trawled through that massive missive, highlighting things that we needed to check - the ways that save/load games functioned, how the game shut down, what would happen in the case of no hard drive space left, the order that trophies unlock, so many things. And so little time.
After the Christmas break (where I was forced into yet another family vacation) I dove straight back into the code, hoping to get the final issue fixed and the game submitted as soon as possible. There was still time to submit and if I passed Sony’s submission process first time then we could still have the game on the PSN store by the end of January. I fixed the remaining problem with the start up error reporting and added in a little more error checking just to be sure. I was pretty paranoid about the submission process as I didn’t want to delay the game anymore by failing - I carefully checked everything twice, crossed my fingers and hit the ‘submit’ button to Sony America (SCEA) on January 9.
In parallel to the code submission process there is a ‘Metadata’ submission process. This corresponds to the PlayStation Store assets and the game’s presence there. It consists of all the text, images, trailers, prices etc and they all have very specific requirements that must be met in order to pass submission. James (our QA guy) managed most of this and involved communicating with Strange Loop’s art guy Milenko (who was incredibly responsive - I’m not sure he ever sleeps) and consisted of asking for different resolution images and screenshots, as well as sourcing the different language versions of the game text. It took us a few submissions of the meta data to get it all right, but the turnaround was pretty quick and a continuous dialog with the Sony helped a lot.
The code submission process consists of uploading the game in a special package format plus some extra documents that describe the trophies and other bits and pieces. We had to submit to both SCEA and Sony Europe (SCEE) so we could have the game released in both those regions. We hadn’t submitted to SCEE at the same time as SCEA as we were still waiting on some publisher registration details to come through, so all I could do was wait for that and for the response from SCEA on our initial submission while I busied myself with some other work.
On January 18th I received the first fail from Sony. As fails go, it was a pretty good one. There were three “Must Fix” bugs: one was due to my entering the wrong data in a field when submitting the game package, and two were due to saving games failing when there was no disk space left. They’d also mentioned some slowdown in some of the levels - I’d expected that, and as it wasn’t a critical bug, I ignored it. The save game bugs proved troublesome - Ben had written all of the save game code and with him gone I now had to learn how Sony wanted you to do it, how Ben had done it and how I could make it work correctly. It took me a few days to find and fix the problems and by this time the details that SCEE required had come through, so I resubmitted to SCEE and SCEA at the same time (January 24)
I was quietly confident that it’d all pass this time. I’d thoroughly tested the save game code now and it all worked. What could go wrong? I seriously considered heading out and buying a bottle of Chimay Blue as a pre-emptive celebratory reward.
The first fail I saw came back from SCEE on February 2nd. I’d messed up the submission data again (I hate filling out forms) plus there was a legitimate bug they’d found where once you’d finished the game you couldn’t continue on from that save game if you wanted to continue playing to complete all the trophies. When I had fixed that I re-submitted the new build to SCEE on February 4th and then began to wonder what had happened to the SCEA build - it should have come back at about the same time as the SCEE one. I also worried that they would pass it with the “continue when finished” bug in it. I needn’t have worried, as the SCEA team came up with a whole slew of new bugs that weren’t mentioned in any of the previous tests.
This new bug report that SCEA had sent back was very concerning. The crashes they were reporting sounded like bugs I’d previously fixed - I was also impressed with their thoroughness. One of the bugs was something like “Hang from the rope in level blah and throw seeds for at least 5 minutes into an area with no water. Crashes 70% of the time”. While looking at this bug I received notice from SCEE that they had passed Vessel. Which was great, but it was also worrying as I now had numerous repeatable crashes in the build - so I failed the SCEE build myself in order to be able to submit a fixed build which would match the SCEA build.
By now I was getting a little frazzled. It was like one of those movies where the protagonist has slain the evil uber zombie, only to have it rise from the dead over and over again. Would this submission process never end?
I eventually tracked the crash bugs down to a simple stupid error. There was a section of code that was responsible for limiting the amount of fluid, fluros and seeds in a given section. When the frame rate dropped below 60fps, this code would kick in and drop the number of drops/fluros or seeds to the minimum value deemed suitable for that level. During the final stages of development we had created a FINAL_RELEASE mode which turned off all of the unnecessary debugging code. Unfortunately, this erroneously included some code which updated the time values that the limiting code used, so the fluid, fluros and seeds were never being reduced. Ever. This meant that the game would have been running much slower than it should have, and was prone to crashes whenever certain hard coded limits were exceeded. I've never been so relieved to fail anything.
Something I’d like to focus on here is the level of performance that Vessel was submitted with; Ben and I spent a huge amount of our time just trying to squeeze as much performance out of this game as possible on the platform we had. For the most part, I'm pretty happy with the results. Yes, you can abuse the game and make it slow down quite easily - if you try to fill a room with fluros and water and seeds then you’ll most likely see some slowdown - but during normal play, the game maintains 60fps simulation at 30fps frame rate (the graphics is 30fps, so there are two simulation passes per visible frame). However there are a few levels where the sheer number of fluoros and fluid required to solve the puzzles means that the frame rate will drop to 20fps. Given another month I reckon I could have fixed that too, but given how far over time and budget we were, that wasn't an option. So, while I’m happy with how much we improved the frame rate of the game, I'm still a little disappointed that we didn't hit a full 60fps simulation everywhere.
Sure, the PlayStation 3 hardware did slow down the porting of the game initially, but given the complexity of the fluid simulation required I doubt that there will ever be an Xbox 360 version released. The SPUs allowed us to perform at least 80% of the physics simulation completely in parallel - with no impact on the main processor at all. There is no way you’d get the same level of performance on the X360.
With the FINAL_RELEASE bug fixed and the game resubmitted to SCEE and SCEA (which passed on the 20th and 22nd respectively) I was finally free! The metadata was all sorted and we shipped Vessel on March 11 in the SCEA territories and March 12 in the SCEE regions.
What went right
Having a QA guy in house was invaluable. An impartial, non-technical third party to throw our fixes at kept us honest. As a developer you get very close to the game and like to assume that your fixes have made a positive difference and you’re moving forwards. That’s not always the case.
Honest communication with the Publisher/client. I tried to keep John at Strange Loop Games constantly up to date with the progress (or lack thereof) of the game. He was incredibly understanding with the continuous delays, and while it was hard to deliver bad news, at least it reduced the surprises.
Having good, experienced coders on the job. Bringing Ben onboard was a good choice, even though we had to let him go in the end. There is no way I could have done this without him. Working from the same physical office was also beneficial. I worked remotely for over 5 years leading up to this port, and I think we’d have delivered even later if we’d worked separately
What went wrong
Work estimation. I fundamentally underestimated the amount of work required. I should have applied Yossarian's Circular Estimation Conjecture and multiplied it by PI - that would have been closer to the mark. The big killer was the misunderstanding with which threads needed to run at 60fps. If it was just the physics I'd underestimated, we probably would have been a shade over a month late (maybe two), but with both the physics and game threads needing to execute in under 16.6ms, we really had our work cut out for us. The amount of time taken for submission shouldn't be forgotten either; 4 to 8 weeks after the initial submission should see your game on the store.
I’d also recommend to anyone doing a console game that they get as much TRC compliance done as soon as possible in the development of the game. Get the load/save stuff working, trophies, error reporting, quitting, controller connection, file error reporting and the like in place and functioning sooner rather than later.