My name is Max Savenkov and I work for Owlcat Games, the company behind one of the most successful Russian computer RPGs, Pathfinder: Kingmaker. When we started to work on the port of our first game for consoles, we encountered the problem of memory leaks. Unity's built-in tools were of little help for various reasons, and neither were the platform-specific ones (which are mostly geared toward tracking memory usage in native code). So we decided to write our own memory profiler.
Owlcat Mono Profiler is our new tool for tracking Mono heap usage in Unity-based games. You can download the latest stable version and sources from the project's GitHub page. Unlike Unity's built-in Profiler or Memory Profiler package, our software doesn't require taking memory snapshots at specific moments. Instead, it monitors the state of the Mono heap continuously, which allows the user to track down memory leaks, memory peaks and superfluous repeating allocations. Compared to platform-specific tools, like Memory Analyzer for PlayStation 4, it correctly handles memory managed by Mono's garbage collector.
Now that the formalities are out of the way, let's dive into the story of its creation.
Why and Wherefore
It all began when we noticed our game leaking memory. It wasn't much of a problem on PC, because the leaks weren't a mighty waterfall, and besides, even low-end PCs now have more memory than a PlayStation 4 or Xbox One. Also, Windows handles out-of-memory situations by attempting to dump other processes into the swap file, delaying the inevitable, while consoles unceremoniously kill your game and send you debugging.
Unity's built-in tools were of no help: the profiler in Unity 2018.4 took more than 8 hours to take a single snapshot of our game, and often it couldn't finish the task at all (things got much better in 2019.x, but we couldn't upgrade the Unity version without breaking a great number of things).
The PS4 SDK offered a marvelous tool called Memory Analyzer. Seriously, this is one of the best memory profilers I've ever seen (even though it has its little problems). The ability to mark any function with a matching signature as alloc/realloc/free alone makes it super-useful for any game that uses custom memory allocators, pools etc.
Unfortunately, it couldn't help us in all cases. Mono, as used by Unity, is running a venerable garbage collector called BoehmGC. It's quite stable and battle-tested, but in some respects, it resembles a cross between a black box and a black hole, because you can put things into it, and it processes them somehow, but very little information gets out. In particular, there is no way I could find to detect that an object has been deallocated.
So, why is it hard to write a memory profiler for Unity?
Let's take a step back, and remind ourselves how garbage collectors work. Before joining Owlcat Games, I mostly worked on projects with custom engines written in C++, so I had only some vague ideas about garbage collection, most of which were proven wrong by the harsh reality. If you're already an expert in this area, skip the next paragraph, or hold your peace while I offer an incredibly simplified and possibly mangled explanation for others like myself.
So, what does a garbage collector do? It allocates a block of memory from the system... And never returns it (at least this is the way BoehmGC behaves on PS4). Inside this block of memory, it allocates smaller blocks for user-requested objects - in fact, it's a bit more complex than that, but it doesn't matter in this case. What matters is that the fact of allocation is easy to notice. There are a few functions with obvious names like gc_malloc_whatever, which can be marked as "alloc" in Memory Analyzer, or intercepted in some other way at your leisure (and in fact, Mono provides a callback which is called for every allocation).
Deallocations are another matter entirely. In non-managed languages like C/C++ you just call free or delete on an object and you're done - you have an exact place and moment where deallocation takes place. With a garbage collector, things are different: the user's code just kinda "forgets" about the object, by clearing references to it as they leave scope or whatever. Now, the garbage collector doesn't actually hound every executed instruction waiting for the moment when some “mov” or “lea” clears the memory where the reference used to reside. Instead, once in a while it stops the world and takes its time while it sorts through all objects, seeking ones which became unreferenced. Then it marks their memory as free for reuse.
This, together with some peculiarities of BoehmGC, means two things: one, the deallocation event can be very far (tens of seconds) removed from the moment the last reference to the object is lost, and, more importantly, we can't actually get a deallocation event from this library. I spent some time glaring at BoehmGC code, but proved to be too feeble-minded to find the place where an object is marked as deleted. Still, even if I had found it, it would have been no use: we couldn't really change the code of the garbage collector, since it can't be recompiled on consoles, and is troublesome to recompile on desktop (you need to recompile Mono as well, and Unity uses a custom version of Mono, which compounds the problem).
Hitting the brick wall
Then I had a bright idea to add a finalizer to each allocated object. A finalizer is a function that serves as a kind of "destructor" for managed objects, and is called whenever the object is finally deallocated by the garbage collector. But this path also led me nowhere: I could modify il2cpp code (which is open source) and add the finalizers, but I couldn't do it with Mono builds (which we use on PC, because it's easier for modders). Also, some objects already had finalizers necessary for their normal functioning, so I would have needed to save them and call them from my finalizer. It is possible that I could have made it all work, on consoles only, and by modifying parts of il2cpp code... But that didn't sound like a good idea, so it was time to go back to the drawing board.
Well, no, not the drawing board. Actually, I went back to googling stuff. What I found was the official Mono documentation, which describes the built-in Mono profiler called "log". Unfortunately, Unity's version of Mono lacks this tool. Besides, further queries suggested that even if I could get it back into Mono somehow (via a plug-in?), most tools for parsing its output are abandoned, obsolete or unfinished.
Taking the long way
Still more googling led me to an interesting, if also long-abandoned, project called Heap-Prof. It was an early attempt to write a memory profiler for Mono without interfering with either Mono itself or GC code. The idea behind it was pretty simple: to repeat all the work which the real garbage collector does:
* Register allocations as they come via Mono's callback
* Catch "GC finished" events, also using Mono’s callbacks, and iterate over registered allocations, checking whether the allocated object is still alive
Quickly enough, I updated and modified heap-prof's code into a Unity plug-in dll and used GetProcAddress to get necessary Mono functions. And then the game crashed. In a function called mono_object_is_alive. A few attempts at understanding what it does and why it crashes later, which did nothing to improve my self-esteem, I happened upon a letter on the Mono-Dev mailing list, dated November 2009, from one of the developers, Massimiliano Mantione. This missive went on to describe a complete lack of profiling API for BoehmGC, a work-around employed by heap-prof and then said "The problem is that this is not reliable: "mono_object_is_alive" was not meant to be a public function. And in fact sometimes the heap snapshots are wrong (or the profiler crashes)." It also offered a solution, which was to improve profiling API... For the then-new SGen GC. Which Unity's version of Mono still eschews for whatever reasons.
Things looked bleak right then, but nothing can stop a determined Russian Programmer with his trusty GetProcAddress, DumpBin and a penchant for finding ways to abuse any API until it does what he wants. "The Garbage Collector knows if the object is dead," thought I. "Can we reproduce its way of determining this?"
Let's take another theoretical digression here and see how a garbage collector determines liveness of an object. We know it searches for references to an object. But HOW does it do that? What sounds like a magical trick is, in fact, a copious application of brute force (at least in the case of BoehmGC). The GC scans the memory of each allocated object byte-by-byte and checks if the value looks like an address inside its heap. If it does, it considers this a reference to the object that resides at that address (once again, I'm simplifying things for the sake of brevity, but not too much, really). Those of sharp mind will note that if we apply this process iteratively, there will be no objects left in the heap at all (e.g. if our heap has just two objects, A and B, and A references B, then during the first iteration, A will be deleted, since no one references it, and during the next one, B will be deleted, since NOW no one references it, either). To avoid that problem, the garbage collector has "root" objects, which are registered externally, and are never deleted, even if there are no references to them, until they are unregistered.
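To make the brute-force scan a bit more concrete, here is a toy model of it in Python. This is only a sketch under stated assumptions, not BoehmGC's actual algorithm: the "heap" is a plain dictionary from made-up addresses to raw object bytes, pointers are assumed to be 8-byte little-endian values at aligned offsets, only pointers to the start of an object count, and the names find_candidate_refs and mark_reachable are invented for this illustration.

```python
import struct

WORD = 8  # assumption: 64-bit pointers, checked at aligned offsets only


def find_candidate_refs(payload, heap):
    """Return every known object address that appears as a word in payload."""
    refs = set()
    for offset in range(0, len(payload) - WORD + 1, WORD):
        (value,) = struct.unpack_from("<Q", payload, offset)
        if value in heap:  # "looks like an address inside the heap"
            refs.add(value)
    return refs


def mark_reachable(roots, heap):
    """Walk the heap from the roots and return the set of live addresses."""
    marked = set()
    stack = [addr for addr in roots if addr in heap]
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        stack.extend(find_candidate_refs(heap[addr], heap) - marked)
    return marked


# Toy heap: 0x1000 points to 0x2000; 0x3000 points to 0x1000,
# but nothing points to 0x3000 and it is not a root.
heap = {
    0x1000: struct.pack("<QQ", 0x2000, 0),
    0x2000: struct.pack("<QQ", 0, 0),
    0x3000: struct.pack("<QQ", 0x1000, 0),
}
live = mark_reachable([0x1000], heap)
dead = set(heap) - live
print([hex(a) for a in sorted(live)], [hex(a) for a in sorted(dead)])
```

Starting from the root at 0x1000, the walk reaches 0x2000 and never visits 0x3000, so only 0x3000 ends up in the dead set. Note that an integer which merely happens to equal a heap address would also be treated as a reference here, which is the price of this kind of conservative scanning.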
As I noted earlier, BoehmGC is a black hole - you can register root objects inside it, but there is no way to get the list of registered root objects. Fortunately, Mono solves that problem for us by providing callbacks whenever it registers or unregisters a GC root. And to think I had almost implemented access to the GC's internal list of roots by address!
Fruits of labors
The rest was a matter of some coding and a whole lot of debugging. Every time my profiler caught an allocation event, it wrote the object's address and size to its own allocations table. Every time it noted an end-of-garbage-collection event, it iterated over objects, beginning with roots, marked all reachable objects and removed the rest from the table. And this is how I got both allocation and deallocation events.
Besides finding dead objects, this approach allowed me to build a graph of live objects, and query it for a list of references to any specific object to determine who exactly is holding a spurious reference to an object that should be dead (Unity's built-in profiler also offers this very useful functionality).
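The bookkeeping described above can be sketched roughly as follows. This is a simplified Python model, not the profiler's real code: the class and method names are hypothetical, the Mono callbacks are represented by ordinary method calls, and the memory scan is replaced by a references_of function that the caller supplies.

```python
class AllocationTracker:
    """Toy allocation table that derives 'free' events after each GC."""

    def __init__(self):
        self.live = {}     # address -> size of every tracked object
        self.roots = set() # addresses currently registered as GC roots
        self.events = []   # ("alloc" | "free", address, size)

    def on_alloc(self, address, size):
        self.live[address] = size
        self.events.append(("alloc", address, size))

    def on_root_register(self, address):
        self.roots.add(address)

    def on_root_unregister(self, address):
        self.roots.discard(address)

    def on_gc_end(self, references_of):
        """Mark everything reachable from the roots, drop the rest.

        references_of(address) is a stand-in for scanning the object's
        memory; it returns the addresses that object refers to.
        """
        visited = set()
        stack = list(self.roots)
        while stack:
            address = stack.pop()
            if address in visited:
                continue
            visited.add(address)
            stack.extend(references_of(address))
        dead = sorted(a for a in self.live if a not in visited)
        for address in dead:
            size = self.live.pop(address)
            self.events.append(("free", address, size))
        return dead


tracker = AllocationTracker()
tracker.on_alloc(0x10, 32)
tracker.on_alloc(0x20, 16)
tracker.on_alloc(0x30, 64)
tracker.on_root_register(0x10)

graph = {0x10: [0x20]}  # 0x10 references 0x20; nothing references 0x30
freed = tracker.on_gc_end(lambda address: graph.get(address, ()))
print([hex(a) for a in freed], tracker.events[-1])
```

In this run only 0x30 is unreachable, so it is removed from the table and a matching "free" event is appended to the event list.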
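As an illustration of that kind of query, here is a small Python sketch that inverts a reference graph and then walks backwards from a suspicious object until it hits a root. The object names (SceneRoot, UIManager and so on) and both function names are made up for the example, and the graph is a plain dictionary rather than anything the profiler actually stores.

```python
from collections import defaultdict, deque


def build_referrers(references):
    """Invert holder -> targets into target -> set of holders."""
    referrers = defaultdict(set)
    for holder, targets in references.items():
        for target in targets:
            referrers[target].add(holder)
    return referrers


def path_from_root(target, roots, referrers):
    """Breadth-first search backwards from target to the nearest root.

    Returns the chain root -> ... -> target, or None if no root holds it.
    """
    queue = deque([[target]])
    seen = {target}
    while queue:
        path = queue.popleft()
        head = path[-1]
        if head in roots:
            return list(reversed(path))
        for holder in sorted(referrers.get(head, ())):
            if holder not in seen:
                seen.add(holder)
                queue.append(path + [holder])
    return None


# Hypothetical graph: a texture that should be dead is still held by a tooltip.
references = {
    "SceneRoot": {"UIManager"},
    "UIManager": {"Tooltip"},
    "Tooltip": {"Texture"},
    "Cache": {"Texture"},
}
referrers = build_referrers(references)
print(sorted(referrers["Texture"]))
print(path_from_root("Texture", {"SceneRoot"}, referrers))
```

The first query lists the direct holders of the texture (Cache and Tooltip), and the second shows the chain that actually keeps it alive: SceneRoot, UIManager, Tooltip, Texture. Cache holds a reference too, but since nothing rooted reaches Cache, it is not the culprit.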
Now that I had a stream of allocation and deallocation events, I could replay any part of it to get a list of currently alive objects at any time in my profiling session. This allowed me to record a long portion of gameplay and analyze it later instead of taking precisely timed memory snapshots.
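Replaying the stream is conceptually very simple. The sketch below assumes a hypothetical event format of (time, kind, address, size) tuples sorted by time; the real profiler's storage format is not described here, so treat the layout and the function name as illustrative only.

```python
def live_objects_at(events, timestamp):
    """Replay events up to and including timestamp; return address -> size.

    events must be sorted by time. Each event is a tuple of
    (time, kind, address, size) where kind is "alloc" or "free".
    """
    live = {}
    for time, kind, address, size in events:
        if time > timestamp:
            break
        if kind == "alloc":
            live[address] = size
        elif kind == "free":
            live.pop(address, None)
    return live


# Hypothetical session: address 0x10 is freed and later reused.
events = [
    (0.0, "alloc", 0x10, 32),
    (1.0, "alloc", 0x20, 16),
    (2.5, "free", 0x10, 32),
    (3.0, "alloc", 0x10, 48),
]
for moment in (2.0, 2.5, 3.0):
    snapshot = live_objects_at(events, moment)
    print(moment, sum(snapshot.values()), "bytes live")
```

Because addresses get reused after the collector frees them, the replay has to process events strictly in order: the same address 0x10 refers to a 32-byte object at time 2.0 and to a different, 48-byte object at time 3.0.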
Another benefit of having our own profiler that does not rely on Unity's profiling code was that it became possible to profile release builds (and even other people's games!). All that is needed is a PDB file for the version of UnityPlayer that the game uses (the profiler needs it to get addresses of a few Unity functions it wants to intercept; specifically, there is absolutely no way to get an end-of-frame event in native code outside of such hackery, so I had to resort to Microsoft Detours).
It's not without drawbacks, of course. The profiler slows down the game considerably: by about 20% during non-GC frames, and much more whenever a collection happens (up to 10 seconds, depending on the number of currently allocated objects). Also, the profiler requires quite a lot of memory on the same machine where the game is running - up to 200 MB for ~2 million allocations. For the profiler client's database, even more memory is necessary: up to 4 GB for longer sessions, though I don't think this poses a serious problem for most developers (and the client can run on another PC - it connects to the server via network).
I'm hoping to improve both performance and memory requirements in the future, though.
The current version of the profiler has a UI built with Qt5, and should, theoretically, be easy to port to OSes other than Windows (support for Linux is planned, but not a priority for now). It uses an SQLite database for storage of profiling events and statistics, with tables partially stored in a memory cache, though I'm planning to research memory-mapped databases later to make it faster. There is no integration with the Unity editor for now, because sometimes you want to profile the game in the Editor (to test ideas and fixes, for example) without doing a build, or even the editor itself, and having a profiler allocating managed memory for UI is a very bad idea in this case.
The profiler is open source and free to use for all Unity developers (I think it can be adapted for profiling general-purpose Mono projects, but it would require some work). I hope it will prove useful for people outside our company. It's currently in the early stages of development, and therefore certainly contains bugs and lacks some useful functions. I'm awaiting your suggestions (and pull requests!) on GitHub. I have plans for more Unity developer tools that will benefit our and your games, which will become part of a set called Owlcat Grooming Toolkit. In particular, I'd like to have a free CPU profiler for Unity that can be given to end-users to diagnose non-reproducible problems on their end.