Use code generation to generate a localization system

By using code generation, you can fully automate the generation of a complete localization system.

February 14, 2020

This system is the latest of many localization systems I've made over the years. They all had their own requirements, and required different solutions. For our game TERROR SQUID, we needed a fairly simple solution. The game has been localized into 26 languages. Our localization agency put all of the localized strings in a Google Sheet. The sheet was set up with one column per language, and one row per localized string. My job was to create a system that would make it as easy as possible to get the data from the Google Sheet into the game.

Design goals

The design goals were:

Must be possible to switch language at runtime, without having to restart the game.
Localized strings must support variables. E.g.: “You scored x points”.
We should be able to get new content from the Google Sheet into the game with as little effort as possible.
The system should load as fast as possible, and switch language as fast as possible.
The system should produce as little garbage in memory as possible.

Before I started planning, I thought about what worked and what didn’t work with the other localization systems I’ve made, and came to a few conclusions about what to do with this one.

I didn’t want to put all the strings in text assets, like French strings in french.txt, and English in english.txt. It takes time to load an asset, and it has to be deserialized, which takes even longer, and causes a lot of garbage. The old asset has to be unloaded and any remains cleaned up. That’s not ideal at runtime.
If I had put strings in an asset, I would get a new problem: variables in strings. The asset can’t contain any logic, so I would have to load a string, parse it, replace any tags with values, and lastly join the pieces. That’s expensive to do at runtime. When supporting variables in strings, you can’t escape the fact that you need to join strings before showing them, but the cost can be reduced.
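
To make that cost concrete, here is a rough sketch of the asset-based approach being ruled out. The class name, the dictionary and the `{score}` tag are made up for illustration; the dictionary stands in for strings deserialized from a file like english.txt.

```csharp
using System.Collections.Generic;

// Hypothetical asset-based lookup, shown only to illustrate the work done at runtime.
public static class AssetBasedLoc {
    // Stand-in for strings deserialized from a text asset such as english.txt.
    static readonly Dictionary<string, string> strings = new Dictionary<string, string> {
        { "Score", "You scored {score} points!" },
    };

    public static string GetScore(int score) {
        string template = strings["Score"];         // look up the raw string
        string value = score.ToString();            // allocates a string for the value
        return template.Replace("{score}", value);  // scans the template and allocates the result
    }
}
```

Every call has to search the template for its tag before it can build the result, on top of the load and deserialization that happened earlier.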

The system

I started looking at what it takes to support variables in strings. I figured that if the localized asset could embed logic, I would write each string as a function, like this:


string GetScore(int score) => $"You scored {score} points!";

That way I would separate the API from the contents. It wouldn’t matter what language the system returned, I could still call GetScore(123) and get the correct localized string. The calling code would never need to know what the current language is, or any other specific details about the localization system or the state it’s in. All it needs to know is which method to call.
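
A minimal sketch of that separation could look like the following. The interface and class names are hypothetical, and the Norwegian text is only there to illustrate a second language.

```csharp
// Hypothetical names for illustration; not taken from the game's code.
public interface IScoreEntries {
    string GetScore(int score);
}

public class ScoreEntriesEnglish : IScoreEntries {
    public string GetScore(int score) => $"You scored {score} points!";
}

public class ScoreEntriesNorwegian : IScoreEntries {
    public string GetScore(int score) => $"Du fikk {score} poeng!";
}

public static class ScoreScreen {
    // The caller only knows which method to call, not which language is active.
    public static string BuildLabel(IScoreEntries entries, int score) => entries.GetScore(score);
}
```

`ScoreScreen.BuildLabel` never changes when a language is added; only the object passed in does.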

Merging text with logic like this could be done by code generation. That’s easy. All you have to do is write the correct code to a text file, and let your engine do the compiling. But would that solve all of my problems? Let’s review the design goals.

Fast loading time

When the strings are part of the source code, they’ll get compiled together with the code, and bundled in one of the DLL files. For TERROR SQUID, we use Unity, and Unity handles the loading of DLLs, and makes sure all code is ready when our game code starts. Convenient!

Switch language at runtime

When strings are grouped in separate classes, one for each language, loading a new language is a matter of instantiating the correct class. Easy and fast!

Produce as little garbage as possible

The language instance doesn’t need any parsing on load, so all it does is allocate a bit of RAM to make room for the class instance. Each string returned allocates a bit of RAM. Switching language means throwing away the previous language instance, and replacing it with a new one. The garbage collector will do a bit of work when cleaning up the old instance. So, compared to loading an asset of serialized strings, this will most likely produce less garbage. Sounds like a win!

Localized strings must support variables

Check!

Update content with little effort

Generating the code can be automated. Deciding what to generate code for is done by parsing the Google Sheet. Getting the Google Sheet can be solved by using third-party software. Someone must have made a Unity integration for Google Docs. So, all of this can be automated via editor scripting. It’s just a matter of writing the code and getting it right. Easy!
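
As a sketch of the parsing step, assume the sheet has already been downloaded and split into rows of cells (the download itself is left out here). The `SheetTable` class is hypothetical, and so is the layout assumption that the first column holds the name of each row.

```csharp
using System.Collections.Generic;

// Hypothetical helper: regroups sheet rows into one dictionary per language.
public static class SheetTable {
    // rows[0] is the header: a name column followed by one column per language.
    // Every other row is: the row name, then one cell per language.
    public static Dictionary<string, Dictionary<string, string>> ByLanguage(string[][] rows) {
        var result = new Dictionary<string, Dictionary<string, string>>();
        string[] header = rows[0];
        for (int col = 1; col < header.Length; col++) {
            result[header[col]] = new Dictionary<string, string>();
        }
        for (int row = 1; row < rows.Length; row++) {
            string name = rows[row][0];
            // Rows shorter than the header simply leave the missing languages empty.
            for (int col = 1; col < header.Length && col < rows[row].Length; col++) {
                result[header[col]][name] = rows[row][col];
            }
        }
        return result;
    }
}
```

With the data grouped per language, generating one file per language becomes a loop over the outer dictionary.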

Code generation does indeed seem to solve all my problems.

Code generation

So, to make this work, I had to create one .cs file per language, with all the localized strings wrapped in functions with a name that reflects the contents of the string. Here’s a method from the English file:

public string GetVO_Awesome() {
  return $"Awesome";
}

The method needs to be generated using only the information returned from the Google Sheet. For this method, I know that it comes from a column called “English”, and a row called “VO_Awesome”. The cell contains only the word “Awesome”. With this information, I created a few simple rules:

Use the name of the column to dictate the name of the file. In this case it’s named “EntriesEnglish.cs”.
Use the name of the row to dictate the name of the method. Always prefix the name with “Get”, so that the method is named “GetVO_Awesome”.
Methods have to be marked as `public string` in order to be publicly available and return a string.
Use string interpolation ($) for all strings.
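
Applied to a string without variables, these rules could be sketched as follows. `SimpleEntryGenerator` is a hypothetical name, and the sketch assumes the cell contains no curly brackets.

```csharp
// Hypothetical generator for strings without variables.
public static class SimpleEntryGenerator {
    // Column name -> file name, e.g. "English" -> "EntriesEnglish.cs".
    public static string FileName(string column) => "Entries" + column + ".cs";

    // Row name + cell text -> source code of one method.
    public static string Method(string row, string cell) {
        // Escape backslashes and quotes so they don't break the generated string literal.
        string text = cell.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "public string Get" + row + "() {\n"
             + "  return $\"" + text + "\";\n"
             + "}\n";
    }
}
```

Calling `SimpleEntryGenerator.Method("VO_Awesome", "Awesome")` returns the source of the `GetVO_Awesome` method shown earlier.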

This was easy for strings without variables. When they contain variables, I had to dig a little deeper.

Variables in strings
I needed a way to identify variables in an unambiguous way. For that, I had to use some special character for the parser to recognize. In TERROR SQUID we don’t use many special characters. We might use an exclamation mark, or a question mark, so it was safe to decide on using curly brackets for variables, like this:

You scored {score} points!

In order to pass a variable via method parameters, it has to be of the correct type. So, I needed a way to describe the variables. The types of variables we needed to support were integers, floats and strings.

Variables ended up being defined like this:

You scored {score:i} points!
for integers.

Your name is {name:s}
for strings.

You are {height:f} meters tall
for floating points.

By suffixing the variable name with :i, :f or :s, I have everything I need to generate the code.

Generating the code
For each string parsed, I have to figure out if it contains variables or not. For that I use a simple regex to look for matches of the variable pattern, which looks like this:

private const string patternVariable = @"{(.*?)}";
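
To show what this pattern captures, here is a small sketch. The `VariableFinder` class is hypothetical; only the pattern itself comes from the generator.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the variable pattern.
public static class VariableFinder {
    private const string patternVariable = @"{(.*?)}";

    // Returns the text between each pair of curly brackets, e.g. "score:i".
    public static List<string> Find(string text) {
        var found = new List<string>();
        foreach (Match match in Regex.Matches(text, patternVariable)) {
            found.Add(match.Groups[1].Value); // group 1 is the part inside the brackets
        }
        return found;
    }
}
```

The lazy quantifier `.*?` makes each match stop at the first closing bracket, so a string with several variables yields one match per variable instead of one match spanning all of them.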

The generator is split into three phases. The first phase looks for variables. The second phase uses the information about the variables to generate the method signature. The third and last phase generates the method body. With the three phases combined, I could easily generate all the necessary classes and methods, both with and without parameters.
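
The three phases could be sketched roughly like this. It is not the actual generator: `MethodGenerator` and its method names are hypothetical, and the sketch assumes each variable appears only once per string and that the text contains no quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical three-phase generator for a single localized string.
public static class MethodGenerator {
    private const string patternVariable = @"{(.*?)}";

    // Phase 1: find variables such as "score:i" and split them into name and C# type.
    public static List<KeyValuePair<string, string>> FindVariables(string cell) {
        var variables = new List<KeyValuePair<string, string>>();
        foreach (Match match in Regex.Matches(cell, patternVariable)) {
            string[] parts = match.Groups[1].Value.Split(':');
            if (parts.Length != 2) throw new ArgumentException("Malformed variable: " + match.Value);
            variables.Add(new KeyValuePair<string, string>(parts[0], ToCSharpType(parts[1])));
        }
        return variables;
    }

    static string ToCSharpType(string suffix) {
        switch (suffix) {
            case "i": return "int";
            case "f": return "float";
            case "s": return "string";
            default: throw new ArgumentException("Unknown variable type: " + suffix);
        }
    }

    // Phase 2: build the method signature from the row name and the variables.
    public static string Signature(string row, List<KeyValuePair<string, string>> variables) {
        var parameters = new List<string>();
        foreach (var variable in variables) parameters.Add(variable.Value + " " + variable.Key);
        return "public string Get" + row + "(" + string.Join(", ", parameters) + ")";
    }

    // Phase 3: build the body, stripping the type suffix so "{score:i}" becomes "{score}".
    public static string Body(string cell) {
        string text = Regex.Replace(cell, patternVariable,
            match => "{" + match.Groups[1].Value.Split(':')[0] + "}");
        return "return $\"" + text + "\";";
    }

    public static string Generate(string row, string cell) {
        return Signature(row, FindVariables(cell)) + " {\n  " + Body(cell) + "\n}\n";
    }
}
```

Stripping the suffix in the third phase matters, because inside an interpolated string the part after the colon would otherwise be read as a format specifier.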

Runtime code
To make sure any calling code can access all strings, regardless of language, I created an interface for the entries classes. I had all the information I needed to generate the interface. It looks like this:

namespace TerrorSquid.Localization {
  public interface ILangEntries {
    string GetMainMenu_Play();
    string GetStats_SurvivalFormat(int day, int hour, int minute, int second);
  }
}

Implementations of the interface get generated like this:

namespace TerrorSquid.Localization {
  public class EntriesEnglish : ILangEntries {
    public string GetMainMenu_Play() {
      return $"Play";
    }
    public string GetStats_SurvivalFormat(int day, int hour, int minute, int second) {
      return $"{day}d {hour}h {minute}m {second}s";
    }
  }
}

To be able to switch between languages in a type safe way, I also generate an enum with all the languages the game supports.

namespace TerrorSquid.Localization {
  public enum Lang {
    English = 0,
    Norwegian = 1,
    French = 2,
    German = 3,
  }
}

For displaying the possible languages to toggle between in the UI, I wanted to avoid converting the enum to a string at runtime. That’s why I generated a class with the pre-built strings. All in the name of performance.

namespace TerrorSquid.Localization {
  public static class LangHelper {
    public const int NUM_LANGUAGES = 26;
    public static readonly string[] LANG_LABELS = {
      "English",
      "Norwegian",
      "French",
      "German",
      // The remaining labels are left out of this listing.
    };
  }
}

When using an enum to switch between languages, I also needed a way to know which class to instantiate based on the value of the enum. That could be generated too, like this:

namespace TerrorSquid.Localization {
  public static class LangEntries {
    public static ILangEntries Create(Lang lang) {
      switch (lang) {
        case Lang.English: return new EntriesEnglish();
        case Lang.Norwegian: return new EntriesNorwegian();
        case Lang.French: return new EntriesFrench();
        case Lang.German: return new EntriesGerman();
        // Without a default branch, not every code path returns a value.
        default: throw new System.ArgumentOutOfRangeException(nameof(lang));
      }
    }
  }
}
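
Both the enum and the cases of the switch follow directly from the column names of the sheet. A rough sketch of how that generation might look, with `LangCodeGenerator` as a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical generator for the language enum and the factory cases.
public static class LangCodeGenerator {
    // Builds the source of the Lang enum, numbering languages in column order.
    public static string Enum(IList<string> languages) {
        var sb = new StringBuilder();
        sb.Append("public enum Lang {\n");
        for (int i = 0; i < languages.Count; i++) {
            sb.Append("  " + languages[i] + " = " + i + ",\n");
        }
        sb.Append("}\n");
        return sb.ToString();
    }

    // Builds one switch case per language for the factory method.
    public static string FactoryCases(IList<string> languages) {
        var sb = new StringBuilder();
        foreach (string language in languages) {
            sb.Append("case Lang." + language + ": return new Entries" + language + "();\n");
        }
        return sb.ToString();
    }
}
```

Because both outputs come from the same list, the enum values and the factory cannot drift apart when a language is added to the sheet.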

The only thing I didn’t generate was the entry point that binds all of the generated classes together.

namespace TerrorSquid.Localization {

  public delegate void LanguageLoaded(Lang fromLang, Lang toLang);

  public static class Loc {

    public static ILangEntries entries;
    public static Lang CurrentLanguage { get; private set; }
    public static int NumLanguages => LangHelper.NUM_LANGUAGES;
    public static event LanguageLoaded OnLanguageLoaded = (f, t) => {};

    public static void SetLanguage(Lang lang) {
      Lang fromLang = CurrentLanguage;
      CurrentLanguage = lang;
      entries = LangEntries.Create(CurrentLanguage);
      OnLanguageLoaded.Invoke(fromLang, lang);
    }

    public static string GetLangLabel(Lang lang) => LangHelper.LANG_LABELS[(int)lang];
  }
}
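
As a sketch of how calling code might react to a language switch, the example below wires a UI element to the event. To keep it self-contained it uses trimmed stand-ins for the generated types, in a made-up `LocDemo` namespace with two languages and one entry. `PlayButton` and the Norwegian text are illustrative assumptions.

```csharp
namespace LocDemo {
    // Trimmed stand-ins for the generated types.
    public enum Lang { English = 0, Norwegian = 1 }
    public interface ILangEntries { string GetMainMenu_Play(); }
    public class EntriesEnglish : ILangEntries { public string GetMainMenu_Play() => "Play"; }
    public class EntriesNorwegian : ILangEntries { public string GetMainMenu_Play() => "Spill"; }

    public delegate void LanguageLoaded(Lang fromLang, Lang toLang);

    public static class Loc {
        public static ILangEntries entries;
        public static Lang CurrentLanguage { get; private set; }
        public static event LanguageLoaded OnLanguageLoaded = (f, t) => {};

        public static void SetLanguage(Lang lang) {
            Lang fromLang = CurrentLanguage;
            CurrentLanguage = lang;
            // Stand-in for the generated factory.
            entries = lang == Lang.Norwegian ? (ILangEntries)new EntriesNorwegian() : new EntriesEnglish();
            OnLanguageLoaded.Invoke(fromLang, lang);
        }
    }

    // Hypothetical UI element that re-reads its text whenever the language changes.
    public class PlayButton {
        public string Label { get; private set; }

        public PlayButton() {
            Loc.OnLanguageLoaded += Refresh;
        }

        void Refresh(Lang fromLang, Lang toLang) {
            Label = Loc.entries.GetMainMenu_Play();
        }
    }
}
```

Since the event is static, a real UI element would also need to unsubscribe when it is destroyed, or the event would keep it alive.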

Of all the runtime code written for the localization system, only 23 lines are written and maintained by hand. The other 20,173 lines are auto-generated from a Google Sheet.

So far, this is the easiest to maintain and the most efficient localization system I’ve ever written. At the very least, it solves our challenges about as well as I could hope for.

I hope this is of some inspiration.

Reveal trailer for TERROR SQUID: https://www.youtube.com/watch?v=xWz7nolopYQ
Discord: https://discord.gg/9wsFA4Z
Website: https://terrorsquid.ink/
Reddit: https://www.reddit.com/r/TerrorSquid/

About the Author

Thomas Viktil
