46. Optimizations From Underneath a Woolly Mammoth

farcryprimalmammothHeader Image from Far Cry Primal reveal trailer.

I’ve been super productive the past week or so. I worked on some fun relatively non-spoilery things, so I can actually talk about them (yay!). Be forewarned; It’s all very technical. But if you’re intrigued by optimization or procedural puzzle generation, press on.

I sent a quick build out to a few people the past couple weeks. Less for playtesting and more for overall thoughts. I’ve still got the next real test build incoming eventually, so if you’re on the list of testers and have been waiting a while….

Don’t worry, I see you.

Anyway, the next build will still be a while, as I have to essentially revise every area in the game. This will mostly amount to cutting lots of puzzles. But after that we can do some testing, and see if I cut too much.

Optimizing Save/Load

One of the “not-testers” this past couple weeks unfortunately lost their progress due to a VM crash. I had assumed this was probably a failure of my saving code (as no untested code is perfect) resulting in a corrupted file, but it turned out that there was no save file at all. The game was only saving when you exited the program. I had forgotten that I disabled periodic saves because there was a noticable hitch if I saved in the middle of gameplay.

Most of this hitch is just because there are so many puzzles floating around in the game world (about 460, with only 230 or so being accessible to the player, and the rest just in the scene as WIP areas or puzzles which I cut for now but that might find a place eventually), but I figured there had to be some performance gains to be had, since I hadn’t really written it in the first place with an eye to performance.

Even still, I tend to avoid doing unnecessary optimization work if I can, so my first inclination was to take the saving and just spread it out over multiple frames. The main issue there is that there is a lot of state that needs to be kept track of to resume saving in an iterative fashion. In theory, this is what coroutines are for, a simple way to make any function you have into an iterative function, by allowing the function to return an iterator which contains its own stack contents. Thus far, I haven’t found a good use for coroutines, because most of the time it’s easier to just store the state in a variable or two and manually resume on the next frame. But aha! Here was the chance for the coroutine to shine!

Except naturally, when I put the function into a coroutine, initializing the coroutine took longer than the actual saving did. So my hitch got worse instead of better. Another “nice” side affect was that I could no longer handle .NETs I/O exceptions because you’re not allowed to use the yield statement (required for a coroutine) inside of a try{} block.

I tried to make the function use asynchronous file operations as well, but this had a similar overhead hitch. Better than the coroutine, but I wasn’t making gains here.

I could’ve continued on. Maybe used threads. Maybe created a coroutine/thread at the startup of the game and just leave it always running along and waiting for saving tasks, but instead I took all this as a hint that I was wasting time on API nonsense and I should actually instrument my code a little and do some real optimizations.

Since the save system is currently human-readable, writing the save files naturally involves writing a lot of text to the file. So, it wasn’t a surprise to see StreamWriter.Write() eating up the majority of the time spent saving. What was a bit more of a surprise was realizing that most of the time spent in that function was on converting integers to strings.

Now, be warned, the following code may disturb you greatly, but this is what I was actually doing for the inner loop which writes the state of all tiles on all panels in the game:

int currentValue;
for(int x = 0; x < (panel.width*panel.height); x++)
     if(panel.currentState[x] == true) currentValue = 1;
     else currentValue = 0;
     sw.WriteLine(currentValue); //sw is the StreamWriter we're using

Now it should be apparent why I was spending so much time converting integers into strings. For every single tile on every single puzzle in the game, I’m calling the StreamWriter.Write() command and passing it an integer. This means that every time, StreamWriter has to convert it to a string before it can write it to the file.

The reason I was passing an integer into the StreamWriter instead of the booleans directly is because if I simply passed in the booleans, StreamWriter will output them into the file as “True” or “False”, and wanted to write the panel state as “1” and “0”.

So obviously, I made the simple change to the following:

for(int x = 0; x < (panel.width*panel.height); x++)
     if(panel.currentState[x] == true)  sw.WriteLine('1');
     else  sw.WriteLine('0');

Just this one change got me about a 3x speedup, by avoiding the string allocations and the integer to string conversion. But there’s a few more things I decided to do to make it a bit faster. The first should be pretty obvious:

char[] panelValues = new char[(panel.width*panel.height)+(panel.height)];
int ii = 0;
for(int y = 0; y < panel.height; y++)
     for(int x = 0; x < panel.width; x++)
          panelValues[ii++] = (panel.currentState[x+panel.width*y] ? '1':'0');
     panelValues[ii++] = '\n';

The main change here is that I simply pre-allocate a char[] buffer for the entire string that represents the state of the panel, and only access the file in one call to StreamWriter. The reason to switch to char[] instead of string should be obvious: strings produce more garbage. And since we’re in a managed language, the less garbage the better.

There’s a few details that are not completely self-explanatory though. One is that I’m now iterating over both width and height. This is just an aesthetic change in that I wanted the panel state to be written 2 dimensionally in the file for better human readability. This is also why the size of the char[] gets and additional panel.height added onto it at the end. I do this because I need an additional column of characters compared to panel states, in order to add the newline characters at the end of each row.

As a bonus optimization on saving, for the situations in which I have no choice but to write integers to the file, I wrote a function to convert integers to char arrays. The function is not very general purpose (obviously), but is much faster than whatever StreamWriter was using.

Note: This code has been updated from the original version due to some bugs pointed out by “Steve” in the comments. If you’d like to view the original buggy code. I’ve posted it here: https://pastebin.com/HNNp7Zug 

    private static char[] IntToCharArray(int input)
        short len = 0;
        if(input < 10) len = 1;
        else if(input < 100) len = 2;
        else if (input < 1000) len = 3;
        else if (input  0)
            output[len-1] = (char)(48+(input%10));
            if(len>1)input = (int)(input/10);
        return output;

Although the method of determining how many places there are in the number is pretty dumb looking, it’s super fast. If we wanted to support bigger numbers, we could just add some more lines.

I also did a few optimizations to the loading function. I don’t want to get bogged down too much in the technical details, but essentially I did the same type of thing as I did for saving. I buffer up each line of the panel before parsing it, so as to cut down on string generation as well as calls to the StreamReader.

Procedural Puzzle Generation

So, one other thing that I’ve wanted to work on for a while is improving the tools that I use to make the game. Basically all of the puzzles in the game thus far have been hand designed. However, I’ve made use of a brute-force puzzle solver in some cases.

The puzzle solver worked by using a recursive function. This is a bit complicated, but the important thing to know is that it was very slow. So slow in fact that Unity would kill its own process because I froze the UI for so long.

Since this was a case where I didn’t care too much about the overhead cost of firing up a coroutine, the first thing I did was make the meat of the solver work as a coroutine. That way I could return control to the Unity UI and keep Unity from thinking something had crashed.

I was going to inline the code for the recursive function that iterates over all the possible states of the panel, but it’s too big and unwieldy to be properly readable in this blog’s formatting. So you can look at it on pastebin here: https://pastebin.com/deBHmn0h

A Clever Optimization

One of the biggest performance issues that I was having with the puzzle solver code is that it was an extreme memory hog. This would be bad enough if the code was just executing straight through, but because each recursion of the function was encapsulated in a new coroutine, the memory problem gets out of control pretty quickly. I even ran into a situation (possibly to do with script reloading) where the Unity garbage collector failed to clean up and leaked several gigabytes of heap memory.

The natural solution to this is to change the method by which we traverse all possible states of the panel. Ideally, we would like to change this from a recursive function into a loop, but the way in which we should do that is not entirely clear.

Not to pat myself too much on the back, but when thinking about how to speed this up, I think I made a clever connection. Because the panels are essentially series of bits, we can just think about storing the state of the panel in the bits of an integer. In C# a ulong is a 64-bit integer. This gives us enough space to cover the state a panel with 64 fields in it, or an 8×8 panel.

Importantly, we also can know ahead of time how many possible states there are in a panel. The formula is as follows:

c = 2w×h

Where c is the count of all possible states, w is width of the panel, and h is height of the panel.

Funnily enough, if we store c in binary, it will take a (w×h)+1 bits.

For example, if we have a 4×4 panel, there are 16 tiles on the panel (w×h). By plugging that into our formula we get 216 or 65536 states, which in binary is:

0001 0000 0000 0000 0000

This seems useless, but if we take one less than that, or 65535, we get the following in binary:

1111 1111 1111 1111

Which, if we re-project that as a 4×4 panel where means that tile is on and means that tile is off, is the state in which all tiles are toggled on.


If we then take that number and continually subtract one from it over and over again until we reach 0, we will have covered all possible binary states between. This principle means we can create the following iterative function which will visit all possible states of the panel:

ulong possibilities = (ulong)Mathf.Pow(2, width*height);
while (true)
     for(int x = 0; x < currentState.Length; x++)
          currentState[x] = ((possibilities & (ulong)(1<<x)) != 0);
     if(possibilities == 0) break;


The above code was a bit simplified so that it could be inlined. If you’d like to look at the final complete function, you can find that on pastebin here: https://pastebin.com/5xGi7y5V

In addition to being much more lean on memory, it is much faster. It is also easily resumable with or without coroutines because the state is so simple that we only need to store one variable between runs: possibilities.


Wrapping up, we have done some good optimization work on saving and loading. Additionally, because of our puzzle solver optimization, we’ve laid the ground work towards enabling an offline puzzle generator. But I’ll have to cover that in the next devlog post….

I know. I’m cruel… Thanks for reading. 🙂

3 thoughts on “46. Optimizations From Underneath a Woolly Mammoth

  1. Thanks for this blog — I’m always interested to read about details of optimisations and hacks (even in code I don’t know, in a language I don’t use), and I’m looking forward to seeing the actual game!

    I’m confused by ‘IntToCharArray’ though. My intuitive understanding of this is that (for example) the integer 5678 should end up in a file as “5678”, so is being converted into an array of ASCII values representing “5”, “6”, “7” and “8” in that order. And I can see the “8” (in this example) being put into the array, but I’d then expect the input to have the 8 subtracted and then be divided by 10, to give “567” (with the “7” ready to be taken next time). Instead it’s having 10^len subtracted, which is either 14/9/8/11 (if ^ is XOR in this language) or 10000/1000/100/10 (if it’s an exponent). Neither makes sense to me — is it a bug, or am I missing something?

    I also notice that ‘input_copy’ is never used (presumably it’s what should be in the loop, to avoid modifying ‘input’) and that the digits are added to the array left-to-right (so if the loop was sort-of-right-shifting as I expected, the array would represent “8765”, a reversed number).

    Apologies if I’m being daft and not seeing something obvious here.


    1. No, you’re definitely not being daft. You pointed out some pretty obvious bugs that I missed. Late night programming at it’s finest.

      I’ll update the code in the article, and the game of course.

      Not taking the 8 off of 5678 before moving on to the next column is definitely an oversight, although it shouldn’t be a big deal so long as I’m converting to an integer. The fractional component will get discarded in that case.

      As for the ^, it’s a double whammy of stupidity. First, I thought that ^ was the exponent operator, and it is actually the XOR operator in C#, and basically every other programming language. Second, I don’t need to be doing an exponential subtraction anyway, and should be dividing by 10 for each place as you mentioned.

      input_copy is just cruft which should have been deleted when I moved this out into a function.

      You’re also right about the characters being added to the array backwards.

      So…the question is why did this work at all? Well, it didn’t for any number larger than 10. Luckily, the only integers that I’m saving into the file right now are the width and height of panels. Most of the panels in the game are less than 10×10. And additionally, in the case that they are, the save file doesn’t fail to load, it’s just that that panel is skipped and will be in its default state after the load happens. This is necessary to maintain some level of compatibility between builds, because I might have deleted the panel or changed it drastically.

      I do output a warning when they don’t match up. But I had it disabled, oops! I probably need to add a couple verbosity levels so I’m not just disabling error printing.

      No wonder it was working so fast. 😛


  2. It did occur to me that you’d been working with <10 until now so not seen the errors.

    Glad to have been of help (if indeed you hadn't already figured it out).



Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s