72. Gotta Save Fast

Been a little busy with various things, including dealing with my girlfriend’s dog having gotten run over. He’s okay but with a broken hip. Somehow, getting a dev log post out in time for the end of November totally slipped my mind. In any case, if you haven’t been keeping up with the video dev logs. Now would be a good time. The latest one is a bit beefy:

With that said, I’d like to dive a bit more deeply into what was involved in getting the save game hitch down to an acceptable level. As of writing this post, There’s still a hitch doing the actual saving, so the job is really only halfway done. However, I did do some work to get screenshot saving down to an acceptable level, and I think that’s a problem that many other games have, so it may be more useful to cover it in detail.

If you just want a decent solution to the problem, then feel free to skip to the end of the article to the section labeled “The Solution” and take a look at the code posted there. For the rest of you interested in hearing my journey to get there, read on!

The Problem

So why are we wanting to save screenshots as part of our save game routine? Well, the answer here is pretty simple, which is that we want the load game menu to show a screenshot of where the player was when they last saved. That way they can easily tell whether they are picking the right file to load:

Unity’s API really only gives you a few functions to handle taking screenshots. The first one and the easiest to use is ScreenCapture.CaptureScreenshot. It’s a simple to use function that takes a screenshot and saves it to a png file. This is great, right?

The problem is that it causes a massive lag spike. The peak of this single frame spike is close to 66 milliseconds. Needless to say this is unacceptable for a game that is targeting under 16.66ms per frame (and don’t even get me started about 8.33ms 120FPS which is apparently so impossible that Unity didn’t see fit to label it in this graph)

A Million and One Ways to Save a Frame

Okay okay, so lets look at some of our other alternatives. We also have CaptureScreenshotAsTexture and CaptureScreenshotIntoRenderTexture. The second one will come back much later, but for now lets take a look at the first one:

That’s an improvement, however we are still exceeding 33ms, which is slow enough to cause a noticable hitch even if the game were locked at 30fps. Additionally, this doesn’t even include the hit from writing out the file, as we only have a texture in memory.

Alright, so we should look into writing out this screenshot into a file. The easiest way to do this is the following:

System.IO.File.WriteAllBytes("test1.png", ImageConversion.EncodeToJPG(ScreenCapture.CaptureScreenshotAsTexture()));

Here’s the performance graph for that:

So, obviously this is even worse than ScreenCapture.CaptureScreenshot but we should probably have expected that because the fine folks at Unity HQ know what they’re doing (perhaps), and because we’re just taking a function designed for putting a screenshot into a Texture2D and then we’re writing that out to disk.

However this isn’t actually the approach that I used, because I was actually unaware of both CaptureScreenshotAsTexture and CaptureScreenshotIntoRenderTexture when I initially went about writing this code. I don’t know if they are new API additions or if I am just a bit slow. Either way they both got ignored for no good reason. They are perfectly fine approaches for certain cases, but with that said I probably wouldn’t have stumbled upon my ultimate solution if I had known about them. Instead, my approach was using Texture.ReadPixels.

ReadPixels requires implementing in a coroutine delayed to WaitForEndOfFrame(). This seems alright as far as performance goes, not quite under the 60FPS threshold here, but an improvement. However, this graph is simply calling ReadPixels on a 3840×2160 framebuffer, without even writing that out to a file. If we take a similar approach to the way we did earlier, then we’d get the following code and the following performance characteristics:

    IEnumerator CoWriteScreenshotTemp(int width, int height)
    {
        yield return new WaitForEndOfFrame();
        if(screenshot==null) screenshot = new Texture2D(width, height, TextureFormat.RGB24, false);
        screenshot.ReadPixels(new Rect(0, 0, width-1, height-1), 0, 0, false);
        screenshot.Apply();
        screenshot_stored = true;
        System.IO.File.WriteAllBytes("test2.jpg", ImageConversion.EncodeToJPG(screenshot));
    }

Damn, so now we’re almost as bad as CaptureScreenshotAsTexture, while having made the code significantly more complicated. However, because we already need to do this ReadPixels process in a Coroutine, perhaps we can do it over multiple frames, so what might happen if we tossed in a few yield return‘s in there and spread that out over a couple frames, and while we’re at it, why don’t we put the file writing on its own frame?

Okay, performance implications are not always obvious in these high-level languages. We have almost an equivalent hitch across two frames now. Still, it seems like there has to be some fertile ground in this direction, even with all the obfuscation of a high-level language. So perhaps the issue is that we’re saving out the file on the main thread. Perhaps if we spin up a thread to write out the file?

Thread _encodeThread;

IEnumerator CoWriteScreenshotTempMultiFrameWithFileWriteThread(int width, int height)
    {
        if(screenshot==null) screenshot = new Texture2D(width, height, TextureFormat.RGB24, false);
        yield return new WaitForEndOfFrame();
        screenshot.ReadPixels(new Rect(0, 0, width/2, height-1), 0, 0, false);
        yield return new WaitForEndOfFrame();
        screenshot.ReadPixels(new Rect(width/2, 0, (width/2)-1, height-1), width/2, 0, false);
        screenshot.Apply();
        yield return 0;
        rawBytes = screenshot.GetRawTextureData();
        QueuedPath = "test4.jpg";
        _encodeThread = new Thread(EncodeAndWriteFileWithArray);
        _encodeThread.Start();
    }

    void EncodeAndWriteFileWithArray()
    {
        byte[] jpgBytes = ImageConversion.EncodeArrayToJPG(rawBytes,
                          UnityEngine.Experimental.Rendering.GraphicsFormat.R8G8B8_UInt,
                          (uint)Screen.width, (uint)Screen.height, 0, 75);
        File.WriteAllBytes(QueuedPath, jpgBytes);
        Globals.updateLoadGameMenu = true;
    }

Well, in spite of the fact that the above graph looks very similar, the scale on the left has changed, and now we are peaking a bit past 33ms rather than all the way to 66ms. So this is definitely an large improvement. However, we’re still not even halfway to 60FPS… One thing we could do pretty easily is to just add in even more yield return new WaitForEndOfFrame()s and read from the framebuffer over even more frames. However, because we are just reading from whatever the most recent frame was, if the scene changes much, this pretty easily can result in problems such as below:

Notice the visible seam down the middle of the frame. The issue is that the framebuffer is changing while we are trying to read from it. The performance of spreading this out over 5 different slices would look like the following:

Admittedly, I was a bit stuck here, but then someone suggested that I take a look into doing a copy GPU side so that I could simply read from the same framebuffer and not have any of these seams. I apologize for not remember who exactly it was who suggested it. In retrospect it seems pretty obvious, but as you might imagine I was pretty lost in the details at this point.

The easiest way to do this would be to use our earlier friend CaptureScreenshotIntoRenderTexture (I told you it’d come back!). As this will do exactly what the doctor ordered, capture the framebuffer into a RenderTexture. Unfortunately (or perhaps fortunately as the case may be), I didn’t know about this function and so I proceeded to dive into several ways of doing this type of thing, ultimately settling on CommandBuffers. (In the future, this will be the most outdated part of this article, as CommandBuffers are part of the legacy graphics pipeline for Unity and will most likely be removed.)

Okay, I’m going to save the code breakdown for a bit later, because otherwise I’d be putting a bunch of code in here that’s marginally different from the final code. But we’ll just say that we set up a CommandBuffer to copy the contents of the screen over to a second RenderTexture (which is really just a GPU-side texture). As part of that process, we have to blit it to a temporary buffer so that we can flip the image vertically. Due to some vagaries in how Unity handles its graphics pipeline, the image gets flipped after it is post-processed but before it is fed into the final display. We come out the other end with a RenderTexture, only it’s actually vertically oriented correctly, unlike CaptureScreenshotIntoRenderTexture, that just leaves it upside down.

This means that, unlike with our earlier staggered read, we can split the read from this RenderTexture across as many frames as we want without having any seams show up. The performance characteristics of doing this across 5 frames looks like the following:

Okay, so another improvement, but there still seems to be a large hitch at the end associated with getting the raw bytes from a Texture2D, and even the 5-frame-staggered readback from the framebuffer into the Texture2D is exceeding our 16ms frame boundary….enter:

The Solution

So, the ideal way to avoid this readback and conversion hitch is…maybe you can guess?

Well, if you didn’t want to guess, I’m gonna tell you anyway; tell the GPU to do the conversion and readback for us in an asynchronous way. The fact is that we will always end up with a bit of a hitch if we are using the CPU to tell the GPU to give us some texture data right now, because that means it has to stop or finish what it’s doing and focus entirely on that task. Instead, we can tell the GPU to give us that texture data “whenever it has a moment”, this is called a asynchronous readback request and can be done in Unity using the AsyncGPUReadback.Request function.

Because this generates garbage, I instead chose to use the variant that uses NativeArray: AsyncGPUReadback.RequestIntoNativeArray. Note that this does mean that we will want to make sure we dispose of the NativeArray when we are done with it. For my purposes, I really only dispose of the array and reallocate it if the resolution changes.

First the performance characteristics of this approach:

Ah, we are finally at something acceptable. You may notice the peak shows slightly above the 16ms line, so it’s clear there is still some hit from the async readback, however in practice the stutter is not noticeable. I may revisit this further before release to see if I can squeeze out a bit more performance here. But for now I am happy enough with this part of the equation. For Taiji, I still have to do some work to improve the actual writing of the save file, but the screenshot taking is “good enough to ship”, as far as I’m concerned.

The actual save file performance is the left spike, and the right is the screenshot taking

So, here’s a snippet of something resembling the final code that I used. This is not exactly a plug and play class, but should be a good basis for getting something working in your own projects. I tried to at least have everything necessary in the snippet:

string QueuedPath;
NativeArray<byte> imageBytes;
byte[] rawBytes;
Thread _encodeThread;

public Material screenshotFlipperMat;

Vector2 previousScreenSize;

CommandBuffer cBuff;
public RenderTexture screenRT, tempRT;
    

//Call this function whenever you want to take a screenshot
public void TakeScreenshot(string fileOutPath)
{
    StopAllCoroutines();
    StartCoroutine(SaveScreenshot(fileOutPath, Screen.width, Screen.height));
}

//Helper function that will be called later
void InitializeBuffers()
{
    //We do some checks to see if the resolution changed and we need to recreate our RenderTextures
    bool resChanged = (previousScreenSize.x != Screen.width) || (previousScreenSize.y != Screen.height);
    previousScreenSize = new Vector2(Screen.width, Screen.height);
    
    //The array is Screen width*height*3 because we are going to use a 24bit RGB format (TextureFormat.RGB24)
    if(imageBytes.IsCreated == false) imageBytes = new NativeArray<byte>(Screen.width*Screen.height*3, Allocator.Persistent);
    else if(resChanged)
    {
        imageBytes.Dispose();
        imageBytes = new NativeArray<byte>(Screen.width*Screen.height*3, Allocator.Persistent);
    }
    
    if(tempRT == null || resChanged) tempRT = new RenderTexture(Screen.width, Screen.height, 24);
    if(screenRT == null || resChanged) screenRT = new RenderTexture(Screen.width, Screen.height, 24);
    
    //We build our command buffer, which includes a double blit using a special 
    //material (shader in article) so that we can flip the output from the backbuffer
    
    //This double blit seems to be necessary because CommandBuffer.Blit will not allow us to blit using a material
    //if we are blitting from the backbuffer (BuiltinRenderTextureType.CurrentActive)
    if(cBuff == null || resChanged)
    {
        cBuff = new CommandBuffer();
        cBuff.name = "ScreenshotCapture";
        cBuff.Blit(BuiltinRenderTextureType.CurrentActive, tempRT);
        cBuff.Blit(tempRT, screenRT, screenshotFlipperMat);
    }
    GetComponent<Camera>().AddCommandBuffer(CameraEvent.AfterImageEffects, cBuff);
}


//Function to dispose of our imageBytes from external classes
public void Dispose()
{
    imageBytes.Dispose();
}

//It may be possible to do this in one frame instead of as a coroutine, but I have not tested
IEnumerator SaveScreenshot(string fileOutPath, int width, int height)
{
    InitializeBuffers();
    yield return 0;
    Camera.main.RemoveCommandBuffer(CameraEvent.AfterImageEffects, cBuff);
    QueuedPath = fileOutPath;
    AsyncGPUReadback.RequestIntoNativeArray(ref imageBytes, screenRT, 0, TextureFormat.RGB24, ReadbackCompleted);
}

void ReadbackCompleted(AsyncGPUReadbackRequest request)
{
    if(request.hasError) return; //We just won't write out a screenshot, not a huge deal
    _encodeThread = new Thread(EncodeAndWriteFile);
    _encodeThread.Start();
}

void EncodeAndWriteFile()
{
    rawBytes = imageBytes.ToArray();
    byte[] jpgBytes = ImageConversion.EncodeArrayToJPG(rawBytes, UnityEngine.Experimental.Rendering.GraphicsFormat.R8G8B8_UInt, (uint)Screen.width, (uint)Screen.height, 0, 75);
    File.WriteAllBytes(QueuedPath, jpgBytes);
    Globals.updateLoadGameMenu = true;
}

Additionally, you’ll need a shader to do the backbuffer flipping, so here’s something for that. Better could perhaps be done, but this is based off of an internal Unity copy shader, so maybe not much better:

Shader "Custom/ScreenshotFlipper" {
    Properties{ _MainTex("Texture", any) = "" {} }
    SubShader {
        Pass {
            ZTest Always Cull Off ZWrite Off
 
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #pragma target 2.0
 
            #include "UnityCG.cginc"
 
            sampler2D _MainTex;
            uniform float4 _MainTex_ST;
 
            struct appdata_t {
                float4 vertex : POSITION;
                float2 texcoord : TEXCOORD0;
            };
 
            struct v2f {
                float4 vertex : SV_POSITION;
                float2 texcoord : TEXCOORD0;
            };
 
            v2f vert(appdata_t v)
            {
                v2f o;
                o.vertex = UnityObjectToClipPos(v.vertex);
                float2 uv = v.texcoord.xy;
                uv.y = 1-uv.y;
                o.texcoord = TRANSFORM_TEX(uv, _MainTex);
                return o;
            }
 
            fixed4 frag(v2f i) : SV_Target
            {
                return tex2D(_MainTex, i.texcoord);
            }
            ENDCG
 
        }
    }
    Fallback Off
}

Hopefully this has been a fun deep dive for you more technically-minded folks. Lord knows it was a lot of work to write it, but I had fun too. 🙂

71. Trees in the Wind

I’ve wanted to get animation for the trees into the game for a while, but could never quite manage it. I’ve tried many different approaches to this, but always the main issue is that the game renders at a fixed low-resolution pixel grid.

Some other pixel art games choose to render the game at a higher internal resolution than that of the actual pixel art, which means that you can rotate sprites without much aliasing. You can see a comparison between a high-res rotation-based animation (left) and how the effect breaks down when you render at a low resolution (right):

I have never liked the look of pixel art games that mix different resolutions, so I chose to render Taiji in a way that would force all effects in the game to be rendered at the same resolution of the base pixel art. But as you can see above, this means that rotating pixel art tends to cause strange artifacts that sort of look like edges sliding across the image. Obviously, this is very unaesthetic looking, and we need to try something else.

One possibility that I tried was to add in some noise to attempt to jitter out the sampling and create a smoother look. This removes the “sliding edges” appearance, but ends up adding in a lot of noise along edges. The effect could perhaps work well with a game that has a more forgiving art-style with a lot of noise built into the graphics.

So, with a couple of failures under my belt, I decided to rule out large motions such as rotating the entire tree, and instead I focused my efforts on animating the leaves on their own. This type of effect can be done fairly easily in a shader by simply adding in a noise offset when you sample the texture for the leaves.

This is certainly an improvement, but the effect is a bit too strong. Also if you look at it closely, it feels more like the tree is underwater than being effected in the wind. We could tone the strength of the distortion down, but then the motion becomes so subtle that it’s almost not worth having.

Another possibility that I attempted was to custom author a secondary texture which would control how the distortion was applied. I tried using a noise texture with leaf pattern built into it. I even did some tests pre-rendering leaves with Blender so that I could use the scene normals of the leaves to modulate the distortion.

I didn’t save this iteration of the shader, but suffice to say that it did not work much better than the purely random noise I was using earlier.

However, I started to think that an approach similar to how I animated the grass would be effective. The grass is essentially just a flat texture on the inside, with all the distortion happening along the outside edges.

So what would it look like if I did the same for the trees?

We’re getting close! This effect is even more pleasing, with a better balance between the details of the original pixel art and significant enough motion to be worthwhile. However, the motion feels a bit unnatural because it is confined completely to the outside edges.

What I chose to do to resolve this was to re-incorporate the idea of having a secondary texture control where the distortion effect can be applied. When used to highlight the internal edges, this forms the final effect. The wind map texture is below on the left. You can see that some interior pixels are colored red, those are the ones that are allowed to be distorted in the final effect on the right:

Overall, I’m pretty happy with how this came out. It adds some much needed motion to the trees, giving those scenes a more dynamic feel, and it doesn’t distort the base pixel art so much that it feels unnatural.

For a fun bonus, remember when I said that the unconstrained effect looked like water? I ended up using the same effect for this reflection in the water:

68. In The Mountains of Madness (Win32 Wrangling Across the API Boundary)

As I mentioned in the previous blog post, I ran into some issues with mapping the mouse movement from the screen into the game. One of those issues required venturing outside the bounds of the Unity API.

So, for the longest time, I had a certain number of playtesters who were reporting that the mouse movement in the game was very sluggish for them. Unfortunately, I was not able to reproduce this issue until I purchased another monitor. It’s possible that the issue still exists in other circumstances and for other reasons, but in this blog post I will be covering what I did to at least resolve the issue that I could reproduce.

The Issue

“…Unity does not always properly tell you the resolution information for the monitor that the game is running on, and instead shows you the resolution of the “main monitor” as it is set in the display settings in Windows.”

–Me, earlier

Although Unity provides us with a Screen structure that contains information corresponding to the resolution of the screen and the size of the game window, this information is not always correct for our purposes. I was wrong however, in the previous post, in asserting that it always pulls info from the main monitor; what it actually does depends on whether the game is running in windowed mode or fullscreen mode.

If the game is running in windowed mode, then Screen.currentResolution is the native resolution of the monitor that the game is currently running on. However, if the game is running in fullscreen mode, then currentResolution will always be the same as the game’s internal rendering resolution, regardless of the final output resolution.

For example, if we are running the game fullscreen at 1920×1080 but the display is 3840×2160, even though Unity upscales the game to 3840×2160 in order to make it fullscreen, currentResolution will be 1920×1080.

This is a big problem, because in order to scale our mouse movement from screen space into world space, we need to know the ratio between the size of the screen pixels and game pixels. In fullscreen mode, if the player is running below their native monitor resolution, a single pixel in the game’s internal resolution will correspond to multiple pixels on the screen because of upscaling.

Funnily enough, even though we can get this ratio in windowed mode, it is totally unnecessary there, as the game is be rendered unscaled in windowed mode.

The Solution

Because I couldn’t reproduce the issue for the longest time, my initial hunch here was that solving this problem would involve coming up with some other way to handle mouse input appropriately.

I hoped that the new input system for Unity would allow me to solve this issue, however my initial experiments showed some kind of cumulative performance issue which would cause the game to become unplayable after about 30 minutes or so. (I am not sure if this issue has been resolved at this point, but in any case, I decided to pursue other possibilities for fixing the mouse movement.)

Upon finally reproducing the issue, and coming to the diagnosis that I mentioned in the previous section of this article, I set about trying to get the information about the monitor in some other way.

There are other functions in the Unity API, but none of them were very helpful. For example, you can easily find information about all the monitors using the Display class, but there is no information about which monitor the game is currently running on. Display.main is simply the main system monitor according to Windows (this was the cause of my earlier confused reporting about Screen)

So I did what any confused programmer would do at this point; I googled the problem. This led me to this thread on the Unity forums, and some very confusing code.

There’s no real good way around saying it, I just copied and pasted that code and tried to work from there. This was after locating the only thing I could find written officially by Unity about this type of functionality, which was really no help at all.

(I also found a Unity Answers thread with some similar code to the thread on the forums.)

So, in hopes of explaining what I have learned in a better way, and adding one more random thing to the internet about how to handle calls to the Win32 api from Unity. I will post the entire class I wrote and we’ll go through the code as far as I can explain it.

First, the code:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using System;
using System.Runtime.InteropServices;

public static class MonitorInfo 
{
    [DllImport("user32.dll")]
    private static extern IntPtr GetActiveWindow();
    [DllImport("user32.dll")]
    private static extern IntPtr MonitorFromWindow(IntPtr hwnd, int flags);
    
    [StructLayout(LayoutKind.Sequential)]
    public struct RECT
    {
        public int left;
        public int top;
        public int right;
        public int bottom;
    }
    
    [StructLayout(LayoutKind.Sequential)]
    public class MONITORINFO
    {
        public int cbSize = Marshal.SizeOf(typeof(MONITORINFO));
        public RECT rcMonitor = new RECT();
        public RECT rcWork = new RECT();
        public int dwFlags = 0;
    }
    
    [DllImport("user32.dll", CharSet = CharSet.Auto)] [return: MarshalAs( UnmanagedType.Bool )]
    private static extern bool GetMonitorInfo(IntPtr hmonitor, [In, Out] MONITORINFO info);
    
    
    public class monitor
    {
        public int width, height;
    }
    
    public static monitor current;
    static MONITORINFO info;
    public static bool isValid;
    
    public static void Update()
    {
        if(info == null) info = new MONITORINFO();
        isValid = GetMonitorInfo(MonitorFromWindow(GetActiveWindow(), 0), info);
        if(isValid)
        {
            if(current == null) current = new monitor();
            current.width = info.rcMonitor.right - info.rcMonitor.left;
            current.height = info.rcMonitor.bottom - info.rcMonitor.top;
        }
    }
    
    
}

Looking at the boilerplate at the top of the file, it’s pretty standard fare, however we need the following in order to interface between Unity’s API and the Win32 API layer:

using System.Runtime.InteropServices;

Next, in our class we need to declare prototypes for the external functions that we plan on calling. We also tell Unity with the [DllImport] attribute to import these functions at runtime from the user32.dll file.

[DllImport("user32.dll")]
private static extern IntPtr GetActiveWindow();
[DllImport("user32.dll")]
private static extern IntPtr MonitorFromWindow(IntPtr hwnd, int flags);

These function definitions are based off the interfaces specified for these functions on MSDN:

GetActiveWindow()

HWND GetActiveWindow();

MonitorFromWindow()

HMONITOR MonitorFromWindow(
  HWND  hwnd,
  DWORD dwFlags
);

A big part of the annoyance here is these types that Microsoft uses for their API calls. What the heck is an HWND? Well, unfortunately I do have some experience with the terrible windows API, so I know that HWND is a Handle to a WiNDow. Similarly for HMONITOR, which is a handle to a monitor. And by handle, they just mean a 32-bit pointer.

God knows how you’re supposed to find this out, but the type that we’re supposed to use in C# to deal with 32-bit pointers is IntPtr.

Okay, so that just leaves DWORD to figure out.

Well, a DWORD, or Double WORD is just a 32-bit integer. This is essentially based off the idea that the processor word size is 16 bits (which it is not, but probably was at the time the Windows API was designed).

Anyway, moving on, we can just use int in our C# code, as the C# integer size is 32 bits. That gives us the definitions below:

private static extern IntPtr GetActiveWindow();
private static extern IntPtr MonitorFromWindow(IntPtr hwnd, int flags);

private and static may not be necessary for you, but in my case this is part of a static class so that it can be easily accessed at global scope from all my other classes in Unity. Static classes in C# require all members to be declared static (and unfortunately it doesn’t do this automatically, we actually have to type static a billion times). I also chose to make these members private because this makes this all a bit less worry-prone as a globally scoped class.

So, next we have a couple of structure definitions:

[StructLayout(LayoutKind.Sequential)]
public struct RECT
{
    public int left;
    public int top;
    public int right;
    public int bottom;
}
[StructLayout(LayoutKind.Sequential)]
public class MONITORINFO
{
    public int cbSize = Marshal.SizeOf(typeof(MONITORINFO));
    public RECT rcMonitor = new RECT();
    public RECT rcWork = new RECT();
    public int dwFlags = 0;
}

These are again, based on the MSDN documentation for these structures.

Rect

typedef struct tagRECT {
  LONG left;
  LONG top;
  LONG right;
  LONG bottom;
} RECT, *PRECT, *NPRECT, *LPRECT;

MonitorInfo

typedef struct tagMONITORINFO {
  DWORD cbSize;
  RECT  rcMonitor;
  RECT  rcWork;
  DWORD dwFlags;
} MONITORINFO, *LPMONITORINFO;

We could use long in the C# code for the RECT members, and it would technically be more correct, however integers work just fine here. This may be some legacy feature of the API though.

We also need to use the [StructLayout] attributes, because normally, in C# the compiler is free to re-order the elements of a struct or class for memory efficiency purposes, however in our case we need the elements of these structures to be in the exact order that they are in the Win32 API.

One strange detail is this line:

public int cbSize = Marshal.SizeOf(typeof(MONITORINFO));

This is essentially code that I copied from the forum post I mentioned earlier, and as such I can’t entirely explain why it works this way, but the Marshal class is part of the System.Runtime.InteropServices that we included earlier. Specifically, it is a class that is used in order to convert between managed and unmanaged memory types.

Based on that, I can hazard a guess that what we’re doing here is actually pulling the size of the MONITORINFO struct from the Win32 API side of the the code, and not the size of the MONITORINFO class that we are currently in the process of defining in our own code. This seems reinforced by the fact that if we change this line to….

public int cbSize = sizeof(typeof(MONITORINFO));

…then Unity will complain that MONITORINFO is undefined.

Okay, moving on. Now we have this whopper of a function prototype definition.

[DllImport("user32.dll", CharSet = CharSet.Auto)] [return: MarshalAs( UnmanagedType.Bool )]
private static extern bool GetMonitorInfo(IntPtr hmonitor, [In, Out] MONITORINFO info);

This is, of course, still based on the definition on the MSDN page:

GetMonitorInfoA()

BOOL GetMonitorInfoA(
  HMONITOR      hMonitor,
  LPMONITORINFO lpmi
);

“Wait, hold on, why is this called GetMonitorInfoA?”

Answering that will also answer why we need to have this CharSet = CharSet.Auto attribute as part of the dll import.

There are two versions of GetMonitorInfo in the Win32 api, one for ASCII characters (GetMonitorInfoA, which is the legacy version) and one for UTF-32 characters (GetMonitorInfoW).

“But wait, why on earth are we worried about text?”

We only have to care about this because this function could potentially return a MONITORINFOEX structure, which contains a string that is a name of the monitor. In our case, we are just throwing that data away and using the smaller MONITORINFO struct, but we still have to support it as part of our function prototype definition.

*sigh*

Another oddity as part of the Attribute definition is this:

[return: MarshalAs( UnmanagedType.Bool )]

Why do we have to marshal bools? Don’t ask me, but the function returns a bool specifying whether or not a monitor could successfully be found, and if you actually want to know that, you’ll need to marshal the bool across the API boundary, because managed and unmanaged bools are not compatible.

The only detail that might be confusing about the actual definition of the function is the [In, Out] attribute. This seems to be the way to specify that a parameter is passed by reference across the API boundary here. Changing it to ref does not work.

At this point, the rest of the code should be fairly understandable, if you have any experience with Unity C# coding:

public class monitor
{
    public int width, height;
}
    
public static monitor current;
static MONITORINFO info;
public static bool isValid;
    
public static void Update()
{
    if(info == null) info = new MONITORINFO();
    isValid = GetMonitorInfo(MonitorFromWindow(GetActiveWindow(), 0), info);
    if(isValid)
    {
        if(current == null) current = new monitor();
        current.width = info.rcMonitor.right - info.rcMonitor.left;
        current.height = info.rcMonitor.bottom - info.rcMonitor.top;
    }
}

One thing that’s worth noting, is that I keep track of a isValid bool publicly, so that I can always check if the calls to the Win32 api returned valid data before I go around using it.

Implementation

So with all that done, we can now change the code that handles the mouse scaling to the following:

Vector2 ScreenRes = new Vector2(Screen.width, Screen.height);
if(MonitorInfo.isValid) ScreenRes = new Vector2(MonitorInfo.current.width, MonitorInfo.current.height);

This means, that as long as we are able to return some valid information from the Win32 API, we will use that. If not, we will fall back to the Screen structure that Unity provides for us, which may be wrong in some cases.

Hope you learned something!

67. Mouse Wrangling

Okay, so I’m a bit late on this devlog entry, but in one of my recent video dev logs (Which I do weekly over on my YouTube channel; if you haven’t been checking them out) I promised that I would write about how mouse movement is handled in-depth, and so I’m going to pay off on my promise.

Mouse movement in games is almost always a bit more of a tricky thing to get right than it seems when playing a finished product. For one, the mouse is a direct input method, so when you move the mouse you expect an equivalent motion in-game, either of the camera or a cursor. This means that latency is much more noticeable here than it is on something indirect like a thumbstick.

Latency can be handled in any number of ways and is mostly outside the scope of this article, but I mention it because it’s a very common case where the technical details of how you handle input can have a large effect on how the game feels.

Mouse motion in Taiji is handled in a way that I haven’t seen any other games use. Most top-down or 2d games with mouse controls (i.e. Diablo, Civilization, Starcraft), just have you move the mouse around on the screen like it were your desktop, and the camera moves very rigidly and only under direct player control. However, in Taiji, camera movement is much more dynamic and loosely follows the position of your player avatar. This causes a bit of an issue, in that the camera may still be moving when the player is trying to use the mouse cursor to target a tile on a puzzle panel. Trying to hit a moving target with the mouse is a bit of a challenge that I am not interested in introducing into the game.

There are a few ways that this problem could have been resolved. One possible solution is to just never move the camera when a puzzle is visible and interact-able. However, this causes some discontinuities in the camera movement which can be, at best, irritating or at worst, nauseating. It also doesn’t work for some panels that are always interact-able if they are on screen. Another possibility is to just give up and lock the camera to the player as many other games do (Diablo, Nuclear Throne). This approach didn’t appeal to me for aesthetic reasons. I want the game to have a relaxed feel, and the slower camera movement is part of that.

The approach I chose instead was to treat the mouse cursor as though it is character existing in the world of the game, and allow the player to control that character directly with the mouse. Another way of thinking about this is that the mouse behaves as though the entire world of the game was your computer desktop, and we are just seeing a small view into that “desktop” at any one time. The view into this “desktop” can move around, but the mouse cursor would stay in the same place relative to everything else on the “desktop”. Technically, this is to say that the cursor is in world-space rather than screen space. This can all seem a bit abstract though, so to help make this concept of “world-space” vs “screen-space” mouse cursors a bit more clear, I recommend watching the video below, which I excerpted from one of my recent video devlogs.

Great! So this fixes our issue of having the mouse cursor drift around as the camera moves, and of the player trying to hit a moving target sometimes when they are clicking on puzzle panel tiles. However, we now have introduced another problem, which is that, since the fairy cursor character only moves if the player moves the mouse; when the player walks across the map, they might forget and leave the cursor behind. Luckily, for this game, in particular, the player is seldom needing to move both the player avatar and the cursor at the same time, so if the player walks more than a certain distance without touching the mouse, we can just put the little fairy character into a separate mode where they follow the player’s avatar around automatically. This might seems like it would get annoying, but in practice, it never really gets in the way of what you’re doing.

So how does this work?

So far, I’ve mostly been recapping the same ground that I covered in my recent devlog video about the mouse movement, but in this written devlog I’ve got much more space to dive into some of the technical details about how this functions.

Fundamentally, the work that we need to do here is to translate a certain amount of motion of a physical mouse into an equivalent motion in the game world. Taiji is being developed using Unity, so in this case, I use Unity’s input system (which Unity is currently in the process of replacing). In Unity’s input system, there are a few ways that I can access information about the actual mouse device.

One possibility is to simply look at the screen position of the mouse, and then project that into the game world in some way. However, we don’t use this method, as we want to make sure the OS mouse cursor is confined to the window while the game is running (you can free up the mouse to use other applications by pausing the game, of course). So we lock the OS mouse cursor to the center of the game window and just ask Unity to tell us how much it tried to move each frame.

mouseDelta = new Vector3(Input.GetAxisRaw("Mouse X"),Input.GetAxisRaw("Mouse Y"));

In the above code, mouseDelta represents the delta (amount of change) in the position of the mouse from the previous time we polled (the last frame). We get the horizontal (X) and vertical (Y), components of this delta, using the GetAxisRaw functions to avoid any time smoothing that Unity might otherwise apply.

Now we can get an amount that the mouse has moved, but there’s one problem; if the OS mouse cursor moved 10 units horizontally, we don’t know how far that really is. We don’t have any idea what the units are. Unfortunately, Unity’s documentation is no real help here, but through experimentation, I have determined that these values are in “pixels at 96dpi“. This might seem the same as pixels on the screen, however, because of screen scaling in Windows, these values may not correspond 1:1 with pixels on the screen.

In any case, correcting for this is fairly easy, as we can simply ask unity for the DPI of the screen. Then we normalize this value to 96dpi and multiply the mouse movement by this value:

float DPI_Scale=(Screen.dpi/96.0f);
mouseDelta *= DPI_Scale;

This now means our mouseDelta is in actual screen pixels.

So…at this point we can just take the movement and project it into the world of the game…right?

Well, unfortunately, no, and this is part of the reason that I had to go down an annoying rabbit-hole that involved calling into the Win32 API. For now let’s just continue onward and ignore that problem, as the explanation of how the mouse gets transformed into world space is already quite complicated on its own. But just make a mental note and we’ll come back here in a bit.

So, we have another issue that we have to resolve, which is that the game can be running in its own scaled resolution. This happens at two levels. The first is that the player can run the game at a sub-native resolution but in fullscreen mode. The game runs in a borderless fullscreen window, which means that if, for example, the game is running fullscreen at 1920×1080 on a 4k monitor, one pixel in the game’s output resolution will correspond to 4 screen pixels.

There are unfortunately some issues here, in that Unity does not always properly tell you the resolution information for the monitor that the game is running on, and instead shows you the resolution of the “main monitor” as it is set in the display settings in Windows. This will be the cause of the Win32 API madness later (interestingly enough, the DPI value is always correct, even with the screen resolution information is not). In any case, we will pretend for now that the Unity API call returns the correct value in all cases, and so the code to resolve this possible mismatch in resolution is as follows:

Vector2 ScreenRes = new Vector2(Screen.width, Screen.height);
float renderScale = (ScreenRes.x / ScreenRes.y < GameRes.x / GameRes.y) ? ScreenRes.x/GameRes.x : ScreenRes.y/GameRes.y;
if(!Screen.fullScreen) renderScale = 1.0f;
mouseDelta *= renderScale;

You might notice that this is a bit more complicated than the code accounting for screen dpi scaling. This is because the game runs at a forced 16:9 aspect ratio in order to tightly control what the player can and cannot see at any given time. This means that if the player is running the game fullscreen on a monitor of a different aspect ratio, it will be either letterboxed or pillarboxed, depending on whether the monitor’s native aspect is wider or narrower than the games. The final rendering scale will, therefore, depend on which side of the game’s view fully spans the monitor (horizontal in the case of letterboxing, and vertical in the case of pillar boxing)

Also of course, if the game is not fullscreen, we don’t have to worry about this render scaling at all.

Omigosh, we’re almost done with this and then we can go home. So the next and final thing that we have to do to get the mouse movement into world space is to account for the degree to which the in-game camera is zoomed in. Luckily, Unity provides us with a fairly straightforward function to map a position from the game’s rendered viewport into a world position. We use that to figure out a scaling value between screen and world space:

float mouseScale = (mainCamera.ScreenToWorldPoint(Vector3.one).x - mainCamera.ScreenToWorldPoint(Vector3.zero).x)/2.0f;
mouseDelta *= mouseScale;

ScreenToWorldPoint takes an input coordinate in pixels and returns a coordinate in “Unity Units”, so we are taking the horizontal distance from one screen pixel to a pixel immediately up and to the right of it, then dividing that horizontal distance by 2 to find the zoom factor. The reason for the division by 2 is actually a bit of a mystery to me at the time of writing this, which is why writing these deep tech dives can be a bit more useful for development than they might otherwise seem. I initially thought that perhaps this was because I was somehow returning the diagonal distance here. However, changing the code to use a pixel directly to the right does not produce a different result. So I guess it remains a mystery to me. However, without the division, the scaling will be wrong. Perhaps someone will comment on this devlog and tell me where I’ve obviously messed up the math somewhere earlier that would require this division or some other reason why it needs to happen here.

At this point, other than multiplying the mouse movement by an additional sensitivity value, we are done, and we can now apply mouseDelta as a translation to the actual in-game object representing the mouse cursor.

To Be Continued

I know, I know, I said I would get into the API nonsense, but this piece has gone on long enough on its own, so I’m gonna leave this on a cliffhanger and get back to fill in the details on that later this week. Till now, thanks for reading, and remember to wishlist the game on steam if you haven’t!

57. Don’t Go Chasing Waterfalls

So, I’ve been continuing to bang away on the visuals for the game, which means I could just post up some more screenshots like I did last month, but I thought it might be more fun to do a bit of a technical breakdown of the waterfall effect in the game.

You can see what the finished effect looks like below:

So, there are obviously lots of references I can go to for waterfalls. Photo references, other games. I happen to live near quite a few streams and rivers, many of which have waterfalls.

A few of the bigger influences on this effect are the waterfalls in the Zelda games Breath of the Wild and The Wind Waker. You can see examples of both of those below:

I’m actually not a huge fan of the look of the waterfalls in Breath of the Wild, but the way in which the effect is technically achieved is fairly obvious there, and so I consider it somewhat of an influence on my approach. Really, both of the above waterfall effects, as well as the waterfalls in Taiji are essentially a variant on a basic scrolling texture effect.

Below on the left, you can see the source texture I use for the bulk of the waterfall effect. And on the right you can see and what it looks like when scrolled across a distorted UV map. ( A UV map is what tells the graphics card which part of the 2D texture to draw on each part of a 3D model. In this case, the UVs are stretched in the shader, with the underlying geometry just being a flat rectangle)

So the basis of the effect is that I overlay two copies of this texture, with different offsets and slightly different scrolling speeds. These form the white foam layer.

After that, I make two more layers, only these scroll much faster and are partially transparent. This forms a second layer to go beneath the white layer.

Both these layers are composited together and then overlaid (at a lower alpha and with a fade towards the top) onto a screen-space gradient that acts as the water’s base color. The gradient is subtle but resembles the reflection of a blue sky.

Alright, so we now have the base of our effect, and can add the edges. The edges are just another scrolling texture using the same distorted UVs as before. The left and the right edges are just mirror images of eachother, offset a bit along the direction of scrolling. The UVs for the edge texture are also pinched in a bit at the top of the waterfall. Below, the original edge texture is on the left and how it gets distorted and scrolled is on the right

The black area of the edge texture is used to mask off the effect so that it can be composited into the rest of our effect and blended in. We add a slight fade to transparency at the top of the waterfall and we’ve completed the base effect.

At this point, we add a churn effect to the bottom of the waterfall, using particle systems. One system is emitting large circles which shrink and fade out, the other system emits smaller circles which fly up and then are killed off when they cross the bottom edge of the waterfall. You can see the two particle effects separately below.

When we put it all together, we get our full waterfall effect:

Thanks for reading, hopefully this was an interesting dive into some of the technical art that I’ve been doing lately for the game. 🙂

51. Snake Puzzles

Since the last blog post, I finished up revisions on the other areas in the game and sent out a test build to a few people. Still not done getting feedback from that, but overall the game seems to at least be improving. So that’s good news.

Needless to say, there’s quite a number of things that I could write about. But since automated puzzle generation/solving seems to be a pet topic of mine lately for the devlog. I thought I might cover a little bit of how I adapted the automated solver to work for the “snake puzzles.”

There are two main types of panels in the game, if categorized based on how you interact with them. Those are the “toggle” panels, which you click on, and the “snake” panels, which you walk around on. I mostly refer to the snake panels as “walking” panels, but the name still sticks around in the code from the early development days when you weren’t walking on them.

An example of how the snake/walking puzzles work.

In any case, because of the nature of the input method for the walking puzzles, the same brute force method of finding all possible solutions doesn’t really work.

Admittedly, there is probably a much smarter way of doing things than what I chose to do, but since I don’t want to spend too much time on the auto solver unless I really have to, I chose to simply take the approach of feeding the output of the current solver into a separate “pruning pass.” This works well enough, because all the solutions which are walk-able are really just a subset of the solutions that are possible using the click-to-toggle mechanics.

I chose to essentially take the output of the brute force solver and feed it through two rejection passes. One is a broad (and hopefully fast) rejection pass, which will never reject valid solutions, but will still leave some impossible solutions un-rejected.

The above grid represents a 3×5 solution state that we might be checking to see if it could be done by walking. I didn’t draw any symbols on the panel because it would be irrelevant as that part of solution validation is already done by this point.

So we take this panel and feed it into our first rejection pass. This broad and hopefully fast pass simply checks to see if all of the walked tiles (those would be the ones in white in the above image) are part of the same area. Since as the player can only step on tiles which are adjacent to ones they have already stepped on, this is a guaranteed property of all valid solutions.

Obviously, we can see that there are several different white regions, so this solution is not valid. But the question you might have in your mind is how I would go about that in code? Well, since I already have a function to find all the tiles in a given area, I simply look across the panel until I find a single white tile, call the function to find all the tiles in that area, and then I compare the count of tiles found to the total number of white tiles in the solution state. If they’re equal, then we pass it on to the next pass.

If not, we reject the solution as invalid.

int whiteTileCount = 0;
bool isValidSolution = false;
foreach(bool b in currentState)
{
    if(b) whiteTileCount++;
}

for(int x = 0; x < width; x++)
{
    for(int y = 0; y < height; y++)
    {
        
        if(currentState[x+width*y] && !lockedTiles[x+width*y]) //We only want to look at the area of white tiles
        {
            ArrayList checkedTiles = new ArrayList();
            int num_same_tiles = 0;
            FindAllTilesInArea(x, y, ref checkedTiles, ref num_same_tiles, true);
            
            if(num_same_tiles == whiteTileCount)
            {
                //Could be possible, so we do further testing, we
                //want to OR the values here, because if any spot
                //on the panel allows the solution to be drawn,
                //it's valid
                isValidSolution = isValidSolution | IsSolutionDrawableFrom(x, y, tempState.Clone() as bool[]);
            }
            else
            {
                //We found a disconnected white area, this in and
                //of itself will exclude this solution from being
                //possible on a snake panel
                isValidSolution = false;
                goto EndOfThisSolution;
            }
     }
}

So, we've rejected the previous state and now we will move on to another potential solution state (pictured above). This one passes the first test, as all of the white tiles are connected into one area. So that means it it will get passed on to the second phase.

In this phase, we walk through the panel, tile by tile, from all available starting points (the example panel starts in the bottom left), and we mark off each tile as we go through it. When we reach a branch, we follow out one of the branches until we have eaten all of the tiles. If we do not eat all the tiles by following a branch, that means there is a dead end in the panel and it is not walk-able.

Below is a visualization of this approach.

The code to do this is somewhat more complex than the first pass, and involves a recursive function:

public bool IsSolutionDrawableFrom(int x, int y, bool[] testState)
{
    testState[x+width*y] = false;
    bool returnVal = false;
    
    if(x > 0 && testState[(x-1)+width*y]) returnVal = returnVal | IsSolutionDrawableFrom(x-1, y, testState.Clone() as bool[]);
    if(x  0 && testState[x+width*(y-1)]) returnVal = returnVal | IsSolutionDrawableFrom(x, y-1, testState.Clone() as bool[]);
    if(y < height-1 && testState[x+width*(y+1)]) returnVal = returnVal | IsSolutionDrawableFrom(x, y+1, testState.Clone() as bool[]);

    if(returnVal) return true;

    foreach(bool b in testState)
    {
        //if we reach this point, we've hit a dead end somewhere, this
        //means if find a lit tile that's disconnected from somewhere we
        //could walk to, then this solution must be untenable;
        if(b) return false;
    }
    return true;
}

That's pretty much it for it this week. Perhaps next week I'll discuss a spoilery meta-puzzle thing I've been working on lately. Although if you'd follow the development live on twitch, you'd already know about that!

47. The Greatest Puzzle Generation

I promised in the last blog post that I would write a bit about how optimizing the solution-finder paved the way for puzzle generation.

The good news is that I worked on the game a lot in the past week. The bad news is that I still haven’t gotten the offline puzzle generator entirely working. Since it’s a relatively low priority task, and I wanted to ship another quick build to a few people over the weekend, I set the puzzle generator aside in order to make improvements to the starting flow in two of the areas of the game.

Once you have a puzzle solver, making a puzzle generator is actually pretty straightforward. Really all you have to do is plug in random layouts into the solver until you find some that are solvable.

This is already good enough to generate a bunch of solvable puzzles, but it’s a rather blunt tool. It will generate duplicate puzzles, and it doesn’t tell us which puzzles might be worth looking at.

No Dupes

To prevent duplicates, we could generate puzzles in a way that never produces duplicates in the first place. However, this could mean that for any given stretch of time, we could be stuck in a local minima of uninteresting puzzles.

Finding interesting puzzles usually involves a lot of intuition, but generating them could be considered analogous in some ways to a hill-climbing problem. I don’t want to lean much into the analogy, so I’ll just say that since we are hoping to find interesting puzzles that are not very similar to each other, it is good to be as random as possible.

With random puzzles, the only way to prevent duplicates is to keep track of all previously generated layouts. This way, when we want to generate a new layout, we just keep generating layouts until we find one that we have not used already. Since we are already keeping a list of all solvable puzzles, we can easily keep an additional list of all puzzles tested, and check new layouts against the list.

The only real concern here is a trade-off between memory usage and performance. Keeping the list of all tested puzzles in memory will eventually use a lot of memory, but writing the list to disk and parsing that list over and over could be time consuming.

For the time being, I’ve decided to go with the in-memory approach. This simplifies a few things for now, and I can always write things out to disk later.

I chose to use C#’s built-in HashSet collection type to store all the previously generated panels. The benefit of this data structure is that it should be the fastest built-in type for checking if a certain item is on the list (although I have not profiled this).

Currently, puzzles are stored as several arrays. However, for testing the puzzle generator, I’ve just been generating one of these arrays: the symbols array, which stores any symbols that might be on the panel and is just a 1-dimensional integer array of size width*height. (I use a single dimensional array because I’m relying on Unity’s built-in serialization to save the panels and it does not natively support multi-dimensional arrays. I could take the time to write a custom serializer, but I’ve just been accepting the cognitive overhead of doing x+width*y every time I want to do a lookup.)

An annoyance that I immediately encountered when attempting to store generated panels in the HashSet is that when doing the following:

bool done_generating = false;
while(!done_generating)
{
    //symbols is an int[]
    //previousPanels is a HashSet<int[]>
    symbols = GenerateNewPanel();
    if(!previousPanels.Contains(symbols)) 
    {
        //The panel is unique!
        done_generating = true; 
    }
}

previousPanels.Contains() would sometimes let duplicate panels through. That is to say, it would declare that a certain panel state was not in the HashSet when it really should have been.

I would have expected to get hash collision problems, and have panels get rejected as duplicates when they were not, but for some reason the opposite problem arose. I’m not sure why this is. Perhaps there is something funny going on with the implicit call to symbols.GetHashCode() since it’s an integer array (You can find the implementation in the reference source code, but I don’t understand it enough to tell what went wrong). Perhaps I don’t understand hash generation well enough in general.

In any case, I averted my eyes from whatever madness was happening there, and just passed a string in instead and moved along.

One other important thing to consider in my specific use case is that puzzles can sometimes be rotated or reflected duplicates. What I mean is that the following three puzzles are identical, as the solution gets rotated and reflected along with the symbols.

rotational_reflectional_symmetry.png
Left: A Panel;   Center: Left, Rotated 90° Clockwise;   Right: Left, Mirrored Vertically;

Checking for these duplicates is really just a matter of comparing all possible manipulations of a generated panel to the HashSet before declaring it unique.

The Cream of the Crop

Once you’ve got a list of unique puzzles, the problem then becomes how to sort through the many generated puzzles to find ones that are interesting.

This is the thing that I have mostly not finished. I began by keeping track of how many solutions any given puzzle had, as generally speaking, puzzles with only one solution are more interesting to me (although not always more interesting to put in the game, for various peculiar reasons) than ones that have several solutions.

However, I ran into a strange bug where the number of solutions is being incorrectly reported during puzzle generation. I spent a few hours trying to track down some fault in my generation code, or in the way in which I was serializing the finished puzzles to disk.

My best hypothesis at the moment for the location of the bug is that it has to do with the  solution validation code. When checking if a puzzle is solved or not, I am assuming that the panel has been completely initialized. This is usually a safe assumption because panels don’t have symbols moving around at runtime (yet?) and because if I move them around in the editor, the panel gets reinitialized when I’m done.

However, inside the puzzle generator, I’m not re-initializing the entire panel when the symbols change (primarily as an optimization). This may be causing the solution checking to fail because some state has not been properly set.

More on that later. I won’t guarantee that I will be back in the next post to talk more about puzzle generation, as I am beginning to hack away at the tasklist for the next real test build(!)

The list is only 11 items long, but they’re big items.

46. Optimizations From Underneath a Woolly Mammoth

farcryprimalmammothHeader Image from Far Cry Primal reveal trailer.

I’ve been super productive the past week or so. I worked on some fun relatively non-spoilery things, so I can actually talk about them (yay!). Be forewarned; It’s all very technical. But if you’re intrigued by optimization or procedural puzzle generation, press on.

I sent a quick build out to a few people the past couple weeks. Less for playtesting and more for overall thoughts. I’ve still got the next real test build incoming eventually, so if you’re on the list of testers and have been waiting a while….

Don’t worry, I see you.

Anyway, the next build will still be a while, as I have to essentially revise every area in the game. This will mostly amount to cutting lots of puzzles. But after that we can do some testing, and see if I cut too much.

Optimizing Save/Load

One of the “not-testers” this past couple weeks unfortunately lost their progress due to a VM crash. I had assumed this was probably a failure of my saving code (as no untested code is perfect) resulting in a corrupted file, but it turned out that there was no save file at all. The game was only saving when you exited the program. I had forgotten that I disabled periodic saves because there was a noticable hitch if I saved in the middle of gameplay.

Most of this hitch is just because there are so many puzzles floating around in the game world (about 460, with only 230 or so being accessible to the player, and the rest just in the scene as WIP areas or puzzles which I cut for now but that might find a place eventually), but I figured there had to be some performance gains to be had, since I hadn’t really written it in the first place with an eye to performance.

Even still, I tend to avoid doing unnecessary optimization work if I can, so my first inclination was to take the saving and just spread it out over multiple frames. The main issue there is that there is a lot of state that needs to be kept track of to resume saving in an iterative fashion. In theory, this is what coroutines are for, a simple way to make any function you have into an iterative function, by allowing the function to return an iterator which contains its own stack contents. Thus far, I haven’t found a good use for coroutines, because most of the time it’s easier to just store the state in a variable or two and manually resume on the next frame. But aha! Here was the chance for the coroutine to shine!

Except naturally, when I put the function into a coroutine, initializing the coroutine took longer than the actual saving did. So my hitch got worse instead of better. Another “nice” side affect was that I could no longer handle .NETs I/O exceptions because you’re not allowed to use the yield statement (required for a coroutine) inside of a try{} block.

I tried to make the function use asynchronous file operations as well, but this had a similar overhead hitch. Better than the coroutine, but I wasn’t making gains here.

I could’ve continued on. Maybe used threads. Maybe created a coroutine/thread at the startup of the game and just leave it always running along and waiting for saving tasks, but instead I took all this as a hint that I was wasting time on API nonsense and I should actually instrument my code a little and do some real optimizations.

Since the save system is currently human-readable, writing the save files naturally involves writing a lot of text to the file. So, it wasn’t a surprise to see StreamWriter.Write() eating up the majority of the time spent saving. What was a bit more of a surprise was realizing that most of the time spent in that function was on converting integers to strings.

Now, be warned, the following code may disturb you greatly, but this is what I was actually doing for the inner loop which writes the state of all tiles on all panels in the game:

int currentValue;
for(int x = 0; x < (panel.width*panel.height); x++)
{
     if(panel.currentState[x] == true) currentValue = 1;
     else currentValue = 0;
     sw.WriteLine(currentValue); //sw is the StreamWriter we're using
}

Now it should be apparent why I was spending so much time converting integers into strings. For every single tile on every single puzzle in the game, I’m calling the StreamWriter.Write() command and passing it an integer. This means that every time, StreamWriter has to convert it to a string before it can write it to the file.

The reason I was passing an integer into the StreamWriter instead of the booleans directly is because if I simply passed in the booleans, StreamWriter will output them into the file as “True” or “False”, and wanted to write the panel state as “1” and “0”.

So obviously, I made the simple change to the following:

for(int x = 0; x < (panel.width*panel.height); x++)
{
     if(panel.currentState[x] == true)  sw.WriteLine('1');
     else  sw.WriteLine('0');
}

Just this one change got me about a 3x speedup, by avoiding the string allocations and the integer to string conversion. But there’s a few more things I decided to do to make it a bit faster. The first should be pretty obvious:

char[] panelValues = new char[(panel.width*panel.height)+(panel.height)];
int ii = 0;
for(int y = 0; y < panel.height; y++)
{
     for(int x = 0; x < panel.width; x++)
     {
          panelValues[ii++] = (panel.currentState[x+panel.width*y] ? '1':'0');
     }
     panelValues[ii++] = '\n';
}
sw.Write(panelValues);

The main change here is that I simply pre-allocate a char[] buffer for the entire string that represents the state of the panel, and only access the file in one call to StreamWriter. The reason to switch to char[] instead of string should be obvious: strings produce more garbage. And since we’re in a managed language, the less garbage the better.

There’s a few details that are not completely self-explanatory though. One is that I’m now iterating over both width and height. This is just an aesthetic change in that I wanted the panel state to be written 2 dimensionally in the file for better human readability. This is also why the size of the char[] gets and additional panel.height added onto it at the end. I do this because I need an additional column of characters compared to panel states, in order to add the newline characters at the end of each row.

As a bonus optimization on saving, for the situations in which I have no choice but to write integers to the file, I wrote a function to convert integers to char arrays. The function is not very general purpose (obviously), but is much faster than whatever StreamWriter was using.

Note: This code has been updated from the original version due to some bugs pointed out by “Steve” in the comments. If you’d like to view the original buggy code. I’ve posted it here: https://pastebin.com/HNNp7Zug 

    private static char[] IntToCharArray(int input)
    {
        short len = 0;
        if(input < 10) len = 1;
        else if(input < 100) len = 2;
        else if (input < 1000) len = 3;
        else if (input  0)
        {
            output[len-1] = (char)(48+(input%10));
            if(len>1)input = (int)(input/10);
            len--;
        }
        return output;
    }

Although the method of determining how many places there are in the number is pretty dumb looking, it’s super fast. If we wanted to support bigger numbers, we could just add some more lines.

I also did a few optimizations to the loading function. I don’t want to get bogged down too much in the technical details, but essentially I did the same type of thing as I did for saving. I buffer up each line of the panel before parsing it, so as to cut down on string generation as well as calls to the StreamReader.

Procedural Puzzle Generation

So, one other thing that I’ve wanted to work on for a while is improving the tools that I use to make the game. Basically all of the puzzles in the game thus far have been hand designed. However, I’ve made use of a brute-force puzzle solver in some cases.

The puzzle solver worked by using a recursive function. This is a bit complicated, but the important thing to know is that it was very slow. So slow in fact that Unity would kill its own process because I froze the UI for so long.

Since this was a case where I didn’t care too much about the overhead cost of firing up a coroutine, the first thing I did was make the meat of the solver work as a coroutine. That way I could return control to the Unity UI and keep Unity from thinking something had crashed.

I was going to inline the code for the recursive function that iterates over all the possible states of the panel, but it’s too big and unwieldy to be properly readable in this blog’s formatting. So you can look at it on pastebin here: https://pastebin.com/deBHmn0h

A Clever Optimization

One of the biggest performance issues that I was having with the puzzle solver code is that it was an extreme memory hog. This would be bad enough if the code was just executing straight through, but because each recursion of the function was encapsulated in a new coroutine, the memory problem gets out of control pretty quickly. I even ran into a situation (possibly to do with script reloading) where the Unity garbage collector failed to clean up and leaked several gigabytes of heap memory.

The natural solution to this is to change the method by which we traverse all possible states of the panel. Ideally, we would like to change this from a recursive function into a loop, but the way in which we should do that is not entirely clear.

Not to pat myself too much on the back, but when thinking about how to speed this up, I think I made a clever connection. Because the panels are essentially series of bits, we can just think about storing the state of the panel in the bits of an integer. In C# a ulong is a 64-bit integer. This gives us enough space to cover the state a panel with 64 fields in it, or an 8×8 panel.

Importantly, we also can know ahead of time how many possible states there are in a panel. The formula is as follows:

c = 2w×h

Where c is the count of all possible states, w is width of the panel, and h is height of the panel.

Funnily enough, if we store c in binary, it will take a (w×h)+1 bits.

For example, if we have a 4×4 panel, there are 16 tiles on the panel (w×h). By plugging that into our formula we get 216 or 65536 states, which in binary is:

0001 0000 0000 0000 0000

This seems useless, but if we take one less than that, or 65535, we get the following in binary:

1111 1111 1111 1111

Which, if we re-project that as a 4×4 panel where means that tile is on and means that tile is off, is the state in which all tiles are toggled on.

1111
1111
1111
1111

If we then take that number and continually subtract one from it over and over again until we reach 0, we will have covered all possible binary states between. This principle means we can create the following iterative function which will visit all possible states of the panel:

ulong possibilities = (ulong)Mathf.Pow(2, width*height);
while (true)
{
     possibilities--;
     for(int x = 0; x < currentState.Length; x++)
     {
          currentState[x] = ((possibilities & (ulong)(1<<x)) != 0);
     }
     if(possibilities == 0) break;

}

The above code was a bit simplified so that it could be inlined. If you’d like to look at the final complete function, you can find that on pastebin here: https://pastebin.com/5xGi7y5V

In addition to being much more lean on memory, it is much faster. It is also easily resumable with or without coroutines because the state is so simple that we only need to store one variable between runs: possibilities.

TTFN

Wrapping up, we have done some good optimization work on saving and loading. Additionally, because of our puzzle solver optimization, we’ve laid the ground work towards enabling an offline puzzle generator. But I’ll have to cover that in the next devlog post….

I know. I’m cruel… Thanks for reading. 🙂

42. Non-Uniform Tiles

This week, I’ve been working on implementing a feature that I have wanted to have for a long time: puzzles which feature tiles of non-uniform size.

Thus far, all of the puzzles in the game have been built around fixed-size tiles. It’s a bit hard to explain the difference in words, so here’s a visual example:

nonuniformtileexample
Left Uniform tile sizes Right Non-uniform tiles

The left image is how all the puzzles in the game currently operate, as a grid of uniform tiles of all the same size. The panel on the right is a mock-up of what I want to do. Essentially, I want the capability to make puzzles where tiles can be “connected” together into larger groups, or meta-tiles. Doing this requires a bit of ingenuity, both within the tech and from a graphic design perspective.

Since I’m still keeping the grid-based setup, the problem is much simpler than it could be. 

As far as the artwork is concerned, we can cover all of the shapes we might want by subdividing each panel tile into 4 smaller sub-tiles, each of which can be one of three different graphics:

nonuniformtileexample2.png
Top (Enlarged):The 3 sub-tiles. Bottom An example panel built from the sub-tile graphics, enhanced to show the individual tiles.

This is the most economical choice from an asset design perspective, because although there are more variations required than the 3 sub-tiles, they can all be expressed simply by rotating the tile images.

As simple as the art is, the code is kind of complex, both in determining how to rotate the sub-tiles, and because there is an additional “corner” case (Get it?). 

For example, with the basic approach, in which you just check the neighboring tile and see which ones are connected, if you were to have a large connected block of four tiles, you’d end up with something like the left-hand image below:

nonuniformtileexample3.png
Four tiles connected, in two ways.

Notice that each tile is drawn as a single-width corner, and so we end up with a hole in the middle of the area. This might be desirable in some cases, but here it’s really just an accident. 

Luckily, this isn’t too difficult to resolve. You simply check, for each tile, if it is connected so as to form a corner, and then you additionally check if the tile diagonally opposite that corner is also connected to form a corner, and if it is, you just change the inner corner graphic. This technically requires a fourth sub-tile for the inner corner, which is just a solid color.

Data Structure

One of the other challenges was deciding upon a data format to express these meta-tiles, and additionally, how I would interface with that format in the code.

There are two approaches that initially came to mind, both of which have their costs and benefits.

  1. Add four booleans (true/false values) to each tile, for each of the four cardinal directions, to indicate whether or not the tile is connected to its neighbor along that direction. The benefit here is that checking for connections is simply a matter of checking a value stored alongside the tile. However, it is rather wasteful of memory, with four additional entries per tile. And furthermore, for each connection, there are two values, leaving the possibility that the connection values could end up out of sync. (This can be resolved with reference types though, as I’ll explain later)
  2. Create an array which stores the connections between tiles as booleans. This is memory efficient, as there is no data duplication. However, it is somewhat cumbersome to work with in the code, without doing some additional engineering. (The panel is made up of one 2D array, and the connections are an additional 2D array, but it is somewhat hard to make direct comparisons between the two arrays.)

I decided to go with the second option to begin with. Both options have their upsides and downsides, but the memory efficiency of the second option seemed most appealing to me for whatever reason.

Notice that both approaches are thinking about properties of the individual tiles which make up the meta-tile. Each tile has connections to its neighbors. This approach has some flaws which will come back to bite me in a bit, but this is what I chose to go with first, and I think it’s important to document my process as it happens.

My general strategy when doing any engineering work (and sometimes design as well), is to do the simplest dumbest thing that seems like it will work, and clean it up later. This means I may just implement the first solution that comes to my head. This is a good approach because, most of the time, unless a problem is utterly trivial (or you have implemented a similar solution many times before), you don’t really understand the problem well enough before you begin. The likelihood of anything you think of before you start coding being the right solution is pretty close to zero.

So get your ego out of the way and just start coding!

(Relevant:)

It’s a bit difficult to explain the way that I structured the data, but since I went through the trouble of creating an editor interface for the connections, I’ll simply show you what that looks like, as it’s a useful way to visualize the data structure of the panel in terms of how it relates to the connections.

nonuniformpanels5
Left: A 3×3 puzzle panel     Right: The editor interface for that panel

In the editor interface shown above, the actual tiles of the panel are represented by the ☐’s. The connections between the tiles are represented by the checkbox buttons.

In terms of the data layout, the panel state is stored as a 3×3 array of booleans, and the connections are stored as a 3×5 array. I need the additional rows. as each tile has vertical as well as horizontal connections.

Fun fact: For any particular panel of dimensions W¹×H¹, the formula to determine the size of the array necessary for storing the connections between tiles is

W²=W¹; H²=(H¹*2)-1.

Second Fun Fact: Every other row, I don’t use the last horizontal entry in the row, so I technically do waste a bit of memory.

As I mentioned earlier, because of the unusual data layout, it is not exactly obvious, for any given tile (x, y) on the panel, which (xy) to look at in the connections array. This is one of the flaws of the tile-centric approach.

Ultimately I ended up writing a helper function:

bool isConnected(int x, int y, Direction dir)
 {
     if(dir == Direction.E) return (x < (width-1) && connections[x+width*(y*2)]);  
     if(dir == Direction.W) return (x > 0 && connections[(x-1)+width*(y*2)]);

     if(dir == Direction.N) return (y < (height-1) && connections[x+width*((y*2)+1)]);  
     if(dir == Direction.S) return (y > 0 && connections[x+width*((y*2)-1)]);

     return false;
 }

This function gets called a lot, and is pretty computationally inefficient. One solution would be to cache the results of these calculations and store the results alongside each tile. This is equivalent to option #1 from earlier, however if the stored values were only references to the values in the connections array, this would avoid the problem of the data getting out of sync.

Highlighting Inefficiency

The next thing we have to resolve is toggling the whole connected piece when the player clicks on it, and handling the highlights when the player hovers over the piece.

nonuniformpanels
Clicking some tiles in an incomplete, but functional form.

Since the code that deals with the player clicking on tiles still works with individual tiles, rather than the whole connected group, when the player clicks on any given tile, we have to do a search to determine all the tiles connected to that tile. From a technical standpoint this is somewhat non-trivial, but it is analogous enough to the issue of finding all the tiles within a colored area, that it was a problem which I’d already solved. The only difference here is the criteria by which the tiles are considered “connected.”

So, we do the exhaustive search for all connected tiles when the player clicks to toggle. Although this works well enough when we’re toggling, we also need to highlight the tile when the player mouses over it. (Notice in the above GIF that the tile highlight is still isolated to the individual tiles, and doesn’t cover the whole connected group)

We could do the same search that we were doing when toggling a tile, it would technically work, but doing it every frame seems a bit wasteful. Of course, we could do some basic optimization and only do the search when we really need to, when the player has moused over a new tile.

This is all fine, but something started to smell funny to me. I didn’t like how wasteful things were getting computationally. After thinking about it for a bit, I came to a more optimal solution.

Until I got fairly deep into the implementation, I had been thinking about this whole problem from the bottom up. I was thinking only about the same uniform tiles I always had, rather than the meta-tiles. A meta-tile, as far as I was concerned was just an emergent result of a system which allowed tiles to be connected to each other.

But what if we think of a meta-tile as it’s own entity? If we do, then our problem of finding all the tiles that make up a meta-tile suddenly becomes trivial.

Additionally, because the layout of the panels in the game never changes in real-time (thus far), I have the luxury of baking down any complex calculations that have to do with the panel layout. In this case, I can just bake-down the meta tile groups at startup.

In the baking process, I iterate across all the tiles on the panel and store an integer for each tile, denoting which meta-tile group it is part of. For example, if the first tile I check is connected to form a meta-tile, I will mark all of those tiles with a meta-tile number of 1. The next meta-tile I find, will have a meta-tile number of 2. And so on. 

The numbers don’t really matter as long as they are the same for all tiles within the same meta-tile group. 

This means I only have to do the exhaustive search at startup, and from then on, whenever I need to find all tiles that are connected to a current tile, I can simply compare the current tile’s meta-tile number to all the other tiles on the panel, and if the numbers match, the tiles are connected.

(This is actually a simple enough way to handle connected tiles that I may choose to do away with the connections array. Additionally, opening up the ability for overriding the meta-tile group manually would allow for some additional things, such as having tiles that are not physically connected, but can only be toggled in unison.)

Fixing the Mechanics

Getting the gameplay mechanics working again is somewhat more complicated, as I have to combine the idea of areas connected by color with the idea of multiple “tiles” being connected together into one big tile. This shouldn’t be too difficult though, as we just have to add meta-tile awareness to our normal area search. Still, I didn’t get to it this week.

So…to be continued.

Optimizations

In the process of doing all of this, I ran into some performance problems. This is somewhat to be expected. Previously, each puzzle panel tile required 3 sprites, but under the new system, there are 9 sprites per tile. (I have the four corner sprites both for the fill part of the tile and the outline part, as they can be colored/enabled separately. Additionally, there is a background color sprite for each tile.)

One of the obvious issues was a huge uptick in the time spent on start-up. This has been a bit of an issue for some time already, but it really just skyrocketed. I try to keep the startup time down to less than 5 seconds (Ideally it would be much faster than that, but there seems to be a certain amount of overhead in Unity that is simply unavoidable), but after making the changes to allow for the increased graphical flexibility, the start-up time ballooned past 10 seconds. 

This may not seem like a lot, but I like to keep my iteration times as fast as I can. This desire for instant feedback (or as close to it as possible) is why I took the time to get a live updating feature for the puzzle panels.

The overhead came because I was initializing every single puzzle panel in the game at startup. This involves not just spawning objects for each panel, but for every single tile on every single panel. As many of you who are familiar with Unity may know, spawning new GameObjects is one of the more expensive things you can do. Unfortunately, with all the additional GameObjects involved in the new sub-tiles, there were just too many objects being created at start-up for it to happen in a reasonable timeframe.

So in order to reduce the start-up time significantly, the thing that made the most sense was to avoid initializing anything that wasn’t completely necessary. I could have chosen to go right to an object pool system, where I reuse tiles from panels as needed and only spawn a few at a time. But instead, I just chose to go for the simpler method, which is just to check which panels are visible to the player when the game starts, and only initialize those.

Apart from the startup time, there were some other issues with how I was handling control of the colors and images on the sprites. I made the initial mistake of delegating all of that work down to the individual 8x8px corner tiles, but it really made more sense to cluster the work up at a higher abstraction level in the 16x16px tiles. There is a decent overhead to each individual MonoBehaviour component in Unity, so the less you can have of them, the better.

As time goes on, it may make more sense to bring that tile management work all the way up to the level the entire panel itself, but for now this is a good balance between locus-of-control and performance.

There were some additional minor performance issues unrelated to the new panel changes that I nevertheless took the time to clear up. I won’t really go into those here, but it always feels nice when you get things running better.

I’ll be back next week to talk about getting the puzzle mechanics working again. I also need to repair the walking puzzles, cause I seem to have broken the graphics for them.

41. More Tool Improvements

This week I didn’t spend a ton of time on the game, nor did I add anything flashy. However, I did take some of the things that I learned last week while improving the puzzle panel editor GUI and used them to redesign the editor for my custom animation system.

This was the previous system, which involved using several separate components. Both a master Sprite Animation component, and an additional Sprite Animation Strip component for each individual animation:

animationsystem.png

Although this certainly worked. It was pretty cumbersome to add new animations or tell what you’re looking at. As you can see, the first Sprite Animation component simply has a flat array references to the other Sprite Animation Strip components. Unfortunately, it is impossible to tell the difference between each animation in the list, because all the duplicate components have the same name. Not to mention, having a bunch of separate components wastes vertical space on irrelevant information, and overall… it’s just a mess.

Here’s the redesigned interface:

newanimationeditor.PNG

Although it is almost functionally identical to the previous system (in fact, this is the same set of animations), you can see that there are a number of usability improvements here. For one thing, each animation gets its own name, this makes things way more readable.

Now, you might ask, why even go through the trouble of writing your own animation system when you could just use the built-in Unity animation system, which surely supports way more that this.

I prefer as much as possible for the code to be doing the heavy lifting on things like this, so the Unity GUI editor and data-driven approach just doesn’t appeal to me. Additionally, although the Unity system is fully featured; for a 2D sprite-based game, it’s just overkill. I’m not doing animation blending, so thinking about the animation as a tree is a strange way to think about it. Essentially, a 2D animation is just a flipbook of frames. That can be represented as a 1D array both in the code and in the interface. If I give each of these arrays an index, this makes it super easy to change an animation from within the code. There’s no need to for me to create a FSM with a state change graph and all that when I could just set an animation index. If I want to change which animation is currently playing. I can just do this:

playerAnimation.index = 0; //Or the index of the animation that I want

And the animation component will just change over to the new animation and continue handling the playback.

Now, it is perhaps a bit cumbersome if I want to think about things in terms of the names of the animations, as the code is not particularly illuminating on this front, but that can either be handled by just adding a comment to the code, or by creating a lookup that maps from the name to the index.

Either way, it just works better for me.

Additionally, as with the Puzzle Panel editor, reducing the overall complexity on both the editor end and the code end leaves me with much more room to add, without having an overwhelming amount of unnecessary complexity. For the time being, the only frame animation in the game is the player’s walking animations (which need to be redone), so I don’t really need more functionality than this.

That may also mean that this was a misappropriation of my limited time to work on the game, but time will tell.