Producing a neural network capable of generating naturalistic terrain at massive scale
Mar 20, 2022
When we set out to build a massive-scale sandbox world, we started at the ground. Early on in our process, it became clear to us that our first challenge in creating worlds of the size we were envisioning was simply building them in the most basic way: creating the terrain, the features, the hills, valleys and mountains that would comprise these playgrounds, and then filling them with all the rocks, trees, grass and water required to make them feel real.
It’s a challenge for any project, but an even bigger challenge when you’re staring down the idea of a 100 X 100 km map, or even a 1000 X 1000 km map.
Building these worlds by hand is limiting in obvious ways. It would take tens of thousands of artists and designers years to manually shape, sculpt and populate these worlds in believable ways, and even then the level of coordination required would likely limit the level of naturalism you could achieve with the results. Typically, a studio would begin to cut down that colossal workload by using procedural generation in some way. These tools, too, felt limiting. Procedural generation works essentially by developing individual pieces, whether tens or hundreds or thousands, and then developing patterns to fit those pieces together into a coherent whole. This strategy fell short for our purposes: it tends to produce repetitive results when applied on a massive scale, and it struggles with making large individual features like mountains or valleys feel unique and natural. From a processing perspective, it can also become extremely cumbersome at scale.
Producing a neural network capable of generating naturalistic terrain at massive scale.
This processing question points to a second challenge that is just as important, but more nuanced. Even if we were somehow able to produce maps this huge, they would be so large that they wouldn’t fit on any reasonable expectation of a player’s hard drive. Streaming from a data center would be one way to address this, but this just moves the bottleneck to a different point in the process and creates numerous other potential failure points. In order to address this problem, we needed a solution that would allow players to experience a virtual world without storing it at all, whether on a hard drive or at a remote data center. The solution to both of these problems turns out to be the same. We needed a way to generate worlds at runtime, building the ground beneath the player’s feet as they moved. Instead of shipping the data that forms these worlds, we would ship the tools necessary to build them on the fly. For that, we turned to Machine Learning, producing a neural network capable of generating naturalistic terrain at massive scale.
Intro to Machine Learning
It would have been difficult to observe the tech world over the past few years without coming across the idea of Machine Learning in some way, but the actual process can be opaque from the outside. In reality, ML is a more straightforward idea than it might appear. At its most fundamental level, the term describes a loop intended to train a model capable of producing a desired output based on a particular input.
The basic steps are straightforward:
Start with a situation where you have pairs of known inputs and outputs: this is called the training data. Because the correct outputs are known in advance, the model can tell when it produces a wrong result.
The model runs based on a given input. If it produces the correct output: great news.
If the model produces an incorrect output, it adjusts its parameters in an attempt to consistently produce the correct output.
Repeat this procedure using other pairs of inputs and outputs, trillions of times if necessary. Eventually, the model should be able to define its parameters in such a way that it can consistently produce the desired output based on the given input. And while you start with situations where the desired output is already known, the idea is that once the algorithm is trained it can start working on its own, producing accurate outputs even in new and unpredictable situations.

As an example, we can imagine a simple algorithm: Ax² + Bx + C = Y, a quadratic function. This defines a parabola, useful in modelling flight paths. To model this function we would start by giving our algorithm a set of data that we want described by a quadratic equation. It would look like this:
We already know the output we’re looking for. It looks like this:
That means we know the desired Y for any given X, so we can ask the algorithm to start guessing. Because our equation is Ax² + Bx + C, the algorithm has three parameters it can adjust to try to “discover” this regression line: A, B and C. Because this is a very simple equation, it can probably get there pretty quickly. Once it learns how to fit a quadratic function to this data, it can draw a regression line and determine the output even for inputs where it is not already defined.

That is the process of machine learning, on a simple level. In actual use cases, things get much more complicated. The inputs, models and outputs can all have thousands of variables, steps, parameters and figures, producing a process that becomes impossible to visualize like the quadratic equation above. If the desired result is a 100 X 100 pixel picture, for example, the output will be composed of 10,000 distinct pixels, each of them described individually. Since a common use of ML is upscaling lower-resolution images, consider that a 4K image is 3840 X 2160 pixels, so the desired output would be 8,294,400 values long. Now imagine that at 60 frames a second. The process of ML allows algorithms to perform tasks that may have seemed impossible before, approximating creativity in ways that people can begin to describe as Artificial Intelligence.
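To make that training loop concrete, here is a minimal sketch of the quadratic-fitting example in plain Python. The target parameters, learning rate and step count are illustrative choices, not anything from a real system: the point is only to show the guess-compare-adjust loop described above.

```python
# A minimal sketch of the training loop described above: fitting the
# parameters A, B, C of y = A*x^2 + B*x + C by gradient descent.
# The "true" parameters and learning rate are illustrative.

def predict(params, x):
    a, b, c = params
    return a * x * x + b * x + c

# Training data: input/output pairs generated from the "unknown" function.
true_params = (2.0, -3.0, 1.0)
data = [(x / 10.0, predict(true_params, x / 10.0)) for x in range(-20, 21)]

params = [0.0, 0.0, 0.0]  # initial guess for A, B, C
lr = 0.01                 # learning rate: how big each adjustment is

for step in range(20000):
    # Accumulate the gradient of the squared error over the data.
    grad = [0.0, 0.0, 0.0]
    for x, y in data:
        err = predict(params, x) - y
        grad[0] += 2 * err * x * x  # d(err^2)/dA
        grad[1] += 2 * err * x      # d(err^2)/dB
        grad[2] += 2 * err          # d(err^2)/dC
    # Adjust each parameter slightly against its average gradient.
    for i in range(3):
        params[i] -= lr * grad[i] / len(data)

print([round(p, 2) for p in params])  # approaches [2.0, -3.0, 1.0]
```

Each pass through the loop is one round of “guess, check against the known output, adjust”: exactly the cycle in the numbered steps above, just repeated until A, B and C settle on values that reproduce the data.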
What We’re Doing
The goal of our machine learning processes is to produce large-scale, realistic maps quicker than could ever be done by hand, and with more interesting, naturalistic results than can be produced with current procedural-generation tools.
To do this we use neural networks that we refer to as agents: webs of simple processes that feed into each other like a network of neurons in your brain. Here’s an example below, but note that this is not one of our real agents, just what one might look like:
Again, while the input, process and output are all vastly more complicated than a human could execute or even conceptualize, the basic idea is the same: we take an input and we run it through an algorithm to attain a desired output.

To create a map, we use a series of four agents. While we work with each of them individually, they could also be considered one gigantic neural network, because each agent feeds into the agent after it. Here’s how we go from a string of random numbers to realistic terrain:
Basemap Agent: This is the first step of map generation, designed to produce a swath of realistic-looking elevation data, with natural hills, valleys, mountains, deserts, and whatever other features are present in the landscape we’re trying to emulate. This means both elevation data and biome data (mountain, forest, grassland, etc). Because we have to start somewhere, the initial input here is a random number (think of a map seed in a normal procedural generation system, just much longer). That number maps onto terrain in the style of reference maps created by our technical artists based on topographical data and satellite imagery.
Upres Agent: This takes the simple map generated by the Basemap Agent and increases the resolution to something usable in the context of a realistic game. To train this agent, we can take a higher-resolution image akin to what we want an area to look like in-game, and then downscale it to the resolution produced by our Basemap Agent. We then ask it to scale it back up and try to make it look exactly like it did before. If it gets it right, it can apply the same process to images for which we don’t already know the desired result. Once the agent is trained, we can run it sequentially to produce very high-resolution images. That means we can run it only once or twice for areas that are far away from the player and don’t require high resolution, or maybe ten times for areas right next to the player.
Texture Agent: This applies textures to the map produced by the Upres Agent. That means it can distinguish between rock, grass, moss, sand, and whatever other textures we decide will comprise our world, as well as whatever additional textures we wind up using in the future. Like the rest of these processes, the Texture Agent should eventually be able to handle increasingly complex scenarios once the basic model is established. Like the Basemap Agent, the training data here is produced by our technical artists, using more standard procedural generation tools with human monitoring to make sure that we’re working with images that have the naturalistic look we’re attempting to produce.
Population Agent: This is the final step in bringing these maps to life. The Population Agent distributes anything that has a volume and shape to it: trees, rocks, shrubbery and all the other assets that will keep these maps from being pleasantly textured barren wastelands. Again, this training data is produced by our artists designing scenes in the style that we want to reproduce.
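The chain described above can be sketched as a simple pipeline. Everything here is a hypothetical stand-in, not the real agents: each stage is reduced to a toy function on a tiny grid, just to show how each agent’s output feeds the next, and how the Upres step can be run a variable number of times depending on distance from the player.

```python
# A hypothetical sketch of the four-agent pipeline: seed -> basemap ->
# upres (run N times) -> textures -> populated terrain. The function
# names mirror the post; the internals are toy stand-ins, not ML models.
import random

def basemap_agent(seed):
    """Turn a random seed into a tiny low-resolution elevation grid."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(4)] for _ in range(4)]

def upres_agent(grid):
    """Double the grid resolution (here: naive nearest-neighbour)."""
    return [[v for v in row for _ in (0, 1)]
            for row in grid for _ in (0, 1)]

def texture_agent(grid):
    """Label each cell with a texture based on its elevation."""
    def label(v):
        return "rock" if v > 0.66 else "grass" if v > 0.33 else "sand"
    return [[label(v) for v in row] for row in grid]

def population_agent(textured):
    """Scatter assets onto cells with suitable textures."""
    return [[(t, "tree" if t == "grass" else None) for t in row]
            for row in textured]

def generate_chunk(seed, upres_passes):
    # More upres passes near the player, fewer in the distance.
    grid = basemap_agent(seed)
    for _ in range(upres_passes):
        grid = upres_agent(grid)
    return population_agent(texture_agent(grid))

near = generate_chunk(seed=42, upres_passes=3)  # 32 X 32 cells
far = generate_chunk(seed=42, upres_passes=1)   # 8 X 8 cells
print(len(near), len(far))  # 32 8
```

The same seed always produces the same terrain, which is the property that lets the real system ship generators instead of data: distant chunks get fewer upres passes, nearby chunks get more, and nothing is ever stored.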
Why We’re Doing It This Way
When people talk about “artificial intelligence”, they often emphasize the concept of “intelligence” over the concept of “artificial,” attempting or expecting a machine to be able to mimic an idea of intelligence developed by observing human beings.
This leads to projects like training an AI to play games like Go or StarCraft, or to reproduce human-like voices and faces. While this is an interesting approach, we are less concerned with attempting to make a tool that mimics human intelligence and more concerned with making a tool that augments it.

We’re using the concept of machine learning, in essence, for one reason: speed. We are interested in producing an incredibly powerful tool that can assist and empower our artists, designers, and ultimately our players, allowing them to create far more, and far more quickly, than they would be able to with traditional methods. The result is a kind of collaboration between human and artificial intelligence, with a world built by AI processes trained and guided by artists.

Speed matters for creating our world both on a grand scale and in the moment. AI processes are lean compared to traditional approaches, which is crucial for our goal of generating a world at runtime. A standard procedural generation system might take a few seconds to generate a map before you start playing: fine for certain applications, but prohibitively slow for the level of detail and speed that we require.
We are less concerned with attempting to make a tool that mimics human intelligence and more concerned with making a tool that augments it.
On a basic level, ML was the only appropriate answer to the challenges outlined at the beginning of this post. But there are numerous advantages to this approach that go beyond the obvious as well. Ultimately, our technical goals for this project come down to efficiency, scalability and flexibility. These are also crucial goals for our simulation engine, which you can read more about here. We need tools that are designed to grow and change over time, capable of producing and adapting to a deeper and broader world over the course of many years.

We could, for example, design a new agent to actually model the trees in the world rather than simply populate it with a set of premade trees (this is a long way off, but it’s an example of what might be possible). This might mean a brand-new agent inserted before or after the Population Agent that allowed for more naturalistic flora throughout the world, something that’s easier to do because of the lean nature of AI processes.

Nor is there any reason to limit the output to what we create at the beginning. The processes we’re designing are agnostic to the actual data they are trained on, so they can become extraordinarily flexible once they’re established. We might start by training an agent on the forests of Northern Europe, but once it’s up and running it should be able to shift gears towards building other biomes, even fantastical ones. Our Population Agent is trained on our artists’ work rather than real-world data, meaning it should eventually be possible for artists to create any number of different types of scenes and then use our agents to extend the idea of a scene over thousands of kilometers. The goal is to create adaptable processes designed for maximum flexibility, rather than purpose-built processes designed to do a more limited range of tasks well.
Empowering third-party creativity from users both large and small is also crucial to our long-term vision, and so there are ML tools that could eventually become user-facing. Again, AI technology is lean at its core, so this could eventually become a professional-level world generation tool available to the general public, allowing for granular control over biomes, terrain features, asset population and more with relative ease. The promises of ML may seem idealistic at times, but as we saw in the intro, the process is simple: you take an input, run it through an algorithm, get an output, and repeat. It is a tool built off of data, and it can become as expansive as the data you give it.