Hi everyone, during the past month my main focus was on improving the quality of our two October releases: CitizenCon and SC 1.3.
During this time I worked on several optimizations, mostly around the ZoneSystem and CPU-side object culling. Besides those, my second focus was on improving our thread backend so that it is better suited to a PC-only game.
CPU performance has changed a lot in recent years. In the old days, optimization was straightforward: there was only a single execution unit inside the CPU, and it did all the computations. In addition, there was an active GHz race, so your code automatically ran faster with each newly released CPU. Nowadays, CPU clock speeds don’t change much anymore (single-core performance still increases, but not at the rate it did before), and CPUs have gone “wide” by providing more execution units.
This puts more burden on the programmer, as concurrent programming can be very complex. Games are, by their nature, very sequential in execution: each frame must first update the state of the world and then send this state to the GPU for rendering. That makes it hard to parallelize them in a way that actually yields a performance gain. To do so, one of the most prominent models used for games is a so-called Main Thread. This Main Thread can be thought of as a regular game loop from the single-core CPU era. The other cores are then used during a frame to help the Main Thread. If, for example, we must update the state of 100 particle systems, we can distribute those over all CPU cores to reduce the latency between beginning the update and being done with it. See the attached picture for a simplified example of how this distribution helps.
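To make the particle system example concrete, here is a minimal C++ sketch of the idea, not our actual engine code. `ParticleSystem` and `UpdateAllParallel` are hypothetical names invented for illustration, and the sketch spawns fresh threads on every call purely for brevity; a real backend would keep its worker threads alive across frames.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a particle system; not an engine type.
struct ParticleSystem {
    float age = 0.0f;
    void Update(float dt) { age += dt; }
};

// Splits the systems into contiguous chunks (at most one per worker), updates
// the chunks concurrently, and returns only once every chunk is done. The
// caller plays the role of the Main Thread waiting for its helpers.
void UpdateAllParallel(std::vector<ParticleSystem>& systems, float dt,
                       unsigned workerCount) {
    assert(workerCount > 0);
    const std::size_t count = systems.size();
    const std::size_t chunk = (count + workerCount - 1) / workerCount;  // round up

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, count);
        // Each thread touches only its own index range, so no locking is needed.
        workers.emplace_back([&systems, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i) systems[i].Update(dt);
        });
    }
    for (std::thread& w : workers) w.join();  // the frame continues after all chunks finish
}
```

With 100 systems and 4 workers, each thread updates 25 systems, so the wall-clock time for the update step shrinks even though the total amount of work stays the same.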
To make all of this even more complex, the PC platform has to be more general than consoles. On consoles, the game normally has exclusive access to all resources on a known set of hardware. On a PC, the game has to share the resources dynamically with an unknown number of processes running at the same time on an unknown hardware platform.
So to better utilize the PC platform, we switched the thread backend to a batch-oriented work-stealing approach. With this new model, we can massively reduce the cost of communicating with different threads, as we only need to send a signal once per batch, and not once per entry. We also reduced the contention between the worker threads (important for scaling to a higher number of CPUs) by utilizing work stealing, so that threads communicate with each other directly instead of going through a central queue.
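For readers who want to see the two ideas side by side, here is a rough C++ sketch of batch submission with work stealing. It is an illustration under simplifying assumptions, not our backend: `Job`, `WorkerQueue`, and `RunBatch` are made-up names, the per-worker queues are guarded by a plain mutex rather than a lock-free structure, and the worker threads are created per batch instead of being kept alive.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

using Job = std::function<void()>;

// One queue per worker, so workers rarely touch the same lock.
struct WorkerQueue {
    std::mutex mutex;
    std::deque<Job> jobs;

    bool PopBack(Job& out) {     // the owner takes its newest job
        std::lock_guard<std::mutex> lock(mutex);
        if (jobs.empty()) return false;
        out = std::move(jobs.back());
        jobs.pop_back();
        return true;
    }
    bool StealFront(Job& out) {  // a thief takes the oldest job
        std::lock_guard<std::mutex> lock(mutex);
        if (jobs.empty()) return false;
        out = std::move(jobs.front());
        jobs.pop_front();
        return true;
    }
};

// Runs a whole batch and returns when every job has finished.
// Assumes jobs do not enqueue further jobs.
void RunBatch(std::vector<Job> batch, unsigned workerCount) {
    assert(workerCount > 0);
    std::vector<WorkerQueue> queues(workerCount);
    std::mutex startMutex;
    std::condition_variable startSignal;
    bool started = false;

    auto worker = [&](unsigned self) {
        {   // Sleep until the batch has been published.
            std::unique_lock<std::mutex> lock(startMutex);
            startSignal.wait(lock, [&] { return started; });
        }
        Job job;
        for (;;) {
            bool found = queues[self].PopBack(job);
            // Own queue is empty: try to steal from the other workers.
            for (unsigned i = 1; !found && i < workerCount; ++i)
                found = queues[(self + i) % workerCount].StealFront(job);
            if (!found) return;  // every queue is empty, nothing left to pick up
            job();
        }
    };

    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workerCount; ++w) threads.emplace_back(worker, w);

    // Distribute the batch round-robin while the workers are still asleep.
    for (std::size_t i = 0; i < batch.size(); ++i) {
        WorkerQueue& q = queues[i % workerCount];
        std::lock_guard<std::mutex> lock(q.mutex);
        q.jobs.push_back(std::move(batch[i]));
    }
    {
        std::lock_guard<std::mutex> lock(startMutex);
        started = true;
    }
    startSignal.notify_all();  // one wake-up for the whole batch, not one per job

    for (std::thread& t : threads) t.join();
}
```

The point of the per-worker queues is that a worker only touches another worker's queue when it has run out of its own work, so there is no single central queue that every thread has to fight over.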
This whole threading change was one of the major performance improvements for 1.3. It also causes higher CPU usage (which is good, as we now actually make use of the cores inside your CPU). Currently, nearly all legacy jobs have been ported over to this system, as well as all CPU-side culling of the ZoneSystem. In the future, this system will be used to parallelize parts of the game code, as well as a few additional things.