I love the quote "premature optimization is the root of all evil". Donald Knuth is one of the timeless masters of programming, and imho someone every budding programmer should know! Here is the quote in full:

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
Fortunately, we have certainly not fallen into the trap of premature optimization when making Mechanic Miner! Quite the opposite, and the players of the game have had to live with what has been some pretty poor performance, which only became worse over time. Even when using a high end gaming rig, players would experience lag when playing the game.
We're sorry everyone had to endure that, but at least Daniel and I have now been able to spend the last while making things better. There is still a lot more that could be done, but I think we have managed to attack the 3% of the code that Knuth talks about in the quote and improve things quite a bit.
I'll now go through the key optimizations we have done in some more technical detail.
We made a small tool for outputting the measured time spent in the terrain generation code, for each rendered frame. Here is a dump for the desert overworld, where the player is flying to the right.
As you can see, the time spent varies a lot between frames. In some frames several terrain tiles are generated at once, and the result is stuttering. I then tried disabling almost all of the terrain generation and running the test again, and indeed the stuttering all but disappeared.
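For the curious, a per-frame timing tool of the kind described above can be sketched roughly like this. This is Python purely for illustration and not the game's actual code; `FrameProfiler`, `section` and `end_frame` are made-up names, and the clock is injectable only to make the sketch easy to test.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class FrameProfiler:
    """Accumulates time spent in named code sections, one total per frame."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock                  # injectable so tests can fake time
        self._current = defaultdict(float)   # section name -> seconds this frame
        self.frames = []                     # one {section: seconds} dict per frame

    @contextmanager
    def section(self, name):
        """Time the enclosed block and add it to this frame's total for `name`."""
        start = self._clock()
        try:
            yield
        finally:
            self._current[name] += self._clock() - start

    def end_frame(self):
        """Store this frame's totals and start a fresh frame."""
        self.frames.append(dict(self._current))
        self._current.clear()

    def dump(self, name):
        """Return the time spent in `name` for every recorded frame, in milliseconds."""
        return [frame.get(name, 0.0) * 1000.0 for frame in self.frames]
```

The idea would be to wrap the terrain generation in `with profiler.section("terrain"):`, call `end_frame()` once per rendered frame, and look at `dump("terrain")` for spikes.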
So, time to make the terrain generation faster! Actually, we did not need to make it faster, just change the code so it could run entirely on its own thread. The reason this had not already been done was that the polygons on the terrain were drawn on the GPU, which is controlled by the main thread. We had long wanted to change that and had already written a small routine for rasterizing polygons on the CPU. What remained was to change the generation code to use this routine. That was easy enough, although it took some time because there are a number of different kinds of terrain generated in the desert overworld, mines, sand caves etc.
In that same pass we took the opportunity to change the terrain collision data generated for the physics engine from individual pixels to small segments (typically around 20 pixels wide). Our profiling of physics in the game so far has indicated pretty consistently that most of the CPU time is spent moving objects around in the physics engine's spatial hash. If the number of objects in that hash is reduced, it becomes faster to relocate objects every time they are moved.
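To give an idea of what CPU rasterizing of polygons involves, here is a minimal scanline fill sketch. It is Python for illustration only and is not our actual routine; `rasterize_polygon` and `generate_tiles_in_background` are made-up names, and the even-odd fill rule and pixel-centre sampling are assumptions. Because nothing here touches the GPU, the work can be handed to a worker thread.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rasterize_polygon(width, height, polygon):
    """Fill a polygon into a height x width grid of 0/1 (scanline, even-odd rule).

    `polygon` is a list of (x, y) vertices; pixels are sampled at their centres.
    """
    grid = [[0] * width for _ in range(height)]
    n = len(polygon)
    for row in range(height):
        y = row + 0.5
        crossings = []
        for i in range(n):
            x0, y0 = polygon[i]
            x1, y1 = polygon[(i + 1) % n]
            # Half-open test: a vertex lying on the scanline is counted once,
            # and horizontal edges are skipped entirely.
            if (y0 <= y < y1) or (y1 <= y < y0):
                crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # Fill between each pair of crossings: pixel centres in [left, right).
        for left, right in zip(crossings[0::2], crossings[1::2]):
            start = max(0, math.ceil(left - 0.5))
            end = min(width, math.ceil(right - 0.5))
            for col in range(start, end):
                grid[row][col] = 1
    return grid


def generate_tiles_in_background(jobs):
    """Rasterize (width, height, polygon) jobs on a worker thread.

    This sketch waits for the results; a game loop would instead keep the
    futures and poll `future.done()` each frame so the main thread never blocks.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        futures = [pool.submit(rasterize_polygon, w, h, poly) for w, h, poly in jobs]
        return [future.result() for future in futures]
```

Note that in Python the interpreter lock limits how much a thread really helps with pure CPU work; the point of the sketch is only the structure, where generation no longer depends on the thread that owns the GPU.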
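As a rough illustration of why segments help, here is a sketch that merges a row of solid pixels into runs of at most 20 pixels. It is Python for illustration only; `row_to_segments` is a made-up name, and treating segments as horizontal runs within one pixel row is an assumption about the layout, not a description of our actual collision data.

```python
def row_to_segments(solid_row, max_width=20):
    """Merge consecutive solid pixels into (start, length) runs.

    Runs longer than `max_width` are split so no segment exceeds that width.
    """
    segments = []
    run_start = None
    for x, solid in enumerate(solid_row):
        if solid:
            if run_start is None:
                run_start = x                      # a new run begins here
            elif x - run_start == max_width:
                segments.append((run_start, max_width))
                run_start = x                      # run is full, start the next one
        elif run_start is not None:
            segments.append((run_start, x - run_start))
            run_start = None
    if run_start is not None:                      # run reaches the end of the row
        segments.append((run_start, len(solid_row) - run_start))
    return segments
```

In this toy version a row with 48 solid pixels ends up as 4 collision objects instead of 48, which is the kind of reduction that keeps a spatial hash small.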
In the future we will probably need to work more on the collision hash in the physics engine, because it will ultimately be the performance limitation with big complex machines.
CPU using large amount of time on rendering
The CPU of course does not do much rendering work, as that is mainly the GPU's job. We had identified two probable causes of this:
- Too many "draw calls", i.e. much of the graphics was sent to the GPU one quad at a time instead of being grouped together ("batched")
- Too much data being transferred to the GPU every frame, much of which was actually exactly the same as the previous frame
As it turns out, reducing the number of draw calls had almost no effect :-( We did not realise at the time that OpenGL has actually become really good at handling a large number of draw calls / state changes.
On the other hand, the caching of data in the CPU made a big difference and reduced the CPU rendering time.
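One common way to avoid re-sending unchanged data is a "dirty flag": keep a copy on the CPU side and only upload when something was actually modified. The sketch below is Python for illustration and only one possible reading of the caching described above; `CachedMesh` and the `upload` callback are made-up stand-ins, not real OpenGL calls.

```python
class CachedMesh:
    """Keeps vertex data on the CPU and re-uploads only after it has changed."""

    def __init__(self, vertices, upload):
        self._vertices = list(vertices)
        self._upload = upload    # hypothetical stand-in for the real GPU upload
        self._dirty = True       # nothing has been uploaded yet

    def set_vertex(self, index, vertex):
        """Change one vertex; mark the mesh dirty only if the value differs."""
        if self._vertices[index] != vertex:
            self._vertices[index] = vertex
            self._dirty = True

    def draw(self):
        """Upload if needed; return True when an upload happened this frame."""
        uploaded = self._dirty
        if self._dirty:
            self._upload(tuple(self._vertices))
            self._dirty = False
        # The actual draw call using the GPU-side copy would go here.
        return uploaded
```

For graphics that stay the same from frame to frame, `draw()` then costs no transfer at all after the first frame.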
The light buffer
The light is rendered on top of the game graphics every frame: it is calculated by the CPU and sent to the GPU for rendering. Sending it to the GPU was the most time consuming part, so we changed this to a double buffering scheme, where the CPU renders a frame of light and then simply queues it for upload to the GPU (using a DMA transfer). On the next frame, this data is then available on the GPU to render to the screen.
Essentially, because the main rendering context is already double buffered, it changes the main rendering pipeline to "triple buffering", meaning that the game is simultaneously working on the two next frames, as well as displaying the current one.
This change more or less made the CPU cost of uploading light buffer data zero.
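The double buffering idea can be sketched like this, in Python for illustration only. `DoubleBufferedLight` is a made-up name and plain lists stand in for the real buffers; in the game the "hand over" step is an asynchronous upload, which this sketch does not model.

```python
class DoubleBufferedLight:
    """Two light buffers: one being written this frame, one shown from last frame."""

    def __init__(self, size):
        self._buffers = [[0.0] * size, [0.0] * size]
        self._write = 0  # index of the buffer the CPU fills this frame

    def render_frame(self, compute_light):
        """Fill the write buffer, then return the buffer filled on the previous frame.

        `compute_light` is a hypothetical callback that writes light values
        into the list it is given.
        """
        compute_light(self._buffers[self._write])
        # A real engine would queue the asynchronous GPU upload of the
        # freshly written buffer here and return immediately.
        self._write ^= 1
        # After the swap, this is the buffer completed one frame ago.
        return self._buffers[self._write]
```

The price is that the light on screen is always one frame behind, which matches the "triple buffering" behaviour described above.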
The light is still a bit of a CPU time hog, as the calculation of the light buffer is done by the CPU. We want to optimize this as well - most of the time is spent blurring the buffer (using a box blur algorithm), but this could be offloaded to the GPU, which is normally able to do blurring much faster.
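For readers who have not met a box blur before, here is a minimal CPU version in Python, for illustration only and not our actual implementation. It uses a sliding window sum so the cost per pixel does not grow with the radius, and it runs the 1D blur over rows and then columns; clamping at the edges (repeating the edge value) is an assumption.

```python
def box_blur_1d(values, radius):
    """Box blur using a sliding window sum; edge values are repeated past the ends."""
    n = len(values)
    if n == 0:
        return []
    window = 2 * radius + 1

    def at(i):
        return values[min(max(i, 0), n - 1)]  # clamp index to the valid range

    total = sum(at(i) for i in range(-radius, radius + 1))
    out = []
    for i in range(n):
        out.append(total / window)
        # Slide the window one step: add the entering value, drop the leaving one.
        total += at(i + radius + 1) - at(i - radius)
    return out


def box_blur_2d(grid, radius):
    """Separable box blur: blur every row, then every column of the result."""
    rows = [box_blur_1d(row, radius) for row in grid]
    cols = [box_blur_1d(list(col), radius) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

Even in this cheap form the blur touches every pixel of the light buffer twice per frame, which is why it is a natural candidate for the GPU.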
Big machines quickly kill the framerate
We did some simple profiling of large machines and realised that machines with a lot of blocks in them spend a lot of time calculating buoyancy, which is the code that makes machines float somewhat realistically in water! This was quite easy to address: the calculation was being run at far too high a frequency, so we now run it less often, and we also optimized the code itself a lot.
After this update, we are much closer to the point where the game is limited in performance by stuff that is actually hard for the computer to do, i.e. Mechanic Miner's real physics on machines.
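Running an expensive calculation at a lower frequency can be as simple as recomputing it every few physics steps and reusing the last result in between. The sketch below is Python for illustration only; `ThrottledBuoyancy`, the `compute_force` callback and the default of 4 steps are made up and do not reflect the values used in the game.

```python
class ThrottledBuoyancy:
    """Recomputes an expensive force only every N steps and caches it in between."""

    def __init__(self, compute_force, every_n_steps=4):
        self._compute = compute_force    # hypothetical expensive per-machine calculation
        self._every = every_n_steps
        self._steps_until_refresh = 0    # 0 means: recompute on the next call
        self._cached_force = 0.0
        self.recomputations = 0          # counter, handy for profiling

    def force(self, machine):
        """Return the buoyancy force to apply this step."""
        if self._steps_until_refresh == 0:
            self._cached_force = self._compute(machine)
            self.recomputations += 1
            self._steps_until_refresh = self._every
        self._steps_until_refresh -= 1
        return self._cached_force
```

With `every_n_steps=4` the expensive part runs on steps 1, 5, 9 and so on, so its cost drops to roughly a quarter while the force is still applied on every step.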
Of course we want to improve this, and think we can do so. The two main ideas we have are to simplify redundant collision geometry in machines, which currently have a collision box for each block in the machine, and to run the physics engine on its own thread, so we can use a full frame's worth of CPU time on just the physics.
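To show what simplifying redundant collision geometry could mean, here is a sketch that merges per-block boxes on a grid into larger rectangles. It is Python for illustration only and just one simple greedy approach, not something we have implemented; `merge_blocks` is a made-up name and it assumes blocks sit on an integer grid.

```python
def merge_blocks(blocks):
    """Cover a set of (x, y) grid cells with rectangles (x, y, width, height).

    Step 1 merges neighbouring cells in each row into horizontal runs.
    Step 2 stacks identical runs from consecutive rows into taller rectangles.
    """
    runs = {}  # y -> list of (x_start, width)
    for y in sorted({by for _, by in blocks}):
        xs = sorted(bx for bx, by in blocks if by == y)
        row_runs = []
        start = prev = xs[0]
        for x in xs[1:]:
            if x != prev + 1:                      # gap: close the current run
                row_runs.append((start, prev - start + 1))
                start = x
            prev = x
        row_runs.append((start, prev - start + 1))
        runs[y] = row_runs

    rects = []
    open_rects = {}  # (x_start, width) -> [y_start, height]
    prev_y = None
    for y in sorted(runs):
        current = set(runs[y])
        for key in list(open_rects):
            # Close rectangles that do not continue into this row.
            if key not in current or y != prev_y + 1:
                y0, h = open_rects.pop(key)
                rects.append((key[0], y0, key[1], h))
        for key in current:
            if key in open_rects:
                open_rects[key][1] += 1            # same run one row down: grow
            else:
                open_rects[key] = [y, 1]
        prev_y = y
    for key, (y0, h) in open_rects.items():
        rects.append((key[0], y0, key[1], h))
    return sorted(rects)
```

A solid machine of 3 x 2 blocks goes from six collision boxes to one this way. The result is not guaranteed to be the smallest possible set of rectangles, but it is cheap to compute.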
We are going to leave this for now though, and start working instead on content in the game!
Feel free to comment below. We would like to hear your thoughts!