Optimizations

Optimizations cannot be done blindly; you need to know what is causing the code to run slowly. In most cases, data movement is the most expensive operation, while running algorithms or functions over that data is comparatively cheap. If you begin development by organizing your data into arrays, you open yourself up to many opportunities for optimization. With such an agnostic data layout, any improvement can be applied in many places with little concern for incompatibility.

To go even further, by using stateless transformations we can also be assured that we are not introducing unexpected side effects. Most projects suffer from not thinking about optimization early enough, on the grounds that premature optimization is inherently bad. However, optimization late in a project's life cycle becomes nearly impossible, with object instances scattered everywhere and flawed design patterns heavily entrenched in the code.
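
As a sketch of what a stateless transformation looks like in practice (the names here are illustrative, not from any particular library), the function below reads only its inputs and writes only its output slice, so calling it can never surprise the rest of the program:

// scale writes src scaled by factor into dst.
// dst must be at least as long as src.
// No state outside the parameters is touched, so there
// are no hidden side effects to reason about.
func scale(dst, src []float32, factor float32) {
  for i := range src {
    dst[i] = src[i] * factor
  }
}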

A primary flaw of Object Oriented Design is the idea that an object instance is an atomic unit of processing. This assumption leads to the idea that collections of objects should be treated as collections of individuals. That much is true, but it then directs all instructions to flow through those object instances. This works at small scale, but when hundreds of individual objects need to be processed sequentially, the wheels fall off. Objects tend to exist just to mirror how things exist in the real world, which leads to features being added to match some real-world relationship. This is engineering by storytelling, and it is unreasonable.

When should we optimize?

Premature optimization is proposing a solution without knowing whether it will actually make a difference. Optimizing without any data proving that the code is slow or using too much memory is, by definition, premature. Without profiling, even if the code visibly runs slow and erratic, any change you introduce is a premature optimization.

The only way to avoid premature optimization is to look at data. Profile the application and observe the results. Improvements made after this cannot be premature, because they have been measured, and those measurements can then be evaluated in terms of failure, success, or progress.
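
In Go, the standard testing package provides a quick way to gather such measurements. A minimal sketch, assuming the updateNodes function from the Tables section below and a hypothetical makeTestNodes helper that builds test data:

import "testing"

// Run with: go test -bench=.
func BenchmarkUpdateNodes(b *testing.B) {
  nodes := makeTestNodes(1000) // hypothetical test-data helper
  b.ResetTimer()
  for i := 0; i < b.N; i++ {
    updateNodes(nodes, 1, 1000, 0.016)
  }
}

For deeper analysis, the same benchmark can feed Go's built-in profiler via go test -bench=. -cpuprofile cpu.out.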

Profiling should only start once you have metrics to work towards. Without metrics defining what counts as optimal, it is unreasonable to check the performance of the code.

Feedback

Not being aware that you are writing poorly optimized code hurts more than just the application. Without feedback, people cannot improve their skills, which reinforces bad habits and produces even more bad results. Delayed feedback isn't much better, because it limits the opportunity to internalize the feedback and develop intuition.

Adding performance metrics that can be observed quickly helps with this process. Learning early that a technique is slow removes the sunk cost fallacy that can develop over time. Additionally, data proving that an approach is bad or slow can win over the most stubborn engineers. Curiosity is a powerful remedy for the developer with a wounded ego.

A classic method of developing software is allocating budgets for time, memory, bandwidth, disk, and any other limitation that could impact the application. Establishing budgets early and complying with them eliminates future grief in the product life cycle by preventing early poor design decisions from becoming deeply integrated.

Get or build a profiler that runs constantly, providing detailed breakdowns whenever any budget is exceeded.
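
A minimal sketch of that idea using only the standard library; the stage names and budget values are placeholders:

import (
  "log"
  "time"
)

// checkBudget runs stage and reports immediately whenever
// it blows past its time budget.
func checkBudget(name string, budget time.Duration, stage func()) {
  start := time.Now()
  stage()
  if elapsed := time.Since(start); elapsed > budget {
    log.Printf("budget exceeded: %s took %v (budget %v)", name, elapsed, budget)
  }
}

Wrapping each stage of a frame or request in a call like checkBudget("physics", 2*time.Millisecond, stepPhysics), with stepPhysics standing in for your own stage function, keeps the feedback loop tight: the moment a budget is broken, there is a log line pointing at the stage responsible.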

Strategy

Define. Decide on the metrics and budgets that describe acceptable performance.

Measure. Profile the application against those metrics.

Analyze. Study the results to find what is actually slow and why.

Implement. Make the change the analysis points to.

Confirm. Measure again to verify the change was an improvement and not a regression.

Summary

Optimize from measurements, not intuition. Define metrics and budgets first, profile against them, and treat any change made without data as premature. Keep the feedback loop fast so slow techniques are caught before sunk cost sets in, then repeat the define, measure, analyze, implement, confirm cycle.

Tables

Keeping data as vectors has many benefits. Whether you use a standard implementation or roll your own dynamic array, it is a great starting point for future improvements. Most processing ends up being one of three patterns: reading an array, transforming one array into another, or modifying an array in place.
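
Those three patterns are small enough to sketch directly; the function names are illustrative:

// Read: reduce an array to a value.
func total(xs []float32) float32 {
  var sum float32
  for _, x := range xs {
    sum += x
  }
  return sum
}

// Transform: produce one array from another.
func doubled(xs []float32) []float32 {
  out := make([]float32, len(xs))
  for i, x := range xs {
    out[i] = x * 2
  }
  return out
}

// Modify in place: overwrite the array's own elements.
func clampNonNegative(xs []float32) {
  for i := range xs {
    if xs[i] < 0 {
      xs[i] = 0
    }
  }
}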

One step further is moving from an array of structures to a structure of arrays, but only if the access patterns fit. When deciding how to structure data, consider how it will be loaded and stored: CPUs are optimized for certain patterns of memory activity. There is often a cost to switching between read and write operations, so it can be beneficial to arrange writes to memory so they are predictable and serial.

The following example mixes hot and cold data in one struct: each node's position is written every update, while its velocity is only ever read:

// Vec3 is assumed to be a simple three-component vector.
type Vec3 struct {
  X, Y, Z float32
}

type PosInfo struct {
  Pos      Vec3 // hot: written every update
  Velocity Vec3 // cold: only ever read
}

type Nodes struct {
  PosInfos []PosInfo
}

func updateNodes(nodes *Nodes, trialCount int, nodeCount int, deltaTime float32) {
  for times := 0; times < trialCount; times++ {
    posInfos := nodes.PosInfos

    for i := 0; i < nodeCount; i++ {
      // Every write to Pos drags the read-only Velocity through
      // the cache, because both live in the same struct.
      posInfos[i].Pos.X += posInfos[i].Velocity.X * deltaTime
      posInfos[i].Pos.Y += posInfos[i].Velocity.Y * deltaTime
      posInfos[i].Pos.Z += posInfos[i].Velocity.Z * deltaTime
    }
  }
}

We can modify the data layout so that the cache lines being written no longer carry data that is only read:

type Nodes struct {
  Positions  []Vec3 // hot: written every update
  Velocities []Vec3 // cold: only ever read
}

func updateNodes(nodes *Nodes, trialCount int, nodeCount int, deltaTime float32) {
  for times := 0; times < trialCount; times++ {
    for i := 0; i < nodeCount; i++ {
      // Writes stream serially through one array while reads
      // stream through another; written cache lines no longer
      // carry unrelated read-only data.
      nodes.Positions[i].X += nodes.Velocities[i].X * deltaTime
      nodes.Positions[i].Y += nodes.Velocities[i].Y * deltaTime
      nodes.Positions[i].Z += nodes.Velocities[i].Z * deltaTime
    }
  }
}

A structure of arrays can be more CPU cache friendly, but only if the data is not strongly related for reading and writing. It would make little sense here to isolate the X, Y, and Z values into their own arrays, since they are always handled as a unit.

Another argument against a structure of arrays is data that is frequently inserted and deleted. This can be mitigated with a "free list", which records vacated slots for reuse so the arrays never have to shift their remaining elements, but it adds complexity and may not be worth the effort.
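
A minimal free-list sketch, reusing the Vec3 type from above (the NodeTable name and its methods are illustrative): deletion pushes the slot's index onto a list of free slots, and the next insertion pops it, so live elements never move:

type NodeTable struct {
  Positions  []Vec3
  Velocities []Vec3
  free       []int // indices of vacated slots awaiting reuse
}

// Add reuses a free slot when one exists, otherwise appends.
// The returned index stays valid until Remove is called on it.
func (t *NodeTable) Add(pos, vel Vec3) int {
  if n := len(t.free); n > 0 {
    i := t.free[n-1]
    t.free = t.free[:n-1]
    t.Positions[i], t.Velocities[i] = pos, vel
    return i
  }
  t.Positions = append(t.Positions, pos)
  t.Velocities = append(t.Velocities, vel)
  return len(t.Positions) - 1
}

// Remove marks slot i as free without moving any other element.
func (t *NodeTable) Remove(i int) {
  t.free = append(t.free, i)
}

The complexity the text warns about shows up in iteration: loops over the arrays must now tolerate or skip dead slots, which is extra bookkeeping the plain array never needed.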
