Not taking into account instruction-level parallelism
Instruction-level parallelism (ILP) is a performance optimization in which the CPU executes several instructions from a sequence in parallel. When the instructions are independent of each other, the total time is no longer the sum of every instruction's execution time; in the ideal case it approaches the time of whichever instruction took the longest to execute.
This can be incredibly beneficial, but it can lead to hazards. If the CPU cannot know which instruction to execute next until an earlier one has finished, as with a conditional branch, this is called a control (or branching) hazard. CPU designers address this problem with branch prediction: using internal metrics, the CPU makes a best guess at the most likely branching path and starts executing it. If the guess is wrong, the CPU flushes the current execution pipeline to ensure there are no inconsistencies, which costs a performance penalty of ~10-20 cycles.
Data hazards should also be considered. Consider the following:
// add A and B, write the result to register C
ADD C, A, B
// add C and D, write the result to register D
ADD D, C, D
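The second instruction depends on the first, because it reads register C, which the first instruction writes. To reduce the resulting stall, CPU designers use a technique called forwarding, which passes the result directly to the dependent instruction instead of waiting for it to be written to the register.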
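The same kind of dependency chain shows up in ordinary Go code. The following is a minimal sketch, not taken from the original text: the function names sumChained and sumSplit are hypothetical. The first function feeds every addition into the next one, while the second keeps two accumulators that do not depend on each other, which gives the CPU the opportunity to overlap the additions. Whether this is actually faster depends on the CPU and the compiler, so it should be measured rather than assumed.

```go
package main

import "fmt"

// sumChained keeps a single accumulator, so every addition has to wait
// for the result of the previous one (a chain of data dependencies).
func sumChained(data []int64) int64 {
	var total int64
	for _, v := range data {
		total += v
	}
	return total
}

// sumSplit keeps two independent accumulators. The additions into a and b
// do not depend on each other, so the CPU is free to overlap them.
func sumSplit(data []int64) int64 {
	var a, b int64
	i := 0
	for ; i+1 < len(data); i += 2 {
		a += data[i]
		b += data[i+1]
	}
	if i < len(data) { // odd length: pick up the last element
		a += data[i]
	}
	return a + b
}

func main() {
	data := []int64{1, 2, 3, 4, 5}
	fmt.Println(sumChained(data), sumSplit(data)) // prints "15 15"
}
```

Both functions return the same sum; only the shape of the dependency chain differs.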
Mistake
const n = 1_000_000
func add(s [2]int64) [2]int64 {
	for i := 0; i < n; i++ {
		s[0]++ // data hazard
		if s[0]%2 == 0 { // control hazard
			s[1]++ // data hazard
		}
	}
	return s
}
Fix
const n = 1_000_000
func add(s [2]int64) [2]int64 {
	for i := 0; i < n; i++ {
		v := s[0]    // read s[0] once into a temporary variable
		s[0] = v + 1 // the increment no longer feeds the condition below
		// The check reads v instead of s[0], so the CPU can evaluate it in
		// parallel with the increment. v is odd exactly when v+1 is even.
		if v%2 != 0 {
			s[1]++
		}
	}
	return s
}
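To check that both versions compute the same result and to compare their speed on a given machine, something like the following sketch could be used. The names addNaive, addTemp, and sink are hypothetical, chosen only so that both versions fit in one file; testing.Benchmark is the standard library helper for running a benchmark outside of go test. No timings are shown because they vary with the CPU and the Go version.

```go
package main

import (
	"fmt"
	"testing"
)

const n = 1_000_000

// addNaive mirrors the "Mistake" version (hypothetical name).
func addNaive(s [2]int64) [2]int64 {
	for i := 0; i < n; i++ {
		s[0]++
		if s[0]%2 == 0 {
			s[1]++
		}
	}
	return s
}

// addTemp mirrors the "Fix" version (hypothetical name).
func addTemp(s [2]int64) [2]int64 {
	for i := 0; i < n; i++ {
		v := s[0]
		s[0] = v + 1
		if v%2 != 0 {
			s[1]++
		}
	}
	return s
}

// sink stores the results so the compiler cannot discard the calls.
var sink [2]int64

func main() {
	// Both versions must agree before their speed is worth comparing.
	fmt.Println(addNaive([2]int64{}) == addTemp([2]int64{})) // prints "true"

	naive := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = addNaive([2]int64{})
		}
	})
	temp := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = addTemp([2]int64{})
		}
	})
	fmt.Println("naive:", naive)
	fmt.Println("temp: ", temp)
}
```

Starting from a zeroed array, both functions return [1000000 500000]: s[0] is incremented n times, and s[1] is incremented once for every even value that s[0] reaches.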