Continuing our series of blog posts on optimizing embedded software with the aim of improving (i.e. reducing) worst-case execution times, this week we look at loop invariants.
Software optimization techniques which improve worst-case execution times #6: Loop Invariants
The worst-case execution time of some loops can be improved by hoisting loop invariants out of the loop. Loop invariants are any value or expression that remains the same on all iterations of the loop. Common cases of loop invariants include some parts of complex calculations and Booleans used in conditions. In general compilers are good at optimising code with loop invariants, so it is worth checking the compiler manual and the assembler code generated to determine if manual hoisting of loop invariants is necessary. The following example illustrates two possible implementations of the same functionality. In each case, the code deals with updating either one or two dimensional position data.
Example: Loop invariants present
void loop_invariants1(struct obj *obj, Uint32 *xpos, Uint32 *ypos, Uint8 two_dimensions, Uint32 t) { Uint32 i; for (i=0; i<MAX; i++) { obj[i].x = (5 * obj[i].x * t * t) + xpos[i]); if(two_dimensions) { obj[i].y = ypos[i] * t; } } }
The first implementation has a long worst-case execution time as 3 multiply operations and the conditional branch (if statement) are evaluated on each iteration of the loop. As the variable two_dimensions and the expression (5 * t * t) are both loop invariant, they can be hoisted out of the loop resulting in code that has a shorter worst-case execution time. Note that in the example, moving the if statement outside the loop results in some code repetition. This is typical of the maintenance versus execution time trade-offs that occur when optimising code for minimum worst-case execution time.
Example: Loop invariants hoisted
void loop_invariants2(struct obj *obj, Uint32 *xpos, Uint32 *ypos, Uint8 two_dimensions, Uint32 t) { Uint32 i; const Uint32 factor = 5 * t * t; if(two_dimensions) { for (i=0; i<MAX; i++) { obj[i].x = (factor * obj[i].x) + xpos[i]); obj[i].y = ypos[i] * t; } } else { for (i=0; i<MAX; i++) { obj[i].x = (factor * obj[i].x) + xpos[i]); } } }
Finally, the if(two_dimensions) test could be removed entirely with both x and y co-ordinates updated even if only the x co-ordinates are used. This reduces the code size and improves the worst-case execution time. (Of course care needs to be taken to ensure that operations involved in the code obj[i].y = ypos[i]; can always be carried out, even if the data copied is garbage). This implementation has the additional advantage that it is single path code, which is easier to test and will have more consistent cache and pipeline behaviour for subsequent blocks of code.
Example: Single path, loop invariants hoisted
void loop_invariants3(struct obj *obj, Uint32 *xpos, Uint32 *ypos, Uint32 t) { Uint32 i; const Uint32 factor = 5 * t * t; for (i=0; i<MAX; i++) { obj[i].x = (factor * obj[i].x) + xpos[i]); obj[i].y = ypos[i] * t; } }
The worst-case execution times of the three implementations, assuming a maximum of 10 iterations of the loop(s), were as follows:
Function | WCET (clock ticks) | |
MPC555 | HC12 | |
loop_invariates1 | 287 | 3401 |
loop_invariates2 | 245 | 2460 |
loop_invariates3 | 242 | 2450 |
Reduction in WCET (1 to 3) | 15.7% | 29.7% |