The penultimate blog post on optimizing embedded software with the aim of improving (i.e. reducing) worst-case execution times focuses on repositioning code and state machines.
Software optimization techniques which improve worst-case execution times #17: Repositioning Code and State Machines
State machines are a common construct in real-time systems. Typically a state machine executes periodically and must complete its execution within a defined execution time budget. On each iteration of the state machine, the code for the current state is executed and then the state for the next iteration selected. Often this state-based operation provides an opportunity to reposition code so that it is no longer on the worst-case path. This is achieved by moving code that would execute at the end of one state to the beginning of the next or visa-versa, the aim being to balance the execution time of the different states such that the worst-case execution time is minimised.
Example: Simple state machine
void state_machine1() { Static Uint32 tmp[5]; Uint32 total, mean, j; switch(g_state) { case 0: case 1: case 2: case 3: tmp[g_state] = take_reading(); g_state++; break; case 4: tmp[g_state] = take_reading(); total = 0; for(j = 0; j < 5; j++) { total = tmp[j]; } mean = total / 5; output(mean); g_state = 0; break; } }
The first implementation given above illustrates a simple state machine. The state machine takes readings each time it executes. On every fifth invocation, it calculates and outputs the mean value of the last five readings. This state machine is heavily unbalanced, with short execution times for states 0, 1, 2 and 3 and a long worst-case execution time for state 4.
Example: State machine improved balanced
void state_machine2 () { Uint32 tmp, mean; static Uint32 total = 0; switch(g_state) { case 0: total = take_reading(); g_state++; break; case 1: case 2: case 3: total+= take_reading(); g_state++; break; case 4: total += take_reading(); mean = total / 5; output(mean); g_state = 0; break; } }
The second implementation (above) shows how the processing can be shared out more evenly between the states. Here, the summation is done incrementally in each state and the initialisation of the static variable total is moved to state 0 where the first in each set of five readings is taken. This results in a significantly reduced worst-case execution time.
Function | WCET (clock ticks) | |
MPC555 | HC12 | |
state_machine1 | 89 | 464 |
state_machine2 | 52 | 272 |
Reduction in WCET | 41.6% | 41.3% |