Our series of blog posts on optimizing embedded software with the aim of improving (i.e. reducing) worst-case execution times continues with static loop iteration count.
Software optimization techniques which improve worst-case execution times #10: Static Loop Iteration Count
The following example combines the use of the maximum (rather than a variable) number of loop iterations to achieve single path code and further optimisations.
This example is based on device driver code for a CAN controller; however, it is applicable to most scenarios where a relatively short but variable length message is copied from one buffer to another.
In the first implementation a for loop is used to copy up to 8 bytes of data from the memory mapped communications device (CAN controller). The address of the start of the data is a fixed memory location: MSG_DATA. The number of bytes in the message is read from another memory location MSG_LEN.
The initial implementation using a loop copies only the bytes of data that make up the actual message. It may be faster for short messages and faster in the average case; however, in the worst-case, when the message has 8 bytes of data, it executes many more instructions than the unrolled alternative.
Example: For loop, variable iterations
void variable_copy1(Uint8 *buf) { Uint32 i, length; length = *MSG_LEN; /* copy data as necessary */ for (i=0; i<length; i++) { buf[i] = *(MSG_DATA + i); } }
Example: Single path alternative
void s_path_copy2(Uint8 *buf) { Uint32 length; length = *MSG_LEN; /* always copy all data */ buf[0] = *(MSG_DATA); buf[1] = *(MSG_DATA + 1); buf[2] = *(MSG_DATA + 2); buf[3] = *(MSG_DATA + 3); buf[4] = *(MSG_DATA + 4); buf[5] = *(MSG_DATA + 5); buf[6] = *(MSG_DATA + 6); buf[7] = *(MSG_DATA + 7); }
The second implementation always copies all the data from the communications device irrespective of the message length (0 to 8 bytes). This is fine, as the buffer has to be long enough for 8-byte messages and subsequent code will use the length variable to determine how many bytes are actually part of the message.
In this example, moving to single path code not only completely removes the overhead of the loop termination test; it also results in a set of addresses for the 8-bytes of message data that can be evaluated statically at compile time. This removes the need for an add operation on each access, further improving the worst-case execution time. This implementation is also far easier to test as the same code is executed each time. All data dependency has been removed.
Note with a 16- or 32-bit interface to the CAN controller, this code could be made more efficient still by copying 16- or 32-bit words from the controller, reducing the number of operations required from 8 to 4 or 2. The execution time for a 32-bit single path copy is also given in the table below.
Function | WCET (clock ticks) | |
MPC555 | HC12 | |
variable_copy1 | 72 | 323 |
s_path_copy2 | 49 | 50 |
s_path_32bit_copy3 | 16 | 38 |
Reduction in WCET (1-3) | 77.7% | 88.2% |