We analyzed a Freescale P4080 multicore device with the cache disabled (a previous blog post explains some of the challenges of writing real-time software in the presence of a cache). We set up a benchmark run consisting of 100 executions of an industry-standard benchmark, and ran the same benchmark on both core 0 and core 1 of the P4080.
RapiTime automatically adds instrumentation code that marks the execution of specific sections of code. In this experiment, the traces were recorded to common memory, which means the memory accesses of the two cores had to be arbitrated.
We expected the arbitration to be fair; in other words, we expected the tests to take the same amount of time to run on both cores. In fact, the results of the experiment showed this not to be the case:
Core | 1st iteration timestamp | 100th iteration timestamp | Difference over 99 tests | Time per test (timestamp units) |
0 | 133346 | 213306 | 79960 | 808 |
1 | 9422 | 56480 | 47058 | 476 |
As can be seen, running the tests on core 0 took almost 70% longer than on core 1 (a timestamp difference of 79960 versus 47058 over 99 tests), which wasn't the anticipated result. Based on these observations, we infer that core 0 memory accesses are being held up waiting for core 1 memory accesses.
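The arithmetic behind the table can be checked with a few lines of Python. This is only a sketch for reproducing the figures: the names `timestamps`, `INTERVALS`, and `summarise` are ours for illustration and are not part of RapiTime. The per-test values in the table appear to be rounded up, which is an assumption on our part, so the sketch reports both the exact and the rounded-up value.

```python
import math

# First and 100th iteration timestamps per core, copied from the table.
# The unit is whatever the trace timestamp counter counts.
timestamps = {
    0: (133346, 213306),
    1: (9422, 56480),
}

# 100 iterations give 99 intervals between the first and last timestamps.
INTERVALS = 99


def summarise(ts, intervals=INTERVALS):
    """Return the timestamp difference and per-test time for each core."""
    summary = {}
    for core, (first, last) in ts.items():
        difference = last - first
        per_test = difference / intervals
        summary[core] = {
            "difference": difference,
            "per_test": per_test,
            # Assumption: the table rounds the per-test figure up.
            "per_test_rounded_up": math.ceil(per_test),
        }
    return summary


summary = summarise(timestamps)

# Relative extra time taken by core 0 compared with core 1.
slowdown = summary[0]["difference"] / summary[1]["difference"] - 1.0

for core, row in summary.items():
    print(
        f"core {core}: difference={row['difference']}, "
        f"per test={row['per_test']:.2f} "
        f"(rounded up: {row['per_test_rounded_up']})"
    )
print(f"core 0 takes {slowdown:.1%} longer than core 1")
```

The computed slowdown comes out just under 70%, which matches the "almost 70% longer" figure quoted above.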