Simple P4080 Benchmarking Reveals Arbitration Bias

2013-07-23

We analysed a Freescale P4080 multicore device with its caches disabled (a previous blog post explains some of the challenges of writing real-time software in the presence of a cache). We set up a benchmark run of 100 executions of an industry-standard benchmark, and used the same benchmark on both core 0 and core 1 of the P4080.

RapiTime automatically adds instrumentation code that marks the execution of specific sections of code. In this experiment, the traces were recorded to memory common to both cores, which means the cores had to arbitrate for memory access.

We expected the arbitration to be fair; in other words, we expected the tests to take the same amount of time to run on both cores. The results of the experiment showed that this was not the case:

Core	1st iteration timestamp	100th iteration timestamp	Difference over 99 tests	Units per test
0	133346	213306	79960	808
1	9422	56480	47058	476

As the table shows, running the tests on core 0 took almost 70% longer than on core 1, which was not the result we anticipated. Based on these observations, we infer that core 0's memory accesses are being held up waiting for core 1's memory accesses.
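The arithmetic behind the "almost 70%" figure can be checked with a short script. This is only a minimal sketch: the timestamps are copied from the table, while the names `timestamps`, `INTERVALS` and `per_test` are ours for illustration and are not part of RapiTime or the original experiment.

```python
# Timestamps copied from the results table, in the trace's own time units.
# Layout: core -> (1st iteration timestamp, 100th iteration timestamp)
timestamps = {
    0: (133346, 213306),
    1: (9422, 56480),
}

# 100 iterations give 99 intervals between the first and last timestamp.
INTERVALS = 99


def per_test(first, last, intervals=INTERVALS):
    """Average time per test between the first and last timestamp."""
    return (last - first) / intervals


core0 = per_test(*timestamps[0])
core1 = per_test(*timestamps[1])

# Relative extra time taken on core 0 compared with core 1.
slowdown = core0 / core1 - 1.0

print(f"core 0: {core0:.1f} units per test")
print(f"core 1: {core1:.1f} units per test")
print(f"core 0 takes {slowdown:.1%} longer than core 1")
```

Run as written, this reports roughly 807.7 and 475.3 units per test and a slowdown of 69.9%, so the per-test figures in the table appear to have been rounded up to whole units.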