Cache Memory: the hardware acceleration feature that can slow you down

Antoine

2010-04-29

Most processors now use caches to reduce the average time to access memory. What do you need to know about the possible pitfalls of using caches? Here's a practical example. If an access to memory that would normally require 4 cycles is replaced by a cache access (accessed in 1 cycle), the memory appears to be 4 times faster.

The drawback is that an access to memory that does not hit the cache now takes 5 cycles (1 cycle to check the cache + 4 cycle for accessing the memory).

Also, if the cache misses it has to evict one of the existing entries in order to make room for the new entry.

As an example, let's assume that this very simple software is executed on a processor with direct mapped cache.

void f0 (void)
{
	for(;;) {
		f1();
		f2();
	}
}

Assuming this is the only code executed, if the functions f0, f1 and f2 do not conflict in the instruction cache, then the cache hit ratio for this code would be close to 100% (because after the first execution of f0, f1 and f2 all their code is in cache).

Now, in the very unlikely case where f0, f1 and f2 have the exact same size and that all their instructions conflict in the cache (i.e. their addresses modulo the size of the cache is the same), the cache hit ratio would be 0%, and the code would be executed a lot slower. In fact it would be even slower than not having a cache at all.

However unlikely, such a behaviour is possible. So what can trigger it? Various factors are to be considered: cache configuration, execution path, size of the code and location of the code in memory.

This last one is particularly interesting as it is usually neither controlled nor considered as a risk.

Although we usually tell the linker where we want or code to reside in memory, we don't consider the order. For example we might compile like this: gcc *.c -o all.elf

In this example (see figure), all the difference between having 0% of near 100% hit ratio can be made by either compiling like this:

gcc f0.c f1.c f2.c f3.c f4.c f5.c f6.c -o all.elf

or like this:

gcc f0.c f3.c f4.c f1.c f5.c f6.c f2.c -o all.elf

DO-178C webinars

White papers

Mitigation of interference in multicore processors for A(M)C 20-193

Developing DO-178C and ED-12C-certifiable multicore software

Efficient Verification Through the DO-178C Life Cycle

A Commercial Solution for Safety-Critical Multicore Timing Analysis

Multicore Verification

Engineering Services

Other

Webinars

Brochures

Product briefs

Technical notes

Research projects

Multicore resources

Cache Memory: the hardware acceleration feature that can slow you down

DO-178C webinars

White papers

Industry leading verification tools & services

Multicore Verification

Industries

Other

Latest from Rapita HQ

Technical resources for industry professionals

Discover Rapita

Who we are

Working at Rapita

Careers

Cache Memory: the hardware acceleration feature that can slow you down