Is it possible to use diversity on a single processor to reliably detect hardware faults?
That's the thought that led to the research behind ED4I, a technique to support error detection. How we got from that research to the point where cars on the road now rely on ECUs developed using this technique is an interesting story.
The story starts a few years ago, when one of our customers asked whether we could help them use ED4I to automate the generation of diverse code for their embedded safety-critical control systems. Maintaining different variants of the same control system, and a diverse version alongside them, was using up a lot of engineering effort. If the diverse versions of their control algorithms could be generated automatically, the customer would save a lot of precious time and effort.
What is ED4I?
ED4I stands for "Error Detection by Diverse Data and Duplicated Instructions". The technique was published in 2000 by Nahmsuk Oh, Subhasish Mitra and Edward J. McCluskey from the Center for Reliable Computing, Stanford [1].
A common technique to detect temporary faults is to run the same program twice and check that both runs give the same result. ED4I is a variant of this that also detects permanent faults.
ED4I works by running two similar but different versions of the software. Each version performs the same basic computations but uses different numbers in those calculations. This means that the data exercises different bit patterns throughout its journey through the system architecture: memory, registers, data bus, arithmetic logic unit. A variety of hardware faults (temporary or permanent) can therefore be detected.
Oversimplifying the theory a lot: if you scale (multiply) all the inputs to an algorithm by a constant value then the output you get is also scaled by the same value. There are a number of transformations that you need to make to the program to make this true. This presentation [2] has some simple explanations.
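To make the scaling idea concrete, here is a minimal sketch in C. It assumes a hypothetical addition-only computation; the names `offset_sum`, `offset_sum_scaled` and `offset_sum_agrees` and the constant 7 are illustrative and do not come from [1] or [2]. For additions, scaling every input and every constant by the same factor scales the result by that factor too.

```c
#include <stdint.h>

#define K (-2) /* scaling factor; hypothetical macro name */

/* Original computation (hypothetical example): a + b + 7. */
int32_t offset_sum(int32_t a, int32_t b)
{
    return a + b + 7;
}

/* Diverse computation: expects inputs already scaled by K.
 * The constant 7 is scaled by K as well. */
int32_t offset_sum_scaled(int32_t a_scaled, int32_t b_scaled)
{
    return a_scaled + b_scaled + (K * 7);
}

/* Returns 1 when both versions agree, 0 when they disagree
 * (a disagreement would indicate a possible fault). */
int offset_sum_agrees(int32_t a, int32_t b)
{
    int32_t original = offset_sum(a, b);
    int32_t diverse  = offset_sum_scaled(K * a, K * b);
    return diverse == K * original;
}
```

The check multiplies the original result by the factor instead of dividing the diverse one. That is just one possible way to compare the two results.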
The scaling factor used is -2, which means that every number in an algorithm is shifted one bit to the left and negated. For example, in 8-bit two's complement:
57 (0011 1001) becomes -114 (1000 1110)
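The bit patterns above can be reproduced with a small C sketch. The helper names `bits8` and `differing_bits` are hypothetical, and the code assumes an 8-bit two's-complement representation, as in the example.

```c
#include <stdint.h>

/* Writes the 8-bit two's-complement pattern of value into out
 * (8 characters plus a terminating NUL). */
void bits8(int8_t value, char out[9])
{
    uint8_t raw = (uint8_t)value;
    for (int i = 0; i < 8; i++) {
        out[i] = (raw & (0x80u >> i)) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Counts the bit positions in which two 8-bit values differ. */
int differing_bits(int8_t a, int8_t b)
{
    uint8_t diff = (uint8_t)((uint8_t)a ^ (uint8_t)b);
    int count = 0;
    while (diff != 0) {
        count += diff & 1u;
        diff >>= 1;
    }
    return count;
}
```

For 57 and -114, `differing_bits` reports that 6 of the 8 bit positions differ, which illustrates why the two versions exercise the hardware differently.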
The basic transformations needed to use this technique concern comparisons (<, >, etc.), constants, multiplication and division:
An example of ED4I
    /* Version 1 of computation */
    x1 = 2;
    y1 = 3;
    z1 = x1 * y1;             /* z1 is 6 */

    /* Version 2 of computation */
    x2 = -4;                  /* scale all constants */
    y2 = -6;
    z2 = x2 * y2 / (-2);      /* transform multiplication by dividing by -2 */
                              /* z2 is -12 */

    /* Compare results */
    if (z1 != z2 / (-2)) {
        /* error detected */
    }
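The example can be extended into a compilable sketch that also shows a comparison. This is an illustration under assumptions, not the output of any tool: the function names and the limit of 100 are hypothetical. Because the scaling factor is negative, the direction of the comparison flips in the diverse version.

```c
#include <stdint.h>

#define K (-2) /* scaling factor; hypothetical macro name */

/* Original version: product of x and y, limited to a maximum of 100. */
int32_t limited_product(int32_t x, int32_t y)
{
    int32_t z = x * y;
    if (z > 100) {
        z = 100;
    }
    return z;
}

/* Diverse version: inputs and constants are already scaled by K. */
int32_t limited_product_scaled(int32_t xs, int32_t ys)
{
    /* The raw product carries a factor K*K, so divide once by K. */
    int32_t zs = (xs * ys) / K;
    /* K is negative, so '>' in the original becomes '<' here. */
    if (zs < K * 100) {
        zs = K * 100;
    }
    return zs;
}

/* Returns 1 when the two results are consistent, 0 otherwise. */
int results_agree(int32_t z, int32_t zs)
{
    return zs == K * z;
}
```

With x = 2 and y = 3 the original version yields 6 and the diverse version yields -12, so the results agree. Any corruption of either value makes `results_agree` return 0.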
There are, of course, lots of loose ends to tie up: constants, global variables, arrays, pointer arithmetic, types, overflow... the list goes on. Some of these are explained in [1]. These situations are handled automatically by our ED4I tool.
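As one illustration of the overflow loose end, a scaled value needs extra headroom in its type. The sketch below is an assumption about how such a check could look for 32-bit integers; the helper name `can_scale_int32` is hypothetical and this is not a description of how our tool or [1] handles overflow.

```c
#include <stdint.h>

#define K (-2) /* scaling factor; hypothetical macro name */

/* Returns 1 if K * value is representable as int32_t, 0 otherwise.
 * Both bounds are computed by division, so the check itself cannot overflow. */
int can_scale_int32(int32_t value)
{
    return value >= INT32_MAX / K && value <= INT32_MIN / K;
}
```

Intermediate products in the diverse version carry the factor twice before the division, so they would need even more headroom than this simple check covers.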
What did we do?
We wrote a tool which takes embedded C software and produces an ED4I version of certain functions by automatically applying the correct transformations, scaling constants, dealing with comparisons and all the other oddities.
The customer has used the tool several times on automotive applications as a fast way of getting redundant software versions to detect hardware faults. It turns out that this is an efficient and cost-effective way of getting a diverse version of software, suitable for certain ISO 26262 and AUTOSAR systems.
The real cost saving comes from being able to automatically generate diverse software without having to maintain multiple versions. This is especially important when you have many similar product variants generated from the same basic software source.
We are now doing some research in the VeTeSS project based on this, to understand how this technique can be applied more broadly. If you're interested, why not get in touch?