Assured Multicore Partitioning for FACE Systems

Alex Wilson (Wind River) and Steven H. VanderLeest (Rapita)

2020-11-10

Our previous blog identified the benefits of partitioning and explored assurance of partitioning on a single core. In this fourth blog of the series, we continue with the topic of partitioning, but now with a focus on multicore processors. Avionics system designers and integrators designing to the FACE standard are adopting the MultiCore Processor (MCP) to meet future performance demands. MCP provides the avionics industry with a platform that can provide greater performance in a lower footprint, translating to systems with lower Size, Weight, and Power (SWaP). Regardless of these benefits, OEMs in the avionics industry are pressured to adopt MCP technology when making upgrades to replace legacy single-core designs because nearly every new processor on the market today is based on multicore technology. The benefits of partitioning still hold even when implemented on an MCP, including portability, modularity, reduced integration effort, and reduced certification effort (due to incremental and compartmentalized assurance). However, MCP also introduces additional complexity in implementation and certification. System integrators need guidance on how to be successful.

Guidance for Assurance of Multicore Systems

DO-178C, DO-297, and ARINC 653 were all written in the context of single core processors, so the introduction and use of MCP adds further complexity. The FAA's CAST-32A position paper was the first major publication to address DO-178C/ED-12C assurance for MCPs. This has since been superseded for projects certified with the European Union Safety Agency (EASA) in AMC 20-193 and for projects certified with the Federal Aviation Administration (FAA) in AC 20-193.

CAST-32A extends the partitioning concept for multicore processors “Robust Time Partitioning (on an MCP) is achieved when, as a result of mitigating the time interference between partitions hosted on different cores, no software partition consumes more than its allocation of execution time on the core(s) on which it executes, irrespective of whether partitions are executing on none of the other active cores or all of the other active cores.”

Although each US military service has its own airworthiness authority, evidence of airworthiness according to FAA guidelines is often accepted as part of the certification effort. Specifically, CAST-32A is one of the FAA guidance documents that some military programs are currently using when considering the adoption of MCP systems. In addition, starting in 2019, augmentation has been underway on Section 15 “Computer Systems and Software” of MIL-HDBK-516C Airworthiness Certification Criteria. Of the 42 criteria listed in section 15, 20 have been identified as needing augmentation to account for the use of MCPs. An update to MIL-HDBK-516C is expected in the coming year.

The certification applicant for a FACE conformant system running on a multicore processor will need to meet either civilian flight certification guidelines (such as DO-178C, DO-297, and AC 20-193) and/or military airworthiness guidance (such as the augmented MCP requirements for MIL-HDBK-516C). Conformance to the FACE technical standard in itself does not ensure that these standards are met, though the FACE EA-25 subcommittee is currently working on a white paper to provide more detailed advice on which FACE activities and artifacts may contribute to flight certification evidence for standards such as DO-178C and MIL-HDBK-516C.

Assuring Partitioned, Multicore Systems

When multicore processors first started appearing in avionics, early adopters avoided some of the certification issues by simply deactivating all but one of the cores. This worked, because In many cases that single core offered better performance than older unicore processors. However, as the years have gone by, the individual cores have not improved much further in performance, meaning that significant gains in functionality through higher processor computational throughput can only be achieved more than one of the cores in an MCP.

Proving the correctness of time-partitioning even on a unicore system is challenging. Although only one partition can run at a time when there is only one core, partitions could still interfere with each other directly by causing unconstrained partition jitter, i.e., variation in the time for each partition scheduled time slot without a deterministic bound. Partitions on single cores can also interfere with each other indirectly. For example, one partition could start an operation using a bus master other than the CPU, such as a DMA engine, whose activity extends past the end of the partition’s scheduled time slot, and thus overlaps and possibly contends with another partition’s activity.

In an MCP system, multiple applications can now run simultaneously, each on its own core. Some resources are private and exclusive to a single core, such as the L1 cache, as shown in the example four-core architecture in Figure 1. However, other resources are shared, such as the L2 cache, main memory, and I/O devices. The applications on different cores can thus contend for access to these shared resources, potentially impact each other’s performance, which breaks down the isolation between partitions.

Figure 1: Multicore Computer Architecture Hosting Three Applications

One simplified approach to achieve time partitioning schedules a single partition at any given time but allows that partition to use more than one core. This approach is limited, as only multithreaded application software can take advantage of multiple cores, while single-threaded cores leave the other cores idle during their time window of the schedule, as shown in Figure 2 .

Figure 2: Simplified Multicore ARINC653 Schedule Restricted to One Partition at a Time

Only recently have early adopters of MCPs attempted to use all cores with mixed-criticality software by following the guidance in DO-297 and A(M)C 20-193. Figure 3 illustrates the complex nature of this approach, where multiple independent applications can run within the same partition window (each on its own core), even if the applications are certified at different design assurance levels. In this case, independent partitions with different criticality, i.e., different software levels, are able to run simultaneously, on different cores, within the same minor time frame of the schedule. An example of this complex approach is given by the work Collins Aerospace has done with Wind River as referenced by the White Paper.

Figure 3: Complex Multicore ARNC653 Schedule

Proving the correctness of time-partitioning on a mixed-criticality multicore system is quite challenging since partitions can run simultaneously on different cores. Independent partitions running on distinct cores now compete for shared resources. That contention can cause delays, thus increasing the software’s Worst-Case Execution Time (WCET). While an application running in a partition may meet its requirements when no other cores are active, interference from functions running on other cores may increase the application’s WCET to the point that it no longer meets its requirements. A variety of resources may cause such interference because the processor cores share access, including lower levels of cache memory, the main memory, I/O devices, and bus interconnects.

A(M)C 20-193 provides guidance on assuring a multicore system. Ten objectives must be met to demonstrate that multicore interference channels have been mitigated. The planned additions to MIL_HDBK-516C also include criteria to analyze interference channels. An interference channel is defined as any property of the computing platform “that may cause interference between independent applications.” All interference channels within the avionics system must either be eliminated or else their impact must be sufficiently reduced that all applications meet their timing requirements even in the presence of the worst-case level of interference from other cores. Similarly, the MIL-HDBK-516C additions require that “execution rates provided by the executive/control structure … are consistently obtainable and sufficient to safely provide the required performance.” That is, even in the face of multicore interference, timing requirements must be met. Successfully meeting the objectives of these standards requires both attention to mitigation of interference channels during the early life cycle stage of design and careful attention to measurement methods during the later life cycle stage of verification.

Mitigating Multicore Interference

Mitigation of multicore interference requires that partitions be isolated from each other. Multiple separation mechanisms to achieve partitioning are available to the system designer and integrator, implemented in hardware and/or software.

Some isolation mechanisms are provided by the hardware. Selecting a processor that gives each core exclusive resources can eliminate contention for them. For example, at least one level of cache memory is typically provided that is distinct and exclusive to each core. For CPU-intensive applications with strong locality of access, a private L1 cache is often sufficient to render the software largely insensitive to interference from other cores. When resources must be shared, the hardware may provide isolation by ensuring equitable access. Even then, one core’s access may impact access by another core, breaking down the separation. Unfortunately, silicon providers may not provide better determinism in cases like this because it would impact raw performance.

Although hardware can provide some isolation mechanisms, most processors are designed to optimize the average execution time on all cores, often at the expense of the WCET on any one core. While this is a good trade-off for many commercial applications, it is a problem for safety-critical avionics, as it represents a form of multicore interference. Thus, additional isolation approaches beyond those implemented through hardware are necessary.

Some isolation mechanisms are provided by the RTOS. A multicore RTOS can manage processor cores, ensuring the usage of any one hosted application is deterministically bounded within its partition so that all applications meet their requirements -- even when multicore interference is present. An example of such an RTOS is the Wind River VxWorks 653 Multi-Core edition.

Verifying Mitigation of Multicore Interference

Verification that multicore interference channels have been mitigated is an essential step in meeting A(M)C 20-193 objectives. Since no one has yet flight certified a civilian aircraft with a mixed-criticality multicore system, approaches to verification are only just appearing. Nevertheless, best practices for verification are emerging based on interference generators.

Interference generators are carefully crafted software applications with a small code footprint that create high bandwidth use of a targeted resource. For example, Rapita Systems has a library of interference generators called Rapi Daemons that target resources such as a shared L3 cache, DDR main memory, or DMA. These interference generators are used to create inter-core stress on the targeted resource. For example, Figure 4 shows Rapi Daemons running on cores 1, 2, and 3 with the application running on core 0, where the Rapi Daemons create interference for access to the main memory, as well as any shared resource along the logical path to that resource, including interconnects and cache.

Figure 4: Measuring Multicore Interference with Rapi **Daemon** Interference Generators

Comprehensive analysis of multicore interference requires the identification of all potential interference channels. The system architecture must be reviewed for shared resources, including potentially obscure or hidden channels where cores can contend, such as interconnects that are not equitably arbitrated or last level caches with insufficient write ports. For each interference channel identified, an interference generator must be designed and calibrated to stress that channel. For systems at the highest design assurance levels, such as DO-178C Levels A and B, this testing must be done with independence.

The Rapita Systems approach to multicore timing analysis uses two phases within a test methodology for measuring multicore interference channels. The first phase is platform characterization, wherein the platform is defined as both the computational hardware as well as the RTOS. In this phase, we check the outer bounds of possible multicore interference by competing Rapi Daemons against each other on all cores. Because Rapi Daemons are tuned to stress a single targeted shared resource, this provides a signature of the performance possible when cores compete continuously for that resource. This phase can be done even before application software is available.

The second test phase is software characterization. Individual software applications intended for use in flight are first measured running alone on a core while the other cores are dormant, providing a baseline measurement of the software timing behavior. Next, we measure the increase in WCET in the presence of Rapi Daemon adversaries running on the other cores. By comparing the WCET with and without the Rapi Daemons adversaries running, we can quantify worst-case multicore interference. Furthermore, the WCET with multicore interference can be compared with system requirements to determine if they are still met -- thus demonstrating whether or not interference channels have been sufficiently mitigated.

Integrated Modular Avionics (IMA) designed with robust partitioning according to DO-297 allows for the incremental acceptance of assurance evidence, accumulating certification artifacts over the course of several independent verification efforts. That is, we can measure each partition application independently of the others, verifying the mitigation of multicore interference channels. No partition application will produce contention higher than that produced by the Rapi Daemons. The assurance evidence can be collected for each partition application by itself, even though all planned applications will be integrated together in the final system. Furthermore, the timing tests for the existing partitions need not be repeated when new software is added in the future when mapped to in spare partition slots. This incremental verification approach enabled by robust partitioning is a key factor in reducing the number of tests that must be performed.

Even with incremental verification benefits, the complexity and number of interference channels in a typical multicore avionics system can still lead to a large number of tests that must be completed. Thus, tool automation is important to keep the schedule and cost of verification reasonably constrained. The Rapita Systems approach to multicore timing analysis automates key stages of the workflow using both Rapi Daemon interference generators and the Rapita Verification Suite software, as illustrated in Figure 5.

Figure 5: Tool automation through Rapi **Daemons** and Rapita Verification Suite Software

However, not everything can be automated. Human wisdom from the system and test engineers is still needed to identify interference channels in the system architecture at the start of the process and to properly interpret results at the end of the process.

Wrap-up

System designers and integrators need help to successfully design, integrate, and flight certify modern multicore avionics systems software. Assuring the safety of these systems, e.g., achieving A(MC)C 20-193 (formerly CAST-32A) objectives, can be challenging. Wind River and Rapita Systems can be your guide to success. Wind River provides operating system technology to mitigate interference. Rapita provides a commercial approach for multicore timing analysis using Rapi Daemon interference generators integrated within an automated verification tool suite, which can provide the insights you need early in the project and the artifacts you need to support A(MC)C 20-193 and MIL-HDBK-516C MCP objectives at the end of the project.

In our next blog, we will provide an overview of how the activities and artifacts produced while pursuing conformance to the FACE technical standard can also be leveraged toward certification of airworthiness.

Learn about the FACE standard and how to comply with it in our blog series:

Part 1: How the Operating System Segment fits into the FACE architecture
Part 2: FACE Components - Interchangeable but not Equal
Part 3: Assured Partitioning for FACE Systems
Part 4: Assured Multicore Partitioning for FACE Systems (this post)
Part 5: Leveraging FACE Conformance Artifacts to Support Airworthiness

DO-178C webinars

White papers

Mitigation of interference in multicore processors for A(M)C 20-193

Developing DO-178C and ED-12C-certifiable multicore software

Efficient Verification Through the DO-178C Life Cycle

A Commercial Solution for Safety-Critical Multicore Timing Analysis

Other

RVS Software Policy

Engineering Services

Webinars

Brochures

Product briefs

Technical notes

Research projects

Multicore resources

Industries

Standards

Assured Multicore Partitioning for FACE Systems

Guidance for Assurance of Multicore Systems

Assuring Partitioned, Multicore Systems

Mitigating Multicore Interference

Verifying Mitigation of Multicore Interference

Wrap-up

Learn about the FACE standard and how to comply with it in our blog series:

DO-178C webinars

White papers

Related blog posts

Industry leading verification tools

Multicore Verification

Other

RVS Software Policy

Industry leading verification services

Latest from Rapita HQ

Technical resources for industry professionals

Discover Rapita

Industries

Standards

Assured Multicore Partitioning for FACE Systems

Guidance for Assurance of Multicore Systems

Assuring Partitioned, Multicore Systems

Mitigating Multicore Interference

Verifying Mitigation of Multicore Interference

Wrap-up

Learn about the FACE standard and how to comply with it in our blog series: