Software-Based Fault Injection Countermeasures (Part 2/3)

Authors: Jeremy Boone, Sultan Qasim Khan 

This blog post is a continuation of part 1, which introduced the concept of fault injection attacks. You can read that prior post here.

When advising our clients on the matter of fault injection (FI), we are often asked how to determine whether low-level software is vulnerable, and more importantly, how to thwart glitching attacks. Fault injection mitigations can be incorporated into a product at both the hardware and software level. In this blog post, however, we will focus only on software-based countermeasures.

This post contains various C functions, macros and programming patterns that can be used to achieve double glitch resistance within software. By “double glitch resistance”, we mean that skipping or incorrect evaluation of any two instructions should not be able to induce incorrect entry to the protected side of a conditional check. Entry to protected code paths in disallowed states must require the skipping or incorrect evaluation of three or more instructions.

Our Test Environment 

The code samples shown below were validated for double glitch resistance on 32-bit ARMv7 at the highest optimization level supported by GCC (-O3) on the following compiler versions: 

  • gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major)  
  • gcc version 5.4.1 20160919 (release) [ARM/embedded-5-branch revision 240496] (GNU Tools for ARM Embedded Processors), gcc-arm-none-eabi-5_4-2016q3 

Due to the delicate nature of these software-based countermeasures, we strongly recommend that anyone adapting this code for their products carefully validate the compilation in their own build environment to ensure that the compiler is not undermining the degree of protection.

Countermeasure Concepts 

The best practices for implementing software-based fault injection mitigations tend to include the following concepts:  

  1. Performing redundant conditional checks. 
  2. Storing critical values redundantly with their complement. 
  3. Introducing random duration delays after externally observable events or before sensitive operations. 
  4. Code refactoring to avoid fail-open scenarios. 
  5. Code refactoring to replace easy-to-glitch boolean comparisons with difficult-to-glitch bitwise comparisons. 

These software-based countermeasures (CM) are expanded upon in the following sections, which illustrate how these concepts can be incorporated into a code base. 

CM-0: Failure Handling 

Many of the code examples in this blog post will invoke a fatal function. This function should halt or reset the CPU when an inconsistency is detected by redundant operations.

Such a function should be implemented appropriately to suit the platform. In many boot ROMs, fatal errors are handled by entering an infinite loop (while(1)). However, a simple fault injection attack could break out of such a loop, allowing firmware execution to continue after an error has been observed. In fact, breaking out of loops is one of the early ChipWhisperer tutorials.

Therefore, we recommend the following reference implementation of a fatal function that triggers a NULL dereference, assuming this will cause the program to abort. If the NULL address is mapped and valid, or a NULL dereference does not trigger a CPU reset, a different implementation that causes an immediate CPU reset should be used.

static inline void fatal(void) 
{ 
    // should trigger a CPU reset or crash in some way 
    (void)*(volatile uint32_t *)0; 
} 

Finally, depending on the optimization settings, some compilers could choose not to respect the above inline hint, so this function should be thoroughly tested to ensure that is not the case. In these situations, it may be desirable to refactor the function as a macro.
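As a minimal sketch of the macro form mentioned above, the following assumes (as the reference implementation does) that reading address 0 faults or resets the CPU on the target. The names FATAL and require_nonzero are hypothetical and do not come from the reference code. The faulting read is repeated inside a loop as a backstop, on the assumption that skipping a single read should not be enough to resume normal execution.

```c
#include <stdint.h>

/*
 * Macro form of fatal(): the faulting read is expanded at every call site,
 * so the behavior does not depend on the compiler honoring an inline hint.
 * Assumption: reading address 0 faults or resets the CPU on the target.
 */
#define FATAL()                                \
    do {                                       \
        for (;;) {                             \
            (void)*(volatile uint32_t *)0;     \
        }                                      \
    } while (0)

/* Hypothetical helper: returns v unchanged, or never returns if v is zero. */
static inline uint32_t require_nonzero(uint32_t v)
{
    if (v == 0)
        FATAL();
    return v;
}
```

As with the inline function, the generated assembly should be inspected in the target build environment to confirm that the read is emitted where expected.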

CM-1: Resiliency Through Redundancy 

The main purpose of performing redundant checks is to force an attacker to successfully perform multiple successive glitches. Since the faults induced by glitching tend to be inconsistent, the likelihood of an attacker successfully inducing multiple desirable faults is lower than inducing a single desirable fault.

Furthermore, glitching multiple instructions in short succession often has different or more severe effects than glitching any of the instructions individually. This is because multiple instructions are executed simultaneously in pipelined architectures; an additional glitch affecting a particular pipeline stage of one instruction will also likely influence different pipeline stages of preceding and following instructions. Glitching of adjacent instructions would cause some instructions to be affected by multiple glitches in different pipeline stages. Performing multiple glitches affecting a particular instruction reduces the consistency of glitch results and may change the outcome compared to a single glitch.

It should be noted that instruction-level redundancy may be weakened when variable length or reduced-size instruction encodings are used, such as ARM Thumb-2. When the instruction size is smaller than the size of instruction words that are fetched, multiple instructions may be contained in a single fetch. Thus, a single glitch could cause the skipping of multiple instructions, defeating glitch countermeasures based on instruction duplication, as experimentally observed by Moro et al. The denser instruction representation of Thumb-2 was also found to increase the risk of instructions being misinterpreted as completely different instructions due to single-bit faults, compared to traditional 32-bit ARM instructions. To reduce the likelihood of single glitches affecting multiple instructions, avoid using reduced or variable length instruction size modes such as Thumb-2.

Additionally, redundancy can be applied to both memory read and write operations.

When reading security critical states from memory or registers, reads should be performed multiple times; the software should fail closed into a secure state if any of the redundant reads indicate the device should be in a secure state or if mismatches between reads are detected.

For security critical writes, redundancy can be achieved by either writing the same value repeatedly, or by performing a read after a write to verify that the write succeeded and failing closed when the read value does not match what was written. The approach of reading after a write is preferable to repeated writes, as reading allows detection of and reaction to write failures.

Redundancy is also useful in the storage of critical flags or variables. Such values in eFuses and RAM can be stored redundantly in multiple words, and in a complementary form. Redundant checks can verify that both the original value and the complement correspond to each other and fail closed to a secure state in the event of a mismatch. When storing both original values and complements for critical flags and words, an attacker would need to induce both bit-set and bit-reset faults to change the value used, since the bit polarities would be opposite for the complement. A given CPU tends to be more amenable to either bit-set or bit-reset faults, and it is difficult to induce both kinds in similar operations on the same CPU.
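The complement storage scheme described above could be sketched as follows. This is an illustration under stated assumptions rather than validated reference code: the struct, function names, and status codes are all hypothetical, and a status code is returned on mismatch so the sketch can run on a host, where production code would more likely call the fatal function.

```c
#include <stdint.h>

/* Hypothetical container: a critical word stored with its bitwise complement. */
struct protected_word {
    volatile uint32_t value;
    volatile uint32_t complement;
};

/* Hypothetical status codes, chosen as mixed bit patterns rather than 0 and 1. */
#define PW_OK       0x3CA5u
#define PW_MISMATCH 0x59C3u

static inline void protected_store(struct protected_word *pw, uint32_t v)
{
    pw->value = v;
    pw->complement = ~v;
}

/*
 * Copies the stored value to *out only if value and complement agree.
 * The consistency check is performed three times and joined with logical OR,
 * so any single failing check reports a mismatch (fail closed).
 * The first check uses the local copy v, so a corrupted first read is caught.
 */
static inline uint32_t protected_load(struct protected_word *pw, uint32_t *out)
{
    uint32_t v = pw->value;

    if ((v ^ pw->complement) != UINT32_MAX ||
        (pw->value ^ pw->complement) != UINT32_MAX ||
        (pw->value ^ pw->complement) != UINT32_MAX)
        return PW_MISMATCH;

    *out = v;
    return PW_OK;
}
```

Because the fields are volatile, each check re-reads both words from memory instead of reusing a cached value.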

CM-1-A: Redundant Memory Reading 

The below examples make use of the following macro to read a 32-bit register or memory word from an arbitrary address. The use of volatile pointers ensures that the redundant reads are actually performed repeatedly.

#define READ_REG32(addr) (*(volatile uint32_t *)(addr)) 

It is simplest if the value provided by READ_REG32 is used directly without storing it in a local variable. The READ_REG32 macro can be called redundantly in conditional checks.

If the value of a read register needs to be stored in a local variable, the variable must have a volatile qualifier to ensure repeated accesses to it in later redundant conditional checks are performed. Use of the volatile qualifier for local variables has the unfortunate performance-impacting side effect of forcing the variable to the stack and requiring a fetch from the stack every time the variable is used.

The following inline function can be used for copying the contents of a 32-bit register to a different memory location (such as a local stack variable) in a double glitch resistant manner. Any discrepancies will trigger a CPU fault or reset through calling the previously described fatal function.

static inline void multi_read_reg32(volatile uint32_t *dst, void *src)
{
    *dst = READ_REG32(src);
    if (READ_REG32(src) != *dst || READ_REG32(src) != *dst)
        fatal();
}

CM-1-B: Redundant Memory Writing

Glitch-resistant register or memory writes can be achieved by writing to a volatile register pointer, and then performing redundant reads from the same pointer to ensure the new register value is equal to the value that was written. This can be achieved by the following inline function:

static inline void multi_write_reg32(volatile uint32_t *dst, uint32_t val) 
{ 
    *dst = val; 
    if (*dst != val || *dst != val) 
        fatal(); 
} 

The following code shows an example of this inline function being used. Our fault model assumes that values already present in general purpose CPU registers cannot be modified by a glitch, since no fetch is involved in their usage.

void write_reg(void) 
{ 
    multi_write_reg32((uint32_t *)0x10002004, 0xB0DAC105); 
} 

CM-1-C: Redundant Conditionals

Redundancy in conditional checks can be easily achieved through logical AND and OR operators, as shown in the macros below, provided that the condition being checked involves examination of a volatile variable. For instance, the READ_REG32 macro dereferences a volatile pointer, so it can be used for redundant register reads and checks.

#define MULTI_IF_FAILOUT(condition) if ((condition) && (condition) && (condition)) 
#define MULTI_IF_FAILIN(condition)  if ((condition) || (condition) || (condition)) 

Redundant checks that should fail out (i.e., go outside the conditional path) upon a glitch should use the logical AND operator, since failing any of the three checks would disallow entering the conditional code. Likewise, redundant checks that should fail in (i.e., enter the conditional path) upon a glitch should use the logical OR operator, since passing any of the three checks would enter the conditional path.

The code sample below demonstrates usage of the fail-out and fail-in forms of the redundant conditional check macros with conditions that involve volatile variable reads.

uint16_t reg_conditional(void) 
{ 
    MULTI_IF_FAILOUT (READ_REG32(0x10002000) == 0xF00DBADD) 
        return 0x7EAF; 
    MULTI_IF_FAILIN  (READ_REG32(0x10002000) < 3000) 
        return 0xB0A5; 
    return 0; 
} 

The redundant conditional check macros can also be chained with else clauses, the same way else if is normally performed in the C language.

uint16_t reg_to_var_bitfield_conditional_chained(void) 
{ 
    volatile union ExampleRegister r; 
    multi_read_reg32(&r.value, (uint32_t *)0x10002000); 
 
    MULTI_IF_FAILOUT (r.fields.b == 0x0DBA) 
        return 0xFEA5; 
    else MULTI_IF_FAILOUT (r.fields.b == 0xC417) 
        return 0xC1E4; 
    else MULTI_IF_FAILOUT (r.fields.b == 0xB04F) 
        return 0x3B15; 
    else return 0; 
} 
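The ExampleRegister union used above is not defined in this post. The sketch below shows one hypothetical layout, together with a host-testable variant of the chained example in which the register address is passed as a parameter. The field names a and c, the field widths, and the function name classify_field_b are assumptions made for illustration. Bit-field ordering is implementation-defined in C, so any real layout would need to be checked against the target ABI.

```c
#include <stdint.h>

#define MULTI_IF_FAILOUT(condition) if ((condition) && (condition) && (condition))

/* Hypothetical register layout; only the 16-bit field b is used by the example. */
union ExampleRegister {
    uint32_t value;
    struct {
        uint32_t a : 8;
        uint32_t b : 16;
        uint32_t c : 8;
    } fields;
};

/*
 * Host-testable variant of the chained example. The register is read once
 * into a volatile local, then re-read twice to detect disagreement.
 * On a mismatch this sketch returns 0, where the post's code calls fatal().
 */
static uint16_t classify_field_b(const volatile uint32_t *reg)
{
    volatile union ExampleRegister r;

    r.value = *reg;
    if (*reg != r.value || *reg != r.value)
        return 0;

    MULTI_IF_FAILOUT (r.fields.b == 0x0DBA)
        return 0xFEA5;
    else MULTI_IF_FAILOUT (r.fields.b == 0xC417)
        return 0xC1E4;
    else MULTI_IF_FAILOUT (r.fields.b == 0xB04F)
        return 0x3B15;
    else return 0;
}
```

Because r is volatile, each of the three comparisons in a MULTI_IF_FAILOUT re-reads the field from the stack.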

CM-2: Reduce Exploit Reliability Through Random Delays

All forms of glitching rely on an attacker’s ability to inject a fault at the precise moment when a critical operation is occurring. Typically, an attacker will measure the timing as a fixed offset from initial power-on, or from some externally visible event, such as transactions on an inter-chip bus. The introduction of random delays between externally observable events and sensitive operations makes it more difficult to time the triggering of glitches.

For example, a random delay in a boot ROM that occurs immediately before a security critical operation is executed could prevent an attacker from triggering a glitch at a fixed delay following the chip coming out of reset.

However, for greater protection, additional random delays are required to prevent an attacker from timing a glitch following other externally observable events. Attackers can observe accesses to external memory (such as SPI flash or DRAM), other uses of inter-chip buses such as SPI or I2C, toggling of GPIO pins, and logging; thus, random delays are needed after such observable operations to prevent them from being used as anchor points for delay triggers.

void random_delay( void ) { 
    uint32_t random_ticks = 0; 
    random_ticks = read_HWRNG(); 
    delay_ticks( random_ticks % 0xFF ); 
} 
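The snippet above relies on read_HWRNG and delay_ticks, which are platform-specific and not shown in this post. The sketch below fills them in with hypothetical stand-ins so that the pattern can be exercised on a host. The xorshift generator only imitates a hardware RNG and is not suitable for real use, since a predictable delay can be compensated for by an attacker. The minimum and maximum bounds are also assumptions: the original computes read_HWRNG() % 0xFF, which yields 0 to 254 ticks.

```c
#include <stdint.h>

#define DELAY_MIN_TICKS 16u   /* hypothetical bounds; tune by experiment */
#define DELAY_MAX_TICKS 255u

/* Stand-in for a hardware RNG. NOT suitable for real use. */
static uint32_t rng_state = 0x2545F491u;

static uint32_t read_HWRNG(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/*
 * Busy-wait for the given number of loop iterations. The volatile counter
 * keeps the compiler from removing the loop. Returns the tick count used.
 */
static uint32_t delay_ticks(uint32_t ticks)
{
    volatile uint32_t remaining = ticks;

    while (remaining != 0)
        remaining = remaining - 1;
    return ticks;
}

/* Delay for a random duration in [DELAY_MIN_TICKS, DELAY_MAX_TICKS]. */
static uint32_t random_delay_bounded(void)
{
    uint32_t span = DELAY_MAX_TICKS - DELAY_MIN_TICKS + 1u;
    uint32_t ticks = DELAY_MIN_TICKS + (read_HWRNG() % span);

    return delay_ticks(ticks);
}
```

A nonzero lower bound guarantees that some delay always occurs, at the cost of a slightly longer worst-case boot time.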

Of course, random delays are not without their drawbacks.

First, it is important to recognize that they have an immediate and measurable impact on the overall product boot time. It therefore becomes necessary to determine the ideal delay duration that maximizes glitch protection while also minimizing the undesirable impact on boot time. This type of optimization is often best achieved through experimentation.

Secondly, it should be noted that more sophisticated attackers can observe side channels such as current consumption and electromagnetic emissions of the CPU, and such analysis can measure the duration of random delays in software. It is difficult to defend against side-channel leakage of random delays without sophisticated hardware countermeasures.

CM-3: Code Refactoring for Fault Resistance

For effective software fault resistance, it is essential to fail closed, which means to fall back to a closed (i.e., secure) state in all plausible fault scenarios, and only enter an open (i.e., insecure) state when an unlikely set of conditions is satisfied. Examples of insecure states that can be considered failing open include skipping firmware signature checks and enabling debugging functionality. Fault-resistant software must never fail open. In fault-resistant code, it must never be possible to enter an insecure state by skipping or glitching any single instruction. The three main rules of failing closed are:

  1. Software must default to a closed state.
  2. With redundant checks, use logical OR when checking for a closed state, and logical AND when checking for an open state.
  3. Rather than using boolean variables for storing critical states, use multi-bit state variables with unlikely values representing open states, and all other values representing closed states.

Common fault-resistant code refactoring examples are described below.

CM-3-A: Closed by Default

Defaulting to a closed state means that insecure states must be the exception rather than the norm. A fault that causes state-changing instructions to be skipped should leave the system in a secure state. See the pseudocode snippets below for examples of failing open to an insecure default and failing closed to a secure default.

// BAD: failing open with an insecure default 
validate_signature = false; 
if (state != STATE_INSECURE) 
    validate_signature = true; 
  
// BETTER: failing closed with a secure default 
validate_signature = true; 
MULTI_IF_FAILOUT(state == STATE_INSECURE) 
    validate_signature = false; 
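The BETTER pattern above can be wrapped in a small function so that it compiles and runs on a host. In this sketch the function name must_validate_signature and the STATE_INSECURE value are hypothetical, and the state word is passed by pointer so a test can stand in for the real register or variable. A bool is kept only to mirror the pseudocode; CM-3-D discusses more robust state representations.

```c
#include <stdbool.h>
#include <stdint.h>

#define MULTI_IF_FAILOUT(condition) if ((condition) && (condition) && (condition))

/* Hypothetical magic value for the insecure state. */
#define STATE_INSECURE 0x5A3CC3A5u

/*
 * Returns true unless three consecutive reads of *state all report the
 * insecure magic value. Skipping the assignment inside the conditional
 * leaves the secure default in place.
 */
static bool must_validate_signature(const volatile uint32_t *state)
{
    volatile bool validate_signature = true;   /* secure default */

    MULTI_IF_FAILOUT (*state == STATE_INSECURE)
        validate_signature = false;

    return validate_signature;
}
```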

CM-3-B: Resetting to Closed State

When status variables are reused within a function, it is important to reset the variable to a closed state after it has been checked, but before it is used a second time.

In the following example, if a glitch results in an instruction skip and prevents validate_signature from being called, then success will never be updated and will still hold the non-negative value left by calculate_signature. The second success < 0 check then passes, even though signature validation has not been performed. It is therefore important to reset success to a closed state, and to declare it as volatile to prevent the compiler from eliminating the dead store.

// BAD: well-timed glitch can deny signature validation, preventing update of success variable 
int success = 0; 
success = calculate_signature(); 
if( success < 0 ) 
    fatal(); 
success = validate_signature(); 
if( success < 0 ) 
    fatal(); 
  
// BETTER: resetting success between operations 
volatile int success = -1; 
success = calculate_signature(); 
MULTI_IF_FAILIN( success < 0 ) 
    fatal(); 
success = -1; 
success = validate_signature(); 
MULTI_IF_FAILIN( success < 0 ) 
    fatal(); 
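To illustrate why the reset matters, the sketch below models a glitch that skips the validate_signature call by using an explicit flag. Everything here is hypothetical scaffolding: the stub functions always succeed, the simulate_skipped_call parameter exists only for demonstration, and status codes are returned in place of calling fatal so the behavior can be observed on a host.

```c
#include <stdint.h>

#define MULTI_IF_FAILIN(condition) if ((condition) || (condition) || (condition))

/* Hypothetical status codes with mixed bit patterns. */
#define BOOT_OK     0x6B1Du
#define BOOT_REJECT 0x94E2u

/* Hypothetical stubs standing in for the real operations; both succeed. */
static int calculate_signature(void) { return 0; }
static int validate_signature(void)  { return 0; }

/*
 * simulate_skipped_call != 0 models a glitch that skips the call to
 * validate_signature(). Because success was reset to -1 beforehand,
 * the second check still rejects the image.
 */
static uint32_t check_image(int simulate_skipped_call)
{
    volatile int success = -1;

    success = calculate_signature();
    MULTI_IF_FAILIN (success < 0)
        return BOOT_REJECT;

    success = -1;                      /* reset to the closed state */
    if (!simulate_skipped_call)
        success = validate_signature();
    MULTI_IF_FAILIN (success < 0)
        return BOOT_REJECT;

    return BOOT_OK;
}
```

Without the reset line, the skipped-call case would reach the final return with the stale value from calculate_signature.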

CM-3-C: Failing Closed

For redundant checks to be effective, an attacker must be forced to successfully glitch all of the checks to enter an open state. This means that when performing redundant checks in a condition for an if statement, the logical AND operator should be used when the if statement guards an open case, and the logical OR operator should be used when the if statement guards a closed case.

Performing redundant checks with the wrong logical operator would reduce fault resistance rather than increase it, since it would create more opportunities to glitch into an insecure code path. See the pseudocode below for examples of failing open and failing closed with redundant checks. Also note that it is critical that the redundant checks not be optimized out by the compiler, hence the use of volatile in the pseudocode.

volatile uint32_t state; 
  
// BAD: glitching any one check would fail open 
if (state == STATE_INSECURE || state == STATE_INSECURE || state == STATE_INSECURE) 
    insecure_action(); 
else 
    secure_action(); 
  
// BAD: glitching any one check would fail open 
if (state != STATE_INSECURE && state != STATE_INSECURE && state != STATE_INSECURE) 
    secure_action(); 
else 
    insecure_action(); 
  
// BETTER: all three checks must pass to enter open state, else fail closed 
if (state == STATE_INSECURE && state == STATE_INSECURE && state == STATE_INSECURE) 
    insecure_action(); 
else 
    secure_action(); 
  
// BETTER: fail closed if any of the checks detect state is not insecure 
if (state != STATE_INSECURE || state != STATE_INSECURE || state != STATE_INSECURE) 
    secure_action(); 
else 
    insecure_action(); 

It is also critical that the CPU instructions resulting from the redundant checks in code also fail closed such that no single glitch can allow entry to an insecure state. See the pseudo-assembly below for an example of dangerous fail-open patterns. Note that an attacker can enter the insecure code path by skipping the b secure instruction or causing incorrect evaluation of any one of the cmp r1, INSECURE_MAGIC or beq insecure instructions.

cmp r1, INSECURE_MAGIC 
beq insecure 
cmp r1, INSECURE_MAGIC 
beq insecure 
cmp r1, INSECURE_MAGIC 
beq insecure 
b secure 
insecure: 
... 
secure: 
... 

A more effective approach would require glitching all the checks to enter the insecure code path, as shown below.

cmp r1, INSECURE_MAGIC 
bne secure 
cmp r1, INSECURE_MAGIC 
bne secure 
cmp r1, INSECURE_MAGIC 
bne secure 
insecure: 
... 
secure: 
... 

CM-3-D: Normally Closed State Representations

Boolean variables are commonly stored as 8-bit integers where zero evaluates to false, and all non-zero values evaluate to true. Such representations are very vulnerable to glitching, as faults that cause words to be read as all zeros, and faults that cause one or more bits to be erroneously set, are both easily achievable. For fault resilience, boolean security states should instead be stored as integers where most values represent a closed/secure state, and a single specific binary value containing a mixture of ones and zeros represents an open/insecure state. This is much more resistant to fault injection because while it is feasible to induce faults that set read words to all zeros, all ones, or randomly corrupt bits, it is difficult to cause a specific value with a mixture of ones and zeros to be induced by a glitch.

This approach of storing booleans as integers where a specific unlikely magic number represents the open state (i.e., normally closed state representations) can be used in conjunction with or as an alternative to the previously discussed approach of storing the complement of state variables in addition to the original value. The approach of storing complements is actually a form of normally closed state representation, since 3 of the 4 states representable by a bit and its expected complement bit should fail closed. In the simplest scheme of storing a bit with its complement, binary state 01 can mean closed, and 10 can mean open, and the invalid state values of 00 and 11 should default to failing closed.

One key benefit of storing values together with their complement is that both bit-set and bit-reset faults would need to be induced to enter an insecure code path. If the usual closed value of a normally closed state representation is all ones or all zeros, then bit-set or bit-reset faults alone could hit the magic value for the open state. To protect against this, a normally closed state representation should represent both the open state and normal closed state with magic numbers containing a mixture of ones and zeros. See the following pseudocode snippet for an example.

#define STATE_OPEN   0xCAFE 
#define STATE_CLOSED 0x10AF 
  
#define FUSED   0xDEAD 
#define UNFUSED 0xA3B5 
  
volatile uint16_t state; 
 
MULTI_IF_FAILIN(production_fused() != UNFUSED) 
    state = STATE_CLOSED; 
else 
    state = STATE_OPEN; 
  
... 
  
// a combination of bit-set and bit-reset faults would be needed to change 
// STATE_CLOSED to STATE_OPEN 
MULTI_IF_FAILOUT(state == STATE_OPEN) 
    insecure_action(); 

Note that boolean return codes for functions are also values that are at risk of glitching, even if they are not stored in a C language variable. Thus, functions that return security-critical booleans should be refactored to instead return integer values with different magic numbers representing open and closed states, but with all other values also failing closed. In the prior pseudocode snippet, the production_fused function should return FUSED when production fuses are blown, and UNFUSED when they are not. This follows the principle of normally closed state representations. While all return values other than UNFUSED fail closed, the specific FUSED value is returned in the closed case so that a combination of bit-set and bit-reset faults would be required to change the value to UNFUSED.
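A minimal sketch of such a function is shown below, reusing the FUSED, UNFUSED, STATE_OPEN, and STATE_CLOSED values from the snippet above. How production fuses are actually read is platform-specific and not covered in this post, so the lifecycle word, its LIFECYCLE_DEVELOPMENT value, and the derive_state helper are all hypothetical. The lifecycle word is passed by pointer so the sketch can be exercised on a host.

```c
#include <stdint.h>

#define MULTI_IF_FAILOUT(condition) if ((condition) && (condition) && (condition))
#define MULTI_IF_FAILIN(condition)  if ((condition) || (condition) || (condition))

#define STATE_OPEN   0xCAFE
#define STATE_CLOSED 0x10AF

#define FUSED   0xDEAD
#define UNFUSED 0xA3B5

/* Hypothetical lifecycle word value marking a development (unfused) part. */
#define LIFECYCLE_DEVELOPMENT 0x3C96A55Au

/*
 * Returns UNFUSED only if three consecutive reads all match the development
 * magic value. Every other reading, including all zeros and all ones,
 * returns FUSED and therefore fails closed.
 */
static uint16_t production_fused(const volatile uint32_t *lifecycle)
{
    MULTI_IF_FAILOUT (*lifecycle == LIFECYCLE_DEVELOPMENT)
        return UNFUSED;
    return FUSED;
}

/* Derives the state word exactly as in the snippet above. */
static uint16_t derive_state(const volatile uint32_t *lifecycle)
{
    volatile uint16_t state = STATE_CLOSED;

    MULTI_IF_FAILIN (production_fused(lifecycle) != UNFUSED)
        state = STATE_CLOSED;
    else
        state = STATE_OPEN;

    return state;
}
```

Since the lifecycle pointer is volatile-qualified, the three calls made by MULTI_IF_FAILIN each perform their own reads even if the compiler inlines production_fused.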

Caveat: Constant Loading

The C language does not provide a reliable and optimization-safe mechanism for adding redundancy to the loading of constants and literals. There may be an architecture-specific way to achieve glitch-resistant redundancy using inline assembly when loading constants into registers from the text section. However, for the purposes of this blog post, NCC Group did not create a convenient, reliable, and optimization-safe macro or inline function for redundant glitch-resistant constant loading.

Unfortunately, a lack of redundancy when loading constants diminishes the benefits of redundancy added to other aspects of the code. Constants (also known as literals) are used throughout most code bases for purposes such as specifying the addresses of registers to load and specifying values for comparison against. Consider the following simple program employing redundant checks of a peripheral register at address 0x10002000 against a magic value of 0xF00DBADD:

#include <stdint.h> 
  
int test(void) 
{ 
    MULTI_IF_FAILOUT(READ_REG32(0x10002000) == 0xF00DBADD) 
        return 1; 
    return 0; 
} 

The assembly resulting from the compilation of this code with full optimization is as follows:

00000000 <test>: 
   0:   e59f3034        ldr     r3, [pc, #52]   ; 3c <test+0x3c> 
   4:   e59f1034        ldr     r1, [pc, #52]   ; 40 <test+0x40> 
   8:   e5932000        ldr     r2, [r3] 
   c:   e1520001        cmp     r2, r1 
  10:   0a000001        beq     1c <test+0x1c> 
  14:   e3a00000        mov     r0, #0 
  18:   e12fff1e        bx      lr 
  1c:   e5931000        ldr     r1, [r3] 
  20:   e1510002        cmp     r1, r2 
  24:   1afffffa        bne     14 <test+0x14> 
  28:   e5930000        ldr     r0, [r3] 
  2c:   e0403001        sub     r3, r0, r1 
  30:   e2730000        rsbs    r0, r3, #0 
  34:   e0a00003        adc     r0, r0, r3 
  38:   e12fff1e        bx      lr 
  3c:   10002000        .word   0x10002000 
  40:   f00dbadd        .word   0xf00dbadd 

The first two instructions load the register address 0x10002000 into r3, and the constant 0xF00DBADD into r1. Note that these constants are fetched only once. A single glitch of either of these fetches from memory would later cause redundant reads of the wrong peripheral register, or redundant comparison against the wrong magic constant. Thus, while loading and comparison of the peripheral register is done three times, the benefits of the redundancy are diminished by the single glitch vulnerability of the constant loads.

While certain desired transformations to the code can be achieved with greater usage of volatile variables, the issue of there being a single point of failure when constants are loaded from the text section tends to remain. Furthermore, even if constant loading were made redundant through clever usage of inline assembly, there remains the issue that constants loaded into a register may get moved onto the stack at a later point by the compiler, and then read from the stack back into a general-purpose register in a manner that introduces yet another single point of failure. Explicit Register Variables may prevent the issue of data being put on and off the stack in a glitchable manner.

In the absence of a reliable way to load constants in a glitch-resistant manner, NCC Group recommends the insertion of random delays in code before security critical constants are loaded into registers.

In The Next Chapter…

At this point, it should be clear that seemingly well-written and secure code that is free of “classic” security defects (i.e., memory safety issues) can be easily glitched if it has not been intentionally and carefully designed to fail closed. Unfortunately, the majority of software has not been written with this degree of redundancy. This is especially problematic for low level firmware, such as a boot ROM or Trusted Computing Base (TCB), whose security posture underpins the entire platform, and where patches are infrequent, difficult or impossible.

In the next post, we will discuss a variety of alternative countermeasures.