An Introduction to Fault Injection (Part 1/3)

Authors: Jeremy Boone, Sultan Qasim Khan

Though the techniques have existed for some time, in recent years, fault injection (FI) has emerged as an increasingly more common and accessible method of exploitation. Typically requiring physical access, an attacker can momentarily tamper with a processor’s electrical inputs (e.g., voltage or clock). By violating the safe ranges of these operating parameters, a fault can occur within the processor, which can result in skipped instructions or corrupt memory transactions. If hardware or firmware is not intentionally designed to compensate for fault injection attacks, an attacker may be able to leverage a fault to undermine critical platform security functionality, such as secure boot.

This blog post is the first in a series on the topic of fault injection, also known as glitching. This first post covers the basic principles of fault injection – types of glitches, their effects, and how an attacker can characterize hardware and firmware to achieve a successful glitch. In later posts we will describe how fault injection weaknesses typically manifest in software, and how they can be mitigated at both the software and hardware levels. 

Types of Glitches 

The main types of glitches used in fault injection attacks are voltage, clock, electromagnetic, and optical. Voltage glitching is performed by momentarily dropping supply voltages during the execution of specific operations. Clock glitching is performed by altering clock timing to violate setup and hold time requirements of the hardware, such as through the insertion of clock glitch pulses in between normally timed clock pulses. Electromagnetic fault injection (EMFI) is performed by generating a localized short-duration high-intensity electromagnetic pulse that induces currents within internal chip circuitry. And finally, optical fault injection uses a infrared laser, and usually requires chip decapsulation to expose the silicon die.

Many classes of fault injection attacks require hardware modification of the target device. For example, voltage glitching attacks usually require the removal of power filtering capacitors, which grants the attacker more fine-grained control over the processor’s voltage input.

All glitching attacks require the use of an apparatus whose purpose is to accurately inject the fault condition. Specialized tools such as NewAE’s ChipWhisperer and ChipShouter, or Riscure’s Spider, are commonly used to conduct fault injection attacks, though, an attacker may also build their own custom glitching harness. The cost of such tooling for FI attacks has continued to decline, and today, attacks are feasible with under $100 worth of equipment.

In some cases, it is possible to conduct fault injection attacks through software alone without the need for hardware modification. For example, software-controlled voltage regulators and clock sources are present in many System-on-Chips (SoCs) that have multiple power and clock domains. Examples of these types of attacks would include CLKSCREW and Plundervolt. Software-controlled voltage regulators and clock sources are typically used for Dynamic Voltage and Frequency Scaling (DVFS) to save power. However, interfaces that control these power management features must not be exposed to low privilege software and must enforce thresholds to prevent malicious software from inducing a fault.

Effects of Glitching 

The results of glitches cannot always be predicted, however, they do typically manifest in the form of skipped or repeated CPU instructions, incorrect evaluation of CPU instructions, or corrupt reads from memory devices. Glitching attacks can simultaneously impact all stages of a CPU pipeline, including instruction fetch, decode, execution, memory accesses, and writeback. Overall, the potential effects are listed below: 

Fault Type Potential Effects 
Skipping of instructions Typical targets for instruction skips are branch or compare instructions. By skipping them, the affected software can be coerced into “failing open” or executing unintended or insecure code paths. 

This type of fault can occur with or without repetition of the prior instruction.  In general, single instruction skips are easily achievable, though skips of multiple nearby instructions are more difficult to induce and control. 
Incorrect data fetch Leads to uncontrolled subset of bits in the read word being flipped, or the entire word being read as all zeros or ones. When security-critical data is corrupted in this way, the affected software may fall back to a less secure state. 

Bit-reset faults change read bits from one to zero, whereas bit-set faults change read bits from zero to one.

In general, single-bit flips are easier to achieve than multiple bit flips, though it is difficult to control which bit is flipped. It is difficult for a glitch to change a read word to a specific value other than all ones or zeros. 
Incorrect instruction fetch or decode Leads to evaluation of the wrong instruction by the ALU and later pipeline stages.

Certain instructions, typically those consisting of mostly ones or zeros in their undecoded or decoded form, may be easier to trigger through glitching than an arbitrary specific instruction consisting of a combination of ones and zeros. 

It is difficult for a glitch to change a fetched or decoded instruction to an arbitrary specific instruction, since it is difficult to control which bits are flipped by a glitch. 
Failed writeback This can cause the register or memory state to not be updated based on the executed instruction. 

Often times, fault injection attacks are described in terms of their spatial or temporal precision. That is, the ability for the glitch to target a specific element within the processor. Many times, varying the glitch parameters such as pulse timing, duration, and voltage can aid in controlling which aspects of the CPU pipeline are most affected, resulting in different glitch outcomes. 

With electromagnetic and laser-based optical fault injection, a high degree of spatial and temporal precision is achieved by varying the location on the chip at which an electromagnetic or laser pulse is applied (e.g., the X-Y coordinates in the silicon package). Such localized attacks can more effectively target specific pipeline stages than global approaches such as voltage and clock glitching, providing greater flexibility. 

On systems with multiple power and clock domains, it is often possible to perform targeted fault injection against specific peripherals or CPU cores. For instance, glitches may target power domains associated with SRAM, DRAM, EEPROM, or eFuses to corrupt specific fetches or prevent specific writes without otherwise influencing other CPU operations. Glitches could also target security critical cores such as cryptography accelerators. 

Timing Constraints 

All forms of fault injection require precise timing to target specific operations of interest without having an excessive amount of undesirable side effects, such as causing the entire chip to reset. This careful timing is normally achieved by applying the glitch at a preset delay following a trigger point. For example, FI attacks against boot ROMs are most commonly performed at a preset delay following the target chip coming out of reset, since execution timing tends to be more-or-less deterministic within the processor’s early boot code.

However, when there is variability in execution timing, either due to dependence on external peripherals with inconsistent timing (such as storage devices), or due to random delays intentionally introduced to thwart attackers, a trigger other than reset is required for accurate glitch timing. Common alternate glitch triggers are based on observing patterns in areas such as external communication buses, GPIO pins, logs, current consumption, and electromagnetic radiation emissions.

These two types of triggers can be illustrated by some public research. For example, a fixed delay trigger was used by Chris Gerlinsky in a voltage glitch attack against the NXP LPC family of microcontrollers, and an eMMC based trigger was used by NCC Group in our prior research publication titled There’s a Hole in Your SoC: Glitching the MediaTek Bootrom

Characterization 

When developing a reliable fault injection exploit, an attacker must first characterize the effects of voltage and clock variation on the processor. This can be illustrated by the contrived shmoo plot below.

Essentially, the attacker aims to identify and map the yellow boundary region. This region resides just outside the normal operating range (green area) for the processor, but not so far outside the accepted operating parameters that the processor is caused to reset (red area), or otherwise fail into a safe state. In this unsafe range, the processor may be subject to partial faults wherein the chip misbehaves (e.g., skipped instructions), but otherwise continues operating. The process of determining this region is often referred to as “characterization”.

Next, the attacker must find a suitable vulnerable target operation within the firmware. That is, they must identify a specific security-critical event that they wish to bypass via an accurately timed fault. The first step usually requires reverse engineering of the target chip’s firmware to understand which security operations are performed. A typical candidate target is signature validation of the next execution stage.

Once the target operation has been identified, the attacker needs to determine when this sensitive action is actually executed by the chip. Overall, this can be a time-consuming iterative process, and although some science can be applied, it usually requires a hefty dose of brute force. That is, the accurate timing of the target operation might be revealed via something elegant like power side-channel analysis, or it could be revealed by repeatedly resetting the target while also varying the glitch pulse timing and duration, while also observing the effects on the target firmware. In practice, this process of characterization can take hours, days or even weeks.

Typical Glitching Objectives 

The goal of a fault injection attack is to undermine some security-critical responsibility within low level platform firmware. Most times, secure boot functionality is targeted by fault injection, as the goal is to enable an attacker to execute arbitrary code. Though, there are several other common targets for fault injection. The usefulness of these targets tends to vary with the product threat model: 

  • Bypassing cryptographic signature validation of firmware binaries. This is the most common form of fault injection attack and has been demonstrated many times over, such as in NCC Group’s earlier research titled There’s a Hole In Your SoC.
  • Circumventing memory readout protection to expose sensitive data that resides within a microcontroller’s internal flash memory. An example of an attack that bypasses read protection is the chip.fail exploit from 2019.
  • Re-enabling hardware debug functionality on production/fused processors. An example of this would be the nRF52 debug ressurection attack by LimitedResults.
  • Bypassing input validation of data that crosses trust boundaries between privilege levels. One such example would be the Qualcomm TEE memory range validation bypass performed by Raelize.

Stay Tuned… 

In the next installment of this blog series, we will build upon this brief introduction to glitching, by explaining how fault injection attacks can be mitigated using software-based countermeasures.

Editors Note: Originally, this post mistakenly described optical fault injection as using UV a laser. This has since been corrected to infrared.