Hardware Security By Design: ESP32 Guidance
Within the Hardware and Embedded Systems practice at NCC Group, some engagements with clients are early in the design phases of a product. In other cases, however our first interaction occurs late in the development cycle, once a product has been designed, implemented, and functionally tested. While assessments performed at this time can help identify vulnerabilities and reasonable mitigations, certain security-impacting design decisions have already been set in stone and are not easily or cheaply changed. This is especially true as it relates to component selection, hardware design, and configuration. Security by design sets the foundation for these crucial controls to be appropriately implemented.
More often lately, System on a chip (SoC) vendors make hardware security features available that can and should underpin the security of products built on their platforms. In this blog post, NCC Group considers some of these features and the corresponding product design decisions for the current generation of Espressif’s ESP32 microcontrollers. The ESP32 is a platform commonly used in the broad span of IoT products that NCC Group assesses regularly and one that has continued to develop its hardware and SDK security features over multiple product generations.
Espressif provides ample documentation about these features, but how they translate to product security best practices is worth further discussion. Much of this discussion focuses on specific configuration details of the ESP32 family of microcontrollers and the recommended best practices associated with those details, but many broad recommendations included here apply to secure hardware designs in general.
ESP32 Best Practices
The importance of any given best practice depends on a variety of factors, some of which can be quite specific to the threat model of a product. With that caveat, since this is a generalized discussion, a general ordering of importance is applied here.
Secure boot is an important security feature that prevents an attacker from tampering with firmware to execute arbitrary code. By enabling secure boot, the firmware image integrity and authenticity is verified against customer keys that were programmed during manufacturing. The nuanced details of how this integrity is validated matter.
The ESP32 supports two versions of secure boot.
Version 2 (V2) relies on RSA-PSS to verify the bootloader and application image at boot time before execution.
There is also support for App Signing Verification, the install-time verification of applications received over-the-air. Espressif acknowledges that there may be cases where this is warranted, but NCC Group strongly discourages the use of this feature as it precludes the use of boot-time verification.
Secure boot is very simply enabled by programming a properly configured bootloader. The idf.py menuconfig tool allows for this configuration and flashing. Once a properly configured bootloader, formatted partition table and signed application are flashed in this manner, secure boot will automatically be enabled.
But “simply” is not necessarily an apt word choice when discussing secure boot configuration. Enabling secure boot on any SoC involves a well-established process to burn eFuses, provision keys and load an initial signed image at manufacturing time. The image signing process too must ensure the cryptographic hygiene of the signing keys and the provenance of the images being signed. The management of the firmware signing key (that is, its generation, storage, usage, and rotation) are all important considerations that NCC Group has previously discussed at length.
Some SoC vendors, Espressif included, can enable secure boot prior to shipment to the device manufacturing facility, mitigating some threats to this process. This requires the public signing key and a signed initial image to be provided to Espressif, and confirmation at manufacturing time that the process was performed as expected. Disabling UART download mode may be postponed to later in manufacturing to mitigate the risk of any over-the-air update failure, but this too should be explicitly verified. Product manufacturing processes depend on several factors related to security, cost, and logistics, among other considerations. Contract manufacturing facilities or an OEM’s own facilities may be able to provide similar capabilities that provide more flexibility, but critically, this step should be performed in a trusted environment.
What may be the most obvious configuration step is also one of the most critical. Too often, features that facilitate development and debugging make their way into production units, which can then later be used by attackers.
Hardware debug capabilities should be disabled in production devices to prevent a physical attacker from attaching a debugger, which may allow them to read/write SRAM and flash memory, exposing secrets or undermining secure boot.
On the ESP32, because JTAG debugging is implicitly disabled by the bootloader when secure boot or flash encryption are enabled, enabling these features as recommended does not require further action configuration to meet this recommendation.
However, it is worth noting that the CONFIG_SECURE_BOOT_ALLOW_JTAG configuration will circumvent this behavior, and so it is important to ensure that it is not changed from its default, cleared state in the project configuration. Similarly, the recovery ROM debug console provides an extensive set of debug capabilities that absolutely should be disabled by setting CONSOLE_DEBUG_DISABLE.
Many microcontrollers provide some amount of internal flash, distinct from an external SPI flash chip that may also be incorporated into a product design for additional persistent storage. Internal flash cannot be accessed via board traces or pads and so often represents a more suitable location for the storage of sensitive user data or code. Any such data stored in external flash must have suitable cryptographic controls to meet its associated confidentiality and integrity requirements.
All user flash used by the ESP32 is off-chip, providing a SPI interface that sophisticated attackers may access to read or modify flash contents. Furthermore, most application code is executed-in-place on flash.
The flash encryption feature, if enabled and configured appropriately, provides a measure of security for a variety of assets (including code) that are stored on the ESP32 flash, mitigating potential attacks that aim to expose the contents of memory.
The ESP32 supports flash encryption for its off-chip SPI flash component, providing confidentiality to any sensitive assets stored therein. The nuances and limitations of this feature, many of which are documented by Espressif, are important to note.
Importantly, the flash encryption provided by the ESP32 is based on AES ECB mode with a “tweaked” key per block like AES XTS, a common primitive used in disk encryption. Although the scheme provides confidentiality, it offers no guarantees of data integrity. Modification of the ciphertext stored in flash will decrypt successfully in the absence of any other validation of the data.
Depending on what is stored in flash and how that data is handled by firmware, this may allow an attacker to impact the behavior of the device, for example by exploiting a memory safety vulnerability or causing the device to fall back into a re-initialization state.
Similarly, any code read or executed from flash after secure boot has run would be subject to similar tampering if the SPI flash interface is physically accessible. It is admittedly difficult to modify a ciphertext such that it decrypts to a binary capable of successfully running without extensive effort. In association with data integrity, an attacker with access to SPI flash may be able to rewrite a block with a known previous ciphertext to impact device behavior. Replay attacks of this nature may effectively allow a downgrade of data stored in flash (a whitelist or trusted certificate authority for example) to a previous, vulnerable version.
Block Size Disparity
The limitations of AES ECB are present in this encryption scheme despite the varying tweak described further below. Because AES uses 16-byte blocks, and the tweak is incremented for every 32 bytes, adjacent block-pairs are encrypted with the same key using AES ECB. If the content of these pairs is identical, the ciphertext too will be identical, which may reveal meaningful information to an attacker. An attacker with knowledge of the device’s flash layout may be able to determine the number of empty flash blocks based on the number of identical block-pairs, which may allow inference of sensitive information pertaining to the device or its users.
Fault Injection Attacks
In both 2019 and 2020, methods were disclosed to bypass flash encryption. With physical access, an attacker may be able to bypass the read protection on the flash encryption key, allowing them to decrypt flash contents. While these may be mitigated in newer hardware, the most recent public recommendation includes storing sensitive data in flash no longer than is necessary.
Additionally, to mitigate any potential attacks against assets stored in encrypted flash, the ESP32 allows unique encryption keys to be generated and provisioned to each device by default. This detail should be highlighted since unique and unexposed flash encryption keys are often an effective control, limiting the impact of a single compromised key and mitigating fleet-wide attacks. The ESP32 also supports a host-generated key, but the documentation astutely notes that this is not recommended for production.
The algorithm to encrypt flash blocks involves the XORing of the block index with certain bits of the root key. NCC Group notes that this is distinct from a typical tweaked cipher such as AES XTS, though even established ciphers have known weaknesses in disk encryption. These distinctions and weaknesses do not necessarily undermine the basic goal of this flash encryption scheme to protect the confidentiality of flash-persisted data, but, may provide further support for caution when using flash encryption as a broad security control.
Finally, NCC Group noted that the bits used for the tweak, that is the bits that are XORed with the root key, are configurable via the FLASH_CRYPT_CONFIG eFuse. If all bits of this eFuse are cleared, there will be no tweak of the root key over the entire flash space, effectively reducing the encryption scheme to AES ECB. An attacker with access to flash may, in this case, exploit the shortcomings of AES ECB to determine which blocks are alike, determine the contents of flash blocks with a chosen plaintext attack, or perform replay attacks more easily. NCC Group notes that while this is an optional eFuse configuration that should be avoided, the bootloader explicitly disallows it without explicit intervention, shown here.
/* CRYPT_CONFIG determines which bits of the AES block key are XORed with bits from the flash address, to provide the key tweak. CRYPT_CONFIG == 0 is effectively AES ECB mode (NOT SUPPORTED) For now this is hardcoded to XOR all 256 bits of the key. If you need to override it, you can pre-burn this efuse to the desired value and then write-protect it, in which case this operation does nothing. Please note this is not recommended! */ ESP_LOGI(TAG, "Setting CRYPT_CONFIG efuse to 0xF"); new_wdata5 |= EFUSE_FLASH_CRYPT_CONFIG_M;
Flash Encryption Recommendations
Per Espressif’s guidance, configure the bootloader to generate the flash encryption key on the part, unique per device.
Consider the sensitivity of assets stored within flash. While the effort to compromise this data may be more difficult with flash encryption, two considerations are of note:
• If authenticity of data stored in flash is required, and this data is not part of signed and version-controlled firmware, additional measures to protect against replay and ciphertext tampering must be implemented. This is not likely a requirement of a stored Wi-Fi password but may be for a SSID for example.
• If the data is of a sufficiently high value, such as a secret key common across all devices, the above-described attacks or similarly complex variants may be employed to extract it. This aligns with reasonable security best practices and Espressif’s specifically recommended mitigations to the described fault injection vulnerabilities that all secrets stored in device flash be unique per-device.
Lastly, use an NVS partition to store sensitive data. This provides some mitigation of the shortcoming associated with adjacent blocks being encrypted with the same key. In cases where a sensitive portion of flash requires frequent rewrites, designate a writable NVS partition dedicated to this purpose.
Boot Modes and UART capability
Without the appropriate eFuse configuration, the UART provides a mechanism to both update ESP32 firmware and to exercise tooling that can read sensitive device information.
The ESP32 provides the option to specify the source from which firmware will be loaded via the logic level of some GPIOs.
In ESP32 ECO V3 parts, a new eFuse UART_DOWNLOAD_DIS was added to disallow the DOWNLOAD_BOOT mode. This mode effectively provides flash access over UART. While secure boot should disallow any untrusted firmware from being loaded, restricting this access further limits any potential vulnerability. More importantly, this restricts access to eFuse that would normally be available by espefuse.py and accompanying UART commands that it uses. This provides a specific capability in cases where eFuse BLK3, available for user applications, is used to store an immutable device secret. Without this mechanism, the typical eFuse readout protection would be the only other chip-level method to prevent UART access to such a secret, but this would render it unreadable to software as well.
The UART_DOWNLOAD_DIS configuration should be set on production devices, shown here.
gt; espefuse.py dump -p /dev/cu.SLAB_USBtoUART Connecting...esp32r0_delay... False .. Detecting chip type... ESP32 BLOCK0 ( ) [0 ] read_regs: 00000000 2aec0d28 00cca803 000 0a200 00001535 00100000 00000004 BLOCK1 (flash_encryption) [1 ] read_regs: 00000000 00000000 00000000 000 00000 00000000 00000000 00000000 00000000 BLOCK2 (secure_boot_v1 s) [2 ] read_regs: 00000000 00000000 00000000 000 00000 00000000 00000000 00000000 00000000 BLOCK3 ( ) [3 ] read_regs: 00000000 00000000 00000000 000 00000 00000000 00000000 00000000 00000000 espefuse.py v3.0 gt; espefuse.py burn_efuse UART_DOWNLOAD_DIS 1 -p /dev/cu.SLAB_USBtoUART Connecting...esp32r0_delay... False .. Detecting chip type... ESP32 espefuse.py v3.0 The efuses to burn: from BLOCK0 - UART_DOWNLOAD_DIS Burning efuses: - 'UART_DOWNLOAD_DIS' (Disable UART download mode (ESP32 rev3 only)) 0b0 - 0b1 Check all blocks for burn... idx, BLOCK_NAME, Conclusion  BLOCK0 is not empty (written ): 0x0000000400100000000015350000a20000cca8032aec0d2800000000 (to write): 0x00000000000000000000000000000000000000000000000008000000 (coding scheme = NONE) . This is an irreversible operation! Type 'BURN' (all capitals) to continue. BURN BURN BLOCK0 - OK (all write block bits are set) Reading updated efuses... Checking efuses... Successful >; espefuse.py dump -p /dev/cu.SLAB_USBtoUART Connecting...esp32r0_delay... False .....esp32r0_delay... True _____esp32r0_delay... False .....esp32r0_delay... True _____esp32r0_delay... False .....esp32r0_delay... True !@! .... forever: Download mode is locked down and the chip is no longer accessible
Anti-rollback is an important feature that prevents an attacker from “updating” a device with an older firmware image version. An attacker may wish to downgrade firmware if the older version happens to contain known vulnerabilities that can be easily exploited. Such scenarios are advantageous to an adversary that wishes to compromise the device to expose the sensitive contents of memory or to achieve code execution.
As part of the over-the-air update process the ESP32 supports a mechanism for rollback prevention, in which a secure_version field within an application is compared against that stored in eFuse. Because of the nature of the eFuse value used, secure_version is limited to 32 values. It should be noted that if firmware updates are distributed over an authenticated channel, the threat of downgrading to a vulnerable version, mitigated by rollback prevention, requires the existence of a vulnerability in this OTA functionality or physical access to the device flash. This threat is therefore generally considered to be minimal risk. In cases where signed firmware is distributed over an insecure channel or from a user application that may be more easily exploited, the threat of this downgrade is more likely.
Enabling rollback prevention per Espressif’s documentation involves configuring the bootloader with the CONFIG_BOOTLOADER_APP_ANTI_ROLLBACK option and calling esp_ota_mark_app_valid_cancel_rollback() at some reasonable time after boot to establish the currently running firmware as stable, preventing rollback thereafter without risking a denial of service in the event of an unstable firmware update.
As part of a software maintenance process, some criteria to increase the secure_version of application updates that aligns with the release schedule and product lifetime of these updates should be established. For example, assuming a quarterly release cycle and a product lifetime of 10 years, this may involve increasing this value on alternating releases, allowing some flexibility to increase it in hot-fix scenarios where a known vulnerability must be patched and disallowed from running.
If a 32-value range is determined to be insufficient, a secondary, weaker form of rollback prevention may be implemented at install time, in which a signed version header or timestamp is first validated against that of the currently installed application, disallowing downgrade through this flow. In the event of a critical vulnerability to the application or this check in particular, the bootloader-based rollback prevention may instead be used by incrementing the secure_version field. The secure_version field should be incremented at some reasonable minimum cadence regardless of the details of this design.
Secure Factory Reset
Devices may store sensitive data long after they have been decommissioned, allowing vulnerabilities discovered later to be exploited if obtained.
While regulations and definitions pertaining to the handling of personal information exist, the sensitivity of stored data can vary significantly by a user’s treatment of this data is in many ways outside of the scope and control of the OEM, as is the potential impact in the event of its exfiltration. Similar can be said of any other user data that may be potentially sensitive. A reset mechanism that erases these sensitive assets provides an option to the customer to incorporate it into any processes where their sensitive data stored on these devices may be unnecessarily exposed.
In the case of the ESP32, the flash erase API is straightforward and does not distinguish between secure erasure of blocks and deletion or invalidation of individual portions, but regardless of the implementation, explicitly overwriting stored data with zeroes is prudent. If, however, NVS is used, past entries are stored even if updated with more recent values, and so the entire NVS partition should be erased.
As part of product design, it is worthwhile to enumerate all potentially sensitive user data stored in device flash. This may include device logs, state information, mobile device pairing information, and authentication keys or tokens. Implement a mechanism to allow the customer to overwrite this data as part of a decommissioning process. While allowing this to be done remotely may be convenient, this should ideally be possible in scenarios where the device is in some unrecoverable state, thus encouraging the implementation of factory reset capability via a physical button.
Finally, it is important to test this functionality by dumping flash using ESP tools after performing this erasure, ensuring that even unused partitions (for example as part of a multi-partition firmware update “A/B” scheme) are similarly erased.
Finally, one fairly core requirement of IoT devices is the initial establishment of Wi-Fi credentials. These credentials should be secured at rest once on the device, as discussed in the Flash Encryption section above. The secure transmission of the credentials is another important consideration. In the absence of a complete user interface on the device, some out-of-band key exchange between an initializing user application and the device is necessary to accomplish this.
Espressif provides a couple options for this out-of-the-box, using either SoftAP mode that strictly relies on Wi-Fi, or BLE.
With respect to SoftAP mode, it is important to set the wifi_prov_security argument to WIFI_PROV_SECURITY_1, as shown in Espressif’s sample implementation.
In addition to protecting the Wi-Fi credentials themselves, it is further important to consider the circumstances that the device can enter the provisioning mode to connect to another network. The threats associated with this consideration will vary, but often, physical access to the device to factory reset it or otherwise force a reprovisioning state is a reasonable method. Still, this does not necessarily address physical threats in a multi-tenant environment such as an office building, hotel, or outdoors, and so additional consideration is warranted to ensure that the device is connected to a trusted network even when that network is unavailable or must be changed.
The above discussion covers some of the main considerations that should be established early in the design of any ESP32-based product. Provisioning a root of trust, determining how and where to store secrets, and configuring eFuses are all critical to product security and not easily addressed late in the release cycle or in-field.
The focus of this discussion is specific to the ESP32, so more general topics like trusted device identity and secure communication with other endpoints are not covered. They are however similarly critical in embedded device security and warrant similar prioritized consideration. Furthermore, many of the topics discussed above apply to the general best practices of embedded device security regardless of SoC choice. The details however will vary in their design, level of support, and public transparency. Few of these topics are straightforward, and it has taken generations of iteration for such features to be made available and hardened by SoC vendors. In cases where the SoC vendor does not provide guidance on how to address a particular security requirement, it is often left to the OEM to implement in firmware, which can present a greater likelihood of vulnerability due to the comparative lack of oversight and scrutiny that product-specific designs and implementations are given.