Back

Paul Bottinelli

Cryptography

North American Research

Research

April 17, 2020

7 mins read

How cryptography is used to monitor the spread of COVID-19

On April 10, Apple and Google announced^{1, 2} that they were joining forces in an effort to help reduce the spread of COVID-19. Their solution leverages Bluetooth technology to trace interactions between individuals. This principle is known as contact tracing and public health agencies are heavily relying on it to monitor and prevent the spread of the virus. But the introduction of tracking or similar technologies can be detrimental to individual privacy. Fortunately, the solution proposed makes use of interesting cryptographic constructions to trace individuals without sacrificing privacy. This blog post will showcase what and how cryptographic primitives are used in this real-world scenario. Note that the description provided below is based on the initial proposal³, which might be subject to modifications in the future.

Key Idea

Contact tracing is the process of identifying and monitoring which members of a population have had contact with an infectious person. While listing all the people one has met in the past may seem like an impossible task, there are interesting ways to address it by leveraging the computing power of our smartphones. The key idea is that users’ phones broadcast identifiers, which are then recorded by nearby receiving devices. If a user tests positive for the virus, the list of past identifiers broadcasted by that user is disclosed to all participants in the system. Upon reception of that list, users will check whether they have seen one of these IDs in the past, meaning that they were in close range from an infected person.

The design of the contact tracing proposal incorporates interesting elements of privacy by design. For instance, the identifiers are pseudonymous and are rotated at regular intervals in order to prevent long-term tracking of devices (on average every 15 minutes in the current proposal). Furthermore, the system does not make use of any geolocation data. It exclusively broadcasts lists of privacy-preserving identifiers, which are then treated on the participants’ devices.

Presentation of the scheme

The contact tracing proposal allows users to opt in to the system, using their smartphones running an authorized application. Additionally, a Diagnosis Server, which collects and distributes the lists of IDs has to be set up. The server could be run by different organizations, such as public health agencies, etc.

Let us now describe how the scheme works in practice. Suppose that Alice has just opted in to the system. Her application computes a 32-byte Tracing Key (TK) as the output of a Cryptographically-Secure Pseudorandom Number Generator (CSPRNG):

TK = CSPRNG(32).

As opposed to a Pseudorandom Number Generator (PRNG), a CSPRNG comes with certain guarantees regarding the quality of the randomness it produces and its security in the event of a compromise, making it suitable for cryptographic purposes.

Then, for a given day d_i, Alice generates a Daily Tracing Key (DTK) based on her Tracing Key and d_i. This daily key is a 16-byte key computed with the Key Derivation Function HKDF⁴ (a construction based on HMAC, see below) using SHA-256 as the underlying hash function:

DTK = HKDF(TK, “CT-DTK” || d_i ).

A KDF allows users to derive cryptographic keys from an initial key. An established Key Derivation Function like HKDF comes with important security properties ensuring that the keys obtained are cryptographically strong, provided that the underlying hash function is secure. Note that the security of the KDF also inherently depends on the security of the initial key, which is why a Cryptographically-Secure PRNG must be used in the first place.

The value d_i, the Day Number, corresponds to a unique value assigned to the current day, counted from the Unix Epoch Time⁵, the number of seconds since January 1, 1970 UTC. For example, April 16, 2020 corresponds to the value 18368. The Day Number is therefore public and it is the same for all users.

Once Alice has generated a Daily Tracing Key, she can start computing the Rolling Proximity Identifiers (RPI), which are the privacy-preserving IDs we alluded to earlier. These RPIs are computed using HMAC⁶ (a Message Authentication Code with SHA-256 as the underlying hash function), and Alice’s current DTK as the key. The output is then truncated to 16 bytes, as follows:

RPI = Trunc(HMAC(DTK_i, “CT-RPI” || t_j ), 16).

Similar to the HKDF used above, HMAC also provides strong security guarantees based on the security of the underlying hash function. In this case, HMAC is used as a keyed hash function, and the important property is that possession of the DTK is necessary to compute the associated RBI.

The value t_j, the Time Interval Number, is an integer corresponding to the index of a 10-minute range in a given day, referred to as the Time Interval. Since a day is composed of 144 such 10-minute intervals, the value of t_j is an integer in the range [0, 143]. To prevent linkage of multiple Rolling Proximity Identifiers with a given Bluetooth MAC address, new RPIs are generated every time the Bluetooth address changes.

We now follow Alice going about her daily life (in a socially-responsible manner given the state of the pandemic) broadcasting Proximity Identifiers as Bluetooth Advertisements to nearby participants. Suppose for example that on a given day and at Time Interval t_j, another user participating in the system, Bob, is in close proximity to Alice. In this simple scenario, we consider Alice as the sender and Bob as the receiver (even though in real-life parties would be broadcasting and recording). Bob sees the RPI broadcasted by Alice (in green in the following diagram) and stores it on his device.

Alice broadcasts a Rolling Proximity Identifier which is recorded on Bob’s device.

Every time a nearby participant advertises a new Rolling Proximity Identifier, Bob adds it to his list of received RPIs. Let us now introduce Charlie, a third participant who, at a later time, also comes in close proximity to Bob. At that point, Alice is broadcasting a new RPI (in orange) reflecting the current Time Interval, and Charlie is also broadcasting his current RPI (in blue, different from Alice’s, since it is based on his own Daily Tracing Key).

Alice and Charlie both broadcast identifiers at a later time, which are added to Bob’s records.

Even in our simple scenario with two advertisers, Bob is not able to link Alice’s current (orange) RPI with the one she broadcasted earlier (in green). More specifically, Bob should not be able to distinguish whether the Proximity Identifier of a participant he collected previously was broadcasted by one of the current nearby participants. Note that in a real-world scenario, Bob would also be broadcasting RPIs, but we omit it here for clarity.

If a user tests positive, the Daily Tracing Keys for the days where the user could have been infectious are uploaded to a Diagnosis server. The set of DTKs uploaded by an infected patient to the Diagnosis Server is referred to as the Diagnosis Keys.

The Diagnosis Server aggregates these Diagnosis Keys from all users and makes them available to all individuals participating in the system, who can then download them at their convenience.

Alice is diagnosed positive. A set of her Diagnosis Keys are uploaded to the Diagnosis Server. The server then distributes these keys to all participants in the system. In a real-world scenario, the lists of Diagnosis Keys distributed by the Server would be aggregated from many infected users

To monitor whether he has been in contact with an infectious person, Bob derives the Rolling Proximity Identifiers associated with each of the Diagnosis Keys received from the server and checks whether the RPIs computed match any he has accumulated through Bluetooth scanning. Recall that given a Diagnosis Key and its associated day, it is straightforward to compute all its associated RPIs, one simply has to iterate over all 144 possible Time Interval Numbers (one for each 10-minute period) and compute the truncated HMAC keyed with the Diagnosis Key.

Each participant iterates through the list of Diagnosis Keys and computes the corresponding Rolling Proximity Identifiers. They then check whether they have seen that RPI in the past, in which case they know they might be infected

Privacy and Security Considerations

Although the deployment of such contact tracing systems raises privacy concerns, numerous considerations are detailed in the technical documentation. For instance, the fact that new Rolling Proximity Identifiers are derived whenever the Bluetooth MAC address changes, which prevents two different RPIs overlapping with one address. Additionally, different measures are in place to perform as much computation locally as possible. As examples, matches of RPIs are computed on the devices only or the fact that Daily Tracing Keys of non-infected users are never uploaded to the Diagnosis Server.

Perhaps more interestingly is how cryptography is used in addition to the privacy measures described above. There are two important security properties that underpin the privacy of individuals in the system. Firstly, the fact that it should be computationally infeasible, without knowledge of the Daily Tracing Key, to link the Rolling Proximity Identifiers together. Secondly, the fact that it should be computationally infeasible to obtain the Tracing Key of a user given a set of their Daily Tracing Keys, as may be disclosed if the user has tested positive. These aspects rely on the security of the underlying cryptographic primitives, namely HKDF and HMAC and on the security of the hash function they use. Informally, these properties are guaranteed by the collision resistance of the hash function. In other words, if the hash function is collision-resistant, HMAC behaves as a secure pseudorandom function and thus both properties are ensured.

Conclusion

Currently, an API is under development and expected to be released in May, which will enable apps from public health authorities to integrate contact tracing technology in the coming months. A subsequent goal announced by the two companies is to build contact tracing directly into the iOS and Android operating systems, enabling better interactions with a wider ecosystem of applications. While some of the constructions in the protocol might be subject to modifications in the future, the elegance of the scheme in terms of cryptographic primitives and their underlying privacy guarantees make it a very promising candidate for contact tracing applications.

Here are some related articles you may find interesting

Ghidra nanoMIPS ISA module

Introduction In late 2023 and early 2024, the NCC Group Hardware and Embedded Systems practice undertook an engagement to reverse engineer baseband firmware on several smartphones. This included MediaTek 5G baseband firmware based on the nanoMIPS architecture. While we were aware of some nanoMIPS modules for Ghidra having been developed…

Hardware & Embedded Systems

Reverse Engineering

Tool Release

May 7, 2024

6 mins read

Sifting through the spines: identifying (potential) Cactus ransomware victims

Authored by Willem Zeeman and Yun Zheng Hu This blog is part of a series written by various Dutch cyber security firms that have collaborated on the Cactus ransomware group, which exploits Qlik Sense servers for initial access. To view all of them please check the central blog by Dutch…

Digital Forensics and Incident Response (DFIR)

Fox-IT and European Research

Vulnerability Research

April 25, 2024

7 mins read

Public Report – Confidential Mode for Hyperdisk – DEK Protection Analysis

During the spring of 2024, Google engaged NCC Group to conduct a design review of Confidential Mode for Hyperdisk (CHD) architecture in order to analyze how the Data Encryption Key (DEK) that encrypts data-at-rest is protected. The project was 10 person days and the goal is to validate that the…

Public Reports

April 12, 2024

1 min read

View articles by category

Most recent posts

Call us before you need us.

Our experts will help you.

Get in touch