On April 10, Apple and Google announced1, 2 that they were joining forces in an effort to help reduce the spread of COVID-19. Their solution leverages Bluetooth technology to trace interactions between individuals. This principle is known as contact tracing and public health agencies are heavily relying on it to monitor and prevent the spread of the virus. But the introduction of tracking or similar technologies can be detrimental to individual privacy. Fortunately, the solution proposed makes use of interesting cryptographic constructions to trace individuals without sacrificing privacy. This blog post will showcase what and how cryptographic primitives are used in this real-world scenario. Note that the description provided below is based on the initial proposal3, which might be subject to modifications in the future.
Contact tracing is the process of identifying and monitoring which members of a population have had contact with an infectious person. While listing all the people one has met in the past may seem like an impossible task, there are interesting ways to address it by leveraging the computing power of our smartphones. The key idea is that users’ phones broadcast identifiers, which are then recorded by nearby receiving devices. If a user tests positive for the virus, the list of past identifiers broadcasted by that user is disclosed to all participants in the system. Upon reception of that list, users will check whether they have seen one of these IDs in the past, meaning that they were in close range from an infected person.
The design of the contact tracing proposal incorporates interesting elements of privacy by design. For instance, the identifiers are pseudonymous and are rotated at regular intervals in order to prevent long-term tracking of devices (on average every 15 minutes in the current proposal). Furthermore, the system does not make use of any geolocation data. It exclusively broadcasts lists of privacy-preserving identifiers, which are then treated on the participants’ devices.
Presentation of the scheme
The contact tracing proposal allows users to opt in to the system, using their smartphones running an authorized application. Additionally, a Diagnosis Server, which collects and distributes the lists of IDs has to be set up. The server could be run by different organizations, such as public health agencies, etc.
Let us now describe how the scheme works in practice. Suppose that Alice has just opted in to the system. Her application computes a 32-byte Tracing Key (TK) as the output of a Cryptographically-Secure Pseudorandom Number Generator (CSPRNG):
TK = CSPRNG(32).
As opposed to a Pseudorandom Number Generator (PRNG), a CSPRNG comes with certain guarantees regarding the quality of the randomness it produces and its security in the event of a compromise, making it suitable for cryptographic purposes.
Then, for a given day di, Alice generates a Daily Tracing Key (DTK) based on her Tracing Key and di. This daily key is a 16-byte key computed with the Key Derivation Function HKDF4 (a construction based on HMAC, see below) using SHA-256 as the underlying hash function:
DTK = HKDF(TK, “CT-DTK” || di ).
A KDF allows users to derive cryptographic keys from an initial key. An established Key Derivation Function like HKDF comes with important security properties ensuring that the keys obtained are cryptographically strong, provided that the underlying hash function is secure. Note that the security of the KDF also inherently depends on the security of the initial key, which is why a Cryptographically-Secure PRNG must be used in the first place.
The value di, the Day Number, corresponds to a unique value assigned to the current day, counted from the Unix Epoch Time5, the number of seconds since January 1, 1970 UTC. For example, April 16, 2020 corresponds to the value 18368. The Day Number is therefore public and it is the same for all users.
Once Alice has generated a Daily Tracing Key, she can start computing the Rolling Proximity Identifiers (RPI), which are the privacy-preserving IDs we alluded to earlier. These RPIs are computed using HMAC6 (a Message Authentication Code with SHA-256 as the underlying hash function), and Alice’s current DTK as the key. The output is then truncated to 16 bytes, as follows:
RPI = Trunc(HMAC(DTKi, “CT-RPI” || tj ), 16).
Similar to the HKDF used above, HMAC also provides strong security guarantees based on the security of the underlying hash function. In this case, HMAC is used as a keyed hash function, and the important property is that possession of the DTK is necessary to compute the associated RBI.
The value tj, the Time Interval Number, is an integer corresponding to the index of a 10-minute range in a given day, referred to as the Time Interval. Since a day is composed of 144 such 10-minute intervals, the value of tj is an integer in the range [0, 143]. To prevent linkage of multiple Rolling Proximity Identifiers with a given Bluetooth MAC address, new RPIs are generated every time the Bluetooth address changes.
We now follow Alice going about her daily life (in a socially-responsible manner given the state of the pandemic) broadcasting Proximity Identifiers as Bluetooth Advertisements to nearby participants. Suppose for example that on a given day and at Time Interval tj, another user participating in the system, Bob, is in close proximity to Alice. In this simple scenario, we consider Alice as the sender and Bob as the receiver (even though in real-life parties would be broadcasting and recording). Bob sees the RPI broadcasted by Alice (in green in the following diagram) and stores it on his device.
Every time a nearby participant advertises a new Rolling Proximity Identifier, Bob adds it to his list of received RPIs. Let us now introduce Charlie, a third participant who, at a later time, also comes in close proximity to Bob. At that point, Alice is broadcasting a new RPI (in orange) reflecting the current Time Interval, and Charlie is also broadcasting his current RPI (in blue, different from Alice’s, since it is based on his own Daily Tracing Key).
Even in our simple scenario with two advertisers, Bob is not able to link Alice’s current (orange) RPI with the one she broadcasted earlier (in green). More specifically, Bob should not be able to distinguish whether the Proximity Identifier of a participant he collected previously was broadcasted by one of the current nearby participants. Note that in a real-world scenario, Bob would also be broadcasting RPIs, but we omit it here for clarity.
If a user tests positive, the Daily Tracing Keys for the days where the user could have been infectious are uploaded to a Diagnosis server. The set of DTKs uploaded by an infected patient to the Diagnosis Server is referred to as the Diagnosis Keys.
The Diagnosis Server aggregates these Diagnosis Keys from all users and makes them available to all individuals participating in the system, who can then download them at their convenience.
To monitor whether he has been in contact with an infectious person, Bob derives the Rolling Proximity Identifiers associated with each of the Diagnosis Keys received from the server and checks whether the RPIs computed match any he has accumulated through Bluetooth scanning. Recall that given a Diagnosis Key and its associated day, it is straightforward to compute all its associated RPIs, one simply has to iterate over all 144 possible Time Interval Numbers (one for each 10-minute period) and compute the truncated HMAC keyed with the Diagnosis Key.
Privacy and Security Considerations
Although the deployment of such contact tracing systems raises privacy concerns, numerous considerations are detailed in the technical documentation. For instance, the fact that new Rolling Proximity Identifiers are derived whenever the Bluetooth MAC address changes, which prevents two different RPIs overlapping with one address. Additionally, different measures are in place to perform as much computation locally as possible. As examples, matches of RPIs are computed on the devices only or the fact that Daily Tracing Keys of non-infected users are never uploaded to the Diagnosis Server.
Perhaps more interestingly is how cryptography is used in addition to the privacy measures described above. There are two important security properties that underpin the privacy of individuals in the system. Firstly, the fact that it should be computationally infeasible, without knowledge of the Daily Tracing Key, to link the Rolling Proximity Identifiers together. Secondly, the fact that it should be computationally infeasible to obtain the Tracing Key of a user given a set of their Daily Tracing Keys, as may be disclosed if the user has tested positive. These aspects rely on the security of the underlying cryptographic primitives, namely HKDF and HMAC and on the security of the hash function they use. Informally, these properties are guaranteed by the collision resistance of the hash function. In other words, if the hash function is collision-resistant, HMAC behaves as a secure pseudorandom function and thus both properties are ensured.
Currently, an API is under development and expected to be released in May, which will enable apps from public health authorities to integrate contact tracing technology in the coming months. A subsequent goal announced by the two companies is to build contact tracing directly into the iOS and Android operating systems, enabling better interactions with a wider ecosystem of applications. While some of the constructions in the protocol might be subject to modifications in the future, the elegance of the scheme in terms of cryptographic primitives and their underlying privacy guarantees make it a very promising candidate for contact tracing applications.
Let us also mention two efforts led by academic institutions on the topic of contact tracing, that will certainly impact future deployment of contact tracing schemes. A European-wide collaboration is proposing a secure and decentralized privacy-preserving proximity tracing system, called DP-3T7. Another interesting collaboration effort is the Private Automated Contact Tracing8 (PACT), led by MIT researchers.