# Reviewing Verifiable Random Functions

While Verifiable Random Functions (VRFs) were first described just over twenty years ago [1], they have recently seen a strong resurgence in popularity due to their usefulness in blockchain applications [2]. This blog post will introduce VRFs in the context of other well-known cryptographic primitives, describe three example use cases, and then highlight over two dozen topics to consider during an implementation review.

## VRFs Defined in Context

The name "Verifiable Random Function" may not align well with many folks' natural intuition, so let's set that aside for the moment. Grab a blank sheet of paper and we'll take three gentle steps towards a fresh definition.

First, an ideal cryptographic hash function [3] deterministically maps an arbitrary-sized message string to a fixed-size digest that is impossible to direct/predict without brute-force, essentially indistinguishable from a random oracle [4], and infeasible to run backwards (i.e. invert). The hash function makes it incredibly hard to find two input messages mapping to the same digest. The resulting digest is commonly associated with a message (over a safe channel) to support integrity checks.

Second, a keyed hash function [5] is a simple extension where a symmetric key is included alongside the message string in the deterministic digest calculation. The result is called a Message Authentication Code (MAC, HMAC, or often a 'tag'), and is also commonly associated with a message to support both integrity and authenticity checks. Note that these two assurances are dependent upon a symmetric key that is somehow generated and securely shared between the communicating parties.

Third, consider a further keyed hash function extension involving asymmetric public and private/secret keys rather than a shared symmetric key. The input message and secret key are used to generate an intermediate proof, which has similarities to a digital signature [6] (e.g. a signed digest). The proof is simply hashed to calculate the final digest result which is indistinguishable from a random value, unique to the original input message and will not collide with other input messages. Typically, only this final digest is initially revealed. Later, the proof and digest can together be independently verified with the corresponding public key. Now, integrity and authenticity assurances can be supported without having to share a symmetric key.

More precisely, a VRF involves three primary functions and four data objects. Alice runs a `Prove()` function that takes an input message (`alpha_string`) and her secret key (`sk`), and returns an intermediate proof (`pi_string`). She next runs a `Proof_to_hash()` function that takes the proof (`pi_string`) and returns the final digest (`beta_string`) to share with Bob. Later, Bob can run a `Verify()` function that takes Alice's public key (`y`), the input message (`alpha_string`) and proof (`pi_string`) and returns a locally calculated digest (`beta_string'`) for comparison with the original received digest (`beta_string`). From this point forward, the four data objects will be referred to as simply `alpha_string`, `sk`, `proof_string`, `beta_string` and `y`. There are RSA and elliptic curve VRF variants (and a few other lesser-known approaches), but this post is constrained to an elliptic curve VRF.

Regarding naming intuition, consider a Verifiable Random Function to be a function capable of producing deterministic and unique 'randomness' that can later be independently verified. Note that typical digital signature algorithms may have aspects of non-determinism, malleability and insufficient randomness that can prevent their direct applicability for our purposes. Where these aspects are properly addressed, a hashed signature is a very comparable approach.

One example VRF specification can be found in a draft hosted by the Internet Research Task Force (IRTF) Crypto Forum Research Group (CFRG) at https://datatracker.ietf.org/doc/draft-irtf-cfrg-vrf/. Ultimately, this work-in-progress draft may become a finalized IETF RFC. The three primary functions described above have their elliptic curve variants outlined below along with more precise success/failure indicators. A fourth and optional function that validates a public key is also shown. The VRF specification draft delegates (elliptic curve) key generation to specifics in Section 5.1.5 of RFC 8032.

```# Section 5.1. ECVRF Proving
ecvrf_prove(sk, alpha_string) -> pi_string

# Section 5.2. ECVRF Proof To Hash
ecvrf_proof_to_hash(pi_string) -> beta_string | "INVALID"

# Section 5.3. ECVRF Verifying
ecvrf_verify(y, pi_string, alpha_string) -> ("VALID", beta_string) | "INVALID"

# Section 5.6.1. ECVRF Validate Key (optional)
ecvrf_validate_key(pk_string) -> "INVALID" | y```

## VRF Use Cases

#### A. Blind Auctions

Loosely speaking, VRFs are essentially an asymmetric keyed hash function. This can be quite useful in precommitment systems with low-entropy inputs that must be resistant to brute-force pre-image attacks. For example, imagine we would like to place the bid shown below into a blind (sealed envelope) auction [7] without initially revealing anything about the bid contents or our identity. However, later we may want to reveal the bid contents and claim our item, which will require some notion of integrity and authenticity and thus involve keys. The starting point is this `alpha_string` which contains our bid:

`      {"Horse":"IntegrityChain", "Amount":"50", "Currency":"USD"}`

While simply submitting a basic hash digest of the above `alpha_string` does hide its contents, the digest itself provides no identity linkage and is subject to brute-force pre-image attacks. Instead, we use `sk` and `alpha_string` to first calculate our initially private `pi_string` and save it for future use. Next, we use that `pi_string` to calculate and submit the `beta_string` as our sealed auction bid. After the auction finally closes, presenting our winning bid consists of revealing the `alpha_string`, `pi_string` and `y`. The auctioneer can now verify the validity and the correspondence of a locally calculated `beta_string'` to the `beta_string` we submitted earlier.

#### B. DNS Denial of Existence

Basic domain name system (DNS) lookup messages map human-readable names to IP addresses without any guarantee of authenticity. To address this, DNS Security Extensions (DNSSEC) introduce the NSEC and NSEC3 records which include cryptographic signatures to support authenticity. While authenticated responses prevent man-in-the-middle denial-of-existence spoofing, the newly defined records simplify zone enumeration which may leak sensitive information. This is due in part to the original objective of performing all cryptographic signature operations in advance offline.

RFC 7129 [8] describes a query for a non-existent `b.example.com` returning a response including the NSEC record shown below. This signed record states that nothing exists between `a.example.org` and `d.example.org` and can also be re-used for any other query within the same domain name range gap. As the nameserver holds pre-calculated records linking together all available domain name range gaps, zone enumeration is simplified [9] requiring only a small number of online queries. Note that the RRSIG field contains a cryptographic Resource Record Signature.

` a.example.org.       NSEC d.example.org. A TXT RRSIG NSEC`

The NSEC3 record is very similar to a NSEC record, but instead of a signed range gap of domain names in plaintext, the NSEC3 record includes a signed range gap of domain name hashes (sorted by hash value). The nameserver maintains pre-calculated records linking together all available domain name range gap hashes. Further, the delivered response includes the both the hash salt and the number of iterations. As a result, the zone enumeration task now becomes a viable offline brute-force pre-image attack on the domain name range gap hashes [10]. Automated tools are available for this [11].

The internet draft for NSEC5 at https://datatracker.ietf.org/doc/draft-vcelak-nsec5/ introduces a scheme using VRFs to prevent brute-force offline attacks. First, the nameserver still holds pre-calculated records linking together all available domain name range gap hashes, but these are now VRF hashes (i.e. the `beta_string`). Without knowledge of the `sk` it is not viable to brute-force attack the `beta_string`. Second, when a non-existent name is queried, the nameserver delivers both a `proof_string` and `beta_string` for the queried name. Given the known `y`, the client can confirm the `beta_string` and confirm its correct position in the also-provided domain name range gap hash record. Thus, there is an authenticated gap and an authenticated position within that gap. While existing names are still signed offline and thus do not impact performance, queries for non-existent names now involve an online step. Note that NSEC5 is one of several proposals being debated.

#### C. Blockchain Cryptographic Sortition

A common criticism of blockchain systems involves the excessive costs and wasteful energy consumption involved in the mining process. This is largely due to the 'Proof of Work' approach used in the consensus system for finalizing blocks, which generally involves every miner racing to solve the same extremely-difficult cryptographic puzzle first.

An alternative to the proof of work approach is to utilize cryptographic sortition, which is essentially "the action of selecting or determining something by the casting or drawing of lots." [12]. The idea is to use a VRF to securely select a subset of miners to participate in forming a consensus around the next finalized block. This approach offers a variety of benefits so is becoming increasingly popular and hence the renewed interest in VRFs. Leading proponents of this approach include Algorand [13], Ontology [14] and Coda [15]; VRFs have even been written in Solidity [16] for deployment on Ethereum (and compatible) systems.

The general process follows a simple block-by-block pattern. An initial finalized block contains a unique and unpredictable identifier that serves as the `alpha_string` for all miners that may potentially participate in forming a consensus on the next finalized block. Once an initial block is finalized and the identifier broadcast, each participant uses their `sk` to privately calculate their unique `pi_string` and 'random' `beta_string`. If a miner's `beta_string` meets agreed requirements (e.g. perhaps has specific trailing digits, perhaps scaled by a stake), then that miner can participate in forming a consensus on the next finalized block. This consensus process typically involves a series of votes, where the vote submitted for each participant includes their `pi_string`. As a result, votes are essentially authenticated at the time of collection where the requirements on the `beta_string` are validated. The votes conclude with a consensus on (or election of) the next finalized block.

This strategy offers a very attractive alternative to the proof of work approach. Rather than having every participant waste resources on the same 'useless' puzzle to chase a single prize, a subset of participants are able to co-operate. The participating subset can self-identify without any communication or co-ordination beyond the `alpha_string` broadcast at the outset. Adversaries cannot know in advance which subset of participants will ultimately be empowered to vote on a particular block, so the attack target is diffuse. Participants themselves have no influence over the `beta_string` beyond their `sk`. Requirements placed on the `beta_string` can include staking amounts and other novel attributes to disincentivize misbehavior (which can be identified, attributed and punished). Further, the time between finalized blocks can be reduced and/or transaction throughput increased. Finally, overall participation can be greatly increased as this approach does not present significant compute power or energy consumption as barriers to participation.

## Security Considerations

Any complex distributed system involving multiple interlocked parts can fail in unanticipated ways. VRFs are not yet standardized or mature, and failure could result in the disclosure of keys, creation of forged proofs, digest collisions, unacceptable predictability from reduced randomness, and the loss of protected assets. While specifics are obviously dependent upon the unique system functionality, protocols and threat model, we can offer a starter set of topics to review prior to VRF deployment. Not all of these items necessarily lead to serious problems, but the answers to each question should be well understood and carefully considered.

#### A. Ciphersuite Configuration

• Are the internal algorithms mainstream and modern with a robust track record?
• Are the algorithm configuration parameters (e.g which elliptic curve, base point) set appropriately?
• Are the security parameters matched across algorithms (e.g. group order to hash width)?
• Are there special requirements for uniqueness, collision resistance and/or pseudorandomness?
• Are there any RSA or EC specific risks to avoid (e.g. low exponents)?

#### B. Key Lifecycle

• What is the source and quality of randomness? Is it blocking?
• Who generates the keys, in what manner, and with what assurances?
• What are the validity and longevity requirements for keys?
• How are the keys validated prior to use?
• How are the private keys transported, stored, rotated, destroyed?

#### C. Implementation

• Is all input tightly validated?
• Can the `alpha_string` be unduly influenced?
• Is the `beta_string` unique and uniformly distributed?
• Without knowledge of `sk` or `pi_string`, is `beta_string` effectively random?
• Is it possible to infer anything about `sk` given a `pi_string` (or multiples)?
• Given a fixed set of inputs, is there any source of non-determinism in the system?
• Is there any opportunity for malleability within and between components?
• Are any generated nonces uniformly distributed?
• Do all hashes have unique prefixes?
• Is transport and storage protected for confidentiality, integrity and authenticity (e.g. `alpha_string`)?
• Are timing side-channels present (especially in hash-to-curve, point multiplication and nonce generation calculations)?

#### D. Completeness

• What external functionality (e.g. 3rd party libraries) are used? Are they modern, mainstream, under active development with a solid track record? Are all dependencies up to date?
• Is strict typing and bounds-checking implemented?
• Are all error conditions correctly handled?
• Is there potential for serialization and deserialization issues?
• Is memory management and scrubbing 'perfect'?
• Test: plan, happy/unhappy paths, total coverage, CI infrastructure, fuzzing?
• Are there interoperability and/or compatibility requirements?

## Conclusion

This blog post introduced VRFs in the context of other well-known primitives, described three example use cases, and then highlighted over two dozen topics to consider during an implementation review. While VRFs clearly present a range of concerns, the opportunity to migrate away from proof of work consensus systems offers huge benefits. As such, we expect their popularity to only increase and look forward to participating in their continued development.