The BLAKE3 Hashing Framework
Taurus SA, Place Ruth Boesiger 6, 1201 Geneva, Switzerland; jeanphilippe.aumasson@gmail.com; https://taurushq.com
University of Coimbra, Polo II, Pinhal de Marrocos, Coimbra 3030-290, Portugal; sneves@dei.uc.pt
SpaceX, 356 167th Ave NE, Bellevue, WA 98008, USA; oconnor663@gmail.com; https://jacko.io
Electric Coin Co, 1750 30th St. apt #217, Colorado 80301, USA; zookog@gmail.com
Area: General
Workgroup: Internet Engineering Task Force
Keywords: BLAKE3, Cryptographic Hash, Extendable-Output Function, Key Derivation Function, Pseudo-Random Function, Message Authentication Code
This document specifies the cryptographic hashing primitive BLAKE3,
a secure algorithm designed to be fast and highly parallelizable.
Apart from the standard hashing functionality, BLAKE3 can serve to
realize the following cryptographic functionalities:
extendable-output function (XOF), key derivation function (KDF),
pseudo-random function (PRF), and message authentication code (MAC).
The BLAKE3 cryptographic hash function was
designed by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and
Zooko Wilcox-O'Hearn.
BLAKE3 is an evolution of its predecessors BLAKE [BLAKE] and
BLAKE2 [BLAKE2] [RFC7693]. BLAKE2 is widely used in open-source
software and in proprietary software. For example, the Linux kernel
(from version 5.17) uses BLAKE2 in its cryptographic
pseudorandom generator, and the WireGuard secure tunnel protocol
uses BLAKE2 for hashing and keyed hashing.
BLAKE3 was designed to be as secure as BLAKE2, yet considerably
faster, thanks to 1) a compression function with a reduced number of
rounds, and 2) a tree-based mode allowing implementations to
leverage parallel processing. BLAKE3 was designed to take advantage
of multi-threaded and multi-core processing, as well as of the
single-instruction multiple-data (SIMD) instructions of modern
processor architectures.
At the time of its publication, BLAKE3 was demonstrated to be
approximately five times faster than BLAKE2 when hashing 16-kibibyte
messages on a single thread. When using multiple threads to hash
large messages, BLAKE3 can be more than twenty times faster than
BLAKE2.
BLAKE3 was also designed to instantiate multiple cryptographic
primitives, offering a simpler and more efficient alternative to
dedicated legacy modes and algorithms such as those in [RFC6234]. These primitives include:
hash: This is the general-purpose hashing mode, taking a single input
of arbitrary size. BLAKE3 in this mode can be used whenever a
preimage- or collision-resistant hash function is needed, and to
instantiate random oracles in cryptographic protocols. For
example, BLAKE3 can replace SHA-3, as well as any SHA-2
instance, in applications such as digital signatures.
keyed_hash: This mode takes a 32-byte key in addition to the
arbitrary-size input. BLAKE3 in this mode can be used whenever a
pseudorandom function (PRF) or message authentication code (MAC)
is needed. For example, keyed BLAKE3 can replace HMAC instances.
derive_key: This key derivation mode takes two input values, each of
arbitrary size: a context string and key material. BLAKE3 in
this mode can be used whenever a key derivation function (KDF)
is needed. For example, BLAKE3 in key derivation mode can
replace HKDF.
Further, all three modes can produce an output of arbitrary size. The
hash mode can thus be used as an extendable-output function (XOF),
and the keyed_hash mode as a deterministic random bit generator
(DRBG). By default, each mode returns a 32-byte output.
Applications and use cases of BLAKE3 are further discussed in
Section 6 of [BLAKE3].
We provide a high-level overview of BLAKE3's internal structure, and
introduce the associated terminology.
BLAKE3 processes input data according to a binary tree structure. It
first splits its input into 1024-byte chunks and processes each chunk
independently of the other chunks, using a compression function that
iterates over the 16 consecutive 64-byte blocks of the chunk.
From the outputs of the chunks, a binary hash tree is built up to
its root, which determines the BLAKE3 output.
In the simplest case, there is only one chunk. This chunk is then the
tree's root, and its output determines BLAKE3's output. If the number
of chunks is a power of 2, the binary tree is a complete tree and all
leaves are at the same level. If the number of chunks is not a power
of 2, not all chunks will be at the same level (or layer) of the
tree.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
For real-valued x, we define the following functions:
floor(x): the largest integer less than or equal to x.
ceil(x): the smallest integer greater than or equal to x.
frac(x): the positive fractional part of x, frac(x) = x - floor(x).
Operator notation in pseudocode, where w = 32 is the word size in bits:
2**n: 2 to the power "n". 2**0=1, 2**1=2, 2**2=4, 2**3=8, etc.
a ^ b: Bitwise exclusive-or operation between "a" and "b".
a mod b: Remainder "a" modulo "b", always in range [0, b-1].
x >> n: floor(x / 2**n). Logical shift "x" right by "n" bits.
x << n: (x * 2**n) mod (2**w). Logical shift "x" left by "n" bits.
x >>> n: (x >> n) ^ (x << (w - n)). Rotate "x" right by "n" bits.
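The following is a minimal sketch of these operators in Python, with w = 32. Each one is written directly from its arithmetic definition above. The helper names shr, shl and rotr are illustrative and are not part of the specification.

```python
W = 32  # word size in bits used by BLAKE3

def shr(x: int, n: int) -> int:
    """x >> n, defined as floor(x / 2**n)."""
    return x // 2**n

def shl(x: int, n: int) -> int:
    """x << n, defined as (x * 2**n) mod 2**w."""
    return (x * 2**n) % 2**W

def rotr(x: int, n: int) -> int:
    """x >>> n, defined as (x >> n) ^ (x << (w - n))."""
    return shr(x, n) ^ shl(x, W - n)

print(hex(rotr(0x12345678, 8)))  # 0x78123456
```

The two shifted halves never share a set bit, so the XOR in the rotation definition gives the same result as a bitwise OR.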
BLAKE3 performs operations on 32-bit words and on arrays of
words. Array indexing is zero-based: the first element of an
n-element array "v" is v[0] and the last one is v[n - 1]. The full
array is denoted by v[0..n-1].
Byte (octet) streams are interpreted as words in little-endian
order, with the least significant byte first. Consider this
sequence of eight hexadecimal bytes:
x = 01 23 45 67 89 ab cd ef
When interpreted as 32-bit words starting from the first byte, x
contains two words x[0] and x[1], respectively equal to
0x67452301 and 0xefcdab89 in hexadecimal, or 1732584193 and
4023233417 in decimal.
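The little-endian interpretation can be checked with Python's standard struct module. block_to_words is a hypothetical helper. It also applies the zero padding that is used for short blocks during chunk processing.

```python
import struct

x = bytes.fromhex("0123456789abcdef")
x0, x1 = struct.unpack("<2I", x)  # "<" = little-endian, "I" = unsigned 32-bit
print(hex(x0), hex(x1))  # 0x67452301 0xefcdab89

def block_to_words(block: bytes) -> list:
    """Parse at most 64 bytes as 16 little-endian 32-bit words,
    zero-padding the block to 64 bytes first."""
    if len(block) > 64:
        raise ValueError("a block holds at most 64 bytes")
    return list(struct.unpack("<16I", block.ljust(64, b"\x00")))
```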
The initial value (IV) of BLAKE3 is the same as the SHA-256 IV,
namely the 8-word IV[0..7]:
IV[0..7] = 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
           0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
This IV is used as the initial chaining value of BLAKE3 when no key
is used. Otherwise, the 256-bit key is used as the initial chaining
value.
This IV is also used in the compression function, where its first
four words, IV[0..3], are copied into the 16-word local initial
state at positions v[8..11].
BLAKE3 uses a permutation of the 16 indices 0 to 15. This
permutation must be the following one, where the second line gives
the index of the word moved to the position given on the first line:
position:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
source:    2  6  3 10  7  0  4 13  1 11 12  5  9 14 15  8
For example, after applying the permutation to an array v[0..15]
consisting of elements v[0], v[1], ..., v[15], the permuted array
shall consist of v[2] at the first position, v[6] at the second
position, and so on.
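The following is a minimal Python sketch of the permutation, with the array held as a Python list. The constant name MSG_PERMUTATION is chosen for illustration.

```python
MSG_PERMUTATION = [2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8]

def permute(v: list) -> list:
    """Return the permuted array: position i receives v[MSG_PERMUTATION[i]]."""
    return [v[MSG_PERMUTATION[i]] for i in range(16)]

print(permute([f"v[{i}]" for i in range(16)])[:3])  # ['v[2]', 'v[6]', 'v[3]']
```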
The compression function of BLAKE3 uses a set of flags to
control various aspects of its operation. These flags are
defined as follows:
CHUNK_START (0x01): This flag must be set for the first block of
each chunk, and only for this block.
CHUNK_END (0x02): This flag signals the end of the chunk. It must be
set for the compression of the last block within a chunk, and only
for this block. If a chunk contains only one block, that block
must set both CHUNK_START and CHUNK_END.
PARENT (0x04): In the binary tree structure, this flag must be set on
parent nodes (non-chunk nodes), and only on parent nodes. It signals
that the node combines the 32-byte outputs of two child nodes in the
tree (chunks or parents). A parent node always processes exactly 64
bytes, that is, the concatenation of the 32-byte output values of
its two child nodes.
ROOT (0x08): This flag must be set for the final compression in the
tree, representing the root, and only for such compressions. The root
node may be a parent node, but may also be a chunk (leaf) node if
there is a single chunk. In this latter case, only the last
compression of that chunk sets the ROOT flag. A root parent node
consists of a single compression, which sets the ROOT flag.
KEYED_HASH (0x10): This flag must be set for all compressions when
using BLAKE3 in the keyed_hash mode, and only in this mode. In this
mode, a 256-bit key must be used as the initial chaining value
instead of the IV.
DERIVE_KEY_CONTEXT (0x20) and DERIVE_KEY_MATERIAL (0x40): These flags
are used in BLAKE3's derive_key mode. The context string is first
hashed by a (non-keyed) BLAKE3 instance, which must set
DERIVE_KEY_CONTEXT for all compressions. The 32-byte output is then
used as the key (that is, the initial chaining value) of a second
BLAKE3 instance hashing the key material. This second instance must
set DERIVE_KEY_MATERIAL (but not DERIVE_KEY_CONTEXT) for all
compressions.
If two or more flags are set, then all their respective bits shall
appear in the flags input word of the compression function. Each
flag is a distinct bit, so this combination may be implemented as a
bitwise OR, an XOR, or an integer addition of the flags. For
example, if CHUNK_START and KEYED_HASH are set, then the flags input
word will be the 32-bit word 0x00000011, where 0x11 = 0x10 + 0x01 =
0x10 ^ 0x01.
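The sketch below shows flag combination in Python. It uses the flag bit values of the BLAKE3 reference implementations, with one bit per flag from CHUNK_START = 0x01 to DERIVE_KEY_MATERIAL = 0x40. It checks that OR, XOR and addition agree.

```python
# One distinct bit per flag.
CHUNK_START = 1 << 0          # 0x01
CHUNK_END = 1 << 1            # 0x02
PARENT = 1 << 2               # 0x04
ROOT = 1 << 3                 # 0x08
KEYED_HASH = 1 << 4           # 0x10
DERIVE_KEY_CONTEXT = 1 << 5   # 0x20
DERIVE_KEY_MATERIAL = 1 << 6  # 0x40

# Because no two flags share a bit, OR, XOR and addition give the same word.
flags = CHUNK_START | KEYED_HASH
print(f"0x{flags:08x}")  # 0x00000011
```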
BLAKE3 uses the compression function when processing chunks, when
computing parent nodes within its tree, and when producing output
bytes from the root node(s).
These variables are used in the algorithm description.
h: The hash chaining value, 8 words of 32 bits.
m: The message block processed, 16 words of 32 bits.
t: A 64-bit counter whose lower-order 32-bit word is t[0] and
higher-order 32-bit word is t[1].
len: 32-bit word encoding the number of application data bytes in
the block. It is at most 64 and at least 1, except for the single
block of an empty input, where it is 0. That is, len is equal to
64 minus the number of padding bytes, which are set to zero (0x00).
flags: 32-bit word encoding the flags defined for a given compression
function call, as specified above.
The G function mixes two input words x and y into four
words indexed by a, b, c, and d in the working array v[0..15]. The
full modified array is returned.
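The G pseudocode is not reproduced in this text. The following Python sketch uses the rotation distances 16, 12, 8 and 7 from the BLAKE3 reference implementations, which are the same as in BLAKE2s. All additions are modulo 2**32. The function updates the array in place and returns it.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def g(v: list, a: int, b: int, c: int, d: int, x: int, y: int) -> list:
    """Mix message words x and y into v[a], v[b], v[c], v[d] in place."""
    v[a] = (v[a] + v[b] + x) & MASK32
    v[d] = rotr32(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK32
    v[b] = rotr32(v[b] ^ v[c], 12)
    v[a] = (v[a] + v[b] + y) & MASK32
    v[d] = rotr32(v[d] ^ v[a], 8)
    v[c] = (v[c] + v[d]) & MASK32
    v[b] = rotr32(v[b] ^ v[c], 7)
    return v
```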
BLAKE3's compression function takes as input an 8-word state h, a
16-word message m, a 2-word counter t, a data length word len, and a
word flags (as a bit field encoding flags).
BLAKE3's compression must do exactly 7 rounds, which are numbered 0
to 6 in the pseudocode below. Each round includes 8 calls to the G
function.
When processing chunks or computing parent nodes, the output is
always truncated to the first 8 words v[0..7]. When computing the
root output value, all 16 words may be used (see the description of
extendable output below).
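The following is a compact Python sketch of the compression function. It repeats the IV, the permutation and G so that it runs on its own. The pseudocode is not reproduced in this text, so the round layout is taken from the BLAKE3 reference implementations. Each round makes four column calls to G, then four diagonal calls, using message word pairs m[0..1] through m[14..15]. The final feed-forward comes from the same source. For the empty input there is one chunk with one block, len = 0 and flags CHUNK_START | CHUNK_END | ROOT. In that case a single call yields the default 32-byte hash.

```python
import struct

IV = [0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19]
MSG_PERMUTATION = [2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8]
MASK32 = 0xFFFFFFFF

def rotr32(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32

def g(v, a, b, c, d, x, y):
    v[a] = (v[a] + v[b] + x) & MASK32
    v[d] = rotr32(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK32
    v[b] = rotr32(v[b] ^ v[c], 12)
    v[a] = (v[a] + v[b] + y) & MASK32
    v[d] = rotr32(v[d] ^ v[a], 8)
    v[c] = (v[c] + v[d]) & MASK32
    v[b] = rotr32(v[b] ^ v[c], 7)

def round_function(v, m):
    # Columns.
    g(v, 0, 4, 8, 12, m[0], m[1])
    g(v, 1, 5, 9, 13, m[2], m[3])
    g(v, 2, 6, 10, 14, m[4], m[5])
    g(v, 3, 7, 11, 15, m[6], m[7])
    # Diagonals.
    g(v, 0, 5, 10, 15, m[8], m[9])
    g(v, 1, 6, 11, 12, m[10], m[11])
    g(v, 2, 7, 8, 13, m[12], m[13])
    g(v, 3, 4, 9, 14, m[14], m[15])

def compress(h, m, t, block_len, flags):
    """Return all 16 output words; callers keep v[0..7] when chaining.

    h: 8-word chaining value, m: 16 message words, t: 64-bit counter,
    split below into t[0] (low word) and t[1] (high word).
    """
    v = list(h[:8]) + IV[:4] + [t & MASK32, (t >> 32) & MASK32, block_len, flags]
    m = list(m)
    for r in range(7):                                # rounds 0 to 6
        round_function(v, m)
        if r < 6:
            m = [m[i] for i in MSG_PERMUTATION]       # permute between rounds
    for i in range(8):
        v[i] ^= v[i + 8]      # the truncated output v[0..7]
        v[i + 8] ^= h[i]      # words 8..15 matter only for longer root output
    return v

# Empty input: flags = CHUNK_START | CHUNK_END | ROOT = 0x0b.
digest = struct.pack("<8I", *compress(IV, [0] * 16, 0, 0, 0x0B)[:8])
print(digest.hex())
# af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262
```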
The following describes BLAKE3's tree mode of operation. It first
specifies the processing of input data as chunks, then describes how
the binary hash tree structure is formed for a given number of
chunks. Finally, it describes how BLAKE3 can produce an output of
arbitrary length without committing to a length when it starts
processing data.
BLAKE3's chunk processing divides the BLAKE3 input into
1024-byte chunks, which will be leaves of a binary tree. The
last chunk will be shorter than 1024 bytes if the message byte
length is not a multiple of 1024.
Chunks are divided into 64-byte blocks. The last block of
the last chunk may be shorter, but not empty, unless the entire
input is empty. If necessary, the last block is padded with
zeros to 64 bytes.
Each chunk is processed by iterating the compression function
(1024/64 = 16 times for a full 1024-byte chunk) to process
64-byte blocks, each parsed as 16 32-bit little-endian words.
The output of each compression function is h[0..7], the 8-word
truncated output, which is the input of the next compression
function (and the output of the chunk processing for the last
compression function call).
Compression function input arguments are set as follows:
h: For the first block of a chunk, this is set either to the key,
or to the IV if no key is defined. For subsequent blocks, h is
set to the output of the previous block's compression.
m: This is the block processed by the compression function.
t: The counter t for each block is the chunk index, that is, 0 for
all blocks in the first chunk, 1 for all blocks in the second
chunk, and so on.
len: The block length is set to 64 for all blocks except the last
block of a chunk, where it is the number of data bytes (thus
excluding padding zeros).
flags: The first block of each chunk sets the CHUNK_START flag (cf.
Table 3), and the last block of each chunk sets the CHUNK_END
flag. If a chunk contains only one block, that block sets both
CHUNK_START and CHUNK_END. If a chunk is the root of its tree,
the last block of that chunk also sets the ROOT flag. Multiple
flags may thus be set.
In addition, the mode flags are set as specified in the flag
definitions above. For example, in keyed_hash mode, the first
compression of the first chunk has both CHUNK_START and KEYED_HASH
set. If that chunk is at most 64 bytes long, its only compression
also has CHUNK_END and ROOT set, because such a short first chunk is
necessarily the only chunk.
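The per-block arguments can be listed with a short sketch. chunk_block_args is a hypothetical helper that stops short of calling the compression function. For each block of one chunk, it returns the padded block together with t, len and flags. Chaining the h values from block to block is left to the caller.

```python
CHUNK_LEN = 1024
BLOCK_LEN = 64
CHUNK_START, CHUNK_END, ROOT = 0x01, 0x02, 0x08

def chunk_block_args(data: bytes, chunk_index: int, is_root: bool,
                     mode_flags: int = 0) -> list:
    """Return (padded_block, t, block_len, flags) for each block of one chunk.

    mode_flags carries KEYED_HASH or a DERIVE_KEY_* flag when applicable.
    """
    chunk = data[chunk_index * CHUNK_LEN:(chunk_index + 1) * CHUNK_LEN]
    # ceil(len / 64), with at least one block so that empty input works.
    n_blocks = max(1, -(-len(chunk) // BLOCK_LEN))
    args = []
    for i in range(n_blocks):
        block = chunk[i * BLOCK_LEN:(i + 1) * BLOCK_LEN]
        flags = mode_flags
        if i == 0:
            flags |= CHUNK_START
        if i == n_blocks - 1:
            flags |= CHUNK_END
            if is_root:
                flags |= ROOT
        # t is the chunk index; len counts data bytes only, not padding.
        args.append((block.ljust(BLOCK_LEN, b"\x00"), chunk_index, len(block), flags))
    return args

# The 4-byte message "IETF" is a single one-block chunk that is also the root.
print([(t, n, hex(f)) for _, t, n, f in chunk_block_args(b"IETF", 0, True)])
# [(0, 4, '0xb')]
```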
From the chunk processing of input data, BLAKE3 computes its output
following a binary hash tree (a.k.a. Merkle tree) structure. Parent
nodes process 64-byte messages that consist of the concatenation of
two 32-byte hash values from child nodes (chunk nodes or other
parent nodes). Processing such a 64-byte message requires only one
call to the compression function.
The compression function used by parent nodes thus uses the
following arguments:
h: This is set to the same initial chaining value as for the chunks,
that is, the key if one is used, or the IV otherwise.
m: This is the 64-byte block consisting of the concatenated 32-byte
outputs of the two child nodes.
t: The counter t is set to zero (0).
len: The block length is set to 64.
flags: The PARENT flag is set for all parent nodes. If a parent is the
root of the tree, then it also sets the ROOT flag (and keeps the
PARENT flag). Parent nodes never set CHUNK_START or CHUNK_END.
The mode flags (KEYED_HASH, DERIVE_KEY_CONTEXT,
DERIVE_KEY_MATERIAL) must be set for a parent node when
operating in the respective modes.
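The following sketch assembles the arguments of a parent compression. parent_args is a hypothetical helper. Each child output is held as 8 words, so the message block is simply the 16 words of the left child followed by those of the right child.

```python
PARENT, ROOT = 0x04, 0x08

def parent_args(left_cv: list, right_cv: list, is_root: bool, mode_flags: int = 0):
    """Return (m, t, block_len, flags) for a parent-node compression.

    left_cv and right_cv are the 8-word (32-byte) outputs of the children.
    """
    if len(left_cv) != 8 or len(right_cv) != 8:
        raise ValueError("child chaining values must be 8 words each")
    m = list(left_cv) + list(right_cv)   # 16 words = 64 bytes
    flags = PARENT | mode_flags          # never CHUNK_START or CHUNK_END
    if is_root:
        flags |= ROOT
    return m, 0, 64, flags
```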
When the number of chunks is not a power of 2 (that is, when the
binary tree is not complete), the tree structure is created
according to the following rules:
If there is only one chunk, that chunk is the root node and only
node of the tree. Otherwise, the chunks are assembled with
parent nodes, each parent node having exactly two children.
Left subtrees are full, that is, each left subtree is a complete binary
tree, with all its chunks at the same depth, and a number of
chunks that is a power of 2.
Left subtrees are big, that is, each left subtree contains a
number of chunks greater than or equal to the number of chunks
in its sibling right subtree.
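Together, the two rules fix the split at every parent. The left subtree must hold a power of 2 number of chunks, at least half of them, and fewer than all of them. Exactly one power of 2 meets these conditions: the largest one strictly below the number of chunks. The following sketch, using hypothetical helper names, computes that split and shows the resulting shape.

```python
def left_subtree_chunks(n: int) -> int:
    """Chunks in the left subtree of a tree with n >= 2 chunks:
    the largest power of 2 strictly less than n."""
    if n < 2:
        raise ValueError("a parent node needs at least two chunks")
    return 1 << ((n - 1).bit_length() - 1)

def tree_shape(n: int, first: int = 0):
    """Nested tuples of chunk indices showing how n chunks are grouped."""
    if n == 1:
        return first
    left = left_subtree_chunks(n)
    return (tree_shape(left, first), tree_shape(n - left, first + left))

print(tree_shape(3))  # ((0, 1), 2)
print(tree_shape(5))  # (((0, 1), (2, 3)), 4)
```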
The implementation of this logic, especially regarding the
assignment of a chunk to a position in the tree, is discussed in
the implementation considerations below.
The root of the tree determines the final hash output. By default,
the BLAKE3 output is the 32-byte output of the root node (that is,
the final values of v[0..7] in the compression function). Output of
up to 64 bytes is formed by taking as many bytes as required from
the final v[0..15] of the root's compression function. See the
description of extendable output below for output values larger
than 64 bytes.
BLAKE3, in any of its three modes, can produce outputs of any byte
length up to 2**64 - 1. This is done by repeating the root
compression with incrementing values of the counter t. The results
of these repeated root compressions are then concatenated to form
the output.
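The following sketch shows how the output is assembled. root_compress is a hypothetical callable. It stands for re-running the root node's compression with counter t and serializing all 16 output words in little-endian order (64 bytes). A stand-in is used only to make the layout visible.

```python
from typing import Callable

def extended_output(root_compress: Callable[[int], bytes], out_len: int) -> bytes:
    """Concatenate root compressions for t = 0, 1, 2, ... and truncate."""
    out = bytearray()
    t = 0
    while len(out) < out_len:
        block = root_compress(t)
        if len(block) != 64:
            raise ValueError("each root compression yields 64 bytes")
        out += block
        t += 1
    return bytes(out[:out_len])

# Stand-in for the real root compression: block t is 64 copies of byte t.
fake_root = lambda t: bytes([t]) * 64
print(extended_output(fake_root, 130)[126:])  # b'\x01\x01\x02\x02'
```

The construction makes a shorter output a prefix of a longer one, because each output block depends only on t.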
Detailed implementation and optimization guidelines are given in
Section 5 of [BLAKE3]. This section provides a brief overview of
these as a starting point for implementers, covering the most
salient points.
BLAKE3 may be implemented using an application programming interface
(API) allowing for incremental hashing, that is, where the caller
provides input data via multiple repeated calls to an "update"
function, as opposed to a single call providing all the input data.
Such an API typically consists of an "init" function call to
initialize an internal context state, followed by a series of
"update" function calls, eventually followed by a "finalize"
function call that returns the output.
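The "init"/"update"/"finalize" pattern can be sketched as follows. ChunkBuffer is a hypothetical front end that only splits streamed input into chunks. It keeps a full 1024-byte chunk buffered until more input arrives, because only "finalize" can tell whether that chunk is the last one. The last chunk may be the root, which needs the ROOT flag.

```python
CHUNK_LEN = 1024

class ChunkBuffer:
    """Hypothetical incremental front end that splits input into chunks."""

    def __init__(self) -> None:      # "init"
        self.pending = b""           # current chunk, not yet closed
        self.closed = []             # chunks known not to be the last one

    def update(self, data: bytes) -> None:
        self.pending += data
        # Close a chunk only once at least one more byte follows it.
        while len(self.pending) > CHUNK_LEN:
            self.closed.append(self.pending[:CHUNK_LEN])
            self.pending = self.pending[CHUNK_LEN:]

    def finalize(self):
        # The pending chunk (possibly short or empty) is the last chunk.
        return self.closed, self.pending

hasher = ChunkBuffer()
for piece in (b"a" * 700, b"b" * 700, b"c" * 648):
    hasher.update(piece)
closed, last = hasher.finalize()
print(len(closed), len(last))  # 1 1024
```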
To implement incremental hashing, an implementation must maintain an
internal state, which must keep track of the state of the current
chunk (if any) and of the chaining values of the tree being formed.
A stack data structure may be used for this purpose, as proposed in
Section 5.1 of [BLAKE3].
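The following is one possible stack-based sketch. It is written for illustration and is not necessarily identical to the reference implementations. push receives the chaining value of each chunk known not to be the last. It merges finished subtrees, driven by the trailing zero bits of the chunk count. finalize folds the stack from the top and flags only the last merge as the root. merge is a hypothetical callable standing in for the parent-node compression.

```python
class TreeStack:
    """Stack of subtree chaining values still waiting for a right sibling.

    merge(left, right, is_root) stands in for the parent compression
    (PARENT flag, plus ROOT when is_root is True).
    """

    def __init__(self, merge):
        self.merge = merge
        self.stack = []
        self.total_chunks = 0

    def push(self, chunk_cv):
        """Add the CV of a chunk that is known not to be the last one."""
        self.total_chunks += 1
        cv, total = chunk_cv, self.total_chunks
        # Each trailing zero bit of the chunk count marks a finished subtree.
        while total & 1 == 0:
            cv = self.merge(self.stack.pop(), cv, False)
            total >>= 1
        self.stack.append(cv)

    def finalize(self, last_chunk_cv):
        """Fold the remaining subtrees right to left; the last merge is the root.
        With a single chunk nothing is merged, and that chunk is the root."""
        cv = last_chunk_cv
        while self.stack:
            left = self.stack.pop()
            cv = self.merge(left, cv, not self.stack)
        return cv

show = lambda left, right, is_root: f"({left}{right})" + ("*" if is_root else "")
tree = TreeStack(show)
for name in "abcd":
    tree.push(name)
print(tree.finalize("e"))  # (((ab)(cd))e)*
```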
Within each round of the compression function, the first four calls
to G may be computed in parallel. Likewise, the last four calls to G
may be computed in parallel. A parallel implementation of the
compression function may leverage single-instruction multiple-data
(SIMD) processing, as described in Section 5.3 of [BLAKE3].
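The permutation of words may be implemented by pre-computing the
indices corresponding to 0, 1, 2, ..., 6 iterations of the
permutation, and then applying each of these 7 permutations to the
initial message at each of the 7 rounds. These 7 permutations would
then be:
round 0:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
round 1:  2  6  3 10  7  0  4 13  1 11 12  5  9 14 15  8
round 2:  3  4 10 12 13  2  7 14  6  5  9  0 11 15  8  1
round 3: 10  7 12  9 14  3 13 15  4  0 11  2  5  8  1  6
round 4: 12 13  9 11 15 10 14  8  7  2  5  3  0  1  6  4
round 5:  9 14 11  5  8 12 15  1 13  3  0 10  2  6  4  7
round 6: 11 15  5  0  1  9  8  6 14 10  2 12  3  4  7 13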
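The precomputed permutations can be derived mechanically from the single permutation. The sketch below uses the illustrative name message_schedule. It builds row r by applying the permutation r times, so round r can read message word m[schedule[r][i]] instead of permuting m between rounds.

```python
MSG_PERMUTATION = [2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8]

def message_schedule(rounds: int = 7) -> list:
    """Row r gives, for each position i, the index of the original message
    word found at position i after r applications of the permutation."""
    schedule = [list(range(16))]          # round 0: identity
    for _ in range(rounds - 1):
        prev = schedule[-1]
        schedule.append([prev[MSG_PERMUTATION[i]] for i in range(16)])
    return schedule

for r, row in enumerate(message_schedule()):
    print(f"round {r}:", " ".join(f"{i:2d}" for i in row))
```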
In addition to the potential parallel computing of the compression
function internals via SIMD processing, BLAKE3 can benefit from
multi-threaded software implementation. Different approaches may be
implemented, the performance-optimal one depending on the expected
input data length. Section 5.2 of [BLAKE3] provides further
guidelines to implementers.
Because the repeated root compressions differ only in the value of
t, the implementation can execute any number of them in parallel.
The caller can also adjust t to seek to any point in the output
stream. For example, computing the third 64-byte block of output
(that is, the last 64 bytes of a 192-byte output) does not require
the computation of the first 128 bytes.
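To seek, a caller only needs to map a byte range of the output to counter values. output_counters is a hypothetical helper that does this mapping for 64-byte output blocks. The third 64-byte block corresponds to t = 2.

```python
def output_counters(offset: int, length: int, block_len: int = 64) -> range:
    """Counter values t whose root compressions cover output bytes
    [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_len
    last = (offset + length - 1) // block_len
    return range(first, last + 1)

print(list(output_counters(128, 64)))  # [2]
```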
BLAKE3 does not domain separate outputs of different lengths:
shorter outputs are prefixes of longer ones. The caller can extract
as much output as needed, without knowing the final length in
advance.
This memo includes no request to IANA at the moment, but future
versions of this I-D may include one.
BLAKE3 with an output of at least 32 bytes targets a security level
of at least 128 bits for all its security goals. BLAKE3 may be used
in any of the modes described in this document to provide
cryptographically secure hashing functionality. BLAKE3 must not be
used as a password-based hash function or password-based key
derivation function. These functionalities require dedicated
algorithms, such as Argon2 as defined in [RFC9106].
We refer the reader to [BLAKE3] for detailed cryptographic rationale
and security analysis of BLAKE3.
[BLAKE3] BLAKE3
[BLAKE2] BLAKE2: simpler, smaller, fast as MD5
[BLAKE] The Hash Function BLAKE
[RFC9106] Argon2 Memory-Hard Function for Password Hashing and Proof-of-Work Applications
[RFC7693] The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC)
[RFC2119] Key words for use in RFCs to Indicate Requirement Levels
[RFC6234] US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF)
Reference implementations of BLAKE3 in the C and Rust languages are
available online at https://github.com/BLAKE3-team/BLAKE3. These
implementations include parallel implementations leveraging
multi-threading and different SIMD instruction sets.
At the time of writing, a number of prominent projects have
integrated BLAKE3 due to its combination of security, speed, and
versatility (see the README of the reference implementation
repository).
For the sake of document size, these implementations are not copied
into the present document. However, they are expected to remain
available for the foreseeable future.
We provide execution traces for simple examples of BLAKE3 hashing.
More complex tests can be obtained from the reference source code.
In this first example, BLAKE3 in hash mode processes the 4-byte
message "IETF", padded with 60 zero bytes to form a 64-byte block.
Below we show the execution trace, including compression function input
and output values, and showing intermediate values of the 7
compression function rounds.
In this second example, BLAKE3 in keyed hash mode processes a
message composed of two 1024-byte chunks, the first consisting only
of 0xaa bytes and the second consisting only of 0xbb bytes.
Below we show the execution trace, including compression function input
and output values for each compression function: the 16 + 16 = 32
compressions of the two chunks, and the compression of the root
parent node. We only show the message block for the first
compression of a chunk, as all the subsequent blocks hash the same
block value (respectively, all 0xaa and all 0xbb for the two
chunks). Likewise, we only show the counter value and flags when
they change (the counter is 0, 1, and 0, respectively, for the two
chunks and for the root). The len compression function argument is
always 64, so we do not show it. Chunks and blocks are numbered from
0.