F-ika | Search

Search

Items tagged with: keyDerivationFunction

NIST opened public comments on SP 800-108 Rev. 1 (the NIST recommendations for Key Derivation Functions) last month. The main thing that’s changed from the original document published in 2009 is the inclusion of the Keccak-based KMAC alongside the incumbent algorithms.

One of the recommendations of SP 800-108 is called “KDF in Counter Mode”. A related document, SP 800-56C, suggests using a specific algorithm called HKDF instead of the generic Counter Mode construction from SP 800-108–even though they both accomplish the same goal.

Isn’t standards compliance fun?

Interestingly, HKDF isn’t just an inconsistently NI

Isn’t standards compliance fun?

Interestingly, HKDF isn’t just an inconsistently NIST-recommended KDF, it’s also a common building block in a software developer’s toolkit which sees a lot of use in different protocols.

Unfortunately, the way HKDF is widely used is actually incorrect given its formal security definition. I’ll explain what I mean in a moment.

456879 Art: Scruff

What is HKDF?

To first understand what HKDF is, you first need to know about HMAC.

HMAC is a standard message authentication code (MAC) algorithm built with cryptographic hash functions (that’s the H). HMAC is specified in RFC 2104 (yes, it’s that old).

HKDF is a key-derivation function that uses HMAC under-the-hood. HKDF is commonly used in encryption tools (Signal, age). HKDF is specified in RFC 5869.

HKDF is used to derive a uniformly-random secret key, typically for use with symmetric cryptography algorithms. In any situation where a key might need to be derived, you might see HKDF being used. (Although, there may be better algorithms.)

456881 Art: LvJ

How Developers Understand and Use HKDF

If you’re a software developer working with cryptography, you’ve probably seen an API in the crypto module for your programming language that looks like this, or maybe this.

hash_hkdf( string $algo, string $key, int $length = 0, string $info = "", string $salt = ""): string

Software developers that work with cryptography will typically think of the HKDF parameters like so:

$algo — which hash function to use
$key — the input key, from which multiple keys can be derived
$length — how many bytes to derive
$info — some arbitrary string used to bind a derived key to an intended context
$salt — some additional randomness (optional)

The most common use-case of HKDF is to implement key-splitting, where a single input key (the Initial Keying Material, or IKM) is used to derive two or more independent keys, so that you’re never using a single key for multiple algorithms.

See also: [url=https://github.com/defuse/php-encryption]defuse/php-encryption[/url], a popular PHP encryption library that does exactly what I just described.

At a super high level, the HKDF usage I’m describing looks like this:

class MyEncryptor {protected function splitKeys(CryptographyKey $key, string $salt): array { $encryptKey = new CryptographyKey(hash_hkdf( 'sha256', $key->getRawBytes(), 32, 'encryption', $salt )); $authKey = new CryptographyKey(hash_hkdf( 'sha256', $key->getRawBytes(), 32, 'message authentication', $salt )); return [$encryptKey, $authKey];}public function encryptString(string $plaintext, CryptographyKey $key): string{ $salt = random_bytes(32); [$encryptKey, $hmacKey] = $this->splitKeys($key, $salt); // ... encryption logic here ... return base64_encode($salt . $ciphertext . $mac);}public function decryptString(string $encrypted, CryptographyKey $key): string{ $decoded = base64_decode($encrypted); $salt = mb_substr($decoded, 0, 32, '8bit'); [$encryptKey, $hmacKey] = $this->splitKeys($key, $salt); // ... decryption logic here ... return $plaintext;}// ... other method here ...}

Unfortunately, anyone who ever does something like this just violated one of the core assumptions of the HKDF security definition and no longer gets to claim “KDF security” for their construction. Instead, your protocol merely gets to claim “PRF security”.

456883 Art: Harubaki

KDF? PRF? OMGWTFBBQ?

Let’s take a step back and look at some basic concepts.

(If you want a more formal treatment, read this Stack Exchange answer.)

PRF: Pseudo-Random Functions

A pseudorandom function (PRF) is an efficient function that emulates a random oracle.

“What the hell’s a random oracle?” you ask? Well, Thomas Pornin has the best explanation for random oracles:

A random oracle is described by the following model:
There is a black box. In the box lives a gnome, with a big book and some dice.
We can input some data into the box (an arbitrary sequence of bits).
Given some input that he did not see beforehand, the gnome uses his dice to generate a new output, uniformly and randomly, in some conventional space (the space of oracle outputs). The gnome also writes down the input and the newly generated output in his book.
If given an already seen input, the gnome uses his book to recover the output he returned the last time, and returns it again.
So a random oracle is like a kind of hash function, such that we know nothing about the output we could get for a given input message m. This is a useful tool for security proofs because they allow to express the attack effort in terms of number of invocations to the oracle.
The problem with random oracles is that it turns out to be very difficult to build a really “random” oracle. First, there is no proof that a random oracle can really exist without using a gnome. Then, we can look at what we have as candidates: hash functions. A secure hash function is meant to be resilient to collisions, preimages and second preimages. These properties do not imply that the function is a random oracle.
Thomas Pornin

Alternatively, Wikipedia has a more formal definition available to the academic-inclined.

In practical terms, we can generate a strong PRF out of secure cryptographic hash functions by using a keyed construction; i.e. HMAC.

Thus, as long as your HMAC key is a secret, the output of HMAC can be generally treated as a PRF for all practical purposes. Your main security consideration (besides key management) is the collision risk if you truncate its output.

453588 Art: LvJ

KDF: Key Derivation Functions

A key derivation function (KDF) is exactly what it says on the label: a cryptographic algorithm that derives one or more cryptographic keys from a secret input (which may be another cryptography key, a group element from a Diffie-Hellman key exchange, or a human-memorable password).

Note that passwords should be used with a Password-Based Key Derivation Function, such as scrypt or Argon2id, not HKDF.

Despite what you may read online, KDFs do not need to be built upon cryptographic hash functions, specifically; but in practice, they often are.

A notable counter-example to this hash function assumption: CMAC in Counter Mode (from NIST SP 800-108) uses AES-CMAC, which is a variable-length input variant of CBC-MAC. CBC-MAC uses a block cipher, not a hash function.

Regardless of the construction, KDFs use a PRF under the hood, and the output of a KDF is supposed to be a uniformly random bit string.

456887 Art: LvJ

PRF vs KDF Security Definitions

The security definition for a KDF has more relaxed requirements than PRFs: PRFs require the secret key be uniformly random. KDFs do not have this requirement.

If you use a KDF with a non-uniformly random IKM, you probably need the KDF security definition.

If your IKM is already uniformly random (i.e. the “key separation” use case), you can get by with just a PRF security definition.

After all, the entire point of KDFs is to allow a congruent security level as you’d get from uniformly random secret keys, without also requiring them.

However, if you’re building a protocol with a security requirement satisfied by a KDF, but you actually implemented a PRF (i.e., not a KDF), this is a security vulnerability in your cryptographic design.

456889 Art: LvJ

The HKDF Algorithm

HKDF is an HMAC-based KDF. Its algorithm consists of two distinct steps:

HKDF-Extract uses the Initial Keying Material (IKM) and Salt to produce a Pseudo-Random Key (PRK).
HKDF-Expand actually derives the keys using PRK, the info parameter, and a counter (from 0 to 255) for each hash function output needed to generate the desired output length.

If you’d like to see an implementation of this algorithm, defuse/php-encryption provides one (since it didn’t land in PHP until 7.1.0). Alternatively, there’s a Python implementation on Wikipedia that uses HMAC-SHA256.

This detail about the two steps will matter a lot in just a moment.

Soatok Explains it All Art: Swizz

How HKDF Salts Are Misused

The HKDF paper, written by Hugo Krawczyk, contains the following definition (page 7).

456891

The paper goes on to discuss the requirements for authenticating the salt over the communication channel, lest the attacker have the ability to influence it.

A subtle detail of this definition is that the security definition says that A salt value , not Multiple salt values.

Which means: You’re not supposed to use HKDF with a constant IKM, info label, etc. but vary the salt for multiple invocations. The salt must either be a fixed random value, or NULL.

The HKDF RFC makes this distinction even less clear when it argues for random salts.

We stress, however, that the use of salt adds significantly to the strength of HKDF, ensuring independence between different uses of the hash function, supporting “source-independent” extraction, and strengthening the analytical results that back the HKDF design.
Random salt differs fundamentally from the initial keying material in two ways: it is non-secret and can be re-used. As such, salt values are available to many applications. For example, a pseudorandom number generator (PRNG) that continuously produces outputs by applying HKDF to renewable pools of entropy (e.g., sampled system events) can fix a salt value and use it for multiple applications of HKDF without having to protect the secrecy of the salt. In a different application domain, a key agreement protocol deriving cryptographic keys from a Diffie-Hellman exchange can derive a salt value from public nonces exchanged and authenticated between communicating parties as part of the key agreement (this is the approach taken in [IKEv2]).
RFC 5869, section 3.1

Okay, sure. Random salts are better than a NULL salt. And while this section alludes to “[fixing] a salt value” to “use it for multiple applications of HKDF without having to protect the secrecy of the salt”, it never explicitly states this requirement. Thus, the poor implementor is left to figure this out on their own.

Thus, because it’s not using HKDF in accordance with its security definition, many implementations (such as the PHP encryption library we’ve been studying) do not get to claim that their construction has KDF security.

Instead, they only get to claim “Strong PRF” security, which you can get from just using HMAC.

456893 Art: LvJ

What Purpose Do HKDF Salts Actually Serve?

Recall that the HKDF algorithm uses salts in the HDKF-Extract step. Salts in this context were intended for deriving keys from a Diffie-Hellman output, or a human-memorable password.

In the case of [Elliptic Curve] Diffie-Hellman outputs, the result of the key exchange algorithm is a random group element, but not necessarily uniformly random bit string. There’s some structure to the output of these functions. This is why you always, at minimum, apply a cryptographic hash function to the output of [EC]DH before using it as a symmetric key.

HKDF uses salts as a mechanism to improve the quality of randomness when working with group elements and passwords.

Extending the nonce for a symmetric-key AEAD mode is a good idea, but using HKDF’s salt parameter specifically to accomplish this is a misuse of its intended function, and produces a weaker argument for your protocol’s security than would otherwise be possible.

How Should You Introduce Randomness into HKDF?

Just shove it in the info parameter.

453572 Art: LvJ

It may seem weird, and defy intuition, but the correct way to introduce randomness into HKDF as most developers interact with the algorithm is to skip the salt parameter entirely (either fixing it to a specific value for domain-separation or leaving it NULL), and instead concatenate data into the info parameter.

class BetterEncryptor extends MyEncryptor {protected function splitKeys(CryptographyKey $key, string $salt): array { $encryptKey = new CryptographyKey(hash_hkdf( 'sha256', $key->getRawBytes(), 32, $salt . 'encryption', '' // intentionally empty )); $authKey = new CryptographyKey(hash_hkdf( 'sha256', $key->getRawBytes(), 32, $salt . 'message authentication', '' // intentionally empty )); return [$encryptKey, $authKey];}}

Of course, you still have to watch out for canonicalization attacks if you’re feeding multi-part messages into the info tag.

Another advantage: This also lets you optimize your HKDF calls by caching the PRK from the HKDF-Extract step and reuse it for multiple invocations of HKDF-Expand with a distinct info. This allows you to reduce the number of hash function invocations from to 2 + 2n (since each HMAC involves two hash function invocations).

Notably, this HKDF salt usage was one of the things that was changed in V3/V4 of PASETO.

Does This Distinction Really Matter?

If it matters, your cryptographer will tell you it matters–which probably means they have a security proof that assumes the KDF security definition for a very good reason, and you’re not allowed to violate that assumption.

Otherwise, probably not. Strong PRF security is still pretty damn good for most threat models.

456897 Art: LvJ

Closing Thoughts

If your takeaway was, “Wow, I feel stupid,” don’t, because you’re in good company.

I’ve encountered several designs in my professional life that shoved the randomness into the info parameter, and it perplexed me because there was a perfectly good salt parameter right there. It turned out, I was wrong to believe that, for all of the subtle and previously poorly documented reasons discussed above. But now we both know, and we’re all better off for it.

So don’t feel dumb for not knowing. I didn’t either, until this was pointed out to me by a very patient colleague.

456899 “Feeling like you were stupid” just means you learned.
(Art: LvJ)

Also, someone should really get NIST to be consistent about whether you should use HKDF or “KDF in Counter Mode with HMAC” as a PRF, because SP 800-108’s new revision doesn’t concede this point at all (presumably a relic from the 2009 draft).

This concession was made separately in 2011 with SP 800-56C revision 1 (presumably in response to criticism from the 2010 HKDF paper), and the present inconsistency is somewhat vexing.

(On that note, does anyone actually use the NIST 800-108 KDFs instead of HKDF? If so, why? Please don’t say you need CMAC…)

Bonus Content

These questions were asked after this blog post initially went public, and I thought they were worth adding. If you ask a good question, it may end up being edited in at the end, too.

456901 Art: LvJ

Why Does HKDF use the Salt as the HMAC key in the Extract Step? (via r/crypto)

Broadly speaking, when applying a PRF to two “keys”, you get to decide which one you treat as the “key” in the underlying API.

HMAC’s API is HMACalg(key, message), but how HKDF uses it might as well be HMACalg(key1, key2).

The difference here seems almost arbitrary, but there’s a catch.

HKDF was designed for Diffie-Hellman outputs (before ECDH was the norm), which are generally able to be much larger than the block size of the underlying hash function. 2048-bit DH results fit in 256 bytes, which is 4 times the SHA256 block size.

If you have to make a decision, using the longer input (DH output) as the message is more intuitive for analysis than using it as the key, due to pre-hashing. I’ve discussed the counter-intuitive nature of HMAC’s pre-hashing behavior at length in this post, if you’re interested.

So with ECDH, it literally doesn’t matter which one was used (unless you have a weird mismatch in hash functions and ECC groups; i.e. NIST P-521 with SHA-224).

But before the era of ECDH, it was important to use the salt as the HMAC key in the extract step, since they were necessarily smaller than a DH group element.

Thus, HKDF chose HMACalg(salt, IKM) instead of HMACalg(IKM, salt) for the calculation of PRK in the HKDF-Extract step.

Neil Madden also adds that the reverse would create a chicken-egg situation, but I personally suspect that the pre-hashing would be more harmful to the security analysis than merely supplying a non-uniformly random bit string as an HMAC key in this specific context.

My reason for believing this is, when a salt isn’t supplied, it defaults to a string of 0x00 bytes as long as the output size of the underlying hash function. If the uniform randomness of the salt mattered that much, this wouldn’t be a tolerable condition.

https://soatok.blog/2021/11/17/understanding-hkdf/

#cryptographicHashFunction #cryptography #hashFunction #HMAC #KDF #keyDerivationFunction #securityDefinition #SecurityGuidance

Dhole Moments

2020-12-24 06:16:58

As we look upon the sunset of a remarkably tiresome year, I thought it would be appropriate to talk about cryptographic wear-out.
What is cryptographic wear-out?

It’s the threshold when you’ve used the same key to encrypt so much data that you should probably switch to a new key before you encrypt any more. Otherwise, you might let someone capable of observing all your encrypted data perform interesting attacks that compromise the security of the data you’ve encrypted.
My definitions always aim to be more understandable than pedantically correct.
(Art by Swizz)
The exact value of the threshold varies depending on how exactly you’re encrypting data (n.b. AEAD modes, block ciphers + cipher modes, etc. each have different wear-out thresholds due to their composition).
Let’s take a look at the wear-out limits of the more popular symmetric encryption methods, and calculate those limits ourselves.
Specific Ciphers and Modes

(Art by Khia. Poorly edited by the author.)
Cryptographic Limits for AES-GCM

I’ve written about AES-GCM before (and why I think it sucks).
AES-GCM is a construction that combines AES-CTR with an authenticator called GMAC, whose consumption of nonces looks something like this:
Calculating H (used in GHASH for all messages encrypted under the same key, regardless of nonce):
Encrypt(00000000 00000000 00000000 00000000)

Calculating J0 (the pre-counter block):
If the nonce is 96 bits long:
NNNNNNNN NNNNNNNN NNNNNNNN 00000001 where the N spaces represent the nonce hexits.

Otherwise:
s = 128 * ceil(len(nonce)/nonce) - len(nonce)
J0 = GHASH(H, nonce || zero(s+64) || int2bytes(len(nonce))

Each block of data encrypted uses J0 + block counter (starting at 1) as a CTR nonce.
J0 is additionally used as the nonce to calculate the final GMAC tag.
AES-GCM is one of the algorithms where it’s easy to separately calculate the safety limits per message (i.e. for a given nonce and key), as well as for all messages under a key.
AES-GCM Single Message Length Limits

In the simplest case (nonce is 96 bits), you end up with the following nonces consumed:
For each key: 00000000 00000000 00000000 00000000
For each (nonce, key) pair:
NNNNNNNN NNNNNNNN NNNNNNNN 000000001 for J0
NNNNNNNN NNNNNNNN NNNNNNNN 000000002 for encrypting the first 16 bytes of plaintext
NNNNNNNN NNNNNNNN NNNNNNNN 000000003 for the next 16 bytes of plaintext…
…
NNNNNNNN NNNNNNNN NNNNNNNN FFFFFFFFF for the final 16 bytes of plaintext.

From here, it’s pretty easy to see that you can encrypt the blocks from 00000002 to FFFFFFFF without overflowing and creating a nonce reuse. This means that each (key, nonce) can be used to encrypt a single message up to $2^{32} - 2$ blocks of the underlying ciphertext.
Since the block size of AES is 16 bytes, this means the maximum length of a single AES-GCM (key, nonce) pair is $2^{36} - 256$ bytes (or 68,719,476,480 bytes). This is approximately 68 GB or 64 GiB.
Things get a bit tricker to analyze when the nonce is not 96 bits, since it’s hashed.
The disadvantage of this hashing behavior is that it’s possible for two different nonces to produce overlapping ranges of AES-CTR output, which makes the security analysis very difficult.
However, this hashed output is also hidden from network observers since they do not know the value of H. Without some method of reliably detecting when you have an overlapping range of hidden block counters, you can’t exploit this.
(If you want to live dangerously and motivate cryptanalysis research, mix 96-bit and non-96-bit nonces with the same key in a system that does something valuable.)
Multi-Message AES-GCM Key Wear-Out

Now that we’ve established the maximum length for a single message, how many messages you can safely encrypt under a given AES-GCM key depends entirely on how your nonce is selected.
If you have a reliable counter, which is guaranteed to never repeat, and start it at 0 you can theoretically encrypt $2^{96}$ messages safely. Hooray!
Hooray!
(Art by Swizz)
You probably don’t have a reliable counter, especially in real-world settings (distributed systems, multi-threaded applications, virtual machines that might be snapshotted and restored, etc.).
Confound you, technical limitations!
(Art by Swizz)
Additionally (thanks to 2adic for the expedient correction), you cannot safely encrypt more than $2^{64}$ blocks with AES because the keystream blocks–as the output of a block cipher–cannot repeat.
Most systems that cannot guarantee unique incrementing nonces simply generate nonces with a cryptographically secure random number generator. This is a good idea, but no matter how high quality your random number generator is, random functions will produce collisions with a discrete probability.
If you have possible values, you should expect a single collision(with 50% probability) after $\sqrt{2^{n}}$ (or $2^{n/2}$ )samples. This is called the birthday bound.
However, 50% of a nonce reuse isn’t exactly a comfortable safety threshold for most systems (especially since nonce reuse will cause AES-GCM to become vulnerable to active attackers). 1 in 4 billion is a much more comfortable safety margin against nonce reuse via collisions than 1 in 2. Fortunately, you can calculate the discrete probability of a birthday collision pretty easily.
If you want to rekey after your collision probability exceeds $2^{-32}$ (for a random nonce between 0 and $2^{96}$ ), you simply need to re-key after $2^{32}$ messages.
AES-GCM Safety Limits

Maximum message length: $2^{36} - 256$ bytes
Maximum number of messages (random nonce): $2^{32}$
Maximum number of messages (sequential nonce): $2^{64}$ (but you probably don’t have this luxury in the real world)
Maximum data safely encrypted under a single key with a random nonce: about $2^{68}$ bytes
Not bad, but we can do better.
(Art by Khia.)
Cryptographic Limits for ChaCha20-Poly1305

The IETF version of ChaCha20-Poly1305 uses 96-bit nonces and 32-bit internal counters. A similar analysis follows from AES-GCM’s, with a few notable exceptions.
For starters, the one-time Poly1305 key is derived from the first 32 bytes of the ChaCha20 keystream output (block 0) for a given (nonce, key) pair. There is no equivalent to AES-GCM’s H parameter which is static for each key. (The ChaCha20 encryption begins using block 1.)
Additionally, each block for ChaCha20 is 512 bits, unlike AES’s 128 bits. So the message limit here is a little more forgiving.
Since the block size is 512 bits (or 64 bytes), and only one block is consumed for Poly1305 key derivation, we can calculate a message length limit of $2^{38} - 64$ , or 274,877,906,880 bytes–nearly 256 GiB for each (nonce, key) pair.
The same rules for handling 96-bit nonces applies as with AES-GCM, so we can carry that value forward.
ChaCha20-Poly1305 Safety Limits

Maximum message length: $2^{38} - 64$ bytes
Maximum number of messages (random nonce): $2^{32}$
Maximum number of messages (sequential nonce): $2^{96}$ (but you probably don’t have this luxury in the real world)
Maximum data safely encrypted under a single key with a random nonce: about $2^{70}$ bytes
A significant improvement, but still practically limited.
(Art by Khia.)
Cryptographic Limits for XChaCha20-Poly1305

XChaCha20-Poly1305 is a variant of XSalsa20-Poly1305 (as used in libsodium) and the IETF’s ChaCha20-Poly1305 construction. It features 192-bit nonces and 32-bit internal counters.
XChaCha20-Poly1305 is instantiated by using HChaCha20 of the key over the first 128 bits of the nonce to produce a subkey, which is used with the remaining nonce bits using the aforementioned ChaCha20-Poly1305.
~~This doesn’t change the maximum message length,~~ but it does change the number of messages you can safely encrypt (since you’re actually using up to $2^{128}$ distinct keys).
Thus, even if you manage to repeat the final ChaCha20-Poly1305 nonce, as long as the total nonce differs, each encryptions will be performed with a distinct key (thanks to the HChaCha20 key derivation; see the XSalsa20 paper and IETF RFC draft for details).
UPDATE (2021-04-15): It turns out, my read of the libsodium implementation was erroneous due to endian-ness. The maximum message length for XChaCha20-Poly1305 is $2^{64}$ blocks, and for AEAD_XChaCha20_Poly1305 is $2^{64} - 1$ blocks. Each block is 64 bytes, so that changes the maximum message length to about $2^{70}$ . This doesn’t change the extended-nonce details, just the underlying ChaCha usage.
XChaCha20-Poly1305 Safety Limits

Maximum message length: $2^{70}$ bytes (earlier version of this document said ~~$2^{38} - 64$~~ )
Maximum number of messages (random nonce): $2^{80}$
Maximum number of messages (sequential nonce): $2^{192}$ (but you probably don’t have this luxury in the real world)
Maximum data safely encrypted under a single key with a random nonce: about $2^{118}$ bytes
I can ~~see~~ encrypt forever, man.
(Art by Khia.)
Cryptographic Limits for AES-CBC

It’s tempting to compare non-AEAD constructions and block cipher modes such as CBC (Cipher Block Chaining), but they’re totally different monsters.
AEAD ciphers have a clean delineation between message length limit and the message quantity limit
CBC and other cipher modes do not have this separation
Every time you encrypt a block with AES-CBC, you are depleting from a universal bucket that affects the birthday bound security of encrypting more messages under that key. (And unlike AES-GCM with long nonces, AES-CBC’s IV is public.)
This is in addition to the operational requirements of AES-CBC (plaintext padding, initialization vectors that never repeat and must be unpredictable, separate message authentication since CBC doesn’t provide integrity and is vulnerable to chosen-ciphertext atacks, etc.).
My canned response to most queries about AES-CBC.
(Art by Khia.)
For this reason, most cryptographers don’t even bother calculating the safety limit for AES-CBC in the same breath as discussing AES-GCM. And they’re right to do so!
If you find yourself using AES-CBC (or AES-CTR, for that matter), you’d best be performing a separate HMAC-SHA256 over the ciphertext (and verifying this HMAC with a secure comparison function before decrypting). Additionally, you should consider using an extended nonce construction to split one-time encryption and authentication keys.
(Art by Riley.)
However, for the sake of completeness, let’s figure out what our practical limits are.
CBC operates on entire blocks of plaintext, whether you need the entire block or not.
On encryption, the output of the previous block is mixed (using XOR) with the current block, then encrypted with the block cipher. For the first block, the IV is used in the place of a “previous” block. (Hence, its requirements to be non-repeating and unpredictable.)
This means you can informally model (IV xor PlaintextBlock) and (PBn xor PBn+1) as a pseudo-random function, before it’s encrypted with the block cipher.
If those words don’t mean anything to you, here’s the kicker: You can use the above discussion about birthday bounds to calculate the upper safety bounds for the total number of blocks encrypted under a single AES key (assuming IVs are generated from a secure random source).
If you’re okay with a 50% probability of a collision, you should re-key after $2^{64}$ blocks have been encrypted.
https://www.youtube.com/watch?v=v0IsYNDMV7A
If your safety margin is closer to the 1 in 4 billion (as with AES-GCM), you want to rekey after $2^{48}$ blocks.
However, blocks encrypted doesn’t map neatly to bytes encrypted.
If your plaintext is always an even multiple of 128 bits (or 16 bytes), this allows for up to $2^{52}$ bytes of plaintext. If you’re using PKCS#7 padding, keep in mind that this will include an entire padding block per message, so your safety margin will deplete a bit faster (depending on how many individual messages you encrypt, and therefore how many padding blocks you need).
On the other extreme (1-byte plaintexts), you’ll only be able to eek $2^{48}$ encrypted bytes before you should re-key.
Therefore, to stay within the safety margin of AES-CBC, you SHOULD re-key after $2^{48}$ blocks (including padding) have been encrypted.
Keep in mind: $2^{48}$ single-byte blocks is still approximately 281 TiB of data (including padding). On the upper end, $2^{48}$ 15-byte blocks (with 1-byte padding to stay within a block) clocks in at about $2^{51.9}$ or about 4.22 PiB of data.
That’s Blocks. What About Bytes?

The actual plaintext byte limit sans padding is a bit fuzzy and context-dependent.
The local extrema occurs if your plaintext is always 16 bytes (and thus requires an extra 16 bytes of padding). Any less, and the padding fits within one block. Any more, and the data😛adding ratio starts to dominate.
Therefore, the worst case scenario with padding is that you take the above safety limit for block counts, and cut it in half. Cutting a number in half means reducing the exponent by 1.
But this still doesn’t eliminate the variance. $2^{47}$ blocks could be anywhere from $2^{47}$ to $2^{50.9}$ bytes of real plaintext. When in situations like this, we have to assume the worst (n.b. take the most conservative value).
Therefore…
AES-CBC Safety Limits

Maximum data safely encrypted under a single key with a random nonce: $2^{47}$ bytes (approximately 141 TiB)
Yet another reason to dislike non-AEAD ciphers.
(Art by Khia.)
Take-Away

Compared to AES-CBC, AES-GCM gives you approximately a million times as much usage out of the same key, for the same threat profile.
ChaCha20-Poly1305 and XChaCha20-Poly1305 provides even greater allowances of encrypting data under the same key. The latter is even safe to use to encrypt arbitrarily large volumes of data under a single key without having to worry about ever practically hitting the birthday bound.
I’m aware that this blog post could have simply been a comparison table and a few footnotes (or even an IETF RFC draft), but I thought it would be more fun to explain how these values are derived from the cipher constructions.
(Art by Khia.)
https://soatok.blog/2020/12/24/cryptographic-wear-out-for-symmetric-encryption/
#AES #AESCBC #AESGCM #birthdayAttack #birthdayBound #cryptography #safetyMargin #SecurityGuidance #symmetricCryptography #symmetricEncryption #wearOut