NIST opened public comments on SP 800-108 Rev. 1 (the NIST recommendations for Key Derivation Functions) last month. The main thing that’s changed from the original document published in 2009 is the inclusion of the Keccak-based KMAC alongside the incumbent algorithms.
One of the recommendations of SP 800-108 is called “KDF in Counter Mode”. A related document, SP 800-56C, suggests using a specific algorithm called HKDF instead of the generic Counter Mode construction from SP 800-108–even though they both accomplish the same goal.
Isn’t standards compliance fun?
Interestingly, HKDF isn’t just an inconsistently NIST-recommended KDF, it’s also a common building block in a software developer’s toolkit which sees a lot of use in different protocols.
Unfortunately, the way HKDF is widely used is actually incorrect given its formal security definition. I’ll explain what I mean in a moment.
Art: Scruff
What is HKDF?
To understand what HKDF is, you first need to know about HMAC.
HMAC is a standard message authentication code (MAC) algorithm built with cryptographic hash functions (that’s the H). HMAC is specified in RFC 2104 (yes, it’s that old).
HKDF is a key-derivation function that uses HMAC under-the-hood. HKDF is commonly used in encryption tools (Signal, age). HKDF is specified in RFC 5869.
HKDF is used to derive a uniformly-random secret key, typically for use with symmetric cryptography algorithms. In any situation where a key might need to be derived, you might see HKDF being used. (Although, there may be better algorithms.)
Art: LvJ
How Developers Understand and Use HKDF
If you’re a software developer working with cryptography, you’ve probably seen an API in the crypto module for your programming language that looks like this, or maybe this.
hash_hkdf( string $algo, string $key, int $length = 0, string $info = "", string $salt = ""): string
Software developers that work with cryptography will typically think of the HKDF parameters like so:
- $algo: which hash function to use
- $key: the input key, from which multiple keys can be derived
- $length: how many bytes to derive
- $info: some arbitrary string used to bind a derived key to an intended context
- $salt: some additional randomness (optional)
The most common use-case of HKDF is to implement key-splitting, where a single input key (the Initial Keying Material, or IKM) is used to derive two or more independent keys, so that you’re never using a single key for multiple algorithms.
See also: defuse/php-encryption (https://github.com/defuse/php-encryption), a popular PHP encryption library that does exactly what I just described.
At a super high level, the HKDF usage I’m describing looks like this:
class MyEncryptor {
    protected function splitKeys(CryptographyKey $key, string $salt): array
    {
        $encryptKey = new CryptographyKey(hash_hkdf(
            'sha256',
            $key->getRawBytes(),
            32,
            'encryption',
            $salt
        ));
        $authKey = new CryptographyKey(hash_hkdf(
            'sha256',
            $key->getRawBytes(),
            32,
            'message authentication',
            $salt
        ));
        return [$encryptKey, $authKey];
    }

    public function encryptString(string $plaintext, CryptographyKey $key): string
    {
        $salt = random_bytes(32);
        [$encryptKey, $hmacKey] = $this->splitKeys($key, $salt);
        // ... encryption logic here ...
        return base64_encode($salt . $ciphertext . $mac);
    }

    public function decryptString(string $encrypted, CryptographyKey $key): string
    {
        $decoded = base64_decode($encrypted);
        $salt = mb_substr($decoded, 0, 32, '8bit');
        [$encryptKey, $hmacKey] = $this->splitKeys($key, $salt);
        // ... decryption logic here ...
        return $plaintext;
    }

    // ... other method here ...
}
Unfortunately, anyone who ever does something like this just violated one of the core assumptions of the HKDF security definition and no longer gets to claim “KDF security” for their construction. Instead, your protocol merely gets to claim “PRF security”.
Art: Harubaki
KDF? PRF? OMGWTFBBQ?
Let’s take a step back and look at some basic concepts.
(If you want a more formal treatment, read this Stack Exchange answer.)
PRF: Pseudo-Random Functions
A pseudorandom function (PRF) is an efficient function that emulates a random oracle.
“What the hell’s a random oracle?” you ask? Well, Thomas Pornin has the best explanation for random oracles:
A random oracle is described by the following model:
- There is a black box. In the box lives a gnome, with a big book and some dice.
- We can input some data into the box (an arbitrary sequence of bits).
- Given some input that he did not see beforehand, the gnome uses his dice to generate a new output, uniformly and randomly, in some conventional space (the space of oracle outputs). The gnome also writes down the input and the newly generated output in his book.
- If given an already seen input, the gnome uses his book to recover the output he returned the last time, and returns it again.
So a random oracle is like a kind of hash function, such that we know nothing about the output we could get for a given input message m. This is a useful tool for security proofs because they allow to express the attack effort in terms of number of invocations to the oracle.
The problem with random oracles is that it turns out to be very difficult to build a really “random” oracle. First, there is no proof that a random oracle can really exist without using a gnome. Then, we can look at what we have as candidates: hash functions. A secure hash function is meant to be resilient to collisions, preimages and second preimages. These properties do not imply that the function is a random oracle.
Thomas Pornin
Alternatively, Wikipedia has a more formal definition available to the academic-inclined.
In practical terms, we can generate a strong PRF out of secure cryptographic hash functions by using a keyed construction; i.e. HMAC.
Thus, as long as your HMAC key is a secret, the output of HMAC can be generally treated as a PRF for all practical purposes. Your main security consideration (besides key management) is the collision risk if you truncate its output.
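Here's a minimal sketch of that idea in Python, using only the standard library's hmac and hashlib modules. The key and message are made-up values for illustration, not part of any real protocol.

```python
import hashlib
import hmac
import secrets

# Hypothetical values, for illustration only.
key = secrets.token_bytes(32)   # secret, uniformly random HMAC key
message = b"any input you like"

# With a secret key, HMAC-SHA256 behaves like a PRF: the same
# (key, message) pair always produces the same 32-byte output.
tag = hmac.new(key, message, hashlib.sha256).digest()
assert tag == hmac.new(key, message, hashlib.sha256).digest()

# A different key gives an unrelated-looking output for the same message.
other = hmac.new(secrets.token_bytes(32), message, hashlib.sha256).digest()

# Truncating the output is allowed, but it raises the collision risk:
# 16-byte outputs collide after roughly 2**64 samples instead of 2**128.
short_tag = tag[:16]
```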
Art: LvJ
KDF: Key Derivation Functions
A key derivation function (KDF) is exactly what it says on the label: a cryptographic algorithm that derives one or more cryptographic keys from a secret input (which may be another cryptography key, a group element from a Diffie-Hellman key exchange, or a human-memorable password).
Note that passwords should be used with a Password-Based Key Derivation Function, such as scrypt or Argon2id, not HKDF.
Despite what you may read online, KDFs do not need to be built upon cryptographic hash functions, specifically; but in practice, they often are.
A notable counter-example to this hash function assumption: CMAC in Counter Mode (from NIST SP 800-108) uses AES-CMAC, which is a variable-length input variant of CBC-MAC. CBC-MAC uses a block cipher, not a hash function.
Regardless of the construction, KDFs use a PRF under the hood, and the output of a KDF is supposed to be a uniformly random bit string.
Art: LvJ
PRF vs KDF Security Definitions
The security definition for a KDF has more relaxed requirements than PRFs: PRFs require the secret key be uniformly random. KDFs do not have this requirement.
If you use a KDF with a non-uniformly random IKM, you probably need the KDF security definition.
If your IKM is already uniformly random (i.e. the “key separation” use case), you can get by with just a PRF security definition.
After all, the entire point of KDFs is to allow a congruent security level as you’d get from uniformly random secret keys, without also requiring them.
However, if you’re building a protocol with a security requirement satisfied by a KDF, but you actually implemented a PRF (i.e., not a KDF), this is a security vulnerability in your cryptographic design.
Art: LvJ
The HKDF Algorithm
HKDF is an HMAC-based KDF. Its algorithm consists of two distinct steps:
- HKDF-Extract uses the Initial Keying Material (IKM) and Salt to produce a Pseudo-Random Key (PRK).
- HKDF-Expand actually derives the keys using PRK, the info parameter, and a one-byte counter (from 1 to 255) for each hash function output needed to generate the desired output length.
If you’d like to see an implementation of this algorithm, defuse/php-encryption provides one (since it didn’t land in PHP until 7.1.0). Alternatively, there’s a Python implementation on Wikipedia that uses HMAC-SHA256.
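If you just want to see the two steps side by side, here is a short sketch of HKDF with HMAC-SHA256 in Python, following my reading of RFC 5869. The function names are my own, and the argument order of hkdf() is chosen to mirror PHP's hash_hkdf(); treat it as a study aid rather than a vetted implementation.

```python
import hashlib
import hmac

HASH_LEN = 32  # output size of SHA-256, in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: PRK = HMAC-SHA256(key=salt, message=IKM)."""
    if not salt:
        # RFC 5869: a missing salt defaults to HashLen zero bytes.
        salt = b"\x00" * HASH_LEN
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: chain HMAC blocks T(1), T(2), ... until `length` bytes exist."""
    if length > 255 * HASH_LEN:
        raise ValueError("HKDF-SHA256 can output at most 255 * 32 bytes")
    okm = b""
    block = b""   # T(0) is the empty string
    counter = 1   # single-byte counter, starting at 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf(ikm: bytes, length: int, info: bytes = b"", salt: bytes = b"") -> bytes:
    """Extract-then-expand in one call."""
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)
```

Notice that the salt only ever touches the Extract step, while info only ever touches the Expand step.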
This detail about the two steps will matter a lot in just a moment.
Art: Swizz
How HKDF Salts Are Misused
The HKDF paper, written by Hugo Krawczyk, contains the following definition (page 7).
The paper goes on to discuss the requirements for authenticating the salt over the communication channel, lest the attacker have the ability to influence it.
A subtle detail of this definition is that the security definition says “a salt value”, not “multiple salt values”.
Which means: You’re not supposed to use HKDF with a constant IKM, info label, etc. but vary the salt for multiple invocations. The salt must either be a fixed random value, or NULL.
The HKDF RFC makes this distinction even less clear when it argues for random salts.
We stress, however, that the use of salt adds significantly to the strength of HKDF, ensuring independence between different uses of the hash function, supporting “source-independent” extraction, and strengthening the analytical results that back the HKDF design.
Random salt differs fundamentally from the initial keying material in two ways: it is non-secret and can be re-used. As such, salt values are available to many applications. For example, a pseudorandom number generator (PRNG) that continuously produces outputs by applying HKDF to renewable pools of entropy (e.g., sampled system events) can fix a salt value and use it for multiple applications of HKDF without having to protect the secrecy of the salt. In a different application domain, a key agreement protocol deriving cryptographic keys from a Diffie-Hellman exchange can derive a salt value from public nonces exchanged and authenticated between communicating parties as part of the key agreement (this is the approach taken in [IKEv2]).
RFC 5869, section 3.1
Okay, sure. Random salts are better than a NULL salt. And while this section alludes to “[fixing] a salt value” to “use it for multiple applications of HKDF without having to protect the secrecy of the salt”, it never explicitly states this requirement. Thus, the poor implementor is left to figure this out on their own.
Thus, because they are not using HKDF in accordance with its security definition, many implementations (such as the PHP encryption library we’ve been studying) do not get to claim that their construction has KDF security.
Instead, they only get to claim “Strong PRF” security, which you can get from just using HMAC.
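To make the "just using HMAC" point concrete, here is a sketch of the same key-splitting idea with plain HMAC-SHA256 in Python. It assumes the master key is already uniformly random and that the nonce has a fixed length of 32 bytes; split_keys and its labels are names I made up for this example.

```python
import hashlib
import hmac
from typing import Tuple


def split_keys(master_key: bytes, nonce: bytes) -> Tuple[bytes, bytes]:
    """Derive an encryption key and a MAC key from one uniformly random key.

    Assumption: `nonce` is always exactly 32 bytes, so that concatenating
    it with a label is unambiguous.
    """
    if len(nonce) != 32:
        raise ValueError("nonce must be exactly 32 bytes")
    enc_key = hmac.new(master_key, nonce + b"encryption", hashlib.sha256).digest()
    auth_key = hmac.new(master_key, nonce + b"message authentication", hashlib.sha256).digest()
    return enc_key, auth_key
```

This only gets you PRF security, which is exactly the guarantee the salt-varying HKDF construction ends up with anyway.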
Art: LvJ
What Purpose Do HKDF Salts Actually Serve?
Recall that the HKDF algorithm uses salts in the HKDF-Extract step. Salts in this context were intended for deriving keys from a Diffie-Hellman output, or a human-memorable password.
In the case of [Elliptic Curve] Diffie-Hellman outputs, the result of the key exchange algorithm is a random group element, but not necessarily a uniformly random bit string. There’s some structure to the output of these functions. This is why you always, at minimum, apply a cryptographic hash function to the output of [EC]DH before using it as a symmetric key.
HKDF uses salts as a mechanism to improve the quality of randomness when working with group elements and passwords.
Extending the nonce for a symmetric-key AEAD mode is a good idea, but using HKDF’s salt parameter specifically to accomplish this is a misuse of its intended function, and produces a weaker argument for your protocol’s security than would otherwise be possible.
How Should You Introduce Randomness into HKDF?
Just shove it in the info parameter.
Art: LvJ
It may seem weird, and defy intuition, but the correct way to introduce randomness into HKDF as most developers interact with the algorithm is to skip the salt parameter entirely (either fixing it to a specific value for domain-separation or leaving it NULL), and instead concatenate data into the info parameter.
class BetterEncryptor extends MyEncryptor {
    protected function splitKeys(CryptographyKey $key, string $salt): array
    {
        $encryptKey = new CryptographyKey(hash_hkdf(
            'sha256',
            $key->getRawBytes(),
            32,
            $salt . 'encryption',
            '' // intentionally empty
        ));
        $authKey = new CryptographyKey(hash_hkdf(
            'sha256',
            $key->getRawBytes(),
            32,
            $salt . 'message authentication',
            '' // intentionally empty
        ));
        return [$encryptKey, $authKey];
    }
}
Of course, you still have to watch out for canonicalization attacks if you’re feeding multi-part messages into the info tag.
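One common way to sidestep that problem is to length-prefix every part before concatenating. The sketch below is in Python and is similar in spirit to the pre-authentication encoding used by PASETO, but encode_info is a hypothetical helper of mine, not an API from any library.

```python
import struct


def encode_info(*parts: bytes) -> bytes:
    """Length-prefix each part so two different part lists never encode alike."""
    out = struct.pack("<Q", len(parts))      # number of parts, 64-bit little-endian
    for part in parts:
        out += struct.pack("<Q", len(part))  # byte length of this part
        out += part
    return out


# Naive concatenation is ambiguous: both of these are b"abc".
assert b"ab" + b"c" == b"a" + b"bc"

# The length-prefixed encoding keeps them distinct.
assert encode_info(b"ab", b"c") != encode_info(b"a", b"bc")
```

If one of your parts is a fixed-length random value followed by a constant label (as in the PHP snippet above), plain concatenation is already unambiguous; the encoding matters once two or more parts can vary in length.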
Another advantage: This also lets you optimize your HKDF calls by caching the PRK from the HKDF-Extract step and reusing it for multiple invocations of HKDF-Expand with a distinct info. For $N$ derived keys that each fit in a single hash output, this allows you to reduce the number of hash function invocations from $4N$ to $2 + 2N$ (since each HMAC involves two hash function invocations).
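Here is a rough Python sketch of that caching trick. It assumes each derived key is at most 32 bytes (so Expand needs only one HMAC call) and counts two hash invocations per HMAC, as above. The helper names and the all-zero placeholder inputs are mine.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # An empty salt is treated as 32 zero bytes, per RFC 5869.
    return hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()


def hkdf_expand_one_block(prk: bytes, info: bytes) -> bytes:
    # Expand for outputs of at most 32 bytes: T(1) = HMAC(PRK, info || 0x01).
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()


def hash_calls_uncached(n: int) -> int:
    return 4 * n      # n times (Extract + Expand), two hash calls per HMAC


def hash_calls_cached(n: int) -> int:
    return 2 + 2 * n  # one Extract, then n Expands


# Placeholder inputs, for illustration only.
ikm = bytes(32)
nonce = bytes(32)

prk = hkdf_extract(b"", ikm)  # computed once, then reused
enc_key = hkdf_expand_one_block(prk, nonce + b"encryption")
auth_key = hkdf_expand_one_block(prk, nonce + b"message authentication")
```

For the two-key case, that is 6 hash invocations instead of 8.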
Notably, this HKDF salt usage was one of the things that was changed in V3/V4 of PASETO.
Does This Distinction Really Matter?
If it matters, your cryptographer will tell you it matters–which probably means they have a security proof that assumes the KDF security definition for a very good reason, and you’re not allowed to violate that assumption.
Otherwise, probably not. Strong PRF security is still pretty damn good for most threat models.
Art: LvJ
Closing Thoughts
If your takeaway was, “Wow, I feel stupid,” don’t, because you’re in good company.
I’ve encountered several designs in my professional life that shoved the randomness into the info parameter, and it perplexed me because there was a perfectly good salt parameter right there. It turned out, I was wrong to believe that, for all of the subtle and previously poorly documented reasons discussed above. But now we both know, and we’re all better off for it.
So don’t feel dumb for not knowing. I didn’t either, until this was pointed out to me by a very patient colleague.
“Feeling like you were stupid” just means you learned.
(Art: LvJ)
Also, someone should really get NIST to be consistent about whether you should use HKDF or “KDF in Counter Mode with HMAC” as a PRF, because SP 800-108’s new revision doesn’t concede this point at all (presumably a relic from the 2009 draft).
This concession was made separately in 2011 with SP 800-56C revision 1 (presumably in response to criticism from the 2010 HKDF paper), and the present inconsistency is somewhat vexing.
(On that note, does anyone actually use the NIST 800-108 KDFs instead of HKDF? If so, why? Please don’t say you need CMAC…)
Bonus Content
These questions were asked after this blog post initially went public, and I thought they were worth adding. If you ask a good question, it may end up being edited in at the end, too.
Art: LvJ
Why Does HKDF use the Salt as the HMAC key in the Extract Step? (via r/crypto)
Broadly speaking, when applying a PRF to two “keys”, you get to decide which one you treat as the “key” in the underlying API.
HMAC’s API is HMACalg(key, message), but how HKDF uses it might as well be HMACalg(key1, key2).
The difference here seems almost arbitrary, but there’s a catch.
HKDF was designed for Diffie-Hellman outputs (before ECDH was the norm), which are generally able to be much larger than the block size of the underlying hash function. 2048-bit DH results fit in 256 bytes, which is 4 times the SHA256 block size.
If you have to make a decision, using the longer input (DH output) as the message is more intuitive for analysis than using it as the key, due to pre-hashing. I’ve discussed the counter-intuitive nature of HMAC’s pre-hashing behavior at length in this post, if you’re interested.
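The pre-hashing behavior is easy to observe with Python's standard hmac module. In this sketch, long_key stands in for a 256-byte DH output; the byte values themselves are arbitrary.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size, in bytes

long_key = bytes(range(256))   # 256 bytes, the size of a 2048-bit DH output
short_key = bytes(range(32))   # fits within one block
message = b"salt, or any other message"

# Keys longer than the block size are replaced by their hash before use,
# so these two calls produce the same tag.
tag_long = hmac.new(long_key, message, hashlib.sha256).digest()
tag_prehashed = hmac.new(hashlib.sha256(long_key).digest(), message, hashlib.sha256).digest()
assert tag_long == tag_prehashed

# Keys that fit within one block are only zero-padded, never hashed.
tag_short = hmac.new(short_key, message, hashlib.sha256).digest()
tag_short_hashed = hmac.new(hashlib.sha256(short_key).digest(), message, hashlib.sha256).digest()
assert tag_short != tag_short_hashed
```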
So with ECDH, it literally doesn’t matter which one was used (unless you have a weird mismatch in hash functions and ECC groups; i.e. NIST P-521 with SHA-224).
But before the era of ECDH, it was important to use the salt as the HMAC key in the extract step, since salts were necessarily smaller than a DH group element.
Thus, HKDF chose HMACalg(salt, IKM) instead of HMACalg(IKM, salt) for the calculation of PRK in the HKDF-Extract step.
Neil Madden also adds that the reverse would create a chicken-egg situation, but I personally suspect that the pre-hashing would be more harmful to the security analysis than merely supplying a non-uniformly random bit string as an HMAC key in this specific context.
My reason for believing this is, when a salt isn’t supplied, it defaults to a string of 0x00 bytes as long as the output size of the underlying hash function. If the uniform randomness of the salt mattered that much, this wouldn’t be a tolerable condition.
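As a small aside, here is what that default looks like in Python. Because HMAC zero-pads short keys up to the block size, an empty salt and a salt of 32 zero bytes give the same PRK. The IKM below is a placeholder value.

```python
import hashlib
import hmac

ikm = b"placeholder input keying material"

# HKDF-Extract with the default salt (32 zero bytes for SHA-256)...
prk_zeros = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()

# ...matches HMAC with an empty key, since both are padded to 64 zero bytes.
prk_empty = hmac.new(b"", ikm, hashlib.sha256).digest()
```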
https://soatok.blog/2021/11/17/understanding-hkdf/
As we look upon the sunset of a remarkably tiresome year, I thought it would be appropriate to talk about cryptographic wear-out.
What is cryptographic wear-out?
It’s the threshold when you’ve used the same key to encrypt so much data that you should probably switch to a new key before you encrypt any more. Otherwise, you might let someone capable of observing all your encrypted data perform interesting attacks that compromise the security of the data you’ve encrypted.
My definitions always aim to be more understandable than pedantically correct.
(Art by Swizz)
The exact value of the threshold varies depending on how exactly you’re encrypting data (n.b. AEAD modes, block ciphers + cipher modes, etc. each have different wear-out thresholds due to their composition).
Let’s take a look at the wear-out limits of the more popular symmetric encryption methods, and calculate those limits ourselves.
Specific Ciphers and Modes
(Art by Khia. Poorly edited by the author.)
Cryptographic Limits for AES-GCM
I’ve written about AES-GCM before (and why I think it sucks).
AES-GCM is a construction that combines AES-CTR with an authenticator called GMAC, whose consumption of nonces looks something like this:
- Calculating H (used in GHASH for all messages encrypted under the same key, regardless of nonce):
  Encrypt(00000000 00000000 00000000 00000000)
- Calculating J0 (the pre-counter block):
  - If the nonce is 96 bits long:
    NNNNNNNN NNNNNNNN NNNNNNNN 00000001
    where the N spaces represent the nonce hexits.
  - Otherwise (with len(nonce) measured in bits):
    s = 128 * ceil(len(nonce)/128) - len(nonce)
    J0 = GHASH(H, nonce || zero(s+64) || int2bytes(len(nonce)))
- Each block of data encrypted uses J0 + block counter (starting at 1) as a CTR nonce.
- J0 is additionally used as the nonce to calculate the final GMAC tag.
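The counter-block bookkeeping above can be sketched in a few lines of Python. This only covers the parts that need no AES: J0 for 96-bit nonces, the padding length s, the byte string that would be fed to GHASH for other nonce lengths, and the per-block CTR input. The function names are mine; H and GHASH themselves are out of scope here.

```python
def j0_for_96_bit_nonce(nonce: bytes) -> bytes:
    """Pre-counter block for a 96-bit nonce: nonce || 00000001."""
    if len(nonce) != 12:
        raise ValueError("expected a 96-bit (12-byte) nonce")
    return nonce + (1).to_bytes(4, "big")


def padding_bits(nonce_bit_len: int) -> int:
    """s = 128 * ceil(len / 128) - len, with all lengths in bits."""
    return 128 * -(-nonce_bit_len // 128) - nonce_bit_len


def ghash_input_for_other_nonce(nonce: bytes) -> bytes:
    """nonce || zero(s + 64) || 64-bit big-endian bit length of the nonce."""
    bit_len = 8 * len(nonce)
    zeros = bytes((padding_bits(bit_len) + 64) // 8)
    return nonce + zeros + bit_len.to_bytes(8, "big")


def ctr_block(j0: bytes, i: int) -> bytes:
    """CTR input for the i-th 16-byte block (i starts at 1).

    Only the last 32 bits of J0 are incremented, wrapping modulo 2**32.
    """
    prefix, counter = j0[:12], int.from_bytes(j0[12:], "big")
    return prefix + ((counter + i) % 2**32).to_bytes(4, "big")
```

The GHASH input always comes out as a whole number of 16-byte blocks, which is the point of the s padding.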
AES-GCM is one of the algorithms where it’s easy to separately calculate the safety limits per message (i.e. for a given nonce and key), as well as for all messages under a key.
AES-GCM Single Message Length Limits
In the simplest case (nonce is 96 bits), you end up with the following nonces consumed:
- For each key:
  00000000 00000000 00000000 00000000
- For each (nonce, key) pair:
  - NNNNNNNN NNNNNNNN NNNNNNNN 00000001 for J0
  - NNNNNNNN NNNNNNNN NNNNNNNN 00000002 for encrypting the first 16 bytes of plaintext
  - NNNNNNNN NNNNNNNN NNNNNNNN 00000003 for the next 16 bytes of plaintext
  - …
  - NNNNNNNN NNNNNNNN NNNNNNNN FFFFFFFF for the final 16 bytes of plaintext.
From here, it’s pretty easy to see that you can encrypt the blocks from 00000002 to FFFFFFFF without overflowing and creating a nonce reuse. This means that each (key, nonce) can be used to encrypt a single message up to $2^{32} - 2$ blocks of the underlying cipher.
Since the block size of AES is 16 bytes, this means the maximum length of a single AES-GCM (key, nonce) pair is $2^{36} - 32$ bytes (or 68,719,476,704 bytes). This is approximately 68 GB or 64 GiB.
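If you'd rather check that arithmetic than take my word for it, this Python snippet counts the usable counter values and converts them to bytes. The variable names are mine.

```python
AES_BLOCK_BYTES = 16

first_counter = 0x00000002  # first counter value used for plaintext
last_counter = 0xFFFFFFFF   # last value before the 32-bit counter wraps

max_blocks = last_counter - first_counter + 1
max_bytes = max_blocks * AES_BLOCK_BYTES

assert max_blocks == 2**32 - 2
assert max_bytes == 2**36 - 32

print(max_bytes / 2**30)  # just under 64 (GiB)
```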
Things get a bit trickier to analyze when the nonce is not 96 bits, since it’s hashed.
The disadvantage of this hashing behavior is that it’s possible for two different nonces to produce overlapping ranges of AES-CTR output, which makes the security analysis very difficult.
However, this hashed output is also hidden from network observers since they do not know the value of H. Without some method of reliably detecting when you have an overlapping range of hidden block counters, you can’t exploit this.
(If you want to live dangerously and motivate cryptanalysis research, mix 96-bit and non-96-bit nonces with the same key in a system that does something valuable.)
Multi-Message AES-GCM Key Wear-Out
Now that we’ve established the maximum length for a single message, how many messages you can safely encrypt under a given AES-GCM key depends entirely on how your nonce is selected.
If you have a reliable counter, which is guaranteed to never repeat, and start it at 0, you can theoretically encrypt $2^{96}$ messages safely.
Hooray!
(Art by Swizz)
You probably don’t have a reliable counter, especially in real-world settings (distributed systems, multi-threaded applications, virtual machines that might be snapshotted and restored, etc.).
Confound you, technical limitations!
(Art by Swizz)
Additionally (thanks to 2adic for the expedient correction), you cannot safely encrypt more than $2^{64}$ blocks with AES because the keystream blocks–as the output of a block cipher–cannot repeat.
Most systems that cannot guarantee unique incrementing nonces simply generate nonces with a cryptographically secure random number generator. This is a good idea, but no matter how high quality your random number generator is, random functions will produce collisions with a discrete probability.
If you have $2^{N}$ possible values, you should expect a single collision (with 50% probability) after $2^{N/2}$ (or $\sqrt{2^{N}}$) samples. This is called the birthday bound.
However, 50% of a nonce reuse isn’t exactly a comfortable safety threshold for most systems (especially since nonce reuse will cause AES-GCM to become vulnerable to active attackers). 1 in 4 billion is a much more comfortable safety margin against nonce reuse via collisions than 1 in 2. Fortunately, you can calculate the discrete probability of a birthday collision pretty easily.
If you want to re-key before your collision probability exceeds $2^{-32}$ (for a random nonce between 0 and $2^{96} - 1$), you simply need to re-key after $2^{32}$ messages.
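Here is one way to sanity-check that threshold in Python. It uses the standard union-bound estimate for a birthday collision, q(q - 1) / (2 * 2^n), which slightly overestimates the true probability. The function name is mine, and the 1-in-4-billion target is the margin discussed above.

```python
from fractions import Fraction


def collision_bound(messages: int, nonce_bits: int) -> Fraction:
    """Upper bound on the chance that any two random nonces collide."""
    pairs = messages * (messages - 1) // 2
    return Fraction(pairs, 2**nonce_bits)


target = Fraction(1, 2**32)  # "1 in 4 billion"

# 2**32 messages with 96-bit random nonces stays under the target...
assert collision_bound(2**32, 96) < target

# ...but doubling the message count does not.
assert collision_bound(2**33, 96) > target
```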
AES-GCM Safety Limits
- Maximum message length: $2^{36} - 32$ bytes
- Maximum number of messages (random nonce): $2^{32}$
- Maximum number of messages (sequential nonce): $2^{96}$ (but you probably don’t have this luxury in the real world)
- Maximum data safely encrypted under a single key with a random nonce: about $2^{68}$ bytes
Not bad, but we can do better.
(Art by Khia.)
Cryptographic Limits for ChaCha20-Poly1305
The IETF version of ChaCha20-Poly1305 uses 96-bit nonces and 32-bit internal counters. A similar analysis follows from AES-GCM’s, with a few notable exceptions.
For starters, the one-time Poly1305 key is derived from the first 32 bytes of the ChaCha20 keystream output (block 0) for a given (nonce, key) pair. There is no equivalent to AES-GCM’s H parameter which is static for each key. (The ChaCha20 encryption begins using block 1.)
Additionally, each block for ChaCha20 is 512 bits, unlike AES’s 128 bits. So the message limit here is a little more forgiving.
Since the block size is 512 bits (or 64 bytes), and only one block is consumed for Poly1305 key derivation, we can calculate a message length limit of $(2^{32} - 1) \times 64$, or 274,877,906,880 bytes–nearly 256 GiB for each (nonce, key) pair.
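The same kind of check works here. This Python snippet assumes a 32-bit block counter where block 0 is reserved for the Poly1305 key, as described above; the names are mine.

```python
CHACHA_BLOCK_BYTES = 64

total_blocks = 2**32            # counter values 0 through 2**32 - 1
usable_blocks = total_blocks - 1  # block 0 derives the one-time Poly1305 key
max_bytes = usable_blocks * CHACHA_BLOCK_BYTES

assert max_bytes == 2**38 - 64
print(max_bytes)  # 274877906880
```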
The same rules for handling 96-bit nonces apply as with AES-GCM, so we can carry that value forward.
ChaCha20-Poly1305 Safety Limits
- Maximum message length: $2^{38} - 64$ bytes
- Maximum number of messages (random nonce): $2^{32}$
- Maximum number of messages (sequential nonce): $2^{96}$ (but you probably don’t have this luxury in the real world)
- Maximum data safely encrypted under a single key with a random nonce: about $2^{70}$ bytes
A significant improvement, but still practically limited.
(Art by Khia.)
Cryptographic Limits for XChaCha20-Poly1305
XChaCha20-Poly1305 is a variant of XSalsa20-Poly1305 (as used in libsodium) and the IETF’s ChaCha20-Poly1305 construction. It features 192-bit nonces and 32-bit internal counters.
XChaCha20-Poly1305 is instantiated by using HChaCha20 of the key over the first 128 bits of the nonce to produce a subkey, which is used with the remaining nonce bits using the aforementioned ChaCha20-Poly1305.
This doesn’t change the maximum message length, but it does change the number of messages you can safely encrypt (since you’re actually using up to $2^{128}$ distinct keys).
Thus, even if you manage to repeat the final ChaCha20-Poly1305 nonce, as long as the total nonce differs, each encryption will be performed with a distinct key (thanks to the HChaCha20 key derivation; see the XSalsa20 paper and IETF RFC draft for details).
UPDATE (2021-04-15): It turns out, my read of the libsodium implementation was erroneous due to endian-ness. The maximum message length for XChaCha20-Poly1305 is blocks, and for AEAD_XChaCha20_Poly1305 is blocks. Each block is 64 bytes, so that changes the maximum message length to about . This doesn’t change the extended-nonce details, just the underlying ChaCha usage.
XChaCha20-Poly1305 Safety Limits
- Maximum message length: bytes (earlier version of this document said )
- Maximum number of messages (random nonce):
- Maximum number of messages (sequential nonce): (but you probably don’t have this luxury in the real world)
- Maximum data safely encrypted under a single key with a random nonce: about bytes
I can ~~see~~ encrypt forever, man.
(Art by Khia.)
Cryptographic Limits for AES-CBC
It’s tempting to compare non-AEAD constructions and block cipher modes such as CBC (Cipher Block Chaining), but they’re totally different monsters.
- AEAD ciphers have a clean delineation between message length limit and the message quantity limit
- CBC and other cipher modes do not have this separation
Every time you encrypt a block with AES-CBC, you are depleting from a universal bucket that affects the birthday bound security of encrypting more messages under that key. (And unlike AES-GCM with long nonces, AES-CBC’s IV is public.)
This is in addition to the operational requirements of AES-CBC (plaintext padding, initialization vectors that never repeat and must be unpredictable, separate message authentication since CBC doesn’t provide integrity and is vulnerable to chosen-ciphertext attacks, etc.).
My canned response to most queries about AES-CBC.
(Art by Khia.)
For this reason, most cryptographers don’t even bother calculating the safety limit for AES-CBC in the same breath as discussing AES-GCM. And they’re right to do so!
If you find yourself using AES-CBC (or AES-CTR, for that matter), you’d best be performing a separate HMAC-SHA256 over the ciphertext (and verifying this HMAC with a secure comparison function before decrypting). Additionally, you should consider using an extended nonce construction to split one-time encryption and authentication keys.
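The "verify before decrypting" part is the bit people tend to get wrong, so here is a small Python sketch of it. The decrypt argument is a stand-in for whatever AES-CBC or AES-CTR routine you already have (Python's standard library doesn't ship one), and I'm assuming a fixed 16-byte IV so that concatenating it with the ciphertext is unambiguous.

```python
import hashlib
import hmac
from typing import Callable


def authenticate(auth_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA256 over the IV and ciphertext (encrypt-then-MAC)."""
    return hmac.new(auth_key, iv + ciphertext, hashlib.sha256).digest()


def verify_then_decrypt(
    auth_key: bytes,
    iv: bytes,
    ciphertext: bytes,
    tag: bytes,
    decrypt: Callable[[bytes, bytes], bytes],  # hypothetical: (iv, ciphertext) -> plaintext
) -> bytes:
    expected = authenticate(auth_key, iv, ciphertext)
    # Constant-time comparison; never use == on MACs.
    if not hmac.compare_digest(expected, tag):
        raise ValueError("invalid authentication tag")
    # Only touch the ciphertext once the tag has checked out.
    return decrypt(iv, ciphertext)
```

The authentication key should be separate from the encryption key, which is where the key-splitting discussion comes back in.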
(Art by Riley.)
However, for the sake of completeness, let’s figure out what our practical limits are.
CBC operates on entire blocks of plaintext, whether you need the entire block or not.
On encryption, the output of the previous block is mixed (using XOR) with the current block, then encrypted with the block cipher. For the first block, the IV is used in the place of a “previous” block. (Hence, its requirements to be non-repeating and unpredictable.)
This means you can informally model (IV xor PlaintextBlock) and (PBn xor PBn+1) as a pseudo-random function, before it’s encrypted with the block cipher.
If those words don’t mean anything to you, here’s the kicker: You can use the above discussion about birthday bounds to calculate the upper safety bounds for the total number of blocks encrypted under a single AES key (assuming IVs are generated from a secure random source).
If you’re okay with a 50% probability of a collision, you should re-key after $2^{64}$ blocks have been encrypted.
https://www.youtube.com/watch?v=v0IsYNDMV7A
If your safety margin is closer to the 1 in 4 billion (as with AES-GCM), you want to re-key after $2^{48}$ blocks.
However, blocks encrypted doesn’t map neatly to bytes encrypted.
If your plaintext is always an even multiple of 128 bits (or 16 bytes), this allows for up to $2^{52}$ bytes of plaintext. If you’re using PKCS#7 padding, keep in mind that this will include an entire padding block per message, so your safety margin will deplete a bit faster (depending on how many individual messages you encrypt, and therefore how many padding blocks you need).
On the other extreme (1-byte plaintexts), you’ll only be able to eke out $2^{48}$ encrypted bytes before you should re-key.
Therefore, to stay within the safety margin of AES-CBC, you SHOULD re-key after $2^{48}$ blocks (including padding) have been encrypted.
Keep in mind: $2^{48}$ single-byte blocks is still approximately 281 TB (256 TiB) of data (including padding). On the upper end, 15-byte blocks (with 1-byte padding to stay within a block) clocks in at about $15 \times 2^{48}$ bytes, or about 4.22 PB (3.75 PiB) of data.
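To play with these numbers yourself, here is a rough Python calculator. It assumes PKCS#7 padding (which always adds between 1 and 16 bytes), a budget of 2**48 blocks per key (the 1-in-4-billion margin for a 128-bit block), and that every message has the same plaintext length. The names are mine.

```python
AES_BLOCK_BYTES = 16
REKEY_AFTER_BLOCKS = 2**48  # assumed budget: 1-in-4-billion collision margin


def padded_blocks(plaintext_len: int) -> int:
    """Blocks consumed by one message once PKCS#7 padding is applied."""
    return plaintext_len // AES_BLOCK_BYTES + 1


def plaintext_budget(plaintext_len: int) -> int:
    """Total plaintext bytes before re-keying, for equally sized messages."""
    messages = REKEY_AFTER_BLOCKS // padded_blocks(plaintext_len)
    return messages * plaintext_len


print(plaintext_budget(1) / 10**12)   # about 281 (TB)
print(plaintext_budget(15) / 10**15)  # about 4.22 (PB)
```

A 16-byte plaintext is the awkward case: padded_blocks(16) is 2, so half of the block budget goes to padding.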
That’s Blocks. What About Bytes?
The actual plaintext byte limit sans padding is a bit fuzzy and context-dependent.
The local extremum occurs if your plaintext is always 16 bytes (and thus requires an extra 16 bytes of padding). Any less, and the padding fits within one block. Any more, and the data:padding ratio starts to dominate.
Therefore, the worst case scenario with padding is that you take the above safety limit for block counts, and cut it in half. Cutting a number in half means reducing the exponent by 1.
But this still doesn’t eliminate the variance. $2^{47}$ blocks could be anywhere from $2^{47}$ to $2^{51}$ bytes of real plaintext. When in situations like this, we have to assume the worst (n.b. take the most conservative value).
Therefore…
AES-CBC Safety Limits
- Maximum data safely encrypted under a single key with a random nonce: $2^{47}$ bytes (approximately 141 TB, or 128 TiB)
Yet another reason to dislike non-AEAD ciphers.
(Art by Khia.)
Take-Away
Compared to AES-CBC, AES-GCM gives you approximately a million times as much usage out of the same key, for the same threat profile.
ChaCha20-Poly1305 and XChaCha20-Poly1305 provide even greater allowances for encrypting data under the same key. The latter is even safe to use to encrypt arbitrarily large volumes of data under a single key without having to worry about ever practically hitting the birthday bound.
I’m aware that this blog post could have simply been a comparison table and a few footnotes (or even an IETF RFC draft), but I thought it would be more fun to explain how these values are derived from the cipher constructions.
(Art by Khia.)
https://soatok.blog/2020/12/24/cryptographic-wear-out-for-symmetric-encryption/
Programmers don’t understand hash functions, and I can demonstrate this to most of the people that will read this with a single observation:
When you saw the words “hash function” in the title, you might have assumed this was going to be a blog post about password storage. (Passwords are the most common knee-jerk reaction I get to any discussion about hash functions, anyway. A little bit of security knowledge can be very dangerous.)
I don’t blame software developers for their lack of understanding on anything I’m going to discuss. The term “hash function” can accurately represent any of the following disparate topics in computer science:
- Converting keys into memory addresses for hash maps
- Creating a one-way message digest to ensure said message was transmitted correctly over an untrusted network
- Cryptographic building blocks for digital signatures, message authentication codes, key derivation functions, and so on
- Server-side password storage techniques meant to resist brute force and precomputation attacks
- Perceptual fingerprints of visual data
Some hash functions are reusable across multiple topics from the above list, but even then, not all of the hash functions you’d use for one purpose have the same properties.
Using a hash function for the wrong purpose, or in a place where it doesn’t provide the expected properties, can lead to security vulnerabilities. Some of these vulnerabilities aren’t obvious or straightforward, either, which only serves to magnify confusion.
So let’s talk about hash functions, what programmers get wrong about them, and the correct answers to these common misconceptions.
(Credit: ScruffKerfluff)
What Are Hash Functions?
A hash function is any function that can map an arbitrary-sized input to a fixed-size output.
This is incredibly vague; anything can be a hash function if you’re brave enough. You can treat an arbitrary string as a big integer and return the least significant bit and call this a hash function. (It’s not a particularly good one, but it fits the definition.)
function totallyHashed(input: string|Buffer): Buffer {
    const inBuf = Buffer.isBuffer(input) ? input : Buffer.from(input);
    return Buffer.from([ inBuf[inBuf.length - 1] & 1 ]);
}
Being able to call something a hash function and be technically correct isn’t very helpful.
Credit: circuitslime
What Kinds of Hash Functions Do We Care About?
There are different types of hash functions suitable for solving different types of problems. (Some examples follow, but this should not be taken as an exhaustive treatment.)
General-Purpose Hash Functions
General-purpose hash functions are useful for converting a key into an index for memory addresses when constructing hash tables. Usually when someone says only “hash function” in a broad computer science context, without any qualifiers, that’s what they’re talking about.
Examples: SipHash, djb2, Murmur3. You can find a comparison here.
Worth calling out: Only some of these general-purpose hash functions are safe to use for hash tables where the keys are provided from user input (for example, JSON data). When in doubt, SipHash-2-4 is a good default.
Cryptographic Hash Functions
Cryptographic hash functions have additional desirable properties (they’re non-invertible and must be resistant to collision attacks and preimage attacks) above general-purpose hash functions. They also have larger output sizes (typically at least 256 bits) than the sort of hash functions you’d use for hash tables. Consequently, they’re slower than the hash functions people tend to use for hash tables.
Cryptographic hash functions are often used in place of a random oracle in security protocols, because actual random oracles do not exist, as Thomas Pornin explains:
A random oracle is described by the following model:
- There is a black box. In the box lives a gnome, with a big book and some dice.
- We can input some data into the box (an arbitrary sequence of bits).
- Given some input that he did not see beforehand, the gnome uses his dice to generate a new output, uniformly and randomly, in some conventional space (the space of oracle outputs). The gnome also writes down the input and the newly generated output in his book.
- If given an already seen input, the gnome uses his book to recover the output he returned the last time, and returns it again.
So a random oracle is like a kind of hash function, such that we know nothing about the output we could get for a given input message m. This is a useful tool for security proofs because they allow to express the attack effort in terms of number of invocations to the oracle.
The problem with random oracles is that it turns out to be very difficult to build a really “random” oracle. First, there is no proof that a random oracle can really exist without using a gnome. Then, we can look at what we have as candidates: hash functions. A secure hash function is meant to be resilient to collisions, preimages and second preimages. These properties do not imply that the function is a random oracle.
Thomas Pornin
I’m intentionally eschewing a lot of detail about what makes a cryptographic hash function secure (e.g. bit diffusion even in reduced rounds), or how they achieve the desirable properties.
If you’re interested in those topics, leave a comment below and I’ll talk about that in a future post.
If you remember nothing else about cryptographic hash functions, just know that checksums (e.g. CRC32) are not cryptographic. (Yes, many hashing APIs include CRC32 alongside good options, but don’t be misled.)
Note: There’s a very pedantic mathematical discussion that can be had about whether or not cryptographic hash functions are truly one-way functions (which, like P = NP vs P != NP, is an open conjecture in mathematics). You don’t have to know, or even care, about this distinction–unless you’re making assumptions about this property in your designs.
Examples: BLAKE3, SHA256.
Art: Lynx vs Jackalope
A Word Of Caution on Message Authentication Codes
In cryptography, message authentication codes are often built with cryptographic hash functions, but not always!
AES-GCM famously uses a function called GHASH, which evaluates a polynomial in $GF(2^{128})$, rather than a cryptographic hash function. This provides a great speed boost, but fails to provide collision resistance, leading to interesting results.
Poly1305 is a similar story to GCM, but doesn’t reuse keys the same way.
Although GHASH and Poly1305 are secure MACs, they’re not built from cryptographic hash functions.
CBC-MAC uses a block cipher (typically AES) in cipher-block chaining mode (with an all-zero initialization vector) and outputs the last block as the authentication tag. This offers no collision resistance (as Mega learned the hard way).
When in doubt, HMAC is a safe choice.
Password Hashing Functions
Separate from, but often built from, cryptographic hash functions are password hashing functions. These are the tools you use if your users are sending a username and password over HTTP and you don’t want to become the next RockYou.
Examples: Argon2, scrypt, bcrypt.
Password hashing functions have significant overlap with key-derivation functions (KDFs), but not all KDFs are meant for password hashing, and not all password hashes are meant for key derivation.
It’s perfectly reasonable to use bcrypt to store passwords, but not to derive encryption keys.
Conversely, HKDF is a valid KDF (which has a stricter set of security requirements than a PRF), but it’s not meant for passwords.
Some algorithms (Argon2 and scrypt) can be safely used as a password hashing function or as a KDF.
Modern password hashing algorithms involve a deliberately expensive computation that’s fast to verify once, but expensive to verify multiple times. These algorithms can be tuned (memory usage, parallel threads, number of iterations) to target a specific latency goal. Additionally, most password hashing APIs take care of salt generation for you, using a secure random generator.
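As a minimal sketch of what that looks like in practice, here is salted password hashing and verification with scrypt via Node's built-in crypto module. The function names and the "salt:hash" storage format are hypothetical choices for this example, and the cost parameters are simply Node's defaults, not tuned recommendations.

```typescript
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

// Hypothetical storage format for this sketch: "<salt hex>:<hash hex>".
const KEY_LENGTH = 32;

function hashPassword(password: string): string {
  // A fresh random salt per password defeats precomputed tables.
  const salt = randomBytes(16);
  // Node's default scrypt cost parameters; tune them for your latency goal.
  const hash = scryptSync(password, salt, KEY_LENGTH);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

function verifyPassword(password: string, stored: string): boolean {
  const [saltHex, hashHex] = stored.split(":");
  if (!saltHex || !hashHex) return false;
  const expected = Buffer.from(hashHex, "hex");
  if (expected.length !== KEY_LENGTH) return false;
  const actual = scryptSync(password, Buffer.from(saltHex, "hex"), KEY_LENGTH);
  // Constant-time comparison avoids leaking how many bytes matched.
  return timingSafeEqual(actual, expected);
}
```

Most real applications should reach for their framework's password hashing API instead of rolling this by hand; the sketch only shows the moving parts (random salt, expensive computation, constant-time comparison).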
Perceptual Hashes
In yet another different direction, you have perceptual hash functions, such as the kind Apple is going to use to violate the privacy of their customers in a desperate attempt to catch a small percentage of depraved individuals peddling CSAM (and expecting the rest of their customers to blindly trust that this capability they built for themselves will never ever be used to stifle dissidents or whistleblowers).
https://twitter.com/matthew_d_green/status/1428414174758117389
I don’t have a lot to say here, except that I don’t trust Apple, especially after the internal memo about “screeching voices of the minority”: Their partners on this project have shown the utmost contempt for privacy activists, LGBTQ+ folks, survivors of child sexual abuse, etc. and they remain committed to them. Fuck Apple.
Suffice to say, cryptographers were not at all surprised by the discovery of practical collisions against Apple’s new perceptual hash function, because perceptual hash functions do not provide the same properties as cryptographic hash functions.
Perceptual hashes of CSAM do not provide collision or preimage resistance, and it would be possible to flood Apple with false positives if a hash of such material were to ever leak publicly. (Maybe an enterprising Internet Troll will one day make a meme generator that does this?)
How Developers Misuse Hash Functions
There are far too many ways that hash functions get misused by software developers to recount in one blog post.
Some of the more common and obvious examples (i.e., using MD5 to store passwords, calling SHA256 “encryption”) aren’t very interesting, and are covered elsewhere, so I’m going to focus on misuse examples that aren’t commonly discussed online.
Encrypt and Hash
A common design pattern from the 2010s is to hash some data, then encrypt the data that was hashed (but not the hash), and then send both values to another machine.
The expectation here is that, upon decrypting the ciphertext, the hash can be used as a client-side fingerprint to ensure the data wasn’t corrupted in storage.
This use of a hash function is distinct from the Encrypt/MAC discussion (see: the Cryptographic Doom Principle), because it’s often implemented alongside AEAD. (If you aren’t using authenticated encryption, correct that first.)
However, there are two problems here:
- Directly invoking cryptographic hash functions doesn’t involve a cryptographic secret, and thus all clients will produce the same hash of the same plaintext.
- A cryptographic hash can be used to perform offline attacks (e.g. rainbow tables) against the plaintext, especially if the input domain is small (e.g. a phone number).
Art: Swizz
What you really want to use in this situation is HMAC with a static secret key (which is only known client-side).
HMAC ensures that, without access to the secret key, precomputation attacks are not possible. Additionally, if each client has a different secret key (which they SHOULD), an attacker who only has access to hashes and ciphertext cannot distinguish which records correspond across multiple clients.
This is a weaker proposition than security against chosen plaintexts (IND-CPA), but still provides a higher level of assurance than a naked SHA256 hash.
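To make the difference concrete, here is a minimal sketch using Node's built-in crypto module. The function names and the sample phone number are made up for illustration; the only point is that the keyed version cannot be precomputed or correlated without the client's key.

```typescript
import { createHash, createHmac, randomBytes } from "node:crypto";

// Naked hash: every client computes the same value, so it can be precomputed.
function nakedFingerprint(plaintext: string): string {
  return createHash("sha256").update(plaintext).digest("hex");
}

// Keyed fingerprint: useless to anyone who lacks clientKey.
function keyedFingerprint(clientKey: Buffer, plaintext: string): string {
  return createHmac("sha256", clientKey).update(plaintext).digest("hex");
}

// Each client generates (and keeps) its own random key.
const aliceKey = randomBytes(32);
const bobKey = randomBytes(32);
const phone = "+1-555-0100";

// Identical for everyone who hashes this phone number:
console.log(nakedFingerprint(phone));
// Different per client, so records cannot be matched across clients:
console.log(keyedFingerprint(aliceKey, phone));
console.log(keyedFingerprint(bobKey, phone));
```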
Searchable Encryption with Hashed Indexes
Similar to the previous example, sometimes developers will be tasked with storing encrypted values in a database while also being able to quickly query the database based on an encrypted value.
The laziest solution to the “encrypted database” use-case is to just use deterministic encryption, such as AES-ECB or with a static IV/nonce. Most people who have some familiarity with cryptography immediately recognize that this is dangerous, so they opt to encrypt securely (i.e. AEAD with random nonces).
However, to support querying, they often just hash their plaintext and store it alongside the ciphertext.
Art: Lynx vs Jackalope
This reintroduces the issues from the previous section (especially rainbow tables), but with additional risks:
- Plaintexts are overwhelmingly likely to have smaller input domains, thereby increasing the utility of hashes to attack the confidentiality of the plaintext.
- No domain separation between different hashes of different encrypted fields.
To address this, there are a few things you can do:
- Truncate your hashes. If you want to frustrate attackers, simply don’t store a full hash of the plaintext. Instead, truncate hashes to 8 or fewer hexits and permit a small degree of false positives in your decryption logic (n.b. by filtering those rows out).
- Use distinct HMAC keys per index. This introduces the solution to the previous section, but also addresses domain separation.
If you’re left wondering, “Can I use both solutions?” The answer is, “Yes, but you just reinvented what CipherSweet calls blind indexes.”
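Combining both ideas might look roughly like the sketch below. To be clear, this is not CipherSweet's actual implementation; `blindIndex` and its parameters are hypothetical names, and a real design also needs key management and a decision about how many bits to keep.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: a truncated, keyed hash usable as a database lookup column.
// indexKey should be a distinct secret per indexed field (domain separation).
function blindIndex(indexKey: Buffer, plaintext: string, hexits = 8): string {
  if (!Number.isInteger(hexits) || hexits < 1 || hexits > 64) {
    throw new RangeError("hexits must be an integer between 1 and 64");
  }
  return createHmac("sha256", indexKey)
    .update(plaintext)
    .digest("hex")
    .slice(0, hexits); // truncation deliberately permits false positives
}
```

You would store `blindIndex(emailIndexKey, email)` next to the AEAD ciphertext, query on that column, then decrypt the candidate rows and discard any that turn out to be false positives.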
Art: Lynx vs Jackalope
Overconfidence With Collision-Resistance
Previously on this blog, I disclosed a trivial collision vulnerability in the Kerl hash function used by the Iota cryptocurrency project without breaking the underlying hash function (Keccak384).
How did I do this? I found multiple input values that, before being passed to the hash function, collide with each other.
Credit: Harubaki
Developers are frequently overconfident about the collision-resistance of their protocol, simply because they use collision-resistant hash functions inside of the protocol. They’re frequently blind to the reality of canonicalization attacks (including the somewhat famous length-extension attack).
This isn’t their fault. If you’ve made this mistake, it isn’t your fault. Cryptography is a difficult topic and requires a lot of experience and expertise to get right.
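A minimal illustration of a canonicalization problem (unrelated to the Kerl bug itself): if a protocol hashes several fields by simply concatenating them, different inputs can feed identical bytes to a perfectly collision-resistant hash. The function names below are hypothetical, and length-prefixing is just one common way to make the encoding unambiguous.

```typescript
import { createHash } from "node:crypto";

// Naive: concatenate the fields and hash. Field boundaries are lost.
function naiveHash(parts: string[]): string {
  return createHash("sha256").update(parts.join("")).digest("hex");
}

// Encode an unsigned 32-bit integer as 4 big-endian bytes.
function u32be(n: number): Buffer {
  const buf = Buffer.alloc(4);
  buf.writeUInt32BE(n);
  return buf;
}

// Safer: prefix the field count and each field's byte length before hashing.
function canonicalHash(parts: string[]): string {
  const h = createHash("sha256");
  h.update(u32be(parts.length));
  for (const part of parts) {
    const bytes = Buffer.from(part, "utf8");
    h.update(u32be(bytes.length));
    h.update(bytes);
  }
  return h.digest("hex");
}
```

With these definitions, `naiveHash(["ab", "c"])` and `naiveHash(["a", "bc"])` are equal even though SHA-256 itself is unbroken, while the length-prefixed version keeps the two inputs distinct.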
Closing Thoughts
One of my favorite cryptography websites is for the SPHINCS project, which is a stateless post-quantum hash-based digital signature algorithm.
On this webpage, which has a heavy anarchist aesthetic, there’s a special note to law-enforcement agents that reads:
The word “state” is a technical term in cryptography. Typical hash-based signature schemes need to record information, called “state”, after every signature. Google’s Adam Langley refers to this as a “huge foot-cannon” from a security perspective. By saying “eliminate the state” we are advocating a security improvement, namely adopting signature schemes that do not need to record information after every signature. We are not talking about eliminating other types of states. We love most states, especially yours! Also, “hash” is another technical term and has nothing to do with cannabis.
Now, I personally like this disclaimer because a) it’s funny and b) it reminds us that all cops are bastards.
Art: Lynx vs Jackalope
But it’s also a good reminder of how confusing the term “hash” can be in different fields. Software engineers aren’t the only people who are likely to be confused about hash functions.
(And I can’t even apply the “-ish” suffix to talk about things that behave like hash functions but aren’t hash functions, because that’s a homograph for an even more specific drug term.)
The next time you see a programmer ask a potentially unwise question involving hash functions, having read this blog post, I hope you’ll appreciate how confusing all this shit can be for virtually everyone.
If you’ve made any of the specific mistakes I’ve discussed here, know that you’re in very good company. Some of the best programmers in the industry have made these mistakes before. Hell, I’ve made these exact mistakes before, and worse.
https://soatok.blog/2021/08/24/programmers-dont-understand-hash-functions/
#cryptographicHashFunction #cryptography #hashFunction #SecurityGuidance
If you’re reading this wondering if you should stop using AES-GCM in some standard protocol (TLS 1.3), the short answer is “No, you’re fine”.
I specialize in secure implementations of cryptography, and my years of experience in this field have led me to dislike AES-GCM.
This post is about why I dislike AES-GCM’s design, not “why AES-GCM is insecure and should be avoided”. AES-GCM is still miles above what most developers reach for when they want to encrypt (e.g. ECB mode or CBC mode). If you want a detailed comparison, read this.
To be clear: This is solely my opinion and not representative of any company or academic institution.
What is AES-GCM?
AES-GCM is an authenticated encryption mode that uses the AES block cipher in counter mode with a polynomial MAC based on Galois field multiplication.
In order to explain why AES-GCM sucks, I have to first explain what I dislike about the AES block cipher. Then, I can describe why I’m filled with sadness every time I see the AES-GCM construction used.
What is AES?
The Advanced Encryption Standard (AES) is a specific subset of a block cipher called Rijndael.
Rijndael’s design is based on a substitution-permutation network, which broke tradition from many block ciphers of its era (including its predecessor, DES) in not using a Feistel network.
AES only includes three flavors of Rijndael: AES-128, AES-192, and AES-256. The difference between these flavors is the size of the key and the number of rounds used, but–and this is often overlooked–not the block size.
As a block cipher, AES always operates on 128-bit (16 byte) blocks of plaintext, regardless of the key size.
This is generally considered acceptable because AES is a secure pseudorandom permutation (PRP), which means that every possible plaintext block maps directly to one ciphertext block, and thus birthday collisions are not possible. (A pseudorandom function (PRF), conversely, does have birthday bound problems.)
Why AES Sucks
Art by Khia.
Side-Channels
The biggest reason why AES sucks is that its design uses a lookup table (called an S-Box) indexed by secret data, which is inherently vulnerable to cache-timing attacks (PDF).
There are workarounds for this AES vulnerability, but they either require hardware acceleration (AES-NI) or a technique called bitslicing.
The short of it is: With AES, you’re either using hardware acceleration, or you have to choose between performance and security. You cannot get fast, constant-time AES without hardware support.
Block Size
AES-128 is considered by experts to have a security level of 128 bits.
Similarly, AES-192 gets certified at 192-bit security, and AES-256 gets 256-bit security.
However, the AES block size is only 128 bits!
That might not sound like a big deal, but it severely limits the constructions you can create out of AES.
Consider the case of AES-CBC, where the output of each block of encryption is combined with the next block of plaintext (using XOR). This is typically used with a random 128-bit block (called the initialization vector, or IV) for the first block.
This means you expect a collision (at 50% probability) after encrypting $2^{64}$ blocks.
When you start getting collisions, you can break CBC mode, as this video demonstrates:
https://www.youtube.com/watch?v=v0IsYNDMV7A
This is significantly smaller than the $2^{128}$ you expect from AES.
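As a rough sketch of where that number comes from, assume CBC ciphertext blocks behave like uniformly random $b$-bit values. The standard birthday approximation for the probability of at least one collision among $n$ blocks is then:

$$
p(n) \approx 1 - e^{-n^{2} / 2^{b+1}}, \qquad p(n) = \tfrac{1}{2} \;\Rightarrow\; n \approx \sqrt{2 \ln 2 \cdot 2^{b}} \approx 1.18 \cdot 2^{b/2}
$$

With $b = 128$ this lands on the order of $2^{64}$ blocks (about $2^{68}$ bytes at 16 bytes per block), regardless of whether the key is 128, 192, or 256 bits long.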
Post-Quantum Security?
With respect to the number of attempts needed to find the correct key, cryptographers estimate that AES-128 will have a post-quantum security level of 64 bits, AES-192 will have a post-quantum security level of 96 bits, and AES-256 will have a post-quantum security level of 128 bits.
This is because Grover’s quantum search algorithm can search $N$ unsorted items in $O(\sqrt{N})$ time, which can be used to reduce the total number of possible secrets from $2^{n}$ to $\sqrt{2^{n}} = 2^{n/2}$. This effectively cuts the security level, expressed in bits, in half.
Note that this heuristic estimate is based on the number of guesses (a time factor), and doesn’t take circuit size into consideration. Grover’s algorithm also doesn’t parallelize well. The real-world security of AES may still be above 100 bits if you consider these nuances.
But remember, even AES-256 operates on 128-bit blocks.
Consequently, for AES-256, there should be approximately $2^{256}$ (plaintext, key) pairs that produce any given ciphertext block (one plaintext for each possible key).
Furthermore, there will be many keys that, for a constant plaintext block, will produce the same ciphertext block despite being a different key entirely. (n.b. This doesn’t mean for all plaintext/ciphertext block pairings, just some arbitrary pairing.)
Concrete example: Encrypting a plaintext block consisting of sixteen NUL bytes will yield a specific 128-bit ciphertext exactly once for each given AES-128 key. However, there are $2^{128}$ times as many AES-256 keys as there are possible plaintext/ciphertexts. Keep this in mind for AES-GCM.
This means it’s conceivable to accidentally construct a protocol that, despite using AES-256 safely, has a post-quantum security level on par with AES-128, which is only 64 bits.
This would not be nearly as much of a problem if AES’s block size was 256 bits.
Real-World Example: Signal
The Signal messaging app is the state-of-the-art for private communications. If you were previously using PGP and email, you should use Signal instead.
Signal aims to provide private communications (text messaging, voice calls) between two mobile devices, piggybacking on your pre-existing contacts list.
Part of their operational requirements is that they must be user-friendly and secure on a wide range of Android devices, stretching all the way back to Android 4.4.
The Signal Protocol uses AES-CBC + HMAC-SHA256 for message encryption. Each message is encrypted with a different AES key (due to the Double Ratchet), which limits the practical blast radius of a cache-timing attack and makes practical exploitation difficult (since you can’t effectively replay decryption in order to leak bits about the key).
Thus, Signal’s message encryption is still secure even in the presence of vulnerable AES implementations.
Hooray for well-engineered protocols managing to actually protect users.
Art by Swizz.
However, the storage service in the Signal App uses AES-GCM, and this key has to be reused in order for the encrypted storage to operate.
This means, for older phones without dedicated hardware support for AES (i.e. low-priced phones from 2013, which Signal aims to support), the risk of cache-timing attacks is still present.
This is unacceptable!
What this means is, a malicious app that can flush the CPU cache and measure timing with sufficient precision can siphon the AES-GCM key used by Signal to encrypt your storage without ever violating the security boundaries enforced by the Android operating system.
As a result of the security boundaries never being crossed, these kinds of side-channel attacks would likely evade forensic analysis, and would therefore be of interest to the malware developers working for nation states.
Of course, if you’re on newer hardware (i.e. Qualcomm Snapdragon 835), you have hardware-accelerated AES available, so it’s probably a moot point.
Why AES-GCM Sucks Even More
AES-GCM is an authenticated encryption mode that also supports additional authenticated data. Cryptographers call these modes AEAD.
AEAD modes are more flexible than simple block ciphers. Generally, your encryption API accepts the following:
- The plaintext message.
- The encryption key.
- A nonce (a number that must only be used once).
- Optional additional data which will be authenticated but not encrypted.
The output of an AEAD function is both the ciphertext and an authentication tag, which is necessary (along with the key and nonce, and optional additional data) to decrypt the plaintext.
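For a sense of what such an API looks like in practice, here is a minimal sketch of AES-256-GCM using Node's built-in crypto module. The `seal` and `openSealed` names and the `Sealed` shape are hypothetical wrappers invented for this example, not part of any library.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical container for everything the recipient needs besides the key and AAD.
interface Sealed {
  nonce: Buffer;
  ciphertext: Buffer;
  tag: Buffer;
}

function seal(key: Buffer, plaintext: Buffer, aad: Buffer): Sealed {
  const nonce = randomBytes(12); // 96-bit nonce; must never repeat under the same key
  const cipher = createCipheriv("aes-256-gcm", key, nonce);
  cipher.setAAD(aad); // authenticated, but not encrypted
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { nonce, ciphertext, tag: cipher.getAuthTag() };
}

function openSealed(key: Buffer, sealed: Sealed, aad: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.nonce);
  decipher.setAAD(aad);
  decipher.setAuthTag(sealed.tag);
  // final() throws if the key, nonce, ciphertext, tag, or AAD do not match.
  return Buffer.concat([decipher.update(sealed.ciphertext), decipher.final()]);
}
```

The caller stores or transmits the nonce, ciphertext, and tag together; the additional data has to be supplied again, unchanged, at decryption time.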
Cryptographers almost universally recommend using AEAD modes for symmetric-key data encryption.
That being said, AES-GCM is possibly my least favorite AEAD, and I’ve got good reasons to dislike it beyond simply, “It uses AES”.
The deeper you look into AES-GCM’s design, the harder you will feel this sticker.
GHASH Brittleness
The way AES-GCM is initialized is stupid: You encrypt an all-zero block with your AES key (in ECB mode) and store it in a variable called $H$. This value is used for authenticating all messages authenticated under that AES key, rather than for a given (key, nonce) pair.
Diagram describing Galois/Counter Mode, taken from Wikipedia.
This is often sold as an advantage: Reusing $H$ allows for better performance. However, it makes GCM brittle: Reusing a nonce allows an attacker to recover $H$ and then forge messages forever. This is called the “forbidden attack”, and led to real world practical breaks.
Let’s contrast AES-GCM with the other AEAD mode supported by TLS: ChaCha20-Poly1305, or ChaPoly for short.
ChaPoly uses one-time message authentication keys (derived from each key/nonce pair). If you manage to leak a Poly1305 key, the impact is limited to the messages encrypted under that (ChaCha20 key, nonce) pair.
While that’s still bad, it isn’t “decrypt all messages under that key forever” bad like with AES-GCM.
Note: “Message Authentication” here is symmetric, which only provides a property called message integrity, not sender authenticity. For the latter, you need asymmetric cryptography (wherein the ability to verify a message doesn’t imply the capability to generate a new signature), which is totally disparate from symmetric algorithms like AES or GHASH. You probably don’t need to care about this nuance right now, but it’s good to know in case you’re quizzed on it later.
H Reuse and Multi-User Security
If you recall, AES operates on 128-bit blocks even when 256-bit keys are used.
If we assume AES is well-behaved, we can deduce that there are approximately $2^{128}$ different 256-bit keys that will map a single plaintext block to a single ciphertext block.
This is trivial to calculate. Simply divide the number of possible keys ($2^{256}$) by the number of possible block states ($2^{128}$) to yield the number of keys that produce a given ciphertext for a single block of plaintext: $2^{256} / 2^{128} = 2^{128}$.
Each key that will map an arbitrarily specific plaintext block to a specific ciphertext block is also separated in the keyspace by approximately $2^{128}$.
This means there are approximately $2^{128}$ independent keys that will map a given all-zero plaintext block to an arbitrarily chosen value of $H$ (if we assume AES doesn’t have weird biases).
Credit: Harubaki
“Why Does This Matter?”
It means that, with keys larger than 128 bits, you can model the selection of $H$ as a 128-bit pseudorandom function, rather than a 128-bit permutation. As a result, you can expect a collision with 50% probability after only $2^{64}$ different keys are selected.
Note: Your 128-bit randomly generated AES keys already have this probability baked into their selection, but this specific analysis doesn’t really apply for 128-bit keys since AES is a PRP, not a PRF, so there is no “collision” risk. However, you end up at the same upper limit either way.
But 50% isn’t good enough for cryptographic security.
In most real-world systems, we target a $2^{-32}$ collision risk. So that means our safety limit is actually $2^{48}$ different AES keys before you have to worry about $H$ reuse.
This isn’t the same thing as symmetric wear-out (where you need to re-key after a given number of encryptions to prevent nonce reuse). Rather, it means after your entire population has exhausted the safety limit of $2^{48}$ different AES keys, you have to either accept the risk or stop using AES-GCM.
If you have a billion users ($2^{30}$), the safety limit is breached after $2^{18}$ AES keys per user (approximately 262,000).
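The arithmetic behind these figures can be sketched with the usual birthday approximation, assuming the 128-bit value of $H$ behaves like a uniformly random output for each of $n$ independently chosen keys:

$$
p \approx \frac{n^{2}}{2^{129}} \le 2^{-32} \;\Rightarrow\; n^{2} \le 2^{97} \;\Rightarrow\; n \lesssim 2^{48.5}
$$

Rounding down gives the $2^{48}$ limit. Spread across $2^{30}$ users, that leaves $2^{48} / 2^{30} = 2^{18} = 262{,}144$ keys per user.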
“What Good is H Reuse for Attackers if HF differs?”
There are two numbers used in AES-GCM that are derived from the AES key. $H$ is used for block multiplication, and $H_F$ (the value of $E_K$ with a counter of 0 from the following diagram) is XORed with the final result to produce the authentication tag.
The arrow highlighted with green is $H_F$.
It’s tempting to think that a reuse of $H$ isn’t a concern because $H_F$ will necessarily be randomized, which prevents an attacker from observing when $H$ collides. It’s certainly true that the single-block collision risk discussed previously for $H$ will almost certainly not also result in a collision for $H_F$. And since $H_F$ isn’t reused unless a nonce is reused (which also leaks $H$ directly), this might seem like a non-issue.
Art by Khia.
However, it’s straightforward to go from a condition of $H$ reuse to an adaptive chosen-ciphertext attack.
- Intercept multiple valid ciphertexts.
  - e.g. Multiple JWTs encrypted with {"alg":"A256GCM"}
- Use your knowledge of $H$, the ciphertext, and the AAD to calculate the GCM tag up to the final XOR. This, along with the existing authentication tag, will tell you the value of $H_F$ for a given nonce.
- Calculate a new authentication tag for a chosen ciphertext using $H_F$ and your candidate $H$ value, then replay it into the target system.
While the blinding offered by XORing the final output with $H_F$ is sufficient to stop $H$ from being leaked directly, the protection is one-way.
Ergo, a collision in $H$ is not sufficiently thwarted by $H_F$.
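Written out, the steps above amount to the following. The notation is an assumption of this sketch: $\mathrm{GHASH}_H(A, C)$ denotes the polynomial hash of the AAD $A$ and ciphertext $C$ under $H$, and $T$ is the authentication tag.

$$
\begin{aligned}
T &= \mathrm{GHASH}_{H}(A, C) \oplus H_F \\
H_F &= T \oplus \mathrm{GHASH}_{H}(A, C) \\
T' &= \mathrm{GHASH}_{H}(A', C') \oplus H_F
\end{aligned}
$$

The first line is how the tag is formed; the second recovers $H_F$ for that nonce from an intercepted message once $H$ is known (or guessed); the third produces a tag for a chosen $(A', C')$ under the same nonce. The XOR hides $H$ from someone who only sees $T$, but it does nothing once a candidate $H$ is in hand.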
“How Could the Designers Have Prevented This?”
The core issue here is the AES block size, again.
If we were analyzing a 256-bit block variant of AES, and a congruent GCM construction built atop it, none of what I wrote in this section would apply.
However, the 128-bit block size was a design constraint enforced by NIST in the AES competition. This block size was chosen during an era of 64-bit block ciphers (e.g. Triple-DES and Blowfish), so it was a significant improvement at the time.
NIST’s AES competition also inherited from the US government’s tradition of thinking in terms of “security levels”, which is why there are three different permitted key sizes (128, 192, or 256 bits).
“Why Isn’t This a Vulnerability?”
There’s always a significant gap in security, wherein something isn’t safe to recommend, but also isn’t susceptible to a known practical attack. This gap is important to keep systems secure, even when they aren’t on the bleeding edge of security.
Using 1024-bit RSA is a good example of this: No one has yet, to my knowledge, successfully factored a 1024-bit RSA public key. However, most systems have recommended a minimum 2048-bit for years (and many recommend 3072-bit or 4096-bit today).
With AES-GCM, the expected distance between collisions in $H$ is $2^{64}$, and finding an untargeted collision requires being able to observe more than $2^{64}$ different sessions, and somehow distinguish when $H$ collides.
As a user, you know that after $2^{48}$ different keys, you’ve crossed the safety boundary for avoiding collisions. But as an attacker, you need $2^{64}$ bites at the apple, not $2^{48}$. Additionally, you need some sort of oracle or distinguisher for when this happens.
We don’t have that kind of distinguisher available to us today. And even if we had one available, the amount of data you need to search in order for any two users in the population to reuse/collide $H$ is challenging to work with. You would need the computational and data storage resources of a major cloud service provider to even think about pulling the attack off.
Naturally, this isn’t a practical vulnerability. This is just another gripe I have with AES-GCM, as someone who has to work with cryptographic algorithms a lot.
Short Nonces
Although the AES block size is 16 bytes, AES-GCM nonces are only 12 bytes. The remaining 4 bytes of the block are dedicated to an internal counter, which is used with AES in Counter Mode to actually encrypt/decrypt messages.
(Yes, you can use arbitrary length nonces with AES-GCM, but if you use nonces longer than 12 bytes, they get hashed down to a single 16-byte counter block anyway, so it’s not a detail most people should concern themselves with.)
If you ask a cryptographer, “How much can I encrypt safely with AES-GCM?” you’ll get two different answers.
- Message Length Limit: AES-GCM can be used to encrypt messages up to $2^{36} - 32$ bytes long, under a given (key, nonce) pair.
- Number of Messages Limit: If you generate your nonces randomly, you have a 50% chance of a nonce collision after $2^{48}$ messages.
However, 50% isn’t conservative enough for most systems, so the safety margin is usually much lower. Cryptographers generally set the key wear-out of AES-GCM at $2^{32}$ random nonces, which represents a collision probability of one in 4 billion.
These limits are acceptable for session keys for encryption-in-transit, but they impose serious operational limits on application-layer encryption with long-term keys.
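Both message-count figures fall out of the same birthday approximation, $p \approx n^{2} / 2^{b+1}$ for $n$ random $b$-bit nonces. The helper below is a hypothetical name for this sketch, and it works in exponents (log base 2) so no big-number library is needed.

```typescript
// Returns e such that roughly 2^e random nonces of `nonceBits` bits can be
// generated before the collision probability exceeds 2^-riskBits.
// Derived from p ≈ n^2 / 2^(nonceBits + 1), solved for log2(n) and rounded down.
function log2MaxRandomNonces(nonceBits: number, riskBits: number): number {
  if (nonceBits <= 0 || riskBits < 0) {
    throw new RangeError("nonceBits must be positive and riskBits non-negative");
  }
  return Math.floor((nonceBits + 1 - riskBits) / 2);
}

// 96-bit AES-GCM nonces at a 50% (2^-1) collision risk, then at a 2^-32 risk.
console.log(log2MaxRandomNonces(96, 1));
console.log(log2MaxRandomNonces(96, 32));
```

The first call yields the exponent for the 50% bound and the second yields the exponent for the conservative wear-out limit quoted above.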
Random Key Robustness
Before the advent of AEAD modes, cryptographers used to combine block cipher modes of operation (e.g. AES-CBC, AES-CTR) with a separate message authentication code algorithm (e.g. HMAC, CBC-MAC).
You had to be careful in how you composed your protocol, lest you invite Cryptographic Doom into your life. A lot of developers screwed this up. Standardized AEAD modes promised to make life easier.
Many developers gained their intuition for authenticated encryption modes from protocols like Signal’s (which combines AES-CBC with HMAC-SHA256), and would expect AES-GCM to be a drop-in replacement.
Unfortunately, GMAC doesn’t offer the same security benefits as HMAC: Finding a different (ciphertext, HMAC key) pair that produces the same authentication tag is a hard problem, due to HMAC’s reliance on cryptographic hash functions. This makes HMAC-based constructions “message committing”, which instills Random Key Robustness.
Critically, AES-GCM doesn’t have this property. You can calculate a random (ciphertext, key) pair that collides with a given authentication tag very easily.
This fact prohibits AES-GCM from being considered for use with OPAQUE (which requires RKR), one of the upcoming password-authenticated key exchange algorithms. (Read more about them here.)
Better-Designed Algorithms
You might be thinking, “Okay random furry, if you hate AES-GCM so much, what would you propose we use instead?”
I’m glad you asked!
XChaCha20-Poly1305
For encrypting messages under a long-term key, you can’t really beat XChaCha20-Poly1305.
- ChaCha is a stream cipher based on a 512-bit ARX hash function in counter mode. ChaCha doesn’t use S-Boxes. It’s fast and constant-time without hardware acceleration.
- ChaCha20 is ChaCha with 20 rounds.
- XChaCha nonces are 24 bytes, which allows you to generate them randomly and not worry about a birthday collision until about 2^80 messages (for the same collision probability as AES-GCM).
- Poly1305 uses a different 256-bit key for each (nonce, key) pair and is easier to implement in constant-time than AES-GCM.
- XChaCha20-Poly1305 uses the first 16 bytes of the nonce and the 256-bit key to generate a distinct subkey, and then employs the standard ChaCha20-Poly1305 construction used in TLS today.
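If you want to see where the gap between 12-byte and 24-byte random nonces comes from, the following Python sketch inverts the birthday approximation. The helper name and the 2^-32 target probability are assumptions chosen for illustration.

```python
def log2_message_budget(nonce_bits: int, log2_max_probability: float) -> float:
    """Solve n^2 / 2N = p for n, in log2: log2(n) = (nonce_bits + 1 + log2(p)) / 2."""
    return (nonce_bits + 1 + log2_max_probability) / 2


# Same tolerated collision probability (2^-32) for both nonce sizes.
aes_gcm = log2_message_budget(96, -32)    # 12-byte random nonces
xchacha = log2_message_budget(192, -32)   # 24-byte random nonces

print(f"AES-GCM: about 2^{aes_gcm:.1f} messages")
print(f"XChaCha20-Poly1305: about 2^{xchacha:.1f} messages")
```

Doubling the nonce length adds 48 bits to the exponent of the message budget, which is why random nonces stop being an operational worry with XChaCha.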
For application-layer cryptography, XChaCha20-Poly1305 contains most of the properties you’d want from an authenticated mode.
However, like AES-GCM (and all other Polynomial MACs I’ve heard of), it is not message committing.
The Gimli Permutation
For lightweight cryptography (n.b. important for IoT), the Gimli permutation (e.g. employed in libhydrogen) is an attractive option.
Gimli is a Round 2 candidate in NIST’s Lightweight Cryptography project. The Gimli permutation offers a lot of applications: a hash function, message authentication, encryption, etc.
Critically, it’s possible to construct a message-committing protocol out of Gimli that will hit a lot of the performance goals important to embedded systems.
Closing Remarks
Despite my personal disdain for AES-GCM, if you’re using it as intended by cryptographers, it’s good enough.
Don’t throw AES-GCM out just because of my opinions. It’s very likely the best option you have.
Although I personally dislike AES and GCM, I’m still deeply appreciative of the brilliance and ingenuity that went into both designs.
My desire is for the industry to improve upon AES and GCM in future cipher designs so we can protect more people, from a wider range of threats, in more diverse protocols, at a cheaper CPU/memory/time cost.
We wouldn’t have a secure modern Internet without the work of Vincent Rijmen, Joan Daemen, John Viega, David A. McGrew, and the countless other cryptographers and security researchers who made AES-GCM possible.
Change Log
- 2021-10-26: Added section on H Reuse and Multi-User Security.
https://soatok.blog/2020/05/13/why-aes-gcm-sucks/
#AES #AESGCM #cryptography #GaloisCounterMode #opinion #SecurityGuidance #symmetricCryptography
Historical Context of Iota’s Hash Functions
Once upon a time, researchers discovered that the hash function used within the Iota cryptocurrency (Curl-P) was vulnerable to practical collisions. When pressed about this, the Iota Foundation said the following:
https://twitter.com/durumcrustulum/status/1083859956841889792
In response to this research, the Iota developers threatened to sue the researchers.
https://twitter.com/matthew_d_green/status/965731647579664385?lang=en
Iota replaced Curl-P-27 with a hash function based on Keccak-384 they call Kerl. Keccak, you may recall, is a sponge function that went on to become SHA-3.
On its face, this sounds like a conservative choice in cryptography protocol design: Migrate from a backdoored hash function to one trusted and respected by cryptographers around the world.
https://twitter.com/cybergibbons/status/1283065819300331520
Iota even offered a bug bounty for attacks against reduced round variants:
https://twitter.com/durumcrustulum/status/1083860482866343936
Kerl isn’t Keccak-384 though. It’s Keccak-384 with a bit of an odd twist: They encode the input bytes into ternary ({0, 1} -> {-1, 0, 1}) before hashing. Apparently this weird obsession with ternary encoding is part of Iota’s schtick?
Practical Kerl Collisions
As a consequence of their weird ternary obsession, the following inputs all produce the same Kerl hash:
- GYOMKVTSNHVJNCNFBBAH9AAMXLPLLLROQY99QN9DLSJUHDPBLCFFAIQXZA9BKMBJCYSFHFPXAHDWZFEIZ
- GYOMKVTSNHVJNCNFBBAH9AAMXLPLLLROQY99QN9DLSJUHDPBLCFFAIQXZA9BKMBJCYSFHFPXAHDWZFEIH
- GYOMKVTSNHVJNCNFBBAH9AAMXLPLLLROQY99QN9DLSJUHDPBLCFFAIQXZA9BKMBJCYSFHFPXAHDWZFEIQ
This is a consequence of always zeroing out the last “trit” before passing the input to Keccak-384.
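Here is a small Python sketch of why those three inputs collapse together. It only models the step before hashing (decode trytes to trits, zero the last trit) and does not call Keccak-384. The tryte alphabet and the little-endian balanced ternary decoding are my understanding of Iota's convention, so treat them as assumptions, and `kerl_preprocess` is a hypothetical name.

```python
TRYTE_ALPHABET = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed: 9 = 0, A..M = 1..13, N..Z = -13..-1


def tryte_to_trits(tryte: str) -> list[int]:
    """Decode one tryte into three balanced trits, least significant first."""
    value = TRYTE_ALPHABET.index(tryte)
    if value > 13:
        value -= 27
    trits = []
    for _ in range(3):
        trit = value % 3
        if trit == 2:
            trit = -1  # balanced ternary uses -1 instead of 2
        trits.append(trit)
        value = (value - trit) // 3
    return trits


def kerl_preprocess(trytes: str) -> list[int]:
    """Hypothetical model of the pre-hash step: expand to trits, zero the last one."""
    trits = [t for tryte in trytes for t in tryte_to_trits(tryte)]
    trits[-1] = 0
    return trits


PREFIX = "GYOMKVTSNHVJNCNFBBAH9AAMXLPLLLROQY99QN9DLSJUHDPBLCFFAIQXZA9BKMBJCYSFHFPXAHDWZFEI"
inputs = [PREFIX + last for last in "ZHQ"]

for last in "ZHQ":
    print(last, tryte_to_trits(last))
print(len({tuple(kerl_preprocess(x)) for x in inputs}), "distinct pre-hash input(s)")
```

The final trytes Z, H and Q differ only in their most significant trit, so once that trit is zeroed all three strings feed the same trits into the hash.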
Since this zeroing was an explicit design choice of Iota’s, I decided to helpfully submit a pull request adding these inputs as test vectors to their JavaScript implementation.
Why It Matters
These are the facts:
- Kerl is intended to replace Curl-P-27 within the Iota cryptocurrency.
- The Iota Foundation previously admitted that Curl-P-27 was a “copy-protection backdoor”.
- The ternary encoding in Kerl allows three different inputs to produce the same hash output (which is not true of Keccak-384).
Given the past behavior of the Iota developers, there are three possible explanations for this:
- It’s a bugdoor (a backdoor enabled by a bug) intended to be exploited by the Coordinator–possibly as another copy-protection backdoor in the spirit of Curl-P-27.
- They made a critical design mistake in Kerl, which may or may not be exploitable in one or more of the places they use Kerl.
- Their alleged justification for zeroing out the last trit is sound, and there’s no way to exploit this collision vulnerability within Iota.
The last explanation is the one that Iota fanboys will probably cling to, but let me be clear: Even if this isn’t exploitable within Iota, it’s still a huge fuck-up in the design of Kerl.
I’ve lost count of the number of projects I’ve encountered that used secp256k1 as their elliptic curve choice simply because it was also the curve Bitcoin used. These are protocols doing ECDSA for JSON Web Tokens, or ECDH for use in ECIES schemes.
The probability of someone eventually using Kerl outside of Iota is 1, and they will be vulnerable to collisions because of that choice.
Takeaways
Iota is a cryptocurrency project that threatens security researchers, intentionally backdoored the cryptographic hash function in their original design (for “copy-protection”) and admitted to it, and designed a replacement cryptographic hash function (Kerl) that is vulnerable to trivial collision attacks (but the underlying hash function, Keccak-384, is not).
I don’t need to connect the lines on this one, do I?
It’s really freaking obvious.
Since this attack also works on “reduced round” variants of Kerl, I guess I win their bug bounty too!
If the Iota developers are reading this, please remit payment for your promised bounty to the following ZEC address: zs1lwghzjazt4h53gwnl7f24tdq99kw7eh9hgh3qumdvcndszl7ml4xmsudcmm60dut2cfesxmlcec
That being said, I never got an answer to my question, “Should I report the security reduction in bits or trits?” I’m feeling generous though. Given the cost of a collision is basically 1 trit, this is a 242-trit security reduction against collision attacks in Kerl.
https://soatok.blog/2020/07/15/kerlissions-trivial-collisions-in-iotas-hash-function-kerl/
#collision #cryptanalysis #Cryptocurrency #cryptographicHashFunction #hashFunction #iota #kerl #kerlission
There are several different methods for securely hashing a password server-side for storage and future authentication. The most common one (a.k.a. the one that FIPS allows you to use, if compliance matters for you) is called PBKDF2. It stands for Password-Based Key Derivation Function #2.
Why #2? It’s got nothing to do with pencils. There was, in fact, a PBKDF1! But PBKDF1 was fatally insecure in a way I find very interesting. This StackOverflow answer is a great explainer on the difference between the two.
Very Hand-Wavy Description of a Hash Function
Let’s define a hash function as any one-way transformation of some arbitrary-length string (m) to a fixed-length, deterministic, pseudo-random output.
Note: When in doubt, I err on the side of being easily understood by non-experts over pedantic precision. (Art by Swizz)
For example, this is a dumb hash function (uses SipHash-2-4 with a constant key):
function dumb_hash(string $arbitrary, bool $raw_binary = false): string
{
    // SipHash-2-4 with a fixed, public 16-byte key: a 64-bit toy hash, not a secure one.
    $h = sodium_crypto_shorthash($arbitrary, 'SoatokDreamseekr');
    if ($raw_binary) {
        return $h;
    }
    return sodium_bin2hex($h);
}
You can see the output of this function with some sample inputs here.
Properties of Hash Functions
A hash function is considered secure if it has the following properties:
- Pre-image resistance. Given H(m), it should be difficult to find m.
- Second pre-image resistance. Given m_1, it should be difficult to find a different message m_2 such that H(m_1) = H(m_2).
- Collision resistance. It should be difficult to find any arbitrary pair of distinct messages (m_1, m_2) such that H(m_1) = H(m_2).
That last property, collision resistance, is guaranteed up to the Birthday Bound of the hash function. For a hash function with a 256-bit output, you will expect to need on average 2^128 trial messages to find a collision.
If you’re confused about the difference between collision resistance and second pre-image resistance:
Collision resistance is about finding any two messages that produce the same hash, but you don’t care what the hash is as long as two distinct but known messages produce it.
On the other paw, second pre-image resistance is about finding a second message that produces a given hash.
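To get a feel for the birthday bound, here is a minimal Python sketch that hunts for a collision in a deliberately tiny hash. Truncating SHA-256 to 24 bits is my own stand-in so the search finishes instantly, and `toy_hash` and `find_collision` are hypothetical names.

```python
import hashlib


def toy_hash(message: bytes, out_bytes: int = 3) -> bytes:
    """SHA-256 truncated to out_bytes: a deliberately weak stand-in hash."""
    return hashlib.sha256(message).digest()[:out_bytes]


def find_collision(out_bytes: int = 3):
    """Hash counters until two distinct messages share a digest (collision search)."""
    seen = {}
    counter = 0
    while True:
        message = counter.to_bytes(8, "big")
        digest = toy_hash(message, out_bytes)
        if digest in seen:
            return seen[digest], message, counter + 1
        seen[digest] = message
        counter += 1


m1, m2, trials = find_collision()
print(f"{m1.hex()} and {m2.hex()} collide after {trials} trial messages")
```

With a 24-bit output you should expect a collision after a few thousand trials (on the order of 2^12). Finding a second pre-image for one fixed message would instead take on the order of 2^24 trials, which is the gap between the two properties described above.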
Exploring PBKDF1’s Insecurity
If you recall, hash functions map an arbitrary-length string to a fixed-length string. If your input size is larger than your output size, collisions are inevitable (albeit computationally infeasible for hash functions such as SHA-256).
But what if your input size is equal to your output size, because you’re taking the output of a hash function and feeding it directly back into the same hash function?
Then, as explained here, you get a depletion of the possible outputs with each successive iteration.
But what does that look like?
Without running the experiments on a given hash function, there are two possibilities that come to mind:
- Convergence. This is when H^k(m) (that is, H applied k times) will, for two arbitrary messages and a sufficient number of iterations, converge on a single hash output.
- Cycles. This is when H^k(m) = m for some integer k.
The most interesting result would be a quine, which is a cycle where k = 1 (that is to say, H(m) = m).
The least interesting result would be for random inputs to converge into large cycles e.g. cycles of size 2^64 for a 256-bit hash function.
I calculated this lazily as the birthday bound of the birthday bound (so basically the 4th root, which for 2^256 is 2^64).
Update: According to this 1960 paper, the average time to cycle is sqrt(pi/2) * 2^(n/2), and cycle length should be sqrt(pi/8) * 2^(n/2), which means for a 256-bit hash you should expect a cycle after about 2^128.33 or about 128 bits, and the average cycle length will be about 2^127.33. Thanks Riastradh for the corrections.
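The numbers in that update can be reproduced with a few lines of Python. This sketch assumes the usual random-mapping estimates, sqrt(pi/2) * 2^(n/2) iterations until a repeat and sqrt(pi/8) * 2^(n/2) for the cycle length. The function names are mine.

```python
import math


def log2_time_to_cycle(n_bits: int) -> float:
    """log2 of sqrt(pi/2) * 2^(n/2): expected iterations before an output repeats."""
    return n_bits / 2 + 0.5 * math.log2(math.pi / 2)


def log2_cycle_length(n_bits: int) -> float:
    """log2 of sqrt(pi/8) * 2^(n/2): expected length of the cycle itself."""
    return n_bits / 2 + 0.5 * math.log2(math.pi / 8)


for n_bits in (64, 256):
    print(
        f"{n_bits}-bit hash: repeat after about 2^{log2_time_to_cycle(n_bits):.2f}, "
        f"cycle length about 2^{log2_cycle_length(n_bits):.2f}"
    )
```

The two estimates differ by exactly one bit, whatever the output size.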
Conjecture: I would expect secure cryptographic hash functions in use today (e.g. SHA-256) to lean towards the least interesting output.
An Experiment Design
Since I don’t have an immense amount of cheap cloud computing at my disposal to run these experiments on a real hash function, I’m going to cheat a little and use my constant-key SipHash code from earlier in this post. In future work, cryptographers may find studying real hash functions (e.g. SHA-256) worthwhile.
Given that SipHash is a keyed pseudo-random function with an output size of 64 bits, my dumb hash function can be treated as a 64-bit hash function.
This means that you should expect your first collision (with 50% probability) after only 2^32 trial hashes. This is cheap enough to try on a typical laptop.
Here’s a simple experiment for convergence:
- Generate two random strings m_1 and m_2.
- Set h_1 = H(m_1) and h_2 = H(m_2).
- Iterate h_1 = H(h_1) and h_2 = H(h_2) until h_1 = h_2.
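As a scaled-down illustration of these steps, here is a Python sketch that swaps the 64-bit SipHash for SHA-256 truncated to 16 bits so it finishes in well under a second. This is not the PHP script mentioned in this post. The truncation, the iteration cap and the names are all assumptions of mine.

```python
import hashlib
import os

BITS = 16  # toy output size so the state space (2^16 values) can be exhausted quickly


def toy_hash(data: bytes) -> bytes:
    """SHA-256 truncated to BITS bits: a stand-in for the 64-bit dumb hash."""
    return hashlib.sha256(data).digest()[: BITS // 8]


def steps_to_convergence(m1: bytes, m2: bytes):
    """Iterate both chains in lockstep; return the step where they meet, or None."""
    h1, h2 = toy_hash(m1), toy_hash(m2)
    # After 2^BITS steps both chains are on a cycle, so a later meeting is impossible.
    for step in range(2 ** BITS + 1):
        if h1 == h2:
            return step
        h1, h2 = toy_hash(h1), toy_hash(h2)
    return None


result = steps_to_convergence(os.urandom(16), os.urandom(16))
print("no convergence" if result is None else f"converged after {result} iterations")
```

One thing this toy version makes visible: two chains iterated in lockstep can end up on the same cycle but out of phase, in which case they never become equal. A run that never terminates is therefore not unusual, even for a well-behaved hash.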
You can get the source code to run this trivial experiment here.
Clone the git repository, run composer install, and then php bin/experiment.php. Note that this may need to run for a very long time before you get a result.
If you get a result, you’ve found a convergence.
If the loop doesn’t terminate even after 2^64 iterations, you’ve definitely found a cycle. (Actually detecting a cycle and analyzing its length would require a lot of memory, and my simple PHP script isn’t suitable for this effort.)
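Remembering every output you have seen is what eats the memory. As an alternative (and not what the PHP script in this post does), Floyd's tortoise-and-hare algorithm measures a cycle while storing only two values. A minimal Python sketch, again using a 16-bit truncation of SHA-256 as an assumed stand-in hash:

```python
import hashlib


def toy_hash(data: bytes, out_bytes: int = 2) -> bytes:
    """SHA-256 truncated to out_bytes: a stand-in for a small fixed-size hash."""
    return hashlib.sha256(data).digest()[:out_bytes]


def floyd(f, x0):
    """Return (tail_length, cycle_length) of x0, f(x0), f(f(x0)), ... in constant memory."""
    # Phase 1: the hare moves twice as fast until both meet somewhere on the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(f(hare))
    # Phase 2: restart the tortoise; stepping both by one, they meet at the cycle start.
    tail_length = 0
    tortoise = x0
    while tortoise != hare:
        tortoise, hare = f(tortoise), f(hare)
        tail_length += 1
    # Phase 3: walk once around the cycle to measure its length.
    cycle_length = 1
    hare = f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        cycle_length += 1
    return tail_length, cycle_length


start = toy_hash(b"any starting message")
tail, cycle = floyd(toy_hash, start)
print(f"tail length {tail}, cycle length {cycle}")
```

The trade-off is time, not memory: for a 64-bit hash you would still expect billions of hash evaluations before the two pointers meet.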
“What do you mean I don’t have a petabyte of RAM at my disposal?”
What Does This Actually Tell Us?
The obvious lesson: Don’t design key derivation functions like PBKDF1.
But beyond that, unless you can find a hash function that reliably converges or produces short cycles (much shorter than 2^(n/2), for an n-bit hash function), not much. (This is just for fun, after all!)
Definitely fun to think about though! (Art by circuitslime)
If, however, a hash function is discovered to produce interesting results, this may indicate that the chosen hash function’s internal design is exploitable in some subtle ways that, upon further study, may lead to better cryptanalysis techniques. Especially if a hash quine is discovered.
(Header art by Khia)
https://soatok.blog/2020/05/05/putting-the-fun-in-hash-function/
#crypto #cryptography #hashFunction #SipHash