Programmers don’t understand hash functions, and I can demonstrate this to most of the people that will read this with a single observation:
When you saw the words “hash function” in the title, you might have assumed this was going to be a blog post about password storage. (Passwords are the most common knee-jerk reaction I get to any discussion about hash functions, anyway. A little bit of security knowledge can be very dangerous.)
I don’t blame software developers for their lack of understanding on anything I’m going to discuss. The term “hash function” can accurately represent any of the following disparate topics in computer science:
- Converting keys into memory addresses for hash maps
- Creating a one-way message digest to ensure said message was transmitted correctly over an untrusted network
- Cryptographic building blocks for digital signatures, message authentication codes, key derivation functions, and so on
- Server-side password storage techniques meant to resist brute force and precomputation attacks
- Perceptual fingerprints of visual data
Some hash functions are reusable across multiple topics from the above list, but even then, not all of the hash functions you’d use for one purpose have the same properties.
Using a hash function for the wrong purpose, or in a place where it doesn’t provide the expected properties, can lead to security vulnerabilities. Some of these vulnerabilities aren’t obvious or straightforward, either, which only serves to magnify confusion.
So let’s talk about hash functions, what programmers get wrong about them, and the correct answers to these common misconceptions.
(Credit: ScruffKerfluff)
What Are Hash Functions?
A hash function is any function that can map an arbitrary-sized input to a fixed-size output.
This is incredibly vague; anything can be a hash function if you’re brave enough. You can treat an arbitrary string as a big integer, return its least significant bit, and call that a hash function. (It’s not a particularly good one, but it fits the definition.)
```typescript
function totallyHashed(input: string | Buffer): Buffer {
  const inBuf = Buffer.isBuffer(input) ? input : Buffer.from(input);
  return Buffer.from([inBuf[inBuf.length - 1] & 1]);
}
```
Being able to call something a hash function and be technically correct isn’t very helpful.
Credit: circuitslime
What Kinds of Hash Functions Do We Care About?
There are different types of hash functions suitable for solving different types of problems. (Some examples follow, but this should not be taken as an exhaustive treatment.)
General-Purpose Hash Functions
General-purpose hash functions are useful for converting a key into an index for memory addresses when constructing hash tables. Usually when someone says only “hash function” in a broad computer science context, without any qualifiers, that’s what they’re talking about.
Examples: SipHash, djb2, Murmur3. You can find a comparison here.
Worth calling out: Only some of these general-purpose hash functions are safe to use for hash tables whose keys are provided from user input (for example, JSON data). When in doubt, SipHash-2-4 is a good default.
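To make the hash-table use-case concrete, here’s a minimal sketch of djb2 in TypeScript. Note that djb2 is unkeyed: it’s fine when you control the keys, but for attacker-supplied input you’d want a keyed hash like SipHash instead.

```typescript
// djb2: hash = hash * 33 + byte, a classic general-purpose string hash.
// NOT safe for attacker-controlled keys (no secret key, trivially
// collidable); shown here purely to illustrate the hash-table use-case.
function djb2(key: string): number {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    // Multiply-by-33 and add, kept in the unsigned 32-bit range
    hash = (hash * 33 + key.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Map a key to one of 16 buckets in a toy hash table:
const bucket = djb2("hello") % 16;
```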
Cryptographic Hash Functions
Cryptographic hash functions have additional desirable properties (they’re non-invertible and must be resistant to collision attacks and preimage attacks) above general-purpose hash functions. They also have larger output sizes (typically at least 256 bits) than the sort of hash functions you’d use for hash tables. Consequently, they’re slower than the hash functions people tend to use for hash tables.
Cryptographic hash functions are often used in place of a random oracle in security protocols, because actual random oracles do not exist, as Thomas Pornin explains:
A random oracle is described by the following model:
- There is a black box. In the box lives a gnome, with a big book and some dice.
- We can input some data into the box (an arbitrary sequence of bits).
- Given some input that he did not see beforehand, the gnome uses his dice to generate a new output, uniformly and randomly, in some conventional space (the space of oracle outputs). The gnome also writes down the input and the newly generated output in his book.
- If given an already seen input, the gnome uses his book to recover the output he returned the last time, and returns it again.
So a random oracle is like a kind of hash function, such that we know nothing about the output we could get for a given input message m. This is a useful tool for security proofs because they allow to express the attack effort in terms of number of invocations to the oracle.
The problem with random oracles is that it turns out to be very difficult to build a really “random” oracle. First, there is no proof that a random oracle can really exist without using a gnome. Then, we can look at what we have as candidates: hash functions. A secure hash function is meant to be resilient to collisions, preimages and second preimages. These properties do not imply that the function is a random oracle.
Thomas Pornin
I’m intentionally eschewing a lot of detail about what makes a cryptographic hash function secure (e.g. bit diffusion even in reduced rounds), or how they achieve the desirable properties.
If you’re interested in those topics, leave a comment below and I’ll talk about that in a future post.
If you remember nothing else about cryptographic hash functions, just know that checksums (e.g. CRC32) are not cryptographic. (Yes, many hashing APIs include CRC32 alongside good options, but don’t be misled.)
Note: There’s a very pedantic mathematical discussion that can be had about whether or not cryptographic hash functions are truly one-way functions (which, like P = NP vs P != NP, is an open conjecture in mathematics). You don’t have to know, or even care, about this distinction–unless you’re making assumptions about this property in your designs.
Examples: BLAKE3, SHA256.
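As a quick illustration, computing a SHA-256 digest with Node’s built-in crypto module:

```typescript
import { createHash } from "node:crypto";

// One-shot SHA-256 digest of a message
const digest = createHash("sha256").update("hello world").digest("hex");

// The output is always 256 bits (64 hex characters), regardless of
// how long the input was.
console.log(digest.length); // 64
```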
Art: Lynx vs Jackalope
A Word Of Caution on Message Authentication Codes
In cryptography, message authentication codes are often built with cryptographic hash functions, but not always!
AES-GCM famously uses a function called GHASH, which evaluates a polynomial in GF(2^128), rather than a cryptographic hash function. This provides a great speed boost, but fails to provide collision resistance, leading to interesting results.
Poly1305 is a similar story to GCM, but doesn’t reuse keys the same way.
Although GHASH and Poly1305 are secure MACs, they’re not built from cryptographic hash functions.
CBC-MAC uses a block cipher (typically AES) in cipher-block chaining mode (with an all-zero initialization vector) and outputs the last block as the authentication tag. This offers no collision resistance (as Mega learned the hard way).
When in doubt, HMAC is a safe choice.
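A minimal HMAC-SHA256 sketch with Node’s crypto module (the key shown is a placeholder; in practice it would come from your secrets management):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Placeholder key for illustration; use a randomly generated secret in practice
const key = Buffer.from("a-static-secret-key-from-your-kms");

// Compute the authentication tag for a message
const tag = createHmac("sha256", key).update("some message").digest();

// Verify by recomputing and comparing in constant time (never use ===)
const candidate = createHmac("sha256", key).update("some message").digest();
const valid = timingSafeEqual(tag, candidate);
```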
Password Hashing Functions
Separate from, but often built from, cryptographic hash functions are password hashing functions. These are the tools you use if your users are sending a username and password over HTTP and you don’t want to become the next RockYou.
Examples: Argon2, scrypt, bcrypt.
Password hashing functions have significant overlap with key-derivation functions (KDFs), but not all KDFs are meant for password hashing, and not all password hashes are meant for key derivation.
It’s perfectly reasonable to use bcrypt to store passwords, but not to derive encryption keys.
Conversely, HKDF is a valid KDF (which has a stricter set of security requirements than a PRF), but it’s not meant for passwords.
Some algorithms (Argon2 and scrypt) can be safely used as a password hashing function or as a KDF.
Modern password hashing algorithms involve a deliberately expensive computation that’s fast to verify once, but expensive to verify multiple times. These algorithms can be tuned (memory usage, parallel threads, number of iterations) to target a specific latency goal. Additionally, most password hashing APIs take care of salt generation for you, using a secure random generator.
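As a sketch of those points, Node’s built-in scrypt handles the random salt and tunable cost parameters like so (the parameter values here are illustrative, not a recommendation; tune them to your own latency budget):

```typescript
import { scryptSync, randomBytes, timingSafeEqual } from "node:crypto";

// Registration: generate a fresh random salt for each password
const salt = randomBytes(16);
const stored = scryptSync("correct horse battery staple", salt, 32, {
  N: 16384, r: 8, p: 1, // cost parameters: memory/CPU hardness knobs
});

// Login: recompute with the stored salt, compare in constant time
const attempt = scryptSync("correct horse battery staple", salt, 32, {
  N: 16384, r: 8, p: 1,
});
const ok = timingSafeEqual(stored, attempt);
```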
Perceptual Hashes
In yet another different direction, you have perceptual hash functions, such as the kind Apple is going to use to violate the privacy of their customers in a desperate attempt to catch a small percentage of depraved individuals peddling CSAM (and expecting the rest of their customers to blindly trust that this capability they built for themselves will never ever be used to stifle dissidents or whistleblowers).
https://twitter.com/matthew_d_green/status/1428414174758117389
I don’t have a lot to say here, except that I don’t trust Apple, especially after the internal memo about “screeching voices of the minority”: Their partners on this project have showed the utmost contempt for privacy activists, LGBTQ+ folks, survivors of child sexual abuse, etc. and they remain committed to them. Fuck Apple.
Suffice to say, cryptographers were not at all surprised by the discovery of practical collisions against Apple’s new perceptual hash function, because perceptual hash functions do not provide the same properties as cryptographic hash functions.
Perceptual hashes of CSAM do not provide collision or preimage resistance, and it would be possible to flood Apple with false positives if a hash of such material were to ever leak publicly. (Maybe an enterprising Internet Troll will one day make a meme generator that does this?)
How Developers Misuse Hash Functions
There are far too many ways that hash functions get misused by software developers to recount in one blog post.
Some of the more common and obvious examples (i.e., using MD5 to store passwords, calling SHA256 “encryption”) aren’t very interesting, and are covered elsewhere, so I’m going to focus on misuse examples that aren’t commonly discussed online.
Encrypt and Hash
A common design pattern from the 2010’s is to hash some data, then encrypt the data that was hashed (but not the hash), and then send both values to another machine.
The expectation here is that, upon decrypting the ciphertext, the hash can be used as a client-side fingerprint to ensure the data wasn’t corrupted in storage.
This use of a hash function is distinct from the Encrypt/MAC discussion (see: the Cryptographic Doom Principle), because it’s often implemented alongside AEAD. (If you aren’t using authenticated encryption, correct that first.)
However, there are two problems here:
- Directly invoking cryptographic hash functions doesn’t involve a cryptographic secret, and thus all clients will produce the same hash of the same plaintext.
- A cryptographic hash can be used to perform offline attacks (e.g. rainbow tables) against the plaintext, especially if the input domain is small (i.e. a phone number).
Art: Swizz
What you really want to use in this situation is HMAC with a static secret key (which is only known client-side).
HMAC ensures that, without access to the secret key, precomputation attacks are not possible. Additionally, if each client has a different secret key (which, they SHOULD), an attacker who only has access to hashes and ciphertext cannot distinguish which records correspond across multiple clients.
This is a weaker proposition than security against chosen plaintexts (IND-CPA), but still provides a higher level of assurance than a naked SHA256 hash.
Searchable Encryption with Hashed Indexes
Similar to the previous example, sometimes developers will be tasked with storing encrypted values in a database while also being able to quickly query the database based on an encrypted value.
The laziest solution to the “encrypted database” use-case is to just use deterministic encryption, such as AES-ECB or a mode with a static IV/nonce. Most people with some familiarity with cryptography immediately recognize that this is dangerous, so they opt to encrypt securely (i.e. AEAD with random nonces).
However, to support querying, they often just hash their plaintext and store it alongside the ciphertext.
Art: Lynx vs Jackalope
This reintroduces the issues from the previous section (especially rainbow tables), but with additional risks:
- Plaintexts are overwhelmingly likely to have smaller input domains, thereby increasing the utility of hashes to attack the confidentiality of the plaintext.
- No domain separation between different hashes of different encrypted fields.
To address this, there are a few things you can do:
- Truncate your hashes. If you want to frustrate attackers, simply don’t store a full hash of the plaintext. Instead, truncate hashes to 8 or fewer hexits and permit a small degree of false positives in your decryption logic (n.b. by filtering those rows out).
- Use distinct HMAC keys per index. This introduces the solution to the previous section, but also addresses domain separation.
If you’re left wondering, “Can I use both solutions?” The answer is, “Yes, but you just reinvented what CipherSweet calls blind indexes.”
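For illustration, a blind index along those lines might look like this sketch (the key and field names are hypothetical; in a real system each field’s index key would be randomly generated and managed per client):

```typescript
import { createHmac } from "node:crypto";

// Hypothetical per-field index key (distinct per column, per client)
const emailIndexKey = Buffer.from("placeholder-per-field-index-key");

// Blind index: HMAC the plaintext under the field's key, then truncate.
// 8 hexits = 32 bits, so false positives are possible by design and must
// be filtered out after decrypting the candidate rows.
function blindIndex(key: Buffer, plaintext: string): string {
  return createHmac("sha256", key).update(plaintext).digest("hex").slice(0, 8);
}

const idx = blindIndex(emailIndexKey, "user@example.com");
// Query: SELECT ... WHERE email_idx = idx; then decrypt and filter rows.
```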
Art: Lynx vs Jackalope
Overconfidence With Collision-Resistance
Previously on this blog, I disclosed a trivial collision vulnerability in the Kerl hash function used by the IOTA cryptocurrency project without breaking the underlying hash function (Keccak-384).
How did I do this? I found multiple input values that, before being passed to the hash function, collide with each other.
Credit: Harubaki
Developers are frequently overconfident about the collision-resistance of their protocol, simply because they use collision-resistant hash functions inside of the protocol. They’re frequently blind to the reality of canonicalization attacks (including the somewhat famous length-extension attack).
This isn’t their fault. If you’ve made this mistake, it isn’t your fault. Cryptography is a difficult topic and requires a lot of experience and expertise to get right.
Closing Thoughts
One of my favorite cryptography websites is for the SPHINCS project, which is a stateless post-quantum hash-based digital signature algorithm.
On this webpage, which has a heavy anarchist aesthetic, there’s a special note to law-enforcement agents that reads:
The word “state” is a technical term in cryptography. Typical hash-based signature schemes need to record information, called “state”, after every signature. Google’s Adam Langley refers to this as a “huge foot-cannon” from a security perspective. By saying “eliminate the state” we are advocating a security improvement, namely adopting signature schemes that do not need to record information after every signature. We are not talking about eliminating other types of states. We love most states, especially yours! Also, “hash” is another technical term and has nothing to do with cannabis.
Now, I personally like this disclaimer because a) it’s funny and b) it reminds us that all cops are bastards.
Art: Lynx vs Jackalope
But it’s also a good reminder of how confusing the term “hash” can be in different fields. Software engineers aren’t the only people who are likely to be confused about hash functions.
(And I can’t even apply the “-ish” suffix to talk about things that behave like hash functions but aren’t hash functions, because that’s a homograph for an even more specific drug term.)
The next time you see a programmer ask a potentially unwise question involving hash functions, having read this blog post, I hope you’ll appreciate how confusing all this shit can be for virtually everyone.
If you’ve made any of the specific mistakes I’ve discussed here, know that you’re in very good company. Some of the best programmers in the industry have made these mistakes before. Hell, I’ve made these exact mistakes before, and worse.
https://soatok.blog/2021/08/24/programmers-dont-understand-hash-functions/
If you’re reading this wondering if you should stop using AES-GCM in some standard protocol (TLS 1.3), the short answer is “No, you’re fine”.

I specialize in secure implementations of cryptography, and my years of experience in this field have led me to dislike AES-GCM.
This post is about why I dislike AES-GCM’s design, not “why AES-GCM is insecure and should be avoided”. AES-GCM is still miles above what most developers reach for when they want to encrypt (e.g. ECB mode or CBC mode). If you want a detailed comparison, read this.
To be clear: This is solely my opinion and not representative of any company or academic institution.
What is AES-GCM?
AES-GCM is an authenticated encryption mode that uses the AES block cipher in counter mode with a polynomial MAC based on Galois field multiplication.

In order to explain why AES-GCM sucks, I have to first explain what I dislike about the AES block cipher. Then, I can describe why I’m filled with sadness every time I see the AES-GCM construction used.
What is AES?
The Advanced Encryption Standard (AES) is a specific subset of a block cipher called Rijndael.

Rijndael’s design is based on a substitution-permutation network, which broke tradition from many block ciphers of its era (including its predecessor, DES) in not using a Feistel network.
AES only includes three flavors of Rijndael: AES-128, AES-192, and AES-256. The difference between these flavors is the size of the key and the number of rounds used, but–and this is often overlooked–not the block size.
As a block cipher, AES always operates on 128-bit (16 byte) blocks of plaintext, regardless of the key size.
This is generally considered acceptable because AES is a secure pseudorandom permutation (PRP), which means that every possible plaintext block maps directly to one ciphertext block, and thus birthday collisions are not possible. (A pseudorandom function (PRF), conversely, does have birthday bound problems.)
Why AES Sucks
Art by Khia.

Side-Channels
The biggest reason why AES sucks is that its design uses a lookup table (called an S-Box) indexed by secret data, which is inherently vulnerable to cache-timing attacks (PDF).

There are workarounds for this AES vulnerability, but they either require hardware acceleration (AES-NI) or a technique called bitslicing.
The short of it is: With AES, you’re either using hardware acceleration, or you have to choose between performance and security. You cannot get fast, constant-time AES without hardware support.
Block Size
AES-128 is considered by experts to have a security level of 128 bits.

Similarly, AES-192 gets certified at 192-bit security, and AES-256 gets 256-bit security.
However, the AES block size is only 128 bits!
That might not sound like a big deal, but it severely limits the constructions you can create out of AES.
Consider the case of AES-CBC, where the output of each block of encryption is combined with the next block of plaintext (using XOR). This is typically used with a random 128-bit block (called the initialization vector, or IV) for the first block.
This means you expect a collision (at 50% probability) after encrypting 2^64 blocks.
When you start getting collisions, you can break CBC mode, as this video demonstrates:
https://www.youtube.com/watch?v=v0IsYNDMV7A
This is significantly smaller than the 2^128 security you expect from AES.
Post-Quantum Security?
With respect to the number of attempts needed to find the correct key, cryptographers estimate that AES-128 will have a post-quantum security level of 64 bits, AES-192 will have a post-quantum security level of 96 bits, and AES-256 will have a post-quantum security level of 128 bits.

This is because Grover’s quantum search algorithm can search n unsorted items in O(√n) time, which can be used to reduce the total number of possible secrets from 2^n to 2^(n/2). This effectively cuts the security level, expressed in bits, in half.
Note that this heuristic estimate is based on the number of guesses (a time factor), and doesn’t take circuit size into consideration. Grover’s algorithm also doesn’t parallelize well. The real-world security of AES may still be above 100 bits if you consider these nuances.
But remember, even AES-256 operates on 128-bit blocks.
Consequently, for AES-256, there should be approximately 2^128 distinct keys that map any given plaintext block to any given ciphertext block.
Furthermore, there will be many keys that, for a constant plaintext block, will produce the same ciphertext block despite being a different key entirely. (n.b. This doesn’t mean for all plaintext/ciphertext block pairings, just some arbitrary pairing.)
Concrete example: Encrypting a plaintext block consisting of sixteen NUL bytes will yield a specific 128-bit ciphertext block exactly once for each given AES-128 key. However, there are 2^128 times as many AES-256 keys as there are possible plaintext/ciphertext blocks. Keep this in mind for AES-GCM.
This means it’s conceivable to accidentally construct a protocol that, despite using AES-256 safely, has a post-quantum security level on par with AES-128, which is only 64 bits.
This would not be nearly as much of a problem if AES’s block size was 256 bits.
Real-World Example: Signal
The Signal messaging app is the state-of-the-art for private communications. If you were previously using PGP and email, you should use Signal instead.

Signal aims to provide private communications (text messaging, voice calls) between two mobile devices, piggybacking on your pre-existing contacts list.
Part of their operational requirements is that they must be user-friendly and secure on a wide range of Android devices, stretching all the way back to Android 4.4.
The Signal Protocol uses AES-CBC + HMAC-SHA256 for message encryption. Each message is encrypted with a different AES key (due to the Double Ratchet), which limits the practical blast radius of a cache-timing attack and makes practical exploitation difficult (since you can’t effectively replay decryption in order to leak bits about the key).
Thus, Signal’s message encryption is still secure even in the presence of vulnerable AES implementations.
Hooray for well-engineered protocols managing to actually protect users.
Art by Swizz.

However, the storage service in the Signal App uses AES-GCM, and this key has to be reused in order for the encrypted storage to operate.
This means, for older phones without dedicated hardware support for AES (i.e. low-priced phones from 2013, which Signal aims to support), the risk of cache-timing attacks is still present.
This is unacceptable!
What this means is, a malicious app that can flush the CPU cache and measure timing with sufficient precision can siphon the AES-GCM key used by Signal to encrypt your storage without ever violating the security boundaries enforced by the Android operating system.
As a result of the security boundaries never being crossed, these kind of side-channel attacks would likely evade forensic analysis, and would therefore be of interest to the malware developers working for nation states.
Of course, if you’re on newer hardware (i.e. Qualcomm Snapdragon 835), you have hardware-accelerated AES available, so it’s probably a moot point.
Why AES-GCM Sucks Even More
AES-GCM is an authenticated encryption mode that also supports additional authenticated data. Cryptographers call these modes AEAD.

AEAD modes are more flexible than simple block ciphers. Generally, your encryption API accepts the following:
- The plaintext message.
- The encryption key.
- A nonce (N: a number that must only be used once).
- Optional additional data which will be authenticated but not encrypted.
The output of an AEAD function is both the ciphertext and an authentication tag, which is necessary (along with the key and nonce, and optional additional data) to decrypt the plaintext.
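That interface looks like this with Node’s built-in AES-256-GCM support (a sketch of the API shape, not a complete protocol):

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const key = randomBytes(32);   // 256-bit key
const nonce = randomBytes(12); // 96-bit nonce: must never repeat under a key

// Encrypt: plaintext + key + nonce + optional AAD -> ciphertext + tag
const cipher = createCipheriv("aes-256-gcm", key, nonce);
cipher.setAAD(Buffer.from("header-v1")); // authenticated, but not encrypted
const ciphertext = Buffer.concat([cipher.update("attack at dawn"), cipher.final()]);
const tag = cipher.getAuthTag(); // 16-byte authentication tag

// Decrypt: fails loudly if the ciphertext, tag, or AAD were tampered with
const decipher = createDecipheriv("aes-256-gcm", key, nonce);
decipher.setAAD(Buffer.from("header-v1"));
decipher.setAuthTag(tag);
const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
```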
Cryptographers almost universally recommend using AEAD modes for symmetric-key data encryption.
That being said, AES-GCM is possibly my least favorite AEAD, and I’ve got good reasons to dislike it beyond simply, “It uses AES”.
The deeper you look into AES-GCM’s design, the harder you will feel this sticker.
GHASH Brittleness
The way AES-GCM is initialized is stupid: You encrypt an all-zero block with your AES key (in ECB mode) and store it in a variable called H. This value of H is used for authenticating all messages under that AES key, rather than being unique to a given (key, nonce) pair.
Diagram describing Galois/Counter Mode, taken from Wikipedia.
This is often sold as an advantage: Reusing H allows for better performance. However, it makes GCM brittle: reusing a nonce allows an attacker to recover H and then forge messages forever. This is called the “forbidden attack”, and has led to real-world practical breaks.

Let’s contrast AES-GCM with the other AEAD mode supported by TLS: ChaCha20-Poly1305, or ChaPoly for short.
ChaPoly uses one-time message authentication keys (derived from each key/nonce pair). If you manage to leak a Poly1305 key, the impact is limited to the messages encrypted under that (ChaCha20 key, nonce) pair.
While that’s still bad, it isn’t “decrypt all messages under that key forever” bad like with AES-GCM.
Note: “Message Authentication” here is symmetric, which only provides a property called message integrity, not sender authenticity. For the latter, you need asymmetric cryptography (wherein the ability to verify a message doesn’t imply the capability to generate a new signature), which is totally disparate from symmetric algorithms like AES or GHASH. You probably don’t need to care about this nuance right now, but it’s good to know in case you’re quizzed on it later.
H Reuse and Multi-User Security
If you recall, AES operates on 128-bit blocks even when 256-bit keys are used.

If we assume AES is well-behaved, we can deduce that there are approximately 2^128 different 256-bit keys that will map a single plaintext block to a single ciphertext block.

This is trivial to calculate. Simply divide the number of possible keys (2^256) by the number of possible block states (2^128) to yield the number of keys that produce a given ciphertext for a single block of plaintext: 2^128.

Each key that will map an arbitrarily specific plaintext block to a specific ciphertext block is also separated in the keyspace by approximately 2^128.

This means there are approximately 2^128 independent keys that will map a given all-zero plaintext block to an arbitrarily chosen value of H (if we assume AES doesn’t have weird biases).
Credit: Harubaki
“Why Does This Matter?”
It means that, with keys larger than 128 bits, you can model the selection of H as a 128-bit pseudorandom function, rather than a 128-bit permutation. As a result, you can expect a collision with 50% probability after only 2^64 different keys are selected.

Note: Your 128-bit randomly generated AES keys already have this probability baked into their selection, but this specific analysis doesn’t really apply for 128-bit keys since AES is a PRP, not a PRF, so there is no “collision” risk. However, you end up at the same upper limit either way.
But 50% isn’t good enough for cryptographic security.
In most real-world systems, we target a 2^-32 collision risk. So that means our safety limit is actually 2^48 different AES keys before you have to worry about H reuse.

This isn’t the same thing as symmetric wear-out (where you need to re-key after a given number of encryptions to prevent nonce reuse). Rather, it means after your entire population has exhausted the safety limit of 2^48 different AES keys, you have to either accept the risk or stop using AES-GCM.

If you have a billion users (approximately 2^30), the safety limit is breached after 2^18 AES keys per user (approximately 262,000).
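The arithmetic above is easy to sanity-check:

```typescript
// Population-wide H-collision safety limit (2^-32 risk target): 2^48 keys.
const safetyLimit = 2 ** 48;

// With roughly a billion users (~2^30)...
const users = 2 ** 30;

// ...each user burns through the population's budget at this rate:
const keysPerUser = safetyLimit / users; // 2^18 = 262,144
```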
“What Good is H Reuse for Attackers if EK(J0) Differs?”
There are two numbers used in AES-GCM that are derived from the AES key. H is used for block multiplication, and EK(J0) (the encryption of the initial counter block, i.e. the nonce with the counter set to 0 in the following diagram) is XORed with the final result to produce the authentication tag.

The arrow highlighted with green is EK(J0).
It’s tempting to think that a reuse of H isn’t a concern because EK(J0) will necessarily be randomized, which prevents an attacker from observing when H collides. It’s certainly true that the single-block collision risk discussed previously for H will almost certainly not also result in a collision for EK(J0). And since EK(J0) isn’t reused unless a nonce is reused (which also leaks H directly), this might seem like a non-issue.
Art by Khia.
However, it’s straightforward to go from a condition of reuse to an adaptive chosen-ciphertext attack.
- Intercept multiple valid ciphertexts.
- e.g. Multiple JWTs encrypted with
{"alg":"A256GCM"}
- Use your knowledge of H, the ciphertext, and the AAD to calculate the GCM tag up to the final XOR. This, along with the existing authentication tag, will tell you the value of EK(J0) for a given nonce.
- Calculate a new authentication tag for a chosen ciphertext using EK(J0) and your candidate H value, then replay it into the target system.
While the blinding offered by XORing the final output with EK(J0) is sufficient to stop H from being leaked directly, the protection is one-way.

Ergo, a collision in H is not sufficiently thwarted by EK(J0).
“How Could the Designers Have Prevented This?”
The core issue here is the AES block size, again.

If we were analyzing a 256-bit block variant of AES, and a congruent GCM construction built atop it, none of what I wrote in this section would apply.
However, the 128-bit block size was a design constraint enforced by NIST in the AES competition. This block size was during an era of 64-bit block ciphers (e.g. Triple-DES and Blowfish), so it was a significant improvement at the time.
NIST’s AES competition also inherited from the US government’s tradition of thinking in terms of “security levels”, which is why there are three different permitted key sizes (128, 192, or 256 bits).
“Why Isn’t This a Vulnerability?”
There’s always a significant gap in security, wherein something isn’t safe to recommend, but also isn’t susceptible to a known practical attack. This gap is important to keep systems secure, even when they aren’t on the bleeding edge of security.

Using 1024-bit RSA is a good example of this: No one has yet, to my knowledge, successfully factored a 1024-bit RSA public key. However, most systems have recommended a minimum of 2048 bits for years (and many recommend 3072-bit or 4096-bit today).
With AES-GCM, the expected distance between collisions in H is 2^64, and finding an untargeted collision requires being able to observe more than 2^64 different sessions, and somehow distinguish when H collides.

As a user, you know that after 2^48 different keys, you’ve crossed the safety boundary for avoiding collisions. But as an attacker, you need 2^64 bites at the apple, not 2^48. Additionally, you need some sort of oracle or distinguisher for when this happens.
We don’t have that kind of distinguisher available to us today. And even if we had one available, the amount of data you need to search in order for any two users in the population to reuse/collide is challenging to work with. You would need the computational and data storages of a major cloud service provider to even think about pulling the attack off.
Naturally, this isn’t a practical vulnerability. This is just another gripe I have with AES-GCM, as someone who has to work with cryptographic algorithms a lot.
Short Nonces
Although the AES block size is 16 bytes, AES-GCM nonces are only 12 bytes. The remaining 4 bytes of the counter block are dedicated to an internal counter, which is used with AES in Counter Mode to actually encrypt/decrypt messages.

(Yes, you can use arbitrary-length nonces with AES-GCM, but if you use nonces longer than 12 bytes, they get hashed into 12 bytes anyway, so it’s not a detail most people should concern themselves with.)
If you ask a cryptographer, “How much can I encrypt safely with AES-GCM?” you’ll get two different answers.
- Message Length Limit: AES-GCM can be used to encrypt messages up to 2^36 − 32 bytes (about 64 GiB) long, under a given (key, nonce) pair.
- Number of Messages Limit: If you generate your nonces randomly, you have a 50% chance of a nonce collision after 2^48 messages.

However, 50% isn’t conservative enough for most systems, so the safety margin is usually much lower. Cryptographers generally set the key wear-out of AES-GCM at 2^32 random nonces, which represents a collision probability of about one in 4 billion.

These limits are acceptable for session keys for encryption-in-transit, but they impose serious operational limits on application-layer encryption with long-term keys.
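These wear-out figures come from a rough birthday estimate, p ≈ n²/2^96 for 96-bit random nonces (a loose upper-bound-style approximation that ignores constant factors). A quick sanity check:

```typescript
// Rough birthday-bound estimate for random 96-bit GCM nonces:
// collision probability ≈ n^2 / 2^96 (loose approximation).
function nonceCollisionProbability(messages: number): number {
  return messages ** 2 / 2 ** 96;
}

// At the commonly cited 2^32-message wear-out limit:
const p = nonceCollisionProbability(2 ** 32); // 2^-32, about one in 4 billion
```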
Random Key Robustness
Before the advent of AEAD modes, cryptographers used to combine block cipher modes of operation (e.g. AES-CBC, AES-CTR) with a separate message authentication code algorithm (e.g. HMAC, CBC-MAC). You had to be careful in how you composed your protocol, lest you invite Cryptographic Doom into your life. A lot of developers screwed this up. Standardized AEAD modes promised to make life easier.
Many developers gained their intuition for authenticated encryption modes from protocols like Signal’s (which combines AES-CBC with HMAC-SHA256), and would expect AES-GCM to be a drop-in replacement.
Unfortunately, GMAC doesn’t offer the same security benefits as HMAC: Finding a different (ciphertext, HMAC key) pair that produces the same authentication tag is a hard problem, due to HMAC’s reliance on cryptographic hash functions. This makes HMAC-based constructions “message committing”, which instills Random Key Robustness.
Critically, AES-GCM doesn’t have this property. You can calculate a random (ciphertext, key) pair that collides with a given authentication tag very easily.
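To see why, note that GHASH is just polynomial evaluation over GF(2^128): the tag is a linear function of the ciphertext blocks once you know the hash key H and the mask S. A toy sketch of that algebra (simplified to one ciphertext block with no AAD; all constants and helper names here are mine):

```python
R = 0xE1000000000000000000000000000000  # GCM reduction polynomial (bit-reflected)
ONE = 1 << 127                          # multiplicative identity in GCM's convention

def gf_mult(x: int, y: int) -> int:
    """Multiplication in GF(2^128), GCM's bit-reflected representation."""
    z, v = 0, x
    for i in range(128):
        if (y >> (127 - i)) & 1:
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z

def gf_inv(x: int) -> int:
    """Inverse via x^(2^128 - 2); the multiplicative group has order 2^128 - 1."""
    result, base, e = ONE, x, (1 << 128) - 2
    while e:
        if e & 1:
            result = gf_mult(result, base)
        base = gf_mult(base, base)
        e >>= 1
    return result

def tag(h: int, c1: int, length_block: int, s: int) -> int:
    """GHASH of [C1, length_block] under hash key H, masked with S."""
    y = gf_mult(c1, h)
    y = gf_mult(y ^ length_block, h)
    return y ^ s

# Arbitrary (hypothetical) hash key, mask, and target tag:
h = 0x0123456789ABCDEF0011223344556677
s = 0x00FF00FF00FF00FF00FF00FF00FF00FF
t = int.from_bytes(b"sixteen byte tag", "big")
length_block = 128  # no AAD, one 128-bit ciphertext block

# Invert the linear relation: C1 = ((T ^ S) * H^-1 ^ L) * H^-1
h_inv = gf_inv(h)
c1 = gf_mult(gf_mult(t ^ s, h_inv) ^ length_block, h_inv)
assert tag(h, c1, length_block, s) == t  # forged block hits the target tag
```

That same linearity is what lets an attacker find a second key whose (H, S) pair authenticates a chosen ciphertext under the same tag; GHASH never binds the tag to the key the way a cryptographic hash function would.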
This fact prohibits AES-GCM from being considered for use with OPAQUE (which requires RKR), one of the upcoming password-authenticated key exchange algorithms. (Read more about them here.)
Better-Designed Algorithms
You might be thinking, “Okay random furry, if you hate AES-GCM so much, what would you propose we use instead?” I’m glad you asked!
XChaCha20-Poly1305
For encrypting messages under a long-term key, you can’t really beat XChaCha20-Poly1305.
- ChaCha is a stream cipher based on a 512-bit ARX hash function in counter mode. ChaCha doesn’t use S-Boxes. It’s fast and constant-time without hardware acceleration.
- ChaCha20 is ChaCha with 20 rounds.
- XChaCha nonces are 24 bytes, which allows you to generate them randomly and not worry about a birthday collision until about 2^80 messages (for the same collision probability as AES-GCM).
- Poly1305 uses a different 256-bit key for each (nonce, key) pair and is easier to implement in constant-time than AES-GCM.
- XChaCha20-Poly1305 uses the first 16 bytes of the nonce and the 256-bit key to generate a distinct subkey, and then employs the standard ChaCha20-Poly1305 construction used in TLS today.
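Here’s what that looks like as illustrative, unoptimized Python — the ChaCha20 block function from RFC 8439, plus the HChaCha20 step XChaCha uses to derive the subkey (function names are mine; use a vetted library in practice):

```python
import struct

MASK32 = 0xFFFFFFFF
CONSTANTS = struct.unpack("<4I", b"expand 32-byte k")

def rotl32(v: int, n: int) -> int:
    return ((v << n) | (v >> (32 - n))) & MASK32

def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32; s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32; s[b] = rotl32(s[b] ^ s[c], 7)

def rounds(state: list) -> list:
    s = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds (hence "ChaCha20")
        quarter_round(s, 0, 4, 8, 12); quarter_round(s, 1, 5, 9, 13)
        quarter_round(s, 2, 6, 10, 14); quarter_round(s, 3, 7, 11, 15)
        quarter_round(s, 0, 5, 10, 15); quarter_round(s, 1, 6, 11, 12)
        quarter_round(s, 2, 7, 8, 13); quarter_round(s, 3, 4, 9, 14)
    return s

def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """RFC 8439 ChaCha20 block function: 20 rounds plus the feed-forward add."""
    init = (list(CONSTANTS) + list(struct.unpack("<8I", key))
            + [counter & MASK32] + list(struct.unpack("<3I", nonce)))
    s = rounds(init)
    return struct.pack("<16I", *[(a + b) & MASK32 for a, b in zip(s, init)])

def hchacha20(key: bytes, nonce16: bytes) -> bytes:
    """HChaCha20 subkey derivation: same core, with the 16-byte nonce in
    place of counter + nonce, no feed-forward; emit words 0-3 and 12-15."""
    init = (list(CONSTANTS) + list(struct.unpack("<8I", key))
            + list(struct.unpack("<4I", nonce16)))
    s = rounds(init)
    return struct.pack("<8I", *(s[0:4] + s[12:16]))
```

XChaCha20-Poly1305 feeds the first 16 nonce bytes through `hchacha20` to get the subkey, then runs ordinary ChaCha20-Poly1305 with the remaining 8 nonce bytes; the block function above can be checked against the RFC 8439 test vectors.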
For application-layer cryptography, XChaCha20-Poly1305 contains most of the properties you’d want from an authenticated mode.
However, like AES-GCM (and all other Polynomial MACs I’ve heard of), it is not message committing.
The Gimli Permutation
For lightweight cryptography (n.b. important for IoT), the Gimli permutation (e.g. employed in libhydrogen) is an attractive option. Gimli is a Round 2 candidate in NIST’s Lightweight Cryptography project. The Gimli permutation supports a lot of applications: a hash function, message authentication, encryption, etc.
Critically, it’s possible to construct a message-committing protocol out of Gimli that will hit a lot of the performance goals important to embedded systems.
Closing Remarks
Despite my personal disdain for AES-GCM, if you’re using it as intended by cryptographers, it’s good enough. Don’t throw AES-GCM out just because of my opinions. It’s very likely the best option you have.
Although I personally dislike AES and GCM, I’m still deeply appreciative of the brilliance and ingenuity that went into both designs.
My desire is for the industry to improve upon AES and GCM in future cipher designs so we can protect more people, from a wider range of threats, in more diverse protocols, at a cheaper CPU/memory/time cost.
We wouldn’t have a secure modern Internet without the work of Vincent Rijmen, Joan Daemen, John Viega, David A. McGrew, and the countless other cryptographers and security researchers who made AES-GCM possible.
Change Log
- 2021-10-26: Added section on H Reuse and Multi-User Security.
https://soatok.blog/2020/05/13/why-aes-gcm-sucks/
#AES #AESGCM #cryptography #GaloisCounterMode #opinion #SecurityGuidance #symmetricCryptography