This is the first entry in a (potentially infinite) series of dead end roads in the field of cryptanalysis.
Cryptography engineering is one of many specialties within the wider field of security engineering. Security engineering is a discipline that chiefly concerns itself with studying how systems fail in order to build better systems–ones that are resilient to malicious acts or even natural disasters. It sounds much simpler than it is.
If you want to develop and securely implement a cryptography feature in the application you’re developing, it isn’t enough to learn how to implement textbook descriptions of cryptography primitives during your C.S. undergrad studies (or equivalent). An active interest in studying how cryptosystems fail is the prerequisite for being a cryptography engineer.
Thus, cryptography engineering and cryptanalysis research go hand-in-hand.
Pictured: How I feel when someone tells me about a novel cryptanalysis technique relevant to the algorithm or protocol I’m implementing. (Art by Khia.)
If you are interested in exploring the field of cryptanalysis–be it to contribute on the attack side of cryptography or to learn better defense mechanisms–you will undoubtedly encounter roads that seem enticing and not well-tread, and it might not be immediately obvious why the road is a dead end. Furthermore, beyond a few comparison tables on Wikipedia or obscure Stack Exchange questions, the cryptology literature is often sparse on details about why these avenues lead nowhere.
So let’s explore where some of these dead-end roads lead, and why they stop where they do.
(Art by Kyume.)
Length Extension Attacks
It’s difficult to provide a better summary of length extension attacks than what Skull Security wrote in 2012. However, that only addresses “What are they?”, “How do you use them?”, and “Which algorithms and constructions are vulnerable?”, but leaves out a more interesting question: “Why were they even possible to begin with?”
An Extensive Tale
Tale, not tail! (Art by Swizz.)
To really understand length extension attacks, you have to understand how cryptographic hash functions used to be designed. This might sound intimidating, but we don’t need to delve too deep into the internals.
A cryptographic hash function is a keyless pseudorandom transformation from a variable length input to a fixed-length output. Hash functions are typically used as building blocks for larger constructions (both reasonable ones like HMAC-SHA-256, and unreasonable ones like my hash-crypt project).
However, hash functions like SHA-256 are designed to operate on sequential blocks of input. This is because sometimes you need to stream data into a hash function rather than load it all into memory at once. (This is why you can sha256sum a file larger than your available RAM without crashing your computer or causing performance headaches.)
A streaming hash function API might look like this:
class MyCoolHash(BaseHashClass): @staticmethod def init(): """ Initialize the hash state. """ def update(data): """ Update the hash state with additional data. """ def digest(): """ Finalize the hash function. """ def compress(): """ (Private method.) """
To use it, you’d call hash = MyCoolHash.init()
and then chain together hash.update()
calls with data as you load it from disk or the network, until you’ve run out of data. Then you’d call digest()
and obtain the hash of the entire message.
There are two things to take away right now:
- You can call
update()
multiple times, and that’s valid. - Your data might not be an even multiple of the internal block size of the hash function. (More often than not, it won’t be!)
So what happens when you call digest()
and the amount of data you’ve passed to update()
is not an even multiple of the hash size?
For most hash functions, the answer is simple: Append some ISO/IEC 7816-4 padding until you get a full block, run that through a final iteration of the internal compression function–the same one that gets called on update()
–and then output the current internal state.
Let’s take a slightly deeper look at what a typical runtime would look like for the MyCoolHash class I sketched above:
hash = MyCoolHash.init()
- Initialize some variables to some constants (initialization vectors).
hash.update(blockOfData)
- Start with any buffered data (currently none), count up to 32 bytes. If you’ve reached this amount, invoke
compress()
on that data and clear the buffer. Otherwise, just append blockOfData to the currently buffered data. - For every 32 byte of data not yet touched by
compress()
, invokecompress()
on this block (updating the internal state). - If you have any leftover bytes, append to the internal buffer for the next invocation to process.
- Start with any buffered data (currently none), count up to 32 bytes. If you’ve reached this amount, invoke
hash.update(moreData)
- Same as before, except there might be some buffered data from step 2.
output = hash.digest()
- If you have any data left in the buffer, append a 0x80 byte followed by a bunch of 0x00 bytes of padding until you reach the block size. If you don’t, you have an entire block of padding (0x80 followed by 0x00s).
- Call
compress()
one last time. - Serialize the internal hash state as a byte array or hexadecimal-encoded string (depending on usage). Return that to the caller.
This is fairly general description that will hold for most older hash functions. Some details might be slightly wrong (subtly different padding scheme, whether or not to include a block of empty padding on digest()
invocations, etc.).
The details aren’t super important. Just the rhythm of the design.
init()
update()
- load buffer,
compress()
compress()
compress()
- …
- buffer remainder
- load buffer,
update()
- load buffer,
compress()
compress()
compress()
- …
- buffer remainder
- load buffer,
- …
digest()
- load buffer, pad,
compress()
- serialize internal state
- return
- load buffer, pad,
And thus, without having to know any of the details about what compress()
even looks like, the reason why length extension attacks were ever possible should leap out at you!
Art by Khia.
If it doesn’t, look closely at the difference between update()
and digest()
.
There are only two differences:
update()
doesn’t pad before callingcompress()
digest()
returns the internal state thatcompress()
always mutates
The reason length-extension attacks are possible is that, for some hash functions, the output of digest()
is its full internal state.
This means that you can run take an existing hash function and pretend it’s the internal state after an update()
call instead of a digest()
call by appending the padding and then, after calling compress()
, appending additional data of your choice.
The (F)Utility of Length Extension
Length-Extension Attacks are mostly used for attacking naive message authentication systems where someone attempts to authenticate a message (M) with a secret key (k), but they construct it like so:
auth_code = vulnerable_hash(k.append(M))
If this sounds like a very narrow use-case, that’s because it is. However, it still broke Flickr’s API once, and it’s a popular challenge for CTF competitions around the world.
Consequently, length-extension attacks are sometimes thought to be vulnerabilities of the construction rather than a vulnerability of the hash function. For a Message Authentication Code construction, these are classified under canonicalization attacks.
After all, even though SHA-256 is vulnerable to length-extension, but you can’t actually exploit it unless someone is using it in a vulnerable fashion.
That being said, it’s often common to say that hash functions like SHA-256 and SHA-512 are prone to length-extension.
Ways to Avoid Length-Extension Attacks
Use HMAC. HMAC was designed to prevent these kinds of attacks.
Alternatively, if you don’t have any cryptographic secrets, you can always do what bitcoin did: Hash your hash again.
return sha256(sha256(message))
Note: Don’t actually do that, it’s dangerous for other reasons. You also don’t want to take this to an extreme. If you iterate your hash too many times, you’ll reinvent PBKDF1 and its insecurity. Two is plenty.
Or you can do something really trivial (which ultimately became another standard option in the SHA-2 family of hash functions):
Always start with a 512-bit hash and then truncate your output so the attacker never recovers the entire internal state of your hash in order to extend it.
That’s why you’ll sometimes see SHA-512/224 and SHA-512/256 in a list of recommendations. This isn’t saying “use one or the other”, that’s the (rather confusing) notation for a standardized SHA-512 truncation.
Note: This is actually what SHA-384 has done all along, and that’s one of the reasons why you see SHA-384 used more than SHA-512.
If you want to be extra fancy, you can also just use a different hash function that isn’t vulnerable to length extension, such as SHA-3 or BLAKE2.
Questions and Answers
Art by Khia.
Why isn’t BLAKE2 vulnerable to length extension attacks?
Quite simply: It sets a flag in the internal hash state before compressing the final buffer.
If you try to deserialize this state then invoke update()
, you’ll get a different result than BLAKE2’s compress()
produced during digest()
.
For a secure hash function, a single bit of difference in the internal state should result in a wildly different output. (This is called the avalanche effect.)
Why isn’t SHA-3 vulnerable to length extension attacks?
SHA-3 is a sponge construction whose internal state is much larger than the hash function output. This prevents an attacker from recovering the hash function’s internal state from a message digest (similar to the truncated hash function discussed above).
Why don’t length-extension attacks break digital signature algorithms?
Digital signature algorithms–such as RSASSA, ECDSA, and EdDSA–take a cryptographic hash of a message and then perform some asymmetric cryptographic transformation of the hash with the secret key to produce a signature that can be verified with a public key. (The exact details are particular to the signature algorithm in question.)
Length-extension attacks only allow you to take a valid H(k || m) and produce a valid H(k || m || padding || extra) hash that will validate, even if you don’t know k. They don’t magically create collisions out of thin air.
Even if you use a weak hash function like SHA-1, knowing M and H(M) is not sufficient to calculate a valid signature. (You need to be able to know these values in order to verify the signature anyway.)
The security of digital signature algorithms depends entirely on the secrecy of the signing key and the security of the asymmetric cryptographic transformation used to generate a signature. (And its resilience to side-channel attacks.)
However, a more interesting class of attack is possible for systems that expect digital signatures to have similar properties as cryptographic hash functions. This would qualify as a protocol vulnerability, not a length-extension vulnerability.
TL;DR
Art by Khia.
Length-extension attacks exploit a neat property of a few cryptographic hash functions–most of which you shouldn’t be using in 2020 anyway (SHA-2 is still fine)–but can only be exploited by a narrow set of circumstances.
If you find yourself trying to use length-extension to break anything else, you’ve probably run into a cryptographic dead end and need to backtrack onto more interesting avenues of exploitation–of which there are assuredly many (unless your cryptography is boring).
Next: Timing Side-Channels
https://soatok.blog/2020/10/06/dead-ends-in-cryptanalysis-1-length-extension-attacks/
#cryptanalysis #crypto #cryptographicHashFunction #cryptography #lengthExtensionAttacks
A question I get asked frequently is, “How did you learn cryptography?”I could certainly tell everyone my history as a self-taught programmer who discovered cryptography when, after my website for my indie game projects kept getting hacked, I was introduced to cryptographic hash functions… but I suspect the question folks want answered is, “How would you recommend I learn cryptography?” rather than my cautionary tale about poorly-implemented password hash being a gateway bug.
The Traditional Ways to Learn
There are two traditional ways to learn cryptography.If you want a book to augment your journey in either traditional path, I recommend Serious Cryptography by Jean-Philippe Aumasson.
Academic Cryptography
The traditional academic way to learn cryptography involves a lot of self-study about number theory, linear algebra, discrete mathematics, probability, permutations, and field theory.You’d typically start off with classical ciphers (Caesar, etc.) then work your way through the history of ciphers until you finally reach an introduction to the math underpinning RSA and Diffie-Hellman, and maybe taught about Schneier’s Law and cautioned to only use AES and SHA-2… and then you’re left to your own devices unless you pursue a degree in cryptography.
The end result of people carelessly exploring this path is a lot of designs like Telegram’s MTProto that do stupid things with exotic block cipher modes and misusing vanilla cryptographic hash functions as message authentication codes; often with textbook a.k.a. unpadded RSA, AES in ECB, CBC, or some rarely-used mode that the author had to write custom code to handle (using ECB mode under the hood), and (until recently) SHA-1.
People who decide to pursue cryptography as a serious academic discipline will not make these mistakes. They’re far too apt for the common mistakes. Instead, they run the risk of spending years involved in esoteric research about homomorphic encryption, cryptographic pairings, and other cool stuff that might not see real world deployment (outside of novel cryptocurrency hobby projects) for five or more years.
That is to say: Academia is a valid path to pursue, but it’s not for everyone.
If you want to explore this path, Cryptography I by Dan Boneh is a great starting point.
Security Industry-Driven Cryptography
The other traditional way to learn cryptography is to break existing cryptography implementations. This isn’t always as difficult as it may sound: Reverse engineering video games to defeat anti-cheat protections has led several of my friends into learning about cryptography.For security-minded folks, the best place to start is the CryptoPals challenges. Another alternative is CryptoHack.
There are also plenty of CTF events all year around, but they’re rarely a good cryptography learning exercise above what CryptoPals offers. (Though there are notable exceptions.)
A Practical Approach to Learning Cryptography
Art by Kyume.If you’re coming from a computer programming background and want to learn cryptography, the traditional approaches carry the risk of Reasoning By Lego.
Instead, the approach I recommend is to start gaining experience with the safest, highest-level libraries and then slowly working your way down into the details.
This approach has two benefits:
- If you have to implement something while you’re still learning, your knowledge and experience is stilted towards “use something safe and secure” not “hack together something with Blowfish in ECB mode and MD5 because they’re familiar”.
- You can let your own curiosity guide your education rather than follow someone else’s study guide.
To illustrate what this looks like, here’s how a JavaScript developer might approach learning cryptography, starting from the most easy-mode library and drilling down into specifics.
Super Easy Mode: DholeCrypto
Disclaimer: This is my project.Dhole Crypto is an open source library, implemented in JavaScript and PHP and powered by libsodium, that tries to make security as easy as possible.
I designed Dhole Crypto for securing my own projects without increasing the cognitive load of anyone reviewing my code.
If you’re an experienced programmer, you should be able to successfully use Dhole Crypto in a Node.js/PHP project. If it does not come easy, that is a bug that should be fixed immediately.
Easy Mode: Libsodium
Using libsodium is slightly more involved than Dhole Crypto: Now you have to know what a nonce is, and take care to manage them carefully.Advantage: Your code will be faster than if you used Dhole Crypto.
Libsodium is still pretty easy. If you use this cheat sheet, you can implement something secure without much effort. If you deviate from the cheat sheet, pay careful attention to the documentation.
If you’re writing system software (i.e. programming in C), libsodium is an incredibly easy-to-use library.
Moderate Difficulty: Implementing Protocols
Let’s say you’re working on a project where libsodium is overkill, and you only need a few cryptography primitives and constructions (e.g. XChaCha20-Poly1305). A good example: In-browser JavaScript.Instead of forcing your users to download the entire Sodium library, you might opt to implement a compatible construction using JavaScript implementations of these primitives.
Since you have trusted implementations to test your construction against, this should be a comparatively low-risk effort (assuming the primitive implementations are also secure), but it’s not one that should be undertaken without all of the prior experience.
Note: At this stage you are not implementing the primitives, just using them.
Hard Difficulty: Designing Protocols and Constructions
Repeat after me: “I will not roll my own crypto before I’m ready.” Art by AtlasInu.To distinguish: TLS and Noise are protocols. AES-GCM and XChaCha20-Poly1305 are constructions.
Once you’ve implemented protocols and constructions, the next step in your self-education is to design new ones.
Maybe you want to combine XChaCha20 with a MAC based on the BLAKE3 hash function, with some sort of SIV to make the whole shebang nonce-misuse resistant?
You wouldn’t want to dive headfirst into cryptography protocol/construction design without all of the prior experience.
Very Hard Mode: Implementing Cryptographic Primitives
It’s not so much that cryptography primitives are hard to implement. You could fit RC4 in a tweet before they raised the character limit to 280. (Don’t use RC4 though!)The hard part is that they’re hard to implement securely. See also: LadderLeak.
Usually when you get to this stage in your education, you will have also picked up one or both of the traditional paths to augment your understanding. If not, you really should.
Nightmare Mode: Designing Cryptography Primitives
A lot of people like to dive straight into this stage early in their education. This usually ends in tears.If you’ve mastered every step in my prescribed outline and pursued both of the traditional paths to the point that you have a novel published attack in a peer-reviewed journal (and mirrored on ePrint), then you’re probably ready for this stage.
Bonus: If you’re a furry and you become a cryptography expert, you can call yourself a cryptografur. If you had no other reason to learn cryptography, do it just for pun!
Header art by circuitslime.
https://soatok.blog/2020/06/10/how-to-learn-cryptography-as-a-programmer/
#cryptography #education #programming #Technology