Historical Context of Iota’s Hash Functions
Once upon a time, researchers discovered that the hash function used within the Iota cryptocurrency (Curl-P) was vulnerable to practical collisions. When pressed about this, the Iota Foundation said the following:
https://twitter.com/durumcrustulum/status/1083859956841889792
In response to this research, the Iota developers threatened to sue the researchers.
https://twitter.com/matthew_d_green/status/965731647579664385?lang=en
Iota replaced Curl-P-27 with a hash function based on Keccak-384 they call Kerl. Keccak, you may recall, is a sponge function that went on to become SHA-3.
On its face, this sounds like a conservative choice in cryptography protocol design: Migrate from a backdoored hash function to one trusted and respected by cryptographers around the world.
https://twitter.com/cybergibbons/status/1283065819300331520
Iota even offered a bug bounty for attacks against reduced round variants:
https://twitter.com/durumcrustulum/status/1083860482866343936
Kerl isn’t Keccak-384 though. It’s Keccak-384 with a bit of an odd twist: They encode the input bytes into ternary ({0, 1} -> {-1, 0, 1}) before hashing. Apparently this weird obsession with ternary encoding is part of Iota’s schtick?
Practical Kerl Collisions
As a consequence of their weird ternary obsession, the following inputs all produce the same Kerl hash:
- GYOMKVTSNHVJNCNFBBAH9AAMXLPLLLROQY99QN9DLSJUHDPBLCFFAIQXZA9BKMBJCYSFHFPXAHDWZFEIZ
- GYOMKVTSNHVJNCNFBBAH9AAMXLPLLLROQY99QN9DLSJUHDPBLCFFAIQXZA9BKMBJCYSFHFPXAHDWZFEIH
- GYOMKVTSNHVJNCNFBBAH9AAMXLPLLLROQY99QN9DLSJUHDPBLCFFAIQXZA9BKMBJCYSFHFPXAHDWZFEIQ
This is a consequence of always zeroing out the last “trit” before passing the input to Keccak-384.
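To see why those three endings converge, here is a rough sketch in Python of the balanced-ternary tryte encoding (the alphabet and little-endian trit layout follow Iota's convention as I understand it; this is an illustration, not their actual library code):

```python
# Sketch of why "zero the last trit" creates trivial collisions.
# Tryte alphabet: '9' = 0, 'A'..'Z' = 1..26, with values above 13
# wrapping negative, so each tryte encodes three trits in {-1, 0, 1}.
TRYTE_ALPHABET = '9ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def tryte_to_trits(char):
    """Convert one tryte character to its three balanced-ternary trits."""
    value = TRYTE_ALPHABET.index(char)
    if value > 13:
        value -= 27  # wrap into the balanced range [-13, 13]
    trits = []
    for _ in range(3):
        remainder = ((value + 1) % 3) - 1  # balanced remainder in {-1, 0, 1}
        trits.append(remainder)
        value = (value - remainder) // 3
    return trits

# The final trytes of the three colliding inputs differ only in the last trit:
for char in 'HQZ':
    trits = tryte_to_trits(char)
    trits[-1] = 0  # Kerl zeroes the last trit of the input before hashing
    print(char, trits)  # each line prints: <char> [-1, 0, 0]
```

H encodes as (-1, 0, 1), Q as (-1, 0, -1), and Z as (-1, 0, 0); zeroing the last trit maps all three to the same trit sequence, so Keccak-384 sees identical input.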
Since this zeroing was an explicit design choice of Iota’s, I decided to helpfully submit a pull request adding these inputs as test vectors to their JavaScript implementation.
Why It Matters
These are the facts:
- Kerl is intended to replace Curl-P-27 within the Iota cryptocurrency.
- The Iota Foundation previously admitted that Curl-P-27 was a “copy-protection backdoor”.
- The ternary encoding in Kerl allows three different inputs to produce the same hash output (which is not true of Keccak-384).
Given the past behavior of the Iota developers, there are three possible explanations for this:
- It’s a bugdoor (a backdoor enabled by a bug) intended to be exploited by the Coordinator–possibly as another copy-protection backdoor in the spirit of Curl-P-27.
- They made a critical design mistake in Kerl, which may or may not be exploitable in one or more of the places they use Kerl.
- Their alleged justification for zeroing out the last trit is sound, and there’s no way to exploit this collision vulnerability within Iota.
The last explanation is the one that Iota fanboys will probably cling to, but let me be clear: Even if this isn’t exploitable within Iota, it’s still a huge fuck-up in the design of Kerl.
I’ve lost count of the number of projects I’ve encountered that used secp256k1 as their elliptic curve choice simply because it was also the curve Bitcoin used. These are protocols doing ECDSA for JSON Web Tokens, or ECDH for use in ECIES schemes.
The probability of someone eventually using Kerl outside of Iota is 1, and they will be vulnerable to collisions because of that choice.
Takeaways
Iota is a cryptocurrency project that threatens security researchers, intentionally backdoored the cryptographic hash function in their original design (for “copy-protection”) and admitted to it, and designed a replacement cryptographic hash function (Kerl) that is vulnerable to trivial collision attacks (but the underlying hash function, Keccak-384, is not).
I don’t need to connect the lines on this one, do I?
It’s really freaking obvious.
Since this attack also works on “reduced round” variants of Kerl, I guess I win their bug bounty too!
If the Iota developers are reading this, please remit payment for your promised bounty to the following ZEC address: zs1lwghzjazt4h53gwnl7f24tdq99kw7eh9hgh3qumdvcndszl7ml4xmsudcmm60dut2cfesxmlcec
That being said, I never got an answer to my question, “Should I report the security reduction in bits or trits?” I’m feeling generous though. Given the cost of a collision is basically 1 trit, this is a 242-trit security reduction against collision attacks in Kerl.
https://soatok.blog/2020/07/15/kerlissions-trivial-collisions-in-iotas-hash-function-kerl/
#collision #cryptanalysis #Cryptocurrency #cryptographicHashFunction #hashFunction #iota #kerl #kerlission
Previously on Dead Ends in Cryptanalysis, we talked about length-extension attacks and precisely why modern hash functions like SHA-3 and BLAKE2 aren’t susceptible.
The art and science of side-channel cryptanalysis is one of the subjects I’m deeply fascinated by, and it’s something you’ll hear me yap about a lot on this blog in the future.
Since my background before computer security was in web development, I spend a lot of time talking about timing side-channels in particular, as well as their mitigations (see also: constant-time-js).
Pictured: Me, when an interesting attack gets published on ePrint.
(Art by Khia.)
However, timing side-channels aren’t omnipotent. Even if your code isn’t constant-time, that doesn’t mean you necessarily have a vulnerability. Case in point:
Length Leaks Are Usually Nothing-Burgers
If you look closely at a constant-time string equality function, you’ll see some clause that looks like this:
if (left.length !== right.length) return false;
A common concern that crops up in bikeshedding discussions about the correct implementation of a constant-time compare is, “This will fail fast if two strings of non-equal length are provided; doesn’t this leak information about the strings being compared?”
Sure, but it won’t affect the security of the application that uses it. Consider a contrived example:
- You’re encrypting with AES-CTR then authenticating the ciphertext with HMAC-SHA256 (Encrypt then MAC).
- For added fun, let’s assume you’re using HKDF-HMAC-SHA512 with a 256-bit salt to derive separate encryption and MAC keys from the input key. This salt is prepended to the ciphertext and included as an input to the HMAC tag calculation. Now you don’t have to worry about cryptographic wear-out.
- You’re padding the plaintext to exactly 16 kilobytes prior to encryption, because the exact length of the plaintext is considered sensitive.
- You remove the padding after decryption.
- Your constant-time comparison is used to validate the HMAC tags.
Even though the length of your plaintext is sensitive, it doesn’t really matter that length mismatches leak here: The inputs to the constant-time compare are always HMAC-SHA256 outputs. They will always be 32 bytes (256 bits) long. This is public knowledge.
If you’ve somehow managed to design a protocol that depends on the secrecy of the length of a non-truncated HMAC-SHA256 output to be secure, you’ve probably fucked up something fierce.
However, if you were comparing the unpadded plaintexts with this function–or passing the unpadded plaintext to a hash function–you might have cause for concern.
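In Python, for instance, the tag check in the scheme sketched above boils down to something like this (the function name and layout are mine, for illustration):

```python
import hashlib
import hmac

def verify_tag(mac_key: bytes, salt_and_ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the HMAC-SHA256 tag and compare in constant time.

    Both inputs to compare_digest() are 32-byte HMAC-SHA256 outputs,
    so its fail-fast length check cannot leak anything sensitive.
    """
    expected = hmac.new(mac_key, salt_and_ciphertext, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

key = b'\x01' * 32  # example key; use a random secret in practice
blob = b'example salt || ciphertext'
tag = hmac.new(key, blob, hashlib.sha256).digest()
print(verify_tag(key, blob, tag))         # True
print(verify_tag(key, blob + b'!', tag))  # False
```

The sensitive plaintext length never reaches the comparison; only fixed-length tags do.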
“Double HMAC” is a defense against compiler/JIT optimizations, not length leaks.
(Art by Khia.)
When Do Timing Leaks Cause Impact?
Timing side-channels only lead to a vulnerability when they reveal some information about one of the secret inputs to a cryptographic function.
- Leaking how many leading bytes match when comparing HMACs can allow an attacker to forge a valid authentication tag for a chosen message–which often enables further attacks (e.g. padding oracles with AES-CBC + HMAC). The cryptographic secret is the correct authentication tag for a chosen message under a key known only to the defender.
- Leaking the number of leading zeroes introduces the risk of lattice attacks in TLS when used with Diffie-Hellman ciphersuites. See also: the Raccoon Attack. The cryptographic secret is the zero-trimmed shared secret, which is an input to a hash function.
- Leaking the secret number in the modular inverse step when calculating an ECDSA signature gives attackers enough information to recover the secret key. This can happen if you’re using non-constant-time arithmetic.
Timing attacks can even break state-of-the-art cryptography projects, like the algorithms submitted to NIST’s Post-Quantum Cryptography standardization effort:
https://twitter.com/EllipticKiwi/status/1295670085969838080
However–and this is important–if what leaks is a public input (i.e. something the attacker already knows anyway), then who cares?
(Art by Khia.)
Why Timing Leaks Don’t Break Signature Verification
If you’re reviewing some cryptography library and discover a timing leak in the elliptic curve signature verification function, you might feel tempted to file a vulnerability report with the maintainers of the library.
If so, you’re wasting your time and theirs, for two reasons:
- Signature verification is performed over public inputs (message, public key, signature).
- Knowing which byte the verification comparison fails on isn’t sufficient for forging a signature for a chosen message.
The first part is obvious (and discussed above), but the second might seem untrue at first: If HMAC breaks this way, why doesn’t ECDSA also suffer here?
The Anatomy of Elliptic Curve Digital Signatures
Elliptic curve signatures are usually encoded as a pair (r, s). How these numbers are derived and verified depends on the algorithm in question.
In the case of ECDSA, you calculate two numbers (u1, u2) based on the hash of the plaintext and r, both multiplied by the modular inverse of s (mod n): u1 = H(m) * s^-1 and u2 = r * s^-1. You then calculate a curve point u1*G + u2*Q based on the public key (Q). The signature is valid if and only if the x coordinate of that curve point is equal to r from the signature (and the point isn’t equal to the point at infinity).
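To make the u1/u2 arithmetic concrete, here is a toy sketch of ECDSA signing and verification over the classroom curve y^2 = x^3 + 2x + 2 mod 17 with group order 19 (a standard textbook example; hopelessly small on purpose, for illustration only):

```python
# Toy ECDSA over y^2 = x^3 + 2x + 2 (mod 17), generator G = (5, 1), |G| = 19.
# Never use parameters this small; this only illustrates the (r, s) math.
p, a = 17, 2
G, n = (5, 1), 19

def add(P, Q):
    """Elliptic curve point addition (None represents the point at infinity)."""
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return None
    if P == Q:
        lam = (3 * P[0] ** 2 + a) * pow(2 * P[1], -1, p) % p
    else:
        lam = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p) % p
    x = (lam * lam - P[0] - Q[0]) % p
    return (x, (lam * (P[0] - x) - P[1]) % p)

def mul(k, P):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

def sign(d, h, k):
    """ECDSA signing: r = x(k*G) mod n, s = (h + r*d) / k mod n."""
    r = mul(k, G)[0] % n
    s = (h + r * d) * pow(k, -1, n) % n
    return (r, s)

def verify(Q, h, sig):
    """Valid iff x(u1*G + u2*Q) == r, where u1 = h/s and u2 = r/s (mod n)."""
    r, s = sig
    u1 = h * pow(s, -1, n) % n
    u2 = r * pow(s, -1, n) % n
    X = add(mul(u1, G), mul(u2, Q))
    return X is not None and X[0] % n == r

d = 7                      # secret key
Q = mul(d, G)              # public key
sig = sign(d, h=11, k=5)   # h stands in for the message hash
print(verify(Q, 11, sig))  # True
```

Note that verification touches only public values (h, r, s, Q, G); the secret key d and nonce k appear on the signing side alone.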
Why Don’t Timing Attacks Do Anything Here?
Even with a timing leak on the string compare function in hand, you cannot easily find a valid (r, s) for a chosen message for two reasons:
- The derivation of s is effectively an All-Or-Nothing Transform based on secret inputs (the nonce k and the secret key d).
- The curve point equation u1*G + u2*Q multiplies the ratio r/s by the public key (because u2 = r * s^-1 mod n).
In order to calculate a valid (r, s) pair that will validate for a chosen message, you’d need to know the secret key d that corresponds to Q.
It’s not impossible to calculate this value, but it’s computationally infeasible: the security level is roughly half the key size, or one fourth the signature size. That is to say, 512-bit signatures, derived from 256-bit keys, have a security level of 128 bits.
Thus, timing leakage won’t let you perform an existential forgery here.
Aside: Don’t confuse signatures for MACs, as iMessage famously did.
(Art by Khia.)
Under What Conditions Could Timing Side-Channels Matter to ECDSA Verification?
Suppose you have a JSON Web Token library that’s vulnerable to the type confusion attack (wherein you can swap out the "alg":"ES256" with "alg":"HS256" and then use the public key as if it were an HMAC symmetric key).
In this hypothetical scenario, let’s say you’re using this JWT library in an OIDC-like configuration, where the identity provider signs tokens and the application verifies them, using a public key known to the application.
Also assume, for absolutely contrived reasons, that the public key is not known to the attacker.
If you had a timing attack that leaks the public key, that would be a viable (if horrendously slow) way to make the vulnerability exploitable.
However, even in this setup, the timing leak still doesn’t qualify as a real vulnerability. It merely defeats attempts at Security Through Obscurity. The real vulnerability is any JWT library that allows this attack (or alg=none).
Additionally, you can recover the public key if you have sufficient knowledge of the curve algorithm used, message signed, etc.–which you do if the algorithm is ES256–so you don’t really even need a timing leak for this. Consequently, timing leaks would only help you if the original algorithm was something custom and obscure to attackers.
(Aside: there are two possible public keys from each signature, so the signature alone isn’t sufficient for uniquely identifying public keys. If you’re hoping to reduce protocol bandwidth through this trick, it won’t work.)
TL;DR
In order for a timing leak to be useful for cryptanalysis, it cannot leak a publicly-known input to the cryptographic operation.
https://soatok.blog/2021/06/07/dead-ends-in-cryptanalysis-2-timing-side-channels/
#cryptanalysis #crypto #cryptography #deadEndsInCryptanalysis #ECDSA #sideChannels #Technology #timingAttacks
This is the first entry in a (potentially infinite) series of dead end roads in the field of cryptanalysis.
Cryptography engineering is one of many specialties within the wider field of security engineering. Security engineering is a discipline that chiefly concerns itself with studying how systems fail in order to build better systems–ones that are resilient to malicious acts or even natural disasters. It sounds much simpler than it is.
If you want to develop and securely implement a cryptography feature in the application you’re developing, it isn’t enough to learn how to implement textbook descriptions of cryptography primitives during your C.S. undergrad studies (or equivalent). An active interest in studying how cryptosystems fail is the prerequisite for being a cryptography engineer.
Thus, cryptography engineering and cryptanalysis research go hand-in-hand.
Pictured: How I feel when someone tells me about a novel cryptanalysis technique relevant to the algorithm or protocol I’m implementing. (Art by Khia.)
If you are interested in exploring the field of cryptanalysis–be it to contribute on the attack side of cryptography or to learn better defense mechanisms–you will undoubtedly encounter roads that seem enticing and not well-tread, and it might not be immediately obvious why the road is a dead end. Furthermore, beyond a few comparison tables on Wikipedia or obscure Stack Exchange questions, the cryptology literature is often sparse on details about why these avenues lead nowhere.
So let’s explore where some of these dead-end roads lead, and why they stop where they do.
(Art by Kyume.)
Length Extension Attacks
It’s difficult to provide a better summary of length extension attacks than what Skull Security wrote in 2012. However, that only addresses “What are they?”, “How do you use them?”, and “Which algorithms and constructions are vulnerable?”, but leaves out a more interesting question: “Why were they even possible to begin with?”
An Extensive Tale
Tale, not tail! (Art by Swizz.)
To really understand length extension attacks, you have to understand how cryptographic hash functions used to be designed. This might sound intimidating, but we don’t need to delve too deep into the internals.
A cryptographic hash function is a keyless pseudorandom transformation from a variable length input to a fixed-length output. Hash functions are typically used as building blocks for larger constructions (both reasonable ones like HMAC-SHA-256, and unreasonable ones like my hash-crypt project).
However, hash functions like SHA-256 are designed to operate on sequential blocks of input. This is because sometimes you need to stream data into a hash function rather than load it all into memory at once. (This is why you can sha256sum a file larger than your available RAM without crashing your computer or causing performance headaches.)
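With Python's hashlib, for example, that streaming shape looks like this (the helper name is mine):

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file of any size without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Feeding the data in chunks produces the same digest as hashing it all at once; only the peak memory usage differs.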
A streaming hash function API might look like this:
class MyCoolHash(BaseHashClass):
    @staticmethod
    def init():
        """ Initialize the hash state. """

    def update(data):
        """ Update the hash state with additional data. """

    def digest():
        """ Finalize the hash function. """

    def compress():
        """ (Private method.) """
To use it, you’d call hash = MyCoolHash.init() and then chain together hash.update() calls with data as you load it from disk or the network, until you’ve run out of data. Then you’d call digest() and obtain the hash of the entire message.
There are two things to take away right now:
- You can call update() multiple times, and that’s valid.
- Your data might not be an even multiple of the internal block size of the hash function. (More often than not, it won’t be!)
So what happens when you call digest() and the amount of data you’ve passed to update() is not an even multiple of the block size?
For most hash functions, the answer is simple: Append some ISO/IEC 7816-4 padding until you get a full block, run that through a final iteration of the internal compression function–the same one that gets called on update()–and then output the current internal state.
Let’s take a slightly deeper look at what a typical runtime would look like for the MyCoolHash class I sketched above:
hash = MyCoolHash.init()
- Initialize some variables to some constants (initialization vectors).
hash.update(blockOfData)
- Start with any buffered data (currently none), count up to 32 bytes. If you’ve reached this amount, invoke compress() on that data and clear the buffer. Otherwise, just append blockOfData to the currently buffered data.
- For every 32 bytes of data not yet touched by compress(), invoke compress() on this block (updating the internal state).
- If you have any leftover bytes, append them to the internal buffer for the next invocation to process.
hash.update(moreData)
- Same as before, except there might be some buffered data from step 2.
output = hash.digest()
- If you have any data left in the buffer, append a 0x80 byte followed by a bunch of 0x00 bytes of padding until you reach the block size. If you don’t, you have an entire block of padding (0x80 followed by 0x00s).
- Call compress() one last time.
- Serialize the internal hash state as a byte array or hexadecimal-encoded string (depending on usage). Return that to the caller.
This is a fairly general description that will hold for most older hash functions. Some details might be slightly wrong (subtly different padding scheme, whether or not to include a block of empty padding on digest() invocations, etc.).
The details aren’t super important. Just the rhythm of the design:
init()
update()
- load buffer, compress()
- compress(), compress(), …
- buffer remainder
update()
- load buffer, compress()
- compress(), compress(), …
- buffer remainder
- …
digest()
- load buffer, pad, compress()
- serialize internal state
- return
And thus, without having to know any of the details about what compress() even looks like, the reason why length extension attacks were ever possible should leap out at you!
(Art by Khia.)
If it doesn’t, look closely at the difference between update() and digest(). There are only two differences:
- update() doesn’t pad before calling compress()
- digest() returns the internal state that compress() always mutates
The reason length-extension attacks are possible is that, for some hash functions, the output of digest() is its full internal state.
This means that you can take an existing hash output and pretend it’s the internal state after an update() call instead of a digest() call: append the padding and then, after calling compress(), append additional data of your choice.
The (F)Utility of Length Extension
Length-Extension Attacks are mostly used for attacking naive message authentication systems where someone attempts to authenticate a message (M) with a secret key (k), but they construct it like so:
auth_code = vulnerable_hash(k.append(M))
If this sounds like a very narrow use-case, that’s because it is. However, it still broke Flickr’s API once, and it’s a popular challenge for CTF competitions around the world.
Consequently, length-extension attacks are sometimes thought to be vulnerabilities of the construction rather than a vulnerability of the hash function. For a Message Authentication Code construction, these are classified under canonicalization attacks.
After all, even though SHA-256 is vulnerable to length-extension, you can’t actually exploit it unless someone is using it in a vulnerable fashion.
That being said, it’s common to say that hash functions like SHA-256 and SHA-512 are prone to length-extension.
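To see the mechanics end-to-end, here is a self-contained sketch using a deliberately insecure toy Merkle–Damgård-style hash. The compression function is invented for illustration; real attacks resume MD5/SHA-1/SHA-2 state instead, and real padding also encodes the message length, which the attacker simply includes in the glue.

```python
import struct

BLOCK = 16  # toy block size in bytes

def compress(state: int, block: bytes) -> int:
    """Invented toy compression function (NOT secure), MD-style shape."""
    for (word,) in struct.iter_unpack('<Q', block):
        state = (state * 0x100000001b3 ^ word) % (1 << 64)
    return state

def pad(length: int) -> bytes:
    """ISO/IEC 7816-4 style: 0x80, then zeroes to the next block boundary."""
    return b'\x80' + b'\x00' * ((BLOCK - (length + 1) % BLOCK) % BLOCK)

def toy_hash(message: bytes, state: int = 0xcbf29ce484222325) -> int:
    """The digest IS the full internal state -- that's the flaw."""
    message += pad(len(message))
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i + BLOCK])
    return state

# Naive MAC: hash(key || message).
key = b'secret-key-16b!!'   # the attacker never sees this
msg = b'count=10'
mac = toy_hash(key + msg)   # the attacker sees mac and len(key + msg)

# Length extension: resume from the published digest as if it were
# mid-update() state, accounting for the glue padding.
glue = pad(len(key) + len(msg))
extra = b'&count=9999'
forged = toy_hash(extra, state=mac)

assert forged == toy_hash(key + msg + glue + extra)  # forgery verifies!
```

The attacker computed a valid MAC for key || msg || glue || extra without ever learning the key.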
Ways to Avoid Length-Extension Attacks
Use HMAC. HMAC was designed to prevent these kinds of attacks.
Alternatively, if you don’t have any cryptographic secrets, you can always do what bitcoin did: Hash your hash again.
return sha256(sha256(message))
Note: Don’t actually do that, it’s dangerous for other reasons. You also don’t want to take this to an extreme. If you iterate your hash too many times, you’ll reinvent PBKDF1 and its insecurity. Two is plenty.
Or you can do something really trivial (which ultimately became another standard option in the SHA-2 family of hash functions):
Always start with a 512-bit hash and then truncate your output so the attacker never recovers the entire internal state of your hash in order to extend it.
That’s why you’ll sometimes see SHA-512/224 and SHA-512/256 in a list of recommendations. This isn’t saying “use one or the other”, that’s the (rather confusing) notation for a standardized SHA-512 truncation.
Note: This is actually what SHA-384 has done all along, and that’s one of the reasons why you see SHA-384 used more than SHA-512.
If you want to be extra fancy, you can also just use a different hash function that isn’t vulnerable to length extension, such as SHA-3 or BLAKE2.
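In Python terms, those defenses look roughly like this (the key and message are placeholders):

```python
import hashlib
import hmac

key = b'\x02' * 32   # example secret; use a random key in practice
message = b'count=10'

# 1. HMAC: the nested hash invocations hide the inner state.
tag = hmac.new(key, message, hashlib.sha256).digest()

# 2. Truncation: publish only 256 of SHA-512's 512 state bits.
trunc = hashlib.sha512(message).digest()[:32]

# 3. A hash without the flaw: keyed BLAKE2 (or SHA-3 via hashlib.sha3_256).
tag2 = hashlib.blake2b(message, key=key, digest_size=32).digest()
```

One caveat: ad-hoc truncation of SHA-512 is not the same function as the standardized SHA-512/256, which uses different initialization vectors; both, however, deny the attacker the full internal state.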
Questions and Answers
(Art by Khia.)
Why isn’t BLAKE2 vulnerable to length extension attacks?
Quite simply: It sets a flag in the internal hash state before compressing the final buffer.
If you try to deserialize this state then invoke update(), you’ll get a different result than BLAKE2’s compress() produced during digest().
For a secure hash function, a single bit of difference in the internal state should result in a wildly different output. (This is called the avalanche effect.)
Why isn’t SHA-3 vulnerable to length extension attacks?
SHA-3 is a sponge construction whose internal state is much larger than the hash function output. This prevents an attacker from recovering the hash function’s internal state from a message digest (similar to the truncated hash function discussed above).
Why don’t length-extension attacks break digital signature algorithms?
Digital signature algorithms–such as RSASSA, ECDSA, and EdDSA–take a cryptographic hash of a message and then perform some asymmetric cryptographic transformation of the hash with the secret key to produce a signature that can be verified with a public key. (The exact details are particular to the signature algorithm in question.)
Length-extension attacks only allow you to take a valid H(k || m) and produce a valid H(k || m || padding || extra) hash that will validate, even if you don’t know k. They don’t magically create collisions out of thin air.
Even if you use a weak hash function like SHA-1, knowing M and H(M) is not sufficient to calculate a valid signature. (You need to be able to know these values in order to verify the signature anyway.)
The security of digital signature algorithms depends entirely on the secrecy of the signing key and the security of the asymmetric cryptographic transformation used to generate a signature. (And its resilience to side-channel attacks.)
However, a more interesting class of attack is possible for systems that expect digital signatures to have similar properties as cryptographic hash functions. This would qualify as a protocol vulnerability, not a length-extension vulnerability.
TL;DR
(Art by Khia.)
Length-extension attacks exploit a neat property of a few cryptographic hash functions–most of which you shouldn’t be using in 2020 anyway (SHA-2 is still fine)–but can only be exploited by a narrow set of circumstances.
If you find yourself trying to use length-extension to break anything else, you’ve probably run into a cryptographic dead end and need to backtrack onto more interesting avenues of exploitation–of which there are assuredly many (unless your cryptography is boring).
Next: Timing Side-Channels
https://soatok.blog/2020/10/06/dead-ends-in-cryptanalysis-1-length-extension-attacks/
#cryptanalysis #crypto #cryptographicHashFunction #cryptography #lengthExtensionAttacks
This is the first entry in a (potentially infinite) series of dead end roads in the field of cryptanalysis.
Cryptography engineering is one of many specialties within the wider field of security engineering. Security engineering is a discipline that chiefly concerns itself with studying how systems fail in order to build better systems–ones that are resilient to malicious acts or even natural disasters. It sounds much simpler than it is.
If you want to develop and securely implement a cryptography feature in the application you’re developing, it isn’t enough to learn how to implement textbook descriptions of cryptography primitives during your C.S. undergrad studies (or equivalent). An active interest in studying how cryptosystems fail is the prerequisite for being a cryptography engineer.
Thus, cryptography engineering and cryptanalysis research go hand-in-hand.
Pictured: How I feel when someone tells me about a novel cryptanalysis technique relevant to the algorithm or protocol I’m implementing. (Art by Khia.)
If you are interested in exploring the field of cryptanalysis–be it to contribute on the attack side of cryptography or to learn better defense mechanisms–you will undoubtedly encounter roads that seem enticing and not well-tread, and it might not be immediately obvious why the road is a dead end. Furthermore, beyond a few comparison tables on Wikipedia or obscure Stack Exchange questions, the cryptology literature is often sparse on details about why these avenues lead nowhere.
So let’s explore where some of these dead-end roads lead, and why they stop where they do.
(Art by Kyume.)
Length Extension Attacks
It’s difficult to provide a better summary of length extension attacks than what Skull Security wrote in 2012. However, that only addresses “What are they?”, “How do you use them?”, and “Which algorithms and constructions are vulnerable?”, but leaves out a more interesting question: “Why were they even possible to begin with?”
An Extensive Tale
Tale, not tail! (Art by Swizz.)
To really understand length extension attacks, you have to understand how cryptographic hash functions used to be designed. This might sound intimidating, but we don’t need to delve too deep into the internals.
A cryptographic hash function is a keyless pseudorandom transformation from a variable length input to a fixed-length output. Hash functions are typically used as building blocks for larger constructions (both reasonable ones like HMAC-SHA-256, and unreasonable ones like my hash-crypt project).
However, hash functions like SHA-256 are designed to operate on sequential blocks of input. This is because sometimes you need to stream data into a hash function rather than load it all into memory at once. (This is why you can sha256sum a file larger than your available RAM without crashing your computer or causing performance headaches.)
A streaming hash function API might look like this:
class MyCoolHash(BaseHashClass): @staticmethod def init(): """ Initialize the hash state. """ def update(data): """ Update the hash state with additional data. """ def digest(): """ Finalize the hash function. """ def compress(): """ (Private method.) """
To use it, you’d call `hash = MyCoolHash.init()` and then chain together `hash.update()` calls with data as you load it from disk or the network, until you’ve run out of data. Then you’d call `digest()` and obtain the hash of the entire message.
There are two things to take away right now:

- You can call `update()` multiple times, and that’s valid.
- Your data might not be an even multiple of the internal block size of the hash function. (More often than not, it won’t be!)
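You can check the first point with Python’s hashlib: splitting the input across `update()` calls, at arbitrary chunk boundaries, yields the same digest as hashing everything at once.

```python
import hashlib

h = hashlib.sha256()
h.update(b"hello, ")   # chunks need not align to SHA-256's 64-byte blocks
h.update(b"world")

one_shot = hashlib.sha256(b"hello, world").hexdigest()
assert h.hexdigest() == one_shot
```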
So what happens when you call `digest()` and the amount of data you’ve passed to `update()` is not an even multiple of the block size?

For most hash functions, the answer is simple: Append some ISO/IEC 7816-4 padding until you get a full block, run that through a final iteration of the internal compression function (the same one that gets called on `update()`), and then output the current internal state.
Let’s take a slightly deeper look at what a typical runtime would look like for the MyCoolHash class I sketched above:
1. `hash = MyCoolHash.init()`
   - Initialize some variables to some constants (initialization vectors).
2. `hash.update(blockOfData)`
   - Start with any buffered data (currently none), count up to 32 bytes. If you’ve reached this amount, invoke `compress()` on that data and clear the buffer. Otherwise, just append blockOfData to the currently buffered data.
   - For every 32 bytes of data not yet touched by `compress()`, invoke `compress()` on that block (updating the internal state).
   - If you have any leftover bytes, append them to the internal buffer for the next invocation to process.
3. `hash.update(moreData)`
   - Same as before, except there might be some buffered data from step 2.
4. `output = hash.digest()`
   - If you have any data left in the buffer, append a 0x80 byte followed by a bunch of 0x00 bytes of padding until you reach the block size. If you don’t, you have an entire block of padding (0x80 followed by 0x00s).
   - Call `compress()` one last time.
   - Serialize the internal hash state as a byte array or hexadecimal-encoded string (depending on usage). Return that to the caller.
This is a fairly general description that will hold for most older hash functions. Some details might be slightly wrong (subtly different padding scheme, whether or not to include a block of empty padding on `digest()` invocations, etc.).
The details aren’t super important. Just the rhythm of the design.
1. `init()`
2. `update()`
   - load buffer, `compress()`
   - `compress()`
   - `compress()`
   - …
   - buffer remainder
3. `update()`
   - load buffer, `compress()`
   - `compress()`
   - `compress()`
   - …
   - buffer remainder
4. …
5. `digest()`
   - load buffer, pad, `compress()`
   - serialize internal state
   - return
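Putting that rhythm into runnable form, here’s a toy version of the MyCoolHash sketch. (This is purely illustrative: SHA-256 over `state || block` is a hypothetical stand-in for a real compression function, and the 32-byte block size matches the walkthrough above.)

```python
import hashlib

BLOCK = 32  # internal block size in bytes, matching the walkthrough

class MyCoolHash:
    """Toy Merkle-Damgard-style hash. The 'compression function' is just
    SHA-256 over (state || block) -- a stand-in, not a real design."""

    def __init__(self):
        self.state = b"\x00" * 32  # initialization vector
        self.buffer = b""

    def update(self, data):
        self.buffer += data
        # Compress every full block; keep the remainder buffered.
        while len(self.buffer) >= BLOCK:
            self._compress(self.buffer[:BLOCK])
            self.buffer = self.buffer[BLOCK:]
        return self

    def digest(self):
        # ISO/IEC 7816-4 padding: 0x80, then 0x00s up to the block boundary.
        padded = self.buffer + b"\x80" + b"\x00" * (BLOCK - len(self.buffer) - 1)
        self._compress(padded)
        return self.state  # note: the output IS the full internal state

    def _compress(self, block):
        self.state = hashlib.sha256(self.state + block).digest()

# Splitting input across update() calls doesn't change the result:
a = MyCoolHash().update(b"hello, ").update(b"world" * 10).digest()
b = MyCoolHash().update(b"hello, " + b"world" * 10).digest()
assert a == b
```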
And thus, without having to know any of the details about what `compress()` even looks like, the reason why length extension attacks were ever possible should leap out at you!
Art by Khia.
If it doesn’t, look closely at the difference between `update()` and `digest()`.
There are only two differences:

- `update()` doesn’t pad before calling `compress()`
- `digest()` returns the internal state that `compress()` always mutates
The reason length-extension attacks are possible is that, for some hash functions, the output of `digest()` is its full internal state.

This means that you can take an existing hash and pretend it’s the internal state after an `update()` call instead of a `digest()` call, by appending the padding and then, after calling `compress()`, appending additional data of your choice.
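To make that concrete, here’s a self-contained sketch of the attack against a hypothetical hash with ISO/IEC 7816-4 padding (SHA-256 stands in for the internal compression function; real targets like MD5, SHA-1, and SHA-256 also encode the message length in their final padding, which the attacker simply reproduces the same way):

```python
import hashlib

BLOCK = 64  # bytes per block of our toy hash

def pad(total_len):
    # ISO/IEC 7816-4 style: 0x80, then 0x00s up to a block boundary.
    return b"\x80" + b"\x00" * ((BLOCK - (total_len + 1) % BLOCK) % BLOCK)

def compress(state, block):
    # Stand-in compression function: hash the state with the block.
    return hashlib.sha256(state + block).digest()

def toy_hash(msg):
    data = msg + pad(len(msg))
    state = b"\x00" * 32  # fixed initialization vector
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state  # digest() leaks the entire internal state

# The victim computes a naive MAC: toy_hash(key || message).
key = b"s3cr3t-key-bytes"
msg = b"amount=100&to=alice"
mac = toy_hash(key + msg)

# The attacker knows mac, msg, and len(key) -- but NOT the key itself.
extra = b"&to=mallory"
glue = pad(len(key) + len(msg))   # the padding the victim's hash applied
forged_msg = msg + glue + extra   # the message the victim will accept

# Resume hashing from the leaked internal state:
total = len(key) + len(msg) + len(glue) + len(extra)
tail = extra + pad(total)
state = mac
for i in range(0, len(tail), BLOCK):
    state = compress(state, tail[i:i + BLOCK])
forged_mac = state

# The forgery validates without ever knowing the key:
assert forged_mac == toy_hash(key + forged_msg)
```

Note that the only key-dependent value the attacker needs is `len(key)`, which is usually guessable by trying a handful of plausible lengths.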
The (F)Utility of Length Extension
Length-Extension Attacks are mostly used for attacking naive message authentication systems where someone attempts to authenticate a message (M) with a secret key (k), but they construct it like so:
auth_code = vulnerable_hash(k.append(M))
If this sounds like a very narrow use-case, that’s because it is. However, it still broke Flickr’s API once, and it’s a popular challenge for CTF competitions around the world.
Consequently, length-extension attacks are sometimes thought to be vulnerabilities of the construction rather than a vulnerability of the hash function. For a Message Authentication Code construction, these are classified under canonicalization attacks.
After all, even though SHA-256 is vulnerable to length-extension, you can’t actually exploit it unless someone is using it in a vulnerable fashion.

That being said, it’s common to say that hash functions like SHA-256 and SHA-512 are prone to length-extension.
Ways to Avoid Length-Extension Attacks
Use HMAC. HMAC was designed to prevent these kinds of attacks.
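In Python, for instance, that’s the standard library’s hmac module (a minimal sketch; key and message values are made up):

```python
import hashlib
import hmac

key = b"super-secret-key"       # hypothetical key
msg = b"amount=100&to=alice"

tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

# Verify with a constant-time comparison to avoid timing leaks:
expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
assert hmac.compare_digest(tag, expected)
```

HMAC’s nested structure, H(k ⊕ opad || H(k ⊕ ipad || m)), is what defeats length-extension: extending the inner hash does an attacker no good, because the outer keyed hash re-processes its output.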
Alternatively, if you don’t have any cryptographic secrets, you can always do what bitcoin did: Hash your hash again.
return sha256(sha256(message))
Note: Don’t actually do that, it’s dangerous for other reasons. You also don’t want to take this to an extreme. If you iterate your hash too many times, you’ll reinvent PBKDF1 and its insecurity. Two is plenty.
Or you can do something really trivial (which ultimately became another standard option in the SHA-2 family of hash functions):
Always start with a 512-bit hash and then truncate your output so the attacker never recovers the entire internal state of your hash in order to extend it.
That’s why you’ll sometimes see SHA-512/224 and SHA-512/256 in a list of recommendations. This isn’t saying “use one or the other”, that’s the (rather confusing) notation for a standardized SHA-512 truncation.
Note: This is actually what SHA-384 has done all along, and that’s one of the reasons why you see SHA-384 used more than SHA-512.
If you want to be extra fancy, you can also just use a different hash function that isn’t vulnerable to length extension, such as SHA-3 or BLAKE2.
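All of these options ship in Python’s hashlib, if you want to poke at them (a quick sketch):

```python
import hashlib

msg = b"hello, world"

# Sponge construction: output reveals only part of the 1600-bit state.
sha3 = hashlib.sha3_384(msg).digest()

# BLAKE2 flags its final compression, so its state can't be resumed.
b2 = hashlib.blake2b(msg, digest_size=48).digest()

# SHA-384: SHA-512 truncated (with distinct IVs), hiding 128 state bits.
trunc = hashlib.sha384(msg).digest()

assert len(sha3) == len(b2) == len(trunc) == 48
```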
Questions and Answers
Art by Khia.
Why isn’t BLAKE2 vulnerable to length extension attacks?
Quite simply: It sets a flag in the internal hash state before compressing the final buffer.
If you try to deserialize this state then invoke `update()`, you’ll get a different result than BLAKE2’s `compress()` produced during `digest()`.
For a secure hash function, a single bit of difference in the internal state should result in a wildly different output. (This is called the avalanche effect.)
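You can observe the avalanche effect directly: flip a single input bit and roughly half of the output bits change. A quick demonstration with SHA-256 (the exact count varies by input, but it hovers around 128 of 256 bits):

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    # Count how many bits differ between two equal-length byte strings.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

m1 = b"attack at dawn"
m2 = bytearray(m1)
m2[0] ^= 0x01  # flip a single input bit

d1 = hashlib.sha256(m1).digest()
d2 = hashlib.sha256(bytes(m2)).digest()

diff = bit_difference(d1, d2)
assert 90 < diff < 166  # close to half of the 256 output bits
```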
Why isn’t SHA-3 vulnerable to length extension attacks?
SHA-3 is a sponge construction whose internal state is much larger than the hash function output. This prevents an attacker from recovering the hash function’s internal state from a message digest (similar to the truncated hash function discussed above).
Why don’t length-extension attacks break digital signature algorithms?
Digital signature algorithms–such as RSASSA, ECDSA, and EdDSA–take a cryptographic hash of a message and then perform some asymmetric cryptographic transformation of the hash with the secret key to produce a signature that can be verified with a public key. (The exact details are particular to the signature algorithm in question.)
Length-extension attacks only allow you to take a valid H(k || m) and produce a valid H(k || m || padding || extra) hash that will validate, even if you don’t know k. They don’t magically create collisions out of thin air.
Even if you use a weak hash function like SHA-1, knowing M and H(M) is not sufficient to calculate a valid signature. (You need to be able to know these values in order to verify the signature anyway.)
The security of digital signature algorithms depends entirely on the secrecy of the signing key and the security of the asymmetric cryptographic transformation used to generate a signature. (And its resilience to side-channel attacks.)
However, a more interesting class of attack is possible for systems that expect digital signatures to have similar properties as cryptographic hash functions. This would qualify as a protocol vulnerability, not a length-extension vulnerability.
TL;DR
Art by Khia.
Length-extension attacks exploit a neat property of a few cryptographic hash functions–most of which you shouldn’t be using in 2020 anyway (SHA-2 is still fine)–but can only be exploited under a narrow set of circumstances.
If you find yourself trying to use length-extension to break anything else, you’ve probably run into a cryptographic dead end and need to backtrack onto more interesting avenues of exploitation–of which there are assuredly many (unless your cryptography is boring).
Next: Timing Side-Channels
https://soatok.blog/2020/10/06/dead-ends-in-cryptanalysis-1-length-extension-attacks/
#cryptanalysis #crypto #cryptographicHashFunction #cryptography #lengthExtensionAttacks
A question I get asked frequently is, “How did you learn cryptography?”

I could certainly tell everyone my history as a self-taught programmer who discovered cryptography when, after my website for my indie game projects kept getting hacked, I was introduced to cryptographic hash functions… but I suspect the question folks want answered is, “How would you recommend I learn cryptography?” rather than my cautionary tale about a poorly-implemented password hash being a gateway bug.
The Traditional Ways to Learn
There are two traditional ways to learn cryptography.

If you want a book to augment your journey in either traditional path, I recommend Serious Cryptography by Jean-Philippe Aumasson.
Academic Cryptography
The traditional academic way to learn cryptography involves a lot of self-study about number theory, linear algebra, discrete mathematics, probability, permutations, and field theory.

You’d typically start off with classical ciphers (Caesar, etc.) then work your way through the history of ciphers until you finally reach an introduction to the math underpinning RSA and Diffie-Hellman, and maybe get taught about Schneier’s Law and cautioned to only use AES and SHA-2… and then you’re left to your own devices unless you pursue a degree in cryptography.
The end result of people carelessly exploring this path is a lot of designs like Telegram’s MTProto that do stupid things with exotic block cipher modes and misuse vanilla cryptographic hash functions as message authentication codes; often with textbook (a.k.a. unpadded) RSA; AES in ECB, CBC, or some rarely-used mode that the author had to write custom code to handle (using ECB mode under the hood); and (until recently) SHA-1.
People who decide to pursue cryptography as a serious academic discipline will not make these mistakes; they’re far too well-trained for the common pitfalls. Instead, they run the risk of spending years involved in esoteric research about homomorphic encryption, cryptographic pairings, and other cool stuff that might not see real-world deployment (outside of novel cryptocurrency hobby projects) for five or more years.
That is to say: Academia is a valid path to pursue, but it’s not for everyone.
If you want to explore this path, Cryptography I by Dan Boneh is a great starting point.
Security Industry-Driven Cryptography
The other traditional way to learn cryptography is to break existing cryptography implementations. This isn’t always as difficult as it may sound: Reverse engineering video games to defeat anti-cheat protections has led several of my friends into learning about cryptography.

For security-minded folks, the best place to start is the CryptoPals challenges. Another alternative is CryptoHack.
There are also plenty of CTF events all year around, but they’re rarely a good cryptography learning exercise above what CryptoPals offers. (Though there are notable exceptions.)
A Practical Approach to Learning Cryptography
Art by Kyume.

If you’re coming from a computer programming background and want to learn cryptography, the traditional approaches carry the risk of Reasoning By Lego.
Instead, the approach I recommend is to start gaining experience with the safest, highest-level libraries and then slowly working your way down into the details.
This approach has two benefits:
- If you have to implement something while you’re still learning, your knowledge and experience is stilted towards “use something safe and secure” not “hack together something with Blowfish in ECB mode and MD5 because they’re familiar”.
- You can let your own curiosity guide your education rather than follow someone else’s study guide.
To illustrate what this looks like, here’s how a JavaScript developer might approach learning cryptography, starting from the most easy-mode library and drilling down into specifics.
Super Easy Mode: DholeCrypto
Disclaimer: This is my project.

Dhole Crypto is an open source library, implemented in JavaScript and PHP and powered by libsodium, that tries to make security as easy as possible.
I designed Dhole Crypto for securing my own projects without increasing the cognitive load of anyone reviewing my code.
If you’re an experienced programmer, you should be able to successfully use Dhole Crypto in a Node.js/PHP project. If it does not come easy, that is a bug that should be fixed immediately.
Easy Mode: Libsodium
Using libsodium is slightly more involved than Dhole Crypto: Now you have to know what a nonce is, and take care to manage them carefully.

Advantage: Your code will be faster than if you used Dhole Crypto.
Libsodium is still pretty easy. If you use this cheat sheet, you can implement something secure without much effort. If you deviate from the cheat sheet, pay careful attention to the documentation.
If you’re writing system software (i.e. programming in C), libsodium is an incredibly easy-to-use library.
Moderate Difficulty: Implementing Protocols
Let’s say you’re working on a project where libsodium is overkill, and you only need a few cryptography primitives and constructions (e.g. XChaCha20-Poly1305). A good example: In-browser JavaScript.

Instead of forcing your users to download the entire Sodium library, you might opt to implement a compatible construction using JavaScript implementations of these primitives.
Since you have trusted implementations to test your construction against, this should be a comparatively low-risk effort (assuming the primitive implementations are also secure), but it’s not one that should be undertaken without all of the prior experience.
Note: At this stage you are not implementing the primitives, just using them.
Hard Difficulty: Designing Protocols and Constructions
Repeat after me: “I will not roll my own crypto before I’m ready.” Art by AtlasInu.

To distinguish: TLS and Noise are protocols. AES-GCM and XChaCha20-Poly1305 are constructions.
Once you’ve implemented protocols and constructions, the next step in your self-education is to design new ones.
Maybe you want to combine XChaCha20 with a MAC based on the BLAKE3 hash function, with some sort of SIV to make the whole shebang nonce-misuse resistant?
You wouldn’t want to dive headfirst into cryptography protocol/construction design without all of the prior experience.
Very Hard Mode: Implementing Cryptographic Primitives
It’s not so much that cryptography primitives are hard to implement. You could fit RC4 in a tweet before they raised the character limit to 280. (Don’t use RC4 though!)

The hard part is that they’re hard to implement securely. See also: LadderLeak.
Usually when you get to this stage in your education, you will have also picked up one or both of the traditional paths to augment your understanding. If not, you really should.
Nightmare Mode: Designing Cryptography Primitives
A lot of people like to dive straight into this stage early in their education. This usually ends in tears.

If you’ve mastered every step in my prescribed outline and pursued both of the traditional paths to the point that you have a novel published attack in a peer-reviewed journal (and mirrored on ePrint), then you’re probably ready for this stage.
Bonus: If you’re a furry and you become a cryptography expert, you can call yourself a cryptografur. If you had no other reason to learn cryptography, do it just for pun!
Header art by circuitslime.
https://soatok.blog/2020/06/10/how-to-learn-cryptography-as-a-programmer/
#cryptography #education #programming #Technology