Search
Items tagged with: SipHash
There are several different methods for securely hashing a password server-side for storage and future authentication. The most common one (a.k.a. the one that FIPS allows you to use, if compliance matters for you) is called PBKDF2. It stands for Password-Based Key Derivation Function #2.
Why #2? It’s got nothing to do with pencils. There was, in fact, a PBKDF1! But PBKDF1 was fatally insecure in a way I find very interesting. This StackOverflow answer is a great explainer on the difference between the two.
Very Hand-Wavy Description of a Hash Function
Let’s defined a hash function as any one-way transformation of some arbitrary-length string () to a fixed-length, deterministic, pseudo-random output.
Note: When in doubt, I err on the side of being easily understood by non-experts over pedantic precision. (Art by Swizz)
For example, this is a dumb hash function (uses SipHash-2-4 with a constant key):
function dumb_hash(string $arbitrary, bool $raw_binary = false): string{ $h = sodium_crypto_shorthash($arbitrary, 'SoatokDreamseekr'); if ($raw_binary) { return $h; } return sodium_bin2hex($h);}
You can see the output of this function with some sample inputs here.
Properties of Hash Functions
A hash function is considered secure if it has the following properties:
- Pre-image resistance. Given , it should be difficult to find .
- Second pre-image resistance. Given , it should be difficult to find such that
- Collision resistance. It should be difficult to find any arbitrary pair of messages () such that
That last property, collision resistance, is guaranteed up to the Birthday Bound of the hash function. For a hash function with a 256-bit output, you will expect to need on average trial messages to find a collision.
If you’re confused about the difference between collision resistance and second pre-image resistance:
Collision resistance is about finding any two messages that produce the same hash, but you don’t care what the hash is as long as two distinct but known messages produce it.
On the other paw, second pre-image resistance is about finding a second message that produces a given hash.
Exploring PBKDF1’s Insecurity
If you recall, hash functions map an arbitrary-length string to a fixed-length string. If your input size is larger than your output size, collisions are inevitable (albeit computationally infeasible for hash functions such as SHA-256).
But what if your input size is equal to your output size, because you’re taking the output of a hash function and feeding it directly back into the same hash function?
Then, as explained here, you get an depletion of the possible outputs with each successive iteration.
But what does that look like?
Without running the experiments on a given hash function, there are two possibilities that come to mind:
- Convergence. This is when will, for two arbitrary messages and a sufficient number of iterations, converge on a single hash output.
- Cycles. This is when for some integer .
The most interesting result would be a quine, which is a cycle where (that is to say, ).
The least interesting result would be for random inputs to converge into large cycles e.g. cycles of size for a 256-bit hash function.
I calculated this lazily as the birthday bound of the birthday bound (so basically the 4th root, which for is ).
Update: According to this 1960 paper, the average time to cycle is , and cycle length should be , which means for a 256-bit hash you should expect a cycle after about or about 128 bits, and the average cycle length will be about . Thanks Riastradh for the corrections.
Conjecture: I would expect secure cryptographic hash functions in use today (e.g. SHA-256) to lean towards the least interesting output.
An Experiment Design
Since I don’t have an immense amount of cheap cloud computing at my disposal to run this experiments on a real hash function, I’m going to cheat a little and use my constant-key SipHash code from earlier in this post. In future work, cryptographers may find studying real hash functions (e.g. SHA-256) worthwhile.
Given that SipHash is a keyed pseudo-random function with an output size of 64 bits, my dumb hash function can be treated as a 64-bit hash function.
This means that you should expect your first collision (with 50% probability) after only trial hashes. This is cheap enough to try on a typical laptop.
Here’s a simple experiment for convergence:
- Generate two random strings .
- Set .
- Iterate until .
You can get the source code to run this trivial experiment here.
Clone the git repository, run composer install
, and then php bin/experiment.php
. Note that this may need to run for a very long time before you get a result.
If you get a result, you’ve found a convergence.
If the loop doesn’t terminate even after 2^64 iterations, you’ve definitely found a cycle. (Actually detecting a cycle and analyzing its length would require a lot of memory, and my simple PHP script isn’t suitable for this effort.)
“What do you mean I don’t have a petabyte of RAM at my disposal?”
What Does This Actually Tell Us?
The obvious lesson: Don’t design key derivation functions like PBKDF1.
But beyond that, unless you can find a hash function that reliably converges or produces short cycles (, for an n-bit hash function), not much. (This is just for fun, after all!)
Definitely fun to think about though! (Art by circuitslime)
If, however, a hash function is discovered to produce interesting results, this may indicate that the chosen hash function’s internal design is exploitable in some subtle ways that, upon further study, may lead to better cryptanalysis techniques. Especially if a hash quine is discovered.
(Header art by Khia)
https://soatok.blog/2020/05/05/putting-the-fun-in-hash-function/
#crypto #cryptography #hashFunction #SipHash
There are several different methods for securely hashing a password server-side for storage and future authentication. The most common one (a.k.a. the one that FIPS allows you to use, if compliance matters for you) is called PBKDF2. It stands for Password-Based Key Derivation Function #2.Why #2? It’s got nothing to do with pencils. There was, in fact, a PBKDF1! But PBKDF1 was fatally insecure in a way I find very interesting. This StackOverflow answer is a great explainer on the difference between the two.
Very Hand-Wavy Description of a Hash Function
Let’s defined a hash function as any one-way transformation of some arbitrary-length string () to a fixed-length, deterministic, pseudo-random output.Note: When in doubt, I err on the side of being easily understood by non-experts over pedantic precision. (Art by Swizz)
For example, this is a dumb hash function (uses SipHash-2-4 with a constant key):
function dumb_hash(string $arbitrary, bool $raw_binary = false): string{ $h = sodium_crypto_shorthash($arbitrary, 'SoatokDreamseekr'); if ($raw_binary) { return $h; } return sodium_bin2hex($h);}
You can see the output of this function with some sample inputs here.
Properties of Hash Functions
A hash function is considered secure if it has the following properties:
- Pre-image resistance. Given , it should be difficult to find .
- Second pre-image resistance. Given , it should be difficult to find such that
- Collision resistance. It should be difficult to find any arbitrary pair of messages () such that
That last property, collision resistance, is guaranteed up to the Birthday Bound of the hash function. For a hash function with a 256-bit output, you will expect to need on average trial messages to find a collision.
If you’re confused about the difference between collision resistance and second pre-image resistance:
Collision resistance is about finding any two messages that produce the same hash, but you don’t care what the hash is as long as two distinct but known messages produce it.
On the other paw, second pre-image resistance is about finding a second message that produces a given hash.
Exploring PBKDF1’s Insecurity
If you recall, hash functions map an arbitrary-length string to a fixed-length string. If your input size is larger than your output size, collisions are inevitable (albeit computationally infeasible for hash functions such as SHA-256).But what if your input size is equal to your output size, because you’re taking the output of a hash function and feeding it directly back into the same hash function?
Then, as explained here, you get an depletion of the possible outputs with each successive iteration.
But what does that look like?
Without running the experiments on a given hash function, there are two possibilities that come to mind:
- Convergence. This is when will, for two arbitrary messages and a sufficient number of iterations, converge on a single hash output.
- Cycles. This is when for some integer .
The most interesting result would be a quine, which is a cycle where (that is to say, ).
The least interesting result would be for random inputs to converge into large cycles e.g. cycles of size for a 256-bit hash function.
I calculated this lazily as the birthday bound of the birthday bound (so basically the 4th root, which for is ).Update: According to this 1960 paper, the average time to cycle is , and cycle length should be , which means for a 256-bit hash you should expect a cycle after about or about 128 bits, and the average cycle length will be about . Thanks Riastradh for the corrections.
Conjecture: I would expect secure cryptographic hash functions in use today (e.g. SHA-256) to lean towards the least interesting output.
An Experiment Design
Since I don’t have an immense amount of cheap cloud computing at my disposal to run this experiments on a real hash function, I’m going to cheat a little and use my constant-key SipHash code from earlier in this post. In future work, cryptographers may find studying real hash functions (e.g. SHA-256) worthwhile.Given that SipHash is a keyed pseudo-random function with an output size of 64 bits, my dumb hash function can be treated as a 64-bit hash function.
This means that you should expect your first collision (with 50% probability) after only trial hashes. This is cheap enough to try on a typical laptop.
Here’s a simple experiment for convergence:
- Generate two random strings .
- Set .
- Iterate until .
You can get the source code to run this trivial experiment here.
Clone the git repository, run
composer install
, and thenphp bin/experiment.php
. Note that this may need to run for a very long time before you get a result.If you get a result, you’ve found a convergence.
If the loop doesn’t terminate even after 2^64 iterations, you’ve definitely found a cycle. (Actually detecting a cycle and analyzing its length would require a lot of memory, and my simple PHP script isn’t suitable for this effort.)
“What do you mean I don’t have a petabyte of RAM at my disposal?”
What Does This Actually Tell Us?
The obvious lesson: Don’t design key derivation functions like PBKDF1.But beyond that, unless you can find a hash function that reliably converges or produces short cycles (, for an n-bit hash function), not much. (This is just for fun, after all!)
Definitely fun to think about though! (Art by circuitslime)
If, however, a hash function is discovered to produce interesting results, this may indicate that the chosen hash function’s internal design is exploitable in some subtle ways that, upon further study, may lead to better cryptanalysis techniques. Especially if a hash quine is discovered.
(Header art by Khia)
https://soatok.blog/2020/05/05/putting-the-fun-in-hash-function/
#crypto #cryptography #hashFunction #SipHash