Search
Items tagged with: cryptographicAlacrity
There are two mental models for designing a cryptosystem that offers end-to-end encryption to all of its users.
The first is the Signal model.
Predicated on Moxie’s notion that the ecosystem is moving, Signal (and similar apps) maintain some modicum of centralized control over the infrastructure and deployment of their app. While there are obvious downsides to this approach, it allows them to quickly roll out ecosystem-wide changes to their encryption protocols without having to deal with third-party clients falling behind.
The other is the federated model, which is embraced by Matrix, XMPP with OMEMO, and other encrypted chat apps and protocols.
This model can be attractive to a lot of people whose primary concern is data sovereignty rather than cryptographic protections. (Most security experts care about both aspects, but we differ in how they rank the two priorities relative to each other.)
As I examined in my criticisms of Matrix and XMPP+OMEMO, they kind of prove Moxie’s point about the ecosystem:
- Two years after the Matrix team deprecated their C implementation of Olm in favor of a Rust library, virtually all of the clients that actually switched (as of the time of my blog post disclosing vulnerabilities in their C library) were either Element, or forks of Element. The rest were still wrapping libolm.
- Most OMEMO libraries are still stuck on version 0.3.0 of the specification, and cannot communicate with XMPP+OMEMO implementations that are on newer versions of the specification.
And this is personally a vexing observation, for two reasons:
- I don’t like that Moxie’s opinion is evidently more correct when you look at the consequences of each model.
- I’m planning to develop end-to-end encryption for direct messages on the Fediverse, and don’t want to repeat the mistakes of Matrix and OMEMO.
(Aside from them mistakenly claiming to be Signal competitors, which I am not doing with my E2EE proposal or any implementations thereof.)
Fortunately, I have a solution to both annoyances that I intend to implement in my end-to-end encryption proposal.
Thus, I’d like to introduce Cryptographic Alacrity to the discussion.
Note: The term “crypto agility” was already coined by people who never learned from the alg=none vulnerability of JSON Web Tokens and think it’s A-okay to negotiate cryptographic primitives at run-time based on attacker-controllable inputs.Because they got their foolish stink all over that term, I discarded it in favor of coining a new one. I apologize for the marginal increase in cognitive load this decision may cause in the future.
Cryptographic Alacrity
For readers who aren’t already familiar with the word “alacrity” from playing Dungeons & Dragons once upon a time, the Merriam-Webster dictionary defines Alacrity as:
promptness in response : cheerful readiness
When I describe a cryptography protocol as having “cryptographic alacrity”, I mean there is a built-in mechanism to enforce protocol upgrades in a timely manner, and stragglers (read: non-compliant implementations) will lose the ability to communicate with up-to-date software.
Alacrity must be incorporated into a protocol at its design phase, specified clearly, and then enforced by the community through its protocol library implementations.
The primary difference between Alacrity and Agility is that Alacrity is implemented through protocol versions and a cryptographic mechanism for enforcing implementation freshness across the ecosystem, whereas Agility is about being able to hot-swap cryptographic primitives in response to novel cryptanalysis.
This probably still sounds a bit abstract to some of you.
To best explain what I mean, let’s look at a concrete example. Namely, how I plan on introducing Alacrity to my Fediverse E2EE design, and then enforcing it henceforth.
Alacrity in E2EE for the Fediverse
One level higher in the protocol than bulk message and/or media attachment encryption will be a Key Derivation Function. (Probably HKDF, probably as part of a Double Ratchet protocol or similar. I haven’t specified that layer just yet.)
Each invocation of HKDF will have a hard-coded 256-bit salt particular to the protocol version that is currently being used.
(What most people would think to pass as the salt in HKDF will instead be appended to the info parameter.)
The protocol version will additionally be used in a lot of other places (i.e., domain separation constants), but those are going to be predictable string values.
The salt will not be predictable until the new version is specified. I will likely tie it to the SHA256 hash of a Merkle root of a Federated Public Key Directory instance and the nickname for each protocol version.
Each library will have a small window (probably no more than 3 versions at any time) of acceptable protocol versions.
A new version will be specified, with a brand new KDF salt, every time we need to improve the protocol to address a security risk. Additionally, we will upgrade the protocol version at least once a year, even if no security risks have been found in the latest version of the protocol.
If your favorite client depends on a 4 year old version of the E2EE protocol library, you won’t be able to silently downgrade security for all of your conversation participants. Instead, you will be prevented from talking to most users, due to incompatible cryptography.
Version Deprecation Schedule
Let’s pretend, for the sake of argument, that we launch the first protocol version on January 1, 2025. And that’s when the first clients start to be built atop the libraries that speak the protocols.
Assuming no emergencies occur, after 9 months (i.e., by October 1, 2025), version 2 of the protocol will be specified. Libraries will be updated to support reading (but not sending) messages encrypted with protocol v2.
Then, on January 1, 2026 at midnight UTC–or a UNIX timestamp very close to this, at least–clients will start speaking protocol v2. Other clients can continue to read v1, but they should write v2.
This will occur every year on the same cadence, but with a twist: After clients are permitted to start writing v3, support for reading v1 MUST be removed from the codebase.
This mechanism will hold true even if the protocols are largely the same, aside from tweaked constants.
What does Alacrity give us?
Alacrity allows third-party open source developers the capability of writing their own software (both clients and servers) without a significant risk of the Matrix and OMEMO problem (i.e., stale software being used years after it should have been deprecated).
By offering a sliding window of acceptable versions and scheduling planned version bumps to be about a year apart, we can minimize the risk of clock skew introducing outages.
Additionally, it provides third-party developers ample opportunity to keep their client software conformant to the specification.
It doesn’t completely eliminate the possibility of stale versions being used in silos. Especially if some developers choose malice. However, anyone who deviates from the herd to form their own cadre of legacy protocol users has deliberately or negligently accepted the compatibility risks.
Can you staple Alacrity onto other end-to-end encryption projects?
Not easily, no.
This is the sort of mechanism that needs to be baked in from day one, and everyone needs to be onboard at the project’s inception.
Retroactively trying to make Matrix, XMPP, OpenPGP, etc. have Cryptographic Alacrity after the horses left the barn is an exercise in futility.
I would like your help introducing Alacrity into my pet project.
I’d love to help, but I’m already busy enough with work and my own projects.
If you’re designing a product that you intend to sell to the public, talk to a cryptography consulting firm. I can point you to several that are reputable, but most of them are pretty good.
If you’re designing something for the open source community, and don’t have the budget to hire professionals, I’ll respond to such inquiries when my time, energy, and emotional bandwidth is available to do so. No promises on a timeline, of course.
How do you force old versions to get dropped?
You don’t.
The mechanism I mostly care about is forcing new versions get adopted.
Dropping support for older versions is difficult to mechanize. Being actively involved in the community to encourage implementations do this (if for no other reason to reduce risk by deleting dead code) is sufficient.
I am choosing to not make perfect the enemy of good with this proposal.
This isn’t a new idea.
No, it isn’t a new idea. The privacy-focused cryptocurrency, Zcash, has a similar mechanism build into their network upgrades.
It’s wildly successful when federated or decentralized systems adopt such a policy, and actually enforce it.
The only thing that’s novel in this post is the coined term, Cryptographic Alacrity.
Addendum – Questions Asked After This Post Went Live
Art: ScruffKerfluff
What about Linux Distros with slow release cycles?
What about them?!
In my vision of the future, the primary deliverable that users will actually hold will most likely be a Browser Extension, not a binary blob signed by their Linux distro.
They already make exceptions to their glacial release cadences for browsers, so I don’t anticipate whatever release cadence we settle on being a problem in practice.
For people who write clients with desktop software: Debian and Ubuntu let users install PPAs. Anyone who does Node.js development on Linux is familiar with them.
Why 1 year?
It was an example. We could go shorter or longer depending on the needs of the ecosystem.
How will you enforce the removal of old versions if devs don’t comply?
That’s a much lower priority than enforcing the adoption of new versions.
But realistically, sending pull requests to remove old versions would be the first step.
Publicly naming and shaming clients that refuse to drop abandoned protocol versions is always an option for dealing with obstinance.
We could also fingerprint clients that still support the old versions and refuse to connect to them at all, even if there is a version in common, until they update to drop the old version.
That said, I would hope to not need to go that far.
I really don’t want to overindex on this, but people keep asking or trying to send “what about?” comments that touch on this question, so now I’m writing a definitive statement to hopefully quell this unnecessary discourse.
The ubiquitous adoption of newer versions is a much higher priority than the sunsetting of old versions. It should be obvious that getting your users to use the most secure mode available is intrinsically a net-positive.
If your client can negotiate in the most secure mode available (i.e., if we move onto post-quantum cryptography), and your friends’ clients enforce the correct minimum version, it doesn’t really matter so much if your client in particular is non-compliant.
Focusing so much on this aspect is a poor use of time and emotional bandwidth.
Header art also made by AJ.
https://soatok.blog/2024/08/28/introducing-alacrity-to-federated-cryptography/
#cryptographicAgility #cryptographicAlacrity #cryptography #endToEndEncryption #fediverse #Matrix #OMEMO #XMPP
I don’t consider myself exceptional in any regard, but I stumbled upon a few cryptography vulnerabilities in Matrix’s Olm library with so little effort that it was nearly accidental.It should not be this easy to find these kind of issues in any product people purportedly rely on for private messaging, which many people evangelize incorrectly as a Signal alternative.
Later, I thought I identified an additional vulnerability that would have been much worse, but I was wrong about that one. For the sake of transparency and humility, I’ll also describe that in detail.
This post is organized as follows:
- Disclosure Timeline
- Vulnerabilities in Olm (Technical Details)
- Recommendations
- Background Information
- An Interesting Non-Issue That Looked Critical
I’ve opted to front-load the timeline and vulnerability details to respect the time of busy security professionals.
Please keep in mind that this website is a furry blog, first and foremost, that sometimes happens to cover security and cryptography topics.Many people have, over the years, assumed the opposite and commented accordingly. The ensuing message board threads are usually is a waste of time and energy for everyone involved. So please adjust your expectations.
Art by Harubaki
If you’re curious, you can learn more here.
Disclosure Timeline
- 2024-05-15: I took a quick look at the Matrix source code. I identified two issues and emailed them to their
security@
email address.
In my email, I specify that I plan to disclose my findings publicly in 90 days (i.e. on August 14), in adherence with industry best practices for coordinated disclosure, unless they request an extension in writing.- 2024-05-16: I checked something else on a whim and find a third issue, which I also email to their
security@
email address.- 2024-05-17: Matrix security team confirms receipt of my reports.
- 2024-05-17: I follow up with a suspected fourth finding–the most critical of them all. They point out that it is not actually an issue, because I overlooked an important detail in how the code is architected. Mea culpa!
- 2024-05-18: A friend discloses a separate finding with Matrix: Media can be decrypted to multiple valid plaintexts using different keys and Malicious homeservers can trick Element/Schildichat into revealing links in E2EE rooms.
They instructed the Matrix developers to consult with me if they needed cryptography guidance. I never heard from them on this externally reported issue.- 2024-07-12: I shared this blog post draft with the Matrix security team while reminding them of the public disclosure date.
- 2024-07-31: Matrix pushes a commit that announces that libolm is deprecated.
- 2024-07-31: I email the Matrix security team asking if they plan to fix the reported issues (and if not, if there’s any other reason I should withhold publication).
- 2024-07-31: Matrix confirms they will not fix these issues (due to its now deprecated status), but ask that I withhold publication until the 14th as originally discussed.
- 2024-08-14: This blog post is publicly disclosed to the Internet.
- 2024-08-14: The lead Matrix dev claims they already knew about these issues, and, in fact, knowingly shipped cryptography code that was vulnerable to side-channel attacks for years. See Addendum.
- 2024-08-23: MITRE has assigned CVE IDs to these three findings.
Vulnerabilities in Olm
I identified the following issues with Olm through a quick skim of their source code on Gitlab:
- AES implementation is vulnerable to cache-timing attacks
- Ed25519 signatures are malleable
- Timing leakage in base64 decoding of private key material
This is sorted by the order in which they were discovered, rather than severity.
AES implementation is vulnerable to cache-timing attacks
a.k.a. CVE-2024-45191Olm ships a pure-software implementation of AES, rather than leveraging hardware acceleration.
// Substitutes a word using the AES S-Box.WORD SubWord(WORD word){unsigned int result;result = (int)aes_sbox[(word >> 4) & 0x0000000F][word & 0x0000000F];result += (int)aes_sbox[(word >> 12) & 0x0000000F][(word >> 8) & 0x0000000F] << 8;result += (int)aes_sbox[(word >> 20) & 0x0000000F][(word >> 16) & 0x0000000F] << 16;result += (int)aes_sbox[(word >> 28) & 0x0000000F][(word >> 24) & 0x0000000F] << 24;return(result);}
The code in question is called from this code, which is in turn used to actually encrypt messages.
Software implementations of AES that use a look-up table for the SubWord step of the algorithm are famously susceptible to cache-timing attacks.
This kind of vulnerability in software AES was previously used to extract a secret key from OpenSSL and dm-crypt in about 65 milliseconds. Both papers were published in 2005.
A general rule in cryptography is, “attacks only get better; they never get worse“.
As of 2009, you could remotely detect a timing difference of about 15 microseconds over the Internet with under 50,000 samples. Side-channel exploits are generally statistical in nature, so such a sample size is generally not a significant mitigation.
How is this code actually vulnerable?
In the above code snippet, the vulnerability occurs inaes_sbox[/* ... */][/* ... */]
.Due to the details of how the AES block cipher works, the input variable (
word
) is a sensitive value.Software written this way allows attackers to detect whether or not a specific value was present in one of the processor’s caches.
To state the obvious: Cache hits are faster than cache misses. This creates an observable timing difference.
Such a timing leak allows the attacker to learn the value that was actually stored in said cache. You can directly learn this from other processes on the same hardware, but it’s also observable over the Internet (with some jitter) through the normal operation of vulnerable software.
See also: cryptocoding’s description for table look-ups indexed by secret data.
How to mitigate this cryptographic side-channel
The correct way to solve this problem is to use hardware accelerated AES, which uses distinct processor features to implement the AES round function and side-steps any cache-timing shenanigans with the S-box.Not only is this more secure, but it’s faster and uses less energy too!
If you’re also targeting devices that don’t have hardware acceleration available, you should first use hardware acceleration where possible, but then fallback to a bitsliced implementation such as the one in Thomas Pornin’s BearSSL.
See also: the BearSSL documentation for constant-time AES.
Art by AJ
Ed25519 signatures are malleable
a.k.a. CVE-2024-45193Ed25519 libraries come in various levels of quality regarding signature validation criteria; much to the chagrin of cryptography engineers everywhere. One of those validation criteria involves signature malleability.
Signature malleability usually isn’t a big deal for most protocols, until suddenly you discover a use case where it is. If it matters, that usually that means you’re doing something with cryptocurrency.
Briefly, if your signatures are malleable, that means you can take an existing valid signature for a given message and public key, and generate a second valid signature for the same message. The utility of this flexibility is limited, and the impact depends a lot on how you’re using signatures and what properties you hope to get out of them.
For ECDSA, this means that for a given signature , a second signature is also possible (where is the order of the elliptic curve group you’re working with).
Matrix uses Ed25519, whose malleability is demonstrated between and .
This is trivially possible because S is implicitly reduced modulo the order of the curve, , which is a 253-bit number (
0x1000000000000000000000000000000014def9dea2f79cd65812631a5cf5d3ed
) and S is encoded as a 256-bit number.The Ed25519 library used within Olm does not ensure that , thus signatures are malleable. You can verify this yourself by looking at the Ed25519 verification code.
int ed25519_verify(const unsigned char *signature, const unsigned char *message, size_t message_len, const unsigned char *public_key) { unsigned char h[64]; unsigned char checker[32]; sha512_context hash; ge_p3 A; ge_p2 R; if (signature[63] & 224) { return 0; } if (ge_frombytes_negate_vartime(&A, public_key) != 0) { return 0; } sha512_init(&hash); sha512_update(&hash, signature, 32); sha512_update(&hash, public_key, 32); sha512_update(&hash, message, message_len); sha512_final(&hash, h); sc_reduce(h); ge_double_scalarmult_vartime(&R, h, &A, signature + 32); ge_tobytes(checker, &R); if (!consttime_equal(checker, signature)) { return 0; } return 1;}
This is almost certainly a no-impact finding (or low-impact at worst), but still an annoying one to see in 2024.
If you’d like to learn more, this page is a fun demo of Ed25519 malleability.
To mitigate this, I recommend implementing these checks from libsodium.
Art: CMYKat
Timing leakage in base64 decoding of private key material
a.k.a. CVE-2024-45192If you weren’t already tired of cache-timing attacks based on table look-ups from AES, the Matrix base64 code is also susceptible to the same implementation flaw.
while (pos != end) { unsigned value = DECODE_BASE64[pos[0] & 0x7F]; value <<= 6; value |= DECODE_BASE64[pos[1] & 0x7F]; value <<= 6; value |= DECODE_BASE64[pos[2] & 0x7F]; value <<= 6; value |= DECODE_BASE64[pos[3] & 0x7F]; pos += 4; output[2] = value; value >>= 8; output[1] = value; value >>= 8; output[0] = value; output += 3;}
The base64 decoding function in question is used to load the group session key, which means the attack published in this paper almost certainly applies.
How would you mitigate this leakage?
Steve Thomas (one of the judges of the Password Hashing Competition, among other noteworthy contributions) wrote some open source code a while back that implements base64 encoding routines in constant-time.The real interesting part is how it avoids a table look-up by using arithmetic (from this file):
// Base64 character set:// [A-Z] [a-z] [0-9] + /// 0x41-0x5a, 0x61-0x7a, 0x30-0x39, 0x2b, 0x2finline int base64Decode6Bits(char src){int ch = (unsigned char) src;int ret = -1;// if (ch > 0x40 && ch < 0x5b) ret += ch - 0x41 + 1; // -64ret += (((0x40 - ch) & (ch - 0x5b)) >> 8) & (ch - 64);// if (ch > 0x60 && ch < 0x7b) ret += ch - 0x61 + 26 + 1; // -70ret += (((0x60 - ch) & (ch - 0x7b)) >> 8) & (ch - 70);// if (ch > 0x2f && ch < 0x3a) ret += ch - 0x30 + 52 + 1; // 5ret += (((0x2f - ch) & (ch - 0x3a)) >> 8) & (ch + 5);// if (ch == 0x2b) ret += 62 + 1;ret += (((0x2a - ch) & (ch - 0x2c)) >> 8) & 63;// if (ch == 0x2f) ret += 63 + 1;ret += (((0x2e - ch) & (ch - 0x30)) >> 8) & 64;return ret;}
Any C library that handles base64 codecs for private key material should use a similar implementation. It’s fine to have a faster base64 implementation for non-secret data.
Worth noting: Libsodium also provides a reasonable Base64 codec.
Recommendations
These issues are not fixed in libolm.Instead of fixing libolm, the Matrix team recommends all Matrix clients adopt vodozemac.
I can’t speak to the security of vodozemac.
Art: CMYKat
But I can speak against the security of libolm, so moving to vodozemac is probably a good idea. It was audited by Least Authority at one point, so it’s probably fine.
Most Matrix clients that still depended on libolm should treat this blog as public 0day, unless the Matrix security team already notified you about these issues.
Background Information
If you’re curious about the backstory and context of these findings, read on.Otherwise, feel free to skip this section. It’s not pertinent to most audiences. The people that need to read it already know who they are.
End-to-end encryption is one of the topics within cryptography that I find myself often writing about.In 2020, I wrote a blog post covering end-to-end encryption for application developers. This was published several months after another blog I wrote covering gripes with AES-GCM, which included a shallow analysis of how Signal uses the algorithm for local storage.
In 2021, I published weaknesses in another so-called private messaging app called Threema.
In 2022, after Elon Musk took over Twitter, I joined the Fediverse and sought to build end-to-end encryption support for direct messages into ActivityPub, starting with a specification. Work on this effort was stalled while trying to solve Public Key distribution in a federated environment (which I hope to pick up soon, but I digress).
Earlier this year, the Telegram CEO started fearmongering about Signal with assistance from Elon Musk, so I wrote a blog post urging the furry fandom to move away from Telegram and start using Signal more. As I had demonstrated years prior, I was familiar with Signal’s code and felt it was a good recommendation for security purposes (even if its user experience needs significant work).
I thought that would be a nice, self-contained blog post. Some might listen, most would ignore it, but I could move on with my life.
I was mistaken about that last point.
Art by AJAn overwhelming number of people took it upon themselves to recommend or inquire about Matrix, which prompted me to hastily scribble down my opinion on Matrix so that I might copy/paste a link around and save myself a lot of headache.
Just when I thought the firehose was manageable and I could move onto other topics, one of the Matrix developers responded to my opinion post.
Thus, I decided to briefly look at their source code and see if any major or obvious cryptography issues would fall out of a shallow visual scan.
Since you’re reading this post, you already know how that ended.
Credit: CMYKat
Since the first draft of this blog post was penned, I also outlined what I mean when I say an encrypted messaging app is a Signal competitor or not, and published my opinion on XMPP+OMEMO (which people also recommend for private messaging).
Why mention all this?
Because it’s important to know that I have not audited the Olm or Megolm codebases, nor even glanced at their new Rust codebase.The fact is, I never intended to study Matrix. I was annoyed into looking at it in the first place.
My opinion of their project was already calcified by the previously discovered practically-exploitable cryptographic vulnerabilities in Matrix in 2022.
The bugs described above are the sort of thing I mentally scan for when I first look at a project just to get a feel for the maturity of the codebase. I do this with the expectation (hope, really) of not finding anything at all.
(If you want two specific projects that I’ve subjected to a similar treatment, and failed to discover anything interesting in: Signal and WireGuard. These two set the bar for cryptographic designs.)
It’s absolutely bonkers that an AES cache timing vulnerability was present in their code in 2024.
It’s even worse when you remember that I was inundated with Matrix evangelism in response to recommending furries use Signal.I’m a little outraged because of how irresponsible this is, in context.
It’s so bad that I didn’t even need to clone their git repository, let alone run basic static analysis tools locally.So if you take nothing else away from this blog post, let it be this:
There is roughly a 0% chance that I got extremely lucky in my mental
grep
and found the only cryptography implementation flaws in their source code. I barely tried at all and found these issues.I would bet money on there being more bugs or design flaws that I didn’t find, because this discovery was the result of an extremely half-assed effort to blow off steam.
Wasn’t libolm deprecated in May 2022?
The Matrix developers like to insist that their new Rust hotness “vodozemac” is what people should be using today.I haven’t looked at vodozemac at all, but let’s pretend, for the sake of argument, that its cryptography is actually secure.
(This is very likely if they turn out to be using RustCrypto for their primitives, but I don’t have the time or energy for that nerd snipe, so I’m not going to look. Least Authority did audit their Rust library, for what it’s worth, and Least Authority isn’t clownshoes.)
It’s been more than 2 years since they released vodozemac. What does the ecosystem penetration for this new library look like, in practice?
A quick survey of the various Matrix clients on GitHub says that libolm is still the most widely used cryptography implementation in the Matrix ecosystem (as of this writing):
Matrix Client Cryptography Backend https://github.com/tulir/gomuks libolm (1, 2) https://github.com/niochat/nio libolm (1, 2) https://github.com/ulyssa/iamb vodozemac (1, 2) https://github.com/mirukana/mirage libolm (1) https://github.com/Pony-House/Client libolm (1) https://github.com/MTRNord/cetirizine vodozemac (1) https://github.com/nadams/go-matrixcli none https://github.com/mustang-im/mustang libolm (1) https://github.com/marekvospel/libretrix libolm (1) https://github.com/yusdacra/icy_matrix none https://github.com/ierho/element libolm (through the python SDK) https://github.com/mtorials/cordless none https://github.com/hwipl/nuqql-matrixd libolm (through the python SDK) https://github.com/maxkratz/element-web vodozemac (1, 2, 3, 4) https://github.com/asozialesnetzwerk/riot libolm (wasm file) https://github.com/NotAlexNoyle/Versi libolm (1, 2) 3 of the 16 clients surveyed use the new vodozemac library. 10 still use libolm, and 3 don’t appear to implement end-to-end encryption at all.
If we only focus on clients that support E2EE, vodozemac has successfully been adopted by 19% of the open source Matrix clients on GitHub.
I deliberately excluded any repositories that were archived or clearly marked as “old” or “legacy” software, because including those would artificially inflate the representation of libolm. It would make for a more compelling narrative to do so, but I’m not trying to be persuasive here.
Deprecation policies are a beautiful lie. The impact of a vulnerability in Olm or Megolm is still far-reaching, and should be taken seriously by the Matrix community.
Worth calling out: this quick survey, which is based on a GitHub Topic, certainly misses other implementations. Both FluffyChat and Cinny, which were not tagged with this GitHub Topic, depend a language-specific Olm binding.These bindings in turn wrap libolm rather than the Rust replacement, vodozemac.
But the official clients…
I thought the whole point of choosing Matrix over something like Signal is to be federated, and run your own third-party clients?If we’re going to insist that everyone should be using Element if they want to be secure, that defeats the entire marketing point about third-party clients that Matrix evangelists cite when they decry Signal’s centralization.
So I really don’t want to hear it.
An Interesting Non-Issue That Looked Critical
As I mentioned in the timeline at the top, I thought I found a fourth issue with Matrix’s codebase. Had I been correct, this would have been a critical severity finding that the entire Matrix ecosystem would need to melt down to remediate.Fortunately for everyone, I made a mistake, and there is no fourth vulnerability after all.
However, I thought it would be interesting to write about what I thought I found, the impact it would have had if it were real, and why I believed it to be an issue.
Let’s start with the code in question:
void ed25519_sign(unsigned char *signature, const unsigned char *message, size_t message_len, const unsigned char *public_key, const unsigned char *private_key) { sha512_context hash; unsigned char hram[64]; unsigned char r[64]; ge_p3 R; sha512_init(&hash); sha512_update(&hash, private_key + 32, 32); sha512_update(&hash, message, message_len); sha512_final(&hash, r); sc_reduce(r); ge_scalarmult_base(&R, r); ge_p3_tobytes(signature, &R); sha512_init(&hash); sha512_update(&hash, signature, 32); sha512_update(&hash, public_key, 32); sha512_update(&hash, message, message_len); sha512_final(&hash, hram); sc_reduce(hram); sc_muladd(signature + 32, hram, private_key, r);}
The highlighted segment is doing pointer arithmetic. This means it’s reading 32 bytes, starting from the 32nd byte in
private_key
.What’s actually happening here is:
private_key
is the SHA512 hash of a 256-bit seed. If you look at the function prototype, you’ll notice thatpublic_key
is a separate input.Virtually every other Ed25519 implementation I’ve ever looked at before expected users to provide a 32 byte seed followed by the public key as a single input.
This led me to believe that this
private_key + 32
pointer arithmetic was actually using the public key for calculatingr
.The variable
r
(not to be confused with big R) generated via the first SHA512 is the nonce for a given signature, it must remain secret for Ed25519 to remain secure.If
r
is known to an attacker, you can do some arithmetic to recover the secret key from a single signature.Because I had mistakenly believed that
r
was calculated from the SHA512 of only public inputs (the public key and message), which I must emphasize isn’t correct, I had falsely concluded that any previously intercepted signature could be used to steal user’s private keys.Credit: CMYKat
But because
private_key
was actually the full SHA512 hash of the seed, rather than the seed concatenated with the public key, this pointer arithmetic did NOT use the public key for the calculation ofr
, so this vulnerability does not exist.If the code did what I thought it did, however, this would have been a complete fucking disaster for the Matrix ecosystem. Any previously intercepted message would have allowed an attacker to recover a user’s secret key and impersonate them. It wouldn’t be enough to fix the code; every key in the ecosystem would need to be revoked and rotated.
Whew!
I’m happy to be wrong about this one, because that outcome is a headache nobody wants.
So no action is needed, right?
Well, maybe.Matrix’s library was not vulnerable, but I honestly wouldn’t put it past software developers at large to somehow, somewhere, use the public key (rather than a secret value) to calculate the EdDSA signature nonces as described in the previous section.
To that end, I would like to propose a test vector be added to the Wycheproof test suite to catch any EdDSA implementation that misuses the public key in this way.
Then, if someone else screws up their Ed25519 implementation in the exact way I thought Matrix was, the Wycheproof tests will catch it.
For example, here’s a vulnerable test input for Ed25519:
{ "should-fail": true, "secret-key": "d1d0ef849f9ec88b4713878442aeebca5c7a43e18883265f7f864a8eaaa56c1ef3dbb3b71132206b81f0f3782c8df417524463d2daa8a7c458775c9af725b3fd", "public-key": "f3dbb3b71132206b81f0f3782c8df417524463d2daa8a7c458775c9af725b3fd", "message": "Test message", "signature": "ffc39da0ce356efb49eb0c08ed0d48a1cadddf17e34f921a8d2732a33b980f4ae32d6f5937a5ed25e03a998e4c4f5910c931b31416e143965e6ce85b0ea93c09"}
A similar test vector would also be worth creating for Ed448, but the only real users of Ed448 were the authors of the xz backdoor, so I didn’t bother with that.
(None of the Project Wycheproof maintainers knew this suggestion is coming, by the way, because I was respecting the terms of the coordinated disclosure.)
Closing Thoughts
Despite finding cryptography implementation flaws in Matric’s Olm library, my personal opinion on Matrix remains largely unchanged from 2022. I had already assumed it would not meet my bar for security.Cryptography engineering is difficult because the vulnerabilities you’re usually dealing with are extremely subtle. (Here’s an unrelated example if you’re not convinced of this general observation.) As SwiftOnSecurity once wrote:
https://twitter.com/SwiftOnSecurity/status/832058185049579524
The people that developed Olm and Megolm has not proven themselves ready to build a Signal competitor. In balance, most teams are not qualified to do so.
I really wish the Matrix evangelists would accept this and stop trying to cram Matrix down other people’s throats when they’re talking about problems with other platforms entirely.
More important for the communities of messaging apps:You don’t need to be a Signal competitor. Having E2EE is a good thing on its own merits, and really should be table stakes for any social application in 2024.
It’s only when people try to advertise their apps as a Signal alternative (or try to recommend it instead of Signal), and offer less security, that I take offense.
Just be your own thing.
My work-in-progress proposal to bring end-to-end encryption to the Fediverse doesn’t aim to compete with Signal. It’s just meant to improve privacy, which is a good thing to do on its own merits.
If I never hear Matrix evangelism again after today, it would be far too soon.If anyone feels like I’m picking on Matrix, don’t worry: I have far worse things to say about Telegram, Threema, XMPP+OMEMO, Tox, and a myriad other projects that are hungry for Signal’s market share but don’t measure up from a cryptographic security perspective.
If Signal fucked up as bad as these projects, my criticism of Signal would be equally harsh. (And remember, I have looked at Signal before.)
Addendum (2024-08-14)
One of the lead Matrix devs posted a comment on Hacker News after this blog post went live that I will duplicate here:the author literally picked random projects from github tagged as matrix, without considering their prevalence or whether they are actually maintained etc.if you actually look at % of impacted clients, it’s tiny.
meanwhile, it is very unclear that any sidechannel attack on a libolm based client is practical over the network (which is why we didn’t fix this years ago). After all, the limited primitives are commented on in the readme and https://github.com/matrix-org/olm/issues/3 since day 1.
So the Matrix developers already knew about these vulnerabilities, but deliberately didn’t fix them, for years.Congratulations, you’ve changed my stance. It used to be “I don’t consider Matrix a Signal alternative and they’ve had some embarrassing and impactful crypto bugs but otherwise I don’t care”. Now it’s a stronger stance:
Don’t use Matrix.
I had incorrectly assumed ignorance, when it was in fact negligence.
There’s no reasonable world in which anyone should trust the developers of cryptographic software (i.e., libolm) that deliberately ships with side-channels for years, knowing they’re present, and never bother to fix them.
This is fucking clownshoes.
https://soatok.blog/2024/08/14/security-issues-in-matrixs-olm-library/
#crypto #cryptography #endToEndEncryption #Matrix #sideChannels #vuln