Why We Just Open‑Sourced Our WhatsApp & Signal Capture Code - And Why It Matters
August 5, 2025 - Jeremiah Church
Picture a regulator scrolling through thousands of WhatsApp chats while your legal team mutters, “We think the vendor captured everything…”
That guess-and-cross-your-fingers routine ends today.
Most RegTech vendors cloak their capture logic in proprietary code: you sign the contract, wish on a four-leaf clover, and trust that the code is solid and does what the marketing material claims.
You find out whether it was when the auditors arrive, or worse, when there’s a security breach.
Veiled compliance has felt broken to me for years, and it feels reckless after the May 2025 TeleMessage breach that left unencrypted chats scattered across the internet.
We’re sick of the obscurity, so Comma Compliance took a different path.
We open-sourced our WhatsApp and Signal connectors on GitHub (Apache 2.0 and GPL v3). They are free for anyone to inspect, fork, or self-host. No NDA required, no marketing gate, just the code.
Why open source - and why now?
Proof. A commit hash beats a brochure in any audit. Regulators ask how messages are captured, encrypted, and transported. A marketing diagram won’t answer that. Source code can.
Sunlight. Security engineers are now free to review our logic, file issues, and suggest fixes. We know code isn’t perfect, but by bringing bugs to light, we harden the logic.
Freedom. Some firms want a fully managed archive; others want to build their own product without restriction. Go for it. And if you want the AI analysis of conversations, the automated storage, and the convenience, you can choose Comma with confidence.
Standardization. "Proprietary secret sauce" has been the excuse for opacity in RegTech for decades. We're done with that excuse. It leads to security breaches and a lack
What exactly did we release?
- The real-time capture code for WhatsApp and Signal, which shows exactly how encrypted messages are transmitted as immutable threads to our processing and archival system.
What are we keeping hidden?
- The processing & analysis engine – the context‑aware NLP tooling and rule‑matching layer that spots genuine, high‑risk content instead of blunt keyword hits.
- The archival workflow – the exact behind‑the‑scenes mechanics of how we split data into separate chunks, keep every historical copy intact, and routinely refresh the encryption keys.
We’ve made the capture code public for full audit scrutiny, but we’re keeping the analysis and storage layers private to protect customers. As threat models evolve—and as we build additional safeguards—we may also open-source those parts.
How does compliance improve with transparency?
- Auditors get line‑level visibility. Instead of asking us to “prove” we’re secure, they can walk straight through the repo.
- CIOs and CISOs get control. They can run static-analysis tools or threat-model the flow: no more faith-based compliance.
- Legal teams get defensibility. When a regulator asks, “Show me how this message was captured,” you send them a commit hash, not a marketing sheet.
- Engineering and tech teams get peace of mind. Clone our repo, inspect every commit, run your own tests, and pull updates straight into your CI/CD pipeline. No more “trust me, bro,” just transparent, auditable code.
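To make the "send them a commit hash" idea concrete, here is a minimal sketch of what an audit record tying a captured message to a specific version of the capture code could look like. It is an illustration only, not code from our repo: the `CaptureEvidence` record, the function names, and the all-zero commit hash are hypothetical placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CaptureEvidence:
    """Hypothetical audit record; not a structure from the Comma repo."""
    message_sha256: str  # SHA-256 digest of the captured payload bytes
    capture_commit: str  # Git commit of the capture code that was running
    captured_at: str     # ISO 8601 timestamp supplied by the caller


def build_evidence(payload: bytes, capture_commit: str, captured_at: str) -> CaptureEvidence:
    """Fingerprint a payload and pin it to one version of the capture code."""
    digest = hashlib.sha256(payload).hexdigest()
    return CaptureEvidence(digest, capture_commit, captured_at)


def verify_evidence(payload: bytes, evidence: CaptureEvidence) -> bool:
    """Return True if the payload still matches the recorded digest."""
    return hashlib.sha256(payload).hexdigest() == evidence.message_sha256


def to_json(evidence: CaptureEvidence) -> str:
    """Serialize the record with stable key order so it can be archived."""
    return json.dumps(asdict(evidence), sort_keys=True)


if __name__ == "__main__":
    placeholder_commit = "0" * 40  # placeholder, not a real commit
    record = build_evidence(b"example message", placeholder_commit, "2025-08-05T12:00:00Z")
    print(to_json(record))
    print(verify_evidence(b"example message", record))
```

With a record like this, an auditor can check out the named commit, read exactly the code that handled the message, and confirm the archived payload has not changed since capture.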
What’s next?
Open-sourcing two connectors is the start, not the finish. Like all the big players, we capture other channels too, and we’ll keep releasing the source code for them, from iMessage to LinkedIn, with the aim of having our entire capture stack live on GitHub.
As we push towards more transparency, we’ll begin publishing our roadmap so you’re never left guessing what’s next. We’re striking the balance between “not overwhelming you” and “not hiding anything.” Ultimately, black-box compliance is yesterday’s risk.
Go kick the tires: clone the repo and tell us where we can do better.
And if you’d rather see the whole shebang in action, book a demo. We’ll walk you through all the ways we can help keep your firm compliant.
Compliance shouldn’t be mysterious. It should be measurable.
That’s why we just shipped our playbook to GitHub.
Explore the code here: