Nov 21, 2024

BitSNARK 💘 TLA+

*This blog was produced by Israel, a developer of BOS.*

“Thinking doesn't guarantee that we won't make mistakes. But not thinking guarantees that we will.  - Leslie Lamport, computer scientist and mathematician

I'd like to tell you about the BitSNARK protocol, which enables the verification of zero-knowledge proofs on the Bitcoin blockchain; about TLA+, which is a formal language for describing, conceptualizing, and reasoning about the behavior of systems; and just a little bit about me and my opinions.

The protocol lets two participants, a prover and a verifier, prepare a bunch of signed transactions. The prover can use one of those transactions to publish a zero-knowledge proof on the blockchain, and the verifier can use another transaction to challenge the proof, and eventually - after a little song and dance that's all committed to the blockchain - the verifier "wins" the challenge if and only if the proof is really invalid. And all of this is interesting mostly because it is possible to make another transaction, one with a high value, conditional on the outcome.

With such high stakes, we need to convince our users as well as ourselves of the protocol's correctness. Even more importantly, while the requirements of the system are pretty simple, the implementation is highly complex, so we need a clear spec to bridge the two. And most importantly, distributed systems, especially ones that are byzantine-fault tolerant, are not only notoriously hard to design, they are also very hard to clearly conceptualize and reason about. Fundamentally, we need a good way to think and to communicate about the system's behavior.

All those needs are satisfied by TLA+, a formal language for writing specs, designed specifically for reasoning about the correctness of distributed systems, and it comes with a model checker that can analyze all the possible states, steps and behaviors of a specified system and verify that correctness. If you imagine your system as a directed graph of interconnected states (and who hasn't?), the model checker actually traverses the entirety of that graph, automatically verifying that it satisfies the properties you've specified.

Sounds awesome, right? Well, it is. And it was developed by none other than Leslie Lamport, one of the people who birthed the very concept of distributed systems, and who developed a few of the most useful and common distributed algorithms (his name was committed to our main development repo more than 500 times the last time I counted, mostly because, as a leading pioneer in the field of digital signatures, he invented one of the simplest, cleverest, most robust, and earliest digital signatures). If I had to choose one person to take advice from regarding the methodology of designing distributed systems, Leslie Lamport would be the one.

Oh, and there's me. I've been programming professionally for over three decades, and close to a decade before that as a hobby. I've been around. I know how important and valuable a good spec can be, and I know how much programmers hate to write them. I also think I know why. Programmers are not just used to solving problems by writing code, but also to thinking about problems by writing code, and writing a spec requires a lot of thinking. About problems you intend to solve in code anyway. That's why we tend to use pseudo-code and code samples. It often feels silly. If I'm writing code anyway, why not write the code, right?

Well, I'm growing more and more convinced that thinking like a programmer isn't the best way to think about a system's behavior. A good spec describes what a system does, never how it does it, while code is all about the how. By thinking like a programmer I do not only run the risk of occasionally losing sight of the behavior by getting sidetracked into implementation details, I actively mesh the two together. It changes the way I conceptualize the system, adding noise and strange artifacts to my mental model. It actively inhibits my ability to think about the system in an abstract way - and that's the last thing I want when writing a spec.

TLA+ offers a highly rewarding alternative: it provides a way to describe a system in a way that is both abstract and precise, freeing the mind from the constraints of implementation and allowing it to focus on the correctness of the system's behavior. There is a cost, of course. That abstract and precise language is math, the language of science. TLA+ uses logical operators from first-order logic, set theory notation, and a bit of temporal logic in order to reason about a system and the way its state changes over time. In short, unless you've had some formal training in math, it's going to look pretty intimidating. But the rewards are significant.

Not only because I now know that the protocol can evolve to 174 fundamental states, that only 117 of them are distinct, and that all of them satisfy a bunch of invariants that I've specified. And that I can prove it to anyone who cares to listen. Also, and mostly, because my thinking about the system has cleared up and improved, as did my ability to describe it and reason about it - with others, by myself, and with my computer. The execution of the protocol is like a maze, a garden of forking paths. Having a robot that can traverse it in seconds is amazing, but nothing makes you feel better than simply having a good map.