How a small, proved model plus a differential fuzzer catches deep logic bugs normally missed by tests.
For a real-world example, see our Verified Ledger repo.
Most developers hear “formal verification” and think math-heavy papers, not something you can run in a CI pipeline. But there is a practical, high-leverage pattern that makes formal models immediately useful: differential fuzzing against a verified model, also called model-based conformance testing.
You do not need to be a proof expert to benefit from it. All you need is a small model that is correct, and a harness that compares it to the real implementation. The model becomes a high-signal oracle, and the fuzzer turns that oracle into a constant, automated audit.
This post explains the idea, why it works, and how to start with minimal formal knowledge.
Traditional fuzzing throws random inputs at a program and watches for crashes. Differential fuzzing adds a second system and compares their behavior on the same inputs. If the systems diverge, that is a bug.
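As a toy sketch of the plain version, before any verified model enters the picture, imagine fuzzing a hypothetical my_clever_sort against Python's built-in sorted:

    import random

    def reference_sort(xs):
        return sorted(xs)          # the trusted second system

    def fast_sort(xs):
        return my_clever_sort(xs)  # hypothetical implementation under test

    for _ in range(100_000):
        xs = [random.randrange(100) for _ in range(random.randrange(20))]
        assert fast_sort(xs) == reference_sort(xs), xs  # divergence == bug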
When one of those systems is a formally verified model, the comparison becomes especially strong. The model provably behaves correctly, so any disagreement is a real bug in the implementation.
Think of it like this: the model tells you what should happen, the implementation tells you what actually happens, and the fuzzer hunts for inputs on which the two disagree.
That disagreement is exactly what you want to know.
A reference implementation is already useful for differential tests. A verified model is stronger because it is small, explicit, and backed by proofs. It does not carry accidental behavior or hidden assumptions. It is the spec.
This has three big consequences:
- Any divergence is a bug in the implementation, never an ambiguity about which side is right.
- The model doubles as readable, executable documentation of the intended behavior.
- Strengthening the model strengthens every future fuzzing run.
Even if you never read the proofs, you still benefit from the model being disciplined and correct. The fuzzer is only as good as its oracle.
At a high level, the loop is simple:
1. Generate a random operation.
2. Apply it to the verified model.
3. Apply it to the real implementation.
4. Compare the results and the resulting states; any divergence is a bug.
5. Repeat.
That is it. The complexity is in the adapter code and in making the comparison precise. But the structure is straightforward.
Here is a minimal pseudocode sketch:
    for op in random_ops():                          # your operation generator
        model_result, model_state = model.apply(op)  # verified oracle
        impl_result, impl_state = impl.apply(op)     # system under test
        # Any divergence, in result or in state, is a bug in the implementation.
        assert model_result == impl_result, f"result diverged on {op}"
        assert model_state == impl_state, f"state diverged on {op}"

Once this exists, you can plug in better fuzzers, bigger generators, a seed corpus, or even more targeted property-based testing instead of general fuzzing. The core idea stays the same.
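For instance, the property-based route might look like this with the Hypothesis library; ModelLedger, ImplLedger, and their deposit/state methods are hypothetical stand-ins for your model binding and your real implementation:

    from hypothesis import strategies as st
    from hypothesis.stateful import RuleBasedStateMachine, rule

    class LedgerConformance(RuleBasedStateMachine):
        """Drive the verified model and the implementation in lockstep."""

        def __init__(self):
            super().__init__()
            self.model = ModelLedger()  # hypothetical binding to the verified model
            self.impl = ImplLedger()    # hypothetical real implementation

        @rule(account=st.integers(0, 3), amount=st.integers(1, 100))
        def deposit(self, account, amount):
            # Same operation on both sides; any disagreement is a bug.
            assert self.model.deposit(account, amount) == self.impl.deposit(account, amount)
            assert self.model.state() == self.impl.state()  # hypothetical state accessors

    # Hypothesis shrinks any failing run to a minimal operation sequence.
    TestLedgerConformance = LedgerConformance.TestCase

The shrunk sequence doubles as a ready-made regression test.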
The most expensive bugs are not simple crashes. They are silent logic errors: results that are wrong but fail nothing loudly, state that drifts a little further with every operation, invariants that hold in every handwritten test but break on step ten thousand.
Unit tests usually miss these because they are too specific. Integration tests miss them because they are too broad. Differential fuzzing finds them because the model is strict and the fuzzer explores long, adversarial sequences.
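As a hypothetical illustration of how silent these bugs can be, consider a transfer that computes its fee two different ways:

    class BuggyLedger:
        """Illustrative only: two fee computations that disagree by one unit."""

        def __init__(self):
            self.balances = {}
            self.collected = 0

        def transfer(self, src, dst, amount):
            fee = amount // 100                     # fee truncated here...
            self.balances[src] = self.balances.get(src, 0) - amount
            self.balances[dst] = self.balances.get(dst, 0) + (amount - fee)
            self.collected += round(amount / 100)   # ...but rounded here
            # For amounts like 150, the truncated and rounded fees differ,
            # so one unit of value appears from nowhere. Nothing crashes and
            # no targeted unit test fails, but a model with a conservation
            # invariant flags the first divergent input.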
It is especially powerful for stateful systems whose correctness is an invariant over long operation sequences: ledgers and accounting logic, state machines, storage engines, and protocol implementations.
If the invariant is real, the model can encode it. If the model encodes it, the fuzzer will eventually find the inputs that violate it.
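To make that concrete, here is what a tiny model that encodes a conservation invariant might look like; the op shape and names are illustrative, not taken from the repo:

    class ModelLedger:
        """Executable model: explicit state, explicit invariant."""

        def __init__(self):
            self.balances = {}
            self.minted = 0

        def apply(self, op):
            if op["kind"] == "mint":
                self.balances[op["to"]] = self.balances.get(op["to"], 0) + op["amount"]
                self.minted += op["amount"]
            elif op["kind"] == "transfer":
                if self.balances.get(op["src"], 0) < op["amount"]:
                    return "insufficient"  # rejected ops must leave state untouched
                self.balances[op["src"]] -= op["amount"]
                self.balances[op["dst"]] = self.balances.get(op["dst"], 0) + op["amount"]
            # Conservation invariant, re-checked after every state change:
            # transfers move value around but never create or destroy it.
            assert sum(self.balances.values()) == self.minted
            return "ok"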
You can get value without deep proof knowledge. A practical path looks like this:
1. Write a small executable model of the core state machine.
2. Prove one or two key invariants about it, or start with none and add them as you go.
3. Write adapter code that applies the same operations to the model and to your implementation (a sketch follows below).
4. Let a fuzzer drive the loop, comparing results and full state.
The proofs make the model trustworthy. But even a tiny model with a couple of lemmas can be enough to unlock the testing loop.
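Step 3 of the path above, the adapter, is usually where the real work is. A minimal sketch, assuming a hypothetical client handle onto the real system:

    class ImplAdapter:
        """Translate the model's abstract ops into calls on the real system."""

        def __init__(self, client):
            self.client = client  # hypothetical handle onto the implementation

        def apply(self, op):
            # Keep this mapping dumb: one branch per model operation,
            # no logic of its own that could mask or introduce bugs.
            if op["kind"] == "mint":
                return self.client.mint(op["to"], op["amount"])
            if op["kind"] == "transfer":
                return self.client.transfer(op["src"], op["dst"], op["amount"])
            raise ValueError(f"unknown op: {op['kind']}")

The thinner the adapter, the less room it has to hide bugs of its own.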
If you want to see an end-to-end example, check out our Verified Ledger repo, which shows a Lean 4 model compiled into a Rust fuzzing harness with replay support.
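Replay support does not need to be fancy. The repo's format may differ, but a sketch of the idea, assuming counterexamples are saved as JSON lists of ops:

    import json

    def replay(path, model, impl):
        """Re-run a recorded operation sequence deterministically."""
        with open(path) as f:
            ops = json.load(f)
        for i, op in enumerate(ops):
            # Compare full state too, if your adapters expose it.
            assert model.apply(op) == impl.apply(op), f"diverged at step {i}: {op}"

Committing replays to the test suite turns every fuzzer find into a permanent regression test.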
If you are starting from scratch, keep it intentionally small:
- Model only the core state transitions, not the whole system.
- Ignore performance, I/O, and presentation details; they are not what the model is for.
- Add one operation at a time, and keep the state easy to compare.
Most teams fail by modeling too much. The model is a tool, not a mirror of the whole system.
Model-based fuzzing is powerful, but there are a few sharp edges: the adapter can drift out of sync with the real API, nondeterminism in the implementation (timestamps, iteration order) can produce spurious diffs, comparing only return values can let state divergence slip through, and a model that grows too large stops being trustworthy.
These are manageable. The best fix is to keep the model small and to compare full state, not just return values.
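One way to do that while taming the nondeterminism edge is to canonicalize both states before comparing; the field names here are hypothetical:

    def canonical(state):
        """Project a state onto exactly what the spec constrains."""
        return {
            # Sort unordered collections so iteration order cannot cause diffs.
            "balances": sorted(state["balances"].items()),
            # Wall-clock fields the spec does not constrain are deliberately omitted.
        }

    def states_agree(model_state, impl_state):
        return canonical(model_state) == canonical(impl_state)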
Differential fuzzing against a verified model is not flashy. It is not a single proof that “everything is correct.” It is a practical engineering loop that catches real bugs, quickly, with high confidence.
Once you have the model, the fuzzer becomes a continuous audit. It guards against regression, specification drift, and the kind of subtle logic errors that make it into production.
That is a rare trade: a relatively small upfront cost that compounds every time you change the code.
Formal verification does not have to be a separate, academic effort. A small, verified model can become the most effective testing oracle you have. Pair it with differential fuzzing, and you get a practical, repeatable way to find the bugs that matter most.