How a small, proved model plus a differential fuzzer catches deep logic bugs normally missed by tests.
For a real-world example, see our Verified Ledger repo.
Most developers hear “formal verification” and think math-heavy papers, not something you can run in a CI pipeline. But there is a practical, high-leverage pattern that makes formal models immediately useful: differential fuzzing against a verified model, also called model-based conformance testing.
You do not need to be a proof expert to benefit from it. All you need is a small model that is correct, and a harness that compares it to the real implementation. The model becomes a high-signal oracle, and the fuzzer turns that oracle into a constant, automated audit.
This post explains the idea, why it works, and how to start with minimal formal knowledge.
Traditional fuzzing throws random inputs at a program and watches for crashes. Differential fuzzing adds a second system and compares their behavior on the same inputs. If the systems diverge, that is a bug.
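As a toy sketch of the plain version, before any verified model enters the picture, imagine fuzzing a hypothetical my_clever_sort against Python's built-in sorted:

    import random

    def reference_sort(xs):
        return sorted(xs)          # the trusted second system

    def fast_sort(xs):
        return my_clever_sort(xs)  # hypothetical implementation under test

    for _ in range(100_000):
        xs = [random.randrange(100) for _ in range(random.randrange(20))]
        assert fast_sort(xs) == reference_sort(xs), xs  # divergence == bug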
When one of those systems is a formally verified model, the comparison becomes especially strong. The model provably behaves correctly, so any disagreement is a real bug in the implementation.
Think of it like this: the model tells you what should happen, the implementation tells you what actually happens, and the fuzzer hunts for inputs on which the two disagree.
That disagreement is exactly what you want to know.
A reference implementation is already useful for differential tests. A verified model is stronger because it is small, explicit, and backed by proofs. It does not carry accidental behavior or hidden assumptions. It is the spec.
This has three big consequences:
- Any divergence is a bug in the implementation, never an ambiguity about which side is right.
- The model doubles as readable, executable documentation of the intended behavior.
- Strengthening the model strengthens every future fuzzing run.
Even if you never read the proofs, you still benefit from the model being disciplined and correct. The fuzzer is only as good as its oracle.
At a high level, the loop is simple:
1. Generate a random operation.
2. Apply it to the verified model.
3. Apply it to the real implementation.
4. Compare the results and the resulting states; any divergence is a bug.
5. Repeat.
That is it. The complexity is in the adapter code and in making the comparison precise. But the structure is straightforward.
Here is a minimal pseudocode sketch:
    for op in random_ops():                          # your operation generator
        model_result, model_state = model.apply(op)  # verified oracle
        impl_result, impl_state = impl.apply(op)     # system under test
        # Any divergence, in result or in state, is a bug in the implementation.
        assert model_result == impl_result, f"result diverged on {op}"
        assert model_state == impl_state, f"state diverged on {op}"

Once this exists, you can plug in better fuzzers, bigger generators, a seed corpus, or even more targeted property-based testing instead of general fuzzing. The core idea stays the same.
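For instance, the property-based route might look like this with the Hypothesis library; ModelLedger, ImplLedger, and their deposit/state methods are hypothetical stand-ins for your model binding and your real implementation:

    from hypothesis import strategies as st
    from hypothesis.stateful import RuleBasedStateMachine, rule

    class LedgerConformance(RuleBasedStateMachine):
        """Drive the verified model and the implementation in lockstep."""

        def __init__(self):
            super().__init__()
            self.model = ModelLedger()  # hypothetical binding to the verified model
            self.impl = ImplLedger()    # hypothetical real implementation

        @rule(account=st.integers(0, 3), amount=st.integers(1, 100))
        def deposit(self, account, amount):
            # Same operation on both sides; any disagreement is a bug.
            assert self.model.deposit(account, amount) == self.impl.deposit(account, amount)
            assert self.model.state() == self.impl.state()  # hypothetical state accessors

    # Hypothesis shrinks any failing run to a minimal operation sequence.
    TestLedgerConformance = LedgerConformance.TestCase

The shrunk sequence doubles as a ready-made regression test.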
The most expensive bugs are not simple crashes. They are silent logic errors: results that are wrong but fail nothing loudly, state that drifts a little further with every operation, invariants that hold in every handwritten test but break on step ten thousand.
Unit tests usually miss these because they are too specific. Integration tests miss them because they are too broad. Differential fuzzing finds them because the model is strict and the fuzzer explores long, adversarial sequences.
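As a hypothetical illustration of how silent these bugs can be, consider a transfer that computes its fee two different ways:

    class BuggyLedger:
        """Illustrative only: two fee computations that disagree by one unit."""

        def __init__(self):
            self.balances = {}
            self.collected = 0

        def transfer(self, src, dst, amount):
            fee = amount // 100                     # fee truncated here...
            self.balances[src] = self.balances.get(src, 0) - amount
            self.balances[dst] = self.balances.get(dst, 0) + (amount - fee)
            self.collected += round(amount / 100)   # ...but rounded here
            # For amounts like 150, the truncated and rounded fees differ,
            # so one unit of value appears from nowhere. Nothing crashes and
            # no targeted unit test fails, but a model with a conservation
            # invariant flags the first divergent input.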
It is especially powerful for stateful systems whose correctness is an invariant over long operation sequences: ledgers and accounting logic, state machines, storage engines, and protocol implementations.
If the invariant is real, the model can encode it. If the model encodes it, the fuzzer will eventually find the inputs that violate it.
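To make that concrete, here is what a tiny model that encodes a conservation invariant might look like; the op shape and names are illustrative, not taken from the repo:

    class ModelLedger:
        """Executable model: explicit state, explicit invariant."""

        def __init__(self):
            self.balances = {}
            self.minted = 0

        def apply(self, op):
            if op["kind"] == "mint":
                self.balances[op["to"]] = self.balances.get(op["to"], 0) + op["amount"]
                self.minted += op["amount"]
            elif op["kind"] == "transfer":
                if self.balances.get(op["src"], 0) < op["amount"]:
                    return "insufficient"  # rejected ops must leave state untouched
                self.balances[op["src"]] -= op["amount"]
                self.balances[op["dst"]] = self.balances.get(op["dst"], 0) + op["amount"]
            # Conservation invariant, re-checked after every state change:
            # transfers move value around but never create or destroy it.
            assert sum(self.balances.values()) == self.minted
            return "ok"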
You can get value without deep proof knowledge. A practical path looks like this:
1. Write a small executable model of the core state machine.
2. Prove one or two key invariants about it, or start with none and add them as you go.
3. Write adapter code that applies the same operations to the model and to your implementation (a sketch follows below).
4. Let a fuzzer drive the loop, comparing results and full state.
The proofs make the model trustworthy. But even a tiny model with a couple of lemmas can be enough to unlock the testing loop.
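Step 3 of the path above, the adapter, is usually where the real work is. A minimal sketch, assuming a hypothetical client handle onto the real system:

    class ImplAdapter:
        """Translate the model's abstract ops into calls on the real system."""

        def __init__(self, client):
            self.client = client  # hypothetical handle onto the implementation

        def apply(self, op):
            # Keep this mapping dumb: one branch per model operation,
            # no logic of its own that could mask or introduce bugs.
            if op["kind"] == "mint":
                return self.client.mint(op["to"], op["amount"])
            if op["kind"] == "transfer":
                return self.client.transfer(op["src"], op["dst"], op["amount"])
            raise ValueError(f"unknown op: {op['kind']}")

The thinner the adapter, the less room it has to hide bugs of its own.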
If you want to see an end-to-end example, check out our Verified Ledger repo, which shows a Lean 4 model compiled into a Rust fuzzing harness with replay support.
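Replay support does not need to be fancy. The repo's format may differ, but a sketch of the idea, assuming counterexamples are saved as JSON lists of ops:

    import json

    def replay(path, model, impl):
        """Re-run a recorded operation sequence deterministically."""
        with open(path) as f:
            ops = json.load(f)
        for i, op in enumerate(ops):
            # Compare full state too, if your adapters expose it.
            assert model.apply(op) == impl.apply(op), f"diverged at step {i}: {op}"

Committing replays to the test suite turns every fuzzer find into a permanent regression test.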
If you are starting from scratch, keep it intentionally small:
- Model only the core state transitions, not the whole system.
- Ignore performance, I/O, and presentation details; they are not what the model is for.
- Add one operation at a time, and keep the state easy to compare.
Most teams fail by modeling too much. The model is a tool, not a mirror of the whole system.
Model-based fuzzing is powerful, but there are a few sharp edges: the adapter can drift out of sync with the real API, nondeterminism in the implementation (timestamps, iteration order) can produce spurious diffs, comparing only return values can let state divergence slip through, and a model that grows too large stops being trustworthy.
These are manageable. The best fix is to keep the model small and to compare full state, not just return values.
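One way to do that while taming the nondeterminism edge is to canonicalize both states before comparing; the field names here are hypothetical:

    def canonical(state):
        """Project a state onto exactly what the spec constrains."""
        return {
            # Sort unordered collections so iteration order cannot cause diffs.
            "balances": sorted(state["balances"].items()),
            # Wall-clock fields the spec does not constrain are deliberately omitted.
        }

    def states_agree(model_state, impl_state):
        return canonical(model_state) == canonical(impl_state)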
Differential fuzzing against a verified model is not flashy. It is not a single proof that “everything is correct.” It is a practical engineering loop that catches real bugs, quickly, with high confidence.
Once you have the model, the fuzzer becomes a continuous audit. It guards against regression, specification drift, and the kind of subtle logic errors that make it into production.
That is a rare trade: a relatively small upfront cost that compounds every time you change the code.
Formal verification does not have to be a separate, academic effort. A small, verified model can become the most effective testing oracle you have. Pair it with differential fuzzing, and you get a practical, repeatable way to find the bugs that matter most.