On tests and perceptual domain modelling
For a while, when I first started out, I really enjoyed the process of finding a thing and boiling it down to just its abstract pieces, as skin-tight as possible. Doing this results in tests that resolve to essentially one correct scenario: if a case doesn't resolve to it, it's just wrong.
But there's another side to that, almost antithetical to it, where the things you're working on aren't clean. There's no single thing you're testing; you're testing something more like a process or a transformation.
Creating those tests always kind of irked me. I could come up with examples of good and bad inputs, but they were never enough. It's not like you can throw a few examples at a parser and call it a day. I found it mentally exhausting, like there was no way to finish it.
Property-based testing clicks
Then one day I saw a post by Rain where she was sharing how property-based testing works. I started reading about it and thought, "This sounds really cool but I don't really know what a strategy is." The docs weren't fully making sense to me at first, but eventually it clicked.
What you're really doing with a strategy is taking the potter's clay of the domain of all possible data, putting it on the wheel, and spinning and shaping it until it matches the shape of the data you expect. Then you can take that shaped clay and say: for data of this shape, this should always hold true. It should never fail.
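To make that concrete, here's a minimal sketch in Rust using the proptest crate. The `Percent` type, `percent_strategy`, and the property itself are hypothetical names I'm inventing for illustration; the point is the shaping:

```rust
use proptest::prelude::*;

// A hypothetical domain type: a percentage that must stay in 0..=100.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Percent(u8);

// The potter's wheel: start from the clay of every possible u8 and
// shape it down to only the values that are valid percentages.
fn percent_strategy() -> impl Strategy<Value = Percent> {
    (0u8..=100).prop_map(Percent)
}

proptest! {
    // On this shape of data, the property should always hold:
    // taking the complement twice gets you back where you started.
    #[test]
    fn complement_is_an_involution(p in percent_strategy()) {
        let complement = Percent(100 - p.0);
        prop_assert_eq!(Percent(100 - complement.0), p);
    }
}
```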
I really liked that framing, and I wasn't totally sure why at first.
Perceptual domains
Thinking about it more, I think the way my brain holds these tests is more mathematical in nature than I originally realized, but not purely math. Software's point is not to be math; it's to solve problems. That means both the problems the code is solving and the code itself have a lot of human cognition behind them.
What I think is actually happening is that you're working in perceptual domains. The line I'd draw is between tests that need to hold true over a closed perceptual domain, where there's a humanly countable number of possibilities, versus an open perceptual domain, where the possibilities are humanly innumerable and you're forced to reckon with the pattern of the data rather than the data itself.
In a closed perceptual domain, you can exhaustively enumerate: if this happens then this should happen, and if not then this other thing should happen. You can ensure you've covered every possible input.
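As a sketch of what that looks like (the `Signal` enum and `may_proceed` function are made up for illustration), a closed domain can be nailed down with one plain assertion per case:

```rust
// A closed perceptual domain: an enum with a humanly countable number
// of cases, so a test can enumerate every one of them.
#[derive(Debug)]
enum Signal {
    Red,
    Yellow,
    Green,
}

fn may_proceed(signal: &Signal) -> bool {
    matches!(signal, Signal::Green)
}

#[test]
fn covers_every_signal() {
    // Three possibilities, three assertions: the domain is exhausted.
    assert!(!may_proceed(&Signal::Red));
    assert!(!may_proceed(&Signal::Yellow));
    assert!(may_proceed(&Signal::Green));
}
```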
In an open perceptual domain, you can't do that. Say you're round-tripping a data format, parsing arbitrary inputs, or working with an abstract syntax tree. The number of possible inputs, while technically finite, is beyond practical consideration by human minds: we enumerate inputs at human speed, not computer speed. You can try to paper over this with example-based testing, but you're really just pretending it's a closed domain when it isn't.
This is where property-based testing comes in. You start from the open perceptual domain and model the shape of your data within it, selectively cutting off the chunks that are inconsistent with the state of the universe for this test. The result isn't complete; I think of it like the render preview when you're ray tracing a scene in Blender. Rather than rasterizing the image, it samples color and light based on the physics of the environment. Property-based testing works the same way: you're not getting a perfect picture, because the space is too large, but by scattershotting enough of it you can gain real confidence that your implementation is correct.
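Here's a rough sketch of that round-trip case with proptest. The hex codec below is a toy I made up as a stand-in for a real format; the property is the round trip itself:

```rust
use proptest::prelude::*;

// A toy codec for illustration: hex-encode a byte buffer and parse it
// back. Stands in for any real serializer/parser pair.
fn encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn decode(s: &str) -> Option<Vec<u8>> {
    if !s.is_ascii() || s.len() % 2 != 0 {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

proptest! {
    // Scattershot the open domain: any byte buffer up to 256 bytes
    // must survive the round trip unchanged.
    #[test]
    fn hex_roundtrips(bytes in prop::collection::vec(any::<u8>(), 0..256)) {
        prop_assert_eq!(decode(&encode(&bytes)), Some(bytes));
    }
}
```

Each run samples a batch of buffers from the domain (256 cases by default), and if any fails, proptest shrinks it down to a minimal counterexample.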
Where other test types fit
Regression testing doesn't obviously fit this model at first glance, but I think it does once you sit with it. Regression tests are just known failure states in a system. If that system is an open perceptual domain, regression cases still live inside it: they're known scars in your implementation that you short-circuit as an optimization. So they fit neatly into the open domain.
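In code, a regression test is just one pinned point in that same domain. Reusing the toy hex codec above (the specific failing case here is invented):

```rust
// A regression test is a single pinned point in the open domain: an
// input that failed once, kept around so the scar stays closed.
#[test]
fn regression_empty_buffer_roundtrips() {
    let input: Vec<u8> = vec![];
    assert_eq!(decode(&encode(&input)), Some(input));
}
```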
Mutation testing is harder to place. I still think it's an open-domain situation, but because you're testing the veracity of your tests rather than the semantics of your code, it's less clear. The problem is no longer what your code does; it's how fully you've sampled the assertions that sample your code. I'm still not quite sure where it sits.