Trust Is Verifiability, Not Accuracy
Most data tools claim to be accurate. The good ones are. The bad ones are too, sometimes, and from the outside you can't tell which is which.
What you can tell, from the outside, is whether the tool is verifiable. Whether you, a stranger with no relationship to whoever built it, could walk a number backward to its source without their help. Whether you could open the methodology page and reach the original file. Whether you could read the refresh date and know how stale the figure in front of you really is.
That line, between accuracy and verifiability, is the one that decides whether a data tool gets cited or ignored. It's almost always under-built. The team is convinced their numbers are right, so they assume everyone else will take it on faith.
They won't. Not without the receipts.
Accuracy is a claim. Verifiability is the receipt.
Accuracy is something you assert about your own work. It's what you put on the homepage. Most of the time it's even true.
Verifiability is what a third party can do to check that claim without ever contacting you: a journalist, an academic, an AI engine, an attorney, a regulator. That's a different kind of work, and it's worth strictly more, because:
- Accuracy without verifiability is invisible. The reader can't tell your accurate tool apart from a confidently wrong one.
- Verifiability without accuracy is also invisible, but only for a while. Errors get caught. The tool either improves or loses standing.
- Verifiability and accuracy together is the only state in which a data tool gets used as a source.
You see the gap most clearly the moment someone tries to cite you. A journalist on deadline doesn't have time to take your accuracy on faith. They need to follow your methodology to the source, confirm the figure, and link it. If your tool offers no methodology, no source links, no refresh date, the journalist has two options: cite you on faith, which the good ones won't, or walk away. Most walk away. The number gets attributed to a federal source instead, and your tool, the one that did the actual joining work, drops out of the chain entirely.
That drop-out is the cost of being unverifiable. It never feels like a cost in the moment, because nobody emails to say "I almost cited you and couldn't." It just accumulates. A few hundred near-misses later you have a tool that is accurate, well-built, used by no one as a source, and quietly losing ground to whichever competitor figured out the receipts.
What "verifiable" actually requires
The discipline here isn't glamorous. It's the part a venture-funded competitor cuts first when they want to ship faster, and it's exactly the part that compounds. The minimum building blocks:
A written methodology page. Not a marketing page. A document that names the source datasets, the join keys, the transformations applied, the rounding rules, the cases the tool excludes and why. Boring to write. The single highest-leverage page on a civic data site.
A per-record source trail. For every figure you show the reader, the reader should be able to reach the original publishing entity. Not a generic "data from federal sources" line. The actual hospital's actual file URL, dated, fetched.
A refresh date. On every page, not buried on the about page. Stale data is fine if you disclose it. Stale data presented as current is fraud, even when it's an accident.
Disclosed limitations. What the tool doesn't cover. What it can't say. Where error is most likely to creep in. The temptation is to hide all of this because it reads like weakness. It isn't weakness. It's the thing that makes the tool trustworthy. A tool that names its limits is a tool that has actually thought about them.
A stable URL pattern. Sounds like an SEO concern. It's a citation concern. If your hospital profile lives at `/hospital/12345` today and `/va/norfolk/sentara-norfolk` next year, every citation written today breaks. The URL is part of what makes the tool verifiable. It has to outlast the redesign.
Provenance over polish. When you're torn between making the page prettier and making the data trail clearer, pick the trail. The audience that would cite you values the trail. The audience that values the polish was never going to cite you anyway.
The two audiences
Most data tools are built for one audience: the end user. The patient, the defendant, the homeowner.
The tools that compound are built for two. The end user, and the second-party auditor: the journalist, academic, AI engine, regulator, attorney who decides whether the tool is citable at all.
They want different things. The end user wants the answer. The auditor wants the receipt. A tool that serves only the end user can be useful. A tool that serves both becomes a source.
Becoming a source is what starts the compounding. Sources get cited, citations attract other sources, the citation graph thickens, and the tool earns standing inside the ecosystem of journalists, academics, and AI engines that decide what counts as ground truth on a topic. None of that happens for a tool that's merely accurate.
The work nobody sees
I keep returning to this because it runs against the usual product instinct. The most leveraged work in a civic data tool is the part the user never reads: the methodology page, the source trail, the disclosed limitations. The user doesn't need to read them. The auditor does. And whether the auditor stays or walks decides whether the tool ever earns the standing to be useful at scale.
You can build for accuracy alone. Plenty do. They make accurate, unverifiable tools, then wonder why nobody cites them, then add features instead of receipts, then quietly disappear.
Or you build for verifiability, accept that it's slower and uglier and harder to raise money on, and end up with a tool that journalists, academics, and AI engines reach for by default. There's no third option. The instinct to skip the receipts and just say the numbers louder is the most expensive instinct in this whole category, and the people who beat it earliest are the ones left holding the durable tools.
Trust is verifiability. The numbers are downstream of that.
Related Reading
The Only Thing That Still Compounds
Everyone's drawing the same lesson from AI: the work is free now, so judgment is what's left. That take is everywhere and useless. Here's the narrower one I'd defend — judgment evaporates unless you log it, and the only entries worth keeping are the times you overruled the machine and were right.
4 min readcivic data accountabilityThe Real Civic Data Gap
Hospitals publish prices. Courts publish dockets. The civic-data work isn't getting it published. It's bridging the gap from published to usable.
5 min readSubscribe to the Newsletter
5-minute notes on civic data, trust engineering, and decision quality.
No spam. Unsubscribe anytime.