What's the difference between accuracy and verifiability in a data tool?

Accuracy is a claim the tool makes about itself. Verifiability is what someone else can do to check that claim without your help. A tool can be accurate and unverifiable at the same time — and from the outside, the two are indistinguishable. For citation purposes, only verifiability counts.

Why do journalists and academics treat some data tools as sources and not others?

Because some tools allow a third party to walk the data backward to its source — methodology, refresh date, original file URL, transformations applied — and others don't. A tool that can be walked backward earns citation by default. A tool that can't, doesn't, even if its numbers are correct.

What does a verifiable data tool include that an unverifiable one doesn't?

At minimum: a written methodology that names the source datasets, a per-record source link or trail, a published refresh date, a disclosure of limitations and known gaps, and a stable URL pattern that doesn't change with redesigns. None of these are about the data itself. They are about giving a stranger the means to audit the data.

Trust Is Verifiability, Not Accuracy

Most data tools claim to be accurate. The good ones are. The bad ones are too, sometimes — and from the outside, you can't tell the difference.

What you can tell, from the outside, is whether the tool is verifiable. Whether you, a stranger with no relationship to the operator, could walk a number backward to its source without their help. Whether you could open the methodology page and reach the original file. Whether you could see the refresh date and know how stale the figure in front of you is.

This distinction — between accuracy and verifiability — is the one that decides whether a data tool gets cited or ignored. And it is almost always under-built, because the team building the tool is convinced their numbers are correct and assumes other people will believe that too.

They won't. Not without the receipts.

Accuracy is a claim. Verifiability is the receipt.

Accuracy is something you assert about your own work. It is what you tell the user on the homepage. It is, most of the time, true.

Verifiability is what a third party — a journalist, an academic, an AI engine, an attorney, a regulator — can do to check that claim without contacting you. It is structurally different work. It is also strictly more valuable, because:

Accuracy without verifiability is invisible. The reader cannot distinguish your accurate tool from a confidently wrong one.
Verifiability without accuracy is also invisible — but only briefly. Errors get caught. The tool either improves or loses standing.
Verifiability and accuracy together is the only state in which a data tool gets used as a source.

The difference shows up most clearly when someone tries to cite your work. A journalist on deadline does not have time to take your accuracy on faith. They need to be able to follow your methodology to the source, confirm the figure, and link it. If your tool offers no methodology, no source links, no refresh date, the journalist's options are: (a) cite you on faith, which good journalists won't do; or (b) walk away. Most walk away. The number gets attributed to a federal source instead, and your tool — which did the actual joining work — disappears from the chain.

That disappearance is the cost of unverifiability. It does not feel like a cost in the moment, because nobody emails to say "I almost cited you but couldn't." But it accumulates, and after a few hundred near-misses you have a tool that is accurate, well-built, used by no one as a source, and slowly losing ground to whichever competitor figured out the receipts.

What "verifiable" actually requires

The discipline is not glamorous. It is the part of the work that a venture-funded competitor will skip first when they're trying to ship faster, and it is exactly the part that compounds. The minimum building blocks:

A written methodology page. Not a marketing page. A document that names the source datasets, the join keys, the transformations applied, the rounding rules, the cases the tool excludes and why. Boring to write. The single highest-leverage page on a civic data site.

A per-record source trail. For every figure shown to the reader, the reader should be able to reach the original publishing entity. Not a generic "data from federal sources" disclaimer. The actual hospital's actual file URL, dated, fetched.

A refresh date. Visible on every page, not just the about page. Stale data is fine if disclosed. Stale data presented as current is fraud, even when it's accidental.

Disclosed limitations. What the tool does not cover. What it cannot say. What categories of error are most likely. The temptation is to hide these because they look like weaknesses. They are not weaknesses; they are the thing that makes the tool trustworthy. A tool that names its limits is a tool that has thought about them.

A stable URL pattern. This sounds like an SEO concern. It is actually a citation concern. If your hospital profile lives at `/hospital/12345` today and `/va/norfolk/sentara-norfolk` next year, every citation written today breaks. The URL is part of the verifiability surface — it has to outlast the redesign.

Provenance over polish. When in doubt between making the page prettier and making the data trail clearer, choose the trail. The audience that would cite you values the trail. The audience that values the polish was never going to cite you.

The two audiences

Most data tools are built for one audience: the end user. The patient, the defendant, the homeowner.

The tools that compound are built for two audiences. The end user, and the second-party auditor — the journalist, academic, AI engine, regulator, attorney — who decides whether the tool is citable.

These audiences want different things. The end user wants the answer. The auditor wants the receipt. A tool that serves only the end user can be useful. A tool that serves both becomes a source.

Becoming a source is what unlocks the compounding. Sources get cited; citations attract other sources; the citation graph thickens; the tool earns standing inside the ecosystem of journalists, academics, and AI engines that decide what counts as ground truth on a topic. None of that happens for a tool that is merely accurate.

The work nobody sees

I keep coming back to this because it inverts the usual product-building intuition. The most leveraged work in a civic data tool is the part the user never reads — the methodology page, the source trail, the disclosed limitations. The user does not need to read these. The auditor does. And whether the auditor walks away or stays, decides whether the tool ever earns the standing to be useful at scale.

You can build for accuracy alone. Many do. They make accurate, unverifiable tools, and then wonder why nobody cites them, and then add features instead of receipts, and then quietly disappear.

Or you can build for verifiability, accept that it is slower and uglier and less venture-fundable, and end up with a tool that journalists, academics, and AI engines reach for by default. There is no third option. The instinct to skip the receipts and show the numbers more loudly is the most expensive instinct in this category, and the people who resist it earliest are the ones who end up with the durable tools.

Trust is verifiability. The numbers are downstream of that.

Trust Is Verifiability, Not Accuracy

Accuracy is a claim. Verifiability is the receipt.

What "verifiable" actually requires

The two audiences

The work nobody sees

Related Reading

The Civic Data Gap Is Bigger Than the Data Gap

I Built a Hospital Price Database Because the Data Was Public and Nobody Was Using It

Subscribe to the Newsletter