
The Civic Data Gap Is Bigger Than the Data Gap


The Hospital Price Transparency Rule has been in effect since January 2021. Every US hospital is required to publish a machine-readable file listing the prices it has negotiated with every insurance plan. The data exists. It is technically public. It is discoverable, downloadable, and indexable.

It is also, for the patient trying to make a decision, not data.

A 600,000-row JSON file does not help the person sitting in a parking lot deciding whether to walk into the hospital across the street. A spreadsheet of CPT codes is not a decision aid. The bare fact of publication does not bridge the gap between "the data exists" and "I can use the data."

I want to argue that this gap — the gap between published and usable — is the actual frontier of civic data work. It is bigger than the gap that gets celebrated in policy press releases (the gap between secret and published), and it is the one where the next decade of consumer-facing accountability tools will be built or fail to be built.

Publication is not access

The public-sector definition of access is "we made it available." The citizen's definition of access is "I could read it, understand it, and act on it." These are not the same definition, and the distance between them is enormous.

Consider four examples that I keep coming back to:

  • Hospital price files. Required of every US hospital since 2021. Read by approximately no patients. The format is engineered for software, but no software was funded to read it on the patient's behalf.
  • Court case dockets. Open records in every state. Searchable, in theory, on every court's website. In practice, the search interfaces are built for clerks, the URLs aren't shareable, and the schemas vary so wildly that even attorneys often resort to hand-keying.
  • Inspection reports for restaurants, nursing homes, daycares. Public. Almost never indexed at the level of "which restaurant in my zip code failed last month for which violations."
  • Permitting and zoning data. Public. Downloadable in unwieldy GIS formats that require a master's degree in cartography to parse.

Every one of these datasets satisfies the legal definition of public. Every one of these datasets fails the citizen's definition of accessible.

The gap is downstream of publishers

This is the part that took me the longest to internalize: the agency that publishes the data is not the entity that closes the gap.

Their job ends at compliance. They publish, they file the report, they update the URL once a quarter. The fact that the file is unusable to the people it nominally serves is, from the publisher's perspective, not their problem.

This is not a moral failing. It is a structural one. Compliance is funded; usability is not. A hospital's incentive ends at "we posted the file." A court's incentive ends at "the docket is searchable." Closing the last mile to a person trying to make a decision is downstream work that nobody upstream is paid to do.

That last-mile work is the civic data gap. It is what consumer-facing accountability tools — built by independent operators, journalists, civic technologists, and the occasional rogue founder — exist to close.

What "usable" actually means

Getting from raw to usable is not a single translation problem. It is a series of distinct disciplines that each have to be done well, and most of them are invisible from the outside:

1. Discovery. Finding every relevant file. Public datasets are typically scattered across thousands of source URLs with no central registry. The first job is just enumerating what exists.
2. Normalization. The same field will be named six ways across publishers. Reconciling these into one schema is a slog, but everything downstream depends on it.
3. Quality verification. Public data has bugs. Date fields with impossible dates. Codes that don't match any known taxonomy. Negotiated rates of $0 or $99,999. Knowing what to throw out is a craft.
4. Joinability. A hospital price by itself is not decision-ready. The same hospital's safety record, infection rate, and quality star: that's decision-ready. The work of joining is most of the work.
5. Trust scaffolding. Methodology pages. Source links. Refresh dates. Disclosed limitations. Without these, the tool is unverifiable, which means it can't be cited, which means it can't be trusted.
6. Reading-level discipline. A patient is not the audience for chargemaster terminology. A defendant is not the audience for a judge's calendar code. Translation to the audience is most of the design work.
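Normalization, quality verification, and joining are where most of the engineering hides, so here is a minimal Python sketch of those three steps on toy rows. It assumes a made-up canonical schema (code, rate, as_of, hospital_id). The alias table, the sentinel rates, the hospital IDs, and the star_rating field are all hypothetical illustrations, not any official hospital file format; a real pipeline would need far larger alias tables and a real code taxonomy.

```python
from datetime import date

# Hypothetical alias table: variant column names (lowercased, spaces -> underscores)
# that different publishers might use for the same field, mapped to one schema.
FIELD_ALIASES = {
    "billing_code": "code", "cpt": "code", "cpt_code": "code",
    "negotiated_rate": "rate", "neg_rate": "rate", "negotiated_amt": "rate",
    "effective_date": "as_of", "as_of": "as_of", "last_updated": "as_of",
    "facility_id": "hospital_id", "hospital_id": "hospital_id",
}

# Assumed placeholder values that signal a filler rate rather than a real price.
SENTINEL_RATES = {0.0, 99999.0}


def normalize(raw: dict) -> dict:
    """Rename known columns to the canonical schema; drop unrecognized ones."""
    out = {}
    for key, value in raw.items():
        canonical = FIELD_ALIASES.get(key.strip().lower().replace(" ", "_"))
        if canonical is not None:
            out[canonical] = value
    return out


def quality_problems(rec: dict, known_codes: set, today: date) -> list:
    """Return the reasons to distrust a normalized record (empty list if none)."""
    problems = []
    try:
        rate = float(rec.get("rate"))
    except (TypeError, ValueError):
        problems.append("rate not numeric")
    else:
        if rate in SENTINEL_RATES:
            problems.append(f"placeholder rate {rate}")
    try:
        # Rejects impossible dates such as 2024-02-30 and non-date strings.
        as_of = date.fromisoformat(str(rec.get("as_of")))
    except ValueError:
        problems.append("invalid date")
    else:
        if as_of > today:
            problems.append("date in the future")
    if rec.get("code") not in known_codes:
        problems.append("unknown code")
    return problems


def build_rows(raw_rows, known_codes, quality_by_hospital, today):
    """Normalize, verify, then join each surviving price to its hospital's quality data."""
    kept, rejected = [], []
    for raw in raw_rows:
        rec = normalize(raw)
        problems = quality_problems(rec, known_codes, today)
        if problems:
            rejected.append((rec, problems))
            continue
        quality = quality_by_hospital.get(rec.get("hospital_id"))
        if quality is None:
            rejected.append((rec, ["no quality record to join"]))
            continue
        kept.append({**rec, **quality})
    return kept, rejected


# Toy input: two publishers naming the same fields differently.
raw_rows = [
    {"Billing Code": "99213", "Negotiated Rate": "142.50",
     "Effective Date": "2024-01-01", "Facility ID": "H1"},
    {"cpt": "99213", "neg_rate": "99999", "as_of": "2024-02-30", "facility_id": "H2"},
]
kept, rejected = build_rows(raw_rows, {"99213"}, {"H1": {"star_rating": 4}}, date(2024, 6, 1))
# kept holds the H1 row with star_rating joined in;
# rejected holds the H2 row with ["placeholder rate 99999.0", "invalid date"].
```

One design choice worth noting: rejected rows are kept alongside their reasons rather than silently dropped, which gives a methodology page something concrete to disclose.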

Each of those disciplines is a real skill. None of them are funded by the publisher. All of them have to be done correctly for the data to actually serve the citizen.

The gap is the asset

If you are trying to figure out where consumer-facing civic tools come from, the answer is this gap. It is not the data — the data is upstream and free. It is the disciplines listed above, applied carefully, repeatedly, and at the level of an actual person's decision.

The reason this gap will not close on its own is that the upstream incentive structure cannot reach it. The reason it gets closed at all is that small numbers of independent operators decide the gap is worth their time. They build the bridge. The bridge is the tool. The tool is the asset.

I think the next decade of consumer-facing accountability work will be people noticing this gap and walking into it on purpose. It is not a glamorous category. It does not pay quickly. The compounding asset, when it does compound, is trust — earned slowly, over many versions, by people who took the data more seriously than its publishers did.

That's what I'm trying to do with hospital pricing. The data was already public. The civic data gap is the work.
