The Real Civic Data Gap
The Hospital Price Transparency Rule has been in effect since January 2021. Every US hospital is required to publish a machine-readable file listing the prices it has negotiated with every insurance plan. The data exists. It is technically public. It is discoverable, downloadable, and indexable.
For the patient trying to make a decision, it is also not data.
A 600,000-row JSON file does nothing for the person sitting in a parking lot deciding whether to walk into the hospital across the street. A spreadsheet of CPT codes is not a decision aid. The bare fact that a file got posted does not carry you from the data exists to I can use the data.
That second gap, the one between published and usable, is the real frontier of civic data work. It is wider than the gap policy press releases celebrate, the one between secret and published. And it's the gap where the next decade of consumer-facing accountability tools either gets built or doesn't.
Publication is not access
The public-sector definition of access is "we made it available." The citizen's definition is "I could read it, understand it, and act on it." Those are not the same sentence, and the distance between them is enormous.
Four examples I keep coming back to:
- Hospital price files. Required since 2021. Published by every US hospital. Read by approximately no patients. The format is built for software, and no software was funded to read it on the patient's behalf.
- Court case dockets. Open records in every state. Searchable, in theory, on every court's website. In practice the search interfaces are built for clerks, the URLs won't share, and the schemas vary so wildly that even attorneys end up hand-keying.
- Inspection reports for restaurants, nursing homes, daycares. Public. Almost never indexed down to the question a person actually asks: which restaurant in my zip code failed last month, and for what.
- Permitting and zoning data. Public. Downloadable in GIS formats that want a master's in cartography before they'll tell you anything.
Every one of these satisfies the legal definition of public. Every one of them fails the citizen's definition of accessible.
The gap sits downstream of the publisher
This is the part that took me longest to accept: the agency that publishes the data is not the one that closes the gap.
Their job ends at compliance. They publish, they file the report, they update the URL once a quarter. That the file is useless to the people it nominally serves is, from where they sit, not their problem.
It's not a moral failing. It's a structural one. Compliance gets funded. Usability doesn't. A hospital's incentive ends at "we posted the file." A court's ends at "the docket is searchable." The last mile, the one that reaches a person trying to decide something, is work nobody upstream is paid to do.
That last mile is the civic data gap. It's what consumer-facing accountability tools exist to close, the ones built by independent operators, journalists, civic technologists, and the occasional rogue founder.
What "usable" actually means
Going from raw to usable is not a translation problem. It's a stack of separate disciplines, each one done well or the whole thing fails, and most of them are invisible from the outside:
- Discovery. Find every relevant file. Public datasets are scattered across thousands of source URLs with no central registry. The first job is just naming what exists.
- Normalization. The same field gets named six different ways across publishers. Reconciling them into one schema is a slog, and everything downstream depends on it.
- Quality verification. Public data has bugs. Date fields with impossible dates. Codes that match no known taxonomy. Negotiated rates of $0 or $99,999 sitting in the file as if they were real prices. Knowing what to throw out is a craft.
- Joinability. A hospital price on its own is not decision-ready. Put 4 federal signals in one row — price, safety record, infection rate, quality star — and that's decision-ready. The joining is most of the work.
- Trust scaffolding. 4 things, every time: methodology pages, source links, refresh dates, disclosed limits. Without them the tool is unverifiable, which means it can't be cited, which means it can't be trusted.
- Reading-level discipline. A patient is not the audience for chargemaster terminology. A defendant is not the audience for a judge's calendar code. Writing to the person who's actually reading is most of the design.
Every one of those is a real skill. None of them are funded by the publisher. All of them have to be right before the data does anything for the citizen.
The gap is the asset
If you want to know where consumer-facing civic tools come from, it's this gap. Not the data: the data is upstream and free. The disciplines above, applied carefully, applied again, applied at the level of one real person's decision.
The reason the gap won't close on its own is that the upstream incentive can't reach it. The reason it closes at all is that a small number of independent operators decide it's worth their time. They build the bridge. The bridge is the tool. The tool is the asset.
I think the next decade of consumer-facing accountability work is people seeing this gap and walking into it on purpose. It's not a glamorous category. It doesn't pay quickly. The thing that compounds, when it finally compounds, is trust, earned slowly across many versions by people who took the data more seriously than the people who published it.
That's what I'm trying to do with hospital pricing. The data was already public. The gap is the work.
Related Reading
The Only Thing That Still Compounds
Everyone's drawing the same lesson from AI: the work is free now, so judgment is what's left. That take is everywhere and useless. Here's the narrower one I'd defend — judgment evaporates unless you log it, and the only entries worth keeping are the times you overruled the machine and were right.
4 min readtrust engineeringTrust Is Verifiability, Not Accuracy
Most data tools claim accuracy. Few are verifiable. What turns a dashboard into a citable source isn't the data. It's the receipts a stranger can check.
6 min readSubscribe to the Newsletter
5-minute notes on civic data, trust engineering, and decision quality.
No spam. Unsubscribe anytime.