Make the "semantic web" web 3.0 again -- with the help of SQLite

January 11th, 2022

Before the term was hijacked by crypto-grifters (and, admittedly, a few genuinely neat projects), web 3 (point oh) referred to Tim Berners-Lee's project to promote a standard way to expose and parse metadata on the web. It never took off. In this piece, I'll argue that a breakthrough idea introduced last year by HN user phiresky might be the first practical way to implement it.

An early critique of TBL's idea came from Cory Doctorow, who dismissed that utopian vision as "Metacrap" -- a term with a different potential meaning in 2022. Doctorow's point was that you can't rely on people to provide accurate or comprehensive metadata on their web pages. We're lazy, and your utopian idea is just more work for us with little short-term reward.

The semantic web will never happen if it requires additional manual labor. This is where the SQLite-over-HTTP idea comes in. In the spring of 2021, we saw a demo of a technique in which backend-less static sites could efficiently query SQLite databases using HTTP range requests, fetching only the parts of the database file that a query needs instead of downloading the whole thing. In the process, the demo introduced a new kind of web app whose entire database was exposed and queryable from the outside.
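
To make the mechanism a bit more concrete, here is a minimal Python sketch of the range-request idea. It is not phiresky's implementation, just an illustration under a few assumptions: the server honors the Range header, and the helper names (make_http_fetcher, make_file_fetcher, read_page_size, page_byte_range, fetch_page) are made up for this example. It relies only on two facts about the SQLite file format: the file starts with a 100-byte header that records the page size, and the file is a sequence of fixed-size pages numbered from 1.

```python
import struct
import urllib.request
from typing import Callable, Tuple

# A fetcher returns the bytes from offset `start` through `end`, inclusive,
# the same convention as the HTTP header "Range: bytes=start-end".
RangeFetcher = Callable[[int, int], bytes]

SQLITE_MAGIC = b"SQLite format 3\x00"


def make_http_fetcher(url: str) -> RangeFetcher:
    """Fetch byte ranges of a remote file.

    Assumption: the server honors Range. A server that ignores it would
    send the whole file back instead of just the requested bytes.
    """
    def fetch(start: int, end: int) -> bytes:
        request = urllib.request.Request(
            url, headers={"Range": f"bytes={start}-{end}"}
        )
        with urllib.request.urlopen(request) as response:
            return response.read()
    return fetch


def make_file_fetcher(path: str) -> RangeFetcher:
    """Local stand-in with the same interface, for trying this offline."""
    def fetch(start: int, end: int) -> bytes:
        with open(path, "rb") as f:
            f.seek(start)
            return f.read(end - start + 1)
    return fetch


def read_page_size(fetch: RangeFetcher) -> int:
    """Read the page size from the 100-byte SQLite file header."""
    header = fetch(0, 99)
    if header[:16] != SQLITE_MAGIC:
        raise ValueError("not a SQLite 3 database")
    # The page size is a 2-byte big-endian integer at offset 16;
    # the special value 1 stands for 65536.
    (size,) = struct.unpack(">H", header[16:18])
    return 65536 if size == 1 else size


def page_byte_range(page_number: int, page_size: int) -> Tuple[int, int]:
    """Inclusive byte range of one page; SQLite numbers pages from 1."""
    start = (page_number - 1) * page_size
    return start, start + page_size - 1


def fetch_page(fetch: RangeFetcher, page_number: int, page_size: int) -> bytes:
    """Download exactly one page of the database file."""
    return fetch(*page_byte_range(page_number, page_size))
```

A real client goes further and plugs a fetcher like this into SQLite's storage layer, so that an ordinary SQL query only triggers requests for the pages it touches. The sketch stops at the part that makes that possible: any page can be addressed by byte offset alone.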

I wrote before about how this technique is a powerful way to make software that is more scalable and offline-friendly. What I didn't mention was how it enables all the utopian "open data" concepts that Berners-Lee has been talking about for 20 years. In his TED talk, he made the audience shout out "raw data now!" What could be more raw than a fully-exposed read-only database, queryable by anyone using standard SQL?

Some will scoff at the idea of tying TBL's vision to a single implementation of a database, but exposing this data as XML was never going to work. The data needs to be exposed in its original form; any additional translation step will ensure that most people won't bother. The beauty of this technique is that you are already using SQLite because it's such a powerful database; with no additional work, you can throw it on a static file server and others can easily query it over HTTP.

ANSIWAVE BBS is built entirely this way. The board is a SQLite database, and the clients just query it via range requests. In the future, anyone could build a web app or other client without my permission, and even run their own custom SQL queries on the posts. Data on the web will only be "semantic" if that is the default, and with this technique it will be.
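
As a rough illustration of what "custom SQL queries on the posts" could look like for an outside client, here is a small Python sketch. The post table and its columns are hypothetical and are not ANSIWAVE's actual schema, and a local file opened read-only stands in for a database served over HTTP. The helper names (build_demo_board, open_read_only, posts_per_author) are likewise invented for the example.

```python
import os
import pathlib
import sqlite3
import tempfile
from typing import List, Tuple


def build_demo_board(path: str) -> None:
    """Create a tiny stand-in board. The `post` schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE post (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO post (author, body) VALUES (?, ?)",
        [("alice", "hello"), ("bob", "hi there"), ("alice", "raw data now")],
    )
    conn.commit()
    conn.close()


def open_read_only(path: str) -> sqlite3.Connection:
    """Open the database so that queries work but writes are rejected."""
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def posts_per_author(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """A query the board's author never had to anticipate or expose."""
    return conn.execute(
        "SELECT author, COUNT(*) FROM post GROUP BY author ORDER BY author"
    ).fetchall()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "board.db")
    build_demo_board(path)
    conn = open_read_only(path)
    print(posts_per_author(conn))
    try:
        conn.execute("DELETE FROM post")
    except sqlite3.OperationalError as error:
        print("write rejected:", error)
    conn.close()
```

The point of the sketch is that the client needs nothing from the publisher beyond the file itself: no API, no export format, and no permission, and the read-only mode means the openness costs the publisher nothing in integrity.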