Using the SQLite-over-HTTP "hack" to make backend-less, offline-friendly apps

January 1st, 2022

Last spring, a project took over HN's front page in which a SQLite database was hosted on a static file host and successfully queried -- entirely from the frontend. The idea was simple: compile SQLite to WebAssembly, and add a thin layer that turns all of its disk reads into HTTP range requests.
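
To make it concrete, here's roughly what that layer boils down to, minus the SQLite VFS plumbing and caching that the real implementations add. A minimal sketch, with the URL as a placeholder:

```ts
// Read `length` bytes at `offset` from a remote SQLite file using a
// standard HTTP range request. Real implementations wrap this in a
// SQLite VFS and cache the fetched pages.
async function readChunk(
  url: string,
  offset: number,
  length: number
): Promise<Uint8Array> {
  const resp = await fetch(url, {
    headers: { Range: `bytes=${offset}-${offset + length - 1}` },
  });
  if (resp.status !== 206) {
    throw new Error(`server ignored the range request: ${resp.status}`);
  }
  return new Uint8Array(await resp.arrayBuffer());
}

// e.g. the first 100 bytes of the file are the SQLite header:
// const header = await readChunk("https://example.com/data.sqlite3", 0, 100);
```

Any static host that honors range requests (S3 and friends all do) can serve the database this way.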

Yeah, that's pretty fucking clever. Most people thought of it as just a neat hack, but for me it was groundbreaking. This is a completely new way to make database-driven software. Think about the advantages here:

  1. Scalability - Suddenly your database is as scalable as S3, Wasabi, or whatever object store you choose. You never need to think about sharding, load balancing, or caching.
  2. No REST or GraphQL necessary - Instead of maintaining the typical assortment of GET endpoints to query data, you run SQLite queries directly, and you can change them without touching the server (unless a schema change is needed). Many have adopted GraphQL to get this kind of flexibility, but with this technique you don't need any new layer. Your "REST API" is just SQL -- see the sketch after this list.
  3. Offline friendliness - Since your app is already built around SQLite, there is a very clear path to making an offline version -- just remove the HTTP layer and make it read from a local SQLite file.
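
Here's what that looks like in practice, sketched from the README of sql.js-httpvfs (the library behind that HN demo); the `posts` table and the file paths are placeholders:

```ts
import { createDbWorker } from "sql.js-httpvfs";

// The worker and wasm files ship with the library; how you resolve
// their URLs depends on your bundler. These paths are placeholders.
const workerUrl = new URL(
  "sql.js-httpvfs/dist/sqlite.worker.js",
  import.meta.url
);
const wasmUrl = new URL("sql.js-httpvfs/dist/sql-wasm.wasm", import.meta.url);

const worker = await createDbWorker(
  [
    {
      from: "inline",
      config: {
        serverMode: "full",       // the database is one plain file
        url: "/data.sqlite3",     // placeholder path on the static host
        requestChunkSize: 4096,   // should match the database page size
      },
    },
  ],
  workerUrl.toString(),
  wasmUrl.toString()
);

// Your "REST API" is just SQL. `posts` is a hypothetical table.
const rows = await worker.db.query(
  "SELECT title, author FROM posts ORDER BY ts DESC LIMIT 20"
);
```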

The last point needs to be repeated multiple times, put up on billboards, and printed on pamphlets so you can canvass your neighborhood spreading the good word. What is the one thing we all utterly hate about modern software, particularly on the web?

It's all so ephemeral.

Web apps come and go, usually taking your data along with them. So much software today requires a constant network connection even when the functionality doesn't demand it. With this technique, maybe we can finally reverse the trend and go back to making software that has longevity.

Despite the title, we do still need a small backend for writes. Every time a user modifies the data, they will need to POST to your server, which will modify the SQLite database. This leads us to the big question: how do we update the database? Cloud object stores like S3 do not allow partial file updates, so what happens when your database grows to 1 GB or more?
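
That backend really can stay tiny, though. A sketch of the entire write path -- Express and better-sqlite3 as stand-ins here, not my actual server, and the `posts` schema is hypothetical:

```ts
import express from "express";
import Database from "better-sqlite3";

// The server-side copy of the database. The filename is a placeholder.
const db = new Database("data.sqlite3");
const insertPost = db.prepare(
  "INSERT INTO posts (author, title, body, ts) VALUES (?, ?, ?, ?)"
);

const app = express();
app.use(express.json());

// The one write endpoint: validate and insert. In the S3 setup, the
// changed parts of the database get re-uploaded after this.
app.post("/posts", (req, res) => {
  const { author, title, body } = req.body ?? {};
  if (!author || !title || !body) {
    res.status(400).json({ error: "missing fields" });
    return;
  }
  insertPost.run(author, title, body, Date.now());
  res.status(201).json({ ok: true });
});

app.listen(3000);
```

The hard part isn't the endpoint; it's getting the modified database back up to the object store efficiently.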

EDIT: The stuff below is out of date because I'm no longer using S3. I'm running my own servers, so the multiplexing technique mentioned below is no longer necessary.

For me, the solution was lying inconspicuously in the SQLite source tree: an optional extension that allows you to multiplex a SQLite database across multiple files. Choose a chunk size, and your database will be automatically broken into multiple files as you add data to it. Now, you can just use a tool like rclone to copy the parts that have changed, instead of the entire 1+ GB database.
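
Once the database is chunked, the sync step after a write can be as simple as shelling out to rclone, which skips files that haven't changed. A sketch -- the local directory, remote name, and bucket path are placeholders for your own rclone config:

```ts
import { execFileSync } from "node:child_process";

// Push the chunked database to object storage. rclone only transfers
// the chunk files whose contents changed, so a small write re-uploads
// one chunk instead of the whole 1+ GB database.
function syncDatabase(): void {
  execFileSync(
    "rclone",
    ["copy", "./db", "s3remote:my-bucket/db", "--checksum"],
    { stdio: "inherit" }
  );
}
```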

This is not just theoretical. The technique above is how I built ANSIWAVE BBS. The entire BBS is hosted on S3, and every time someone writes a post, the SQLite database is updated there. As you browse the board, your SQLite queries are pulling the data directly from there, without involving my backend.

Try it out.

This is the only way I was able to implement offline mode. As described on the main page, you can clone the board via git and browse it completely offline using the terminal client. It took less than an hour to implement, because the whole thing has been using SQLite all along.
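
That's really all the swap amounts to -- the queries don't care where the bytes come from. A sketch of the offline version of the earlier query (ANSIWAVE's terminal client isn't actually TypeScript; this just shows the shape of the change):

```ts
import Database from "better-sqlite3";

// Offline mode: the same SQL as before, but read from the local file
// you cloned via git instead of over HTTP. Paths are placeholders.
const db = new Database("./board-clone/data.sqlite3", { readonly: true });

const rows = db
  .prepare("SELECT title, author FROM posts ORDER BY ts DESC LIMIT 20")
  .all();
```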

Of course, there are downsides. You need to obsessively check that your queries are using indexes, because a full table scan over HTTP is not exactly fast. And even with well-optimized queries, it won't be as fast as a query run on the backend.
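
One habit that helps, continuing the hypothetical `posts` table from earlier: ask SQLite for its query plan before shipping a query.

```ts
// EXPLAIN QUERY PLAN shows what SQLite intends to do. You want to see
// something like "SEARCH posts USING INDEX ..." in the output rather
// than "SCAN posts", which means a full table scan -- the thing that
// hurts most over HTTP.
const plan = await worker.db.query(
  "EXPLAIN QUERY PLAN SELECT * FROM posts WHERE author = ?",
  ["alice"]
);
console.log(plan);

// If it scans, add the index on the write side:
//   CREATE INDEX idx_posts_author ON posts(author);
```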

Nonetheless, ANSIWAVE proves that this technique is practical. We need to start caring about offline friendliness again, and users need to start demanding it. As modern software has progressed, it has regressed in equal measure. It would be quite a plot twist if a humble 20-year-old embedded database helped change that.