May 27, 2026

Building Bewks

Why I spent a year of evenings building a self-hosted library, and what the architecture looks like underneath.

Bewks is a self-hosted digital library I’ve been building for ebooks and audiobooks, with a small role-based access layer so a handful of people can share a collection. There’s a live instance, though most of it is gated. This post is about the architecture, and a few decisions I’d defend if cornered.

I wanted a project where I could own the boring middle of a system end-to-end (auth, queues, storage, deploys, smoke tests) at a scale small enough to hold in my head. A personal media library turned out to be a near-perfect shape for that.

The bones

The application has the same shape I’d build for a paying client. There’s a four-layer separation (app/api routes call controllers, controllers call services, services call repositories) with an interface defined at each boundary. That sounds heavyweight for a personal project and it is, but it means I can swap an implementation without the change rippling into the layers around it: SQLite locally and Postgres in production, both behind the same Prisma client; local disk storage at home and Cloudflare R2 in production, both behind the same storage interface.

That last interface earned its keep. I started out assuming I’d just keep files on a server volume. Later I wanted to push them to object storage, and being able to swap the implementation behind a single interface, without touching business logic, felt like the abstraction paying me back for the discipline of writing it.
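To make the storage boundary concrete, here is a minimal sketch of what such an interface could look like. Every name in it (FileStorage, MemoryStorage, BookFileService, the key layout) is an assumption made for illustration, not the actual Bewks code, and the in-memory class only stands in for a disk-backed or R2-backed implementation.

```typescript
// Hypothetical storage boundary. All names here are illustrative.
interface FileStorage {
  put(key: string, data: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array | null>;
  remove(key: string): Promise<void>;
}

// In-memory stand-in. A disk-backed or R2-backed class would implement the
// same three methods, and nothing above this layer would need to change.
class MemoryStorage implements FileStorage {
  private files = new Map<string, Uint8Array>();

  async put(key: string, data: Uint8Array): Promise<void> {
    this.files.set(key, data.slice()); // store a copy, not the caller's buffer
  }

  async get(key: string): Promise<Uint8Array | null> {
    return this.files.get(key) ?? null;
  }

  async remove(key: string): Promise<void> {
    this.files.delete(key);
  }
}

// Business logic sees only the interface, never a concrete backend.
class BookFileService {
  private storage: FileStorage;

  constructor(storage: FileStorage) {
    this.storage = storage;
  }

  async attach(bookId: string, format: string, data: Uint8Array): Promise<string> {
    const key = `books/${bookId}/book.${format}`; // assumed key layout
    await this.storage.put(key, data);
    return key;
  }

  async exists(key: string): Promise<boolean> {
    return (await this.storage.get(key)) !== null;
  }
}
```

Swapping backends then comes down to which FileStorage implementation gets passed to the service at startup; the service itself never learns where the bytes live.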

The Goodreads problem

The hardest single problem in this project, by a long way, was Goodreads metadata.

Goodreads has the most complete book metadata catalog on the public internet, and they very much do not want you to use it programmatically. There is no real public API anymore. The HTML they serve to plain HTTP clients is often a degraded version of what they serve to browsers, and the gap is growing. So I run two paths: a fast HTTP-based scraper for pages that still serve real data, and a Playwright fallback that spins up a headless browser when the fast path comes back empty.
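The two-path idea can be sketched as a small function that takes both scrapers as parameters. This is a sketch under assumptions: the BookMetadata fields, the Scraper type, and the rule that a result without a title counts as empty are all invented here for illustration, and the real scrapers (plain HTTP and Playwright) are left out entirely.

```typescript
// Hypothetical metadata shape; the field names are assumptions.
interface BookMetadata {
  title: string;
  series?: string;
  rating?: number;
  coverUrl?: string;
}

// Both paths share one signature, so the caller does not care which one ran.
type Scraper = (url: string) => Promise<BookMetadata | null>;

interface ScrapeResult {
  metadata: BookMetadata | null;
  source: "fast" | "browser" | "none";
}

// Assumed rule: a result with no title is treated as a degraded page.
function isUsable(m: BookMetadata | null): m is BookMetadata {
  return m !== null && m.title.trim().length > 0;
}

async function scrapeWithFallback(
  url: string,
  fast: Scraper,
  browser: Scraper,
): Promise<ScrapeResult> {
  try {
    const quick = await fast(url);
    if (isUsable(quick)) return { metadata: quick, source: "fast" };
  } catch {
    // A failed fast path is handled the same way as an empty one.
  }
  // Errors from the browser path are not caught here, so a caller
  // (for example a queue worker) can decide whether to retry.
  const slow = await browser(url);
  return isUsable(slow)
    ? { metadata: slow, source: "browser" }
    : { metadata: null, source: "none" };
}
```

Recording which path produced the result is optional, but it makes it easy to see over time how often the fast path still works.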

Both paths run as BullMQ jobs on a Redis-backed queue. When I add a book, a job goes on the queue, the worker tries the fast path, falls back to Playwright if it has to, normalizes the result, and writes back to the book record. The user-facing UI doesn’t block on any of this. The book shows up with whatever metadata the file already had, and Goodreads-enriched fields (covers, series, ratings) light up when the job completes.
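The worker side of that flow might look roughly like the sketch below. It deliberately avoids importing BullMQ so it stays self-contained: EnrichJob mirrors only the fact that a BullMQ processor receives a job with its payload on `.data`, while BookRecord, BookRepository, Lookup, and the normalization rules are assumptions made for illustration.

```typescript
// Hypothetical record and repository shapes.
interface BookRecord {
  id: string;
  title: string;
  coverUrl?: string;
  series?: string;
  rating?: number;
}

type Enrichment = { coverUrl?: string; series?: string; rating?: number };

interface BookRepository {
  findById(id: string): Promise<BookRecord | null>;
  update(id: string, patch: Enrichment): Promise<void>;
}

// Minimal job shape: the payload lives on `.data`, as it does on a BullMQ Job.
interface EnrichJob {
  data: { bookId: string; sourceUrl: string };
}

type Lookup = (url: string) => Promise<Enrichment | null>;

// Keep only fields the lookup actually produced, so an empty scrape result
// never overwrites metadata that came from the file itself.
function normalize(raw: Enrichment): Enrichment {
  const patch: Enrichment = {};
  const cover = raw.coverUrl?.trim();
  const series = raw.series?.trim();
  if (cover) patch.coverUrl = cover;
  if (series) patch.series = series;
  if (typeof raw.rating === "number" && Number.isFinite(raw.rating)) {
    patch.rating = raw.rating;
  }
  return patch;
}

function makeEnrichProcessor(repo: BookRepository, lookup: Lookup) {
  return async (job: EnrichJob): Promise<"enriched" | "skipped"> => {
    const book = await repo.findById(job.data.bookId);
    if (book === null) return "skipped"; // book was removed while the job waited
    const raw = await lookup(job.data.sourceUrl);
    if (raw === null) return "skipped";
    const patch = normalize(raw);
    if (Object.keys(patch).length === 0) return "skipped";
    await repo.update(book.id, patch);
    return "enriched";
  };
}
```

In a BullMQ setup, a function like this would be handed to a Worker as its processor, and the add-book handler would only call the queue's add method and return, which is what keeps the UI from blocking on Goodreads.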

A side effect I didn’t anticipate: queue-backed enrichment turned out to be the right shape for every enrichment task. The TTS pipeline that converts ebook text into audiobooks via Piper TTS runs the same way. The TMDB/IMDb/Rotten Tomatoes lookups for the media module (added later when I realized I wanted the same library experience for films) run the same way. Pretty much any operation that touches a third party with non-trivial latency or failure modes belongs on a queue, even (maybe especially) in a small system.

Auth, because I had to

This is the part I’d usually skip in a personal project, and the part that made it usable for other people.

There are three credentials that get you into Bewks: a session via NextAuth.js credentials login (humans on the web UI), a long-lived API key prefixed bwks_ and path-scoped (scripts and bot integrations), and a static service token (for things I trust unconditionally, like an internal sync worker). Sessions get full CSRF protection, all auth events go through a structured audit log, and there’s rate limiting in front of the auth endpoints. None of this is unusual, but I wrote it because the moment another person has an account, the project stops being a toy.
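As a rough illustration of how three credential types can resolve to one answer, here is a sketch of the decision logic. Only the bwks_ prefix and the idea of path scoping come from the description above. The check order, the type names, and the segment-based prefix matching are assumptions, and keys are compared as plaintext purely to keep the example short; a real store would hold hashes and compare in constant time.

```typescript
type Principal =
  | { kind: "session"; userId: string }
  | { kind: "apiKey"; keyId: string }
  | { kind: "service" };

interface ApiKeyRecord {
  id: string;
  key: string; // plaintext for brevity only; store a hash in real code
  pathPrefixes: string[];
}

interface AuthRequest {
  path: string;
  sessionUserId?: string; // assumed to be set upstream once the session is validated
  bearerToken?: string;
}

const API_KEY_PREFIX = "bwks_";

// Match on whole path segments, so "/api/books" covers "/api/books/1"
// but not "/api/bookshelf".
function pathAllowed(path: string, prefixes: string[]): boolean {
  return prefixes.some((prefix) => {
    const base = prefix.endsWith("/") ? prefix.slice(0, -1) : prefix;
    return path === base || path.startsWith(base + "/");
  });
}

// Assumed order: session first, then API key, then service token.
function authenticate(
  req: AuthRequest,
  apiKeys: ApiKeyRecord[],
  serviceToken: string,
): Principal | null {
  if (req.sessionUserId) {
    return { kind: "session", userId: req.sessionUserId };
  }
  const token = req.bearerToken;
  if (!token) return null;

  if (token.startsWith(API_KEY_PREFIX)) {
    const record = apiKeys.find((k) => k.key === token);
    if (record && pathAllowed(req.path, record.pathPrefixes)) {
      return { kind: "apiKey", keyId: record.id };
    }
    return null; // unknown key, or a known key used outside its scope
  }

  // NOTE: a plain === is not constant-time; use a timing-safe compare in real code.
  return token === serviceToken ? { kind: "service" } : null;
}
```

Returning a tagged Principal rather than a boolean lets the audit log record which kind of credential was used for each request.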

The test suite reflects the same shift. There’s Vitest for units, React Testing Library for components, and 19 Playwright E2E specs that each run against an isolated database in parallel. A subset of those specs runs as smoke tests against the live URL after every deploy. If I break login on a Tuesday evening, I want to know on Tuesday evening, not when someone tries to grab a book the next morning.

What I’d change

A few things, if I were starting over:

I’d reach for BullMQ from day one instead of letting “I’ll just do this synchronously for now” linger.
I wouldn’t build the Goodreads scraper twice (once as the fast path, once in Playwright). I’d write the Playwright path first, get it working, and only add the fast path as an opportunistic optimization once I knew what data the slow path produced.
I’d separate the media module (films, TV) from the books module behind a feature flag earlier than I did. It got bolted on, and the seams show.

Why bother

There is a meta-question hanging over a project like this, which is: why not just use Calibre, or one of the dozen open-source library managers that already exist?

The honest answer is that I learn the most by building the boring middle of a product end-to-end (the auth, the queues, the storage abstractions, the smoke tests against production), and a personal library is the rare case where I have a real user with real expectations (my wife) and full creative control over the surface. Every interesting decision in my day job sits somewhere in this codebase too, only smaller and unblocked.

That is, more than anything, what I want a side project to be.