April 6, 20266 min read

Atomic SQLite Backups Without the Footgun

SQLite in WAL mode is the silent workhorse of production systems. But copying a WAL-mode database file while it is running is how you get corruption. We built a tool that does it right.

OpenForge devcap memory-quality-gate Harness Design

Repository Release v0.1.0 OpenForge

Featured Chronicle Image

Greyforge Thesis

If your backup strategy is cp database.db backup.db, you do not have a backup strategy.

SQLite ships with an atomic backup API. Almost nobody uses it. This tool makes that API a single command.

>_The problem is not that backups are hard. It is that the easy way corrupts.

SQLite databases in WAL mode split their state across two files: the main database and a write-ahead log. When an application is running, the WAL may contain committed transactions that have not yet been flushed back into the main file. Copy the database file while the WAL is active and your backup is missing data. Copy both files but catch one mid-write, and the backup is corrupt. This is not a theoretical edge case. It is the default failure mode for anyone who reaches for cp, rsync, or a filesystem snapshot without understanding the WAL lifecycle.

SQLite itself has the answer. The online backup API creates a consistent point-in-time copy by holding a shared lock and copying pages atomically. It is safe under concurrent writes. It has been stable since SQLite 3.6.11 (2009). And almost nobody uses it directly because the interface is a C function, not a command-line tool.

The same gap exists for WAL checkpointing. SQLite will auto-checkpoint when the WAL reaches a threshold, but production systems often need explicit control: flush the log before backup, truncate it to reclaim disk, or verify that all pages were checkpointed. The PRAGMA wal_checkpoint interface supports four modes, but wrapping it into reliable automation means handling error codes, timing, and reporting that a one-liner does not provide.

>_We had this problem internally. Then we searched for what existed.

Greyforge runs several long-lived services backed by SQLite in WAL mode. The internal backup scripts worked, but they were one-off, untested, and coupled to infrastructure paths. When we looked at extracting them into a proper tool, we searched for existing solutions first.

The landscape was thin. The most visible project was a Docker-focused shell wrapper that shells out to the .backup dot-command. That approach does not use the online backup API, does not support WAL checkpoint control, and requires a Docker runtime. Other solutions were embedded in larger backup frameworks that bring their own scheduling, cloud provider integrations, and configuration surface. The gap was a standalone, zero-dependency tool that wraps the right SQLite APIs and runs as a single command or Python import.

That gap is what sqlite-checkpoint fills.

>_Four operations. One tool. No dependencies.

The tool exposes four operations, each available as both a CLI command and a Python function.

Checkpoint

sqlite-checkpoint cp myapp.db -m truncate

Flush the WAL into the main database file. Four modes: PASSIVE, FULL, RESTART, TRUNCATE.

Backup

sqlite-checkpoint backup myapp.db /backups/snap.db

Create an atomic copy via the online backup API. Safe under concurrent writes. SHA-256 verified.

Snapshot

sqlite-checkpoint snapshot myapp.db /backups/snap.db

Checkpoint then backup in a single command. The recommended workflow for production.

Info

sqlite-checkpoint info myapp.db

Inspect journal mode, WAL state, page counts, and freelist. JSON output for scripting.

Every operation returns structured results: page counts, checksums, timing. Pass --json and the output is machine-readable for scripting and automation pipelines. The Python API returns frozen dataclasses, so results are hashable and safe to log directly.

>_The design choices that matter.

The backup operation uses Python's sqlite3.Connection.backup(), which is the direct binding to the C-level online backup API. This is not the .backup dot-command. It is the page-level copy mechanism that holds a shared lock and handles concurrent writers transparently. If a write occurs during the backup, the API restarts from the modified page. The result is always a consistent snapshot.

The backup also computes a SHA-256 hash of the destination file and includes it in the result. This is not paranoia. It is the only way to verify backup integrity without opening the file and running PRAGMA integrity_check on every backup. The hash travels with the result object, so downstream systems can verify without re-reading the file.

The tool refuses to overwrite an existing destination. This is deliberate. Silent overwrites in backup tooling are how you lose the backup you were trying to preserve. If you need timestamped rotation or latest-symlink management, that belongs in the scheduling layer, not in the backup primitive itself. The same bounded-responsibility instinct shows up across OpenForge: devcap scans but does not install, memory-quality-gate scores but does not store.

>_Why open source this.

The answer is the same one behind every OpenForgerelease. The useful part is the tool itself. The proprietary value is in the systems that schedule it, the retention policies that wrap it, and the infrastructure it serves. Releasing the backup primitive costs nothing competitive and solves a universally Googled problem: "how to safely back up SQLite in production."

That is the line Greyforge draws on every open-source decision. Share the WHAT and WHY. Protect the HOW. This tool is a well-documented WHAT. The operational context around it stays internal, exactly as described on the About page and in the wider OpenForge catalog.

>_What comes next.

The immediate step is PyPI publishing so pip install sqlite-checkpoint works without routing through GitHub. After that, the useful expansions are retention policy helpers, a --watch mode for periodic snapshots, and integration examples for common schedulers. The tool itself stays narrow. Backup, checkpoint, inspect. The surrounding automation belongs in the layer above.

Navigate Greyforge

This chronicle is part of the OpenForge release series. Each release ships a bounded tool, a rationale, and a clean separation between public utility and private infrastructure.

About Greyforge

The company doctrine, product line, and operating posture behind these releases.

OpenForge

The public catalog where sqlite-checkpoint and other Greyforge utilities live.

Store

The private product surface where SQLite-backed services run in production.

Continue Reading

The same design instincts behind sqlite-checkpoint - bounded scope, zero dependencies, deterministic behavior - run through the rest of the OpenForge catalog.

devcap

A dev environment scanner that audits 100+ tools in under 200ms.

memory-quality-gate

A zero-LLM heuristic filter that rejects garbage before it hits long-term storage.

Harness Engineering

Why reliability comes from the control plane, not the model.