Snapshot system

How Backstop captures, stores, verifies, and restores table snapshots using Parquet files in S3-compatible object storage.

The snapshot system is the foundation of Backstop's recovery model. Snapshots are point-in-time captures of individual tables, stored as Parquet files in S3-compatible storage that you control. They are what make a CRITICAL operation reversible.

How snapshots are taken

The sync sidecar takes snapshots in two modes:

Continuous background snapshots: the sidecar polls on a configurable interval. By default it captures every discovered table on startup, then captures only new, changed, or retry-needed tables on later polls. To capture every table on every poll, set --snapshot-every-poll=true. For each captured table, it:

  1. Reads all rows from PostgreSQL
  2. Converts row values to Apache Parquet
  3. Uploads to S3/MinIO under s3://bucket/snapshots/{table}/{timestamp}.parquet
  4. Writes a manifest file with row count, schema DDL, captured indexes/constraints, and checksums
  5. Reports liveness via the heartbeat record in SQLite
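The per-poll table selection described above can be sketched as follows. This is a minimal illustration, not Backstop's sidecar code: TableState, select_tables, and the "change token" used to detect a changed table are hypothetical stand-ins for whatever change detection the sidecar actually performs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TableState:
    """Hypothetical per-table bookkeeping kept between polls."""
    name: str
    change_token: str                   # assumed: any value that changes when the table changes
    last_snapshot_token: Optional[str]  # token at the last successful snapshot; None if never captured
    needs_retry: bool = False           # the previous capture attempt failed


def select_tables(tables: List[TableState], first_poll: bool,
                  snapshot_every_poll: bool = False) -> List[str]:
    """Return the names of the tables to capture on this poll."""
    if first_poll or snapshot_every_poll:
        # On startup, or with --snapshot-every-poll=true, capture every discovered table.
        return [t.name for t in tables]
    selected = []
    for t in tables:
        is_new = t.last_snapshot_token is None
        changed = t.change_token != t.last_snapshot_token
        if is_new or changed or t.needs_retry:
            selected.append(t.name)
    return selected


tables = [
    TableState("users", "v2", "v1"),                    # changed since last snapshot
    TableState("orders", "v7", "v7"),                   # unchanged, skipped
    TableState("events", "v1", None),                   # newly discovered
    TableState("audit", "v3", "v3", needs_retry=True),  # previous capture failed
]
print(select_tables(tables, first_poll=False))  # ['users', 'events', 'audit']
```

Skipping unchanged tables keeps routine polls cheap on large databases; the every-poll flag trades that saving for the freshest possible snapshot of every table.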

On-demand snapshots: SDK and local guarded flows can capture before-images immediately before destructive table operations. The gateway's recovery gate verifies the latest sidecar snapshot before an approved CRITICAL operation executes.

Storage format

Snapshots are stored in Apache Parquet:

  • Columnar storage — efficient reads for large tables
  • Portable row data — Row values are stored in a form the restore path can load back through the captured table schema
  • Compression — Snappy compression by default
  • Manifest — every snapshot has a .manifest.json alongside it:
{
  "snapshot_id": "snap_a3f9e2c1",
  "table": "users",
  "schema": "public",
  "row_count": 1842933,
  "columns": ["id", "email", "name", "created_at"],
  "parquet_path": "snapshots/users/2026-05-06T10:30:00Z.parquet",
  "checksum": "sha256:9f86d081884c7d659a2feaa0c55ad015",
  "created_at": "2026-05-06T10:30:00Z",
  "source": "sidecar",
  "db_size_bytes": 218000000
}
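A sketch of how a manifest like the one above could be produced and later checked. It assumes the checksum is SHA-256 over the raw Parquet bytes, written as sha256:<hex>; the exact input Backstop hashes is not stated here, and build_manifest, verify_manifest, and snapshot_key are hypothetical helpers. The sketch also omits fields such as db_size_bytes and the captured DDL and index entries.

```python
import hashlib
import json
from datetime import datetime, timezone


def _utc_stamp(ts: datetime) -> str:
    # Expects a timezone-aware datetime; renders it as 2026-05-06T10:30:00Z.
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def snapshot_key(table: str, created_at: datetime) -> str:
    """Object key following the snapshots/{table}/{timestamp}.parquet layout."""
    return f"snapshots/{table}/{_utc_stamp(created_at)}.parquet"


def build_manifest(*, snapshot_id, schema, table, columns, row_count,
                   parquet_bytes, created_at, source="sidecar"):
    return {
        "snapshot_id": snapshot_id,
        "table": table,
        "schema": schema,
        "row_count": row_count,
        "columns": columns,
        "parquet_path": snapshot_key(table, created_at),
        "checksum": "sha256:" + hashlib.sha256(parquet_bytes).hexdigest(),
        "created_at": _utc_stamp(created_at),
        "source": source,
    }


def verify_manifest(manifest: dict, parquet_bytes: bytes) -> bool:
    """True only if the downloaded Parquet bytes match the manifest checksum."""
    algo, _, expected = manifest["checksum"].partition(":")
    if algo != "sha256":
        return False  # unknown algorithm: treat as unverifiable
    return hashlib.sha256(parquet_bytes).hexdigest() == expected


payload = b"not a real parquet file"  # placeholder bytes for the demo
m = build_manifest(snapshot_id="snap_a3f9e2c1", schema="public", table="users",
                   columns=["id", "email", "name", "created_at"], row_count=1842933,
                   parquet_bytes=payload,
                   created_at=datetime(2026, 5, 6, 10, 30, tzinfo=timezone.utc))
print(m["parquet_path"])                   # snapshots/users/2026-05-06T10:30:00Z.parquet
print(verify_manifest(m, payload))         # True
print(verify_manifest(m, payload + b"x"))  # False
print(json.dumps(m, indent=2))
```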

Snapshot storage URL format

Backstop uses a URL convention to specify S3-compatible storage:

s3://bucket-name@http://endpoint-url           # general form: bucket plus custom endpoint
s3://bucket-name                               # AWS S3 (uses AWS SDK credentials)
s3://bucket-name@http://localhost:9000         # MinIO local
s3://bucket-name@https://storage.example.com   # any S3-compatible API
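A minimal sketch of parsing this convention into a bucket and an optional endpoint. It is not Backstop's parser; StorageTarget, parse_storage_url, and the validation rules are illustrative assumptions. Splitting on the first "@" is safe because S3 bucket names cannot contain "@".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageTarget:
    bucket: str
    endpoint: Optional[str]  # None means AWS S3 with default SDK endpoint resolution


def parse_storage_url(url: str) -> StorageTarget:
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError(f"storage URL must start with {prefix!r}: {url!r}")
    # Bucket names cannot contain '@', so the first '@' separates bucket from endpoint.
    bucket, sep, endpoint = url[len(prefix):].partition("@")
    if not bucket:
        raise ValueError(f"missing bucket name: {url!r}")
    if sep and not endpoint.startswith(("http://", "https://")):
        raise ValueError(f"endpoint must be an http(s) URL: {url!r}")
    return StorageTarget(bucket=bucket, endpoint=endpoint if sep else None)


print(parse_storage_url("s3://backstop-snapshots@http://localhost:9000"))
# StorageTarget(bucket='backstop-snapshots', endpoint='http://localhost:9000')
print(parse_storage_url("s3://bucket-name"))
# StorageTarget(bucket='bucket-name', endpoint=None)
```

With boto3, for example, the parsed endpoint would map to the endpoint_url argument of boto3.client("s3", ...), and a None endpoint would simply leave that argument unset.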

Recovery readiness gate

Before any CRITICAL operation can be approved, the gateway checks:

Check                               Default threshold
Snapshot exists for target table    Required
Snapshot age                        Max 300 seconds (5 min)
Sidecar heartbeat age               Max 120 seconds (2 min)
Manifest checksum valid             Required

All checks must pass. With the default thresholds, if the sidecar heartbeat is more than 2 minutes old (for example, because the sidecar is down), CRITICAL operations are blocked even with operator approval.
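The four checks can be expressed as one function that returns every failing check, so an operator sees all blockers at once. This is a sketch, not the gateway's code: GateInputs and readiness_failures are hypothetical, and the sketch assumes the limits are inclusive (an age of exactly 300 seconds still passes).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class GateInputs:
    snapshot_created_at: Optional[datetime]  # None if no snapshot exists for the target table
    checksum_valid: bool                     # result of verifying the manifest checksum
    heartbeat_at: Optional[datetime]         # last sidecar heartbeat; None if never seen


def readiness_failures(inp: GateInputs, now: datetime,
                       max_snapshot_age: timedelta = timedelta(seconds=300),
                       max_heartbeat_age: timedelta = timedelta(seconds=120)) -> List[str]:
    """Return the failed checks; an empty list means the gate passes."""
    failures = []
    if inp.snapshot_created_at is None:
        failures.append("no snapshot for target table")
    else:
        if now - inp.snapshot_created_at > max_snapshot_age:
            failures.append("snapshot too old")
        if not inp.checksum_valid:
            failures.append("manifest checksum invalid")
    if inp.heartbeat_at is None or now - inp.heartbeat_at > max_heartbeat_age:
        failures.append("sidecar heartbeat stale")
    return failures


now = datetime(2026, 5, 6, 10, 33, tzinfo=timezone.utc)
ok = GateInputs(now - timedelta(minutes=3), True, now - timedelta(seconds=30))
print(readiness_failures(ok, now))     # []
stale = GateInputs(now - timedelta(minutes=3), True, now - timedelta(minutes=5))
print(readiness_failures(stale, now))  # ['sidecar heartbeat stale']
```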

Restoring from a snapshot

Guided table recovery

For operator-driven recovery, start with the guided command:

backstop recover \
  --db postgresql://postgres:postgres@localhost:5432/mydb \
  --storage s3://backstop-snapshots@http://localhost:9000 \
  --table users

The wizard lists only recovery points whose checksums verify and, by default, refuses to restore over the original table. Instead, it restores into users_recovered, validates the restored table, and prints copyback SQL only after validation passes.
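The two safety rules the wizard applies, listing only verified recovery points and never defaulting to the original table, can be sketched like this. RecoveryPoint, valid_recovery_points, and recovery_target are hypothetical names, and generalizing users_recovered to a {table}_recovered pattern is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class RecoveryPoint:
    snapshot_id: str
    created_at: datetime  # UTC
    checksum_ok: bool     # manifest checksum verified against the stored Parquet object


def valid_recovery_points(points: Iterable[RecoveryPoint]) -> List[RecoveryPoint]:
    """Only checksum-verified points, newest first."""
    return sorted((p for p in points if p.checksum_ok),
                  key=lambda p: p.created_at, reverse=True)


def recovery_target(table: str, target: Optional[str] = None,
                    allow_overwrite: bool = False) -> str:
    """Pick the restore target, refusing the original table unless explicitly allowed."""
    target = target or f"{table}_recovered"  # assumed naming pattern
    if target == table and not allow_overwrite:
        raise ValueError(f"refusing to restore over original table {table!r}")
    return target


points = [
    RecoveryPoint("snap_old", datetime(2026, 5, 6, 10, 0), True),
    RecoveryPoint("snap_bad", datetime(2026, 5, 6, 10, 15), False),  # checksum mismatch, hidden
    RecoveryPoint("snap_new", datetime(2026, 5, 6, 10, 30), True),
]
print([p.snapshot_id for p in valid_recovery_points(points)])  # ['snap_new', 'snap_old']
print(recovery_target("users"))                                # users_recovered
```

Restoring beside the original table means a bad recovery point costs only a dropped users_recovered table, not a second outage.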

Fast table restore

The lower-level backstop restore command reads a Parquet snapshot and writes it back to the target table. It is intended for automation and scripted incident procedures. Runtime depends on table size, storage throughput, and whether you restore into a recovered table first for validation.

# Dry run — shows what would be restored
backstop restore \
  --db postgresql://postgres:postgres@localhost:5432/mydb \
  --storage s3://backstop-snapshots@http://localhost:9000 \
  --snapshot-id snap_a3f9e2c1 \
  --table users \
  --dry-run

# Execute restore
backstop restore \
  --db postgresql://postgres:postgres@localhost:5432/mydb \
  --storage s3://backstop-snapshots@http://localhost:9000 \
  --snapshot-id snap_a3f9e2c1 \
  --table users

The restore:

  1. Verifies the manifest checksum
  2. Reads the Parquet file from S3
  3. Creates the target schema/table from the captured DDL
  4. Inserts all rows in batches
  5. Reapplies captured indexes and constraints on a best-effort basis
  6. Verifies row count, target existence, indexes, constraints, and sample equality where possible
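Steps 4 and 6 follow a common pattern: insert in fixed-size batches, then compare the table's row count with the manifest's row_count. The sketch below shows that pattern with Python's built-in sqlite3 standing in for PostgreSQL and an in-memory list standing in for rows read from Parquet. restore_rows and the batch size of 1000 are illustrative assumptions, not Backstop's restore code.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def restore_rows(conn, table, columns, rows, expected_count, batch_size=1000):
    # Table and column names are trusted identifiers in this sketch (not user input).
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:  # one transaction: commits if every batch succeeds, rolls back on error
        for batch in batched(rows, batch_size):
            conn.executemany(sql, batch)
    # Verification step: compare against the manifest row_count.
    (actual,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if actual != expected_count:
        raise RuntimeError(f"row count mismatch: expected {expected_count}, got {actual}")
    return actual


conn = sqlite3.connect(":memory:")
# Stands in for "create the target table from the captured DDL".
conn.execute("CREATE TABLE users_recovered (id INTEGER PRIMARY KEY, email TEXT)")
rows = [(i, f"user{i}@example.com") for i in range(2500)]
restored = restore_rows(conn, "users_recovered", ["id", "email"], rows, expected_count=2500)
print(restored)  # 2500
```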

Listing available snapshots

backstop snapshots list \
  --db postgresql://postgres:postgres@localhost:5432/mydb \
  --storage s3://backstop-snapshots@http://localhost:9000 \
  --table users

Or via the API:

curl 'http://localhost:8080/metadata/snapshots?table=users' | jq '.snapshots'
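The same request from a script, using only the Python standard library. This is a sketch: it assumes, as the jq filter implies, that the response is a JSON object with a top-level "snapshots" key, and it makes no assumption about the fields of each entry. snapshots_url and list_snapshots are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def snapshots_url(base: str, table: str) -> str:
    # urlencode escapes the table name, so unusual names stay valid in the query string.
    return f"{base.rstrip('/')}/metadata/snapshots?{urlencode({'table': table})}"


def parse_snapshots(body: bytes) -> list:
    # Equivalent to jq '.snapshots'.
    return json.loads(body)["snapshots"]


def list_snapshots(base: str, table: str, timeout: float = 10.0) -> list:
    with urlopen(snapshots_url(base, table), timeout=timeout) as resp:
        return parse_snapshots(resp.read())


print(snapshots_url("http://localhost:8080", "users"))
# http://localhost:8080/metadata/snapshots?table=users
```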

Limitations

  • Snapshots are per-table. They can capture table DDL, indexes, and constraints, but they do not guarantee cross-table transactional consistency. For multi-table domains, configure recovery groups and validate before copyback.
  • Snapshots are not a full PostgreSQL backup. Functions, triggers, grants, extensions, custom types, schemas outside the captured target, and cluster-level state require logical backups or PITR.
  • Snapshot data reflects the state at snapshot time. Data written after the snapshot was taken is not recoverable from the snapshot alone — you need PITR for that window.
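To size the window that PITR must cover, pick the latest snapshot taken at or before the incident; the time between that snapshot and the incident is data the snapshot cannot restore. best_snapshot_before is a hypothetical helper, not a Backstop command.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence, Tuple


def best_snapshot_before(snapshot_times: Sequence[datetime],
                         incident_at: datetime) -> Optional[Tuple[datetime, timedelta]]:
    """Latest snapshot at or before the incident, plus the gap it cannot cover."""
    candidates = [t for t in snapshot_times if t <= incident_at]
    if not candidates:
        return None  # no usable snapshot: recovery needs PITR or a logical backup
    latest = max(candidates)
    return latest, incident_at - latest


utc = timezone.utc
times = [datetime(2026, 5, 6, 10, 25, tzinfo=utc),
         datetime(2026, 5, 6, 10, 30, tzinfo=utc),
         datetime(2026, 5, 6, 10, 35, tzinfo=utc)]  # taken after the incident, excluded
snap, gap = best_snapshot_before(times, datetime(2026, 5, 6, 10, 33, tzinfo=utc))
print(snap.isoformat(), gap)  # 2026-05-06T10:30:00+00:00 0:03:00
```

Snapshots taken after the incident are excluded because they may already contain the damage.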

For schema recovery and post-snapshot data recovery, see the Operations runbooks.