# Runbooks

Step-by-step procedures for common operational scenarios: pauses, restores, approval backlogs, and incident response. Keep this page bookmarked and accessible before an incident occurs.
## Emergency pause
Use when you observe suspicious agent behavior, runaway queries, or any situation where you need to stop all writes immediately.
### Pause the gateway
```bash
curl -X POST \
  -H "Authorization: Bearer bsp_admin_token" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Suspicious agent behavior — investigating"}' \
  http://localhost:8080/admin/pause
```

After pausing, all WRITE, HIGH, and CRITICAL queries are rejected. SELECT (SAFE) queries continue.
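That rejection rule is simple enough to encode. A minimal sketch of the observable behavior while paused (not Backstop's actual implementation):

```python
# Sketch of the documented pause behavior: while paused, SAFE (SELECT)
# traffic still executes and every riskier class is rejected.
def rejected_while_paused(risk_level: str) -> bool:
    return risk_level in {"WRITE", "HIGH", "CRITICAL"}
```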
### Verify the pause took effect
```bash
curl -H "Authorization: Bearer bsp_admin_token" \
  http://localhost:8080/admin/status
```

Look for `"paused": true` in the response.

### Investigate
Check the audit log for the triggering event:
```bash
curl -H "Authorization: Bearer bsp_ops_token" \
  "http://localhost:8080/metadata/audit?limit=50&risk=CRITICAL"
```

Check alerts:
```bash
curl -H "Authorization: Bearer bsp_ops_token" \
  http://localhost:8080/metadata/alerts
```

### Resume when safe
```bash
curl -X POST \
  -H "Authorization: Bearer bsp_admin_token" \
  http://localhost:8080/admin/resume
```
## Restore a table from snapshot
Use after a destructive query has executed, whether accidental or approved.
### Run guided recovery first
```bash
backstop recover \
  --db postgresql://postgres@localhost:5432/mydb \
  --storage s3://prod-snapshots \
  --table users
```

This restores to `users_recovered`, validates the result, and prints copyback SQL only after validation passes.

### Find the right snapshot
Use this lower-level path when scripting or when you need a specific snapshot ID.
```bash
backstop snapshots list \
  --db postgresql://postgres@localhost:5432/mydb \
  --storage s3://prod-snapshots \
  --table users
```

Note the `snapshot_id` of the snapshot taken before the destructive operation.

### Dry run first
```bash
backstop restore \
  --db postgresql://postgres@localhost:5432/mydb \
  --storage s3://prod-snapshots \
  --snapshot-id snap_a3f9e2c1 \
  --table users \
  --dry-run
```

The dry run verifies the manifest checksum and reports what would be written. Always run this first.
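The checksum step is the part worth understanding: the restore should refuse to proceed if the stored bytes no longer hash to what the manifest recorded. A hypothetical sketch of that check (the SHA-256 digest and per-chunk layout are assumptions, not Backstop's actual manifest format):

```python
import hashlib

def chunk_matches_manifest(chunk: bytes, recorded_sha256: str) -> bool:
    """Recompute a chunk's digest and compare it to the manifest's record."""
    return hashlib.sha256(chunk).hexdigest() == recorded_sha256
```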
### Execute the restore to a recovered table
```bash
backstop restore \
  --db postgresql://postgres@localhost:5432/mydb \
  --storage s3://prod-snapshots \
  --snapshot-id snap_a3f9e2c1 \
  --table users \
  --target-table users_recovered
```

Do not restore over the original table during first response. Restore to a recovered table, validate, then copy back or rename after review.
### Validate before copyback
```bash
backstop restore-validate \
  --db postgresql://postgres@localhost:5432/mydb \
  --storage s3://prod-snapshots \
  --snapshot-id snap_a3f9e2c1 \
  --table users \
  --target-table users_recovered
```

Then generate reviewable copyback SQL:
```bash
backstop restore-copyback-plan \
  --source-table users_recovered \
  --target-table users
```
## Point-in-time recovery (PITR)
Use when the destructive operation happened between snapshots or when you need sub-second precision.
### Identify the target time
Find the timestamp just before the incident from the audit log:
```bash
curl -H "Authorization: Bearer bsp_ops_token" \
  "http://localhost:8080/metadata/audit?limit=100" | jq '.[] | select(.risk_level == "CRITICAL")'
```

### Prepare the restore
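The target time must land just before the incident. Deriving it from an audit entry's timestamp might look like this (the ISO-8601 `timestamp` field and the safety margin are assumptions about the audit schema, not documented behavior):

```python
from datetime import datetime, timedelta

def target_time_before(event_ts: str, margin_seconds: int = 60) -> str:
    """Back off a safety margin from the incident's audit timestamp."""
    t = datetime.fromisoformat(event_ts) - timedelta(seconds=margin_seconds)
    return t.strftime("%Y-%m-%d %H:%M:%S%z")
```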
```bash
backstop pitr prepare-restore \
  --storage s3://prod-snapshots \
  --cluster-id prod \
  --backup-id backup_2026-05-06 \
  --target-dir /var/lib/postgresql/pitr-restore \
  --target-time "2026-05-06 12:29:00+00"
```

This prepares a PostgreSQL recovery directory with `recovery.signal` and a `restore_command` that fetches archived WAL through Backstop.

### Start a recovery instance
Point a PostgreSQL instance at the target directory and start it. It will replay WAL up to the target time and then pause in recovery mode.
Verify the recovery completed to the right point:
```bash
psql postgresql://postgres@localhost:5433/mydb \
  -c "SELECT pg_last_xact_replay_timestamp()"
```

### Promote or export
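Before promoting, it can help to confirm programmatically that replay stopped at or before the target time. A small comparison helper, assuming both values are available as ISO-8601 strings:

```python
from datetime import datetime

def replay_within_target(replay_ts: str, target_ts: str) -> bool:
    """True when the last replayed transaction is not past the target time."""
    return datetime.fromisoformat(replay_ts) <= datetime.fromisoformat(target_ts)
```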
Either promote the recovery instance to take over, or export specific tables back to production:
```bash
pg_dump -t users postgresql://postgres@localhost:5433/mydb | \
  psql postgresql://postgres@localhost:5432/mydb
```
## Clear an approval backlog
When a large number of queries are awaiting approval (for example, after a policy change), process them in bulk.
```bash
# List all pending
curl -H "Authorization: Bearer bsp_ops_token" \
  http://localhost:8080/pending | jq '.pending[] | {id, sql, risk_level, agent_id}'

# Approve by ID
curl -X POST \
  -H "Authorization: Bearer bsp_ops_token" \
  http://localhost:8080/approve/appr_4f9e2c1a

# Deny by ID
curl -X POST \
  -H "Authorization: Bearer bsp_ops_token" \
  http://localhost:8080/deny/appr_4f9e2c1a
```

For bulk operations, use a script:
```bash
# Approve all pending from a specific agent
curl -H "Authorization: Bearer bsp_ops_token" http://localhost:8080/pending \
  | jq -r '.pending[] | select(.agent_id == "cursor-local") | .id' \
  | while read id; do
      curl -s -X POST \
        -H "Authorization: Bearer bsp_ops_token" \
        "http://localhost:8080/approve/$id"
      echo "Approved $id"
    done
```

## Sidecar not heartbeating
Use this runbook when `/metadata/health` shows `sidecar_status: "stale"` or `sidecar_heartbeat_age_seconds > 120`.
- Check sidecar logs: `docker logs backstop-sidecar` or `journalctl -u backstop-sidecar`
- Verify the sidecar can reach the gateway: run `curl http://<gateway>/health` from the sidecar host
- Check S3 connectivity from the sidecar: `backstop doctor storage-permissions --storage s3://...`
- Restart the sidecar: `docker restart backstop-sidecar`
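For automated monitoring, the same triage thresholds can be checked in code. A minimal probe over the parsed `/metadata/health` response, using only the field names shown in the symptoms above:

```python
def sidecar_needs_attention(health: dict) -> bool:
    """Flag the sidecar when its status is stale or its heartbeat is too old."""
    return (health.get("sidecar_status") == "stale"
            or health.get("sidecar_heartbeat_age_seconds", 0) > 120)
```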