Troubleshooting

Diagnosis and fixes for the most common Backstop issues.


Gateway won't start

Symptom: backstop-gateway exits immediately or fails to bind.

Check 1 — Port in use

# macOS / Linux
lsof -i :8080

# Windows
netstat -ano | findstr :8080

Change the port with BACKSTOP_PORT=8081.
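If you need a free port in a hurry, one generic trick (not a Backstop feature) is to let the OS pick one:

```shell
# Ask the OS for an unused TCP port by binding to port 0, then release it.
# Note there is a small race window between releasing the port and the
# gateway binding it.
port=$(python3 -c 'import socket; s = socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo "free port: $port"
# BACKSTOP_PORT=$port backstop-gateway
```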

Check 2 — Database unreachable

psql $BACKSTOP_DB_URL -c "SELECT 1"

The gateway exits if it can't connect to the database on startup.
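If the database and gateway start together (for example under compose), a small retry wrapper avoids losing that race. A sketch, not part of Backstop:

```shell
# Retry any probe command a few times before giving up. Usage:
#   wait_for_db psql "$BACKSTOP_DB_URL" -c "SELECT 1" && backstop-gateway
wait_for_db() {
  attempts=0
  until "$@"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 5 ]; then
      echo "still unreachable after $attempts attempts" >&2
      return 1
    fi
    sleep 1
  done
}
```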

Check 3 — Token file missing or malformed

cat $BACKSTOP_TOKENS | python3 -m json.tool

The token file must be valid JSON and contain an array of token objects.
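As a concrete illustration, here is a minimal file and the two structural checks. The "token" field name follows the jq filter used elsewhere in this guide; the "scopes" array shape is an assumption for illustration:

```shell
# Minimal illustrative token file — adjust field names to your deployment.
cat > /tmp/backstop-tokens.json <<'EOF'
[
  {"token": "ops-token", "scopes": ["query:read", "approval:write"]},
  {"token": "readonly-token", "scopes": ["query:read"]}
]
EOF

# Check 1: valid JSON at all
python3 -m json.tool /tmp/backstop-tokens.json > /dev/null && echo "valid JSON"

# Check 2: top level is an array, and every entry has a token string
python3 - <<'EOF'
import json
data = json.load(open("/tmp/backstop-tokens.json"))
assert isinstance(data, list), "top level must be an array"
assert all(isinstance(t.get("token"), str) for t in data), "every entry needs a token"
print("shape OK")
EOF
```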


401 Unauthorized on every request

The bearer token is missing, misspelled, or not in the token file.

# Verify the token file contains the token you're using
cat $BACKSTOP_TOKENS | jq '.[] | .token'

# Test directly
curl -H "Authorization: Bearer your_token_here" http://localhost:8080/health

Note: /health does not require authentication — if this 401s, check the header syntax. Use Bearer <token>, not Token <token> or Basic <token>.


403 Forbidden — insufficient_scope

Your token exists but lacks the required scope for the endpoint.

{
  "error": "insufficient_scope",
  "message": "Token does not have approval:write scope",
  "required_scope": "approval:write"
}

Add the required scope to the token in your tokens.json file and restart the gateway, or use a different token that already has the scope.
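A scripted version of that edit, assuming each entry carries a scopes array (the field name is an assumption — match it to your actual file):

```shell
# Stand-in token file; point `path` below at your real $BACKSTOP_TOKENS instead.
cat > /tmp/tokens.json <<'EOF'
[{"token": "ops-token", "scopes": ["query:read"]}]
EOF

python3 - <<'EOF'
import json

path = "/tmp/tokens.json"
tokens = json.load(open(path))
for t in tokens:
    # Grant approval:write to the ops token if it doesn't already have it
    if t["token"] == "ops-token" and "approval:write" not in t["scopes"]:
        t["scopes"].append("approval:write")
with open(path, "w") as f:
    json.dump(tokens, f, indent=2)
print("scopes now:", tokens[0]["scopes"])
EOF
```

The gateway still needs a restart afterwards to pick up the change.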


execute_query returns blocked unexpectedly

Check safety_metadata.policy_reason in the response — it explains exactly why the query was blocked.

Common causes:

  • Table is in protected_tables — intended: this table requires extra protection
  • Query matches a blocked operation type — check the BACKSTOP_POLICY_* environment variables
  • parse_error_present: true — Backstop couldn't parse the SQL; simplify the query or check for syntax errors
  • Bulk operation exceeds threshold — affected_percent is too high; add a WHERE clause to narrow scope

execute_query always returns approval_required

If this is expected, your policy is set to approve for that risk level, and an operator needs to approve each request via POST /approve/{id}.

If this is unexpected, check:

# What policies are active?
curl -H "Authorization: Bearer admin-token" \
  http://localhost:8080/admin/status | jq '.policy'

Adjust policy with environment variables:

BACKSTOP_POLICY_HIGH=execute   # Don't require approval for HIGH
BACKSTOP_POLICY_CRITICAL=approve  # Require approval for CRITICAL only

Sidecar shows stale in health check

{
  "sidecar_status": "stale",
  "sidecar_heartbeat_age_seconds": 450
}

The sidecar hasn't checked in for more than 120 seconds.

  1. Check sidecar is running: docker ps | grep sidecar or systemctl status backstop-sidecar
  2. Check sidecar logs for errors: docker logs backstop-sidecar --tail 50
  3. Verify sidecar can reach gateway: curl http://<gateway>:8080/health from sidecar host
  4. Check S3 write permissions: backstop doctor storage-permissions --storage s3://your-bucket
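The staleness math can be checked by hand. Shown here against a canned payload; in practice, pipe the real curl output of the health endpoint in:

```shell
# Canned response standing in for: curl -s http://localhost:8080/health
health='{"sidecar_status": "stale", "sidecar_heartbeat_age_seconds": 450}'

# Extract the heartbeat age (python3 used instead of jq so this runs anywhere)
age=$(printf '%s' "$health" | python3 -c 'import json, sys; print(json.load(sys.stdin)["sidecar_heartbeat_age_seconds"])')

# 120 seconds is the staleness threshold described above
if [ "$age" -gt 120 ]; then
  echo "sidecar stale: last heartbeat ${age}s ago"
fi
```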

last_snapshot_age_seconds is growing

The sidecar is running but not producing new snapshots.

  1. Check sidecar logs for snapshot errors
  2. Verify S3 bucket has write space (MinIO: check disk, S3: check bucket policy)
  3. Check if a large snapshot is in progress — it may take several minutes for wide tables
curl -H "Authorization: Bearer ops-token" \
  "http://localhost:8080/metadata/snapshots?table=users" | jq '.[0]'

Restore fails with checksum mismatch

Error: manifest checksum mismatch — snapshot may be corrupted

The Parquet file in S3 doesn't match the checksum stored in the manifest. Possible causes:

  • File was manually modified in S3
  • Partial upload (network issue during snapshot)
  • S3 bucket has object mutation enabled

Try the previous snapshot:

backstop snapshots list --table users --storage s3://...
# Use the next-oldest snapshot_id
backstop recover --table users --snapshot-id snap_previous --storage s3://... --db postgresql://...
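To reproduce the comparison by hand, hash the object and compare it with the manifest's recorded value. A self-contained sketch — the SHA-256 choice and how the manifest stores the digest are assumptions; check your manifest's actual format:

```shell
# Stand-in for the Parquet object downloaded from S3
printf 'parquet-bytes' > /tmp/snapshot.parquet

# Digest of the object as it exists now
actual=$(python3 -c 'import hashlib; print(hashlib.sha256(open("/tmp/snapshot.parquet", "rb").read()).hexdigest())')

# The digest recorded at snapshot time (illustrative — read it from your
# real manifest, e.g. with jq, rather than computing it like this)
expected=$(python3 -c 'import hashlib; print(hashlib.sha256(b"parquet-bytes").hexdigest())')

if [ "$actual" = "$expected" ]; then
  echo "checksum OK"
else
  echo "checksum mismatch: object no longer matches the manifest" >&2
fi
```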

Corrupt or quarantined snapshots are not eligible for recovery readiness or guided restore. Treat this as a storage integrity incident: preserve the object, check sidecar logs, verify bucket mutation controls, and use a different valid snapshot or PITR.


backstop recover says no valid snapshots exist

The recovery wizard only lists snapshots that are valid, checksummed, and not quarantined.

  1. Check sidecar health: curl http://localhost:9091/health
  2. Check gateway metadata: curl http://localhost:8080/metadata/health
  3. Check storage permissions: backstop doctor storage-permissions --storage s3://... --strict
  4. List snapshots directly: backstop snapshots list --table users --storage s3://... --db postgresql://...
  5. If this is a database-level incident, use backstop recover --type pitr or backstop recover --type logical

Do not force a restore from an invalid or corrupt manifest.


High query latency through the gateway

Backstop adds classification overhead on every query. Expected overhead is 5–15ms for simple queries. If you're seeing much higher latency:

  1. Check gateway host resources — CPU/memory on the gateway process
  2. Check database connectivity — query pg_stat_activity for connection wait states
  3. Check snapshot sidecar impact — Snapshots briefly lock tables; don't schedule them during peak load windows
  4. Check policy evaluation — Complex policy rules add evaluation time
# Check Prometheus metrics for gateway processing time
curl http://localhost:8080/metrics | grep backstop_query_duration
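If the histogram exposes the standard _sum/_count pair (a common Prometheus convention, not confirmed for Backstop), the mean overhead falls out directly. Shown here against canned metric lines:

```shell
# Canned lines standing in for:
#   curl -s http://localhost:8080/metrics | grep backstop_query_duration
metrics='backstop_query_duration_seconds_sum 12.5
backstop_query_duration_seconds_count 1000'

# Mean gateway processing time per query, in milliseconds: sum / count * 1000
printf '%s\n' "$metrics" | awk '
  /_sum/   { sum = $2 }
  /_count/ { count = $2 }
  END      { printf "avg %.1f ms per query\n", sum / count * 1000 }'
```

Compare the result against the expected 5–15 ms range above before digging into the host-level checks.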

Dev token active in production

curl http://your-production-gateway/health \
  -H "Authorization: Bearer dev-token"

If this returns 200, BACKSTOP_DEV_MODE=true is set in your production environment. Remove it immediately and restart the gateway.