# Troubleshooting

Diagnosis and fixes for the most common Backstop issues.
## Gateway won't start

Symptom: `backstop-gateway` exits immediately or fails to bind.
### Check 1 — Port in use

```bash
# macOS / Linux
lsof -i :8080
# Windows
netstat -ano | findstr :8080
```

Change the port with `BACKSTOP_PORT=8081`.
### Check 2 — Database unreachable

```bash
psql $BACKSTOP_DB_URL -c "SELECT 1"
```

The gateway exits if it can't connect to the database on startup.
### Check 3 — Token file missing or malformed

```bash
cat $BACKSTOP_TOKENS | python3 -m json.tool
```

The token file must be valid JSON and contain an array of token objects.
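If `json.tool` passes but the gateway still rejects the file, the shape may be wrong. A minimal structural check, assuming each entry is an object with at least a `token` key (the real schema may carry more fields):

```python
import json

def validate_tokens(path: str) -> list:
    """Return a list of problems; an empty list means the file looks structurally valid."""
    with open(path) as f:
        data = json.load(f)  # raises on invalid JSON
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array, got {type(data).__name__}")
    problems = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict) or "token" not in entry:
            problems.append(f"entry {i} is not an object with a 'token' key")
    return problems
```

Run it against `$BACKSTOP_TOKENS` before restarting the gateway.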
## 401 Unauthorized on every request

The bearer token is missing, misspelled, or not in the token file.

```bash
# Verify the token file contains the token you're using
cat $BACKSTOP_TOKENS | jq '.[] | .token'

# Test directly
curl -H "Authorization: Bearer your_token_here" http://localhost:8080/health
```

Note: `/health` does not require authentication — if this 401s, check the header syntax. Use `Bearer <token>`, not `Token <token>` or `Basic <token>`.
## 403 Forbidden — insufficient_scope

Your token exists but lacks the required scope for the endpoint.

```json
{
  "error": "insufficient_scope",
  "message": "Token does not have approval:write scope",
  "required_scope": "approval:write"
}
```

Add the required scope to the token in your `tokens.json` file and restart the gateway, or use a different token that already has the scope.
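The check the gateway performs amounts to scope membership. A sketch under the assumption that each token entry carries a `scopes` array (the field name is an assumption):

```python
def has_scope(token_entry: dict, required: str) -> bool:
    """True if the token entry grants the required scope; missing scopes deny."""
    return required in token_entry.get("scopes", [])

# A hypothetical token entry that would pass the approval:write check above:
ops_token = {"token": "ops-token", "scopes": ["query:read", "approval:write"]}
```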
## execute_query returns blocked unexpectedly

Check `safety_metadata.policy_reason` in the response — it explains exactly why the query was blocked.

Common causes:

| Cause | Fix |
|---|---|
| Table is in `protected_tables` | Intended — this table requires extra protection |
| Query matches a blocked operation type | Check `BACKSTOP_POLICY_*` environment variables |
| `parse_error_present: true` | Backstop couldn't parse the SQL — simplify the query or check for syntax errors |
| Bulk operation exceeds threshold | `affected_percent` is too high — add a WHERE clause to narrow scope |
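When triaging blocks programmatically, surface `policy_reason` rather than pattern-matching error strings. A sketch, assuming a response shape like the one described (field names other than `policy_reason` and `parse_error_present` are assumptions):

```python
def explain_block(response: dict) -> str:
    """Summarize why execute_query was blocked, using safety_metadata."""
    meta = response.get("safety_metadata", {})
    reason = meta.get("policy_reason", "no policy_reason present")
    if meta.get("parse_error_present"):
        # Unparseable SQL is blocked conservatively regardless of other rules.
        return f"SQL could not be parsed: simplify the query ({reason})"
    return reason

# A hypothetical blocked response:
blocked = {
    "status": "blocked",
    "safety_metadata": {
        "policy_reason": "table 'users' is in protected_tables",
        "parse_error_present": False,
    },
}
```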
## execute_query always returns approval_required

If this is expected — your policy is set to `approve` for that risk level. An operator needs to approve via `POST /approve/{id}`.
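The approval itself is an authenticated POST. A sketch that builds the request with the standard library, assuming an operator token with `approval:write`; the approval id `apr_123` is a placeholder:

```python
import urllib.request

def build_approve_request(gateway: str, approval_id: str, token: str) -> urllib.request.Request:
    """Construct the POST /approve/{id} call; send it with urllib.request.urlopen."""
    return urllib.request.Request(
        url=f"{gateway}/approve/{approval_id}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_approve_request("http://localhost:8080", "apr_123", "ops-token")
```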
If this is unexpected — check:

```bash
# What policies are active?
curl -H "Authorization: Bearer admin-token" \
  http://localhost:8080/admin/status | jq '.policy'
```

Adjust policy with environment variables:

```bash
BACKSTOP_POLICY_HIGH=execute      # Don't require approval for HIGH
BACKSTOP_POLICY_CRITICAL=approve  # Require approval for CRITICAL only
```

## Sidecar shows stale in health check
```json
{
  "sidecar_status": "stale",
  "sidecar_heartbeat_age_seconds": 450
}
```

The sidecar hasn't checked in for more than 120 seconds.
- Check sidecar is running: `docker ps | grep sidecar` or `systemctl status backstop-sidecar`
- Check sidecar logs for errors: `docker logs backstop-sidecar --tail 50`
- Verify sidecar can reach gateway: `curl http://<gateway>:8080/health` from the sidecar host
- Check S3 write permissions: `backstop doctor storage-permissions --storage s3://your-bucket`
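The staleness rule itself is simple: a heartbeat older than the threshold flips the status. A sketch of that check against the health payload above (the 120 s threshold and field names are taken from this section):

```python
STALE_AFTER_SECONDS = 120  # threshold described above

def sidecar_is_stale(health: dict) -> bool:
    """True when the reported heartbeat age exceeds the staleness threshold.

    A missing heartbeat field is treated as stale, since no check-in is worse
    than a late one.
    """
    age = health.get("sidecar_heartbeat_age_seconds", float("inf"))
    return age > STALE_AFTER_SECONDS

health = {"sidecar_status": "stale", "sidecar_heartbeat_age_seconds": 450}
```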
## last_snapshot_age_seconds is growing

The sidecar is running but not producing new snapshots.

- Check sidecar logs for snapshot errors
- Verify S3 bucket has write space (MinIO: check disk, S3: check bucket policy)
- Check if a large snapshot is in progress — it may take several minutes for wide tables

```bash
curl -H "Authorization: Bearer ops-token" \
  "http://localhost:8080/metadata/snapshots?table=users" | jq '.[0]'
```

## Restore fails with checksum mismatch
```
Error: manifest checksum mismatch — snapshot may be corrupted
```

The Parquet file in S3 doesn't match the checksum stored in the manifest. Possible causes:

- File was manually modified in S3
- Partial upload (network issue during snapshot)
- S3 bucket has object mutation enabled

Try the previous snapshot:

```bash
backstop snapshots list --table users --storage s3://...
# Use the next-oldest snapshot_id
backstop recover --table users --snapshot-id snap_previous --storage s3://... --db postgresql://...
```

Corrupt or quarantined snapshots are not eligible for recovery readiness or guided restore. Treat this as a storage integrity incident: preserve the object, check sidecar logs, verify bucket mutation controls, and use a different valid snapshot or PITR.
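To inspect a preserved object yourself, you can recompute its digest and compare it against the manifest entry. A minimal sketch, assuming the manifest is JSON with a hex SHA-256 under a `checksum` key (both the format and the hash algorithm are assumptions; Backstop's real manifest may differ):

```python
import hashlib
import json

def file_sha256(path: str) -> str:
    """Stream the file in 1 MiB chunks so large Parquet objects aren't loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_snapshot(parquet_path: str, manifest_path: str) -> bool:
    """True when the on-disk object matches the checksum recorded in the manifest."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    return file_sha256(parquet_path) == manifest["checksum"]
```

A mismatch here confirms the object changed after the manifest was written; a match points the investigation at the manifest instead.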
## backstop recover says no valid snapshots exist

The recovery wizard only lists snapshots that are valid, checksummed, and not quarantined.

- Check sidecar health: `curl http://localhost:9091/health`
- Check gateway metadata: `curl http://localhost:8080/metadata/health`
- Check storage permissions: `backstop doctor storage-permissions --storage s3://... --strict`
- List snapshots directly: `backstop snapshots list --table users --storage s3://... --db postgresql://...`
- If this is a database-level incident, use `backstop recover --type pitr` or `backstop recover --type logical`

Do not force a restore from an invalid or corrupt manifest.
## High query latency through the gateway

Backstop adds classification overhead on every query. Expected overhead is 5–15 ms for simple queries. If you're seeing much higher latency:

- Check gateway host resources — CPU/memory on the gateway process
- Check database connectivity — `pg_stat_activity` for connection wait states
- Check snapshot sidecar impact — snapshots briefly lock tables; don't schedule them during peak load windows
- Check policy evaluation — complex policy rules add evaluation time
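One quick way to quantify the gateway's share is to time the same cheap request repeatedly and take the median, which is less skewed by cold starts than the mean. A rough sketch, reusing the unauthenticated `/health` endpoint from earlier (swap in your own host and endpoint):

```python
import statistics
import time
import urllib.request

def median_ms(timings_s) -> float:
    """Median of a list of timings, converted from seconds to milliseconds."""
    return statistics.median(t * 1000 for t in timings_s)

def probe(url: str, samples: int = 20) -> float:
    """Time repeated GETs against `url` and return the median latency in ms."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        urllib.request.urlopen(url).read()
        timings.append(time.perf_counter() - start)
    return median_ms(timings)

# e.g. compare probe("http://localhost:8080/health") with a probe against a
# path that bypasses the gateway to estimate the gateway's added overhead
```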
```bash
# Check Prometheus metrics for gateway processing time
curl http://localhost:8080/metrics | grep backstop_query_duration
```

## Dev token active in production
```bash
curl http://your-production-gateway/admin/status \
  -H "Authorization: Bearer dev-token"
```

If this returns 200, `BACKSTOP_DEV_MODE=true` is set in your production environment. Remove it immediately and restart the gateway. (Note: `/health` can't be used for this check, because it doesn't require authentication and returns 200 regardless of the token.)