
Starting, Stopping & Monitoring

Use xian-cli for local operator-facing lifecycle commands. Use xian-deploy for the remote Linux-host equivalents. Use xian-stack directly only for backend validation or low-level debugging.

For the maintained whole-stack local validation flow, use the dedicated 5-Validator Localnet E2E run instead of stitching together ad hoc load, governance, and logging checks by hand.

Start and Stop

bash
uv run xian node start validator-1
uv run xian node stop validator-1

If the node profile enables the dashboard, xian node start also brings up the optional dashboard service on the configured host/port.

If the node profile enables monitoring, xian node start also brings up the Prometheus and Grafana sidecars through the xian-stack backend.

If the node profile enables xian-intentkit, xian node start also brings up the stack-managed IntentKit frontend, API, worker, scheduler, and support services as a separate Compose project.

If the node profile enables xian-dex-automation, xian node start also brings up the deterministic DEX automation sidecar. The default admin UI/API is http://127.0.0.1:38280.

Status

bash
uv run xian node status validator-1
uv run xian node endpoints validator-1
uv run xian node health validator-1

node status reports:

  • whether the node home is initialized
  • the resolved manifest and profile paths
  • the xian-stack backend state when available
  • optional live RPC reachability
  • the age of the latest observed block so a stalled chain is visible even when the RPC is still reachable
  • the configured image mode plus registry image digests when the profile uses published images
  • the embedded release-manifest provenance block for canonical images, including component Git refs and build toolchain
  • the actual running container image names seen by Docker when the backend is reachable
  • a compact summary of readiness, sync height, peer count, and optional dashboard / monitoring / xian-intentkit / xian-dex-automation reachability
  • the effective local endpoint catalog for RPC, abci_query, metrics, and optional dashboard / monitoring / xian-intentkit / xian-dex-automation services

node health is the concise machine-readable live-health view. It adds:

  • backend state as healthy, degraded, or stopped
  • RPC reachability and current sync detail
  • CometBFT and Xian metrics reachability
  • BDS queue, pool, spool, lag, and database status when services.bds.enabled is true
  • optional dashboard / Prometheus / Grafana reachability when enabled
  • optional xian-intentkit frontend and API reachability when enabled
  • optional xian-dex-automation sidecar reachability when enabled
  • optional disk-pressure checks through the local xian-stack storage report
  • rendered state-sync readiness from config.toml
  • the effective snapshot bootstrap URL

node endpoints is the quickest discovery command for local operator URLs. It prints the expected entrypoints for:

  • CometBFT RPC
  • CometBFT /status
  • ABCI query
  • CometBFT metrics
  • Xian metrics
  • BDS status and spool ABCI query URLs when services.bds.enabled is true
  • GraphQL when services.bds.enabled is true
  • dashboard and dashboard status when enabled
  • Prometheus and Grafana when monitoring is enabled
  • xian-intentkit frontend and API health URLs when enabled
  • xian-dex-automation admin UI, health, wallet, rules, and runs URLs when enabled

For stack-managed nodes, the endpoint catalog reflects the actual published Docker host ports of the running services when they differ from the profile defaults. That matters for localnet and validation workspaces that remap ports to avoid collisions.
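The port-resolution rule above can be sketched as follows. This is a minimal illustration, not xian-cli's implementation. The service keys, the `published` mapping, and the function name `effective_endpoints` are hypothetical. The default ports are the ones listed under Dashboard and GraphQL, plus the CometBFT RPC default of 26657.

```python
# Hypothetical sketch: service names, mapping shape, and merge rule are
# illustrative assumptions, not xian-cli's real data model.
PROFILE_DEFAULTS = {
    "rpc": 26657,
    "cometbft_metrics": 26660,
    "xian_metrics": 9108,
    "dashboard": 8080,
}


def effective_endpoints(published: dict[str, int], host: str = "127.0.0.1") -> dict[str, str]:
    """Prefer the Docker-published host port; fall back to the profile default."""
    catalog = {}
    for service, default_port in PROFILE_DEFAULTS.items():
        port = published.get(service, default_port)
        catalog[service] = f"http://{host}:{port}"
    return catalog


# A localnet node whose RPC port was remapped to avoid a collision.
print(effective_endpoints({"rpc": 27657})["rpc"])  # http://127.0.0.1:27657
```

Preferring the published port means the printed URL is the one a host-side client can actually reach.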

Dashboard Monitoring

When the optional dashboard is enabled, the Execution Health -> Parallel row is the quickest operator-facing check for speculative block execution posture.

Current behavior:

  • disabled: the node config has parallel execution turned off
  • activated: the node config has parallel execution turned on
  • activated · <workers> workers · min <n> tx · waiting for eligible block: the feature is enabled, but no recent block met the threshold yet
  • activated · <accepted> accepted / <prefiltered> prefiltered / <fallback> fallback: a recent eligible block used the parallel executor and the row is showing that block's execution summary

The row is config-aware. It does not show disabled just because the latest block was too small or otherwise ineligible.
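The four row states map to their inputs roughly as shown below. This is a sketch under assumptions, not the dashboard's actual code. The names `parallel_row` and `BlockSummary` are hypothetical. The sketch also assumes the row is derived from the config flag, the worker count, the minimum tx threshold, and the most recent eligible block summary. The example numbers are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockSummary:
    """Hypothetical execution summary for the most recent eligible block."""
    accepted: int
    prefiltered: int
    fallback: int


def parallel_row(enabled: bool,
                 workers: Optional[int] = None,
                 min_txs: Optional[int] = None,
                 recent: Optional[BlockSummary] = None) -> str:
    """Render the Execution Health -> Parallel row text from config plus recent activity."""
    if not enabled:
        return "disabled"
    if recent is not None:
        # A recent eligible block ran through the parallel executor.
        return (f"activated · {recent.accepted} accepted / "
                f"{recent.prefiltered} prefiltered / {recent.fallback} fallback")
    if workers is None or min_txs is None:
        # Enabled in config, but no tuning details are available to display.
        return "activated"
    # Enabled, but no recent block met the threshold: still reported as activated.
    return f"activated · {workers} workers · min {min_txs} tx · waiting for eligible block"


print(parallel_row(True, workers=4, min_txs=8))
# activated · 4 workers · min 8 tx · waiting for eligible block
```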

doctor is the broader workspace and node-health preflight:

bash
uv run xian doctor validator-1
uv run xian doctor validator-1 --skip-live-checks

By default, doctor <name> performs live health checks when the node name is known. These include:

  • xian-stack backend reachability
  • RPC reachability and the current sync summary
  • dashboard reachability when the profile enables it
  • Prometheus and Grafana reachability when monitoring is enabled
  • state-sync readiness from the rendered CometBFT config
  • snapshot-bootstrap availability from the effective snapshot_url
  • trusted snapshot-manifest signing keys when the effective bootstrap source is manifest-backed

Use --skip-live-checks when you want the older artifact-only behavior for an offline preflight.

Application Logs

The Xian application runtime writes its own logs separately from CometBFT's logs.

Current behavior:

  • live stderr output follows [xian].app_log_level
  • rotated application log files live under .cometbft/xian/logs
  • rotation follows [xian].app_log_rotation_hours
  • retention follows [xian].app_log_retention_days
  • both stderr and file logging are queued asynchronously to avoid blocking the runtime on every write
  • when [xian].app_log_json = true, both stderr and the rotated file sink are structured JSON

Use this logger when you need to answer questions like:

  • why a transaction was rejected in CheckTx
  • why prepare_proposal dropped a transaction
  • what happened during finalize_block
  • why readonly simulation was rejected, timed out, or failed

transaction_trace_logging is the noisy per-transaction debug mode. Keep it off for normal operation and enable it temporarily when you need tx-by-tx execution summaries.

Practical detail:

  • app_log_level=DEBUG gives you compact per-tx summaries
  • app_log_level=TRACE is the expensive mode that also emits full serialized tx-result payloads

Practical Log Workflow

Typical local path:

bash
tail -f ~/.cometbft/xian/logs/*.log

If JSON logging is enabled:

bash
tail -q -f ~/.cometbft/xian/logs/*.log | jq .

Useful patterns to look for:

  • stage=check_tx for mempool admission failures
  • stage=prepare_proposal for transactions dropped before proposal assembly
  • stage=process_proposal for proposed-block rejection reasons
  • stage=finalize_start, stage=finalize_parallel, and stage=finalize_complete for block lifecycle debugging
  • stage=simulate_tx for readonly simulation saturation, timeout, or worker failures
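When app_log_json is on, a short filter is often easier than grepping. This sketch assumes each JSON record carries the stage name under a top-level "stage" key. The patterns above show it as stage=..., so the exact field name in the JSON sink is an assumption. The script name stage_filter.py is hypothetical.

```python
import json
import sys
from collections.abc import Iterable, Iterator


def records_for_stage(lines: Iterable[str], stage: str) -> Iterator[dict]:
    """Yield JSON log records whose top-level "stage" field equals `stage`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines, e.g. tail's "==> file <==" headers
        if isinstance(record, dict) and record.get("stage") == stage:
            yield record


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: tail -q -f ~/.cometbft/xian/logs/*.log | python3 stage_filter.py check_tx
    for rec in records_for_stage(sys.stdin, sys.argv[1]):
        print(json.dumps(rec), flush=True)
```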

Recommended escalation order:

  1. start with app_log_level=INFO
  2. if that is not enough, move to app_log_level=DEBUG
  3. only use app_log_level=TRACE for short-lived deep tx debugging

When you are done debugging, turn transaction_trace_logging back off and return the node to its normal log level.

Backend Commands

From the xian-stack repository, the stable machine-readable backend interface is scripts/backend.py:

bash
python3 ./scripts/backend.py start --no-bds-enabled --no-dashboard --no-monitoring
python3 ./scripts/backend.py status --no-bds-enabled --no-dashboard --no-monitoring
python3 ./scripts/backend.py endpoints --no-bds-enabled --no-dashboard --no-monitoring
python3 ./scripts/backend.py health --no-bds-enabled --no-dashboard --no-monitoring
python3 ./scripts/backend.py stop --no-bds-enabled --no-dashboard --no-monitoring

With stack-managed xian-intentkit:

bash
python3 ./scripts/backend.py start --bds-enabled --intentkit --intentkit-network-id xian-mainnet
python3 ./scripts/backend.py endpoints --bds-enabled --intentkit --intentkit-network-id xian-mainnet
python3 ./scripts/backend.py health --bds-enabled --intentkit --intentkit-network-id xian-mainnet
python3 ./scripts/backend.py stop --bds-enabled --intentkit --intentkit-network-id xian-mainnet

With stack-managed xian-dex-automation:

bash
python3 ./scripts/backend.py start --no-bds-enabled --dex-automation
python3 ./scripts/backend.py endpoints --no-bds-enabled --dex-automation
python3 ./scripts/backend.py health --no-bds-enabled --dex-automation
python3 ./scripts/backend.py stop --no-bds-enabled --dex-automation

For BDS-enabled integrated runs:

bash
python3 ./scripts/backend.py start --bds-enabled --monitoring
python3 ./scripts/backend.py endpoints --bds-enabled --monitoring
python3 ./scripts/backend.py health --bds-enabled --monitoring --no-check-disk

The maintained stack now defaults to a fail-closed network posture:

  • CometBFT RPC binds to 127.0.0.1 unless you pass --public-rpc
  • CometBFT and Xian metrics bind to 127.0.0.1 unless you pass --public-metrics
  • PostGraphile binds to 127.0.0.1 unless you enable BDS and pass --public-query
  • local BDS and PostGraphile credentials are generated once into xian-stack/.stack-secrets.env

That split is intentional. --public-query publishes the read-only indexed surface. It does not also publish live CometBFT RPC, mempool access, or raw ABCI submission endpoints.
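If you render configs by hand or audit an existing node, a small check can confirm that a listen address stays loopback-only. In this sketch, `is_loopback_bind` is a hypothetical helper. It accepts CometBFT-style `tcp://host:port` addresses as well as plain `host:port`. It treats an empty host (as in `:26660`) as all interfaces.

```python
import ipaddress


def is_loopback_bind(listen_addr: str) -> bool:
    """Return True only if the address listens exclusively on a loopback interface."""
    addr = listen_addr.split("://", 1)[-1]  # drop a scheme such as tcp://
    host, _, _port = addr.rpartition(":")
    host = host.strip("[]")  # IPv6 literals are bracketed: [::1]:9108
    if host == "localhost":
        return True
    if host in ("", "*"):
        return False  # an empty host means all interfaces
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # other hostnames cannot be judged statically; fail closed


for addr in ("tcp://127.0.0.1:26657", "tcp://0.0.0.0:26657", ":26660"):
    print(addr, is_loopback_bind(addr))
```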

Examples:

bash
# Public read/query surface on a BDS node.
python3 ./scripts/backend.py start --bds-enabled --public-query

# Explicit public RPC and metrics on a node without BDS.
python3 ./scripts/backend.py start --no-bds-enabled --public-rpc --public-metrics

Host-side storage inspection from xian-stack:

bash
python3 ./scripts/backend.py storage-report
make storage-report

Remote Hosts With xian-deploy

Use xian-deploy when the node is running on a remote Linux host and you want the deployment-side equivalent of the local health and recovery workflow.

Remote starter flows use the same node profiles as local flows. In xian-deploy, set xian_node_profile for each host:

yaml
xian_node_profile: /path/to/network/nodes/validator-1.json

The profile decides runtime intent such as BDS, dashboard, monitoring, block policy, pruning, metrics, P2P peers, snapshots, state sync, and node images. Inventory stays focused on deployment bindings such as host paths, published ports, database credentials, memory limits, and xian_deploy_topology.

The xian_runtime role exposes the same node-local runtime controls that xian-configure-node writes: logging, simulation, transaction fee mode, pending-nonce limits, metrics, state sync, BDS, P2P peers, snapshot verification, and the advanced parallel execution guardrails. Host-publish variables still decide which container ports are reachable outside the remote host.

Common entrypoints:

bash
ansible-playbook playbooks/status.yml
ansible-playbook playbooks/health.yml
ansible-playbook playbooks/smoke.yml

What they are for:

  • status.yml: inspect the remote deployment state through the runtime role
  • health.yml: the full remote equivalent of xian node health plus the broader deployment checks that matter on a host
  • smoke.yml: a lighter post-deploy sanity check for services and endpoints

The remote health playbook checks:

  • expected running containers for the selected topology
  • RPC reachability and current sync status
  • Xian metrics
  • optional dashboard / Prometheus / Grafana reachability
  • BDS queue, spool, lag, and database state when BDS is enabled
  • rendered state-sync readiness from the remote config.toml
  • deploy-root and BDS-spool disk pressure

Recovery Runbooks

Use the recovery/bootstrap path that matches the artifact you already have. These procedures are operator runbooks, not contract APIs. Prefer the least destructive path that restores the node to a known-good state, and verify with xian node health, remote health.yml, or /bds_status before returning the node to normal service.

Prepared Node-Home Archive

Use this when you already have a full prepared .cometbft home for the target node.

Local path:

bash
uv run xian network join validator-1 --network testnet
uv run xian node init validator-1

For mainnet recovery, pass the operator-supplied mainnet manifest with --network-manifest.

Remote path:

bash
ansible-playbook playbooks/push-home.yml
ansible-playbook playbooks/deploy.yml

This is the closest remote equivalent to a local snapshot_url / node-home restore workflow.

Application-State Snapshot Import

Use this when you have an exported xian-state-snapshot archive and want to restore Xian application state without replacing the full prepared node home.

Local path:

bash
uv run xian-state-snapshot import --input-path ./xian-state-snapshot.tar.gz

Remote path:

bash
ansible-playbook playbooks/restore-state-snapshot.yml

Required remote variable:

  • xian_state_snapshot_archive

Protocol State Sync

Use this when you want the node to bootstrap from trusted peers that already serve Xian application snapshots through CometBFT state sync.

Required profile settings:

  • advanced.statesync.enabled=true
  • at least two advanced.statesync.rpc_servers
  • advanced.statesync.trust_height
  • advanced.statesync.trust_hash
  • advanced.statesync.trust_period
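Before rendering the node, you can pre-check a profile against that list. This minimal sketch assumes the profile JSON nests these keys exactly as the dotted names suggest (advanced -> statesync -> ...). The function `statesync_problems` is hypothetical. Counting only distinct RPC servers is an extra precaution that the list above does not state.

```python
def statesync_problems(profile: dict) -> list[str]:
    """Return human-readable problems; an empty list means the basics are present."""
    ss = profile.get("advanced", {}).get("statesync", {})
    problems = []
    if ss.get("enabled") is not True:
        problems.append("advanced.statesync.enabled must be true")
    servers = ss.get("rpc_servers") or []
    if len(set(servers)) < 2:
        problems.append("at least two distinct advanced.statesync.rpc_servers are required")
    for key in ("trust_height", "trust_hash", "trust_period"):
        if not ss.get(key):
            problems.append(f"advanced.statesync.{key} is missing")
    return problems
```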

Local path:

  • set the rendered [statesync] config through the node profile
  • use xian node health / xian doctor to verify readiness

Remote path:

bash
ansible-playbook playbooks/bootstrap-state-sync.yml

This playbook validates the state-sync profile settings first, deploys the runtime, then prints a focused bootstrap summary from the remote host.

Snapshot Or State-Sync Bootstrap Fails

Use this when a newly joined node fails during snapshot restore or gets stuck before it can join through state sync.

Operator response:

  1. confirm the node is using the intended network manifest and profile
  2. run the local or remote health check before deleting any state
  3. verify the snapshot URL, signed manifest keys, or state-sync trust data
  4. confirm at least two state-sync RPC servers are reachable and on the same chain
  5. confirm the trust height and trust hash came from that chain and are still inside the trust period
  6. retry the restore only after the inputs are corrected

Local checks and retry:

bash
uv run xian node health validator-1
uv run xian doctor validator-1
uv run xian snapshot restore validator-1

Remote checks:

bash
ansible-playbook playbooks/health.yml
ansible-playbook playbooks/bootstrap-state-sync.yml
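Steps 4 and 5 of the operator response can be partly scripted against the standard CometBFT RPC. `/status` reports the chain ID under `result.node_info.network`, and `/block?height=H` reports the block hash under `result.block_id.hash`. In this sketch the helper names are hypothetical. The network calls are kept separate from the comparison logic so the logic can be checked offline. The sketch does not check the trust period, and a pruned server will fail the `/block` lookup for old heights.

```python
import json
import urllib.request


def rpc_get(rpc_url: str, path: str, timeout: float = 5.0) -> dict:
    """GET a CometBFT RPC path and return the JSON-RPC 'result' object."""
    with urllib.request.urlopen(f"{rpc_url.rstrip('/')}{path}", timeout=timeout) as resp:
        return json.load(resp)["result"]


def same_chain(statuses: dict[str, dict]) -> bool:
    """True if at least two /status results all report the same chain ID."""
    networks = {s["node_info"]["network"] for s in statuses.values()}
    return len(statuses) >= 2 and len(networks) == 1


def trust_hash_matches(block_result: dict, trust_hash: str) -> bool:
    """Compare a /block result's hash with the configured trust hash (hex, case-insensitive)."""
    return block_result["block_id"]["hash"].upper() == trust_hash.upper()


def check_statesync_inputs(rpc_servers: list[str], trust_height: int, trust_hash: str) -> list[str]:
    """Fetch live data from each server and report mismatches."""
    problems = []
    statuses = {url: rpc_get(url, "/status") for url in rpc_servers}
    if not same_chain(statuses):
        problems.append("state-sync RPC servers disagree on chain ID (or fewer than two)")
    for url in rpc_servers:
        block = rpc_get(url, f"/block?height={trust_height}")
        if not trust_hash_matches(block, trust_hash):
            problems.append(f"{url}: block {trust_height} hash differs from trust_hash")
    return problems
```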

Important boundaries:

  • snapshot_url restores a prepared node-home archive or signed snapshot manifest
  • CometBFT state sync restores Xian application snapshots from trusted peers
  • BDS/Postgres data is not restored by either path; rebuild or import BDS separately when indexed history matters

Forward State Patch Activation

Use this when the chain is still live and the protocol issue can be corrected forward without rewriting finalized history.

Operator checklist:

  1. place the approved patch bundle under config/state-patches/ on every validator
  2. confirm the local bundle inventory and bundle_hash match
  3. approve the state_patch proposal on-chain through protocol governance
  4. verify the scheduled activation height through the query/API surfaces
  5. watch the activation block and confirm the patch status moves to applied

Useful inspection paths:

text
GET /api/abci_query/state_patch_bundles
GET /api/abci_query/scheduled_state_patches/<height>
GET /api/abci_query/state_patches
GET /api/abci_query/state_patches_for_block/<height>

Important boundary:

  • validators must already have the local bundle before the activation block
  • if the local bundle is malformed or mismatched, the runtime now fails hard instead of silently skipping the patch

Consensus-Halt Emergency Recovery

Use this when the bug itself prevents the chain from continuing, for example a determinism or metering issue that causes validators to diverge during block execution.

In that case, on-chain governance is not enough by itself because the chain may not advance far enough to approve or execute a patch.

Operator response:

  1. stop validators from continuing divergent execution
  2. agree off-chain on the fixed runtime build and recovery procedure
  3. roll the validator set onto the same fixed deterministic runtime
  4. restart the network in a coordinated way
  5. if the resulting state still needs correction, use a governed forward patch after recovery

Treat this as a social-consensus / operator runbook event, not a normal contract-level governance action.

For the concrete JSON plan format and xian recovery validate/apply commands, see Recovery Plans.

RPC Reachable But Height Is Stale

Use this when /status answers but the latest block height or block age stops moving.

Operator response:

  1. compare local height against at least one trusted peer or public RPC
  2. check local peer count and whether CometBFT reports catching_up
  3. inspect application logs for process_proposal, finalize_block, or state-patch failures
  4. if only the local node is isolated, fix peers/seeds or restart the local runtime
  5. if multiple validators are stalled at the same height, stop treating it as a local repair and coordinate with the validator set
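Steps 1 and 2 compare two CometBFT /status results. This sketch assumes you have already fetched `result` from the local node and from one trusted peer. It uses the standard `sync_info` fields `latest_block_height` (a string), `latest_block_time`, and `catching_up`. The 60-second age threshold and the function names are illustrative. Pick a threshold that fits the network's block interval.

```python
import re
from datetime import datetime, timezone


def parse_block_time(value: str) -> datetime:
    """Parse CometBFT's RFC 3339 UTC time, truncating nanoseconds to microseconds."""
    m = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", value)
    if m is None:
        raise ValueError(f"unexpected block time: {value!r}")
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
    micros = int((m.group(2) or "0")[:6].ljust(6, "0"))
    return base.replace(microsecond=micros, tzinfo=timezone.utc)


def stale_findings(local: dict, peer: dict, now: datetime, max_age_s: float = 60.0) -> list[str]:
    """local/peer are the 'result' objects from CometBFT /status."""
    findings = []
    li, pi = local["sync_info"], peer["sync_info"]
    age = (now - parse_block_time(li["latest_block_time"])).total_seconds()
    if age > max_age_s:
        findings.append(f"latest block is {age:.0f}s old")
    gap = int(pi["latest_block_height"]) - int(li["latest_block_height"])
    if gap > 0:
        findings.append(f"local node is {gap} blocks behind peer")
    if li["catching_up"]:
        findings.append("CometBFT reports catching_up=true")
    return findings
```

If every validator you compare against shows the same stale height, step 5 applies: treat it as a network-level event rather than a local repair.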

Useful checks:

bash
uv run xian node status validator-1
uv run xian node health validator-1
uv run xian node endpoints validator-1

Remote checks:

bash
ansible-playbook playbooks/status.yml
ansible-playbook playbooks/health.yml

Important boundary:

  • do not wipe local state just because the RPC process is reachable but stale
  • a local peer/connectivity issue is a node operation
  • a network-wide halt or app-hash divergence is a coordinated recovery-plan event

Monitoring Layers

Use the monitoring surfaces in this order:

  • CometBFT RPC and raw ABCI query for canonical low-level reads
  • Xian Prometheus metrics plus CometBFT metrics for alerting and time-series monitoring
  • dashboard REST/WebSocket for operator UX and exploration
  • BDS-backed ABCI query for indexed/history reads
  • GraphQL/PostGraphile v5 only as an optional convenience layer over BDS

Dashboard and GraphQL

Optional services:

  • dashboard: port 8080 by default
  • Xian Prometheus metrics: port 9108 by default
  • CometBFT metrics: port 26660 by default
  • Prometheus: port 9090 by default
  • Grafana: port 3000 by default
  • GraphQL/PostGraphile v5: port 5000 when BDS is enabled

Use the dashboard for chain inspection and WebSocket subscriptions.

For a direct local dashboard process against an already running node:

bash
uv run --project /path/to/xian-abci python3 -m xian.dashboard.cli \
  --rpc-url http://127.0.0.1:26657 \
  --host 127.0.0.1 \
  --port 18080

Use Prometheus and Grafana for remote monitoring, alerting, and retention.

Profile-specific monitoring assets exist on top of the generic overview:

  • Xian Shared Network dashboard for consortium/shared-network BDS nodes
  • shared-network Prometheus alert variants

From xian-stack:

bash
make monitoring-up
make monitoring-down
make monitoring-bds-up
make monitoring-bds-down
make monitoring-fidelity-up
make monitoring-fidelity-down

Each built-in monitoring target maps to a distinct monitoring posture:

  • monitoring-up: generic integrated monitoring with the overview dashboard
  • monitoring-bds-up: integrated BDS monitoring with the overview and BDS recovery dashboards
  • monitoring-fidelity-up: shared-network monitoring with the shared-network alert variant

What gets scraped:

  • CometBFT metrics on :26660
  • Xian metrics on :9108

In the Docker stack, Xian performance snapshots are enabled by default so the dashboard can show recent execution timing without additional setup. Override that with XIAN_PERF_ENABLED=0 if you explicitly want to disable the /perf_status snapshot path.

What the dashboard adds without duplicating the main node cards:

  • validator-set visibility with set height, active validator count, total power, a clickable validator list, and rows for jumping to a known peer dashboard target
  • validator rows keep a consistent height even when the local validator row is marked with the self badge
  • the validator list expands to use the full panel height for smaller validator sets and stays scrollable for larger ones, so the validator card does not leave dead space on the standard desktop layout
  • live P2P peer visibility separate from the validator set, so current network connectivity problems remain visible even when the consensus membership is unchanged
  • a dedicated explorer at /explorer, plus /explorer/contracts, /explorer/addresses, and /explorer/events, so block/event browsing stays in the explorer instead of duplicating those tables on the main dashboard
  • the block explorer auto-refreshes while you are on the newest block page, but keeps older paginated block views stable while you inspect historical data
  • /explorer/addresses opens with a recent indexed address list instead of an empty prompt-only view when the node exposes the BDS sender-history index, and selecting a row drills down into that address's submitted transaction history
  • contract browsing sorted by creation date or name
  • contract code browsing with syntax-highlighted original source when that source is available, explicit runtime-code labeling when only the stored runtime form is available, and function-to-source jumping
  • address drill-down that shows indexed sender history and lets you reopen tx detail from an address page
  • richer contract metadata, including owner / developer / deployer / creator fields, clickable address links, and indexed generated developer-reward totals when BDS is available
  • recent indexed event browsing on nodes with BDS enabled
  • execution health from /perf_status, plus explicit visibility when advanced perf capture is disabled
  • BDS lag, pending-buffer depth, pool utilization, spool state, filesystem-free space, and alerts from /bds_status
  • click-to-copy middle truncation for long node identity values in the dashboard cards
  • peer switching that keeps the dashboard scoped to a selected node, including localnet host-port inference for the standard node-<n> layout

Use the node's ABCI query surface for canonical reads:

  • raw current-state reads like /get/..., /contract_source/..., /contract_ir/..., and /simulate_tx/...
  • BDS-backed indexed/history reads like /blocks/..., /tx/..., /events/..., /state_history/..., and /developer_rewards/... when BDS is enabled
  • BDS operator reads like /bds_status and /bds_spool/... to inspect queue, spool, and indexed-head health
  • performance reads like /perf_status to inspect recent block timing and tracer metadata

Use GraphQL only when you want a convenience query layer over the BDS database.

BDS Catch-Up and Reindex

When BDS is enabled, the validator finalizes blocks first and BDS indexes them asynchronously. Live finalized blocks are buffered in memory and persisted in strict contiguous block order.

If BDS sees a gap, it catches up from CometBFT RPC automatically while newer live blocks keep arriving.

Example:

  • indexed head is 100
  • live block 102 arrives before 101 was indexed
  • BDS keeps 102 pending
  • the catch-up worker fetches 101 from RPC
  • BDS writes 101, then 102

BDS can therefore accept new live blocks while it fetches missed ones from RPC, but it never persists blocks out of order.
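The ordering rule in the example above can be modeled in a few lines. This is a simplified, synchronous sketch, not the BDS implementation. In BDS the catch-up worker runs asynchronously while live blocks keep arriving, whereas here `fetch` (a stand-in for a CometBFT RPC block fetch) is called inline. The class and method names are hypothetical.

```python
from collections.abc import Callable
from typing import Any


class ContiguousWriter:
    """Buffer out-of-order blocks and persist them strictly in height order."""

    def __init__(self, indexed_head: int, fetch: Callable[[int], Any]):
        self.head = indexed_head            # highest contiguously persisted height
        self.pending: dict[int, Any] = {}   # blocks received but not yet persisted
        self.fetch = fetch                  # stand-in for fetching a missed block over RPC
        self.persisted: list[int] = []      # write order, kept for illustration

    def on_live_block(self, height: int, payload: Any) -> None:
        if height <= self.head:
            return  # already indexed
        self.pending[height] = payload
        # Fill any gap between the indexed head and this live block.
        for missing in range(self.head + 1, height):
            if missing not in self.pending:
                self.pending[missing] = self.fetch(missing)
        self._flush()

    def _flush(self) -> None:
        # Persist only while the next height is available; never skip ahead.
        while self.head + 1 in self.pending:
            height = self.head + 1
            self._persist(height, self.pending.pop(height))
            self.head = height

    def _persist(self, height: int, payload: Any) -> None:
        self.persisted.append(height)  # a real writer would insert into Postgres here


fetched = []


def fake_rpc_fetch(height: int) -> str:
    fetched.append(height)
    return f"rpc-block-{height}"


writer = ContiguousWriter(indexed_head=100, fetch=fake_rpc_fetch)
writer.on_live_block(102, "live-block-102")
print(writer.persisted, fetched)  # [101, 102] [101]
```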

For explicit offline spool maintenance:

bash
uv run xian-bds-spool compact --offline
uv run xian-bds-spool drain --offline

What these are for:

  • compact: remove stale spool files that are already covered by the indexed BDS head
  • drain: persist the currently pending local spool into Postgres on an existing BDS database

Use drain when BDS was temporarily unavailable but the local spool still has the missing finalized blocks. Do not use it as a cold-bootstrap replacement for historical indexing.

For full historical backfill, use:

bash
uv run xian-bds-reindex

Useful options:

bash
uv run xian-bds-reindex --start-height 1000
uv run xian-bds-reindex --end-height 5000
uv run xian-bds-reindex --rpc-url http://127.0.0.1:26657
uv run xian-bds-reindex --reset

What this needs:

  • local or remote CometBFT RPC access
  • retained block history for the heights you want to index
  • a maintenance window or runtime shape where the reindex command is not racing another live BDS writer against the same Postgres database
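A quick pre-flight for the retained-history requirement is to compare the requested range with what the RPC source still holds. CometBFT's /status reports `sync_info.earliest_block_height` and `sync_info.latest_block_height` as strings. The helper below is a hypothetical sketch built on those two fields. It is not part of xian-bds-reindex.

```python
from typing import Optional


def reindex_window(status_result: dict, start: int = 1, end: Optional[int] = None) -> tuple[int, int]:
    """Validate a reindex range against the heights an RPC source still retains."""
    info = status_result["sync_info"]
    earliest = int(info["earliest_block_height"])
    latest = int(info["latest_block_height"])
    end = latest if end is None else end
    if start < earliest:
        raise ValueError(
            f"heights {start}..{earliest - 1} are pruned on this source; "
            "use an archival RPC source or a BDS snapshot")
    if end > latest:
        raise ValueError(f"end height {end} is above the source's latest height {latest}")
    if start > end:
        raise ValueError(f"start height {start} is after end height {end}")
    return start, end
```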

BDS Lag After A Database Outage

Use this when BDS was temporarily unable to write, for example because Postgres was down or unreachable, but the BDS database is not believed to be corrupted.

Operator response:

  1. restore Postgres or the network path to Postgres
  2. check /bds_status and confirm db_status, worker_running, indexed.indexed_height, height_lag, and catching_up
  3. let automatic catch-up run while height_lag is decreasing
  4. use offline spool maintenance only if the local spool contains finalized blocks that were not persisted
  5. use xian-bds-reindex only if automatic catch-up cannot close the gap
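For step 3, sampling height_lag from /bds_status a few times is enough to tell whether catch-up is making progress. This sketch assumes height_lag is a top-level integer in the /bds_status response. The function name, labels, and minimum sample count are illustrative.

```python
def lag_trend(samples: list[int], min_samples: int = 3) -> str:
    """Classify a series of height_lag readings, oldest first."""
    if len(samples) < min_samples:
        return "insufficient-data"
    if samples[-1] == 0:
        return "caught-up"
    if samples[-1] < samples[0]:
        return "catching-up"        # keep letting automatic catch-up run (step 3)
    return "stalled-or-growing"     # consider spool maintenance or reindex (steps 4-5)
```

Comparing the newest reading with the oldest, rather than requiring every step to decrease, tolerates small fluctuations while new blocks keep arriving.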

Useful checks:

text
GET /api/abci_query/bds_status
GET /api/abci_query/bds_spool/limit=50/offset=0

Offline maintenance commands:

bash
uv run xian-bds-spool compact --offline
uv run xian-bds-spool drain --offline
uv run xian-bds-reindex --rpc-url http://127.0.0.1:26657

Important boundary:

  • do not use --reset for ordinary lag
  • run spool maintenance or manual reindex from a maintenance shell where the same BDS database is not also being actively rewritten by another BDS process

BDS Database Corruption: Reset And Rebuild

Use this when Postgres has corrupted BDS rows or schema, BDS cannot start cleanly after restart, or indexed reads are known to be wrong. This rebuilds the optional indexed database; it does not rewrite CometBFT history or Xian consensus state.

Prerequisites:

  • stop the BDS writer that normally uses this Postgres database
  • keep or choose a trusted CometBFT RPC source for the reindex
  • make sure that RPC source retains every height you need to rebuild

Procedure:

bash
uv run xian-bds-reindex --reset --rpc-url http://127.0.0.1:26657

Use a remote or archival RPC URL when the local node is stopped during the BDS maintenance window:

bash
uv run xian-bds-reindex --reset --rpc-url https://archival-rpc.example.invalid:26657

After restart, verify:

text
GET /api/abci_query/bds_status

Expected recovery signals:

  • db_status is ok
  • worker_running is true
  • indexed.indexed_height reaches the current node height
  • height_lag reaches 0 or keeps decreasing during catch-up
  • catching_up becomes false when the index is current
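The signals above can be checked together against a parsed /bds_status payload. This sketch assumes the field layout implied by the names above: `indexed_height` nested under `indexed`, and the others at the top level. It also assumes you supply the current node height, for example from CometBFT /status. The helper names are hypothetical.

```python
def bds_recovery_signals(status: dict, node_height: int) -> dict[str, bool]:
    """Evaluate the expected recovery signals; all True means the index is current."""
    indexed_height = int(status.get("indexed", {}).get("indexed_height", 0))
    return {
        "db_status_ok": status.get("db_status") == "ok",
        "worker_running": status.get("worker_running") is True,
        "indexed_to_node_height": indexed_height >= node_height,
        "height_lag_zero": status.get("height_lag") == 0,
        "not_catching_up": status.get("catching_up") is False,
    }


def bds_recovered(status: dict, node_height: int) -> bool:
    return all(bds_recovery_signals(status, node_height).values())
```

During catch-up, height_lag_zero and not_catching_up are expected to stay False for a while. Judge progress by whether height_lag keeps decreasing across readings, not by a single snapshot.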

Important boundaries:

  • --reset resets BDS schema and local spool before replaying indexed history
  • it is the right tool for a corrupted BDS database, not for corrupted consensus state
  • if the RPC source lacks old block history, the rebuild will stop at the missing height; use an archival RPC source or BDS snapshot instead

Missing History On Pruned Nodes

Use this when a local reindex cannot reconstruct older heights because local CometBFT history has already been pruned.

Practical options:

bash
uv run xian-bds-reindex --rpc-url https://archival-rpc.example.invalid:26657
uv run xian-bds-snapshot import --input-path ./xian-bds-snapshot.tar.gz --clear-spool

On xian-stack, use the backend wrapper for snapshot import:

bash
python3 ./scripts/backend.py bds-snapshot-import --clear-spool

Important boundary:

  • pruning does not remove current LMDB application state
  • pruning does remove local historical block data needed for local replay and BDS rebuilds
  • keep at least one archival RPC source or recent BDS snapshot for indexed deployments

Chain State Snapshots

Application-state snapshots are separate from BDS snapshots.

Use them when you want CometBFT state sync or a clean local application-state archive:

bash
uv run xian-state-snapshot list
uv run xian-state-snapshot export
uv run xian-state-snapshot export --output-path ./xian-state-snapshot.tar.gz
uv run xian-state-snapshot import --input-path ./xian-state-snapshot.tar.gz

What these snapshots contain:

  • latest Xian application height and state-root app hash
  • contract state
  • nonce state

What they do not contain:

  • full CometBFT data/ history
  • BDS/Postgres data

Use snapshot_url restore when you already have a full prepared node-home archive. In the normal node-profile flow this can now be either a direct archive URL with an explicit SHA256, or a signed snapshot manifest validated against trusted Ed25519 public keys.
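For the direct-archive form, the check amounts to streaming the downloaded file through SHA-256 and comparing the result with the pinned digest. This sketch uses only the standard library. The function names are hypothetical, and the sketch does not cover the signed-manifest (Ed25519) path.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected_sha256: str) -> None:
    """Raise ValueError unless the file matches the pinned SHA-256 (hex, any case)."""
    actual = sha256_file(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"SHA256 mismatch for {path}: expected {expected_sha256}, got {actual}")
```

Run the check on the downloaded archive before extracting it into the node home.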

Use xian-state-snapshot plus CometBFT state sync when you want protocol-level application snapshot bootstrap.

The app hash is a 32-byte Merkle root over Xian consensus state. It includes contract key/value state and committed nonce keys, and excludes local runtime metadata that is not part of consensus. Validators update this root incrementally during block finalization from the block's pending state writes. When a snapshot is imported, Xian recomputes the root from exported_state.json and rejects the archive if it does not match the advertised app hash.
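To make the import-time check concrete, here is a generic binary SHA-256 Merkle root over key-sorted key/value pairs, followed by the recompute-and-compare step. This illustrates the technique only. The leaf encoding, domain-separation prefixes, odd-node handling, and function names below are all assumptions. Xian's actual tree construction, and therefore its hashes, will differ.

```python
import hashlib
import json


def _leaf_hash(key: str, value) -> bytes:
    # Assumed leaf encoding: canonical JSON of [key, value], prefixed with 0x00.
    encoded = json.dumps([key, value], sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(b"\x00" + encoded).digest()


def merkle_root(state: dict) -> bytes:
    """32-byte root over key-sorted leaves; interior nodes are prefixed with 0x01."""
    level = [_leaf_hash(k, state[k]) for k in sorted(state)]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        nxt = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # carry an unpaired node up unchanged
        level = nxt
    return level[0]


def check_import(exported_state: dict, advertised_app_hash_hex: str) -> None:
    """Reject a snapshot whose recomputed root differs from the advertised app hash."""
    if merkle_root(exported_state).hex() != advertised_app_hash_hex.lower():
        raise ValueError("recomputed state root does not match advertised app hash")
```

The sketch covers only the from-scratch recomputation used at import time. It does not cover the incremental update that validators apply during block finalization.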

To consume peer-served application snapshots through state sync, configure the node with trusted RPC servers and trust metadata:

bash
uv run xian-configure-node \
  --moniker validator-1 \
  --validator-privkey <hex> \
  --statesync-enable \
  --statesync-rpc-server http://rpc-1.example:26657 \
  --statesync-rpc-server http://rpc-2.example:26657 \
  --statesync-trust-height 123456 \
  --statesync-trust-hash <trusted-block-hash>

Current model:

  • snapshot export is manual
  • snapshot serving/loading is implemented through the ABCI snapshot lifecycle
  • imported snapshots are stored locally so the node can serve them afterward
  • snapshot contents are verified against the trusted CometBFT app hash

Pruning

Current pruning is block-history pruning through retain_height.

The short operational rule is:

  • current LMDB application state remains available
  • historical local replay/reindex depends on retained block history
  • disabling pruning later does not restore history that has already been pruned

For the full operator policy, sizing guidance, setup flags, and recovery implications, see Pruning And History Retention.

BDS Snapshot Export and Import

For faster bootstrap, migration, or recovery, BDS can now be exported and imported separately from the live chain state:

bash
uv run xian-bds-snapshot export --output-path ./xian-bds-snapshot.tar.gz
uv run xian-bds-snapshot import --input-path ./xian-bds-snapshot.tar.gz
uv run xian-bds-snapshot import --input-path ./xian-bds-snapshot.tar.gz --clear-spool

On xian-stack, the standardized operator path is:

bash
python3 ./scripts/backend.py bds-snapshot-export
python3 ./scripts/backend.py bds-snapshot-import
python3 ./scripts/backend.py bds-snapshot-import --clear-spool

That backend command writes and reads the canonical archive at:

text
.cometbft/snapshots/xian-bds-snapshot.tar.gz

You can override it, but it must still live under XIAN_COMETBFT_HOME so the stack container can access the file.
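A pre-check for that constraint can be sketched as follows. It assumes `XIAN_COMETBFT_HOME` holds the host path of the node home. Relative candidates resolve against the current working directory, as with any shell path. The helper name is hypothetical, and this is not backend.py's own validation.

```python
import os
from pathlib import Path
from typing import Optional


def snapshot_path_ok(candidate: str, cometbft_home: Optional[str] = None) -> bool:
    """True if the archive path resolves to a location inside the CometBFT home."""
    home = Path(cometbft_home or os.environ["XIAN_COMETBFT_HOME"]).resolve()
    # resolve() also collapses ".." segments, so "home/../x" is correctly rejected.
    return Path(candidate).resolve().is_relative_to(home)  # Python 3.9+
```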

Recommended use:

  • export from a healthy indexed node
  • import into a stopped node before bringing BDS online
  • use --clear-spool when the local spool may contain stale or mismatched payloads from before the import
  • let the local spool replay or xian-bds-reindex fill any remaining gap after the imported indexed height

Snapshot import is the best path when:

  • BDS is being enabled for the first time on a large network
  • the local node is pruned and cannot rebuild full history from its own RPC
  • you want a faster bootstrap than replaying the whole chain from scratch

Operational requirement for shielded / indexed deployments:

  • keep at least one recent exported BDS snapshot from a healthy indexed node
  • keep at least one archival recovery source, or another node that can still export a full BDS snapshot
  • treat BDS snapshot export/import as part of normal recovery validation, not a one-off disaster procedure

Storage and Retention

Docker images themselves are immutable layers. The thing that grows during node operation is host-side storage:

  • CometBFT data under .cometbft
  • Xian state under .cometbft/xian
  • the local BDS spool under .cometbft/xian/bds-spool
  • Postgres data under .bds.db
  • Docker build cache, image layers, writable layers, and container logs

Use the stack storage report to inspect the Xian-specific paths:

bash
python3 ./scripts/backend.py storage-report

Use /bds_status to inspect the BDS worker, indexed head, spool size, and low-disk alerts.

Disk Pressure Runbook

Use this when xian node health, remote health.yml, /bds_status, or the storage report warns about low free space.

Operator response:

  1. identify whether pressure is from CometBFT data, Xian state, BDS spool, Postgres data, Docker cache, or logs
  2. preserve .cometbft/data, .cometbft/xian, and .bds.db unless you are deliberately restoring or rebuilding that component
  3. compact stale BDS spool files only after confirming the indexed BDS head covers those heights
  4. free Docker cache, rotated logs, or unrelated host files before deleting chain data
  5. if the host is structurally undersized, increase disk capacity or adjust pruning/retention policy before restarting normal load
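Step 1 is usually answered by the storage report. If you need a quick ad hoc breakdown on the host, a sketch like this sums file sizes per component. The relative paths come from the Storage and Retention list, and `storage_breakdown` is a hypothetical helper. Docker cache and container logs are not covered; `docker system df` reports those.

```python
import os
import shutil
from pathlib import Path

# Relative to the xian-stack checkout. bds-spool sits inside .cometbft/xian,
# so the xian_state figure includes the spool.
COMPONENTS = {
    "cometbft_data": ".cometbft/data",
    "xian_state": ".cometbft/xian",
    "bds_spool": ".cometbft/xian/bds-spool",
    "postgres": ".bds.db",
}


def tree_size(path: Path) -> int:
    """Total bytes of regular files under path (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            p = Path(root) / name
            try:
                if not p.is_symlink():
                    total += p.stat().st_size
            except OSError:
                pass  # file rotated away or unreadable while walking
    return total


def storage_breakdown(stack_root: str) -> dict[str, int]:
    """Bytes per component plus free bytes on the filesystem holding stack_root."""
    base = Path(stack_root)
    sizes = {name: tree_size(base / rel) for name, rel in COMPONENTS.items()}
    sizes["free_bytes"] = shutil.disk_usage(base).free
    return sizes
```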

Useful checks:

bash
python3 ./scripts/backend.py storage-report
uv run xian node health validator-1

Remote check:

bash
ansible-playbook playbooks/health.yml

BDS spool maintenance:

bash
uv run xian-bds-spool compact --offline

Important boundary:

  • deleting CometBFT history can make local replay and BDS rebuilds impossible for older heights
  • deleting .bds.db is a BDS database rebuild/import decision, not routine disk cleanup
  • deleting .cometbft/xian or current node-home state is a recovery-plan or snapshot-restore decision

Interpretation note:

  • current_block_height and height_lag are now derived from the latest committed node height even when no block is currently being executed
  • catching_up reflects actual indexing lag or spool backlog
  • queue_depth still matters operationally, but a nonzero queue by itself does not necessarily mean BDS is behind

Operational guidance:

  • on pruned nodes, local BDS reindex only works for heights the node still retains
  • on archival nodes, local BDS reindex can rebuild the full index directly from RPC
  • if neither local history nor spool is sufficient, use an archival RPC source or import a BDS snapshot from another node

Multi-Node Testing

Local multi-node consensus testing lives in xian-stack localnet:

bash
python3 ./scripts/backend.py localnet-init --nodes 4 --topology integrated --clean
python3 ./scripts/backend.py localnet-up --wait-for-health
python3 ./scripts/backend.py localnet-status
python3 ./scripts/backend.py localnet-down