Parallel Block Execution

Xian can speculatively execute block transactions in parallel without changing the canonical result of the block.

The important boundary is this:

  • xian-contracting does not execute multiple contracts concurrently inside a single Python process
  • xian-contracting now owns the native speculative execution controller
  • xian-abci wraps that controller for block processing, reward shaping, and node-facing metrics
  • final accepted state must stay equivalent to normal serial execution in block order

That makes parallel execution a node-side optimization, not a change to consensus semantics.

Where The Boundary Lives

Inside xian-contracting, each in-process execution is guarded by a lock. The runtime mutates process-global Python import hooks and module caches, so it does not attempt to run multiple contract executions concurrently inside one Python process.

Instead, xian-contracting exposes a native speculative controller that runs speculative execution in separate worker processes. In the block path, xian-abci drives that controller with a worker runtime built from ContractingClient, Driver, TxProcessor, and the optional rewards handler, all reading the same committed LMDB state snapshot.

So the model is:

  • one block still has one canonical transaction order
  • workers speculate independently against the last committed state (in later waves, against the updated overlay of accepted results)
  • the main process decides which speculative results are safe to accept

How It Works

At a high level:

  1. CometBFT finalizes the block contents and order.
  2. If parallel execution is enabled and the block is large enough, the native controller in xian-contracting builds a speculative wave and sends those transactions to a worker pool.
  3. Each worker executes its assigned transaction with auto_commit=False, collects access metadata, and returns a proposed result.
  4. The main process checks the speculative results in canonical order and accepts the conflict-free prefix of that wave.
  5. If the tail conflicts with earlier accepted work, the node either respeculates that tail against the updated overlay or falls back to serial execution when the remaining tail is no longer worth speculating.
  6. Accepted speculative results are applied in canonical order.
  7. After the block is complete, the node commits the final block state through the normal LMDB batch write path.
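The steps above can be sketched as a small single-process simulation. Everything in it is hypothetical rather than the xian-contracting API: a transaction is a Python callable over a `View`, the "worker pool" is a list comprehension instead of real processes, `wave_size` and `max_waves` are assumed parameters, "no longer worth speculating" is approximated as "fewer than two transactions left or `max_waves` reached", and only exact reads, exact writes, and prefix reads are checked (no sender or reward-delta rules).

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    reads: set = field(default_factory=set)     # exact keys read from the snapshot
    prefixes: set = field(default_factory=set)  # scanned prefixes such as "orders:"
    writes: dict = field(default_factory=dict)  # buffered writes, never committed here


class View:
    """Hypothetical speculative view: reads a snapshot, records accesses, buffers writes."""

    def __init__(self, base):
        self.base, self.res = base, Result()

    def get(self, key, default=0):
        if key in self.res.writes:  # reading our own buffered write is not a dependency
            return self.res.writes[key]
        self.res.reads.add(key)
        return self.base.get(key, default)

    def set(self, key, value):
        self.res.writes[key] = value

    def scan(self, prefix):
        self.res.prefixes.add(prefix)
        found = {k: v for k, v in self.base.items() if k.startswith(prefix)}
        found.update({k: v for k, v in self.res.writes.items() if k.startswith(prefix)})
        return found


def run(tx, base):
    view = View(base)
    tx(view)
    return view.res


def unsafe(res, reads, writes):
    """Conservative check against results already accepted earlier in this wave."""
    return bool(
        res.reads & writes
        or res.writes.keys() & (reads | writes)
        or any(k.startswith(p) for p in res.prefixes for k in writes)
    )


def execute_block(txs, committed, wave_size=4, max_waves=4):
    overlay = {}  # accepted writes, applied in canonical order
    stats = {"accepted": 0, "serial": 0, "waves": 0}
    i = 0
    while i < len(txs):
        if stats["waves"] >= max_waves or len(txs) - i < 2:
            for tx in txs[i:]:  # serial path for the remaining tail
                overlay.update(run(tx, {**committed, **overlay}).writes)
                stats["serial"] += 1
            break
        stats["waves"] += 1
        base = {**committed, **overlay}  # the whole wave speculates on one snapshot
        results = [run(tx, base) for tx in txs[i:i + wave_size]]  # stand-in for workers
        reads, writes = set(), set()
        for res in results:  # accept the conflict-free prefix in block order
            if unsafe(res, reads, writes):
                break  # this transaction and the rest of the wave are respeculated
            overlay.update(res.writes)
            reads.update(res.reads)
            writes.update(res.writes)
            stats["accepted"] += 1
            i += 1
    return {**committed, **overlay}, stats


def serial(txs, committed):
    state = dict(committed)
    for tx in txs:
        state.update(run(tx, state).writes)
    return state


def increment(v):  # every transaction touches one hot key
    v.set("counter:hot", v.get("counter:hot") + 1)


block = [increment] * 4
state, stats = execute_block(block, {})
assert state == serial(block, {})
print(state, stats)
# {'counter:hot': 4} {'accepted': 3, 'serial': 1, 'waves': 3}
```

The first transaction of every wave is always accepted, because its snapshot already contains everything accepted before it, so the loop always makes progress. On a hot key each wave accepts exactly one transaction, which is why contended workloads pay for speculation without gaining parallelism.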

The critical point is that speculation happens first, but acceptance still happens in normal block order.

Worked Example

Consider a canonical block ordered like this:

  1. tx1: currency.transfer(alice -> bob, 10)
  2. tx2: currency.transfer(carol -> dave, 20)
  3. tx3: orders.place(id=1) writing orders:1
  4. tx4: orders.summary() scanning orders: with Hash.all()

The controller can speculate all four in one wave, because nothing known before execution rules any of them out.

The worker results might look like this:

  • tx1 reads and writes Alice and Bob balance keys
  • tx2 reads and writes Carol and Dave balance keys
  • tx3 writes orders:1
  • tx4 records a prefix read on orders:

Acceptance still walks the block in order:

  • tx1 is accepted
  • tx2 is accepted
  • tx3 is accepted
  • tx4 is rejected from that wave because its prefix read overlaps with the earlier accepted write to orders:1

At that point, the node does not reorder the block. It keeps tx1, tx2, and tx3 exactly where they are, then handles the tail:

  • it can respeculate tx4 against the updated overlay in a later wave
  • or it can execute tx4 serially if the remaining tail is no longer worth speculating

Either way, the final result must still match normal serial execution of tx1 -> tx2 -> tx3 -> tx4.
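The acceptance walk in this example can be replayed from the recorded metadata alone. In this sketch `Access` and `split_wave` are illustrative names, the `currency.balances:<name>` key format is an assumption, and only the exact-read, exact-write, and prefix-read rules are modeled.

```python
from typing import NamedTuple


class Access(NamedTuple):
    """Access metadata a worker might report for one transaction (illustrative)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    prefixes: frozenset = frozenset()


def split_wave(wave):
    """Return (accepted prefix, tail) without ever reordering the wave."""
    reads, writes = set(), set()
    for i, tx in enumerate(wave):
        if (
            tx.reads & writes  # read a key that was written earlier
            or tx.writes & (reads | writes)  # write a key read or written earlier
            or any(k.startswith(p) for p in tx.prefixes for k in writes)  # scan overlap
        ):
            return wave[:i], wave[i:]
        reads |= tx.reads
        writes |= tx.writes
    return wave, []


def bal(who):
    return f"currency.balances:{who}"


wave = [
    Access("tx1", reads=frozenset({bal("alice"), bal("bob")}),
           writes=frozenset({bal("alice"), bal("bob")})),
    Access("tx2", reads=frozenset({bal("carol"), bal("dave")}),
           writes=frozenset({bal("carol"), bal("dave")})),
    Access("tx3", writes=frozenset({"orders:1"})),
    Access("tx4", prefixes=frozenset({"orders:"})),
]
accepted, tail = split_wave(wave)
print([t.name for t in accepted], [t.name for t in tail])
# ['tx1', 'tx2', 'tx3'] ['tx4']
```

The tail is returned as-is, in canonical order, so the caller can respeculate it or run it serially without changing where any transaction sits in the block.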

What Metadata Is Tracked

The safety model depends on deterministic access tracking.

For each transaction, the runtime records:

  • exact reads: keys loaded through normal state reads
  • exact writes: keys the transaction wants to set
  • prefix reads: collection scans such as Hash.all() that depend on every key under a prefix
  • additive writes: special commutative reward deltas tracked separately from normal writes

This metadata comes from the runtime/storage layer:

  • Driver.get(...) tracks exact reads
  • collection scans record prefix reads on the scanned prefix
  • successful execution returns the transaction write set
  • reward outputs are separated into additive deltas so they can be merged safely when they are purely incremental
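A tracking layer that produces these four kinds of metadata can be sketched over a plain dict snapshot. `TrackingState`, `AccessRecord`, `add_reward`, and `speculate` are hypothetical names, not the `Driver` API, and discarding every write and delta when a transaction raises is a simplifying assumption for illustration.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class AccessRecord:
    reads: set = field(default_factory=set)         # exact reads
    writes: dict = field(default_factory=dict)      # exact writes (key -> value)
    prefix_reads: set = field(default_factory=set)  # collection scans
    additive: dict = field(default_factory=dict)    # commutative reward deltas


class TrackingState:
    """Hypothetical read layer over a committed snapshot that records every access."""

    def __init__(self, snapshot):
        self._snapshot = snapshot
        self.record = AccessRecord()

    def get(self, key):
        self.record.reads.add(key)
        return self.record.writes.get(key, self._snapshot.get(key))

    def set(self, key, value):
        self.record.writes[key] = value

    def scan(self, prefix):
        self.record.prefix_reads.add(prefix)  # depends on every key under the prefix
        keys = set(self._snapshot) | set(self.record.writes)
        return sorted(k for k in keys if k.startswith(prefix))

    def add_reward(self, key, delta):
        # Kept apart from normal writes so deltas can later be merged by addition.
        self.record.additive[key] = self.record.additive.get(key, Decimal(0)) + delta


def speculate(tx, snapshot):
    """Run tx on a tracking layer and return (succeeded, record)."""
    state = TrackingState(snapshot)
    try:
        tx(state)
    except Exception:
        # Assumption: a failed transaction contributes no writes or deltas.
        state.record.writes.clear()
        state.record.additive.clear()
        return False, state.record
    return True, state.record


snapshot = {"currency.balances:alice": Decimal(100), "orders:7": "open"}


def place_and_pay(s):
    s.set("currency.balances:alice", s.get("currency.balances:alice") - 5)
    s.set("orders:8", "open")
    s.scan("orders:")
    s.add_reward("currency.balances:node1", Decimal("0.1"))


ok, rec = speculate(place_and_pay, snapshot)
print(ok, sorted(rec.reads), sorted(rec.writes), rec.prefix_reads, rec.additive)
```

The scan records the prefix itself rather than the keys it happened to find, which is what lets the acceptance check catch a later-accepted write that would have changed the scan result.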

When A Speculative Result Is Rejected

The main process rejects a speculative result when it is no longer safe relative to earlier accepted transactions. The rejected transaction is then respeculated in a later wave or executed serially.

Current rejection conditions include:

  • the same sender already appeared earlier in the accepted block path
  • a key this transaction read was written earlier
  • a key this transaction writes was read or written earlier
  • a tracked prefix scan overlaps with earlier accepted writes
  • a normal write overlaps with earlier additive writes
  • an additive write overlaps with earlier normal writes
  • the worker failed to return a usable speculative result

This is why parallel execution is described as speculative rather than concurrent mutation.

Prefiltered vs Fallback

The runtime exposes two different operator-facing counters because they mean different things:

  • serial_prefiltered: the controller chose not to speculate a remaining head transaction at all, usually because there were no longer at least two safe front-of-queue candidates worth putting into a wave
  • serial_fallbacks: the transaction was part of speculative handling, but the accepted-prefix checks or a worker failure forced it back onto the serial path

In practice, same-sender reuse near the front of the remaining queue often shows up as prefiltering, while read-after-write tails, write conflicts, and prefix-scan conflicts are more likely to show up as speculative fallback or later-wave respeculation.
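The split between the two counters can be illustrated with only the planning half of a scheduler. This is a hypothetical policy, not the controller's actual one: a wave takes up to `max_wave` front-of-queue transactions and stops at the first repeated sender, and if fewer than two candidates remain, the head transaction is counted as `serial_prefiltered`. The `speculated` counter is an illustrative name, and `serial_fallbacks` never appears here because it only arises after speculative results are checked.

```python
from collections import Counter


def plan_wave(senders, max_wave=4):
    """Front-of-queue candidates for one wave, stopping at the first repeated sender."""
    seen, wave = set(), []
    for sender in senders[:max_wave]:
        if sender in seen:
            break
        seen.add(sender)
        wave.append(sender)
    return wave


def schedule(senders, max_wave=4):
    """Count how the remaining queue would be handled before any execution."""
    metrics = Counter()
    i = 0
    while i < len(senders):
        wave = plan_wave(senders[i:], max_wave)
        if len(wave) < 2:
            metrics["serial_prefiltered"] += 1  # head runs serially, never speculated
            i += 1
        else:
            # Optimistically assume the wave is accepted; serial_fallbacks and
            # respeculation would be decided after execution, not here.
            metrics["speculated"] += len(wave)
            i += len(wave)
    return metrics


print(schedule(["alice"] * 128))
print(schedule(["alice", "bob", "carol", "dave", "erin"]))
```

Under this policy a queue of 128 transactions from one sender is prefiltered in full, and a lone transaction left at the tail is prefiltered because it cannot form a wave of two.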

Why It Is Real Parallelism

This is real parallel execution, but not unsafe shared-state concurrency.

  • multiple transactions can execute at the same time in separate worker processes
  • one in-process Executor still executes one transaction at a time
  • final acceptance stays serial-equivalent

So Xian now uses multiple CPU cores for speculative contract execution, while keeping the correctness model tied to canonical block order.

The Reward-Delta Exception

Normal shared writes are treated conservatively as conflicts.

The current explicit exception is reward accounting. Reward outputs are modeled as additive deltas, not ordinary overwrites. Two transactions can both add to the same recipient balance and still be accepted speculatively because the merge operation is deterministic addition.

But if another transaction reads that balance, or directly overwrites it, the executor falls back to serial execution.
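Why additive deltas can be merged while overwrites cannot is easiest to see in a few lines. `merge_reward_deltas` is a hypothetical helper, and using `decimal.Decimal` is an assumption chosen so that summing in any order gives exactly the same value; with binary floats, addition is not associative, so summing many deltas in different orders could differ in the last bits.

```python
from collections import defaultdict
from decimal import Decimal


def merge_reward_deltas(base, deltas_per_tx):
    """Apply accepted reward deltas by addition; transaction order does not matter."""
    totals = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for deltas in deltas_per_tx:
        for key, amount in deltas.items():
            totals[key] += amount
    merged = dict(base)  # leave the input snapshot untouched
    for key, amount in totals.items():
        merged[key] = merged.get(key, Decimal(0)) + amount
    return merged


base = {"currency.balances:node1": Decimal("10")}
tx_a = {"currency.balances:node1": Decimal("0.25")}
tx_b = {"currency.balances:node1": Decimal("0.75"), "currency.balances:dao": Decimal("0.1")}

assert merge_reward_deltas(base, [tx_a, tx_b]) == merge_reward_deltas(base, [tx_b, tx_a])
print(merge_reward_deltas(base, [tx_a, tx_b]))
```

By contrast, two plain overwrites of the same key (one setting 10.25, the other 10.75) produce a different final value depending on which runs last, and a transaction that reads the balance sees a different value depending on which deltas came before it. Both cases therefore stay conflicts.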

Why This Is Safe

This design is consensus-safe because it preserves serial semantics:

  • canonical block order never changes
  • validators with parallel execution enabled and validators with it disabled still converge on the same final block result
  • speculative workers do not commit their writes to disk
  • accepted speculative results are revalidated against earlier accepted writes
  • conflicting transactions are re-executed against the latest accepted state, either in a later speculative wave or serially
  • if the speculative executor itself fails, the node falls back to ordinary serial block execution

In other words, the node is allowed to guess in parallel, but it is only allowed to commit what is still correct in serial order.

Representative Throughput

The runtime now includes a dedicated benchmark harness in xian-contracting/tests/performance/benchmark_parallel_tps.py.

Representative local numbers from June 1, 2026 were collected on an Apple M1 development machine with 8 logical CPUs, Python 3.14.5, 4 parallel workers, 5 iterations per scenario, and warmed workers, using the default guardrails: 4 speculative waves, a 25% minimum wave acceptance rate, and a low-acceptance minimum wave size of 8.

Command:

```bash
uv run python tests/performance/benchmark_parallel_tps.py --scenario all --iterations 5 --workers 4 --markdown
```
| Workload | Tx | Rounds | Serial TPS | Parallel TPS | Speedup | Speculative outcome |
|---|---|---|---|---|---|---|
| Independent CPU-heavy writes | 256 | 50,000 | 34 | 120 | 3.55x | 256 / 256 accepted |
| Independent light writes | 256 | 1,000 | 300 | 938 | 3.14x | 256 / 256 accepted |
| Small independent block | 8 | 10,000 | 121 | 432 | 3.63x | 8 / 8 accepted |
| Same sender, independent keys | 128 | 10,000 | 117 | 116 | 1.01x | 128 prefiltered to serial |
| Hot counter contention | 128 | 10,000 | 126 | 97 | 0.78x | 1 accepted, 127 guardrail fallbacks |
| Many writes, one scan at tail | 128 | 10,000 | 126 | 435 | 3.44x | 127 accepted, 1 prefiltered |
| Alternating writes and scans | 64 | 5,000 | 187 | 148 | 0.79x | 1 accepted, 63 guardrail fallbacks |
| Mostly independent, periodic hot counter | 128 | 10,000 | 125 | 103 | 0.82x | 9 accepted, 119 guardrail fallbacks |

These numbers are execution-engine throughput, not full end-to-end network TPS. They compare the serial and speculative execution paths after worker warmup; they should not be quoted as a guaranteed validator TPS figure.

The result is workload-dependent:

  • parallel execution is clearly useful when a block contains many transactions from different senders that touch disjoint state
  • it still helps when broad reads such as Hash.all() are pushed to the tail of a block, because the independent prefix can be accepted first
  • same-sender runs naturally collapse back to serial execution and are roughly neutral
  • hot shared state and alternating scan/write patterns are worse than serial execution, even with guardrails, because the node pays for failed speculation before serializing the tail

That makes speculative parallel execution worth keeping as a configurable node-side feature, especially for high-throughput workloads with independent accounts or sharded contract state. It should remain rollout-managed rather than blindly enabled for every validator profile. Operators should monitor speculative_accepted, serial_prefiltered, serial_fallbacks, and guardrail_fallbacks; if fallback-heavy blocks dominate, serial execution is the better posture for that workload.
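A simple way to act on those counters can be sketched as a rolling check over per-block metrics. Everything here is an assumption: metrics arriving as one dict per block, "fallback-heavy" meaning that `serial_fallbacks` plus `guardrail_fallbacks` outnumber `speculative_accepted`, and the 50% share threshold. The example counts mirror two rows of the benchmark table.

```python
def fallback_heavy(block_metrics):
    """Hypothetical rule: fallbacks outnumber accepted speculative results."""
    fallbacks = (block_metrics.get("serial_fallbacks", 0)
                 + block_metrics.get("guardrail_fallbacks", 0))
    return fallbacks > block_metrics.get("speculative_accepted", 0)


def recommended_posture(blocks, max_heavy_share=0.5):
    """Suggest 'serial' when fallback-heavy blocks dominate the observed window."""
    if not blocks:
        raise ValueError("need at least one block of metrics")
    heavy = sum(fallback_heavy(b) for b in blocks)
    return "serial" if heavy / len(blocks) > max_heavy_share else "parallel"


hot = {"speculative_accepted": 1, "guardrail_fallbacks": 127}
independent = {"speculative_accepted": 256}
print(recommended_posture([hot, hot, hot, independent]))     # serial
print(recommended_posture([independent, independent, hot]))  # parallel
```

`serial_prefiltered` is deliberately left out of the rule, since prefiltered transactions never paid for a failed speculation.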

One important measurement boundary: this benchmark uses custom contract functions, so the node's conservative access estimator cannot recognize most of their access shapes ahead of time. Built-in known shapes, such as common token transfer and approve patterns, can be prefiltered or staged earlier by the ABCI wrapper. Unknown contract shapes are still safe because the runtime records actual reads, writes, and prefix reads before accepting speculative results.
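A conservative estimator for known shapes might look like the sketch below. `KNOWN_TOKENS`, `estimate_keys`, and `may_share_wave` are hypothetical, and the key formats (`currency.balances:<sender>` for balances and `currency.balances:<sender>:<to>` for approvals) are assumptions rather than confirmed contract layouts. The point is the three-valued answer: a known shape yields a yes or no up front, while an unknown shape yields `None` and is left to the runtime's recorded access checks.

```python
KNOWN_TOKENS = {"currency"}  # hypothetical allow-list of contracts with known shapes


def estimate_keys(tx):
    """Conservative predicted touched keys for a known shape, or None if unknown."""
    if tx["contract"] not in KNOWN_TOKENS:
        return None
    c, sender, kw = tx["contract"], tx["sender"], tx["kwargs"]
    if tx["function"] == "transfer":
        return {f"{c}.balances:{sender}", f"{c}.balances:{kw['to']}"}
    if tx["function"] == "approve":
        return {f"{c}.balances:{sender}:{kw['to']}"}
    return None


def may_share_wave(a, b):
    """True/False from predicted keys; None means 'unknown, rely on runtime tracking'."""
    ka, kb = estimate_keys(a), estimate_keys(b)
    if ka is None or kb is None:
        return None
    return a["sender"] != b["sender"] and not (ka & kb)


def make_tx(sender, function, contract="currency", **kwargs):
    return {"contract": contract, "function": function, "sender": sender, "kwargs": kwargs}


t1 = make_tx("alice", "transfer", to="bob", amount=10)
t2 = make_tx("carol", "transfer", to="dave", amount=20)
t3 = make_tx("erin", "transfer", to="alice", amount=1)
t4 = make_tx("frank", "place", contract="orders", id=1)
print(may_share_wave(t1, t2), may_share_wave(t1, t3), may_share_wave(t1, t4))
# True False None
```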

What It Does Not Do

Parallel block execution does not:

  • make xian-contracting multithreaded in-process
  • allow naive concurrent mutation of shared state
  • weaken deterministic execution requirements
  • let validators accept different speculative winners

All validators still have to end the block with the same final state and the same app_hash.