Execution in Libra Core

The execution component utilizes the Move virtual machine to execute a block of transactions. The execution component’s job is to coordinate execution of a block of transactions and maintain a transient state that can be voted upon by consensus. Please see our documentation and share your feedback.

4 Likes

The Libra blockchain whitepaper states the following:

The sparse Merkle tree approach to computing authenticators allows sharding the database across multiple machines (which increases capacity) or processing updates in parallel (which increases throughput)

However, even after reading the documentation I’m failing to see how a versioned database design (with ledger states …, S_{i-1}, S_i, S_{i+1}, … signed by validators) can support parallel execution in the future, especially in a scenario where two updates to be processed in parallel have some implicit dependency (for instance, transaction A sends funds from account 0x12 to 0x34, and transaction B sends funds from account 0x34 to 0x56. but tx B would fail due to insufficient balance unless 0x34 has the funds from tx A).

I get that (from Implementation Details section) we can “batch” together multiple uncommitted transactions, but this does not seem like parallel execution to me, especially since by committing block E we discard conflicting blocks B, C, D (in Overview).

Some clarification would be very helpful. Thanks!

2 Likes

You are right that sometimes transactions have dependencies and they have to be processed sequentially. However in practical it’s reasonable to assume that dependent transactions are not common and most of transactions do not conflict with each other. Therefore it’s possible to collect a batch of n transactions T_{i+1}, T_{i+2}, ..., T_{i+n}, execute them against S_i in parallel. Then each of these transactions can generate a read set and a write set. By looking at the read/write sets, we can check if some transaction reads the output of some previous transaction. If they do not conflict, the write sets we generated are the final writes, otherwise we re-execute them against newer state. We might not be able to fully parallelize everything but it could improve the throughput significantly. Especially for cheap transactions like payments, majority of the execution time is spent in signature verification, fetching data from DB and the parallelization could be very effective. Similarly, once we have all these write sets, we can update the sparse Merkle trees in parallel.

In addition, as a further optimization, if we run a leader-based consensus algorithm, it’s possible for the leader to order/choose transactions in a block proposal to maximize parallelization.

All these are not implemented in current code yet, but we have done some experiment in the past showing that some parallelization can lead to significant throughput improvement.

2 Likes

Can you explain how this check would be performed (what it means to “read the output”)? I was under the impression that a transaction’s parameters are entirely user-provided.
I guess in a parallel execution setting, you can discard failed transactions and potentially include them in the next block to be re-evaluated.

On a more general note, I’m curious why Libra decided not to use multi-leader BFT (perhaps that’s a better question for the Consensus category).

2 Likes

Hi @ShangyanLi, as an example, let’s say the initial state is S_0. The next incoming block has three transactions T_1, T_2, T_3 and we want to execute them. Before executing them we don’t know which accounts they will read and write.

We can execute all three of them in parallel against state S_0. After this is done we have

  • T_1 reads from account A, B, writes to account B.
  • T_2 reads from account A, C, writes to account C.
  • T_3 reads from account C, D, writes to account C, D.

Once we have these it’s easy to see that T_1 and T_2 are good, but the value we read to execute T_3 is not up to date, since the values of account C are different in S_0 and S_2 and T_3 is supposed to be executed against S_2. So we’ll need to execute T_3 again based on S_2. This way we can usually execute a bunch of transactions in parallel and re-execute a few of them if necessary.

On a more general note, I’m curious why Libra decided not to use multi-leader BFT (perhaps that’s a better question for the Consensus category).

Feel free to post in the consensus category :slight_smile:

4 Likes

@wqfish makes sense, the example covers exactly the scenario I had in mind. Thanks for clarifying!

2 Likes

I’d like to better understand the difference between a block and a chunk in the Execution Engine.

It seems like the main difference is that a chunk of transactions comes with quorum cert for a ledger version.

A block of transactions is executed first and then the quorum cert is generated during a commit.

Is this correct?

1 Like

Hi @zmanian. Yeah that’s right! In the normal mode the system always operates on a block of transactions at a time – execute a block then vote for it. Once some block gets enough votes we will commit it.

Occasionally the node might fall behind, for example due to a short network outage. If other nodes have all moved forward to some newer version, this node would need to pull these chunks of transactions. Since other nodes have committed them, it means other nodes have agreed on these transactions and the ledger info object and the signatures that come with the chunk is the proof. This node would just need to execute and commit them as long as the execution result is same as what it gets from other nodes (which should be the case unless there are bugs).

2 Likes