Timber

Cutting Merkle Tree gas costs

Michael Connor
12 min readDec 12, 2019

--

Quick Learn

If you’re in a rush…
Here’s the repo:

Here’s a GIF of the ‘old’ on-chain Merkle Tree design (notice how much was being stored on-chain):

Here’s a GIF of the ‘new’ on-chain Merkle Tree design (a.k.a. Timber):

See how little on-chain storage there is now? Only the frontier and the root get stored.

If you’re in less of a rush, here’s a slowed down GIF of Timber:

I love GIFs.

If you’re not in a rush at all, I explain Timber below…

Motivation

I’ve been part of the team working on Nightfall for over a year now. It’s a set of tools for private transactions over Ethereum. We keep trying to reduce the cost of a private transfer (whilst preserving security and privacy), and this article explains one little development.

Keeping gas costs down is important when building anything on Ethereum. With private transaction protocols, the costliest of cost burdens are:

  • Elliptic curve operations
  • Storage
  • Hashing

We’re not going to talk about reducing the use of elliptic curve operations here. Nor will we talk about hashing costs². Timber tackles the less painful storage problem.

² Timber is agnostic to the hashing function which gets used within the Merkle Tree, although the analysis later in this article uses sha256 (because it’s cheap).

Why are Merkle Trees used?

Many private transaction protocols use Merkle Trees. Tokenised assets can be ‘wrapped’ in cryptographic commitments, and each of these commitments can be stored as a new leaf in the tree. We use a Merkle Tree structure so that we can ‘refer’ to a commitment at some point in the future; but without revealing to the world which commitment we’re actually referring to. I.e. we can spend a token privately, without revealing the token we’re spending. (Have a read around ‘zero knowledge set membership proofs’ if you’re unfamiliar with this).

Every time a new leaf (commitment) gets added to the tree, we ask the smart contract to recalculate the root of the tree, by hashing all the way up the ‘path’ from the leaf to the root.

What was the storage problem?

Before Timber, we previously stored all the nodes of the Merkle Tree within a smart contract.

The problem was (and is), that the Merkle Trees we use are MASSIVE. They contain 2³² leaves. That’s 4,294,967,296 leaves. That’s 8,589,934,591 tree nodes altogether!

Why so many leaves? Because more leaves means greater anonymity. It’s harder for outside observers to deduce which leaf is being referred to by a transactor, if there is a pool containing millions of leaves. I.e. it makes it harder to figure out what ‘private token’ commitment is being spent. If we were to reduce the number of leaves (a.k.a. the ‘Anonymity Set’), it might make it easier for analytics software to make inferences about what’s being transacted over the private transaction protocol. Users should be wary of small anonymity sets.

When we were previously storing every node of the Merkle Tree in the smart contract, we were storing between 1 and 33 new values on-chain every transaction (because the tree has height 32). Even in the cheapest case, we were overwriting 32 storage values. That meant users would have been paying between 180,000 gas ( = 32 nodes * 5,000 gas + 20,000 gas) and 660,000 gas ( = 33 nodes * 20,000 gas) every transaction, just in Merkle Tree storage (that’s between $0.72 and $2.64 assuming 200USD/ETH and 20Gwei/gas).

Not only was this storage an expensive component of a single transaction, its large variance wasn’t very nice. A transaction could have cost an extra couple of dollars just because of the ‘bad timing’ of a leaf’s position in the tree.

Timber provides a method for reducing unnecessary Merkle Tree storage costs each transaction. It also results in much less price volatility per transaction. Read below if you’re interested in the details.

Contents

  • Objectives
  • Summary
  • Technical Details
  • In the repo…
  • API
  • Gas Costs
  • What next?

Objectives

  1. Keep track of a Merkle Tree’s root on-chain;
  2. Minimise the gas cost of adding leaves to the Tree;
  3. Minimise the gas cost of updating the root;
  4. Ensure data availability;

Summary

Leaves are submitted to a MerkleTree smart-contract by users. Submitting multiple leaves in-bulk results in considerable gas cost savings per leaf.

Only the root and a small frontier of nodes is stored on-chain. New leaves are not stored on-chain; they’re emitted as events.

A local merkle-tree database (off-chain) is populated with the leaves and nodes of the Tree, based on NewLeaf events emitted by the smart-contract.

The database can then be queried, e.g. for sibling-paths in order to provide set-membership proofs to the smart-contract.

Disclaimer: Note that the code has not yet completed a security review and therefore we strongly recommend that you do not use it in production. We take no responsibility for any loss you may incur through the use of the code.

Technical Details

On-chain

The leaves of the tree are not stored on-chain; they’re emitted as events.

The intermediate-nodes of the tree (between the leaves and the root) are not stored on-chain.

Only a frontier array is stored on-chain (see the detailed explanation below).

Off-chain

We filter the blockchain for NewLeaf event emissions, which contain:

NewLeafEvent: {
leafIndex: 1234,
leafValue: ‘0xacb5678’,
root: ‘0xdef9012’
}

We then insert each leaf into a local mongodb database.

With this database, we can reconstruct the entire merkle tree.

Technical details

We consistently use the following indexing throughout the codebase:

We start with an empty frontier = [ , , , , ].

The frontier will represent the right-most fixed nodeValues at each level of the tree. So frontier[0] will be the right-most fixed nodeValue at level 0, and so on up the tree. By ‘fixed nodeValue’, we mean that the nodeValue will never again change; it is permanently fixed regardless of future leaf appends.

Inserting leaf 0

A user submits the 0th leaf (leafIndex = 0) to the MerkleTree smart contract.

We add it to leafIndex = 0 (nodeIndex = 15) in the contract’s local stack (but not to persistent storage, because we can more cheaply emit this leaf’s data as an event).

Let’s provide a visualisation:

We use the unusual notation 15.0 to mean “the nodeValue from the 0th update of nodeIndex 15”.

We now need to hash up the merkle tree to update the root. In order to do this, we need the nodeValues of the sibling-path of leafIndex = 0.

The 0th leaf is an easy case where the sibling-path nodes are always to the right of the leaf’s path:

Hashing up the tree is easy in this case; if a sibling-node is to the right, then it must never have been updated before, and hence must have nodeValue 0.

So it’s easy to update the tree:

We will only use the frontier to inject sibling-nodes which are to the left of a leaf’s path. More on that later.

Our updated tree can be visualised like this:

By 7.0, we mean “the nodeValue from the 0th update of nodeIndex 7”, etc.

Note that nothing has yet been stored in persistent storage on-chain.

Notice now, that the nodeValue 15.0 will never change in future. Notice also that when we come to insert a new leaf to leafIndex = 1, its sister-path will include nodeValue 15.0 on its left. Therefore, when we come to update the root to include the new leaf, we will need to left-inject nodeValue 15.0 into our hashing computation.

Now hopefully the purpose of the frontier starts to become clear. We will add nodeValue 15.0 to frontier[0] (persistent storage), so that we can later left-inject it into our hashing computation when we come to insert leafIndex = 1.

The smart contract emits the leaf value as an event NewLeaf, which is then picked-up by Timber's event listener and added to the mongodb.

That completes the insertion of leaf 0.

Inserting leaf 1

Let's add some more leaves (always appending them from left to right):

A user submits the 1th leaf (leafIndex = 1) to the MerkleTree smart contract.

We add it to leafIndex = 1 (nodeIndex = 16) in the contract's local stack (but not to persistent storage, because we can more cheaply emit this leaf's data as an event).

Let's provide a visualisation:

Note, we haven't yet recalculated the path from leafIndex = 1 (nodeValue = 16.0) to the root.

The above visualisation is a bit misleading, because most of this data wasn't stored in persistent storage. In actual fact all the smart contract can draw upon at this time is:

// Data known by the smart contract:frontier = [ 15.0,   ,   ,   ,   ];

In order to insert nodeValue 16.0 into the tree and update its path, we will need the nodeValues of the sibling-path of leafIndex = 1, and we will also need to know whether those sibling-nodes are on the left or the right of the path up the tree:

We can actually deduce the 'left-ness' or 'right-ness' of a leaf's path up the tree from the binary representation the leaf's leafIndex:

Notice how for `leafIndex = 1 = 0b0001` the path is on the right, left, left, left as we work up the tree? (Associate a binary `1` with `right` and a binary `0` with `left`, and you'll see a pattern for the 'left-ness' or 'right-ness' of the path up the tree from a particular leafIndex):

Now we can hash up the tree, by injecting the sister-path to the opposing 'left, right, right, right' positions (as indicated by the arrows below):

We can visualise the tree after updating the path (but remember the smart contract isn't actually storing anything except the frontier!):

Now we've udpated the tree, how do we decide which nodeValue to add to the frontier?

We use the following algorithm to decide which index (or storage 'slot') of the frontier to update:

After adding the 1st leaf and updating the root, the smart contract now has stored:

That's very lightweight in terms of storage costs!

Future inserts

We can project forward and show how the frontier will progress as new leaves are added:

In the repo…

./docker-compose.yml ← ‘startup’ configuration
|
./merkle-tree/ ← the ‘main’ microservice of this repo
|
src/
|
db/ ← services for managing the merkle-tree mongodb
|
routes/ ← API routes for externally accessing this
microservice & db
|
filter-controller.js ← Ethereum event filter
|
merkle-tree-controller.js ← merkle-tree update &
calculation controller
|
./deployer/ ← example ‘stub’ microservice, demonstrating how one
would use ‘timber’ as part of a Dapp
|
contracts/ ← example MerkleTree.sol contract for efficient
merkle-tree updates

Example Usage:

Add leaves to the tree:

Submit a leaf / leaves to the MerkleTree.sol smart contract. (We provide test scripts to simulate this).

Interact with the merkle-tree microservice through its API. A postman collection (for local testing) is provided in the repo.

Start the event filter:

Send a start request to start the merkle-tree’s event filters for NewLeaf and NewLeaves events. Any ‘new leaf’ events will be picked up by the filters, and cause the new leaf data to be inserted into the mongodb.

Update the merkle-tree database:

Send an update request to the merkle-tree microservice. Given the leaves now stored in the mongodb, this update command will recalculate all of the intermediate nodes from the leaves to the root, and store all of these new or updated nodes in the mongodb.

Get information from the merkle-tree database:

Send a GET request for (e.g.) the siblingPath for a particular leaf.

API

At the time of writing, Timber has the following route functionalities:

insertLeaf
getLeafByLeafIndex
getLeafByValue
insertLeaves
getLeaves (by indices, a range of indices, or all)
checkLeaves (to check the integrity of the db)
countLeaves

insertNode
getNodeByNodeIndex
getNodeByValue
updateNodeByNodeIndex
getNodes
updateNodes
countNodes
getRoot

getMetadata
getContractAddress
insertContractAddress
getContractInterface
insertContractInterface
getLatestLeaf
updateLatestLeaf
getLatestRecalculation
updateLatestRecalculation

startEventFilter
update
getSiblingPathByLeafIndex
getPathByLeafIndex

Gas Costs

The following gives gas cost measurements for inserting leaves into the MerkleTree.sol contract. 2^n

insertLeaf

  • treeHeight = 32
  • 32 sha256() hashes. We use assembly to minimise the cost of calling the sha256 precompiled contract.
  • 1,024 leaves inserted, one-at-a-time, from left to right.
  • Notice how there is a jump in cost every 2**n leaves, when a new level of the frontier is written to for the first time. We could avoid these jumps by initialising the frontier at deployment, through the contract’s constructor() .
  • The very first transaction costs the most, due to initialising of the leafCount and latestRoot parameters.
  • Gas values shown include the 21,000 gas transaction cost.

insertLeaves

  • treeHeight = 32
  • We explore the cost of inserting leaves in batches of varying sizes (doubling the batch size each time).
  • We insert each batch into an empty tree each time.
  • Gas values shown include the 21,000 gas transaction cost each time.
  • The gas cost per leaf reduces asymptotically with batch size.
  • Batches of 128 leaves and over appear to start levelling out around 10,000 gas per leaf.

We also explore the cost of inserting a fixed total of 512 leaves, but in batches of varying sizes (doubling the batch size each time).
I.e.:

  • 512 transactions of batches of 1 leaf
  • 256 transactions of batches of 2 leaves
  • 64 transactions of batches of 4 leaves
  • 1 transaction of a batch of 512 leaves
  • We begin each set of transactions with an empty tree each time.
  • Gas values shown have been adjusted to exclude the 21,000 gas transaction cost each time. This helps hone in on the ‘internal contract’ gas costs, but massively understates the costs of multiple transactions of small batches (e.g. we’re understating the cost of “512 transactions of batches of 1” by over 10MGas in the below graph).
  • The gas cost for inserting a given number of leaves reduces asymptotically as batch size increases.
  • Beyond batches of 128 leaves, the gas savings begin to level out.

What next?

You might ask “Why don’t we take the Merkle Tree off-chain entirely, and just provide a zero knowledge proof that the root has been updated correctly?”.

We could. But there are some compromises. I might discuss these compromises in a future article…

--

--

Michael Connor

Blockchain, Cryptography & Mathematics | Applied Cryptographer @ Aztec | Any opinions are my own.