Building a block-chain database the right way

We’ve audited several blockchain related projects having an application that polls bitcoind for new blocks and builds a parallel database of blocks and transactions. This is generally done in order to provide search capabilities, link transactions to user accounts, store auxiliary data for higher level protocols, or to provide block-chain intelligence. Almost all Bitcoin based services do it: web wallets, blockchain explorers, and other front-end clients. But almost all do it wrong. They asssume that blocks are added to the block-chain at regular 10 minutes intervals, and that block-chain reorganizations can only extend the length of the best-chain. This is not true, as there are several factors, either probabilistic, game-theoretic, protocol-mandated and network-connectivity dependent that may cause the block-chain to perform sudden changes, such as adding many blocks at once or performing long reorganizations.

Block-chain reorganization race condition

The most common assumption of block-chain cloning applications is that the bitcoind block-chain is “locked” somehow while the polling application makes several requests to identify the changes. The block-chain can asynchronously change its state and reorganize in-between RPC call and hidden race conditions may appear. Some polling algorithms try to detect a change on the tip and then use the tip height to gather more information assuming the block-chain hasn’t changed in-between. They end up building an block-chain which is generally coherent, but erratically may become incoherent.

Block-chain branch choosing criteria

The second most common misconception is that the Bitcoin protocol chooses the block-chain branch with the higher height (or length). This is false. The Bitcoin protocol chooses the block-chain branch with higher accumulated expected difficulty, often called chain-work (but don’t confuse it with the actual work done). So the best chain may (in theory) switch from a first branch to a second branch shorter branch if the accumulated difficulty of the second branch is higher. In practice, since reorganizations of no more than 53 blocks have ever occurred, this can only happen just after a difficulty change boundary, if there are two competing blocks at the boundary such as the time-stamps of the competing boundary block establishes two different expected difficulties for the following block. We call this event a difficulty-split event. The difficulty field (nBits) has a enough precision to distinguish two interval values with a 1 minute temporal difference. The rate of competing blocks (orphan rate) is about 3%, so the expected rate of a difficulty-split event (by chance) is about 1/2016 * 3/100. This is approximately 1 difficulty-split event every 450 days. Not very likely, but it can occur. But if we take into account that a party may maliciously try to generate a difficulty-split event, then the cost of the attack will be approximately equal to the current block reward, with a success probability that depends on the attacker’s total hashing power.

First time block-chain download and database incoherence

Another moment where an attack can take place is during first time of block-chain download: an attacker can easily build branches that reorganize at will during the interval where the difficulty is low, before the first checkpoint. Suppose that an attacker is connected to a bitcoind node that is the source node of a block-chain cloning application, and suppose that the attacker can force the application to rebuild its block-chain database. If the cloning algorithm does not take into account arbitrary asynchronous reorganization, then the attacker will be able to force the application to build a non-coherent block-chain. In a certain application we managed to build a non-coherent block-chain that could not be undone because the application could not travel the block-chain backwards, so the attack was persistent, even if the bitcoind node later synchronized with the network.

Ending block… atomic change?

The most realistic assumption applications take is that a block-chain branch of length N ending with block X cannot instantaneously mutate into a block-chain branch of length N ending with block Y (different from X) but with all the remaining blocks equal. This assumption is currently correct. In fact, to replace a block at height N with another block at the same height would require that the accumulated difficulty of the second block-chain branch is higher, but since the expected difficulty of block N is known at block (N-1), both X and Y have the same difficulty. Nevertheless, this situation can happen in more sophisticated setups:

For example, super-secure setup would be that the cloning software polls different unconnected instances of bitcoind and use a voting system to decide the best chain, in order to avoid targeted attacks to single bitcoind or a single point of failure. Then the voting result may change as if it were a last tip reorganization even if each instance will never perform such a reorganization.

Also there is the possibility that future versions of bitcoind decide which block to add to the best chain in a different way (e.g. by using the GHOST method). And there is the possibility that a node being connected is a modified node.
Suppose that a pool mining server node detects that two client miners solved different blocks at the same height and the second solved block received has a higher reward that the first (because of transaction fees). Then the mining node may switch the tip from the first to the second, for his own benefit, even if the best chain has the same accumulated difficulty.

Human intervention

Last, some cloning applications assume only reorganization at a maximum height will occur, such as only the last 10 blocks. This can be violated if a block-chain forking bug occurs and human intervention needs to resolve the conflict, such at the event that took place in August 2010 or in March 2013. The length of the reorganization in the first case was 53 blocks, and 31 blocks in the second case. But luckily the cloning method can be made generic so it doesn’t need to be bounded in the number of blocks.

Block-Chain cloning Java source code snippet

This is our proposed and simple method to clone the blockchain, which doesn not suffer from race conditions. We assume MyDatabase allows the application to manage the block-chain stored in a database and bitcoindRPC is allows the application to access a bitcoind node using the RPC interface.

        tipBlock = MyDatabase.GetBestChainTipBlock()
        while(1)
          {
            nextblockHash = bitcoindRPC.getBlockHash(tipBlock.getHeight()+1)
            if (nextblockHash != null)
              { // there is a next block
                nextBlock = bitcoindRPC.getBlock(nextblockhash);
                if (nextBlock.getParentHash() != tipBlock.getHash())
                  {
                    previousBlock = tipBlock.getPreviousBlock();
                    previousBlock.setNextBlock(null);
                    MyDatabase.destroy(tipBlock);
                    tipBlock = previousBlock;
                  }
                else
                  {
                    MyDatabase.add(nextBlock);
                    tipBlock.setNextBlock(nextBlock);
                    tipBlock = nextBlock;
                  }
          }
    [*]
    }

This code is both simple and easy to check for correctness. The only drawback is that it will detect one-block reorganizations (which we already said practically never happen) only when the following block is included (a delay of approximately 10 minutes).
To prevent this delay we add a second check of the tip block, inserted in the position marked by [*] in the previous snippet.

      else
        {
          // check if tip block has changes or new best chain is shorter (very rare)
          tipBlockHash = bitcoindRPC.getBlockHash(tipBlock.getHeight())
          if (tipBlockHash != tipBlock.getBlockHash())
            {
              previousBlock = tipBlock.getPreviousBlock();
              previousBlock.setNextBlock(null);
              Mydatabase.destroy(tipBlock);
              tipBlock = previousBlock;
            }
        }