All right, now that you know my conclusion, let’s see how to get there with data. First, some background.
Let me give very quick overview of Bitcoin in this context. (There are many comprehensive overviews elsewhere.) Bitcoin is an ongoing ledger of transactions of along the lines of “This guy had 5 Bitcoins, and he sent 2 of them to that guy. Now he has 3 Bitcoins.” The transaction ledger is public, which prevents people from sending coins they already sent – the ledger rejects invalid transactions. Everyone on the network has a copy of the ledger. New transactions are appended to the ledger in the form of blocks containing a bunch of transactions. The ledger is formally called the blockchain, because it’s a chain of blocks which contain descriptions of transactions. There’s a bunch of cryptography which prevents people from making transactions involving coins they don’t control, but we don’t care about that for now. What counts here is how to add new blocks to the blockchain.
Anybody can add new blocks, but you don’t want any one person to have full control over this. The process of adding new blocks is called mining because the person who adds a block is currently rewarded with 25 Bitcoins. Each miner puts a valid block together and attempts to append it to the end of the chain. They do this by running a random number generator on their computer which spits out random numbers at a colossal rate, and the first miner whose random number generator spits out an appropriate number is the miner whose block is appended to the chain. Then everybody starts work on the next block, hoping to be the one to win the RNG lottery this time. (The details of this process are a bit involved, but the required output of the RNG is tuned such that a block will be appended about every 10 minutes on average. The RNG involves a cryptographic hash, which is why the number-generation process is called hashing.)
The “selfish miner” attack, proposed by Ittay Eyal and Emin Gun Sirer of Cornell, is a way that a dishonest miner could finesse the protocol and win more blocks than their percentage of the overall hash rate would indicate. In this attack, a miner finds a new block but doesn’t immediately distribute it to the network so everyone can get to work on the next one. Instead, the miner begins work on the block which would follow their unreleased block. If they find the next block in the chain, they keep going. When they see that the rest of the miners have found a block, the selfish miners quickly release all the work they’ve done – which might be several blocks, which means their blocks get added to the chain because in the event of a conflict the longest chain wins. The rest of the miners never had a shot at those previously hidden blocks, so some of their hashing power was wasted. The selfish miners could thus generate blocks disproportionately faster than their percentage of the total hashing power would indicate. If the selfish miners can generate more than one block before the next fair-mined block is found, they always win because they have the longer chain.
Selfish miners can do even better if they can manage to win more of the “ties” where they find the first block but the fair miners post a block before the selfish miners find their second block. If the selfish miners are able to quickly release their single block before the honest miner’s block can propagate through the network, they’ll always be at an advantage relative to the honest miners. This takes some work, because they have to have lots of nodes which can react quicker than honest blocks can propagate. But even if the selfish miners can only get their single blocks accepted half the time, they’ll still come out ahead if they have more than 1/4 of the hashing power. And if fact even if the selfish single blocks never beat the honest blocks in their distribution throughout the network, the selfish miners will still come out ahead if they have more than 1/3 of the total hashing power. A salient question is thus how to detect whether or not such an attack is in progress.
Note that mining is essentially a lottery. The miners generate a ton of random hashes, and when one of them happens to win, that block is valid and may be posted to the blockchain. Each hash is essentially a ticket to a lottery with 1 in a quadrillion odds. The creation of each new block is a new lottery, but the odds are the same and there’s no pause between blocks. Thus, each win is an independent event. The number of independent random events in a given time interval is Poisson distributed, and the time between events in a Poisson process is exponentially distributed.
But dishonest mining relies on quickly publishing as soon as one of those honestly-generated blocks is found. This means the creation of the next block is no longer uncorrelated with the creation of the block before. Block creation would not be independently distributed. Thus a dishonest miner will appear in the statistics of new blocks. If blocks are being generated in clusters, there’ll be a spike in the distribution of time-between-blocks near t = 0, in comparison to the smooth falloff of the exponential distribution. This t = 0 spike would be a characteristic signature of the presence of dishonest mining.
So let’s take a look at the exponential distribution for a rate of 1 per 10 minutes (the nominal block creation rate):
λ is the average rate at which events happen, here 0.1 per minute. Of course with independent events something is just as likely to happen in minute number 35 as it is minute 1, but the exponential distribution measures the probability of the when the first event will happen after you start your timer. This happening in minute 35 means it didn’t happen in any of the previous minutes, which is pretty unlikely, so the exponential distribution trails off as you’d expect. Honest mining should follow this distribution. Dishonest mining involves releases of blocks immediately subsequent to a previous block, so more events will happen in the first minute than the exponential distribution suggests.
To test this against the actual blockchain as it’s created, I wrote a Python script to monitor the timing of blocks as they’re released into the blockchain. I let this script run for 202 blocks (a little over a day), and if you want to review the data yourself here’s the CSV file. (Note the ‘messiness’ paragraph below for caveats.) I have binned out this data into one-minute increments and plotted a histogram. (The technically knowledgeable will note that Bitcoin block rate is not strictly constant because of the growth of the hash rate. It varies by up to about 10% over 2-week intervals, though not nearly by so much over the shorter interval I measured. In this spike-detection context it doesn’t matter terribly much either way.) Here’s the histogram of the actual data, with a bin size of 1 minute:
Now let’s figure out the exponential distribution. To be as clean as possible, I’m converting the actual measured time-between-blocks into a fraction of the average time between blocks. Ie, “2” on the axis means “2 times the average time between blocks over the interval I measured”. I’m overlaying this with an exponential distribution of rate 1, and scaling the whole thing such that the areas under the curve and the histogram are both equal to the total number of blocks in the sample. The bins are 1/5 of the average block time. The result is below:
What do we see? The histogram matches the exponential distribution pretty well. Most crucially, the first bin is not notably higher than the distribution predicts. According to the exponential distribution, some 18.12% of blocks should be found in the first 1/5 of the average block time. In fact, we see that 31 out of 202 blocks, or about 15.3%, are actually found in that period of time. Given the small sample size, we can’t really be that precise with the percentage. Using the binomial confidence interval, we can only say that with 95% confidence something between 11% and 21% blocks are actually being created in that first 1/5 of the average block time. But the expected 18.12% is comfortably within that range. Given that we expect about 18% in a fair mining scenario and we can rule out anything greater than about 21% with pretty high confidence, concerned Bitcoiners can perhaps breathe a little easier.
Some messiness: I calculated the times based on the arrival time of the blocks at my computer, with blocks being received through the blockchain.info websockets API. This assumes that the arrival time of blocks at my computer is the same as the posting of blocks to the chain, which is a possible source of systematic error. However, the interface is said to be low-latency, and provided the latency is significantly lower than 1 minute the effect should be minimal given our bin size. (Timing data is built into the blocks, but it’s wildly inconsistent for reasons which are not clear to me. I have not used it.) Additionally, there are a few cases where I apparently either missed blocks or received duplicates. My programming skill is likely at fault. This only happened a few times, and I rejected any time intervals involving non-consecutive blocks. Finally, the sample size is not terribly large, so this measurement is not terribly sensitive and could potentially miss small-scale selfish mining efforts.
Nonetheless, provided the potential sources of error in this measurement are not causing spurious results, we can thus currently conclude that the timing behavior of newly mined blocks is consistent with a blockchain that is being mined with fair methods.
I encourage interested parties to freely repeat and refine this test in the future to see if the situation changes. If you’re interested enough to want to keep a continuous watch, one easy method would be to keep a running average of the time between the last ~1000 blocks, and see if the number of blocks separated by less than 1/5 of that average exceeds about 225. If it did, it would not be a definite proof of selfish mining, but it would be a >99% statistical anomaly.
[DISCLAIMER: I am not involved with Bitcoin, I don’t own any Bitcoin, and I am not interested in changing either of these things at the moment. Proselytization of the “Bitcoin is awesome!” or “Bitcoin is horrible!” varieties is going to be wasted, so try to avoid it in the comments. But I’m fascinated by the Bitcoin system from a mathematical and computational perspective, and am relatively well-versed in its technical details. Whether it’s a good or bad idea from a practical or economic perspective I leave to the early adopter crowd to find out.]
Is Bitcoin Currently Experiencing a Selfish Miner Attack? by Matthew Springer is licensed under a Creative Commons Attribution 4.0 International License.