Built on Facts

Probably not.

All right, now that you know my conclusion, let’s see how to get there with data. First, some background.

Let me give very quick overview of Bitcoin in this context. (There are many comprehensive overviews elsewhere.) Bitcoin is an ongoing ledger of transactions of along the lines of “This guy had 5 Bitcoins, and he sent 2 of them to that guy. Now he has 3 Bitcoins.” The transaction ledger is public, which prevents people from sending coins they already sent – the ledger rejects invalid transactions. Everyone on the network has a copy of the ledger. New transactions are appended to the ledger in the form of blocks containing a bunch of transactions. The ledger is formally called the blockchain, because it’s a chain of blocks which contain descriptions of transactions. There’s a bunch of cryptography which prevents people from making transactions involving coins they don’t control, but we don’t care about that for now. What counts here is how to add new blocks to the blockchain.

Anybody can add new blocks, but you don’t want any one person to have full control over this. The process of adding new blocks is called mining because the person who adds a block is currently rewarded with 25 Bitcoins. Each miner puts a valid block together and attempts to append it to the end of the chain. They do this by running a random number generator on their computer which spits out random numbers at a colossal rate, and the first miner whose random number generator spits out an appropriate number is the miner whose block is appended to the chain. Then everybody starts work on the next block, hoping to be the one to win the RNG lottery this time. (The details of this process are a bit involved, but the required output of the RNG is tuned such that a block will be appended about every 10 minutes on average. The RNG involves a cryptographic hash, which is why the number-generation process is called hashing.)

The “selfish miner” attack, proposed by Ittay Eyal and Emin Gun Sirer of Cornell, is a way that a dishonest miner could finesse the protocol and win more blocks than their percentage of the overall hash rate would indicate. In this attack, a miner finds a new block but doesn’t immediately distribute it to the network so everyone can get to work on the next one. Instead, the miner begins work on the block which would follow their unreleased block. If they find the next block in the chain, they keep going. When they see that the rest of the miners have found a block, the selfish miners quickly release all the work they’ve done – which might be several blocks, which means their blocks get added to the chain because in the event of a conflict the longest chain wins. The rest of the miners never had a shot at those previously hidden blocks, so some of their hashing power was wasted. The selfish miners could thus generate blocks disproportionately faster than their percentage of the total hashing power would indicate. If the selfish miners can generate more than one block before the next fair-mined block is found, they always win because they have the longer chain.

Selfish miners can do even better if they can manage to win more of the “ties” where they find the first block but the fair miners post a block before the selfish miners find their second block. If the selfish miners are able to quickly release their single block before the honest miner’s block can propagate through the network, they’ll always be at an advantage relative to the honest miners. This takes some work, because they have to have lots of nodes which can react quicker than honest blocks can propagate. But even if the selfish miners can only get their single blocks accepted half the time, they’ll still come out ahead if they have more than 1/4 of the hashing power. And if fact even if the selfish single blocks never beat the honest blocks in their distribution throughout the network, the selfish miners will still come out ahead if they have more than 1/3 of the total hashing power. A salient question is thus how to detect whether or not such an attack is in progress.

Note that mining is essentially a lottery. The miners generate a ton of random hashes, and when one of them happens to win, that block is valid and may be posted to the blockchain. Each hash is essentially a ticket to a lottery with 1 in a quadrillion odds. The creation of each new block is a new lottery, but the odds are the same and there’s no pause between blocks. Thus, each win is an independent event. The number of independent random events in a given time interval is Poisson distributed, and the time between events in a Poisson process is exponentially distributed.

But dishonest mining relies on quickly publishing as soon as one of those honestly-generated blocks is found. This means the creation of the next block is no longer uncorrelated with the creation of the block before. Block creation would not be independently distributed. Thus a dishonest miner will appear in the statistics of new blocks. If blocks are being generated in clusters, there’ll be a spike in the distribution of time-between-blocks near t = 0, in comparison to the smooth falloff of the exponential distribution. This t = 0 spike would be a characteristic signature of the presence of dishonest mining.

So let’s take a look at the exponential distribution for a rate of 1 per 10 minutes (the nominal block creation rate):

\displaystyle f(x;\lambda) = \mathrm \lambda e^{-\lambda x}

Exponential distribution for a rate of 0.1 per minute, x-axis in minutes

Exponential distribution for a rate of 0.1 per minute, x-axis in minutes

╬╗ is the average rate at which events happen, here 0.1 per minute. Of course with independent events something is just as likely to happen in minute number 35 as it is minute 1, but the exponential distribution measures the probability of the when the first event will happen after you start your timer. This happening in minute 35 means it didn’t happen in any of the previous minutes, which is pretty unlikely, so the exponential distribution trails off as you’d expect. Honest mining should follow this distribution. Dishonest mining involves releases of blocks immediately subsequent to a previous block, so more events will happen in the first minute than the exponential distribution suggests.

To test this against the actual blockchain as it’s created, I wrote a Python script to monitor the timing of blocks as they’re released into the blockchain. I let this script run for 202 blocks (a little over a day), and if you want to review the data yourself here’s the CSV file. (Note the ‘messiness’ paragraph below for caveats.) I have binned out this data into one-minute increments and plotted a histogram. (The technically knowledgeable will note that Bitcoin block rate is not strictly constant because of the growth of the hash rate. It varies by up to about 10% over 2-week intervals, though not nearly by so much over the shorter interval I measured. In this spike-detection context it doesn’t matter terribly much either way.) Here’s the histogram of the actual data, with a bin size of 1 minute:

Histogram of times between consecutive blocks, in seconds

Histogram of times between consecutive blocks, in minutes

Now let’s figure out the exponential distribution. To be as clean as possible, I’m converting the actual measured time-between-blocks into a fraction of the average time between blocks. Ie, “2″ on the axis means “2 times the average time between blocks over the interval I measured”. I’m overlaying this with an exponential distribution of rate 1, and scaling the whole thing such that the areas under the curve and the histogram are both equal to the total number of blocks in the sample. The bins are 1/5 of the average block time. The result is below:

Histogram and exponential distribution, x-axis in minutes

Histogram and exponential distribution, x-axis in fractions of average block time

What do we see? The histogram matches the exponential distribution pretty well. Most crucially, the first bin is not notably higher than the distribution predicts. According to the exponential distribution, some 18.12% of blocks should be found in the first 1/5 of the average block time. In fact, we see that 31 out of 202 blocks, or about 15.3%, are actually found in that period of time. Given the small sample size, we can’t really be that precise with the percentage. Using the binomial confidence interval, we can only say that with 95% confidence something between 11% and 21% blocks are actually being created in that first 1/5 of the average block time. But the expected 18.12% is comfortably within that range. Given that we expect about 18% in a fair mining scenario and we can rule out anything greater than about 21% with pretty high confidence, concerned Bitcoiners can perhaps breathe a little easier.

Some messiness: I calculated the times based on the arrival time of the blocks at my computer, with blocks being received through the blockchain.info websockets API. This assumes that the arrival time of blocks at my computer is the same as the posting of blocks to the chain, which is a possible source of systematic error. However, the interface is said to be low-latency, and provided the latency is significantly lower than 1 minute the effect should be minimal given our bin size. (Timing data is built into the blocks, but it’s wildly inconsistent for reasons which are not clear to me. I have not used it.) Additionally, there are a few cases where I apparently either missed blocks or received duplicates. My programming skill is likely at fault. This only happened a few times, and I rejected any time intervals involving non-consecutive blocks. Finally, the sample size is not terribly large, so this measurement is not terribly sensitive and could potentially miss small-scale selfish mining efforts.

Nonetheless, provided the potential sources of error in this measurement are not causing spurious results, we can thus currently conclude that the timing behavior of newly mined blocks is consistent with a blockchain that is being mined with fair methods.

I encourage interested parties to freely repeat and refine this test in the future to see if the situation changes. If you’re interested enough to want to keep a continuous watch, one easy method would be to keep a running average of the time between the last ~1000 blocks, and see if the number of blocks separated by less than 1/5 of that average exceeds about 225. If it did, it would not be a definite proof of selfish mining, but it would be a >99% statistical anomaly.

[DISCLAIMER: I am not involved with Bitcoin, I don't own any Bitcoin, and I am not interested in changing either of these things at the moment. Proselytization of the "Bitcoin is awesome!" or "Bitcoin is horrible!" varieties is going to be wasted, so try to avoid it in the comments. But I'm fascinated by the Bitcoin system from a mathematical and computational perspective, and am relatively well-versed in its technical details. Whether it's a good or bad idea from a practical or economic perspective I leave to the early adopter crowd to find out.]

Creative Commons License
Is Bitcoin Currently Experiencing a Selfish Miner Attack? by Matthew Springer is licensed under a Creative Commons Attribution 4.0 International License.

Comments

  1. #2 redfacedquark
    UK
    January 11, 2014

    > because in the event of a conflict the longest chain wins

    Not quite, the longest chain is a confusing way of saying the chain with the most work on it. Not sure if you were simplifying or if that affects your calculations. Everything else seems spot on though. Wel done and glad you find bitcoin interesting, same here. Now if you could just reduce ghash’s hash rate we’d love you for ever.

    Will you release the source so someone could polish and automate the monitor please?

  2. #3 Matt Springer
    January 11, 2014

    Sure, he’s a quick pastebin of the Python source: http://pastebin.com/fmPHGA47

    It probably needs a lot of polish, it’s pretty crude but it works. How you analyze the resulting time-between-blocks data is up to you. I used Mathematica, but don’t have any real ‘source’ for that – I just used the built-in histogram function.

  3. #4 Pablo
    January 11, 2014

    For known pools, I guess the simplest way to find out if they are dishonest is to subscribe a miner and check what blocks you are hashing vs the known public blockchain.

    In the case of the tie that you describe, a quick release and propagation of the block by the selfish miner may only by a temporary advantage. As soon as a node gets both blocks, the block with more work (lower hash) wins in this node.

  4. #7 Mezzmarr
    California
    January 12, 2014

    In the event of a selfish minor attack, I think it is important to emphasize that only the transactions of the last 10 minutes are affected, not the entire Blockchain history. Notice how Ghash.io, the closest threat so far, was ameliorated by a swift and concise response from the minors. The mining community has a vested interest on preserving the block chain integrity, after all, their earnings are entirely in Bitcoin.

  5. #8 Random
    January 12, 2014

    But what if sb would want to destroy bitcoin because of his “vested interest” in banking industry (Fed, GS, etc.). All they have to do is create a mining pool and attract more than 50% of mining power. It doesn’t look secure to me if the future of bitcoin relies on the biggest mining pool operator decisions.

  6. #9 G
    January 12, 2014

    Two of the things the public including businesses, values in a currency, are stability and fungibility. At present, Bitcoin is poor on both axes: it appears more like a kind of speculative instrument than a currency. That it has become popular among people with a strong anti-government ideology, only adds to the bad smell as far as law-abiding citizens are concerned.

    Whether or not the current reported attack is real in any sense, there is still a strong impression that the complexity of the system is a vulnerability that will eventually be exploited. I’m in small biz in a high tech industry and I wouldn’t touch Bitcoin; a couple of programmers I know have dabbled around the edges but wouldn’t use it in any “serious” way that entails risk.

    Contrast to Ithaca Hours, a local currency in upstate New York, that uses printed notes. This system was developed by people whose primary interest was in local economic stability, as a means of economic resilience against unpredictable downturns in the national/global economy. They quickly got on good terms with the IRS* and a few local financial institutions, and the system became a success with local residents and businesses.

    As far as I know, there is no way to use Ithaca Hours as a speculative vehicle. The system is deliberately “not exciting” to financial cowboys, but instead fulfills the role of an actual currency. If something like this was available in the San Francisco Bay Area, I’d use it.

    * The IRS is OK with local currencies, provided that: a) the exchange rate to the US Dollar is publicized, is reasonable (in the sense that there is a logical and empirical basis for it), and is not subject to excessive rapid fluctuations, and b) that income earned in a local currency is reported on income tax filings, and taxes on that income are paid in US Dollars. This is entirely reasonable, and any organization that is serious about developing a local currency should be able to meet these terms easily.

  7. #10 Ajinkya
    January 13, 2014

    I seriously doubt if bitcoin can succeed without any government backup, it still look precarious to me Thanks

  8. #11 blab01
    January 14, 2014

    Just to let you know im working on a webpage with more statistics and suspicious blocks.
    I hope to get it online till the weekend. I’m getting back to you when its done with the url.

  9. #12 Jonathan Wood
    January 15, 2014

    According to my research, selfish mining is another form of manipulating well gained currency. Bitcoins are actually experiencing a lot of attacks and hacking. To protect BTC, it is better to have a back up account for your Bitcoin wallet.