Sunday Function

By mspringer on June 26, 2011.

Head down to Box Office Mojo and pull up the list of the top grossing films of the year thus far. Seven of the top ten have a dollar gross beginning with the number 1. Okay, that's not too weird. Big films tend to pull down somewhere between $100 million and $200 million, while only the real monsters gross more. So what if we look at the inflation-adjusted all-time list, which is less likely to be skewed by the coincidental size of the film-going public and ticket prices? Again, seven of the top ten have grosses beginning with 1.

Well, maybe movies are just weird. What about cities? In the US, five of the top ten cities have a population figure which begins with a 1.

Maybe cities are just weird too. How about election results? If you rank the states in the 2008 US presidential election by Obama's vote total, none of the top ten totals begins with 1, but then again, every one of the next ten does.

Why this preponderance of numbers that happen to start with 1? Is it just an artifact of the data sets I've picked, or something more interesting? Try a thought experiment:

Pick a number, say, one million. Write it out in decimal notation and it reads 1,000,000. Its first digit is the number 1. If you increase or decrease 1,000,000 by ten percent, you get 1,100,000 or 900,000, which start with 1 and 9 respectively. If you increase or decrease 1,000,000 by twenty percent, you get 1,200,000 or 800,000, which start with 1 and 8 respectively. If you increase or decrease 1,000,000 by thirty percent, you get 1,300,000 or 700,000, which start with 1 and 7 respectively.

Continue this exercise and the pattern continues. Every one of the million numbers from 1,000,000 through 1,999,999 starts with 1, but the million numbers below 1,000,000 can start with just about anything, including 1.
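The thought experiment is easy to run for every ten-percent step. The sketch below is only an illustration; the helper names `leading_digit` and `perturb` are invented for this example and are not part of the original post.

```python
def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def perturb(base: int, pct: int) -> tuple[int, int]:
    """Return (base increased by pct percent, base decreased by pct percent)."""
    up = base * (100 + pct) // 100
    down = base * (100 - pct) // 100
    return up, down


base = 1_000_000
for pct in range(10, 100, 10):
    up, down = perturb(base, pct)
    print(f"{pct:>2}%: {up:>9,} starts with {leading_digit(up)}, "
          f"{down:>7,} starts with {leading_digit(down)}")
```

Every increased value keeps a leading 1, while the decreased values walk down through 9, 8, 7 and so on.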

Obviously, had you started with (say) 3,000,000, the effect would be much less pronounced, but it would still be there. It's possible to rigorously analyze this sort of thing, and the result is Benford's Law, which gives the probability distribution for the first digits of random numbers:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9$$

Plotting this distribution gives:

[Figure: the Benford distribution, plotted as probability against leading digit.]
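The same distribution can also be tabulated as text. This is a minimal sketch that assumes the standard form of Benford's Law, P(d) = log10(1 + 1/d); the function name `benford` is chosen here purely for illustration.

```python
import math


def benford(d: int) -> float:
    """Probability that the leading decimal digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)


# Print each digit, its probability, and a crude text bar (one '#' per percent).
for d in range(1, 10):
    p = benford(d)
    print(f"{d}: {p:6.1%} {'#' * round(p * 100)}")
```

The nine probabilities sum to 1, and the bars shrink steadily from digit 1 to digit 9.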

From Benford's Law, you'd expect around 30% of leading digits to be the number 1. Not every set of randomly chosen integers satisfies the conditions required for Benford's Law and its odd preponderance of 1s, but lots of them do. In the financial industry, the law has even been used to search for fraud. Humans are generally terrible at making up random numbers that act anything like actual random numbers, and as a result the figures they make up when cooking the books don't tend to satisfy laws like Benford's.
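Checking a data set against the law comes down to counting leading digits and comparing the observed frequencies with log10(1 + 1/d). The sketch below uses the first 1000 powers of 2 as a stand-in data set; that choice is an assumption made for this example (powers of 2 are a well-known sequence that follows the law), and the helper names are hypothetical. A real fraud screen would use actual ledger figures and a proper statistical test.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def first_digit_frequencies(values) -> dict[int, float]:
    """Fraction of values whose leading digit is d, for d = 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Stand-in data set (an assumption for illustration): 2**0 through 2**999.
powers = [2 ** n for n in range(1000)]
observed = first_digit_frequencies(powers)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed[d]:.3f}, Benford {expected:.3f}")
```

A set of made-up figures would typically show a much flatter spread across the nine digits than the Benford column does.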

Unless you're angling to hang out with Bernie Madoff in Club Fed, you should probably use your math knowledge for good rather than evil. But if you're gonna cook your books, your recipe should probably include about 30% 1s as leading digits...


Benford's law does not work for binary; it gives '1' as the leading digit with frequency 1. The leading digit of zero is '0'.

It works just dandy for binary!

log2(1 + 1/d) gives 1 for a number starting with 1, just as you'd expect.

If one wanted to apply Benford's Law-type reasoning in a binary context, one would presumably use a generalization to leading prefixes of length > 1.
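Both points in this exchange can be checked numerically. The sketch below assumes the usual generalization of Benford's Law, in which a leading prefix with integer value p in base b has probability log_b(1 + 1/p); the function name `prefix_probability` is invented for this example.

```python
import math


def prefix_probability(prefix: int, base: int = 10) -> float:
    """Probability that a number's leading digits, read in `base`, equal `prefix`."""
    return math.log(1 + 1 / prefix, base)


# Single leading digit in binary: the only possibility is 1.
print("binary prefix 1:", prefix_probability(1, base=2))

# Two-digit binary prefixes: 10 and 11 (values 2 and 3).
for p in (0b10, 0b11):
    print(f"binary prefix {p:b}: {prefix_probability(p, base=2):.3f}")
```

With a single binary digit the law is trivially satisfied, but two-digit prefixes already split unevenly, roughly 0.585 for 10 against 0.415 for 11.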

cf. a report from February, 2010, that "The number 4 occurs less frequently than chance would dictate in the tenths of a cent digit for quarterly earnings." Of course, this may be statistically significant for a data set consisting of all companies, but not for any individual company, at which point it's hard to crack down on this sort of thing.
