Randomness wrap-up

This week's article on the "most random" number was the most popular post ever on Cognitive Daily. The stats aren't all in yet, but so far the post has been viewed at least 40,000 times. It wasn't long ago that 40,000 was a good month for Cognitive Daily! Since comments and questions about the project were spread over at least four different threads, as well as at least a dozen posts on other blogs, I thought I'd sum up some of the questions about our poll and the results in one place.

We polled 347 CogDaily readers, asking them to simply "think of a random number between 1 and 20," and found that the number 17 was chosen significantly more frequently than it would be in a truly random sample.

Some of the best responses:

  • Shouldn't you have called 17 the least random number, since it was the number picked most frequently?
  • How likely is it that this result is due to chance? "62 people out of 347 replied 17. To give a flavor, the chance of more than 30 people saying 17 is 1 in a thousand. The chance for 40 or more is 1 in a million. The chance for 62 or more choosing 17 is practically nil (unless 17 is really the most popularly chosen random number)." [note: I haven't checked these calculations --DM]
  • "17 is a prime example of a random number. *math pun*"
  • Shouldn't you have asked people to pick a number from 1 to 20? (yes, we should have!)
  • Dilbert on random numbers
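For anyone who wants to check figures like the ones quoted in the second response, here is a minimal Python sketch. It assumes each of the 347 respondents independently picks one of the 20 numbers with equal probability (5 percent each). The function and constant names are hypothetical, chosen only for illustration, and this is not necessarily the calculation the commenter ran.

```python
from math import comb

def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

N_RESPONSES = 347   # poll size reported in the post
P_UNIFORM = 1 / 20  # assumed chance of picking 17 if choices were uniform

# "More than 30" means at least 31; the other two thresholds are "or more".
for threshold in (31, 40, 62):
    tail = binomial_upper_tail(N_RESPONSES, threshold, P_UNIFORM)
    print(f"P(at least {threshold} of {N_RESPONSES} pick 17) = {tail:.3g}")
```

Running it prints the three tail probabilities, which can then be compared with the commenter's "1 in a thousand" and "1 in a million" figures.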

Some of the best questions (with my responses):

Why make the computer calculate only 347 points? If you want to find the distribution you should take many more points than that (it would only take some seconds to get a perfectly uniform distribution).

Because there were 347 responses to the poll. I wanted to see if computer-generated random numbers showed as much variance from the theoretical distribution as humans do.

It's silly to even plot the computer output. If you did more trials, and had a proper random number generator then the distribution would be even along all the numbers. Always.

You're misunderstanding "random." The reason I plotted the computer's random numbers is to show how much variance there is in a set of random numbers the same size as our poll's sample. The dotted line on each graph shows the theoretical limit that a random sample approaches, but with a small sample, there will always be variance in a set of random numbers. As I said in the original post, if you roll a die six times, it's extremely unlikely that you'll get one of each number.
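The die example and the small-sample point can be sketched in a few lines of Python. This is only an illustration: it uses Python's built-in pseudo-random generator as a stand-in (the numbers in the original post came from random.org), and `sample_counts` and the seed value are hypothetical names and choices.

```python
import random
from collections import Counter
from math import factorial

# Exact chance that six rolls of a fair die show each face exactly once:
# 6! favorable orderings out of 6^6 equally likely outcomes.
p_one_of_each = factorial(6) / 6**6
print(f"P(one of each face in six rolls) = {p_one_of_each:.4f}")  # 0.0154

def sample_counts(n_draws, low=1, high=20, seed=None):
    """Draw n_draws uniform integers in [low, high]; return the count of each value."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(low, high) for _ in range(n_draws))
    return [counts.get(value, 0) for value in range(low, high + 1)]

# One sample the same size as the poll. The counts scatter around 347 / 20.
counts = sample_counts(347, seed=17)
print(f"expected per number: {347 / 20:.2f}")
print(f"smallest count: {min(counts)}, largest count: {max(counts)}")
```

Changing the seed gives a different scatter each time, which is the variance the plotted computer numbers were meant to show.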

Different algorithms may output different numbers, so those "computer" numbers show nothing relevant.

They are a reasonable set of random numbers. I used the generator at random.org to come up with my list of random numbers.

But that's beside the point. Think of it this way: I could have just compared the percentage of "17" guesses with the theoretical number of times 17 should appear in a random sample, but I wanted to assess our poll using a stricter standard. The point is this: it's possible that there were more 17s than any other number just due to random chance. If you roll a die 6 times, and the number "2" comes up twice, you don't have enough evidence to show that the die is biased.

Yes, "17" was chosen significantly more frequently than the 5 percent of the time we would expect, but even in a truly random sampling of a finite number of numbers, we would expect that some numbers would come up more than 5 percent of the time, and some would come up less. By comparing the "17" to the most common number from the computer-generated set of numbers, we can see if the large number of "17" responses the humans came up with reflected a true bias, or if they might have been due simply to chance.
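The comparison described above used a single computer-generated sample. A rough way to extend the same idea is to repeat it many times and look at the most common number in each run. The sketch below does that under stated assumptions: Python's pseudo-random generator stands in for random.org, 2,000 trials is an arbitrary choice, and the function names are hypothetical.

```python
import random
from collections import Counter

def max_count(n_draws, n_values, rng):
    """Count of the most frequently drawn value in one uniform sample."""
    counts = Counter(rng.randrange(n_values) for _ in range(n_draws))
    return max(counts.values())

def simulate_max_counts(n_trials=2000, n_draws=347, n_values=20, seed=0):
    """Repeat the 347-draw experiment and record each run's largest count."""
    rng = random.Random(seed)
    return [max_count(n_draws, n_values, rng) for _ in range(n_trials)]

maxima = simulate_max_counts()
OBSERVED_17 = 62  # number of poll respondents who chose 17

share = sum(m >= OBSERVED_17 for m in maxima) / len(maxima)
print(f"largest count per run ranged from {min(maxima)} to {max(maxima)}")
print(f"share of runs whose top number reached {OBSERVED_17}: {share}")
```

If the top number in purely random runs essentially never gets near 62, chance alone is a poor explanation for the 17s in the poll.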

There is certainly a way to do this using a statistical calculation (in fact the 90 percent confidence interval around the theoretical 5 percent value would probably do it), but I wanted to demonstrate it in a way that was clearer to people who do not have a background in statistics.
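As a sketch of the statistical route mentioned above, the interval around the theoretical 5 percent can be approximated with the usual normal approximation to a binomial proportion. The approximation and the helper name `proportion_interval` are assumptions made for illustration, not the method used in the original post.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p, n, confidence=0.90):
    """Normal-approximation interval for a sample proportion around a true value p."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = proportion_interval(0.05, 347)
observed = 62 / 347  # share of respondents who chose 17

print(f"90% interval around 5%: {low:.3f} to {high:.3f}")
print(f"observed share of 17s: {observed:.3f}")
```

Note that an interval like this describes a single number picked in advance. Asking whether the most popular of 20 numbers is unusually popular is a stricter test, which is what the comparison with the computer-generated set addresses.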

The only problem with this is that it's highly likely many of those who took the survey had read the Pharyngula and/or Cosmic Variance post, compromising the results.

I would think if the results were compromised, they would have been compromised in the opposite direction; that is, people would have tried to think of some number other than 17. I also think nearly all CogDaily readers are interested in good research, and so were participating in good faith -- in nearly all the "casual" studies we've conducted, participants have been exceptionally careful to be sure they aren't polluting our results.

Finally, some misconceptions expressed by commenters:

  • My wife said "17" so it must be true
  • You only surveyed 347 people, so your results don't apply to the general population
  • I really like the number 17, so that's why it's the most random
  • 17 has the most syllables (17, meet 11)
  • Random numbers are perfectly distributed
  • The question never asked the reader to respond in a random manner (yes, it did)
  • The study author didn't take statistical significance into account (yes, he did)
  • It's interesting that the computer came up with 19 as the most random number (no, it's random)

And speaking of "random," how random are the British when they pick their computer passwords? (Liverpool and arsenal seemed pretty random to me until I realized they were both Premier League teams). I'd love to see a similar U.S. list (redsox, anyone?).

People are known to be very bad random number generators.

This fact was discovered by criptologists in the 1930s when they managed to break many cyphers using the fact that humans have favorite numbers. During the Second World War, human-generated "random" keys were deemed so unreliable that machine-made random number generation became an industry. Random number generation became an impetus behind attempts to calculate numbers like pi to many thousands of decimal places.

I believe Claude Shannon (the guy who invented information theory) built a simple machine to predict human-generated "random" coin flips years ago. People are famously bad at generating coin flips, partly because they don't like to allow locally unlikely events like runs of three heads in a row to ever happen.
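To put a number on how "locally unlikely" a run of three heads really is, here is a small Python sketch that computes the chance of seeing at least one such run in a sequence of fair coin flips. The function name and the choice of ten flips are hypothetical, picked only for illustration.

```python
def prob_run_of_heads(n_flips, run_length=3):
    """Probability that n_flips fair coin flips contain a run of run_length heads."""
    # state[j] = probability of having no qualifying run so far while
    # currently ending on exactly j consecutive heads.
    state = [1.0] + [0.0] * (run_length - 1)
    for _ in range(n_flips):
        new_state = [0.0] * run_length
        new_state[0] = 0.5 * sum(state)        # tails resets the streak
        for j in range(run_length - 1):
            new_state[j + 1] = 0.5 * state[j]  # heads extends the streak
        # Heads from the longest tracked streak completes a run and is dropped.
        state = new_state
    return 1.0 - sum(state)

print(f"{prob_run_of_heads(10):.3f}")  # 0.508
```

So in just ten honest flips, three heads in a row shows up slightly more often than not, which is the kind of pattern people tend to leave out when they invent a "random" sequence.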

An interesting question is: What process happens inside my brain when I try to make a random choice. In a lot of contexts, "random" just means "stuff I didn't observe." I wonder how much of random choice in this context is just choice happening by more-or-less deterministic algorithms that are running below the level of consciousness. I also wonder if I'd be better at generating random-looking behavior than other people, because I've spent a lot of time studying random number generation processes. (Maybe I'd just subconsciously omit different kinds of locally unlikely behavior.)

By albatross (not verified) on 09 Feb 2007

Like the poster above, I was familiar with the idea that my brain could have some algorithm if I let it run. I tried to counter this by just using the very first number that popped into my head, without second-guessing it. That too could have been biased of course, but at least that way I didn't use some of the conscious biases described by some posters. I came up with 11. Of course I had to check and see how "well" I did - I admit to being a little pleased that my number was chosen infrequently - though I recognize that's a bit irrational :) plus my number WAS a prime so it fits some of the theories here.

By Teresa Michelsen (not verified) on 09 Feb 2007

Re: #1:

I think the word you're looking for is "cryptanalysts", not "cryptologists" (and certainly not gangland researchers as "criptologists" might be).

Another point to ponder: what if the number chosen were between 20 and 40? I think that a lot of Americans have an aversion to "13". Perhaps because of that, I think that 13 is one of my own personal favorites.

Then again, Michael Jordan made "23" famous. Rolling Rock's "33". ".38 Special". "21 - Blackjack". I guess there are a lot of reasons to pick any number.

Griffiths and Tenenbaum have a nice paper on people's perception of randomness. They argue that the "biases" are actually a manifestation of a general pattern-inference mechanism: we try to figure out what sort of process might be generating the data, and we call "random" whatever can't be easily explained by a simple generative process. Take these two fictional series of coin flips:
1000111001
and
1010101010

Given an unbiased coin, the two sequences are equally likely, but if what we're looking for is really a generating process, then the one that underlies sequence 2 is really more obvious. So people say that sequence 1 is more probable than sequence 2 (Tversky & Kahneman have done a few experiments on that, if memory serves).

The original paper is here:
http://www-psych.stanford.edu/%7Egruffydd/papers/algrand.pdf
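The point about the two coin-flip sequences can be made concrete with a short Python sketch. The probability calculation follows directly from the fair-coin assumption. The `shortest_period` function is only a crude, hypothetical stand-in for "how easy is it to describe a generating process" and is not the model from the Griffiths and Tenenbaum paper.

```python
from fractions import Fraction

def sequence_probability(seq, p_one=Fraction(1, 2)):
    """Probability of one exact sequence of independent flips ('1' or '0')."""
    prob = Fraction(1)
    for flip in seq:
        prob *= p_one if flip == "1" else 1 - p_one
    return prob

def shortest_period(seq):
    """Length of the shortest prefix that, repeated, reproduces the sequence."""
    for length in range(1, len(seq) + 1):
        if all(seq[i] == seq[i % length] for i in range(len(seq))):
            return length

for seq in ("1000111001", "1010101010"):
    print(seq, sequence_probability(seq), shortest_period(seq))
# 1000111001 1/1024 9
# 1010101010 1/1024 2
```

Both sequences have exactly the same probability, but the second one can be summarized as "repeat 10," which is why it looks less random.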

Regarding your particular experiment, beeba is probably right in that a lot of numbers between 1 and 20 have a cultural significance that make them seem "non-random". 1 is unity, 2 is the smallest "plural", 3 and 7 have all sorts of cultural bits and pieces attached to them, so does 13; 5, 10, 20 add and multiply nicely because of our decimal system and I also suspect that even numbers seem less random. That doesn't explain 17 but biases the probabilities heavily in its direction.

I have this blog on my Google Reader, and therefore wasn't biased by reading another one first. As for how many people read this via RSS, link from another site, bookmark, etc., didn't you already take a poll on that? If not, you should.

It would be interesting to try the experiment with a bunch of different ranges, as suggested above.

My first thought was 17 also, but then I realized everyone would choose that.

Between 20 and 40, I pick 25. I don't see what's going to be popular here, whereas in the first one, I *knew* it would be 17. How? No idea. Maybe the prime-number theory has something to do with it. 17 sounds 'random'. As in, not related to anything. (Do women choose 17 more often than men do? Do engineers/math teachers/programmers choose it less often?)

Could also try with different parameters: guess the most popular number between 1 and 20 (the one the most other people answering this will pick); pick a random number between 20 and 1....

Dave,

Another online study you might conduct is on choosing a random letter of the alphabet. We have only 26 letters, and just like in the number study, we might get some pattern there.

I know that the domains are not analogous, with the frequency of letters in English-language words being a factor; but we can see if the frequency of letters in English vocabulary matches the random selection of letters by the audience.

I would have done this study at my blog, The Mouse Trap, but I am afraid I won't have that big an audience and thus not a valid sample size.

You need to be careful not to confuse "humans" with "computer-owning Americans who read your blog and have enough time and interest to answer this particular question at the moment they did answer it".

By Barnacle Bill (not verified) on 22 Mar 2007

You need to be careful not to confuse "humans" with "computer-owning Americans who read your blog and have enough time and interest to answer this particular question at the moment they did answer it".

Agreed. Is there some conclusion I draw from the results that leads you to believe that I'm confusing those things?

I have a friend who thinks it is kind of spooky that the number 23 shows up "everywhere." Just watch for it, she says -- you'll see it all the time! I have always said it's for the same reason that 17 is the most (therefore least) random number: 23 sounds random, whereas 24 and 25 do not, because they are easily divisible and therefore used in packaging, etc. Therefore, if people want to pick a number that "seems random," they pick 23 (or 17 or 27!), not 24 or 25: hence the recent movie, "The Number 23." This fascination with 23 is well known as the 23 Enigma, which is easy to google.

By peter alexander (not verified) on 06 Apr 2007

I think it's even easier than that. People tend to aim for the middle. Ask for a number between 1 and 10, and the most common answers are 3 or 7. Why? Given a range of 1 to 10, the middle is 5. That's "too obvious," so you go to the middle again -- either up to 7 or down to 3. Since smaller numbers like 3 also get used in smaller ranges, like 1 to 5, most people tend to go with the higher split. Likewise with 17. The first split puts you at 10. Too obvious. The next splits would be 5 or 15. The tendency is to go to the higher split. However, 15 is a multiple of 5, which tends to sound "not random," so you subconsciously split again, this time either up to 17 or down to 13. A quick office test showed 17 and 13 as the most common answers.

I disagree with part of the analysis:

"Clearly humans aren't very good random number generators. We predictably select some numbers more than others."

The problem with this statement is that random number generators are also not really random. They use algorithms to generate pseudo-random numbers based on a statistical distribution which, over time, is forced to conform to the distribution. Humans are not forced into this confinement. Without an explanation for the statistical anomaly (17 being more popular), I can't accept the conclusion that humans are not very good random number generators. It's not that I necessarily think the conclusion is incorrect, I simply think the basis for the conclusion is extremely flimsy. The implication is that humans are worse RNGs than computers, which simply can't be supported with the evidence provided.

On top of that, 347 responses for something of this nature is far too few. There is also the possibility that some people chose multiple times, which would skew the results as they are unlikely to choose the same number twice. Someone aware of the premise about 17 being more popular could also have found the site and used that information to skew the results on purpose.

To some extent the number you asked people to choose is just a symbol. It represents a mathematical concept, and as such we can talk about odd or even, primes, middle of the range, divisibility by 3 or 5. We can also talk about the human mind choosing a number, then deciding on a different one because the first choice was "too common". People want to feel unique and special (at least the people I'm used to being around!).

Then again, the number chosen is on another level just a symbol. I like the look of 11, 8 is quite nice, and 4 has its charm in an industrial sort of way. I've always had beef with 5 and 6, because when I write them in a rush they come out nearly the same. Are single digits more pleasing than double? And who can doubt the lovely appearance of 101 or the jarring inexplicability of 696?

Cultural meanings of various numbers may play a role. 3, 7, and 11 are very commonly-used small numbers. Many commercial objects come in packs of 4, 6, or 8. Maybe in your country 13 is lucky, or 8 is unlucky. When people see a small number they might think of age, and as such choose an age which they feel positively about (perhaps from their own experience, perhaps from a cultural bias) or their own current age.

The sound of it being spoken may factor in. I personally think "One" has a nice sound, while "Three" is pretty lame. "Five" sounds pretentious. "Nine" sounds stressed out. "Eleven" is far more pleasing to my ear than "Seventeen". Every multiple of 10, when counting upward by ones, has a small feeling of victory - a pause too. And every multiple of 100 sounds like a minor triumph.

And so culture and language should make a huge impression on this. In order to find out, you'd need control groups of people native to various cultures; within each such group you'd need people who grew up as native speakers of a set of perhaps five representative languages. A study like that might need thousands of participants remaining at the end to achieve a statistically useful result.

That result would determine how much your number choices reflect your culture, your language, and your human mind.

However, I believe such a study's results would lead to unethical behavior by advertisers. If an advertiser knows a little better how people think, they can invade your decision-making processes even further. I want to be shown in disinterested fashion the strengths and weaknesses of a product, then decide for myself whether I need it and which I should choose. I don't want Apple to sell me on the lifestyle of using a Macintosh. I want to choose the computer and operating system that performs to my specifications optimally.