Some commenters on my last post seem to be of the view that it is perfectly fine for scientists to pull numbers out of thin air to bolster their claims, at least under some circumstances.
I think it's a fair question to ask: In which circumstances are you comfortable giving scientists the go-ahead to make up their numbers?
One suggestion was that scientists ought to be "permitted to use ordinary language meanings of words and colloquialisms in their non-peer reviewed discourse" -- or at least, to do so in discussions on blogs. I take it this assumes that "ordinary language meanings of words and colloquialisms" covers totally made-up statistics (which itself seems to assume either extreme innumeracy or disregard for actual facts among people talking in anything other than a peer-reviewed context).
I'm guessing that adopting this standard would undercut scientists' attempts to communicate with non-scientists about matters of interest or importance. Why on earth would a non-scientist be swayed by a scientist's claims about the impact of carbon emissions on global climate, or about vaccination on disease prevention (or the incidence of autism), or about the age of the earth, or about the size of an atom, if every number that appears in such claims can be presumed to have been made up?
Another suggestion was that it is permissible to use "joke statistics to make a rhetorical point". In the case where the cited statistic is clearly made-up (and I think there's some room to discuss whether joke statistics not explicitly flagged as such might be taken as authoritative), it may advance a rhetorical point. Does the rhetorical point depend on the joke statistics, though? Could the joke statistics (by their very jokey-ness) actually undermine the rhetorical point?
Ultimately, can the rhetorical point end up distracting attention from the question of how things actually are in the situation being discussed?
Let me repeat my worry as I elaborated it in a comment:
My point is, when scientists do it -- whether in the context of a discussion of their findings, or of the prevalence of fraud in their field, or of mentoring strategies, or what have you -- when they make up data rather than admitting "I don't know the exact prevalence of X (nor does anyone, to the best of my knowledge)", that's lying. Scientists, of all people, should understand the importance of distinguishing between what you know and what you guess. Scientists, of all people, should understand the value of admitting ignorance in situations where good data has not yet been collected.
Doing otherwise may be human, but it's not scientific.
So, I'd be very curious to hear an elaboration of when you think it's OK for scientists to pull numbers out of thin air.
Those of you asking mentors (or bloggers) for career advice, tips for getting grants, and so forth: which bits of that advice would you like to have grounded in reality, and which bits not? How about when they're giving you advice about running experiments?
Those of you depending on scientists for reliable information about the world: in which non-peer-reviewed contexts would you prefer that scientists only cite numbers grounded in reality, and in which are you fine with them making it up? Letters to the editor or op-ed pieces? Statements at school board or city council meetings? Conversations at the doctor's office? Chatting in line for coffee or at a soccer game?
Scientists: do the made-up numbers make the point you're trying to make better than would saying, "I don't know for sure but my hunch is this is not a big problem," or "My impression, from what I've seen, is this is more frequent than you might guess"? If so, how do the made-up numbers make the point better?
Would the point be bolstered better still by actual numbers? Would your point be undermined by your audience noticing that you've pulled numbers out of thin air?
It's quite possible that my prejudice about scientists leaning on made-up numbers is mistaken, at least in certain contexts. But I'd like to see a reasoned argument to that effect.
Preferably an argument that doesn't rely on numbers pulled out of thin air.
In my statistics you are among the undecided (2). There are 15 in the majority, 2 in the minority and 2 undecided, for a total of 19. You may change your affiliation at any time without notice, it won't change the trend. ;)
I think I agree with your larger point: I don't like made-up statistics if they're meant to be taken as actual numbers. But I would like to quibble with your .001%/99.99 example (and also would like to ignore the particular context). 1% vs 99% and all of its variants seems to me to be a very useful shorthand. It's a colloquialism, and I never interpret it as a genuine statistic. Rather, it is a way of saying 'almost all'. Or even better, it is a way of avoiding categorical statements. If one makes a categorical generalization in an argument, and then someone comes along and points out a low-frequency exception to the generalization, which is utterly irrelevant to the main point, the conversation can get sidetracked. If you say 99.99%, you are acknowledging exceptions, but also not letting that get in the way of the relative truth of a generalization.
Perhaps it would be better if we just said 'most', or 'the vast majority', or 'almost all'. But nothing really bothers me about 99% or 99.99%. I don't treat that statement as a real number when it comes up in conversation.
And, as I said, I actually prefer an acknowledgment of exceptions to categorical statements. I get an allergic reaction when I hear statements of the form: All X are Y when we are talking about the messy real world. Especially if X refers to a group of people.
While in principle I agree that scientists (or anyone, I don't see why we should be the only sane ones) shouldn't make stuff up, your original example wasn't really people making numbers up.
Saying something is 99% (with extra .999... if you prefer) is just a different way of saying "almost all the time except for extremely few, or none, exceptions". I don't know that anyone would really take a number like 99.999% as a scientifically validated number, in any context. Except maybe a journal article where something just happened to have that value.
You might as well complain about people claiming that they are giving "110% of everything they got" or rating something an 11 on a scale from 1 to 10. It's just an expression.
I agree with you 200%!
This is why FSM created the numbers "eleventy" and "eleventeenth". Also squillion(th), etc. Fake numbers aren't just easily identified as joke stats, they're usually funnier than equivalent real numbers.
Setting this up as a choice between "99% of all PhysioProfs are Hoofnagles" and "in my experience, most PhysioProfs are Hoofnagles", though, is missing some middle cases. The most important difference between those two statements, IMO, is that one statement explicitly qualifies the source and scope of the person's knowledge, while the other does not. If someone says "in my experience, 99%...", then it's pretty easy to take that number with an appropriately-sized grain of salt, and I don't see it as any better or worse than "in my experience, most...".
Also: Aren't words like "most", "almost all", etc., commonly held to have quantitative meaning anyway? I know I've encountered nitpick wars over whether "most" can properly refer to a plurality, or whether it implies an actual majority. I don't think embiggening the error bars of a made-up number by using imprecise language, instead of made-up numbers of arbitrary precision, really makes unsourced bullshittery any more or less okay.
I'm not sure I see a systematic problem with this. What *is* a systematic problem is the misunderstanding and misuse of numbers--particularly percentages--that actually are real empirically determined values. The mainstream media are absolutely horrible with this, especially in the case of epidemiological risk measurement, frequently conflating relative risk with absolute risk.
If what we're really talking about is just goofaloons like Dave and Rivlington pulling shit out of their asses and smearing it on the walls, then this has nothing to do with "scientists making up numbers" and everything to do with whiny attention-seeking dumbasses trolling blogs and interfering with people trying to engage in useful discourse.
I'm not sure I see a systematic problem with this.
I don't have good numbers from a well-designed study of prevalence*, but I have seen and heard more than a few scientists do it, especially around issues like the prevalence of fraud in science and the extent to which affirmative action affects admissions and hiring decisions.
The first of these, especially, is a discouraging arena in which to encounter totally made-up numbers.
*See how easy that was?
Re Maria's answer:
My five-year-old has been asking how many zeroes there are in "gazillion."
My five-year-old has been asking how many zeroes there are in "gazillion."
Fewer than in a fucktillion!
the prevalence of fraud in science
Is anyone aware of any decent studies on this? Rivlin is constantly blithering on about how it is so much worse than the good old days and that fraud is RUINING SCIENCE!!!111!!ELEVENTY!11!!
I thought we were in general agreement that making stuff up when you are a scientist meant that you were no longer a scientist?
Coriolis, I completely disagree. If somebody starts talking about 99% then that is a scientifically relevant number. There is a world of difference between verbal hyperbole (bullshitting to your friends in a bar) and written estimates of probability in a scientific discussion.
Actually, I do have a serious response. Not about made-up statistics about mentoring or affirmative action or fraud, but about a case in which a friend wanted me to pull numbers out of the air.
This friend used to live in New Orleans. She evacuated after Hurricane Katrina, and she decided that she didn't want to deal with the risks involved in living there any more. So she wanted to systematically evaluate several cities as possible places to move to. And when I say systematically, I mean that she built this matrix to allow her to compare several cities based on various criteria (cost of living, hip-ness, possibility of finding a job, risk of natural disasters...). She asked me to help quantify the risk of disasters.
But that's pretty hard to do. I decided to consider both total destructiveness and likelihood of occurrence, but even after I decided to use two logarithmic scales similar to earthquake magnitude scales, it was difficult to give her numbers. So I made stuff up. Sort of. I had some idea of how often Mt Rainier had erupted or how often magnitude 8 earthquakes occur on the San Andreas Fault, and I had experience with big snowstorms in Minnesota and earthquakes in California. But there were huge uncertainties, orders of magnitude in some cases.
One difference between the situations Janet's describing and this one, though, might be that I tried to use the most reliable information that I had, and I sent my friend a long e-mail telling her the basis of my estimates (which ranged from "I lived in Minnesota for four years and it snowed every year" to "Paleoseismic studies of recurrence intervals on the San Andreas Fault suggest that magnitude 8 earthquakes have recurrence intervals on the order of centuries."). It was a guess based on partial knowledge, as opposed to a certain-sounding statement based on nothing.
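Kim's two-log-scale approach can be sketched very roughly in code. Everything here is invented for illustration: the cities, the hazard list, and every score are hypothetical placeholders, exactly the kind of labeled estimate being discussed, not real hazard data.

```python
# A rough sketch of the two-log-scale risk matrix Kim describes. Each hazard
# in each city gets a destructiveness score and a likelihood score, both on
# log scales, so adding the two scores is like multiplying severity by
# frequency on a linear scale. Every number below is invented.

hazards = {
    "Bay Area":    {"earthquake": (8, 6), "wildfire": (5, 5)},
    "Seattle":     {"eruption": (9, 2), "earthquake": (7, 4)},
    "Minneapolis": {"blizzard": (3, 9)},
}

def city_risk(city_hazards):
    """Worst-case combined score: destructiveness + log-likelihood."""
    return max(d + l for d, l in city_hazards.values())

for city in sorted(hazards, key=lambda c: city_risk(hazards[c])):
    print(f"{city}: worst-case risk score {city_risk(hazards[city])}")
```

The point of the sketch is the structure, not the values: the scores are guesses with huge uncertainties, but making the scheme explicit (as Kim's long e-mail did) is what separates an estimate from a number pulled out of thin air.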
(By the way, for Janet in earthquake country: my friend moved to the Bay Area.)
I cannot believe people are attempting to rationalize situations where it would be ok to make up, basically, data. Next thing you know we'll be talking about when it's ok to fabricate gels or photoshop microscope images.
The fabrication of data in this case was especially distasteful because it was done to minimize the importance of some very valid concerns had by a female commenter without actual data to support the assertion. Funny behavior from someone who claims it is their mission to fight fraud.
The only circumstance that seems acceptable to me is when you're estimating -- like what Kim describes above... if you're trying to express a general order of magnitude of something (or of a ratio between somethings), and *explicitly* labeling your number as an estimate, ideally with a range of accuracy.
I am perplexed. I suppose because your comment at 4:33 is so foreign to me. I simply do not run across many situations, such as the one you picked up from the comments at DrugMonkey, in which I am under the impression that example numbers are in fact real numbers. Nor do I get the idea that the person who has deployed the made-up example numbers intends me to believe that they are anything other than qualitative (and subjective, mind) estimates.
The contexts in which I have noticed the numbers-from-thin-air strategy from scientists have included interviews with reporters (on how big a problem fraud is -- or isn't, as the case may be -- for the body of scientific literature) and on panels at professional meetings (discussing the same issue), to name the two stuck in my craw.
I see it slightly more in real life than online, but given the extent to which online venues like blogs are being used to disseminate serious mentoring and career advice these days, I think some rhetorical practices that may have worked on Usenet groups might bear re-examination.
I'm with DM and PP on this -- to consider this any kind of real problem I'd need to see concrete examples where one wouldn't have to be daft to take the "made up numbers" for anything more than a colloquialism. In my experience it's relatively easy to distinguish between a rhetorical flourish and a specific claim that may or may not have solid data behind it, and in the latter instance to ask to see said data if it exists.
And if we're going to start putting limits on what scientists can and cannot say, particularly in their private lives, then we'd better have some handy definitions of what is a scientist (that is, who has to abide by these limits) and where is the border between private and professional discussion. Good luck with those. (Or we could, you know, apply the same standards of reason to everyone, regardless of profession...)
Bill, one of the places I'm aware of this happening is in an interview given to a publication whose reporter did not, apparently, ask to see the data but took the "statistics" at face value. As did others, who cited the article as a source of authoritative numbers.
Does this fall within the bounds of how scientists ought to behave in their private lives? Hey, it's not like I can stop them. But one foreseeable consequence of treating the thin-air numbers as obviously rhetorical (obvious to whom?) might be that, in larger societal discussion, scientists are presumed to be just as full of B.S. and just as disconnected from reality as everyone else. Maybe that's worth it to be able to make up numbers freely -- but it is surely a cost to consider.
"In which circumstances are you comfortable giving scientists the go-ahead to make up their numbers?" Anytime they are labeled as madeup, or when I can tell from context extremely easily.
"Next thing you know we'll be talking about when it's ok to fabricate gels or photoshop microscope images. " When they are going in the Fisher calendar as art? It's not making up data if you aren't presenting it as data. You'd have to be a complete whackaloon to take anything unclesolthetroll says as data, ever. I only proved Dave's numbers wrong once- after that, I assumed they were all pulled out of his ass... and, more importantly, that he wasn't any good at the ever-so-fun-game of back-of-the-envelope-calculations!
Note: I'm not trying to say poor estimates are not a source of real annoyance, or that representing estimates as actual data doesn't offend me... but you have to take everything with a grain of salt. You can lie with real statistics every bit as easily as you can with made-up ones.
Janet, in this discussion you are coming across as a cartoon-like parody of a clueless philosopher. 99.99% of native English speakers can tell the difference between this form of colloquialism and hard data.
And, by the way, I should note that in larger societal discussion, scientists are just as full of B.S. and just as disconnected from reality as everyone else. Scientists are not, and should not be, specially privileged as citizens.
I gotta go with other people here and say that context matters. If somebody says in casual conversation "In 99% of cases this will always work...." I assume that he/she is using "99%" as shorthand for "Damn near always", a phrase that nobody would mistake for a precise statistical claim.
OTOH, I have to admit that we shouldn't always assume that people understand us. A patient listening to a doctor may take "99%" and "one in a million" literally.
In general, if somebody offers a numerical-sounding statement for how common or rare something is and I'm not sure how precise that number is, I ask for clarification rather than making assumptions. I might be assuming precision that isn't there, or I might be dismissing a claim that is in fact quite well-supported.
So rather than taking a statement in isolation and asking whether it's ethical to make such a statement, let's look at the conversation in which the statement occurs, and see how people respond to the statement. That's what really matters.
Oh, here's a context in which physicists routinely make up numbers and get good science: When they want to illustrate how one variable is proportional to another variable, or proportional to the square of another variable, or the logarithm of another variable, or whatever. So they'll say "Let's suppose this bond energy is, I dunno, 5 eV. Anyway, this other thing depends exponentially on the bond energy, so if we doubled the energy, then the result should be squared, and if it was 0.1 before now it's 0.01."
Economists do it all the time too. "OK, so this widget costs $8 to make and your average consumer has a widget budget of $7, but a certain fraction of them have a widget budget of $16. So, we drop the widget cost to $6, and some of the customers don't change their buying habits at all, while the others stop buying widgets altogether..."
I find that biologists hate this. If I'm doing something in optics, and I want to know whether a feature on a cell is big enough to resolve with whatever lens and wavelength, I don't give a damn if the thing is 1 micron or 2 microns. Either way it's big enough. And if the thing is 10 nanometers or 20 nanometers, that's too small. So I'll say "OK, let's say it's 1 micron....." and start my calculation, and then a biologist will object and ask me to hold on and he'll ruffle through some papers and say "I think in my latest data some of them were 1.2 microns."
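The physics example above can be made concrete: the conclusion survives no matter what placeholder you pick, which is the whole point of making up a number there. A minimal sketch, assuming (as the comment says) a quantity that depends exponentially on the bond energy; the constant c and the starting energy are both arbitrary made-up values.

```python
import math

def rate(E, c=1.0):
    """A quantity that depends exponentially on bond energy E.
    Both c and the value of E fed in are made-up placeholders."""
    return math.exp(-c * E)

E = 5.0  # "let's suppose this bond energy is, I dunno, 5 eV"
doubled = rate(2 * E)
squared = rate(E) ** 2
# Doubling the energy squares the rate: exp(-2cE) == (exp(-cE))**2,
# so a value of 0.1 before becomes 0.1**2 = 0.01 -- regardless of E or c.
print(math.isclose(doubled, squared))  # prints True
```

Because the scaling relation holds for any E and any c, the made-up starting value carries no empirical weight; it is scaffolding for the proportionality argument, not a claim about the world.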
"I cannot believe people are attempting to rationalize situations where it would be ok to make up, bascially, data. Next thing you know we'll be talking about when it's ok to fabricate gels or photoshop microscope images."
Isis, I love you and your blog (in a completely familial, non-erotic way) but do you really believe this? That there is just a bright line of correct and verifiable and incorrect and corrupt, in every situation?
Do you forbid yourself from saying "I've told you a million times to clean up your room" to your child unless you have a verified tally of the previous occasions? Do you report a colleague to the dean's office if they do so? Is "It's a million to one chance" as off limits as "I'm 99% sure"? There's no risk at all of misinterpretation with these kinds of phrases; indeed, you would have to go to some rather absurd lengths to misinterpret them.
If you forbid all use of hyperbole, colloquialisms, set expressions and all the other facets of ordinary language, even when speaking informally, then you may as well give up communicating with non-scientists at all.
Janne, I love you too (but must we place constraints on our love?)!
But, you highlight two different examples. Would I say to my child as his mother, "I've told you a million times...?" Perhaps.
Would I ever say to a colleague with reference to data, "I'm 99% sure of something" or "there is a million to one chance of something" as a scientist without the data to support those assertions? Nope, never, ever, ever. It has nothing to do with corrupt versus verifiable. It's because in science there are particular mathematical constructs by which we can determine probability. If I have performed a statistical test on my data then I can say, "There is greater than a 95% likelihood that blah, blah, blah," and that has real, rigorous meaning. I think it is especially important, when I am trying to convey the impact of my work to non-scientists, that I am precise about the importance of my findings. That makes it much less likely that things will be misconstrued. I would argue that precision in language makes one a better communicator with non-scientists.
And that's the point of Janet's original question, isn't it? When is it ok for a scientist to make some shit up with regards to science -- not how we might speak to our children.
99.99% of native English speakers can tell the difference between this form of colloquialism and hard data.
How do you know that? Do you have a sample large enough to detect 0.01% effects? Well, do you? I mean, you used a number that was written out very precisely, with digits after the decimal point and everything, so I assume you're making a quantitative claim.
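There is a real calculation behind that jab. To have, say, a 95% chance of observing even one exception at a true rate of 0.01%, you would need roughly thirty thousand independent observations. A sketch using the standard binomial argument (the speaker-survey framing is, of course, hypothetical):

```python
import math

def n_needed(p, confidence=0.95):
    """Smallest n giving probability >= `confidence` of observing at least
    one exception, if exceptions occur independently at per-trial rate p."""
    # P(no exceptions in n trials) = (1 - p)**n,
    # so solve (1 - p)**n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# "99.99% of native English speakers" implies an exception rate of 0.01%:
print(n_needed(0.0001))  # on the order of 30,000 people sampled
```

Which is exactly why a number written to two decimal places invites the question: precision of that kind implies a sample nobody actually collected.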
@Isis -- I think you are giving Janet too much credit. Her original question was specifically not about statements made in scientific fora "with reference to data." The career-oriented advice and discussions conducted at DrugMonkey's joint are collegial, not scientific, discourse -- a fact that is implicit yet (I thought) fairly obvious. (Occasionally DM discusses specific scientific questions in his area of interest, and different rules apply.)
Not every utterance by a scientist, even made in the company of other scientists on a website labelled "Scienceblogs," is intended as a scientific communication. That would be an absurdity, although apparently this is what passes for philosophy of science these days. I think Janet would be well advised to go back and re-read her (later) Wittgenstein.
@Alex -- LOL. Let me go back and collect the data. Look for it to be published in Science sometime in 2015.
Isis, I would not use expressions like "one in a million" either - in formal situations, that is. Not in a paper, not in a lecture or presentation. In a conversation, however, I certainly would, since such expressions are part of normal conversational speech, and understood as expressions by its listeners.
It's the difference between doing a reinforcement learning experiment on dogs (and needing formal permission and training for animal research) on one hand; and training your pet dachshund to come when you whistle (with no permission or anything) on the other.
In formal, scientific situations we have specific restrictions as scientists. Once that +5 White Coat of Research Respectability comes off, however, we are under no further restrictions than anybody else. You don't need a special permission to keep a pet just because you're a researcher, and you don't need to refrain from well understood everyday forms of speech either.
I think pulling the numbers out of various places is ok in two circumstances: when it is clearly an estimate ("I don't know the numbers, but it's around 80%, I think") or when working out the consequences of that being the number ("Let's look at what would happen if, for example, it were 80%").
And that's the point of Janet's original question, isn't it? When is it ok for a scientist to make some shit up with regards to science -- not how we might speak to our children.
If that is the case, then the example she chose to illustrate the issue is inapposite. Both Rivlin's and Dave's "quantitative" assertions have nothing to do with science and everything to do with their own bêtes noires. And everyone reading those assertions knew to discount them as obsessive gibbering.
When some deranged wackaloon lurching down the sidewalk starts ranting that the CIA is broadcasting instructions to him through his tooth fillings on the 675.934 kHz radio frequency, I don't ask him to justify how he knows that it is 675.934 kHz and not 675.974 kHz.
Just added this on the previous post before seeing that the discussion continues here:
I think it is obvious that the original quote is a rip-off of Edison's
"1% inspiration, 99% perspiration".
I see nothing wrong in this kind of situation.
No, MartinB, Edison was clearly a liar and a disgrace to science.
OK, clearly there are two camps on this thread. And since I am specifically mentioned repeatedly by one camp (the minority), and although not by name, by the owner of this blog and post, I believe that it is time to "pull out of thin air" another proportion:
1. A majority camp (as of now at least 15 out of 19 participants) understands and agrees that in a discussion on a blog, Sb or other, the use of exaggerated figures to make a point is acceptable and normal.
2. A minority camp (2 out of 19) lets its emotions get the worse of it - carrying such accepted normal speech to an extreme, equating it with fraud ("The fabrication of data in this case was especially distasteful because it was done to minimize the importance of some very valid concerns had by a female commenter without actual data* to support the assertion. Funny behavior from someone who claims it is their mission to fight fraud.")
One would expect scientists to stay calm and unbiased despite their emotions -- unless, of course, they are using exaggeration, too, in making their point.
*One could thus understand that data maximizing the importance of said valid concern do exist and said member of the minority camp would present them accurately.
Sol, I'm not particularly interested in the apparently contentious question of what you (or Dave) contribute to the substance or tone of conversations at the DrugMonkey blog.
What I am interested in is the larger question of when, as you so nicely express it, "the use of exaggerated figures to make a point is acceptable and normal".
When discussions center on matters of fact, or when there's a disagreement about what the matters of fact are, I am still not seeing where exaggeration of figures makes a point -- at least, if the point is meant to be something other than "despite the fact that I can't be bothered to establish what the facts really are, I'm going to pull out a number to support my view". (Frankly, I think in some instances "exaggeration" may be a charitable way of describing the connection between the numbers being thrown into conversation and reality, given how little the people using them have tried to acquaint themselves with the relevant data.)
Indeed, in just the kind of disagreement you describe, when one camp says, "X is a very important problem" (where here you can cash out "important" in terms of high prevalence or high impact, depending on the context of the discussion) and another camp says, "No, X is not an important problem at all" (and this camp is concerned with the same sort of "important" as the first camp), what is in dispute is precisely the actual state of affairs with respect to X. In such a case, making up numbers rather than collecting data -- at least enough data to support a back-of-the-envelope calculation that both camps think reasonable -- is a decision to try to win the argument on the basis of rhetoric rather than the facts.
Is this really what you are advocating?
If you count me (because of the Edison remark) among those who think
"the use of exaggerated figures to make a point is acceptable and normal" - not really.
I think it is acceptable and normal when it is obvious to everyone that the numbers are just a figure of speech.
Otherwise, making up or exaggerating numbers is not correct, even in an informal context. So I'm happy with the 99% perspiration quote, but I would not be happy if you said that 99% of all scientists believe XY to be a serious problem, unless you have clear statistics on that, because then it would *not* be obvious that the numbers are not really meant as numbers.
I think that Janet erred when she used the original quote in that other post to make this point, but in principle I see her point very well and agree: As soon as a number may be interpreted as being a real scientific number, better make sure that it is.
First, if my attempt at making a point in defending my out-of-thin-air numbers will advance this discussion toward some understanding and maybe an agreement, then I feel that I have already done my part.
That said, there are issues on which no statistical data exist and the most we have is a "gut feeling" about where the data might fall if they were collected. The "gut feeling" is not an uneducated one. Frequently, we intuitively "know" where data on one issue or another would fall, and until such data are collected, the "gut feeling" is the only thing we have.
Allow me to elaborate, using the issue raised on DM's blog. Clearly, no data was presented, either by DM or any of the commenters, on the frequency of trailing spouses whose NIH grant applications go to the very same study section as their wife/husband. While statistics may exist as to the number of trailing spouses in academia (anywhere from 5% to 20%? someone correct me, please, if this is an exaggeration one way or another), how many of the trailing spouses work in the same field as their spouses, or in the same department, or the same lab? As we sort those estimates, we clearly see a trend toward a relatively small, probably very small, number of trailing spouses who submit their NIH grant proposals to the same study section as their spouses. Whether that number comprises 0.1% or 0.001% of all NIH grant proposals, it does not make a real difference in reaching the conclusion that it is an insignificant one, especially when we can assume that, of those, not all trailing spouses' proposals suffer from antagonistic, unfair study section membership.
Thus, under circumstances where specific data do not exist, but other obvious data can guide us to reach an estimate that is probably close enough to reality, making a point by exaggeration is as valid as the one made by Brian Williams on his NBC Nightly News, when he described the number of viewer e-mails sent to him as "tons of e-mails."
I can see your point of view regarding science, scientists and the importance of accuracy in scientific discussion among ourselves in scientific meetings, lectures and other scientific events, and when scientists communicate with the media. But surely, when scientists and others discuss issues on blogs, just because they are being discussed on SciBlogs, they are not necessarily scientific. Even the trailing spouse issue is not very scientific, no matter how important it is for Isis and how opportunistic it is for PhysioProfane.
Sol, thanks for your reply here -- it's helpful in understanding the point you are making.
The "gut feeling" you're describing here -- the rough guess at magnitudes (of frequencies are probabilities, of whatever) -- this actually comes from some kind of contact with the world, right? It's not completely internally generated, but rather drawn from what we might call (in circumstances in which we're inclined to be more careful) "anecdata", yes? So, it's not like someone estimates the number of same-field trailing-spouses from first principles. Rather, one makes an estimate from the actual instances one has seen plus some assumption about roughly uniform distribution of similar such cases through the relevant population as a whole. Right?
And in instances where you want some rough estimate of the situation rather than none at all, this may do the job. But, when someone else in the conversation comes to the table with a different estimate (likely generated from different anecdata in much the same way), is it more productive to fight about the competing rough estimates (and/or about the agendas they might bolster), or to recognize that better data might bring more light to the questions on which the rough estimates were hoped to provide guidance in the first place?
(I suspect we're not in significant disagreement about this, but if we are, I'd be grateful if you'd clarify the disagreement.)
I agree that not every discussion a scientist has is a scientific discussion. However, I expect that most discussions in which scientists are discussing career strategies or are mentoring scientific trainees ought to strive for connection to the actual state of affairs in the scientific community (and in universities, funding agencies, and so forth) -- at least, this ought to be the case if we expect the advice being offered to have a reasonable chance of success.
I understand that people disagree about the wisdom of seeking such reality-informed advice from blogs. So it goes.
1) There are times when the use of numbers in context should be taken as a joke, but those ought to be fairly obvious. If it's not obvious that the use of numbers in a particular case is a joke (by use of nonexistent numbers (*illion?) or overaccuracy in describing something other than chip manufacturing), then people are likely to take them as real (duh). If the figures being used aren't intended to be taken literally but cannot be easily distinguished from those that are, then you are either intentionally misleading people or unable to be clear in even basic writing. Dishonesty is supposed to be deprecated in all walks of life, and even if it isn't, it's a waste of time to discuss things with people who either don't mean what they say or can't be bothered to be clear, in even the most obvious ways, about what they are saying. (It's also possible that different people understand humor differently - but it would seem to be the writer's job to communicate their message as accurately as possible to an audience, and if the audience's understanding of humor is heterogeneous, the writing should reflect that expectation.)
2) People use both scientific backgrounds and numbers to give legitimacy to their opinions - they use their status as scientists to imply that they are smart and meticulous about their opinions (that they know what they're talking about), and numbers to imply quantitative knowledge of a phenomenon (rather than just anecdotal or qualitative knowledge). These are not always accurate implications (does being a chemist mean I know meteorology? does the number I have actually mean anything with respect to the variable I claim it's measuring, and if so, how accurate and reliable is it?), but they are presumably what is intended. If you trumpet your scientific background while pulling numbers out of various orifices (at least when you don't obviously intend them to be funny), you're using other people's work and reputations to justify an opinion that they may not hold and that their work may not support. It's usually considered rude to use other people's reputations in the service of an opinion you can't be bothered to support yourself.
3) If someone is using their status as a scientist to justify their opinion on something (climate change, for example) without either real data or discussion of basic methodological underpinnings, the needle on your BS meter should either enter the red zone or your BS meter should be considered broken.
If you forbid all use of hyperbole, colloquialisms, set expressions and all the other facets of ordinary language, even when speaking informally, then you may as well give up communicating with non-scientists at all.
If you can't communicate the very simple fact of the existence of uncertainty to a lay audience without condescending to us, then you may as well leave the task of science communication to those of us who know how to convey imprecise information accurately without "dumbing it down."
It seems to me that you are simply paying a bit too much attention to the numbers as opposed to the statements. I don't know what Sol posted, but taking the quote from Dave in your OP, Janet, would you really have an issue with it if he had instead said:
"Ultimately, 'strategies' like those espoused on this blog are almost irrelevant, and success is almost entirely based on simply doing good science and explaining a good plan well"
Where I've simply restated what he said with no numbers, but with the meaning that I think practically anyone would get from made-up numbers like 99.999% or 0.001%.
Now if you don't have a problem with this restatement, then all we're disagreeing about is just how obvious it is that statistics like 99.999% are made up. And fine, we can agree to disagree but I think most people would agree with me (if this blog is a good sample, looks like 90% or so ;)).
On the other hand if you still have a problem with it, because you think a scientist shouldn't make statements based on gut feelings (which that post was obviously based on), that's another matter. Here I would again disagree - a lot of science starts from intuition, gut feelings, and rough estimates. For us physicists in particular, being able to make a rough order-of-magnitude estimate of anything is considered one of the most important (and coolest) skills to have. So long as people don't claim that they have more evidence for their statements than they actually do, I think it's a good thing.
If you think this may be confusing to non-scientist bystanders who think that every word uttered by the holy scientists is ultimate truth, you may be right. Personally I'd like to think that non-scientists aren't silly enough to think like that, and if they are, it's about time for them to be disabused of such notions.
Coriolis, the problem is precisely scientists holding their gut-feelings to be as authoritative as other people's data (or to be superior to other people's anecdata-grounded gut feelings, in the absence of an examination of evidence that might let one make a more reality-based judgment). Expressing these gut-feelings as quantitative claims (without tagging them as gut feelings) just increases the chances that they will be taken as reflective of actual measurements.
Hunches and guesstimates are lovely as long as all parties to the discussion are clear that they are hunches and guesstimates.
Janet -- In the examples described, there is no evidence (or even anecdata) that any parties (other than yourself) were unclear about the nature of the discussion. Your perseveration on this pedantic and picayune point is puzzling and peculiar.
"I find that biologists hate this. If I'm doing something in optics, and I want to know whether a feature on a cell is big enough to resolve with whatever lens and wavelength, I don't give a damn if the thing is 1 micron or 2 microns. Either way it's big enough. And if the thing is 10 nanometers or 20 nanometers, that's too small. So I'll say 'OK, let's say it's 1 micron...' and start my calculation, and then a biologist will object and ask me to hold on and he'll ruffle through some papers and say 'I think in my latest data some of them were 1.2 microns.'"
Dude, that's messed up. Biologists love estimates... but why use an estimate when you have the actual number??
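The kind of back-of-the-envelope check described in that quoted comment can be sketched in a few lines. This is my own illustrative version, not the commenter's actual calculation: the 0.61 factor is the standard Rayleigh criterion, and the default wavelength (green light, ~550 nm) and numerical aperture (1.4, a typical oil-immersion objective) are assumed values I've chosen for the sketch.

```python
# Order-of-magnitude check: can a light microscope resolve a feature
# of a given size? Uses the Rayleigh criterion d_min = 0.61 * lambda / NA.
# Wavelength and NA defaults are illustrative assumptions.

def min_resolvable_size(wavelength_nm: float, numerical_aperture: float) -> float:
    """Rayleigh resolution limit, in nanometers."""
    return 0.61 * wavelength_nm / numerical_aperture

def resolvable(feature_nm: float, wavelength_nm: float = 550.0, na: float = 1.4) -> bool:
    """True if a feature of the given size is at or above the diffraction limit."""
    return feature_nm >= min_resolvable_size(wavelength_nm, na)

# The limit comes out around 240 nm, so a ~1 micron feature is comfortably
# resolvable and a ~10-20 nm feature is hopeless; whether the feature is
# 1.0 or 1.2 microns doesn't change the verdict at all.
print(resolvable(1000.0))  # 1 micron
print(resolvable(1200.0))  # 1.2 microns -- same answer
print(resolvable(15.0))    # ~10-20 nm
```

Which is exactly the physicist's point: the answer is robust to anything within an order of magnitude of the estimate, so refining 1.0 to 1.2 microns buys nothing.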
"'There is greater than a 95% likelihood that blah, blah, blah' and that has real, rigorous meaning. I think it is especially clear when I am trying to convey the impact of my work to non-scientists that I am precise about the importance of my findings. That makes it much less likely that things will be misconstrued. I would argue that precision in language makes one a better communicator with non-scientists." But dearest Dr. Isis... what is the "real, rigorous meaning"? Is it that, in the case where the sample we chose is representative of the total population and the variable follows a normal distribution, we can establish using a parametric test that there's a 95% chance of blah blah blah? Or is it one of the hundreds of other possible things it could mean?
Saying there's a 95% chance of something, without giving any information of how that number was derived, is never, ever, ever "precise".
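To make the point concrete, here is a toy sketch of my own (simulated data, not anything from the discussion): two perfectly legitimate procedures applied to the same sample produce different "95%" intervals, so a bare "95%" with no methodology leaves the reader guessing which one was meant.

```python
# Two reasonable "95%" intervals for the same data, resting on different
# assumptions. Data and procedures are illustrative, not from the post.
import random
import statistics

random.seed(0)
data = [random.lognormvariate(0, 1) for _ in range(50)]  # deliberately skewed

# Normal-theory interval: assumes the sample mean is approximately normal.
mean = statistics.mean(data)
sem = statistics.stdev(data) / len(data) ** 0.5
normal_ci = (mean - 1.96 * sem, mean + 1.96 * sem)

# Percentile bootstrap: resamples the data, no normality assumption.
boot_means = sorted(
    statistics.mean(random.choices(data, k=len(data))) for _ in range(2000)
)
boot_ci = (boot_means[49], boot_means[1949])  # 2.5th and 97.5th percentiles

print("normal-theory 95% CI:", normal_ci)
print("bootstrap 95% CI:   ", boot_ci)
# The two intervals disagree because they encode different assumptions --
# which is why "95%" on its own, with no derivation, is not precise.
```

Neither interval is wrong; they just answer subtly different questions, which is the commenter's point about the "hundreds of other possible things it could mean."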
"A minority camp (2 out of 19) lets its emotions get the worse of it -"
Well you can create a new camp (n=1) that lets its emotions get the worse of it and says FUCK YOU WITH A GOAT.
"So long as people don't claim that they have more evidence for their statements then they actually do, I think it's a good thing." On the one hand, I enjoy plugging in numbers and estimating to figure things out (at least within an order of magnitude).
On the other hand, I think it's fair to note that on social sciencey topics, what 'everyone knows' is quite frequently spectacularly wrong, and some of the most interesting research starts with finding out how so many of us got an incorrect 'gut feeling'. It's the difference between plugging in values to see if the unknown you get makes any sense and making sure the equation is set up to reflect reality, and plugging in values to get the 'unknown' you secretly want to justify.
"Well you can create a new camp (n=1) that lets its emotions get the worse of it and says FUCK YOU WITH A GOAT."
becca, for a while I was with you, following and agreeing with everything you have stated, until I realized that somewhere you dropped your sense of humor and decided to flash the above-quoted pearl. If you had kept your sense of humor you would have realized that I qualified my division of commenters into two camps thusly: 'I believe that it is time to "pull out of thin air" another proportion.'
S. Rivlin- I reserve the right to drop my sense of humor when people denigrate expressing emotions. For one thing, it's often (although by no means always) thinly veiled sexism and probably best avoided for those connotations alone. For another, emotions are useful. For a third (and probably most influentially, truth be told), it's just personally irritating to me. Because I, ya know, have emotions and stuff.
That said, I don't think I dropped my sense of humor. I just think my sick and twisted sense of humor is highly tickled by the idea of you being violated by farm animals.
Speaking as a former mathematician, it's not okay to use "99%" to mean "I'm really damn sure", just like a biologist would not think it was okay to say that a "lizard is a mammal because they're both animals, wait I misspoke, oh you know what I meant!"
People do it, sure. I've done it, and I'll do it again, it's a common enough conversational gambit that people pull it out of their brain automagically. It's probably not significantly ethically dishonest to do it. It's still sloppy thinking. If you say 99%, mean 99%.
One of the meta reasons people in this country have bad math and science skills is that people in this country don't respect math and science skills... and that includes scientists and mathematicians. Say "one in a gazillion"; that's a made-up term and contextually conveys what you're trying to say.
I think this whole discussion is getting slightly diverted by the inclusion of obvious jokes and exaggerations, like the Edison quote (where nobody could seriously imagine that it was a scientific assertion, and it's almost meaningless to imagine a study determining that it's actually 7% inspiration). These sorts of statements are pretty obviously harmless but not really the issue here.
Here's a test case. Suppose I've just completed a study of computer viruses, and some reporter asks me who writes them. My study didn't address this, but I'm pretty sure they are mostly written by maladjusted teenagers and I know some examples of this, so I reply "Well, in my experience 90% of them are written by maladjusted teenagers." Even though I qualified it by saying this was just in my experience and I intentionally used a suspiciously round number, my use of a percentage could be misleading and suggest greater scientific value than my remarks deserve. Certainly, all honest scientists should avoid doing this.
I can sympathize with the other side, though. I have a strong emotional reaction against arguments that sound like "Oh no, people are abusing the sacred figures! They must not be allowed to contaminate our pure statistics with their dirty lies." The problem is that I believe a nonnegligible fraction of real scientific statistics are pretty much garbage. Some published studies are really shoddy, with poor experimental procedures, confounding variables that haven't been identified yet, etc. Sometimes these poor studies get the most attention, either because the results are surprising or because the problems are important and nobody has been able to do a good study yet. In reality, there's a whole continuum, from statistics that are totally made up to those that are very convincing, and there's no clear or principled line between good and bad statistics. When I hear someone make an argument that implicitly draws such a line, it gets on my nerves. Of course entirely made-up statistics are bad, but I don't like the implied suggestion that other statistics are necessarily good.
Still, I agree that it's easy to mislead people by making informal statements that could be given too much scientific weight, and honest scientists should be careful not to do so, even by accident.
"Scientist" is a professional title like "doctor" or "director of the CIA." If you're talking to your wife, you can do it as a human being. If you're yelling at your son to clean his room, you can do it as a human being. If you're griping over a pint with your friends, you can do it as a human being. As soon as you move out of your intimate circle, you cease to be a human being for the purpose of communications. You're a professional unless you specifically request to be otherwise ("off the record").
If that cramps your style, tough luck. If someone asks you about global warming after finding out that you're a scientist, you're on the clock. Are you an expert in climate change? If so, great, tell them what you know. If not, you explain that you aren't expert in this area but (one of:) by professional courtesy you assume that consensus in other fields has the same validity as in yours/you really have no idea of the validity of that consensus. For instance, I currently withhold my professional courtesy from economics, and I'll gripe about this to my close friends who know that I'm cranky, but on the record I do not consider my opinion professionally informed.
So my short answer is that a scientist is never justified in pulling numbers from thin air. On the other hand, as a number of examples above have shown, it's important to distinguish the two cases of a logical proposition A being true and a logical proposition A=>B being true.
What about situations in which one has numerous data values, say in m dimensions, where m > 20, with n >> 20 observations for each of the m dimensions (i.e., a highly overdetermined system with an extremely small root-mean-square error among observations)? Such matrices can be highly ill-conditioned, even though the observations themselves are extremely precise and perhaps even highly accurate.
Under such circumstances, isn't an exaggeration nearly as good as any other value, since small perturbations of the input values, however precise (or accurate), lead to very large differences in the estimated values? That is, while the input data may be both highly precise and accurate, the inferences that can be drawn from them are not. Is it not justified, or even useful, in such circumstances to draw some inference rather than none, if such inferences are drawn from the best available data?
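The sensitivity being described can be seen even in a tiny toy case (my own 2x2 example, not the commenter's m-by-n system): when the rows of a system are nearly collinear, a perturbation in the fourth decimal place of the right-hand side moves the solution by order one.

```python
# Ill-conditioning in miniature: a nearly singular 2x2 system where
# "precise" inputs still yield wildly unstable outputs.

def solve2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

# Nearly parallel rows -> determinant of only 1e-4.
x1 = solve2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0001)
x2 = solve2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0002)  # right-hand side nudged by 1e-4

print(x1)  # roughly (1.0, 1.0)
print(x2)  # roughly (0.0, 2.0) -- a 1e-4 input change, an order-1 output change
```

So the commenter's premise holds in cases like this: when the condition number is large enough, the solution carries far less information than the precision of the inputs suggests, and any honest inference has to report that instability rather than a single confident number.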