Is (are) data singular?

To me, few things are more annoying than someone who nitpicks about grammar. Grammar is important, to be sure, but how much does it really matter if your sentences are grammatically "correct," as long as your message is communicated clearly?

Michael Bach recently emailed me lamenting that often reviewers comment that "the English could be improved" in his papers. That comment could be made about at least 99 percent of all papers published, but what does it help? If a reviewer can't point to a specific instance where the language is unclear, why make the observation in the first place?

But even when comments are specific, they're often not useful at all. Case in point: "data." Is "data" a singular or a plural noun? Purists say that data is the plural of datum, but Mike Kellerman questions that notion:

There are a couple of problems with the "data is the plural of datum" story. (These have been discussed widely on the web, and I'm drawing freely on those discussions). First, it is not quite right even in Latin to say that "data" is the plural of the singular count noun "datum"; both are conjugations of the verb dare, to give. Second, in English, we hardly ever refer to one piece of data as a datum; at least in political science it is an observation, a case, or perhaps a data point. When the word datum is used, it usually has a specialized meaning and takes the plural form "datums."

But here's the killer example:

The bigger problem, from my perspective, is that fully adhering to "data" as a plural count noun forces you into constructions like "How many data are enough?" instead of "How much data is enough?" The first of these, "How many data are..." is correct for a plural count noun, while the second, "How much data is..." is appropriate for a mass noun such as "gold" or "water." The second sentence sounds much better to me. It also wins on a Google Scholar search by a margin of 10 to 1 (2120 to 198).

So even though people claim that data is plural, they actually use it as a singular "mass noun." What's "correct"? The "proper" form, or the way the word is actually used by people? More importantly, either form is perfectly comprehensible by anyone reading it. It just doesn't matter.

Getting back to Michael Bach's complaint, I can certainly empathize with him. I can speak a little French and Italian, but I can't imagine putting together much more than a paragraph or two in either language, much less an entire scientific paper (Michael has published over 150!).

That's not to say that papers written by non-native speakers aren't sometimes a little more difficult to understand for us native speakers. In general, though, I find that criticizing grammar for the sake of "correctness" rather than understanding is condescending at best and counterproductive at worst. Native English speakers are at a tremendous advantage in the scientific community today since nearly all "serious" science is published in English.

Bach has a modest proposal for native English-speaking scientists: All U.S. and U.K. scientists should pay into a fund to support English language tutoring for the remainder of the world's scientific community. It would be a small price to pay for the convenience of having everyone else adapt to our language.


...few things are more annoying than someone who nitpicks about grammar.

So ... I improve the writing, you point out errors, and he nitpicks, right? :-)

I suggest that the entire data/datum exposition is a red herring. You're not talking about someone whose writing contains "grammatical errors"; you're getting into the prescriptivist/descriptivist debate. I think it's obvious beyond the need for debate that if enough people use a construction one way, that way will become near-universal and will be (sooner or later) deemed to be "correct." I suggest that you demonstrate that data/datum is there already.

You state what I believe to be the core of the argument early on:

...does it really matter if your sentences are grammatically "correct," as long as your message is communicated clearly?

Clear and crisp communication is certainly the goal. But grammar (and spelling and sentence/paragraph structure) do matter in a subtle manner - they cause some (many?) readers to "hiccup" as they read, and to go back over things. Result - a certain increase in time spent and a possible decrease in comprehension.

I have read of research (but cannot give a reference) that seemed to show that grammatical/spelling errors did indeed have such effects.

And my final proposition - let's say I'm reading something which is perfectly clear to me as to meaning, but has grammatical and spelling errors. If it's something like a job application or a project proposal where my overall opinion matters, then my opinion will be lower for that instance than for the one without such errors. Is this fair? Assuming that this is not an ESL situation where allowances should be made, then I believe it is. What's more, it's almost inevitable, I think.

By Scott Belyea (not verified) on 01 Oct 2007 #permalink

As an all-too-frequent reviewer, I would say that I make a negative comment about the English of a paper in around 5% of reviews. I estimate that around 70% of the papers I review come from non-native speakers of English. If Michael Bach frequently gets such comments, maybe he *should* consider his options, such as a final proof-read, or asking a colleague to help. Of course it is hard to write in a language which is not your own, and of course I'm glad I don't have to. However, it has to be done, and should be done well. Very many authors do it well (in that their papers are clear to read, even if they may not be natively idiomatic). To criticise a reviewer for not providing examples is beside the point. Generally, it is clear (for example, from the unevenness of the writing) that the author is perfectly capable of improving their own English if they choose to invest the effort in doing so; an example would not provide extra information, and to expect the reviewer to give every correction for a paper which has an error or infelicity in most paragraphs is unreasonable.

By Mathematician (not verified) on 01 Oct 2007 #permalink

an example would not provide extra information, and to expect the reviewer to give every correction for a paper which has an error or infelicity in most paragraphs is unreasonable.

Surely reviewers have more choices than simply saying "the English could be improved" or documenting every example of poor grammar/usage. And while I can't speak for Michael, I'm certain he spends a great deal of time working on his English skills.

As someone who has taught writing to ESL students at both the undergraduate and graduate level, it's clear to me that many very bright students, despite significant efforts, find it very difficult to improve beyond a certain level. This isn't to say that they shouldn't continue to work on their writing, but without specific examples, they have no hope of improving: what they've submitted represents their best efforts.

But I suspect the reason that you're not sympathetic with Michael Bach is that we're really talking about two different things. There are some documents written by non-native speakers that are indeed very difficult to understand, so laced are they with grammatical and other errors. You're right -- these need to be fixed, and it's not the reviewer's job to do it. But there are lots of other papers with minor errors, ones which don't impact the overall meaning, and while they might slow the reader down a bit, aren't a big problem.

If the paper's otherwise acceptable and there are only a few minor errors in grammar or usage, then I think the reviewer should provide a few examples to show the writer the types of things he or she should be looking to fix.

There's no obligation on the reviewer's part to document every single error, but I agree with Michael that saying "the English could be improved" without examples is irresponsible reviewing. You'd never say "the research could be improved" or "the graphs could be improved" without providing examples, right?

Thank you for addressing the grammatical dispute around "data"! I try hard to follow the rules of grammar, but I still have trouble treating "data" as singular - probably because it just sounds so wrong. The "mass noun" explanation makes a lot of sense.

I'd say "the English could be improved" if I was pretty sure the author could tell how. And yes, it happens all the time. Waste of my time to spend time the author couldn't find to do a job the author could have done. I agree that's not all authors, of course.

By Mathematician (not verified) on 01 Oct 2007 #permalink

few things are more annoying than someone who nitpicks about grammar.

Is that a subtle troll? People aren't things- but this is a problem with your word choice, not your grammar.

And echoing Scott from above,
...does it really matter if your sentences are grammatically "correct," as long as your message is communicated clearly?

Yes, it does matter- and I think you have stated a contradiction as a premise. It is impossible to communicate clearly without using proper grammar. Language has a set of rules. If you don't follow the rules, you can't help but confuse readers who expect the rules to be followed.

Written language is by necessity more formal than spoken language. Writing must convey all of the intent and nuance that a speaker might be able to get across with non-verbal cues. Likewise, effective writing must avoid the ambiguity that a speaker can resolve through repetition or rephrasing.

Think about human language as though it is a computer language. When my brain is parsing a sentence, it is looking for patterns among the words to define their relationships to one another. Grammar is a set of rules which helps the reader understand those relationships. It is possible to communicate without using proper grammar, but abandoning the rules makes comprehension a more difficult task for the reader.

It's bad enough that even if you DO follow the rules you can still end up with colorless blue ideas sleeping furiously. But simple things like verb-subject tense agreement make writing easier to read and easier for the reader to understand. And it's not difficult to use proper grammar- there are clear rules to follow, set down in black and white, with answers for nearly every question you might have. The Chicago Manual of Style is widely available and costs less than $30 at Amazon.com.

Since rules exist, and are available to everyone who cares to look them up, and readers expect writers to follow them, the only explanations an author can offer for failing to write with proper grammar are laziness or lack of concern for the reader. If those are the impressions you want your writing to convey to your audience, using poor grammar is a great way to get the job done.

Yeah, well if there's one thing that annoys me, it's people who use "data" as a singular! And I'm not talking about "How much data do we need;" that seems to me a special case and not bothersome. I'm talking about "This data shows..." and "The data is clear..." Sorry, those constructions are just wrong.

Medium/media
Stratum/strata
Phylum/phyla
Flagellum/flagella
Cilium/cilia
and...
Data/datum

If you don't care about "grammar," then I don't want to read your writing.

By Sven DiMilo (not verified) on 01 Oct 2007 #permalink

The title should be: Is "data" singular?
"Are" shouldn't be used in this special case since we are referring to the word "data" and not real data.
Anyway, I don't really notice - or care - about "data" grammars; I do notice other forms of mistakes sometimes, though.

Allow me to be the first to point out (sheepishly) my error above:
"Yeah, well if there's one thing that annoys me, it's people..."

Hoisted by my own petard!

By Sven DiMilo (not verified) on 01 Oct 2007 #permalink

The data/datum issue is a lost cause for the descriptivists. Data has become a mass noun in meaning, at least in America. Both Merriam-Webster & American Heritage list the singular as acceptable. OED still has plural only, but they've always been a bit stodgy.

Face it, languages change. And this is a good thing, as the world they're describing is constantly changing too. I'm certainly not the only one for whom "google" is now a verb, am I?

Sure, it gets messy for a bit now and then, as anybody who's struggled over what to use as a neuter 3rd person singular pronoun knows: (s)he? he/she? hir? ze? (I like "they"). But the rules get settled on fairly quickly -- if nothing else because the speakers of the old dialect simply die off eventually.

Personally, I'm happy for language change; it keeps things interesting for us psycholinguists. But I still have my idiosyncratic dislikes: most notably, 'dove' changing to 'dived' and the local "this car needs washed" construction drive me up the wall, but I'll live.

Steve @8/10:

Don't worry. "People who do X" is obviously a unified group and English treats groups as singular for verb agreement* (e.g., The reading group meets weekly NOT meet weekly) so you are certainly justified treating that set of people as "one thing" (the set) that annoys you.

Of course, this is really a mass/count distinction which is what underlies the data/datum debate, so you're still in a bit of trouble.

*American English at least; I think the other side of the pond differs on this issue?

err Sven, not Steve. Sorry.

Epidemiologists habitually use "data" as a plural. Affectation or not, we do it all the time and then "data are" stops sounding weird. It's more a matter of training your ear. The general rule I learned was it could be either but you need to be consistent in whatever it is you are writing -- either plural throughout or singular throughout.

I was once married to an epidemiologist and I know them as very precise people. But "data" as a plural is NOT an affectation! It's correct! I privately judge people by how they use this word (also by how they pronounce "long-lived"). If that makes me a pedant, so be it.

breton (#11): Don't you mean it's a lost cause for prescriptivists?
"Media" I'll grant as a lost cause, but for "data" I will toe the line (at least in science) until I die.

By Sven DiMilo (not verified) on 01 Oct 2007 #permalink

Since rules exist, and are available to everyone who cares to look them up,

Here is a problem. How do you discover mistakes if you don't know the rule? Which rule applies in a specific situation? Either you have to relearn the language with complete grammar, or you are in Bach's situation where he possibly would want to make incremental improvements but isn't offered suggestions.

I know that I could never make that effort as grammar rules makes me fall asleep. Every person have different learning styles, and I just don't know how to learn a language that way. With some difficulty I can point to a subject or a verb as the concepts have been explained to me often enough. But I can't remember actively using them. (And I'm afraid that is quite obvious. :-P)

I think you can turn that around. If the author has made every effort to make himself understood to the best of his current abilities, why shouldn't an attentive listener or reader point out remaining mistakes that are severely annoying? In most contexts (some of) the readers do have feedback options, such as on the web or paper review/book publishing.

Communication goes both ways.

By Torbjörn Lars… (not verified) on 01 Oct 2007 #permalink

Related pet peeve:

Scientists (especially psychologists, in my experience) who pronounce "processes" as "pro-cess-ESE" as though it were the plural of "processis" (as "theses" is the plural of "thesis"). Those classical language plurals wreak their havoc, don't they? -- in this case by hijacking a nearby word.

By Eric Schwitzgebel (not verified) on 01 Oct 2007 #permalink

Data is a mass noun. Datum actually seems to be something else -- sometimes it means a single piece of data, but it also has a secondary meaning of "reference point" that data doesn't have. They are almost, but not entirely, two separate words. To those who say it "should" be otherwise, I can only say that hardcore prescriptivists tend to come off as pedantic jerks. A certain amount of prescriptivism is necessary to keep people able to communicate with each other (flout vs. flaunt, anyone?), but for the most part prescriptivists tend to be quite a bit behind the curve on language.

There's a reason classical languages are frequently replaced with vernacular languages or koines as cultures age -- literary/liturgical languages eventually tend to diverge so far from the spoken language that unless it's strictly enforced by a governmental authority (China being a prime example), the literary languages cease to be an effective means of communication. An example would be Charlemagne's huffing early in his reign about the "debasement" of Latin in 8th century France -- he was originally quite distraught over it, but by the end of his reign he had thrown in the towel, and the first legal documents in French appeared as two of his sons signed a treaty in vernacular Old French and Old High German in Strasbourg to oppose the third.

Eric:

/'prosesiz/ is not that egregious -- think about it. You've got three sibilants in close proximity. While it's definitely nonstandard, converting the schwa to a front vowel makes perfect sense as a hypercorrection in what, frankly, is a decidedly mushy word.

You've got three sibilants in close proximity

Now you'll probably get another thread started with people complaining that "close proximity" is a tautology.

By Chris Noble (not verified) on 01 Oct 2007 #permalink

me compltesly agree w/ u wen u say wot dos it matre 4 sentence be grammar correctly, for long as yor massage is spoken clearlee. me thing gramar nazi is bad as pincturation facist. u understand me prefectly cleerlee, know?

By Grammar Nazi (not verified) on 01 Oct 2007 #permalink

Dave Munger:

...minor errors, ones which don't impact the overall meaning, and while they might slow the reader down a bit, aren't a big problem.

Well, I really disagree with this. With all the stuff I need to read each day, if one item has grammatical errors that slow me down, I resent it. Unless reading it is mandatory, I may just drop it part way through and move on. Life is too short and too busy to allow the rude shortcomings of others to affect me in this manner.

Rude? Yes. If you can't take the time to learn to do it correctly (even to the extent of using a grammar/spell-checker), why should I be happy to "invest" more of my time than ought to be necessary to compensate?

Again, allowances should be made for genuine ESL issues.

By Scott Belyea (not verified) on 02 Oct 2007 #permalink

Sven @15: Just so you know, I privately judge people on the criteria on which they privately judge people.

As a reviewer, I have scored papers low (major revision) and made comments like "The English needs to be improved." This occurs when there are so many grammatical errors that I had trouble understanding. If there are only a few grammatical errors, I will point them out to the authors. Otherwise it is a huge waste of time.

There's nothing wrong with change. Regardless of its history, the word "data" (as it is used in everyday life as well as academic writing) has clearly evolved into a word that can be either plural or singular, like "moose" or "deer." This is natural language change at work.

When I wrote my PhD thesis, many years ago, the one and only grammar rule that I was officially told to follow is 'the word "data" is plural'.

Well, I thought that was stupid because obviously it isn't in many cases. But I didn't have to worry about it, because my field was mathematics and I didn't have any data.

By Paul Clapham (not verified) on 02 Oct 2007 #permalink

I'm a classics professor and teach Latin and ancient Greek: "data" is formed from the 4th principal part of the verb "give," do, dare, dedi, datum. Datum is a substantive, a noun formed from the 4th principal part, which is a perfect passive participle ("having been given"). Datum being a 2nd declension neuter noun, data is the plural form.

As you may have guessed, I say "the data are....."

Something in my brain rebels against starting a word with the same vowel sound as the end of the word before. "The data are" is just hard for me to say! Now, this shouldn't have any direct bearing on whether I should say "the data occupies 1.22 gigabytes of hard-disk space" or "the data occupy 1.22 gigabytes of hard-disk space", but I wonder if it doesn't bias me in favor of the former construction.

Now you'll probably get another thread started with people complaining that "close proximity" is a tautology.

Or at least a pleonasm.

What's "correct"? The "proper" form, or the way the word is actually used by people?

well, you've constructed a straw man here, as a quick check of my Webster's gives both usages as common and correct. as an editor, I check merely that the usage is consistent (like anything else) throughout the paper. (and those who claim to judge people one way or the other on this usage reveal their own ignorance.)

similarly, I'd argue that this sets up a false dichotomy:

how much does it really matter if your sentences are grammatically "correct," as long as your message is communicated clearly?

people often think that they're communicating clearly, because, of course, their intent is clear in their heads; however, a loosely constructed sentence can easily have two equally supportable interpretations, leading to all kinds of confusion for the average reader. (in fact, a lab that I worked in once discovered just such an ambiguity in a paper that *we* wrote, when a couple of years later we wanted to use or reproduce the result but weren't sure which of two meanings was the correct conclusion. eep.)

(p.s.) there is a solution to a reviewer's remark about language: good journals have copyeditors that will polish the English so that one can hardly tell the difference between native and foreign authors. then, the reviewer's remark helps separate "logic weak" from "language needs clarification" -- quite different in terms of whether the paper gets accepted!

I mapped a black hole once.

My data were singular.

By El Christador (not verified) on 02 Oct 2007 #permalink

If you're teaching Geology, and you say "This strata is Ordovician," it's wrong. If you're teaching Biology and you say "This flagella is of the 9+2 type," it's wrong. If you say "This worm belongs to the Phyla Annelida," it's wrong.

And if you say "This data is..." it's wrong. Sorry, it's wrong. I don't care how most people use it; a lot of people are stupid. Others should know better.

But then, I'm an insufferable pedant. *shrug*

By Sven DiMilo (not verified) on 02 Oct 2007 #permalink

re copy editors:
A mediocre one is worse than none at all. I have had pitched battles with copy editors that just want to insert their preferred phrasing even if it doesn't read as well. Once I used the Latin phrase "per anum" (through the anus) in a paper, appropriately (don't ask), and an editor changed it to the nonsensical (in that context) "per annum," AFTER I had seen the proofs. That one really pissed me off; it made me look foolish in print because of somebody else's error.
So maybe copy editing only for articles that really need it?

By Sven DiMilo (not verified) on 02 Oct 2007 #permalink

I'd just like to chime in here and suggest that using Google as a tool to sort out debates about proper usage is a great thing, b/c so many of them rest on the question of relative prevalence. Time and again, it seems to me, these debates over the a priori basis for deciding who's right find that no one side can be shown without a shadow of a doubt to be correct. OTOH, an a posteriori appeal can give an unambiguous answer. After all, aren't these grammar rules all post hoc in any case?

http://www.google.com/search?hl=en&q=%22data+is%22&btnG=Google+Search

http://www.google.com/search?hl=en&q=%22data+are%22&btnG=Search

Sorry, people, but "data is" wins.

By boojieboy (not verified) on 02 Oct 2007 #permalink

Sven, regarding copy editors, I have to agree that "experts" who don't know the limits of their expertise are sometimes worse than useless. Considering that, you might want to listen to the people who are pointing out that the Latin word pairs you're citing in support of using data as the plural of datum don't really apply. Think goose/geese vs. moose/meese. English is a weird and wonderful language.

As a copy editor, my primary job is to see that the author's meaning is conveyed as clearly as possible. Frequently this is done by following the rules of grammar. Sometimes it means not letting the rules torture poor sentences to death. I'll happily let a preposition dangle if the only way to support it is to write something essentially backward.

By Stephanie Z (not verified) on 02 Oct 2007 #permalink

I didn't mean to impugn all copy editors; good ones are, well, good.

"Goose" (and "geese") is Indo-European via Old English; "Moose" is Algonquin, so that's a poor comparison. And the so-called "sportsmen's plural" for game species is an entirely different kettle of fish. Hey, there's an example: "fish" for a bunch of individuals, "fishes" for a bunch of species.

The Latin word pairs I'm citing do apply, perfectly. The fact that a singular "data" is commonly used and even accepted is the result of laziness, pure and simple.

Sure, OK, language evolves; especially the gemisch of modern English with all its myriad borrowings. I accept that. But laziness and ignorance are poor reasons to permit such evolution. What if people started saying stuff like "This numbers is less than that one"...would that be OK with you just because people were using it that way?

By Sven DiMilo (not verified) on 03 Oct 2007 #permalink

Sven,

Agendum/Agenda is missing from your list. Agenda is generally considered a singular noun, right? Because its meaning has grown. Agendum is something to get done. Agenda is the plural, or a set of things to get done. But now we refer to this set, or list, as a singular object, a mass noun. Why? I would hazard that the agenda can have properties that any specific agendum doesn't (e.g., "The agenda is long"). Same thing with data. An individual datum probably doesn't tell you anything. But put them all together, and the data can inform.

Point being that language should evolve, not haphazardly, but to allow for the most precise expression of thoughts. ("This numbers. . ." makes nothing more precise.) And I think the advent of the computer age allows for a singular "data".

I am just speculating on why agenda has become singular. But I wonder if all of the "these data are" people have an "agenda" agendum.

"the data can inform"
And in fact it is often true that data inform.
But not that data informs.

Still, your point about "agenda" is well taken. It's interesting. I think I am going to try resurrecting "agendum," as it's more pithy than "agenda item." (I would similarly argue that "datum" is preferable to "data point.") And I think I'll try "agenda" as plural too..."the agenda for our meeting are posted on the website"...I kind of like it.

I don't think that using "data" as a singular connotes any special properties that are lost by using it as a plural, though.

By Sven DiMilo (not verified) on 03 Oct 2007 #permalink

I agree that certain words have lost their original "correct" usage in common language (e.g. data becoming singular or plural). I disagree that grammar doesn't matter, and that the correction of it is "condescending at best". Simple grammatical errors that don't change the meaning of a sentence, but disrupt the flow, are still important to correct. Everyone makes mistakes, but to "not care" to fix them shows laziness and ignorance. Of course ESL writers should be afforded more room for error, but academic, English-language speakers have no excuses for incorrect usage of their own language.

This is, perhaps, the most hilarious comment thread I've read at scienceblogs. Enough so that it's made me comment, when I'm usually just content to be enlightened, or at least to have some thoughts provoked.

The one point I don't really think has been made clearly -- 'data' is English, not solely Latin. Latin grammar really has no bearing on how it's used in English. (And yes to someone's question about 'group are' being standard British English. People mass nouns get treated as grammatical plurals. I'm not sure about non-people ones. Also, 'dangling prepositions' are perfectly legitimate English, and pretty much always have been. Just because a dangling preposition was literally impossible in Latin doesn't mean it can't be perfectly grammatical in English. And everyone should go read Language Log instead of having silly [but hilarious!] arguments about data/datum and whether data is a count or mass noun.)

Sven,
"the data can inform" is independent of the number of the subject. I can inform you as well as we can inform you.

Still, I am glad you are taking up the agendum/agenda cause. To it you can add the stamen/stamina and medium/media causes. I am sure the media are going to be happy to cover these agenda of yours, given that your stamina for such discussion are strong.

It doesn't matter if the word was plural at some point in time, even at that etymological point. It matters if the word, as it is currently used, is used to draw attention to one thing or many things. Media refers to one thing with many parts, as does agenda or crowd. By using them in the singular, you are emphasizing the notion of a whole as opposed to a sum of parts. "His agenda is devious" implies not so much that each agendum is devious, but that taken together they are devious. Saying "His agenda are devious" implies that each agendum is devious. I will grant the point, though, that agendum is pithy.

Here is something to ponder; why not both, where usage is determined by what you are trying to emphasize? Then using "data" would no longer cause a boring "wrong or right" discussion, but a much more interesting and subjective style debate.

Sure, OK, whatever. You use your usage and I'll use mine and we'll let the language evolve whither it will. *shrug*
If nothing else, this comments has been interesting.

By Sven DiMilo (not verified) on 03 Oct 2007 #permalink

Interesting that this post generated more commentary than any other that I've seen. First, I'll say happily that I'm a descriptivist--whatever your feelings may be, "data" is widely perceived by most English speakers to be a mass noun. That is the status of the word in ENGLISH. In LATIN, the story is different, so saying:

First, it is not quite right even in Latin to say that "data" is the plural of the singular count noun "datum"; both are conjugations of the verb dare, to give

is just wrong, and not only because nouns can't be "conjugations" of a verb--Latin freely used (as do French, Spanish, Italian, etc. today) parts of verbs, especially participles, as nouns (datum means literally "a given thing," amanda means literally "a woman who must be loved" and so on). Here's the problem: for many reasons, none of them good, pedants of the English language have insisted on confusing Latin and English. You may not split an infinitive ("to boldly go...") because it was impossible to do so in Latin since the infinitive form of the verb was only one word (how do you split amare?) Likewise, you are not supposed to end a sentence with a preposition because it was unconsciously and intuitively unnatural to do so in Latin (as it still is today in Italian). But here's a shocker: Latin and English are two very different languages, and importing grammatical rules from one into the other is silly and pretentious--and will always eventually fail, since the real grammar of English says there's lots of words I can end my sentence with. Even prepositions.

People are no longer regularly taught Latin in school, and so educated speakers and writers tend to nervously hypercorrect. I'll give three examples, all of the same type. Many Latin words end in -us, like fungus, and we obligingly learned to say "fungi" and "cacti" in grade school, just like the Romans did. However, many Latin words that end in -us don't end in -i in the plural. Will you all be distraught to learn that the Latin plural of syllabus is just syllabus? (The only sensible English plural would be "syllabuses," but as a good descriptivist I have to say that "syllabi" is common enough to be correct--in English). I actually had to make a plural out of "nexus" the other day, in front of a classicist no less, and I blurted in a panic "nexi?" No, the Latin plural of nexus is just nexus--but we should just say "nexuses" (not that we need to very often).

Finally, the much belabored plural of virus? Virus was a mass noun in Latin, and there's no evidence it ever had any plural at all--it meant something like "filth" or "dirtiness," and you won't see the English words "filths" or "dirtinesses" very often, either. But English can have a plural of "virus," since it means something different in our language, something that is countable. And it should be a nice English plural: viruses. So, unless you're going to learn, really learn, Latin and ancient Greek, you should do what comes naturally--and treat data as a singular (mass) noun.

By Rob Rushing (not verified) on 03 Oct 2007

"the data can inform" is independent of the number of the subject."

...which is why I left out the "can."

Since nobody agrees with me, I suppose I should learn something. I am no linguist (as, I guess, is obvious)(though I do resent being called "ignorant" and a "jerk"). I admit I had never even heard of the mass/count distinction before I weighed in here. I also admit that I just now looked up "mass noun" on 'kipedia. Just to address a few counterexamples, however (and these are opinions, maybe even ignorant ones):
The data/datum thing is not comparable to dangling prepositions or split infinitives or other syntax-specific cases; of course different languages use different syntax(es); that's a different issue from usage of a particular word or class of words.
I also don't see "crowd" or "people" (in the sense of ethnic group--"people" is of course also a nice, ordinary plural of "person") as good analogies, because one can count crowds or peoples; I can talk about three or four crowds coming together to form a bigger crowd. If you see "data" as that kind of collective noun (committee, herd, flock) then you shouldn't mind me putting two datas together to make a bigger data. (I don't think this is how people use "media" either.)
The examples of mass nouns given in the original post were "gold" and "water." I guess that's the kind of mass noun that most of you think "data" is, or should be: the water is cold, the gold is yellow, the equipment is rusty, the data is flawed. That usage makes sense in constructions like "add my data to your data and we'll have more data," but that also makes sense if "data" is a plural count noun. Other ways that "water" is used do not lend themselves to substitution of a plural, including the original post's example "how much money do we need," and "this furniture suggests poor taste." I agree that this is how most people use "data" and "media," and this is the usage I (perhaps rashly) denoted "wrong."
But here's something interesting: we also have the singular term "data set" (short for "set of data") to use in most of the mass-noun situations (not, however, for "how much"). This term, I guess, functions like a collective noun (crowd, tribe, team, bunch). Collective nouns like this are applied to plural count nouns, not mass nouns, correct?

Finally, some context for my arguments above: I am a biologist (an area in which I am much less ignorant). When I talk about "my data" I really do mean a collection of discrete measurements or observations. I'm not using it as a mass noun, as if "data" were a substance like jelly or air; I mean something very similar to "numbers" or, well, "measurements." "Your data show that I was wrong." For me, "how many data will we need to collect" makes perfect sense and doesn't even sound jarring.
On further reflection, however, it occurs to me that, unlike "data set," where I would argue that data is plural, the common term "data point" (for datum) doesn't bother me, and there data is clearly used in the mass sense: point of data, blade of grass, piece of furniture, head of cattle, milliliter of water.
Frankly, though, that just makes me want to avoid using the term "data point." I really do not see that "data" as a mass noun (not really "singular," I now see--thank you) conveys or connotes any more precision or subtlety of meaning than its original plural-count sense. If, then, the argument for "data" as a mass noun is purely descriptivist ("hey, everybody's doing it!"), I prefer to remain a stodgy pedant. The descriptivist attitude that a word borrowed directly from another language need not be used in English the same way bothers me. In biology we use a lot of terms that are borrowed directly from Latin (or Greek), and the rules of usage (at least in the sense of singular vs. plural) are retained, as far as I know (I will cop, however, to "stamens" for several male flower parts, which until now I had no clue was related to "stamina").
I would venture to say that all descriptivists would agree with the commenter who wrote "A certain amount of prescriptivism is necessary to keep people able to communicate with each other," yet there is no bright line marking "a certain amount." You don't mind "data" crossing your line, but it's going to remain on this side of mine. I am going to continue using "data" only as a plural count noun, and so are my graduate students.

But I promise to stop telling others they are wrong.

By Sven DiMilo (not verified) on 04 Oct 2007

I just had a conversation with a friend (a biologist) about this the other day, and I was doing some googling and came across this thread. I want to add two things:

First, that last comment from Sven was really great.

Second, one of the things that I find strange about "data" as a plural count noun is the idea of saying "I have ten data." I've never heard that usage from anyone. I come from a software background, and I've only ever been in situations where there were large volumes of data to deal with, so the mass noun usage feels right to me. But I've worked with a number of biologists, and I know that many of them had to spend lots of effort (especially in the past) for each individual datum, so maybe the count noun usage comes more naturally to people who went through that.

All the biologists that I know are drowning in data nowadays, though, so I wonder if there will be a generational shift there.

Sven: The major problem I have with strict prescriptivism of the kind that you espouse is that, by your own principles, you are speaking English entirely wrong.

Do you use Shakespeare's English? It certainly doesn't appear so from your postings. Either you're deigning to use the vernacular in the digital realm (^_^), or you actually speak like that.

Well, how do you think your current mode of speaking evolved? Your current use of words, grammar, and so on is different from Shakespeare's to a relatively significant degree. That's not even going back very far. Perhaps you would like to compare your English to that of Chaucer? It's quite difficult for a modern speaker without training to understand Chaucer's English.

Was Chaucer using bad English? Was the Bard? Or were they speaking correct English for their time? If they are correct, then you must be, by your own admission, speaking English badly.

Unless you choose to abandon strict prescriptivism, of course.

By Xanthir, FCD (not verified) on 28 Nov 2007