I am looking at the question: How many words are there in a language? I'd like to know for languages in general, comparatively, and for pedagogical reasons, in some well known western language which may as well be English.
What I found quite incidentally is a hornet's nest of curmudgeonistic pedanticmaniacal jibberishosity. (There. Whatever the count was, it is now N+3.)
(For more Falsehoods, click here. Also, listen to "Everything You Know is Sort of Wrong," on Skeptically Speaking Talk Radio. )
First I want to explain why I was interested in this at all. There has for some years been discussion of the vastosity of language, and how impressive this vastosity is in relation to the ability of a child to enlearn it all. Various studies have shown that children of a certain age know (as in recognize) a waylot of words, a virtual spoorload of lexicon. When you do the maths, it turns out that children are learning some horrific number of words per day from the time they are yajabbering infants in order to reach that number by said age.
Indeed, it has been guestimated that the number of words in English is far far greater than the number of words we tend to think there are in English, and rugrats know way more of them than anyone has ever ponderified. The usual story goes like this: The English dictionary you can find with the mostest of words is probably Funk and Wagnall's New Standard Unabridged, with just under a hemimillion (maybe 450,000) entries. When this list is adjusted to account for the fact that words are not really what they seem when they are listed in the dictionary, a sublist can be generated. If this list, about a quatromillion in size, is sampled, one can make a test to see how many words a person, perhaps a rugrat, knows. Call the result the Lexiknowitall Quotient, if you will. Or, for simpleness sake, "L." (I will not be using the variable "L" for the rest of this post, so there really was no reason to tell you that.)
Given this, a fully growd adult with a high school education knows about 45,000 words. A six year old knows 13,000. Do the maths. To get from zero to 13,000 a child has to learn one new word every two hours. Watch them. You can see them doing it.
Well, really, you can't see it. Which is why this is all very interesting. Is it really happening, or is this just some fantasy of Steven Pinker, who would really prefer to think that the words are practically encoded in our genomes somehow? Perhaps, I imagine him thinking, we have a lexinome from which these words spring to be spoken in the context of our grammarome.
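For anyone who actually wants to do the maths, here is a minimal sketch of the one-word-every-two-hours arithmetic. The 13,000 words and the age of six come from the figures above; the 12 waking hours per day is my own assumption, as are the function names, and a different assumption moves the answer accordingly.

```python
# Rough check of the "one new word every two hours" claim.
# 13,000 words by age six is the figure quoted in the post.
# WAKING_HOURS_PER_DAY is an assumption, not something the post states.
WORDS_KNOWN_AT_SIX = 13_000
AGE_YEARS = 6
DAYS_PER_YEAR = 365
WAKING_HOURS_PER_DAY = 12  # assumed


def words_per_day(words: int, years: int) -> float:
    """Average number of new words per day, counting from birth."""
    return words / (years * DAYS_PER_YEAR)


def waking_hours_per_word(words: int, years: int,
                          waking_hours: float = WAKING_HOURS_PER_DAY) -> float:
    """Average number of waking hours that pass per new word learned."""
    return waking_hours / words_per_day(words, years)


rate = words_per_day(WORDS_KNOWN_AT_SIX, AGE_YEARS)
hours = waking_hours_per_word(WORDS_KNOWN_AT_SIX, AGE_YEARS)
print(f"{rate:.1f} words per day, one word every {hours:.1f} waking hours")
```

Under those assumptions the rate comes out at a little under six words a day, which is roughly one every two waking hours. Start the clock at the first birthday instead of at birth and the pace only gets more alarming.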
Anyway, if you go to Teh Google and ask it "how many words are there, huh?" you will get this one answer that is repeatedly plagiarized, and it is little more than curmudgeonistic pedantistery. In fact, I have identified it as a Falsehood of sorts. It goes something like this:
How can we tell how many words there are!!???? We can't even tell what a word is!!!???11?? (That's the falsehood part ... that we can't even tell what a word is.) And these are the reasons given that we can't tell what a word is:
1) What IS a word? If "run" is a verb, is the noun "run" another word?
OMG. I can't believe they start out with this one. Run to run and the run of a mill are utterly different things. r-u-n is a spelling, and ru-nh is a pronunciation. Run the verb and run the noun are two words, and there are many many things called "run" that are nouns. Each and every one of them is a different homophone, a different word. Duh.
2) What about inflected forms, like ran, runs, and stuff???11??
Ah ... no ... those are tenses and such. Not different words. And, in the study I mentioned above where the toddlers are learning a new word every two hours, run, ran, runs etc. are NOT counted as different words. Or at least, that is the story as I have gorfed it.
3) Are compounds, such as man-child or man-eater or man-bites-dog different words?
Well, ok, there is a tiny bit of ambiguity here. Man-child, man-eater, and similar cases are clearly words. In English, this is easy to figure out. Take out the dash (or space). Does it still work? Then it's a distinct word. One example given was "man-bites-dog." That is not a word. It is a sentence where someone has put dashes in where the spaces are supposed to go. "Manbitesdog" is not a word. For the most part, the "compound" issue is as goosechasingness as one can get.
4) What is English, anyway? What about "veal" which comes from The French. Is that a word!!!??? Huh?!?!!??
More stupidosity. Yes, veal is a word in English. Jeesh. So is spaghetti. And pho. Give me a break.
5) What about obsolete words? Are they words!!!/???? Do you count them?
Well, no. They are words but we are looking for a lexicon, not a word list. If it's on the line and still rarely used it's in, otherwise it's off the list. Obviously. Duh.
6) What about the names of chemicals and stuff?E?E?E?
Well, there you've got me. That is a little ambiguous. Yes, they are words, but no, since chemists have a systematic way of creating the words in advance and there are a lot of combinations of chemicals (even those with a low existostiy index) then we can't count this any more than we'd count the arbitrary assignation of morphemes to, say, items on a Mexican Restaurant menu so we could create a word for every combination of taco, burrito, enchilada, quesadilla, etc. where a given meal can have from one up to six per plate. I totally ate at a place with a menu like that in San Diego once. The menu itself was dozens of pages long, and only a summary of the actual theoretical menu. ("I'll have the bitacoqadroburrito, please.")
Either way it is an arbitrary non-lexicographic alinguistic expansion of a word list. Really, it is a verbose numbering system. Numbering systems don't count.
But yes, "busigator" (the Magic School bus transformed into an alligator) is a word. Busigator is ... the word for the Magic School bus when it is an alligator. This is not hard.
(The above statements about the hardosity of counting words are cribbed from here and here. See also this, this, this, and this. The discussion of how many words there are is cribbed from The Language Instinct: How the Mind Creates Language (P.S.) and refers mainly to the research of Nagy and Richard Anderson.)
So how many words are there? Actually, it's kind of irrelevant because words mean little more than what they mean, and meaning has only a vague association with the details of the lexicon, which gives the curmudgeons and pedants nightmares. Or would, if they noticed. I mean, really, did you have any trouble understanding the meanings of minifalsehood, curmudgeonistic, pedanticmaniacal, vastosity, jibberishosity, spoorload, yajabbering, waylot, ponderified, mostest, hemimillion, quatromillion, Lexiknowitall, existostiy, alinguistic, or hardosity?
No, I mentated negatorially.


Comments
Megabig interestingisity.
(Couldn't resistify.)
Posted by: Hank Fox | May 31, 2010 11:35 AM
Speaking of which, I make up words all the time, but I don't know many other people who do that. Part of it may be that I'm a writer, but another part, I've often thought, might be just the fact that, sometimes, I don't KNOW the right word I need, and so just wing it.
Posted by: Hank Fox | May 31, 2010 11:39 AM
I wish to engage in a protestification of the way you have spellified 'existostiy'. Clearlitudinarily, the spellification should be 'existosity'.
Posted by: Elf Eye | May 31, 2010 12:43 PM
I'd... gulp, like to um leave a comment, but I'm chortle laughing too extravagentully. Great postise.
Posted by: Sally | May 31, 2010 1:16 PM
I think this idea of counting words is rather further complicated by those assholes who make up words (like myself). Unless there is some threshold number of people who recognize and could use a word in context, there are probably thousands of words out there that are known/used by a relatively small number of people. In many cases such words make enough sense that people independent of each other might come up with the same word.
I tend to take the attitude that if random people would know what you mean when you use a word you made up, it is a valid word. If context is a hard requirement for such understanding, then I would argue it needs to become relatively widespread before it is entirely legit.
The bottom line for me, is that I refuse to let something as silly as a word not previously existing, stop me from using it when I think it is appropriate. Language "purists" who take exception to the coining of new words rather irritate me. Calling them "purists" irritates me even more - just what the hell is pure about a stagnant pool?
Posted by: DuWayne | May 31, 2010 1:49 PM
You seem to be saying that every different meaning a word has counts as a different word. Where then to draw the line among degrees of meaning? My dashboard dictionary lists about 50 variations.
Posted by: Rosie Redfield | May 31, 2010 2:19 PM
Rosie: I'm not sure what a dashboard dictionary is, but the problem of nuanced difference is not really that common.
I'm going to pick up a book and randomly open it five times, and pick the fifth word on each page that is not "of" or "and" or whatever. Here are the words:
sampling. point. gasps. financial. happening.
(ten points for anyone who can guess what book this is. I've reviewed it here on this blog!)
What are the five or six, or fifty, or even one word that is either almost sampling but a different word that may or may not count as a synonym, or the homophone that is different enough but not too different to make us argue about if it is a different word for each of these five? If this was a common problem, all five, or four of them anyway, would have candidates. If it is not a common problem, none of them.
I can't think of any, but I may be biased in favor of my own point and not trying hard. Well, actually, it is true that I'm not trying hard, for a couple of different reasons.
But you can. When I get back from the gym, I want to see two or three hundred nuanced homophones, which is what you seem to be implying must exist!
Posted by: Greg Laden | May 31, 2010 2:31 PM
Oops, forgot to say, about 50 variations for 'run'.
Posted by: Rosie Redfield | May 31, 2010 2:31 PM
Run might actually be one of those words that has a lot of variants, but offhand I can't think of any too subtle to be questionable in their distinctitude.
Posted by: Greg Laden | May 31, 2010 2:36 PM
One way to think about this is the following. Take run as in a person running along in a race, and run as in the water is running from the faucet. They could be considered the same because they are both something moving along at a steady relatively fast pace.
But, can either word be perfectly replaced with another word that is not nuanced in overlap with the other word?
So, running in a race and running water from the faucet becomes running in a race and flowing water from a faucet. The person does not flow in a race literally. The person can flow in the race metaphorically. The fact that it is metaphorical means the meanings are different.
We have a zillion words for things that move along. The fact that some of them have the same sound (and remember sound-meaning link is arbitrary in language) does not affect their distinctness.
Ambiguity implies distinction (which is masked). That's why it's called ambiguity instead of homoguity.
Posted by: Greg Laden | May 31, 2010 2:49 PM
"For the most part, the "compound" issue is as goosechasingness as one can get."
In the related languages German and Dutch 'goosechasingness' would be one word. In Dutch the way these compounds are split can make a big difference. A company sold extra sturdy boxes to put heavy stuff in. Heavy = zwaar, stuff = spullen, box = doos, so they printed 'zware spullendoos' on the boxes. Which is wrong. It should have been zwarespullendoos. A zwarespullendoos is a box for heavy stuff. A zware spullendoos is a heavy box for stuff. Likewise a gele truidrager is a jersey (trui) wearer (drager) who is yellow (hepatitis?), while a geletruidrager is the guy who leads in the Tour de France (Alberto Contador?). This leads to there being many more words in German and Dutch.
Posted by: Martijn | May 31, 2010 2:55 PM
"This leads to there being many more words in German and Dutch"
Or, it leads to the use of the phrase "words and phrases" when speaking of lexicon in some languages and not others.
Which is a phrase in English but probably a word in some other language.
Posted by: Greg Laden | May 31, 2010 3:01 PM
But, can either word be perfectly replaced with another word that is not nuanced in overlap with the other word?
Posted by: red pepper | May 31, 2010 3:58 PM
The 'dashboard' dictionary is just the dictionary that comes as part of the Mac 'Dashboard' utility. It's a very simple low-level dictionary.
For example, it doesn't have 'sampling', but it has five meanings for 'sample'. It has nine primary meanings for 'point' as a noun and two as a verb, plus another nine phrases that use the word in different senses. And separately, promontory. 'gasp' has two noun and one verb meaning, and 'financial' has only one. 'happening' has two (noun and adjective), plus another four for 'happen'.
My point was that we don't know where to draw a line for meanings that are different enough to be different 'words'. I agree that 'tear' (rip) and 'tear' (teardrop) are different words. Should the noun and verb meanings of 'run' be considered different words? - maybe.
Ironically, the basic problem is that the users of English haven't agreed on a precise meaning of 'word', one that would let us make the distinctions we'd need in order to count the number of English words.
Posted by: Rosie Redfield | May 31, 2010 4:05 PM
You really have to start speaking Toki Pona...
http://en.tokipona.org/wiki/What_is_Toki_Pona%3F
Posted by: Lassi Hippeläinen | May 31, 2010 5:15 PM
Rosie: Right, those are just the words I sampled, and the sample was not processed. So sample (rather than sampling) would be the root one would test for being a distinct word.
For the most part those different numbered definitions in the dictionary are the agreed upon different definitions, even if not everyone agrees perfectly with all of them. So when you look up "run" as a noun, and you get 12 entries, there should be 12 words, with a couple of possible exceptions. I can have the run of the place, the place could be a mill with a run (for water) down the street from a dog run, one of several dog runs made by a company in a run of dog run making, even though it is a run of the mill dog run (ah, it is so good to have the run of the comment page so I can run my mouth ... oops, that's a verb).
Anyway, I see all 12 definitions as being different words. That is why they are listed separately. We could ask "which two of these definitions are candidates to merge into one?" but I'm sure the editors of the dictionary have asked themselves that when they made the definitions, and I'm sure there are lumpers and splitters, but I doubt the difference is more than a few percent.
I think the noun and verb forms have to be considered different words. Most of the noun vs. verb forms of run don't correspond. The run of a mill has no non-trivial verb (other than that the water runs in the mill) (Oh, I should mention that this is a bit of a joke ... the phrase "run of the mill" probably has nothing to do with a mill run, aka a mill race).
The dog runs on the dog run. Those two uses of run are connected. But the dog running is not a plot of ground and the dog run does not move. They really are different words.
We do have a definition of "word." It may not be as immutable and clear as a chemist's definition of "element" but linguistics is a scholarly field and people get their PhDs and big fat salaries to sit around and do this stuff, so it must be real! A word is "a speech sound or series of speech sounds that symbolizes and communicates a meaning usually without being divisible into smaller units capable of independent use" (Webster's).
Here's where the falsehood thing comes in: Yes, there are interesting academic games we can play with words and meaning and symbols, but linguists pretty much know what a word is, and a room full of linguists all counting words with the same limited lexicon would come up with a number that would vary less than 10% if asked to do so.
But if you told the linguists "Count the words in this text of 10000 written wordy things. Write the number down on a piece of paper without showing it to anyone else. I believe they will all be the same" the linguists will laugh at you and say "oh, no they won't" and they will proceed to come up with slightly different numbers based on their proclivities.
But, then add this to the instructions: "If, when I look at the numbers, they are all the same number, you all get free beer until midnight" they would all come up with the same number. Their proclivities towards sophistry would be outweighed by their desire for free beer.
Posted by: Greg Laden | May 31, 2010 5:19 PM
Lassi: Tomo telo li lon seme? Mi pilin nasa. Mi wile pana e telo jelo.
Posted by: Greg Laden | May 31, 2010 5:26 PM
sampling. A sampling of the shellfish were poisonous. A sampling of the shellfish showed 10% poisonous. By sampling the shellfish we found a 10% poisonous fraction.
point. A point is that which has no part. At what point on the map does our highway cross the city limit? What's your point? What's the point? Could you point out the point on the map with the point of your pencil?
gasps. He was five gasps from the finish line. There were five gasps from six people at the comment.
financial. Not even going to try.
happening. Is the happening that's an orgy at the neighbors' house (the ones that went on vacation leaving their teen-aged daughter behind) the same as the happening that was Woodstock? What about Nagasaki? What about a giant bolide crashing into the planet and wiping out all the dinosaurs? What about a giant bolide crashing into the planet and wiping out all the humans?
(I did the above while you were writing comment #16, but I'll post it anyway. BTW, is a post in the comments section of a blog post the same word as the blog post? One could be replaced with "comment" while I've seen a blog post called a "blog". Or is the latter a mis-use of the word?)
The definition you gave above is obviously wrong: what about "intransitive", "inconceivable", or any other word formed using an affix that's also a stand-alone word. I'd suggest the "ice cream" test: if the meaning of the compound is different from that of the thing treated as a phrase, it's a word. Even if it's two words.
There's an incident recounted in (IIRC) Lord's Singer of Tales regarding when he asked several singers about words and they referred to whole formulas. Thus, if a pre-Homeric aoidos recited the Greek syllables for "when rosy-fingered dawn the early one appeared", he was singing one word, meaning roughly "next morning", but fitting a particular metrical requirement. To South Slavic (and presumably pre-classical Greek) bards this was evidently equivalent to "ice cream".
Posted by: AK | May 31, 2010 6:05 PM
Or should I say, 'To South Slavic (and presumably pre-classical Greek) bards this was apparently equivalent to "ice cream"'?
Posted by: AK | May 31, 2010 6:09 PM
My point was that we don't know where to draw a line for meanings that are different enough to be different 'words'.
In all honesty Rosie, I know where I draw that line and I am guessing that you either know, or have a good idea where you would draw that line. I sincerely doubt that your line would be in the same place as my own and while both of us are right, I would have to admit you are probably more right than myself.
The way I look at this discussion is from the perspective of a burgeoning linguist. The way you look at this discussion is from the perspective of an average (or maybe above average) user of the English language. While in science commonsensical assumptions are not particularly useful or valid - thus why I look at language the way that I do, the practical use of language is - really must be commonsensical.
Verbal language is what makes us human. It is how we see the world, understand the world - how we truly experience the world. Language is culture, society and the foundation of human cognition. In a sense, you really can't fuck it up. But, as a burgeoning linguist, I can try to do just that. I can try to box it, break it down, explore its myriad facets. And that is something I very much want to do.
But I will not pretend that because I want to approach and explore language with the tools of science, that I am somehow imbued with some special wisdom to determine how valid your perspective on words really is. Especially when your perspective is bound to be far more practical than my own, when it comes to the daily use of this abstraction we call words.
That would be antithetical to what I have learned thus far, about the confluence of language, culture and cognition.
Ironically, the basic problem is that the users of English haven't agreed on a precise meaning of 'word', one that would let us make the distinctions we'd need in order to count the number of English words.
Oh, I don't think it is ironic at all. It is really the heart of this discussion. No words can avoid a certain level of ambiguity, though the level of ambiguity of course varies. And that is why I am right, you are right and Greg is right, though I doubt any of us really agrees when it comes to the minutiae.
Their proclivities towards sophistry would be outweighed by their desire for free beer.
I don't know Greg, for me it would be a matter of not wanting to be the asshole who prevented everyone else from getting free beer. Unless Unibroue was in the offering. If La Fin Du Monde was part of the equation, I might be convinced to have a bottle...Maybe...
Posted by: DuWayne | May 31, 2010 6:12 PM
Interesting topic, but probably a bit beside the point... I'll get to that.
First a little personal anecdote. I had a "speech impediment" when I was very young. Basically, it boiled down to my uninhibited use of words which I had heard maybe once and inferred a meaning for from context. Combine this with mild pronunciation problems, and I was pretty close to unintelligible. Immensely frustrating.
Anyway, the idea that people have a massive lexicon in their heads has always seemed off to me. Brains just don't really work that way... processing and memory are normally relational. Humans are really good at pattern recognition in specific contexts, and "language organ/instinct" studies indicate that language is very very very much one of those "specific contexts". (Humans are also really good at facial recognition.) We don't need to know a lot of words in a rote memorization sense to communicate effectively since we are so fabulously good at deriving meaning from the phonetic relationship (roots and such) and the syntax (context).
Sure, people can rattle off a big long list of words (though not as big as most people seem to think), but this is because we can run our recognition systems in reverse to an extent and generate all sorts of words. Combine with associational memory, and you can 'remember' a lot of words; though amusingly these include words you have never actually heard or read but manage to "recollect" due to the fuzziness of the association matching with memory (basically a nascent false memory).
All this, combined with a lot of other stuff from computational linguistics and generally spending too many brain cycles thinking about it, leads me to the conclusion that people never actually learn a language in the strict sense. It is always a learning process, which actually helps explain how people are so damn good at communicating. You don't have to really share the same language (again in a strict sense) to communicate since you can assume the listener is actually learning as you speak. This also helps explain why normal language is so inefficient in a channel-capacity / information density sense... we include all sorts of redundant information to assist the listener's learning process.
On the down side, this view means that a lot of the more traditional approaches and categorizations in old-school linguistics are more-or-less off base. (Not saying they aren't valuable, but they miss a big important dynamic and are therefore hopelessly incomplete.) The idea of a language as a fixed lexicon plus syntax just isn't reflective of reality IMO.
Posted by: travc | May 31, 2010 6:33 PM
I can't quite tell how many layers of sarcasm you're laying on here, but you seem to be saying that it's stupid to claim that counting words is a difficult and ambiguous activity. Is that right? If so, this suggests a useful test case:
Run the verb and run the noun are two words, and there are many many things called "run" that are nouns
So, how many English words spelled "run" are there?
Posted by: Gareth Rees | May 31, 2010 6:49 PM
is a post in the comments section of a blog post the same word as the blog post?
Sometimes getting something wrong leads to new meaning, but in this case it is just getting something wrong. The big puppy above is a post. These little puppies down here are comments. My post is a commentary. These comments were posted. Four words.
A blog post is not a blog. But people use "blog" for "blog post" as a short cut. This is a common linguistic convention. Thus, "blog" referring to a blog (the noun) and "blog" referring to "blog post" (also a noun) are two different meanings and could be two different entries in a dictionary some day, but for now, one just stands in for the other. Either way, they are two different words. One being, as you say, a mis-use for now.
The definition you gave above is obviously wrong:
If you are sure the definition I gave is wrong, you should contact Mr. Webster right away! Personally, I liked the definition. I don't get your point about affixes. "Affix" is a big word, means lots of things in linguistics (it is really a statement about how utterances are organized, not what the bits do). The examples you give are meaning-changes. Conceivable is a word, and inconceivable is a different word. I do not see how that violates the definition. What am I missing from what you are saying?
if the meaning of the compound is different from that of the thing treated as a phrase, it's a word. Even if it's two words.
What you are missing here is that a phrase is equivalent to a word. Whether something is a word or a phrase (usually) has more to do with how a language is structured, sometimes as trivially as in how words are written down. This is why the definition given, which removes the trivial and misleading concept of "words" and replaces it, essentially, with utterances, then links utterances to meanings, is the way it is. I think you would benefit from assuming the definition is correct then rethinking these other bits from that point and see how that goes.
And, the bit you refer to at the end is a great example of this.
But why ice cream? Ice is one thing, cream is another, ice cream is a third thing, and icecream and ice-cream are both the same thing as ice cream. Again, I'm probably missing your point.
(There is a bit of narcotic haze going on here, so you may have to talk slower.)
Posted by: Greg Laden | May 31, 2010 9:24 PM
I don't know Greg, for me it would be a matter of not wanting to be the asshole who prevented everyone else from getting free beer.
Well, that's correct. But one could also argue that the free beer is the ultimate cause and the fear of asshatitude is the proximate mechanism.
Posted by: Greg Laden | May 31, 2010 9:27 PM
travc: You might be wrong or you might be right.
Assume the tests are correct and a ten year old knows 10K words (or whatever). What that really means is that the kid got 100 questions right out of 250 given, and the test was a sample from a list of 25000 words. (I'm making these numbers up)
Like a Turing test, it would be impossible to distinguish on this alone if the subject had a memory bank of 10K words, or had a brain capable of correctly identifying 10K words using a "memorized" lexicon of a few thousand "wordemes" (word-like template thingies in the brain) and some processing.
The latter would be your explanation, and it's the one I'm inclined to as well.
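For what it's worth, the extrapolation behind that made-up example can be sketched in a few lines. The numbers are the invented ones above (100 right out of 250 asked, from a list of 25000 words), and the function name is purely illustrative; it is not taken from any of the studies mentioned.

```python
def estimate_vocabulary(correct: int, asked: int, word_list_size: int) -> float:
    """Scale the fraction of sampled words recognized up to the full list.

    Assumes the tested words are a fair random sample of the whole list.
    """
    if asked <= 0:
        raise ValueError("asked must be a positive number of test items")
    return correct * word_list_size / asked


# The made-up numbers from the comment: 100 right out of 250, list of 25000.
estimate = estimate_vocabulary(correct=100, asked=250, word_list_size=25_000)
print(f"Estimated vocabulary: about {estimate:,.0f} words")
```

Note that the estimate says nothing about how the kid got those 100 answers right, whether from a stored list or from something cleverer.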
But I've disagreed with Steve Pinker twice this week already so I'm trying to back off a bit.
Posted by: Greg Laden | May 31, 2010 9:30 PM
Gareth: Webster's online gives about fifty. So I'm guessing about fifty. A few of those may be too subtly different for one person or another's taste, and there may be a few missing. There may be a couple (well, I'm pretty sure of this) that are archaic or very very esoteric, so maybe they can be left out (as suggested by my post). And some are grammatical oddities. Like a salmon that has finished its run upstream is a run salmon. The first run is a word that has a distinct meaning that we actually use, the second is probably esoteric and used only by ... salmon specialists? Canadians in the Northern territories? I'm not sure.
If one wants to argue that "run" is one word and all these definitions are variants, then one has to tell me why I should think a tear in a stocking and rolling five snake-eyes in a row are the same thing.
Or, more realistically, take the 50 or so words and make reasonable arguments that X pairs are really too close to call different and then tell us how many there really are. That could happen but the number is not going to go down too far.
See, the thing is, that is what Dictionary writers do. They figure out what the words are. But many of the words are the same sound and spelling but with different meanings so they have this numbering system and such.
I can't quite tell how many layers of sarcasm you're laying on here,
Me? Sarcasm?
Posted by: Greg Laden | May 31, 2010 9:39 PM
If you are going to count obviously foreign words like "pho" as part of English, then the same thing counts for English words used by speakers of other languages. You can't listen to modern students speaking Hindi or Chinese without hearing an "English" word every few seconds. English speakers like to pretend that English is the only language that borrows words, but in reality they all do. It's just another example of fallacious British/American exceptionalism to think otherwise.
Posted by: Jonathan Badger | June 1, 2010 10:28 AM
I didn't know that English speakers were pretending that other languages don't borrow words. I mean, I totally get the exceptionalism thing and all, but that I did not know. Do you have any examples of it?
By the way, a word like "pho" is not obviously foreign to everyone. New words that are regionally introduced become quite non-foreign locally, then that familiarity spreads. In my Minneapolis Neighborhood, pho is becoming as non-alien as taco. More so maybe.
Posted by: Greg Laden | June 1, 2010 12:27 PM
Jonathan Badger -
Wow, you're kind of a fucking asshole now, aren't you?
First of all, no one here claimed that English is the only language that integrates the words of other languages. Doing so would be supremely ignorant, as there are no languages that aren't derived from other distinct languages. But hey, feel free to whine about exceptionalism all you like, it certainly is a problem. You might however, look like something less of an asshole if you focus your claims of exceptionalism on examples being observed at a given moment.
Second, there is a huge difference between using roots like "pho" and using words like faux and claiming them as legitimate elements of the English language. Not that it is illegitimate to claim either, as faux is common to English vernacular. And accepting this as legitimate language assimilation, of course other language communities adopting words from different languages into their vernacular is just as legitimate.
You seem to be assuming some objection to a rather obvious conclusion where one doesn't exist. If you had simply pointed this out it would have been entirely reasonable. But instead you chose to be rather hostile in your approach, which is why I accused you of being a fucking asshole - in case you were wondering.
Posted by: DuWayne | June 1, 2010 12:31 PM
Oi - being out of Portland for so long, I didn't think of Pho in the terms of kick ass beef soup. That is even sillier in a sense, than what I was thinking.
Food names are never obviously foreign words. They are the words that identify specific dishes, or types of dishes, and ultimately become a part of any language whose language community partakes of such dishes. Sauerbraten, for example, is a particular German dish that I am fond of, one for which there really isn't a distinctly U.S. American counter. Thus in the English language, sauerbraten refers to that dish. It is entirely reasonable to consider sauerbraten an English word, because there is no other English word to refer to that dish.
Posted by: DuWayne | June 1, 2010 12:38 PM
Your argument is that people agree by common sense on what meanings of a word are the same, but you count "run" as about 50 at Merriam-Webster on line (I assume that's what you mean by "Webster's"), and I count it as 3—one noun, one verb, and one adjective—plus 7 phrases that I'd say use familiar meanings of the verb. And I think M-W counts the way I count. Likewise I think your number of 450,000 at Funk and Wagnall's is the way I count, not the way you count. This is not sophistry or a slight difference.
"Rare" is listed at M-W as two words, both adjectives (one for birds and the other for burgers), so it's not like they automatically lump together all the senses that are the same part of speech. That's what I mean by homonyms, not "run a race" and "run a program".
Actually, you're the only person I know of who thinks "run of the mill" and "mill run" include two different words spelled r-u-n, not two different senses of the same word. But I haven't taken a poll.
Posted by: Jerry Friedman | June 1, 2010 12:42 PM
Mark Liberman has a response over at Language Log:
Laden on word counting
Posted by: Rosie Redfield | June 1, 2010 12:52 PM
Sauerbraten, for example, is a particular German dish that I am fond of, one for which there really isn't a distinctly U.S. American counter.
Interesting. English and German are sister languages. I believe it is the case that the largest European immigrant group in the US is German, and certainly the largest Euroimmigrant language group is Germanic (English, Irish, German, etc.).
So Sauerbraten does not have a locally adaptive form because it is culturally comfortable already, perhaps.
Posted by: Greg Laden | June 1, 2010 12:57 PM
My most enthusiastic contrafibularities!
Posted by: Alex | June 1, 2010 1:08 PM
Your argument is that people agree by common sense on what meanings of a word are the same,
No, I never said that. As DuWayne has pointed out in a comment above, common sense would never work. Humans manage to use this amazing thing called language quite effectively, but once they start to analyze it (to, for instance, describe the elements of a lexicon) they can't do that without a great deal of methodology. Humans interact with chemicals all the time (are made of chemicals as a matter of fact) but common sense would not be a good discoursive milieu to understand or describe chemistry. (We tried that early in Western Civ, Alchemy, it was not very good at it).
but you count "run" as about 50 at Merriam-Webster on line (I assume that's what you mean by "Webster's"), and I count it as 3—one noun, one verb, and one adjective—plus 7 phrases that I'd say use familiar meanings of the verb.
That would be because you don't know how a dictionary works. You see, the rip in a stocking and three aces in rummy are NOT the same thing at all. Both are runs. A run salmon is not run of the mill except at certain times of the year. And so on.
Here's where you've gone wrong, probably: When you look in the dictionary, and you see "run" in bold, that is NOT a word. It is an Entry.
The Entries are further subdivided by both word form (like noun, verb, etc.) and "definition" and each definition is, by the standard way linguists divide up the world of lexicon, different words.
This was figured out years ago, yet, as you demonstrate, many people don't know this any more than they can really explain the periodic table of elements.
Actually, you're the only person I know of who thinks "run of the mill" and "mill run" include two different words spelled r-u-n, not two different senses of the same word.
Your example of "run a race" and "run a program" would have been better here. As I've said all along, let's see the examples of words that may be better placed under the same definition!!! You may or may not have hit on one there. It would be arguable either way. I think they are different because if my wife is sitting there with her laptop out last Tuesday and says "time to run it" I would not be sure if she meant running a program on her computer or running the race she ran that evening. Running a program and running a race are entirely different things. But the problem here is where the meaning resides. The operative unit is the phrase in this case. So, if we were phrase oriented in our dictionary building (which we are not) we would have an entry for "run" which meant fewer things than it does now, and an entry for "run a race" and an entry for "run computer code" and thus get the two meanings.
The point is that the meanings have to go somewhere.
The example you give here, though, that I'm apparently the only person who gets it, is not a good one. Run of the mill means ordinary. A mill run is a bolt of cloth.
Posted by: Greg Laden | June 1, 2010 1:09 PM
What do 'glempize' or 'glempization' mean? Not to mention 'glempizationary' etc etc etc. I know what they mean, and I bet, with a moment's thought, a lot of my friends would know what they mean, but do you? Have I just invented a new word? If I have, is it an English word?
I have no answers, just questions, questions...
Posted by: Szwagier | June 1, 2010 1:13 PM
Rosie, thanks for pointing that out. I will respond.
Posted by: Greg Laden | June 1, 2010 1:36 PM
I do want to point out that there is a misunderstanding developing here, that I now realize on reading the comments as well as Liberman's missive.
I have not stated, nor would I, that it is easy to tell what a word is, or that the ca 50 definitions of "run" are immutably different words. That I have not said that, and have in fact said something different, is clear to anyone reading the post and comments, but that idea seems to be out there.
I have in fact suggested that the 50 definitions of "run" work as fifty words according to that particular dictionary, where we define "word" (as it should be) as utterances with distinct meaning. But I've also suggested that it could be changed. There may be definitions that one would not count because they are too esoteric (the adjective "run" as in "a run fish" for example) or compressed because they are not sufficiently different. And, I asked for examples, got two, one was good, one sucked, IMO.
My OP's point is pretty clear if you read it, but I'll restate it here: The internet meme to which I refer (and I doubt any of the commenters above or Liberman at Language Log have read them, which would not be hard as links are provided) constitutes a virtual throwing up of the hands at any effort to count words because we just can't tell what the heck a word is. In contrast, I'm describing a world in which we can more or less tell what a word is, though we may disagree, but not to the extent that there is a difference between counting "run" as one, three, or fifty.
The commenters above who have claimed that run is one word, or three, simply don't know what a word is, but that's OK because it is a technical term. With a fairly specific definition ... :)
Posted by: Greg Laden | June 1, 2010 1:44 PM
The point in stressing that other languages borrow English words is that it invalidates the argument that English has more words than other languages if one is going to count foreign words used in English as "English words".
A very common myth is that English is somehow more open to word borrowings than other languages (perhaps because of misleading news stories that claim that French has "banned" English terms like "hot dog" or "weekend". Despite what the ivory tower types in the Académie française decree, French people routinely refer to "le weekend" in real life.)
Posted by: Jonathan Badger | June 1, 2010 2:28 PM
Every English word that uses sub, a, trans, contra, ab, extra, in, inter, infra, and so on is at least in part "borrowed" because those are Latin affixes. It is probably the case that a minority of the modern English lexicon does NOT derive from an original English language (and I'm avoiding referring to a specific early language because that's a whole other ball of wax). In other words, if you sampled "English words" you'd find most derive from non-Germanic languages, and thus cannot be "original" to English, and among the Germanic ones, some of those are borrowed too.
You can look this stuff up in any history of language or of the English language, but for fun I've opened my 1966 Webster's ten times and picked ten random words, and here is where they originate:
4 English (early or unspecified)
2 Greek
3 Latin
1 Old French
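For anyone who wants to play with that tally, here is a minimal Python sketch. It only restates the ten-word count given above as proportions; the variable names are made up for illustration, and the rough standard error is an added assumption (simple random sampling) meant to show how little a sample of ten can settle.

```python
from collections import Counter
from math import sqrt

# Origins of the ten randomly picked words, exactly as tallied in the comment.
sample = ["English"] * 4 + ["Greek"] * 2 + ["Latin"] * 3 + ["Old French"]

counts = Counter(sample)
n = len(sample)

# Share of the sample attributed to each origin.
shares = {origin: count / n for origin, count in counts.items()}

# Everything not marked "English (early or unspecified)" counts as borrowed here.
borrowed_share = 1 - shares["English"]

# Rough standard error of that share, assuming simple random sampling.
std_error = sqrt(borrowed_share * (1 - borrowed_share) / n)

for origin, share in shares.items():
    print(f"{origin}: {share:.0%}")
print(f"borrowed: {borrowed_share:.0%} (standard error about {std_error:.0%})")
```

With only ten words the uncertainty is large, so this is a for-fun illustration of the tally, not an estimate anyone should lean on.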
Posted by: Greg Laden | June 1, 2010 2:44 PM
Do names count as words? Is Brooklyn an English word? How about Thoof?
What about jargon words? Where I am right now there are perhaps half a dozen people who use the word "nunboxing", as in "I think nunboxing will fix that". Is that a word?
Do you claim it is practical or useful to count all words regardless of how few people are using them?
It seems to me you are flat-out wrong about this. The whole post is just uninformed nonsense.
Posted by: Jason Orendorff | June 1, 2010 4:06 PM
Jason, as long as we're talking about semantics, "flat-out wrong" means something very different from "didn't address every instance of language you can think of". What part of the post is wrong?
And did you read #5? Your question about number of people using a word is actually addressed.
Posted by: Stephanie Z | June 1, 2010 4:28 PM
If you have a community of people using a word (nunboxing) that pretty much makes it a word. I'm not sure why there is a question there.
In a study of a culture's lexicon, a small number of words would be picked up that were very quirky and local like nunboxing, from the communities sampled, and not from those not sampled. And that would make up a tiny part of the observed lexicon.
Do you claim it is practical or useful to count all words regardless of how few people are using them?
Imma let you read the post for the answer to that one, because I do say something about that.
It seems to me you are flat-out wrong about this. The whole post is just uninformed nonsense.
If you are honestly asking, because you don't know, if a word used by only a few people is a word, then I think you may not be the best judge of nonsense in this area!
Posted by: Greg Laden | June 1, 2010 4:32 PM
We're not disagreeing about how dictionaries work; we're disagreeing about what to call the elements they're made of.
The terminology I used is that of Merriam-Webster, the authors of the dictionary you're citing (though I didn't know it when I commented). They discuss it here:
There are twelve entries for "post". The number of separate meanings is higher—at least, I imagine we agree that "mail a letter" and "stand up and sit down on a trotting horse" are different meanings, though they're in the same entry. So according to them, the dispute is between the number of entries and the number of spellings or pronunciations (though my choice might be the number of separate origins, which is in between), not the number of "utterances with distinct meaning", as you put it.
The number of entries for "run" is 3, so the question they recognize for "run" is between 1 and 3. Since you envision something like 50, the number of words that "run" counts as does indeed vary between 1, 3, and 50.
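The three ways of counting at issue here (by spelling, by dictionary entry, by individual sense) can be put side by side in a short Python sketch. The mini-dictionary below is a hypothetical toy: the glosses and the number of senses are invented for illustration and are not Merriam-Webster's actual data. Only the general shape (three entries for "run", two for "rare") follows what is described in this thread.

```python
# Hypothetical toy dictionary: headword -> list of entries,
# each entry having a part of speech and a list of senses.
toy_dictionary = {
    "run": [
        {"pos": "verb", "senses": ["move fast on foot", "execute a program", "seek elected office"]},
        {"pos": "noun", "senses": ["ladder in a stocking", "sequence of cards"]},
        {"pos": "adjective", "senses": ["(of a fish) having made its spawning run"]},
    ],
    "rare": [
        {"pos": "adjective", "senses": ["seldom occurring"]},
        {"pos": "adjective", "senses": ["lightly cooked"]},
    ],
}


def count_words(dictionary, unit):
    """Count 'words' under one of three definitions: spelling, entry, or sense."""
    if unit == "spelling":
        return len(dictionary)
    if unit == "entry":
        return sum(len(entries) for entries in dictionary.values())
    if unit == "sense":
        return sum(len(entry["senses"]) for entries in dictionary.values() for entry in entries)
    raise ValueError(f"unknown unit: {unit}")


for unit in ("spelling", "entry", "sense"):
    print(unit, count_words(toy_dictionary, unit))
```

The same data gives three different totals, which is the whole disagreement: nobody disputes what is in the dictionary, only which unit gets called a "word".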
I looked at another dictionary, the New Shorter Oxford Dictionary, for its definitions. It says [brackets mine]:
So each entry is about one word (the singular "word being defined") but has a whole section of definitions.
You wrote, Here's where you've gone wrong, probably: When you look in the dictionary, and you see "run" in bold, that is NOT a word. It is an Entry.
I think somebody might have gone wrong here. If it was me, at least I was with the people who make the dictionaries. I implied before that you're the only person I know of who uses "word" to mean "utterance with a distinct meaning". Do you know of anyone else?
I apologize for exaggerating your position before. But I hope I've got it now that you clarified it. Also, though you probably read my comment at Language Log, I'll repeat it where it's not behind your back: I enjoyed your verbicuneifying, but I'd have enjoyed it more with better rectiscription.
Now to post in a sense that's not in Webster's Third.
Posted by: Jerry Friedman | June 1, 2010 4:53 PM
Jerry: the number of "utterances with distinct meaning", as you put it.
Well, I did put it that way, but I was using "utterance" as short for "a speech sound or series of speech sounds" and "distinct meaning" as short for "symbolizes and communicates a meaning usually without being divisible into smaller units capable of independent use"
Which come from Webster's definition of "word." (http://tinyurl.com/2tm5w5)
so when you ask "I implied before that you're the only person I know of who uses "word" to mean "utterance with a distinct meaning". Do you know of anyone else?" the answer is only one guy, Merriam something, starts with W.
Good point about origins. I had been thinking about that but I don't think I mentioned it. Two spellings can have two meanings because the meaning is extended (to run a horse and the locomotion of running are obviously closely related) but perhaps a run in the stocking comes from some other origin (which it probably doesn't)
It is easier to find interesting examples of that with phrases. Straw man is a good one. I think it has three origin stories, and it may not be the case that only one is correct.
Posted by: Greg Laden | June 1, 2010 5:15 PM
The point in stressing that other languages borrow English words is that it invalidates the argument that English has more words than other languages if one is going to count foreign words used in English as "English words".
And my point in calling you a fucking asshole, is that you barged in here and with some hostility responded to an argument that no one here was making. I don't think it is too tangential to the content of the discussion to render it an inappropriate point to make, to be clear. I just thought it was rather ridiculous to engage the point with such hostility. It makes you appear to be an asshole.
Jason Orendorff -
Do names count as words? Is Brooklyn an English word? How about Thoof?
Absolutely. Given the fact that given names are often different in different languages - not just pronounced differently, but in many cases spelled differently, I think names quite definitively count as words. Why would they not?
What about jargon words? Where I am right now there are perhaps half a dozen people who use the word "nunboxing", as in "I think nunboxing will fix that". Is that a word?
This is where the discussion becomes subjective. Language isn't something that anyone, least of all academic types, has some claim to. Personally, I believe that yes, nunboxing is a word. Especially as given just a little bit more context I could probably take my suspicion as to its meaning and conclusively define it.
When it comes to jargon, of course we are talking about words. It is complicated by the fact that one specific language community may not have a reasonable claim on a specific word - but that is irrelevant to whether it is a word or not and whether it is part of the native language of a given user. There is all sorts of computer jargon, for example, that transcends language communities. While there is a lot of it that has origins in English, there is no reason that such jargon can't also be claimed as part of any other language in which it is used.
Just because a word that I might use would only be understood by a very specific segment of the population doesn't mean it isn't a word. I would take it further and argue that if I make up a word and other people know what I mean by it, it is also a word. Why not?
To be rather vulgar (mainly because it popped into my head as a great example) take the word "queef." I am not entirely sure of its actual origin, but in the context of a particular noise that sometimes occurs during intercourse, its usage began in my lifetime. While I am sure there are a great many American English speakers who have absolutely no idea what that word refers to, I doubt that a sizable percentage of people under the age of forty or so are among them. Would you consider that a word? If so, at what point in its journey into general vernacular did it qualify as a word? If not, why not?
Do you claim it is practical or useful to count all words regardless of how few people are using them?
Personally, I don't see a practical value in counting words at all. But I do think there is practical value in considering a word to be a word, whether it is part of common vernacular or not. I think doing so is an accurate view of what verbal language is and how verbal language functions and evolves.
Arguing that a word is not a word, based on some arbitrary criteria is antithetical to functional language.
It seems to me you are flat-out wrong about this. The whole post is just uninformed nonsense.
Since you are all about the words, making a subjective claim about Greg's wrongness (i.e. it "seems" to you) and then making an objective claim that it is uninformed nonsense is kind of silly. This is made even more silly by your lack of any specific argument. All you did was ask a bunch of questions, following them up with a claim that Greg is wrong.
What, pray tell, is Greg seemingly wrong about? And please, feel free to inform us in a more sensical fashion. Counter Greg's uninformed nonsense and enlighten us, please.
Posted by: DuWayne | June 1, 2010 5:56 PM
In Scrabble, proper names are not words.
Posted by: Irene | June 1, 2010 6:05 PM
The AskOxford faq you link to actually goes on to answer the question, after listing some of the different ways that "word" and "language" can be defined. An answer with real numbers and everything! So I don't know why anyone's knickers need to get in a twist. The real falsehood is the idea that the number of meanings a language can express is somehow tied to the number of words in the lexicon.
Posted by: Vicki Baker | June 1, 2010 6:08 PM
Jerry Friedman -
I implied before that you're the only person I know of who uses "word" to mean "utterance with a distinct meaning". Do you know of anyone else?
This would be me raising my hand. Not that that is the only way that I would use the word word. Take this example of the words "run," that has become a major theme in this discussion. Depending on the context of the discussion, I would say that "run" is one word with several meanings.
A good example of this is one in which it has actually been relevant to me. When teaching my son to read I went with simplicity. I am not going to sit down with a four year old and explain to him that run is several different words spelled the same way - that is way too abstract for that stage in neurodevelopment. It is much easier to explain that run is a word that has several meanings.
On the other hand, if I were teaching a native Japanese speaker English as a second language, I would explain that there are several words that happen to be spelled r-u-n. Not that I have done this or would be capable mind, it is just that I have a friend who is very capable and has, in point of fact, taught native Japanese speakers English as a second language. And while he did not explain this concept using "run" as an example, it applies just as well as the example he did use.
Another context in which I would use "word" to refer to "utterance with a distinct meaning," is when discussing the evolution of language (generally in the context of said evolution's correlation with cognitive and cultural evolution). "Run" (to really get the most out of this example) is a brilliant fucking example of words that progress from strictly colloquial use, to absolute acceptance in the most formal writing. There are not many uses of "run" that would not be accepted without question by the strictest judges of academic/professional writing.
Just think about that for a moment and think about all the various words that are "run."
And in any formal discussion of language, I would consider word to refer to utterance with a distinct meaning, because at the end of the day "run" of the mill "run" and I "ran" away from the angry cop "run," are two completely different words that happen to be spelled the same. And the I am "run"ing for office "run" is different than either, a colloquialism that has become acceptable formal vernacular.
Is the distinction always important? Hell no. But then most formal linguistic distinctions are only important in specific contexts. When it comes to the daily use of language, none of it is important, as long as we can understand each other.
Posted by: DuWayne | June 1, 2010 6:23 PM
Irish is not a Teutonic (aka Germanic) language. It's a Celtic language. Most linguists AFAIK consider the group more closely related to Italic than Teutonic.
Posted by: AK | June 1, 2010 7:05 PM
The real falsehood is the idea that the number of meanings a language can express is somehow tied to the number of words in the lexicon.
The number of meanings a language can express is a bit of a vague term. If you mean it like I think you may mean it, then please read the comment subsequent to yours by Brayton the linguistics student.
Posted by: Greg Laden | June 1, 2010 7:40 PM
"Irish is not a Teutonic (aka Germanic) language. It's a Celtic language. Most linguists AFAIK consider the group more closely related to Italic than Teutonic."
I meant, and said but way too clumsily, Irish people. I'm Irish, I grew up among Irish immigrants, and I've never heard anyone speak Irish. We were all busy speaking that Germanic language, English, which some assholes imposed on us in historical times.
Posted by: Greg Laden | June 1, 2010 7:42 PM
@Greg: I think you may be misinterpreting your friend Merriam. First of all, as I pointed out, he clearly implied in his front matter that a word can have more than one meaning. Second, "a" in "symbolizes and communicates a meaning" doesn't necessarily mean it's exactly one meaning. Here's one of their definitions of "doctor": "2 a : a person skilled or specializing in healing arts; especially : one (as a physician, dentist, or veterinarian) who holds an advanced degree and is licensed to practice". Obviously "an advanced degree" doesn't exclude people who hold two advanced degrees. (In another definition, "1 c : a person who has earned one of the highest academic degrees (as a PhD) conferred by a university", even "one" doesn't mean "exactly one".) So I'm going with the sense of "symbolizes and communicates a meaning" that agrees with their front matter.
@DuWayne: Sorry, but "at the end of the day" doesn't help. "Run of the mill" and "run for sheriff" strike me as metaphorical extensions of the original "run"—still the same word. I sort of think that when I learned "run of the mill", I assumed it was the same "run" without knowing the meaning.
The odd thing, of course, is that we're taking each other's side. As you and DuWayne say that "run" is dozens of words, counting vocabulary is far more problematic than I'd imagined it could be and than the lexicographers say, but your original point was that it's not so problematic.
Speaking of original points, Greg, you said what led you to this was measures of people's vocabularies. But as far as I know, the dictionary-sampling method of measuring vocabulary counts entries. See here ("at least one sense" is near the bottom of the page) and here (p. 7). If you count distinct meanings, you'll get considerably bigger vocabularies.
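The arithmetic behind the dictionary-sampling method is simple enough to sketch in Python. This is a minimal illustration, not the procedure of any particular study: the function name is invented, and the example figures (200 sampled items, 36 recognized, a list of 250,000) are illustrative assumptions. The point it shows is that the estimate scales directly with the size of the list being sampled, so the same test score gives a bigger vocabulary if the list counts distinct meanings instead of entries.

```python
def estimate_vocabulary(known, sampled, list_size):
    """Scale the share of sampled items a person knows up to the whole list."""
    if sampled <= 0 or not 0 <= known <= sampled:
        raise ValueError("need 0 <= known <= sampled and sampled > 0")
    return known * list_size / sampled


# Illustrative numbers only: 36 of 200 sampled items recognized.
by_entries = estimate_vocabulary(36, 200, 250_000)

# Same test score, but against a hypothetical list twice as long
# because it counts distinct meanings instead of entries.
by_meanings = estimate_vocabulary(36, 200, 500_000)

print(by_entries, by_meanings)
```

Doubling the list doubles the estimate, which is why it matters whether the sampled unit is an entry or a meaning.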
@DuWayne again: I agree that this topic has no relevance to communication (except insofar as this thread is a study in failure to communicate). But it's a FAQ at Merriam-Webster, so it must interest some people, and I'm one. Where I live it may have some practical importance to self-esteem, since some people have or used to have an absurd belief that average local teenagers have vocabularies under 1000 words.
Posted by: Jerry Friedman | June 2, 2010 12:06 AM
Jerry, sorry, your post got caught in moderation. There must be a phrase or word you used that is also used by one of my common trolls.
You lost me with the physicians. Are you trying to make an argument that all of the different meanings linked to utterances are less different if the utterances happen to be coded into the same lexicographic entity?
"He imitated an ape"
"He aped an ape"
In the second sentence, do "ape" as in to imitate and "ape" as in the primate get counted as one word, while "imitated" and "ape" in the first sentence count as different words? Are the first ape in "I ape an ape every day" and the first ape in "I aped an ape last week" different words?
And by word I mean meaning-wise. (I assume this conversation has gone beyond the concept that words, spellings, etc are parts of a code and that we are confounded by homophones and such).
Answer: Ape to imitate and ape the primate are different words. Ape (to imitate) present tense and past tense are the same word. I don't care about no stinking metaphorical extension. Everything is a metaphorical extension of something.
It may well be that some of the meanings for "run" can be conflated. The fact that you could argue that some can does not affect my point at all, unless you are specifically arguing that finding ambiguity in a list of definitions (of word meanings) in a particular dictionary means that we can never count meanings. I assume there will be ambiguity, variation between counts, etc. Some "words" will be harder than others. But I would also argue that the differences between lexicons as bodies of work (the dictionaries), owing to methodology and history as well as geography and population size, add more variation to the mix than would differences in opinion about what means what if a reasonable sample of terms was put on the table in front of a group of similarly trained humans (i.e. linguists). I further re-assert that the variation in assignation to meaning would be exaggerated by the tendency towards academic yammering of both linguists and anthropologists, but that variation would mostly vaporize as soon as the beer challenge was issued.
If you are arguing that this is wrong, you'll have to make your argument more clear.
If you think I'm saying that there won't be ambiguities and arguments and different counts, then you need to go and re-read what I said at the beginning of my response in the latter post.
I don't take the sampling in Watts as limiting meanings. I don't think that if a random sample of the dictionary came up somehow with "ape" to imitate and "ape" the primate when they were constructing the sample, they'd say "oh, well, they are part of one entry so somehow we have to cram this into one meaning"
Also, different dictionaries use different formats. I think people doing this research are not particularly confused about the concept of multiple meanings.
In your second link, they are making a distinction between entries and forms, not headings and meanings. You need to read the counter examples better before you use them.
I apologize if the above was a bit disjointed, I wrote it in three bouts interrupted by baby issues, and tempered a bit by a larger than average dose of pain medication (which really should be called anti-pain medication, now that I think about it).
Posted by: Greg Laden | June 2, 2010 1:09 AM
Jerry -
Sorry, but "at the end of the day" doesn't help.
Bummer.
"Run of the mill" and "run for sheriff" strike me as metaphorical extensions of the original "run"—still the same word. I sort of think that when I learned "run of the mill", I assumed it was the same "run" without knowing the meaning.
But they aren't metaphorical extensions, both have meanings that are entirely distinct from the action of using your legs to move more quickly than a jog.
"Run of the mill" generally refers to ordinary or plain. It is actually quite reasonable to use "run of the" to describe the ordinariness of any industrial product, though it is not commonly used. "Run of the mill" originally referred to production clothing and has since been coopted to generically refer to anything ordinary. I would argue that this phrase is using run in a manner distinct from the original reference of a production run, because in general use it is not referring to production.
"Running for sheriff" on the other hand, has a distinct, formal meaning outside the context of the original metaphor. As does "race," in the context of the competition of two or more people for elected office. In this context both words have an origin as an interdependent colloquial metaphor, but have ultimately been coopted as formal descriptions that can be and often are used independently of each other.
As you and DuWayne say that "run" is dozens of words, counting vocabulary is far more problematic than I'd imagined it could be and than the lexicographers say, but your original point was that it's not so problematic.
I don't see where Greg is claiming that it isn't problematic. Indeed what I am reading him as saying is that it is actually quite complicated, but possible.
And for my own part, I am not arguing that lexicographers are wrong in how they define vocabulary. In some contexts the definition they use may be more appropriate - on the new post Greg wrote, I even gave examples of such contexts. But in another context defining vocabulary the way that Greg and I are is likely to be more appropriate - again, I give examples on the other thread.
Speaking of original points, Greg, you said what led you to this was measures of people's vocabularies. But as far as I know, the dictionary-sampling method of measuring vocabulary counts entries. See here ("at least one sense" is near the bottom of the page) and here (p. 7). If you count distinct meanings, you'll get considerably bigger vocabularies.
But that is just the dictionary sampling method, it is not the only metric. Sitting here I can think of eight distinct operational definitions for "word," and about four or five distinct operational definitions for "English language." Using different combinations of the above, one could end up with a fuckton of operational definitions that could be used to answer the question of vocabulary size. I can think of about a dozen that would be useful in specific contexts in which one might be exploring vocabulary size.
I agree that this topic has no relevance to communication (except insofar as this thread is a study in failure to communicate).
The reason I mention this repeatedly is because of my own interest in linguistics and because of its relevance to my rather broad definition of "word." For my purposes, assuming a broader view is important, because I am interested in how verbal language evolves and the correlation of that evolution with cultural and cognitive change.
Explaining it in terms of everyone being an expert is useful because it acknowledges that broader view.
But it's a FAQ at Merriam-Webster, so it must interest some people, and I'm one. Where I live it may have some practical importance to self-esteem, since some people have or used to have an absurd belief that average local teenagers have vocabularies under 1000 words.
In the context you are describing, I think an understanding of how we actually assimilate language is more important than coming up with a specific number. While coming up with an absolute for the former is as fraught with complications as the latter, just using what we can actually observe would expose the stupidity of that claim. Basically, if you were to actually sit down with your average teen and get them to list all the words they know, their vocabulary would likely grow considerably while compiling such a list.
The act of making that list would force a teen to really think about it. There are many words in English that are intuitive, while there are many others that your average person is unaware that they know. Just because a person might never actually use a particular word in their lifetime, doesn't mean it is not a part of their vocabulary.
Barring pathology, most of us have vocabularies near enough the same size to make the differences pretty irrelevant. I sincerely doubt that it takes all that long to develop the bulk of one's vocabulary. I would make a guess, based on my understanding of cognitive development, that by the time puberty hits most people have their general vocabulary. Beyond that, while there is jargon to learn, the actual increase in vocabulary is negligible in comparison to what is already there.
Posted by: DuWayne | June 2, 2010 11:44 AM
Greg, you don't have anything to apologize for, and I appreciate your not gloating over the errors in my previous post (you didn't say you started with people's vocabularies, and one of my sources was M-W's FAQ, not their front matter). Just for that, I'll take out a sarcastic sentence.
Here's what I meant with the "doctor" example.
I asked for other people who define "word" as "an utterance with a distinct meaning". You quoted M-W: "a speech sound or series of speech sounds that symbolizes and communicates a meaning usually without being divisible into smaller units capable of independent use". This implies distinct meaning only if "a" implies "exactly one", as far as I can see. But I think it can be read as saying that a word has one or more distinct meanings, which agrees with their FAQ. To support that, I quoted M-W's definition of "doctor", in which "an advanced degree" clearly means "one or more advanced degrees".
In short, I think the definition you quoted is consistent with the idea that "run" is either one word or three (verb, noun, adjective).
I've never said anything about how different meanings are and I'm not saying anything about it now. I'm pointing out that your definition of "word" is unusual in that you see different meanings as different words.
I have just now noticed someone who allows for that possibility, though. In your AskOxford link they estimate the number of English words as a quarter of a million, but "If distinct senses were counted, the total would probably approach three quarters of a million." So I'll admit that you and DuWayne aren't the only two people in the world who might count vocabularies that way.
Finally, something I know about. You can ape the ape, I know about that. I'm happy with definitions that make the noun and the verb (and the adjective, extant mostly in "go ape") different words or the same word. I don't have as many definitions for "word" as DuWayne does, but I can see more than one. The most natural for me is to say that there's only one word "ape" in English, and that it includes "aped", "aping", and "apes". Even meaning-wise (which is not a definition of "word" I use), I'd be comfortable saying that "aping his classmates" means "acting toward his classmates in a way that's traditionally considered typical of apes". You seem to assume a definition of "word" that requires the noun and verb to be different words, but I don't see why you reject the other definitions used by people, including lexicographers.
You're entitled to not care about metaphorical extensions, but other people are entitled to care, as I do.
I would also argue that the differences lexicon as a body of work (the dictionaries) owing to methodology and history as well as geography and population size add more variation to the mix than differences in opinion about what means what if a reasonable sample of terms was put on the table in front of a group of similarly trained humans (i.e. linguists)
I don't understand what you mean by the first set of differences (and I assume "in the" is missing before "lexicon"). Are you talking about the sizes of different dictionaries? If so, the overwhelming factor is how abridged they are, from elementary school to weapon for muscular murderers. But for the second set of differences, AskOxford has supplied an estimate: there's a factor of almost three in the size of an unabridged lexicon depending on whether different senses count as different words.
I don't say for a second that you think words can be counted without ambiguities.
I really think I understood both of the links I gave. I was pointing out that when people count vocabularies by dictionary sampling, they count dictionary entries. Aitchison says, "at least 50,000 words, with a word provisionally defined as a 'dictionary entry'." If she defined it as a distinct sense, "an educated adult" might have a vocabulary of almost 150,000 words.
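The arithmetic behind those two figures can be sketched roughly as follows. The sample size and the number of entries the subject recognises are made up for illustration, the dictionary size borrows AskOxford's quarter-million estimate, and the function name is hypothetical. The sense-based figure assumes the subject knows every sense of every entry they recognise, so it is a ceiling, not an estimate.

```python
def estimate_entries_known(known_in_sample, sample_size, dictionary_entries):
    """Scale the share of sampled entries known (in at least one sense)
    up to the whole dictionary. Integer arithmetic keeps the result exact."""
    return known_in_sample * dictionary_entries // sample_size

# Hypothetical test: 200 entries sampled, 40 recognised in at least one sense.
DICTIONARY_ENTRIES = 250_000   # AskOxford's "quarter of a million"
DISTINCT_SENSES = 750_000      # AskOxford's "three quarters of a million" (approached, not reached)

entries_known = estimate_entries_known(40, 200, DICTIONARY_ENTRIES)

# Ratio of senses to entries: the "factor of almost three".
senses_per_entry = DISTINCT_SENSES / DICTIONARY_ENTRIES

# Ceiling on a sense-based vocabulary, assuming every sense of each known entry is known.
senses_ceiling = entries_known * senses_per_entry

print(entries_known)         # 50000
print(int(senses_ceiling))   # 150000
```

Because real subjects know only some senses of most entries, an actual sense-based count would land somewhere between the two printed figures.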
True, she notes that "sings", "sang", and "sung" (and "singing") are subsumed in one entry, but that doesn't change her definition of "word".
Likewise Watts said, "...how many words on these pages can be used in at least one sense". This shows that he too used "word" to mean something other than "distinct meaning".
This is hard work even for people who don't have children. I may get to DuWayne's comment later.
I imagine you're right that "people doing this research are not particularly confused about the concept of multiple meanings," though they may not agree with you. They count the dictionary entries a subject knows "at least one sense" of. The results they get may depend on their choice of dictionary—"ape" is one entry in the NSOED and three in M-W—but I'm sure they can deal with that when comparing different studies.
Posted by: Jerry Friedman | June 2, 2010 7:45 PM
Sorry for getting back to this so late, but I want to clarify a bit.
In a substantive way, the "falsehood" isn't actually false. Clearly "counting words" requires some further explanation before even a fairly broad group of people agree on what is meant by it. Kind of like the "breathing through your skin" thing... it really depends on what you mean by breathing.
Oh, and you have to define language a bit more rigorously.
A specific methodology for counting words will have implications for what that count (or comparison of counts) means.
Overall, I think "number of words in a language" isn't a very interesting or meaningful question. Forget "words" for a moment, what the hell do you mean by a language? Functionally, which is the only sense with a real grounding in reality, a language is the set of statements which are mutually understandable among a set of entities attempting to communicate. A statement is part of the language you and I share only if both of us actually understand it. However, because we make mistakes, infer meanings, and are constantly learning, all sorts of stuff (including made up fauxwordiness) is perfectly functional.
Now, what are words? Well, since natural languages (and almost all synthetic ones) use lexical items which can be combined using syntactic rules to form statements, calling those subunits words seems to make sense.
But what about prefixes and suffixes? How about particles and/or conjugation? You may say, no, those are just modifiers. So does that mean assemble and disassemble are the same word? How this is decided will affect any word count dramatically, and it varies a lot between different language families.
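A toy illustration of how much that one decision moves the count. The prefix list and the word sample are invented for the example, and the stripping rule is deliberately crude (it would mangle a word like "read"), so treat it as a sketch of the problem, not a real morphological analysis.

```python
def strip_prefix(word, prefixes=("dis", "un", "re")):
    """Crude, hypothetical rule: drop the first listed prefix the word starts with."""
    for prefix in prefixes:
        if word.startswith(prefix):
            return word[len(prefix):]
    return word

# Invented sample: three base forms and three prefixed forms.
sample = ["assemble", "disassemble", "run", "rerun", "do", "undo"]

# Definition A: every distinct surface form is its own word.
surface_count = len(set(sample))

# Definition B: prefixes are "just modifiers", so prefixed forms merge with their base.
merged_count = len({strip_prefix(w) for w in sample})

print(surface_count)  # 6
print(merged_count)   # 3
```

Same six strings, and the "word count" halves depending on which definition you pick.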
And how do we count (or justify not counting) strings of phonemes (or characters) which are just made up on the spot but are clear from context? Greg points these out quite amusingly in the post, but I don't think he really gets the implication. The meaning of a word, hell the entire essence of a word, is dependent on context. It is like trying to assign a fitness value to a particular allele... you have to assume a genetic background and environment. An allele conferring insecticide resistance in mosquitoes probably does something completely different (or nothing at all) in a human, and neither means jack for fitness on the surface of the moon.
Let me try to put this another way. How would you test if someone knows two of the "run" words that differ by meaning without providing the words in context? Are you testing knowledge of the word, or the ability to derive the meaning from context? Is there any difference?
What about synonyms? I could say "dog run", "dogrun", "dog paddock", "dog yard", "dog runnie", or "dawg ruun" and still have the meaning understood. Hell, is "dog run" a single word? It isn't all that similar to a salmon run, and I don't think there is such a thing as a cow run or a bird run.
I could go on and on... but instead let me just get to the point. You really can't count the number of words unless you define (or specify a methodology for determining) what a word is. Such a methodology would also have to nail down what is meant by language, which is also not nearly as clean as most people think. One can certainly do this, but what you get as a "word count" will vary by orders of magnitude depending on the specifics. And almost certainly many people will disagree that you're actually doing a word count.
Finally, "how many words" is just not a very meaningful question in the first place. We actually have metrics like expressiveness.
PS: Dictionaries are not actually used as a corpus by linguists (at least the hard-core mathematical sort I know) for a good reason. They are a fairly arbitrary post-hoc categorization designed for a pretty specific purpose.
PPS: This may be interesting... Haven't read it yet myself, but Stabler is scary smart and does excellent work.
Computational models of language universals: Expressiveness, learnability and consequences
www.linguistics.ucla.edu/people/stabler/Stabler07-UG.pdf
Posted by: travc | June 5, 2010 7:45 AM