Question for the Linguists/Psycholinguists in the House

By mixingmemory on August 10, 2007.

Does anyone around here know of a program or programs that can do the following things with text:

1. Frequency counts for parts of speech (nouns, verbs, adjectives, etc.).
2. Sort or score words/phrases based on how abstract or concrete they are.

UPDATE: Thank you everyone for the suggestions and tips. I'll try them out tomorrow when I get in the lab.

Since I asked without giving you any details, let me give you a brief, though vague, description of the project. A few years ago, another psychologist and I wrote a review/theory paper about a particular type of category that we thought sounded plausible and could have important implications for concept research. After we published the paper, we tried a bunch of different ways to test empirically for the existence of these categories, but it proved difficult, mostly due to my own lack of creativity, and ultimately the research program stalled. However, this spring, I sat down with another colleague who'd been doing research that was related, though not directly linked, to the paper. In one lunch (well, I just had coffee), he and I came up with a bunch of possible empirical routes, one of which involved the typical/ideal distinction that the concepts folks out there might recognize from Larry Barsalou's work on ad hoc categories from the 80s and Doug Medin's work on concepts and expertise. Basically, we wondered if the prototypical members of our type of category might be ideals, rather than central tendencies, much as the prototypical members of ad hoc categories, and the categories of some experts, are ideals. If that were the case, then we'd have a pretty good way of determining whether a particular category was one of ours or not.

To make a long story short, we had participants list characteristics for and examples of typical and ideal members of various natural categories. We weren't hoping to find anything in this particular task (it's meant to serve as a comparison for another task), but in entering all the characteristics people listed, I began to notice some things. There seemed to be differences in the word types (e.g., adjectives vs. nouns) used to describe different categories, and in the abstractness of the characteristics (not surprising, since adjectives tend to be more abstract). After the three of us working on the project talked it over, we decided there might be something interesting in there, but we weren't sure exactly how to measure those sorts of things.

The MRC database will have those parameters for words.
Web interface:
http://www.psy.uwa.edu.au/mrcdatabase/uwa_mrc.htm

It's also pretty easy to load it into a database and query the DB using Perl, so you can read a file and go word by word. Imageability of phrases might be trickier.

FriMan.
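The word-by-word lookup described above can be sketched in Python rather than Perl. This is only an illustration under stated assumptions: the tab-separated layout, the column names `word` and `concreteness`, and the helpers `load_norms` and `score_text` are hypothetical, and the ratings in `SAMPLE_NORMS` are made-up placeholders, not values from the MRC database.

```python
import csv
import io
import re

# Hypothetical stand-in for a norms export: the words are real, but the
# ratings are made-up placeholders, not values from the MRC database.
SAMPLE_NORMS = """word\tconcreteness
apple\t600
table\t580
justice\t300
idea\t280
"""

def load_norms(handle):
    """Read a tab-separated word/concreteness table into a dict."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["word"].lower(): float(row["concreteness"]) for row in reader}

def score_text(text, norms):
    """Return (mean rating of covered words, list of uncovered words)."""
    words = re.findall(r"[a-z']+", text.lower())
    rated = [norms[w] for w in words if w in norms]
    missing = [w for w in words if w not in norms]
    mean = sum(rated) / len(rated) if rated else None
    return mean, missing

norms = load_norms(io.StringIO(SAMPLE_NORMS))
mean, missing = score_text("An apple on the table", norms)
```

Returning the uncovered words alongside the mean makes it easy to see how much of a participant's response the norms actually cover, which matters when the rated vocabulary is small.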

The most accurate way to determine word type is to use a part-of-speech (POS) tagger. There are several open source POS taggers available with accuracies that vary from 98 to 99.xx %. My favorite is one by Adwait Ratnaparkhi, developed at the University of Pennsylvania and written in Java.

You'll need to write a little code to run it and use the output ...
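As a rough idea of what that "little code" might look like, here is a minimal Python sketch that counts coarse word classes in tagger output. It assumes the tagger writes whitespace-separated `word_TAG` tokens using Penn Treebank tags; that output format, the `COARSE` mapping, and the function names are assumptions for illustration, so check them against whatever tagger you actually run.

```python
from collections import Counter

# Coarse classes keyed by Penn Treebank tag prefix (assumed tag set).
COARSE = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}

def coarse_class(tag):
    """Map a Penn Treebank tag such as NNS or VBD to a coarse word class."""
    return COARSE.get(tag[:2], "other")

def pos_counts(tagged_text):
    """Count coarse word classes in whitespace-separated word_TAG tokens."""
    counts = Counter()
    for token in tagged_text.split():
        # Split on the last underscore so words containing "_" survive.
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # skip tokens that carry no tag
        counts[coarse_class(tag)] += 1
    return counts

# Hand-tagged example line, not real tagger output.
tagged = "The_DT loyal_JJ dog_NN chased_VBD two_CD small_JJ cats_NNS quickly_RB"
counts = pos_counts(tagged)
```

Collapsing the fine-grained tags (NN, NNS, NNP, and so on) into a few classes keeps the counts comparable across categories, at the cost of losing distinctions such as singular vs. plural.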

Abstract vs. concrete is a little harder. How do you define abstract vs. concrete? One way is to use the WordNet database (open source) and look at the hypernym/hyponym relationship - see

http://ieeexplore.ieee.org/Xplore/login.jsp?url=/iel5/10203/32544/01521…

or read the excellent book on WordNet.
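One way to read the hypernym suggestion is to treat a word's depth in the hierarchy as a rough proxy: words far from the root are more specific, and often more concrete, than words near it. The sketch below shows the idea on a tiny hand-built hierarchy. The `HYPERNYM` links and both functions are hypothetical illustrations, not WordNet's actual data or API, and depth measures specificity rather than concreteness as such.

```python
# Toy hypernym links (child -> parent), hand-built for illustration only;
# the real WordNet hierarchy is far larger and a word can have several senses.
HYPERNYM = {
    "poodle": "dog",
    "dog": "animal",
    "animal": "organism",
    "organism": "entity",
    "virtue": "attribute",
    "attribute": "entity",
}

def depth(word, hypernym=HYPERNYM):
    """Number of hypernym steps from word up to the root of its chain."""
    steps = 0
    seen = {word}
    while word in hypernym:
        word = hypernym[word]
        if word in seen:
            raise ValueError("cycle in hypernym links")
        seen.add(word)
        steps += 1
    return steps

def rank_by_depth(words, hypernym=HYPERNYM):
    """Sort words from shallowest (more general) to deepest (more specific)."""
    return sorted(words, key=lambda w: depth(w, hypernym))

ranked = rank_by_depth(["poodle", "virtue", "animal", "entity"])
```

Note that a word missing from the hierarchy gets depth 0 here, the same as the root, so with real data you would want to track unknown words separately and decide how to handle words with more than one sense.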

Good luck. Sounds like a fun little project.

Ashok Khosla
CTO & CoFounder
TuVox Inc.

You might try the University of South Florida norms set up by Douglas Nelson. I believe there are imageability and concreteness ratings on many English words. You'll have to set something up to parse the files, though.

Have a look at LIWC: "Linguistic Inquiry and Word Count (LIWC) is a text analysis software program designed by James W. Pennebaker, Roger J. Booth, and Martha E. Francis."

I'm about to use it in some current research. Not sure whether it does abstract/concrete, I haven't had a close look at it yet.

The MRC database is quite good for a lot of things but uses the ancient (40 yr old) Kucera & Francis written word frequency norms. You might also try CELEX and the Linguistic Data Consortium.

http://www.ru.nl/celex/

http://www.ldc.upenn.edu/

Natural Language Toolkit:
http://nltk.sourceforge.net/index.php/Main_Page should work for you. Can do a lot of useful things.

1. Doable, with some good suggestions above. I would recommend CELEX.
2. I don't think what you want is possible. The established norms for concreteness ratings aren't large enough: they cover fewer than a few thousand words, with no coverage of phrases. You would have to come up with your own concreteness rating norms, or a coding scheme for the features people list.
