Can’t a machine do that?

...'cause I thought I heard of a software and I know people at x conference said and seems like....

I get this all the time. Most recently, I gave a pretty detailed presentation of some analysis I'd done. Once I was finished, I got the question: can you demo the tool that provided these answers for our boss? Another time that sticks out in memory: a customer saying to me, oh, find out what software or algorithm P (a male member of our group) uses to give all of that helpful info. Just as I thought, P had a reminder set up so that every Wednesday morning he opened the web page, searched, and scrolled through the new entries. There was no software. There was P and his knowledge of what a ton of different groups around the lab needed (alas, P was laid off, er, retired).

Similarly, ever since I wrote an article on blog searching for competitive intelligence, I've sort of browsed the alternatives for retrieval and analysis of blogs specifically, but also of internet and literature content more broadly. I've also been chatting with some folks who do geolocation (pdf) and sentiment analysis of blogs, and very briefly with a computational linguistics guy who asked some very pertinent and probing questions about what research databases, human indexers, and librarians actually do, or could do, better than Google or a smarter version of Google. Beyond the whole reference interview - which IS important and part of the answer - there's this horrible mismatch between what people think the content/information analysis tools can do and what they actually deliver in the way of automatic categorization, summarization, question answering, sentiment analysis, etc.
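To make that mismatch concrete, here's a minimal sketch of what a lot of off-the-shelf sentiment analysis boils down to under the hood: counting words against a lexicon. The word lists and posts below are made up for illustration; real products are fancier, but sarcasm and negation trip them up in much the same way.

```python
import re

# Toy sentiment lexicons - invented for illustration.
POSITIVE = {"great", "love", "helpful", "fast", "reliable"}
NEGATIVE = {"broken", "hate", "slow", "useless", "crash"}

def naive_sentiment(text: str) -> int:
    """Score = (# positive words) - (# negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "I love this product, support was fast and helpful",  # scores +3: fine
    "yeah, great, it crashed again. just great.",         # scores +2: sarcasm wins
    "not useless at all, actually quite reliable",        # scores 0: negation cancels
]
for p in posts:
    print(naive_sentiment(p), "\t", p)
```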

If you need more than my experience, consider the disastrous search engine described by Tunkelang (among others), or the mismatch between people's hopes (and the hype) about Wolfram Alpha and the reality (sometimes funny). Another example is the over-hyping of Scopus as having 95% precision and recall for author identification (first of all, those aren't even the right metrics; second, no freakin' way). Scopus author and institutional identifications are often wildly inaccurate, IMO, and I hope they don't sue me for saying so!
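For anyone who wants the metrics pinned down, here's what precision and recall actually measure, on a made-up toy author-identification example (the record assignments below are invented; real evaluations also have to handle clustering across thousands of name variants):

```python
# Precision and recall for a toy author-identification task. The "gold" and
# "predicted" record-to-author assignments are invented for illustration.

gold = {      # record id -> true author
    1: "Smith, A", 2: "Smith, A", 3: "Smith, B", 4: "Smith, B", 5: "Smith, B",
}
predicted = { # record id -> author the system assigned
    1: "Smith, A", 2: "Smith, B", 3: "Smith, B", 4: "Smith, B", 5: "Smith, B",
}

# Treat "record assigned to Smith, B" as the positive class.
tp = sum(1 for r in gold if predicted[r] == "Smith, B" and gold[r] == "Smith, B")
fp = sum(1 for r in gold if predicted[r] == "Smith, B" and gold[r] != "Smith, B")
fn = sum(1 for r in gold if predicted[r] != "Smith, B" and gold[r] == "Smith, B")

precision = tp / (tp + fp)  # of records given to Smith, B, how many were right
recall = tp / (tp + fn)     # of Smith, B's real records, how many were found

print(f"precision={precision:.2f} recall={recall:.2f}")  # 0.75 and 1.00 here
```

Note that the system can look perfect on recall while quietly polluting an author's profile with someone else's papers, which is exactly why a single headline number deserves skepticism.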

There are lots of researchers working in this field - thousands - and there are lots of companies providing these services to businesses and government. Ideally, you'd have some sort of dashboard so that for a new product or campaign you can see how it plays in Peoria. You might also be able to figure out what features are needed or what new products are needed. Other types of dashboards would show what research is being funded, how your researchers rank, and what your competitor is getting ready to do based on job announcements, press releases, etc. Don't get me wrong, people do sell very expensive products that do each of these things (some in the six figures US$ or higher), but they're really not at the point where they replace an analyst; they just make analysts better. See this discussion by someone who has used a bunch of the social media monitoring technologies (via enro).

In practice - and I do this for a living, although I don't consider myself an expert - I use these tools when I can (not the $$$$ ones), but the biggest part of what I do is use them to find "interesting" places in the graph that I can then investigate to see why, or what's going on. A lot of creativity is needed in searching and in understanding how different people use different words to describe the same thing. Then, once you get a bunch of content that might have the answer, you need to visualize it and run some of these tools against it to find out where to look. And then you read and think and discuss and question. Ultimately, the systems right now are at the point where they can say, "look around here - that's probably where what you want to know is," but they're not at the point of saying "42." IMHO.
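As a toy illustration of that "look around here" signal, here's a minimal sketch that flags weeks where mentions of a topic spike above the recent baseline, so a human knows where to start reading (the counts are made up):

```python
# Flag weeks where mention counts jump well above the trailing average -
# the kind of "interesting place" a person would then go investigate.

def flag_spikes(weekly_counts, window=4, factor=2.0):
    """Yield (week_index, count) where count > factor * trailing average."""
    for i in range(window, len(weekly_counts)):
        trailing = weekly_counts[i - window:i]
        avg = sum(trailing) / window
        if avg > 0 and weekly_counts[i] > factor * avg:
            yield i, weekly_counts[i]

# e.g., weekly blog mentions of a product (invented numbers)
mentions = [3, 4, 2, 5, 3, 4, 18, 5, 4, 3, 21, 6]
for week, count in flag_spikes(mentions):
    print(f"week {week}: {count} mentions -- worth a human look")
```

The tool points; the human still has to read what happened in weeks 6 and 10 and figure out why.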


There is a similar dynamic in some of the research I've been doing (using sensor networks to do some particular field bio observations). The engineering/CS side often looks for completely automated solutions. The bio side doesn't trust automated classification (for pretty good reason, most of the time). Fortunately, a lot of people actually grok that it makes the most sense to have automated systems trim the data down to a manageable size, which humans then deal with (making all those complex judgment calls).

"Human in the loop" is the term of art we've adopted.