How to grep a PDF file (Linux)

OK, I’m going to do this without looking. It will be something like pdftotext foo | grep whatever, right?

Let’s watch and see….

Well, close enough. Note that this was not done on a Debian system. On Debian-based systems (like Ubuntu) you would use apt to install the tools.

apt-get install poppler-utils
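
For reference, here is a minimal sketch of the pipeline once poppler-utils is installed. One detail the guess at the top leaves out: pdftotext writes to a text file unless you give it "-" as the output name, which sends the text to stdout so grep can read it. The helper name pdf_grep and the file name foo.pdf are placeholders for illustration, not part of any package.

```shell
# Sketch: search one PDF for a pattern.
# pdf_grep is a hypothetical helper name, not something poppler-utils provides.
pdf_grep() {
    pattern=$1
    pdf=$2
    # "-" as the output file makes pdftotext write the extracted text to stdout;
    # grep -n then prints each matching line with its line number in that text.
    pdftotext "$pdf" - | grep -n -- "$pattern"
}

# Usage (foo.pdf is a placeholder):
#   pdf_grep whatever foo.pdf
```

The line numbers refer to the extracted text, not to pages in the PDF, so treat them as a rough guide to where the match sits.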

Comments

  1. #1 Barry
    June 14, 2009

    Or xpdf-utils, Poppler being a fork of Xpdf.

  2. #2 richard
    June 14, 2009

    Wouldn’t it just be easier to open the pdf and search for the phrase? Then you’d know the context as well as the location of the string. I’m sympathetic to the wonders of the command line, but this looks like a fair amount of work.

  3. #3 Markk
    June 14, 2009

    richard, suppose you are searching through 1000 PDFs of articles you snagged over the last few years? This can be a batch operation over many files this way.

  4. #4 Michael Spencer
    June 15, 2009

    Oh, gosh, you linux-people are a never ending source of mirth :-)

  5. #5 Buckaroo Banzai
    November 26, 2009

    With some not-so-recent versions of Adobe Acrobat you can search multiple PDF files at once. It may be (I don’t know) reasonably slower, however.

    With some tweaks and experimentation this could be very interesting, I think. Coupling it with “less” (simply “| less” at the end of the line will do it, albeit I think it’s better with grep --color=always, so the term will be highlighted) and things like that.

  6. #6 Ben Zvan
    November 26, 2009

    Or you could navigate to the folder in the Finder, type the string you want in the search field and click the name of the folder to restrict the search and click ‘contents’ (default) to search within the files rather than the file names.

  7. #7 Rachelle
    October 14, 2010

    I don’t think Windows search, Finder, or Adobe’s search can tell me all the instances of “quadratic” or “quadratics” that occur without “relation”, “function” or “equation” following it.

    Long live grep!

  8. #8 Hanno
    September 30, 2012

    I’m aware that this entry is a bit old to comment on, but I just stumbled over the same problem, and there’s a tool called pdfgrep:
    http://pdfgrep.sourceforge.net/
    You might want to try it out.

  9. #9 Medoc
    France
    May 16, 2013

    You can use Recoll (free GPL GUI application) to search PDFs and other document types.
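
Two ideas from the comments above, batch searching a pile of PDFs and matching a word only when certain other words do not follow it, can be combined in a rough sketch like the one below. It assumes GNU grep built with Perl-regex support (the -P option) and GNU grep's --label option. The helper name pdf_grep_dir and the directory ~/articles are made up for illustration.

```shell
# Sketch: search every PDF under a directory with a Perl-style regex.
# pdf_grep_dir is a hypothetical helper name; -P and --label assume GNU grep.
pdf_grep_dir() {
    pattern=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.pdf' | sort | while IFS= read -r pdf; do
        # -H with --label prefixes each hit with the PDF's name, since grep
        # itself only sees stdin. 2>/dev/null hides pdftotext complaints
        # about damaged or encrypted files. A file with no hits is not an error.
        pdftotext "$pdf" - 2>/dev/null |
            grep -i -P -H --label="$pdf" -- "$pattern" || true
    done
}

# Usage (~/articles is a placeholder):
#   pdf_grep_dir 'quadratics?\b(?!\s+(relation|function|equation))' ~/articles
```

The negative lookahead (?!...) is what rejects "quadratic function" while keeping "quadratic formula". Because grep works line by line, a phrase that pdftotext splits across two lines will slip past this check, so the results are a starting point and not a guarantee.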
