I’m looking for a command-line program that will print out the text of a PDF file, just like cat for a text file.
I’ve found pdftotext, and that would be workable, but I’d prefer something that replicates the cat functionality because I want to pipe the output to grep. Thanks!
Answer
In the man page for pdftotext, I found this:
pdftotext [options] [PDF-file [text-file]]
Description: Pdftotext converts Portable Document Format (PDF) files to plain text.
Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.
So, to send the text to stdout and pipe it to grep, pass - as the text-file argument:
pdftotext mydoc.pdf - | grep mysearchterm
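If you want something that feels closer to cat, including several files in one call, the command above could be wrapped in a small shell function. This is only a minimal sketch, assuming a POSIX-style shell and that pdftotext is on your PATH; the name pdfcat is made up for illustration and is not part of any package.

```shell
# pdfcat: hypothetical helper name, not shipped with pdftotext.
# Prints the text of each PDF argument to stdout, the way cat does for text files.
pdfcat() {
    for f in "$@"; do
        # The trailing "-" tells pdftotext to write to stdout instead of a .txt file.
        # Stop at the first file that fails to convert.
        pdftotext "$f" - || return 1
    done
}
```

With that defined (for example in your shell startup file), pdfcat mydoc.pdf other.pdf | grep -i mysearchterm should search both files, ignoring case. Because the text from all files is concatenated, grep cannot tell you which file a match came from.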