I have to convert some PDF files to TXT. I ended up using the “less” command, because pdftotext, for example, has some problems with tables in PDFs. The problem is that when I run the command from PHP's exec function (or shell_exec/system), less just reports that the selected PDF may be a binary file, and the resulting file is just a TXT file with the raw PDF data in it. When I run the same command in a terminal, everything is OK. I also tried logging in as the www-data user and running the command as that user, and there was no problem there either.
Command:
$ less /var/www/original.pdf > /var/www/new.txt
PHP code:
exec("less -f /var/www/original.pdf > /var/www/new.txt 2>&1");
Result from PHP exec:
"/var/www/original.pdf" may be a binary file. See it anyway?
The “-f” option in the exec call is there so that you don’t need to press “y” to answer “yes, I want to see it anyway.”
Output of set | grep less in the terminal:
LESSCLOSE='/usr/bin/lesspipe %s %s'
LESSOPEN='| /usr/bin/lesspipe %s'
(The remaining matches, such as _apport_parameterless, come from bash completion functions and are unrelated.)
Answer
From what I read, your console is able to display a PDF file with less because you have an input preprocessor installed, such as lesspipe or lessfile. less finds that preprocessor by reading an environment variable called LESSOPEN, which points to the lesspipe or lessfile script.
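To check whether LESSOPEN is actually set for the commands PHP starts, a small diagnostic script can print the variables less looks at. This is only a sketch; the script name check-less-env.sh and the function report_less_env are hypothetical.

```shell
#!/bin/sh
# Hypothetical check-less-env.sh: print the less preprocessor variables
# that are visible to whatever process runs this script.
report_less_env() {
    printf 'LESSOPEN=%s\n' "${LESSOPEN:-<unset>}"
    printf 'LESSCLOSE=%s\n' "${LESSCLOSE:-<unset>}"
}

report_less_env
```

Run it once from the terminal and once through exec("/path/to/check-less-env.sh", $out) in PHP, then compare the two outputs. Running env -i /bin/sh /path/to/check-less-env.sh shows what a completely empty environment looks like.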
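In your terminal, LESSOPEN is typically set by an interactive shell startup file such as ~/.bashrc (the Debian and Ubuntu default runs lesspipe there). PHP's exec() runs the command through /bin/sh, which does not read ~/.bashrc, so the variable is most likely missing and less shows the raw PDF instead. You might be able to make your web server replicate this behavior through environment variables and shell commands, so that your calls to less parse PDFs properly.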
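One way to replicate it might be to set LESSOPEN only for the less command itself, reusing the value that set | grep less printed in the question. A minimal sketch, assuming /usr/bin/lesspipe exists on the web server as it does in that output; pdf_to_txt_inline is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: set LESSOPEN for this single less invocation only.
# Assumes /usr/bin/lesspipe exists (as in the question's `set` output).
# With a "|" (input pipe) LESSOPEN, LESSCLOSE is usually unnecessary.
pdf_to_txt_inline() {
    LESSOPEN='| /usr/bin/lesspipe %s' less -f "$1" > "$2"
}

# Convert only when given exactly two arguments: input PDF, output TXT.
if [ "$#" -eq 2 ]; then
    pdf_to_txt_inline "$1" "$2"
fi
```

The same single line, LESSOPEN='| /usr/bin/lesspipe %s' less -f /var/www/original.pdf > /var/www/new.txt, could also be passed to exec() as is, without a separate script file.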
What I would suggest is calling a bash script that does the conversion for you instead of calling less directly. That way, the script can set the appropriate environment variables and run the appropriate commands to convert your PDF files to readable output.
Here’s an example of how to do this:
#!/bin/bash
# Set LESSOPEN and LESSCLOSE so that less passes the PDF through lesspipe
eval "$(lesspipe)"
less "$1" > "$2" 2>&1
Then, from PHP, call that script like this:
exec("/path/to/your/script/script.sh /var/www/original.pdf /var/www/new.txt");
If it doesn’t work, try replacing lesspipe with lessfile in the eval line.
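A slightly more defensive variant of the script, sketched under the same assumptions (a Debian or Ubuntu system with lesspipe and/or lessfile on PATH), checks its arguments, falls back to lessfile automatically, and keeps error messages out of the text file. The file name convert-pdf.sh and the function pdf_to_txt are hypothetical:

```shell
#!/bin/sh
# Hypothetical convert-pdf.sh: defensive variant of the wrapper script.
# Assumes lesspipe and/or lessfile are on PATH (Debian/Ubuntu "less" package).

pdf_to_txt() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pdf_to_txt input.pdf output.txt" >&2
        return 2
    fi
    # Prefer lesspipe and fall back to lessfile. SHELL=/bin/sh makes the
    # helper print sh-style export lines, as Debian's default ~/.bashrc does.
    if command -v lesspipe >/dev/null 2>&1; then
        eval "$(SHELL=/bin/sh lesspipe)"
    elif command -v lessfile >/dev/null 2>&1; then
        eval "$(SHELL=/bin/sh lessfile)"
    else
        echo "neither lesspipe nor lessfile found" >&2
        return 1
    fi
    # -f suppresses the "may be a binary file" prompt; errors go to stderr
    # instead of being mixed into the text file.
    less -f "$1" > "$2"
}

# Convert only when called with arguments, e.g. from PHP's exec().
if [ "$#" -gt 0 ]; then
    pdf_to_txt "$@"
fi
```

Because errors now go to stderr, append 2>&1 inside the exec() string if you want PHP to capture them in its output array, or check the script's exit status through exec()'s third argument.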