
Check the number of pages in a PDF with PHP (on Linux)

I have a web page where users can upload files to their account folder (PDF and JPG files only). I want to count the number of pages in each uploaded PDF so I can show it to the user.

To do this, I was using pdfinfo, a Linux command-line tool from the Xpdf project. This is its man page: http://linuxcommand.org/man_pages/pdfinfo1.html

You can download a .zip with the binaries here: http://www.foolabs.com/xpdf/download.html

My code (it worked perfectly until yesterday, when it started failing):

function getNumPagesInPDF($document){
    if (!file_exists($document)) return null;
    $cmd = "pdfinfo";
    // Run pdfinfo on the document; each line it prints goes into $output
    $output = [];
    exec($cmd." '".$document."'", $output);

    // Scan the output lines
    $pagecount = 0;
    foreach ($output as $op) {
        // Extract the number of pages from the "Pages:" line
        if (preg_match('/Pages:\s*(\d+)/i', $op, $matches) === 1) {
            $pagecount = intval($matches[1]);
            break;
        }
    }
    return $pagecount;
}
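A minimal sketch that separates parsing pdfinfo's "Pages:" line from the exec() call, so the parsing can be checked against sample output without running pdfinfo at all. The function name parsePdfinfoPages is hypothetical and the sample lines are illustrative, shaped like pdfinfo's "Key: value" output. It returns null rather than 0 when no "Pages:" line is found, so an empty $output is not mistaken for a zero-page PDF.

```php
<?php
// Hypothetical helper: extract the page count from pdfinfo's output lines.
// Returns null when no "Pages:" line is present (e.g. exec() produced no output).
function parsePdfinfoPages(array $lines): ?int
{
    foreach ($lines as $line) {
        // pdfinfo prints lines such as "Pages:          12"
        if (preg_match('/^Pages:\s*(\d+)/i', $line, $m) === 1) {
            return (int) $m[1];
        }
    }
    return null;
}

// Illustrative sample, not captured from a real pdfinfo run.
$sample = ["Producer:       Example", "Pages:          12", "Encrypted:      no"];
var_dump(parsePdfinfoPages($sample)); // int(12)
var_dump(parsePdfinfoPages([]));      // NULL
```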

I can run the command over SSH and it works on the server. But the code no longer works from PHP, even though nothing in the code has changed.

A small addition: I checked that exec() is enabled in my PHP using:

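 function exec_enabled() {
   // disable_functions is a comma-separated list; trim each entry before comparing
   $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
   return function_exists('exec') && !in_array('exec', $disabled, true);
 }
 if (exec_enabled()) {
   echo "exec works";
 }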

Another addition: PHP doesn't report any error related to this, and I have error logging enabled to a log file (including warnings). My host recently enabled mod_security.

TASK1: Check the $document variable: the path is correct, relative to the location of the PHP file. Both the path and the file exist.

TASK2: Check whether $output contains anything: no, the $output array is empty! I can't understand why.

TASK3: Check the command string $cmd." '".$document."'": it looks correct, and pasting the resulting command into SSH works. I'm lost.
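An empty $output is consistent with the command failing: exec() only collects standard output, and a command that fails inside the shell does not raise a PHP warning. Its standard error is inherited from the web server process (under Apache it typically lands in the server's error log, not PHP's). A minimal diagnostic sketch, where the helper name runAndReport and the file name example.pdf are hypothetical, that captures standard error as well and returns the exit status:

```php
<?php
// Hypothetical diagnostic helper: append 2>&1 so stderr is collected along with
// stdout, and use exec()'s third argument to receive the command's exit status.
function runAndReport(string $command): array
{
    $output = [];
    $status = 0;
    exec($command . ' 2>&1', $output, $status);
    return ['status' => $status, 'output' => $output];
}

// 'example.pdf' is a placeholder for one of the uploaded files.
$result = runAndReport('pdfinfo ' . escapeshellarg('example.pdf'));
// From /bin/sh, exit status 127 usually means "command not found".
echo "exit status: {$result['status']}\n";
echo implode("\n", $result['output']), "\n";
```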


Answer

As we established in the comments, running a binary by its bare filename does not always work. This is as true on the console as it is inside a PHP call such as exec().

When you run pdfinfo in either environment, the system searches the directories listed in the PATH environment variable to find it. This variable is nearly always different between your user account and the Apache environment, which is why it is important to always specify the fully qualified path when running a binary programmatically.
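To see this difference on your own server, a minimal diagnostic sketch can print the PATH that PHP passes to child processes and ask /bin/sh (the shell exec() uses on Linux) where, if anywhere, it finds pdfinfo. Compare the result with the output of echo $PATH and command -v pdfinfo in an SSH session. The helper name findOnPath is hypothetical.

```php
<?php
// Hypothetical helper: ask /bin/sh where it finds $binary on the PATH
// inherited by this PHP process. Returns null if the binary is not found.
function findOnPath(string $binary): ?string
{
    $out = [];
    $status = 0;
    exec('command -v ' . escapeshellarg($binary) . ' 2>/dev/null', $out, $status);
    return ($status === 0 && isset($out[0])) ? $out[0] : null;
}

$path = getenv('PATH');
echo 'PATH seen by PHP: ', ($path === false ? '(not set)' : $path), "\n";
echo 'pdfinfo resolves to: ', findOnPath('pdfinfo') ?? '(not found)', "\n";
```

Running this once from the browser and once with php on the command line shows whether the two environments resolve pdfinfo differently.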

As far as I know, exec() does not treat the folder containing the current PHP script as the current working directory. Even if it did, the current directory (.) would need to be in the Apache user's PATH for the binary to be found there. So I am not sure why this used to work for you, but it emphasises the importance of the lesson above: always use the full path.
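Applying that advice, here is a minimal sketch of the question's function that calls pdfinfo by its absolute path, quotes arguments with escapeshellarg(), captures standard error and checks the exit status. The default path /usr/bin/pdfinfo is an assumption (distribution packages commonly install pdfinfo there, while the Xpdf binaries may live elsewhere, e.g. /usr/local/bin); run which pdfinfo over SSH to find the real location.

```php
<?php
// Sketch of the question's function, calling pdfinfo by absolute path.
// ASSUMPTION: /usr/bin/pdfinfo is only an example default; check the real
// location on your server with `which pdfinfo`.
function getNumPagesInPDF(string $document, string $pdfinfo = '/usr/bin/pdfinfo'): ?int
{
    // Missing input file, or missing/non-executable binary: report failure.
    if (!is_file($document) || !is_executable($pdfinfo)) {
        return null;
    }

    $output = [];
    $status = 0;
    // escapeshellarg() quotes both paths; 2>&1 captures pdfinfo's error messages too.
    exec(escapeshellarg($pdfinfo) . ' ' . escapeshellarg($document) . ' 2>&1', $output, $status);
    if ($status !== 0) {
        return null; // pdfinfo failed; $output holds its error message
    }

    // pdfinfo prints one "Key: value" pair per line, e.g. "Pages:          12".
    foreach ($output as $line) {
        if (preg_match('/^Pages:\s*(\d+)/i', $line, $m) === 1) {
            return (int) $m[1];
        }
    }
    return null;
}

// Usage (the document path is a placeholder):
$pages = getNumPagesInPDF('uploads/example.pdf');
echo $pages === null ? "could not read page count\n" : "pages: $pages\n";
```

Returning null on any failure, rather than 0, keeps a broken pdfinfo call from looking like an empty PDF.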

You should also read the path from a settings file rather than hard-coding it. This will help as you move between the local, test, staging and live environments of your app, which may store this binary in different locations.
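One way to follow that advice, sketched under assumptions: keep one INI file per environment and read it with PHP's parse_ini_file(). The key name pdfinfo_path, the helper loadPdfinfoPath and the file layout are hypothetical; use whatever configuration scheme your app already has.

```php
<?php
// Hypothetical helper: read the pdfinfo location from an INI settings file.
// Example file contents (one file per environment):
//   pdfinfo_path = "/usr/local/bin/pdfinfo"
function loadPdfinfoPath(string $iniFile): string
{
    $settings = parse_ini_file($iniFile);
    if ($settings === false || !isset($settings['pdfinfo_path'])) {
        throw new RuntimeException("pdfinfo_path not set in $iniFile");
    }
    return $settings['pdfinfo_path'];
}
```

Failing loudly when the key is missing surfaces a misconfigured environment immediately, instead of silently falling back to a bare "pdfinfo" that depends on PATH.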

User contributions licensed under: CC BY-SA