Why do the command and its arguments have to be in a list for subprocess.Popen?

I tried doing

import subprocess
p = subprocess.Popen("ls -la /etc", stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p.stdout.read().decode()

Which gives me

FileNotFoundError: [Errno 2] No such file or directory: 'ls -la /etc': 'ls -la /etc'

Following the question "Python subprocess.Popen with var/args", I did

import subprocess
p = subprocess.Popen(["ls", "-la", "/etc"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p.stdout.read().decode()

Which did work.

Why is that? Why do I have to split my command and its arguments? What’s the rationale behind this design?


Python version:

3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]

Answer

That’s how all process invocations work on UNIX.

Under the hood, running a program on UNIX is traditionally done with the following steps:

  1. fork() off a child process.
  2. In that child process, open new copies of stdin, stdout, stderr, etc. if redirections are requested, using the dup2() call to place the newly opened files over the file descriptors that are the redirection targets.
  3. In that child process, use the execve() syscall to replace the program running in the current process with the desired program. This syscall takes an array of arguments, not a single string.
  4. wait() for the child to exit, if the call is meant to be blocking.
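
The four steps above can be sketched with Python's os module. This is a minimal illustration for POSIX systems, not how subprocess is actually implemented: the name run_and_capture is hypothetical, only stdout is redirected, and os.execvp (which searches PATH before calling into the exec family) stands in for a direct execve() call.

```python
import os


def run_and_capture(argv):
    """Run argv (a list of strings); return (exit_code, stdout_bytes).

    Hypothetical helper for illustration only; POSIX-only.
    """
    read_fd, write_fd = os.pipe()

    pid = os.fork()                      # step 1: fork off a child
    if pid == 0:
        # Child process.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)         # step 2: pipe replaces stdout (fd 1)
            os.close(write_fd)
            os.execvp(argv[0], argv)     # step 3: takes a list, not a string
        finally:
            # Only reached if the exec call failed (e.g. program not found).
            os._exit(127)

    # Parent process: read everything the child writes, then reap it.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)

    _, status = os.waitpid(pid, 0)       # step 4: wait for the child
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status), b"".join(chunks)
    return -os.WTERMSIG(status), b"".join(chunks)


if __name__ == "__main__":
    code, output = run_and_capture(["ls", "-la", "/etc"])
    print(code)
    print(output.decode())
```

Note that the exec call never sees a command line at all: each element of the list arrives in the new program as one entry of its argument vector.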

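So, subprocess.Popen exposes the array interface, because an array of arguments is what the operating system actually accepts. On POSIX, a plain string passed without shell=True is treated as the name of the program to run, which is why the first attempt failed with FileNotFoundError: there is no program called 'ls -la /etc'.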
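
If the command starts out as a single string, the standard library's shlex.split can turn it into a list using shell-like quoting rules, without involving a shell. A small sketch, assuming a POSIX system where ls is on PATH (the variable names are illustrative):

```python
import shlex
import subprocess

command = "ls -la /etc"

# Split the string the way a POSIX shell would tokenize it.
argv = shlex.split(command)
print(argv)  # ['ls', '-la', '/etc']

# Pass the resulting list straight to subprocess; no shell is involved.
result = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
listing = result.stdout.decode()
print(listing)
```

shlex.split only tokenizes; it does not expand variables, globs, or command substitutions, so what you see in the list is exactly what the program receives.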

When you run ls /tmp at a shell, the shell transforms the string into an array and then performs the steps above itself. Doing the transformation yourself gives you more control and avoids serious bugs: if someone creates a file named /tmp/$(rm -rf ~), you don’t want an attempt to cat /tmp/$(rm -rf ~) to delete your home directory.
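
The difference can be seen with a harmless stand-in for the hostile file name. In this sketch, $(echo INJECTED) plays the role of $(rm -rf ~), and echo plays the role of cat; it assumes a POSIX system with /bin/sh and an echo program on PATH.

```python
import shlex
import subprocess

# Harmless stand-in for a hostile name such as /tmp/$(rm -rf ~).
hostile_name = "/tmp/$(echo INJECTED)"

# List form: the name reaches the program verbatim as one argument.
literal = subprocess.run(
    ["echo", hostile_name], stdout=subprocess.PIPE
).stdout.decode()

# Shell string: the shell performs the command substitution first.
expanded = subprocess.run(
    "echo " + hostile_name, shell=True, stdout=subprocess.PIPE
).stdout.decode()

# If a shell string is unavoidable, shlex.quote neutralizes the name.
quoted = subprocess.run(
    "echo " + shlex.quote(hostile_name), shell=True, stdout=subprocess.PIPE
).stdout.decode()

print(repr(literal))   # '/tmp/$(echo INJECTED)\n'
print(repr(expanded))  # '/tmp/INJECTED\n'
print(repr(quoted))    # '/tmp/$(echo INJECTED)\n'
```

With the list form there is no parsing step for an attacker to exploit, which is why it is the safer default whenever any part of the command comes from outside the program.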

User contributions licensed under: CC BY-SA