If everything is “just” a file in Linux, how do files/nodes in /dev differ from other files such that Docker must handle them differently?
What does Docker do differently for device files? I expected --device to be shorthand for a more verbose bind-mount command.
However, with a regular bind mount of a device file, such as --volume /dev/spidev0.0:/dev/spidev0.0, the user gets “permission denied” inside the Docker container when trying to access the device. When binding via --device /dev/spidev0.0:/dev/spidev0.0, it works as expected.
Answer
The Docker run reference page has a link to Linux kernel documentation on the cgroup device whitelist controller. In several ways, a process running as root in a container is a little bit more limited than the same process running as root on the host: without special additional permissions (capabilities), you can’t reboot the host, mount filesystems, create virtual NICs, or any of a variety of other system-administration tasks. The device system is separate from the capability system, but it’s in the same spirit.
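To make the whitelist idea concrete, here is a minimal Python sketch of how an entry in the cgroup v1 format (type, major:minor, access letters r/w/m) can be matched against a request. It is only an illustration: the function names and the rule list are made up for this example, the kernel does the real check, and on cgroup v2 the same policy is enforced differently. The 153:0 numbers for /dev/spidev0.0 are an assumption; check them with ls -l on your system.

```python
from typing import Iterable


def rule_allows(rule: str, dev_type: str, major: int, minor: int, access: str) -> bool:
    """Return True if one entry such as 'c 1:3 rwm' permits the request.

    dev_type is 'c' (character) or 'b' (block); access is a subset of
    'rwm' (read, write, mknod). Simplified model, not the kernel code.
    """
    rule_type, numbers, perms = rule.split()
    # 'a' in a rule means "all device types".
    if rule_type not in ("a", dev_type):
        return False
    rule_major, rule_minor = numbers.split(":")
    # '*' is a wildcard for the major or minor number.
    if rule_major != "*" and int(rule_major) != major:
        return False
    if rule_minor != "*" and int(rule_minor) != minor:
        return False
    # Every requested access letter must be granted by this entry.
    return set(access) <= set(perms)


def is_allowed(rules: Iterable[str], dev_type: str, major: int, minor: int, access: str) -> bool:
    """Allow the request if any single entry permits it."""
    return any(rule_allows(r, dev_type, major, minor, access) for r in rules)


# Illustrative rules only, not Docker's actual default list.
EXAMPLE_RULES = [
    "c 1:3 rwm",  # /dev/null
    "c 1:5 rwm",  # /dev/zero
    "c *:* m",    # creating character device nodes is fine, opening them is not
]

if __name__ == "__main__":
    # Assumed numbers for /dev/spidev0.0 (hypothetical for this sketch).
    print(is_allowed(EXAMPLE_RULES, "c", 153, 0, "rw"))
    print(is_allowed(EXAMPLE_RULES + ["c 153:0 rwm"], "c", 153, 0, "rw"))
```

In this model the first call is denied and the second is allowed once an entry for the device exists. That matches the behaviour in the question: a bind mount makes the node visible, but without a matching entry, opening it still fails.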
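The other way to think about this is as a security feature. A container shouldn’t usually be able to access the host’s filesystem or other processes, even if it’s running as root. But if the container process can mknod kmem c 1 2 and access kernel memory, or mknod sda b 8 0, guessing that the host’s hard drive looks like a SCSI disk, it could in theory escape these limitations by directly accessing low-level resources. The cgroup device limit protects against this. That is also why the plain bind mount fails: it only makes the device node visible inside the container, while the container’s device cgroup still refuses to open it. --device additionally adds the device to the container’s allowed list.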
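The mknod arguments above are a name, a type (c for character, b for block) and a major and minor number. Those are what set a device node apart from a regular file: the node holds no data of its own, only a type and a number pair that tell the kernel which driver to use. A small Python sketch, using only the standard library, that prints this for any path (the helper name describe_node is made up for this example):

```python
import os
import stat


def describe_node(path: str) -> str:
    """Report whether path is a device node and, if so, its type and numbers."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"  # character device, e.g. /dev/null or an SPI device
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"  # block device, e.g. a disk
    else:
        return f"{path}: not a device node"
    # st_rdev encodes the major and minor numbers of the device.
    return f"{path}: {kind} {os.major(st.st_rdev)}:{os.minor(st.st_rdev)}"


if __name__ == "__main__":
    print(describe_node("/dev/null"))  # a character device
    print(describe_node("/"))          # a directory, so not a device node
```

Running this against /dev/spidev0.0 on the host and inside the container should report the same type and numbers either way, which suggests that the path is not what the kernel checks when the container opens the device.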
Since Docker is intended as an isolation system where containers are restricted environments that can’t access host resources, it can be inconvenient at best to run tasks that need physical devices or host files. If Docker’s isolation features don’t make sense, then the process might run better directly on the host, without involving Docker.