I’m working on a multithreaded embedded application in which one of the threads uses epoll for I/O. I’m relying on a particular feature of epoll: closing a file descriptor automatically removes it from the epoll set (Q&A 6 in man 7 epoll). In this case, the file descriptor is closed in the same thread that calls epoll_wait. What ends up happening is that epoll_wait returns an event on a file descriptor after it has been closed, and the program crashes because it tries to access resources that were deallocated when the file descriptor was closed.
As far as I know, the file descriptor is not duplicated anywhere, though I do not know how to verify this. I know for a fact that there are no calls to fork(), dup(), dup2(), or fcntl() with a duplicating command such as F_DUPFD. This file descriptor is registered with EPOLLOUT, EPOLLIN, EPOLLERR, and EPOLLHUP, and it is level-triggered.
Are there any caveats to this feature that anybody knows about? Is the man page incorrect? Is there any information that could help me debug this further? I know I could simply remove the file descriptor from the set, but I would like to understand why this is happening.
Answer
Closing a file descriptor does not seem to remove it from the epoll set. I tried it with a very simple example on a 3.12.2 kernel. I’m inclined to call the man page wrong or inaccurate.
What I did in a test was:
- created a TCP socket
- bound it to localhost:5555
- put it in listening mode
- created an epoll instance
- added the socket to it with EPOLLHUP, EPOLLERR and EPOLLIN
- slept a bit so I could optionally connect to it with nc
- closed the socket
- called epoll_wait
- called epoll_ctl with EPOLL_CTL_DEL
- cleaned up
epoll_wait returns without error even though the socket has already been closed, whether or not I had connected to it.
Edit: The epoll_ctl call with EPOLL_CTL_DEL did fail once the socket had been closed. After reading the current man pages, it seems they are actually correct. The epoll page points to select(2) regarding closing a file descriptor that is being monitored, and that page says the behaviour is unspecified.