I have two processes (a producer and a consumer) communicating via a shared memory segment produced using the ‘old’ interface rather than mmap:
auto key = ftok(<somefile>, <someid>); int shmId = shmget(key, size, flags); void* memArea = shmat(shmId, NULL, 0); // check errors and do stuff...
The producer process could be restarted due to an error or configuration change. It creates a new region each time using the IPC_CREAT flag to shmget(). I have noticed that the consumer can continue to read from the existing shared memory segment while the replacement producer has moved on to a different one.
How can a consumer process detect and recover from this?
Answer
It might be a better idea to alter your producer design:
Instead of using IPC_CREAT it could first check if there is an existing segment that could be re-used.
You could also consider using mmap-based shared memory instead, which is more flexible in some ways.
You could use some other indicator such as a lock file to determine if the shared memory interface is still viable.
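As a rough sketch of the first suggestion, the producer could try to open an existing segment before creating one, and learn which of the two happened. The helper name openOrCreateShm and its parameters are hypothetical, not part of the original producer. Note that a plain IPC_CREAT without IPC_EXCL already returns an existing segment for the same key, so this only helps when the key is stable and the old segment has not been removed.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: re-use the segment for `key` if it exists, otherwise
// create it. `*created` tells the caller whether it must initialise the memory.
// Returns the segment id, or -1 on error (errno is left set by shmget).
int openOrCreateShm(key_t key, std::size_t size, int perms, bool* created)
{
    for (int attempt = 0; attempt < 2; ++attempt) {
        // Without IPC_CREAT this only succeeds for an existing segment.
        int id = shmget(key, size, perms);
        if (id != -1) { *created = false; return id; }
        if (errno != ENOENT) return -1;

        // IPC_EXCL makes creation fail if someone else created it meanwhile.
        id = shmget(key, size, perms | IPC_CREAT | IPC_EXCL);
        if (id != -1) { *created = true; return id; }
        if (errno != EEXIST) return -1;
        // Lost a creation race: loop once more and open the existing segment.
    }
    return -1;
}
```

A producer that gets created == false can decide whether to keep the existing contents or reinitialise them in place, so attached consumers never end up on an orphaned segment.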
However, if for some reason these are not options (someone else controls the producer code for example) then read on.
There are several things you can do:
- use shmctl() to ‘stat’ your memory segment
// return true if the shared memory region is still 'useful/usable' bool checkShm(int shmId) { struct shmid_ds statBuf; int res = shmctl(shmId, IPC_STAT, &statBuf); if (res == -1) return false; ...
- check if the region is marked for deletion (Linux specific)
if ((statBuf.shm_perm.mode&SHM_DEST) != 0) return false;
- assuming you attached after the producer, and the producer is the creator process: check whether it detached after you attached. caveat: It could have reattached again if your design allows this.
if (statBuf.shm_cpid == statBuf.shm_lpid) return false;
- check the PID of the creator process is a running process. caveat: the PID could be recycled by a new process
if (getpgid(statBuf.shm_cpid) == -1) return false;
note: you could use kill(statBuf.shm_cpid, 0) instead if the producer does not run as a different user.
- You might also want to check whether the file passed to ftok has been replaced. A key point is that ftok derives the key from the file's inode number, not from the filename as the man page seems to suggest. So you need to be careful using it:
struct stat fstatBuf; int res = stat(fileName,&fstatBuf); if (res == -1) return false; // if the file has disappeared it could be a bad sign! if (fstatBuf.st_ino != savedInode) return false;
Having done all this you should now have a reasonably good way to check if the SHM you think is still useful is actually being used by the ‘producer’ you think it is.
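Put together, the checks above might look like the sketch below. It is an assembly of the fragments under the same assumptions (the consumer attached after the producer, and the producer is the creator); the extra parameters keyFile and savedInode are illustrative names for the ftok file and the inode number recorded when the consumer first attached.

```cpp
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/stat.h>
#include <unistd.h>

// Return true if the shared memory region still looks usable.
// keyFile and savedInode are assumptions of this sketch: the file given to
// ftok() and its inode number as seen when the consumer first attached.
bool checkShm(int shmId, const char* keyFile, ino_t savedInode)
{
    struct shmid_ds statBuf;
    if (shmctl(shmId, IPC_STAT, &statBuf) == -1)
        return false;                       // segment gone or not readable

#ifdef SHM_DEST
    // Linux specific: segment has been marked for deletion.
    if ((statBuf.shm_perm.mode & SHM_DEST) != 0)
        return false;
#endif

    // Last shmat()/shmdt() was done by the creator, i.e. after we attached.
    if (statBuf.shm_cpid == statBuf.shm_lpid)
        return false;

    // Creator no longer exists (its PID could still have been recycled).
    if (getpgid(statBuf.shm_cpid) == -1)
        return false;

    // The ftok file has disappeared or been replaced by a new inode.
    struct stat fstatBuf;
    if (stat(keyFile, &fstatBuf) == -1)
        return false;
    if (fstatBuf.st_ino != savedInode)
        return false;

    return true;
}
```

A consumer could call this periodically, for example before each read cycle, and switch to its recovery path when it returns false.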
- Clean up the stale shared memory segment
You are now free to detach from the segment with shmdt() and try to clean it up with shmctl(shmid, IPC_RMID, NULL). The consumer process might not have permission to remove it if the creator did not grant it.
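A minimal sketch of that cleanup step follows. The function name releaseStaleShm and the Cleanup result type are hypothetical; the point is only that a failed removal (no permission, or the segment was already removed) is treated as non-fatal once the detach has succeeded.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>

// Hypothetical result type for this sketch.
enum class Cleanup { DetachFailed, DetachedOnly, DetachedAndRemoved };

// Detach from a stale segment and make a best-effort attempt to remove it.
Cleanup releaseStaleShm(int shmId, const void* memArea)
{
    if (shmdt(memArea) == -1)
        return Cleanup::DetachFailed;       // memArea was not an attach address

    if (shmctl(shmId, IPC_RMID, nullptr) == 0)
        return Cleanup::DetachedAndRemoved;

    // Removal failed, e.g. EPERM (not owner or creator) or the segment has
    // already been removed by someone else. We are detached either way.
    return Cleanup::DetachedOnly;
}
```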
- Attach to the replacement shared memory segment
You are then in principle able to attach to any new shared memory segment created by a replacement producer process:
auto key = ftok(<somefile>, <someid>); int shmId = shmget(key, size, flags); void* memArea = shmat(shmId, NULL, 0); // check errors and do stuff...
But a cruel and interesting punishment awaits you: it will not work immediately. You have to wait a while and retry periodically. I guess this is until the operating system has had a chance to clean up the old memory segment.
I found that ftok() returns -1 for a while despite the file existing and having the same inode as the original file.
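One way to handle the waiting is a bounded retry loop, sketched below. The helper attachWithRetry, its parameters, and the choice of usleep for the delay are all assumptions of this example; it also skips the id of the stale segment, in case the old key still resolves to it for a while.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

// Hypothetical helper: poll until a segment other than staleShmId can be
// attached, or give up after maxAttempts. Returns the attach address and
// stores the new id in *shmIdOut, or returns nullptr on failure.
// delayMs should stay below 1000, since usleep may reject longer delays.
void* attachWithRetry(const char* keyFile, int projId, int staleShmId,
                      int maxAttempts, unsigned delayMs, int* shmIdOut)
{
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (attempt > 0)
            usleep(delayMs * 1000u);        // wait before each retry

        key_t key = ftok(keyFile, projId);
        if (key == (key_t)-1)
            continue;                       // key file not usable yet

        int shmId = shmget(key, 0, 0);      // open only, never create
        if (shmId == -1 || shmId == staleShmId)
            continue;                       // nothing new to attach to yet

        void* memArea = shmat(shmId, nullptr, 0);
        if (memArea == (void*)-1)
            continue;

        *shmIdOut = shmId;
        return memArea;
    }
    return nullptr;
}
```

For example, attachWithRetry(fileName, someId, oldShmId, 50, 100, &newShmId) would poll for roughly five seconds before giving up; suitable values depend on how long the producer takes to restart.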