I want to correlate an IP packet (captured using libpcap) to a process. I have had some limited success using the relevant /proc/net/ files, but on some of the machines I’m using these files can be many thousands of lines long and parsing them is not efficient (caching has alleviated some of the performance problems).
I read that the sock_diag netlink subsystem could help by letting me query the kernel directly about the socket I am interested in. I’ve had limited success with my attempts but have hit a mental block on what is wrong.
For the initial query I have:
    if (query_fd_) {
        struct {
            nlmsghdr nlh;
            inet_diag_req_v2 id_req;
        } req = {
            .nlh = {
                .nlmsg_len = sizeof(req),
                .nlmsg_type = SOCK_DIAG_BY_FAMILY,
                .nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP
            },
            .id_req = {
                .sdiag_family = packet.l3_protocol,
                .sdiag_protocol = packet.l4_protocol,
                .idiag_ext = 0,
                .pad = 0,
                .idiag_states = -1,
                .id = {
                    .idiag_sport = packet.src_port,
                    .idiag_dport = packet.dst_port
                }
            }
        };

        //packet ips are just binary data stored as strings!
        memcpy(req.id_req.id.idiag_src, packet.src_ip.c_str(), 4);
        memcpy(req.id_req.id.idiag_dst, packet.dst_ip.c_str(), 4);

        struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
        struct iovec iov = { .iov_base = &req, .iov_len = sizeof(req) };
        struct msghdr msg = {
            .msg_name = (void *) &nladdr,
            .msg_namelen = sizeof(nladdr),
            .msg_iov = &iov,
            .msg_iovlen = 1
        };

        // Send message to kernel
        for (;;) {
            if (sendmsg(query_fd_, &msg, 0) < 0) {
                if (errno == EINTR)
                    continue;
                perror("sendmsg");
                return false;
            }
            return true;
        }
    }
    return false;
For the receive code I have:
    long buffer[8192];
    struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
    struct iovec iov = { .iov_base = buffer, .iov_len = sizeof(buffer) };
    struct msghdr msg = {
        .msg_name = (void *) &nladdr,
        .msg_namelen = sizeof(nladdr),
        .msg_iov = &iov,
        .msg_iovlen = 1
    };
    int flags = 0;

    for (;;) {
        ssize_t rv = recvmsg(query_fd_, &msg, flags);

        // error handling
        if (rv < 0) {
            if (errno == EINTR)
                continue;
            if ((errno == EAGAIN) || (errno == EWOULDBLOCK))
                break;
            perror("Failed to recv from netlink socket");
            return 0;
        }
        if (rv == 0) {
            printf("Unexpected shutdown of NETLINK socket");
            return 0;
        }

        for (const struct nlmsghdr* header = reinterpret_cast<const struct nlmsghdr*>(buffer);
             rv >= 0 && NLMSG_OK(header, static_cast<uint32_t>(rv));
             header = NLMSG_NEXT(header, rv)) {
            // The end of multipart message
            if (header->nlmsg_type == NLMSG_DONE)
                return 0;

            if (header->nlmsg_type == NLMSG_ERROR) {
                const struct nlmsgerr *err = reinterpret_cast<nlmsgerr*>(NLMSG_DATA(header));
                if (err == NULL)
                    return 100;
                errno = -err->error;
                perror("NLMSG_ERROR");
                return 0;
            }

            if (header->nlmsg_type != SOCK_DIAG_BY_FAMILY) {
                printf("unexpected nlmsg_type %u\n", (unsigned)header->nlmsg_type);
                continue;
            }

            // Get the details....
            const struct inet_diag_msg* diag = reinterpret_cast<inet_diag_msg*>(NLMSG_DATA(header));
            if (header->nlmsg_len < NLMSG_LENGTH(sizeof(*diag))) {
                printf("Message too short %d vs %d\n", header->nlmsg_len, NLMSG_LENGTH(sizeof(*diag)));
                return 0;
            }
            if (diag->idiag_family != PF_INET) {
                printf("unexpected family %u\n", diag->idiag_family);
                return 1;
            }
            return diag->idiag_inode;
        }
    }
The Problem:
The diag->idiag_inode value doesn’t match the one I see in netstat output or in the /proc/net/ files. Is it supposed to? If not, is it possible to use this approach to retrieve the inode number of the socket so that I can then query /proc for the corresponding PID?
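For context, the inode-to-PID step mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming Linux with /proc mounted and enough permission to read the target process's fd directory; `find_pid_for_socket_inode` and `inode_of_fd` are hypothetical helper names, not part of any library. It looks for a file descriptor whose symlink target is `socket:[<inode>]`.

```cpp
#include <dirent.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the first PID that holds an fd pointing at
// "socket:[inode]", or -1 if none is found (or /proc cannot be read).
inline pid_t find_pid_for_socket_inode(unsigned long inode) {
    const std::string target = "socket:[" + std::to_string(inode) + "]";
    DIR* proc = opendir("/proc");
    if (!proc) return -1;
    pid_t found = -1;
    while (found < 0) {
        dirent* p = readdir(proc);
        if (!p) break;  // no more entries
        // Only numeric directory names are processes.
        if (!std::isdigit(static_cast<unsigned char>(p->d_name[0]))) continue;
        const std::string fd_dir = std::string("/proc/") + p->d_name + "/fd";
        DIR* fds = opendir(fd_dir.c_str());
        if (!fds) continue;  // process exited or permission denied
        while (dirent* f = readdir(fds)) {
            char link[64];
            const std::string path = fd_dir + "/" + f->d_name;
            // "." and ".." are not symlinks, so readlink fails and they are skipped.
            ssize_t n = readlink(path.c_str(), link, sizeof(link) - 1);
            if (n < 0) continue;
            link[n] = '\0';
            if (target == link) {
                found = static_cast<pid_t>(std::atoi(p->d_name));
                break;
            }
        }
        closedir(fds);
    }
    closedir(proc);
    return found;
}

// Hypothetical helper for trying the lookup on a socket we own:
// returns the inode behind an open fd, or 0 on failure.
inline unsigned long inode_of_fd(int fd) {
    struct stat st{};
    return fstat(fd, &st) == 0 ? static_cast<unsigned long>(st.st_ino) : 0;
}
```

Walking every process on each lookup is slow on busy machines, so in practice the result would probably need caching, much like the /proc/net/ parsing.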
Another thing I didn’t quite understand is the NLMSG_DONE when checking the nlmsg_type in the header. What I am seeing:
    1 - TCP 10.0.9.15:51002 -> 192.168.64.11:3128 [15047]
    2 - TCP 192.168.64.11:3128 -> 10.0.9.15:51002 [0]
    3 - TCP 10.0.9.15:51002 -> 192.168.64.11:3128 [0]
    4 - TCP 192.168.64.11:3128 -> 10.0.9.15:51002 [15047]
    5 - TCP 10.0.9.15:51002 -> 192.168.64.11:3128 [0]
    6 - TCP 192.168.64.11:3128 -> 10.0.9.15:51002 [0]
    7 - TCP 10.0.9.15:51002 -> 192.168.64.11:3128 [15047]
So I get an inode number on the first query, then some NLMSG_DONE returns (stepping through the code confirmed this was the path taken). Why don’t I get the same result for, say, lines 1 and 3?
Appreciate any help or advice.
Answer
Found the answer and posting in case anyone stumbles across it:
I had uint16_t as the return type of the receive function when it should have been ino_t or uint32_t (idiag_inode is a 32-bit field), so any inode number above 65535 was silently truncated. I discovered this when I noticed that a few of the inodes matched correctly after a fresh reboot and then, after a while, stopped matching with no code changes, because inode numbers keep incrementing. Using the correct return type fixed the problem, so the code I posted is actually correct!
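To illustrate the truncation, here is a small sketch. The function names are hypothetical stand-ins for the receive function's return path, and the inode numbers are made up for the example; 80583 is simply 65536 + 15047.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the buggy version: the kernel reports a 32-bit
// inode, but a uint16_t return type keeps only the low 16 bits.
inline std::uint16_t inode_as_u16(std::uint32_t idiag_inode) {
    return static_cast<std::uint16_t>(idiag_inode);  // drops the upper 16 bits
}

// The same value returned through a type wide enough to hold it.
inline std::uint32_t inode_as_u32(std::uint32_t idiag_inode) {
    return idiag_inode;
}

// Small self-check with illustrative inode numbers.
inline void truncation_demo() {
    assert(inode_as_u16(15047) == 15047);  // fits in 16 bits, so it looks correct
    assert(inode_as_u16(80583) == 15047);  // 80583 - 65536: silently wrong
    assert(inode_as_u32(80583) == 80583);  // wide type preserves the value
}
```

This matches the symptom of results being right shortly after boot, while inode numbers are still small, and wrong later on.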
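I was also getting multipart messages. I should have kept looping while NLM_F_MULTI was set in the flags and left the loop only on receiving NLMSG_DONE. Because I returned as soon as I had the first inode, the trailing NLMSG_DONE presumably stayed queued on the socket and was read by the next query, which would explain the [0] results.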
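A rough sketch of that draining logic, kept separate from the socket so it can be tried on a synthetic buffer. It assumes Linux kernel headers are available; `consume_diag_buffer`, `append_msg` and `demo_two_part_dump` are hypothetical names introduced for illustration, and the inode values are made up.

```cpp
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/inet_diag.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Walks one buffer filled by recvmsg() and appends every inode it finds.
// Returns true once the dump is finished (NLMSG_DONE or NLMSG_ERROR seen),
// false if the caller should call recvmsg() again for the next part.
inline bool consume_diag_buffer(const void* buf, std::size_t len,
                                std::vector<std::uint32_t>& inodes) {
    int remaining = static_cast<int>(len);
    for (const nlmsghdr* h = static_cast<const nlmsghdr*>(buf);
         NLMSG_OK(h, remaining); h = NLMSG_NEXT(h, remaining)) {
        if (h->nlmsg_type == NLMSG_DONE || h->nlmsg_type == NLMSG_ERROR)
            return true;
        if (h->nlmsg_type != SOCK_DIAG_BY_FAMILY ||
            h->nlmsg_len < NLMSG_LENGTH(sizeof(inet_diag_msg)))
            continue;  // skip anything that is not a full inet_diag_msg
        const auto* diag = static_cast<const inet_diag_msg*>(NLMSG_DATA(h));
        inodes.push_back(diag->idiag_inode);
    }
    return false;
}

// Demo helper only: appends one synthetic netlink message. Messages of type
// SOCK_DIAG_BY_FAMILY carry an inet_diag_msg with the given inode; any other
// type (here NLMSG_DONE) is written with an empty payload.
inline void append_msg(std::vector<unsigned char>& out, std::uint16_t type,
                       std::uint32_t inode) {
    nlmsghdr h{};
    inet_diag_msg d{};
    const bool has_payload = (type == SOCK_DIAG_BY_FAMILY);
    const std::size_t payload = has_payload ? sizeof(d) : 0;
    h.nlmsg_len = static_cast<std::uint32_t>(NLMSG_LENGTH(payload));
    h.nlmsg_type = type;
    h.nlmsg_flags = NLM_F_MULTI;
    d.idiag_family = AF_INET;
    d.idiag_inode = inode;
    const std::size_t start = out.size();
    out.resize(start + NLMSG_ALIGN(h.nlmsg_len));
    std::memcpy(out.data() + start, &h, sizeof(h));
    if (has_payload)
        std::memcpy(out.data() + start + NLMSG_HDRLEN, &d, sizeof(d));
}

// Simulates a reply that arrives in two reads: the first holds one socket,
// the second holds another socket followed by NLMSG_DONE.
inline std::vector<std::uint32_t> demo_two_part_dump() {
    std::vector<unsigned char> part1, part2;
    append_msg(part1, SOCK_DIAG_BY_FAMILY, 15047);
    append_msg(part2, SOCK_DIAG_BY_FAMILY, 80583);
    append_msg(part2, NLMSG_DONE, 0);

    std::vector<std::uint32_t> inodes;
    bool done = consume_diag_buffer(part1.data(), part1.size(), inodes);
    assert(!done);  // no NLMSG_DONE yet, so keep reading
    done = consume_diag_buffer(part2.data(), part2.size(), inodes);
    assert(done);   // NLMSG_DONE ends the dump
    return inodes;
}
```

In the real receive path, the idea would be to call recvmsg() in a loop and pass each filled buffer to `consume_diag_buffer` until it returns true, so that nothing from one query is left on the socket for the next.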