
Minimizing copies when writing large data to a socket

I am writing an application server that processes images (large data), and I am trying to minimize copies when sending image data back to clients. The processed images I need to send are in buffers obtained from jemalloc. The ways I have thought of sending the data back to the client are:

1) Simple write call.

// Allocate buffer buf.
// Store image data in this buffer.
write(socket, buf, len);
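One detail the one-line write above glosses over: even on a blocking socket, write() may accept only part of a large buffer (for example when a signal interrupts it after some bytes were transferred), so in practice the call is usually wrapped in a loop. A minimal POSIX sketch; the helper name write_all is hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes from buf to fd, retrying on
 * short writes and on EINTR. Returns 0 on success, -1 on error (errno set
 * by write()). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data was written */
            return -1;
        }
        /* A socket may accept fewer bytes than requested; advance and retry. */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```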

2) I obtain the buffer through mmap instead of jemalloc (though I presume jemalloc already creates the buffer using mmap). I then make a simple call to write.

buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, file, 0);  // file: an open fd at least len bytes long.
// Store image data in this buffer.
write(socket, buf, len);

3) I obtain a buffer through mmap like before. I then use sendfile to send the data:

buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, in_fd, 0);
// Store image data in this buffer.
off_t offset = 0;
ssize_t rc;
rc = sendfile(out_fd, in_fd, &offset, len);
// Deal with rc (bytes queued to the socket, or -1 on error).
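A fuller Linux-only sketch of option (3), under these assumptions: the buffer is backed by a temporary file created with mkstemp (the /tmp path and the helper name send_via_sendfile are hypothetical), filled through a MAP_SHARED mapping, and then handed to sendfile() by file descriptor rather than by pointer. sendfile() returns how many bytes it queued, which can be fewer than requested, and it advances offset itself, hence the loop.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (Linux only). Returns 0 on success, -1 on error.
 * A successful return only means the bytes were queued on the socket; the
 * kernel may still reference the file's page-cache pages, so the file must
 * not be rewritten yet. Here nothing writes to it afterwards, so unmapping
 * and closing it is assumed harmless. */
static int send_via_sendfile(int sock, unsigned char fill, size_t len)
{
    char path[] = "/tmp/imgbufXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives until fd is closed */

    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(buf, fill, len);             /* stand-in for "store image data" */

    off_t offset = 0;                   /* sendfile() advances this */
    while ((size_t)offset < len) {
        ssize_t rc = sendfile(sock, fd, &offset, len - (size_t)offset);
        if (rc < 0 && errno == EINTR)
            continue;
        if (rc <= 0) {                  /* error, or unexpected end of file */
            munmap(buf, len);
            close(fd);
            return -1;
        }
    }
    munmap(buf, len);
    close(fd);
    return 0;
}
```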

It seems that (1) and (2) will do essentially the same thing, given that jemalloc probably allocates memory through mmap in the first place. I am not sure about (3), though. Will it really lead to any benefit? Figure 4 in this article on Linux zero-copy methods suggests that a further copy can be avoided by using sendfile:

no data is copied into the socket buffer. Instead, only descriptors with information about the whereabouts and length of the data are appended to the socket buffer. The DMA engine passes data directly from the kernel buffer to the protocol engine, thus eliminating the remaining final copy.

This seems like a win if everything works out. However, I don't know whether my mmapped buffer counts as a kernel buffer, and I don't know when it is safe to reuse this buffer. Since only the fd and the length are appended to the socket buffer, I assume the kernel actually writes the data to the socket asynchronously. If so, what does the return from sendfile signify? How would I know when to reuse this buffer?

So my questions are:

  1. What is the fastest way to write large buffers (images in my case) to a socket? The images are held in memory.
  2. Is it a good idea to call sendfile on a mmapped file? If yes, what are the gotchas? Does this even lead to any wins?


Answer

It seems like my suspicions were correct. I got my information from this article. Quoting from it:

Also these network write system calls, including sendfile, might and in many cases do return before the data sent over TCP by the method call has been acknowledged. These methods return as soon as all data is written into the socket buffers (sk buff) and is pushed to the TCP write queue, the TCP engine can manage alone from that point on. In other words at the time sendfile returns the last TCP send window is not actually sent to the remote host but queued. In cases where scatter-gather DMA is supported there is no separate buffer which holds these bytes, rather the buffers(sk buffs) just hold pointers to the pages of OS buffer cache, where the contents of file is located. This might lead to a race condition if we modify the content of the file corresponding to the data in the last TCP send window as soon as sendfile is returned. As a result TCP engine may send newly written data to the remote host instead of what we originally intended to send.

Provided the buffer from an mmapped file is even considered "DMA-able", it seems there is no way to know when it is safe to reuse it without an explicit acknowledgement (over the network) from the actual client. I might have to stick to simple write calls and incur the extra copy. There is a paper (also from the article) with more details.
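The trade-off settled on above can be sketched as follows (POSIX; the helper name send_then_reuse is hypothetical): write() copies the data into the kernel's socket buffer before it returns, so the caller may overwrite its buffer immediately without changing what the peer receives. That extra copy is the price paid for knowing exactly when the buffer is free again.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send len bytes with plain write(), then reuse the
 * buffer right away (here it is simply zeroed). Returns 0 on success,
 * -1 on error. */
static int send_then_reuse(int fd, unsigned char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        off += (size_t)n;
    }
    /* Safe: write() has already copied the bytes into the kernel. */
    memset(buf, 0, len);
    return 0;
}
```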

Edit: This article on the splice call describes the same problem. Quoting it:

Be aware, when splicing data from a mmap’ed buffer to a network socket, it is not possible to say when all data has been sent. Even if splice() returns, the network stack may not have sent all data yet. So reusing the buffer may overwrite unsent data.
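To make the splice scenario concrete, here is a Linux-only sketch (the helper name splice_buffer_to_socket is hypothetical) that moves a user buffer into a socket through an intermediate pipe: vmsplice() maps the buffer's pages into the pipe instead of copying them, and splice() then moves them from the pipe to the socket. A successful return therefore says nothing about whether the data has been transmitted, which is exactly the hazard the quote describes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (Linux only). Returns 0 on success, -1 on error.
 * buf must NOT be modified after this returns until the data is known to
 * have been sent, because the kernel may still reference its pages. */
static int splice_buffer_to_socket(int sock, const void *buf, size_t len)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    const char *cur = buf;
    size_t left = len;
    int ret = 0;
    while (left > 0 && ret == 0) {
        struct iovec iov = { .iov_base = (void *)cur, .iov_len = left };
        /* May place only part of the buffer in the pipe (pipe capacity). */
        ssize_t in = vmsplice(p[1], &iov, 1, 0);
        if (in < 0 && errno == EINTR)
            continue;
        if (in <= 0) {
            ret = -1;
            break;
        }
        /* Drain everything that was just placed in the pipe. */
        ssize_t pending = in;
        while (pending > 0) {
            ssize_t out = splice(p[0], NULL, sock, NULL,
                                 (size_t)pending, SPLICE_F_MOVE);
            if (out < 0 && errno == EINTR)
                continue;
            if (out <= 0) {
                ret = -1;
                break;
            }
            pending -= out;
        }
        cur += in;
        left -= (size_t)in;
    }
    close(p[0]);
    close(p[1]);
    return ret;
}
```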

User contributions licensed under: CC BY-SA