
DMA Engine Timeout and DMA Memory Mapping

I am trying to use a Linux DMA driver. Currently, when I send the transaction out and begin waiting, my request times out. I believe this has to do with the way I am setting up my buffers when performing the DMA mapping.

const int dma_length = 16 * 1024;
dma_addr_t tx_dma_handle, rx_dma_handle;

char *src_dma_buffer = kmalloc(dma_length, GFP_KERNEL);
char *dest_dma_buffer = kzalloc(dma_length, GFP_KERNEL);

tx_dma_handle = dma_map_single(tx_chan->device->dev, src_dma_buffer, dma_length, DMA_TO_DEVICE);
rx_dma_handle = dma_map_single(rx_chan->device->dev, dest_dma_buffer, dma_length, DMA_FROM_DEVICE);
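
One thing I notice is that the snippet above never checks the result of dma_map_single(); my guess is that the check would look roughly like this (this is not in my current code):

/* Guess at the missing check: make sure both mappings actually succeeded
 * before handing the handles to the DMA engine. */
if (dma_mapping_error(tx_chan->device->dev, tx_dma_handle) ||
    dma_mapping_error(rx_chan->device->dev, rx_dma_handle)) {
    pr_err("dma_map_single() failed\n");
    /* free the buffers and bail out here */
}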

In Xilinx’s DMA test driver, they take special care to look at memory alignment. In particular, they use a field of the channel’s dma_device (dma_chan->device) called copy_align.

  • @copy_align: alignment shift for memcpy operations
/* len and the offsets are rounded down to a multiple of (1 << align),
 * where align is the copy_align shift described above. */
len = dmatest_random() % test_buf_size + 1;
len = (len >> align) << align;
if (!len)
    len = 1 << align;
src_off = dmatest_random() % (test_buf_size - len + 1);
dst_off = dmatest_random() % (test_buf_size - len + 1);

src_off = (src_off >> align) << align;
dst_off = (dst_off >> align) << align;
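
If I am reading this right, applying the same masking to my own transfer length would presumably look something like the fragment below (my guess only, using the copy_align shift from my transmit channel):

/* My guess: round my own length down to a multiple of (1 << copy_align),
 * the same way the test driver does with len, src_off and dst_off. */
int align = tx_chan->device->copy_align;
size_t aligned_len = ((size_t)dma_length >> align) << align;

if (!aligned_len)
    aligned_len = 1 << align;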

Going back to the excerpt: the original address looks completely random, coming from dmatest_random(). I am not sure what can be said, or what guarantees can be made, about that memory.

static unsigned long dmatest_random(void)
{
    unsigned long buf;

    get_random_bytes(&buf, sizeof(buf));
    return buf;
}

They then use these offsets to set up their source and destination buffers for DMA.

u8 *buf = thread->srcs[i] + src_off;

dma_srcs[i] = dma_map_single(tx_dev->dev, buf, len, DMA_MEM_TO_DEV);

I am very confused as to what this does. My only guess is that it will page align the beginning of the source and destination buffers in virtual memory.

Looking at the way I set up my buffers with kmalloc and kzalloc, do I have any guarantee that my buffers start at page boundaries? Am I right that I need my buffers to start at page boundaries?

The source code to the Xilinx DMA test driver is here: https://github.com/Xilinx/linux-xlnx/blob/master/drivers/dma/xilinx/axidmatest.c

You can find the high level description of the problem I am trying to solve here: https://forums.xilinx.com/t5/Embedded-Linux/AXI-DMA-Drivers-for-Kernel-v-4-9-PetaLinux-2017-3/td-p/828917


Answer

Having a look at this link, it seems that you don’t have any guarantee that your memory allocation will start at the beginning of a page frame. However, this other link, where alloc_pages() is explained, can be helpful; it may fit better with what you need.
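
For illustration, a minimal sketch of that approach might look like the fragment below (the names and the get_order() calculation are mine, not from the question); memory obtained from the page allocator is naturally page-aligned:

int order = get_order(dma_length);      /* smallest order whose size covers dma_length */
struct page *pg;
char *src_dma_buffer;

pg = alloc_pages(GFP_KERNEL, order);    /* page-aligned by construction */
if (!pg)
    return -ENOMEM;
src_dma_buffer = page_address(pg);      /* kernel virtual address of the first page */

/* ... map and use the buffer ... */

free_pages((unsigned long)src_dma_buffer, order);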

Regarding the alignment of the memory to be used in DMA transactions, in this link we can read the following:

What memory is DMA’able?

The first piece of information you must know is what kernel memory can be used with the DMA mapping facilities. There has been an unwritten set of rules regarding this, and this text is an attempt to finally write them down.

If you acquired your memory via the page allocator (i.e. __get_free_page*()) or the generic memory allocators (i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from that memory using the addresses returned from those routines.

This means specifically that you may not use the memory/addresses returned from vmalloc() for DMA. It is possible to DMA to the underlying memory mapped into a vmalloc() area, but this requires walking page tables to get the physical addresses, and then translating each of those pages back to a kernel address using something like __va(). [ EDIT: Update this when we integrate Gerd Knorr’s generic code which does this. ]

This rule also means that you may use neither kernel image addresses (items in data/text/bss segments), nor module image addresses, nor stack addresses for DMA. These could all be mapped somewhere entirely different than the rest of physical memory. Even if those classes of memory could physically work with DMA, you’d need to ensure the I/O buffers were cacheline-aligned. Without that, you’d see cacheline sharing problems (data corruption) on CPUs with DMA-incoherent caches. (The CPU could write to one word, DMA would write to a different one in the same cache line, and one of them could be overwritten.)

Also, this means that you cannot take the return of a kmap() call and DMA to/from that. This is similar to vmalloc().

What about block I/O and networking buffers? The block I/O and networking subsystems make sure that the buffers they use are valid for you to DMA from/to.

So we only need the address to be aligned to the cacheline size; we do not need memory aligned to a page frame (that would work too, but it is not required). As for kmalloc(), the memory it returns is already suitable for DMA transactions; the GFP_DMA flag is only needed for devices that require memory from the DMA-addressable zone.
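
Applied to the code in the question, a minimal sketch of the whole mapping path, with the dma_mapping_error() checks that the DMA-API documentation recommends, could look roughly like this (the error-path labels are illustrative; tx_chan, rx_chan and dma_length are taken from the question):

char *src_dma_buffer = kmalloc(dma_length, GFP_KERNEL);
char *dest_dma_buffer = kzalloc(dma_length, GFP_KERNEL);
dma_addr_t tx_dma_handle, rx_dma_handle;

if (!src_dma_buffer || !dest_dma_buffer)
    goto out_free;

tx_dma_handle = dma_map_single(tx_chan->device->dev, src_dma_buffer,
                               dma_length, DMA_TO_DEVICE);
if (dma_mapping_error(tx_chan->device->dev, tx_dma_handle))
    goto out_free;

rx_dma_handle = dma_map_single(rx_chan->device->dev, dest_dma_buffer,
                               dma_length, DMA_FROM_DEVICE);
if (dma_mapping_error(rx_chan->device->dev, rx_dma_handle))
    goto out_unmap_tx;

/* ... prepare the descriptors, submit, issue pending, wait for completion ... */

dma_unmap_single(rx_chan->device->dev, rx_dma_handle, dma_length, DMA_FROM_DEVICE);
out_unmap_tx:
dma_unmap_single(tx_chan->device->dev, tx_dma_handle, dma_length, DMA_TO_DEVICE);
out_free:
kfree(dest_dma_buffer);
kfree(src_dma_buffer);

Note that this keeps to the kmalloc()/kzalloc() buffers from the question, since, per the documentation quoted above, that memory is already valid for DMA.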
