
In kernel space, how does one get the physical addresses corresponding to a file on an ext4-formatted disk?

If you’re here:

https://github.com/torvalds/linux/blob/master/fs/ext4/file.c#L360

You have access to these two structs inside the ext4_file_mmap function:

struct file *file, struct vm_area_struct *vma

I am changing the implementation of this function for dax mode so that the page tables are filled in completely for the file the moment you call mmap (to see how much performance improves when no page faults are taken).

I have managed to get the following done so far (assuming I have access to the two structs that ext4_file_mmap has access to):

// vm_area_struct defined in /include/linux/mm_types.h : 284
// file defined in /include/linux/fs.h : 848

loff_t file_size = i_size_read(file_inode(file)); // i_size_read() is the usual accessor for inode->i_size
unsigned long start_va = vma->vm_start;

Now, the difficulty lies here. How do I get the physical addresses (blocks? Not sure if dax uses blocks) associated with this file?

I have spent the last couple of days staring at the linux source code, trying to make sense of stuff, and boy have I been successful.

Any help, hint, or suggestion is greatly appreciated! Thanks!

Some updates: When you mmap a file in dax mode, you don’t fetch anything into memory. The device, in this case PMEM, is byte-addressable and gives DDR latencies, so it’s accessed directly (no memory in between). Certain ptes lead to the access of this PMEM device instead of memory.


Answer

First of all, mmap supports the MAP_POPULATE flag specifically to avoid page faults. In principle it may not work with dax, but that’s unlikely.
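
For reference, a minimal userspace sketch of what using MAP_POPULATE looks like. The helper name map_file_populated is hypothetical, and whether the pre-faulting actually takes effect on a dax mount is not verified here; on an ordinary file the flag asks the kernel to set up the page tables at mmap time instead of on first access.

```c
#define _GNU_SOURCE          /* exposes MAP_POPULATE in <sys/mman.h> on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a whole file read-only and ask the kernel to
 * pre-fault the mapping. Returns MAP_FAILED on error; on success the
 * mapping length is stored in *len_out.
 */
void *map_file_populated(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    /* MAP_POPULATE requests that page tables be filled in up front. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                   MAP_SHARED | MAP_POPULATE, fd, 0);
    close(fd);               /* the mapping stays valid after close() */

    if (p != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with munmap(p, len) when done. MAP_POPULATE is best effort: mmap does not fail if the population step could not be completed.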

Second of all, it seems you don’t have any measurements of the current state of affairs. Just “changing something and checking the difference” is a fundamentally wrong approach. In particular, the change may remove the actual bottleneck as an unintended side effect, and the win will then be misattributed. You can start by using ‘perf’ to get basic numbers and by generating flamegraphs ( http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html ). If you do a lot of I/O over a small range, page faults should have a negligible effect.
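
As a cheap first measurement before reaching for perf, the fault counters from getrusage can be sampled around the access loop. This is only a sketch: it swaps in an anonymous mapping for the dax file (an assumption made so it runs anywhere), and the helper names faults_so_far and faults_while_touching are hypothetical.

```c
#define _GNU_SOURCE          /* exposes MAP_POPULATE / MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Page faults (minor + major) taken by this process so far, or -1 on error. */
static long faults_so_far(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/*
 * Hypothetical helper: map `len` bytes of anonymous memory (a stand-in for
 * the real file mapping), write one byte per page, and return how many
 * faults were taken during the writes. Pass MAP_POPULATE in `extra_flags`
 * to compare against the lazily faulted case. Returns -1 on error.
 */
long faults_while_touching(size_t len, int extra_flags)
{
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS | extra_flags,
                                     -1, 0);
    if (p == MAP_FAILED || page <= 0)
        return -1;

    long before = faults_so_far();
    for (size_t off = 0; off < len; off += (size_t)page)
        p[off] = 1;          /* volatile store: one touch per page */
    long after = faults_so_far();

    munmap((void *)p, len);
    return (before < 0 || after < 0) ? -1 : after - before;
}
```

Calling faults_while_touching(len, 0) and faults_while_touching(len, MAP_POPULATE) for the same len gives a rough idea of how many faults pre-population removes. It says nothing about how much time those faults cost, which is what perf and the flamegraphs are for.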

User contributions licensed under: CC BY-SA