In my /etc/default/grub file I have explicitly set aside N huge pages with "hugepages=N" on the kernel command line. If I'm running on a box with 2 NUMA nodes, do N/2 huge pages get set aside for each node, do they all go to node 0, or something else? Also, is there a way on the command line to query how they're split across nodes?
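(For reference, that parameter ends up on the kernel command line, so the relevant line in /etc/default/grub looks roughly like the sketch below; N=512 is just an example value and the exact variable name can differ between distributions.)

GRUB_CMDLINE_LINUX="... hugepages=512"    # existing options elided; 512 is illustrative

(The grub config then has to be regenerated, e.g. with update-grub or grub2-mkconfig, and the box rebooted before the pages are reserved.)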
Answer
From the kernel.org hugetlbpage documentation:
On a NUMA platform, the kernel will attempt to distribute the huge page pool over the set of allowed nodes specified by the NUMA memory policy of the task that modifies nr_hugepages. The default for the allowed nodes, when the task has default memory policy, is all on-line nodes with memory. Allowed nodes with insufficient available, contiguous memory for a huge page will be silently skipped when allocating persistent huge pages. See that document's discussion of the interaction of task memory policy, cpusets and per-node attributes with the allocation and freeing of persistent huge pages.
The success or failure of huge page allocation depends on the amount of physically contiguous memory that is present in the system at the time of the allocation attempt. If the kernel is unable to allocate huge pages from some nodes in a NUMA system, it will attempt to make up the difference by allocating extra pages on other nodes with sufficient available contiguous memory, if any.
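As a concrete way to see (or influence) that per-node split, kernels that expose the per-node hugepage attributes mentioned above provide a sysfs file per node and per page size; the node numbers, 2048kB page size and count of 256 below are only illustrative:

cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 256 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages    # requires root; sets the persistent pool for node 1 only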
System administrators may want to put the command that sets nr_hugepages (a write to /proc/sys/vm/nr_hugepages) in one of the local rc init files. This will enable the kernel to allocate huge pages early in the boot process, when the chance of getting physically contiguous pages is still very high. Administrators can verify the number of huge pages actually allocated by checking the vm.nr_hugepages sysctl or /proc/meminfo. To check the per-node distribution of huge pages in a NUMA system, use:
cat /sys/devices/system/node/node*/meminfo | fgrep Huge
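On a two-node system, the output looks something like the sketch below (the counts are purely illustrative, e.g. for hugepages=512 split evenly):

Node 0 HugePages_Total:   256
Node 0 HugePages_Free:    256
Node 0 HugePages_Surp:      0
Node 1 HugePages_Total:   256
Node 1 HugePages_Free:    256
Node 1 HugePages_Surp:      0

If the numactl package is installed, numastat -m prints a similar per-node memory breakdown that includes the huge page counters.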