A program I am currently working on consumes much more memory than I think it should, so I am trying to understand how glibc malloc trimming works. I wrote the following test:
#include <malloc.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_CHUNKS 1000000
#define CHUNK_SIZE 100

int main()
{
    // disable fast bins
    mallopt(M_MXFAST, 0);

    void** array = (void**)malloc(sizeof(void*) * NUM_CHUNKS);

    // allocating memory
    for (unsigned int i = 0; i < NUM_CHUNKS; i++) {
        array[i] = malloc(CHUNK_SIZE);
    }

    // releasing ALMOST all memory (all chunks but the last one)
    for (unsigned int i = 0; i < NUM_CHUNKS - 1; i++) {
        free(array[i]);
    }

    // when enabled, memory consumption reduces
    //int ret = malloc_trim(0);
    //printf("ret=%d\n", ret);

    malloc_stats();
    sleep(100000);

    return 0;
}
Test output (without calling malloc_trim):
Arena 0:
system bytes     = 112054272
in use bytes     =       112
Total (incl. mmap):
system bytes     = 120057856
in use bytes     =   8003696
max mmap regions =         1
max mmap bytes   =   8003584
Even though almost all memory was released, this test code consumes much more resident memory than expected:
[root@node0-b3]# ps aux | grep test
root     14662  1.8  0.4 129736 **118024** pts/10  S    20:19   0:00 ./test
Process smaps:
0245e000-08f3b000 rw-p 00000000 00:00 0       [heap]
Size:             109428 kB
Rss:              109376 kB
Pss:              109376 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:    109376 kB
Referenced:       109376 kB
Anonymous:        109376 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac

7f1c60720000-7f1c60ec2000 rw-p 00000000 00:00 0
Size:               7816 kB
Rss:                7816 kB
Pss:                7816 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      7816 kB
Referenced:         7816 kB
Anonymous:          7816 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
When I enable the call to malloc_trim the output of the test stays almost the same:
ret=1
Arena 0:
system bytes     = 112001024
in use bytes     =       112
Total (incl. mmap):
system bytes     = 120004608
in use bytes     =   8003696
max mmap regions =         1
max mmap bytes   =   8003584
However, the RSS decreases significantly:
[root@node0-b3]# ps aux | grep test
root     15733  0.6  0.0 129688   **8804** pts/10  S    20:20   0:00 ./test
Process smaps (after malloc_trim):
01698000-08168000 rw-p 00000000 00:00 0       [heap]
Size:             109376 kB
Rss:                   8 kB
Pss:                   8 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         8 kB
Referenced:            8 kB
Anonymous:             8 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me ac

7f508122a000-7f50819cc000 rw-p 00000000 00:00 0
Size:               7816 kB
Rss:                7816 kB
Pss:                7816 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      7816 kB
Referenced:         7816 kB
Anonymous:          7816 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
After calling malloc_trim, the heap got shrunk. I assume the 8 MB mmap segment is still present because of the last piece of memory, which wasn’t released.
Why isn’t heap trimming performed automatically by malloc? Is there a way to configure malloc so that trimming is done automatically (when it can save that much memory)?
I am using glibc version 2.17.
Answer
Largely for historical reasons, memory for small allocations comes from a pool managed with the brk system call. This is a very old system call (at least as old as Version 6 Unix), and the only thing it can do is change the size of an “arena” whose position in memory is fixed. What that means is that the brk pool cannot shrink past a block that is still allocated.
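As a rough way to see this pool in action, here is a small sketch of my own (not part of the question’s test): sbrk(0) reports the current program break, which is a single boundary at the end of one contiguous region, so it can only be lowered when the highest-addressed part of that region is free.

#define _DEFAULT_SOURCE   // for sbrk() on glibc
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    // sbrk(0) returns the current "program break", i.e. the end of the
    // brk-managed heap region
    void* before = sbrk(0);

    void* p = malloc(100);   // small request, typically served from the brk pool

    void* after = sbrk(0);
    printf("break before: %p\n", before);
    printf("break after:  %p\n", after);   // usually higher once the heap has grown

    free(p);
    return 0;
}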
Your program allocates N blocks of memory and then deallocates N-1 of them. The one block it doesn’t deallocate is the one located at the highest address. That is the worst-case scenario for brk: the size can’t be reduced at all, even though 99.99% of the pool is unused! If you change your program so that the block it doesn’t free is array[0] instead of array[NUM_CHUNKS-1], you should see both RSS and address space shrink upon the final call to free.
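A minimal sketch of that change: keep array[0] allocated and free everything above it, so the top of the brk pool becomes free and can be returned to the kernel.

// release ALMOST all memory, but keep array[0] instead of the last chunk
for (unsigned int i = 1; i < NUM_CHUNKS; i++) {
    free(array[i]);
}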
When you explicitly call malloc_trim, it attempts to work around this limitation using a Linux extension, madvise(MADV_DONTNEED), which releases the physical RAM, but not the address space (as you observed). I don’t know why this only happens upon an explicit call to malloc_trim.
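To see what that madvise flag does in isolation, here is a standalone sketch of my own (independent of malloc): applying MADV_DONTNEED to an anonymous mapping drops the physical pages, so RSS falls, but the address range stays mapped and later reads fault in zero-filled pages.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 100 * 1024 * 1024;   // 100 MiB anonymous mapping

    char* p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    // touch every page so RSS grows by roughly 100 MiB
    memset(p, 1, len);

    // give the physical pages back to the kernel; the mapping itself
    // (the address space) is untouched, so VSZ does not change
    madvise(p, len, MADV_DONTNEED);

    printf("p[0] after MADV_DONTNEED: %d\n", p[0]);   // prints 0
    getchar();   // pause here and inspect RSS via ps or /proc/<pid>/smaps
    return 0;
}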
Incidentally, the 8 MB mmap segment is for your initial allocation of array.
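As a quick check of the numbers (my own arithmetic, not from the original answer): sizeof(void*) * NUM_CHUNKS is 8 × 1,000,000 = 8,000,000 bytes. That request is far above glibc’s default mmap threshold, so it is served with mmap rather than brk; rounded up to whole pages together with malloc’s bookkeeping it comes to 8,003,584 bytes = 7816 kB, which matches both the “max mmap bytes = 8003584” line from malloc_stats and the 7816 kB anonymous mapping in smaps. That mapping stays resident because array itself is never freed.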