Linux: Slab merging

2017-12-21

We ran into some out-of-memory situations. That was a bit of a surprise, since the machine had 16 GB of RAM and virtually no daemons running. All we knew was that pretty much all of the memory was used for some kind of I/O cache.

In other words, it was used for in-kernel memory structures. Naively running htop doesn’t help here. Now what? How do you find out what exactly uses memory?

/proc/meminfo quickly pointed us to “Slab”.

What’s “Slab”? It’s an allocation mechanism. It’s useful if you have to deal with lots of objects of the same kind, for example lots of inode structs or process descriptors. Slab caches can hold arbitrary data, so they are not limited to I/O stuff.

Have a look at /proc/slabinfo to get an idea of what’s in there.

The thing is, each of those caches has some management overhead. Since you generally want to avoid overhead, caches whose objects have a compatible size and flags may get merged as an optimization. This, in turn, means that /proc/slabinfo is not a reliable source of information: a cache can contain data that’s not related to the cache’s name in any way, which is confusing and misleading if you don’t know about it.

A simple kernel module with one cache

Consider the following file slab_user.c:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static struct kmem_cache *my_cache_a;

struct my_data_structure {
    uint64_t some_int;
    char name[256];
};

static int __init
slab_user_init(void)
{
    my_cache_a = kmem_cache_create("MY_FOO",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    /* kmem_cache_create() returns NULL on failure. */
    if (!my_cache_a)
        return -ENOMEM;

    return 0;
}

static void __exit
slab_user_exit(void)
{
    kmem_cache_destroy(my_cache_a);
}

module_init(slab_user_init);
module_exit(slab_user_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("nobody-cares");

If you’re on Arch Linux, install the packages base-devel and linux-headers, then you can build it using this Makefile (make sure to use tabs for indentation):

obj-m := slab_user.o

all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

I strongly suggest doing this in a virtual machine.

# make
# insmod slab_user.ko

Once that’s done, you’ll see a new slab cache:

# grep MY /proc/slabinfo
MY_FOO                 0      0    264   15    1 : tunables    0    0    0 : slabdata      0      0      0

The actual numbers don’t matter. The point is that you see a new slab.

Unload the module to get rid of the cache:

# rmmod slab_user

Similar slabs getting merged

Now modify the code to create two caches:

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static struct kmem_cache *my_cache_a, *my_cache_b;

struct my_data_structure {
    uint64_t some_int;
    char name[256];
};

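static int __init
slab_user_init(void)
{
    my_cache_a = kmem_cache_create("MY_FOO",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    if (!my_cache_a)
        return -ENOMEM;

    my_cache_b = kmem_cache_create("MY_BAR",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    if (!my_cache_b) {
        /* Undo the first cache so nothing is left behind. */
        kmem_cache_destroy(my_cache_a);
        return -ENOMEM;
    }

    return 0;
}

static void __exit
slab_user_exit(void)
{
    kmem_cache_destroy(my_cache_a);
    kmem_cache_destroy(my_cache_b);
}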

module_init(slab_user_init);
module_exit(slab_user_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("nobody-cares");

What would you expect to see now? Without knowing about slab merging, you’d expect both MY_FOO and MY_BAR to show up in /proc/slabinfo. That’s not what happens:

# grep MY /proc/slabinfo
MY_FOO                 0      0    264   15    1 : tunables    0    0    0 : slabdata      0      0      0

Still only one cache.

What happens when we now allocate an object from the second cache – the one which we don’t even see here?

#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static struct kmem_cache *my_cache_a, *my_cache_b;

struct my_data_structure {
    uint64_t some_int;
    char name[256];
};
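static struct my_data_structure *allocated;

static int __init
slab_user_init(void)
{
    my_cache_a = kmem_cache_create("MY_FOO",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    if (!my_cache_a)
        return -ENOMEM;

    my_cache_b = kmem_cache_create("MY_BAR",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    if (!my_cache_b) {
        kmem_cache_destroy(my_cache_a);
        return -ENOMEM;
    }

    /* Allocate a single object from the second cache. */
    allocated = kmem_cache_alloc(my_cache_b, GFP_KERNEL);
    if (!allocated) {
        kmem_cache_destroy(my_cache_b);
        kmem_cache_destroy(my_cache_a);
        return -ENOMEM;
    }

    return 0;
}

static void __exit
slab_user_exit(void)
{
    /* Free the object before destroying the cache it came from. */
    kmem_cache_free(my_cache_b, allocated);

    kmem_cache_destroy(my_cache_a);
    kmem_cache_destroy(my_cache_b);
}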

module_init(slab_user_init);
module_exit(slab_user_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("nobody-cares");

Result:

# grep MY /proc/slabinfo
MY_FOO                15     15    264   15    1 : tunables    0    0    0 : slabdata      1      1      0

MY_FOO now shows active objects, although we allocated from MY_BAR. That’s the effect of slab merging: both names refer to the same underlying cache.

Disabling merging for debugging purposes

Append slub_nomerge to your kernel parameters and reboot. No, that’s not a typo:

Slub is the next-generation replacement memory allocator, which has been the default in the Linux kernel since 2.6.23. It continues to employ the basic "slab" model, but fixes several deficiencies in Slab's design, particularly around systems with large numbers of processors. Slub is simpler than Slab.

Loading the same module now shows:

# grep MY /proc/slabinfo
MY_BAR                15     15    264   15    1 : tunables    0    0    0 : slabdata      1      1      0
MY_FOO                 0      0    264   15    1 : tunables    0    0    0 : slabdata      0      0      0

There you have it.

Comparing a merged and an unmerged /proc/slabinfo shows about twice as many lines in the unmerged version.
