2017-12-21
We had some out-of-memory situations. It was a bit of a surprise, since the machine had 16 GB of RAM and virtually no daemons running. We only knew that pretty much all of the memory was used for some kind of I/O cache.
In other words, it was used for in-kernel memory structures. Naively running htop doesn’t help here. Now what? How do you find out what exactly uses memory?
/proc/meminfo quickly pointed us to “Slab”.
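If you'd rather pull those numbers out programmatically, here is a minimal userspace sketch in C. The helper names meminfo_kb and read_file are made up for this example. It only assumes the usual "Key:   value kB" layout of /proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value (in kB) of one field from /proc/meminfo-style text,
 * or -1 if the field is missing. `key` is given without the colon,
 * e.g. "Slab". Hypothetical helper, not part of any library.
 */
long meminfo_kb(const char *text, const char *key)
{
    size_t keylen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* A field matches only at the start of a line, followed by ':'. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            long value;

            if (sscanf(line + keylen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        /* Advance to the character after the next newline, if any. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Read a small file such as /proc/meminfo into buf; returns 0 on success. */
int read_file(const char *path, char *buf, size_t size)
{
    FILE *f = fopen(path, "r");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}
```

A caller would read /proc/meminfo into a buffer of a few kilobytes with read_file() and then ask for "Slab", "SReclaimable" and "SUnreclaim". The last two split the slab total into the part the kernel can reclaim under memory pressure and the part it cannot.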
What’s “Slab”? It’s an allocation mechanism. It’s useful if you have to deal with a lot of similar objects, for example lots of inode structs or process descriptors. Slab caches can hold arbitrary data, so they are not limited to I/O stuff.
Have a look at /proc/slabinfo to get an idea of what’s in there.
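To make sense of the columns, here is a small sketch that parses one data line of /proc/slabinfo. It assumes the version 2.1 layout (name, active_objs, num_objs, objsize, objperslab, pagesperslab, then the "tunables" and "slabdata" groups). The struct and function names are invented for illustration.

```c
#include <stdio.h>

/* Fields of one /proc/slabinfo data line (format version 2.1 assumed). */
struct slab_line {
    char name[64];
    unsigned long active_objs;
    unsigned long num_objs;
    unsigned long objsize;       /* bytes per object */
    unsigned long objperslab;
    unsigned long pagesperslab;
    unsigned long active_slabs;
    unsigned long num_slabs;
};

/* Parse one data line; returns 0 on success, -1 for headers or garbage. */
int parse_slab_line(const char *line, struct slab_line *out)
{
    /* The three tunables are read and discarded via %*u. */
    int n = sscanf(line,
                   "%63s %lu %lu %lu %lu %lu : tunables %*u %*u %*u"
                   " : slabdata %lu %lu",
                   out->name, &out->active_objs, &out->num_objs,
                   &out->objsize, &out->objperslab, &out->pagesperslab,
                   &out->active_slabs, &out->num_slabs);

    return n == 8 ? 0 : -1;
}

/* Rough footprint: bytes held in objects, ignoring per-slab padding. */
unsigned long slab_object_bytes(const struct slab_line *s)
{
    return s->num_objs * s->objsize;
}
```

Feeding every line of the file through parse_slab_line() and sorting by slab_object_bytes() gives a quick ranking of the biggest caches. Treat the result as an estimate: the pages actually held are num_slabs times pagesperslab, which is usually a bit more.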
The thing is, each of those caches has some management overhead. Since you generally want to avoid overhead, caches for similar object types may get merged as an optimization. This, in turn, means that /proc/slabinfo is not a reliable source of information: a cache can contain data that is not related to the cache’s name in any way, which is confusing and misleading.
Consider the following file slab_user.c:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static struct kmem_cache *my_cache_a;

struct my_data_structure {
    uint64_t some_int;
    char name[256];
};

static int
slab_user_init(void)
{
    my_cache_a = kmem_cache_create("MY_FOO",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    /* kmem_cache_create() returns NULL on failure. */
    if (!my_cache_a)
        return -ENOMEM;
    return 0;
}

static void
slab_user_exit(void)
{
    kmem_cache_destroy(my_cache_a);
}

module_init(slab_user_init);
module_exit(slab_user_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("nobody-cares");
If you’re on Arch Linux, install the packages base-devel and linux-headers, then you can build it using this Makefile (make sure to use tabs for indentation):
obj-m := slab_user.o
all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
I strongly suggest doing this in a virtual machine.
# make
# insmod slab_user.ko
Once that’s done, you’ll see a new slab cache:
# grep MY /proc/slabinfo
MY_FOO 0 0 264 15 1 : tunables 0 0 0 : slabdata 0 0 0
The actual numbers don’t matter. The point is that you see a new slab.
Unload the module to get rid of the cache:
# rmmod slab_user
Now modify the code to create two caches:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static struct kmem_cache *my_cache_a, *my_cache_b;

struct my_data_structure {
    uint64_t some_int;
    char name[256];
};

static int
slab_user_init(void)
{
    my_cache_a = kmem_cache_create("MY_FOO",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    if (!my_cache_a)
        return -ENOMEM;

    /* Second cache: same object size and flags, different name. */
    my_cache_b = kmem_cache_create("MY_BAR",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    if (!my_cache_b) {
        kmem_cache_destroy(my_cache_a);
        return -ENOMEM;
    }
    return 0;
}

static void
slab_user_exit(void)
{
    kmem_cache_destroy(my_cache_a);
    kmem_cache_destroy(my_cache_b);
}

module_init(slab_user_init);
module_exit(slab_user_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("nobody-cares");
What would you expect to see now? Without knowing about slab merging, you’d expect to see both MY_FOO and MY_BAR in /proc/slabinfo. Well, of course that’s not what happens:
# grep MY /proc/slabinfo
MY_FOO 0 0 264 15 1 : tunables 0 0 0 : slabdata 0 0 0
Still only one cache.
What happens when we now allocate an object from the second cache – the one which we don’t even see here?
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

static struct kmem_cache *my_cache_a, *my_cache_b;

struct my_data_structure {
    uint64_t some_int;
    char name[256];
};

static struct my_data_structure *allocated;

static int
slab_user_init(void)
{
    my_cache_a = kmem_cache_create("MY_FOO",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    if (!my_cache_a)
        return -ENOMEM;

    my_cache_b = kmem_cache_create("MY_BAR",
                                   sizeof (struct my_data_structure),
                                   0,
                                   0,
                                   NULL);
    if (!my_cache_b)
        goto err_a;

    /* Allocate from the second cache, the one missing from slabinfo. */
    allocated = kmem_cache_alloc(my_cache_b, GFP_KERNEL);
    if (!allocated)
        goto err_b;
    return 0;

err_b:
    kmem_cache_destroy(my_cache_b);
err_a:
    kmem_cache_destroy(my_cache_a);
    return -ENOMEM;
}

static void
slab_user_exit(void)
{
    kmem_cache_free(my_cache_b, allocated);
    kmem_cache_destroy(my_cache_a);
    kmem_cache_destroy(my_cache_b);
}

module_init(slab_user_init);
module_exit(slab_user_exit);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("nobody-cares");
Result:
# grep MY /proc/slabinfo
MY_FOO 15 15 264 15 1 : tunables 0 0 0 : slabdata 1 1 0
The first cache shows active objects. That’s the effect of slab merging.
Append slub_nomerge to your kernel parameters and reboot. No, that’s not a typo:
Slub is the next-generation replacement memory allocator, which has been the default in the Linux kernel since 2.6.23. It continues to employ the basic "slab" model, but fixes several deficiencies in Slab's design, particularly around systems with large numbers of processors. Slub is simpler than Slab.
Loading the same module now shows:
# grep MY /proc/slabinfo
MY_BAR 15 15 264 15 1 : tunables 0 0 0 : slabdata 1 1 0
MY_FOO 0 0 264 15 1 : tunables 0 0 0 : slabdata 0 0 0
There you have it.
Comparing a merged and an unmerged /proc/slabinfo shows about twice as many lines in the unmerged version.
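If you want to make that comparison yourself, save a copy of /proc/slabinfo from each boot and count the cache entries. A small sketch, with an invented function name, that skips the two header lines (the version line and the column legend starting with "#"):

```c
#include <stddef.h>
#include <string.h>

/* Count cache entries in /proc/slabinfo-style text, skipping the headers. */
size_t count_slab_caches(const char *text)
{
    size_t count = 0;
    const char *line = text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Skip empty lines, the "# name ..." legend and the version line. */
        if (len > 0 && line[0] != '#' &&
            strncmp(line, "slabinfo - version", 18) != 0)
            count++;
        if (end == NULL)
            break;
        line = end + 1;
    }
    return count;
}
```

Running it over both saved copies and dividing the counts gives the ratio. The exact factor depends on the kernel version and on which modules are loaded.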