

memfd and syscall wrappers

2017-02-11

Quite some time ago, I came across memfd. Here’s a good article about it by one of its authors:

https://dvdhrm.wordpress.com/tag/memfd/

To sum it up, Linux now supports the memfd_create() syscall, which returns an ordinary file descriptor that is backed purely by memory. You don’t need a mounted tmpfs, and the file can never collide with an existing one, simply because it isn’t visible anywhere in your file system.

I wanted to play with this feature, so I started by reading the manpage. Interestingly, it begins with the following introduction:

NAME
       memfd_create - create an anonymous file

SYNOPSIS
       #include <sys/memfd.h>

       int memfd_create(const char *name, unsigned int flags);

       Note:  There  is  no  glibc  wrapper  for this system call; see
       NOTES.

Huh? No glibc wrapper? That’s unusual. memfd has been part of Linux since version 3.17, which was released on 2014-10-05. Today is 2017-02-11. What’s going on?

A patch was proposed by the memfd author in 2014, but it apparently did not meet glibc’s requirements for a syscall wrapper.

– Update, 2018-06-24: glibc 2.27, released about one year after this blog post was written, finally has a wrapper.

To my surprise, a syscall without a glibc wrapper is not as unusual as I thought. This posting says that, as of May 2015, there was “a de facto status of ‘syscall wrappers present for almost all syscalls added up to Linux 3.2 / glibc 2.15 but for nothing added since then’”. With Linux 4.10 about to be released, that would be quite a gap, but I don’t know whether the situation has changed in the meantime.

I stopped reading at that point; the rest is mostly politics and philosophy. The more interesting technical question is: how do you perform a syscall from C when your libc has no wrapper for it? I’m very familiar with making syscalls from raw assembly, but I never needed to do the same from C.

It turns out there is a generic wrapper, syscall(). You pass it the number of a syscall followed by that syscall’s arguments, and off you go:

int fd = syscall(SYS_memfd_create, "foo", 0);

Now, syscall() is obviously part of a libc, but where do the SYS_* preprocessor macros come from?

They come from the kernel’s API headers (<sys/syscall.h> pulls in the kernel’s <asm/unistd.h>). On my system, those headers are a separate package that is indeed not part of glibc; as its PKGBUILD shows, they are fetched straight from kernel.org. That’s an interesting combination. It means that glibc’s syscall() can invoke any current syscall, as long as the headers are up to date, because it is fed syscall numbers directly from kernel.org.

Finally, a short example that uses memfd:

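#define _GNU_SOURCE  /* for syscall() and SYS_memfd_create */
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
    int fd;
    pid_t child;
    char buf[BUFSIZ] = "";
    ssize_t br;

    /* No glibc wrapper (before 2.27), so use the generic syscall(). */
    fd = syscall(SYS_memfd_create, "foofile", 0);
    if (fd == -1)
    {
        perror("memfd_create");
        exit(EXIT_FAILURE);
    }

    child = fork();
    if (child == 0)
    {
        /* Child: make the memfd its stdout, then run date(1). */
        if (dup2(fd, STDOUT_FILENO) == -1)
        {
            perror("dup2");
            exit(EXIT_FAILURE);
        }
        close(fd);
        execlp("/bin/date", "/bin/date", (char *)NULL);
        perror("execlp date");
        exit(EXIT_FAILURE);
    }
    else if (child == -1)
    {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (waitpid(child, NULL, 0) == -1)
    {
        perror("waitpid");
        exit(EXIT_FAILURE);
    }

    /* Parent and child share the file offset, which now points past
     * what date wrote; rewind before reading. */
    if (lseek(fd, 0, SEEK_SET) == -1)
    {
        perror("lseek");
        exit(EXIT_FAILURE);
    }

    /* Leave room for the terminating NUL. */
    br = read(fd, buf, sizeof buf - 1);
    if (br == -1)
    {
        perror("read");
        exit(EXIT_FAILURE);
    }
    buf[br] = '\0';

    printf("child said: '%s'\n", buf);

    close(fd);
    exit(EXIT_SUCCESS);
}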

It creates a memfd, forks a child process that runs date with its stdout redirected to the memfd, waits for the child to exit, and finally reads what the child has written. Traditionally, a pipe would be used for this job.

Noteworthy:

A minor “meh”: IIUC, memfd is Linux-only.
