memfd and syscall wrappers
2017-02-11
Quite some time ago, I came across memfd. Here’s a good article about it by one of its authors:
https://dvdhrm.wordpress.com/tag/memfd/
To sum it up, Linux now supports the syscall memfd_create(), which returns an ordinary file descriptor that is backed by pure memory. You don’t need a tmpfs, and it will never collide with any existing file, simply because it’s not visible in your file system.
I wanted to play with this feature, so I started by reading the manpage. Interestingly, it begins with the following introduction:
```
NAME
       memfd_create - create an anonymous file

SYNOPSIS
       #include <sys/memfd.h>

       int memfd_create(const char *name, unsigned int flags);

       Note: There is no glibc wrapper for this system call; see NOTES.
```
Huh? No glibc wrapper? That’s unusual. memfd has been included in Linux since version 3.17, which was released on 2014-10-05. Today is 2017-02-11. What’s going on?
A patch was proposed by the memfd author in 2014, but it does not appear to have met glibc’s requirements for a syscall wrapper.
Update, 2018-06-24: Since glibc 2.27, which was released about one year after this blog post was written, there finally is a wrapper.
To my surprise, a missing glibc wrapper for a syscall is not as unusual as I thought. This posting says that, as of May 2015, there was “a de facto status of ‘syscall wrappers present for almost all syscalls added up to Linux 3.2 / glibc 2.15 but for nothing added since then’”. Given that Linux 4.10 will be released very soon, that would be quite a gap, but I don’t know whether there have been updates in the meantime.
I stopped reading at this point; it’s mostly politics and philosophy. The more interesting technical question is: How do you perform a syscall from C when there’s no wrapper in your libc? I’m very familiar with performing syscalls from raw assembly, but I never needed to do something similar from C.
It turns out there’s a generic syscall wrapper. You pass it the number of a syscall and its arguments, and off you go:

```c
int fd = syscall(SYS_memfd_create, "foo", 0);
```
Now, syscall() is obviously part of a libc, but where do the SYS_* preprocessor macros come from?
/usr/include/sys/syscall.h is part of glibc, but it doesn’t really do anything on its own. It only defines aliases for existing macros, for example:

```c
#define SYS_memfd_create __NR_memfd_create
```
The __NR_* macros come from /usr/include/asm/unistd_64.h (or the equivalent for whatever architecture you’re on). On Arch Linux, this file belongs to the package linux-api-headers, and it contains the actual definitions:

```c
#define __NR_memfd_create 319
```
It’s a separate package and indeed not part of glibc; it is fetched from kernel.org, as can be seen in its PKGBUILD. That’s an interesting combination. It means that glibc’s syscall() can support any current syscall, because we’re feeding it up-to-date syscall numbers directly from kernel.org.
Finally, a short example that uses memfd:
```c
#define _GNU_SOURCE /* for syscall() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
    int fd;
    pid_t child;
    char buf[BUFSIZ] = "";
    ssize_t br;

    /* Create an anonymous, memory-backed file. */
    fd = syscall(SYS_memfd_create, "foofile", 0);
    if (fd == -1)
    {
        perror("memfd_create");
        exit(EXIT_FAILURE);
    }

    child = fork();
    if (child == 0)
    {
        /* Child: send stdout into the memfd, then run date(1). */
        dup2(fd, 1);
        close(fd);
        execlp("/bin/date", "/bin/date", (char *)NULL);
        perror("execlp date");
        exit(EXIT_FAILURE);
    }
    else if (child == -1)
    {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    /* Parent: wait for the child, then rewind and read what it wrote. */
    waitpid(child, NULL, 0);
    lseek(fd, 0, SEEK_SET);
    br = read(fd, buf, sizeof buf - 1); /* leave room for the terminator */
    if (br == -1)
    {
        perror("read");
        exit(EXIT_FAILURE);
    }
    buf[br] = 0;

    printf("child said: '%s'\n", buf);
    exit(EXIT_SUCCESS);
}
```
It creates a memfd, forks a child process, redirects the child’s stdout into the memfd, waits for the child to exit, and finally reads what the child has written. Traditionally, a pipe would be used to do this job.
Noteworthy:
- While the process is still running, you can cat /proc/$pid/fd/3 and read the contents of the memfd.
- The memfd is named foofile. This name is not really important; it’s mostly useful for debugging purposes.
- The child writes to the memfd as if it were an ordinary file. Nobody ever has to read the data (a pipe blocks at some point).
- A minor “meh”: IIUC, memfd is Linux-only.