Is my filesystem ready for 2038? (ext4)

In 2038, signed 32-bit timestamps will overflow. Lots of timestamps are stored that way, so it’ll be a problem. Or will it? 2038 is 20 years from now, clearly we will have updated all software by then. Yeah, well, I vividly remember a time when that sentence used to be “2038 is 30 years from now, clearly we will …” and I also remember the panic around the year 2000 quite well. Time flies.

Updating software is one thing. Filesystems suffer from a little more inertia, I think – it’s rather annoying to change the on-disk format of a filesystem.

Let’s have a look at ext4.

The current state of ext4

Let’s quickly create a filesystem and pretend that we’re in the future.

$ dd if=/dev/zero of=fs bs=1G count=1
$ mkfs.ext4 fs
$ mkdir m
# mount -o loop fs m
# touch -d '2140-01-01 12:00:00' m/future
# umount m

Then re-mount the filesystem to make sure that we’re really working with the timestamp stored on disk – and not with some value cached by Linux.

# mount -o loop fs m
$ ls -l m
# umount m

First thing to note: ls shows the correct time and date. We’ll come back to that later. But, hey, looks like we’re good to go!

Let’s dig a little deeper. We’ll use debugfs to show the raw disk data.

$ debugfs fs
debugfs:  ls
 2  (12) .    2  (12) ..    11  (20) lost+found    12  (4040) future   
debugfs:  inode_dump future
0000  a481 e803 0000 0000 30db c23f d294 2f5b  ........0..?../[
0020  30db c23f 0000 0000 6400 0100 0000 0000  0..?....d.......
0040  0000 0800 0100 0000 0af3 0000 0400 0000  ................
0060  0000 0000 0000 0000 0000 0000 0000 0000  ................
*
0140  0000 0000 776f 4da0 0000 0000 0000 0000  ....woM.........
0160  0000 0000 0000 0000 0000 0000 6292 0000  ............b...
0200  2000 238e 2c3c dd7d 0100 0000 0100 0000   .#.,<.}........
0220  a394 2f5b f448 f37d 0000 0000 0000 0000  ../[.H.}........
0240  0000 0000 0000 0000 0000 0000 0000 0000  ................
*

Well, well, well. How to interpret that? I honestly don’t know where to find actual official documentation, aside from ext4’s source code, which doesn’t have that many words. There’s also an article in the ext4 wiki on kernel.org – that one can’t be that bad.

One important detail: The offsets shown by debugfs are octal, not hex.

Okay, we’re looking for the file’s modification time i_mtime. It’s at offset 0x10 (020) and it’s listed as a 32-bit integer, little endian. It’s this sequence of bytes:

30db c23f

Which yields the following, after adjusting for endianness:

0x3fc2db30

That can’t be everything, because we set the year to be 2140 (just convert 0x3fc2db30 to decimal and you’ll quickly notice how small that number is). Turns out that there are two more bits used to augment the timestamp. They’re stored in i_mtime_extra at offset 0x88 (0210) – another 32-bit field:

0100 0000

Taking endianness into account, it’s just the number 1.

The wiki article explains how to put these two fields together:

https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Inode_Timestamps

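The short version: the lower two bits of the extra field extend the seconds, the remaining 30 bits hold the nanoseconds. So, we’re only interested in the lowest two bits anyway.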

Long story short, we have to add another 0x100000000 to our initial 0x3fc2db30, which yields the decimal number 5364702000. And that’s correct:

$ date -d '2140-01-01 12:00:00' +%s
5364702000

tl;dr: ext4 sacrifices two bits from the nanosecond field and adds them to the epoch timestamp. This will work until the year 2446.

Sizes of other common date/time data types

Above, ls correctly showed the year 2140. This implies that some work has already been done to make userland deal with the year 2038 problem. One of the most prominent data types is time_t. How big is it on today’s amd64 systems?

#include <stdio.h>
#include <time.h>

int
main(void)
{
    /* sizeof yields a size_t, which is printed with %zu */
    printf("%zu\n", sizeof (time_t));
    return 0;
}

Big enough, 64 bits:

$ gcc -Wall -Wextra -std=c99 -o foo foo.c && ./foo
8

Is it guaranteed to be that large? I don’t think so. The C standard is locked behind some paywall, but a draft that I found says the range and precision of time_t are implementation-defined. Meh.

time_t is used in a number of places, for example in struct timespec, which in turn is used by the stat syscall nowadays. So, as long as Linux’s VFS internally uses data types large enough to represent dates beyond 2038, userland should be fine. I guess. :-)