2018-06-24
In 2038, signed 32-bit timestamps will overflow. Lots of timestamps are stored that way, so it’ll be a problem. Or will it? 2038 is 20 years from now, clearly we will have updated all software by then. Yeah, well, I vividly remember a time when that sentence used to be “2038 is 30 years from now, clearly we will …” and I also remember the panic around the year 2000 quite well. Time flies.
Updating software is one thing. Filesystems suffer from a little more inertia, I think – it’s rather annoying to change the on-disk format of a filesystem.
Let’s have a look at ext4.
Let’s quickly create a filesystem and pretend that we’re in the future.
$ dd if=/dev/zero of=fs bs=1G count=1
$ mkfs.ext4 fs
$ mkdir m
# mount -o loop fs m
# touch -d '2140-01-01 12:00:00' m/future
# umount m
Then re-mount the filesystem to make sure that we’re really working with the timestamp stored on disk – and not with some value cached by Linux.
# mount -o loop fs m
$ ls -l m
# umount m
First thing to note: ls shows the correct time and date. We’ll come back to that later. But, hey, looks like we’re good to go!
Let’s dig a little deeper. We’ll use debugfs to show the raw disk data.
$ debugfs fs
debugfs: ls
2 (12) . 2 (12) .. 11 (20) lost+found 12 (4040) future
debugfs: inode_dump future
0000 a481 e803 0000 0000 30db c23f d294 2f5b ........0..?../[
0020 30db c23f 0000 0000 6400 0100 0000 0000 0..?....d.......
0040 0000 0800 0100 0000 0af3 0000 0400 0000 ................
0060 0000 0000 0000 0000 0000 0000 0000 0000 ................
*
0140 0000 0000 776f 4da0 0000 0000 0000 0000 ....woM.........
0160 0000 0000 0000 0000 0000 0000 6292 0000 ............b...
0200 2000 238e 2c3c dd7d 0100 0000 0100 0000 .#.,<.}........
0220 a394 2f5b f448 f37d 0000 0000 0000 0000 ../[.H.}........
0240 0000 0000 0000 0000 0000 0000 0000 0000 ................
*
Well, well, well. How to interpret that? I honestly don’t know where to find actual official documentation, aside from ext4’s source code, which doesn’t have that many words, though. There’s also an article in the ext4 wiki on kernel.org – that one can’t be that bad.
One important detail: The offsets shown by debugfs are octal, not hex.
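To make the octal detail concrete, here is a minimal sketch that turns an offset as printed in the dump into a plain number. The helper name dump_offset is made up for this example; it is not part of debugfs or e2fsprogs.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse an offset as printed in the left column of the dump (octal
 * digits) and return it as a number. dump_offset is a hypothetical
 * helper for illustration only.
 */
unsigned long
dump_offset(const char *octal_text)
{
    char *end = NULL;
    unsigned long value = strtoul(octal_text, &end, 8);

    /* The whole string must consist of valid octal digits. */
    assert(end != octal_text && *end == '\0');
    return value;
}
```

With this, dump_offset("0020") is 0x10 and dump_offset("0210") is 0x88, which are the two offsets that matter for the modification time.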
Okay, we’re looking for the file’s modification time i_mtime. It’s at offset 0x10 (020) and it’s listed as a 32-bit integer, little endian.
It’s this sequence of bytes:
30db c23f
Which yields the following, after adjusting for endianness:
0x3fc2db30
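The byte swap can also be written down as code. This is just a sketch of how one might assemble a little-endian 32-bit value from raw bytes; read_le32 is a name chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assemble a 32-bit value from four bytes stored in little-endian
 * order: the first byte is the least significant one.
 */
uint32_t
read_le32(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

Feeding it the bytes 0x30, 0xdb, 0xc2, 0x3f from the dump gives 0x3fc2db30.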
That can’t be everything, because we set the year to be 2140 (just convert 0x3fc2db30 to decimal and you’ll quickly notice how small that number is). Turns out that there are two more bits used to augment the timestamp. They’re stored in i_mtime_extra at offset 0x88 (0210) – another 32 bits:
0100 0000
Taking endianness into account, it’s just the number 1.
The wiki article explains how to put these two fields together:
https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Inode_Timestamps
So, we’re only interested in the lowest two bits anyway.
Long story short, we have to add another 0x100000000 to our initial 0x3fc2db30, which yields the decimal number 5364702000. And that’s correct:
$ date -d '2140-01-01 12:00:00' +%s
5364702000
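Here is a small sketch of that calculation in C. It follows my reading of the wiki article: the old field is treated as a signed 32-bit number, the lowest two bits of the extra field are added on top as bits 32 and 33, and the remaining 30 bits are the nanoseconds. The function name and the two macros are made up for this example, they are not the kernel’s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EPOCH_BITS 2                         /* low bits of the extra field */
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)  /* 0x3 */

/*
 * Combine the classic 32-bit field with the extra field.
 * Assumption: the classic field is signed (on gcc, the cast below
 * wraps values above INT32_MAX to negative numbers).
 */
void
ext4_decode(uint32_t raw_time, uint32_t raw_extra,
            int64_t *seconds, uint32_t *nanoseconds)
{
    assert(seconds != NULL && nanoseconds != NULL);

    /* Start with the signed 32-bit seconds, then add the epoch bits. */
    *seconds = (int64_t)(int32_t)raw_time;
    *seconds += (int64_t)(raw_extra & EPOCH_MASK) << 32;

    /* Whatever is left above the two epoch bits are the nanoseconds. */
    *nanoseconds = raw_extra >> EPOCH_BITS;
}
```

Calling it with 0x3fc2db30 and 1 yields 5364702000 seconds and 0 nanoseconds, the same number that date printed.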
tl;dr: ext4 sacrifices two bits from the nanosecond field and adds them to the epoch timestamp. This will work until the year 2446.
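Where does 2446 come from? A quick sketch, assuming the scheme described above (signed 32-bit seconds plus two extra bits) and a 64-bit time_t. Both function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Largest second count under the assumed scheme: the biggest signed
 * 32-bit value plus both extra bits set (3 << 32).
 */
int64_t
ext4_max_seconds(void)
{
    return (int64_t)INT32_MAX + ((int64_t)3 << 32);
}

/* Calendar year (UTC) of a second count. Needs a 64-bit time_t. */
int
utc_year(int64_t seconds)
{
    time_t t = (time_t)seconds;
    struct tm *tm = gmtime(&t);

    assert(tm != NULL);
    return tm->tm_year + 1900;
}
```

ext4_max_seconds() is 15032385535, and utc_year() of that value is 2446.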
Above, ls correctly showed the year 2140. This implies that some work has already been done to make userland deal with the year 2038 problem. One of the most prominent data types is time_t. How big is it on today’s amd64 systems?
#include <stdio.h>
#include <time.h>

int
main(void)
{
    /* sizeof yields a size_t, which is printed with %zu (not %zd) */
    printf("%zu\n", sizeof (time_t));
    return 0;
}
Big enough, 64 bits:
$ gcc -Wall -Wextra -std=c99 -o foo foo.c && ./foo
8
Is it guaranteed to be that large? I don’t really know, since the C standard is locked behind some paywall. Some draft that I found says it’s implementation-defined. Meh.
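Since there seems to be no guarantee, a program that cares could simply check for itself. The following is only a sketch with made-up function names; the round trip relies on the usual behaviour of integer conversions on common compilers, which is an assumption on my part.

```c
#include <assert.h>
#include <limits.h>
#include <time.h>

/* Width of time_t in bits on this platform. */
int
time_t_bits(void)
{
    return (int)(sizeof(time_t) * CHAR_BIT);
}

/*
 * Returns 1 if the given second count survives a round trip through
 * time_t, 0 if it gets mangled (for example by a 32-bit time_t).
 */
int
time_t_can_hold(long long seconds)
{
    time_t t;

    /* The comparison below only makes sense if long long is wide enough. */
    assert(sizeof(time_t) <= sizeof(long long));

    t = (time_t)seconds;
    return (long long)t == seconds;
}
```

On the amd64 system from above, time_t_bits() should be 64 and time_t_can_hold(5364702000LL) should be 1.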
time_t is used in a number of places, for example in struct timespec, which in turn is used by the stat syscall nowadays. So, as long as Linux’s VFS internally uses data types large enough to represent dates beyond 2038, userland should be fine. I guess. :-)
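For completeness, this is roughly how a program gets at that struct timespec. A minimal sketch using POSIX stat() and the st_mtim member; file_mtime is a hypothetical helper, and the feature test macro is only there so that st_mtim is visible when compiling with -std=c99.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* makes st_mtim visible under -std=c99 */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Fetch a file's modification time via stat(). Returns 0 on success
 * and -1 if stat() fails. The seconds come from a time_t and are
 * widened to long long for easy printing.
 */
int
file_mtime(const char *path, long long *sec, long *nsec)
{
    struct stat st;

    assert(path != NULL && sec != NULL && nsec != NULL);
    if (stat(path, &st) != 0)
        return -1;

    *sec = (long long)st.st_mtim.tv_sec;
    *nsec = st.st_mtim.tv_nsec;
    return 0;
}
```

Pointed at m/future while the filesystem is mounted, this should report the same 5364702000 seconds that were found on disk, provided time_t is 64 bits wide.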