blog · git · desktop · images · contact & privacy · gopher
/tmp
2018-06-02
My setup isn’t really that complex. The system boots to a text console,
so I can log in on tty1. When I do that, my ~/.bashrc
decides to
automatically run startx
. There is no display manager. My ~/.xinitrc
consists of a bunch of programs to auto-start, mostly some terminals,
the bars at the edges of the screen, and the window manager. And it sets
a wallpaper. That’s pretty much it.
So even though little of the “big” “modern” “bloated” software is involved, there was a short freeze right after the start of X and before the bars appeared. It used to be short enough for me not to bother, but it got a lot worse yesterday.
There was no obvious cause for this. I trimmed down my ~/.xinitrc
to
just contain
openbox
and nothing else. I chose Openbox for my test runs, because it’s a mature program and won’t do something silly, right? Maybe there is a bug in my own window manager. Well, didn’t help. When I did a right-click to open Openbox’s menu, it froze for about 3-4 seconds.
But only the first time. Opening the menu a second time was fast. I also noticed that this only happened the first time after booting.
Okay, what now? Log in to a second text console on tty2 and run this, then start X on the other console:
while true; do ps aux >"$(date +%T.%N)"; done
Interesting, openbox
had 50 to 80% CPU time for some seconds. What is
it doing?
Out of curiosity, I switched back to my original ~/.xinitrc
and had a
look at the process list here. It showed that pretty much all X clients
needed a large amount of CPU time, too. Each terminal, each bar, the
window manager itself – but not tools like sxhkd
.
So, this appeared to be unrelated to the window manager and it’s
probably not a general X problem. But what are these programs doing?
Time for strace
.
I changed one of the calls in my ~/.xinitrc
from
terminal -e foo
to:
strace -tt -ff -o /tmp/term.log terminal -e foo
And there you have it:
...
06:48:17.516879 stat("/usr/share/fonts/100dpi/UTI___10-ISO8859-1.pcf.gz", {st_mode=S_IFREG|0644, st_size=5080, ...}) = 0
06:48:17.516914 openat(AT_FDCWD, "/usr/share/fonts/100dpi/UTI___10-ISO8859-1.pcf.gz", O_RDONLY) = 5
06:48:17.516939 fcntl(5, F_SETFD, FD_CLOEXEC) = 0
06:48:17.516959 fstat(5, {st_mode=S_IFREG|0644, st_size=5080, ...}) = 0
06:48:17.516981 mmap(NULL, 5080, PROT_READ, MAP_PRIVATE, 5, 0) = 0x7efcdda1b000
06:48:17.517006 close(5) = 0
06:48:17.517211 munmap(0x7efcdda1b000, 5080) = 0
06:48:17.517235 openat(AT_FDCWD, "/usr/share/fonts/100dpi/UTI___10-ISO8859-1.pcf.gz", O_RDONLY) = 5
06:48:17.517260 fcntl(5, F_SETFD, FD_CLOEXEC) = 0
06:48:17.517280 fstat(5, {st_mode=S_IFREG|0644, st_size=5080, ...}) = 0
06:48:17.517302 mmap(NULL, 5080, PROT_READ, MAP_PRIVATE, 5, 0) = 0x7efcdda1b000
06:48:17.517324 close(5) = 0
06:48:17.517457 munmap(0x7efcdda1b000, 5080) = 0
...
This was done for every font file on my system. And each and every
program, who uses fonts in some way, did it. sxhkd
is just a hotkey
daemon, it doesn’t display text, so it didn’t show up in the ps aux
log files above. But the bars, the terminals, the window manager, … they
all display text and they all crawled through my fonts.
How many fonts are there? The calls above finish in under one
millisecond, so how can this take so long? Here are some numbers from
ncdu
:
--- /usr/share/fonts ---------------------
24.3 MiB [##########] 39 /TTF
17.0 MiB [###### ] 1,897 /100dpi
15.7 MiB [###### ] 1,897 /75dpi
10.8 MiB [#### ] 584 /misc
7.0 MiB [## ] 2 /noto
5.9 MiB [## ] 26 /OTF
3.2 MiB [# ] 38 /gsfonts
2.8 MiB [# ] 60 /Type1
776.0 KiB [ ] 54 /encodings
704.0 KiB [ ] 13 /local
600.0 KiB [ ] 8 /cantarell
208.0 KiB [ ] 18 /util
16.0 KiB [ ] 3 /cde
12.0 KiB [ ] 2 /cyrillic
4.0 KiB [ ] .uuid
The megabytes aren’t that important, but there are close to 2'000 files
in /usr/share/fonts/100dpi
and …/75dpi
.
But what exactly is going on? Why are these fonts being processed? Why
only the first time? At the end of the strace
log, there was something
along these lines:
06:54:55.253048 access("/var/cache/fontconfig", W_OK) = -1 EACCES (Permission denied)
06:54:55.253078 access("/tmp/cache-void/fontconfig", W_OK) = 0
A-ha.
$XDG_CACHE_HOME
in /tmp
My ~/.bashrc
does this:
export XDG_CACHE_HOME=/tmp/cache-$USER
install -dm0700 "$XDG_CACHE_HOME"
A lot of programs adhere to the freedesktop.org proposals these
days, so they’ll use $XDG_CACHE_HOME
among others. A lot of junk ends
up there:
I want to remove all these files on a regular basis. I have often found caches to be the cause of “evil” things. Some program decides something, puts it in its cache, and it’ll stay there for a long time. This can be really annoying and it can lead to misbehavior.
Some people go one step further and have their entire $HOME
in
RAM. I think that’s a good idea, I’ve just been way too lazy to
set it up. See, when I boot the computer, I want a clean, well-defined
state. Leftovers from previous sessions can be useful, but most of the
time they feel like litter and garbage from the last party.
Well, okay then, but the thing is that fontconfig will put its cache
in $XDG_CACHE_HOME
, too. And, as we already saw, creating that cache
is rather expensive. What can I do about that?
The first thing to note: The strace
logs show that the programs
process files in /usr/share/fonts
and then they try to write to
/var/cache/fontconfig
. This means that
Me having $XDG_CACHE_HOME
point to /tmp
only revealed this issue,
it’s not the cause of it. The issue revolves around the system-wide
cache, not about the one of my user.
On Arch Linux, there is a pacman hook to automatically update the system’s fontconfig cache:
$ cat /usr/share/libalpm/hooks/fontconfig.hook
[Trigger]
Type = File
Operation = Install
Operation = Upgrade
Operation = Remove
Target = usr/share/fonts/*
[Action]
Description = Updating fontconfig cache...
When = PostTransaction
Exec = /usr/bin/fc-cache -s
Sadly, I didn’t dig deep enough to find out why it got out of date. I
was in a bit of a hurry and just ran fc-cache -s
as root.
What’s in that cache anyway? Do you really need it? Well, yes, you do – the mere fact that it takes several seconds to build the cache is a strong indicator. From what I can tell, fontconfig opens each font file, possibly decompresses it, and reads attributes like
Removing the font cache and running FC_DEBUG=32 fc-list
gives you an
idea of what’s probably in there.
I don’t want to give up having caches in /tmp
. Plus, it really isn’t
the cause of this issue: Having a persistent per-user fontconfig cache
in $HOME
would have covered up a broken system-wide cache, which is
exactly the kind of problem I’m trying to avoid. Instead, my
~/.xinitrc
now contains this, before starting any other programs:
echo 'Running fc-list to make sure fontconfig cache is up to date ...'
fc-list >/dev/null 2>&1
echo 'fc-list done.'
This triggers an update of my user’s cache in /tmp
. That cache is only
meant for the tiny set of fonts I put in ~/.fonts
– none of the
/usr/share/fonts
stuff will end up here. Running this implicitly means
that all programs starting afterwards operate on a fresh cache and won’t
trigger a cache update by themselves – which is what happened before,
because they all started at the same time and found an outdated cache.
The echo
statements are important, too. Right at the top of my
~/.xinitrc
, there is this:
#!/bin/bash
[[ "$1" != logging ]] && exec systemd-cat "$0" logging
This collects all output of those programs in my journal. (Due to file
descriptor inheritance, this also covers programs run by sxhkd
later
on, and thus also by dmenu
.) While systemd’s journal is still
controversial, it does two nice things:
(Yes, these aren’t unique features. As I said, it’s controversial.)
The next time I’m having issues like this one right here, I can look at
journalctl -ra
. The time stamps will quickly reveal if it was a
fontconfig issue.