blog · git · desktop · images · contact
time(1)
and children2020-08-29
I was running a shell script on a server. It forked off several other processes and then waited for them to quit. I was making some optimizations to that script and wanted to see if they were actually making it run faster. So, I did the obvious thing:
$ time my_script
The numbers looked reasonable. Then I wondered: Hold on, the script
itself isn’t doing anything. It just launches a bunch of other processes
and then just sits there and waits. Why does time
show such high
numbers for my script?
Turns out, the output of time
is a little bit misleading. At least to
naïve readers like me.
It so happened that the real /usr/bin/time
was used on my server, so
I’m not looking at a potential shell builtin for time
, but that
probably works the same way.
times(2)
Looking at the implementation of GNU time
, you can see that
there are (of course) multiple ifdef
s. One of them assumes that the
operating system only provides a simple wait()
: It waits until some
child quits. In this case, this is of course the command spawned
earlier, so it’s my_script
.
wait()
only tells you about the exit status of the child. So how do we
get its execution times? Also keep in mind that wait()
reaps that
process from its zombie state, so we can’t issue any other calls for the
child PID.
We ask the operating system for the execution times of the current
process, that is time
itself. You can do that using times()
:
times() stores the current process times in the struct tms that
buf points to. The struct tms is as defined in <sys/times.h>:
struct tms {
clock_t tms_utime; /* user time */
clock_t tms_stime; /* system time */
clock_t tms_cutime; /* user time of children */
clock_t tms_cstime; /* system time of children */
};
There you go, tms_cutime
and tms_cstime
tell us about the execution
times of time
’s children.
Still doesn’t answer my original question, but it’s explained just a few lines below:
The tms_cutime field contains the sum of the tms_utime and
tms_cutime values for all waited-for terminated children. The
tms_cstime field contains the sum of the tms_stime and
tms_cstime values for all waited-for terminated children.
So, tms_cutime
is not only the child’s “user time”, but the “user
time” plus “user time of its children”. As this is presumably tansitive,
you get the times of the entire process tree.
In other words, time
doesn’t even have a way in this case to tell the
execution time of my_script
apart from the execution time of the
processes it launches.
wait3()
and wait4()
The GNU implementation also has another code path where it uses one of
those syscalls (actually, only wait3()
, but that’s implemented using
wait4()
on my system). It does what I originally expected would
happen:
pid_t wait3(int *wstatus, int options, struct rusage *rusage);
If rusage is not NULL, the struct rusage to which it points
will be filled with accounting information about the child.
See getrusage(2) for details.
I guess it’s safe to assume that this version also operates transitively / recursively, so you get the times of the entire process tree.
This is much more flexible than times()
in cases where you have to
deal with multiple children.
My first (naïve) attempt to figure out what’s going on looked like this:
#!/bin/bash
for (( i = 0; i < 10000000; i++ ))
do
true
done &
killpid=$!
sleep 1
kill $killpid
And the result:
$ /usr/bin/time ./a.sh
0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 3852maxresident)k
0inputs+0outputs (0major+371minor)pagefaults 0swaps
Whoops, all zero.
Well, with the knowledge above, it’s trivial. We don’t wait
.
The whole wait
vs. no wait
issue took me a second to notice. I mean,
in theory, the operating system could always include the execution
times of children in the times reported for their parent, even without a
wait
. In my mind, wait
was simply a way to tell the OS to wait for
end of that process – I wasn’t aware that it also had implications on
resource usage statistics.
And, well, I’m not sure if it’s entirely obvious that execution times of
children should be included in the first place. What if I wanted to
measure my_script
, really, just my script? To find out if it adds any
significant overhead? It appears that’s not possible from “the outside”
and must be done in my_script
directly, but I’d be happy to learn
otherwise.