
time(1) and children


I was running a shell script on a server. It forked off several other processes and then waited for them to quit. I was making some optimizations to that script and wanted to see if they were actually making it run faster. So, I did the obvious thing:

$ time my_script

The numbers looked reasonable. Then I wondered: Hold on, the script itself isn’t doing anything. It just launches a bunch of other processes and then sits there and waits. Why does time show such high numbers for my script?

Turns out, the output of time is a little bit misleading. At least to naïve readers like me.

It so happened that the real /usr/bin/time was being used on my server, so I’m not looking at a shell builtin time here – but that probably works the same way.

The syscall times(2)

Looking at the implementation of GNU time, you can see that there are (of course) multiple ifdefs. One of them assumes that the operating system only provides a simple wait(): It waits until some child quits. In this case, this is of course the command spawned earlier, so it’s my_script.

wait() only tells you about the exit status of the child. So how do we get its execution times? Also keep in mind that wait() reaps that process from its zombie state, so we can’t issue any other calls for the child PID.

We ask the operating system for the execution times of the current process, that is time itself. You can do that using times():

times() stores the current process times in the struct tms that
buf points to.  The struct tms is as defined in <sys/times.h>:

struct tms {
    clock_t tms_utime;  /* user time */
    clock_t tms_stime;  /* system time */
    clock_t tms_cutime; /* user time of children */
    clock_t tms_cstime; /* system time of children */
};

There you go, tms_cutime and tms_cstime tell us about the execution times of time’s children.

Still doesn’t answer my original question, but it’s explained just a few lines below:

The  tms_cutime  field  contains  the  sum of the tms_utime and
tms_cutime values for all waited-for terminated children.   The
tms_cstime   field  contains  the  sum  of  the  tms_stime  and
tms_cstime values for all waited-for terminated children.

So, tms_cutime is not only the child’s “user time”, but the “user time” plus the “user time of its children”. As this is presumably transitive, you get the times of the entire process tree.

In other words, time doesn’t even have a way in this case to tell the execution time of my_script apart from the execution time of the processes it launches.

wait3() and wait4()

The GNU implementation also has another code path where it uses one of those syscalls (actually, only wait3(), but that’s implemented using wait4() on my system). It does what I originally expected would happen:

pid_t wait3(int *wstatus, int options, struct rusage *rusage);

If  rusage  is  not  NULL, the struct rusage to which it points
will be filled with accounting  information  about  the  child.
See getrusage(2) for details.

I guess it’s safe to assume that this version also operates transitively / recursively, so you get the times of the entire process tree.

This is much more flexible than times() in cases where you have to deal with multiple children.

Initial confusion

My first (naïve) attempt to figure out what’s going on looked like this:


for (( i = 0; i < 10000000; i++ ))
do
    :
done &
killpid=$!

sleep 1
kill $killpid

And the result:

$ /usr/bin/time ./ 
0.00user 0.00system 0:01.00elapsed 0%CPU (0avgtext+0avgdata 3852maxresident)k
0inputs+0outputs (0major+371minor)pagefaults 0swaps

Whoops, all zero.

Well, with the knowledge above, it’s trivial: we never wait for the background loop, so its CPU time never shows up in our accounting.

The whole wait vs. no-wait issue took me a second to notice. I mean, in theory, the operating system could always include the execution times of children in the times reported for their parent, even without a wait. In my mind, wait was simply a way to tell the OS to wait for the end of that process – I wasn’t aware that it also had implications for resource usage statistics.

And, well, I’m not sure if it’s entirely obvious that execution times of children should be included in the first place. What if I wanted to measure my_script – really, just my script – to find out if it adds any significant overhead? It appears that’s not possible from “the outside”: it must be done in my_script itself, where times() does make that distinction (tms_utime and tms_stime only cover the calling process). But I’d be happy to learn otherwise.
