2017-10-17
A guy recently asked me:
I have a script that’s located in ~/projects/work/foo/bar. I added that path to my $PATH. Now, my script needs to read some more files that are located in the same directory. How do I find them?
That’s a very interesting question. At first, you think that this should be trivial to answer. Turns out that it’s not.
As usual, this blog post assumes that you run a UNIX-like operating system.
– Update, 2018-03-24: Part 2
The answer to that question is that there is no answer.
When you run some program, there is no guarantee that this program is located somewhere on your hard drive. It’s easy to show this: Just remove the program after it has been started.
#include <stdio.h>
#include <unistd.h>

int
main()
{
    sleep(5);
    printf("I'm still alive.\n");
    return 0;
}
Open two terminals, run the program in one of them, and remove the binary in the other one. The program continues to run.
Another scenario: What if your program was available through more than one hard link? What’s the location of “the” program then?
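To make the hard link case a bit more tangible, here is a minimal sketch. The helper name same_file is made up for this example. It only compares device and inode numbers, which is what makes two names "the same file":

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if both paths name the same file
 * (same device and same inode), 0 if they differ, and -1 if either
 * stat() call fails.
 */
int
same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
        return -1;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

After something like "ln foo bar", same_file("foo", "bar") should return 1. Neither of the two names is more "original" than the other one.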
So, there is no definite answer.
$foo!”
There are programs which try to cope with that situation. Most people will probably be familiar with Java. You can drop a JRE somewhere on your hard drive, put it in $PATH and it will just work. It’s not that Java is a massive statically linked binary – Java has to read its JARs and other files. And, no, you don’t have to set $JAVA_HOME for this to work.
How does Java do that?
argv[0]?
There is some sort of convention that argv[0] contains the name under which the program has been invoked. This would get us a little closer to the answer, since it would resolve the issue of multiple hard links. The problem of removed binaries would remain.
Most importantly, there still is no guarantee that argv[0] really does contain the path to your program:
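#include <unistd.h>

int
main()
{
    /* The second argument becomes argv[0] of the new program. */
    execl("/bin/ps", "foobar", "-f", (char *)NULL);
    return 1;   /* only reached if execl() failed */
}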
It will show something like this:
UID PID PPID C STIME TTY TIME CMD
void 3465 7122 0 08:24 pts/13 00:00:00 -/bin/bash
void 3544 3465 0 08:24 pts/13 00:00:00 vim -p bla.c
void 30336 1 0 08:42 pts/13 00:00:00 xclip
void 30337 1 0 08:42 pts/13 00:00:00 xclip -selection clipboard -f
void 30688 3544 0 08:42 pts/13 00:00:00 foobar -f
Also note the -/bin/bash: That additional dash, which clearly is not part of the program’s path, is used by login(1) (and others) to ask for a login shell. That’s another use case of altering argv[0].
$PATH
Setting argv[0] to something weird is not common. You could argue that you could ignore this in 99% of the cases. Well, it’s not that easy.
When you add a directory to your $PATH, it allows you to type ls instead of /bin/ls. And that’s just it: As argv[0] should contain the name under which your program has been invoked, it now contains just ls. Bummer.
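The cases discussed so far can be told apart with a small heuristic. This is only a sketch: the enum and the function name classify_argv0 are made up for this example, and since argv[0] can be set to anything, the result is a guess and not a guarantee.

```c
#include <string.h>

enum argv0_kind {
    ARGV0_EMPTY,       /* NULL or "": nothing to work with */
    ARGV0_LOGIN_DASH,  /* e.g. "-/bin/bash": login shell convention */
    ARGV0_HAS_SLASH,   /* e.g. "/bin/ls" or "./ls": might be usable as a path */
    ARGV0_BARE_NAME    /* e.g. "ls": probably found via $PATH */
};

/* Hypothetical helper: guess what kind of string argv[0] is. */
enum argv0_kind
classify_argv0(const char *argv0)
{
    if (argv0 == NULL || argv0[0] == '\0')
        return ARGV0_EMPTY;
    if (argv0[0] == '-')
        return ARGV0_LOGIN_DASH;
    if (strchr(argv0, '/') != NULL)
        return ARGV0_HAS_SLASH;
    return ARGV0_BARE_NAME;
}
```

Only ARGV0_HAS_SLASH gives you something that you could try to open directly, and a relative path is only meaningful as long as the working directory hasn’t changed. The "foobar" from the example above would be classified as ARGV0_BARE_NAME, which shows how little this tells you.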
procfs on Linux
Linux has that powerful interface called procfs. It’s a pseudo-filesystem usually mounted under /proc and it contains a lot of information about running processes. Each process is identified by its PID under /proc/$pid. There also is a special symlink called /proc/self which always points to the directory for the process reading that link.
Finally, in /proc/$pid, you have another symlink called exe. And that one points to the actual path of the executable that the process is running, no matter what argv[0] says.
See for yourself:
#include <stdio.h>
#include <unistd.h>

int
main()
{
    char buf[4096] = "";
    ssize_t len;

    /* readlink() does not add a terminating NUL byte, so leave room for it. */
    if ((len = readlink("/proc/self/exe", buf, (sizeof buf) - 1)) != -1)
    {
        buf[len] = 0;
        printf("[%s]\n", buf);
        return 0;
    }
    else
        return 1;
}
This is also how Java finds its files on Linux. You can see it when you invoke it using strace:
07:52:14.046676 execve("/opt/jdk-9/bin/java", ["java", "Test"], 0x7ffda7b0fa28 /* 10 vars */) = 0
07:52:14.047496 brk(NULL) = 0x1114000
07:52:14.047592 readlink("/proc/self/exe", "/opt/jdk-9/bin/java", 4096) = 19
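Coming back to the original question: once you have that path, the directory is just everything up to the last slash. Here is a minimal sketch of that idea. It is Linux only, and the name exe_dir is made up for this example.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (Linux only): store the directory that contains
 * the running executable in buf. Returns 0 on success, -1 on failure.
 */
int
exe_dir(char *buf, size_t size)
{
    ssize_t len;
    char *slash;

    if (size < 2)
        return -1;

    len = readlink("/proc/self/exe", buf, size - 1);
    if (len == -1 || (size_t)len >= size - 1)
        return -1;              /* no procfs, or path possibly truncated */
    buf[len] = '\0';            /* readlink() does not terminate the string */

    slash = strrchr(buf, '/');
    if (slash == NULL)
        return -1;              /* not an absolute path, give up */
    if (slash == buf)
        slash++;                /* executable lives directly in "/" */
    *slash = '\0';              /* cut off the file name */
    return 0;
}
```

Two caveats. If the binary has been removed in the meantime, Linux appends " (deleted)" to the link target, so the directory may still be right but the file name is not. And for a script started via "#!", the link names the interpreter, not the script, so this does not help the guy who asked the question.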
$PATH by yourself
When procfs is not available, you can try to do what the shell (or a libc function like execvp(3)) has done to find your binary. After all, you just entered ls and something found your program, so why wouldn’t you be able to do just that?
This is what Java does on OpenBSD:
93056 java CALL stat(0x7f7ffffc73d0,0x7f7ffffc7350)
93056 java NAMI "/sbin/java"
93056 java RET stat -1 errno 2 No such file or directory
93056 java CALL stat(0x7f7ffffc73d0,0x7f7ffffc7350)
93056 java NAMI "/usr/sbin/java"
93056 java RET stat -1 errno 2 No such file or directory
93056 java CALL stat(0x7f7ffffc73d0,0x7f7ffffc7350)
93056 java NAMI "/bin/java"
93056 java RET stat -1 errno 2 No such file or directory
93056 java CALL stat(0x7f7ffffc73d0,0x7f7ffffc7350)
93056 java NAMI "/usr/bin/java"
93056 java RET stat -1 errno 2 No such file or directory
93056 java CALL stat(0x7f7ffffc73d0,0x7f7ffffc7350)
93056 java NAMI "/usr/X11R6/bin/java"
93056 java RET stat -1 errno 2 No such file or directory
93056 java CALL stat(0x7f7ffffc73d0,0x7f7ffffc7350)
93056 java NAMI "/usr/local/sbin/java"
93056 java RET stat -1 errno 2 No such file or directory
93056 java CALL stat(0x7f7ffffc73d0,0x7f7ffffc7350)
93056 java NAMI "/usr/local/bin/java"
93056 java RET stat -1 errno 2 No such file or directory
93056 java CALL stat(0x7f7ffffc73d0,0x7f7ffffc7350)
93056 java NAMI "/usr/local/foobar/bin/java"
93056 java STRU struct stat { dev=1027, ino=155914, mode=-rwxr-xr-x , nlink=1, uid=0<"root">, gid=7<"bin">, rdev=623880, atime=1508220651<"Oct 17 08:10:51 2017">.082047559, mtime=1506962794<"Oct 2 18:46:34 2017">, ctime=1508219904<"Oct 17 07:58:24 2017">.120012249, size=63657, blocks=128, blksize=16384, flags=0x0, gen=0x2049be9e }
93056 java RET stat 0
(Please take this – and Java’s usage of /proc/self/exe above – with a grain of salt. I haven’t looked up Java’s source code, because it’s so incredibly complex, but the output above is a strong indicator that Java actually works this way. Either way, I’m just trying to make the point that manually traversing the $PATH is another possible approach.)
I won’t even start talking about the race conditions involved here.
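For illustration, such a traversal could look roughly like the following sketch. The function name find_in_path is made up, and this is certainly not what Java does internally. It just walks the colon-separated list and returns the first executable regular file it finds.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: search the colon-separated list `path` for an
 * executable regular file called `name` and store the first hit in
 * `out`. Returns 0 on success, -1 if nothing was found.
 */
int
find_in_path(const char *name, const char *path, char *out, size_t outsz)
{
    const char *p = path;

    if (name == NULL || *name == '\0' || path == NULL)
        return -1;

    for (;;) {
        size_t len = strcspn(p, ":");   /* length of the current entry */
        struct stat st;
        int n;

        if (len == 0)                   /* an empty entry means "." */
            n = snprintf(out, outsz, "./%s", name);
        else
            n = snprintf(out, outsz, "%.*s/%s", (int)len, p, name);

        if (n > 0 && (size_t)n < outsz &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 0;

        if (p[len] == '\0')
            break;                      /* that was the last entry */
        p += len + 1;                   /* skip the entry and the colon */
    }

    if (outsz > 0)
        out[0] = '\0';
    return -1;
}
```

You would call it as find_in_path(argv[0], getenv("PATH"), buf, sizeof buf), and only if argv[0] contains no slash. The result is whatever matches at the time of the call, which is not necessarily the file that has actually been executed.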
Of course, this only works if your program still has access to the original $PATH variable. You can forcibly break this.
executor.c:
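#include <unistd.h>

int
main()
{
    /* The sub-program gets this environment instead of ours. */
    char *env[] = { "PATH=broken", NULL };

    execle("my-sub-program", "foo", (char *)NULL, env);
    return 1;   /* only reached if execle() failed */
}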
my-sub-program.c:
#include <stdio.h>
#include <stdlib.h>

int
main()
{
    printf("PATH=%s\n", getenv("PATH"));
    return 0;
}
Running:
$ export PATH=$PATH:.
$ executor
PATH=broken
The fact that it shows PATH=broken means the sub-program has been executed – but it can no longer see the original $PATH.
Many programs sidestep the whole problem by hard-coding the path. A lot of software is distributed as source code and there often is a build process. During the build, a fixed string can be put into the compiled program. When you finally invoke it, it won’t have to “resolve” anything, it will just look under the hard-coded path.
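As a rough sketch of that approach: the macro DATADIR, its default value and the function data_file_path are all made up for this example. The assumption is that the build system passes the real location, for instance with something like -DDATADIR='"/usr/share/foo"'.

```c
#include <stdio.h>

/* Assumption: the build system overrides this with the real location. */
#ifndef DATADIR
#define DATADIR "/usr/local/share/foo"
#endif

/*
 * Hypothetical helper: build the full path of a data file below the
 * directory that was fixed at compile time. Returns 0 on success and
 * -1 if the result does not fit into `out`.
 */
int
data_file_path(const char *name, char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "%s/%s", DATADIR, name);

    return (n > 0 && (size_t)n < outsz) ? 0 : -1;
}
```

No lookup happens at run time. The price is that the program stops finding its files as soon as someone moves the installation somewhere else.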
This is what ratterplatter does:
Or irssi:
But of course … there are scenarios where this is not feasible. For example, when you’re dealing with a rather simple script without a build process – as the guy did who originally asked me.
It doesn’t look too good. Whatever you do, it’s fragile.
Unless: Maybe I’m missing something here. Is there a better way? Please tell me. :-)