Trying to verify the running blob of bpfilter_umh


Previous blog post on this topic: The Linux kernel can spawn processes on its own.

Quick recap: the blob that’s currently running as a process can be copied out of /proc:

root@ubuntu2004:~# ps aux | grep bpf
root        1057  0.0  0.0   2488   572 ?        S    04:51   0:00 bpfilter_umh

root@ubuntu2004:~# cp /proc/1057/exe bpfilter_umh

root@ubuntu2004:~# file bpfilter_umh 
bpfilter_umh: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), \
    dynamically linked, \
    interpreter /lib64/, \
    BuildID[sha1]=dedb02411fea70a0290d017481abf4fc66261ad8, \
    for GNU/Linux 3.2.0, not stripped

I’d like to know if this blob is what I think it is, i.e. if it really comes from the corresponding bpfilter.ko file. (As I said in the previous post, if this usermode driver was an individual ELF file installed by the package manager, this task would be much easier.)

We know how this blob gets included in the .ko file: it is embedded as raw data, bracketed by the two labels bpfilter_umh_start and bpfilter_umh_end.

So, if these two labels are still present in bpfilter.ko as symbols (I don’t know if .ko files can be stripped), we should be able to extract that area of the file. In my test runs (Ubuntu 20.04), the file was not stripped:

$ file bpfilter.ko
bpfilter.ko: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), \
    BuildID[sha1]=e79e277a312a9d911b2bc4f0eb9e57c1a2bbfc8a, \
    not stripped

Let’s try to find the symbols:

$ readelf --syms bpfilter.ko
Symbol table '.symtab' contains 58 entries:
   Num:    Value          Size Type   Bind   Vis     Ndx Name
    40: 0000000000004288     0 NOTYPE GLOBAL DEFAULT  12 bpfilter_umh_end
    48: 0000000000000000     0 NOTYPE GLOBAL DEFAULT  12 bpfilter_umh_start

There they are.
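
To avoid reading the value off the table by hand, the lookup can be scripted. This is only a sketch: sym_value is a hypothetical helper name, and it assumes the eight-column layout that readelf printed above (Num, Value, Size, Type, Bind, Vis, Ndx, Name).

```shell
# Print the "Value" column (hex, without 0x) of one symbol.
# Reads `readelf --syms -W` output on stdin; assumes the column layout
#   Num: Value Size Type Bind Vis Ndx Name
sym_value() {
    awk -v name="$1" '$8 == name { print $2; exit }'
}

# Usage:
#   readelf --syms -W bpfilter.ko | sym_value bpfilter_umh_end
```

The header lines of the readelf output have fewer than eight fields, so they never match.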

Now, what does the “Value” field mean? man 5 elf isn’t of much help here; it just says this:

       This member gives the value of the associated symbol.

I found two documents that explain it. They say it depends on the type of ELF file, so let’s check this first:

$ readelf --file-header bpfilter.ko 
ELF Header:
  Type: REL (Relocatable file)

For REL files, the symbol value is an offset in the section that this symbol belongs to. That would be “Ndx 12” in our case. So let’s look at the section listing:

$ readelf --sections -W bpfilter.ko 
Section Headers:
  [Nr] Name    Type     Address          Off    Size   ES Flg Lk Inf Al
  [12] .rodata PROGBITS 0000000000000000 000307 004288 00   A  0   0  1

“Off” is explained in man 5 elf:

       This  member's  value holds the byte offset from the be‐
       ginning of the file to the first byte  in  the  section.
       One  section  type, SHT_NOBITS, occupies no space in the
       file, and its sh_offset member  locates  the  conceptual
       placement in the file.

Great, so 0x307 is the offset in the file that we’re looking at. This means our blob should start at 0x307 (bpfilter_umh_start has value 0) and it should be bpfilter_umh_end - bpfilter_umh_start = 0x4288 bytes long.
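
The arithmetic in this paragraph, written out as a sketch. The function names are made up for illustration, and the arguments are the hex strings exactly as readelf prints them (no 0x prefix).

```shell
# REL file: file offset of a symbol = section "Off" + symbol "Value".
rel_file_offset() {
    # $1 = section offset, $2 = symbol value (hex strings without 0x)
    echo $(( 0x$1 + 0x$2 ))
}

# Length of the blob = value of the end symbol - value of the start symbol.
blob_length() {
    # $1 = end symbol value, $2 = start symbol value (hex strings without 0x)
    echo $(( 0x$1 - 0x$2 ))
}

rel_file_offset 000307 0000000000000000            # prints 775 (= 0x307)
blob_length 0000000000004288 0000000000000000      # prints 17032 (= 0x4288)
```

Both results are decimal, which is what dd expects for skip= and count=.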

Let’s extract it and see if it matches:

$ dd if=bpfilter.ko of=bpfilter_umh.extracted bs=1 skip=$((0x307)) count=$((0x4288))
17032+0 records in
17032+0 records out
17032 bytes (17 kB, 17 KiB) copied, 0.0197999 s, 860 kB/s

$ sha256sum bpfilter_umh bpfilter_umh.extracted 
3a1c79a7c06a23658410cd02d8a805646af58b0df5159120efebf4b7c20878ba  bpfilter_umh
3a1c79a7c06a23658410cd02d8a805646af58b0df5159120efebf4b7c20878ba  bpfilter_umh.extracted

It indeed does.

So, when you take bpfilter.ko from a trusted source, you can at least check if the currently running blob matches that trusted source.

(Assuming /proc/$pid/exe really does give you the currently running binary … And of course this program could have vulnerabilities of its own, so it might not be doing what you think it does … Verifying that blob checksum is just another piece of the puzzle. It’s hard to make definitive statements in this area.)
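
The extract-and-compare step can also be done in one go without a temporary file. The sketch below swaps dd and sha256sum for tail, head and cmp, which compare the bytes directly instead of their checksums. range_matches is a hypothetical name, and head -c is assumed to be available (it is in GNU coreutils).

```shell
# Exit status 0 if bytes [offset, offset+length) of a container file are
# identical to a reference file, non-zero otherwise.
# $1 = container (e.g. bpfilter.ko)
# $2 = offset, $3 = length (both decimal)
# $4 = reference (e.g. a copy of /proc/PID/exe)
range_matches() {
    # tail -c +N starts at byte N, counted from 1, hence the "+ 1".
    tail -c +"$(( $2 + 1 ))" "$1" | head -c "$3" | cmp -s - "$4"
}

# Usage with the numbers from above:
#   range_matches bpfilter.ko $((0x307)) $((0x4288)) bpfilter_umh && echo match
```

Because cmp also notices a length mismatch, a reference file that is longer or shorter than the extracted range counts as different.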

Side track: Trying to do the same with “normal” program binaries

Take the following program:

#include <signal.h>
#include <unistd.h>

int foo = 0x11223344;

int
main(void)
{
    kill(getpid(), SIGSTOP);
    return 0;
}

The goal is to extract the value of the foo variable from the binary on the disk.

Compile it and dump the information we gathered above:

$ cc -Wall -Wextra -o foo foo.c

$ readelf --file-header foo
ELF Header:
  Type: DYN (Position-Independent Executable file)

$ readelf --syms foo
Symbol table '.symtab' contains 39 entries:
   Num:    Value         Size Type   Bind   Vis     Ndx Name
    30: 0000000000004038    4 OBJECT GLOBAL DEFAULT  24 foo

$ readelf --sections -W foo
Section Headers:
  [Nr] Name  Type     Address          Off    Size   ES Flg Lk Inf Al
  [24] .data PROGBITS 0000000000004028 003028 000014 00  WA  0   0  8

This time, it’s not a REL file but a DYN file. In this case, the symbol value that we see (0x4038) is “a virtual address” according to the docs linked above.

Next, man 5 elf says this about the “Address” field of sections:

       If this  section  appears  in  the  memory  image  of  a
       process, this member holds the address at which the sec‐
       tion's first byte should reside.  Otherwise, the  member
       contains zero.

To my understanding, this means that the “Address” field of a section is a virtual address as well, so it’s the same “unit” as the “Value” field of the symbol. Intezer (disclaimer: I don’t know this company, they just happened to host documentation – I’d rather not link to some random company, but docs are a little hard to come by in this area) also uses the term “virtual address” for sh_addr, so I assume I’m correct.

So, the section .data should be loaded at the virtual address 0x4028 and our foo symbol is at the virtual address 0x4038 – in other words, it should have an offset of 0x10 bytes in the .data section. Virtual addresses aren’t very useful to us since we’re only inspecting the file on the disk, but this offset is. (And these virtual addresses are subject to relocation anyway, as we’ll see in a bit.)
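
Put as a formula, the file offset of the symbol is the section’s “Off” plus the distance between the symbol’s “Value” and the section’s “Address”. A small sketch with a made-up function name follows. It only makes sense for sections that actually have contents in the file, so not for something like .bss.

```shell
# DYN file: file offset = section "Off" + (symbol "Value" - section "Address").
dyn_file_offset() {
    # $1 = section offset, $2 = section address, $3 = symbol value
    # (hex strings as printed by readelf, without 0x)
    printf '0x%x\n' $(( 0x$1 + 0x$3 - 0x$2 ))
}

dyn_file_offset 003028 0000000000004028 0000000000004038   # prints 0x3038
```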

We already know what the “Off” field of sections means, so we can try to extract our variable (the symbol has “Size 4”, as one would expect for an int on amd64):

$ dd if=foo of=foo.var bs=1 skip=$((0x3028 + 0x10)) count=4
4+0 records in
4+0 records out
4 bytes copied, 0.000173904 s, 23.0 kB/s

$ od -t x1 foo.var 
0000000 44 33 22 11

There you go, that’s our data.
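
The bytes come out as 44 33 22 11 rather than 11 22 33 44 because amd64 stores integers in little-endian order, least significant byte first. A sketch that puts them back into the usual notation (le32 is a hypothetical helper name):

```shell
# Interpret the first 4 bytes of stdin as a little-endian 32-bit integer.
le32() {
    # od prints the bytes as hex, one per field; awk reverses their order.
    od -An -t x1 -N 4 | awk '{ printf "0x%s%s%s%s\n", $4, $3, $2, $1 }'
}

# Usage:
#   le32 < foo.var
```

Reversing the bytes explicitly keeps the result independent of the byte order of the machine that runs the script.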

A side track to the side track

This little program contains a kill(getpid(), SIGSTOP), so we can play with virtual addresses a bit (when it stops itself, you’ll get back to the shell – run fg to continue and thus quit the program):

$ ./foo

[1]+  Stopped                 ./foo

$ cat /proc/43672/maps
563d2a84d000-563d2a84e000 r--p 00000000 00:20 57554  /tmp/tmp/foo
563d2a84e000-563d2a84f000 r-xp 00001000 00:20 57554  /tmp/tmp/foo
563d2a84f000-563d2a850000 r--p 00002000 00:20 57554  /tmp/tmp/foo
563d2a850000-563d2a851000 r--p 00002000 00:20 57554  /tmp/tmp/foo
563d2a851000-563d2a852000 rw-p 00003000 00:20 57554  /tmp/tmp/foo

(When you run this multiple times, you’ll see relocation in action: the binary is position-independent, so it gets loaded at a different base address on each run.)

Let’s see if our data actually got loaded at the virtual address 0x4038, as the symbol value claimed. I’m assuming that these virtual addresses are relative to the first address that we see here in the maps file. So our foo variable should be at 0x563d2a84d000 + 0x4038 = 0x563d2a851038.

$ sudo gdb -p 43672
Reading symbols from /tmp/tmp/foo...
(gdb) x/1xw 0x563d2a851038
0x563d2a851038 <foo>:   0x11223344

There it is, and gdb confirms that this corresponds to the foo symbol that it found in the file.
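
The address calculation used here can be sketched as a small helper. It rests on the same assumption as the text: symbol values are relative to the start of the first mapping in the maps file. That held in this test run, but it is not guaranteed for every binary. runtime_addr is a made-up name.

```shell
# Print (start of the first mapping) + (symbol value).
# Reads /proc/PID/maps on stdin; $1 = symbol value (hex, without 0x).
runtime_addr() {
    # The first line looks like "563d2a84d000-563d2a84e000 r--p ...";
    # splitting at the first "-" leaves the start address in $base.
    IFS=- read -r base _rest
    printf '0x%x\n' $(( 0x$base + 0x$1 ))
}

# Usage:
#   runtime_addr 0000000000004038 < /proc/43672/maps
```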