lariza might get JS user scripts – and, thus, a follow mode

2020-02-24

Background information
----------------------

 What lariza is and what it's not

  ...

  Especially, it's very likely that lariza will never have a "follow
  mode" like dwb, luakit or others have. I've used these browsers for
  quite some time and I've also used Firefox extensions that add a
  "follow mode". The point is, "follow mode" doesn't work anymore. This
  was a good thing ten years ago. Today, a lot of websites make heavy
  use of JavaScript or hovering. You NEED some kind of pointing device.
  I found using "follow mode" to be very frustrating today, because you
  still have to reach for the mouse all the time. So, you might as well
  just optimize your mousing workflow.

“Ten years ago” is referring to 2004 and might even be too optimistic.

Well, this could change and here’s why.

2019 was a “year of pain”. This sounds a bit dramatic, yeah. :) Throughout the year, I was struggling with wrist pain and other issues involving that area. It still remains a bit mysterious and it’s not really over, yet. The point is, using the mouse is really bad for me at the moment. As stated above, lariza pretty much forces you to use the mouse, so I simply had to stop using it and went back to Firefox. Firefox has other silly bugs, so I went on to Chromium – with Vimium as a plugin.

Vimium has some nice shortcuts, but most importantly, a “follow mode”. This is so much more comfortable than using the mouse or a touchscreen. If you ask me, every browser (every UI?) should have this built-in and it should be the standard way to navigate, but well.

So, it is literally painful to use lariza in its current state. I don’t want to abandon the project, so this has to change.

Here’s a preview.

lariza’s follow mode

Press a hotkey (currently f) to activate link hints:

Note that labels are placed right of the actual links, which is different from other implementations of this feature – we’ll come back to this. The labels are shaped a bit like little arrows. I hope this avoids some confusion.

You know the rest: Type some characters of the labels to narrow down the selection. I hit h next:

Bright green shows the current selection and I could hit Enter to open it. Note that there also is a highlighting of the actual element that is being selected – we’ll come back to that, too.

I hit d next:

Now there’s only one label left, but I still have to press Enter to actually select it. And, yep, we’ll come back to this as well.
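The narrowing itself boils down to prefix matching. Here's a minimal sketch of how it could work, not the actual code from follow_mode.js. The names makeLabels and narrow are made up for this example, and giving all labels the same length (so that no label is a prefix of another) is an assumption. Only the charset is the one from the script's configuration.

```javascript
// Hypothetical sketch, not lariza's actual follow_mode.js.
const charset = "sdfghjklertzuivbn".split("");

// Build `count` labels of equal length, so no label is a prefix of another.
function makeLabels(count, chars) {
    if (chars.length < 2) {
        throw new Error("need at least two characters");
    }
    // Find the shortest label length that yields enough combinations.
    let length = 1;
    while (Math.pow(chars.length, length) < count) {
        length++;
    }
    const labels = [];
    for (let i = 0; i < count; i++) {
        // Write i as a number in base chars.length, padded to `length` digits.
        let n = i;
        let label = "";
        for (let j = 0; j < length; j++) {
            label = chars[n % chars.length] + label;
            n = Math.floor(n / chars.length);
        }
        labels.push(label);
    }
    return labels;
}

// Keep only the labels that start with what has been typed so far.
function narrow(labels, typed) {
    return labels.filter(function (label) {
        return label.startsWith(typed);
    });
}
```

With 20 links, this gives two-character labels, and typing d leaves the three labels ds, dd and df. Typing another character leaves at most one.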

How follow mode is implemented

WebKit separates the C/GTK code from everything that happens on the web site. In other words, you cannot manipulate the site’s DOM from C. You must use JavaScript. lariza already runs a short JS snippet to retrieve URLs for RSS/Atom feeds, for the same reason.

This poses a number of problems.

First, of course, follow mode has to be implemented in JS, too. You must access the DOM and that’s only possible from JS. The issue here is not that “JavaScript is a bad language”, but the fact that it’s an entirely different world – a big context switch from C and GTK. It’s basically a second program that has to communicate with the main browser. Its code runs asynchronously, there is a different set of input events, all that.

The way I did it was to implement a generic “user scripts” mechanism, similar to what surf has. Unlike surf, though, you can just put multiple files in a folder and lariza will run them all after every successful page load.

To be honest, though, I don’t see much use for user scripts aside from a follow mode. You might be thinking “custom key bindings”, but please keep on reading. Oh, yeah, you can use user scripts to “sanitize” some web sites, i.e. remove annoying elements. But that’s pretty much it, I think. Or maybe I’m just not creative enough. It was the least invasive way to get things going, though – the diff to core lariza is tiny and it knows nothing about follow mode at all. It just tells WebKit to run some JavaScript. As you will see, this “kind of” works, but is far from perfect and maybe even from “good”.

What I could not get to work is to isolate my own scripts from the scripts that come with the actual web site. That is, unless you wrap everything in an anonymous function as follows, the web site will be able to read and manipulate your variables! Or you accidentally clobber some important name from the web site. Pretty annoying – and it looks like browsers like Chromium work differently.

(function() {

    // Configuration
    var charset = "sdfghjklertzuivbn".split("");
    var key_follow = "f";
    var key_follow_new_win = "F";

    function update_highlights_or_abort()
    {
        ...
    }

    ...

}());

While this can be worked around using said anonymous function, things get a lot worse when it comes to key event handlers. See, you’re fighting the web site. The Firefox plugin Vim Vixen has the same problem (and loses the battle – key handlers of the web site and those of the plugin both trigger). Only Vimium has found a way to kind of work around this issue, although the code for it has comments like “run as early as possible, so the web site can’t grab keys before us”. Or maybe it’s just Chromium that does a better job at separating things, I don’t know. Now that I think about it, I know a bunch of web sites that I have to use at work where even Vimium fails.

This really sucks a lot. I couldn’t find a way to avoid it. I guess I would have to implement a custom JS API to cleanly solve this issue, so lariza user scripts can register event handlers via some custom hooks instead of the standard document.addEventListener(). Every key event would be passed to those custom hooks first and, if they decide they don’t want to handle it, then pass them to the web site as usual. I haven’t implemented that yet, because it involves a lot of stuff (and there are other things that fail miserably, so I’m not sure if this work is worth it). Instead, there’s a lousy workaround:

When follow mode gets activated, we spawn a hidden input box and focus it. When you type the characters of one of the labels, you actually enter text into this box. Ridiculous, but appears to work – at least for typing the labels. If a web site happens to grab the initial f key when no input box is focused, you’re lost. I thought about making it Alt + f, but that’s very inconvenient.
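Roughly, the hidden input trick could look like the following sketch. This is an illustration under assumptions, not the code that lariza ships: openHiddenInput, onTyped and onConfirm are hypothetical names, and the document is passed in as a parameter only to keep the example self-contained.

```javascript
// Hypothetical sketch of the hidden-input workaround.
function openHiddenInput(doc, onTyped, onConfirm) {
    const input = doc.createElement("input");
    input.type = "text";

    // Keep the box focusable but out of sight.
    input.style.position = "fixed";
    input.style.top = "0";
    input.style.left = "0";
    input.style.opacity = "0";

    // Every typed character changes the value of the box.
    input.addEventListener("input", function () {
        onTyped(input.value);
    });

    // Enter confirms whatever is currently selected.
    input.addEventListener("keydown", function (event) {
        if (event.key === "Enter") {
            event.preventDefault();
            onConfirm(input.value);
        }
    });

    doc.body.appendChild(input);
    input.focus();
    return input;
}
```

In a user script, you would call it as openHiddenInput(document, ...) once the follow key has been seen. It does nothing about the initial f key, which can still be grabbed by the web site.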

Another setback was WebKit’s performance of element.getBoundingClientRect(): This function returns the size of an element and its position relative to the viewport. Ideally, we would use this function to place the labels “exactly” at the position of the elements they’re referring to. But it’s slow. Really slow. On crowded web sites, it takes several seconds to create all labels on my i7 that runs at 3.4 GHz. The function in question appears to be fast in Chromium and Vimium uses it, which is why it gets very good results.

But we can’t use it. As a workaround, we try to place the labels as close as possible to the elements in the DOM and simply hope that they will show up in proximity. For <a> elements, we add them as a child. This is why they show up right of a link element.
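As a sketch, placing labels via the DOM instead of via coordinates could look like this. Again, this is an assumption-laden illustration and not the real script: attachLabels, removeLabels and the class name follow-label are invented for this example, and only <a> elements are handled.

```javascript
// Hypothetical sketch; names and the class name are made up for illustration.
function attachLabels(doc, labels) {
    const links = Array.from(doc.querySelectorAll("a"));
    const attached = [];
    for (let i = 0; i < links.length && i < labels.length; i++) {
        const span = doc.createElement("span");
        span.className = "follow-label";
        span.textContent = labels[i];
        // Appending as the last child puts the label at the end of the link
        // in document order. No position is ever calculated.
        links[i].appendChild(span);
        attached.push({ label: labels[i], target: links[i], node: span });
    }
    return attached;
}

// Remove all labels again when follow mode ends.
function removeLabels(attached) {
    for (const entry of attached) {
        entry.node.remove();
    }
}
```

Note that nothing here asks the engine where an element ends up on screen. That's what makes it cheap, and it's also why the result depends entirely on how the page lays out its links.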

On “fancy” web sites, this often does not work. Sometimes, the labels indicate that they refer to some element, but they’re actually linked to something completely different. It’s why the link itself gets highlighted when you select a label. And it’s why you always have to hit Enter to actually confirm a selection, because – depending on the nastiness of the web site – it can be easy to accidentally select something completely different.

And, of course, all this happens deep in the web site’s DOM. The web site’s CSS applies to our labels, too. Sometimes, they’ll be in upper-case letters because the page says so. Or there will be a little icon in front of them because the page uses the :before pseudo-element. Or other elements will cover parts of our labels because, guess what, the page says so. Which finally leads us to the next paragraph.

A follow mode will always be an incomplete solution

Just check the bug trackers of Vimium and Vim Vixen. Even those plugins don’t work all the time. Vimium is actually pretty damn good, but still, you can’t select all elements on some web sites.

This is not a surprise. Web sites are not passive documents. Not anymore. They’re applications. And you, as an implementor of “follow mode”, are fighting those applications. You can think of it this way: You have a native application written in C for X11 and then you have some other program that runs in the same address space and tries to find all “clickable” elements. You know the location of the target framebuffer and there is some “API” to find basic elements like a button, but strictly speaking, you have no idea what’s “clickable”, because every pixel could have some event attached to it through some code. And then you’ll have to fight the application’s internal focus system. And its event system. And what not. I’m not sure if this analogy is entirely fair, but that’s what it feels like. This might just work for simpler programs like leafpad, but not for Blender. It’ll always be incomplete.

The most important examples:

- Many HTML elements could have JavaScript functions attached to them. How do we detect them? If the web site uses element.addEventListener(), there is no way to find out, because you cannot enumerate existing listeners of an element (afaik).
- How do you handle hovering? Using just CSS, you can create hover menus. To open those, we would have to simulate the mouse hovering over the “root” element or what … ?
- How do you handle “semi-native” elements like the controls of an <audio> tag, which have no representation in the DOM?
- Even if we were able to use element.getBoundingClientRect(), this would only get us the bounding box of an element. So, where do we put our label? Top left? Center? We have no idea what the user perceives as “the element” or “the clickable area”. It can still be ambiguous, especially for large elements.
- Of course, as already mentioned, we fight for key input events.
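Regarding that last point: the custom hooks idea mentioned earlier would amount to a small dispatcher that sees every key before the web site does. Here's a minimal sketch of that logic. Nothing like this exists in lariza, all names (createKeyHooks, register, dispatch, passToPage) are hypothetical, and the callback passToPage merely stands in for "deliver the event to the web site as usual".

```javascript
// Hypothetical sketch of the "custom hooks" idea; not an existing API.
function createKeyHooks() {
    const hooks = [];
    return {
        // A user script registers a handler. Returning true consumes the key.
        register: function (handler) {
            hooks.push(handler);
        },
        // Would have to be called for every key event before the page sees it.
        dispatch: function (keyEvent, passToPage) {
            for (const handler of hooks) {
                if (handler(keyEvent) === true) {
                    return true;    // consumed by a user script
                }
            }
            // Nobody wanted it, so the web site gets it as usual.
            passToPage(keyEvent);
            return false;
        }
    };
}
```

The dispatcher is the easy part. The hard part is getting something to call dispatch before the page's own handlers run, and as far as I can tell, that cannot be guaranteed from within a user script.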

I think, if you want to get all this to work, you need to do it inside the rendering engine itself. Or better, simplify the web again and remove all those fancy features that often serve no real purpose in the first place.

All this is why I didn’t implement a follow mode for lariza in the past and never planned to. If it wasn’t for those wrist issues, I would never have bothered to even try implementing it, but yeah, I eventually did it because it’s better than nothing.

Closing thoughts

The only feature that lariza might gain is user scripts. I’m not sure yet if I’m going to bundle follow_mode.js or if I put it into another repository. Also, I have to test the current solution for a while longer. Maybe I’ll abandon it after all, if it’s just not good enough.

I would like to avoid turning lariza into yet another “vi-like browser”. There are several of them. Aside from that, it’s a ton of work. Right now, lariza is small and that’s a very good thing.

But … yes, follow mode breaks some design concepts of lariza. Suddenly, there’s a mixture of “all browser keys are handled in C and prefixed with Alt” and “hit f to activate a special mode in the JavaScript world”. It feels a bit like a hybrid of Emacs and vi. Not very clean. It feels like some of the existing key bindings and concepts should be changed to fix this, but then we would end up with just another vi-like browser.

I honestly would like to keep follow mode an optional feature. It’s such a weak concept and often fails.

All issues aside, it’s much more fun to use lariza now that it has a follow mode (for me – in my current situation). :) It’s one of those “80% solutions”, but when it works, it’s great and very comfortable.

Comments?