katriawm: The adventure of writing your own window manager

2016-01-05

I’m writing a window manager. Okay, wait, I can hear you moan. I’m not here to tell you that this is the best window manager you’ll ever use. This is not an advertisement. We all know that there are tons of window managers out there. Instead, I want to tell you a little bit about the experience of writing a WM. Why did I do it and what was it like? What did I learn?

What brought me here

My first contact with “tiling window managers” was in the 1990s. It was a kind of manual tiling WM. To be precise, it was Windows 3.0. Windows had this feature of placing all visible windows side-by-side. I rarely used it and it adds little to my story, but it illustrates that the concept of “tiling windows” isn’t particularly new. Why am I talking about tiling WMs anyway? Because they were the ones that introduced me to in-depth customization. You’ll see.

Later, when I was already using GNU/Linux as my main OS, Windows 7 was released. It introduced a feature called “aero snap”: Drag a window to the left or right border of your screen and Windows will resize the window to fit into half of the screen. Again, kind of manual tiling. I liked the feature, so I hacked together a little Python script that would allow me to do the same on Xfce. That was around 2008 or 2009.

Also in 2009, I gave wmii a first try. It was a little slow, though, so I switched to awesome in 2010. Awesome was very easy to customize, so I started doing that. It can be customized way beyond “theming”. You can customize the actual window management routines. Even though I abandoned both my customization module and awesome later on, the legacy still lives on as lain.

Why did I leave awesome? Because I switched to dwm in 2012. This was a completely different level of customization as you can only customize dwm by editing its C source code. Inevitably, I learned a little bit about what actually happens behind the scenes. Basic Xlib calls, basic drawing, stuff like that.

Problem is, it’s always hard to understand source code you didn’t write yourself. Additionally, dwm’s source code is very “compact” and not always easy to understand. There are few comments. So, even after some years of using it and (heavily) customizing it, I did not fully understand what was going on.

Thus, in 2014, I started hacking on my very own window manager. I wanted to know precisely how a window manager works. How do clients communicate with the WM? What the heck are “transient windows”? What is ICCCM, what’s EWMH? And many more questions.

Basically, that’s the exact reason for writing your own WM: You get to learn how things work. And, sometimes, you even get to learn why things work the way they do.

Why I failed in 2014

Unfortunately, I chose to write a reparenting WM back in 2014. I read everywhere that this is what virtually all window managers do, so it must be the way to go, right? Also, I really wanted to have proper window decorations. Especially, I wanted to have title bars which dwm cannot provide. Everywhere you look, people say that you need reparenting in order to draw window decorations.

Turned out, reparenting is not as easy as you might think. Reparenting adds an additional layer of complexity – or maybe even more than one layer.

Plus, reparenting does not magically fix all your problems. For example, Java expects to run under a reparenting window manager by default. If it doesn’t, then you might only get a grey window. Surely, when you write a reparenting WM, even a simple one, this must be fixed, right? No, it won’t be fixed. I ended up with either half of the window being grey or with misplaced menus.

This was a very frustrating experience. It led me to abandon the project pretty soon.

Trying again in December 2015

1.5 years had passed and I was still using my heavily customized dwm. Happily so, I must add. Still, I was not so happy about the fact that I didn’t fully understand what was going on.

Three weeks of vacation. This is an important thing to note because I couldn’t have succeeded otherwise. Writing your first WM is a difficult endeavour: you have to read a lot and think a lot. You must be able to leave your computer for a while and go for a long walk in order to clear your mind. I guess you just can’t do this if you have to go to work every day – no matter how much you like your job. When you get home, you’re tired and you do not want to sift through endless pages of Xlib documentation from the 1990s.

My initial project failed and the fact that I tried to write a reparenting WM played an important role. In the meantime, a user on the suckless mailing list suggested simply not doing reparenting. Why not just draw the decorations on separate windows and that’s it? Why bother with reparenting?

So, I decided to just try. Write a non-reparenting WM with decorations and see what happens.

Long story short: It’s totally fine.

And, of course, I was (and probably still am) a beginner and knew little about Xlib and X11. Without reparenting, things get easier. For a beginner, that’s the way to go. Your first WM should not be a reparenting one. You can still start over and write a reparenting WM later on when you finally understand what the heck is going on in X11 land.

The result: katriawm

katriawm is available on GitHub. Here’s a screenshot and another one. Quoting the current feature set from the README file:

- Tiling and floating
- Title bars and theming
- Per-screen workspaces, per-workspace layouts
- Application rules, layout save slots (during runtime)
- No sloppy focus
- Basic ICCCM and EWMH support
- Non-reparenting

If you wonder why it’s called “katriawm”, go have a look at the full README.

A side project is xpointerbarrier. That’s something that I had previously incorporated into dwm. It creates barriers around your workarea which the mouse pointer cannot pass. It’s meant to increase usability because your pointer can no longer suddenly jump to a different monitor.

Lessons learned about X11 and dwm

First of all, Xephyr was extremely helpful during early development.

As stated above, one of my motivations was to better understand dwm. Oh yes, I do understand dwm much better now. A lot of things suddenly make sense. In retrospect, many of my patches to dwm now seem horrible. Furthermore, dwm is not as complicated as I thought. The code is “compact”, yes, but that’s because many things are just obvious if you know how a WM should act.

You might argue that the dwm devs were pretty lazy. :-) Things could have been commented on. But, on the other hand, you don’t comment on how to open a file, do you? People are expected to be familiar with file descriptors. Similarly, there’s no comment in dwm on what a “client message” is and where it comes from, for example.

Documentation is hard and expensive. I get that a project that aims at “elitist” users does not tell you about how to turn on your computer (from their point of view). However, this makes you appreciate the i3wm documentation even more.

Speaking of “client messages”: I did learn a lot about these kinds of things. How can clients communicate through X11? What are “properties”, what are “atoms”? These questions have been answered.

katriawm comes with a client, katriac, because the main window manager does not process any user input (more on that topic can be found in README and INTERNALS.md). This client talks to katriawm only through X11. No named pipes, no UNIX sockets, no DBus. It just sends an X11 client message to the window manager. In fact, that’s pretty similar to what X11 clients do when they send EWMH messages to the WM. So, katriac is just an ordinary X11 client. I think this is very nice because you save a lot of code due to the fact that X11 does all the authorization and dirty work for you. SSH X11 forwarding? No problem. Running X11 over TCP on another machine? No problem. Multiple X sessions and multiple running instances of the WM? Sure. Not that these kinds of things are particularly useful in everyday life, but a client like this is just the natural thing to do.

Herbstluftwm has a similar client. It works a little differently, though, because X11 client messages are rather limited. You only get 20 bytes of payload per message. Still, it’s sufficient for katriac. bspwm, on the other hand, uses UNIX sockets.

Now what are client messages? X11 thinks in terms of “events” and it sends those events to your client or from one client to another. For example, when the server requires a client to draw to a window, it will send an Expose event. ClientMessages are one particular type of event, some kind of “general purpose message”. Let’s say the user wishes to close a window: the WM will send a client message to the window in question asking it to please close itself. This “please close this window” message is part of the ICCCM conventions. Virtually all X11 clients follow this convention. It’s so old that there are many ICCCM-specific utility functions available in Xlib. EWMH, on the other hand, a more modern set of conventions, is not “supported” by Xlib at all. But that doesn’t matter as the X11 server does not really interpret client messages anyway. You can come up with your own message types and this is what katriac does.
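To make the 20-byte limit a bit more tangible, here is a minimal sketch of how a command and its arguments could be squeezed into one client message payload. It is written in Python for brevity and does not talk to an X server at all. The command names and numbers are made up for illustration; katriawm’s real message layout is described in INTERNALS.md and may look quite different. The only hard fact used here is that a client message in 32-bit format carries five 32-bit values.

```python
import struct

# Hypothetical command numbers, invented for this sketch.
COMMANDS = {"workspace_select": 1, "client_move": 2, "layout_set": 3}
NAMES = {number: name for name, number in COMMANDS.items()}

# A client message in 32-bit format carries five 32-bit values = 20 bytes.
PAYLOAD = struct.Struct("=5i")


def encode(command, *args):
    """Pack a command and up to four integer arguments into 20 bytes."""
    if len(args) > 4:
        raise ValueError("only four 32-bit arguments fit next to the command")
    padded = list(args) + [0] * (4 - len(args))
    return PAYLOAD.pack(COMMANDS[command], *padded)


def decode(payload):
    """Inverse of encode(): return the command name and its four slots."""
    number, *args = PAYLOAD.unpack(payload)
    return NAMES[number], args


if __name__ == "__main__":
    message = encode("client_move", -10, 25)
    print(len(message), decode(message))
```

Anything that doesn’t fit into those five slots (long strings, for example) needs either several messages or a different transport, which is presumably why other window managers chose a different design.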

X11 properties are no longer a mystery to me. Simply put, properties are arbitrary data associated with a window. Important detail: X11 itself does not interpret this data. You can read properties using the command line tool xprop. You’ll discover that the window title is nothing but a property. X11 itself has no notion of window titles – they are part of ICCCM and, to some degree, EWMH. Also, I can now recognize the fact that dwm’s status bar, which reads from a property on the root window (a special window that is always present and easy to discover), is quite elegant. It’s not even dwm-specific. Sadly, no external bar or panel that I know of works the same way.

Speaking of bars: katriawm does not draw a bar by itself. Instead, it “publishes” its internal state using a property on the root window. External tools can monitor that property and thus create a bar. I’m not particularly happy with my current Bash script for this job, but I like the idea of loosely coupling tools.

When using dwm I accepted that there are no title bars. It’s okay-ish, I can live with that. But, title bars are extremely useful. Now with katriawm, I have title bars. Yes, it takes some code, but it’s worth it. I wouldn’t want to miss them.

What was pretty hard to get right is focus and focus history. When does the window manager focus which window? There are so many corner cases. You have to deal with applications that steal focus. Unlike in other areas, the window manager cannot dictate focus: applications can grab it by themselves, which makes it very hard for the WM to enforce a certain focus policy. Also, which data structure do you use to store the focus history? In the end, I came up with a nice solution that allows you to retain a complete focus stack among all monitors and workspaces. That is, when you change the workspace, you can always jump back to the previously selected window on that workspace.
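One way to picture such a focus stack is a single list, ordered from most to least recently focused, where every entry remembers which monitor and workspace the window lives on. The following Python sketch illustrates that idea only; it is an assumption about how this could be done, not a description of katriawm’s actual C data structure. The class and method names are invented.

```python
class FocusStack:
    """One global focus history shared by all monitors and workspaces."""

    def __init__(self):
        # Entries are (window, monitor, workspace), most recent first.
        self._entries = []

    def focus(self, window, monitor, workspace):
        """Record that a window got focus by moving it to the front."""
        self.remove(window)
        self._entries.insert(0, (window, monitor, workspace))

    def remove(self, window):
        """Forget a window, for example when it is closed."""
        self._entries = [e for e in self._entries if e[0] != window]

    def top_of(self, monitor, workspace):
        """Most recently focused window on one workspace, or None."""
        for window, mon, ws in self._entries:
            if (mon, ws) == (monitor, workspace):
                return window
        return None


if __name__ == "__main__":
    stack = FocusStack()
    stack.focus("xterm", monitor=0, workspace=1)
    stack.focus("editor", monitor=0, workspace=1)
    stack.focus("browser", monitor=0, workspace=2)
    # Switching back to workspace 1 should land on the editor again.
    print(stack.top_of(0, 1))
```

Because there is only one list, closing a window is a single removal, and “which window was focused last on that workspace” is a linear scan, which is cheap at the number of windows a desktop usually has.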

“Fullscreen mode” has been demystified. I never really understood what was going on when applications went to fullscreen mode. You see, not all window managers allow you to put any application into fullscreen mode – this is proof enough that this mode is very special, right? Maybe it requires special support by the application? Maybe it’s a completely different window type? Now I know: Clients simply ask the WM to activate fullscreen mode by sending it an EWMH message. The WM then hides any decorations and resizes the window. Finally, it sets a property on the client window, telling it the request has been fulfilled. That’s all there is to it. This, in turn, means that there’s no (technical) reason not to put any application into fullscreen mode. Fullscreen GIMP or xfontsel or xeyes? Why not.
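The bookkeeping behind this is small. In EWMH, the request is a _NET_WM_STATE client message whose first value is an action (0 = remove, 1 = add, 2 = toggle) and whose second value names the state, here _NET_WM_STATE_FULLSCREEN. The Python sketch below models only the WM’s decision logic, with no X server involved. In the real protocol the state is an atom rather than a string, and the function names and geometry tuples are invented for this example.

```python
# Action values as defined by EWMH for _NET_WM_STATE requests.
NET_WM_STATE_REMOVE = 0
NET_WM_STATE_ADD = 1
NET_WM_STATE_TOGGLE = 2

# In real X11 this would be an atom; a string stands in for it here.
FULLSCREEN = "_NET_WM_STATE_FULLSCREEN"


def apply_state_request(current_states, action, state):
    """Return the new state set the WM would publish on the client window."""
    states = set(current_states)
    if action == NET_WM_STATE_TOGGLE:
        action = NET_WM_STATE_REMOVE if state in states else NET_WM_STATE_ADD
    if action == NET_WM_STATE_ADD:
        states.add(state)
    elif action == NET_WM_STATE_REMOVE:
        states.discard(state)
    else:
        raise ValueError("unknown _NET_WM_STATE action: %r" % (action,))
    return states


def place(states, monitor_geometry, normal_geometry):
    """Pick the geometry and whether decorations (title bar) are shown."""
    if FULLSCREEN in states:
        return monitor_geometry, False
    return normal_geometry, True


if __name__ == "__main__":
    monitor = (0, 0, 1920, 1080)   # x, y, width, height
    normal = (100, 100, 640, 480)
    states = apply_state_request(set(), NET_WM_STATE_TOGGLE, FULLSCREEN)
    print(sorted(states), place(states, monitor, normal))
```

Nothing in this logic depends on what kind of application owns the window, which is the whole point: the WM can just as well decide on its own to add the fullscreen state to any client.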

By the way, I think that writing a window manager could be considered easier than writing a client (unless it’s a very simple one). The state of mind of X11 devs appears to be: “Clients have no rights. They are slaves to the window manager.” Clients must be able to deal with whatever the WM imposes on them. They ask for a 300x300 pixel window but get a 200x800 pixel window, 2000 pixels away from where they thought their window should appear. They have to check for non-reparenting, reparenting, double reparenting, … At any given time, their windows might even get unmapped and no longer be visible. I assume this is pretty annoying. On the other hand, a window manager can do anything it wants to.

Lessons learned about software development

I’m not a professional software developer. I only do this in my spare time. Many of the projects I’m working on (privately) are very simple. They require little to no planning. Simply put, I can usually just go ahead and write code, as soon as there’s a concrete idea in my head.

katriawm is not what you’d call a huge project and I knew that before starting the project. Currently, there are about 3500 lines of code (wc -l). Nevertheless, I wanted to try a different approach this time. I tried to split work into two cycles: Implementing new features and maintenance. The idea is similar to how Linux kernel development works.

Here’s what I did:

1. I assemble a list of features that I want to implement. This list shouldn’t be “too big” so I don’t lose track of what I have done.
2. I implement the features.
3. When I get an idea for a great new feature, I add it to the list for the next cycle.
4. Once all features of the current implementation cycle have been completed, the maintenance cycle begins. The only thing I do now is fix bugs and improve code quality.
5. Goto 1.

There is only a weak time constraint: The maintenance cycle should take at least one or two days. This might not seem like a lot, but when you’re on vacation, it is. ;-)

This model is pretty simple and actually quite obvious. It works well and feels right. You can also find similar ideas in today’s popular software development philosophies. It also has a nice side effect: You spend more time thinking about which features to implement. Is a feature important enough for the next cycle? Can it be delayed?

During maintenance cycles, a good habit is to read your code top to bottom. Just read it. You will notice a lot of little details, inconsistencies, things to clean up. As you’re currently in a maintenance cycle, you’re free to fix most of these issues.

There’s a TODO file. This is where I organized the project (sadly, I didn’t commit this file right from the start because I thought it didn’t belong in the repo). I didn’t use external software, nor the GitHub issue tracker (because the project wasn’t even published on GitHub and I don’t have a paid GitHub account). I think this is a good thing. Keep the TODO list right next to the code – but not in the code. I used to store TODO items as comments in the code and then use TaskList.vim to list them. This is okay most of the time, but doesn’t work with the two cycles. And, you tend to lose track of open TODO items.

Uncharted territory

As katriawm is non-reparenting, I still have not addressed any of the reparenting issues that made me give up in 2014. Why did my original attempt fail? Why do so many WMs do reparenting? Should I try again?

Also, support for EWMH and, of course, ICCCM is very limited in katriawm. I’m saying “of course” because ICCCM is long, complex, and in many ways outdated. I’ll probably focus on important bits and pieces of EWMH first. (Although katriawm inevitably breaks with EWMH, see INTERNALS.md.)

I also had no contact whatsoever with colormaps or “visuals”. Maybe that’s because it’s 2016. Selections (a.k.a. “the clipboard”) could be very interesting as well.

Conclusion

Now, what do I do? Of course, I’ll keep at it and learn more about X11. But will katriawm become my main window manager? Will it even become a successful window manager that other people use? I have no idea.

After all, it’s a learning project. It’s one of the few projects of mine that do not solve a particular problem. I just did it out of curiosity. I think katriawm should be treated as such.

I’d like to encourage everyone to write their own WM for educational purposes. It was and is one of the most interesting projects of mine. :-) Yes, it’ll take some time and a lot of energy. But you can do it. I mean, you can do it because everything is free software. All the moving parts can be analyzed and easily be replaced by your own software. The perfect learning platform.

You’re not sitting in front of a black box.
