X11: How does “the” clipboard work?

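If you have used another operating system before you switched to something that runs X11, you will have noticed that there is more than one clipboard: one that you fill with Ctrl+C and paste with Ctrl+V, and one that you fill by selecting text with the mouse and paste with the middle mouse button.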

Those two clipboards usually don’t interfere. You can keep the content of the “Ctrl+C clipboard” while using the “middle mouse clipboard” to copy and paste something else.

How does that work? Is there more than one clipboard? How many are there? Do all X11 clients support all forms of clipboards?

Here’s the appropriate section of ICCCM on this topic.

Selections as a form of IPC

First things first, in X11 land, “clipboards” are called “selections”.

Yes, there is more than one selection and they all work independently. In fact, you can use as many selections as you wish. In theory, that is. When using selections, you make different clients communicate with each other. This means that those clients have to agree on which selections to use. You can’t just invent your own selection and then expect Firefox to be compatible with it.

Looking at it from a very high altitude, it goes like this:

Client A                    X Server                    Client B
----------------------------------------------------------------

(1) |  I own selection FOO!    |
    |  ------------------->    |


                               |  Write sel. FOO to BAR!  | (2)
                               |  <---------------------  |


    | Write sel. FOO to BAR!   |
    | <---------------------   |


    |     Here is FOO.
    | -------------------------:----------------------->  |


                                     Okay, got it.        |
    | <------------------------:------------------------  |

(1) means every client can claim ownership of any selection at any time. It only informs the X server about that – no data is transferred yet. This is an important thing to understand. The X server is nothing more than a broker. It takes a note of which client owns which selection.

In (2), another client asks the X server to send it the content of selection “FOO”. The X server simply relays that request to the current owner of that selection. Client A is then responsible for actually transmitting the data to client B.
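To make the “broker” idea a bit more tangible, here is a toy model in plain C. It is only a sketch of the bookkeeping described above, not how any real X server is implemented. All names (struct broker, broker_init(), broker_set_owner(), broker_get_owner()) are made up for this illustration. Note what is missing: there is no field for the selection's data anywhere.

```c
#include <string.h>

#define MAX_SELECTIONS 8

/* Hypothetical toy model of the server's bookkeeping: one owner per
 * selection name, and no selection data at all. */
struct broker_entry
{
    const char *selection;
    unsigned long owner;    /* 0 plays the role of None */
};

struct broker
{
    struct broker_entry entries[MAX_SELECTIONS];
    size_t used;
};

void
broker_init(struct broker *b)
{
    memset(b, 0, sizeof *b);
}

/* Step (1): a client claims a selection. Only the owner is recorded.
 * Returns 1 on success, 0 if the toy table is full. */
int
broker_set_owner(struct broker *b, const char *selection, unsigned long owner)
{
    size_t i;

    for (i = 0; i < b->used; i++)
    {
        if (strcmp(b->entries[i].selection, selection) == 0)
        {
            b->entries[i].owner = owner;
            return 1;
        }
    }

    if (b->used == MAX_SELECTIONS)
        return 0;

    b->entries[b->used].selection = selection;
    b->entries[b->used].owner = owner;
    b->used++;
    return 1;
}

/* Step (2): a request is relayed to whoever owns the selection right
 * now. Returns 0 if nobody owns it. */
unsigned long
broker_get_owner(const struct broker *b, const char *selection)
{
    size_t i;

    for (i = 0; i < b->used; i++)
        if (strcmp(b->entries[i].selection, selection) == 0)
            return b->entries[i].owner;

    return 0;
}
```

A second call to broker_set_owner() for the same selection simply replaces the previous owner, which is the toy version of “every client can claim ownership of any selection at any time”.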

How are selections identified?

Above, I just called it “selection FOO”, meaning it’s a rather arbitrary identifier that you can choose. If you have worked with X11 before, this won’t be surprising to you: Selections are identified by atoms.

Quick recap: Atoms are a way to identify something in X11 and they are basically strings. Internally, a number is allocated for each atom, but you rarely need to ask the X server, “what’s the name of atom number 42?”
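If atoms are new to you, this toy table in plain C might help. It is a sketch under simplifying assumptions (fixed table size, short names) and the names atom_table, intern_atom() and atom_name() are invented for this example. In a real program, you would call XInternAtom() and XGetAtomName() and let the X server manage the table.

```c
#include <string.h>

#define MAX_ATOMS 64
#define MAX_NAME  32

/* Hypothetical toy atom table: index i holds the name of atom i + 1.
 * Number 0 is reserved and means "no atom". */
struct atom_table
{
    char names[MAX_ATOMS][MAX_NAME];
    unsigned long used;
};

/* Return the number for a name, allocating one on first use. Returns
 * 0 if the name is too long or the table is full. */
unsigned long
intern_atom(struct atom_table *t, const char *name)
{
    unsigned long i;

    if (strlen(name) >= MAX_NAME)
        return 0;

    for (i = 0; i < t->used; i++)
        if (strcmp(t->names[i], name) == 0)
            return i + 1;

    if (t->used == MAX_ATOMS)
        return 0;

    strcpy(t->names[t->used], name);
    t->used++;
    return t->used;
}

/* The reverse lookup. Returns NULL for unknown numbers. */
const char *
atom_name(const struct atom_table *t, unsigned long atom)
{
    if (atom == 0 || atom > t->used)
        return NULL;

    return t->names[atom - 1];
}
```

The point is that the same string always yields the same number, so two clients that both intern “CLIPBOARD” end up talking about the same selection.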

There are three “standard” selection names: PRIMARY (the “middle mouse clipboard”), SECONDARY, and CLIPBOARD (the “Ctrl+C clipboard”).

“Standard” means that they are specified by ICCCM 2.6.1. Yes, it’s confusing that one of the selections is named “clipboard”.

Program 1: Query selection owners

Knowing what we know now, we can ask the X server to tell us who owns which selection. This is xowners.c:

#include <stdio.h>
#include <X11/Xlib.h>

int
main()
{
    Display *dpy;
    Window owner;
    Atom sel;
    char *selections[] = { "PRIMARY", "SECONDARY", "CLIPBOARD", "FOOBAR" };
    size_t i;

    dpy = XOpenDisplay(NULL);
    if (!dpy)
    {
        fprintf(stderr, "Could not open X display\n");
        return 1;
    }

    for (i = 0; i < sizeof selections / sizeof selections[0]; i++)
    {
        sel = XInternAtom(dpy, selections[i], False);
        owner = XGetSelectionOwner(dpy, sel);
        printf("Owner of '%s': 0x%lX\n", selections[i], owner);
    }

    return 0;
}

Compile this program (and all of the following ones in a similar manner):

cc -Wall -Wextra -o xowners xowners.c -lX11

FOOBAR is a non-standard selection name. It’s perfectly valid to use it, but don’t expect it to work with all clients. :-)

As you can see, the program prints IDs of windows:

$ ./xowners 
Owner of 'PRIMARY': 0x60080F
Owner of 'SECONDARY': 0x0
Owner of 'CLIPBOARD': 0x1E00024
Owner of 'FOOBAR': 0x0

Windows are another basic form of communication between clients, meaning they do not necessarily work as “boxes of pixels”. Unmapped windows can exist in an X11 session just fine (and there usually are many of them).

We can use the xwininfo tool to find out more about those two windows:

$ xwininfo -id 0x60080F | grep '^xwininfo'
xwininfo: Window id: 0x60080f "xiate"
$ xwininfo -id 0x1E00024 | grep '^xwininfo'
xwininfo: Window id: 0x1e00024 "lariza"

Aha, so xiate is holding the PRIMARY selection, while lariza owns CLIPBOARD.

Let’s have a look at the full output of one of these commands:

$ xwininfo -id 0x60080F

xwininfo: Window id: 0x60080f "xiate"

  Absolute upper-left X:  -100
  Absolute upper-left Y:  -100
  Relative upper-left X:  -100
  Relative upper-left Y:  -100
  Width: 10
  Height: 10
  Depth: 0
...
  Map State: IsUnMapped
...

This is, in fact, an unmapped window. Clients often do this. They create a window with the sole purpose of managing selections. Clients could use their visible window, but that’s problematic. Sometimes, visible windows are short-lived and ownership of a selection is lost when the window dies.

Content type and conversion

So far, so good. And so simple.

Things start to get complicated once you realize that some clients might use selections for text, others for images, some for audio data, and yet others for some form of data that you have never heard of.

And then there are situations where you can provide the same data in different forms. To illustrate this, just select some text in a web browser. Copy it and paste it into Vim. You’ll get plain text. But if you paste the same selection into a program like LibreOffice Writer, you’ll not only get text but also text attributes, like “this is bold, this is a code block”, and so on.

Recall the diagram from above. Step 2 said: Client B tells the X server to write selection “FOO” to “BAR”. (We have not yet covered what “BAR” is, but we’ll get there soon.) Actually, it’s more like this: “Write selection ‘FOO’ to ‘BAR’ as content type ‘BAZ’.” In other words, client B can request the current content of selection “FOO” as text. Or as an image. Or as something else.

That’s why the library call to “get” the current content of a selection is called XConvertSelection() instead of XGetSelection().
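Seen from the owner's side, “conversion” boils down to a dispatch on the requested target. The following is a minimal sketch in plain C, without any X11 calls. The function name convert_selection() and the sample data are made up, and text/html is only an assumed example of a rich-text target that an owner might offer:

```c
#include <string.h>

/* Hypothetical owner-side conversion: the same selected text, offered
 * in two different forms. Returns NULL for targets that this owner
 * cannot produce, which corresponds to a refused conversion. */
const char *
convert_selection(const char *target)
{
    if (strcmp(target, "UTF8_STRING") == 0)
        return "important";

    if (strcmp(target, "text/html") == 0)
        return "<b>important</b>";

    return NULL;
}
```

A requestor like Vim would ask for the plain form, while a word processor could ask for the richer one. Both get their answer from the same selection.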

Program 2: Get clipboard as UTF-8

This is an example of “client B”:

#include <stdio.h>
#include <X11/Xlib.h>

void
show_utf8_prop(Display *dpy, Window w, Atom p)
{
    Atom da, incr, type;
    int di;
    unsigned long size, dul;
    unsigned char *prop_ret = NULL;

    /* Dummy call to get type and size. */
    XGetWindowProperty(dpy, w, p, 0, 0, False, AnyPropertyType,
                       &type, &di, &dul, &size, &prop_ret);
    XFree(prop_ret);

    incr = XInternAtom(dpy, "INCR", False);
    if (type == incr)
    {
        printf("Data too large and INCR mechanism not implemented\n");
        return;
    }

    /* Read the data in one go. "size" is in bytes, but the length
     * argument counts 32-bit units, so round up. */
    printf("Property size: %lu\n", size);

    XGetWindowProperty(dpy, w, p, 0, (size + 3) / 4, False, AnyPropertyType,
                       &da, &di, &dul, &dul, &prop_ret);
    printf("%s", prop_ret);
    fflush(stdout);
    XFree(prop_ret);

    /* Signal the selection owner that we have successfully read the
     * data. */
    XDeleteProperty(dpy, w, p);
}

int
main()
{
    Display *dpy;
    Window owner, target_window, root;
    int screen;
    Atom sel, target_property, utf8;
    XEvent ev;
    XSelectionEvent *sev;

    dpy = XOpenDisplay(NULL);
    if (!dpy)
    {
        fprintf(stderr, "Could not open X display\n");
        return 1;
    }

    screen = DefaultScreen(dpy);
    root = RootWindow(dpy, screen);

    sel = XInternAtom(dpy, "CLIPBOARD", False);
    utf8 = XInternAtom(dpy, "UTF8_STRING", False);

    owner = XGetSelectionOwner(dpy, sel);
    if (owner == None)
    {
        printf("'CLIPBOARD' has no owner\n");
        return 1;
    }
    printf("0x%lX\n", owner);

    /* The selection owner will store the data in a property on this
     * window. SelectionNotify is an event type, not an event mask, and
     * it is delivered regardless of the mask, so no XSelectInput()
     * call is needed. */
    target_window = XCreateSimpleWindow(dpy, root, -10, -10, 1, 1, 0, 0, 0);

    /* That's the property used by the owner. Note that it's completely
     * arbitrary. */
    target_property = XInternAtom(dpy, "PENGUIN", False);

    /* Request conversion to UTF-8. Not all owners will be able to
     * fulfill that request. */
    XConvertSelection(dpy, sel, utf8, target_property, target_window,
                      CurrentTime);

    for (;;)
    {
        XNextEvent(dpy, &ev);
        switch (ev.type)
        {
            case SelectionNotify:
                sev = (XSelectionEvent*)&ev.xselection;
                if (sev->property == None)
                {
                    printf("Conversion could not be performed.\n");
                    return 1;
                }
                else
                {
                    show_utf8_prop(dpy, target_window, target_property);
                    return 0;
                }
                break;
        }
    }
}

This is more code than you expected? Yup. But bear with me. We’ll go through it step by step.

First, let’s uncover what “BAR” is. You see that the code above creates a target_window and an atom target_property. These two things together are “BAR”. When client A sends the content of a selection to client B, it does so by writing the data to a property on a window. This is virtually the only way two X11 clients can communicate arbitrary data through the X server.

Remember that X11 is network transparent. Clients A and B need not run on the same host. They need not even use the same network protocols. One might use TCP/IP, the other might use … whatever. ICCCM uses DECnet as an example, which nobody uses anymore today, probably. As a result, they must not communicate directly, but only through the X server.

Okay. Our target “BAR” is a window and a property.

We also need a content type. Here, I used UTF8_STRING. You won’t find this atom name in ICCCM. UTF-8 did not even exist when ICCCM was first published. Newer clients support it, though.

We then ask the X server to “perform” the conversion: XConvertSelection(). Now look closely at the first diagram at the top of this article. There is no immediate response to XConvertSelection(). The X server must first relay that request to client A, provided that there even is a selection owner right now. Then, at some point in the future, client A decides to do its work – or maybe not. This means that we can only wait for some X event to happen. That’s what the loop at the bottom of the code is for. The event SelectionNotify tells us that a conversion has happened or failed. We can then go ahead and read the property on our very own window; client A should have written its data to that property.

Some things to note:

Program 3: Owning a selection

This is the other direction: a client that claims ownership of CLIPBOARD and provides data when asked for type UTF8_STRING. So, this is client A:

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <X11/Xlib.h>

void
send_no(Display *dpy, XSelectionRequestEvent *sev)
{
    XSelectionEvent ssev;
    char *an;

    an = XGetAtomName(dpy, sev->target);
    printf("Denying request of type '%s'\n", an ? an : "(unknown)");
    if (an)
        XFree(an);

    /* All of these should match the values of the request. */
    ssev.type = SelectionNotify;
    ssev.requestor = sev->requestor;
    ssev.selection = sev->selection;
    ssev.target = sev->target;
    ssev.property = None;  /* signifies "nope" */
    ssev.time = sev->time;

    XSendEvent(dpy, sev->requestor, True, NoEventMask, (XEvent *)&ssev);
}

void
send_utf8(Display *dpy, XSelectionRequestEvent *sev, Atom utf8)
{
    XSelectionEvent ssev;
    time_t now_tm;
    char *now, *an;

    now_tm = time(NULL);
    now = ctime(&now_tm);

    an = XGetAtomName(dpy, sev->property);
    printf("Sending data to window 0x%lx, property '%s'\n", sev->requestor, an);
    if (an)
        XFree(an);

    XChangeProperty(dpy, sev->requestor, sev->property, utf8, 8, PropModeReplace,
                    (unsigned char *)now, strlen(now));

    ssev.type = SelectionNotify;
    ssev.requestor = sev->requestor;
    ssev.selection = sev->selection;
    ssev.target = sev->target;
    ssev.property = sev->property;
    ssev.time = sev->time;

    XSendEvent(dpy, sev->requestor, True, NoEventMask, (XEvent *)&ssev);
}

int
main()
{
    Display *dpy;
    Window owner, root;
    int screen;
    Atom sel, utf8;
    XEvent ev;
    XSelectionRequestEvent *sev;

    dpy = XOpenDisplay(NULL);
    if (!dpy)
    {
        fprintf(stderr, "Could not open X display\n");
        return 1;
    }

    screen = DefaultScreen(dpy);
    root = RootWindow(dpy, screen);

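    /* We need a window to receive messages from other clients.
     * SelectionClear and SelectionRequest are event types, not event
     * masks, and they are always delivered to the owner window, so no
     * XSelectInput() call is needed. */
    owner = XCreateSimpleWindow(dpy, root, -10, -10, 1, 1, 0, 0, 0);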

    sel = XInternAtom(dpy, "CLIPBOARD", False);
    utf8 = XInternAtom(dpy, "UTF8_STRING", False);

    /* Claim ownership of the clipboard. */
    XSetSelectionOwner(dpy, sel, owner, CurrentTime);

    for (;;)
    {
        XNextEvent(dpy, &ev);
        switch (ev.type)
        {
            case SelectionClear:
                printf("Lost selection ownership\n");
                return 1;
                break;
            case SelectionRequest:
                sev = (XSelectionRequestEvent*)&ev.xselectionrequest;
                printf("Requestor: 0x%lx\n", sev->requestor);
                /* Property is set to None by "obsolete" clients. */
                if (sev->target != utf8 || sev->property == None)
                    send_no(dpy, sev);
                else
                    send_utf8(dpy, sev, utf8);
                break;
        }
    }
}

It creates an invisible window and then claims ownership of CLIPBOARD. As you can see, it is not “the client” that owns a selection, but a window.

The program then waits for events. SelectionClear is simple: Some other client has claimed ownership of the clipboard. Yes, that can happen at any time.

SelectionRequest is sent to client A by the X server. It’s the event that the X server generates due to a call to XConvertSelection() by client B. We now simply check if target is UTF8_STRING. If it’s not, we deny the request. But if it is, we call XChangeProperty() to alter the given property on the given target window. Once we’ve done that, we generate a SelectionNotify event and send it to client B.

This client sends the current date and time to requestors. I did this to illustrate further how selections don’t store data in the X server. Data is converted (and possibly generated) only when another client asks for it.

Program 4: Content type TARGETS

There are some special content types. You can ask the owner of a selection to convert the selection into the type TARGETS. This sounds a bit weird, but it’s simple. Client A will not respond with the actual data but with a list of atoms. Each atom is a valid target for the current data.

#include <stdio.h>
#include <X11/Xatom.h>
#include <X11/Xlib.h>

void
show_targets(Display *dpy, Window w, Atom p)
{
    Atom type, *targets;
    int di;
    unsigned long i, nitems, dul;
    unsigned char *prop_ret = NULL;
    char *an = NULL;

    /* Read the first 1024 atoms from this list of atoms. We don't
     * expect the selection owner to be able to convert to more than
     * 1024 different targets. :-) The length argument counts 32-bit
     * units, and each atom in the property occupies one of them. */
    XGetWindowProperty(dpy, w, p, 0, 1024, False, XA_ATOM,
                       &type, &di, &nitems, &dul, &prop_ret);

    printf("Targets:\n");
    targets = (Atom *)prop_ret;
    for (i = 0; i < nitems; i++)
    {
        an = XGetAtomName(dpy, targets[i]);
        printf("    '%s'\n", an);
        if (an)
            XFree(an);
    }
    XFree(prop_ret);

    XDeleteProperty(dpy, w, p);
}

int
main()
{
    Display *dpy;
    Window target_window, root;
    int screen;
    Atom sel, targets, target_property;
    XEvent ev;
    XSelectionEvent *sev;

    dpy = XOpenDisplay(NULL);
    if (!dpy)
    {
        fprintf(stderr, "Could not open X display\n");
        return 1;
    }

    screen = DefaultScreen(dpy);
    root = RootWindow(dpy, screen);

    sel = XInternAtom(dpy, "CLIPBOARD", False);
    targets = XInternAtom(dpy, "TARGETS", False);
    target_property = XInternAtom(dpy, "PENGUIN", False);

    /* SelectionNotify is delivered regardless of the event mask, so no
     * XSelectInput() call is needed. */
    target_window = XCreateSimpleWindow(dpy, root, -10, -10, 1, 1, 0, 0, 0);

    XConvertSelection(dpy, sel, targets, target_property, target_window,
                      CurrentTime);

    for (;;)
    {
        XNextEvent(dpy, &ev);
        switch (ev.type)
        {
            case SelectionNotify:
                sev = (XSelectionEvent*)&ev.xselection;
                if (sev->property == None)
                {
                    printf("Conversion could not be performed.\n");
                    return 1;
                }
                else
                {
                    show_targets(dpy, target_window, target_property);
                    return 0;
                }
                break;
        }
    }
}

Running this when a typical GTK client currently owns a simple text selection reveals something interesting:

$ ./xtargets 
Targets:
    'TIMESTAMP'
    'TARGETS'
    'MULTIPLE'
    'SAVE_TARGETS'
    'UTF8_STRING'
    'COMPOUND_TEXT'
    'TEXT'
    'STRING'
    'text/plain;charset=utf-8'
    'text/plain'

X11 is old and many conventions exist on how to specify data types. Some of them are legacy, some are ambiguous, many not even mentioned by ICCCM. MIME types are fine today, but ICCCM does not talk about MIME types in any way.

This feels a little messy, yes. Being compatible with today’s clients and clients from 30 years ago isn’t easy.
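One way a requestor could cope with this mess is to keep a list of targets it understands, ordered by preference, and pick the first one that the owner offers. This is just a sketch of that idea in plain C. The function name choose_target() is made up and no X11 calls are involved, so in a real client you would first turn the atoms from the TARGETS reply into names (or compare atoms directly):

```c
#include <string.h>

/* Return the first entry of preferred[] that also appears in
 * offered[], or NULL if the two lists have nothing in common. */
const char *
choose_target(const char *const *offered, size_t n_offered,
              const char *const *preferred, size_t n_preferred)
{
    size_t i, j;

    for (i = 0; i < n_preferred; i++)
        for (j = 0; j < n_offered; j++)
            if (strcmp(preferred[i], offered[j]) == 0)
                return preferred[i];

    return NULL;
}
```

With the GTK output from above and a preference list of UTF8_STRING, text/plain;charset=utf-8, STRING, this picks UTF8_STRING. A client that only understands image targets gets NULL and should not attempt a conversion at all.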

Handling binary data using xclip

I’ve been wondering for a long time why I’m unable to paste an image using xclip. It should be simple: xclip -o >foo.img. Well, no. Knowing what I know now, it finally is simple. :-)

First, copy an image using a tool like The GIMP.

xclip can query TARGETS:

$ xclip -o -target TARGETS -selection clipboard 
TIMESTAMP
TARGETS
MULTIPLE
SAVE_TARGETS
image/png
image/tiff
image/x-icon
image/x-ico
image/x-win-bitmap
image/vnd.microsoft.icon
application/ico
image/ico
image/icon
text/ico
image/bmp
image/x-bmp
image/x-MS-bmp
image/jpeg

Choose something that you like. And then ask for the data:

$ xclip -o -target image/png -selection clipboard >foo.png
$ file foo.png 
foo.png: PNG image data, 373 x 309, 8-bit/color RGBA, non-interlaced

Not a big deal. Using xclip to copy image data works the same way, just specify a MIME type using -t.

Large amounts of data

You might have noticed that program 2 aborts if there’s something involved called INCR. This is one of the many hacks in the world of X11 selections.

Properties on windows can only hold a limited amount of data, because they live in the memory of the X server. If you want to transfer several megabytes by using selections, you can still do that. You just have to chunk your data and client B must read data in chunks. Usually, the size of each chunk is about 256 kB. Not that much, but sufficient in most cases. It makes clients more complicated, though, because each client must implement that chunking mechanism.
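The chunking itself is not the hard part. Here is a sketch in plain C of how data could be cut into pieces, with an empty chunk marking the end, which mirrors how an INCR transfer is terminated by zero-length data. The function names chunk_count() and get_chunk() are invented for this example, and the real protocol additionally involves property change events going back and forth between the two clients, which this sketch leaves out completely.

```c
#include <string.h>

/* Number of non-empty chunks needed for "total" bytes. Returns 0 for
 * a chunk size of 0, which makes no sense. */
size_t
chunk_count(size_t total, size_t chunk_size)
{
    if (chunk_size == 0)
        return 0;

    return (total + chunk_size - 1) / chunk_size;
}

/* Copy chunk number "index" of "data" into "out" and return its
 * length. "out" must have room for chunk_size bytes. A return value
 * of 0 means "no more data". */
size_t
get_chunk(const char *data, size_t total, size_t chunk_size,
          size_t index, char *out)
{
    size_t offset, len;

    if (chunk_size == 0)
        return 0;

    offset = index * chunk_size;
    if (offset >= total)
        return 0;

    len = total - offset;
    if (len > chunk_size)
        len = chunk_size;

    memcpy(out, data + offset, len);
    return len;
}
```

The receiving side would call get_chunk() with index 0, 1, 2, and so on, append each piece to its own buffer, and stop as soon as it sees a length of 0.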

Clipboard managers

In your everyday work, you might have noticed this: You open a window, select some text, hit Ctrl+C, and then close the window. What happens? The selection is lost. Of course it is, the client window that owned the selection is gone. This is different from other operating systems. And even if all operating systems worked like that, it would still be annoying.

There is no “clean” solution to this problem in X11. Instead, ICCCM suggests the use of clipboard managers. They work like this:

This feels like there are many race conditions involved. It will also break when a client does not support the TARGETS target. Yes, supporting this target is required by ICCCM, so it “should” work.

Summary

I think it’s important to understand that the X server is just a broker. Clients talk to each other (via the server), exchanging content. There is no clipboard “inside” of the server. Data is converted on the fly. You can have as many selections as you like, but not all clients support all of them.

One final thing to note: At first sight, selections in X11 appear to be simple. I fear, though, that they are almost as complicated as time zones. Even the “standard” utility xclip isn’t strictly ICCCM-compliant and contains the occasional “FIXME”. There are many race conditions and many corner cases.

tl;dr: If possible, use a library.