Disclaimer: This article revolves around my experience with DWM and its interaction with the underlying X Window System (which is considered deprecated). You may not get much value out of this unless you are using dwm or X11 like me. I wrote this because I enjoyed learning about it, and I hope you do too.
What happens when you press a key? (in X)
I had always wanted to try out tiling window managers. It was around Christmas, I was on holiday, and I had some free time. I installed dwm and had a session running in under 20 minutes. Yayy!
I was about to configure it. The main configurations available can be classified as appearance-related and keybinding-related.
Appearance-wise, tiling WMs split the screen into two sections, the main area and the stack area, so that we have a side-by-side view of multiple open windows. (My steps towards becoming a 10x engineer 😜)
Keybinding-wise, dwm has hotkeys to manage the windows on the screen, resize them, quit dwm, etc. This is where I hit a roadblock, because I didn’t understand the code that enabled these keybindings. (In my defense, the code is quite complex and it’s been several years since I last looked at C programs 👀). In this post, I use some basic sample code to understand what happens when we press a key, apart from the character appearing on the screen.
Introducing X Window System
As mentioned earlier, DWM is a tiling window manager. It is responsible for arranging windows into the main area and the stack area, organising them into desktop spaces, and bringing them on screen when we switch desktops.
It is, however, not responsible for managing input from the keyboard or mouse. The X Window System (X11) does that.
X follows a client-server architecture, and the client and server can even run on separate machines. Several clients (the programs whose windows we see on the screen) can connect to the server and subscribe to input events. In this context, DWM itself is just another client of the X server, one that manages the windows of the other clients.
Let’s look at a small sample program that gives a basic idea of the keybinding-related code in DWM.
Sample X Client program
Pro tip: If you are also out of touch with C programs like me, you can skip to the headings below to easily make sense of what the code does.
There are 4 main parts in the client program.
setup
XOpenDisplay establishes a connection to the X server. We create a simple window inside the RootWindow. (Think of the RootWindow as the entire monitor.)
We map the window to the display so that it becomes visible.
input listener
Our client program subscribes to all key press events that the server receives when the window is focused (selected).
run
The main loop of the program, which prints key information about the event.
cleanup
We close the connection to the X server.
From this, it is obvious that the answer to our question lies in the run function. Let’s zoom in a little to get the full picture.
The Answer
We have 3 main library functions in the loop:
XNextEvent
This blocks until an event is available, then copies the next event from the display’s queue into the event variable that we pass in.
XLookupString
This takes a key event, then writes the printable character (if any) to the buffer and the keysym to the keysym variable. It also factors in the context of the keypress, such as which modifiers are held.
XKeysymToString
This takes a KeySym and returns a string with the name that best denotes it.
We print all of these values so that we can inspect them.
Now, let’s run the program.
Below are the keys I pressed in sequence while running the program:
a
Left Shift
a (with Left Shift held from the previous key press)
1 on the Numpad (with Numlock toggled off)
Num Lock
1 on the numpad again
1 (next to the tilde (`~`) symbol)
Esc
Output:
KeyPressed  KeyCode  KeySym  Buffer
a           38       97      a
Shift_L     50       65505   N/A
A           38       65      A
KP_End      87       65436   N/A
Num_Lock    77       65407   N/A
KP_1        87       65457   1
1           10       49      1
Escape      9        65307
We can see that the first and third rows have the same keycode.
The KeySym, however, changes. (You should be having a 💡 next to your head by now)
We can see that the sixth and seventh rows have the same printable character (buffer), although the keycodes and keysyms differ.
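To see why the sixth and seventh rows end up with the same buffer, here is a rough sketch of the keysym-to-character step. It is a toy model under my own assumptions, not what XLookupString really does internally: toy_keysym_to_char is a made-up helper, and it only covers printable ASCII keysyms (where the keysym equals the character code) and the keypad digits KP_0 to KP_9 (keysyms 0xffb0 to 0xffb9).

```c
/* Standard X11 keysym values for the keypad digits KP_0..KP_9. */
#define TOY_KP_0 0xffb0UL
#define TOY_KP_9 0xffb9UL

/* Hypothetical helper (not an Xlib call): returns the printable character
 * for a keysym, or 0 when this toy model knows of none. */
char toy_keysym_to_char(unsigned long keysym)
{
    /* Printable ASCII: the keysym is simply the character code. */
    if (keysym >= 0x20UL && keysym <= 0x7eUL)
        return (char)keysym;

    /* Keypad digits map back to the ordinary digits '0'..'9'. */
    if (keysym >= TOY_KP_0 && keysym <= TOY_KP_9)
        return (char)('0' + (keysym - TOY_KP_0));

    /* Shift_L, Num_Lock, KP_End and friends have nothing to print. */
    return 0;
}
```

With this, toy_keysym_to_char(49) (the 1 key) and toy_keysym_to_char(65457) (KP_1) both give '1', while 65505 (Shift_L) and 65436 (KP_End) give 0, which lines up with the N/A rows in the output.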
Let’s define these terms.
Keycode
Keycodes represent the physical keys on the keyboard. Each physical key is assigned a number at startup, and that number doesn’t change thereafter.
Keysym
Keysyms are logical numbers for keys. They are derived from the context of the keypress, such as the state of the modifier keys (Shift, Num Lock, Caps Lock, etc.). The same physical key may produce different keysyms in different contexts.
X maintains both of these mappings and uses them to compute the symbol to be emitted.
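The keycode-to-keysym step can be pictured as a table lookup that also takes the modifier state into account. The sketch below is only a toy model: the struct, the TOY_ flags and toy_lookup_keysym are names I made up, the table holds just two keys using the numbers from the output above, and the real rules in X cover many more cases (Caps Lock, Shift combined with Num Lock, and so on).

```c
#include <stddef.h>

/* Toy modifier flags (hypothetical; not Xlib's ShiftMask / Mod2Mask). */
enum { TOY_SHIFT = 1, TOY_NUMLOCK = 2 };

struct toy_key {
    int keycode;            /* physical key */
    unsigned long plain;    /* keysym with no modifier */
    unsigned long shifted;  /* keysym with Shift held (0 = not modelled) */
    unsigned long numlock;  /* keysym with Num Lock on (0 = not modelled) */
};

/* Values taken from the program output shown earlier. */
static const struct toy_key toy_keymap[] = {
    { 38, 97,    65, 0     },  /* a / A        */
    { 87, 65436, 0,  65457 },  /* KP_End / KP_1 */
};

/* Returns the keysym for a keycode in the given modifier state,
 * or 0 when the keycode is not in the toy table. */
unsigned long toy_lookup_keysym(int keycode, unsigned state)
{
    for (size_t i = 0; i < sizeof toy_keymap / sizeof toy_keymap[0]; i++) {
        const struct toy_key *k = &toy_keymap[i];
        if (k->keycode != keycode)
            continue;
        if ((state & TOY_NUMLOCK) && k->numlock)
            return k->numlock;
        if ((state & TOY_SHIFT) && k->shifted)
            return k->shifted;
        return k->plain;
    }
    return 0;
}
```

Keycode 38 gives 97 (a) on its own and 65 (A) with TOY_SHIFT, and keycode 87 gives 65436 (KP_End) or 65457 (KP_1) depending on TOY_NUMLOCK. Same physical key, different keysym.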
Shell Utilities
X provides several CLI tools for us to interact with it. I have taken two commands to demonstrate the mappings and events we discussed in our program.
xmodmap
There are 7 KeySyms per KeyCode; KeyCodes range from 8 to 255.
KeyCode  Keysym (Keysym) ...
Value    Value  (Name)   ...
... clipped many lines ...
24  0x0071 (q)            0x0051 (Q)          0x0071 (q)            0x0051 (Q)
25  0x0077 (w)            0x0057 (W)          0x0077 (w)            0x0057 (W)
26  0x0065 (e)            0x0045 (E)          0x0065 (e)            0x0045 (E)
27  0x0072 (r)            0x0052 (R)          0x0072 (r)            0x0052 (R)
28  0x0074 (t)            0x0054 (T)          0x0074 (t)            0x0054 (T)
29  0x0079 (y)            0x0059 (Y)          0x0079 (y)            0x0059 (Y)
30  0x0075 (u)            0x0055 (U)          0x0075 (u)            0x0055 (U)
31  0x0069 (i)            0x0049 (I)          0x0069 (i)            0x0049 (I)
32  0x006f (o)            0x004f (O)          0x006f (o)            0x004f (O)
33  0x0070 (p)            0x0050 (P)          0x0070 (p)            0x0050 (P)
34  0x005b (bracketleft)  0x007b (braceleft)  0x005b (bracketleft)  0x007b (braceleft)
... clipped many lines ...
The leftmost column shows the keycode.
The second column shows the primary keysym (the lowercase character), which is produced when the key is pressed without any modifiers.
The third column shows the keysym and character produced when the key is pressed with the Shift modifier.
The fourth column shows the keysym and character produced when the key is pressed with Mode_switch (sometimes mapped to the AltGr key). Some keyboards can emit more than two characters from the same key.
The fifth column shows Mode_switch + Shift, and so on.
We can also use this utility to change the key bindings at startup.
For instance, it can swap the left and right buttons on a mouse. (One way to impress our left-handed friends 😉)
xev
xev is a tool that prints all the input events happening in a particular window (like our C program).
Notice how the LeaveNotify, EnterNotify, and MotionNotify events are shown for movements of the cursor in and around the window. The KeyPress and KeyRelease events can also be seen.
Seems cool, right? That’s how we can see all the input events tracked by X.
I hope this gives a better idea of how X handles input events.
Moving forward
Coming back to the roadblock (or rather, the little speed bump) we faced earlier in understanding DWM’s keybinding-related code: DWM follows a somewhat similar approach.
DWM asks the X server to send it only specific input events.
The only difference is that it uses XGrabKey and XGrabButton. These functions let a client tell the X server to always route certain keystrokes and button presses to it. This differs from XSelectInput, which is a generic subscription to events on a window and, for key presses, only delivers them while that window is focused. XGrabKey can be used to define hotkeys, which is exactly what DWM uses it for. DWM registers these keybindings on the RootWindow itself, which lets it manipulate the windows inside it regardless of which window is currently focused.
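The difference between a grab and a plain subscription can be sketched as a routing decision on the server side. This is a toy model with made-up names (toy_grab, toy_route_keypress, the TOY_MOD_ flags), not the X server’s actual logic: it assumes a list of key grabs registered on the root window, and sends a key press to the grabbing client only when both the keycode and the exact modifier set match.

```c
#include <stddef.h>

/* Toy modifier flags (hypothetical names). */
enum { TOY_MOD_SHIFT = 1, TOY_MOD_ALT = 8 };

/* One registered hotkey: a keycode plus the exact modifiers required. */
struct toy_grab {
    int keycode;
    unsigned modifiers;
};

enum toy_target {
    TOY_TO_FOCUSED_WINDOW,   /* normal delivery, as with XSelectInput */
    TOY_TO_GRABBING_CLIENT   /* hotkey delivery, as with XGrabKey */
};

/* Decides who receives a key press: the client holding a matching grab,
 * or whichever window currently has focus. */
enum toy_target toy_route_keypress(const struct toy_grab *grabs, size_t n,
                                   int keycode, unsigned modifiers)
{
    for (size_t i = 0; i < n; i++) {
        if (grabs[i].keycode == keycode && grabs[i].modifiers == modifiers)
            return TOY_TO_GRABBING_CLIENT;
    }
    return TOY_TO_FOCUSED_WINDOW;
}
```

For example, with a single grab on keycode 24 (q in the xmodmap table above) plus TOY_MOD_ALT | TOY_MOD_SHIFT, that exact combination goes to the grabbing client no matter which window is focused, while a plain q still lands in the focused window.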
Happy hacking!