Xpra: Ticket #2690: Newly-launched window sometimes doesn't respond to keyboard input

Hi,

Client: Windows 7, v4.0-r25796 Server: RHEL 7.7, v3.0.5-r24939

When I launch an X11 application that opens in its own window, sometimes the new window doesn't respond to keyboard input. This seems to happen most often with gitk (around 40% of the time). (I have also seen it with gvim, but very rarely - maybe 1%.)

Steps to reproduce:

Expected behaviour: gitk exits when you type Ctrl-Q. (If this happens then re-run gitk and try again. In my environment the issue reliably reproduces within 5 attempts)

Actual behavour: gitk doesn't respond to the Ctrl-Q, or any other keyboard input. Mouse input still works. If you switch focus to another window then back to gitk, the keyboard works. (But just clicking on the gitk window isn't enough.)

I was also able to reproduce this issue with client v3.0.8-r25712, but not client v2.4.2-r21077 (still with server v3.0.5-r24939), so it seems to be a regression since v2.4.2.

Let me know if you need more info...



Tue, 31 Mar 2020 05:09:37 GMT - Antoine Martin: owner changed

Can you attach the logs with -d focus,metadata?


Tue, 31 Mar 2020 22:30:25 GMT - pm54389: attachment set


Tue, 31 Mar 2020 22:30:41 GMT - pm54389: attachment set


Tue, 31 Mar 2020 22:36:11 GMT - pm54389: owner changed

Added. It was much harder to reproduce with logging enabled (50+ attempts needed).

The logs contain (server timings):

22:29:36,558 Success case: Launch gitk, Ctrl-Q closes it 22:29:37,598 Failure case: Launch gitk, Ctrl-Q and all other keypresses ignored 22:29:44+ Click on xterm, then gitk, then press Ctrl-Q to close

Client log is from the same time period (clocks aren't in sync).


Wed, 01 Apr 2020 05:18:57 GMT - Antoine Martin: owner changed

Added. It was much harder to reproduce with logging enabled (50+ attempts needed).

Race condition confirmed then!

22:29:36,558 Success case: Launch gitk, Ctrl-Q closes it

22:29:36,782 client   1 @26.817 process_new_common: ...
22:29:36,790 client   1 @26.826 focus-out-event for wid=2
22:29:36,793 client   1 @26.826 focus-in-event for wid=138
22:29:36,794 set_window_state(138, WindowModel(0x1200047), {'focused': True})
#go back and forth:
22:29:36,824 client   1 @26.854 send_focus(0)
22:29:36,824 client   1 @26.855 recheck_focus() wid=2, focused=None, latest=False
22:29:36,824 client   1 @26.856 recheck_focus() wid=138, focused=None, latest=True
22:29:36,825 client   1 @26.856 update_focus(138, True) focused=None, grabbed=None
22:29:36,825 client   1 @26.856 send_focus(138)
22:29:36,825 client   1 @26.857 ClientWindow(138) focus_change('initial',) has-toplevel-focus=True, _been_mapped=True
22:29:36,825 client   1 @26.857 update_focus(138, True) focused=138, grabbed=None
22:29:36,909 process_focus: wid=0
22:29:36,911 sending WM_TAKE_FOCUS: 0x40001b, X11 timestamp=256623776L
22:29:36,911 process_focus: wid=138
22:29:36,911 focus: giving focus to WindowModel(0x1200047)
#window destroyed:
22:29:37,263 scrub_x11() x11 properties=['WM_STATE', '_NET_FRAME_EXTENTS', '_NET_WM_ALLOWED_ACTIONS']
22:29:37,271 world window got focus: <gtk.gdk.Event at 0x7f2c15faa7d8: GDK_FOCUS_CHANGE>, has-toplevel-focus=True[0m

22:29:37,598 Failure case: Launch gitk, Ctrl-Q and all other keypresses ignored:

22:29:37,800 client   1 @27.834 process_new_common: ...
22:29:37,805 client   1 @27.842 focus-out-event for wid=2
22:29:37,810 client   1 @27.843 focus-in-event for wid=139
22:29:37,812 set_window_state(139, WindowModel(0x1200047), {'focused': True})
#go back and forth:
22:29:37,866 client   1 @27.872 send_focus(0)
22:29:37,866 client   1 @27.872 recheck_focus() wid=2, focused=None, latest=False
22:29:37,867 client   1 @27.873 recheck_focus() wid=139, focused=None, latest=True
22:29:37,867 client   1 @27.873 update_focus(139, True) focused=None, grabbed=None
22:29:37,867 client   1 @27.873 send_focus(139)
22:29:37,867 client   1 @27.874 ClientWindow(139) focus_change('initial',) has-toplevel-focus=True, _been_mapped=True
22:29:37,867 client   1 @27.874 update_focus(139, True) focused=139, grabbed=None
22:29:37,898 process_focus: wid=0
22:29:37,899 sending WM_TAKE_FOCUS: 0x40001b, X11 timestamp=256624765L
22:29:37,899 process_focus: wid=139
22:29:37,900 focus: giving focus to WindowModel(0x1200047)
22:29:37,904 world window got focus: <gtk.gdk.Event at 0x7f2bfa2c85a8: GDK_FOCUS_CHANGE>, has-toplevel-focus=True
22:29:38,537 after_set_focus
22:29:38,537 reset_x_focus: widget with focus: DesktopManager(6)

I think that the reset_x_focus is the problem. It happens after we've focused the correct window. (139)

This brings us to this piece of code documentation from the original xpra release from 2008:

#'' This requires a very long comment, because focus management is teh awesome.
# The basic problems are:
#    1) X focus management sucks
#    2) GDK/GTK know this, and sensibly avoids it
# (1) is a problem by itself, but (2) makes it worse, because we have to wedge
# them together somehow anyway.
#
# In more detail: X tracks which X-level window has (keyboard) focus at each
# point in time.  This is the window which receives KeyPress and KeyRelease
# events.  GTK also has a notion of focus; at any given time (within a
# particular toplevel) exactly one widget is focused.  This is the widget
# which receives key-press-event and key-release-event signals.  However,
# at the level of implementation, these two ideas of focus are actually kept
# entirely separate.  In fact, when a GTK toplevel gets focus, it sets the X
# input focus to a special hidden window, reads X events off of that window,
# and then internally routes these events to whatever the appropriate widget
# would be at any given time.
#
# The other thing which GTK does with focus is simply tweak the drawing style
# of widgets.  A widget that is focused within its toplevel can/will look
# different from a widget that does not have focus within its toplevel.
# Similarly, a widget may look different depending on whether the toplevel
# that contains it has toplevel focus or not.
#
# Unfortunately, we cannot read keyboard events out of the special hidden
# window and route them to client windows; to be a proper window manager, we
# must actually assign X focus to client windows, while pretending to GTK+
# that nothing funny is going on, and our client windows are just ordinary
# widgets.
#
# So there are a few pieces to this.  Firstly, GTK tracks focus on toplevels
# by watching for focus events from X, which ultimately come from the window
# manager.  Since we *are* the window manager, this is not particularly
# useful.  Instead, we create a special subclass of gtk.Window that fills the
# whole screen, and trick GTK into thinking that this toplevel *always* has
# (GTK) focus.
#
# Then, to manage the actual X focus, we do a little dance, watching the GTK
# focus within our special toplevel.  Whenever it moves to a widget that
# actually represents a client window, we send the X focus to that client
# window.  Whenever it moves to a widget that is actually an ordinary widget,
# we take the X focus back to our special toplevel.
#
# Note that this means that we do violate our overall goal of making client
# window widgets indistinguishable from ordinary GTK widgets, because client
# window widgets can only be placed inside this special toplevel, and this
# toplevel has special-cased handling for our particular client-wrapping
# widget.  In practice this should not be a problem.
#
# Finally, we have to notice when the root window gets focused (as it can when
# a client misbehaves, or perhaps exits in a weird way), and regain the
# focus.  The Wm object is actually responsible for doing this (since it is
# responsible for all root-window event handling); we just expose an API
# ('reset_x_focus') that people should call whenever they think that focus may
# have gone wonky.

This still applies today, as I do not dare touch this code.

Then there's also this:

# FIXME: This would better be a default handler, but there is a bug in
# the superclass's default handler that means we can't call it
# properly[0], so as a workaround we let the real default handler run,
# and then come in afterward to do what we need to.  (See also
# Viewport._after_set_focus_child.)
#   [0] http://bugzilla.gnome.org/show_bug.cgi?id=462368

[0]: GtkContainer.do_set_focus_child should map None argument to (GtkWidget*)0 The fix has been added over 10 years ago (in GTK2), so maybe we can drop the workaround?


Now, client side we also have the problem that the window manager will send us a focus-out event just before the focus-in event. Sometimes more than one pair too... To avoid focusing the "world" background window before focusing the correct one, we recheck the focus via a timer (code added in r12645 - over 4 years ago).

So, I believe that waiting a little bit longer before re-checking the focus is likely to "fix" this race condition. Please try 4.0-r25893 or later. (r25894 makes the delay configurable) The more correct fix would be to figure out how to avoid those focus errors whilst still bouncing the focus back and forth between various windows, but that's much harder: those X11 focus events are asynchronous.


Thu, 02 Apr 2020 22:18:17 GMT - pm54389: owner changed

Thanks for looking at this.

I tested with client v4.0-r25898 and it seems to make no difference - I'm still seeing the issue with the same frequency as before. Setting XPRA_FOCUS_RECHECK_DELAY to values in the range 2-100 also seems to make no difference.

However, setting XPRA_FOCUS_RECHECK_DELAY=1 seems to make it happen less frequently - not a complete fix but it's an improvement!

BTW, I noticed a way to reliably reproduce this without gitk - the command xterm -geometry 200x80 & has the same issue. i.e. when I run xterm -geometry 200x80 &, the new xterm sometimes doesn't get keyboard focus (maybe 20% of the time). I can't reproduce it with just xterm though...


Fri, 03 Apr 2020 04:28:29 GMT - Antoine Martin: owner, status changed

when I run xterm -geometry 200x80 &, the new xterm sometimes doesn't get keyboard focus (maybe 20% of the time). I can't reproduce it with just xterm though...

Weird! Thanks, I'll try.

However, setting XPRA_FOCUS_RECHECK_DELAY=1 seems to make it happen less frequently - not a complete fix but it's an improvement!

As of r25915, you can use negative values for the delay to run the check synchronously. A value of zero should be the same as it was before the introduction of XPRA_FOCUS_RECHECK_DELAY.


Tue, 14 Apr 2020 13:46:47 GMT - Antoine Martin: owner, status changed

There's a very good chance that this has been fixed in r26079 + r26082, see #2723.


Tue, 21 Apr 2020 23:31:25 GMT - pm54389: owner changed

Upgraded client to v4.0-r26160 and ran with XPRA_FOCUS_RECHECK_DELAY unset, and can confirm this is now fixed. Thanks!

Leaving this ticket open in case you want to use it to track removal of the earlier attempts at fixing, in case you don't want to keep them.


Wed, 22 Apr 2020 02:40:31 GMT - Antoine Martin: status changed; resolution set

Leaving this ticket open in case you want to use it to track removal of the earlier attempts at fixing, in case you don't want to keep them.

Thanks!

This fix was included in the 3.0.9 release.


Sat, 23 Jan 2021 05:58:11 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/2690