Xpra: Ticket #306: Windows client not responding to server pings and timing out after 60 seconds

This is an odd one. So far there is one, and only one, Windows client that drops xpra sessions after 60 seconds: the server disconnects it because of a client ping timeout.

(Other clients seem to work fine. The machine in question, however, has been dropping xpra sessions repeatedly for some weeks now with every new version of xpra. What could be different?)

The client still does this after uninstalling xpra, uninstalling the anti-virus software and disabling the Windows firewall, then re-installing xpra.

It happens with both the 0.9.0 beta and the 0.8.8 stable xpra Windows clients when connecting to a 0.9.0 Fedora xpra server. With a 0.8.5 Fedora server, however, the sessions don't drop.

--enable-pings doesn't have any effect (server or client-side). Sound and video are working, until the 60 second timeout occurs. Windows clients no longer have the option of turning off sound (short of disabling pulseaudio).

It looks like there is a problem, sometimes, with the client-side ping response to the server. Maybe a patch that includes an option for debugging output for ping response client-side would provide more insight?



Tue, 02 Apr 2013 23:28:12 GMT - alas: attachment set

WindowsClient_PingTimeout_Log


Tue, 02 Apr 2013 23:35:41 GMT - alas: attachment set

Xpra0.9.0_droppingBug_Server-side_Log


Wed, 03 Apr 2013 08:59:11 GMT - Antoine Martin: owner changed

I know it's a real pain on win32, but please include the log files as copied and pasted text in code blocks and not as screenshots (which aren't searchable or cut&pastable); use shell redirection if needed. None of the gmail screenshot links work, so please use the attachment feature of trac instead.

Also, please include the full version numbers and always test with the latest stable releases as a baseline (0.8.8 at time of writing).


As for the bug itself, I've mentioned the following things to Smo:

Since the problem is limited to this one machine, I strongly suspect that something in the environment is making it misbehave, so please check other env variables too (%PATH%, etc) and look for things that may clash with our binary (off the top of my head: cygwin, gstreamer, gtk, etc).
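
As a rough aid for that check, here is a minimal Python sketch that lists the %PATH% entries mentioning any of the names above. The function name `find_suspect_path_entries` and the keyword list are illustrative assumptions, not part of xpra, and a match only means "worth a look", not that the entry is the culprit.

```python
import os

# Keywords taken from the suggestion above; extend as needed (assumption).
SUSPECT_KEYWORDS = ("cygwin", "gstreamer", "gtk")


def find_suspect_path_entries(path_value, keywords=SUSPECT_KEYWORDS, sep=os.pathsep):
    """Return (entry, keyword) pairs for PATH entries that mention a keyword."""
    hits = []
    for entry in path_value.split(sep):
        lowered = entry.lower()  # Windows paths are case-insensitive
        for keyword in keywords:
            if keyword in lowered:
                hits.append((entry, keyword))
    return hits


if __name__ == "__main__":
    # Scan the PATH of the current process and report anything suspicious.
    for entry, keyword in find_suspect_path_entries(os.environ.get("PATH", "")):
        print("%s -> matches %r" % (entry, keyword))
```

Running it from the same command prompt that launches the xpra client shows the environment the client actually inherits.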

You can also verify whether the ping-echo is being sent or not using wiki/Debugging (run with XPRA_USE_ALIASES=0 and -z 0 so that the packet names appear as plain text on the wire). You will then see:

XPRA_USE_ALIASES=0 xpra attach -z 0 tcp:127.0.0.1:10000 &
ngrep -d lo "ping" port 10000
##
T 127.0.0.1:57491 -> 127.0.0.1:10000 [AP]
  P.........pingA...=..."
##
T 127.0.0.1:10000 -> 127.0.0.1:57491 [AP]
  P.........ping_echoA...=..."?.B?..?.X.

Just be aware that each end pings the other (which can be confusing).
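
To make the two directions easier to tell apart, here is a toy Python model of the exchange. It is only a sketch: `PingPeer`, `deliver` and the packet tuples are hypothetical and not xpra code, and the 60 second value is simply the timeout reported in this ticket. Each end sends its own "ping", answers the other end's "ping" with a "ping_echo", and considers the other end dead if its own ping stays unanswered for too long.

```python
import time

PING_TIMEOUT = 60  # seconds, the timeout reported in this ticket


class PingPeer(object):
    """Toy model of one end of the connection (hypothetical, not xpra code)."""

    def __init__(self, name, clock=time.time):
        self.name = name
        self.clock = clock
        self.outbox = []              # packets this end has asked to send
        self.unanswered_since = None  # time of the oldest ping still awaiting an echo

    def send_ping(self):
        now = self.clock()
        if self.unanswered_since is None:
            self.unanswered_since = now
        self.outbox.append(("ping", now))

    def handle(self, packet):
        packet_type, timestamp = packet
        if packet_type == "ping":
            # Answer the other end's ping, echoing its timestamp back.
            self.outbox.append(("ping_echo", timestamp))
        elif packet_type == "ping_echo":
            # Our own ping was answered: the other end is alive.
            # (Simplification: any echo clears the pending ping.)
            self.unanswered_since = None

    def timed_out(self):
        if self.unanswered_since is None:
            return False
        return self.clock() - self.unanswered_since >= PING_TIMEOUT


def deliver(sender, receiver):
    """Hand every queued packet from sender to receiver."""
    while sender.outbox:
        receiver.handle(sender.outbox.pop(0))


if __name__ == "__main__":
    server, client = PingPeer("server"), PingPeer("client")
    server.send_ping()
    client.send_ping()       # each end pings the other
    deliver(server, client)  # client sees "ping" and queues "ping_echo"
    deliver(client, server)  # server sees the client's "ping" and its own echo
    deliver(server, client)  # client sees the echo of its own ping
    print(server.timed_out(), client.timed_out())
```

In this model, a client that never delivers its "ping_echo" makes `server.timed_out()` become true once the timeout has elapsed, which is the symptom described in this ticket.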


> Windows clients no longer have the option of turning off sound (short of disabling pulseaudio).

For this, please see ticket/297


Failing everything else, and this is just a wild guess: does this client patch make any difference?

--- src/xpra/client_base.py	(revision 3046)
+++ src/xpra/client_base.py	(working copy)
@@ -342,7 +342,7 @@
             packet_type = self._aliases.get(packet_type)
         handler = self._packet_handlers.get(packet_type)
         if handler:
-            handler(packet)
+            gobject.idle_add(handler, packet)
             return
         handler = self._ui_packet_handlers.get(packet_type)
         if not handler:
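
For anyone unsure what the one-line change does: instead of calling the handler straight away in whichever thread parsed the packet, it queues the call so that the main loop runs it later. The sketch below illustrates that difference with a stand-in queue rather than the real gobject main loop. `ToyMainLoop`, `make_handler` and `network_thread` are hypothetical names, and it assumes the dispatch code can be reached from a thread other than the main one. Note also that the real idle_add keeps re-running a callback for as long as it returns True, which this stand-in does not model.

```python
import threading
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


class ToyMainLoop(object):
    """Stand-in for the main loop (hypothetical, for illustration only)."""

    def __init__(self):
        self._pending = queue.Queue()

    def idle_add(self, callback, *args):
        # Safe to call from any thread: it only queues the call.
        self._pending.put((callback, args))

    def run_pending(self):
        # Called from the main thread: run everything queued so far.
        ran = 0
        while True:
            try:
                callback, args = self._pending.get_nowait()
            except queue.Empty:
                return ran
            callback(*args)
            ran += 1


def make_handler(log):
    def handler(packet):
        # Record the packet type and the thread the handler ran in.
        log.append((packet[0], threading.current_thread().name))
    return handler


def network_thread(loop, handler, packet, deferred):
    if deferred:
        loop.idle_add(handler, packet)  # patched behaviour: run later, in the main loop
    else:
        handler(packet)                 # original behaviour: run right here


if __name__ == "__main__":
    loop = ToyMainLoop()
    log = []
    handler = make_handler(log)
    for deferred in (False, True):
        t = threading.Thread(target=network_thread, name="network",
                             args=(loop, handler, ("ping", 0), deferred))
        t.start()
        t.join()
    loop.run_pending()
    print(log)
```

With the direct call the handler runs in the "network" thread; with the deferred call it only runs once the main thread calls `run_pending()`.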

Thu, 04 Apr 2013 13:10:59 GMT - Antoine Martin: description changed

Added some info received via email (...) and edited the bug text to remove the dead links.


Summary:

So the server is responding to the client's pings

(so the network layer still seems to be ok at that point)

But strangely enough, there is not a single ping from the server (and therefore no echo from the client).


Things left to test:

(though this only tells us that we ask for the packet to be sent; there is no guarantee that it does get sent if the network layer is somehow broken)


Once the problematic commit is identified, we then need to figure out what to change on the client side to make it all work again (unless there really is something wrong server-side).


Tue, 09 Apr 2013 22:55:01 GMT - alas: status changed

Without changing anything client-side, but using a new Fedora build server-side that includes debugging tools, the problem seems to have gone into hiding. (Trying it with yet another server also made the problem go into hiding.)

I'll leave the ticket open for a time to see if it reappears.


Fri, 12 Apr 2013 16:03:12 GMT - Antoine Martin: status changed; resolution set

Fixed - see #308


Sat, 23 Jan 2021 04:51:13 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/306