Opened 4 months ago

Closed 3 months ago

Last modified 3 months ago

#1853 closed defect (worksforme)

After connection timeout, Xpra-Launcher.exe does not terminate but stays as a zombie process

Reported by: Lukas Haase
Owned by: Lukas Haase
Priority: major
Milestone: 2.4
Component: client
Version: 2.3.x
Keywords:
Cc:

Description (last modified by Antoine Martin)

For quite a long time (since at least 2.1.3, r17250), and still with the newest 2.3 (r19246), I have been suffering from the following issue:

On Windows 10, when the connection to an xpra server drops, I get the usual message (TortoisePlink Fatal Error Network error: Software caused connection abort). I then quit the Xpra dialog (which invites me to reconnect), but Xpra-Launcher.exe stays in the Task Manager as a zombie process until I kill it manually.

Sometimes I use xpra so regularly that I end up with hundreds of zombies.

This effect does not show up when I manually quit the session via the context menu.
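
As a stopgap until the client exits on its own, the leftover processes can be listed and terminated in bulk. The following is a minimal sketch, not part of xpra: it assumes Python 3.7+ on Windows and relies only on the standard tasklist and taskkill commands. The function names are made up for illustration.

```python
import csv
import io
import subprocess

IMAGE_NAME = "Xpra-Launcher.exe"  # process name taken from the report


def parse_tasklist_csv(output):
    """Return the PIDs found in `tasklist /FO CSV /NH` output."""
    pids = []
    for row in csv.reader(io.StringIO(output)):
        # Data rows look like:
        # "Image Name","PID","Session Name","Session#","Mem Usage"
        # The "no tasks match" message has a single field and is skipped.
        if len(row) >= 2 and row[1].strip().isdigit():
            pids.append(int(row[1]))
    return pids


def list_leftover_pids(image_name=IMAGE_NAME):
    """Ask Windows for every running process with the given image name."""
    result = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq " + image_name, "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout)


def kill_leftovers(image_name=IMAGE_NAME):
    """Force-terminate each matching process; returns the PIDs targeted."""
    pids = list_leftover_pids(image_name)
    for pid in pids:
        subprocess.run(["taskkill", "/F", "/PID", str(pid)], check=False)
    return pids
```

Calling kill_leftovers() terminates every Xpra-Launcher.exe it finds, including any session that is still in use, so it should only be run when no xpra session is open.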

I start the session via an xpra file that has for example the following content:

username=user
encoding=rgb
ssh_port=22
speed=0
min-speed=0
host=terminal-server
min-quality=30
mode=ssh
ssh=plink -noagent -i c:\\Users\\user\\AppData\\Roaming\\Xpra\\pubkey.ppk
opengl=yes
password=
quality=0
port=2012
autoconnect=True
speaker=off
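
To inspect such a session file outside of xpra, a small reader is enough. This is only a sketch: it assumes the file is plain key=value lines as shown above and does not claim to match xpra's own parser. The function name is hypothetical. The ssh entry is the one worth checking here, since it names plink without a directory.

```python
def parse_session_file(text):
    """Parse key=value lines into a dict; blank or malformed lines are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

With the file contents read into a string, parse_session_file(text)["ssh"] returns the full plink command line and parse_session_file(text)["mode"] returns the connection mode.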

In case it helps, I have attached screenshots of Process Explorer showing the threads and the stacks of such a zombie process. NtUserMsgWaitForMultipleObjectsEx suggests to me that it is waiting for something (i.e. feedback from the connection) that never arrives.

Attachments (3)

threads.png (21.4 KB) - added by Lukas Haase 4 months ago.
stack_msvcrt.png (7.2 KB) - added by Lukas Haase 4 months ago.
stack_xpra_launcher.png (21.6 KB) - added by Lukas Haase 4 months ago.

Change History (12)

Changed 4 months ago by Lukas Haase

Attachment: threads.png added

Changed 4 months ago by Lukas Haase

Attachment: stack_msvcrt.png added

Changed 4 months ago by Lukas Haase

Attachment: stack_xpra_launcher.png added

comment:1 Changed 4 months ago by Antoine Martin

Description: modified (diff)
Milestone: 2.4
Status: new → assigned

Thanks for the detailed bug report, I'll try to reproduce.
The NtUserMsgWaitForMultipleObjectsEx call is somewhere in glib, so we'll have to find a way to cancel it, or just avoid getting into that situation in the first place.

comment:2 Changed 4 months ago by Antoine Martin

Owner: changed from Antoine Martin to Lukas Haase
Status: assigned → new

Should be fixed in r19412. r19390 is also worth having.

AFAICT, this bug has been present forever.

Please try one of the latest beta 2.4 builds from https://xpra.org/beta/windows.
If the fix works for you, this can be included in the 2.3.1 release.

Last edited 4 months ago by Antoine Martin

comment:3 Changed 4 months ago by Lukas Haase

Antoine, thanks for working on this so quickly. It's a pleasure to see how much effort is put into Xpra!

Just a quick question: what are the differences between the files at https://xpra.org/beta/windows/ ?

My assumptions:
1.) Xpra_Setup_2.4-r19414M.exe = Xpra_Setup.exe
2.) Xpra_2.4-r19414M.zip = Xpra.zip
3.) Xpra.zip: no installer

What's left:
1.) Xpra vs. Xpra-Client?
2.) Xpra vs. Xpra-python3 (and Xpra-Client vs. Xpra-Client-Python3)?

On the topic: Can it be that something more profound changed for this beta? When I open my connection via xpra file, as usual, I get back to the connection window with the error: "SSH connection failure". When I try with command line:

Xpra_cmd.exe attach ssh:me@server.edu:1985
2018-05-26 23:29:47,112 Xpra gtk2 client version 2.3-r19246 64-bit
2018-05-26 23:29:47,115  running on Microsoft Windows 10
2018-05-26 23:29:47,354 GStreamer version 1.14.0 for Python 2.7.15 64-bit
2018-05-26 23:29:47,518 OpenGL_accelerate module loaded
2018-05-26 23:29:47,532 Using accelerated ArrayDatatype
2018-05-26 23:29:47,885 Warning: vendor 'Intel' is greylisted,
2018-05-26 23:29:47,886  you may want to turn off OpenGL if you encounter bugs
2018-05-26 23:29:47,940 OpenGL enabled with Intel(R) HD Graphics 4000
2018-05-26 23:29:47,956  desktop size is 1366x768 with 1 screen:
2018-05-26 23:29:47,956   Default (361x203 mm - DPI: 96x96) workarea: 1366x738
2018-05-26 23:29:47,957     DISPLAY1 (277x156 mm - DPI: 125x125)
2018-05-26 23:29:47,967  keyboard settings: layout=us
2018-05-26 23:29:48,626 Error: failed to receive anything, not an xpra server?
2018-05-26 23:29:48,628   could also be the wrong protocol, username, password or port
2018-05-26 23:29:48,631 Connection lost

Server says:

$ xpra --version
xpra v1.0.2-r14941

(I can imagine that the server might be pretty old but it has to run on an old RedHat machine. Besides, I don't have root rights)

Last edited 4 months ago by Antoine Martin

comment:4 Changed 4 months ago by Antoine Martin

> My assumptions:
> (..)

Correct.

  • the "client" MS Windows builds cannot be used to run a shadow server
  • the "python3" builds are a work in progress; on MS Windows the port is mostly complete and those should work almost as well as the regular (python2) builds: #1818
  • there are also "x86_64" builds (added just now, but usually present), which are the default and should be preferred over the 32-bit builds, as they are faster and safer

More details can be found here: wiki/Download.

> On the topic: Can it be that something more profound changed for this beta?
> When I open my connection via xpra file, as usual, I get back to the connection window with the error: "SSH connection failure".

The only change related to SSH connections is this fix: r19430.
It shouldn't have any effect if you don't specify a password to use, and the build number you are showing doesn't have this change:

2018-05-26 23:29:47,112 Xpra gtk2 client version 2.3-r19246 64-bit

This looks like a standard 2.3 release build, not the latest beta, which should be 2.4-r19414 or later (r19487 was added to the repository today).
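
The same check can be done mechanically on the first log line. The sketch below is an illustration only: the regular expression is written against the banner format visible in the log quoted above, and the function names are hypothetical. Comparing revision numbers is a rough test that only makes sense for builds from the same branch.

```python
import re

# Matches e.g. "Xpra gtk2 client version 2.3-r19246 64-bit"
BANNER = re.compile(
    r"Xpra \S+ client version (?P<version>[\d.]+)-r(?P<revision>\d+)"
)


def parse_client_banner(line):
    """Return (version, revision) from a client banner line, or None."""
    match = BANNER.search(line)
    if match is None:
        return None
    return match.group("version"), int(match.group("revision"))


def has_revision(line, required):
    """True if the banner's revision number is at least `required`."""
    parsed = parse_client_banner(line)
    return parsed is not None and parsed[1] >= required
```

For the banner in the log above, parse_client_banner() yields version 2.3 and revision 19246, so has_revision(line, 19412) is false: that client predates the fix.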

To debug SSH connection issues, you can set the environment variable: XPRA_SSH_DEBUG=1.

> xpra v1.0.2-r14941

The current 1.x version is 1.0.11, there have been many critical bug fixes since 1.0.2 (crashes, etc)... use at your own risk!

comment:5 Changed 4 months ago by Lukas Haase

> To debug SSH connection issues, you can set the environment variable: XPRA_SSH_DEBUG=1.

set XPRA_SSH_DEBUG=1 before calling Xpra_cmd.exe does not do anything for me, unfortunately.

However, I have now run:

xpra_cmd -d all attach ssh/me@server/1985

Does this help at all?

[...]
2018-05-27 01:47:03,332 callbacks for event WM_DWMNCRENDERINGCHANGED: None
2018-05-27 01:47:03,333 WM_DWMNCRENDERINGCHANGED: 1 / 0
2018-05-27 01:47:03,340 DefWindowProc(12981958, 799L, 1L, 0)=0
2018-05-27 01:47:03,341 NotifyIconWndProc(22877740, 799L, 1L, 0) instance=<xpra.platform.win32.win32_NotifyIcon.win32NotifyIcon object at 0x09fcb690>, message(799)=None
2018-05-27 01:47:03,342 io_thread_loop(read, <bound method Protocol._read of Protocol(Pipe(ssh://me@server/:1985))>) loop starting
2018-05-27 01:47:03,737 read_parse_thread_loop starting
2018-05-27 01:47:03,738 read thread: eof
2018-05-27 01:47:03,744 parse thread: empty marker, exiting
2018-05-27 01:47:03,745 io_thread_loop(read, <bound method Protocol._read of Protocol(Pipe(ssh://me@server/:1985))>) loop ended, closed=False
2018-05-27 01:47:03,746 Protocol.close() closed=False, connection=Pipe(ssh://me@server/:1985)
2018-05-27 01:47:03,748 Protocol.close() calling <bound method TwoFileConnection.close of Pipe(ssh://me@server/:1985)>
2018-05-27 01:47:03,749 Pipe(ssh://me@server/:1985).close() close callback=<function stop_tunnel at 0x0a051d30>, readable=<open file '<fdopen>', mode 'rb' at 0x0a003e90>, writeable=<open file '<fdopen>', mode 'wb' at 0x0a003ee8>
2018-05-27 01:47:03,749 Pipe(ssh://me@server/:1985).close() calling <function stop_tunnel at 0x0a051d30>
2018-05-27 01:47:03,750 Pipe(ssh://me@server/:1985).close() done
2018-05-27 01:47:03,754 terminate_queue_threads()
2018-05-27 01:47:03,755 write thread: empty marker, exiting
2018-05-27 01:47:03,755 Protocol.close() done
2018-05-27 01:47:03,756 Protocol.close() closed=True, connection=None
2018-05-27 01:47:03,757 check_server_echo(0) last=True, server_ok=True (last_ping_echoed_time=0)
2018-05-27 01:47:03,757 io_thread_loop(write, <bound method Protocol._write of Protocol(None)>) loop ended, closed=True
2018-05-27 01:47:03,758 Error: failed to receive anything, not an xpra server?
2018-05-27 01:47:03,759   could also be the wrong protocol, username, password or port
2018-05-27 01:47:03,760 Connection lost
2018-05-27 01:47:03,760 GTKXpraClient.quit(1) current exit_code=None
2018-05-27 01:47:03,761 UIXpraClient.cleanup()
[...]

> The current 1.x version is 1.0.11, there have been many critical bug fixes since 1.0.2 (crashes, etc)... use at your own risk!

I just saw we're using the winswitch.repo (http://winswitch.org/dists/CentOS). Maybe a yum update or so is sufficient. I'll check with my sysadmin. (System is CentOS 6.8)

Last edited 4 months ago by Lukas Haase

comment:6 Changed 4 months ago by Antoine Martin

> set XPRA_SSH_DEBUG=1 before calling Xpra_cmd.exe does not do anything for me, unfortunately.

This will log the ssh command string used, but you've omitted most of the logs, so it cannot be seen.

> Does this help at all?
> (..)

No, the ssh connection bits would be near the top, which is missing.
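
One way to avoid losing the top of the log is to write the whole output to a file and attach that. The following is a minimal sketch under stated assumptions: Python 3 is available on the client machine, the helper names are hypothetical, and the only xpra-specific parts are the XPRA_SSH_DEBUG variable and the command line already used in this ticket.

```python
import os
import subprocess


def debug_env(base=None):
    """Copy the environment and switch on xpra's SSH debug logging."""
    env = dict(os.environ if base is None else base)
    env["XPRA_SSH_DEBUG"] = "1"
    return env


def run_and_capture(command, log_path):
    """Run `command`, writing stdout and stderr to `log_path` from the start."""
    with open(log_path, "wb") as log:
        proc = subprocess.run(
            command,
            env=debug_env(),
            stdout=log,
            stderr=subprocess.STDOUT,  # keep both streams in one file
        )
    return proc.returncode
```

For example, run_and_capture(["Xpra_cmd.exe", "-d", "all", "attach", "ssh:me@server.edu:1985"], "xpra-attach.log") would leave the complete log, including the ssh command string near the top, in xpra-attach.log.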

> Maybe a yum update or so is sufficient.

It should be enough.
There are occasional dependency issues with older distros like CentOS 6.8 (e.g. ffmpeg rpm soname version conflicts); let me know if you hit any.

comment:7 Changed 3 months ago by Antoine Martin

Resolution: worksforme
Status: new → closed

comment:8 Changed 3 months ago by Lukas Haase

After some debugging, I think I have tracked it down: there seems to be a difference in how the PATH is resolved and hence in which plink.exe is used.

When I just enter plink.exe in the command prompt, I get TortoisePlink, Release 0.68. Strangely enough, this one does not work: when I execute plink user@server, it returns immediately.

It seems that the old xpra version always used the plink.exe that's shipped with it (in the same directory) whereas the new one uses the one in the path first which is fairly strange! I will create a new bug report for this one.
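
To see which plink.exe a bare "plink" resolves to, every match on the PATH can be listed in lookup order. This is a rough sketch, not how xpra itself locates the executable: it walks the PATH directories and tries each PATHEXT extension, and it ignores the current directory and other places Windows may also search. The function name is hypothetical.

```python
import os


def find_all_on_path(name, path=None, pathext=None):
    """Return every file matching `name` on the PATH, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        pathext = os.environ.get("PATHEXT", "")
    # Try the bare name first, then each executable extension (Windows).
    extensions = [""] + [ext for ext in pathext.split(os.pathsep) if ext]
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                matches.append(candidate)
                break  # at most one match per directory
    return matches
```

The first entry returned by find_all_on_path("plink") is the one a PATH lookup would pick. If that is the TortoisePlink copy rather than the one in the xpra installation directory, giving the full path to the bundled plink.exe in the ssh= setting should sidestep the lookup.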

In any case, with the newest version (2.3.2-r19729) this seems to be fixed, at least in the tests I have made so far. If it appears again, I will re-open this ticket.

Last edited 3 months ago by Antoine Martin

comment:9 Changed 3 months ago by Antoine Martin

> It seems that the old xpra version always used the plink.exe that's shipped with it (in the same directory) whereas the new one uses the one in the path first which is fairly strange! I will create a new bug report for this one.

New ticket: #1892