Testing for #713 by leaving some windows (including a firefox window playing some random something or other), osx client 1.0 r13763 against fedora 23 server 1.0 r13778, I got a variety of server and gstreamer error messages after a time, leading to a server disconnection of the client.
The client hung, not fully disconnecting after the server disconnected it and not disconnecting after using a control-c ... I had to Force Quit it.
Some server error messages (more for context than anything):
2016-09-19 14:33:55,015 client 2: server is not responding, drawing spinners over the windows 2016-09-19 14:33:56,057 client 2: server is OK again 2016-09-19 14:34:41,614 client 2: Warning: long timer waiting time, 2016-09-19 14:34:41,616 client 2: UI thread polling waited 10.0 seconds longer than intended (10.5 vs 0.5) ... 2016-09-19 14:57:15,159 client 2: UI thread polling waited 19.7 seconds longer than intended (20.2 vs 0.5) 2016-09-19 14:57:20,223 sound source pipeline warning: Can't record audio fast enough 2016-09-19 14:57:20,449 sound source gstaudiobasesrc.c(866) 2016-09-19 14:57:20,449 sound source gst_audio_base_src_create () 2016-09-19 14:57:20,449 sound source /GstPipeline 2016-09-19 14:57:20,450 sound source pipeline0/GstPulseSrc 2016-09-19 14:57:20,450 sound source src 2016-09-19 14:57:20,450 sound source Dropped 9600 samples. This is most likely because downstream can't keep up and is consuming samples too slowly. 2016-09-19 14:57:35,116 client 2: Warning: long timer waiting time, 2016-09-19 14:57:35,166 client 2: UI thread polling waited 19.5 seconds longer than intended (20.0 vs 0.5) 2016-09-19 14:57:55,135 client 2: Warning: long timer waiting time, 2016-09-19 14:57:55,161 client 2: UI thread polling waited 19.5 seconds longer than intended (20.0 vs 0.5) 2016-09-19 14:58:07,015 sound source pipeline warning: Can't record audio fast enough 2016-09-19 14:58:07,158 sound source gstaudiobasesrc.c(866) 2016-09-19 14:58:07,218 sound source gst_audio_base_src_create () 2016-09-19 14:58:07,219 sound source /GstPipeline 2016-09-19 14:58:07,219 sound source pipeline0/GstPulseSrc 2016-09-19 14:58:07,219 sound source src 2016-09-19 14:58:07,219 sound source Dropped 11520 samples. This is most likely because downstream can't keep up and is consuming samples too slowly. ... 2016-09-19 15:02:05,087 client 2: UI thread polling waited 19.5 seconds longer than intended (20.0 vs 0.5) 2016-09-19 15:02:21,763 client 2: server is not responding, drawing spinners over the windows 2016-09-19 15:02:43,392 sound source pipeline warning: Can't record audio fast enough 2016-09-19 15:02:47,509 sound source gstaudiobasesrc.c(866) W: [null-sink] asyncq.c: q overrun, queuing locally 2016-09-19 15:02:55,913 sound source gst_audio_base_src_create () 2016-09-19 15:03:09,793 sound source /GstPipeline 2016-09-19 15:03:27,683 sound source pipeline0/GstPulseSrc 2016-09-19 15:03:33,455 sound source src 2016-09-19 15:03:34,666 client 2: server is OK again 2016-09-19 15:03:34,404 sound source Dropped 31200 samples. This is most likely because downstream can't keep up and is consuming samples too slowly. 2016-09-19 15:04:00,178 client 2: server is not responding, drawing spinners over the windows 2016-09-19 15:04:12,752 sound source pipeline warning: Can't record audio fast enough 2016-09-19 15:04:28,209 sound source gstaudiobasesrc.c(866) 2016-09-19 15:04:37,057 sound source gst_audio_base_src_create () 2016-09-19 15:04:39,884 sound source /GstPipeline 2016-09-19 15:04:47,753 sound source pipeline0/GstPulseSrc 2016-09-19 15:04:39,548 Disconnecting client 10.0.11.102:53336: 2016-09-19 15:04:57,737 client ping timeout (waited 60 seconds without a response) 2016-09-19 15:05:04,838 sound source src 2016-09-19 15:05:09,486 client 2: server is OK again 2016-09-19 15:05:24,902 sound source Dropped 16800 samples. This is most likely because downstream can't keep up and is consuming samples too slowly. ... Traceback (most recent call last): File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 1836, in check_echo_timeout 2016-09-19 15:06:36,852 sound source pipeline warning: Can't record audio fast enough 2016-09-19 15:06:37,877 client 2: server is OK again self.disconnect(CLIENT_PING_TIMEOUT, "waited %s seconds without a response" % timeout) File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 1105, in drop_client self.disconnect_client(proto, reason, *args) File "/usr/lib64/python2.7/site-packages/xpra/server/server_core.py", line 788, in disconnect_client 2016-09-19 15:06:41,091 sound source gstaudiobasesrc.c(866) self.disconnect_protocol(protocol, reason, *extra) File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 936, in disconnect_protocol 2016-09-19 15:06:41,088 client 2: server is not responding, drawing spinners over the windows self.cleanup_protocol(protocol) File "/usr/lib64/python2.7/site-packages/xpra/x11/server.py", line 377, in cleanup_protocol 2016-09-19 15:06:46,317 sound source gst_audio_base_src_create () X11ServerBase.cleanup_protocol(self, protocol) File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 948, in cleanup_protocol self.cleanup_source(source) 2016-09-19 15:06:47,388 sound source /GstPipeline File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 957, in cleanup_source 2016-09-19 15:06:49,763 sound source pipeline0/GstPulseSrc 2016-09-19 15:06:53,206 sound source src source.close() 2016-09-19 15:07:02,808 sound source Dropped 13440 samples. This is most likely because downstream can't keep up and is consuming samples too slowly. File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 466, in close self.remove_printers() File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 1818, in remove_printers remove_printer(k) File "/usr/lib64/python2.7/site-packages/xpra/platform/pycups_printing.py", line 296, in remove_printer exec_lpadmin(["-x", PRINTER_PREFIX+sanitize_name(name)]) File "/usr/lib64/python2.7/site-packages/xpra/platform/pycups_printing.py", line 236, in exec_lpadmin 2016-09-19 15:07:45,602 sound source pipeline warning: Can't record audio fast enough 2016-09-19 15:07:50,692 sound source gstaudiobasesrc.c(866) 2016-09-19 15:07:55,331 sound source gst_audio_base_src_create () 2016-09-19 15:08:02,656 sound source /GstPipeline 2016-09-19 15:08:07,708 client 2: server is OK again proc = subprocess.Popen(command, stdin=None, stdout=None, stderr=None, shell=False, close_fds=True, preexec_fn=preexec) File "/usr/lib64/python2.7/subprocess.py", line 710, in __init__ 2016-09-19 15:08:16,953 sound source pipeline0/GstPulseSrc 2016-09-19 15:08:17,401 client 2: server is not responding, drawing spinners over the windows errread, errwrite) File "/usr/lib64/python2.7/subprocess.py", line 1231, in _execute_child self.pid = os.fork() 2016-09-19 15:08:29,005 sound source src 2016-09-19 15:08:34,557 sound source Dropped 17280 samples. This is most likely because downstream can't keep up and is consuming samples too slowly. 2016-09-19 15:08:36,756 client 2: server ping timeout - waited 60 seconds without a response OSError: [Errno 12] Cannot allocate memory ... 2016-09-19 15:09:22,671 xpra client 2 disconnected. 2016-09-19 15:09:23,928 sound source gstaudiobasesrc.c(866) 2016-09-19 15:09:25,193 sound source gst_audio_base_src_create () 2016-09-19 15:09:29,451 sound source /GstPipeline 2016-09-19 15:09:30,078 sound source pipeline0/GstPulseSrc 2016-09-19 15:09:30,983 sound source src 2016-09-19 15:09:31,535 sound source Dropped 12960 samples. This is most likely because downstream can't keep up and is consuming samples too slowly. 2016-09-19 15:09:32,469 sound source stopping
Meanwhile, client side:
2016-09-19 15:02:29,926 server is not responding, drawing spinners over the windows 2016-09-19 15:02:52,853 server is OK again 2016-09-19 15:02:53,039 server is not responding, drawing spinners over the windows 2016-09-19 15:02:55,639 server ping timeout - waited 60 seconds without a response 2016-09-19 15:02:55,692 sound output stopping /Users/Schadenfreude/Desktop/xpra-catalog/xpr-ant-1-13763/Xpra.app/Contents/Resources/lib/python/xpra/client/gtk_base/gtk_client_window_base.py:1279: Warning: g_object_unref: assertion 'G_IS_OBJECT (object)' failed gtk.Window.destroy(self) /Users/Schadenfreude/Desktop/xpra-catalog/xpr-ant-1-13763/Xpra.app/Contents/Resources/lib/python/xpra/gtk_common/gtk_util.py:383: Warning: g_object_unref: assertion 'G_IS_OBJECT (object)' failed gtk.main() 2016-09-19 15:02:55,727 Connection lost ^C got signal SIGINT, exiting Killed: 9
After Force Quitting... I re-attached the client with a -d client
, then, once it was connected again, tried a control-c again client side... triggering another hang with the following:
2016-09-19 15:30:26,066 check_server_echo(0) last=True, server_ok=True 2016-09-19 15:30:26,107 check_server_echo(0) last=True, server_ok=True ^C got signal SIGINT, exiting 2016-09-19 15:30:26,134 UIXpraClient.signal_cleanup() 2016-09-19 15:30:26,134 stop_receiving_sound(True) sound sink=sink_subprocess_wrapper(17588) 2016-09-19 15:30:26,134 sound output stopping 2016-09-19 15:30:26,134 stop_receiving_sound(True) calling <bound method sink_subprocess_wrapper.cleanup of sink_subprocess_wrapper(17588)> 2016-09-19 15:30:26,135 stop_receiving_sound(True) done 2016-09-19 15:30:26,136 UIXpraClient.signal_cleanup() done 2016-09-19 15:30:26,137 signal_disconnect_and_quit(130, exit on signal SIGINT) exit_on_signal=False 2016-09-19 15:30:26,137 disconnect_and_quit(130, exit on signal SIGINT) 2016-09-19 15:30:26,137 GTKXpraClient.quit(130) current exit_code=130 2016-09-19 15:30:26,138 UIXpraClient.cleanup() 2016-09-19 15:30:26,138 XpraClientBase.cleanup() protocol=Protocol(tcp socket: 10.0.11.102:53953 <- 10.0.32.134:1203) 2016-09-19 15:30:26,139 calling <bound method Protocol.close of Protocol(tcp socket: 10.0.11.102:53953 <- 10.0.32.134:1203)> 2016-09-19 15:30:26,139 cleanup done 2016-09-19 15:30:26,144 UIXpraClient.cleanup() calling <class 'xpra.client.gtk_base.gtk_keyboard_helper.GTKKeyboardHelper'>.cleanup() 2016-09-19 15:30:26,144 UIXpraClient.cleanup() calling <class 'xpra.platform.darwin.osx_clipboard.OSXClipboardProtocolHelper'>.cleanup() 2016-09-19 15:30:26,146 UIXpraClient.cleanup() calling <class 'xpra.platform.darwin.osx_tray.OSXTray'>.cleanup() 2016-09-19 15:30:26,147 UIXpraClient.cleanup() calling <class 'xpra.client.gtk2.gtk2_notifier.GTK2_Notifier'>.cleanup() 2016-09-19 15:30:26,147 UIXpraClient.cleanup() calling <class 'xpra.platform.darwin.osx_menu.OSXMenuHelper'>.cleanup() 2016-09-19 15:30:26,147 UIXpraClient.cleanup() calling <class 'xpra.platform.darwin.gui.ClientExtras'>.cleanup() 2016-09-19 15:30:26,147 UIXpraClient.cleanup() calling <class 'xpra.codecs.video_helper.VideoHelper'>.cleanup() 2016-09-19 15:30:26,147 destroy_all_windows() destroying 24 / GLClientWindow(24 : gtk2.GLWindowBacking(24, (499, 316), None)) 2016-09-19 15:30:26,147 destroy_window(24, GLClientWindow(24 : gtk2.GLWindowBacking(24, (499, 316), None))) 2016-09-19 15:30:26,153 destroy_all_windows() destroying 1 / GLClientWindow(1 : gtk2.GLWindowBacking(1, (499, 316), None)) 2016-09-19 15:30:26,153 destroy_window(1, GLClientWindow(1 : gtk2.GLWindowBacking(1, (499, 316), None))) 2016-09-19 15:30:26,157 destroy_all_windows() destroying 2 / GLClientWindow(2 : gtk2.GLWindowBacking(2, (499, 316), None)) 2016-09-19 15:30:26,157 destroy_window(2, GLClientWindow(2 : gtk2.GLWindowBacking(2, (499, 316), None))) 2016-09-19 15:30:26,161 destroy_all_windows() destroying 3 / GLClientWindow(3 : gtk2.GLWindowBacking(3, (1511, 826), None)) 2016-09-19 15:30:26,161 destroy_window(3, GLClientWindow(3 : gtk2.GLWindowBacking(3, (1511, 826), None))) 2016-09-19 15:30:26,174 destroy_all_windows() destroying 18 / GLClientWindow(18 : gtk2.GLWindowBacking(18, (1108, 688), None)) 2016-09-19 15:30:26,174 destroy_window(18, GLClientWindow(18 : gtk2.GLWindowBacking(18, (1108, 688), None))) 2016-09-19 15:30:26,186 XpraClient.clean_mmap() mmap_filename=None 2016-09-19 15:30:26,186 UIXpraClient.cleanup() done 2016-09-19 15:30:26,187 GTKXpraClient.quit(130) cleanup done, main_level=1 2016-09-19 15:30:26,187 GTKXpraClient.quit(130) main loop at level 1, calling gtk quit via timeout 2016-09-19 15:30:26,187 GTKXpraClient.exit() calling <function gtk_main_quit_really at 0x69fa170> 2016-09-19 15:30:26,187 GTKXpraClient.run_main_loop() main loop ended, returning exit_code=130 2016-09-19 15:30:26,187 UIXpraClient.cleanup() 2016-09-19 15:30:26,187 XpraClientBase.cleanup() protocol=None 2016-09-19 15:30:26,187 cleanup done 2016-09-19 15:30:26,188 UIXpraClient.cleanup() calling <class 'xpra.client.gtk_base.gtk_keyboard_helper.GTKKeyboardHelper'>.cleanup() 2016-09-19 15:30:26,188 UIXpraClient.cleanup() calling <class 'xpra.platform.darwin.osx_clipboard.OSXClipboardProtocolHelper'>.cleanup() 2016-09-19 15:30:26,188 UIXpraClient.cleanup() calling <class 'xpra.platform.darwin.osx_tray.OSXTray'>.cleanup() 2016-09-19 15:30:26,188 UIXpraClient.cleanup() calling <class 'xpra.client.gtk2.gtk2_notifier.GTK2_Notifier'>.cleanup() 2016-09-19 15:30:26,188 UIXpraClient.cleanup() calling <class 'xpra.platform.darwin.osx_menu.OSXMenuHelper'>.cleanup() 2016-09-19 15:30:26,188 UIXpraClient.cleanup() calling <class 'xpra.platform.darwin.gui.ClientExtras'>.cleanup() 2016-09-19 15:30:26,188 UIXpraClient.cleanup() calling <class 'xpra.codecs.video_helper.VideoHelper'>.cleanup() 2016-09-19 15:30:26,189 XpraClient.clean_mmap() mmap_filename=None 2016-09-19 15:30:26,189 UIXpraClient.cleanup() done
After which, I had to use Force Quit again.
Some server side logs from the second attempt, with an apparent gap between the client-side control-c and the Disconnecting client
message that seems to coincide with the Force quit:
2016-09-19 15:30:26,633 client 4: check_server_echo(0) last=True, server_ok=True 2016-09-19 15:30:26,661 client 4: UIXpraClient.signal_cleanup() 2016-09-19 15:30:26,661 client 4: stop_receiving_sound(True) sound sink=sink_subprocess_wrapper(17588) 2016-09-19 15:30:26,664 client 4: stop_receiving_sound(True) calling <bound method sink_subprocess_wrapper.cleanup of sink_subprocess_wrapper(17588)> 2016-09-19 15:30:26,664 client 4: stop_receiving_sound(True) done 2016-09-19 15:30:26,664 client 4: UIXpraClient.signal_cleanup() done 2016-09-19 15:30:26,665 client 4: signal_disconnect_and_quit(130, exit on signal SIGINT) exit_on_signal=False 2016-09-19 15:30:26,665 client 4: disconnect_and_quit(130, exit on signal SIGINT) 2016-09-19 15:30:26,667 sound source stopping 2016-09-19 15:31:30,735 Disconnecting client 10.0.11.102:53953: 2016-09-19 15:31:30,736 client ping timeout (waited 60 seconds without a response) 2016-09-19 15:31:30,875 xpra client 4 disconnected.
Your server is having some serious problems:
This is most likely because downstream can't keep up and is consuming samples too slowly
: this is #1067 - likely caused by an underpowered or overloaded server
File "/usr/lib64/python2.7/site-packages/xpra/platform/pycups_printing.py", line 296, in remove_printer exec_lpadmin(["-x", PRINTER_PREFIX+sanitize_name(name)]) File "/usr/lib64/python2.7/site-packages/xpra/platform/pycups_printing.py", line 236, in exec_lpadmin proc = subprocess.Popen(command, stdin=None, stdout=None, stderr=None, shell=False, close_fds=True, preexec_fn=preexec) File "/usr/lib64/python2.7/subprocess.py", line 710, in __init__ 2016-09-19 15:08:16,953 sound source pipeline0/GstPulseSrc errread, errwrite) File "/usr/lib64/python2.7/subprocess.py", line 1231, in _execute_child self.pid = os.fork() OSError: [Errno 12] Cannot allocate memory
But looks similar: your server is running out of memory.. Is this caused by the sound accumulating? Underpowered server setup? An application using up too much memory? It could also be the xpra server process accumulating too many video frames for av-async. Knowing which one is using up all the memory will tell us where the fix needs to go.
Those are completely different bugs than the osx client hang and should be filed separately.
As for the OSX hang itself, I am unable to reproduce it.
If you can reproduce it, please try running with "-d all".
Also try sending a SIGUSR1 to the xpra client process when it is hung, this should make it dump all threads states to the debug output so we can see which ones are stuck and where. (ie: kill -SIGUSR1 $PIDOFXPRACLIENT
- you can find the pid using ps -ef | grep xpra
, the main xpra process pid should usually be the second column of the first line)
I am going to presume, for now, that it had more to do with just not bothering to reboot that server vm for months and I just managed to leave enough hanging processes to chew up it's relatively lean memory (though I will see about repro'ing now that I have rebooted it). I'll make a separate ticket if I can do that again.
As for the osx client... I seem to be able to reproduce this at will. Using fedora 23 with a newly built 1.0 r13786 server and using the 1.0 r13763 osx client from your beta repo, any time I use the control-c client side it is hanging.
I'll attach an "excerpt" log with what looks like the interesting bits with just the -d all
when I use control-c and also the interesting looking bits from the kill -SIGUSR1 20624
that I tried to use on a separate connection (which didn't kill the client actually, had to use force quit again).
I'll also attach the full client logs in each case, in case there's something there worth looking at.
log excerpts
-d all and use control-c
-d all and use SIGUSR1
I really don't understand this one at all and I just cannot reproduce it all.
All the threads waiting to exit are daemon threads, so they shouldn't prevent the main thread from terminating. Maybe r13789 will help if this is caused by deadlock in the threading module?
If the latest builds still show the problem, please try to figure out when this problem started occurring (nearest revision) and whether this affects all OSX versions or not. (and looking at the code, it should affect all platforms equally.. not just OSX!?)
See also #1292, which also "fixed itself"...
This could also be explained by the churn in #1306: if a library (openssl?) gets out of sync with the rest of the build env...
It's funny you should mention the "churn" in #1306 - I was testing a 1.0 r13786 and a 1.0 r13814 osx client built by Smo... and I wasn't seeing the same "hang" when trying to use control-c client-side to disconnect.
I will pursue this a bit more very soon, but wanted to mention that for now.
Testing a bit with your osx 1.0 r13888 client against a fedora 23 1.0 r13889 server... I'm still seeing a hang when trying to disconnect a client with a control-c.
... 2016-09-27 18:19:49,841 Xpra X11 server version 1.0-r13889 64-bit 2016-09-27 18:19:49,841 running on Linux Fedora 23 TwentyThree 2016-09-27 18:19:49,841 enabled remote logging 2016-09-27 18:19:49,858 Attached to tcp:10.0.32.134:1203 (press Control-C to detach) 2016-09-27 18:19:51,241 sound output using audio codec opus 2016-09-27 18:19:51,242 sound output using container format ogg ^C got signal SIGINT, exiting 2016-09-27 18:19:54,210 sound output stopping Exception in thread parse (most likely raised during interpreter shutdown): Traceback (most recent call last): File "threading.pyc", line 801, in __bootstrap_inner File "threading.pyc", line 754, in run File "/Users/Schadenfreude/Desktop/xpra-catalog/xpra-ant-1-13888/Xpra.app/Contents/Resources/lib/python/xpra/net/protocol.py", line 682, in _read_parse_thread_loop File "/Users/Schadenfreude/Desktop/xpra-catalog/xpra-ant-1-13888/Xpra.app/Contents/Resources/lib/python/xpra/net/protocol.py", line 708, in do_read_parse_thread_loop File "Queue.pyc", line 179, in get File "threading.pyc", line 384, in notify <type 'exceptions.TypeError'>: 'NoneType' object is not callable Exception in thread write (most likely raised during interpreter shutdown): Traceback (most recent call last): File "threading.pyc", line 801, in __bootstrap_inner File "threading.pyc", line 754, in run File "/Users/Schadenfreude/Desktop/xpra-catalog/xpra-ant-1-13888/Xpra.app/Contents/Resources/lib/python/xpra/net/protocol.py", line 583, in _write_thread_loop File "/Users/Schadenfreude/Desktop/xpra-catalog/xpra-ant-1-13888/Xpra.app/Contents/Resources/lib/python/xpra/net/protocol.py", line 565, in _io_thread_loop File "/Users/Schadenfreude/Desktop/xpra-catalog/xpra-ant-1-13888/Xpra.app/Contents/Resources/lib/python/xpra/net/protocol.py", line 585, in _write File "Queue.pyc", line 179, in get File "threading.pyc", line 384, in notify <type 'exceptions.TypeError'>: 'NoneType' object is not callable Killed: 9
... with the Killed: 9 triggered by a required force quit.
Interestingly, when I start the client with --opengl=on
(to override the Intel graphics card), the control-c doesn't generate any errors client side (though I do still have to use a force quit for the hung client).
Meanwhile, I'm noticing this error server-side with each disconnection, twice:
Traceback (most recent call last): File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 1658, in do_send_cursor w, h, _xhot, _yhot, serial, pixels, name = cursor_data[2:8] ValueError: need more than 6 values to unpack
The traceback in do_send_cursor
has been fixed since r13891.
Diagnostics: running with "-d all" I see that the last few log entries were often related to printer forwarding, with more messages after the hang. Running with "--printing=no" fixes the problem.
Subtle bug fixed in r14104, closing. Feel free to re-open if you hit this again.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1314