Xpra: Ticket #1633: protocol queue deadlock on close

This is easily triggered with the RFB adaptor (#1620) and an UltraVNC client. It shows up as the following gdb backtrace:

Traceback (most recent call first):
  Waiting for the GIL
  File "/usr/lib64/python2.7/threading.py", line 340, in wait
    waiter.acquire()
  File "/usr/lib64/python2.7/Queue.py", line 126, in put
    self.not_full.wait()
  File "/usr/lib64/python2.7/site-packages/xpra/server/rfb/rfb_protocol.py", line 242, in raw_write
    self._write_queue.put(contents)
  File "/usr/lib64/python2.7/site-packages/xpra/server/rfb/rfb_protocol.py", line 233, in send
    self.raw_write(packet)
  File "/usr/lib64/python2.7/site-packages/xpra/server/rfb/rfb_source.py", line 88, in send
    p.send(msg)
  File "/usr/lib64/python2.7/site-packages/xpra/server/rfb/rfb_source.py", line 74, in damage
    self.send(pixels)
  File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 3054, in _damage
    ss.damage(wid, window, x, y, width, height, options)
  File "/usr/lib64/python2.7/site-packages/xpra/server/rfb/rfb_server.py", line 152, in _process_rfb_FramebufferUpdateRequest
    self._damage(model, x, y, w, h)
  File "/usr/lib64/python2.7/site-packages/xpra/gtk_common/gtk_util.py", line 423, in gtk_main
    gtk.main()
  File "/usr/lib64/python2.7/site-packages/xpra/server/gtk_server_base.py", line 64, in do_run
    gtk_main()
  File "/usr/lib64/python2.7/site-packages/xpra/server/server_core.py", line 479, in run
    self.do_run()
  File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 881, in run
    return ServerCore.run(self)
  File "/usr/lib64/python2.7/site-packages/xpra/scripts/server.py", line 1000, in run_server
    e = app.run()
  File "/usr/lib64/python2.7/site-packages/xpra/scripts/main.py", line 1469, in run_mode
    return run_server(error_cb, options, mode, script_file, args, current_display)
  File "/usr/lib64/python2.7/site-packages/xpra/scripts/main.py", line 177, in main
    return run_mode(script_file, err, options, args, mode, defaults)
  File "/usr/bin/xpra", line 15, in <module>
    sys.exit(main(sys.argv[0], sys.argv))

raw_write is waiting for the queue to drain so that it can add a packet, but the queue has already been flushed: the None packet was put in it and the IO thread has since terminated, leaving the None packet in there. The queue will never be emptied again, so this thread will wait forever. Since the RFB code is called from the main thread, that's even worse here: the gtk main loop is blocked.
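
The sequence described above can be reproduced outside of xpra with a toy model. This is only a sketch under stated assumptions: `ToyProtocol`, `close()`, `_write_loop` and `demo` are hypothetical names, not the actual rfb_protocol.py code, and it is written for Python 3's `queue` module (`Queue` in the Python 2.7 of the backtrace). A timeout is passed to `put()` purely so the demonstration returns instead of hanging; with `timeout=None` it would block in `not_full.wait()` exactly as in the backtrace.

```python
import queue
import threading


class ToyProtocol:
    """Hypothetical stand-in for a protocol with a one-slot write queue."""

    def __init__(self):
        self._write_queue = queue.Queue(maxsize=1)
        self._closed = False
        self._thread = threading.Thread(target=self._write_loop, daemon=True)
        self._thread.start()

    def _write_loop(self):
        # IO thread: drain the queue until closed or the None marker arrives
        while not self._closed:
            try:
                item = self._write_queue.get(timeout=0.05)
            except queue.Empty:
                continue
            if item is None:
                break

    def close(self):
        self._closed = True
        self._thread.join()
        # the None marker is queued after the IO thread has gone,
        # so nothing will ever remove it
        self._write_queue.put(None)

    def raw_write(self, contents, timeout=None):
        # timeout=None is the blocking put() seen in the backtrace
        self._write_queue.put(contents, timeout=timeout)


def demo():
    p = ToyProtocol()
    p.close()
    try:
        p.raw_write(b"pixels", timeout=0.2)
    except queue.Full:
        return "would block forever"
    return "written"
```

In this model `demo()` hits `queue.Full` after the timeout, which stands in for the indefinite wait: the only slot is occupied by the None marker and there is no consumer left.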



Thu, 07 Sep 2017 12:12:59 GMT - Antoine Martin: status changed

r16796 takes the lazy approach of allowing 2 items in the write queue.
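
To illustrate what a second slot buys, here is a small sketch (again a toy model, not the r16796 change itself; `puts_before_blocking` is a hypothetical helper). It queues the None marker with no consumer running, then counts how many further writes succeed before a blocking `put()` would stall.

```python
import queue


def puts_before_blocking(maxsize):
    """Count the writes that succeed after the None marker is queued
    and no thread is left to consume from the queue."""
    q = queue.Queue(maxsize=maxsize)
    q.put(None)
    count = 0
    while True:
        try:
            q.put_nowait(b"packet")
        except queue.Full:
            # a blocking put() would wait here indefinitely
            return count
        count += 1
```

With `maxsize=1` no write gets through after the marker; with `maxsize=2` exactly one does. In this model a second write after close would still block, so the extra slot narrows the window rather than removing the underlying problem.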

I am keeping this ticket open because:


Tue, 24 Oct 2017 16:23:35 GMT - Antoine Martin: milestone changed

meh - fixing this is hard


Mon, 29 Oct 2018 08:28:47 GMT - Antoine Martin: status, milestone changed; resolution set

See ticket:1948#comment:1, and in particular r20355.


Sat, 23 Jan 2021 05:29:39 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1633