xpra icon
Bug tracker and wiki

Opened 4 weeks ago

Closed 3 weeks ago

Last modified 3 weeks ago

#1456 closed defect (fixed)

Server crashes on launching application (inconsistently)

Reported by: psycho_zs Owned by: psycho_zs
Priority: blocker Milestone: 2.0
Component: server Version: trunk
Keywords: Cc:

Description

This does not happen every time I launch thunderbird, but sometimes.
The following error shows up in server's log before the crash:

2017-03-01 21:23:10,833 started command 'thunderbird' with pid 7119
[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.

I was able to revive the session with xpra upgrade afterwards

Attachments (3)

xcb_error.log (677.1 KB) - added by psycho_zs 4 weeks ago.
io_error.log (337.7 KB) - added by psycho_zs 4 weeks ago.
encoder_error.log (31.1 KB) - added by psycho_zs 3 weeks ago.

Download all attachments as: .zip

Change History (16)

comment:1 Changed 4 weeks ago by Antoine Martin

Milestone: 2.0
Status: newassigned

I have seen this very occasionally and indicates a bug in xpra.
Unfortunately, I have not been able to reproduce it reliably.
If you can reproduce it, please attach the end of the server's "-d all" log.

Changed 4 weeks ago by psycho_zs

Attachment: xcb_error.log added

comment:2 Changed 4 weeks ago by psycho_zs

Attached last 1k lines of log. With -d all the only new info at the moment of the crash is the python error.

Changed 4 weeks ago by psycho_zs

Attachment: io_error.log added

comment:3 Changed 4 weeks ago by psycho_zs

Another crash upon connecting, with IO error 11. X server is alive and session can be restored with xpra upgrade

comment:4 Changed 4 weeks ago by psycho_zs

...also I can not seem to launch xpra on server anymore, it outputs this:

$ xpra start :100
using systemd-run to wrap 'start' server command
'systemd-run' '--description' 'xpra-start' '--scope' '--user' '/usr/bin/xpra' 'start' ':100' '--systemd-run=no'
Failed to create bus connection: No such file or directory

Remote start works though.

comment:5 Changed 4 weeks ago by Antoine Martin

Lots of goodies in those log samples.
What distro is this? What exact command lines?

  • attachment/ticket/1456/io_error.log shows calls to xsettings just before the IO error and sure enough the X11 calls are unguarded - that's been the case since the very first version of xpra! r15207 fixes that. (will be backported to v1.0.x)
  • attachment/ticket/1456/xcb_error.log has more issues in it:
    • r15208 fixes a sound forwarding issue
    • the xcb_xlib_unknown_req_in_deq crash is the same one I had reported in #1430 and failed to reproduce since. It is possible that this is caused by the xsettings issue, but I wouldn't count on it.

Here's a list of significant events leading up to this last crash:

  • mouse events (motion and clicks)
  • client closes a window and we send the message asking it to close on the server:
    02:14:57,051 client closed window 2 - WindowModel(0x4000b4)
    02:14:57,051 sending WM_DELETE_WINDOW to 0x4000b4
    
  • we reset the focus to the world window
  • the application unmaps the window and we stop managing it:
    02:14:57,195 x_event_filter event=('xpra-unmap-event', 'xpra-child-unmap-event')/UnmapNotify window=0x4000b4
    02:14:57,197 unmanaging window: WindowModel(0x4000b4) \
        (<gtk.gdk.Window object at 0x7f772463fa00 (GdkWindow at 0x5627d55ea000)> - \
         <gtk.gdk.Window object at 0x7f772463f6e0 (GdkWindow at 0x5627d5597480)>)
    
  • we try to cleanup the window state (and get lots of errors since it's gone - which is expected):
    02:14:57,200 XError XError: BadWindow detected while already in unwind; discarding
    
  • we tell the client that the window is gone:
    02:14:57,200 add_packet_to_queue(lost-window ...)
    
  • the window in focus changes, and so does the cursor so we send the new cursor pixel data to the client:
    02:14:57,233 x_event_filter event=('xpra-focus-out-event', None)/FocusOut took 4.7ms
    02:14:57,233 get_cursor_data() cursor=[1072, 542, 16, 16, 7, 7, 1L, '1024 bytes', '']
    02:14:57,234 get_cursor_data(): default cursor - clearing it
    
  • a system tray window is destroyed and we tell the client (I assume thunderbird has some sort of system tray icon):
    02:14:58,819 x_event_filter event=('xpra-destroy-event', None)/DestroyNotify took 0.2ms
    02:14:58,819 x_event_filter event=('xpra-unmap-event', 'xpra-child-unmap-event')/UnmapNotify window=0x4000f3
    02:14:58,819 SystemTray.do_xpra_unmap_event(<X11:UnmapNotify \
        {'delivered_to': '0x4000f3', 'send_event': 0, 'window': '0x4000f3', 'serial': '0x175c', 'type': 18, \
         'display': <gtk.gdk.DisplayX11 object at 0x7f773896fa00 (GdkDisplayX11 at 0x5627d4cf7000)>}>) \
        container window=<gtk.gdk.Window object at 0x7f772463f500 (GdkWindow at 0x5627d5597ea0)>
    02:14:58,821 lost_window: 1 - SystemTrayWindowModel(0x20002e)
    02:14:58,821 add_packet_to_queue(lost-window ...)
    
  • again, lots of errors we ignore as we receive events for the already-destroyed window / try to clean it up
  • the thunderbird child process terminates:
    02:14:58,835 sigchld(17, <frame object at 0x7f7710003610>)
    02:14:58,835 poll() procinfo list: [ProcInfo({'returncode': None, 'name': 'thunderbird', 'process': <subprocess.Popen object at 0x7f77232b1450>, \
        'pid': 9936, 'dead': False, 'ignore': False, 'callback': None, 'command': ['thunderbird'], 'forget': False})]
    02:14:58,835 add_dead_process(ProcInfo({'returncode': None, 'name': 'thunderbird', 'process': <subprocess.Popen object at 0x7f77232b1450>, \
        'pid': 9936, 'dead': False, 'ignore': False, 'callback': None, 'command': ['thunderbird'], 'forget': False}))
    02:14:58,835 add_dead_process returncode=0, dead=True, callback=None
    02:14:58,835 child 'thunderbird' with pid 9936 has terminated
    
  • the client disconnects 4 seconds later:
    02:15:03,579 read thread: eof
    
  • a new client connects:
    02:15:05,543 processing packet hello
    
  • it requests the server to launch thunderbird:
    02:15:19,201 will process ui packet start-command
    02:15:19,201 start new command: ['start-command', 'thunderbird', 'thunderbird', False, True]
    

It would save me a lot of time to know that I only really care about the last few entries!

comment:6 Changed 4 weeks ago by psycho_zs

This was done on Current Debian testing, 64bit, xpra 2.0 r15180. Thunderbird with firetray extension.
Server launched with remote xpra start ssh:...:100 -d all
At some point it was re-launched after crash with the same X session by xpra upgrade :100 locally on the server, if I remember correctly. (local xpra start does not work).

comment:7 Changed 4 weeks ago by Antoine Martin

Priority: majorblocker

I have seen the crash - not sure how to reproduce it reliably yet.
Hints would be welcome.

comment:8 Changed 4 weeks ago by Antoine Martin

Owner: changed from Antoine Martin to psycho_zs
Status: assignednew

@psycho_zs: I am unable to make it crash now that I am ready to investigate it :(
I've stopped and restarted thunderbird dozens of times using xpra control :3 start thunderbird.
Then I hit a different bug when running debian stretch: we seem to be crashing handling the initial info request, it could be the same bug.

If you can reproduce the thunderbird one easily, can you try:

  • apply this to the client:
    --- xpra/client/ui_client_base.py	(revision 15219)
    +++ xpra/client/ui_client_base.py	(working copy)
    @@ -1981,7 +1981,7 @@
             gui_ready()
             if self.tray:
                 self.tray.ready()
    -        self.send_info_request()
    +        #self.send_info_request()
     
         def handshake_complete(self):
             oh = self._on_handshake
    
  • running with "--no-tray" to see if that helps
  • running r15207 or later to see if that helps

comment:9 Changed 4 weeks ago by Antoine Martin

Identified the initial connection problem - hopefully this is the last one: #1430.
The patch suggested in comment:8 should avoid causing the crash so easily, but the correct fix will prevent any "info" query from causing crashes.

Last edited 4 weeks ago by Antoine Martin (previous) (diff)

comment:10 Changed 4 weeks ago by Antoine Martin

Bug squashed, see ticket:1430#comment:4.
This affected both 1.0.x and trunk.

@psycho_zs: I would really appreciate if you could try to break it again now that I have committed those fixes. I am quite confident the fixes are correct, but there may still be another bug lurking somewhere.
To make it easier I have posted updated Debian Stretch beta packages (both trunk and v1.0.4): http://xpra.org/beta.

Changed 3 weeks ago by psycho_zs

Attachment: encoder_error.log added

comment:11 Changed 3 weeks ago by psycho_zs

no crash yet, local start works.
But server drains CPU when client is connected, attached new error spam.

comment:12 Changed 3 weeks ago by Antoine Martin

Resolution: fixed
Status: newclosed

@psycho_zs: great news - thank you, the error you are seeing is unrelated to the crash (was caused by r15205, fixed in r15222)

I am closing this ticket for now, but feel free to re-open it.
I will still look into this as part of #1430.

Note: the only way you can hit this codepath is if you have opengl client rendering disabled... which is not recommended.

comment:13 Changed 3 weeks ago by psycho_zs

With opengl on I have redraw issues on window resize (just re-checked on Intel, mesa 17.0.0), so I leave that disabled. Will see if I'm able to catch the crash again.

Note: See TracTickets for help on using tickets.