Xpra: Ticket #352: Killing server with SIGINT doesn't remove socket, doesn't kill Xvfb/Xdummy

Hello,

when killing the server with SIGINT, the socket isn't removed, so the next run of the server prints the following:

/home/arthur/.xpra/Chani-1 is not responding, waiting for it to timeout before clearing it.....

It also doesn't kill X, so the Xdummy process remains, and the server doesn't start.

Trimmed output when killed with control-C:

got signal SIGINT, exiting
Tray.cleanup()
Tray.cleanup() done
failed to release dbus notification forwarder: \
    org.freedesktop.DBus.Error.NoReply: Did not receive a reply. \
    Possible causes include: the remote application did not send a reply, \
    the message bus security policy blocked the reply, \
    the reply timeout expired, or the network connection was broken.
cleanup will disconnect: []


Wed, 05 Jun 2013 16:04:16 GMT - Antoine Martin: owner, status, description changed; milestone set


Thu, 06 Jun 2013 06:13:40 GMT - Antoine Martin: owner, status, description changed

Cannot reproduce.


This is what I see when I use "xpra stop" to stop the session (this solution is preferred to SIGINT):

New connection received: SocketConnection(/home/antoine/.xpra/desktop-10)
connection closed after 0 packets received (0 bytes) and 0 packets sent (0 bytes)
Connection lost
Handshake complete; enabling connection
Python/GObject Linux client version 0.10.0 connected from 'desktop'
windows/pixels forwarding is disabled for this client
max client resolution is 0x0 (from []), current server resolution is 2560x1600
Shutting down in response to request
Disconnecting existing client Protocol(SocketConnection(/home/antoine/.xpra/desktop-10)), reason is: shutting down
connection closed after 2 packets received (577 bytes) and 2 packets sent (1101 bytes)
xpra client disconnected.
Connection lost
Connection lost
New connection received: SocketConnection(/home/antoine/.xpra/desktop-10)
connection closed after 0 packets received (0 bytes) and 0 packets sent (0 bytes)
Connection lost
xpra is terminating.
closing tcp socket 0.0.0.0:10000
removing socket /home/antoine/.xpra/desktop-10
killing xvfb with pid 9346
Server terminated successfully (0). Closing log file.

Note the line:

removing socket /home/antoine/.xpra/desktop-10

And now with killall -SIGINT xpra:

got signal SIGINT, exiting
xpra is terminating.
closing tcp socket 0.0.0.0:10000
removing socket /home/antoine/.xpra/desktop-10
killing xvfb with pid 9939
Server terminated successfully (0). Closing log file.

This also worked fine. Using SIGTERM has the same effect. I've tried both with and without --no-daemon.

And now with control-C from the controlling terminal:

^C
got signal SIGINT, exiting
child 'xterm' with pid 10181 has terminated
xpra is terminating.
closing tcp socket 0.0.0.0:10000
removing socket /home/antoine/.xpra/desktop-10
killing xvfb with pid 10172
Server terminated successfully (0). Closing log file.

The only way that I can get the server to not clean up its socket is to use SIGKILL or to kill the vfb from underneath it. That is not something we can handle gracefully anyway, so just don't do that.
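For anyone who wants to check by hand whether a socket was left behind by an unclean exit, here is a minimal sketch in Python 3. It is not xpra's own detection code: the function name probe_unix_socket and the four state labels are made up for illustration, and it assumes Linux behaviour, where connecting to a socket file that nothing listens on is refused.

```python
import os
import socket


def probe_unix_socket(path, timeout=1.0):
    """Classify a unix socket path as 'missing', 'live', 'dead' or 'unknown'.

    Hypothetical helper for illustration only, not part of xpra.
    """
    if not os.path.exists(path):
        return "missing"
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
    except ConnectionRefusedError:
        # The file exists but no process is listening: a leftover socket.
        return "dead"
    except OSError:
        # Timeout, permission problem, etc.: we cannot tell either way.
        return "unknown"
    else:
        return "live"
    finally:
        s.close()
```

A "dead" result means the file can be removed safely; "unknown" is the case where waiting for a timeout is the cautious choice.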

Here is the command line I have used for most of this testing:

dbus-launch xpra start :10 --no-daemon --no-pulseaudio

(Adding or removing dbus and pulseaudio does not seem to make any difference.) I am only adding dbus-launch because of the warning in your logs: it looks to me like you are running from a desktop session and should either be using dbus-launch (if you aren't already) or --no-notifications.


Thu, 06 Jun 2013 06:29:03 GMT - ahuillet:

Yes, I need --no-notifications which I forgot to add on the commandline. I don't think that changes my problem, which I can see both on Archlinux and CentOS 6.2.


Thu, 06 Jun 2013 15:28:14 GMT - Antoine Martin:

OK, I simply cannot reproduce this with Fedora 19 (python2.7), but I can with CentOS 6.4 (maybe related to the old version of python?): about one in 10 attempts fails to run the cleanups on exit. Sometimes it fails repeatedly, sometimes it is impossible to reproduce; I cannot discern any pattern yet.


Do you have steps that make it more likely to trigger this bug? (It is hard to fix without being able to reproduce it reliably, and even harder to confirm a fix should I find one to test.) I've tried with a client connected and without, with apps and without, etc. Is trunk also affected?


When we exit on a "deadly signal", we schedule the actual quit() call to run from a timer soon after (0.5s in trunk), so that we still get a chance to run the cleanups that need a working main loop to complete or just a bit of time to do what they need (sending a "shutdown" packet to clients).
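The deferred-quit pattern described above can be sketched roughly as follows. This is an illustration in Python 3, not the actual xpra code: the Server class, its method names and the use of threading.Timer are assumptions (the real server schedules the call on its main loop), and only the 0.5s delay is taken from the comment above.

```python
import signal
import threading

QUIT_DELAY = 0.5  # seconds, the trunk value mentioned above


class Server:
    """Hypothetical stand-in for the server object, for illustration only."""

    def __init__(self):
        self.cleanups_run = []
        self.stopped = threading.Event()
        self._exiting = False

    def send_shutdown_to_clients(self):
        # Placeholder for cleanups that need a little time to complete.
        self.cleanups_run.append("notify-clients")

    def quit(self):
        self.cleanups_run.append("quit")
        self.stopped.set()

    def deadly_signal(self, signum, frame=None):
        if self._exiting:
            return  # ignore repeated signals while shutting down
        self._exiting = True
        self.send_shutdown_to_clients()
        # Do not quit inside the handler: schedule it slightly later.
        timer = threading.Timer(QUIT_DELAY, self.quit)
        timer.daemon = True
        timer.start()


def install_handlers(server):
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, server.deadly_signal)
```

If the process dies before the timer fires, quit() never runs and nothing gets cleaned up, which would match the symptoms reported here.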


Please consider using xpra stop instead of SIGINT, as this does not seem to be affected.


Fri, 07 Jun 2013 14:16:44 GMT - Antoine Martin:

Does this patch help:

### Eclipse Workspace Patch 1.0
#P Xpra
Index: src/xpra/scripts/server.py
===================================================================
--- src/xpra/scripts/server.py	(revision 3587)
+++ src/xpra/scripts/server.py	(working copy)
@@ -396,7 +396,8 @@
             if os.name=="posix":
                 os.setsid()
         try:
-            xvfb = subprocess.Popen(xvfb_cmd+[display_name], executable=xvfb_executable, close_fds=True, preexec_fn=setsid)
+            xvfb = subprocess.Popen(xvfb_cmd+[display_name], executable=xvfb_executable, close_fds=True,
+                                    stdin=subprocess.PIPE, preexec_fn=setsid)
         except OSError, e:
             sys.stderr.write("Error starting Xvfb: %s\n" % (e,))
             return  1

According to the answers to this question, it isn't easy to ensure that the subprocess will not be receiving the SIGINT - this was the easiest way I could find. More good pointers:

Does this solve the problem?
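To illustrate the idea behind the patch outside of xpra, here is a minimal Python 3 sketch. The helper name start_detached is hypothetical, and start_new_session=True is used as the Python 3 counterpart of preexec_fn=setsid. A child started this way is in its own session, so it is no longer part of the terminal's foreground process group and does not receive the SIGINT generated by control-C; it also gets a pipe rather than the terminal as stdin, as in the patch.

```python
import os
import subprocess
import sys


def start_detached(cmd):
    """Start cmd in its own session with a pipe as stdin (POSIX only).

    Hypothetical helper for illustration, not the xpra function.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,   # do not share the terminal's stdin
        close_fds=True,
        start_new_session=True,  # calls setsid() in the child before exec
    )


if __name__ == "__main__":
    # A sleeping Python process stands in for Xvfb in this sketch.
    child = start_detached([sys.executable, "-c", "import time; time.sleep(5)"])
    print("child session:", os.getsid(child.pid), "our session:", os.getsid(0))
    child.terminate()
    child.wait()
```

The parent then remains responsible for stopping the child explicitly, since the terminal will no longer do it.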


Sat, 08 Jun 2013 05:16:47 GMT - Antoine Martin:

Added to trunk as of r3597 as this seems to help us exit cleanly with XShm enabled.


Mon, 17 Jun 2013 12:26:43 GMT - Antoine Martin:

Also doing the same thing for pulseaudio and child processes (--start-child=) in r3651.


Fri, 12 Jul 2013 08:14:12 GMT - ahuillet:

Dbus seems to be to blame. I don't use Dbus at all. With --no-notifications, the problem disappears, but without the option I believe the following message prevents Xpra from cleaning up properly.

2013-07-12 10:12:42,365 failed to release dbus notification forwarder: \
    org.freedesktop.DBus.Error.NoReply: Did not receive a reply. \
        Possible causes include: the remote application did not send a reply, \
        the message bus security policy blocked the reply, the reply timeout expired, \
        or the network connection was broken.

This is a pity, I believe you should be able to detect when Dbus is not available and handle it gracefully, should you not?


Fri, 12 Jul 2013 08:20:10 GMT - Antoine Martin:

I don't see how we can fail to clean up the notification forwarder without first having succeeded in loading it, which even includes talking to the dbus daemon to claim the dbus name. Are you sure you don't have a dbus daemon (maybe from your desktop session)? You may be able to verify the status of the forwarder with "xpra info" (not sure which versions support that). Or you can run a simple notification program from an xterm to see if the notifications are being grabbed somewhere: if no dbus handler exists, the test should fail.


Specifying --no-notifications prevents loading of the forwarder altogether.


Without a dbus instance, you should see on startup:

error loading or registering our dbus notifications forwarder:
(..)

Fri, 12 Jul 2013 08:44:24 GMT - Antoine Martin:

Does r3841 help?


Fri, 12 Jul 2013 09:17:59 GMT - ahuillet:

It does not (with ctrl-C)

Output:

2013-07-12 11:16:51,595 xpra is ready.
^C2013-07-12 11:16:52,625
2013-07-12 11:16:52,626 got signal SIGINT, exiting

Then when restarting:

/home/arthur/.xpra/Gurney-1 is not responding, waiting for it to timeout before clearing it.....
(EE)
Fatal server error:
(EE) Server is already active for display 1
        If this server is no longer running, remove /tmp/.X1-lock

(because Xpra didn't kill the server)
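As a stopgap, whether the lock file really is stale can be checked before removing it. The sketch below (Python 3) assumes the lock file holds the X server's PID as ASCII text, which is the usual convention for /tmp/.X*-lock files; the function name x_lock_is_stale is made up for this example.

```python
import errno
import os


def x_lock_is_stale(lock_path):
    """Return True only if the lock file names a PID that no longer exists.

    Hypothetical helper; assumes the file contains the PID as ASCII text.
    """
    try:
        with open(lock_path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing or unreadable: nothing we can conclude
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except OSError as e:
        # ESRCH: no such process. EPERM means it exists but is not ours.
        return e.errno == errno.ESRCH
    return False
```

In the situation above it would return False, since the Xdummy process is still running; it only confirms staleness once that process has been killed.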


Fri, 12 Jul 2013 09:18:33 GMT - ahuillet:

With xpra stop, with or without r3841, killing seems to be done properly.


Fri, 12 Jul 2013 11:04:45 GMT - Antoine Martin:

Please confirm if r3841 helps with ctrl-C so that I know if it helps at all. (and if not, maybe take it out)


Fri, 12 Jul 2013 11:08:22 GMT - ahuillet:

No, it does not help, see first line of comment 11.


Wed, 07 Aug 2013 08:25:47 GMT - krlmlr:

I'm seeing similar issues -- lots of zombie Xvfb processes. I also use Ctrl+C to terminate xpra; Xvfb terminates properly when I use xpra stop instead.

Debian Lenny, xpra 0.9.0, Python 2.6.6 (marked as outdated in the xpra log: Warning: outdated/buggy version of Python: 2.6.6.final.0)


Wed, 07 Aug 2013 08:31:16 GMT - Antoine Martin:

Please, do not use 0.9.0! It is very buggy, the latest stable 0.9.x release is 0.9.8.


As for this particular issue: your version of Python is outdated, using a more up to date one should fix this issue.


Wed, 07 Aug 2013 14:55:54 GMT - krlmlr:

Replying to totaam:

Please, do not use 0.9.0! It is very buggy, the latest stable 0.9.x release is 0.9.8.

Sorry, it's not Lenny but Squeeze. Had to update GPG key for APT source winswitch.org. Now running 0.9.6-1, same behavior. (Why is 0.9.8-1 not available from winswitch.org for Squeeze but for Raring?)

As for this particular issue: your version of Python is outdated, using a more up to date one should fix this issue.

Python 2.7 will be available after upgrading the server to Wheezy. I can live with an occasional killall Xvfb until then.


Wed, 07 Aug 2013 15:00:36 GMT - Antoine Martin: status changed; resolution set

Had to update GPG key for APT source winswitch.org


See KEYEXPIRED 1273837137 in wiki/FAQ

Why is 0.9.8-1 not available from winswitch.org for Squeeze but for Raring

I stopped making packages for squeeze as it was becoming far too difficult to build anything with so many outdated packages in the distro.

Python 2.7 will be available after upgrading the server to Wheezy

Wheezy gets a lot more testing - and has up to date packages.

I am closing this ticket as FIXED, feel free to re-open if you see this problem with up to date packages.


Sat, 23 Jan 2021 04:52:30 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/352