
Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#352 closed defect (fixed)

Killing server with SIGINT doesn't remove socket, doesn't kill Xvfb/Xdummy

Reported by: ahuillet Owned by: ahuillet
Priority: major Milestone: 0.10
Component: server Version:
Keywords: Cc:

Description (last modified by Antoine Martin)

Hello,

when killing the server with SIGINT, the socket isn't removed, so a next run of the server does the following:

/home/arthur/.xpra/Chani-1 is not responding, waiting for it to timeout before clearing it.....

It also doesn't kill X, so the Xdummy process remains, and the server doesn't start.

Trimmed output when killed with control-C:

got signal SIGINT, exiting
Tray.cleanup()
Tray.cleanup() done
failed to release dbus notification forwarder: \
    org.freedesktop.DBus.Error.NoReply: Did not receive a reply. \
    Possible causes include: the remote application did not send a reply, \
    the message bus security policy blocked the reply, \
    the reply timeout expired, or the network connection was broken.
cleanup will disconnect: []

Change History (18)

comment:1 Changed 5 years ago by Antoine Martin

Description: modified (diff)
Milestone: 0.10
Owner: changed from Antoine Martin to Antoine Martin
Status: changed from new to assigned

comment:2 Changed 5 years ago by Antoine Martin

Description: modified (diff)
Owner: changed from Antoine Martin to ahuillet
Status: changed from assigned to new

Cannot reproduce.


This is what I see when I use "xpra stop" to stop the session (this solution is preferred to SIGINT):

New connection received: SocketConnection(/home/antoine/.xpra/desktop-10)
connection closed after 0 packets received (0 bytes) and 0 packets sent (0 bytes)
Connection lost
Handshake complete; enabling connection
Python/GObject Linux client version 0.10.0 connected from 'desktop'
windows/pixels forwarding is disabled for this client
max client resolution is 0x0 (from []), current server resolution is 2560x1600
Shutting down in response to request
Disconnecting existing client Protocol(SocketConnection(/home/antoine/.xpra/desktop-10)), reason is: shutting down
connection closed after 2 packets received (577 bytes) and 2 packets sent (1101 bytes)
xpra client disconnected.
Connection lost
Connection lost
New connection received: SocketConnection(/home/antoine/.xpra/desktop-10)
connection closed after 0 packets received (0 bytes) and 0 packets sent (0 bytes)
Connection lost
xpra is terminating.
closing tcp socket 0.0.0.0:10000
removing socket /home/antoine/.xpra/desktop-10
killing xvfb with pid 9346
Server terminated successfully (0). Closing log file.

TIL:

removing socket /home/antoine/.xpra/desktop-10

And now with killall -SIGINT xpra:

got signal SIGINT, exiting
xpra is terminating.
closing tcp socket 0.0.0.0:10000
removing socket /home/antoine/.xpra/desktop-10
killing xvfb with pid 9939
Server terminated successfully (0). Closing log file.

This also worked fine. Using SIGTERM has the same effect.
I've tried both with and without --no-daemon.

And now with control-C from the controlling terminal:

^C
got signal SIGINT, exiting
child 'xterm' with pid 10181 has terminated
xpra is terminating.
closing tcp socket 0.0.0.0:10000
removing socket /home/antoine/.xpra/desktop-10
killing xvfb with pid 10172
Server terminated successfully (0). Closing log file.

The only way that I can get the server to not clean up its socket is to use SIGKILL or to kill the vfb from underneath it. Neither is something we can handle gracefully anyway, so just don't do that.
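A socket file left behind after such an unclean exit can be told apart from a live one by probing it. The following is only a minimal sketch of that idea, not xpra's own code: `socket_state` is a hypothetical helper, and it assumes a Linux-style unix domain socket where connecting to a path with no listener fails with ECONNREFUSED.

```python
import errno
import os
import socket


def socket_state(path, timeout=1.0):
    """Classify a unix domain socket path as 'missing', 'live' or 'stale'.

    Hypothetical helper for illustration only.
    """
    if not os.path.exists(path):
        return "missing"
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    probe.settimeout(timeout)
    try:
        probe.connect(path)
    except OSError as e:
        # ECONNREFUSED: the file exists but no process is listening on it
        if e.errno == errno.ECONNREFUSED:
            return "stale"
        raise
    finally:
        probe.close()
    return "live"
```

A caller could remove the file when the result is "stale" instead of waiting for a timeout. A server that is alive but too busy to accept connections would need separate handling.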

Here is the command line I have used for most of this testing:

dbus-launch xpra start :10 --no-daemon --no-pulseaudio

(adding/removing dbus and pulseaudio does not seem to make any difference)
I am only adding dbus-launch because of the warning in your logs: it looks to me like you are running from a desktop session and should either be using dbus-launch (if you aren't already) or --no-notifications.

comment:3 Changed 5 years ago by ahuillet

Yes, I need --no-notifications, which I forgot to add on the command line. I don't think that changes my problem, which I can see on both Archlinux and CentOS 6.2.

comment:4 Changed 5 years ago by Antoine Martin

OK, I simply cannot reproduce this with Fedora 19 (python2.7), but I can with CentOS 6.4 (maybe related to the old version of python?): about one in 10 attempts fails to run the cleanups on exit. Sometimes it fails repeatedly, sometimes it is impossible to reproduce; I cannot discern any pattern yet.


Do you have steps that make it more likely to trigger this bug? (It is hard to fix without being able to reproduce it reliably, and even harder to confirm a fix should I find one to test.)
I've tried with a client connected and without, with apps and without, etc..
Is trunk also affected?


When we exit on a "deadly signal", we schedule the actual quit() call to run from a timer soon after (0.5s in trunk), so that we still get a chance to run the cleanups that need a working main loop to complete or just a bit of time to do what they needed (sending a "shutdown" packet to clients).
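The deferred-quit approach described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the xpra implementation: `DeferredQuit` and its parameters are hypothetical names, and `threading.Timer` stands in for whatever main loop timer the server really uses. Only the 0.5s delay comes from the comment above.

```python
import signal
import threading


class DeferredQuit:
    """On a deadly signal, start the cleanups and schedule quit for later.

    Hypothetical sketch: threading.Timer replaces the real main loop timer.
    """

    def __init__(self, cleanups, quit_fn, delay=0.5):
        self.cleanups = list(cleanups)
        self.quit_fn = quit_fn
        self.delay = delay
        self.timer = None

    def handle(self, signum, frame=None):
        if self.timer is not None:
            return  # already shutting down, ignore repeated signals
        for cleanup in self.cleanups:
            try:
                cleanup()
            except Exception as e:
                # one failing cleanup must not prevent the others, or the quit
                print("cleanup failed: %s" % e)
        # give the cleanups a little time to complete before really quitting
        self.timer = threading.Timer(self.delay, self.quit_fn)
        self.timer.start()

    def install(self, signums=(signal.SIGINT, signal.SIGTERM)):
        for signum in signums:
            signal.signal(signum, self.handle)
```

Catching exceptions per cleanup is a design choice in this sketch: a cleanup that raises is reported and skipped, so the quit timer is still scheduled.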


Please consider using xpra stop instead of SIGINT, as this does not seem to be affected.

comment:5 Changed 5 years ago by Antoine Martin

Does this patch help:

### Eclipse Workspace Patch 1.0
#P Xpra
Index: src/xpra/scripts/server.py
===================================================================
--- src/xpra/scripts/server.py	(revision 3587)
+++ src/xpra/scripts/server.py	(working copy)
@@ -396,7 +396,8 @@
             if os.name=="posix":
                 os.setsid()
         try:
-            xvfb = subprocess.Popen(xvfb_cmd+[display_name], executable=xvfb_executable, close_fds=True, preexec_fn=setsid)
+            xvfb = subprocess.Popen(xvfb_cmd+[display_name], executable=xvfb_executable, close_fds=True,
+                                    stdin=subprocess.PIPE, preexec_fn=setsid)
         except OSError, e:
             sys.stderr.write("Error starting Xvfb: %s\n" % (e,))
             return  1

According to the answers to this question, it isn't easy to ensure that the subprocess will not be receiving the SIGINT - this was the easiest way I could find.
More good pointers:

Does this solve the problem?
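For reference, the same idea can be sketched in Python 3, where `subprocess.Popen` has a `start_new_session` parameter that calls setsid() in the child. This is not the patch above: `start_detached` is a hypothetical helper, and it swaps in `start_new_session=True` for `preexec_fn=setsid` and `subprocess.DEVNULL` for `subprocess.PIPE`. It assumes a POSIX system.

```python
import os
import subprocess
import sys


def start_detached(cmd):
    """Start cmd in its own session so a terminal Ctrl-C does not reach it.

    Hypothetical helper, POSIX only.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,  # do not share the terminal's stdin
        close_fds=True,
        start_new_session=True,    # calls setsid() in the child before exec
    )


if __name__ == "__main__":
    proc = start_detached([sys.executable, "-c", "import time; time.sleep(5)"])
    # the child leads its own process group, distinct from ours
    print(os.getpgid(proc.pid) == proc.pid, os.getpgid(proc.pid) != os.getpgrp())
    proc.kill()
    proc.wait()
```

Because the child is in a different process group, the SIGINT that the terminal sends to the foreground group on Ctrl-C is not delivered to it. The parent then has to terminate the child explicitly during its own cleanup.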


comment:6 Changed 5 years ago by Antoine Martin

Added to trunk as of r3597 as this seems to help us exit cleanly with XShm enabled.

comment:7 Changed 5 years ago by Antoine Martin

Also doing the same thing for pulseaudio and child processes (--start-child=) in r3651.

comment:8 Changed 5 years ago by ahuillet

Dbus seems to be to blame. I don't use Dbus at all.
With --no-notifications, the problem disappears, but without the option I believe the following message prevents Xpra from cleaning up properly.

2013-07-12 10:12:42,365 failed to release dbus notification forwarder: \
    org.freedesktop.DBus.Error.NoReply: Did not receive a reply. \
        Possible causes include: the remote application did not send a reply, \
        the message bus security policy blocked the reply, the reply timeout expired, \
        or the network connection was broken.

This is a pity, I believe you should be able to detect when Dbus is not available and handle it gracefully, should you not?


comment:9 Changed 5 years ago by Antoine Martin

I don't see how we can fail to clean up the notification forwarder without first having succeeded in loading it, which even includes talking to the dbus daemon to claim the dbus name. Are you sure you don't have a dbus daemon? (maybe from your desktop session?)
You may be able to verify the status of the forwarder with "xpra info" (not sure which versions support that)
Or you can also run a simple notification program from an xterm to see if the notifications are being grabbed somewhere - if no dbus handler exists, the test should fail.


Specifying --no-notifications prevents loading of the forwarder altogether.


Without a dbus instance, you should see on startup:

error loading or registering our dbus notifications forwarder:
(..)

comment:10 Changed 5 years ago by Antoine Martin

Does r3841 help?

comment:11 Changed 5 years ago by ahuillet

It does not (with ctrl-C)

Output:

2013-07-12 11:16:51,595 xpra is ready.
^C2013-07-12 11:16:52,625 
2013-07-12 11:16:52,626 got signal SIGINT, exiting

Then when restarting:

/home/arthur/.xpra/Gurney-1 is not responding, waiting for it to timeout before clearing it.....
(EE) 
Fatal server error:
(EE) Server is already active for display 1
        If this server is no longer running, remove /tmp/.X1-lock

(because Xpra didn't kill the server)
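Whether a leftover lock file such as /tmp/.X1-lock still belongs to a running X server can be checked before removing it. The sketch below is hedged: `x_lock_owner` is a hypothetical helper, and it assumes the lock file holds the server's pid as ASCII text, which is the usual convention for X servers but is not stated in this ticket.

```python
import os


def x_lock_owner(display, lock_dir="/tmp"):
    """Return the pid recorded in the X lock file if that process is alive.

    Returns None if the file is missing, unreadable, or the pid is gone.
    Hypothetical helper: the pid-as-text file format is an assumption.
    """
    path = os.path.join(lock_dir, ".X%s-lock" % display)
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return None
    except PermissionError:
        pass  # the process exists but belongs to another user
    return pid
```

If `x_lock_owner("1")` returns a pid, the Xvfb or Xdummy process from the previous run is still alive and has to be killed before the display can be reused. If it returns None, only the lock file is stale.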


comment:12 Changed 5 years ago by ahuillet

With xpra stop, with or without r3841, killing seems to be done properly.

comment:13 Changed 5 years ago by Antoine Martin

Please confirm if r3841 helps with ctrl-C so that I know if it helps at all.
(and if not, maybe take it out)

comment:14 Changed 5 years ago by ahuillet

No, it does not help, see first line of comment 11.

comment:15 Changed 5 years ago by krlmlr

I'm seeing similar issues -- lots of zombie Xvfb processes. I also use Ctrl+C to terminate xpra. The Xvfb terminates properly when I use xpra stop instead.

Debian Lenny, xpra 0.9.0, Python 2.6.6 (marked as outdated in the xpra log: Warning: outdated/buggy version of Python: 2.6.6.final.0)

comment:16 Changed 5 years ago by Antoine Martin

Please, do not use 0.9.0! It is very buggy, the latest stable 0.9.x release is 0.9.8.


As for this particular issue: your version of Python is outdated, using a more up to date one should fix this issue.

comment:17 in reply to:  16 Changed 5 years ago by krlmlr

Replying to totaam:

> Please, do not use 0.9.0! It is very buggy, the latest stable 0.9.x release is 0.9.8.

Sorry, it's not Lenny but Squeeze. Had to update GPG key for APT source winswitch.org. Now running 0.9.6-1, same behavior. (Why is 0.9.8-1 not available from winswitch.org for Squeeze but for Raring)?

> As for this particular issue: your version of Python is outdated, using a more up to date one should fix this issue.

Python 2.7 will be available after upgrading the server to Wheezy. I can live with an occasional killall Xvfb until then.

comment:18 Changed 5 years ago by Antoine Martin

Resolution: fixed
Status: changed from new to closed

> Had to update GPG key for APT source winswitch.org


See KEYEXPIRED 1273837137 in wiki/FAQ

> Why is 0.9.8-1 not available from winswitch.org for Squeeze but for Raring

I stopped making packages for squeeze as it was becoming far too difficult to build anything with so many outdated packages in the distro.

> Python 2.7 will be available after upgrading the server to Wheezy

Wheezy gets a lot more testing - and has up to date packages.

I am closing this ticket as FIXED, feel free to re-open if you see this problem with up to date packages.
