Starting xpra with xpra start ssh:...
.
Server on remote machine is started, but this is being spammed continiously into the log:
2017-02-21 21:37:05,892 New unix-domain connection received on /var/run/xpra/hostname-100 2017-02-21 21:37:05,991 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100 2017-02-21 21:37:05,992 New unix-domain connection received on /home/username/.xpra/hostname-100 2017-02-21 21:37:05,993 New unix-domain connection received on /var/run/xpra/hostname-100 2017-02-21 21:37:06,092 New unix-domain connection received on /home/username/.xpra/hostname-100 2017-02-21 21:37:06,093 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100 2017-02-21 21:37:06,094 New unix-domain connection received on /var/run/xpra/hostname-100 2017-02-21 21:37:06,193 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100 2017-02-21 21:37:06,194 New unix-domain connection received on /var/run/xpra/hostname-100 2017-02-21 21:37:06,194 New unix-domain connection received on /home/username/.xpra/hostname-100 2017-02-21 21:37:06,294 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100 2017-02-21 21:37:06,295 New unix-domain connection received on /home/username/.xpra/hostname-100 2017-02-21 21:37:06,296 New unix-domain connection received on /var/run/xpra/hostname-100 2017-02-21 21:37:06,396 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100 2017-02-21 21:37:06,396 New unix-domain connection received on /home/username/.xpra/hostname-100 2017-02-21 21:37:06,397 New unix-domain connection received on /var/run/xpra/hostname-100 2017-02-21 21:37:06,496 New unix-domain connection received on /home/username/.xpra/hostname-100 2017-02-21 21:37:06,497 New unix-domain connection received on /var/run/xpra/hostname-100 2017-02-21 21:37:06,498 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100 2017-02-21 21:37:06,597 New unix-domain connection received on /home/username/.xpra/hostname-100 2017-02-21 21:37:06,598 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100 2017-02-21 21:37:06,599 New unix-domain connection received on /var/run/xpra/hostname-100 2017-02-21 21:37:06,699 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100 2017-02-21 21:37:06,699 New unix-domain connection received on /home/username/.xpra/hostname-100 2017-02-21 21:37:06,700 New unix-domain connection received on /var/run/xpra/hostname-100 2017-02-21 21:37:06,800 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100 2017-02-21 21:37:06,800 New unix-domain connection received on /home/username/.xpra/hostname-100 2017-02-21 21:37:06,802 New unix-domain connection received on /var/run/xpra/hostname-100
Client soon exits, log spamming continues until I connect with xpra attach
.
Then server's log is truncated and replaced with this message:
(EE) Fatal server error: (EE) Server is already active for display 100 If this server is no longer running, remove /tmp/.X100-lock and start again. (EE) (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help. (EE) 2017-02-21 21:37:21,079 2017-02-21 21:37:21,079 Xvfb command has terminated! xpra cannot continue 2017-02-21 21:37:21,079 if the display is already running, try a different one, 2017-02-21 21:37:21,079 or use the --use-display flag 2017-02-21 21:37:21,079 2017-02-21 21:37:21,080 killing xvfb with pid 14651 2017-02-21 21:37:21,080 failed to kill xvfb process with pid 14651: 2017-02-21 21:37:21,080 [Errno 3] No such process
I did find a bug, fixed in r15133, but this would have caused the client side to also emit a large number of warning messages to stdout - did you not get those?
Please close if that fixes things.
r15180 works well on remote start, no log spam on both ends.
If server has leftovers from previous run, client can not initiate remote start and same log spam happens on server. Attached server log and client output.
I'm not sure what you mean by "has leftovers from previous run", or how I would reproduce this.
Assuming that you hit the case where a server was already running on the display that you tried to start remotely, one could argue that the code is working as intended by not matching this server, but r15190 will now allow us to match it and connect to it. (could be backported I guess - not sure that's necessarily right)
Sorry, I'm speaking about sockets:
/var/run/user/1000/xpra/server-100 is not responding, waiting for it to timeout before clearing it..... 2017-02-28 21:09:05,310 created unix domain socket: /var/run/user/1000/xpra/server-100 /home/user/.xpra/server-100 is not responding, waiting for it to timeout before clearing it..... 2017-02-28 21:09:09,315 created unix domain socket: /home/user/.xpra/server-100 /var/run/xpra/server-100 is not responding, waiting for it to timeout before clearing it..... 2017-02-28 21:09:13,321 created unix domain socket: /var/run/xpra/server-100
Maybe client doesn't wait for this process to finish?
Those sockets are usually left behind when a server crashes. Those messages were not in the log samples your had provided.
Fixing this will be harder, we must wait in case the server is still alive, and the client also needs to give up after waiting too long... What we should do in this case is wait for all the sockets in parallel to save time. This will be a bigger change, too late for this version.
I copied that snippet directly from the attached server.log )
Viewing server and client log live it seemed that client gives up just at the same time server finises dealing with sockets. So a dirty fix would be to give the client a couple more seconds of timeout.
I will try catching the crashes.
I copied that snippet directly from the attached server.log )
Sorry, yes - I missed it!
r15278 raised the socket timeout to 20s (r15300 for v1.0.x branch). This will help until we can do parallel socket probing.
It's not really done in parallel as that would be too hard to implement, but r15381 should make things more reliable. I've struggled to test this properly because if I "kill -9" we correctly detect that the socket is dead, and if I just hang the server then the socket still looks alive..
@psycho_zs: does that work for you? how can I test hung sockets?
I was left with old sockets some time ago when xpra server crashed sometimes. Now it does not, so I do not know how to test this situation now. I will test remote start in the next couple of days.
In 2.0: killing all xpra-related processes with kill -9
leaves socket $HOME/.xpra/$SERVERNAME-$PORT. Trying to remote-start after that produces described behavior despite increased timeouts. It also seems that "New unix-domain connection received on..." message spam is continued to removed log with old file descriptor (seen if tailing log throughout the whole connection attempt).
I have troubles building current trunk, it does not seem to produce some files listed in xpra.install, like etc/xpra/xorg.conf
, usr/lib/tmpfiles.d/xpra.conf
. Would you add a fresh beta package?
Older branches will probably not be patched - as I can't think of a way to make things better without changing too much code.
r15425 seems to be working even with leftover sockets. Although it failed once on the first test, right after upgrade. (I killed xpra processes before upgrading).
Closing, feel free to re-open if you can give steps to reproduce problems with trunk.
I reproduced the situation in which client again does not wait long enough to connect. Xpra server did not survive dbus restart after recent system upgrade, leaving behind xpra _proxy_start :100
process and and X. So I killed everything with killall -u username
and did remote start via xpra start ssh:...
. Server waited for 3 sockets, client disconnected just before server finished waiting on them:
Warning: some of the sockets are in an unknown state: /var/run/user/1000/xpra/servername-100 /home/username/.xpra/servername-100 /var/run/xpra/servername-100
Full server log in attachment...
Sorry but I cannot reproduce this. Please try to provide specific steps.
With new initialization mechanism are there any potential race conditions left?
I was able to reproduce it the second time: On server with running xpra restart dbus with related daemons
systemctl restart dbus ; systemctl daemon-reexec ; systemctl restart systemd-logind systemd-journald NetworkManager
kill all user processes:
killall -u $USER
Try to do xpra start ssh:...
Client gives up and disconnects somewhere around the time server prints "created unix domain socket" messages. Attach works a moment later.
Timeouts are inherently racy. It should be enough time for the server to start, even when waiting and then clearing the sockets, but there's no way we can be sure of that. And if the server is quite slow, it won't be enough. (add to that the ssh login delay) Raising the client timeout indefinitely is not a good solution either. It is already at 15 seconds, which is quite a long time. Are you sure your 2.1 server takes that long to start? (are you even testing with 2.1?)
No need to run "xpra shadow ssh:..." which just complicates things. Just kill it and watch it start with "xpra shadow :0 --no-daemon" - see how long that takes. The output should include timestamps, please post that information. (or the server log file if you wish) Without hard data, it is difficult to know where to go.
Example on my relatively beefy system:
2017-04-03 22:53:23,078 created unix domain socket: /run/user/1000/xpra/desktop-10 2017-04-03 22:53:23,078 created unix domain socket: /home/antoine/.xpra/desktop-10 2017-04-03 22:53:23,079 created unix domain socket: /var/run/xpra/desktop-10 (..) 2017-04-03 22:53:24,038 xpra is ready.
So that's 1 second from the socket creation until fully ready. (and the socket timeout + clearing should be around 8 seconds maximum)
See server_3sockets.log, it is the server log from the similar situation as described in my previous comment. 2.1-20170326r15425-1
See server_3sockets.log, it is the server log from the similar situation as described in my previous comment.
That log is against a 2.1-r15340 server, therefore missing any of the fixes from comment:10 onwards. We want the timing information of the current code.
I've just rechecked, the package version is 2.1-20170326r15425-1, but the log reports 2.1-r15340. OK, I'll try a fresher one.
r15494 package matches. Attaching log... Remote start after dbus restart and killed user processes worked. (But there were no substantial system upgrade preceding it).
OK, so in this log we can see:
2017-04-03 20:44:01,438 Warning: some of the sockets are in an unknown state: 19 2017-04-03 20:44:01,438 /var/run/user/1000/xpra/servername-100 20 2017-04-03 20:44:01,438 /home/username/.xpra/servername-100 21 2017-04-03 20:44:01,439 /var/run/xpra/servername-100 22 2017-04-03 20:44:01,439 please wait as we allow the socket probing to timeout 23 2017-04-03 20:44:09,464 created unix domain socket: /var/run/user/1000/xpra/servername-100 24 2017-04-03 20:44:09,465 created unix domain socket: /home/username/.xpra/servername-100 25 2017-04-03 20:44:09,465 created unix domain socket: /var/run/xpra/servername-100 (..) 41 2017-04-03 20:44:12,034 xpra is ready.
So that's:
For a total of 11 seconds, which is well under the client timeout of 15 seconds.. r15508 reduces from 8 to just 6 seconds, which should still be enough to distinguish between a server that is starting up (3 seconds in your case) and one that is just dead.
@psycho_zs: can I close this ticket or can you reproduce with r15508 or later?
Thanks. I'll try to check in the next couple of days. Busy month.
Xpra from latest svn can not start server
xpra start :100 -d all using systemd-run to wrap 'start' server command 'systemd-run' '--description' 'xpra-start' '--scope' '--user' '/usr/bin/xpra' 'start' ':100' '-d' 'all' '--systemd-run=no' Failed to create bus connection: No such file or directory
Latest deb build 2.1-20170409r15542-1 seem to be working, at least it connected successfully with leftover sockets on server a couple of times. First timestamp visible in client output: 10:19:10 First timestamp visible in server log: 10:19:13 "Xpra is ready" message on server: 10:19:21 Handshake complete: 10:19:22
The timeout issue is fixed so I am closing this bug.
If you are having systemd startup problems, please file a separate ticket.
The exact systemd-run
command we use is printed out, if that fails then it is quite unlikely to be an xpra bug.
Saw something similar in ticket:1537#comment:3.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1447