Opened 5 weeks ago

Last modified 4 hours ago

#1447 new defect

Server does not accept initial connection on remote start.

Reported by: psycho_zs
Owned by: psycho_zs
Priority: major
Milestone: 2.1
Component: server
Version: trunk
Keywords:
Cc:

Description

Starting xpra with xpra start ssh:....

The server on the remote machine is started, but this is being spammed continuously into the log:

2017-02-21 21:37:05,892 New unix-domain connection received on /var/run/xpra/hostname-100
2017-02-21 21:37:05,991 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100
2017-02-21 21:37:05,992 New unix-domain connection received on /home/username/.xpra/hostname-100
2017-02-21 21:37:05,993 New unix-domain connection received on /var/run/xpra/hostname-100
2017-02-21 21:37:06,092 New unix-domain connection received on /home/username/.xpra/hostname-100
2017-02-21 21:37:06,093 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100
2017-02-21 21:37:06,094 New unix-domain connection received on /var/run/xpra/hostname-100
2017-02-21 21:37:06,193 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100
2017-02-21 21:37:06,194 New unix-domain connection received on /var/run/xpra/hostname-100
2017-02-21 21:37:06,194 New unix-domain connection received on /home/username/.xpra/hostname-100
2017-02-21 21:37:06,294 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100
2017-02-21 21:37:06,295 New unix-domain connection received on /home/username/.xpra/hostname-100
2017-02-21 21:37:06,296 New unix-domain connection received on /var/run/xpra/hostname-100
2017-02-21 21:37:06,396 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100
2017-02-21 21:37:06,396 New unix-domain connection received on /home/username/.xpra/hostname-100
2017-02-21 21:37:06,397 New unix-domain connection received on /var/run/xpra/hostname-100
2017-02-21 21:37:06,496 New unix-domain connection received on /home/username/.xpra/hostname-100
2017-02-21 21:37:06,497 New unix-domain connection received on /var/run/xpra/hostname-100
2017-02-21 21:37:06,498 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100
2017-02-21 21:37:06,597 New unix-domain connection received on /home/username/.xpra/hostname-100
2017-02-21 21:37:06,598 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100
2017-02-21 21:37:06,599 New unix-domain connection received on /var/run/xpra/hostname-100
2017-02-21 21:37:06,699 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100
2017-02-21 21:37:06,699 New unix-domain connection received on /home/username/.xpra/hostname-100
2017-02-21 21:37:06,700 New unix-domain connection received on /var/run/xpra/hostname-100
2017-02-21 21:37:06,800 New unix-domain connection received on /var/run/user/1000/xpra/hostname-100
2017-02-21 21:37:06,800 New unix-domain connection received on /home/username/.xpra/hostname-100
2017-02-21 21:37:06,802 New unix-domain connection received on /var/run/xpra/hostname-100

The client soon exits; the log spamming continues until I connect with xpra attach.
Then the server's log is truncated and replaced with this message:

(EE) 
Fatal server error:
(EE) Server is already active for display 100
	If this server is no longer running, remove /tmp/.X100-lock
	and start again.
(EE) 
(EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
(EE) 
2017-02-21 21:37:21,079 
2017-02-21 21:37:21,079 Xvfb command has terminated! xpra cannot continue
2017-02-21 21:37:21,079  if the display is already running, try a different one,
2017-02-21 21:37:21,079  or use the --use-display flag
2017-02-21 21:37:21,079 
2017-02-21 21:37:21,080 killing xvfb with pid 14651
2017-02-21 21:37:21,080 failed to kill xvfb process with pid 14651:
2017-02-21 21:37:21,080  [Errno 3] No such process

Attachments (2)

server.log (28.3 KB) - added by psycho_zs 4 weeks ago.
client.log (1.2 KB) - added by psycho_zs 4 weeks ago.

Change History (15)

comment:1 Changed 5 weeks ago by Antoine Martin

Milestone: 2.0
Owner: changed from Antoine Martin to psycho_zs

I did find a bug, fixed in r15133, but this would have caused the client side to also emit a large number of warning messages to stdout - did you not get those?

Please close if that fixes things.

comment:2 Changed 4 weeks ago by psycho_zs

Resolution: fixed
Status: changed from new to closed

r15180 works well on remote start, no log spam on either end.

Changed 4 weeks ago by psycho_zs

Attachment: server.log added

Changed 4 weeks ago by psycho_zs

Attachment: client.log added

comment:3 Changed 4 weeks ago by psycho_zs

Resolution: fixed
Status: changed from closed to reopened

If the server has leftovers from a previous run, the client cannot initiate a remote start and the same log spam happens on the server.
Attached the server log and client output.

comment:4 Changed 4 weeks ago by Antoine Martin

I'm not sure what you mean by "has leftovers from previous run", or how I would reproduce this.

Assuming that you hit the case where a server was already running on the display that you tried to start remotely, one could argue that the code is working as intended by not matching this server, but r15190 will now allow us to match it and connect to it.
(could be backported I guess - not sure that's necessarily right)

comment:5 Changed 4 weeks ago by psycho_zs

Sorry, I'm speaking about sockets:

/var/run/user/1000/xpra/server-100 is not responding, waiting for it to timeout before clearing it.....
2017-02-28 21:09:05,310 created unix domain socket: /var/run/user/1000/xpra/server-100
/home/user/.xpra/server-100 is not responding, waiting for it to timeout before clearing it.....
2017-02-28 21:09:09,315 created unix domain socket: /home/user/.xpra/server-100
/var/run/xpra/server-100 is not responding, waiting for it to timeout before clearing it.....
2017-02-28 21:09:13,321 created unix domain socket: /var/run/xpra/server-100

Maybe client doesn't wait for this process to finish?

comment:6 Changed 4 weeks ago by Antoine Martin

Milestone: changed from 2.0 to 2.1
Owner: changed from psycho_zs to Antoine Martin
Status: changed from reopened to new

Those sockets are usually left behind when a server crashes.
Those messages were not in the log samples you had provided.

Fixing this will be harder: we must wait in case the server is still alive, and the client also needs to give up after waiting too long...
What we should do in this case is wait for all the sockets in parallel to save time.
This will be a bigger change, too late for this version.
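
As a rough illustration of the trade-off described above (wait in case the server is still alive, but give up eventually), this Python sketch classifies a single socket path with a bounded wait. The function name `probe_socket`, the state strings and the timeout value are hypothetical and are not taken from the xpra source.

```python
import errno
import socket


def probe_socket(path, timeout=5.0):
    """Classify a unix domain socket path as 'live', 'dead' or 'unknown'.

    Hypothetical helper: 'live' only means that something is listening,
    not that the process behind the socket is healthy.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "live"
    except socket.timeout:
        # no answer within the timeout: the caller decides whether to retry
        return "unknown"
    except OSError as e:
        if e.errno in (errno.ECONNREFUSED, errno.ENOENT):
            # nobody is listening, or the path does not exist at all
            return "dead"
        return "unknown"
    finally:
        sock.close()
```

Probing several paths one after the other with this helper can take up to the sum of the individual timeouts, which is why waiting in parallel saves time.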

comment:7 Changed 4 weeks ago by psycho_zs

I copied that snippet directly from the attached server.log )

Watching the server and client logs live, it seemed that the client gives up just as the server finishes dealing with the sockets. So a dirty fix would be to give the client a couple more seconds of timeout.

I will try catching the crashes.

comment:8 Changed 4 weeks ago by Antoine Martin

> I copied that snippet directly from the attached server.log )

Sorry, yes - I missed it!

comment:9 Changed 7 days ago by Antoine Martin

r15278 raised the socket timeout to 20s (r15300 for v1.0.x branch).
This will help until we can do parallel socket probing.

comment:10 Changed 2 days ago by Antoine Martin

Owner: changed from Antoine Martin to psycho_zs

It's not really done in parallel as that would be too hard to implement, but r15381 should make things more reliable.
I've struggled to test this properly because if I "kill -9" the server we correctly detect that the socket is dead, and if I just hang the server then the socket still looks alive.

@psycho_zs: does that work for you? how can I test hung sockets?
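
One possible way to reproduce both cases without a real server, offered as a sketch and not as a tested recipe for xpra: a socket file that is bound but has no listener behaves like one left behind by "kill -9", while a socket that listens but never accepts behaves like a hung server. All names below are hypothetical. The check only waits for a byte from the peer; a real probe would presumably have to send a request first.

```python
import socket


def make_dead_socket(path):
    """Leave a socket file on disk with nothing listening behind it."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.bind(path)
    s.close()  # close() does not unlink the path


def make_hung_socket(path):
    """Listen on the path but never accept(); keep the returned socket open."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.bind(path)
    s.listen(5)
    return s


def check_socket(path, timeout=1.0):
    """Return 'dead', 'hung' or 'responsive' for a unix domain socket path."""
    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    c.settimeout(timeout)
    try:
        c.connect(path)   # succeeds for a hung socket: the kernel queues it
        data = c.recv(1)  # ...but no data ever arrives
    except socket.timeout:
        return "hung"
    except (ConnectionRefusedError, FileNotFoundError):
        return "dead"
    finally:
        c.close()
    # an empty read means the peer closed the connection without answering
    return "responsive" if data else "dead"
```

With the hung socket, `connect()` alone succeeds, which matches the observation that such a socket "still looks alive"; only waiting for an actual reply tells the two cases apart.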

comment:11 Changed 2 days ago by psycho_zs

I was left with old sockets some time ago, when the xpra server would sometimes crash. It no longer does, so I do not know how to test this situation now.
I will test remote start in the next couple of days.

comment:12 Changed 20 hours ago by psycho_zs

In 2.0: killing all xpra-related processes with kill -9 leaves the socket $HOME/.xpra/$SERVERNAME-$PORT behind. Trying to remote-start after that produces the described behavior despite the increased timeouts. It also seems that the "New unix-domain connection received on..." message spam continues to be written to the removed log through the old file descriptor (seen when tailing the log throughout the whole connection attempt).
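
The last observation is consistent with normal POSIX behaviour: removing a file only removes its name, and a process that still holds an open descriptor keeps writing to the now-unnamed file. A small sketch of that effect follows. The function name and file contents are hypothetical, and this is not xpra's logging code.

```python
import os


def write_after_unlink(path):
    """Write to a file, remove it, then keep writing through the open handle.

    Returns (still_listed, size): whether the path still exists, and the
    size of the file as seen through the open descriptor. POSIX only.
    """
    f = open(path, "w")
    f.write("first line\n")
    f.flush()
    os.unlink(path)            # the name is gone, the open file is not
    f.write("second line\n")   # this write still succeeds
    f.flush()
    still_listed = os.path.exists(path)
    size = os.fstat(f.fileno()).st_size
    f.close()
    return still_listed, size
```

The path no longer exists after the unlink, yet the size seen through the descriptor includes both writes.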

I have trouble building current trunk: it does not seem to produce some of the files listed in xpra.install, like etc/xpra/xorg.conf and usr/lib/tmpfiles.d/xpra.conf. Would you add a fresh beta package?

comment:13 Changed 4 hours ago by Antoine Martin

  • found a way to do proper parallel socket probing using threads, implemented in r15421
  • the packaging issue with tmpfiles.d / sysusers.d files should be fixed in r15422 (see #1450 for details)

Older branches will probably not be patched - as I can't think of a way to make things better without changing too much code.
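
For readers unfamiliar with the idea, parallel probing with threads could look roughly like the sketch below. It is not the code from r15421; the function names, the result format and the timeout are assumptions made for illustration. The log excerpt in comment 5 shows the three sockets being cleared about four seconds apart, so probing them one by one costs roughly three times as long as probing them together.

```python
import socket
import threading
import time


def can_connect(path, timeout):
    """Return True if a unix domain socket at path accepts a connection."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return True
    except OSError:
        return False
    finally:
        s.close()


def probe_all(paths, timeout=5.0):
    """Probe every path in its own thread.

    Returns a dict mapping path -> True / False, or None if the probe
    did not finish in time. The total wait is bounded by a single
    timeout (plus a small margin), not by the sum of all timeouts.
    """
    results = {path: None for path in paths}

    def worker(path):
        results[path] = can_connect(path, timeout)

    threads = [threading.Thread(target=worker, args=(p,), daemon=True)
               for p in paths]
    for t in threads:
        t.start()
    deadline = time.monotonic() + timeout + 1.0
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
    return results
```

A caller could then clear the paths that map to `False` and keep waiting, or give up, on the ones that map to `None`.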
