xpra icon
Bug tracker and wiki

Opened 3 months ago

Closed 3 months ago

Last modified 3 months ago

#2184 closed defect (fixed)

remote start is broken with python3

Reported by: Antoine Martin Owned by: Antoine Martin
Priority: critical Milestone: 2.5
Component: core Version: 2.4.x
Keywords: Cc:

Description

Probably following the fix in r21923.

Running a remote start just prints:

$ python2 /usr/bin/xpra start ssh://localhost --start=xterm --mmap=no 
Xpra GTK2 client version 2.5-r21904M 64-bit
 running on Linux Fedora 29 TwentyNine
 window manager is 'GNOME Shell'
GStreamer version 1.14.4 for Python 2.7.15 64-bit
OpenGL_accelerate module loaded
Using accelerated ArrayDatatype
OpenGL enabled with GeForce GTX 1070/PCIe/SSE2
Connected (version 2.0, client OpenSSH_7.9)
please enter the passphrase for XXXXXXXXXX/.ssh/id_rsa:
Authentication (publickey) successful!
 keyboard settings: rules=evdev, model=pc105, layout=gb,us,gb
 desktop size is 3840x2160 with 1 screen:
  :1.0 (1016x572 mm - DPI: 96x95) workarea: 3840x2126 at 0x34
    monitor 2 (708x398 mm - DPI: 137x137)
2
Connection lost

Strange how the displayfd used by the proxy start ends up showing on the client stdout..

Change History (6)

comment:1 Changed 3 months ago by Antoine Martin

Status: newassigned

Blocker for #2180.

comment:2 Changed 3 months ago by Antoine Martin

The proxy command is:

$XDG_RUNTIME_DIR/xpra/run-xpra "_proxy_start" "--start=xterm" "--mmap=no" \
"--encodings=h264,vp9,vp8,mpeg4,mpeg4+mp4,h264+mp4,vp8+webm,vp9+webm,png,png/P,png/L,webp,rgb,rgb24,rgb32,jpeg,h265,jpeg2000,mpeg1,mpeg2"

And the debug output shows:

read_parse_thread_loop starting
check_server_echo(0) last=True, server_ok=True (last_ping_echoed_time=0)
process_gibberish(['gibberish', "invalid packet header byte 2\n: '320a' read buffer='2\\n' (2 bytes)", '2\n'])
2
GTKXpraClient.quit(9) current exit_code=None

So we're expecting to talk to the new xpra server, but instead we get the display number.

Running that proxy command by hand:

$XDG_RUNTIME_DIR/xpra/run-xpra "_proxy_start" "--start=xterm" "--mmap=no" -d all

Shows (edited):

start_server_subprocess: command='/usr/bin/xpra', 'start', '--mmap=no', '--debug=all', \
    '--start=xterm', '--env=XPRA_PROXY_START_UUID=9d3e30bc981444f69e2e302d1a8299cd', '--daemon=yes', '--systemd-run=no', '--displayfd=5'
Entering daemon mode; any further errors will be reported to:
  /run/user/1000/xpra/S27964.log
read_displayfd(4, 19, None)=b''
Error: displayfd failed
 did not provide a display number using displayfd
(..)
read_displayfd(4, 19, None)=b''
Error: displayfd failed
 did not provide a display number using displayfd
(..)

The server log shows:

read_displayfd(7, 17, None)=b'2\n'
Using display number provided by /usr/libexec/Xorg: :2
xvfb process=<subprocess.Popen object at 0x7fcfefbbf668>
display_name=:2
writing display='2' to displayfd=5
write_displayfd(5, 2, 9) written 2 bytes, remains 0
Last edited 3 months ago by Antoine Martin (previous) (diff)

comment:3 Changed 3 months ago by Antoine Martin

First, r21929 fixes a python3 compatibility bug (chars are now integers..)

With extra debug logging added, python2 shows:

subprocess display pipes: (3, 4)
start_server_subprocess: command='/usr/bin/xpra', 'start', '--start=xterm', '--mmap=no', '--env=XPRA_PROXY_START_UUID=95710b03bdf442ce9dbee8290c7c4a21', '--daemon=yes', '--systemd-run=no', '--displayfd=4'
proc=<subprocess.Popen object at 0x7f66eabd76d0>
read_displayfd(3, 20, None)
Entering daemon mode; any further errors will be reported to:
  /run/user/1000/xpra/S20413.log
r=[3]
read=1

Actual display used: :1
Actual log file name is now: /run/user/1000/xpra/:1.log
disconnect: login timeout

So we create a pipe pair as (3, 4) and read back the display "1" from fd=3.

But with python3:

subprocess display pipes: (3, 4)
start_server_subprocess: command='/usr/bin/xpra', 'start', '--mmap=no', '--start=xterm', '--env=XPRA_PROXY_START_UUID=51c80023bdfb473aa1c06d98c43a6681', '--daemon=yes', '--systemd-run=no', '--displayfd=4'
proc=<subprocess.Popen object at 0x7f1f8cf3de48>
read_displayfd(3, 20, None)
Entering daemon mode; any further errors will be reported to:
  /run/user/1000/xpra/S25491.log
1
Actual display used: :1
Actual log file name is now: /run/user/1000/xpra/:1.log
r=[]
Error: displayfd failed
 did not provide a display number using displayfd

The select call never sees a readable pipe.

comment:4 Changed 3 months ago by Antoine Martin

Forcing the subprocess to run as a python2 server shows:

2019-03-02 14:49:00,330 write_displayfd failed
Traceback (most recent call last):
  File "/usr/lib64/python3.7/site-packages/xpra/scripts/server.py", line 877, in do_run_server
    assert write_displayfd(displayfd, display), "timeout"
  File "/usr/lib64/python3.7/site-packages/xpra/platform/displayfd.py", line 32, in write_displayfd
    count = os.write(w_pipe, buf)
BrokenPipeError: [Errno 32] Broken pipe
2019-03-02 14:49:00,333 Error: failed to write ':2' to fd=4
2019-03-02 14:49:00,334  [Errno 32] Broken pipe

So it looks like the problem comes from the parent process then.

Also, modifying the server to write a different display number from the one that it received from Xorg's displayfd shows that this is the value we see mixed up in the parent's output. Why?

Looks like PEP 446 -- Make newly created file descriptors non-inheritable.
I thought that pipes would be safe: Python 3.3 creates inheritable file descriptors on all platforms, except os.pipe() which creates non-inheritable file descriptors on Windows.
But no, we now have to explictly allow them.

Fixes:

  • r21930 more resilient code
  • r21931 makes the pipe fds inheritable
  • r21932 better debug logging

So this particular bug is fixed.

Still TODO here:

  • install signal handlers so we can control-c the _proxy_start python3 process
  • _shadow_start should be called _proxy_shadow_start (keep older name for backwards compatibility)
  • avoid logging these errors when the socket / pipe is closed or closing:
    Warning: failed to query SO_BROADCAST, SO_RCVLOWAT, SO_DONTROUTE, SO_ERROR, SO_KEEPALIVE, SO_LINGER, SO_OOBINLINE, SO_RCVBUF, SO_RCVTIMEO, SO_REUSEADDR, SO_REUSEPORT, SO_SNDBUF, SO_SNDTIMEO, SO_TYPE
    
Last edited 3 months ago by Antoine Martin (previous) (diff)

comment:5 Changed 3 months ago by Antoine Martin

Resolution: fixed
Status: assignedclosed
  • r21933 fixes the signal handling, we now force exit and return a different exit code (r21935)
  • r21934 more consistent sub-command naming
  • not fixing the small warning issue

comment:6 Changed 3 months ago by Antoine Martin

Summary: remote start is brokenremote start is broken with python3

See also #2185

Last edited 3 months ago by Antoine Martin (previous) (diff)
Note: See TracTickets for help on using tickets.