Xpra: Ticket #1105: systemd multi seat support

See also #1129.

The seat with vt number may be challenging since we don't have a vt number...



Sun, 29 May 2016 03:14:21 GMT - Antoine Martin: status changed

Enlightening thread: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ZNQW72UP36UAFMX53HPFFQTWTQDZVJ3M/: systemd-logind will now by default terminate user processes that are part of the user session scope unit. PITA for us.

Debian ticket: systemd kill background processes after user logs out

https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples, see Start screen as a user service.

PAM support in tmux uses pam_start + pam_open_session - would this be enough?


Wed, 01 Jun 2016 10:39:08 GMT - Antoine Martin: description changed

r12722 (+fixes in r12756 + r12761 + r12753 for osx) adds an "xpra" pam service so we can call pam_open_session early (before daemonizing) when starting a server. The pam_systemd module should also ensure that the directories are present for #1129. We'll see if this is enough to prevent us from getting killed.

See also: Is linux-PAM session same as linux process session?: The short answer is no, they're different things, but processes that handle login sessions should handle both of them. We're not a login session per-se, but as close as can be.

systemd-devel: The whole su/pkexec session debate: This way, screen will keep an "active" reference to the session and systemd-logind will not mark it as "closing". So the session that was initiated by sshd will be kept open by "screen". Note that pam_open_session() without pam_authenticate() will *not* create a new session but only attach to the current session.


Wed, 01 Jun 2016 10:54:58 GMT - Antoine Martin: owner, status changed

Wait, as per https://lists.freedesktop.org/archives/systemd-devel/2013-December/014996.html: The session is still marked as "closing" but because processes still exist it never quite dies. And yes, the kill processes option (which is a nice thing to enable if possible) would indeed kill the screen.

@jonathan.underwood: How on earth are we supposed to fix this thing? We don't want or need root, just tell logind to move the process into its own session.


Sat, 18 Jun 2016 16:27:18 GMT - jonathan.underwood:

Well, I am no expert here :) But this is a somewhat hot topic at the moment. I very much think xpra is in the same boat as Screen and tmux. In case you missed it, this is a nice summary of why it's a hot topic:

http://lwn.net/Articles/689732/

The best thing xpra could do, I think, is start in a new process tree. What exactly the right mechanism for that is remains unclear - I expect you don't want to do the dbus dance to talk to the systemd daemon to create a new session and control group (which would be the systemd maintainers' preferred route).

Something along the lines of this comment might be one way to go:

http://lwn.net/Articles/690795/

This also makes for interesting reading:

https://github.com/tmux/tmux/issues/428

PS: Sorry for the late reply and the lack of packaging activity in recent weeks - I have changed jobs. I should be getting back to packaging now, though.


Sat, 18 Jun 2016 16:32:37 GMT - jonathan.underwood:

Actually, probably the "right" way to go on systems using systemd is to use systemd-run to launch xpra:

https://www.freedesktop.org/software/systemd/man/systemd-run.html


Tue, 12 Jul 2016 16:52:22 GMT - Antoine Martin: milestone changed

Milestone renamed


Wed, 10 Aug 2016 06:03:10 GMT - Antoine Martin:

the "right" way to go on systems using systemd is to use systemd-run to launch xpra


Users shouldn't really need to care about this low-level plumbing, so when they issue an "xpra start", they expect it to survive their current session (be it an ssh session, or even a full desktop environment). That's especially true of ssh sessions started with "xpra start ssh:HOST --start=xterm".

So we would need to do this from "xpra start ...":


I tried to test this using a guest account:

And the xpra server survived... Fedora 24 all up to date. What am I missing? @jonathan.underwood: see also ticket:1129#comment:21


Wed, 17 Aug 2016 07:58:48 GMT - Antoine Martin: attachment set

wrap xpra server command with systemd-run automatically


Wed, 17 Aug 2016 09:15:10 GMT - Antoine Martin: priority changed

Actually, probably the "right" way to go on systems using systemd is to use systemd-run to launch xpra:


As of r13378, we now run server commands via systemd-run:

$ xpra start --start=xterm --no-daemon --systemd-run-args="-p MemoryAccounting=true -p MemoryLimit=64M"
using systemd-run to wrap 'start' server command
'systemd-run' '--scope' '--user' '-p' 'MemoryAccounting=true' '-p' 'MemoryLimit=64M' '/usr/bin/xpra' \
    'start' '--start=xterm' '--systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M' '--daemon=no'
Running scope as unit run-rd905fbd12caf4ec8b400030991401a14.scope.
(...)
● run-rd905fbd12caf4ec8b400030991401a14.scope - /usr/bin/xpra start --start=xterm --systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M --daemo
   Loaded: loaded
Transient: yes
  Drop-In: /run/user/1000/systemd/user/run-rd905fbd12caf4ec8b400030991401a14.scope.d
           └─50-Description.conf, 50-MemoryAccounting.conf, 50-MemoryLimit.conf
   Active: active (running) since Wed 2016-08-17 16:09:09 ICT; 51s ago
   CGroup: /user.slice/user-1000.slice/user@1000.service/run-rd905fbd12caf4ec8b400030991401a14.scope
           ├─25491 /bin/python /usr/bin/xpra start --start=xterm --systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M --daemon=no
           ├─25502 /usr/libexec/Xorg -noreset -nolisten tcp +extension GLX +extension RANDR +extension RENDER -auth /run/user/1000/gdm/Xauthority -logfi
           ├─25509 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
           ├─25639 xterm
           └─25641 bash
Aug 17 16:09:09 desktop systemd[1417]: Started /usr/bin/xpra start --start=xterm --systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M --daemon
Aug 17 16:09:09 desktop python[25491]: pam_systemd(xpra:session): pam-systemd initializing
Aug 17 16:09:09 desktop python[25491]: pam_systemd(xpra:session): Asking logind to create session: uid=1000 pid=25491 service=xpra type=x11 class=user d
Aug 17 16:09:09 desktop python[25491]: pam_systemd(xpra:session): Failed to create session: Access denied

So we end up with a cgroup for the session, but there are problems:

(re-tested after the r13505 pam fix for xauth data)

See also #1335
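For reference, the wrapping done since r13378 boils down to building a new command line like the one printed above. A minimal sketch, assuming a hypothetical helper name `wrap_with_systemd_run` and that the `--systemd-run-args` string is split with shell rules (`shlex.split`); this is an illustration, not the actual xpra code:

```python
import shlex


def wrap_with_systemd_run(xpra_path, subcommand_args, systemd_run_args=""):
    """Hypothetical helper: build the argv that re-runs an xpra server
    command inside a transient scope of the user's service manager."""
    # --scope: run the command in the foreground, in a new transient scope unit
    # --user: talk to the per-user systemd instance instead of the system one
    argv = ["systemd-run", "--scope", "--user"]
    # extra options such as "-p MemoryLimit=64M", split like a shell would
    argv += shlex.split(systemd_run_args)
    # then the original server command, unchanged
    argv.append(xpra_path)
    argv += list(subcommand_args)
    return argv


argv = wrap_with_systemd_run(
    "/usr/bin/xpra",
    ["start", "--start=xterm", "--daemon=no"],
    "-p MemoryAccounting=true -p MemoryLimit=64M",
)
print(argv)
```

The result would then be handed to something like `os.execvp(argv[0], argv)`. One detail the sketch leaves out: the re-executed server must recognise that it is already running inside the new scope, otherwise it would try to wrap itself again.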


Sun, 09 Oct 2016 04:26:28 GMT - Antoine Martin:

r14062 disables pam_open for now because it causes the service (#1335) to run in a user-0 slice instead of the system slice.
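To see which slice a process actually lands in (system.slice vs user-0.slice here, or the run-*.scope shown in the systemctl output above), one can read /proc/self/cgroup. A minimal sketch, assuming either the unified cgroup v2 line (`0::/path`) or the cgroup v1 `name=systemd` line is present; the function names are hypothetical:

```python
def cgroup_units(proc_cgroup_text):
    """Return the path components (slices, services, scopes) of the systemd
    cgroup a process belongs to, given the text of /proc/<pid>/cgroup."""
    for line in proc_cgroup_text.splitlines():
        if not line:
            continue
        # each line is "hierarchy-ID:controller-list:cgroup-path"
        hierarchy, controllers, path = line.split(":", 2)
        # cgroup v2 (unified): "0::/path"; cgroup v1: "N:name=systemd:/path"
        if (hierarchy == "0" and controllers == "") or controllers == "name=systemd":
            return [part for part in path.split("/") if part]
    return []


def slice_and_unit(units):
    """Split the components into the enclosing slices and the leaf unit."""
    slices = [unit for unit in units if unit.endswith(".slice")]
    leaf = units[-1] if units else None
    return slices, leaf


units = cgroup_units("0::/user.slice/user-1000.slice/session-c32.scope\n")
print(slice_and_unit(units))
```

On a live system the input would come from `open("/proc/self/cgroup").read()`; a service that ended up under `user-0.slice` rather than `system.slice` is immediately visible in the returned slices.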


Sat, 29 Oct 2016 11:38:47 GMT - Antoine Martin: owner, status changed

Instead of ensuring that the session survives, this seems to have the exact opposite effect (and worse: it requires a reboot to properly clear things); details in #1348. I've tested both with KillUserProcesses=no and KillUserProcesses=yes, with the same result.
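When comparing results across machines, it helps to know which KillUserProcesses value is actually configured. A rough sketch that reads the `[Login]` section of logind.conf with `configparser`; it assumes a single file (drop-ins under logind.conf.d are ignored) and returns None when the option is left commented out, since the built-in default depends on how systemd was built. The function name is hypothetical:

```python
import configparser


def kill_user_processes(logind_conf_text):
    """Return the KillUserProcesses= value set in logind.conf as a bool, or
    None when it is not set (i.e. the compiled-in default applies)."""
    # strict=False: like systemd, let a later assignment override an earlier one
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case ("KillUserProcesses")
    parser.read_string(logind_conf_text)
    # lines starting with "#" (the commented-out defaults) are ignored
    value = parser.get("Login", "KillUserProcesses", fallback=None)
    if value is None:
        return None
    # systemd's spellings for true; anything else is treated as false here
    return value.strip().lower() in ("1", "yes", "y", "true", "t", "on")


print(kill_user_processes("[Login]\n#KillUserProcesses=no\nKillUserProcesses=yes\n"))
```

On a real host the text would come from /etc/systemd/logind.conf (the path may differ per distribution).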

xpra does get killed unceremoniously, but worst of all, this seems to affect ssh, making the next login attempt take forever (this looks similar to systemd issue 2863).

I've asked for help on the PAM session hooks for independent session

Alternatively, we could extend the proxy server to start new sessions on behalf of other users. The proxy server runs as root and should have sufficient privileges to call logind's CreateSession method. Downsides: we don't currently require the proxy server to be running, and this may slow down session startup.


Wed, 16 Nov 2016 06:33:42 GMT - Antoine Martin: attachment set

start polkit automatically (requires session management)


Tue, 22 Nov 2016 13:49:59 GMT - Antoine Martin: milestone changed

The answer from the systemd mailing list is that we do need a suid binary to do the registration: https://lists.freedesktop.org/archives/systemd-devel/2016-November/037700.html.

Too late to start messing with the suid / socket activation approaches now.


Fri, 30 Dec 2016 02:26:04 GMT - rektide: cc set


Sun, 19 Feb 2017 06:27:51 GMT - Antoine Martin: milestone changed


Thu, 18 May 2017 16:23:06 GMT - Antoine Martin: priority changed

Some related changes:

r15810 added uid and gid support when running as root (added benefits: can listen to ports below 1024 without running as root or using iptables) So theoretically we could ask the root proxy server to start sessions for us and do the pam / logind registration. (that bit seems to work?) The permissions could be restricted using regular authentication or even SO_PEERCRED / SCM_CREDENTIALS (probably the former). So far so good.
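Restricting by peer credentials is straightforward on Linux: for a connected AF_UNIX socket, `getsockopt(SOL_SOCKET, SO_PEERCRED)` returns the peer's `struct ucred` (pid, uid, gid). A minimal sketch; the allow-list policy in `is_allowed` is a hypothetical example, not how xpra's authentication modules are structured:

```python
import socket
import struct

# struct ucred: pid_t (signed int), uid_t and gid_t (unsigned ints)
UCRED = struct.Struct("iII")


def peer_credentials(sock):
    """Return (pid, uid, gid) of the process at the other end of a
    connected AF_UNIX socket. Linux only (SO_PEERCRED)."""
    data = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    return UCRED.unpack(data)


def is_allowed(sock, allowed_uids):
    """Hypothetical policy: only accept peers whose uid is in allowed_uids."""
    _pid, uid, _gid = peer_credentials(sock)
    return uid in allowed_uids
```

Unlike SCM_CREDENTIALS, this needs no cooperation from the client: the kernel records the credentials when the connection is made.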

But then I found:

Despite the documentation (https://www.freedesktop.org/software/systemd/man/logind.conf.html) stating that: Note that setting KillUserProcesses=yes will break tools like screen(1) and tmux(1), unless they are moved out of the session scope. See example in systemd-run(1). - EDIT: seems to work on another system...

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Resource_Management_Guide/chap-Using_Control_Groups.html) systemd-run either needs to be disabled by default (only on the distributions affected? could be kernel configuration related...), or changed to "auto" so we can check before trying (or even fallback after failing?)
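An "auto" value could be as simple as checking that the systemd-run binary exists and that the host was booted with systemd. A sketch under those assumptions: the yes/no/auto values and function names are hypothetical, and `systemd_booted` uses the same test as `sd_booted(3)` (the presence of /run/systemd/system):

```python
import os
import shutil


def systemd_booted(run_dir="/run/systemd/system"):
    """Same test as sd_booted(3): systemd creates this directory at boot."""
    return os.path.isdir(run_dir)


def use_systemd_run(mode, which=shutil.which, booted=systemd_booted):
    """Decide whether to wrap server commands with systemd-run.
    `mode` is "yes", "no" or "auto" (hypothetical option values);
    `which` and `booted` can be replaced for testing."""
    mode = mode.strip().lower()
    if mode in ("no", "false", "0"):
        return False
    if mode in ("yes", "true", "1"):
        return True
    if mode == "auto":
        return which("systemd-run") is not None and booted()
    raise ValueError(f"invalid systemd-run mode: {mode!r}")
```

This only checks that systemd-run is available, not that it will succeed: if the failures above turn out to be kernel or cgroup configuration related, a fallback after a failed attempt would still be needed.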

Links:


Sun, 21 May 2017 18:49:34 GMT - Antoine Martin:

Socket activation has been added (partial), see #1521. Minor improvements to the system-wide proxy server in r15899, r15897, r15894. Preparatory work in r15901, r15902, r15903, r15904. Merged hidden "request-start" subcommand in r15906, ie:

xpra request-start --start=xterm :100

Will connect to the system-wide proxy server and make it start this session.


There are two ways of changing uid:

Issues:


Mon, 22 May 2017 05:44:39 GMT - Antoine Martin: attachment set

ask the proxy server to call pam_open on our behalf (ends up moving the proxy server process into the new session scope, not what we want..)


Mon, 22 May 2017 09:28:17 GMT - Antoine Martin:

Mostly working as of r15907 + r15908 + r15909 via the new "request-start" subcommand, using "peercred" auth (#1524). The xpra server process is started as root by the system proxy instance, it does the pam registration before changing uid, and updates the DISPLAY attribute once we have it. We end up with a new session scope hanging off the user's slice:

Control group /:
-.slice
├─user.slice
│ ├─user-1000.slice
│ │ └─session-c32.scope
│ │   ├─31069 /bin/python /usr/bin/xpra start :100 --csc-modules=all ...
│ │   ├─31071 /usr/libexec/Xorg-for-Xpra-:100 -noreset -novtswitch -nolisten tcp ...
│ │   ├─31090 /usr/bin/dbus-daemon --syslog-only --fork --print-pid 5 --print-address 7 --session
│ │   ├─31318 pulseaudio --start -n --daemonize=false --system=false --exit-idle-time=-1 ...
│ │   ├─31457 xterm
│ │   ├─31459 bash
│ │   ├─31761 /usr/libexec/gvfsd
│ │   └─31767 /usr/libexec/gvfsd-fuse /home/antoine/.gvfs -f -o big_writes
...

And this is also shown as a session, without a seat or controlling TTY:

[antoine@desktop ~]$ loginctl list-sessions
   SESSION        UID USER             SEAT             TTY
        c3         42 gdm              seat0            /dev/tty1
       c32       1000 antoine
        18       1000 antoine          seat0            /dev/tty2

Exiting the xpra server terminates the whole session and all the processes get killed reliably. Sessions started via ssh survive the logout too.
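The ordering matters here: the pam registration happens while still root, and only then is the uid changed. A minimal sketch of the privilege drop itself, assuming a `pwd.struct_passwd`-like entry and a hypothetical `drop_privileges` helper; the `ops` parameter only exists so the call order can be checked without root. Changing to the user's home directory at the end is an assumption of this sketch:

```python
import os


def drop_privileges(pw, ops=os):
    """Hypothetical helper: switch a root process to the user described by
    `pw` (a pwd.struct_passwd or anything with the same attributes).
    `ops` defaults to the os module and can be swapped out for testing."""
    # groups first: once setuid() has run we can no longer change them
    ops.initgroups(pw.pw_name, pw.pw_gid)  # supplementary groups
    ops.setgid(pw.pw_gid)
    ops.setuid(pw.pw_uid)                  # irreversible for a root process
    # assumption: start from the user's home, not whatever cwd root had
    ops.chdir(pw.pw_dir)
    return pw.pw_uid, pw.pw_gid
```

Usage would be along the lines of `drop_privileges(pwd.getpwnam("antoine"))`, called in the forked child after pam_open_session() has returned.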

Still TODO:

Still as per comment:15 :

Some good documentation on control groups: LWN: Control groups series by Neil Brown


Wed, 24 May 2017 11:27:13 GMT - Antoine Martin:

Debian packaging of the systemd service: #1530


Mon, 05 Jun 2017 14:16:36 GMT - Antoine Martin:

Updates:

Still TODO:


Tue, 20 Jun 2017 17:02:02 GMT - Antoine Martin:

Updates:

Tested OK on Fedora 26 and CentOS 7.x


Tue, 20 Jun 2017 21:53:12 GMT - Antoine Martin:

Audit of all chown, chmod and mkdir calls (see r16108):
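Once the server does part of its setup as root, every directory it creates on behalf of the user needs the right owner and mode, and must not follow a symlink planted by someone else. A minimal sketch with a hypothetical `make_user_dir` helper: it sets the mode explicitly (mkdir's mode is filtered by the umask) and changes ownership through a descriptor opened with O_NOFOLLOW, which avoids racing between the check and the chown:

```python
import os


def make_user_dir(path, uid, gid, mode=0o700):
    """Hypothetical helper: create `path` (if needed) as a directory owned by
    uid:gid with exactly `mode`, without ever following a symlink."""
    try:
        os.mkdir(path, mode)
    except FileExistsError:
        pass  # reuse it, but still verify and fix it below
    # O_NOFOLLOW fails on a symlink (ELOOP), O_DIRECTORY fails on anything
    # that is not a directory (ENOTDIR): both raise OSError
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
    try:
        # work on the descriptor so the path cannot be swapped in between
        os.fchown(fd, uid, gid)
        # mkdir's mode was filtered by the umask, so set it explicitly
        os.fchmod(fd, mode)
    finally:
        os.close(fd)
    return path
```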

Last remaining issue: daemon=yes from r16108 seems to cause problems. The process tree is killed. Ouch.


Wed, 21 Jun 2017 12:55:31 GMT - Antoine Martin:

Updates:


The problem referred to in comment:20 is actually a systemd problem. We correctly ask logind to create a new scope by calling pam open, but somehow things get messed up and systemd spews:

systemd-logind[1098]: Failed to start session scope session-3.scope: Unit session-3.scope already exists.
python[30562]: pam_systemd(xpra:session): Failed to create session: File exists

I've also seen this variant:

systemd-logind[1098]: Failed to start session scope session-3.scope: Device or resource busy
pam_systemd(xpra:session): Failed to create session: Device or resource busy

(maybe after trying to cleanup the stale session file in /run/systemd/transient/?)


The problem is that the pam call returns success, but systemd does a quick session start followed by a shutdown. Full log:

Jun 21 14:32:14 systemd[1]: Created slice User Slice of guest.
Jun 21 14:32:14 systemd[1]: Starting User Manager for UID 1001...
Jun 21 14:32:14 systemd-logind[1098]: Failed to start session scope session-3.scope: Device or resource busy
Jun 21 14:32:14 audit[20751]: USER_START pid=20751 uid=0 auid=1000 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_localuser acct="guest" exe="/usr/bin/python2.7" hostname=localhost addr=? terminal=pts/7 res=success'
Jun 21 14:32:14 python[20751]: pam_systemd(xpra:session): Failed to create session: Device or resource busy
Jun 21 14:32:14 kernel: audit: type=1105 audit(1498048334.701:1132): pid=20751 uid=0 auid=1000 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_localuser acct="guest" exe="/usr/bin/python2.7" hostname=localhost addr=? terminal=pts/7 res=success'
Jun 21 14:32:14 audit[20753]: USER_ACCT pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=1101 audit(1498048334.711:1133): pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 audit[20753]: USER_ROLE_CHANGE pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='pam: default-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 selected-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=2300 audit(1498048334.770:1134): pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='pam: default-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 selected-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=1006 audit(1498048334.770:1135): pid=20753 uid=0 subj=system_u:system_r:init_t:s0 old-auid=4294967295 auid=1001 tty=(none) old-ses=4294967295 ses=12 res=1
Jun 21 14:32:14 systemd[20753]: pam_unix(systemd-user:session): session opened for user guest by (uid=0)
Jun 21 14:32:14 kernel: audit: type=1105 audit(1498048334.771:1136): pid=20753 uid=0 auid=1001 ses=12 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 audit[20753]: USER_START pid=20753 uid=0 auid=1001 ses=12 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 systemd[20753]: Reached target Paths.
Jun 21 14:32:14 systemd[20753]: Starting D-Bus User Message Bus Socket.
Jun 21 14:32:14 systemd[20753]: Reached target Timers.
Jun 21 14:32:14 systemd[20753]: Listening on D-Bus User Message Bus Socket.
Jun 21 14:32:14 systemd[20753]: Reached target Sockets.
Jun 21 14:32:14 systemd[20753]: Reached target Basic System.
Jun 21 14:32:14 systemd[20753]: Reached target Default.
Jun 21 14:32:14 systemd[20753]: Startup finished in 35ms.
Jun 21 14:32:14 systemd[1]: Started User Manager for UID 1001.
Jun 21 14:32:14 kernel: audit: type=1130 audit(1498048334.816:1137): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 systemd[1]: Stopping User Manager for UID 1001...
Jun 21 14:32:14 systemd[20753]: Stopped target Default.
Jun 21 14:32:14 systemd[20753]: Stopped target Basic System.
Jun 21 14:32:14 systemd[20753]: Stopped target Timers.
Jun 21 14:32:14 systemd[20753]: Stopped target Paths.
Jun 21 14:32:14 systemd[20753]: Stopped target Sockets.
Jun 21 14:32:14 systemd[20753]: Closed D-Bus User Message Bus Socket.
Jun 21 14:32:14 systemd[20753]: Reached target Shutdown.
Jun 21 14:32:14 systemd[20753]: Starting Exit the Session...
Jun 21 14:32:14 systemd[20753]: Received SIGRTMIN+24 from PID 20780 (kill).
Jun 21 14:32:14 systemd[20772]: pam_unix(systemd-user:session): session closed for user guest
Jun 21 14:32:14 systemd[1]: Stopped User Manager for UID 1001.
Jun 21 14:32:14 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=1131 audit(1498048334.829:1138): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 systemd[1]: Removed slice User Slice of guest.

If we can't rely on logind to create a session... we have a serious problem.


Sat, 24 Jun 2017 22:53:10 GMT - Antoine Martin:

Minor fixes in r16129.

The problem from comment:21 leaves the session running but inaccessible, since the /run/user/$UID directory gets nuked. The session can still be accessed through its socket in /run/xpra if the user is a member of the "xpra" group, but the other sockets and the log files are lost. I will try to write a more easily reproducible test case for reporting / asking upstream: daemonize, pam open, (start vfb?), create sockets, redirect stdout+stderr, etc. (run as root)


Sun, 25 Jun 2017 21:07:27 GMT - Antoine Martin:

It turns out that the problem is not with the code or the pam module (although pam failures to call logind are not returned as errors): simply using a different service name (i.e. "login" instead of "xpra") fixes everything. So r16132 uses a more complete pam configuration file, and the test now works... but the server still does not. Sigh.
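Since pam_open_session() can report success even when logind refused to create the session, the return code alone is not enough. One possible check, assuming that pam_systemd only exports XDG_SESSION_ID into the PAM environment once logind has accepted the session (pam_systemd(8) lists it among the variables it sets): inspect the PAM environment, e.g. as returned by pam_getenvlist(3) through whichever binding is in use. The helper names are hypothetical:

```python
def parse_pam_envlist(entries):
    """Turn pam_getenvlist()-style "NAME=value" strings into a dict."""
    env = {}
    for entry in entries:
        name, sep, value = entry.partition("=")
        if sep:
            env[name] = value
    return env


def logind_session_id(pam_env):
    """Return the logind session id exported by pam_systemd, or raise if
    pam_open_session() 'succeeded' without logind registering a session."""
    session_id = pam_env.get("XDG_SESSION_ID")
    if not session_id:
        # PAM returned success, but pam_systemd failed to create the session
        raise RuntimeError("pam session opened but logind did not register it")
    return session_id
```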


Mon, 26 Jun 2017 19:37:05 GMT - Antoine Martin: owner, status changed

Finally all fixed (I think - for real, this time) in r16134: the final piece was that we must keep the pam file descriptor open when redirecting stdout / stderr to the log file.

@smo: FYI, feel free to close. Sessions should be started via the system proxy on systems that have it enabled (or socket activated), which means they will survive KillUserProcesses=yes.

(commit at 30000 feet - woot!)
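Why the descriptor matters: presumably, pam_systemd keeps open a FIFO descriptor that logind hands back from CreateSession, and logind treats the session as finished once that FIFO is closed, so a blanket "close every fd" while daemonizing ends the session. A minimal sketch of redirecting output while sparing some descriptors; which descriptor to keep is left to the caller, and the helper names are hypothetical (Linux only, since it lists /proc/self/fd):

```python
import os
import sys


def open_fds():
    """List this process's open descriptors (Linux: /proc/self/fd)."""
    # os.listdir briefly opens one more descriptor, which is already closed
    # again by the time the list is returned
    return [int(name) for name in os.listdir("/proc/self/fd")]


def fds_to_close(fds, keep):
    """Descriptors to close when detaching: everything except stdio (0-2)
    and the descriptors in `keep`."""
    return sorted(fd for fd in fds if fd > 2 and fd not in keep)


def redirect_output(log_path, keep=()):
    """Hypothetical helper: send stdout/stderr to log_path, then close every
    other descriptor except those in `keep` (e.g. one a PAM module holds)."""
    sys.stdout.flush()
    sys.stderr.flush()
    log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    os.dup2(log_fd, 1)
    os.dup2(log_fd, 2)
    for fd in fds_to_close(open_fds(), set(keep)):
        try:
            os.close(fd)  # also closes log_fd, which is no longer needed
        except OSError:
            pass  # e.g. the already-closed descriptor used by os.listdir
    return log_path
```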


Wed, 28 Jun 2017 09:14:46 GMT - Antoine Martin: attachment set

run cleanups with a priority value so we could run pam.close last, but this cannot be used because we are no longer root and dbus sends the uid.
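The ordering part of that idea is simple enough on its own; what defeats it is that by then the process is no longer root. A minimal sketch of priority-ordered cleanups (lower values run first, ties run in registration order, and failures are collected so one bad callback does not skip the rest); the class name and priority values are hypothetical:

```python
import heapq
import itertools


class CleanupRegistry:
    """Hypothetical: run cleanup callbacks in ascending priority order.
    Lower values run first; equal priorities run in registration order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker, so callbacks are never compared

    def add(self, callback, priority=0):
        heapq.heappush(self._heap, (priority, next(self._seq), callback))

    def run(self):
        """Run every callback once; return the exceptions that were raised."""
        errors = []
        while self._heap:
            _priority, _seq, callback = heapq.heappop(self._heap)
            try:
                callback()
            except Exception as e:  # keep going: later cleanups still matter
                errors.append(e)
        return errors
```

With pam.close registered at a high priority value, it would run after the sockets and the vfb have been torn down.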


Tue, 04 Jul 2017 11:38:16 GMT - Antoine Martin:

Some related changes:


Mon, 17 Jul 2017 07:04:13 GMT - Antoine Martin:

One minor bug: #1582, need to continue to honour user preferences


Mon, 17 Jul 2017 11:04:59 GMT - Antoine Martin:

Another fix for the ticket that keeps on giving: r16391 (chdir so the cwd is what we expect)


Thu, 20 Jul 2017 07:17:26 GMT - Antoine Martin: status changed; resolution set

crickets - works for me, also tested on Debian: #1530


Tue, 25 Jul 2017 11:15:05 GMT - Antoine Martin:

Important fix in r16502, we really need #1535 to be able to simplify this awful code.


Thu, 27 Jul 2017 05:55:48 GMT - Antoine Martin:

Likely to have caused a regression due to missing environment variables: #1602.


Tue, 11 Feb 2020 04:31:22 GMT - Antoine Martin:

See also #2042, #2585, #1536.


Sat, 23 Jan 2021 05:15:06 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1105