See also #1129.
SetIdleHint
)
The seat with vt number may be challenging since we don't have a vt number...
Enlightening thread: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ZNQW72UP36UAFMX53HPFFQTWTQDZVJ3M/: systemd-logind will now by default terminate user processes that are part of the user session scope unit. PITA for us.
Debian ticket: systemd kill background processes after user logs out
https://www.freedesktop.org/software/systemd/man/systemd-run.html#Examples, see Start screen as a user service.
PAM support in tmux uses pam_start + pam_open_session - would this be enough?
r12722 (+fixes in r12756 + r12761 + r12753 for osx) adds an "xpra" pam service so we can call pam_open_session early (before daemonizing) when starting a server.
The pam_systemd module should also ensure that the directories are present for #1129.
We'll see if this is enough to prevent us from getting killed.
See also: Is linux-PAM session same as linux process session?: The short answer is no, they're different things, but processes that handle login sessions should handle both of them. We're not a login session per-se, but as close as can be.
systemd-devel: The whole su/pkexec session debate: This way, screen will keep an "active" reference to the session and systemd-logind will not mark it as "closing". So the session that was initiated by sshd will be kept open by "screen". Note that pam_open_session() without pam_authenticate() will *not* create a new session but only attach to the current session.
Wait, as per https://lists.freedesktop.org/archives/systemd-devel/2013-December/014996.html: The session is still marked as "closing" but because processes still exist it never quite dies. And yes, the kill processes option (which is a nice thing to enable if possible) would indeed kill the screen.
@jonathan.underwood: How on earth are we supposed to fix this thing? We don't want or need root, just tell logind to move the process into its own session.
Well, I am no expert here :) But this is a somewhat hot topic at the moment. I very much think xpra is in the same boat as Screen and tmux. In case you missed it, this is a nice summary of why it's a hot topic:
http://lwn.net/Articles/689732/
The best thing xpra could do, I think, is start in a new process tree. What the right mechanism for that is remains unclear - I expect you don't want to do the dbus dance to talk to the systemd daemon to create a new session and control group (which would be the systemd maintainers' preferred route).
Something along the lines of this comment might be one way to go:
http://lwn.net/Articles/690795/
This also makes for interesting reading:
https://github.com/tmux/tmux/issues/428
ps. Sorry for the late reply and lack of packaging activity in recent weeks - have changed jobs. I should be getting back to packaging now though.
Actually, probably the "right" way to go on systems using systemd is to use systemd-run to launch xpra:
https://www.freedesktop.org/software/systemd/man/systemd-run.html
the "right" way to go on systems using systemd is to use systemd-run to launch xpra
Users shouldn't really need to care about this low-level plumbing, so when they issue an "xpra start", they expect it to survive their current session (be it an ssh session, or even a full desktop environment). That's especially true of ssh sessions started with "xpra start ssh:HOST --start=xterm".
So we would need to do this from "xpra start ...":
check for KillUserProcesses=yes and skip the workaround if it isn't needed?
run systemd-run --scope --user xpra _start $@ (and make "xpra _start" the same as before)
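A minimal sketch of the wrapping step proposed here, assuming the server command is available as an argument list. The helper name `wrap_with_systemd_run` and its `systemd_run_args` parameter are hypothetical and are not taken from the xpra source; only the `systemd-run --scope --user` prefix comes from the thread.

```python
import shlex


def wrap_with_systemd_run(server_command, systemd_run_args=""):
    """Prefix a server command with 'systemd-run --scope --user'.

    server_command: the argv list to run, e.g. ["xpra", "_start", ...]
    systemd_run_args: optional extra systemd-run options as one string,
                      split with shell-like quoting rules.
    Hypothetical helper: it only builds the argv, it does not run anything.
    """
    cmd = ["systemd-run", "--scope", "--user"]
    cmd += shlex.split(systemd_run_args)
    cmd += list(server_command)
    return cmd
```

The resulting list could then be handed to `subprocess.Popen` or `os.execvp`. Building the argv separately keeps the decision (wrap or not) apart from the actual process launch.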
I tried to test this using a guest account:
KillUserProcesses=yes
loginctl disable-linger
ssh guest@localhost
And the xpra server survived... Fedora 24 all up to date. What am I missing? @jonathan.underwood: see also ticket:1129#comment:21
wrap xpra server command with systemd-run automatically
Actually, probably the "right" way to go on systems using systemd is to use systemd-run to launch xpra:
As of r13378, we now run server commands via systemd-run:
$ xpra start --start=xterm --no-daemon --systemd-run-args="-p MemoryAccounting=true -p MemoryLimit=64M"
using systemd-run to wrap 'start' server command
'systemd-run' '--scope' '--user' '-p' 'MemoryAccounting=true' '-p' 'MemoryLimit=64M' '/usr/bin/xpra' \
 'start' '--start=xterm' '--systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M' '--daemon=no'
Running scope as unit run-rd905fbd12caf4ec8b400030991401a14.scope.
(...)
● run-rd905fbd12caf4ec8b400030991401a14.scope - /usr/bin/xpra start --start=xterm --systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M --daemo
   Loaded: loaded
Transient: yes
  Drop-In: /run/user/1000/systemd/user/run-rd905fbd12caf4ec8b400030991401a14.scope.d
           └─50-Description.conf, 50-MemoryAccounting.conf, 50-MemoryLimit.conf
   Active: active (running) since Wed 2016-08-17 16:09:09 ICT; 51s ago
   CGroup: /user.slice/user-1000.slice/user@1000.service/run-rd905fbd12caf4ec8b400030991401a14.scope
           ├─25491 /bin/python /usr/bin/xpra start --start=xterm --systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M --daemon=no
           ├─25502 /usr/libexec/Xorg -noreset -nolisten tcp +extension GLX +extension RANDR +extension RENDER -auth /run/user/1000/gdm/Xauthority -logfi
           ├─25509 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
           ├─25639 xterm
           └─25641 bash

Aug 17 16:09:09 desktop systemd[1417]: Started /usr/bin/xpra start --start=xterm --systemd-run-args=-p MemoryAccounting=true -p MemoryLimit=64M --daemon
Aug 17 16:09:09 desktop python[25491]: pam_systemd(xpra:session): pam-systemd initializing
Aug 17 16:09:09 desktop python[25491]: pam_systemd(xpra:session): Asking logind to create session: uid=1000 pid=25491 service=xpra type=x11 class=user d
Aug 17 16:09:09 desktop python[25491]: pam_systemd(xpra:session): Failed to create session: Access denied
So we end up with a cgroup for the session, but there are problems:
$ systemd-cgls
Control group /:
-.slice
├─init.scope
(...)
└─user.slice
  └─user-1000.slice
    ├─session-1.scope
    └─user@1000.service
      (...)
      ├─run-r450625a9be2343f0bfb2034b01db64ee.scope
      │ ├─13164 /bin/python /usr/bin/xpra start --start=xterm --bind-tcp=0.0....
      │ ├─13180 /usr/bin/dbus-daemon --fork --print-pid 5 --print-address 7 -...
      │ ├─13306 xterm
      │ └─13308 bash
CreateSession via dbus (with r13379):
python: pam_systemd(xpra:session): Asking logind to create session: uid=1000 pid=11564 service=xpra type=x11 class=user desktop=xpra seat= vtnr=0 tty= display=#001 remote=no remote_user= remote_host=
python: pam_systemd(xpra:session): Failed to create session: Access denied
(re-tested after the r13505 pam fix for xauth data)
See also #1335
r14062 disables pam_open for now because it causes the service (#1335) to run in a user-0 slice instead of the system slice.
Instead of ensuring that the session survives, this seems to have the exact opposite effect (and worse - requiring a reboot to properly clear things), details in #1348.
I've tested both with KillUserProcesses=no and KillUserProcesses=yes with the same result.
xpra does get killed unceremoniously but worst of all this seems to have an effect on ssh making the next login attempt take forever. (looks similar to systemd issue 2863)
I've asked for help on the PAM session hooks for independent session
Alternatively, we could expand the proxy server to start new sessions on behalf of other users. The proxy server runs as root and should have sufficient privileges to invoke logind's CreateSession. Downsides: we don't currently require the proxy server to be running and this may slow down session startup.
start polkit automatically (requires session management)
The answer from the systemd mailing list is that we do need a suid binary to do the registration: https://lists.freedesktop.org/archives/systemd-devel/2016-November/037700.html.
Too late to start messing with the suid / socket activation approaches now.
Some related changes:
r15810 added uid and gid support when running as root (added benefits: can listen to ports below 1024 without running as root or using iptables)
So theoretically we could ask the root proxy server to start sessions for us and do the pam / logind registration. (that bit seems to work?)
The permissions could be restricted using regular authentication or even SO_PEERCRED / SCM_CREDENTIALS (probably the former).
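To illustrate the SO_PEERCRED option mentioned above, here is a minimal Linux-only sketch that reads the peer's pid, uid and gid from a connected unix domain socket. The function names `get_peer_credentials` and `is_peer_allowed` are hypothetical, and this is not the "peercred" module that xpra ships; it only shows the underlying socket option.

```python
import socket
import struct

# struct ucred on Linux: pid_t (signed int), uid_t and gid_t (unsigned int)
UCRED_FORMAT = "iII"


def get_peer_credentials(sock):
    """Return (pid, uid, gid) of the process at the other end of a
    connected AF_UNIX socket. Linux only: relies on SO_PEERCRED."""
    data = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                           struct.calcsize(UCRED_FORMAT))
    return struct.unpack(UCRED_FORMAT, data)


def is_peer_allowed(sock, allowed_uids):
    """Hypothetical policy check: accept the request only if the peer uid
    is in the allowed set."""
    _pid, uid, _gid = get_peer_credentials(sock)
    return uid in allowed_uids
```

Because the kernel fills in these values, the client cannot forge them, which is what would let a root proxy decide on whose behalf it starts a session without a password exchange.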
So far so good.
But then I found:
KillUserProcesses in logind.conf is broken in Fedora 26? xpra survives out of the box - at least for now:
$ sudo loginctl disable-linger guest
$ loginctl show-user | grep Kill
KillUserProcesses=yes
$ sudo loginctl show-user guest | grep Linger
Linger=no
$ loginctl user-status
guest (1001)
           Since: Thu 2017-05-18 22:23:32 +07; 25min ago
           State: active
        Sessions: *10
          Linger: no
            Unit: user-1001.slice
                  ├─session-10.scope
                  │ ├─3122 sshd: guest [priv]
                  │ ├─3174 sshd: guest@pts/1
                  │ ├─3196 -bash
                  │ ├─4244 loginctl user-status
                  │ └─4245 less
                  ├─session-4.scope
                  │ ├─1658 /usr/bin/python2 /usr/bin/xpra --bind-tcp=0.0.0.0:10000 start :10 --start=xterm --systemd-ru
                  │ ├─1659 /usr/libexec/Xorg-for-Xpra-:10 -noreset -novtswitch -nolisten tcp +extension GLX +extension
                  │ ├─1686 /usr/bin/dbus-daemon --syslog-only --fork --print-pid 5 --print-address 7 --session
                  │ ├─1932 pulseaudio --start -n --daemonize=false --system=false --exit-idle-time=-1 --load=module-sus
                  │ ├─2134 /usr/libexec/gvfsd
                  │ ├─2162 xterm
                  │ └─2164 bash
                  └─user@1001.service
                    └─init.scope
                      ├─3127 /usr/lib/systemd/systemd --user
                      └─3155 (sd-pam)
Despite the documentation (https://www.freedesktop.org/software/systemd/man/logind.conf.html) stating that: Note that setting KillUserProcesses=yes will break tools like screen(1) and tmux(1), unless they are moved out of the session scope. See example in systemd-run(1). - EDIT: seems to work on another system...
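A small sketch of how the `KEY=value` lines printed by `loginctl show-user` (as in the output quoted in this comment) could be checked from code rather than with grep. The helper names are hypothetical, and the parser only assumes one `KEY=value` pair per line.

```python
def parse_properties(text):
    """Parse 'KEY=value' lines (the format printed by 'loginctl show-user')
    into a dict. Lines without '=' are ignored."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            props[key] = value
    return props


def kill_user_processes_enabled(props):
    """True if the parsed properties report KillUserProcesses=yes."""
    return props.get("KillUserProcesses") == "yes"
```

The text could come from something like `subprocess.check_output(["loginctl", "show-user"], text=True)`. As this comment shows, a "yes" value did not by itself predict whether the server was killed on the test system, so treat the result as a hint only.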
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Resource_Management_Guide/chap-Using_Control_Groups.html) systemd-run either needs to be disabled by default (only on the distributions affected? could be kernel configuration related...), or changed to "auto" so we can check before trying (or even fallback after failing?)
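One way to read the "auto" idea above is a three-state option that only enables the wrapper when a cheap probe command succeeds. This is a sketch under that assumption; `resolve_systemd_run` and `probe_command` are hypothetical names, and the choice of probe command is left to the caller.

```python
import subprocess


def probe_command(argv, timeout=5):
    """Run argv quietly and report whether it exited with status 0.
    A missing executable or a timeout counts as failure."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return proc.returncode == 0


def resolve_systemd_run(option, probe):
    """Map a 'yes' / 'no' / 'auto' setting to a boolean.
    probe is a zero-argument callable, only invoked for 'auto'."""
    if option == "yes":
        return True
    if option == "no":
        return False
    if option == "auto":
        return bool(probe())
    raise ValueError("invalid option: %r" % (option,))
```

A caller might pass something like `lambda: probe_command(["systemd-run", "--scope", "--user", "true"])` as the probe; that particular probe is an assumption and has not been checked against the distributions discussed here.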
Links:
Socket activation has been added (partial), see #1521. Minor improvements to the system-wide proxy server in r15899, r15897, r15894. Preparatory work in r15901, r15902, r15903, r15904. Merged hidden "request-start" subcommand in r15906, ie:
xpra request-start --start=xterm :100
Will connect to the system-wide proxy server and make it start this session.
There are two ways of changing uid:
user-0.slice / session-cNNN.scope - maybe we should use systemd-run without uid here?
system.slice / run-$UUID.scope. (not sure we could ask the system proxy to do it on our behalf)
Issues:
['/usr/bin/xpra', 'start', '--csc-modules=all', '--packet-encoders=rencode, bencode, yaml', '--video-decoders=all', '--encodings=all', '--compressors=lz4, lzo, zlib', '--start=xterm', '--video-encoders=all', '--env=XPRA_PROXY_START_UUID=3f2cc30518ea4e2498cd85c68c87f3ae', '--systemd-run=no', '--uid=1000', '--gid=1000']
(most of the values are equivalent to the defaults)
XDG_RUNTIME_DIR problems... again (#1129), ie: error: XDG_RUNTIME_DIR not set in the environment.
xauth: unable to generate an authority file name, ie: Error running "xauth add :2 MIT-MAGIC-COOKIE-1 8194181c038c4086bcb206ee7610e98d": non-zero exit code: 1
ask the proxy server to call pam_open on our behalf (ends up moving the proxy server process into the new session scope, not what we want..)
Mostly working as of r15907 + r15908 + r15909 via the new "request-start" subcommand, using "peercred" auth (#1524). The xpra server process is started as root by the system proxy instance, it does the pam registration before changing uid, and updates the DISPLAY attribute once we have it. We end up with a new session scope hanging off the user's slice:
Control group /:
-.slice
├─user.slice
│ ├─user-1000.slice
│ │ └─session-c32.scope
│ │   ├─31069 /bin/python /usr/bin/xpra start :100 --csc-modules=all ...
│ │   ├─31071 /usr/libexec/Xorg-for-Xpra-:100 -noreset -novtswitch -nolisten tcp ...
│ │   ├─31090 /usr/bin/dbus-daemon --syslog-only --fork --print-pid 5 --print-address 7 --session
│ │   ├─31318 pulseaudio --start -n --daemonize=false --system=false --exit-idle-time=-1 ...
│ │   ├─31457 xterm
│ │   ├─31459 bash
│ │   ├─31761 /usr/libexec/gvfsd
│ │   └─31767 /usr/libexec/gvfsd-fuse /home/antoine/.gvfs -f -o big_writes
...
And this is also shown as a session, without a seat or controlling TTY:
[antoine@desktop ~]$ loginctl list-sessions
   SESSION        UID USER             SEAT             TTY
        c3         42 gdm              seat0            /dev/tty1
       c32       1000 antoine
        18       1000 antoine          seat0            /dev/tty2
Exiting the xpra server terminates the whole session and all the processes get killed reliably. Sessions started via ssh survive the logout too.
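The ordering described in this update (do the pam registration while still root, then change uid) can be sketched as follows. `switch_user` and its `while_privileged` callback are hypothetical; the callback stands in for the pam / logind registration, which is not implemented here, and supplementary groups are deliberately left out.

```python
import os


def switch_user(uid, gid, while_privileged=None):
    """Run an optional callback with the current (privileged) identity,
    then change gid and uid, in that order.

    while_privileged: hypothetical hook for work that needs root, such as
    the pam session registration mentioned in the thread.
    """
    if while_privileged is not None:
        while_privileged()
    # Group first: once setuid() has dropped root, setgid() to another
    # group would no longer be permitted.
    os.setgid(gid)
    os.setuid(uid)
```

Changing the gid before the uid matters because the reverse order fails for any target group the unprivileged user could not select on its own.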
Still TODO:
XDG_RUNTIME_DIR is not set? (try over TCP with a different auth module, instead of peercred + ssh)
Still as per comment:15 :
Some good documentation on control groups: LWN: Control groups series by Neil Brown
Debian packaging of the systemd service: #1530
Updates:
Still TODO:
xpra start --start=xterm --attach=yes + SIGINT: client exits with an error
XDG_RUNTIME_DIR presence (we need it before the pam registration? used for log files, etc) - see #1537 / #1129
Updates:
Tested OK on Fedora 26 and centos 7.x
Audit of all chown, chmod and mkdir calls (see r16108):
create_runtime_dir: we may mkdir the user's XDG_RUNTIME_DIR as 0o700 with the uid and gid of that user. The actual path is fixed and contains the uid itself: [/var]/run/user/$UID, so this is safe
write_pidfile uses the path specified (not used by default) - so someone able to modify the configuration used by root could cause more damage already, and I think we do want to save the file before we change uid (saving after would be safer but we could also fail to save it)
write_runner_shell_scripts no longer does any fchown or chmod - we never call it as non-root
find_log_dir may create some directories as 0o700 and chown them - this could be made tighter? (nothing should be needed when running as root since XDG_RUNTIME_DIR should exist already)
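For reference, the runtime directory case in the audit above amounts to something like the following sketch. The function names are hypothetical and this is not the code from r16108; the point is only that the path is derived from the numeric uid and that the directory is created private and then handed over to that user.

```python
import os


def runtime_dir_for(uid, base="/run/user"):
    """Build the per-user runtime path from the numeric uid only, so a
    caller cannot steer it to an arbitrary location."""
    return os.path.join(base, str(int(uid)))


def make_private_dir(path, uid, gid):
    """Create path with mode 0o700 and give it to uid:gid.
    Raises FileExistsError if the path already exists."""
    os.mkdir(path, 0o700)
    # mkdir() applies the umask, so set the mode explicitly as well
    os.chmod(path, 0o700)
    os.chown(path, uid, gid)
```

Refusing to reuse an existing path (the `FileExistsError`) is a design choice in this sketch: when running as root, chown on a pre-existing directory is where a hostile symlink or a directory owned by someone else would cause trouble.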
Last remaining issue: daemon=yes from r16108 seems to cause problems. The process tree is killed. Ouch.
Updates:
The problem referred to in comment:20 is actually a systemd problem. We correctly ask logind to create a new scope by calling pam open, but somehow things get messed up and systemd spews:
systemd-logind[1098]: Failed to start session scope session-3.scope: Unit session-3.scope already exists.
python[30562]: pam_systemd(xpra:session): Failed to create session: File exists
I've also seen this variant:
systemd-logind[1098]: Failed to start session scope session-3.scope: Device or resource busy
pam_systemd(xpra:session): Failed to create session: Device or resource busy
(maybe after trying to cleanup the stale session file in /run/systemd/transient/ ?)
Problem is that the pam call returns success... but systemd does a quick session start followed by a shutdown, full log:
Jun 21 14:32:14 systemd[1]: Created slice User Slice of guest.
Jun 21 14:32:14 systemd[1]: Starting User Manager for UID 1001...
Jun 21 14:32:14 systemd-logind[1098]: Failed to start session scope session-3.scope: Device or resource busy
Jun 21 14:32:14 audit[20751]: USER_START pid=20751 uid=0 auid=1000 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_localuser acct="guest" exe="/usr/bin/python2.7" hostname=localhost addr=? terminal=pts/7 res=success'
Jun 21 14:32:14 python[20751]: pam_systemd(xpra:session): Failed to create session: Device or resource busy
Jun 21 14:32:14 kernel: audit: type=1105 audit(1498048334.701:1132): pid=20751 uid=0 auid=1000 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_localuser acct="guest" exe="/usr/bin/python2.7" hostname=localhost addr=? terminal=pts/7 res=success'
Jun 21 14:32:14 audit[20753]: USER_ACCT pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=1101 audit(1498048334.711:1133): pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 audit[20753]: USER_ROLE_CHANGE pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='pam: default-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 selected-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=2300 audit(1498048334.770:1134): pid=20753 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='pam: default-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 selected-context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=1006 audit(1498048334.770:1135): pid=20753 uid=0 subj=system_u:system_r:init_t:s0 old-auid=4294967295 auid=1001 tty=(none) old-ses=4294967295 ses=12 res=1
Jun 21 14:32:14 systemd[20753]: pam_unix(systemd-user:session): session opened for user guest by (uid=0)
Jun 21 14:32:14 kernel: audit: type=1105 audit(1498048334.771:1136): pid=20753 uid=0 auid=1001 ses=12 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 audit[20753]: USER_START pid=20753 uid=0 auid=1001 ses=12 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="guest" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 systemd[20753]: Reached target Paths.
Jun 21 14:32:14 systemd[20753]: Starting D-Bus User Message Bus Socket.
Jun 21 14:32:14 systemd[20753]: Reached target Timers.
Jun 21 14:32:14 systemd[20753]: Listening on D-Bus User Message Bus Socket.
Jun 21 14:32:14 systemd[20753]: Reached target Sockets.
Jun 21 14:32:14 systemd[20753]: Reached target Basic System.
Jun 21 14:32:14 systemd[20753]: Reached target Default.
Jun 21 14:32:14 systemd[20753]: Startup finished in 35ms.
Jun 21 14:32:14 systemd[1]: Started User Manager for UID 1001.
Jun 21 14:32:14 kernel: audit: type=1130 audit(1498048334.816:1137): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 systemd[1]: Stopping User Manager for UID 1001...
Jun 21 14:32:14 systemd[20753]: Stopped target Default.
Jun 21 14:32:14 systemd[20753]: Stopped target Basic System.
Jun 21 14:32:14 systemd[20753]: Stopped target Timers.
Jun 21 14:32:14 systemd[20753]: Stopped target Paths.
Jun 21 14:32:14 systemd[20753]: Stopped target Sockets.
Jun 21 14:32:14 systemd[20753]: Closed D-Bus User Message Bus Socket.
Jun 21 14:32:14 systemd[20753]: Reached target Shutdown.
Jun 21 14:32:14 systemd[20753]: Starting Exit the Session...
Jun 21 14:32:14 systemd[20753]: Received SIGRTMIN+24 from PID 20780 (kill).
Jun 21 14:32:14 systemd[20772]: pam_unix(systemd-user:session): session closed for user guest
Jun 21 14:32:14 systemd[1]: Stopped User Manager for UID 1001.
Jun 21 14:32:14 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 kernel: audit: type=1131 audit(1498048334.829:1138): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=user@1001 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 21 14:32:14 systemd[1]: Removed slice User Slice of guest.
If we can't rely on logind to create a session... we have a serious problem.
Minor fixes in r16129.
The problem from comment:21 leaves the session running but inaccessible since the /run/user/$UID directory gets nuked. The session can still be accessed if the user is a member of the "xpra" group through its socket in /run/xpra but the other sockets and the log files are lost..
I will try to write a more easily reproducible test case for reporting / asking upstream: daemonize, pam open, (start vfb?), create sockets, redirect stdout+stderr, etc.. (run as root)
It turns out that the problem is not with the code or the pam module (though pam failures to call logind are not returned as errors): simply using a different service name fixes everything (ie: "login" instead of "xpra"). So r16132 uses a more complete pam configuration file and the test now works... but the server still does not. sigh.
Finally all fixed (I think - for real, this time) in r16134: the final piece was that we must keep the pam file descriptor open when redirecting stdout / stderr to the log file.
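The fix described here boils down to redirecting output by duplicating the log file descriptor onto the standard ones, without closing anything else along the way. A minimal sketch, assuming a hypothetical `redirect_output` helper (the actual change in r16134 is not reproduced here):

```python
import os


def redirect_output(log_fd, targets=(1, 2)):
    """Point each file descriptor in targets (stdout and stderr by
    default) at log_fd using dup2().

    Every other descriptor, including one held by a pam session, is left
    untouched. This differs from the common daemonizing idiom of closing
    all descriptors above 2, which is assumed here to be what would
    break the session.
    """
    for fd in targets:
        os.dup2(log_fd, fd)
```

Passing the targets explicitly also makes the helper easy to exercise with throwaway pipes instead of the real stdout and stderr.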
@smo: FYI, feel free to close. Sessions should be started via the system proxy on systems that have it activated (or socket activated), which means they will survive KillUserProcesses=yes.
(commit at 30000 feet - woot!)
run cleanups with a priority value so we could run pam.close last, but this cannot be used because we are no longer root and dbus sends the uid..
Some related changes:
pam_start failure (unlikely)
One minor bug: #1582, need to continue to honour user preferences
Another fix for the ticket that keeps on giving: r16391 (chdir so the cwd is what we expect)
crickets - works for me, also tested on Debian: #1530
Important fix in r16502, we really need #1535 to be able to simplify this awful code.
Likely to have caused a regression due to missing environment variables: #1602.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1105