Xpra: Ticket #426: multiplexing multiple xpra instances through one port

The problem is that in some situations, the servers may well be sitting behind firewalls that only allow outgoing connections to selected ports (usually 80 and 443 for web browsing). One solution to this problem is to use SSH or a VPN server running on one of those ports, but these options have their own problems (key exchange, shell account access, etc.).

r4326 implements a proof of concept proxy server (accessible via "xpra proxy"): the user connects to this server and after authentication (if required - usually/probably should be) the packets are forwarded to the real server. Encryption and compression are only enabled between client and proxy, and not between proxy and server (though we could quite easily add options to change this).

What is still needed to make this functional:

threading/processes: how many concurrent connections can we handle before this becomes the bottleneck? (at the moment, the POC server can only handle a single connection)
handle disconnection from either end gracefully
how do we look up the real server to connect to?
- a shell script could be a little expensive but is more flexible
- a lookup file?

Other things this may be useful for:

this could be used for load balancing
security? (it is easier to isolate the servers from the clients)
we could make the proxy server stateful and let it deal with hardware video encoding: in VM hosted environments the guests do not have easy access to hardware, but the host does. The proxy could tell the server to send all frames as plain rgb+lz4, and then it would just replace the frames with video encoded data. (later, this could be taken one step further and the guests could use a shared memory mechanism to avoid using the virtual network for sending frames to the host)

Sun, 22 Sep 2013 14:10:34 GMT - Antoine Martin:

We probably want to make this *the* default proxy for all sessions on a local system (in TCP mode - SSH has user auth already), and allow system level authentication (via PAM on Linux, per platform auth). It should support "xpra list" too. So we need to specify the username, probably add a "--username=NNNN" switch? (and/or support tcp:username@host:port).

Then there are threading issues: at the moment we start many threads per connection (2 for reading and 2 for writing), which is fine when you typically have just one connection active at a time, but this becomes a problem if we want to proxy for dozens of users. Even more so if the proxy handles picture encoding (one more thread..)

We also need to deal with session discoverability, and this is a good reason for moving the server sockets to /tmp/. Backwards compatibility can be achieved by symlinking to the old location, checking both locations, (and maybe even adding some code to the run-xpra script?)

Then ideally we would want some sort of privilege separation between the code that needs root (socket binding, connecting to xpra sockets in /tmp/) and the code that runs once authenticated (IO and encoding).
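
For illustration only (this is not taken from any of the patches): the tcp:username@host:port form suggested above could be parsed roughly as follows. The names TcpTarget and parse_tcp_target are hypothetical, and the sketch assumes hostnames or IPv4 addresses only.

```python
from typing import NamedTuple, Optional


class TcpTarget(NamedTuple):
    username: Optional[str]
    host: str
    port: int


def parse_tcp_target(display: str) -> TcpTarget:
    """Parse 'tcp:[username@]host:port' (hypothetical helper, not xpra code)."""
    if not display.startswith("tcp:"):
        raise ValueError("not a tcp connection string: %r" % display)
    rest = display[len("tcp:"):]
    username = None
    if "@" in rest:
        # split on the last '@' so that usernames containing '@' still work
        username, rest = rest.rsplit("@", 1)
    # split on the last ':' to separate the port from the host
    host, sep, port = rest.rpartition(":")
    if not sep or not host:
        raise ValueError("expected host:port in %r" % display)
    return TcpTarget(username or None, host, int(port))
```

A string without the username part simply yields username=None, so a separate "--username" switch could still supply it.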

Sun, 29 Sep 2013 15:13:10 GMT - Antoine Martin:

Here's how I think this is going to work using the python multiprocessing module:

proxy runs as root (generally - not strictly required)
add an authentication option: --auth=pam|win32security|ldap|file|script...

(and maybe this can be used for regular servers too?)

when we receive a new connection, we process authentication via one of the auth modules
this auth module checks username/password/[display]? (maybe think about modules that can use a challenge rather than a plain password) and returns: real uid, real server URI(s), [xpra env options], [session options]

the server can then launch a sub-process, passing it the socket connection (as per this example) and let it deal with changing uid, etc
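
A rough sketch of the flow described above, under stated assumptions: AuthResult, DictAuth and handle_new_connection are hypothetical names, the in-memory module merely stands in for pam/win32security/ldap/file/script, and the subprocess launch is reduced to a "spawn" callable so the example stays self-contained.

```python
import hmac
from typing import Dict, List, NamedTuple, Optional


class AuthResult(NamedTuple):
    # the values the auth module is expected to return on success
    uid: int
    server_uris: List[str]
    env: Dict[str, str]
    session_options: Dict[str, str]


class DictAuth:
    """Hypothetical in-memory auth module (stand-in for a real backend)."""

    def __init__(self, users):
        # users maps username -> (password, AuthResult)
        self.users = users

    def authenticate(self, username: str, password: str) -> Optional[AuthResult]:
        entry = self.users.get(username)
        if entry is None:
            return None
        expected, result = entry
        # constant-time comparison of the two passwords
        if not hmac.compare_digest(expected.encode("utf8"), password.encode("utf8")):
            return None
        return result


def handle_new_connection(auth, username, password, conn, spawn) -> bool:
    """Authenticate, then hand the connection over to spawn(conn, result)."""
    result = auth.authenticate(username, password)
    if result is None:
        return False
    # in the design above, this is where the sub-process would be launched
    # and where it would change uid before connecting to the real server
    spawn(conn, result)
    return True
```

Keeping the auth module separate from the hand-over step means the same module could also be used by a regular server, as wondered above.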

Notes:

maybe disallow system auth if we're connecting from a non-encrypted TCP socket?
support "xpra list"
don't want to use sendmsg and completely separate processes
connecting to the real server: can't think of any disadvantages of doing it in the subprocess

Mon, 30 Sep 2013 11:38:58 GMT - Antoine Martin: attachment set

attachment set to auth-v3.patch

splits authentication from server core, adds auth modules and keyfile so password file and encryption keyfile can be different

Wed, 02 Oct 2013 10:29:40 GMT - Antoine Martin: attachment set

attachment set to auth-v5.patch

updated patch (broken multiprocessing support..)

Wed, 02 Oct 2013 10:30:59 GMT - Antoine Martin:

Sigh. As explained here: Caution: python-multiprocessing, threads and glib don't mix

So the v5 patch does not run... as the idle_add calls never fire.

Thu, 03 Oct 2013 02:24:31 GMT - Antoine Martin: attachment set

attachment set to auth-v6.patch

updated patch using timers and custom code instead of gobject from the subprocesses
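
To give an idea of what "timers and custom code" can look like (a minimal sketch, not the code from the patch): a queue-based loop offering idle_add and timeout_add look-alikes without touching gobject. MiniLoop is a hypothetical name, and unlike glib its timeout_add fires only once.

```python
import queue
import threading


class MiniLoop:
    """Tiny stand-in for a glib main loop: callbacks run in the thread calling run()."""

    def __init__(self):
        self._queue = queue.Queue()

    def idle_add(self, fn, *args):
        # safe to call from any thread: the callback is only queued here
        self._queue.put((fn, args))

    def timeout_add(self, delay_ms, fn, *args):
        # schedule fn once after delay_ms (never repeats, unlike glib)
        timer = threading.Timer(delay_ms / 1000.0, self.idle_add, (fn,) + args)
        timer.daemon = True
        timer.start()
        return timer

    def quit(self):
        # None is the sentinel that ends run()
        self._queue.put(None)

    def run(self):
        while True:
            item = self._queue.get()
            if item is None:
                break
            fn, args = item
            fn(*args)
```

Usage: create one loop per subprocess, call loop.run() from its main thread, and have the IO threads use loop.idle_add(...) to hand work back.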

Thu, 03 Oct 2013 03:31:12 GMT - Antoine Martin:

The auth-v6.patch works around this by using custom code instead of gobject. Lots of new improvements too:

username works
pam auth works, and setuid/setgid too
both threaded and multiprocessing modes work (controlled via env var)

What does not work yet / not done yet:

client connection fails most of the time (race with encryption setup, causes invalid packet)
hello filtering needs improvement (rencode / compression should use better values + use file overrides if we have any)
signal handling (not done) - see Python: Using KeyboardInterrupt with a Multiprocessing Pool
handle connection strings (and URIs) like: tcp:username@host:port
force kill subprocesses on exit?
restrict the subprocesses more: should not need file access (load passwords and keys beforehand?) or any new sockets, limit resource usage, prevent forking, prevent new imports, etc
invalid usernames should still trigger challenge (and avoid user enumeration)

etc..
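
On the user enumeration item: one possible approach (an assumption, not what the patch does) is to always issue a challenge and to run the same verification code whether or not the username exists. The sketch below uses HMAC-SHA256 purely as an example digest. All function names are hypothetical.

```python
import hashlib
import hmac
import os


def new_challenge() -> bytes:
    # issued for every login attempt, whether or not the username exists
    return os.urandom(32)


def challenge_response(password: str, salt: bytes) -> str:
    # what the client sends back instead of the plain password
    return hmac.new(password.encode("utf8"), salt, hashlib.sha256).hexdigest()


def verify(passwords: dict, username: str, salt: bytes, response: str) -> bool:
    password = passwords.get(username)
    known = password is not None
    if not known:
        # unknown user: compare against a random throwaway password so the
        # same amount of work is done as for a real user
        password = os.urandom(16).hex()
    expected = challenge_response(password, salt)
    matches = hmac.compare_digest(expected, response)
    return matches and known
```

The client never learns from the exchange itself whether the username or the password was wrong: both cases end in the same failure after the same steps.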

Thu, 03 Oct 2013 03:33:43 GMT - Antoine Martin: attachment set

attachment set to auth-v8.patch

adds attempts at signal handling and process cleanup + 1 important server fix

Thu, 03 Oct 2013 11:05:23 GMT - Antoine Martin: attachment set

attachment set to auth-v10.patch

many fixes (except encryption drop outs)

Fri, 04 Oct 2013 14:05:57 GMT - Antoine Martin:

Works ok as of r4399. We have a number of auth modules we can choose from:

allow: always allows the user to login - dangerous / only for testing
fail: always fails authentication - useful for testing
file: looks up usernames and passwords in the password file (format changed)
pam: linux PAM authentication
win32: win32security authentication
sys is a virtual module which will choose win32 or pam

Once authenticated, the proxy server starts a new process as the user that successfully authenticated (with the uid and gid taken from the password database) and connects to the real server. We choose the real display to connect to using the "display" capability (TODO: let client specify it) or choose the only session we find (if only one exists), or we fail. The special case is with the file auth module, which allows us to specify authentication values which may not be valid system users (though a valid uid/gid pair is still required in that case) and a target display which may be a remote one (ie: "tcp:host:port")
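
The display selection rule above can be sketched like this (hypothetical helper; rejecting a requested display that is not in the list of sessions found is an assumption, the text only says that we fail when there is no single obvious choice):

```python
from typing import List, Optional


def choose_display(requested: Optional[str], sessions: List[str]) -> str:
    """Return the display to proxy to, or raise ValueError if we cannot decide."""
    if requested:
        # value of the "display" capability, when one was given
        if requested not in sessions:
            raise ValueError("display %s not found" % requested)
        return requested
    if len(sessions) == 1:
        # only one session exists: use it
        return sessions[0]
    if not sessions:
        raise ValueError("no sessions found")
    raise ValueError("more than one session found, a display must be specified")
```

For example, choose_display(None, [":10"]) picks ":10", whereas choose_display(None, [":10", ":20"]) raises.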

Fri, 04 Oct 2013 14:30:11 GMT - Antoine Martin:

Here's how you can use it with the file auth module (sys auth needs encryption to work as we refuse to send unencrypted system passwords over the sockets):

start the server

xpra proxy :100 --bind-tcp=0.0.0.0:20000 --auth=file --password-file=./xpra-auth

add your user entries in the auth file, ie:

echo "antoine|thepassword|1000|1000|tcp:testhost:10|ENV=VALUE|compression=0" >> ./xpra-auth

connect from the client:

echo "thepassword" >> password.txt
xpra attach --username=antoine --password-file=./password.txt $PROXYHOST:20000

This should cause the proxy to forward the connection to the display specified in the auth file (in the example above: tcp:testhost:10)

Thu, 17 Oct 2013 15:34:35 GMT - Antoine Martin:

Many important fixes in r4541 and r4537 should make this a lot more usable now.

If things don't work as expected, check that you haven't got an old daemon/zombie running. Note: as of r4557, one can add session options to the auth file (only two are supported so far, as a proof of concept: compression_level and lz4), ie:

username|password|1000|1000|tcp:localhost:10000|ENV=VALUE|compression_level=1;lz4=0
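
For reference, a line in this format can be split up as shown below. This is a sketch and not the parser used by the file auth module: the names are hypothetical, it assumes exactly seven "|" separated fields (so a password cannot contain "|"), and it assumes that multiple env values would be separated by ";" like the session options.

```python
from typing import Dict, NamedTuple


class AuthEntry(NamedTuple):
    username: str
    password: str
    uid: int
    gid: int
    display: str
    env: Dict[str, str]
    options: Dict[str, str]


def parse_pairs(text: str) -> Dict[str, str]:
    """Parse 'key=value;key=value' into a dict (empty items are skipped)."""
    pairs = {}
    for item in text.split(";"):
        if not item.strip():
            continue
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % item)
        pairs[key.strip()] = value.strip()
    return pairs


def parse_auth_line(line: str) -> AuthEntry:
    fields = line.rstrip("\n").split("|")
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %i" % len(fields))
    username, password, uid, gid, display, env, options = fields
    return AuthEntry(username, password, int(uid), int(gid), display,
                     parse_pairs(env), parse_pairs(options))
```

All option values are kept as strings here; converting them (ie: compression_level to an integer) is left to whatever consumes them.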

Feedback welcome!

Wed, 23 Oct 2013 13:49:12 GMT - Antoine Martin:

Some important fixes in r4605, r4606, r4608, etc

I have identified the problem with the encryption: it isn't a problem with the encryption per se, the encryption just makes it more obvious. When using the proxy server, we *always* end up dropping the first packet that the client sends after the hello. Normally, that's a "set_deflate" or one of two "server-settings" (if applicable) or the first of the three "clipboard-token"s... So, when not using encryption, it's still wrong but we just don't notice because those packets aren't essential! The AES decryption relies on the strict presence and order of the data, and the missing packet causes a corrupted stream and disconnection.

That's because when we close the proxy-side connection, we may still have a read blocked in IO wait state via socket.recv. When the next packet comes in, that blocked read consumes it before closing down... We either want to force the read loop to exit early (not sure how), or take the data that was read and inject it into the subprocess (intrusive/ugly)... Or add a way to get the client to send a socket flush() (probably not enough to trigger a proxy read?) or to send a dummy unencrypted packet so we can close the connection? (also ugly but somewhat cleaner: everything exits with normal codepaths)

Thu, 24 Oct 2013 07:59:58 GMT - Antoine Martin:

The socket race is fixed in r4614 and encryption now works in proxy mode too (still only between client and proxy - between proxy and proxied server would require more configuration options, and is not a priority at the moment).

Note: we use a socket timeout (defaults to 0.1s) to guarantee that the sockets are always in a consistent state when handing them over to the new subprocess. This does slow down the initial connection (on average by half that delay, so about 50ms). The current value seems like a good compromise between polling too frequently (wasting CPU) and waiting too long.

r4615 allows this timeout to be configured via the XPRA_PROXY_SOCKET_TIMEOUT env var. (setting this value too high makes the delay much more noticeable, and one can even set it so high that the connection will often time out)
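
The general technique, as a minimal sketch (not the code from r4614): a read loop that never blocks for longer than the timeout, so it notices a stop request promptly and the socket can be handed over once the reader thread has exited. The function name is hypothetical, and the env var value is assumed to be in seconds.

```python
import os
import socket
import threading

# env var name as given above; the value is assumed to be in seconds
SOCKET_TIMEOUT = float(os.environ.get("XPRA_PROXY_SOCKET_TIMEOUT", "0.1"))


def read_until_stopped(sock, stop: threading.Event, on_data, bufsize=4096):
    """Read from sock until stop is set, never blocking longer than the timeout."""
    sock.settimeout(SOCKET_TIMEOUT)
    while not stop.is_set():
        try:
            data = sock.recv(bufsize)
        except socket.timeout:
            # nothing arrived in time: loop again, which re-checks the stop flag
            continue
        if not data:
            break  # the peer closed the connection
        on_data(data)
```

The caller sets the stop event and then joins the reader thread before passing the socket on. Data that arrives while a recv is still pending is still delivered to on_data, so the hand-over has to wait for the thread to exit, which takes at most one timeout period.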

What is left for this release (the rest can go in an enhancement ticket for another release):

signal handling and subprocess exit
performance/testing

Thu, 07 Nov 2013 05:11:12 GMT - Antoine Martin:

Most of the documentation found in this ticket has been added to the wiki:

Mon, 11 Nov 2013 04:43:16 GMT - Antoine Martin: owner changed

owner changed from Antoine Martin to alas

As of r4735, the proxy server should be able to exit cleanly. "xpra stop" now works against the main proxy process (one must be authenticated as the same user that runs that process)

I think that's enough for this ticket, please test and close if it all works as expected. Please verify that the connection from the proxy to the real xpra server uses rencode and not bencode.

What we may want to add (in a new ticket):

proxy video encoding (#504), for taking advantage of #370 on the host since a VM will not have direct access to the hardware
optimize packet handling (avoid decoding then re-encoding things)
password authentication and encryption between the proxy server and the real servers
better support for "xpra detach" so we can force kill connections (since the proxy will be long lived)

Thu, 16 Jan 2014 18:57:34 GMT - Smo: owner changed

owner changed from alas to Smo

This has been tested but not extensively. We are going to be testing this with 10+ clients and making sure there is nothing broken.

Thu, 06 Feb 2014 09:16:23 GMT - Antoine Martin:

With r5375 one can see a new socket for each proxy instance (this broke older versions which will need r5373 backported):

$ xpra list
Found the following xpra sessions:
	LIVE session at :proxy-28752
	LIVE session at :10
	LIVE session at :20

Which gives us an easier way of interacting and collecting information from proxy instances. It supports: "info", "version" and "stop".

Fri, 21 Mar 2014 00:19:47 GMT - Smo:

Is this normal when using the proxy?

$ xpra list
Found the following xpra sessions:
        LIVE session at :100
        LIVE session at :17
        LIVE session at :proxy-20954
$ xpra --username=username --password-file=./password.txt info :proxy-20954
server requested disconnect: this socket only handles 'hello', 'version' and 'stop' requests

Fri, 21 Mar 2014 05:11:07 GMT - Antoine Martin:

Hmmm, the warning message was wrong (fixed in r5878), "info" is handled, that's one of the main purposes of the proxy socket.

It works fine here... (as usual)

Is there anything in the proxy log? All I see (since it works):

New proxy instance control connection received: SocketConnection(/home/antoine/.xpra/desktop-proxy-25522)
Connection lost

Fri, 21 Mar 2014 05:26:23 GMT - Antoine Martin: owner, status changed

owner changed from Smo to Antoine Martin
status changed from new to assigned

Got it: don't use --username or --password-file. The proxy instance does not support any authentication at present (and I hope it never needs it), it is on a unix domain socket only, so regular unix permissions should be sufficient. Unless someone uses the proxy server and shared group sockets with --socket-dir...

r5880 gives a more helpful error message if you try to use authentication
r5881 adds information to the man page

Tue, 25 Mar 2014 18:02:46 GMT - Smo: status changed; resolution set

status changed from assigned to closed
resolution set to fixed

Tested this with 8 connections through the proxy with no issues.

Mon, 27 Apr 2015 10:10:29 GMT - Antoine Martin:

Some improvements worth mentioning here:

r9164: using blocking sockets after the connection is established (fewer timer wakeups)
r9163: re-compress window icon (avoids warning)

Both could be backported, but no rush. See also: ticket:838#comment:12

Sat, 23 Jan 2021 04:55:06 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/426