Xpra: Ticket #417: re-implement bandwidth constraint option

So we can limit ourselves to N Mbps if desired.

This may be implemented in two ways:

Or a combination of the two.



Fri, 15 Nov 2013 14:16:30 GMT - Antoine Martin: milestone changed

too late for 0.11


Mon, 20 Jan 2014 11:15:00 GMT - Antoine Martin: owner, status changed

It would be nice to make this generic enough that we can pass the information down to each encoder, but taking into account the fact that we may have many windows, each consuming a variable amount of bandwidth, is not going to be easy!


Mon, 03 Mar 2014 15:08:53 GMT - Antoine Martin: milestone changed

Too difficult, re-scheduling.


Tue, 27 Sep 2016 09:15:01 GMT - Antoine Martin: milestone changed

There is demand for this.


Mon, 20 Feb 2017 11:50:52 GMT - Antoine Martin: milestone changed

See also #540, #401, #619 and #999


Mon, 10 Jul 2017 14:15:18 GMT - Antoine Martin: milestone changed

re-scheduling


Mon, 23 Oct 2017 16:16:54 GMT - Antoine Martin:

Support added in r17232.
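
A minimal sketch of enabling the limit when starting a server (the display number is an assumption; the value syntax follows the "1mbps" form used later in this ticket, and the same option can also be set on the client side):

xpra start :100 --bandwidth-limit=1mbps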

You can see the current settings and the bandwidth budget distribution between multiple windows using:

xpra info | egrep -i "bandwidth-limit"

This is what iftop shows for a 1Mbps target and glxgears using up all the available bandwidth:

localhost.localdomain => localhost.localdomain    941Kb   826Kb   984Kb

Caveats:

TODO:


Tue, 24 Oct 2017 15:58:54 GMT - Antoine Martin:

Hooked up the network interface speed (when available) and disabled mmap; see ticket:540#comment:16.


Thu, 26 Oct 2017 08:31:57 GMT - Antoine Martin:

r17255 adds the UI option to the HTML5 client's connect dialog, defaulting to the value we get from the browser's network information API (as per ticket:1581#comment:3). We don't do this when bypassing the connect dialog, at least for now.


Fri, 27 Oct 2017 15:32:24 GMT - Antoine Martin: owner, status changed

@maxmylyn: ready for a first round of testing. So far, I have used "glxgears" to generate a high framerate, "iftop" to watch the bandwidth usage in real time, and the system tray to change the limit. I've also resized the glxgears window to generate more pixel updates - a larger window should give us a lower framerate (higher batch delay) and higher compression (lower speed and quality).

To verify that we stick to our budget correctly, we should test using a strict bandwidth shaper (i.e. tc) to replicate real-life network conditions. As long as the bandwidth-limit is slightly below the limit set by the shaper, the results should be identical. When capturing problematic conditions, make sure to get the full network characteristics (latency, bandwidth, etc) and the xpra info output.
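
For reference, a hedged example of watching just the xpra traffic with iftop (the interface and TCP port here are assumptions; adjust them to match the socket actually in use):

iftop -i lo -f "tcp port 10000"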


Fri, 27 Oct 2017 20:19:55 GMT - J. Max Mena: owner changed

Okay, initial testing is complete (trunk 2.x r17263, Fedora 25 server/client). It seems to work fine, at least at the lower limits. I'm not sure my machine is capable of pushing huge amounts of data, so the 1/2/5 Mbps limits were all I could test.

One request: to facilitate testing, can we have a control channel command or a client/server side CLI flag or config option, so that I don't have to use the system tray (since GNOME has decided we don't need that)? If we get a switch then I can add a very quick test run or two to the automated test box.


Sat, 28 Oct 2017 02:44:56 GMT - Antoine Martin: owner changed

One request: to facilitate testing, can we have a control channel command or a client/server side CLI flag or config option, so that I don't have to use the system tray (since GNOME has decided we don't need that)? If we get a switch then I can add a very quick test run or two to the automated test box.
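
For reference, a control-channel invocation along these lines should do it, assuming the server exposes a bandwidth-limit control command (the display number, subcommand name and value - in bits per second - are all assumptions here):

xpra control :100 bandwidth-limit 1000000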


Tue, 31 Oct 2017 02:25:10 GMT - J. Max Mena:

Note to self:


Wed, 01 Nov 2017 22:09:19 GMT - J. Max Mena: owner changed

Alright, this was a fun one to test. For reference my server and client are both Fedora 25 running trunk r17281.

So I had to spend about half an hour sifting through random forum posts asking how to do this, and they all wanted some sort of weird multi-line tc command magic... then I remembered we had some documentation on how to do delay and loss in ticket:999. After perusing that, I settled on a command:

tc qdisc add dev ens33 root netem rate 1mbit

Adapted from https://serverfault.com/questions/787006/how-to-add-latency-and-bandwidth-limit-interface-using-tc - close but not quite, and a bit complicated for our simple use-case. Anyway, I'm leaving this here for when I eventually need to come back to this ticket.

NOTE: be careful with that command - applied to the wrong interface, you can easily lose your SSH session.
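
For completeness, the matching commands to inspect and remove that qdisc when done (assuming the same ens33 interface):

tc qdisc show dev ens33

tc qdisc del dev ens33 root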

I played around with 1mbps and 2mbps limits. With the server set to rate limit at 1mbps, I enabled and disabled TC at both 1mbps and 2mbps. In both cases the bandwidth dropped for a second or so right after enabling/disabling TC (which makes sense, as TC probably interrupts connections), but afterwards it settles around 1mbit, plus or minus a bit. The highest I saw was 1.2mbps with TC set to 2mbps and the limit set to 1mbps, but it settles back down to 1mbps pretty quickly. So I can definitively say the rate limiting is working as expected, even with network limits applied.

As for the png/L encoder - I'm not sure how to force that encoding. I tried --encodings=png/L, which should force it to use that encoding, but when I do, it fails to connect with:

2017-11-01 15:07:30,448 server failure: disconnected before the session could be established
2017-11-01 15:07:30,448 server requested disconnect: server error (error accepting new connection)
2017-11-01 15:07:30,468 Connection lost

I'm not entirely sure how to force the PNG/L encoding like we talked about, so I'm going to pass this to you to ask how.


Thu, 02 Nov 2017 03:13:35 GMT - Antoine Martin: owner changed

... settles around 1mbit +- a bit ...

Does bandwidth-limit=1mbps work better than not setting any value when running on a 1mbps constrained connection? (In particular the perceived screen update latency, which should correlate with batch.delay + damage.out_latency.) Did you test with tc latency and jitter? Did you notice any repetitive screen update stuttering?
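
Those values can be watched with something along these lines (the exact key names in the xpra info output may vary slightly between versions):

xpra info | egrep "batch.delay|out_latency"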

it fails to connect with: server failure...

The error shown in the server log was: Exception: client failed to specify any supported encodings. r17282 fixes that.
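
With that fix in place, restricting the client to that encoding should work along the lines already tried above (the display used here is an assumption):

xpra attach :100 --encodings=png/L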


Sat, 04 Nov 2017 06:54:59 GMT - Antoine Martin:

Minor cosmetic improvements in r17296 + r17297 + r17298.


Sun, 19 Nov 2017 07:19:31 GMT - Antoine Martin:

r17452 adds bandwidth availability detection (server side), see ticket:999#comment:18 for details.


Tue, 12 Dec 2017 23:11:04 GMT - J. Max Mena:

Finally catching up to this one:

Does the bandwidth-limit=1mbps work better than not setting any value when running on a 1mbps constrained connection? (in particular the perceived screen update latency, which should correlate with the batch.delay + damage.out_latency)

Definitely. Just running glxgears makes it apparent that the bandwidth limit helps immensely. Without a bandwidth limit set, the framerate is all over the place, with lots of stutters and catching up. With the bandwidth limit set, the framerate is much smoother and notably more consistent, with only a small initial stutter.

Did you notice any screen update repetitive stuttering?

I already mentioned this above: yes, but only on a severely constrained connection without the limit set (--bandwidth-limit=).

Did you test with tc latency and jitter?

I'll do this shortly... right after my ~3pm-ish espresso.


Tue, 12 Dec 2017 23:41:58 GMT - J. Max Mena: owner changed

Alright, I ran a few levels of TC (the corresponding netem commands are sketched after this list):

"Light TC" aka 50ms +-10ms with a 25% chance delay 50ms 10ms 25%:

"Light TC only loss" aka 2% loss no jitter loss 2%:

"Medium TC only loss" aka 2% loss some jitter loss 2% 25%:

Just to be thorough, I also threw in a combination of loss and delay (loss 2% delay 50ms 10ms), but it wasn't pretty: very low framerate, with the occasional burst of a bit more.
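
For reference, the shorthands above map to netem commands roughly like the following (the same ens33 interface is assumed; use "change" instead of "add" if a qdisc is already installed on the interface):

tc qdisc add dev ens33 root netem delay 50ms 10ms 25%

tc qdisc change dev ens33 root netem loss 2%

tc qdisc change dev ens33 root netem loss 2% 25%

tc qdisc change dev ens33 root netem loss 2% delay 50ms 10ms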


As a total aside, I wonder if there's some utility that gives some kind of aggregate packet-type accounting, to see how much of an impact TCP retransmissions have. Mostly out of curiosity.


Sat, 23 Dec 2017 13:10:07 GMT - Antoine Martin: status changed; resolution set

The stuttering with packet loss is caused by packets backing up whilst waiting for the server's TCP resend, then all flowing again at the same time. UDP (#639) could deal with this better by skipping those frames (though I'm not sure the current heuristics skip frames as often as they should). We could also use some sort of throttled vsync to keep the framerate more constant when recovering, but that would be hard work, and nothing is going to allow us to avoid the pause with TCP, as it happens in the host network stack. I think this works well enough to close this ticket; we can open new tickets for refinements / new requirements.


Wed, 24 Jan 2018 14:29:15 GMT - Antoine Martin:

Not sure how well this got tested: although the original changeset (r17259) was fine, r17296 introduced a regression which caused the connection to abort when the system tray bandwidth limit was changed... Fixed in r18141.


Sat, 23 Jan 2021 04:54:49 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/417