Xpra: Ticket #2029: xpra shadow dying due to lack of bandwidth

Is it possible somehow to get statistics on how much bandwidth a xpra shadow would require?

I am trying to do a shadow over the internet (both ends on Ethernet); however, I am given a bunch of:

2018-11-05 21:12:56,142 client @16.500 server is not responding, drawing spinners over the windows
2018-11-05 21:12:59,770 client @20.125 server is OK again
2018-11-05 21:13:01,108 client @21.467 server is not responding, drawing spinners over the windows
2018-11-05 21:13:02,960 client @23.295 server is OK again
2018-11-05 21:13:06,143 client @26.500 server is not responding, drawing spinners over the windows
2018-11-05 21:13:08,247 client @28.608 server is OK again
2018-11-05 21:13:10,812 Warning: delayed region timeout
2018-11-05 21:13:10,812  region is 15 seconds old, will retry - bad connection?
2018-11-05 21:13:10,813  2 late responses:
2018-11-05 21:13:10,813      14 png  :  16s
2018-11-05 21:13:10,813      15 png  :  15s
2018-11-05 21:13:11,153 client @31.515 server is not responding, drawing spinners over the windows
2018-11-05 21:13:14,025 client @34.390 server is OK again
2018-11-05 21:13:14,583 Warning: delayed region timeout
2018-11-05 21:13:14,584  region is 15 seconds old, will retry - bad connection?
2018-11-05 21:13:14,584  5 late responses:
2018-11-05 21:13:14,584      12 png  :  20s
2018-11-05 21:13:14,584      13 png  :  19s
2018-11-05 21:13:14,585      14 png  :  18s
2018-11-05 21:13:14,585      15 png  :  17s
2018-11-05 21:13:14,585      16 h264 :  15s
2018-11-05 21:13:15,136 Warning: delayed region timeout
2018-11-05 21:13:15,137  region is 15 seconds old, will retry - bad connection?
2018-11-05 21:13:15,137  4 late responses:
2018-11-05 21:13:15,137      12 h264 :  20s
2018-11-05 21:13:15,137      13 h264 :  19s
2018-11-05 21:13:15,138      14 png  :  17s
2018-11-05 21:13:15,138      15 png  :  15s
2018-11-05 21:13:16,185 client @36.545 server is not responding, drawing spinners over the windows
2018-11-05 21:13:16,693 client @37.045 server is OK again
2018-11-05 21:13:21,187 client @41.545 server is not responding, drawing spinners over the windows
2018-11-05 21:13:22,221 client @42.577 server is OK again
2018-11-05 21:13:30,172 client @50.515 server is not responding, drawing spinners over the windows
2018-11-05 21:13:30,458 client @50.811 server is OK again
2018-11-05 21:13:34,376 Warning: sanitizing invalid gtk selection
2018-11-05 21:13:34,376  format=0x340957e8, type=0x2abaae0, length=-0x1
2018-11-05 21:13:35,183 client @55.545 server is not responding, drawing spinners over the windows
2018-11-05 21:13:37,291 client @57.640 server is OK again
2018-11-05 21:13:40,203 client @00.561 server is not responding, drawing spinners over the windows
2018-11-05 21:13:42,778 client @03.140 server is OK again
2018-11-05 21:15:05,525 client @25.875 server is not responding, drawing spinners over the windows
2018-11-05 21:15:05,781 client @26.140 server is OK again

At some point, I wondered whether available bandwidth was the issue. I turned the quality to the lowest possible setting (favoring bandwidth), and it seemed to work. Also, speedometer showed traffic dropping from a "constant" peak of ~1.1 MB/s to 380 kB/s - 640 kB/s. I believe that my connection's peak is also ~1.3 MB/s (10 Mbit/s), which could explain the timeouts.
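As a rough cross-check of the figures above, here is a minimal sketch that converts the speedometer readings to megabits per second. It assumes the readings are in megabytes and kilobytes per second (decimal units); the helper name is made up for illustration.

```python
def bytes_per_s_to_mbit(bytes_per_second: float) -> float:
    """Convert a throughput in bytes/s to Mbit/s (decimal units)."""
    return bytes_per_second * 8 / 1_000_000

# Readings quoted in the ticket, assumed to be bytes per second.
readings = {
    "default quality peak (~1.1 MB/s)": 1.1e6,
    "lowest quality, low end (380 kB/s)": 380e3,
    "lowest quality, high end (640 kB/s)": 640e3,
    "connection peak (~1.3 MB/s)": 1.3e6,
}

for label, value in readings.items():
    print(f"{label}: {bytes_per_s_to_mbit(value):.2f} Mbit/s")
```

Under that assumption the default-quality peak sits close to the capacity of a 10 Mbit/s line, while the lowest quality setting uses roughly a third to a half of it.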

My server's setup is 3 screens (1680x1050+0+30, 1920x1080+3600+0, 1920x1080+1680+0). I have a 10/10 Mbit/s connection on the client; assume inf/inf at the server. The server's only altered settings disable sound/mic/webcam (everything else is at default). The client has disabled OpenGL (for no reason, I just don't like the warning) and sound/mic/webcam, and uses a scaled desktop (because 1600x1050 is non-standard for me, and 1920x1080 is already the host screen).

The lowest quality is not really nice if I have to discern anything smaller than 320x240 pixels.



Tue, 06 Nov 2018 13:44:58 GMT - Antoine Martin: owner changed

Is it possible to have some estimation on how much bandwidth a shadow session would require?

No, that depends entirely on what is happening in the session and how well the encoders deal with it.

.. or have any kind of "real-time" plottable statistics?

The session info window should have some, including a plot which you can save.

Client has disabled OpenGL (for no reason, I just don't like the warning

Which warning?

Is there any setting that would help me work under the bandwidth limitation?

From a local terminal, try this command to start the shadow server:

XPRA_SHADOW_REFRESH_DELAY=250 xpra shadow ...

This will limit the framerate to 4fps (default is 50ms which gives 20fps). If that helps, please post the -d encoding,bandwidth server debug log and we should be able to make the shadow framerate automatically adapt to the bandwidth conditions.
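The relationship between the refresh delay and the framerate quoted above is a simple reciprocal. The sketch below is only an illustration of that arithmetic; the function name is hypothetical and not part of xpra.

```python
def refresh_delay_to_fps(delay_ms: float) -> float:
    """Return the maximum framerate implied by a refresh delay in milliseconds."""
    if delay_ms <= 0:
        raise ValueError("refresh delay must be positive")
    return 1000.0 / delay_ms

# 50 is the default mentioned above, 250 is the suggested value.
for delay in (50, 250):
    print(f"{delay}ms delay -> at most {refresh_delay_to_fps(delay):.0f}fps")
```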


Thu, 15 Nov 2018 00:27:24 GMT - stdedos:

Replying to Antoine Martin:

.. or have any kind of "real-time" plottable statistics?

The session info window should have some, including a plot which you can save.

I tried to take some. with-delay_no-quality_timeout was taken near the end. Right at the end, the red line spiked to 2M, and then the session died within ~5 seconds. Otherwise, all "traffic" (both timed out and not) looked the same. Maybe I should've tested the bandwidth from the server side, like I did last time?

Client has disabled OpenGL (for no reason, I just don't like the warning

Which warning?

Warning: vendor 'Intel' is greylisted,
 you may want to turn off OpenGL if you encounter bugs

Since I don't know "what bugs to expect", I just disable it altogether

Is there any setting that would help me work under the bandwidth limitation?

From a local terminal, try this command to start the shadow server:

XPRA_SHADOW_REFRESH_DELAY=250 xpra shadow ...

This will limit the framerate to 4fps (default is 50ms which gives 20fps). If that helps, please post the -d encoding,bandwidth server debug log and we should be able to make the shadow framerate automatically adapt to the bandwidth conditions.

It seems to me that XPRA_SHADOW_REFRESH_DELAY does not change the result. All the servers were started with

(XPRA_SHADOW_REFRESH_DELAY=250) xpra shadow -d encoding,bandwidth :0

and attached with

"C:\Program Files\Xpra\xpra_cmd" attach ssh://sntentos@172.16.57.121/0 --desktop-scaling=0.75 --opengl=no (--quality=10)

--quality=10 was much more of a definite factor to get it working, rather than anything else. Both adding XPRA_SHADOW_REFRESH_DELAY and trying to set the quality/bandwidth from the tray icon looked like no-ops.


Thu, 15 Nov 2018 00:28:49 GMT - stdedos: attachment set


Thu, 15 Nov 2018 16:31:48 GMT - Antoine Martin:

From the log samples:

This looks like a bug. Can you post the server's -d encoding,bandwidth,stats output please?


Thu, 15 Nov 2018 16:56:14 GMT - stdedos:

Replying to Antoine Martin:

Can I "skip" updating Ubuntu for now, and somehow add the webp support?

If it has to do with "adding/missing a package": xpra routinely complains about missing packages, e.g. paramiko, numpy; however, I have numpy installed:

$ pip freeze  | grep numpy
numpy==1.13.3
$ pip3 freeze | grep numpy
numpy==1.13.3

Thu, 15 Nov 2018 17:21:04 GMT - Antoine Martin:

Can I "skip" updating Ubuntu for now, and somehow add the webp support?

No. In any case, fixing the "the automatic quality and speed" is more important.

xpra routinely complains about missing packages, e.g. paramiko, numpy; however, I have numpy installed:

This is bogus. See ticket:1926#comment:1.


Fri, 16 Nov 2018 00:03:19 GMT - stdedos: attachment set


Fri, 16 Nov 2018 00:03:38 GMT - stdedos: attachment set


Fri, 16 Nov 2018 00:04:45 GMT - stdedos:

There you go.

After updating both server and client, handling seemed "barely" better.

Still unworkable though.

I raised the quality to 15, and it still seemed okay.


Fri, 16 Nov 2018 18:45:15 GMT - Antoine Martin:

.. you can try r21002 or later (server side update).

After updating both server and client, handling seemed "barely" better.

Those logs are from an older version of the server (2.5-r20979), so there are no fixes for avoiding the "framerate lowered" with shadow servers in the version you have tested with.

Thanks for posting these logs anyway, I have found many things we should be handling better already:

Updated packages posted for most distros.

Please try again and attach the -d stats,compress server debug output (and not a zip file with extra stuff). You should be getting much more reasonable automatic "speed" and "quality" settings out of the box. Though you may still need to lower min-quality down to 0 since you found that you need a really low quality setting.


Sat, 17 Nov 2018 09:03:38 GMT - Antoine Martin:

More improvements in r21021 but those aren't suitable for backporting.


Sat, 17 Nov 2018 13:17:45 GMT - stdedos:

Replying to Antoine Martin:

.. you can try r21002 or later (server side update).

After updating both server and client, handling seemed "barely" better.

Those logs are from an older version of the server (2.5-r20979), so there are no fixes for avoiding the "framerate lowered" with shadow servers in the version you have tested with.

Thank you for all the fixes. I will try to update both client and server, once beta packages are out, then re-run with -d stats,compress.

I assume r21002-fixes are irrelevant to report now? Sadly, when I re-ran there wasn't a Xenial package built.


Sat, 17 Nov 2018 16:19:35 GMT - stdedos: attachment set


Sat, 17 Nov 2018 16:24:00 GMT - stdedos:

I am happy to see that a simple "C:\Program Files\Xpra\xpra_cmd" attach ssh://sntentos@172.16.57.121/0 --min-quality=15 --desktop-scaling=0.75 --opengl=no (on a xpra shadow :0) worked out of the box. I could almost see a video, with bandwidth "constraints" well under control.

However, was I using the new version for sure? I did see the beta/xenial having the r21016 version posted, but xpra --version (and Session info, etc) said r20XXX. Maybe, xpra --version doesn't come directly from the repo, and it's something you have to do manually? If so, would you consider automating it?


Sat, 17 Nov 2018 17:40:42 GMT - stdedos:

For the OpenGL thing: It seems I have already been self-medicating:

Remember my screen setup: 1680x1050+0+30, 1920x1080+3600+0, 1920x1080+1680+0:

(from left to right) Screen 0 "jumps up and down", Screen 1 "does weird things", and Screen 2 sometimes you click and there is "no input"


Sat, 17 Nov 2018 17:41:30 GMT - stdedos: attachment set


Sun, 18 Nov 2018 04:26:10 GMT - Antoine Martin:

However, was I using the new version for sure?

dpkg -l xpra

but xpra --version (and Session info, etc) said r20XXX

What exact number?

xpra --version doesn't come directly from the repo, and it's something you have to do manually? If so, would you consider automating it?

It is already automated. But the packaging files may not have the same version number as the actual xpra source code.

For the OpenGL thing: It seems I have already been self-medicating: ...

I have no idea what I am looking at or what this means. But it doesn't look related to this ticket.


Sun, 18 Nov 2018 06:16:21 GMT - Antoine Martin:

... worked out of the box. I could almost see a video, with bandwidth "constraints" well under control.

You probably do not need to tweak min-quality, or anything else for that matter: leaving opengl on would give you much better client performance too.

The fixes ensure that we use video encodings more aggressively for shadow windows, which drastically reduces the amount of bandwidth used (by avoiding png); that's especially noticeable during start up. The actual automatic quality values used are hovering around ~50%, occasionally creeping up over 90% for a short while before dropping back down again as this seems to consume too much bandwidth for your connection. If you want improved picture quality, you should be able to raise the min-quality and let xpra adapt automatically; it should then reduce the framerate.

The only thing left in that log that doesn't look quite right is how when we end up selecting the "vp8" encoder, we never seem to get more than a single frame out of it, which is going to waste CPU and bandwidth. Please post the:

XPRA_DEBUG_VIDEO_CLEAN=1 xpra shadow -d compress,stats

debug log output.


Sun, 18 Nov 2018 10:25:10 GMT - stdedos:

Replying to Antoine Martin:

However, was I using the new version for sure?

dpkg -l xpra

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                   Version                  Architecture             Description
+++-======================================-========================-========================-=================================================================================
ii  xpra                                   2.5-20181117r21016-1     amd64                    tool to detach/reattach running X programs

but xpra --version (and Session info, etc) said r20XXX

What exact number?

$ xpra --version
xpra v2.5-r20980

For the OpenGL thing: It seems I have already been self-medicating: ...

I have no idea what I am looking at or what this means. But it doesn't look related to this ticket.

No. But it means the comment below: Replying to Antoine Martin:

... worked out of the box. I could almost see a video, with bandwidth "constraints" well under control.

You probably do not need to tweak min-quality, or anything else for that matter: leaving opengl on would give you much better client performance too.

... cannot be used. I can leave --min-quality out next time

The fixes ensure that we use video encodings more aggressively for shadow windows, which drastically reduces the amount of bandwidth used (by avoiding png); that's especially noticeable during start up. The actual automatic quality values used are hovering around ~50%, occasionally creeping up over 90% for a short while before dropping back down again as this seems to consume too much bandwidth for your connection. If you want improved picture quality, you should be able to raise the min-quality and let xpra adapt automatically; it should then reduce the framerate.

Let's see :-D

The only thing left in that log that doesn't look quite right is how when we end up selecting the "vp8" encoder, we never seem to get more than a single frame out of it, which is going to waste CPU and bandwidth. Please post the:

XPRA_DEBUG_VIDEO_CLEAN=1 xpra shadow -d compress,stats

debug log output.

Removing --min-quality appeared to make the output more "pixelated", so that text was not easily readable and/or had artifacts around it (sort of like a pixelate filter). Otherwise it was good enough for interaction.


Sun, 18 Nov 2018 10:27:34 GMT - stdedos: attachment set


Sun, 18 Nov 2018 11:12:02 GMT - Antoine Martin:

No. But it means the comment below: (..) ... cannot be used. I can leave --min-quality out next time

I don't understand any of this. Is it or is not related to this ticket? min-quality has nothing to do with opengl.

The log attached does not contain any "vp8" compression, so I can't debug that last issue. The contexts are being re-cycled a little bit too much, I'll see if I can make them a little bit more sticky.

The xenial repo has xpra 2.5-r21028.


Sun, 18 Nov 2018 11:20:00 GMT - stdedos:

Replying to Antoine Martin:

No. But it means the comment below: (..) ... cannot be used. I can leave --min-quality out next time

I don't understand any of this. Is it or is not related to this ticket? min-quality has nothing to do with opengl.

No, the fact that "what I look" and "what I mouse-click" is not 100% accurate, means that I cannot use that mode. It is not related to "dying out of bandwidth", but it is related to the shadow mode since I've been using it. I am sorry, but I cannot explain it any further on a comment section, if the video I sent does not "show it" to you.

The log attached does not contain any "vp8" compression, so I can't debug that last issue. The contexts are being re-cycled a little bit too much, I'll see if I can make them a little bit more sticky.

The xenial repo has xpra 2.5-r21028.

Should I force vp8 encoding (--encoding=vp8) with the new version and try again?


Sun, 18 Nov 2018 12:18:24 GMT - stdedos: attachment set


Sun, 18 Nov 2018 12:21:14 GMT - stdedos:

Forcing vp8 seems to lag more than the default option. The output is much more visible, but there is a delay of at least 1 second (at session beginning it was MUCH more). Maybe drop the quality aggressively at the beginning of > 1 sec delay, and then ramp up?


Sun, 18 Nov 2018 17:24:44 GMT - Antoine Martin:

No, the fact that "what I look" and "what I mouse-click" is not 100% accurate, means that I cannot use that mode.

Right, so this is a totally different bug, do we have a ticket number for it? (might be related to desktop-scaling if you use that feature) Could be related to one or more of: #1967, #1805 / #1801, #1656 (r17150), #1567, #1469, #41 / #1247 (r14368), #1339

Should I force vp8 encoding (--encoding=vp8) with the new version and try again?

No. The problem probably comes from the automatic encoding selection. We need all the encodings enabled to see what that code does. Maybe even add -d score:

XPRA_DEBUG_VIDEO_CLEAN=1 xpra shadow -d compress,stats,score

Forcing vp8 seems to lag more than the default option.

What do you mean by "lag" here? I assume you mean the 1 second delay?

The output is much more visible,

What does it mean "visible"?

but there is a delay of at least 1 second (at session beginning it was MUCH more).

In your log, I see that the first few frames for each window are compressed in ~70 to 80ms each. But they use up a lot more bandwidth than with h264: around 600KB for each frame, which means around 1.5 second's worth of bandwidth on a 10Mbps connection. So you end up with a similar problem to what you had originally with "png": the server has to wait before sending any more frames because the line is saturated. Then after that things settle down: quality and speed are lowered, differential compression kicks in and compresses better.
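To make the saturation argument concrete, here is a minimal sketch of the time a frame occupies the line. The 600KB frame size and the 10Mbps link come from the comment above; the count of three windows (one per screen) is an assumption used to show how the total could approach 1.5 seconds, and the helper name is hypothetical.

```python
def seconds_on_wire(payload_bytes: int, link_bits_per_second: int) -> float:
    """Time needed to push a payload through a link, ignoring protocol overhead."""
    return payload_bytes * 8 / link_bits_per_second

FRAME_BYTES = 600 * 1000   # "around 600KB for each frame"
LINK_BPS = 10_000_000      # 10Mbps connection
WINDOWS = 3                # assumption: one shadow window per screen

per_frame = seconds_on_wire(FRAME_BYTES, LINK_BPS)
first_refresh = per_frame * WINDOWS

print(f"one frame: {per_frame:.2f}s of line time")
print(f"first frame of {WINDOWS} windows: {first_refresh:.2f}s of line time")
```

While those initial frames are in flight nothing else can be sent, which is the same stall that the large png frames caused originally.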

Maybe drop the quality aggressively at the beginning of > 1 sec delay, and then ramp up?

We already do that. Other users rightly complained that we were starting too low. It is impossible to satisfy everyone.

Also note that newer distributions ship a newer version of libvpx, which supports vp8 and vp9 and with much improved performance.


Sun, 18 Nov 2018 21:42:25 GMT - stdedos:

Replying to Antoine Martin:

No, the fact that "what I look" and "what I mouse-click" is not 100% accurate, means that I cannot use that mode.

Right, so this is a totally different bug, do we have a ticket number for it? (might be related to desktop-scaling if you use that feature) Could be related to one or more of: #1967, #1805 / #1801, #1656 (r17150), #1567, #1469, #41 / #1247 (r14368), #1339

Could be #1801 (since I never worked with the other mode), #1567, or an unreported one, something I might have broken on my installation, the Nvidia graphics card, or something in between. My attempts to debug it cannot go further than "output is distorted, output is 'shaking', mouse (x,y) seem wrong".

Should I force vp8 encoding (--encoding=vp8) with the new version and try again?

No. The problem probably comes from the automatic encoding selection. We need all the encodings enabled to see what that code does. Maybe even add -d score:

XPRA_DEBUG_VIDEO_CLEAN=1 xpra shadow -d compress,stats,score

Forcing vp8 seems to lag more than the default option.

What do you mean by "lag" here? I assume you mean the 1 second delay?

Yes

The output is much more visible,

What does it mean "visible"?

Almost no artifacts, colors are accurate.

but there is a delay of at least 1 second (at session beginning it was MUCH more).

In your log, I see that the first few frames for each window are compressed in ~70 to 80ms each. But they use up a lot more bandwidth than with h264: around 600KB for each frame, which means around 1.5 second's worth of bandwidth on a 10Mbps connection. So you end up with a similar problem to what you had originally with "png": the server has to wait before sending any more frames because the line is saturated. Then after that things settle down: quality and speed are lowered, differential compression kicks in and compresses better.

Maybe drop the quality aggressively at the beginning of > 1 sec delay, and then ramp up?

We already do that. Other users rightly complained that we were starting too low. It is impossible to satisfy everyone.

Also note that newer distributions ship a newer version of libvpx, which supports vp8 and vp9 and with much improved performance.

I will try

XPRA_DEBUG_VIDEO_CLEAN=1 xpra shadow -d compress,stats,score

at a later time


Sat, 24 Nov 2018 17:32:02 GMT - stdedos:

Requested diagnostics attached


Tue, 27 Nov 2018 13:32:39 GMT - Antoine Martin:

From that log, it seems that at least the client is no longer dying on its own since you ended up disconnecting it yourself: Disconnecting client .. client request at the end of the log. How good / bad was the experience?

How well we get things to run is always going to be a challenge: you have 3 screens with two 1080p and one 1680x1050. That's 5911200 pixels in total. Each pixel is 24-bit, so that's 17MB of data, or roughly 141 Mbits per refresh to send over the network. Your line is 10Mbps, so we need to compress by a factor of ~15 (ideally more to avoid congestion issues) just to be able to do a single refresh every second. Fortunately, h264 is very efficient and can compress 100 times or more, allowing you to see more than one frame per second. We can probably do better still, just don't expect miracles - we can't work around the laws of physics.
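The arithmetic above can be reproduced with a short sketch. The screen sizes, the 24-bit depth, the 10Mbps line and the 100x h264 ratio are taken from this comment; the variable names are illustrative only, and the last figure assumes every pixel changes on every frame.

```python
# Screen sizes from the ticket: one 1680x1050 and two 1920x1080 displays.
SCREENS = [(1680, 1050), (1920, 1080), (1920, 1080)]
BYTES_PER_PIXEL = 3      # 24-bit colour
LINK_BPS = 10_000_000    # 10Mbps line
H264_RATIO = 100         # "100 times or more", as quoted above

pixels = sum(width * height for width, height in SCREENS)
raw_bits = pixels * BYTES_PER_PIXEL * 8

# Compression factor needed to send one full refresh per second.
min_ratio = raw_bits / LINK_BPS
# Full-screen refreshes per second if every frame compressed by H264_RATIO.
fps_at_h264 = LINK_BPS / (raw_bits / H264_RATIO)

print(f"pixels per refresh: {pixels}")
print(f"raw size: {raw_bits / 8 / 1e6:.1f} MB = {raw_bits / 1e6:.1f} Mbit")
print(f"compression needed for 1 fps: {min_ratio:.1f}x")
print(f"full-screen fps at {H264_RATIO}x compression: {fps_at_h264:.1f}")
```

The required factor works out to about 14, in line with the "~15" estimate, and a 100x ratio would allow roughly seven full refreshes per second before any congestion margin.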

There are also some recent updates to the speed / quality / batch heuristics: #2061, in particular r21127 avoids subsampling with shadow servers which may help.


Tue, 27 Nov 2018 14:19:15 GMT - stdedos:

Replying to Antoine Martin:

From that log, it seems that at least the client is no longer dying on its own since you ended up disconnecting it yourself: Disconnecting client .. client request at the end of the log. How good / bad was the experience?

Client hasn't been dying by itself for some time now (noted somewhere above). Experience is "workable". I'd still wish that text would render without artifacts within the first couple of seconds (instead of taking longer). However, I can read the math below and see that this is kind of hard to do.

How well we get things to run is always going to be a challenge: you have 3 screens with two 1080p and one 1680x1050. That's 5911200 pixels in total. Each pixel is 24-bit, so that's 17MB of data, or roughly 141 Mbits per refresh to send over the network. Your line is 10Mbps, so we need to compress by a factor of ~15 (ideally more to avoid congestion issues) just to be able to do a single refresh every second. Fortunately, h264 is very efficient and can compress 100 times or more, allowing you to see more than one frame per second. We can probably do better still, just don't expect miracles - we can't work around the laws of physics.

There are also some recent updates to the speed / quality / batch heuristics: #2061, in particular r21127 avoids subsampling with shadow servers which may help.

I did leave a comment regarding the "described" heuristics on the mentioned ticket; feel free to see if they apply at all or not.

As for further optimizations: (Maybe)

Definitions:

Assumptions:

Preconditions:

Situation: If you have quality issues (traced to bandwidth and/or network instability)

Then:

It is a very crude example that I thought up, and I lack the algorithmic rigor / proper knowledge to do better. Feel free to refine it, if applicable.

If my logic is sound (and applicable), I assume you can save a lot of bandwidth that is wasted on output the user is not going to notice anyway.


Tue, 27 Nov 2018 15:12:08 GMT - Antoine Martin: status changed; resolution set

Client hasn't been dying by itself for some time now (noted somewhere above).

OK, thanks. Closing.

"Viewable" window: A window that is at least X% inside the "visible" area of the viewport

With compositing window managers and such, this can't really be relied upon.

You can recognize when window is active / "viewable" / minimized)

We already do:

You can recognize you have quality issues that can be traced to bandwidth and/or network instability

We already do.

Decrease FPS on non-active windows to 10~15fps (windows may be spread on multiple displays, so they still need to "feel" responsive)

We already do, the cap is at 25fps, lower if needed. There is no cap for viewable windows - you can achieve ~100fps or more.

Decrease FPS on non-"viewable"/minimized windows to 1fps (window will still have "enough" image data, when raised/become "viewable")

We already do exactly that.


Tue, 27 Nov 2018 15:16:10 GMT - stdedos:

Well then ... sucks to have my connection :-p

I'll try to raise min-quality (as noted elsewhere) and see if I can survive with it.


Thu, 13 Dec 2018 13:05:32 GMT - stdedos:

Extra idea: In both Windows 10 and in e.g. Ubuntu, there is the concept of workspaces. Can you detect the "current" workspace, and where is a window placed?

For Windows 10, there is an AutoHotKey script hooking enough in the Windows Task View / Workspace logic: https://github.com/sdias/win-10-virtual-desktop-enhancer


Thu, 13 Dec 2018 15:45:35 GMT - Antoine Martin:

Extra idea: In both Windows 10 and in e.g. Ubuntu, there is the concept of workspaces. Can you detect the "current" workspace, and where is a window placed?

We already do that for X11 to slow down the rate of updates when the window is not on the visible workspace.

For Windows 10, there is an AutoHotKey script hooking enough in the Windows Task View / Workspace logic: https://github.com/sdias/win-10-virtual-desktop-enhancer

Thanks for the pointer, added to #2081


Sat, 23 Jan 2021 05:40:15 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/2029