Xpra: Ticket #967: Large browser windows with lots of widgets seem to be causing lasting blurriness

Working with a 0.15.5 r10336 windows client (our build) against a 0.15.4 r10209 fedora 21 server, I'm seeing some blurriness, especially when scrolling, which doesn't seem to be resolving itself after the scrolling stops.

I've collected client side logs (with -d client,regionrefresh) and server side logs (with -d encoding,regionrefresh).

Yes, I also collected some xpra info.

And, just for fun... I'll include a screenshot of a bit of the blurriness (it was considerably worse at points, but I only caught some of it; I was mostly seeing it with very large windows, so I won't try to include it inline).

Wed, 26 Aug 2015 02:32:40 GMT - alas: attachment set

xpra info for 0.15.4/5 blurry

Wed, 26 Aug 2015 02:42:24 GMT - alas: attachment set

client side logs -d client,regionrefresh

Wed, 26 Aug 2015 02:48:46 GMT - alas: attachment set

server logs -d encoding,regionrefresh (so long I had to zip... but if something gives time index, they might be useful)

Wed, 26 Aug 2015 02:50:07 GMT - alas: attachment set

screenshot of medium level of blurriness I was seeing with google-chrome (you've seen webkit)

Fri, 28 Aug 2015 03:43:39 GMT - Antoine Martin: owner changed

About the debug flags:

But in this case, the refresh is not the cause of the problem, your xpra info shows:

window[3].size=(1226, 1635)
window[3].video_subregion.refresh_region[0]=(0, 85, 1226, 1550)

So almost all of the window is detected as a video region. The scrolling made us select the whole window as video (because there is no way to tell that it isn't: it is updating fast and in exactly the same area each time, just like video). And the heuristics kept this region afterwards, probably because so many things are animated on this page that they keep the "hit counter" high.

This means that the detection heuristics get it wrong: #410. So the debug flag that you want is probably (server-side): regiondetect.
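As a rough check of the "almost all of the window" observation, the covered fraction can be worked out from the two xpra info values quoted above. This is only illustrative arithmetic; the helper name `region_coverage` is made up for this example and is not part of xpra.

```python
def region_coverage(window_size, region):
    """Fraction of the window area covered by an (x, y, w, h) region."""
    win_w, win_h = window_size
    _x, _y, reg_w, reg_h = region
    return (reg_w * reg_h) / (win_w * win_h)


# Values copied from the xpra info output quoted above
window_size = (1226, 1635)
refresh_region = (0, 85, 1226, 1550)

coverage = region_coverage(window_size, refresh_region)
print(f"video region covers {coverage:.1%} of the window")
```

With these numbers the region spans the full width and all but 85 rows of the window, which is why refreshes alone cannot clear the blur.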

Thu, 03 Sep 2015 02:00:16 GMT - alas:

Tested some more with our set up. It seems easier to reproduce with our builds & configuration... will try to carve out a few minutes to test with encryption on with your builds to see if that might be the difference.

In any case, got the screen test to go blurry and remain so (msn.com, money tab... perhaps that page should stay blurry?) - will attach server logs with -d regiondetect for a portion of time with the text stuck at blurry, despite scrolling and mousing around the links and such, as well as a new xpra-info specific to our set up. I'll also attach a full-size screenshot of the page with most of it blurry, as well as an edited-for-size to link in-line.

The large-ish image sort of top-left is a rotating ad, which refreshes every... ohh, 1-2 seconds (?) ... and some of the other widgets seem to involve some motion (including a couple more ads that I didn't bother to capture in the screenshot). I suspect they may be responsible for just enough updates to keep the region detecting as video.

Thu, 03 Sep 2015 02:01:17 GMT - alas: attachment set

-d regiondetect server logs, our server, 0.15.5(ish)

Thu, 03 Sep 2015 02:03:48 GMT - alas: attachment set

xpra info, server side (of course) - our server (0.15.5 r10308 +/-)

Thu, 03 Sep 2015 02:04:45 GMT - alas: attachment set

full size shot of page while blurry - our server/client

Thu, 03 Sep 2015 02:07:41 GMT - alas: attachment set

edited for in-line shot of blurry, to show widget concentration

Thu, 03 Sep 2015 02:08:50 GMT - alas:

edited for in-line shot of blurry, to show widget concentration

Thu, 03 Sep 2015 11:43:43 GMT - Antoine Martin:

From your regiondetect debug log, we can see at regular intervals:

testing      current video region       rectangle[0, 79, 2098, 1306]: 100% in,   0% out,  93% of window, score=103
identify video: most=100% damage count={R(0, 79, 2098, 1306): MutableInteger(400)}

So it finds that 100% of screen updates happen in the region that was previously identified as video; that's roughly 20 to 40 repaints per second! (The calculations run at most once every second, less often when there is not much happening on screen.)

Not only that, but if you look at the actual paint events themselves (the format is simple: timestamp, X, Y, WIDTH, HEIGHT), ie:

(1441237975.382138, 0, 79, 2098, 1306), (1441237975.402772, 0, 79, 2098, 1306), (1441237975.428191, 0, 79, 2098, 1306)

All of the events that I can see actually repaint the whole of that area! (it's easy to see if you just search the log output for the string 0, 79, 2098, 1306, what is not highlighted is the rest - not much!) Usually you get smaller sub-areas, especially with players like flash that paint the screen in horizontal chunks, or youtube which repaints the video and the controls around it separately, but in this case it is all in one huge area!
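A quick way to check both claims (same rectangle every time, high repaint rate) is to parse the event tuples from the log. The sketch below uses only the three events quoted above, which is far too short a sample to be representative; the variable names are illustrative and nothing here comes from the xpra code base.

```python
import ast

# Sample copied from the regiondetect log excerpt above
sample = ("(1441237975.382138, 0, 79, 2098, 1306), "
          "(1441237975.402772, 0, 79, 2098, 1306), "
          "(1441237975.428191, 0, 79, 2098, 1306)")

# The excerpt happens to be a valid Python literal: a tuple of
# (timestamp, x, y, width, height) tuples
events = ast.literal_eval(sample)

rectangles = {event[1:] for event in events}   # distinct damaged rectangles
timestamps = [event[0] for event in events]

elapsed = timestamps[-1] - timestamps[0]
rate = (len(events) - 1) / elapsed             # intervals per second

print("distinct rectangles:", rectangles)
print(f"approximate repaint rate: {rate:.0f} per second")
```

Running the same steps over a full list of events from the log would give a more meaningful rate than this three-event excerpt.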

You should be able to confirm that we are recording the correct values for paint events by logging with -d encoding then grepping the output for damage. But the code is unambiguous in this area: we record all non refresh events in the list you see in the regiondetect log.

So at this point I think I will close this bug as invalid. The region detect code gets it right, and we're doing remarkably well considering the heavy paint traffic.

It looks to me like the browser is needlessly repainting things that have not moved. It could also be that this particular page is triggering those events through bad javascript code. I found a good page which explains the browsers' rendering process: How Browsers Work: Behind the scenes of modern web browsers. If the problem comes from the browser's rendering engine rather than the page, this needs to be fixed as it will consume huge amounts of CPU for absolutely nothing.

Edit: originally said 400 updates per second, which was incorrect. We keep the most recent 400 events, and the time difference from oldest to newest is roughly between 10 and 20 seconds.

Tue, 08 Sep 2015 23:23:49 GMT - alas: status changed; resolution set

Looks like closing on your end is probably the right thing to do. We'll have to handle it on our end.

I'll take the liberty of closing.

Wed, 28 Oct 2015 00:00:15 GMT - J. Max Mena: status changed; resolution deleted

I have been volunteered to re-open this ticket. All jokes aside, I am seeing identical behavior in the latest Chromium (the open source variant - not the closed source Chrome):

From there, you can do two things to see the blurriness stick around: click on a posting that will time out shortly (within 3 hours), or just sit there on the search results page (new!). With Chromium's paint debug enabled, you can see that the post titles refresh every second, and on a posting the refresh is timed to match the clock ticking down.

The heuristics here aren't catching these partial refreshes (though they are trying, as you can see if XPRA_OPENGL_PAINT_BOX=1 is set), and instead the whole window is repainted with h264... this causes the whole thing to become blurry. In some cases it does come in clear, but that's only about 30% of the time in my experience today.

I'll attach a screenshot of the behavior. If you would like logs, please let me know what flags you want and I'll attach them; as the repro is relatively simple.

As an aside, all this is very reminiscent of #410 and #596 from almost 2 years ago...speaking of which, my 2 year Anniversary here is coming up in a few short months.

Wed, 28 Oct 2015 00:00:54 GMT - J. Max Mena: attachment set

Sitting at an Ebay search query and seeing the blurry stick constantly. This behaviour appears to stick around indefinitely.

Tue, 03 Nov 2015 00:50:48 GMT - alas:

Repro'd for logs, win client 0.16.0 r11118 against fedora 21 0.16.0 r11118.

Using steps listed above (comment:6), with a slightly different ebay search site... http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2047675.m570.l1313.TR12.TRC2.A0.H0.Xsuper+beetle.TRS0&_nkw=super+beetle&_sacat=0.

Scrolling up & down and mousing all over all the various widgets, even with the chromium paint boxes flashing regularly, wasn't sufficient to induce blurriness with a 1920x1080 window (give or take).

Re-sizing the window, however, seems to trigger the blurriness pretty reliably. (Shrink the window, then resize back to +/- 1920x1080).

I set the test up to be as narrow a window as possible, then blew it at the last minute... launched server with logs being captured, but no flags enabled... then connected client without logs in order to set up the blurriness.

I then disconnected client and re-connected to running session with logs enabled, -d client,regionrefresh (which will explain the disconnect/re-connect you'll see in server logs). I then used control channel to enable the server logs (and noticed that trying to pass two arguments failed... I'll make another ticket for that) - in my hurry I'm not sure if I enabled regionrefresh first, or encoding, but you'll see a few long seconds in the server logs with one enabled only, before I managed to enable the other.

I then resized the chromium window (smaller, larger), but then tried to get a screenshot... which means more logs than were probably strictly necessary. Oops.

In my hurry I also managed to blow the xpra info at the time, but I repro'd again without logs running and grabbed a new xpra info (window sizes might be a little different, but otherwise the info should be good).

Just wanted to give as much info as possible, so you'll be able to ignore as much superfluous logs as possible.

Also, ran with --desktop-scaling=off, I'll attach logs and new screenshot (the repro done by maxmylyn was on a particularly low end client machine, wanted to be sure that wasn't the root cause, rather than just the reason it was so easy to repro)... and then I'll try again with scaling of 1.5 and 2, just to see if there are different results (I imagine there will be).

Tue, 03 Nov 2015 01:24:06 GMT - alas: attachment set

one more repro screenshot, XPRA_OPENGL_PAINT_BOX=1, most of screen encoded h264, but only link areas updating, according to chromium paint boxes

Tue, 03 Nov 2015 01:38:13 GMT - alas:

Interesting, even with --desktop-scaling=1.5, once I get this window to window[4].size=(1842, 952), I am able to make it blurry. Of course, with scaling at 1.5, this window is enormous on a 4K monitor.

Likewise, with the default scaling (which still seems to be 2 x 2 on a 4K), if I shrink, then stretch the window back to window[4].size=(1856, 977), I'm able to induce blurriness... though, that's pretty much fullscreen/maximized on a 4K.

Any other debug flags worth trying?

Sat, 05 Dec 2015 11:17:32 GMT - Antoine Martin: priority changed

@afarr: please re-assign ticket to me if you want me to take a look.

The log data is very very large, but it looks to me like we're doing the right thing, there are lots of samples that look like this:

damage(WindowModel(0xc00001), 5, 101, 1927, 34, {})
damage(5, 101, 1927, 34, {}) wid=3, scheduling batching expiry for sequence 1570 in 50.0 ms
damage(WindowModel(0xc00001), 5, 135, 1927, 34, {})
damage(5, 135, 1927, 34, {}) wid=3, using existing delayed h264 regions created 0.0ms ago
(... edited out, repeating all the way down the window, up to:)
damage(WindowModel(0xc00001), 5, 1019, 1927, 34, {})
damage(5, 1019, 1927, 34, {}) wid=3, using existing delayed h264 regions created 0.0ms ago
damage(WindowModel(0xc00001), 5, 1053, 1927, 13, {})
damage(5, 1053, 1927, 13, {}) wid=3, using existing delayed h264 regions created 0.0ms ago

So this is repainting most of the window in horizontal bands 34 pixels high and 1927 pixels wide, from Y coordinate 101 down to 1066. Sometimes the chunks are slightly bigger (40 or more pixels). And this is happening over 10 times per second.
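To summarise bands like these without reading the log by eye, the damage lines can be parsed and reduced to their vertical extent. This is a minimal sketch over the four `damage(WindowModel(...))` lines visible in the excerpt (the elided middle is not reconstructed); the regular expression is an assumption based on that visible format only.

```python
import re

# Lines copied from the server log excerpt above (the elided middle is omitted)
log = """\
damage(WindowModel(0xc00001), 5, 101, 1927, 34, {})
damage(WindowModel(0xc00001), 5, 135, 1927, 34, {})
damage(WindowModel(0xc00001), 5, 1019, 1927, 34, {})
damage(WindowModel(0xc00001), 5, 1053, 1927, 13, {})
"""

# Matches only the "damage(WindowModel(...), x, y, w, h, ...)" form
DAMAGE_RE = re.compile(r"damage\(WindowModel\(\w+\), (\d+), (\d+), (\d+), (\d+),")

bands = [tuple(map(int, m.groups())) for m in DAMAGE_RE.finditer(log)]
top = min(y for _x, y, _w, _h in bands)
bottom = max(y + h for _x, y, _w, h in bands)
widths = {w for _x, _y, w, _h in bands}

print(f"{len(bands)} bands, Y from {top} to {bottom}, widths {widths}")
```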

So the heuristics then decide that we are dealing with video content and send it as such, using h264 (and rgb24 for the 1 pixel edges, if any):

process_damage_regions: wid=3, adding pixel data to encode queue (1x965 - rgb24), elapsed time: 47.2 ms, request time: 2.2 ms
process_damage_regions: wid=3, adding pixel data to encode queue (1932x1 - rgb24), elapsed time: 47.7 ms, request time: 0.1 ms
process_damage_regions: wid=3, adding pixel data to encode queue (1926x964 - h264), elapsed time: 48.3 ms, request time: 0.4 ms

The h264 encoder seems to settle for a quality setting between 60% and 70%, which is normal:

video_encode encoder: h264 1920x908 result is 909 bytes (128.1 MPixels/s), \
  client options={'pts': 9886, 'frame': 180L, 'csc': 'YUV420P', 'type': 'P', 'quality': 64, 'speed': 86}

And the video pipeline also uses the YUV420P colourspace conversion mode, which will cause some blurring already for areas that aren't just black and white.
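To illustrate why YUV420P softens coloured edges: the format stores each chroma plane at half resolution in both directions, so a sharp colour transition that does not fall on a 2x2 block boundary gets averaged. The sketch below is a deliberately simplified model (plain box averaging and nearest-neighbour upsampling); real encoders and decoders use their own filters, and the function names are made up for this example.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (even dimensions assumed)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]


def upsample_nearest(plane):
    """Expand each chroma sample back to a 2x2 block (nearest neighbour)."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# A chroma plane with a sharp colour edge between columns 0 and 1
chroma = [[0, 255, 255, 255] for _ in range(4)]

restored = upsample_nearest(subsample_420(chroma))
print("original row:", chroma[0])
print("restored row:", restored[0])
```

The edge that sat between two pixels in the original ends up smeared across a two-pixel block after the round trip, which is the kind of softening coloured text shows under 4:2:0.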

All in all, I don't see much we can do here: I think we are dealing with these pixel storms caused by the browser's rendering engine as well as we can.

Raising because all 0.16.x tickets should be dealt with before the release.

Wed, 09 Dec 2015 00:45:34 GMT - alas: status changed; resolution set

We've been doing some testing, experimenting with the min-quality settings especially (also min-speed, though that seems to be having less impact).

We've noticed that raising min-quality from the default of 30 to 60 seems to help a lot, with relatively little impact on client responsiveness (unless latency misbehaves)... and that further raising it to 80 seems to largely resolve blurriness issues for even the most awful widget-packed sites (though that setting seems to be safe to use only on a LAN; VPNs or internet inconsistencies seem to easily lead to noticeably degraded responsiveness).

I think you're right, there's not much else to do until someone, somewhere, decides to do the work to make it easier to isolate video regions so that that information can be passed to the heuristics.

I'll go ahead and close this (and if I discover someone has done that work, I'll open an enhancement request ticket to add a heuristics-helper).

Wed, 09 Dec 2015 02:26:44 GMT - Antoine Martin: status changed; resolution deleted

As I mentioned before, increasing the min-quality to workaround browser page rendering issues is very wasteful. Don't do that.

This will end up compressing video using YUV444 instead of YUV420, which will use roughly twice as much bandwidth and twice as much CPU, halving user density.
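The "roughly twice" figure matches the raw sample counts of the two formats: YUV444 keeps both chroma planes at full resolution, while YUV420 halves them in each direction. The sketch below only counts raw samples per frame; actual compressed bandwidth and CPU cost depend on the encoder, so treat the ratio as an approximation. The function name is illustrative, not an xpra API.

```python
def samples_per_frame(width, height, chroma_subsampling):
    """Raw sample count for one frame: full-resolution luma plus two chroma planes."""
    luma = width * height
    if chroma_subsampling == "YUV444":
        chroma = 2 * width * height                  # chroma at full resolution
    elif chroma_subsampling == "YUV420":
        chroma = 2 * (width // 2) * (height // 2)    # chroma halved in both axes
    else:
        raise ValueError(f"unsupported subsampling: {chroma_subsampling}")
    return luma + chroma


# Frame size taken from the video_encode log line earlier in this ticket
width, height = 1920, 908
ratio = (samples_per_frame(width, height, "YUV444")
         / samples_per_frame(width, height, "YUV420"))
print(f"YUV444 carries {ratio:.1f}x the raw samples of YUV420")
```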

Apart from hints that can help us identify video regions, there are other things that may help:

And one last idea: if there is nothing (or almost nothing) changing on screen, then we should be able to see a very high compression ratio on those frames. This could be used as a hint to make us raise the quality automatically.
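A minimal sketch of what such a hint could look like, using the frame from the video_encode log line earlier in this ticket (1920x908, 909 bytes, quality 64). Everything here is hypothetical: the function names, the threshold of 1000:1 and the step of 10 are invented for illustration and do not describe how xpra does or would implement this.

```python
def compression_ratio(width, height, compressed_bytes, bytes_per_pixel=1.5):
    """Raw frame size divided by encoded size (1.5 bytes/pixel for 8-bit YUV420P)."""
    return (width * height * bytes_per_pixel) / compressed_bytes


def suggest_quality(current_quality, ratio, threshold=1000, step=10, max_quality=100):
    """Hypothetical rule: nudge quality up when frames compress extremely well."""
    if ratio >= threshold:
        return min(max_quality, current_quality + step)
    return current_quality


ratio = compression_ratio(1920, 908, 909)
print(f"compression ratio: {ratio:.0f}:1")
print("suggested quality:", suggest_quality(64, ratio))
```

A frame that shrinks to a few hundred bytes suggests almost nothing changed on screen, so spending a little more on quality would be cheap; a real implementation would need to look at more than a single frame.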

Wed, 24 Feb 2016 09:56:32 GMT - Antoine Martin:

See also: #1135.

Fri, 08 Apr 2016 00:23:23 GMT - alas: status changed; resolution set

Well, we've been working on the browser to detect video regions. We can currently detect html5 and flash video regions, and the behavior seems better in those cases, but when we hit pages with a lot of widgets or sites that seem to trigger constant updates for no apparent reason, especially when there's no region of video to communicate to the server explicitly, we still seem to run into the issue.

Before I can test the behavior with any new xpra updates though, we'll need to extract our browser to make it portable enough to try with a more flexible server/client environment.

At this point, I'm inclined to close this ticket as having been an investigation without that video region detection, and to open a new ticket with some better & more relevant details once we've finished up that work to make a browser with video region detection more portable.

I suspect the idea of raising the threshold for video detection, perhaps with a flag or environment variable, when using a browser (or other application) which can be relied on to detect actual video regions would be the next step.

Once I can actually play with the encodings or other code updates though, we'll see what happens.

Closing this for now though.

Thu, 21 Jul 2016 10:36:20 GMT - Antoine Martin: status changed; resolution deleted

Seems this is not fixed, see ticket:800#comment:17 and #1265.

Note: r13056 now also allows the whole window to be a "video region" so we can apply the video settings to full screen windows.

This should not have been closed twice without a proper resolution, this is now a serious blocker for other features: #800, #1257, #1232.

Mon, 25 Jul 2016 18:30:53 GMT - J. Max Mena:

Using a Fedora 23 trunk r13086 built from source server and client:

As per #1257 comment:5, this is a bug, and not expected behavior.

Mon, 25 Jul 2016 18:43:17 GMT - J. Max Mena:

Interestingly, just running htop in an xterm, I see it painting with h264 when it updates, and with png between updates.

Tue, 26 Jul 2016 08:38:28 GMT - Antoine Martin: owner, status changed

As per ticket:1257#comment:6 : typing quickly into a text box (like Trac tickets...) does trigger an h264 region.

I see that it's sometimes painting part of the window with h264 (I usually see this when watching a video in YouTube and switching tabs)

That's a slightly different issue, and one that is much harder to fix: the video region code is fairly expensive to run, so we run it no more than every second. Also, we try to stick to a video region when we found one, to prevent video context thrashing which is also expensive.

When it paints the whole window with h264, I don't get the delayed region (partial painting followed by the rest of the window painting), but I do get a notable blurriness

Did it use a video region for the whole window? If so, let's figure out why it did so we can tweak the code to avoid doing so, see comment:9 : how many repaint events are we processing when that happens? Maybe also "-d compress" of when that happens.

(PS: your comment link in comment:16 looks like it points to the wrong comment)

Tue, 26 Jul 2016 17:54:23 GMT - J. Max Mena:

Upped server and client to r13086:

Found an interesting corner case, and I'll attach a screenshot. On some sites that have lots of widgets, gifs, etc etc, it can trick the heuristics into painting the whole window with h264. Using google-chrome --show-paint-rects helps show what's happening.

What's also interesting, is that after you get it into this state, switching tabs to something less egregious (like Trac), and interacting even a little bit with the site will cause the whole window to paint with h264. Once in this state, I'm not entirely sure how to get out of it. I'll attach a screen shot of me clicking into this text field to show what I mean.

Tue, 26 Jul 2016 18:03:10 GMT - J. Max Mena: attachment set

Note that the inline GIF and the giant sparkly SWAG (so annoying, but relevant) are the only things on the page updating, yet the whole page is being painted as H264

Tue, 26 Jul 2016 18:03:41 GMT - J. Max Mena: attachment set

notice that only the cursor is updating and the whole page is being painted as h264.

Tue, 26 Jul 2016 20:12:32 GMT - J. Max Mena:

re comment:17:

Did it use a video region for the whole window?

Yes it was using a video region for the whole window.

I'll spend some time to see if I can repro that easily and get relevant logs.

Fri, 29 Jul 2016 20:19:05 GMT - J. Max Mena: owner changed

Somehow this ticket slipped through the cracks yesterday while I was testing for something similar. Either way, I found a solid repro in Chrome.

Using a trunk Fedora 23 r13131 client and server, started with:

xpra start :13 --bind-tcp= --start-new-commands=yes --start-child=google-chrome (or just an xterm to launch Chrome)

and connected with:

XPRA_OPENGL_PAINT_BOX=1 xpra attach tcp:ip:port

Upon doing so, interacting with tiny elements causes the whole window to be repainted with h264. In doing so, I notice that it tends to paint the whole window with h264, and then when it refreshes with a picture encoding, it seems to have missed a frame or so, so the window appears to jump slightly. You'll also get this behavior when it starts painting text fields with h264, causing the cursor to jump periodically, and text to appear and disappear. Needless to say, it makes typing difficult.

Of note:

If you try to repro on this page, interacting with the text field for comments after scrolling is an easy way to trigger.

I will attach the requested -d compress log.

In the log:

Fri, 29 Jul 2016 20:19:32 GMT - J. Max Mena: attachment set

requested -d compress log

Thu, 25 Aug 2016 14:53:40 GMT - Antoine Martin: owner changed

The problem described in comment:20 doesn't sound like blurriness but more like an h264 b-frame (#800) or auto-refresh issue. (which should go in a different ticket) Please try disabling b-frames to confirm.

The changes in #1232 should now make it much harder to trigger these paint issues, and may just hide the underlying problem, in which case you may have to use XPRA_SCROLL_ENCODING=0 to disable scrolling detection.

Mon, 29 Aug 2016 22:18:07 GMT - J. Max Mena:

The issue listed in comment:20 is much much harder to trigger, even with XPRA_SCROLL_ENCODING=0. I'll keep hunting for it, but I'm fairly confident we've banged out that corner case. Maybe.

Sat, 03 Sep 2016 00:11:58 GMT - alas:

The strange small square of video region detected (blue h264 box) seems to be mostly reliably reproducible using youtube.com.

The exception is when an ad loads. Ironically enough, in that case the correct region is detected.

Still testing with 1.0 r13520 win32 client and fedora 23 server, I grabbed the following logs of before and during.

[jimador@jimador]$ xpra info :13 | grep region
window.2.video_subregion.in-out=(12992756, 4869391)
window.2.video_subregion.rectangle=(60, 140, 1280, 720)
2016-09-02 16:56:26,048 scoreinout(rectangle[60, 140, 1280, 720], 2411642, 394834) inregion=0.859, inwindow=0.638, ratio=1.346, sizeboost=1.638
2016-09-02 16:56:26,049 testing      current video region      rectangle[60, 140, 1280, 720]:  85% in,  14% out,  63% of window, score=170
2016-09-02 16:56:26,049 keeping existing video region rectangle[60, 140, 1280, 720] with score 170

And, after clicking on the top left tile.

[jimador@jimador]$ xpra info :13 | grep region
window.2.video_subregion.in-out=(628800, 5319450)
window.2.video_subregion.rectangle=(92, 166, 301, 204)
window.2.video_subregion.refresh_region[0]=(92, 166, 300, 204)
2016-09-02 16:57:29,857 scoreinout(rectangle[92, 166, 301, 204], 1596504, 22960296) inregion=0.065, inwindow=0.043, ratio=1.529, sizeboost=1.043
2016-09-02 16:57:29,857 testing      current video region       rectangle[92, 166, 301, 204]:   6% in,  93% out,   4% of window, score=160
2016-09-02 16:57:29,857 keeping existing video region rectangle[92, 166, 301, 204] with score 160
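The two numbers after the rectangle in each scoreinout line appear to be pixel counts damaged inside and outside the region (an inference from the `in-out` key in xpra info and from the logged `inregion` values, not something taken from the xpra source). Under that assumption, the in-region fraction can be recomputed from the log lines as a sanity check:

```python
import re

# The two scoreinout lines copied from the logs above (timestamps dropped)
lines = [
    "scoreinout(rectangle[60, 140, 1280, 720], 2411642, 394834) inregion=0.859",
    "scoreinout(rectangle[92, 166, 301, 204], 1596504, 22960296) inregion=0.065",
]

SCORE_RE = re.compile(
    r"scoreinout\(rectangle\[([\d, ]+)\], (\d+), (\d+)\) inregion=([\d.]+)"
)

results = []
for line in lines:
    match = SCORE_RE.search(line)
    rect = tuple(int(value) for value in match.group(1).split(","))
    pixels_in, pixels_out = int(match.group(2)), int(match.group(3))
    in_fraction = pixels_in / (pixels_in + pixels_out)
    # Keep the logged value next to the recomputed one for comparison
    results.append((rect, in_fraction, float(match.group(4))))
    print(f"{rect}: {in_fraction:.1%} of damaged pixels inside the region")
```

The second line is the interesting one: the small region is kept even though the large majority of damaged pixels fall outside it.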

I'll attach a couple of screen shots as well.

Sat, 03 Sep 2016 00:14:18 GMT - alas: attachment set

youtube tile page (before choosing top left tile to play)

Sat, 03 Sep 2016 00:15:16 GMT - alas: attachment set

video region detected while video plays corresponding to region where the tile was on the previous page

Sat, 03 Sep 2016 00:17:55 GMT - alas:

Youtube page before selecting one of the tiles (hint, gonna select top left):

youtube tile page (before choosing top left tile to play)

Youtube playing with video region detected corresponding to the tile location of link:

video region detected while video plays corresponding to region where the tile was on the previous page

Tue, 06 Sep 2016 15:50:36 GMT - Antoine Martin:

r13587 tweaks the video region detection code to avoid skewing the results so much for small regions. (it may still need tweaking some more, but at least small regions should no longer win so easily)

Thu, 08 Sep 2016 17:33:03 GMT - Antoine Martin:

More scoring tweaks in r13619 (see commit message). It may still take a little while for things to settle down on the correct region but firefox no longer makes the video region bigger than necessary (including the address bar in the video area..), at least not permanently...

FYI: when things go wrong, please capture ONE section like this one:

scoreinout(1171, 867, rectangle(45, 131, 640, 360), 28800000, 0) inregion=100%, inwindow=22%, ratio=4.4, score=157
testing      current video region       rectangle(45, 131, 640, 360): 100% in,   0% out,  22% of window, damaged ratio=1.00, score=157
identify video: most=100% damage count={R(45, 131, 640, 360): MutableInteger(125)}
scoreinout(1171, 867, rectangle(45, 131, 640, 360), 28800000, 0) inregion=100%, inwindow=22%, ratio=4.4, score=157
testing most-damaged video region       rectangle(45, 131, 640, 360): 100% in,   0% out,  22% of window, damaged ratio=1.00, score=157
identify video: score most damaged area rectangle(45, 131, 640, 360)=157.0%
setting new region rectangle(45, 131, 640, 360): 100% of large damage requests, score=157.0
scoreinout(1171, 867, rectangle(45, 131, 640, 360), 28800000, 0) inregion=100%, inwindow=22%, ratio=4.4, score=157
score((28800000, 0))=157, damaged=100%

Fri, 09 Sep 2016 12:30:40 GMT - Antoine Martin:

As of r13639, we also keep the latest scores in xpra info, ie: with youtube:

window.1.video_subregion.scores.rectangle(125, 71, 1046, 796)=102
window.1.video_subregion.scores.rectangle(125, 131, 480, 360)=105
window.1.video_subregion.scores.rectangle(125, 131, 929, 360)=107
window.1.video_subregion.scores.rectangle(125, 131, 970, 369)=107

(failed to identify the 480x360 video at that point because of other screen updates)
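When comparing candidates from a saved xpra info dump, it can help to sort the score entries. The sketch below parses only the four lines shown above; the regular expression is an assumption based on that visible format and may need adjusting for other xpra versions.

```python
import re

# Score lines copied from the xpra info output above
info = """\
window.1.video_subregion.scores.rectangle(125, 71, 1046, 796)=102
window.1.video_subregion.scores.rectangle(125, 131, 480, 360)=105
window.1.video_subregion.scores.rectangle(125, 131, 929, 360)=107
window.1.video_subregion.scores.rectangle(125, 131, 970, 369)=107
"""

SCORE_LINE = re.compile(r"scores\.rectangle\((\d+), (\d+), (\d+), (\d+)\)=(\d+)")

# Map each (x, y, w, h) rectangle to its score
scores = {
    tuple(int(value) for value in m.groups()[:4]): int(m.group(5))
    for m in SCORE_LINE.finditer(info)
}

for rect, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{rect[2]}x{rect[3]} at ({rect[0]}, {rect[1]}): score {score}")
```

In this sample the 480x360 rectangle scores 105, just below the two wider candidates at 107, which matches the remark that the actual video was not selected at that point.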

Mon, 26 Dec 2016 09:16:09 GMT - Antoine Martin: status changed; resolution set

Not heard back, closing.

Sat, 23 Jan 2021 05:11:08 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/967