
Opened 4 years ago

Closed 2 years ago

#410 closed enhancement (fixed)

better sub-window encoding: detect regions and use sub video encoder

Reported by: Antoine Martin
Owned by: alas
Priority: critical
Milestone: 0.13
Component: core
Version:
Keywords:
Cc:

Description (last modified by Antoine Martin)

If you have a large window but only a fraction of that window changes, we waste a lot of resources.
We currently create a video encoder for the whole window and waste a fair bit of time capturing, csc-ing, encoding, (sending), decoding and displaying regions of the screen that have not changed at all (and we know they haven't).

We have the "small region" code, which deals with small-ish regions by sending them using another encoding (usually rgb or png/jpeg) but it has its limits: it does not use a video encoder and is limited in size.

We should keep track of the damage areas in more detail, including their location and size (easy). Then we can detect regions of the screen that update often (easy-ish) and create a sub video encoder just for those (harder).
This could even work when the region is moving, as long as its size stays the same.
This is negated somewhat by the fact that when we send the whole window (because of other updates), it will include this region - unless we purposely blank it out before compression and ask the client to re-assemble the two before displaying the result (hard).

An even better solution to this would be #509 or #510, as this would give us the video area precisely every time, even before it gets converted to RGB pixels.
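
As a rough illustration of the first step described above (tracking damage areas by location and size, then spotting the one that updates often), here is a minimal Python sketch. The `DamageTracker` class, its method names and the `min_share` threshold are made up for this example and are not xpra's actual code; the rectangles are hypothetical.

```python
from collections import Counter


class DamageTracker:
    """Counts damage events per rectangle (x, y, w, h) for one window.
    Hypothetical helper, for illustration only."""

    def __init__(self):
        self.counts = Counter()

    def add_damage(self, x, y, w, h):
        # record one damage event for this exact rectangle
        self.counts[(x, y, w, h)] += 1

    def busiest_region(self, min_share=0.5):
        """Return the rectangle that received at least `min_share` of all
        damage events, or None if no single rectangle dominates."""
        total = sum(self.counts.values())
        if total == 0:
            return None
        rect, count = self.counts.most_common(1)[0]
        return rect if count / total >= min_share else None


tracker = DamageTracker()
for _ in range(30):
    tracker.add_damage(100, 80, 640, 360)   # hypothetical video area
tracker.add_damage(0, 0, 213, 26)           # occasional small update elsewhere
print(tracker.busiest_region())
```

Exact-match counting like this only works if the application repaints the video as a single rectangle, which is an assumption that does not always hold in practice.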

Attachments (26)

video-subregion-client.patch (4.9 KB) - added by Antoine Martin 3 years ago.
allows clients to process video sub-regions
video-subregion-server.patch (15.2 KB) - added by Antoine Martin 3 years ago.
server side patch: detect and use video sub-regions
video-subregion-v4.patch (14.2 KB) - added by Antoine Martin 3 years ago.
remaining server side bits, still needs a proper way to ensure non-video regions don't use the video encoding (which does a re-init and wastes everything)
video-subregion-v5.patch (18.3 KB) - added by Antoine Martin 3 years ago.
better patch if a little ugly: caller can provide method for deciding which encoding to use (so we can ensure we don't use video for anything but the current subregion)
video-subregion-v6.patch (25.4 KB) - added by Antoine Martin 3 years ago.
ensure we don't end up using the video encoder whilst we still have a video region
410refresh.txt (861.6 KB) - added by maxmylyn 3 years ago.
-d refresh from the server
410videobeingblurry.png (352.9 KB) - added by maxmylyn 3 years ago.
A screenshot depicting where the video was usually located while being watched in chrome.
410partialblurry.png (1.4 MB) - added by maxmylyn 3 years ago.
Depicting the bottom half of a video being h264 and the top being lossless.
410offsetrefresh.txt (375.0 KB) - added by maxmylyn 3 years ago.
-d refresh of this behavior. Interesting bit is at 12:33.
410subregion.txt (1.7 MB) - added by maxmylyn 3 years ago.
Booted up chrome, and navigated to a YouTube video (Switzerland from Above - Top Sights). 10:22 - 10:23 was the choppiest before I closed it.
410subregiontest2.txt (219.3 KB) - added by maxmylyn 3 years ago.
16:05:00ish video playing until 16:05:15ish at which point it's paused until 16:05:30ish where I hit play again.
410subregionr6857.txt (1.2 MB) - added by maxmylyn 3 years ago.
Similar test. Again, launching chrome, navigating to YouTube and watching a video.
410xprainfo.txt (103.6 KB) - added by maxmylyn 3 years ago.
Requested Xpra Info
410r6883regiondetect.txt (254.3 KB) - added by maxmylyn 3 years ago.
410centerblurr.png (1.3 MB) - added by maxmylyn 3 years ago.
Screenshot showing the placement of the video and the subsequent subregion detections.
410r6866regiondetect.txt (523.6 KB) - added by maxmylyn 3 years ago.
Started Xpra. Set quality to 0 and min-quality to 10 from another putty window. Connected and opened chrome, went to the video, and sat and watched for a minute and skipped ahead a bit and watched some more. Then disconnected and closed the server.
410r6870regiondetect.txt (138.2 KB) - added by maxmylyn 3 years ago.
Booted up chrome. Navigated to YouTube and played a video. Scrolled and the whole page became blurry. Paused the video and it stayed blurry. Switched tabs and it fixed it. Went back to YouTube and played the video on the large player for a bit then disconnected.
410r6873regionrefresh.txt (1.0 MB) - added by maxmylyn 3 years ago.
Connected and opened chrome. Navigated to video. Watched for 20ish seconds then skipped ahead. Watched for a bit more, then scrolled and it started rendering the whole thing blurry. Scrolled down after a minute and left the video half hidden while watching for a bit, then disconnected.
410blurry.png (1.7 MB) - added by maxmylyn 3 years ago.
Blurriness with r6896
410blurryrefresh.txt (58.4 KB) - added by maxmylyn 3 years ago.
full -d refresh; relevant time to look at: 15:54:30
410blurryencoding.txt (433.7 KB) - added by maxmylyn 3 years ago.
full -d encoding; relevant time: 16:05:50
410encodingrefresh.txt (925.2 KB) - added by maxmylyn 3 years ago.
Retest of the previous test. Everything after 09:39:40 is just sitting at Wikipedia. No scrolling or typing just moving the mouse over links.
410xprainfo.2.txt (103.6 KB) - added by maxmylyn 3 years ago.
Xpra info right after it snapped to blurry, but before it refreshed to higher quality. Full log-file can be provided if necessary. Luckily got it on the first try.
pycallgraph-xpra.png (678.2 KB) - added by Antoine Martin 3 years ago.
example output from profiling
410xprainfor7041.txt (87.6 KB) - added by maxmylyn 3 years ago.
xpra info while watching a youtube video
410xprainfor7041quality30.txt (87.6 KB) - added by maxmylyn 3 years ago.
an xpra info at the same time, but with quality manually set to 30 - no noticeable effect on video performance.

Change History (80)

comment:1 Changed 3 years ago by Antoine Martin

Description: modified
Status: new → assigned

Changed 3 years ago by Antoine Martin

allows clients to process video sub-regions

Changed 3 years ago by Antoine Martin

server side patch: detect and use video sub-regions

comment:2 Changed 3 years ago by Antoine Martin

Lots of preparatory work in:

With the 2 patches attached, it seems to work. Including with the new test app tests.xpra.test_apps.test_videoregions (as long as the server is started with XPRA_FORCE_BATCH=1)

Still TODO:

  • add some tests for the region subtraction code (looks right - but too hard to tell for sure by manual inspection)
  • ensure we don't use the window dimensions anywhere in the encoder/csc pipeline creation or scoring methods (looks clean!)
  • test with scaling
  • test with proxy encoding
  • benchmark and profile it

Maybes:

  • take fps into account: discard old damage events?
  • pass subregion down to encoders so we can blank it out if we've sent it as video already?
  • somehow pass an event to the client telling it that we've discarded the video encoder (rather than wait for the window to close, or a new encoding stream to take its place)
  • support multiple regions (hard)
Last edited 3 years ago by Antoine Martin

comment:3 Changed 3 years ago by Antoine Martin

Merged the uncontroversial bits in:

  • r5429: client side
  • r5430: region handling code and tests
  • r5431: proxy encoder support

The tricky thing about the server side is that applications do stupid repaints... ie: firefox will repaint things on the right hand side of the flash player (even though they don't change..), so we have to be more clever than planned and introduce more heuristics, and region matching code. Work in progress.

Changed 3 years ago by Antoine Martin

Attachment: video-subregion-v4.patch added

remaining server side bits, still needs a proper way to ensure non-video regions don't use the video encoding (which does a re-init and wastes everything)

Changed 3 years ago by Antoine Martin

Attachment: video-subregion-v5.patch added

better patch if a little ugly: caller can provide method for deciding which encoding to use (so we can ensure we don't use video for anything but the current subregion)

Changed 3 years ago by Antoine Martin

Attachment: video-subregion-v6.patch added

ensure we don't end up using the video encoder whilst we still have a video region

comment:4 Changed 3 years ago by Antoine Martin

Big commit in r5437 with lots of details, small fix in r5441.

Bigger problem with nvenc in #517 blocks further work on this one.

comment:5 Changed 3 years ago by Antoine Martin

Milestone: future → 0.13
Owner: changed from Antoine Martin to alas
Status: assigned → new

Was actually enabled in 0.12.x and has been working OK so far.

afarr / smo: worth knowing about / testing as part of #419.

comment:6 Changed 3 years ago by Antoine Martin

Follow up work in #596, should have been closed by now: done 4 months ago..

Last edited 3 years ago by Antoine Martin

comment:7 Changed 3 years ago by maxmylyn

Did some testing with some quality changes from ticket:596#comment:2 :
Tested with a r6853 Win7 64-bit client against a r6853 Fedora 20 server (with a -d refresh piped into a .txt):

  • Opened up a YouTube video
  • Started a video
    • The whole webpage renders in a low quality.

After changing the min-quality setting to a higher number, the whole webpage starts rendering much nicer. Lowering min-quality causes the whole webpage to start looking choppy. When the video is paused, the webpage starts to render clearly.

  • Ran another connection(no logs), this time just leaving it on Wikipedia.
    • Hovering over links(causing them to become underlined, with no other webpage changes) caused the whole webpage to become blurry.

It looks like partial refreshes aren't working on the 14.0 r6853 Windows client for sure. Our OSX build currently doesn't have h.264, so it switched over to PNG, which only allowed for lossless rendering. When Windows was forced onto PNG, it behaved identically; drawing only lossless. However, the video played at a lower framerate due to Xpra not being able to keep up.

That being said, I tested with our CentOS client, which had h.264, and I got the exact same behavior. Partial webpage changes are causing a full-screen refresh. Sometimes with quality extremely low, making the window become very blurry.

In addition, I'll leave a comment on #596 as well.

Changed 3 years ago by maxmylyn

Attachment: 410refresh.txt added

-d refresh from the server

comment:8 Changed 3 years ago by alas

As mentioned in #596, the above behavior shows with chrome, but with firefox (& mostly on lazarus) the refreshes seem to be behaving as expected.

comment:9 Changed 3 years ago by Antoine Martin

Resolution: fixed
Status: new → closed

This ticket is about xpra detecting that there is video on screen, and where it is. Judging by the log samples in #596, it does that.
Another way of verifying this is with:

xpra info | grep "\.video_subregion="


.. caused the whole webpage to become blurry
It looks like partial refreshes aren't working on the 14.0 r6853 Windows client for sure


Not necessarily. Just because you don't see anything refreshing visually, does not mean that the browser isn't repainting that part of the screen (often unnecessarily). This belongs in #596 anyway. Closing.

comment:10 Changed 3 years ago by Antoine Martin

Resolution: fixed
Status: closed → reopened

Actually re-opening this bug: the issue from ticket:596#comment:4 belongs here instead: not detecting the video region with chrome.

Extracts:

auto refresh: h264 screen update (quality= 37), keeping existing timer \
    (region=rectangle[0, 0, 1450, 894], refresh regions=[R[0, 0, 1450, 894]])

The video region is not detected. All I see is many like this one:

auto refresh: jpeg screen update (quality= 34), scheduling refresh \
    (region=rectangle[0, 0, 213, 26], refresh regions=[R[0, 0, 213, 26]])

Where was the video showing in the window? (213x26 at 0x0 looks like the address bar)
Please post the server's -d subregion log output of when the video is playing and not being detected. Also: which page can allow me to reproduce? Which screen settings, etc.

Last edited 3 years ago by Antoine Martin

comment:11 Changed 3 years ago by Antoine Martin

Owner: changed from alas to maxmylyn
Status: reopened → new

comment:12 Changed 3 years ago by maxmylyn

I'll attach a screenshot that I took earlier

Changed 3 years ago by maxmylyn

Attachment: 410videobeingblurry.png added

A screenshot depicting where the video was usually located while being watched in chrome.

comment:13 Changed 3 years ago by Antoine Martin

Please post more information: see comment:10, comment:9, include -d refresh, -d compress, -d damage, -d encoding. And maybe all of them together -d compress,damage,encoding.

comment:14 Changed 3 years ago by maxmylyn

http://www.youtube.com/watch?v=f488uJAQgmw Here is the video. (a little crude humor, but I find it hilarious)

These were watched running full-screen on a 1080p monitor. The window wasn't entirely full-screen. I moved it to the top left and manually dragged to almost full-screen. Basically 1900-1916 x 1040-1075 pixels. All running in chrome - whatever the latest version is.

I'll get you some logs on Monday. (Unless I can get Xpra running at home - a task I'll get to at some point...)

comment:15 Changed 3 years ago by Antoine Martin

Owner: changed from maxmylyn to Antoine Martin
Status: new → assigned

Well, that's interesting, I didn't see it before but now I do (even on a video which didn't have the problems when I had tried it earlier - wtf?).

That's because instead of updating the whole video region as one, chrome sends it in 4 smaller chunks every time:

damage(WindowModel(0x1200001 - ".. - YouTube - Chromium"), 5, 161, 640, 102, {})
damage(WindowModel(0x1200001 - ".. - YouTube - Chromium"), 5, 263, 640, 102, {})
damage(WindowModel(0x1200001 - ".. - YouTube - Chromium"), 5, 365, 640, 102, {})
damage(WindowModel(0x1200001 - ".. - YouTube - Chromium"), 5, 467, 640, 54, {})

With another video (same size), I got a different split:

damage(WindowModel(0x1200001 - ".. - YouTube - Chromium"), 133, 161, 480, 136, {})
damage(WindowModel(0x1200001 - ".. - YouTube - Chromium"), 133, 297, 480, 136, {})
damage(WindowModel(0x1200001 - ".. - YouTube - Chromium"), 133, 433, 480, 88, {})

I have no idea why it does that, but it probably throws the video region detection code off. The code does try to merge regions, but when doing so it increases the threshold (to prevent just merging everything - which isn't helpful) - and it looks like that's enough to fail the detection.
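
To make the merging idea concrete, here is a small Python sketch that joins damage rectangles sharing the same x offset and width when they sit directly on top of each other. The function name `merge_vertical_strips` is invented for this example and this is not the detection code from xpra; the input rectangles are the ones from the two damage logs above.

```python
def merge_vertical_strips(rects):
    """Merge rectangles (x, y, w, h) that share x and width and are
    stacked directly on top of each other (no gap, no overlap)."""
    merged = []
    # sort so that strips of the same column are visited top to bottom
    for x, y, w, h in sorted(rects, key=lambda r: (r[0], r[2], r[1])):
        if merged:
            mx, my, mw, mh = merged[-1]
            if (mx, mw) == (x, w) and my + mh == y:
                # this strip starts exactly where the previous one ends
                merged[-1] = (mx, my, mw, mh + h)
                continue
        merged.append((x, y, w, h))
    return merged


first_split = [(5, 161, 640, 102), (5, 263, 640, 102),
               (5, 365, 640, 102), (5, 467, 640, 54)]
second_split = [(133, 161, 480, 136), (133, 297, 480, 136),
                (133, 433, 480, 88)]
print(merge_vertical_strips(first_split))   # [(5, 161, 640, 360)]
print(merge_vertical_strips(second_split))  # [(133, 161, 480, 360)]
```

Both splits collapse into a single 360 pixel high rectangle, which is presumably the actual video area.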

comment:16 Changed 3 years ago by Antoine Martin

Owner: changed from Antoine Martin to maxmylyn
Status: assigned → new

Fixes in r6855 + r6856.

Can you still break it?

comment:17 Changed 3 years ago by maxmylyn

Yes still breakable with chrome. Running a Windows 7 64-bit client r6856 against a Fedora 20 r6856 server. The server detects the video region and properly starts sending a chunk of the screen as h.264, but it's slightly off from the actual video. Currently the top half of the video is coming in as lossless rgb, but the bottom half and some of the page below it is coming in as h.264.

I'll attach a screenshot and a -d refresh.

The interesting part starts at 12:33:01. From there you can see it sending h264 along with jpegs at the same time rather than a solid stream of h264 with the occasional jpeg full-refresh (unless this behavior is expected?).

Last edited 3 years ago by maxmylyn

Changed 3 years ago by maxmylyn

Attachment: 410partialblurry.png added

Depicting the bottom half of a video being h264 and the top being lossless.

Changed 3 years ago by maxmylyn

Attachment: 410offsetrefresh.txt added

-d refresh of this behavior. Interesting bit is at 12:33.

comment:18 Changed 3 years ago by Antoine Martin

Damn, that's what the new heuristics in r6855 are supposed to detect (and it worked here - will try again to break it on my laptop): we try to merge regions of the same width to make one large video region.

How easily can you trigger this? Does it remain like that for very long?

Can you please post the server debug log with -d subregion when this happens? We need to see why it prefers the smaller region to the larger merged one.

Changed 3 years ago by maxmylyn

Attachment: 410subregion.txt added

Booted up chrome, and navigated to a YouTube video (Switzerland from Above - Top Sights). 10:22 - 10:23 was the choppiest before I closed it.

comment:19 Changed 3 years ago by maxmylyn

I attached some logs. Watched a bit of http://www.youtube.com/watch?v=Sfidg3430dc for a few minutes. Around the time-mark of 10:22 is when the video became really choppy and I could see it attempting to make a subregion, but it didn't quite stick. That video is the one I've found that really shows subregioning(is this a word?) really trying the hardest.

Of note:
This is temporarily reproducible with Firefox. The only difference is that with Firefox Xpra catches on to the video after a short period and merges the subregions and video playback smooths out.

comment:20 Changed 3 years ago by Antoine Martin

Extract from the logs:

2014-06-24 10:22:21,827 identify video: most=4% damage count=\
    {R(282, 121, 1562, 41): AtomicInteger(14),
     R(282, 285, 1562, 41): AtomicInteger(15),
     R(282, 326, 1562, 41): AtomicInteger(15),
     R(282, 490, 1562, 41): AtomicInteger(15),
     R(282, 244, 1562, 41): AtomicInteger(14),
     R(282, 162, 1562, 41): AtomicInteger(14),
     R(282, 203, 1562, 41): AtomicInteger(14),
     R(282, 613, 1562, 41): AtomicInteger(15),
     R(282, 900, 1562, 41): AtomicInteger(16),
     R(282, 818, 1562, 41): AtomicInteger(15),
     R(282, 572, 1562, 41): AtomicInteger(15),
     R(282, 736, 1562, 41): AtomicInteger(15),
     R(282, 531, 1562, 41): AtomicInteger(15),
     R(282, 695, 1562, 41): AtomicInteger(15),
     R(282, 449, 1562, 41): AtomicInteger(15),
     R(282, 367, 1562, 41): AtomicInteger(15),
     R(282, 654, 1562, 41): AtomicInteger(15),
     R(282, 982, 1562, 14): AtomicInteger(16),
     R(282, 777, 1562, 41): AtomicInteger(15),
     R(282, 941, 1562, 41): AtomicInteger(16),
     R(282, 859, 1562, 41): AtomicInteger(16),
     R(282, 408, 1562, 41): AtomicInteger(15)}
(...)
2014-06-24 10:22:23,771 identify video: most=5% damage count={\
     R(282, 121, 1562, 41): AtomicInteger(14),
     R(282, 285, 1562, 41): AtomicInteger(13),
     R(282, 326, 1562, 41): AtomicInteger(13),
     R(282, 490, 1562, 41): AtomicInteger(13),
     R(282, 244, 1562, 41): AtomicInteger(13),
     R(282, 162, 1562, 41): AtomicInteger(14),
     R(282, 203, 1562, 41): AtomicInteger(14),
     R(282, 613, 1562, 41): AtomicInteger(13),
     R(282, 900, 1562, 41): AtomicInteger(17),
     R(282, 818, 1562, 41): AtomicInteger(15),
     R(282, 572, 1562, 41): AtomicInteger(13),
     R(282, 736, 1562, 41): AtomicInteger(13),
     R(282, 531, 1562, 41): AtomicInteger(13),
     R(282, 695, 1562, 41): AtomicInteger(13),
     R(282, 449, 1562, 41): AtomicInteger(13),
     R(282, 367, 1562, 41): AtomicInteger(13),
     R(282, 654, 1562, 41): AtomicInteger(13),
     R(282, 982, 1562, 14): AtomicInteger(18),
     R(282, 777, 1562, 41): AtomicInteger(13),
     R(282, 941, 1562, 41): AtomicInteger(17),
     R(282, 859, 1562, 41): AtomicInteger(17),
     R(282, 408, 1562, 41): AtomicInteger(13)}
2014-06-24 10:22:23,786 identified video region by size (1562x41), using recent match: rectangle[282, 941, 1562, 41]

The code should have been able to identify the region from this.

Am I right in thinking that the video region was quite big, possibly covering more than 70% of the window size? And that the problem did not occur if the video region is smaller than this?

If so, r6857 should fix that.

comment:21 Changed 3 years ago by maxmylyn

Found something of interest. (I'll retest with r6857 in a bit when I get a build in my hands)

Re-ran a test running r6856 Win7 64-bit against a r6856 Fedora 20 server:

  • Booted up chrome
  • Navigated to YouTube
  • Played, watched -d subregion
  • Paused it
    • While the video was playing the output was flooded with "no video region, we may use the video encoder for something else"
    • While the video was paused the output was flooded with what looked like video detection code.

I re-ran it and piped it into a .txt.

  • 16:05:00ish - 16:05:15ish I had the video playing
  • 16:05:15ish I hit pause
  • 16:05:30ish I hit play again

It looks like it's sending
no video region, we may use the video encoder for something else

followed by

2014-06-24 16:05:30,259 identify video: most=18% damage count={R(0, 691, 1024, 44): AtomicInteger(13), R(56, 121, 640, 102): AtomicInteger(29), R(56, 427, 640, 54): AtomicInteger(25), R(56, 325, 640, 102): AtomicInteger(30), R(56, 223, 640, 102): AtomicInteger(30), R(0, 628, 1024, 63): AtomicInteger(13), R(0, 565, 1024, 63): AtomicInteger(13), R(0, 502, 1024, 63): AtomicInteger(13)}
2014-06-24 16:05:30,288 failed to identify a video region

every now and then while the video is playing, and various video detection prints when the video is paused. From what I'm seeing (this can be repro'd using a CentOS client, I'll check Fedora in a bit and update this comment) it's failing to detect the video while it's playing, but once it's paused it's seeing the video. This is on the http://www.youtube.com/watch?v=f488uJAQgmw kollektivet music video running with the 720p quality setting on YouTube with the "small" player.
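
For anyone digging through these logs, a short Python sketch along these lines can turn an "identify video" line into a dictionary of rectangles and damage counts, then total the counts per rectangle width. The helper names are made up for this example, and the regular expression only assumes the `R(x, y, w, h): AtomicInteger(n)` format visible in the line quoted above.

```python
import re
from collections import defaultdict

# matches entries like "R(56, 121, 640, 102): AtomicInteger(29)"
ENTRY = re.compile(r"R\((\d+), (\d+), (\d+), (\d+)\): AtomicInteger\((\d+)\)")


def parse_damage_counts(line):
    """Return {(x, y, w, h): count} for every entry found in the log line."""
    return {
        tuple(int(v) for v in m.groups()[:4]): int(m.group(5))
        for m in ENTRY.finditer(line)
    }


def counts_by_width(damage):
    """Sum the damage counts of all rectangles sharing the same width."""
    totals = defaultdict(int)
    for (_x, _y, w, _h), count in damage.items():
        totals[w] += count
    return dict(totals)


line = (
    "2014-06-24 16:05:30,259 identify video: most=18% damage count={"
    "R(0, 691, 1024, 44): AtomicInteger(13), R(56, 121, 640, 102): AtomicInteger(29), "
    "R(56, 427, 640, 54): AtomicInteger(25), R(56, 325, 640, 102): AtomicInteger(30), "
    "R(56, 223, 640, 102): AtomicInteger(30), R(0, 628, 1024, 63): AtomicInteger(13), "
    "R(0, 565, 1024, 63): AtomicInteger(13), R(0, 502, 1024, 63): AtomicInteger(13)}"
)
damage = parse_damage_counts(line)
print(counts_by_width(damage))  # {1024: 52, 640: 114}
```

Grouped this way, the 640 pixel wide strips clearly receive most of the damage events in this sample.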

Changed 3 years ago by maxmylyn

Attachment: 410subregiontest2.txt added

16:05:00ish video playing until 16:05:15ish at which point it's paused until 16:05:30ish where I hit play again.

comment:22 Changed 3 years ago by Antoine Martin

The "no video region, we may use the video encoder for something else" message is harmless: it means that it is sending a screen update following the no-video-region codepath, which may use the video encoder context for the whole window (if there are enough pixels to justify it) or simply use a non-video encoding.
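
A very simplified sketch of that decision, in Python, could look like the following. The function name, the string return values and the `min_ratio` threshold are all assumptions made for illustration; the real codepath uses more inputs than this. The pixel sizes in the usage lines are taken from the two "auto refresh" log lines quoted in comment:10.

```python
def pick_encoding(update_pixels, window_pixels, video_region_active, min_ratio=0.5):
    """Decide how to send a screen update that is not the video subregion.

    min_ratio is a made-up threshold: the share of the window that an
    update must cover before using the video encoder is considered worthwhile.
    """
    if video_region_active:
        # the video encoder is reserved for the detected subregion
        return "non-video"
    if update_pixels >= min_ratio * window_pixels:
        # no video region: enough pixels to justify the video encoder
        return "video"
    return "non-video"


window = 1450 * 894
print(pick_encoding(1450 * 894, window, video_region_active=False))  # video
print(pick_encoding(213 * 26, window, video_region_active=False))    # non-video
```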

The video region which is identified when the screen is paused looks wrong (too small), just like before. This should also be fixed in r6857.

comment:23 Changed 3 years ago by maxmylyn

Unfortunately it doesn't help. Re-ran with r6857 Win7 64-bit against r6857 Fedora 20 server; I'll attach a -d subregion log.

edit:

I'm seeing the same behavior as before. Only this time the entire window loses quality while it's hunting for the area of the screen where the video is playing. It seems to eventually settle on the lower half of the video and some more below it. Same behavior as the 410partialblurry.png screenshot.

Last edited 3 years ago by maxmylyn

Changed 3 years ago by maxmylyn

Attachment: 410subregionr6857.txt added

Similar test. Again, launching chrome, navigating to YouTube and watching a video.

comment:24 Changed 3 years ago by Antoine Martin

What size is your screen? What size is a window? (Attaching xpra info should answer both of these questions and much more)
I should be able to test with a 1080p screen later today if that helps me reproduce.

comment:25 Changed 3 years ago by maxmylyn

1080p monitor. Window size is about that big. It's reproducible with a fullscreen window with YouTube running in the small and large viewer.

Changed 3 years ago by maxmylyn

Attachment: 410xprainfo.txt added

Requested Xpra Info

comment:26 Changed 3 years ago by Antoine Martin

I see that the total client size is pretty big:

client.screen[0].size=(3520, 1196)

And the browser window is:

window[3].size=(1920, 1058)

So I should be able to reproduce this with a 1080p screen... but not having much luck so far. It would be good to figure out why, maybe it's batching / latency related.


The log file is full of:

identified horizontal video region: rectangle[313, 417, 1593, 468]

I guess that the video was not this region then? Did the screen still look like the screenshot from 2 days ago?
Can you estimate the location and size of the region which has the player on it?


I've improved the detection code further in r6862 (+fix in r6863), and we now have a dedicated logger for the code that detects video region: -d regiondetect. (which will make the log file much easier to parse)
Does this help?

Last edited 3 years ago by Antoine Martin

comment:27 Changed 3 years ago by maxmylyn

r6863 seems to have improved a little bit. It's still hunting but now it's not detecting other site elements like before. I got a -d regiondetect of the same test. This time I opened chrome, navigated to the video, paused, double checked quality settings quality=0 and min-quality=5 respectively (just to be clear on what it's doing) then hit play and watched it hunt for the video for a while. It ended up settling on choosing the center of the video with the top and bottom of the video not being detected. I'll attach a screenshot to demonstrate.

Last edited 3 years ago by Antoine Martin

Changed 3 years ago by maxmylyn

Attachment: 410r6883regiondetect.txt added

Changed 3 years ago by maxmylyn

Attachment: 410centerblurr.png added

Screenshot showing the placement of the video and the subsequent subregion detections.

comment:28 Changed 3 years ago by Antoine Martin

Thanks for the logs, I think r6865 should fix this problem - which I should have been able to reproduce myself, sorry about that.

Bigger regions always contain more updates within, so we have to discount them to avoid always going for the biggest possible region, and it looks like this was skewed in favour of smaller regions too much.
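
The trade-off can be shown with a toy scoring function in Python. This is not the formula used in xpra: the `score` function, the `discount` parameter and the 1024x768 window size are assumptions made for this example. The damage counts are the ones from the 16:05:30 log line in comment:21, and the "video" candidate is the 640x360 area those strips add up to.

```python
def contains(outer, inner):
    """True if rectangle `inner` (x, y, w, h) lies fully inside `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def damage_weight(rect, count):
    """Pixels damaged in total: rectangle area times number of events."""
    return rect[2] * rect[3] * count


def score(candidate, damage, window_size, discount):
    """Share of damaged pixels inside the candidate, reduced in proportion
    to how much of the window the candidate covers (hypothetical formula)."""
    total = sum(damage_weight(r, c) for r, c in damage.items())
    inside = sum(damage_weight(r, c) for r, c in damage.items()
                 if contains(candidate, r))
    area_fraction = candidate[2] * candidate[3] / (window_size[0] * window_size[1])
    return (inside / total) * (1 - discount * area_fraction)


def best_candidate(candidates, damage, window_size, discount):
    return max(candidates,
               key=lambda name: score(candidates[name], damage, window_size, discount))


damage = {
    (56, 121, 640, 102): 29, (56, 223, 640, 102): 30,
    (56, 325, 640, 102): 30, (56, 427, 640, 54): 25,
    (0, 502, 1024, 63): 13, (0, 565, 1024, 63): 13,
    (0, 628, 1024, 63): 13, (0, 691, 1024, 44): 13,
}
window_size = (1024, 768)  # assumed, the real window size is not in the log
candidates = {"video": (56, 121, 640, 360), "window": (0, 0, 1024, 768)}

print(best_candidate(candidates, damage, window_size, discount=0.0))  # window
print(best_candidate(candidates, damage, window_size, discount=0.5))  # video
```

Without any discount the whole window always wins since it contains every update; with a discount the tighter video area scores higher. How strong the discount should be is exactly the kind of tuning discussed in this comment.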

comment:29 Changed 3 years ago by maxmylyn

Re-ran with r6866 Win7 64-bit client against r6866 Fedora 20 server:

Video detect is doing some pretty weird stuff now. It looks like it's getting closer but it's still not refreshing the whole video region as h264. When min-quality is set to a very low number like 10, parts of the YouTube video become blurry with a decent framerate while other parts of the video stay clear with an abysmal frame-rate. It looks like it's 'hunting' for the correct region to refresh as video but it never seems to find what it wants. Unlike before, now the blurry sections move about the video at random. One second the bottom third may be blurry, and 3 seconds later a small rectangle in the right section of the video becomes blurry. It has no apparent pattern that I could discern from watching for several minutes. I'll attach a log of the video being played for about 90 seconds or so.

Changed 3 years ago by maxmylyn

Attachment: 410r6866regiondetect.txt added

Started Xpra. Set quality to 0 and min-quality to 10 from another putty window. Connected and opened chrome, went to the video, and sat and watched for a minute and skipped ahead a bit and watched some more. Then disconnected and closed the server.

comment:30 Changed 3 years ago by Antoine Martin

Many improvements in r6868 (and a fix in r6867 - should backport it).

Can you try again and post -d regiondetect? It should be much better at identifying the region and the log output should be more readable.

comment:31 Changed 3 years ago by Antoine Martin

Tweaked further in r6870 to avoid chopping the top or bottom off.
(long docstring to explain fairly obscure maths)

comment:32 Changed 3 years ago by maxmylyn

Retested with 0.14 r6870 Win7 64-bit against a 0.14 r6870 Fedora 20 server:

  • I am now seeing a regression on the whole page becoming blurry after scrolling(which is in the logfile)
  • r6870 properly merges while the YouTube player is in its "small player"
  • However, when the video is expanded using the player to its native 720p size, but not fullscreen, it no longer properly merges and I still see it playing partially as h.264 and lossless images. It looks like it's hunting for it but seems to want to settle on the top left 2/3rds.
  • Pausing the video sometimes does not trigger a full-refresh and the content of the whole page stays blurry.
    • Switching tabs and back seems to do the trick - it becomes readable again
      • Refreshing does not always work. I'll work on a repro today.

I'll attach a logfile in a minute.

edit:
Setting min-quality to 75 appears to force the whole webpage to render at high quality. Again, I'll continue investigating today.

Last edited 3 years ago by maxmylyn

Changed 3 years ago by maxmylyn

Attachment: 410r6870regiondetect.txt added

Booted up chrome. Navigated to YouTube and played a video. Scrolled and the whole page became blurry. Paused the video and it stayed blurry. Switched tabs and it fixed it. Went back to YouTube and played the video on the large player for a bit then disconnected.

comment:33 Changed 3 years ago by Antoine Martin

More tweaks in r6871 (server only, details in commit message).

The log file you provided was very useful for fine tuning the heuristics, but if there are specific actions that trigger problems, I would need the log file for just that instant (a few seconds, not minutes) to know what I am looking at.

The "staying blurry" problem requires -d regionrefresh to get the debug output. (added in r6872, more specific than the "subregion" logging flag)

Hopefully, I'll have access to a larger screen than my crappy laptop to do more / better tests soon.

comment:34 Changed 3 years ago by maxmylyn

Re-ran with r6873 Win7 Client against r6873 Fedora 20 Server:

  • Chrome now properly merges video regions. However it does spend a lot of time "hunting" for a better quality setting. I'm seeing frame pauses very frequently during video playback. The Server is a VM, but playing the same video in Firefox (same session, 2 browsers) produces much smoother video playback. It's not perfect but it's notably better than Chrome.
    • However scrolling the webpage causes the video quality to become noticeably worse
  • Full-page blurriness seems to have mostly gone away, with the exception of scrolling while watching.
    • Using -d regionrefresh shows the rectangle region for add_video_refresh has changed to (1434,992) from the normalish (1280, 724) after scrolling the webpage down and back up again.
      • Scrolling down a bit hiding the top half of the video off-screen prints the exact same thing, which is definitely not expected.

I'll re-run the test with -d regionrefresh piped to a .txt and attach it here.

Last edited 3 years ago by maxmylyn

Changed 3 years ago by maxmylyn

Attachment: 410r6873regionrefresh.txt added

Connected and opened chrome. Navigated to video. Watched for 20ish seconds then skipped ahead. Watched for a bit more, then scrolled and it started rendering the whole thing blurry. Scrolled down after a minute and left the video half hidden while watching for a bit, then disconnected.

comment:35 Changed 3 years ago by maxmylyn

On larger windows (running r6896 Win7 and CentOS clients against r6896 Fedora 20 server), especially on higher resolution displays(1440p and up), the auto-heuristics are causing smaller webpage refreshes(like text being highlighted, etc.) to force the whole webpage to repaint(or refresh?) in a blurry state, often taking a full second before refreshing again with the proper quality. I'll attach a screenshot to demonstrate. In addition, are there any logs with flags you'd want to see?

Last edited 3 years ago by maxmylyn

Changed 3 years ago by maxmylyn

Attachment: 410blurry.png added

Blurriness with r6896

comment:36 Changed 3 years ago by Antoine Martin

Is this newer code really to blame? Does this not also occur with older versions? (maybe to a lesser extent?)

Was there a video region on screen when this happened? Why is the picture quality so low? (xpra info may help figure this out)

The debug flags that are relevant for debugging this are:

  • encoding - which will show damage events we receive from the application (if the application repaints most of the page when the user highlights something... there isn't much we can do about it)
  • refresh for the lossless auto-refresh

Be aware that you need to capture the exact instant this happens, otherwise the log output risks being far too large to be of any use.

comment:37 Changed 3 years ago by maxmylyn

In response to your questions:

  • Older code did behave better
    • Older versions (tested using OSX), pre-0.13.x, behaved much better
  • Setting quality to 100 or another large value definitely gets rid of it (expected).

I'll play around with it and see if I can get a relatively quick repro that won't involve having to trim down large snippets of log.

Last edited 3 years ago by maxmylyn

comment:38 Changed 3 years ago by maxmylyn

Okay finally pinned down a solid repro in Firefox.
Tested using a CentOS r6896 client against a r6896 Fedora 20 server:

Mousing over links and browser elements (causing them to become underlined or highlighted) in a large window triggers a full refresh, in which the whole Firefox window becomes blurry for about a full second before refreshing again and becoming clear.

Not entirely sure whether it's Firefox or Xpra triggering the full refresh (I can see the same behavior in Chrome), but disabling auto and setting quality to 100 fixes this completely (expected).

I'm uploading two .txt files with full logs (just in case), but these are the relevant times to look at (where I was just mousing over browser elements, with no scrolling or typing):

  • refresh - 15:54:30
  • encoding - 16:05:50

Changed 3 years ago by maxmylyn

Attachment: 410blurryrefresh.txt added

full -d refresh; relevant time to look at: 15:54:30

Changed 3 years ago by maxmylyn

Attachment: 410blurryencoding.txt added

full -d encoding; relevant time: 16:05:50

comment:39 Changed 3 years ago by Antoine Martin

  • please include xpra info captured around the time of the problem
  • the debugging flags should be combined: -d refresh,encoding, or maybe even: -d refresh,encoding,regionrefresh,regiondetect (depending on the amount of logging that goes on and whether regions are relevant to this problem)
  • the time specified does not exist in the logs: your computers should be running ntp to ensure that the time matches

(I'm not skimming 3500 lines of log file to find something suspicious..)

Last edited 3 years ago by Antoine Martin

comment:40 Changed 3 years ago by maxmylyn

Heh. Looks like Camarena's time is out of sync. I rechecked it just now and it's exactly 1 minute fast. I'll retry this and reupload.

edit:

Looks like the blurriness is easier to reproduce in Chrome, so the next log file will be short. About 1 second after connecting, I was able to produce full-window blurriness just by hovering over links.

Last edited 3 years ago by maxmylyn

Changed 3 years ago by maxmylyn

Attachment: 410encodingrefresh.txt added

Retest of the previous test. Everything after 09:39:40 is just sitting at Wikipedia. No scrolling or typing just moving the mouse over links.

comment:41 Changed 3 years ago by Antoine Martin

Please read carefully. As per comment:39, I still need xpra info at the time of the problem.

It looks to me like chrome is repainting the whole screen a number of times in that log file after 09:39:40, usually in horizontal chunks of about 30 to 80 pixels in height. (sample log below)

We use h264 for the full screen update (which is expected).

Since the quality is quite low (~40%) we also end up converting the picture to YUV420P, whose subsampled chroma already causes blurriness, and the low quality encoding that follows does not help.

Since the screen keeps refreshing, we re-schedule the auto-refresh (which is also expected), but it still fires on average between 150ms and 350ms after the lossy paint, which is fine. (see timer_full_refresh() after XXXXms in log).
The auto-refresh ends up using png, which brings us to our first problem: webp is missing from your setup. png is slow and may cause the encoding queue to choke, which in turn may be responsible for the lower quality.
(one of the major improvements in 0.13 was the improved webp encoder and decoder: #419, it should be used)
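
The re-scheduling behaviour described above can be illustrated with a small simulation. This is only a sketch of a generic debounce timer, not xpra's actual timer code: `refresh_times` and its parameters are hypothetical names, and the timestamps in the usage note are made up for illustration.

```python
def refresh_times(damage_times_ms, delay_ms):
    """Return the times (in ms) at which a debounced lossless refresh fires.

    Each damage event pushes the pending refresh back to `delay_ms` after
    that event. A refresh only fires once the deadline passes without a
    new damage event arriving first.
    """
    fired = []
    deadline = None
    for t in sorted(damage_times_ms):
        # The pending refresh fired before this damage event arrived.
        if deadline is not None and t >= deadline:
            fired.append(deadline)
        # Every lossy paint (re)schedules the refresh.
        deadline = t + delay_ms
    # The last scheduled refresh always fires eventually.
    if deadline is not None:
        fired.append(deadline)
    return fired
```

With damage events at 0, 50, 100 and 600 ms, a 150 ms delay yields refreshes at 250 and 750 ms, while a 600 ms delay collapses them into a single refresh at 1200 ms. That is the trade-off behind a longer auto-refresh delay: fewer lossless frames in the queue, at the cost of a longer wait for the sharp picture.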


So the real problem is that we end up lowering the picture quality too much. (see update_quality in the log, get_target_quality in the code).
That's because:

  • batch_factor is low, probably because the batch delay is high (xpra info needed to ascertain)
  • latency_factor is high, so this isn't the problem (if anything, it is too high to be useful)


Can you try increasing the auto-refresh delay to see if it improves things?
(it may actually allow the quality to stay higher since png won't clog up the queue)


I'll try to improve the lossless refresh so that we re-use the encoding selection logic, to make it more likely that we'll choose the best encoding possible for the auto refresh (not png if we can avoid it, and not webp for unsuitable dimensions).

You will need to have webp setup correctly to take advantage of that.


The best thing would be to avoid those unnecessary repaints.
Either by fixing the browser's rendering engine (very hard - if it can be fixed at all, it may not be a "bug" as such) or by tweaking the default browser stylesheet to try to ensure that hovering over something does not change the element's style so much that everything ends up repainted. (monkey patching some popular sites may help a lot - though this would need regular updating)


It is worth logging the full version number of the browser used in the future, as different versions have different rendering engine optimizations, different toolkits. All of these things influence how the screen gets (re)painted.
I have:

google-chrome-stable-36.0.1985.125-1.x86_64

(I was on 35 or even 34 earlier in this ticket)


Example of damage:

damage(WindowModel(0xe00001 - "Wikipedia, the free encyclopedia - Google Chrome"), 0, 124, 1024, 63, {})
damage(WindowModel(0xe00001 - "Wikipedia, the free encyclopedia - Google Chrome"), 0, 187, 1024, 63, {})
damage(WindowModel(0xe00001 - "Wikipedia, the free encyclopedia - Google Chrome"), 0, 250, 1024, 63, {})
damage(WindowModel(0xe00001 - "Wikipedia, the free encyclopedia - Google Chrome"), 0, 313, 1024, 63, {})
...
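
To check whether strips like these add up to a repaint of a large part of the window, the log lines can be parsed and vertically adjacent strips merged. A minimal sketch, assuming the log format shown above; `parse_damage` and `merge_strips` are hypothetical helpers and not part of xpra.

```python
import re

# The window title may itself contain commas and parentheses, so the
# pattern is anchored on the trailing "x, y, w, h, {options})" part.
DAMAGE_RE = re.compile(r"damage\(.*\), (\d+), (\d+), (\d+), (\d+), \{.*\}\)\s*$")


def parse_damage(line):
    """Return (x, y, w, h) for a damage log line, or None if it does not match."""
    m = DAMAGE_RE.search(line)
    if m is None:
        return None
    return tuple(int(v) for v in m.groups())


def merge_strips(rects):
    """Merge rectangles that share x and width and touch vertically."""
    merged = []
    for x, y, w, h in sorted(rects, key=lambda r: (r[0], r[2], r[1])):
        if merged:
            px, py, pw, ph = merged[-1]
            if (px, pw) == (x, w) and py + ph == y:
                # This strip starts exactly where the previous one ends.
                merged[-1] = (px, py, pw, ph + h)
                continue
        merged.append((x, y, w, h))
    return merged
```

Applied to the four sample lines, the strips merge into a single 1024x252 rectangle starting at y=124, which makes it easier to see how much of the window each burst of damage events really covers.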

Example of quality calculations:

update_quality() info={'backlog_factor': 100, 'latency_factor': 252, 'min_quality': 30, 'batch_factor': 22}, quality=45
update_quality() info={'backlog_factor': 100, 'latency_factor': 1971, 'min_quality': 30, 'batch_factor': 30}, quality=50
update_quality() info={'backlog_factor': 100, 'latency_factor': 2491, 'min_quality': 30, 'batch_factor': 45}, quality=58
update_quality() info={'backlog_factor': 100, 'latency_factor': 1144, 'min_quality': 30, 'batch_factor': 52}, quality=63
update_quality() info={'backlog_factor': 100, 'latency_factor': 445, 'min_quality': 30, 'batch_factor': 63}, quality=69
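
To see at a glance which factor is holding the quality down, lines like these can be parsed mechanically. The sketch below assumes the log format shown above and only reports the smallest `*_factor` value per line. It does not reproduce what get_target_quality computes, and `parse_update_quality` and `limiting_factor` are hypothetical helper names.

```python
import ast
import re

QUALITY_RE = re.compile(r"update_quality\(\) info=(\{.*\}), quality=(\d+)")


def parse_update_quality(line):
    """Return (info_dict, quality) for an update_quality log line, else None."""
    m = QUALITY_RE.search(line)
    if m is None:
        return None
    # The info part is printed as a Python dict literal of plain values.
    info = ast.literal_eval(m.group(1))
    return info, int(m.group(2))


def limiting_factor(info):
    """Return the name of the smallest '*_factor' entry in the info dict."""
    factors = {k: v for k, v in info.items() if k.endswith("_factor")}
    return min(factors, key=factors.get)
```

For all five sample lines this reports batch_factor as the smallest factor, which matches the observation that batch_factor is what drags the quality down.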
Last edited 3 years ago by Antoine Martin

Changed 3 years ago by maxmylyn

Attachment: 410xprainfo.2.txt added

Xpra info right after it snapped to blurry, but before it refreshed to higher quality. Full log-file can be provided if necessary. Luckily got it on the first try.

comment:42 Changed 3 years ago by Antoine Martin

  • r6907 + r6910 implements better encoding selection for auto-refresh - often webp.
  • r6908 makes us keep a higher quality - Important: this may make it look like things are fixed, when in fact it just papers over the real issue. If so, undo this particular commit to make it easier to reproduce the problem as before.


Summary of your / smo's remaining tasks (see comment:41 for details):

  • close #419... (stuck at blocker for 8 weeks)
  • fix your installation so webp gets used, and re-test
  • test with higher auto refresh delays (with the same version as before)
  • try to find ways to get the browser to behave better (ie: via stylesheet)
Last edited 3 years ago by Antoine Martin

comment:43 Changed 3 years ago by Antoine Martin

r6936 now also auto-downscales less aggressively

comment:44 Changed 3 years ago by maxmylyn

r6936 seems to have fixed the blurry issue. However video performance has noticeably suffered. Even when encoding quality (again, set on auto) drops to below 50, the framerate of the video never improves, it just stays at a low framerate. This was tested in both Firefox and Chrome.

comment:45 Changed 3 years ago by Antoine Martin

Please see comment:42

  • was it r6908 that causes this?
  • is webp now used?
  • auto-refresh changes tested?
  • xpra info data to compare?

Another thing that is worth recording is a profile of the window update processing (just in case there is a bottleneck I haven't seen). To capture it, run the server via pycallgraph:

./tests/scripts/pycallgraph -i damage  -- start :10 --start-child=xterm

and of the encoding thread:

./tests/scripts/pycallgraph -t encode -i '*' -e one_offs,std,libs -- start :10 --start-child=xterm

I have attached an example output.
More info on profiling here: wiki/Profiling.

Last edited 3 years ago by Antoine Martin

Changed 3 years ago by Antoine Martin

Attachment: pycallgraph-xpra.png added

example output from profiling

comment:46 Changed 3 years ago by maxmylyn

Re-ran some tests with an r7041 Win7 client against r7041 Fedora 20 and r6907 Fedora 20:

  • It appears r6908 has caused this, assuming it affects later revisions. r6907 has the blurriness issue, but videos actually play decently well at low quality. Anything after r6908 has basically eliminated all blurriness; however, videos do not lower quality to help with performance. (more below)
  • Changing auto-refresh actually made blurriness "stick" longer if I upped it to 70 in r6907.
  • Yes.
  • I will attach 2 xpra-info later with quality at auto and manually set to 30. There was no noticeable difference between quality 0 with min-quality 10 and quality 30. Other than some slight pixelization, the video was still sending high-quality frames at a low framerate.

edit: before I forget, smo is working on trying to get pycallgraph working. Our test VMs don't have the source on them, so they don't have access to the test scripts. If there's a way to get the scripts onto them (via a special build, etc.), let me or smo know so we can get on that.

Last edited 3 years ago by maxmylyn

Changed 3 years ago by maxmylyn

Attachment: 410xprainfor7041.txt added

xpra info while watching a youtube video

Changed 3 years ago by maxmylyn

an xpra info at the same time, but with quality manually set to 30 - no noticeable effect on video performance.

comment:47 Changed 3 years ago by Antoine Martin

Owner: changed from maxmylyn to Antoine Martin
Status: new → assigned

I guess this should have been assigned back to me? Taking it back.

I will re-test this and try to improve the logging and tools used to diagnose region detection.

comment:48 Changed 2 years ago by Antoine Martin

Priority: minor → critical

Raising priority. I've seen some bad behaviour in corner cases.

For example: we select the video region correctly, but then when we make the flash player fullscreen, the video region sticks when in fact we should be switching to fullscreen video.

I'll try to make the logging more useful, so we can capture the input to the video region detection state machine, then later replay it so we can tune it better.

comment:49 Changed 2 years ago by Antoine Martin

Owner: changed from Antoine Martin to maxmylyn
Status: assigned → new

Improved region detection in r8164, which also adds more logging.

I will work on feeding the logging back to the test tool so we can more easily debug and fine tune it.

@maxmylyn: In the meantime, can you break it? If so, can you provide log samples with -d regiondetect near the point where region detection does not work as well as you think it should?

comment:50 Changed 2 years ago by Antoine Martin

More changes:

  • refresh changes: r8407, r8406
  • detection tweaks: ignore small regions in r8404
  • debug logging: r8375
  • fix region merging: r8373 (+backport in r8374)
  • allow us to disable the region merging code with XPRA_MERGE_REGIONS=0 xpra start ...: r8334 (+backport in r8316) for #760
  • r8188: allows us to disable subregion code entirely with XPRA_VIDEO_SUBREGION=0 xpra start..
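
For reference, switches like these are typically read from the process environment at startup. The following is a generic sketch of such a toggle and not xpra's own code: `env_flag` is a hypothetical helper, and treating "0", "false", "no", "off" and the empty string as "disabled" is an assumption made for illustration.

```python
import os


def env_flag(name, default=True, environ=None):
    """Read a boolean switch such as XPRA_VIDEO_SUBREGION from the environment.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to test without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        # Variable not set: keep the feature's default behaviour.
        return default
    return value.strip().lower() not in ("0", "false", "no", "off", "")
```

Under these assumptions, `env_flag("XPRA_VIDEO_SUBREGION", environ={"XPRA_VIDEO_SUBREGION": "0"})` returns `False`, and leaving the variable unset keeps the default of `True`.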

See also #615

Last edited 2 years ago by Antoine Martin

comment:51 Changed 2 years ago by Antoine Martin

Owner: changed from maxmylyn to alas

afarr: unless someone can break it, I think we should close this.

comment:52 Changed 2 years ago by alas

Can't seem to break anything. I do notice, though (with opengl paint boxes set on an osx client), that the XPRA_VIDEO_SUBREGION=0 setting seems to lead to youtube, for example, only using rgb24 and webp encodings. Not sure that turning off video subregions is meant to turn off the h264 encodings... but not sure that it's not meant to.

Assuming that's intended, this looks good to close.

comment:53 Changed 2 years ago by Antoine Martin

Not sure that turning off video subregions is meant to turn off the h264 encodings...


It is not meant to. Are you sure that it does?

$ xpra info | grep total_frames
window[2].total_frames[delta]=1
window[2].total_frames[h264]=644
window[2].total_frames[jpeg]=1
window[2].total_frames[png]=2
window[2].total_frames[rgb24]=703

Note: most of the rgb24 frames are edges of the video area (h264 can only encode even sizes, so we use rgb24 for the right and bottom edge)
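
The edge handling mentioned in this note can be sketched as follows. This is an illustration under assumptions, not xpra's implementation: `split_for_video` is a hypothetical helper, and the exact way the leftover strips are cut (a full-height right strip plus a bottom strip under the video part) is a guess.

```python
def split_for_video(x, y, w, h):
    """Split a rectangle into an even-sized video part plus leftover edges.

    Returns (video_rect, edge_rects), where each rectangle is (x, y, w, h).
    The video part has even width and height; any odd column or row is
    returned as a separate strip to be sent with another encoding.
    """
    vw, vh = w & ~1, h & ~1  # round down to the nearest even number
    video = (x, y, vw, vh)
    edges = []
    if w != vw:
        # Right edge: one pixel wide, full height of the original rectangle.
        edges.append((x + vw, y, w - vw, h))
    if h != vh:
        # Bottom edge: one pixel high, under the video part only.
        edges.append((x, y + vh, vw, h - vh))
    return video, edges
```

For a 1281x725 region this yields a 1280x724 video rectangle plus a 1x725 right strip and a 1280x1 bottom strip, so every video frame can come with up to two small extra frames. That would be consistent with rgb24 counts being as high as, or higher than, the h264 counts.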

comment:54 Changed 2 years ago by alas

Resolution: fixed
Status: new → closed

Hmm... well I stand corrected. I'm not seeing any blue box outlines, but...

[jimador@zapopan ~]$ xpra info :23 | grep total_frames
window[1].total_frames[jpeg]=1
window[1].total_frames[webp]=1
window[2].total_frames[delta]=9
window[2].total_frames[h264]=581
window[2].total_frames[jpeg]=43
window[2].total_frames[png]=12
window[2].total_frames[rgb24]=1180
window[2].total_frames[webp]=3

Since 581 > 0 (significantly so)... I guess the last concern is solved.
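
For repeated checks of this kind, the grep output can be tallied per window. A minimal sketch, assuming the `window[N].total_frames[encoding]=count` format shown above; `parse_total_frames` is a hypothetical helper and not an xpra tool.

```python
import re
from collections import defaultdict

FRAMES_RE = re.compile(r"^window\[(\d+)\]\.total_frames\[(\w+)\]=(\d+)$")


def parse_total_frames(text):
    """Return {window_id: {encoding: frame_count}} from `xpra info` output."""
    counts = defaultdict(dict)
    for line in text.splitlines():
        m = FRAMES_RE.match(line.strip())
        if m is None:
            # Skip shell prompts and any unrelated info lines.
            continue
        wid, encoding, frames = m.groups()
        counts[int(wid)][encoding] = int(frames)
    return dict(counts)
```

Dividing one encoding's count by the sum for its window gives that encoding's share of frames. For window 2 in the output above, h264 accounts for 581 of 1828 frames.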

Closing.
