Xpra: Ticket #1232: detecting motion vectors / handling scrolling better

Scrolling can make us use a video region for the area being scrolled, we should be able to supply the motion vectors to the encoder: we know that the whole region moves as one unit so this should save a lot of CPU and bandwidth.

Plan:

Alternatively, maybe we could also use the motion vector internally: send the delta (using a fast stream compressor like lz4) + motion: still send the newly exposed region separately (different encoding) We may want to force an I frame for the next video frame since the delta will be bigger.

Also worth thinking about: distinguish video from scrolling. Video has high framerate and lasts longer, we can then:

There are probably some limitations on motion vectors range, etc



Sun, 19 Jun 2016 10:44:40 GMT - Antoine Martin: attachment set

unoptimized code to detect scrolling


Sun, 19 Jun 2016 10:53:48 GMT - Antoine Martin: status, summary changed

The good news first: with the patch above, I can detect scrolling in the majority of cases where the distance is small enough, ie: with Firefox:

best scroll guess, matches 91% of 464 lines: 10
best scroll guess, matches 89% of 464 lines: 13
best scroll guess, matches 88% of 464 lines: 15
best scroll guess, matches 95% of 464 lines: 6
best scroll guess, matches 93% of 464 lines: 8
best scroll guess, matches 89% of 464 lines: 13
best scroll guess, matches 95% of 464 lines: 6
best scroll guess, matches 90% of 464 lines: 12
best scroll guess, matches 95% of 464 lines: 5
best scroll guess, matches 96% of 464 lines: 5
best scroll guess, matches 97% of 464 lines: 3
best scroll guess, matches 95% of 464 lines: 7
best scroll guess, matches 97% of 464 lines: 3
best scroll guess, matches 98% of 464 lines: 0
best scroll guess, matches 90% of 464 lines: -14
best scroll guess, matches 73% of 464 lines: -42
best scroll guess, matches 64% of 464 lines: -52
best scroll guess, matches 74% of 464 lines: -32
best scroll guess, matches 73% of 464 lines: -32
best scroll guess, matches 52% of 464 lines: -60
best scroll guess, matches 77% of 464 lines: -28
best scroll guess, matches 64% of 464 lines: -45
best scroll guess, matches 85% of 464 lines: -17
best scroll guess, matches 85% of 464 lines: -18

The bad news is that x264 does not support supplying motion vectors: Manually feeding x264 with my own motion data?: Short answer: No you can't feed in your motion estimation data to x264. NVENC does, but it's very poorly documented (as usual) and it is per block... and blocks are small.

So I think we'll just have to handle scrolling without using video, unlike the current test patch which hooks in the video path. But we still want to only fire it when there is a good chance it will be useful (as we have to scan the whole picture, and then compare all the checksums)


Mon, 20 Jun 2016 08:31:51 GMT - Antoine Martin: attachment set

make it fast with cython zero-copy and CRC64 instead of sha1sum


Tue, 21 Jun 2016 15:39:50 GMT - Antoine Martin:

Some preparatory work in r12877 and r12878, an updated patch is attached.

What works:

Still todo:


Wed, 22 Jun 2016 02:39:39 GMT - Antoine Martin: attachment set

failed attempt at implementing scrolling in the pixmap backing - just crashes randomly..


Wed, 22 Jun 2016 13:16:54 GMT - Antoine Martin:

More preparatory work in r12881 + r12885 + r12886 + r12887, and tests in r12880 + r12882 + r12883 + r12884


Wed, 22 Jun 2016 16:10:32 GMT - Antoine Martin:

Testing notes for my own benefit since I may have to get back to other things:


Doing a focus change causes the whole window to be "repainted", without any actual pixel changes, the server correctly sees it:

encode_scrolling(XShmImageWrapper(BGRX: 0, 0, 630, 480), '{0: 480, 1: 478, 2: 476, 3: .. 470, -4: 472, -3: 474, -2: 476}', [], [], {})
 will send scroll packets for yscroll={0: 480, 1: 478, 2: 476...}
 480 lines have not changed: '[0, 1, 2, 3, 4, 5, 6, 7, 8, .. , 474, 475, 476, 477, 478, 479]'

No data is sent to the client at all which is great.


Then the vscroll gl test application scrolls the window up by 50 pixels and repaints the bottom 50 pixels, again the server correctly identifies this:

encode_scrolling(XShmImageWrapper(BGRX: 0, 0, 630, 480), '{0: 380, 1: 379, 2: 378, 3: .. 385, -4: 384, -3: 383, -2: 382}', [], [], {})
 will send scroll packets for yscroll={0: 380, -60: ...  -52: 426, -51: 428, -50: 430, -49: 429, -48: 428, ..}
 scroll groups for distance=-50 : '[0, 1, 2, 3, 4, 5, 6, 7, 8, .. , 424, 425, 426, 427, 428, 429]'=[(0, 430)]
 non scroll: 1 packets: '[430, 431, 432, 433, 434, 4 .. , 474, 475, 476, 477, 478, 479]'=[(430, 50)]

And the client gets the matching paint requests:

do_scroll_paints(((0, 50, 630, 430, -50),), 1)
draw_region(0, 430, 630, 50, png, 604 bytes, 0, {'compress_level': 2}, [<function record_decode_time at 0x7f95402250c8>, <function after_draw_refresh at 0x7f9540225050>])

Yet the window appears black after this first scroll event. That's weird considering that the client and the gl vscroll test app actually use the exact same code for painting. The GL paint code seems to have issues when there is more than one window shown, maybe there are other issues too. Maybe fixing the pixmap backend would help confirm that the paint data is correct (or maybe also save it to file for manual inspection).

Ideally, we would want to choose the "scroll" encoding early, but the encoding selection logic code doesn't have access to the pixels.. This may have to be changed. We could then more efficiently discard unnecessary repaints.


Wed, 22 Jun 2016 16:40:55 GMT - Antoine Martin: attachment set

almost working pixmap implementation (needs XPRA_REPAINT_ALL=1), shows the same corruption


Sun, 26 Jun 2016 11:17:01 GMT - Antoine Martin: attachment set

updated patch


Sun, 26 Jun 2016 15:51:03 GMT - Antoine Martin:

Lots of changes related to scrolling have been merged:


Still TODO:

Maybes (see comment:2 for details):


Mon, 27 Jun 2016 14:17:24 GMT - Antoine Martin:


Until this is integrated more tightly, you need to run the server with XPRA_ENCODING_STRICT_MODE=1 to trigger the video code path, which is the place where the scrolling detection is currently hooked up. You can get the scrolling debug with "-d scroll". The color shown with opengl paint box debugging is orange, it only draws around the area we copied to (not from).


Tue, 12 Jul 2016 16:52:22 GMT - Antoine Martin: milestone changed

Milestone renamed


Sat, 20 Aug 2016 12:45:59 GMT - Antoine Martin:

r13402 adds an SSE 4.2 accelerated CRC32 version which is a lot faster than the standard zlib.crc32 on supported hardware (tested on Skylake).


Sat, 20 Aug 2016 15:38:20 GMT - Antoine Martin:

There was a silly bug in the code, which means that the performance was overestimated for crc32c. But as of r13405 we switch to python-xxhash, and we get 7.5GB/s! (with some better tests)


Tue, 23 Aug 2016 08:42:02 GMT - Antoine Martin:

Lots of updates: r13433, r13434, r13435, r13436. scrolling detection is enabled again, and seems to work well enough. As of r13442 we only enable scrolling detection if "xxhash" is available (the alternatives are just too slow). Still todo: #1257, try to manage conflicts between video regions and scrolling... very difficult to have both at the same time.


Thu, 25 Aug 2016 14:45:35 GMT - Antoine Martin: owner, status changed

Looks ready for a first round of testing. How I tested: using firefox and scrolling around (without video playing at the same time). You can try to see what encoding is used with opengl paint debug ("Meta+Shift+F12" to turn on for a window), but since the scrolling will accumulate previous paint boxes... it can get messy quickly. A better way is to check what encodings are actually being used with xpra info | egrep "total_frames|total_pixels". Then for extra verbose details, there's "-d scroll".

Important: you must have python-xxhash installed on the server for scroll detection.


Fri, 26 Aug 2016 09:37:32 GMT - Antoine Martin:

FYI: the win32 and osx shadow servers now take advantage of scrolling detection to save a huge amount of bandwidth and CPU, they can be useful for testing and helped uncover an absolute monster of a bug: r13479


Mon, 29 Aug 2016 06:00:33 GMT - Antoine Martin:


Wed, 07 Sep 2016 09:13:18 GMT - Antoine Martin:

Just hit this:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/gl/gl_window_backing_base.py", line 514, in do_scroll_paints
    assert y+ydelta>=0 and y+h+ydelta<=bh, "vertical scroll by %i: rectangle %s overflows the backing buffer size %s" % (ydelta, (x, y, w, h), self.size)
AssertionError: vertical scroll by 299: rectangle (17, 2, 1002, 299) overflows the backing buffer size (1021, 591)

Not sure how to reproduce it.


Thu, 08 Sep 2016 16:30:19 GMT - Antoine Martin:

The bug in comment:14 is caused by a mismatch between the server window size and the backing client-side (which can happen more easily with desktop-scaling enabled, ie: #1098), r13618 is a workaround for that. r13617 also improves error handling end reporting.


Fri, 09 Sep 2016 11:05:53 GMT - Antoine Martin:

Minor fix in r13623: we don't use scrolling for "real" video, which may use b-frames (#800). "real" video can be identified in xpra info as:

$ xpra info | grep video-mode
window.1.video_subregion.video-mode=True

Wed, 21 Sep 2016 08:14:43 GMT - Antoine Martin:

Minor change: as of r13797 + r13799, we run the scrolling detection in more cases, even when we detected a video region that really does look like real video, but not if the API has been used to set this video region.

Notes:


Fri, 07 Oct 2016 22:37:28 GMT - J. Max Mena: owner changed

Running a Fedora 23 r14042 trunk server with a Fedora 23 r14042 trunk client:

And, I'm not able to trigger the new encoding at all. I can tell this because running with XPRA_OPENGL_PAINT_BOX=1, upon scrolling all I see is h264, none of the new paints mentioned earlier.

Also, I keep seeing this pop up in the logs: attachment/ticket/1232/zero.log

(..)
  File "/usr/lib64/python2.7/site-packages/xpra/server/window/window_video_source.py", line 1120, in calculate_scaling
    ffps = int(pixels/(width*height)/(time.time() - otime))
ZeroDivisionError: integer division or modulo by zero

Passing back to you with the tracebacks.

Also, I'd like (for myself, afarr, and anyone else interested) for you to indicate what flags and options are required to get the new scrolling working. As far as I can tell all that's required is XPRA_SCROLL_ENCODING=1 and XPRA_ENCODING_STRICT_MODE=1 on the server, and just scroll encoding on the client.


Sat, 08 Oct 2016 08:29:31 GMT - Antoine Martin: attachment set

ZeroDivisionError? in calculate_scaling


Sat, 08 Oct 2016 09:52:34 GMT - Antoine Martin: owner changed

The zero division error should be fixed in r14053.


Scrolling should work out of the box, without needing any special command line arguments, provided that:


To verify that scrolling is being used:

The scroll encoding contains the motion data, the png packets contain the area scrolled into view (the last packet) and the areas that have changed (ie: the top and bottom of the scrollbar which has also moved)

One last word: although the scroll paint code will use the stippled brown as "paint box colour", you can barely see it because we usually also paint the lossless screen updates together (flush>0) and those will include the same edges.. r14058 makes it 2 pixels thicker.


Mon, 10 Oct 2016 17:50:53 GMT - J. Max Mena: owner changed

Upped server and client to r14106:

To track it:

I verified that it was the new scrolling code by starting up a new server with XPRA_SCROLL_ENCODING=0 and re-ran those steps, and the memory didn't climb. It's very easy to notice as the memory climbs very steadily and consistently.


Also, I found that grabbing the scroll bar with the mouse and moving it up and down quickly (an extreme corner case) triggers full h264 after a short time (probably 1 full second by my estimation). Not sure if it's expected or not, just wanted to note it.


Passing back to you.


Mon, 10 Oct 2016 19:15:06 GMT - J. Max Mena:

Update:


Tue, 11 Oct 2016 04:06:01 GMT - Antoine Martin: owner changed


Mon, 24 Oct 2016 19:52:23 GMT - J. Max Mena: owner changed

Noted fix of memory leak and extreme corner case. Looks pretty fantastic to me, I haven't found any more corner cases.

Will test more thoroughly when I get the time, but it seems to be behaving very nicely.

I'll pass this back to you, to decide what to do with this ticket. My vote is close it as working, and then open new tickets in the future, but I'll defer to you.


Tue, 25 Oct 2016 03:39:12 GMT - Antoine Martin: status changed; resolution set

Thanks!

Closing, we'll revisit this as part of #1341.


Wed, 02 Nov 2016 02:12:57 GMT - J. Max Mena:

Jotting down notes from the meeting today:

Basically, small webpage updates after scrolling for a bit cause the whole window to get repainted as h264 in a very low quality (I've been seeing ~50-60) compared to the high 90s that the scrolling uses, creating very unexpected blurriness.

A couple of short repro steps:

Upon doing so, the whole window gets repainted with h264, and with -d compress,x264,refresh enabled, you can grep for quality and watch it drop from 97-99 down to 50-60 for the h264 paints.


Wed, 02 Nov 2016 06:40:23 GMT - Antoine Martin: status changed; resolution deleted


Wed, 02 Nov 2016 06:48:03 GMT - Antoine Martin: owner, status changed

Re-opened because r14382 + r14383 make some changes to this core part of the code.

The big problem is that we have to spot when "video" (or scrolling in this case) stops and use a different encoding. Doing this detection is expensive to do, and hard to get right too. The auto-refresh should fix the lossy screen update, does it not?


Fri, 04 Nov 2016 19:17:43 GMT - J. Max Mena: owner changed

Upped server and client r14397:

The changes definitely have helped.

NOTE: This will be triggerable in Chrome going forward due to how Chrome paints - even small web-page updates trigger a full-window repaint. Firefox seems to behave properly, however.

Passing back to you - my testing has shown that it's been resolved in applications that behave nicely, not sure what we want to do about Chrome, or even if there even is anything we can do.


Wed, 09 Nov 2016 08:26:00 GMT - Antoine Martin: status changed

#1352: it seems that there are some painting issues.


Fri, 11 Nov 2016 21:25:29 GMT - J. Max Mena:

From ticket #1352:

Okay, so I'm seeing some some widespread yet intermittent issues using the scrolling codec (hmmm maybe we should come up with a cool name).

For now I'm using a Fedora 24 trunk 1.0 r14404 server and client

For the most part the following websites are the easiest to trigger:

Basically any site that has a good mix of text yet interactive fields causes some weird behavior. I see everything from double lines of text to images appearing over comment fields (if the image is behind a collapsible element). I also see behavior similar to the b-frames issue of the application updating but not getting the paint.

And, to make it more annoyoing, it's most prevalent in Fedora, and every time I try to open Ksnap it gets a paint refresh and goes away.

Turning on XPRA_OPENGL_PAINT_BOX I see that it occurs when there's a mix of the scrolling codec, png, and rgb24 (orange).

Repro steps are spotty, I'm trying to narrow down a valid and reliable repro but am coming up with nothing:

And, it's still an issue with B-Frames disabled, however I'm finding it much harder to get the paint bug to "stick". For now it seems to be breaking slightly and then it clears itself up.

I'll attach some screenshots and some -d compress,scroll prints to this ticket.

My repro is using Reddit comment fields:


Fri, 11 Nov 2016 21:27:54 GMT - J. Max Mena: attachment set

-d scroll,compress output - managed to get it broken similar to the screenshots, then minimized and unminimized and stopped the server


Fri, 11 Nov 2016 21:28:16 GMT - J. Max Mena: attachment set

Broken state after collapsing and uncollapsing the comment


Fri, 11 Nov 2016 21:28:32 GMT - J. Max Mena: attachment set

after minimizing and unminimizing


Fri, 18 Nov 2016 18:11:26 GMT - J. Max Mena:

Note:

It is notably easier to trigger on https://news.google.com


Sun, 20 Nov 2016 17:00:00 GMT - Antoine Martin: priority changed

Confirmed and fairly easy to reproduce. r14465 improves the code and adds XPRA_OPENGL_SAVE_BUFFERS to the opengl client, making it possible to save the state of the client-side FBO, frame by frame.

Other TODO:


Mon, 21 Nov 2016 09:07:55 GMT - Antoine Martin: owner, status changed

More changes in r14466, see commit message. Using the newly introduced XPRA_SAVE_VIDEO_FRAMES=jpeg server side switch and the existing XPRA_OPENGL_SAVE_BUFFERS=jpeg client side switch.

How to use this cool new feature (should go on the wiki somewhere) for debugging any video picture issues:


Back to the bug: r14467 fixes that, I can no longer reproduce it. Note however that we just repaint the area, it may still appear out of order (looks like the picture is flickering a bit when it does), but the next screen refresh fixes it.

I'm still seeing the occasional lower than desired visual quality when scrolling very very fast. It should go back up more quickly than it does after you stop scrolling - but this is not really scrolling related and belongs in another ticket (related to #1257 and #1135).

It would be nice if we could also update the image checksum when we process the next auto-refresh (scrolling is currently considered lossy), but this would be quite hard to implement.


Mon, 21 Nov 2016 23:47:58 GMT - J. Max Mena:

Upped to r14467:

Honestly, it looks and behaves much better now - I'm not seeing any more of the issues I described earlier.

Even after almost a full work day of using it, I still can't find any outstanding issues.

Passing back to you; probably for closure.


Tue, 22 Nov 2016 06:10:41 GMT - Antoine Martin: status changed; resolution set

Let's finally close this.


Thu, 02 Feb 2017 11:33:05 GMT - Antoine Martin:

Follow up in #1426 and #1320


Thu, 25 Apr 2019 06:28:43 GMT - Antoine Martin:

See also #2248: more aggressive use of "scroll" encoding.


Fri, 08 May 2020 14:11:30 GMT - Antoine Martin:

A bug has been fixed 3 years later! #2757


Sat, 23 Jan 2021 05:18:34 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/1232