#328 closed enhancement (fixed)
split yuv into its own pseudo codec
| Reported by: | Antoine Martin | Owned by: | ahuillet |
|---|---|---|---|
| Priority: | minor | Milestone: | 0.10 |
| Component: | core | Version: | |
| Keywords: | | Cc: | |
Description
Why?:
- cleaner, the code in the 264/vpx codecs is getting messy
- we don't necessarily need yuv when using x264/vpx with the opengl client (though we would need to tweak the window backing logic to always choose opengl windows if yuv2rgb is not available)
- make easier to tune the colourspace conversion step (downsampling/scaling etc)
Attached is a test patch (it does not even compile) which shows how to get started. I don't have time to take this further right now.
Attachments (4)
Change History (19)
Changed 9 years ago by
| Attachment: | yuv-codec.patch added |
|---|---|
comment:2 Changed 9 years ago by
| Owner: | set to Antoine Martin |
|---|---|
| Status: | new → assigned |
Splitting the encoder from the decoders is done (that part was very easy) in r3265.
Changed 9 years ago by
| Attachment: | yuv-codec-v2.patch added |
|---|---|
A better patch - no longer using a yuvlib, all in Cython.
comment:3 Changed 9 years ago by
The plan is to separate the video backends into modules:
- csc_* for colorspace conversion (csc_swscale, with csc_nvcuda coming one day)
- enc_* for encoding (enc_x264)
- dec_* for decoding (dec_avcodec)
The patches on the C parts are done, with an improved API, and kept at https://github.com/ahuillet/video_backends.
Now we need to re-do the .pyx interfaces to make this work again.
comment:4 Changed 9 years ago by
Quirks:
- csc_ is able to support any colorspace, but enc_ and dec_ have not had that piece of functionality re-added yet.
- the logic for changing colorspace has been removed from enc_x264 because it does not belong there; it will have to be reimplemented in Python.
Memory management:
- you allocate a context with the init_* functions as before, and you clean up the context with the free_* functions as before, but those functions *do not* free the context itself: you have to call free(context) manually
- pass { NULL, NULL, NULL } to the csc_image functions - they allocate their own memory, which you release by calling free_csc_image({})
- enc_x264 allocates its own memory and you don't free it afterwards (it reuses the same buffer over and over again)
- I am not sure about dec_avcodec, will check.
comment:5 Changed 9 years ago by
We also need to deal with reads beyond the end of the buffer, caused by swscale reading 8 (or more?) bytes at a time - valgrind output:
==8159== Thread 6:
==8159== Invalid read of size 8
==8159==    at 0xF7580E1: ??? (in /usr/lib64/libswscale.so.2.1.101)
==8159==    by 0xF750038: ??? (in /usr/lib64/libswscale.so.2.1.101)
==8159==    by 0xF756465: sws_scale (in /usr/lib64/libswscale.so.2.1.101)
==8159==    by 0xFFFC441: csc_image_rgb2yuv (x264lib.c:502)
==8159==    by 0xFFF71B5: __pyx_pw_4xpra_6codecs_4x264_7encoder_7Encoder_25compress_image (encoder.c:3033)
==8159==    by 0x35B1EDD280: PyEval_EvalFrameEx (ceval.c:4098)
==8159==    by 0x35B1EDCEF0: PyEval_EvalFrameEx (ceval.c:4184)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1E6D925: function_call (funcobject.c:526)
==8159==    by 0x35B1E49C0D: PyObject_Call (abstract.c:2529)
==8159==    by 0x35B1EDA08A: PyEval_EvalFrameEx (ceval.c:4411)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159== Address 0x1acea03c is 630,732 bytes inside a block of size 630,736 alloc'd
==8159==    at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==8159==    by 0x35B3A287BB: XGetImage (GetImage.c:82)
==8159==    by 0x158E52C6: __pyx_pw_4xpra_3x11_7gtk_x11_12gdk_bindings_13PixmapWrapper_3get_image (gdk_bindings.c:6555)
==8159==    by 0x35B1E49C0D: PyObject_Call (abstract.c:2529)
==8159==    by 0x35B1EDA08A: PyEval_EvalFrameEx (ceval.c:4411)
==8159==    by 0x35B1EDCEF0: PyEval_EvalFrameEx (ceval.c:4184)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)
comment:6 Changed 9 years ago by
New backend infrastructure WIP at https://github.com/ahuillet/xpra/commits/
Things to do :
- re-add nogil
- fix xmemalign() definition and calls
- colorspaces other than 420 need to be de-hardcoded
- OpenGL backend seems to go through CSC anyway
- packaging, debs, rpms - patches directory
- win32 build
- vpx
- feed RGB to x264 directly instead of YUV444
- quality = 100 => ultrafast in x264
comment:7 Changed 9 years ago by
Please pull the latest changes I've sent. Updated list of remaining issues (help needed - not all of these need to be dealt with before we merge into trunk, but most will need doing at some point):
- edge resistance: take current setting into account, so the current one (or similar ones that re-use csc or video encoder) will get a score boost
- fire pipeline changes from timer thread if new best option wins (set new quality/speed then re-evaluate pipeline)
- move ImageWrapper to a global location (it is no longer a server-only class)
- free_csc is a simple free(), so there is no real need for the currently very ugly code: merge with the XImageWrapper code, which does do a free().
Also, maybe we want/have to use separate memaligned buffers for each plane?
- handle scaling and odd dimensions problems
- expose dimensions mask (0xFFFE is for x264 only?)
- BGRA does not work for vpx or x264 (segfault!) - disabled by #ifdef for now (may also need rowstride work for uneven sizes?) - and even if it does work, this also affects:
- with csc modes other than YUV420P, YUV422P and YUV444P (say BGRA): what should we expose to the client? Does it care about the mode that was used, or about the actual pixel dimensions used during compression? Maybe we should not expose the option "csc=YUV422P" but something like "csc_divs=(1,1),(2,1),(2,1)" instead - and maybe this ties in with scaling too.
- related to the above: old clients without the "encoding_client_options" option could still parse the pixel data as long as the divs are the same? (not that we would want to care about and handle this particular case - just a note)
- ensure we zero copy pixels instead of copying from decoder/encoder, then:
- restore nogil: verify that we are passing the same pointers via string wrappers and that we are allowed to manipulate them without the gil.
- fix opengl (no idea why the output is blank - the yuv codepath works for the non-gl code..)
- swscale could be tuned using speed/quality too: use FAST_BILINEAR if quality=0 and speed=100, and re-init in set_quality/set_speed - which could be combined as: configure(quality, speed)
- think about target bitrate mode - no real difficulty there?
- vpx: use vpx_img_wrap to ensure we don't copy the pixels
- client side: use similar code to choose the decoder path, and maybe choose the window backing (acceleration APIs generally require direct access to the backing window)
- dec_avcodec should be able to handle vpx: make the decoder export the list of codec formats it handles (x264, vpx, etc)
- max width and height for csc (ffmpeg ML related message), vpx and x264: find the actual limits from somewhere (API?) rather than hard-coding to 4k*4k
- rename x264 to h264 as encoding option (and keep "x264" in encodings for backwards compatibility)
- move memalign to lib with pxd for importing into python code when we create image buffers
- encoder/csc init: probe and maybe test the speed (also useful to verify that it does work!) - ie: we could run the csc on a 1k*1k random/test buffer to measure latency
- vpx cannot do speed or quality for now: we should skip the calculations! (keep caps telling us what we can do for the current encoder)
- maybe add command line options for enabling/disabling/prioritizing encoder backends
- if the window size has changed just a little, we could send the edges as per the one-pixel workaround (which may well need generalizing anyway when dealing with things like scaling), at least a few times, before doing the full re-init which sends a new (expensive) i-frame - by which point the window may have been resized some more, so we don't lose the video delta benefits.
- the encoder specs should be static: the pipeline class should have a single copy of all encoder specs for all window sources. Then:
- the encoder specs can have guesstimates for some enc/csc, and runtime values too: we return objects, so these can be updated continuously by the encoder/csc
- setup cost should be dynamic: decrease as hardware encoding availability decreases (so we then become even less likely to use hardware for small windows)
- rename rgb_format to pixel_format
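The "csc_divs" suggestion from the list above can be sketched like this. The plane_dimensions helper and the per-plane (x_div, y_div) convention are hypothetical, derived only from the "csc_divs=(1,1),(2,1),(2,1)" example in the comment:

```python
def plane_dimensions(width, height, divs):
    """Return the (width, height) of each plane given (x_div, y_div) pairs."""
    return [(width // xdiv, height // ydiv) for xdiv, ydiv in divs]

# YUV422P: chroma planes are subsampled horizontally only
YUV422P_DIVS = ((1, 1), (2, 1), (2, 1))
# YUV420P: chroma planes are subsampled both horizontally and vertically
YUV420P_DIVS = ((1, 1), (2, 2), (2, 2))

print(plane_dimensions(640, 480, YUV422P_DIVS))
# -> [(640, 480), (320, 480), (320, 480)]
```

The appeal of exposing divisors instead of mode names is that a client which only knows the divs can size its buffers correctly without recognizing the colorspace string, which is exactly the backwards-compatibility point raised about old clients.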
comment:9 Changed 9 years ago by
Mostly done in:
- r3540, r3541, r3542, r3543, r3544: preparatory work
- r3545: the big one, lots of details in commit message
- r3546, r3547 and r3548: build fixes
Not closing this ticket yet, what is left to do:
- handle dimensions restrictions generically, and for scaled down dimensions too: expose it as an encoder spec property (so x264 can have its even window dimensions)
- move ImageWrapper and maybe refactor it with the XImageWrapper
- zero-copy pixels in and out (also look at vpx_img_wrap for vpx)
- nogil option on non-python buffers
- let client backing class tell the server if it can use scaling (currently just per client)
- let client backing class tell the server which csc modes are allowed (currently just per client)
- handle scaling in GL client code
- handle csc client side for GL rendering (as GL cannot handle ARGB - but we may still want to use ARGB server side to reduce overall latency)
And maybe also:
- swscale could tune the speed/flags at runtime by doing a full re-init (which is fast/cheap)
- handle transient encoder failures: mark for retry/not
- when trying out new pipelines, keep the old one around in case we need to fallback to it (the new one does not work out)
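The "keep the old pipeline around" idea from the last bullet could look something like this. This is purely illustrative - the class and method names are not from the xpra code:

```python
class PipelineManager:
    """Sketch: try a new csc/encoder pipeline, fall back to the old one on failure."""

    def __init__(self, pipeline):
        self.pipeline = pipeline   # a callable taking a frame
        self.previous = None

    def switch_to(self, new_pipeline):
        # keep the old pipeline alive instead of freeing it immediately
        self.previous = self.pipeline
        self.pipeline = new_pipeline

    def compress(self, frame):
        try:
            return self.pipeline(frame)
        except Exception:
            if self.previous is None:
                raise
            # the new pipeline did not work out: fall back to the old one
            self.pipeline, self.previous = self.previous, None
            return self.pipeline(frame)


# demo: the new pipeline fails at encode time, so we transparently fall back
mgr = PipelineManager(lambda frame: ("encoded-by-old", frame))

def broken_pipeline(frame):
    raise RuntimeError("encoder setup worked but encoding failed")

mgr.switch_to(broken_pipeline)
print(mgr.compress("frame-1"))   # -> ('encoded-by-old', 'frame-1')
```

The real implementation would also need to decide when it is finally safe to free the previous pipeline (e.g. after the first successful frame from the new one).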
comment:10 Changed 9 years ago by
| Owner: | changed from Antoine Martin to ahuillet |
|---|---|
| Status: | assigned → new |
More improvements:
- imagewrapper changes in r3552
- nogil restored in r3572
- less pixel copying in r3570 and r3568
- handle size restrictions generically r3588
and lots of cleanups in between.
Which leaves only these items needing fixing for this ticket:
- the OpenGL backing (ideally with the ability to use RGBP pixel data directly; handling scaling would also be nice)
- #350 smarter buffer allocation/free code
Other things we can deal with later (see also lists in previous comments):
- finding real csc/encoder limits at runtime (it would be worth at least testing what works and what doesn't to make sure the codec_spec is more or less accurate - finding out later, when we try to use it, is more expensive)
- client_backing could tell the server which csc modes it will handle (an alternative to fixing the GL backing to handle RGBP)
comment:11 Changed 9 years ago by
and... scaling brought back an old bug:
non-existing PPS 0 referenced
After spending a lot of time debugging and inspecting the server-side code (which looks fine), I managed to reproduce it with client side debugging switched on and found:
paint_with_video_decoder: window dimensions have changed from (421, 374) to (420, 374)
paint_with_video_decoder: new <type 'xpra.codecs.dec_avcodec.decoder.Decoder'>(420,374,YUV420P)
paint_with_video_decoder: info={'width': 420, 'colorspace': 'YUV420P', 'type': 'x264', 'height': 374}
[h264 @ 0x7f2b8836af00] non-existing PPS 0 referenced
[h264 @ 0x7f2b8836af00] decode_slice_header error
[h264 @ 0x7f2b8836af00] no frame!
Error while decoding frame
2013-06-10 19:04:51,472 draw error
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 1004, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib64/python2.7/site-packages/xpra/client/client_window_base.py", line 258, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 348, in draw_region
    self.paint_with_video_decoder(x264_Decoder, "x264", img_data, x, y, width, height, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 258, in paint_with_video_decoder
    coding, len(img_data), width, height, options))
Exception: paint_with_video_decoder: x264 decompression error on 1044 bytes of picture data for 420x374 pixels, \
  options={'quality': 29, 'frame': 2, 'speed': 100, 'csc': 'YUV420P'}
For the record, what is happening here: we were sending the raw dimensions of the XImageWrapper we encode from as the dimensions the client can paint with. These new dimensions could get clipped by x264's size restrictions (even dimensions only) to the same size as the current encoder pipeline (ie: when resizing a window to a size one pixel shorter), which means the current encoding pipeline was still valid - but it caused the client side code to erroneously think that it needed a new decoder pipeline to deal with the packet (cleaning up the old pipeline, and causing the new video packet to reference a frame that had been discarded).
A somewhat ugly solution is to pass the real dimensions via client_options, see attached patch. A better solution will be committed.
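The failure mode can be illustrated with the 0xFFFE dimension mask mentioned in an earlier comment. The helper name is hypothetical; only the mask and the 421→420 resize come from the ticket:

```python
# x264 only accepts even dimensions, which can be applied as a mask:
DIMENSION_MASK = 0xFFFE

def encoder_size(width, height):
    """Clip raw window dimensions to what x264 can encode."""
    return width & DIMENSION_MASK, height & DIMENSION_MASK

# The window shrinks from 421x374 to 420x374...
old_enc = encoder_size(421, 374)   # -> (420, 374)
new_enc = encoder_size(420, 374)   # -> (420, 374)

# ...so the server keeps its existing encoder pipeline:
assert old_enc == new_enc

# But the client compared the *raw* dimensions (421 != 420), tore down
# its decoder, and the next packet referenced state (the PPS) that only
# the discarded decoder had - hence "non-existing PPS 0 referenced".
```

This is why the decision to rebuild the decoder has to be based on the clipped encoding dimensions, not the raw window dimensions.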
Changed 9 years ago by
| Attachment: | actual-video-size.patch added |
|---|---|
use the correct video size when decoding (hackish solution)
comment:12 Changed 9 years ago by
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
The correct solution for the client decoding error was merged in r3607.
(note: this was not caused by scaling after all)
Also merged a workaround for the GL windows' lack of RGB support in r3612 (+ r3617 for OR windows).
As for finding the encoder limits, we'll deal with that when we find problems.
(note: this needs to go in the codec_spec rather than finding out too late, when we try to instantiate a new pipeline)
This is enough for now, will follow up in #350, etc
comment:13 Changed 9 years ago by
Note: all the csc/encoder info is now exported via xpra info:
window[2].csc=swscale
window[2].csc.dst_format=YUV444P
window[2].csc.dst_height=710
window[2].csc.dst_width=1364
window[2].csc.flags=BICUBLIN | SWS_ACCURATE_RND
window[2].csc.frames=6
window[2].csc.pixels_per_second=70503532
window[2].csc.src_format=BGRX
window[2].csc.src_height=710
window[2].csc.src_width=1364
window[2].csc.total_time_ms=82
window[2].encoder=x264
window[2].encoder.frames=6
window[2].encoder.height=710
window[2].encoder.pixels_per_second=23144142
window[2].encoder.preset=medium
window[2].encoder.profile=high444
window[2].encoder.src_format=YUV444P
window[2].encoder.total_time_ms=251
window[2].encoder.width=1364
And we also have the scores for figuring out what other encoding/csc options were available and how they ranked:
window[3].pipeline_option[0].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[0].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[0].format=YUV444P
window[3].pipeline_option[0].score=48
window[3].pipeline_option[1].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[1].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[1].format=YUV422P
window[3].pipeline_option[1].score=44
window[3].pipeline_option[2].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[2].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[2].format=YUV420P
window[3].pipeline_option[2].score=26
How they are ranked will depend on the current quality and speed settings, which are also available:
window[2].quality.cur=62
window[2].speed.cur=34
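Given scores like the ones exported above, picking the best pipeline is essentially a sort. This is a sketch using the values from the info output; the actual selection logic in xpra also weighs quality/speed settings and setup costs:

```python
# Pipeline options as exported via "xpra info" for window[3] above:
pipeline_options = [
    {"format": "YUV444P", "score": 48},
    {"format": "YUV422P", "score": 44},
    {"format": "YUV420P", "score": 26},
]

def best_option(options):
    """Pick the csc/encoder combination with the highest score."""
    return max(options, key=lambda o: o["score"])

print(best_option(pipeline_options)["format"])   # -> YUV444P
```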
comment:14 Changed 8 years ago by
| Milestone: | future → 0.10 |
|---|---|
| Version: | trunk |
(setting the correct milestone the work was completed in)
comment:15 Changed 17 months ago by
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/328
work in progress yuv codec