
Opened 6 years ago

Closed 6 years ago

Last modified 5 years ago

#328 closed enhancement (fixed)

split yuv into its own pseudo codec

Reported by: Antoine Martin
Owned by: ahuillet
Priority: minor
Milestone: 0.10
Component: core
Version:
Keywords:
Cc:

Description

Why?

  • cleaner: the code in the x264/vpx codecs is getting messy
  • we don't necessarily need yuv when using x264/vpx with the opengl client (though we would need to tweak the window backing logic to always choose opengl windows if yuv2rgb is not available)
  • makes it easier to tune the colourspace conversion step (downsampling/scaling, etc)

Attached is a test patch (it does not even compile) which shows how to get started. I don't have time to take this further right now.
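As a rough illustration of the direction proposed here, a standalone colorspace-conversion "pseudo codec" could expose a small, self-describing interface of its own, separate from the x264/vpx codecs. This is a hypothetical sketch: the class and method names are illustrative, not xpra's actual API.

```python
class ColorspaceConverter:
    """Hypothetical standalone csc pseudo-codec: converts frames from
    src_format to dst_format at fixed dimensions, independently of any
    video encoder."""

    def __init__(self, width, height, src_format, dst_format):
        self.width = width
        self.height = height
        self.src_format = src_format
        self.dst_format = dst_format
        self.frames = 0

    def get_info(self):
        # The kind of metadata the encoders already expose
        return {
            "src_format": self.src_format,
            "dst_format": self.dst_format,
            "width": self.width,
            "height": self.height,
            "frames": self.frames,
        }

    def convert(self, pixels):
        # A real implementation would call into swscale (or similar);
        # here we only track usage to show the interface shape.
        self.frames += 1
        return pixels  # placeholder: no actual conversion

csc = ColorspaceConverter(640, 480, "BGRX", "YUV420P")
csc.convert(b"\x00" * 16)
print(csc.get_info()["frames"])  # → 1
```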

Attachments (4)

yuv-codec.patch (15.5 KB) - added by Antoine Martin 6 years ago.
work in progress yuv codec
yuv-codec-v2.patch (9.6 KB) - added by Antoine Martin 6 years ago.
better patch - no longer using a yuvlib, all in cython
yuv-codec-v3.patch (93.7 KB) - added by Antoine Martin 6 years ago.
latest work in progress
actual-video-size.patch (2.2 KB) - added by Antoine Martin 6 years ago.
use the correct video size when decoding (hackish solution)


Change History (18)

Changed 6 years ago by Antoine Martin

Attachment: yuv-codec.patch added

work in progress yuv codec

comment:1 Changed 6 years ago by ahuillet

I think the "enc" should also be split from the "dec".

comment:2 Changed 6 years ago by Antoine Martin

Owner: set to Antoine Martin
Status: new → assigned

splitting the encoders from the decoders is done (that part was very easy) in r3265

Changed 6 years ago by Antoine Martin

Attachment: yuv-codec-v2.patch added

better patch - no longer using a yuvlib, all in cython

comment:3 Changed 6 years ago by ahuillet

The plan is: separate video backends into modules -

csc_* for colorspace conversion (csc_swscale, csc_nvcuda coming one day)
enc_* for encoding (enc_x264)
dec_* for decoding (dec_avcodec)

The patches on the C parts are done, with an improved API, and kept at https://github.com/ahuillet/video_backends.

Now we need to re-do the .pyx interfaces to make this work again.
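The csc_*/enc_*/dec_* naming convention above lends itself to discovering backends by prefix. A minimal sketch of that idea, with a plain list standing in for the real importable modules (the registry and helper are hypothetical, not xpra's actual loader):

```python
# Stand-in for the set of modules actually present in an installation
AVAILABLE_MODULES = ["csc_swscale", "enc_x264", "dec_avcodec", "csc_nvcuda"]

def find_backends(prefix):
    """Return all backends of a given kind: 'csc', 'enc' or 'dec'."""
    return [m for m in AVAILABLE_MODULES if m.startswith(prefix + "_")]

print(find_backends("csc"))  # → ['csc_swscale', 'csc_nvcuda']
print(find_backends("enc"))  # → ['enc_x264']
```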

comment:4 Changed 6 years ago by ahuillet

Quirks:

  • csc_ is able to support any colorspace, but enc_ and dec_ have not had that piece of functionality re-added yet.
  • the logic for changing colorspace has been removed from enc_x264 because it's not its place; it will have to be reimplemented in python

Memory management:

  • you allocate a context with init_* functions as before, and you clean up the context with free_* functions as before, but those functions *do not* free the context itself: you have to call free(context) manually
  • pass { NULL, NULL, NULL } to the csc_image functions - they allocate their own memory, which you free by calling free_csc_image({})
  • enc_x264 allocates its own memory and you don't free it afterwards (it reuses the same buffer over and over again)
  • I am not sure about dec_avcodec, will check.
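The free_*()-then-free(context) ordering above is easy to get wrong, so a thin wrapper can enforce it. This is a hypothetical illustration in Python (the real API is C called from Cython); the class and method names are made up for the sketch.

```python
class CodecContext:
    """Illustrative wrapper enforcing: free_*(context) first, then free(context)."""

    def __init__(self, name):
        self.name = name
        self.cleaned = False   # has free_*(context) been called?
        self.freed = False     # has free(context) been called?

    def free_resources(self):
        # Stands in for the C-side free_*(context) call
        self.cleaned = True

    def free(self):
        # Stands in for the manual free(context) call
        assert self.cleaned, "must call free_*(context) before free(context)"
        self.freed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Guarantee the correct teardown order on scope exit
        self.free_resources()
        self.free()

with CodecContext("enc_x264") as ctx:
    pass  # ... use the context ...
print(ctx.freed)  # → True
```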

comment:5 Changed 6 years ago by Antoine Martin

We also need to deal with reads beyond the end of the buffer caused by swscale reading 8 (or more?) bytes at a time - valgrind output:

==8159== Thread 6:
==8159== Invalid read of size 8
==8159==    at 0xF7580E1: ??? (in /usr/lib64/libswscale.so.2.1.101)
==8159==    by 0xF750038: ??? (in /usr/lib64/libswscale.so.2.1.101)
==8159==    by 0xF756465: sws_scale (in /usr/lib64/libswscale.so.2.1.101)
==8159==    by 0xFFFC441: csc_image_rgb2yuv (x264lib.c:502)
==8159==    by 0xFFF71B5: __pyx_pw_4xpra_6codecs_4x264_7encoder_7Encoder_25compress_image (encoder.c:3033)
==8159==    by 0x35B1EDD280: PyEval_EvalFrameEx (ceval.c:4098)
==8159==    by 0x35B1EDCEF0: PyEval_EvalFrameEx (ceval.c:4184)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1E6D925: function_call (funcobject.c:526)
==8159==    by 0x35B1E49C0D: PyObject_Call (abstract.c:2529)
==8159==    by 0x35B1EDA08A: PyEval_EvalFrameEx (ceval.c:4411)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==  Address 0x1acea03c is 630,732 bytes inside a block of size 630,736 alloc'd
==8159==    at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==8159==    by 0x35B3A287BB: XGetImage (GetImage.c:82)
==8159==    by 0x158E52C6: __pyx_pw_4xpra_3x11_7gtk_x11_12gdk_bindings_13PixmapWrapper_3get_image (gdk_bindings.c:6555)
==8159==    by 0x35B1E49C0D: PyObject_Call (abstract.c:2529)
==8159==    by 0x35B1EDA08A: PyEval_EvalFrameEx (ceval.c:4411)
==8159==    by 0x35B1EDCEF0: PyEval_EvalFrameEx (ceval.c:4184)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)

comment:6 Changed 6 years ago by ahuillet

New backend infrastructure WIP at https://github.com/ahuillet/xpra/commits/

Things to do :

  • re-add nogil
  • fix xmemalign() definition and calls
  • colorspace other than 420 need to be de-hardcoded
  • OpenGL backend seems to go through CSC anyway
  • packaging, debs, rpms - patches directory
  • win32 build
  • vpx
  • feed RGB to x264 directly instead of YUV444
Version 7, edited 6 years ago by ahuillet

Changed 6 years ago by Antoine Martin

Attachment: yuv-codec-v3.patch added

latest work in progress

comment:7 Changed 6 years ago by Antoine Martin

Please pull the latest changes I've sent. Updated list of remaining issues (help needed - not all of these need to be dealt with before we merge into trunk, but most will need doing at some point):

  • edge resistance: take current setting into account, so the current one (or similar ones that re-use csc or video encoder) will get a score boost
  • fire pipeline changes from timer thread if new best option wins (set new quality/speed then re-evaluate pipeline)
  • move ImageWrapper to global location (no longer a server-only class)
  • free_csc is a simple free(), so there is no real need for the currently very ugly code: merge with the XImageWrapper code, which does do a free()

Also, maybe we want/have to use separate memaligned buffers for each plane?

  • handle scaling and odd dimensions problems
    • expose dimensions mask (0xFFFE is for x264 only?)
  • BGRA does not work for vpx or x264!? segfault! - disabled by #ifdef for now (may also need rowstride work for uneven sizes?) - and even if it does work, this also affects:
  • with csc modes other than YUV420P, YUV422P or YUV444P, say BGRA: what should we expose to the client? Does it care about the mode that was used, or about the actual pixel dimensions used during compression? Maybe we should not expose the option "csc=YUV422P" but something like "csc_divs=(1,1),(2,1),(2,1)" instead - and maybe this ties in with scaling too
  • related to above: old clients without "encoding_client_options" option could still parse pixel data as long as the divs are the same? (not that we would want to care and handle this particular case - just a note)
  • ensure we zero copy pixels instead of copying from decoder/encoder, then:
    • restore nogil: verify that we are passing the same pointers via string wrappers and that we are allowed to manipulate them without the gil.
  • fix opengl (no idea why the output is blank - the yuv codepath works for the non-gl code..)
  • swscale: could be tuned using speed/quality too: use FAST_BILINEAR if quality=0 and speed=100, do re-init in set_quality/set_speed - which could be combined as: configure(quality, speed)
  • think about target bitrate mode - no real difficulty there?
  • vpx: use vpx_img_wrap to ensure we don't copy the pixels
  • client side: use similar code to choose the decoder path, and maybe choose window backing (acceleration apis generally require direct access to the backing window)
  • dec_avcodec should be able to handle vpx: make the decoder export the list of codec formats it handles (x264, vpx, etc)
  • max width and height for csc (ffmpeg ML related message), vpx and x264: find the actual limits from somewhere (API?) rather than hard-coding to 4k*4k
  • rename x264 to h264 as encoding option (and keep "x264" in encodings for backwards compatibility)
  • move memalign to lib with pxd for importing into python code when we create image buffers
  • encoder/csc init: probe and maybe test speed (also useful to test it does work!) - ie: could test csc a 1k*1k random/test buffer to figure out latency
  • vpx cannot do speed or quality for now: we should skip the calculations! (keep caps telling us what we can do for the current encoder)
  • maybe add command line options for enabling/disabling/prioritizing encoder backends
  • if the window size has changed just a little, we could send the edges as per the one-pixel workaround (which may well need generalizing anyway when dealing with things like scaling), at least a few times before doing the full re-init which sends a new (expensive) i-frame - by that point the window may have been resized some more, so we don't lose the video delta benefits.
  • the encoder specs should be static: the pipeline class should have a single copy of all encoder specs for all window sources. Then:
    • the encoder specs can have guesstimates for some enc/csc, and runtime values too: we return objects, so these can be updated continuously by the encoder/csc
    • setup cost should be dynamic: decrease as hardware encoding availability decreases (so we then become even less likely to use hardware for small windows)
  • rename rgb_format to pixel_format
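The "edge resistance" bullet above can be sketched as a score boost for pipeline options that reuse the current csc or encoder, so near-equal candidates don't cause constant pipeline churn. Illustrative only: the boost value, dict shapes, and function name are assumptions, not xpra's actual scoring code.

```python
# Boost applied to options that can reuse part of the active pipeline
CURRENT_BOOST = 5

def score_option(base_score, option, current):
    """Return base_score, boosted if the option shares the current csc or encoder."""
    score = base_score
    if current and (option["csc"] == current["csc"] or
                    option["encoder"] == current["encoder"]):
        score += CURRENT_BOOST
    return score

current = {"csc": "csc_swscale", "encoder": "enc_x264"}
keep = score_option(44, {"csc": "csc_swscale", "encoder": "enc_x264"}, current)
switch = score_option(46, {"csc": "csc_nvcuda", "encoder": "enc_nvenc"}, current)
print(keep, switch)  # → 49 46: the current pipeline wins despite a lower base score
```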

comment:8 Changed 6 years ago by ahuillet

Pull - from where?

comment:9 Changed 6 years ago by Antoine Martin

Mostly done in:

Not closing this ticket yet, what is left to do:

  • handle dimensions restrictions generically, and for scaled down dimensions too: expose it as an encoder spec property (so x264 can have its even window dimensions)
  • move ImageWrapper and maybe refactor it with the XImageWrapper
  • zero-copy pixels in and out (also look at vpx_img_wrap for vpx)
  • nogil option on non-python buffers
  • let client backing class tell the server if it can use scaling (currently just per client)
  • let client backing class tell the server which csc modes are allowed (currently just per client)
  • handle scaling in GL client code
  • handle csc client side for GL rendering (as GL cannot handle ARGB - but we may still want to use ARGB server side to reduce overall latency)

And maybe also:

  • swscale could tune the speed/flags at runtime by doing a full re-init (which is fast/cheap)
  • handle transient encoder failures: mark for retry/not
  • when trying out new pipelines, keep the old one around in case we need to fallback to it (the new one does not work out)
Last edited 6 years ago by Antoine Martin
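The last bullet above (keep the old pipeline around when trying a new one) can be sketched as a small manager that swaps back on failure. A hypothetical illustration; the class and the string pipeline labels are made up:

```python
class PipelineManager:
    """Keeps the previous pipeline as a fallback until the new one proves itself."""

    def __init__(self):
        self.current = None
        self.fallback = None

    def switch_to(self, new_pipeline):
        # Keep the old pipeline around instead of discarding it immediately
        self.fallback = self.current
        self.current = new_pipeline

    def on_failure(self):
        # New pipeline did not work out: fall back if we still can
        if self.fallback is not None:
            self.current, self.fallback = self.fallback, None
            return True
        return False

mgr = PipelineManager()
mgr.switch_to("x264+YUV444P")
mgr.switch_to("vpx+YUV420P")   # try a new pipeline
mgr.on_failure()               # it failed: restore the previous one
print(mgr.current)  # → x264+YUV444P
```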

comment:10 Changed 6 years ago by Antoine Martin

Owner: changed from Antoine Martin to ahuillet
Status: assigned → new

More improvements:

  • imagewrapper changes in r3552
  • nogil restored in r3572
  • less pixel copying in r3570 and r3568
  • size restrictions handled generically in r3588

and lots of cleanups in between.

Which leaves only the following needing fixing for this ticket:

  • the OpenGL backing (ideally, having the ability to use RGBP pixel data directly, handling scaling would also be nice)
  • #350 smarter buffer allocation/free code

Other things we can deal with later (see also lists in previous comments):

  • finding real csc/encoder limits at runtime (would be worth at least testing what works and what doesn't to make sure the codec_spec is more or less accurate - finding out later when we try to use it is more expensive)
  • client_backing could tell the server which csc modes it will handle (alternative to fixing gl backing to handle RGBP)

comment:11 Changed 6 years ago by Antoine Martin

and... scaling brought back an old bug:

non-existing PPS 0 referenced

After spending a lot of time debugging and inspecting the server-side code (which looks fine), I managed to reproduce it with client side debugging switched on and found:

paint_with_video_decoder: window dimensions have changed from (421, 374) to (420, 374)
paint_with_video_decoder: new <type 'xpra.codecs.dec_avcodec.decoder.Decoder'>(420,374,YUV420P)
paint_with_video_decoder: info={'width': 420, 'colorspace': 'YUV420P', 'type': 'x264', 'height': 374}
[h264 @ 0x7f2b8836af00] non-existing PPS 0 referenced
[h264 @ 0x7f2b8836af00] decode_slice_header error
[h264 @ 0x7f2b8836af00] no frame!
Error while decoding frame
2013-06-10 19:04:51,472 draw error
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 1004, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib64/python2.7/site-packages/xpra/client/client_window_base.py", line 258, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 348, in draw_region
    self.paint_with_video_decoder(x264_Decoder, "x264", img_data, x, y, width, height, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 258, in paint_with_video_decoder
    coding, len(img_data), width, height, options))
Exception: paint_with_video_decoder: x264 decompression error on 1044 bytes of picture data for 420x374 pixels, \
    options={'quality': 29, 'frame': 2, 'speed': 100, 'csc': 'YUV420P'}

For the record, what is happening here: we were sending the raw dimensions of the XImageWrapper we encode from as the dimensions we can paint with. These new dimensions could get clipped by x264's size restrictions (even dimensions only) to the same size as the current encoder pipeline (ie: when resizing a window to a size one pixel shorter), which means the current encoding pipeline was still valid - but this caused the client side code to erroneously think that it needed a new decoder pipeline to deal with the packet (cleaning up the old pipeline, and causing the new video packet to reference a frame that had been discarded).
A somewhat ugly solution is to pass the real dimensions via client_options, see attached patch. A better solution will be committed.
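The dimension-clipping logic at the heart of this bug can be sketched as follows: the client should compare the clipped (encoder-visible) dimensions, not the raw window dimensions, before discarding its decoder. Illustrative helpers under the assumption that x264's restriction is the 0xFFFE even-dimension mask mentioned earlier; the function names are made up.

```python
def clip_to_mask(w, h, mask=0xFFFE):
    """Clip dimensions to an encoder's size restriction (x264: even only)."""
    return w & mask, h & mask

def decoder_needs_reinit(decoder_size, new_window_size):
    """Only re-init the decoder if the *clipped* dimensions actually changed."""
    return clip_to_mask(*new_window_size) != decoder_size

# Resizing 421x374 -> 420x374: the encoded size is unchanged (420x374),
# so the existing decoder pipeline is still valid.
dec_size = clip_to_mask(421, 374)                   # (420, 374)
print(decoder_needs_reinit(dec_size, (420, 374)))   # → False: keep the decoder
print(decoder_needs_reinit(dec_size, (418, 374)))   # → True: must re-init
```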

Changed 6 years ago by Antoine Martin

Attachment: actual-video-size.patch added

use the correct video size when decoding (hackish solution)

comment:12 Changed 6 years ago by Antoine Martin

Resolution: fixed
Status: new → closed

The correct solution for the client decoding error was merged in r3607
(note: this was not caused by scaling after all)

Also merged a workaround for GL window's lack of RGB support in r3612 (+ r3617 for OR windows)

As for finding the encoder limits, we'll deal with that when we find problems.
(note: this needs to go in the codec_spec rather than finding out too late when we try to instantiate a new pipeline)

This is enough for now, will follow up in #350, etc

comment:13 Changed 6 years ago by Antoine Martin

Note: all the csc/encoder info is now exported via xpra info:

window[2].csc=swscale
window[2].csc.dst_format=YUV444P
window[2].csc.dst_height=710
window[2].csc.dst_width=1364
window[2].csc.flags=BICUBLIN | SWS_ACCURATE_RND
window[2].csc.frames=6
window[2].csc.pixels_per_second=70503532
window[2].csc.src_format=BGRX
window[2].csc.src_height=710
window[2].csc.src_width=1364
window[2].csc.total_time_ms=82
window[2].encoder=x264
window[2].encoder.frames=6
window[2].encoder.height=710
window[2].encoder.pixels_per_second=23144142
window[2].encoder.preset=medium
window[2].encoder.profile=high444
window[2].encoder.src_format=YUV444P
window[2].encoder.total_time_ms=251
window[2].encoder.width=1364


And we also have the scores for figuring out what other encoding/csc options were available and how they ranked:

window[3].pipeline_option[0].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[0].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[0].format=YUV444P
window[3].pipeline_option[0].score=48
window[3].pipeline_option[1].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[1].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[1].format=YUV422P
window[3].pipeline_option[1].score=44
window[3].pipeline_option[2].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[2].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[2].format=YUV420P
window[3].pipeline_option[2].score=26

How they are ranked depends on the current quality and speed settings, which are also available:

window[2].quality.cur=62
window[2].speed.cur=34

comment:14 Changed 5 years ago by Antoine Martin

Milestone: future → 0.10
Version: trunk

(setting the correct milestone this work was completed in)
