#328 closed enhancement (fixed)
split yuv into its own pseudo codec
| Reported by: | Antoine Martin | Owned by: | ahuillet |
|---|---|---|---|
| Priority: | minor | Milestone: | 0.10 |
| Component: | core | Version: | |
| Keywords: | | Cc: | |
Description
Why?:
- cleaner, the code in the 264/vpx codecs is getting messy
- we don't necessarily need yuv when using x264/vpx with the opengl client (though we would need to tweak the window backing logic to always choose opengl windows if yuv2rgb is not available)
- make easier to tune the colourspace conversion step (downsampling/scaling etc)
Attached is a test patch (it does not even compile) which shows how to get started. I don't have time to take this further right now.
Attachments (4)
Change History (19)
Changed 9 years ago by
| Attachment: | yuv-codec.patch added |
|---|---|
comment:2 Changed 9 years ago by
| Owner: | set to Antoine Martin |
|---|---|
| Status: | new → assigned |
Splitting the encoder from the decoders is done (that part was very easy) in r3265.
Changed 9 years ago by
| Attachment: | yuv-codec-v2.patch added |
|---|---|
A better patch - no longer using a yuvlib, all in Cython.
comment:3 Changed 9 years ago by
The plan is to separate the video backends into modules:
- csc_* for colorspace conversion (csc_swscale, with csc_nvcuda coming one day)
- enc_* for encoding (enc_x264)
- dec_* for decoding (dec_avcodec)
The patches on the C parts are done, with an improved API, and kept at https://github.com/ahuillet/video_backends.
Now we need to re-do the .pyx interfaces to make this work again.
comment:4 Changed 9 years ago by
Quirks:
- csc_ is able to support any colorspace, but enc_ and dec_ have not had that piece of functionality re-added yet.
- the logic for changing colorspace has been removed from enc_x264 because it does not belong there; it will have to be reimplemented in Python.
Memory management:
- you allocate a context with the init_* functions as before, and you clean up the context with the free_* functions as before, but those functions *do not* free the context itself: you have to call free(context) manually
- pass { NULL, NULL, NULL } to the csc_image functions - they allocate their own memory, which you release by calling free_csc_image({})
- enc_x264 allocates its own memory and you don't free it afterwards (it reuses the same buffer over and over again)
- I am not sure about dec_avcodec, will check.
comment:5 Changed 9 years ago by
We also need to deal with reads beyond the end of the buffer, caused by swscale reading 8 (or more?) bytes at a time - valgrind output:
==8159== Thread 6:
==8159== Invalid read of size 8
==8159==    at 0xF7580E1: ??? (in /usr/lib64/libswscale.so.2.1.101)
==8159==    by 0xF750038: ??? (in /usr/lib64/libswscale.so.2.1.101)
==8159==    by 0xF756465: sws_scale (in /usr/lib64/libswscale.so.2.1.101)
==8159==    by 0xFFFC441: csc_image_rgb2yuv (x264lib.c:502)
==8159==    by 0xFFF71B5: __pyx_pw_4xpra_6codecs_4x264_7encoder_7Encoder_25compress_image (encoder.c:3033)
==8159==    by 0x35B1EDD280: PyEval_EvalFrameEx (ceval.c:4098)
==8159==    by 0x35B1EDCEF0: PyEval_EvalFrameEx (ceval.c:4184)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1E6D925: function_call (funcobject.c:526)
==8159==    by 0x35B1E49C0D: PyObject_Call (abstract.c:2529)
==8159==    by 0x35B1EDA08A: PyEval_EvalFrameEx (ceval.c:4411)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159== Address 0x1acea03c is 630,732 bytes inside a block of size 630,736 alloc'd
==8159==    at 0x4A0887C: malloc (vg_replace_malloc.c:270)
==8159==    by 0x35B3A287BB: XGetImage (GetImage.c:82)
==8159==    by 0x158E52C6: __pyx_pw_4xpra_3x11_7gtk_x11_12gdk_bindings_13PixmapWrapper_3get_image (gdk_bindings.c:6555)
==8159==    by 0x35B1E49C0D: PyObject_Call (abstract.c:2529)
==8159==    by 0x35B1EDA08A: PyEval_EvalFrameEx (ceval.c:4411)
==8159==    by 0x35B1EDCEF0: PyEval_EvalFrameEx (ceval.c:4184)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)
==8159==    by 0x35B1EDDCBE: PyEval_EvalCodeEx (ceval.c:3330)
==8159==    by 0x35B1EDC2B2: PyEval_EvalFrameEx (ceval.c:4194)
comment:6 Changed 9 years ago by
New backend infrastructure WIP at https://github.com/ahuillet/xpra/commits/
Things to do :
- re-add nogil
- fix xmemalign() definition and calls
- colorspaces other than 420 need to be de-hardcoded
- OpenGL backend seems to go through CSC anyway
- packaging, debs, rpms - patches directory
- win32 build
- vpx
- feed RGB to x264 directly instead of YUV444
- quality = 100 => ultrafast in x264
comment:7 Changed 9 years ago by
Please pull the latest changes I've sent. Updated list of remaining issues (help needed - not all of these need to be dealt with before we merge into trunk, but most will need doing at some point):
- edge resistance: take current setting into account, so the current one (or similar ones that re-use csc or video encoder) will get a score boost
- fire pipeline changes from timer thread if new best option wins (set new quality/speed then re-evaluate pipeline)
- move ImageWrapper to a global location (it is no longer a server-only class)
- free_csc is a simple free(), so there is no real need for the currently very ugly code: merge with the XImageWrapper code, which does do a free().
Also, maybe we want/have to use separate memaligned buffers for each plane?
- handle scaling and odd dimensions problems
- expose dimensions mask (0xFFFE is for x264 only?)
- BGRA does not work for vpx or x264 (segfault!) - disabled by #ifdef for now (may also need rowstride work for uneven sizes?) - and even if it does work, this also affects:
- with csc modes other than YUV420P, YUV422P and YUV444P (say BGRA): what should we expose to the client? Does it care about the mode that was used, or about the actual pixel dimensions used during compression? Maybe we should not expose the option "csc=YUV422P" but something like "csc_divs=(1,1),(2,1),(2,1)" instead - and maybe this ties in with scaling too.
- related to the above: old clients without the "encoding_client_options" option could still parse the pixel data as long as the divs are the same? (not that we would want to care about and handle this particular case - just a note)
- ensure we zero copy pixels instead of copying from decoder/encoder, then:
- restore nogil: verify that we are passing the same pointers via string wrappers and that we are allowed to manipulate them without the gil.
- fix opengl (no idea why the output is blank - the yuv codepath works for the non-gl code..)
- swscale could be tuned using speed/quality too: use FAST_BILINEAR if quality=0 and speed=100, and re-init in set_quality/set_speed - which could be combined as: configure(quality, speed)
- think about target bitrate mode - no real difficulty there?
- vpx: use vpx_img_wrap to ensure we don't copy the pixels
- client side: use similar code to choose the decoder path, and maybe choose the window backing (acceleration APIs generally require direct access to the backing window)
- dec_avcodec should be able to handle vpx: make the decoder export the list of codec formats it handles (x264, vpx, etc)
- max width and height for csc (ffmpeg ML related message), vpx and x264: find the actual limits from somewhere (API?) rather than hard-coding to 4k*4k
- rename x264 to h264 as encoding option (and keep "x264" in encodings for backwards compatibility)
- move memalign to lib with pxd for importing into python code when we create image buffers
- encoder/csc init: probe and maybe test the speed (also useful to verify that it does work!) - ie: we could run the csc on a 1k*1k random/test buffer to measure latency
- vpx cannot do speed or quality for now: we should skip the calculations! (keep caps telling us what we can do for the current encoder)
- maybe add command line options for enabling/disabling/prioritizing encoder backends
- if the window size has changed just a little, we could send the edges as per the one-pixel workaround (which may well need generalizing anyway when dealing with things like scaling), at least a few times, before doing the full re-init which sends a new (expensive) i-frame - by which point the window may have been resized some more, so we don't lose the video delta benefits.
- the encoder specs should be static: the pipeline class should have a single copy of all encoder specs for all window sources. Then:
- the encoder specs can have guesstimates for some enc/csc, and runtime values too: we return objects, so these can be updated continuously by the encoder/csc
- setup cost should be dynamic: decrease as hardware encoding availability decreases (so we then become even less likely to use hardware for small windows)
- rename rgb_format to pixel_format
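The "csc_divs" suggestion from the list above can be sketched like this. The plane_dimensions helper and the per-plane (x_div, y_div) convention are hypothetical, derived only from the "csc_divs=(1,1),(2,1),(2,1)" example in the comment:

```python
def plane_dimensions(width, height, divs):
    """Return the (width, height) of each plane given (x_div, y_div) pairs."""
    return [(width // xdiv, height // ydiv) for xdiv, ydiv in divs]

# YUV422P: chroma planes are subsampled horizontally only
YUV422P_DIVS = ((1, 1), (2, 1), (2, 1))
# YUV420P: chroma planes are subsampled both horizontally and vertically
YUV420P_DIVS = ((1, 1), (2, 2), (2, 2))

print(plane_dimensions(640, 480, YUV422P_DIVS))
# -> [(640, 480), (320, 480), (320, 480)]
```

The appeal of exposing divisors instead of mode names is that a client which only knows the divs can size its buffers correctly without recognizing the colorspace string, which is exactly the backwards-compatibility point raised about old clients.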
comment:9 Changed 9 years ago by
Mostly done in:
- r3540, r3541, r3542, r3543, r3544: preparatory work
- r3545: the big one, lots of details in commit message
- r3546, r3547 and r3548: build fixes
Not closing this ticket yet, what is left to do:
- handle dimensions restrictions generically, and for scaled down dimensions too: expose it as an encoder spec property (so x264 can have its even window dimensions)
- move ImageWrapper and maybe refactor it with the XImageWrapper
- zero-copy pixels in and out (also look at vpx_img_wrap for vpx)
- nogil option on non-python buffers
- let client backing class tell the server if it can use scaling (currently just per client)
- let client backing class tell the server which csc modes are allowed (currently just per client)
- handle scaling in GL client code
- handle csc client side for GL rendering (as GL cannot handle ARGB - but we may still want to use ARGB server side to reduce overall latency)
And maybe also:
- swscale could tune the speed/flags at runtime by doing a full re-init (which is fast/cheap)
- handle transient encoder failures: mark for retry/not
- when trying out new pipelines, keep the old one around in case we need to fallback to it (the new one does not work out)
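The "keep the old pipeline around" idea from the last bullet could look something like this. This is purely illustrative - the class and method names are not from the xpra code:

```python
class PipelineManager:
    """Sketch: try a new csc/encoder pipeline, fall back to the old one on failure."""

    def __init__(self, pipeline):
        self.pipeline = pipeline   # a callable taking a frame
        self.previous = None

    def switch_to(self, new_pipeline):
        # keep the old pipeline alive instead of freeing it immediately
        self.previous = self.pipeline
        self.pipeline = new_pipeline

    def compress(self, frame):
        try:
            return self.pipeline(frame)
        except Exception:
            if self.previous is None:
                raise
            # the new pipeline did not work out: fall back to the old one
            self.pipeline, self.previous = self.previous, None
            return self.pipeline(frame)


# demo: the new pipeline fails at encode time, so we transparently fall back
mgr = PipelineManager(lambda frame: ("encoded-by-old", frame))

def broken_pipeline(frame):
    raise RuntimeError("encoder setup worked but encoding failed")

mgr.switch_to(broken_pipeline)
print(mgr.compress("frame-1"))   # -> ('encoded-by-old', 'frame-1')
```

The real implementation would also need to decide when it is finally safe to free the previous pipeline (e.g. after the first successful frame from the new one).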
comment:10 Changed 9 years ago by
| Owner: | changed from Antoine Martin to ahuillet |
|---|---|
| Status: | assigned → new |
More improvements:
- imagewrapper changes in r3552
- nogil restored in r3572
- less pixel copying in r3570 and r3568
- handle size restrictions generically r3588
and lots of cleanups in between.
Which leaves only these items needing fixing for this ticket:
- the OpenGL backing (ideally with the ability to use RGBP pixel data directly; handling scaling would also be nice)
- #350 smarter buffer allocation/free code
Other things we can deal with later (see also lists in previous comments):
- finding real csc/encoder limits at runtime (it would be worth at least testing what works and what doesn't to make sure the codec_spec is more or less accurate - finding out later, when we try to use it, is more expensive)
- client_backing could tell the server which csc modes it will handle (an alternative to fixing the GL backing to handle RGBP)
comment:11 Changed 9 years ago by
and... scaling brought back an old bug:
non-existing PPS 0 referenced
After spending a lot of time debugging and inspecting the server-side code (which looks fine), I managed to reproduce it with client side debugging switched on and found:
paint_with_video_decoder: window dimensions have changed from (421, 374) to (420, 374)
paint_with_video_decoder: new <type 'xpra.codecs.dec_avcodec.decoder.Decoder'>(420,374,YUV420P)
paint_with_video_decoder: info={'width': 420, 'colorspace': 'YUV420P', 'type': 'x264', 'height': 374}
[h264 @ 0x7f2b8836af00] non-existing PPS 0 referenced
[h264 @ 0x7f2b8836af00] decode_slice_header error
[h264 @ 0x7f2b8836af00] no frame!
Error while decoding frame
2013-06-10 19:04:51,472 draw error
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 1004, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib64/python2.7/site-packages/xpra/client/client_window_base.py", line 258, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 348, in draw_region
    self.paint_with_video_decoder(x264_Decoder, "x264", img_data, x, y, width, height, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 258, in paint_with_video_decoder
    coding, len(img_data), width, height, options))
Exception: paint_with_video_decoder: x264 decompression error on 1044 bytes of picture data for 420x374 pixels, \
  options={'quality': 29, 'frame': 2, 'speed': 100, 'csc': 'YUV420P'}
For the record, what is happening here: we were sending the raw dimensions of the XImageWrapper we encode from as the dimensions the client can paint with. These new dimensions could get clipped by x264's size restrictions (even dimensions only) to the same size as the current encoder pipeline (ie: when resizing a window to a size one pixel shorter), which means the current encoding pipeline was still valid - but it caused the client side code to erroneously think that it needed a new decoder pipeline to deal with the packet (cleaning up the old pipeline, and causing the new video packet to reference a frame that had been discarded).
A somewhat ugly solution is to pass the real dimensions via client_options, see attached patch. A better solution will be committed.
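The failure mode can be illustrated with the 0xFFFE dimension mask mentioned in an earlier comment. The helper name is hypothetical; only the mask and the 421→420 resize come from the ticket:

```python
# x264 only accepts even dimensions, which can be applied as a mask:
DIMENSION_MASK = 0xFFFE

def encoder_size(width, height):
    """Clip raw window dimensions to what x264 can encode."""
    return width & DIMENSION_MASK, height & DIMENSION_MASK

# The window shrinks from 421x374 to 420x374...
old_enc = encoder_size(421, 374)   # -> (420, 374)
new_enc = encoder_size(420, 374)   # -> (420, 374)

# ...so the server keeps its existing encoder pipeline:
assert old_enc == new_enc

# But the client compared the *raw* dimensions (421 != 420), tore down
# its decoder, and the next packet referenced state (the PPS) that only
# the discarded decoder had - hence "non-existing PPS 0 referenced".
```

This is why the decision to rebuild the decoder has to be based on the clipped encoding dimensions, not the raw window dimensions.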
Changed 9 years ago by
| Attachment: | actual-video-size.patch added |
|---|---|
use the correct video size when decoding (hackish solution)
comment:12 Changed 9 years ago by
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
The correct solution for the client decoding error was merged in r3607.
(note: this was not caused by scaling after all)
Also merged a workaround for the GL windows' lack of RGB support in r3612 (+ r3617 for OR windows).
As for finding the encoder limits, we'll deal with that when we find problems.
(note: this needs to go in the codec_spec rather than finding out too late, when we try to instantiate a new pipeline)
This is enough for now, will follow up in #350, etc
comment:13 Changed 9 years ago by
Note: all the csc/encoder info is now exported via xpra info:
window[2].csc=swscale
window[2].csc.dst_format=YUV444P
window[2].csc.dst_height=710
window[2].csc.dst_width=1364
window[2].csc.flags=BICUBLIN | SWS_ACCURATE_RND
window[2].csc.frames=6
window[2].csc.pixels_per_second=70503532
window[2].csc.src_format=BGRX
window[2].csc.src_height=710
window[2].csc.src_width=1364
window[2].csc.total_time_ms=82
window[2].encoder=x264
window[2].encoder.frames=6
window[2].encoder.height=710
window[2].encoder.pixels_per_second=23144142
window[2].encoder.preset=medium
window[2].encoder.profile=high444
window[2].encoder.src_format=YUV444P
window[2].encoder.total_time_ms=251
window[2].encoder.width=1364
And we also have the scores for figuring out what other encoding/csc options were available and how they ranked:
window[3].pipeline_option[0].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[0].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[0].format=YUV444P
window[3].pipeline_option[0].score=48
window[3].pipeline_option[1].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[1].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[1].format=YUV422P
window[3].pipeline_option[1].score=44
window[3].pipeline_option[2].csc=codec_spec(xpra.codecs.csc_swscale.colorspace_converter.ColorspaceConverter)
window[3].pipeline_option[2].encoder=codec_spec(xpra.codecs.enc_x264.encoder.Encoder)
window[3].pipeline_option[2].format=YUV420P
window[3].pipeline_option[2].score=26
How they are ranked will depend on the current quality and speed settings, which are also available:
window[2].quality.cur=62
window[2].speed.cur=34
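Given scores like the ones exported above, picking the best pipeline is essentially a sort. This is a sketch using the values from the info output; the actual selection logic in xpra also weighs quality/speed settings and setup costs:

```python
# Pipeline options as exported via "xpra info" for window[3] above:
pipeline_options = [
    {"format": "YUV444P", "score": 48},
    {"format": "YUV422P", "score": 44},
    {"format": "YUV420P", "score": 26},
]

def best_option(options):
    """Pick the csc/encoder combination with the highest score."""
    return max(options, key=lambda o: o["score"])

print(best_option(pipeline_options)["format"])   # -> YUV444P
```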
comment:14 Changed 8 years ago by
| Milestone: | future → 0.10 |
|---|---|
| Version: | trunk |
(setting the correct milestone the work was completed in)
comment:15 Changed 17 months ago by
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/328
work in progress yuv codec