split from #227
when decoding to yuv, we copy each component (Y,U,V) to a python string before returning from the cython x264 codec.pyx
, this crashes on win32, seemingly because of a bad memory access.
What is strange:
I have added some debugging to codec.pyx
(see attachment) to see how much of the data we can read before getting the crash and on which channel the crash occurs (change channels=[..]
to choose which channels are copied and which are just filled with zeroes).
For a 499x316 window, the crash seems to occur on the "V" channel (index=2) at about 63%.
I think it may just be a case of memory corruption caused by PyOpenGL
.
codec with lots of debugging to try to figure out where/why it crashes on win32
closing: it was opengl corrupting things, the crash was not caused by the yuv code.
The problem still happens to me on Win32 + OpenGL, as well as on my 32 bit Archlinux on my eeePC with Intel driver (doesn't happen on 64bit Archlinux Intel nor nVidia).
Setting a fixed quality (--quality=90) makes the crash disappear.
Removing all the free-like calls from do_clean_decoder does *not* make the crash disappear.
Patch I used is:
diff --git a/src/xpra/x264/x264lib.c b/src/xpra/x264/x264lib.c index d18a8a6..395c8eb 100644 --- a/src/xpra/x264/x264lib.c +++ b/src/xpra/x264/x264lib.c @@ -416,23 +416,23 @@ struct x264lib_ctx *init_decoder(int width, int height, int csc_fmt) void do_clean_decoder(struct x264lib_ctx *ctx) { if (ctx->frame) { - avcodec_free_frame(&ctx->frame); +// avcodec_free_frame(&ctx->frame); ctx->frame = NULL; } if (ctx->codec_ctx) { - avcodec_close(ctx->codec_ctx); - av_free(ctx->codec_ctx); +// avcodec_close(ctx->codec_ctx); +// av_free(ctx->codec_ctx); ctx->codec_ctx = NULL; } if (ctx->yuv2rgb) { - sws_freeContext(ctx->yuv2rgb); +// sws_freeContext(ctx->yuv2rgb); ctx->yuv2rgb = NULL; } } void clean_decoder(struct x264lib_ctx *ctx) { do_clean_decoder(ctx); - free(ctx); +// free(ctx); }
By the way, the crash's backtrace on Linux 32bits is the following:
#0 0xb7cdec66 in __memcpy_ia32 () from /usr/lib/libc.so.6 #1 0x00014fc0 in ?? () #2 0xb7e9fc2c in PyString_FromStringAndSize () from /usr/lib/libpython2.7.so.1.0 #3 0xb72692b0 in __pyx_pw_4xpra_4x264_5codec_7Decoder_5decompress_image_to_yuv () from /home/arthur/SD/xpra/src/build/lib.linux-i686-2.7/xpra/x264/codec.so #4 0xb7e92991 in PyCFunction_Call () from /usr/lib/libpython2.7.so.1.0
Looking at the dissassembly (for lack of line number info in codec.so), we see that the crashes occurs upon reading the decoded video.
Setting MEMALIGN to 0 in x264lib.c on Linux doesn't help - the crash happens all the same.
With VPX, on Linux 32 bits:
#0 0xb7cdec66 in __memcpy_ia32 () from /usr/lib/libc.so.6 #1 0x000184c0 in ?? () #2 0xb7e9fc2c in PyString_FromStringAndSize () from /usr/lib/libpython2.7.so.1.0 #3 0xb72d807b in __pyx_pw_4xpra_3vpx_5codec_7Decoder_5decompress_image_to_yuv () from /home/arthur/SD/xpra/src/build/lib.linux-i686-2.7/xpra/vpx/codec.so #4 0xb7e92991 in PyCFunction_Call () from /usr/lib/libpython2.7.so.1.0 #5 0xb7ef0758 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0 #6 0xb7ef007b in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0 #7 0xb7ef007b in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
It is worth noting that VPX doesn't go through libav at all. This strongly hints at a problem *not* in libav nor the way we use it.
Disabling swscale context creation doesn't help either (we don't use it with GL, so we don't need it).
diff --git a/src/xpra/x264/x264lib.c b/src/xpra/x264/x264lib.c index d18a8a6..1199cda 100644 --- a/src/xpra/x264/x264lib.c +++ b/src/xpra/x264/x264lib.c @@ -376,7 +376,7 @@ int init_decoder_context(struct x264lib_ctx *ctx, int width, int height, int csc ctx->height = height; ctx->csc_format = csc_fmt; ctx->csc_algo = get_csc_algo_for_quality(100); - ctx->yuv2rgb = sws_getContext(ctx->width, ctx->height, ctx->csc_format, ctx->width, ctx->height, PIX_FMT_RGB24, + ctx->yuv2rgb = NULL;//sws_getContext(ctx->width, ctx->height, ctx->csc_format, ctx->width, ctx->height, PIX_FMT_ avcodec_register_all();
On Linux 32bits, I have disabled all GL commands, while still going through the GL codepath, in order to check if perhaps our GL code was to blame. The patch below was applied:
diff --git a/src/xpra/gl/gl_window_backing.py b/src/xpra/gl/gl_window_backing.py index d209735..d1a8ccb 100644 --- a/src/xpra/gl/gl_window_backing.py +++ b/src/xpra/gl/gl_window_backing.py @@ -73,6 +73,7 @@ class GLPixmapBacking(PixmapBacking): PixmapBacking.init(self, w, h) def gl_init(self): + return drawable = self.gl_begin() w, h = self.size debug("GL Pixmap backing size: %d x %d, drawable=%s", w, h, drawable) @@ -94,12 +95,14 @@ class GLPixmapBacking(PixmapBacking): return drawable def close(self): + return PixmapBacking.close(self) self.remove_shader() self.glarea = None self.glconfig = None def remove_shader(self): + return if self.yuv_shader: drawable = self.gl_init() if drawable: @@ -111,6 +114,7 @@ class GLPixmapBacking(PixmapBacking): self.yuv_shader = None def gl_begin(self): + return if self.glarea is None: return None #closed already drawable = self.glarea.get_gl_drawable() @@ -124,6 +128,7 @@ class GLPixmapBacking(PixmapBacking): return drawable def gl_end(self, drawable): + return if drawable.is_double_buffered(): drawable.swap_buffers() else: @@ -131,6 +136,7 @@ class GLPixmapBacking(PixmapBacking): drawable.gl_end() def gl_expose_event(self, glarea, event): + return debug("gl_expose_event(%s, %s)", glarea, event) area = event.area x, y, w, h = area.x, area.y, area.width, area.height @@ -143,11 +149,12 @@ class GLPixmapBacking(PixmapBacking): self.gl_end(drawable) def _do_paint_rgb24(self, img_data, x, y, width, height, rowstride, options, callbacks): + return gc = self._backing.new_gc() self.glarea.window.draw_rgb_image(gc, x, y, width, height, gdk.RGB_DITHER_NONE, img_data, rowstride) def do_video_paint(self, coding, img_data, x, y, w, h, options, callbacks): - debug("do_video_paint: options=%s, decoder=%s", options, type(self._video_decoder)) + log.error("do_video_paint: options=%s, decoder=%s", options, type(self._video_decoder)) err, rowstrides, img_data = self._video_decoder.decompress_image_to_yuv(img_data, options) csc_pixel_format = options.get("csc_pixel_format", -1) #this needs to be done here so we still hold the video_decoder lock: @@ -161,6 +168,7 @@ class GLPixmapBacking(PixmapBacking): gc = self._backing.new_gc() self.glarea.window.draw_rgb_image(gc, x, y, width, height, gdk.RGB_DITHER_NONE, img_data, rowstride) def do_video_paint(self, coding, img_data, x, y, w, h, options, callbacks): - debug("do_video_paint: options=%s, decoder=%s", options, type(self._video_decoder)) + log.error("do_video_paint: options=%s, decoder=%s", options, type(self._video_decoder)) err, rowstrides, img_data = self._video_decoder.decompress_image_to_yuv(img_data, options) csc_pixel_format = options.get("csc_pixel_format", -1) #this needs to be done here so we still hold the video_decoder lock: @@ -161,6 +168,7 @@ class GLPixmapBacking(PixmapBacking): gobject.idle_add(self.do_gl_paint, x, y, w, h, img_data, rowstrides, pixel_format, callbacks) def do_gl_paint(self, x, y, w, h, img_data, rowstrides, pixel_format, callbacks): + return #this function runs in the UI thread, no video_decoder lock held drawable = self.gl_init() if not drawable: @@ -192,6 +200,7 @@ class GLPixmapBacking(PixmapBacking): raise Exception("invalid pixel format: %s" % pixel_format) def update_texture_yuv(self, img_data, x, y, width, height, rowstrides, pixel_format): + return window_width, window_height = self.size assert self.textures is not None, "no OpenGL textures!" @@ -253,6 +262,7 @@ class GLPixmapBacking(PixmapBacking): glFlush() def render_image(self, rx, ry, rw, rh): + return debug("render_image %sx%s at %sx%s pixel_format=%s", rw, rh, rx, ry, self.pixel_format) if self.pixel_format not in (YUV420P, YUV422P, YUV444P): #not ready to render yet
What is interesting is - we get the exact same crash. It's harder to reproduce, I don't know why, but it crashes eventually. Note that everything is disabled but do_video_paint.
What valgrind complains about is the following:
==9308== Invalid read of size 4 ==9308== at 0x402D4E8: memcpy (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so) ==9308== by 0x40C6C2B: PyString_FromStringAndSize (in /usr/lib/libpython2.7.so.1.0) ==9308== by 0x677735F: __pyx_pw_4xpra_4x264_5codec_7Decoder_5decompress_image_to_yuv (in /home/arthur/SD/xpra/src/build/lib.linux-i686-2.7/xpra/x264/codec.so) ==9308== by 0x40B9990: PyCFunction_Call (in /usr/lib/libpython2.7.so.1.0) ==9308== by 0x4117757: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0) ==9308== by 0x411707A: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0) ==9308== by 0x411707A: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0) ==9308== by 0x411707A: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0) ==9308== by 0x4117FBC: PyEval_EvalCodeEx (in /usr/lib/libpython2.7.so.1.0) ==9308== by 0x41161B3: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0) ==9308== by 0x4117FBC: PyEval_EvalCodeEx (in /usr/lib/libpython2.7.so.1.0) ==9308== by 0x41161B3: PyEval_EvalFrameEx (in /usr/lib/libpython2.7.so.1.0) ==9308== Address 0xc9046e4 is 4 bytes after a block of size 58,688 alloc'd ==9308== at 0x4029234: memalign (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so) ==9308== by 0x402934E: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so) ==9308== by 0x68A2D87: av_malloc (in /usr/lib/libavutil.so.52.13.100) ==9308== by 0x6F9C3FC: avcodec_default_get_buffer (in /usr/lib/libavcodec.so.54.86.100) ==9308== by 0x6E98689: ??? (in /usr/lib/libavcodec.so.54.86.100) ==9308== by 0x6E9AD35: ??? (in /usr/lib/libavcodec.so.54.86.100) ==9308== by 0x6D35A83: ??? (in /usr/lib/libavcodec.so.54.86.100) ==9308== by 0x6D37B98: ??? (in /usr/lib/libavcodec.so.54.86.100) ==9308== by 0x6D3AC84: ??? (in /usr/lib/libavcodec.so.54.86.100) ==9308== by 0x6F9EC1D: avcodec_decode_video2 (in /usr/lib/libavcodec.so.54.86.100) ==9308== by 0x677D586: decompress_image (in /home/arthur/SD/xpra/src/build/lib.linux-i686-2.7/xpra/x264/codec.so) ==9308== by 0x7FFFFFFF: ???
cleaned up patch
With this patch, the crash doesn't seem to happen.
moves decoding to the UI thread
ensures we call gtk/gl code only from the UI thread and that decode is not from UI thread
Please see the patch above.
Please also see:
glFlush
calls
Summary of where we are at (let's try to re-confirm each one):
Things to try:
do_gl_paint
calls via idle_add
YUVImage
wrapper where we memcopy the pixel data (as we do for RGB), will make it easier to track free()
of the YUV pixel data and overall lifecycle, try not freeing it at all (memleak it)
glTexSubImage2D
/ glBindTexture
: when does this actually read the pixel data? (glFlush?
needed?)
anything else?
glTexSubImage2D
is synchronous, BindTexture
doesn't touch any CPU data buffer
for i in 1, 100000 do reinit_decoder() decode_video() paint_video_with_gl() paint_triangle_with_gl()
Either paint the real video, or a just a simple triangle that doesn't use the data buffer (no glTexSubImage). That way we'll perhaps manage to get a smaller reproduction case. I'm suspect it won't crash until we move decode_video() to another thread.
use a wrapper class for returning pixel data and copy the data when we get it
With the patch above, I am not getting any crashes on win32 (patch is for vpx only for now), so either the cython docs are wrong (or have changed) or I am not understanding it well enough. I thought it said that this stanza:
doutvY = (<char *>dout[0])[:self.height * outstrides[0]]
would cause the dout buffer to be copied into a python string. Or was it generating a python string from the buffer!? I can't be that stupid, can I? Copying the data fixes the problem.. which makes it sound like we were still using the x264/vpx buffer when the next picture decoding came in, causing the race (and also explaining why decoding in the UI thread made the bug disappear as it serialized things).
uses the YUVImage wrapper for both vpx and x264
Never mind, the gl-yuv-cython-fix
did not fix anything, just seems to have made it harder to crash at the time with my test case (mplayer video) - maybe the window dimensions needed to be bigger?
The yuvimage-wrapper
patch does the same with x264.
I now have a test-case on 64-bit Linux that causes a crash with vpx and a single still frame (and I have gone as far back as r2400 and it still occurred). It really does look like a race as simply removing --no-speaker
also prevents the crash! (at least for a while?)
run scripts/xpra attach --no-mmap tcp:127.0.0.1:10000 --no-mmap --encoding=vpx --no-speaker
With a 1280x720 video playing gives me in gdb:
#0 __memcpy_sse2 () at ../sysdeps/x86_64/memcpy.S:427 #1 0x000000325fe8ea3a in memcpy (__len=483840, __src=0x7fffdc1e7330, __dest=0x7fffe40e99e4) at /usr/include/bits/string3.h:51 #2 PyString_FromStringAndSize (size=<optimized out>, str= 0x7fffdc1e7330 "\207\207\207\207\204\200|xrrw\177\207\214\222\226\230\224\220\215\207\206\205\204\206\206\206\206\206\206\206\206\206\206\206\206\206\206\206\206\205\205\205\205\205\205\205\205\206\206\206\206\206\206\206\206\205\205\205\205\205\205\205\205\205\205\205\205\205\205\205\205\204\204\204\204\204\204\203\202\202\202\204\204\205\205\205\205\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\202\202\203\177gj\225\242"...) at /usr/src/debug/Python-2.7.3/Objects/stringobject.c:95 #3 PyString_FromStringAndSize (str= 0x7fffdc1e7330 "\207\207\207\207\204\200|xrrw\177\207\214\222\226\230\224\220\215\207\206\205\204\206\206\206\206\206\206\206\206\206\206\206\206\206\206\206\206\205\205\205\205\205\205\205\205\206\206\206\206\206\206\206\206\205\205\205\205\205\205\205\205\205\205\205\205\205\205\205\205\204\204\204\204\204\204\203\202\202\202\204\204\205\205\205\205\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\204\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\203\202\202\203\177gj\225\242"..., size=483840) at /usr/src/debug/Python-2.7.3/Objects/stringobject.c:57 #4 0x00007fffee93afd9 in __pyx_pf_4xpra_3vpx_5codec_7Decoder_4decompress_image_to_yuv (__pyx_v_self=0xfaacf0, __pyx_v_input=<optimized out>, __pyx_v_options=<optimized out>) at xpra/vpx/codec.c:1753 #5 __pyx_pw_4xpra_3vpx_5codec_7Decoder_5decompress_image_to_yuv (__pyx_v_self=<xpra.vpx.codec.Decoder at remote 0xfaacf0>, __pyx_args= ('\x10\xe1\x04\x9d\x01*\x00\x05\xd0\x02\x00\x07\x08\x85\x85\x88\x85\x84\x88\x02\x03.7\xfc?\xea\xbd\xcb\xf1\xbf\xb9\xfd\x8d\xed\xe5\xfe\x97\xf2_\xfc\x07u\x1f\xf1\x7f\x0e{\xbd\xe9\xbf\xf7~\xa7\x1f\xa1\x7f^\xfe\xf3\xf8\xe7\xa8Q\xa0\xcf\xe5\x9f\xd6?\xd9\xff\x91\xf5\xe7\x97t\x94\x7f\\\xff\xae\xf4\xc4\xc8_\xe9\x7f9\xff\xf1<T?\xd8~\xa9\xf75\xfc\xc7\xfa\xbf\xfa\xbe\x13>Q\xd8\xf7\xf4\xd7\xf9\x8f,9(\xf2O\xcd\xff\xa4jMG\xdf\xec\x1f\xc0~\xc3\xfd\xd0\xd6\xbd>-7\xe4W\r\x04[=gq\x80\x8cG\xe4\x07\xca\xee/^i?\x93_;<\xe0\xf3@\xfe\xc5\xf9c\xefE1_\xf2\xbf\xeb\x1f\x8f\xdf\xe2\xbf\xff\xf1\xae\xf4\xb1]?\xaf\xfe\x87`l\xc7\xf3\xff\xd8\xff\x99\xff1\xfe\xc7\xfcg\xfe\xdf\xf4\x7f3\\\xab\xe0\xef\xab>\xff\xfe[\xfd\'\xf7\xdf\xfc\x9f\xed\xbf\xe7~b~\xbf\xfe\x9f\xfa/\xdeo\xf3\xfe\x8f\xfc\x1f\xf9\xaf\xfa\x7f\xe7\xbfw\xff\xc5\xfb\xd2y\x9f\xe9?\xe3?\xbb\xff\x94\xffs\xfe\x0b\xff\x17\xef\xff\xe6_\xf6\xbf\xf3\xff\xce\x7f\xbb\xff\xad\xfes\xe9\xdf\xf7O\xef\x1f\xef\xbf\xc5\xfe\xee\xfd\x03\x7fM\xfe\x99\xfeS\xfb\xcf\xfa?\xf7\xbf\xe3\xff\xfc|\xde\x7f\xea\x...(truncated), __pyx_kwds=0x0) at xpra/vpx/codec.c:1565 #6 0x000000325fedd281 in call_function (oparg=<optimized out>, pp_stack=0x7fffe9e5b728) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4098 #7 PyEval_EvalFrameEx (f=<optimized out>, throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.3/Python/ceval.c:2740 #8 0x000000325fedcef1 in fast_function (nk=<optimized out>, na=9, n=<optimized out>, pp_stack=0x7fffe9e5b928, func=<function at remote 0x1bd8d70>) at /usr/src/debug/Python-2.7.3/Python/ceval.c:4184
With the yuvimage-wrapper
patch, you get a more helpful backtrace showing that the problem occurs on the memcopy of one of the channels.
Fixed in r2992 for vpx and r2993 for x264
Please confirm so I can close (and I will blacklist "nouveau" until it gets fixed)
Added --opengl=on|off|auto
command line option and config file support in r2997
Everything seems to work fine.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/229