Xpra: Ticket #321: opengl rendering corruption for small rgb areas

Fedora 18 x86_64 boxes with a radeon 5450 and fglrx drivers. One shows the problem, the other one does not. The one that is corrupted looks like each new rgb paint is XORed with the previous one. This makes me suspicious of the XOR code we do on the rgb24 data, but why would it only break on some systems?

The server is started with:

xpra start :10 --start-child=xterm

I am attaching the apitrace for both cases, captured with:

apitrace trace xpra attach --opengl=yes --no-mmap

I pressed the space bar a number of times to get the cursor to move then I disconnected.

This has apparently also been seen on win32.



Sun, 21 Apr 2013 14:55:36 GMT - Antoine Martin: attachment set

apitrace on the system that displays ok


Sun, 21 Apr 2013 14:56:03 GMT - Antoine Martin: attachment set

apitrace on the system which shows black squares instead


Sun, 21 Apr 2013 14:59:11 GMT - Antoine Martin: description changed


Sun, 21 Apr 2013 15:11:41 GMT - Antoine Martin:

Nope: I removed our XOR code so we never XOR anything before sending and I still get the corruption.


Sun, 21 Apr 2013 15:17:07 GMT - Antoine Martin: attachment set

after pressing space a number of times


Sun, 21 Apr 2013 15:17:26 GMT - Antoine Martin: attachment set

after pressing space a number of times + one more redraw


Sun, 21 Apr 2013 15:23:23 GMT - Antoine Martin:

This is really odd, every time I press space in the xterm, we paint just one character 6x13:

2013-04-21 22:14:11,809 _do_paint_rgb24(x=299, y=2, width=6, height=13 rowstride=18)
2013-04-21 22:14:11,950 _do_paint_rgb24(x=299, y=2, width=6, height=13 rowstride=18)
2013-04-21 22:14:11,955 _do_paint_rgb24(x=305, y=2, width=6, height=13 rowstride=18)
2013-04-21 22:14:12,102 _do_paint_rgb24(x=305, y=2, width=6, height=13 rowstride=18)
2013-04-21 22:14:12,107 _do_paint_rgb24(x=311, y=2, width=6, height=13 rowstride=18)
2013-04-21 22:14:12,265 _do_paint_rgb24(x=311, y=2, width=6, height=13 rowstride=18)
2013-04-21 22:14:12,270 _do_paint_rgb24(x=317, y=2, width=6, height=13 rowstride=18)
2013-04-21 22:14:12,405 _do_paint_rgb24(x=317, y=2, width=6, height=13 rowstride=18)
2013-04-21 22:14:12,410 _do_paint_rgb24(x=323, y=2, width=6, height=13 rowstride=18)
2013-04-21 22:14:12,561 _do_paint_rgb24(x=323, y=2, width=6, height=13 rowstride=18)

But from the output above, it looks like we paint it twice everytime, not sure why but I suspect that this is the damage sequence we get from the client (moving cursor, then showing the new character in the new cursor location).


The whole line changes as I type! The black and white squares alternate! /raw-attachment/ticket/321/xterm-step1.png

/raw-attachment/ticket/321/xterm-step2.png

FYI: The same thing occurs if I use another printable character, the white squares show the character and the black ones stay black.


Mon, 22 Apr 2013 09:25:02 GMT - Antoine Martin: attachment set

replaying the apitrace on the system that has display problems


Mon, 22 Apr 2013 09:27:07 GMT - Antoine Martin:

Replaying the trace on systems which do not have problems shows the correct/expected back buffers, but replaying the trace on the system which had the display problems initially shows the exact same display corruption again: /raw-attachment/ticket/321/qapitrace-replay-error.png


Mon, 22 Apr 2013 10:08:20 GMT - Antoine Martin:

And replaying the xterm-ok.trace on the "bad" system gives the same display corruption, starting with "Frame 6" where we get 2 black squares on the same line instead of just one (the cursor).

This same trace replays without showing any artifacts on other systems.


Mon, 22 Apr 2013 17:54:58 GMT - Antoine Martin:

r3126 solves the problem for Linux boxes by disabling double-buffering. What seems to be happening is that when we call drawable.swap_buffers() we get the previous buffer, not a "new clean copy" of the current one to work on. But only on some cards/systems....

The problem is that on MS Windows, we need double-buffering to get a working GL context, at least that was the case when I last tried. Those with access to native MS Windows boxes with OpenGL support, please try running:

set XPRA_OPENGL_DOUBLE_BUFFERED=0
xpra attach ...

and

set XPRA_OPENGL_DOUBLE_BUFFERED=1
xpra attach ...

And report back on:


Mon, 22 Apr 2013 22:44:12 GMT - alas:

Testing with a native Windows 7, with r3128 client-side and r3128 fedora server-side, with XPRA_OPENGL_DOUBLE_BUFFERED=0 I don't see the xterm weirdness, the session info shows

Client OpenGL - OpenGL property 'shading language version' is missing

And the xpra session seems to behave well (as it usually behaves anytime with OpenGL disabled, at least).

With XPRA_OPENGL_DOUBLE_BUFFERED=1 The xterm weirdness appears, but the session info shows the Client OpenGL to be green checked with info about my video card (NVIDIA CORPORATION / ION/PCIe/SSE2, in case this turns out to be related to specific video cards.)

However, with double buffering set =1, the first time I tried to test xpra crashed entirely. The second time I got a set of error messages:

2013-04-22 14:26:13,490 using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2013-04-22 14:26:14,203 OpenGL paint error: exception: access violation reading
0x08D6B000
Traceback (most recent call last):
  File "xpra\gl\gl_window_backing.pyc", line 210, in do_gl_paint
  File "xpra\gl\gl_window_backing.pyc", line 243, in update_texture_yuv
  File "latebind.pyx", line 32, in OpenGL_accelerate.latebind.LateBind.__call__ (src\latebind.c:667)
  File "wrapper.pyx", line 308, in OpenGL_accelerate.wrapper.Wrapper.__call__ (src\wrapper.c:5356)
WindowsError: exception: access violation reading 0x08D6B000
2013-04-22 14:26:14,736 OpenGL paint error: GLError(
        err = 1281,
        description = 'invalid value',
        baseOperation = glTexSubImage2D,
        pyArgs = (
                GL_TEXTURE_RECTANGLE_ARB,
                0,
                0,
                0,
                940,
                822,
                GL_LUMINANCE,
                GL_UNSIGNED_BYTE,
                "\x80\x80\x7f\x7f\x80\x80\x80\x80\x80...,
        ),
        cArgs = (
                GL_TEXTURE_RECTANGLE_ARB,
                0,
                0,
                0,
                940,
                822,
                GL_LUMINANCE,
                GL_UNSIGNED_BYTE,
                "\x80\x80\x7f\x7f\x80\x80\x80\x80\x80...,
        ),
        cArguments = (
                GL_TEXTURE_RECTANGLE_ARB,
                0,
                0,
                0,
                940,
                822,
                GL_LUMINANCE,
                GL_UNSIGNED_BYTE,
                "\x80\x80\x7f\x7f\x80\x80\x80\x80\x80...,
        )
)
Traceback (most recent call last):
  File "xpra\gl\gl_window_backing.pyc", line 210, in do_gl_paint
  File "xpra\gl\gl_window_backing.pyc", line 266, in update_texture_yuv
  File "latebind.pyx", line 32, in OpenGL_accelerate.latebind.LateBind.__call__ (src\latebind.c:667)
  File "wrapper.pyx", line 315, in OpenGL_accelerate.wrapper.Wrapper.__call__ (src\wrapper.c:5478)
GLError: GLError(
        err = 1281,
        description = 'invalid value',
        baseOperation = glTexSubImage2D,
        pyArgs = (
                GL_TEXTURE_RECTANGLE_ARB,
                0,
                0,
                0,
                940,
                822,
                GL_LUMINANCE,
                GL_UNSIGNED_BYTE,
                "\x80\x80\x7f\x7f\x80\x80\x80\x80\x80...,
        ),
        cArgs = (
                GL_TEXTURE_RECTANGLE_ARB,
                0,
                0,
                0,
                940,
                822,
                GL_LUMINANCE,
                GL_UNSIGNED_BYTE,
                "\x80\x80\x7f\x7f\x80\x80\x80\x80\x80...,
        ),
        cArguments = (
                GL_TEXTURE_RECTANGLE_ARB,
                0,
                0,
                0,
                940,
                822,
                GL_LUMINANCE,
                GL_UNSIGNED_BYTE,
                "\x80\x80\x7f\x7f\x80\x80\x80\x80\x80...,
        )
)
...
2013-04-22 14:48:15,039 OpenGL paint error: exception: access violation reading
0x08DD0000
Traceback (most recent call last):
  File "xpra\gl\gl_window_backing.pyc", line 210, in do_gl_paint
  File "xpra\gl\gl_window_backing.pyc", line 243, in update_texture_yuv
  File "latebind.pyx", line 32, in OpenGL_accelerate.latebind.LateBind.__call__ (src\latebind.c:667)
  File "wrapper.pyx", line 308, in OpenGL_accelerate.wrapper.Wrapper.__call__ (src\wrapper.c:5356)
WindowsError: exception: access violation reading 0x08DD0000

The third try it worked fine with no error messages.

The fourth try it gave only a few error messages:

2013-04-22 15:06:02,763 OpenGL paint error: exception: access violation reading
0x08D0E000
Traceback (most recent call last):
  File "xpra\gl\gl_window_backing.pyc", line 210, in do_gl_paint
  File "xpra\gl\gl_window_backing.pyc", line 243, in update_texture_yuv
  File "latebind.pyx", line 32, in OpenGL_accelerate.latebind.LateBind.__call__ (src\latebind.c:667)
  File "wrapper.pyx", line 308, in OpenGL_accelerate.wrapper.Wrapper.__call__ (src\wrapper.c:5356)
WindowsError: exception: access violation reading 0x08D0E000

Going back and trying without any setting of the double buffering one way or the other caused the xpra session to crash on the first attempt (trying to connect to the same server-session as I'd been using all along).

Trying again, without setting double buffering either way with a fresh server-side xpra produced the xterm weirdness, with the OpenGL green checked & the video card info displayed in the session info.

Looks like the GL only loads with the double buffering enabled (which I assume is the default) - but then the xterm displays weirdness.

(Oddly, running r3128 client-side but r3098 server side doesn't seem to produce any weirdness in the xterms, though the OpenGL is green checked. This was tested on a native Windows 7 with an ATI Technologies Inc. / ATI Mobility Radeon HD 4200 Series video card.)

(Testing again with the ATI video card and r3128 client & server-side both displays no xterm character weirdness, but the cursor does turn block-y. Perhaps it is a videocard issue?)

(Testing by connecting the NVIDIA card client to the same server session that the ATI was connected to results in the NVIDIA client showing the same behavior as before, with characters and cursor all turning into bank-style bar codes... but with the ATI card it is only the cursor that displays the distortion.)


Tue, 23 Apr 2013 04:58:35 GMT - Antoine Martin:

Thanks!

Looks like the GL only loads with the double buffering enabled (which I assume is the default)

"Good", that's what I wanted to verify. (double-buffering is currently the default for win32 only) So we need to disable double-buffering to fix the visual corruption, but that does not work on win32... so I think I will just make GL default to "off" on win32, unless ahuillet can find out what we're doing wrong or at least a workaround/fix.


WindowsError: exception: access violation reading 0x08D6B000 File "xpra\gl\gl_window_backing.pyc", line 243, in update_texture_yuv

That's this line which is updating the textures:

glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_LUMINANCE, width/div_w, height/div_h, 0, GL_LUMINANCE, GL_UNSIGNED_BYTE, 0)

ahuillet: maybe the buffer is still too small and we're reading past the end? (modulo size of reads in long-words/other?) Maybe we should add an assert to verify, or/and use rowstride instead of width?


Oddly, running r3128 client-side but r3098 server side doesn't seem to produce any weirdness in the xterms, though the OpenGL is green checked

Hmmm, I think that's unlikely - could this be just a fluke?

but the cursor does turn block-y

"block-y"?

Testing by connecting the NVIDIA card client (...) turning into bank-style bar codes. but with the ATI card it is only the cursor that displays the distortion

Weird, so far I had only seen the barcode effect on Linux with ATI, never with nvidia!


Tue, 23 Apr 2013 08:53:05 GMT - Antoine Martin:

r3131 disables OpenGL on win32 by default (can still be enabled with --opengl=yes)

And other platforms use single-buffer mode, which doesn't have visual corruption on any cards (that I know of).

I guess that will do until someone figures out what we need to do to get double-buffering to work reliably with all cards and OSes.


Tue, 23 Apr 2013 13:12:16 GMT - ahuillet:

The issue has to do with an incorrect assumption that we make. We paint (partial) RGB updates on the backbuffer and swap it in front. This only works if the contents of the backbuffer are not changed when swapping from front-to-back, and GLX/WGL explicitely indicate that the contents of the backbuffer are *undefined* after swapping. On Linux it so happens that we are lucky and are not noticing the problem. On Windows, which I suspect is doing triple buffering by default, our backbuffer is the N-2 frame, which explain the issues we've observed with xterm.

The solution to this is to either restore the contents of the frontbuffer to the backbuffer (feasible but not ideal in terms of performance), or draw RGB24 updates to the front buffer directly.


Tue, 23 Apr 2013 13:13:43 GMT - ahuillet: attachment set

Expose backbuffer incorrect assumptions


Tue, 23 Apr 2013 13:17:04 GMT - ahuillet: attachment set

Frontbuffer RGB24 rendering with no swap - expose_event still a problem


Tue, 23 Apr 2013 13:18:23 GMT - ahuillet:

Attached a.patch (applied on top of the previous one) fixes the issues in Xterm by doing frontbuffer rendering for RGB24 and calling glFlush (instead of swap_buffers). However it still breaks with menus in Firefox, due to expose_event() being called when closing the menu.

Recipe to reproduce: start firefox, get it drawn as YUV, click on "bookmarks", menu will appear as RGB, don't move the cursor and click again to close the menu. The Firefox window will become black.

Below is the debug log of the problem. At 15:20:23,079, I have my Firefox window and am about to click "Bookmarks". At 15:20:41,484, I have clicked Bookmarks and the menu appeared OK At 15:20:54,239, I have clicked again on Bookmarks to close the menu, and the Firefox windows is black except for the part that was previously occluded by the menu (see screenshot)

15:20:22,936 do_video_paint: options={'quality': 95, 'frame': 18, 'speed': 29, 'csc_pixel_format': 5}, decoder=<type 'xpra.x264.codec.Decoder'>
15:20:22,971 GL Pixmap backing size: 1088 x 718, drawable=<gtk.gdk.GLWindowImplX11 object at 0x7f8e2062caf0 (GdkGLWindowImplX11 at 0x180eb80)>
15:20:22,971 GL creating new YUV textures for pixel format 444 using divs=((1, 1), (1, 1), (1, 1))
15:20:22,974 Assigning fragment program
15:20:22,979 render_image 1088x718 at 0x0 pixel_format=444
15:20:22,979 render_image texture_size=(1088, 718), size=(1088, 718)
15:20:22,980 SWAPPING BUFFERS NOW
15:20:23,078 _do_paint_rgb24(x=185, y=1, width=87, height=23 rowstride=261)
15:20:23,079 GL Pixmap backing size: 1088 x 718, drawable=<gtk.gdk.GLWindowImplX11 object at 0x7f8e2062cc80 (GdkGLWindowImplX11 at 0x180eb80)>
15:20:41,301 _do_paint_rgb24(x=185, y=1, width=87, height=23 rowstride=261)
15:20:41,302 GL Pixmap backing size: 1088 x 718, drawable=<gtk.gdk.GLWindowImplX11 object at 0x7f8e2062cc80 (GdkGLWindowImplX11 at 0x180eb80)>
15:20:41,483 _do_paint_rgb24(x=185, y=1, width=87, height=23 rowstride=261)
15:20:41,484 GL Pixmap backing size: 1088 x 718, drawable=<gtk.gdk.GLWindowImplX11 object at 0x7f8e2062cc80 (GdkGLWindowImplX11 at 0x180eb80)>
[swscaler @ 0x7f8e200c4960] Warning: data is not aligned! This can lead to a speedloss
15:20:54,142 GL Pixmap backing size: 1088 x 718, drawable=<gtk.gdk.GLWindowImplX11 object at 0x7f8e2062ccd0 (GdkGLWindowImplX11 at 0x180eb80)>
15:20:54,142 gl_expose_event(<DrawingArea object at 0x2c704b0 (GtkDrawingArea at 0x2c12e00)>, <gtk.gdk.Event at 0x156cfa8: GDK_EXPOSE area=[185, 24, 265, 205]>) drawable=<gtk.gdk.GLWindowImplX11 object at 0x7f8e2062ccd0 (GdkGLWindowImplX11 at 0x180eb80)>
15:20:54,143 render_image 265x205 at 185x24 pixel_format=444
15:20:54,143 render_image texture_size=(1088, 718), size=(1088, 718)
15:20:54,144 SWAPPING BUFFERS NOW
15:20:54,172 _do_paint_rgb24(x=185, y=1, width=87, height=23 rowstride=261)
15:20:54,172 GL Pixmap backing size: 1088 x 718, drawable=<gtk.gdk.GLWindowImplX11 object at 0x7f8e2062ccd0 (GdkGLWindowImplX11 at 0x180eb80)>
15:20:54,239 _do_paint_rgb24(x=185, y=1, width=87, height=23 rowstride=261)
15:20:54,239 GL Pixmap backing size: 1088 x 718, drawable=<gtk.gdk.GLWindowImplX11 object at 0x7f8e2062ccd0 (GdkGLWindowImplX11 at 0x180eb80)>

Bonus question: what is this swscale message doing there?


Tue, 23 Apr 2013 14:38:10 GMT - ahuillet: attachment set

Screenshot of the issue with firefox when closing menu. Contents of the window obfuscated somewhat for privacy.


Tue, 23 Apr 2013 15:15:52 GMT - Antoine Martin:

bonus question: swscale messages can show up anywhere since they are running in a separate thread, and the first frame to use non-aligned addresses will show up within the first few frames.


Tue, 23 Apr 2013 15:27:31 GMT - ahuillet:

Who is using swscale with OpenGL?


Tue, 23 Apr 2013 15:32:06 GMT - Antoine Martin:

I recently added this code to avoid using OpenGL windows for some window types (drop down menus, tooltips, etc) where the GL setup cost is too high for what is generally just a short-lived window:

#only enable GL for normal windows:
window_types = metadata.get("window-type", ())
if "_NET_WM_WINDOW_TYPE_NORMAL" in window_types:
    ClientWindowClass = self.GLClientWindowClass

Tue, 23 Apr 2013 16:07:46 GMT - ahuillet: attachment set

Attached patch works *pretty well* for double buffered mode.


Tue, 23 Apr 2013 16:11:54 GMT - ahuillet:

frontbuf.patch helps tremendously, fixing known issues with RGB painting and double buffering by doing frontbuffer painting. The Firefox issues are actually unrelated to RGB painting, and related to YUV painting upon receiving expose events, which at the moment is done on partial frame, and on the backbuffer. Thanks to the addition of glClear() call this means that painting called by expose_event fills the window with black. This was fixed by doing frontbuffer rendering for expose events as well, and in a general manner we would probably be well advised to do frontbuffer rendering for everything (which the attached patch doesn't do, and isn't required as far as I can tell).

Last problem with this patch is an interaction between RGB and YUV painting in expose_events calls. Doing expose_events will repaint based on the YUV texture, even though the latest update we received was RGB, so we can theoretically end up with old pixels on screen. I have been able to reproduce the issue by moving my xterm window, but do not believe it is a major problem.

In order to fix everything, we would have to move to a model where we can make the following assumption: "what we paint is available in memory for further updates", which doesn't hold at the moment (RGB updates are painted on frontbuf and thrown away on next refresh, YUV updates are painted on backbuf and not accessible except by reading back from frontbuf).

Solutions to that issue are to use FBOs (render as YUV/RGB to texture/FBO, then render a textured quad to screen), or readback from frontbuffer. Performance will have to be assessed as this can be a fairly heavy operation.


Tue, 23 Apr 2013 16:14:13 GMT - Antoine Martin: milestone changed

Let's try again for 0.10? (early in the dev cycle this time)


Tue, 23 Apr 2013 16:25:33 GMT - ahuillet:

Or the next iteration of 0.9?


Wed, 24 Apr 2013 08:17:16 GMT - ahuillet:

Here is how we will win the war:

05:34 <@totaam> paint yuv: paint, swap, paint again (so both back and front are up to date)
05:35 <@totaam> paint rgb: paint to both (still both up to date)
05:35 <@totaam> expose: paint from back to front?
05:35 <@totaam> or if not, is there a way to update the yuv textures from the back buffer?
05:36 <@totaam> (seems like there should be)
05:36 <@totaam> then when we need to expose, we can just update the textures and use regular render code
05:37 <@totaam> IIRC, expose needs swap, so we may have to go for the second option
05:38 <@totaam> http://www.opengl.org/discussion_boards/showthread.php/171930-How-to-copy-backbuffer-contents-into-a-texture
(...)
07:20 <ahuillet> YUV paints to backbuffer, presents the buffer, then paints again
07:20 <ahuillet> RGB paints to frontbuf and backbuf
07:20 <ahuillet> (and never swaps!! critical)
07:21 <ahuillet> expose can readback from backbuf, present the buffer, and paint again

The main codepath (YUV) is largely unaffected. The alternate codepath (RGBY) is entirely unaffected. Expose events are going to be somewhat slower, which is a pity, and will have to be benchmarked.


Wed, 24 Apr 2013 19:40:09 GMT - ahuillet:

The plan works, but: expose happens when the window was partially offscreen. And in that case, GLX specifies that we lose the contents of the parts of the backbuffer that were offscreen. So using the backbuffer as temporary storage is not going to work, we have to use FBOs to guarantee full frame correctness. This will come at a price.


Thu, 25 Apr 2013 05:31:52 GMT - Antoine Martin: priority, component changed


Fri, 12 Jul 2013 07:14:13 GMT - ahuillet:

As far as I can tell this is done now. Should we not close it?


Fri, 12 Jul 2013 07:23:25 GMT - Antoine Martin: status changed; resolution set

Mostly in r3149, right? (closing)


Sat, 23 Jan 2021 04:51:37 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/321