Xpra: Ticket #2481: Laggy unless XPRA_OPENGL_DRAW_REFRESH=0

I appreciate there is some difficulty with drivers on Python 3, and, for some systems, refreshing more frequently solves the problem.

However, I found the recent change (3.0.2) to set the default of XPRA_OPENGL_DRAW_REFRESH=1 makes scrolling uncomfortably laggy in many programs. This is all local.

I have not experienced any issues by setting this to 0 (yet). Should I expect any strangeness if I keep this set? Is this an issue that a whitelist or blacklist could help?

Sun, 17 Nov 2019 02:37:24 GMT - Antoine Martin: owner changed

owner changed from Antoine Martin to aerusso

Please specify the exact distro you are using. Please also include the output of xpra opengl.

Sun, 17 Nov 2019 03:13:49 GMT - aerusso: attachment set

attachment set to xpra-opengl.out

Output of xpra opengl

Sun, 17 Nov 2019 03:14:40 GMT - aerusso:

This is Debian unstable, and I've attached the output.

Sun, 17 Nov 2019 03:32:37 GMT - Antoine Martin: owner, status changed

owner changed from aerusso to Antoine Martin
status changed from new to assigned

The change that enabled XPRA_OPENGL_DRAW_REFRESH=1 for v3 is in r24352. More details in #2466. According to ticket:2466#comment:5, this also affects some Linux drivers.. So r24433 switches back to XPRA_OPENGL_DRAW_REFRESH=0 in trunk, but I am very reluctant to apply this to v3 for now, as I don't want to spend days making new release after new release just to toggle this switch back and forth when people hit issues..

I think that the slowness is caused by the multiple calls to present_fbo. When scrolling, we're sending the window updates using many chunks. We should bundle all of them together for the refresh code path, the same way that we already do for the non-refresh paint code path.

Sun, 17 Nov 2019 17:23:53 GMT - Antoine Martin: owner, status changed

owner changed from Antoine Martin to aerusso
status changed from assigned to new

@aerusso: does r24434 fix things? It is a better solution than XPRA_OPENGL_DRAW_REFRESH=0. The only other option worth a try if you have repaint issues with this patch is XPRA_REPAINT_ALL=1.

There are beta builds with this change here: https://xpra.org/beta.

Sun, 17 Nov 2019 20:20:32 GMT - aerusso:

I am confused: testing r24434 with XPRA_OPENGL_DRAW_REFRESH=0 (which is what r24433 sets, if I'm reading things right), is excellent (I didn't actually apply that patch, I only applied r24434). Should I be testing with development tip?

With XPRA_OPENGL_DRAW_REFRESH=1, it seems better, but not as good as XPRA_OPENGL_DRAW_REFRESH=0.

This is a relatively high end machine---Ryzen 7 3700X.

Mon, 18 Nov 2019 03:25:24 GMT - Antoine Martin:

Should I be testing with development tip?

Applying r24434 should be enough.

which is what r24433 sets, if I'm reading things right

Damn, I meant to revert r24433, that's now done in r24435.

With XPRA_OPENGL_DRAW_REFRESH=1, it seems better, but not as good as XPRA_OPENGL_DRAW_REFRESH=0.

Better than what?

I think r24434 is what I am going to backport to v3, it's the safest solution: we always request a repaint, but doing it efficiently. I just need to re-test with wayland first, because GTK does weird things with coordinates because of CSD.

Mon, 18 Nov 2019 04:15:44 GMT - aerusso:

Sorry for the confusion. From best to worse, in terms of lag:

XPRA_OPENGL_DRAW_REFRESH=0, with or without r24434 (I can't tell if there is any difference, though I don't really see how there could be)
XPRA_OPENGL_DRAW_REFRESH=1, with r24434
XPRA_OPENGL_DRAW_REFRESH=1, without r24434

Is this a bug in the Python 3 implementation of GTK3? It seems like, at least most of the time on X, a refresh is not needed.

By the way, coalescing all the refreshes into a single one helps the lag enough that I wouldn't have noticed (at least on my machine):

    def after_draw_refresh(self, success, message=""):
        plog("after_draw_refresh(%s, %s) pending_refresh=%s",
             success, message, self.pending_refresh)
        backing = self._backing
        if not backing:
            return
        if REPAINT_ALL or self._client.xscale!=1 or self._client.yscale!=1 or is_Wayland():
            #easy: just repaint the whole window:
            rw, rh = self.get_size()
            self.idle_add(self.queue_draw_area, 0, 0, rw, rh)
            return
        pr = self.pending_refresh
        self.pending_refresh = []
        bx, by, bw, bh = None, None, None, None
        for x, y, w, h in pr:
            if bx is None:
                bx, by, bw, bh = x, y, w, h
                continue
            if x < bx:
                bw += bx - x
                bx = x
            if y < by:
                bh += by - y
                by = y
            if w > bw:
                bw = w
            if h > bh:
                bh = h
        rx, ry, rw, rh = self._client.srect(bx, by, bw, bh)
        #if self.window_offset:
        #    rx -= self.window_offset[0]
        #    ry -= self.window_offset[1]
        self.idle_add(self.queue_draw_area, rx, ry, rw, rh)

I don't do very much graphics, so I don't know if this will break something somewhere else.

Mon, 18 Nov 2019 07:37:23 GMT - Antoine Martin:

XPRA_OPENGL_DRAW_REFRESH=0, with or without r24434 (I can't tell if there is any difference, though I don't really see how there could be)

With r24434, when draw refresh is enabled =1 and with opengl, we don't paint the fbo on-screen until all the screen updates in the same group have been received. This should be faster for when many screen updates arrive in the same group (ie: when scrolling).

By the way, coalescing all the refreshes into a single one helps the lag enough that I wouldn't have noticed (at least on my machine):

I assume you mean without r24434? Or is it better than r24434 with draw refresh enabled? (which is the new default)

I don't do very much graphics, so I don't know if this will break something somewhere else.

I don't think the code is correct, we already have a cython accelerated rectangle merging function in xpra: browser/xpra/trunk/src/xpra/rectangle.pyx (with unit tests). In some cases, it is best not to coalesce: imagine a window updating a single pixel in its 4 corners, we would ask the graphics system to repaint all of it. Seems like a contrived example but similar behaviour happens more than you would think because of the way applications decorate their windows. It's something I have looked into in the past, and it can be costly, especially when opengl is disabled (high CPU cost). We could still merge something like this, as long as it isn't the default.

Mon, 18 Nov 2019 09:14:40 GMT - Antoine Martin:

The backport to v3 is in: r24441. Will test some more.

Mon, 18 Nov 2019 11:08:39 GMT - aerusso:

Or is it better than r24434 with draw refresh enabled? (which is the new default)

Yes, for the workload I tested, draw refresh enabled and merging all of the rectangles is best. It winds up essentially being equivalent to XPRA_REPAINT_ALL=1.

The workload is Google Chrome, when scrolling. For example, here is a typical pending_requests list

[(0, 0, 1500, 846), (0, 119, 1500, 4), (0, 564, 1500, 16), (0, 601, 1500, 28), (0, 648, 1500, 4), (0, 697, 1500, 16), (0, 740, 1500, 2), (0, 780, 1500, 10), (0, 798, 1500, 13), (0, 817, 1500, 10)]

It's really bizarre to me that updating 10 regions is actually something that I can "feel". Now that I'm digging into queue_draw_area, I suspect that there's some slow code path there, maybe getting and initializing the gl context? I didn't look very hard, but gl_init looks like it could be slow.

I think the ideal solution probably renames queue_draw_area to queue_draw_areas and makes it take a list of rectangular regions that all get updated using the same gl context.

I would guess that it only makes sense to consider merging rectangles after collecting the refreshes within a single gl context creation.

Mon, 18 Nov 2019 11:18:15 GMT - Antoine Martin: owner, status changed

owner changed from aerusso to Antoine Martin
status changed from new to assigned

I didn't look very hard, but gl_init looks like it could be slow.

Of course, you're right.

Mon, 18 Nov 2019 11:51:10 GMT - Antoine Martin:

The slowdown comes from calling gl_show which ends up calling swap_buffers(). The other thing I had forgotten about is that we already do a full window redraw when double-buffered (which is everywhere):

        if self.is_double_buffered() or bw!=ww or bh!=wh:
            #refresh the whole window:
            rectangles = ((0, 0, bw, bh), )
        else:
            #paint just the rectangles we have accumulated:
            rectangles = self.pending_fbo_paint

Now, we don't want to penalize the non-opengl case, so maybe the code just needs to be delegated to the backend class so opengl can take the shortcut.

Tue, 19 Nov 2019 15:25:49 GMT - Antoine Martin: owner, status changed

owner changed from Antoine Martin to aerusso
status changed from assigned to new

@aerusso: how about r24460? (this solution may help with #2373)

This does a single queue_draw_area for the full window when using the opengl backend whilst preserving the multiple-regions code for the non-opengl case. (tested briefly)

Fri, 22 Nov 2019 09:38:21 GMT - Antoine Martin:

This may have caused a regression on macos: #2491.

Sun, 24 Nov 2019 19:27:53 GMT - aerusso:

I tested with Debian unstable's version and added r24434, r24435, and r24460. I cannot "feel" any lag with these patches.

Will double buffering always be enabled? If not, I think there may still be value in collecting the multiple calls to queue_draw_area into a single gl context.

Also: thanks for xpra! It's really a fantastic piece of software!

Mon, 25 Nov 2019 14:49:49 GMT - Antoine Martin: status changed; resolution set

status changed from new to closed
resolution set to fixed

r24460 has been backported to the v3 branch in r24483.

Since this fixes the symptoms, I am going to close this ticket. We can deal with regressions and bugs in other tickets and link back here if needed.

Sat, 23 Jan 2021 05:52:28 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/2481