Xpra: Ticket #132: idle_add is a latency bottleneck in damage_to_data

When we get many small areas in damage requests, we may end up scheduling hundreds of calls to idle_add from damage_to_data, which causes the latency to shoot right up.
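
The effect is easy to reproduce outside of xpra. The sketch below is an illustration only (it assumes the PyGTK-era gobject bindings that xpra used at the time): a worker thread schedules a few hundred one-shot idle callbacks, each carrying its own time.time() so the callback can report how long it sat in the queue (the same measurement trick used in the patch further down).

    # Minimal illustration (not xpra code): a worker thread queues hundreds of
    # idle_add callbacks, and each callback measures how long it waited before
    # the main loop got around to running it.
    import time
    import threading
    import gobject

    gobject.threads_init()

    def process_region(req_time, index):
        # the scheduling time is passed in as an argument, so the callback
        # can report its own queueing delay in microseconds
        print("region %3i: idle_add latency=%i" % (index, int(1000 * 1000 * (time.time() - req_time))))
        return False            # one-shot: do not reschedule

    def damage_producer():
        # hundreds of small "damage regions", each scheduled individually
        for i in range(500):
            gobject.idle_add(process_region, time.time(), i)

    threading.Thread(target=damage_producer).start()

    loop = gobject.MainLoop()
    gobject.timeout_add(2000, loop.quit)    # stop the demo after two seconds
    loop.run()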

Solutions:

Or a combination of the above? Or??

Once done, we can deal with #135



Sun, 03 Jun 2012 08:39:14 GMT - Antoine Martin: status, description changed


Sun, 03 Jun 2012 10:32:10 GMT - Antoine Martin:

Using this simple patch:

Index: src/xpra/server.py
===================================================================
--- src/xpra/server.py	(revision 883)
+++ src/xpra/server.py	(working copy)
@@ -539,9 +540,10 @@
             except Exception, e:
                 log.error("damage_to_data: error processing region %s: %s", damage, e)
                 continue
-            gobject.idle_add(self._process_damage_regions, wid, window, ww, wh, regions, coding, sequence, options)
+            gobject.idle_add(self._process_damage_regions, time.time(), wid, window, ww, wh, regions, coding, sequence, options)
-    def _process_damage_regions(self, wid, window, ww, wh, regions, coding, sequence, options):
+    def _process_damage_regions(self, req_time, wid, window, ww, wh, regions, coding, sequence, options):
+        log.info("process_damage_regions: idle_add latency=%s", int(1000*1000*(time.time()-req_time)))
         if self._damage_cancelled.get(wid, 0)>=sequence:
             log("process_damage_regions: dropping damage request with sequence=%s", sequence)
             return

And running the xterm performance test, we can clearly see the idle_add latency (in microseconds) shooting right up:

process_damage_regions: idle_add latency=39
process_damage_regions: idle_add latency=601
process_damage_regions: idle_add latency=793
process_damage_regions: idle_add latency=1873
process_damage_regions: idle_add latency=8841
process_damage_regions: idle_add latency=2904
process_damage_regions: idle_add latency=3646
process_damage_regions: idle_add latency=4523
process_damage_regions: idle_add latency=4644
process_damage_regions: idle_add latency=5743
Uh-oh, our size doesn't fit window sizing constraints: 979x633 vs 979x628
Uh-oh, our size doesn't fit window sizing constraints: 979x633 vs 979x628
unknown modifier: meta
process_damage_regions: idle_add latency=135900
process_damage_regions: idle_add latency=135923
process_damage_regions: idle_add latency=135952
process_damage_regions: idle_add latency=106001
process_damage_regions: idle_add latency=289604
(...)
process_damage_regions: idle_add latency=1162372
process_damage_regions: idle_add latency=1150017
process_damage_regions: idle_add latency=124438
process_damage_regions: idle_add latency=93
process_damage_regions: idle_add latency=28
process_damage_regions: idle_add latency=41
process_damage_regions: idle_add latency=46
process_damage_regions: idle_add latency=59

Then the latency suddenly drops back down... now I need to figure out why!


Sun, 03 Jun 2012 10:46:14 GMT - Antoine Martin:

A simple:

time.sleep(0)

added in r884 ensures we don't queue too many idle_add calls before giving them a chance to run.
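
Roughly, the idea looks like this (a sketch only, not the actual r884 change):

    # Illustrative sketch: yield the GIL after scheduling each region so the
    # main loop can run the idle callback before hundreds more pile up behind it.
    import time
    import gobject

    def schedule_regions(regions, process_damage_regions):
        for region in regions:
            gobject.idle_add(process_damage_regions, time.time(), region)
            time.sleep(0)   # give the main loop a chance to drain the queue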

A better solution is probably to remove the threading from damage_to_data and keep track of the idle_add latency as a measure of how smoothly the system is running, batching more when it climbs too high.
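
Something along these lines, where the measured latency drives the batching (names and thresholds here are illustrative, not xpra's actual batching code):

    # Hedged sketch of the batching idea: treat the measured idle_add latency
    # as a congestion signal and increase the damage batch delay when it climbs.
    import time

    class DamageBatcher(object):
        def __init__(self):
            self.batch_delay = 5            # ms to wait before processing a batch
            self.last_idle_latency = 0.0    # seconds, updated from the idle callback

        def record_idle_latency(self, req_time):
            # called at the start of the idle callback; req_time was passed to idle_add
            self.last_idle_latency = time.time() - req_time

        def adjust_batch_delay(self):
            if self.last_idle_latency > 0.010:
                # main loop is falling behind: batch more aggressively
                self.batch_delay = min(100, self.batch_delay * 2)
            else:
                # system is keeping up: slowly return to the minimum delay
                self.batch_delay = max(5, self.batch_delay - 1)
            return self.batch_delay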


Sun, 03 Jun 2012 11:33:10 GMT - Antoine Martin: attachment set

removes threading from the damage codepath


Sun, 03 Jun 2012 14:05:47 GMT - Antoine Martin: attachment set

updated patch which dumps the damage latency at various points in the pipeline


Sun, 03 Jun 2012 14:52:55 GMT - Antoine Martin:

With this code in _get_rgb_rawdata:

    start = time.time()   # timing reference (assumed to be taken here, just before the pixbuf is created)
    pixbuf = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, width, height)
    log.info("get_rgb_rawdata(..) creating pixbuf object took %s ms", int(1000*(time.time()-start)))
    pixbuf.get_from_drawable(pixmap, pixmap.get_colormap(), x, y, 0, 0, width, height)
    log.info("get_rgb_rawdata(..) pixbuf.get_from_drawable took %s ms", int(1000*(time.time()-start)))
    raw_data = pixbuf.get_pixels()
    log.info("get_rgb_rawdata(..) pixbuf.get_pixels took %s ms", int(1000*(time.time()-start)))
    rowstride = pixbuf.get_rowstride()
    log.info("get_rgb_rawdata(..) took %s ms", int(1000*(time.time()-start)))

We can see:

get_rgb_rawdata(..) creating pixbuf object took 0 ms
get_rgb_rawdata(..) pixbuf.get_from_drawable took 182 ms
get_rgb_rawdata(..) pixbuf.get_pixels took 183 ms
get_rgb_rawdata(..) took 184 ms

So the expensive call is pixbuf.get_from_drawable() in gtk.
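
The cost is easy to confirm in isolation; here is a quick standalone timing of the same call against the root window (an illustration only, not part of any patch):

    # Standalone timing of pixbuf.get_from_drawable() (PyGTK 2), grabbing the
    # root window instead of a window's backing pixmap, just to show the cost
    # of the pixel copy itself.
    import time
    import gtk.gdk

    root = gtk.gdk.get_default_root_window()
    width, height = root.get_size()
    pixbuf = gtk.gdk.Pixbuf(gtk.gdk.COLORSPACE_RGB, False, 8, width, height)

    start = time.time()
    pixbuf.get_from_drawable(root, root.get_colormap(), 0, 0, 0, 0, width, height)
    print("get_from_drawable on %ix%i took %i ms" % (width, height, int(1000 * (time.time() - start))))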

Solutions:


Sun, 03 Jun 2012 15:24:23 GMT - Antoine Martin: attachment set

patch for protocol.py so that the latency is reported once the packet has actually been sent by the network layer
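
The underlying idea, sketched here with made-up names (Protocol, queue_packet and flush are not xpra's actual API), is to keep the original damage timestamp with the packet and only report the latency once the data has actually been written out:

    # Illustrative only: carry the damage timestamp along with the packet and
    # log the latency after the write completes, so network send time is included.
    import time

    class Protocol(object):
        def __init__(self, write):
            self._write = write         # callable that writes bytes to the socket
            self._queue = []            # list of (data, damage_time) pairs

        def queue_packet(self, data, damage_time):
            self._queue.append((data, damage_time))

        def flush(self):
            while self._queue:
                data, damage_time = self._queue.pop(0)
                self._write(data)
                # latency measured once the packet has left the network layer:
                print("damage latency including send: %i ms" % int(1000 * (time.time() - damage_time)))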


Sun, 03 Jun 2012 16:47:57 GMT - Antoine Martin:

r886 commits the unthreaded damage code and adds the latency measurement via "xpra info"; see the changeset commit message for details.


Fri, 20 Jul 2012 05:14:41 GMT - Antoine Martin: status changed; resolution set

good enough in trunk with the latency minimization code from #153


Sat, 23 Jan 2021 04:46:24 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/132