Xpra: Ticket #377: damage latency spinning out of control with png encoding

I've experienced this with Firefox and YouTube, but I guess any page with lots of small moving elements will trigger it.

Eventually the damage latency will reach >10s!

Here are some xpra info nuggets:

window[2].batch.damage-processing-latency=(304, 143)
window[2].batch.damage-processing-latency.recent=1987
window[2].batch.damage-out-latency=(251, 122)
window[2].batch.damage-out-latency.recent=1988

Now, PNG has never worked well for this sort of content, but it should still perform better than this. It may also be a sign that something more fundamental is broken.

Thu, 11 Jul 2013 15:31:48 GMT - Antoine Martin: status changed

status changed from new to assigned

I think the problem is that the slowness of PNG encoding accumulates very quickly:

2013-07-11 22:13:01,822 damage_in_latency: 682x51 in 18ms
2013-07-11 22:13:01,872 damage_in_latency: 525x387 in 47ms
2013-07-11 22:13:01,935 damage_in_latency: 607x523 in 93ms
2013-07-11 22:13:02,021 damage_in_latency: 835x550 in 166ms
2013-07-11 22:13:02,087 damage_in_latency: 607x523 in 214ms
2013-07-11 22:13:02,153 damage_in_latency: 607x522 in 260ms
2013-07-11 22:13:02,248 damage_in_latency: 835x550 in 342ms
2013-07-11 22:13:02,311 damage_in_latency: 607x521 in 389ms
2013-07-11 22:13:02,383 damage_in_latency: 607x521 in 444ms
2013-07-11 22:13:02,469 damage_in_latency: 835x550 in 514ms
2013-07-11 22:13:02,530 damage_in_latency: 607x523 in 558ms
2013-07-11 22:13:02,594 damage_in_latency: 607x523 in 607ms
2013-07-11 22:13:02,645 damage_in_latency: 835x550 in 642ms
2013-07-11 22:13:02,680 damage_in_latency: 607x523 in 659ms
2013-07-11 22:13:02,712 damage_in_latency: 607x523 in 676ms
2013-07-11 22:13:02,766 damage_in_latency: 835x550 in 712ms
2013-07-11 22:13:02,796 damage_in_latency: 607x523 in 722ms
2013-07-11 22:13:02,828 damage_in_latency: 607x522 in 741ms
2013-07-11 22:13:02,878 damage_in_latency: 835x550 in 774ms
2013-07-11 22:13:02,909 damage_in_latency: 607x522 in 789ms
2013-07-11 22:13:02,942 damage_in_latency: 607x523 in 801ms
2013-07-11 22:13:02,991 damage_in_latency: 835x550 in 835ms
2013-07-11 22:13:03,021 damage_in_latency: 607x523 in 851ms
2013-07-11 22:13:03,055 damage_in_latency: 607x523 in 861ms
2013-07-11 22:13:03,109 damage_in_latency: 835x550 in 897ms
2013-07-11 22:13:03,142 damage_in_latency: 607x522 in 919ms
2013-07-11 22:13:03,177 damage_in_latency: 607x521 in 937ms
2013-07-11 22:13:03,230 damage_in_latency: 835x550 in 938ms
2013-07-11 22:13:03,291 damage_in_latency: 816x550 in 954ms
2013-07-11 22:13:03,357 damage_in_latency: 835x550 in 994ms
2013-07-11 22:13:03,420 damage_in_latency: 841x611 in 1048ms
2013-07-11 22:13:03,461 damage_in_latency: 607x522 in 1071ms
2013-07-11 22:13:03,517 damage_in_latency: 835x550 in 1108ms
2013-07-11 22:13:03,554 damage_in_latency: 607x522 in 1128ms
2013-07-11 22:13:03,592 damage_in_latency: 607x522 in 1152ms
2013-07-11 22:13:03,647 damage_in_latency: 835x550 in 1187ms
2013-07-11 22:13:03,686 damage_in_latency: 607x520 in 1210ms
2013-07-11 22:13:03,722 damage_in_latency: 607x522 in 1227ms
2013-07-11 22:13:03,795 damage_in_latency: 835x550 in 1281ms
2013-07-11 22:13:03,838 damage_in_latency: 607x523 in 1312ms

Here, in 2 seconds, we went from taking ~20ms to process each request to 1.3s!
And because of that, we end up with a high batch delay to compensate, but it kicks in too late:

batch.actual_delay.90p=632
batch.actual_delay.avg=516
batch.actual_delay.cur=4120
batch.actual_delay.max=7811
batch.actual_delay.min=46
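
The growth in the log above is what you would expect from a first-in, first-out queue whose service time exceeds the interval between arrivals: each request waits behind all the earlier ones, so its latency grows by roughly (encode time - arrival interval) per request. A toy model with constant, made-up timings (not measured from xpra) shows the effect; `simulate_latency` is a hypothetical helper written for illustration only:

```python
def simulate_latency(arrival_interval_ms, encode_ms, n):
    """Return the latency (queueing + encoding) of each of n requests.

    Toy model: requests arrive every arrival_interval_ms and a single
    encoder processes them one at a time, each taking encode_ms.
    """
    latencies = []
    encoder_free_at = 0  # time at which the encoder finishes its current job
    for i in range(n):
        arrival = i * arrival_interval_ms
        start = max(arrival, encoder_free_at)  # wait while the encoder is busy
        finish = start + encode_ms
        encoder_free_at = finish
        latencies.append(finish - arrival)
    return latencies


print(simulate_latency(40, 60, 5))  # encoder slower than arrivals: [60, 80, 100, 120, 140]
print(simulate_latency(40, 20, 3))  # encoder keeps up: [20, 20, 20]
```

As long as the encoder is slower than the arrival rate, the latency never stabilises; raising the batch delay only helps once it slows arrivals below the encoding rate.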

What we need is a mechanism similar to the one we use to wait for ACKs from the client before processing more damage requests (which uses the average client latency to decide how long is too long): we should not have more damage requests in flight than we can process in a reasonable amount of time. The difficulty is that the encoding speed varies greatly depending on the size of the regions. Also, if we end up batching too much, we may end up sending a full screen update.
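
One way to sketch this admission check, assuming encoding cost scales roughly with region area: keep a per-pixel cost estimate from recent encodes and only queue more damage while the estimated time to drain the backlog stays under a budget. The `EncodeBacklog` class, its methods, the 100ms budget and the 50ms-per-megapixel starting guess are all hypothetical and are not taken from the xpra source:

```python
from collections import deque


class EncodeBacklog:
    """Hypothetical admission check: only queue more damage for encoding
    while the estimated time to drain the backlog stays under a budget."""

    def __init__(self, budget_ms=100.0, initial_ms_per_mpixel=50.0, history=20):
        self.budget_ms = budget_ms
        self.initial_ms_per_mpixel = initial_ms_per_mpixel  # arbitrary starting guess
        self.samples = deque(maxlen=history)  # recent (pixels, elapsed_ms) pairs
        self.pending_pixels = 0

    def ms_per_pixel(self):
        # Normalise by area, since encoding time varies with region size.
        pixels = sum(p for p, _ in self.samples)
        if pixels == 0:
            return self.initial_ms_per_mpixel / 1e6
        return sum(ms for _, ms in self.samples) / pixels

    def estimated_backlog_ms(self):
        return self.pending_pixels * self.ms_per_pixel()

    def can_queue(self, width, height):
        """True if queueing this region keeps the estimated drain time within budget."""
        if self.pending_pixels == 0:
            return True  # never starve: an idle encoder always accepts work
        extra_ms = width * height * self.ms_per_pixel()
        return self.estimated_backlog_ms() + extra_ms <= self.budget_ms

    def queued(self, width, height):
        self.pending_pixels += width * height

    def encoded(self, width, height, elapsed_ms):
        self.pending_pixels -= width * height
        self.samples.append((width * height, elapsed_ms))


backlog = EncodeBacklog(budget_ms=100)
backlog.queued(1000, 1000)
backlog.encoded(1000, 1000, 80.0)      # learned cost: ~80ms per megapixel
backlog.queued(1000, 1000)             # ~80ms of work now pending
print(backlog.can_queue(400, 400))     # ~12.8ms more fits in the budget
print(backlog.can_queue(1000, 1000))   # ~80ms more does not: keep batching
```

Regions rejected by `can_queue` would stay in the batch instead of being sent to the encoder, which is where the risk of accumulating a full screen update comes from.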

Thu, 11 Jul 2013 15:38:04 GMT - ahuillet:

As I've said in the past - PNG is a slow algorithm.

Mon, 15 Jul 2013 08:20:08 GMT - Antoine Martin: status changed; resolution set

status changed from assigned to closed
resolution set to fixed

r3855 solves this by moving the damage latency factors out of the window performance statistics and into the window-source's dynamic delay handling: we keep batching until the number of regions waiting for encoding gets low enough.

This prevents the dramatic increase in the batch delay when the encoder is performing poorly: we only increase it as much as needed to keep the buffers busy. This is a much better solution than leaving it to the timer to evaluate, since the timer often adjusts too late.
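
A minimal sketch of the idea described above, assuming a batch timer that re-checks the encode queue each time it fires: if too many regions are still waiting, the delay is extended by a small step rather than jumping to a large value. `next_batch_action`, `batch_until_drained`, `max_pending`, `step_ms` and the `max_delay_ms` cap are hypothetical illustrations, not the code from r3855; the cap is an added assumption so that batching cannot go on forever:

```python
def next_batch_action(pending_regions, max_pending, delay_ms, step_ms=5, max_delay_ms=1000):
    """Hypothetical decision taken when the batch timer fires.

    Returns ("send", delay_ms) when the encode queue is short enough to accept
    the batched damage, or ("wait", new_delay_ms) to keep batching a bit longer.
    """
    if pending_regions <= max_pending or delay_ms >= max_delay_ms:
        return ("send", delay_ms)
    # Grow the delay in small steps, only as far as the backlog requires.
    return ("wait", min(delay_ms + step_ms, max_delay_ms))


def batch_until_drained(queue_lengths, max_pending, start_delay_ms, step_ms=5):
    """Replay the queue length seen at each timer tick and return the delay
    in effect when the batch is finally sent (or the last delay reached)."""
    delay = start_delay_ms
    for pending in queue_lengths:
        action, delay = next_batch_action(pending, max_pending, delay, step_ms)
        if action == "send":
            break
    return delay


print(batch_until_drained([5, 4, 3, 2, 1], max_pending=2, start_delay_ms=20))  # 35
```

Because the delay is nudged up tick by tick while the encoder drains its queue, it ends up just large enough for the current encoder speed instead of overshooting to multi-second values like `batch.actual_delay.max` in the output above.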

Sat, 23 Jan 2021 04:53:32 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/377