I've experienced this with Firefox and YouTube, but I guess any page with lots of small moving things will trigger it.
Eventually the damage latency will reach >10s!
Here are some xpra info nuggets:
window[2].batch.damage-processing-latency=(304, 143)
window[2].batch.damage-processing-latency.recent=1987
window[2].batch.damage-out-latency=(251, 122)
window[2].batch.damage-out-latency.recent=1988
Now PNG has never worked great for this sort of thing, but it should still work better than this. And it may be a sign that something more fundamental is broken.
I think the problem is that the slowness of png accumulates very quickly:
2013-07-11 22:13:01,822 damage_in_latency: 682x51 in 18ms
2013-07-11 22:13:01,872 damage_in_latency: 525x387 in 47ms
2013-07-11 22:13:01,935 damage_in_latency: 607x523 in 93ms
2013-07-11 22:13:02,021 damage_in_latency: 835x550 in 166ms
2013-07-11 22:13:02,087 damage_in_latency: 607x523 in 214ms
2013-07-11 22:13:02,153 damage_in_latency: 607x522 in 260ms
2013-07-11 22:13:02,248 damage_in_latency: 835x550 in 342ms
2013-07-11 22:13:02,311 damage_in_latency: 607x521 in 389ms
2013-07-11 22:13:02,383 damage_in_latency: 607x521 in 444ms
2013-07-11 22:13:02,469 damage_in_latency: 835x550 in 514ms
2013-07-11 22:13:02,530 damage_in_latency: 607x523 in 558ms
2013-07-11 22:13:02,594 damage_in_latency: 607x523 in 607ms
2013-07-11 22:13:02,645 damage_in_latency: 835x550 in 642ms
2013-07-11 22:13:02,680 damage_in_latency: 607x523 in 659ms
2013-07-11 22:13:02,712 damage_in_latency: 607x523 in 676ms
2013-07-11 22:13:02,766 damage_in_latency: 835x550 in 712ms
2013-07-11 22:13:02,796 damage_in_latency: 607x523 in 722ms
2013-07-11 22:13:02,828 damage_in_latency: 607x522 in 741ms
2013-07-11 22:13:02,878 damage_in_latency: 835x550 in 774ms
2013-07-11 22:13:02,909 damage_in_latency: 607x522 in 789ms
2013-07-11 22:13:02,942 damage_in_latency: 607x523 in 801ms
2013-07-11 22:13:02,991 damage_in_latency: 835x550 in 835ms
2013-07-11 22:13:03,021 damage_in_latency: 607x523 in 851ms
2013-07-11 22:13:03,055 damage_in_latency: 607x523 in 861ms
2013-07-11 22:13:03,109 damage_in_latency: 835x550 in 897ms
2013-07-11 22:13:03,142 damage_in_latency: 607x522 in 919ms
2013-07-11 22:13:03,177 damage_in_latency: 607x521 in 937ms
2013-07-11 22:13:03,230 damage_in_latency: 835x550 in 938ms
2013-07-11 22:13:03,291 damage_in_latency: 816x550 in 954ms
2013-07-11 22:13:03,357 damage_in_latency: 835x550 in 994ms
2013-07-11 22:13:03,420 damage_in_latency: 841x611 in 1048ms
2013-07-11 22:13:03,461 damage_in_latency: 607x522 in 1071ms
2013-07-11 22:13:03,517 damage_in_latency: 835x550 in 1108ms
2013-07-11 22:13:03,554 damage_in_latency: 607x522 in 1128ms
2013-07-11 22:13:03,592 damage_in_latency: 607x522 in 1152ms
2013-07-11 22:13:03,647 damage_in_latency: 835x550 in 1187ms
2013-07-11 22:13:03,686 damage_in_latency: 607x520 in 1210ms
2013-07-11 22:13:03,722 damage_in_latency: 607x522 in 1227ms
2013-07-11 22:13:03,795 damage_in_latency: 835x550 in 1281ms
2013-07-11 22:13:03,838 damage_in_latency: 607x523 in 1312ms
Here, in 2 seconds, we went from taking ~20ms to process the request to 1.3s!
And because of that, we end up with a high batch delay to compensate (too late):
batch.actual_delay.90p=632
batch.actual_delay.avg=516
batch.actual_delay.cur=4120
batch.actual_delay.max=7811
batch.actual_delay.min=46
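To illustrate why the latency climbs so steadily, here is a toy model (not xpra code): a single encoder working through a FIFO queue, where requests arrive faster than they can be encoded. The function name and the 50ms / 80ms figures are made up for the example and are not measured from the log above.

```python
def simulate_latency(arrival_interval_ms, encode_time_ms, count):
    """Return the time (ms) each request waits before the encoder picks it up.

    Toy model: one encoder, FIFO order, fixed arrival interval and
    fixed encode time. Real damage requests vary in size and timing.
    """
    latencies = []
    encoder_free_at = 0
    for i in range(count):
        arrival = i * arrival_interval_ms
        # the encoder cannot start before it has finished the previous request
        start = max(arrival, encoder_free_at)
        latencies.append(start - arrival)
        encoder_free_at = start + encode_time_ms
    return latencies


if __name__ == "__main__":
    # hypothetical numbers: a request every 50ms, each taking 80ms to encode
    print(simulate_latency(50, 80, 5))
```

As soon as the encode time exceeds the arrival interval, every request adds the difference to the wait of the next one, so the latency grows linearly and never recovers on its own.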
What we need is a mechanism similar to the one we use to wait for ACKs from the client before we process more damage requests (which uses the average client latency to decide how long is too long): we should not have more damage requests in flight than we can process in a reasonable amount of time. The difficulty is that the encoding speed varies greatly depending on the size of the regions. Also, if we end up batching too much, we may end up sending a full screen update.
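One way to deal with the varying region sizes would be to count pixels rather than requests. The sketch below is only an illustration of that idea: `backlog_ms`, `should_hold`, the throughput figure and the budget are all hypothetical and do not come from the xpra source.

```python
def backlog_ms(pending_regions, pixels_per_ms):
    """Estimate how long the queued regions will take to encode.

    pending_regions: list of (width, height) tuples waiting for the encoder
    pixels_per_ms:   encoder throughput, assumed to be measured elsewhere
    """
    pixels = sum(w * h for w, h in pending_regions)
    return pixels / pixels_per_ms


def should_hold(pending_regions, pixels_per_ms, budget_ms):
    """True if new damage should be held back (batched) for now."""
    return backlog_ms(pending_regions, pixels_per_ms) > budget_ms


if __name__ == "__main__":
    # hypothetical throughput of 5000 pixels/ms and a 100ms budget
    queue = [(607, 523), (835, 550)]
    print(backlog_ms(queue, 5000), should_hold(queue, 5000, 100))
```

Estimating the backlog in time rather than in number of requests means a burst of tiny regions is not penalised the same way as a couple of near full-window updates.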
As I've said in the past: PNG encoding is slow.
r3855 solves this by moving the damage latency factors out of the window performance statistics and into the window-source's dynamic delay handling: we keep batching until the regions waiting for encoding get low enough.
This prevents the dramatic increase to the batch delay when the encoder is performing poorly: we only increase it as much as needed to keep the buffers busy. This is a much better solution than leaving it to the timer to evaluate, which often adjusts too late.
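The idea of "keep batching until the regions waiting for encoding get low enough" can be sketched roughly as follows. This is not the code from r3855: the `DamageBatcher` class, its method names and the pixel threshold are invented for illustration only.

```python
class DamageBatcher:
    """Hold damage regions back while the encoder still has too much queued.

    Hypothetical sketch: names and threshold are assumptions, not xpra's API.
    """

    def __init__(self, max_pending_pixels):
        self.max_pending_pixels = max_pending_pixels
        self.pending_pixels = 0  # pixels handed to the encoder, not yet encoded
        self.batched = []        # regions held back, waiting to be released

    def damage(self, width, height):
        """Record a new damage region; return the regions released for encoding."""
        self.batched.append((width, height))
        return self._flush_if_ready()

    def encoded(self, width, height):
        """The encoder finished a region; return any regions this releases."""
        self.pending_pixels -= width * height
        return self._flush_if_ready()

    def _flush_if_ready(self):
        # keep batching while the encoder backlog is above the threshold
        if not self.batched or self.pending_pixels > self.max_pending_pixels:
            return []
        regions, self.batched = self.batched, []
        self.pending_pixels += sum(w * h for w, h in regions)
        return regions


if __name__ == "__main__":
    b = DamageBatcher(max_pending_pixels=400000)
    print(b.damage(835, 550))   # encoder idle: released immediately
    print(b.damage(607, 523))   # backlog too high: held back
    print(b.encoded(835, 550))  # backlog drained: held regions released
```

The point of driving the release from the encoder's completion event, rather than from a timer, is that the batching ends as soon as the backlog allows it and no later.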
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/377