Changing encoding (x264 to rgb24; rgb24 to x264, etc.) when gwenview
is running in xpra-0.8.0 session produces multiple errors in log:
2013-02-02 11:11:07,389 draw error Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time]) File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 127, in draw_region self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image img_data = self.process_delta(raw_data, width, height, rowstride, options) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta raise Exception("delta region references pixmap data we do not have!") Exception: delta region references pixmap data we do not have! 2013-02-02 11:11:07,392 error processing draw packet Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1124, in _draw_thread_loop self._do_draw(packet) File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time]) File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 127, in draw_region self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image img_data = self.process_delta(raw_data, width, height, rowstride, options) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta raise Exception("delta region references pixmap data we do not have!") Exception: delta region references pixmap data we do not have!
This problem is not specific to gwenview
and affect some other applications. I've seen it in kmail
too.
I'm not seeing it in either...
Does this patch fix things?
Index: src/xpra/window_source.py =================================================================== --- src/xpra/window_source.py (revision 2634) +++ src/xpra/window_source.py (working copy) @@ -436,6 +436,8 @@ #we will resend a key frame because it looks like we will #drop a frame which is being processed self.video_encoder_cleanup() + #clear delta since things got cancelled + self.last_pixmap_data = None def cancel_expire_timer(self): if self.expire_timer:
I still cannot reproduce this (I suspect it has something to do with client DE and window resizing/positioning) so please try r2639 and see if it fixes things for you.
I've seen few errors in 0.8.1 but after applying the patch from comment:1 I haven't seen errors for ~90m. This is not a final confirmation though as it happen to be more difficult to reproduce in 0.8.1. I will post another update later...
The only difference between r2639 and the patch above is when we clear the last_pixmap_data
reference: before instead of after the video encoder cleanup..
This is a race, I'll have to review again I guess.
Reporting back as promised: patch didn't fixed it, at least not completely. I got
2013-02-04 16:52:33,272 draw error Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time]) File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 130, in draw_region self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image img_data = self.process_delta(raw_data, width, height, rowstride, options) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta raise Exception("delta region references pixmap data we do not have!") Exception: delta region references pixmap data we do not have! 2013-02-04 16:52:33,274 error processing draw packet Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1124, in _draw_thread_loop self._do_draw(packet) File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time]) File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 130, in draw_region self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image img_data = self.process_delta(raw_data, width, height, rowstride, options) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta raise Exception("delta region references pixmap data we do not have!") Exception: delta region references pixmap data we do not have!
in last few hours but that's much much less then what 0.8.0 was logging. I reckon r2639 helped. Also it is definitely more difficult to replicate -- I couldn't quite get it deliberately, those errors just appears sometimes and I don't know why or what action produces them...
I still do not fully understand what is going on here.
Threads are involved, so it may well be a race - it does look like one - I just can't see where... or maybe I can?
So here is an explanation of what delta does, both for my own sake, also documentation can't hurt (this should go on the wiki when fixed), and so that you can proof read it and hopefully spot something stupid I have missed.
Server side:
last_pixmap_data
stores a copy of the last pixmap we sent to the client (as well as its size, sequence number, etc). It may be set to None
: at first or when we clear it. As per the changes in r2639: we clear it when switching encoding or when cancelling frames.
Since r2660, we also clear it when the client reports failing to decode a frame (using a negative decode time). This is only for recovering quicker, it doesn't fix the underlying issue which is that the client does not have the pixmap that the server thinks it has.. The client should never fail to decode frames. (*)
last_pixmap_data
at the end of make_data_packet
, once we've successfully encoded a frame. And after checking for cancellations (frames can get cancelled if the window is destroyed, resized, encoding changed..). By that point, the packet we generated will end up queued for sending using queue_damage_packet
and cannot be cancelled. The whole system relies on this packet reaching the client and being acknowledged back. This is important for x264/vpx as they use intra-frames compression, and here because we rely on the client having a copy of the uncompressed data. When we set it, we also set the "store" attribute so that the client knows it needs to keep a copy around after decoding (which may be used for later delta encoding).
last_pixmap_data
is set and we get a new screen update of exactly the same size, and we use a lossless encoding, and both the client and server support the required features, then we xor the two pixmaps before compression. We also pass the sequence number of the reference frame we used for xoring to the client (via client_options
as an attribute called "delta") so the client knows that it needs to xor the data after decompression (and can also check that the frame sequence number matches what it has).
Client side:
paint_image
for png and jpeg, paint_rgb24
for rgb24, paint_with_video_decoder
for x264 and vpx...)
last_pixmap_data
variable, which starts empty
python-imaging
or plain gdk-pixbuf
), end up calling process_delta
after decompressing the data (but before the actual painting part)
process_delta
, we first look for the "delta" attribute and if set, we verify that we have stored a previous pixmap, that it is the right size, that the sequence matches and then we xor it (this is the part that goes wrong)
process_delta
, after xoring if needed, we then store the pixmap data in "last_pixmap_data
" whenever we see a "store" attribute. (both "store" and "delta" can be set at the same time: we apply the xor delta then store the result)
The problem you are seeing comes from the fact that the server relies on the client having a copy of a frame (as the server sent a "store" previously) and it does not. Special cases:
init
in make_new_backing
), but this does not clear the last_pixmap_data
Maybe: we get the draw before the window has had a chance of showing up? (new windows are processed in the UI thread, draws are not and therefore may get there quicker)
Can you please test r2661 and r2662 and tell me if either of them fixes the issue. I don't like r2662 much because it makes us use the UI thread for one instruction, just to ensure they are processed in the right order (it is very hard to come up with a better solution). But unfortunately, I think this is the cause of the problem you reported.
So far tried r2661 -- it works great with new libav-9.1 but doesn't fix the problem with "draw" errors logged. I will test r2662 soon.
I think we can declare r2662 as viable fix for the issue.
With r2662 applied on top of 0.8.1 (together with r2661) I haven't seen a single draw error for hours.
Well done.
Thanks for all the testing.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/242