#242 closed defect (fixed)
0.8.0: client-side errors in log
Reported by: | onlyjob | Owned by: | Antoine Martin |
---|---|---|---|
Priority: | major | Milestone: | 0.8 |
Component: | core | Version: | trunk |
Keywords: | Cc: |
Description
Changing encoding (x264 to rgb24; rgb24 to x264, etc.) when gwenview
is running in xpra-0.8.0 session produces multiple errors in log:
2013-02-02 11:11:07,389 draw error Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time]) File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 127, in draw_region self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image img_data = self.process_delta(raw_data, width, height, rowstride, options) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta raise Exception("delta region references pixmap data we do not have!") Exception: delta region references pixmap data we do not have! 2013-02-02 11:11:07,392 error processing draw packet Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1124, in _draw_thread_loop self._do_draw(packet) File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time]) File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 127, in draw_region self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image img_data = self.process_delta(raw_data, width, height, rowstride, options) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta raise Exception("delta region references pixmap data we do not have!") Exception: delta region references pixmap data we do not have!
This problem is not specific to gwenview
and affect some other applications. I've seen it in kmail
too.
Change History (11)
comment:1 Changed 8 years ago by
Owner: | set to Antoine Martin |
---|---|
Status: | new → accepted |
comment:2 Changed 8 years ago by
I still cannot reproduce this (I suspect it has something to do with client DE and window resizing/positioning) so please try r2639 and see if it fixes things for you.
comment:3 Changed 8 years ago by
I've seen few errors in 0.8.1 but after applying the patch from comment:1 I haven't seen errors for ~90m. This is not a final confirmation though as it happen to be more difficult to reproduce in 0.8.1. I will post another update later...
comment:4 Changed 8 years ago by
The only difference between r2639 and the patch above is when we clear the last_pixmap_data
reference: before instead of after the video encoder cleanup..
This is a race, I'll have to review again I guess.
comment:5 Changed 8 years ago by
Reporting back as promised: patch didn't fixed it, at least not completely.
I got
2013-02-04 16:52:33,272 draw error Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time]) File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 130, in draw_region self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image img_data = self.process_delta(raw_data, width, height, rowstride, options) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta raise Exception("delta region references pixmap data we do not have!") Exception: delta region references pixmap data we do not have! 2013-02-04 16:52:33,274 error processing draw packet Traceback (most recent call last): File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1124, in _draw_thread_loop self._do_draw(packet) File "/usr/lib/python2.7/dist-packages/xpra/client.py", line 1168, in _do_draw window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time]) File "/usr/lib/python2.7/dist-packages/xpra/platform/client_tray.py", line 130, in draw_region self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 450, in draw_region self.paint_image(coding, img_data, x, y, width, height, rowstride, options, callbacks) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 385, in paint_image img_data = self.process_delta(raw_data, width, height, rowstride, options) File "/usr/lib/python2.7/dist-packages/xpra/window_backing.py", line 112, in process_delta raise Exception("delta region references pixmap data we do not have!") Exception: delta region references pixmap data we do not have!
in last few hours but that's much much less then what 0.8.0 was logging.
I reckon r2639 helped. Also it is definitely more difficult to replicate -- I couldn't quite get it deliberately, those errors just appears sometimes and I don't know why or what action produces them...
comment:6 Changed 8 years ago by
I still do not fully understand what is going on here.
Threads are involved, so it may well be a race - it does look like one - I just can't see where... or maybe I can?
So here is an explanation of what delta does, both for my own sake, also documentation can't hurt (this should go on the wiki when fixed), and so that you can proof read it and hopefully spot something stupid I have missed.
Server side:
last_pixmap_data
stores a copy of the last pixmap we sent to the client (as well as its size, sequence number, etc). It may be set toNone
: at first or when we clear it. As per the changes in r2639: we clear it when switching encoding or when cancelling frames.
Since r2660, we also clear it when the client reports failing to decode a frame (using a negative decode time). This is only for recovering quicker, it doesn't fix the underlying issue which is that the client does not have the pixmap that the server thinks it has.. The client should never fail to decode frames. (*)
- We only set
last_pixmap_data
at the end ofmake_data_packet
, once we've successfully encoded a frame. And after checking for cancellations (frames can get cancelled if the window is destroyed, resized, encoding changed..). By that point, the packet we generated will end up queued for sending usingqueue_damage_packet
and cannot be cancelled. The whole system relies on this packet reaching the client and being acknowledged back. This is important for x264/vpx as they use intra-frames compression, and here because we rely on the client having a copy of the uncompressed data. When we set it, we also set the "store" attribute so that the client knows it needs to keep a copy around after decoding (which may be used for later delta encoding). - When
last_pixmap_data
is set and we get a new screen update of exactly the same size, and we use a lossless encoding, and both the client and server support the required features, then we xor the two pixmaps before compression. We also pass the sequence number of the reference frame we used for xoring to the client (viaclient_options
as an attribute called "delta") so the client knows that it needs to xor the data after decompression (and can also check that the frame sequence number matches what it has).
Client side:
- the draw thread loop picks up draw packets posted by the network thread and calls draw_region on the window class, passing it a callback which will send the acknowledgement packet back to the server once done (it also takes care of firing this ack packet in the case of decoding errors - but you should see an error in the client output in this case)
- the gtk2 version of the window class (this is the default implementation used, the cairo/gtk3 and GL versions aren't supported yet) will dispatch the paint request to the appropriate method for the encoding specified (
paint_image
for png and jpeg,paint_rgb24
for rgb24,paint_with_video_decoder
for x264 and vpx...) - just like the server, this class keeps a
last_pixmap_data
variable, which starts empty - in the case of lossless encodings (rgb24 or png), all their paint methods (there are variants using
python-imaging
or plaingdk-pixbuf
), end up callingprocess_delta
after decompressing the data (but before the actual painting part) - in
process_delta
, we first look for the "delta" attribute and if set, we verify that we have stored a previous pixmap, that it is the right size, that the sequence matches and then we xor it (this is the part that goes wrong) - at the end of
process_delta
, after xoring if needed, we then store the pixmap data in "last_pixmap_data
" whenever we see a "store" attribute. (both "store" and "delta" can be set at the same time: we apply the xor delta then store the result)
The problem you are seeing comes from the fact that the server relies on the client having a copy of a frame (as the server sent a "store" previously) and it does not.
Special cases:
- when the client resizes, we re-initialize the backing class (see
init
inmake_new_backing
), but this does not clear thelast_pixmap_data
- when the window is already gone, we don't decode anything, we don't store anything, but since the window is gone...
Maybe: we get the draw before the window has had a chance of showing up? (new windows are processed in the UI thread, draws are not and therefore may get there quicker)
comment:7 Changed 8 years ago by
Can you please test r2661 and r2662 and tell me if either of them fixes the issue.
I don't like r2662 much because it makes us use the UI thread for one instruction, just to ensure they are processed in the right order (it is very hard to come up with a better solution). But unfortunately, I think this is the cause of the problem you reported.
comment:8 Changed 8 years ago by
comment:9 Changed 8 years ago by
comment:10 Changed 8 years ago by
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
Thanks for all the testing.
comment:11 Changed 5 weeks ago by
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/242
I'm not seeing it in either...
Does this patch fix things?