xpra icon
Bug tracker and wiki

Opened 4 years ago

Closed 4 years ago

#866 closed defect (fixed)

decompression error in rgb delta

Reported by: Antoine Martin Owned by: alas
Priority: blocker Milestone: 0.16
Component: core Version: 0.15.x
Keywords: Cc:

Description (last modified by Antoine Martin)

Triggered using the command lines from ticket/863:

  • server:
    XPRA_MAX_DELTA_HITS=2 xpra start :10 --start-child=xterm --encodings=rgb
    
  • client:
    xpra attach --no-mmap -z 0
    

Then just press enter in the xterm until the screen floods with:

error processing draw packet
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 2002, in _draw_thread_loop
    self._do_draw(packet)
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 2048, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib64/python2.7/site-packages/xpra/client/client_window_base.py", line 423, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 473, in draw_region
    self.paint_rgb24(img_data, x, y, width, height, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 264, in paint_rgb24
    rgb24_data = self.process_delta(raw_data, width, height, rowstride, options)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 170, in process_delta
    img_data = compression.decompress_by_name(raw_data, algo=comp[0])
  File "/usr/lib64/python2.7/site-packages/xpra/net/compression.py", line 248, in decompress_by_name
    return decompress(data, flag)
  File "/usr/lib64/python2.7/site-packages/xpra/net/compression.py", line 227, in decompress
    return LZ4_uncompress(data)
ValueError: corrupt input at byte 532

I am also seeing some visual corruption, very similar to ticket/465 (which is also about memoryview):
shows the visual corruption
Maybe we need to freeze() or copy the pixel buffer?

The errors only occur when the scrolling triggers a full window refresh with rgb + lz4, smaller screen updates with rgb + lz4 go through without problem:

Maybe we should honour the -z 0 flag here and not compress the rgb pixels at all? (the z flag was for packet compression originally)

Also, the error is sent back to the server which then does a refresh, but this all happens to quickly and we end up in a tight loop. Probably worth adding a delay when refreshing after an error.

Attachments (3)

xterm-xor-rgb-lz4-corruption.png (5.0 KB) - added by Antoine Martin 4 years ago.
shows the visual corruption
integrity-hash-v2.patch (4.9 KB) - added by Antoine Martin 4 years ago.
adds the ability to verify that the data we get before and after decompression matches what the server meant to send
integrity-hash-v3.patch (3.4 KB) - added by Antoine Martin 4 years ago.
also verifies that we can decompress the lz4 data without errors

Download all attachments as: .zip

Change History (14)

comment:1 Changed 4 years ago by Antoine Martin

Description: modified (diff)
Status: newassigned

comment:2 Changed 4 years ago by Antoine Martin

Description: modified (diff)

Changed 4 years ago by Antoine Martin

shows the visual corruption

comment:3 Changed 4 years ago by Antoine Martin

Description: modified (diff)

Changed 4 years ago by Antoine Martin

Attachment: integrity-hash-v2.patch added

adds the ability to verify that the data we get before and after decompression matches what the server meant to send

comment:4 Changed 4 years ago by Antoine Martin

The visual corruption is fixed in r9469, backport to 0.15 in 9470.
(the previous fix r9449 was incomplete)

The "corrupt input" remains..

comment:5 Changed 4 years ago by Antoine Martin

WTH?

  • using compressors=zlib, no problem
  • using compressors=lzo, no problem.
  • using compressors=lz4 gives the ValueError: corrupt input at byte 532

Very very odd.

Last edited 4 years ago by Antoine Martin (previous) (diff)

Changed 4 years ago by Antoine Martin

Attachment: integrity-hash-v3.patch added

also verifies that we can decompress the lz4 data without errors

comment:6 Changed 4 years ago by Antoine Martin

Also interesting: in attachment/ticket/869/ticket869_osx-client-logs.txt (duplicate ticket), we can see that the corrupt byte is at a different place (also always the same):

ValueError: corrupt input at byte 14

And apparently the server segfaults.
All points towards a memory corruption issue. #465 and zero copy memoryview me thinks.

Last edited 4 years ago by Antoine Martin (previous) (diff)

comment:7 Changed 4 years ago by Antoine Martin

Owner: changed from Antoine Martin to alas
Status: assignednew

Fixed in r9480, see commit message.

I wished I had time to investigate why this only affected lz4. Which thread does the decompression should not matter! (network thread or decode thread)

All this time was not all wasted though, we now have:

  • r9476: optional data integrity checks via XPRA_INTEGRITY_HASH=1
  • r9477: logging improvements
  • r9478 + r9479: less unnecessary copying of pixel data with lzo and lz4 (which I dare not backport at this point)

@afarr: please confirm and close, using the instructions in the ticket description, and also the same steps as #869 (that max reported - he can help you with testing).
Please also confirm that none of this affected the 0.15 branch... which is the one branch we should be focusing on.

Last edited 4 years ago by Antoine Martin (previous) (diff)

comment:8 Changed 4 years ago by Antoine Martin

Changed my mind: found too many issues related to buffers vs memoryview (#465), so r9490 backports a number of important fixes to v0.15.x (including r9478 + r9479 and more).

It is worth double-checking:

  • the proxy server, with nvenc
  • nvenc standalone

etc..

comment:9 Changed 4 years ago by alas

Well, testing with 0.15.0 r9470 windows client against 0.15.0 r9470 fedora 20 server, with a firefox video playing on youtube, I was seeing a similar error that didn't seem to be lz4 specific:

2015-05-27 16:43:49,871 error processing draw packet
Traceback (most recent call last):
  File "xpra\client\ui_client_base.pyc", line 1975, in _draw_thread_loop
  File "xpra\client\ui_client_base.pyc", line 2021, in _do_draw
  File "xpra\client\client_window_base.pyc", line 423, in draw_region
  File "xpra\client\window_backing_base.pyc", line 473, in draw_region
  File "xpra\client\window_backing_base.pyc", line 264, in paint_rgb24
  File "xpra\client\window_backing_base.pyc", line 171, in process_delta
  File "xpra\net\compression.pyc", line 242, in decompress_by_name
  File "xpra\net\compression.pyc", line 221, in decompress
ValueError: invalid size in header: 0xff000000

Testing the same site & video with win32 client 0.15.0 r9533 against fedora 20 0.15.0 r9533 server, however... there is no sign of this error.

Oddly though, and I'm not sure if this is related to these backports, when I tested on osx (also 0.15.0 r9533) - XPRA_OPENGL_PAINT_BOX boxes were displaying without having used any flags to enable them. (Will post this and the relatively unhelpful server-side logs in #760.)

comment:10 Changed 4 years ago by Antoine Martin

ValueError: invalid size in header: 0xff000000
I am pretty sure that this error was fixed 6 days ago in r9480 (r9490 for older branches).

Unless you can reproduce this with an up to date build, I think we can close this.


XPRA_OPENGL_PAINT_BOX boxes were displaying without having used any flags to enable them


Let's follow that up in ticket:760#comment:23

comment:11 Changed 4 years ago by alas

Resolution: fixed
Status: newclosed

Testing with the win32 0.15.0 r9533 against a fedora 20 0.15.0 r9533 server... no sign of that draw_thread_loop error - niehter from playing video nor from a lot of returns in an xterm with --encodings=rgb.

Closing.

Note: See TracTickets for help on using tickets.