Xpra: Ticket #866: decompression error in rgb delta

Triggered using the command lines from ticket/863:

server:

XPRA_MAX_DELTA_HITS=2 xpra start :10 --start-child=xterm --encodings=rgb

client:
```
xpra attach --no-mmap -z 0
```

Then just press enter in the xterm until the screen floods with:

error processing draw packet
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 2002, in _draw_thread_loop
    self._do_draw(packet)
  File "/usr/lib64/python2.7/site-packages/xpra/client/ui_client_base.py", line 2048, in _do_draw
    window.draw_region(x, y, width, height, coding, data, rowstride, packet_sequence, options, [record_decode_time])
  File "/usr/lib64/python2.7/site-packages/xpra/client/client_window_base.py", line 423, in draw_region
    self._backing.draw_region(x, y, width, height, coding, img_data, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 473, in draw_region
    self.paint_rgb24(img_data, x, y, width, height, rowstride, options, callbacks)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 264, in paint_rgb24
    rgb24_data = self.process_delta(raw_data, width, height, rowstride, options)
  File "/usr/lib64/python2.7/site-packages/xpra/client/window_backing_base.py", line 170, in process_delta
    img_data = compression.decompress_by_name(raw_data, algo=comp[0])
  File "/usr/lib64/python2.7/site-packages/xpra/net/compression.py", line 248, in decompress_by_name
    return decompress(data, flag)
  File "/usr/lib64/python2.7/site-packages/xpra/net/compression.py", line 227, in decompress
    return LZ4_uncompress(data)
ValueError: corrupt input at byte 532

I am also seeing some visual corruption, very similar to ticket/465 (which is also about memoryview): Maybe we need to freeze() or copy the pixel buffer?

The errors only occur when the scrolling triggers a full window refresh with rgb + lz4, smaller screen updates with rgb + lz4 go through without problem:

Maybe we should honour the -z 0 flag here and not compress the rgb pixels at all? (the z flag was for packet compression originally)

Also, the error is sent back to the server which then does a refresh, but this all happens to quickly and we end up in a tight loop. Probably worth adding a delay when refreshing after an error.

Thu, 21 May 2015 05:47:48 GMT - Antoine Martin: status, description changed

status changed from new to assigned
description modified (diff)

Thu, 21 May 2015 05:51:42 GMT - Antoine Martin: description changed

description modified (diff)

Thu, 21 May 2015 05:52:09 GMT - Antoine Martin: attachment set

attachment set to xterm-xor-rgb-lz4-corruption.png

shows the visual corruption

Thu, 21 May 2015 05:52:50 GMT - Antoine Martin: description changed

description modified (diff)

Thu, 21 May 2015 13:52:28 GMT - Antoine Martin: attachment set

attachment set to integrity-hash-v2.patch

adds the ability to verify that the data we get before and after decompression matches what the server meant to send

Thu, 21 May 2015 16:58:36 GMT - Antoine Martin:

The visual corruption is fixed in r9469, backport to 0.15 in 9470. (the previous fix r9449 was incomplete)

The "corrupt input" remains..

Thu, 21 May 2015 17:08:42 GMT - Antoine Martin:

WTH?

using compressors=zlib, no problem
using compressors=lzo, no problem.
using compressors=lz4 gives the ValueError: corrupt input at byte 532

Very very odd.

Thu, 21 May 2015 17:17:59 GMT - Antoine Martin: attachment set

attachment set to integrity-hash-v3.patch

also verifies that we can decompress the lz4 data without errors

Fri, 22 May 2015 05:31:40 GMT - Antoine Martin:

Also interesting: in attachment/ticket/869/ticket869_osx-client-logs.txt (duplicate ticket), we can see that the corrupt byte is at a different place (also always the same):

ValueError: corrupt input at byte 14

And apparently the server segfaults. All points towards a memory corruption issue. #465 and zero copy memoryview me thinks.

Fri, 22 May 2015 11:33:13 GMT - Antoine Martin: owner, status changed

owner changed from Antoine Martin to alas
status changed from assigned to new

Fixed in r9480, see commit message.

I wished I had time to investigate why this only affected lz4. Which thread does the decompression should not matter! (network thread or decode thread)

All this time was not all wasted though, we now have:

r9476: optional data integrity checks via XPRA_INTEGRITY_HASH=1
r9477: logging improvements
r9478 + r9479: less unnecessary copying of pixel data with lzo and lz4 (which I dare not backport at this point)

@afarr: please confirm and close, using the instructions in the ticket description, and also the same steps as #869 (that max reported - he can help you with testing). Please also confirm that none of this affected the 0.15 branch... which is the one branch we should be focusing on.

Fri, 22 May 2015 15:28:16 GMT - Antoine Martin:

Changed my mind: found too many issues related to buffers vs memoryview (#465), so r9490 backports a number of important fixes to v0.15.x (including r9478 + r9479 and more).

It is worth double-checking:

the proxy server, with nvenc
nvenc standalone

etc..

Thu, 28 May 2015 01:08:06 GMT - alas:

Well, testing with 0.15.0 r9470 windows client against 0.15.0 r9470 fedora 20 server, with a firefox video playing on youtube, I was seeing a similar error that didn't seem to be lz4 specific:

2015-05-27 16:43:49,871 error processing draw packet
Traceback (most recent call last):
  File "xpra\client\ui_client_base.pyc", line 1975, in _draw_thread_loop
  File "xpra\client\ui_client_base.pyc", line 2021, in _do_draw
  File "xpra\client\client_window_base.pyc", line 423, in draw_region
  File "xpra\client\window_backing_base.pyc", line 473, in draw_region
  File "xpra\client\window_backing_base.pyc", line 264, in paint_rgb24
  File "xpra\client\window_backing_base.pyc", line 171, in process_delta
  File "xpra\net\compression.pyc", line 242, in decompress_by_name
  File "xpra\net\compression.pyc", line 221, in decompress
ValueError: invalid size in header: 0xff000000

Testing the same site & video with win32 client 0.15.0 r9533 against fedora 20 0.15.0 r9533 server, however... there is no sign of this error.

Oddly though, and I'm not sure if this is related to these backports, when I tested on osx (also 0.15.0 r9533) - XPRA_OPENGL_PAINT_BOX boxes were displaying without having used any flags to enable them. (Will post this and the relatively unhelpful server-side logs in #760.)

Thu, 28 May 2015 03:54:19 GMT - Antoine Martin:

ValueError: invalid size in header: 0xff000000 I am pretty sure that this error was fixed 6 days ago in r9480 (r9490 for older branches).

Unless you can reproduce this with an up to date build, I think we can close this.

XPRA_OPENGL_PAINT_BOX boxes were displaying without having used any flags to enable them

Let's follow that up in ticket:760#comment:23

Mon, 01 Jun 2015 22:36:47 GMT - alas: status changed; resolution set

status changed from new to closed
resolution set to fixed

Testing with the win32 0.15.0 r9533 against a fedora 20 0.15.0 r9533 server... no sign of that draw_thread_loop error - niehter from playing video nor from a lot of returns in an xterm with --encodings=rgb.

Closing.

Sat, 23 Jan 2021 05:08:17 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/866