Xpra: Ticket #910: fault injection for testing and triggering bugs

Hopefully this can help us with hard to trigger bugs and for catching corner cases that are difficult to encounter under normal circumstances. Things like error handlers, etc..

For example:

More to come, for #902.



Tue, 07 Jul 2015 17:16:19 GMT - Antoine Martin: status changed

r9894 does the same thing for the client, pretending that we failed to paint something (which should trigger a refresh):

$ XPRA_PAINT_FAULT_INJECTION_RATE=100 xpra attach --no-mmap
(...)
injecting paint fault for h264 packet 100
injecting paint fault for h264 packet 200
...

Wed, 08 Jul 2015 08:34:34 GMT - Antoine Martin:

r9899 adds the ability to fail to report the failures to the server with:

XPRA_PAINT_FAULT_INJECTION_RATE=100 XPRA_PAINT_FAULT_INJECTION_TELL=0 xpra attach

Which makes the server stop sending window updates and eventually print:

delayed_region_timeout: region is 15025ms old, bad connection?
...

r9824 also added the ability to simulate "hello" initial packet parsing errors with:

XPRA_SIMULATE_SERVER_HELLO_ERROR=1 xpra start ...

Which tests the hello error handling.


Thu, 10 Sep 2015 07:06:09 GMT - Antoine Martin: owner, status changed

This will do for now. As of r10587, all the fault injection environment variables are documented here: wiki/Usage/EnvironmentOptions.

@afarr: the idea here is to make it easier to exercise code which would not normally fire. we want to check:

PS: not doing an xpra control interface as this is too intricate


Sat, 24 Oct 2015 06:31:53 GMT - Antoine Martin:

r10992 also makes it possible to fail individual codecs:

This can be useful for testing different decoders/encoders, in particular disabling pillow support. (see also #1010)


Tue, 27 Oct 2015 01:53:41 GMT - alas: owner changed

[root@jimador codecs]# XPRA_CODEC_FAIL_IMPORT=enc_pillow,dec_pillow python loader.py
codecs and csc modules found:
* csc_cython           : csc_cython/colorspace_converter.so
* csc_opencl           :
* csc_swscale          : csc_swscale/colorspace_converter.so
* dec_avcodec2         : dec_avcodec2/decoder.so
* dec_pillow           :
* dec_vpx              : vpx/decoder.so
* dec_webp             : webp/decode.so
* enc_pillow           :
* enc_vpx              : vpx/encoder.so
* enc_webp             : webp/encode.so
* enc_x264             : enc_x264/encoder.so
* enc_x265             :
* nvenc4               :
* nvenc5               :
codecs versions:
* PIL                  : 3.0.0
* avcodec2             : 56.41.100
* buffer_api           : 1
* cython               : 1.0.23.1
* dec_pillow           : 3.0.0
* dec_webp             : 0.4.3
* enc_pillow           : 3.0.0
* enc_webp             : 0.4.3
* numpy                : 1.8.2
* swscale              : 3.1.101
* vpx                  : 1.4.0
* x264                 : 146

... presumably as expected.

Likewise, XPRA_CODEC_FAIL_SELFTEST=enc_pillow,dec_pillow ./xpra/codecs/loader.py outputs:

[root@jimador codecs]# XPRA_CODEC_FAIL_SELFTEST=enc_pillow,dec_pillow python loader.py
failed to import Pillow encoder (enc_pillow)
 codec found in fail selftest list
failed to import Pillow decoder (dec_pillow)
 codec found in fail selftest list
codecs and csc modules found:
* csc_cython           : csc_cython/colorspace_converter.so
* csc_opencl           :
* csc_swscale          : csc_swscale/colorspace_converter.so
* dec_avcodec2         : dec_avcodec2/decoder.so
* dec_pillow           :
* dec_vpx              : vpx/decoder.so
* dec_webp             : webp/decode.so
* enc_pillow           :
* enc_vpx              : vpx/encoder.so
* enc_webp             : webp/encode.so
* enc_x264             : enc_x264/encoder.so
* enc_x265             :
* nvenc4               :
* nvenc5               :
codecs versions:
* PIL                  : 3.0.0
* avcodec2             : 56.41.100
* buffer_api           : 1
* cython               : 1.0.23.1
* dec_pillow           : 3.0.0
* dec_webp             : 0.4.3
* enc_pillow           : 3.0.0
* enc_webp             : 0.4.3
* numpy                : 1.8.2
* swscale              : 3.1.101
* vpx                  : 1.4.0
* x264                 : 146

2015-10-26 18:19:27,586 draw error
Traceback (most recent call last):
  File "xpra\client\ui_client_base.pyc", line 2475, in _do_draw
  File "xpra\client\client_window_base.pyc", line 539, in draw_region
  File "xpra\client\window_backing_base.pyc", line 516, in draw_region
  File "xpra\client\window_backing_base.pyc", line 301, in paint_rgb32
  File "xpra\client\window_backing_base.pyc", line 183, in process_delta
AssertionError: delta bucket 2 data does not match: expected (1051, 1, 2078) but got (1051, 1, 2075)

Disconnecting then running with both XPRA_PAINT_FAULT_INJECTION_RATE=100 XPRA_PAINT_FAULT_INJECTION_TELL=0 xpra attach - I did very quickly get not only the expected 2015-10-26 18:21:03,944 delayed_region_timeout: region is 15041ms old, bad connection? error (and a black screen in place of browser app contents until the video stopped "playing") - I also got an error that might be of interest:

2015-10-26 18:23:51,826 Error: expiring missing damage acks: [96].


Running server with XPRA_SIMULATE_SERVER_HELLO_ERROR=1, with no encryption or with encryption and password-file and encryption-keyfile triggers errors that I expect are expected:

Server-side:

2015-10-26 18:29:51,038 Connection lost
2015-10-26 18:29:51,250 server error processing new connection from Protocol(tcp socket: 10.0.32.136:2200 <- 10.0.11.162:64264): Simulating a server error
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/server_core.py", line 657, in _process_hello
    raise Exception("Simulating a server error")
Exception: Simulating a server error
2015-10-26 18:29:51,251 Disconnecting client '10.0.11.162:64264':
2015-10-26 18:29:51,252  server error (error accepting new connection)
2015-10-26 18:29:51,253 Connection lost

And client side (in yellow, which I take it indicates it is a simulated error?):

2015-10-26 18:29:34,960 server failure: disconnected before the session could be established
2015-10-26 18:29:34,960 server requested disconnect: server error (error accepting new connection)
2015-10-26 18:29:34,994 Connection lost

When terminating the server running with encryption and password-file and encryption-keyfile, with a control-C, I got an unusual traceback:

^C/usr/lib64/python2.7/threading.py:1163: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  return _active[_get_ident()]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/child_reaper.py", line 175, in reap
    self.poll()
  File "/usr/lib64/python2.7/site-packages/xpra/child_reaper.py", line 113, in poll
    self.add_dead_process(procinfo)
  File "/usr/lib64/python2.7/site-packages/xpra/child_reaper.py", line 154, in add_dead_process
    cb(procinfo.process)
  File "/usr/lib64/python2.7/site-packages/xpra/server/server_base.py", line 386, in pulseaudio_ended
    soundlog.warn("Warning: the pulseaudio server process has terminated after %i seconds", int(elapsed))
  File "/usr/lib64/python2.7/site-packages/xpra/log.py", line 329, in warn
    self.log(logging.WARN, msg, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/xpra/log.py", line 318, in log
    global_logging_handler(self.logger.log, level, msg, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/xpra/log.py", line 100, in standard_logging
    log(level, msg, *args, **kwargs)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1224, in log
    self._log(level, msg, args, **kwargs)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1278, in _log
    record = self.makeRecord(self.name, level, fn, lno, msg, args, exc_info, func, extra)
  File "/usr/lib64/python2.7/logging/__init__.py", line 1252, in makeRecord
    rv = LogRecord(name, level, fn, lno, msg, args, exc_info, func)
  File "/usr/lib64/python2.7/logging/__init__.py", line 291, in __init__
    self.threadName = threading.current_thread().name
  File "/usr/lib64/python2.7/threading.py", line 1163, in currentThread
    return _active[_get_ident()]
KeyboardInterrupt

2015-10-26 18:43:26,890 Received uninterpretable nonsense: invalid packet header: '50726f746f636f6c' read buffer='Protocol JUNK! added by fault injection code'
2015-10-26 18:43:26,890  packet no 369 data: 'Protocol JUNK! added by fault injection code'
2015-10-26 18:43:26,890 sound-sink stopping
2015-10-26 18:43:26,951 Connection lost

Server-side though, I got the fault injection warning in yellow, but then an error in red (though, the yellow does explain that, rather):

2015-10-26 18:43:43,167 injecting fault in Protocol(tcp socket: 10.0.32.136:2200 <- 10.0.11.162:64292)
2015-10-26 18:43:43,874 internal error: read connection tcp socket: 10.0.32.136:2200 <- 10.0.11.162:64292 reset
2015-10-26 18:43:43,874  [Errno 104] Connection reset by peer

Server side:

2015-10-26 18:47:11,597 injecting fault in Protocol(Pipe(sound-source))
2015-10-26 18:47:11,599 sound-source injecting fault in Protocol(Pipe())
2015-10-26 18:47:11,602 sound-source gibberish received:
2015-10-26 18:47:11,603 sound-source  "invalid packet header: '5772617070657220' read buffer='Wrapper JUNK! added by fa"...
2015-10-26 18:47:11,603 sound-source  stopping
2015-10-26 18:47:11,604 sound-source stopping on gibberish:
2015-10-26 18:47:11,603 sound-source stopping
2015-10-26 18:47:11,604  "invalid packet header: '5772617070657220' read buffer='Wrapper JUNK! added by fa"...

And client side all I see is 2015-10-26 18:46:55,338 sound-sink stopping.


Not sure if the wrapper layer fault injection is meant to only target the sound or not... in either case, I'll assign this back to you to wait on more new debugging tricks.


Tue, 27 Oct 2015 04:09:39 GMT - Antoine Martin: status changed; resolution set

For the codec stuff XPRA_CODEC_FAIL_IMPORT and XPRA_CODEC_FAIL_SELFTEST see ticket:1010#comment:4


With XPRA_PAINT_FAULT_INJECTION_TELL=0, you are meant to get both: region is 15041ms old, bad connection? and Error: expiring missing damage acks: [96]: the missing ack is making us delay the sending of new updates until we hit the timeout of 15 seconds.


And client side (in yellow, which I take it indicates it is a simulated error?):


0.16 includes a coloured stream logger, which is used with the server when running with {{{daemon=no) and with the client when running from a console. The definition of the colours can be found in the source: colorstreamhandler.py.


When terminating the server running with encryption and password-file and encryption-keyfile, with a control-C, I got an unusual traceback: ^C/usr/lib64/python2.7/threading.py:1163: RuntimeWarning: tp_compare didn't return -1 or -2 for exception


I believe that this only happens when running with daemon=no and using control-c, and it shouldn't be too much of a problem: if the server doesn't immediately exit, just hit control-c again. So I'm leaving this one as it is.


Server-side though, I got the fault injection warning in yellow, but then an error in red (though, the yellow does explain that, rather): ..


The warning tells you it is sending garbage to the client, the client then drops the connection and the server sees that as an error. All is well.


Not sure if the wrapper layer fault injection is meant to only target the sound or not... in either case, I'll assign this back to you to wait on more new debugging tricks.


The wrapper is generic but is currently only used for the sound subprocess.

I think this is enough for now, I have added a link to this ticket from wiki/Debugging under "fault injection", and I will try to add more fault injection as I go along.


Sat, 23 Jan 2021 05:09:32 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/910