Video subregions are somewhat unpredictable and often end up destroying video contexts and re-creating them later, which quickly led to:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py", line 1011, in setup_pipeline
    self._video_encoder.init_context(enc_width, enc_height, enc_in_format, encoder_spec.encoding, quality, speed, encoder_scaling, self.encoding_options)
  File "encoder.pyx", line 1315, in xpra.codecs.nvenc.encoder.Encoder.init_context (xpra/codecs/nvenc/encoder.c:7730)
  File "encoder.pyx", line 1351, in xpra.codecs.nvenc.encoder.Encoder.init_cuda (xpra/codecs/nvenc/encoder.c:8492)
  File "encoder.pyx", line 1209, in xpra.codecs.nvenc.encoder.get_BGRA2NV12 (xpra/codecs/nvenc/encoder.c:6516)
  File "encoder.pyx", line 1197, in xpra.codecs.nvenc.encoder.get_CUDA_kernel (xpra/codecs/nvenc/encoder.c:6268)
MemoryError: cuModuleLoadDataEx failed: out of memory
And probably also this one:
setup_pipeline failed for (61, None, 'BGRX', codec_spec(nvenc:nvenc))
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py", line 1011, in setup_pipeline
    self._video_encoder.init_context(enc_width, enc_height, enc_in_format, encoder_spec.encoding, quality, speed, encoder_scaling, self.encoding_options)
  File "encoder.pyx", line 1315, in xpra.codecs.nvenc.encoder.Encoder.init_context (xpra/codecs/nvenc/encoder.c:7730)
  File "encoder.pyx", line 1377, in xpra.codecs.nvenc.encoder.Encoder.init_cuda (xpra/codecs/nvenc/encoder.c:9153)
  File "encoder.pyx", line 1424, in xpra.codecs.nvenc.encoder.Encoder.init_nvenc (xpra/codecs/nvenc/encoder.c:9582)
  File "encoder.pyx", line 1183, in xpra.codecs.nvenc.encoder.raiseNVENC (xpra/codecs/nvenc/encoder.c:6065)
Exception: initializing encoder - returned 8: This indicates that one or more of the parameter passed to the API call is invalid.
Note: As part of the work on video regions (#410), nvenc also needed a fix (r5442) for handling input data with a larger rowstride than anticipated (which is often the case with video subregions and
This is 100% reproducible: simply resizing a fast-refreshing window causes the encoder to re-initialize many times, often losing 15 to 30MB of memory each time.
Strange thing is, when I run a test designed specifically for reproducing this bug by creating and destroying lots of contexts (see r5469), even after randomizing the input (r5470), I cannot reproduce the leak there!?
I think I have found it: we clean the encoder contexts using the background worker to prevent delays in the encoding thread. Calling encoder.clean directly (as done in the tests) prevents the leak.
Either CUDA or NVENC (or both) is not really thread-safe, despite their claims, or the worker gets stuck (which is very unlikely).
Third option, likely the right one: we need locking around the CUDA context switching code (push/pop) to prevent multiple threads (in this case, the encoding thread and the worker thread calling clean) from interacting with the same GPU at the same time, even though each goes through a different context object (this could be related to how Python does its threading).
Which means that we will often end up serializing access from the encoding thread anyway, so why bother doing clean in the worker thread and add the complication and overhead of locking? Probably best to just clean from the encoding thread directly.
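For reference, the locking option discussed above could look roughly like the sketch below. It is only an illustration: FakeContext, gpu_lock, activated, encode and clean are hypothetical names, not xpra or CUDA APIs, and the fake context merely records calls instead of touching a GPU. It shows how a single lock held across the whole push/pop window forces the encoding thread and the worker thread to take turns.

```python
import threading
from contextlib import contextmanager

# Shared record of (context name, action), used to observe the ordering.
events = []


class FakeContext:
    """Hypothetical stand-in for a CUDA context: it only records calls."""

    def __init__(self, name):
        self.name = name

    def push(self):
        events.append((self.name, "push"))

    def pop(self):
        events.append((self.name, "pop"))


# One process-wide lock shared by every thread that touches the GPU.
gpu_lock = threading.Lock()


@contextmanager
def activated(context):
    """Hold gpu_lock for the whole push ... pop window."""
    with gpu_lock:
        context.push()
        try:
            yield context
        finally:
            context.pop()


def encode(context):
    # Runs in the encoding thread.
    with activated(context):
        events.append((context.name, "encode"))


def clean(context):
    # Runs in the worker thread; blocks while the encoding thread holds the lock.
    with activated(context):
        events.append((context.name, "clean"))


def run_demo():
    current, old = FakeContext("current"), FakeContext("old")
    threads = [
        threading.Thread(target=encode, args=(current,)),
        threading.Thread(target=clean, args=(old,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(events)
```

Whichever thread gets the lock first, the three events of one context never interleave with those of the other, which is exactly the serialization described above.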
Alternatively, we could split the cleanup into 2 parts:
Cleanup was originally moved to the worker thread because the cost of setting up or destroying an nvenc context is high (see r4708). The simpler option is probably still better, and it makes #466 more pressing: keeping the same context whilst resizing will mitigate this cost.
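As a rough illustration of the simpler option (cleaning from the encoding thread), the sketch below pushes both compress and clean calls through one queue consumed by a single thread. FakeEncoder, encoding_loop and run_demo are hypothetical names made up for this example, not xpra code; the fake encoder only records which thread ran each call.

```python
import queue
import threading


class FakeEncoder:
    """Hypothetical stand-in for an encoder: records which thread ran each call."""

    def __init__(self):
        self.calls = []

    def compress(self, frame):
        self.calls.append(("compress", threading.current_thread().name))

    def clean(self):
        self.calls.append(("clean", threading.current_thread().name))


def encoding_loop(work):
    # Every item is executed here, so all encoder calls share one thread.
    while True:
        item = work.get()
        if item is None:  # sentinel: stop the loop
            break
        func, args = item
        func(*args)


def run_demo():
    work = queue.Queue()
    thread = threading.Thread(target=encoding_loop, args=(work,), name="encode")
    thread.start()
    old, new = FakeEncoder(), FakeEncoder()
    work.put((old.compress, ("frame-1",)))
    # Resize: retire the old encoder on the encoding thread, not on a worker.
    work.put((old.clean, ()))
    work.put((new.compress, ("frame-2",)))
    work.put(None)
    thread.join()
    return old.calls, new.calls
```

The trade-off is the one noted above: the encoding thread pays the full cost of destroying a context, so no lock is needed but frames queued behind a clean call have to wait.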
This will need to be backported to v0.11.x.
Note for those landing here: NVENC is not safe to use in versions older than 0.15 because of a context leak due to threading.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/517