Xpra: Ticket #517: nvenc memory leak

Video sub regions are a little bit unpredictable and often end up destroying video contexts and re-creating them later... which quickly led to:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py", line 1011, in setup_pipeline
    self._video_encoder.init_context(enc_width, enc_height, enc_in_format, encoder_spec.encoding, quality, speed, encoder_scaling, self.encoding_options)
  File "encoder.pyx", line 1315, in xpra.codecs.nvenc.encoder.Encoder.init_context (xpra/codecs/nvenc/encoder.c:7730)
  File "encoder.pyx", line 1351, in xpra.codecs.nvenc.encoder.Encoder.init_cuda (xpra/codecs/nvenc/encoder.c:8492)
  File "encoder.pyx", line 1209, in xpra.codecs.nvenc.encoder.get_BGRA2NV12 (xpra/codecs/nvenc/encoder.c:6516)
  File "encoder.pyx", line 1197, in xpra.codecs.nvenc.encoder.get_CUDA_kernel (xpra/codecs/nvenc/encoder.c:6268)
MemoryError: cuModuleLoadDataEx failed: out of memory -

And probably also this one:

setup_pipeline failed for (61, None, 'BGRX', codec_spec(nvenc:nvenc))
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py", line 1011, in setup_pipeline
    self._video_encoder.init_context(enc_width, enc_height, enc_in_format, encoder_spec.encoding, quality, speed, encoder_scaling, self.encoding_options)
  File "encoder.pyx", line 1315, in xpra.codecs.nvenc.encoder.Encoder.init_context (xpra/codecs/nvenc/encoder.c:7730)
  File "encoder.pyx", line 1377, in xpra.codecs.nvenc.encoder.Encoder.init_cuda (xpra/codecs/nvenc/encoder.c:9153)
  File "encoder.pyx", line 1424, in xpra.codecs.nvenc.encoder.Encoder.init_nvenc (xpra/codecs/nvenc/encoder.c:9582)
  File "encoder.pyx", line 1183, in xpra.codecs.nvenc.encoder.raiseNVENC (xpra/codecs/nvenc/encoder.c:6065)
Exception: initializing encoder - returned 8: This indicates that one or more of the parameter passed to the API call is invalid.

Note: As part of the work on video regions (#410), nvenc also needed a fix (r5442) for handling input data with a larger rowstride than anticipated (which is often the case with video subregions and XShm).

Fri, 14 Feb 2014 15:53:50 GMT - Antoine Martin: status changed

status changed from new to assigned

This is 100% reproducible: simply resizing a fast-refreshing window causes the encoder to re-initialize many times, often leaking 15 to 30MB of memory each time.

Strange thing is, when I run a test designed specifically for reproducing this bug by creating and destroying lots of contexts (see r5469), even after randomizing the input (r5470), I cannot reproduce the leak there!?

Fri, 14 Feb 2014 16:17:39 GMT - Antoine Martin:

I think I have found it: we clean the encoder contexts using the background worker to prevent delays in the encoding thread. Calling encoder.clean directly (as done in the tests) prevents the leak.

Either CUDA or NVENC (or both) is not really thread-safe, despite their claims, or the worker gets stuck (which is very unlikely).

Sat, 15 Feb 2014 03:12:52 GMT - Antoine Martin:

Third option, likely the right one: we need locking around the CUDA context switching code (push/pop) to prevent multiple threads (in this case, the encoding thread and the worker thread calling clean) from interacting with the same GPU at the same time. This applies even though each thread goes through a different context object, and it could be related to how python does its threading.

That means we will often end up serializing access from the encoding thread anyway, so why bother doing clean in the worker thread and adding the complication and overhead of locking? Probably best to just clean from the encoding thread directly.
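For illustration only, the locking option discussed above could look roughly like the sketch below. It is not the xpra code: FakeContext, _gpu_lock, gpu_context and run_demo are hypothetical names, and FakeContext merely records calls instead of touching a GPU. It assumes a context object exposing push() and pop() methods.

```python
import threading
from contextlib import contextmanager

# Hypothetical lock shared by every thread that talks to the GPU.
_gpu_lock = threading.Lock()


class FakeContext:
    """Stand-in for a CUDA context: it only records push/pop calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def push(self):
        self.log.append(("push", self.name))

    def pop(self):
        self.log.append(("pop", self.name))


@contextmanager
def gpu_context(ctx):
    """Hold the lock for the whole push ... pop window."""
    with _gpu_lock:
        ctx.push()
        try:
            yield ctx
        finally:
            # Always pop, even if the GPU work raised an exception.
            ctx.pop()


def run_demo():
    """Run an 'encode' thread and a 'clean' thread against the same lock."""
    log = []

    def work(name):
        with gpu_context(FakeContext(name, log)):
            log.append(("work", name))

    threads = [threading.Thread(target=work, args=(n,)) for n in ("encode", "clean")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Whichever thread gets the lock first completes its push/work/pop sequence before the other one starts, which is exactly the serialization that makes cleaning in the worker thread pointless.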

Alternatively, we could split the cleanup into 2 parts:

clean: called from encoding thread, nvenc (and probably also cuda csc) can do everything from there
destroy: called from worker thread later, other encoders can do their cleanup there cheaply
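A minimal sketch of that two-part split, under stated assumptions: SketchEncoder, release, worker_loop and run_demo are hypothetical names, not xpra APIs, and the two methods only record which thread called them.

```python
import queue
import threading


class SketchEncoder:
    """Hypothetical encoder exposing the two-part cleanup."""

    def __init__(self):
        self.clean_thread = None
        self.destroy_thread = None

    def clean(self):
        # GPU-bound teardown: runs on the thread that used the context.
        self.clean_thread = threading.current_thread().name

    def destroy(self):
        # Cheap teardown that does not care which thread runs it.
        self.destroy_thread = threading.current_thread().name


def release(encoder, work_queue):
    """Called from the encoding thread when a context is discarded."""
    encoder.clean()                   # synchronous, same thread
    work_queue.put(encoder.destroy)   # deferred to the worker


def worker_loop(work_queue):
    """Background worker: runs queued callables until it sees None."""
    while True:
        task = work_queue.get()
        if task is None:
            break
        task()


def run_demo():
    work_queue = queue.Queue()
    worker = threading.Thread(target=worker_loop, args=(work_queue,), name="worker")
    worker.start()
    encoder = SketchEncoder()
    release(encoder, work_queue)
    work_queue.put(None)  # tell the worker to stop
    worker.join()
    return encoder
```

After run_demo(), clean_thread holds the name of the calling thread and destroy_thread is "worker". An encoder with nothing expensive to free could leave clean empty and do everything in destroy.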

Cleanup was done in the worker thread because the cost of setting up or destroying an nvenc context is high (see r4708). The simpler option (cleaning from the encoding thread) is probably still better, which makes #466 more pressing: keeping the same context whilst resizing will mitigate that cost.

This will need to be backported to v0.11.x.

Sat, 15 Feb 2014 03:39:17 GMT - Antoine Martin: status changed; resolution set

status changed from assigned to closed
resolution set to fixed

Fixed in r5473, backport in r5474. Will follow up in #466.

Another leak, on the decoding side this time, was caused by r5515 and fixed in r5542.

Sun, 29 Mar 2015 07:17:24 GMT - Antoine Martin:

Note for those landing here: NVENC is not safe to use in versions older than 0.15 because of a context leak due to threading.

Sat, 23 Jan 2021 04:58:06 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/517