
Opened 6 years ago

Closed 6 years ago

#733 closed defect (fixed)

nvenc out of memory, leaks, crashes

Reported by: Antoine Martin Owned by: Nick Centanni
Priority: blocker Milestone: 0.15
Component: encodings Version: trunk
Keywords: nvenc Cc:

Description (last modified by Antoine Martin)

Not sure if this is due to nvenc v4 (#653), the driver version, changes in our code, or what.. But it's pretty bad.

  • r8085 adds some debugging to xpra info, it shows that when we resize a window, we go through many cycles of creating then destroying nvenc contexts before we settle down (pipeline scoring playing the yo-yo). This needs to be fixed (but is actually quite useful for testing!), but in itself should not be critical... except that:
  • nvenc leaks memory, about 50MB per context: resize the window once and you can go through 20 contexts and about 1GB of memory! This looks like pinned device memory, judging by this error message: MemoryError: cuCtxCreate failed: out of memory, which fires from the pycuda code device.make_context(flags=cf.SCHED_YIELD | cf.MAP_HOST).

I've also seen it coming up as error during picture encoding - returned 10: ... during self.functionList.nvEncEncodePicture.
Once this happens, it is also possible to hit AssertionError: no NVENC device found! since there is no more memory available on the card!

I keep an eye on the server state using:

watch 'xpra info | egrep -e \
    "window\[[0-9]*\].encoder=|encoder_height|encoder_width|last_failure|context_count|device_count|generation|kernel"'
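The same filter can be applied from Python, which helps when logging the values over time rather than watching them. This is a minimal sketch that reuses the egrep expression above; the `filter_info` helper and the sample key names are hypothetical and only there for illustration, so real `xpra info` output will differ.

```python
import re

# Same alternation as the egrep expression above.
INFO_PATTERN = re.compile(
    r"window\[[0-9]*\].encoder=|encoder_height|encoder_width|"
    r"last_failure|context_count|device_count|generation|kernel"
)

def filter_info(text):
    """Return only the lines of `xpra info` output that match INFO_PATTERN."""
    return [line for line in text.splitlines() if INFO_PATTERN.search(line)]

# Hypothetical sample output, for illustration only.
sample = """\
window[1].encoder=nvenc
window[1].size=(621, 417)
encoding.nvenc.context_count=3
server.uptime=120
"""
print("\n".join(filter_info(sample)))
```

In practice the text would come from running `xpra info` and capturing its standard output.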

The memory does not go back down when we disconnect the client either..

This is a blocker for #653, #466

Attachments (3)

glxgears-resize.sh (157 bytes) - added by Antoine Martin 6 years ago.
simple xdotool script to constantly resize the glxgears window to try to cause the server hang
video-lockless.patch (32.6 KB) - added by Antoine Martin 6 years ago.
work in progress patch which does everything in the encode thread and removes locking
glxgears-resize.2.sh (254 bytes) - added by Antoine Martin 6 years ago.
better glxgears test script: creates and destroys glxgears instances as well as resizing them


Change History (11)

comment:1 Changed 6 years ago by Antoine Martin

Description: modified
Owner: changed from Antoine Martin to Antoine Martin
Status: new → assigned

comment:2 Changed 6 years ago by Antoine Martin

I think it is a threading issue, r8097 adds tests for threading.
Problem is that the compression code can create a new video encoding pipeline from multiple threads: from the encode thread, from the timer worker thread, etc. And the same thing goes for closing the encoder.

I believe we need to do ALL of these things, all in the same thread.
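The idea of doing everything in one thread can be sketched as follows. This is not the xpra code: `EncodeThread`, `create_encoder` and `close_encoder` are hypothetical stand-ins, and the sketch is written for Python 3. Other threads (such as a timer) only queue requests, and the encode thread is the only one that ever creates or closes an encoder.

```python
import queue
import threading

class EncodeThread:
    """Runs every submitted callable on one dedicated thread."""

    def __init__(self):
        self._work = queue.Queue()
        self._thread = threading.Thread(target=self._loop, name="encode", daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            item = self._work.get()
            if item is None:        # sentinel: stop the loop
                break
            fn, args = item
            fn(*args)

    def submit(self, fn, *args):
        # Callable from any thread; fn itself only ever runs on the encode thread.
        self._work.put((fn, args))

    def stop(self):
        self._work.put(None)
        self._thread.join()


# Hypothetical stand-ins for creating and closing a video encoder:
# they just record which thread they ran on.
seen_threads = []

def create_encoder(width, height):
    seen_threads.append(threading.current_thread().name)

def close_encoder():
    seen_threads.append(threading.current_thread().name)

worker = EncodeThread()
# A "timer" thread requests a pipeline change instead of doing it itself.
timer = threading.Thread(target=worker.submit, args=(create_encoder, 621, 417), name="timer")
timer.start()
timer.join()
worker.submit(close_encoder)
worker.stop()
print(seen_threads)
```

Because only one thread touches the encoder, no lock is needed around its creation or destruction; the cost is that any slow evaluation queued this way delays encoding.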

comment:3 Changed 6 years ago by Antoine Martin

Confirmed as a threading issue, this trivial patch prevents the lockups, but also causes significant stuttering as we always evaluate the encoding pipeline in the encoding thread - this penalizes x264 and vpx unnecessarily:

--- src/xpra/server/window_video_source.py	(revision 8097)
+++ src/xpra/server/window_video_source.py	(working copy)
@@ -614,7 +614,7 @@
                 self._lossless_threshold_base = min(80, 10+self._current_speed/5)
                 self._lossless_threshold_pixel_boost = 90
 
-        if self._video_encoder:
+        if self._video_encoder and False:
             self.check_pipeline_score(force_reload)
 
     def check_pipeline_score(self, force_reload):

comment:4 Changed 6 years ago by Antoine Martin

I thought I had managed to reproduce it with the patch applied, which made me think it wasn't a fix but that it just made the bug harder to hit.
It even locked up my X11 session at one point! Not sure if this is related:

[407579.282272] NVRM: Xid (PCI:0000:01:00): 31, Ch 0000000d, engmask 00002100, intr 10000000

I also see this error on the client side sometimes:

Exception: avcodec decoding failed to decode 19847 bytes of h264 data (frame 0, step 1 of 1)

Always on frame 0. As if the data is either invalid or incomplete.

But now that I am trying hard to reproduce it, I cannot hit the bug!?!
I thought it might be caused by small dimensions, no change. Frequent changes: managed over 200 generations without problem... etc
And just as I write this: it happens again!

Sending a SIGUSR1 to the server process shows some threads doing:

  • condition.wait from _write_format_thread_loop, _read_parse_thread_loop, _write_thread_loop and _write_format_thread_loop, background_worker.
  • untilConcludes from _read_thread_loop * 2 (info thread)
  • and itself: dump_frames from sigusr1
  • this confusing trace:
    140729086670592 - <frame object at 0x7ffdf801c0f0>:
      File "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
        self.__bootstrap_inner()
      File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
        self.run()
      File "/usr/lib64/python2.7/threading.py", line 764, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 1582, in encode_loop
        fn_and_args = self.compression_work_queue.get(True)
      File "/usr/lib64/python2.7/site-packages/xpra/server/window_source.py", line 1201, in make_data_packet_cb
        refreshlog("auto refresh: %5s screen update (quality=%3i), %s (region=%s, refresh regions=%s)", encoding, actual_quality, msg, region, self.refresh_regions)
      File "/usr/lib64/python2.7/site-packages/xpra/server/window_source.py", line 1470, in make_data_packet
        return packet
      File "/usr/lib64/python2.7/site-packages/xpra/server/window_video_source.py", line 1192, in video_encode
        return self._video_encoder.get_encoding(), Compressed(encoding, data), client_options, width, height, 0, 24
      File "/usr/lib64/python2.7/site-packages/xpra/log.py", line 163, in __call__
        self.log(logging.DEBUG, msg, *args, **kwargs)
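For reference, a dump like the one above can be produced with the standard library alone. This is a minimal sketch and not xpra's own handler: `format_all_frames` and `sigusr1_handler` are hypothetical names, and the output format is only loosely modelled on the traces shown here.

```python
import signal
import sys
import threading
import traceback

def format_all_frames():
    """Return a text dump of the current stack of every running thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    frames = sys._current_frames()
    out = ["found %i frames:" % len(frames)]
    for ident, frame in frames.items():
        out.append("%s - %s:" % (ident, names.get(ident, "unknown")))
        out.extend(line.rstrip() for line in traceback.format_stack(frame))
    return "\n".join(out)

def sigusr1_handler(signum, frame):
    sys.stderr.write(format_all_frames() + "\n")

# SIGUSR1 does not exist on Windows, and handlers can only be installed
# from the main thread, hence the guard.
if hasattr(signal, "SIGUSR1") and threading.current_thread() is threading.main_thread():
    signal.signal(signal.SIGUSR1, sigusr1_handler)
```

Python runs signal handlers in the main thread between bytecode instructions, so a thread that holds the GIL inside a long native call can delay the dump itself.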
    

Thoughts: maybe this has something to do with the xpra info connection, there were reports of server hangs IIRC. Or maybe it's a threading issue in the auto-refresh. So:

  • must try to reproduce without info running in parallel - or with aggressive info, to see if it crashes sooner
  • try with auto-refresh turned off
  • try without the video regions code (who knows..)

With full nvenc debug enabled, the lockup looks like this:

2014-11-13 17:56:26,762 compress_image(XShmImageWrapper(BGRX: 0, 0, 621, 417), {}) thread=<Thread(encode, started daemon 140456921536256)>
2014-11-13 17:56:26,763 compress_image(..) host buffer populated with 1035828 bytes (max 4259840)
2014-11-13 17:56:26,764 compress_image(..) input buffer copied to device
2014-11-13 17:56:26,765 compress_image(..) kernel BGRA_to_NV12 executed - CSC took 0.2 ms
2014-11-13 17:56:26,766 nvEncMapInputResource(0x7fbeace9e420)
2014-11-13 17:56:26,789 compress_image(..) device buffer mapped to 0x7fbe9c27b450
2014-11-13 17:56:26,790 nvEncEncodePicture(0x7fbeace9f040)
2014-11-13 17:56:26,795 compress_image(..) encoded in 29.6 ms
2014-11-13 17:56:26,796 nvEncLockBitstream(0x7fbeace9ea30)

2014-11-13 17:56:48,670 compress_image(..) output buffer locked, bitstreamBufferPtr=0x7fbe96a8f000
found 7 frames:
2014-11-13 17:56:48,671 nvEncUnlockBitstream(0x7fbe9c26cb30)
140456570648320 - <frame object at 0x29c98f0>:
2014-11-13 17:56:48,674 nvEncUnmapInputResource(0x7fbe9c26cb30)
2014-11-13 17:56:48,683 compress_image(..) download took 21888.6 ms
2014-11-13 17:56:48,684 compress_image(..) returning 5930 bytes (0.1%), complete compression for frame 17 took 21921.4ms

There's a 22 second gap calling nvEncLockBitstream, and even the sigusr1 signal was delayed I think.

Maybe we should release the GIL?
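The stall can be located mechanically by looking for the largest jump between consecutive log timestamps. A minimal sketch, assuming the timestamp format seen in the log above; `largest_gap` is a hypothetical helper and the input is a few lines copied from that log.

```python
from datetime import datetime

def largest_gap(log_lines):
    """Return (seconds, line_before, line_after) for the biggest timestamp gap."""
    stamped = []
    for line in log_lines:
        try:
            # Timestamps look like "2014-11-13 17:56:26,796" (23 characters).
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        except ValueError:
            continue    # skip lines without a leading timestamp
        stamped.append((ts, line))
    best = (0.0, None, None)
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, l0, l1)
    return best

log = [
    "2014-11-13 17:56:26,795 compress_image(..) encoded in 29.6 ms",
    "2014-11-13 17:56:26,796 nvEncLockBitstream(0x7fbeace9ea30)",
    "2014-11-13 17:56:48,670 compress_image(..) output buffer locked, bitstreamBufferPtr=0x7fbe96a8f000",
    "found 7 frames:",
    "2014-11-13 17:56:48,671 nvEncUnlockBitstream(0x7fbe9c26cb30)",
]
gap, before, after = largest_gap(log)
print("%.3f s gap after: %s" % (gap, before))
```

On these lines the largest gap falls right after the nvEncLockBitstream call, matching the observation above.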

Last edited 6 years ago by Antoine Martin

Changed 6 years ago by Antoine Martin

Attachment: glxgears-resize.sh added

simple xdotool script to constantly resize the glxgears window to try to cause the server hang

comment:5 Changed 6 years ago by Antoine Martin

Keywords: nvenc added

With r8104 and the change from comment:3, I am not getting any lockups anymore. Tested with the glxgears resize script and by hand.
I can reach generation>1000 (that's more than 1000 contexts created and then destroyed), without any server hangs.

Now, I just need to find a better solution than comment:3 ..

Changed 6 years ago by Antoine Martin

Attachment: video-lockless.patch added

work in progress patch which does everything in the encode thread and removes locking

Changed 6 years ago by Antoine Martin

Attachment: glxgears-resize.2.sh added

better glxgears test script: creates and destroys glxgears instances as well as resizing them

comment:6 Changed 6 years ago by Antoine Martin

Owner: changed from Antoine Martin to alas
Status: assigned → new

Looks fixed with r8108.

@afarr: can you break it?

comment:7 Changed 6 years ago by alas

Owner: changed from alas to Nick Centanni

Looks like this is probably in good shape (you said you ran the server for hours without problem?) ... can go ahead and close unless you think there's some chance of crashing it by going crazy.

comment:8 Changed 6 years ago by Nick Centanni

Resolution: fixed
Status: new → closed