
Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#595 closed task (worksforme)

latest nvidia drivers break YUV444 encoding

Reported by: Antoine Martin
Owned by: Smo
Priority: blocker
Milestone: 0.14
Component: encodings
Version: trunk
Keywords:
Cc:

Description

It is not yet clear which driver versions work and which do not (the breakage seems to coincide with the driver changes that also changed the list of license keys, see NVENC developer key?), so r6741 allows us to turn YUV444 off via the XPRA_NVENC_YUV444P=0 env var.

Since the YUV444P mode is completely undocumented, and nvidia developers even claimed that it wasn't possible to use it (it is, well it was...) - this is not going to be fun.
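
A minimal sketch of how an env var switch like XPRA_NVENC_YUV444P could be read. This is not the code from r6741: the function name, the default and the handling of values other than "0" are assumptions made for illustration.

```python
import os

def yuv444p_enabled(environ=os.environ, default=True):
    """Return False when XPRA_NVENC_YUV444P=0, True for any other value.

    Assumption: when the variable is unset we fall back to `default`.
    """
    value = environ.get("XPRA_NVENC_YUV444P")
    if value is None:
        return default
    return value.strip() != "0"

if __name__ == "__main__":
    print("YUV444P enabled:", yuv444p_enabled())
```

Running with XPRA_NVENC_YUV444P=0 in the environment makes the function return False, anything else leaves the mode enabled.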

Change History (13)

comment:1 Changed 5 years ago by Antoine Martin

Owner: changed from Antoine Martin to Antoine Martin
Status: changed from new to assigned

r6781 added nvidia driver version logging to make it easier to instantly see which version is loaded.
Maybe we can also use it to give warnings about which versions are known to break and / or disable YUV444 when we know it isn't going to work.
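
For reference, a rough sketch of how the loaded driver version could be detected on Linux. It assumes the nvidia kernel module exposes an "NVRM version:" line in /proc/driver/nvidia/version; the helper names are hypothetical and this is not necessarily how r6781 does it.

```python
import re

# Assumption: the nvidia kernel module publishes its version here
NVIDIA_PROC_VERSION = "/proc/driver/nvidia/version"

def parse_nvidia_version(text):
    """Return the driver version as a tuple of ints, e.g. (340, 24), or None."""
    for line in text.splitlines():
        if not line.startswith("NVRM version:"):
            continue
        match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", line)
        if match:
            return tuple(int(part) for part in match.group(1).split("."))
    return None

def get_nvidia_version(path=NVIDIA_PROC_VERSION):
    """Read and parse the version file, returning None if it is missing."""
    try:
        with open(path) as f:
            return parse_nvidia_version(f.read())
    except OSError:
        return None

if __name__ == "__main__":
    print("nvidia kernel module version:", get_nvidia_version())
```

Returning a tuple of ints rather than a string makes it easy to compare versions or look them up in a list of known-bad releases.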

More tests with Fedora 20, kernel 3.14.7-200.fc20.x86_64 and a GTX 760 OC:

  • 304.121 - not supported, libnvidia-encode.so not provided
  • 319.76 works but requires an older kernel (ie: 3.11.10-301.fc20)
  • 325.15 does not build, even with older 3.11 kernel
  • 331.79 works but requires different keys than earlier builds
  • 334.21 works but requires an older kernel (ie: 3.11.10-301.fc20) or a patched installer
  • 337.25 breaks YUV444 support, requires the "newer" set of keys
  • 340.17 fails completely with "failed to get preset config for ... This indicates that an invalid struct version was used by the client." - looks to me like this is meant to be used with a newer SDK.
Last edited 5 years ago by Antoine Martin

comment:2 Changed 5 years ago by Antoine Martin

Resolution: fixed
Status: changed from assigned to closed

r6811 disables YUV444 when we detect a buggy nvidia driver version. Can also be re-enabled via env var if needed.
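
The idea can be sketched roughly as follows. This is an illustration only, not the code from r6811: the set below contains just the one version reported as breaking YUV444 in comment:1, and the function name and the fallback for an unknown driver version are assumptions.

```python
import os

# Illustrative only: 337.25 is the version reported as breaking YUV444 in comment:1.
# The real blacklist lives in the xpra source and is not reproduced here.
YUV444_BROKEN_VERSIONS = {(337, 25)}

def use_yuv444(driver_version, environ=os.environ):
    """Decide whether to enable YUV444 for a driver version given as a tuple of ints."""
    override = environ.get("XPRA_NVENC_YUV444P")
    if override is not None:
        # the env var always wins, so a blacklisted version can be force-enabled
        return override.strip() != "0"
    if driver_version is None:
        # assumption: be conservative when the version could not be detected
        return False
    return tuple(driver_version) not in YUV444_BROKEN_VERSIONS

if __name__ == "__main__":
    for version in ((319, 76), (337, 25)):
        print(version, "->", use_yuv444(version))
```

Checking the env var first keeps the override usable for testing whether a blacklisted version really does fail.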

comment:3 Changed 5 years ago by Antoine Martin

Resolution: fixed deleted
Status: changed from closed to reopened

Looks like other versions break... re-opening.

comment:4 Changed 5 years ago by Antoine Martin

Owner: changed from Antoine Martin to Smo
Status: changed from reopened to new

r6831 blacklists some more versions and disables YUV444 mode.

Please check:

  • that the list in comment:1 is correct
  • that the blacklist in r6831 is also correct, you can verify by force enabling it via the env var: you should get errors.

We may need to blacklist some more versions.

comment:5 Changed 5 years ago by Smo

Confirmed

331.79 Works with new key
340.17 Fails completely with same error

Will test with 331.79 for now

Last edited 5 years ago by Antoine Martin

comment:6 Changed 5 years ago by Antoine Martin

smo: 331.79 is not as good as older versions: no YUV444...

comment:7 Changed 5 years ago by Antoine Martin

Priority: changed from major to blocker

(raising: blocker for release)

comment:8 Changed 5 years ago by Smo

Testing some new drivers from http://www.nvidia.ca/object/unix.html

Trying 340.24 produces this output:

2014-08-04 18:12:57,188 nvenc: found nvidia kernel module version 340.24
2014-08-04 18:12:57,206 CUDA initialization (this may take a few seconds)
2014-08-04 18:12:59,396 CUDA 6.0.0 / PyCUDA 2013.1.1, found 2 device(s):
2014-08-04 18:12:59,564   + GeForce GTX 750 Ti @ 0000:83:00.0 (memory: 98% free, compute: 5.0)
2014-08-04 18:12:59,689   + GeForce GTX 650 @ 0000:09:00.0 (memory: 97% free, compute: 3.0)
2014-08-04 18:13:00,111 pulseaudio server started with pid 5087
2014-08-04 18:13:00,120 started child 'xterm -fg white -bg black' with pid 5089
2014-08-04 18:13:00,148 xpra server version 0.14.0 (r7043)
2014-08-04 18:13:00,148 running with pid 5049
2014-08-04 18:13:00,368 xpra is ready.
2014-08-04 18:13:00,546 failed to get preset config for default (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / B2DFB705-4EBD-4C49-9B5F-24A777D3E587): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,546 failed to get preset config for low-latency (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 49DF21C5-6DFA-4FEB-9787-6ACC9EFFB726): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,546 failed to get preset config for hp (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 60E4C59F-E846-4484-A56D-CD45BE9FDDF6): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for hq (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 34DBA71D-A77B-4B8F-9C3E-B6D5DA24C012): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for bd (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 82E3E450-BDBB-4E40-989C-82A90DF9EF32): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for low-latency-hq (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / C5F733B9-EA97-4CF9-BEC2-BF78A74FD105): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for low-latency-hp (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 67082A44-4BAD-48FA-98EA-93056D150A58): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 failed to get preset config for None (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 7ADD423D-D035-4F6F-AEA5-50885658643C): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,547 nvenc: found some unknown presets: 7ADD423D-D035-4F6F-AEA5-50885658643C
2014-08-04 18:13:00,548 failed to get preset config for hp (6BC82762-4E63-4CA4-AA85-1E50F321F6BF / 60E4C59F-E846-4484-A56D-CD45BE9FDDF6): This indicates that an invalid struct version was used by the client.
2014-08-04 18:13:00,583 Warning: nvenc video encoder failed: could not find preset hp
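
When comparing logs like the one above across driver versions, a small parser can help summarise which presets failed. This is only a sketch: the regular expression is derived from the log lines shown here, and the function name and group names are made up for illustration (what the first GUID identifies is not stated in the log).

```python
import re

# Pattern derived from the log lines above; group names are illustrative
PRESET_FAILURE = re.compile(
    r"failed to get preset config for (?P<preset>\S+) "
    r"\((?P<first_guid>[0-9A-F-]+) / (?P<preset_guid>[0-9A-F-]+)\): (?P<error>.*)"
)

def failed_presets(log_lines):
    """Return a dict mapping each failing preset name to its preset GUID."""
    failures = {}
    for line in log_lines:
        match = PRESET_FAILURE.search(line)
        if match:
            failures[match.group("preset")] = match.group("preset_guid")
    return failures

if __name__ == "__main__":
    import sys
    # usage: python this_script.py < server.log
    for preset, guid in failed_presets(sys.stdin).items():
        print(preset, guid)
```
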
Last edited 5 years ago by Smo

comment:9 Changed 5 years ago by Smo

Tried today with new beta driver 343.13 on CentOS 7

Same messages as in comment:8

comment:10 Changed 5 years ago by Antoine Martin

Resolution: worksforme
Status: changed from new to closed

I believe the version blacklisting works.

If we find newer versions of the driver that need to be added to the blacklist, add them here.

comment:11 Changed 5 years ago by Antoine Martin

Milestone: changed from 0.15 to 0.14

Was done for milestone 0.14

comment:12 Changed 5 years ago by Antoine Martin

Worth noting that the CUDA 6.5 SDK is not compatible with the 331.79 drivers (6.0 is fine).

You get:

UserWarning: Failed to import the CUDA driver interface, with an error message  \
indicating that the version of your CUDA header does not match the version of your CUDA driver

comment:13 Changed 5 years ago by Antoine Martin

My findings so far:

  • the Maxwell based 750 Ti really is much faster than the Kepler based 760 OC! (despite its lower clock speed, lower number of CUDA cores, etc..)
  • almost no difference in terms of performance between SDK v3 and v4 - as expected since this is all done in hardware (using the same settings - v4 does have interesting new options though)
  • newer kernels (3.16) perform better than older ones (3.11) by about 10%, with the same CUDA and same driver versions
  • driver version 340.x has different performance characteristics than previous versions (tested on GTX750 Ti only): more consistent speed, even with smaller encoding sizes, though a slightly lower performance at 1080p - it looks like the overheads are lower somehow
  • as expected quality=100 with speed=0 is the slowest setting (10 to 20% slower - except with driver 340.x, see above), all the other settings perform very similarly
  • I can saturate the 2GB memory of my GTX 760 OC with just 22 1080p contexts: cuCtxCreate failed: out of memory (the card is also used for my own desktop, so it was only ~75% free to begin with - 2GB per card is OK for 1080p I think)
  • we have a big problem with parallel encoding (maybe I just tested wrong previously?): the threaded, parallel encoding tests using 10 threads show that we are quickly reaching the context limit (Tested with both the GTX 760 and 750 Ti, the latter is a bit faster as per above - only tested with drivers 340.x and 343.x):
    • at 1080p: sequential encoding goes at about 128MPixels/s, with 10 threads the speed drops to about 15MPixels/s. This only gives us 8fps, which is just not good enough! (the 750 Ti does 12 fps for 10 contexts)
    • at 720p: from 128MPixels/s down to 9MPixels/s with the 760 OC, and 184 down to 20 with the 750 Ti. That's ~11fps and 24 fps respectively. (and that's a best case scenario)
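
As a rough cross-check of the frame rates above, throughput in MPixels/s can be converted to frames per second by dividing by the frame size. The sketch below assumes 1 MPixel = 1,000,000 pixels and 1280x720 for 720p, neither of which is stated in the measurements; with those assumptions the results come out a little below the rounded figures quoted above.

```python
def frames_per_second(mpixels_per_second, width, height):
    """Convert an encoding throughput in MPixels/s into frames per second.

    Assumption: 1 MPixel = 1,000,000 pixels.
    """
    return mpixels_per_second * 1_000_000 / (width * height)

if __name__ == "__main__":
    # (label, MPixels/s with 10 threads, width, height) taken from the list above
    cases = [
        ("1080p, 10 threads", 15, 1920, 1080),
        ("720p, 760 OC, 10 threads", 9, 1280, 720),
        ("720p, 750 Ti, 10 threads", 20, 1280, 720),
    ]
    for label, speed, width, height in cases:
        print(f"{label}: {frames_per_second(speed, width, height):.1f} fps")
```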

Before I put this on the wiki somewhere, I want to record my testing with all the combinations of CUDA, kernel, drivers, SDK... compatibility issues:

  • NVENC v4 requires CUDA 6.5 (you get "invalid struct version was used" with 6.0 and earlier)
  • driver branch 331 does not work with CUDA 6.5, it errors out with "undefined symbol: cuMemHostRegister_v2" (CUDA 6.0 is OK, 5.5 not tested)
  • driver branch 331.x does not support 4k monitors
  • driver branch 340.x does support 4k
  • drivers that require the "new" keys do not support YUV444: forcing it with XPRA_NVENC_YUV444P=1 gives us the same performance as those that do work, but with data that cannot be used by the client (and managed to crash the client once! fuzzing of sorts..)
  • driver versions that require "old" keys: 331.67, ... "new" keys: 331.89, 331.104, 340.*, 343.*
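
The rules in the list above could be captured in a small lookup, sketched here so that a test script can skip combinations known not to work. The function names are hypothetical and only the rules listed above are encoded; any version not listed explicitly is reported as unknown.

```python
def check_combination(nvenc_sdk, cuda, driver_branch):
    """Return the known problems for an NVENC SDK / CUDA / driver branch combination.

    nvenc_sdk: int (3 or 4), cuda: tuple such as (6, 5), driver_branch: int such as 331.
    """
    problems = []
    if nvenc_sdk >= 4 and cuda < (6, 5):
        problems.append("NVENC v4 requires CUDA 6.5")
    if driver_branch == 331 and cuda >= (6, 5):
        problems.append("driver branch 331 does not work with CUDA 6.5")
    return problems

def requires_new_keys(driver_version):
    """True for "new" keys, False for "old" keys, None if not listed in the ticket."""
    major, minor = driver_version[:2]
    if major in (340, 343) or (major, minor) in ((331, 89), (331, 104)):
        return True
    if (major, minor) == (331, 67):
        return False
    return None

def supports_yuv444(driver_version):
    """Drivers that need the "new" keys do not support YUV444 (None if unknown)."""
    new_keys = requires_new_keys(driver_version)
    return None if new_keys is None else not new_keys

if __name__ == "__main__":
    print(check_combination(4, (6, 0), 340))
    print(supports_yuv444((331, 67)), supports_yuv444((340, 46)))
```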

For testing, use the test_nvenc3 and test_nvenc4 scripts.

Note: pycuda must be rebuilt against the version of CUDA being tested - which takes time..

As of r7944, we can build both nvenc modules (v3 and v4) at the same time with:

./setup.py --with-nvenc3 --with-nvenc4

Which should be auto detected already since you will need the pkgconfig file for each SDK.
To choose which CUDA SDK to build against use:

./setup.py --with-cuda=6.5

You will need at least 6.5 to build the nvenc4 module, nvenc3 works with 5.0 onwards.


Testing with a GTX 750 Ti:

  • Fedora kernel 3.11 with NVENC SDK 3:
    • driver 319.76: card is "not supported"
    • driver 331.67:
      • CUDA 6.5: No
      • CUDA 6.0: OK
        • NV12 (in MPixels/s):
          • 1920x1080: 181 to 218
          • 1024x768: 123 to 158
        • YUV444:
          • 1920x1080: 40 to 44
          • 1024x768: 34 to 42
    • driver 331.89:
      • CUDA 6.5: No
      • CUDA 6.0: OK
        • NV12:
          • 1920x1080: 182 to 210
          • 1024x768: 141 to 197
        • YUV444: broken
    • driver 331.104:
      • CUDA 6.5: No
      • CUDA 6.0: OK
        • NV12:
          • 1920x1080: 188 to 226
          • 1024x768: 141 to 201
        • YUV444: broken
    • driver 340.46:
      • CUDA 6.0: No
      • CUDA 6.5:
  • Fedora kernel 3.16 with NVENC SDK 3:
    • driver 319.x: cannot build
    • driver 331.67:
      • CUDA 6.5: ?
      • CUDA 6.0:
        • NV12:
          • 1920x1080: 212 to 240
          • 1024x768: 158 to 201
        • YUV444:
          • 1920x1080: 46 to 47
          • 1024x768: 34 to 44
    • driver 331.89:
      • CUDA 6.5: ?
      • CUDA 6.0:
        • NV12:
          • 1920x1080: 212 to 240
          • 1024x768: 158 to 203
        • YUV444: broken
    • driver 331.104:
      • CUDA 6.5: ?
      • CUDA 6.0:
        • NV12:
          • 1920x1080: 212 to 243
          • 1024x768: 158 to 204
        • YUV444: broken
    • driver 340.46:
      • CUDA 6.5: ?
      • CUDA 6.0:
        • NV12:
          • 1920x1080: 206 to 214
          • 1024x768: 205 to 207
        • YUV444: broken
  • Fedora kernel 3.16 with NVENC SDK 4:
    • ALL 331.x drivers tested (331.67, 331.89, 331.104):
      • CUDA 6.0: version mismatch
      • CUDA 6.5: ?
    • driver 340.46:
      • CUDA 6.0: No
      • CUDA 6.5:
        • NV12:
          • 1920x1080: 205 to 215
          • 1024x768: 204 to 208
        • YUV444: new support code pending

Testing with a GTX 760 OC:

  • Fedora kernel 3.16 with NVENC SDK 4:
    • drivers 331.x (331.79, 331.89):
      • CUDA 6.0: "version mismatch"
      • CUDA 6.5: No
    • driver 343.22:
      • CUDA 6.0: ?
      • CUDA 6.5:
        • NV12:
          • 1920x1080: 125 to 131
          • 1024x768: 83 to 121
  • Fedora kernel 3.16 with NVENC SDK 3:
    • driver 343.22:
      • CUDA 6.0: No
      • CUDA 6.5:
        • NV12:
          • 1920x1080: 125 to 133
          • 1024x768: 84 to 116

Will edit this ticket more after reboots and code updates..

Last edited 5 years ago by Antoine Martin