split from #370:
max_threads_per_block - doesn't seem to be causing problems yet
nvEncReconfigureEncoder (with edge resistance if it causes a new IDR frame)
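The "edge resistance" idea above can be sketched as a simple hysteresis check: ignore small parameter changes so the encoder is not reconfigured (and a costly IDR frame potentially forced) for a negligible gain. This is an illustrative sketch, not xpra's actual code; the function name and the 20% threshold are assumptions.

```python
def should_reconfigure(current_bitrate, new_bitrate, resistance=0.2):
    """Only reconfigure when the requested change exceeds a relative
    threshold, since reconfiguring may force a new (expensive) IDR frame.
    The 20% default is an illustrative value, not xpra's."""
    if current_bitrate == 0:
        return new_bitrate != 0
    change = abs(new_bitrate - current_bitrate) / current_bitrate
    return change > resistance
```

With this gate, a 10% bitrate nudge is absorbed silently, while a 50% jump triggers a real reconfigure.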
File "encoder.pyx", line 1588, in xpra.codecs.nvenc.encoder.Encoder.compress_image (xpra/codecs/nvenc/encoder.c:12085)
File "encoder.pyx", line 1624, in xpra.codecs.nvenc.encoder.Encoder.do_compress_image (xpra/codecs/nvenc/encoder.c:12598)
LogicError: cuMemcpyHtoD failed: invalid/unknown error code
gpuGetMaxGflopsDeviceId: max_gflops = device_properties.multiProcessorCount * device_properties.clockRate;
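The gpuGetMaxGflopsDeviceId heuristic quoted above ranks devices by multiprocessor count times clock rate. A Python sketch of the same ranking, over hypothetical device-property dicts (with real hardware these attributes would come from pycuda's Device.get_attribute):

```python
def max_gflops_device(devices):
    """Return the index of the device maximising
    multiProcessorCount * clockRate, mirroring the CUDA sample's
    gpuGetMaxGflopsDeviceId heuristic. `devices` is a list of dicts
    with those two keys (illustrative stand-in for real queries)."""
    def gflops(d):
        return d["multiProcessorCount"] * d["clockRate"]
    return max(range(len(devices)), key=lambda i: gflops(devices[i]))
```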
YUV444 for NVENC: using 3 pass encoding (one for each of Y, U and V)
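YUV444 means full-resolution chroma, so all three planes are the same size. A minimal sketch (a hypothetical helper, not xpra's code) of splitting a packed YUV444 buffer into the three planes that each of the three passes would then encode:

```python
def split_yuv444_planes(packed, width, height):
    """Split a packed Y,U,V,Y,U,V,... bytes buffer into three
    full-resolution planes (one per encoding pass in YUV444 mode)."""
    assert len(packed) == width * height * 3
    y = packed[0::3]
    u = packed[1::3]
    v = packed[2::3]
    return y, u, v
```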
Updated TODO list:
nvEncReconfigureEncoder on the fly?
YUV444P fix in r5667: so this is what the undocumented
-d nvenc, auto-scaling turned off with
compress_image(..) returning 129399 bytes (1.4%), complete compression for frame 645 took 39.4ms
compress_image(..) returning 33506 bytes (0.4%), complete compression for frame 365 took 17.5ms
YUV420P is much faster than the 3-pass YUV444P mode, as expected.
This will have to do for this release; most of the important remaining items are too intrusive to change this late in the release cycle.
smo: please test:
YUV444P mode (see above), compare it with
I'm not able to run glxspheres64 because of the nature of the setup. Is there something else that I could try that doesn't involve GL?
I'm hoping to close this as it seems to work well for me but I want to post some information from my setup before closing.
I often use glxgears because they produce lots of frames without requiring any external data, but playing a video will do just as well.
Note: you may be able to run GL stuff against software mesa rendering, without needing an X11 server running and with the nvidia libGL installed on the system, by using
Needs more testing with newer NVIDIA drivers / cuda sdk but will close now until there is something to comment on.
Did you measure the bitrate and performance as per comment:6?
FYI: r6699 allows us to specify multiple license keys in CSV format:
XPRA_NVENC_CLIENT_KEY="key1,key2" /usr/bin/xpra start ...
This makes it easier to deal with the constant nvidia license key breakage across driver versions.
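The CSV environment variable above can be handled with a simple split. The variable name is from the source; the parsing code itself is an illustrative sketch, not xpra's implementation:

```python
import os

def get_client_keys(default=()):
    """Parse XPRA_NVENC_CLIENT_KEY as a comma-separated list of
    license keys, ignoring empty entries and surrounding whitespace."""
    value = os.environ.get("XPRA_NVENC_CLIENT_KEY", "")
    keys = [k.strip() for k in value.split(",") if k.strip()]
    return keys or list(default)
```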
Please test with nvenc v4, see #653
I am taking this ticket back as YUV444 in nvenc4 is completely different from SDK v4 and is going to require quite a few changes - which should give us a nice performance improvement. Will re-assign for testing + benchmarking afterwards.
Moving the new YUV444 mode to a new ticket so this can get more testing, together with #653.
smo: not sure who should test this, but it's been ready for months, time to get on it.
Here are some performance numbers from 3 cards:
quality=100 (YUV444P mode)
GTX 650
compress_image(..) returning 54939 bytes (1.1%), complete compression for frame 6875 took 9.3ms
compress_image(..) returning 54939 bytes (1.1%), complete compression for frame 6876 took 8.8ms
GTX 750 ti
compress_image(..) returning 164794 bytes (3.4%), complete compression for frame 64 took 10.2ms
compress_image(..) returning 164816 bytes (3.4%), complete compression for frame 63 took 13.3ms
GTX 970
compress_image(..) returning 325881 bytes (4.5%), complete compression for frame 151 took 14.5ms
compress_image(..) returning 321035 bytes (4.5%), complete compression for frame 152 took 14.8ms
quality=50 (YUV420P mode)
GTX 650
compress_image(..) returning 15659 bytes (0.3%), complete compression for frame 310 took 8.6ms
compress_image(..) returning 15617 bytes (0.3%), complete compression for frame 311 took 8.5ms
GTX 750 ti
compress_image(..) returning 10193 bytes (0.2%), complete compression for frame 1085 took 11.7ms
compress_image(..) returning 10193 bytes (0.2%), complete compression for frame 1086 took 11.0ms
GTX 970
compress_image(..) returning 18628 bytes (0.4%), complete compression for frame 178 took 9.2ms
compress_image(..) returning 18356 bytes (0.4%), complete compression for frame 179 took 8.3ms
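The percentages in the logs above are the compressed output size relative to the raw frame size. A sketch of that calculation, assuming (as an illustration) 3 bytes per pixel of raw input:

```python
def compression_percent(compressed_bytes, width, height, bytes_per_pixel=3):
    """Compressed size as a percentage of the raw frame size.
    bytes_per_pixel=3 assumes 24-bit input; illustrative only."""
    raw = width * height * bytes_per_pixel
    return 100.0 * compressed_bytes / raw
```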
I have 1 more card to test and I will update this when I do some more testing.
Interesting to see the GTX 970 running more slowly than I expected, and more slowly than when I had tested it before, IIRC.
Some things worth mentioning:
/etc/xpra or in the per-user directory ~/.xpra. The environment variable XPRA_NVENC_CLIENT_KEY still overrides all keys defined in those files. As of r8778, you can also put your license keys in nvenc.keys, which will be used by both codecs (and by nvenc5 and later when I get around to it). You can mix license keys for different driver versions and the code will validate them and figure out which ones can be used, but:
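The lookup order described above (system directory, then per-user directory, with the environment variable overriding both) can be sketched like this. This is a sketch of the behaviour as described in the ticket, not xpra's actual implementation:

```python
import os

def load_license_keys(dirs=("/etc/xpra", os.path.expanduser("~/.xpra")),
                      filename="nvenc.keys"):
    """Collect license keys from the given config directories;
    XPRA_NVENC_CLIENT_KEY, if set, overrides all file-based keys.
    Blank lines and '#' comments in the keys files are skipped."""
    env = os.environ.get("XPRA_NVENC_CLIENT_KEY")
    if env:
        return [k.strip() for k in env.split(",") if k.strip()]
    keys = []
    for d in dirs:
        path = os.path.join(d, filename)
        if os.path.exists(path):
            with open(path) as f:
                keys += [line.strip() for line in f
                         if line.strip() and not line.startswith("#")]
    return keys
```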
I agree there are many factors when trying to benchmark. It would be a good idea to come up with a better way.
I'm closing this for now as it is working.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/466