split from #370:
max_block_sizes, max_grid_sizes and max_threads_per_block - doesn't seem to be causing problems yet
NV12 / YUV444P)
nvEncReconfigureEncoder (with edge resistance if it causes a new IDR frame)
inputBuffer)
  File "encoder.pyx", line 1588, in xpra.codecs.nvenc.encoder.Encoder.compress_image (xpra/codecs/nvenc/encoder.c:12085)
  File "encoder.pyx", line 1624, in xpra.codecs.nvenc.encoder.Encoder.do_compress_image (xpra/codecs/nvenc/encoder.c:12598)
LogicError: cuMemcpyHtoD failed: invalid/unknown error code
Lower priority:
gpuGetMaxGflopsDeviceId: max_gflops = device_properties.multiProcessorCount * device_properties.clockRate;
YUV444 for NVENC: using 3 pass encoding (one for each of Y, U and V)
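The device-selection heuristic quoted in the gpuGetMaxGflopsDeviceId item can be sketched in a few lines of Python. This is only an illustration of the formula as written in the ticket: the DeviceProperties class and the max_gflops_device_id helper are hypothetical names, not part of xpra or of any CUDA binding.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DeviceProperties:
    # Hypothetical stand-in for a CUDA device properties record;
    # only the two fields used by the heuristic are modelled.
    name: str
    multiProcessorCount: int
    clockRate: int  # in kHz, the unit CUDA uses for this field


def max_gflops_device_id(devices: Sequence[DeviceProperties]) -> int:
    """Return the index of the device with the highest
    multiProcessorCount * clockRate product (first one wins on ties)."""
    if not devices:
        raise ValueError("no devices to choose from")
    scores = [d.multiProcessorCount * d.clockRate for d in devices]
    return max(range(len(devices)), key=scores.__getitem__)
```

Given a list of such records, the returned index would be the device to hand to the encoder. The product is only a rough proxy for throughput and says nothing about how fast the NVENC block itself is.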
Updated TODO list:
nvEncReconfigureEncoder on the fly?
r5515 caused a big memory leak client side, fixed in r5542
Important YUV444P fix in r5667: so this is what the undocumented colourPlaneId does!
r5664 and r5666 also allow us to tune the bitrate based on the usual "speed" setting and the encoder input size, using an exponential scale to prefer low bandwidth (see changeset for details).
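To illustrate what an exponential scale that prefers low bandwidth can look like, here is a minimal sketch. It is not the formula from r5664 or r5666 (see the changesets for that): the function name, the low and high per-megapixel limits, and the assumption that a higher "speed" allows a higher bitrate are all made up for this example.

```python
def target_bitrate(width: int, height: int, speed: int,
                   low: float = 1_000_000.0,
                   high: float = 16_000_000.0) -> int:
    """Illustrative only: bits per second for the given input size.

    The per-megapixel rate is interpolated geometrically between the
    hypothetical 'low' and 'high' limits, so mid-range speed values
    land well below the linear midpoint.
    """
    if width <= 0 or height <= 0:
        raise ValueError("encoder input size must be positive")
    pct = min(100, max(0, speed)) / 100.0
    per_mpixel = low * (high / low) ** pct      # exponential scale
    mpixels = width * height / 1_000_000.0      # encoder input size
    return int(per_mpixel * mpixels)
```

With these placeholder limits, a mid-range speed gives a rate close to the geometric mean of the two limits rather than their average, which is the sense in which the scale favours low bandwidth.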
Testing with glxspheres64 and -d nvenc, auto-scaling turned off with XPRA_SCALING=0:
quality=100 (YUV444P mode), typically:
compress_image(..) returning 129399 bytes (1.4%), complete compression for frame 645 took 39.4ms
quality=50 (YUV420P mode), typically:
compress_image(..) returning 33506 bytes (0.4%), complete compression for frame 365 took 17.5ms
So YUV420P is much faster than the 3-pass YUV444P mode, as expected.
Note: r5668 enables YUV444P for quality>=50%, but with r5669 we don't bother with it when downscaling.
This will have to do for this release: most of the important remaining items are too intrusive to change this late in the release cycle.
Remaining items moved to #538 and #564
smo: please test:
YUV444P mode (see above), compare it with YUV420P.
I'm not able to run glxspheres64 because of the nature of the setup. Is there something else that I could try that doesn't involve GL?
I'm hoping to close this as it seems to work well for me but I want to post some information from my setup before closing.
I only use glxspheres and glxgears so often because they produce lots of frames without requiring any external data, but playing a video will do just as well.
Note: you may be able to run GL stuff against software mesa rendering, without needing an X11 server running and with the nvidia libGL installed on the system, by using LD_PRELOAD tricks.
Needs more testing with newer NVIDIA drivers / CUDA SDK, but I will close this for now until there is something to comment on.
Did you measure the bitrate and performance as per comment:6?
FYI: r6699 allows us to specify multiple license keys in CSV format:
XPRA_NVENC_CLIENT_KEY="key1,key2" /usr/bin/xpra start ...
This makes it easier to deal with the constant nvidia license key driver breakage.
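For anyone scripting around this, reading the CSV value back can be sketched as follows. Only the variable name XPRA_NVENC_CLIENT_KEY and the comma-separated format come from this ticket. The get_client_keys helper is hypothetical and is not how the xpra code is actually structured.

```python
import os


def get_client_keys(environ=None, name="XPRA_NVENC_CLIENT_KEY"):
    """Split the comma-separated license keys found in the environment.

    Blank entries and surrounding whitespace are dropped, and duplicates
    are removed while keeping the original order.
    """
    env = os.environ if environ is None else environ
    keys = []
    for key in env.get(name, "").split(","):
        key = key.strip()
        if key and key not in keys:
            keys.append(key)
    return keys
```

Passing a plain dict instead of os.environ makes the helper easy to try out without touching the real environment, e.g. get_client_keys({"XPRA_NVENC_CLIENT_KEY": "key1,key2"}).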
Please test with nvenc v4, see #653
I am taking this ticket back as YUV444 in nvenc4 is completely different from SDK v4 and is going to require quite a few changes - which should give us a nice performance improvement. Will re-assign for testing + benchmarking afterwards.
Moving the new YUV444 mode to a new ticket so this can get more testing, together with #653.
smo: not sure who should test this, but it's been ready for months, time to get on it.
Here are some performance numbers from 3 cards:
quality=100 (YUV444P mode)
GTX 650:
compress_image(..) returning 54939 bytes (1.1%), complete compression for frame 6875 took 9.3ms
compress_image(..) returning 54939 bytes (1.1%), complete compression for frame 6876 took 8.8ms
GTX 750 ti:
compress_image(..) returning 164794 bytes (3.4%), complete compression for frame 64 took 10.2ms
compress_image(..) returning 164816 bytes (3.4%), complete compression for frame 63 took 13.3ms
GTX 970:
compress_image(..) returning 325881 bytes (4.5%), complete compression for frame 151 took 14.5ms
compress_image(..) returning 321035 bytes (4.5%), complete compression for frame 152 took 14.8ms
quality=50 (YUV420P mode)
GTX 650:
compress_image(..) returning 15659 bytes (0.3%), complete compression for frame 310 took 8.6ms
compress_image(..) returning 15617 bytes (0.3%), complete compression for frame 311 took 8.5ms
GTX 750 ti:
compress_image(..) returning 10193 bytes (0.2%), complete compression for frame 1085 took 11.7ms
compress_image(..) returning 10193 bytes (0.2%), complete compression for frame 1086 took 11.0ms
GTX 970:
compress_image(..) returning 18628 bytes (0.4%), complete compression for frame 178 took 9.2ms
compress_image(..) returning 18356 bytes (0.4%), complete compression for frame 179 took 8.3ms
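Two log lines per card is a thin sample, so it may help to average over a whole debug log instead. The sketch below parses compress_image lines in the format shown in this ticket. The summarize function and the shape of its result are my own invention, and the regular expression assumes the log format does not change between versions.

```python
import re
from statistics import mean

# Matches the "compress_image(..) returning ... took ...ms" debug lines.
LINE_RE = re.compile(
    r"compress_image\(\.\.\) returning (?P<size>\d+) bytes "
    r"\((?P<pct>[\d.]+)%\), "
    r"complete compression for frame (?P<frame>\d+) took (?P<ms>[\d.]+)ms"
)


def summarize(log_text):
    """Return frame count, average size and average time, or None
    if the text contains no compress_image lines."""
    samples = [(int(m["size"]), float(m["ms"]))
               for m in LINE_RE.finditer(log_text)]
    if not samples:
        return None
    sizes, times = zip(*samples)
    return {
        "frames": len(samples),
        "avg_bytes": mean(sizes),
        "avg_ms": mean(times),
    }
```

Feeding it the -d nvenc output captured for each card and quality setting would give one comparable row per run. It still does not control for content, resolution or driver version, which all affect these numbers.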
I have 1 more card to test and I will update this when I do some more testing.
Interesting to see the GTX 970 going more slowly than I expected, more slowly than when I had tested it IIRC.
Some things worth mentioning:
nvenc3.keys or nvenc4.keys either in /etc/xpra or in the per-user directory ~/.xpra. The environment variable XPRA_NVENC_CLIENT_KEY still overrides all keys defined in those files. As of r8778, you can also put your license keys in nvenc.keys, which will be used by both codecs (and by nvenc5 and later when I get around to it). You can mix license keys for different driver versions and the code will validate them and figure out which ones can be used, but:
I agree there are many factors when trying to benchmark. It would be a good idea to come up with a better way.
I'm closing this for now as it is working.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/466