{{{#!div class="box"
[[Image(http://xpra.org/icons/speed.png)]]
= CSC Performance =

The point of providing different CSC implementations is to get the best performance out of the hardware.

Unfortunately, it is impossible to say definitively in advance which module will be the fastest on any given piece of hardware, though swscale is a good bet.

Also, some modules offload to the GPU while others remain on the CPU only, and this will often have an impact on the rest of the system. Some modules take longer to initialize, which may or may not be an issue.

So the best way to choose the right CSC module is to test each one and weigh the costs and benefits.
}}}

{{{#!div class="box"
== Running the performance tests ==
You can get your own performance figures by running the tests:
 * [/browser/xpra/trunk/src/tests/xpra/codecs/test_csc_cython.py xpra.codecs.test_csc_cython]
 * [/browser/xpra/trunk/src/tests/xpra/codecs/test_csc_opencl.py xpra.codecs.test_csc_opencl] (you can use environment variables to choose which {{{OpenCL}}} backend to use)
 * [/browser/xpra/trunk/src/tests/xpra/codecs/test_csc_swscale.py xpra.codecs.test_csc_swscale]

To prevent conflicts between the source tree and the installed version of xpra, the easiest way to run the tests is to check them out into a temporary area:
{{{
mkdir tmp && cd tmp
svn co http://xpra.org/svn/Xpra/trunk/src/tests/
PYTHONPATH=. tests/xpra/codecs/test_csc_cython.py
PYTHONPATH=. tests/xpra/codecs/test_csc_opencl.py
PYTHONPATH=. tests/xpra/codecs/test_csc_swscale.py
}}}
}}}

{{{#!div class="box"
== Caveats ==
 * ensure that there are no other tasks running on the system...
   even having an X11 GUI will use the GPU, which will take some memory, bandwidth and performance out of it
 * ensure that the CPU/GPU are not running at lower clock speeds to save power (ie: powermizer for nvidia, CPU governor on Linux)
 * run the tests repeatedly and average the results - results that vary too widely should be investigated or simply discarded
}}}

== Results with 0.16.0 pre-release ==
 * 1920x1080 in MPixels/s
 * ffmpeg version 2.7.1
 * pyopencl 2015.1
 * Cython 0.22.1
||= Module =||= Options (ICD) =||= CPU =||= GPU =||||||= BGRX to YUV =||||||= YUV to BGRX =||
||= =||= =||= =||= =||= YUV420P =||= YUV422P =||= YUV444P =||= YUV420P =||= YUV422P =||= YUV444P =||
||cython|| ||AMD X4 945||GTX 970|| 23|| || || 23|| || ||
||swscale|| ||AMD X4 945||GTX 970|| 152|| 150|| 119|| 385|| 462|| 182||
||opencl||NVIDIA||AMD X4 945||GTX 970|| 150|| 158|| 144||
||cython|| ||Core i5-4440||GTX 760|| 70|| || || 69|| || ||
||swscale|| ||Core i5-4440||GTX 760|| 307|| 303|| 248|| 854|| 820|| 417||
||opencl||NVIDIA||Core i5-4440||GTX 760|| 310|| 341|| 301||
||opencl||Intel||Core i5-4440||GTX 760|| 168|| 160|| 160||

----

== Results with 0.15.4 ==
 * 1920x1080 in MPixels/s
 * ffmpeg version 2.7.1
 * pyopencl 2015.1
 * Cython 0.22.1
||= Module =||= Options (ICD) =||= CPU =||= GPU =||||||= BGRX to YUV =||||||= YUV to BGRX =||
||= =||= =||= =||= =||= YUV420P =||= YUV422P =||= YUV444P =||= YUV420P =||= YUV422P =||= YUV444P =||
||cython|| ||AMD X4 945||GTX 970|| 72|| || ||64|| || ||
||swscale|| ||AMD X4 945||GTX 970|| 304|| 303|| 247|| 627|| 777|| 345||
||opencl||NVIDIA||AMD X4 945||GTX 970|| 317|| 341|| 293||
||opencl||Intel||AMD X4 945||GTX 970|| 168|| 162|| 163||

----

== Results with 0.14.28 ==
 * 1920x1080 in MPixels/s
 * ffmpeg version 2.7.1
 * pyopencl 2015.1
 * Cython 0.22.1
||= Module =||= Options (ICD) =||= CPU =||= GPU =||||||= BGRX to YUV =||||||= YUV to BGRX =||
||= =||= =||= =||= =||= YUV420P =||= YUV422P =||= YUV444P =||= YUV420P =||= YUV422P =||= YUV444P =||
||cython|| ||AMD X4 945||GTX 970|| 23|| || || 21|| || ||
||swscale|| ||AMD X4 945||GTX 970|| 151|| 149|| 122|| 385|| 465|| 182||
||opencl||NVIDIA||AMD X4 945||GTX 970|| 380|| 329|| 281||
||opencl||Intel||AMD X4 945||GTX 970|| 311|| 237|| 281||
||cython|| ||Core i5-4440||GTX 760|| 71|| || || 64|| || ||
||swscale|| ||Core i5-4440||GTX 760|| 306|| 303|| 251|| 554|| 785|| 387||
||opencl||NVIDIA||Core i5-4440||GTX 760|| 143|| 106|| 86|| 99|| 92|| 75||
||opencl||AMD||Core i5-4440||GTX 760|| 143|| 116|| 89|| 105|| 84|| 73||

----

== Results with 0.11.0 release ==
All tests at 1920x1080 in MPixels/s
||= Module =||= Options =||= CPU =||= GPU =||||||= BGRX to YUV =||||||= YUV to BGRX =||
||= =||= =||= =||= =||= YUV420P =||= YUV422P =||= YUV444P =||= YUV420P =||= YUV422P =||= YUV444P =||
||cython|| ||AMD X4 945||GTX 760||47||
||swscale|| ||AMD X4 945||GTX 760||119||163||132||199||345||229||
||nvcuda|| ||AMD X4 945||GTX 760||126||109||114||
||opencl||NVIDIA||AMD X4 945||GTX 760||382||326||278||275||315||266||
||opencl||AMD||AMD X4 945||GTX 760||61||54||43||44||37||24||
||opencl||Intel||AMD X4 945||GTX 760||57||39||40||43||37||19||
||cython|| ||Intel i3-3110M||Intel HD 4000||73||
||swscale|| ||Intel i3-3110M||Intel HD 4000||150||199||164||341||361||351||
||opencl||AMD||Intel i3-3110M||Intel HD 4000||70||70||62||49||43||34||
||opencl||Intel||Intel i3-3110M||Intel HD 4000||159||119||152||
||cython|| ||Intel i7-4500U||Intel HD 4400||105||
||swscale|| ||Intel i7-4500U||Intel HD 4400||206||284||228||458||480||362||
||cython|| ||2xIntel Xeon E5-2670||GTX 760||92||
||swscale|| ||2xIntel Xeon E5-2670||GTX 760||184||257||199||343||446||421||
||nvcuda|| ||2xIntel Xeon E5-2670||GTX 760||84||80||76||
||opencl||NVIDIA||2xIntel Xeon E5-2670||GTX 760||333||289||222||242||269||233||
||cython|| ||AMD FX-6100||Radeon HD 6870||60||
||swscale|| ||AMD FX-6100||Radeon HD 6870||175||168||138||433||577||353||
||opencl||AMD||AMD FX-6100||Radeon HD 6870||235||232||201||204||192||210||

----

== Previous Results ==
These values were
obtained with r4272 and later. Different combinations may have been tested with different revisions, so the figures should '''not be trusted'''. (results are in MPixels/s):

 * 1920x1080 {{{RGB}}} to {{{YUV???P}}}:
||= Module =||= CPU/GPU =||= YUV420P =||= YUV422P =||= YUV444P =||
||swscale||AMD FX 8150||142||182||151||
||swscale||AMD X4 945||120||165||131||
||swscale||AMD X2 260||124||170||140||
||swscale||Intel Core i3-3110M||164||229||181||
||swscale||2xIntel Xeon E5-2670||215||322||253||
||CUDA-Nvidia||AMD X4 945 + GTS 450||366||341||290||
||CUDA-Nvidia||2xIntel Xeon E5-2670 / 2xK1||173||177||160||
||OpenCL-Nvidia||AMD FX 8150 + GTX 760||345||303||254||
||OpenCL-Nvidia||AMD X4 945 + GTS 450||357||303||260||
||OpenCL-Nvidia||2xIntel Xeon E5-2670 / 2xK1||210||211||192||
||OpenCL-Nvidia||Intel Xeon E5-2620 / GTX 650ti||502||457||399||
||OpenCL-Intel||AMD FX 8150||129||114||119||
||OpenCL-Intel||Intel Core i3-3110M||141||92||53||
||OpenCL-Intel||2xIntel Xeon E5-2670||472||412||263||
||OpenCL-Intel||Intel Xeon E5-2620||254||213||131||
||OpenCL-Intel||Intel i7-4500U||155||125||166||
||OpenCL-AMD||AMD FX 8150 + Radeon HD5450||110||49||42||
||OpenCL-AMD||AMD FX 8150||93||79||76||
||OpenCL-AMD||AMD FX 6100 + Radeon HD6870||274||234||219||
||OpenCL-AMD||AMD FX 6100||126||115||90||
||OpenCL-AMD||AMD X4 945||63||54||53||
||OpenCL-AMD||AMD M300||14||12||12||
||OpenCL-AMD||AMD X2 + Radeon HD5450||151||61||57||
||OpenCL-AMD||AMD X2||15||14||11||
||OpenCL-AMD||Intel Core i3-3110M||71||58||63||
||OpenCL-Apple||Intel Core2Duo P8600 + GeForce 320||22||28||22||

 * 1920x1080 RGB to GBR (simple byte swapping):
||= Module =||= CPU/GPU =||= MPixels/s =||
||swscale||AMD FX 8150||718||
||swscale||AMD FX 6100||608||
||swscale||AMD X4 945||524||
||swscale||AMD X2 260||582||
||swscale||Intel Core i3-3110M||550||
||swscale||Intel i7-4500U||627||
||swscale||2xIntel Xeon E5-2670||758||

 * 1920x1080 {{{YUV???P}}} to {{{BGR(X)}}}:
||= Module =||= CPU/GPU =||= YUV420P =||= YUV422P =||= YUV444P =||
||swscale||AMD FX 8150||381||406||416||
||swscale||AMD FX 6100||361||375||370||
||swscale||AMD X4 945||369||323||237||
||swscale||AMD X2 260||312||255||330||
||swscale||Intel Core i3-3110M||350||309||310||
||swscale||2xIntel Xeon E5-2670||177||168||163||
||CUDA-Nvidia||AMD X4 945 + GTS 450||202||191||180||
||CUDA-Nvidia||2xIntel Xeon E5-2670 / 2xK1||180||155||151||
||OpenCL-Nvidia||AMD FX 8150 + GTX 760||331||289||257||
||OpenCL-Nvidia||AMD X4 945 + GTS 450||?||?||?||
||OpenCL-Nvidia||Intel Xeon E5-2620 / GTX 650ti||458||377||358||
||OpenCL-Nvidia||2xIntel Xeon E5-2670 / 2xK1||190||165||148||
||OpenCL-Intel||AMD FX 8150||96||70||67||
||OpenCL-Intel||Intel Core i3-3110M||82||88||87||
||OpenCL-Intel||Intel Xeon E5-2620||146||123||116||
||OpenCL-Intel||2xIntel Xeon E5-2670||265||271||268||
||OpenCL-Intel||Intel i7-4500U||162||122||153||
||OpenCL-AMD||AMD FX 8150 + Radeon HD5450||84||82||70||
||OpenCL-AMD||AMD FX 8150||60||55||47||
||OpenCL-AMD||AMD FX 6100 + Radeon HD6870||179||231||197||
||OpenCL-AMD||AMD FX 6100||78||68||56||
||OpenCL-AMD||AMD X4 945||54||51||50||
||OpenCL-AMD||AMD M300||11||9||7||
||OpenCL-AMD||AMD X2 260 + Radeon HD5450||107||98||98||
||OpenCL-AMD||AMD X2 260||11||10||7||
||OpenCL-AMD||Intel Core i3-3110M||60||56||58||

[http://xpra.org/stats/CSC/ Here are some charts] based on those figures.

Note: for historical reasons, we include results for the now-deleted [/wiki/CSC/NVCUDA csc nvcuda] module, even though it never worked reliably.
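
The caveats recommend running the tests repeatedly, averaging the results, and investigating or discarding results that vary too widely. A minimal Python sketch of that bookkeeping follows. It is not part of the xpra test scripts: the function names are hypothetical, and the 10% spread threshold is an arbitrary assumption that you should tune for your own hardware.

```python
from statistics import mean, stdev


def mpixels_per_second(width, height, frames, elapsed):
    """Throughput in MPixels/s for `frames` conversions of a
    width x height image completed in `elapsed` seconds."""
    return width * height * frames / elapsed / 1_000_000


def summarize(samples, max_rel_spread=0.10):
    """Average repeated measurements.

    Returns (average, relative spread, stable) where `stable` is False
    when the relative standard deviation exceeds `max_rel_spread`
    (an assumed threshold, not a value taken from xpra).
    """
    avg = mean(samples)
    spread = stdev(samples) / avg if len(samples) > 1 else 0.0
    return avg, spread, spread <= max_rel_spread


if __name__ == "__main__":
    # Made-up timings (seconds) for 100 conversions at 1920x1080.
    timings = (0.68, 0.67, 0.69)
    runs = [mpixels_per_second(1920, 1080, 100, t) for t in timings]
    avg, spread, stable = summarize(runs)
    verdict = "ok" if stable else "investigate or discard"
    print("%.0f MPixels/s, spread %.1f%%: %s" % (avg, spread * 100, verdict))
```

Using the relative standard deviation keeps the check independent of the absolute throughput, so the same threshold can be applied to a slow cython run and a fast swscale run.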