Xpra: Ticket #727: building with clang - measure difference in performance

clang (http://clang.llvm.org/) is an alternative compiler to GCC.

It is possible to build the Cython modules with the attached script (based on this filter). To use it:

CC=clang-wrapper setup.py build

It would be interesting to run the automated tests and see whether we get any difference in performance. This should probably be tested with something that exercises lots of instantiations rather than x264 encoding, as the latter will still use the same binary code. That said, it would be worth trying to build x264 and other libs with clang too...

See: LLVM Clang vs. GCC On Fedora 21 With A High-End Xeon

Tue, 04 Nov 2014 05:15:55 GMT - Antoine Martin: attachment set

attachment set to clang-wrapper

wrapper script which strips arguments that clang chokes on

Tue, 04 Nov 2014 05:16:45 GMT - Antoine Martin: description changed

description modified (diff)

Tue, 04 Nov 2014 07:29:38 GMT - Antoine Martin: description changed

description modified (diff)

Sun, 09 Nov 2014 02:24:33 GMT - Antoine Martin: description changed

description modified (diff)

Fri, 23 Jan 2015 08:58:06 GMT - Antoine Martin: attachment set

attachment set to gcc-vs-clang.csv

test data output

Fri, 23 Jan 2015 08:58:25 GMT - Antoine Martin: attachment set

attachment set to clang-vs-gcc-mmap-pixelspersecond.png

one of the most telling graphs I generated

Fri, 23 Jan 2015 09:03:27 GMT - Antoine Martin: status changed; resolution set

status changed from new to closed
resolution set to wontfix

As can be seen on this graph:

gcc beats clang in most cases. It looks like clang adds overhead even in the most basic case (mmap). CPU utilization is about the same, but gcc encodes more pixels/s, sends more packets, etc.

I did find one case where clang does better than gcc: it seems that clang compresses png better, at least for the "glxgears" and "glxspheres" tests. No idea why, as we don't rebuild the libpng library with clang, so it should end up being more or less the same. Odd!

We may want to revisit this later, once clang matures a bit more and once projects start making more of an effort to optimize for it. At present, it seems that x264 is always faster with gcc.

Tue, 03 Feb 2015 23:54:05 GMT - Nick Centanni:

I've re-run the clang vs gcc tests on a more massive scale -- this is 40 seconds per test, with the suite run 9 times for each variable, and each cell averaged across the 9 reps.

I've attached the results as an archive. After extracting the archive, view the HTML file.

Alternatively, look at how the python script is used, and change which combinations of metrics, encodings, and apps are graphed.

After making changes, run the python script with no arguments, and a new HTML file will be written.
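
The per-cell averaging described above (several repetitions of each test, averaged per cell) can be sketched as follows. This is only an illustration and is not the script from the attached archive: the column names (`compiler`, `test`, `metric`, `rep`, `value`), the function names and the sample numbers are all assumptions made up for the example, not measurements from this ticket.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_reps(rows):
    """Average 'value' over all repetitions of each (compiler, test, metric) cell."""
    cells = defaultdict(list)
    for row in rows:
        key = (row["compiler"], row["test"], row["metric"])
        cells[key].append(float(row["value"]))
    return {key: mean(values) for key, values in cells.items()}


def relative_difference(averages, test, metric, baseline="gcc", candidate="clang"):
    """Percentage by which the candidate's average differs from the baseline's."""
    base = averages[(baseline, test, metric)]
    cand = averages[(candidate, test, metric)]
    return (cand - base) / base * 100.0


# Placeholder data only: hypothetical column layout and invented values.
SAMPLE = """compiler,test,metric,rep,value
gcc,mmap,pixels_per_second,1,100
gcc,mmap,pixels_per_second,2,110
clang,mmap,pixels_per_second,1,90
clang,mmap,pixels_per_second,2,99
"""

averages = average_reps(csv.DictReader(io.StringIO(SAMPLE)))
difference = relative_difference(averages, "mmap", "pixels_per_second")
```

A negative `difference` means the candidate compiler scored lower than the baseline on that metric; for metrics where lower is better, the sign reads the other way.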

Tue, 03 Feb 2015 23:56:43 GMT - Nick Centanni:

I've reopened the ticket because I think the results should be examined to make sure the conclusions we reached last time are still reflected in the results.

Tue, 03 Feb 2015 23:56:59 GMT - Nick Centanni: status changed; resolution deleted

status changed from closed to reopened
resolution wontfix deleted

Tue, 03 Feb 2015 23:57:17 GMT - Nick Centanni: owner, status changed

owner changed from Smo to Nick Centanni
status changed from reopened to new

Tue, 03 Feb 2015 23:57:30 GMT - Nick Centanni: owner changed

owner changed from Nick Centanni to Antoine Martin

Wed, 04 Feb 2015 00:06:16 GMT - Nick Centanni: attachment set

attachment set to perf_charts.tar.gz

Chart generator (beta) for examining performance results.

Wed, 04 Feb 2015 10:20:10 GMT - Antoine Martin: owner changed

owner changed from Antoine Martin to Nick Centanni

Thanks, that chart generator is great - much easier to use than sofastats which is a PITA to install. Can you submit it or commit it to svn? And maybe edit the wiki to refer to it?

As for clang vs gcc: the gcc win is confirmed.

Tue, 28 Apr 2015 21:42:11 GMT - Nick Centanni:

I've added a section to the wiki at https://www.xpra.org/trac/wiki/Testing documenting the changes to the performance script, as well as instructions for generating charts from the data using test_measure_perf_charts.py.

Tue, 28 Apr 2015 21:42:17 GMT - Nick Centanni: status changed; resolution set

status changed from new to closed
resolution set to fixed

Fri, 13 Nov 2015 08:12:26 GMT - Antoine Martin: attachment set

attachment set to clang-wrapper.2

updated wrapper - works with Fedora 23

Tue, 08 Dec 2015 03:26:33 GMT - Antoine Martin:

Maybe we should just optimize for size! Or for Xeons? or something.

Tue, 08 Dec 2015 16:45:08 GMT - Nick Centanni:

Or maybe experiment with the -falign-* flags. The discussion in that second link seemed to suggest that setting alignment flags produces a speed improvement equivalent to the one you get when optimizing for size.

Tue, 22 Dec 2015 10:09:06 GMT - Antoine Martin:

Clang is apparently getting more competitive: Intel Xeon Skylake Compilers: Clang Showing Strong Performance Against GCC. No x264 data, only LAME. Not quite there yet IMO.

Sun, 23 Jul 2017 09:36:50 GMT - Antoine Martin:

On Fedora 26 (clang 4), the clang wrapper script is no longer needed and with Cython 0.26 we can now un-disable more warnings: r16464.

Mon, 24 Jul 2017 07:26:42 GMT - Antoine Martin:

This is just one test, and with a specific CPU, but it seems that clang could give us better FPS with x264 by about 5%: Ryzen Compiler Performance: Clang 4/5 vs. GCC 6/7/8 Benchmarks.

Thu, 26 Apr 2018 03:46:07 GMT - Antoine Martin: attachment set

attachment set to clang-wrapper.3

updated wrapper for clang-6.0

Thu, 26 Apr 2018 03:47:27 GMT - Antoine Martin:

Building with clang-6.0 (ie: Fedora 28) hits this error:

clang-6.0: error: unknown argument: '-mcet'
clang-6.0: error: unknown argument: '-fcf-protection'

To fix this, use r19079 and the updated wrapper.

Sun, 14 Apr 2019 17:57:51 GMT - Antoine Martin:

With Fedora 29 and clang 7.0.1, add -fstack-clash-protection to the list.
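
For readers without the attachments, the argument stripping that the wrapper performs can be sketched as below. This is a minimal illustration and not the attached clang-wrapper script: the names `UNSUPPORTED`, `filter_args` and `build_command` are hypothetical, and the flag list contains only the three flags named in this ticket.

```python
# Flags reported in this ticket as rejected by clang 6.0 / 7.0.1.
UNSUPPORTED = ("-mcet", "-fcf-protection", "-fstack-clash-protection")


def filter_args(args, unsupported=UNSUPPORTED):
    """Return args without the unsupported flags (bare or in '-flag=value' form)."""
    kept = []
    for arg in args:
        name = arg.split("=", 1)[0]  # "-fcf-protection=full" -> "-fcf-protection"
        if name in unsupported:
            continue
        kept.append(arg)
    return kept


def build_command(args, compiler="clang"):
    """Build the command line that a wrapper would hand over to the real compiler."""
    return [compiler] + filter_args(args)


# A real wrapper would then replace itself with the compiler process, e.g.:
#   import os, sys
#   cmd = build_command(sys.argv[1:])
#   os.execvp(cmd[0], cmd)
```

Matching on the part before `=` means a flag is dropped whether or not it carries a value, while every other argument is passed through unchanged and in its original order.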

Sat, 23 Jan 2021 05:04:12 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/727