Bug tracker and wiki

Opened 15 months ago

Closed 2 weeks ago

#1851 closed task (fixed)

tune vpx threading

Reported by: Antoine Martin Owned by: Antoine Martin
Priority: major Milestone: 2.4
Component: encodings Version: 2.3.x
Keywords: Cc:

Description (last modified by Antoine Martin)

Same as #1840 but for libvpx.

Some links:

It looks like part of the reason why vp8 and vp9 are now faster, and why I chose to use vpx more (see ticket:832#comment:22), is the threading improvements.
This does mean that reducing the number of threads might reduce performance too much.

Attachments (1)

test_vpx.tar.gz (138.2 KB) - added by Smo 2 weeks ago.
VPX data and charts for threads 1/2/4


Change History (8)

comment:1 Changed 15 months ago by Antoine Martin

You can choose the maximum number of threads with:

XPRA_VPX_THREADS=2 xpra start ...

We want to see how this affects frame latency, bandwidth, CPU load, etc.
Unlike x264, it looks like we don't have a lot of room for manoeuvre here.
(the current value is "number-of-cpus" minus 1)
Maybe this should be capped at 2 threads.
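
To make the two options concrete, here is a minimal sketch of the current default next to the proposed cap. The function names are hypothetical and are not taken from the xpra source; only the "number-of-cpus minus 1" rule and the cap of 2 come from this comment, and the floor of 1 thread on single-CPU machines is an assumption.

```python
import os


def current_default_threads(ncpus):
    # "number-of-cpus" minus 1, as described above;
    # the floor of 1 is an assumption for single-CPU machines
    return max(1, ncpus - 1)


def capped_default_threads(ncpus, cap=2):
    # the proposed alternative: same rule, but never more than `cap` threads
    return min(cap, current_default_threads(ncpus))


if __name__ == "__main__":
    ncpus = os.cpu_count() or 1  # os.cpu_count() may return None
    print(ncpus, current_default_threads(ncpus), capped_default_threads(ncpus))
```

On an 8-core machine this gives 7 threads for the current rule and 2 with the cap.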

comment:2 Changed 15 months ago by J. Max Mena

I've set up a quick script that should run a series of three test runs with XPRA_VPX_THREADS set to 1, 2, and 4. For reference, the test box is an 8-core system. I'm more curious to see how much of an impact it has on low-end machines, so I'm going to update one of my low-end test boxes and run the tests again on there.
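
For anyone wanting to repeat this, a test matrix like the one described could be sketched as follows. This is not the script used here: `build_runs` is a hypothetical helper, and the command is a placeholder that has to be replaced with the real test invocation. Only the XPRA_VPX_THREADS variable and the values 1, 2 and 4 come from this ticket.

```python
import os

THREAD_COUNTS = (1, 2, 4)


def build_runs(base_command, thread_counts=THREAD_COUNTS, base_env=None):
    """Return one (env, command) pair per thread count."""
    if base_env is None:
        base_env = os.environ
    runs = []
    for n in thread_counts:
        env = dict(base_env)              # copy so the runs do not share state
        env["XPRA_VPX_THREADS"] = str(n)  # env var from comment:1
        runs.append((env, list(base_command)))
    return runs


if __name__ == "__main__":
    # placeholder command: substitute the real test invocation
    for env, cmd in build_runs(["echo", "test run"]):
        print("XPRA_VPX_THREADS=%s %s" % (env["XPRA_VPX_THREADS"], " ".join(cmd)))
        # subprocess.run(cmd, env=env, check=True) would launch the run
```

Each run gets its own copy of the environment, so the setting from one run cannot leak into the next.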

Last edited 15 months ago by J. Max Mena

comment:3 Changed 11 months ago by Antoine Martin

Description: modified

comment:4 Changed 6 months ago by Antoine Martin

Owner: changed from J. Max Mena to Jonathan Anthony
Status: changed from assigned to new

comment:5 Changed 3 weeks ago by Smo

Owner: changed from Jonathan Anthony to Smo

Changed 2 weeks ago by Smo

Attachment: test_vpx.tar.gz added

VPX data and charts for threads 1/2/4

comment:6 Changed 2 weeks ago by Smo

Owner: changed from Smo to Antoine Martin

I've attached some test data and charts.

The data seems to show that more threads is better.

Can you check these over and let me know if any other action is required?

comment:7 Changed 2 weeks ago by Antoine Martin

Resolution: fixed
Status: changed from new to closed

Interesting data:

  • we encode more pixels per second with more threads but when it comes to actually sending ("pixels sent"), the benefits are much lower as other costs come into play (and maybe we're hitting a performance ceiling?)
  • there seems to be a sweet spot with 2 threads, at least for the batch delay and damage latency
  • going up to 4 threads doesn't gain much (i.e. marginal improvement in damage latency and pixels sent per second) - I suspect that this may vary with bigger picture sizes
  • decoding takes a little bit longer with more threads - which is fine, we're almost never bound by the client's decoding speed
  • using 4 threads takes quite a bit more server-side memory

So, r23474 makes us use fewer threads by default (was number-of-cpus - 1):

>>> import math
>>> for i in range(8):
...  print("%-3i: %2i" % (2**i, math.sqrt(2**i+1)))
... 
1  :  1
2  :  1
4  :  2
8  :  3
16 :  4
32 :  5
64 :  8
128: 11

This can still be overridden using the env var XPRA_VPX_THREADS=
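
Put together, the new default plus the override could look roughly like the sketch below. The function names are hypothetical and this is not the code from r23474: the square-root expression is the one from the session above, while the way the variable is parsed and the fallback to the default on an invalid value are assumptions.

```python
import math
import os


def default_vpx_threads(ncpus):
    # same expression as the session above: sqrt(ncpus + 1), truncated
    return max(1, int(math.sqrt(ncpus + 1)))


def vpx_threads(ncpus=None, environ=os.environ):
    # hypothetical helper: the env var wins when it is a positive integer
    if ncpus is None:
        ncpus = os.cpu_count() or 1  # os.cpu_count() may return None
    try:
        n = int(environ.get("XPRA_VPX_THREADS", ""))
    except ValueError:
        n = 0  # unset or not a number: fall back to the default (assumption)
    if n > 0:
        return n
    return default_vpx_threads(ncpus)
```

With 8 CPUs this returns 3 unless XPRA_VPX_THREADS is set, in which case the value from the environment is used.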
