Window Refresh
The key feature of xpra is the forwarding of individual window contents to the client.
The speed at which the server compresses and sends screen updates to the client depends on a number of factors. There are only a few things the system can tune at runtime:
- the "batch delay": this is the time that the server will wait for screen updates to accumulate before grabbing the next frame
- the "video quality" (for the x264 video encoder only)
- the "video speed" (for the x264 video encoder only)
Code Pointers
Since this is an area that receives constant tuning and improvements, the most reliable source of current information is the code itself:
- server_source.py: this class is instantiated for each client connection. In particular:
  - GlobalPerformanceStatistics: collects various statistics about this client's connection speed, the various work queues, etc. The get_factors method returns a guesstimate of how the batch delay should be adjusted for the given parameters ("target latency" and "pixel count")
  - calculate_delay_thread: runs at most 4 times a second to tune the batch delay, both the global default one (default_batch_config, which is a DamageBatchConfig) and the one specific to each window (see below for details)
- window_source.py: this class is instantiated for each window and for each client; it deals with sending the window's pixels to the client. In particular:
  - DamageBatchConfig: the structure which encapsulates all the configuration and historical values related to a batch delay.
  - WindowPerformanceStatistics: per-window statistics: performance of the encoding/decoding, number of frames and pixels in flight, etc. Again, the get_factors method returns a guesstimate of how the batch delay should be adjusted for the given parameters ("pixel_count" and current "batch delay")
- batch_delay_calculator.py: this is where we use the factors obtained by get_factors above to tune the various attributes.
Note also that many of these classes have an add_stats method which is used by xpra info to provide detailed statistics from the command line and is a good first step for debugging.
Background on damage request processing
A step-by-step walk through the critical damage code path (oversimplified, so check the code for details):
- an X11 client application running against the xpra display updates something on one of its windows, or creates a new window, resizes it, etc. It notifies the X11 server that something needs to be redrawn, changed or updated.
- xpra, working as a compositing window manager, receives a notification that something needs updating/changing.
- this Damage event is forwarded through various classes (CompositeHelper, WindowModel, ..) and eventually ends up in the server's damage method - often as a Damage event directly, or in other cases we synthesize one to force the client to (re-)draw the window.
- then for each client connected, we call damage on its ServerSource (see above)
- then this damage event is forwarded to the WindowSource for the given window (creating one if needed)
- then we either add this event to the current list of delayed regions (which will expire after "batch delay"), we create a new delayed region, or if things are not congested at all we just process it immediately
- process_damage_region (called when the timer expires - or directly) will grab the window's pixels and queue a request to call "make a data packet" from it. This is done so that we minimize the amount of work done in the critical UI thread, another thread will pick items off this queue.
Note: delayed regions may get delayed more than originally intended if the client has an excessive packet or pixel backlog.
The damage thread will pick items from this queue, decide what compression to use, compress the pixels and make a data packet, then queue this packet for sending. Note that the queue is in ServerSource and it is shared between all the windows.
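The batching and hand-off described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the xpra implementation: the WindowBatcher class, its congested flag, the damage_worker function and the packet tuple layout are all hypothetical names invented for this example, and the real code grabs pixels and picks an encoding where this sketch only counts regions.

```python
import queue
import threading


class WindowBatcher:
    """Hypothetical stand-in for the per-window batching logic."""

    def __init__(self, wid, encode_queue, batch_delay_ms=5.0):
        self.wid = wid
        self.encode_queue = encode_queue      # shared between all windows
        self.batch_delay_ms = batch_delay_ms
        self.delayed = None                   # pending list of regions, if any
        self.timer = None
        self.lock = threading.Lock()

    def damage(self, x, y, w, h, congested=True):
        region = (x, y, w, h)
        with self.lock:
            if self.delayed is not None:
                # a batch is already pending: add this region to it
                self.delayed.append(region)
                return
            if congested:
                # start a new delayed batch which expires after the batch delay
                self.delayed = [region]
                self.timer = threading.Timer(self.batch_delay_ms / 1000.0, self.expire)
                self.timer.daemon = True
                self.timer.start()
                return
        # nothing pending and no congestion: process immediately
        self.process_damage_regions([region])

    def expire(self):
        with self.lock:
            regions, self.delayed = self.delayed, None
            if self.timer is not None:
                self.timer.cancel()
                self.timer = None
        if regions:
            self.process_damage_regions(regions)

    def process_damage_regions(self, regions):
        # the real code grabs the window's pixels here; this sketch only
        # queues the request so the calling (UI) thread returns quickly
        self.encode_queue.put((self.wid, regions))


def damage_worker(encode_queue, packets):
    """Runs in a separate thread: turns queued requests into 'packets'."""
    while True:
        item = encode_queue.get()
        if item is None:                      # sentinel: stop the worker
            break
        wid, regions = item
        # placeholder for "choose compression, compress, make a data packet"
        packets.append(("draw", wid, len(regions)))
```

In use, one queue would be created per client and handed to every WindowBatcher, with damage_worker started in its own thread via threading.Thread(target=damage_worker, args=(q, packets)).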
Things not covered here:
- auto-refresh
- choosing between a full frame or many smaller regions
- various tunables
- callbacks recording performance statistics
- callbacks between ServerSource and WindowSource
- calculating the backlog
- error handling
- cancelling requests
- client encoding options
- mmap
The Factors
Again, this is a simplified explanation: the actual values are derived using heuristics, weighted averages, recent vs average values, trends, etc.
Please refer to the source for details.
From ServerSource:
- client latency: the client latency measured during the processing of pixel data (when we get the echo back). We know the lowest latency observed, and we try to keep the latency as low as that
- client ping latency: as above, but measured from ping packets (client)
- server ping latency: as above, but measured from ping packets (server - if available, the client sends this information as part of ping echo packets)
- damage data queue: the number of damage frames waiting to be compressed, we want to keep this low - especially as this data is uncompressed at this point, so it can be quite large
- damage packet queue size: the number of packets waiting to be sent by the network layer - we want to keep this low, but sometimes many small packets can make it look worse than it is
- damage packet queue pixels: the number of pixels in the packet queue, the target value depends on the current size of the window
- mmap area % full: only used with mmap
From WindowSource:
- damage processing latency: how long frames take from the moment we receive the damage event until the packet is queued for sending. This value increases with the number of pixels that are encoded. We want to keep this low.
- damage processing ratios: as above, but based on the trend: this measure goes up when we queue more damage requests than we can encode.
- damage send latency: how long frames take from the moment we receive the damage event until the packet containing it has made it out of the network layer (it may still be in the operating system's buffers though). Again, we want to keep this low. This value increases when the network becomes the bottleneck.
- damage network delay: the difference between damage processing latency and damage send latency, this should be equal to the network latency - but because the values are running averages, it is not very reliable.
- network send speed: we keep track of the socket's performance (in bytes per second)
- client decode speed: how quickly the client is decoding frames
- no damage events for X ms: when nothing happens for a while, this means that the window is not busy and therefore we ought to be able to lower the delay
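To illustrate how an "average" and a "recent" value can be kept for a latency, and how the damage network delay falls out as a difference, here is a minimal sketch. It assumes plain arithmetic means over a bounded history, which is simpler than the weighted averages the real code uses; LatencyTracker and network_delay are hypothetical names, not xpra classes.

```python
from collections import deque
from statistics import mean


class LatencyTracker:
    """Keeps a bounded history of latency samples (in seconds)."""

    def __init__(self, maxlen=100, recent=5):
        self.samples = deque(maxlen=maxlen)
        self.recent = recent              # how many samples count as "recent"

    def add(self, seconds):
        self.samples.append(seconds)

    def averages(self):
        """Returns (average over the whole history, average of recent samples)."""
        if not self.samples:
            return 0.0, 0.0
        values = list(self.samples)
        return mean(values), mean(values[-self.recent:])


def network_delay(processing, send):
    """Estimates the network delay as send latency minus processing latency."""
    p_avg, p_recent = processing.averages()
    s_avg, s_recent = send.averages()
    # the two averages are computed independently, so the difference
    # can come out negative: clamp it at zero
    return max(0.0, s_avg - p_avg), max(0.0, s_recent - p_recent)
```

Because both inputs are averages over different sets of frames, the result is only an estimate, which matches the caveat above about this factor not being very reliable.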
Each factor provides:
- a textual description which can be used for debugging
- a change factor (float number) ranging from 0.0 (lower the delay) to a small positive number (generally up to 10, but that is not enforced). A value of 1.0 means no change.
- a weight value: this is to measure how confident this particular factor is about the change it requests.
ie:
- with a weight of 0.0, the factor is irrelevant as it won't be counted
- with a high enough weight, a factor can overrule others
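One simple way to picture how change factors and weights interact is a weighted mean, sketched below. This is an assumption made for illustration only: the real calculation also takes historical values and other heuristics into account. The function names and the min_delay/max_delay bounds are hypothetical.

```python
def combine_factors(factors):
    """factors: iterable of (description, change, weight) tuples.

    Returns the weighted mean of the change factors,
    or 1.0 (no change) if no factor carries any weight.
    """
    factors = list(factors)
    total_weight = sum(weight for _, _, weight in factors)
    if total_weight <= 0:
        return 1.0
    return sum(change * weight for _, change, weight in factors) / total_weight


def update_delay(delay, factors, min_delay=1.0, max_delay=1000.0):
    """Applies the combined change factor and clamps the result.

    min_delay and max_delay are illustrative bounds, not xpra's values.
    """
    new_delay = delay * combine_factors(factors)
    return max(min_delay, min(max_delay, new_delay))
```

With this model, a factor with a weight of 0.0 drops out of the sum entirely, and a factor with a weight much larger than the others pulls the result close to its own change value.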
Debugging
"xpra info" provides a good overview of the current values used for batch delay and encoding speed/quality, but very little in terms of the factors used to come up with those values (only some latency values are exposed).
To dump the change in "batch delay", set:
XPRA_DELAY_DEBUG=1
when starting the server. Then every 30 seconds, or every 1000 messages, the delay factors will be dumped to the log. The output is throttled so that the logging itself does not affect the system and the calculations (it still does, but much more sparsely). The log messages look like this:
update_batch_delay: wid=5, last updated 249.50 ms ago, decay=1.00s, \
    change factor=9.8%, delay min=5, avg=5, max=6, cur=6.7, \
    w. average=6.0, tot wgt=227.2, hist_w=113.6, new delay=7.4
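The "every 30 seconds, or every 1000 messages" throttling can be pictured with a small helper like the one below. This is a sketch of the general technique only: ThrottledDump is a hypothetical class, and how the server actually counts messages and resets its timer is an assumption here.

```python
import time


class ThrottledDump:
    """Decides when a debug dump is due: after a time interval
    or after a number of messages, whichever comes first."""

    def __init__(self, interval=30.0, max_messages=1000, clock=time.monotonic):
        self.interval = interval
        self.max_messages = max_messages
        self.clock = clock                # injectable, which eases testing
        self.last_dump = clock()
        self.count = 0

    def should_dump(self):
        self.count += 1
        now = self.clock()
        if self.count >= self.max_messages or now - self.last_dump >= self.interval:
            # reset both triggers once a dump is due
            self.count = 0
            self.last_dump = now
            return True
        return False
```

The caller would invoke should_dump() once per delay update and only format and write the log message when it returns True, so the cost of logging is paid rarely.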
For more details, to get the actual factors used, set:
XPRA_DELAY_DEBUG=1
Then the output should be much more verbose:
Factors (change - weight - description):
-14 38 damage processing latency: avg=0.013, recent=0.013, target=0.014, aim=0.800, aimed avg factor=0.729, div=1.000, s=<built-in function sqrt>
+128 15 damage processing ratios 12 - 13 / 5
-25 50 damage send latency: avg=0.014, recent=0.015, target=0.030, aim=0.800, aimed avg factor=0.557, div=1.000, s=<built-in function sqrt>
-65 8 damage network delay: avg delay=0.001 recent delay=0.002
-65 23 client decode speed: avg=32.3, recent=32.3 (MPixels/s)
+0 0 no damage events for 1.2 ms (highest latency is 100.0)
-49 8 client latency: avg=0.002, recent=0.002, target=0.006, aim=0.800, aimed avg factor=0.260, div=1.000, s=<built-in function sqrt>
-40 4 client ping latency: avg=0.003, recent=0.003, target=0.007, aim=0.950, aimed avg factor=0.353, div=1.000, s=<built-in function sqrt>
-54 4 server ping latency: avg=0.003, recent=0.002, target=0.006, aim=0.950, aimed avg factor=0.211, div=1.000, s=<built-in function sqrt>
-100 0 damage packet queue size: avg=0.000, recent=0.000, target=1.000, aim=0.250, aimed avg factor=0.000, div=1.000, s=<built-in function sqrt>
-100 0 damage packet queue pixels: avg=0.000, recent=0.000, target=1.000, aim=0.250, aimed avg factor=0.000, div=90000.000, s=<built-in function sqrt>
-99 20 damage data queue: avg=0.346, recent=0.033, target=1.000, aim=0.250, aimed avg factor=0.019, div=1.000, s=<built-in function logp>
-100 0 damage packet queue window pixels: avg=0.000, recent=0.000, target=1.000, aim=0.250, aimed avg factor=0.000, div=90000.000, s=<built-in function sqrt>
Note: the change and weight are shown as percentages in the output (rather than floating point numbers in the implementation).
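When reading a long dump, it can help to pull the change and weight columns out and sort by weight to see which factors dominate. The sketch below assumes each factor appears on its own line in the form "change weight description", as in the sample output; parse_factor_line and heaviest_factors are hypothetical helpers, not part of xpra.

```python
import re

# signed integer change, unsigned integer weight, then free-form description
FACTOR_RE = re.compile(r"^\s*([+-]\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_factor_line(line):
    """Returns (change_pct, weight_pct, description), or None if the
    line does not look like a factor line (e.g. the header)."""
    match = FACTOR_RE.match(line)
    if match is None:
        return None
    change_pct, weight_pct, description = match.groups()
    return int(change_pct), int(weight_pct), description


def heaviest_factors(lines, count=3):
    """Returns the 'count' parsed factors with the highest weight."""
    parsed = [p for p in (parse_factor_line(line) for line in lines) if p]
    return sorted(parsed, key=lambda p: p[1], reverse=True)[:count]
```

The values are kept as the percentages shown in the log rather than converted back to the floating point numbers used internally, since that conversion is not specified here.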
To dump the changes to video encoding speed and quality, set:
XPRA_VIDEO_DEBUG=1
when starting the server. You will then get messages like these when using fixed quality/speed settings:
video encoder using fixed speed: 10
video encoder using fixed quality: 10
Or these when actually using the tuning code:
video encoder quality factors: wid=5, packets_bl=1.00, batch_q=0.31, \
    latency_q=94.44, target=30, new_quality=25
video encoder speed factors: wid=4, low_limit=157684, min_damage_latency=0.03, \
    target_damage_latency=0.20, batch.delay=17.69, dam_lat=0.00, dec_lat=0.05, \
    target=4.00, new_speed=5.00
Please refer to the code for details.