Xpra: Ticket #835: synchronize sound with video frames

Because we try hard not to drop any sound packets (unless we do a sound pipeline restart), we have a small-ish sound buffer at the client side, we will need to delay the video frames to match that.

Do we want to delay all screen updates or just video? Since "normal" updates could just be lossless refreshes for the video, we may have to do it for all screen updates - which is a bit of a shame. Unless we pass the video region info to the client and let it manage what to delay?

What is going to be needed:

for testing: a good reference video, with pictures that make it easy to identify when they should be displayed in relation to the sound
include a presentation timestamp with all screen updates and all sound buffers
capture good statistics from the sound process about how long the sound buffers took to play - and what range of the buffer capacity is actually used (so we don't cause overruns or underruns by changing the settings)
drive the sound delay from the main process: pass a message to the sound process to tell it to change the value

Issues:

finish #669
the screen updates may take a little while to get displayed (vertical sync, opengl page flip, compositing, etc) - very hard to evaluate, but small anyway
we get notified of screen updates by X11's damage extension (which can introduce some delays when under load)
should this be automatic or a setting? We should be able to detect "video" mode when there are many video screen updates per second (no point delaying screen updates when we only get one or two a second - usually)
since we delay things, we could tell the video encoder to enable B-frames and use the delay to compress better (and almost for free)
different platforms behave differently: Linux is generally quite happy with a small buffer and even underruns, win32 and osx are not IIRC

Sun, 12 Apr 2015 12:46:27 GMT - Antoine Martin: status changed

status changed from new to assigned

Here are some good test videos to use for testing:

As of r8983, we now send and "info" packet by to the parent process after each buffer we push, so we should have up to date and accurate information about the queue levels.

Mon, 13 Apr 2015 13:21:15 GMT - Antoine Martin:

Another useful metric is to hit the pause button in a media player (or even youtube) and to measure for how much longer the music keeps playing on the client end. Even with a simple 440HZ Tone.

It can be useful to run the software producing the sound outside xpra using a regular desktop session's pulseaudio setup (and the --no-pulseaudio flag when starting the xpra server) so that the request to stop the sound (ie: pause the music using a click or keypress) is processed almost instantly without going through xpra forwarding (which adds its own event processing latency).

Running the latest code with a Linux client on a GB LAN, I see:

buffer level in session info is hovering at about 25ms on average (connection tab)
the audio seems to be about 150ms behind the video (no matter what encoding is used)

150ms is definitely within the range that we can deal with by adding a simple paint delay. It is more difficult for me to test win32 and osx, as I run those within virtual machines which add their own latency.. (forwarding the sound back from the virtualized sound card back to the host)

Wed, 15 Apr 2015 22:38:30 GMT - J. Max Mena:

Tested with an r9016 Win8.1 Client against an r9016 Fedora 20 server:

Sound was on average 150ms behind(as best I can tell)
Sound queue varied greatly and was on average 20-30ms. Although now it's up to 100-200ms

Stole the sesion with an OSX 10.10.1 r9016 client:

Sound remained on average 150ms behind
Sound queue went up to 200-300ms

Thu, 30 Apr 2015 14:41:22 GMT - Antoine Martin:

See also #849 (more general sound improvements)

Tue, 05 May 2015 09:02:52 GMT - Antoine Martin: attachment set

attachment set to sound-sync-v1.patch

work in progress patch

Tue, 05 May 2015 10:04:11 GMT - Antoine Martin:

With the patch above, I can get almost perfect sync on a LAN with Linux clients. There are two parts (only one is used so far - the server side):

sending the current client queue latency delay back to the server (sent back using the "sound-control" packet type)
delay the processing of video regions (currently hard-coded to 200ms)

Other changes needed / issues remaining:

we could enable this code for older clients by just using a hard-coded value
this only works for video regions - full-window animation or fullscreen video is currently not handled as a video region... the video region detection code and its tests will require changes (and this will impact other things..)
we should not bother sending the client delay if it changes by less than 50ms since the last time we sent it
try harder to keep this client-side queue delay low (but without causing underruns,overruns or pipeline restarts..) - we first need some good test data on as many platforms and network conditions as possible (and maybe also with gstreamer 1.0 as per #849)
actually use the client delay - this is hard because we queue the encode callback using timeout_add: we can't just use an arbitrary changing values here or we would quickly get frames out of order! (and maybe the call_in_encode_thread function should take care of the delay, as it is already used with a delay by the window icon code)
take into account the encoding latency (I test with nvenc which is fast)
when we delay things, we keep the raw rgb pixels in memory with the python frame, this can get quite big - and we should limit how many frames are kept waiting this way (or risk gigabytes of ram getting used up when things go wrong - and they sure will)
we encode after the delay, maybe we should encode first (saving lots of ram), then delay before queuing the packet for sending
when cancelling (because of resize or deleted window), we should just drop all the delayed frames
auto-refresh issues - if any
some statistics use the elapsed time since the damage_time, those are used for auto speed and quality tuning and need to be updated to subtract the delay
lots of testing needed
expose things via xpra info for debugging
expose this delay to the client?

Wed, 13 May 2015 10:59:52 GMT - Antoine Martin: attachment set

attachment set to sound-sync-v2.patch

updated patch, adds command line option, client delay value update batching

Thu, 14 May 2015 13:18:27 GMT - Antoine Martin: attachment set

attachment set to sound-sync-v3.patch

latest work in progress: adds ability to change the delay on the fly and keep the frame order

Thu, 14 May 2015 13:25:31 GMT - Antoine Martin:

New issues and ideas:

the new queue stuff is overkill and costly for small packets... find a way to bypass it? (just skip encode_from_queue?)
we need to free the images when we cancel_damage
process_damage_region should probably always use the clipped size for video?
maybe av-sync should be an options flag? (and use it to expose the actual value to clients?)
dislike for hard-coded rgb24 in video version of process_damage_region
some video encoders take time to start (even x264..), could be done in parallel with a jpeg/rgb encode, and send a flag to the client to tell it to decode but not paint that initial frame

Fri, 15 May 2015 07:22:34 GMT - Antoine Martin:

r9372 implements most of what is required and seems to work quite well on my LAN, see commit message. For debugging purposes, there is now a -d av-sync logging flag.

To make it possible to fine tune the syncing, the environment variable XPRA_AV_SYNC_DELTA can be used to adjust the delay, the effect on both client and server should be the same (be it is also cumulative!):

on the client: changes the delay reported back to the server
on the server: adjusts the delay received

We now expose the delay via xpra info (look for av-sync).

Still TODO:

this only works for video regions (no fullscreen or full window), if the video does not get detected, then this is not a bug with this ticket but with #410. Once we detect fullscreen using the video region code, we can simplify process_damage_region
mmap case is not handled (we could delay it the same way)
try harder to keep this client-side queue delay low (could we tune the audioresample element to drain / fill the queue?)
take into account the encoding latency.. (I am testing with nvenc)
I have seen the encoding queue sending frames way too late.. maybe we're now queueing up too much work to handle (since this is no longer taken into account by the batching delay calculations)
maybe switch to encode first and then queue? (save memory, better statistics, less congestion, etc)
rather than starting with the full delay for the first video frame, we could do it more gradually, so that we adapt the frame delay more smoothly when it changes (most important on startup where it causes a lot of jerkiness)

Fri, 15 May 2015 10:45:15 GMT - Antoine Martin:

The commit above only worked with XPRA_XSHM=0 to turn off shared memory, that's because the pixels keep getting updated after we request them so the image we grabbed via xshm was always fully up to date, instead of the older version at the time we requested it. r9373 freezes the pixel buffer before queuing up the work. (and uses the new re-stride code from #839 to try to save space, but it will still consume a lot more cpu and memory than non-av-synced pixels...).

Finally, we need to find a way to auto-tune the XPRA_AV_SYNC_DELTA. That's not going to be easy as there are a number of things that will have an impact, at both ends!:

the OS and version
the GStreamer version (we are using 0.10, but we should be moving to 1.x now that we can - see #849)
the video encoding used (and the speed setting used...)
the browser and plugin used? (ie: using alsa or pulseaudio api)

etc.

From my brief testing, I found that those values work well:

mp3 linux: 250 (that's quite slow!)
wav linux: 0 (impressive!)
wavpack, flac and opus (now enabled as of r9375): all so out of sync, no delay value can fix them - there must be something wrong with the gstreamer pipeline setup, queuing too much somewhere server side

Maybe we need to tune some of those encoders better (all the 0.10 plugins can be found here: http://www.freedesktop.org/software/gstreamer-sdk/data/docs/latest/):

wavpackenc) there is an undocumented mode argument of type GstWavpackEncMode, with the following definitions found in the source file (extra-processing already defaults to 0, so that's fine as it is):

{GST_WAVPACK_ENC_MODE_VERY_FAST, "Very Fast Compression", "veryfast"},
{GST_WAVPACK_ENC_MODE_FAST, "Fast Compression", "fast"},
{GST_WAVPACK_ENC_MODE_DEFAULT, "Normal Compression", "normal"},
{GST_WAVPACK_ENC_MODE_HIGH, "High Compression", "high"},
{GST_WAVPACK_ENC_MODE_VERY_HIGH, "Very High Compression", "veryhigh"},

lamemp3enc: encoding-engine-quality: Quality/speed of the encoding engine, this does not affect the bitrate'' - and in the source code we find:
```
LAMEMP3ENC_ENCODING_ENGINE_QUALITY_FAST = 0,
LAMEMP3ENC_ENCODING_ENGINE_QUALITY_STANDARD,
LAMEMP3ENC_ENCODING_ENGINE_QUALITY_HIGH
```

flacenc: quality: Speed versus compression tradeoff. - in the source:

      {0, "0 - Fastest compression", "0"},
      {1, "1", "1"},
      {2, "2", "2"},
      {3, "3", "3"},
      {4, "4", "4"},
      {5, "5 - Default", "5"},
      {6, "6", "6"},
      {7, "7", "7"},
      {8, "8 - Highest compression", "8"},
      {9, "9 - Insane", "9"},
      {0, NULL, NULL}

Also, from what I am reading (Need help with using OPUS over RTP), audioconvert is optional for some encoders/decoders (but it is required for opus for example).

Some other videos useful for visualizing the synchronization:

Fri, 15 May 2015 11:32:46 GMT - Antoine Martin: attachment set

attachment set to sound-low-latency-tuning.patch

ugly patch for tuning the encoders to use low latency settings

Fri, 15 May 2015 11:40:39 GMT - Antoine Martin:

The patch above reduces the latency of all the codecs it tunes:

mp3 mostly unchanged, still about 200 to 250ms
wavpack is still awful
flac is down to about 150ms!
opus: flaky so I may have to disable it again..

Fri, 15 May 2015 14:26:29 GMT - Antoine Martin:

gstreamer does calculate the latency of its pipeline: Clocks and synchronization in GStreamer, but I see no way of getting the data from the obscure message we get on the bus:

sound-source Latency message from /GstPipeline:pipeline0/GstLameMP3Enc:lamemp3enc0 (__main__.GstLameMP3Enc): <gst.Message (none) from lamemp3enc0 at 0x7ff7fc0023c0>

This function is not exposed as far as I can see: gst-audio-encoder-get-latency.

Fri, 15 May 2015 16:43:14 GMT - Antoine Martin:

As of r9381, we auto-tune the av-sync latency based on the codec in use given this table (in milliseconds):

        MP3         : 250,
        FLAC        : 150,
        WAV         : 0,
        WAVPACK     : 600,

You also need r9382 to use the low-latency codec options which make flac usable. Now we need to test this code on more platforms and see what needs adjusting.

Note: we now choose flac ahead of mp3, because it doesn't really use that much more bandwidth (at least for use on a LAN), and it does lower the encoder latency by 100ms which is not insignificant, and it seems to be less prone to jitter (which raises the client side queue delay) and mp3 seems to cause some frame video frame delays (to be investigated) - here are some statistics you can get using xpra info | grep client.av-sync:

for flac:

client.av-sync.client.delay=0
client.av-sync.delta=0
client.av-sync.total=150

for mp3:

client.av-sync.client.delay=176
client.av-sync.delta=0
client.av-sync.total=426

That said, I've just tested again and got different results... So for testing mp3, use the client with --speaker-codec=mp3.

But since flac is disabled on win32... it will default to mp3 on that platform.

The plan is to allow us to use python3 + gstreamer 1.0 in 0.16 (see #849), so maybe we could re-enable flac on win32 then - the integration code is not hard (pretty much there already), but the packaging part will be a challenge!

Thu, 21 May 2015 03:41:46 GMT - Antoine Martin:

Fixes:

r9462
r9463 (see #865)

Mon, 25 May 2015 08:22:22 GMT - Antoine Martin:

As of r9517, the av-sync delay is per-window and changes more gradually to prevent the stuttering effect. New issues:

sound delay way too high on my laptop (well over a second), this will need pipeline tuning without doing restarts

firefox crashed the xpra server with:

Xorg: ../../../include/privates.h:123: dixGetPrivateAddr: Assertion `key->initialized' failed.
Xpra: Fatal IO error 0 (Success) on X server :2.
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":2"
Xpra: Fatal IO error 11 (Resource temporarily unavailable) on X server :2.

Founds lots of hits, but none of them seem particularly relevant.

Mon, 13 Jul 2015 05:36:59 GMT - rektide: cc set

cc rektide@… added

Tue, 21 Jul 2015 11:42:39 GMT - Antoine Martin:

See attachment/ticket/800/delayed-frames.patch: the video encoders will keep the delayed frames internally, so we should pass at least part of the av-sync delay to them...

Fri, 14 Aug 2015 11:45:20 GMT - Antoine Martin:

See 849#comment:16

Fri, 16 Oct 2015 15:06:58 GMT - Antoine Martin:

Lots of fixes and updates in r10878, in particular:

XPRA_MAX_SYNC_BUFFER_SIZE is the maximum amount of memory we can use for buffering video frames (defaults to 256 - in MB)
XPRA_AV_SYNC_RATE_CHANGE by how much we adjust the av-sync delay (defaults to 20 - in ms)
XPRA_AV_SYNC_TIME_CHANGE how often we adjust the av-sync delay (defaults to 500 - in ms)

Mon, 07 Dec 2015 11:58:54 GMT - Antoine Martin: owner, status changed

owner changed from Antoine Martin to alas
status changed from assigned to new

Minor debugging tweaks in r11365, and ability to fine tune the av-sync delay added in r11366. This can be used to adjust the buffering of the video at runtime without restarting the server (same as what XPRA_AV_SYNC_DELTA does on startup). So for example, to delay the video by an extra 50ms:

xpra control :10 sound-output av-sync-delta "50"

This is also available via the dbus interface (see #904), which is a little bit more user-friendly. This is not cumulative, it resets the delta value every time. Because of the command line parser used, it is a bit difficult to specify negative values as they would get interpreted as options rather than arguments (no such problem with the dbus version). To workaround this, you can specify negative values by quoting them and adding a space, ie: xpra control :10 sound-output av-sync-delta " -100"

Tested with artificial sound jitter to cause the sound buffer levels to go up:

XPRA_SOUND_SOURCE_JITTER=300 xpra start ...

The "sending update queue.used=" message on the client when running with -d av-sync fires a lot and then on the server we do adjust the target delay accordingly (every XPRA_AV_SYNC_TIME_CHANGE ms):

update_av_sync_delay() current=63, target=116, adding 20 (capped to +-20 from 53)

This occasionally causes items to be delayed a bit too much as we decrease the delay:

encode_from_queue: processing item 1/5 (overdue by 157ms)

The vast majority of the encoding happens to be on time, or overdue by a negligible amount (under 50ms). More importantly, the sync test videos from comment:1 seem to play mostly in sync.

This is only ever going to work as well as permitted by the somewhat incomplete testing work done in #849, see also #999 for how to reproduce different bandwidth conditions more reliably.

Failing to detect the video region (see #410) will cause things to get out of sync, and that this is not a problem with sound sync per-se.

Having sound sync enabled will now also cause further problems with video regions being wrongly detected (see #967), as this will delay the screen updates for the wrong area of the window (or even, the whole window). Again, this is a problem with video region detection, not sound sync.

Wed, 09 Dec 2015 23:57:49 GMT - J. Max Mena:

Set up a Fedora 23 trunk r11384 Server and connected a Fedora 23 trunk r11384 client(both machines hardware):

Set up in the same LAN, with all wired connections

Set av-sync-delta to around 250, and I'm getting it almost perfectly in sync.

I've also found that it seems to reset on reconnect. Either that or my connection conditions change slightly. Either way, I have to fiddle with it each connection, but in my setup around 250 seems to be good. Also, failing to detect the video region definitely comes into play; I've noticed that when (with opengl paint boxes enabled) it is painting the test video region with chunks of image encoders (it seems to like WebP..I think, I forget the colors), it rarely gets into sync.

I'll need to test it far more thoroughly (maybe we can setup some different network conditions), but so far it appears to have a noticeable effect.

Thu, 10 Dec 2015 05:28:06 GMT - Antoine Martin:

Please always specify at the very least:

which codec was used (vorbis?)
xpra info | grep av-sync
the average sound queue level as shown on session info, roughly

I've also found that it seems to reset on reconnect.

The delta obviously is per client connection, so multiple clients can get different av sync delays. They should all be in sync - just not with each other.

but in my setup around 250 seems to be good

Are you saying that you need to add 250ms to get things to sync? (the point of this ticket is that the system should be able to auto-detect the delay to use).

Wed, 16 Dec 2015 19:42:48 GMT - J. Max Mena:

which codec was used (vorbis?)

Woops, forgot to include that. Yes, it was using vorbis.

Are you saying that you need to add 250ms to get things to sync?

In our current networking setup that appears to work nicely with vorbis. It seems to be the most in-sync that way. I imagine different network setups would need wildly (slight exaggeration) different delay values. My recommendation is to add a man-page and wiki entry showing how to change these values so that users can tweak it as they see fit(assuming they aren't already there). 250ms with vorbis works here right now, but with networking it's always a moving target.

As of right now (really close to in sync), xpra info | grep av-sync shows:

client.av-sync.client.delay=92
client.av-sync.delta=250
client.av-sync.total=342
features.av-sync=True
window[1].av-sync.current=342
window[1].av-sync.target=342
window[25].av-sync.current=342
window[25].av-sync.target=342

I'll be looking at different codecs and delays today so I'll make a nice chart showing what works here in an "ideal" networking setup(hard network connection + non-virtual machines).

Wed, 16 Dec 2015 23:58:41 GMT - J. Max Mena:

Played around with it for a good portion of today. I've found that as the session goes along it kinda varies a bit.

Speaker Codec	Best latency settings
vorbis	250-275
opus	200-250
wav	200-250

vorbis info in comment:21

opus xpra info | grep av-sync output:

client.av-sync.client.delay=160
client.av-sync.delta=200
client.av-sync.total=360
features.av-sync=True
window[1].av-sync.current=360
window[1].av-sync.target=360
window[25].av-sync.current=360
window[25].av-sync.target=360

Sound queue for opus:

Between 120-180ms

wav xpra info | grep av-sync output:

client.av-sync.client.delay=80
client.av-sync.delta=200
client.av-sync.total=280
features.av-sync=True
window[1].av-sync.current=280
window[1].av-sync.target=280
window[25].av-sync.current=280
window[25].av-sync.target=280

Sound queue for wav:

Between 50-100ms
- oddly enough, many of the spikes in the level were above the maximum..and sound quality was less than stellar.

These three were all I got to today. It is quite time consuming to get down just perfect. Plus, we had an all hands meeting...and I had a CS final this morning so I came in to the office late. I'll get to the rest tomorrow.

Thu, 17 Dec 2015 06:03:58 GMT - Antoine Martin:

My recommendation is to add a man-page and wiki entry showing how to change these values so that users can tweak it as they see fit(assuming they aren't already there).

No, tweaking should be a last resort thing. The point of this ticket is to make it sync automatically, by default, without any user intervention.

250ms with vorbis works here right now, but with networking it's always a moving target.

And the heuristics are meant to adjust for that.

I am at a loss to explain why you would need a 250ms adjustment. This is a huge value, and I don't understand where this delay is coming from. This is definitely not what I am seeing here on a LAN where the defaults work out of the box.

Please include a screenshot of the sound latency graph when this happens, and maybe also a full xpra info.

Sound queue for wav: ... .and sound quality was less than stellar.

We don't care much about wav, the prefered codec order is:

CODEC_ORDER = [VORBIS, OPUS, FLAC, MP3, WAV, WAVPACK, SPEEX]    #AAC is untested

Fri, 18 Dec 2015 19:22:36 GMT - J. Max Mena:

No, tweaking should be a last resort thing. The point of this ticket is to make it sync automatically, by default, without any user intervention.

Okay, I misinterpreted the point of the ticket. I'll keep this in mind going forward.

I am at a loss to explain why you would need a 250ms adjustment.

And I am as well. All day yesterday, and most of this morning it's working fine without any delta adjustment.

One important thing we've found(and Lonny pointed out to me) is that the sync is noticeably better with a constant background noise playing. On average I see it 20-50ms better when I leave a background noise source on while doing the sync tests. The background noise can be anything from music in a flash game to movies playing in VLC.

Tue, 22 Dec 2015 13:35:19 GMT - Antoine Martin:

(huge bug found and fixed, see ticket:1054#comment:3 - may have caused the video to not get delayed in some corner cases: only with a video region of the same width as default xshm images, IIRC: 2048 pixel wide)

Tue, 05 Jan 2016 08:46:40 GMT - Antoine Martin:

r11585 improves the handling for the freeze() failure reported in ticket:995#comment:29. (will backport) If this happens again, hopefully we can figure out why and fix it properly.

Thu, 28 Jan 2016 20:56:12 GMT - Antoine Martin:

Follow up in #1103

Sat, 30 Jan 2016 03:30:04 GMT - alas:

Some significant finds for sync'ing posted in 1103:comment2 (with vorbis).

Sun, 10 Apr 2016 06:10:56 GMT - Antoine Martin: priority changed

priority changed from major to critical

Can we please close this one?

Mon, 11 Apr 2016 18:52:24 GMT - alas: status changed; resolution set

status changed from new to closed
resolution set to fixed

Ooops, yes.

The sync is actually very good except in some fringe cases (4K monitors with low refresh rates connected to weak machines, when opengl is greylisted, etc.) or for those who don't think audio-late of 1-3 frames (with a 26 fps video sync test) is good enough.

Will follow up some more with #1103 (or whatever new tickets are opened in its wake). Closing this as doing quite the job for now.

Wed, 13 Apr 2016 14:20:07 GMT - Antoine Martin:

Follow up in #1164.

Mon, 20 Jun 2016 04:07:49 GMT - Antoine Martin:

Looks like there is a bug in this code: #1237.

Sun, 03 Jul 2016 10:00:41 GMT - Antoine Martin:

r12950 makes it possible to force a fixed frame delay server side using the env var XPRA_FORCE_AV_DELAY, this applies to all screen updates indiscriminately instead of just video regions. Useful for testing.

Fri, 28 Oct 2016 04:06:58 GMT - Antoine Martin:

Some fixes:

r14305: skip error message during cleanup (race condition)
r14306: remove more than one item from the queue per iteration
r14307: better timer management, honour XPRA_FORCE_AV_DELAY for testing
r14308: fix memleak (requires r14306) - (edit: backport probably not needed, memleak probably not real)
r14310: unmap cleanup (backport?)
r14311: code cleanup, always clear encode queue on cancel_damage
r14312: fix server-side queue level accounting

Thu, 01 Jun 2017 10:18:52 GMT - Antoine Martin:

Improvements in audio pipeline capture latency related to AV timestamps changes in r15986.

Sat, 23 Jan 2021 05:07:25 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/835