Xpra: Ticket #855: UI thread gets stuck on osx (at random?)

Reported in ticket:669#comment:40 and #1002, it looks like this:

UI thread is now blocked
UI thread is running again, resuming

The weird thing is that it isn't just the UI thread, the polling thread also gets stopped. Funnily enough, sound keeps on playing, so not all threads are blocked!?

We should figure out why things gets stuck, and what we can do about it. At least we probably shouldn't suspend then resume immediately: if the polling thread got stuck, we can just skip the events.

See also: #1003, #986



Fri, 23 Oct 2015 06:54:40 GMT - Antoine Martin: status, description changed


Fri, 23 Oct 2015 06:59:12 GMT - Antoine Martin: owner, priority, status, description changed

@afarr: please provide the information requested in ticket:1002#comment:1 here.


Sat, 31 Oct 2015 00:36:23 GMT - alas: owner changed

Is the gap between the "blocked" and "running again" always this low? (here just 1ms)


Not always, but at least 9 times out of 10 the gap is between 0 and 23 ms.

Occasionally, when going through about 10 hours of logs with -d sound on to see if I could spot any cause for sound apparently stopping on sessions left idle overnight, I did see the UI thread is now blocked message with no indication of it resuming for 20 seconds or so, after which I see another UI thread is now blocked message followed by an immediate resumption message. (Which might be the result of overnight spinners, client trying to go to sleep, or who knows what, I suppose.)

Trying to connect with -d util, though, I see a never-ending stream of output repeating:

2015-10-30 17:02:53,207 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-10-30 17:02:53,207 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x7d84970>]
2015-10-30 17:02:53,208 UI_thread_wakeup()
2015-10-30 17:02:53,271 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,331 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,397 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,455 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,509 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,583 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,639 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,708 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-10-30 17:02:53,709 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x7d84970>]
2015-10-30 17:02:53,709 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,709 UI_thread_wakeup()
2015-10-30 17:02:53,769 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,826 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,878 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,954 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,006 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,082 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,137 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,209 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-10-30 17:02:54,209 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x7d84970>]
2015-10-30 17:02:54,210 UI_thread_wakeup()

... I'm not seeing any UI thread messages while the ... oops, typed too soon. I eventually saw a UI thread message, despite the util output:

2015-10-30 17:21:32,921 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,000 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,080 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,131 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,154 poll_UI_loop() last_UI_thread_time was 10.3 seconds ago (max 1), UI_blocked=False
2015-10-30 17:21:33,154 UI thread is now blocked
2015-10-30 17:21:33,155 UI_thread_wakeup()
2015-10-30 17:21:33,155 UI thread is running again, resuming

There was a longer than usual collection of 'info' callbacks before the UI thread was blocked, but it looks like that's included in the message (had nothing running at the time but an idle gedit window).

(Passing back to you so it doesn't get lost in the pile of my tickets.)


Sat, 31 Oct 2015 10:02:57 GMT - Antoine Martin: owner changed

(or even higher values?) Which will increase the polling delay to 2 seconds (2000 milliseconds).


Just for the record:


Mon, 02 Nov 2015 23:42:32 GMT - alas:

Testing with the 0.16.0 r11118 osx client against a 0.16.0 r11118 fedora 21 server, with -d xpra.platform.ui_thread_watcher I'm not seeing any sign of the UI thread blocked warnings.

Ran for about 20 minutes, only thing I saw (aside from the expected UI message when opening the Session Info) was a lot of messages like:

2015-11-02 14:14:59,213 wait(0.5000) actually waited 0.5011
2015-11-02 14:14:59,213 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-11-02 14:14:59,213 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x9141bb0>]
2015-11-02 14:14:59,213 UI_thread_wakeup()

... which I'm guessing is expected.

2015-11-02 14:25:21,900 sound-sink using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2015-11-02 14:53:41,143 Warning: long timer waiting time,
2015-11-02 14:53:41,150  UI thread polling waited 9.9 seconds longer than intended (10.4 vs 0.5)
2015-11-02 14:53:41,157 UI thread is now blocked
2015-11-02 14:53:41,158 UI thread is running again, resuming
2015-11-02 14:54:02,176 server is not responding, drawing spinners over the windows
2015-11-02 14:54:02,434 server is OK again
2015-11-02 14:55:11,094 Warning: long timer waiting time,
2015-11-02 14:55:11,094  UI thread polling waited 9.8 seconds longer than intended (10.3 vs 0.5)
2015-11-02 14:55:11,094 UI thread is now blocked
2015-11-02 14:55:11,098 UI thread is running again, resuming
2015-11-02 14:56:11,303 Warning: long timer waiting time,
2015-11-02 14:56:11,303  UI thread polling waited 9.6 seconds longer than intended (10.1 vs 0.5)
2015-11-02 14:56:11,303 UI thread is now blocked
2015-11-02 14:56:11,303 UI thread is running again, resuming
2015-11-02 14:57:01,900 UI thread is now blocked
2015-11-02 14:57:01,904 UI thread is running again, resuming

It looks like there were several such messages in the wake of a spot of spinners? After a few more minutes I got a couple more.

2015-11-02 14:59:11,374 Warning: long timer waiting time,
2015-11-02 14:59:11,374  UI thread polling waited 9.8 seconds longer than intended (10.3 vs 0.5)
2015-11-02 14:59:11,374 UI thread is now blocked
2015-11-02 14:59:11,375 UI thread is running again, resuming

Looking at these numbers ("10.3 vs. 0.5"), I'm guessing that just upping the poling period to 2.0 seconds will be enough.


Tue, 03 Nov 2015 15:40:17 GMT - Antoine Martin:

Testing with the 0.16.0 r11118 osx client against a 0.16.0 r11118 fedora 21 server, with -d xpra.platform.ui_thread_watcher I'm not seeing any sign of the UI thread blocked warnings.


Has it magically fixed itself, or is it the logging that prevents it from sleeping: does it re-appear if you turn off all debugging?


Tue, 03 Nov 2015 19:19:03 GMT - alas:

Running with no debugging, the message is back again (oddly, a relief?)... but my "quick" test didn't show any sign of the messages for a good 45 minutes... in fact, again, they didn't start until I had some spinners...

2015-11-03 09:39:39,769 sound-sink using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2015-11-03 10:26:50,270 server is not responding, drawing spinners over the windows
2015-11-03 10:26:54,341 server is OK again
2015-11-03 10:43:17,456 Warning: long timer waiting time,
2015-11-03 10:43:17,456  UI thread polling waited 10.0 seconds longer than intended (10.5 vs 0.5)
2015-11-03 10:43:17,457 UI thread is now blocked
2015-11-03 10:43:17,457 UI thread is running again, resuming
2015-11-03 10:44:14,816 Warning: long timer waiting time,
2015-11-03 10:44:14,816  UI thread polling waited 7.1 seconds longer than intended (7.6 vs 0.5)
2015-11-03 10:44:14,816 UI thread is now blocked
2015-11-03 10:44:14,819 UI thread is running again, resuming
...

Once they start, they continue to repeat every 1-3 minutes, give or take. (I'll try a few more times as opportunities present themselves, see if this pattern continues.)


Sat, 05 Dec 2015 11:27:00 GMT - Antoine Martin:

It would be interesting to see if having the debugging turned on helps with the sound stopping when left unattended for too long (#986?) - it might.


Wed, 09 Dec 2015 01:18:45 GMT - alas:

I actually have logs for a nine hour space of unattended video playing... sound seems to break whether or not the logs are running... and it looks like the UI thread gets blocked like this periodically throughout the whole period.

Of course, those tests were run a little while back. I'll try again with the latest, and see what I see. Since there was no sign of any impact, however, based on my attempts to grep through the logs for the UI thread blocking while searching for something to explain what was happening with the sound.


Fri, 11 Dec 2015 21:37:28 GMT - alas:

Interesting.

When I connect with -d sound on both server and client (0.16.0 r11304 osx client against 0.16.0 r11342 fedora 21 server)... I do get UI thread is now blocked messages when I leave the session unattended overnight (and the sleep handlers start kicking in, etc.) - but when I ran and left it idle playing sound during the day (periodically fondling it back awake when the screensaver tried to start), I didn't see any of the UI thread messages.

I will try to repeat with an r11206 osx client (noticed the r11304 and r11261 from your beta repo have no mp3 codecs, which is probably not related to the issue of sound stopping after several hours, #986, but I'll go ahead and run yet another test just to see... and comment in #986 as well).


Tue, 12 Jan 2016 19:16:47 GMT - alas: owner changed

Testing with osx 0.17.0 r11640 (with working vorbis) against a fedora 23 0.17.0 r11649 server (also with working vorbis, coincidentally)... I am still seeing UI thread messages when running semi-idle video tests with -d sound— but it takes about half an hour before they begin to appear.

There don't seem to be any ill effects, however. With vorbis, the sound lasts all night, despite approximately 8000 instances of the UI thread error over about a 15 hour session of running video while doing other things.

Seem to come as much as several times a minute after about half an hour, despite the energy saver settings set to not turn off the display until after an hour, so I don't think it's just a matter of sleep settings inducing them (I've seen them start after a spinner in many cases, not sure it hasn't been a bunch of coincidences though).

In any case, since the clients appear to always recover, and the messages don't even seem liable to be related to sound anymore... I'm inclined to close this as worksfor... us? Passing it back to you to think over (will also comment on the results of vorbis vs. mp3/wavpack in #986).


Wed, 13 Jan 2016 05:42:54 GMT - Antoine Martin: status changed; resolution set

Closing as "wontfix", we'll just ignore the warnings for now.

I am not removing them because of if we get other problems, they could be related and having these warnings nearby might be a useful clue.


Thu, 16 Jun 2016 12:28:45 GMT - Antoine Martin:

See also: ticket:249#comment:15


Fri, 16 Sep 2016 04:37:12 GMT - Antoine Martin:

It's also possible for OSX to sleep in between our checks for the slow UI thread, which would manifest itself with just this message:

Warning: long timer waiting time,
 UI thread polling waited 10120.2 seconds longer than intended (10120.7 vs 0.5)

As of r13749 we now also fire the "resume" callbacks in this case, even though we never fired the "fail" callbacks prior to that. It will log an extra UI thread is running again, resuming. This means that the display should be refreshed with a lossless update at this point.


Sat, 23 Jan 2021 05:07:58 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/855