xpra icon
Bug tracker and wiki

Opened 4 years ago

Closed 3 years ago

Last modified 3 years ago

#855 closed defect (wontfix)

UI thread gets stuck on osx (at random?)

Reported by: Antoine Martin Owned by: Antoine Martin
Priority: major Milestone: 0.16
Component: platforms Version: 0.15.x
Keywords: osx Cc:

Description (last modified by Antoine Martin)

Reported in ticket:669#comment:40 and #1002, it looks like this:

UI thread is now blocked
UI thread is running again, resuming

The weird thing is that it isn't just the UI thread, the polling thread also gets stopped.
Funnily enough, sound keeps on playing, so not all threads are blocked!?

We should figure out why things gets stuck, and what we can do about it.
At least we probably shouldn't suspend then resume immediately: if the polling thread got stuck, we can just skip the events.

See also: #1003, #986

Change History (14)

comment:1 Changed 4 years ago by Antoine Martin

Description: modified (diff)
Status: newassigned

comment:2 Changed 4 years ago by Antoine Martin

Description: modified (diff)
Owner: changed from Antoine Martin to alas
Priority: minormajor
Status: assignednew

@afarr: please provide the information requested in ticket:1002#comment:1 here.

comment:3 Changed 4 years ago by alas

Owner: changed from alas to Antoine Martin

Is the gap between the "blocked" and "running again" always this low? (here just 1ms)


Not always, but at least 9 times out of 10 the gap is between 0 and 23 ms.

Occasionally, when going through about 10 hours of logs with -d sound on to see if I could spot any cause for sound apparently stopping on sessions left idle overnight, I did see the UI thread is now blocked message with no indication of it resuming for 20 seconds or so, after which I see another UI thread is now blocked message followed by an immediate resumption message. (Which might be the result of overnight spinners, client trying to go to sleep, or who knows what, I suppose.)

Trying to connect with -d util, though, I see a never-ending stream of output repeating:

2015-10-30 17:02:53,207 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-10-30 17:02:53,207 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x7d84970>]
2015-10-30 17:02:53,208 UI_thread_wakeup()
2015-10-30 17:02:53,271 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,331 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,397 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,455 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,509 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,583 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,639 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,708 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-10-30 17:02:53,709 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x7d84970>]
2015-10-30 17:02:53,709 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,709 UI_thread_wakeup()
2015-10-30 17:02:53,769 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,826 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,878 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:53,954 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,006 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,082 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,137 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61218)>, [])]
2015-10-30 17:02:54,209 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-10-30 17:02:54,209 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x7d84970>]
2015-10-30 17:02:54,210 UI_thread_wakeup()

... I'm not seeing any UI thread messages while the ... oops, typed too soon. I eventually saw a UI thread message, despite the util output:

2015-10-30 17:21:32,921 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,000 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,080 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,131 firing callback for 'info': [(<bound method sink_subprocess_wrapper.info_update of sink_subprocess_wrapper(61312)>, [])]
2015-10-30 17:21:33,154 poll_UI_loop() last_UI_thread_time was 10.3 seconds ago (max 1), UI_blocked=False
2015-10-30 17:21:33,154 UI thread is now blocked
2015-10-30 17:21:33,155 UI_thread_wakeup()
2015-10-30 17:21:33,155 UI thread is running again, resuming

There was a longer than usual collection of 'info' callbacks before the UI thread was blocked, but it looks like that's included in the message (had nothing running at the time but an idle gedit window).

(Passing back to you so it doesn't get lost in the pile of my tickets.)

comment:4 Changed 4 years ago by Antoine Martin

Owner: changed from Antoine Martin to alas
  • r11103 will log how long it actually takes to do the probing, maybe everything gets delayed on the system and not just the UI thread. This would then show up as warning looking like this:
    Warning: long waiting time:
     UI thread polling waited 1.0 longer than intended (2.0 vs 1.0)
    
  • since 9 out of 10 are close to 1 second, maybe the machine goes into a sleep or low power mode and does not service timers as regularly as normal, if that's the case then maybe we just need to increase the polling timer, try running the client with:
    XPRA_UI_THREAD_POLLING=2000 ...
    

(or even higher values?)
Which will increase the polling delay to 2 seconds (2000 milliseconds).


Just for the record:

  • these warnings also show up when the user activates the global menu - that's what this polling timer is for, we can't avoid this
  • I sometimes forget that the logger can use the module name to restrict logging to particular file, in this case -d xpra.platform.ui_thread_watcher is a much more restricted logging subset of -d util

comment:5 Changed 4 years ago by alas

Testing with the 0.16.0 r11118 osx client against a 0.16.0 r11118 fedora 21 server, with -d xpra.platform.ui_thread_watcher I'm not seeing any sign of the UI thread blocked warnings.

Ran for about 20 minutes, only thing I saw (aside from the expected UI message when opening the Session Info) was a lot of messages like:

2015-11-02 14:14:59,213 wait(0.5000) actually waited 0.5011
2015-11-02 14:14:59,213 poll_UI_loop() last_UI_thread_time was 0.5 seconds ago (max 1), UI_blocked=False
2015-11-02 14:14:59,213 poll_UI_loop() ok, firing [<function timer_clipboard_check at 0x9141bb0>]
2015-11-02 14:14:59,213 UI_thread_wakeup()

... which I'm guessing is expected.

  • Running without the debug flag, it took nearly 20 minutes before I got output...
2015-11-02 14:25:21,900 sound-sink using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2015-11-02 14:53:41,143 Warning: long timer waiting time,
2015-11-02 14:53:41,150  UI thread polling waited 9.9 seconds longer than intended (10.4 vs 0.5)
2015-11-02 14:53:41,157 UI thread is now blocked
2015-11-02 14:53:41,158 UI thread is running again, resuming
2015-11-02 14:54:02,176 server is not responding, drawing spinners over the windows
2015-11-02 14:54:02,434 server is OK again
2015-11-02 14:55:11,094 Warning: long timer waiting time,
2015-11-02 14:55:11,094  UI thread polling waited 9.8 seconds longer than intended (10.3 vs 0.5)
2015-11-02 14:55:11,094 UI thread is now blocked
2015-11-02 14:55:11,098 UI thread is running again, resuming
2015-11-02 14:56:11,303 Warning: long timer waiting time,
2015-11-02 14:56:11,303  UI thread polling waited 9.6 seconds longer than intended (10.1 vs 0.5)
2015-11-02 14:56:11,303 UI thread is now blocked
2015-11-02 14:56:11,303 UI thread is running again, resuming
2015-11-02 14:57:01,900 UI thread is now blocked
2015-11-02 14:57:01,904 UI thread is running again, resuming

It looks like there were several such messages in the wake of a spot of spinners? After a few more minutes I got a couple more.

2015-11-02 14:59:11,374 Warning: long timer waiting time,
2015-11-02 14:59:11,374  UI thread polling waited 9.8 seconds longer than intended (10.3 vs 0.5)
2015-11-02 14:59:11,374 UI thread is now blocked
2015-11-02 14:59:11,375 UI thread is running again, resuming

Looking at these numbers ("10.3 vs. 0.5"), I'm guessing that just upping the poling period to 2.0 seconds will be enough.

  • Trying, just to see what would happen, with XPRA_UI_THREAD_POLLING=3000, I saw no sign of the UI thread blocked messages after letting the session run for half an hour ... but I also had no instances of spinners - so maybe that's what's causing the polling delays at this point?

comment:6 Changed 4 years ago by Antoine Martin

Testing with the 0.16.0 r11118 osx client against a 0.16.0 r11118 fedora 21 server, with -d xpra.platform.ui_thread_watcher I'm not seeing any sign of the UI thread blocked warnings.


Has it magically fixed itself, or is it the logging that prevents it from sleeping: does it re-appear if you turn off all debugging?

comment:7 Changed 4 years ago by alas

Running with no debugging, the message is back again (oddly, a relief?)... but my "quick" test didn't show any sign of the messages for a good 45 minutes... in fact, again, they didn't start until I had some spinners...

2015-11-03 09:39:39,769 sound-sink using audio codec: MPEG 1 Audio, Layer 3 (MP3)
2015-11-03 10:26:50,270 server is not responding, drawing spinners over the windows
2015-11-03 10:26:54,341 server is OK again
2015-11-03 10:43:17,456 Warning: long timer waiting time,
2015-11-03 10:43:17,456  UI thread polling waited 10.0 seconds longer than intended (10.5 vs 0.5)
2015-11-03 10:43:17,457 UI thread is now blocked
2015-11-03 10:43:17,457 UI thread is running again, resuming
2015-11-03 10:44:14,816 Warning: long timer waiting time,
2015-11-03 10:44:14,816  UI thread polling waited 7.1 seconds longer than intended (7.6 vs 0.5)
2015-11-03 10:44:14,816 UI thread is now blocked
2015-11-03 10:44:14,819 UI thread is running again, resuming
...

Once they start, they continue to repeat every 1-3 minutes, give or take. (I'll try a few more times as opportunities present themselves, see if this pattern continues.)

comment:8 Changed 4 years ago by Antoine Martin

It would be interesting to see if having the debugging turned on helps with the sound stopping when left unattended for too long (#986?) - it might.

Last edited 4 years ago by Antoine Martin (previous) (diff)

comment:9 Changed 4 years ago by alas

I actually have logs for a nine hour space of unattended video playing... sound seems to break whether or not the logs are running... and it looks like the UI thread gets blocked like this periodically throughout the whole period.

Of course, those tests were run a little while back. I'll try again with the latest, and see what I see. Since there was no sign of any impact, however, based on my attempts to grep through the logs for the UI thread blocking while searching for something to explain what was happening with the sound.

comment:10 Changed 4 years ago by alas

Interesting.

When I connect with -d sound on both server and client (0.16.0 r11304 osx client against 0.16.0 r11342 fedora 21 server)... I do get UI thread is now blocked messages when I leave the session unattended overnight (and the sleep handlers start kicking in, etc.) - but when I ran and left it idle playing sound during the day (periodically fondling it back awake when the screensaver tried to start), I didn't see any of the UI thread messages.

I will try to repeat with an r11206 osx client (noticed the r11304 and r11261 from your beta repo have no mp3 codecs, which is probably not related to the issue of sound stopping after several hours, #986, but I'll go ahead and run yet another test just to see... and comment in #986 as well).

comment:11 Changed 3 years ago by alas

Owner: changed from alas to Antoine Martin

Testing with osx 0.17.0 r11640 (with working vorbis) against a fedora 23 0.17.0 r11649 server (also with working vorbis, coincidentally)... I am still seeing UI thread messages when running semi-idle video tests with -d sound— but it takes about half an hour before they begin to appear.

There don't seem to be any ill effects, however. With vorbis, the sound lasts all night, despite approximately 8000 instances of the UI thread error over about a 15 hour session of running video while doing other things.

Seem to come as much as several times a minute after about half an hour, despite the energy saver settings set to not turn off the display until after an hour, so I don't think it's just a matter of sleep settings inducing them (I've seen them start after a spinner in many cases, not sure it hasn't been a bunch of coincidences though).

In any case, since the clients appear to always recover, and the messages don't even seem liable to be related to sound anymore... I'm inclined to close this as worksfor... us? Passing it back to you to think over (will also comment on the results of vorbis vs. mp3/wavpack in #986).

comment:12 Changed 3 years ago by Antoine Martin

Resolution: wontfix
Status: newclosed

Closing as "wontfix", we'll just ignore the warnings for now.

I am not removing them because of if we get other problems, they could be related and having these warnings nearby might be a useful clue.

comment:13 Changed 3 years ago by Antoine Martin

comment:14 Changed 3 years ago by Antoine Martin

It's also possible for OSX to sleep in between our checks for the slow UI thread, which would manifest itself with just this message:

Warning: long timer waiting time,
 UI thread polling waited 10120.2 seconds longer than intended (10120.7 vs 0.5)

As of r13749 we now also fire the "resume" callbacks in this case, even though we never fired the "fail" callbacks prior to that.
It will log an extra UI thread is running again, resuming.
This means that the display should be refreshed with a lossless update at this point.

Note: See TracTickets for help on using tickets.