Fedora 20 x86_64 server, Win XP client.
started the following client applications from the
while true; do gtkperf -a; sleep 1; done
It eventually crashed the xpra server, without any error message...
And now that I have gdb attached to the server process so that I can get a backtrace, it is refusing to crash!
Can you reproduce?
You can attach gdb to the xpra server process with:
gdb /usr/bin/python
(...)
> att $XPRA_SERVER_PID
(...)
> continue
Then when it crashes, get the backtrace with gdb's bt (backtrace) command.
You can find the current value of $XPRA_SERVER_PID with pidof -x xpra.
You can just leave it all running until you get the crash. (if?)
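For an unattended run, the same attach, continue and backtrace steps can be driven non-interactively. This is only a minimal sketch: gdb_attach_cmd is a hypothetical helper name (not part of xpra or gdb), it assumes a gdb that accepts -p, -batch and -ex, and it attaches with -p rather than the "gdb /usr/bin/python" plus "att" sequence. Since gdb stops on most signals by default, the sketch also tells it to pass SIGPIPE through without stopping, so that only a real crash interrupts "continue".

```shell
#!/bin/sh
# gdb_attach_cmd (hypothetical helper): print a gdb command line that attaches
# to the given PID, lets the process run, and dumps every thread's backtrace
# once the process stops (for example on SIGSEGV).
gdb_attach_cmd() {
    pid="$1"
    # -p PID   attach to the running process
    # -batch   run the -ex commands in order, then exit gdb
    # handle SIGPIPE nostop noprint pass: do not halt on SIGPIPE
    printf '%s\n' "gdb -p $pid -batch -ex 'handle SIGPIPE nostop noprint pass' -ex continue -ex 'thread apply all bt'"
}

# Usage (left commented out so that sourcing this file has no side effects):
#   XPRA_SERVER_PID=$(pidof -x xpra)   # may print several PIDs, pick the server
#   eval "$(gdb_attach_cmd "$XPRA_SERVER_PID")" 2>&1 | tee xpra-gdb.log
```

Printing the command instead of running it makes it easy to check the PID before gdb touches the server.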
It would also be useful to keep an eye on CPU and memory usage. Even though this is a pathological test, the worst possible application to run in xpra, we should handle it as gracefully as we can.
I tried with windows 0.12.5 r6398 against fedora 19 server also running r6398... for about 75 minutes.
No Hate. (It wouldn't crash)
Client side cpu seems to be oscillating between about 30-70% and seems to hover around 80% memory (my windows machine doesn't have a lot of spare memory, just FYI, in case that might affect that stat).
Server-side cpu seems to be oscillating between about 90-111%, though memory seems negligible (Xorg is consistently using about 37%, while glxgears is closer to 0.3%)... though I wasn't able to check those server-side numbers while running the second round of tests for #567 against an xpra session hosted by a different user on the same test fedora 19 VM (on the other hand, whatever extra stress that might've caused also wasn't enough to induce a crash).
Does this test need to be run against a Fedora 20 server? (I haven't taken the time to install glxgears, or much else, on the Fedora 20 server that I supposedly have available.) Come Monday I should be able to run this for hours and use other machines to get work done, so let me know if it needs to be Fedora 20, and I'll abuse whatever systems are needed.
OK, Fedora 19 should do just as well. But worth running again on Monday just to be sure.
If neither of us can reproduce it, we'll just assume it was a fluke and close this ticket.
Keep an eye on the xpra server RAM usage, as gtkperf is likely to expose leaks there if there are any.
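One way to keep an eye on memory over a multi-hour run is to log the resident set size of the relevant processes at a fixed interval. The sketch below is an assumption-laden example, not an xpra tool: rss_kib_from_status and log_rss are hypothetical helper names, it is Linux only because it reads the VmRSS line of /proc/<pid>/status, and it reports KiB rather than the percentage figures that top shows.

```shell
#!/bin/sh
# rss_kib_from_status (hypothetical helper): print the VmRSS value, in KiB,
# found in a /proc/<pid>/status style file. Returns 1 if there is none.
rss_kib_from_status() {
    while read -r key value rest; do
        if [ "$key" = "VmRSS:" ]; then
            echo "$value"
            return 0
        fi
    done < "$1"
    return 1
}

# log_rss (hypothetical helper): every INTERVAL seconds, print one timestamped
# line per PID. PIDs that have exited are skipped silently. Runs until killed.
log_rss() {
    interval="$1"
    shift
    while :; do
        for pid in "$@"; do
            rss=$(rss_kib_from_status "/proc/$pid/status" 2>/dev/null) || continue
            echo "$(date '+%Y-%m-%d %H:%M:%S') pid=$pid rss_kib=$rss"
        done
        sleep "$interval"
    done
}

# Usage (commented out, it loops forever):
#   log_rss 60 $(pidof Xorg) $(pidof -x xpra) >> rss.log
```

A steadily growing rss_kib for one PID across the log is the kind of pattern a leak would produce.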
I ran the test again, same client same server. Technically I ran it a couple of times, but the first time through I tried to move the glxgears window out from under the winking gtkperf window, and in the process caused everything to crash. (I might try to reproduce that soon.)
Running through without trying to move windows, the test ran like a champ for just shy of 4 hours (I had to re-start glxgears a couple of times though, not sure if it just times out or if that was something I should follow up with).
2014-05-12 14:53:17,855 delayed_region_timeout: something is wrong, is the connection dead?
2014-05-12 14:53:22,267 delayed_region_timeout: something is wrong, is the connection dead?
2014-05-12 14:53:22,679 delayed_region_timeout: something is wrong, is the connection dead?
The client cpu and memory were about the same as they had been all along (between 40-85% cpu, about 86% memory). The xpra process wasn't using up any particular cpu or memory, but the Xorg memory had been climbing all along - from about 30% at one hour, to about 60% after two, up to about 90% by three... though it seemed to flatten at that point and level back at 88%... until the end, when it climbed up to about 93.9%... and flattened while the session went into spinners.
I had typed the continue into the gdb, and whatever was going on, that just froze, mostly. It had been tracing things all along, until it gave me this:
Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x7f8a6c4f1700 (LWP 651)]
0x0000003ef200e5bb in send () from /lib64/libpthread.so.0
(gdb) Continuing.
I tried to run a backtrace, but there was no response, just mention of more threads exiting:
bac [Thread 0x7f8a6bcf0700 (LWP 652) exited]
ktrace
[Thread 0x7f8a6c4f1700 (LWP 651) exited]
When the client finally lost connection, it output the following:
2014-05-12 15:02:23,806 server is OK again
2014-05-12 15:02:30,667 server is not responding, drawing spinners over the windows
2014-05-12 15:03:29,236 server ping timeout - waited 60 seconds without a response
2014-05-12 15:03:29,641 Connection lost
C:\Program Files (x86)\Xpra\library.zip\xpra\client\gtk_base\gtk_client_window_base.py:103: GtkWarning: gtk_widget_map: assertion `gtk_widget_get_visible (widget)' failed
Meanwhile, server output this:
2014-05-12 15:01:23,465 delayed_region_timeout: something is wrong, is the connection dead?
2014-05-12 15:01:45,062 XShmWrapper.shmget(PRIVATE, 656180 bytes, 1023) failed, bytes_per_line=2180, width=545, height=300
2014-05-12 15:02:06,516 disabling XShm support following irrecoverable error
2014-05-12 15:03:59,211 read connection reset or aborted for SocketConnection(('10.0.32.172', 1201) - ('10.0.11.181', 57536))
I never did manage to get any backtrace. Hope the rest of it is of some use.
I tried to move the glxgears window out from under the winking gtkperf window, and in the process caused everything to crash. (I might try to reproduce that soon.)
Xorg memory had been climbing all along - from about 30% at one hour, to about 60% after two, up to about 90%..
That would explain the spinners and the disconnection. If the memory is full, the system is going to swap and collapse soon after. This particular bug is already tracked in #535, so we can ignore memory related problems for this ticket.
I tried the test again, same server version same client version.
Moving the glxgears window out from under the winking gtkperf window, while a challenge because of focus racing issues, caused no problems. To be sure, I periodically moved the glxgears window around the display, to a second display, back under the gtkperf window, back out from under... etc. - no issues.
I tried disconnecting the client, then reconnecting, then repeated moving the glxgears window around the displays - still no issues.
I let the test run for two hours, periodically moving the glxgears window around and around - still no issues.
Oddly, this time through, after 2 hours, the Xorg process was still only up to around 17% memory (after 2 hours on the test before, it was up to around 60%). Also, the glxgears process never stopped, thereby never requiring that I go back to the xterm to re-launch.
I doubt it's surprising or notable, but the glxgears generally stopped spinning while the gtkperf window was active.
Maybe it just wasn't as hot today and the test VM server just behaved better?
Anything else you'd like me to try for this case?
afarr: were the versions the same as last time? It would be worth trying different encodings. Like you said, maybe the server was behaving better (or worse) than last time and using a different encoding.
Versions were the same, 0.12.5 r6398 windows client, 0.13.0 r6398 fedora 19 server.
Repeated test for 2 hours with --encoding=jpeg, which seemed to be a closer performance to what I saw with the first test - the glxgears had to be restarted a number of times (and by the 1:45 mark, or so, the glxgears FPS dropped to 13.701, then all the way to .002). Moving the glxgears window around didn't cause any particular problems though, and there were no crashes.
Repeated for another half hour with --encoding=rgb (reconnecting to still running server session). Performance was about the same (glxgears required no restarting).
Repeated yet again for another half hour with --encoding=png, again reconnecting to the still running server session. Performance was, again, about the same, though glxgears did require one restart.
No sign of any crashes though.
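The encoding runs above were done by hand from the Windows client. From a POSIX client, a similar sequence could be scripted along these lines. This is a hedged sketch only: print_encoding_runs is a hypothetical helper name, the tcp:HOST:PORT address is a placeholder rather than a value from this ticket, and it assumes the coreutils timeout command is available to end each run. It only prints the commands (a dry run), so nothing is attached until they are run deliberately.

```shell
#!/bin/sh
# print_encoding_runs (hypothetical helper): print one "xpra attach" command
# per encoding, each wrapped in timeout(1) so the client detaches by itself.
print_encoding_runs() {
    uri="$1"        # connection string, e.g. tcp:HOST:PORT (placeholder)
    duration="$2"   # timeout(1) duration, e.g. 30m or 2h
    shift 2
    for enc in "$@"; do
        echo "timeout $duration xpra attach $uri --encoding=$enc"
    done
}

# Usage: review the output first, then pipe it to sh to actually run it:
#   print_encoding_runs tcp:HOST:PORT 30m jpeg rgb png
#   print_encoding_runs tcp:HOST:PORT 30m jpeg rgb png | sh
```

Because each attach reconnects to the still running server session, the applications keep running between encodings, as in the manual runs.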
You were having to restart the glxgears because xpra is so swamped with requests to create and destroy windows from gtkperf that it struggles to cope with any amount of pixels for
If it still hasn't crashed, I think we can assume this is working as well as it should. Closing.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/569