Xpra: Ticket #684: Win32 xpra client 0.15 sometimes crashes at GTK assertion hdc_count==0

Running 3 windows of google-chrome, I experienced a couple of crashes of the Windows client Xpra_Setup_0.15.0-r7577M.exe running against a 0.14.4 Linux server. When the error occurs, the log on Windows (Win7 Pro) shows:

Gdk:ERROR:gdkdrawable-win32.c:2040:_gdk_win32_drawable_finish: assertion failed: (impl->hdc_count == 0)

Then it says "This application has requested the Runtime to terminate it in an unusual way". The server terminates 60 seconds later due to a timeout.



Wed, 17 Sep 2014 17:06:58 GMT - Antoine Martin: owner, description changed

Ouch, is this a new thing that only happens with the beta? Can you easily reproduce it? If so, can you try an older beta build until you find one without the bug? (that will help us narrow it down)


Fri, 19 Sep 2014 09:56:52 GMT - Antoine Martin: owner changed

This could be a serious bug, afarr can you reproduce it?


Fri, 26 Sep 2014 01:01:07 GMT - alas:

Testing with xpra_setup_0.15.0-r7804 win client (windows 7), and your 0.14.4 xpra-0.14.4-1.fc20.x86_64.rpm fedora 20 server I wasn't able to get the crash listed with 3 google-chrome windows... but when I tried to open the session info my server kept disconnecting me with a huge amount of what looks like keyboard mapping debug info.

On the client side I did notice this:

C:\Program Files (x86)\Xpra>xpra_cmd.exe attach tcp:10.0.32.53:1201
2014-09-25 17:52:33,023 rencode import error: No module named rencode
2014-09-25 17:52:34,226 Warning: 'rencode' packet encoder not found
2014-09-25 17:52:34,230  the other packet encoders are much slower
2014-09-25 17:52:34,232 xpra client version 0.15.0
2014-09-25 17:52:38,543 OpenGL_accelerate module loaded
2014-09-25 17:52:38,545 Using accelerated ArrayDatatype
2014-09-25 17:52:38,548 OpenGL support could not be enabled:
2014-09-25 17:52:38,549  some required OpenGL functions are not available: glActiveTexture, glMultiTexCoord2i

... on connection, but the client worked despite that. The disconnection message on the client side was more innocuous:

2014-09-25 17:52:53,686 Connection lost
2014-09-25 17:52:54,714 server is not responding, drawing spinners over the windows

Fri, 26 Sep 2014 02:32:32 GMT - Antoine Martin:

but when I tried to open the session info my server kept disconnecting me with a huge amount of what looks like keyboard mapping debug info


This sounds like a bug caused by a problem encoding info data with the fallback bencoder. This is a serious bug that needs to be fixed. Please reproduce it and create a new ticket for it (linking to #614 which is where this should have been caught), forcing bencode if rencode has been fixed in your builds with: --packet-encoders=bencode. (can verify it is using bencode via session info or xpra info)
The opengl failures would be worth recording in #679: which chipset this is, with the opengl failure this causes: some required OpenGL functions are not available: glActiveTexture, glMultiTexCoord2i.
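
To illustrate the kind of failure suspected with the fallback bencoder above, here is a minimal sketch of a strict bencode encoder. It is not xpra's implementation (the function name `bencode` and its exact behaviour are assumptions made for illustration); it only shows that the classic bencode format covers integers, strings, lists and dictionaries, so any other value type found in the info data would make a strict encoder fail.

```python
def bencode(value):
    """Encode a value using classic bencode rules.

    Hypothetical helper for illustration only, not xpra's own encoder.
    Supported: int, str/bytes, list/tuple, dict with string keys.
    """
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        encoded = {}
        for key, item in value.items():
            key_bytes = key.encode("utf-8") if isinstance(key, str) else key
            if not isinstance(key_bytes, bytes):
                raise TypeError("bencode dictionary keys must be strings")
            encoded[key_bytes] = item
        parts = [b"d"]
        # bencode requires dictionary keys in sorted order
        for key_bytes in sorted(encoded):
            parts.append(bencode(key_bytes))
            parts.append(bencode(encoded[key_bytes]))
        parts.append(b"e")
        return b"".join(parts)
    # anything else (float, None, ...) has no representation in classic bencode
    raise TypeError("cannot bencode %r" % type(value))


print(bencode({"a": [1, "xy"]}))  # b'd1:ali1e2:xyee'
```

With this sketch, a call such as `bencode({"load": 0.5})` raises `TypeError`, which is the sort of error a richer encoder like rencode would not hit.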

To test for this crash, I believe you need to use a machine which has opengl enabled. (at least I think that is where the _gdk_win32_drawable_finish is coming from) And since it looks opengl related, playing with single and double buffering may also make a difference. (using XPRA_OPENGL_DOUBLE_BUFFERED=0 for disabling double buffering, double buffering status is shown on session info)
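
As a rough sketch of how the double buffering test above could be scripted, the helper below builds an environment with `XPRA_OPENGL_DOUBLE_BUFFERED=0` for a client process. The helper name is hypothetical, and only the `=0` value is taken from this ticket.

```python
import os


def env_without_double_buffering(base_env=None):
    """Return a copy of the environment with OpenGL double buffering disabled.

    Hypothetical helper: only the variable name and the value "0" come from
    this ticket, everything else is an assumption for illustration.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["XPRA_OPENGL_DOUBLE_BUFFERED"] = "0"
    return env


env = env_without_double_buffering()
print(env["XPRA_OPENGL_DOUBLE_BUFFERED"])  # 0
```

The resulting dictionary can be passed as the `env` argument of `subprocess.call()` when starting the client (for example `xpra_cmd.exe attach tcp:...`), and the session info should then confirm whether double buffering is off.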


Fri, 26 Sep 2014 16:31:46 GMT - sschnitzer:

Sorry for the delay. I tried to reproduce the bug. Right now I get "internal error: error in network packet reading/parsing" in xpra\net\protocol.pyc (line 585). I used just two "google-chrome" windows with heise.de opened. Then some patience, maybe window moves (I have not really figured out any pattern yet), and it crashes...

The same error occurs with Win32 0.14.4 (then it's line 587). So I guess I cannot find an older version without the bug. OpenGL is disabled. Session info says "n/a".

I need some instructions on how to proceed. I can offer to record a "tcpdump" of all the traffic. In that case I need to know whether I shall record the traffic between client and proxy or between proxy and server. However, I would like to send it by PM since it might contain personal information.


Fri, 26 Sep 2014 16:39:40 GMT - Antoine Martin:

Right now I get "internal error: error in network packet reading/parsing" in xpra\net\protocol.pyc (line 585).


That's odd, and completely different from the bug above.

In that case I need to know whether I shall record the traffic between client and proxy or between proxy and server.


So you're using the proxy... that's a crucial bit of information which was missing until now. I suspect that for one reason or another, your proxy ends up using bencode instead of rencode, and chokes on something. Getting xpra info from it might help. As would running with -d network to see more details about the cause of the parsing loop crash.


Fri, 26 Sep 2014 19:35:01 GMT - sschnitzer:

How can I get xpra info from the proxy process?

$xpra info :100
Warning: running as root
server requires authentication, please provide a password

According to xpra man page there is no possibility to pass username and password. And if I use tcp:user:host:port syntax, I would reach the server xpra, not the proxy. Or did you actually mean getting "xpra info" from the server process?


Fri, 26 Sep 2014 20:24:04 GMT - sschnitzer:

With 0.14.4 client and server I just reproduced the bug. Here is the relevant part of the proxy log:

...
2014-09-26 21:58:11,197 processing packet draw
2014-09-26 21:58:11,197 add_packet_to_queue(draw ...)
2014-09-26 21:58:11,215 processing packet damage-sequence
2014-09-26 21:58:11,215 add_packet_to_queue(damage-sequence ...)
2014-09-26 21:58:11,232 internal error: read connection SocketConnection(('1.2.3.1', 123) - ('1.2.3.4', 49542)) reset: [Errno 104] Connection reset by peer
2014-09-26 21:58:11,233 connection lost: read connection SocketConnection(('1.2.3.1', 123) - ('1.2.3.4', 49542)) reset: [Errno 104] Connection reset by peer
...

The log for the server is almost the same (except host and port numbers). And here is the relevant part of the client log:

2014-09-26 21:58:10,015 do_expose_event(<gtk.gdk.Event at 046018F0: GDK_EXPOSE area=[56, 181, 970, 250]>) area=gtk.gdk.Rectangle(56, 181, 970, 250)
2014-09-26 21:58:10,030 processing packet draw
2014-09-26 21:58:10,030 process_draw 2455083 bytes for window 2 using rgb24 encoding with options={'rgb_format': 'RGB'}
2014-09-26 21:58:10,030 draw_region(0, 0, 1151, 711, rgb24, 2455083 bytes, 3453, {'rgb_format': 'RGB'}, [<function record_decode_time at 0x04703730>, <function after_draw_refresh at 0x047035F0>])
2014-09-26 21:58:10,030 record_decode_time(True) wid=2, rgb24: 1151x711, 0.0ms
2014-09-26 21:58:10,046 after_draw_refresh(True) 1151x711 at 0x0 encoding=rgb24, options={'rgb_format': 'RGB'}
2014-09-26 21:58:10,046 do_expose_event(<gtk.gdk.Event at 046018D8: GDK_EXPOSE area=[0, 0, 1151, 711]>) area=gtk.gdk.Rectangle(0, 0, 1151, 711)
2014-09-26 21:58:10,046 add_packet_to_queue(damage-sequence ...)
2014-09-26 21:58:10,078 internal error: error in network packet reading/parsing
Traceback (most recent call last):
  File "xpra\net\protocol.pyc", line 587, in _read_parse_thread_loop
  File "xpra\net\protocol.pyc", line 616, in do_read_parse_thread_loop
MemoryError
2014-09-26 21:58:10,078 connection lost: error in network packet reading/parsing
2014-09-26 21:58:10,078 close() closed=False
2014-09-26 21:58:10,078 terminate_queue_threads()
2014-09-26 21:58:10,078 Connection lost

Additionally, I have attached the output of "xpra info :DISPLAY" on the server process.
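
A side note on the client log above: the size of the draw packet matches an uncompressed rgb24 frame (3 bytes per pixel) for the 1151x711 window, so each full-window update in this encoding is about 2.4 MB. The short check below only reproduces that arithmetic; the function name is made up for illustration.

```python
def rgb24_frame_bytes(width, height):
    """Size in bytes of an uncompressed RGB frame at 3 bytes per pixel."""
    return width * height * 3


# dimensions taken from the process_draw line in the client log
print(rgb24_frame_bytes(1151, 711))  # 2455083
```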


Fri, 26 Sep 2014 20:25:41 GMT - sschnitzer: attachment set

Output of "xpra info" from 0.14.4 server


Sat, 27 Sep 2014 04:19:57 GMT - Antoine Martin: owner changed

How can I get xpra info from the proxy process?


Run xpra list as the user who owns the proxy instance (not the proxy master server) and you will see an entry you can connect to.

Traceback (most recent call last):
  File "xpra\net\protocol.pyc", line 587, in _read_parse_thread_loop
  File "xpra\net\protocol.pyc", line 616, in do_read_parse_thread_loop
MemoryError

That's odd, is your client under memory pressure? The only relevant link I found is this ticket: socket read() can cause MemoryError in Windows

Does it make any difference if you use --encodings=jpeg (to force jpeg only)?


Sat, 27 Sep 2014 08:00:47 GMT - sschnitzer:

xpra list does not show the proxy instance, only the xpra server. ps lists the forked proxy instance and the xpra server, but xpra list just displays the latter one as LIVE session at :1001.

I tried again, with the latest Windows client 0.15.0-r7639, using default xpra.conf and the commandline parameters attach --debug=all --username=... --socket-dir=C:\temp\xpra --password-file=... --encoding=jpeg -z 0 --border=... --video-encoders=x264 tcp:... (still using 0.14.4 server).

The system has 4 GB of RAM with nothing other than xpra running. I monitored memory consumption with the Windows task manager. Now I again get the "impl->hdc_count == 0" error from the beginning.

When I start the xpra client, it restores the two google-chrome windows which I immediately minimized. In that state, the xpra_cmd.exe process uses about 60 MB of memory. However, as soon as I unminimize both windows again, the memory consumption starts growing linearly. Then at some point, it crashes. From a few runs I guess the crash occurs somewhere between 600 MB and 1.2 GB, after about 1 or 2 minutes of waiting.

I already tried to remove the optional command line options one by one. Even with attach --username=... --password-file=... tcp:... and just one google-chrome window, memory keeps growing. From my experience, memory grows whenever rendering updates are transmitted. To clarify what I mean, a short example: I started Firefox with Adblock Plus showing heise.de. Memory consumption is stable. Whenever I reduce the overlap from other windows, memory consumption jumps up a bit. When I disable Adblock Plus, an animated ad shows up on the right. If I don't overlap it, memory consumption grows linearly and quite fast (a few MB/s).
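
To put a number on this kind of linear growth, a handful of (time, memory) readings from the task manager can be fitted with a least-squares line. The sketch below assumes hypothetical sample values (they are not measurements from this ticket) and made-up helper names; it estimates the growth rate in MB/s and the time left before a given memory level is reached.

```python
def growth_rate(samples):
    """Least-squares slope of (seconds, megabytes) samples, in MB/s."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def seconds_until(limit_mb, current_mb, rate_mb_per_s):
    """Time left before memory use reaches limit_mb at a constant rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("memory is not growing")
    return (limit_mb - current_mb) / rate_mb_per_s


# hypothetical readings: (seconds since start, MB used by xpra_cmd.exe)
samples = [(0, 60), (10, 160), (20, 260), (30, 360)]
rate = growth_rate(samples)
print(rate)                            # 10.0
print(seconds_until(1200, 360, rate))  # 84.0
```

Comparing the fitted rate across encodings would show whether the leak depends on the encoding in use or only on the volume of screen updates.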

Without looking into the code I guess there is a severe memory leak in the xpra win32 client. At least according to task manager there was always more than 2 GB of free memory available. I don't know why it crashes far before the 4 GB boundary, maybe stack corruption?


Sat, 27 Sep 2014 11:23:29 GMT - Antoine Martin:

xpra list does not show the proxy instance


It does, but as I mentioned before, you need to run xpra list as the same user as the proxy instance. Maybe root in your case? I have added more information here: wiki/ProxyServer.

Now I again get the "impl->hdc_count == 0" error from the beginning.


"Now"? As in, did it stop happening?

memory keeps growing


Sounds like there is a memory leak (which we will fix), but I doubt this has anything to do with the hdc_count crash. I've moved this one to #696.

crashes far before the 4 GB boundary


Xpra is a 32-bit process, and not all 4GB are addressable, so it would be expected to crash before reaching 4GB with this memory leak.
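
As a rough illustration of the point above: a 32-bit Windows process gets 2 GiB of user-mode address space by default, or 4 GiB on 64-bit Windows if the executable is marked large-address-aware (whether the xpra client binary is built that way is not stated in this ticket). Fragmentation can make a large allocation fail well below that limit. The helper names below are hypothetical.

```python
import struct


def pointer_bits():
    """Pointer width of the running Python interpreter (32 or 64)."""
    return struct.calcsize("P") * 8


def win32_user_space_gib(large_address_aware=False):
    """User-mode address space of a 32-bit process on 64-bit Windows, in GiB.

    Assumption: default Windows settings, no custom boot options.
    """
    return 4 if large_address_aware else 2


print(pointer_bits() in (32, 64))  # True
print(win32_user_space_gib())      # 2
```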


Mon, 13 Oct 2014 10:12:40 GMT - Antoine Martin:

Can I close this ticket and #696?


Mon, 13 Oct 2014 13:14:39 GMT - sschnitzer:

It seems that only I can reproduce this issue. Maybe the best way to proceed is for me to try 0.15 trunk for both client and server and see if I can still reproduce it then.

I don't know why I got "internal error: error in network packet reading/parsing"; actually I can only reproduce the "impl->hdc_count == 0" error.

Concerning #696 I have no idea about which "more details" could help.

I plan to check with trunk next week. I suggest leaving the ticket open until then.


Sat, 25 Oct 2014 18:03:57 GMT - sschnitzer:

I reproduced the crash with the following setup:

Server:

Centos 7, up to date
Additional repo: www.xpra.org/dists/CentOS/7
xpra version: trunk (SVN, a few hours ago)
./setup.py build+install with no further options, only "export PKG_CONFIG_PATH=/usr/lib64/xpra/pkgconfig:/usr/share/pkgconfig" is needed
default xpra.conf

Started server processes:

xpra proxy :100 --socket-dir=/tmp --bind-tcp=1.2.3.4:555 --auth=file --password-file=/etc/xpra/xpra.auth --no-daemon
xpra start --debug=all :1234 --bind-tcp=127.0.0.1:31234 --no-daemon

Client:

Win7 x64, 4GB RAM
xpra version 0.15.0-r7928
client command line parameters: attach --username=xyz --password-file=PATH\TO\FILE tcp:1.2.3.4:5555
default xpra.conf

Scenario:

Start google-chrome
1. Open any page with lots of movements, I chose youtube.de this time
2. Watch memory consumption grow and crash approach.
3. After the crash, just reconnect and continue with step 2.

The reason I previously was not able to provide the content of xpra info was the option --socket-dir=/tmp. I needed to specify the socket-dir for list and info as well. Attached you will find the output of the proxy instance at the point in time when the client is just crashing. I tried multiple times: the memory consumption always seems to go up to about 1.8 GB and then it crashes. If I use encoding=rgb for the client, I get a different error. For details I will attach screenshots of both scenarios.

I am afraid the error is again different. However, with encoding=rgb it clearly says "Memory Error". So I still guess the error messages are just many different ways of indicating something like stack overflow.

Let me know if I can provide any further information.


Sat, 25 Oct 2014 18:04:49 GMT - sschnitzer: attachment set


Sat, 25 Oct 2014 18:05:00 GMT - sschnitzer: attachment set


Sat, 25 Oct 2014 18:05:12 GMT - sschnitzer: attachment set


Sat, 25 Oct 2014 18:55:47 GMT - Antoine Martin:

All this latest info seems to be related to the memory leak (which should have gone into #696) and not to the GTK assertion hdc_count==0 of this ticket, is that right? Or are you still also getting the hdc error?

Can you try with 0.14.10 just released today? (Client side, server probably does not matter much)

Does the memory leak progress at the same speed with all encodings?


Tue, 09 Dec 2014 03:37:06 GMT - Antoine Martin:

Can I close this? And follow up the memory leak in #696?


Tue, 09 Dec 2014 08:21:55 GMT - sschnitzer:

Yes, that's fine.


Tue, 09 Dec 2014 17:02:19 GMT - Antoine Martin: status changed; resolution set


Sat, 23 Jan 2021 05:02:55 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/684