xpra icon
Bug tracker and wiki

Opened 3 years ago

Closed 3 years ago

#1199 closed defect (fixed)

server with limited memory throwing decoding errors when desktop-scaling reduced below 100%

Reported by: alas Owned by: alas
Priority: major Milestone: 1.0
Component: server Version: trunk
Keywords: Cc:

Description

Testing with a server with 512 MB memory, to try to repro #1111, when scaling to below 100% I'm seeing h264 decode errors both server and client-side.

0.18.0 r12578 windows client, 0.18.0 r12577 fedora 23 server.

Server-side, I'm seeing:

2016-05-17 12:32:52,082 avcodec_decode_video2 h264 decoding failure:
2016-05-17 12:32:52,122  Invalid data found when processing input
2016-05-17 12:32:52,168 Warning: client decoding error: video decoder avcodec failed to decode 3137 bytes of h264 data
2016-05-17 12:32:52,174 Error: decode failed on 3137 bytes of h264 data
2016-05-17 12:32:52,175  1898x892 pixels using avcodec
2016-05-17 12:32:52,176  decoding options={'speed': 91, 'type': 'P', 'flush': 0, 'pts': 172, 'frame': 1, 'encoding': 'h264', 'quality': 40, 'csc': 'YUV420P'}
2016-05-17 12:33:04,201 draw error
2016-05-17 12:33:04,246   File "xpra\client\ui_client_base.pyc", line 2890, in _do_draw
2016-05-17 12:33:04,247   File "xpra\client\client_window_base.pyc", line 550, in draw_region
2016-05-17 12:33:04,248   File "xpra\client\window_backing_base.pyc", line 523, in draw_region
2016-05-17 12:33:04,248   File "xpra\client\window_backing_base.pyc", line 405, in paint_with_video_decoder
2016-05-17 12:33:04,251   File "xpra\codecs\dec_avcodec2\decoder.pyx", line 579, in xpra.codecs.dec_avcodec2.decoder.Decoder.decompress_image (xpra/codecs/dec_avcodec2/decoder.c:8992)
2016-05-17 12:33:04,253 Warning: client decoding error: codec width is smaller than our width: 1264<1898
2016-05-17 12:33:04,255 error processing draw packet
2016-05-17 12:33:04,256   File "xpra\client\ui_client_base.pyc", line 2830, in _draw_thread_loop
2016-05-17 12:33:04,257   File "xpra\client\ui_client_base.pyc", line 2890, in _do_draw
2016-05-17 12:33:04,258   File "xpra\client\client_window_base.pyc", line 550, in draw_region
2016-05-17 12:33:04,260   File "xpra\client\window_backing_base.pyc", line 523, in draw_region
2016-05-17 12:33:04,260   File "xpra\client\window_backing_base.pyc", line 405, in paint_with_video_decoder
2016-05-17 12:33:04,260   File "xpra\codecs\dec_avcodec2\decoder.pyx", line 579, in xpra.codecs.dec_avcodec2.decoder.Decoder.decompress_image (xpra/codecs/dec_avcodec2/decoder.c:8992)

Client-side:

Exception AssertionError: 'frame was freed by both, but not actually freed!' in 'xpra.codecs.dec_avcodec2.decoder.AVFrameWrapper.__dealloc__' ignored
2016-05-17 12:32:59,384 draw error
Traceback (most recent call last):
  File "xpra\client\ui_client_base.pyc", line 2890, in _do_draw
  File "xpra\client\client_window_base.pyc", line 550, in draw_region
  File "xpra\client\window_backing_base.pyc", line 523, in draw_region
  File "xpra\client\window_backing_base.pyc", line 405, in paint_with_video_decoder
  File "xpra\codecs\dec_avcodec2\decoder.pyx", line 579, in xpra.codecs.dec_avcodec2.decoder.Decoder.decompress_image (xpra/codecs/dec_avcodec2/decoder.c:8992)
AssertionError: codec width is smaller than our widAttribute eth: 1264<1898
2016-05-17 12:32:59,390 error processing draw packet
Traceback (most recent call last):
  File "xpra\client\ui_client_base.pyc", line 2830, in _draw_thread_loop
  File "xpra\client\ui_client_base.pyc", line 2890, in _do_draw
  File "xpra\client\client_window_base.pyc", line 550, in draw_region
  File "xpra\client\window_backing_base.pyc", line 523, in draw_region
  File "xpra\client\window_backing_base.pyc", line 405, in paint_with_video_decoder
  File "xpra\codecs\dec_avcodec2\decoder.pyx", line 579, in xpra.codecs.dec_avcodec2.decoder.Decoder.decompress_image (xpra/codecs/dec_avcodec2/decoder.c:8992)
AssertionError: codec width is smaller than our width: 1264<1898

... and, just as an added note, this seems to go hand in hand with the traceback in #1198 - except that this (these) tracebacks go away once I adjust the desktop-scaling back to 100% - while the traceback in #1198 continues to be triggered sporadically even afterward.

Trying the same to repro with a fedora 23 vm with a reasonable amount of resources (4GB?) ... I can't repro with either 0.17.0 or 0.18.0.

(Maybe a warning about so few resources is the best plan?... assuming you're even bothering to check.)

Attachments (3)

ticket1199_older-weak-fedora-server-wont-start.txt (92.8 KB) - added by alas 3 years ago.
The original weak vm failure to start logs
ticket1199_new-weak-fedora-server-wont-start.txt (254.3 KB) - added by alas 3 years ago.
The new vm, which also fails to launch a server, logs
ticket1199_d-paint-10.txt (255.3 KB) - added by alas 3 years ago.
Tenth try's the charm…

Download all attachments as: .zip

Change History (11)

comment:1 Changed 3 years ago by alas

As an additional note... after adjusting desktop-scaling back below 100% to repro, then back up to 100% to try to stop errors (and having done so after hours in between trying to repro #1111) ... I began seeing these same tracebacks even with desktop-scaling = 100%, as well as a host of others which continued after I disconnected the client... and even for a while after I SIGINT'd the server:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 853, in emit
    msg = self.format(record)
  File "/usr/lib64/python2.7/site-packages/xpra/colorstreamhandler.py", line 54, in format
    text = logging.StreamHandler.format(self, record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 726, in format
    return fmt.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 465, in format
    record.message = record.getMessage()
  File "/usr/lib64/python2.7/logging/__init__.py", line 329, in getMessage
    msg = msg % self.args
  File "colorspace_converter.pyx", line 254, in xpra.codecs.csc_libyuv.colorspace_converter.ColorspaceConverter.__repr__ (xpra/codecs/csc_libyuv/colorspace_converter.cpp:3702)
AttributeError: 'xpra.codecs.csc_libyuv.colorspace_converter.Colors' object has no attribute 'width'
Logged from file log.py, line 99
2016-05-17 17:49:18,745 server is not responding, drawing spinners over the windows
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 853, in emit
    msg = self.format(record)
  File "/usr/lib64/python2.7/site-packages/xpra/colorstreamhandler.py", line 54, in format
    text = logging.StreamHandler.format(self, record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 726, in format
    return fmt.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 465, in format
    record.message = record.getMessage()
  File "/usr/lib64/python2.7/logging/__init__.py", line 329, in getMessage
    msg = msg % self.args
  File "colorspace_converter.pyx", line 254, in xpra.codecs.csc_libyuv.colorspace_converter.ColorspaceConverter.__repr__ (xpra/codecs/csc_libyuv/colorspace_converter.cpp:3702)
AttributeError: 'xpra.codecs.csc_libyuv.colorspace_converter.Colors' object has no attribute 'width'
Logged from file log.py, line 99
2016-05-17 17:49:18,945 server is OK again
2016-05-17 17:49:19,776 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:19,831      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.csc_libyuv.colorspace_converter.ColorspaceConverter object at 0x7efe7bada050>
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 853, in emit
    msg = self.format(record)
  File "/usr/lib64/python2.7/site-packages/xpra/colorstreamhandler.py", line 54, in format
    text = logging.StreamHandler.format(self, record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 726, in format
    return fmt.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 465, in format
    record.message = record.getMessage()
  File "/usr/lib64/python2.7/logging/__init__.py", line 329, in getMessage
    msg = msg % self.args
  File "colorspace_converter.pyx", line 254, in xpra.codecs.csc_libyuv.colorspace_converter.ColorspaceConverter.__repr__ (xpra/codecs/csc_libyuv/colorspace_converter.cpp:3702)
AttributeError: 'xpra.codecs.csc_libyuv.colorspace_converter.Colors' object has no attribute 'width'
Logged from file log.py, line 99
2016-05-17 17:49:23,763 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:24,213      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.csc_libyuv.colorspace_converter.ColorspaceConverter object at 0x7efe7baea2c0>
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 853, in emit
    msg = self.format(record)
  File "/usr/lib64/python2.7/site-packages/xpra/colorstreamhandler.py", line 54, in format
    text = logging.StreamHandler.format(self, record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 726, in format
    return fmt.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 465, in format
    record.message = record.getMessage()
  File "/usr/lib64/python2.7/logging/__init__.py", line 329, in getMessage
    msg = msg % self.args
  File "colorspace_converter.pyx", line 254, in xpra.codecs.csc_libyuv.colorspace_converter.ColorspaceConverter.__repr__ (xpra/codecs/csc_libyuv/colorspace_converter.cpp:3702)
AttributeError: 'xpra.codecs.csc_libyuv.colorspace_converter.Colors' object has no attribute 'width'
Logged from file log.py, line 99
2016-05-17 17:49:25,234 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:25,284      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.csc_libyuv.colorspace_converter.ColorspaceConverter object at 0x7efea44558d8>

...

2016-05-17 17:49:35,180 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:35,222      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.csc_libyuv.colorspace_converter.ColorspaceConverter object at 0x7efea44553f8>
2016-05-17 17:49:35,235 Warning: already 25 active instances of <type 'xpra.codecs.enc_x264.encoder.Encoder'>: [x264_encoder(YUV420P - 960x540), x264_encoder(YUV420P - 186x272), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 1060x594), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 960x540), x264_encoder(YUV420P - 186x260), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 960x540), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 186x272), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 186x260), x264_encoder(YUV420P - 186x260), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 1060x464), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 186x272), x264_encoder(YUV420P - 186x272), x264_encoder(YUV420P - 1072x594), x264_encoder(YUV420P - 1072x594)]
2016-05-17 17:49:36,479 referrers for x264_encoder(YUV420P - 960x540): 1
2016-05-17 17:49:36,575 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:36,585      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe7bae5050>
2016-05-17 17:49:36,705 referrers for x264_encoder(YUV420P - 186x272): 1
2016-05-17 17:49:36,883 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:36,941      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe7bae5890>
2016-05-17 17:49:37,063 referrers for x264_encoder(YUV420P - 1072x594): 1
2016-05-17 17:49:37,171 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:37,204      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe7bae5100>
2016-05-17 17:49:37,406 referrers for x264_encoder(YUV420P - 1072x594): 1
2016-05-17 17:49:37,512 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:37,727      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe9e641940>
2016-05-17 17:49:38,157 referrers for x264_encoder(YUV420P - 1072x594): 1
2016-05-17 17:49:38,701 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:39,088      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe7bae51b0>
2016-05-17 17:49:39,267 referrers for x264_encoder(YUV420P - 1060x594): 1
2016-05-17 17:49:40,182 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:40,217      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe9e6419f0>
2016-05-17 17:49:40,355 referrers for x264_encoder(YUV420P - 1072x594): 1
2016-05-17 17:49:40,483 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:40,485      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe7bae5260>
2016-05-17 17:49:41,049 referrers for x264_encoder(YUV420P - 960x540): 1
2016-05-17 17:49:41,146 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:41,244      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe9d977310>
2016-05-17 17:49:41,679 referrers for x264_encoder(YUV420P - 186x260): 1
2016-05-17 17:49:42,313 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:42,720      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe9d977470>
2016-05-17 17:49:43,337 referrers for x264_encoder(YUV420P - 1072x594): 1
2016-05-17 17:49:43,966 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:44,119      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe7bae5310>
2016-05-17 17:49:44,812 server is not responding, drawing spinners over the windows
2016-05-17 17:49:44,934 server is OK again
2016-05-17 17:49:45,389 referrers for x264_encoder(YUV420P - 960x540): 1
2016-05-17 17:49:46,525 sound source pipeline warning: Can't record audio fast enough
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/sound/sound_pipeline.py", line 322, in on_message
2016-05-17 17:49:46,504 [0] in <type 'builtin_function_or_method'>
    for x in w[1:].split(":"):
AttributeError: 'tuple' object has no attribute 'split'
2016-05-17 17:49:46,586      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe9e641b50>
2016-05-17 17:49:46,820 sound source pipeline warning: Can't record audio fast enough
2016-05-17 17:49:46,820 referrers for x264_encoder(YUV420P - 1072x594): 1
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/sound/sound_pipeline.py", line 322, in on_message
    for x in w[1:].split(":"):
AttributeError2016-05-17 17:49:46,876 [0] in <type 'builtin_function_or_method'>
: 2016-05-17 17:49:46,882      <type 'builtin_function_or_method'> : <built-in method clean of xpra.codecs.enc_x264.encoder.Encoder object at 0x7efe7bae53c0>
'tuple' object has no attribute 'split'
2016-05-17 17:49:47,010 referrers for x264_encoder(YUV420P - 186x272): 1
2016-05-17 17:49:48,288 [0] in <type 'builtin_function_or_method'>
2016-05-17 17:49:48,453 sound source pipeline warning: Can't record audio fast enough
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/sound/sound_pipeline.py", line 322, in on_message
    for x in w[1:].split(":"):
AttributeError: 'tuple' object has no attribute 'split'

... which obviously began tying in with #1198.

comment:2 Changed 3 years ago by Antoine Martin

Owner: changed from Antoine Martin to alas
Priority: minormajor

Some very interesting tracebacks in here.

r12594 will at least reduce the amount of warnings the decoder problem generates (the frame was freed by both, .. should be gone now), and also validate things earlier so it won't cause a memory leak when we do hit errors. This isn't really critical because it should be rare and relatively small, but it will be backported.

The AttributeError: 'xpra.codecs.csc_libyuv.colorspace_converter.Colors' object has no attribute 'width' error is a critical blooper I made in the libyuv csc module (#973), fixed in r12596 and which will also be backported.
It only triggered now because of other errors which then attempt to log the current state of the window video encoding, including the csc information, and this unfortunate bug ends up hiding the details we actually want to see, but this is actually unrelated and only triggered by the other problems.

Along the way, I also found the probable cause for Exception: delta region bucket 0 references pixmap data we do not have! as reported in #242, ticket:976#comment:33, ticket:1087#comment:1 and #989, it was meant to be finally fixed in r10922, but that was incomplete, we also needed r12597, which I will backport. I hope that's the final fix for that one.

The reason why you see the same message on both client and server is because in 0.17.x onwards have --remote-logging=both enabled by default: the client's log messages will show up on both the client and server. As of r12595 this will be a bit clearer as the client log messages will always be prefixed with client %i: where the client number is a counter which is incremented with each new connection and also shown as counter in xpra info so that logging messages can be reconciled with the client details they belong to.

Now, the much more difficult problem to solve is the decoder's dimensions, which have clearly been adjusted by the scaling factor (increased by 50% from 1264 to 1898, probably from 2/3 to 1 or the other way).
I believe that we are receiving a new frame with the new dimensions, but that somehow the decoder we use still has the old dimensions. This should have been caught by the sanity checks in paint_with_video_decoder. Or maybe the metadata that goes with the packet is wrong. Whatever it is, something is not right.
Can you reproduce with -d paint enabled? This is not a very high priority since this error should trigger an encoder reset, which will refresh things, but we should get to the bottom of this as this may hide a deeper problem.

The Warning: already 25 active instances... message could be a symptom of the same issue.
Whenever we re-initialize encoders (for whatever reason, but in this case probably due to the window size being changed for scaling), we queue the request to cleanup the current encoder after the frames already queued (to prevent deadly races where we would free memory whilst using it). Unless the queue is very busy, I don't see how we could get 25 instances waiting to be cleaned up, unless... you connect to the same session 25+ times? Did you?
r12601 + r12602 fix bugs where we would stop processing the encoding queue when a client disconnects, potentially leaking the encoder whenever a client disconnects whilst there is a busy queue (video playing). Both will be backported. r12603 is an improvement on that: we skip things like frame encoding (which we won't be sending anyway since the source is closed), but not encoder and csc cleanup (to prevent leaks).

#1198 is unlikely to be related, maybe just triggered by the scaling changes, which could cause a delay or bandwidth congestion, which in turn could trigger some sound warnings... and hitting the bug in #1198 (which should now fixed).

comment:3 Changed 3 years ago by alas

How very odd.

I tried to try to repro... but the server won't start today.

Since I'd built a new server to take a stab at #1197, I tried that one too... but it also won't start, today.

I tried uninstall/re-install & rebooting - no luck. Also tried rolling back to 0509 version in repo... also no luck.

Attaching some -d all logs from the attempted server starts, the one marked "old" is from the vm I did the original test on, the one marked new... well, is new.

In case it's important, launching with:

dbus-launch xpra --no-daemon --bind-tcp=0.0.0.0:1203 --start-child=xterm --start-child=xterm --mdns=no start :13 --start-new-commands=yes -d all

... maybe I'll try again tomorrow?

Changed 3 years ago by alas

The original weak vm failure to start logs

Changed 3 years ago by alas

The new vm, which also fails to launch a server, logs

comment:4 Changed 3 years ago by alas

Tested again, but couldn't repro with a 0.18.0 r12608 fedora 23 server with scant resources... and when I tried to restart to take a second stab at it the server wouldn't start at all (again).

Set up yet another new fedora 23 vm installed 0.18.0 r12608, had no luck, rolled back to 0.18.0 r12578 and again could repro with firefox... but once I installed google-chrome-unstable I was finally able to do so (by this point the vm was deep into its swap memory).

(Some of the details might possibly be useful.)

It then took 10 tries to successfully connect to the rickety server successfully and induce the error.... attaching the client logs I finally got though.

(Upon further examination, I'm not sure those logs have a repro of the same issue, or just a similar one... but those seem to be the best I can manage since these feeble servers are so picky about when they're even willing to start up.)

Last edited 3 years ago by alas (previous) (diff)

Changed 3 years ago by alas

Attachment: ticket1199_d-paint-10.txt added

Tenth try's the charm...

comment:5 Changed 3 years ago by Antoine Martin

Note: as of 0.17.0, you don't need to dbus-launch xpra, it will do it automatically.
I am seeing some delta bucket errors in that log, which is expected since the revision is older than r12597, but I am not seeing any codec width is smaller than our width.

comment:6 Changed 3 years ago by alas

Been trying to repro with -d paint with no success. Weird issue with servers not wanting to re-launch (presumably because of limited resources - comment:3) has been slowing down the efforts though.

Tried to roll back to 0.18.0 r12542 server and couldn't repro at all. I even tried to add some delay to try to induce, since I thought I'd seen something similar with delay in one of our set ups, but all that did was break the low resource server.

Tried to upgrade to 0.18.0 r12608 - and it wouldn't launch again. Maybe tomorrow... though I'm increasingly thinking there might've been something else going on to trigger those errors.

comment:7 Changed 3 years ago by Antoine Martin

Milestone: 0.181.0

Milestone renamed

comment:8 Changed 3 years ago by Antoine Martin

Resolution: fixed
Status: newclosed
Summary: 0.18.0 server with limited memory throwing decoding errors when desktop-scaling reduced below 100%server with limited memory throwing decoding errors when desktop-scaling reduced below 100%

As of r13289, we will show a warning if there is less than 512MB of memory:

Warning: only 512MB total system memory available
 this may not be enough to run a server

We could try to detect how much free memory we have and somehow reject very large resolutions when that's too low. Meh. Not worth the effort, a warning will do.

Note: See TracTickets for help on using tickets.