
Opened 2 months ago

Closed 5 weeks ago

#2142 closed defect (invalid)

AvCodec Error -22

Reported by: Devyn Collier Johnson Owned by: Devyn Collier Johnson
Priority: major Milestone: 2.5
Component: server Version: 2.4.x
Keywords: avcodec, h264 Cc:

Description (last modified by Antoine Martin)

I have built Xpra from the Subversion repository source (tag v2.4.x) since the package in the default Ubuntu repository and the one in the WinSwitch repository ( https://winswitch.org/downloads/debian-repository.html?dist_select=cosmic ) both appear to produce a "tjcompress2 error -2" during the initial start-up of the server.

In the attached file, I include the error message, the build commands I used, and the output of xpra showconfig.

When running an Xpra server using the command below, an avcodec error -22 message is given when I try to launch the Xpra GUI client. Prior to launching the client, the HTML5 interface would never load (I just get "Unable to connect" in Firefox). I need to use the HTML5 interface as described here ( https://www.xpra.org/trac/wiki/Clients/HTML5 ). I have installed all of the dependencies available in Ubuntu for both building and running Xpra.

xpra start :37 --start=gnome-mines --html=on --systemd-run=no --start-via-proxy=no --tcp-auth=file:filename=/home/collier/xpra_pswd.txt --bind-tcp=0.0.0.0:13700 --encoding=x264 --csc-modules=swscale --compress=1 --uid=1000 --mmap-group=auto

I have also used the below environment variables as suggested in other bug reports relating to the h264 codec.

export XPRA_B_FRAMES=0
export XPRA_X264_THREADS=4
export XPRA_X264_SLICED_THREADS=0

Attachments (18)

Xpra_Error_Log.txt (19.1 KB) - added by Devyn Collier Johnson 2 months ago.
Xpra Error Log and Info
Xpra_Error_Log.2.txt (32.4 KB) - added by Devyn Collier Johnson 2 months ago.
Xpra Error Log 2
xpra.log (354.4 KB) - added by Devyn Collier Johnson 2 months ago.
Xpra X11 Log
Xpra_Log_Fresh_Install.txt (13.4 KB) - added by Devyn Collier Johnson 2 months ago.
Fresh Installation Log
Xpra.png (274.4 KB) - added by Devyn Collier Johnson 8 weeks ago.
Segmentation Error
xpra_full.log (153.6 KB) - added by Devyn Collier Johnson 8 weeks ago.
Full Log -d all
libvpx.png (100.7 KB) - added by Devyn Collier Johnson 8 weeks ago.
libvpx version
Xpra_GDB.txt (4.2 KB) - added by Devyn Collier Johnson 8 weeks ago.
GDB Output
Xpra_GDB_BT.txt (24.6 KB) - added by Devyn Collier Johnson 8 weeks ago.
GDB with bt
Xpra_Bug_Notes.txt (26.0 KB) - added by Devyn Collier Johnson 8 weeks ago.
Xpra BT Winswitch PPA
Xpra_Beta_Log.txt (10.9 KB) - added by Devyn Collier Johnson 7 weeks ago.
Xpra v2.5 Beta Logs
Xpra_Beta_Backtrace.txt (28.9 KB) - added by Devyn Collier Johnson 7 weeks ago.
Xpra v2.5 Backtrace
System_Info.txt (17.8 KB) - added by Devyn Collier Johnson 7 weeks ago.
System Info
hybi-log-addresses.patch (1.0 KB) - added by Antoine Martin 6 weeks ago.
print addresses used with 32-bit accesses
Dockerfile (3.5 KB) - added by Devyn Collier Johnson 5 weeks ago.
Dockerfile
Docker_Log.txt (4.3 KB) - added by Devyn Collier Johnson 5 weeks ago.
Log from AWS
init_xpra.sh (2.0 KB) - added by Devyn Collier Johnson 5 weeks ago.
Entrypoint
gpg.asc (9.1 KB) - added by Devyn Collier Johnson 5 weeks ago.
gpg.asc


Change History (54)

Changed 2 months ago by Devyn Collier Johnson

Attachment: Xpra_Error_Log.txt added

Xpra Error Log and Info

comment:1 Changed 2 months ago by Antoine Martin

Description: modified (diff)
Owner: changed from Antoine Martin to Devyn Collier Johnson

FYI:

  • --html=on - the html5 client should be enabled by default if everything is installed correctly
  • systemd-run=no - should default to no on versions of Ubuntu that are known to be broken
  • start-via-proxy=no - already defaults to no
  • --encoding=x264 - don't do that
  • csc-modules=swscale - since you're not building other csc modules, this doesn't do anything
  • compress=1 - should already be the default, strangely enough your xpra showconfig shows a different value..
  • uid=1000 - unless you are running as root, this doesn't do anything
  • mmap-group=auto - should already be the default

So AFAICT, your command line should just be:

xpra start :37 --start=gnome-mines \
   --tcp-auth=file:filename=/home/collier/xpra_pswd.txt --bind-tcp=0.0.0.0:13700

There are separate issues in this ticket.
(from the log, you seem to be running Ubuntu 18.10)

tjcompress2 error -2 during the initial start-up of the server.

Please post the full error message.
This is from the turbo jpeg encoder. This is not normal, and it does not occur on a standard installation of Ubuntu 18.10.
Maybe start again with a clean installation?

avcodec error -22 message is given when I try to launch the Xpra GUI client

What command? Just xpra?
Please post the output of ./xpra/codecs/loader.py -v.
Please post: dpkg --list | egrep -i "ffmpeg|xpra|dummy".

Prior to launching the client, the HTML5 interface would never load

I'm not sure I understand what that means: does launching the client somehow fix the HTML5 client?
Make sure your firewall isn't blocking that port. Make sure websockify is installed.
If that doesn't help, run your server with -d websockify,http and post the log file.
Or you may want to try the beta channel, which has big improvements to the websocket layer, which also makes it easier to deploy (#2121).

I have also used the below environment variables as suggested in other bug reports relating to the h264 codec.
(..)

You should not be fiddling with those settings, if the html5 client is not connecting, they won't make any difference at all.

comment:2 Changed 2 months ago by Devyn Collier Johnson

I attached two log files. One contains the output of the requested commands appended to the bottom of the initial file that I uploaded.

You are correct, I am running Ubuntu 18.10 (Cosmic).

The purpose of running the command "xpra" is to try to connect to the running session via the GUI since the web-interface did not appear to be working. This was a way of checking if it is just the web-interface that was down or all of the Xpra server. As for the results, the GUI client would report "No sessions found".

The error that appears when installing from the repositories is seen below.

Error: failed to compress jpeg image, code -2:
tjCompress2(): Invalid argument
 width=32, stride=128, height=32
 format=BGRA, quality=0
Error: failed to compress jpeg image, code -2:
tjCompress2(): Invalid argument
 width=32, stride=128, height=32
 format=BGRA, quality=50
Error: failed to compress jpeg image, code -2:
tjCompress2(): Invalid argument
 width=32, stride=128, height=32
 format=BGRA, quality=100

I also got the version information concerning libTurbo.

collier@Nacho-Computer:~$ dpkg --list | egrep -i 'turbo'
ii  libjpeg-turbo-progs                        2.0.0-0ubuntu2                                amd64        Programs for manipulating JPEG files
ii  libjpeg-turbo8:amd64                       2.0.0-0ubuntu2                                amd64        IJG JPEG compliant runtime library.
ii  libjpeg-turbo8-dev:amd64                   2.0.0-0ubuntu2                                amd64        Development files for the IJG JPEG library
ii  libturbojpeg:amd64                         2.0.0-0ubuntu2                                amd64        IJG JPEG compliant runtime library.
ii  libturbojpeg0-dev:amd64                    2.0.0-0ubuntu2                                amd64        Development files for the TurboJPEG library

As for your comment regarding the environment variables: true, you have a good point. The only reason I tried them was to cover as many possibilities as I could, since with many bug reports I have filed in the past (with other projects), the developers would often have me try solutions to slightly off-topic issues. Since I have never filed a report with Xpra, I wanted to get any off-topic solutions out of the way (yes, it sounds illogical, but I have often dealt with illogical people).

After looking at the output of running Xpra with -d websockify,http, I noticed the line 2019-02-10 18:39:02,042 init_html_proxy(..) options: tcp_proxy=, html='yes' with the empty tcp_proxy parameter. Does the Xpra HTML5 interface require a proxy?

As for trying the beta channel, how stable is the beta version of Xpra? I am needing something stable and reliable (although, an unstable version of Xpra would be better than a non-working version of Xpra, right? 😉)

Changed 2 months ago by Devyn Collier Johnson

Attachment: Xpra_Error_Log.2.txt added

Xpra Error Log 2

Changed 2 months ago by Devyn Collier Johnson

Attachment: xpra.log added

Xpra X11 Log

comment:3 Changed 2 months ago by Antoine Martin

TILs:

xpra start :37 -d websockify,http --start=gnome-mines --tcp-auth=file:filename=/home/collier/xpra_pswd.txt --bind-tcp=0.0.0.0:13700
(..)	serving html content from: /usr/local/share/xpra/www

All you need to do at this point is to run: xdg-open http://localhost:13700/connect.html.
Keep an eye on the server log, just in case.
If the browser does not connect, you have a firewall problem.
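Before suspecting the browser, it can help to confirm that the TCP port is reachable at all. The following is a minimal Python 3 sketch, not part of xpra; the function name tcp_port_open is hypothetical. A successful connection only shows that something accepts connections on the port, not that it is the xpra server.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable, or name resolution failed
        return False


if __name__ == "__main__":
    # 13700 is the port used with --bind-tcp in this ticket
    print(tcp_port_open("localhost", 13700))
```

Running it once on the server itself and once from the client machine helps separate "the server is not listening" from "something in between is blocking the port".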

As for the codec issues, this does not happen with a clean installation of xpra on a standard Ubuntu system, so either re-install both or at least remove the xpra package before installing from source.

As for trying the beta channel, how stable is the beta version of Xpra?

It should be stable, but being a beta channel, things do break occasionally.

Last edited 2 months ago by Antoine Martin

comment:4 Changed 2 months ago by Devyn Collier Johnson

I tried running Xpra on a fresh installation of the codecs and Xpra itself (all from the default Ubuntu repos). I also explicitly allowed port 13700 for both TCP and UDP. The HTML5 interface still fails to work. If I run xpra list to see the list of running sessions, the session is listed as "UNKNOWN" and is cleaned up.

I attached the logs.

Changed 2 months ago by Devyn Collier Johnson

Attachment: Xpra_Log_Fresh_Install.txt added

Fresh Installation Log

comment:5 Changed 2 months ago by Antoine Martin

Your session must be taking forever to launch - could be caused by #2091. Is this an underpowered CPU?
You ran "xpra list" before the server had finished starting up, so "xpra list" ended up cleaning up the sockets.
Try to run the server with "--no-daemon" and wait until the server output prints "xpra is ready".

Last edited 2 months ago by Antoine Martin

comment:6 Changed 2 months ago by Antoine Martin

Also note that there are no codec errors in the logs.
You can check that manually by running ./xpra/codecs/loader.py -v.

comment:7 Changed 2 months ago by Devyn Collier Johnson

The CPU is not under-powered. My system has an 8700K Intel processor (Coffeelake) with six physical cores + six virtual cores and an Nvidia 1080 GPU. My system has 32GB of RAM.

By using the --no-daemon parameter, I was able to see that Xpra is having a segmentation fault on a clean installation from the default Ubuntu repositories.

collier@Nacho-Computer:~$ xpra start :35 --no-daemon --start=gnome-mines --tcp-auth=file:filename=/home/collier/xpra_pswd.txt --bind-tcp=0.0.0.0:13700 --html=on
2019-02-16 09:00:55,915 cannot use uinput for virtual devices:
2019-02-16 09:00:55,915  [Errno 13] Failed to open the uinput device: Permission denied
[mi] Extension "Composite" is not recognized
[mi] Only the following extensions can be run-time enabled:
[mi]    Generic Event Extension
[mi]    MIT-SHM
[mi]    XTEST
[mi]    SECURITY
[mi]    XINERAMA
[mi]    XFIXES
[mi]    RENDER
[mi]    RANDR
[mi]    COMPOSITE
[mi]    DAMAGE
[mi]    MIT-SCREEN-SAVER
[mi]    DOUBLE-BUFFER
[mi]    RECORD
[mi]    DPMS
[mi]    X-Resource
[mi]    XVideo
[mi]    XVideo-MotionCompensation
[mi]    SELinux
[mi]    GLX
2019-02-16 09:00:56,022 created unix domain socket: /run/user/1000/xpra/Nacho-Computer-35
2019-02-16 09:00:56,022 created unix domain socket: /run/xpra/Nacho-Computer-35
2019-02-16 09:00:56,120 pointer device emulation using XTest
2019-02-16 09:00:56,993  OpenGL is supported on this display
WARNING: no 'numpy' module, HyBi protocol will be slower
2019-02-16 09:00:57,030 serving html content from: /usr/local/share/xpra/www
2019-02-16 09:00:57,111 D-Bus notification forwarding is available
2019-02-16 09:00:57,246 found 1 virtual video device for webcam forwarding
2019-02-16 09:00:57,256 pulseaudio server started with pid 10644
2019-02-16 09:00:57,256  private server socket path:
2019-02-16 09:00:57,256  '/run/user/1000/xpra/pulse-35/pulse/native'
2019-02-16 09:00:58,147 GStreamer version 1.14.4 for Python 2.7.15 64-bit
Segmentation fault

comment:8 Changed 2 months ago by Antoine Martin

By using the --no-daemon parameter, I was able to see that Xpra is having a segmentation fault on a clean installation from the default Ubuntu repositories.

I find that a little bit hard to believe seeing that I did a clean install test as part of testing for comment:3.
To diagnose those types of crashes:

  • run the server with -d all
  • run xpra in gdb to get a backtrace

comment:9 Changed 8 weeks ago by Devyn Collier Johnson

I ran the suggested commands. I attached a screenshot and Xpra's output.

How do you recommend that I run Xpra with GDB? I ran it, but the xpra command is a Python script.

Changed 8 weeks ago by Devyn Collier Johnson

Attachment: Xpra.png added

Segmentation Error

Changed 8 weeks ago by Devyn Collier Johnson

Attachment: xpra_full.log added

Full Log -d all

comment:10 Changed 8 weeks ago by Antoine Martin

Looks like the crash is in the vpx decoder, you can try running xpra with:

xpra start --video-encoders=x264 ...

This should avoid the crash.
What is your libvpx version?

$ gdb --args xpra start :34 --no-daemon ...

You were pretty close, try:
gdb --args /usr/bin/python2 /usr/bin/xpra start ...

(as per wiki/Debugging)

Last edited 8 weeks ago by Antoine Martin

comment:11 Changed 8 weeks ago by Devyn Collier Johnson

I have libvpx version 1.7.0

I ran GDB as suggested while using the --video-encoders=x264 with xpra. However, it appears that Xpra is still calling VPX.

I attached a screenshot and the output of GDB.

Changed 8 weeks ago by Devyn Collier Johnson

Attachment: libvpx.png added

libvpx version

Changed 8 weeks ago by Devyn Collier Johnson

Attachment: Xpra_GDB.txt added

GDB Output

comment:12 Changed 8 weeks ago by Antoine Martin

Please grab a backtrace from gdb by typing bt at the gdb crash prompt.

Changed 8 weeks ago by Devyn Collier Johnson

Attachment: Xpra_GDB_BT.txt added

GDB with bt

comment:13 Changed 8 weeks ago by Devyn Collier Johnson

Okay, I attached the backtrace

comment:14 Changed 8 weeks ago by Antoine Martin

Ah, ubuntu's gdb doesn't give you very useful backtraces, does it have a py-bt command?

That said, some things immediately stand out from this stacktrace:

$ gdb --args /usr/bin/python2 /usr/local/bin/xpra start :29 ...

/usr/local/bin/xpra is not a standard location for one of our packages.
You must have installed this yourself by building from source? Why is that? Did you remove the package before mixing with your source installation?

The fact that you are seeing crashes and errors with multiple codecs, and that those errors look like library version issues, and that I am not seeing that on a fresh install, this all makes me think that you are either using the wrong package (wrong repository configured?) or building it wrong yourself or you have the wrong libraries installed.
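One quick way to spot a mixed installation is to list every copy of the xpra script found on the search path. This is a minimal Python 3 sketch, not an xpra tool; the function name find_executables is hypothetical.

```python
import os


def find_executables(name, path=None):
    """Return every executable called `name` on the search path, in PATH order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in matches:
                matches.append(candidate)
    return matches


if __name__ == "__main__":
    for location in find_executables("xpra"):
        print(location)
```

More than one result (for example one under /usr/local/bin and one under /usr/bin) would suggest that a source installation and a package installation coexist. The first entry is the one the shell would run.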

However, it appears that Xpra is still calling VPX.

Yes, the codec still gets loaded, it isn't fully initialized but that's still enough to trigger the crash.

comment:15 Changed 8 weeks ago by Devyn Collier Johnson

I had installed Xpra from the default repos. Initially (the first post), I had installed from source due to wanting to compile the code specifically to my needs (e.g. compile without webcam support, etc.) and specifically for the processor (e.g. CFLAGS= -march=skylake -mavx -O3 etc.) to achieve greater performance.

For this next run and backtrace, I installed Xpra from the Winswitch repo/ppa and I used PIP to upgrade most of my Python2 packages. I also saw that you provided an update for Xdummy (I got that update). Obviously, I did all this after I uninstalled the existing XPra and after searching the whole file hierarchy for any remnants of Xpra. I also rebooted after applying these changes. However, I am still getting a segfault. I am also getting the "tjCompress2" compress errors.

Changed 8 weeks ago by Devyn Collier Johnson

Attachment: Xpra_Bug_Notes.txt added

Xpra BT Winswitch PPA

comment:16 Changed 8 weeks ago by Devyn Collier Johnson

No, Ubuntu does not appear to have py-bt.

comment:17 Changed 8 weeks ago by Antoine Martin

I had installed Xpra from the default repos.
Initially (the first post), I had installed from source due to wanting to compile the code specifically to my needs (e.g. compile without webcam support, etc.)

That's mostly superfluous: features that are disabled aren't loaded into memory (see #1861 and #1838 for details), just disable them instead.

and specifically for the processor (e.g. CFLAGS= -march=skylake -mavx -O3 etc.) to achieve greater performance.

AFAIK, the benefits of this are very limited.
The only part of the process that can really benefit from CPU optimizations is the picture encoding stage. For that, you should rebuild x264 and / or libvpx, not xpra itself. And even then, those libraries include various hand crafted CPU optimizations already (turbojpeg does too), so gcc probably won't be able to improve on that.

For this next run and backtrace, I installed Xpra from the Winswitch repo/ppa and I used PIP to upgrade most of my Python2 packages.

That's usually a bad idea. Don't mix distribution packages with pip installed packages.

However, I am still getting a segfault. I am also getting the "tjCompress2" compress errors.

You could always nuke the problematic codecs from the filesystem, ie: rm -fr /the/path/to/xpra/codecs/vpx
But this won't resolve the broken state of your system, which is likely to cause you more problems down the line.

comment:18 Changed 7 weeks ago by Devyn Collier Johnson

Even after running rm -fr /the/path/to/xpra/codecs/vpx on a fresh installation of Ubuntu, it still does not work. Could it have anything to do with using the proprietary Nvidia driver which would be (to the best of my knowledge) the only difference between my fresh installation and yours?

Is there a way to prevent all codecs from loading except for the one that I need to use? Also, from a fresh installation of everything, why would I be getting the "tjCompress2" compress errors?

comment:19 Changed 7 weeks ago by Antoine Martin

Even after running rm -fr /the/path/to/xpra/codecs/vpx on a fresh installation of Ubuntu, it still does not work.

How so?

Could it have anything to do with using the proprietary Nvidia driver which would be (to the best of my knowledge) the only difference between my fresh installation and yours?

The nvidia driver would allow for the nvenc codec to be used. (awesome performance - highly recommended)
You can try nuking that one too: rm -fr /the/path/to/xpra/codecs/nvenc.

Is there a way to prevent all codecs from loading except for the one that I need to use?

Yes, use the 2.5 beta builds.

Also, from a fresh installation of everything, why would I be getting the "tjCompress2" compress errors?

My guess is that there is something wrong with your installation.
The fact that you're getting so many different codec errors tells me that none of the codec shared libraries match what xpra is expecting: you're either using the wrong repository / package or you're mixing with a source installation.

comment:20 Changed 7 weeks ago by Devyn Collier Johnson

Today, I started from a fresh install and I used the Beta (v2.5) WinSwitch repository for Cosmic. I installed the package for Python3. This time, I actually managed to get to the screen that shows the progress of the loading web-sockets. However, it stopped very close to finishing. I attached the GDB log.

Changed 7 weeks ago by Devyn Collier Johnson

Attachment: Xpra_Beta_Log.txt added

Xpra v2.5 Beta Logs

comment:21 Changed 7 weeks ago by Devyn Collier Johnson

In the beta version, it crashed due to the XOR codec. I tried deleting that directory (/usr/lib/python3/dist-packages/xpra/codecs/xor/), but then Xpra would just say that the codec was not found and then stall.

comment:22 Changed 7 weeks ago by Antoine Martin

Xpra v2.5 Beta Logs

You're not including the backtrace from gdb, you need to run py-bt or bt from the gdb prompt.
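As a complement to gdb, and only for the Python 3 builds, the standard library faulthandler module (available since Python 3.3) can dump the Python-level traceback of all threads when the process receives SIGSEGV. This is a minimal sketch of that alternative technique, not something xpra is shown to do in this ticket; the function name enable_crash_traces and the log path are hypothetical.

```python
import faulthandler


def enable_crash_traces(log_path):
    """Write Python tracebacks of all threads to log_path on fatal signals.

    The returned file object must stay open for as long as the handler
    is enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    handle = enable_crash_traces("/tmp/python-crash-trace.log")
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same handler can be switched on without code changes by starting the interpreter with python3 -X faulthandler or by setting PYTHONFAULTHANDLER=1, in which case the traceback goes to stderr. It shows Python frames only, so a gdb bt is still needed for the native frames.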

I installed the package for Python3.

Just in case, also try the python2 version.

Can you connect with the regular client instead of the html5 client?
Is the x264 codec enabled then?

I tried deleting that directory (/usr/lib/python3/dist-packages/xpra/codecs/xor/), but then Xpra would just say that the codec was not found and then stall.

This module is generally required. It is also used by the new websockets code: #2121.

The only crash we've ever had in xor was related to unaligned 64-bit access on older CPUs: #1749. Which is why we now use 32-bit access everywhere and added extra code to align addresses in the target buffer.

comment:23 Changed 7 weeks ago by Devyn Collier Johnson

Whoops, my bad. Here are the back-traces.

I will try using the Python2 version sometime next week as well as using the regular client.

Changed 7 weeks ago by Devyn Collier Johnson

Attachment: Xpra_Beta_Backtrace.txt added

Xpra v2.5 Backtrace

comment:24 Changed 7 weeks ago by Antoine Martin

TILs:

Thread 91 "python3.6" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb3fff700 (LWP 16413)]
0x00007ffff1323fbf in ?? ()
   from /usr/lib/python3/dist-packages/xpra/codecs/xor/cyxor.cpython-36m-x86_64-linux-gnu.so
(gdb) bt
#0  0x00007ffff1323fbf in  ()
    at /usr/lib/python3/dist-packages/xpra/codecs/xor/cyxor.cpython-36m-x86_64-linux-gnu.so
#1  0x00007ffff1324e92 in  ()
    at /usr/lib/python3/dist-packages/xpra/codecs/xor/cyxor.cpython-36m-x86_64-linux-gnu.so
#2  0x000000000050c4f5 in _PyCFunction_FastCallDict
    (kwargs=<optimized out>, nargs=<optimized out>, args=<optimized out>, func_obj=<built-in function hybi_unmask>) at ../Objects/methodobject.c:231
#3  0x000000000050c4f5 in _PyCFunction_FastCallKeywords
    (kwnames=<optimized out>, nargs=<optimized out>, stack=<optimized out>, func=<optimized out>)
    at ../Objects/methodobject.c:294
#4  0x000000000050c4f5 in call_function
    (pp_stack=0x7fffb3ffdee0, oparg=<optimized out>, kwnames=<optimized out>) at ../Python/ceval.c:4837
#5  0x000000000050dd99 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>)
    at ../Python/ceval.c:3335
#6  0x000000000050b638 in PyEval_EvalFrameEx
    (throwflag=0, f=Frame 0x7fffbc002ec8, for file /usr/lib/python3/dist-packages/xpra/net/websockets/header.py, line 65, in decode_hybi (buf=b'\x82\xfe\x17...

So it is the new hybi_unmask function called from decode_hybi.
The changelog for this code is here: log/xpra/trunk/src/xpra/codecs/xor/cyxor.pyx.
In particular: #1926 (superseded by #2121) and r21393 for 32-bit accesses.
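For context, WebSocket (RFC 6455, section 5.3) clients mask each frame payload by XOR-ing byte i with byte i mod 4 of a 4-byte masking key, and the receiver applies the same XOR to recover the data. Judging by its name, hybi_unmask performs that step. The following is a plain Python 3 reference sketch of the operation, not xpra's Cython code; the function name unmask_payload is hypothetical.

```python
def unmask_payload(mask, payload):
    """XOR every payload byte with the 4-byte mask, cycling through the mask."""
    if len(mask) != 4:
        raise ValueError("mask must be exactly 4 bytes")
    # indexing bytes in Python 3 yields integers, so ^ works directly
    return bytes(b ^ mask[i % 4] for i, b in enumerate(payload))


if __name__ == "__main__":
    # masked "Hello" example from RFC 6455, section 5.7
    mask = bytes.fromhex("37fa213d")
    masked = bytes.fromhex("7f9f4d5158")
    print(unmask_payload(mask, masked))
```

An optimized version would process several bytes per step, which is where word size and buffer alignment become relevant. A byte-at-a-time version like this one can serve as a reference for comparing outputs.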

Please post the /proc/cpuinfo of the machine that is having this problem.

comment:25 Changed 7 weeks ago by Devyn Collier Johnson

I attached the requested information and I provided additional information in the file as well.

Also, I have Xpra v2.5-r21899.

Changed 7 weeks ago by Devyn Collier Johnson

Attachment: System_Info.txt added

System Info

comment:26 Changed 7 weeks ago by Antoine Martin

Right, at this point I am pretty sure that the problem is with your system.
The new cyxor code has torture tests that I've run on similar but older CPUs, and on Ubuntu virtual machines. This is just another symptom of the more general problem you have.

Unless you can provide me with steps to reproduce the problem reliably (ie: dockerfile, virtual machine image, etc) then I will have to close this ticket as invalid.

comment:27 Changed 6 weeks ago by Devyn Collier Johnson

What do you mean by "the problem is with your system" and "another symptom of the more general problem you have"? I have tried Xpra on a fresh install of both Ubuntu and Xubuntu. The hardware works perfectly fine for all other uses. I have not had any issues with any other software for the whole year that I have had this hardware. What is this general problem?

I installed the Python2 version of Xpra v2.5 (Beta) from the Winswitch PPA. Again, a fresh Linux install. The Python2 version works with the non-HTML5 client, but the HTML5 interface does not work in the web browser (I have tried both Firefox and Chrome). With the Python2 version of Xpra v2.5 (Beta), the messages "invalid packet format, character 0xa0, not an xpra client?" and "server does not support h264 encoding and has switched to auto" loop repeatedly in the terminal where I ran gdb --args /usr/bin/python2.7 /usr/bin/xpra start :34 --no-daemon --video-encoders=x264 --start=xmahjongg --auth=allow --bind-tcp=0.0.0.0:17300.

To reproduce the issue, install Ubuntu or Xubuntu on a system with an i7-8700K Intel processor, a 1080 Nvidia graphics card, and a 16-inch 4K screen. Add the WinSwitch Beta repo (deb http://winswitch.org/beta/ cosmic main) and the Graphics repo (deb http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu cosmic main) for the Nvidia driver. Install Paramiko, Websockify, and all other listed dependencies for Xpra.

I disassembled the cyxor.cpython-36m-x86_64-linux-gnu.so file and noticed that, in the whole file, there were only two SSE instructions (both of which are unaligned).

AT&T Syntax:

    437a:	movdqu -0x10(%r9,%rax,1),%xmm0
    4381:	movups %xmm0,(%r9,%rax,1)

movdqu is an unaligned double quadword move
movups is an unaligned packed single-precision floating-point move

If the pointer is not 16-byte aligned, this will cause a segmentation fault. Also, if the data happens to be on the stack instead of the memory, then this could cause alignment issues that are seen in some systems and not others.

This reminds me of ticket:1749#comment:10

This issue with Cython and Python ( https://stackoverflow.com/questions/51187592/using-c-union-with-sse-intrinsics-in-cython-results-in-sigsegv ) is similar to our issue at the assembly level.

Also, I noticed other developers that get segmentation faults originating from Cython code tend to add nogil or some other methods of manipulating the Python garbage collector to help with similar issues.

Here are some helpful links on SSE and alignment

https://stackoverflow.com/questions/47510783/why-does-unaligned-access-to-mmaped-memory-sometimes-segfault-on-amd64

https://stackoverflow.com/questions/841433/are-stack-variables-aligned-by-the-gcc-attribute-alignedx

Last edited 6 weeks ago by Antoine Martin

Changed 6 weeks ago by Antoine Martin

Attachment: hybi-log-addresses.patch added

print addresses used with 32-bit accesses

comment:28 Changed 6 weeks ago by Antoine Martin

What do you mean by "the problem is with your system" and "another symptom of the more general problem you have"?

I mean that no-one else has reported any problems with the codecs and that every time I have seen errors like these reported it ended up being a mistake during system setup. (wrong arch, wrong distro version, mixed source installation, etc)

To reproduce the issue, install Ubuntu or Xubuntu on a system ...

As per comment:8, I had already done a test install in a VM 3 weeks ago.

Add the WinSwitch Beta repo (deb http://winswitch.org/beta/ cosmic main)

As per the installation instructions, the beta repo should not be installed without also installing the stable repo.
(it may or may not work correctly without)

Install Paramiko, Websockify, and all other listed dependencies for Xpra.

The dependencies should be installed automatically when you install xpra, and websockify is no longer used in 2.5 (as per #2121)

I disassembled .. floating-point move

Are you sure that those instructions are actually used?
We don't do any floating point operations in that whole module.

If the pointer is not 16-byte aligned, this will cause a segmentation fault.

Unless gcc does something really weird with vectorization, all accesses are 32-bit only and 4-byte aligned.
Run the python ./unittests/unit/net/cyxor_hybi_test.py with the patch above applied to verify the value of the pointers.
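Independently of the attached patch (whose output is not shown here), the alignment of a buffer can be inspected from Python with ctypes. This is a minimal sketch under the assumption that the data lives in a writable buffer such as a bytearray; the helper names buffer_address and misalignment are hypothetical.

```python
import ctypes


def buffer_address(buf, offset=0):
    """Return the memory address of buf[offset] for a writable buffer (e.g. bytearray)."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf, offset))


def misalignment(address, boundary):
    """Return how many bytes address lies past the previous multiple of boundary."""
    return address % boundary


if __name__ == "__main__":
    data = bytearray(64)
    addr = buffer_address(data)
    print(hex(addr), "mod 4 =", misalignment(addr, 4), "mod 16 =", misalignment(addr, 16))
```

A result of 0 for boundary 4 means the address is suitable for aligned 32-bit accesses; a non-zero result for boundary 16 means an instruction that requires 16-byte alignment could not safely be used on that address.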

Also, I noticed other developers that get segmentation faults originating from Cython code tend to add nogil or some other methods of manipulating the Python garbage collector to help with similar issues.

We don't release the GIL in this particular module. Adding nogil would definitely not help, if anything it can cause more problems.

So, like I said: if I can't reproduce the problem, I can't fix it.

comment:29 Changed 6 weeks ago by Devyn Collier Johnson

Just to keep you updated, I am currently in the process of obtaining an online cloud-computing account which I will use to try Xpra v2.5-beta (both WinSwitch repos) since it appears to have issues running directly on a fresh Ubuntu system with the i7-8700K Intel processor. True, I could try a virtual machine, but I want to try Xpra on another freshly installed system that will not be using nor involved with the i7-8700K Intel processor. Also, considering that you have never had such issues running Xpra within Docker, that may be the best option.

Once I complete this testing, I will report back.

Either way, I will soon be working on optimizing and enhancing the code for Xpra (like we mentioned on our video-call).

comment:30 Changed 6 weeks ago by Antoine Martin

I want to try Xpra on another freshly installed system

Please keep a log of every terminal command you run to get it installed, so this can be reproduced somewhere else if need be.

Either way, I will soon be working on optimizing and enhancing the code for Xpra

Please create a separate ticket for that, for more information see wiki/Performance (out of date) and #620: you need to use the profiling tools to identify the locations that may need optimizing then the automated tests (#2112) to validate changes.

comment:31 Changed 5 weeks ago by Devyn Collier Johnson

I tried Xpra in Docker on AWS and it still has a segmentation fault (both Python2 and Python3). I attached the needed files.

Changed 5 weeks ago by Devyn Collier Johnson

Attachment: Dockerfile added

Dockerfile

Changed 5 weeks ago by Devyn Collier Johnson

Attachment: Docker_Log.txt added

Log from AWS

Changed 5 weeks ago by Devyn Collier Johnson

Attachment: init_xpra.sh added

Entrypoint

Changed 5 weeks ago by Devyn Collier Johnson

Attachment: gpg.asc added

gpg.asc

comment:32 Changed 5 weeks ago by Antoine Martin

Resolution: invalid
Status: new → closed

As I suspected all along, there is something fundamental and non-standard that you're modifying on your system.
Setting PYTHONOPTIMIZE=2 breaks all sorts of things: Cython extensions, Pillow (issue 3232: Pillow cannot be loaded in python optimize (2) mode), and it is not going to optimise anything useful.

More information here: What does Python optimization (-O or PYTHONOPTIMIZE) do?

As of r22087 we will now print a big warning when the flag is set.
If you really want to optimise things, use the profiling tools (ie: #620) and work from there.
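To illustrate what the flag changes: at optimization level 1 (python -O or PYTHONOPTIMIZE=1) assert statements are removed and __debug__ is False, and at level 2 (python -OO or PYTHONOPTIMIZE=2) docstrings are discarded as well. The following is a minimal Python sketch that reports whether optimization is active. It is not the warning added in r22087, whose code is not shown in this ticket; the function name optimize_warnings and its parameters are hypothetical.

```python
import os
import sys


def optimize_warnings(environ=None, optimize_level=None):
    """Return a list of warnings describing active Python optimization settings."""
    environ = os.environ if environ is None else environ
    level = sys.flags.optimize if optimize_level is None else optimize_level
    warnings = []
    value = environ.get("PYTHONOPTIMIZE", "")
    if value:
        warnings.append("PYTHONOPTIMIZE=%s is set in the environment" % value)
    if level >= 1:
        warnings.append("assert statements are stripped (optimize level %d)" % level)
    if level >= 2:
        warnings.append("docstrings are stripped (optimize level %d)" % level)
    return warnings


if __name__ == "__main__":
    for message in optimize_warnings():
        print("Warning:", message)
```

Code that relies on assert for validation or reads its own docstrings at runtime can behave differently under these levels, which is one way such a setting can break libraries.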

comment:33 Changed 5 weeks ago by Devyn Collier Johnson

Resolution: invalid
Status: closed → reopened

I am completely confused. On a fresh install of Ubuntu with the Winswitch repo added and all other steps mentioned in comment 27 ( https://www.xpra.org/trac/ticket/2142#comment:27 ), how does the PYTHONOPTIMIZE variable get set? In all these fresh installs and attempts to get Xpra working that was the first time I set the variable. Also, when removing that line from the Docker image and script, Xpra still fails to work.

comment:34 Changed 5 weeks ago by Antoine Martin

Xpra works here, as soon as I remove it from the dockerfile.
If it still doesn't work for you, maybe there's something else that is changed.

comment:35 Changed 5 weeks ago by Devyn Collier Johnson

I had removed the PYTHONOPTIMIZE line as well, but Xpra still did not work. Did you manage to get Xmahjongg to work in the web-browser? If so, what does the Dockerfile that you are using look like? A better question may be, how are you running Xpra to get it to work?

comment:36 Changed 5 weeks ago by Antoine Martin

Resolution: invalid
Status: reopened → closed

I had removed the PYTHONOPTIMIZE line as well, but Xpra still did not work.

There are 2 of them, not just one. Remove all of them.
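A setting that appears more than once is easy to miss by eye. This minimal Python sketch lists every line of a text that mentions a given name; the function name find_occurrences is hypothetical, and the sample text in the usage is made up for illustration, not taken from the attached Dockerfile or script.

```python
def find_occurrences(text, needle):
    """Return (line_number, stripped_line) for every line of text containing needle."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line
    ]


if __name__ == "__main__":
    # hypothetical sample, for illustration only
    sample = "FROM ubuntu:18.10\nENV PYTHONOPTIMIZE=2\nRUN true\nENV FOO=1 PYTHONOPTIMIZE=2\n"
    for number, line in find_occurrences(sample, "PYTHONOPTIMIZE"):
        print(number, line)
```

Running the same check over both the Dockerfile and the entrypoint script covers the case where the variable is set in more than one place.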

Did you manage to get Xmahjongg to work in the web-browser?

Not with your dockerfile and script, because xmahjongg is not on the $PATH.
