xpra icon
Bug tracker and wiki

Opened 9 months ago

Closed 8 months ago

Last modified 3 weeks ago

#2121 closed enhancement (fixed)

get rid of websockify

Reported by: Antoine Martin Owned by: Antoine Martin
Priority: critical Milestone: 2.5
Component: network Version: 2.4.x
Keywords: Cc:

Description (last modified by Antoine Martin)

Avoid copying memory around, and support sending memoryviews as-is.

See #1926 and #2104.

sendmsg(data) calls _sendmsg(0x2, data), which does this:

        frame = self._encode_hybi(opcode, msg)
        return self._send(frame)

We should be able to subclass it and override _encode_hybi to return multiple frames to send.
flush will need to be told how to handle lists of buffers.
We can join the hiby header for payloads smaller than ~64K / 4K? (configurable)

TODO:

  • verify that TCP_NODELAY is being toggled as needed.
  • verify raw processing performance
  • verify actual network performance

Attachments (3)

websockify.patch (1.7 KB) - added by Antoine Martin 9 months ago.
bugs in websockify trunk
websockify-zerocopy.patch (6.2 KB) - added by Antoine Martin 9 months ago.
working zerocopy patch
websocket-legacy-flag.patch (5.5 KB) - added by Antoine Martin 9 months ago.
adds a flag so we can enable the new frame format without breaking older html5 clients

Download all attachments as: .zip

Change History (32)

comment:1 Changed 9 months ago by Antoine Martin

Description: modified (diff)
Status: newassigned

Changed 9 months ago by Antoine Martin

Attachment: websockify.patch added

bugs in websockify trunk

comment:2 Changed 9 months ago by Antoine Martin

First things first: there are a number of bugs in websockify upstream... (see patch)
And the code readability is not great.

Maybe we should just do the work in our own websocket class?
The problems with upstream breakage (#2104) would become a non-issue, and we may be able to optimize it better too.

Changed 9 months ago by Antoine Martin

Attachment: websockify-zerocopy.patch added

working zerocopy patch

comment:3 Changed 9 months ago by Antoine Martin

The patch above should help, it avoids making unnecessary memory copies of buffers larger than XPRA_WEBSOCKET_JOIN_SIZE (defaults to 16KB - we should evaluate what the best value is for the join size, ie: we use 64KB in the TCP layer...)

Each websocket header is just a few bytes with the packet size, since the primary purpose of zerocopy chunked packets is to optimize "paint" packets where the pixel data is sent as a separate chunk, we could also just generate the websocket header for both chunks and then submit them to the socket layer separately.
At the receiving end we end up having to slice the buffer in Javascript already anyway, to remove the xpra TCP packet header. And we also need to have received both chunks to process the packet, so there is no harm in using

single websocket frame for both.

The websocket frame already includes the packet size, so xpra's packet size header is a little bit redundant (and then there's also lz4's!) but adding a new packet format is a daunting task.

See also #619.

comment:4 Changed 9 months ago by Antoine Martin

The reason why this new code did not work with the html5 client... was a bug in the html5 client network code, fixed in r21511.
This may require a flag to avoid breaking older clients... we can switch to a slower chunk inlining workaround for those.
Server code finally added in r21513, packaging changes in r21514.

TODO:

  • handle continuation frames - just in case a proxy adds them
  • handle client connections using this new code and drop python-websocket-client
  • add html (rename to http?) option to only do websocket / http, ie: --http=websocket
  • don't parse the header every time until we get a full frame (though in practice, we almost always do unless a proxy slices it up..)
  • use recv_into on all sockets (transform into plain recv for those that don't have a native recv_info like pipes)
Last edited 9 months ago by Antoine Martin (previous) (diff)

comment:5 Changed 9 months ago by Antoine Martin

Here's one of the improvements: number of packets per HTTP request.

In all cases the browser's request looks like this:

######
T 127.0.0.1:39618 -> 127.0.0.1:10000 [AP] #145
  47 45 54 20 2f 69 63 6f    6e 73 2f 6e 6f 69 63 6f    GET /icons/noico
  6e 2e 70 6e 67 20 48 54    54 50 2f 31 2e 31 0d 0a    n.png HTTP/1.1..
  48 6f 73 74 3a 20 6c 6f    63 61 6c 68 6f 73 74 3a    Host: localhost:
  31 30 30 30 30 0d 0a 43    6f 6e 6e 65 63 74 69 6f    10000..Connectio
  6e 3a 20 6b 65 65 70 2d    61 6c 69 76 65 0d 0a 43    n: keep-alive..C
  61 63 68 65 2d 43 6f 6e    74 72 6f 6c 3a 20 6d 61    ache-Control: ma
(..)

With trunk:

#######
T 127.0.0.1:10000 -> 127.0.0.1:39618 [AP] #152
  48 54 54 50 2f 31 2e 30    20 32 30 30 20 4f 4b 0d    HTTP/1.0 200 OK.
  0a 53 65 72 76 65 72 3a    20 58 70 72 61 2d 57 65    .Server: Xpra-We
  62 53 6f 63 6b 65 74 2d    53 65 72 76 65 72 20 50    bSocket-Server P
  79 74 68 6f 6e 2f 32 2e    37 2e 31 35 0d 0a 44 61    ython/2.7.15..Da
  74 65 3a 20 54 68 75 2c    20 33 31 20 4a 61 6e 20    te: Thu, 31 Jan 
  32 30 31 39 20 31 31 3a    35 33 3a 32 37 20 47 4d    2019 11:53:27 GM
  54 0d 0a 4c 61 73 74 2d    4d 6f 64 69 66 69 65 64    T..Last-Modified
  3a 20 57 65 64 2c 20 33    30 20 4a 61 6e 20 32 30    : Wed, 30 Jan 20
  31 39 20 31 34 3a 32 32    3a 34 32 20 47 4d 54 0d    19 14:22:42 GMT.
  0a 43 6f 6e 74 65 6e 74    2d 4c 65 6e 67 74 68 3a    .Content-Length:
  20 31 33 33 0d 0a 43 6f    6e 74 65 6e 74 2d 74 79     133..Content-ty
  70 65 3a 20 69 6d 61 67    65 2f 70 6e 67 0d 0a 45    pe: image/png..E
  78 70 69 72 65 73 3a 20    30 0d 0a 50 72 61 67 6d    xpires: 0..Pragm
  61 3a 20 6e 6f 2d 63 61    63 68 65 0d 0a 43 61 63    a: no-cache..Cac
  68 65 2d 43 6f 6e 74 72    6f 6c 3a 20 6e 6f 2d 63    he-Control: no-c
  61 63 68 65 2c 20 6e 6f    2d 73 74 6f 72 65 2c 20    ache, no-store, 
  6d 75 73 74 2d 72 65 76    61 6c 69 64 61 74 65 0d    must-revalidate.
  0a 0d 0a 89 50 4e 47 0d    0a 1a 0a 00 00 00 0d 49    ....PNG........I
  48 44 52 00 00 00 10 00    00 00 10 08 06 00 00 00    HDR.............
  1f f3 ff 61 00 00 00 06    62 4b 47 44 00 ff 00 ff    ...a....bKGD....
  00 ff a0 bd a7 93 00 00    00 09 70 48 59 73 00 00    ..........pHYs..
  0b 0f 00 00 0b 0f 01 92    f9 03 a5 00 00 00 07 74    ...............t
  49 4d 45 07 e0 05 19 0a    14 06 f5 3b d1 f1 00 00    IME........;....
  00 12 49 44 41 54 38 cb    63 60 18 05 a3 60 14 8c    ..IDAT8.c`...`..
  02 08 00 00 04 10 00 01    85 3f aa 72 00 00 00 00    .........?.r....
  49 45 4e 44 ae 42 60 82                               IEND.B`.        
####

With version 2.4.3:

##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #197
  48 54 54 50 2f 31 2e 31    20 32 30 30 20 4f 4b 0d    HTTP/1.1 200 OK.
  0a                                                    .               
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #199
  53 65 72 76 65 72 3a 20    58 70 72 61 2d 57 65 62    Server: Xpra-Web
  53 6f 63 6b 69 66 79 20    50 79 74 68 6f 6e 2f 32    Sockify Python/2
  2e 37 2e 31 35 0d 0a                                  .7.15..         
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #201
  44 61 74 65 3a 20 54 68    75 2c 20 33 31 20 4a 61    Date: Thu, 31 Ja
  6e 20 32 30 31 39 20 31    31 3a 35 39 3a 33 38 20    n 2019 11:59:38 
  47 4d 54 0d 0a                                        GMT..           
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #203
  4c 61 73 74 2d 4d 6f 64    69 66 69 65 64 3a 20 54    Last-Modified: T
  68 75 2c 20 33 31 20 4a    61 6e 20 32 30 31 39 20    hu, 31 Jan 2019 
  31 31 3a 35 36 3a 31 38    20 47 4d 54 0d 0a          11:56:18 GMT..  
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #205
  43 6f 6e 74 65 6e 74 2d    4c 65 6e 67 74 68 3a 20    Content-Length: 
  31 33 33 0d 0a                                        133..           
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #207
  43 6f 6e 74 65 6e 74 2d    74 79 70 65 3a 20 69 6d    Content-type: im
  61 67 65 2f 70 6e 67 0d    0a                         age/png..       
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #209
  45 78 70 69 72 65 73 3a    20 30 0d 0a                Expires: 0..    
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #211
  50 72 61 67 6d 61 3a 20    6e 6f 2d 63 61 63 68 65    Pragma: no-cache
  0d 0a                                                 ..              
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #213
  43 61 63 68 65 2d 43 6f    6e 74 72 6f 6c 3a 20 6e    Cache-Control: n
  6f 2d 63 61 63 68 65 2c    20 6e 6f 2d 73 74 6f 72    o-cache, no-stor
  65 2c 20 6d 75 73 74 2d    72 65 76 61 6c 69 64 61    e, must-revalida
  74 65 0d 0a                                           te..            
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #215
  0d 0a                                                 ..              
##
T 127.0.0.1:10000 -> 127.0.0.1:42636 [AP] #217
  89 50 4e 47 0d 0a 1a 0a    00 00 00 0d 49 48 44 52    .PNG........IHDR
  00 00 00 10 00 00 00 10    08 06 00 00 00 1f f3 ff    ................
  61 00 00 00 06 62 4b 47    44 00 ff 00 ff 00 ff a0    a....bKGD.......
  bd a7 93 00 00 00 09 70    48 59 73 00 00 0b 0f 00    .......pHYs.....
  00 0b 0f 01 92 f9 03 a5    00 00 00 07 74 49 4d 45    ............tIME
  07 e0 05 19 0a 14 06 f5    3b d1 f1 00 00 00 12 49    ........;......I
  44 41 54 38 cb 63 60 18    05 a3 60 14 8c 02 08 00    DAT8.c`...`.....
  00 04 10 00 01 85 3f aa    72 00 00 00 00 49 45 4e    ......?.r....IEN
  44 ae 42 60 82                                        D.B`.           

The switch to websockets is also more concise.

As for websocket traffic, we avoid extra headers, but I think we can still do better packet aggregation, either using NODELAY (#619) or maybe we need a more explicit CORK: When should I use TCP_NODELAY and when TCP_CORK?

comment:6 Changed 9 months ago by Antoine Martin

Packet aggregation issue from comment:5 is now solved: #2130.

comment:7 Changed 9 months ago by Antoine Martin

Summary: speedup websockify send / hybiget rid of websockify

comment:8 Changed 9 months ago by Mark Harkin

I realize this is still probably a work in progress but as a headsup, I'm getting the following error after r21513.
Possibly a missing commit for xpra/server/websocket.py?

2019-02-02 08:03:13,323 Error: cannot import websocket connection handler:
2019-02-02 08:03:13,323  No module named http.server
2019-02-02 08:03:13,323  the html server will not be available

comment:9 Changed 9 months ago by Mark Harkin

It is a Python2 issue. Python2 need to import from BaseHTTPServer instead of http.server in http_handler.py.

comment:10 Changed 9 months ago by Antoine Martin

@Mark Harkin: I don't get it, works for me with both python2 and python3:

$ python2 -c "from http.server import BaseHTTPRequestHandler"
$ python3 -c "from http.server import BaseHTTPRequestHandler"

comment:11 Changed 9 months ago by Mark Harkin

I think it's best to ignore this for now then. It's probably an issue with my MSYS2 setup.

Changed 9 months ago by Antoine Martin

Attachment: websocket-legacy-flag.patch added

adds a flag so we can enable the new frame format without breaking older html5 clients

comment:12 Changed 9 months ago by Mark Harkin

I see you've marked remove python-websocket-client in your TODO, so these are just for note. I'll hold off on these changes until the html client is working again anyway.

I found the issue with import http.server above. The client build excludes BaseHTTPServer in setup.py. It may even be worth just removing the CLIENT_ONLY default of --without-html5 from MINGW_BUILD.sh to default to this working but giving the option to remove.

From the my MSYS2 build, I also had to rename xpra/net/websocket.py to avoid a conflict with the import of the external websocket module. The 2nd answer in https://stackoverflow.com/questions/42141689/attributeerror-module-object-has-no-attribute-websocketapp explains it a bit more.

comment:13 Changed 9 months ago by Antoine Martin

I'll hold off on these changes until the html client is working again anyway.

It should work already.
Only the older html5 clients would need the server worakround in the patch above - which will be merged after a few changes.

I found the issue with import http.server above. The client build excludes BaseHTTPServer in setup.py. It may even be worth just removing the CLIENT_ONLY default of --without-html5 from MINGW_BUILD.sh to default to this working but giving the option to remove.

Now you've lost me! Why would a client build need the html5 bits?
Can you provide the context for that error you're seeing? What command you're using, etc?

From the my MSYS2 build, I also had to rename xpra/net/websocket.py to avoid a conflict with the import of the external websocket module

Yes, I will remove the websocket-client code completely so this issue should go away.

comment:14 Changed 9 months ago by Mark Harkin

RE: http.server: Sorry, yes it was a server side issue and I was using a alpine build, I was missing the "future" pip package. I'll only comment after a centos build from now on :P

RE: html: getting an invalid frame header error. possibly because of my custom changes or using the proxy server. I'll get back when I get time to look at it.

comment:15 Changed 9 months ago by Antoine Martin

  • websocket-client removed in r21540 (from build deps: r21540)
  • better backwards compatibility: r21542
  • support sending masked frames (disabled by default since slower): r21543
  • backport fix flag to v2.4.x since it has the frame parsing fix: r21544

Still TODO:

  • testing (all platforms, different browsers, etc)
  • maybe create a new submodule (xpra.net.websocket?)
Last edited 9 months ago by Antoine Martin (previous) (diff)

comment:16 Changed 8 months ago by Antoine Martin

websocket bits moved to a sub-module: r21546, with its own build option: r21547.

Tested OK so far.

comment:17 Changed 8 months ago by Mark Harkin

@Antoine Martin
Probably related to your TODO "handle continuation frames - just in case a proxy adds them"

Works fine when connecting directly to a server, but using a proxy server causes issues on both the html and python clients.

Connecting from html client, server log shows:

2019-02-06 12:26:43,095 stopping proxy instance: disconnect from server: invalid packet header byte �: '889a392ce7bb3ac6' read buffer='\x88\x9a9,\xe7\xbb:\xc6\xb0\xde[\x7f\x88\xd8RI\x93\x9bi^\x88\xcfVO\x88\xd7\x19i\x95\xc9V^' (32 bytes)

Connecting from python client, client log shows:

2019-02-06 13:28:41,925 Error: read on ws socket: 192.168.1.19:63266 <- 192.168.1.30:10000 failed: <type 'exceptions.AssertionError'>
Traceback (most recent call last):
  File "./xpra/net/protocol.py", line 624, in _io_thread_loop
  File "./xpra/net/protocol.py", line 703, in _read
  File "./xpra/net/websockets/protocol.py", line 135, in parse_ws_frame
AssertionError: cannot handle fragmented '0' frames

comment:18 Changed 8 months ago by Antoine Martin

Probably related to your TODO "handle continuation frames - just in case a proxy adds them"

This was meant to be fixed, but there was a typo, fixed in r21552.

There may be other bugs though, how to I reproduce fragmented frames?

comment:19 in reply to:  18 Changed 8 months ago by Mark Harkin

Replying to Antoine Martin:

Probably related to your TODO "handle continuation frames - just in case a proxy adds them"

This was meant to be fixed, but there was a typo, fixed in r21552.

There may be other bugs though, how to I reproduce fragmented frames?

It seems just using the proxy server will reliably create the problem.
After r21552, python client now logging:

2019-02-06 14:31:40,650 Error: read on ws socket: 192.168.1.19:53406 <- 192.168.1.30:10000 failed: <type 'exceptions.AssertionError'>
Traceback (most recent call last):
  File "./xpra/net/protocol.py", line 624, in _io_thread_loop
  File "./xpra/net/protocol.py", line 703, in _read
  File "./xpra/net/websockets/protocol.py", line 121, in parse_ws_frame
AssertionError: continuation frame does not follow a partial frame

comment:20 Changed 8 months ago by Antoine Martin

Priority: majorcritical

It seems just using the proxy server will reliably create the problem.

Thanks for pointing that out, that's fixed in r21560.

The problem was that since the logic for packet framing was moved from the socket layer to the protocol layer, we now need to use a different protocol class for websockets and the proxy code had not been updated.

comment:21 in reply to:  20 Changed 8 months ago by Mark Harkin

Replying to Antoine Martin:

It seems just using the proxy server will reliably create the problem.

Thanks for pointing that out, that's fixed in r21560.

The problem was that since the logic for packet framing was moved from the socket layer to the protocol layer, we now need to use a different protocol class for websockets and the proxy code had not been updated.

After r21560, when connecting from either client I'm now getting the server-side error:

2019-02-06 17:11:35,741  packet from 192.168.1.19:57146
2019-02-06 17:11:35,741  received on 0.0.0.0:10000
2019-02-06 17:11:35,741  invalid packet header, HTTP GET request

comment:22 Changed 8 months ago by Antoine Martin

After r21560, when connecting from either client I'm now getting the server-side error:
(..)
invalid packet header, HTTP GET request

That's very odd. Works just fine here.

Please post the exact commands you are using for testing.
And maybe also the proxy's -d proxy log and the server's -d websocket,http log.

For reference, here's what I am using for testing:

  • server:
    xpra start  --start=xterm  --no-daemon -d websocket,http
    
  • proxy:
    xpra proxy --no-daemon --bind-tcp=0.0.0.0:14501 --tcp-auth=none
    
  • client:
    xpra attach ws://localhost:14501 -d websocket
    

comment:23 Changed 8 months ago by Antoine Martin

r21570 makes the BaseHTTPRequestHandler python version agnostic, fixing comment:8 since someone else reported the same problem. (whereas I'm still getting on fine as per comment:9 on Fedora 29 with python2)

comment:24 in reply to:  22 Changed 8 months ago by Mark Harkin

Something was fixed in a later release, can't reproduce on head.
Looked like html was getting copied into /usr/share/dist/xpra instead of /usr/share/xpra. Not an issue now though.

From a quick test new websocket work looks very good, thanks!

Last edited 8 months ago by Antoine Martin (previous) (diff)

comment:25 Changed 8 months ago by Antoine Martin

Looked like html was getting copied into /usr/share/dist/xpra instead of /usr/share/xpra. Not an issue now though.

That will be r21585.

comment:26 Changed 8 months ago by Antoine Martin

Resolution: fixed
Status: assignedclosed

More testing done as part of #2139.

This will do.

comment:27 Changed 8 months ago by Antoine Martin

Regression with python3: #2149

comment:28 Changed 8 months ago by Antoine Martin

Cleanup and refactoring in r21836 + r21837 + r21838 + r21839.

Last edited 8 months ago by Antoine Martin (previous) (diff)

comment:29 Changed 3 weeks ago by Antoine Martin

r21542 caused a regression: #2431

Last edited 3 weeks ago by Antoine Martin (previous) (diff)
Note: See TracTickets for help on using tickets.