Xpra: Ticket #2139: less copying to reassemble packets

At the moment, we use:

            if read_buffer:
                read_buffer = read_buffer + buf
            else:
                read_buffer = buf

This creates a new buffer on every read. Since we read in 64KB chunks, the same memory can be copied several times before we have received the whole packet.

We can usually parse the packet size from the very first chunk, and then we don't need to copy anything until we've received the full packet.
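The idea above can be sketched roughly as follows. This is only an illustration, not the code that was committed: the `PacketAssembler` class and the 4-byte big-endian length header are assumptions made for the example and do not match the real xpra packet header. Chunks are kept in a list and joined once, when the full packet has arrived, instead of being concatenated on every read.

```python
import struct

# Hypothetical header for this sketch: 4-byte big-endian payload size.
HEADER = struct.Struct("!I")


class PacketAssembler:
    """Collects chunks in a list and joins them at most once per packet."""

    def __init__(self):
        self.chunks = []    # chunks received so far, not yet joined
        self.buffered = 0   # total number of bytes held in self.chunks
        self.needed = None  # header + payload size, once the header is parsed

    def feed(self, buf):
        """Add one chunk and return the payloads that are now complete."""
        self.chunks.append(buf)
        self.buffered += len(buf)
        packets = []
        while True:
            if self.needed is None:
                if self.buffered < HEADER.size:
                    break
                # The header may be split across chunks: join only in that case.
                if len(self.chunks[0]) < HEADER.size:
                    self.chunks = [b"".join(self.chunks)]
                (size,) = HEADER.unpack_from(self.chunks[0])
                self.needed = HEADER.size + size
            if self.buffered < self.needed:
                # Not enough data yet: keep the chunks as they are, no copying.
                break
            # Full packet available: join once (or not at all for a single chunk).
            data = self.chunks[0] if len(self.chunks) == 1 else b"".join(self.chunks)
            view = memoryview(data)
            packets.append(bytes(view[HEADER.size:self.needed]))
            # Keep any trailing bytes as a view, they belong to the next packet.
            rest = view[self.needed:]
            self.chunks = [rest] if len(rest) else []
            self.buffered = len(rest)
            self.needed = None
        return packets
```

With this approach a packet spread over N chunks is copied once by the final join, rather than up to N times by repeated `read_buffer + buf`.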

Network tracker ticket: #1590



Fri, 08 Feb 2019 05:15:19 GMT - Antoine Martin: status, description changed


Fri, 08 Feb 2019 13:08:55 GMT - Antoine Martin: attachment set

poc


Fri, 08 Feb 2019 15:09:02 GMT - Antoine Martin:

Interesting stuff: the new unit test added in r21586 + r21587 was showing very low speeds (60MB/s) until I replaced the fake reads with memory slicing in r21589 (now ~1GB/s), which shows that memory copying can be expensive.
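To illustrate the difference (this is not the test code from those revisions, and the function names `chunks_copy` and `chunks_view` are made up for the example): slicing a `bytes` or `bytearray` object copies the data for every chunk, whereas slicing a `memoryview` only creates a new reference into the same memory.

```python
def chunks_copy(data, size):
    """Split data into chunks by plain slicing: each chunk is a new copy."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunks_view(data, size):
    """Split data into chunks using a memoryview: no bytes are copied."""
    view = memoryview(data)
    return [view[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    data = bytearray(b"abcdef")
    copies = chunks_copy(data, 2)
    views = chunks_view(data, 2)
    # Changing the source is visible through the views, but not in the copies.
    data[0] = ord("X")
    print(bytes(copies[0]), bytes(views[0]))
```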


Fri, 08 Feb 2019 16:35:23 GMT - Antoine Martin:

r21591 improves the test code to make it more realistic. With this change:

r21592 replaces the protocol code with the new one. I would still like to run the profiling code on this.


Sat, 09 Feb 2019 08:36:12 GMT - Antoine Martin:

Rewards from profiling: r21596 improves the performance by simply removing a non-essential logging call.


Sat, 09 Feb 2019 12:11:32 GMT - Antoine Martin: attachment set

sample profiling output


Sat, 09 Feb 2019 12:13:12 GMT - Antoine Martin:

I'm not sure there's much more we can do to streamline things, yet the performance is a tad disappointing: only ~1Gbps. See the attached sample profiling output.


Sun, 10 Feb 2019 04:10:24 GMT - Antoine Martin: attachment set

try to speedup protocol using cython


Sun, 10 Feb 2019 05:43:52 GMT - Antoine Martin:

I thought the bottleneck was in Python, so I converted the module to Cython, but that didn't help at all (less than 1%). It turns out that the performance does increase if we measure things for longer. Cache warming up? Startup costs?


Sun, 10 Feb 2019 14:40:35 GMT - Antoine Martin: status changed; resolution set

Minor protocol cleanup in r21610. More improvements to the tests in:

The websocket protocol is within touching distance of the raw xpra protocol:

xpra      format thread:                      227MB/s
xpra      packets formatted per second:       2734
xpra      incoming packet processing speed:   612MB/s
xpra      packet parsed per second:           78501
websocket format thread:                      225MB/s
websocket packets formatted per second:       2705
websocket incoming packet processing speed:   401MB/s
websocket packet parsed per second:           64813

There is still room for improvement, but this should be sufficient for some time.


Sat, 23 Jan 2021 05:43:14 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/2139