At the moment, we use:
    if read_buffer:
        read_buffer = read_buffer + buf
    else:
        read_buffer = buf
This creates a new buffer (and copies the old one) every time. Since we read in 64KB chunks, the same memory can be copied many times before we reach the full packet size.
We can usually parse the packet size from the very first chunk, and then we don't need to copy anything until we've received the full packet.
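A minimal sketch of that idea: accumulate the chunks in a list, parse the payload size as soon as the header is available (usually from the first chunk), and only join once the full packet has arrived. The framing used here is hypothetical (an 8-byte header whose last 4 bytes are a big-endian payload length); the real xpra header differs.

```python
import struct

# Hypothetical framing for illustration: 8-byte header, last 4 bytes
# hold the big-endian payload length. The real xpra header differs.
HEADER_SIZE = 8

class PacketReader:
    def __init__(self):
        self.chunks = []        # received chunks, joined only once
        self.buffered = 0       # total number of bytes buffered
        self.payload_size = -1  # unknown until the header is parsed

    def add_chunk(self, buf):
        """Feed one read() result; returns a full packet or None."""
        self.chunks.append(buf)
        self.buffered += len(buf)
        if self.payload_size < 0 and self.buffered >= HEADER_SIZE:
            # cheap join: at most a chunk or two are buffered at this point
            header = b"".join(self.chunks)[:HEADER_SIZE]
            self.payload_size = struct.unpack(">I", header[4:8])[0]
        if self.payload_size >= 0 and \
                self.buffered >= HEADER_SIZE + self.payload_size:
            # the one and only copy: join now that the packet is complete
            data = b"".join(self.chunks)
            packet = data[HEADER_SIZE:HEADER_SIZE + self.payload_size]
            # keep any trailing bytes for the next packet
            self.chunks = [data[HEADER_SIZE + self.payload_size:]]
            self.buffered = len(self.chunks[0])
            self.payload_size = -1
            return packet
        return None
```

A real implementation would also loop here, since one chunk can contain more than one complete packet; this sketch only shows the copy-avoidance part.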
Network tracker ticket: #1590
Interesting stuff: the new unit test added in r21586 + r21587 was showing very low speeds (60MB/s) until I replaced the fake reads with memory slicing in r21589 (now ~1GB/s), which shows that memory copying can be expensive.
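The cost is easy to reproduce with a toy benchmark (not the actual unit test from r21586): repeated concatenation copies everything buffered so far on every read, whereas collecting slices and joining once copies each byte roughly twice in total.

```python
import time

def concat_reads(total, chunk=64 * 1024):
    """Simulate assembling a packet with repeated bytes concatenation."""
    buf = b""
    while len(buf) < total:
        buf = buf + b"\0" * chunk   # re-copies everything read so far
    return buf

def sliced_reads(total, chunk=64 * 1024):
    """Simulate reads by slicing a preallocated buffer, joining once."""
    data = b"\0" * total
    chunks = [data[i:i + chunk] for i in range(0, total, chunk)]
    return b"".join(chunks)          # one final copy

for fn in (concat_reads, sliced_reads):
    start = time.monotonic()
    fn(16 * 1024 * 1024)             # a 16MB "packet"
    print(fn.__name__, "%.3fs" % (time.monotonic() - start))
```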
r21591 improves the test code to make it more realistic. With this change:
r21592 replaces the protocol code with the new one. I would still like to run the profiling code on this.
Rewards from profiling: r21596 improves the performance by simply removing a non-essential logging call.
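A small illustration of why logging calls show up in profiles (the actual call removed in r21596 is not shown here): with the stdlib logging module, a debug message that ends up filtered out still pays for evaluating its arguments, unless the call is guarded.

```python
import logging

log = logging.getLogger("network")
log.setLevel(logging.INFO)          # debug messages will be dropped

packet = b"\0" * 65536

# Eager: hex() and the %-formatting both run before the record is dropped:
log.debug("packet data: %s" % packet[:16].hex())

# Lazy formatting: the %-formatting is deferred by logging,
# but hex() is still evaluated before the call:
log.debug("packet data: %s", packet[:16].hex())

# Truly skipping the work on the hot path requires a guard:
if log.isEnabledFor(logging.DEBUG):
    log.debug("packet data: %s", packet[:16].hex())
```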
sample profiling output
Try to speed up the protocol using Cython
I thought the bottleneck was in Python, so I converted the module to Cython, but that didn't help at all (less than 1%). It turns out that the measured performance does increase if we run the test for longer. Cache warming up? Startup costs?
Minor protocol cleanup in r21610. More improvements to the tests in:
The websocket protocol is within touching distance of the raw xpra protocol:
    xpra format thread: 227MB/s
    xpra packets formatted per second: 2734
    xpra incoming packet processing speed: 612MB/s
    xpra packets parsed per second: 78501

    websocket format thread: 225MB/s
    websocket packets formatted per second: 2705
    websocket incoming packet processing speed: 401MB/s
    websocket packets parsed per second: 64813
There is still room for improvement, but this should be sufficient for some time.
this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/2139