xpra icon
Bug tracker and wiki

Opened 11 days ago

Closed 8 days ago

#2139 closed enhancement (fixed)

less copying to reassemble packets

Reported by: Antoine Martin Owned by: Antoine Martin
Priority: major Milestone: 2.5
Component: network Version: 2.4.x
Keywords: Cc:

Description (last modified by Antoine Martin)

At the moment, we use:

            if read_buffer:
                read_buffer = read_buffer + buf
            else:
                read_buffer = buf

Which creates a new buffer every time.
Since we read in 64KB chunks, this could cause the same memory to be copied a number of times before we reach the packet size.

We can usually parse the packet size from the very first chunk, and then we don't need to copy anything until we've received the full packet.

Network tracker ticket: #1590

Attachments (3)

protocol-noappend-v1.patch (10.1 KB) - added by Antoine Martin 10 days ago.
poc
format-thread-218329.png (313.2 KB) - added by Antoine Martin 9 days ago.
sample profiling output
cythonize-protocol.patch (59.2 KB) - added by Antoine Martin 8 days ago.
try to speedup protocol using cython

Download all attachments as: .zip

Change History (10)

comment:1 Changed 10 days ago by Antoine Martin

Description: modified (diff)
Status: newassigned

Changed 10 days ago by Antoine Martin

Attachment: protocol-noappend-v1.patch added

poc

comment:2 Changed 10 days ago by Antoine Martin

Interesting stuff: the new unit test added in r21586 + r21587 was showing very low speeds (60MB/s) until I replaced the fake reads with memory slicing in r21589 (now ~1GB/s), which shows that memory copying can be expensive.

comment:3 Changed 10 days ago by Antoine Martin

r21591 improves the test code to make it more realistic.
With this change:

  • old code: 1.5 to 4Gbps (average around 2Gbps)
  • new code: 2 to 10Gbps! (average around 3.5Gbps)

r21592 replaces the protocol code with the new one.
I would still like to run the profiling code on this.

Last edited 10 days ago by Antoine Martin (previous) (diff)

comment:4 Changed 9 days ago by Antoine Martin

Rewards from profiling: r21596 improves the performance by simply removing a non-essential logging call.

Changed 9 days ago by Antoine Martin

Attachment: format-thread-218329.png added

sample profiling output

comment:5 Changed 9 days ago by Antoine Martin

  • profiling code added in r21602
  • another logging call removed: r21604

I'm not sure there's much more we can do to streamline things, yet the performance is a tad disappointing, only ~1Gbps.
Sample profiling output: sample profiling output

Changed 8 days ago by Antoine Martin

Attachment: cythonize-protocol.patch added

try to speedup protocol using cython

comment:6 Changed 8 days ago by Antoine Martin

I thought the bottleneck was in python, so I converted the module to cython, but that didn't help at all (less than 1%).
Turns out that the performance does increase if we measure things for longer. Cache warming up? Startup costs?

comment:7 Changed 8 days ago by Antoine Martin

Resolution: fixed
Status: assignedclosed

Minor protocol cleanup in r21610.
More improvements to the tests in:

The websocket protocol is within touching distance of the raw xpra protocol:

xpra      format thread:			227MB/s
xpra      packets formatted per second:		2734
xpra      incoming packet processing speed:	612MB/s
xpra      packet parsed per second:		78501
websocket format thread:			225MB/s
websocket packets formatted per second:		2705
websocket incoming packet processing speed:	401MB/s
websocket packet parsed per second:		64813

There is still room for improvement, but this should be sufficient for some time.

Note: See TracTickets for help on using tickets.