xpra icon
Bug tracker and wiki

Opened 3 years ago

Closed 3 years ago

#502 closed defect (fixed)

efficient network receive buffer management when receiving large chunks

Reported by: Antoine Martin Owned by: Antoine Martin
Priority: major Milestone: 0.12
Component: core Version:
Keywords: Cc:

Description

At the moment, we have a read_buffer which is a string and we append to it each time we get more data. We read data from the network 8KB at a time, which means that for an 8MB picture (uncompressed RGBA at 1080p), we end up copying that string buffer 1000 times... Quick maths tell me we generate (1000*1001)/2*8K = ~4GB of memory copy for an 8MB picture!
Now, with just lz4 compression, the average frame drops to just a few percent of the original size, ie for 5%: 400KB is 50 packets, which means: 50*51/2*8K= ~10MB (which is still 25 times more than we should!)
With h264, the compression is much more efficient, so the average packet size drops to 200KB, still high enough that memory copy is probably costing us.

Attachments (4)

protocol-efficient-receive-buffer.patch (4.0 KB) - added by Antoine Martin 3 years ago.
PoC: populates the payload buffer using slicing so we only allocate the memory once
protocol-efficient-receive-buffer-v2.patch (4.6 KB) - added by Antoine Martin 3 years ago.
better patch using bytearray instead of ctypes.create_string_buffer - but still too slow..
protocol-efficient-receive-buffer-v3.patch (4.5 KB) - added by Antoine Martin 3 years ago.
updated version using a list of strings as temporary buffer
rgb-net-chunks-compression-old-vs-new-filtered.csv (29.6 KB) - added by Antoine Martin 3 years ago.
raw CSV data used to generate the graphs

Download all attachments as: .zip

Change History (7)

Changed 3 years ago by Antoine Martin

PoC: populates the payload buffer using slicing so we only allocate the memory once

Changed 3 years ago by Antoine Martin

better patch using bytearray instead of ctypes.create_string_buffer - but still too slow..

comment:1 Changed 3 years ago by Antoine Martin

Status: newassigned

The v2 patch above looks good, but there is one problem left: we don't want to spend any time allocating the buffer, a simple malloc would do, unfortunately this is how long it takes to allocate 1GB:

  • ctypes.create_string_buffer(1024*1024*1024) > 1s
  • bytearray(1024*1024*1024) > 1s
  • " "*(1024*1024*1024) ~ 3ms!

So the string wins, but it is immutable...
All I want, is a bytearray backed by malloc... why is it so hard?

Changed 3 years ago by Antoine Martin

updated version using a list of strings as temporary buffer

comment:2 Changed 3 years ago by Antoine Martin

Contrary to what I expected, the performance improvement is marginal at best for the large packet case and the extra if statements in the new code actually make the more common case (handling smaller packets) a little slower! It does reduce client CPU load though, which may still make this worth having.

See Concatenation Test Code for why that is. Quote: With byte-code strings, concatenating with += is as fast as a .join. Since our strings are byte strings from the network layer, using join doesn't really help.

What seems to make more of a difference is the size of the receive buffer, bumping the size to 16k makes a noticeable improvement for the high bandwidth case.
Interestingly, the lz4 compression saves a huge amount of bandwidth, without really costing much in terms of number of frames / pixels sent.

Last edited 3 years ago by Antoine Martin (previous) (diff)

comment:3 Changed 3 years ago by Antoine Martin

Resolution: fixed
Status: assignedclosed

As can be seen on those newly added performance charts: rgb-nocompress, disabling compression of RGB pixels can increase the bandwidth consumption 100-fold, peaking above 100MB/s.
The largest improvement comes from bumping the size of the network receive buffer to 64KB, done in r5276.

Finally, the protocol speed charts comparing old code with small variations of the new code, with and without 64KB receive buffers, shows that there isn't much to gain from the new code.

Not applying and closing.

Changed 3 years ago by Antoine Martin

raw CSV data used to generate the graphs

Note: See TracTickets for help on using tickets.