Xpra: Ticket #2730: xpra server leaks memory

xpra v3.0.7-r25629

Fedora Core 31 5.5.11-200.fc31.x86_64 #1 SMP Mon Mar 23 17:32:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I started happening after recent dnf update

Similar to #1662

[mj@mj-opti7040 ~]$ xpra info :122 | grep memory
threads.memory.children.idrss=0
threads.memory.children.inblock=107368
threads.memory.children.isrss=0
threads.memory.children.ixrss=0
threads.memory.children.majflt=383
threads.memory.children.maxrss=5342608
threads.memory.children.minflt=116275
threads.memory.children.msgrcv=0
threads.memory.children.msgsnd=0
threads.memory.children.nivcsw=278
threads.memory.children.nsignals=0
threads.memory.children.nswap=0
threads.memory.children.nvcsw=2460
threads.memory.children.oublock=56
threads.memory.children.stime=0
threads.memory.children.utime=2
threads.memory.server.idrss=0
threads.memory.server.inblock=135360
threads.memory.server.isrss=0
threads.memory.server.ixrss=0
threads.memory.server.majflt=1710
threads.memory.server.maxrss=5794408
threads.memory.server.minflt=3353250
threads.memory.server.msgrcv=0
threads.memory.server.msgsnd=0
threads.memory.server.nivcsw=27229
threads.memory.server.nsignals=0
threads.memory.server.nswap=0
threads.memory.server.nvcsw=2581653
threads.memory.server.oublock=16
threads.memory.server.stime=90
threads.memory.server.utime=1065
total-memory=8221163520

The only apps in use: terminator, meld and boostnote Session was running for several days. Session reattached dozens of times.

xpra stop :122

made the xpra process go away and memory freed
2383977 mj        20   0 7459300   5.1g  27072 S   0.0  66.4  19:12.37 xpra

Thu, 16 Apr 2020 07:46:18 GMT - MJ: attachment set

attachment set to xpra-memmap

cat /proc/pid/maps

Thu, 16 Apr 2020 11:49:52 GMT - Antoine Martin: status changed

status changed from new to assigned

Is this a local connection, with mmap, or remote? What commands do you use for starting and connecting to the server?

5.1g

Yes, that's a little excessive!

Thu, 16 Apr 2020 13:17:07 GMT - MJ:

This is a remote ssh connection from windows client.#

This is my usual command (either start or attach)

.\Xpra.exe attach --encoding png --no-file-transfer --no-microphone --no-speaker --ssh='wsl ssh' ssh/mj@rmjd/121  --desktop-scaling=1 --start=terminator

I think encoding png no longer works after recent server update. So that is one thing that must have changed.

Fri, 17 Apr 2020 08:30:05 GMT - Antoine Martin:

#2731 looks like a duplicate of this bug.

Fri, 17 Apr 2020 08:41:42 GMT - stdedos: cc set

cc stdedos added

Fri, 17 Apr 2020 16:47:48 GMT - Antoine Martin: owner, status changed

owner changed from Antoine Martin to MJ
status changed from assigned to new

I've just tried to reproduce on Fedora 31, I've left glxgears running for a while and the memory usage didn't go up noticeably. (it goes up and down as the garbage collector kicks in regularly) I've also tried with terminator, left top -d 0.2 running for a while to generate screen updates.

Any idea what I can do to trigger this memleak?

--encoding=png I think encoding png no longer works after recent server update.

FYI: I can't think of one good reason for using png, it is demonstrably worse than the alternatives. (slower, higher cpu usage, lower compression) You should probably use --quality=100 and leave the encoding option unchanged if what you want is lossless screen updates. (and you will get a mix of png, webp and rgb - the server will choose what's best for each screen update)

Fri, 17 Apr 2020 18:39:28 GMT - MJ:

Some more info. I reproduced this again. I have started new session 35 hours ago. The only application I have launched this time is terminator. In the meantime I have reattached to the session once or twice. I can see the memory usage now is approaching 1GB. It has been stable for past 5 minutes.

However, 5 minutes ago it was about 127528 and the sudden increase between my two glances on top tool screen.

top - 19:08:16 up 11 days, 20:38,  2 users,  load average: 0.14, 0.07, 0.01
2821486 mj        20   0 1886224 898300  68176 S   4.8  11.2   3:55.10
top - 19:13:15 up 11 days, 20:43,  2 users,  load average: 0.11, 0.10, 0.04
2821486 mj        20   0 1951760 977296  68176 S   5.1  12.2   4:05.24

$pmap 2821486
2821486:   /usr/bin/python3 -s /usr/bin/xpra start :122 --encoding=png --
ssh=wsl ssh --speaker=off --start=terminator --env=XPRA_PROXY_START_UUID=af228e0e10b540e29dc8f118f6f04d15 --daemon=yes --systemd-run=no
000055d183cd6000      4K r---- python3.7
000055d183cd7000      4K r-x-- python3.7
000055d183cd8000      4K r---- python3.7
000055d183cd9000      4K r---- python3.7
000055d183cda000      4K rw--- python3.7
000055d1844c5000   1964K rw---   [ anon ]
000055d1846b0000 196488K rw---   [ anon ] <<<<<
00007fd52c000000  64480K rw---   [ anon ]

I reckon the memory block shown by the pmap is the heap. Hopefully the delta in the memory usage might give you a clue

Another update, the memory usage actually grows constantly, that is in significant jumps every few minutes.

top - 19:32:48 up 11 days, 21:03,  2 users,  load average: 0.18, 0.18, 0.11
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
2821486 mj        20   0 2541584   1.5g  68260 S  22.8  19.3   4:43.12
2821486:   /usr/bin/python3 -s /usr/bin/xpra start :122 --encoding=png --ssh=wsl ssh --speaker=off --start=terminator --env=XPRA_PROXY_START_UUID=af228e0e10b540e29dc8f118f6f04d15 --daemon=yes --systemd-run=no
000055d183cd6000      4K r---- python3.7
000055d183cd7000      4K r-x-- python3.7
000055d183cd8000      4K r---- python3.7
000055d183cd9000      4K r---- python3.7
000055d183cda000      4K rw--- python3.7
000055d1844c5000   1964K rw---   [ anon ]
000055d1846b0000 196488K rw---   [ anon ]
00007fd504000000  58980K rw---   [ anon ]
q
}}
diff of subsequent pmaps shows the following:
{{{
$diff pmap.1 pmap.2
8a9,28
> 00007fd4fc000000  28212K rw---   [ anon ]
> 00007fd4fdb8d000  37324K -----   [ anon ]
> 00007fd504000000  65148K rw---   [ anon ]
> 00007fd507f9f000    388K -----   [ anon ]
> 00007fd50c000000  58856K rw---   [ anon ]
> 00007fd50f97a000   6680K -----   [ anon ]
> 00007fd510000000  64968K rw---   [ anon ]
> 00007fd513f72000    568K -----   [ anon ]
> 00007fd514000000  64824K rw---   [ anon ]
> 00007fd517f4e000    712K -----   [ anon ]
> 00007fd518000000  64624K rw---   [ anon ]
> 00007fd51bf1c000    912K -----   [ anon ]
> 00007fd51c000000  65240K rw---   [ anon ]
> 00007fd51ffb6000    296K -----   [ anon ]
> 00007fd520000000  64948K rw---   [ anon ]
> 00007fd523f6d000    588K -----   [ anon ]
> 00007fd524000000  65364K rw---   [ anon ]
> 00007fd527fd5000    172K -----   [ anon ]
> 00007fd528000000  63680K rw---   [ anon ]
> 00007fd52be30000   1856K -----   [ anon ]
11,12c31,32
< 00007fd530000000  36628K rw---   [ anon ]
< 00007fd5323c5000  28908K -----   [ anon ]
---
> 00007fd530000000  65040K rw---   [ anon ]
> 00007fd533f84000    496K -----   [ anon ]
38,39c58,59
< 00007fd57c000000    844K rw---   [ anon ]
< 00007fd57c0d3000  64692K -----   [ anon ]
---
> 00007fd57c000000   1148K rw---   [ anon ]
> 00007fd57c11f000  64388K -----   [ anon ]
1190,1191c1210
<  total          1951764K
<
---
>  total          2607124K
}}}
Hope that helps

Tue, 05 May 2020 17:16:58 GMT - MJ:

A mitigation for me is to run xpra with

ulimit -Sv 2000000     # Set 2GB vmem limit
and periodically reconnect with
xpra upgrade :123 # which restarts xpra server but keeps Xorg and apps

Tue, 05 May 2020 17:19:59 GMT - MJ:

First thing that hits the memory limit is:

with image=XShmImageWrapper(BGRA: 0, 0, 1278, 1368), options={'window-size': (1278, 1368), 'scroll': True}
Traceback (most recent call last):
  File "/usr/lib64/python3.7/site-packages/xpra/server/window/window_video_source.py", line 1816, in may_use_scrolling
    self.encode_scrolling(scroll_data, image, options, match_pct)
  File "/usr/lib64/python3.7/site-packages/xpra/server/window/window_video_source.py", line 1897, in encode_scrolling
    ret = encode_fn(encoding, sub, options)
  File "/usr/lib64/python3.7/site-packages/xpra/server/window/window_source.py", line 2433, in rgb_encode
    self.rgb_zlib, self.rgb_lz4, self.rgb_lzo)
  File "/usr/lib64/python3.7/site-packages/xpra/server/picture_encode.py", line 53, in rgb_encode
    if not rgb_reformat(image, rgb_formats, supports_transparency):
  File "/usr/lib64/python3.7/site-packages/xpra/codecs/rgb_transform.py", line 73, in rgb_reformat
    pixels = pixels.tobytes()
MemoryError

Note in my usecase there is lot of logs scrolling over the screen fair share of the time.

Wed, 06 May 2020 07:15:22 GMT - Antoine Martin:

Note in my usecase there is lot of logs scrolling over the screen fair share of the time.

Does the problem go away if you start your server with --env=XPRA_SCROLL_ENCODING=0 ?

Wed, 06 May 2020 16:37:43 GMT - Antoine Martin:

I really can't see any memory increase here. r26271 switches to the builtin tracemalloc for memleak debugging:

XPRA_DETECT_MEMLEAKS=1 python3 /usr/bin/xpra start --start=xterm  --bind-tcp=0.0.0.0:10000 --no-daemon

Doesn't show anything interesting unfortunately.

Sat, 09 May 2020 06:57:24 GMT - Antoine Martin: owner, status changed

owner changed from MJ to Antoine Martin
status changed from new to assigned

Running xpra for hours:

 432349 antoine   20   0 1456064 199964  78032 S  73.8   0.6   4:15.65 /usr/bin/python3.7 /usr/bin/xpra start --start=xterm --bind-tcp=0.0.0.0:10000 --daemon=no

In that session, I'm using simulate_console_user.py to generate xterm screen updates and in particular scroll events. After quite a while:

 432349 antoine   20   0 1680216 310724  78964 S  69.4   0.9 144:25.79 /usr/bin/python3.7 /usr/bin/xpra start --start=xterm --bind-tcp=0.0.0.0:10000 --daemon=no

So, not a huge increase, but still 110MB.

Disconnecting the client:

 432349 antoine   20   0 1622844 278192  78964 S   0.0   0.8 145:20.91 /usr/bin/python3.7 /usr/bin/xpra start --start=xterm --bind-tcp=0.0.0.0:10000 --daemon=no

Brings it back down by about 32MB.

It does however increase by around 10MB with each new connection (even for xpra info), at least for the first few.

Maybe try with gtkperf -a to speed it up? Maybe the leak is in C code via malloc? (valgrind it?)

Running with XPRA_DETECT_MEMLEAKS=999999 and looking at the difference on exit shows nothing useful, the biggest "leak" is 2.5MB!

Sat, 09 May 2020 09:28:38 GMT - Antoine Martin: attachment set

attachment set to massif.png

valgrind massif graph

Sat, 09 May 2020 12:53:06 GMT - MJ:

Note in my usecase there is lot of logs scrolling over the screen fair share of the time.

Does the problem go away if you start your server with --env=XPRA_SCROLL_ENCODING=0 ?

It seems it does, it's been couple of hours and the memres stays at 168MB,

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+
 842257 mj        20   0 1239716 166888  70412 S   0.0   2.1   0:34.76 xpra

Few more details.

In my case memory leaks in some 60MB separately mapped blocks as shown in comment:6

My usecase is terminator with 3 panes each with a bash terminal . Perhaps there is an issue with partial screen scrolling? My screen resolution is 1440p.

Sun, 10 May 2020 09:06:51 GMT - Antoine Martin: milestone changed

milestone changed from 4.0 to 4.1

Does the problem go away if you start your server with --env=XPRA_SCROLL_ENCODING=0 ?

It seems it does, it's been couple of hours and the memres stays at 168MB,

That's good, because now I know where to look. And that's also bad, because I've already looked and I really can't see how we would leak memory there. And I can't get any of the usual memleak tools to give me any kind of useful hint.

r26295 makes scroll code more explicit and call free before the garbage collector does - should not matter, but this is better anyway so worth doing.

Mon, 11 May 2020 10:22:07 GMT - MJ:

I went for 4.0-0.r26306 and the issue is still easily reproducible. Also I have solid instructions for reproducing.

The version to be precise:

rpm -i python3-xpra-server-4.0-0.r26306.fc31.x86_64.rpm python3-xpra-4.0-0.r26306.fc31.x86_64.rpm xpra-common-4.0-0.r26306.fc31.noarch.rpm xpra-common-server-4.0-0.r26306.fc31.noarch.rpm ffmpeg-xpra-4.2.2-2.fc31.x86_64.rpm x264-xpra-20190929-1.fc31.x86_64.rpm

Reproduce steps

start xpra with no tweaks (specifically no --env=XPRA_SCROLL_ENCODING=0) with terminator --start-cmd=terminator
in the middle of terminator window right click split horizontally to split it into two panes
in the top terminal pane kick off man bash
scroll the pane with page-down
observe memres resource: every press of page down the memres increases by 7-30MB depending on resolution of the terminator window (and perhaps size of the pane)

Facts: with single pane no observable leakage

Tue, 12 May 2020 14:31:55 GMT - Antoine Martin: owner, status changed

owner changed from Antoine Martin to MJ
status changed from assigned to new

Also I have solid instructions for reproducing.

Doesn't reproduce any problems for me. The memory usage is very stable.

Maybe you're using mmap and you are mistaking the increase due to mmap sharing for an actual memory leak? Unlike a leak, the mmap shared memory usage goes up only until it reaches the size of the mmap area and wraps around (the size varies - usually around 256MB).

Please post the server's -d compress to confirm. If that's the case, you can turn off mmap with --mmap=no to continue to try to identify the leak.

This should not be affected by scrolling, and for whatever reason I am not seeing any scrolling packets when I test using your steps.

Tue, 12 May 2020 20:28:37 GMT - MJ:

I mean the actual memory working set size when using term memory-usage as well as res metric from top. However, when issue gets reproduced virt metric grows as well, so it is somewhat good proxy for tracking res (hence my earlier workaround to use ulimit -S). When problem occurs I can see pmap reporting increase with mem mapped blocks. Note that eventually the amount of res in use is in gigabytes, so obviously virt needs to be way ahead of this and the delta looks roughly constant and around 1GB.

Tue, 12 May 2020 20:47:41 GMT - MJ:

Replying to Antoine Martin:

Maybe you're using mmap and you are mistaking the increase due to mmap sharing for an actual memory leak?

You seem to be right that I too quickly attributed single page-down key press to abnormal increase in memory usage. Indeed in the early phase it might look to be the case. Nonetheless, for me the memory usage quickly surpasses 256MB.

I need to probably correct this step of my instructions:

- * scroll the pane with page-down
+ * press page-down to scroll through the entire man page, and then switch to page-up and repeat

It took me 2 minutes to have res reach 1GB (and virt 2GB). Note in my case it helps if the top pane is less than half of the window perhaps because this ensures quicker screen updates.

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1704506 mj        20   0 2065708   1.1g  43264 S  52.3  14.0   1:33.35 xpra

Note the above was with xpra upgrade --env=XPRA_SCROLL_ENCODING=1 --mmap=no :123

Tue, 12 May 2020 21:00:36 GMT - MJ:

After initial warm-up to 500MB, I can see xpra allocating more and more blocks. When constantly scrolling there is a couple of blocks every 10s perhaps.

Every 2.0s: pmap 1705983 | diff /tmp/pmap1.txt -                                 *** Tue May 12 21:59:26 2020
9,10c9,28
< 00007f0f7c000000  10520K rw---   [ anon ]
< 00007f0f7ca46000  55016K -----   [ anon ]
---
> 00007f0f5c000000  65312K rw---   [ anon ]
> 00007f0f5ffc8000    224K -----   [ anon ]
> 00007f0f60000000  38276K rw---   [ anon ]
> 00007f0f62561000  27260K -----   [ anon ]
> 00007f0f64000000  65460K rw---   [ anon ]
> 00007f0f67fed000     76K -----   [ anon ]
> 00007f0f68000000  65424K rw---   [ anon ]
> 00007f0f6bfe4000    112K -----   [ anon ]
> 00007f0f6c000000  64940K rw---   [ anon ]
> 00007f0f6ff6b000    596K -----   [ anon ]
> 00007f0f70000000  65208K rw---   [ anon ]
> 00007f0f73fae000    328K -----   [ anon ]
> 00007f0f74000000  65172K rw---   [ anon ]
> 00007f0f77fa5000    364K -----   [ anon ]
> 00007f0f78000000  64924K rw---   [ anon ]
> 00007f0f7bf67000    612K -----   [ anon ]
> 00007f0f7c000000  64788K rw---   [ anon ]
> 00007f0f7ff45000    748K -----   [ anon ]
> 00007f0f80000000  65408K rw---   [ anon ]
> 00007f0f83fe0000    128K -----   [ anon ]
27,28c45,46
< 00007f0fa4000000    160K rw---   [ anon ]
< 00007f0fa4028000  65376K -----   [ anon ]
---
> 00007f0fa4000000    164K rw---   [ anon ]
> 00007f0fa4029000  65372K -----   [ anon ]
59,60c77,78
< 00007f0fc0000000    132K rw---   [ anon ]
< 00007f0fc0021000  65404K -----   [ anon ]
---
> 00007f0fc0000000    140K rw---   [ anon ]
> 00007f0fc0023000  65396K -----   [ anon ]
1071c1089
<  total          1475884K
---
>  total          2065708K

Tue, 12 May 2020 21:18:30 GMT - MJ: attachment set

attachment set to xpra_debug_compress.txt

xpra debug compress

Tue, 12 May 2020 21:19:00 GMT - MJ:

I've added debug compress logging in an attachement

Wed, 13 May 2020 05:34:57 GMT - Antoine Martin:

OK, so most of your -d compress log entries look like this:

compress:   0.2ms for 1278x1368 pixels at    0,0    for wid=1     using    scroll as   1 rectangles  ( 6829KB)           , sequence  3279, client_options={'window-size': (1278, 1368), 'flush': 2}
3270	2020-05-12 22:14:15,044 compress:   0.8ms for 1278x18   pixels at    0,39   for wid=1     using     rgb24 with ratio   5.4%  (   89KB to     4KB), sequence  3280, client_options={'rgb_format': 'RGB', 'zlib': 4, 'flush': 1, 'window-size': (1278, 1368)}
compress:   0.8ms for 1278x21   pixels at    0,361  for wid=1     using     rgb24 with ratio   4.8%  (  104KB to     5KB), sequence  3281, client_options={'rgb_format': 'RGB', 'zlib': 4, 'window-size': (1278, 1368)}
compress:   1.9ms for 1258x22   pixels at    1,361  for wid=1     using     rgb24 with ratio   4.6%  (  108KB to     5KB), sequence  3281, client_options={'rgb_format': 'RGB', 'zlib': 5}
3273	2020-05-12 22:14:15,786 compress:   0.3ms for 1278x1368 pixels at    0,0    for wid=1     using    scroll as   1 rectangles  ( 6829KB)           , sequence  3283, client_options={'window-size': (1278, 1368), 'flush': 1}
compress:   1.4ms for 1278x21   pixels at    0,361  for wid=1     using     rgb24 with ratio   4.8%  (  104KB to     5KB), sequence  3284, client_options={'rgb_format': 'RGB', 'zlib': 5, 'window-size': (1278, 1368)}

A mixture of scroll and rgb24.

This raises some questions: why is your system not using lz4? (zlib is absolutely terrible) How did you install xpra? Are you not using the repository and dnf? Is python-pillow not installed?

I need to probably correct this step of my instructions:

No matter how many times I scroll up and down, I don't get any memory increase, apart from the initial mmap related one.

Wed, 13 May 2020 10:12:18 GMT - MJ:

I used sudo dnf install xpra

$ sudo dnf install python3-pillow
Last metadata expiration check: 0:24:14 ago on Wed May 13 10:47:13 2020.
Package python3-pillow-6.2.2-1.fc31.x86_64 is already installed.

I also have python3-lz4 package lz4-libs

Wed, 13 May 2020 15:08:08 GMT - Antoine Martin:

I also have python3-lz4 package lz4-libs

That's odd. Then there must be something in your configuration or command lines that makes xpra not use lz4. Please post any differences shown in xpra showconfig (for both client and server), and the xpra info output for the server.

Does your client have opengl acceleration enabled or not? (and does it make any difference for the leak?)

I'll try to reproduce by forcing rgb24 with scroll. New locations to look for leaks: rgb_encode / rgb_reformat, image.may_restride, image.get_sub_image and freeing.

Wed, 13 May 2020 17:33:16 GMT - MJ:

Client showconfig shows no changes.

Server config:

[mj@mj-opti7040 ~]$ xpra showconfig
(..)
xvfb                  (used)   = 'xpra_Xdummy -noreset -novtswitch -nolisten tcp +extension GLX +extension RANDR +extension RENDER -auth $XAUTHORITY -logfile ${XPRA_LOG_DIR}/Xorg.${DISPLAY}.log -configdir ${XDG_RUNTIME_DIR}/xpra/xorg.conf.d/$PID -config /etc/xpra/xorg.conf'  <class 'str'>
xvfb                 (default) = '/usr/libexec/Xorg -noreset -novtswitch -nolisten tcp +extension GLX +extension RANDR +extension RENDER -auth $XAUTHORITY -logfile ${XPRA_LOG_DIR}/Xorg.${DISPLAY}.log -configdir ${XDG_RUNTIME_DIR}/xpra/xorg.conf.d/$PID -config /etc/xpra/xorg.conf'  <class 'str'>

Server side command:

( ulimit -Sv 3000000; xpra upgrade --env=XPRA_SCROLL_ENCODING=1 :123)

Client side command

PS C:\Program Files\Xpra> .\Xpra_cmd.exe --version
xpra v3.0.8-r25879
.\Xpra_cmd.exe attach --quality=100 --speaker=off --ssh='wsl ssh' ssh/mjd/123 --start=terminator

Note if I missed to mentioned: except for one experiment on this bug I used xpra v3.0.9-r26127. I abandoned xpra 4.0 as it has additional problem that looks like a back buffer flicker.

Mon, 18 May 2020 07:48:07 GMT - MJ:

I have tried with and without opengl, in both cases memory leaks into gigabytes, and in both cases -d compress shows compression zlib

For the record my command line was:

( ulimit -Sv 3000000; xpra upgrade --env=XPRA_SCROLL_ENCODING=1 --mmap=no -d compress :123)

Thu, 21 May 2020 14:26:11 GMT - Antoine Martin: priority changed

priority changed from critical to major

I abandoned xpra 4.0 as it has additional problem that looks like a back buffer flicker.

Which windows? All of them? Only text windows? My guess is that this probably goes away if you run the client with --opengl=force. (or --opengl=off)

... and in both cases -d compress shows compression zlib

I strongly suspect that you're leaving a key detail out, because it should be impossible to end up using zlib instead of lz4 with any standard installation. No matter what I do, I end up with lz4 and never see any zlib anywhere. So I am lowering the priority.

When the client is connected, your server should show both client.lz4=True and network.lz4=True when you run xpra info | grep "\.lz4".

Thu, 21 May 2020 15:13:20 GMT - MJ:

Looks I have client.lz4=False I have no idea how the value is set and how it matters whether it would be zlib or lz4 unless the problem was with zlib specifically.

Thu, 21 May 2020 15:37:50 GMT - Antoine Martin:

Looks I have client.lz4=False

I've just downloaded https://xpra.org/dists/windows/Xpra-Python3-x86_64_Setup_3.0.9-r26126.exe and then installed it in a clean windows-10 VM, and I get client.lz4=True. (also with the "client" build) So something funky is going on with your system. It should also show lz4 : True if you run Network_info.exe. It does here.

Thu, 21 May 2020 19:23:58 GMT - MJ:

Replying to Antoine Martin:

It should also show lz4 : True if you run Network_info.exe. It does here.

It shows False in my case,

Thu, 21 May 2020 19:45:14 GMT - MJ:

I have reinstalled client to the version you suggested. lz4 is now true and debug shows it is in effect.

More importantly the memory issue has apparently gone away even if I peg the client to zlib.

The bug no longer affects me. However, the bug still exist as it seems a dodgy client can make server leak memory and the memory is not reclaimed on client disconnection.

Thanks for all the help.

Fri, 22 May 2020 02:16:04 GMT - Antoine Martin:

However, the bug still exist as it seems a dodgy client can make server leak memory and the memory is not reclaimed on client disconnection.

Yes, there are actually two bugs I still need to fix:

server side issue with zlib
client side builds missing lz4 (and maybe also lzo?)

Can you tell me which exact version of the installer is missing lz4? Ideally as an md5 sum of the exe. We have so many builds, it's hard to know where to look! (pure client vs regular builds, python2 vs python3, 32-bit vs 64-bit.. many combinations!)

Fri, 22 May 2020 07:05:32 GMT - MJ:

The client I used was xpra v3.0.8-r25879, I could see the lz4 dll being present there and certainly it was on the default list of --compressors.

Fri, 22 May 2020 07:39:27 GMT - Antoine Martin:

The client I used was xpra v3.0.8-r25879

There are 8 different builds for 3.0.8 on mswindows. Which one was it?

Fri, 22 May 2020 12:27:13 GMT - MJ:

Xpra-Client-Python2-x86_64_3.0.8-r25879.msi was the bad one while now I am on Xpra-Python3-x86_64_Setup_3.0.9-r26126.exe The big differences are python2 vs python3 and msi installer vs exe

md5sum:
5694c85c55f6d9b4d2aa1afb74cd00b4  Xpra-Client-Python2-x86_64_3.0.8-r25879.msi

Sun, 24 May 2020 15:43:35 GMT - Antoine Martin: owner, status changed

owner changed from MJ to Antoine Martin
status changed from new to assigned

First problem identified: liblz4.dll ended up duplicated in two paths, and the library loader failed to load it from either of them.. (probably a difference between python2 and python3?). r26460 fixes that. (backport in r26461)

Three issues remain:

there's a new packaging problem with python2 builds: something broke python-gobject so the latest build has lz4 support (can be seen via Network_info.exe) but the client doesn't run: #2777
why didn't the server switch to lzo? (maybe not installed? will check)
the actual memory leak

Tue, 22 Sep 2020 07:41:29 GMT - tomuxi:

(Following this ticket as memory leaks on the server side have been the major reason for not being able to recommend xpra for our other users so far.)

Tue, 22 Sep 2020 07:43:53 GMT - Antoine Martin: owner, status changed

owner changed from Antoine Martin to MJ
status changed from assigned to new

More importantly the memory issue has apparently gone away even if I peg the client to zlib. (..) memory leaks on the server side have been the major reason ...

Is it not gone then?

Tue, 22 Sep 2020 08:04:30 GMT - MJ:

I was able to reproduce this only with xpra3x+python2 windows client, and not since I switched the client. However, the leaks ave been occurring on server side. It is just that said client was particularly able in exposing the bug.

Wed, 23 Sep 2020 06:58:24 GMT - tomuxi:

(In my case the leaks were appearing in a similar setup - using terminator with multiple tabs and splits, and a xpra client (from 3.0.9-python2-setup.exe) that was not able to use lz4 - yielding to gigabyte size leaks on server side over some weeks, ultimately causing swapping and ooming. After switching to a client (from 3.0.11-python2-setup.exe) that is able to use lz4 things seem better on the server side at least after two days. Previously, similar memory problems have been occurring on the server side since I started to use xpra (since 2.something).

Thu, 24 Sep 2020 11:49:29 GMT - Antoine Martin: owner, status changed

owner changed from MJ to Antoine Martin
status changed from new to assigned

I think I can reproduce the problem with r27540 and these specific commands on Linux:

XPRA_ALPHA=0 xpra start --start=xterm --bind-tcp=0.0.0.0:10000 -d compress,encoding
XPRA_ALPHA=0 XPRA_LZO=0 XPRA_LZ4=0 xpra attach --no-mmap -d draw --encodings=scroll,rgb

Then scrolling in terminator after splitting horizontally. It seems to be the scroll packets that cause the leak. Scrolling one page at a time doesn't do it, one line at a time does.

Fri, 25 Sep 2020 10:38:52 GMT - tomuxi:

Unfortunately also for me, using lz4 and 3.0.11 for both server and client, still leaking. After 4 days of moderate terminator session xpra is using 2g. It seems mouse-wheel scrolling (line-by-line) indeed aggressively increases this amount.

Fri, 25 Sep 2020 11:50:49 GMT - Antoine Martin: attachment set

attachment set to malloc-tracker.patch

replace malloc / free with tracking functions

Fri, 25 Sep 2020 12:00:55 GMT - Antoine Martin:

Using the patch above (easier to use than valgrind!), it's pretty clear that we aren't leaking through malloc / free calls in the cython scrolling detection as the tracker is always almost empty.

My suspicions turn to:

restride
non-scroll sub-image (+rgb encode?)

Sat, 26 Sep 2020 14:51:45 GMT - Antoine Martin:

Using tracemalloc around do_scroll_encoding and comparing between 100 runs shows nothing unexpected:

[ Top 10 ]
/usr/lib64/python3.7/site-packages/xpra/os_util.py:118: size=144 KiB (-143 KiB), count=2013 (-1994), average=73 B
/usr/lib64/python3.7/site-packages/xpra/x11/server_keyboard_config.py:100: size=96 B (-71.2 KiB), count=1 (-760), average=96 B
/usr/lib64/python3.7/site-packages/xpra/server/source/source_stats.py:154: size=92.7 KiB (+61.2 KiB), count=1181 (+786), average=80 B
/usr/lib64/python3.7/site-packages/xpra/server/source/source_stats.py:113: size=53.6 KiB (+52.7 KiB), count=757 (+753), average=73 B
/usr/lib64/python3.7/site-packages/xpra/x11/x11_server_core.py:490: size=68.1 KiB (-45.8 KiB), count=967 (-651), average=72 B
/usr/lib64/python3.7/site-packages/xpra/x11/models/window.py:788: size=0 B (-16.0 KiB), count=0 (-1)
/usr/lib64/python3.7/site-packages/xpra/util.py:374: size=15.6 KiB (-15.6 KiB), count=1 (-1), average=15.6 KiB
/usr/lib64/python3.7/site-packages/xpra/server/source/source_stats.py:101: size=30.2 KiB (+13.2 KiB), count=332 (+149), average=93 B
/usr/lib64/python3.7/logging/__init__.py:1472: size=25.8 KiB (+11.2 KiB), count=300 (+130), average=88 B
/usr/lib64/python3.7/site-packages/xpra/server/source/source_stats.py:100: size=22.3 KiB (+9768 B), count=260 (+111), average=88 B

Sat, 26 Sep 2020 16:00:39 GMT - Antoine Martin: status changed; resolution set

status changed from assigned to closed
resolution set to fixed

Finally fixed in r27552. (stable updates coming soon)

This bug required a combination of:

x11 seamless servers
scrolling - which uses sub-images
rgb encoding, with either re-striding or rgb pixel order reformatting

It's been present for a long time, but was more of a problem in recent versions because we use scrolling detection more aggressively now.

Feel free to re-open if you still reproduce the problem.

Sat, 23 Jan 2021 05:59:13 GMT - migration script:

this ticket has been moved to: https://github.com/Xpra-org/xpra/issues/2730