
Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#912 closed defect (fixed)

"Too many open files" caused by 0.14 clients with trunk (0.16) servers, after many sound restarts

Reported by: Antoine Martin Owned by: Antoine Martin
Priority: blocker Milestone: 0.16
Component: server Version: 0.15.x
Keywords: Cc:

Description

My OSX clients running in vbox get lots of sound restarts, and each restart makes the server start a new sound subprocess.
Eventually this leads to:

2015-07-08 11:22:39,408 error setting up sound: [Errno 24] Too many open files
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/server/source.py", line 799, in start_sending_sound
    ss.start()
  File "/usr/lib64/python2.7/site-packages/xpra/sound/wrapper.py", line 192, in start
    subprocess_caller.start(self)
  File "/usr/lib64/python2.7/site-packages/xpra/net/subprocess_wrapper.py", line 313, in start
    self.process = self.exec_subprocess()
  File "/usr/lib64/python2.7/site-packages/xpra/net/subprocess_wrapper.py", line 338, in exec_subprocess
    proc = subprocess.Popen(self.command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=sys.stderr.fileno(), env=self.get_env(), **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 702, in __init__
    errread, errwrite), to_close = self._get_handles(stdin, stdout, stderr)
  File "/usr/lib64/python2.7/subprocess.py", line 1130, in _get_handles
    c2pread, c2pwrite = self.pipe_cloexec()
  File "/usr/lib64/python2.7/subprocess.py", line 1175, in pipe_cloexec
    r, w = os.pipe()
OSError: [Errno 24] Too many open files

And maybe other weird behaviour as we run out of resources.
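The failure mode in the traceback above can be reproduced in isolation. This is only a sketch, not xpra code: the helper name `pipes_until_exhausted` and the limit of 64 are made up for illustration. It lowers the soft `RLIMIT_NOFILE` limit, leaks pipes the way a leaking restart would, and catches the resulting `OSError` (POSIX only).

```python
import errno
import os
import resource

def pipes_until_exhausted(limit=64):
    """Hypothetical helper: leak pipes until os.pipe() fails.

    Returns (number_of_pipes_opened, errno_of_the_failure).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Lower the soft limit temporarily so the failure happens quickly.
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(limit, soft), hard))
    leaked = []
    try:
        while True:
            # Each pipe costs two descriptors, and nothing closes them.
            leaked.append(os.pipe())
    except OSError as e:
        return len(leaked), e.errno
    finally:
        # Clean up the simulated leak and restore the original limit.
        for r, w in leaked:
            os.close(r)
            os.close(w)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

count, err = pipes_until_exhausted()
print("opened %i pipes before failing with %s" % (count, errno.errorcode[err]))
```

The errno reported should be EMFILE (24), the same one as in the server log.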

Relevant links:

The subprocess should be garbage collected after we call terminate() and cause a SIGINT: the child reaper has code to explicitly "forget" about the dead process.
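For reference, a minimal sketch of what a clean stop looks like with plain `subprocess` (this is not the xpra wrapper or its child reaper, and `start_and_stop_child` is a made-up name). Note that Python's own `Popen.terminate()` sends SIGTERM on POSIX; whatever signal the xpra wrapper ends up delivering is not shown here. The point is that the parent's pipe ends have to be closed as well as the child being reaped.

```python
import signal
import subprocess
import sys

def start_and_stop_child():
    """Hypothetical helper: start a child with pipes, stop it, release everything."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    proc.terminate()       # SIGTERM on POSIX
    proc.wait()            # reap the child so it does not linger as a zombie
    # The parent's ends of the pipes are file descriptors too:
    proc.stdin.close()
    proc.stdout.close()
    return proc

proc = start_and_stop_child()
print("returncode:", proc.returncode)
print("pipes closed:", proc.stdin.closed and proc.stdout.closed)
```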

Change History (6)

comment:1 Changed 4 years ago by Antoine Martin

Status: new → assigned

I can see the leak simply by keeping an eye on:

watch "ls -la /proc/19763/fd  | wc -l"

It goes up by 8 file descriptors with every restart!

And you can trigger this bug with 0.14.x clients more easily by using XPRA_SOUND_FAKE_OVERRUN=1 xpra attach ..
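The same check as the `watch` command can be done from Python. This is a rough sketch, assuming Linux (`/proc`); `count_open_fds`, `fd_delta` and `leaky_restart` are illustrative names and not part of xpra or of the r9906 change.

```python
import os

def count_open_fds(pid="self"):
    # Linux only: each entry in /proc/<pid>/fd is one open descriptor.
    # Listing the directory briefly uses a descriptor itself, but it does so
    # on every call, so differences between two counts are unaffected.
    return len(os.listdir("/proc/%s/fd" % pid))

def fd_delta(action, pid="self"):
    """Run action() and return how many descriptors the process gained."""
    before = count_open_fds(pid)
    action()
    return count_open_fds(pid) - before

leaked = []

def leaky_restart():
    # Simulated leak: a pipe (two descriptors) that is never closed.
    leaked.append(os.pipe())

print("fds gained by one simulated restart:", fd_delta(leaky_restart))
```

Pass the server's pid (19763 in the command above) as `pid` to watch another process instead of the current one.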


comment:2 Changed 4 years ago by Antoine Martin

Resolution: fixed
Status: assigned → closed

The leak was coming from the new palib (used in trunk only) and not from the subprocess code!

  • r9905 reverts back to "pactl" for now
  • r9906 makes it easier to look for fd leaks

comment:3 Changed 4 years ago by Antoine Martin

Summary: "Too many open files" caused by 0.14 clients with 0.15 servers, after many sound restarts → "Too many open files" caused by 0.14 clients with trunk (0.16) servers, after many sound restarts

comment:4 Changed 4 years ago by Antoine Martin

I was going to replace this palib code with a "more simple" dbus version (in a separate process if needed, since python dbus does not play well with threading - and we want synchronous calls, which may not be possible).

But oh my. Every single example out there of using pulseaudio via python dbus is completely broken, at least with Fedora 22 and Ubuntu Vivid. They just don't run because the connection fails. This includes the reference code ("connecting to server"), which spends a lot of lines of code trying to figure out where the pulseaudio socket lives (and why do they keep on changing it without keeping at least a symlink in the old location?), and still gets it wrong. Looks like it should be using /run/user/1000/pulse/native instead (for uid=1000).

Once you get past this hurdle, you get the very unhelpful org.freedesktop.DBus.Error.NoReply, which doesn't tell you why things don't work at all, and from then on your connection is dropped and you get org.freedesktop.DBus.Error.Disconnected.
The only thing I found was in the system log: [pulseaudio] pstream.c: Received SHM frame on a socket where SHM is disabled.
I didn't ask for SHM and I don't need it either, wth?

This is par for the course given the constant API breakage coming from systemd / freedesktop (previous example here: ticket:492#comment:3)

Options left:

  • give up and live with pactl
  • find the leak in palib
  • use python-pulseaudio (and hope it doesn't leak either)

That's enough time wasted for now.


comment:5 Changed 3 years ago by Antoine Martin

The palib code is not getting fixed anytime soon, so it was removed completely in r12177.

comment:6 Changed 3 years ago by Antoine Martin

See #1148.
