Opened 8 months ago

Closed 8 months ago

Last modified 4 months ago

#1473 closed enhancement (fixed)

HTML5: audiocontext caching, re-connect, events, etc

Reported by: JAremko
Owned by: JAremko
Priority: trivial
Milestone: 2.1
Component: html5
Version: trunk
Keywords:
Cc:

Description

I'm making an April Fools' Day button for spacemacs.org with Xpra HTML5 and a Docker Swarm back-end behind the default load balancer. Here it is: http://xpratest.tk (temporary location). I had to modify the Xpra code to make it work properly(ish).
I think it would be great to make this doable with unmodified Xpra.

So, the modifications:
First of all, the client needs the ability to reconnect without refreshing the page. This includes killing web workers and timers. Also, the Client constructor calls Utilities.getAudioContext, which creates audio contexts (they are limited per page).
Also, the error WebSocket connection to 'ws://****/' failed: Error in connection establishment: net::ERR_CONNECTION_REFUSED should be handled properly. (It occurs when there are no live back-ends.)

On the server side, I needed a way to drop the new WS client instead of the old one. This way a client will keep attempting to connect until it hits a live container without a client. It has to happen before the new client is able to interfere with the previous one (for example, by forcing a disconnect).

- - - - - - - - - -

WARNING: eye bleed inducing code smell ahead!

server: https://github.com/JAremko/browsermax
client: https://github.com/JAremko/develop.spacemacs.org

For now, random hello timeouts seem to be the biggest problem. I had to raise the timeout substantially. (It may be a Docker-related problem: I built it from the GitHub trunk because I needed some extra sandbox features.)

Change History (26)

comment:1 Changed 8 months ago by JAremko

Also, protecting the Xpra server with a CAPTCHA would be great :P It would help in cases like this, when you just want to embed an Xpra window on a site and let users access it with some basic abuse prevention, but without custom reverse proxy shenanigans.

comment:2 Changed 8 months ago by Antoine Martin

Owner: changed from Antoine Martin to JAremko

A more generic solution to your "Drop-latest-WS-connection.patch" has been added in r15393: if "steal" is false (not the default), we reject the connection if a client is already connected. (with some other cleanups thrown in)

The AudioContext limitation: wouldn't it be cleaner to just cache the return value in Utilities.getAudioContext? (this should be safe as it is never called from a worker?)
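The caching idea above can be sketched roughly as follows. This is only an illustration: the `Utilities` object here is a stand-in, not the real client code, and the `scope` parameter is an assumption added so the sketch does not depend on a browser `window`.

```javascript
// Stand-in for the client's Utilities object (hypothetical); only the idea of
// caching the return value comes from the discussion above.
const Utilities = {
  _audioContext: null,

  // Returns one shared audio context, creating it on first use.
  // `scope` is the object exposing the constructor (window in a real page).
  getAudioContext(scope = globalThis) {
    if (this._audioContext) {
      return this._audioContext; // reuse: contexts are limited per page
    }
    const Ctor = scope.AudioContext || scope.webkitAudioContext;
    if (!Ctor) {
      return null; // no Web Audio support; nothing cached, a later call retries
    }
    this._audioContext = new Ctor();
    return this._audioContext;
  },
};
```

With this shape, constructing the Client several times on one page would still create a single context, which is the point of the suggestion.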

As for the hello timeouts: do you really need 120 seconds?!
Where is it getting stuck during all that time?

The Error in connection establishment: net::ERR_CONNECTION_REFUSED - isn't this handled already? I get sent back to the connect page.

I think that captcha stuff is out of scope as this would make it tied to an API.

comment:3 in reply to:  2 Changed 8 months ago by JAremko

Hi Antoine, thanks for responding!


Replying to Antoine Martin:

A more generic solution to your "Drop-latest-WS-connection.patch" has been added in r15393: if "steal" is false (not the default), we reject the connection if a client is already connected. (with some other cleanups thrown in)

Thx. I'll look into it.


The AudioContext limitation: wouldn't it be cleaner to just cache the return value in Utilities.getAudioContext? (this should be safe as it is never called from a worker?)

You're right. This is how it should be handled in the "official implementation". I just do not need sound at all :)


As for the hello timeouts: do you really need 120 seconds?!
Where is it getting stuck during all that time?

I think it is something related to my Docker setup, or due to the web browser's "HTTP simultaneous connections per host limit". So I need better tests if I want to get serious with this :P So far it looks like this huge timeout thingy doesn't hurt.


The Error in connection establishment: net::ERR_CONNECTION_REFUSED - isn't this handled already? I get sent back to the connect page.

new WebSocket(...) in the web worker probably should be wrapped in something that allows retrying.
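A minimal sketch of such a wrapper, under stated assumptions: `connectWithRetry`, its option names, and the injected `createSocket` and `schedule` callbacks are all hypothetical and are not part of the Xpra client. It only covers connection establishment; a drop after a successful open is not retried here.

```javascript
// Sketch only: createSocket and schedule are injected so the logic does not
// depend on a particular page or worker setup.
function connectWithRetry(createSocket, options = {}) {
  const maxRetries = options.maxRetries ?? 5;
  const delayMs = options.delayMs ?? 1000;
  const schedule = options.schedule ?? ((fn, ms) => setTimeout(fn, ms));
  const onOpen = options.onOpen ?? (() => {});
  const onGiveUp = options.onGiveUp ?? (() => {});
  let attempt = 0;

  function tryOnce() {
    attempt += 1;
    let settled = false; // a failed connection usually fires both error and close

    const fail = () => {
      if (settled) return;
      settled = true;
      if (attempt > maxRetries) {
        onGiveUp(attempt); // total attempts = 1 + maxRetries
      } else {
        schedule(tryOnce, delayMs);
      }
    };

    let socket;
    try {
      socket = createSocket(); // e.g. () => new WebSocket(url)
    } catch (e) {
      fail(); // the constructor itself can throw, e.g. on a malformed URL
      return;
    }
    socket.onopen = () => { settled = true; onOpen(socket, attempt); };
    socket.onerror = fail;
    socket.onclose = fail;
  }

  tryOnce();
}
```

In a page or worker this could be called as `connectWithRetry(() => new WebSocket(url), { maxRetries: 10 })`, where `url` is whatever address the client already uses.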


I think that captcha stuff is out of scope as this would make it tied to an API.

How about some kind of universal interface for captcha providers? Example: xpra start --captcha-provider=/usr/local/bin/captcha ... Here /usr/local/bin/captcha is a user-made proxy to a captcha API like reCAPTCHA. It returns the HTML code that will be shown by the Xpra HTML5 client, plus a way to verify the answer (maybe when called with the user's response as an argument?). If the server's captcha uses the same TCP connection (WS channel), it will simplify load balancing. But it will need a timeout on captcha solving to prevent DoS.

Last edited 8 months ago by JAremko

comment:4 Changed 8 months ago by JAremko

Also, when restarting the Client, it doesn't make sense to retest things like web worker support. And the Xpra Client should clean up the Xpra container element.

comment:5 Changed 8 months ago by JAremko

If Xpra had a Client onconnect event, it would help with switching the host page interface (changing from the starting to the started state, for example).
And events like "on first GUI element (window?) appeared" and "on last GUI element disappeared" would make this ugly crutch unnecessary.

comment:6 Changed 8 months ago by Antoine Martin

  • audiocontext caching done in r15399
  • r15404 adds the 4 event hooks you need (see changes to index.html for usage) - note: the "on_last_window" may fire more than once if the client stays connected and a new window is shown then destroyed, "on_first_ui_event" will only fire once - but it will fire again after a "re-connection". See below:
  • "reconnect" option added in r15402 (with lots of other cleanups) - that should deal with ERR_CONNECTION_REFUSED

Note: this doesn't fire when the server is terminated normally (the "disconnect" packet handler still goes back to the connect page).

Tested by killing the server with "kill -9" and then re-starting one quickly: the html5 client connects to the new server. (forcibly killing the TCP connection should have the same effect)
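To illustrate how a host page might use the two hooks named above, here is a rough sketch. `StubClient` is a stand-in written for this example, not the real HTML5 client; only the hook names `on_first_ui_event` and `on_last_window` come from the comment, while the hook signatures and the choice to treat the first shown window as the first UI event are assumptions.

```javascript
// Stand-in for the HTML5 client (hypothetical): it models the firing rules
// described above, nothing more.
class StubClient {
  constructor() {
    this.on_first_ui_event = null;
    this.on_last_window = null;
    this._windows = new Set();
    this._uiEventSeen = false;
  }
  // A (re-)connection resets the "first UI event" flag.
  connected() {
    this._uiEventSeen = false;
    this._windows.clear();
  }
  windowShown(id) {
    this._windows.add(id);
    if (!this._uiEventSeen) {
      this._uiEventSeen = true;
      if (this.on_first_ui_event) this.on_first_ui_event();
    }
  }
  windowDestroyed(id) {
    const removed = this._windows.delete(id);
    if (removed && this._windows.size === 0 && this.on_last_window) {
      this.on_last_window(); // may fire more than once per connection
    }
  }
}

// Host page side: track a simple "starting" / "started" / "idle" state.
function attachPageState(client, page) {
  client.on_first_ui_event = () => { page.state = "started"; };
  client.on_last_window = () => {
    page.state = "idle";
    page.lastWindowCount += 1;
  };
}
```

Because `on_last_window` can fire repeatedly while `on_first_ui_event` fires once per connection, a page that needs to return to "started" when a later window appears would have to track that itself.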


I don't think I will ever have time or interest in the captcha API feature, so please create a separate ticket for that if you wish. (bearing in mind that unless you start working on it, not much is likely to happen...)

comment:7 in reply to:  6 Changed 8 months ago by JAremko

Replying to Antoine Martin:

This is amazing, thank You very much!

comment:8 in reply to:  6 Changed 8 months ago by JAremko

Replying to Antoine Martin:

  • "reconnect" option added in r15402

Can this.reconnect_count=0 (or -1) mean retrying indefinitely?

comment:9 in reply to:  6 Changed 8 months ago by JAremko

Replying to Antoine Martin:

I'm getting error Uncaught ReferenceError: me is not defined at Client.js:30

// assign callback for window resize event
if (window.jQuery) {
  jQuery(window).resize(jQuery.debounce(250, function (e) {
    me._screen_resized(e, me);
  }));
}

http://xpra.org/trac/browser/xpra/trunk/src/html5/js/Client.js?rev=15402#L30

Last edited 8 months ago by JAremko

comment:10 Changed 8 months ago by JAremko

The Chrome tab dies if the server is unreachable, because multiple Protocol.js worker threads are alive simultaneously.

http://i.imgur.com/lG8jtOx.png

Also, they keep trying to connect to the server even when I'm clearly connected and can interact with the GUI.

To reproduce at http://xpratest.tk/ : connect, then disconnect by closing the window, and try to connect again from the same tab.

Can it be because I have such a huge hello timeout?

I set the back-end count to 1 so it will be easier to debug. Also, the firewall allows only 1 new connection per 20 seconds (from the same IP).

Last edited 8 months ago by JAremko

comment:11 Changed 8 months ago by JAremko

The steal=false option seems to work, but the log entry may be incomplete. (I don't see the text part.)

Last edited 8 months ago by JAremko

comment:12 Changed 8 months ago by JAremko

Should reconnect occur when server or client timeout happens?

Last edited 8 months ago by JAremko

comment:13 Changed 8 months ago by Antoine Martin

The Chrome tab dies if the server is unreachable, because multiple Protocol.js worker threads are alive simultaneously.
Also, they keep trying to connect to the server even when I'm clearly connected and can interact with the GUI.

How do I reproduce this with the default xpra html5 client?
There should only be a single worker, which we re-use.
The re-connection should only happen when the current connection failed or dropped.

The steal=false option seems to work, but the log entry may be incomplete. (I don't see the text part.)

I don't understand what you mean. What logs? Can you show an "incomplete" sample of the log you are talking about?

Should reconnect occur when server or client timeout happens?

As of r15414, we re-connect on ping echo timeouts - which now also trigger more quickly. (15 seconds, configurable)
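The ping echo timeout logic can be pictured with a small sketch like the one below. `PingWatchdog` and its method names are hypothetical and do not reflect how the client implements this; the 15000 ms default simply mirrors the 15 seconds mentioned above, and the clock is injected so the behaviour can be checked without waiting.

```javascript
// Sketch: fires a callback when a ping has gone unanswered for too long.
class PingWatchdog {
  constructor(onTimeout, timeoutMs = 15000, now = () => Date.now()) {
    this.onTimeout = onTimeout; // e.g. a function that starts a re-connection
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.pendingSince = null; // time of the oldest unanswered ping
  }
  pingSent() {
    if (this.pendingSince === null) {
      this.pendingSince = this.now();
    }
  }
  echoReceived() {
    this.pendingSince = null;
  }
  // Call periodically (for example from setInterval); fires once per stall.
  check() {
    if (this.pendingSince !== null &&
        this.now() - this.pendingSince >= this.timeoutMs) {
      this.pendingSince = null;
      this.onTimeout();
      return true;
    }
    return false;
  }
}
```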

FYI: r15415 may be of interest to you too, it shows the connection setup progress

Last edited 8 months ago by Antoine Martin

comment:14 in reply to:  13 Changed 8 months ago by JAremko

Replying to Antoine Martin:

  • Uncaught ReferenceError: me is not defined at Client.js:30 is fixed in r15413
  • to retry infinitely, just use Number.MAX_SAFE_INTEGER

    The Chrome tab dies if the server is unreachable, because multiple Protocol.js worker threads are alive simultaneously.
    Also, they keep trying to connect to the server even when I'm clearly connected and can interact with the GUI.

How do I reproduce this with the default xpra html5 client?
There should only be a single worker, which we re-use.
The re-connection should only happen when the current connection failed or dropped.

I had the same problem with my old implementation when the WebSocket constructor failed many times. I solved it with this. Maybe you can make sure that the old Protocol worker is removed before creating a new one?

Maybe the log will help: https://gist.github.com/JAremko/abfb8130d87e85d7df397ea6b112ca80
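One way to picture "remove the old worker before creating a new one" is a small holder that owns at most one worker at a time. This is a sketch under assumptions: `WorkerSlot` and the injected `createWorker` factory are hypothetical names, not client code. It relies only on the standard `Worker.terminate()` method.

```javascript
// Sketch: guarantees that at most one worker created through this slot is alive.
class WorkerSlot {
  constructor(createWorker) {
    this.createWorker = createWorker; // e.g. () => new Worker(scriptUrl)
    this.current = null;
  }
  // Terminates the previous worker (if any), then creates a fresh one.
  replace() {
    this.release();
    this.current = this.createWorker();
    return this.current;
  }
  release() {
    if (this.current) {
      this.current.terminate(); // stops the worker thread immediately
      this.current = null;
    }
  }
}
```

Each reconnect attempt would then call `replace()` instead of constructing a worker directly, so repeated failures cannot accumulate live threads.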

I think to detect it with the default client you need to disable redirect on disconnect and connect to a wrong WS address. (but currently my client is pretty "default") https://github.com/JAremko/develop.spacemacs.org/blob/gh-pages/index.html#L383


The steal=false option seems to work, but the log entry may be incomplete. (I don't see the text part.)

I don't understand what you mean. What logs? Can you show an "incomplete" sample of the log you are talking about?

Oh, the browser log shows only "session busy"; I was looking for "this session is already active". OK.


I noticed that after a reconnect the Client doesn't honor this rule https://github.com/JAremko/develop.spacemacs.org/blob/gh-pages/index.html#L448

Last edited 8 months ago by JAremko

comment:15 Changed 8 months ago by JAremko

Hm...

2017-03-26 06:26:47,482 created unix domain socket: /home/emacs/.emacs.d/.cache/bbd19a81f821-14
Unable to create /home/emacs/.dbus
Unable to create /home/emacs/.dbus/session-bus
2017-03-26 06:26:51,360 serving html content from: /usr/share/xpra/www
2017-03-26 06:26:51,467 started command 'emacs -geometry 100x48 --chdir "/home/emacs/.emacs.d/.cache/workspace"' with pid 50
2017-03-26 06:26:51,467 xpra X11 version 2.1 64-bit
2017-03-26 06:26:51,467  uid=1000 (spacemacser), gid=1000 (xpra)
2017-03-26 06:26:51,467  running with pid 12 on Linux
2017-03-26 06:26:51,468  connected to X11 display :14 with 24 bit colors
2017-03-26 06:26:51,469 15.6GB of system memory
2017-03-26 06:26:51,485 xpra is ready.

(process:51): GLib-GIO-CRITICAL **: g_settings_schema_source_lookup: assertion 'source != NULL' failed
2017-03-26 06:29:53,152 Handshake complete; enabling connection
2017-03-26 06:29:53,159 HTML5 Linux client version 2.1
2017-03-26 06:29:53,159  automatic picture encoding enabled
2017-03-26 06:29:53,160  also available:
2017-03-26 06:29:53,160   jpeg, png, rgb32
2017-03-26 06:29:53,160  client root window size is 1920x1014 with 1 display:
2017-03-26 06:29:53,160   HTML (508x268 mm - DPI: 96x96)
2017-03-26 06:29:53,160     Canvas
2017-03-26 06:29:53,161 setting keyboard layout to 'us'
2017-03-26 06:29:53,207 client 1: got hello: server version 2.1 accepted our connection
2017-03-26 06:29:53,225 client 1: startup complete
2017-03-26 06:29:58,465 Handshake complete; enabling connection
2017-03-26 06:29:58,465 Disconnecting client 10.255.0.2:52434:
2017-03-26 06:29:58,465  new client (this session does not allow sharing)
2017-03-26 06:29:58,466 xpra client 1 disconnected.
2017-03-26 06:29:58,467 HTML5 Linux client version 2.1
2017-03-26 06:29:58,467  automatic picture encoding enabled
2017-03-26 06:29:58,467  also available:
2017-03-26 06:29:58,467   jpeg, png, rgb32
2017-03-26 06:29:58,467 Last client has disconnected, terminating
2017-03-26 06:29:58,467 xpra is terminating.
2017-03-26 06:29:58,471  client root window size is 1920x1014 with 1 display:
2017-03-26 06:29:58,471   HTML (508x268 mm - DPI: 96x96)
2017-03-26 06:29:58,471     Canvas
2017-03-26 06:29:58,472 keyboard mapping already configured (skipped)
2017-03-26 06:29:58,506 client 2: got hello: server version 2.1 accepted our connection
2017-03-26 06:29:58,508 client 2: startup complete

This server has become a zombie

comment:16 Changed 8 months ago by JAremko

So the zombie protocol workers and zombie servers are the biggest two problems now.

comment:17 Changed 8 months ago by JAremko

Looks like all you need to get this "zombie worker" bug in Chrome is to disable redirect on disconnect, attempt to connect to a bad (closed, unresponsive) port with many retries, and wait a minute.

http://i.imgur.com/gx5qkyt.png

Each error block spawns extra workers.

Firefox doesn't seem to be affected.

Last edited 8 months ago by JAremko

comment:18 Changed 8 months ago by JAremko

I made you a test page: http://xpratest.tk/zombie-test

It is the default client (r15425). I only changed the server and port, and enabled debug mode.

Last edited 8 months ago by Antoine Martin

comment:19 Changed 8 months ago by Antoine Martin

Lots of fixes in r15430 + r15431.
Does that work for you?

(PS: please don't add links to the wiki if those are likely to go 404 in the future)

comment:20 in reply to:  19 Changed 8 months ago by JAremko

Replying to Antoine Martin:

Lots of fixes in r15430 + r15431.
Does that work for you?

(PS: please don't add links to the wiki if those are likely to go 404 in the future)

Looks good to me. No zombie workers and it works.

Have you fixed zombie servers as well?

comment:21 Changed 8 months ago by Antoine Martin

Have you fixed zombie servers as well?

What are those?

Can I close this ticket?

comment:22 in reply to:  21 Changed 8 months ago by JAremko

Replying to Antoine Martin:

Have you fixed zombie servers as well?

What are those?

2017-03-26 06:29:58,465 Disconnecting client 10.255.0.2:52434:
2017-03-26 06:29:58,465  new client (this session does not allow sharing)
2017-03-26 06:29:58,466 xpra client 1 disconnected.
2017-03-26 06:29:58,467 HTML5 Linux client version 2.1
2017-03-26 06:29:58,467  automatic picture encoding enabled
2017-03-26 06:29:58,467  also available:
2017-03-26 06:29:58,467   jpeg, png, rgb32
2017-03-26 06:29:58,467 Last client has disconnected, terminating
2017-03-26 06:29:58,467 xpra is terminating.
2017-03-26 06:29:58,471  client root window size is 1920x1014 with 1 display:
2017-03-26 06:29:58,471   HTML (508x268 mm - DPI: 96x96)
2017-03-26 06:29:58,471     Canvas
2017-03-26 06:29:58,472 keyboard mapping already configured (skipped)
2017-03-26 06:29:58,506 client 2: got hello: server version 2.1 accepted our connection
2017-03-26 06:29:58,508 client 2: startup complete

It seems to be related to the first client exiting before the new one completed its handshake, so the Xpra server simply hangs.

Hard to reproduce...

Last edited 8 months ago by JAremko

comment:23 Changed 8 months ago by Antoine Martin

Resolution: set to fixed
Status: changed from new to closed

Please create a new ticket for the server issue - this doesn't look related to the html5 client at all.

comment:24 Changed 8 months ago by JAremko

ok. Thanks for all the hard work! Really appreciate it.

Last edited 8 months ago by JAremko

comment:25 Changed 5 months ago by Antoine Martin

Milestone: changed from future to 2.1
Summary: changed from "HTML5 mode suggestions" to "HTML5: audiocontext caching, re-connect, events, etc"

(edit milestone and title)

See also #1491

comment:26 Changed 4 months ago by Antoine Martin

re-connect bug: #1586
