xpra icon
Bug tracker and wiki

Opened 3 months ago

Last modified 3 weeks ago

#1941 assigned defect

Java Mouse Location Incorrect when moving a window

Reported by: mjharkin Owned by: Antoine Martin
Priority: critical Milestone: 2.5
Component: server Version: trunk
Keywords: Cc:

Description (last modified by Antoine Martin)

I have an undecorated window that I want to move by dragging the panel inside. However under the scenario WindowDragNoDelay the window jumps erratically with the mouse position read incorrectly in Java (also printed to stdout).

Through debugging I've 2 scenarios (WindowDragWithDelay,WindowDragNoMove) that can return the mouse location correctly and 1 (WindowDragNoDelay) that doesn't.

As a control (CentOS without Xpra) WindowDragNoDelay will return the correct mouse location.

Reproduce by installing java and running the windowdrag.jar in attached zip file:

java -cp windowdrag.jar WindowDragNoDelay
java -cp windowdrag.jar WindowDragWithDelay
java -cp windowdrag.jar WindowDragNoMove

Can attach logs if you provide the log categories you want.

Python Client on Windows 10 (also can reproduce with HTML5 client)
Server on CentOS 7.
Client and server both on revision r20214

Attachments (2)

WinDrag.zip (18.7 KB) - added by mjharkin 3 months ago.
Sample jar and source
disable-pointer-adjustment.patch (577 bytes) - added by Antoine Martin 3 months ago.
disable pointer adjustments

Download all attachments as: .zip

Change History (16)

Changed 3 months ago by mjharkin

Attachment: WinDrag.zip added

Sample jar and source

comment:1 Changed 3 months ago by mjharkin

Component: javaserver

comment:2 Changed 3 months ago by Antoine Martin

Description: modified (diff)
Priority: majorcritical
Status: newassigned

Changed 3 months ago by Antoine Martin

disable pointer adjustments

comment:3 Changed 3 months ago by Antoine Martin

Description: modified (diff)

The patch above "fixes" the issue, but we can't apply it.

The problem comes from the fact that since the xpra server and client(s) can be detached and re-attached, each one has its own window model and each model can be different. For example, windows may not be shown on screen exactly where the server thinks they are. That's especially the case when you enable session sharing and connect with multiple clients.

So we keep track of where the clients map each window, and when we process pointer events from a client we adjust the position of the event so that it lands in exactly the same place on the server side window.
When running your example code, we create a race condition: the server moves the window and sends a message to the client to do likewise but before the client has a chance to do so it has sent a pointer motion event which then gets adjusted for the new window position the client knew nothing about at that point.. So we shouldn't adjust it, and that's why the patch "works".

comment:4 Changed 3 months ago by mjharkin

Resolution: wontfix
Status: assignedclosed

Thanks for the quick turn around and explanation, I've tested the patch and confirm it works as expected.

Our use case is single screens per user and sharing disabled so will probably be able to work something out from the patch.

Again really appreciated.

comment:5 Changed 3 months ago by Antoine Martin

Resolution: wontfix
Status: closedreopened

Re-opening: this bug should be fixed, just not with the patch above which was just there to show where the problem came from.

comment:6 Changed 3 months ago by Antoine Martin

Correct fix in r20225, this was hard because everything is asynchronous, including X11 events, client-server messaging, etc
So now the client keeps track of the delta between the window location requested by the server and what it actually is using at any point in time. It communicates this delta value with the server through the "map" and "configure" packets.
Then the server can adjust pointer events with the correct delta, even if the window has moved since the last time the client got a move notification.

Still TODO:

  • ClientTray class could be updated so we can eventually replace the mapped_at attribute with just the delta, since the former is prone to race conditions
  • update the html5 client? (same reason)
  • maybe move all window scaling code to the window class (rather than passing both scaled and unscaled coordinates)
  • created new ticket: #1942 to make the code more manageable in the future
  • re-test: shadow servers, scaling, etc

@mjharkin: your test case now works with this new code. You will need an updated client and server (2.4 beta r20225 or later, builds have been uploaded for mswindows, centos 7.4 and 7.5: https://xpra.org/beta/). This change is not suitable for backporting to older branches.

Note that this is not how you should be doing the window move from your Java code, as this is excruciatingly slow, even on local connections.
Instead of managing the window position yourself (which will happen server-side and cause the churn), you should be requesting that the window manager moves the window for you. This will happen client-side and will be smooth as silk. (though somewhat less smoothly on mswindows and macos than other platforms since we have to emulate the X11 window-manager-spec's _NET_WM_MOVERESIZE: #772)
This can be achieved on X11 with _NET_WM_MOVERESIZE, you can find an example we use for testing here: browser/xpra/trunk/src/tests/xpra/test_apps/test_window_initiatemoveresize.py

comment:7 Changed 3 months ago by mjharkin

I can also confirm the test case works with r20225.

I've ran into 2 issues though:

  1. Using the installer from https://xpra.org/beta/ worked over a tcp connection but I'm getting an error over websocket connection:
 disconnect: zlib packet decompression failed must be string or read-only buffer, not memoryview

Not sure if it's caused by these changes though.

  1. Building the client from scratch on Windows completed successfully but when I run Xpra-Launcher-Debug.exe I'm getting dll module load failures.
2018-08-29 21:44:04,041 Warning: zlib is the only compressor enabled
2018-08-29 21:44:04,045  install and enable lzo or lz4 support for better performance
2018-08-29 21:44:04,049 Xpra gtk2 client version 2.4 64-bit
2018-08-29 21:44:04,058  running on Microsoft Windows 10
2018-08-29 21:44:04,071 Error: failed to load overlay icon 'C:\Users\Ultrabook\git\Xpra\trunk\src\build\exe.mingw-2.7\icons\xpra.png':
Traceback (most recent call last):
  File "./xpra/client/mixins/window_manager.py", line 189, in init
  File "C:/msys64/mingw64/lib/python2.7/site-packages/PIL/Image.py", line 64, in <module>
    from . import _imaging as core
ImportError: DLL load failed: The specified module could not be found.
2018-08-29 21:44:04,097  DLL load failed: The specified module could not be found.
2018-08-29 21:44:04,104 Warning: failed to import opencv:
2018-08-29 21:44:04,108  No module named cv2
2018-08-29 21:44:04,112  webcam forwarding is disabled
2018-08-29 21:44:04,127 Error importing swscale colorspace conversion (csc_swscale)
2018-08-29 21:44:04,130  DLL load failed: The specified module could not be found.
2018-08-29 21:44:04,146 Error importing webp decoder (dec_webp)
2018-08-29 21:44:04,148  DLL load failed: The specified module could not be found.
2018-08-29 21:44:04,155 Error importing JPEG decoder (dec_jpeg)
2018-08-29 21:44:04,162  DLL load failed: The specified module could not be found.
2018-08-29 21:44:04,169 Error importing vpx decoder (dec_vpx)
2018-08-29 21:44:04,171  DLL load failed: The specified module could not be found.
2018-08-29 21:44:04,180 Error importing avcodec2 decoder (dec_avcodec2)
2018-08-29 21:44:04,182  DLL load failed: The specified module could not be found.
2018-08-29 21:44:04,267 get_pixbuf(information.png)
Traceback (most recent call last):
  File "./xpra/client/gtk_base/gtk_client_base.py", line 532, in get_pixbuf
GError: Couldnâ?Tt recognize the image file format for file â?oC:\Users\Ultrabook\git\Xpra\trunk\src\build\exe.mingw-2.7\icons\information.pngâ??
...

Agreed this type of code is less than ideal but I'm constrained by a 3rd party. Funnily enough the 3rd party code is less efficient (more computation) than the test case but results in better performance for this scenario.

comment:8 Changed 3 months ago by J. Max Mena

I dislike popping in to a random ticket, but r20225 introduces a serious mouse bug. I've double checked this with a bisection and r20225 is the guilty revision.

For reference, both the client and server are Fedora 28 machines running trunk Xpra built from source.

r20225 and newer sessions have a mouse offset - in that where the mouse is on the client does not reflect where the server sees the mouse. It's quite simple to reproduce:

  • Start a session with an Xterm (making it fullscreen can help show the offset a little clearer) and attach (checked with SSH and TCP clients)
  • Run dmesg to get a wall of text
  • Try to use the mouse to select some text

On my machine - a non-fullscreen Xterm in this situation is unable to select any text that shows up. Making it go fullscreen - I see that the mouse is selecting text about 10 or so pixels below (about 4 lines of text at my DPI/resolution) where the client mouse is.

I can gather some -d mouse client and server side logs if you'd like - but it's very easy to reproduce and visualize what is happening.

Last edited 3 months ago by J. Max Mena (previous) (diff)

comment:9 Changed 3 months ago by Antoine Martin

Owner: changed from Antoine Martin to J. Max Mena
Status: reopenednew

disconnect: zlib packet decompression failed must be string or read-only buffer, not memoryview

Fixed: ticket:1926#comment:2.

Not sure if it's caused by these changes though.

It wasn't, but thanks for reporting it.
(out of curiosity, what made you try the websocket transport? it is slower after all)

install and enable lzo or lz4 support for better performance

For lz4, see ticket:1929#comment:1. (patching required)
For lzo, try this from a mingw shell:

python -c "import lzo"

Is it installed? Or did it not get packaged properly?
(try to run Network_info.exe -v)

Error: failed to load overlay icon ...
(..)
DLL load failed: The specified module could not be found.

Hmm. Things are not going to work well without Pillow and the codecs.
Maybe your system has a dependency that I don't have and it doesn't get packaged?
Are you fully up to date? (pacman -Syyu)
Try updating all the python libraries too:

sh win32/UPDATE_PYTHON_LIBS.sh

You can try to figure out which DLL is missing using Dependencies (a new "dependency walker") by pointing it to the "pyd" files that generate the DLL warnings.
If you still have problems with this, can you follow up in #1883 or even a new ticket?
(let's keep this ticket about pointer position)


r20225 introduces a serious mouse bug. I've double checked this with a bisection and r20225 is the guilty revision.

r20230 fixes this: when the client moves a window (or maps it), the server should only apply the delta value until the client catches up (if it ever does).
Which means that this statement:

.. so we can eventually replace the mapped_at attribute with just the delta, since the former is prone to race conditions

Is wrong. We need to keep both.

Last edited 3 months ago by Antoine Martin (previous) (diff)

comment:10 Changed 3 months ago by mjharkin

@Antoine Martin

Now using lz4 as you described.

python -c "import lzo"

Gave an error (but not needed as I'll use lz4)

NameError: name 'lzo' is not defined

Still have an issue with Pillow. Am fully up to date on a clean install as per the Windows building guide.
But will continue investigating and update the relevant ticket if needed.

For your interest we are using the websocket connection for 2 reasons:

  1. allow use to use the same port as http traffic (with Nginx proxy).
  2. allow us to load balance easily (by scaling the service in Docker) albeit at the cost of loosing the session on reconnect.

comment:11 Changed 3 months ago by Antoine Martin

Owner: changed from J. Max Mena to Antoine Martin
Status: newassigned

Taking the ticket back: forgot reinit_windows (fixed in r20243) and this solution isn't going to work well with sharing enabled - I think I have a better plan.

Last edited 3 months ago by Antoine Martin (previous) (diff)

comment:12 Changed 3 months ago by Antoine Martin

Reverted r20225 + r20230 in r20249. Still working on a correct solution..

Related: r20252 + r20253 exposes window-relative pointer position in pointer-position and button-action packets, this may help and could actually simplify pointer adjustments when dealing with desktop and shadow servers: with those we never care about the position of the window on the client, only the relative coordinates are useful.

comment:13 Changed 6 weeks ago by Antoine Martin

Milestone: 2.42.5

Too late for 2.4, this sort of change can break things.

comment:14 Changed 3 weeks ago by Antoine Martin

#2008 could also be related to this bug.

Note: See TracTickets for help on using tickets.