xpra icon
Bug tracker and wiki

Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#35 closed defect (fixed)

Window does not properly refresh

Reported by: Doug Doole Owned by: Antoine Martin
Priority: major Milestone: 0.0.7.x
Component: client Version: 0.0.7.32
Keywords: Cc:

Description

In recent builds, I've been having window refresh issues, particularly when using gvim. If I page up or page down, or search for a term that requires moving more than a page in the file, then sometimes the window doesn't redraw properly and the top part of the window is the new text and the bottom part of the window is whatever was displayed previously. The point in the window where the redraw stops is not consistent, although it's generally about mid-window.

Most of the time, using "refresh" from the system tray icon's menu will clear up the drawing problem.

I have observed this behaviour on both 0.0.7.28 and 0.0.7.29. I did not see the problem on 0.0.7.23.

My client is on Kubuntu 11.04 and my server is on Ubuntu 10.04.

Please let me know what diagnostics I can gather to help you narrow this down.

Attachments (4)

q12.sql.basic.mod.exfmt (83.8 KB) - added by Doug Doole 9 years ago.
xpra-show-damage-sequence.patch (5.7 KB) - added by Antoine Martin 9 years ago.
adds debug output to allow us to verify damage sequence numbers
screen-1124.jpg (192.6 KB) - added by Doug Doole 9 years ago.
2011-11-24-130321_3840x2560_scrot.png (134.7 KB) - added by Doug Doole 9 years ago.

Download all attachments as: .zip

Change History (44)

comment:1 Changed 9 years ago by Antoine Martin

Status: newaccepted

Would you mind trying to disable refresh batching by setting:

BATCH_EVENTS = False

in the damage() method in xpra/server.py

I think the problem is that the damage data is arriving in the wrong order (the ones that get batched arrive later than some of the newer damage data).
Can you confirm that this solves the problem and I'll cook up a better patch?

comment:2 Changed 9 years ago by Antoine Martin

This should be fixed properly in r260

comment:3 Changed 9 years ago by Doug Doole

I've applied the change to server.py and after playing with it for a few minutes it looks good.

comment:4 Changed 9 years ago by Doug Doole

I spent enough time using the server.py patch this afternoon (with no redraw glitches) that I'm confident your assessment of the problem is correct.

comment:5 Changed 9 years ago by Doug Doole

I just moved to 0.0.7.30 and the screen is refreshing properly. Thanks for the quick fix.

comment:6 Changed 9 years ago by Antoine Martin

Resolution: fixed
Status: acceptedclosed

comment:7 Changed 9 years ago by Doug Doole

Resolution: fixed
Status: closedreopened

Unfortunately, I just hit a case where the screen is not refreshing properly on 0.0.7.30. So the patch works much better than the original code, but it still has issues.

If I set BATCH_EVENTS = False, the refresh problem does go away.

comment:8 Changed 9 years ago by Antoine Martin

Status: reopenedaccepted

comment:9 Changed 9 years ago by Antoine Martin

Can you tell me how to reproduce?
I have just reviewed (and simplified the code in r269) and cannot see anything wrong.

For reference (please read to ensure this makes sense and that I haven't missed anything obvious), here's how damage regions are handled in the damage() method:

  • if batching is disabled, just queue the new damage region in self._damage for this window id, adding the region to the existing one if one already existed and tell the network code that a packet is ready. ("damage_now")
  • add this damage event to self._damage_last_events
  • if the window id already has a delayed region in self._damage_delayed then we add the damage data to it
  • if there aren't enough events for batching, or if they happened too slowly then we just damage_now (see above)
  • otherwise we create a new entry in self._damage_delayed for this window and schedule the "send_delayed" to run shortly after.

"send_delayed" simply expires this window's entry in self._damage_delayed dict by moving it to "self._damage_delayed_expired" and tell the network code that a packet is ready.
When sending damage packets, we pick them from the expired list first (if any) or from the regular damage list otherwise.

So for example, when the screen suddenly starts updating rapidly, we will fill up self._damage_last_events, and when it reaches the batching limit we will create a new damage_delayed entry and append all new damage regions to it until send_delayed fires. After it does fire (and assuming the fast screen updates continue), a new one will be created and the same process will continue.
When the screen updates slow down, the last few updates will continue to get added to the current damage_delayed entry until send_delayed fires, after that they will fire directly as "damage_now". The order will be preserved since we pick the expired ones ahead of the regular ones...

Last edited 9 years ago by Antoine Martin (previous) (diff)

Changed 9 years ago by Doug Doole

Attachment: q12.sql.basic.mod.exfmt added

comment:10 Changed 9 years ago by Doug Doole

I attached the file q12.sql.basic.mod.exfmt to this bug. To reproduce the problem, all I have to do is:

1) gvim q12.sql.basic.mod.exfmt
2) Maximize the gvim window
3) Start using <page down> to scroll through the file.

The drawing problem occurs on the third page of the file, where the tree structure is being drawn and fills the whole screen.

Some observations:

  • When I first reported this bug, it was rather random when it would happen. With this new occurrence, it is very reproducible.
  • Screen resolution doesn't seem to matter, but the gvim window must be maximized.
  • The drawing problem only appears when scrolling down through the file. Scrolling up doesn't cause the problem.
  • Not all files exhibit this problem. I've got another file with a very similar graph and it displays properly.

Changed 9 years ago by Antoine Martin

adds debug output to allow us to verify damage sequence numbers

comment:11 Changed 9 years ago by Antoine Martin

I have tried scrolling with <page down>, with just <down>, etc... and cannot reproduce with gvim... I have tried with debug on or off, etc. Screen size 1920x1080

The attached patch was used to verify the sequence of the damage packets, and confirms that the packets are received in the correct order (same order as the damage events). So the problem must be somewhere else...
When this does occur, is it possible to get it to redraw the screen region that are messed up without using the "refresh" menu option?

comment:12 Changed 9 years ago by Doug Doole

I have seen the problem on 4 different screen sizes: 1280x1024, 1600x1200, 1680x1050, 1920x1200. However, in all cases the client has been running nVidia's twinview over two screens and the maximized window was filling just one of the monitors. I'll try again with a system that has only one monitor on it, but it may take me a day or two to get the test done.

I hadn't played with the "refresh" option this time, and surprisingly, using the "refresh" option does not clear up the bad drawing. It may or may not cause some small parts of the problem area to redraw, but the majority of the window is still incorrect.

So far the only reliable way I have found to fix the display is to hit <page down>, <page up>. (<page up>, <page down> generally leads to the same bad display.)

comment:13 Changed 9 years ago by Doug Doole

Aha. I think I just found the significant aspect of this problem. When I am looking at that file, I'm using a 3 tier setup.

I use the xpra client to display a konsole window at the xpra server. In the konsole, I do "ssh -X final_machine".

When I gvim the file at the final machine, I get the screen corruption problem. If I bring the file to my xpra server and gvim it there, I do not get the problem.

Sorry, I should have tried removing the extra layer sooner. I have everything hidden behind scripts and generally don't think of the network connections. (This is the first time the extra layer has been an issue. Both my initial report of drawing issues as well as the cursor left/down problem in #36 behave the same whether I have the 3rd tier or not.)

comment:14 Changed 9 years ago by Antoine Martin

That is very strange, I don't understand why the ssh X11 forwarding would cause screen corruption...
Could be an interaction between Xvfb and ssh-X11 I guess, have you tried using Xdummy to see if that helps? (#10)

comment:15 Changed 9 years ago by Doug Doole

I'll take a look at Xdummy, but it might take me a couple days to get to it.

In any case, this seems to be a different issue than the one you fixed. Would you prefer to close off this ticket as fixed, and track this ssh forwarding issue in a new ticket?

comment:16 Changed 9 years ago by Antoine Martin

Hmm, let's leave it here for now, as turning BATCH_EVENTS off solves the issue... this seems to indicate a bug in the damage batching code (although it may still be something else, like a race made more likely with it enabled)

(FYI: there are Xdummy packages in ticket #10)

comment:17 Changed 9 years ago by Doug Doole

I seem to be having trouble with my Xdummy configuration.

In the server log file I see Xdummy starting up, and there are no error messages, but I never get the "xpra is ready message".

After that, I am unable to connect to the server ("[Errno 2] No such file or directory"), nor can I stop xpra ("cannot find a live server to connect to").

In my process list, I can see both an xpra process and an Xorg-Xdummy process running.

There is nothing in my /var/log/Xorg.1.log file.

comment:18 Changed 9 years ago by Antoine Martin

There should be nothing in you /var/log/Xorg.1.log file since this is not the logfile that should be used.
Can you paste the exact commands that you tried to use? (see comment:3 in #10)

This should work:

xpra start :100 --no-daemon --enable-mmap \
  --xvfb="/usr/local/bin/Xdummy +extension GLX +extension RANDR -config /usr/share/winswitch/xorg-xdummy-xpra.conf -logfile $HOME/Xorg-test.log"

comment:19 Changed 9 years ago by Doug Doole

I was using a simpler line:

xpra start :1 --xvfb=/opt/xpra/Xdummy/Xdummy-xpra

Your longer line didn't make any difference. (As an aside, --enable-mmap wasn't recognized in my build.)

After playing a bit, I think Xdummy is hanging part way through its initialization - there's not enough output or in the log file.

Most interestingly, when I kill xpra with ctrl-c (assuming I used --no-daemon), Xdummy responds with:

Fatal server error:
xf86OpenConsole: Cannot open /dev/tty0 (No such file or directory)

I am running xpra on a VM on a machine in a server room - its entirely possible that the machine does not have a keyboard or monitor attached to it.

comment:20 Changed 9 years ago by Doug Doole

I found the problem. I stopped using the Xdummy-xpra script and just called Xdummy directly. This got me past the hang and tty problems.

comment:21 Changed 9 years ago by Doug Doole

Unfortunately, using Xdummy does not solve the repainting problems.

comment:22 Changed 9 years ago by Antoine Martin

Although I have not been able to identify the cause of the corruption, I have seen it (but very rarely), the changes introduced in r284 (threaded server) seem to avoid it. Whether the problem is truly gone or much harder to trigger is not entirely clear...
Please let me know if this helps.

comment:23 Changed 9 years ago by Doug Doole

I probably won't be able to try out r284 this week.

However, a new symptom to report: I just encountered screen corruption using full screen vi even though I have "BATCH_EVENTS = false" in my server.py file.

comment:24 Changed 9 years ago by Doug Doole

I gave r284 a try. It looks like one step forward, and a half step back.

When testing with "vi q12.sql.basic.mod.exfmt", I was unable to reproduce the screen corruption. So that's definitely good news.

However, when I maximized the vi window, only the window border maximized. The displayed text and the scroll bar remained the same size, but moved to the upper left corner of the screen. (See screen-1124.jpg.) The contents of the window did not respond to mouse or keyboard. Using the refresh menu option did not help either. However, once I minimized the window and restored it, it worked properly.

So r284 is definitely a step in the right direction, but something still isn't quite right.

Changed 9 years ago by Doug Doole

Attachment: screen-1124.jpg added

comment:25 Changed 9 years ago by Antoine Martin

Was this on r284 or r292 ?
There were a number of changes in this area recently so I suspect this may be the latter?

comment:26 Changed 9 years ago by Antoine Martin

Hmmm, can't reproduce this here at all, here's how I am testing:

xpra start :7
DISPLAY:7 gvim ./Downloads/q12.sql.basic.mod.exfmt
xpra attach :7

Then I just scroll a few pages down and maximize the window?
Am I missing something? Is it 100% reproducible? Anything in the client or server log?

comment:27 Changed 9 years ago by Doug Doole

I tested on r284. However I just pulled down r292 and it exhibits the same behaviour.

The server log showed a couple "Uh-oh, our size doesn't fit our window sizing constraints!" messages when I opened vi. I didn't see any additional messages when I maximized the window.

I didn't see any messages on the client.

As mentioned in my earlier description, I'm using a 3 tier setup:

xpra client --ssh--> xpra server --ssh X forwarding--> work machine

If I edit the file directly at the xpra server, I don't have any screen problems. I only see the trouble when I do "ssh -X" from the xpra server to the work machine, and then edit the file on the work machine.

comment:28 Changed 9 years ago by Antoine Martin

OK, I've tried that too, and still cannot get it to misbehave...

  • Do you think latency is an issue here? Do I need to test on a host with high latency? What is the ping time between your xpra server and your work machine?
  • Does using the --auto-refresh-delay=DELAY option (with short and long delays) eventually redraw the screen properly?
  • When the screen corruption does occur, can you take a screenshot of xpra's Xvfb display and see if the corruption is also present there? (ie: "DISPLAY=:100 scrot")

comment:29 Changed 9 years ago by Doug Doole

  • This is not a high latency connection. All three machines are in the same building and we have a high speed network. Ping times averaged about 0.25 ms.
  • I tried delays of 0.1, 1 and 10 and it didn't have any effect.
  • The screen shot showed the same corruption. (See attached 2011-11-24-130321_3840x2560_scrot.png)

Changed 9 years ago by Doug Doole

comment:30 Changed 9 years ago by Doug Doole

Whoops, need to amend my comment about the screen shot - I'd pulled up the wrong file.

The Xvfb screen shot just shows a black rectangle - nothing like what I'm seeing on the screen.

Last edited 9 years ago by Doug Doole (previous) (diff)

comment:31 Changed 9 years ago by Antoine Martin

Yes, silly me, the screenshot cannot show the window contents since those are captured/redirected by xpra...

When I resize the window, I do see something similar to your screenshot before the full-screen update arrives - takes a split second. Now, I really need a way to reproduce this problem here before I can fix it...
I noticed that you have a fairly large screen size, can you confirm if your problem is reproducible at lower resolutions too? Maybe there is an overflow of some sort at high resolutions?

comment:32 Changed 9 years ago by Doug Doole

3840x2560 is the default size for the Xvfb session - I don't change it.

In the past, I've tried things with a bunch of different physical screen sizes at the client (see comment:12) and it didn't make a difference. Admittedly, I haven't tried those experiments with the current problem.

So, did you want me to try with a different physical screen size at the client, or did you want me to try requesting a different resolution for Xvfb?

comment:33 Changed 9 years ago by Antoine Martin

Well, the Xvfb size should not really matter (as long as it is large enough obviously) so it would be nice to confirm that this problem is not related to your large physical screen size somehow.

I don't understand why it would cause problems since the data does get there when you re-connect (right?), but this patch splits large requests into sub-regions - maybe it will help:

Index: xpra/server.py
===================================================================
--- xpra/server.py	(revision 295)
+++ xpra/server.py	(working copy)
@@ -331,19 +331,29 @@
             except Exception, e:
                 log.error("damage_to_data: error processing region %s: %s", damage, e)
                 continue
-            gobject.idle_add(self._process_damage_regions, id, window, regions)
+            gobject.idle_add(self._process_damage_regions, id, window, ww, wh, regions)
 
-    def _process_damage_regions(self, id, window, regions):
+    def _process_damage_regions(self, id, window, ww, wh, regions):
         # It's important to acknowledge changes *before* we extract them,
         # to avoid a race condition.
-        log("damage_to_data: regions=%s, sending damage ack", regions)
+        log.info("damage_to_data: regions=%s, sending damage ack", regions)
         window.acknowledge_changes()
         pixmap = window.get_property("client-contents")
         if pixmap is None:
             log.error("wtf, pixmap is None?")
             return
-        log("damage_to_data: pixmap size=%s", pixmap.get_size())
-        #TODO: here we could split a single large full_window update into chunks...
+        log.info("damage_to_data: pixmap size=%s, window size=%s", pixmap.get_size(), (ww, wh))
+        if len(regions)==1:
+            (x, y, w, h, full_window) = regions[0]
+            if full_window and w*h>=512*512:
+                #split the single large full screen region:
+                regions = []
+                while y<wh:
+                    h = min(512, wh-y)
+                    regions.append((x,y,w,h,False))
+                    y += 512
+                log.info("process_damage_regions: split full window update into: %s" % regions)
+                
         for region in regions:
             (x, y, w, h, full_window) = region
             if full_window:

comment:34 Changed 9 years ago by Doug Doole

I tried your patch, and it didn't help. (I did capture the log in case it will be helpful.)

I also restarted my laptop so that it only had the local screen (1680x1050). That didn't help either.

comment:35 Changed 9 years ago by Antoine Martin

OK, I managed to get it to misbehave with a simple local connection and it seemed 100% reproducible at the time... but since then I have not been able to reproduce it again!?

For reference, it did completely stop redrawing anything, but in some cases I was able to make it resume by focusing in/out or re-attaching.

On the plus side, it did force me to investigate a few areas and I have found things that needed improvement, hopefully one of those will include your fix:

  • r302: request the new window contents after resizing it!
  • r303: try harder to cancel the damage requests that we can still withdraw (not an actual bug - but will leave fewer damage requests to debug/inspect)
  • r300: use the "actual-size" of the window if available as "geometry" is sometimes wrong!? (not that it should really matter for this particular bug since we should be sending the full pixmap anyway...)

If you still get the same problem with r305 (includes a few logging tweaks on top), it would be very interesting if you could post:

  • the server output in "-d all" mode egrepped for "resize_window", "process_damage_regions", "data_to_packet" and "write thread"
  • the client log with the small patch below to confirm the client is receiving all the damage data

If at all possible, starting just before the problem occurs as this will be quite long otherwise.

After seeing this bug for myself, I now suspect it is some sort of deadlock in the new threaded code, so I will review this thoroughly before making the new release.

Client debugging patch:

Index: xpra/client.py
===================================================================
--- xpra/client.py	(revision 298)
+++ xpra/client.py	(working copy)
@@ -813,6 +813,7 @@
         else:
             packet_sequence = None
         window = self._id_to_window[id]
+        log.info("drawing %s", (x,y,width,height))
         window.draw(x, y, width, height, coding, data)
         if packet_sequence and self.send_damage_sequence:
             self.send_now(["damage-sequence", packet_sequence])

comment:36 Changed 9 years ago by Doug Doole

I moved to r305 and I am still seeing the problem.

To capture the log data, I established my connections, loaded up vi and then marked the logs. Then I hit maximize. So I should only have captured log data from the maximize operation. (I may have caught disconnecting the client as well.)

For the lines you're interested in, the server only reported:

resize_window to 1590x1019, desktop manager set it to 1590x1019, visible=False
write thread: empty marker, exiting
write thread: ended, closing socket

Nothing was logged at the client.

In terms of recovering from the problem once it has occurred. I can get the window to redraw be either minimizing and then restoring the window, or dropping and reconnecting the client.

comment:37 Changed 9 years ago by Antoine Martin

OK, for some strange reason, Xpra thinks that your window is not visible and skips sending the new window data.
With my environment, the window is always shown as visible=True...
Is your window hidden when you maximize it? (and even so, it should still work..)

Maybe this will workaround it (no guarantee), but I need to figure out why Xpra thinks that the window is not visible to be able to commit a proper fix:

         (_, _, ww, wh) = self._desktop_manager.window_geometry(window)
         visible = self._desktop_manager.visible(window)
         log("resize_window to %sx%s, desktop manager set it to %sx%s, visible=%s", w, h, ww, wh, visible)
-        if visible:
-            self._damage(window, 0, 0, w, h)
+        #if visible:
+        self._damage(window, 0, 0, w, h)
 
     def _process_focus(self, proto, packet):
         if len(packet)==3:

comment:38 Changed 9 years ago by Doug Doole

My window is definitely visible. It is even sitting in front of all the other windows.

Commenting out the if visible check did not help.

comment:39 Changed 9 years ago by Antoine Martin

Resolution: fixed
Status: acceptedclosed

I am pretty sure the original issue can be closed now, as this is now a different bug: #48, please add details there.

If not, feel free to re-open.

comment:40 Changed 9 years ago by Antoine Martin

Milestone: 0.0.7.x
Version: 0.0.7.32
Note: See TracTickets for help on using tickets.