xpra icon
Bug tracker and wiki

This bug tracker and wiki are being discontinued
please use https://github.com/Xpra-org/xpra instead.


Changes between Initial Version and Version 1 of Ticket #2851


Ignore:
Timestamp:
07/23/20 04:17:46 (7 months ago)
Author:
Antoine Martin
Comment:

Thanks for the detailed analysis. It sounds like this may be a threading issue, perhaps we end up calling free on the ScrollData object whilst the encoding thread is still using it. (as it can be called from quite a few places)

I guess the hack fix would be to return early from match_distance if a1 or a2 are null, but this wouldn't get to the root cause

Yes. Another easy "fix" would be to add locking, but that's just lazy and might slow things down - I will try to locate the threading issue first. (starting with some thread checks first)

You may be able to avoid the crash for now by turning off scroll encoding. In v4, this can be done using XPRA_SCROLL_ENCODING=0 xpra ... (works for both client and server), in v4.1 you have to use --encodings= (see #2810).

This could also just be a memory corruption from somewhere else manifesting itself there.


I have no clue as to what, because I don't actually understand the code

Mile high view: ScrollData holds the xxhash checksums of every line of two screen updates of the same size:

  • update adds a new screen update and computes the new checksums
  • calculate finds the scrolling distances that can be used to generate the newest screen update from the previous one (including duplicates)
  • get_scroll_values finds the best matches (without duplicates) and uses match_distance to do this

There are a few docstrings in the source code, though they're probably not detailed enough.

Legend:

Unmodified
Added
Removed
Modified
  • Ticket #2851

    • Property Priority changed from major to critical
    • Property Status changed from new to assigned
  • Ticket #2851 – Description

    initial v1  
    44
    55The crash is happening in the match_distance function of motion.pyx, here:
    6 
     6{{{
    77        for i1 in range(rstart, rend):
    88            i2 = i1+distance
    99            v = a1[i1]       <----CRASH
    10 
     10}}}
    1111By analysing a core file I've managed to ascertain that 'distance' is zero, line_state is a list of 970 zero-bytes, and 'self' has the following value:
    12 
     12{{{
    1313{ob_base = {ob_refcnt = 4,                                                         
    1414    ob_type = 0x7f0f86a329e0 <__pyx_type_4xpra_6server_6window_6motion_ScrollData>}, __pyx_vtab =
     
    1616  __weakref__ = 0x0, distances = 0x0, a1 = 0x0, a2 = 0x0, matched = 0 '\000',             
    1717  x = 211, y = 44, width = 1542, height = 970}
    18 
    19 So the crash is happening because self.a1 is null.
     18}}}
     19So the crash is happening because `self.a1` is `null`.
    2020
    2121This is called from get_scroll_values() here:
    22 
     22{{{
    2323            for i in reversed(sorted(scroll_hits.keys())):
    2424                v = scroll_hits[i]
     
    2626                    #find matching lines:
    2727                    line_defs = self.match_distance(line_state, scroll, MIN_LINE_COUNT)
    28 
     28}}}
    2929It looks like 'i' is pointing to the 9th element of the sorted list when this happens, although this is difficult to say because of the various layers of translation and optimisation.  The puzzle is that this function starts with
    30 
     30{{{
    3131        if self.a1==NULL or self.a2==NULL:
    3232            return None
    33 
     33}}}
    3434and so self.a1 and self.a2 should be guaranteed non-null, right?  So something must be nullifying them after the function has started.  I have no clue as to what, because I don't actually understand the code.  I am imagining that distance shouldn't really be zero here, but don't know whether that's anything to do with it.  I guess the hack fix would be to return early from match_distance if a1 or a2 are null, but this wouldn't get to the root cause.