xpra icon
Bug tracker and wiki

Opened 13 days ago

Last modified 2 days ago

#2022 new defect

Client doesn't stay on NVENC h264, even when selected in the UI

Reported by: Nathan Hallquist Owned by: Nathan Hallquist
Priority: major Milestone: 2.5
Component: encodings Version: 2.4.x
Keywords: Cc:

Description

Unless I run xpra with only h264 enabled,

/usr/bin/python2 /usr/bin/xpra start -d encoder --start=xterm --no-daemon --video-encoders=nvenc --encodings=h264 :57

I can observe from the debug data that very few frames are actually rendered using NVENC even when I have h264 selected from the UI. When I run it using the above command, it works wonderfully. I'm using r20858 on CentOS7 with VirtualGL on the server and an r20882 client on windows. The driver is 390.87. I'm using a C2075 for render and a K20c for encode.

I do not think the problem has much to do with my hardware or reversion because I'm having this trouble on all of my configurations including a brand-new new computer (the other one is pretty old) with a GTX 1060 running release 2.3.4 server with release 2.4 client.

Nathan

Attachments (12)

2.txt (50.6 KB) - added by Nathan Hallquist 12 days ago.
Log for second try.
3.txt (181.5 KB) - added by Nathan Hallquist 12 days ago.
third example
1.txt (89.1 KB) - added by Antoine Martin 11 days ago.
re-adding attachment with txt extension so it can be viewed online
4.txt (119.1 KB) - added by Nathan Hallquist 11 days ago.
5.txt (117.9 KB) - added by Nathan Hallquist 11 days ago.
6.txt (338.0 KB) - added by Nathan Hallquist 11 days ago.
7.txt (289.3 KB) - added by Nathan Hallquist 11 days ago.
8.txt (225.6 KB) - added by Nathan Hallquist 11 days ago.
11.txt (682.6 KB) - added by Nathan Hallquist 10 days ago.
12.txt (1.6 MB) - added by Nathan Hallquist 10 days ago.
14.txt (475.9 KB) - added by Nathan Hallquist 9 days ago.
15.txt (247.0 KB) - added by Nathan Hallquist 9 days ago.

Change History (44)

comment:1 Changed 13 days ago by Antoine Martin

Owner: changed from Antoine Martin to Nathan Hallquist

Xpra will only use video encoders when it thinks is necessary, for more details on the original feature see #410 - there have been many updates and fixes since.

Roughly speaking, the video encoder will not be triggered unless something on screen refreshes rapidly enough.
By only allowing h264, you are bypassing all the automatic encoding logic.

To test, you can run glxgears or anything else that generates lots of frames and verify that NVENC does get used.

If you feel that your screen updates would be better compressed with a video encoder and that the automatic logic is too slow to switch to the video encoder, you can debug the video region detection with -d regiondetect, the encoding selection logic with -d encoding.
Bear in mind that GPU encoders take longer to setup, so it is best to only use them when we're certain that the content is video-ish and is going to keep producing screen updates regularly.

Maybe we could tweak the logic to switch to video sooner when we know we're dealing with an efficient encoder (ie: NVENC).

Last edited 13 days ago by Antoine Martin (previous) (diff)

comment:2 Changed 13 days ago by Nathan Hallquist

Here's a "-d compress" log. For context, this log was collected during a *sustained* rotation operation on an OpenGL rendering of a finite element model of vehicle impact (it is a standard benchmark for us). The log looks like the below excerpt the whole way through. When XPRA jumps between RGB and YUV the screen briefly locks hard.

I have my system tray UI set to speed=auto, quality=auto and encoding = h264.

There are two cases I want to include:

(1) Without "XPRA_FORCE_CSC_MODE=BGRX" it is slow and throws "fail to encode errors" for NVENC sometimes, especially when I set speed to 100 (I had speed set to auto). In this case it didn't throw those errors, but the whole picture *did* turn yellow a few times during the collection of this log.

(2) with XPRA_FORCE_CSC_MODE=BGRX it is also very slow, but not as slow as before (but nothing like speed and perfection of having only NVENC enabled). There were noticeable jerks when it switched between encodings. Never any NVENC encoding errors nor random color changes like when BGRX isn't set.

Case (1) Plain, no XPRA_FORCE_CSC_MODE.

2018-11-01 13:12:06,968 compress:   1.2ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   134, client_options={'zlib': 5, 'delta': 181, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 186}
2018-11-01 13:12:07,553 compress: 582.9ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   1.0%  ( 6128KB to    61KB), sequence   135, client_options={u'speed': 41, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 20}
2018-11-01 13:12:07,695 compress:   2.0ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   136, client_options={'zlib': 5, 'delta': 186, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 189}
2018-11-01 13:12:08,232 compress: 533.5ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   1.0%  ( 6128KB to    60KB), sequence   137, client_options={u'speed': 46, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 34}
2018-11-01 13:12:08,318 compress:   0.3ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   138, client_options={'zlib': 5, 'delta': 189, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 192}
2018-11-01 13:12:08,365 compress:  46.0ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.8%  ( 6128KB to    48KB), sequence   139, client_options={u'speed': 46, u'frame': 1, u'pts': 626, 'csc': 'YUV444P', u'quality': 34}
2018-11-01 13:12:08,506 compress:   0.2ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   140, client_options={'zlib': 5, 'delta': 192, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 195}
2018-11-01 13:12:08,548 compress:  40.0ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.6%  ( 6128KB to    34KB), sequence   141, client_options={u'speed': 46, u'frame': 2, u'pts': 814, 'csc': 'YUV444P', u'quality': 34}
2018-11-01 13:12:08,696 compress:   1.4ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   142, client_options={'zlib': 5, 'delta': 195, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 198}
2018-11-01 13:12:09,204 compress: 505.4ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   1.0%  ( 6128KB to    58KB), sequence   143, client_options={u'speed': 44, u'frame': 0, u'pts': 0, 'csc': u'XRGB', u'quality': 44}
2018-11-01 13:12:09,300 compress:   0.3ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   144, client_options={'zlib': 5, 'delta': 198, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 201}
2018-11-01 13:12:09,914 compress: 612.8ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   2.2%  ( 6128KB to   137KB), sequence   145, client_options={u'speed': 46, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 24}
2018-11-01 13:12:09,924 compress:   0.3ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   146, client_options={'zlib': 5, 'delta': 201, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 204}
2018-11-01 13:12:10,584 compress: 658.1ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   1.2%  ( 6128KB to    73KB), sequence   147, client_options={u'speed': 46, u'frame': 0, u'pts': 0, 'csc': u'XRGB', u'quality': 24}
2018-11-01 13:12:10,648 compress:   0.4ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   148, client_options={'zlib': 5, 'delta': 204, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 207}
2018-11-01 13:12:11,210 compress: 561.0ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   1.2%  ( 6128KB to    70KB), sequence   149, client_options={u'speed': 30, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 5}
2018-11-01 13:12:11,231 compress:   0.1ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   150, client_options={'zlib': 6, 'delta': 207, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 210}
2018-11-01 13:12:11,802 compress: 569.8ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.4%  ( 6128KB to    23KB), sequence   151, client_options={u'speed': 30, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 5}
2018-11-01 13:12:11,822 compress:   0.2ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   152, client_options={'zlib': 6, 'delta': 210, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 213}
2018-11-01 13:12:11,846 compress:  23.9ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.2%  ( 6128KB to    13KB), sequence   153, client_options={u'speed': 30, u'frame': 1, u'pts': 591, 'csc': 'YUV444P', u'quality': 5}
2018-11-01 13:12:12,834 compress:   0.2ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   154, client_options={'zlib': 7, 'delta': 213, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 216}
2018-11-01 13:12:13,367 compress: 532.6ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.3%  ( 6128KB to    21KB), sequence   155, client_options={u'speed': 28, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 4}
2018-11-01 13:12:13,368 compress:   0.1ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   156, client_options={'zlib': 7, 'delta': 216, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 219}
2018-11-01 13:12:13,423 compress:  52.1ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.1%  ( 6128KB to     8KB), sequence   157, client_options={u'speed': 28, u'frame': 1, u'pts': 453, 'csc': 'YUV444P', u'quality': 4}
2018-11-01 13:12:13,744 compress:   0.8ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   158, client_options={'zlib': 6, 'delta': 219, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 222}
2018-11-01 13:12:14,316 compress: 569.9ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.4%  ( 6128KB to    22KB), sequence   159, client_options={u'speed': 32, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 8}

Case 2:

2018-11-01 13:09:17,544 compress:   0.5ms for   36x36   pixels at 1596,411  for wid=2     using     rgb24 with ratio  37.5%  (    5KB to     1KB), sequence   323, client_options={'zlib': 5, 'delta': 401, 'bucket': 1, 'rgb_format': 'BGRX', 'flush': 1, 'store': 469}
2018-11-01 13:09:17,547 compress:   1.1ms for 1512x16   pixels at    2,941  for wid=2     using      jpeg with ratio   1.7%  (   94KB to     1KB), sequence   324, client_options={'quality': 49}
2018-11-01 13:09:18,099 compress:  89.3ms for 1636x959  pixels at    0,0    for wid=2     using      webp with ratio   2.6%  ( 6128KB to   161KB), sequence   325, client_options={u'quality': 100, u'rgb_format': 'BGRX'}
2018-11-01 13:09:18,234 compress:   0.6ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   326, client_options={'zlib': 4, 'delta': 460, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 475}
2018-11-01 13:09:18,814 compress: 577.6ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   4.7%  ( 6128KB to   290KB), sequence   327, client_options={u'speed': 39, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 10}
2018-11-01 13:09:18,835 compress:   0.5ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   328, client_options={'zlib': 6, 'delta': 475, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 478}
2018-11-01 13:09:18,892 compress:  56.2ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   2.2%  ( 6128KB to   133KB), sequence   329, client_options={u'speed': 39, u'frame': 1, u'pts': 606, 'csc': 'YUV444P', u'quality': 10}
2018-11-01 13:09:19,385 compress:   0.7ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.9%  (    6KB to     0KB), sequence   330, client_options={'flush': 1, 'zlib': 6, 'bucket': 0, 'store': 481, 'rgb_format': 'BGRX'}
2018-11-01 13:09:19,442 compress:  55.0ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   3.2%  ( 6128KB to   193KB), sequence   331, client_options={u'speed': 39, u'frame': 2, u'pts': 1156, 'csc': 'YUV444P', u'quality': 10}
2018-11-01 13:09:19,841 compress:   1.2ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   332, client_options={'zlib': 5, 'delta': 481, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 484}
2018-11-01 13:09:20,438 compress: 592.9ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.4%  ( 6128KB to    26KB), sequence   333, client_options={u'speed': 41, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 7}
2018-11-01 13:09:20,569 compress:   0.3ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   334, client_options={'zlib': 5, 'delta': 484, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 487}
2018-11-01 13:09:20,620 compress:  50.2ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.3%  ( 6128KB to    16KB), sequence   335, client_options={u'speed': 41, u'frame': 1, u'pts': 730, 'csc': 'YUV444P', u'quality': 7}
2018-11-01 13:09:20,832 compress:   0.8ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   336, client_options={'zlib': 5, 'delta': 487, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 490}
2018-11-01 13:09:21,400 compress: 562.8ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.7%  ( 6128KB to    41KB), sequence   337, client_options={u'speed': 47, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 23}
2018-11-01 13:09:21,504 compress:   0.7ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   338, client_options={'zlib': 5, 'delta': 490, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 493}
2018-11-01 13:09:21,514 compress:   1.2ms for 1636x959  pixels at    0,0    for wid=2     using    scroll as   8 rectangles  ( 6128KB)           , sequence   340, client_options={'flush': 2}
2018-11-01 13:09:21,543 compress:  28.7ms for 1636x561  pixels at    0,144  for wid=2     using      jpeg with ratio   0.9%  ( 3585KB to    31KB), sequence   341, client_options={'quality': 23, 'optimize': True, 'flush': 1}
2018-11-01 13:09:21,547 compress:   3.7ms for 1636x61   pixels at    0,763  for wid=2     using      jpeg with ratio   0.6%  (  389KB to     2KB), sequence   342, client_options={'quality': 23, 'optimize': True}
2018-11-01 13:09:21,699 compress:   0.3ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   342, client_options={'zlib': 5, 'delta': 493, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 496}
2018-11-01 13:09:21,707 compress:   0.8ms for 1636x959  pixels at    0,0    for wid=2     using    scroll as   1 rectangles  ( 6128KB)           , sequence   344, client_options={'flush': 1}
2018-11-01 13:09:21,740 compress:  32.2ms for 1636x606  pixels at    0,260  for wid=2     using      jpeg with ratio   1.1%  ( 3872KB to    42KB), sequence   345, client_options={'quality': 23, 'optimize': True}
2018-11-01 13:09:21,929 compress:   0.2ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   345, client_options={'zlib': 5, 'delta': 496, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 499}
2018-11-01 13:09:22,522 compress: 591.0ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   1.7%  ( 6128KB to   101KB), sequence   346, client_options={u'speed': 49, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 26}
2018-11-01 13:09:22,523 compress:   0.3ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   347, client_options={'zlib': 5, 'delta': 499, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 502}
2018-11-01 13:09:22,571 compress:  47.3ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.9%  ( 6128KB to    56KB), sequence   348, client_options={u'speed': 49, u'frame': 1, u'pts': 531, 'csc': 'YUV444P', u'quality': 26}
2018-11-01 13:09:22,634 compress:   1.3ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   349, client_options={'zlib': 5, 'delta': 502, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 505}
2018-11-01 13:09:22,688 compress:  51.2ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.7%  ( 6128KB to    41KB), sequence   350, client_options={u'speed': 49, u'frame': 2, u'pts': 703, 'csc': 'YUV444P', u'quality': 26}
2018-11-01 13:09:22,827 compress:   0.7ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   351, client_options={'zlib': 5, 'delta': 505, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 508}
2018-11-01 13:09:23,371 compress: 542.7ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.8%  ( 6128KB to    49KB), sequence   352, client_options={u'speed': 51, u'frame': 0, u'pts': 0, 'csc': 'YUV444P', u'quality': 24}
2018-11-01 13:09:23,427 compress:  55.6ms for 1636x959  pixels at    0,0    for wid=2     using      webp with ratio   2.0%  ( 6128KB to   122KB), sequence   353, client_options={u'quality': 100, u'rgb_format': 'BGRX'}
2018-11-01 13:09:23,495 compress:   0.3ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   354, client_options={'zlib': 5, 'delta': 508, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 513}
2018-11-01 13:09:23,542 compress:  45.6ms for 1636x958  pixels at    0,0    for wid=2     using      h264 with ratio   0.5%  ( 6128KB to    28KB), sequence   355, client_options={u'speed': 51, u'frame': 1, u'pts': 669, 'csc': 'YUV444P', u'quality': 24}
2018-11-01 13:09:23,654 compress:   0.9ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   356, client_options={'zlib': 5, 'delta': 513, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 516}
2018-11-01 13:09:23,663 compress:   0.8ms for 1636x959  pixels at    0,0    for wid=2     using    scroll as   1 rectangles  ( 6128KB)           , sequence   358, client_options={'flush': 1}
2018-11-01 13:09:23,683 compress:  20.4ms for 1636x601  pixels at    0,266  for wid=2     using      jpeg with ratio   0.9%  ( 3840KB to    35KB), sequence   359, client_options={'quality': 24}
2018-11-01 13:09:23,986 compress:   0.3ms for 1636x1    pixels at    0,958  for wid=2     using     rgb24 with ratio   0.8%  (    6KB to     0KB), sequence   359, client_options={'zlib': 5, 'delta': 516, 'bucket': 0, 'rgb_format': 'BGRX', 'flush': 1, 'store': 519}
2018-11-01 13:09:23,999 compress:   0.6ms for 1636x959  pixels at    0,0    for wid=2     using    scroll as   3 rectangles  ( 6128KB)           , sequence   361, client_options={'flush': 4}
2018-11-01 13:09:24,016 compress:  16.3ms for 1636x516  pixels at    0,312  for wid=2     using      jpeg with ratio   1.1%  ( 3297KB to    35KB), sequence   362, client_options={'quality': 26, 'flush': 3}
2018-11-01 13:09:24,018 compress:   1.0ms for 1636x4    pixels at    0,836  for wid=2     using       png with ratio   0.8%  (   25KB to     0KB), sequence   363, client_options={'flush': 2}
2018-11-01 13:09:24,020 compress:   1.8ms for 1636x5    pixels at    0,915  for wid=2     using       png with ratio   4.6%  (   31KB to     1KB), sequence   364, client_options={'flush': 1}
2018-11-01 13:09:24,023 compress:   1.5ms for 1636x12   pixels at    0,926  for wid=2     using      jpeg with ratio   3.4%  (   76KB to     2KB), sequence   365, client_options={'quality': 26}
2018-11-01 13:09:24,619 compress:  73.4ms for 1636x959  pixels at    0,0    for wid=2     using      webp with ratio   1.9%  ( 6128KB to   116KB), sequence   365, client_options={u'quality': 100, u'rgb_format': 'BGRX'}
Last edited 13 days ago by Nathan Hallquist (previous) (diff)

comment:3 Changed 12 days ago by Antoine Martin

First things first, something is not right with your setup if rgb ends up using zlib compression. You should have lz4 installed and if not then you should see a big warning in your server log on startup.
You can confirm the status by running xpra/net/net_util.py on Linux and Network_info.exe on win32.

As for the logs, I meant "-d compress" as well as, or combined with the -d regiondetect and -d encoding from comment:1.

From the first log sample, most of the time it seems to be taking 500ms to compress at a resolution of 1636x958. That's dreadfully slow. You may want to raise min-speed and leave the encoding set to "auto" so it can at least switch to vp8 / vp9 when not using NVENC.
Why it isn't switching to nvenc would require the -d score log output. I think that is going to be the key.

it is slow and throws "fail to encode errors" for NVENC sometimes

Please post the exact error message, including the full stacktrace.
You may need to add -d video,encoder to see it.
That could also cause encoder selection problems, we should be able to fix those or at least avoid the conditions that trigger it.

with XPRA_FORCE_CSC_MODE=BGRX

Why are you changing this switch exactly?
Setting a high enough (min-)quality should give the same result.
There are two ways that NVENC can handle BGRX input: natively for the cards + driver combinations that support it (can be turned off with XPRA_NVENC_NATIVE_RGB=0), or via a CUDA conversion step otherwise.
It's not clear what your card supports.
Please post the output of the server startup with -d nvenc
Either way, NVENC should win the scoring compared to doing the CSC step via another module (libyuv or whatever).

There are some scroll encoding packets in the middle, those can cause problems when encoding video as we then skip frames and encode them differently.
You can play with raising:

XPRA_SCROLL_MIN_PERCENT=80

Or even disable scroll encoding altogether:

XPRA_SCROLL_ENCODING=0

As for the IRC discussion regarding setting hints, we can worry about that once the encoding selection bug is fixed - it will help (see #1952 and #2023)

Last edited 12 days ago by Antoine Martin (previous) (diff)

comment:4 Changed 12 days ago by Nathan Hallquist

After a lot of struggling I've gotten LZ4 to work, but I don't understand why it was wrong. For some reason, on my computer, MINGW_BUILD.sh doesn't include lz4 correctly for version 2.1.0 or 2.1.1. When I backed down to version 1.1.0 my windows build correctly includes lz4, and I see this:

2018-11-02 13:01:50,174 compress:   0.3ms for 1636x959  pixels at    0,0    for wid=2     using    scroll as   1 rectangles  ( 6128KB)           , sequence   111, client_options={'flush': 2}
2018-11-02 13:01:50,315 compress: 141.1ms for 1636x593  pixels at    0,233  for wid=2     using     rgb24 with ratio   1.7%  ( 3789KB to    66KB), sequence   112, client_options={'flush': 1, 'lz4': 1, 'rgb_format': 'BGRX'}
2018-11-02 13:01:50,320 compress:   4.2ms for 1636x20   pixels at    0,915  for wid=2     using     rgb24 with ratio   3.4%  (  127KB to     4KB), sequence   113, client_options={'lz4': 1, 'rgb_format': 'BGRX'}
2018-11-02 13:01:50,740 compress:   8.3ms for 1636x959  pixels at    0,0    for wid=2     using     rgb24 with ratio   3.3%  ( 6128KB to   199KB), sequence   113, client_options={'lz4': 1, 'rgb_format': 'BGRX'}

Version 2.1.0 works only if I run the python script from within mingw shell. However, lz4 doesn't seem to be getting put into the build directory correctly when I use version 2.1.0. Consequently, it doesn't make it into the installer. I'm not sure why it's wrong, and I'll look into a little more before giving up and moving on to your other suggestions. (This problem is less important to me than the encoder one, especially since I've got a workaround, namely downgrade to 1.1.0)

Making lz4 work didn't resolve the other issues regarding the NVIDIA encoder.

Last edited 12 days ago by Nathan Hallquist (previous) (diff)

comment:5 Changed 12 days ago by Nathan Hallquist

By running python with "-S" in the "dist/lib" directory I've determined that lz4 2.1.1 works *if* I copy the "egg" data into "dist/lib".

Python is far outside my core expertise, so I can only speculate about the right fix. Maybe the install_exe function of setup.py needs to include the "egg" directories...

Last edited 12 days ago by Nathan Hallquist (previous) (diff)

comment:6 Changed 12 days ago by Nathan Hallquist

Command line:

 xpra start -d compress,regiondetect,encoding --start=gnome-terminal --no-daemon --video-encoders=nvenc :57

For this capture I had encoder set to H264 (everything else was set to auto). I rendered a 3D model and rotated it. I had XPRA_SCROLL_ENCODING=0 (because other experiments showed this helps some). This particular run was *very* jerky on the gigabit LAN. I have no idea why.

If you are interested I can send you all the software I am running.

raw-attachment/ticket/2022/1.txt

Last edited 11 days ago by Antoine Martin (previous) (diff)

comment:7 Changed 12 days ago by Nathan Hallquist

Same as before except everything is set to "auto" including encoder. Very slow.

attachment/ticket/2022/2.txt

Last edited 11 days ago by Antoine Martin (previous) (diff)

comment:8 Changed 12 days ago by Nathan Hallquist

In this case I disable all encodings except rgb and h264. When I begin to rotate it's jerky, but once it gets going its great!

Command line:

xpra start -d compress,regiondetect,encoding --start=gnome-terminal --no-daemon --video-encoders=nvenc --encodings=h264,h265,rgb :57

Output:

attachment/ticket/2022/3.txt

Last edited 11 days ago by Antoine Martin (previous) (diff)

Changed 12 days ago by Nathan Hallquist

Attachment: 2.txt added

Log for second try.

Changed 12 days ago by Nathan Hallquist

Attachment: 3.txt added

third example

Changed 11 days ago by Antoine Martin

Attachment: 1.txt added

re-adding attachment with txt extension so it can be viewed online

comment:9 Changed 11 days ago by Antoine Martin

After a lot of struggling I've gotten LZ4 to work, but I don't understand why it was wrong.
For some reason, on my computer, MINGW_BUILD.sh doesn't include lz4 correctly for version 2.1.0 or 2.1.1.

So you're using custom builds rather than our official releases. Please make sure to mention that every time, as per wiki/ReportingBugs.
And test with official builds too to make sure that the problem doesn't come from your builds.
As for lz4, this is a well documented issue: ticket:1883#comment:4 with its own ticket: #1929

Making lz4 work didn't resolve the other issues regarding the NVIDIA encoder.

It wasn't meant to.
It was essential to find out why your output was not the one that was expected.
And it will help with performance (outdated benchmark here: https://xpra.org/stats/lz4/)

I had XPRA_SCROLL_ENCODING=0 (because other experiments showed this helps some)

"scroll" encoding is not used for windows we correctly tag as "video", hence why you should look into tagging your video windows.

xpra start -d compress,regiondetect,encoding --start=gnome-terminal --no-daemon --video-encoders=nvenc --encodings=h264,h265,rgb :57

As I mentioned before, that's a very bad idea:

  • don't disable other video encoders, include "-d score" and then we'll see if the other video encoders somehow get used when they should not (NVENC should win)
  • don't disable other encodings! xpra will be forced to use "rgb" for non-video stuff, which is going to completely kill your bandwidth, even on a gigabit lan

Considering those important configuration issues, the logs you attached are not very telling.
Please see comment:3 :

Why it isn't switching to nvenc would require the -d score log output. I think that is going to be the key.

Please also include both:

  • -d score,video
  • -d score,encoding,video,scaling,csc,stats (just in case details are missing from the first one)

What I do see in there:

  • lots of:
    send_delayed_regions: using full window update 1636x959 as  h264, from <bound method WindowVideoSource.get_best_encoding_video of WindowVideoSource(2 : (1636, 959))>
    do_video_encode(h264, XShmImageWrapper(BGRX: 0, 0, 1636, 959), {})
    make_instance() nvenc - instance count=0
    

So it is choosing the correct encoding (h264) and the correct encoder (nvenc) but for some reason the context gets recycled every time.
And since nvenc is slow to setup, that's probably where the delay comes from.

  • by the time the log samples start, the batch delay is already through the roof:
    send_delayed for wid 2, batch delay is 233ms, elapsed time is 233ms
    

So something already happened before that which reduced the framerate to less than 5fps.
Could well be the wacky settings, or the re-cycling issue, -d stats should be enough to see why that is.

Last edited 11 days ago by Antoine Martin (previous) (diff)

comment:10 Changed 11 days ago by Nathan Hallquist

I *have* been testing with both builds consistently on the client side, and I've been using the builds from winswitch beta on CentOS7. There are no other surprises in my configs. I have not, for instance, set LD_LIBRARY_PATH to anything interesting. I had not seen wiki/ReportingBug before (I should have thought to look for such a document). My apologies!

Regarding tagging, can you point me to some information on how that works (I'm ignorant about how a program passes this kind of information through)? I will pass it on to our UI guys, and I expect that if it's straightforward I get can them to put it in our UI trunk right away. (They are using wxwidgets with gtk2 on X. Our software's window has a rectangular video region surrounded by buttons).

Regarding, the additional debug captures I will try to do that in a few hours.

Again. Thank you.

comment:11 Changed 11 days ago by Antoine Martin

Regarding tagging, can you point me to some information on how that works

It's all in this ticket which is assigned to you since I knew you would be interested: ticket:2023#comment:1. There's a trivial example included: browser/xpra/trunk/src/tests/xpra/test_apps/test_window_content_type.py.
Basically, set the _XPRA_CONTENT_TYPE X11 window property to what you want (ie: "video" or "text").

Last edited 11 days ago by Antoine Martin (previous) (diff)

comment:12 in reply to:  9 Changed 11 days ago by Nathan Hallquist

I'm doing the first requested test: debug on score and video. My command line:

xpra start -d score,video --start="vglrun ./lspp45 ../d3plots/d3plot" --no-daemon --video-encoders=nvenc  :57 2>&1 |tee 4.txt

The "vglrun" command is so that I can spare you all the startup logging. Once I connected to the server I went straight to rotating.

attachment/ticket/2022/4.txt

Same exercise but with -d score,video,compress

attachment/ticket/2022/5.txt

Same exercise but with -d score,encoding,video,scaling,csc,stats

attachment/ticket/2022/6.txt

Same exercise but with -d score,encoding,video,scaling,csc,stats,compress

attachment/ticket/2022/7.txt

Same exercise but with -d score,encoding,video,scaling,csc,stats,compress using the r20905 client from the web site.

attachment/ticket/2022/8.txt

After uploading the 7th one, only as control, I tried "--encodings=h264,h265" (this is not what I did for 8.txt). I found that it locks up just a tiny little bit at the beginning, unless I let it settle before setting in on rotating. On the 8.txt (which had all encoders enabled and was with the stock client) I was extra careful to let it "settle". As I expected, with settling it is still has the issue for which I opened this ticket, but I figure it's best to isolate one problem at a time. I, therefore, regard 8.txt as the nicest debug log.

Last edited 10 days ago by Antoine Martin (previous) (diff)

Changed 11 days ago by Nathan Hallquist

Attachment: 4.txt added

Changed 11 days ago by Nathan Hallquist

Attachment: 5.txt added

Changed 11 days ago by Nathan Hallquist

Attachment: 6.txt added

Changed 11 days ago by Nathan Hallquist

Attachment: 7.txt added

comment:13 in reply to:  11 Changed 11 days ago by Nathan Hallquist

I've forwarded this to our chief UI developer. I'll let you know when it's done.

Replying to Antoine Martin:

Regarding tagging, can you point me to some information on how that works

It's all in this ticket which is assigned to you since I knew you would be interested: ticket:2023#comment:1. There's a trivial example included: browser/xpra/trunk/src/tests/xpra/test_apps/test_window_content_type.py.
Basically, set the _XPRA_CONTENT_TYPE X11 window property to what you want (ie: "video" or "text").

Changed 11 days ago by Nathan Hallquist

Attachment: 8.txt added

comment:14 Changed 10 days ago by Antoine Martin

As per previous comments, do not disable the other video encoders.

What I found in the logs despite that:

  • DPI set to 45 x 25 (wanted 96 x 96)

You should be using the patched dummy driver from the repo.

  • we don't find a video encoder for small areas (because you disabled the other video encoders):
    video size 484x92 out of range for nvenc, min 128x128
    (..)
    Error: no video pipeline options found for BGRX at 485x93
    
  • your card is limited to 2 contexts since you're using a gamer card without a license key:
    CUDA 9.2.0 / PyCUDA 2018.1, found 1 device:
      + GeForce GTX 1060 6GB @ 0000:AF:00.0 (memory: 98% free, compute: 6.1)
    (..)
    setup_pipeline failed for (84, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 520, 188, video_spec(nvenc)): cannot open encoding session, context is NULL
    

And that causes failures when we reach the limit. 2 contexts should be enough if you only have one "video" window though.
When we do hit this limit, things go downhill fast:

setup_pipeline(..) failed! took 2197.59ms

That's the encoding pipeline blocked for over 2 seconds! ouch.

  • the batch delay is already high because of this:
    update_batch_delay: congestion                   : 1.87,8.74  {}
    

You may want to turn off bandwidth detection to see if that helps with --bandwidth-detection=no. (#1912)
Or post the -d bandwidth log output so we can fix this.
But then it also goes up because of this:

update_batch_delay: compression-work-queue       : 2.64,5.12  {u'target': 1000, u'weight_multiplier': 3648, u'smoothing': u'logp', u'aim': 250, u'aimed_avg': 5247, u'div': 1000, u'avg': 12057, u'recent': 13314}

That's nvenc hogging the compresion work queue because it is slow to setup, and even worse when it hits the 2 context limit and fails (2 seconds as per above)... not much we can do about this.

  • the video context seems to get recycled too quickly, then we see:
    check_pipeline('auto', 451, 134, 'BGRX') setting up a new pipeline as check failed - encodings=['h264', 'h265']
    

why it failed is not logged, so I can't be sure why, but here are some possibilities:

r20927 also adds an env var you can use to turn on video context cleanup debugging:

XPRA_DEBUG_VIDEO_CLEAN=1 xpra start ...

This should tell us why the video context is being destroyed so quickly.

There are new centos 7.5 builds here: https://xpra.org/beta.

Last edited 10 days ago by Antoine Martin (previous) (diff)

comment:15 in reply to:  14 Changed 10 days ago by Nathan Hallquist

I did not understand what you meant by "do not disable the other video encoders". For some reason I thought you meant don't use "--encodings=", I just now think you don't want "--video-encoders=" either. I apologize for that. I was not being intentionally obtuse. I am very grateful for your help. I do not mean to waste your time!

I was aware of the limit regarding encoder streams, but I didn't realize I was bumping up against it. Is there an easy way to buy or request licenses from NVIDIA (for testing)?

All of my users have GTX cards (and except for a solitary tesla card), that's all I have. I have two kerpler-era K20c tesla cards. That is the card that sometimes draws frames with weird colors (lots of yellow) with XPRA. I infer older cards have some limitations in this regard. That card, by the way, does work great for the render window if I add the "CSC_MODE=BGRX" environment variable and a --encodings. I think that card can also be made to work. If getting a Quadro (or some other card) will expedite this I'll buy whatever you recommend. Any recommendations?

Ultimately, I'd like to get XPRA to do NVENC only in my CAD render window and not anywhere else (everywhere else works just fine), so hopefully we can eventually use GTX cards. I'll try to get into the office later today so that I can perform the followup tests you requested.

Thanks again!

Last edited 10 days ago by Antoine Martin (previous) (diff)

comment:16 Changed 10 days ago by Antoine Martin

I did not understand what you meant

My bad, a lot of things that seem obvious to me probably aren't.

I was aware of the limit regarding encoder streams, but I didn't realize I was bumping up against it

If we can fix all the bugs, you should be OK with just 2 contexts.

Is there an easy way to buy or request licenses from NVIDIA (for testing)?

I don't know.

I have two kepler-era K20c tesla cards. That is the card that sometimes draws frames with weird colors (lots of yellow) with XPRA

Could be bugs with lossless and / or YUV444 mode with those older cards.
If you can get us some "-d video" debug of when that happens, we may be able to restrict them to the modes that do work.

If getting a Quadro (or some other card) will expedite this I'll buy whatever you recommend. Any recommendations?

I've used both gamer cards (GTX 7xx to GTX10xx - with and without license keys) and quadros myself.
There is absolutely no difference in performance, the quadros just at least 4 times as much.
See Video Encode and Decode GPU Support Matrix

Ultimately, I'd like to get XPRA to do NVENC only in my CAD render window and not anywhere else (everywhere else works just fine)

Tagging windows should allow you to do that.

comment:17 in reply to:  14 Changed 10 days ago by Nathan Hallquist

I've installed the dummy driver. I've updated to r20927. I'm running with the environment variable you specified. I've turned off bandwidth detection. Bandwidth detection made no noticeable difference. The DPI problem seems to be resolved (based on my browsing of the log file).

I've run:

 xpra start -d score,encoding,video,scaling,csc,stats,compress --start="vglrun ./lspp45 ../d3plots/d3plot" --no-daemon --bandwidth-detection=no  :57 2>&1 |tee 11.txt

Here's the full log:

attachment/ticket/2022/11.txt

After enabling XPRA_DEBUG_VIDEO_CLEAN=1 I'm getting this:

2018-11-04 14:27:23,066 video_context_clean()
  File "/usr/lib64/python2.7/threading.py", line 785, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/site-packages/xpra/server/background_worker.py", line 62, in run
    item()
  File "/usr/lib64/python2.7/site-packages/xpra/server/source/encodings_mixin.py", line 160, in recalculate_delays
    ws.reconfigure()
  File "/usr/lib64/python2.7/site-packages/xpra/server/window/window_source.py", line 1102, in reconfigure
    self.update_encoding_options(force_reload)
  File "/usr/lib64/python2.7/site-packages/xpra/server/window/window_video_source.py", line 1035, in update_encoding_options
    self.check_pipeline_score(force_reload)
  File "/usr/lib64/python2.7/site-packages/xpra/server/window/window_video_source.py", line 1131, in check_pipeline_score
    self.video_context_clean()
  File "/usr/lib64/python2.7/site-packages/xpra/server/window/window_video_source.py", line 300, in video_context_clean
    traceback.print_stack()
2018-11-04 14:27:23,069 video_context_clean() done
Last edited 9 days ago by Antoine Martin (previous) (diff)

Changed 10 days ago by Nathan Hallquist

Attachment: 11.txt added

comment:18 in reply to:  16 Changed 10 days ago by Nathan Hallquist

Replying to Antoine Martin:

I did not understand what you meant

My bad, a lot of things that seem obvious to me probably aren't.

I very much appreciate your patience!

If you can get us some "-d video" debug of when that happens, we may be able to restrict them to the modes that do work.

I just tried this exercise from the gigabit LAN. This older config is working much better than the GTX config. With bandwidth detection turned off my eyeball cannot detect anything wrong (when bandwidth detection is on it locks occasionally). Last time I saw the colors to go weird I was using broadband, so I'll go back home and try that (I came to work for this test).

For completeness I'm posting my experience on gigabit on my old config. I'm rendering to a C2075 (which is so old it cannot encode) and encoding on a K20c (which doesn't support rendering).

Command:

 xpra start -d score,encoding,video,scaling,csc,stats,compress --start="vglrun ./lspp45 ../d3plots/d3plot" --no-daemon --bandwidth-detection=no  :57 2>&1 |tee 12.txt

Environment:

[nathan@bobross lsprepost4.5_centos7]$ env | grep XPRA
XPRA_CUDA_DEVICE_NAME=Tesla K20c
XPRA_CUDA_DEVICE=0
XPRA_SCROLL_ENCODING=0
[nathan@bobross lsprepost4.5_centos7]$

attachment/ticket/2022/12.txt

Update: From home I find that with nvenc and only nvence enabled (--encodings) perforamnce is great. With normal options (nothing disabled) performance is bad and my client eventually crashes. I've attached the server log from that.

attachment/ticket/2022/14.txt

Last edited 9 days ago by Nathan Hallquist (previous) (diff)

Changed 10 days ago by Nathan Hallquist

Attachment: 12.txt added

comment:19 Changed 9 days ago by Antoine Martin

This older config is working much better than the GTX config

Probably because of the 2 context limit on gamer cards.

From this new log:

  • nvenc always wins the scoring
    get_video_pipeline_options(['h264', 'vp9', 'vp8', 'mpeg1', 'mpeg2'], 3840, 1027, 'BGRX') \
        scores=[
         (81, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 3840, 1026, video_spec(nvenc)),
         (..)
        ]
    

So disabling the other video encoders is counter-productive.

  • this log sample seems to be missing XPRA_DEBUG_VIDEO_CLEAN=1 so I still cannot tell why the video encoder context gets recycled...

The sample from comment:17 lacks the log context around the backtrace. A single backtrace is not enough in any case, some video context re-cycling is normal.
That said, there aren't many instances of re-cycling during this test, so this may not tell us much.

  • the nvenc encoder seems to stick around for quite some time, long enough for over 750 frames:
    video_encode nvenc encoder: h264 1462x756 result is 148828 bytes (56.9 MPixels/s), \
        client options={u'speed': 94, u'frame': 778, u'pts': 19778, 'csc': 'YUV444P', u'quality': 97}
    

The batch delay was also very good (low) at that point:

calculate_batch_delay for wid=1 current batch delay=8, last update 1 seconds ago

You seem to be getting at least 25 fps, reaching closer to 50fps near the end of the log - which is quite impressive.
Was it not smooth at that point?

At which point in time in this log file am I looking for problems?

With bandwidth detection turned off my eyeball cannot detect anything wrong

Maybe no problems in this log then?

Changed 9 days ago by Nathan Hallquist

Attachment: 14.txt added

comment:20 in reply to:  19 Changed 9 days ago by Nathan Hallquist

I agree that the K20c run on the LAN was good (I meant to indicate that in my original post). I've since added 14.txt (I did not realize that you had already replied, otherwise I would have made a new reply).

14.txt was on broadband. Over broadband I'm seeing that --encodings=h264 is much better than the default options (I am only sending logs for the default options per your request).

Regarding the environment variable, you are right. I did forget to add the new environment variable on 12.txt and 14.txt. However, I did have the environment variable set correctly on 11.txt (my most recent run on the gtx). I'll redo 12.txt and 14.txt in the morning.

comment:21 Changed 9 days ago by Antoine Martin

Ah, didn't see that 11.txt had the backtraces.

From this 11.txt log:

  • how did you install the latest builds without installing libyuv? it is an RPM dependency in v2.5
  • cosmetic: either disable printing or give yourself the correct permissions
  • at some point nvenc fails because of the 2 context limit:
    setup_pipeline failed for (78, (1, 1), (1, 1), 520, 188, video_spec(swscale), u'XRGB', (1, 1), 520, 188, video_spec(nvenc)): \
        cannot open encoding session, context is NULL
    

and we try a few other nvenc combinations (like nvenc without swscale, etc) - which takes time.
but later on we re-try and it succeeds:

check_pipeline_score(False) \
   found a better video encoder class than \
     <type 'xpra.codecs.enc_x264.encoder.Encoder'>: (86, (1, 1), None, 0, 0, None, 'BGRX', (1, 1), 520, 188, video_spec(nvenc))

We could use the failures to detect the actual context limit value for each card, and then use this value to avoid trying to create new contexts when the limit is reached. (we have code for doing this already - it just uses the wrong limit: the default is 32)
It is difficult to see why this happened just from the log output, if you don't have too many windows then I suspect that it could be because the encoder context cleanup happens in the encode thread so when we destroy an nvenc encoder (for resizing or whatever), it isn't actually freed immediately whereas we may try to create the one immediately... I'm not sure how easy it would be to change that.

  • there aren't many instances of context recycling:
    • switch from x264 to nvenc (as per above)
    • a window is removed (2018-11-04 14:35:01,187)
    • we detect a sub-rectangle of the window where the "video" seems to happen: video subregion was None, now rectangle(3, 27, 1558, 879) (window size: 1652,998)
    • and later on the other way: video subregion was rectangle(3, 27, 1558, 879), now None (window size: 1652,998)
    • and again: video subregion was None, now rectangle(3, 27, 1558, 879) (window size: 1652,998)
    • and finally at the end when there is no activity for ~10 seconds (video_encoder_timeout)

There's a dbus API you could use to turn off video region detection, or to set a particular region: #1060, #1401.
Maybe we could just enhance the content-type to support values like "video:full", "video:subregion" or something like that.
Re-cycling contexts is expensive with nvenc.

With normal options (nothing disabled) performance is bad and my client eventually crashes.

How did it crash?
Can you run it from the command line and see its output?
Please try with the latest builds, preferably the 64-bit ones.


From the 14.txt log:

  • a strange error:
    Warning: XShm get_pixels_ptr XImage is NULL
    nvenc(XRGB/NV12/H264 - hp -  520x188 ).compress_image(XShmImageWrapper(BGRX: 0, 0, 520, 189), 40, 99, {'av-sync': True, 'bandwidth-limit': 0, 'content-type': ''})
      Traceback (most recent call last):
      File "/usr/lib64/python2.7/site-packages/xpra/server/window/window_video_source.py", line 1921, in do_video_encode
       ret = ve.compress_image(csc_image, quality, speed, options)
      File "xpra/codecs/nvenc/encoder.pyx", line 2179, in xpra.codecs.nvenc.encoder.Encoder.compress_image
      File "xpra/codecs/nvenc/encoder.pyx", line 2216, in xpra.codecs.nvenc.encoder.Encoder.do_compress_image
      File "xpra/codecs/nvenc/encoder.pyx", line 2245, in xpra.codecs.nvenc.encoder.Encoder.copy_image
    AssertionError: failed to get pixels from XShmImageWrapper(BGRX: 0, 0, 520, 189)
    

No idea what caused this. Unless this happens repeatedly, I think we can ignore this.

  • there's a lot of re-cycling going on, which is why things are slow, every time an nvenc context is (re)created, the initial frame takes ~400ms:
    compress: 429.1ms for 1632x914  pixels at    0,0    for wid=1     using      h264 with ratio   3.4%  ( 5833KB to   197KB), sequence    13, client_options={u'speed': 99, u'frame': 0, u'pts': 0, 'csc': u'XRGB', u'quality': 40}
    

Please enable XPRA_DEBUG_VIDEO_CLEAN=1 for that.

Last edited 9 days ago by Antoine Martin (previous) (diff)

comment:22 in reply to:  21 Changed 9 days ago by Nathan Hallquist

  • how did you install the latest builds without installing libyuv? it is an RPM dependency in v2.5

I think it is installed. What line in the log indicates that it is not?

[nathan@curry ~]$ rpm -q --info libyuv
Name        : libyuv
Version     : 0
Release     : 0.31.20180904git9a07219.el7_5
Architecture: x86_64
Install Date: Mon 29 Oct 2018 11:39:10 AM PDT
Group       : Unspecified
Size        : 361374
License     : BSD
Signature   : DSA/SHA1, Tue 23 Oct 2018 11:37:47 PM PDT, Key ID 18adb31cf18ad6bb
Source RPM  : libyuv-0-0.31.20180904git9a07219.el7_5.src.rpm
Build Date  : Tue 23 Oct 2018 10:00:00 PM PDT
Build Host  : buildbot
Relocations : (not relocatable)
URL         : https://chromium.googlesource.com/libyuv/libyuv
Summary     : YUV conversion and scaling functionality library
Description :
This is an open source project that includes YUV conversion and scaling
functionality. Converts all webcam formats to YUV (I420). Convert YUV to
formats for rendering/effects. Rotate by 90 degrees to adjust for mobile
devices in portrait mode. Scale YUV to prepare content for compression,
with point, bilinear or box filter.
[nathan@curry ~]$ rpm -q --verify -v libyuv
.........    /usr/lib64/libyuv.so.0
.........    /usr/lib64/libyuv.so.0.0.0
.........    /usr/share/doc/libyuv-0
.........  d /usr/share/doc/libyuv-0/AUTHORS
.........  d /usr/share/doc/libyuv-0/PATENTS
.........  d /usr/share/doc/libyuv-0/README.md
.........    /usr/share/licenses/libyuv-0
.........  l /usr/share/licenses/libyuv-0/LICENSE
[nathan@curry ~]$
Last edited 9 days ago by Nathan Hallquist (previous) (diff)

comment:23 Changed 9 days ago by Antoine Martin

I think it is installed. What line in the log indicates that it is not?

My bad, something has gone wrong with the centos7 builds, tracking this here: ticket:2004#comment:2

comment:24 in reply to:  21 Changed 9 days ago by Nathan Hallquist

I made a video demonstrating what our software looks like. This was rendered on the machine with the Tesla K20c (this is similar to 12.txt). I made it so that you can see what I am doing. For instance, the window closing events are for the "help" box and the "animate" box. The only part that needs nvenc encoding is the box with the 3d drawing.

This case works pretty well. It locks up during a rotation at 9 1/3 seconds (it would not do that with --encodings=h264), but it's pretty good. (I don't have a proper debug log because I was making it for demonstration rather than debug purposes).

Again, thanks for all your help. I'll be following up shortly with more debug logs.

https://ftp.lstc.com/anonymous/outgoing/nathan/bobross-ok.mp4

comment:25 in reply to:  21 Changed 9 days ago by Nathan Hallquist

I was able to get a repeat of 14.txt (with XPRA_DEBUG_VIDEO_CLEAN=1). Same server different client. This time I was using the stock XPRA 64bit client (from the web site) connecting through my cell phone hotspot. When tested with --encoding=h264 (for control) the performance was shockingly good (on cellphone hotspot''). However, with the normal options it was slow and eventually crashed.

[nathan@bobross lsprepost4.5_centos7]$  xpra start -d score,encoding,video,scaling,csc,stats,compress --start="vglrun ./lspp45 ../d3plots/d3plot" --no-daemon --bandwidth-detection=no :57 2>&1 | tee  15.txt^C
[nathan@bobross lsprepost4.5_centos7]$ env | grep XPRA
XPRA_CUDA_DEVICE_NAME=Tesla
XPRA_CUDA_DEVICE=0
XPRA_SCROLL_ENCODING=0
XPRA_DEBUG_VIDEO_CLEAN=1
[nathan@bobross lsprepost4.5_centos7]$

attachment/ticket/2022/15.txt

I *was not* able to replicate using the same commands when connected on either 802.11g or 802.11n. In all cases I was port forwarding ssh. This trouble arises *only* on cell phone and broadband. Broadband is an especially important case, however, because we have several offsite users.

How do I get debug log off the client?

Last edited 8 days ago by Antoine Martin (previous) (diff)

Changed 9 days ago by Nathan Hallquist

Attachment: 15.txt added

comment:26 Changed 8 days ago by Antoine Martin

However, with the normal options it was slow and eventually crashed.

As per comment:21, how did it crash, what was the output, etc?

How do I get debug log off the client?

By default, the client debug log output is sent to the server. (see --remote-logging)
But when dealing with crashes, the last messages may not make it to the server, so just run:

xpra_cmd.exe attach ssh://whatever/ -d whateverdebugflags

If it isn't clear what is causing the crash, using -d all might be required to cover all possibilities.


From this latest log:

  • there are newer centos server builds than r20927, please try that
  • the context re-cycling issue is not a major factor in this log (see details below), so maybe this was a red herring or just an aggravating factor
  • the batch delay goes up because of bandwidth congestion (at one point is is above 100ms which would limit your screen updates to less than 10fps), try running the client with --bandwidth-detection=no --bandwidth-limit=100mbps and add -d bandwidth to the server debug flags to confirm (the client overrules the server's bandwidth-detection flag - and maybe we shouldn't allow that?) - h264 is more efficient, so it could be avoiding some bandwidth issues
  • we haven't looked at the impact of auto-refresh, maybe that's what is killing the performance over slower connections, try: --auto-refresh-delay=0 on the client and see if that helps


The video_context_clean events come from:

  • windows that are destroyed (_lost_window), twice - in both cases, the windows used h264 for a single frame (and webp, jpeg for a few more):
    encoding_totals for wid=2 with primary encoding=auto : {'h264': [1, 60300], 'webp': [2, 120868], 'jpeg': [3, 129888]}
    

Normally, this isn't a huge problem but with nvenc this is more costly because of the initial setup cost and the fact that this is "wasting" a precious encoder context.
If the video encoders trigger too early, we could favour software encoders until we have enough frames to be certain that a hardware video encoder is worth using.

  • a change in video region detection:
    video subregion was rectangle(0, 0, 1604, 881), now rectangle(3, 27, 1450, 762) (window size: 1604,881)
    
    as single change should not be a huge problem
  • client disconnection at the end - as expected

... with the normal options it was slow ...
This trouble arises *only* on cell phone and broadband.

Can you describe this "trouble"?
Is it stuttering, occasionally pausing, or just generally slow?
If the problems are punctual, can you match up the problems with timecodes from the log output? (roughly)
Try capturing "xpra info" at a time when things are slow.

comment:27 in reply to:  26 ; Changed 8 days ago by Nathan Hallquist

I've been away all day dealing with some other stuff. I think you are on to something. When I turn off the bandwidth detection on the client side the performance is better than I've ever seen. I (almost) cannot distinguish broadband from LAN.

The interesting part: the longer I leave the session running the slower things get. I'll try to get debug data before the day is out. Otherwise I'll have it first thing tomorrow.

Again, thank you!

comment:28 in reply to:  27 Changed 7 days ago by Nathan Hallquist

Added the --auto-refresh-delay option. Everything runs smoothly (with the Tesla card) for as long as I leave the window open.

I like auto-refresh very much so I'll be doing anything I can to help fix it. (with autorefresh turned on the video streams become slower and slower seemingly with each refresh until the client crashes--after reconnect its perfect at first, and, then, the process repeats exactly as before) I will be taking debug logs tomorrow. I'll also try to match lock-ups to timestamps in the logs.

Last edited 7 days ago by Antoine Martin (previous) (diff)

comment:29 Changed 6 days ago by Antoine Martin

I like auto-refresh very much so I'll be doing anything I can to help fix it.

The debug we need to get to the bottom of this would be:

  • leave --bandwidth-detection=yes on the client and post the server's -d bandwidth logs - it looks like the bandwidth detection is detecting the difference between LAN and broadband / phone connections, but maybe over-compensating
  • for auto-refresh issues, post the -d refresh,regionrefresh,compress,video for when things get out of whack - the bandwidth detection should have an impact on the auto-refresh so you may want to try with / without, nvenc's slow startup could be causing the auto-refresh to trigger too early, or maybe it triggers and causes context re-cycling - we'll see

comment:30 Changed 6 days ago by Nathan Hallquist

Since I've updated my server to the latest beta and disabled the bandwidth detection I've not been able to replicate any trouble on the Tesla card when I use clients compiled by xpra.org. I've tried 2.4 and the latest betas both and they work!

The good news for me is that my builds give the same bad results with or without my patches. I really want to get my patches incorporated into mainline, but I don't want to send them until I make them work on Linux and MacOS.

The bad news (for me, not for you) is that, until I get our UI team to add the X properties and until I go back to testing the Gamer cards I think my only problems are created by something in the way I'm building XPRA on Windows.

I'm not really sure to begin debugging this. I'm going to start with the suggestions in your previous replies (I haven't tried them all yet). The log captures on the server side look pretty similar for both clients (lots of h264). My client is slow even when it *is* using h264 on the server side. I can't figure out how to read "-d all" because there are so many GTK messages.

I am working on this today.

Again, I thank you and your team for all your help!

Last edited 6 days ago by Nathan Hallquist (previous) (diff)

comment:31 Changed 6 days ago by Antoine Martin

Since I've updated my server to the latest beta and disabled the bandwidth detection I've not been able to replicate any trouble on the Tesla card when I use clients compiled by xpra.org.

Did you identify what fixed things?
My guess is the fixes from comment:14, likely r20914.

I really want to get my patches incorporated into mainline, but I don't want to send them until I make them work on Linux and MacOS.

What do those patches do?
Can you open a ticket for that?

I think my only problems are created by something in the way I'm building XPRA on Windows.

The number 1 performance issue is likely to be the lack of opengl, see ticket:1883#comment:8
If you can't figure it out, please open a new ticket for that.

comment:32 Changed 2 days ago by Antoine Martin

FYI: as of r20971 the server controls the bandwidth-detection option and the client can no longer enable this feature on its own.

Note: See TracTickets for help on using tickets.