lain.widget.fs: indefinitely a_glib_poll with NFS

Lightsockie opened this issue 6 years ago

During launch, awesome stalls for a long while before it eventually finishes executing. It's also sporadic: it stalls again every so often after that. I can still use my browser (if I happen to be focused on it), but awesome itself is stuck (keyboard works, mouse doesn't).

This happens after the run_once section of the rc.lua, but before the section that applies wallpapers. (I know because I have a run_once command that applies them with nitrogen, but it gets overridden by the theme setting.)

I don't get this problem when running the default awesome configuration. And, oddly enough, I also don't get this problem when running the vertex theme. Under the common issues section of the Awesome FAQ, it mentions how there are certain calls that'll stall the main thread a lot. Maybe one of the dependencies of the other themes, that isn't in vertex, is causing this for me? (The FAQ also mentions compton. I've tried with and without it in my .xinitrc)

Everything was working until recently. It'd been around two weeks since I last reset my box (and thus reloaded Awesome). Perhaps a change in libraries has caused this? I can post pacman.log if you want.

$ awesome -v
awesome v4.2 (Human after all)
 • Compiled against Lua 5.3.4 (running with Lua 5.3)
 • D-Bus support: ✔
 • execinfo support: ✔
 • xcb-randr version: 1.5
 • LGI version: 0.9.2
$ lua -v
Lua 5.3.5  Copyright (C) 1994-2018 Lua.org, PUC-Rio

rc.lua: https://paste.ee/p/EoWjE
.xinitrc: https://paste.ee/p/77BOD
Start of the xorg I'm currently using, with vertex: https://paste.ee/p/YKj0c
Image of running awmtt showing the problem: https://my.mixtape.moe/mxyvsn.png
Image of running awmtt, before it finishes loading: https://my.mixtape.moe/bwqqny.png

There's also an issue with X and my screens, but based on this ticket in Awesome's repo, I think that's unrelated to this. Noting it just in case.

Luca CPZ · Answer 1 · Wed Sep 12 2018 18:35:38 GMT+0800 (China Standard Time)

This seems to be another instance of lcpz/lain#387.

Basically, you have an outdated version of Glib, which is not compatible with the latest lain.widget.fs.

Either you disable the widget in theme.lua, or update your box. What's your distro? If you have to do it manually, here are some instructions (replace 2.54.3 with 2.58.0).

Sockie · Answer 2 · Wed Sep 12 2018 18:58:02 GMT+0800 (China Standard Time)

Saw that ticket, actually (or ones like it). I use Arch. I think my glib is up to date

$ pacman -Q glib2
glib2 2.58.0-1

Disabling the lain.widget.fs entries in the powerarrow-dark theme does seem to fix the problem, though. Sorry, thought I had tried that already. Posting this before I reload awesome. ~~I'll edit it on the other side if I didn't crash letting you know everything worked out. If you don't see an edit, assume I died and crawled back into bed~~ Seems to be working!

Thanks lcpz! As far as this ticket is concerned, I'm satisfied. Although it seems odd that the glib library is seemingly up to date, but the widget still causes instability (unless I'm querying the wrong package).

Luca CPZ · Answer 3 · Wed Sep 12 2018 19:33:36 GMT+0800 (China Standard Time)

This is strange, you should have no issue at all under an up to date Arch (I'm running it myself).

Try running a theme that uses fs again, but this time with one of these methods. Let's see if we can get some error log.

Sockie · Answer 4 · Thu Sep 13 2018 00:13:16 GMT+0800 (China Standard Time)

Using startx -- -keeptty -nolisten tcp > $HOME/.xorg.log 2>&1 with powerarrow I get this. It takes a couple minutes for awesome to load (aka much longer than the 0.17 seconds the a_glib_poll says). After powerarrow loads, I can consistently re-stall it by hovering over the fs widget. I must have been accidentally doing this before to get my "inconsistent" behavior.

X.Org X Server 1.20.1
X Protocol Version 11, Revision 0
Build Operating System: Linux Arch Linux
Current Operating System: Linux Alice 4.18.6-arch1-1-ARCH #1 SMP PREEMPT Wed Sep 5 11:54:09 UTC 2018 x86_64
Kernel command line: BOOT_IMAGE=(hd1,1)/vmlinuz-linux root=/dev/sdb3
Build Date: 09 August 2018  06:37:34PM
 
Current version of pixman: 0.34.0
	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
	(++) from command line, (!!) notice, (II) informational,
	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/home/gabriel/.local/share/xorg/Xorg.0.log", Time: Wed Sep 12 07:59:37 2018
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
(II) [KMS] Kernel modesetting enabled.
The XKEYBOARD keymap compiler (xkbcomp) reports:
> Warning:          Unsupported high keycode 372 for name <I372> ignored
>                   X11 cannot support keycodes above 255.
>                   This warning only shows for the first high keycode.
Errors from xkbcomp are not fatal to the X server
2018-09-12 08:01:07 W: awesome: a_glib_poll:432: Last main loop iteration took 0.112483 seconds! Increasing limit for this warning to that value.
2018-09-12 08:01:08 W: awesome: a_glib_poll:432: Last main loop iteration took 0.173636 seconds! Increasing limit for this warning to that value.
glx_bind_pixmap(0x00400bef): Failed to query Pixmap info.
win_paint_win(0x0080008e): Failed to bind texture. Expect troubles.
win_paint_win(0x0080008e): Missing painting data. This is a bad sign.
xinit: connection to X server lost

waiting for X server to shut down urxvt: X connection to ':0' broken, unable to recover, exiting.
(II) Server terminated successfully (0). Closing log file.

XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 80 requests (80 known processed) with 2 events remaining.

I think the stuff I posted in the OP is all relevant data too, since it's all from before I removed fs from powerarrow-dark.

Luca CPZ · Answer 5 · Thu Sep 13 2018 01:51:58 GMT+0800 (China Standard Time)

OK, so this is not related to lcpz/lain#387, but you still have an issue with Glib/Lgi, which remains mysterious as it doesn't output anything on X.

Whether the problem is caused by Glib/Lgi or by your setup, there's nothing more I can do from my side. Feel free to ask anything else or close the issue.

Sockie · Answer 6 · Thu Sep 13 2018 03:42:28 GMT+0800 (China Standard Time)

Well seems to be working for me now, so I'm content. Slight bummer to lose the fs widget, but I can live with it if it means I get my awesome theme for the last few years back.

If there's anything more you think I should try, I'm game to be a guinea pig. But I'll close the issue. Thanks again

Luca CPZ · Answer 7 · Thu Sep 13 2018 04:45:20 GMT+0800 (China Standard Time)

> If there's anything more you think I should try, I'm game to be a guinea pig.

The problem is either Glib or you. Try to investigate why you have that needless delay: it's either something you don't have properly configured or a bug with Glib or LGI. In the latter case, you'll have to find the logs, or be able to reproduce the issue on a different Arch machine so that you can create a bug report. The cause could be any of the function calls here. Let me know.

Sockie · Answer 8 · Fri Sep 14 2018 02:03:54 GMT+0800 (China Standard Time)

Oh! I bet it has to do with NFS mount timeouts. NFS being NFS, I have an x-systemd.idle-timeout set in /etc/fstab to like 1min. I have two NFS boxes, one flaky and one rock solid (for NFS). One or both timing out fits the timetable of what I have to wait for awesome to come online.

I bet the hangtime is from calls to fs.update() trying to query that and getting stuck waiting for the flaky box to time out. I can't verify this hunch now, but I'll try to test it tonight.
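A hunch like this can be checked without risking the whole session by probing each mount point from a child process under a timeout. The following is only a minimal Python sketch of that idea, not anything taken from lain or awesome: `probe_path` is a hypothetical helper name, and it assumes a `stat` binary is available on the PATH.

```python
import subprocess


def probe_path(path, timeout_s=2.0):
    """Return True if `stat` on `path` succeeds within `timeout_s` seconds."""
    # Run stat in a child process so a hung mount blocks the child, not us.
    proc = subprocess.Popen(
        ["stat", "--", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort: a process stuck in uninterruptible I/O may ignore this,
        # so we deliberately do not wait for it to exit.
        proc.kill()
        return False
```

Calling `probe_path` on each suspect mount point should return quickly for healthy mounts and come back `False` after roughly `timeout_s` seconds for one that hangs. A missing path also returns `False`, so this sketch cannot tell "hung" from "absent" on its own.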

Looking at the history for fs.lua, I think I can see why I didn't run into this issue earlier. This computer I'm running now is a fresh build from a few weeks ago. I think this is my second reset since I built it, and I prolly had the NFS set up properly for the first run. Prior to that (on my old box), I wouldn't put it past me to forget to pull in updates and be on old code. I vaguely recall seeing broken data on the fs widget when looking at network drives, and just writing it off cuz "of course they'd be broken, it's not set up right now". Shot in the dark, but maybe this commit is what changed from broken to stall? lcpz/lain@be9ae68#diff-e431cc80df53aabd688d2e5fb4dc8b42

Luca CPZ · Answer 9 · Fri Sep 14 2018 02:26:05 GMT+0800 (China Standard Time)

> maybe this commit is what changed from broken to stall?

As you can see, if info is nil, these three will be nil as well. That commit was introduced because of this. So, I'd say it's very unlikely, but why don't you try reverting that patch locally yourself?

Sockie · Answer 10 · Fri Sep 14 2018 02:52:11 GMT+0800 (China Standard Time)

Oh, yeah you're totally right that makes sense. It probably wasn't that commit.

Does Gio do the queries async? Looks like before the swap to Gio the call was done with a helpers.async call. Perhaps that's the difference. If it timed out, it died alone in another thread. lcpz/lain@24ca585#diff-e431cc80df53aabd688d2e5fb4dc8b42L67

After I verify that it is indeed NFS causing the problems, and you think this is feasible as an explanation, I'll try rolling this back for fs.lua and see if that changes things. I'll try that other commit too

Luca CPZ · Answer 11 · Fri Sep 14 2018 17:03:37 GMT+0800 (China Standard Time)

AFAIK, Gio works asynchronously. Before that commit fs was already async, but it used an external script.

Sockie · Answer 12 · Sat Sep 15 2018 01:22:27 GMT+0800 (China Standard Time)

Well I fixed the hanging mount and can confirm it was definitely the problem. Now that it's fixed though, I can't test the old revision until it breaks again (which with how stable this NFS box seems to be, prolly won't take long).

I've never used this library and have no working knowledge of it. Looking at what Gio.unix_mounts_get() is calling, though, it doesn't seem inherently async. It seems that maybe you need to explicitly use a "GTask" if you want to use Gio to do async operations? Looking at the previous version with helpers.async, it looks like when building f via shell's df, it was pushed into another thread and could be missed by an end-user as "oh it just doesn't do anything" while it's hanging (with its results eventually being returned and dropped because the user has moved their mouse away from the widget by then, or Awesome has finished loading everything else and skipped over it).
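The difference being described, a blocking query on the main loop versus one pushed to a worker, can be illustrated with a small generic sketch. This is Python rather than Lua/LGI, it is not how lain or Gio implement anything, and `call_with_deadline` and `slow_query` are hypothetical names used only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# One shared pool; a hung query occupies a worker but not the caller.
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn, deadline_s, fallback=None):
    """Run fn() in a worker thread; return fallback if it misses the deadline."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        # The worker keeps running in the background; its late result is dropped.
        return fallback


def slow_query():
    """Hypothetical stand-in for a filesystem query that hangs on a dead mount."""
    time.sleep(0.5)
    return "fresh data"
```

With this shape, `call_with_deadline(slow_query, 0.05, fallback="stale")` gives up after about 50 ms and returns the fallback, which matches the "results eventually being returned and dropped" behaviour described above. The caller still waits up to `deadline_s`; a real event loop would more likely register a callback instead of waiting at all.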

Luca CPZ · Answer 13 · Sat Sep 15 2018 02:39:03 GMT+0800 (China Standard Time)

So NFS was the problem? How did you fix it? Can you try the current widget?

> Looking at the previous version with helpers.async, it looks like when building f via shell's df, it was pushed into another thread and could be missed by an end-user as "oh it just doesn't do anything" while it's hanging

In the worst case, it showed an empty notification when first hovering over it, then a complete notification once the thread finished. AFAIK, that never happened to anyone, even on machines up to 10 years old.

Sockie · Answer 14 · Sat Sep 15 2018 03:13:14 GMT+0800 (China Standard Time)

Well the problem was that whenever something tried to query this NFS drive, that timer would kick in and it'd stall for 1min (or forever if I didn't have that 1min, thanks NFS). The drive was technically mounted as far as the fs is concerned though. So all I did was unmount it and the problem went away.

#/etc/fstab
wonderland:/ /mnt/Wonderland nfs noauto,x-systemd.automount,x-systemd.device-timeout=10,timeo=14,x-systemd.idle-timeout=1min,users 0 0
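For anyone comparing their own entry against this one, the option string is easier to read once split into key/value pairs. Below is a small sketch, assuming the usual whitespace-separated fstab layout (device, mount point, type, options, dump, pass); `parse_fstab_line` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
def parse_fstab_line(line):
    """Parse one fstab line into a dict, or return None for comments/blanks."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        return None
    device, mount_point, fs_type, raw_options = fields[:4]
    options = {}
    for item in raw_options.split(","):
        key, sep, value = item.partition("=")
        # Flags without a value (e.g. "noauto") are stored as True.
        options[key] = value if sep else True
    return {
        "device": device,
        "mount_point": mount_point,
        "fs_type": fs_type,
        "options": options,
    }


# The entry quoted above, split across lines only for readability.
ENTRY = parse_fstab_line(
    "wonderland:/ /mnt/Wonderland nfs "
    "noauto,x-systemd.automount,x-systemd.device-timeout=10,"
    "timeo=14,x-systemd.idle-timeout=1min,users 0 0"
)
```

Here `ENTRY["options"]` holds `x-systemd.idle-timeout` as `"1min"` and `timeo` as `"14"` (per nfs(5), `timeo` is expressed in tenths of a second), with the valueless flags such as `noauto` stored as `True`.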

The current widget is enabled and is working again. I haven't tried the old version of the widget yet. Need to wait for the share to sputter out again (it's common with this particular NFS box).

Luca CPZ · Answer 15 · Sat Sep 15 2018 03:41:57 GMT+0800 (China Standard Time)

Thanks for sharing this information, hopefully it could be of help to others using NFS and my repositories.

I renamed the thread to make it easier to find.

From a quick search, it seems that so far there has been no issue related to NFS and g_get_unix_mounts in the Glib repository. If this is the case, maybe (and if you have time) you can produce a bug report. It would be nice if Glib could detect NFS and behave accordingly to prevent such stalls.
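As a rough idea of what "detect NFS and behave accordingly" could look like from the caller's side, a consumer of the mount list might separate network-backed entries by filesystem type before querying usage. This is a hedged Python sketch over `/proc/mounts`-style text, not GLib's behaviour or API: `split_mounts` is a hypothetical helper, the type list is an assumed and non-exhaustive selection, and the sample text is made up for illustration.

```python
# Assumed, non-exhaustive list of types that can block on an unreachable server.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs", "autofs"}


def split_mounts(mounts_text):
    """Split /proc/mounts-style text into (local, network) mount point lists."""
    local, network = [], []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        (network if fs_type in NETWORK_FS_TYPES else local).append(mount_point)
    return local, network


# Hypothetical sample in /proc/mounts format (device, mount point, type, ...).
SAMPLE = """\
/dev/sdb3 / ext4 rw,relatime 0 0
systemd-1 /mnt/Wonderland autofs rw,relatime 0 0
wonderland:/ /mnt/Wonderland nfs4 rw,relatime 0 0
"""
LOCAL, NETWORK = split_mounts(SAMPLE)
```

On the sample, only `/` ends up in `LOCAL`. `autofs` is included in the list because touching an automount point can itself trigger the mount attempt, which is the situation described for the `x-systemd.automount` entry earlier in this thread.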