cea-hpc / shine

Lustre administration tool

oss2: AttributeError: type object 'FSAction' has no attribute 'ev_error'

btravouillon opened this issue

On a system running clustershell 1.8, shine fsck fails with the following backtrace:

[root@admin ~]# shine fsck -f scratch -d -n oss2 -i 3
Fsck scratch on oss2: are you sure? (y)es/(N)o: y
FSProxyAction fsck on oss2
SSHCLIENT: ssh -oForwardAgent=no -oForwardX11=no -oConnectTimeout=30 -oBatchMode=yes oss2 /usr/sbin/shine fsck -f scratch -R -d -l scratch-OST0003
oss2: SHINE:3:<pickle>
oss2: SHINE:3:<pickle>
oss2: POPEN: e2fsck -f -C2 /dev/disk/by-id/wwn-0x500800380001bd50 -y
oss2: LINE e2fsck 1.42.13.wc5 (15-Apr-2016)
oss2: SHINE:3:<pickle>
oss2: Fsck of scratch-OST0003 (/dev/disk/by-id/wwn-0x500800380001bd50) failed
oss2: >>
oss2: Traceback (most recent call last):
oss2:   File "/usr/sbin/shine", line 34, in <module>
oss2:     sys.exit(Controller().run_command())
oss2:   File "/usr/lib/python2.7/site-packages/Shine/Controller.py", line 259, in run_command
oss2:     rc = command.filter_rc(command.execute())
oss2:   File "/usr/lib/python2.7/site-packages/Shine/Commands/Base/FSLiveCommand.py", line 127, in execute
oss2:     result = max(result, self.execute_fs(fs, fs_conf, eh, vlevel))
oss2:   File "/usr/lib/python2.7/site-packages/Shine/Commands/Fsck.py", line 125, in execute_fs
oss2:     mountdata=self.options.mountdata)
oss2:   File "/usr/lib/python2.7/site-packages/Shine/Lustre/FileSystem.py", line 534, in fsck
oss2:     self._run_actions()
oss2:   File "/usr/lib/python2.7/site-packages/Shine/Lustre/FileSystem.py", line 272, in _run_actions
oss2:     task_self().resume()
oss2:   File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 803, in resume
oss2:     self._resume()
oss2:   File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 766, in _resume
oss2:     self._run(self.timeout)
oss2:   File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 400, in _run
oss2:     self._engine.run(timeout)
oss2:   File "/usr/lib/python2.7/site-packages/ClusterShell/Engine/Engine.py", line 723, in run
oss2:     self.runloop(timeout)
oss2:   File "/usr/lib/python2.7/site-packages/ClusterShell/Engine/EPoll.py", line 157, in runloop
oss2:     client._handle_read(sname)
oss2:   File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/Worker.py", line 454, in _handle_read
oss2:     msgline(self.key, msg, sname)
oss2:   File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/Worker.py", line 577, in _on_msgline
oss2:     self.eh.ev_error(self)
oss2:   File "/usr/lib/python2.7/site-packages/Shine/Lustre/Actions/Fsck.py", line 101, in ev_error
oss2:     FSAction.ev_error(self, worker)
oss2: AttributeError: type object 'FSAction' has no attribute 'ev_error'
Fsck failed
= FILESYSTEM STATUS (scratch) =
TYPE # STATUS  NODES
---- - ------  -----
OST  1 offline oss2

This is related to cea-hpc/clustershell#232 where ev_error has been dropped.

Thx @actatux, but upgrading to clustershell 1.8 shouldn't have broken this, so I think you discovered a case that we didn't handle properly after the 1.7 -> 1.8 EventHandler API changes... doh.

Hi @thiell. The API change seems to be handled correctly: Fsck.ev_error is still invoked, thanks to the following compatibility code in lib/ClusterShell/Worker/Worker.py:

573             if self.eh is not None:
574                 # this part is tricky to support backward compatibility...
575                 # check for deprecated ev_error (< 1.8)
576                 if hasattr(self.eh, 'ev_error'):
577                     self.eh.ev_error(self)

However, FSAction.ev_error is not defined: type object 'FSAction' has no attribute 'ev_error'

Neither FSAction nor its ancestor classes (Action, EventHandler) define ev_error, so my proposed patch removes the call. It is not an issue with clustershell.
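The failure can be reproduced without Shine or ClusterShell. The sketch below uses stand-in classes: the names mirror those in the traceback, but the bodies are hypothetical and much simpler than the real ones. It shows why the `hasattr` check passes on the handler instance while the explicit parent call still fails:

```python
class EventHandler(object):
    """Stand-in for a 1.8-style base class: ev_error is not defined."""


class FSAction(EventHandler):
    """Stand-in for Shine's FSAction: it does not define ev_error either."""


class Fsck(FSAction):
    """Stand-in for Shine's Fsck action."""

    def ev_error(self, worker):
        # Explicit parent call, like the one shown in the traceback.
        FSAction.ev_error(self, worker)


def dispatch_error(eh, worker):
    """Mimic the backward-compatibility check quoted from Worker.py."""
    if eh is not None and hasattr(eh, 'ev_error'):
        # True here: Fsck itself defines ev_error.
        eh.ev_error(worker)


def reproduce():
    """Return the AttributeError message, or None if nothing was raised."""
    try:
        dispatch_error(Fsck(), worker=None)
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
```

The check only proves that the most-derived class has the method. It says nothing about whether a parent implementation exists to chain to.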

Yep! But in clustershell 1.7 and below, EventHandler.ev_error() was defined and subclasses could inherit from it; this was totally legit. Upgrading to 1.8 breaks that, which is what I mean when I say it's an issue with clustershell. Of course, your patch will work fine with clustershell 1.8, and because FSAction.ev_error() was a no-op, it's also fine with previous versions of clustershell. But we might want to fix clustershell compat rather than shine in that case. Let's see what @degremont thinks.

Since it's a no-op could we just... maybe.. fix both? :)
We could log once on the clustershell side that it's deprecated (which would make things work again, with a warning to stop using it) and remove it here, since it really is deprecated.
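A minimal sketch of the "log once" idea, assuming the deprecated hooks are restored as no-ops on the base class. This is not ClusterShell's actual code: the `_warned` set and the `_deprecated` helper are hypothetical names introduced for illustration.

```python
import warnings


class EventHandler(object):
    """Hypothetical base class restoring the deprecated hooks as no-ops."""

    _warned = set()  # names of deprecated hooks already reported

    def _deprecated(self, name):
        # Emit the warning only the first time each hook is reached.
        if name not in EventHandler._warned:
            EventHandler._warned.add(name)
            warnings.warn("%s() is deprecated" % name,
                          DeprecationWarning, stacklevel=3)

    def ev_error(self, worker):
        """Deprecated no-op kept so subclasses can still chain to it."""
        self._deprecated('ev_error')

    def ev_timeout(self, worker):
        """Deprecated no-op kept so subclasses can still chain to it."""
        self._deprecated('ev_timeout')
```

One consequence to keep in mind: once the base class defines these methods, a `hasattr(self.eh, 'ev_error')` test is true for every handler, so any dispatch logic relying on that test would need to change as well.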

As @thiell said, we are both in line with the philosophy of ClusterShell EventHandlers and how the update in 1.8 should have been done.
This is a bug in CS, and the fix is not to remove this kind of ev_error call from every application that makes one :)
(by the way, ev_timeout has the same issue)

However, I do not like Shine being incompatible with CS 1.8, so we will probably land this patch.
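The patch discussed in this thread simply removes the parent call, which is safe because the parent hook was a no-op. For applications that still want to chain when a parent implementation exists, one possible alternative is sketched below. It is an assumption, not Shine's code: `call_parent_hook` and the handler classes are hypothetical stand-ins.

```python
def call_parent_hook(cls, instance, name, *args):
    """Call `name` as defined above `cls` in the MRO, if it exists.

    Returns True when a parent implementation was found and called.
    """
    hook = getattr(super(cls, instance), name, None)
    if hook is None:
        return False
    hook(*args)
    return True


class OldStyleHandler(object):
    """Stand-in for a pre-1.8 base class with no-op hooks."""

    def ev_error(self, worker):
        pass

    def ev_timeout(self, worker):
        pass


class NewStyleHandler(object):
    """Stand-in for a 1.8-style base class without those hooks."""


def make_fsck_action(base):
    """Build the same action class on top of either base class."""

    class FsckAction(base):
        def __init__(self):
            self.failed = False

        def ev_error(self, worker):
            self.failed = True
            # Chains only if a parent actually defines ev_error.
            call_parent_hook(FsckAction, self, 'ev_error', worker)

    return FsckAction


if __name__ == "__main__":
    for base in (OldStyleHandler, NewStyleHandler):
        action = make_fsck_action(base)()
        action.ev_error(None)
        print("%s: failed=%s" % (base.__name__, action.failed))
```

The same helper would cover ev_timeout, which has the same issue.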