oss2: AttributeError: type object 'FSAction' has no attribute 'ev_error'
btravouillon opened this issue · comments
On a system running with clustershell 1.8, shine fsck fails with the following backtrace:
[root@admin ~]# shine fsck -f scratch -d -n oss2 -i 3
Fsck scratch on oss2: are you sure? (y)es/(N)o: y
FSProxyAction fsck on oss2
SSHCLIENT: ssh -oForwardAgent=no -oForwardX11=no -oConnectTimeout=30 -oBatchMode=yes oss2 /usr/sbin/shine fsck -f scratch -R -d -l scratch-OST0003
oss2: SHINE:3:<pickle>
oss2: SHINE:3:<pickle>
oss2: POPEN: e2fsck -f -C2 /dev/disk/by-id/wwn-0x500800380001bd50 -y
oss2: LINE e2fsck 1.42.13.wc5 (15-Apr-2016)
oss2: SHINE:3:<pickle>
oss2: Fsck of scratch-OST0003 (/dev/disk/by-id/wwn-0x500800380001bd50) failed
oss2: >>
oss2: Traceback (most recent call last):
oss2: File "/usr/sbin/shine", line 34, in <module>
oss2: sys.exit(Controller().run_command())
oss2: File "/usr/lib/python2.7/site-packages/Shine/Controller.py", line 259, in run_command
oss2: rc = command.filter_rc(command.execute())
oss2: File "/usr/lib/python2.7/site-packages/Shine/Commands/Base/FSLiveCommand.py", line 127, in execute
oss2: result = max(result, self.execute_fs(fs, fs_conf, eh, vlevel))
oss2: File "/usr/lib/python2.7/site-packages/Shine/Commands/Fsck.py", line 125, in execute_fs
oss2: mountdata=self.options.mountdata)
oss2: File "/usr/lib/python2.7/site-packages/Shine/Lustre/FileSystem.py", line 534, in fsck
oss2: self._run_actions()
oss2: File "/usr/lib/python2.7/site-packages/Shine/Lustre/FileSystem.py", line 272, in _run_actions
oss2: task_self().resume()
oss2: File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 803, in resume
oss2: self._resume()
oss2: File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 766, in _resume
oss2: self._run(self.timeout)
oss2: File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 400, in _run
oss2: self._engine.run(timeout)
oss2: File "/usr/lib/python2.7/site-packages/ClusterShell/Engine/Engine.py", line 723, in run
oss2: self.runloop(timeout)
oss2: File "/usr/lib/python2.7/site-packages/ClusterShell/Engine/EPoll.py", line 157, in runloop
oss2: client._handle_read(sname)
oss2: File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/Worker.py", line 454, in _handle_read
oss2: msgline(self.key, msg, sname)
oss2: File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/Worker.py", line 577, in _on_msgline
oss2: self.eh.ev_error(self)
oss2: File "/usr/lib/python2.7/site-packages/Shine/Lustre/Actions/Fsck.py", line 101, in ev_error
oss2: FSAction.ev_error(self, worker)
oss2: AttributeError: type object 'FSAction' has no attribute 'ev_error'
Fsck failed
= FILESYSTEM STATUS (scratch) =
TYPE # STATUS NODES
---- - ------ -----
OST 1 offline oss2
This is related to cea-hpc/clustershell#232 where ev_error has been dropped.
Thx @actatux, but upgrading to clustershell 1.8 shouldn't have broken this, so I think you discovered a case that we didn't handle properly after the 1.7 -> 1.8 EventHandler API changes... doh.
Hi @thiell. The API change seems to be handled correctly. Indeed, Fsck.ev_error is fine thanks to the following code in lib/ClusterShell/Worker/Worker.py:
573     if self.eh is not None:
574         # this part is tricky to support backward compatibility...
575         # check for deprecated ev_error (< 1.8)
576         if hasattr(self.eh, 'ev_error'):
577             self.eh.ev_error(self)
However, FSAction.ev_error is not defined: type object 'FSAction' has no attribute 'ev_error'
Neither FSAction nor its ancestor classes (Action, EventHandler) define ev_error, so my patch proposal removes that call. It is not an issue with clustershell.
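To make the failure mode concrete, here is a minimal sketch that reproduces the same AttributeError without ClusterShell or Shine installed. The classes are simplified stand-ins that only borrow the names from the traceback, and `dispatch_error` is a hypothetical helper that mimics the `hasattr` check quoted above; none of this is the actual library code.

```python
class EventHandler(object):
    """Stand-in for a 1.8-style base class that no longer defines ev_error."""


class FSAction(EventHandler):
    """Stand-in for Shine's FSAction: it defines no ev_error either."""


class Fsck(FSAction):
    def ev_error(self, worker):
        # Explicit call to the parent's method, as in the traceback.
        # Nothing in the parent chain defines ev_error, so this lookup fails.
        FSAction.ev_error(self, worker)


def dispatch_error(handler, worker):
    """Hypothetical dispatcher mimicking the quoted hasattr check."""
    if hasattr(handler, 'ev_error'):
        # Fsck defines ev_error itself, so the check passes and it is called.
        handler.ev_error(worker)


try:
    dispatch_error(Fsck(), worker=None)
except AttributeError as exc:
    message = str(exc)
    print(message)
```

The `hasattr` check succeeds because Fsck has its own ev_error; the error only appears one level up, when that method chains to a parent that no longer provides one.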
Yep! But in clustershell 1.7 and below, EventHandler.ev_error() was defined and subclasses could inherit from it; this was totally legit. Upgrading to 1.8 breaks that, which is what I mean when I say it's an issue with clustershell. Of course, your patch will work fine with clustershell 1.8, and because FSAction.ev_error() was a no-op, it's also fine with previous versions of clustershell. But we might want to fix clustershell compat rather than shine in that case. Let's see what @degremont thinks.
Since it's a no-op could we just... maybe.. fix both? :)
ClusterShell could log once that it's deprecated (which would make things work again, with a warning to stop using it), and we remove it here since it really is deprecated.
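A rough sketch of the "warn once" idea, assuming a simplified stand-in `EventHandler` and a hypothetical `_ev_error_warned` flag. This is not ClusterShell's actual code or its eventual fix, only an illustration of a deprecated no-op that subclasses can still chain to.

```python
import warnings


class EventHandler(object):
    """Stand-in base class; not the real ClusterShell EventHandler."""

    # Hypothetical class-level flag so the warning is emitted only once.
    _ev_error_warned = False

    def ev_error(self, worker):
        """Deprecated no-op kept so that subclasses can still call it."""
        if not EventHandler._ev_error_warned:
            EventHandler._ev_error_warned = True
            warnings.warn("ev_error is deprecated", DeprecationWarning,
                          stacklevel=2)


class Fsck(EventHandler):
    def ev_error(self, worker):
        # Chaining to the parent works again instead of raising.
        EventHandler.ev_error(self, worker)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    handler = Fsck()
    handler.ev_error(None)
    handler.ev_error(None)

warning_count = len(caught)
```

One caveat with this approach: a dispatcher that uses `hasattr` to detect legacy handlers would then find ev_error on every handler, so that check would need to be adapted as well. That part is not shown here.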
As @thiell said, we are both in line with the philosophy of ClusterShell EventHandlers and how the update in 1.8 should have been done.
This is a bug in CS, and the fix is not to remove this kind of ev_error call in all applications doing this :)
(by the way, ev_timeout has the same issue)
However, I do not like having Shine being incompatible with CS 1.8, so we will probably land this patch.
I've opened cea-hpc/clustershell#377 for that.