juanmc2005 / diart

Trying to write a simple example test using the config and found a bug

Argument in argparse is named tau

Line 21 in 858cd1a

    
           parser.add_argument("--tau", default=0.5, type=float, help=f"{argdoc.TAU}. Defaults to 0.5")

passed to from_dict, which looks for tau_active instead, replacing whatever i feed to it with the default:

diart/src/diart/blocks/config.py

Line 110 in 858cd1a

tau = utils.get(data, "tau_active", None)

Same is true of rho -> rho_update and delta -> delta_new and max-speakers -> max_speakers and cpu -> device

Some additional things i noticed in these modules that aren't bugs per se but just some tips:

The initialization logic is written twice, once in __init__ and once in from_dict without clear purpose. this makes for very fragile and bug-prone code. I'm not sure what the from_dict method is supposed to do differently than just calling the class like PipelineConfig(**args) - my expectation as a user would be that they would be identical. In fact this seems to be the cause of the bug at hand - the **kwargs in __init__ is superfluous since they are ignored, but if it wasn't there and I called Pipelineconfig(**args) then I would have gotten an exception from passing an unexpected parameter.
The default values are actually defined three times - another time in the argparser itself. Definitely recommend having a single place these are defined.
you probably want @classmethod here:

diart/src/diart/blocks/config.py

Line 95 in 858cd1a

@staticmethod
try using the abc module for abstract classes like eg. -

diart/src/diart/blocks/config.py

Line 12 in 858cd1a

class BasePipelineConfig:
this signature should probably be data: dict since the method is named "from_dict" -

diart/src/diart/blocks/config.py

Line 30 in 858cd1a

def from_dict(data: Any) -> 'BasePipelineConfig':
dicts already have a get method, eg. {'a': 'dict'}.get('a', None) that works the same as utils.get
the __init__ methods are very heavy for a config class - instantiating two models. If this is just a config file you should probably just store the string values and instantiate the models elsewhere.

part of openjournals/joss-reviews#5266

Hi @sneakers-the-rat,

tau_active and tau are aliases, you can see in the following lines that it checks for tau_active first, if that fails it looks for tau defaulting to 0.6 (see here). The same applies for the other arguments you mention.
Notice that the cli argument max-speakers is converted to max_speakers automatically by argparse.

I do agree on the comments about the initialization code and the repetitiveness. Do you have any suggestions for a pythonic way of addressing this?

Ah, my bad, something was causing my test to fail and it was late so i must not have read carefully.

Yes! I am unsure what the from_dict method is supposed to do differently than the __init__ method - i test that they are identical here: https://github.com/sneakers-the-rat/diart/blob/b1a0ccaa35f8b36aa30f978a4bcb16db69652a42/tests/test_config.py#L35

usually when you have a from_x method it's because the thing you're instantiating from is different in form than how you usually instantiate it - eg. a from_json method might take a path to a .json file with instantiation arguments. In this case, instantiating like MyClass(**{'arg1':'val1', 'arg2':'val2'}) is the same as MyClass(arg1='val1', arg2='val2')

So my suggestion would be to remove the from_dict method and use the same variable names everywhere.

I don't mean to harp on you about a minor thing, I am more focused on showing you how to make the code more testable by removing special cases, minimizing where things can go wrong, etc. I won't be able to review the entire package, so this is just one example of a strategy for avoiding bugs.

Implemented in #189

[joss] Inconsistent naming causes cli arguments to be ignored