nansencenter / metanorm

Metadata normalizing tool

Simplification of metanorm

aperrin66 opened this issue

Metanorm was originally designed this way: each normalizer takes care of one metadata convention, then passes responsibility for the attributes it could not fill to the next normalizer.
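To make the original design concrete, here is a minimal sketch of such a chain. It is not metanorm's actual code: the class names, the `mapping` attribute and the raw attribute keys are all hypothetical, chosen only to illustrate how unfilled attributes are handed to the next normalizer.

```python
from typing import Optional


class ChainedNormalizer:
    """Hypothetical base class: handles one convention, delegates the rest."""

    # normalized attribute name -> key in the raw metadata (set by subclasses)
    mapping: dict = {}

    def __init__(self, next_normalizer: Optional["ChainedNormalizer"] = None):
        self.next_normalizer = next_normalizer

    def normalize(self, raw: dict, wanted: list) -> dict:
        # Fill the attributes this convention knows how to find
        result = {
            name: raw[self.mapping[name]]
            for name in wanted
            if name in self.mapping and self.mapping[name] in raw
        }
        # Pass the attributes that are still missing to the next normalizer
        missing = [name for name in wanted if name not in result]
        if missing and self.next_normalizer is not None:
            result.update(self.next_normalizer.normalize(raw, missing))
        return result


class ConventionA(ChainedNormalizer):
    # Made-up convention that only knows about the platform
    mapping = {"platform": "platform_name"}


class ConventionB(ChainedNormalizer):
    # Made-up convention used as a fallback
    mapping = {"platform": "satellite", "time_coverage_start": "start_date"}


chain = ConventionA(ConventionB())
raw_attributes = {"platform_name": "Sentinel-1A", "start_date": "2020-01-01"}
normalized = chain.normalize(raw_attributes, ["platform", "time_coverage_start"])
```

In this sketch, `ConventionA` fills `platform` and `ConventionB` fills `time_coverage_start`, so the result depends on the order of the chain as much as on any single normalizer.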

Looking at the state of the code now, this design only applies in rare cases. The metadata conventions are followed so loosely, and vary so much from one metadata provider to the next, that most normalizers end up being specific to a provider.

The result is awkward and sometimes inefficient code that has to reconcile real-world cases with the original design of metanorm.

We could probably make the code both simpler and more efficient with a structure like this:

  • one normalizer per provider
  • each normalizer has a method that can tell from the raw attributes if the normalizer can be used
  • each normalizer provides the necessary methods to fill all the attributes
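The proposed structure could look roughly like the sketch below. This is only an illustration under assumptions, not the code on the refactoring branch: `ProviderNormalizer`, `ExampleProviderNormalizer`, the `check()` method name, the `normalize()` helper and the raw attribute keys are all hypothetical. Only the `get_...()` naming pattern is taken from the existing normalizers.

```python
class ProviderNormalizer:
    """Hypothetical base class: one subclass per metadata provider."""

    def check(self, raw: dict) -> bool:
        """Tell from the raw attributes whether this normalizer can be used."""
        raise NotImplementedError

    def normalize(self, raw: dict) -> dict:
        # Call every get_<attribute>() method defined on the normalizer
        return {
            name[len("get_"):]: getattr(self, name)(raw)
            for name in dir(self)
            if name.startswith("get_")
        }


class ExampleProviderNormalizer(ProviderNormalizer):
    """Normalizer for a made-up provider hosted at example.com."""

    def check(self, raw):
        return raw.get("url", "").startswith("https://example.com/")

    def get_entry_title(self, raw):
        return raw["title"]

    def get_time_coverage_start(self, raw):
        return raw["start_date"]


def normalize(raw, normalizers):
    """Use the first normalizer that recognizes the raw attributes."""
    for normalizer in normalizers:
        if normalizer.check(raw):
            return normalizer.normalize(raw)
    raise ValueError("no normalizer found for these attributes")
```

With this layout, each normalizer is selected once and fills every attribute itself, so nothing has to be passed along a chain.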

UPDATE:
The base structure is in place, and I migrated the Creodias normalizer to serve as a simple example.

Here are the remaining normalizers to migrate/create (hopefully I did not forget any):

For each of these, please create a branch from issue81_simplification_refactoring, make the changes, and open a pull request targeting issue81_simplification_refactoring.

The new normalizers will be put in the metanorm/normalizers/geospaas/ folder.

The new Creodias normalizer can be used as an example.

Once all the normalizers have been migrated, we can remove the old base classes and move on to adapting geospaas_harvesting.

If an existing normalizer does not have a particular get_...() method, remember to check url.py.
Some hard-coded values are defined in there even for providers which have their own normalizer.

There is some repetition across the normalizers, but I would like to wait until all of them have been migrated to the new format before factoring out the common code.