allenai / dolma

Data and tools for generating and inspecting OLMo pre-training data.

Home Page: https://allenai.github.io/dolma/

tagger_modules do not work in current git version

peterbjorgensen opened this issue

The modules show up with dolma list --tagger_modules mypackage.mymodule, but dolma tag --tagger_modules mypackage.mymodule ... crashes.
The problem is that the tagger modules are not loaded before the following part of the code, which instantiates the taggers by name:

for tagger_name in taggers:
    # instantiate the taggers here to make sure they are all valid + download any necessary resources
    tagger = TaggerRegistry.get(tagger_name)
    # delete the tagger after we are done with it so that we don't keep it in memory
    del tagger

This means the dolma tag command crashes with ValueError: Unknown tagger mytagger ...
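
To illustrate the ordering problem, here is a minimal, self-contained sketch. It does not use dolma's actual code: toy_registry, get_tagger, validate_taggers and toy_plugin are hypothetical stand-ins, and the only assumption is that a tagger registers itself as a side effect of its module being imported. The lookup fails until the user module has been imported, so importing the modules before the validation loop is one way to avoid the error.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Hypothetical stand-in for a tagger registry; NOT dolma's real TaggerRegistry.
toy_registry = types.ModuleType("toy_registry")
toy_registry.TAGGERS = {}
sys.modules["toy_registry"] = toy_registry


def get_tagger(name):
    """Instantiate a tagger by name, failing like the error in the report."""
    if name not in toy_registry.TAGGERS:
        raise ValueError(f"Unknown tagger {name}")
    return toy_registry.TAGGERS[name]()


def validate_taggers(tagger_names, tagger_modules=()):
    # Import user modules first so their registration side effects run.
    for module_name in tagger_modules:
        importlib.import_module(module_name)
    # Only then instantiate each tagger by name to check that it is valid.
    for tagger_name in tagger_names:
        tagger = get_tagger(tagger_name)
        del tagger


# Write a throwaway module that registers "mytagger" when it is imported.
plugin_dir = tempfile.mkdtemp()
Path(plugin_dir, "toy_plugin.py").write_text(
    "import toy_registry\n"
    "class MyTagger:\n"
    "    pass\n"
    "toy_registry.TAGGERS['mytagger'] = MyTagger\n"
)
sys.path.insert(0, plugin_dir)
importlib.invalidate_caches()

# Without importing the module, the lookup fails as described above.
try:
    validate_taggers(["mytagger"])
    failed_without_import = False
except ValueError:
    failed_without_import = True

# With the module imported first, the same lookup succeeds.
validate_taggers(["mytagger"], tagger_modules=["toy_plugin"])
loaded_after_import = "mytagger" in toy_registry.TAGGERS
```

Whether the actual fix in main takes this shape is not stated in this thread; the sketch only shows why the import has to happen before the lookup.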

Excellent catch! Fixed in main.