explosion / catalogue

Super lightweight function registries for your library

Question: In spaCy you use catalogue to automatically register CLI commands. How exactly are you doing that?

AlirezaTheH opened this issue

In spacy.cli.util.py you do this:

app = typer.Typer(name=NAME, help=HELP)

def setup_cli() -> None:
    # Make sure the entry-point for CLI runs, so that they get imported.
    registry.cli.get_all()
    # Ensure that the help messages always display the correct prompt
    command = get_command(app)
    command(prog_name=COMMAND)

Here registry.cli is a catalogue Registry. But I can't work out where you actually register the commands with it: they live in different Python files, so they wouldn't normally be registered. Yet somehow it works. Can you please explain?

PS: When I use the same structure, commands in other files don't get imported.

You can think of a registry as just being a dictionary. Some code has to be run for things to end up in the registry, but you don't have to call a function for this to happen. It can be enough to import the relevant module.

For example, pretend we have this file:

# stuff.py

print("This is some code.")

def bloop(func):
    print("Bloop!")
    return func

@bloop
def dostuff():
    print("I am doing stuff")

If you import that, any code outside of function definitions will be executed immediately, so it'll print "This is some code.". Function definitions are also "executed": this doesn't run their bodies, it just binds the function name to the function definition. But decorators are executed at this point too, just like code outside functions. So "Bloop!" will also be printed as soon as you import this file.
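
To check this without creating files by hand, here is a small sketch that writes the module above to a temporary directory, imports it, and captures what gets printed. The module name stuff_demo and the helper import_and_capture are made up for this illustration.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Same content as stuff.py above.
SOURCE = '''
print("This is some code.")

def bloop(func):
    print("Bloop!")
    return func

@bloop
def dostuff():
    print("I am doing stuff")
'''


def import_and_capture(module_name: str, source: str) -> str:
    """Write `source` to a temporary module, import it, return what it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                importlib.import_module(module_name)
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
    return buffer.getvalue()


output = import_and_capture("stuff_demo", SOURCE)
print(output)
```

The captured output contains "This is some code." and "Bloop!" but not "I am doing stuff", because dostuff() is defined and decorated at import time but never called.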

Functions are added to the registry inside decorators, which is why registration happens as soon as the module is imported, even though nothing in it has been called. (Sorry if you know all that, but sometimes people are unaware of it.)
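
A minimal sketch of that idea, using a plain dict and a decorator. This is not catalogue's actual implementation; the Registry class, the cli instance, and the greet function here are hypothetical stand-ins.

```python
from typing import Callable, Dict


class Registry:
    """A toy registry: a dict plus a decorator that fills it."""

    def __init__(self) -> None:
        self._funcs: Dict[str, Callable] = {}

    def register(self, name: str) -> Callable[[Callable], Callable]:
        def decorator(func: Callable) -> Callable:
            # Runs when the decorated function is defined, not when it is called.
            self._funcs[name] = func
            return func

        return decorator

    def get_all(self) -> Dict[str, Callable]:
        return dict(self._funcs)


cli = Registry()


@cli.register("greet")
def greet(name: str) -> str:
    return f"Hello, {name}!"


# greet was never called above, yet it is already in the registry.
print(sorted(cli.get_all()))
print(cli.get_all()["greet"]("spaCy"))
```

If the decorated function lived in another file, it would only show up in get_all() once that file had been imported by something.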

There's also some magic going on with "entry points" so that spaCy knows how to find other packages to import so they end up in the registry; this is how spacy info knows what pipelines you have installed.

> There's also some magic going on with "entry points" so that spaCy knows how to find other packages to import so they end up in the registry; this is how spacy info knows what pipelines you have installed.

This is the part that I want to know. How exactly does spaCy know which files to import, and how are they imported?

"Entry points" are a Python feature that allows packages to announce they exist to other packages, amongst other things. The most common use is to tell pip about commands you can use in the shell, but there are many other ways to use it.

The spec:

https://packaging.python.org/specifications/entry-points/

A blog post explaining how you use them, if not how they're implemented:

https://amir.rachum.com/blog/2017/07/28/python-entry-points/

This package gets entry points using importlib.metadata, which was new in Python 3.8. For older Python versions we use a backport.

The actual call is here.
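
As a rough illustration of looking up entry points with importlib.metadata (not a copy of what catalogue does), the sketch below lists the entry points in a given group. The helper find_entry_points is hypothetical, and "console_scripts" is used only because it is the standard group for shell commands; what it prints depends on what is installed in your environment.

```python
import sys
from importlib import metadata
from typing import List


def find_entry_points(group: str) -> List[metadata.EntryPoint]:
    """Return the installed entry points advertised under `group`."""
    if sys.version_info >= (3, 10):
        # Newer versions support selecting by group directly.
        return list(metadata.entry_points(group=group))
    # Python 3.8 and 3.9 return a dict mapping group name to entry points.
    return list(metadata.entry_points().get(group, []))


for ep in find_entry_points("console_scripts"):
    print(ep.name, "->", ep.value)
```

Calling ep.load() on an entry point imports the module it points to, and that import is what lets any registration decorators in the module run.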

Thank you for your time. I've already checked this.
When registry.cli.get_all() is called, catalogue searches AVAILABLE_ENTRY_POINTS for an entry point named spacy_cli and loads it if it exists. But there is no such entry point, so it loads nothing. So something else must load the spacy.cli module, which registers the commands, but I can't find it.
What am I missing?

Oh, I see now. It's all because you import the actual functions and app in cli.__init__. So if there weren't any functions other than commands, they wouldn't be registered. registry.cli.get_all() was very deceptive!
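
For anyone who lands here with the same question, this is a small sketch of that effect under simplified assumptions: a throwaway package whose __init__ either does or does not import the module that defines a command. The package names democli_a and democli_b, the COMMANDS dict, and the command decorator are invented for the demo and are not spaCy's code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A tiny stand-in for a command registry.
UTIL = '''
COMMANDS = {}

def command(name):
    def decorator(func):
        COMMANDS[name] = func
        return func
    return decorator
'''

# A module that registers one command when it is imported.
HELLO = '''
from ._util import command

@command("hello")
def hello():
    return "hello from a command"
'''


def build_and_import(pkg_name: str, init_source: str) -> list:
    """Create a throwaway package, import it, return the registered command names."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp, pkg_name)
        pkg.mkdir()
        (pkg / "_util.py").write_text(UTIL)
        (pkg / "hello.py").write_text(HELLO)
        (pkg / "__init__.py").write_text(init_source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(pkg_name)
            return sorted(module.COMMANDS)
        finally:
            sys.path.remove(tmp)
            for name in [n for n in sys.modules
                         if n == pkg_name or n.startswith(pkg_name + ".")]:
                del sys.modules[name]


# __init__ imports the command module, so the decorator runs.
with_import = build_and_import(
    "democli_a", "from ._util import COMMANDS\nfrom .hello import hello\n"
)
# __init__ never imports hello.py, so nothing is registered.
without_import = build_and_import("democli_b", "from ._util import COMMANDS\n")

print(with_import)     # ['hello']
print(without_import)  # []
```

The second package is presumably what happens when "the same structure" is copied without the imports in __init__: the command files exist, but nothing ever imports them.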