aphp / edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering tailored support for French clinical notes.

Home Page: https://aphp.github.io/edsnlp/

Feature request: better API for adding pipes to a pipeline

percevalw opened this issue

Feature type

Adding a pipe to a pipeline has quite a few limitations at the moment:

import edsnlp

nlp = edsnlp.blank('eds')
nlp.add_pipe('eds.matcher', config={"terms": {"key": ["expr 1", "expr 2"]}})
...
  • there is no easy way of knowing which pipes are available from the notebook / IDE, and there is no autocompletion
  • all pipe parameters are nested in a configuration dict, which is cumbersome
  • there is no autocompletion of these parameters, since they are passed via a configuration dict

We could deviate from spaCy's iconic API and design something better along these lines:

import edsnlp
import edsnlp.pipes as eds

nlp = edsnlp.blank('eds')
nlp.add_pipe(eds.matcher(terms={"key": ["expr 1", "expr 2"]}))

The problem is that some pipes (like eds.matcher) require an nlp object at init time, which add_pipe currently supplies. We could ask the user to pass the nlp argument explicitly, as in nlp.add_pipe(eds.matcher(nlp=nlp, terms={"key": ["expr 1", "expr 2"]})), but this feels redundant.

Another option is to have promise = eds.matcher(terms={"key": ["expr 1", "expr 2"]}) return a "promise" (or "curried" component) when the required nlp argument is missing. The actual component would only be instantiated when the promise is added to the pipeline (via promise.instantiate(nlp=self)). This feels like an anti-pattern, so it should be extensively documented and should produce a warning whenever a user tries to use a non-initialized pipe outside a pipeline.
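
To make the "promise" idea concrete, here is a minimal, self-contained sketch of how it could work. It does not use edsnlp at all: Pipeline, Matcher, PipePromise and the matcher factory are hypothetical stand-ins invented for illustration, and only the instantiate(nlp=...) method name is taken from the proposal above.

```python
import warnings


class PipePromise:
    """Hypothetical deferred component: stores the arguments until `nlp` is known."""

    def __init__(self, factory, **kwargs):
        self.factory = factory
        self.kwargs = kwargs

    def instantiate(self, nlp):
        # Build the real component now that the pipeline is available.
        return self.factory(nlp=nlp, **self.kwargs)

    def __call__(self, *args, **kwargs):
        # Using the promise as if it were a pipe is almost certainly a mistake.
        warnings.warn(
            "This pipe has not been added to a pipeline and is not initialized.",
            UserWarning,
        )


class Matcher:
    """Toy component (not the real eds.matcher) that needs `nlp` at init time."""

    def __init__(self, nlp, terms):
        self.nlp = nlp
        self.terms = terms

    def __call__(self, text):
        # Return the keys for which at least one expression occurs in the text.
        return [k for k, exprs in self.terms.items() if any(e in text for e in exprs)]


def matcher(terms, nlp=None):
    """Explicit keyword signature, so IDEs can autocomplete the parameters."""
    if nlp is None:
        return PipePromise(Matcher, terms=terms)
    return Matcher(nlp=nlp, terms=terms)


class Pipeline:
    """Toy stand-in for the nlp object."""

    def __init__(self):
        self.pipes = []

    def add_pipe(self, pipe):
        # Resolve promises by injecting the pipeline itself.
        if isinstance(pipe, PipePromise):
            pipe = pipe.instantiate(nlp=self)
        self.pipes.append(pipe)
        return pipe


nlp = Pipeline()
pipe = nlp.add_pipe(matcher(terms={"key": ["expr 1", "expr 2"]}))
```

In this sketch the user never passes nlp explicitly, and passing nlp=nlp by hand still works and skips the promise entirely. The cost is that matcher(...) returns two different types depending on its arguments, which is the anti-pattern concern raised above.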